Monday, July 27, 2020

Configuring mediated devices (Part 1)

vfio-mdev has become popular over the last few years for assigning certain classes of devices to guests. On the s390x side, vfio-ccw and vfio-ap are using the vfio-mdev framework for making channel devices and crypto adapters accessible to guests.
This and a follow-up article aim to give an overview of the infrastructure, how to set up and manage devices, and how to use tooling for this.

What is a mediated device?

A general overview

Mediated devices grew out of the need to build upon the existing vfio infrastructure in order to support more fine grained management of resources. Some of the initial use cases included GPUs and (maybe somewhat surprisingly) s390 channel devices.

When using the mediated device (mdev) API, common tasks are performed in the mdev core driver (like device management), while device-specific tasks are done in a vendor driver. Current in-kernel examples of vendor drivers are the Intel vGPU driver, vfio-ccw, and vfio-ap.

Examples on s390

vfio-ccw

vfio-ccw can be used to assign channel devices. It is pretty straightforward: vfio-ccw is an alternative driver for I/O subchannels, and a single mediated device per subchannel is supported.

vfio-ap

vfio-ap can be used to assign crypto cards/queues (APQNs). It is a bit more involved, requiring prior setup on the ap bus level and configuration of a 'matrix' device. Complex relationships between the resources that can be assigned to different guests exist. Configuration-wise, this is probably the most complex mediated device available today.

Configuring a mediated device: the manual way

Mediated devices can be configured manually via sysfs operations. This is a good way to see what actually happens, but probably not what you want to do as a general administration task. Tools to help here will be introduced in part 2 of this article.

I will show the steps for both vfio-ccw and vfio-ap, just to show two different approaches. (Both examples are also used in the QEMU documentation, in case this looks familiar.)

Binding to the correct driver

vfio-ccw

Assume you want to use a DASD with the device bus ID 0.0.2b09. As vfio-ccw operates on the subchannel level, you first need to locate the subchannel for this device:

   [root@host ~]# lscss | grep 0.0.2b09 | awk '{print $2}'
  0.0.0313

(A word of caution: a device is not guaranteed to use the same subchannel at all times; on LPARs, the subchannel number will usually be stable, but z/VM -- and QEMU -- assign subchannel numbers in a consecutive order. If you don't get any hotplug events for a device, the subchannel number will stay stable for at least as long as the guest is running, though.)

Now you need to unbind the subchannel device from the default I/O subchannel driver and bind it to the vfio-ccw driver (make sure the device is not in use!):

    [root@host ~]# echo 0.0.0313 > /sys/bus/css/devices/0.0.0313/driver/unbind
    [root@host ~]# echo 0.0.0313 > /sys/bus/css/drivers/vfio_ccw/bind

vfio-ap

You need to perform some preliminary configuration of your crypto adapters before you can use any of them with vfio-ap. If nothing different has been set up, a crypto adapter will only bind to the default device drivers, and you cannot use it via vfio-ap. In order to be able to bind an adapter to vfio-ap, you first need to modify the /sys/bus/ap/apmask and /sys/bus/ap/aqmask entries. Both are basically bitmasks that indicate that the matching adapter IDs respectively queue indices can only be bound to the default drivers. If you want to use a certain APQN via vfio-ap, you need to unset the respective bits.

Let's assume you want to assign the APQNs (5, 4) and (5, ab). First, you need to make the adapter and the domains available to non-default drivers:

  [root@host ~]#  echo -5 > /sys/bus/ap/apmask
  [root@host ~]#  echo -4, -0xab > /sys/bus/ap/aqmask

This should result in the devices being bound to the vfio_ap driver (you can verify this by looking for them under /sys/bus/ap/drivers/vfio_ap/).

Create a mediated device

The basic workflow is "pick a uuid, create a mediated device identified by it".

vfio-ccw

For vfio-ccw, the two steps of the basic workflow are enough:

  [root@host ~]# uuidgen
  7e270a25-e163-4922-af60-757fc8ed48c6
  [root@host ~]# echo "7e270a25-e163-4922-af60-757fc8ed48c6" > \
    /sys/bus/css/devices/0.0.0313/mdev_supported_types/vfio_ccw-io/create

vfio-ap

For vfio-ap, you need a more involved approach. The uuid is used to create a mediated device under the 'matrix' device:

  [root@host ~] # uuidgen
  669d9b23-fe1b-4ecb-be08-a2fabca99b71
 [root@host ~]# echo "669d9b23-fe1b-4ecb-be08-a2fabca99b71" > /sys/devices/vfio_ap/matrix/mdev_supported_types/vfio_ap-passthrough/create

This mediated device will need to collect all APQNs that you want to pass to a specific guest. For that, you need to use the assign_adapter, assign_domain, and possibly assign_control_domain attributes (we'll ignore control domains for simplicity's sake.) All attributes have a companion unassign_ attribute to remove adapters/domains from the mediated device again. You can only assign adapters/domains that you removed from apmask/aqmask in the previous step. To follow up on our example again:

  [root@host ~]# echo 5 > /sys/devices/vfio_ap/matrix/mdev_supported_types/vfio_ap-passthrough/669d9b23-fe1b-4ecb-be08-a2fabca99b71/assign_adapter
 [root@host ~]# echo 4 > /sys/devices/vfio_ap/matrix/mdev_supported_types/vfio_ap-passthrough/669d9b23-fe1b-4ecb-be08-a2fabca99b71/assign_domain
 [root@host ~]# echo 0xab > /sys/devices/vfio_ap/matrix/mdev_supported_types/vfio_ap-passthrough/669d9b23-fe1b-4ecb-be08-a2fabca99b71/assign_domain

If you want to make sure that the mediated device is set up correctly, check via

  [root@host ~]# cat /sys/devices/vfio_ap/matrix/mdev_supported_types/vfio_ap-passthrough/669d9b23-fe1b-4ecb-be08-a2fabca99b71/matrix
  05.0004
  05.00ab

Configuring QEMU/libvirt

Your mediated device is now ready to be passed to a guest.

vfio-ccw

Let's assume you want the device to show up as device 0.0.1234 in the guest.

For the QEMU command line, use

-device vfio-ccw,devno=fe.0.1234,sysfsdev=\
    /sys/bus/mdev/devices/7e270a25-e163-4922-af60-757fc8ed48c6

For libvirt, use the following XML snippet in the <devices> section:

<hostdev mode='subsystem' type='mdev' managed='no' model='vfio-ccw'>
  <source>
    <address uuid='7e270a25-e163-4922-af60-757fc8ed48c6'/>
  </source>
  <address type='ccw' cssid='0xfe' ssid='0x0' devno='0x1234'/>
</hostdev>

vfio-ap

Any APQNs will show up in the guest exactly as they show up in the host (i.e., no remapping is possible.)

For the QEMU command line, use

-device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/669d9b23-fe1b-4ecb-be08-a2fabca99b71

For libvirt, use the following XML snippet in the <devices> section:

<hostdev mode='subsystem' type='mdev' managed='no' model='vfio-ap'>
  <source>
    <address uuid='669d9b23-fe1b-4ecb-be08-a2fabca99b71'/>
  </source>
</hostdev>

Tooling

All this manual setup is a bit tedious; the next article in this series will look at some of the tooling that is available for mediated devices.