Thursday, March 30, 2017
Oldies but Goldies: Channel I/O KVM Forum 2012 talk
Some of the details have been superseded in the meantime, but the slides from my talk at the 2012 KVM Forum contain information that may still be interesting. (Sadly, no video of the talk was recorded.)
Tuesday, March 28, 2017
Channel I/O: Talking to devices
Having a nice set of channel devices available to your OS is all fine and good; but how do you actually talk to them? This post attempts to give a high-level overview, while also explaining some more acronyms.
Let's look again at the example configuration from the last post:
Device Subchan. DevType CU Type Use PIM PAM POM CHPIDs
----------------------------------------------------------------------
0.0.0000 0.0.0000 0000/00 3832/01 yes 80 80 ff 00000000 00000000
0.0.0042 0.0.0001 0000/00 3832/02 yes 80 80 ff 00000000 00000000
The second device is the virtio-blk device 0.0.0042 on subchannel 0.0.0001, having channel path 0. Being virtio, this is a very simplified variation of what you'd see on real hardware (although this also can be a benefit in some way). Think of it as the following:
Device 0.0.0042 is accessed via channel path 0, and subchannel 0.0.0001 is used as a means to address it.
The access (channel path) is configured in the hypervisor (or in the hardware definitions). The subchannel is what the OS will use as a target for I/O instructions and how it can associate I/O interrupts with the device they are for.
I/O instructions? There's a whole zoo of them, but they share some characteristics:
- They take a subchannel identifier as parameter.
- They are privileged, i.e. on a Linux system they can only be issued by the kernel and not from user space.
- START SUBCHANNEL (SSCH) - start a channel program
- TEST SUBCHANNEL (TSCH) - retrieve subchannel status
ccws consist of three parts:
- The command. This falls into the categories of read (read data from the device), write (write data to the device) or control (for example, rewinding a tape). An 8-bit value.
- The flags, which control error handling or program flow. I'll ignore them for simplicity here.
- The data address. This is an address in memory where data is written to (read) or read from (write).
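To make the three parts a bit more tangible, here is a minimal sketch of how a ccw could be laid out in memory. It assumes the format-1 layout (command code, flags, a 16-bit byte count, 31-bit data address), so it includes a count field that the simplified list above leaves out. The helper name pack_ccw, the buffer size and the address are made up for illustration.

```python
import struct

SENSE_ID = 0xE4  # the command code used as the example in this post


def pack_ccw(cmd_code: int, flags: int, count: int, data_address: int) -> bytes:
    """Pack one ccw, assuming the format-1 layout (hypothetical helper)."""
    if not 0 <= cmd_code <= 0xFF:
        raise ValueError("the command code is an 8-bit value")
    if not 0 <= flags <= 0xFF:
        raise ValueError("the flags are an 8-bit value")
    if not 0 <= data_address <= 0x7FFFFFFF:
        raise ValueError("a format-1 data address has 31 bits")
    # ">" = big-endian (as on s390); B = 8 bits, H = 16 bits, I = 32 bits
    return struct.pack(">BBHI", cmd_code, flags, count, data_address)


# Illustrative values only: an 8-byte buffer at address 0x10000
ccw = pack_ccw(SENSE_ID, flags=0, count=8, data_address=0x10000)
print(ccw.hex())  # e400000800010000
```

The flags are left at zero here, in line with ignoring them for this post.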
The operating system will assemble a ccw: The command code will be 0xe4 for SENSE ID, and the data address will point to a location where the OS wants to have the obtained information. The OS will also assemble a so-called ORB (operation request block), which, amongst other things, points to the assembled ccw (or to the first one in a chain). This ORB and the subchannel id are the two parameters for the SSCH instruction. If all goes well, the OS will receive a condition code 0 and knows that it will be signalled asynchronously once the channel program has been processed (successfully or with errors) [1].
Processing of the actual channel command is done asynchronously by real hardware (QEMU does it synchronously for simplicity reasons). The result is that the wanted data is put into the memory area referred to by the ccw [2]. Subsequently, the subchannel is made status pending: Information is ready for retrieval by the OS.
Usually, the OS wants to have a notification that the subchannel became status pending; this is done via an I/O interrupt. I/O interrupts on s390 carry extra status which is written to the low memory area of the CPU receiving the interrupt; amongst other things, this status contains the subchannel id.
Next, the OS needs to actually retrieve the status information: This is done via the TSCH instruction, which in turn makes the subchannel no longer status pending and ready for the next I/O request via SSCH. The status contains enough information for the OS to determine whether the request was successful (and the sense id information has been stored), or whether there was an error [3].
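The SSCH/TSCH round trip can be sketched as a toy state machine. This is purely illustrative: ToySubchannel and toy_sense_id are hypothetical names, not anything from Linux or QEMU, the channel program is just a Python callable, and the I/O interrupt is not modelled at all. The only architectural bits I rely on are condition code 0 for "accepted" and condition code 1 for SSCH on a status-pending subchannel, or for TSCH when nothing is pending.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class ToySubchannel:
    """Hypothetical stand-in for a subchannel; not an API of Linux or QEMU."""
    bus_id: str
    pending_status: Optional[str] = None

    def ssch(self, channel_program: Callable[[], None]) -> int:
        """START SUBCHANNEL: returns a condition code."""
        if self.pending_status is not None:
            return 1  # still status pending, the OS has to issue tsch first
        # Run the program right away (as QEMU does), then make the
        # subchannel status pending so the OS can collect the result.
        try:
            channel_program()
            self.pending_status = "ok"
        except Exception:
            self.pending_status = "error"
        return 0

    def tsch(self) -> Tuple[int, Optional[str]]:
        """TEST SUBCHANNEL: hand out the status and clear the pending state."""
        if self.pending_status is None:
            return 1, None  # nothing was pending
        status, self.pending_status = self.pending_status, None
        return 0, status


sense_data = bytearray(4)


def toy_sense_id() -> None:
    # Placeholder bytes, not real sense id data
    sense_data[:] = b"\x01\x02\x03\x04"


sch = ToySubchannel("0.0.0001")
print(sch.ssch(toy_sense_id))  # request accepted
print(sch.ssch(toy_sense_id))  # rejected: status is still pending
print(sch.tsch())              # status collected, subchannel is free again
print(sch.ssch(toy_sense_id))  # the next request is accepted
```

The point of the sketch is the ordering: a second SSCH is refused until TSCH has collected the status of the first one.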
Of course, this is only scratching the surface of channel programs; interested readers can peek at the Linux kernel and QEMU to get a feel for both parts, or at the Principles of Operation for the whole story [4].
1. In the Linux source code, you'll find this under drivers/s390/cio/
2. In the QEMU source code, you'll find channel command interpretation under hw/s390x/css.c
3. Again, you'll find this under drivers/s390/cio/ in the Linux source code
4. Command chaining, channel path management, and I/O instructions to terminate a channel program are just some of the interesting topics.
Tuesday, March 21, 2017
Channel I/O: What's in a channel subsystem?
When you start trying to get familiar with channel I/O and its concepts, one thing you notice is usually a host of very similar-sounding acronyms that are easily confused. The easiest way to get a hold of this is probably to look at a small machine started by QEMU and to examine what a Linux guest sees.
So, let's start with the following command line:
s390x-softmmu/qemu-system-s390x -machine s390-ccw-virtio,accel=kvm -m 1024 -nographic -drive file=/dev/dasdb,if=none,id=drive-virtio-disk0,format=raw,serial=ccwdasd1,cache=none -device virtio-blk-ccw,devno=fe.0.0042,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1,scsi=off
(Note that this assumes you're running an s390x system and have a bootable system on /dev/dasdb.)
This will start up a machine with two channel devices: one virtio-blk (as specified on the command line) and one virtio-net (always autogenerated unless explicitly turned off).
Let's log into the guest via the console [1] and examine what channel devices Linux sees:
[root@localhost ~]# lscss
Device Subchan. DevType CU Type Use PIM PAM POM CHPIDs
----------------------------------------------------------------------
0.0.0000 0.0.0000 0000/00 3832/01 yes 80 80 ff 00000000 00000000
0.0.0042 0.0.0001 0000/00 3832/02 yes 80 80 ff 00000000 00000000
Let's go through this information column-by-column.
Device is the identifier for the device, which is unique guest-wide. The xx.y.zzzz format (often called bus id) is specific to Linux (and has leaked over into QEMU) and is made up of the following elements:
- The channel subsystem id (cssid) xx (here: 0, as on all current Linux systems)
- The subchannel set id (ssid) y (here: 0, can be any value from 0-3 on current Linux systems)
- The device number (devno) zzzz (here: 0000 and 0042, respectively; can be any value from 0-0xffff)
- 0.0.0000 (the virtio-net device) has been autogenerated.
- 0.0.0042 (the virtio-blk device) has been specified on the command line.
The devno basically belongs to the device; cssid and ssid indicate the addressing within the channel subsystem, which is why we encounter them again in the next id, Subchan.
This is the identifier for the subchannel, which is basically the means to actually address the device. It again uses the xx.y.zzzz format and is made up of the following elements:
- The cssid xx (same as for the device)
- The ssid y (again, the same as for the device)
- The subchannel number zzzz (here: 0000 and 0001, respectively; generally not the same as the devno, although it can be any value from 0-0xffff as well)
In contrast to the bus id for a device (which is a Linux and QEMU construct), the bus id for a subchannel actually has an equivalent in the architecture: the subchannel-identification word (often referred to as schid in Linux and QEMU), which is basically a 32 bit value composed of the cssid, the ssid, and the subchannel number. This is used to address a device via a certain subchannel by the various channel I/O related instructions.
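As a rough sketch, this is how a subchannel bus id could be turned into a schid. The bit positions (ssid shifted by 17, a bit at position 16 that is always one, subchannel number in the low 16 bits) follow my reading of struct subchannel_id in the Linux sources, so treat them as an assumption. Only cssid 0 is handled, and parse_bus_id and make_schid are hypothetical helper names.

```python
from typing import Tuple


def parse_bus_id(bus_id: str) -> Tuple[int, int, int]:
    """Split an xx.y.zzzz bus id into its three hexadecimal parts."""
    cssid, ssid, number = bus_id.split(".")
    return int(cssid, 16), int(ssid, 16), int(number, 16)


def make_schid(cssid: int, ssid: int, sch_no: int) -> int:
    """Compose a 32-bit schid (assumed layout, cssid 0 only)."""
    if cssid != 0:
        raise ValueError("only cssid 0 is handled in this sketch")
    if not 0 <= ssid <= 3:
        raise ValueError("ssid must be between 0 and 3")
    if not 0 <= sch_no <= 0xFFFF:
        raise ValueError("the subchannel number is a 16-bit value")
    # ssid | the always-one bit | subchannel number
    return (ssid << 17) | (1 << 16) | sch_no


for bus_id in ("0.0.0000", "0.0.0001"):
    print(bus_id, hex(make_schid(*parse_bus_id(bus_id))))
```

Under these assumptions, the two subchannels from the lscss output end up as 0x10000 and 0x10001.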
The next two columns, DevType and CU Type, are part of the self description element of channel devices: The concept is that the operating system asks the device nicely to identify itself and the device responds with information about its type and what it can do. The device and the control unit are, in principle, two separate, cascaded entities; for virtio purposes, you can think of the device as the virtio backend (like the virtio-blk device) and of the control unit as the virtio proxy device (like the pci device used to access virtio devices on other platforms). That's also the reason why the device type is always zero for virtio devices. The control unit type is of the form aaaa/bb and consists of the following elements:
- The type aaaa (a value from 0-0xffff; 0x3832 denotes a virtio-ccw control unit)
- The model bb (a value from 0-0xff; for virtio devices, this is the device id as specified by the virtio standard)
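Decoding the CU Type column by hand gets old quickly, so here is a small sketch that does it. The 0x3832 control unit type comes from the text above; the two virtio device ids (1 for net, 2 for block) are taken from the virtio standard. describe_cu_type is a hypothetical helper and the id table is deliberately incomplete.

```python
VIRTIO_CCW_CU_TYPE = 0x3832

# Only the two virtio device ids that show up in this post
VIRTIO_DEVICE_IDS = {1: "net", 2: "block"}


def describe_cu_type(field: str) -> str:
    """Decode an aaaa/bb control unit type as printed by lscss."""
    type_part, model_part = field.split("/")
    cu_type, model = int(type_part, 16), int(model_part, 16)
    if cu_type != VIRTIO_CCW_CU_TYPE:
        return f"non-virtio control unit {cu_type:04x}/{model:02x}"
    name = VIRTIO_DEVICE_IDS.get(model, "not in this table")
    return f"virtio-ccw, virtio device id {model} ({name})"


print(describe_cu_type("3832/01"))  # virtio-ccw, virtio device id 1 (net)
print(describe_cu_type("3832/02"))  # virtio-ccw, virtio device id 2 (block)
```

This matches the example guest: the autogenerated device is virtio-net, the one from the command line is virtio-blk.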
The next column, Use, points to a big difference from other I/O architectures: In order to be able to use a subchannel to talk to a device, the operating system first needs to enable it. For virtio devices, this is done by the Linux driver by default (see the 'yes' for all devices); for other device types, this needs to be triggered by Linux user space (which implies that you can't simply go ahead and use a device, you always need to do some kind of setup).
The last four columns, PIM, PAM, POM and CHPIDs, deal with channel paths: An issue which is completely irrelevant for QEMU guests, but very interesting on real hardware. Just a quick overview:
- PIM (path installed mask), PAM (path available mask) and POM (path operational mask) are all 8-bit values in which each bit corresponds to one of eight channel paths. If the corresponding bit is set in all three masks, the channel path can be used for I/O.
- CHPIDs are channel path identifiers: Each channel path has an id from 0-0xff, which is unique combined with the relevant cssid. For virtio devices, there's only one valid channel path with the id 0 [2].
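The interplay of the three masks and the CHPIDs column can be sketched in a few lines. The sketch assumes that the leftmost mask bit (0x80) belongs to the first of the eight chpid bytes printed by lscss; usable_chpids is a hypothetical helper, not part of any tool.

```python
from typing import List


def usable_chpids(pim: int, pam: int, pom: int, chpids: str) -> List[int]:
    """Return the chpids whose bit is set in all three path masks."""
    raw = bytes.fromhex(chpids.replace(" ", ""))
    if len(raw) != 8:
        raise ValueError("expected eight chpid bytes")
    usable = pim & pam & pom
    # Leftmost mask bit (0x80) <-> first chpid byte, and so on
    return [raw[i] for i in range(8) if usable & (0x80 >> i)]


# Values from the lscss output above
print(usable_chpids(0x80, 0x80, 0xFF, "00000000 00000000"))  # [0]
```

With the values from the example guest, only the first of the eight zero bytes is a real chpid; the other seven are just unused slots.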
[root@localhost ~]# lschp
CHPID Vary Cfg. Type Cmg Shared PCHID
============================================
0.00 1 - 32 - - -
- CHPID is the channel-path identifier of the form xx.nn, where xx is the cssid and nn the chpid. This is always 0.00 on virtio-only guests.
- Vary means that the channel path is online to the guest. You don't want to change this for the only path.
- Type is the channel-path type. 0x32 is a reserved type for the virtio virtual channel path.
1. A VT220 compatible console via SCLP is automatically generated.
2. Which, in hindsight, turned out to not be the cleverest choice - see the confusing output of lscss.