Wednesday, January 22, 2020

Channel Measurements: A Quick Overview

The s390 channel subsystem can gather some statistics on I/O performance for you, which might be useful if you try to figure out why something is not performing as well as you'd expect it to be. From a QEMU/KVM perspective, this is currently mainly useful on the host.

Channel monitoring for ccw devices

The first kind of channel measurements is those collected per subchannel. For a detailed overview of what actually happens there, turn to the Principles of Operation, Chapter 17 ("I/O Support Functions"), "Channel Monitoring". I'll cover here what will most likely be of interest to people running a Linux (host) system.

Enabling channel measurements

If you a running a non-vintage machine (i.e. a z990 or later), you will not need a system-wide setup. Older machines should be fine as well, if you do not want to measure more than 1024 devices.

To enable measurements for a specific ccw device (say, 0.0.1234), simply issue:

chccwdev -a cmb_enable=1 0.0.1234

Measurements collected

Under /sys/bus/ccw/device/0.0.1234/, you should now have a new subdirectory called cmf, which contains some files. For a system that has been running for some time, the contents may look something like the following:

head cmf/*
==> cmf/avg_control_unit_queuing_time <==
0
==> cmf/avg_device_active_only_time <==
0
==> cmf/avg_device_busy_time <==
0
==> cmf/avg_device_connect_time <==
829031
==> cmf/avg_device_disconnect_time <==
398526
==> cmf/avg_function_pending_time <==
142810
==> cmf/avg_initial_command_response_time <==
19170
==> cmf/avg_sample_interval <==
8401681344
==> cmf/avg_utilization <==
00.0%
==> cmf/sample_count <==
10803
==> cmf/ssch_rsch_count <==
10803

Note that all values but sample_count and ssch_rsch_count are averaged over time. We also see that samples seem to have been taken whenever the driver issued a ssch.

The device in our example shows an avg_utilization of 0%, which is consistent with a device that mostly sits idle. But what about a device where something is actually happening?

head cmf/*
==> cmf/avg_control_unit_queuing_time <==
0
==> cmf/avg_device_active_only_time <==
0
==> cmf/avg_device_busy_time <==
0
==> cmf/avg_device_connect_time <==
58454
==> cmf/avg_device_disconnect_time <==
16743818
==> cmf/avg_function_pending_time <==
99322
==> cmf/avg_initial_command_response_time <==
20284
==> cmf/avg_sample_interval <==
153014636
==> cmf/avg_utilization <==
11.0%
==> cmf/sample_count <==
1281
==> cmf/ssch_rsch_count <==
1281

Here, we see a higher avg_utilization, but actually not that many ssch invocations. Interesting is the relatively high value of avg_device_disconnect_time: It indicates that there are quite long intervals where the device and the channel subsystem do not talk to each other. That might, for example, happen if other LPARs on the same system drive a lot of I/O via the same channel paths as the device.

Help, I cannot enable channel measurements on my device!

There's one drawback when trying to enable channel measurements on a live device: It needs to execute a msch, which only can be done on an idle subchannel. For devices that execute separate ssch invocations to go about their business (e.g. dasd), the common I/O layer can squeeze in the msch between ssch invocations and all is well. However, some devices use a long-running channel program, which will not conclude during the time the device is enabled; the most prominent example are devices using QDIO, like zFCP adapters or OSA cards. In that case, the common I/O layer cannot squeeze in a msch; you might try disabling the device, but that's usually not something you want to do in a live system.

Extended channel measurements

What if you want to find out something not about an individual device, but for a channel path? There's a feature for that; you can issue
echo 1 > /sys/devices/css0/cm_enable
and will find new entries (measurement, measurement_chars) under the various chp0.xx objects.

Unfortunately, these attributes only provide some binary data, which does not seem to be publicly documented, and I'm not aware of any tool that can parse them.

Channel measurements in QEMU guests

So far, all measurements have been collected on the host; but what about measurements in the guest?

The good news: You can turn on channel measurements for ccw devices in the guest. The bad news: They are not very useful.

Consider, for example, this virtio-ccw device:
 head cmf/*
==> cmf/avg_control_unit_queuing_time <==
0
==> cmf/avg_device_active_only_time <==
0
==> cmf/avg_device_busy_time <==
0
==> cmf/avg_device_connect_time <==
0
==> cmf/avg_device_disconnect_time <==
0
==> cmf/avg_function_pending_time <==
0
==> cmf/avg_initial_command_response_time <==
0
==> cmf/avg_sample_interval <==
-1
==> cmf/avg_utilization <==
00.0%
==> cmf/sample_count <==
0
==> cmf/ssch_rsch_count <==
134

No samples, just a ssch count. Why? QEMU does not fully emulate the sampling infrastructure; only counting of ssch is done (which is very easy to implement). Moreover, virtio-ccw devices use channel programs mainly to set up queues, negotiate features, etc., so measurements here do not reflect what is going on on the virtqueues, which would be the interesting part for performance issues.

But what about a dasd passed through via vfio-ccw? That one should have more statistics, right?
head cmf/*           
==> cmf/avg_control_unit_queuing_time <==
0
==> cmf/avg_device_active_only_time <==
0
==> cmf/avg_device_busy_time <==
0
==> cmf/avg_device_connect_time <==
0
==> cmf/avg_device_disconnect_time <==
0
==> cmf/avg_function_pending_time <==
0
==> cmf/avg_initial_command_response_time <==
0
==> cmf/avg_sample_interval <==
-1
==> cmf/avg_utilization <==
00.0%
==> cmf/sample_count <==
0
==> cmf/ssch_rsch_count <==
144

No samples, just a ssch count, again. Why? Currently, vfio-ccw uses the same emulation infrastructure as the other emulated devices. In the future, we may implement some kind of passthrough for channel measurements, but that requires some work.