Using Virtio (QEMU)
With the recent merges, Harvey is now capable of using the Virtio capabilities provided by QEMU. Here is a brief guide to setting it up on both the guest and host sides.
Harvey uses the PCI representation of Virtio devices. During kernel initialization, all PCI devices are scanned and enumerated. A separate enumeration is maintained for virtio devices. They are given internal names consisting of the word "virtio", the device class (9p, console, etc.), and the internal number assigned during the initial enumeration. These names can be seen in the file /dev/irqalloc, as the example below shows:
3 0 0 0 trap #BP
7 0 1 1975 trap #NM
8 0 0 0 trap #DF
14 0 1552 2642646005 trap #PF
15 0 0 0 trap #15
16 0 0 0 trap #MF
19 0 0 0 trap #XF
50 50 65193 10844815152 lapic APIC timer
65 1 440 312778444 ioapic keyb
73 12 956 338362551 ioapic mouse
89 11 3526 1351727073 ioapic virtio-9p-0
89 11 3526 1351727073 ioapic ether0
113 10 1 187675 ioapic virtio-console-2
113 10 1 187675 ioapic virtio-9p-1
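The virtio names in a listing like the one above can be picked out mechanically. The following is a minimal sketch, assuming only that the device name is the last whitespace-separated field of each line and follows the virtio-CLASS-NUMBER pattern described earlier; the function name `virtio_devices` is hypothetical and the meaning of the other columns is not relied upon.

```python
import re
from typing import List, Tuple

# Assumed name pattern: "virtio", device class, enumeration number.
VIRTIO_NAME = re.compile(r"^virtio-([a-z0-9]+)-(\d+)$")


def virtio_devices(irqalloc_text: str) -> List[Tuple[str, str, int]]:
    """Return (name, device class, number) for each virtio entry, sorted by number."""
    found = []
    for line in irqalloc_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        match = VIRTIO_NAME.match(fields[-1])  # the name is taken from the last field
        if match:
            found.append((fields[-1], match.group(1), int(match.group(2))))
    return sorted(found, key=lambda entry: entry[2])
```

On the host, the function could be fed text copied out of the guest's /dev/irqalloc to see which internal names were assigned in a particular run.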
The order of PCI devices as presented to the guest is supposedly defined by the order of the device definitions on the QEMU command line, so it is important to remember that the internal names of virtio devices as seen by the Harvey kernel are not persistent. The QEMU command line fragment corresponding to the output above is presented below:
-device virtio-9p-pci,fsdev=hrvtmp,mount_tag=harvtmp \ [0]
-fsdev local,path=$HARVEY/usr/harvey/tmp,id=hrvtmp,security_model=none \
-device virtio-9p-pci,fsdev=fstmp,mount_tag=hosttmp \ [1]
-fsdev local,path=/tmp,id=fstmp,security_model=none \
-device virtio-serial-pci,max_ports=1 \ [2]
-device virtconsole,chardev=vc02 \
-chardev socket,id=vc02,path=/tmp/vc02 \
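Because the names depend on enumeration order, it can help to work out in advance what names a given command line should produce. The sketch below assumes, as the text above suggests but does not guarantee, that virtio devices are numbered from 0 in the order their -device options appear and that the number is shared across device classes; `predict_virtio_names` is a hypothetical helper, not part of Harvey or QEMU.

```python
from typing import List


def predict_virtio_names(device_classes: List[str]) -> List[str]:
    """Map virtio device classes, in command-line order, to expected internal names.

    Assumption: one running counter covers all virtio devices regardless of class.
    """
    return [f"virtio-{cls}-{number}" for number, cls in enumerate(device_classes)]
```

For the fragment above, the classes in order are 9p, 9p, console, which corresponds to the three names seen in the /dev/irqalloc example. Reordering the -device options would change the predicted names accordingly.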
The representation of virtio devices to userland programs may not be the same for all devices. For example, the native 9p mount device operates "under the hood" and does not expose itself directly; device identification is done via mount tags. In contrast, virtio-serial devices expose their raw virtqueues for direct file read-write operations and are addressed by their internal names. The details of this representation are discussed below for each device type.
A well-known capability of QEMU is to provide a guest with access to any host directory via 9p; the guest has to use a specialized virtio device in order to access it. The actual implementation is based on the 9p2000.u (or .L) flavors of 9p, but the special Harvey kernel-level driver performs the necessary protocol translation by stripping the extra message elements, so the rest of the Harvey kernel is not confused.
Each host share should be properly tagged so that guests can distinguish between them. For use with Harvey, mount tags should not contain spaces or colons.
See this page for general information. An example setup is shown below (a fragment of a typical shell script that runs Harvey in QEMU):
qemu-system-x86_64 \
-append "..."
...
-device virtio-9p-pci,fsdev=hrvtmp,mount_tag=harvtmp \
-fsdev local,path=$HARVEY/usr/harvey/tmp,id=hrvtmp,security_model=none \
...
-kernel $HARVEY/sys/src/9/amd64/harvey.32bit $*
Here, $HARVEY should point to the root directory of the Harvey source tree, and its /usr/harvey/tmp subdirectory will be visible in Harvey as /tmp (just for example).
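When several shares are defined, the -device/-fsdev pairs are repetitive and a tag with a space or colon is easy to introduce by accident. A minimal sketch of a generator for these pairs follows; it uses only the QEMU options already shown above, and the function name `virtio_9p_share_args` and its parameters are hypothetical.

```python
from typing import List


def virtio_9p_share_args(tag: str, host_path: str, fsdev_id: str) -> List[str]:
    """Return the QEMU arguments that export host_path under the mount tag `tag`."""
    # Mount tags for Harvey should not contain spaces or colons, so reject them early.
    if not tag or " " in tag or ":" in tag:
        raise ValueError(f"unsuitable mount tag for Harvey: {tag!r}")
    return [
        "-device", f"virtio-9p-pci,fsdev={fsdev_id},mount_tag={tag}",
        "-fsdev", f"local,path={host_path},id={fsdev_id},security_model=none",
    ]
```

Calling `virtio_9p_share_args("hosttmp", "/tmp", "fstmp")` yields the same two options as the hosttmp share in the earlier fragment. The returned list can be appended to the argument vector passed to qemu-system-x86_64.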
The list of available mount tags is provided via the /dev/mtags file (served by the console driver):
--r--r--r-- P 0 harvey harvey 0 Feb 7 2006 /dev/mtags
term% cat /dev/mtags
harvtmp:-
hosttmp:-
For each tag, a colon-separated list is provided, the first token being the tag name. Unmounted tags have a hyphen after the first colon.
To mount a tag, use a command of the following form:
mount -d '#9' /dev/null /mnt/xxx harvtmp
What matters here: use the '#9' mount device in order to get proper translation of the protocol (the standard '#M' device does not support 9p2000.u, and '#9' is a "phantom" device on top of '#M' which takes care of it). The first command parameter, /dev/null, can be any existing file; it is not accessed and serves only as a placeholder (cf. mount none -t tmpfs... in Linux). The next parameter is the mount point, given as usual. The last parameter, harvtmp, is mandatory: it is the "spec" in mount parlance and must contain the mount tag that identifies the host share to be mounted.
After the command above, the contents of /dev/mtags are presented differently:
harvtmp:9P2000.u:131096:2:2
hosttmp:-
The first tag is now mounted, so its line now displays the tag name, protocol version, message size, and the PID cache use and PID cache hit counts (the PID cache is needed to provide the proper ordering of 9p messages transmitted over the virtqueue, as the native QEMU 9p implementation requires).
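The two line formats of /dev/mtags are simple enough to parse mechanically. The sketch below assumes exactly the layout described above (tag name, then either a hyphen or four further colon-separated fields); the function name `parse_mtags` and the dictionary keys are hypothetical.

```python
from typing import Dict, Optional


def parse_mtags(text: str) -> Dict[str, Optional[dict]]:
    """Map each mount tag to None (unmounted) or a dict describing the mount."""
    tags: Dict[str, Optional[dict]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        tag, _, rest = line.partition(":")
        if rest == "-":  # a hyphen after the first colon marks an unmounted tag
            tags[tag] = None
            continue
        fields = rest.split(":")
        if len(fields) != 4:
            raise ValueError(f"unexpected /dev/mtags line: {line!r}")
        version, msize, cache_use, cache_hits = fields
        tags[tag] = {
            "version": version,
            "msize": int(msize),
            "pid_cache_use": int(cache_use),
            "pid_cache_hits": int(cache_hits),
        }
    return tags
```

Splitting on the colon is also a practical reason to keep colons out of mount tags: a tag containing one could not be told apart from the fields that follow it.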
- Unmounting of a tag is not properly detected, so even after unmount /mnt/xxx the contents of /dev/mtags remain unchanged.
- If a host directory contains "non-regular" files (e.g. sockets), reading the directory contents causes a "malformed stat buffer" error.
The Virtio Serial Port device is provided by QEMU for arbitrary stream-like exchange of information. It can be connected to a pipe-like resource (e.g. a Unix socket or a TCP connection) on the host side. The guest provides a buffer to be written to the pipe, and it is read on the host side; conversely, the information written into the pipe by the host program is returned to the guest in the read buffer. The device has one limitation: it is not possible to know how much of the read buffer was filled by the host program. Whatever the size of the read buffer provided by the guest, the read operation always returns that same number, even if the host modified nothing in the buffer.
The virtio-serial-pci devices are presented under /dev/virtcon, one file per virtqueue:
--rw-rw-rw- C 0 harvey harvey 0 Feb 7 2006 /dev/virtcon/virtio-console-2
Note the file name: it matches the internal PCI device name as shown in the /dev/irqalloc example earlier. The file can be opened, read, written, and closed as usual. Seek is not supported, and the offset parameter of read/write is ignored.
The host setup looks like this:
-device virtio-serial-pci,max_ports=1 \ [line 1]
-device virtconsole,chardev=vc02 \ [line 2]
-chardev socket,id=vc02,path=/tmp/vc02 \ [line 3]
Even though QEMU provides a multi-port feature for virtio serial ports (multiple virtqueues per controller), Harvey does not use it. The interrupt bit is set per device rather than per virtqueue, so if a device has multiple virtqueues, an additional step is required to find which queue caused an interrupt, which makes interrupt processing longer. It is therefore more practical to have N serial devices with one port each than one serial device with N ports.
Line 1 in the example above defines the controller. The property max_ports=1 is recommended because it limits the number of virtqueues created per controller and reduces both PCI scan time and interrupt processing time. Line 2 names the serial device and its associated host pipe resource. It is necessary to use virtconsole rather than virtio-serial-port because max_ports=1 sets the upper limit for the port index to 0, and QEMU associates port 0 with the virtual console specifically. Line 3 defines the pipe resource on the host (chardev= on line 2 should be the same as id= on line 3). A socket can be in either client mode (as shown) or server mode (refer to the QEMU documentation for details).
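The "N devices with one port each" layout means repeating the same three options once per port. A minimal sketch of generating them follows. It uses only the options from the host setup above; the function name `serial_port_args` and the generated chardev ids are hypothetical, and it is an assumption that QEMU attaches each virtconsole to the controller defined just before it without an explicit bus assignment.

```python
from typing import List


def serial_port_args(socket_paths: List[str]) -> List[str]:
    """Return QEMU arguments defining one single-port controller per socket path."""
    args: List[str] = []
    for number, path in enumerate(socket_paths):
        chardev_id = f"vc{number:02d}"  # hypothetical id; chardev= and id= must match
        args += [
            "-device", "virtio-serial-pci,max_ports=1",      # the controller
            "-device", f"virtconsole,chardev={chardev_id}",   # port 0 of that controller
            "-chardev", f"socket,id={chardev_id},path={path}",
        ]
    return args
```

Each path produces its own controller, so on the Harvey side every port would appear as a separate file under /dev/virtcon with its own internal name.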
Write operations on a virtio serial port whose host resource is not connected lose information; read operations on such a resource block. Beware that a malfunctioning host program connected to the resource (one that does not return from a read-write operation) may block the entire QEMU process, but this is not a Harvey limitation.
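For experimenting with a serial port, the host side needs some program attached to the pipe resource. The following is a minimal sketch of such a peer: an echo server listening on a Unix socket, intended for a client-mode chardev like the one in the host setup above, which means it would have to be listening before QEMU starts. The function names are hypothetical and the default path /tmp/vc02 is taken from the example.

```python
import os
import socket


def echo_until_closed(conn: socket.socket, bufsize: int = 4096) -> int:
    """Echo every received chunk back to the peer; return the total bytes echoed."""
    total = 0
    while True:
        chunk = conn.recv(bufsize)
        if not chunk:  # an empty read means the peer closed its side
            return total
        conn.sendall(chunk)
        total += len(chunk)


def serve(path: str = "/tmp/vc02") -> None:
    """Listen on a Unix socket and echo data for a single connection."""
    if os.path.exists(path):
        os.unlink(path)  # remove a stale socket file left by a previous run
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as server:
        server.bind(path)
        server.listen(1)
        conn, _ = server.accept()
        with conn:
            echo_until_closed(conn)
```

Running `serve()` in a separate process before launching QEMU gives the guest something to talk to, so writes to the virtqueue file are not silently lost.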
The native 9p under QEMU has the limitation that it cannot work with an arbitrary host program that serves 9p on its standard input/output. Using virtio serial ports for 9p makes this possible. In this example we use ufs to serve a host directory via 9p over a virtio serial port, but any other compliant program can be used. Host file system access over virtio serial ports is generally slower and less stable than over native 9p.
Define a Unix socket in client mode, as shown in the example above. Before starting QEMU with Harvey, make sure that ufs is serving the desired host directory:
rm -f /tmp/vc02
$HARVEY/util/ufs -ntype=unix -addr=/tmp/vc02 -root=/tmp -debug=1 &
nppid=$!
sleep 1
The virtio serial port device limitation (no way to know the actual amount of data placed in the read buffer) makes it impossible (at least with the existing devmnt) to mount a virtqueue file directly: devmnt validates the 9p messages it receives, and at least for Rversion it rejects messages whose actual length (as returned by read) differs from the length encoded in the first 4 octets.
A very simple userland program, 9pvpxy (9p virtio proxy), was added to Harvey to enable proper 9p message handling that works around this device limitation. The program takes a single parameter, the file name of the virtio serial port virtqueue, and no other options. From the kernel's standpoint, this program is a 9p server on its standard input/output, which makes it possible to use it with srv and mount as shown below:
srv -e '9pvpxy /dev/virtcon/virtio-console-2' vc2
mount -c -n /srv/vc2 /mnt/xxx
After these commands, the host directory served by ufs on the host side is visible under the mount point chosen on the guest side.
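To illustrate the length mismatch that the proxy has to hide, the sketch below trims a full-size read buffer down to the message length encoded in its first 4 octets. It relies on the standard 9p framing, in which the little-endian size field counts itself. This is not the 9pvpxy source, only an illustration of the idea; `trim_9p_message` and `make_rversion` are hypothetical names.

```python
import struct

SIZE_FIELD = 4  # 9p size[4]: little-endian, includes its own 4 octets


def trim_9p_message(buf: bytes) -> bytes:
    """Return only the 9p message at the start of buf, dropping unused buffer space."""
    if len(buf) < SIZE_FIELD:
        raise ValueError("buffer shorter than the 9p size field")
    (size,) = struct.unpack_from("<I", buf, 0)
    if size < SIZE_FIELD or size > len(buf):
        raise ValueError(f"implausible 9p message size {size}")
    return buf[:size]


def make_rversion(msize: int, version: bytes) -> bytes:
    """Build an Rversion message: size[4] type[1] tag[2] msize[4] version[s]."""
    body = struct.pack("<BHI", 101, 0xFFFF, msize)      # type 101 = Rversion, tag NOTAG
    body += struct.pack("<H", len(version)) + version   # strings carry a 2-octet length
    return struct.pack("<I", SIZE_FIELD + len(body)) + body
```

A read from the virtqueue file would return the whole buffer, for example 8192 octets, even though the Rversion built here occupies only 19 of them; passing the buffer through `trim_9p_message` gives a message whose actual length agrees with its encoded length.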
It might be technically possible to modify the driver for the '#9' device to work with virtio serial port virtqueues as well, keeping both mechanisms under the kernel hood. On the other hand, virtio serial ports can be viewed as generic guest-host pipes usable for various purposes (e.g. the Spice remote guest viewer uses them as control channels). It is more logical to expect userland programs on both sides to communicate over such pipes than to restrict the pipes to filesystem services within the kernel. Future practice will show whether this assumption is correct.