
Hey all, sorry for joining a bit late...
General note: the hostdev-passthrough wiki will be updated ASAP to reflect ongoing progress.

----- Original Message -----
From: "Alona Kaplan" <alkaplan@redhat.com>
To: "Dan Kenigsberg" <danken@redhat.com>
Cc: "Eldan Hildesheim" <ehildesh@redhat.com>, devel@ovirt.org, "Nir Yechiel" <nyechiel@redhat.com>
Sent: Sunday, November 2, 2014 2:17:40 PM
Subject: Re: [ovirt-devel] SR-IOV feature

----- Original Message -----
From: "Dan Kenigsberg" <danken@redhat.com>
To: "Alona Kaplan" <alkaplan@redhat.com>, bazulay@redhat.com
Cc: "Itamar Heim" <iheim@redhat.com>, "Eldan Hildesheim" <ehildesh@redhat.com>, "Nir Yechiel" <nyechiel@redhat.com>, devel@ovirt.org
Sent: Thursday, October 30, 2014 7:47:31 PM
Subject: Re: [ovirt-devel] SR-IOV feature
On Sun, Oct 26, 2014 at 06:39:00AM -0400, Alona Kaplan wrote:
On 10/05/2014 07:02 AM, Alona Kaplan wrote:
Hi all,
Currently SR-IOV in oVirt is only supported using vdsm-hook [1]. This feature will add SR-IOV support to oVirt management system (including migration).
You are more than welcome to review the feature page- http://www.ovirt.org/Feature/SR-IOV
Thanks,
Alona.
_______________________________________________
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel
Glad to see this.
some questions:
Note: this feature is about exposing a virtualized (or VirtIO) vNic to the guest, and not about exposing the PCI device to it. This restriction is necessary for migration to be supported.
did not understand this sentence - are you hinting to macvtap?
Most likely macvtap, yes.
Additionally, I think Martin Polednik is looking into direct sr-iov attachment to VMs as part of the pci passthrough work he is doing.
add/edit profile
so i gather the implementation is at profile level, which is at logical network level? how does this work exactly? can this logical network be vlan tagged or must be native? if vlan tagged who does the tagging for the passthrough device? (I see later on vf_vlan is one of the parameters to vdsm, just wondering how the mapping can be at host level if this is a passthrough device)? is this because the use of virtio (macvtap)?
The logical network can be vlan tagged. As you mentioned, vf_vlan is one of the parameters passed to vdsm (on the create verb). The vlan is set on the vf as follows:
  ip link set {DEVICE} vf {NUM} [ vlan VLANID ]
This is described in the notes section.
It is not related to the use of virtio. The vlan can be set on the vf whether it is connected to the vm via macvtap or directly.
Are you sure about this? I think that when a host device is attached to a VM, it disappears from the host, and the guest can send arbitrary unmodified packets through the wire. But I may well be wrong.
I think you are correct for the case of mtu (that's why I added it as an open issue- "Is applying MTU on VF supported by libvirt?"). But as I understand from the documentation (although I didn't test it myself), that is the purpose of
  ip link set {DEVICE} vf {NUM} vlan VLANID
The documentation says- "all traffic sent from the VF will be tagged with the specified VLAN ID. Incoming traffic will be filtered for the specified VLAN ID, and will have all VLAN tags stripped before being passed to the VF."
Note- It is also supported by libvirt. As you can read in- http://docs.fedoraproject.org/en-US/Fedora_Draft_Documentation/0.1/html/Virt... "type='hostdev' SR-IOV interfaces do support transparent vlan tagging of guest traffic".
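To make the vf vlan step above concrete, here is a minimal Python sketch that only builds the argument list for the `ip link set {DEVICE} vf {NUM} vlan VLANID` command quoted in this thread. The function name build_vf_vlan_cmd is hypothetical and is not part of vdsm; the 0..4095 range check is an assumption based on the ip-link man page, where 0 disables tagging for the VF.

```python
def build_vf_vlan_cmd(device, vf_num, vlan_id):
    """Return the argv for: ip link set DEVICE vf NUM vlan VLANID.

    Hypothetical helper for illustration; it does not run the command.
    """
    if vf_num < 0:
        raise ValueError("VF index must be non-negative")
    # Assumption: valid VLAN IDs are 0..4095, with 0 meaning "no tagging".
    if not 0 <= vlan_id <= 4095:
        raise ValueError("VLAN ID must be in 0..4095")
    return ["ip", "link", "set", device,
            "vf", str(vf_num), "vlan", str(vlan_id)]


if __name__ == "__main__":
    # Print the command instead of executing it (running it needs root
    # and an SR-IOV capable nic).
    print(" ".join(build_vf_vlan_cmd("eth0", 3, 100)))
```

The resulting list could be handed to subprocess.check_call on a host with an SR-IOV nic; the device name eth0 is only an example.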
wouldn't it be better to support both macvtap and passthrough and just flag the VM as non migratable in that case?
Martin Polednik is working on pci-passthrough- http://www.ovirt.org/Features/hostdev_passthrough
I'm actively working on hostdev passthrough (not only PCI: currently PCI, SCSI and USB) and part of my testing was done on an SR-IOV capable nic (Intel 82576 chip).
Maybe we should wait for his feature to be ready and then combine it with the sr-iov feature. As I see in his feature page he plans to attach a specific device directly to the vm.
Hostdev passthrough works at VFIO granularity: it reports the whole computer bus tree (libvirt's listAllDevices()) to the engine, including a few unique device identifiers (for me that is the name of the device, such as pci_0000_af_01_1c, or the tuple (vendor_id, device_id)). The API is very general: it doesn't care whether we're dealing with a PF or a VF. The only restriction is that the whole IOMMU group has to be attached (a libvirt limitation); in the case of SR-IOV NICs that presents no complications, as these are in unique IOMMU groups. This is the API you should use when dealing with physical host devices. If anything is missing, feel free to bring it up and we can work it in, at least so we don't implement the same thing twice.
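As a rough illustration of the two identifiers mentioned above (device name, or the (vendor_id, device_id) tuple), here is a small Python sketch of a lookup over a reported device list. The dictionary layout, the function name find_devices and the sample vendor/device values are assumptions made for this example; they are not the actual hostdev passthrough API or its data format.

```python
def find_devices(devices, name=None, vendor_id=None, device_id=None):
    """Select devices either by unique name or by (vendor_id, device_id).

    `devices` is a list of dicts with hypothetical keys
    "name", "vendor_id" and "device_id".
    """
    if name is not None:
        return [d for d in devices if d["name"] == name]
    if vendor_id is None or device_id is None:
        raise ValueError("give a name, or both vendor_id and device_id")
    return [d for d in devices
            if d["vendor_id"] == vendor_id and d["device_id"] == device_id]


# Illustrative data only; the IDs are placeholders, not measured values.
SAMPLE_DEVICES = [
    {"name": "pci_0000_af_01_1c", "vendor_id": "0x8086", "device_id": "0x10ca"},
    {"name": "pci_0000_af_01_1d", "vendor_id": "0x8086", "device_id": "0x10ca"},
    {"name": "usb_1_1", "vendor_id": "0x1d6b", "device_id": "0x0002"},
]

if __name__ == "__main__":
    print(find_devices(SAMPLE_DEVICES, name="pci_0000_af_01_1c"))
    print(len(find_devices(SAMPLE_DEVICES, vendor_id="0x8086",
                           device_id="0x10ca")))
```

A name matches at most one device, while the tuple can match several identical devices (for example all VFs of one nic), which is why both forms return a list.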
We can combine his feature with the sr-iov feature-
1. The network profile will have a type property-
   - bridge (the regular configuration we have today, vnic->tap->bridge->physical nic).
   - virtio (in the current feature design it is called passthrough, vnic->macvtap->vf).
   - pci-passthrough (vnic->vf).
2. Attaching a network profile with pci-passthrough type to a vnic will mark the vm as non-migratable.
This marking can be tuned by the admin. If the admin requests migration despite the pci-passthrough type, Vdsm can auto-unplug the PCI device before migration, and plug it back on the destination. That would allow some kind of migration to guests that are willing to see a PCI device disappear and re-appear.
For NICs this can even be avoided by using bonding[1]; for other devices we'll need to manually handle cases of
- specific device on specific bus
- specific device (any bus)
- VF belonging to specific PF
- VF (any PF)
(and possibly more, to be discussed)
Added it as an open issue to the feature page.
3. When running a vm with a pci-passthrough vnic, a free VF will be attached to the vm with the vlan and mtu configuration of the profile/network (same as for the virtio profile, as described in the feature page).
The benefit of it is that the user won't have to choose the vf directly and will be able to set vlan and mtu on the vf.
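A minimal Python sketch of point 3, assuming a hypothetical in-memory list of VFs per nic: it picks the first VF that is not attached to any vm and pairs it with the vlan and mtu taken from the profile/network. The names pick_free_vf, "name" and "vm" are invented for this example and do not come from the feature page.

```python
def pick_free_vf(vfs, vlan_id, mtu):
    """Return an attachment plan for the first free VF, or None.

    `vfs` is a list of dicts with hypothetical keys:
      "name": VF identifier, "vm": attached vm name or None if free.
    """
    for vf in vfs:
        if vf["vm"] is None:
            # The user never chooses the VF; vlan and mtu come from
            # the profile/network configuration.
            return {"vf": vf["name"], "vlan": vlan_id, "mtu": mtu}
    return None


if __name__ == "__main__":
    vfs = [
        {"name": "vf0", "vm": "vm-a"},
        {"name": "vf1", "vm": None},
        {"name": "vf2", "vm": None},
    ]
    print(pick_free_vf(vfs, vlan_id=100, mtu=1500))
```

Returning None when every VF is taken leaves the decision (fail the run, or fall back to another profile type) to the caller.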
also (and doesn't have to be in first phase): what happens if i run out of hosts with sr-iov (or they fail)? can i fall back to a non-pci-passthrough profile for backup? (policy question at vm level: is it more important to have sr-iov, or more important that the vm runs even without it since it provides a critical service, with a [scheduling] preference to run on sr-iov?) (oh, i see this is in the "futures" section already.)
:)
A benefit of this "Nice to have passthrough" is that one could set it on vNic profiles that are already used by VMs. Once they are migrated to a new host, the passthrough-ness request would take effect.
Added this benefit to the feature page.
management, display and migration properties are not relevant for the VFs configuration
just wondering - any technical reason we can't put the management on a VF (not saying it's a priority to do so)?
Today we mark the logical network with a role (management/display/migration) when attaching it to the cluster. A logical network can be attached to one physical nic (PF).
We can't use the current attachment of a role for sr-iov, since the network can be configured as "vf allowed" on more than one nic (maybe even on all the nics). If the network is "vf allowed" on the nic, a vnic with this network can be attached to a free vf on the nic.
So we can't use the logical network to mark a vf with a role. We have to mark the vf explicitly. Since in the current design we don't expose the vf, setting the roles was blocked. But if there is a requirement for setting a vf as management/migration/display we can re-think about the design for it.
We can relax this requirement by allowing the network to be attached on one nic (be it VF or PF or legacy), and to set the "vf allowed" on a completely disjoint set of PFs.
I'm not sure I understand your suggestion. And still don't understand the benefit of using a vf as management/display/migration.
sr-iov host nic management - num of VFs
I assume this is for admin to define a policy on how many VFs to use, based on the max as reported by getVdsCaps. worth stating that for clarity.
Updated the wiki with the following- "It is used for admin to enable this number of VFs on the nic. Changing this value will remove all the VFs from the nic and create new #numOFVfs VFs on the nic."
The max value reported by getVdsCaps is just the theoretical maximum value.
I think that Itamar suggests that this should be automated. An admin could say "give me all the VFs you can", and when adding a new host, Engine would set it seamlessly.
By the way, do you know what's the down side of asking for the maximum number of VFs? Is it memory overhead? CPU? network performance?
I think "give me all the VFs you can" would rarely be used because in practice this maximum is much lower, since each VF consumes resources. Network device needs the resources to support the VF such as queues for data, data address space, command processing, and more.
I wonder whether it makes sense for Vdsm to set the max on each reboot?
You're not updating the max, you're updating the number of existing VFs on a PF.
On a reboot all the VFs are destroyed. When the host is started, #defaultNum of VFs are created.
Updating the number of VFs via sysfs works across modules. Since the sriov_numvfs value passed to sysfs is not persistent across reboots, after a reboot the new value is taken from the module-specific configuration.
Each module has its own way to specify a persistent default number of VFs. For example, with Intel VT-d you should add the line
  options igb max_vfs=7
to any file in /etc/modprobe.d. If the module doesn't specify the number of VFs in its configuration, the default number is 0.
So if vdsm doesn't set /sys/class/net/'device_name'/device/sriov_numvfs on each reboot, the user will have to control the number manually, in a module-specific way.
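A hedged Python sketch of what "setting sriov_numvfs on each reboot" could look like. It writes to the sysfs path named above and checks the request against sriov_totalvfs, the kernel's reported maximum for the PF. The function name set_num_vfs and the sysfs_root parameter (there so the code can be exercised against a scratch directory) are assumptions, not vdsm code. The sketch writes 0 before the new value because, as far as I know, the kernel refuses to change one non-zero VF count directly into another; this also matches the wiki wording that changing the value removes all VFs and creates new ones.

```python
import os


def set_num_vfs(device, num_vfs, sysfs_root="/sys/class/net"):
    """Recreate `num_vfs` VFs on the PF `device` via sysfs.

    Hypothetical helper; needs root when pointed at the real sysfs.
    """
    dev_dir = os.path.join(sysfs_root, device, "device")

    # sriov_totalvfs is the maximum number of VFs the PF supports.
    with open(os.path.join(dev_dir, "sriov_totalvfs")) as f:
        total = int(f.read().strip())
    if not 0 <= num_vfs <= total:
        raise ValueError("num_vfs must be in 0..%d" % total)

    numvfs_path = os.path.join(dev_dir, "sriov_numvfs")
    # Destroy the existing VFs first, then create the requested number.
    with open(numvfs_path, "w") as f:
        f.write("0")
    if num_vfs:
        with open(numvfs_path, "w") as f:
            f.write(str(num_vfs))
```

Calling set_num_vfs("eth0", 4) at host start would then replace the module-specific default with the value chosen by the admin; eth0 is only an example device name.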
Another related issue, that is mentioned as an open question: The current suggestion, of having updateSriovMaxVFs as an independent verb has a down side: you cannot use it to updateSriovMaxVFs of the PF that is used by the management network. If we want to support this use case, we should probably expose the functionality within the transactional setupNetworks verb.
Why can't it be used on the PF that is used by the management network? AFAIK the PF doesn't lose connectivity when updating /sys/class/net/eth0/device/sriov_numvfs, but I'm not sure about it. Added it to the open issues section.
User Experience - Setup networks - Option 1
in the last picture ("Edit VFs networks and labels") - why are there labels here together with the networks (if labels appear at the PF level in the first dialog)?
iiuc, the option 2 is re-using the setup networks, where the PF will just be another physical interface, and networks or labels edited just like for regular network interfaces? (not sure where you are on this, but it sounds more straight forward/similar to existing concepts iiuc).
As I wrote in the answer about the roles, there are two concepts-
1. The attachment of a network to a physical nic (what we have today).
2. Containing the network in the "VFs management tab=>allowed networks" of the nic.
In 1, we actually configure the host's nics and bridges according to the setup networks. In 2, we just specify the "allowed" list; it isn't even sent to vdsm. It is used by the engine when it schedules a host for a vm.
The connection between networks and nics is many to many. The same network can be part of 1 and 2 on the same nic, and even part of 2 on other sr-iov enabled nics.
Since 2 is a completely different concept than 1, we weren't sure whether using drag and drop, as for PFs, is too much in this case.
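To illustrate how the engine-side "allowed" list from concept 2 could drive scheduling, here is a small Python sketch. It keeps only hosts that have at least one nic which both allows the vnic's network on its VFs and still has a free VF. The data layout and the names hosts_for_vnic, "vf_allowed_networks" and "free_vfs" are hypothetical; the real engine model is not described in this thread.

```python
def hosts_for_vnic(hosts, network):
    """Return names of hosts able to serve a vnic on `network` via a VF.

    `hosts` is a list of dicts with hypothetical keys:
      "name": host name
      "nics": list of {"name", "vf_allowed_networks", "free_vfs"}
    """
    eligible = []
    for host in hosts:
        for nic in host["nics"]:
            # Many to many: the same network may be allowed on several nics.
            if network in nic["vf_allowed_networks"] and nic["free_vfs"] > 0:
                eligible.append(host["name"])
                break  # one suitable nic is enough for this host
    return eligible


if __name__ == "__main__":
    hosts = [
        {"name": "host1", "nics": [
            {"name": "eth0", "vf_allowed_networks": {"red"}, "free_vfs": 0},
            {"name": "eth1", "vf_allowed_networks": {"red", "blue"}, "free_vfs": 2},
        ]},
        {"name": "host2", "nics": [
            {"name": "eth0", "vf_allowed_networks": {"blue"}, "free_vfs": 4},
        ]},
    ]
    print(hosts_for_vnic(hosts, "red"))
```

Nothing here is sent to vdsm, which matches the description of the allowed list as an engine-only scheduling input.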
Question: any issues with hot plug/unplug or just expected to work normally?
Expected to work (but wasn't tested yet).