----- Original Message -----
From: "Dan Kenigsberg" <danken(a)redhat.com>
To: "Alona Kaplan" <alkaplan(a)redhat.com>, bazulay(a)redhat.com
Cc: "Itamar Heim" <iheim(a)redhat.com>, "Eldan Hildesheim"
<ehildesh(a)redhat.com>, "Nir Yechiel" <nyechiel(a)redhat.com>,
devel(a)ovirt.org
Sent: Thursday, October 30, 2014 7:47:31 PM
Subject: Re: [ovirt-devel] SR-IOV feature
On Sun, Oct 26, 2014 at 06:39:00AM -0400, Alona Kaplan wrote:
>
> > > On 10/05/2014 07:02 AM, Alona Kaplan wrote:
> > > > Hi all,
> > > >
> > > > Currently SR-IOV in oVirt is only supported using vdsm-hook [1].
> > > > This feature will add SR-IOV support to oVirt management system
> > > > (including migration).
> > > >
> > > > You are more than welcome to review the feature page-
> > > >
> > > > http://www.ovirt.org/Feature/SR-IOV
> > > >
> > > >
> > > > Thanks,
> > > > Alona.
> > > >
> > >
> > > Glad to see this.
> > >
> > > some questions:
> > >
> > > > > Note: this feature is about exposing a virtualized (or VirtIO) vNic
> > > > > to the guest, and not about exposing the PCI device to it. This
> > > > > restriction is necessary for migration to be supported.
> > >
> > > did not understand this sentence - are you hinting to macvtap?
> >
> > Most likely macvtap, yes.
> >
> > Additionally I think Martin Poledník is looking into direct sr-iov
> > attachment to VMs as part of the pci passthrough work he is doing.
> >
> > >
> > > > add/edit profile
> > >
> > > so i gather the implementation is at profile level, which is at logical
> > > network level?
> > > how does this work exactly? can this logical network be vlan tagged or
> > > must be native? if vlan tagged who does the tagging for the passthrough
> > > device? (I see later on vf_vlan is one of the parameters to vdsm, just
> > > wondering how the mapping can be at host level if this is a passthrough
> > > device)?
> > > is this because the use of virtio (macvtap)?
>
> The logical network can be vlan tagged.
> As you mentioned, vf_vlan is one of the parameters passed to vdsm (on the
> create verb).
> Setting the vlan on the vf is done as follows-
> ip link set {DEVICE} vf {NUM} [ vlan VLANID ]
> It is written in the notes section.
>
> It is not related to the use of virtio. The vlan can be set on the vf
> whether it is connected to the vm via macvtap or directly.
Are you sure about this? I think that when a host device is attached to
a VM, it disappears from the host, and the guest can send arbitrary
unmodified packets through the wire. But I may well be wrong.
I think you are correct for the case of mtu
(that's why I added it as an open issue- "Is applying MTU on VF supported by
libvirt?").
But as I understand from the documentation (although I didn't test it myself),
that is the purpose of ip link set {DEVICE} vf {NUM} vlan VLANID
The documentation says- "all traffic sent from the VF will be tagged with the
specified VLAN ID.
Incoming traffic will be filtered for the specified VLAN ID, and will have all
VLAN tags stripped before being passed to the VF."
Note- It is also supported by libvirt. As you can read in-
http://docs.fedoraproject.org/en-US/Fedora_Draft_Documentation/0.1/html/V...
"type='hostdev' SR-IOV interfaces do support transparent vlan tagging of
guest traffic".
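
For illustration only, here is a minimal Python sketch of how the command quoted above could be assembled before being handed to a subprocess. The function name vf_vlan_command is hypothetical and is not part of vdsm; the only thing taken from the thread is the ip link set {DEVICE} vf {NUM} vlan VLANID syntax. The range check assumes the usual 12-bit VLAN ID space, where 0 disables tagging and filtering on the VF.

```python
from typing import List


def vf_vlan_command(pf_device: str, vf_index: int, vlan_id: int) -> List[str]:
    """Build the argv for `ip link set DEVICE vf NUM vlan VLANID`.

    Hypothetical helper: it only builds the argument list and does not
    touch the host. pf_device is the PF's network device name.
    """
    if vf_index < 0:
        raise ValueError("VF index must be non-negative")
    # VLAN IDs are 12 bits wide; 0 turns tagging/filtering off for the VF.
    if not 0 <= vlan_id <= 4095:
        raise ValueError("VLAN ID must be in the range 0..4095")
    return ["ip", "link", "set", pf_device,
            "vf", str(vf_index), "vlan", str(vlan_id)]
```

Running the result (for example with subprocess.run(cmd, check=True)) needs root privileges and an SR-IOV capable PF, so the sketch stops at building the argument list.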
> > > wouldn't it be better to support both macvtap and passthrough and
> > > just flag the VM as non migratable in that case?
>
> Martin Polednik is working on pci-passthrough-
>
> http://www.ovirt.org/Features/hostdev_passthrough
>
> Maybe we should wait for his feature to be ready and then combine it with
> the sr-iov feature.
> As I see in his feature page he plans to attach a specific device directly
> to the vm.
> We can combine his feature with the sr-iov feature-
> 1. The network profile will have type property-
> bridge (the regular configuration we have today,
> vnic->tap->bridge->physical nic).
> virtio(in the current feature design it is called passthrough,
> vnic->macvtap->vf)
> pci-passthrough(vnic->vf)
> 2. Attaching a network profile with pci-passthrough type to a vnic will
> mark the vm as non-migratable.
This marking can be tuned by the admin. If the admin requests migration
despite the pci-passthrough type, Vdsm can auto-unplug the PCI device
before migration, and plug it back on the destination.
That would allow some kind of migration to guests that are willing to
see a PCI device disappear and re-appear.
Added it as an open issue to the feature page.
> 3. When running a vm with pci-passthrough vnic a free VF will be attached
> to the vm with the vlan and mtu
> configuration of the profile/network (same as for virtio profile, as
> described in the feature page).
>
> The benefit of it is that the user won't have to choose the vf directly and
> will be able to set vlan and mtu on the vf.
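
To make the three proposed profile types easier to compare, here is a small Python sketch. All names (VnicProfileType, vm_is_migratable) are hypothetical and not taken from the engine code; the data paths and the default "non-migratable" marking for pci-passthrough are copied from the list above, and the virtio (macvtap) type is treated as migratable as the feature page note states.

```python
from enum import Enum


class VnicProfileType(Enum):
    """Hypothetical model of the proposed profile 'type' property."""
    # (data path as described in the thread, migratable by default)
    BRIDGE = ("vnic->tap->bridge->physical nic", True)
    VIRTIO = ("vnic->macvtap->vf", True)
    PCI_PASSTHROUGH = ("vnic->vf", False)

    def __init__(self, datapath, migratable):
        self.datapath = datapath
        self.migratable = migratable


def vm_is_migratable(profile_types):
    """A VM stays migratable only if every one of its vnics is."""
    return all(p.migratable for p in profile_types)
```

This models only the default marking; an admin override, or auto-unplugging the device before migration, would need extra state that this sketch does not cover.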
>
> > >
> > > also (and doesn't have to be in first phase) what happens if i ran out
> > > of hosts with sr-iov (or they failed) - can i fail back to non
> > > pcipassthrough profile for backup (policy question at vm level if more
> > > important to have sr-iov or more important it will run even without it
> > > since it provides a critical service, with a [scheduling] preference to
> > > run on sr-iov?
> > > (oh, i see this is in the "futures" section already.
>
> :)
A benefit of this "Nice to have passthrough" is that one could set it on
vNic profiles that are already used by VMs. Once they are migrated to a
new host, the passthrough-ness request would take effect.
Added this benefit to the feature page.
>
> > >
> > >
> > > > management, display and migration properties are not relevant for the
> > > > VFs configuration
> > >
> > > just wondering - any technical reason we can't put the management on a
> > > VF (not saying its a priority to do so)?
>
> Today we mark the logical network with a role (management/display/migration)
> when attaching it to the cluster.
> A logical network can be attached to one physical nic (PF).
>
> We can't use the current attachment of a role for sr-iov, since the network
> can be configured as "vf allowed" on more than one nic (maybe even on all
> the nics).
> If the network is "vf allowed" on the nic,
> a vnic with this network can be attached to a free vf on the nic.
>
> So we can't use the logical network to mark a vf with a role.
> We have to mark the vf explicitly.
> Since in the current design we don't expose the vf, setting the roles was
> blocked.
> But if there is a requirement for setting a vf as
> management/migration/display we can rethink the design for it.
We can relax this requirement by allowing the network to be attached on
one nic (be it VF or PF or legacy), and to set the "vf allowed" on a
completely disjoint set of PFs.
I'm not sure I understand your suggestion.
And I still don't understand the benefit of using a vf as management/display/migration.
>
> > >
> > > > sr-iov host nic management - num of VFs
> > >
> > > I assume this is for admin to define a policy on how many VFs to use,
> > > based on the max as reported by getVdsCaps. worth stating that for
> > > clarity.
> > >
>
> Updated the wiki with the following-
> "It is used for admin to enable this number of VFs on the nic.
> Changing this value will remove all the VFs from the nic and create new
> #numOFVfs VFs on the nic."
>
> The max value reported by getVdsCaps is just the theoretical maximum value.
I think that Itamar suggests that this should be automated. An admin
could say "give me all the VFs you can", and when adding a new host,
Engine would set it seamlessly.
By the way, do you know what's the down side of asking for the maximum
number of VFs? Is it memory overhead? CPU? network performance?
I think "give me all the VFs you can" would rarely be used because in
practice this maximum is much lower, since each VF consumes resources.
The network device needs resources to support each VF, such as queues for data,
data address space, command processing, and more.
I wonder whether it makes sense for Vdsm to set the max on each
reboot?
You're not updating the max, you're updating the number of existing
VFs on a PF.
On a reboot all the VFs are destroyed.
When the host is started, #defaultNum of VFs are created.
Updating the number of VFs via sysfs works the same way for all modules.
Since the sriov_numvfs value passed to sysfs is not persistent across reboots,
after a reboot the new value is taken from the module specific configuration.
Each module has its own way to specify persistent default num of VFs.
For example- with the Intel igb driver you should add the line- options igb max_vfs=7
to a .conf file in /etc/modprobe.d
If the module doesn't specify the number of VFs in its configuration
the default number is 0.
So if vdsm doesn't set /sys/class/net/'device_name'/device/sriov_numvfs on each
reboot, the user will have to control the number manually, in a module
specific way.
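
As a rough sketch of what "vdsm sets sriov_numvfs on each reboot" could look like, here is a small Python function. It is not vdsm code; the name set_num_vfs and the sysfs_root parameter (there only so the logic can be tried against a fake directory tree) are hypothetical. It assumes the kernel also exposes the theoretical maximum in a sibling sriov_totalvfs file, and that a non-zero VF count has to be reset to 0 before a different non-zero value is written.

```python
from pathlib import Path


def set_num_vfs(device: str, num_vfs: int, sysfs_root: str = "/sys") -> None:
    """Set the number of VFs on a PF through its sysfs attributes.

    Hypothetical helper; writing to the real /sys needs root privileges.
    """
    dev_dir = Path(sysfs_root) / "class" / "net" / device / "device"

    # Assumed: sriov_totalvfs holds the theoretical maximum for this PF.
    total = int((dev_dir / "sriov_totalvfs").read_text().strip())
    if not 0 <= num_vfs <= total:
        raise ValueError(
            "requested %d VFs, device supports 0..%d" % (num_vfs, total))

    numvfs_file = dev_dir / "sriov_numvfs"
    current = int(numvfs_file.read_text().strip())
    if current == num_vfs:
        return  # nothing to do, avoid destroying existing VFs

    # Assumed kernel behaviour: going from one non-zero count to another
    # requires dropping to 0 first, which removes all existing VFs.
    if current != 0 and num_vfs != 0:
        numvfs_file.write_text("0")
    numvfs_file.write_text(str(num_vfs))
```

A caller such as a boot-time service would run something like set_num_vfs("eth0", 4) once per PF. Whether the PF keeps connectivity while this happens is the open question raised further down in this thread.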
Another related issue, that is mentioned as an open question:
The current suggestion, of having updateSriovMaxVFs as an independent
verb has a down side: you cannot use it to updateSriovMaxVFs of the PF
that is used by the management network. If we want to support this use
case, we should probably expose the functionality within the
transactional setupNetworks verb.
Why can't it be used on the PF that is used by the management network?
AFAIK the PF doesn't lose connectivity when updating
/sys/class/net/eth0/device/sriov_numvfs
but I'm not sure about it. Added it to the open issues section.
>
>
> > > > User Experience - Setup networks - Option 1
> > >
> > > in the last picture ("Edit VFs networks and labels") - why are there
> > > labels here together with the networks (if labels appear at the PF
> > > level in the first dialog)?
> > >
> > > iiuc, the option 2 is re-using the setup networks, where the PF will
> > > just be another physical interface, and networks or labels edited just
> > > like for regular network interfaces?
> > > (not sure where you are on this, but it sounds more straight
> > > forward/similar to existing concepts iiuc).
> > >
>
> As I wrote in the answer about the roles.
> There are two concepts-
> 1. The attachment of network to physical nic (what we have today).
> 2. Containing the network in the "VFs management tab=>allowed networks" of
> the nic.
>
> In 1, we actually configure the host's nics and bridges according to the
> setup networks.
> In 2, we just specify the "allowed" list, it isn't even sent to vdsm.
> It is used by the engine when it schedules a host for a vm.
>
> The connection between networks and nics is many to many.
> The same network can be part of 1 and 2 on the same nic.
> And even part of 2 in other sr-iov enabled nics.
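
To illustrate concept 2, here is a minimal Python sketch of how the engine side could use the "allowed networks" list when looking for a PF with a free VF for a vnic. The class and function names (PhysicalFunction, find_pf_for_vnic) and the free_vfs counter are hypothetical; the real scheduling logic is not described in this thread beyond "allowed list plus free vf".

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class PhysicalFunction:
    """Hypothetical engine-side view of an sr-iov enabled nic (PF)."""
    name: str
    allowed_networks: Set[str]  # the "vf allowed" list, kept in the engine
    free_vfs: int               # VFs not attached to any vm right now


def find_pf_for_vnic(host_pfs: List[PhysicalFunction],
                     network: str) -> Optional[str]:
    """Return the first PF that allows the network and has a free VF."""
    for pf in host_pfs:
        if network in pf.allowed_networks and pf.free_vfs > 0:
            return pf.name
    return None  # this host cannot serve the vnic via a VF
```

Because the relation is many to many, the same network name may appear in the allowed set of several PFs; the sketch just takes the first match and leaves any preference between PFs open.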
>
> Since 2 is a completely different concept than 1, we weren't sure whether
> using drag and drop as for PFs is too much in this case.
>
> > > Question: any issues with hot plug/unplug or just expected to work
> > > normally?
>
> Expected to work (but wasn't tested yet).