On Mon, 27 Jul 2020 15:24:40 +0800
Yan Zhao <yan.y.zhao(a)intel.com> wrote:
> > As you indicate, the vendor driver is responsible for
checking version
> > information embedded within the migration stream. Therefore a
> > migration should fail early if the devices are incompatible. Is it
> but as I know, currently in VFIO migration protocol, we have no way to
> get vendor specific compatibility checking string in migration setup stage
> (i.e. .save_setup stage) before the device is set to _SAVING state.
> In this way, for devices who does not save device data in precopy stage,
> the migration compatibility checking is as late as in stop-and-copy
> stage, which is too late.
> do you think we need to add the getting/checking of vendor specific
> compatibility string early in save_setup stage?
>
hi Alex,
after an offline discussion with Kevin, I realized that it may not be a
problem if migration compatibility check in vendor driver occurs late in
stop-and-copy phase for some devices, because if we report device
compatibility attributes clearly in an interface, the chances for
libvirt/openstack to make a wrong decision is little.
I think it would be wise for a vendor driver to implement a pre-copy
phase, even if only to send version information and verify it at the
target. Deciding you have no device state to send during pre-copy does
not mean your vendor driver needs to opt-out of the pre-copy phase
entirely. Please also note that pre-copy is at the user's discretion,
we've defined that we can enter stop-and-copy at any point, including
without a pre-copy phase, so I would recommend that vendor drivers
validate compatibility at the start of both the pre-copy and the
stop-and-copy phases.
so, do you think we are now arriving at an agreement that we'll
give up
the read-and-test scheme and start to defining one interface (perhaps in
json format), from which libvirt/openstack is able to parse and find out
compatibility list of a source mdev/physical device?
Based on the feedback we've received, the previously proposed interface
is not viable. I think there's agreement that the user needs to be
able to parse and interpret the version information. Using json seems
viable, but I don't know if it's the best option. Is there any
precedent of markup strings returned via sysfs we could follow?
Your idea of having both a "self" object and an array of "compatible"
objects is perhaps something we can build on, but we must not assume
PCI devices at the root level of the object. Providing both the
mdev-type and the driver is a bit redundant, since the former includes
the latter. We can't have vendor specific versioning schemes though,
ie. gvt-version. We need to agree on a common scheme and decide which
fields the version is relative to, ex. just the mdev type?
I had also proposed fields that provide information to create a
compatible type, for example to create a type_x2 device from a type_x1
mdev type, they need to know to apply an aggregation attribute. If we
need to explicitly list every aggregation value and the resulting type,
I think we run aground of what aggregation was trying to avoid anyway,
so we might need to pick a language that defines variable substitution
or some kind of tagging. For example if we could define ${aggr} as an
integer within a specified range, then we might be able to define a type
relative to that value (type_x${aggr}) which requires an aggregation
attribute using the same value. I dunno, just spit balling. Thanks,
Alex