On Thu, Aug 20, 2020 at 12:19 PM Vojtech Juranek <vjuranek(a)redhat.com> wrote:
Hi,
as Fedor is on PTO, I was asked to take over.
I don't think new functionality is needed; a more detailed list of proposed flows and
changes is below.
TL;DR: I'd suggest to
- change in engine: send PDIV instead of CD path from engine to vdsm
- change in vdsm: implement a counter of ISOs being used by VMs to know when we can
deactivate a volume
- change in vdsm: remove the old drivespec from the VM XML when changing/removing a CD
(and eventually deactivate the volume)
Your comments are welcome.
Thanks
Vojta
Flows
===
VM without a CD
-------------------------
- It should not be possible to insert any CD; this option should not be available/active
in the UI.
I don't think we have such a configuration; all VMs have an empty cdrom by default:
<disk type='file' device='cdrom'>
  <driver name='qemu' error_policy='report'/>
  <source startupPolicy='optional'/>
  <target dev='sdc' bus='sata'/>
  <readonly/>
  <alias name='ua-d95b29f2-e8a2-4119-9ac2-5b4267e6637d'/>
  <address type='drive' controller='0' bus='0' target='0' unit='2'/>
</disk>
But of course, if we have such a configuration where adding a CD is not possible,
the menu/button should be disabled in the UI.
VM without CD, changeCD inserts new ISO
-----------------------------------------------------------------
- add new drivespec to VM metadata
How is failure handled?
- prepare new drivespec
What if vdsm is restarted at this point?
How should VM recovery handle this drivespec referring to an inactive volume?
- attach new device to VM
- if attaching to VM fails, tear down drivespec and remove drivespec from VM metadata
Tearing down and removing the drivespec cannot be done atomically. Any operation
may fail and vdsm may be killed at any point.
The design should suggest how vdsm recovers from all errors.
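For example, the ordering I have in mind looks roughly like this - a minimal
sketch, assuming hypothetical helpers (metadata.add_cd, attach_cdrom and
friends are made-up names; prepareVolumePath/teardownVolumePath are the
existing clientIF calls):

# Rough sketch only - the metadata/attach helper names are made up,
# not existing vdsm code.

def insert_cd(vm, drivespec):
    # Persist the new drivespec first, so recovery after a vdsm
    # restart knows this volume may need prepare or teardown.
    vm.metadata.add_cd(drivespec)
    try:
        # Activate the volume and resolve its path.
        path = vm.cif.prepareVolumePath(drivespec)
    except Exception:
        # Nothing was activated - just drop the metadata entry.
        vm.metadata.remove_cd(drivespec)
        raise
    try:
        # Attach the device to the running VM.
        vm.attach_cdrom(path)
    except Exception:
        # Roll back: deactivate the volume and drop the metadata.
        vm.cif.teardownVolumePath(drivespec)
        vm.metadata.remove_cd(drivespec)
        raise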
VM with CD, changeCD removes current ISO
-------------------------------------------------------------------
- tear down previous drivespec
- if the volume with the ISO is inactive (e.g. as a result of vdsm failing after inserting
the drivespec into VM metadata, but before activating the volume), continue without error
Sounds good, and may already be handled in the lvm module. You can try to deactivate
an LV twice to confirm this.
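A quick way to check (a sketch; I'm assuming lvm.deactivateLVs(vg_name, lv_names)
is the call teardown ends up using):

# Run on a host where the ISO LV is already inactive. Assumes
# vdsm.storage.lvm.deactivateLVs(vg_name, lv_names) is the call used
# by teardown; adjust to the actual module layout if it differs.
from vdsm.storage import lvm

def check_double_deactivate(vg_name, lv_name):
    lvm.deactivateLVs(vg_name, [lv_name])
    # If this second call raises instead of being a no-op, teardown of
    # an already inactive volume needs explicit handling.
    lvm.deactivateLVs(vg_name, [lv_name])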
- if drivespec is used by another VM, don’t deactivate volume
This is the tricky part: vdsm does not have a mechanism for tracking
usage of block devices, and adding such a mechanism is much bigger work
than supporting CDs on block devices, so it cannot be part of this work.
- remove drivespec from VM metadata
What if this fails, or vdsm is restarted before we try to do this?
When vdsm starts it recovers running VMs, so we need to handle this case
in this flow: a drivespec referencing a non-existing volume.
> VM with CD, changeCD inserts new ISO and removes old
> --------------------------------------------------------------------------------------
> - tear down previous drivespec
> - if volume with ISO is inactive (as a result e.g. of failure of vdsm after
> inserting drivespec into VM metadata, but before activating volume), continue
> without error
> - if drivespec is used by another VM, don’t deactivate volume
> - remove previous drivespec from VM metadata
> - add new drivespec to VM metadata
> - prepare new drivespec
> - attach new device to VM
> - if attaching new drivespec fails, tear down new drivespec and attach back
> previous drivespec
This is too complicated. When I discussed this with Fedor, our conclusion
was that we don't want to support this complexity, and it will be much easier to
implement 2 APIs, one for removing a CD and one for inserting a CD.
With separate APIs, error handling is much easier, and is similar to
the existing error handling in hotplugDisk and hotunplugDisk. You have a good
example of how to handle all errors in these APIs.
For example, if we have an API for removing a CD:
1. engine sends VM.removeCD
2. vdsm marks the drivespec for removal
Possible implementation:
    <ovirt-vm:device devtype="disk" name="sdc" removing="yes">
        ...
    </ovirt-vm:device>
3. vdsm detaches the device from the VM
4. vdsm deactivates the volume
5. vdsm removes the drivespec from the metadata
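A minimal sketch of these steps (helper names like mark_cd_for_removal and
detach_cdrom are hypothetical, not existing vdsm code):

# Sketch of VM.removeCD - names are made up for illustration.

def remove_cd(vm, drivespec):
    # Step 2: persist the intent before touching the VM, so recovery
    # can finish the removal if vdsm is killed in the middle.
    vm.metadata.mark_cd_for_removal(drivespec)
    # Step 3: detach the device from the running VM.
    vm.detach_cdrom(drivespec)
    # Step 4: deactivate the volume.
    vm.cif.teardownVolumePath(drivespec)
    # Step 5: forget the drivespec only after everything else succeeded.
    vm.metadata.remove_cd(drivespec)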
Vdsm may be killed after any step. When recovering the VM, we can have
these cases:
1. vdsm was killed after step 2, the CD is marked for removal, but the
device is attached to the VM, and the volume is active:
- change the metadata, un-marking the drivespec for removal.
2. vdsm was killed after step 3, the drivespec is marked for removal,
the device is not attached to the VM, but the volume is active:
- deactivate the volume
- remove the drivespec from the metadata
Note that recovery should not happen before vdsm deactivates unused
logical volumes at startup, so we should really have only case 3.
3. vdsm was killed after step 4, the drivespec is marked for removal,
the device is not attached to the VM, and the volume is not active:
- remove the drivespec from the metadata
4. vdsm was killed after step 5, everything is good on the vdsm side.
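So recovery can be driven purely by the observed state, roughly (again a sketch
with hypothetical helpers):

# Sketch of VM recovery for a drivespec marked removing="yes".

def recover_cd_removal(vm, drivespec):
    if vm.has_cdrom_attached(drivespec):
        # Case 1: killed after step 2 - nothing was detached yet,
        # just drop the removal mark and keep the CD.
        vm.metadata.unmark_cd_for_removal(drivespec)
        return
    # Cases 2 and 3: the device is gone. Deactivating an already
    # inactive volume should be a no-op, so both cases are handled
    # the same way.
    vm.cif.teardownVolumePath(drivespec)
    vm.metadata.remove_cd(drivespec)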
Teardown should not fail in the storage layer if the volume is still used. In
the current code we cannot avoid the lvm failures, but we don't have to pass
the failure up to the caller, or we can pass a specific error like "volume is
used", which can be ignored silently by the caller, since this error means
someone else is using this volume and is responsible for deactivating it.
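For example, storage could raise a dedicated error that the virt code treats as
success (a sketch; VolumeInUse is a made-up name):

# Sketch - VolumeInUse does not exist in vdsm today.

class VolumeInUse(Exception):
    """The volume cannot be deactivated because another consumer
    still uses it; the caller may ignore this safely."""

def teardown_cd(cif, drivespec):
    try:
        cif.teardownVolumePath(drivespec)
    except VolumeInUse:
        # Someone else is using the volume and is responsible for
        # deactivating it - not an error for the change CD flow.
        pass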
In case of simple errors that vdsm can report, engine may fail the
change CD, or maybe continue to insert the new one, if the old CD was
detached. For example, if the old CD failed to deactivate, this is a
storage issue that should not fail attaching of a new CD. The storage
layer should make sure the volume is deactivated if nobody uses it, but
the user should not care about that.
If vdsm was killed, engine will fail the operation, but it must wait
until the VM is recovered.
At this point engine gets the VM XML and can update the real state of
the CD, either still attached, or detached. Virt code should not care
about deactivation failures.
The storage layer should be concerned with deactivation of active volumes,
but I don't think we handle such issues in engine yet. On the vdsm side,
unused logical volumes are deactivated during startup, see:
https://github.com/oVirt/vdsm/blob/f6de2ca915faa719381e22086feb8b65aa4de9...
We probably need to add a periodic job in storage to deactivate unused
volumes so we don't wait until the next vdsm restart. This can also be
handled in engine, but I don't think that is the right place to handle
host state.
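Such a job could be as simple as rescheduling the same cleanup we already run at
startup (a sketch; deactivate_unused_lvs stands for whatever that startup code
does, and the interval is arbitrary):

# Sketch of a periodic cleanup - not tied to any existing vdsm
# scheduler; deactivate_unused_lvs() is a placeholder for the logic
# already run at vdsm startup.
import threading

CLEANUP_INTERVAL = 300  # seconds, arbitrary value

def start_unused_lv_cleanup(deactivate_unused_lvs):
    def run():
        deactivate_unused_lvs()
        schedule()

    def schedule():
        timer = threading.Timer(CLEANUP_INTERVAL, run)
        timer.daemon = True
        timer.start()

    schedule()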
Proposed changes
===========
Engine
----------
- change ChangeDiskCommand to use PDIV (pool, domain, image, volume ID) instead of
the ISO path (see the example drive spec sketched below)
- modify UI not to provide an option to change CD for VMs without CD
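For reference, the drive spec the engine would send instead of a path would look
roughly like this (the values below are placeholders):

# Roughly what engine would send instead of a file path - the UUIDs
# are placeholders, not real IDs.
cd_drivespec = {
    "device": "cdrom",
    "poolID": "<storage pool / data center UUID>",
    "domainID": "<storage domain UUID holding the ISO>",
    "imageID": "<ISO disk image UUID>",
    "volumeID": "<volume UUID of the ISO image>",
}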
vdsm
--------
- add a volume/ISO counter which would count the number of VMs using the given ISO. When
a CDROM is inserted, the counter is increased; when ejected, the counter is decreased.
The ISO volume is deactivated only when no VM uses it.
This is a hack for supporting ISO, while the issue of shared block volumes is
not limited to ISO. We have this issue in many flows, with many partial
and broken solutions, like avoiding activation in engine (image transfer), or
avoiding activation in vdsm (snapshot).
This should be solved separately in a proper way in vdsm storage. This
is a large and complex change that we may have in a future version, not
something for the next zstream.
Unfortunately this area in vdsm was neglected for years, and we have
huge technical debt.
So for now we will have to live with a warning when an active ISO image is
deactivated, or improve error handling so the caller can detect and properly
handle the case of a volume in use. I think better error handling will be
easy to do and good enough for now.
We already have an sqlite DB for managed storage, so we can add a DB
for ISOs quite easily.
- modify Vm._changeBlockDev to reflect flows suggested above
The solution requires persistence, since vdsm can be killed, and we would
most likely use a database for this, but this should not be part of the
managedvolume database, since sqlite is not good with concurrent usage.
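If we do end up with a counter backed by its own sqlite file (separate from the
managedvolume DB), a minimal sketch could look like this - the path, table and
function names are all made up:

# Sketch of a per-host ISO reference counter - file path, schema and
# function names are made up for illustration, not existing vdsm code.
import sqlite3

DB_PATH = "/var/lib/vdsm/storage/iso-refcount.db"  # hypothetical path

def _connect():
    con = sqlite3.connect(DB_PATH)
    con.execute(
        "CREATE TABLE IF NOT EXISTS iso_refs "
        "(vol_id TEXT PRIMARY KEY, refs INTEGER NOT NULL)")
    con.commit()
    return con

def acquire(vol_id):
    """Record one more VM using this ISO volume."""
    con = _connect()
    try:
        with con:  # transaction
            con.execute(
                "INSERT INTO iso_refs (vol_id, refs) VALUES (?, 1) "
                "ON CONFLICT(vol_id) DO UPDATE SET refs = refs + 1",
                (vol_id,))
    finally:
        con.close()

def release(vol_id):
    """Drop one reference; return True when no VM uses the ISO."""
    con = _connect()
    try:
        with con:
            con.execute(
                "UPDATE iso_refs SET refs = refs - 1 WHERE vol_id = ?",
                (vol_id,))
            row = con.execute(
                "SELECT refs FROM iso_refs WHERE vol_id = ?",
                (vol_id,)).fetchone()
            return row is None or row[0] <= 0
    finally:
        con.close()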
Nir
> Hi Fedor,
>
>
> > On 27 Jun 2020, at 14:46, Fedor Gavrilov <fgavrilo(a)redhat.com> wrote:
> >
> >
> >
> > So from what I was able to see in hotplugDisk example, for CDs I would
> > have to do similar:
>
> >
> >
> > 1. parse disk params from stored xml metadata
> > 2. call prepareVolumePath with these params
> > (skip normalizeVdsmImg)
> > 3. call updateDriveIndex
>
>
> Why?
>
>
> > (skip validating whether volume has leases)
> > 4. drive.getXML() and then hooks.before_disk_hotplug with it
>
>
> If only the storage stuff would ever finish moving to xml....
> But it’s not right to call this hook anyway, iiuc you’re not doing a
> hotplug
>
> > 5. call attachDevice with drive XML and handle possible errors
> >
> >
> >
> > I am not sure, however, whether we need what "else" does in
> > hotplugDisk (vm.py line 3468)...
>
> >
> >
> > Looking forward to hearing your opinions,
> > Fedor
> >
> >
> >
> > ----- Original Message -----
> > From: "Fedor Gavrilov" <fgavrilo(a)redhat.com>
> > To: "devel" <devel(a)ovirt.org>
> > Sent: Monday, June 22, 2020 4:37:59 PM
> > Subject: [ovirt-devel] implementing hotplugCd/hotunplugCd in vdsm
> >
> >
> >
> > Hey,
> >
> >
> >
> > So in an attempt to fix change CD functionality we discovered a few other
> > potential
>
> CD has a long history of issues caused by historical simplifications.
> What is the intended fix, stop using string paths?
>
>
> > issues and what Nir suggested was to implement two [somewhat] new
> > functions in VDSM: hotplug and hotunplug for CDs similar to how it works
> > for normal disks now.
>
> This is conceptually different, CD is a removable media,
> hotplug/unplug is for the device itself. We never supported hotplug of
> the device though.
>
>
> > Existing changeCD function will be left as is for backwards
> > compatibility.
>
>
> Who would use the new function?
>
>
> > As I found out, engine already calculates iface and index before invoking
> > VDSM functions, so we will just pass these along with PDIV to the VDSM.
>
> >
> >
> > Suggested flow is, let me quote:
> >
> >
> >
> >> So the complete change CD flow should be:
> >>
> >>
> >>
> >> 1. get the previous drivespec from vm metadata
> >> 2. prepare new drivespec
> >> 3. add new drivespec to vm metadata
> >> 4. attach a new device to vm
>
>
> Don’t call it a new device when you just change a media
>
>
> >> 5. teardown the previous drivespec
> >> 6. remove previous drivespec from vm metadata
> >>
> >>
> >>
> >> When the vm is stopped, it must do:
> >>
> >>
> >>
> >> 1. get drive spec from vm metadata
> >> 2. teardown drivespec
> >>
> >>
> >>
> >> During attach, there are interesting races:
> >> - what happens if vdsm crashes after step 2? who will teardown the
> >> volume?
> >> maybe you need to add the new drivespec to the metadata first,
> >> before preparing it.
> >> - what happens if attach failed? who will remove the new drive from
> >> the metadata?
> >
> >
> >
> > Now, what makes hotplugDisk/hotunplugDisk different? From what I
> > understand, the flow is same there, so what difference is there as far as
> > VDSM is concerned? If none, this means if I more or less copy that code,
> > changing minor details and data accordingly for CDs, this should work,
> > shouldn't it?
>
> Hotplug/unplug is similar, but I would like to see the proposed change
> in context of engine as well, and I expect it to be a bit more complex
> there. Without that this is not worth it.
>
> Thanks,
> michal
>
> >
> >
> > Thanks!
> > Fedor