[ovirt-devel] status update: consuming the BLOCK_THRESHOLD event from libvirt (rhbz#1181665)
Michal Skrivanek
michal.skrivanek at redhat.com
Sun Jul 23 13:26:36 UTC 2017
> On 23 Jul 2017, at 11:58, Roy Golan <rgolan at redhat.com> wrote:
>
> On Wed, Jul 19, 2017 at 12:46 PM Francesco Romani <fromani at redhat.com> wrote:
> Hi all,
>
>
> With libvirt 3.2.0 and onwards, it seems we now have the tools to solve
> https://bugzilla.redhat.com/show_bug.cgi?id=1181665
>
> and eventually get rid of the disk polling we do. This change is
> expected to have a huge impact on performance, so I'm working on it.
>
>
> I had plans for a comprehensive refactoring in this area, but it looks like
> a solution backportable to 4.1.z is appealing, so I
>
> started with this first, saving the refactoring (which I still very much
> want) for later.
>
>
> So, quick summary: libvirt >= 3.2.0 allows setting a threshold on any
> node in the backing chain of each drive of a VM
>
> (https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainSetBlockThreshold),
> and fires an event exactly once when that threshold is crossed. The
> event needs to be explicitly rearmed afterwards.
>
> This is exactly what we need to get rid of polling in the steady state,
> so far so good.
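>
> To make this concrete, a minimal sketch of how this looks from the
> libvirt Python bindings (the VM name, drive alias and threshold value
> are made up; this is not the actual Vdsm code):
>
> import libvirt
>
> def on_block_threshold(conn, dom, dev, path, threshold, excess, opaque):
>     # Fired exactly once per armed threshold; after handling it
>     # (extending the volume) the threshold must be set again.
>     print(dom.name(), dev, path, threshold, excess)
>
> libvirt.virEventRegisterDefaultImpl()    # a real client runs an event loop
> conn = libvirt.open('qemu:///system')
> dom = conn.lookupByName('myvm')          # hypothetical VM name
>
> conn.domainEventRegisterAny(dom,
>                             libvirt.VIR_DOMAIN_EVENT_ID_BLOCK_THRESHOLD,
>                             on_block_threshold, None)
>
> # Arm the one-shot threshold on a drive: here 'vda' at 1 GiB allocation.
> dom.setBlockThreshold('vda', 1 * 1024**3)
>
> while True:                              # dispatch events to the callback
>     libvirt.virEventRunDefaultImpl()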
>
>
> The problem is that we can't use this for some important flows we have,
> which involve the use of disks not (yet) attached to a given VM.
>
>
>
>
> Possibly affected flows:
>
> - live storage migration:
>
> we use flags = (libvirt.VIR_DOMAIN_BLOCK_COPY_SHALLOW |
> libvirt.VIR_DOMAIN_BLOCK_COPY_REUSE_EXT |
> VIR_DOMAIN_BLOCK_COPY_TRANSIENT_JOB)
>
> meaning that Vdsm is in charge of handling the volume
>
> - snapshots:
>
> we use snapFlags = (libvirt.VIR_DOMAIN_SNAPSHOT_CREATE_REUSE_EXT |
> libvirt.VIR_DOMAIN_SNAPSHOT_CREATE_NO_METADATA)
>
>
> (same meaning as above; see the sketch right after this list)
>
> - live merge: should be OK (according to a glance at the source and a
> chat with Adam).
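>
> For reference, this is roughly what those two calls look like from
> Python (the VM name, destination XML and volume path are made up; a
> sketch, not the actual Vdsm code):
>
> import libvirt
>
> conn = libvirt.open('qemu:///system')
> dom = conn.lookupByName('myvm')       # hypothetical VM name
>
> # Live storage migration: the destination volume was already created
> # and prepared by Vdsm, hence REUSE_EXT.
> dest_xml = "<disk type='block'><source dev='/dev/vgname/newlv'/></disk>"
> copy_flags = (libvirt.VIR_DOMAIN_BLOCK_COPY_SHALLOW |
>               libvirt.VIR_DOMAIN_BLOCK_COPY_REUSE_EXT |
>               libvirt.VIR_DOMAIN_BLOCK_COPY_TRANSIENT_JOB)
> dom.blockCopy('vda', dest_xml, flags=copy_flags)
>
> # Snapshot: again the new top volume is pre-created by Vdsm, and libvirt
> # keeps no metadata about the snapshot.
> snap_xml = "<domainsnapshot>...</domainsnapshot>"   # details elided
> snap_flags = (libvirt.VIR_DOMAIN_SNAPSHOT_CREATE_REUSE_EXT |
>               libvirt.VIR_DOMAIN_SNAPSHOT_CREATE_NO_METADATA)
> dom.snapshotCreateXML(snap_xml, snap_flags)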
>
>
> So it looks like we will need to bridge this gap.
>
>
> So we can still use the BLOCK_THRESHOLD event for steady state, and
> avoid polling in the vast majority of the cases.
>
> With "steady state" I mean that the VM is running, with no
> administration (snapshot, live merge, live storage migration...)
> operation in progress.
>
> I think it is fair to assume that VMs are in this state the vast
> majority of the time.
> For the very important cases in which we cannot depend on events, we can
> fall back to polling, but in a smarter way:
>
> instead of polling everything every 2s, let's poll just the drives
> involved in the ongoing operations.
>
> Those should be a small fraction of the total number of drives, and
> polled for a far shorter time than today, so polling should be practical.
>
> Since the event fires once, we will need to rearm it only if the
> operation is ongoing, and only just before starting it (both conditions
> are easy to check).
> We can disable the polling on completion, or on error. This per se is
> easy, but we will need a careful review of the flows, and perhaps some
> safety nets in place.
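>
> A very rough sketch of that logic (all names are hypothetical, not the
> actual Vdsm code):
>
> class DriveMonitor:
>     """Tracks how a single chunked drive is watched."""
>
>     def __init__(self, dom, drive_name, threshold):
>         self._dom = dom
>         self._drive_name = drive_name
>         self._threshold = threshold
>         self._polled = False        # steady state: rely on the event
>
>     def on_operation_start(self):
>         # A snapshot/merge/copy involving this drive started: events
>         # cannot be relied upon here, so poll just this drive.
>         self._polled = True
>
>     def on_operation_end(self):
>         # Back to steady state: stop polling, re-arm the one-shot event.
>         self._polled = False
>         self._dom.setBlockThreshold(self._drive_name, self._threshold)
>
>     def needs_polling(self):
>         # The periodic poller asks this instead of polling every drive.
>         return self._polled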
>
>
> Consider fusing polling and events into a single pipeline of events so they can be used together. If a poll triggers an event (with a distinguished origin),
> then all the handling is done in one place and it should be easy to stop or start polling, or remove it entirely.
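>
> Purely as an illustration of that idea (all names made up):
>
> import logging
> from enum import Enum
>
> log = logging.getLogger('storage')
>
> class Origin(Enum):
>     EVENT = 'libvirt-event'    # delivered by BLOCK_THRESHOLD
>     POLL = 'poll'              # noticed by the fallback poller
>
> def extend_volume(vm_id, drive_name):
>     # placeholder for the real (async) extension request
>     log.info('extending %s/%s', vm_id, drive_name)
>
> def handle_high_watermark(vm_id, drive_name, origin):
>     # Single entry point: the same handling runs regardless of where the
>     # notification came from, so polling can be switched on, off, or
>     # removed entirely without touching the extension logic.
>     log.info('high watermark on %s/%s (origin=%s)', vm_id, drive_name, origin)
>     extend_volume(vm_id, drive_name)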
>
> Anyway, should we fail to disable the polling, we will "just" have some
> overhead.
>
>
> On recovery, we will need to make sure to rearm all the relevant events,
> but we can just plug into the recovery we already have to do, so this
> should be easy as well.
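>
> For instance, the recovery hook could look roughly like this (a sketch
> under the assumption that we persist per-drive thresholds; the names
> are made up):
>
> def rearm_thresholds(conn, vm_drives):
>     """Re-arm the block threshold events after a Vdsm restart.
>
>     vm_drives: {vm_uuid: [(drive_name, threshold_in_bytes), ...]}
>     Events armed before the restart may have fired (or been lost)
>     while Vdsm was down, so simply arm them all again: setting a
>     threshold replaces whatever was armed before.
>     """
>     for vm_uuid, drives in vm_drives.items():
>         dom = conn.lookupByUUIDString(vm_uuid)
>         for drive_name, threshold in drives:
>             dom.setBlockThreshold(drive_name, threshold)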
>
>
> What is needed in order to 'rearm' it? Is there an API to get the state of the event subscription?
> If we lost an event, how do we know to rearm it? Is rearming idempotent?
>
> Remind me, do we extend a disk if the VM pauses with an out-of-space event?
>
> How will we handle two subsequent events if we didn't extend between them? (expecting the extend to be an async operation)
>
> So it seems to me this could fly and we can actually have the
> performance benefits of events.
>
>
> However, due to the fact that we need to review some existing and
> delicate flows, I think we should still keep the current polling code
> around for the next release.
>
> +1
>
> I believe the best route is:
>
> 1. offer the new event-based code for 4.2, keep the polling around.
> Default to events for performance
>
> 2. remove the polling completely in 4.3
>
>
> I still wonder if removing them entirely is a good idea. The absence of events should be supervised somehow. Today, a failure to poll getstats of a domain results in the VM going unresponsive; not the most accurate state, but it at least gives some visibility. So polling should cover us where events fail (similar to the engine's VM monitoring).
It is a different case. With disk extensions there is always the fallback of actually hitting 100%, pausing the VM, and triggering the extend anyway. So I do not think there is a need for another mechanism when an event is missed (e.g. due to a Vdsm restart).
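
To illustrate the fallback, a rough sketch (not the actual Vdsm code; request_extension is a made-up placeholder): the guest hitting 100% of the allocated space pauses the VM with an ENOSPC I/O error, and libvirt reports the reason, so the extend can still be triggered even if the threshold event was never delivered:

import libvirt

def request_extension(dom, dev_alias):
    # placeholder: in Vdsm this would ask for the LV to be extended and
    # resume the VM once the extension completes
    pass

def on_io_error(conn, dom, src_path, dev_alias, action, reason, opaque):
    # QEMU pauses the VM on ENOSPC; this is the last-resort trigger
    if reason == 'enospc':
        request_extension(dom, dev_alias)

libvirt.virEventRegisterDefaultImpl()   # a real client runs an event loop
conn = libvirt.open('qemu:///system')
conn.domainEventRegisterAny(None,
                            libvirt.VIR_DOMAIN_EVENT_ID_IO_ERROR_REASON,
                            on_io_error, None)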
>
> I'm currently working on the patches here:
> https://gerrit.ovirt.org/#/q/status:open+project:vdsm+branch:master+topic:watermark-event-minimal
>
>
> Even though the basics are in place, I don't think they are ready for
> review yet.
>
>
> Comments welcome, as usual.
>
>
> --
> Francesco Romani
> Senior SW Eng., Virtualization R&D
> Red Hat
> IRC: fromani github: @fromanirh
>
> _______________________________________________
> Devel mailing list
> Devel at ovirt.org
> http://lists.ovirt.org/mailman/listinfo/devel