[ovirt-devel] status update: consuming the BLOCK_THRESHOLD event from libvirt (rhbz#1181665)
Michal Skrivanek
michal.skrivanek at redhat.com
Sun Jul 23 13:26:36 UTC 2017
> On 23 Jul 2017, at 11:58, Roy Golan <rgolan at redhat.com> wrote:
>
> On Wed, Jul 19, 2017 at 12:46 PM Francesco Romani <fromani at redhat.com> wrote:
> Hi all,
>
>
> With libvirt 3.2.0 and onwards, it seems we now have the tools to solve
> https://bugzilla.redhat.com/show_bug.cgi?id=1181665
>
> and eventually get rid of the disk polling we do. This change is
> expected to have a huge impact on performance, so I'm working on it.
>
>
> I had plans for a comprehensive refactoring in this area, but it looks like
> a solution backportable to 4.1.z is appealing, so I
>
> started with this first, saving the refactoring (which I still very much
> want) for later.
>
>
> So, quick summary: libvirt >= 3.2.0 allows setting a threshold on any
> node in the backing chain of each drive of a VM
>
> (https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainSetBlockThreshold),
> and fires an event exactly once when that threshold is crossed. The
> event needs to be explicitly rearmed afterwards.
>
> This is exactly what we need to get rid of polling in the steady state,
> so far so good.
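>
> To make this concrete, a minimal sketch of how this looks from the
> libvirt Python bindings (the VM name, drive alias and threshold value
> are made up; this is not the actual Vdsm code):
>
> import libvirt
>
> def on_block_threshold(conn, dom, dev, path, threshold, excess, opaque):
>     # Fired exactly once per armed threshold; after handling it
>     # (extending the volume) the threshold must be set again.
>     print(dom.name(), dev, path, threshold, excess)
>
> libvirt.virEventRegisterDefaultImpl()    # a real client runs an event loop
> conn = libvirt.open('qemu:///system')
> dom = conn.lookupByName('myvm')          # hypothetical VM name
>
> conn.domainEventRegisterAny(dom,
>                             libvirt.VIR_DOMAIN_EVENT_ID_BLOCK_THRESHOLD,
>                             on_block_threshold, None)
>
> # Arm the one-shot threshold on a drive: here 'vda' at 1 GiB allocation.
> dom.setBlockThreshold('vda', 1 * 1024**3)
>
> while True:                              # dispatch events to the callback
>     libvirt.virEventRunDefaultImpl()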
>
>
> The problem is that we can't use this for some important flows we have,
> which involve the use of disks not (yet) attached to a given VM.
>
>
>
>
> Possibly affected flows:
>
> - live storage migration:
>
> we use flags = (libvirt.VIR_DOMAIN_BLOCK_COPY_SHALLOW |
> libvirt.VIR_DOMAIN_BLOCK_COPY_REUSE_EXT |
> VIR_DOMAIN_BLOCK_COPY_TRANSIENT_JOB)
>
> meaning that Vdsm is in charge of handling the volume
>
> - snapshots:
>
> we use snapFlags = (libvirt.VIR_DOMAIN_SNAPSHOT_CREATE_REUSE_EXT |
> libvirt.VIR_DOMAIN_SNAPSHOT_CREATE_NO_METADATA)
>
>
> (same meaning as above; see the sketch right after this list)
>
> - live merge: should be OK (according to a glance at the source and a
> chat with Adam).
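>
> For reference, this is roughly what those two calls look like from
> Python (the VM name, destination XML and volume path are made up; a
> sketch, not the actual Vdsm code):
>
> import libvirt
>
> conn = libvirt.open('qemu:///system')
> dom = conn.lookupByName('myvm')       # hypothetical VM name
>
> # Live storage migration: the destination volume was already created
> # and prepared by Vdsm, hence REUSE_EXT.
> dest_xml = "<disk type='block'><source dev='/dev/vgname/newlv'/></disk>"
> copy_flags = (libvirt.VIR_DOMAIN_BLOCK_COPY_SHALLOW |
>               libvirt.VIR_DOMAIN_BLOCK_COPY_REUSE_EXT |
>               libvirt.VIR_DOMAIN_BLOCK_COPY_TRANSIENT_JOB)
> dom.blockCopy('vda', dest_xml, flags=copy_flags)
>
> # Snapshot: again the new top volume is pre-created by Vdsm, and libvirt
> # keeps no metadata about the snapshot.
> snap_xml = "<domainsnapshot>...</domainsnapshot>"   # details elided
> snap_flags = (libvirt.VIR_DOMAIN_SNAPSHOT_CREATE_REUSE_EXT |
>               libvirt.VIR_DOMAIN_SNAPSHOT_CREATE_NO_METADATA)
> dom.snapshotCreateXML(snap_xml, snap_flags)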
>
>
> So it looks like we will need to bridge this gap.
>
>
> So we can still use the BLOCK_THRESHOLD event for steady state, and
> avoid polling in the vast majority of the cases.
>
> With "steady state" I mean that the VM is running, with no
> administration (snapshot, live merge, live storage migration...)
> operation in progress.
>
> I think it is fair to assume that VMs are in this state the vast
> majority of the time.
> For the very important cases in which we cannot depend on events, we can
> fall back to polling, but in a smarter way:
>
> instead of polling everything every 2s, let's poll just the drives
> involved in the ongoing operations.
>
> Those should be a small fraction of the total number of drives, and
> polled for a far shorter time than today, so polling should be practical.
>
> Since the event fires once, we will need to rearm it only if the
> operation is ongoing, and only just before starting it (both conditions
> are easy to check).
> We can disable the polling on completion, or on error. This per se is
> easy, but we will need a careful review of the flows, and perhaps some
> safety nets in place.
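>
> A very rough sketch of that logic (all names are hypothetical, not the
> actual Vdsm code):
>
> class DriveMonitor:
>     """Tracks how a single chunked drive is watched."""
>
>     def __init__(self, dom, drive_name, threshold):
>         self._dom = dom
>         self._drive_name = drive_name
>         self._threshold = threshold
>         self._polled = False        # steady state: rely on the event
>
>     def on_operation_start(self):
>         # A snapshot/merge/copy involving this drive started: events
>         # cannot be relied upon here, so poll just this drive.
>         self._polled = True
>
>     def on_operation_end(self):
>         # Back to steady state: stop polling, re-arm the one-shot event.
>         self._polled = False
>         self._dom.setBlockThreshold(self._drive_name, self._threshold)
>
>     def needs_polling(self):
>         # The periodic poller asks this instead of polling every drive.
>         return self._polled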
>
>
> Consider fusing polling and events into a single pipeline of events so they can be used together. If a poll triggers an event (with a distinguished origin),
> then all the handling is done in one place and it should be easy to stop or start polling, or remove it entirely.
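>
> Purely as an illustration of that idea (all names made up):
>
> import logging
> from enum import Enum
>
> log = logging.getLogger('storage')
>
> class Origin(Enum):
>     EVENT = 'libvirt-event'    # delivered by BLOCK_THRESHOLD
>     POLL = 'poll'              # noticed by the fallback poller
>
> def extend_volume(vm_id, drive_name):
>     # placeholder for the real (async) extension request
>     log.info('extending %s/%s', vm_id, drive_name)
>
> def handle_high_watermark(vm_id, drive_name, origin):
>     # Single entry point: the same handling runs regardless of where the
>     # notification came from, so polling can be switched on, off, or
>     # removed entirely without touching the extension logic.
>     log.info('high watermark on %s/%s (origin=%s)', vm_id, drive_name, origin)
>     extend_volume(vm_id, drive_name)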
>
> Anyway, should we fail to disable the polling, we will "just" have some
> overhead.
>
>
> On recovery, we will need to make sure to rearm all the relevant events,
> but we can just plug into the recovery we already have to do, so this
> should be easy as well.
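>
> For instance, the recovery hook could look roughly like this (a sketch
> under the assumption that we persist per-drive thresholds; the names
> are made up):
>
> def rearm_thresholds(conn, vm_drives):
>     """Re-arm the block threshold events after a Vdsm restart.
>
>     vm_drives: {vm_uuid: [(drive_name, threshold_in_bytes), ...]}
>     Events armed before the restart may have fired (or been lost)
>     while Vdsm was down, so simply arm them all again: setting a
>     threshold replaces whatever was armed before.
>     """
>     for vm_uuid, drives in vm_drives.items():
>         dom = conn.lookupByUUIDString(vm_uuid)
>         for drive_name, threshold in drives:
>             dom.setBlockThreshold(drive_name, threshold)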
>
>
> What is needed in order to 'rearm' it? Is there an API to get the state of the event subscription?
> If we lost an event, how do we know to rearm it? Is rearming idempotent?
>
> Remind me, do we extend a disk if the VM pauses with an out-of-space event?
>
> How will we handle two subsequent events if we didn't extend between them? (expecting the extend to be an async operation)
>
> So it seems to me this could fly and we can actually have the
> performance benefits of events.
>
>
> However, due to the fact that we need to review some existing and
> delicate flows, I think we should still keep the current polling code
> around for the next release.
>
> +1
>
> I believe the best route is:
>
> 1. offer the new event-based code for 4.2, keep the polling around.
> Default to events for performance
>
> 2. remove the polling completely in 4.3
>
>
> I still wonder if removing them entirely is a good idea. The absence of events should be supervised somehow. Today, a failure to poll getstats of a domain results in the VM going unresponsive; not the most accurate state, but it at least gives some visibility. So polling should cover us where events fail (similar to the engine's VM monitoring).
It is a different case. With disk extensions there is always the fallback of actually hitting 100%, pausing the VM, and triggering the extend anyway. So I do not think there is a need for another mechanism when an event is missed (e.g. due to a Vdsm restart).
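
To illustrate the fallback, a rough sketch (not the actual Vdsm code; request_extension is a made-up placeholder): the guest hitting 100% of the allocated space pauses the VM with an ENOSPC I/O error, and libvirt reports the reason, so the extend can still be triggered even if the threshold event was never delivered:

import libvirt

def request_extension(dom, dev_alias):
    # placeholder: in Vdsm this would ask for the LV to be extended and
    # resume the VM once the extension completes
    pass

def on_io_error(conn, dom, src_path, dev_alias, action, reason, opaque):
    # QEMU pauses the VM on ENOSPC; this is the last-resort trigger
    if reason == 'enospc':
        request_extension(dom, dev_alias)

libvirt.virEventRegisterDefaultImpl()   # a real client runs an event loop
conn = libvirt.open('qemu:///system')
conn.domainEventRegisterAny(None,
                            libvirt.VIR_DOMAIN_EVENT_ID_IO_ERROR_REASON,
                            on_io_error, None)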
>
> I'm currently working on the patches here:
> https://gerrit.ovirt.org/#/q/status:open+project:vdsm+branch:master+topic:watermark-event-minimal
>
>
> Even though the basics are in place, I don't think they are ready for
> review yet.
>
>
> Comments welcome, as usual.
>
>
> --
> Francesco Romani
> Senior SW Eng., Virtualization R&D
> Red Hat
> IRC: fromani github: @fromanirh
>
> _______________________________________________
> Devel mailing list
> Devel at ovirt.org
> http://lists.ovirt.org/mailman/listinfo/devel