<html><head><meta http-equiv="Content-Type" content="text/html charset=us-ascii"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><br class=""><div><blockquote type="cite" class=""><div class="">On 23 Jul 2017, at 11:58, Roy Golan <<a href="mailto:rgolan@redhat.com" class="">rgolan@redhat.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" class=""><div class="gmail_quote"><div dir="ltr" class="">On Wed, Jul 19, 2017 at 12:46 PM Francesco Romani <<a href="mailto:fromani@redhat.com" target="_blank" class="">fromani@redhat.com</a>> wrote:<br class=""></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi all,<br class="">
<br class="">
<br class="">
With libvirt 3.2.0 and onwards, it seems we now have the tools to solve<br class="">
<a href="https://bugzilla.redhat.com/show_bug.cgi?id=1181665" rel="noreferrer" target="_blank" class="">https://bugzilla.redhat.com/show_bug.cgi?id=1181665</a><br class="">
<br class="">
and eventually get rid of the disk polling we do. This change is<br class="">
expected to have a huge impact on performance, so I'm working on it.<br class="">
<br class="">
<br class="">
I had plans for a comprehensive refactoring in this area, but it looks like<br class="">
a solution backportable to 4.1.z is appealing, so I started with this first,<br class="">
saving the refactoring (which I still very much want) for later.<br class="">
<br class="">
<br class="">
So, quick summary: libvirt >= 3.2.0 allows setting a threshold on any<br class="">
node in the backing chain of each drive of a VM<br class="">
(<a href="https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainSetBlockThreshold" rel="noreferrer" target="_blank" class="">https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainSetBlockThreshold</a>),<br class="">
and fires an event exactly once when that threshold is crossed.<br class="">
The event needs to be explicitly rearmed afterwards.<br class="">
<br class="">
This is exactly what we need to get rid of polling in the steady state,<br class="">
so far so good.<br class="">
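<br class="">
To make the "fires once, must be rearmed" behaviour concrete, here is a minimal toy model in plain Python.<br class="">
It does not call libvirt: the ThresholdWatch class and its arm() and write() methods are hypothetical names<br class="">
made up for illustration, standing in for virDomainSetBlockThreshold and the event it produces.<br class="">
<pre class="">
```python
class ThresholdWatch:
    """Toy model of a one-shot threshold on one drive (not the libvirt API)."""

    def __init__(self, on_event):
        self._on_event = on_event
        self._threshold = None  # None means "not armed"

    def arm(self, threshold):
        # Arming again simply replaces any previous threshold.
        self._threshold = threshold

    def write(self, allocation):
        # Called whenever the highest written offset of the drive changes.
        if self._threshold is not None and allocation >= self._threshold:
            threshold, self._threshold = self._threshold, None  # disarm first
            self._on_event(threshold, allocation - threshold)


events = []
watch = ThresholdWatch(lambda threshold, excess: events.append((threshold, excess)))
watch.arm(100)
watch.write(50)    # below the threshold: nothing happens
watch.write(120)   # crosses it: one event
watch.write(150)   # still above, but not rearmed: no second event
watch.arm(200)     # explicit rearm, e.g. after extending the volume
watch.write(210)   # crosses the new threshold: second event
print(events)      # prints [(100, 20), (200, 10)]
```
</pre>
The write at 150 produces nothing because the watch was not rearmed, which is the case the rest of this mail has to deal with.<br class="">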
<br class="">
<br class="">
The problem is: we can't use this for some important flows we have,<br class="">
which involve disks not (yet) attached to a given VM.<br class="">
<br class="">
<br class=""></blockquote><div class=""><br class=""></div><div class=""> <br class=""></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Possibly affected flows:<br class="">
<br class="">
- live storage migration:<br class="">
<br class="">
we use flags = (libvirt.VIR_DOMAIN_BLOCK_COPY_SHALLOW |<br class="">
libvirt.VIR_DOMAIN_BLOCK_COPY_REUSE_EXT |<br class="">
libvirt.VIR_DOMAIN_BLOCK_COPY_TRANSIENT_JOB)<br class="">
<br class="">
meaning that Vdsm is in charge of handling the volume<br class="">
<br class="">
- snapshots:<br class="">
<br class="">
we use snapFlags = (libvirt.VIR_DOMAIN_SNAPSHOT_CREATE_REUSE_EXT |<br class="">
libvirt.VIR_DOMAIN_SNAPSHOT_CREATE_NO_METADATA)<br class="">
<br class="">
<br class="">
(same meaning as above)<br class="">
<br class="">
- live merge: should be OK (according to a glance at the source and a<br class="">
chat with Adam).<br class="">
<br class="">
<br class="">
So it looks like we will need to bridge this gap.<br class="">
<br class="">
<br class="">
So we can still use the BLOCK_THRESHOLD event for steady state, and<br class="">
avoid polling in the vast majority of the cases.<br class="">
<br class="">
By "steady state" I mean that the VM is running, with no<br class="">
administration (snapshot, live merge, live storage migration...)<br class="">
operation in progress.<br class="">
<br class="">
I think it is fair to assume that VMs are in this state the vast<br class="">
majority of the time.<br class="">
For the very important cases in which we cannot depend on events, we can<br class="">
fall back to polling, but in a smarter way:<br class="">
<br class="">
instead of polling everything every 2s, let's poll just the drives<br class="">
involved in the ongoing operations.<br class="">
<br class="">
Those should be far fewer than the total number of drives, and polled for a far<br class="">
shorter time than today, so polling should be practical.<br class="">
<br class="">
Since the event fires once, we will need to rearm it only if the<br class="">
operation is ongoing, and only just before starting it (both conditions<br class="">
are easy to check).<br class="">
We can disable the polling on completion, or on error. This per se is<br class="">
easy, but we will need a careful review of the flows, and perhaps some<br class="">
safety nets in place.<br class="">
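<br class="">
A rough sketch of the "poll only the drives involved in ongoing operations" idea, in plain Python.<br class="">
Everything here is an assumption made for illustration: DriveMonitor, its method names and the<br class="">
read_allocation callable are hypothetical and are not existing Vdsm or libvirt code. Routing both poll<br class="">
results and events to one handler, tagged with their origin, is one possible design, not a settled one.<br class="">
<pre class="">
```python
class DriveMonitor:
    """Tracks which drives still need polling (hypothetical helper)."""

    def __init__(self, read_allocation, handle_high_write):
        self._read_allocation = read_allocation  # callable: drive name to bytes
        self._handle = handle_high_write         # shared handler for both origins
        self._polled = {}                        # drive name to threshold in bytes

    def start_operation(self, drive, threshold):
        # Called just before a snapshot or live storage migration starts.
        self._polled[drive] = threshold

    def finish_operation(self, drive):
        # Called on completion or on error; forgetting it only costs overhead.
        self._polled.pop(drive, None)

    def on_threshold_event(self, drive):
        # Steady state: the one-shot threshold event ends up here.
        self._handle(drive, "event")

    def poll_once(self):
        # Run periodically; reads only drives with an operation in progress.
        for drive, threshold in list(self._polled.items()):
            if self._read_allocation(drive) >= threshold:
                self._handle(drive, "poll")


allocation = {"vda": 10, "vdb": 90, "vdc": 95}
handled = []
monitor = DriveMonitor(allocation.get,
                       lambda drive, origin: handled.append((drive, origin)))

monitor.start_operation("vdb", threshold=80)
monitor.poll_once()               # only vdb is read; vda and vdc are never touched
monitor.finish_operation("vdb")
monitor.poll_once()               # nothing left to poll
monitor.on_threshold_event("vda")
print(handled)                    # prints [('vdb', 'poll'), ('vda', 'event')]
```
</pre>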
<br class=""></blockquote><div class=""><br class=""></div><div class="">Consider fusing polling and events into a single pipeline of events so they can be used together. If a poll triggers an event (with a distinguished origin)<br class=""></div><div class="">then all the handling is done in one place, and it should be easy to stop or start polling, or remove it entirely.<br class=""></div><div class=""> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Anyway, should we fail to disable the polling, we will "just" have some<br class="">
overhead.<br class="">
<br class=""></blockquote><div class=""> <br class=""></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
On recovery, we will need to make sure to rearm all the relevant events,<br class="">
but we can just plug into the recovery we must already do, so this should<br class="">
be easy as well.<br class="">
<br class=""></blockquote><div class=""> </div></div><div dir="ltr" class=""><div class="gmail_quote"><div class="">What is needed in order to 'rearm' it? Is there an API to get the state of the event subscription?<br class=""></div><div class="">If we lose an event, how do we know to rearm it? Is rearming idempotent?<br class=""></div><div class=""><br class="">Remind me, do we extend a disk if the VM paused with an out-of-space event?<br class=""><br class=""></div><div class="">How will we handle 2 subsequent events if we didn't extend between them? (expecting the extend to be an async operation)<br class=""><br class=""></div></div></div><div dir="ltr" class=""><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
So it seems to me this could fly and we can actually have the<br class="">
performance benefits of events.<br class="">
<br class=""></blockquote><div class=""><br class=""></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
However, due to the fact that we need to review some existing and<br class="">
delicate flows, I think we should still keep the current polling code<br class="">
around for the next release.<br class="">
<br class=""></blockquote><div class="">+1 <br class=""> <br class=""></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
I believe the best route is:<br class="">
<br class="">
1. offer the new event-based code for 4.2, keep the polling around.<br class="">
Default to events for performance<br class="">
<br class="">
2. remove the polling completely in 4.3<br class="">
<br class="">
<br class=""></blockquote><div class="">I still wonder if removing the polling entirely is good. The absence of events should be supervised somehow: as today, a failure to poll getstats of a domain results in a VM going unresponsive. Not the most accurate state, but at least it gives some visibility. So polling should cover us where events fail (similar to engine's VM monitoring).<br class=""></div></div></div></div></div></blockquote><div><br class=""></div>It is a different case. With disk extensions there is always the fallback of actually hitting 100%, pausing the VM, and triggering the extend anyway. So I do not think there is a need for another mechanism when an event is missed (e.g. due to a vdsm restart).</div><div><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div dir="ltr" class=""><div class="gmail_quote"><div class=""><br class=""></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
I'm currently working on the patches here:<br class="">
<a href="https://gerrit.ovirt.org/#/q/status:open+project:vdsm+branch:master+topic:watermark-event-minimal" rel="noreferrer" target="_blank" class="">https://gerrit.ovirt.org/#/q/status:open+project:vdsm+branch:master+topic:watermark-event-minimal</a><br class="">
<br class="">
<br class="">
Even though the basics are in place, I don't think they are ready for<br class="">
review yet.<br class="">
<br class="">
<br class="">
Comments welcome, as usual.<br class="">
<br class="">
<br class="">
--<br class="">
Francesco Romani<br class="">
Senior SW Eng., Virtualization R&D<br class="">
Red Hat<br class="">
IRC: fromani github: @fromanirh<br class="">
<br class="">
_______________________________________________<br class="">
Devel mailing list<br class="">
<a href="mailto:Devel@ovirt.org" target="_blank" class="">Devel@ovirt.org</a><br class="">
<a href="http://lists.ovirt.org/mailman/listinfo/devel" rel="noreferrer" target="_blank" class="">http://lists.ovirt.org/mailman/listinfo/devel</a><br class="">
</blockquote></div></div></div>
</div></blockquote></div><br class=""></body></html>