<div dir="ltr"><div class="gmail_quote"><div dir="ltr">On Wed, Jul 19, 2017 at 12:46 PM Francesco Romani <<a href="mailto:fromani@redhat.com" target="_blank">fromani@redhat.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi all,<br>
<br>
<br>
With libvirt 3.2.0 and onwards, it seems we have now the tools to solve<br>
<a href="https://bugzilla.redhat.com/show_bug.cgi?id=1181665" rel="noreferrer" target="_blank">https://bugzilla.redhat.com/show_bug.cgi?id=1181665</a><br>
<br>
and eventually get rid of the disk polling we do. This change is<br>
expected to have huge impact on performance, so I'm working on it.<br>
<br>
<br>
I had plans for a comprehensive refactoring in this area, but looks like<br>
a solution backportable for 4.1.z is appealing, so I<br>
<br>
started with this first, saving the refactoring (which I still very much<br>
want) for later.<br>
<br>
<br>
So, quick summary: libvirt >= 3.2.0 allows to set a threshold to any<br>
node in the backing chain of each drive of a VM<br>
<br>
(<a href="https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainSetBlockThreshold" rel="noreferrer" target="_blank">https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainSetBlockThreshold</a>),<br>
and fire one event exactly once<br>
<br>
when that threshold is crossed. The event needs to be explicitely<br>
rearmed after.<br>
<br>
This is exactly what we need to get rid of polling in the steady state,<br>
so far so good.<br>
<br>
<br>
The problem is: we can't use this for some important flows we have, and<br>
which involve usage of disks not (yet) attached to a given VM.<br>
<br>
<br></blockquote><div><br></div><div> <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Possibly affected flows:<br>
<br>
- live storage migration:<br>
<br>
we use flags = (libvirt.VIR_DOMAIN_BLOCK_COPY_SHALLOW |<br>
libvirt.VIR_DOMAIN_BLOCK_COPY_REUSE_EXT |<br>
VIR_DOMAIN_BLOCK_COPY_TRANSIENT_JOB)<br>
<br>
meaning that Vdsm is in charge of handling the volume<br>
<br>
- snapshots:<br>
<br>
we use snapFlags = (libvirt.VIR_DOMAIN_SNAPSHOT_CREATE_REUSE_EXT |<br>
libvirt.VIR_DOMAIN_SNAPSHOT_CREATE_NO_METADATA)<br>
<br>
<br>
(same meaning as above)<br>
<br>
- live merge: should be OK (according to a glance at the source and a<br>
chat with Adam).<br>
<br>
<br>
So looks like we will need to bridge this gap.<br>
<br>
<br>
So we can still use the BLOCK_THRESHOLD event for steady state, and<br>
avoid polling in the vast majority of the cases.<br>
<br>
With "steady state" I mean that the VM is running, with no<br>
administration (snapshot, live merge, live storage migration...)<br>
operation in progress.<br>
<br>
I think it is fair to assume that VMs are in this state the vast<br>
majority of the time.<br>
For the very important cases on which we cannot depend on events, we can<br>
fall back to polling, but in a smarter way:<br>
<br>
instead of polling everything every 2s, let's just poll just the drives<br>
involved in the ongoing operations.<br>
<br>
Those should be far less of the total amount of drives, and for a far<br>
shorter time than today, so polling should be practical.<br>
<br>
Since the event fires once, we will need to rearm it only if the<br>
operation is ongoing, and only just before to start it (both conditions<br>
easy to check)<br>
We can disable the polling on completion, or on error. This per se is<br>
easy, but we will need a careful review of the flows, and perhaps some<br>
safety nets in place.<br>
<br></blockquote><div><br></div><div>Consider fusing polling and events into a single pipeline of events so they can be used together. If a poll triggers an event (with distinguished origin)<br></div><div>then it all the handling is done in one place and it should be easy to stop or start polling, or remove them totally.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Anyway, should we miss to disable the polling, we will "just" have some<br>
overhead.<br>
<br></blockquote><div> <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
On recovery, we will need to make sure to rearm all the relevant events,<br>
but we can just plug in the recovery we must do already, so this should<br>
be easy as well.<br>
<br></blockquote><div> </div></div><div dir="ltr"><div class="gmail_quote"><div>What is needed in order to 'rearm' it? is there an API to get the state of event subscription?<br></div><div>If we lost an event how do we know to rearm it? is it idempotent to rearm?<br></div><div><br>Remind me, do we extend a disk if the VM paused with out of space event?<br><br></div><div>How will we handle 2 subsequent events if we didn't extend between them? (expecting the extend to be async operation)<br><br></div></div></div><div dir="ltr"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
So it seems to me this could fly and we can actually have the<br>
performance benefits of events.<br>
<br></blockquote><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
However, due to the fact that we need to review some existing and<br>
delicate flows, I think we should still keep the current polling code<br>
around for the next release.<br>
<br></blockquote><div>+1 <br> <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
I believe the best route is:<br>
<br>
1. offer the new event-based code for 4.2, keep the polling around.<br>
Default to events for performance<br>
<br>
2. remove the polling completely in 4.3<br>
<br>
<br></blockquote><div>Still wonder if removing them totally is good. The absence of the events should be supervised somehow - like in today, a failure to poll getstats of a domain will result in a VM going unresponsive. Not the most accurate state but at least gives some visibility. So polling should cover us where events will fail. (similar to engine's vms monitoring)<br><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
I'm currently working on the patches here:<br>
<a href="https://gerrit.ovirt.org/#/q/status:open+project:vdsm+branch:master+topic:watermark-event-minimal" rel="noreferrer" target="_blank">https://gerrit.ovirt.org/#/q/status:open+project:vdsm+branch:master+topic:watermark-event-minimal</a><br>
<br>
<br>
Even though the basics are in place, I don't think they are ready for<br>
review yet.<br>
<br>
<br>
Comments welcome, as usual.<br>
<br>
<br>
--<br>
Francesco Romani<br>
Senior SW Eng., Virtualization R&D<br>
Red Hat<br>
IRC: fromani github: @fromanirh<br>
<br>
_______________________________________________<br>
Devel mailing list<br>
<a href="mailto:Devel@ovirt.org" target="_blank">Devel@ovirt.org</a><br>
<a href="http://lists.ovirt.org/mailman/listinfo/devel" rel="noreferrer" target="_blank">http://lists.ovirt.org/mailman/listinfo/devel</a><br>
</blockquote></div></div></div>