<div dir="ltr"><div class="gmail_quote"><div dir="ltr">On Wed, Jul 19, 2017 at 12:46 PM Francesco Romani &lt;<a href="mailto:fromani@redhat.com" target="_blank">fromani@redhat.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi all,<br>

<br>

<br>

With libvirt 3.2.0 and onwards, it seems we have now the tools to solve<br>

<a href="https://bugzilla.redhat.com/show_bug.cgi?id=1181665" rel="noreferrer" target="_blank">https://bugzilla.redhat.com/show_bug.cgi?id=1181665</a><br>

<br>

and eventually get rid of the disk polling we do. This change is<br>

expected to have huge impact on performance, so I&#39;m working on it.<br>

<br>

<br>

I had plans for a comprehensive refactoring in this area, but looks like<br>

a solution backportable for 4.1.z is appealing, so I<br>

<br>

started with this first, saving the refactoring (which I still very much<br>

want) for later.<br>

<br>

<br>

So, quick summary: libvirt &gt;= 3.2.0 allows to set a threshold to any<br>

node in the backing chain of each drive of a VM<br>

<br>

(<a href="https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainSetBlockThreshold" rel="noreferrer" target="_blank">https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainSetBlockThreshold</a>),<br>

and fire one event exactly once<br>

<br>

when that threshold is crossed. The event needs to be explicitely<br>

rearmed after.<br>

<br>

This is exactly what we need to get rid of polling in the steady state,<br>

so far so good.<br>

<br>

<br>

The problem is: we can&#39;t use this for some important flows we have, and<br>

which involve usage of disks not (yet) attached to a given VM.<br>

<br>

<br></blockquote><div><br></div><div> <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Possibly affected flows:<br>

<br>

- live storage migration:<br>

<br>

  we use         flags = (libvirt.VIR_DOMAIN_BLOCK_COPY_SHALLOW |<br>

                 libvirt.VIR_DOMAIN_BLOCK_COPY_REUSE_EXT |<br>

                 VIR_DOMAIN_BLOCK_COPY_TRANSIENT_JOB)<br>

<br>

  meaning that Vdsm is in charge of handling the volume<br>

<br>

- snapshots:<br>

<br>

  we use         snapFlags = (libvirt.VIR_DOMAIN_SNAPSHOT_CREATE_REUSE_EXT |<br>

                     libvirt.VIR_DOMAIN_SNAPSHOT_CREATE_NO_METADATA)<br>

<br>

<br>

  (same meaning as above)<br>

<br>

- live merge: should be OK (according to a glance at the source and a<br>

chat with Adam).<br>

<br>

<br>

So looks like we will need to bridge this gap.<br>

<br>

<br>

So we can still use the BLOCK_THRESHOLD event for steady state, and<br>

avoid polling in the vast majority of the cases.<br>

<br>

With &quot;steady state&quot; I mean that the VM is running, with no<br>

administration (snapshot, live merge, live storage migration...)<br>

operation in progress.<br>

<br>

I think it is fair to assume that VMs are in this state the vast<br>

majority of the time.<br>

For the very important cases on which we cannot depend on events, we can<br>

fall back to polling, but in a smarter way:<br>

<br>

instead of polling everything every 2s, let&#39;s just poll just the drives<br>

involved in the ongoing operations.<br>

<br>

Those should be far less of the total amount of drives, and for a far<br>

shorter time than today, so polling should be practical.<br>

<br>

Since the event fires once, we will need to rearm it only if the<br>

operation is ongoing, and only just before to start it (both conditions<br>

easy to check)<br>

We can disable the polling on completion, or on error. This per se is<br>

easy, but we will need a careful review of the flows, and perhaps some<br>

safety nets in place.<br>

<br></blockquote><div><br></div><div>Consider fusing polling and events into a single pipeline of events so they can be used together. If a poll triggers an event (with distinguished origin)<br></div><div>then it all the handling is done in one place and it should be easy to stop or start polling, or remove them totally.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Anyway, should we miss to disable the polling, we will &quot;just&quot; have some<br>

overhead.<br>

<br></blockquote><div> <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

On recovery, we will need to make sure to rearm all the relevant events,<br>

but  we can just plug in the recovery we must do already, so this should<br>

be easy as well.<br>

<br></blockquote><div> </div></div><div dir="ltr"><div class="gmail_quote"><div>What is needed in order to &#39;rearm&#39; it? is there an API to get the state of event subscription?<br></div><div>If we lost an event how do we know to rearm it? is it idempotent to rearm?<br></div><div><br>Remind me, do we extend a disk if the VM paused with out of space event?<br><br></div><div>How will we handle 2 subsequent events if we didn&#39;t extend between them? (expecting the extend to be async operation)<br><br></div></div></div><div dir="ltr"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

So it seems to me this could fly and we can actually have the<br>

performance benefits of events.<br>

<br></blockquote><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

However, due to the fact that we need to review some existing and<br>

delicate flows, I think we should still keep the current polling code<br>

around for the next release.<br>

<br></blockquote><div>+1   <br> <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

I believe the best route is:<br>

<br>

1. offer the new event-based code for 4.2, keep the polling around.<br>

Default to events for performance<br>

<br>

2. remove the polling completely in 4.3<br>

<br>

<br></blockquote><div>Still wonder if removing them totally is good. The absence of the events should be supervised somehow - like in today, a failure to poll getstats of a domain will result in a VM going unresponsive. Not the most accurate state but at least gives some visibility. So polling should cover us where events will fail. (similar to engine&#39;s vms monitoring)<br><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

I&#39;m currently working on the patches here:<br>

<a href="https://gerrit.ovirt.org/#/q/status:open+project:vdsm+branch:master+topic:watermark-event-minimal" rel="noreferrer" target="_blank">https://gerrit.ovirt.org/#/q/status:open+project:vdsm+branch:master+topic:watermark-event-minimal</a><br>

<br>

<br>

Even though the basics are in place, I don&#39;t think they are ready for<br>

review yet.<br>

<br>

<br>

Comments welcome, as usual.<br>

<br>

<br>

--<br>

Francesco Romani<br>

Senior SW Eng., Virtualization R&amp;D<br>

Red Hat<br>

IRC: fromani github: @fromanirh<br>

<br>

_______________________________________________<br>

Devel mailing list<br>

<a href="mailto:Devel@ovirt.org" target="_blank">Devel@ovirt.org</a><br>

<a href="http://lists.ovirt.org/mailman/listinfo/devel" rel="noreferrer" target="_blank">http://lists.ovirt.org/mailman/listinfo/devel</a><br>

</blockquote></div></div></div>