<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
TL;DR: NEWS!<br>
<br>
First two patches (<a class="moz-txt-link-freetext" href="https://gerrit.ovirt.org/#/c/79386/5">https://gerrit.ovirt.org/#/c/79386/5</a> and
<a class="moz-txt-link-freetext" href="https://gerrit.ovirt.org/#/c/79264/14">https://gerrit.ovirt.org/#/c/79264/14</a>) are now review worthy!<br>
<br>
<br>
<div class="moz-cite-prefix">On 07/23/2017 10:58 AM, Roy Golan
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAC_Jqc=nZY+sqDD21_mxPmr+kzxTM15jq=_4c5zJDyzQnupHLA@mail.gmail.com">
<div dir="ltr">
<div class="gmail_quote"><br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
[...]<br>
So we can still use the BLOCK_THRESHOLD event for steady
state, and<br>
avoid polling in the vast majority of the cases.<br>
<br>
With "steady state" I mean that the VM is running, with no<br>
administration (snapshot, live merge, live storage
migration...)<br>
operation in progress.<br>
<br>
I think it is fair to assume that VMs are in this state the
vast<br>
majority of the time.<br>
For the very important cases on which we cannot depend on
events, we can<br>
fall back to polling, but in a smarter way:<br>
<br>
instead of polling everything every 2s, let's just poll just
the drives<br>
involved in the ongoing operations.<br>
<br>
Those should be far less of the total amount of drives, and
for a far<br>
shorter time than today, so polling should be practical.<br>
<br>
Since the event fires once, we will need to rearm it only if
the<br>
operation is ongoing, and only just before to start it (both
conditions<br>
easy to check)<br>
We can disable the polling on completion, or on error. This
per se is<br>
easy, but we will need a careful review of the flows, and
perhaps some<br>
safety nets in place.<br>
<br>
</blockquote>
<div><br>
</div>
<div>Consider fusing polling and events into a single pipeline
of events so they can be used together. If a poll triggers
an event (with distinguished origin)<br>
</div>
<div>then it all the handling is done in one place and it
should be easy to stop or start polling, or remove them
totally.<br>
</div>
</div>
</div>
</blockquote>
<br>
Yes, this is the final design I have in mind. I plan to
refactor Vdsm master to make it look like that.<br>
It will play nicely with the refactorings the storage team has planned.<br>
Let's see whether the virt refactorings are actually needed to get the block
threshold events working, or whether we can postpone them.<br>
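<br>
To make the idea concrete, here is a minimal sketch of such a single pipeline, in plain Python. All names (Pipeline, ThresholdExceeded, the origin constants) are made up for illustration and do not exist in Vdsm; the sketch only assumes that the poller knows the threshold it is checking against.<br>
<pre>
```python
import queue
from collections import namedtuple

# Hypothetical names: none of these exist in Vdsm.
ThresholdExceeded = namedtuple(
    "ThresholdExceeded", "vm_id, drive, allocation, origin")

ORIGIN_LIBVIRT = "libvirt"  # a real BLOCK_THRESHOLD event
ORIGIN_POLL = "poll"        # synthesized by the periodic poller


class Pipeline:
    """Funnel libvirt events and poll results into a single handler."""

    def __init__(self, on_exceeded):
        self._events = queue.Queue()
        self._on_exceeded = on_exceeded

    def from_libvirt(self, vm_id, drive, allocation):
        self._events.put(
            ThresholdExceeded(vm_id, drive, allocation, ORIGIN_LIBVIRT))

    def from_poll(self, vm_id, drive, allocation, threshold):
        # The poller emits an event only when it sees the threshold crossed.
        if allocation >= threshold:
            self._events.put(
                ThresholdExceeded(vm_id, drive, allocation, ORIGIN_POLL))

    def drain(self):
        """Handle all queued events; return how many were handled."""
        handled = 0
        while True:
            try:
                event = self._events.get_nowait()
            except queue.Empty:
                return handled
            self._on_exceeded(event)
            handled += 1
```
</pre>
Since the handler only sees ThresholdExceeded objects, starting, stopping or removing the poller should not require touching the handling code.<br>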
<br>
<blockquote type="cite"
cite="mid:CAC_Jqc=nZY+sqDD21_mxPmr+kzxTM15jq=_4c5zJDyzQnupHLA@mail.gmail.com">
<div dir="ltr">
<div class="gmail_quote">
<div><br>
</div>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
On recovery, we will need to make sure to rearm all the
relevant events,<br>
but we can just plug in the recovery we must do already, so
this should<br>
be easy as well.<br>
<br>
</blockquote>
<div> </div>
</div>
<div dir="ltr">
<div class="gmail_quote">
<div>What is needed in order to 'rearm' it? is there an API
to get the state of event subscription?<br>
</div>
<div>If we lost an event how do we know to rearm it? is it
idempotent to rearm?<br>
</div>
</div>
</div>
</div>
</blockquote>
<br>
QEMU supports a single threshold per block device (= node of backing
chain), so rearming a<br>
threshold just means setting a new threshold, overwriting the old
one.<br>
To rearm the event, we need to get the highest allocation of the block devices
and set the threshold.<br>
If we do that among the first steps of recovery, there should be little
risk, if any.<br>
<br>
To know if we need to do that, we "just" need to inspect all block
devices at recovery.<br>
It doesn't come for free, but I believe it is a fair price.<br>
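<br>
A rough sketch of what the recovery step could look like follows. DriveInfo, FREE_MARGIN and the threshold policy (fire when less than a fixed margin of free space is left) are assumptions made only for this example; set_threshold stands for whatever ends up wrapping virDomainSetBlockThreshold.<br>
<pre>
```python
from collections import namedtuple

# Hypothetical view of a drive at recovery time (not a Vdsm type).
DriveInfo = namedtuple("DriveInfo", "name, allocation, physical")

# Assumed policy: fire when less than this many bytes remain unallocated.
FREE_MARGIN = 512 * 1024 ** 2


def threshold_for(drive, margin=FREE_MARGIN):
    """Return the threshold to set, or None if the drive is already past it."""
    threshold = drive.physical - margin
    if drive.allocation >= threshold:
        return None
    return threshold


def rearm(drives, set_threshold):
    """Set a fresh threshold on each drive.

    Returns the names of the drives that are already past their threshold
    and need an extension right away.
    """
    need_extension = []
    for drive in drives:
        threshold = threshold_for(drive)
        if threshold is None:
            need_extension.append(drive.name)
        else:
            # A new threshold overwrites the old one, so repeating this
            # call is harmless.
            set_threshold(drive.name, threshold)
    return need_extension
```
</pre>
Because a new threshold simply replaces the previous one, calling rearm() twice on the same drives should leave them in the same state.<br>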
<br>
<blockquote type="cite"
cite="mid:CAC_Jqc=nZY+sqDD21_mxPmr+kzxTM15jq=_4c5zJDyzQnupHLA@mail.gmail.com">
<div dir="ltr">
<div dir="ltr">
<div class="gmail_quote">
<div>Remind me, do we extend a disk if the VM paused with
out of space event?<br>
</div>
</div>
</div>
</div>
</blockquote>
<br>
Yes, we do. During recovery we examine the last pause reason and
extend the disk in this case.<br>
<br>
<blockquote type="cite"
cite="mid:CAC_Jqc=nZY+sqDD21_mxPmr+kzxTM15jq=_4c5zJDyzQnupHLA@mail.gmail.com">
<div dir="ltr">
<div dir="ltr">
<div class="gmail_quote">
<div><br>
</div>
<div>How will we handle 2 subsequent events if we didn't
extend between them? (expecting the extend to be async
operation)<br>
</div>
</div>
</div>
</div>
</blockquote>
<br>
At the QEMU level, the event cannot fire twice: it must be rearmed after
every firing.<br>
In general, should virt code receive two events before the extension
completes... I don't know yet :) Perhaps we can start by handling just
the first event; I don't think we can easily queue extension
requests (and I'm not sure we should).<br>
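<br>
One possible shape of the "handle just the first event" approach is sketched below. ExtensionTracker is a hypothetical helper, not existing Vdsm code, and it assumes the extension flow reports back on completion or error.<br>
<pre>
```python
import threading


class ExtensionTracker:
    """Track drives with an extension in flight (hypothetical helper)."""

    def __init__(self, start_extension):
        self._start_extension = start_extension
        self._in_flight = set()
        self._lock = threading.Lock()

    def on_threshold_exceeded(self, drive):
        """Return True if an extension was started, False if the event
        was dropped because one is already in progress."""
        with self._lock:
            if drive in self._in_flight:
                return False
            self._in_flight.add(drive)
        self._start_extension(drive)
        return True

    def on_extension_done(self, drive):
        # Called on completion or on error; the caller is expected to
        # rearm the threshold afterwards.
        with self._lock:
            self._in_flight.discard(drive)
```
</pre>
Dropping the second event keeps things simple; whether that is enough, or whether we need queueing after all, is still open.<br>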
<br>
<blockquote type="cite"
cite="mid:CAC_Jqc=nZY+sqDD21_mxPmr+kzxTM15jq=_4c5zJDyzQnupHLA@mail.gmail.com">
<div dir="ltr"> <br>
<div dir="ltr">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
I believe the best route is:<br>
<br>
1. offer the new event-based code for 4.2, keep the
polling around.<br>
Default to events for performance<br>
<br>
2. remove the polling completely in 4.3<br>
<br>
<br>
</blockquote>
<div>Still wonder if removing them totally is good. The
absence of the events should be supervised somehow - like
in today, a failure to poll getstats of a domain will
result in a VM going unresponsive. Not the most accurate
state but at least gives some visibility. So polling
should cover us where events will fail. (similar to
engine's vms monitoring)<br>
</div>
</div>
</div>
</div>
</blockquote>
<br>
I don't have a strong opinion about removing polling, as long as it is
disabled by default.<br>
Actually, I like having fallbacks and safety nets in place.<br>
However, libvirt event support is here to stay, and as time
goes on it should only get better (both feature-wise and reliability-wise).<br>
<br>
<blockquote type="cite"
cite="mid:CAC_Jqc=nZY+sqDD21_mxPmr+kzxTM15jq=_4c5zJDyzQnupHLA@mail.gmail.com">
<div dir="ltr">
<div dir="ltr">
<div class="gmail_quote">
<div><br>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
I'm currently working on the patches here:<br>
<a
href="https://gerrit.ovirt.org/#/q/status:open+project:vdsm+branch:master+topic:watermark-event-minimal"
rel="noreferrer" target="_blank" moz-do-not-send="true">https://gerrit.ovirt.org/#/q/status:open+project:vdsm+branch:master+topic:watermark-event-minimal</a><br>
<br>
<br>
Even though the basics are in place, I don't think they
are ready for<br>
review yet.<br>
</blockquote>
</div>
</div>
</div>
</blockquote>
<br>
First two patches (<a class="moz-txt-link-freetext" href="https://gerrit.ovirt.org/#/c/79386/5">https://gerrit.ovirt.org/#/c/79386/5</a> and
<a class="moz-txt-link-freetext" href="https://gerrit.ovirt.org/#/c/79264/14">https://gerrit.ovirt.org/#/c/79264/14</a>) are now review worthy!<br>
<br>
<br>
<pre class="moz-signature" cols="72">--
Francesco Romani
Senior SW Eng., Virtualization R&amp;D
Red Hat
IRC: fromani github: @fromanirh</pre>
</body>
</html>