
TL;DR: NEWS! First two patches (https://gerrit.ovirt.org/#/c/79386/5 and https://gerrit.ovirt.org/#/c/79264/14) are now review worthy!

On 07/23/2017 10:58 AM, Roy Golan wrote:
[...] So we can still use the BLOCK_THRESHOLD event for steady state, and avoid polling in the vast majority of the cases.
By "steady state" I mean that the VM is running, with no administration operation (snapshot, live merge, live storage migration...) in progress.
I think it is fair to assume that VMs are in this state the vast majority of the time. For the important cases in which we cannot depend on events, we can fall back to polling, but in a smarter way:
instead of polling everything every 2s, let's poll just the drives involved in the ongoing operations.
Those should be far fewer than the total number of drives, and polled for a far shorter time than today, so polling should be practical.
Since the event fires only once, we will need to rearm it only if an operation is ongoing, and only just before starting it (both conditions are easy to check). We can disable the polling on completion, or on error. This per se is easy, but we will need a careful review of the flows, and perhaps some safety nets in place.
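The rearm-before-operation and disable-on-completion flow described above could look roughly like this. This is an illustrative sketch only; `VolumeMonitor`, `begin_operation`, and the helper names are hypothetical, not actual Vdsm classes:

```python
# Hypothetical sketch: poll only the drives involved in an ongoing
# operation, rearm the one-shot threshold event just before starting
# the operation, and stop polling on completion or on error.
import threading


class VolumeMonitor:
    def __init__(self):
        self._lock = threading.Lock()
        self._polled_drives = set()  # only drives with an operation in progress

    def begin_operation(self, drive):
        # The QEMU event is one-shot, so rearm it just before the
        # operation starts, then enable targeted polling as a fallback.
        self._rearm_threshold(drive)
        with self._lock:
            self._polled_drives.add(drive)

    def end_operation(self, drive):
        # Called on completion *and* on error paths.
        with self._lock:
            self._polled_drives.discard(drive)

    def poll_once(self):
        # Poll just the drives under operation instead of every drive.
        with self._lock:
            drives = list(self._polled_drives)
        for drive in drives:
            self._check_allocation(drive)
        return drives

    def _rearm_threshold(self, drive):
        pass  # placeholder: would set a new block threshold via libvirt

    def _check_allocation(self, drive):
        pass  # placeholder: would query allocation and extend if needed
```

The point of the sketch is that the set of polled drives is usually empty, so the fallback poller does no work in the steady state.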
Consider fusing polling and events into a single pipeline of events so they can be used together. If a poll triggers an event (with a distinguished origin), then all the handling is done in one place, and it should be easy to stop or start polling, or remove it entirely.
Yes, this is the final design I have in mind. I have plans to refactor Vdsm master to make it look like that. It will play nice with the refactorings the storage team has planned. Let's see whether the virt refactorings are needed just to have the block threshold events, or whether we can postpone them.
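The fused-pipeline idea can be sketched in a few lines: both the libvirt callback and the fallback poller enqueue the same kind of event, tagged with its origin, so one handler does all the work. Function and field names here are illustrative assumptions, not Vdsm APIs:

```python
# Illustrative sketch: polling and libvirt events feed one pipeline.
import queue

events = queue.Queue()


def on_libvirt_threshold(drive, excess):
    # Real event arriving from the BLOCK_THRESHOLD callback.
    events.put({"origin": "event", "drive": drive, "excess": excess})


def poll_drive(drive, allocation, threshold):
    # Fallback poller: if allocation crossed the threshold, synthesize
    # an event with a distinguished origin instead of acting directly.
    if allocation >= threshold:
        events.put({"origin": "poll", "drive": drive,
                    "excess": allocation - threshold})


def handle_next():
    # One place handles both origins; enabling, disabling, or removing
    # polling never touches the handling logic.
    ev = events.get()
    return "extend %s (via %s)" % (ev["drive"], ev["origin"])
```

With this shape, "remove polling in 4.3" reduces to deleting one producer; the consumer is untouched.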
On recovery, we will need to make sure to rearm all the relevant events, but we can just plug this into the recovery we must do already, so this should be easy as well.
What is needed in order to 'rearm' it? Is there an API to get the state of the event subscription? If we lost an event, how do we know to rearm it? Is it idempotent to rearm?
QEMU supports a single threshold per block device (= node of the backing chain), so rearming a threshold just means setting a new threshold, overwriting the old one. To rearm it, we need to get the highest allocation of the block devices and set the threshold. If we do that among the first things during recovery, there should be little risk, if any. To know if we need to do that, we "just" need to inspect all block devices at recovery. It doesn't come for free, but I believe it is a fair price.
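A recovery-time rearm could look like the sketch below. The threshold policy shown (current allocation plus a free-space margin, capped at capacity) is an assumption for illustration, not necessarily Vdsm's actual formula; the real code would push each value to libvirt, which exposes this as a per-device block threshold setting:

```python
# Hedged sketch of rearming all thresholds on recovery.
def next_threshold(allocation, capacity, margin):
    """Pick the byte offset at which the one-shot event should fire.

    Assumed policy: fire when the margin above the current allocation
    is consumed, never beyond the device capacity.
    """
    return min(capacity, allocation + margin)


def rearm_all(block_devices, margin=512 * 2**20):
    # Inspect every block device once; QEMU keeps one threshold per
    # node, so setting a new value simply overwrites the old one,
    # which makes this rearm step idempotent.
    thresholds = {}
    for dev, (allocation, capacity) in block_devices.items():
        thresholds[dev] = next_threshold(allocation, capacity, margin)
        # placeholder: would set the threshold on the domain here
    return thresholds
```

Because overwriting is the only operation, running this twice during a messy recovery is harmless.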
Remind me, do we extend a disk if the VM paused with an out-of-space event?
Yes we do. We examine the last pause reason during recovery, and do the extension in that case.
How will we handle two subsequent events if we didn't extend between them? (expecting the extend to be an async operation)
At the QEMU level, the event cannot fire twice; it must be rearmed after every firing. In general, should virt code receive two events before the extension completed... I don't know yet :) Perhaps we can start by just handling the first event; I don't think we can easily queue extension requests (and I'm not sure we should)
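The "handle only the first event" option could be as simple as a per-drive in-flight flag: later events for the same drive are dropped rather than queued, and the completion callback clears the flag (and would rearm the one-shot threshold). Names are hypothetical:

```python
# Illustrative sketch: drop, don't queue, duplicate threshold events
# that arrive while an extension for the same drive is in flight.
_extending = set()


def on_threshold_event(drive):
    if drive in _extending:
        return "dropped"  # extension already requested for this drive
    _extending.add(drive)
    # placeholder: the actual extension would start asynchronously here
    return "extension started"


def on_extension_done(drive):
    _extending.discard(drive)
    # placeholder: rearm the threshold here, since the QEMU event is one-shot
```

Dropping is safe under the stated assumption that completion always rearms: any growth missed while extending triggers a fresh event afterwards.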
I believe the best route is:
1. offer the new event-based code for 4.2, keep the polling around. Default to events for performance
2. remove the polling completely in 4.3
I still wonder whether removing them totally is good. The absence of events should be supervised somehow - as today, where a failure to poll getstats of a domain results in a VM going unresponsive. Not the most accurate state, but at least it gives some visibility. So polling should cover us where events fail (similar to the engine's VM monitoring).
I don't have strong opinions about polling removal as long as it is disabled by default. Actually, I like having fallbacks and safety nets in place. However, the libvirt event support is here to stay, and as time goes on, it should only get better (feature-wise and reliability-wise).
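One shape such a safety net could take, sketched under assumed names: a slow audit pass (disabled by default) that does not act on drives itself, but only flags ones whose allocation crossed the threshold without a corresponding event being observed:

```python
# Hedged sketch of a supervision pass over the event path: report
# drives that crossed their threshold but produced no event.
def audit(drives, events_seen):
    """drives: {name: (allocation, threshold)}; events_seen: set of names."""
    missed = []
    for name, (allocation, threshold) in drives.items():
        if allocation >= threshold and name not in events_seen:
            missed.append(name)
    return missed
```

An empty result means the event path is healthy; a non-empty one gives the visibility the polling path provides today, without reintroducing 2s polling of everything.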
I'm currently working on the patches here: https://gerrit.ovirt.org/#/q/status:open+project:vdsm+branch:master+topic:watermark-event-minimal
Even though the basics are in place, I don't think they are ready for review yet.
First two patches (https://gerrit.ovirt.org/#/c/79386/5 and https://gerrit.ovirt.org/#/c/79264/14) are now review worthy!

-- 
Francesco Romani
Senior SW Eng., Virtualization R&D
Red Hat
IRC: fromani github: @fromanirh