<html>
<head>
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
It turns out I was wrong before. I don't have to start up Engine to
get into this situation.<br>
<br>
I did the following:<br>
<ul>
<li>Turn on Global Maintenance</li>
<li>Shut down the Engine VM (init 0)</li>
<li>Reboot the node</li>
<li>Wait a few minutes</li>
<li>poweroff</li>
</ul>
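For reference, the steps above as commands (just a sketch; "engine" stands in
for whatever hostname your Engine VM answers to, and is hypothetical here):<br>

```shell
# Sketch of the reproduction steps above (the "engine" hostname is hypothetical).
hosted-engine --set-maintenance --mode=global   # turn on Global Maintenance
ssh root@engine "init 0"                        # shut the Engine VM down
reboot                                          # reboot the node
# ... wait a few minutes after the node comes back up ...
poweroff                                        # times out/hangs, then resets instead of powering off
```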
<br>
I then get the timeouts and hangs during shutdown again, and the node
resets instead of powering off.<br>
<br>
It's possible that somehow the system is coming out of Global
Maintenance mode during shutdown, and the Engine VM is starting up
and causing this issue.<br>
<br>
I did the following.<br>
1. hosted-engine --set-maintenance --mode=global<br>
You can see the attached output from 'hosted-engine --vm-status'
(hosted-engine.out) at this point, indicating that the system is in
Global Maintenance<br>
<br>
2. Waited 60 seconds, and checked sanlock<br>
You can see the attached output of 'sanlock client status'
(sanlock-status.out) at this point, showing the Engine VM locks
being held<br>
<br>
3. Stopped the vdsmd service (note: the first time I tried, I got
"Job for vdsmd.service cancelled" and had to re-issue the stop).<br>
You can see the attached output of 'sanlock client status' and of the
commands that followed.<br>
<br>
What's interesting, and what I didn't notice right away, is that after I
stopped vdsmd the sanlock status started changing, as if the locks
were being manipulated.<br>
After I stopped vdsmd, the HA services, and libvirtd, and waited 60
seconds, I noticed the locks seemed to be changing state and that
HostedEngine was listed. At that point I got suspicious and restarted
vdsmd so that I could re-check Global Maintenance mode, and I found
that the system was no longer *in* maintenance and that the Engine VM
was running.<br>
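That re-check sequence, as commands (the ovirt-ha-* service names are my
assumption for a systemd host; check 'systemctl list-units' on yours):<br>

```shell
# Sketch of the sequence described above. The ovirt-ha-* unit names are an
# assumption; verify them with 'systemctl list-units' before relying on this.
systemctl stop vdsmd ovirt-ha-agent ovirt-ha-broker libvirtd
sleep 60
sanlock client status          # locks changing state; HostedEngine listed
systemctl start vdsmd
hosted-engine --vm-status      # now reports the system out of Global Maintenance
```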
<br>
So I think this partly explains the situation. Somehow, the act of
stopping vdsmd makes the system look like it is *out* of Global
Maintenance mode, and the Engine VM starts up while the system is
shutting down. That creates new sanlock leases on the Engine VM
storage, which prevent the system from shutting down cleanly. Oddly,
Global Maintenance is preserved across a reboot.<br>
<br>
But there may be more going on. Even if I stop vdsmd, the HA
services, and libvirtd, and sleep 60 seconds, I still see a lock
held on the Engine VM storage:<br>
<br>
<pre>daemon 6f3af037-d05e-4ad8-a53c-61627e0c2464.xion2.smar
p -1 helper
p -1 listener
p -1 status
s 003510e8-966a-47e6-a5eb-3b5c8a6070a9:1:/rhev/data-center/mnt/xion2.smartcity.net\:_export_VM__NewDataDomain/003510e8-966a-47e6-a5eb-3b5c8a6070a9/dom_md/ids:0
s hosted-engine:1:/rhev/data-center/mnt/xion2\:_export_vm_he1/18eeab54-e482-497f-b096-11f8a43f94f4/ha_agent/hosted-engine.lockspace:0
</pre>
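To make the remaining lockspaces easier to spot in output like that, here's a
small helper one could use (hypothetical, just awk over 'sanlock client status'
output; the 's' rows are lockspaces, named up to the first colon):<br>

```shell
# Hypothetical helper: print the lockspace names from 'sanlock client status'
# output. Lockspace rows start with 's'; the name is everything before the
# first ':' in the second field.
list_lockspaces() {
    awk '$1 == "s" { split($2, parts, ":"); print parts[1] }'
}

# e.g.:  sanlock client status | list_lockspaces
# or:    list_lockspaces < sanlock-status.out
```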
<br>
It stays in this state, however, and HostedEngine doesn't grab a lock
again.<br>
In any case, no matter what I do, it's impossible to shut the system
down cleanly.<br>
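One thing worth trying, from the bug report Andrew mentions below (use with
care, since it kills any PIDs holding leases):<br>

```shell
# From the bugzilla quoted below: shut sanlock down without triggering a wdmd
# reboot. This kills any PIDs holding leases, releases the leases, and exits.
sanlock client shutdown -f 1
poweroff
```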
<br>
-Bob<br>
<br>
<div class="moz-cite-prefix">On 06/13/2014 08:33 AM, Doron Fediuck
wrote:<br>
</div>
<blockquote
cite="mid:18670369.25624513.1402662818691.JavaMail.zimbra@redhat.com"
type="cite">
<pre wrap="">
----- Original Message -----
</pre>
<blockquote type="cite">
<pre wrap="">From: "Andrew Lau" <a class="moz-txt-link-rfc2396E" href="mailto:andrew@andrewklau.com"><andrew@andrewklau.com></a>
To: "Bob Doolittle" <a class="moz-txt-link-rfc2396E" href="mailto:bob@doolittle.us.com"><bob@doolittle.us.com></a>
Cc: "users" <a class="moz-txt-link-rfc2396E" href="mailto:users@ovirt.org"><users@ovirt.org></a>
Sent: Friday, June 6, 2014 6:14:18 AM
Subject: Re: [ovirt-users] Can HA Agent control NFS Mount?
On Fri, Jun 6, 2014 at 1:09 PM, Bob Doolittle <a class="moz-txt-link-rfc2396E" href="mailto:bob@doolittle.us.com"><bob@doolittle.us.com></a> wrote:
</pre>
<blockquote type="cite">
<pre wrap="">Thanks Andrew, I'll try this workaround tomorrow for sure. But reading
though that bug report (closed not a bug) it states that the problem should
only arise if something is not releasing a sanlock lease. So if we've
entered Global Maintenance and shut down Engine, the question is what's
holding the lease?
How can that be debugged?
</pre>
</blockquote>
<pre wrap="">For me it's wdmd and sanlock itself failing to shutdown properly. I
also noticed even when in global maintenance and the engine VM powered
off there is still a sanlock lease for the
/rhev/mnt/....hosted-engine/? lease file or something along those
lines. So the global maintenance may not actually be releasing that
lock.
I'm not too familiar with sanlock etc. So it's like stabbing in the dark :(
</pre>
</blockquote>
<pre wrap="">Sounds like a bug since once the VM is off there should not
be a lease taken.
Please check if after a minute you still have a lease taken
according to: <a class="moz-txt-link-freetext" href="http://www.ovirt.org/SANLock#sanlock_timeouts">http://www.ovirt.org/SANLock#sanlock_timeouts</a>
In this case try to stop vdsm and libvirt just so we'll know
who still keeps the lease.
</pre>
<blockquote type="cite">
<blockquote type="cite">
<pre wrap="">-Bob
On Jun 5, 2014 10:56 PM, "Andrew Lau" <a class="moz-txt-link-rfc2396E" href="mailto:andrew@andrewklau.com"><andrew@andrewklau.com></a> wrote:
</pre>
<blockquote type="cite">
<pre wrap="">On Mon, May 26, 2014 at 5:10 AM, Bob Doolittle <a class="moz-txt-link-rfc2396E" href="mailto:bob@doolittle.us.com"><bob@doolittle.us.com></a>
wrote:
</pre>
<blockquote type="cite">
<pre wrap="">On 05/25/2014 02:51 PM, Joop wrote:
</pre>
<blockquote type="cite">
<pre wrap="">On 25-5-2014 19:38, Bob Doolittle wrote:
</pre>
<blockquote type="cite">
<pre wrap="">
Also curious is that when I say "poweroff" it actually reboots and
comes
up again. Could that be due to the timeouts on the way down?
</pre>
</blockquote>
<pre wrap="">Ah, that's something my F19 host does too. Some more info: if engine
hasn't been started on the host then I can shutdown it and it will
poweroff.
IF engine has been run on it then it will reboot.
Its not vdsm (I think) because my shutdown sequence is (on my f19
host):
service ovirt-agent-ha stop
service ovirt-agent-broker stop
service vdsmd stop
ssh root@engine01 "init 0"
init 0
I don't use maintenance mode because when I poweron my host (= my
desktop)
I want engine to power on automatically which it does most of the time
within 10 min.
</pre>
</blockquote>
<pre wrap="">
For comparison, I see this issue and I *do* use maintenance mode
(because
presumably that's the 'blessed' way to shut things down and I'm scared
to
mess this complex system up by straying off the beaten path ;). My
process
is:
ssh root@engine "init 0"
(wait for "vdsClient -s 0 list | grep Status:" to show the vm as down)
hosted-engine --set-maintenance --mode=global
poweroff
And then on startup:
hosted-engine --set-maintenance --mode=none
hosted-engine --vm-start
There are two issues here. I am not sure if they are related or not.
1. The NFS timeout during shutdown (Joop do you see this also? Or just
#2?)
2. The system reboot instead of poweroff (which messes up remote machine
management)
Thanks,
Bob
</pre>
<blockquote type="cite">
<pre wrap="">I think wdmd or sanlock are causing the reboot instead of poweroff
</pre>
</blockquote>
</blockquote>
<pre wrap="">While searching for my issue of wdmd/sanlock not shutting down, I
found this which may interest you both:
<a class="moz-txt-link-freetext" href="https://bugzilla.redhat.com/show_bug.cgi?id=888197">https://bugzilla.redhat.com/show_bug.cgi?id=888197</a>
Specifically:
"To shut down sanlock without causing a wdmd reboot, you can run the
following command: "sanlock client shutdown -f 1"
This will cause sanlock to kill any pid's that are holding leases,
release those leases, and then exit.
"
</pre>
<blockquote type="cite">
<blockquote type="cite">
<pre wrap="">Joop
</pre>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
<br>
</body>
</html>