Shutdown Problems on oVirt Node

Hi there,

I currently have a strange problem on a new oVirt 3.6 installation. A clean shutdown doesn't work: most of the time the system reboots or hangs during the shutdown process.

I discovered this when I tested our multiple-UPS solution and sent some test signals over IPMI to our server, e.g. with ipmipower -h 192.168.2.218 -u root -p password --soft. We also discovered that shutdown -h now, poweroff or init 0 had the same effect.

On a clean CentOS installation that is not part of our oVirt setup this works as expected, but on our ovirt-node it doesn't.

During shutdown I see the following, which takes very long:

A stop job is running for Shared Storage Lease Manager (23s / 1min 47s)

At the end I get the following on my console screen:

[ OK ] Reached target Shutdown

Nothing more happens, no poweroff or anything else. I can wait more than three minutes and nothing happens.

I also tried a clean re-install from the oVirt Administration WebUI, but this had no effect on the issue.

When I type "service sanlock stop" or "service vdsmd stop" in the server console and then do a poweroff, everything works as expected. The shutdown is then also really fast, as expected.

At the moment we think the problem is in oVirt, vdsmd or the sanlock settings for oVirt, because all settings on our side are at their defaults.

The current setup is two Intel servers (with RRM) running CentOS 7 (1511) and oVirt 3.6.3, and one Intel server running OpenIndiana that provides storage via NFS.

Does anyone have a solution for this? Is this perhaps a bug that should be reported?

Greetings
Roger Meier

Odd. Never seen such behavior in any of our setups. Can you please include vdsm's logs, sanlock's logs and /var/log/messages? Thanks!

On Wed, Mar 2, 2016 at 6:00 PM, Roger Meier <roger.meier@4synergy.com> wrote:
[...]

On 14-3-2016 10:43, Allon Mureinik wrote:
> Odd. Never seen such behavior in any of our set ups. Can you please include vdsm's logs, sanlock's logs and /var/log/messages?

I have noticed the same behaviour, not on server hardware but on the workstations I use as an oVirt test setup. One would expect that a shutdown on a host would shut it down cleanly, but the only way to get that is to run a small script that takes care of:
- service ovirt-ha-agent/broker stop
- shutting down engine if it runs on this host
- service vdsmd stop
- service sanlock stop (takes quite a bit of time (~2min?))
- umount whatever is needed
- service nfs stop
- shutdown

This powers off my host, which normally runs my hosted-engine, every time. Sanlock seems to be indirectly the problem. wdmd(?) (watchdog daemon) seems able to keep the host from powering off; most of the time it results in a reboot or a hang at 'powering off'.

I spent quite a bit of time looking into logs but have not been able to find anything conclusive. That could be my problem: not knowing which log to look at, or not digging up enough info to find the root cause.

Joop
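The script itself was not posted, so the following is only a rough sketch of the sequence listed above. It assumes the usual service names on a CentOS 7 hosted-engine host (ovirt-ha-agent and ovirt-ha-broker for "ovirt-ha-agent/broker"). The run helper and the DRY_RUN variable are hypothetical additions for this sketch so that the sequence can be reviewed without stopping anything.

```shell
#!/bin/sh
# Sketch of the manual shutdown sequence described above.
# Assumption: service names match a CentOS 7 hosted-engine host.

# With DRY_RUN=1 (the default here) commands are only printed.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, else run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

clean_shutdown() {
    run service ovirt-ha-agent stop
    run service ovirt-ha-broker stop
    # Shut down the engine VM here if it runs on this host
    # (not shown; depends on the setup).
    run service vdsmd stop
    run service sanlock stop    # reported to take about two minutes
    # Unmount whatever is still mounted from shared storage here.
    run service nfs stop
    run shutdown -h now
}
```

Calling clean_shutdown with the default DRY_RUN=1 only prints the six commands in order; setting DRY_RUN=0 as root would actually execute them.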

On Tue, Mar 15, 2016 at 10:28 PM, Joop <jvdwege@xs4all.nl> wrote:
[...]
The issue is probably sanlock: it will refuse to stop if it is maintaining lockspaces on shared storage. If you kill sanlock, the machine watchdog will trigger a reboot after a minute or so. This behavior is by design; it is what allows oVirt to use locks on shared storage, which are used for the SPM, the hosted engine HA agent, and the hosted engine VM.

To shut down or reboot a hypervisor, you should release the sanlock leases on shared storage. The process is:

1. Put the hypervisor in maintenance mode via engine. This will migrate VMs to another hypervisor.
2. Put the hosted engine HA server in local maintenance mode.
3. Reboot.

For an emergency reboot, when you cannot put the host into maintenance:

1. Kill sanlock (this will cause a reboot in a minute or so).
2. Reboot.

Nir
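One way to check whether the leases were actually released before rebooting is to look at what sanlock still holds. The sketch below assumes that sanlock client status prints one line starting with "s " per joined lockspace; verify that against the output on your own host. The function names count_lockspaces and safe_to_reboot are made up for this example.

```shell
#!/bin/sh
# Count lockspace lines ("s ...") in `sanlock client status` output
# read from stdin. Assumption: one such line per joined lockspace.
count_lockspaces() {
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so the exit status is ignored on purpose.
    grep -c '^s ' || true
}

# Succeeds only if sanlock reports no lockspaces on this host.
# Assumption: sanlock is installed and this runs as root.
safe_to_reboot() {
    status="$(sanlock client status)" || return 2
    n="$(printf '%s\n' "$status" | count_lockspaces)"
    [ "$n" -eq 0 ]
}
```

On a hosted-engine host, step 2 of the procedure above is normally done with hosted-engine --set-maintenance --mode=local. After that and the engine-side maintenance, a call such as safe_to_reboot && reboot would refuse to reboot while lockspaces are still listed.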

On Wed, Mar 2, 2016 at 6:00 PM, Roger Meier <roger.meier@4synergy.com> wrote:
> [...]
> During shutdown I see the following, which takes very long:
> A stop job is running for Shared Storage Lease Manager (23s / 1min 47s)

This is sanlock. Maybe it does not stop because it has active lockspaces, and that delays the shutdown? Did you put the host into maintenance before shutting it down? In maintenance mode, vdsm releases all lockspaces, so sanlock should not delay shutdown in any way.
> [...]
> When I type "service sanlock stop" or "service vdsmd stop" in the server console and then do a poweroff, everything works as expected. The shutdown is then also really fast, as expected.

During poweroff the sanlock service is stopped like any other service, so if it worked from the shell, it should work during shutdown. Stopping vdsm is not needed and should not affect the shutdown.
> [...]
> Does anyone have a solution for this? Is this perhaps a bug that should be reported?

You can file a bug and attach the logs mentioned by Allon; that will help to track this issue. Since this is a problem with ovirt-node, I would open a bug for it. It may also be an issue with the sanlock init scripts.

Nir
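For the bug report, the three logs Allon asked for can be bundled into one archive. This is a minimal sketch; the default paths /var/log/vdsm/vdsm.log and /var/log/sanlock.log are assumptions about where vdsm and sanlock log on a CentOS 7 host, and collect_logs and LOG_FILES are names invented here.

```shell
#!/bin/sh
# Default log locations; assumptions, verify them on your host.
LOG_FILES="${LOG_FILES:-/var/log/vdsm/vdsm.log /var/log/sanlock.log /var/log/messages}"

# Pack all readable files from LOG_FILES into the archive given as $1.
collect_logs() {
    archive="$1"
    existing=""
    for f in $LOG_FILES; do
        if [ -r "$f" ]; then
            existing="$existing $f"
        else
            echo "skipping unreadable file: $f" >&2
        fi
    done
    if [ -z "$existing" ]; then
        echo "no log files found" >&2
        return 1
    fi
    # Word splitting of $existing is intended (paths without spaces).
    tar -czf "$archive" $existing
}
```

Running collect_logs /tmp/ovirt-shutdown-logs.tar.gz as root right after a failed shutdown attempt gives a single file to attach; check the logs for passwords or other sensitive data first.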
participants (4): Allon Mureinik, Joop, Nir Soffer, Roger Meier