
On Tue, Mar 15, 2016 at 10:28 PM, Joop <jvdwege@xs4all.nl> wrote:
On 14-3-2016 10:43, Allon Mureinik wrote:
Odd. Never seen such behavior in any of our setups. Can you please include vdsm's logs, sanlock's logs, and /var/log/messages?
I have noticed the same behaviour, though not on server hardware but on the workstations I use as an oVirt test setup. One would expect that a shutdown on a host would shut it down cleanly, but the only way I get that is to run a small script (a rough sketch follows below) that takes care of:
- service ovirt-ha-agent/broker stop
- shutting down the engine if it runs on this host
- service vdsmd stop
- service sanlock stop (takes quite a bit of time, ~2 min?)
- umount whatever is needed
- service nfs stop
- shutdown
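Roughly, such a script might look like the following. This is only a sketch: the service names assume EL6-style init scripts, the hosted-engine command is the usual way to stop the engine VM but may not match every setup, and the mount points depend on your storage.

    #!/bin/bash
    # Stop the hosted engine HA agent and broker
    service ovirt-ha-agent stop
    service ovirt-ha-broker stop

    # If the engine VM runs on this host, ask it to shut down cleanly
    # (check where it runs with: hosted-engine --vm-status)
    hosted-engine --vm-shutdown

    # Stop vdsm, then sanlock (sanlock can take around 2 minutes to stop)
    service vdsmd stop
    service sanlock stop

    # Unmount whatever storage is still mounted (depends on the setup)
    # umount /rhev/data-center/mnt/<storage domain mount>

    # Stop NFS and power off
    service nfs stop
    shutdown -h now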
Running that powers off my host, which normally runs my hosted engine, every time. Sanlock seems to be indirectly the problem. wdmd (the watchdog daemon) seems able to keep the host from powering off; most of the time it results in a reboot, or in hanging at 'powering off'.
I have spent quite a bit of time looking into the logs but have not been able to find anything conclusive; that could be my own problem of not knowing which log to look at, or of not digging up enough information to find the root cause.
The issue is probably sanlock - it will refuse to stop if it is maintaining lockspaces on shared storage. If you kill sanlock, the machine's watchdog will trigger a reboot after a minute or so. This behavior is by design; it is what allows oVirt to use locks on shared storage, used for SPM, the hosted engine HA agent, and the hosted engine VM.

To shut down or reboot a hypervisor, you should first release the sanlock leases on shared storage. The process is:

1. Put the hypervisor in maintenance mode via the engine. This will migrate its VMs to another hypervisor.
2. Put the host's hosted engine HA agent in local maintenance mode.
3. Reboot.

For an emergency reboot, when you cannot put the host into maintenance:

1. Kill sanlock (this will cause a reboot in a minute or so).
2. Reboot.

Nir
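For reference, a minimal sketch of what the clean procedure above looks like in practice, assuming the hosted-engine CLI shipped with ovirt-hosted-engine-setup; step 1 itself is performed from the engine side rather than on the host:

    # Step 1: in the engine web admin UI, select the host and click Maintenance
    # (or use the engine REST API); wait until its VMs have migrated away.

    # Step 2, on the host itself: put the hosted engine HA agent into
    # local maintenance mode.
    hosted-engine --set-maintenance --mode=local

    # Step 3: reboot (or shut down) the host.
    reboot

    # After the host is back up, return the HA agent to normal operation:
    hosted-engine --set-maintenance --mode=none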