On Tue, Mar 15, 2016 at 10:28 PM, Joop <jvdwege(a)xs4all.nl> wrote:
On 14-3-2016 10:43, Allon Mureinik wrote:
Odd. Never seen such behavior in any of our set ups.
Can you please include vdsm's logs, sanlock's logs and /var/log/messages?
I have noticed the same behaviour, not on server hardware but on my
workstations, which I use as an oVirt test setup.
One would expect that a shutdown on a host would shut it down cleanly, but
the only way to get that is to run a small script that takes care of:
- service ovirt-ha-agent/broker stop
- shutting down engine if it runs on this host
- service vdsmd stop
- service sanlock stop (takes quite a bit of time (~2min?))
- umount whatever is needed
- service nfs stop
- shutdown
This powers off my host, which normally runs my hosted engine, every time.
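
The sequence listed above could be sketched as a small shell script. This is
only an illustration, not the actual script from this thread: the run helper,
the DRY_RUN variable and the clean_shutdown function name are made up for the
example, using hosted-engine --vm-shutdown for the "shutting down engine" step
is an assumption, and the umount step is left as a comment because the mount
points are site-specific.

```shell
#!/bin/sh
# Sketch of the manual shutdown sequence; names here are illustrative.
# DRY_RUN=1 (the default) only prints the commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

clean_shutdown() {
    run service ovirt-ha-agent stop
    run service ovirt-ha-broker stop
    # Assumption: only needed if the engine VM runs on this host.
    run hosted-engine --vm-shutdown
    run service vdsmd stop
    # Reported above to take quite a bit of time (~2 min).
    run service sanlock stop
    # umount whatever is needed here (site-specific, not sketched).
    run service nfs stop
    run shutdown -h now
}
```

Calling clean_shutdown with the default DRY_RUN=1 prints the seven commands in
order; with DRY_RUN=0, as root, it would execute them.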
Sanlock seems to be indirectly the problem. wdmd(?) (watchdog daemon) seems
able to keep the host from powering off; most of the time the result is
a reboot or a hang at 'powering off'.
I spent quite a bit of time looking into logs but have not been able to find
anything conclusive. It could be that I don't know which log to look at or
how to dig up enough info to find the root cause.
The issue is probably sanlock - it will refuse to stop if it is maintaining
lockspaces on shared storage. If you kill sanlock, the machine watchdog will
trigger a reboot after a minute or so. This behavior is by design and is what
allows oVirt to use locks on shared storage, used for SPM, the hosted engine
HA agent, and the hosted engine VM.
To shut down or reboot a hypervisor, you should release the sanlock leases on
shared storage.
The process is:
1. Put the hypervisor in maintenance mode via engine
   (This will migrate VMs to another hypervisor)
2. Put the hosted engine ha server in local maintenance mode
3. Reboot
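
A minimal sketch of steps 2 and 3 from the command line, assuming step 1 has
already been done in the engine. The run helper, DRY_RUN, graceful_reboot and
count_lockspaces are hypothetical names for this example. The lockspace check
assumes that "sanlock client status" prints one line starting with "s " per
held lockspace; verify that against the output on your own host.

```shell
#!/bin/sh
# Sketch only; DRY_RUN=1 (the default) prints commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Count lockspace lines in `sanlock client status` output read from stdin.
# Assumption: each held lockspace is reported on a line starting with "s ".
count_lockspaces() {
    grep -c '^s ' || true
}

graceful_reboot() {
    # Step 1 (host maintenance via engine) is done beforehand, not here.
    run hosted-engine --set-maintenance --mode=local
    run reboot
}
```

Before rebooting, "sanlock client status | count_lockspaces" gives a quick
view of how many lockspaces the host still holds.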
For an emergency reboot, when you cannot put the host into maintenance:
1. Kill sanlock
(This will cause a reboot in a minute or so)
2. Reboot
Nir