[ovirt-users] relationship between sanlock and wdmd services

Gianluca Cecchi gianluca.cecchi at gmail.com
Tue Jul 5 03:25:33 EDT 2016


Hello,
sometimes one needs to keep the hypervisor up (CentOS 7.2 in my
case) but with all oVirt-related services stopped.

I see that the sanlock and wdmd systemd units are both part of the sanlock
rpm package.

In these cases, since I have only a single host in a lab environment, I
follow this comment by Joop:
http://lists.ovirt.org/pipermail/users/2016-June/040214.html

So, I stop all the VMs, put the environment in global maintenance, and then
on the host:
systemctl stop ovirt-ha-agent
systemctl stop ovirt-ha-broker

Then I shut down the engine VM.

On host again:
systemctl stop vdsmd
systemctl stop sanlock.service
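
The host-side steps above could be sketched as a small script. This is only
a sketch of the same sequence, not a fix for the reboot described below:
DRY_RUN, run, stop_ha_services and stop_vdsm_and_sanlock are hypothetical
names I made up for illustration, and by default it only prints the commands.

```shell
# Sketch only: the same stop order as in the steps above.
# DRY_RUN, run(), stop_ha_services() and stop_vdsm_and_sanlock() are
# hypothetical helpers, not part of oVirt, vdsm or sanlock.
DRY_RUN="${DRY_RUN:-1}"   # 1 (default): only print what would be run

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Step 1: stop the hosted-engine HA services.
stop_ha_services() {
    for unit in ovirt-ha-agent ovirt-ha-broker; do
        run systemctl stop "$unit" || return 1
    done
}

# Step 2 (only after the engine VM has been shut down): stop vdsmd, then sanlock.
stop_vdsm_and_sanlock() {
    for unit in vdsmd sanlock.service; do
        run systemctl stop "$unit" || return 1
    done
}
```

The idea would be to call stop_ha_services, shut down the engine VM by hand,
and then call stop_vdsm_and_sanlock, setting DRY_RUN=0 only once the printed
commands look right.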

At this point sometimes I can work, but sometimes the host restarts itself
after a few minutes, I presume due to wdmd.

In fact I see in messages:
Jul  4 17:05:47 ractor wdmd[1258]: test failed rem 26 now 804 ping 760 close 770 renewal 697 expire 777 client 1285 sanlock_2025c2ea-6205-4bc1-b29d-745b47f8f806:1
Jul  4 17:05:48 ractor wdmd[1258]: test failed rem 25 now 805 ping 760 close 770 renewal 697 expire 777 client 1285 sanlock_2025c2ea-6205-4bc1-b29d-745b47f8f806:1
Jul  4 17:05:49 ractor wdmd[1258]: test failed rem 24 now 806 ping 760 close 770 renewal 697 expire 777 client 1285 sanlock_2025c2ea-6205-4bc1-b29d-745b47f8f806:1
Jul  4 17:05:50 ractor wdmd[1258]: test failed rem 23 now 807 ping 760 close 770 renewal 697 expire 777 client 1285 sanlock_2025c2ea-6205-4bc1-b29d-745b47f8f806:1
Jul  4 17:05:51 ractor wdmd[1258]: test failed rem 22 now 808 ping 760 close 770 renewal 697 expire 777 client 1285 sanlock_2025c2ea-6205-4bc1-b29d-745b47f8f806:1
Jul  4 17:05:51 ractor systemd[1]: wdmd.service stop-sigterm timed out. Killing.
Jul  4 17:05:51 ractor systemd[1]: wdmd.service: main process exited, code=killed, status=9/KILL
Jul  4 17:05:51 ractor systemd[1]: Stopped Watchdog Multiplexing Daemon.
Jul  4 17:05:51 ractor systemd[1]: Unit wdmd.service entered failed state.
Jul  4 17:05:51 ractor systemd[1]: wdmd.service failed.
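
The counters in the wdmd lines can be compared with a little arithmetic. A
minimal sketch, assuming each wdmd entry is passed as one complete line:
wdmd_fields is a hypothetical helper, and treating all values as seconds on
the same clock is my reading of the numbers, not something taken from the
wdmd documentation.

```shell
# Sketch: pull the counters out of one wdmd "test failed" line.
# wdmd_fields is a hypothetical helper; the meaning of the fields is
# inferred from the log above, not from wdmd documentation.
wdmd_fields() {
    # $1: one complete wdmd log line
    echo "$1" | awk '{
        # Each counter is printed as "<name> <value>", so look for the
        # name and take the following field.
        for (i = 1; i < NF; i++) {
            if ($i == "rem")     rem  = $(i + 1)
            if ($i == "now")     now  = $(i + 1)
            if ($i == "close")   cls  = $(i + 1)
            if ($i == "renewal") ren  = $(i + 1)
            if ($i == "expire")  expi = $(i + 1)
        }
        printf "since_renewal=%d past_expire=%d rem_plus_since_close=%d\n",
               now - ren, now - expi, rem + (now - cls)
    }'
}
```

For the first entry (rem 26, now 804) this gives since_renewal=107,
past_expire=27 and rem_plus_since_close=60. The last value is 60 for all
five entries, which would be consistent with rem counting down from 60
starting at the close value, but that is only my interpretation.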

In the systemd unit file for sanlock:
[Unit]
Description=Shared Storage Lease Manager
After=syslog.target
Wants=wdmd.service

The wdmd unit file, by contrast, contains nothing special.
I also tried to stop wdmd, but the server still rebooted.
Also, it seems to me that sometimes sanlock is able to stop and sometimes
it exits with "failed".

So my question is whether wdmd can be stopped at all, or whether it behaves
like the old watchdogd on Linux.

Thanks in advance,
Gianluca