[ovirt-users] where to get information about wdmd, sanlock logic

Nir Soffer nsoffer at redhat.com
Sun Sep 4 14:04:04 UTC 2016


On Sat, Sep 3, 2016 at 1:26 PM, Gianluca Cecchi
<gianluca.cecchi at gmail.com> wrote:
>
> Hello,
> is wdmd similar to the old watchdog daemon that was used to kill node (eg in old Oracle 9i RAC environments on Linux)?
> Is it stoppable at all  without having host reboot itself?

Only if no lease is held (see below).

> I see many other daemons on the hypervisor but not so clear the correlation between them and inter-dependency
> Eg
> momd, vdsmd, supervdsmd
> or
> ovirt-ha-agent, ovirt-habroker

supervdsmd is a helper service running as root, used by vdsm to perform
operations that require privileges. It is started by systemd when vdsmd
is started. If you stop it while vdsmd is running, many vdsm operations
will fail until you start it again.

In 4.0 we also have ovirt-imageio-daemon - this is a service used to upload
or download images using http. It is started when vdsmd starts. If you stop
it during an upload/download, the operation will fail, and you will have to
resume it via engine. You will not be able to upload/download images while
the service is stopped.

Adding Martin to explain momd, ovirt-ha-agent and ovirt-ha-broker.

> Suppose I have an environment with only one host up (eg single host environment but not only; think about a site where due to planned maintenance I have to stop all one at a time), do I have a way to put the host in single user mode without having it automatically reboot itself?

Put the host in maintenance via engine, then all services
can be stopped safely.

> Any docs?

We may have something on ovirt.org.

Here is some info regarding vdsm, sanlock and wdmd:

- vdsmd - takes leases on shared storage via sanlock
- sanlock - manages leases on shared storage, uses wdmd to reboot the
  system on fatal failures
- wdmd - used by sanlock to multiplex multiple timeouts onto the single
  host watchdog timer.
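
To make the "multiplexing" idea more concrete, here is a toy model in
Python. It is not wdmd code and does not touch /dev/watchdog; the class
and method names are made up for illustration. The assumption is that
each client registers a deadline it must keep renewing, and the single
host watchdog is only petted while no client deadline has expired.

```python
import time
from typing import Callable, Dict


class WatchdogMultiplexer:
    """Toy model: many client deadlines, one host watchdog (hypothetical)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # client name -> absolute time at which the client is considered dead
        self._deadlines: Dict[str, float] = {}

    def renew(self, client: str, timeout: float) -> None:
        """Client reports it is healthy for the next `timeout` seconds."""
        self._deadlines[client] = self._clock() + timeout

    def release(self, client: str) -> None:
        """Client no longer needs the watchdog (e.g. its lease was released)."""
        self._deadlines.pop(client, None)

    def should_pet(self) -> bool:
        """Pet the host watchdog only if no client deadline has expired."""
        now = self._clock()
        return all(deadline > now for deadline in self._deadlines.values())
```

With no clients registered, should_pet() is always true, which matches
the point that the daemons can only be stopped safely once no lease is
held.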

If sanlock holds a lease on shared storage (the SPM lease on the SPM host,
or a volume lease on the host running hosted engine), you cannot stop
it. If you kill it, the host watchdog will not be petted in time, and it
will reboot the machine.

Stopping/killing wdmd may lead to reboot if the host watchdog is used.

To stop sanlock and wdmd safely (for example for upgrade), you must
put the host in maintenance mode via engine. This will release any lease
on shared storage and stop the host watchdog.
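
If you want to double check that nothing is held before stopping the
daemons, a rough sketch follows. It assumes that "sanlock client status"
prints one line starting with "s" per joined lockspace and one line
starting with "r" per held resource; verify this against the output on
your version. The function names are hypothetical.

```python
import subprocess
from typing import List


def held_leases(status_output: str) -> List[str]:
    """Return lockspace ("s") and resource ("r") entries from the status text.

    Assumes the line format described above; other lines are ignored.
    """
    leases = []
    for line in status_output.splitlines():
        fields = line.split(None, 1)
        if len(fields) == 2 and fields[0] in ("s", "r"):
            leases.append(fields[1].strip())
    return leases


def safe_to_stop_sanlock() -> bool:
    """True if sanlock reports no lockspaces and no resources."""
    result = subprocess.run(
        ["sanlock", "client", "status"],
        capture_output=True,
        text=True,
        check=True,  # raise if sanlock is not running or the command fails
    )
    return not held_leases(result.stdout)
```

This is only a sanity check; putting the host in maintenance via engine
is still the way to release the leases.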

You can find more info in the sanlock manual:
https://fedorahosted.org/sanlock/

Nir


