On Sat, Sep 3, 2016 at 1:26 PM, Gianluca Cecchi
<gianluca.cecchi(a)gmail.com> wrote:
> Hello,
> is wdmd similar to the old watchdog daemon that was used to kill nodes (e.g. in
> old Oracle 9i RAC environments on Linux)?
> Can it be stopped at all without the host rebooting itself?
Only if no lease is held (see below).
> I see many other daemons on the hypervisor, but the relationships and
> interdependencies between them are not clear to me, e.g.
> momd, vdsmd, supervdsmd
> or
> ovirt-ha-agent, ovirt-ha-broker
supervdsmd is a helper service running as root, used by vdsm to perform
operations that require privileges. It is started by systemd when vdsmd is
started. If you stop it while vdsm is running, many vdsm operations will fail
until you start it again.
In 4.0 we also have ovirt-imageio-daemon, a service used to upload or
download images over HTTP. It is started when vdsmd starts. If you stop it
during an upload or download, the operation will fail and you will have to
resume it via the engine. While the service is stopped, you will not be able
to upload or download images.
Adding Martin to explain momd, ovirt-ha-agent and ovirt-ha-broker.
> Suppose I have an environment with only one host up (e.g. a single-host
> environment, but not only; think of a site where, due to planned maintenance,
> I have to stop all hosts one at a time). Is there a way to put the host in
> single-user mode without it automatically rebooting itself?
Put the host into maintenance mode via the engine; then all services
can be stopped safely.
> Any docs?
We may have something on ovirt.org.
Here is some info regarding vdsm, sanlock and wdmd:
- vdsmd: takes leases on shared storage via sanlock
- sanlock: manages leases on shared storage; uses wdmd to reboot the
  system on fatal failures
- wdmd: used by sanlock to multiplex multiple timeouts onto the host
  watchdog timer
If sanlock holds a lease on shared storage (the SPM lease on the SPM host,
or a volume lease on the host running the hosted engine), you cannot stop
it. If you kill it, it will not pet the host watchdog in time, and the host
watchdog will reboot the machine.
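To make the sanlock/wdmd relationship concrete, here is a toy Python model of a daemon that multiplexes several client timeouts onto one host watchdog. It is a hypothetical sketch, not wdmd's actual code or interface: the `WatchdogMultiplexer` class, its method names and all timing values (10 s ticks, 80 s client timeout, 60 s hardware timeout) are illustrative assumptions. The hardware watchdog is petted only while every registered client is within its deadline. When a lease-holding client such as sanlock stops renewing, for example because it was killed, the petting stops and the watchdog eventually resets the host.

```python
from dataclasses import dataclass, field


@dataclass
class WatchdogMultiplexer:
    """Toy model of a wdmd-like daemon (hypothetical, not wdmd's real API)."""

    hw_timeout: float  # seconds until the simulated hardware watchdog fires
    deadlines: dict = field(default_factory=dict)  # client name -> deadline
    hw_expiry: float = 0.0  # time at which the hardware watchdog would fire

    def renew(self, client: str, now: float, timeout: float) -> None:
        """A client promises it is healthy until now + timeout."""
        self.deadlines[client] = now + timeout

    def unregister(self, client: str) -> None:
        """A client with nothing left to protect stops being tracked."""
        self.deadlines.pop(client, None)

    def tick(self, now: float) -> bool:
        """Pet the hardware watchdog only if every client is within its deadline."""
        if all(deadline > now for deadline in self.deadlines.values()):
            self.hw_expiry = now + self.hw_timeout
            return True
        return False

    def fired(self, now: float) -> bool:
        """True once the simulated hardware watchdog would reset the host."""
        return now >= self.hw_expiry


if __name__ == "__main__":
    wd = WatchdogMultiplexer(hw_timeout=60.0)
    wd.renew("sanlock", now=0.0, timeout=80.0)  # sanlock renews once, then is "killed"
    for t in range(0, 300, 10):
        wd.renew("other-client", now=t, timeout=80.0)  # a healthy client keeps renewing
        wd.tick(t)
        if wd.fired(t):
            print(f"host reset at t={t}s")  # prints "host reset at t=130s"
            break
```

The healthy client does not keep the host alive. A single expired deadline is enough to stop the petting, and that is what multiplexing onto one watchdog implies. The last pet happens at t=70 s, the last tick before sanlock's 80 s deadline, and the 60 s hardware timeout runs out from there.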
Stopping or killing wdmd may lead to a reboot if the host watchdog is in use.
To stop sanlock and wdmd safely (for example, for an upgrade), you must
put the host into maintenance mode via the engine. This releases all leases
on shared storage and stops the host watchdog.
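The rules above can be summarized as a small decision function. This is a hypothetical Python sketch that encodes only what this message states. The `HostState` fields and the function names are invented for illustration. In practice, the engine moves the host into maintenance; no script does that for you.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class HostState:
    """Hypothetical snapshot of the lease/watchdog situation on one host."""

    leases_held: int       # e.g. SPM lease, hosted-engine volume lease
    watchdog_in_use: bool  # wdmd is petting the host watchdog


def safe_to_stop(service: str, host: HostState) -> bool:
    """Return True if stopping `service` should not trigger a watchdog reboot."""
    if service == "sanlock":
        # sanlock must keep renewing its leases; if it stops, the watchdog fires.
        return host.leases_held == 0
    if service == "wdmd":
        # Stopping wdmd may lead to a reboot while the host watchdog is in use.
        return not host.watchdog_in_use
    raise ValueError(f"service not covered by this sketch: {service}")


def enter_maintenance(host: HostState) -> HostState:
    """Model the effect of maintenance mode (done via the engine):
    all leases on shared storage are released and the watchdog is stopped."""
    return replace(host, leases_held=0, watchdog_in_use=False)


if __name__ == "__main__":
    spm_host = HostState(leases_held=1, watchdog_in_use=True)
    print(safe_to_stop("sanlock", spm_host))  # False
    host = enter_maintenance(spm_host)
    print(safe_to_stop("sanlock", host), safe_to_stop("wdmd", host))  # True True
```

Notice that the safety checks themselves never change. Maintenance mode only changes the host state they inspect, and that matches the advice to go through the engine instead of stopping the daemons directly.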
You can find more info in the sanlock manual:
https://fedorahosted.org/sanlock/
Nir