Hi Didi,
> Can you please check/share also broker.log? Thanks.
I did that. Turns out that ...
ovirt_hosted_engine_ha.lib.storage_backends.BackendFailureException: path to storage
domain e1f61a9f-0c93-4d01-8f6f-7f8a5470ee2f not found in /rhev/data-center/mnt/glusterSD
... and I noticed that the glusterd service was not started on host3 (the unit's vendor preset was set to "disabled"). After starting glusterd, the ovirt-ha-agent services recovered and the hosted-engine could be started.
While I was switching host3 into maintenance, I did not notice that the hosted-engine had marked host1 as "non-responsive" (although the host was fine) and had scheduled the migration of host1's VMs to host3. Setting host3 to maintenance cancelled the scheduled migrations, but two VMs had already been migrated, so they were migrated (back) to host2.
Now this is the result:
VM xyz is down with error. Exit message: internal error: process exited while connecting
to monitor: 2021-12-28T06:33:19.011352Z qemu-kvm: -blockdev
{"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-1-storage","backing":null}:
qcow2: Image is corrupt; cannot be opened read/write.
12/28/21 7:33:21 AM
Trying to repair the image with "qemu-img check -r all" failed.
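The repair attempt looked roughly like this (the image path is a placeholder, not the real one):

  # ideally run against a copy of the image first;
  # "-r leaks" only repairs leaked clusters, "-r all" also tries to fix other inconsistencies
  qemu-img check -r all /path/to/disk-image.qcow2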
What an experience. Maybe I'm too stupid for this.