Hi Didi,
> Can you please check/share also broker.log? Thanks.
I did that. Turns out that ...
ovirt_hosted_engine_ha.lib.storage_backends.BackendFailureException: path to storage
domain e1f61a9f-0c93-4d01-8f6f-7f8a5470ee2f not found in /rhev/data-center/mnt/glusterSD
... and I noticed that the glusterd service was not started on host3 (the unit's vendor preset was set to "disabled"). After starting glusterd, the ovirt-ha-agent services recovered and the hosted-engine could be started.
While I was switching host3 into maintenance, I did not notice that the hosted-engine had marked host1 as "non-responsive" (although the host was fine) and had scheduled the migration of host1's VMs to host3. Setting host3 to maintenance cancelled the scheduled migrations, but two VMs had already been migrated, so they were migrated (back) to host2.
Now this is the result:
VM xyz is down with error. Exit message: internal error: process exited while connecting
to monitor: 2021-12-28T06:33:19.011352Z qemu-kvm: -blockdev
{"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-1-storage","backing":null}:
qcow2: Image is corrupt; cannot be opened read/write.
12/28/21 7:33:21 AM
Trying to repair the image with "qemu-img check -r all" failed.
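The repair attempt looked roughly like this (the image path is a placeholder, not the real one):

  # ideally run against a copy of the image first;
  # "-r leaks" only repairs leaked clusters, "-r all" also tries to fix other inconsistencies
  qemu-img check -r all /path/to/disk-image.qcow2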
What an experience. Maybe I'm too stupid for this.