lockspace the command just executes silently, and I'm not sure whether it is doing anything at all.
I also tried to clean the metadata. On one host it completed correctly; on the second host it always fails with the following messages:
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: PENDING
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: PENDING
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: PENDING
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: PENDING
ERROR:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Failed to start monitoring domain (sd_uuid=4a7f8717-9bb0-4d80-8016-498fa4b88162, host_id=2): timeout during domain acquisition
ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 191, in _run_agent
    return action(he)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 67, in action_clean
    return he.clean(options.force_cleanup)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 345, in clean
    self._initialize_domain_monitor()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 829, in _initialize_domain_monitor
    raise Exception(msg)
Exception: Failed to start monitoring domain (sd_uuid=4a7f8717-9bb0-4d80-8016-498fa4b88162, host_id=2): timeout during domain acquisition
ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Trying to restart agent
WARNING:ovirt_hosted_engine_ha.agent.agent.Agent:Restarting agent, attempt '0'
ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Too many errors occurred, giving up. Please review the log and consider filing a bug.
INFO:ovirt_hosted_engine_ha.agent.agent.Agent:Agent shutting down
I'm not an expert when it comes to reading sanlock output, but it looks a bit strange to me:
From the first host (host_id=2):
[root@ovirt1 ~]# sanlock client status
p -1 helper
p -1 listener
p -1 status
p 3763
p 62861 quaggaVM
p 63111 powerDNS
p 107818 pjsip_freepbx_14
p 109092 revizorro_dev
p 109589 routerVM
s hosted-engine:2:/var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-498fa4b88162/093faa75-5e33-4559-84fa-1f1f8d48153b/911c7637-b49d-463e-b186-23b404e50769:0
s a40cc3a9-54d6-40fd-acee-525ef29c8ce3:2:/rhev/data-center/mnt/glusterSD/ovirt2.telia.ru\:_data/a40cc3a9-54d6-40fd-acee-525ef29c8ce3/dom_md/ids:0
s 4a7f8717-9bb0-4d80-8016-498fa4b88162:1:/rhev/data-center/mnt/glusterSD/ovirt2.telia.ru\:_engine/4a7f8717-9bb0-4d80-8016-498fa4b88162/dom_md/ids:0
r a40cc3a9-54d6-40fd-acee-525ef29c8ce3:SDM:/rhev/data-center/mnt/glusterSD/ovirt2.telia.ru\:_data/a40cc3a9-54d6-40fd-acee-525ef29c8ce3/dom_md/leases:1048576:49 p 3763
From the second host (host_id=1):
[root@ovirt2 ~]# sanlock client status
p -1 helper
p -1 listener
p 150440 CentOS-Desk
p 151061 centos-dev-box
p 151288 revizorro_nfq
p 151954 gitlabVM
p -1 status
s hosted-engine:1:/var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-498fa4b88162/093faa75-5e33-4559-84fa-1f1f8d48153b/911c7637-b49d-463e-b186-23b404e50769:0
s a40cc3a9-54d6-40fd-acee-525ef29c8ce3:1:/rhev/data-center/mnt/glusterSD/ovirt2.telia.ru\:_data/a40cc3a9-54d6-40fd-acee-525ef29c8ce3/dom_md/ids:0
s 4a7f8717-9bb0-4d80-8016-498fa4b88162:1:/rhev/data-center/mnt/glusterSD/ovirt2.telia.ru\:_engine/4a7f8717-9bb0-4d80-8016-498fa4b88162/dom_md/ids:0 ADD
I'm not sure whether there is a problem with lockspace 4a7f8717-9bb0-4d80-8016-498fa4b88162, but both hosts show 1 as the host_id there. Is this correct? Shouldn't they have different IDs?
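To make the comparison less error-prone than reading the output by eye, here is a minimal Python sketch that extracts the host_id of every lockspace from `sanlock client status` output and reports any lockspace where two hosts claim the same ID. It assumes each lockspace record is on its own line in the form `s name:host_id:path:offset` (the LOCKSPACE format described in the sanlock man page). The function names and the abbreviated sample strings are only for illustration; they are not part of sanlock or oVirt.

```python
from collections import defaultdict


def parse_lockspaces(status_text):
    """Return {lockspace_name: host_id} from `sanlock client status` output."""
    lockspaces = {}
    for line in status_text.splitlines():
        fields = line.split()
        # Lockspace records start with "s"; skip "p" (process) and "r" (resource).
        if len(fields) < 2 or fields[0] != "s":
            continue
        # Split only on the first two colons: the path itself contains "\:".
        parts = fields[1].split(":", 2)
        if len(parts) < 3:
            continue
        name, host_id = parts[0], parts[1]
        lockspaces[name] = int(host_id)
    return lockspaces


def find_shared_host_ids(per_host):
    """Return {(lockspace, host_id): [hosts]} where more than one host uses the ID."""
    claims = defaultdict(list)
    for host, lockspaces in per_host.items():
        for name, host_id in lockspaces.items():
            claims[(name, host_id)].append(host)
    return {key: sorted(hosts) for key, hosts in claims.items() if len(hosts) > 1}


# Lockspace lines taken from the two outputs above (paths unchanged).
OVIRT1_STATUS = r"""
p -1 helper
s hosted-engine:2:/var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-498fa4b88162/093faa75-5e33-4559-84fa-1f1f8d48153b/911c7637-b49d-463e-b186-23b404e50769:0
s a40cc3a9-54d6-40fd-acee-525ef29c8ce3:2:/rhev/data-center/mnt/glusterSD/ovirt2.telia.ru\:_data/a40cc3a9-54d6-40fd-acee-525ef29c8ce3/dom_md/ids:0
s 4a7f8717-9bb0-4d80-8016-498fa4b88162:1:/rhev/data-center/mnt/glusterSD/ovirt2.telia.ru\:_engine/4a7f8717-9bb0-4d80-8016-498fa4b88162/dom_md/ids:0
"""

OVIRT2_STATUS = r"""
p -1 helper
s hosted-engine:1:/var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-498fa4b88162/093faa75-5e33-4559-84fa-1f1f8d48153b/911c7637-b49d-463e-b186-23b404e50769:0
s a40cc3a9-54d6-40fd-acee-525ef29c8ce3:1:/rhev/data-center/mnt/glusterSD/ovirt2.telia.ru\:_data/a40cc3a9-54d6-40fd-acee-525ef29c8ce3/dom_md/ids:0
s 4a7f8717-9bb0-4d80-8016-498fa4b88162:1:/rhev/data-center/mnt/glusterSD/ovirt2.telia.ru\:_engine/4a7f8717-9bb0-4d80-8016-498fa4b88162/dom_md/ids:0 ADD
"""

per_host = {
    "ovirt1": parse_lockspaces(OVIRT1_STATUS),
    "ovirt2": parse_lockspaces(OVIRT2_STATUS),
}
print(find_shared_host_ids(per_host))
```

On the two outputs above, the only entry reported is lockspace 4a7f8717-9bb0-4d80-8016-498fa4b88162 with host_id 1 on both ovirt1 and ovirt2; the hosted-engine and a40cc3a9 lockspaces use different IDs on each host.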
Once the ha-agent has been started, hosted-engine --vm-status shows 'unknown stale-data' for the second host, and the HE simply doesn't start on the second host at all.
Redeploying the host hasn't helped either.
Any advice on this?
Regards,
Artem