Couldn't connect to VDSM within 60 seconds

Hi,

we have 3 hosts and a self-hosted engine VM (oVirt version 4.4). After rebooting all the hosts, we are unable to start the hosted-engine VM. In particular, the output of 'hosted-engine --vm-start' is as follows:

"The hosted engine configuration has not been retrieved from shared storage yet, for more details please check sanlock status."

We checked the status of sanlock, ovirt-ha-agent and ovirt-ha-broker, and all are "active (running)".

The log files all share the common error "RuntimeError: Couldn't connect to VDSM within 60 seconds", repeated in a loop. In vdsm.log we found this:

2022-04-05 16:20:11,786+0200 INFO (periodic/0) [vdsm.api] START repoStats(domains=()) from=internal, task_id=8541000f-e7fd-4b59-8ae6-522c87538688 (api:48)
2022-04-05 16:20:11,786+0200 INFO (periodic/0) [vdsm.api] FINISH repoStats return={} from=internal, task_id=8541000f-e7fd-4b59-8ae6-522c87538688 (api:54)
2022-04-05 16:20:11,789+0200 WARN (periodic/0) [root] Failed to retrieve Hosted Engine HA info, is Hosted Engine setup finished? (api:168)

We tried googling to resolve this issue but without success. Can someone help us solve this critical issue?

Bests,
Pasquale

What is the status of vdsmd.service & supervdsmd.service?

Best Regards,
Strahil Nikolov

On Tue, Apr 5, 2022 at 17:39, pasquale.borrelli--- via Users <users@ovirt.org> wrote:
> Hi, we have 3 hosts and a self-hosted engine VM (ovirt version 4.4.). After rebooting all the hosts, we are unable to start the VM hosted-engine. [...]

_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-leave@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/X7S47EIM4U46ST...
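For readers following along, the status question above can be answered for several units at once. The following is only a sketch: the helper names `report_not_active` and `check_units` are made up for illustration, and the unit list in the usage note is an assumption based on the services named in this thread. It relies on `systemctl is-active`, which prints a unit's state (for example `active`, `inactive` or `failed`).

```shell
# Print every "unit state" input line whose state is not "active".
report_not_active() {
    awk '$2 != "active" { print $1 ": " $2 }'
}

# Ask systemd for the state of each unit given as an argument
# and list only the ones that are not active.
check_units() {
    for unit in "$@"; do
        printf '%s %s\n' "$unit" "$(systemctl is-active "$unit")"
    done | report_not_active
}
```

On a host it could be called as `check_units vdsmd supervdsmd sanlock ovirt-ha-broker ovirt-ha-agent`; no output would mean all of them reported `active` at that moment. A service that is crashing and being restarted in a loop can still report `active` when sampled, so this does not replace reading the logs.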

Is there anyone who can provide any suggestions? Unfortunately, we have several urgent matters pending :(

Best regards,
Pasquale

Have you checked that your storage is mounted? Have you tried restarting vdsmd & supervdsmd? It's hard to guess based on the info provided.

Best Regards,
Strahil Nikolov

On Thu, Apr 7, 2022 at 20:06, pasquale.borrelli--- via Users <users@ovirt.org> wrote:
> Is there anyone who can provide any suggestions? Unfortunately we have several urgencies :(
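A rough way to check the "is the storage mounted" question is sketched below. It assumes that storage domains are mounted under `/rhev/data-center/mnt/`, which is the usual location on oVirt hosts but is not stated anywhere in this thread; the function name `list_storage_mounts` is invented for the example.

```shell
# List mount points under the (assumed) oVirt storage mount root.
# $1: mount table to read; defaults to /proc/mounts on a live host.
list_storage_mounts() {
    awk '$2 ~ "^/rhev/data-center/mnt/" { print $2 }' "${1:-/proc/mounts}"
}
```

Running `list_storage_mounts` with no argument on a host prints the matching mount points; empty output would suggest that no storage domain is currently mounted there. The restart mentioned above would typically be `systemctl restart supervdsmd vdsmd`, run as root.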

On Tue, Apr 5, 2022 at 16:42, pasquale.borrelli--- via Users <users@ovirt.org> wrote:

> Hi,
> we have 3 hosts and a self-hosted engine VM (ovirt version 4.4.).

Hi Pasquale,
can you please provide the output of

```
dnf -q list installed centos-release\* ovirt-release\* ovirt-engine redhat-release vdsm glusterfs
```

"ovirt version 4.4" is not enough to identify the version you're running.

Thanks,
--
Sandro Bonazzola
MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV
Red Hat EMEA <https://www.redhat.com/>
sbonazzo@redhat.com
*Red Hat respects your work life balance. Therefore there is no need to answer this email out of your office hours.*

Dear Marco,
the information you requested is as follows:

glusterfs.x86_64 8.6-2.el8s @ovirt-4.4-centos-gluster8
ovirt-release44.noarch 4.4.10.2-1.el8 @@commandline
vdsm.x86_64 4.40.100.2-1.el8 @ovirt-4.4
centos-release 8.5.2111

In addition, this is the output of /var/log/messages (in a loop):

Apr 8 10:14:51 localhost journal[50186]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.broker.Broker ERROR Failed initializing the broker: Couldn't connect to VDSM within 60 seconds
Apr 8 10:14:51 localhost journal[50186]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.broker.Broker ERROR Traceback (most recent call last):#012 File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/broker.py", line 64, in run#012 self.storage_broker_instance = self._get_storage_broker()#012 File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/broker.py", line 143, in _get_storage_broker#012 return storage_broker.StorageBroker()#012 File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 97, in __init__#012 self._backend.connect()#012 File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 370, in connect#012 connection = util.connect_vdsm_json_rpc(logger=self._logger)#012 File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/util.py", line 474, in connect_vdsm_json_rpc#012 __vdsm_json_rpc_connect(logger, timeout)#012 File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/util.py", line 415, in __vdsm_json_rpc_connect#012 timeout=VDSM_MAX_RETRY * VDSM_DELAY#012RuntimeError: Couldn't connect to VDSM within 60 seconds
Apr 8 10:14:51 localhost journal[50186]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.broker.Broker ERROR Trying to restart the broker
Apr 8 10:14:51 localhost platform-python[50186]: detected unhandled Python exception in '/usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker'
Apr 8 10:14:52 localhost systemd[1]: ovirt-ha-broker.service: Main process exited, code=exited, status=1/FAILURE
Apr 8 10:14:52 localhost systemd[1]: ovirt-ha-broker.service: Failed with result 'exit-code'.
Apr 8 10:14:52 localhost abrt-server[50224]: Deleting problem directory Python3-2022-04-08-10:14:52-50186 (dup of Python3-2022-04-05-10:41:27-1439)
Apr 8 10:14:52 localhost systemd[1]: ovirt-ha-broker.service: Service RestartSec=100ms expired, scheduling restart.
Apr 8 10:14:52 localhost systemd[1]: ovirt-ha-broker.service: Scheduled restart job, restart counter is at 980.
Apr 8 10:14:52 localhost systemd[1]: Stopped oVirt Hosted Engine High Availability Communications Broker.
Apr 8 10:14:52 localhost systemd[1]: Started oVirt Hosted Engine High Availability Communications Broker.
Apr 8 10:14:52 localhost vdsm[47879]: WARN Unrecognized protocol: b'\x16\x03\x01\x02\x00\x01\x00\x01\xfc\x03\x03'
Apr 8 10:14:53 localhost vdsm[47879]: WARN Unrecognized protocol: b'\x16\x03\x01\x02\x00\x01\x00\x01\xfc\x03\x03'
Apr 8 10:14:54 localhost vdsm[47879]: WARN Unrecognized protocol: b'\x16\x03\x01\x02\x00\x01\x00\x01\xfc\x03\x03'
Apr 8 10:14:54 localhost vdsm[47879]: WARN Unrecognized protocol: b'\x16\x03\x01\x02\x00\x01\x00\x01\xfc\x03\x03'
Apr 8 10:14:55 localhost vdsm[47879]: WARN Unrecognized protocol: b'\x16\x03\x01\x02\x00\x01\x00\x01\xfc\x03\x03'
Apr 8 10:14:55 localhost vdsm[47879]: WARN Unrecognized protocol: b'\x16\x03\x01\x02\x00\x01\x00\x01\xfc\x03\x03'
Apr 8 10:14:56 localhost vdsm[47879]: WARN Unrecognized protocol: b'\x16\x03\x01\x02\x00\x01\x00\x01\xfc\x03\x03'
Apr 8 10:14:56 localhost vdsm[47879]: WARN Unrecognized protocol: b'\x16\x03\x01\x02\x00\x01\x00\x01\xfc\x03\x03'
Apr 8 10:14:57 localhost vdsm[47879]: WARN Unrecognized protocol: b'\x16\x03\x01\x02\x00\x01\x00\x01\xfc\x03\x03'
Apr 8 10:14:57 localhost vdsm[47879]: WARN Unrecognized protocol: b'\x16\x03\x01\x02\x00\x01\x00\x01\xfc\x03\x03'
Apr 8 10:14:58 localhost vdsm[47879]: WARN Unrecognized protocol: b'\x16\x03\x01\x02\x00\x01\x00\x01\xfc\x03\x03'
Apr 8 10:14:58 localhost vdsm[47879]: WARN Unrecognized protocol: b'\x16\x03\x01\x02\x00\x01\x00\x01\xfc\x03\x03'

It seems that both ovirt-ha-broker and ovirt-ha-agent keep trying to restart :(

Thank you!
Pasquale
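A side note on the repeated "Unrecognized protocol" warning: the quoted bytes follow the layout of a TLS record header. Byte `0x16` is the handshake content type, `03 01` is the record-layer version, `02 00` is the record length, and the next byte `01` marks a ClientHello. The sketch below decodes that prefix from the text as it appears in the log; the function name `describe_first_bytes` is made up for illustration and only looks at the first few bytes.

```shell
# Classify the leading bytes of a connection as printed in the log.
# $1: bytes in the log's notation, e.g. '\x16\x03\x01\x02\x00\x01'
describe_first_bytes() {
    # Drop the "\x" prefixes so only hex digits remain.
    hex=$(printf '%s' "$1" | sed 's/\\x//g')
    content_type=$(printf '%s' "$hex" | cut -c1-2)     # byte 0
    record_version=$(printf '%s' "$hex" | cut -c3-6)   # bytes 1-2
    handshake_type=$(printf '%s' "$hex" | cut -c11-12) # byte 5

    if [ "$content_type" = "16" ] && [ "$handshake_type" = "01" ]; then
        printf 'TLS ClientHello (record version bytes %s)\n' "$record_version"
    elif [ "$content_type" = "16" ]; then
        printf 'TLS handshake record (record version bytes %s)\n' "$record_version"
    else
        printf 'not a TLS handshake record\n'
    fi
}
```

One possible reading, which nobody in the thread confirms, is that a client is opening a TLS connection to a VDSM listener that is not treating the connection as TLS, for instance because of a mismatch in the SSL configuration or a certificate problem. That is an assumption to verify on the host, not a diagnosis.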

Hi Sandro,
Do you have any advice on this issue? It is very critical for our research activity since all our VMs suddenly became unavailable :-(

On Mon, Apr 11, 2022 at 13:40, Marco Aiello <marcoaiello1978@gmail.com> wrote:

> Hi Sandro, Do you have any advice on this issue? It is very critical for our research activity since all our VMs suddenly became unavailable :-(

You didn't mention it, but from the info provided I guess you're running a hyperconverged self-hosted engine, right? Did you check the glusterfs storage status? I'm not a glusterfs expert, but something like `gluster volume status all` should work.

If you're not running hyperconverged, can you share a sosreport from one of the hosts? Please check the content of the report before submitting it to ensure it doesn't contain sensitive information.

Thanks,
--
Sandro Bonazzola
participants (4)
- Marco Aiello
- pasquale.borrelli@synlab.it
- Sandro Bonazzola
- Strahil Nikolov