Re: oVirt 4.3.5/6 HC: Reinstall fails from WEB UI

Ssh to the host and check the status of: sanlock.service, supervdsmd.service, vdsmd.service, ovirt-ha-broker.service, ovirt-ha-agent.service.

For example, if sanlock is working but supervdsmd is not, try to restart it. If it fails, run:

systemctl cat supervdsmd.service

and execute the commands in the ExecStartPre and ExecStart sections. Report any issues, then follow the next service in the chain.

Best Regards,
Strahil Nikolov

On Oct 16, 2019 23:52, adrianquintero@gmail.com wrote:
>
> Hi,
> I am trying to re-install a host from the web UI in oVirt 4.3.5, but it always fails and goes to "Setting Host state to Non-Operational"
>
> From the engine.log I see the following WARN/ERROR:
> 2019-10-16 16:32:57,263-04 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-43) [491c8bd9] EVENT_ID: VDS_SET_NONOPERATIONAL_DOMAIN(522), Host host1.example.com cannot access the Storage Domain(s) attached to the Data Center Default-DC1. Setting Host state to Non-Operational.
> 2019-10-16 16:32:57,271-04 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-43) [491c8bd9] EVENT_ID: VDS_ALERT_FENCE_TEST_FAILED(9,001), Power Management test failed for Host host1.example.com.There is no other host in the data center that can be used to test the power management settings.
> 2019-10-16 16:32:57,276-04 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-43) [491c8bd9] EVENT_ID: CONNECT_STORAGE_POOL_FAILED(995), Failed to connect Host host1.example.com to Storage Pool Default-DC1
> 2019-10-16 16:35:06,151-04 ERROR [org.ovirt.engine.core.bll.InitVdsOnUpCommand] (EE-ManagedThreadFactory-engine-Thread-137245) [] Could not connect host 'host1.example.com' to pool 'Default-DC1': Error storage pool connection: (u"spUUID=7d3fb14c-ebf0-11e9-9ee5-00163e05e135, msdUUID=4b87a5de-c976-4982-8b62-7cffef4a22d8, masterVersion=1, hostID=1, domainsMap={u'8c2df9c6-b505-4499-abb9-0d15db80f33e': u'active', u'4b87a5de-c976-4982-8b62-7cffef4a22d8': u'active', u'5d9f7d05-1fcc-4f99-9470-4e57cd15f128': u'active', u'fe24d88e-6acf-42d7-a857-eaf1f8deb24a': u'active'}",)
> 2019-10-16 16:35:06,248-04 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-91) [690baf86] EVENT_ID: VDS_SET_NONOPERATIONAL_DOMAIN(522), Host host1.example.com cannot access the Storage Domain(s) attached to the Data Center Default-DC1. Setting Host state to Non-Operational.
> 2019-10-16 16:35:06,256-04 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-91) [690baf86] EVENT_ID: VDS_ALERT_FENCE_TEST_FAILED(9,001), Power Management test failed for Host host1.example.com.There is no other host in the data center that can be used to test the power management settings.
> 2019-10-16 16:35:06,261-04 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-91) [690baf86] EVENT_ID: CONNECT_STORAGE_POOL_FAILED(995), Failed to connect Host host1.example.com to Storage Pool Default-DC1
> 2019-10-16 16:37:46,011-04 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Connection timeout for host 'host1.example.com', last response arrived 1501 ms ago.
> 2019-10-16 16:41:57,095-04 ERROR [org.ovirt.engine.core.bll.InitVdsOnUpCommand] (EE-ManagedThreadFactory-engine-Thread-137527) [17f3aadd] Could not connect host 'host1.example.com' to pool 'Default-DC1': Error storage pool connection: (u"spUUID=7d3fb14c-ebf0-11e9-9ee5-00163e05e135, msdUUID=4b87a5de-c976-4982-8b62-7cffef4a22d8, masterVersion=1, hostID=1, domainsMap={u'8c2df9c6-b505-4499-abb9-0d15db80f33e': u'active', u'4b87a5de-c976-4982-8b62-7cffef4a22d8': u'active', u'5d9f7d05-1fcc-4f99-9470-4e57cd15f128': u'active', u'fe24d88e-6acf-42d7-a857-eaf1f8deb24a': u'active'}",)
> 2019-10-16 16:41:57,199-04 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-22) [508ddb44] EVENT_ID: VDS_SET_NONOPERATIONAL_DOMAIN(522), Host host1.example.com cannot access the Storage Domain(s) attached to the Data Center Default-DC1. Setting Host state to Non-Operational.
> 2019-10-16 16:41:57,211-04 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-22) [508ddb44] EVENT_ID: VDS_ALERT_FENCE_TEST_FAILED(9,001), Power Management test failed for Host host1.example.com.There is no other host in the data center that can be used to test the power management settings.
> 2019-10-16 16:41:57,216-04 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-22) [508ddb44] EVENT_ID: CONNECT_STORAGE_POOL_FAILED(995), Failed to connect Host host1.example.com to Storage Pool Default-DC1
>
> Any ideas why this might be happening?
> I have researched, however I have not been able to find a solution.
>
> thanks,
>
> Adrian
> _______________________________________________
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-leave@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
> List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/3XAKJ23DSQVGIT...
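[Editorial aside: engine.log excerpts like the one quoted above are easier to triage after tallying the audit EVENT_IDs, so the repeating failure pattern stands out. A minimal, self-contained sketch; the regex is an assumption about the "EVENT_ID: NAME(code)" format of the lines shown, and the sample lines are abbreviated from the log above.]

```python
import re
from collections import Counter

# Matches audit entries like "EVENT_ID: VDS_SET_NONOPERATIONAL_DOMAIN(522)" in engine.log.
EVENT_RE = re.compile(r"EVENT_ID: ([A-Z_]+)\(([\d,]+)\)")

def tally_events(lines):
    """Count how often each audit EVENT_ID occurs in an engine.log excerpt."""
    return Counter(m.group(1) for line in lines for m in EVENT_RE.finditer(line))

# Abbreviated lines from the quoted log.
sample = [
    "... EVENT_ID: VDS_SET_NONOPERATIONAL_DOMAIN(522), Host host1.example.com cannot access ...",
    "... EVENT_ID: VDS_ALERT_FENCE_TEST_FAILED(9,001), Power Management test failed ...",
    "... EVENT_ID: CONNECT_STORAGE_POOL_FAILED(995), Failed to connect Host ...",
    "... EVENT_ID: VDS_SET_NONOPERATIONAL_DOMAIN(522), Host host1.example.com cannot access ...",
]
print(tally_events(sample).most_common())
```

Running it over the full log (one line per list element) shows immediately that every cycle is the same triple: storage-pool connect failure, host set non-operational, fence test warning.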

Strahil, this is what I see for each service. All services are active and running except for ovirt-ha-agent, which stays in "activating". Even though the rest of the services are active/running, they still show a few errors/warnings:

-------------------------------------------------------------------------------------------------------
[root@host1]# systemctl status sanlock; echo ---------; systemctl status supervdsmd; echo -------------; systemctl status vdsmd; echo ---------; systemctl status ovirt-ha-broker; echo ----------; systemctl status ovirt-ha-agent

● sanlock.service - Shared Storage Lease Manager
   Loaded: loaded (/usr/lib/systemd/system/sanlock.service; disabled; vendor preset: disabled)
   Active: active (running) since Thu 2019-10-17 00:47:20 EDT; 3min 14s ago
  Process: 16495 ExecStart=/usr/sbin/sanlock daemon (code=exited, status=0/SUCCESS)
 Main PID: 2023
    Tasks: 10
   CGroup: /system.slice/sanlock.service
           └─2023 /usr/sbin/sanlock daemon

Oct 17 00:47:20 host1.example.com systemd[1]: Starting Shared Storage Lease Manager...
Oct 17 00:47:20 host1.example.com systemd[1]: Started Shared Storage Lease Manager.
Oct 17 00:47:20 host1.example.com sanlock[16496]: 2019-10-17 00:47:20 33920 [16496]: lockfile setlk error /var/run/sanlock/sanlock.pid: Resource temporarily unavailable
---------
● supervdsmd.service - Auxiliary vdsm service for running helper functions as root
   Loaded: loaded (/usr/lib/systemd/system/supervdsmd.service; static; vendor preset: enabled)
   Active: active (running) since Thu 2019-10-17 00:43:06 EDT; 7min ago
 Main PID: 15277 (supervdsmd)
    Tasks: 5
   CGroup: /system.slice/supervdsmd.service
           └─15277 /usr/bin/python2 /usr/share/vdsm/supervdsmd --sockfile /var/run/vdsm/svdsm.sock

Oct 17 00:43:06 host1.example.com systemd[1]: Started Auxiliary vdsm service for running helper functions as root.
Oct 17 00:43:07 host1.example.com supervdsmd[15277]: failed to load module nvdimm: libbd_nvdimm.so.2: cannot open shared object file: No such file or directory
-------------
● vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2019-10-17 00:47:27 EDT; 3min 7s ago
  Process: 16402 ExecStopPost=/usr/libexec/vdsm/vdsmd_init_common.sh --post-stop (code=exited, status=0/SUCCESS)
  Process: 16499 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=0/SUCCESS)
 Main PID: 16572 (vdsmd)
    Tasks: 38
   CGroup: /system.slice/vdsmd.service
           └─16572 /usr/bin/python2 /usr/share/vdsm/vdsmd

Oct 17 00:47:28 host1.example.com vdsm[16572]: WARN MOM not available.
Oct 17 00:47:28 host1.example.com vdsm[16572]: WARN MOM not available, KSM stats will be missing.
Oct 17 00:47:28 host1.example.com vdsm[16572]: ERROR failed to retrieve Hosted Engine HA score '[Errno 2] No such file or directory'Is the Hosted Engine setup finished?
[... the same HA-score ERROR repeats every 15 seconds ...]
Oct 17 00:50:28 host1.example.com vdsm[16572]: ERROR failed to retrieve Hosted Engine HA score '[Errno 2] No such file or directory'Is the Hosted Engine setup finished?
---------
● ovirt-ha-broker.service - oVirt Hosted Engine High Availability Communications Broker
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2019-10-17 00:44:11 EDT; 6min ago
 Main PID: 16379 (ovirt-ha-broker)
    Tasks: 2
   CGroup: /system.slice/ovirt-ha-broker.service
           └─16379 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker

Oct 17 00:44:11 host1.example.com systemd[1]: Started oVirt Hosted Engine High Availability Communications Broker.
----------
● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor preset: disabled)
   Active: activating (auto-restart) (Result: exit-code) since Thu 2019-10-17 00:50:27 EDT; 7s ago
  Process: 17107 ExecStart=/usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent (code=exited, status=157)
 Main PID: 17107 (code=exited, status=157)

Oct 17 00:50:27 host1.example.com systemd[1]: Unit ovirt-ha-agent.service entered failed state.
Oct 17 00:50:27 host1.example.com systemd[1]: ovirt-ha-agent.service failed.
-------------------------------------------------------------------------------------------------------
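[Editorial aside: a dump like the one above can be scanned mechanically for the one unit that is not running. A minimal sketch over the pasted text; the regexes are assumptions about the `systemctl status` layout shown, and the sample is trimmed from the output above.]

```python
import re

# Pulls (unit, state) pairs out of a concatenated `systemctl status` dump.
UNIT_RE = re.compile(r"● (\S+\.service)")
ACTIVE_RE = re.compile(r"Active: (\S+(?: \([^)]+\))?)")

def unit_states(text):
    """Map each unit in the dump to its 'Active:' state string."""
    return dict(zip(UNIT_RE.findall(text), ACTIVE_RE.findall(text)))

sample = """\
● sanlock.service - Shared Storage Lease Manager
   Active: active (running) since Thu 2019-10-17 00:47:20 EDT
● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
   Active: activating (auto-restart) (Result: exit-code) since Thu 2019-10-17 00:50:27 EDT
"""
stuck = [u for u, s in unit_states(sample).items() if not s.startswith("active (running)")]
print(stuck)  # ['ovirt-ha-agent.service']
```

On the full dump this singles out ovirt-ha-agent.service, matching what the poster observed by eye.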

"Host host1.example.com cannot access the Storage Domain(s) attached to the Data Center Default-DC1." Can you check the vdsm logs from this host to check why the storage domains are not attached? On Thu, Oct 17, 2019 at 9:43 AM Strahil <hunter86_bg@yahoo.com> wrote:
Ssh to the host and check the status of: sanlock.service, supervdsmd.service, vdsmd.service, ovirt-ha-broker.service, ovirt-ha-agent.service.
For example, if sanlock is working but supervdsmd is not, try to restart it. If it fails, run: systemctl cat supervdsmd.service
and execute the commands in the ExecStartPre and ExecStart sections.
Report any issues, then follow the next service in the chain.
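[Editorial aside: the Exec* commands that Strahil suggests running by hand can be lifted straight out of the `systemctl cat` output. A minimal sketch; the sample unit text is illustrative, reconstructed from the vdsmd status output elsewhere in this thread, not a literal copy of the shipped unit file. Note systemd's leading "-" on an Exec line means "ignore failure" and is not part of the command.]

```python
import re

# Extract ExecStartPre/ExecStart commands from `systemctl cat <unit>` output.
EXEC_RE = re.compile(r"^(ExecStartPre|ExecStart)=(.*)$", re.M)

def exec_commands(unit_text):
    """Return (directive, command) pairs, stripping systemd's ignore-failure '-' prefix."""
    return [(key, cmd.lstrip("-")) for key, cmd in EXEC_RE.findall(unit_text)]

# Trimmed, illustrative unit text based on the vdsmd status output in this thread.
sample = """\
[Service]
ExecStartPre=-/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start
ExecStart=/usr/bin/python2 /usr/share/vdsm/vdsmd
"""
for key, cmd in exec_commands(sample):
    print(key, "->", cmd)
```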
Best Regards,
Strahil Nikolov

I checked the vdsm.log of the host and see, among other warnings, the following:

2019-10-19 14:21:19,427-0400 ERROR (jsonrpc/6) [root] failed to retrieve Hosted Engine HA score '[Errno 2] No such file or directory'Is the Hosted Engine setup finished? (api:191)
2019-10-19 14:21:19,887-0400 ERROR (jsonrpc/7) [storage.TaskManager.Task] (Task='6507ff29-62c9-4056-b7a2-e4fe9395ec44') Unexpected error (task:875)
2019-10-19 14:21:19,887-0400 ERROR (jsonrpc/7) [storage.Dispatcher] FINISH prepareImage error=Volume does not exist: (u'1ba19dea-32e9-465a-a43c-eec81a88d2e0',) (dispatcher:83)
2019-10-19 14:21:21,894-0400 ERROR (jsonrpc/4) [storage.TaskManager.Task] (Task='dc4820ae-c6e9-46cf-868f-c30c584c3604') Unexpected error (task:875)
2019-10-19 14:21:21,894-0400 ERROR (jsonrpc/4) [storage.Dispatcher] FINISH prepareImage error=Volume does not exist: (u'1ba19dea-32e9-465a-a43c-eec81a88d2e0',) (dispatcher:83)
2019-10-19 14:21:23,892-0400 ERROR (jsonrpc/2) [storage.TaskManager.Task] (Task='c64aa233-18b3-4a13-8791-5fd5ba03774a') Unexpected error (task:875)
2019-10-19 14:21:23,892-0400 ERROR (jsonrpc/2) [storage.Dispatcher] FINISH prepareImage error=Volume does not exist: (u'1ba19dea-32e9-465a-a43c-eec81a88d2e0',) (dispatcher:83)
2019-10-19 14:21:25,925-0400 ERROR (jsonrpc/0) [storage.TaskManager.Task] (Task='3f8e7814-c72f-41af-9969-971d25e42d63') Unexpected error (task:875)
2019-10-19 14:21:25,925-0400 ERROR (jsonrpc/0) [storage.Dispatcher] FINISH prepareImage error=Volume does not exist: (u'1ba19dea-32e9-465a-a43c-eec81a88d2e0',) (dispatcher:83)
2019-10-19 14:21:27,888-0400 ERROR (jsonrpc/3) [storage.TaskManager.Task] (Task='62b5c85e-cf79-4167-bb4e-65620b4cc16a') Unexpected error (task:875)
2019-10-19 14:21:27,888-0400 ERROR (jsonrpc/3) [storage.Dispatcher] FINISH prepareImage error=Volume does not exist: (u'1ba19dea-32e9-465a-a43c-eec81a88d2e0',) (dispatcher:83)
2019-10-19 14:21:29,882-0400 ERROR (jsonrpc/7) [storage.TaskManager.Task] (Task='fd19fa4a-72e0-4a69-bd57-28532af092a4') Unexpected error (task:875)
2019-10-19 14:21:29,882-0400 ERROR (jsonrpc/7) [storage.Dispatcher] FINISH prepareImage error=Volume does not exist: (u'1ba19dea-32e9-465a-a43c-eec81a88d2e0',) (dispatcher:83)
2019-10-19 14:21:30,454-0400 ERROR (periodic/3) [root] failed to retrieve Hosted Engine HA score '[Errno 2] No such file or directory'Is the Hosted Engine setup finished? (api:191)
2019-10-19 14:21:31,899-0400 ERROR (jsonrpc/5) [storage.TaskManager.Task] (Task='e370869a-07de-4d75-9e1c-410202e48a60') Unexpected error (task:875)
2019-10-19 14:21:31,899-0400 ERROR (jsonrpc/5) [storage.Dispatcher] FINISH prepareImage error=Volume does not exist: (u'1ba19dea-32e9-465a-a43c-eec81a88d2e0',) (dispatcher:83)

2019-10-19 14:21:21,868-0400 INFO (jsonrpc/5) [vdsm.api] FINISH getStorageDomainInfo return={'info': {'uuid': u'4b87a5de-c976-4982-8b62-7cffef4a22d8', 'version': '5', 'role': 'Master', 'alignment': 1048576, 'remotePath': 'vmm13.virt.iad3p.rsapps.net:/engine', 'block_size': 512, 'type': 'GLUSTERFS', 'class': 'Data', 'pool': ['7d3fb14c-ebf0-11e9-9ee5-00163e05e135'], 'name': 'HostedEngine'}} from=::1,33908, task_id=55a03cc9-1f9f-44b6-b87f-bbb73cefee53 (api:54)
2019-10-19 14:21:21,868-0400 INFO (jsonrpc/5) [jsonrpc.JsonRpcServer] RPC call StorageDomain.getInfo succeeded in 0.01 seconds (__init__:312)
2019-10-19 14:21:21,878-0400 INFO (jsonrpc/4) [vdsm.api] START prepareImage(sdUUID=u'4b87a5de-c976-4982-8b62-7cffef4a22d8', spUUID=u'00000000-0000-0000-0000-000000000000', imgUUID=u'7f969d21-1445-4993-a7d8-3af8fb83cbd4', leafUUID=u'1ba19dea-32e9-465a-a43c-eec81a88d2e0', allowIllegal=False) from=::1,33908, task_id=dc4820ae-c6e9-46cf-868f-c30c584c3604 (api:48)
2019-10-19 14:21:21,894-0400 INFO (jsonrpc/4) [vdsm.api] FINISH prepareImage error=Volume does not exist: (u'1ba19dea-32e9-465a-a43c-eec81a88d2e0',) from=::1,33908, task_id=dc4820ae-c6e9-46cf-868f-c30c584c3604 (api:52)
2019-10-19 14:21:21,894-0400 ERROR (jsonrpc/4) [storage.TaskManager.Task] (Task='dc4820ae-c6e9-46cf-868f-c30c584c3604') Unexpected error (task:875)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
    return fn(*args, **kargs)
  File "<string>", line 2, in prepareImage
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 50, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 3203, in prepareImage
    raise se.VolumeDoesNotExist(leafUUID)
VolumeDoesNotExist: Volume does not exist: (u'1ba19dea-32e9-465a-a43c-eec81a88d2e0',)
2019-10-19 14:21:21,894-0400 INFO (jsonrpc/4) [storage.TaskManager.Task] (Task='dc4820ae-c6e9-46cf-868f-c30c584c3604') aborting: Task is aborted: "Volume does not exist: (u'1ba19dea-32e9-465a-a43c-eec81a88d2e0',)" - code 201 (task:1181)
2019-10-19 14:21:21,894-0400 ERROR (jsonrpc/4) [storage.Dispatcher] FINISH prepareImage error=Volume does not exist: (u'1ba19dea-32e9-465a-a43c-eec81a88d2e0',) (dispatcher:83)
2019-10-19 14:21:21,895-0400 INFO (jsonrpc/4) [jsonrpc.JsonRpcServer] RPC call Image.prepare failed (error 201) in 0.01 seconds (__init__:312)
2019-10-19 14:21:22,391-0400 INFO (Reactor thread) [ProtocolDetector.AcceptorImpl] Accepted connection from ::1:33910 (protocoldetector:61)
2019-10-19 14:21:22,399-0400 INFO (Reactor thread) [ProtocolDetector.Detector] Detected protocol stomp from ::1:33910 (protocoldetector:125)
2019-10-19 14:21:22,399-0400 INFO (Reactor thread) [Broker.StompAdapter] Processing CONNECT request (stompserver:95)
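[Editorial aside: a VolumeDoesNotExist from prepareImage means the leaf volume file is missing at the path VDSM expects. The sketch below shows where that path would be on a file-based (Gluster) domain so it can be checked directly; the <sdUUID>/images/<imgUUID>/<volUUID> layout is an assumption about VDSM's usual file-domain tree, and on a real host the mount root would be under /rhev/data-center/mnt/glusterSD/, not the throwaway directory used here.]

```python
import os
import tempfile

def volume_path(mount_root, sd_uuid, img_uuid, vol_uuid):
    """Path where a file-based storage domain is assumed to keep a volume:
    <mount>/<sdUUID>/images/<imgUUID>/<volUUID>."""
    return os.path.join(mount_root, sd_uuid, "images", img_uuid, vol_uuid)

# UUIDs taken from the prepareImage call in the log above; throwaway mount root.
sd = "4b87a5de-c976-4982-8b62-7cffef4a22d8"
img = "7f969d21-1445-4993-a7d8-3af8fb83cbd4"
vol = "1ba19dea-32e9-465a-a43c-eec81a88d2e0"
root = tempfile.mkdtemp()
print(volume_path(root, sd, img, vol))
print(os.path.exists(volume_path(root, sd, img, vol)))  # False: the condition VolumeDoesNotExist reports
```

If the file is genuinely absent on the real mount, the hosted-engine metadata volume is gone or the wrong domain is mounted, which would also explain the repeating HA-score errors.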

Can anyone point out whether "https://access.redhat.com/documentation/en-us/red_hat_hyperconverged_infrast..." works for a 3-node hyperconverged setup in oVirt?

I have tried it, and it seems that every time I get different errors, no matter whether I use Dell or HPE hardware. Maybe I am not following it properly, or not all of the documentation for an automated oVirt hyperconverged deployment with Ansible on a 3-node setup is out there. If you can point me to the right documentation I would appreciate it; for now I am only doing web UI deployments.

thanks,
Adrian

Hi Sahina,

I have checked the vdsm.log of the host and I can see the following:

StorageDomainDoesNotExist: Storage domain does not exist: (u'62bfe528-cd5c-4b6c-808a-b097fef76629',)
StorageDomainDoesNotExist: Storage domain does not exist: (u'8c2df9c6-b505-4499-abb9-0d15db80f33e',)
StorageDomainDoesNotExist: Storage domain does not exist: (u'5d9f7d05-1fcc-4f99-9470-4e57cd15f128',)
StorageDomainDoesNotExist: Storage domain does not exist: (u'fe24d88e-6acf-42d7-a857-eaf1f8deb24a',)
StorageDomainDoesNotExist: Storage domain does not exist: (u'5d9f7d05-1fcc-4f99-9470-4e57cd15f128',)

2019-10-21 15:03:45,742-0400 ERROR (monitor/fe24d88) [storage.Monitor] Error checking domain fe24d88e-6acf-42d7-a857-eaf1f8deb24a (monitor:425)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line 406, in _checkDomainStatus
    self.domain.selftest()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 48, in __getattr__
    return getattr(self.getRealDomain(), attrName)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 51, in getRealDomain
    return self._cache._realProduce(self._sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 134, in _realProduce
    domain = self._findDomain(sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 151, in _findDomain
    return findMethod(sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/glusterSD.py", line 62, in findDomain
    return GlusterStorageDomain(GlusterStorageDomain.findDomainPath(sdUUID))
  File "/usr/lib/python2.7/site-packages/vdsm/storage/glusterSD.py", line 58, in findDomainPath
    raise se.StorageDomainDoesNotExist(sdUUID)

Is there a way to fix this? Also, when I try to reinstall one of the hosts (6-node cluster), it reinstalls just fine (except for host1), but the engine capabilities do not get deployed to it.

thoughts?
thanks,
Adrian
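[Editorial aside: the traceback ends in glusterSD.findDomainPath, i.e. VDSM scanning its Gluster mounts for a directory named after the domain UUID and not finding one. A rough, hedged analogue of that lookup is sketched below; on a real host the mount root would be something like /rhev/data-center/mnt/glusterSD (an assumption, not shown in this thread), so the practical check is whether the Gluster volumes are actually mounted there.]

```python
import glob
import os
import tempfile

def find_domain_path(mnt_root, sd_uuid):
    """Rough analogue of what the traceback shows glusterSD.findDomainPath doing:
    look in every mount under mnt_root for a directory named after the domain UUID."""
    hits = glob.glob(os.path.join(mnt_root, "*", sd_uuid))
    if not hits:
        raise LookupError("Storage domain does not exist: %s" % sd_uuid)
    return hits[0]

# Demo on a throwaway tree with one fake mount holding one of the UUIDs from the log.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "host1:_vmstore", "fe24d88e-6acf-42d7-a857-eaf1f8deb24a"))
print(find_domain_path(root, "fe24d88e-6acf-42d7-a857-eaf1f8deb24a"))
```

If the same lookup fails for all four UUIDs on the real host, the Gluster mounts themselves are missing, which points at the bricks/mounts rather than at the domains' contents.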

On the other hosts I see entries like the following:

2019-10-21 15:41:32,767-0400 ERROR (periodic/0) [root] failed to retrieve Hosted Engine HA score '[Errno 2] No such file or directory'Is the Hosted Engine setup finished? (api:191)
2019-10-21 15:41:47,799-0400 ERROR (periodic/0) [root] failed to retrieve Hosted Engine HA score '[Errno 2] No such file or directory'Is the Hosted Engine setup finished? (api:191)
2019-10-21 15:41:49,916-0400 ERROR (jsonrpc/4) [storage.Dispatcher] FINISH prepareImage error=Volume does not exist: (u'1ba19dea-32e9-465a-a43c-eec81a88d2e0',) (dispatcher:83)

I'm still not able to incorporate host2 and host3 through the Ansible hyperconverged deployment. After the deploy the engine is up, but I can't add the servers manually as it complains about the host keys...

TASK [Set Engine public key as authorized key without validating the TLS/SSL certificates] **************************************************************************************************************************************************************************************************************************************************
task path: /etc/ansible/roles/gluster.ansible/playbooks/hc-ansible-deployment/tasks/add_hosts_storage_domains.yml:4
Failed: [host1.example.com -> host2.example.com] (item=host2.example.com) => {
    "ansible_loop_var": "host",
    "changed": false,
    "host": "host2.example.com",
    "invocation": {
        "module_args": {
            "comment": null,
            "exclusive": false,
            "follow": false,
            "key": "https://ovirt-engine2.example.com/ovirt-engine/services/pki-resource?resource=engine-certificate&format=OPENSSH-PUBKEY",
            "key_options": null,
            "manage_dir": true,
            "path": null,
            "state": "present",
            "user": "root",
            "validate_certs": false
        }
    },
    "msg": "Error getting key from: https://ovirt-engine2.example.com/ovirt-engine/services/pki-resource?resource=engine-certificate&format=OPENSSH-PUBKEY"
}

thoughts?

Thank you,
Adrian
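[Editorial aside: the failing task is just fetching the engine's SSH public key over HTTPS. The URL can be rebuilt from the engine FQDN, as sketched below using the exact resource/format parameters from the failed task's module_args, so the failure can be reproduced outside Ansible.]

```python
from urllib.parse import urlencode

def engine_pubkey_url(engine_fqdn):
    """Rebuild the URL the authorized_key task fetches the engine's SSH public key from
    (path and query parameters taken verbatim from the failed task above)."""
    query = urlencode({"resource": "engine-certificate", "format": "OPENSSH-PUBKEY"})
    return "https://%s/ovirt-engine/services/pki-resource?%s" % (engine_fqdn, query)

print(engine_pubkey_url("ovirt-engine2.example.com"))
```

Fetching that URL by hand from host1 (for example with `curl -k`) is a quick way to tell whether the problem is DNS resolution of the engine FQDN, routing, or the engine not serving its PKI resources at all.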

Hey!

Host2 and host3 should be added automatically if you provided the FQDNs for these hosts during the deployment. From the error above, "msg": "Error getting key from: https://ovirt-engine2.example.com/ovirt-engine/services/pki-resource?resource=engine-certificate&format=OPENSSH-PUBKEY", I think the HE deployment had an issue, which is preventing host1 from getting the key from the engine to add to the second host.

In case the HE deployment has failed across all three hosts, you can clean up the HE deployment and go through the UI of host1, then follow the steps for HE deployment. On the last step there will be an option for the mount point, where you enter the mount point from the first host, and an option for backup-volfile-servers, where you enter host2:host3 (in that format). This should add both hosts automatically.

On Wed, Oct 23, 2019 at 3:02 AM <adrianquintero@gmail.com> wrote:
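[Editorial aside: the two storage answers described above fit a simple pattern: "<host>:/<volume>" for the mount point and a colon-separated backup-volfile-servers mount option. A minimal sketch; the dict keys are illustrative labels, not the literal installer answer-file keys.]

```python
def he_storage_answers(primary, backups, volume="engine"):
    """Compose the hosted-engine storage answers described above: the engine-volume
    mount point from the first host plus backup-volfile-servers listing the others."""
    return {
        "storage": "%s:/%s" % (primary, volume),
        "mnt_options": "backup-volfile-servers=%s" % ":".join(backups),
    }

print(he_storage_answers("host1", ["host2", "host3"]))
# {'storage': 'host1:/engine', 'mnt_options': 'backup-volfile-servers=host2:host3'}
```

The backup-volfile-servers option is what lets the Gluster mount survive the primary volfile server being down, which is why the HE storage should always list the other two nodes.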
Im still not able to incorporate host2 and 3 thru ansible hyperconverged deployment using ansible. After the deploy the engine is up but cant add the servers manually as it complains about the host keys...
thoughts?
Thank you,
Adrian

Parth,

I am able to successfully install using the UI; however, I am trying to automate the installation process with Ansible for a hyperconverged setup. The install works, but the last 2 of the 3 servers are not being joined by the engine, and in the Ansible output this is what I see:

https://ovirt-engine2.example.com/ovirt-engine/services/pki-resource?resource=engine-certificate&format=OPENSSH-PUBKEY"

So then I tried to add them manually after the engine was online, using host1, but they are not capable of hosting the engine (the crown icon is not shown as it is on host1). I logged in to each of the hosts and see the following:

[root@host1 ~]# vdsm-client Host getCapabilities | grep hostedEngineDeployed
    "hostedEngineDeployed": true,
[root@host2 ~]# vdsm-client Host getCapabilities | grep hostedEngineDeployed
    "hostedEngineDeployed": true,
[root@host3 ~]# vdsm-client Host getCapabilities | grep hostedEngineDeployed
    "hostedEngineDeployed": true,

But if I right-click the Hosted Engine VM in the UI and click Migrate, the "Destination Host" entry says "No available host to migrate VMs to".

The Gluster setup worked OK using the Ansible playbooks by following:
https://access.redhat.com/documentation/en-us/red_hat_hyperconverged_infrast...

Thoughts?
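[Editorial aside: the per-host capability check above can be folded into one comparison once each host's `vdsm-client Host getCapabilities` JSON is collected. A minimal sketch with hypothetical, hand-built capability dicts mirroring the grep output above; since all three hosts report the flag yet the UI offers no migration target, the discrepancy is on the engine/ha-agent side rather than in this flag.]

```python
def he_capable_hosts(caps_by_host):
    """List hosts whose capabilities report hostedEngineDeployed."""
    return sorted(h for h, caps in caps_by_host.items()
                  if caps.get("hostedEngineDeployed"))

# Hand-built dicts mirroring the grep output above.
caps = {
    "host1": {"hostedEngineDeployed": True},
    "host2": {"hostedEngineDeployed": True},
    "host3": {"hostedEngineDeployed": True},
}
print(he_capable_hosts(caps))  # ['host1', 'host2', 'host3']
```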
participants (4)
- adrianquintero@gmail.com
- Parth Dhanjal
- Sahina Bose
- Strahil