VDSM HOST ISSUE - Message timeout which can be caused by communication issues

Good morning, I have installed a new oVirt 4.3.10 environment, but from time to time I see this error message in the events of some hosts:

VDSM node-1-ra command Get Host Capabilities failed: Message timeout which can be caused by communication issues

Because of this issue the hosts go into an unresponsive state, and the only way to resolve it and activate the hosts in the cluster again is to restart the oVirt engine service. Can anyone help me? Thanks in advance, Luigi

I attach the hosts info:

Software
OS Version: RHEL - 7 - 8.2003.0.el7.centos
OS Description: CentOS Linux 7 (Core)
Kernel Version: 3.10.0 - 1062.el7.x86_64
KVM Version: 2.12.0 - 44.1.el7_8.1
LIBVIRT Version: libvirt-4.5.0-33.el7_8.1
VDSM Version: vdsm-4.30.46-1.el7

What do you see in the engine's logs? Best regards, Strahil Nikolov

_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-leave@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/7SZVPDQSIJVDHQ...

2020-07-15 11:41:58,968+02 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (EE-ManagedThreadFactory-engineScheduled-Thread-79) [] Unable to RefreshCapabilities: VDSNetworkException: VDSGenericException: VDSNetworkException: Message timeout which can be caused by communication issues
2020-07-15 11:41:59,350+02 WARN [org.ovirt.vdsm.jsonrpc.client.utils.retry.Retryable] (SSL Stomp Reactor) [] Retry failed
2020-07-15 11:41:59,350+02 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (EE-ManagedThreadFactory-engineScheduled-Thread-59) [] Exception during connection
2020-07-15 11:41:59,351+02 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (EE-ManagedThreadFactory-engineScheduled-Thread-59) [] Unable to RefreshCapabilities: ConnectException: Connection timeout
2020-07-15 11:41:59,354+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesAsyncVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-59) [] Command 'GetCapabilitiesAsyncVDSCommand(HostName = sia-svr-ct02, VdsIdAndVdsVDSCommandParametersBase:{hostId='9e65ad71-f633-4be3-aa74-a7fc51b285e6', vds='Host[sia-svr-ct02,9e65ad71-f633-4be3-aa74-a7fc51b285e6]'})' execution failed: java.rmi.ConnectException: Connection timeout
2020-07-15 11:42:00,465+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStatusVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-62) [] Command 'SpmStatusVDSCommand(HostName = sia-svr-ct01, SpmStatusVDSCommandParameters:{hostId='2a97ece4-c83e-4107-ac0d-915757c17f16', storagePoolId='cdeb9ed3-60ab-441e-8026-de62e6aa8022'})' execution failed: VDSGenericException: VDSNetworkException: Message timeout which can be caused by communication issues
2020-07-15 11:42:00,484+02 INFO [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Connecting to sia-svr-ct01.afis/10.234.50.133
2020-07-15 11:42:02,370+02 INFO [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Connecting to sia-svr-ct02.afis/10.234.50.134
2020-07-15 11:42:02,745+02 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (EE-ManagedThreadFactory-engineScheduled-Thread-64) [] VM 'faf4149e-4e31-40a1-a4ce-f08f03b282c9' is migrating to VDS 'b8e7c982-f816-4eb0-9634-762cd6956fac'(sia-svr-rc02) ignoring it in the refresh until migration is done
2020-07-15 11:42:17,771+02 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (EE-ManagedThreadFactory-engineScheduled-Thread-80) [] VM 'faf4149e-4e31-40a1-a4ce-f08f03b282c9' is migrating to VDS 'b8e7c982-f816-4eb0-9634-762cd6956fac'(sia-svr-rc02) ignoring it in the refresh until migration is done
2020-07-15 11:42:20,485+02 WARN [org.ovirt.vdsm.jsonrpc.client.utils.retry.Retryable] (SSL Stomp Reactor) [] Retry failed
2020-07-15 11:42:20,486+02 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (EE-ManagedThreadFactory-engineScheduled-Thread-62) [] Exception during connection
2020-07-15 11:42:20,487+02 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-62) [] IrsBroker::Failed::GetStoragePoolInfoVDS
2020-07-15 11:42:20,487+02 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.GetStoragePoolInfoVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-62) [] Command 'GetStoragePoolInfoVDSCommand( GetStoragePoolInfoVDSCommandParameters:{storagePoolId='cdeb9ed3-60ab-441e-8026-de62e6aa8022', ignoreFailoverLimit='true'})' execution failed: IRSProtocolException:
2020-07-15 11:42:20,497+02 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (EE-ManagedThreadFactory-engineScheduled-Thread-81) [] Unable to RefreshCapabilities: VDSNetworkException: VDSGenericException: VDSNetworkException: Connection issue Connection timeout
2020-07-15 11:42:22,370+02 WARN [org.ovirt.vdsm.jsonrpc.client.utils.retry.Retryable] (SSL Stomp Reactor) [] Retry failed
2020-07-15 11:42:22,370+02 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (EE-ManagedThreadFactory-engineScheduled-Thread-37) [] Exception during connection
2020-07-15 11:42:22,371+02 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (EE-ManagedThreadFactory-engineScheduled-Thread-37) [] Unable to RefreshCapabilities: ConnectException: Connection timeout
2020-07-15 11:42:22,373+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesAsyncVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-37) [] Command 'GetCapabilitiesAsyncVDSCommand(HostName = sia-svr-ct02, VdsIdAndVdsVDSCommandParametersBase:{hostId='9e65ad71-f633-4be3-aa74-a7fc51b285e6', vds='Host[sia-svr-ct02,9e65ad71-f633-4be3-aa74-a7fc51b285e6]'})' execution failed: java.rmi.ConnectException: Connection timeout
2020-07-15 11:42:25,384+02 INFO [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Connecting to sia-svr-ct02.afis/10.234.50.134
2020-07-15 11:42:32,805+02 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (EE-ManagedThreadFactory-engineScheduled-Thread-90) [] VM 'faf4149e-4e31-40a1-a4ce-f08f03b282c9' is migrating to VDS 'b8e7c982-f816-4eb0-9634-762cd6956fac'(sia-svr-rc02) ignoring it in the refresh until migration is done
2020-07-15 11:42:43,163+02 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Connection timeout for host 'sia-svr-rc02.afis', last response arrived 1501 ms ago.
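
For a longer stretch of the engine log, it may help to tally which hosts the ERROR entries refer to. The following is only a rough Python sketch: the function name `summarise_errors` and both regular expressions are assumptions derived from the shape of the entries in this excerpt (timestamp, level, logger in brackets, and host names appearing as `HostName = ...` or `for host '...'`), not an official log format definition.

```python
import re
from collections import Counter

# Start of an entry as it appears in this excerpt: timestamp, level, [logger].
ENTRY_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}[+-]\d{2}\s+"
    r"(?P<level>[A-Z]+)\s+\[(?P<logger>[^\]]+)\]\s+(?P<rest>.*)$"
)
# Host names as they appear in this excerpt (an assumption, not a full grammar).
HOST_RE = re.compile(r"HostName = ([\w.-]+)|for host '([\w.-]+)'")


def summarise_errors(lines):
    """Count ERROR entries per host; entries that name no host are counted under None."""
    counts = Counter()
    for line in lines:
        entry = ENTRY_RE.match(line.strip())
        if entry is None or entry.group("level") != "ERROR":
            continue
        host = HOST_RE.search(entry.group("rest"))
        name = (host.group(1) or host.group(2)) if host else None
        counts[name] += 1
    return counts
```

It could be fed the lines of the engine log, for example `summarise_errors(open("/var/log/ovirt-engine/engine.log"))`, assuming the log is in that location on the engine machine and each entry sits on a single line.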

What is the output of:

host sia-svr-ct02
nslookup sia-svr-ct02

Best regards, Strahil Nikolov

This is the output from the engine:

[root@dacs-ovirt ~]# host sia-svr-ct02
sia-svr-ct02.afis has address 10.234.50.134
[root@dacs-ovirt ~]# nslookup sia-svr-ct02
Server: 10.249.20.21
Address: 10.249.20.21#53

Name: sia-svr-ct02.afis
Address: 10.234.50.134

It is definitely not a name resolution issue. Have you made changes to sshd_config on sia-svr-ct02? Is root login allowed? Best regards, Strahil Nikolov

Hello, yes, root login is allowed, and no, I didn't make any changes to sshd_config. The fact is that this issue happens only from time to time. Let me give you more details: when the hosts go into the "non responsive" state, probably due to network errors, they remain in that state for a few hours. In the meantime, if we try to reach the hosts and the VMs directly, we can connect to them correctly. So in my opinion, when the oVirt manager is not able to reach the hypervisors it sets them to the non responsive state, but it then has problems when it tries to reconnect to them and bring them up again.
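
One way to test the observation above is to check, from the engine machine and while a host is shown as non responsive, whether a plain TCP connection to the VDSM port can still be opened. The sketch below is a minimal Python example; the function name `can_connect` is hypothetical, and the default port 54321 is the usual VDSM port, which is assumed rather than confirmed for this environment. It only tests TCP reachability and does not perform the TLS/STOMP exchange that the engine itself uses.

```python
import socket


def can_connect(host, port=54321, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened within `timeout` seconds."""
    try:
        # create_connection resolves the name and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

For example, `can_connect("sia-svr-ct02.afis")` run on the engine during an outage would show whether the port is reachable at the TCP level; if it is reachable while the engine still reports timeouts, that would be consistent with the reconnect problem described above, though it would not prove it.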

Hm... then you need to experiment on a TEST oVirt setup with the options described in https://www.ovirt.org/develop/developer-guide/engine/engine-config-options.h...

Some of the more interesting options are:

SSHInactivityTimoutSeconds
TimeoutToResetVdsInSeconds
VDSAttemptsToResetCount
VdsRecoveryTimeoutInMintues
VdsRefreshRate
vdsTimeout

Try them first in a test system before deploying to production. Best regards, Strahil Nikolov
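
Before changing anything, it can be useful to record the current values of these options on the test engine. The sketch below assumes the `engine-config` tool is available on the engine machine and that `engine-config -g <key>` prints the current value of a key. The option names are copied verbatim from the message above and have not been checked against 4.3.10; the helper names `get_command` and `read_options` are hypothetical.

```python
import subprocess

# Option names exactly as listed in the message above (spelling not verified).
OPTIONS = [
    "SSHInactivityTimoutSeconds",
    "TimeoutToResetVdsInSeconds",
    "VDSAttemptsToResetCount",
    "VdsRecoveryTimeoutInMintues",
    "VdsRefreshRate",
    "vdsTimeout",
]


def get_command(key):
    """Build the command line that asks engine-config for one key."""
    return ["engine-config", "-g", key]


def read_options(keys, run=subprocess.run):
    """Return {key: raw output text}, or None for a key whose lookup failed."""
    results = {}
    for key in keys:
        proc = run(get_command(key), capture_output=True, text=True)
        results[key] = proc.stdout.strip() if proc.returncode == 0 else None
    return results


if __name__ == "__main__":
    for key, value in read_options(OPTIONS).items():
        print(key, "->", value)
```

The `run` parameter exists only so the function can be exercised without a real engine. Keeping the printed values makes it easy to revert after an experiment.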

Hello, the link is not available

Just copy/paste it into a browser.
Participants (2): lu.alfonsi@almaviva.it, Strahil Nikolov