Hi,
from time to time one of our four oVirt hosts becomes NonResponsive.
From the engine's point of view it looks like this (engine.log):
2019-05-21 13:10:30,261+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-95) [] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM ovirt03.net.slu.cz command Get Host Capabilities failed: Message timeout which can be caused by communication issues
2019-05-21 13:10:30,261+02 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (EE-ManagedThreadFactory-engineScheduled-Thread-95) [] Unable to RefreshCapabilities: VDSNetworkException: VDSGenericException: VDSNetworkException: Message timeout which can be caused by communication issues
From the host (which is reachable) it looks like this (vdsm.log):
2019-05-21 13:10:27,154+0200 INFO (vmrecovery) [vdsm.api] START getConnectedStoragePoolsList(options=None) from=internal, task_id=a1bebf2f-7070-4344-90b7-1d709ba94b5c (api:48)
2019-05-21 13:10:27,154+0200 INFO (vmrecovery) [vdsm.api] FINISH getConnectedStoragePoolsList return={'poollist': []} from=internal, task_id=a1bebf2f-7070-4344-90b7-1d709ba94b5c (api:54)
2019-05-21 13:10:27,155+0200 INFO (vmrecovery) [vds] recovery: waiting for storage pool to go up (clientIF:709)
2019-05-21 13:10:31,245+0200 INFO (jsonrpc/4) [api.host] START getAllVmStats() from=::1,39144 (api:48)
2019-05-21 13:10:31,247+0200 INFO (jsonrpc/4) [api.host] FINISH getAllVmStats return={'status': {'message': 'Done', 'code': 0}, 'statsList': (suppressed)} from=::1,39144 (api:54)
2019-05-21 13:10:31,249+0200 INFO (jsonrpc/4) [jsonrpc.JsonRpcServer] RPC call Host.getAllVmStats succeeded in 0.00 seconds (__init__:312)
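In case it helps anyone line up the two logs, here is a rough Python sketch that pairs each engine ERROR line with the vdsm lines logged within a few seconds of it. The function names (parse_ts, lines_near) and the 5-second window are my own illustrative choices, not part of oVirt. It assumes every entry in the real files starts on its own line with the timestamp format shown above, and that both machines log in the same timezone (the UTC offset is matched but ignored).

```python
import re
from datetime import datetime, timedelta

# Entries in both excerpts start with "YYYY-MM-DD HH:MM:SS,mmm+ZZ" or "+ZZZZ".
# Assumption: engine and host use the same timezone, so the offset is ignored.
TS_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3})\+\d{2}(?:\d{2})?\s"
)


def parse_ts(line):
    """Return the timestamp at the start of a log line, or None if absent."""
    m = TS_RE.match(line)
    if m is None:
        return None
    base = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
    return base + timedelta(milliseconds=int(m.group(2)))


def lines_near(engine_lines, vdsm_lines, window_s=5.0, marker="ERROR"):
    """Pair each engine line containing `marker` with nearby vdsm lines.

    Returns a list of (engine_line, [vdsm_line, ...]) tuples, where the vdsm
    lines were logged within `window_s` seconds of the engine line.
    """
    window = timedelta(seconds=window_s)
    stamped = []
    for raw in vdsm_lines:
        ts = parse_ts(raw)
        if ts is not None:  # skip continuation lines without a timestamp
            stamped.append((ts, raw.rstrip("\n")))

    result = []
    for raw in engine_lines:
        ts = parse_ts(raw)
        if ts is None or marker not in raw:
            continue
        near = [text for vts, text in stamped if abs(vts - ts) <= window]
        result.append((raw.rstrip("\n"), near))
    return result
```

It can be fed open file objects, for example engine.log on the engine machine and vdsm.log copied over from the host (usually /var/log/ovirt-engine/engine.log and /var/log/vdsm/vdsm.log). It reads the whole vdsm log into memory, so it is meant for excerpts rather than very large files.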
The hosts run the latest CentOS 7 (on old AMD Opteron hardware); oVirt is 4.3.3.7-1.el7.
I cannot track it down to the network layer. We have four other RHV hosts on
the same infrastructure and they work well. Any clues as to what is happening?
Thanks in advance,
Jiri Slezka