Hello,
I have an issue that feels sanlock-related, but I can't get it sorted out on our
installation. This is oVirt 4.2.1 with a hosted engine. One of our hosts is stuck in a loop. It:
- gets a VDSM GetStatsVDS timeout and is marked as Down,
- throws a warning about not being able to be fenced (because fencing isn't enabled yet,
due to this very problem),
- and is set back to Up about a minute later.
This repeats every 4 minutes and 20 seconds.
The hosted engine is running on the host that is stuck in this loop. The loop doesn't
appear to get in the way of creating new VMs or other operations, but obviously I
can't use fencing, which is a big part of the point of running oVirt in the first
place.
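In case it helps narrow things down, this is roughly what I plan to capture on the
affected host the next time it flaps (just a sketch of the commands, no output yet):

[sc5-ovirt-1]# systemctl status vdsmd sanlock ovirt-ha-agent ovirt-ha-broker
[sc5-ovirt-1]# journalctl -u vdsmd -u sanlock --since "15 min ago"
[sc5-ovirt-1]# tail -n 100 /var/log/vdsm/vdsm.log /var/log/sanlock.log
[sc5-ovirt-1]# hosted-engine --vm-status

i.e. whether vdsm, sanlock, and the HA agents are actually up, and what they log on the
host side while the engine is reporting the GetStatsVDS timeout.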
I tried setting global maintenance and running hosted-engine --reinitialize-lockspace,
which (a) took almost exactly 2 minutes to run, making me think something timed out
internally, (b) exited with rc 0, and (c) didn't fix the problem.
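For reference, the sequence was roughly the following (flags from memory, so treat them
as approximate):

[sc5-ovirt-1]# hosted-engine --set-maintenance --mode=global
[sc5-ovirt-1]# hosted-engine --reinitialize-lockspace
[sc5-ovirt-1]# hosted-engine --set-maintenance --mode=none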
Anyone have an idea of how to fix this?
-j
- - details - -
I still can't quite figure out how to interpret what sanlock says, but the -1 PIDs look
like wrongness.
[sc5-ovirt-1]# sanlock client status
daemon bedae69e-03cc-49f8-88f4-9674a85a3185.sc5-ovirt-
p -1 helper
p -1 listener
p 122268 HostedEngine
p -1 status
s 1aabcd3a-3fd3-4902-b92e-17beaf8fe3fd:1:/rhev/data-center/mnt/glusterSD/172.16.0.151\:_sc5-images/1aabcd3a-3fd3-4902-b92e-17beaf8fe3fd/dom_md/ids:0
s b41eb20a-eafb-481b-9a50-a135cf42b15e:1:/rhev/data-center/mnt/glusterSD/sc5-gluster-10g-1\:_sc5-ovirt__engine/b41eb20a-eafb-481b-9a50-a135cf42b15e/dom_md/ids:0
r b41eb20a-eafb-481b-9a50-a135cf42b15e:8f0c9f7a-ae6a-476e-b6f3-a830dcb79e87:/rhev/data-center/mnt/glusterSD/172.16.0.153\:_sc5-ovirt__engine/b41eb20a-eafb-481b-9a50-a135cf42b15e/images/a9d01d59-f146-47e5-b514-d10f8867678e/8f0c9f7a-ae6a-476e-b6f3-a830dcb79e87.lease:0:5
p 122268
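If more sanlock detail would be useful, I can also grab something like the following
(assuming I have the flags right):

[sc5-ovirt-1]# sanlock client status -D
[sc5-ovirt-1]# sanlock client host_status -D
[sc5-ovirt-1]# tail -n 50 /var/log/sanlock.log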
engine.log:
2018-03-21 16:09:26,081-07 ERROR
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(EE-ManagedThreadFactory-engineScheduled-Thread-67) [] EVENT_ID:
VDS_BROKER_COMMAND_FAILURE(10,802), VDSM sc5-ovirt-1 command GetStatsVDS failed: Message
timeout which can be caused by communication issues
2018-03-21 16:09:26,081-07 ERROR
[org.ovirt.engine.core.vdsbroker.vdsbroker.GetStatsVDSCommand]
(EE-ManagedThreadFactory-engineScheduled-Thread-67) [] Command
'GetStatsVDSCommand(HostName = sc5-ovirt-1,
VdsIdAndVdsVDSCommandParametersBase:{hostId='be3517e0-f79d-464c-8169-f786d13ac287',
vds='Host[sc5-ovirt-1,be3517e0-f79d-464c-8169-f786d13ac287]'})' execution
failed: VDSGenericException: VDSNetworkException: Message timeout which can be caused by
communication issues
2018-03-21 16:09:26,081-07 ERROR
[org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring]
(EE-ManagedThreadFactory-engineScheduled-Thread-67) [] Failed getting vds stats,
host='sc5-ovirt-1'(be3517e0-f79d-464c-8169-f786d13ac287):
org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: VDSGenericException:
VDSNetworkException: Message timeout which can be caused by communication issues
2018-03-21 16:09:26,081-07 ERROR
[org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring]
(EE-ManagedThreadFactory-engineScheduled-Thread-67) [] Failure to refresh host
'sc5-ovirt-1' runtime info: VDSGenericException: VDSNetworkException: Message
timeout which can be caused by communication issues
2018-03-21 16:09:26,081-07 WARN [org.ovirt.engine.core.vdsbroker.VdsManager]
(EE-ManagedThreadFactory-engineScheduled-Thread-67) [] Failed to refresh VDS, network
error, continuing, vds='sc5-ovirt-1'(be3517e0-f79d-464c-8169-f786d13ac287):
VDSGenericException: VDSNetworkException: Message timeout which can be caused by
communication issues
2018-03-21 16:09:26,081-07 WARN [org.ovirt.engine.core.vdsbroker.VdsManager]
(EE-ManagedThreadFactory-engine-Thread-102682) [] Host 'sc5-ovirt-1' is not
responding.
2018-03-21 16:09:26,088-07 WARN
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(EE-ManagedThreadFactory-engine-Thread-102682) [] EVENT_ID:
VDS_HOST_NOT_RESPONDING(9,027), Host sc5-ovirt-1 is not responding. Host cannot be fenced
automatically because power management for the host is disabled.
2018-03-21 16:09:27,070-07 INFO [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient]
(SSL Stomp Reactor) [] Connecting to sc5-ovirt-1/10.181.26.129
2018-03-21 16:09:27,918-07 INFO
[org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand]
(DefaultQuartzScheduler4) [493fb316] START, GlusterServersListVDSCommand(HostName =
sc5-gluster-2,
VdsIdVDSCommandParametersBase:{hostId='797cbf42-6553-4a75-b8b1-93b2adbbc0db'}),
log id: 6afccc01
2018-03-21 16:09:28,579-07 INFO
[org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand]
(DefaultQuartzScheduler4) [493fb316] FINISH, GlusterServersListVDSCommand, return:
[192.168.122.1/24:CONNECTED, sc5-gluster-3:CONNECTED, sc5-gluster-10g-1:CONNECTED], log
id: 6afccc01
2018-03-21 16:09:28,606-07 INFO
[org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand]
(DefaultQuartzScheduler4) [493fb316] START, GlusterVolumesListVDSCommand(HostName =
sc5-gluster-2,
GlusterVolumesListVDSParameters:{hostId='797cbf42-6553-4a75-b8b1-93b2adbbc0db'}),
log id: 44e90100
2018-03-21 16:09:29,015-07 INFO
[org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand]
(DefaultQuartzScheduler4) [493fb316] FINISH, GlusterVolumesListVDSCommand, return:
{6fe949b5-894a-4843-b3e4-af81545574dc=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@140a4a60,
bc29ba89-8fc0-494d-9fe5-bc7b34396b65=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@29637467},
log id: 44e90100
2018-03-21 16:09:29,686-07 INFO
[org.ovirt.engine.core.vdsbroker.vdsbroker.GetHardwareInfoVDSCommand]
(EE-ManagedThreadFactory-engineScheduled-Thread-40) [] START,
GetHardwareInfoVDSCommand(HostName = sc5-ovirt-1,
VdsIdAndVdsVDSCommandParametersBase:{hostId='be3517e0-f79d-464c-8169-f786d13ac287',
vds='Host[sc5-ovirt-1,be3517e0-f79d-464c-8169-f786d13ac287]'}), log id: 6b1cb74b
2018-03-21 16:09:29,692-07 INFO
[org.ovirt.engine.core.vdsbroker.vdsbroker.GetHardwareInfoVDSCommand]
(EE-ManagedThreadFactory-engineScheduled-Thread-40) [] FINISH, GetHardwareInfoVDSCommand,
log id: 6b1cb74b
2018-03-21 16:09:29,900-07 INFO
[org.ovirt.engine.core.bll.HandleVdsCpuFlagsOrClusterChangedCommand]
(EE-ManagedThreadFactory-engineScheduled-Thread-40) [576fddcc] Running command:
HandleVdsCpuFlagsOrClusterChangedCommand internal: true. Entities affected : ID:
be3517e0-f79d-464c-8169-f786d13ac287 Type: VDS
2018-03-21 16:09:29,944-07 INFO [org.ovirt.engine.core.bll.InitVdsOnUpCommand]
(EE-ManagedThreadFactory-engineScheduled-Thread-40) [26c5f844] Running command:
InitVdsOnUpCommand internal: true. Entities affected : ID:
c4e2ca40-1e72-11e8-beac-00163e0994d8 Type: StoragePool
2018-03-21 16:09:29,977-07 INFO
[org.ovirt.engine.core.bll.storage.pool.ConnectHostToStoragePoolServersCommand]
(EE-ManagedThreadFactory-engineScheduled-Thread-40) [41e6da49] Running command:
ConnectHostToStoragePoolServersCommand internal: true. Entities affected : ID:
c4e2ca40-1e72-11e8-beac-00163e0994d8 Type: StoragePool
2018-03-21 16:09:30,002-07 INFO
[org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand]
(EE-ManagedThreadFactory-engineScheduled-Thread-40) [41e6da49] START,
ConnectStorageServerVDSCommand(HostName = sc5-ovirt-1,
StorageServerConnectionManagementVDSParameters:{hostId='be3517e0-f79d-464c-8169-f786d13ac287',
storagePoolId='c4e2ca40-1e72-11e8-beac-00163e0994d8',
storageType='GLUSTERFS',
connectionList='[StorageServerConnections:{id='0e2e93f1-3904-4d70-82aa-16bcc83ea314',
connection='172.16.0.153:/sc5-ovirt_engine', iqn='null',
vfsType='glusterfs',
mountOptions='backup-volfile-servers=172.16.0.152:172.16.0.151',
nfsVersion='null', nfsRetrans='null', nfsTimeo='null',
iface='null', netIfaceName='null'},
StorageServerConnections:{id='26c9dbd8-f550-4b7a-9f84-3e905f1a00db',
connection='172.16.0.151:/sc5-images', iqn='null',
vfsType='glusterfs',
mountOptions='backup-volfile-servers=172.16.0.152:172.16.0.153',
nfsVersion='null', nfsRetrans='null', nfsTimeo='null',
iface='null', netIfaceName='null'}]'}), log id: acd504a
2018-03-21 16:09:30,099-07 INFO
[org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand]
(EE-ManagedThreadFactory-engineScheduled-Thread-40) [41e6da49] FINISH,
ConnectStorageServerVDSCommand, return: {26c9dbd8-f550-4b7a-9f84-3e905f1a00db=0,
0e2e93f1-3904-4d70-82aa-16bcc83ea314=0}, log id: acd504a
2018-03-21 16:09:30,107-07 INFO
[org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand]
(EE-ManagedThreadFactory-engineScheduled-Thread-40) [41e6da49] START,
ConnectStorageServerVDSCommand(HostName = sc5-ovirt-1,
StorageServerConnectionManagementVDSParameters:{hostId='be3517e0-f79d-464c-8169-f786d13ac287',
storagePoolId='c4e2ca40-1e72-11e8-beac-00163e0994d8', storageType='NFS',
connectionList='[StorageServerConnections:{id='2239cb49-a8bb-49ee-9a5a-90d72c4602d0',
connection='sc5-archive-10g-1:/var/ovirt/ovirt_iso_new', iqn='null',
vfsType='null', mountOptions='null', nfsVersion='AUTO',
nfsRetrans='null', nfsTimeo='null', iface='null',
netIfaceName='null'}]'}), log id: 35528d0f