Hi all,

I have an oVirt 2-node cluster for testing, with a self-hosted engine running on top of Gluster.

The cluster was running 4.1. After the upgrade to 4.2, which generally went smoothly, the engine reports the bricks of one of the hosts (v1) as down, while Gluster itself is fine when checked from the command line and all volumes are mounted.
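For reference, by "checked from the command line" I mean checks along these lines on both hosts (volume names 'engine' and 'vms' as in the log below):

  gluster peer status
  gluster volume status engine
  gluster volume status vms
  gluster volume info

All bricks show as online there, so only the engine side disagrees.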

Below are the errors from the engine log:

2018-06-17 00:21:26,309+03 ERROR [org.ovirt.engine.core.bll.gluster.GlusterSyncJob] (DefaultQuartzScheduler2)
[98d7e79] Error while refreshing brick statuses for volume 'vms' of cluster 'test': null
2018-06-17 00:21:26,318+03 ERROR [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalLogicalVolumeListVDSCommand]
(DefaultQuartzScheduler2) [98d7e79] Command 'GetGlusterLocalLogicalVolumeListVDSCommand(HostName = v0.test-group.com,
VdsIdVDSCommandParametersBase:{hostId='d5a96118-ca49-411f-86cb-280c7f9c421f'})' execution failed: null
2018-06-17 00:21:26,323+03 ERROR [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalLogicalVolumeListVDSCommand]
(DefaultQuartzScheduler2) [98d7e79] Command 'GetGlusterLocalLogicalVolumeListVDSCommand(HostName = v1.test-group.com,
VdsIdVDSCommandParametersBase:{hostId='12dfea4a-8142-484e-b912-0cbd5f281aba'})' execution failed: null
2018-06-17 00:21:27,015+03 INFO  [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (DefaultQuartzScheduler9)
[426e7c3d] Failed to acquire lock and wait lock 'EngineLock:{exclusiveLocks='[00000002-0002-0002-0002-00000000017a=GLUSTER]', sharedLocks=''}'
2018-06-17 00:21:27,926+03 ERROR [org.ovirt.engine.core.bll.gluster.GlusterSyncJob] (DefaultQuartzScheduler2)
[98d7e79] Error while refreshing brick statuses for volume 'engine' of cluster 'test': null

Apart from this, everything else is operating normally and VMs are running on both hosts.

Any ideas on how to isolate this issue?

Thanx,
Alex