On Thu, Feb 14, 2019 at 8:24 PM Jayme <jaymef@gmail.com> wrote:
https://bugzilla.redhat.com/show_bug.cgi?id=1677160 doesn't seem relevant to me?  Is that the correct link?

As I mentioned in a previous email, I'm also having problems with Gluster bricks going offline since upgrading to oVirt 4.3 yesterday (previously I never had a single issue with Gluster, nor did a brick ever go down).  I suspect this will continue to happen daily, as some other users on this group have suggested.  I was able to pull some logs from the engine and Gluster from around the time the brick dropped.  My setup is a 3-node HCI and I was previously running the latest 4.2 updates (before upgrading to 4.3).  My hardware has a lot of headroom and I'm on a 10GbE Gluster backend (the servers were certainly not under any significant load when the brick went offline).  To recover I had to place the host in maintenance mode and reboot (although I suspect I could have simply unmounted and remounted the Gluster mounts).

Is there anything in the brick logs? The logs below only indicate that the engine detected that the brick was down. To find out why the brick was marked down, the brick logs would help.
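For what it's worth, here is a rough sketch of how the brick log could be narrowed down to the window around the drop. It assumes the brick log uses the same line layout as the glusterd.log excerpt in this mail (`[timestamp] LEVEL [MSGID: ...] ...`). The function name `suspicious_lines` is made up for illustration, and timestamps are compared exactly as written, with no timezone conversion between logs.

```python
import re
from datetime import datetime

# Layout assumed from the glusterd.log lines quoted in this thread:
# [2019-02-14 02:36:49.597788] E [MSGID: 101191] [...] 0-epoll: ...
LINE_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.\d+\] ([A-Z]) ")
STAMP_FMT = "%Y-%m-%d %H:%M:%S"


def suspicious_lines(lines, start, end, levels=("E", "W", "C")):
    """Return lines logged between start and end whose level is in levels.

    Hypothetical helper: start and end are strings in STAMP_FMT and are
    interpreted in whatever timezone the log itself uses.
    """
    lo = datetime.strptime(start, STAMP_FMT)
    hi = datetime.strptime(end, STAMP_FMT)
    hits = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            # Continuation lines and "The message ... repeated N times"
            # summaries do not start with a bracketed timestamp; skip them.
            continue
        stamp = datetime.strptime(m.group(1), STAMP_FMT)
        if lo <= stamp <= hi and m.group(2) in levels:
            hits.append(line.rstrip("\n"))
    return hits
```

It would be fed an open file handle for the brick log of the affected brick on host2 (usually somewhere under /var/log/glusterfs/bricks/, but check your install) plus a start and end time a few minutes either side of the drop.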


grep "2019-02-14" engine.log-20190214 | grep "GLUSTER_BRICK_STATUS_CHANGED"
2019-02-14 02:41:48,018-04 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler1) [5ff5b093] EVENT_ID: GLUSTER_BRICK_STATUS_CHANGED(4,086), Detected change in status of brick host2.replaced.domain.com:/gluster_bricks/non_prod_b/non_prod_b of volume non_prod_b of cluster Default from UP to DOWN via cli.
2019-02-14 03:20:11,189-04 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler3) [760f7851] EVENT_ID: GLUSTER_BRICK_STATUS_CHANGED(4,086), Detected change in status of brick host2.replaced.domain.com:/gluster_bricks/engine/engine of volume engine of cluster Default from DOWN to UP via cli.
2019-02-14 03:20:14,819-04 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler3) [760f7851] EVENT_ID: GLUSTER_BRICK_STATUS_CHANGED(4,086), Detected change in status of brick host2.replaced.domain.com:/gluster_bricks/prod_b/prod_b of volume prod_b of cluster Default from DOWN to UP via cli.
2019-02-14 03:20:19,692-04 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler3) [760f7851] EVENT_ID: GLUSTER_BRICK_STATUS_CHANGED(4,086), Detected change in status of brick host2.replaced.domain.com:/gluster_bricks/isos/isos of volume isos of cluster Default from DOWN to UP via cli.
2019-02-14 03:20:25,022-04 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler3) [760f7851] EVENT_ID: GLUSTER_BRICK_STATUS_CHANGED(4,086), Detected change in status of brick host2.replaced.domain.com:/gluster_bricks/prod_a/prod_a of volume prod_a of cluster Default from DOWN to UP via cli.
2019-02-14 03:20:29,088-04 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler3) [760f7851] EVENT_ID: GLUSTER_BRICK_STATUS_CHANGED(4,086), Detected change in status of brick host2.replaced.domain.com:/gluster_bricks/non_prod_b/non_prod_b of volume non_prod_b of cluster Default from DOWN to UP via cli.
2019-02-14 03:20:34,099-04 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler3) [760f7851] EVENT_ID: GLUSTER_BRICK_STATUS_CHANGED(4,086), Detected change in status of brick host2.replaced.domain.com:/gluster_bricks/non_prod_a/non_prod_a of volume non_prod_a of cluster Default from DOWN to UP via cli
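
A small sketch of how the events above could be paired up to see how long each brick was reported down. It only relies on the wording of the GLUSTER_BRICK_STATUS_CHANGED lines shown in this grep. The names `EVENT_RE` and `brick_outages` are made up for illustration, and a DOWN to UP event with no earlier UP to DOWN event in the input (as for most bricks in this excerpt) is simply skipped.

```python
import re
from datetime import datetime

# Matches the audit-log wording seen in the grep output above, e.g.
# "... Detected change in status of brick HOST:/path of volume VOL
#  of cluster Default from UP to DOWN via cli."
EVENT_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+\S* .*"
    r"GLUSTER_BRICK_STATUS_CHANGED.*status of brick (\S+) of volume (\S+) "
    r"of cluster \S+ from (\w+) to (\w+)"
)


def brick_outages(lines):
    """Pair each change to DOWN with the next change to UP for the same brick.

    Returns a list of (brick, down_time, up_time, duration) tuples.
    """
    went_down = {}  # brick -> time it was last reported going DOWN
    outages = []
    for line in lines:
        m = EVENT_RE.match(line)
        if not m:
            continue
        stamp = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        brick, new_state = m.group(2), m.group(5)
        if new_state == "DOWN":
            went_down[brick] = stamp
        elif new_state == "UP" and brick in went_down:
            down_at = went_down.pop(brick)
            outages.append((brick, down_at, stamp, stamp - down_at))
    return outages
```

Run over the lines above, this would report a single outage, for the non_prod_b brick on host2, from 02:41:48 to 03:20:29 (engine time).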

glusterd.log

# grep -B20 -A20 "2019-02-14 02:41" glusterd.log
[2019-02-14 02:36:49.585034] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume non_prod_b
[2019-02-14 02:36:49.597788] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 2 times between [2019-02-14 02:36:49.597788] and [2019-02-14 02:36:49.900505]
[2019-02-14 02:36:53.437539] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume non_prod_a
[2019-02-14 02:36:53.452816] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
[2019-02-14 02:36:53.864153] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume non_prod_a
[2019-02-14 02:36:53.875835] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
[2019-02-14 02:36:30.958649] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume engine
[2019-02-14 02:36:35.322129] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume prod_b
[2019-02-14 02:36:39.639645] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume isos
[2019-02-14 02:36:45.301275] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume prod_a
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 2 times between [2019-02-14 02:36:53.875835] and [2019-02-14 02:36:54.180780]
[2019-02-14 02:37:59.193409] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
[2019-02-14 02:38:44.065560] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume engine
[2019-02-14 02:38:44.072680] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume isos
[2019-02-14 02:38:44.077841] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume non_prod_a
[2019-02-14 02:38:44.082798] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume non_prod_b
[2019-02-14 02:38:44.088237] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume prod_a
[2019-02-14 02:38:44.093518] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume prod_b
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 2 times between [2019-02-14 02:37:59.193409] and [2019-02-14 02:38:44.100494]
[2019-02-14 02:41:58.649683] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 6 times between [2019-02-14 02:41:58.649683] and [2019-02-14 02:43:00.286999]
[2019-02-14 02:43:46.366743] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume engine
[2019-02-14 02:43:46.373587] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume isos
[2019-02-14 02:43:46.378997] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume non_prod_a
[2019-02-14 02:43:46.384324] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume non_prod_b
[2019-02-14 02:43:46.390310] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume prod_a
[2019-02-14 02:43:46.397031] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume prod_b
[2019-02-14 02:43:46.404083] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
[2019-02-14 02:45:47.302884] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume engine
[2019-02-14 02:45:47.309697] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume isos
[2019-02-14 02:45:47.315149] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume non_prod_a
[2019-02-14 02:45:47.320806] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume non_prod_b
[2019-02-14 02:45:47.326865] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume prod_a
[2019-02-14 02:45:47.332192] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume prod_b
[2019-02-14 02:45:47.338991] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
[2019-02-14 02:46:47.789575] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume non_prod_b
[2019-02-14 02:46:47.795276] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume prod_a
[2019-02-14 02:46:47.800584] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume prod_b
[2019-02-14 02:46:47.770601] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume engine
[2019-02-14 02:46:47.778161] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume isos
[2019-02-14 02:46:47.784020] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume non_prod_a

engine.log

# grep -B20 -A20 "2019-02-14 02:41:48" engine.log-20190214
2019-02-14 02:41:43,495-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalLogicalVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] START, GetGlusterLocalLogicalVolumeListVDSCommand(HostName = Host1, VdsIdVDSCommandParametersBase:{hostId='fb1e62d5-1dc1-4ccc-8b2b-cf48f7077d0d'}), log id: 172c9ee8
2019-02-14 02:41:43,609-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalLogicalVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] FINISH, GetGlusterLocalLogicalVolumeListVDSCommand, return: [org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@479fcb69, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@6443e68f, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@2b4cf035, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@5864f06a, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@6119ac8c, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@1a9549be, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@5614cf81, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@290c9289, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@5dd26e8, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@35355754, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@452deeb4, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@8f8b442, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@647e29d3, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@7bee4dff, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@511c4478, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@1c0bb0bd, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@92e325e, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@260731, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@33aaacc9, 
org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@72657c59, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@aa10c89], log id: 172c9ee8
2019-02-14 02:41:43,610-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalPhysicalVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] START, GetGlusterLocalPhysicalVolumeListVDSCommand(HostName = Host1, VdsIdVDSCommandParametersBase:{hostId='fb1e62d5-1dc1-4ccc-8b2b-cf48f7077d0d'}), log id: 3a0e9d63
2019-02-14 02:41:43,703-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalPhysicalVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] FINISH, GetGlusterLocalPhysicalVolumeListVDSCommand, return: [org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalPhysicalVolume@5ca4a20f, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalPhysicalVolume@57a8a76, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalPhysicalVolume@7bd1b14], log id: 3a0e9d63
2019-02-14 02:41:43,704-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterVDOVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] START, GetGlusterVDOVolumeListVDSCommand(HostName = Host1, VdsIdVDSCommandParametersBase:{hostId='fb1e62d5-1dc1-4ccc-8b2b-cf48f7077d0d'}), log id: 49966b05
2019-02-14 02:41:44,213-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterVDOVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] FINISH, GetGlusterVDOVolumeListVDSCommand, return: [], log id: 49966b05
2019-02-14 02:41:44,214-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalLogicalVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] START, GetGlusterLocalLogicalVolumeListVDSCommand(HostName = Host2, VdsIdVDSCommandParametersBase:{hostId='fd0752d8-2d41-45b0-887a-0ffacbb8a237'}), log id: 30db0ce2
2019-02-14 02:41:44,311-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalLogicalVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] FINISH, GetGlusterLocalLogicalVolumeListVDSCommand, return: [org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@61a309b5, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@ea9cb2e, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@749d57bd, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@1c49f9d0, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@655eb54d, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@256ee273, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@3bd079dc, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@6804900f, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@78e0a49f, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@2acfbc8a, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@12e92e96, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@5ea1502c, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@2398c33b, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@7464102e, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@2f221daa, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@7b561852, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@1eb29d18, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@4a030b80, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@75739027, 
org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@3eac8253, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@34fc82c3], log id: 30db0ce2
2019-02-14 02:41:44,312-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalPhysicalVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] START, GetGlusterLocalPhysicalVolumeListVDSCommand(HostName = Host2, VdsIdVDSCommandParametersBase:{hostId='fd0752d8-2d41-45b0-887a-0ffacbb8a237'}), log id: 6671d0d7
2019-02-14 02:41:44,329-04 INFO  [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (DefaultQuartzScheduler3) [7b9bd2d] Failed to acquire lock and wait lock 'EngineLock:{exclusiveLocks='[a45fe964-9989-11e8-b3f7-00163e4bf18a=GLUSTER]', sharedLocks=''}'
2019-02-14 02:41:44,345-04 INFO  [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (DefaultQuartzScheduler3) [7b9bd2d] Failed to acquire lock and wait lock 'EngineLock:{exclusiveLocks='[a45fe964-9989-11e8-b3f7-00163e4bf18a=GLUSTER]', sharedLocks=''}'
2019-02-14 02:41:44,374-04 INFO  [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (DefaultQuartzScheduler3) [7b9bd2d] Failed to acquire lock and wait lock 'EngineLock:{exclusiveLocks='[a45fe964-9989-11e8-b3f7-00163e4bf18a=GLUSTER]', sharedLocks=''}'
2019-02-14 02:41:44,405-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalPhysicalVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] FINISH, GetGlusterLocalPhysicalVolumeListVDSCommand, return: [org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalPhysicalVolume@f6a9696, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalPhysicalVolume@558e3332, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalPhysicalVolume@5b449da], log id: 6671d0d7
2019-02-14 02:41:44,406-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterVDOVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] START, GetGlusterVDOVolumeListVDSCommand(HostName = Host2, VdsIdVDSCommandParametersBase:{hostId='fd0752d8-2d41-45b0-887a-0ffacbb8a237'}), log id: 6d2bc6d3
2019-02-14 02:41:44,908-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterVDOVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] FINISH, GetGlusterVDOVolumeListVDSCommand, return: [], log id: 6d2bc6d3
2019-02-14 02:41:44,909-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterVolumeAdvancedDetailsVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] START, GetGlusterVolumeAdvancedDetailsVDSCommand(HostName = Host0, GlusterVolumeAdvancedDetailsVDSParameters:{hostId='771c67eb-56e6-4736-8c67-668502d4ecf5', volumeName='non_prod_b'}), log id: 36ae23c6
2019-02-14 02:41:47,336-04 INFO  [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (DefaultQuartzScheduler3) [7b9bd2d] Failed to acquire lock and wait lock 'EngineLock:{exclusiveLocks='[a45fe964-9989-11e8-b3f7-00163e4bf18a=GLUSTER]', sharedLocks=''}'
2019-02-14 02:41:47,351-04 INFO  [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (DefaultQuartzScheduler3) [7b9bd2d] Failed to acquire lock and wait lock 'EngineLock:{exclusiveLocks='[a45fe964-9989-11e8-b3f7-00163e4bf18a=GLUSTER]', sharedLocks=''}'
2019-02-14 02:41:47,379-04 INFO  [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (DefaultQuartzScheduler3) [7b9bd2d] Failed to acquire lock and wait lock 'EngineLock:{exclusiveLocks='[a45fe964-9989-11e8-b3f7-00163e4bf18a=GLUSTER]', sharedLocks=''}'
2019-02-14 02:41:47,979-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterVolumeAdvancedDetailsVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] FINISH, GetGlusterVolumeAdvancedDetailsVDSCommand, return: org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeAdvancedDetails@7a4a787b, log id: 36ae23c6
2019-02-14 02:41:48,018-04 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler1) [5ff5b093] EVENT_ID: GLUSTER_BRICK_STATUS_CHANGED(4,086), Detected change in status of brick host2.replaced.domain.com:/gluster_bricks/non_prod_b/non_prod_b of volume non_prod_b of cluster Default from UP to DOWN via cli.
2019-02-14 02:41:48,046-04 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler1) [5ff5b093] EVENT_ID: GLUSTER_BRICK_STATUS_DOWN(4,151), Status of brick host2.replaced.domain.com:/gluster_bricks/non_prod_b/non_prod_b of volume non_prod_b on cluster Default is down.
2019-02-14 02:41:48,139-04 INFO  [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (DefaultQuartzScheduler1) [5ff5b093] Failed to acquire lock and wait lock 'EngineLock:{exclusiveLocks='[a45fe964-9989-11e8-b3f7-00163e4bf18a=GLUSTER]', sharedLocks=''}'
2019-02-14 02:41:48,140-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] (DefaultQuartzScheduler3) [7b9bd2d] START, GlusterServersListVDSCommand(HostName = Host0, VdsIdVDSCommandParametersBase:{hostId='771c67eb-56e6-4736-8c67-668502d4ecf5'}), log id: e1fb23
2019-02-14 02:41:48,911-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] (DefaultQuartzScheduler3) [7b9bd2d] FINISH, GlusterServersListVDSCommand, return: [10.12.0.220/24:CONNECTED, host1.replaced.domain.com:CONNECTED, host2.replaced.domain.com:CONNECTED], log id: e1fb23
2019-02-14 02:41:48,930-04 INFO  [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (DefaultQuartzScheduler1) [5ff5b093] Failed to acquire lock and wait lock 'EngineLock:{exclusiveLocks='[a45fe964-9989-11e8-b3f7-00163e4bf18a=GLUSTER]', sharedLocks=''}'
2019-02-14 02:41:48,931-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand] (DefaultQuartzScheduler3) [7b9bd2d] START, GlusterVolumesListVDSCommand(HostName = Host0, GlusterVolumesListVDSParameters:{hostId='771c67eb-56e6-4736-8c67-668502d4ecf5'}), log id: 68f1aecc
2019-02-14 02:41:49,366-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand] (DefaultQuartzScheduler3) [7b9bd2d] FINISH, GlusterVolumesListVDSCommand, return: {6c05dfc6-4dc0-41e3-a12f-55b4767f1d35=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@1952a85, 3f8f6a0f-aed4-48e3-9129-18a2a3f64eef=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@2f6688ae, 71ff56d9-79b8-445d-b637-72ffc974f109=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@730210fb, 752a9438-cd11-426c-b384-bc3c5f86ed07=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@c3be510c, c3e7447e-8514-4e4a-9ff5-a648fe6aa537=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@450befac, 79e8e93c-57c8-4541-a360-726cec3790cf=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@1926e392}, log id: 68f1aecc
2019-02-14 02:41:49,489-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalLogicalVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] START, GetGlusterLocalLogicalVolumeListVDSCommand(HostName = Host0, VdsIdVDSCommandParametersBase:{hostId='771c67eb-56e6-4736-8c67-668502d4ecf5'}), log id: 38debe74
2019-02-14 02:41:49,581-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalLogicalVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] FINISH, GetGlusterLocalLogicalVolumeListVDSCommand, return: [org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@5e5a7925, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@2cdf5c9e, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@443cb62, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@49a3e880, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@443d23c0, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@1250bc75, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@8d27d86, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@5e6363f4, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@73ed78db, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@64c9d1c7, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@7fecbe95, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@3a551e5f, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@2266926e, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@88b380c, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@1209279e, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@3c6466, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@16df63ed, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@47456262, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@1c2b88c3, 
org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@7f57c074, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@12fa0478], log id: 38debe74
2019-02-14 02:41:49,582-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalPhysicalVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] START, GetGlusterLocalPhysicalVolumeListVDSCommand(HostName = Host0, VdsIdVDSCommandParametersBase:{hostId='771c67eb-56e6-4736-8c67-668502d4ecf5'}), log id: 7ec02237
2019-02-14 02:41:49,660-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalPhysicalVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] FINISH, GetGlusterLocalPhysicalVolumeListVDSCommand, return: [org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalPhysicalVolume@3eedd0bc, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalPhysicalVolume@7f78e375, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalPhysicalVolume@3d63e126], log id: 7ec02237
2019-02-14 02:41:49,661-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterVDOVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] START, GetGlusterVDOVolumeListVDSCommand(HostName = Host0, VdsIdVDSCommandParametersBase:{hostId='771c67eb-56e6-4736-8c67-668502d4ecf5'}), log id: 42cdad27
2019-02-14 02:41:50,142-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterVDOVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] FINISH, GetGlusterVDOVolumeListVDSCommand, return: [], log id: 42cdad27
2019-02-14 02:41:50,143-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalLogicalVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] START, GetGlusterLocalLogicalVolumeListVDSCommand(HostName = Host1, VdsIdVDSCommandParametersBase:{hostId='fb1e62d5-1dc1-4ccc-8b2b-cf48f7077d0d'}), log id: 12f5fdf2
2019-02-14 02:41:50,248-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalLogicalVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] FINISH, GetGlusterLocalLogicalVolumeListVDSCommand, return: [org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@2aaed792, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@8e66930, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@276d599e, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@1aca2aec, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@46846c60, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@7d103269, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@30fc25fc, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@7baae445, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@1ea8603c, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@62578afa, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@33d58089, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@1f71d27a, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@4205e828, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@1c5bbac8, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@395a002, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@12664008, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@7f4faec4, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@3e03d61f, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@1038e46d, 
org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@307e8062, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@32453127], log id: 12f5fdf2
2019-02-14 02:41:50,249-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalPhysicalVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] START, GetGlusterLocalPhysicalVolumeListVDSCommand(HostName = Host1, VdsIdVDSCommandParametersBase:{hostId='fb1e62d5-1dc1-4ccc-8b2b-cf48f7077d0d'}), log id: 1256aa5e
2019-02-14 02:41:50,338-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalPhysicalVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] FINISH, GetGlusterLocalPhysicalVolumeListVDSCommand, return: [org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalPhysicalVolume@459a2ff5, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalPhysicalVolume@123cab4, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalPhysicalVolume@1af41fbe], log id: 1256aa5e
2019-02-14 02:41:50,339-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterVDOVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] START, GetGlusterVDOVolumeListVDSCommand(HostName = Host1, VdsIdVDSCommandParametersBase:{hostId='fb1e62d5-1dc1-4ccc-8b2b-cf48f7077d0d'}), log id: 3dd752e4
2019-02-14 02:41:50,847-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterVDOVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] FINISH, GetGlusterVDOVolumeListVDSCommand, return: [], log id: 3dd752e4
2019-02-14 02:41:50,848-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalLogicalVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] START, GetGlusterLocalLogicalVolumeListVDSCommand(HostName = Host2, VdsIdVDSCommandParametersBase:{hostId='fd0752d8-2d41-45b0-887a-0ffacbb8a237'}), log id: 29a6272c
2019-02-14 02:41:50,954-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalLogicalVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] FINISH, GetGlusterLocalLogicalVolumeListVDSCommand, return: [org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@364f3ec6, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@c7cce5e, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@b3bed47, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@13bc244b, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@5cca81f4, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@36aeba0d, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@62ab384a, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@1047d628, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@188a30f5, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@5bb79f3b, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@60e5956f, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@4e3df9cd, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@7796567, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@60d06cf4, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@2cd2d36c, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@d80a4aa, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@411eaa20, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@22cac93b, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@18b927bd, 
org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@101465f4, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@246f927c], log id: 29a6272c
2019-02-14 02:41:50,955-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalPhysicalVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] START, GetGlusterLocalPhysicalVolumeListVDSCommand(HostName = Host2, VdsIdVDSCommandParametersBase:{hostId='fd0752d8-2d41-45b0-887a-0ffacbb8a237'}), log id: 501814db
2019-02-14 02:41:51,044-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalPhysicalVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] FINISH, GetGlusterLocalPhysicalVolumeListVDSCommand, return: [org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalPhysicalVolume@1cd55aa, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalPhysicalVolume@32c5aba2, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalPhysicalVolume@6ae123f4], log id: 501814db
2019-02-14 02:41:51,045-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterVDOVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] START, GetGlusterVDOVolumeListVDSCommand(HostName = Host2, VdsIdVDSCommandParametersBase:{hostId='fd0752d8-2d41-45b0-887a-0ffacbb8a237'}), log id: 7acf4cbf
2019-02-14 02:41:51,546-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterVDOVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] FINISH, GetGlusterVDOVolumeListVDSCommand, return: [], log id: 7acf4cbf
2019-02-14 02:41:51,547-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterVolumeAdvancedDetailsVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] START, GetGlusterVolumeAdvancedDetailsVDSCommand(HostName = Host0, GlusterVolumeAdvancedDetailsVDSParameters:{hostId='771c67eb-56e6-4736-8c67-668502d4ecf5', volumeName='non_prod_a'}), log id: 11c42649
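
For anyone sifting through a large engine.log, the grep shown earlier can be extended into a small script that pulls out only the brick transitions. This is a rough sketch, not an oVirt tool: the names BrickEvent, brick_events and went_down are made up for illustration, and the regular expression is inferred solely from the GLUSTER_BRICK_STATUS_CHANGED lines quoted in this thread, so it may need adjusting for other engine versions.

```python
import re
from typing import Iterable, Iterator, List, NamedTuple


class BrickEvent(NamedTuple):
    """One brick status transition as reported by the engine (hypothetical type)."""
    timestamp: str
    brick: str
    volume: str
    old: str
    new: str


# Assumption: the message layout matches the audit-log lines quoted in this thread:
# "... Detected change in status of brick <host:/path> of volume <vol>
#  of cluster <name> from <OLD> to <NEW> via cli."
PATTERN = re.compile(
    r"^(?P<ts>\S+ \S+)\s.*GLUSTER_BRICK_STATUS_CHANGED.*"
    r"status of brick (?P<brick>\S+) of volume (?P<volume>\S+) "
    r"of cluster \S+ from (?P<old>\w+) to (?P<new>\w+)"
)


def brick_events(lines: Iterable[str]) -> Iterator[BrickEvent]:
    """Yield a BrickEvent for every line that matches PATTERN; skip the rest."""
    for line in lines:
        m = PATTERN.search(line)
        if m:
            yield BrickEvent(m["ts"], m["brick"], m["volume"], m["old"], m["new"])


def went_down(lines: Iterable[str]) -> List[BrickEvent]:
    """Return only the transitions whose new status is DOWN."""
    return [e for e in brick_events(lines) if e.new == "DOWN"]
```

Passing an open file handle for engine.log to went_down() should return the timestamps and brick paths to look up in the brick logs on the affected host. The engine only records that a brick changed state; the reason would still have to come from the brick logs themselves.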

On Thu, Feb 14, 2019 at 10:16 AM Sandro Bonazzola <sbonazzo@redhat.com> wrote:


On Thu, Feb 14, 2019 at 07:54 Jayme <jaymef@gmail.com> wrote:
I have a three-node HCI gluster cluster which was previously running 4.2 with zero problems.  I just upgraded it yesterday.  I ran into a few bugs right away with the upgrade process, but aside from that I also discovered other users with severe GlusterFS problems since the upgrade to the new GlusterFS version.  It is less than 24 hours since I upgraded my cluster and I just got a notice that one of my GlusterFS bricks is offline.  There does appear to be a very real and serious issue here with the latest updates.

We are tracking the issue on the Gluster side in this bug: https://bugzilla.redhat.com/show_bug.cgi?id=1677160
If you can help the Gluster community by providing the requested logs, that would be great.

On Wed, Feb 13, 2019 at 7:26 PM <dscott@umbctraining.com> wrote:
I'm abandoning my production oVirt cluster due to instability.  I have a 7-host cluster running about 300 VMs and have been running it for over a year.  It has become unstable over the past three days.  I have random hosts, both compute and storage, disconnecting, and many VMs disconnecting and becoming unusable.

The 7 hosts are 4 compute hosts running oVirt 4.2.8 and 3 GlusterFS hosts running 3.12.5.  I submitted a Bugzilla bug and they immediately assigned it to the storage people, but they have not responded with any meaningful information.  I have submitted several logs.

I have found some discussion on problems with instability with gluster 3.12.5.  I would be willing to upgrade my gluster to a more stable version if that's the culprit.  I installed gluster using the ovirt gui and this is the version the ovirt gui installed.

Is there an oVirt health monitor available?  Where should I be looking to get a resolution to the problems I'm facing?
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-leave@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/BL4M3JQA3IEXCQUY4IGQXOAALRUQ7TVB/


--
SANDRO BONAZZOLA
MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV
Red Hat EMEA
sbonazzo@redhat.com
