Re: Delete snapshots task hung

Try to migrate a VM from one host to another. I had a similar issue (1000 warnings in the UI) that stopped immediately after I migrated that VM.
Best Regards, Strahil Nikolov
On Oct 8, 2019 09:59, Leo David <leoalex@gmail.com> wrote:
Hi Everyone,
I have been waiting for 3 days for 5 delete-snapshot tasks to finish, and for some reason they seem to be stuck. For other VMs, snapshot removal took at most 20 minutes, with the disks being pretty much the same size and having similar numbers of snapshots. Any thoughts on how I should get this fixed? Below are some lines from engine.log; they seem to show some complaints regarding locks ("Failed to acquire lock and wait lock"), although I am not sure whether that is the root cause. Thank you very much!
Leo
2019-10-08 09:52:48,692+03 INFO [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommandCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-47) [73016a4a-bb2f-487f-91c5-cd027b278930] Command 'RemoveSnapshotSingleDiskLive' (id: '341d9c1b-2915-48d6-a8a9-9146ab19d5f8') waiting on child command id: '329da0fd-801b-4e0d-b7c0-fbb5c2a98bb5' type:'DestroyImage' to complete
2019-10-08 09:52:48,702+03 INFO [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommandCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-47) [73016a4a-bb2f-487f-91c5-cd027b278930] Command 'RemoveSnapshotSingleDiskLive' (id: '580fa033-35fd-44f0-9979-e60e9bbf8a29') waiting on child command id: 'c00bdeb6-2e8b-4ef8-a3dc-1aaa088ae052' type:'DestroyImage' to complete
2019-10-08 09:52:49,713+03 INFO [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommandCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-50) [539ba19e-0cb5-42cf-9a23-7916ee2de4a9] Command 'RemoveSnapshotSingleDiskLive' (id: 'de747f91-ec59-4e70-9345-77e16234bfe0') waiting on child command id: '10812160-cf4c-4239-bb92-1d5a847687ee' type:'DestroyImage' to complete
2019-10-08 09:52:50,725+03 INFO [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommandCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-100) [baed2fa3-bcad-43b2-8164-480598bc72f3] Command 'RemoveSnapshotSingleDiskLive' (id: '4919b287-e980-4d34-a219-c08a169cd8f7') waiting on child command id: '5eceb6a8-f08e-42aa-8258-c907f5927e6c' type:'DestroyImage' to complete
2019-10-08 09:52:51,563+03 INFO [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (DefaultQuartzScheduler7) [306a2296] Failed to acquire lock and wait lock 'EngineLock:{exclusiveLocks='[c6087b9e-2214-11e9-9288-00163e168814=GLUSTER]', sharedLocks=''}'
2019-10-08 09:52:51,583+03 INFO [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (DefaultQuartzScheduler7) [306a2296] Failed to acquire lock and wait lock 'EngineLock:{exclusiveLocks='[c6087b9e-2214-11e9-9288-00163e168814=GLUSTER]', sharedLocks=''}'
2019-10-08 09:52:51,604+03 INFO [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (DefaultQuartzScheduler7) [306a2296] Failed to acquire lock and wait lock 'EngineLock:{exclusiveLocks='[c6087b9e-2214-11e9-9288-00163e168814=GLUSTER]', sharedLocks=''}'
2019-10-08 09:52:51,606+03 INFO [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (DefaultQuartzScheduler7) [306a2296] Failed to acquire lock and wait lock 'EngineLock:{exclusiveLocks='[c6087b9e-2214-11e9-9288-00163e168814=GLUSTER]', sharedLocks=''}'
2019-10-08 09:52:51,735+03 INFO [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-94) [73016a4a-bb2f-487f-91c5-cd027b278930] Command 'RemoveSnapshot' (id: 'c9ab1344-ae27-4934-9358-d6a7b10a4f0a') waiting on child command id: '341d9c1b-2915-48d6-a8a9-9146ab19d5f8' type:'RemoveSnapshotSingleDiskLive' to complete
2019-10-08 09:52:52,706+03 INFO [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalPhysicalVolumeListVDSCommand] (DefaultQuartzScheduler10) [8921c9c] FINISH, GetGlusterLocalPhysicalVolumeListVDSCommand, return: [org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalPhysicalVolume@21830b5f, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalPhysicalVolume@676adc3e, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalPhysicalVolume@385a3510, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalPhysicalVolume@af24d00, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalPhysicalVolume@331266f2], log id: 1a3515fe
2019-10-08 09:52:52,708+03 INFO [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterVDOVolumeListVDSCommand] (DefaultQuartzScheduler10) [8921c9c] START, GetGlusterVDOVolumeListVDSCommand(HostName =
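For anyone who wants to check where the merge chain is stuck from the API side rather than only from engine.log, here is a minimal sketch using the oVirt Python SDK (ovirtsdk4). The engine URL, the credentials and the 'name=ocp-*' search pattern are placeholders, not values from this thread:

import ovirtsdk4 as sdk

# Connect to the engine API (URL and credentials are placeholders).
connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='password',
    insecure=True,  # skip TLS verification only for this sketch
)

vms_service = connection.system_service().vms_service()
for vm in vms_service.list(search='name=ocp-*'):  # placeholder search pattern
    vm_service = vms_service.vm_service(vm.id)
    print(vm.name)
    # A snapshot stays LOCKED while (or if) its RemoveSnapshot chain is still running.
    for snap in vm_service.snapshots_service().list():
        print('  snapshot:', snap.description, snap.snapshot_status)
    # Disks left LOCKED or ILLEGAL after a failed live merge also show up here.
    for att in vm_service.disk_attachments_service().list():
        disk = connection.follow_link(att.disk)
        print('  disk:', disk.name, disk.status)

connection.close()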

Thank you Strahil,
But the VMs are not starting at all... The error is clear: "Exit message: Bad volume specification", but I just do not understand how to deal with it.
Cheers,
Leo
On Tue, Oct 8, 2019 at 2:44 PM Strahil <hunter86_bg@yahoo.com> wrote:
Try to migrate a VM from one host to another. I had a similar issue (1000 warnings in the UI) that stopped immediately after I migrated that VM.
Best Regards, Strahil Nikolov
-- Best regards, Leo David
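Regarding the "Bad volume specification" error: it usually means the volume chain the engine expects for a disk no longer matches what is actually on the storage, for example after a half-finished live merge. A rough sketch, assuming the affected disks live on a single storage domain (the domain name 'vmstore' is a placeholder), that lists the disk snapshots the engine still tracks there so they can be compared against the qcow2 files on the gluster volume:

import ovirtsdk4 as sdk

connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',  # placeholder
    username='admin@internal',
    password='password',
    insecure=True,
)

sds_service = connection.system_service().storage_domains_service()
sd = sds_service.list(search='name=vmstore')[0]   # placeholder domain name
sd_service = sds_service.storage_domain_service(sd.id)

# Each entry is one volume in some disk's snapshot chain on this domain.
# Entries stuck in LOCKED or ILLEGAL state after the failed merge are the
# usual suspects behind "Bad volume specification" at VM start.
for disk_snap in sd_service.disk_snapshots_service().list():
    parent = disk_snap.disk.id if disk_snap.disk else None
    print(disk_snap.id, disk_snap.status, 'disk:', parent)

connection.close()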

Hi Everyone,
Please let me know if you have any thoughts or recommendations that could help me solve this issue. The real bad luck in this outage is that these 5 VMs are part of an OpenShift deployment, and now we are not able to start it up. Before trying to sort this out at the OCP platform level by replacing the failed nodes with new VMs, I would prefer to fix it at the oVirt level and get the VMs starting, since the disks are still present on gluster.
Thank you so much!
Leo
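If it helps with the debugging, the same SDK also exposes the engine's job list, which would show whether the five RemoveSnapshot commands are still considered running after three days or have been abandoned. A short sketch (connection details are placeholders again):

import ovirtsdk4 as sdk

connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',  # placeholder
    username='admin@internal',
    password='password',
    insecure=True,
)

# Jobs still in STARTED state that are days old and describe a snapshot
# removal are the hung ones; FINISHED or FAILED entries can be ignored.
for job in connection.system_service().jobs_service().list():
    print(job.status, job.start_time, job.description)

connection.close()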