Can't remove snapshot

Hi, I created a snapshot of a running VM prior to an OS upgrade. The OS upgrade has now been successful and I would like to remove the snapshot. I've selected the snapshot in the UI and clicked Delete to start the task. After a few minutes, the task failed. When I click Delete again on the same snapshot, the failure message is returned after a few seconds.
From browsing through the engine log (attached) it seems the snapshot was correctly merged on the first try, but something went wrong in the finalizing phase. On retries, the log indicates the snapshot/disk image no longer exists, and the removal of the snapshot fails for this reason.
Is there any way to clean up this snapshot? I can see the snapshot in the "Disk snapshot" tab of the storage. It has a status of "illegal". Is it OK to (try to) remove this snapshot? Will this impact the running VM and/or disk image?

Regards,

Rik

--
Rik Theys
System Engineer
KU Leuven - Dept. Elektrotechniek (ESAT)
Kasteelpark Arenberg 10 bus 2440 - B-3001 Leuven-Heverlee
+32(0)16/32.11.07
<<Any errors in spelling, tact or fact are transmission errors>>

Hi,

In the meantime I've noticed the following entries in our periodic logcheck output:

Feb 3 09:05:53 orinoco journal: block copy still active: disk 'vda' not ready for pivot yet
Feb 3 09:05:53 orinoco journal: vdsm root ERROR Unhandled exception
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 734, in wrapper
    return f(*a, **kw)
  File "/usr/share/vdsm/virt/vm.py", line 5168, in run
    self.tryPivot()
  File "/usr/share/vdsm/virt/vm.py", line 5137, in tryPivot
    ret = self.vm._dom.blockJobAbort(self.drive.name, flags)
  File "/usr/share/vdsm/virt/virdomain.py", line 68, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 124, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 733, in blockJobAbort
    if ret == -1: raise libvirtError ('virDomainBlockJobAbort() failed', dom=self)
libvirtError: block copy still active: disk 'vda' not ready for pivot yet

This is from the host running the VM. Note that this host is not the SPM of the cluster. I always thought all operations on disk volumes happened on the SPM host?

My question still remains:
I can see the snapshot in the "Disk snapshot" tab of the storage. It has a status of "illegal". Is it OK to (try to) remove this snapshot? Will this impact the running VM and/or disk image?
Regards,

Rik
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users

On 03 Feb 2016, at 10:37, Rik Theys <Rik.Theys@esat.kuleuven.be> wrote:
My question still remains:
I can see the snapshot in the "Disk snapshot" tab of the storage. It has a status of "illegal". Is it OK to (try to) remove this snapshot? Will this impact the running VM and/or disk image?
No, it's not OK to remove it while live merge is (apparently) still ongoing. I guess that's a live merge bug?

Thanks,
michal
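The "not ready for pivot yet" error quoted earlier is libvirt refusing the pivot step of the live merge while the block copy is still catching up. A minimal sketch of that readiness condition (a hypothetical illustration, not vdsm code; `virsh blockjob` exposes the same progress counters on the host):

```shell
# Hypothetical illustration: libvirt allows the pivot only once the job's
# cur offset has caught up with end. On the host one could watch progress
# with, e.g.:
#   virsh blockjob <vm-name> vda --info
# and only expect the pivot to succeed once the copy phase reports 100%.
ready_for_pivot() {   # args: cur end (byte offsets from the block job info)
    [ "$1" -eq "$2" ]
}
ready_for_pivot 100 500 && echo ready || echo "not ready for pivot yet"
ready_for_pivot 500 500 && echo ready || echo "not ready for pivot yet"
```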

On 02/09/2016 06:08 AM, Michal Skrivanek wrote:
My question still remains:
I can see the snapshot in the "Disk snapshot" tab of the storage. It has a status of "illegal". Is it OK to (try to) remove this snapshot? Will this impact the running VM and/or disk image?
No, it's not OK to remove it while live merge is (apparently) still ongoing. I guess that's a live merge bug?
Indeed, this is bug 1302215.

I wrote a sql script to help with cleanup in this scenario, which you can find attached to the bug along with a description of how to use it [1].

However, Rik, before trying that, would you be able to run the attached script [2] (or just the db query within) and forward the output to me? I'd like to make sure everything looks as it should before modifying the db directly.

Thanks,
Greg

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1302215#c13 (Also note that the engine should be stopped before running this.)
[2] Arguments are the ovirt db name, db user, and the name of the vm you were performing live merge on.
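The instructions above translate to roughly the following shell sketch. The script filename is hypothetical (use the actual attachment from the bug), and the db name, db user, and VM name are assumptions based on a default install and this thread:

```shell
# All names below are assumptions; substitute your own.
DB_NAME=engine   # ovirt db name (default install)
DB_USER=engine   # db user
VM_NAME=lena     # the VM the live merge was running on
# Stop the engine first, per the note in [1]:
#   systemctl stop ovirt-engine
# Then run the diagnostic script from [2] (hypothetical filename) and
# forward its output before touching the db:
#   ./live-merge-query.sh "$DB_NAME" "$DB_USER" "$VM_NAME"
echo "args: $DB_NAME $DB_USER $VM_NAME"
```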

Hello, I have the same problem. I tried to delete the snapshot but it did not succeed, and the snapshot status is "illegal". In the engine.log excerpt below you can see the error messages:

2016-02-16 08:46:20,059 INFO [org.ovirt.engine.core.bll.RemoveSnapshotCommandCallback] (DefaultQuartzScheduler_Worker-57) [46dd2ef7] Waiting on Live Merge child commands to complete
2016-02-16 08:46:21,069 INFO [org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskLiveCommand] (DefaultQuartzScheduler_Worker-40) [15b703ee] Waiting on Live Merge command step 'MERGE' to complete
2016-02-16 08:46:22,072 INFO [org.ovirt.engine.core.bll.MergeCommandCallback] (DefaultQuartzScheduler_Worker-65) [30cdf6ed] Waiting on merge command to complete
2016-02-16 08:46:23,670 INFO [org.ovirt.engine.core.bll.RemoveSnapshotCommand] (default task-48) [5e0c088f] Lock Acquired to object 'EngineLock:{exclusiveLocks='[94d788f4-eba4-49ee-8091-80028cc46627=<VM, ACTION_TYPE_FAILED_OBJECT_LOCKED>]', sharedLocks='null'}'
2016-02-16 08:46:23,795 INFO [org.ovirt.engine.core.bll.RemoveSnapshotCommand] (default task-48) [5e0c088f] Running command: RemoveSnapshotCommand internal: false. Entities affected : ID: 94d788f4-eba4-49ee-8091-80028cc46627 Type: VMAction group MANIPULATE_VM_SNAPSHOTS with role type USER
2016-02-16 08:46:23,824 INFO [org.ovirt.engine.core.bll.RemoveSnapshotCommand] (default task-48) [5e0c088f] Lock freed to object 'EngineLock:{exclusiveLocks='[94d788f4-eba4-49ee-8091-80028cc46627=<VM, ACTION_TYPE_FAILED_OBJECT_LOCKED>]', sharedLocks='null'}'
2016-02-16 08:46:23,876 INFO [org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskLiveCommand] (pool-7-thread-5) [1be123ac] Running command: RemoveSnapshotSingleDiskLiveCommand internal: true. Entities affected : ID: 00000000-0000-0000-0000-000000000000 Type: Storage
2016-02-16 08:46:23,921 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-48) [] Correlation ID: 5e0c088f, Job ID: aa811e83-24fb-4658-b849-d36439f58d95, Call Stack: null, Custom Event ID: -1, Message: Snapshot 'BKP the VM' deletion for VM 'Servidor-Cliente' was initiated by admin@internal.
2016-02-16 08:46:24,093 INFO [org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskLiveCommand] (DefaultQuartzScheduler_Worker-14) [1be123ac] Executing Live Merge command step 'EXTEND'
2016-02-16 08:46:24,122 INFO [org.ovirt.engine.core.bll.RemoveSnapshotCommandCallback] (DefaultQuartzScheduler_Worker-14) [] Waiting on Live Merge child commands to complete
2016-02-16 08:46:24,133 INFO [org.ovirt.engine.core.bll.MergeExtendCommand] (pool-7-thread-6) [766ffc9f] Running command: MergeExtendCommand internal: true. Entities affected : ID: c2dc0101-748e-4a7b-9913-47993eaa52bd Type: Storage
2016-02-16 08:46:24,134 INFO [org.ovirt.engine.core.bll.MergeExtendCommand] (pool-7-thread-6) [766ffc9f] Base and top image sizes are the same; no image size update required
2016-02-16 08:46:25,133 INFO [org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskLiveCommand] (DefaultQuartzScheduler_Worker-16) [1be123ac] Executing Live Merge command step 'MERGE'
2016-02-16 08:46:25,168 INFO [org.ovirt.engine.core.bll.MergeCommand] (pool-7-thread-7) [1b7bc421] Running command: MergeCommand internal: true. Entities affected : ID: c2dc0101-748e-4a7b-9913-47993eaa52bd Type: Storage
2016-02-16 08:46:25,169 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (pool-7-thread-7) [1b7bc421] START, MergeVDSCommand(HostName = Host01, MergeVDSCommandParameters:{runAsync='true', hostId='d4f29978-1540-44d9-ab22-1e6ff750059f', vmId='94d788f4-eba4-49ee-8091-80028cc46627', storagePoolId='77e24b20-9d21-4952-a089-3c5c592b4e6d', storageDomainId='c2dc0101-748e-4a7b-9913-47993eaa52bd', imageGroupId='b7a27d0c-57cc-490e-a3f8-b4981310a9b0', imageId='7f8bb099-9a18-4e89-bf48-57e56e5770d2', baseImageId='2e59f7f2-9e30-460e-836a-5e0d3d625059', topImageId='7f8bb099-9a18-4e89-bf48-57e56e5770d2', bandwidth='0'}), log id: 2a7ab7b7
2016-02-16 08:46:25,176 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (pool-7-thread-7) [1b7bc421] Failed in 'MergeVDS' method
2016-02-16 08:46:25,179 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (pool-7-thread-7) [1b7bc421] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VDSM Host01 command failed: Drive image file could not be found
2016-02-16 08:46:25,179 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (pool-7-thread-7) [1b7bc421] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand' return value 'StatusOnlyReturnForXmlRpc [status=StatusForXmlRpc [code=13, message=Drive image file could not be found]]'
2016-02-16 08:46:25,179 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (pool-7-thread-7) [1b7bc421] HostName = Host01
2016-02-16 08:46:25,179 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (pool-7-thread-7) [1b7bc421] Command 'MergeVDSCommand(HostName = Host01, MergeVDSCommandParameters:{runAsync='true', hostId='d4f29978-1540-44d9-ab22-1e6ff750059f', vmId='94d788f4-eba4-49ee-8091-80028cc46627', storagePoolId='77e24b20-9d21-4952-a089-3c5c592b4e6d', storageDomainId='c2dc0101-748e-4a7b-9913-47993eaa52bd', imageGroupId='b7a27d0c-57cc-490e-a3f8-b4981310a9b0', imageId='7f8bb099-9a18-4e89-bf48-57e56e5770d2', baseImageId='2e59f7f2-9e30-460e-836a-5e0d3d625059', topImageId='7f8bb099-9a18-4e89-bf48-57e56e5770d2', bandwidth='0'})' execution failed: VDSGenericException: VDSErrorException: Failed to MergeVDS, error = Drive image file could not be found, code = 13
2016-02-16 08:46:25,179 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (pool-7-thread-7) [1b7bc421] FINISH, MergeVDSCommand, log id: 2a7ab7b7
2016-02-16 08:46:25,180 ERROR [org.ovirt.engine.core.bll.MergeCommand] (pool-7-thread-7) [1b7bc421] Command 'org.ovirt.engine.core.bll.MergeCommand' failed: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to MergeVDS, error = Drive image file could not be found, code = 13 (Failed with error imageErr and code 13)
2016-02-16 08:46:25,186 ERROR [org.ovirt.engine.core.bll.MergeCommand] (pool-7-thread-7) [1b7bc421] Transaction rolled-back for command 'org.ovirt.engine.core.bll.MergeCommand'.
2016-02-16 08:46:26,159 INFO [org.ovirt.engine.core.bll.RemoveSnapshotCommandCallback] (DefaultQuartzScheduler_Worker-25) [15b703ee] Waiting on Live Merge child commands to complete
2016-02-16 08:46:27,164 ERROR [org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskLiveCommand] (DefaultQuartzScheduler_Worker-32) [1be123ac] Failed child command status for step 'MERGE'
2016-02-16 08:46:27,497 INFO [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (DefaultQuartzScheduler_Worker-37) [30cdf6ed] VM job '77669e28-4aa2-4038-b7b6-1a949a1d039e': In progress, updating
2016-02-16 08:46:28,192 ERROR [org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskLiveCommand] (DefaultQuartzScheduler_Worker-87) [1be123ac] Merging of snapshot '119f668e-af60-49ea-aa08-735be8af0a7d' images '2e59f7f2-9e30-460e-836a-5e0d3d625059'..'7f8bb099-9a18-4e89-bf48-57e56e5770d2' failed. Images have been marked illegal and can no longer be previewed or reverted to. Please retry Live Merge on the snapshot to complete the operation.
2016-02-16 08:46:28,204 INFO [org.ovirt.engine.core.bll.RemoveSnapshotCommandCallback] (DefaultQuartzScheduler_Worker-87) [5e0c088f] All Live Merge child commands have completed, status 'FAILED'
2016-02-16 08:46:29,216 ERROR [org.ovirt.engine.core.bll.RemoveSnapshotCommand] (DefaultQuartzScheduler_Worker-89) [5e0c088f] Ending command 'org.ovirt.engine.core.bll.RemoveSnapshotCommand' with failure.
2016-02-16 08:46:29,263 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-89) [5e0c088f] Correlation ID: 5e0c088f, Job ID: aa811e83-24fb-4658-b849-d36439f58d95, Call Stack: null, Custom Event ID: -1, Message: Failed to delete snapshot 'BKP the VM' for VM 'Servidor-Cliente'.
2016-02-16 08:46:30,287 INFO [org.ovirt.engine.core.bll.RemoveSnapshotCommandCallback] (DefaultQuartzScheduler_Worker-33) [] Waiting on Live Merge child commands to complete
2016-02-16 08:46:31,298 INFO [org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskLiveCommand] (DefaultQuartzScheduler_Worker-21) [15b703ee] Waiting on Live Merge command step 'MERGE' to complete
2016-02-16 08:46:32,301 INFO [org.ovirt.engine.core.bll.MergeCommandCallback] (DefaultQuartzScheduler_Worker-68) [30cdf6ed] Waiting on merge command to complete
2016-02-16 08:46:40,304 INFO [org.ovirt.engine.core.bll.RemoveSnapshotCommandCallback] (DefaultQuartzScheduler_Worker-55) [280a8a32] Waiting on Live Merge child commands to complete
2016-02-16 08:46:41,308 INFO [org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskLiveCommand] (DefaultQuartzScheduler_Worker-54) [15b703ee] Waiting on Live Merge command step 'MERGE' to complete
2016-02-16 08:46:42,312 INFO [org.ovirt.engine.core.bll.MergeCommandCallback] (DefaultQuartzScheduler_Worker-57) [30cdf6ed] Waiting on merge command to complete
2016-02-16 08:46:42,850 INFO [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (DefaultQuartzScheduler_Worker-84) [] VM job '77669e28-4aa2-4038-b7b6-1a949a1d039e': In progress, updating
2016-02-16 08:46:42,854 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.FullListVDSCommand] (DefaultQuartzScheduler_Worker-84) [] START, FullListVDSCommand(HostName = , FullListVDSCommandParameters:{runAsync='true', hostId='aebc403a-ec4e-4346-9029-6353d5d76f01', vds='Host[,aebc403a-ec4e-4346-9029-6353d5d76f01]', vmIds='[6af1f9c3-7210-45c3-90dc-bd7793346c0c]'}), log id: 74961dad

I cannot see the snapshot disk at the storage domain:

[root@ ~]# cd /rhev/data-center/77e24b20-9d21-4952-a089-3c5c592b4e6d/c1938052-7524-404c-bac9-f238227269ea/images/b7a27d0c-57cc-490e-a3f8-b4981310a9b0/
[root@ b7a27d0c-57cc-490e-a3f8-b4981310a9b0]# ls
2e59f7f2-9e30-460e-836a-5e0d3d625059  2e59f7f2-9e30-460e-836a-5e0d3d625059.meta

Thanks.

Hi Greg,
2016-02-09 21:30 GMT-03:00 Greg Padgett <gpadgett@redhat.com>:
However, Rik, before trying that, would you be able to run the attached script [2] (or just the db query within) and forward the output to me? I'd like to make sure everything looks as it should before modifying the db directly.
I ran the following query on the engine database:

select images.*
  from images
  join snapshots on (images.vm_snapshot_id = snapshots.snapshot_id)
  join vm_static on (snapshots.vm_id = vm_static.vm_guid)
 where vm_static.vm_name = 'lena'
   and snapshots.description = 'before jessie upgrade';

The resulting output is (one row, shown field by field):

image_guid:            24d78600-22f4-44f7-987b-fbd866736249
creation_date:         2015-05-19 15:00:13+02
size:                  34359738368
it_guid:               00000000-0000-0000-0000-000000000000
parentid:              00000000-0000-0000-0000-000000000000
imagestatus:           4
lastmodified:          2016-01-30 08:45:59.998+01
vm_snapshot_id:        4b4930ed-b52d-47ec-8506-245b7f144102
volume_type:           1
volume_format:         5
image_group_id:        b2390535-744f-4c02-bdc8-5a897226554b
_create_date:          2015-05-19 15:00:11.864425+02
_update_date:          2016-01-30 08:45:59.999422+01
active:                f
volume_classification: 1
(1 row)

Regards,
Rik

Hi,

I'm trying to determine the correct "bad_img" uuid in my case. The VM has two snapshots:

* The "Active VM" snapshot, which has a disk with an actual size that's 5GB larger than the virtual size. It has a creation date that matches the timestamp at which I created the second snapshot. The "disk snapshot id" for this snapshot ends with cd39.

* A "before jessie upgrade" snapshot that has status "illegal". It has an actual size that's 2GB larger than the virtual size. The creation date matches the date the VM was initially created. The disk snapshot id ends with 6249.

From the above I conclude that the disk with the id that ends with 6249 is the "bad" img I need to specify. However, I grepped the output of 'lvs' on the SPM host of the cluster and both disk ids are returned:

[root@amazone ~]# lvs | egrep 'cd39|6249'
  24d78600-22f4-44f7-987b-fbd866736249 a7ba2db3-517c-408a-8b27-ea45989d6416 -wi-ao---- 34.00g
  81458622-aa54-4f2f-b6d8-75e7db36cd39 a7ba2db3-517c-408a-8b27-ea45989d6416 -wi------- 5.00g

I expected the "bad" img would no longer be found? The SQL script only cleans up the database and not the logical volumes. Would running the script not keep a stale LV around?

Also, from the lvs output it seems the "bad" disk is bigger than the "good" one. Is it possible the snapshot still needs to be merged? If so, how can I initiate that?

Regards,
Rik
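One way to probe the "does it still need to be merged?" question is to inspect the qcow2 volume chain on the host. This is a sketch only, under the assumption of block storage where each volume is an LV; the VG/LV names are copied from the lvs output above, and the lvchange/qemu-img lines are left commented because they must run on a host with access to the storage:

```shell
# Names copied from the lvs output in this thread (not hypothetical),
# but verify them against your own environment before running anything.
VG=a7ba2db3-517c-408a-8b27-ea45989d6416
LV_BASE=24d78600-22f4-44f7-987b-fbd866736249   # 34G volume, snapshot base
LV_TOP=81458622-aa54-4f2f-b6d8-75e7db36cd39    # 5G volume, active layer
# Activate the LV (read-only inspection) and print the backing chain;
# whether the top volume still overlays the base tells you how far the
# merge actually got:
#   lvchange -ay "$VG/$LV_TOP"
#   qemu-img info --backing-chain "/dev/$VG/$LV_TOP"
echo "would inspect /dev/$VG/$LV_TOP"
```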
Hi Greg,
2016-02-09 21:30 GMT-03:00 Greg Padgett <gpadgett@redhat.com>:
On 02/09/2016 06:08 AM, Michal Skrivanek wrote:
On 03 Feb 2016, at 10:37, Rik Theys <Rik.Theys@esat.kuleuven.be> wrote:
I can see the snapshot in the "Disk snapshot" tab of the storage. It has a status of "illegal". Is it OK to (try to) remove this snapshot? Will this impact the running VM and/or disk image?
No, it’s not ok to remove it while live merge(apparently) is still ongoing I guess that’s a live merge bug?
Indeed, this is bug 1302215.
I wrote a sql script to help with cleanup in this scenario, which you can find attached to the bug along with a description of how to use it[1].
However, Rik, before trying that, would you be able to run the attached script [2] (or just the db query within) and forward the output to me? I'd like to make sure everything looks as it should before modifying the db directly.
I ran the following query on the engine database:
select images.* from images join snapshots ON (images.vm_snapshot_id = snapshots.snapshot_id) join vm_static on (snapshots.vm_id = vm_static.vm_guid) where vm_static.vm_name = 'lena' and snapshots.description='before jessie upgrade';
The resulting output is:
image_guid | creation_date | size | it_guid | parentid | images tatus | lastmodified | vm_snapshot_id | volume_type | volume_format | image_group_id | _create_da te | _update_date | active | volume_classification --------------------------------------+------------------------+-------------+--------------------------------------+--------------------------------------+------- ------+----------------------------+--------------------------------------+-------------+---------------+--------------------------------------+------------------- ------------+-------------------------------+--------+----------------------- 24d78600-22f4-44f7-987b-fbd866736249 | 2015-05-19 15:00:13+02 | 34359738368 | 00000000-0000-0000-0000-000000000000 | 00000000-0000-0000-0000-000000000000 | 4 | 2016-01-30 08:45:59.998+01 | 4b4930ed-b52d-47ec-8506-245b7f144102 | 1 | 5 | b2390535-744f-4c02-bdc8-5a897226554b | 2015-05-19 15:00:1 1.864425+02 | 2016-01-30 08:45:59.999422+01 | f | 1 (1 row)
Regards,
Rik
Thanks, Greg
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1302215#c13 (Also note that the engine should be stopped before running this.)
[2] Arguments are the ovirt db name, db user, and the name of the vm you were performing live merge on.
Thanks, michal
Regards,
Rik
On 02/03/2016 10:26 AM, Rik Theys wrote:
Hi,
I created a snapshot of a running VM prior to an OS upgrade. The OS upgrade has now been successful and I would like to remove the snapshot. I've selected the snapshot in the UI and clicked Delete to start the task.
After a few minutes, the task has failed. When I click delete again on the same snapshot, the failed message is returned after a few seconds.
From browsing through the engine log (attached), it seems the snapshot was correctly merged on the first try but something went wrong in the finalizing phase. On retries, the log indicates the snapshot/disk image no longer exists, and the removal of the snapshot fails for this reason.
Is there any way to clean up this snapshot?
I can see the snapshot in the "Disk snapshot" tab of the storage. It has a status of "illegal". Is it OK to (try to) remove this snapshot? Will this impact the running VM and/or disk image?
-- Rik Theys System Engineer KU Leuven - Dept. Elektrotechniek (ESAT) Kasteelpark Arenberg 10 bus 2440 - B-3001 Leuven-Heverlee +32(0)16/32.11.07 ---------------------------------------------------------------- <<Any errors in spelling, tact or fact are transmission errors>>

On 02/16/2016 08:50 AM, Rik Theys wrote:
Hi,
I'm trying to determine the correct "bad_img" uuid in my case.
The VM has two snapshots:
* The "Active VM" snapshot which has a disk that has an actual size that's 5GB larger than the virtual size. It has a creation date that matches the timestamp at which I created the second snapshot. The "disk snapshot id" for this snapshot ends with dc39.
* A "before jessie upgrade" snapshot that has status "illegal". It has an actual size that's 2GB larger than the virtual size. The creation date matches the date the VM was initialy created. The disk snapshot id ends with 6249.
From the above I conclude that the disk with the ID that ends with 6249 is the "bad" image I need to specify.
Similar to what I wrote to Marcelo above in the thread, I'd recommend running the "VM disk info gathering tool" attached to [1]. It's the best way to ensure the merge was completed and to determine which image is the "bad" one that is no longer in use by any volume chain.

If indeed the "bad" image (whichever one it is) is no longer in use, then it's possible the image wasn't successfully removed from storage. There are two ways to fix this:

a) Run the db fixup script to remove the records for the merged image, and run the vdsm command by hand to remove it from storage.
b) Adjust the db records so a merge retry would start at the right place, and re-run live merge.

Given that your merge retries were failing, option a) seems most likely to succeed. The db fixup script is attached to [1]; as parameters you would need to provide the VM name, snapshot name, and the ID of the unused image as verified by the disk info tool.

To remove the stale LV, the vdsm deleteVolume verb would then be run from `vdsClient` -- but note that this must be run _on the SPM host_. It will not only perform lvremove, but also do housekeeping on other storage metadata to keep everything consistent. For this verb I believe you'll need to supply not only the unused image ID, but also the pool, domain, and image group IDs from your database queries.

I hope that helps.

Greg

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1306741
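The selection logic behind that check can be sketched as a simple set difference: a volume the engine database knows about but that is absent from the chain the VM actually has open is the unused ("bad") one. The volume IDs below are placeholders, not values from this thread:

```python
# Sketch of the disk info tool's core idea (hypothetical IDs):
# the "bad" image is known to the engine DB but not part of the
# volume chain the running VM actually has open.

def find_unused_volumes(db_volumes, active_chain):
    """Return volume IDs present in the DB but not in the active chain."""
    return sorted(set(db_volumes) - set(active_chain))

# Hypothetical example: the DB lists two volumes, libvirt reports one open.
db_volumes = ["vol-a", "vol-b"]
active_chain = ["vol-a"]
print(find_unused_volumes(db_volumes, active_chain))  # ['vol-b']
```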
However, I grepped the output of 'lvs' on the SPM host of the cluster, and both disk IDs are returned:
[root@amazone ~]# lvs | egrep 'cd39|6249'
  24d78600-22f4-44f7-987b-fbd866736249 a7ba2db3-517c-408a-8b27-ea45989d6416 -wi-ao---- 34.00g
  81458622-aa54-4f2f-b6d8-75e7db36cd39 a7ba2db3-517c-408a-8b27-ea45989d6416 -wi------- 5.00g
I expected the "bad" image would no longer be found?
The SQL script only cleans up the database and not the logical volumes. Would running the script not keep a stale LV around?
Also, from the lvs output it seems the "bad" disk is bigger than the "good" one.
Is it possible the snapshot still needs to be merged? If so, how can I initiate that?
Regards,
Rik

Hello Greg,

I don't see the disk image on the storage:

[root@srv-qemu01 ~]# cd /rhev/data-center/77e24b20-9d21-4952-a089-3c5c592b4e6d/c2dc0101-748e-4a7b-9913-47993eaa52bd/images/b7a27d0c-57cc-490e-a3f8-b4981310a9b0/
[root@srv-qemu01 b7a27d0c-57cc-490e-a3f8-b4981310a9b0]# ls
2e59f7f2-9e30-460e-836a-5e0d3d625059

But in the DB I see:

SELECT imagestatus, storage_pool_id, storage_domain_id, image_group_id, image_guid, parentid
FROM storage_domains, image_storage_domain_map, images, vm_static, vm_device
WHERE image_storage_domain_map.image_id = images.image_guid
  AND storage_domains.id = image_storage_domain_map.storage_domain_id
  AND vm_static.vm_guid = vm_device.vm_id
  AND images.image_group_id = vm_device.device_id
  AND vm_device.device = 'disk'
  AND vm_static.vm_name = 'Servidor-Cliente';

imagestatus | storage_pool_id                      | storage_domain_id                    | image_group_id                       | image_guid                           | parentid
1           | 77e24b20-9d21-4952-a089-3c5c592b4e6d | c2dc0101-748e-4a7b-9913-47993eaa52bd | b7a27d0c-57cc-490e-a3f8-b4981310a9b0 | 7f8bb099-9a18-4e89-bf48-57e56e5770d2 | 2e59f7f2-9e30-460e-836a-5e0d3d625059
4           | 77e24b20-9d21-4952-a089-3c5c592b4e6d | c2dc0101-748e-4a7b-9913-47993eaa52bd | b7a27d0c-57cc-490e-a3f8-b4981310a9b0 | 2e59f7f2-9e30-460e-836a-5e0d3d625059 | 00000000-0000-0000-0000-000000000000

In this case, do I have to update the image as you describe?

"Arguments in your case would be the VM name, snapshot name, and the UUID of the image that is missing from your storage. (You may need to manually mark the image as illegal first, [2].)"

Thanks.

On 02/16/2016 05:51 PM, Marcelo Leandro wrote:
Hello Greg,

I don't see the disk image on the storage:

[root@srv-qemu01 ~]# cd /rhev/data-center/77e24b20-9d21-4952-a089-3c5c592b4e6d/c2dc0101-748e-4a7b-9913-47993eaa52bd/images/b7a27d0c-57cc-490e-a3f8-b4981310a9b0/
[root@srv-qemu01 b7a27d0c-57cc-490e-a3f8-b4981310a9b0]# ls
2e59f7f2-9e30-460e-836a-5e0d3d625059

But in the DB I see:

SELECT imagestatus, storage_pool_id, storage_domain_id, image_group_id, image_guid, parentid
FROM storage_domains, image_storage_domain_map, images, vm_static, vm_device
WHERE image_storage_domain_map.image_id = images.image_guid
  AND storage_domains.id = image_storage_domain_map.storage_domain_id
  AND vm_static.vm_guid = vm_device.vm_id
  AND images.image_group_id = vm_device.device_id
  AND vm_device.device = 'disk'
  AND vm_static.vm_name = 'Servidor-Cliente';

imagestatus | storage_pool_id                      | storage_domain_id                    | image_group_id                       | image_guid                           | parentid
1           | 77e24b20-9d21-4952-a089-3c5c592b4e6d | c2dc0101-748e-4a7b-9913-47993eaa52bd | b7a27d0c-57cc-490e-a3f8-b4981310a9b0 | 7f8bb099-9a18-4e89-bf48-57e56e5770d2 | 2e59f7f2-9e30-460e-836a-5e0d3d625059
4           | 77e24b20-9d21-4952-a089-3c5c592b4e6d | c2dc0101-748e-4a7b-9913-47993eaa52bd | b7a27d0c-57cc-490e-a3f8-b4981310a9b0 | 2e59f7f2-9e30-460e-836a-5e0d3d625059 | 00000000-0000-0000-0000-000000000000

In this case, do I have to update the image as you describe?

"Arguments in your case would be the VM name, snapshot name, and the UUID of the image that is missing from your storage. (You may need to manually mark the image as illegal first, [2].)"

Thanks.
I believe it would still be prudent to verify that image 7f8bb099-9a18-4e89-bf48-57e56e5770d2 is supposed to be missing, by running the disk info tool from: https://bugzilla.redhat.com/show_bug.cgi?id=1306741. Once we verify that it's ok to remove the image, then I'd proceed to clean up the db records via the image cleanup sql script. Thanks, Greg

Thanks Greg, my problem was resolved.

2016-02-16 20:07 GMT-03:00 Greg Padgett <gpadgett@redhat.com>:

Hi, On 02/16/2016 10:52 PM, Greg Padgett wrote:
On 02/16/2016 08:50 AM, Rik Theys wrote:
Hi,
I'm trying to determine the correct "bad_img" uuid in my case.
The VM has two snapshots:
* The "Active VM" snapshot which has a disk that has an actual size that's 5GB larger than the virtual size. It has a creation date that matches the timestamp at which I created the second snapshot. The "disk snapshot id" for this snapshot ends with dc39.
* A "before jessie upgrade" snapshot that has status "illegal". It has an actual size that's 2GB larger than the virtual size. The creation date matches the date the VM was initialy created. The disk snapshot id ends with 6249.
From the above I conclude that the disk with the ID that ends with 6249 is the "bad" image I need to specify.
Similar to what I wrote to Marcelo above in the thread, I'd recommend running the "VM disk info gathering tool" attached to [1]. It's the best way to ensure the merge was completed and determine which image is the "bad" one that is no longer in use by any volume chains.
I've run the disk info gathering tool and it outputs (for the affected VM):

VM lena
Disk b2390535-744f-4c02-bdc8-5a897226554b (sd:a7ba2db3-517c-408a-8b27-ea45989d6416)
Volumes: 24d78600-22f4-44f7-987b-fbd866736249

The ID of the volume is the ID of the snapshot that is marked "illegal". So the "bad" image would be the cd39 one, which according to the UI is in use by the "Active VM" snapshot. Can this make sense?

Both the "Active VM" and the defective snapshot have an actual size that's bigger than the virtual size of the disk. When I remove the bad disk image/snapshot, will the actual size of the "Active VM" snapshot return to the virtual size of the disk? What's currently stored in the "Active VM" snapshot?

Would cloning the VM (and removing the original VM afterwards) work as an alternate way to clean this up? Or will the clone operation also clone the snapshots?

Regards,
Rik
If indeed the "bad" image (whichever one it is) is no longer in use, then it's possible the image wasn't successfully removed from storage. There are 2 ways to fix this:
a) Run the db fixup script to remove the records for the merged image, and run the vdsm command by hand to remove it from storage. b) Adjust the db records so a merge retry would start at the right place, and re-run live merge.
Given that your merge retries were failing, option a) seems most likely to succeed. The db fixup script is attached to [1]; as parameters you would need to provide the vm name, snapshot name, and the id of the unused image as verified by the disk info tool.
To remove the stale LV, the vdsm deleteVolume verb would then be run from `vdsClient` -- but note that this must be run _on the SPM host_. It will not only perform lvremove, but also do housekeeping on other storage metadata to keep everything consistent. For this verb I believe you'll need to supply not only the unused image id, but also the pool, domain, and image group ids from your database queries.
I hope that helps.
Greg
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1306741

On 02/17/2016 03:42 AM, Rik Theys wrote:
Hi,
On 02/16/2016 10:52 PM, Greg Padgett wrote:
On 02/16/2016 08:50 AM, Rik Theys wrote:
Hi,
I'm trying to determine the correct "bad_img" uuid in my case.
The VM has two snapshots:
* The "Active VM" snapshot which has a disk that has an actual size that's 5GB larger than the virtual size. It has a creation date that matches the timestamp at which I created the second snapshot. The "disk snapshot id" for this snapshot ends with dc39.
* A "before jessie upgrade" snapshot that has status "illegal". It has an actual size that's 2GB larger than the virtual size. The creation date matches the date the VM was initialy created. The disk snapshot id ends with 6249.
From the above I conclude that the disk with the ID that ends with 6249 is the "bad" image I need to specify.
Similar to what I wrote to Marcelo above in the thread, I'd recommend running the "VM disk info gathering tool" attached to [1]. It's the best way to ensure the merge was completed and determine which image is the "bad" one that is no longer in use by any volume chains.
I've run the disk info gathering tool and it outputs (for the affected VM):
VM lena
Disk b2390535-744f-4c02-bdc8-5a897226554b (sd:a7ba2db3-517c-408a-8b27-ea45989d6416)
Volumes: 24d78600-22f4-44f7-987b-fbd866736249
The ID of the volume is the ID of the snapshot that is marked "illegal". So the "bad" image would be the cd39 one, which according to the UI is in use by the "Active VM" snapshot. Can this make sense?
It looks accurate. Live merges are "backwards" merges, so the merge would have pushed data from the volume associated with "Active VM" into the volume associated with the snapshot you're trying to remove. Upon completion, we "pivot" so that the VM uses that older volume, and we update the engine database to reflect this (basically we re-associate that older volume with, in your case, "Active VM").

In your case, it seems the pivot operation was done, but the database wasn't updated to reflect it. Given snapshot/image associations e.g.:

VM Name   Snapshot Name   Volume
-------   -------------   ------
My-VM     Active VM       123-abc
My-VM     My-Snapshot     789-def

My-VM in your case is actually running on volume 789-def. If you run the db fixup script and supply ("My-VM", "My-Snapshot", "123-abc") (note the volume is the newer, "bad" one), then it will switch the volume association for you and remove the invalid entries. Of course, I'd shut down the VM and back up the db beforehand.

I'm not terribly familiar with how vdsm handles block storage, but I'd imagine you could then e.g. `lvchange -an` the bad volume's LV, start the VM, and verify that the data is current without having the to-be-removed volume active, just to make sure everything lines up before running the vdsClient verb to remove the volume.
Both the "Active VM" and the defective snapshot have an actual size that's bigger than the virtual size of the disk. When I remove the bad disk image/snapshot, will the actual size of the "Active VM" snapshot return to the virtual size of the disk? What's currently stored in the "Active VM" snapshot?
"Active VM" should now be unused; it previously (pre-merge) was the data written since the snapshot was taken. Normally the larger actual size might be from qcow format overhead. If your listing above is complete (ie one volume for the vm), then I'm not sure why the base volume would have a larger actual size than virtual size. Adam, Nir--any thoughts on this?
Would cloning the VM (and removing the original VM afterwards) work as an alternate way to clean this up? Or will the clone operation also clone the snapshots?
It would try to clone everything in the engine db, so no luck there.
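Purely as an illustration of the re-association described above (point the surviving volume at "Active VM", drop the merged-away volume's records), here is a toy sketch on an in-memory sqlite database. The schema and all names are simplified stand-ins, not the real engine schema:

```python
import sqlite3

# Toy model of the fixup: "123-abc" is the newer "bad" volume still
# recorded under "Active VM"; "789-def" is the older volume the VM is
# actually running on after the pivot. Schema/names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE images (image_guid TEXT, vm_snapshot_id TEXT, active INTEGER);
CREATE TABLE snapshots (snapshot_id TEXT, description TEXT);
INSERT INTO snapshots VALUES ('snap-active', 'Active VM'),
                             ('snap-old', 'My-Snapshot');
INSERT INTO images VALUES ('123-abc', 'snap-active', 1),
                          ('789-def', 'snap-old', 0);
""")
# Re-associate the surviving volume with "Active VM" ...
conn.execute("UPDATE images SET vm_snapshot_id = 'snap-active', active = 1 "
             "WHERE image_guid = '789-def'")
# ... then drop the merged-away volume's record and the now-empty snapshot.
conn.execute("DELETE FROM images WHERE image_guid = '123-abc'")
conn.execute("DELETE FROM snapshots WHERE snapshot_id = 'snap-old'")
rows = conn.execute("SELECT image_guid, vm_snapshot_id FROM images").fetchall()
print(rows)  # [('789-def', 'snap-active')]
```

The actual script attached to the bug does more (and against PostgreSQL), so treat this only as a picture of the association swap, not as something to run against an engine database.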
Regards,
Rik
If indeed the "bad" image (whichever one it is) is no longer in use, then it's possible the image wasn't successfully removed from storage. There are 2 ways to fix this:
a) Run the db fixup script to remove the records for the merged image, and run the vdsm command by hand to remove it from storage. b) Adjust the db records so a merge retry would start at the right place, and re-run live merge.
Given that your merge retries were failing, option a) seems most likely to succeed. The db fixup script is attached to [1]; as parameters you would need to provide the vm name, snapshot name, and the id of the unused image as verified by the disk info tool.
To remove the stale LV, the vdsm deleteVolume verb would then be run from `vdsClient` -- but note that this must be run _on the SPM host_. It will not only perform lvremove, but also do housekeeping on other storage metadata to keep everything consistent. For this verb I believe you'll need to supply not only the unused image id, but also the pool, domain, and image group ids from your database queries.
I hope that helps.
Greg
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1306741
However, I grepped the output from 'lvs' on the SPM host of the cluster and both disk id's are returned:
[root@amazone ~]# lvs | egrep 'cd39|6249' 24d78600-22f4-44f7-987b-fbd866736249 a7ba2db3-517c-408a-8b27-ea45989d6416 -wi-ao---- 34.00g
81458622-aa54-4f2f-b6d8-75e7db36cd39 a7ba2db3-517c-408a-8b27-ea45989d6416 -wi------- 5.00g
I expected the "bad" img would no longer be found?
The SQL script only cleans up the database and not the logical volumes. Would running the script not keep a stale LV around?
Also, from the lvs output it seems the "bad" disk is bigger than the "good" one.
Is it possible the snapshot still needs to be merged?? If so, how can I initiate that?
Regards,
Rik
On 02/16/2016 02:02 PM, Rik Theys wrote:
Hi Greg,
2016-02-09 21:30 GMT-03:00 Greg Padgett <gpadgett@redhat.com>:
On 02/09/2016 06:08 AM, Michal Skrivanek wrote: > > >> On 03 Feb 2016, at 10:37, Rik Theys <Rik.Theys@esat.kuleuven.be> >> wrote:
>>> I can see the snapshot in the "Disk snapshot" tab of the >>> storage. It has >>> a status of "illegal". Is it OK to (try to) remove this >>> snapshot? Will >>> this impact the running VM and/or disk image? > > > No, it’s not ok to remove it while live merge(apparently) is still > ongoing > I guess that’s a live merge bug?
Indeed, this is bug 1302215.
I wrote a sql script to help with cleanup in this scenario, which you can find attached to the bug along with a description of how to use it[1].
However, Rik, before trying that, would you be able to run the attached script [2] (or just the db query within) and forward the output to me? I'd like to make sure everything looks as it should before modifying the db directly.
I ran the following query on the engine database:
select images.* from images join snapshots ON (images.vm_snapshot_id = snapshots.snapshot_id) join vm_static on (snapshots.vm_id = vm_static.vm_guid) where vm_static.vm_name = 'lena' and snapshots.description='before jessie upgrade';
The resulting output is:
image_guid | creation_date | size | it_guid | parentid | images tatus | lastmodified | vm_snapshot_id | volume_type | volume_format | image_group_id | _create_da te | _update_date | active | volume_classification --------------------------------------+------------------------+-------------+--------------------------------------+--------------------------------------+-------
------+----------------------------+--------------------------------------+-------------+---------------+--------------------------------------+-------------------
------------+-------------------------------+--------+-----------------------
24d78600-22f4-44f7-987b-fbd866736249 | 2015-05-19 15:00:13+02 | 34359738368 | 00000000-0000-0000-0000-000000000000 | 00000000-0000-0000-0000-000000000000 | 4 | 2016-01-30 08:45:59.998+01 | 4b4930ed-b52d-47ec-8506-245b7f144102 | 1 | 5 | b2390535-744f-4c02-bdc8-5a897226554b | 2015-05-19 15:00:1 1.864425+02 | 2016-01-30 08:45:59.999422+01 | f | 1 (1 row)
Regards,
Rik
Thanks, Greg
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1302215#c13 (Also note that the engine should be stopped before running this.)
[2] Arguments are the ovirt db name, db user, and the name of the vm you were performing live merge on.
> Thanks, > michal > >> >> >> Regards, >> >> Rik >> >> On 02/03/2016 10:26 AM, Rik Theys wrote: >>> >>> Hi, >>> >>> I created a snapshot of a running VM prior to an OS upgrade. The OS >>> upgrade has now been succesful and I would like to remove the >>> snapshot. >>> I've selected the snapshot in the UI and clicked Delete to start >>> the >>> task. >>> >>> After a few minutes, the task has failed. When I click delete >>> again on >>> the same snapshot, the failed message is returned after a few >>> seconds. >>> >>>> From browsing through the engine log (attached) it seems the >>>> snapshot >>> >>> was correctly merged in the first try but something went wrong >>> in the >>> finalizing fase. On retries, the log indicates the snapshot/disk >>> image >>> no longer exists and the removal of the snapshot fails for this >>> reason. >>> >>> Is there any way to clean up this snapshot? >>> >>> I can see the snapshot in the "Disk snapshot" tab of the >>> storage. It has >>> a status of "illegal". Is it OK to (try to) remove this >>> snapshot? Will >>> this impact the running VM and/or disk image?

On 17/02/16 11:14 -0500, Greg Padgett wrote:
On 02/17/2016 03:42 AM, Rik Theys wrote:
Hi,
On 02/16/2016 10:52 PM, Greg Padgett wrote:
On 02/16/2016 08:50 AM, Rik Theys wrote:
Hi,
I'm trying to determine the correct "bad_img" uuid in my case.
The VM has two snapshots:
* The "Active VM" snapshot which has a disk that has an actual size that's 5GB larger than the virtual size. It has a creation date that matches the timestamp at which I created the second snapshot. The "disk snapshot id" for this snapshot ends with dc39.
* A "before jessie upgrade" snapshot that has status "illegal". It has an actual size that's 2GB larger than the virtual size. The creation date matches the date the VM was initialy created. The disk snapshot id ends with 6249.
From the above I conclude that the disk with the ID that ends with 6249 is the "bad" image I need to specify.
Similar to what I wrote to Marcelo above in the thread, I'd recommend running the "VM disk info gathering tool" attached to [1]. It's the best way to ensure the merge was completed and determine which image is the "bad" one that is no longer in use by any volume chains.
I've run the disk info gathering tool and it outputs (for the affected VM):
VM lena
Disk b2390535-744f-4c02-bdc8-5a897226554b (sd:a7ba2db3-517c-408a-8b27-ea45989d6416)
Volumes: 24d78600-22f4-44f7-987b-fbd866736249
The ID of the volume is the ID of the snapshot that is marked "illegal". So the "bad" image would be the cd39 one, which according to the UI is in use by the "Active VM" snapshot. Can this make sense?
It looks accurate. Live merges are "backwards" merges, so the merge would have pushed data from the volume associated with "Active VM" into the volume associated with the snapshot you're trying to remove.
Upon completion, we "pivot" so that the VM uses that older volume, and we update the engine database to reflect this (basically we re-associate that older volume with, in your case, "Active VM").
In your case, it seems the pivot operation was done, but the database wasn't updated to reflect it. Given snapshot/image associations e.g.:
VM Name Snapshot Name Volume ------- ------------- ------ My-VM Active VM 123-abc My-VM My-Snapshot 789-def
My-VM in your case is actually running on volume 789-def. If you run the db fixup script and supply ("My-VM", "My-Snapshot", "123-abc") (note the volume is the newer, "bad" one), then it will switch the volume association for you and remove the invalid entries.
Of course, I'd shut down the VM, and back up the db beforehand.
I'm not terribly familiar with how vdsm handles block storage, but I'd imagine you could then e.g. `lvchange -an` the bad volume's LV, start the VM, and verify that the data is current without the to-be-removed volume active, just to make sure everything lines up before running the vdsClient verb to remove the volume.
vdsm will reactivate the LV when starting the VM, so this check will not work. The vm-disk-info.py script uses libvirt to check which volumes the VM actually has open, so if the 'bad' volume is not listed, then it is not being used and is safe to remove.
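For reference, a rough sketch of the kind of check such a tool performs (this is not the actual vm-disk-info.py code): extract the volume ids a running VM has open from its libvirt domain XML. The XML fragment and path below are hypothetical, modeled on the ids in this thread; on a real host the XML would come from `virsh -r dumpxml <vm>` on the host running the VM.

```shell
# Hypothetical, trimmed <disk> source element; real XML comes from
# `virsh -r dumpxml <vm>` on the host running the VM.
DOMAIN_XML="<source dev='/rhev/data-center/mnt/blockSD/a7ba2db3-517c-408a-8b27-ea45989d6416/images/b2390535-744f-4c02-bdc8-5a897226554b/24d78600-22f4-44f7-987b-fbd866736249'/>"
# The last path component of each block <source> is an open volume id:
echo "$DOMAIN_XML" | grep -o "dev='[^']*'" | sed "s/.*\///; s/'//"
```

A volume id that does not appear in this list is not open by the VM and is a candidate for removal.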
Both the "Active VM" and the defective snapshot have an actual size that's bigger than the virtual size of the disk. When I remove the bad disk image/snapshot, will the actual size of the "Active VM" snapshot return to the virtual size of the disk? What's currently stored in the "Active VM" snapshot?
"Active VM" should now be unused; it previously (pre-merge) held the data written since the snapshot was taken. Normally the larger actual size might be from qcow format overhead. If your listing above is complete (i.e. one volume for the VM), then I'm not sure why the base volume would have a larger actual size than virtual size.
Adam, Nir--any thoughts on this?
There is a bug which has caused inflation of the snapshot volumes when performing a live merge. We are submitting fixes for 3.5, 3.6, and master right at this moment.
Would cloning the VM (and removing the original VM afterwards) work as an alternate way to clean this up? Or will the clone operation also clone the snapshots?
It would try to clone everything in the engine db, so no luck there.
Regards,
Rik
If indeed the "bad" image (whichever one it is) is no longer in use, then it's possible the image wasn't successfully removed from storage. There are 2 ways to fix this:
a) Run the db fixup script to remove the records for the merged image, and run the vdsm command by hand to remove it from storage.
b) Adjust the db records so a merge retry would start at the right place, and re-run live merge.
Given that your merge retries were failing, option a) seems most likely to succeed. The db fixup script is attached to [1]; as parameters you would need to provide the vm name, snapshot name, and the id of the unused image as verified by the disk info tool.
To remove the stale LV, the vdsm deleteVolume verb would then be run from `vdsClient` -- but note that this must be run _on the SPM host_. It will not only perform lvremove, but also do housekeeping on other storage metadata to keep everything consistent. For this verb I believe you'll need to supply not only the unused image id, but also the pool, domain, and image group ids from your database queries.
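As a sketch only, such an invocation might be assembled like the following. The pool id here is a placeholder and the argument order is my assumption, not confirmed by this thread; check the `vdsClient` usage output on your SPM host before running anything. The command below is only printed, never executed:

```shell
SD_UUID=a7ba2db3-517c-408a-8b27-ea45989d6416   # storage domain (matches the lvs VG name)
SP_UUID=00000000-0000-0000-0000-000000000000   # storage pool id: placeholder, look it up in your db
IMG_UUID=b2390535-744f-4c02-bdc8-5a897226554b  # image group (disk) id
BAD_VOL=81458622-aa54-4f2f-b6d8-75e7db36cd39   # unused volume id per the disk info tool
# Print for review; run only on the SPM host once every id is verified:
echo vdsClient -s 0 deleteVolume "$SD_UUID" "$SP_UUID" "$IMG_UUID" "$BAD_VOL"
```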
I hope that helps.
Greg
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1306741
However, I grepped the output from 'lvs' on the SPM host of the cluster and both disk id's are returned:
[root@amazone ~]# lvs | egrep 'cd39|6249' 24d78600-22f4-44f7-987b-fbd866736249 a7ba2db3-517c-408a-8b27-ea45989d6416 -wi-ao---- 34.00g
81458622-aa54-4f2f-b6d8-75e7db36cd39 a7ba2db3-517c-408a-8b27-ea45989d6416 -wi------- 5.00g
I expected the "bad" img would no longer be found?
The SQL script only cleans up the database and not the logical volumes. Would running the script not keep a stale LV around?
Also, from the lvs output it seems the "bad" disk is bigger than the "good" one.
Is it possible the snapshot still needs to be merged? If so, how can I initiate that?
Regards,
Rik
On 02/16/2016 02:02 PM, Rik Theys wrote:
Hi Greg,
2016-02-09 21:30 GMT-03:00 Greg Padgett <gpadgett@redhat.com>:
> On 02/09/2016 06:08 AM, Michal Skrivanek wrote:
>>> On 03 Feb 2016, at 10:37, Rik Theys <Rik.Theys@esat.kuleuven.be> wrote:
>>>> I can see the snapshot in the "Disk snapshot" tab of the storage. It has a status of "illegal". Is it OK to (try to) remove this snapshot? Will this impact the running VM and/or disk image?
>>
>> No, it's not ok to remove it while live merge (apparently) is still ongoing.
>> I guess that's a live merge bug?
>
> Indeed, this is bug 1302215.
>
> I wrote a sql script to help with cleanup in this scenario, which you can find attached to the bug along with a description of how to use it [1].
>
> However, Rik, before trying that, would you be able to run the attached script [2] (or just the db query within) and forward the output to me? I'd like to make sure everything looks as it should before modifying the db directly.
I ran the following query on the engine database:
select images.* from images join snapshots ON (images.vm_snapshot_id = snapshots.snapshot_id) join vm_static on (snapshots.vm_id = vm_static.vm_guid) where vm_static.vm_name = 'lena' and snapshots.description='before jessie upgrade';
The resulting output is:
image_guid            | 24d78600-22f4-44f7-987b-fbd866736249
creation_date         | 2015-05-19 15:00:13+02
size                  | 34359738368
it_guid               | 00000000-0000-0000-0000-000000000000
parentid              | 00000000-0000-0000-0000-000000000000
imagestatus           | 4
lastmodified          | 2016-01-30 08:45:59.998+01
vm_snapshot_id        | 4b4930ed-b52d-47ec-8506-245b7f144102
volume_type           | 1
volume_format         | 5
image_group_id        | b2390535-744f-4c02-bdc8-5a897226554b
_create_date          | 2015-05-19 15:00:11.864425+02
_update_date          | 2016-01-30 08:45:59.999422+01
active                | f
volume_classification | 1
(1 row)
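As a quick sanity check on those numbers: the images.size column is in bytes, so the row above gives the disk's virtual size, which can be compared with the actual LV size (34.00g) reported by lvs elsewhere in the thread:

```shell
SIZE_BYTES=34359738368   # images.size from the query output above
echo "virtual size: $((SIZE_BYTES / 1024 / 1024 / 1024)) GiB"
```

This prints 32 GiB, about 2 GB less than the 34.00g actual size, which matches the inflation Rik describes.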
Regards,
Rik
> Thanks,
> Greg
>
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1302215#c13
> (Also note that the engine should be stopped before running this.)
>
> [2] Arguments are the ovirt db name, db user, and the name of the vm you were performing live merge on.
-- Adam Litke

Hi, On 02/17/2016 05:29 PM, Adam Litke wrote:
On 17/02/16 11:14 -0500, Greg Padgett wrote:
My-VM in your case is actually running on volume 789-def. If you run the db fixup script and supply ("My-VM", "My-Snapshot", "123-abc") (note the volume is the newer, "bad" one), then it will switch the volume association for you and remove the invalid entries.
Of course, I'd shut down the VM, and back up the db beforehand.
I've executed the sql script and it seems to have worked. Thanks!
There is a bug which has caused inflation of the snapshot volumes when performing a live merge. We are submitting fixes for 3.5, 3.6, and master right at this moment.
Which bug number is assigned to this bug? Will upgrading to a release with a fix reduce the disk usage again?

Regards,

Rik

On 18/02/16 10:37 +0100, Rik Theys wrote:
Which bug number is assigned to this bug? Will upgrading to a release with a fix reduce the disk usage again?
See https://bugzilla.redhat.com/show_bug.cgi?id=1301709 for the bug. It's about a clone disk failure after the problem occurs. Unfortunately, there is no automatic way to repair the raw base volumes if they were affected by this bug. They will need to be manually shrunk using lvreduce if you are certain that they are inflated.
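To make that suggestion concrete, here is a hedged sketch; the VG/LV names are taken from this thread and the command is only printed, not executed. Before actually shrinking anything, verify with `qemu-img info` that the volume is raw and that its virtual size really is below the LV size, deactivate the LV, and keep the VM shut down:

```shell
VG=a7ba2db3-517c-408a-8b27-ea45989d6416   # storage domain VG
LV=24d78600-22f4-44f7-987b-fbd866736249   # inflated raw base volume
VIRTUAL_GIB=32                            # virtual size: images.size / 1024^3
# Printed for review only; shrinking below the data's real extent destroys it:
echo lvreduce --force -L "${VIRTUAL_GIB}g" "$VG/$LV"
```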

Hello,

Will the bug with snapshots be fixed in oVirt 3.6.3?

Thanks.

2016-02-18 11:34 GMT-03:00 Adam Litke <alitke@redhat.com>:
_______________________________________________ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users

On 02/22/2016 07:10 AM, Marcelo Leandro wrote:
Hello,
The bug with snapshot it will be fixed in ovirt 3.6.3?
thanks.
Hi Marcelo,

Yes, that bug (bug 1301709) is now targeted to 3.6.3.

Thanks,
Greg

Hello,

I can create a snapshot when none exists, but I'm not able to remove it afterwards. It concerns many of my VMs, and when stopping them, they can't boot anymore because of the illegal status of the disks, which leaves me in a critical situation:

VM fedora23 is down with error. Exit message: Unable to get volume size for domain 5ef8572c-0ab5-4491-994a-e4c30230a525 volume e5969faa-97ea-41df-809b-cc62161ab1bc

As far as I didn't initiate any live merge, am I concerned by this bug https://bugzilla.redhat.com/show_bug.cgi?id=1306741? I'm running 3.6.2; will upgrading to 3.6.3 solve this issue?

2016-03-18 18:26:57,652 ERROR [org.ovirt.engine.core.bll.RemoveSnapshotCommand] (org.ovirt.thread.pool-8-thread-39) [a1e222d] Ending command 'org.ovirt.engine.core.bll.RemoveSnapshotCommand' with failure.
2016-03-18 18:26:57,663 ERROR [org.ovirt.engine.core.bll.RemoveSnapshotCommand] (org.ovirt.thread.pool-8-thread-39) [a1e222d] Could not delete image '46e9ecc8-e168-4f4d-926c-e769f5df1f2c' from snapshot '88fcf167-4302-405e-825f-ad7e0e9f6564'
2016-03-18 18:26:57,678 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-8-thread-39) [a1e222d] Correlation ID: a1e222d, Job ID: 00d3e364-7e47-4022-82ff-f772cd79d4a1, Call Stack: null, Custom Event ID: -1, Message: Due to partial snapshot removal, Snapshot 'test' of VM 'fedora23' now contains only the following disks: 'fedora23_Disk1'.
2016-03-18 18:26:57,695 ERROR [org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskCommand] (org.ovirt.thread.pool-8-thread-39) [724e99fd] Ending command 'org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskCommand' with failure.
2016-03-18 18:26:57,708 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandlin

Thank you for your help.

On 23/02/2016 19:51, Greg Padgett wrote:
On 02/22/2016 07:10 AM, Marcelo Leandro wrote:
Hello,
The bug with snapshot it will be fixed in ovirt 3.6.3?
thanks.
Hi Marcelo,
Yes, the bug below (bug 1301709) is now targeted to 3.6.3.
Thanks, Greg
-- Nathanaël Blanchet Supervision réseau Pôle Infrastrutures Informatiques 227 avenue Professeur-Jean-Louis-Viala 34193 MONTPELLIER CEDEX 5 Tél. 33 (0)4 67 54 84 55 Fax 33 (0)4 67 54 84 14 blanchet@abes.fr

On Fri, Mar 18, 2016 at 7:55 PM, Nathanaël Blanchet <blanchet@abes.fr> wrote:
Hello,
I can create snapshot when no one exists but I'm not able to remove it after.
Did you try to remove it while the VM was running?
It concerns many of my vms, and when stopping them, they can't boot anymore because of the illegal status of the disks, this leads me in a critical situation
VM fedora23 is down with error. Exit message: Unable to get volume size for domain 5ef8572c-0ab5-4491-994a-e4c30230a525 volume e5969faa-97ea-41df-809b-cc62161ab1bc
As far as I didn't initiate any live merge, am I concerned by this bug https://bugzilla.redhat.com/show_bug.cgi?id=1306741? I'm running 3.6.2, will upgrade to 3.6.3 solve this issue?
If you tried to remove a snapshot while the VM was running, you did initiate a live merge, and this bug may affect you. Adding Greg to provide more info about this.
2016-03-18 18:26:57,652 ERROR [org.ovirt.engine.core.bll.RemoveSnapshotCommand] (org.ovirt.thread.pool-8-thread-39) [a1e222d] Ending command 'org.ovirt.engine.core.bll.RemoveSnapshotCommand' with failure. 2016-03-18 18:26:57,663 ERROR [org.ovirt.engine.core.bll.RemoveSnapshotCommand] (org.ovirt.thread.pool-8-thread-39) [a1e222d] Could not delete image '46e9ecc8-e168-4f4d-926c-e769f5df1f2c' from snapshot '88fcf167-4302-405e-825f-ad7e0e9f6564' 2016-03-18 18:26:57,678 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-8-thread-39) [a1e222d] Correlation ID: a1e222d, Job ID: 00d3e364-7e47-4022-82ff-f772cd79d4a1, Call Stack: null, Custom Event ID: -1, Message: Due to partial snapshot removal, Snapshot 'test' of VM 'fedora23' now contains only the following disks: 'fedora23_Disk1'. 2016-03-18 18:26:57,695 ERROR [org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskCommand] (org.ovirt.thread.pool-8-thread-39) [724e99fd] Ending command 'org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskCommand' with failure. 2016-03-18 18:26:57,708 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandlin
Thank you for your help.
Le 23/02/2016 19:51, Greg Padgett a écrit :
On 02/22/2016 07:10 AM, Marcelo Leandro wrote:
Hello,
The bug with snapshot it will be fixed in ovirt 3.6.3?
thanks.
Hi Marcelo,
Yes, the bug below (bug 1301709) is now targeted to 3.6.3.
Thanks, Greg
2016-02-18 11:34 GMT-03:00 Adam Litke <alitke@redhat.com>:
On 18/02/16 10:37 +0100, Rik Theys wrote:
Hi,
On 02/17/2016 05:29 PM, Adam Litke wrote:
On 17/02/16 11:14 -0500, Greg Padgett wrote: > > > On 02/17/2016 03:42 AM, Rik Theys wrote: >> >> >> Hi, >> >> On 02/16/2016 10:52 PM, Greg Padgett wrote: >>> >>> >>> On 02/16/2016 08:50 AM, Rik Theys wrote: >>>> >>>> >>>> From the above I conclude that the disk with id that ends with >>> >>> >>> Similar to what I wrote to Marcelo above in the thread, I'd >>> recommend >>> running the "VM disk info gathering tool" attached to [1]. It's >>> the >>> best way to ensure the merge was completed and determine which >>> image >>> is >>> the "bad" one that is no longer in use by any volume chains. >> >> >> >> I've ran the disk info gathering tool and this outputs (for the >> affected >> VM): >> >> VM lena >> Disk b2390535-744f-4c02-bdc8-5a897226554b >> (sd:a7ba2db3-517c-408a-8b27-ea45989d6416) >> Volumes: >> 24d78600-22f4-44f7-987b-fbd866736249 >> >> The id of the volume is the ID of the snapshot that is marked >> "illegal". >> So the "bad" image would be the dc39 one, which according to the UI >> is >> in use by the "Active VM" snapshot. Can this make sense? > > > > It looks accurate. Live merges are "backwards" merges, so the merge > would have pushed data from the volume associated with "Active VM" > into the volume associated with the snapshot you're trying to remove. > > Upon completion, we "pivot" so that the VM uses that older volume, > and > we update the engine database to reflect this (basically we > re-associate that older volume with, in your case, "Active VM"). > > In your case, it seems the pivot operation was done, but the database > wasn't updated to reflect it. Given snapshot/image associations > e.g.: > > VM Name Snapshot Name Volume > ------- ------------- ------ > My-VM Active VM 123-abc > My-VM My-Snapshot 789-def > > My-VM in your case is actually running on volume 789-def. 
If you run > the db fixup script and supply ("My-VM", "My-Snapshot", "123-abc") > (note the volume is the newer, "bad" one), then it will switch the > volume association for you and remove the invalid entries. > > Of course, I'd shut down the VM, and back up the db beforehand.
I've executed the sql script and it seems to have worked. Thanks!
> "Active VM" should now be unused; it previously (pre-merge) was the > data written since the snapshot was taken. Normally the larger > actual > size might be from qcow format overhead. If your listing above is > complete (ie one volume for the vm), then I'm not sure why the base > volume would have a larger actual size than virtual size. > > Adam, Nir--any thoughts on this?
There is a bug which has caused inflation of the snapshot volumes when performing a live merge. We are submitting fixes for 3.5, 3.6, and master right at this moment.
Which bug number is assigned to this bug? Will upgrading to a release with a fix reduce the disk usage again?
See https://bugzilla.redhat.com/show_bug.cgi?id=1301709 for the bug. It's about a clone disk failure after the problem occurs. Unfortunately, there is not an automatic way to repair the raw base volumes if they were affected by this bug. They will need to be manually shrunk using lvreduce if you are certain that they are inflated.
-- Adam Litke
_______________________________________________ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
-- Nathanaël Blanchet
Supervision réseau Pôle Infrastrutures Informatiques 227 avenue Professeur-Jean-Louis-Viala 34193 MONTPELLIER CEDEX 5 Tél. 33 (0)4 67 54 84 55 Fax 33 (0)4 67 54 84 14 blanchet@abes.fr

On 03/18/2016 03:10 PM, Nir Soffer wrote:
On Fri, Mar 18, 2016 at 7:55 PM, Nathanaël Blanchet <blanchet@abes.fr> wrote:
Hello,
I can create a snapshot when none exists, but I'm not able to remove it afterwards.
Did you try to remove it while the VM was running?
It concerns many of my VMs, and when I stop them they can't boot anymore because of the illegal status of their disks. This leaves me in a critical situation.
VM fedora23 is down with error. Exit message: Unable to get volume size for domain 5ef8572c-0ab5-4491-994a-e4c30230a525 volume e5969faa-97ea-41df-809b-cc62161ab1bc
Since I didn't initiate any live merge, am I affected by this bug https://bugzilla.redhat.com/show_bug.cgi?id=1306741? I'm running 3.6.2; will upgrading to 3.6.3 solve this issue?
If you tried to remove a snapshot while the VM was running, you did initiate a live merge, and this bug may affect you.
Adding Greg, who can add more info about this.
Hi Nathanaël,

From the logs you pasted below, showing RemoveSnapshotSingleDiskCommand (not ..SingleDiskLiveCommand), it looks like a non-live snapshot. In that case, bug 1306741 would not affect you.

To dig deeper, we'd need to know the root cause of why the image could not be deleted. You should be able to find some clues in your engine log above the snippet you pasted below, or perhaps something in the vdsm log will reveal the reason.

Thanks,
Greg
2016-03-18 18:26:57,652 ERROR [org.ovirt.engine.core.bll.RemoveSnapshotCommand] (org.ovirt.thread.pool-8-thread-39) [a1e222d] Ending command 'org.ovirt.engine.core.bll.RemoveSnapshotCommand' with failure.
2016-03-18 18:26:57,663 ERROR [org.ovirt.engine.core.bll.RemoveSnapshotCommand] (org.ovirt.thread.pool-8-thread-39) [a1e222d] Could not delete image '46e9ecc8-e168-4f4d-926c-e769f5df1f2c' from snapshot '88fcf167-4302-405e-825f-ad7e0e9f6564'
2016-03-18 18:26:57,678 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-8-thread-39) [a1e222d] Correlation ID: a1e222d, Job ID: 00d3e364-7e47-4022-82ff-f772cd79d4a1, Call Stack: null, Custom Event ID: -1, Message: Due to partial snapshot removal, Snapshot 'test' of VM 'fedora23' now contains only the following disks: 'fedora23_Disk1'.
2016-03-18 18:26:57,695 ERROR [org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskCommand] (org.ovirt.thread.pool-8-thread-39) [724e99fd] Ending command 'org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskCommand' with failure.
2016-03-18 18:26:57,708 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandlin
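Greg's distinction between the two command classes can be checked mechanically against an engine log. A minimal sketch (the classification logic here is mine, not an oVirt tool; the sample line is taken from the snippet above):

```shell
# Classify a merge-related engine.log line: RemoveSnapshotSingleDiskLiveCommand
# appears for live merges (VM running), RemoveSnapshotSingleDiskCommand for
# cold merges (VM down). On a real setup you could grep the whole log, e.g.:
#   grep -E 'RemoveSnapshotSingleDisk(Live)?Command' /var/log/ovirt-engine/engine.log
sample="2016-03-18 18:26:57,695 ERROR [org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskCommand]"

case "$sample" in
  *RemoveSnapshotSingleDiskLiveCommand*) echo "live merge" ;;
  *RemoveSnapshotSingleDiskCommand*)     echo "cold merge" ;;
esac
```

For the sample line above this prints "cold merge", matching Greg's reading that a non-live snapshot removal failed here.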
Thank you for your help.
Le 23/02/2016 19:51, Greg Padgett a écrit :
On 02/22/2016 07:10 AM, Marcelo Leandro wrote:
Hello,
Will the snapshot bug be fixed in oVirt 3.6.3?
thanks.
Hi Marcelo,
Yes, the bug below (bug 1301709) is now targeted to 3.6.3.
Thanks, Greg

Hello, I have a problem deleting one snapshot. Output of the vm-disk-info.py script:

Warning: volume 023110fa-7d24-46ec-ada8-d617d7c2adaf is in chain but illegal
Volumes:
a09bfb5d-3922-406d-b4e0-daafad96ffec

After running the md5sum command I realized that the volume that changes is the base one, a09bfb5d-3922-406d-b4e0-daafad96ffec; the disk 023110fa-7d24-46ec-ada8-d617d7c2adaf does not change. Thanks.

2016-03-18 16:50 GMT-03:00 Greg Padgett <gpadgett@redhat.com>:

On 02/16/2016 07:10 AM, Marcelo Leandro wrote:
Hello, I have the same problem. I tried to delete a snapshot but it did not succeed, and the snapshot status is now illegal. In the engine.log excerpt below you can see the error messages:
Hi Marcelo,

The problem in your log,

    error = Drive image file could not be found, code = 13

is a little different, but it may have been triggered by the same bug in a previous merge attempt.

In this case, would you run the "VM disk info gathering tool" from [1]? See Adam's comment in the bug about its execution. If this shows that the VM is no longer dependent on that missing volume, then please shut down the engine, back up the db, and run the "post-merge failure repair script", also attached to [1]. Arguments in your case would be the VM name, snapshot name, and the UUID of the image that is missing from your storage. (You may need to manually mark the image as illegal first, [2].)

HTH,
Greg

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1306741
[2] UPDATE images SET imagestatus = 4 WHERE image_guid = '<imgId>'::UUID;
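As a sketch only of step [2] (the UUID below is a made-up placeholder, not one from this thread; "engine" is the default oVirt database name; run nothing without stopping ovirt-engine and taking a pg_dump backup first):

```shell
# Compose the "mark image illegal" statement from [2] for a given image UUID.
# The UUID here is hypothetical; substitute the UUID of the image that is
# actually missing from your storage.
img_id="00000000-1111-2222-3333-444444444444"
sql="UPDATE images SET imagestatus = 4 WHERE image_guid = '${img_id}'::UUID;"
echo "$sql"

# On a real engine host, only after stopping ovirt-engine and backing up the db:
#   su - postgres -c "psql engine -c \"$sql\""
```

Building the statement first and echoing it before execution makes it easy to double-check the UUID against the engine log before anything touches the database.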
2016-02-16 08:46:20,059 INFO [org.ovirt.engine.core.bll.RemoveSnapshotCommandCallback] (DefaultQuartzScheduler_Worker-57) [46dd2ef7] Waiting on Live Merge child commands to complete
2016-02-16 08:46:21,069 INFO [org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskLiveCommand] (DefaultQuartzScheduler_Worker-40) [15b703ee] Waiting on Live Merge command step 'MERGE' to complete
2016-02-16 08:46:22,072 INFO [org.ovirt.engine.core.bll.MergeCommandCallback] (DefaultQuartzScheduler_Worker-65) [30cdf6ed] Waiting on merge command to complete
2016-02-16 08:46:23,670 INFO [org.ovirt.engine.core.bll.RemoveSnapshotCommand] (default task-48) [5e0c088f] Lock Acquired to object 'EngineLock:{exclusiveLocks='[94d788f4-eba4-49ee-8091-80028cc46627=<VM, ACTION_TYPE_FAILED_OBJECT_LOCKED>]', sharedLocks='null'}'
2016-02-16 08:46:23,795 INFO [org.ovirt.engine.core.bll.RemoveSnapshotCommand] (default task-48) [5e0c088f] Running command: RemoveSnapshotCommand internal: false. Entities affected : ID: 94d788f4-eba4-49ee-8091-80028cc46627 Type: VMAction group MANIPULATE_VM_SNAPSHOTS with role type USER
2016-02-16 08:46:23,824 INFO [org.ovirt.engine.core.bll.RemoveSnapshotCommand] (default task-48) [5e0c088f] Lock freed to object 'EngineLock:{exclusiveLocks='[94d788f4-eba4-49ee-8091-80028cc46627=<VM, ACTION_TYPE_FAILED_OBJECT_LOCKED>]', sharedLocks='null'}'
2016-02-16 08:46:23,876 INFO [org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskLiveCommand] (pool-7-thread-5) [1be123ac] Running command: RemoveSnapshotSingleDiskLiveCommand internal: true. Entities affected : ID: 00000000-0000-0000-0000-000000000000 Type: Storage
2016-02-16 08:46:23,921 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-48) [] Correlation ID: 5e0c088f, Job ID: aa811e83-24fb-4658-b849-d36439f58d95, Call Stack: null, Custom Event ID: -1, Message: Snapshot 'BKP the VM' deletion for VM 'Servidor-Cliente' was initiated by admin@internal.
2016-02-16 08:46:24,093 INFO [org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskLiveCommand] (DefaultQuartzScheduler_Worker-14) [1be123ac] Executing Live Merge command step 'EXTEND'
2016-02-16 08:46:24,122 INFO [org.ovirt.engine.core.bll.RemoveSnapshotCommandCallback] (DefaultQuartzScheduler_Worker-14) [] Waiting on Live Merge child commands to complete
2016-02-16 08:46:24,133 INFO [org.ovirt.engine.core.bll.MergeExtendCommand] (pool-7-thread-6) [766ffc9f] Running command: MergeExtendCommand internal: true. Entities affected : ID: c2dc0101-748e-4a7b-9913-47993eaa52bd Type: Storage
2016-02-16 08:46:24,134 INFO [org.ovirt.engine.core.bll.MergeExtendCommand] (pool-7-thread-6) [766ffc9f] Base and top image sizes are the same; no image size update required
2016-02-16 08:46:25,133 INFO [org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskLiveCommand] (DefaultQuartzScheduler_Worker-16) [1be123ac] Executing Live Merge command step 'MERGE'
2016-02-16 08:46:25,168 INFO [org.ovirt.engine.core.bll.MergeCommand] (pool-7-thread-7) [1b7bc421] Running command: MergeCommand internal: true. Entities affected : ID: c2dc0101-748e-4a7b-9913-47993eaa52bd Type: Storage
2016-02-16 08:46:25,169 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (pool-7-thread-7) [1b7bc421] START, MergeVDSCommand(HostName = Host01, MergeVDSCommandParameters:{runAsync='true', hostId='d4f29978-1540-44d9-ab22-1e6ff750059f', vmId='94d788f4-eba4-49ee-8091-80028cc46627', storagePoolId='77e24b20-9d21-4952-a089-3c5c592b4e6d', storageDomainId='c2dc0101-748e-4a7b-9913-47993eaa52bd', imageGroupId='b7a27d0c-57cc-490e-a3f8-b4981310a9b0', imageId='7f8bb099-9a18-4e89-bf48-57e56e5770d2', baseImageId='2e59f7f2-9e30-460e-836a-5e0d3d625059', topImageId='7f8bb099-9a18-4e89-bf48-57e56e5770d2', bandwidth='0'}), log id: 2a7ab7b7
2016-02-16 08:46:25,176 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (pool-7-thread-7) [1b7bc421] Failed in 'MergeVDS' method
2016-02-16 08:46:25,179 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (pool-7-thread-7) [1b7bc421] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VDSM Host01 command failed: Drive image file could not be found
2016-02-16 08:46:25,179 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (pool-7-thread-7) [1b7bc421] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand' return value 'StatusOnlyReturnForXmlRpc [status=StatusForXmlRpc [code=13, message=Drive image file could not be found]]'
2016-02-16 08:46:25,179 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (pool-7-thread-7) [1b7bc421] HostName = Host01
2016-02-16 08:46:25,179 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (pool-7-thread-7) [1b7bc421] Command 'MergeVDSCommand(HostName = Host01, MergeVDSCommandParameters:{runAsync='true', hostId='d4f29978-1540-44d9-ab22-1e6ff750059f', vmId='94d788f4-eba4-49ee-8091-80028cc46627', storagePoolId='77e24b20-9d21-4952-a089-3c5c592b4e6d', storageDomainId='c2dc0101-748e-4a7b-9913-47993eaa52bd', imageGroupId='b7a27d0c-57cc-490e-a3f8-b4981310a9b0', imageId='7f8bb099-9a18-4e89-bf48-57e56e5770d2', baseImageId='2e59f7f2-9e30-460e-836a-5e0d3d625059', topImageId='7f8bb099-9a18-4e89-bf48-57e56e5770d2', bandwidth='0'})' execution failed: VDSGenericException: VDSErrorException: Failed to MergeVDS, error = Drive image file could not be found, code = 13
2016-02-16 08:46:25,179 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (pool-7-thread-7) [1b7bc421] FINISH, MergeVDSCommand, log id: 2a7ab7b7
2016-02-16 08:46:25,180 ERROR [org.ovirt.engine.core.bll.MergeCommand] (pool-7-thread-7) [1b7bc421] Command 'org.ovirt.engine.core.bll.MergeCommand' failed: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to MergeVDS, error = Drive image file could not be found, code = 13 (Failed with error imageErr and code 13)
2016-02-16 08:46:25,186 ERROR [org.ovirt.engine.core.bll.MergeCommand] (pool-7-thread-7) [1b7bc421] Transaction rolled-back for command 'org.ovirt.engine.core.bll.MergeCommand'.
2016-02-16 08:46:26,159 INFO [org.ovirt.engine.core.bll.RemoveSnapshotCommandCallback] (DefaultQuartzScheduler_Worker-25) [15b703ee] Waiting on Live Merge child commands to complete
2016-02-16 08:46:27,164 ERROR [org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskLiveCommand] (DefaultQuartzScheduler_Worker-32) [1be123ac] Failed child command status for step 'MERGE'
2016-02-16 08:46:27,497 INFO [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (DefaultQuartzScheduler_Worker-37) [30cdf6ed] VM job '77669e28-4aa2-4038-b7b6-1a949a1d039e': In progress, updating
2016-02-16 08:46:28,192 ERROR [org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskLiveCommand] (DefaultQuartzScheduler_Worker-87) [1be123ac] Merging of snapshot '119f668e-af60-49ea-aa08-735be8af0a7d' images '2e59f7f2-9e30-460e-836a-5e0d3d625059'..'7f8bb099-9a18-4e89-bf48-57e56e5770d2' failed. Images have been marked illegal and can no longer be previewed or reverted to. Please retry Live Merge on the snapshot to complete the operation.
2016-02-16 08:46:28,204 INFO [org.ovirt.engine.core.bll.RemoveSnapshotCommandCallback] (DefaultQuartzScheduler_Worker-87) [5e0c088f] All Live Merge child commands have completed, status 'FAILED'
2016-02-16 08:46:29,216 ERROR [org.ovirt.engine.core.bll.RemoveSnapshotCommand] (DefaultQuartzScheduler_Worker-89) [5e0c088f] Ending command 'org.ovirt.engine.core.bll.RemoveSnapshotCommand' with failure.
2016-02-16 08:46:29,263 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-89) [5e0c088f] Correlation ID: 5e0c088f, Job ID: aa811e83-24fb-4658-b849-d36439f58d95, Call Stack: null, Custom Event ID: -1, Message: Failed to delete snapshot 'BKP the VM' for VM 'Servidor-Cliente'.
2016-02-16 08:46:30,287 INFO [org.ovirt.engine.core.bll.RemoveSnapshotCommandCallback] (DefaultQuartzScheduler_Worker-33) [] Waiting on Live Merge child commands to complete
2016-02-16 08:46:31,298 INFO [org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskLiveCommand] (DefaultQuartzScheduler_Worker-21) [15b703ee] Waiting on Live Merge command step 'MERGE' to complete
2016-02-16 08:46:32,301 INFO [org.ovirt.engine.core.bll.MergeCommandCallback] (DefaultQuartzScheduler_Worker-68) [30cdf6ed] Waiting on merge command to complete
2016-02-16 08:46:40,304 INFO [org.ovirt.engine.core.bll.RemoveSnapshotCommandCallback] (DefaultQuartzScheduler_Worker-55) [280a8a32] Waiting on Live Merge child commands to complete
2016-02-16 08:46:41,308 INFO [org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskLiveCommand] (DefaultQuartzScheduler_Worker-54) [15b703ee] Waiting on Live Merge command step 'MERGE' to complete
2016-02-16 08:46:42,312 INFO [org.ovirt.engine.core.bll.MergeCommandCallback] (DefaultQuartzScheduler_Worker-57) [30cdf6ed] Waiting on merge command to complete
2016-02-16 08:46:42,850 INFO [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (DefaultQuartzScheduler_Worker-84) [] VM job '77669e28-4aa2-4038-b7b6-1a949a1d039e': In progress, updating
2016-02-16 08:46:42,854 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.FullListVDSCommand] (DefaultQuartzScheduler_Worker-84) [] START, FullListVDSCommand(HostName = , FullListVDSCommandParameters:{runAsync='true', hostId='aebc403a-ec4e-4346-9029-6353d5d76f01', vds='Host[,aebc403a-ec4e-4346-9029-6353d5d76f01]', vmIds='[6af1f9c3-7210-45c3-90dc-bd7793346c0c]'}), log id: 74961dad
I cannot see the snapshot disk at the storage domain:
[root@ ~]# cd /rhev/data-center/77e24b20-9d21-4952-a089-3c5c592b4e6d/c1938052-7524-404c-bac9-f238227269ea/images/b7a27d0c-57cc-490e-a3f8-b4981310a9b0/
[root@ b7a27d0c-57cc-490e-a3f8-b4981310a9b0]# ls
2e59f7f2-9e30-460e-836a-5e0d3d625059  2e59f7f2-9e30-460e-836a-5e0d3d625059.meta
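One way to double-check that the surviving volume really is a self-contained base image is to look for a backing-file reference in `qemu-img info` output. A sketch (the parsing below runs against a made-up sample of the output, since the real command needs the storage mounted; the real invocation is shown as a comment):

```shell
# On the host you would run, against the volume left in the listing above:
#   qemu-img info /rhev/data-center/.../images/b7a27d0c-57cc-490e-a3f8-b4981310a9b0/2e59f7f2-9e30-460e-836a-5e0d3d625059
# Sample output (hypothetical values, but in qemu-img's format):
sample='image: 2e59f7f2-9e30-460e-836a-5e0d3d625059
file format: raw
virtual size: 20G (21474836480 bytes)
disk size: 20G'

# A volume with no "backing file:" line is a base image: nothing in its
# chain still points at the deleted snapshot volume.
if printf '%s\n' "$sample" | grep -q '^backing file:'; then
    echo "still references a backing volume"
else
    echo "self-contained base volume"
fi
```

If the real output did show a `backing file:` line pointing at the missing volume, deleting that volume's database records would not be safe.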
Thanks.
2016-02-09 21:30 GMT-03:00 Greg Padgett <gpadgett@redhat.com>:
On 02/09/2016 06:08 AM, Michal Skrivanek wrote:
On 03 Feb 2016, at 10:37, Rik Theys <Rik.Theys@esat.kuleuven.be> wrote:
Hi,
In the mean time I've noticed the following entries in our periodic logcheck output:
Feb 3 09:05:53 orinoco journal: block copy still active: disk 'vda' not ready for pivot yet
Feb 3 09:05:53 orinoco journal: vdsm root ERROR Unhandled exception
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 734, in wrapper
    return f(*a, **kw)
  File "/usr/share/vdsm/virt/vm.py", line 5168, in run
    self.tryPivot()
  File "/usr/share/vdsm/virt/vm.py", line 5137, in tryPivot
    ret = self.vm._dom.blockJobAbort(self.drive.name, flags)
  File "/usr/share/vdsm/virt/virdomain.py", line 68, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 124, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 733, in blockJobAbort
    if ret == -1: raise libvirtError ('virDomainBlockJobAbort() failed', dom=self)
libvirtError: block copy still active: disk 'vda' not ready for pivot yet
This is from the host running the VM.
Note that this host is not the SPM of the cluster. I always thought all operations on disk volumes happened on the SPM host?
My question still remains:
I can see the snapshot in the "Disk snapshot" tab of the storage. It has a status of "illegal". Is it OK to (try to) remove this snapshot? Will this impact the running VM and/or disk image?
No, it's not OK to remove it while a live merge is (apparently) still ongoing. I guess that's a live merge bug?
Indeed, this is bug 1302215.
I wrote a sql script to help with cleanup in this scenario, which you can find attached to the bug along with a description of how to use it[1].
However, Rik, before trying that, would you be able to run the attached script [2] (or just the db query within) and forward the output to me? I'd like to make sure everything looks as it should before modifying the db directly.
Thanks, Greg
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1302215#c13 (Also note that the engine should be stopped before running this.)
[2] Arguments are the ovirt db name, db user, and the name of the vm you were performing live merge on.
Thanks, Michal
Regards,
Rik
On 02/03/2016 10:26 AM, Rik Theys wrote:
Hi,
I created a snapshot of a running VM prior to an OS upgrade. The OS upgrade has now been successful and I would like to remove the snapshot. I've selected the snapshot in the UI and clicked Delete to start the task.
After a few minutes, the task has failed. When I click delete again on the same snapshot, the failed message is returned after a few seconds.
From browsing through the engine log (attached) it seems the snapshot
was correctly merged in the first try but something went wrong in the finalizing phase. On retries, the log indicates the snapshot/disk image no longer exists and the removal of the snapshot fails for this reason.
Is there any way to clean up this snapshot?
I can see the snapshot in the "Disk snapshot" tab of the storage. It has a status of "illegal". Is it OK to (try to) remove this snapshot? Will this impact the running VM and/or disk image?
Regards,
Rik
--
Rik Theys
System Engineer
KU Leuven - Dept. Elektrotechniek (ESAT)
Kasteelpark Arenberg 10 bus 2440 - B-3001 Leuven-Heverlee
+32(0)16/32.11.07
----------------------------------------------------------------
<<Any errors in spelling, tact or fact are transmission errors>>
participants (7)
- Adam Litke
- Greg Padgett
- Marcelo Leandro
- Michal Skrivanek
- Nathanaël Blanchet
- Nir Soffer
- Rik Theys