[Users] Live storage migration snapshot removal (fails)

When I move a disk (of a running vm) to a different storage pool a live snapshot ('Auto-generated for Live Storage Migration') is created. The allocation type of the virtual disk changes from "Preallocated" to "Thin Provision" with actual size > virtual size (e.g. 4GB actual, 1GB virtual). This snapshot is not removed after the storage migration is completed and cannot be removed from a running VM ( probably because live snapshot deletion is not implemented yet?). When I stop the VM and then remove the snapshot this fails and the snapshot state changes to "BROKEN". I've done some additional testing and the problem only occurs when a preallocated disk is used. Offline snapshot removal on a thin provisioned disk works (no online removal option). Is this expected behaviour? If so, I think live storage migration for preallocated disk should be disabled. Attached vdsm.log of the SPM host with the DEBUG and INFO lines removed to get it down to a manageable size. Log excerpts: Thread-20::WARNING::2013-11-08 13:48:30,423::persistentDict::256::Storage.PersistentDict::(refresh) data has no embedded checksum - trust it as it is f686ef6c-17e7-4777-bc69-5a67d38e3fa5::WARNING::2013-11-08 13:48:44,370::lvm::594::Storage.LVM::(getLv) lv: _remove_me_usZUyexc_9538a a00-2b19-495e-b091-3ffa76b1543c not found in lvs vg: 9d4e8a43-4851-42ff-a684-f3d802527cf7 response f686ef6c-17e7-4777-bc69-5a67d38e3fa5::ERROR::2013-11-08 13:48:44,375::task::850::TaskManager.Task::(_setError) Task=`f686ef6c-17e7-4 777-bc69-5a67d38e3fa5`::Unexpected error Traceback (most recent call last): File "/usr/share/vdsm/storage/task.py", line 857, in _run return fn(*args, **kargs) File "/usr/share/vdsm/storage/task.py", line 318, in run return self.cmd(*self.argslist, **self.argsdict) File "/usr/share/vdsm/storage/securable.py", line 68, in wrapper return f(self, *args, **kwargs) File "/usr/share/vdsm/storage/sp.py", line 1937, in mergeSnapshots sdUUID, vmUUID, imgUUID, ancestor, successor, postZero) File "/usr/share/vdsm/storage/image.py", line 1162, in merge srcVol.shrinkToOptimalSize() File "/usr/share/vdsm/storage/blockVolume.py", line 315, in shrinkToOptimalSize volParams = self.getVolumeParams() File "/usr/share/vdsm/storage/volume.py", line 1008, in getVolumeParams volParams['imgUUID'] = self.getImage() File "/usr/share/vdsm/storage/blockVolume.py", line 494, in getImage return self.getVolumeTag(TAG_PREFIX_IMAGE) File "/usr/share/vdsm/storage/blockVolume.py", line 464, in getVolumeTag return _getVolumeTag(self.sdUUID, self.volUUID, tagPrefix) File "/usr/share/vdsm/storage/blockVolume.py", line 662, in _getVolumeTag tags = lvm.getLV(sdUUID, volUUID).tags File "/usr/share/vdsm/storage/lvm.py", line 851, in getLV raise se.LogicalVolumeDoesNotExistError("%s/%s" % (vgName, lvName)) LogicalVolumeDoesNotExistError: Logical volume does not exist: ('9d4e8a43-4851-42ff-a684-f3d802527cf7/_remove_me_usZUyexc_9538aa00-2b19-495e-b091-3ffa76b1543c',) e5ba40e5-05a4-4e9d-9f3d-c6f2000fb296::WARNING::2013-11-08 13:49:37,921::lvm::594::Storage.LVM::(getLv) lv: b6bae791-dc47-46c2-86e2-1557b3caa66e not found in lvs vg: dfbbc8dd-bfae-44e1-8876-2bb82921565a response e5ba40e5-05a4-4e9d-9f3d-c6f2000fb296::WARNING::2013-11-08 13:49:38,502::utils::113::root::(rmFile) File: /rhev/data-center/def9b712-876a-49a9-b4b9-df9770befac4/dfbbc8dd-bfae-44e1-8876-2bb82921565a/images/c0d7b583-2f63-4c6b-9088-6f5aa84ccfcc/b6bae791-dc47-46c2-86e2-1557b3caa66e already removed 52178458-1764-4317-b85b-71843054aae9::WARNING::2013-11-08 14:02:34,284::blockSD::742::Storage.StorageDomain::(_getOccupiedMetadataSlots) Could not find mapping for lv 9d4e8a43-4851-42ff-a684-f3d802527cf7/c512267d-ebba-4907-a782-fec9b6c95116 52178458-1764-4317-b85b-71843054aae9::WARNING::2013-11-08 14:02:53,772::image::1164::Storage.Image::(merge) Auto shrink after merge failed Traceback (most recent call last): File "/usr/share/vdsm/storage/image.py", line 1162, in merge srcVol.shrinkToOptimalSize() File "/usr/share/vdsm/storage/blockVolume.py", line 320, in shrinkToOptimalSize qemuImg.FORMAT.QCOW2) File "/usr/lib64/python2.6/site-packages/vdsm/qemuImg.py", line 109, in check raise QImgError(rc, out, err, "unable to parse qemu-img check output") QImgError: ecode=0, stdout=['No errors were found on the image.'], stderr=[], message=unable to parse qemu-img check output Thread-20::ERROR::2013-11-08 14:03:42,869::sdc::137::Storage.StorageDomainCache::(_findDomain) looking for unfetched domain 9d4e8a43-4851-42ff-a684-f3d802527cf7

On Fri, Nov 08, 2013 at 02:20:39PM +0100, Sander Grendelman wrote: <snip>
9d4e8a43-4851-42ff-a684-f3d802527cf7/c512267d-ebba-4907-a782-fec9b6c95116 52178458-1764-4317-b85b-71843054aae9::WARNING::2013-11-08 14:02:53,772::image::1164::Storage.Image::(merge) Auto shrink after merge failed Traceback (most recent call last): File "/usr/share/vdsm/storage/image.py", line 1162, in merge srcVol.shrinkToOptimalSize() File "/usr/share/vdsm/storage/blockVolume.py", line 320, in shrinkToOptimalSize qemuImg.FORMAT.QCOW2) File "/usr/lib64/python2.6/site-packages/vdsm/qemuImg.py", line 109, in check raise QImgError(rc, out, err, "unable to parse qemu-img check output") QImgError: ecode=0, stdout=['No errors were found on the image.'], stderr=[], message=unable to parse qemu-img check output
I'm not sure that it's the only problem in this flow, but there's a clear bug in lib/vdsm/qemuImg.py's check() function: it fails to parse the output of qemu-img. Would you open a bug on that? I found no open one. Dan.

----- Original Message -----
From: "Dan Kenigsberg" <danken@redhat.com> To: "Sander Grendelman" <sander@grendelman.com> Cc: users@ovirt.org, fsimonce@redhat.com, ykaplan@redhat.com Sent: Friday, November 8, 2013 4:06:53 PM Subject: Re: [Users] Live storage migration snapshot removal (fails)
On Fri, Nov 08, 2013 at 02:20:39PM +0100, Sander Grendelman wrote:
<snip>
9d4e8a43-4851-42ff-a684-f3d802527cf7/c512267d-ebba-4907-a782-fec9b6c95116 52178458-1764-4317-b85b-71843054aae9::WARNING::2013-11-08 14:02:53,772::image::1164::Storage.Image::(merge) Auto shrink after merge failed Traceback (most recent call last): File "/usr/share/vdsm/storage/image.py", line 1162, in merge srcVol.shrinkToOptimalSize() File "/usr/share/vdsm/storage/blockVolume.py", line 320, in shrinkToOptimalSize qemuImg.FORMAT.QCOW2) File "/usr/lib64/python2.6/site-packages/vdsm/qemuImg.py", line 109, in check raise QImgError(rc, out, err, "unable to parse qemu-img check output") QImgError: ecode=0, stdout=['No errors were found on the image.'], stderr=[], message=unable to parse qemu-img check output
I'm not sure that it's the only problem in this flow, but there's a clear bug in lib/vdsm/qemuImg.py's check() function: it fails to parse the output of qemu-img.
Would you open a bug on that? I found no open one.
I remember that this was discussed and the agreement was that if the offset is not reported by qemu-img we should have used the old method to calculate the new volume size. We'll probably need to verify it. Sander can you open a bug on this? Thanks, -- Federico

I can open a bug on this. Should I file it for the entire "snapshot removal fails"-problem or just this error? I've looked at the parsing error and the problem seems to be that the qemuImg.check() function expects a qemu-img version that outputs a string like: "Image end offset: 262144". The only version I could find that does that is the one in the F19 virt-preview repository. Stock F19 qemu-img doesn't have this output. Nor do the stock EL6 or qemu-img-rhev EL6 versions. The only function that calls qemuImg.check is blockVolume.shrinkToOptimalSize() this function needs (and uses) the parsed offset to shrink the LV containing the qcow2 image. On Fri, Nov 8, 2013 at 4:33 PM, Federico Simoncelli <fsimonce@redhat.com> wrote:
----- Original Message -----
From: "Dan Kenigsberg" <danken@redhat.com> To: "Sander Grendelman" <sander@grendelman.com> Cc: users@ovirt.org, fsimonce@redhat.com, ykaplan@redhat.com Sent: Friday, November 8, 2013 4:06:53 PM Subject: Re: [Users] Live storage migration snapshot removal (fails)
On Fri, Nov 08, 2013 at 02:20:39PM +0100, Sander Grendelman wrote:
<snip>
9d4e8a43-4851-42ff-a684-f3d802527cf7/c512267d-ebba-4907-a782-fec9b6c95116 52178458-1764-4317-b85b-71843054aae9::WARNING::2013-11-08 14:02:53,772::image::1164::Storage.Image::(merge) Auto shrink after merge failed Traceback (most recent call last): File "/usr/share/vdsm/storage/image.py", line 1162, in merge srcVol.shrinkToOptimalSize() File "/usr/share/vdsm/storage/blockVolume.py", line 320, in shrinkToOptimalSize qemuImg.FORMAT.QCOW2) File "/usr/lib64/python2.6/site-packages/vdsm/qemuImg.py", line 109, in check raise QImgError(rc, out, err, "unable to parse qemu-img check output") QImgError: ecode=0, stdout=['No errors were found on the image.'], stderr=[], message=unable to parse qemu-img check output
I'm not sure that it's the only problem in this flow, but there's a clear bug in lib/vdsm/qemuImg.py's check() function: it fails to parse the output of qemu-img.
Would you open a bug on that? I found no open one.
I remember that this was discussed and the agreement was that if the offset is not reported by qemu-img we should have used the old method to calculate the new volume size.
We'll probably need to verify it. Sander can you open a bug on this?
Thanks, -- Federico

On Mon, Nov 11, 2013 at 03:11:22PM +0100, Sander Grendelman wrote:
I can open a bug on this.
Should I file it for the entire "snapshot removal fails"-problem or just this error?
"live storage migration snapshot removal fails due to unexpected qemu-img output", followed by your analisys below, sounds great to me.
I've looked at the parsing error and the problem seems to be that the qemuImg.check() function expects a qemu-img version that outputs a string like: "Image end offset: 262144".
The only version I could find that does that is the one in the F19 virt-preview repository. Stock F19 qemu-img doesn't have this output. Nor do the stock EL6 or qemu-img-rhev EL6 versions.
The only function that calls qemuImg.check is blockVolume.shrinkToOptimalSize() this function needs (and uses) the parsed offset to shrink the LV containing the qcow2 image.

Done: https://bugzilla.redhat.com/show_bug.cgi?id=1029069 On Mon, Nov 11, 2013 at 4:02 PM, Dan Kenigsberg <danken@redhat.com> wrote:
On Mon, Nov 11, 2013 at 03:11:22PM +0100, Sander Grendelman wrote:
I can open a bug on this.
Should I file it for the entire "snapshot removal fails"-problem or just this error?
"live storage migration snapshot removal fails due to unexpected qemu-img output", followed by your analisys below, sounds great to me.
I've looked at the parsing error and the problem seems to be that the qemuImg.check() function expects a qemu-img version that outputs a string like: "Image end offset: 262144".
The only version I could find that does that is the one in the F19 virt-preview repository. Stock F19 qemu-img doesn't have this output. Nor do the stock EL6 or qemu-img-rhev EL6 versions.
The only function that calls qemuImg.check is blockVolume.shrinkToOptimalSize() this function needs (and uses) the parsed offset to shrink the LV containing the qcow2 image.

On Mon, Nov 11, 2013 at 4:33 PM, Sander Grendelman <sander@grendelman.com> wrote:
The patch at http://gerrit.ovirt.org/#/c/19983/ from https://bugzilla.redhat.com/show_bug.cgi?id=1015071 seems to fix this issue. I could not patch storage/image.py in our main environment so I had a nice exercise in building an oVirt environment on top of nested KVM virtualization on my laptop.
participants (3)
-
Dan Kenigsberg
-
Federico Simoncelli
-
Sander Grendelman