[Users] Unable to delete a snapshot

Nicolas Ecarnot nicolas at ecarnot.net
Tue Jan 7 08:02:48 UTC 2014


Hi everyone,

Thank you to everyone who answered.
In fact, I will be glad to file a bug once I am done recovering this 
very important VM. But my main concern right now is to get it running 
ASAP, or else fall back on the painful route of recovering from tape.

I found similarities between some already-filed bugs and my issue, but I 
think my issue is much simpler. In my case:
- the VM has only one disk
- the whole oVirt setup is using an iSCSI SAN
- the VM was shut down; there was no attempt at a live snapshot
- I did not stop the engine during the delete, nor take any other 
disruptive action
- I performed the exact same steps two days ago on a test VM and it ran fine
- in between, I did not upgrade or reset anything

I found many points in common with my situation in the mail below:

http://list-archives.org/2013/10/25/users-ovirt-org/vm-snapshot-delete-failed-iscsi-domain/f/6837397684

Reading my logs, some of you jumped straight to the Python errors, but 
looking much further up, one can see earlier (non-Python) errors 
complaining that a logical volume was not found.
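
For anyone searching their own logs, something like this surfaces those 
earlier errors quickly (the path below is the usual VDSM log location on 
my hosts; yours may differ):

   # Show the LVM errors with two lines of context before each match:
   grep -B 2 'Logical volume does not exist' /var/log/vdsm/vdsm.log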


Today, nothing more was being written to engine.log, so I decided to 
restart the engine:
- Logs came back (...).
- On the faulty VM, I now see NO snapshot at all.
- I still see the disk.
- Trying to start the VM leads to the following error:

VM uc-674 is down. Exit message: internal error process exited while 
connecting to monitor: qemu-kvm: -drive 
file=/rhev/data-center/5849b030-626e-47cb-ad90-3ce782d831b3/11a077c7-658b-49bb-8596-a785109c24c9/images/69220da6-eeed-4435-aad0-7aa33f3a0d21/c50561d9-c3ba-4366-b2bc-49bbfaa4cd23,if=none,id=drive-virtio-disk0,format=qcow2,serial=69220da6-eeed-4435-aad0-7aa33f3a0d21,cache=none,werror=stop,rerror=stop,aio=native: 
could not open disk image 
/rhev/data-center/5849b030-626e-47cb-ad90-3ce782d831b3/11a077c7-658b-49bb-8596-a785109c24c9/images/69220da6-eeed-4435-aad0-7aa33f3a0d21/c50561d9-c3ba-4366-b2bc-49bbfaa4cd23: 
Invalid argument.

And indeed, when I look for the device from the SPM, I see nothing:
[root@serv-vm-adm9 11a077c7-658b-49bb-8596-a785109c24c9]# ls -la 
/dev/11a077c7-658b-49bb-8596-a785109c24c9/
total 0
drwxr-xr-x.  2 root root  200  7 janv. 08:23 .
drwxr-xr-x. 21 root root 4480  7 janv. 08:23 ..
lrwxrwxrwx.  1 root root    8  5 déc.  11:58 
5c71e53b-21f2-4671-94f8-4603d1b0bf5e -> ../dm-19
lrwxrwxrwx.  1 root root    8  5 déc.  11:58 
7369a73a-fea5-40d9-ad0a-7d81a43fe931 -> ../dm-20
lrwxrwxrwx.  1 root root    7 10 oct.  17:22 ids -> ../dm-5
lrwxrwxrwx.  1 root root    7 10 oct.  17:22 inbox -> ../dm-7
lrwxrwxrwx.  1 root root    7 10 oct.  17:22 leases -> ../dm-6
lrwxrwxrwx.  1 root root    7 10 oct.  17:22 master -> ../dm-9
lrwxrwxrwx.  1 root root    7 10 oct.  17:22 metadata -> ../dm-4
lrwxrwxrwx.  1 root root    7 10 oct.  17:22 outbox -> ../dm-8

There is no trace of the LV it should be using 
(/dev/11a077c7-658b-49bb-8596-a785109c24c9/c50561d9-c3ba-4366-b2bc-49bbfaa4cd23).

In the URL I provided above, the OP is able to lvchange -aey the device.
In my case, although lvmdiskscan and lvs both show me the LV, there is 
no device node in /dev/{the proper VG}/{my missing LV}.

So the last thing to ask is:

Is there a way to recover this, that is, to recreate a device node for 
this LV and to activate it?
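
For what it's worth, here is the kind of sequence I am tempted to try, 
inspired by the thread linked above (a sketch only; the VG and LV names 
are the UUIDs from my setup, and I have not confirmed it is safe to run 
while the storage domain is in use):

   # Check that LVM metadata still knows the LV, and read its
   # attributes and tags:
   lvs -o lv_name,lv_attr,lv_tags 11a077c7-658b-49bb-8596-a785109c24c9

   # Activate the LV exclusively, as the OP of the linked thread did:
   lvchange -aey \
     11a077c7-658b-49bb-8596-a785109c24c9/c50561d9-c3ba-4366-b2bc-49bbfaa4cd23

   # If /dev/<VG>/<LV> is still missing after activation, ask LVM to
   # recreate the device nodes for the whole VG:
   vgmknodes 11a077c7-658b-49bb-8596-a785109c24c9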

-- 
Nicolas Ecarnot

On 07/01/2014 04:09, Maor Lipchuk wrote:
> Hi Nicolas,
> I think that the initial problem started at 10:06 when VDSM tried to
> clear records of the ancestor volume
> c50561d9-c3ba-4366-b2bc-49bbfaa4cd23 (see [1])
>
> Looking at bugzilla, it could be related to
> https://bugzilla.redhat.com/1029069
> (based on the exception described at
> https://bugzilla.redhat.com/show_bug.cgi?id=1029069#c1)
>
> The issue there was fixed by upgrading to 3.3.1 (as Sander mentioned
> earlier on the mailing list).
>
> Could you give it a try and check if that works for you?
>
> Also, it would be great if you could open a bug on this with the full
> VDSM and engine logs and the list of LVs.
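>
> For instance, something along these lines on the SPM host would
> capture that list (a suggested invocation only; adjust the VG name
> if needed):
>
>    lvs -o lv_name,lv_attr,lv_size,lv_tags 11a077c7-658b-49bb-8596-a785109c24c9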
>
> Regards,
> Maor
>
>
>
> [1]
> 236b3c5a-452a-4614-801a-c30cefbce87e::ERROR::2014-01-06
> 10:06:14,407::task::850::TaskManager.Task::(_setError)
> Task=`236b3c5a-452a-4614-801a-c30cefbce87e`::Unexpected error
> Traceback (most recent call last):
>    File "/usr/share/vdsm/storage/task.py", line 857, in _run
>      return fn(*args, **kargs)
>    File "/usr/share/vdsm/storage/task.py", line 318, in run
>      return self.cmd(*self.argslist, **self.argsdict)
>    File "/usr/share/vdsm/storage/securable.py", line 68, in wrapper
>      return f(self, *args, **kwargs)
>    File "/usr/share/vdsm/storage/sp.py", line 1937, in mergeSnapshots
>      sdUUID, vmUUID, imgUUID, ancestor, successor, postZero)
>    File "/usr/share/vdsm/storage/image.py", line 1162, in merge
>      srcVol.shrinkToOptimalSize()
>    File "/usr/share/vdsm/storage/blockVolume.py", line 315, in
> shrinkToOptimalSize
>      volParams = self.getVolumeParams()
>    File "/usr/share/vdsm/storage/volume.py", line 1008, in getVolumeParams
>      volParams['imgUUID'] = self.getImage()
>    File "/usr/share/vdsm/storage/blockVolume.py", line 494, in getImage
>      return self.getVolumeTag(TAG_PREFIX_IMAGE)
>    File "/usr/share/vdsm/storage/blockVolume.py", line 464, in getVolumeTag
>      return _getVolumeTag(self.sdUUID, self.volUUID, tagPrefix)
>    File "/usr/share/vdsm/storage/blockVolume.py", line 662, in _getVolumeTag
>      tags = lvm.getLV(sdUUID, volUUID).tags
>    File "/usr/share/vdsm/storage/lvm.py", line 851, in getLV
>      raise se.LogicalVolumeDoesNotExistError("%s/%s" % (vgName, lvName))
> LogicalVolumeDoesNotExistError: Logical volume does not exist:
> ('11a077c7-658b-49bb-8596-a785109c24c9/_remove_me_aVmPgweS_c50561d9-c3ba-4366-b2bc-49bbfaa4cd23',)
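>
> (Note the _remove_me_ prefix in that name: as far as I know, VDSM
> renames a volume that way just before deleting it. If you want to
> check whether the renamed volume is still around, something like
> this on the SPM might show it; just a guess:
>
>    lvs 11a077c7-658b-49bb-8596-a785109c24c9 | grep _remove_me_ )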
>
>
> On 01/06/2014 04:39 PM, Meital Bourvine wrote:
>> I got the attachment.
>>
>> This is the relevant error:
>> 6caec3bc-fc66-42be-a642-7733fc033103::ERROR::2014-01-06 10:13:21,068::task::850::TaskManager.Task::(_setError) Task=`6caec3bc-fc66-42be-a642-7733fc033103`::Unexpected error
>> Traceback (most recent call last):
>>    File "/usr/share/vdsm/storage/task.py", line 857, in _run
>>      return fn(*args, **kargs)
>>    File "/usr/share/vdsm/storage/task.py", line 318, in run
>>      return self.cmd(*self.argslist, **self.argsdict)
>>    File "/usr/share/vdsm/storage/securable.py", line 68, in wrapper
>>      return f(self, *args, **kwargs)
>>    File "/usr/share/vdsm/storage/sp.py", line 1937, in mergeSnapshots
>>      sdUUID, vmUUID, imgUUID, ancestor, successor, postZero)
>>    File "/usr/share/vdsm/storage/image.py", line 1101, in merge
>>      dstVol = vols[ancestor]
>> KeyError: '506085b6-40e0-4176-a4df-9102857f51f2'
>>
>> I don't know why it happens, so you'll have to wait for someone else to answer.
>>
>> ----- Original Message -----
>>> From: "Nicolas Ecarnot" <nicolas at ecarnot.net>
>>> To: "users" <users at ovirt.org>
>>> Sent: Monday, January 6, 2014 4:22:57 PM
>>> Subject: Re: [Users] Unable to delete a snapshot
>>>
>>> On 06/01/2014 12:51, Nicolas Ecarnot wrote:
>>>>> Also, please attach the whole vdsm.log; it's hard to read it this way
>>>>> (lines are broken).
>>>>
>>>> See attachment.
>>>
>>> Actually, I don't know if this mailing list allows attachments?
>>>
>>> --
>>> Nicolas Ecarnot
>>>
>>
>


-- 
Nicolas Ecarnot


