Hi everyone,
Thank you to everyone who answered.
In fact, I will be glad to file a bug once I'm done recovering this
very important VM. But my main concern right now is to get it running
asap, or else switch to the painful alternative of restoring from tape.
I found similarities between some already-filed bugs and my issue, but I
think my issue is much simpler. In my case:
- the VM has only one disk
- the whole oVirt setup uses an iSCSI SAN
- the VM was shut down; there was no attempt to take a live snapshot
- I did not stop the engine during the delete, nor take any other disruptive action
- I ran the exact same steps two days ago on a test VM and it went fine
- in between, I did not upgrade or reset anything
The mail below shares many, many points with my case.
Reading my logs, some of you jumped to the Python errors, but looking
further up, one can see earlier (non-Python) errors complaining about a
logical volume not being found.
Today, nothing more was being written to engine.log, so I decided to
restart the engine:
- logs came back (...)
- on the faulty VM, I now see NO snapshot at all
- I still see the disk
- trying to start the VM leads to the following error:
VM uc-674 is down. Exit message: internal error process exited while
connecting to monitor: qemu-kvm: -drive
file=/rhev/data-center/5849b030-626e-47cb-ad90-3ce782d831b3/11a077c7-658b-49bb-8596-a785109c24c9/images/69220da6-eeed-4435-aad0-7aa33f3a0d21/c50561d9-c3ba-4366-b2bc-49bbfaa4cd23,if=none,id=drive-virtio-disk0,format=qcow2,serial=69220da6-eeed-4435-aad0-7aa33f3a0d21,cache=none,werror=stop,rerror=stop,aio=native:
could not open disk image
/rhev/data-center/5849b030-626e-47cb-ad90-3ce782d831b3/11a077c7-658b-49bb-8596-a785109c24c9/images/69220da6-eeed-4435-aad0-7aa33f3a0d21/c50561d9-c3ba-4366-b2bc-49bbfaa4cd23:
Invalid argument.
And indeed, looking for the device from the SPM, I see nothing:
[root@serv-vm-adm9 11a077c7-658b-49bb-8596-a785109c24c9]# ls -la
/dev/11a077c7-658b-49bb-8596-a785109c24c9/
total 0
drwxr-xr-x. 2 root root 200 7 janv. 08:23 .
drwxr-xr-x. 21 root root 4480 7 janv. 08:23 ..
lrwxrwxrwx. 1 root root 8 5 déc. 11:58
5c71e53b-21f2-4671-94f8-4603d1b0bf5e -> ../dm-19
lrwxrwxrwx. 1 root root 8 5 déc. 11:58
7369a73a-fea5-40d9-ad0a-7d81a43fe931 -> ../dm-20
lrwxrwxrwx. 1 root root 7 10 oct. 17:22 ids -> ../dm-5
lrwxrwxrwx. 1 root root 7 10 oct. 17:22 inbox -> ../dm-7
lrwxrwxrwx. 1 root root 7 10 oct. 17:22 leases -> ../dm-6
lrwxrwxrwx. 1 root root 7 10 oct. 17:22 master -> ../dm-9
lrwxrwxrwx. 1 root root 7 10 oct. 17:22 metadata -> ../dm-4
lrwxrwxrwx. 1 root root 7 10 oct. 17:22 outbox -> ../dm-8
There is no trace of the LV it should be using
(/dev/11a077c7-658b-49bb-8596-a785109c24c9/c50561d9-c3ba-4366-b2bc-49bbfaa4cd23).
In the URL I provided above, the OP is able to "lvchange -aey" the
device. In my case, although lvmdiskscan and lvs both show the LV, there
is no device node in /dev/{the proper VG}/{my missing LV}.
So, the last thing to ask is:
Is there a way to recover it, i.e. to recreate a device node for this LV
and to activate it?
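For reference, here is a minimal dry-run sketch of the activation attempt. The VG/LV names are the ones from the qemu error above; the script only prints the commands it would run, so nothing is modified until the lvs output has been double-checked. The -aey flag (exclusive activation) is what the OP in the URL above used, so take it as an assumption, not gospel.

```shell
#!/bin/sh
# Dry-run sketch: print the LVM commands one might try on the SPM to
# bring back the missing device node. Names come from the qemu error
# above; verify them before running anything for real.
VG="11a077c7-658b-49bb-8596-a785109c24c9"
LV="c50561d9-c3ba-4366-b2bc-49bbfaa4cd23"

# 1. Check that LVM metadata still knows the LV (and inspect attrs/tags)
echo "lvs --noheadings -o lv_name,lv_attr,lv_tags $VG"

# 2. Activate it exclusively (what the OP in the linked thread did)
echo "lvchange -aey $VG/$LV"

# 3. The device node should then reappear
echo "ls -l /dev/$VG/$LV"
```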
--
Nicolas Ecarnot
On 07/01/2014 04:09, Maor Lipchuk wrote:
Hi Nicolas,
I think that the initial problem started at 10:06 when VDSM tried to
clear records of the ancestor volume
c50561d9-c3ba-4366-b2bc-49bbfaa4cd23 (see [1])
Looking at bugzilla, it could be related to
https://bugzilla.redhat.com/1029069
(based on the exception described at
https://bugzilla.redhat.com/show_bug.cgi?id=1029069#c1)
The issue there was fixed after an upgrade to 3.3.1 (as Sander mentioned
earlier on this mailing list).
Could you give it a try and check whether that works for you?
It would also be great if you could open a bug on this, with the full
VDSM and engine logs and the list of LVs.
Regards,
Maor
[1]
236b3c5a-452a-4614-801a-c30cefbce87e::ERROR::2014-01-06
10:06:14,407::task::850::TaskManager.Task::(_setError)
Task=`236b3c5a-452a-4614-801a-c30cefbce87e`::Unexpected error
Traceback (most recent call last):
File "/usr/share/vdsm/storage/task.py", line 857, in _run
return fn(*args, **kargs)
File "/usr/share/vdsm/storage/task.py", line 318, in run
return self.cmd(*self.argslist, **self.argsdict)
File "/usr/share/vdsm/storage/securable.py", line 68, in wrapper
return f(self, *args, **kwargs)
File "/usr/share/vdsm/storage/sp.py", line 1937, in mergeSnapshots
sdUUID, vmUUID, imgUUID, ancestor, successor, postZero)
File "/usr/share/vdsm/storage/image.py", line 1162, in merge
srcVol.shrinkToOptimalSize()
File "/usr/share/vdsm/storage/blockVolume.py", line 315, in shrinkToOptimalSize
volParams = self.getVolumeParams()
File "/usr/share/vdsm/storage/volume.py", line 1008, in getVolumeParams
volParams['imgUUID'] = self.getImage()
File "/usr/share/vdsm/storage/blockVolume.py", line 494, in getImage
return self.getVolumeTag(TAG_PREFIX_IMAGE)
File "/usr/share/vdsm/storage/blockVolume.py", line 464, in getVolumeTag
return _getVolumeTag(self.sdUUID, self.volUUID, tagPrefix)
File "/usr/share/vdsm/storage/blockVolume.py", line 662, in _getVolumeTag
tags = lvm.getLV(sdUUID, volUUID).tags
File "/usr/share/vdsm/storage/lvm.py", line 851, in getLV
raise se.LogicalVolumeDoesNotExistError("%s/%s" % (vgName, lvName))
LogicalVolumeDoesNotExistError: Logical volume does not exist:
('11a077c7-658b-49bb-8596-a785109c24c9/_remove_me_aVmPgweS_c50561d9-c3ba-4366-b2bc-49bbfaa4cd23',)
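(Side note: the LV name in that last line carries a "_remove_me_" prefix, which suggests VDSM had already renamed the volume for deletion before losing track of it. A quick, read-only way to spot such volumes is sketched below; it runs on a canned listing so it is safe anywhere, and on the SPM you would pipe real lvs output through the same grep instead.)

```shell
#!/bin/sh
# Sketch (assumption: VDSM renames LVs slated for deletion with a
# "_remove_me_" prefix, as seen in the traceback above). Simulated with
# a hard-coded listing; on a real host you would run e.g.:
#   lvs --noheadings -o lv_name 11a077c7-658b-49bb-8596-a785109c24c9
lv_listing="metadata
ids
_remove_me_aVmPgweS_c50561d9-c3ba-4366-b2bc-49bbfaa4cd23"

# Keep only the LVs already marked for removal
doomed=$(printf '%s\n' "$lv_listing" | grep '_remove_me_')
echo "$doomed"
```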
On 01/06/2014 04:39 PM, Meital Bourvine wrote:
> I got the attachment.
>
> This is the relevant error:
> 6caec3bc-fc66-42be-a642-7733fc033103::ERROR::2014-01-06
> 10:13:21,068::task::850::TaskManager.Task::(_setError)
> Task=`6caec3bc-fc66-42be-a642-7733fc033103`::Unexpected error
> Traceback (most recent call last):
> File "/usr/share/vdsm/storage/task.py", line 857, in _run
> return fn(*args, **kargs)
> File "/usr/share/vdsm/storage/task.py", line 318, in run
> return self.cmd(*self.argslist, **self.argsdict)
> File "/usr/share/vdsm/storage/securable.py", line 68, in wrapper
> return f(self, *args, **kwargs)
> File "/usr/share/vdsm/storage/sp.py", line 1937, in mergeSnapshots
> sdUUID, vmUUID, imgUUID, ancestor, successor, postZero)
> File "/usr/share/vdsm/storage/image.py", line 1101, in merge
> dstVol = vols[ancestor]
> KeyError: '506085b6-40e0-4176-a4df-9102857f51f2'
>
> I don't know why it happens, so you'll have to wait for someone else to
> answer.
>
> ----- Original Message -----
>> From: "Nicolas Ecarnot" <nicolas(a)ecarnot.net>
>> To: "users" <users(a)ovirt.org>
>> Sent: Monday, January 6, 2014 4:22:57 PM
>> Subject: Re: [Users] Unable to delete a snapshot
>>
>> On 06/01/2014 12:51, Nicolas Ecarnot wrote:
>>>> Also, please attach the whole vdsm.log, it's hard to read it this way
>>>> (lines are broken).
>>>
>>> See attachment.
>>
>> Actually, I don't know if this mailing list allows attachments?
>>
>> --
>> Nicolas Ecarnot
>> _______________________________________________
>> Users mailing list
>> Users(a)ovirt.org
>>
http://lists.ovirt.org/mailman/listinfo/users
>>