[ovirt-users] Re: VM Snapshot inconsistent

14 Jul 2020

On Tue, Jul 14, 2020 at 7:51 PM Arsène Gschwind <arsene.gschwind@unibas.ch> wrote:
...
On Tue, 2020-07-14 at 19:10 +0300, Nir Soffer wrote:
On Tue, Jul 14, 2020 at 5:37 PM Arsène Gschwind <arsene.gschwind@unibas.ch> wrote:
Hi,
I'm running oVirt 4.3.9 with FC-based storage.
I'm running several VMs with 3 disks on 3 different SDs. Recently we deleted a VM snapshot; the task failed after a while, and since then the snapshot has been inconsistent.
disk1 : Snapshot still visible in DB and on Storage using LVM commands
disk2: Snapshot still visible in DB but not on storage anymore (It seems the merge did run correctly)
disk3: Snapshot still visible in DB but not on storage anymore (It seems the merge did run correctly)
When I try to delete the snapshot again it runs forever and nothing happens.
Did you try also when the vm is not running?
Yes I've tried that without success
In general the system is designed so that retrying a failed merge will complete
the merge.
If the merge does not complete, there may be some bug that the system cannot
handle.
Is there a way to remove that snapshot?
Is it possible to merge disk1 with its snapshot using LVM commands and then clean up the Engine DB?
Yes but it is complicated. You need to understand the qcow2 chain
on storage, complete the merge manually using qemu-img commit,
update the metadata manually (even harder), then update engine db.
The best way, if the system cannot recover, is to fix the bad metadata
that causes the system to fail, and then let the system recover itself.
Which storage domain format are you using? V5? V4?
I'm using storage format V5 on FC.
Fixing the metadata is not easy.

First you have to find the volumes related to this disk. You can find
the disk uuid and storage domain uuid in engine ui, and then you can
find the volumes like this:

lvs -o vg_name,lv_name,tags | grep disk-uuid

For every lv, you will have a tag MD_N where N is a number. This is
the slot number in the metadata volume.
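
To pull N out of the tags column, something like the following might help. It is only a sketch: md_slot is a name I made up, the sample tag string is invented, and I am assuming the tags arrive as one comma-separated field the way lvs prints them.

```shell
# Print N from the MD_N tag in a comma-separated tag list.
# md_slot is a hypothetical helper, not part of vdsm or lvm.
md_slot() {
    printf '%s\n' "$1" | tr ',' '\n' | sed -n 's/^MD_\([0-9][0-9]*\)$/\1/p'
}

# Made-up tag list for illustration only.
md_slot "IU_disk-uuid,MD_12,PU_parent-uuid"
```

With the sample string above this prints 12.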

You need to calculate the offset of the metadata area for every volume using:

    offset = 1024*1024 + 8192 * N
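
As a quick worked example of the formula, here is a small shell function. md_offset is a made-up name and the slot numbers are arbitrary; the arithmetic is just the expression above.

```shell
# Byte offset of metadata slot N: 1 MiB base plus 8192 bytes per slot.
# md_offset is a hypothetical helper name.
md_offset() {
    echo $(( 1024 * 1024 + 8192 * $1 ))
}

md_offset 0    # prints 1048576
md_offset 7    # prints 1105920
```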

Then you can copy the metadata block using:

    dd if=/dev/vg-name/metadata bs=512 count=1 skip=$offset iflag=skip_bytes > lv-name.meta

Please share these files.
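
If there are many volumes, the steps above can be strung together roughly like this. This is a sketch under assumptions: print_dd_commands is a made-up name, it only prints the dd commands so you can review them before running anything, and it passes skip_bytes through iflag=, which is where GNU dd accepts that flag.

```shell
# Read "lv_name tags" lines on stdin and print one dd command per volume.
# Nothing is executed here; review the output before running it.
# print_dd_commands is a hypothetical helper name.
print_dd_commands() {
    vg=$1
    while read -r lv tags; do
        # Extract N from the MD_N tag; skip lines without one.
        slot=$(printf '%s\n' "$tags" | tr ',' '\n' | sed -n 's/^MD_\([0-9][0-9]*\)$/\1/p')
        [ -n "$slot" ] || continue
        offset=$(( 1024 * 1024 + 8192 * $slot ))
        echo "dd if=/dev/$vg/metadata bs=512 count=1 skip=$offset iflag=skip_bytes > $lv.meta"
    done
}

# Possible use on a host that sees the storage domain:
#   lvs --noheadings -o lv_name,tags vg-name | grep disk-uuid | print_dd_commands vg-name
```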

This part is not needed in 4.4, where a new StorageDomain dump API
can find the same info in one command:

    vdsm-client StorageDomain dump sd_id=storage-domain-uuid | \
        jq '.volumes | .[] | select(.image=="disk-uuid")'

The second step is to see what the actual qcow2 chain is. Find the
volume which is the LEAF by grepping the metadata files. In some cases
you may have more than one LEAF (which may be the problem).
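
A possible way to do the grepping, assuming the copied metadata files sit in one directory as lv-name.meta and that the volume type is recorded on a line of the form VOLTYPE=LEAF (check one file by eye first; find_leaf_volumes is a made-up name):

```shell
# List the volumes whose metadata file marks them as LEAF.
# Assumes files named <lv-name>.meta containing a VOLTYPE=... line.
find_leaf_volumes() {
    dir=${1:-.}
    grep -l '^VOLTYPE=LEAF' "$dir"/*.meta 2>/dev/null | while read -r f; do
        basename "$f" .meta
    done
}

# Possible use: find_leaf_volumes /path/to/meta-files
```

More than one name in the output would be the multiple-LEAF case mentioned above.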

Then activate all volumes using:

    lvchange -ay vg-name/lv-name

Now you can get the backing chain using qemu-img and the LEAF volume.

    qemu-img info --backing-chain /dev/vg-name/lv-name

If you have more than one LEAF, run this on all LEAFs. Only one of them
will be correct.

Please also share the output of qemu-img.

Once we are finished with the volumes, deactivate them:

    lvchange -an vg-name/lv-name

Based on the output, we can tell what the real chain is, what the
chain as seen by vdsm metadata is, and what fix is required.
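
To see the chain as recorded in the metadata files, a rough sketch like this could be compared against the qemu-img output. Assumptions: the files are named lv-name.meta, each one records its parent on a line of the form PUUID=parent-lv-name, and the base volume points at a parent that has no file. metadata_chain is a made-up name.

```shell
# Walk the parent links in the metadata files, starting from a LEAF,
# and print one volume per line from top to base.
# Stops when the parent has no .meta file, or after 100 hops as a
# guard against a loop in damaged metadata.
metadata_chain() {
    dir=$1
    vol=$2
    depth=0
    while [ -f "$dir/$vol.meta" ] && [ "$depth" -lt 100 ]; do
        echo "$vol"
        vol=$(sed -n 's/^PUUID=//p' "$dir/$vol.meta" | head -n 1)
        depth=$((depth + 1))
    done
}

# Possible use: metadata_chain /path/to/meta-files leaf-lv-name
```

If the list differs from what qemu-img reports for the same LEAF, that difference is likely where the metadata needs fixing.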

Nir
...
Thanks.
Thanks for any hint or help.
rgds , arsene
--
Arsène Gschwind <arsene.gschwind@unibas.ch>
Universitaet Basel