On Fri, Oct 30, 2020 at 10:43 AM Andrei Verovski <andreil1(a)starlett.lv> wrote:
Hi,
I have oVirt 4.4 with nodes 4.2.8
An attempt to create a snapshot failed with:
EventID 10802 VDSM node11 command SnapshotVDS failed: Message timeout can be caused by
communication issues.
Event ID 2022 Add-disk operation failed to complete.
However, the snapshot data is still on disk, since the VM now takes 1200GB instead of 500.
Yet the snapshot list in the VM details is empty.
How can I recover the disk space occupied by the unsuccessful snapshot?
I looked at /vmraid/nfs/disks/xxxx-xxxx-xxx/images
Is /vmraid/nfs/disks/ your storage domain directory on the server?
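For reference, on the hypervisor that storage domain will be mounted under
/rhev/data-center/mnt/<server>:<export>/ - this is where the example paths
below come from. Assuming the standard layout, something like this shows the
mounts (the server and export names below are just from my test setup):

$ ls /rhev/data-center/mnt/
nfs1:_export_3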
You can find the disk UUID on the engine side, and search for the disk
snapshots the engine knows about:
$ sudo -u postgres psql -d engine
engine=# select image_group_id,image_guid from images where
image_group_id = '4b62aa6d-3bdd-4db3-b26f-0484c4124631';
image_group_id | image_guid
--------------------------------------+--------------------------------------
4b62aa6d-3bdd-4db3-b26f-0484c4124631 | 20391fbc-fa77-4fef-9aea-cf59f27f90b5
4b62aa6d-3bdd-4db3-b26f-0484c4124631 | 13ad63bf-71e2-4243-8967-ab4324818b31
4b62aa6d-3bdd-4db3-b26f-0484c4124631 | b5e56953-bf86-4496-a6bd-19132cba9763
(3 rows)
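If you do not already know the disk UUID (the image_group_id above), you may
be able to look it up by the disk alias - for example (table and column names
here are from memory and may differ slightly between engine versions):

engine=# select disk_id, disk_alias from base_disks
         where disk_alias ilike '%<your-disk-alias>%';

The disk ID is also shown in the Admin Portal under Storage > Disks.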
Looking up the same image on storage looks like this:
$ find /rhev/data-center/mnt/nfs1\:_export_3/ -name
4b62aa6d-3bdd-4db3-b26f-0484c4124631
/rhev/data-center/mnt/nfs1:_export_3/f5915245-0ac5-4712-b8b2-dd4d4be7cdc4/images/4b62aa6d-3bdd-4db3-b26f-0484c4124631
$ ls -1
/rhev/data-center/mnt/nfs1:_export_3/f5915245-0ac5-4712-b8b2-dd4d4be7cdc4/images/4b62aa6d-3bdd-4db3-b26f-0484c4124631
13ad63bf-71e2-4243-8967-ab4324818b31
13ad63bf-71e2-4243-8967-ab4324818b31.lease
13ad63bf-71e2-4243-8967-ab4324818b31.meta
20391fbc-fa77-4fef-9aea-cf59f27f90b5
20391fbc-fa77-4fef-9aea-cf59f27f90b5.lease
20391fbc-fa77-4fef-9aea-cf59f27f90b5.meta
b5e56953-bf86-4496-a6bd-19132cba9763
b5e56953-bf86-4496-a6bd-19132cba9763.lease
b5e56953-bf86-4496-a6bd-19132cba9763.meta
Note that for every image there is a file without an extension (the actual
image data), plus .meta and .lease files.
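For reference, a .meta file is a small key=value text file. On a 4.4 NFS
storage domain it looks roughly like this when read from inside the image
directory shown above (the values are illustrative, matching the chain in this
example; the exact set of keys can differ between versions):

$ cat 13ad63bf-71e2-4243-8967-ab4324818b31.meta
CAP=6442450944
CTIME=1604000000
DESCRIPTION=
DISKTYPE=DATA
DOMAIN=f5915245-0ac5-4712-b8b2-dd4d4be7cdc4
FORMAT=COW
GEN=0
IMAGE=4b62aa6d-3bdd-4db3-b26f-0484c4124631
LEGALITY=LEGAL
PUUID=20391fbc-fa77-4fef-9aea-cf59f27f90b5
TYPE=SPARSE
VOLTYPE=INTERNAL
EOF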
You can find the leaf volume - the volume currently used by the VM - like this:
$ grep LEAF
/rhev/data-center/mnt/nfs1:_export_3/f5915245-0ac5-4712-b8b2-dd4d4be7cdc4/images/4b62aa6d-3bdd-4db3-b26f-0484c4124631/*.meta
/rhev/data-center/mnt/nfs1:_export_3/f5915245-0ac5-4712-b8b2-dd4d4be7cdc4/images/4b62aa6d-3bdd-4db3-b26f-0484c4124631/b5e56953-bf86-4496-a6bd-19132cba9763.meta:VOLTYPE=LEAF
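Each volume's parent is also recorded in the PUUID key of its .meta file (the
base volume has the all-zero UUID as its parent), so the chain can be read
directly from storage as well. For the chain in this example I would expect
something like:

$ cd /rhev/data-center/mnt/nfs1:_export_3/f5915245-0ac5-4712-b8b2-dd4d4be7cdc4/images/4b62aa6d-3bdd-4db3-b26f-0484c4124631
$ grep PUUID *.meta
13ad63bf-71e2-4243-8967-ab4324818b31.meta:PUUID=20391fbc-fa77-4fef-9aea-cf59f27f90b5
20391fbc-fa77-4fef-9aea-cf59f27f90b5.meta:PUUID=00000000-0000-0000-0000-000000000000
b5e56953-bf86-4496-a6bd-19132cba9763.meta:PUUID=13ad63bf-71e2-4243-8967-ab4324818b31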
From this you can get the image chain:
$ sudo -u vdsm qemu-img info --backing-chain
/rhev/data-center/mnt/nfs1:_export_3/f5915245-0ac5-4712-b8b2-dd4d4be7cdc4/images/4b62aa6d-3bdd-4db3-b26f-0484c4124631/b5e56953-bf86-4496-a6bd-19132cba9763
image:
/rhev/data-center/mnt/nfs1:_export_3/f5915245-0ac5-4712-b8b2-dd4d4be7cdc4/images/4b62aa6d-3bdd-4db3-b26f-0484c4124631/b5e56953-bf86-4496-a6bd-19132cba9763
file format: qcow2
virtual size: 6 GiB (6442450944 bytes)
disk size: 103 MiB
cluster_size: 65536
backing file: 13ad63bf-71e2-4243-8967-ab4324818b31 (actual path:
/rhev/data-center/mnt/nfs1:_export_3/f5915245-0ac5-4712-b8b2-dd4d4be7cdc4/images/4b62aa6d-3bdd-4db3-b26f-0484c4124631/13ad63bf-71e2-4243-8967-ab4324818b31)
backing file format: qcow2
Format specific information:
compat: 1.1
compression type: zlib
lazy refcounts: false
refcount bits: 16
corrupt: false
image:
/rhev/data-center/mnt/nfs1:_export_3/f5915245-0ac5-4712-b8b2-dd4d4be7cdc4/images/4b62aa6d-3bdd-4db3-b26f-0484c4124631/13ad63bf-71e2-4243-8967-ab4324818b31
file format: qcow2
virtual size: 6 GiB (6442450944 bytes)
disk size: 4.88 MiB
cluster_size: 65536
backing file: 20391fbc-fa77-4fef-9aea-cf59f27f90b5 (actual path:
/rhev/data-center/mnt/nfs1:_export_3/f5915245-0ac5-4712-b8b2-dd4d4be7cdc4/images/4b62aa6d-3bdd-4db3-b26f-0484c4124631/20391fbc-fa77-4fef-9aea-cf59f27f90b5)
backing file format: qcow2
Format specific information:
compat: 1.1
compression type: zlib
lazy refcounts: false
refcount bits: 16
corrupt: false
image:
/rhev/data-center/mnt/nfs1:_export_3/f5915245-0ac5-4712-b8b2-dd4d4be7cdc4/images/4b62aa6d-3bdd-4db3-b26f-0484c4124631/20391fbc-fa77-4fef-9aea-cf59f27f90b5
file format: qcow2
virtual size: 6 GiB (6442450944 bytes)
disk size: 1.65 GiB
cluster_size: 65536
Format specific information:
compat: 1.1
compression type: zlib
lazy refcounts: false
refcount bits: 16
corrupt: false
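Since the issue here is the extra ~700GB, it is also worth checking how much
space each file actually takes; something along these lines (same path as
above) should show which volume is holding the space:

$ du -sh /rhev/data-center/mnt/nfs1:_export_3/f5915245-0ac5-4712-b8b2-dd4d4be7cdc4/images/4b62aa6d-3bdd-4db3-b26f-0484c4124631/*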
Any other image in this directory that the engine does not know about and
is not part of the backing chain is safe to delete.
For example, if you find unknown files like these:
3b37d7e3-a6d7-4f76-9145-2732669d2ebd
3b37d7e3-a6d7-4f76-9145-2732669d2ebd.meta
3b37d7e3-a6d7-4f76-9145-2732669d2ebd.lease
You can safely delete them.
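If you prefer to be extra careful, you can first move such files out of the
image directory instead of deleting them right away, and remove them only once
the VM is confirmed healthy. For example (the target directory is hypothetical,
pick any location with enough free space):

$ cd /rhev/data-center/mnt/nfs1:_export_3/f5915245-0ac5-4712-b8b2-dd4d4be7cdc4/images/4b62aa6d-3bdd-4db3-b26f-0484c4124631
$ mkdir -p /var/tmp/unreferenced-volumes
$ mv 3b37d7e3-a6d7-4f76-9145-2732669d2ebd* /var/tmp/unreferenced-volumes/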
This may be more complicated if the snapshot failed after changing the .meta
file, and your leaf is an image that the engine does not know about. In this
case you will have to fix the metadata.
If you are not sure about this, please share the output of qemu-img info
--backing-chain shown above and the contents of the .meta files in the image
directory.
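Something like this would collect all of that in one go (adjust the path to
your own storage domain and image directory):

$ cd /rhev/data-center/mnt/<server>:<export>/<sd-uuid>/images/<disk-uuid>
$ for f in *.meta; do echo "== $f"; cat "$f"; done
$ sudo -u vdsm qemu-img info --backing-chain <leaf-volume>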
Nir