Error while removing snapshot: Unable to get volume info
by francesco@shellrent.com
Hi all,
I'm trying to remove a snapshot from an HA VM in a GlusterFS setup (two CentOS Stream 8 nodes running oVirt 4.4, plus one CentOS 8 arbiter). The error that appears in the host's vdsm.log is:
2022-01-10 09:33:03,003+0100 ERROR (jsonrpc/4) [api] FINISH merge error=Merge failed: {'top': '441354e7-c234-4079-b494-53fa99cdce6f', 'base': 'fdf38f20-3416-4d75-a159-2a341b1ed637', 'job': '50206e3a-8018-4ea8-b191-e4bc859ae0c7', 'reason': 'Unable to get volume info for domain 574a3cd1-5617-4742-8de9-4732be4f27e0 volume 441354e7-c234-4079-b494-53fa99cdce6f'} (api:131)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/virt/livemerge.py", line 285, in merge
    drive.domainID, drive.poolID, drive.imageID, job.top)
  File "/usr/lib/python3.6/site-packages/vdsm/virt/vm.py", line 5988, in getVolumeInfo
    (domainID, volumeID))
vdsm.virt.errors.StorageUnavailableError: Unable to get volume info for domain 574a3cd1-5617-4742-8de9-4732be4f27e0 volume 441354e7-c234-4079-b494-53fa99cdce6f

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/common/api.py", line 124, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/API.py", line 776, in merge
    drive, baseVolUUID, topVolUUID, bandwidth, jobUUID)
  File "/usr/lib/python3.6/site-packages/vdsm/virt/vm.py", line 5833, in merge
    driveSpec, baseVolUUID, topVolUUID, bandwidth, jobUUID)
  File "/usr/lib/python3.6/site-packages/vdsm/virt/livemerge.py", line 288, in merge
    str(e), top=top, base=job.base, job=job_id)
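
If it helps, I can also query the volume the merge complains about directly on the host with vdsm-client (assuming I'm reading its usage correctly; <pool-uuid> is a placeholder for our storage pool ID):

vdsm-client Volume getInfo \
    storagepoolID=<pool-uuid> \
    storagedomainID=574a3cd1-5617-4742-8de9-4732be4f27e0 \
    imageID=0b995271-e7f3-41b3-aff7-b5ad7942c10d \
    volumeID=441354e7-c234-4079-b494-53fa99cdce6f

I'd expect this to fail for 441354e7, since that volume seems to be gone on the host (see below).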
The volume list on the host differs from the one in the engine:
HOST:
vdsm-tool dump-volume-chains 574a3cd1-5617-4742-8de9-4732be4f27e0 | grep -A10 0b995271-e7f3-41b3-aff7-b5ad7942c10d

image: 0b995271-e7f3-41b3-aff7-b5ad7942c10d

  - fdf38f20-3416-4d75-a159-2a341b1ed637
    status: OK, voltype: INTERNAL, format: COW, legality: LEGAL, type: SPARSE, capacity: 53687091200, truesize: 44255387648

  - 10df3adb-38f4-41d1-be84-b8b5b86e92cc
    status: OK, voltype: LEAF, format: COW, legality: LEGAL, type: SPARSE, capacity: 53687091200, truesize: 7335407616
ls -1 0b995271-e7f3-41b3-aff7-b5ad7942c10d
10df3adb-38f4-41d1-be84-b8b5b86e92cc
10df3adb-38f4-41d1-be84-b8b5b86e92cc.lease
10df3adb-38f4-41d1-be84-b8b5b86e92cc.meta
fdf38f20-3416-4d75-a159-2a341b1ed637
fdf38f20-3416-4d75-a159-2a341b1ed637.lease
fdf38f20-3416-4d75-a159-2a341b1ed637.meta
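
For completeness, I could also dump the on-disk backing chain with qemu-img; the gluster mount path below is my guess at our layout, with <server> and <volume> as placeholders, and -U to allow reading the image while the VM is running:

qemu-img info -U --backing-chain \
    /rhev/data-center/mnt/glusterSD/<server>:_<volume>/574a3cd1-5617-4742-8de9-4732be4f27e0/images/0b995271-e7f3-41b3-aff7-b5ad7942c10d/10df3adb-38f4-41d1-be84-b8b5b86e92cc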
ENGINE:
engine=# select * from images where image_group_id='0b995271-e7f3-41b3-aff7-b5ad7942c10d';
-[ RECORD 1 ]---------+-------------------------------------
image_guid | 10df3adb-38f4-41d1-be84-b8b5b86e92cc
creation_date | 2022-01-07 11:23:43+01
size | 53687091200
it_guid | 00000000-0000-0000-0000-000000000000
parentid | 441354e7-c234-4079-b494-53fa99cdce6f
imagestatus | 1
lastmodified | 2022-01-07 11:23:39.951+01
vm_snapshot_id | bd2291a4-8018-4874-a400-8d044a95347d
volume_type | 2
volume_format | 4
image_group_id | 0b995271-e7f3-41b3-aff7-b5ad7942c10d
_create_date | 2022-01-07 11:23:41.448463+01
_update_date | 2022-01-07 11:24:10.414777+01
active | t
volume_classification | 0
qcow_compat | 2
-[ RECORD 2 ]---------+-------------------------------------
image_guid | 441354e7-c234-4079-b494-53fa99cdce6f
creation_date | 2021-12-15 07:16:31.647+01
size | 53687091200
it_guid | 00000000-0000-0000-0000-000000000000
parentid | fdf38f20-3416-4d75-a159-2a341b1ed637
imagestatus | 1
lastmodified | 2022-01-07 11:23:41.448+01
vm_snapshot_id | 2d610958-59e3-4685-b209-139b4266012f
volume_type | 2
volume_format | 4
image_group_id | 0b995271-e7f3-41b3-aff7-b5ad7942c10d
_create_date | 2021-12-15 07:16:32.37005+01
_update_date | 2022-01-07 11:23:41.448463+01
active | f
volume_classification | 1
qcow_compat | 0
-[ RECORD 3 ]---------+-------------------------------------
image_guid | fdf38f20-3416-4d75-a159-2a341b1ed637
creation_date | 2020-08-12 17:16:07+02
size | 53687091200
it_guid | 00000000-0000-0000-0000-000000000000
parentid | 00000000-0000-0000-0000-000000000000
imagestatus | 4
lastmodified | 2021-12-15 07:16:32.369+01
vm_snapshot_id | 603811ba-3cdd-4388-a971-05e300ced0c3
volume_type | 2
volume_format | 4
image_group_id | 0b995271-e7f3-41b3-aff7-b5ad7942c10d
_create_date | 2020-08-12 17:16:07.506823+02
_update_date | 2021-12-15 07:16:32.37005+01
active | f
volume_classification | 1
qcow_compat | 2
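
If I read these records right, the chain according to the engine is fdf38f20 -> 441354e7 -> 10df3adb (active), while the host chain above is just fdf38f20 -> 10df3adb. A condensed query, in case someone wants me to re-run it:

engine=# select image_guid, parentid, imagestatus, active
         from images
         where image_group_id = '0b995271-e7f3-41b3-aff7-b5ad7942c10d'
         order by creation_date;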
However, in the engine GUI I see only two snapshot IDs:
1- 10df3adb-38f4-41d1-be84-b8b5b86e92cc (status OK)
2- 441354e7-c234-4079-b494-53fa99cdce6f (disk status illegal)
So the situation is:
- on the host I see two volumes, both with status OK
- in the engine GUI I see two volumes, one OK and one whose disk is in illegal status
- in the engine DB I see three volumes, i.e. the union of the two lists above
I would like to avoid restarting the VM. Any advice on fixing this messy situation? I can attach engine.log / vdsm.log if needed.
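
In case it helps the discussion, my tentative guess (which I have NOT run, and won't run without confirmation) would be to make the engine match the host: repoint the active volume's parent to fdf38f20, drop the stale 441354e7 row, and clear the illegal status. Roughly something like the following, although I assume more tables (snapshots, image_storage_domain_map, ...) are involved, which is exactly why I'm asking first:

-- NOT executed, just my understanding of what would realign engine and host
begin;
update images set parentid = 'fdf38f20-3416-4d75-a159-2a341b1ed637'
  where image_guid = '10df3adb-38f4-41d1-be84-b8b5b86e92cc';
delete from images where image_guid = '441354e7-c234-4079-b494-53fa99cdce6f';
update images set imagestatus = 1
  where image_guid = 'fdf38f20-3416-4d75-a159-2a341b1ed637';
commit;

Is something along these lines the right direction, or is there a supported engine-side way to recover?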
Thank you for your time and help,
Francesco