Error while removing snapshot: Unable to get volume info

Hi all,

I'm trying to remove a snapshot from an HA VM in a setup with glusterfs (2 nodes C8 Stream oVirt 4.4 + 1 arbiter C8). The error that appears in the vdsm log of the host is:

2022-01-10 09:33:03,003+0100 ERROR (jsonrpc/4) [api] FINISH merge error=Merge failed: {'top': '441354e7-c234-4079-b494-53fa99cdce6f', 'base': 'fdf38f20-3416-4d75-a159-2a341b1ed637', 'job': '50206e3a-8018-4ea8-b191-e4bc859ae0c7', 'reason': 'Unable to get volume info for domain 574a3cd1-5617-4742-8de9-4732be4f27e0 volume 441354e7-c234-4079-b494-53fa99cdce6f'} (api:131)

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/virt/livemerge.py", line 285, in merge
    drive.domainID, drive.poolID, drive.imageID, job.top)
  File "/usr/lib/python3.6/site-packages/vdsm/virt/vm.py", line 5988, in getVolumeInfo
    (domainID, volumeID))
vdsm.virt.errors.StorageUnavailableError: Unable to get volume info for domain 574a3cd1-5617-4742-8de9-4732be4f27e0 volume 441354e7-c234-4079-b494-53fa99cdce6f

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/common/api.py", line 124, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/API.py", line 776, in merge
    drive, baseVolUUID, topVolUUID, bandwidth, jobUUID)
  File "/usr/lib/python3.6/site-packages/vdsm/virt/vm.py", line 5833, in merge
    driveSpec, baseVolUUID, topVolUUID, bandwidth, jobUUID)
  File "/usr/lib/python3.6/site-packages/vdsm/virt/livemerge.py", line 288, in merge
    str(e), top=top, base=job.base, job=job_id)

The volume list on the host differs from the engine one:

HOST:

vdsm-tool dump-volume-chains 574a3cd1-5617-4742-8de9-4732be4f27e0 | grep -A10 0b995271-e7f3-41b3-aff7-b5ad7942c10d
image: 0b995271-e7f3-41b3-aff7-b5ad7942c10d
  - fdf38f20-3416-4d75-a159-2a341b1ed637
    status: OK, voltype: INTERNAL, format: COW, legality: LEGAL, type: SPARSE, capacity: 53687091200, truesize: 44255387648
  - 10df3adb-38f4-41d1-be84-b8b5b86e92cc
    status: OK, voltype: LEAF, format: COW, legality: LEGAL, type: SPARSE, capacity: 53687091200, truesize: 7335407616

ls -1 0b995271-e7f3-41b3-aff7-b5ad7942c10d
10df3adb-38f4-41d1-be84-b8b5b86e92cc
10df3adb-38f4-41d1-be84-b8b5b86e92cc.lease
10df3adb-38f4-41d1-be84-b8b5b86e92cc.meta
fdf38f20-3416-4d75-a159-2a341b1ed637
fdf38f20-3416-4d75-a159-2a341b1ed637.lease
fdf38f20-3416-4d75-a159-2a341b1ed637.meta

ENGINE:

engine=# select * from images where image_group_id='0b995271-e7f3-41b3-aff7-b5ad7942c10d';
-[ RECORD 1 ]---------+-------------------------------------
image_guid            | 10df3adb-38f4-41d1-be84-b8b5b86e92cc
creation_date         | 2022-01-07 11:23:43+01
size                  | 53687091200
it_guid               | 00000000-0000-0000-0000-000000000000
parentid              | 441354e7-c234-4079-b494-53fa99cdce6f
imagestatus           | 1
lastmodified          | 2022-01-07 11:23:39.951+01
vm_snapshot_id        | bd2291a4-8018-4874-a400-8d044a95347d
volume_type           | 2
volume_format         | 4
image_group_id        | 0b995271-e7f3-41b3-aff7-b5ad7942c10d
_create_date          | 2022-01-07 11:23:41.448463+01
_update_date          | 2022-01-07 11:24:10.414777+01
active                | t
volume_classification | 0
qcow_compat           | 2
-[ RECORD 2 ]---------+-------------------------------------
image_guid            | 441354e7-c234-4079-b494-53fa99cdce6f
creation_date         | 2021-12-15 07:16:31.647+01
size                  | 53687091200
it_guid               | 00000000-0000-0000-0000-000000000000
parentid              | fdf38f20-3416-4d75-a159-2a341b1ed637
imagestatus           | 1
lastmodified          | 2022-01-07 11:23:41.448+01
vm_snapshot_id        | 2d610958-59e3-4685-b209-139b4266012f
volume_type           | 2
volume_format         | 4
image_group_id        | 0b995271-e7f3-41b3-aff7-b5ad7942c10d
_create_date          | 2021-12-15 07:16:32.37005+01
_update_date          | 2022-01-07 11:23:41.448463+01
active                | f
volume_classification | 1
qcow_compat           | 0
-[ RECORD 3 ]---------+-------------------------------------
image_guid            | fdf38f20-3416-4d75-a159-2a341b1ed637
creation_date         | 2020-08-12 17:16:07+02
size                  | 53687091200
it_guid               | 00000000-0000-0000-0000-000000000000
parentid              | 00000000-0000-0000-0000-000000000000
imagestatus           | 4
lastmodified          | 2021-12-15 07:16:32.369+01
vm_snapshot_id        | 603811ba-3cdd-4388-a971-05e300ced0c3
volume_type           | 2
volume_format         | 4
image_group_id        | 0b995271-e7f3-41b3-aff7-b5ad7942c10d
_create_date          | 2020-08-12 17:16:07.506823+02
_update_date          | 2021-12-15 07:16:32.37005+01
active                | f
volume_classification | 1
qcow_compat           | 2

However, in the engine GUI I see only two snapshot IDs:
1. 10df3adb-38f4-41d1-be84-b8b5b86e92cc (status OK)
2. 441354e7-c234-4079-b494-53fa99cdce6f (disk status Illegal)

So the situation is:
- on the host I see two volumes, both with status OK
- in the engine GUI I see two volumes, one OK and the other with its disk in Illegal status
- in the engine DB I see three volumes, i.e. the combination of the two previous views

I would like to avoid restarting the VM; any advice on fixing this messy situation? I can attach engine.log/vdsm.log.

Thank you for your time and help,
Francesco
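For comparison, the chain as the engine sees it can be reconstructed directly from the images table shown above. This is only a read-only sketch (it assumes nothing beyond the images columns visible in the output above and standard PostgreSQL recursive CTEs), so it is safe to run before changing anything:

-- Walk the chain the engine believes in, starting from the base volume
-- (parentid = all zeros) and following parentid links towards the leaf.
WITH RECURSIVE chain AS (
    SELECT image_guid, parentid, imagestatus, active, vm_snapshot_id, 1 AS depth
      FROM images
     WHERE image_group_id = '0b995271-e7f3-41b3-aff7-b5ad7942c10d'
       AND parentid = '00000000-0000-0000-0000-000000000000'
    UNION ALL
    SELECT i.image_guid, i.parentid, i.imagestatus, i.active, i.vm_snapshot_id, c.depth + 1
      FROM images i
      JOIN chain c ON i.parentid = c.image_guid
     WHERE i.image_group_id = '0b995271-e7f3-41b3-aff7-b5ad7942c10d'
)
SELECT * FROM chain ORDER BY depth;

Against the data above this walk returns three rows (fdf38f20... -> 441354e7... -> 10df3adb...), while vdsm-tool dump-volume-chains reports only two volumes on storage, which is exactly the mismatch described.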

My problem seems to be the same as the one filed here: https://bugzilla.redhat.com/show_bug.cgi?id=1948599

So, if I'm correct, I must edit DB entries to fix the situation. Although I don't like operating directly on the DB, I'll try that and let you know if I resolve it.

In the meantime, if anyone has any tips or suggestions that don't involve editing the DB, I'd much appreciate it.

Regards,
Francesco
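Before editing anything, it can help to see which snapshot each volume row belongs to. A read-only sketch, with the caveat that the snapshots table and the images.vm_snapshot_id = snapshots.snapshot_id join are assumed from the standard engine schema (they are not shown in this thread), so verify the column names on your engine version first:

-- Map each volume of this disk to its snapshot; imagestatus 1 = OK, 4 = ILLEGAL (engine convention).
SELECT i.image_guid,
       i.parentid,
       i.imagestatus,
       i.active,
       s.snapshot_id,
       s.description,
       s.status
  FROM images i
  LEFT JOIN snapshots s ON s.snapshot_id = i.vm_snapshot_id
 WHERE i.image_group_id = '0b995271-e7f3-41b3-aff7-b5ad7942c10d'
 ORDER BY i.creation_date;

With the data above, the row for 441354e7-c234-4079-b494-53fa99cdce6f (vm_snapshot_id 2d610958-59e3-4685-b209-139b4266012f) is the one with no matching volume on the host.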

On Mon, Jan 10, 2022 at 5:22 PM Francesco Lorenzini via Users <users@ovirt.org> wrote:
My problem seems to be the same as the one filed here: https://bugzilla.redhat.com/show_bug.cgi?id=1948599
So, if I'm correct, I must edit DB entries to fix the situation. Although I don't like operating directly on the DB, I'll try that and let you know if I resolve it.
It looks like the volume on the vdsm side was already removed, so when the engine tries to merge, the merge fails. This is an engine bug: it should handle this case and remove the illegal snapshot from the DB. But since it does not, you have to do this manually. Please file an engine bug for this issue.
In the meantime, if anyone has any tips or suggestions that don't involve editing the DB, I'd much appreciate it.
I don't think there is another way.

Nir
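For anyone landing on this thread with the same symptoms, the manual cleanup Nir describes would, for the data shown above, amount to making the engine's chain match the host's (fdf38f20... -> 10df3adb...): repoint the leaf at the real base, drop the stale image row, and remove its orphaned snapshot record. The following is only a sketch built from the values in this thread, not a verified or supported procedure: take a full engine backup (engine-backup) first, make sure no tasks are running on the disk, double-check every UUID, and note that the snapshots table and the dependent tables mentioned in the comments are assumptions based on the standard engine schema.

BEGIN;

-- Repoint the leaf volume at the real base, matching the chain on the host.
UPDATE images
   SET parentid = 'fdf38f20-3416-4d75-a159-2a341b1ed637'
 WHERE image_guid = '10df3adb-38f4-41d1-be84-b8b5b86e92cc';

-- Remove the volume row that no longer exists on storage.
-- (If foreign keys block this, dependent rows such as image_storage_domain_map
--  may need to be removed first.)
DELETE FROM images
 WHERE image_guid = '441354e7-c234-4079-b494-53fa99cdce6f';

-- Remove the now-empty snapshot record, only if no other volume references it.
DELETE FROM snapshots
 WHERE snapshot_id = '2d610958-59e3-4685-b209-139b4266012f'
   AND NOT EXISTS (SELECT 1 FROM images
                    WHERE vm_snapshot_id = '2d610958-59e3-4685-b209-139b4266012f');

-- If the base volume is still flagged ILLEGAL in the engine afterwards
-- (imagestatus = 4 in RECORD 3 above), it may also need to be reset:
-- UPDATE images SET imagestatus = 1 WHERE image_guid = 'fdf38f20-3416-4d75-a159-2a341b1ed637';

-- Review the affected row counts, then COMMIT; otherwise ROLLBACK.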
participants (3)
- Francesco Lorenzini
- francesco@shellrent.com
- Nir Soffer