oVirt 4.2.5 : VM snapshot creation does not work : command HSMGetAllTasksStatusesVDS failed: Could not acquire resource

Hello,

We use oVirt 4.2.5.2-1.el7 (hosted engine, 4 hosts in the cluster, about twenty virtual machines). Virtual machine disks are located on a data domain on an FC SAN. Snapshots of all virtual machines are created normally, but for one virtual machine we cannot create a snapshot.

When we try to create a snapshot in the oVirt web console, we see these errors:

Aug 13, 2018, 1:05:06 PM Failed to complete snapshot 'KOM-APP14_BACKUP01' creation for VM 'KOM-APP14'.
Aug 13, 2018, 1:05:01 PM VDSM KOM-VM14 command HSMGetAllTasksStatusesVDS failed: Could not acquire resource. Probably resource factory threw an exception.: ()
Aug 13, 2018, 1:05:00 PM Snapshot 'KOM-APP14_BACKUP01' creation for VM 'KOM-APP14' was initiated by petya@sub.holding.com@sub.holding.com-authz.

At the same time, on the host holding the SPM role, we see this in vdsm.log:

...
2018-08-13 05:05:06,471-0500 INFO  (jsonrpc/0) [jsonrpc.JsonRpcServer] RPC call VM.getStats succeeded in 0.00 seconds (__init__:573)
2018-08-13 05:05:06,478-0500 INFO  (jsonrpc/5) [jsonrpc.JsonRpcServer] RPC call Image.deleteVolumes succeeded in 0.05 seconds (__init__:573)
2018-08-13 05:05:06,478-0500 INFO  (tasks/3) [storage.ThreadPool.WorkerThread] START task bb45ae7e-77e9-4fec-9ee2-8e1f0ad3d589 (cmd=<bound method Task.commit of <vdsm.storage.task.Task instance at 0x7f06b85a2128>>, args=None) (threadPool:208)
2018-08-13 05:05:07,009-0500 WARN  (tasks/3) [storage.ResourceManager] Resource factory failed to create resource '01_img_6db73566-0f7f-4438-a9ef-6815075f45ea.cdf1751b-64d3-42bc-b9ef-b0174c7ea068'. Canceling request. (resourceManager:543)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/resourceManager.py", line 539, in registerResource
    obj = namespaceObj.factory.createResource(name, lockType)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/resourceFactories.py", line 193, in createResource
    lockType)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/resourceFactories.py", line 122, in __getResourceCandidatesList
    imgUUID=resourceName)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/image.py", line 213, in getChain
    if srcVol.isLeaf():
  File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line 1430, in isLeaf
    return self._manifest.isLeaf()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line 139, in isLeaf
    return self.getVolType() == sc.type2name(sc.LEAF_VOL)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line 135, in getVolType
    self.voltype = self.getMetaParam(sc.VOLTYPE)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line 119, in getMetaParam
    meta = self.getMetadata()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/blockVolume.py", line 112, in getMetadata
    md = VolumeMetadata.from_lines(lines)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/volumemetadata.py", line 103, in from_lines
    "Missing metadata key: %s: found: %s" % (e, md))
MetaDataKeyNotFoundError: Meta Data key not found error: ("Missing metadata key: 'DOMAIN': found: {'NONE': '######################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################'}",)
2018-08-13 05:05:07,010-0500 WARN  (tasks/3) [storage.ResourceManager.Request] (ResName='01_img_6db73566-0f7f-4438-a9ef-6815075f45ea.cdf1751b-64d3-42bc-b9ef-b0174c7ea068', ReqID='3d924e5e-60d1-47b0-86a7-c63585b56f09') Tried to cancel a processed request (resourceManager:187)
2018-08-13 05:05:07,010-0500 ERROR (tasks/3) [storage.TaskManager.Task] (Task='bb45ae7e-77e9-4fec-9ee2-8e1f0ad3d589') Unexpected error (task:875)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
    return fn(*args, **kargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 336, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 79, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 1966, in deleteVolume
    with rm.acquireResource(img_ns, imgUUID, rm.EXCLUSIVE):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/resourceManager.py", line 1025, in acquireResource
    return _manager.acquireResource(namespace, name, lockType, timeout=timeout)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/resourceManager.py", line 475, in acquireResource
    raise se.ResourceAcqusitionFailed()
ResourceAcqusitionFailed: Could not acquire resource. Probably resource factory threw an exception.: ()
2018-08-13 05:05:07,059-0500 INFO  (tasks/3) [storage.ThreadPool.WorkerThread] FINISH task bb45ae7e-77e9-4fec-9ee2-8e1f0ad3d589 (threadPool:210)
2018-08-13 05:05:07,246-0500 INFO  (jsonrpc/1) [root] /usr/libexec/vdsm/hooks/after_get_caps/50_openstacknet: rc=0 err= (hooks:110)
2018-08-13 05:05:07,660-0500 INFO  (jsonrpc/1) [root] /usr/libexec/vdsm/hooks/after_get_caps/openstacknet_utils.py: rc=0 err= (hooks:110)
2018-08-13 05:05:08,152-0500 INFO  (jsonrpc/1) [root] /usr/libexec/vdsm/hooks/after_get_caps/ovirt_provider_ovn_hook: rc=0 err= (hooks:110)

Please help us to solve this problem.

Looks like you have a volume without metadata in the chain.

This could happen in the past when deleting a volume failed but we still cleared the volume metadata. In current 4.2 this cannot happen, since we clear the metadata only if deleting the volume succeeded.

Can you post the complete vdsm log with this error?

Once we find the volume without metadata, we can delete the LV using lvremove. This will fix the issue.

Shani, do you remember the bug we had with this error? This is probably the same issue.

Ala, I think we need to add a tool to check and repair such chains.

Nir

On Tue, Aug 14, 2018 at 6:03 PM Алексей Максимов < aleksey.i.maksimov@yandex.ru> wrote:
Hello, Nir
Log in attachment.
In the log we can see both createVolume and deleteVolume fail for this disk uuid: cdf1751b-64d3-42bc-b9ef-b0174c7ea068

1. Please share the output of this command on one of the hosts:

   lvs -o vg_name,lv_name,tags | grep cdf1751b-64d3-42bc-b9ef-b0174c7ea068

This will show all the volumes belonging to this disk.

2. For every volume, share the output of qemu-img info.

If the lv is not active, activate it:

   lvchange -ay vg_name/lv_name

Then run qemu-img info to find the actual chain:

   qemu-img info --backing /dev/vg_name/lv_name

If the lv was not active, deactivate it afterwards - we don't want to leave unused lvs active:

   lvchange -an vg_name/lv_name

3. One of the volumes will not be part of the chain: no other volume will use it as a backing file, and it may not have a backing file itself, or it may point to another volume in the chain.

Once we have found this volume, please check the engine logs for its uuid. You will probably find that the volume was deleted in the past. You may not find it if it was deleted months or years ago.

4. To verify that this volume does not have metadata, check the volume's MD_N tag. N is the offset, in 512-byte blocks, from the start of the metadata volume.

This will read the volume's metadata block:

   dd if=/dev/vg_name/metadata bs=512 count=1 skip=N iflag=direct

We expect to see:

   NONE=####################################################...

5. To remove this volume use:

   lvremove vg_name/lv_name

Once the volume is removed, you will be able to create snapshots.
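The tag and offset conventions used in steps 1 and 4 can be sketched in a few lines of Python. This is a hypothetical helper, not vdsm code: the function names `md_slot` and `dd_command` are ours, and it only assumes the tag format shown above (`IU_<image>`, `MD_<slot>`, `PU_<parent>`) and the dd invocation from step 4.

```python
# Hypothetical helper: given one line of `lvs -o vg_name,lv_name,tags` output,
# extract the MD_N tag and build the dd command that reads that volume's
# 512-byte metadata slot (step 4 above).

def md_slot(lvs_line):
    """Return (vg_name, lv_name, N) where MD_N is the metadata slot tag."""
    vg, lv, tags = lvs_line.split()
    for tag in tags.split(","):
        if tag.startswith("MD_"):
            return vg, lv, int(tag[3:])
    raise ValueError("no MD_ tag on %s" % lv)

def dd_command(lvs_line):
    """Build the dd command from step 4 for this volume's metadata block."""
    vg, lv, n = md_slot(lvs_line)
    return "dd if=/dev/%s/metadata bs=512 count=1 skip=%d iflag=direct" % (vg, n)

# Sample line in the same shape as the lvs output shared later in this thread:
line = ("6db73566-0f7f-4438-a9ef-6815075f45ea "
        "4974a4cc-b388-456f-b98e-19d2158f0d58 "
        "IU_cdf1751b-64d3-42bc-b9ef-b0174c7ea068,MD_15,"
        "PU_00000000-0000-0000-0000-000000000000")
print(dd_command(line))
# dd if=/dev/6db73566-0f7f-4438-a9ef-6815075f45ea/metadata bs=512 count=1 skip=15 iflag=direct
```

Because bs=512 equals the metadata slot size, dd's skip=N is exactly "seek N*512 bytes", so MD_N maps directly to the skip value.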
--
Best regards,
Aleksey Maksimov
Email: Aleksey.I.Maksimov@Yandex.ru

Hello Nir,

Thanks for the answer. The output of the commands is below.

*********************************************************************************************************************************************
1. Please share the output of this command on one of the hosts: lvs -o vg_name,lv_name,tags | grep cdf1751b-64d3-42bc-b9ef-b0174c7ea068
# lvs -o vg_name,lv_name,tags | grep cdf1751b-64d3-42bc-b9ef-b0174c7ea068
  VG                                   LV                                   LV Tags
  ...
  6db73566-0f7f-4438-a9ef-6815075f45ea 208ece15-1c71-46f2-a019-6a9fce4309b2 IU_cdf1751b-64d3-42bc-b9ef-b0174c7ea068,MD_23,PU_00000000-0000-0000-0000-000000000000
  6db73566-0f7f-4438-a9ef-6815075f45ea 4974a4cc-b388-456f-b98e-19d2158f0d58 IU_cdf1751b-64d3-42bc-b9ef-b0174c7ea068,MD_15,PU_00000000-0000-0000-0000-000000000000
  6db73566-0f7f-4438-a9ef-6815075f45ea 8c66f617-7add-410c-b546-5214b0200832 IU_cdf1751b-64d3-42bc-b9ef-b0174c7ea068,MD_16,PU_208ece15-1c71-46f2-a019-6a9fce4309b2
  ...

*********************************************************************************************************************************************
2. For every volume, share the output of qemu-img info: If the lv is not active, activate it: lvchange -ay vg_name/lv_name
# lvdisplay 6db73566-0f7f-4438-a9ef-6815075f45ea/208ece15-1c71-46f2-a019-6a9fce4309b2
  --- Logical volume ---
  LV Path                /dev/6db73566-0f7f-4438-a9ef-6815075f45ea/208ece15-1c71-46f2-a019-6a9fce4309b2
  LV Name                208ece15-1c71-46f2-a019-6a9fce4309b2
  VG Name                6db73566-0f7f-4438-a9ef-6815075f45ea
  LV UUID                k28hUo-Z6t7-wKdO-x7kz-ceYL-Vuzx-f9jLWi
  LV Write Access        read/write
  LV Creation host, time VM32.sub.holding.com, 2017-12-05 14:46:42 +0300
  LV Status              NOT available
  LV Size                33.00 GiB
  Current LE             264
  Segments               4
  Allocation             inherit
  Read ahead sectors     auto

# lvdisplay 6db73566-0f7f-4438-a9ef-6815075f45ea/4974a4cc-b388-456f-b98e-19d2158f0d58
  --- Logical volume ---
  LV Path                /dev/6db73566-0f7f-4438-a9ef-6815075f45ea/4974a4cc-b388-456f-b98e-19d2158f0d58
  LV Name                4974a4cc-b388-456f-b98e-19d2158f0d58
  VG Name                6db73566-0f7f-4438-a9ef-6815075f45ea
  LV UUID                HnnP01-JGxU-9zne-HB6n-BcaE-2lrM-qr9KPI
  LV Write Access        read/write
  LV Creation host, time VM12.sub.holding.com, 2018-07-31 03:35:20 +0300
  LV Status              NOT available
  LV Size                2.00 GiB
  Current LE             16
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto

# lvdisplay 6db73566-0f7f-4438-a9ef-6815075f45ea/8c66f617-7add-410c-b546-5214b0200832
  --- Logical volume ---
  LV Path                /dev/6db73566-0f7f-4438-a9ef-6815075f45ea/8c66f617-7add-410c-b546-5214b0200832
  LV Name                8c66f617-7add-410c-b546-5214b0200832
  VG Name                6db73566-0f7f-4438-a9ef-6815075f45ea
  LV UUID                MG1VRN-IqRn-mOGm-F4ul-ufbZ-Dywb-M3V14P
  LV Write Access        read/write
  LV Creation host, time VM12.sub.holding.com, 2018-08-01 03:34:31 +0300
  LV Status              NOT available
  LV Size                1.00 GiB
  Current LE             8
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto

# lvchange -ay 6db73566-0f7f-4438-a9ef-6815075f45ea/208ece15-1c71-46f2-a019-6a9fce4309b2
# lvchange -ay 6db73566-0f7f-4438-a9ef-6815075f45ea/4974a4cc-b388-456f-b98e-19d2158f0d58
# lvchange -ay 6db73566-0f7f-4438-a9ef-6815075f45ea/8c66f617-7add-410c-b546-5214b0200832
*********************************************************************************************************************************************
qemu-img info --backing /dev/vg_name/lv_name
# qemu-img info --backing /dev/6db73566-0f7f-4438-a9ef-6815075f45ea/208ece15-1c71-46f2-a019-6a9fce4309b2
image: /dev/6db73566-0f7f-4438-a9ef-6815075f45ea/208ece15-1c71-46f2-a019-6a9fce4309b2
file format: qcow2
virtual size: 30G (32212254720 bytes)
disk size: 0
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

# qemu-img info --backing /dev/6db73566-0f7f-4438-a9ef-6815075f45ea/4974a4cc-b388-456f-b98e-19d2158f0d58
image: /dev/6db73566-0f7f-4438-a9ef-6815075f45ea/4974a4cc-b388-456f-b98e-19d2158f0d58
file format: qcow2
virtual size: 30G (32212254720 bytes)
disk size: 0
cluster_size: 65536
backing file: 208ece15-1c71-46f2-a019-6a9fce4309b2 (actual path: /dev/6db73566-0f7f-4438-a9ef-6815075f45ea/208ece15-1c71-46f2-a019-6a9fce4309b2)
backing file format: qcow2
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

image: /dev/6db73566-0f7f-4438-a9ef-6815075f45ea/208ece15-1c71-46f2-a019-6a9fce4309b2
file format: qcow2
virtual size: 30G (32212254720 bytes)
disk size: 0
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

# qemu-img info --backing /dev/6db73566-0f7f-4438-a9ef-6815075f45ea/8c66f617-7add-410c-b546-5214b0200832
image: /dev/6db73566-0f7f-4438-a9ef-6815075f45ea/8c66f617-7add-410c-b546-5214b0200832
file format: qcow2
virtual size: 30G (32212254720 bytes)
disk size: 0
cluster_size: 65536
backing file: 208ece15-1c71-46f2-a019-6a9fce4309b2 (actual path: /dev/6db73566-0f7f-4438-a9ef-6815075f45ea/208ece15-1c71-46f2-a019-6a9fce4309b2)
backing file format: qcow2
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

image: /dev/6db73566-0f7f-4438-a9ef-6815075f45ea/208ece15-1c71-46f2-a019-6a9fce4309b2
file format: qcow2
virtual size: 30G (32212254720 bytes)
disk size: 0
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

I do not quite understand. What should I do next?

On Wed, Aug 15, 2018 at 6:14 PM Алексей Максимов < aleksey.i.maksimov@yandex.ru> wrote:
Hello Nir
Thanks for the answer. The output of the commands is below.
*********************************************************************************************************************************************
1. Please share the output of this command on one of the hosts: lvs -o vg_name,lv_name,tags | grep cdf1751b-64d3-42bc-b9ef-b0174c7ea068
*********************************************************************************************************************************************

# lvs -o vg_name,lv_name,tags | grep cdf1751b-64d3-42bc-b9ef-b0174c7ea068
  VG                                   LV                                   LV Tags
  ...
  6db73566-0f7f-4438-a9ef-6815075f45ea 208ece15-1c71-46f2-a019-6a9fce4309b2 IU_cdf1751b-64d3-42bc-b9ef-b0174c7ea068,MD_23,PU_00000000-0000-0000-0000-000000000000
  6db73566-0f7f-4438-a9ef-6815075f45ea 4974a4cc-b388-456f-b98e-19d2158f0d58 IU_cdf1751b-64d3-42bc-b9ef-b0174c7ea068,MD_15,PU_00000000-0000-0000-0000-000000000000
  6db73566-0f7f-4438-a9ef-6815075f45ea 8c66f617-7add-410c-b546-5214b0200832 IU_cdf1751b-64d3-42bc-b9ef-b0174c7ea068,MD_16,PU_208ece15-1c71-46f2-a019-6a9fce4309b2
So we have three volumes. Two are base volumes:

- 208ece15-1c71-46f2-a019-6a9fce4309b2 IU_cdf1751b-64d3-42bc-b9ef-b0174c7ea068,MD_23,PU_00000000-0000-0000-0000-000000000000
- 4974a4cc-b388-456f-b98e-19d2158f0d58 IU_cdf1751b-64d3-42bc-b9ef-b0174c7ea068,MD_15,PU_00000000-0000-0000-0000-000000000000

And one is a top volume:

- 8c66f617-7add-410c-b546-5214b0200832 IU_cdf1751b-64d3-42bc-b9ef-b0174c7ea068,MD_16,PU_208ece15-1c71-46f2-a019-6a9fce4309b2

So according to vdsm, this is the chain:

  208ece15-1c71-46f2-a019-6a9fce4309b2 <- 8c66f617-7add-410c-b546-5214b0200832 (top)

The volume 4974a4cc-b388-456f-b98e-19d2158f0d58 is not part of this chain.
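The reasoning here can be sketched mechanically: each LV's PU_ tag names its parent, so any volume that no other volume names as its parent is a leaf candidate, and a healthy image has exactly one leaf. This is a hypothetical helper, not vdsm code; `leaf_candidates` is our name, and only the tag semantics shown above are assumed.

```python
# Hypothetical helper: find leaf candidates from the PU_ (parent uuid) tags.
# A healthy image chain has exactly one leaf; extra leaf candidates are
# suspects for orphaned, half-deleted volumes.

BASE = "00000000-0000-0000-0000-000000000000"

def leaf_candidates(parent_by_lv):
    """parent_by_lv: {lv_name: parent uuid taken from the PU_ tag}."""
    referenced = set(parent_by_lv.values())
    return sorted(lv for lv in parent_by_lv if lv not in referenced)

# The three volumes from the lvs output above:
volumes = {
    "208ece15-1c71-46f2-a019-6a9fce4309b2": BASE,
    "4974a4cc-b388-456f-b98e-19d2158f0d58": BASE,
    "8c66f617-7add-410c-b546-5214b0200832": "208ece15-1c71-46f2-a019-6a9fce4309b2",
}

print(leaf_candidates(volumes))
# Prints two leaf candidates: the tags alone cannot say which one is the real
# top volume, so qemu-img info and the metadata slots decide which is the orphan.
```

Here both 4974a4cc and 8c66f617 come out as leaf candidates, which is exactly the anomaly: one of them must be the orphan.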
*********************************************************************************************************************************************
qemu-img info --backing /dev/vg_name/lv_name
*********************************************************************************************************************************************
# qemu-img info --backing /dev/6db73566-0f7f-4438-a9ef-6815075f45ea/208ece15-1c71-46f2-a019-6a9fce4309b2
image: /dev/6db73566-0f7f-4438-a9ef-6815075f45ea/208ece15-1c71-46f2-a019-6a9fce4309b2
file format: qcow2
virtual size: 30G (32212254720 bytes)
disk size: 0
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false
This is the base volume according to vdsm and qemu, good.
# qemu-img info --backing /dev/6db73566-0f7f-4438-a9ef-6815075f45ea/4974a4cc-b388-456f-b98e-19d2158f0d58
image: /dev/6db73566-0f7f-4438-a9ef-6815075f45ea/4974a4cc-b388-456f-b98e-19d2158f0d58
file format: qcow2
virtual size: 30G (32212254720 bytes)
disk size: 0
cluster_size: 65536
backing file: 208ece15-1c71-46f2-a019-6a9fce4309b2 (actual path: /dev/6db73566-0f7f-4438-a9ef-6815075f45ea/208ece15-1c71-46f2-a019-6a9fce4309b2)
backing file format: qcow2
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

image: /dev/6db73566-0f7f-4438-a9ef-6815075f45ea/208ece15-1c71-46f2-a019-6a9fce4309b2
file format: qcow2
virtual size: 30G (32212254720 bytes)
disk size: 0
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false
This is the deleted volume according to vdsm metadata. We can see that this volume still has a backing file pointing to the base volume.
# qemu-img info --backing /dev/6db73566-0f7f-4438-a9ef-6815075f45ea/8c66f617-7add-410c-b546-5214b0200832
image: /dev/6db73566-0f7f-4438-a9ef-6815075f45ea/8c66f617-7add-410c-b546-5214b0200832
file format: qcow2
virtual size: 30G (32212254720 bytes)
disk size: 0
cluster_size: 65536
backing file: 208ece15-1c71-46f2-a019-6a9fce4309b2 (actual path: /dev/6db73566-0f7f-4438-a9ef-6815075f45ea/208ece15-1c71-46f2-a019-6a9fce4309b2)
backing file format: qcow2
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

image: /dev/6db73566-0f7f-4438-a9ef-6815075f45ea/208ece15-1c71-46f2-a019-6a9fce4309b2
file format: qcow2
virtual size: 30G (32212254720 bytes)
disk size: 0
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false
This is the top volume.

So I think this is what happened. You had this chain in the past:

  208ece15-1c71-46f2-a019-6a9fce4309b2 <- 4974a4cc-b388-456f-b98e-19d2158f0d58 <- 8c66f617-7add-410c-b546-5214b0200832 (top)

You deleted a snapshot in engine, which created the new chain:

  208ece15-1c71-46f2-a019-6a9fce4309b2 <- 8c66f617-7add-410c-b546-5214b0200832 (top)

with 4974a4cc-b388-456f-b98e-19d2158f0d58 cut out of the chain. Deleting 4974a4cc-b388-456f-b98e-19d2158f0d58 then failed, but we cleared the metadata of this volume.

To confirm this theory, please share the output of:

Top volume:
  dd if=/dev/6db73566-0f7f-4438-a9ef-6815075f45ea/metadata bs=512 count=1 skip=16 iflag=direct

Base volume:
  dd if=/dev/6db73566-0f7f-4438-a9ef-6815075f45ea/metadata bs=512 count=1 skip=23 iflag=direct

Deleted volume?:
  dd if=/dev/6db73566-0f7f-4438-a9ef-6815075f45ea/metadata bs=512 count=1 skip=15 iflag=direct

Nir
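The dd commands above all do the same thing: seek to slot N of the domain's metadata LV and read one 512-byte block. A minimal Python equivalent, as a sketch (the `read_slot` name and the throwaway file path are ours; unlike dd with iflag=direct, this uses ordinary buffered reads):

```python
import tempfile

BLOCK = 512  # metadata slot size; matches bs=512 in the dd commands

def read_slot(path, slot):
    """Read the 512-byte metadata block for a volume tagged MD_<slot>."""
    with open(path, "rb") as f:
        f.seek(slot * BLOCK)  # same offset as dd's skip=<slot> with bs=512
        return f.read(BLOCK)

# Demo on a throwaway file standing in for /dev/<vg>/metadata: slot 15 holds
# a cleared NONE=###... pad, like the suspected deleted volume.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"X" * BLOCK * 15 + b"NONE=".ljust(BLOCK, b"#"))

slot15 = read_slot(tmp.name, 15)
print(slot15[:5])
# b'NONE='
```

On a real host you would point `read_slot` at `/dev/<vg_uuid>/metadata` with the slot number taken from the volume's MD_N tag.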

Hello Nir
To confirm this theory, please share the output of:

Top volume:
dd if=/dev/6db73566-0f7f-4438-a9ef-6815075f45ea/metadata bs=512 count=1 skip=16 iflag=direct

DOMAIN=6db73566-0f7f-4438-a9ef-6815075f45ea
CTIME=1533083673
FORMAT=COW
DISKTYPE=DATA
LEGALITY=LEGAL
SIZE=62914560
VOLTYPE=LEAF
DESCRIPTION=
IMAGE=cdf1751b-64d3-42bc-b9ef-b0174c7ea068
PUUID=208ece15-1c71-46f2-a019-6a9fce4309b2
MTIME=0
POOL_UUID=
TYPE=SPARSE
GEN=0
EOF
1+0 records in
1+0 records out
512 bytes (512 B) copied, 0.000348555 s, 1.5 MB/s

Base volume:
dd if=/dev/6db73566-0f7f-4438-a9ef-6815075f45ea/metadata bs=512 count=1 skip=23 iflag=direct

DOMAIN=6db73566-0f7f-4438-a9ef-6815075f45ea
CTIME=1512474404
FORMAT=COW
DISKTYPE=2
LEGALITY=LEGAL
SIZE=62914560
VOLTYPE=INTERNAL
DESCRIPTION={"DiskAlias":"KOM-APP14_Disk1","DiskDescription":""}
IMAGE=cdf1751b-64d3-42bc-b9ef-b0174c7ea068
PUUID=00000000-0000-0000-0000-000000000000
MTIME=0
POOL_UUID=
TYPE=SPARSE
GEN=0
EOF
1+0 records in
1+0 records out
512 bytes (512 B) copied, 0.00031362 s, 1.6 MB/s

Deleted volume?:
dd if=/dev/6db73566-0f7f-4438-a9ef-6815075f45ea/metadata bs=512 count=1 skip=15 iflag=direct

NONE=######################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################
EOF
1+0 records in
1+0 records out
512 bytes (512 B) copied, 0.000350361 s, 1.5 MB/s

15.08.2018, 21:09, "Nir Soffer" <nsoffer@redhat.com>:
On Wed, Aug 15, 2018 at 6:14 PM Алексей Максимов <aleksey.i.maksimov@yandex.ru> wrote:
Hello Nir
Thanks for the answer. The output of the commands is below.
*********************************************************************************************************************************************
1. Please share the output of this command on one of the hosts: lvs -o vg_name,lv_name,tags | grep cdf1751b-64d3-42bc-b9ef-b0174c7ea068
# lvs -o vg_name,lv_name,tags | grep cdf1751b-64d3-42bc-b9ef-b0174c7ea068
VG                                   LV                                   LV Tags
...
6db73566-0f7f-4438-a9ef-6815075f45ea 208ece15-1c71-46f2-a019-6a9fce4309b2 IU_cdf1751b-64d3-42bc-b9ef-b0174c7ea068,MD_23,PU_00000000-0000-0000-0000-000000000000
6db73566-0f7f-4438-a9ef-6815075f45ea 4974a4cc-b388-456f-b98e-19d2158f0d58 IU_cdf1751b-64d3-42bc-b9ef-b0174c7ea068,MD_15,PU_00000000-0000-0000-0000-000000000000
6db73566-0f7f-4438-a9ef-6815075f45ea 8c66f617-7add-410c-b546-5214b0200832 IU_cdf1751b-64d3-42bc-b9ef-b0174c7ea068,MD_16,PU_208ece15-1c71-46f2-a019-6a9fce4309b2
So we have 3 volumes - 2 are base volumes:
- 208ece15-1c71-46f2-a019-6a9fce4309b2  IU_cdf1751b-64d3-42bc-b9ef-b0174c7ea068,MD_23,PU_00000000-0000-0000-0000-000000000000
- 4974a4cc-b388-456f-b98e-19d2158f0d58  IU_cdf1751b-64d3-42bc-b9ef-b0174c7ea068,MD_15,PU_00000000-0000-0000-0000-000000000000
And one is the top volume:
- 8c66f617-7add-410c-b546-5214b0200832  IU_cdf1751b-64d3-42bc-b9ef-b0174c7ea068,MD_16,PU_208ece15-1c71-46f2-a019-6a9fce4309b2
So according to vdsm, this is the chain:
208ece15-1c71-46f2-a019-6a9fce4309b2 <- 8c66f617-7add-410c-b546-5214b0200832 (top)
The volume 4974a4cc-b388-456f-b98e-19d2158f0d58 is not part of this chain.
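This chain-vs-orphan reasoning can be read mechanically off the PU_ (parent UUID) tags: the live base is the null-parent volume that some other volume references, while a null-parent volume nobody points at is an orphan. A minimal sketch under that assumption (analyze() is a hypothetical helper, not a vdsm API):

```python
# Reconstruct the volume chain from lvs PU_ tags, as done by hand above.
NULL_UUID = "00000000-0000-0000-0000-000000000000"

def analyze(vols):
    """vols: lv_name -> parent UUID (value of the PU_ tag).
    Returns (chain from base to top, orphan base volumes)."""
    referenced = set(vols.values())
    bases = [v for v, p in vols.items() if p == NULL_UUID]
    live = [b for b in bases if b in referenced]      # a child points at it
    orphans = [b for b in bases if b not in referenced]
    chain = list(live)
    while True:
        nxt = [v for v, p in vols.items() if chain and p == chain[-1]]
        if not nxt:
            break
        chain.append(nxt[0])
    return chain, orphans

# Data taken from the lvs output in this thread.
vols = {
    "208ece15-1c71-46f2-a019-6a9fce4309b2": NULL_UUID,
    "4974a4cc-b388-456f-b98e-19d2158f0d58": NULL_UUID,
    "8c66f617-7add-410c-b546-5214b0200832": "208ece15-1c71-46f2-a019-6a9fce4309b2",
}
chain, orphans = analyze(vols)
print(chain)    # live chain, base -> top
print(orphans)  # stale volumes referenced by nothing
```

Run against the tags above, this reports the same chain and the same out-of-chain volume.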
*********************************************************************************************************************************************
qemu-img info --backing /dev/vg_name/lv_name
# qemu-img info --backing /dev/6db73566-0f7f-4438-a9ef-6815075f45ea/208ece15-1c71-46f2-a019-6a9fce4309b2
image: /dev/6db73566-0f7f-4438-a9ef-6815075f45ea/208ece15-1c71-46f2-a019-6a9fce4309b2
file format: qcow2
virtual size: 30G (32212254720 bytes)
disk size: 0
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false
This is the base volume according to vdsm and qemu, good.
# qemu-img info --backing /dev/6db73566-0f7f-4438-a9ef-6815075f45ea/4974a4cc-b388-456f-b98e-19d2158f0d58
image: /dev/6db73566-0f7f-4438-a9ef-6815075f45ea/4974a4cc-b388-456f-b98e-19d2158f0d58
file format: qcow2
virtual size: 30G (32212254720 bytes)
disk size: 0
cluster_size: 65536
backing file: 208ece15-1c71-46f2-a019-6a9fce4309b2 (actual path: /dev/6db73566-0f7f-4438-a9ef-6815075f45ea/208ece15-1c71-46f2-a019-6a9fce4309b2)
backing file format: qcow2
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

image: /dev/6db73566-0f7f-4438-a9ef-6815075f45ea/208ece15-1c71-46f2-a019-6a9fce4309b2
file format: qcow2
virtual size: 30G (32212254720 bytes)
disk size: 0
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false
This is the deleted volume according to vdsm metadata. We can see that this volume still has a backing file pointing to the base volume.
# qemu-img info --backing /dev/6db73566-0f7f-4438-a9ef-6815075f45ea/8c66f617-7add-410c-b546-5214b0200832
image: /dev/6db73566-0f7f-4438-a9ef-6815075f45ea/8c66f617-7add-410c-b546-5214b0200832
file format: qcow2
virtual size: 30G (32212254720 bytes)
disk size: 0
cluster_size: 65536
backing file: 208ece15-1c71-46f2-a019-6a9fce4309b2 (actual path: /dev/6db73566-0f7f-4438-a9ef-6815075f45ea/208ece15-1c71-46f2-a019-6a9fce4309b2)
backing file format: qcow2
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

image: /dev/6db73566-0f7f-4438-a9ef-6815075f45ea/208ece15-1c71-46f2-a019-6a9fce4309b2
file format: qcow2
virtual size: 30G (32212254720 bytes)
disk size: 0
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false
This is the top volume.
So I think this is what happened:
You had this chain in the past:
208ece15-1c71-46f2-a019-6a9fce4309b2 <- 4974a4cc-b388-456f-b98e-19d2158f0d58 <- 8c66f617-7add-410c-b546-5214b0200832 (top)
You deleted a snapshot in engine, which created the new chain:
208ece15-1c71-46f2-a019-6a9fce4309b2 <- 8c66f617-7add-410c-b546-5214b0200832 (top), with 4974a4cc-b388-456f-b98e-19d2158f0d58 left out (deleted)
Deleting 4974a4cc-b388-456f-b98e-19d2158f0d58 failed, but we cleared the metadata of this volume.
To confirm this theory, please share the output of:
Top volume:
dd if=/dev/6db73566-0f7f-4438-a9ef-6815075f45ea/metadata bs=512 count=1 skip=16 iflag=direct
Base volume:
dd if=/dev/6db73566-0f7f-4438-a9ef-6815075f45ea/metadata bs=512 count=1 skip=23 iflag=direct
Deleted volume?:
dd if=/dev/6db73566-0f7f-4438-a9ef-6815075f45ea/metadata bs=512 count=1 skip=15 iflag=direct

Nir
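The skip values in these dd commands come straight from the MD_<slot> tags in the lvs output above: on block storage each volume owns a 512-byte slot in the domain's metadata LV, so slot N sits at offset N * 512. A small sketch deriving the command from a tag string (dd_for_slot is a hypothetical helper, not part of vdsm):

```python
# Build the dd command that dumps a volume's metadata slot from its LV tags.
def dd_for_slot(sd_uuid, tags):
    """sd_uuid: storage domain UUID; tags: comma-separated LV tags
    as printed by `lvs -o tags` (contains an MD_<slot> entry)."""
    slot = next(t[len("MD_"):] for t in tags.split(",") if t.startswith("MD_"))
    return ("dd if=/dev/{}/metadata bs=512 count=1 skip={} iflag=direct"
            .format(sd_uuid, slot))

# Tags of the top volume, copied from the lvs output in this thread.
cmd = dd_for_slot(
    "6db73566-0f7f-4438-a9ef-6815075f45ea",
    "IU_cdf1751b-64d3-42bc-b9ef-b0174c7ea068,MD_16,PU_208ece15-1c71-46f2-a019-6a9fce4309b2")
print(cmd)
```

For the MD_16 tag this reproduces the "Top volume" command above exactly.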

On Wed, Aug 15, 2018 at 10:30 PM Алексей Максимов <aleksey.i.maksimov@yandex.ru> wrote:
Hello Nir
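A wiped slot like the one dumped earlier (a single NONE key padded with `#`) is exactly what makes vdsm's VolumeMetadata.from_lines() raise MetaDataKeyNotFoundError. A hedged sketch of that check, useful for eyeballing dd dumps; the REQUIRED tuple is an illustrative subset, not vdsm's exact key list:

```python
# Parse a metadata slot dumped with dd and report missing required keys.
REQUIRED = ("DOMAIN", "IMAGE", "PUUID", "VOLTYPE", "LEGALITY")  # illustrative subset

def parse_metadata(text):
    """Returns (key/value dict, list of missing required keys)."""
    md = {}
    for line in text.splitlines():
        line = line.strip()
        if line == "EOF":           # slot terminator, as seen in the dumps
            break
        if "=" in line:
            key, value = line.split("=", 1)
            md[key] = value
    return md, [k for k in REQUIRED if k not in md]

good = ("DOMAIN=6db73566-0f7f-4438-a9ef-6815075f45ea\nVOLTYPE=LEAF\n"
        "LEGALITY=LEGAL\nIMAGE=cdf1751b-64d3-42bc-b9ef-b0174c7ea068\n"
        "PUUID=208ece15-1c71-46f2-a019-6a9fce4309b2\nEOF\n")
wiped = "NONE=" + "#" * 502 + "\nEOF\n"
print(parse_metadata(good)[1])   # []
print(parse_metadata(wiped)[1])  # all required keys reported missing
```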
This confirms that 6db73566-0f7f-4438-a9ef-6815075f45ea/4974a4cc-b388-456f-b98e-19d2158f0d58 is a deleted volume. To fix this VM, please remove this volume. Run these commands on the SPM host:

systemctl stop vdsmd
lvremove 6db73566-0f7f-4438-a9ef-6815075f45ea/4974a4cc-b388-456f-b98e-19d2158f0d58
systemctl start vdsmd

You should be able to create a snapshot after that.
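Before retrying the snapshot, it is worth re-running the earlier lvs command and confirming the stale LV is gone. A minimal check over the captured output (the helper and the sample post-removal output are mine, for illustration):

```python
# Verify that the removed LV no longer appears in `lvs` output.
STALE_LV = "4974a4cc-b388-456f-b98e-19d2158f0d58"

def stale_volume_gone(lvs_output, lv_name=STALE_LV):
    """True if no line of the captured lvs output mentions the LV."""
    return all(lv_name not in line for line in lvs_output.splitlines())

# Hypothetical output after lvremove: only the two in-chain volumes remain.
after = (
    "6db73566-0f7f-4438-a9ef-6815075f45ea 208ece15-1c71-46f2-a019-6a9fce4309b2 ...\n"
    "6db73566-0f7f-4438-a9ef-6815075f45ea 8c66f617-7add-410c-b546-5214b0200832 ...\n"
)
print(stale_volume_gone(after))  # True
```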

Hello Nir

Many thanks for your help. The problem is solved.
participants (3)
- Aleksey Maksimov
- Nir Soffer
- Алексей Максимов