On Wed, Aug 15, 2018 at 6:14 PM Алексей Максимов <aleksey.i.maksimov@yandex.ru> wrote:
Hello Nir

Thanks for the answer.
The output of the commands is below.


*********************************************************************************************************************************************
> 1. Please share the output of this command on one of the hosts:
> lvs -o vg_name,lv_name,tags | grep cdf1751b-64d3-42bc-b9ef-b0174c7ea068
*********************************************************************************************************************************************
# lvs -o vg_name,lv_name,tags | grep cdf1751b-64d3-42bc-b9ef-b0174c7ea068

  VG                                   LV                                   LV Tags
  ...
  6db73566-0f7f-4438-a9ef-6815075f45ea 208ece15-1c71-46f2-a019-6a9fce4309b2 IU_cdf1751b-64d3-42bc-b9ef-b0174c7ea068,MD_23,PU_00000000-0000-0000-0000-000000000000
  6db73566-0f7f-4438-a9ef-6815075f45ea 4974a4cc-b388-456f-b98e-19d2158f0d58 IU_cdf1751b-64d3-42bc-b9ef-b0174c7ea068,MD_15,PU_00000000-0000-0000-0000-000000000000
  6db73566-0f7f-4438-a9ef-6815075f45ea 8c66f617-7add-410c-b546-5214b0200832 IU_cdf1751b-64d3-42bc-b9ef-b0174c7ea068,MD_16,PU_208ece15-1c71-46f2-a019-6a9fce4309b2

So we have 2 volumes - 2 are base volumes:

- 208ece15-1c71-46f2-a019-6a9fce4309b2 IU_cdf1751b-64d3-42bc-b9ef-b0174c7ea068,MD_23,PU_00000000-0000-0000-0000-000000000000
- 4974a4cc-b388-456f-b98e-19d2158f0d58 IU_cdf1751b-64d3-42bc-b9ef-b0174c7ea068,MD_15,PU_00000000-0000-0000-0000-000000000000

And one is top volume:
- 8c66f617-7add-410c-b546-5214b0200832 IU_cdf1751b-64d3-42bc-b9ef-b0174c7ea068,MD_16,PU_208ece15-1c71-46f2-a019-6a9fce4309b2

So according to vdsm, this is the chain:

    208ece15-1c71-46f2-a019-6a9fce4309b2 <- 8c66f617-7add-410c-b546-5214b0200832 (top)

The volume 4974a4cc-b388-456f-b98e-19d2158f0d58 is not part of this chain.
 
*********************************************************************************************************************************************
> qemu-img info --backing /dev/vg_name/lv_name
*********************************************************************************************************************************************


# qemu-img info --backing /dev/6db73566-0f7f-4438-a9ef-6815075f45ea/208ece15-1c71-46f2-a019-6a9fce4309b2

image: /dev/6db73566-0f7f-4438-a9ef-6815075f45ea/208ece15-1c71-46f2-a019-6a9fce4309b2
file format: qcow2
virtual size: 30G (32212254720 bytes)
disk size: 0
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

This is the base volume according to vdsm and qemu, good.
 
# qemu-img info --backing /dev/6db73566-0f7f-4438-a9ef-6815075f45ea/4974a4cc-b388-456f-b98e-19d2158f0d58

image: /dev/6db73566-0f7f-4438-a9ef-6815075f45ea/4974a4cc-b388-456f-b98e-19d2158f0d58
file format: qcow2
virtual size: 30G (32212254720 bytes)
disk size: 0
cluster_size: 65536
backing file: 208ece15-1c71-46f2-a019-6a9fce4309b2 (actual path: /dev/6db73566-0f7f-4438-a9ef-6815075f45ea/208ece15-1c71-46f2-a019-6a9fce4309b2)
backing file format: qcow2
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

image: /dev/6db73566-0f7f-4438-a9ef-6815075f45ea/208ece15-1c71-46f2-a019-6a9fce4309b2
file format: qcow2
virtual size: 30G (32212254720 bytes)
disk size: 0
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

This is the deleted volume according to vdsm metadata. We can see that this volume
still has a backing file pointing to the base volume.
 
# qemu-img info --backing /dev/6db73566-0f7f-4438-a9ef-6815075f45ea/8c66f617-7add-410c-b546-5214b0200832

image: /dev/6db73566-0f7f-4438-a9ef-6815075f45ea/8c66f617-7add-410c-b546-5214b0200832
file format: qcow2
virtual size: 30G (32212254720 bytes)
disk size: 0
cluster_size: 65536
backing file: 208ece15-1c71-46f2-a019-6a9fce4309b2 (actual path: /dev/6db73566-0f7f-4438-a9ef-6815075f45ea/208ece15-1c71-46f2-a019-6a9fce4309b2)
backing file format: qcow2
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

image: /dev/6db73566-0f7f-4438-a9ef-6815075f45ea/208ece15-1c71-46f2-a019-6a9fce4309b2
file format: qcow2
virtual size: 30G (32212254720 bytes)
disk size: 0
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

This is top top volume.

So I think this is what happened:

You had this chain in the past:

    208ece15-1c71-46f2-a019-6a9fce4309b2 <- 4974a4cc-b388-456f-b98e-19d2158f0d5 <- 8c66f617-7add-410c-b546-5214b0200832 (top)

You deleted a snapshot in engine, which created the new chain:

    208ece15-1c71-46f2-a019-6a9fce4309b2 <- 8c66f617-7add-410c-b546-5214b0200832 (top)
                                                                       <- 4974a4cc-b388-456f-b98e-19d2158f0d5  (deleted)

Deleting 4974a4cc-b388-456f-b98e-19d2158f0d5 failed, but we cleared the metadata
of this volume.

To confirm this theory, please share the output of:

Top volume:

dd if=/dev/6db73566-0f7f-4438-a9ef-6815075f45ea/metadata bs=512 count=1 skip=16 iflag=direct

Base volume:

dd if=/dev/6db73566-0f7f-4438-a9ef-6815075f45ea/metadata bs=512 count=1 skip=23 iflag=direct

Deleted volume?:

dd if=/dev/6db73566-0f7f-4438-a9ef-6815075f45ea/metadata bs=512 count=1 skip=15 iflag=direct

Nir