On Wed, Aug 15, 2018 at 10:30 PM Алексей Максимов <aleksey.i.maksimov@yandex.ru> wrote:
Hello Nir

> To confirm this theory, please share the output of:
> Top volume:
> dd if=/dev/6db73566-0f7f-4438-a9ef-6815075f45ea/metadata bs=512 count=1 skip=16 iflag=direct

DOMAIN=6db73566-0f7f-4438-a9ef-6815075f45ea
CTIME=1533083673
FORMAT=COW
DISKTYPE=DATA
LEGALITY=LEGAL
SIZE=62914560
VOLTYPE=LEAF
DESCRIPTION=
IMAGE=cdf1751b-64d3-42bc-b9ef-b0174c7ea068
PUUID=208ece15-1c71-46f2-a019-6a9fce4309b2
MTIME=0
POOL_UUID=
TYPE=SPARSE
GEN=0
EOF
1+0 records in
1+0 records out
512 bytes (512 B) copied, 0.000348555 s, 1.5 MB/s


> Base volume:
> dd if=/dev/6db73566-0f7f-4438-a9ef-6815075f45ea/metadata bs=512 count=1 skip=23 iflag=direct


DOMAIN=6db73566-0f7f-4438-a9ef-6815075f45ea
CTIME=1512474404
FORMAT=COW
DISKTYPE=2
LEGALITY=LEGAL
SIZE=62914560
VOLTYPE=INTERNAL
DESCRIPTION={"DiskAlias":"KOM-APP14_Disk1","DiskDescription":""}
IMAGE=cdf1751b-64d3-42bc-b9ef-b0174c7ea068
PUUID=00000000-0000-0000-0000-000000000000
MTIME=0
POOL_UUID=
TYPE=SPARSE
GEN=0
EOF
1+0 records in
1+0 records out
512 bytes (512 B) copied, 0.00031362 s, 1.6 MB/s


> Deleted volume?:
> dd if=/dev/6db73566-0f7f-4438-a9ef-6815075f45ea/metadata bs=512 count=1 skip=15 iflag=direct

NONE=######################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################
EOF
1+0 records in
1+0 records out
512 bytes (512 B) copied, 0.000350361 s, 1.5 MB/s

This confirms that 6db73566-0f7f-4438-a9ef-6815075f45ea/4974a4cc-b388-456f-b98e-19d2158f0d58
is a deleted volume.
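The NONE=###... content is the filler written when a volume's metadata slot is
cleared. For reference, a minimal sketch (my helper name, not a vdsm tool) for
dumping any volume's slot, taking the slot number from the MD_<n> part of its
LV tags, exactly as the skip values above were chosen:

    # Read one 512-byte metadata slot from the domain's metadata LV.
    # $1 = storage domain UUID, $2 = slot number from the volume's MD_<n> tag.
    dump_slot() {
        dd if="/dev/$1/metadata" bs=512 count=1 skip="$2" iflag=direct 2>/dev/null
    }
    dump_slot 6db73566-0f7f-4438-a9ef-6815075f45ea 15   # the cleared slot above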

To fix this VM, please remove this volume. Run these commands on the SPM host:

    systemctl stop vdsmd
    lvremove 6db73566-0f7f-4438-a9ef-6815075f45ea/4974a4cc-b388-456f-b98e-19d2158f0d58
    systemctl start vdsmd

You should be able to create a snapshot after that.
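If you want to verify the result (optional; the LVs must be active for the
qemu-img check, as in your earlier runs):

    lvs -o vg_name,lv_name,tags | grep cdf1751b-64d3-42bc-b9ef-b0174c7ea068
    qemu-img info --backing /dev/6db73566-0f7f-4438-a9ef-6815075f45ea/8c66f617-7add-410c-b546-5214b0200832

The lvs output should now list only 208ece15-1c71-46f2-a019-6a9fce4309b2 and
8c66f617-7add-410c-b546-5214b0200832, and qemu-img should report the two-volume
chain shown earlier.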
 


15.08.2018, 21:09, "Nir Soffer" <nsoffer@redhat.com>:
> On Wed, Aug 15, 2018 at 6:14 PM Алексей Максимов <aleksey.i.maksimov@yandex.ru> wrote:
>> Hello Nir
>>
>> Thanks for the answer.
>> The output of the commands is below.
>>
>> *********************************************************************************************************************************************
>>> 1. Please share the output of this command on one of the hosts:
>>> lvs -o vg_name,lv_name,tags | grep cdf1751b-64d3-42bc-b9ef-b0174c7ea068
>> *********************************************************************************************************************************************
>> # lvs -o vg_name,lv_name,tags | grep cdf1751b-64d3-42bc-b9ef-b0174c7ea068
>>
>>   VG                                   LV                                   LV Tags
>>   ...
>>   6db73566-0f7f-4438-a9ef-6815075f45ea 208ece15-1c71-46f2-a019-6a9fce4309b2 IU_cdf1751b-64d3-42bc-b9ef-b0174c7ea068,MD_23,PU_00000000-0000-0000-0000-000000000000
>>   6db73566-0f7f-4438-a9ef-6815075f45ea 4974a4cc-b388-456f-b98e-19d2158f0d58 IU_cdf1751b-64d3-42bc-b9ef-b0174c7ea068,MD_15,PU_00000000-0000-0000-0000-000000000000
>>   6db73566-0f7f-4438-a9ef-6815075f45ea 8c66f617-7add-410c-b546-5214b0200832 IU_cdf1751b-64d3-42bc-b9ef-b0174c7ea068,MD_16,PU_208ece15-1c71-46f2-a019-6a9fce4309b2
>
> So we have 3 volumes - 2 are base volumes:
>
> - 208ece15-1c71-46f2-a019-6a9fce4309b2 IU_cdf1751b-64d3-42bc-b9ef-b0174c7ea068,MD_23,PU_00000000-0000-0000-0000-000000000000
> - 4974a4cc-b388-456f-b98e-19d2158f0d58 IU_cdf1751b-64d3-42bc-b9ef-b0174c7ea068,MD_15,PU_00000000-0000-0000-0000-000000000000
>
> And one is a top volume:
> - 8c66f617-7add-410c-b546-5214b0200832 IU_cdf1751b-64d3-42bc-b9ef-b0174c7ea068,MD_16,PU_208ece15-1c71-46f2-a019-6a9fce4309b2
>
> So according to vdsm, this is the chain:
>
>     208ece15-1c71-46f2-a019-6a9fce4309b2 <- 8c66f617-7add-410c-b546-5214b0200832 (top)
>
> The volume 4974a4cc-b388-456f-b98e-19d2158f0d58 is not part of this chain.
>
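(For reference, the tag format: IU_<uuid> is the image the volume belongs to,
PU_<uuid> is the parent volume, where the all-zero UUID means "no parent", and
MD_<n> is the volume's slot in the domain's metadata LV. A minimal sketch that
derives the chain from the PU_ tags:)

    # Print each volume of this image with its parent UUID.
    lvs --noheadings -o lv_name,lv_tags 6db73566-0f7f-4438-a9ef-6815075f45ea |
        grep IU_cdf1751b-64d3-42bc-b9ef-b0174c7ea068 |
        while read lv tags; do
            echo "$lv parent=$(grep -o 'PU_[0-9a-f-]*' <<<"$tags" | cut -c4-)"
        done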
>> *********************************************************************************************************************************************
>>> qemu-img info --backing /dev/vg_name/lv_name
>> *********************************************************************************************************************************************
>>
>> # qemu-img info --backing /dev/6db73566-0f7f-4438-a9ef-6815075f45ea/208ece15-1c71-46f2-a019-6a9fce4309b2
>>
>> image: /dev/6db73566-0f7f-4438-a9ef-6815075f45ea/208ece15-1c71-46f2-a019-6a9fce4309b2
>> file format: qcow2
>> virtual size: 30G (32212254720 bytes)
>> disk size: 0
>> cluster_size: 65536
>> Format specific information:
>>     compat: 1.1
>>     lazy refcounts: false
>>     refcount bits: 16
>>     corrupt: false
>
> This is the base volume according to vdsm and qemu, good.
>
>> # qemu-img info --backing /dev/6db73566-0f7f-4438-a9ef-6815075f45ea/4974a4cc-b388-456f-b98e-19d2158f0d58
>>
>> image: /dev/6db73566-0f7f-4438-a9ef-6815075f45ea/4974a4cc-b388-456f-b98e-19d2158f0d58
>> file format: qcow2
>> virtual size: 30G (32212254720 bytes)
>> disk size: 0
>> cluster_size: 65536
>> backing file: 208ece15-1c71-46f2-a019-6a9fce4309b2 (actual path: /dev/6db73566-0f7f-4438-a9ef-6815075f45ea/208ece15-1c71-46f2-a019-6a9fce4309b2)
>> backing file format: qcow2
>> Format specific information:
>>     compat: 1.1
>>     lazy refcounts: false
>>     refcount bits: 16
>>     corrupt: false
>>
>> image: /dev/6db73566-0f7f-4438-a9ef-6815075f45ea/208ece15-1c71-46f2-a019-6a9fce4309b2
>> file format: qcow2
>> virtual size: 30G (32212254720 bytes)
>> disk size: 0
>> cluster_size: 65536
>> Format specific information:
>>     compat: 1.1
>>     lazy refcounts: false
>>     refcount bits: 16
>>     corrupt: false
>
> This is the deleted volume according to vdsm metadata. We can see that this volume
> still has a backing file pointing to the base volume (the backing file is recorded
> in the volume's own qcow2 header, which deletion did not touch).
>
>> # qemu-img info --backing /dev/6db73566-0f7f-4438-a9ef-6815075f45ea/8c66f617-7add-410c-b546-5214b0200832
>>
>> image: /dev/6db73566-0f7f-4438-a9ef-6815075f45ea/8c66f617-7add-410c-b546-5214b0200832
>> file format: qcow2
>> virtual size: 30G (32212254720 bytes)
>> disk size: 0
>> cluster_size: 65536
>> backing file: 208ece15-1c71-46f2-a019-6a9fce4309b2 (actual path: /dev/6db73566-0f7f-4438-a9ef-6815075f45ea/208ece15-1c71-46f2-a019-6a9fce4309b2)
>> backing file format: qcow2
>> Format specific information:
>>     compat: 1.1
>>     lazy refcounts: false
>>     refcount bits: 16
>>     corrupt: false
>>
>> image: /dev/6db73566-0f7f-4438-a9ef-6815075f45ea/208ece15-1c71-46f2-a019-6a9fce4309b2
>> file format: qcow2
>> virtual size: 30G (32212254720 bytes)
>> disk size: 0
>> cluster_size: 65536
>> Format specific information:
>>     compat: 1.1
>>     lazy refcounts: false
>>     refcount bits: 16
>>     corrupt: false
>
> This is the top volume.
>
> So I think this is what happened:
>
> You had this chain in the past:
>
>     208ece15-1c71-46f2-a019-6a9fce4309b2 <- 4974a4cc-b388-456f-b98e-19d2158f0d58 <- 8c66f617-7add-410c-b546-5214b0200832 (top)
>
> You deleted a snapshot in the engine, which created the new chain:
>
>     208ece15-1c71-46f2-a019-6a9fce4309b2 <- 8c66f617-7add-410c-b546-5214b0200832 (top)
>     208ece15-1c71-46f2-a019-6a9fce4309b2 <- 4974a4cc-b388-456f-b98e-19d2158f0d58 (deleted)
>
> Deleting 4974a4cc-b388-456f-b98e-19d2158f0d58 failed, but we cleared the metadata
> of this volume.
>
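(Note: the skip values in the commands below are just the MD_<n> slot numbers
from the lvs output above, one 512-byte block per slot:

    dd if=/dev/<sd_uuid>/metadata bs=512 count=1 skip=<MD slot> iflag=direct

so MD_16 is the top volume, MD_23 the base, and MD_15 the deleted one.)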
> To confirm this theory, please share the output of:
>
> Top volume:
>
> dd if=/dev/6db73566-0f7f-4438-a9ef-6815075f45ea/metadata bs=512 count=1 skip=16 iflag=direct
>
> Base volume:
>
> dd if=/dev/6db73566-0f7f-4438-a9ef-6815075f45ea/metadata bs=512 count=1 skip=23 iflag=direct
>
> Deleted volume?:
>
> dd if=/dev/6db73566-0f7f-4438-a9ef-6815075f45ea/metadata bs=512 count=1 skip=15 iflag=direct
>
> Nir