
Hi,

I'm running oVirt 4.3.9 with FC-based storage. I'm running several VMs with 3 disks on 3 different SDs. Lately we deleted a VM snapshot; that task failed after a while, and since then the snapshot is inconsistent:

disk1: snapshot still visible in the DB and on storage using LVM commands
disk2: snapshot still visible in the DB but not on storage anymore (it seems the merge ran correctly)
disk3: snapshot still visible in the DB but not on storage anymore (it seems the merge ran correctly)

When I try to delete the snapshot again it runs forever and nothing happens.

Is there a way to suppress that snapshot? Is it possible to merge disk1 with its snapshot using LVM commands and then clean up the Engine DB?

Thanks for any hint or help.
rgds, arsene

--
Arsène Gschwind <arsene.gschwind@unibas.ch>
Universitaet Basel

On Tue, Jul 14, 2020 at 5:37 PM Arsène Gschwind <arsene.gschwind@unibas.ch> wrote:
> When I try to delete the snapshot again it runs forever and nothing happens.
Did you try also when the vm is not running?

In general the system is designed so that trying again a failed merge will
complete the merge. If the merge does not complete, there may be some bug
that the system cannot handle.
> Is there a way to suppress that snapshot?
> Is it possible to merge disk1 with its snapshot using LVM commands and then
> clean up the Engine DB?
Yes, but it is complicated. You need to understand the qcow2 chain on
storage, complete the merge manually using qemu-img commit, update the
metadata manually (even harder), then update the engine db.

The best way - if the system cannot recover - is to fix the bad metadata
that causes the system to fail, and then let the system recover itself.

Which storage domain format are you using? V5? V4?
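For reference only, the manual path sketched here would look roughly like the
following. The volume names are placeholders and this is just an outline of
the idea - the vdsm metadata and the engine db would still have to be fixed
by hand afterwards, so it is not a procedure to run as-is:

# activate the top volume and its parent (placeholder names)
lvchange -ay vg-name/top-lv-name vg-name/base-lv-name

# copy the data of the top volume into its backing file
qemu-img commit /dev/vg-name/top-lv-name

# deactivate again when done
lvchange -an vg-name/top-lv-name vg-name/base-lv-name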

On Tue, 2020-07-14 at 19:10 +0300, Nir Soffer wrote:
> Did you try also when the vm is not running?

Yes, I've tried that without success.

> Which storage domain format are you using? V5? V4?

I'm using storage format V5 on FC.
Thanks.

--
Arsène Gschwind
Fa. Sapify AG im Auftrag der Universitaet Basel
IT Services
Klinelbergstr. 70 | CH-4056 Basel | Switzerland
Tel: +41 79 449 25 63 | http://its.unibas.ch
ITS-ServiceDesk: support-its@unibas.ch | +41 61 267 14 11

On Tue, 2020-07-14 at 16:50 +0000, Arsène Gschwind wrote:
> On Tue, 2020-07-14 at 19:10 +0300, Nir Soffer wrote:
> > The best way - if the system cannot recover - is to fix the bad metadata
> > that causes the system to fail, and then let the system recover itself.

Do you have some hint how to fix the metadata?
Thanks a lot.

--
Arsène Gschwind
Universitaet Basel

On Tue, Jul 14, 2020 at 7:51 PM Arsène Gschwind <arsene.gschwind@unibas.ch> wrote:
>> Which storage domain format are you using? V5? V4?
>
> I'm using storage format V5 on FC.
Fixing the metadata is not easy.

First you have to find the volumes related to this disk. You can find the
disk uuid and storage domain uuid in the engine UI, and then you can find
the volumes like this:

lvs -o vg_name,lv_name,tags | grep disk-uuid

For every lv you will have a tag MD_N, where N is a number. This is the
slot number in the metadata volume.

You need to calculate the offset of the metadata area for every volume using:

offset = 1024*1024 + 8192 * N

Then you can copy the metadata block using:

dd if=/dev/vg-name/metadata bs=512 count=1 skip=$offset conv=skip_bytes > lv-name.meta

Please share these files.

This part is not needed in 4.4; we have a new StorageDomain dump API that
can find the same info in one command:

vdsm-client StorageDomain dump sd_id=storage-domain-uuid | \
    jq '.volumes | .[] | select(.image=="disk-uuid")'

The second step is to see what is the actual qcow2 chain. Find the volume
which is the LEAF by grepping the metadata files. In some cases you may
have more than one LEAF (which may be the problem).

Then activate all volumes using:

lvchange -ay vg-name/lv-name

Now you can get the backing chain using qemu-img and the LEAF volume:

qemu-img info --backing-chain /dev/vg-name/lv-name

If you have more than one LEAF, run this on all LEAFs. Only one of them
will be correct.

Please also share the output of qemu-img.

Once we are finished with the volumes, deactivate them:

lvchange -an vg-name/lv-name

Based on the output, we can tell what the real chain is, what the chain
seen by vdsm metadata is, and what the required fix is.

Nir
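For illustration, the two steps of finding the MD_N slots and dumping the
metadata blocks could be combined roughly as below. This is an untested
sketch: VG and DISK_UUID are placeholders for the storage domain VG name and
the disk's image uuid, and skip= is given in 512-byte blocks so the offset
formula above does not depend on byte-based skipping support in dd.

VG=vg-name
DISK_UUID=disk-uuid

lvs --noheadings -o lv_name,tags "$VG" | grep "$DISK_UUID" | while read -r lv tags; do
    # the MD_N tag holds the metadata slot number for this volume
    slot=$(echo "$tags" | tr ',' '\n' | sed -n 's/^MD_//p')
    # offset = 1024*1024 + 8192 * N, converted to 512-byte blocks for dd
    offset=$((1024*1024 + 8192 * slot))
    dd if="/dev/$VG/metadata" bs=512 count=1 skip=$((offset / 512)) > "$lv.meta"
done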

On Tue, Jul 14, 2020 at 10:50 PM Nir Soffer <nsoffer@redhat.com> wrote:
> Fixing the metadata is not easy.
>
> First you have to find the volumes related to this disk. You can find the
> disk uuid and storage domain uuid in the engine UI, and then you can find
> the volumes like this:
>
> lvs -o vg_name,lv_name,tags | grep disk-uuid
Only to add that the RHV logical volumes are possibly filtered out at the
lvm.conf level, so in that case it could be necessary to bypass the filter
to see the information, like this:

lvs --config 'devices { filter = [ "a|.*|" ] }' -o vg_name,lv_name,tags | grep disk-uuid

Gianluca
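Presumably the same --config override is also needed for the later lvchange
activation/deactivation steps on such a host, since the filter hides the
volumes from every LVM command; for example (vg-name/lv-name being
placeholders):

lvchange --config 'devices { filter = [ "a|.*|" ] }' -ay vg-name/lv-name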

Hi Nir,

I've followed your guide, please find attached the information.
Thanks a lot for your help.

Arsene

--
Arsène Gschwind <arsene.gschwind@unibas.ch>
Universitaet Basel

On Wed, Jul 15, 2020 at 3:12 PM Arsène Gschwind <arsene.gschwind@unibas.ch> wrote:
> Hi Nir,
> I've followed your guide, please find attached the information.
> Thanks a lot for your help.

Thanks, looking at the data.

A quick look in the pdf shows that one qemu-img info command failed:

---
lvchange -ay 33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b

lvs 33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b
  LV                                   VG                                   Attr       LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  6197b30d-0732-4cc7-aef0-12f9f6e9565b 33777993-a3a5-4aad-a24c-dfe5e473faca -wi-a----- 5.00g

qemu-img info --backing-chain /dev/33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b
qemu-img: Could not open '/dev/33777993-a3a5-4aad-a24c-dfe5e473faca/8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8': Could not open '/dev/33777993-a3a5-4aad-a24c-dfe5e473faca/8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8': No such file or directory
---

Maybe this lv was deactivated by vdsm after you activated it? Please try to
activate it again and run the command again.

Sending all the info in text format in the mail message would make it easier
to respond.
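A quick way to confirm whether something gets deactivated underneath
qemu-img is to list the activation state of the volumes right before and
after the command, for example (standard LVM fields, using the VG from the
output above):

lvs -o lv_name,lv_attr,lv_active 33777993-a3a5-4aad-a24c-dfe5e473faca

An "a" in the fifth character of lv_attr (or "active" in lv_active) means
the volume is activated.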

On Wed, 2020-07-15 at 15:42 +0300, Nir Soffer wrote:
> Maybe this lv was deactivated by vdsm after you activated it? Please try to
> activate it again and run the command again.

I did it again with the same result, and the LV was still activated.

lvchange -ay 33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b

lvs 33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b
  LV                                   VG                                   Attr       LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  6197b30d-0732-4cc7-aef0-12f9f6e9565b 33777993-a3a5-4aad-a24c-dfe5e473faca -wi-a----- 5.00g

qemu-img info --backing-chain /dev/33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b
qemu-img: Could not open '/dev/33777993-a3a5-4aad-a24c-dfe5e473faca/8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8': Could not open '/dev/33777993-a3a5-4aad-a24c-dfe5e473faca/8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8': No such file or directory

lvs 33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b
  LV                                   VG                                   Attr       LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  6197b30d-0732-4cc7-aef0-12f9f6e9565b 33777993-a3a5-4aad-a24c-dfe5e473faca -wi-a----- 5.00g

Sorry for the PDF, it was easier for me, but I will post everything in the
mail from now on.

Arsene

--
Arsène Gschwind
Universitaet Basel

On Wed, Jul 15, 2020 at 4:00 PM Arsène Gschwind <arsene.gschwind@unibas.ch> wrote:
> qemu-img info --backing-chain /dev/33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b
> qemu-img: Could not open '/dev/33777993-a3a5-4aad-a24c-dfe5e473faca/8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8':
It is clear now - qemu could not open the backing file:

lv=8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8

You must activate all the volumes in this image. I think my instructions
were not clear enough:

1. Find all lvs related to this image.

2. Activate all of them:

for lv_name in lv-name-1 lv-name-2 lv-name-3; do
    lvchange -ay vg-name/$lv_name
done

3. Run qemu-img info on the LEAF volume.

4. Deactivate the lvs activated in step 2.
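If typing the lv names by hand is error-prone, the lvs | grep listing from
earlier in the thread can drive the same loop. A rough, untested sketch,
where the VG is the storage domain seen in the output above and disk-uuid is
a placeholder for the disk's image uuid from the engine UI:

VG=33777993-a3a5-4aad-a24c-dfe5e473faca
DISK_UUID=disk-uuid

# activate every lv belonging to this image
lvs --noheadings -o lv_name,tags "$VG" | grep "$DISK_UUID" | while read -r lv tags; do
    lvchange -ay "$VG/$lv"
done

# ... run qemu-img info --backing-chain on the LEAF volume here ...

# and deactivate them again afterwards
lvs --noheadings -o lv_name,tags "$VG" | grep "$DISK_UUID" | while read -r lv tags; do
    lvchange -an "$VG/$lv"
done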

On Wed, 2020-07-15 at 16:28 +0300, Nir Soffer wrote:
> You must activate all the volumes in this image. I think my instructions
> were not clear enough.

Ouups, sorry. Now it should be correct:

qemu-img info --backing-chain /dev/33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b
image: /dev/33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b
file format: qcow2
virtual size: 150G (161061273600 bytes)
disk size: 0
cluster_size: 65536
backing file: 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 (actual path: /dev/33777993-a3a5-4aad-a24c-dfe5e473faca/8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8)
backing file format: qcow2
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

image: /dev/33777993-a3a5-4aad-a24c-dfe5e473faca/8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8
file format: qcow2
virtual size: 150G (161061273600 bytes)
disk size: 0
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

--
Arsène Gschwind
Universitaet Basel

What we see in the data you sent:

Qemu chain:

$ qemu-img info --backing-chain /dev/33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b
image: /dev/33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b
file format: qcow2
virtual size: 150G (161061273600 bytes)
disk size: 0
cluster_size: 65536
backing file: 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 (actual path: /dev/33777993-a3a5-4aad-a24c-dfe5e473faca/8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8)
backing file format: qcow2
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

image: /dev/33777993-a3a5-4aad-a24c-dfe5e473faca/8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8
file format: qcow2
virtual size: 150G (161061273600 bytes)
disk size: 0
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

Vdsm chain:

$ cat 6197b30d-0732-4cc7-aef0-12f9f6e9565b.meta
CAP=161061273600
CTIME=1594060718
DESCRIPTION=
DISKTYPE=DATA
DOMAIN=33777993-a3a5-4aad-a24c-dfe5e473faca
FORMAT=COW
GEN=0
IMAGE=d7bd480d-2c51-4141-a386-113abf75219e
LEGALITY=ILLEGAL
^^^^^^
This is the issue, the top volume is illegal.
PUUID=8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8
TYPE=SPARSE
VOLTYPE=LEAF

$ cat 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8.meta
CAP=161061273600
CTIME=1587646763
DESCRIPTION={"DiskAlias":"cpslpd01_Disk1","DiskDescription":"SAP SLCM H11 HDB D13"}
DISKTYPE=DATA
DOMAIN=33777993-a3a5-4aad-a24c-dfe5e473faca
FORMAT=COW
GEN=0
IMAGE=d7bd480d-2c51-4141-a386-113abf75219e
LEGALITY=LEGAL
PUUID=00000000-0000-0000-0000-000000000000
TYPE=SPARSE
VOLTYPE=INTERNAL

We set a volume to ILLEGAL when we merge the top volume into the parent
volume and both volumes contain the same data. After we mark the volume as
ILLEGAL, we pivot to the parent volume (8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8).

If the pivot was successful, the parent volume may have new data, and
starting the vm using the top volume may corrupt the vm filesystem. The
ILLEGAL state prevents this.

If the pivot was not successful, the vm must be started using the top
volume, but that will always fail while the volume is ILLEGAL.

If the volume is ILLEGAL, trying to merge again when the VM is not running
will always fail, since vdsm does not know if the pivot succeeded or not,
and cannot merge the volume in a safe way.

Do you have the vdsm logs from all merge attempts on this disk? The most
important log is the one showing the original merge. If the merge succeeded,
we should see a log showing the new libvirt chain, which should contain only
the parent volume.

Nir
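To locate those lines, something along these lines on the host that ran the
merge should narrow the vdsm log down. Here "job-uuid" stands for the jobUUID
reported when the merge was started, and /var/log/vdsm/vdsm.log* is the
usual log location - adjust if your hosts log elsewhere:

grep -E 'Starting merge|pivot|job-uuid' /var/log/vdsm/vdsm.log*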

On Wed, 2020-07-15 at 17:46 +0300, Nir Soffer wrote:
> Do you have the vdsm logs from all merge attempts on this disk?

This is an extract of the vdsm logs, I may provide the complete log if it
would help.
2020-07-13 11:18:30,257+0200 INFO (jsonrpc/5) [api.virt] START merge(drive={u'imageID': u'6c1445b3-33ac-4ec4-8e43-483d4a6da4e3', u'volumeID': u'6172a270-5f73-464d-bebd-8bf0658c1de0', u'domainID': u'a6f2625d-0f21-4d81-b98c-f545d5f86f8e', u'poolID': u'00000002-0002-0002-0002-000000000289'}, ba seVolUUID=u'a9d5fe18-f1bd-462e-95f7-42a50e81eb11', topVolUUID=u'6172a270-5f73-464d-bebd-8bf0658c1de0', bandwidth=u'0', jobUUID=u'5059c2ce-e2a0-482d-be93-2b79e8536667') from=::ffff:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, vmId=b5534254-660f-44b1-bc83-d616c98ba0ba (api:4 8) 2020-07-13 11:18:30,271+0200 INFO (jsonrpc/5) [vdsm.api] START getVolumeInfo(sdUUID='a6f2625d-0f21-4d81-b98c-f545d5f86f8e', spUUID='00000002-0002-0002-0002-000000000289', imgUUID='6c1445b3-33ac-4ec4-8e43-483d4a6da4e3', volUUID=u'a9d5fe18-f1bd-462e-95f7-42a50e81eb11', options=None) from=::fff f:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, task_id=877c30b3-660c-4bfa-a215-75df8d03657e (api:48) 2020-07-13 11:18:30,281+0200 INFO (jsonrpc/6) [api.virt] START merge(drive={u'imageID': u'b8e8b8b6-edd1-4d40-b80b-259268ff4878', u'volumeID': u'28ed1acb-9697-43bd-980b-fe4317a06f24', u'domainID': u'6b82f31b-fa2a-406b-832d-64d9666e1bcc', u'poolID': u'00000002-0002-0002-0002-000000000289'}, ba seVolUUID=u'29f99f8d-d8a6-475a-928c-e2ffdba76d80', topVolUUID=u'28ed1acb-9697-43bd-980b-fe4317a06f24', bandwidth=u'0', jobUUID=u'241dfab0-2ef2-45a6-a22f-c7122e9fc193') from=::ffff:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, vmId=b5534254-660f-44b1-bc83-d616c98ba0ba (api:4 8) 2020-07-13 11:18:30,282+0200 INFO (jsonrpc/7) [api.virt] START merge(drive={u'imageID': u'd7bd480d-2c51-4141-a386-113abf75219e', u'volumeID': u'6197b30d-0732-4cc7-aef0-12f9f6e9565b', u'domainID': u'33777993-a3a5-4aad-a24c-dfe5e473faca', u'poolID': u'00000002-0002-0002-0002-000000000289'}, ba seVolUUID=u'8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8', topVolUUID=u'6197b30d-0732-4cc7-aef0-12f9f6e9565b', bandwidth=u'0', jobUUID=u'720410c3-f1a0-4b25-bf26-cf40aa6b1f97') from=::ffff:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, vmId=b5534254-660f-44b1-bc83-d616c98ba0ba (api:4 8) 2020-07-13 11:18:30,299+0200 INFO (jsonrpc/6) [vdsm.api] START getVolumeInfo(sdUUID='6b82f31b-fa2a-406b-832d-64d9666e1bcc', spUUID='00000002-0002-0002-0002-000000000289', imgUUID='b8e8b8b6-edd1-4d40-b80b-259268ff4878', volUUID=u'29f99f8d-d8a6-475a-928c-e2ffdba76d80', options=None) from=::fff f:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, task_id=3163b275-71d5-4976-9046-c0b558a8437f (api:48) 2020-07-13 11:18:30,312+0200 INFO (jsonrpc/7) [vdsm.api] START getVolumeInfo(sdUUID='33777993-a3a5-4aad-a24c-dfe5e473faca', spUUID='00000002-0002-0002-0002-000000000289', imgUUID='d7bd480d-2c51-4141-a386-113abf75219e', volUUID=u'8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8', options=None) from=::fff f:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, task_id=e4836371-e73a-4373-bd73-7754ecf1f3d1 (api:48) 2020-07-13 11:18:30,509+0200 INFO (jsonrpc/6) [storage.VolumeManifest] Info request: sdUUID=6b82f31b-fa2a-406b-832d-64d9666e1bcc imgUUID=b8e8b8b6-edd1-4d40-b80b-259268ff4878 volUUID = 29f99f8d-d8a6-475a-928c-e2ffdba76d80 (volume:240) 2020-07-13 11:18:30,522+0200 INFO (jsonrpc/5) [storage.VolumeManifest] Info request: sdUUID=a6f2625d-0f21-4d81-b98c-f545d5f86f8e imgUUID=6c1445b3-33ac-4ec4-8e43-483d4a6da4e3 volUUID = a9d5fe18-f1bd-462e-95f7-42a50e81eb11 (volume:240) 2020-07-13 11:18:30,545+0200 INFO (jsonrpc/7) [storage.VolumeManifest] 
Info request: sdUUID=33777993-a3a5-4aad-a24c-dfe5e473faca imgUUID=d7bd480d-2c51-4141-a386-113abf75219e volUUID = 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 (volume:240) 2020-07-13 11:18:30,569+0200 INFO (jsonrpc/5) [storage.VolumeManifest] a6f2625d-0f21-4d81-b98c-f545d5f86f8e/6c1445b3-33ac-4ec4-8e43-483d4a6da4e3/a9d5fe18-f1bd-462e-95f7-42a50e81eb11 info is {'status': 'OK', 'domain': 'a6f2625d-0f21-4d81-b98c-f545d5f86f8e', 'voltype': 'INTERNAL', 'description ': '{"DiskAlias":"cpslpd01_HANADB_Disk1","DiskDescription":"SAP SLCM H11 HDB D13 data"}', 'parent': '00000000-0000-0000-0000-000000000000', 'format': 'RAW', 'generation': 0, 'image': '6c1445b3-33ac-4ec4-8e43-483d4a6da4e3', 'disktype': 'DATA', 'legality': 'LEGAL', 'mtime': '0', 'apparentsize': '354334801920', 'children': [], 'pool': '', 'ctime': '1587654444', 'capacity': '354334801920', 'uuid': u'a9d5fe18-f1bd-462e-95f7-42a50e81eb11', 'truesize': '354334801920', 'type': 'PREALLOCATED', 'lease': {'path': '/dev/a6f2625d-0f21-4d81-b98c-f545d5f86f8e/leases', 'owners': [], 'version': N one, 'offset': 121634816}} (volume:279) 2020-07-13 11:18:30,569+0200 INFO (jsonrpc/5) [vdsm.api] FINISH getVolumeInfo return={'info': {'status': 'OK', 'domain': 'a6f2625d-0f21-4d81-b98c-f545d5f86f8e', 'voltype': 'INTERNAL', 'description': '{"DiskAlias":"cpslpd01_HANADB_Disk1","DiskDescription":"SAP SLCM H11 HDB D13 data"}', 'paren t': '00000000-0000-0000-0000-000000000000', 'format': 'RAW', 'generation': 0, 'image': '6c1445b3-33ac-4ec4-8e43-483d4a6da4e3', 'disktype': 'DATA', 'legality': 'LEGAL', 'mtime': '0', 'apparentsize': '354334801920', 'children': [], 'pool': '', 'ctime': '1587654444', 'capacity': '354334801920', 'uuid': u'a9d5fe18-f1bd-462e-95f7-42a50e81eb11', 'truesize': '354334801920', 'type': 'PREALLOCATED', 'lease': {'path': '/dev/a6f2625d-0f21-4d81-b98c-f545d5f86f8e/leases', 'owners': [], 'version': None, 'offset': 121634816}}} from=::ffff:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630 febc227, task_id=877c30b3-660c-4bfa-a215-75df8d03657e (api:54) 2020-07-13 11:18:30,571+0200 INFO (jsonrpc/5) [vdsm.api] START getVolumeInfo(sdUUID='a6f2625d-0f21-4d81-b98c-f545d5f86f8e', spUUID='00000002-0002-0002-0002-000000000289', imgUUID='6c1445b3-33ac-4ec4-8e43-483d4a6da4e3', volUUID=u'6172a270-5f73-464d-bebd-8bf0658c1de0', options=None) from=::fff f:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, task_id=548e96d6-e103-4344-991d-5e4f0f4cd703 (api:48) 2020-07-13 11:18:30,571+0200 INFO (jsonrpc/5) [storage.VolumeManifest] Info request: sdUUID=a6f2625d-0f21-4d81-b98c-f545d5f86f8e imgUUID=6c1445b3-33ac-4ec4-8e43-483d4a6da4e3 volUUID = 6172a270-5f73-464d-bebd-8bf0658c1de0 (volume:240) 2020-07-13 11:18:30,585+0200 INFO (jsonrpc/6) [storage.VolumeManifest] 6b82f31b-fa2a-406b-832d-64d9666e1bcc/b8e8b8b6-edd1-4d40-b80b-259268ff4878/29f99f8d-d8a6-475a-928c-e2ffdba76d80 info is {'status': 'OK', 'domain': '6b82f31b-fa2a-406b-832d-64d9666e1bcc', 'voltype': 'INTERNAL', 'description ': '{"DiskAlias":"cpslpd01_HANALogs_Disk1","DiskDescription":"SAP SLCM H11 HDB D13 logs"}', 'parent': '00000000-0000-0000-0000-000000000000', 'format': 'RAW', 'generation': 0, 'image': 'b8e8b8b6-edd1-4d40-b80b-259268ff4878', 'disktype': 'DATA', 'legality': 'LEGAL', 'mtime': '0', 'apparentsize ': '139586437120', 'children': [], 'pool': '', 'ctime': '1587654445', 'capacity': '139586437120', 'uuid': u'29f99f8d-d8a6-475a-928c-e2ffdba76d80', 'truesize': '139586437120', 'type': 'PREALLOCATED', 'lease': {'path': '/dev/6b82f31b-fa2a-406b-832d-64d9666e1bcc/leases', 'owners': [], 'version': 
None, 'offset': 121634816}} (volume:279) 2020-07-13 11:18:30,585+0200 INFO (jsonrpc/6) [vdsm.api] FINISH getVolumeInfo return={'info': {'status': 'OK', 'domain': '6b82f31b-fa2a-406b-832d-64d9666e1bcc', 'voltype': 'INTERNAL', 'description': '{"DiskAlias":"cpslpd01_HANALogs_Disk1","DiskDescription":"SAP SLCM H11 HDB D13 logs"}', 'par ent': '00000000-0000-0000-0000-000000000000', 'format': 'RAW', 'generation': 0, 'image': 'b8e8b8b6-edd1-4d40-b80b-259268ff4878', 'disktype': 'DATA', 'legality': 'LEGAL', 'mtime': '0', 'apparentsize': '139586437120', 'children': [], 'pool': '', 'ctime': '1587654445', 'capacity': '139586437120' , 'uuid': u'29f99f8d-d8a6-475a-928c-e2ffdba76d80', 'truesize': '139586437120', 'type': 'PREALLOCATED', 'lease': {'path': '/dev/6b82f31b-fa2a-406b-832d-64d9666e1bcc/leases', 'owners': [], 'version': None, 'offset': 121634816}}} from=::ffff:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-886 30febc227, task_id=3163b275-71d5-4976-9046-c0b558a8437f (api:54) 2020-07-13 11:18:30,586+0200 INFO (jsonrpc/6) [vdsm.api] START getVolumeInfo(sdUUID='6b82f31b-fa2a-406b-832d-64d9666e1bcc', spUUID='00000002-0002-0002-0002-000000000289', imgUUID='b8e8b8b6-edd1-4d40-b80b-259268ff4878', volUUID=u'28ed1acb-9697-43bd-980b-fe4317a06f24', options=None) from=::fff f:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, task_id=3d7790d3-7b6d-49a0-8867-594b6c859894 (api:48) 2020-07-13 11:18:30,587+0200 INFO (jsonrpc/6) [storage.VolumeManifest] Info request: sdUUID=6b82f31b-fa2a-406b-832d-64d9666e1bcc imgUUID=b8e8b8b6-edd1-4d40-b80b-259268ff4878 volUUID = 28ed1acb-9697-43bd-980b-fe4317a06f24 (volume:240) 2020-07-13 11:18:30,600+0200 INFO (jsonrpc/7) [storage.VolumeManifest] 33777993-a3a5-4aad-a24c-dfe5e473faca/d7bd480d-2c51-4141-a386-113abf75219e/8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 info is {'status': 'OK', 'domain': '33777993-a3a5-4aad-a24c-dfe5e473faca', 'voltype': 'INTERNAL', 'description ': '{"DiskAlias":"cpslpd01_Disk1","DiskDescription":"SAP SLCM H11 HDB D13"}', 'parent': '00000000-0000-0000-0000-000000000000', 'format': 'COW', 'generation': 0, 'image': 'd7bd480d-2c51-4141-a386-113abf75219e', 'disktype': 'DATA', 'legality': 'LEGAL', 'mtime': '0', 'apparentsize': '4429185024 0', 'children': [], 'pool': '', 'ctime': '1587646763', 'capacity': '161061273600', 'uuid': u'8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8', 'truesize': '44291850240', 'type': 'SPARSE', 'lease': {'path': '/dev/33777993-a3a5-4aad-a24c-dfe5e473faca/leases', 'owners': [], 'version': None, 'offset': 13421 7728}} (volume:279) 2020-07-13 11:18:30,600+0200 INFO (jsonrpc/7) [vdsm.api] FINISH getVolumeInfo return={'info': {'status': 'OK', 'domain': '33777993-a3a5-4aad-a24c-dfe5e473faca', 'voltype': 'INTERNAL', 'description': '{"DiskAlias":"cpslpd01_Disk1","DiskDescription":"SAP SLCM H11 HDB D13"}', 'parent': '0000000 0-0000-0000-0000-000000000000', 'format': 'COW', 'generation': 0, 'image': 'd7bd480d-2c51-4141-a386-113abf75219e', 'disktype': 'DATA', 'legality': 'LEGAL', 'mtime': '0', 'apparentsize': '44291850240', 'children': [], 'pool': '', 'ctime': '1587646763', 'capacity': '161061273600', 'uuid': u'8e4 12b5a-85ec-4c53-a5b8-dfb4d6d987b8', 'truesize': '44291850240', 'type': 'SPARSE', 'lease': {'path': '/dev/33777993-a3a5-4aad-a24c-dfe5e473faca/leases', 'owners': [], 'version': None, 'offset': 134217728}}} from=::ffff:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, task_id=e48 36371-e73a-4373-bd73-7754ecf1f3d1 (api:54) 2020-07-13 11:18:30,601+0200 INFO (jsonrpc/7) [vdsm.api] START 
getVolumeInfo(sdUUID='33777993-a3a5-4aad-a24c-dfe5e473faca', spUUID='00000002-0002-0002-0002-000000000289', imgUUID='d7bd480d-2c51-4141-a386-113abf75219e', volUUID=u'6197b30d-0732-4cc7-aef0-12f9f6e9565b', options=None) from=::fff f:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, task_id=4fc04383-42da-4240-9783-395c1b610754 (api:48) 2020-07-13 11:18:30,602+0200 INFO (jsonrpc/7) [storage.VolumeManifest] Info request: sdUUID=33777993-a3a5-4aad-a24c-dfe5e473faca imgUUID=d7bd480d-2c51-4141-a386-113abf75219e volUUID = 6197b30d-0732-4cc7-aef0-12f9f6e9565b (volume:240) 2020-07-13 11:18:30,615+0200 INFO (jsonrpc/5) [storage.VolumeManifest] a6f2625d-0f21-4d81-b98c-f545d5f86f8e/6c1445b3-33ac-4ec4-8e43-483d4a6da4e3/6172a270-5f73-464d-bebd-8bf0658c1de0 info is {'status': 'OK', 'domain': 'a6f2625d-0f21-4d81-b98c-f545d5f86f8e', 'voltype': 'LEAF', 'description': ' ', 'parent': 'a9d5fe18-f1bd-462e-95f7-42a50e81eb11', 'format': 'COW', 'generation': 0, 'image': '6c1445b3-33ac-4ec4-8e43-483d4a6da4e3', 'disktype': 'DATA', 'legality': 'LEGAL', 'mtime': '0', 'apparentsize': '53687091200', 'children': [], 'pool': '', 'ctime': '1594060717', 'capacity': '3543348 01920', 'uuid': u'6172a270-5f73-464d-bebd-8bf0658c1de0', 'truesize': '53687091200', 'type': 'SPARSE', 'lease': {'path': '/dev/a6f2625d-0f21-4d81-b98c-f545d5f86f8e/leases', 'owners': [], 'version': None, 'offset': 125829120}} (volume:279) 2020-07-13 11:18:30,616+0200 INFO (jsonrpc/5) [vdsm.api] FINISH getVolumeInfo return={'info': {'status': 'OK', 'domain': 'a6f2625d-0f21-4d81-b98c-f545d5f86f8e', 'voltype': 'LEAF', 'description': '', 'parent': 'a9d5fe18-f1bd-462e-95f7-42a50e81eb11', 'format': 'COW', 'generation': 0, 'image': '6c1445b3-33ac-4ec4-8e43-483d4a6da4e3', 'disktype': 'DATA', 'legality': 'LEGAL', 'mtime': '0', 'apparentsize': '53687091200', 'children': [], 'pool': '', 'ctime': '1594060717', 'capacity': '354334801920', 'uuid': u'6172a270-5f73-464d-bebd-8bf0658c1de0', 'truesize': '53687091200', 'type': 'SPA RSE', 'lease': {'path': '/dev/a6f2625d-0f21-4d81-b98c-f545d5f86f8e/leases', 'owners': [], 'version': None, 'offset': 125829120}}} from=::ffff:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, task_id=548e96d6-e103-4344-991d-5e4f0f4cd703 (api:54) 2020-07-13 11:18:30,630+0200 INFO (jsonrpc/5) [virt.vm] (vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Starting merge with jobUUID=u'5059c2ce-e2a0-482d-be93-2b79e8536667', original chain=a9d5fe18-f1bd-462e-95f7-42a50e81eb11 < 6172a270-5f73-464d-bebd-8bf0658c1de0 (top), disk='sdb', base='sdb[1 ]', top=None, bandwidth=0, flags=12 (vm:5945) 2020-07-13 11:18:30,640+0200 INFO (jsonrpc/6) [storage.VolumeManifest] 6b82f31b-fa2a-406b-832d-64d9666e1bcc/b8e8b8b6-edd1-4d40-b80b-259268ff4878/28ed1acb-9697-43bd-980b-fe4317a06f24 info is {'status': 'OK', 'domain': '6b82f31b-fa2a-406b-832d-64d9666e1bcc', 'voltype': 'LEAF', 'description': ' ', 'parent': '29f99f8d-d8a6-475a-928c-e2ffdba76d80', 'format': 'COW', 'generation': 0, 'image': 'b8e8b8b6-edd1-4d40-b80b-259268ff4878', 'disktype': 'DATA', 'legality': 'LEGAL', 'mtime': '0', 'apparentsize': '3221225472', 'children': [], 'pool': '', 'ctime': '1594060717', 'capacity': '13958643 7120', 'uuid': u'28ed1acb-9697-43bd-980b-fe4317a06f24', 'truesize': '3221225472', 'type': 'SPARSE', 'lease': {'path': '/dev/6b82f31b-fa2a-406b-832d-64d9666e1bcc/leases', 'owners': [], 'version': None, 'offset': 127926272}} (volume:279) 2020-07-13 11:18:30,640+0200 INFO (jsonrpc/6) [vdsm.api] FINISH getVolumeInfo return={'info': {'status': 'OK', 'domain': 
'6b82f31b-fa2a-406b-832d-64d9666e1bcc', 'voltype': 'LEAF', 'description': '', 'parent': '29f99f8d-d8a6-475a-928c-e2ffdba76d80', 'format': 'COW', 'generation': 0, 'image': 'b8e8b8b6-edd1-4d40-b80b-259268ff4878', 'disktype': 'DATA', 'legality': 'LEGAL', 'mtime': '0', 'apparentsize': '3221225472', 'children': [], 'pool': '', 'ctime': '1594060717', 'capacity': '139586437120', 'uuid': u'28ed1acb-9697-43bd-980b-fe4317a06f24', 'truesize': '3221225472', 'type': 'SPARS E', 'lease': {'path': '/dev/6b82f31b-fa2a-406b-832d-64d9666e1bcc/leases', 'owners': [], 'version': None, 'offset': 127926272}}} from=::ffff:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, task_id=3d7790d3-7b6d-49a0-8867-594b6c859894 (api:54) 2020-07-13 11:18:30,649+0200 INFO (jsonrpc/7) [storage.VolumeManifest] 33777993-a3a5-4aad-a24c-dfe5e473faca/d7bd480d-2c51-4141-a386-113abf75219e/6197b30d-0732-4cc7-aef0-12f9f6e9565b info is {'status': 'OK', 'domain': '33777993-a3a5-4aad-a24c-dfe5e473faca', 'voltype': 'LEAF', 'description': ' ', 'parent': '8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8', 'format': 'COW', 'generation': 0, 'image': 'd7bd480d-2c51-4141-a386-113abf75219e', 'disktype': 'DATA', 'legality': 'LEGAL', 'mtime': '0', 'apparentsize': '5368709120', 'children': [], 'pool': '', 'ctime': '1594060718', 'capacity': '16106127 3600', 'uuid': u'6197b30d-0732-4cc7-aef0-12f9f6e9565b', 'truesize': '5368709120', 'type': 'SPARSE', 'lease': {'path': '/dev/33777993-a3a5-4aad-a24c-dfe5e473faca/leases', 'owners': [], 'version': None, 'offset': 165675008}} (volume:279) 2020-07-13 11:18:30,649+0200 INFO (jsonrpc/7) [vdsm.api] FINISH getVolumeInfo return={'info': {'status': 'OK', 'domain': '33777993-a3a5-4aad-a24c-dfe5e473faca', 'voltype': 'LEAF', 'description': '', 'parent': '8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8', 'format': 'COW', 'generation': 0, 'image': 'd7bd480d-2c51-4141-a386-113abf75219e', 'disktype': 'DATA', 'legality': 'LEGAL', 'mtime': '0', 'apparentsize': '5368709120', 'children': [], 'pool': '', 'ctime': '1594060718', 'capacity': '161061273600', 'uuid': u'6197b30d-0732-4cc7-aef0-12f9f6e9565b', 'truesize': '5368709120', 'type': 'SPARS E', 'lease': {'path': '/dev/33777993-a3a5-4aad-a24c-dfe5e473faca/leases', 'owners': [], 'version': None, 'offset': 165675008}}} from=::ffff:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, task_id=4fc04383-42da-4240-9783-395c1b610754 (api:54) 2020-07-13 11:18:30,676+0200 INFO (jsonrpc/5) [api.virt] FINISH merge return={'status': {'message': 'Done', 'code': 0}} from=::ffff:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, vmId=b5534254-660f-44b1-bc83-d616c98ba0ba (api:54) 2020-07-13 11:18:30,676+0200 INFO (jsonrpc/5) [jsonrpc.JsonRpcServer] RPC call VM.merge succeeded in 0.42 seconds (__init__:312) 2020-07-13 11:18:30,690+0200 INFO (jsonrpc/7) [virt.vm] (vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Starting merge with jobUUID=u'720410c3-f1a0-4b25-bf26-cf40aa6b1f97', original chain=8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 < 6197b30d-0732-4cc7-aef0-12f9f6e9565b (top), disk='sda', base='sda[1]', top=None, bandwidth=0, flags=12 (vm:5945) 2020-07-13 11:18:30,716+0200 INFO (jsonrpc/7) [vdsm.api] START sendExtendMsg(spUUID='00000002-0002-0002-0002-000000000289', volDict={'newSize': 50734301184, 'domainID': '33777993-a3a5-4aad-a24c-dfe5e473faca', 'name': 'sda', 'poolID': '00000002-0002-0002-0002-000000000289', 'clock': <Clock(total=0.00*, extend-volume=0.00*)>, 'internal': True, 'volumeID': u'8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8', 'imageID': 
'd7bd480d-2c51-4141-a386-113abf75219e'}, newSize=50734301184, callbackFunc=<bound method Vm.__afterVolumeExtension of <vdsm.virt.vm.Vm object at 0x7fa1e06cd890>>) from=::ffff:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, task_id=ca213f64-0223-4adb-bba9-8b704e477c40 (api:48)
2020-07-13 11:18:30,716+0200 INFO (jsonrpc/7) [vdsm.api] FINISH sendExtendMsg return=None from=::ffff:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, task_id=ca213f64-0223-4adb-bba9-8b704e477c40 (api:54)
2020-07-13 11:18:30,740+0200 INFO (jsonrpc/6) [virt.vm] (vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Starting merge with jobUUID=u'241dfab0-2ef2-45a6-a22f-c7122e9fc193', original chain=29f99f8d-d8a6-475a-928c-e2ffdba76d80 < 28ed1acb-9697-43bd-980b-fe4317a06f24 (top), disk='sdc', base='sdc[1]', top=None, bandwidth=0, flags=12 (vm:5945)
2020-07-13 11:18:30,752+0200 INFO (mailbox-hsm) [storage.MailBox.HsmMailMonitor] HSM_MailMonitor sending mail to SPM - ['/usr/bin/dd', 'of=/rhev/data-center/00000002-0002-0002-0002-000000000289/mastersd/dom_md/inbox', 'iflag=fullblock', 'oflag=direct', 'conv=notrunc', 'bs=4096', 'count=1', 'seek=2'] (mailbox:380)
2020-07-13 11:18:30,808+0200 INFO (jsonrpc/6) [api.virt] FINISH merge return={'status': {'message': 'Done', 'code': 0}} from=::ffff:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, vmId=b5534254-660f-44b1-bc83-d616c98ba0ba (api:54)
2020-07-13 11:18:30,809+0200 INFO (jsonrpc/6) [jsonrpc.JsonRpcServer] RPC call VM.merge succeeded in 0.53 seconds (__init__:312)
2020-07-13 11:18:30,817+0200 INFO (jsonrpc/7) [api.virt] FINISH merge return={'status': {'message': 'Done', 'code': 0}} from=::ffff:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, vmId=b5534254-660f-44b1-bc83-d616c98ba0ba (api:54)
The most important log is the one showing the original merge. If the merge succeeded, we should see a log showing the new libvirt chain, which should contain only the parent volume.
Nir
--
Arsène Gschwind <arsene.gschwind@unibas.ch> Universitaet Basel

On Wed, Jul 15, 2020 at 7:54 PM Arsène Gschwind <arsene.gschwind@unibas.ch> wrote:
On Wed, 2020-07-15 at 17:46 +0300, Nir Soffer wrote:
What we see in the data you sent:
Qemu chain:
$ qemu-img info --backing-chain
/dev/33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b
image: /dev/33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b
file format: qcow2
virtual size: 150G (161061273600 bytes)
disk size: 0
cluster_size: 65536
backing file: 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 (actual path:
/dev/33777993-a3a5-4aad-a24c-dfe5e473faca/8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8)
backing file format: qcow2
Format specific information:
compat: 1.1
lazy refcounts: false
refcount bits: 16
corrupt: false
image: /dev/33777993-a3a5-4aad-a24c-dfe5e473faca/8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8
file format: qcow2
virtual size: 150G (161061273600 bytes)
disk size: 0
cluster_size: 65536
Format specific information:
compat: 1.1
lazy refcounts: false
refcount bits: 16
corrupt: false
Vdsm chain:
$ cat 6197b30d-0732-4cc7-aef0-12f9f6e9565b.meta
CAP=161061273600
CTIME=1594060718
DESCRIPTION=
DISKTYPE=DATA
DOMAIN=33777993-a3a5-4aad-a24c-dfe5e473faca
FORMAT=COW
GEN=0
IMAGE=d7bd480d-2c51-4141-a386-113abf75219e
LEGALITY=ILLEGAL
^^^^^^
This is the issue: the top volume is ILLEGAL.
PUUID=8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8
TYPE=SPARSE
VOLTYPE=LEAF
$ cat 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8.meta
CAP=161061273600
CTIME=1587646763
DESCRIPTION={"DiskAlias":"cpslpd01_Disk1","DiskDescription":"SAP SLCM
H11 HDB D13"}
DISKTYPE=DATA
DOMAIN=33777993-a3a5-4aad-a24c-dfe5e473faca
FORMAT=COW
GEN=0
IMAGE=d7bd480d-2c51-4141-a386-113abf75219e
LEGALITY=LEGAL
PUUID=00000000-0000-0000-0000-000000000000
TYPE=SPARSE
VOLTYPE=INTERNAL
We set the volume to ILLEGAL when we merge the top volume into the parent volume,
and both volumes contain the same data.
After we mark the volume as ILLEGAL, we pivot to the parent volume
(8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8).
If the pivot was successful, the parent volume may have new data, and starting
the VM using the top volume may corrupt the VM filesystem. The ILLEGAL state
prevents this.
If the pivot was not successful, the VM must be started using the top volume,
but starting it will always fail if the volume is ILLEGAL.
If the volume is ILLEGAL, trying to merge again when the VM is not running will
always fail, since vdsm does not know whether the pivot succeeded, and cannot
merge the volume in a safe way.
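As a quick cross-check, the two views can be compared side by side with something like this (a sketch only, assuming the LVs are active on the host and the two .meta files shown above were dumped to the current directory; the UUIDs are the ones from this disk):

# Sketch only: compare qemu's backing chain with vdsm's metadata for this image.
SD=33777993-a3a5-4aad-a24c-dfe5e473faca
TOP=6197b30d-0732-4cc7-aef0-12f9f6e9565b
BASE=8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8

# qemu's view: the backing chain of the snapshot (top) volume.
qemu-img info --backing-chain "/dev/$SD/$TOP"

# vdsm's view: parent pointer, legality and volume type of both volumes.
grep -E '^(PUUID|LEGALITY|VOLTYPE)=' "$TOP.meta" "$BASE.meta"

If the volume marked ILLEGAL in its .meta still shows up at the top of qemu's chain, the two views are out of sync, which is the state described above.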
Do you have the vdsm logs from all merge attempts on this disk?
This is an extract of the vdsm logs; I can provide the complete log if it would help.
Yes, this is only the start of the merge. We see the success message, but this only
means the merge job was started.
Please share the complete log, and if needed the next log. The important messages
we look for are:
Requesting pivot to complete active layer commit ...
Followed by:
Pivot completed ...
If pivot failed, we expect to see this message:
Pivot failed: ...
After these messages we may find very important logs that explain why your disk
was left in an inconsistent state.
Since this looks like a bug and may be useful to others, I think it is time to file
a vdsm bug and attach the logs to the bug.
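To locate those messages in the attached logs, something like this may help (a sketch only; the default vdsm log location is assumed, and the job UUID is the one started for this disk in the log below):

# Sketch only: find the pivot-related messages for this merge in the vdsm logs.
# Default log location assumed; rotated logs may need to be decompressed first.
JOB=720410c3-f1a0-4b25-bf26-cf40aa6b1f97
grep -E 'Requesting pivot to complete active layer commit|Pivot completed|Pivot failed' /var/log/vdsm/vdsm.log*
# All messages mentioning this specific block job:
grep "$JOB" /var/log/vdsm/vdsm.log*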
2020-07-13 11:18:30,257+0200 INFO (jsonrpc/5) [api.virt] START merge(drive={u'imageID': u'6c1445b3-33ac-4ec4-8e43-483d4a6da4e3', u'volumeID': u'6172a270-5f73-464d-bebd-8bf0658c1de0', u'domainID': u'a6f2625d-0f21-4d81-b98c-f545d5f86f8e', u'poolID': u'00000002-0002-0002-0002-000000000289'}, ba seVolUUID=u'a9d5fe18-f1bd-462e-95f7-42a50e81eb11', topVolUUID=u'6172a270-5f73-464d-bebd-8bf0658c1de0', bandwidth=u'0', jobUUID=u'5059c2ce-e2a0-482d-be93-2b79e8536667') from=::ffff:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, vmId=b5534254-660f-44b1-bc83-d616c98ba0ba (api:4 8) 2020-07-13 11:18:30,271+0200 INFO (jsonrpc/5) [vdsm.api] START getVolumeInfo(sdUUID='a6f2625d-0f21-4d81-b98c-f545d5f86f8e', spUUID='00000002-0002-0002-0002-000000000289', imgUUID='6c1445b3-33ac-4ec4-8e43-483d4a6da4e3', volUUID=u'a9d5fe18-f1bd-462e-95f7-42a50e81eb11', options=None) from=::fff f:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, task_id=877c30b3-660c-4bfa-a215-75df8d03657e (api:48) 2020-07-13 11:18:30,281+0200 INFO (jsonrpc/6) [api.virt] START merge(drive={u'imageID': u'b8e8b8b6-edd1-4d40-b80b-259268ff4878', u'volumeID': u'28ed1acb-9697-43bd-980b-fe4317a06f24', u'domainID': u'6b82f31b-fa2a-406b-832d-64d9666e1bcc', u'poolID': u'00000002-0002-0002-0002-000000000289'}, ba seVolUUID=u'29f99f8d-d8a6-475a-928c-e2ffdba76d80', topVolUUID=u'28ed1acb-9697-43bd-980b-fe4317a06f24', bandwidth=u'0', jobUUID=u'241dfab0-2ef2-45a6-a22f-c7122e9fc193') from=::ffff:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, vmId=b5534254-660f-44b1-bc83-d616c98ba0ba (api:4 8) 2020-07-13 11:18:30,282+0200 INFO (jsonrpc/7) [api.virt] START merge(drive={u'imageID': u'd7bd480d-2c51-4141-a386-113abf75219e', u'volumeID': u'6197b30d-0732-4cc7-aef0-12f9f6e9565b', u'domainID': u'33777993-a3a5-4aad-a24c-dfe5e473faca', u'poolID': u'00000002-0002-0002-0002-000000000289'}, ba seVolUUID=u'8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8', topVolUUID=u'6197b30d-0732-4cc7-aef0-12f9f6e9565b', bandwidth=u'0', jobUUID=u'720410c3-f1a0-4b25-bf26-cf40aa6b1f97') from=::ffff:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, vmId=b5534254-660f-44b1-bc83-d616c98ba0ba (api:4 8) 2020-07-13 11:18:30,299+0200 INFO (jsonrpc/6) [vdsm.api] START getVolumeInfo(sdUUID='6b82f31b-fa2a-406b-832d-64d9666e1bcc', spUUID='00000002-0002-0002-0002-000000000289', imgUUID='b8e8b8b6-edd1-4d40-b80b-259268ff4878', volUUID=u'29f99f8d-d8a6-475a-928c-e2ffdba76d80', options=None) from=::fff f:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, task_id=3163b275-71d5-4976-9046-c0b558a8437f (api:48) 2020-07-13 11:18:30,312+0200 INFO (jsonrpc/7) [vdsm.api] START getVolumeInfo(sdUUID='33777993-a3a5-4aad-a24c-dfe5e473faca', spUUID='00000002-0002-0002-0002-000000000289', imgUUID='d7bd480d-2c51-4141-a386-113abf75219e', volUUID=u'8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8', options=None) from=::fff f:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, task_id=e4836371-e73a-4373-bd73-7754ecf1f3d1 (api:48) 2020-07-13 11:18:30,509+0200 INFO (jsonrpc/6) [storage.VolumeManifest] Info request: sdUUID=6b82f31b-fa2a-406b-832d-64d9666e1bcc imgUUID=b8e8b8b6-edd1-4d40-b80b-259268ff4878 volUUID = 29f99f8d-d8a6-475a-928c-e2ffdba76d80 (volume:240) 2020-07-13 11:18:30,522+0200 INFO (jsonrpc/5) [storage.VolumeManifest] Info request: sdUUID=a6f2625d-0f21-4d81-b98c-f545d5f86f8e imgUUID=6c1445b3-33ac-4ec4-8e43-483d4a6da4e3 volUUID = a9d5fe18-f1bd-462e-95f7-42a50e81eb11 (volume:240) 2020-07-13 11:18:30,545+0200 INFO (jsonrpc/7) [storage.VolumeManifest] 
Info request: sdUUID=33777993-a3a5-4aad-a24c-dfe5e473faca imgUUID=d7bd480d-2c51-4141-a386-113abf75219e volUUID = 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 (volume:240) 2020-07-13 11:18:30,569+0200 INFO (jsonrpc/5) [storage.VolumeManifest] a6f2625d-0f21-4d81-b98c-f545d5f86f8e/6c1445b3-33ac-4ec4-8e43-483d4a6da4e3/a9d5fe18-f1bd-462e-95f7-42a50e81eb11 info is {'status': 'OK', 'domain': 'a6f2625d-0f21-4d81-b98c-f545d5f86f8e', 'voltype': 'INTERNAL', 'description ': '{"DiskAlias":"cpslpd01_HANADB_Disk1","DiskDescription":"SAP SLCM H11 HDB D13 data"}', 'parent': '00000000-0000-0000-0000-000000000000', 'format': 'RAW', 'generation': 0, 'image': '6c1445b3-33ac-4ec4-8e43-483d4a6da4e3', 'disktype': 'DATA', 'legality': 'LEGAL', 'mtime': '0', 'apparentsize': '354334801920', 'children': [], 'pool': '', 'ctime': '1587654444', 'capacity': '354334801920', 'uuid': u'a9d5fe18-f1bd-462e-95f7-42a50e81eb11', 'truesize': '354334801920', 'type': 'PREALLOCATED', 'lease': {'path': '/dev/a6f2625d-0f21-4d81-b98c-f545d5f86f8e/leases', 'owners': [], 'version': N one, 'offset': 121634816}} (volume:279) 2020-07-13 11:18:30,569+0200 INFO (jsonrpc/5) [vdsm.api] FINISH getVolumeInfo return={'info': {'status': 'OK', 'domain': 'a6f2625d-0f21-4d81-b98c-f545d5f86f8e', 'voltype': 'INTERNAL', 'description': '{"DiskAlias":"cpslpd01_HANADB_Disk1","DiskDescription":"SAP SLCM H11 HDB D13 data"}', 'paren t': '00000000-0000-0000-0000-000000000000', 'format': 'RAW', 'generation': 0, 'image': '6c1445b3-33ac-4ec4-8e43-483d4a6da4e3', 'disktype': 'DATA', 'legality': 'LEGAL', 'mtime': '0', 'apparentsize': '354334801920', 'children': [], 'pool': '', 'ctime': '1587654444', 'capacity': '354334801920', 'uuid': u'a9d5fe18-f1bd-462e-95f7-42a50e81eb11', 'truesize': '354334801920', 'type': 'PREALLOCATED', 'lease': {'path': '/dev/a6f2625d-0f21-4d81-b98c-f545d5f86f8e/leases', 'owners': [], 'version': None, 'offset': 121634816}}} from=::ffff:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630 febc227, task_id=877c30b3-660c-4bfa-a215-75df8d03657e (api:54) 2020-07-13 11:18:30,571+0200 INFO (jsonrpc/5) [vdsm.api] START getVolumeInfo(sdUUID='a6f2625d-0f21-4d81-b98c-f545d5f86f8e', spUUID='00000002-0002-0002-0002-000000000289', imgUUID='6c1445b3-33ac-4ec4-8e43-483d4a6da4e3', volUUID=u'6172a270-5f73-464d-bebd-8bf0658c1de0', options=None) from=::fff f:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, task_id=548e96d6-e103-4344-991d-5e4f0f4cd703 (api:48) 2020-07-13 11:18:30,571+0200 INFO (jsonrpc/5) [storage.VolumeManifest] Info request: sdUUID=a6f2625d-0f21-4d81-b98c-f545d5f86f8e imgUUID=6c1445b3-33ac-4ec4-8e43-483d4a6da4e3 volUUID = 6172a270-5f73-464d-bebd-8bf0658c1de0 (volume:240) 2020-07-13 11:18:30,585+0200 INFO (jsonrpc/6) [storage.VolumeManifest] 6b82f31b-fa2a-406b-832d-64d9666e1bcc/b8e8b8b6-edd1-4d40-b80b-259268ff4878/29f99f8d-d8a6-475a-928c-e2ffdba76d80 info is {'status': 'OK', 'domain': '6b82f31b-fa2a-406b-832d-64d9666e1bcc', 'voltype': 'INTERNAL', 'description ': '{"DiskAlias":"cpslpd01_HANALogs_Disk1","DiskDescription":"SAP SLCM H11 HDB D13 logs"}', 'parent': '00000000-0000-0000-0000-000000000000', 'format': 'RAW', 'generation': 0, 'image': 'b8e8b8b6-edd1-4d40-b80b-259268ff4878', 'disktype': 'DATA', 'legality': 'LEGAL', 'mtime': '0', 'apparentsize ': '139586437120', 'children': [], 'pool': '', 'ctime': '1587654445', 'capacity': '139586437120', 'uuid': u'29f99f8d-d8a6-475a-928c-e2ffdba76d80', 'truesize': '139586437120', 'type': 'PREALLOCATED', 'lease': {'path': '/dev/6b82f31b-fa2a-406b-832d-64d9666e1bcc/leases', 'owners': [], 'version': 
None, 'offset': 121634816}} (volume:279) 2020-07-13 11:18:30,585+0200 INFO (jsonrpc/6) [vdsm.api] FINISH getVolumeInfo return={'info': {'status': 'OK', 'domain': '6b82f31b-fa2a-406b-832d-64d9666e1bcc', 'voltype': 'INTERNAL', 'description': '{"DiskAlias":"cpslpd01_HANALogs_Disk1","DiskDescription":"SAP SLCM H11 HDB D13 logs"}', 'par ent': '00000000-0000-0000-0000-000000000000', 'format': 'RAW', 'generation': 0, 'image': 'b8e8b8b6-edd1-4d40-b80b-259268ff4878', 'disktype': 'DATA', 'legality': 'LEGAL', 'mtime': '0', 'apparentsize': '139586437120', 'children': [], 'pool': '', 'ctime': '1587654445', 'capacity': '139586437120' , 'uuid': u'29f99f8d-d8a6-475a-928c-e2ffdba76d80', 'truesize': '139586437120', 'type': 'PREALLOCATED', 'lease': {'path': '/dev/6b82f31b-fa2a-406b-832d-64d9666e1bcc/leases', 'owners': [], 'version': None, 'offset': 121634816}}} from=::ffff:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-886 30febc227, task_id=3163b275-71d5-4976-9046-c0b558a8437f (api:54) 2020-07-13 11:18:30,586+0200 INFO (jsonrpc/6) [vdsm.api] START getVolumeInfo(sdUUID='6b82f31b-fa2a-406b-832d-64d9666e1bcc', spUUID='00000002-0002-0002-0002-000000000289', imgUUID='b8e8b8b6-edd1-4d40-b80b-259268ff4878', volUUID=u'28ed1acb-9697-43bd-980b-fe4317a06f24', options=None) from=::fff f:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, task_id=3d7790d3-7b6d-49a0-8867-594b6c859894 (api:48) 2020-07-13 11:18:30,587+0200 INFO (jsonrpc/6) [storage.VolumeManifest] Info request: sdUUID=6b82f31b-fa2a-406b-832d-64d9666e1bcc imgUUID=b8e8b8b6-edd1-4d40-b80b-259268ff4878 volUUID = 28ed1acb-9697-43bd-980b-fe4317a06f24 (volume:240) 2020-07-13 11:18:30,600+0200 INFO (jsonrpc/7) [storage.VolumeManifest] 33777993-a3a5-4aad-a24c-dfe5e473faca/d7bd480d-2c51-4141-a386-113abf75219e/8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 info is {'status': 'OK', 'domain': '33777993-a3a5-4aad-a24c-dfe5e473faca', 'voltype': 'INTERNAL', 'description ': '{"DiskAlias":"cpslpd01_Disk1","DiskDescription":"SAP SLCM H11 HDB D13"}', 'parent': '00000000-0000-0000-0000-000000000000', 'format': 'COW', 'generation': 0, 'image': 'd7bd480d-2c51-4141-a386-113abf75219e', 'disktype': 'DATA', 'legality': 'LEGAL', 'mtime': '0', 'apparentsize': '4429185024 0', 'children': [], 'pool': '', 'ctime': '1587646763', 'capacity': '161061273600', 'uuid': u'8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8', 'truesize': '44291850240', 'type': 'SPARSE', 'lease': {'path': '/dev/33777993-a3a5-4aad-a24c-dfe5e473faca/leases', 'owners': [], 'version': None, 'offset': 13421 7728}} (volume:279) 2020-07-13 11:18:30,600+0200 INFO (jsonrpc/7) [vdsm.api] FINISH getVolumeInfo return={'info': {'status': 'OK', 'domain': '33777993-a3a5-4aad-a24c-dfe5e473faca', 'voltype': 'INTERNAL', 'description': '{"DiskAlias":"cpslpd01_Disk1","DiskDescription":"SAP SLCM H11 HDB D13"}', 'parent': '0000000 0-0000-0000-0000-000000000000', 'format': 'COW', 'generation': 0, 'image': 'd7bd480d-2c51-4141-a386-113abf75219e', 'disktype': 'DATA', 'legality': 'LEGAL', 'mtime': '0', 'apparentsize': '44291850240', 'children': [], 'pool': '', 'ctime': '1587646763', 'capacity': '161061273600', 'uuid': u'8e4 12b5a-85ec-4c53-a5b8-dfb4d6d987b8', 'truesize': '44291850240', 'type': 'SPARSE', 'lease': {'path': '/dev/33777993-a3a5-4aad-a24c-dfe5e473faca/leases', 'owners': [], 'version': None, 'offset': 134217728}}} from=::ffff:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, task_id=e48 36371-e73a-4373-bd73-7754ecf1f3d1 (api:54) 2020-07-13 11:18:30,601+0200 INFO (jsonrpc/7) [vdsm.api] START 
getVolumeInfo(sdUUID='33777993-a3a5-4aad-a24c-dfe5e473faca', spUUID='00000002-0002-0002-0002-000000000289', imgUUID='d7bd480d-2c51-4141-a386-113abf75219e', volUUID=u'6197b30d-0732-4cc7-aef0-12f9f6e9565b', options=None) from=::fff f:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, task_id=4fc04383-42da-4240-9783-395c1b610754 (api:48) 2020-07-13 11:18:30,602+0200 INFO (jsonrpc/7) [storage.VolumeManifest] Info request: sdUUID=33777993-a3a5-4aad-a24c-dfe5e473faca imgUUID=d7bd480d-2c51-4141-a386-113abf75219e volUUID = 6197b30d-0732-4cc7-aef0-12f9f6e9565b (volume:240) 2020-07-13 11:18:30,615+0200 INFO (jsonrpc/5) [storage.VolumeManifest] a6f2625d-0f21-4d81-b98c-f545d5f86f8e/6c1445b3-33ac-4ec4-8e43-483d4a6da4e3/6172a270-5f73-464d-bebd-8bf0658c1de0 info is {'status': 'OK', 'domain': 'a6f2625d-0f21-4d81-b98c-f545d5f86f8e', 'voltype': 'LEAF', 'description': ' ', 'parent': 'a9d5fe18-f1bd-462e-95f7-42a50e81eb11', 'format': 'COW', 'generation': 0, 'image': '6c1445b3-33ac-4ec4-8e43-483d4a6da4e3', 'disktype': 'DATA', 'legality': 'LEGAL', 'mtime': '0', 'apparentsize': '53687091200', 'children': [], 'pool': '', 'ctime': '1594060717', 'capacity': '3543348 01920', 'uuid': u'6172a270-5f73-464d-bebd-8bf0658c1de0', 'truesize': '53687091200', 'type': 'SPARSE', 'lease': {'path': '/dev/a6f2625d-0f21-4d81-b98c-f545d5f86f8e/leases', 'owners': [], 'version': None, 'offset': 125829120}} (volume:279) 2020-07-13 11:18:30,616+0200 INFO (jsonrpc/5) [vdsm.api] FINISH getVolumeInfo return={'info': {'status': 'OK', 'domain': 'a6f2625d-0f21-4d81-b98c-f545d5f86f8e', 'voltype': 'LEAF', 'description': '', 'parent': 'a9d5fe18-f1bd-462e-95f7-42a50e81eb11', 'format': 'COW', 'generation': 0, 'image': '6c1445b3-33ac-4ec4-8e43-483d4a6da4e3', 'disktype': 'DATA', 'legality': 'LEGAL', 'mtime': '0', 'apparentsize': '53687091200', 'children': [], 'pool': '', 'ctime': '1594060717', 'capacity': '354334801920', 'uuid': u'6172a270-5f73-464d-bebd-8bf0658c1de0', 'truesize': '53687091200', 'type': 'SPA RSE', 'lease': {'path': '/dev/a6f2625d-0f21-4d81-b98c-f545d5f86f8e/leases', 'owners': [], 'version': None, 'offset': 125829120}}} from=::ffff:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, task_id=548e96d6-e103-4344-991d-5e4f0f4cd703 (api:54) 2020-07-13 11:18:30,630+0200 INFO (jsonrpc/5) [virt.vm] (vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Starting merge with jobUUID=u'5059c2ce-e2a0-482d-be93-2b79e8536667', original chain=a9d5fe18-f1bd-462e-95f7-42a50e81eb11 < 6172a270-5f73-464d-bebd-8bf0658c1de0 (top), disk='sdb', base='sdb[1 ]', top=None, bandwidth=0, flags=12 (vm:5945) 2020-07-13 11:18:30,640+0200 INFO (jsonrpc/6) [storage.VolumeManifest] 6b82f31b-fa2a-406b-832d-64d9666e1bcc/b8e8b8b6-edd1-4d40-b80b-259268ff4878/28ed1acb-9697-43bd-980b-fe4317a06f24 info is {'status': 'OK', 'domain': '6b82f31b-fa2a-406b-832d-64d9666e1bcc', 'voltype': 'LEAF', 'description': ' ', 'parent': '29f99f8d-d8a6-475a-928c-e2ffdba76d80', 'format': 'COW', 'generation': 0, 'image': 'b8e8b8b6-edd1-4d40-b80b-259268ff4878', 'disktype': 'DATA', 'legality': 'LEGAL', 'mtime': '0', 'apparentsize': '3221225472', 'children': [], 'pool': '', 'ctime': '1594060717', 'capacity': '13958643 7120', 'uuid': u'28ed1acb-9697-43bd-980b-fe4317a06f24', 'truesize': '3221225472', 'type': 'SPARSE', 'lease': {'path': '/dev/6b82f31b-fa2a-406b-832d-64d9666e1bcc/leases', 'owners': [], 'version': None, 'offset': 127926272}} (volume:279) 2020-07-13 11:18:30,640+0200 INFO (jsonrpc/6) [vdsm.api] FINISH getVolumeInfo return={'info': {'status': 'OK', 'domain': 
'6b82f31b-fa2a-406b-832d-64d9666e1bcc', 'voltype': 'LEAF', 'description': '', 'parent': '29f99f8d-d8a6-475a-928c-e2ffdba76d80', 'format': 'COW', 'generation': 0, 'image': 'b8e8b8b6-edd1-4d40-b80b-259268ff4878', 'disktype': 'DATA', 'legality': 'LEGAL', 'mtime': '0', 'apparentsize': '3221225472', 'children': [], 'pool': '', 'ctime': '1594060717', 'capacity': '139586437120', 'uuid': u'28ed1acb-9697-43bd-980b-fe4317a06f24', 'truesize': '3221225472', 'type': 'SPARS E', 'lease': {'path': '/dev/6b82f31b-fa2a-406b-832d-64d9666e1bcc/leases', 'owners': [], 'version': None, 'offset': 127926272}}} from=::ffff:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, task_id=3d7790d3-7b6d-49a0-8867-594b6c859894 (api:54) 2020-07-13 11:18:30,649+0200 INFO (jsonrpc/7) [storage.VolumeManifest] 33777993-a3a5-4aad-a24c-dfe5e473faca/d7bd480d-2c51-4141-a386-113abf75219e/6197b30d-0732-4cc7-aef0-12f9f6e9565b info is {'status': 'OK', 'domain': '33777993-a3a5-4aad-a24c-dfe5e473faca', 'voltype': 'LEAF', 'description': ' ', 'parent': '8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8', 'format': 'COW', 'generation': 0, 'image': 'd7bd480d-2c51-4141-a386-113abf75219e', 'disktype': 'DATA', 'legality': 'LEGAL', 'mtime': '0', 'apparentsize': '5368709120', 'children': [], 'pool': '', 'ctime': '1594060718', 'capacity': '16106127 3600', 'uuid': u'6197b30d-0732-4cc7-aef0-12f9f6e9565b', 'truesize': '5368709120', 'type': 'SPARSE', 'lease': {'path': '/dev/33777993-a3a5-4aad-a24c-dfe5e473faca/leases', 'owners': [], 'version': None, 'offset': 165675008}} (volume:279) 2020-07-13 11:18:30,649+0200 INFO (jsonrpc/7) [vdsm.api] FINISH getVolumeInfo return={'info': {'status': 'OK', 'domain': '33777993-a3a5-4aad-a24c-dfe5e473faca', 'voltype': 'LEAF', 'description': '', 'parent': '8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8', 'format': 'COW', 'generation': 0, 'image': 'd7bd480d-2c51-4141-a386-113abf75219e', 'disktype': 'DATA', 'legality': 'LEGAL', 'mtime': '0', 'apparentsize': '5368709120', 'children': [], 'pool': '', 'ctime': '1594060718', 'capacity': '161061273600', 'uuid': u'6197b30d-0732-4cc7-aef0-12f9f6e9565b', 'truesize': '5368709120', 'type': 'SPARS E', 'lease': {'path': '/dev/33777993-a3a5-4aad-a24c-dfe5e473faca/leases', 'owners': [], 'version': None, 'offset': 165675008}}} from=::ffff:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, task_id=4fc04383-42da-4240-9783-395c1b610754 (api:54) 2020-07-13 11:18:30,676+0200 INFO (jsonrpc/5) [api.virt] FINISH merge return={'status': {'message': 'Done', 'code': 0}} from=::ffff:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, vmId=b5534254-660f-44b1-bc83-d616c98ba0ba (api:54) 2020-07-13 11:18:30,676+0200 INFO (jsonrpc/5) [jsonrpc.JsonRpcServer] RPC call VM.merge succeeded in 0.42 seconds (__init__:312) 2020-07-13 11:18:30,690+0200 INFO (jsonrpc/7) [virt.vm] (vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Starting merge with jobUUID=u'720410c3-f1a0-4b25-bf26-cf40aa6b1f97', original chain=8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 < 6197b30d-0732-4cc7-aef0-12f9f6e9565b (top), disk='sda', base='sda[1]', top=None, bandwidth=0, flags=12 (vm:5945) 2020-07-13 11:18:30,716+0200 INFO (jsonrpc/7) [vdsm.api] START sendExtendMsg(spUUID='00000002-0002-0002-0002-000000000289', volDict={'newSize': 50734301184, 'domainID': '33777993-a3a5-4aad-a24c-dfe5e473faca', 'name': 'sda', 'poolID': '00000002-0002-0002-0002-000000000289', 'clock': <Clock(total=0.00*, extend-volume=0.00*)>, 'internal': True, 'volumeID': u'8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8', 'imageID': 
'd7bd480d-2c51-4141-a386-113abf75219e'}, newSize=50734301184, callbackFunc=<bound method Vm.__afterVolumeExtension of <vdsm.virt.vm.Vm object at 0x7fa1e06cd890>>) from=::ffff:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, task_id=ca213f64-0223-4adb-bba9-8b704e477c40 (api:48) 2020-07-13 11:18:30,716+0200 INFO (jsonrpc/7) [vdsm.api] FINISH sendExtendMsg return=None from=::ffff:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, task_id=ca213f64-0223-4adb-bba9-8b704e477c40 (api:54) 2020-07-13 11:18:30,740+0200 INFO (jsonrpc/6) [virt.vm] (vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Starting merge with jobUUID=u'241dfab0-2ef2-45a6-a22f-c7122e9fc193', original chain=29f99f8d-d8a6-475a-928c-e2ffdba76d80 < 28ed1acb-9697-43bd-980b-fe4317a06f24 (top), disk='sdc', base='sdc[1]', top=None, bandwidth=0, flags=12 (vm:5945) 2020-07-13 11:18:30,752+0200 INFO (mailbox-hsm) [storage.MailBox.HsmMailMonitor] HSM_MailMonitor sending mail to SPM - ['/usr/bin/dd', 'of=/rhev/data-center/00000002-0002-0002-0002-000000000289/mastersd/dom_md/inbox', 'iflag=fullblock', 'oflag=direct', 'conv=notrunc', 'bs=4096', 'count=1', 'seek=2'] (mailbox:380) 2020-07-13 11:18:30,808+0200 INFO (jsonrpc/6) [api.virt] FINISH merge return={'status': {'message': 'Done', 'code': 0}} from=::ffff:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, vmId=b5534254-660f-44b1-bc83-d616c98ba0ba (api:54) 2020-07-13 11:18:30,809+0200 INFO (jsonrpc/6) [jsonrpc.JsonRpcServer] RPC call VM.merge succeeded in 0.53 seconds (__init__:312) 2020-07-13 11:18:30,817+0200 INFO (jsonrpc/7) [api.virt] FINISH merge return={'status': {'message': 'Done', 'code': 0}} from=::ffff:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, vmId=b5534254-660f-44b1-bc83-d616c98ba0ba (api:54)
The most important log is the one showing the original merge. If the merge
succeeded, we should see a log showing the new libvirt chain, which
should contain
only the parent volume.
Nir
--
Arsène Gschwind <arsene.gschwind@unibas.ch> Universitaet Basel

On Wed, 2020-07-15 at 22:54 +0300, Nir Soffer wrote:
Please share the complete log, and if needed the next log. The important messages we look for are:
Requesting pivot to complete active layer commit ...
Followed by:
Pivot completed ...
If pivot failed, we expect to see this message:
Pivot failed: ...
After these messages we may find very important logs that explain why your disk was left in an inconsistent state.

It looks like the Pivot completed successfully, see attached vdsm.log.
Is there a way to recover that VM? Or would it be better to recover the VM from Backup?
Thanks a lot
Arsene

Since this looks like a bug and may be useful to others, I think it is time to file a vdsm bug and attach the logs to the bug.
2020-07-13 11:18:30,257+0200 INFO (jsonrpc/5) [api.virt] START merge(drive={u'imageID': u'6c1445b3-33ac-4ec4-8e43-483d4a6da4e3', u'volumeID': u'6172a270-5f73-464d-bebd-8bf0658c1de0', u'domainID': u'a6f2625d-0f21-4d81-b98c-f545d5f86f8e', u'poolID': u'00000002-0002-0002-0002-000000000289'}, ba seVolUUID=u'a9d5fe18-f1bd-462e-95f7-42a50e81eb11', topVolUUID=u'6172a270-5f73-464d-bebd-8bf0658c1de0', bandwidth=u'0', jobUUID=u'5059c2ce-e2a0-482d-be93-2b79e8536667') from=::ffff:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, vmId=b5534254-660f-44b1-bc83-d616c98ba0ba (api:4 8) 2020-07-13 11:18:30,271+0200 INFO (jsonrpc/5) [vdsm.api] START getVolumeInfo(sdUUID='a6f2625d-0f21-4d81-b98c-f545d5f86f8e', spUUID='00000002-0002-0002-0002-000000000289', imgUUID='6c1445b3-33ac-4ec4-8e43-483d4a6da4e3', volUUID=u'a9d5fe18-f1bd-462e-95f7-42a50e81eb11', options=None) from=::fff f:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, task_id=877c30b3-660c-4bfa-a215-75df8d03657e (api:48) 2020-07-13 11:18:30,281+0200 INFO (jsonrpc/6) [api.virt] START merge(drive={u'imageID': u'b8e8b8b6-edd1-4d40-b80b-259268ff4878', u'volumeID': u'28ed1acb-9697-43bd-980b-fe4317a06f24', u'domainID': u'6b82f31b-fa2a-406b-832d-64d9666e1bcc', u'poolID': u'00000002-0002-0002-0002-000000000289'}, ba seVolUUID=u'29f99f8d-d8a6-475a-928c-e2ffdba76d80', topVolUUID=u'28ed1acb-9697-43bd-980b-fe4317a06f24', bandwidth=u'0', jobUUID=u'241dfab0-2ef2-45a6-a22f-c7122e9fc193') from=::ffff:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, vmId=b5534254-660f-44b1-bc83-d616c98ba0ba (api:4 8) 2020-07-13 11:18:30,282+0200 INFO (jsonrpc/7) [api.virt] START merge(drive={u'imageID': u'd7bd480d-2c51-4141-a386-113abf75219e', u'volumeID': u'6197b30d-0732-4cc7-aef0-12f9f6e9565b', u'domainID': u'33777993-a3a5-4aad-a24c-dfe5e473faca', u'poolID': u'00000002-0002-0002-0002-000000000289'}, ba seVolUUID=u'8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8', topVolUUID=u'6197b30d-0732-4cc7-aef0-12f9f6e9565b', bandwidth=u'0', jobUUID=u'720410c3-f1a0-4b25-bf26-cf40aa6b1f97') from=::ffff:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, vmId=b5534254-660f-44b1-bc83-d616c98ba0ba (api:4 8) 2020-07-13 11:18:30,299+0200 INFO (jsonrpc/6) [vdsm.api] START getVolumeInfo(sdUUID='6b82f31b-fa2a-406b-832d-64d9666e1bcc', spUUID='00000002-0002-0002-0002-000000000289', imgUUID='b8e8b8b6-edd1-4d40-b80b-259268ff4878', volUUID=u'29f99f8d-d8a6-475a-928c-e2ffdba76d80', options=None) from=::fff f:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, task_id=3163b275-71d5-4976-9046-c0b558a8437f (api:48) 2020-07-13 11:18:30,312+0200 INFO (jsonrpc/7) [vdsm.api] START getVolumeInfo(sdUUID='33777993-a3a5-4aad-a24c-dfe5e473faca', spUUID='00000002-0002-0002-0002-000000000289', imgUUID='d7bd480d-2c51-4141-a386-113abf75219e', volUUID=u'8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8', options=None) from=::fff f:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, task_id=e4836371-e73a-4373-bd73-7754ecf1f3d1 (api:48) 2020-07-13 11:18:30,509+0200 INFO (jsonrpc/6) [storage.VolumeManifest] Info request: sdUUID=6b82f31b-fa2a-406b-832d-64d9666e1bcc imgUUID=b8e8b8b6-edd1-4d40-b80b-259268ff4878 volUUID = 29f99f8d-d8a6-475a-928c-e2ffdba76d80 (volume:240) 2020-07-13 11:18:30,522+0200 INFO (jsonrpc/5) [storage.VolumeManifest] Info request: sdUUID=a6f2625d-0f21-4d81-b98c-f545d5f86f8e imgUUID=6c1445b3-33ac-4ec4-8e43-483d4a6da4e3 volUUID = a9d5fe18-f1bd-462e-95f7-42a50e81eb11 (volume:240) 2020-07-13 11:18:30,545+0200 INFO (jsonrpc/7) [storage.VolumeManifest] 
Info request: sdUUID=33777993-a3a5-4aad-a24c-dfe5e473faca imgUUID=d7bd480d-2c51-4141-a386-113abf75219e volUUID = 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 (volume:240) 2020-07-13 11:18:30,569+0200 INFO (jsonrpc/5) [storage.VolumeManifest] a6f2625d-0f21-4d81-b98c-f545d5f86f8e/6c1445b3-33ac-4ec4-8e43-483d4a6da4e3/a9d5fe18-f1bd-462e-95f7-42a50e81eb11 info is {'status': 'OK', 'domain': 'a6f2625d-0f21-4d81-b98c-f545d5f86f8e', 'voltype': 'INTERNAL', 'description ': '{"DiskAlias":"cpslpd01_HANADB_Disk1","DiskDescription":"SAP SLCM H11 HDB D13 data"}', 'parent': '00000000-0000-0000-0000-000000000000', 'format': 'RAW', 'generation': 0, 'image': '6c1445b3-33ac-4ec4-8e43-483d4a6da4e3', 'disktype': 'DATA', 'legality': 'LEGAL', 'mtime': '0', 'apparentsize': '354334801920', 'children': [], 'pool': '', 'ctime': '1587654444', 'capacity': '354334801920', 'uuid': u'a9d5fe18-f1bd-462e-95f7-42a50e81eb11', 'truesize': '354334801920', 'type': 'PREALLOCATED', 'lease': {'path': '/dev/a6f2625d-0f21-4d81-b98c-f545d5f86f8e/leases', 'owners': [], 'version': N one, 'offset': 121634816}} (volume:279) 2020-07-13 11:18:30,569+0200 INFO (jsonrpc/5) [vdsm.api] FINISH getVolumeInfo return={'info': {'status': 'OK', 'domain': 'a6f2625d-0f21-4d81-b98c-f545d5f86f8e', 'voltype': 'INTERNAL', 'description': '{"DiskAlias":"cpslpd01_HANADB_Disk1","DiskDescription":"SAP SLCM H11 HDB D13 data"}', 'paren t': '00000000-0000-0000-0000-000000000000', 'format': 'RAW', 'generation': 0, 'image': '6c1445b3-33ac-4ec4-8e43-483d4a6da4e3', 'disktype': 'DATA', 'legality': 'LEGAL', 'mtime': '0', 'apparentsize': '354334801920', 'children': [], 'pool': '', 'ctime': '1587654444', 'capacity': '354334801920', 'uuid': u'a9d5fe18-f1bd-462e-95f7-42a50e81eb11', 'truesize': '354334801920', 'type': 'PREALLOCATED', 'lease': {'path': '/dev/a6f2625d-0f21-4d81-b98c-f545d5f86f8e/leases', 'owners': [], 'version': None, 'offset': 121634816}}} from=::ffff:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630 febc227, task_id=877c30b3-660c-4bfa-a215-75df8d03657e (api:54) 2020-07-13 11:18:30,571+0200 INFO (jsonrpc/5) [vdsm.api] START getVolumeInfo(sdUUID='a6f2625d-0f21-4d81-b98c-f545d5f86f8e', spUUID='00000002-0002-0002-0002-000000000289', imgUUID='6c1445b3-33ac-4ec4-8e43-483d4a6da4e3', volUUID=u'6172a270-5f73-464d-bebd-8bf0658c1de0', options=None) from=::fff f:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, task_id=548e96d6-e103-4344-991d-5e4f0f4cd703 (api:48) 2020-07-13 11:18:30,571+0200 INFO (jsonrpc/5) [storage.VolumeManifest] Info request: sdUUID=a6f2625d-0f21-4d81-b98c-f545d5f86f8e imgUUID=6c1445b3-33ac-4ec4-8e43-483d4a6da4e3 volUUID = 6172a270-5f73-464d-bebd-8bf0658c1de0 (volume:240) 2020-07-13 11:18:30,585+0200 INFO (jsonrpc/6) [storage.VolumeManifest] 6b82f31b-fa2a-406b-832d-64d9666e1bcc/b8e8b8b6-edd1-4d40-b80b-259268ff4878/29f99f8d-d8a6-475a-928c-e2ffdba76d80 info is {'status': 'OK', 'domain': '6b82f31b-fa2a-406b-832d-64d9666e1bcc', 'voltype': 'INTERNAL', 'description ': '{"DiskAlias":"cpslpd01_HANALogs_Disk1","DiskDescription":"SAP SLCM H11 HDB D13 logs"}', 'parent': '00000000-0000-0000-0000-000000000000', 'format': 'RAW', 'generation': 0, 'image': 'b8e8b8b6-edd1-4d40-b80b-259268ff4878', 'disktype': 'DATA', 'legality': 'LEGAL', 'mtime': '0', 'apparentsize ': '139586437120', 'children': [], 'pool': '', 'ctime': '1587654445', 'capacity': '139586437120', 'uuid': u'29f99f8d-d8a6-475a-928c-e2ffdba76d80', 'truesize': '139586437120', 'type': 'PREALLOCATED', 'lease': {'path': '/dev/6b82f31b-fa2a-406b-832d-64d9666e1bcc/leases', 'owners': [], 'version': 
None, 'offset': 121634816}} (volume:279) 2020-07-13 11:18:30,585+0200 INFO (jsonrpc/6) [vdsm.api] FINISH getVolumeInfo return={'info': {'status': 'OK', 'domain': '6b82f31b-fa2a-406b-832d-64d9666e1bcc', 'voltype': 'INTERNAL', 'description': '{"DiskAlias":"cpslpd01_HANALogs_Disk1","DiskDescription":"SAP SLCM H11 HDB D13 logs"}', 'par ent': '00000000-0000-0000-0000-000000000000', 'format': 'RAW', 'generation': 0, 'image': 'b8e8b8b6-edd1-4d40-b80b-259268ff4878', 'disktype': 'DATA', 'legality': 'LEGAL', 'mtime': '0', 'apparentsize': '139586437120', 'children': [], 'pool': '', 'ctime': '1587654445', 'capacity': '139586437120' , 'uuid': u'29f99f8d-d8a6-475a-928c-e2ffdba76d80', 'truesize': '139586437120', 'type': 'PREALLOCATED', 'lease': {'path': '/dev/6b82f31b-fa2a-406b-832d-64d9666e1bcc/leases', 'owners': [], 'version': None, 'offset': 121634816}}} from=::ffff:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-886 30febc227, task_id=3163b275-71d5-4976-9046-c0b558a8437f (api:54) 2020-07-13 11:18:30,586+0200 INFO (jsonrpc/6) [vdsm.api] START getVolumeInfo(sdUUID='6b82f31b-fa2a-406b-832d-64d9666e1bcc', spUUID='00000002-0002-0002-0002-000000000289', imgUUID='b8e8b8b6-edd1-4d40-b80b-259268ff4878', volUUID=u'28ed1acb-9697-43bd-980b-fe4317a06f24', options=None) from=::fff f:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, task_id=3d7790d3-7b6d-49a0-8867-594b6c859894 (api:48) 2020-07-13 11:18:30,587+0200 INFO (jsonrpc/6) [storage.VolumeManifest] Info request: sdUUID=6b82f31b-fa2a-406b-832d-64d9666e1bcc imgUUID=b8e8b8b6-edd1-4d40-b80b-259268ff4878 volUUID = 28ed1acb-9697-43bd-980b-fe4317a06f24 (volume:240) 2020-07-13 11:18:30,600+0200 INFO (jsonrpc/7) [storage.VolumeManifest] 33777993-a3a5-4aad-a24c-dfe5e473faca/d7bd480d-2c51-4141-a386-113abf75219e/8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 info is {'status': 'OK', 'domain': '33777993-a3a5-4aad-a24c-dfe5e473faca', 'voltype': 'INTERNAL', 'description ': '{"DiskAlias":"cpslpd01_Disk1","DiskDescription":"SAP SLCM H11 HDB D13"}', 'parent': '00000000-0000-0000-0000-000000000000', 'format': 'COW', 'generation': 0, 'image': 'd7bd480d-2c51-4141-a386-113abf75219e', 'disktype': 'DATA', 'legality': 'LEGAL', 'mtime': '0', 'apparentsize': '4429185024 0', 'children': [], 'pool': '', 'ctime': '1587646763', 'capacity': '161061273600', 'uuid': u'8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8', 'truesize': '44291850240', 'type': 'SPARSE', 'lease': {'path': '/dev/33777993-a3a5-4aad-a24c-dfe5e473faca/leases', 'owners': [], 'version': None, 'offset': 13421 7728}} (volume:279) 2020-07-13 11:18:30,600+0200 INFO (jsonrpc/7) [vdsm.api] FINISH getVolumeInfo return={'info': {'status': 'OK', 'domain': '33777993-a3a5-4aad-a24c-dfe5e473faca', 'voltype': 'INTERNAL', 'description': '{"DiskAlias":"cpslpd01_Disk1","DiskDescription":"SAP SLCM H11 HDB D13"}', 'parent': '0000000 0-0000-0000-0000-000000000000', 'format': 'COW', 'generation': 0, 'image': 'd7bd480d-2c51-4141-a386-113abf75219e', 'disktype': 'DATA', 'legality': 'LEGAL', 'mtime': '0', 'apparentsize': '44291850240', 'children': [], 'pool': '', 'ctime': '1587646763', 'capacity': '161061273600', 'uuid': u'8e4 12b5a-85ec-4c53-a5b8-dfb4d6d987b8', 'truesize': '44291850240', 'type': 'SPARSE', 'lease': {'path': '/dev/33777993-a3a5-4aad-a24c-dfe5e473faca/leases', 'owners': [], 'version': None, 'offset': 134217728}}} from=::ffff:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, task_id=e48 36371-e73a-4373-bd73-7754ecf1f3d1 (api:54) 2020-07-13 11:18:30,601+0200 INFO (jsonrpc/7) [vdsm.api] START 
getVolumeInfo(sdUUID='33777993-a3a5-4aad-a24c-dfe5e473faca', spUUID='00000002-0002-0002-0002-000000000289', imgUUID='d7bd480d-2c51-4141-a386-113abf75219e', volUUID=u'6197b30d-0732-4cc7-aef0-12f9f6e9565b', options=None) from=::fff f:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, task_id=4fc04383-42da-4240-9783-395c1b610754 (api:48) 2020-07-13 11:18:30,602+0200 INFO (jsonrpc/7) [storage.VolumeManifest] Info request: sdUUID=33777993-a3a5-4aad-a24c-dfe5e473faca imgUUID=d7bd480d-2c51-4141-a386-113abf75219e volUUID = 6197b30d-0732-4cc7-aef0-12f9f6e9565b (volume:240) 2020-07-13 11:18:30,615+0200 INFO (jsonrpc/5) [storage.VolumeManifest] a6f2625d-0f21-4d81-b98c-f545d5f86f8e/6c1445b3-33ac-4ec4-8e43-483d4a6da4e3/6172a270-5f73-464d-bebd-8bf0658c1de0 info is {'status': 'OK', 'domain': 'a6f2625d-0f21-4d81-b98c-f545d5f86f8e', 'voltype': 'LEAF', 'description': ' ', 'parent': 'a9d5fe18-f1bd-462e-95f7-42a50e81eb11', 'format': 'COW', 'generation': 0, 'image': '6c1445b3-33ac-4ec4-8e43-483d4a6da4e3', 'disktype': 'DATA', 'legality': 'LEGAL', 'mtime': '0', 'apparentsize': '53687091200', 'children': [], 'pool': '', 'ctime': '1594060717', 'capacity': '3543348 01920', 'uuid': u'6172a270-5f73-464d-bebd-8bf0658c1de0', 'truesize': '53687091200', 'type': 'SPARSE', 'lease': {'path': '/dev/a6f2625d-0f21-4d81-b98c-f545d5f86f8e/leases', 'owners': [], 'version': None, 'offset': 125829120}} (volume:279) 2020-07-13 11:18:30,616+0200 INFO (jsonrpc/5) [vdsm.api] FINISH getVolumeInfo return={'info': {'status': 'OK', 'domain': 'a6f2625d-0f21-4d81-b98c-f545d5f86f8e', 'voltype': 'LEAF', 'description': '', 'parent': 'a9d5fe18-f1bd-462e-95f7-42a50e81eb11', 'format': 'COW', 'generation': 0, 'image': '6c1445b3-33ac-4ec4-8e43-483d4a6da4e3', 'disktype': 'DATA', 'legality': 'LEGAL', 'mtime': '0', 'apparentsize': '53687091200', 'children': [], 'pool': '', 'ctime': '1594060717', 'capacity': '354334801920', 'uuid': u'6172a270-5f73-464d-bebd-8bf0658c1de0', 'truesize': '53687091200', 'type': 'SPA RSE', 'lease': {'path': '/dev/a6f2625d-0f21-4d81-b98c-f545d5f86f8e/leases', 'owners': [], 'version': None, 'offset': 125829120}}} from=::ffff:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, task_id=548e96d6-e103-4344-991d-5e4f0f4cd703 (api:54) 2020-07-13 11:18:30,630+0200 INFO (jsonrpc/5) [virt.vm] (vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Starting merge with jobUUID=u'5059c2ce-e2a0-482d-be93-2b79e8536667', original chain=a9d5fe18-f1bd-462e-95f7-42a50e81eb11 < 6172a270-5f73-464d-bebd-8bf0658c1de0 (top), disk='sdb', base='sdb[1 ]', top=None, bandwidth=0, flags=12 (vm:5945) 2020-07-13 11:18:30,640+0200 INFO (jsonrpc/6) [storage.VolumeManifest] 6b82f31b-fa2a-406b-832d-64d9666e1bcc/b8e8b8b6-edd1-4d40-b80b-259268ff4878/28ed1acb-9697-43bd-980b-fe4317a06f24 info is {'status': 'OK', 'domain': '6b82f31b-fa2a-406b-832d-64d9666e1bcc', 'voltype': 'LEAF', 'description': ' ', 'parent': '29f99f8d-d8a6-475a-928c-e2ffdba76d80', 'format': 'COW', 'generation': 0, 'image': 'b8e8b8b6-edd1-4d40-b80b-259268ff4878', 'disktype': 'DATA', 'legality': 'LEGAL', 'mtime': '0', 'apparentsize': '3221225472', 'children': [], 'pool': '', 'ctime': '1594060717', 'capacity': '13958643 7120', 'uuid': u'28ed1acb-9697-43bd-980b-fe4317a06f24', 'truesize': '3221225472', 'type': 'SPARSE', 'lease': {'path': '/dev/6b82f31b-fa2a-406b-832d-64d9666e1bcc/leases', 'owners': [], 'version': None, 'offset': 127926272}} (volume:279) 2020-07-13 11:18:30,640+0200 INFO (jsonrpc/6) [vdsm.api] FINISH getVolumeInfo return={'info': {'status': 'OK', 'domain': 
'6b82f31b-fa2a-406b-832d-64d9666e1bcc', 'voltype': 'LEAF', 'description': '', 'parent': '29f99f8d-d8a6-475a-928c-e2ffdba76d80', 'format': 'COW', 'generation': 0, 'image': 'b8e8b8b6-edd1-4d40-b80b-259268ff4878', 'disktype': 'DATA', 'legality': 'LEGAL', 'mtime': '0', 'apparentsize': '3221225472', 'children': [], 'pool': '', 'ctime': '1594060717', 'capacity': '139586437120', 'uuid': u'28ed1acb-9697-43bd-980b-fe4317a06f24', 'truesize': '3221225472', 'type': 'SPARS E', 'lease': {'path': '/dev/6b82f31b-fa2a-406b-832d-64d9666e1bcc/leases', 'owners': [], 'version': None, 'offset': 127926272}}} from=::ffff:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, task_id=3d7790d3-7b6d-49a0-8867-594b6c859894 (api:54) 2020-07-13 11:18:30,649+0200 INFO (jsonrpc/7) [storage.VolumeManifest] 33777993-a3a5-4aad-a24c-dfe5e473faca/d7bd480d-2c51-4141-a386-113abf75219e/6197b30d-0732-4cc7-aef0-12f9f6e9565b info is {'status': 'OK', 'domain': '33777993-a3a5-4aad-a24c-dfe5e473faca', 'voltype': 'LEAF', 'description': ' ', 'parent': '8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8', 'format': 'COW', 'generation': 0, 'image': 'd7bd480d-2c51-4141-a386-113abf75219e', 'disktype': 'DATA', 'legality': 'LEGAL', 'mtime': '0', 'apparentsize': '5368709120', 'children': [], 'pool': '', 'ctime': '1594060718', 'capacity': '16106127 3600', 'uuid': u'6197b30d-0732-4cc7-aef0-12f9f6e9565b', 'truesize': '5368709120', 'type': 'SPARSE', 'lease': {'path': '/dev/33777993-a3a5-4aad-a24c-dfe5e473faca/leases', 'owners': [], 'version': None, 'offset': 165675008}} (volume:279) 2020-07-13 11:18:30,649+0200 INFO (jsonrpc/7) [vdsm.api] FINISH getVolumeInfo return={'info': {'status': 'OK', 'domain': '33777993-a3a5-4aad-a24c-dfe5e473faca', 'voltype': 'LEAF', 'description': '', 'parent': '8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8', 'format': 'COW', 'generation': 0, 'image': 'd7bd480d-2c51-4141-a386-113abf75219e', 'disktype': 'DATA', 'legality': 'LEGAL', 'mtime': '0', 'apparentsize': '5368709120', 'children': [], 'pool': '', 'ctime': '1594060718', 'capacity': '161061273600', 'uuid': u'6197b30d-0732-4cc7-aef0-12f9f6e9565b', 'truesize': '5368709120', 'type': 'SPARS E', 'lease': {'path': '/dev/33777993-a3a5-4aad-a24c-dfe5e473faca/leases', 'owners': [], 'version': None, 'offset': 165675008}}} from=::ffff:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, task_id=4fc04383-42da-4240-9783-395c1b610754 (api:54) 2020-07-13 11:18:30,676+0200 INFO (jsonrpc/5) [api.virt] FINISH merge return={'status': {'message': 'Done', 'code': 0}} from=::ffff:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, vmId=b5534254-660f-44b1-bc83-d616c98ba0ba (api:54) 2020-07-13 11:18:30,676+0200 INFO (jsonrpc/5) [jsonrpc.JsonRpcServer] RPC call VM.merge succeeded in 0.42 seconds (__init__:312) 2020-07-13 11:18:30,690+0200 INFO (jsonrpc/7) [virt.vm] (vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Starting merge with jobUUID=u'720410c3-f1a0-4b25-bf26-cf40aa6b1f97', original chain=8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 < 6197b30d-0732-4cc7-aef0-12f9f6e9565b (top), disk='sda', base='sda[1]', top=None, bandwidth=0, flags=12 (vm:5945) 2020-07-13 11:18:30,716+0200 INFO (jsonrpc/7) [vdsm.api] START sendExtendMsg(spUUID='00000002-0002-0002-0002-000000000289', volDict={'newSize': 50734301184, 'domainID': '33777993-a3a5-4aad-a24c-dfe5e473faca', 'name': 'sda', 'poolID': '00000002-0002-0002-0002-000000000289', 'clock': <Clock(total=0.00*, extend-volume=0.00*)>, 'internal': True, 'volumeID': u'8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8', 'imageID': 
'd7bd480d-2c51-4141-a386-113abf75219e'}, newSize=50734301184, callbackFunc=<bound method Vm.__afterVolumeExtension of <vdsm.virt.vm.Vm object at 0x7fa1e06cd890>>) from=::ffff:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, task_id=ca213f64-0223-4adb-bba9-8b704e477c40 (api:48) 2020-07-13 11:18:30,716+0200 INFO (jsonrpc/7) [vdsm.api] FINISH sendExtendMsg return=None from=::ffff:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, task_id=ca213f64-0223-4adb-bba9-8b704e477c40 (api:54) 2020-07-13 11:18:30,740+0200 INFO (jsonrpc/6) [virt.vm] (vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Starting merge with jobUUID=u'241dfab0-2ef2-45a6-a22f-c7122e9fc193', original chain=29f99f8d-d8a6-475a-928c-e2ffdba76d80 < 28ed1acb-9697-43bd-980b-fe4317a06f24 (top), disk='sdc', base='sdc[1]', top=None, bandwidth=0, flags=12 (vm:5945) 2020-07-13 11:18:30,752+0200 INFO (mailbox-hsm) [storage.MailBox.HsmMailMonitor] HSM_MailMonitor sending mail to SPM - ['/usr/bin/dd', 'of=/rhev/data-center/00000002-0002-0002-0002-000000000289/mastersd/dom_md/inbox', 'iflag=fullblock', 'oflag=direct', 'conv=notrunc', 'bs=4096', 'count=1', 'seek=2'] (mailbox:380) 2020-07-13 11:18:30,808+0200 INFO (jsonrpc/6) [api.virt] FINISH merge return={'status': {'message': 'Done', 'code': 0}} from=::ffff:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, vmId=b5534254-660f-44b1-bc83-d616c98ba0ba (api:54) 2020-07-13 11:18:30,809+0200 INFO (jsonrpc/6) [jsonrpc.JsonRpcServer] RPC call VM.merge succeeded in 0.53 seconds (__init__:312) 2020-07-13 11:18:30,817+0200 INFO (jsonrpc/7) [api.virt] FINISH merge return={'status': {'message': 'Done', 'code': 0}} from=::ffff:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, vmId=b5534254-660f-44b1-bc83-d616c98ba0ba (api:54) The most important log is the one showing the original merge. If the merge succeeded, we should see a log showing the new libvirt chain, which should contain only the parent volume. Nir -- Arsène Gschwind < <mailto:arsene.gschwind@unibas.ch> arsene.gschwind@unibas.ch
Universitaet Basel -- Arsène Gschwind Fa. Sapify AG im Auftrag der universitaet Basel IT Services Klinelbergstr. 70 | CH-4056 Basel | Switzerland Tel: +41 79 449 25 63 | http://its.unibas.ch ITS-ServiceDesk: support-its@unibas.ch<mailto:support-its@unibas.ch> | +41 61 267 14 11

On Thu, Jul 16, 2020 at 11:33 AM Arsène Gschwind <arsene.gschwind@unibas.ch> wrote:
On Wed, 2020-07-15 at 22:54 +0300, Nir Soffer wrote:
On Wed, Jul 15, 2020 at 7:54 PM Arsène Gschwind
<
arsene.gschwind@unibas.ch
wrote:
On Wed, 2020-07-15 at 17:46 +0300, Nir Soffer wrote:
What we see in the data you sent:
Qemu chain:
$ qemu-img info --backing-chain
/dev/33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b
image: /dev/33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b
file format: qcow2
virtual size: 150G (161061273600 bytes)
disk size: 0
cluster_size: 65536
backing file: 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 (actual path:
/dev/33777993-a3a5-4aad-a24c-dfe5e473faca/8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8)
backing file format: qcow2
Format specific information:
compat: 1.1
lazy refcounts: false
refcount bits: 16
corrupt: false
image: /dev/33777993-a3a5-4aad-a24c-dfe5e473faca/8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8
file format: qcow2
virtual size: 150G (161061273600 bytes)
disk size: 0
cluster_size: 65536
Format specific information:
compat: 1.1
lazy refcounts: false
refcount bits: 16
corrupt: false
Vdsm chain:
$ cat 6197b30d-0732-4cc7-aef0-12f9f6e9565b.meta
CAP=161061273600
CTIME=1594060718
DESCRIPTION=
DISKTYPE=DATA
DOMAIN=33777993-a3a5-4aad-a24c-dfe5e473faca
FORMAT=COW
GEN=0
IMAGE=d7bd480d-2c51-4141-a386-113abf75219e
LEGALITY=ILLEGAL
^^^^^^
This is the issue, the top volume is illegal.
PUUID=8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8
TYPE=SPARSE
VOLTYPE=LEAF
$ cat 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8.meta
CAP=161061273600
CTIME=1587646763
DESCRIPTION={"DiskAlias":"cpslpd01_Disk1","DiskDescription":"SAP SLCM
H11 HDB D13"}
DISKTYPE=DATA
DOMAIN=33777993-a3a5-4aad-a24c-dfe5e473faca
FORMAT=COW
GEN=0
IMAGE=d7bd480d-2c51-4141-a386-113abf75219e
LEGALITY=LEGAL
PUUID=00000000-0000-0000-0000-000000000000
TYPE=SPARSE
VOLTYPE=INTERNAL
We set volume to ILLEGAL when we merge the top volume into the parent volume,
and both volumes contain the same data.
After we mark the volume as ILLEGAL, we pivot to the parent volume
(8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8).
If the pivot was successful, the parent volume may have new data, and starting
the vm using the top volume may corrupt the vm filesystem. The ILLEGAL state
prevents this.
If the pivot was not successful, the vm must be started using the top
volume, but it
will always fail if the volume is ILLEGAL.
If the volume is ILLEGAL, trying to merge again when the VM is not running will
always fail, since vdsm does not know whether the pivot succeeded, and cannot merge
the volume in a safe way.
Do you have the vdsm logs from all merge attempts on this disk?
This is an extract of the vdsm logs; I can provide the complete log if it would help.
Yes, this is only the start of the merge. We see the success message
but this only means the merge
job was started.
Please share the complete log, and if needed the next log. The
important messages we look for are:
Requesting pivot to complete active layer commit ...
Followed by:
Pivot completed ...
If pivot failed, we expect to see this message:
Pivot failed: ...
After these messages we may find very important logs that explain why
your disk was left
in an inconsistent state.
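For example, something like this should locate all three messages in one pass (assuming the default vdsm log location; rotated logs may be compressed and need xzgrep instead):

grep -E 'Requesting pivot|Pivot completed|Pivot failed' /var/log/vdsm/vdsm.log*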
It looks like the Pivot completed successfully, see attached vdsm.log.
That's good, I'm looking in your log.
Is there a way to recover that VM?
If the pivot was successful, qemu started to use the parent volume instead of the top volume. In this case you can delete the top volume and fix the metadata of the parent volume. Then you need to remove the top volume from the engine db and fix the metadata of the parent volume in the engine db. Let me verify first that the pivot was successful, and then I'll add instructions on how to fix the engine and volume metadata.
Or would it be better to recover the VM from Backup?
If the backup is recent enough, it will be easier. But fixing the VM will prevent any data loss since the last backup. It is not clear from the previous mails (or maybe I missed it) - is the VM running now or stopped? If the vm is running, checking the vm xml will show very clearly that it is not using the top volume. You can do:
virsh -r dumpxml vm-name-or-id
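For instance, grepping that xml for the top volume UUID (a minimal check; "cpslpd01" is only a guess at the VM name, based on the disk alias in the logs):

virsh -r dumpxml cpslpd01 | grep 6197b30d-0732-4cc7-aef0-12f9f6e9565b

If it prints nothing, the running VM is no longer referencing the top volume.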
Thanks a lot Arsene
Since this looks like a bug and may be useful to others, I think it is
time to file a vdsm bug,
and attach the logs to the bug.
2020-07-13 11:18:30,257+0200 INFO (jsonrpc/5) [api.virt] START merge(drive={u'imageID': u'6c1445b3-33ac-4ec4-8e43-483d4a6da4e3', u'volumeID': u'6172a270-5f73-464d-bebd-8bf0658c1de0', u'domainID': u'a6f2625d-0f21-4d81-b98c-f545d5f86f8e', u'poolID': u'00000002-0002-0002-0002-000000000289'}, ba
seVolUUID=u'a9d5fe18-f1bd-462e-95f7-42a50e81eb11', topVolUUID=u'6172a270-5f73-464d-bebd-8bf0658c1de0', bandwidth=u'0', jobUUID=u'5059c2ce-e2a0-482d-be93-2b79e8536667') from=::ffff:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, vmId=b5534254-660f-44b1-bc83-d616c98ba0ba (api:4
8)
2020-07-13 11:18:30,271+0200 INFO (jsonrpc/5) [vdsm.api] START getVolumeInfo(sdUUID='a6f2625d-0f21-4d81-b98c-f545d5f86f8e', spUUID='00000002-0002-0002-0002-000000000289', imgUUID='6c1445b3-33ac-4ec4-8e43-483d4a6da4e3', volUUID=u'a9d5fe18-f1bd-462e-95f7-42a50e81eb11', options=None) from=::fff
f:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, task_id=877c30b3-660c-4bfa-a215-75df8d03657e (api:48)
2020-07-13 11:18:30,281+0200 INFO (jsonrpc/6) [api.virt] START merge(drive={u'imageID': u'b8e8b8b6-edd1-4d40-b80b-259268ff4878', u'volumeID': u'28ed1acb-9697-43bd-980b-fe4317a06f24', u'domainID': u'6b82f31b-fa2a-406b-832d-64d9666e1bcc', u'poolID': u'00000002-0002-0002-0002-000000000289'}, ba
seVolUUID=u'29f99f8d-d8a6-475a-928c-e2ffdba76d80', topVolUUID=u'28ed1acb-9697-43bd-980b-fe4317a06f24', bandwidth=u'0', jobUUID=u'241dfab0-2ef2-45a6-a22f-c7122e9fc193') from=::ffff:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, vmId=b5534254-660f-44b1-bc83-d616c98ba0ba (api:4
8)
2020-07-13 11:18:30,282+0200 INFO (jsonrpc/7) [api.virt] START merge(drive={u'imageID': u'd7bd480d-2c51-4141-a386-113abf75219e', u'volumeID': u'6197b30d-0732-4cc7-aef0-12f9f6e9565b', u'domainID': u'33777993-a3a5-4aad-a24c-dfe5e473faca', u'poolID': u'00000002-0002-0002-0002-000000000289'}, ba
seVolUUID=u'8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8', topVolUUID=u'6197b30d-0732-4cc7-aef0-12f9f6e9565b', bandwidth=u'0', jobUUID=u'720410c3-f1a0-4b25-bf26-cf40aa6b1f97') from=::ffff:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, vmId=b5534254-660f-44b1-bc83-d616c98ba0ba (api:4
8)
2020-07-13 11:18:30,299+0200 INFO (jsonrpc/6) [vdsm.api] START getVolumeInfo(sdUUID='6b82f31b-fa2a-406b-832d-64d9666e1bcc', spUUID='00000002-0002-0002-0002-000000000289', imgUUID='b8e8b8b6-edd1-4d40-b80b-259268ff4878', volUUID=u'29f99f8d-d8a6-475a-928c-e2ffdba76d80', options=None) from=::fff
f:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, task_id=3163b275-71d5-4976-9046-c0b558a8437f (api:48)
2020-07-13 11:18:30,312+0200 INFO (jsonrpc/7) [vdsm.api] START getVolumeInfo(sdUUID='33777993-a3a5-4aad-a24c-dfe5e473faca', spUUID='00000002-0002-0002-0002-000000000289', imgUUID='d7bd480d-2c51-4141-a386-113abf75219e', volUUID=u'8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8', options=None) from=::fff
f:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, task_id=e4836371-e73a-4373-bd73-7754ecf1f3d1 (api:48)
2020-07-13 11:18:30,509+0200 INFO (jsonrpc/6) [storage.VolumeManifest] Info request: sdUUID=6b82f31b-fa2a-406b-832d-64d9666e1bcc imgUUID=b8e8b8b6-edd1-4d40-b80b-259268ff4878 volUUID = 29f99f8d-d8a6-475a-928c-e2ffdba76d80 (volume:240)
2020-07-13 11:18:30,522+0200 INFO (jsonrpc/5) [storage.VolumeManifest] Info request: sdUUID=a6f2625d-0f21-4d81-b98c-f545d5f86f8e imgUUID=6c1445b3-33ac-4ec4-8e43-483d4a6da4e3 volUUID = a9d5fe18-f1bd-462e-95f7-42a50e81eb11 (volume:240)
2020-07-13 11:18:30,545+0200 INFO (jsonrpc/7) [storage.VolumeManifest] Info request: sdUUID=33777993-a3a5-4aad-a24c-dfe5e473faca imgUUID=d7bd480d-2c51-4141-a386-113abf75219e volUUID = 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 (volume:240)
2020-07-13 11:18:30,569+0200 INFO (jsonrpc/5) [storage.VolumeManifest] a6f2625d-0f21-4d81-b98c-f545d5f86f8e/6c1445b3-33ac-4ec4-8e43-483d4a6da4e3/a9d5fe18-f1bd-462e-95f7-42a50e81eb11 info is {'status': 'OK', 'domain': 'a6f2625d-0f21-4d81-b98c-f545d5f86f8e', 'voltype': 'INTERNAL', 'description
': '{"DiskAlias":"cpslpd01_HANADB_Disk1","DiskDescription":"SAP SLCM H11 HDB D13 data"}', 'parent': '00000000-0000-0000-0000-000000000000', 'format': 'RAW', 'generation': 0, 'image': '6c1445b3-33ac-4ec4-8e43-483d4a6da4e3', 'disktype': 'DATA', 'legality': 'LEGAL', 'mtime': '0', 'apparentsize':
'354334801920', 'children': [], 'pool': '', 'ctime': '1587654444', 'capacity': '354334801920', 'uuid': u'a9d5fe18-f1bd-462e-95f7-42a50e81eb11', 'truesize': '354334801920', 'type': 'PREALLOCATED', 'lease': {'path': '/dev/a6f2625d-0f21-4d81-b98c-f545d5f86f8e/leases', 'owners': [], 'version': N
one, 'offset': 121634816}} (volume:279)
2020-07-13 11:18:30,569+0200 INFO (jsonrpc/5) [vdsm.api] FINISH getVolumeInfo return={'info': {'status': 'OK', 'domain': 'a6f2625d-0f21-4d81-b98c-f545d5f86f8e', 'voltype': 'INTERNAL', 'description': '{"DiskAlias":"cpslpd01_HANADB_Disk1","DiskDescription":"SAP SLCM H11 HDB D13 data"}', 'paren
t': '00000000-0000-0000-0000-000000000000', 'format': 'RAW', 'generation': 0, 'image': '6c1445b3-33ac-4ec4-8e43-483d4a6da4e3', 'disktype': 'DATA', 'legality': 'LEGAL', 'mtime': '0', 'apparentsize': '354334801920', 'children': [], 'pool': '', 'ctime': '1587654444', 'capacity': '354334801920',
'uuid': u'a9d5fe18-f1bd-462e-95f7-42a50e81eb11', 'truesize': '354334801920', 'type': 'PREALLOCATED', 'lease': {'path': '/dev/a6f2625d-0f21-4d81-b98c-f545d5f86f8e/leases', 'owners': [], 'version': None, 'offset': 121634816}}} from=::ffff:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630
febc227, task_id=877c30b3-660c-4bfa-a215-75df8d03657e (api:54)
2020-07-13 11:18:30,571+0200 INFO (jsonrpc/5) [vdsm.api] START getVolumeInfo(sdUUID='a6f2625d-0f21-4d81-b98c-f545d5f86f8e', spUUID='00000002-0002-0002-0002-000000000289', imgUUID='6c1445b3-33ac-4ec4-8e43-483d4a6da4e3', volUUID=u'6172a270-5f73-464d-bebd-8bf0658c1de0', options=None) from=::fff
f:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, task_id=548e96d6-e103-4344-991d-5e4f0f4cd703 (api:48)
2020-07-13 11:18:30,571+0200 INFO (jsonrpc/5) [storage.VolumeManifest] Info request: sdUUID=a6f2625d-0f21-4d81-b98c-f545d5f86f8e imgUUID=6c1445b3-33ac-4ec4-8e43-483d4a6da4e3 volUUID = 6172a270-5f73-464d-bebd-8bf0658c1de0 (volume:240)
2020-07-13 11:18:30,585+0200 INFO (jsonrpc/6) [storage.VolumeManifest] 6b82f31b-fa2a-406b-832d-64d9666e1bcc/b8e8b8b6-edd1-4d40-b80b-259268ff4878/29f99f8d-d8a6-475a-928c-e2ffdba76d80 info is {'status': 'OK', 'domain': '6b82f31b-fa2a-406b-832d-64d9666e1bcc', 'voltype': 'INTERNAL', 'description
': '{"DiskAlias":"cpslpd01_HANALogs_Disk1","DiskDescription":"SAP SLCM H11 HDB D13 logs"}', 'parent': '00000000-0000-0000-0000-000000000000', 'format': 'RAW', 'generation': 0, 'image': 'b8e8b8b6-edd1-4d40-b80b-259268ff4878', 'disktype': 'DATA', 'legality': 'LEGAL', 'mtime': '0', 'apparentsize
': '139586437120', 'children': [], 'pool': '', 'ctime': '1587654445', 'capacity': '139586437120', 'uuid': u'29f99f8d-d8a6-475a-928c-e2ffdba76d80', 'truesize': '139586437120', 'type': 'PREALLOCATED', 'lease': {'path': '/dev/6b82f31b-fa2a-406b-832d-64d9666e1bcc/leases', 'owners': [], 'version':
None, 'offset': 121634816}} (volume:279)
2020-07-13 11:18:30,585+0200 INFO (jsonrpc/6) [vdsm.api] FINISH getVolumeInfo return={'info': {'status': 'OK', 'domain': '6b82f31b-fa2a-406b-832d-64d9666e1bcc', 'voltype': 'INTERNAL', 'description': '{"DiskAlias":"cpslpd01_HANALogs_Disk1","DiskDescription":"SAP SLCM H11 HDB D13 logs"}', 'par
ent': '00000000-0000-0000-0000-000000000000', 'format': 'RAW', 'generation': 0, 'image': 'b8e8b8b6-edd1-4d40-b80b-259268ff4878', 'disktype': 'DATA', 'legality': 'LEGAL', 'mtime': '0', 'apparentsize': '139586437120', 'children': [], 'pool': '', 'ctime': '1587654445', 'capacity': '139586437120'
, 'uuid': u'29f99f8d-d8a6-475a-928c-e2ffdba76d80', 'truesize': '139586437120', 'type': 'PREALLOCATED', 'lease': {'path': '/dev/6b82f31b-fa2a-406b-832d-64d9666e1bcc/leases', 'owners': [], 'version': None, 'offset': 121634816}}} from=::ffff:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-886
30febc227, task_id=3163b275-71d5-4976-9046-c0b558a8437f (api:54)
2020-07-13 11:18:30,586+0200 INFO (jsonrpc/6) [vdsm.api] START getVolumeInfo(sdUUID='6b82f31b-fa2a-406b-832d-64d9666e1bcc', spUUID='00000002-0002-0002-0002-000000000289', imgUUID='b8e8b8b6-edd1-4d40-b80b-259268ff4878', volUUID=u'28ed1acb-9697-43bd-980b-fe4317a06f24', options=None) from=::fff
f:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, task_id=3d7790d3-7b6d-49a0-8867-594b6c859894 (api:48)
2020-07-13 11:18:30,587+0200 INFO (jsonrpc/6) [storage.VolumeManifest] Info request: sdUUID=6b82f31b-fa2a-406b-832d-64d9666e1bcc imgUUID=b8e8b8b6-edd1-4d40-b80b-259268ff4878 volUUID = 28ed1acb-9697-43bd-980b-fe4317a06f24 (volume:240)
2020-07-13 11:18:30,600+0200 INFO (jsonrpc/7) [storage.VolumeManifest] 33777993-a3a5-4aad-a24c-dfe5e473faca/d7bd480d-2c51-4141-a386-113abf75219e/8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 info is {'status': 'OK', 'domain': '33777993-a3a5-4aad-a24c-dfe5e473faca', 'voltype': 'INTERNAL', 'description
': '{"DiskAlias":"cpslpd01_Disk1","DiskDescription":"SAP SLCM H11 HDB D13"}', 'parent': '00000000-0000-0000-0000-000000000000', 'format': 'COW', 'generation': 0, 'image': 'd7bd480d-2c51-4141-a386-113abf75219e', 'disktype': 'DATA', 'legality': 'LEGAL', 'mtime': '0', 'apparentsize': '4429185024
0', 'children': [], 'pool': '', 'ctime': '1587646763', 'capacity': '161061273600', 'uuid': u'8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8', 'truesize': '44291850240', 'type': 'SPARSE', 'lease': {'path': '/dev/33777993-a3a5-4aad-a24c-dfe5e473faca/leases', 'owners': [], 'version': None, 'offset': 13421
7728}} (volume:279)
2020-07-13 11:18:30,600+0200 INFO (jsonrpc/7) [vdsm.api] FINISH getVolumeInfo return={'info': {'status': 'OK', 'domain': '33777993-a3a5-4aad-a24c-dfe5e473faca', 'voltype': 'INTERNAL', 'description': '{"DiskAlias":"cpslpd01_Disk1","DiskDescription":"SAP SLCM H11 HDB D13"}', 'parent': '0000000
0-0000-0000-0000-000000000000', 'format': 'COW', 'generation': 0, 'image': 'd7bd480d-2c51-4141-a386-113abf75219e', 'disktype': 'DATA', 'legality': 'LEGAL', 'mtime': '0', 'apparentsize': '44291850240', 'children': [], 'pool': '', 'ctime': '1587646763', 'capacity': '161061273600', 'uuid': u'8e4
12b5a-85ec-4c53-a5b8-dfb4d6d987b8', 'truesize': '44291850240', 'type': 'SPARSE', 'lease': {'path': '/dev/33777993-a3a5-4aad-a24c-dfe5e473faca/leases', 'owners': [], 'version': None, 'offset': 134217728}}} from=::ffff:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, task_id=e48
36371-e73a-4373-bd73-7754ecf1f3d1 (api:54)
2020-07-13 11:18:30,601+0200 INFO (jsonrpc/7) [vdsm.api] START getVolumeInfo(sdUUID='33777993-a3a5-4aad-a24c-dfe5e473faca', spUUID='00000002-0002-0002-0002-000000000289', imgUUID='d7bd480d-2c51-4141-a386-113abf75219e', volUUID=u'6197b30d-0732-4cc7-aef0-12f9f6e9565b', options=None) from=::fff
f:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, task_id=4fc04383-42da-4240-9783-395c1b610754 (api:48)
2020-07-13 11:18:30,602+0200 INFO (jsonrpc/7) [storage.VolumeManifest] Info request: sdUUID=33777993-a3a5-4aad-a24c-dfe5e473faca imgUUID=d7bd480d-2c51-4141-a386-113abf75219e volUUID = 6197b30d-0732-4cc7-aef0-12f9f6e9565b (volume:240)
2020-07-13 11:18:30,615+0200 INFO (jsonrpc/5) [storage.VolumeManifest] a6f2625d-0f21-4d81-b98c-f545d5f86f8e/6c1445b3-33ac-4ec4-8e43-483d4a6da4e3/6172a270-5f73-464d-bebd-8bf0658c1de0 info is {'status': 'OK', 'domain': 'a6f2625d-0f21-4d81-b98c-f545d5f86f8e', 'voltype': 'LEAF', 'description': '
', 'parent': 'a9d5fe18-f1bd-462e-95f7-42a50e81eb11', 'format': 'COW', 'generation': 0, 'image': '6c1445b3-33ac-4ec4-8e43-483d4a6da4e3', 'disktype': 'DATA', 'legality': 'LEGAL', 'mtime': '0', 'apparentsize': '53687091200', 'children': [], 'pool': '', 'ctime': '1594060717', 'capacity': '3543348
01920', 'uuid': u'6172a270-5f73-464d-bebd-8bf0658c1de0', 'truesize': '53687091200', 'type': 'SPARSE', 'lease': {'path': '/dev/a6f2625d-0f21-4d81-b98c-f545d5f86f8e/leases', 'owners': [], 'version': None, 'offset': 125829120}} (volume:279)
2020-07-13 11:18:30,616+0200 INFO (jsonrpc/5) [vdsm.api] FINISH getVolumeInfo return={'info': {'status': 'OK', 'domain': 'a6f2625d-0f21-4d81-b98c-f545d5f86f8e', 'voltype': 'LEAF', 'description': '', 'parent': 'a9d5fe18-f1bd-462e-95f7-42a50e81eb11', 'format': 'COW', 'generation': 0, 'image':
'6c1445b3-33ac-4ec4-8e43-483d4a6da4e3', 'disktype': 'DATA', 'legality': 'LEGAL', 'mtime': '0', 'apparentsize': '53687091200', 'children': [], 'pool': '', 'ctime': '1594060717', 'capacity': '354334801920', 'uuid': u'6172a270-5f73-464d-bebd-8bf0658c1de0', 'truesize': '53687091200', 'type': 'SPA
RSE', 'lease': {'path': '/dev/a6f2625d-0f21-4d81-b98c-f545d5f86f8e/leases', 'owners': [], 'version': None, 'offset': 125829120}}} from=::ffff:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, task_id=548e96d6-e103-4344-991d-5e4f0f4cd703 (api:54)
2020-07-13 11:18:30,630+0200 INFO (jsonrpc/5) [virt.vm] (vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Starting merge with jobUUID=u'5059c2ce-e2a0-482d-be93-2b79e8536667', original chain=a9d5fe18-f1bd-462e-95f7-42a50e81eb11 < 6172a270-5f73-464d-bebd-8bf0658c1de0 (top), disk='sdb', base='sdb[1
]', top=None, bandwidth=0, flags=12 (vm:5945)
2020-07-13 11:18:30,640+0200 INFO (jsonrpc/6) [storage.VolumeManifest] 6b82f31b-fa2a-406b-832d-64d9666e1bcc/b8e8b8b6-edd1-4d40-b80b-259268ff4878/28ed1acb-9697-43bd-980b-fe4317a06f24 info is {'status': 'OK', 'domain': '6b82f31b-fa2a-406b-832d-64d9666e1bcc', 'voltype': 'LEAF', 'description': '
', 'parent': '29f99f8d-d8a6-475a-928c-e2ffdba76d80', 'format': 'COW', 'generation': 0, 'image': 'b8e8b8b6-edd1-4d40-b80b-259268ff4878', 'disktype': 'DATA', 'legality': 'LEGAL', 'mtime': '0', 'apparentsize': '3221225472', 'children': [], 'pool': '', 'ctime': '1594060717', 'capacity': '13958643
7120', 'uuid': u'28ed1acb-9697-43bd-980b-fe4317a06f24', 'truesize': '3221225472', 'type': 'SPARSE', 'lease': {'path': '/dev/6b82f31b-fa2a-406b-832d-64d9666e1bcc/leases', 'owners': [], 'version': None, 'offset': 127926272}} (volume:279)
2020-07-13 11:18:30,640+0200 INFO (jsonrpc/6) [vdsm.api] FINISH getVolumeInfo return={'info': {'status': 'OK', 'domain': '6b82f31b-fa2a-406b-832d-64d9666e1bcc', 'voltype': 'LEAF', 'description': '', 'parent': '29f99f8d-d8a6-475a-928c-e2ffdba76d80', 'format': 'COW', 'generation': 0, 'image':
'b8e8b8b6-edd1-4d40-b80b-259268ff4878', 'disktype': 'DATA', 'legality': 'LEGAL', 'mtime': '0', 'apparentsize': '3221225472', 'children': [], 'pool': '', 'ctime': '1594060717', 'capacity': '139586437120', 'uuid': u'28ed1acb-9697-43bd-980b-fe4317a06f24', 'truesize': '3221225472', 'type': 'SPARS
E', 'lease': {'path': '/dev/6b82f31b-fa2a-406b-832d-64d9666e1bcc/leases', 'owners': [], 'version': None, 'offset': 127926272}}} from=::ffff:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, task_id=3d7790d3-7b6d-49a0-8867-594b6c859894 (api:54)
2020-07-13 11:18:30,649+0200 INFO (jsonrpc/7) [storage.VolumeManifest] 33777993-a3a5-4aad-a24c-dfe5e473faca/d7bd480d-2c51-4141-a386-113abf75219e/6197b30d-0732-4cc7-aef0-12f9f6e9565b info is {'status': 'OK', 'domain': '33777993-a3a5-4aad-a24c-dfe5e473faca', 'voltype': 'LEAF', 'description': '
', 'parent': '8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8', 'format': 'COW', 'generation': 0, 'image': 'd7bd480d-2c51-4141-a386-113abf75219e', 'disktype': 'DATA', 'legality': 'LEGAL', 'mtime': '0', 'apparentsize': '5368709120', 'children': [], 'pool': '', 'ctime': '1594060718', 'capacity': '16106127
3600', 'uuid': u'6197b30d-0732-4cc7-aef0-12f9f6e9565b', 'truesize': '5368709120', 'type': 'SPARSE', 'lease': {'path': '/dev/33777993-a3a5-4aad-a24c-dfe5e473faca/leases', 'owners': [], 'version': None, 'offset': 165675008}} (volume:279)
2020-07-13 11:18:30,649+0200 INFO (jsonrpc/7) [vdsm.api] FINISH getVolumeInfo return={'info': {'status': 'OK', 'domain': '33777993-a3a5-4aad-a24c-dfe5e473faca', 'voltype': 'LEAF', 'description': '', 'parent': '8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8', 'format': 'COW', 'generation': 0, 'image':
'd7bd480d-2c51-4141-a386-113abf75219e', 'disktype': 'DATA', 'legality': 'LEGAL', 'mtime': '0', 'apparentsize': '5368709120', 'children': [], 'pool': '', 'ctime': '1594060718', 'capacity': '161061273600', 'uuid': u'6197b30d-0732-4cc7-aef0-12f9f6e9565b', 'truesize': '5368709120', 'type': 'SPARS
E', 'lease': {'path': '/dev/33777993-a3a5-4aad-a24c-dfe5e473faca/leases', 'owners': [], 'version': None, 'offset': 165675008}}} from=::ffff:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, task_id=4fc04383-42da-4240-9783-395c1b610754 (api:54)
2020-07-13 11:18:30,676+0200 INFO (jsonrpc/5) [api.virt] FINISH merge return={'status': {'message': 'Done', 'code': 0}} from=::ffff:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, vmId=b5534254-660f-44b1-bc83-d616c98ba0ba (api:54)
2020-07-13 11:18:30,676+0200 INFO (jsonrpc/5) [jsonrpc.JsonRpcServer] RPC call VM.merge succeeded in 0.42 seconds (__init__:312)
2020-07-13 11:18:30,690+0200 INFO (jsonrpc/7) [virt.vm] (vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Starting merge with jobUUID=u'720410c3-f1a0-4b25-bf26-cf40aa6b1f97', original chain=8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 < 6197b30d-0732-4cc7-aef0-12f9f6e9565b (top), disk='sda', base='sda[1]', top=None, bandwidth=0, flags=12 (vm:5945)
2020-07-13 11:18:30,716+0200 INFO (jsonrpc/7) [vdsm.api] START sendExtendMsg(spUUID='00000002-0002-0002-0002-000000000289', volDict={'newSize': 50734301184, 'domainID': '33777993-a3a5-4aad-a24c-dfe5e473faca', 'name': 'sda', 'poolID': '00000002-0002-0002-0002-000000000289', 'clock': <Clock(total=0.00*, extend-volume=0.00*)>, 'internal': True, 'volumeID': u'8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8', 'imageID': 'd7bd480d-2c51-4141-a386-113abf75219e'}, newSize=50734301184, callbackFunc=<bound method Vm.__afterVolumeExtension of <vdsm.virt.vm.Vm object at 0x7fa1e06cd890>>) from=::ffff:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, task_id=ca213f64-0223-4adb-bba9-8b704e477c40 (api:48)
2020-07-13 11:18:30,716+0200 INFO (jsonrpc/7) [vdsm.api] FINISH sendExtendMsg return=None from=::ffff:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, task_id=ca213f64-0223-4adb-bba9-8b704e477c40 (api:54)
2020-07-13 11:18:30,740+0200 INFO (jsonrpc/6) [virt.vm] (vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Starting merge with jobUUID=u'241dfab0-2ef2-45a6-a22f-c7122e9fc193', original chain=29f99f8d-d8a6-475a-928c-e2ffdba76d80 < 28ed1acb-9697-43bd-980b-fe4317a06f24 (top), disk='sdc', base='sdc[1]', top=None, bandwidth=0, flags=12 (vm:5945)
2020-07-13 11:18:30,752+0200 INFO (mailbox-hsm) [storage.MailBox.HsmMailMonitor] HSM_MailMonitor sending mail to SPM - ['/usr/bin/dd', 'of=/rhev/data-center/00000002-0002-0002-0002-000000000289/mastersd/dom_md/inbox', 'iflag=fullblock', 'oflag=direct', 'conv=notrunc', 'bs=4096', 'count=1', 'seek=2'] (mailbox:380)
2020-07-13 11:18:30,808+0200 INFO (jsonrpc/6) [api.virt] FINISH merge return={'status': {'message': 'Done', 'code': 0}} from=::ffff:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, vmId=b5534254-660f-44b1-bc83-d616c98ba0ba (api:54)
2020-07-13 11:18:30,809+0200 INFO (jsonrpc/6) [jsonrpc.JsonRpcServer] RPC call VM.merge succeeded in 0.53 seconds (__init__:312)
2020-07-13 11:18:30,817+0200 INFO (jsonrpc/7) [api.virt] FINISH merge return={'status': {'message': 'Done', 'code': 0}} from=::ffff:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, vmId=b5534254-660f-44b1-bc83-d616c98ba0ba (api:54)
The most important log is the one showing the original merge. If the merge
succeeded, we should see a log showing the new libvirt chain, which
should contain
only the parent volume.
Nir
--
Arsène Gschwind <
arsene.gschwind@unibas.ch
Universitaet Basel
--
Arsène Gschwind Fa. Sapify AG im Auftrag der universitaet Basel IT Services Klinelbergstr. 70 | CH-4056 Basel | Switzerland Tel: +41 79 449 25 63 | http://its.unibas.ch ITS-ServiceDesk: support-its@unibas.ch | +41 61 267 14 11

On Thu, Jul 16, 2020 at 11:33 AM Arsène Gschwind <arsene.gschwind@unibas.ch> wrote:
It looks like the Pivot completed successfully, see attached vdsm.log. Is there a way to recover that VM? Or would it be better to recover the VM from Backup?
This what we see in the log: 1. Merge request recevied 2020-07-13 11:18:30,282+0200 INFO (jsonrpc/7) [api.virt] START merge(drive={u'imageID': u'd7bd480d-2c51-4141-a386-113abf75219e', u'volumeID': u'6197b30d-0732-4cc7-aef0-12f9f6e9565b', u'domainID': u'33777993-a3a5-4aad-a24c-dfe5e473faca', u'poolID': u'00000002-0002-0002-0002-000000000289'}, baseVolUUID=u'8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8', topVolUUID=u'6197b30d-0732-4cc7-aef0-12f9f6e9565b', bandwidth=u'0', jobUUID=u'720410c3-f1a0-4b25-bf26-cf40aa6b1f97') from=::ffff:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, vmId=b5534254-660f-44b1-bc83-d616c98ba0ba (api:48) To track this job, we can use the jobUUID: 720410c3-f1a0-4b25-bf26-cf40aa6b1f97 and the top volume UUID: 6197b30d-0732-4cc7-aef0-12f9f6e9565b 2. Starting the merge 2020-07-13 11:18:30,690+0200 INFO (jsonrpc/7) [virt.vm] (vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Starting merge with jobUUID=u'720410c3-f1a0-4b25-bf26-cf40aa6b1f97', original chain=8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 < 6197b30d-0732-4cc7-aef0-12f9f6e9565b (top), disk='sda', base='sda[1]', top=None, bandwidth=0, flags=12 (vm:5945) We see the original chain: 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 < 6197b30d-0732-4cc7-aef0-12f9f6e9565b (top) 3. The merge was completed, ready for pivot 2020-07-13 11:19:00,992+0200 INFO (libvirt/events) [virt.vm] (vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Block job ACTIVE_COMMIT for drive /rhev/data-center/mnt/blockSD/33777993-a3a5-4aad-a24c-dfe5e473faca/images/d7bd480d-2c51-4141-a386-113abf75219e/6197b30d-0732-4cc7-aef0-12f9f6e9565b is ready (vm:5847) At this point parent volume contains all the data in top volume and we can pivot to the parent volume. 4. Vdsm detect that the merge is ready, and start the clean thread that will complete the merge 2020-07-13 11:19:06,166+0200 INFO (periodic/1) [virt.vm] (vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Starting cleanup thread for job: 720410c3-f1a0-4b25-bf26-cf40aa6b1f97 (vm:5809) 5. Requesting pivot to parent volume: 2020-07-13 11:19:06,717+0200 INFO (merge/720410c3) [virt.vm] (vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Requesting pivot to complete active layer commit (job 720410c3-f1a0-4b25-bf26-cf40aa6b1f97) (vm:6205) 6. Pivot was successful 2020-07-13 11:19:06,734+0200 INFO (libvirt/events) [virt.vm] (vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Block job ACTIVE_COMMIT for drive /rhev/data-center/mnt/blockSD/33777993-a3a5-4aad-a24c-dfe5e473faca/images/d7bd480d-2c51-4141-a386-113abf75219e/6197b30d-0732-4cc7-aef0-12f9f6e9565b has completed (vm:5838) 7. Vdsm wait until libvirt updates the xml: 2020-07-13 11:19:06,756+0200 INFO (merge/720410c3) [virt.vm] (vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Pivot completed (job 720410c3-f1a0-4b25-bf26-cf40aa6b1f97) (vm:6219) 8. 
Syncronizing vdsm metadata 2020-07-13 11:19:06,776+0200 INFO (merge/720410c3) [vdsm.api] START imageSyncVolumeChain(sdUUID='33777993-a3a5-4aad-a24c-dfe5e473faca', imgUUID='d7bd480d-2c51-4141-a386-113abf75219e', volUUID='6197b30d-0732-4cc7-aef0-12f9f6e9565b', newChain=['8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8']) from=internal, task_id=b8f605bd-8549-4983-8fc5-f2ebbe6c4666 (api:48) We can see the new chain: ['8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8'] 2020-07-13 11:19:07,005+0200 INFO (merge/720410c3) [storage.Image] Current chain=8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 < 6197b30d-0732-4cc7-aef0-12f9f6e9565b (top) (image:1221) The old chain: 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 < 6197b30d-0732-4cc7-aef0-12f9f6e9565b (top) 2020-07-13 11:19:07,006+0200 INFO (merge/720410c3) [storage.Image] Unlinking subchain: ['6197b30d-0732-4cc7-aef0-12f9f6e9565b'] (image:1231) 2020-07-13 11:19:07,017+0200 INFO (merge/720410c3) [storage.Image] Leaf volume 6197b30d-0732-4cc7-aef0-12f9f6e9565b is being removed from the chain. Marking it ILLEGAL to prevent data corruption (image:1239) This matches what we see on storage. 9. Merge job is untracked 2020-07-13 11:19:21,134+0200 INFO (periodic/1) [virt.vm] (vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Cleanup thread <vdsm.virt.vm.LiveMergeCleanupThread object at 0x7fa1e0370350> successfully completed, untracking job 720410c3-f1a0-4b25-bf26-cf40aa6b1f97 (base=8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8, top=6197b30d-0732-4cc7-aef0-12f9f6e9565b) (vm:5752) This was a successful merge on vdsm side. We don't see any more requests for the top volume in this log. The next step to complete the merge it to delete the volume 6197b30d-0732-4cc7-aef0-12f9f6e9565b but this can be done only on the SPM. To understand why this did not happen, we need engine log showing this interaction, and logs from the SPM host from the same time. Please file a bug about this and attach these logs (and the vdsm log you sent here). Fixing this vm is important but preventing this bug for other users is even more important. How to fix the volume metadata: 1. Edit 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8.meta Change: VOLTYPE=INTERNAL To: VOLTYPE=LEAF See attached file for reference. 2. Truncate the file to 512 bytes truncate -s 512 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8.meta 3. Verify the file size $ ls -lh 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8.meta -rw-r--r--. 1 nsoffer nsoffer 512 Jul 17 18:17 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8.meta 4. Get the slot number for this volume from the LV using MD_N and compute the offset (copied from your pdf) lvs -o vg_name,lv_name,tags | grep d7bd480d-2c51-4141-a386-113abf75219e 33777993-a3a5-4aad-a24c-dfe5e473faca 6197b30d-0732-4cc7-aef0-12f9f6e9565b IU_d7bd480d-2c51-4141-a386-113abf75219e,MD_58,PU_8e412b5a- 85ec-4c53-a5b8-dfb4d6d987b8 33777993-a3a5-4aad-a24c-dfe5e473faca 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 IU_d7bd480d-2c51-4141-a386-113abf75219e,MD_28,PU_00000000- 0000-0000-0000-000000000000 5. Get the metadata from the slot to verify that we change the right metadata dd if=/dev/33777993-a3a5-4aad-a24c-dfe5e473faca/metadata bs=512 count=1 skip=1277952 iflag=skip_bytes > /tmp/8e412b5a-85ec-4c53-a5b8- dfb4d6d987b8.meta.bad Compare 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8.meta.bad with the fixed file, the only change should be the VOLTYPE=LEAF line, and the amount of padding. 6. Write new metadata to storage dd of=/dev/33777993-a3a5-4aad-a24c-dfe5e473faca/metadata bs=512 count=1 seek=1277952 oflag=direct,seek_bytes conv=fsync < /tmp/8e412b5a-85ec-4c53-a5b8- dfb4d6d987b8.meta.fixed 7. 
Delete the lv 6197b30d-0732-4cc7-aef0-12f9f6e9565b on the SPM host WARNING: this must be done on the SPM host, otherwise you may corrupt the VG metadata. If you selected the wipe-after-delete option for this disk, you want to wipe it before deleting. If you selected the discard-after-delete you want to discard the lv before deleting it. Activate the lv on the SPM host: lvchange -ay 33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b If needed, wipe it: blkdiscard --zeroout --step 32m /dev/33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b If needed, discard it: blkdiscard /dev/33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b Deactivate the lv: lvchange -an 33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b Remove the lv: lvremove 33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b 9. Fixing engine db Benny, Eyal: can you add instructions how to remove the volume on the engine side? After the volume is removed from engine side, starting the vm will succeed.

It can be done by deleting from the images table:
$ psql -U engine -d engine -c "DELETE FROM images WHERE image_guid = '6197b30d-0732-4cc7-aef0-12f9f6e9565b'";
of course the database should be backed up before doing this
On Fri, Jul 17, 2020 at 6:45 PM Nir Soffer <nsoffer@redhat.com> wrote:
On Thu, Jul 16, 2020 at 11:33 AM Arsène Gschwind <arsene.gschwind@unibas.ch> wrote:
It looks like the Pivot completed successfully, see attached vdsm.log. Is there a way to recover that VM? Or would it be better to recover the VM from Backup?
This is what we see in the log:
1. Merge request received
2020-07-13 11:18:30,282+0200 INFO (jsonrpc/7) [api.virt] START merge(drive={u'imageID': u'd7bd480d-2c51-4141-a386-113abf75219e', u'volumeID': u'6197b30d-0732-4cc7-aef0-12f9f6e9565b', u'domainID': u'33777993-a3a5-4aad-a24c-dfe5e473faca', u'poolID': u'00000002-0002-0002-0002-000000000289'}, baseVolUUID=u'8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8', topVolUUID=u'6197b30d-0732-4cc7-aef0-12f9f6e9565b', bandwidth=u'0', jobUUID=u'720410c3-f1a0-4b25-bf26-cf40aa6b1f97') from=::ffff:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, vmId=b5534254-660f-44b1-bc83-d616c98ba0ba (api:48)
To track this job, we can use the jobUUID: 720410c3-f1a0-4b25-bf26-cf40aa6b1f97 and the top volume UUID: 6197b30d-0732-4cc7-aef0-12f9f6e9565b
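For example, grepping for either UUID (assuming the default vdsm log location) pulls out everything related to this job:

grep -E '720410c3-f1a0-4b25-bf26-cf40aa6b1f97|6197b30d-0732-4cc7-aef0-12f9f6e9565b' /var/log/vdsm/vdsm.log*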
2. Starting the merge
2020-07-13 11:18:30,690+0200 INFO (jsonrpc/7) [virt.vm] (vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Starting merge with jobUUID=u'720410c3-f1a0-4b25-bf26-cf40aa6b1f97', original chain=8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 < 6197b30d-0732-4cc7-aef0-12f9f6e9565b (top), disk='sda', base='sda[1]', top=None, bandwidth=0, flags=12 (vm:5945)
We see the original chain: 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 < 6197b30d-0732-4cc7-aef0-12f9f6e9565b (top)
3. The merge was completed, ready for pivot
2020-07-13 11:19:00,992+0200 INFO (libvirt/events) [virt.vm] (vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Block job ACTIVE_COMMIT for drive /rhev/data-center/mnt/blockSD/33777993-a3a5-4aad-a24c-dfe5e473faca/images/d7bd480d-2c51-4141-a386-113abf75219e/6197b30d-0732-4cc7-aef0-12f9f6e9565b is ready (vm:5847)
At this point parent volume contains all the data in top volume and we can pivot to the parent volume.
4. Vdsm detects that the merge is ready, and starts the cleanup thread that will complete the merge
2020-07-13 11:19:06,166+0200 INFO (periodic/1) [virt.vm] (vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Starting cleanup thread for job: 720410c3-f1a0-4b25-bf26-cf40aa6b1f97 (vm:5809)
5. Requesting pivot to parent volume:
2020-07-13 11:19:06,717+0200 INFO (merge/720410c3) [virt.vm] (vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Requesting pivot to complete active layer commit (job 720410c3-f1a0-4b25-bf26-cf40aa6b1f97) (vm:6205)
6. Pivot was successful
2020-07-13 11:19:06,734+0200 INFO (libvirt/events) [virt.vm] (vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Block job ACTIVE_COMMIT for drive /rhev/data-center/mnt/blockSD/33777993-a3a5-4aad-a24c-dfe5e473faca/images/d7bd480d-2c51-4141-a386-113abf75219e/6197b30d-0732-4cc7-aef0-12f9f6e9565b has completed (vm:5838)
7. Vdsm waits until libvirt updates the xml:
2020-07-13 11:19:06,756+0200 INFO (merge/720410c3) [virt.vm] (vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Pivot completed (job 720410c3-f1a0-4b25-bf26-cf40aa6b1f97) (vm:6219)
8. Synchronizing vdsm metadata
2020-07-13 11:19:06,776+0200 INFO (merge/720410c3) [vdsm.api] START imageSyncVolumeChain(sdUUID='33777993-a3a5-4aad-a24c-dfe5e473faca', imgUUID='d7bd480d-2c51-4141-a386-113abf75219e', volUUID='6197b30d-0732-4cc7-aef0-12f9f6e9565b', newChain=['8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8']) from=internal, task_id=b8f605bd-8549-4983-8fc5-f2ebbe6c4666 (api:48)
We can see the new chain: ['8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8']
2020-07-13 11:19:07,005+0200 INFO (merge/720410c3) [storage.Image] Current chain=8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 < 6197b30d-0732-4cc7-aef0-12f9f6e9565b (top) (image:1221)
The old chain: 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 < 6197b30d-0732-4cc7-aef0-12f9f6e9565b (top)
2020-07-13 11:19:07,006+0200 INFO (merge/720410c3) [storage.Image] Unlinking subchain: ['6197b30d-0732-4cc7-aef0-12f9f6e9565b'] (image:1231)
2020-07-13 11:19:07,017+0200 INFO (merge/720410c3) [storage.Image] Leaf volume 6197b30d-0732-4cc7-aef0-12f9f6e9565b is being removed from the chain. Marking it ILLEGAL to prevent data corruption (image:1239)
This matches what we see on storage.
9. Merge job is untracked
2020-07-13 11:19:21,134+0200 INFO (periodic/1) [virt.vm] (vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Cleanup thread <vdsm.virt.vm.LiveMergeCleanupThread object at 0x7fa1e0370350> successfully completed, untracking job 720410c3-f1a0-4b25-bf26-cf40aa6b1f97 (base=8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8, top=6197b30d-0732-4cc7-aef0-12f9f6e9565b) (vm:5752)
This was a successful merge on vdsm side.
We don't see any more requests for the top volume in this log. The next step to complete the merge is to delete the volume 6197b30d-0732-4cc7-aef0-12f9f6e9565b, but this can be done only on the SPM.
To understand why this did not happen, we need engine log showing this interaction, and logs from the SPM host from the same time.
Please file a bug about this and attach these logs (and the vdsm log you sent here). Fixing this vm is important but preventing this bug for other users is even more important.
How to fix the volume metadata:
1. Edit 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8.meta
Change: VOLTYPE=INTERNAL
To: VOLTYPE=LEAF
See attached file for reference.
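If you prefer to script the edit, something like this works on the local copy of the metadata (a sketch; it edits only the extracted file, not the metadata LV itself):

sed -i 's/^VOLTYPE=INTERNAL$/VOLTYPE=LEAF/' 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8.meta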
2. Truncate the file to 512 bytes
truncate -s 512 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8.meta
3. Verify the file size
$ ls -lh 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8.meta
-rw-r--r--. 1 nsoffer nsoffer 512 Jul 17 18:17 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8.meta
4. Get the slot number for this volume from the LV using MD_N and compute the offset
(copied from your pdf)
lvs -o vg_name,lv_name,tags | grep d7bd480d-2c51-4141-a386-113abf75219e
33777993-a3a5-4aad-a24c-dfe5e473faca 6197b30d-0732-4cc7-aef0-12f9f6e9565b IU_d7bd480d-2c51-4141-a386-113abf75219e,MD_58,PU_8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8
33777993-a3a5-4aad-a24c-dfe5e473faca 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 IU_d7bd480d-2c51-4141-a386-113abf75219e,MD_28,PU_00000000-0000-0000-0000-000000000000
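The 1277952 offset used in the next steps follows from the MD_28 tag: assuming the V5 block domain layout of a 1 MiB base plus 8 KiB per metadata slot, the arithmetic is:

SLOT=28
OFFSET=$((1048576 + SLOT * 8192))
echo $OFFSET   # prints 1277952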
5. Get the metadata from the slot to verify that we change the right metadata
dd if=/dev/33777993-a3a5-4aad-a24c-dfe5e473faca/metadata bs=512 count=1 skip=1277952 iflag=skip_bytes > /tmp/8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8.meta.bad
Compare 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8.meta.bad with the fixed file, the only change should be the VOLTYPE=LEAF line, and the amount of padding.
6. Write new metadata to storage
dd of=/dev/33777993-a3a5-4aad-a24c-dfe5e473faca/metadata bs=512 count=1 seek=1277952 oflag=direct,seek_bytes conv=fsync < /tmp/8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8.meta.fixed
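Optionally, re-read the slot with the same dd as in step 5 and compare it with the fixed file, to confirm the write landed in the right place (no output means they match):

dd if=/dev/33777993-a3a5-4aad-a24c-dfe5e473faca/metadata bs=512 count=1 skip=1277952 iflag=skip_bytes | cmp - /tmp/8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8.meta.fixed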
7. Delete the lv 6197b30d-0732-4cc7-aef0-12f9f6e9565b on the SPM host
WARNING: this must be done on the SPM host, otherwise you may corrupt the VG metadata.
If you selected the wipe-after-delete option for this disk, you want to wipe it before deleting. If you selected the discard-after-delete you want to discard the lv before deleting it.
Activate the lv on the SPM host:
lvchange -ay 33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b
If needed, wipe it:
blkdiscard --zeroout --step 32m /dev/33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b
If needed, discard it:
blkdiscard /dev/33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b
Deactivate the lv:
lvchange -an 33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b
Remove the lv:
lvremove 33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b
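A quick sanity check that the LV is really gone (this should print nothing):

lvs -o lv_name 33777993-a3a5-4aad-a24c-dfe5e473faca | grep 6197b30d-0732-4cc7-aef0-12f9f6e9565b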
8. Fixing engine db
Benny, Eyal: can you add instructions how to remove the volume on the engine side?
After the volume is removed from engine side, starting the vm will succeed.
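Once the engine side is cleaned up, a query along these lines (a sketch, using the image group id from this thread) should list only the parent volume for this disk:

$ psql -U engine -d engine -c "SELECT image_guid, parentid, active FROM images WHERE image_group_id = 'd7bd480d-2c51-4141-a386-113abf75219e';"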

Sorry, I only replied to the question. In addition to removing the image from the images table, you may also need to set the parent as the active image and remove the snapshot referenced by this image from the database.
Can you provide the output of:
$ psql -U engine -d engine -c "select * from images where image_group_id = <disk_id>";
As well as:
$ psql -U engine -d engine -c "SELECT s.* FROM snapshots s, images i where i.vm_snapshot_id = s.snapshot_id and i.image_guid = '6197b30d-0732-4cc7-aef0-12f9f6e9565b';"
On Sun, Jul 19, 2020 at 12:49 PM Benny Zlotnik <bzlotnik@redhat.com> wrote:
It can be done by deleting from the images table: $ psql -U engine -d engine -c "DELETE FROM images WHERE image_guid = '6197b30d-0732-4cc7-aef0-12f9f6e9565b'";
of course the database should be backed up before doing this
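Before touching the database, take a backup along these lines (a sketch, assuming the standard engine-backup tool on the engine host; the file names are arbitrary):

engine-backup --mode=backup --scope=all --file=engine-db-backup.tar.bz2 --log=engine-db-backup.log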

Hi, Please find the output: select * from images where image_group_id = 'd7bd480d-2c51-4141-a386-113abf75219e'; image_guid | creation_date | size | it_guid | parentid | imagestatus | lastmodified | vm_snapshot_id | volume_type | volume_for mat | image_group_id | _create_date | _update_date | active | volume_classification | qcow_compat --------------------------------------+------------------------+--------------+--------------------------------------+--------------------------------------+-------------+----------------------------+--------------------------------------+-------------+----------- ----+--------------------------------------+-------------------------------+-------------------------------+--------+-----------------------+------------- 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 | 2020-04-23 14:59:23+02 | 161061273600 | 00000000-0000-0000-0000-000000000000 | 00000000-0000-0000-0000-000000000000 | 1 | 2020-07-06 20:38:36.093+02 | 6bc03db7-82a3-4b7e-9674-0bdd76933eb8 | 2 | 4 | d7bd480d-2c51-4141-a386-113abf75219e | 2020-04-23 14:59:20.919344+02 | 2020-07-06 20:38:36.093788+02 | f | 1 | 2 6197b30d-0732-4cc7-aef0-12f9f6e9565b | 2020-07-06 20:38:38+02 | 161061273600 | 00000000-0000-0000-0000-000000000000 | 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 | 1 | 1970-01-01 01:00:00+01 | fd5193ac-dfbc-4ed2-b86c-21caa8009bb2 | 2 | 4 | d7bd480d-2c51-4141-a386-113abf75219e | 2020-07-06 20:38:36.093788+02 | 2020-07-06 20:38:52.139003+02 | t | 0 | 2 (2 rows) SELECT s.* FROM snapshots s, images i where i.vm_snapshot_id = s.snapshot_id and i.image_guid = '6197b30d-0732-4cc7-aef0-12f9f6e9565b'; snapshot_id | vm_id | snapshot_type | status | description | creation_date | app_list | vm_configuration | _create_date | _update_date | memory_metadata_disk_id | memory_dump_disk_id | vm_configuration_broken --------------------------------------+--------------------------------------+---------------+--------+-------------+----------------------------+---------------------------------------------------------------------------------------------------------------------- ---------------------------------+------------------+-------------------------------+-------------------------------+-------------------------+---------------------+------------------------- fd5193ac-dfbc-4ed2-b86c-21caa8009bb2 | b5534254-660f-44b1-bc83-d616c98ba0ba | ACTIVE | OK | Active VM | 2020-04-23 14:59:20.171+02 | kernel-3.10.0-957.12.2.el7,xorg-x11-drv-qxl-0.1.5-4.el7.1,kernel-3.10.0-957.12.1.el7,kernel-3.10.0-957.38.1.el7,ovirt -guest-agent-common-1.0.14-1.el7 | | 2020-04-23 14:59:20.154023+02 | 2020-07-03 17:33:17.483215+02 | | | f (1 row) Thanks, Arsene On Sun, 2020-07-19 at 16:34 +0300, Benny Zlotnik wrote: Sorry, I only replied to the question, in addition to removing the image from the images table, you may also need to set the parent as the active image and remove the snapshot referenced by this image from the database. Can you provide the output of: $ psql -U engine -d engine -c "select * from images where image_group_id = <disk_id>"; As well as $ psql -U engine -d engine -c "SELECT s.* FROM snapshots s, images i where i.vm_snapshot_id = s.snapshot_id and i.image_guid = '6197b30d-0732-4cc7-aef0-12f9f6e9565b';" On Sun, Jul 19, 2020 at 12:49 PM Benny Zlotnik < <mailto:bzlotnik@redhat.com> bzlotnik@redhat.com
wrote:
It can be done by deleting from the images table: $ psql -U engine -d engine -c "DELETE FROM images WHERE image_guid = '6197b30d-0732-4cc7-aef0-12f9f6e9565b'"; of course the database should be backed up before doing this On Fri, Jul 17, 2020 at 6:45 PM Nir Soffer < <mailto:nsoffer@redhat.com> nsoffer@redhat.com
wrote:
On Thu, Jul 16, 2020 at 11:33 AM Arsène Gschwind < <mailto:arsene.gschwind@unibas.ch> arsene.gschwind@unibas.ch
wrote:

I forgot to add the `\x on` to make the output readable; can you run it with:

$ psql -U engine -d engine -c "\x on" -c "<rest of the query...>"

On Mon, Jul 20, 2020 at 2:50 PM Arsène Gschwind <arsene.gschwind@unibas.ch> wrote:
Hi,
Please find the output:
select * from images where image_group_id = 'd7bd480d-2c51-4141-a386-113abf75219e';
image_guid | creation_date | size | it_guid | parentid | imagestatus | lastmodified | vm_snapshot_id | volume_type | volume_for
mat | image_group_id | _create_date | _update_date | active | volume_classification | qcow_compat
--------------------------------------+------------------------+--------------+--------------------------------------+--------------------------------------+-------------+----------------------------+--------------------------------------+-------------+-----------
----+--------------------------------------+-------------------------------+-------------------------------+--------+-----------------------+-------------
8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 | 2020-04-23 14:59:23+02 | 161061273600 | 00000000-0000-0000-0000-000000000000 | 00000000-0000-0000-0000-000000000000 | 1 | 2020-07-06 20:38:36.093+02 | 6bc03db7-82a3-4b7e-9674-0bdd76933eb8 | 2 |
4 | d7bd480d-2c51-4141-a386-113abf75219e | 2020-04-23 14:59:20.919344+02 | 2020-07-06 20:38:36.093788+02 | f | 1 | 2
6197b30d-0732-4cc7-aef0-12f9f6e9565b | 2020-07-06 20:38:38+02 | 161061273600 | 00000000-0000-0000-0000-000000000000 | 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 | 1 | 1970-01-01 01:00:00+01 | fd5193ac-dfbc-4ed2-b86c-21caa8009bb2 | 2 |
4 | d7bd480d-2c51-4141-a386-113abf75219e | 2020-07-06 20:38:36.093788+02 | 2020-07-06 20:38:52.139003+02 | t | 0 | 2
(2 rows)
SELECT s.* FROM snapshots s, images i where i.vm_snapshot_id = s.snapshot_id and i.image_guid = '6197b30d-0732-4cc7-aef0-12f9f6e9565b';
snapshot_id | vm_id | snapshot_type | status | description | creation_date | app_list
| vm_configuration | _create_date | _update_date | memory_metadata_disk_id | memory_dump_disk_id | vm_configuration_broken
--------------------------------------+--------------------------------------+---------------+--------+-------------+----------------------------+----------------------------------------------------------------------------------------------------------------------
---------------------------------+------------------+-------------------------------+-------------------------------+-------------------------+---------------------+-------------------------
fd5193ac-dfbc-4ed2-b86c-21caa8009bb2 | b5534254-660f-44b1-bc83-d616c98ba0ba | ACTIVE | OK | Active VM | 2020-04-23 14:59:20.171+02 | kernel-3.10.0-957.12.2.el7,xorg-x11-drv-qxl-0.1.5-4.el7.1,kernel-3.10.0-957.12.1.el7,kernel-3.10.0-957.38.1.el7,ovirt
-guest-agent-common-1.0.14-1.el7 | | 2020-04-23 14:59:20.154023+02 | 2020-07-03 17:33:17.483215+02 | | | f
(1 row)
Thanks, Arsene
On Sun, 2020-07-19 at 16:34 +0300, Benny Zlotnik wrote:
Sorry, I only replied to the question. In addition to removing the
image from the images table, you may also need to set the parent as
the active image and remove the snapshot referenced by this image from
the database. Can you provide the output of:
$ psql -U engine -d engine -c "select * from images where
image_group_id = <disk_id>";
As well as
$ psql -U engine -d engine -c "SELECT s.* FROM snapshots s, images i
where i.vm_snapshot_id = s.snapshot_id and i.image_guid =
'6197b30d-0732-4cc7-aef0-12f9f6e9565b';"
On Sun, Jul 19, 2020 at 12:49 PM Benny Zlotnik <bzlotnik@redhat.com> wrote:
It can be done by deleting from the images table:
$ psql -U engine -d engine -c "DELETE FROM images WHERE image_guid =
'6197b30d-0732-4cc7-aef0-12f9f6e9565b'";
of course the database should be backed up before doing this
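For the backup itself, the standard engine-backup tool is usually the easiest way to get a restorable copy before touching anything; the file and log names below are only examples:

engine-backup --mode=backup --scope=all --file=/root/engine-backup-before-fix.tar.bz2 --log=/root/engine-backup-before-fix.log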
On Fri, Jul 17, 2020 at 6:45 PM Nir Soffer <nsoffer@redhat.com> wrote:
On Thu, Jul 16, 2020 at 11:33 AM Arsène Gschwind <arsene.gschwind@unibas.ch> wrote:
It looks like the Pivot completed successfully, see attached vdsm.log.
Is there a way to recover that VM?
Or would it be better to recover the VM from Backup?
This is what we see in the log:
1. Merge request received
2020-07-13 11:18:30,282+0200 INFO (jsonrpc/7) [api.virt] START
merge(drive={u'imageID': u'd7bd480d-2c51-4141-a386-113abf75219e',
u'volumeID': u'6197b30d-0732-4cc7-aef0-12f9f6e9565b', u'domainID':
u'33777993-a3a5-4aad-a24c-dfe5e473faca', u'poolID':
u'00000002-0002-0002-0002-000000000289'},
baseVolUUID=u'8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8',
topVolUUID=u'6197b30d-0732-4cc7-aef0-12f9f6e9565b', bandwidth=u'0',
jobUUID=u'720410c3-f1a0-4b25-bf26-cf40aa6b1f97')
from=::ffff:10.34.38.31,39226,
flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227,
vmId=b5534254-660f-44b1-bc83-d616c98ba0ba (api:48)
To track this job, we can use the jobUUID: 720410c3-f1a0-4b25-bf26-cf40aa6b1f97
and the top volume UUID: 6197b30d-0732-4cc7-aef0-12f9f6e9565b
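If you want to follow the same merge in your own copy of the log, filtering on those two IDs is usually enough (the path below is the default vdsm log location):

grep -E '720410c3-f1a0-4b25-bf26-cf40aa6b1f97|6197b30d-0732-4cc7-aef0-12f9f6e9565b' /var/log/vdsm/vdsm.log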
2. Starting the merge
2020-07-13 11:18:30,690+0200 INFO (jsonrpc/7) [virt.vm]
(vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Starting merge with
jobUUID=u'720410c3-f1a0-4b25-bf26-cf40aa6b1f97', original
chain=8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 <
6197b30d-0732-4cc7-aef0-12f9f6e9565b (top), disk='sda', base='sda[1]',
top=None, bandwidth=0, flags=12 (vm:5945)
We see the original chain:
8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 <
6197b30d-0732-4cc7-aef0-12f9f6e9565b (top)
3. The merge was completed, ready for pivot
2020-07-13 11:19:00,992+0200 INFO (libvirt/events) [virt.vm]
(vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Block job ACTIVE_COMMIT
for drive /rhev/data-center/mnt/blockSD/33777993-a3a5-4aad-a24c-dfe5e473faca/images/d7bd480d-2c51-4141-a386-113abf75219e/6197b30d-0732-4cc7-aef0-12f9f6e9565b
is ready (vm:5847)
At this point the parent volume contains all the data in the top volume and we can pivot
to the parent volume.
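As a cross-check, the chain can also be inspected directly with qemu-img while the VM is running and the volume is still active on the host (same path as in the log line above); this is only a sanity check, not part of the recovery:

qemu-img info --backing-chain /rhev/data-center/mnt/blockSD/33777993-a3a5-4aad-a24c-dfe5e473faca/images/d7bd480d-2c51-4141-a386-113abf75219e/6197b30d-0732-4cc7-aef0-12f9f6e9565b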
4. Vdsm detects that the merge is ready, and starts the cleanup thread
that will complete the merge
2020-07-13 11:19:06,166+0200 INFO (periodic/1) [virt.vm]
(vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Starting cleanup thread
for job: 720410c3-f1a0-4b25-bf26-cf40aa6b1f97 (vm:5809)
5. Requesting pivot to parent volume:
2020-07-13 11:19:06,717+0200 INFO (merge/720410c3) [virt.vm]
(vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Requesting pivot to
complete active layer commit (job
720410c3-f1a0-4b25-bf26-cf40aa6b1f97) (vm:6205)
6. Pivot was successful
2020-07-13 11:19:06,734+0200 INFO (libvirt/events) [virt.vm]
(vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Block job ACTIVE_COMMIT
for drive /rhev/data-center/mnt/blockSD/33777993-a3a5-4aad-a24c-dfe5e473faca/images/d7bd480d-2c51-4141-a386-113abf75219e/6197b30d-0732-4cc7-aef0-12f9f6e9565b
has completed (vm:5838)
7. Vdsm waits until libvirt updates the XML:
2020-07-13 11:19:06,756+0200 INFO (merge/720410c3) [virt.vm]
(vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Pivot completed (job
720410c3-f1a0-4b25-bf26-cf40aa6b1f97) (vm:6219)
8. Synchronizing vdsm metadata
2020-07-13 11:19:06,776+0200 INFO (merge/720410c3) [vdsm.api] START
imageSyncVolumeChain(sdUUID='33777993-a3a5-4aad-a24c-dfe5e473faca',
imgUUID='d7bd480d-2c51-4141-a386-113abf75219e',
volUUID='6197b30d-0732-4cc7-aef0-12f9f6e9565b',
newChain=['8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8']) from=internal,
task_id=b8f605bd-8549-4983-8fc5-f2ebbe6c4666 (api:48)
We can see the new chain:
['8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8']
2020-07-13 11:19:07,005+0200 INFO (merge/720410c3) [storage.Image]
Current chain=8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 <
6197b30d-0732-4cc7-aef0-12f9f6e9565b (top) (image:1221)
The old chain:
8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 <
6197b30d-0732-4cc7-aef0-12f9f6e9565b (top)
2020-07-13 11:19:07,006+0200 INFO (merge/720410c3) [storage.Image]
Unlinking subchain: ['6197b30d-0732-4cc7-aef0-12f9f6e9565b']
(image:1231)
2020-07-13 11:19:07,017+0200 INFO (merge/720410c3) [storage.Image]
Leaf volume 6197b30d-0732-4cc7-aef0-12f9f6e9565b is being removed from
the chain. Marking it ILLEGAL to prevent data corruption (image:1239)
This matches what we see on storage.
9. Merge job is untracked
2020-07-13 11:19:21,134+0200 INFO (periodic/1) [virt.vm]
(vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Cleanup thread
<vdsm.virt.vm.LiveMergeCleanupThread object at 0x7fa1e0370350>
successfully completed, untracking job
720410c3-f1a0-4b25-bf26-cf40aa6b1f97
(base=8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8,
top=6197b30d-0732-4cc7-aef0-12f9f6e9565b) (vm:5752)
This was a successful merge on vdsm side.
We don't see any more requests for the top volume in this log. The next step to
complete the merge is to delete the volume 6197b30d-0732-4cc7-aef0-12f9f6e9565b,
but this can be done only on the SPM.
To understand why this did not happen, we need the engine log showing this
interaction, and logs from the SPM host from the same time.
Please file a bug about this and attach these logs (and the vdsm log
you sent here).
Fixing this vm is important but preventing this bug for other users is even more
important.
How to fix the volume metadata:
1. Edit 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8.meta
Change:
VOLTYPE=INTERNAL
To:
VOLTYPE=LEAF
See attached file for reference.
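If you prefer a non-interactive edit of that local copy of the metadata file, something like this should be equivalent:

sed -i 's/^VOLTYPE=INTERNAL$/VOLTYPE=LEAF/' 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8.meta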
2. Truncate the file to 512 bytes
truncate -s 512 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8.meta
3. Verify the file size
$ ls -lh 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8.meta
-rw-r--r--. 1 nsoffer nsoffer 512 Jul 17 18:17
8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8.meta
4. Get the slot number for this volume from the LV tags (MD_N) and
compute the offset; the arithmetic is sketched after the lvs output below
(copied from your pdf)
lvs -o vg_name,lv_name,tags | grep d7bd480d-2c51-4141-a386-113abf75219e
33777993-a3a5-4aad-a24c-dfe5e473faca 6197b30d-0732-4cc7-aef0-12f9f6e9565b IU_d7bd480d-2c51-4141-a386-113abf75219e,MD_58,PU_8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8
33777993-a3a5-4aad-a24c-dfe5e473faca 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 IU_d7bd480d-2c51-4141-a386-113abf75219e,MD_28,PU_00000000-0000-0000-0000-000000000000
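The offset arithmetic itself, assuming the V5 block-domain layout (8 KiB metadata slots starting after a 1 MiB header), is offset = 1 MiB + slot * 8 KiB; for the parent volume's MD_28 this reproduces the 1277952 used in the dd commands below:

echo $((1024*1024 + 28*8192))    # prints 1277952, the byte offset for slot MD_28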
5. Get the metadata from the slot to verify that we are changing the right metadata
dd if=/dev/33777993-a3a5-4aad-a24c-dfe5e473faca/metadata bs=512 count=1 skip=1277952 iflag=skip_bytes > /tmp/8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8.meta.bad
Compare 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8.meta.bad with the fixed
file, the only change
should be the VOLTYPE=LEAF line, and the amount of padding.
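One way to compare just the text while ignoring whatever padding fills the rest of the slot (assuming NUL padding, which tr silently drops) is:

tr -d '\0' < /tmp/8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8.meta.bad | diff - 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8.meta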
6. Write new metadata to storage
dd of=/dev/33777993-a3a5-4aad-a24c-dfe5e473faca/metadata bs=512 count=1 seek=1277952 oflag=direct,seek_bytes conv=fsync < /tmp/8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8.meta.fixed
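After the write, reading the slot back and comparing it with the fixed file is a cheap sanity check (same offset and size as above):

dd if=/dev/33777993-a3a5-4aad-a24c-dfe5e473faca/metadata bs=512 count=1 skip=1277952 iflag=skip_bytes | cmp - /tmp/8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8.meta.fixed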
7. Delete the lv 6197b30d-0732-4cc7-aef0-12f9f6e9565b on the SPM host
WARNING: this must be done on the SPM host, otherwise you may corrupt
the VG metadata.
If you selected the wipe-after-delete option for this disk, you want to wipe it
before deleting. If you selected the discard-after-delete option, you want to
discard the lv before deleting it.
Activate the lv on the SPM host:
lvchange -ay 33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b
If needed, wipe it:
blkdiscard --zeroout --step 32m /dev/33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b
If needed, discard it:
blkdiscard /dev/33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b
Deactivate the lv:
lvchange -an 33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b
Remove the lv:
lvremove 33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b
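To confirm the removal, listing the LVs for this image should now show only the parent volume:

lvs -o lv_name,tags 33777993-a3a5-4aad-a24c-dfe5e473faca | grep d7bd480d-2c51-4141-a386-113abf75219e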
8. Fixing engine db
Benny, Eyal: can you add instructions on how to remove the volume on the
engine side?
After the volume is removed from the engine side, starting the VM will succeed.
--
*Arsène Gschwind* Fa. Sapify AG im Auftrag der universitaet Basel IT Services Klinelbergstr. 70 | CH-4056 Basel | Switzerland Tel: +41 79 449 25 63 | http://its.unibas.ch ITS-ServiceDesk: support-its@unibas.ch | +41 61 267 14 11


I think you can remove 6197b30d-0732-4cc7-aef0-12f9f6e9565b from images and the corresponding snapshot, set the parent, 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8, as active (active = 't' field), and change its snapshot to be the active snapshot. That is, if I correctly understand the current layout: 6197b30d-0732-4cc7-aef0-12f9f6e9565b was removed from the storage and 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 is now the only volume for the disk.

On Wed, Jul 22, 2020 at 1:32 PM Arsène Gschwind <arsene.gschwind@unibas.ch> wrote:
Please find the result:
psql -d engine -c "\x on" -c "select * from images where image_group_id = 'd7bd480d-2c51-4141-a386-113abf75219e';"
Expanded display is on.
-[ RECORD 1 ]---------+-------------------------------------
image_guid | 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8
creation_date | 2020-04-23 14:59:23+02
size | 161061273600
it_guid | 00000000-0000-0000-0000-000000000000
parentid | 00000000-0000-0000-0000-000000000000
imagestatus | 1
lastmodified | 2020-07-06 20:38:36.093+02
vm_snapshot_id | 6bc03db7-82a3-4b7e-9674-0bdd76933eb8
volume_type | 2
volume_format | 4
image_group_id | d7bd480d-2c51-4141-a386-113abf75219e
_create_date | 2020-04-23 14:59:20.919344+02
_update_date | 2020-07-06 20:38:36.093788+02
active | f
volume_classification | 1
qcow_compat | 2
-[ RECORD 2 ]---------+-------------------------------------
image_guid | 6197b30d-0732-4cc7-aef0-12f9f6e9565b
creation_date | 2020-07-06 20:38:38+02
size | 161061273600
it_guid | 00000000-0000-0000-0000-000000000000
parentid | 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8
imagestatus | 1
lastmodified | 1970-01-01 01:00:00+01
vm_snapshot_id | fd5193ac-dfbc-4ed2-b86c-21caa8009bb2
volume_type | 2
volume_format | 4
image_group_id | d7bd480d-2c51-4141-a386-113abf75219e
_create_date | 2020-07-06 20:38:36.093788+02
_update_date | 2020-07-06 20:38:52.139003+02
active | t
volume_classification | 0
qcow_compat | 2
psql -d engine -c "\x on" -c "SELECT s.* FROM snapshots s, images i where i.vm_snapshot_id = s.snapshot_id and i.image_guid = '6197b30d-0732-4cc7-aef0-12f9f6e9565b';"
Expanded display is on.
-[ RECORD 1 ]-----------+------------------------------------------------------------------------------------------------------------------------------------------------------
snapshot_id | fd5193ac-dfbc-4ed2-b86c-21caa8009bb2
vm_id | b5534254-660f-44b1-bc83-d616c98ba0ba
snapshot_type | ACTIVE
status | OK
description | Active VM
creation_date | 2020-04-23 14:59:20.171+02
app_list | kernel-3.10.0-957.12.2.el7,xorg-x11-drv-qxl-0.1.5-4.el7.1,kernel-3.10.0-957.12.1.el7,kernel-3.10.0-957.38.1.el7,ovirt-guest-agent-common-1.0.14-1.el7
vm_configuration |
_create_date | 2020-04-23 14:59:20.154023+02
_update_date | 2020-07-03 17:33:17.483215+02
memory_metadata_disk_id |
memory_dump_disk_id |
vm_configuration_broken | f
Thanks.
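For the record, one possible reading of Benny's suggestion above, written in the same psql style he used earlier and using only the ids and columns visible in this output, would be the three statements below. This is an untested sketch, not confirmed by Benny: in particular, deleting the 6bc03db7-... row assumes that the snapshot still referenced by the parent image is the leftover snapshot and still exists in the snapshots table, and volume_classification = 0 simply mirrors the value shown for the active row above. Back up the database first.

$ psql -U engine -d engine -c "DELETE FROM images WHERE image_guid = '6197b30d-0732-4cc7-aef0-12f9f6e9565b';"
$ psql -U engine -d engine -c "UPDATE images SET active = 't', volume_classification = 0, vm_snapshot_id = 'fd5193ac-dfbc-4ed2-b86c-21caa8009bb2' WHERE image_guid = '8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8';"
$ psql -U engine -d engine -c "DELETE FROM snapshots WHERE snapshot_id = '6bc03db7-82a3-4b7e-9674-0bdd76933eb8';"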

On Thu, 2020-07-23 at 15:17 +0300, Benny Zlotnik wrote:
I think you can remove 6197b30d-0732-4cc7-aef0-12f9f6e9565b from images and the corresponding snapshot, set the parent, 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8, as active (active = 't' field), and change its snapshot to be the active snapshot. That is, if I correctly understand the current layout: 6197b30d-0732-4cc7-aef0-12f9f6e9565b was removed from the storage and 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 is now the only volume for the disk.

What do you mean by "change its snapshot to be the active snapshot"?

Yes, correct: 6197b30d-0732-4cc7-aef0-12f9f6e9565b was removed from the storage and 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 is now the only volume for the disk.

Thanks,
arsene
participants (4)
- Arsène Gschwind
- Benny Zlotnik
- Gianluca Cecchi
- Nir Soffer