Sorry, I only replied to the question. In addition to removing the
image from the images table, you may also need to set the parent as
the active image and remove the snapshot referenced by this image from
the database (see the sketch after the queries below). Can you provide
the output of:
$ psql -U engine -d engine -c "select * from images where
image_group_id = <disk_id>";
As well as:
$ psql -U engine -d engine -c "SELECT s.* FROM snapshots s, images i
where i.vm_snapshot_id = s.snapshot_id and i.image_guid =
'6197b30d-0732-4cc7-aef0-12f9f6e9565b';"
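Depending on what these return, the follow-up will probably look
something like this (a sketch only, do not run it before we see the
output; the parent image_guid is taken from the merge log quoted below,
and the active column is per the engine db schema):
$ psql -U engine -d engine -c "UPDATE images SET active = true WHERE
image_guid = '8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8'";
$ psql -U engine -d engine -c "DELETE FROM snapshots WHERE snapshot_id
= '<snapshot_id returned by the second query>'";
The snapshot row should only be removed if no other image still
references it.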
On Sun, Jul 19, 2020 at 12:49 PM Benny Zlotnik <bzlotnik(a)redhat.com> wrote:
It can be done by deleting from the images table:
$ psql -U engine -d engine -c "DELETE FROM images WHERE image_guid =
'6197b30d-0732-4cc7-aef0-12f9f6e9565b'";
Of course, the database should be backed up before doing this.
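For example, a full backup can be taken on the engine machine with
engine-backup (adjust the file names as you like):
$ engine-backup --mode=backup --file=engine-backup.tar.gz --log=engine-backup.log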
On Fri, Jul 17, 2020 at 6:45 PM Nir Soffer <nsoffer(a)redhat.com> wrote:
>
> On Thu, Jul 16, 2020 at 11:33 AM Arsène Gschwind
> <arsene.gschwind(a)unibas.ch> wrote:
>
> > It looks like the Pivot completed successfully, see attached vdsm.log.
> > Is there a way to recover that VM?
> > Or would it be better to recover the VM from Backup?
>
> This is what we see in the log:
>
> 1. Merge request received
>
> 2020-07-13 11:18:30,282+0200 INFO (jsonrpc/7) [api.virt] START
> merge(drive={u'imageID': u'd7bd480d-2c51-4141-a386-113abf75219e',
> u'volumeID': u'6197b30d-0732-4cc7-aef0-12f9f6e9565b',
> u'domainID':
> u'33777993-a3a5-4aad-a24c-dfe5e473faca', u'poolID':
> u'00000002-0002-0002-0002-000000000289'},
> baseVolUUID=u'8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8',
> topVolUUID=u'6197b30d-0732-4cc7-aef0-12f9f6e9565b', bandwidth=u'0',
> jobUUID=u'720410c3-f1a0-4b25-bf26-cf40aa6b1f97')
> from=::ffff:10.34.38.31,39226,
> flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227,
> vmId=b5534254-660f-44b1-bc83-d616c98ba0ba (api:48)
>
> To track this job, we can use the jobUUID: 720410c3-f1a0-4b25-bf26-cf40aa6b1f97
> and the top volume UUID: 6197b30d-0732-4cc7-aef0-12f9f6e9565b
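>
> For example, to follow this job in the log (assuming the default vdsm
> log location):
>
> $ grep 720410c3-f1a0-4b25-bf26-cf40aa6b1f97 /var/log/vdsm/vdsm.log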
>
> 2. Starting the merge
>
> 2020-07-13 11:18:30,690+0200 INFO (jsonrpc/7) [virt.vm]
> (vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Starting merge with
> jobUUID=u'720410c3-f1a0-4b25-bf26-cf40aa6b1f97', original
> chain=8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 <
> 6197b30d-0732-4cc7-aef0-12f9f6e9565b (top), disk='sda',
> base='sda[1]',
> top=None, bandwidth=0, flags=12 (vm:5945)
>
> We see the original chain:
> 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 <
> 6197b30d-0732-4cc7-aef0-12f9f6e9565b (top)
>
> 3. The merge was completed, ready for pivot
>
> 2020-07-13 11:19:00,992+0200 INFO (libvirt/events) [virt.vm]
> (vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Block job ACTIVE_COMMIT
> for drive
> /rhev/data-center/mnt/blockSD/33777993-a3a5-4aad-a24c-dfe5e473faca/images/d7bd480d-2c51-4141-a386-113abf75219e/6197b30d-0732-4cc7-aef0-12f9f6e9565b
> is ready (vm:5847)
>
> At this point the parent volume contains all the data from the top volume, and
> we can pivot to the parent volume.
>
> 4. Vdsm detects that the merge is ready, and starts the cleanup thread
> that will complete the merge
>
> 2020-07-13 11:19:06,166+0200 INFO (periodic/1) [virt.vm]
> (vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Starting cleanup thread
> for job: 720410c3-f1a0-4b25-bf26-cf40aa6b1f97 (vm:5809)
>
> 5. Requesting pivot to parent volume:
>
> 2020-07-13 11:19:06,717+0200 INFO (merge/720410c3) [virt.vm]
> (vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Requesting pivot to
> complete active layer commit (job
> 720410c3-f1a0-4b25-bf26-cf40aa6b1f97) (vm:6205)
>
> 6. Pivot was successful
>
> 2020-07-13 11:19:06,734+0200 INFO (libvirt/events) [virt.vm]
> (vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Block job ACTIVE_COMMIT
> for drive
> /rhev/data-center/mnt/blockSD/33777993-a3a5-4aad-a24c-dfe5e473faca/images/d7bd480d-2c51-4141-a386-113abf75219e/6197b30d-0732-4cc7-aef0-12f9f6e9565b
> has completed (vm:5838)
>
> 7. Vdsm waits until libvirt updates the XML:
>
> 2020-07-13 11:19:06,756+0200 INFO (merge/720410c3) [virt.vm]
> (vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Pivot completed (job
> 720410c3-f1a0-4b25-bf26-cf40aa6b1f97) (vm:6219)
>
> 8. Synchronizing vdsm metadata
>
> 2020-07-13 11:19:06,776+0200 INFO (merge/720410c3) [vdsm.api] START
> imageSyncVolumeChain(sdUUID='33777993-a3a5-4aad-a24c-dfe5e473faca',
> imgUUID='d7bd480d-2c51-4141-a386-113abf75219e',
> volUUID='6197b30d-0732-4cc7-aef0-12f9f6e9565b',
> newChain=['8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8']) from=internal,
> task_id=b8f605bd-8549-4983-8fc5-f2ebbe6c4666 (api:48)
>
> We can see the new chain:
> ['8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8']
>
> 2020-07-13 11:19:07,005+0200 INFO (merge/720410c3) [storage.Image]
> Current chain=8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 <
> 6197b30d-0732-4cc7-aef0-12f9f6e9565b (top) (image:1221)
>
> The old chain:
> 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 <
> 6197b30d-0732-4cc7-aef0-12f9f6e9565b (top)
>
> 2020-07-13 11:19:07,006+0200 INFO (merge/720410c3) [storage.Image]
> Unlinking subchain: ['6197b30d-0732-4cc7-aef0-12f9f6e9565b']
> (image:1231)
> 2020-07-13 11:19:07,017+0200 INFO (merge/720410c3) [storage.Image]
> Leaf volume 6197b30d-0732-4cc7-aef0-12f9f6e9565b is being removed from
> the chain. Marking it ILLEGAL to prevent data corruption (image:1239)
>
> This matches what we see on storage.
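>
> (If you want to confirm this directly from the metadata LV: assuming a
> V5 storage domain, where metadata slots are 8192 bytes starting at 1
> MiB, the top volume's slot (MD_58 in the lvs output further down) sits
> at offset 1048576 + 58 * 8192 = 1523712, and
>
> $ dd if=/dev/33777993-a3a5-4aad-a24c-dfe5e473faca/metadata bs=512 count=1 skip=1523712 iflag=skip_bytes | grep -a LEGALITY
>
> should show LEGALITY=ILLEGAL.)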
>
> 9. Merge job is untracked
>
> 2020-07-13 11:19:21,134+0200 INFO (periodic/1) [virt.vm]
> (vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Cleanup thread
> <vdsm.virt.vm.LiveMergeCleanupThread object at 0x7fa1e0370350>
> successfully completed, untracking job
> 720410c3-f1a0-4b25-bf26-cf40aa6b1f97
> (base=8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8,
> top=6197b30d-0732-4cc7-aef0-12f9f6e9565b) (vm:5752)
>
> This was a successful merge on vdsm side.
>
> We don't see any more requests for the top volume in this log. The next step to
> complete the merge is to delete the volume 6197b30d-0732-4cc7-aef0-12f9f6e9565b,
> but this can be done only on the SPM.
>
> To understand why this did not happen, we need the engine log showing this
> interaction, and logs from the SPM host from the same time.
>
> Please file a bug about this and attach these logs (and the vdsm log
> you sent here).
> Fixing this VM is important, but preventing this bug for other users is even
> more important.
>
>
> How to fix the volume metadata:
>
> 1. Edit 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8.meta
>
> Change:
> VOLTYPE=INTERNAL
>
> To:
> VOLTYPE=LEAF
>
> See attached file for reference.
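>
> A one-liner that should do the same edit (GNU sed; keeps a backup copy
> with an .orig suffix):
>
> $ sed -i.orig 's/^VOLTYPE=INTERNAL$/VOLTYPE=LEAF/' 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8.meta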
>
> 2. Truncate the file to 512 bytes
>
> truncate -s 512 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8.meta
>
> 3. Verify the file size
>
> $ ls -lh 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8.meta
> -rw-r--r--. 1 nsoffer nsoffer 512 Jul 17 18:17
> 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8.meta
>
> 4. Get the slot number for this volume from the LV tags (MD_N) and
> compute the offset
>
> (copied from your pdf)
>
> lvs -o vg_name,lv_name,tags | grep d7bd480d-2c51-4141-a386-113abf75219e
>
> 33777993-a3a5-4aad-a24c-dfe5e473faca
> 6197b30d-0732-4cc7-aef0-12f9f6e9565b
> IU_d7bd480d-2c51-4141-a386-113abf75219e,MD_58,PU_8e412b5a-
> 85ec-4c53-a5b8-dfb4d6d987b8
>
> 33777993-a3a5-4aad-a24c-dfe5e473faca
> 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8
> IU_d7bd480d-2c51-4141-a386-113abf75219e,MD_28,PU_00000000-
> 0000-0000-0000-000000000000
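>
> For reference, the 1277952 offset used in the dd commands below comes
> from the MD_28 tag: assuming a V5 storage domain, metadata slots are
> 8192 bytes and start at 1 MiB, so:
>
> $ echo $((1048576 + 28 * 8192))
> 1277952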
>
> 5. Get the metadata from the slot to verify that we are changing the right metadata
>
> dd if=/dev/33777993-a3a5-4aad-a24c-dfe5e473faca/metadata bs=512
> count=1 skip=1277952 iflag=skip_bytes > /tmp/8e412b5a-85ec-4c53-a5b8-
> dfb4d6d987b8.meta.bad
>
> Compare /tmp/8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8.meta.bad with the fixed
> file; the only differences should be the VOLTYPE=LEAF line and the amount
> of padding.
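>
> One way to compare, stripping the NUL padding first (assuming the slot
> is NUL padded):
>
> $ diff <(tr -d '\0' < /tmp/8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8.meta.bad) <(tr -d '\0' < 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8.meta)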
>
> 6. Write the fixed metadata to storage (copy the file fixed in steps 1-3 to
> /tmp/8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8.meta.fixed first)
>
> dd of=/dev/33777993-a3a5-4aad-a24c-dfe5e473faca/metadata bs=512
> count=1 seek=1277952 oflag=direct,seek_bytes conv=fsync <
> /tmp/8e412b5a-85ec-4c53-a5b8-
> dfb4d6d987b8.meta.fixed
>
> 7. Delete the lv 6197b30d-0732-4cc7-aef0-12f9f6e9565b on the SPM host
>
> WARNING: this must be done on the SPM host, otherwise you may corrupt
> the VG metadata.
>
> If you selected the wipe-after-delete option for this disk, you want to wipe
> it before deleting. If you selected the discard-after-delete option, you want
> to discard the LV before deleting it.
>
> Activate the lv on the SPM host:
>
> lvchange -ay
> 33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b
>
> If needed, wipe it:
>
> blkdiscard --zeroout --step 32m
> /dev/33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b
>
> If needed, discard it:
>
> blkdiscard
> /dev/33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b
>
> Deactivate the lv:
>
> lvchange -an
> 33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b
>
> Remove the lv:
>
> lvremove
> 33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b
>
> 8. Fixing the engine db
>
> Benny, Eyal: can you add instructions on how to remove the volume on the
> engine side?
>
> After the volume is removed on the engine side, starting the VM will succeed.
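>
> Afterwards you can verify on the engine side that only the base volume
> remains and is active, with something like (column names per the
> engine db schema; d7bd480d-2c51-4141-a386-113abf75219e is the imageID
> from the merge log above):
>
> $ psql -U engine -d engine -c "select image_guid, active from images where image_group_id = 'd7bd480d-2c51-4141-a386-113abf75219e'";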