Another illegal disk snapshot problem!

Hi List,

oVirt 4.3. I know there have been threads about this before, but I am unable to find the exact scenario I am facing.

I have a VM with 3 snapshots - Active, and 2 dated ones (technically created by vProtect). After trying to take a fresh snapshot in the GUI, it failed and marked one of the old snapshot disks as 'illegal' - then the other followed suit. I tried unlocking the entity using the unlock_entity.sh tool, but any action reverts them back to illegal.

Following previous advice, I can see the VDSM status is all showing LEGAL:

image: 23710238-07c2-46f3-96c0-9061fe1c3e0d
  - c3dadf14-bb4e-45a7-8bee-b9a01fe29ae1
    status: OK, voltype: INTERNAL, format: RAW, legality: LEGAL, type: SPARSE, capacity: 107374182400, truesize: 18402942976
  - a6d4533b-b0b0-475d-a436-26ce99a38d94
    status: OK, voltype: INTERNAL, format: COW, legality: LEGAL, type: SPARSE, capacity: 107374182400, truesize: 21521768448
  - 4b6f7ca1-b70d-4893-b473-d8d30138bb6b
    status: OK, voltype: LEAF, format: COW, legality: LEGAL, type: SPARSE, capacity: 107374182400, truesize: 12617457664

The first two entries (c3dadf14-bb4e-45a7-8bee-b9a01fe29ae1 and a6d4533b-b0b0-475d-a436-26ce99a38d94) are the snapshots showing as 'illegal'. Looking in the DB shows:

select image_guid, parentid, imagestatus, vm_snapshot_id, volume_type, volume_format, active from images where image_group_id='23710238-07c2-46f3-96c0-9061fe1c3e0d';

              image_guid              |               parentid               | imagestatus |            vm_snapshot_id            | volume_type | volume_format | active
--------------------------------------+--------------------------------------+-------------+--------------------------------------+-------------+---------------+--------
 4b6f7ca1-b70d-4893-b473-d8d30138bb6b | a6d4533b-b0b0-475d-a436-26ce99a38d94 |           1 | d5044ae5-dc48-4700-9e46-d61e676c73fc |           2 |             4 | t
 c3dadf14-bb4e-45a7-8bee-b9a01fe29ae1 | 00000000-0000-0000-0000-000000000000 |           4 | 57337968-28da-4b03-ac40-134a347d8c11 |           2 |             5 | f
 a6d4533b-b0b0-475d-a436-26ce99a38d94 | c3dadf14-bb4e-45a7-8bee-b9a01fe29ae1 |           4 | d2d82724-9fe7-452c-a114-f8d70b555520 |           2 |             4 | f

From here, previous advice has been to do things such as deleting the snapshot/disk, but that applied when the volumes actually showed an illegal status. I also notice the active image is not the same image that has a parentid of all zeros, so I'm not sure how to go about deleting the other snapshots and disks cleanly and/or safely. Deleting, or any task in the GUI, fails 100% of the time, and it's at the point that if I shut down this (critical) VM it will not come back up because of these statuses.

On top of this, what is a good way to take a clean, manual backup of the in-use disk before I start playing with this, in case worst comes to worst and I have to rebuild it as a new server? (At this point I can't trust my vProtect backups.)

Any help appreciated.

Thanks,
Joe
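For reference, the volume listing above looks like the output of vdsm-tool dump-volume-chains run on a host, and the table comes from psql on the engine. A minimal sketch of gathering both, assuming a stock 4.3 host and a locally hosted engine database (the storage domain UUID used here is the one that appears in the merge log later in this thread; substitute your own):

# on a host that has the storage domain mounted
vdsm-tool dump-volume-chains 74c06ce1-94e6-4064-9d7d-69e1d956645b

# on the engine machine
sudo -u postgres psql engine -c "select image_guid, parentid, imagestatus, active from images where image_group_id='23710238-07c2-46f3-96c0-9061fe1c3e0d';"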

Hi,

We have the same issue as you, and we are also using vProtect. I have no solution, but I'm very interested in how to address this. For some VMs we have managed to remove the illegal snapshots after changing storage for the VMs' disks, but we have 3-4 VMs that will not let us remove the illegal snapshot. For us this issue has escalated over the last couple of months.

Is it only us who have these issues, or do people not take backups of their VMs? It feels like more people should be seeing this.

//Magnus

Do you know why your snapshot creation failed? Do you have logs with the error?

On paper the situation does not look too bad, as the only discrepancy between the database and vdsm is the status of the image, and since it's legal on vdsm, changing it to legal in the database should work (image status 1).

> Active Image is not the same image that has a parentid of all 00000

Can you elaborate on this? The image with the empty parent is usually the base image (the first active image); the active image will usually be the leaf (unless the VM is in preview or something similar).

Of course, do not make any changes without backing up first.
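If it does come to flipping the status in the engine database directly, a minimal sketch of what that could look like, assuming imagestatus 1 = OK and 4 = ILLEGAL as in the table above, a locally hosted engine DB, and a fresh engine backup taken first (the GUIDs are the two volumes flagged illegal in this thread):

# take an engine backup before touching anything
engine-backup --mode=backup --file=engine-backup-$(date +%F).tar.gz --log=engine-backup.log

# set the two affected volumes back to OK (imagestatus=1)
sudo -u postgres psql engine -c "update images set imagestatus=1 where image_guid in ('c3dadf14-bb4e-45a7-8bee-b9a01fe29ae1','a6d4533b-b0b0-475d-a436-26ce99a38d94');"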

On 8/12/2020 10:55 pm, Benny Zlotnik wrote:
> Do you know why your snapshot creation failed? Do you have logs with the error?

Here is the log (grepped for ERROR only to keep it a bit less verbose):

[root@ov-engine ~]# tail -f /var/log/ovirt-engine/engine.log | grep ERROR
2020-12-08 22:03:13,679+10 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SnapshotVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-72) [9b2283fe-37cc-436c-89df-37c81abcb2e1] Failed in 'SnapshotVDS' method
2020-12-08 22:03:13,710+10 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-72) [9b2283fe-37cc-436c-89df-37c81abcb2e1] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM ov-node1 command SnapshotVDS failed: Snapshot failed
2020-12-08 22:03:13,710+10 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SnapshotVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-72) [9b2283fe-37cc-436c-89df-37c81abcb2e1] Command 'SnapshotVDSCommand(HostName = ov-node1, SnapshotVDSCommandParameters:{hostId='40b344e3-0508-48f4-9c86-589faa630adb', vmId='2a0df965-8434-4074-85cf-df12a69648e7'})' execution failed: VDSGenericException: VDSErrorException: Failed to SnapshotVDS, error = Snapshot failed, code = 48
2020-12-08 22:03:14,885+10 ERROR [org.ovirt.engine.core.bll.snapshots.CreateSnapshotForVmCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-15) [9b2283fe-37cc-436c-89df-37c81abcb2e1] Ending command 'org.ovirt.engine.core.bll.snapshots.CreateSnapshotForVmCommand' with failure.
2020-12-08 22:03:14,932+10 ERROR [org.ovirt.engine.core.bll.snapshots.CreateSnapshotDiskCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-15) [9b2283fe-37cc-436c-89df-37c81abcb2e1] Ending command 'org.ovirt.engine.core.bll.snapshots.CreateSnapshotDiskCommand' with failure.
2020-12-08 22:03:14,953+10 ERROR [org.ovirt.engine.core.bll.snapshots.CreateSnapshotCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-15) [9b2283fe-37cc-436c-89df-37c81abcb2e1] Ending command 'org.ovirt.engine.core.bll.snapshots.CreateSnapshotCommand' with failure.
2020-12-08 22:03:14,977+10 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMRevertTaskVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-15) [9b2283fe-37cc-436c-89df-37c81abcb2e1] Trying to revert unknown task '4c2ec360-5a00-4bae-bc25-9d8c2d698172'
2020-12-08 22:03:16,309+10 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-15) [] EVENT_ID: USER_CREATE_SNAPSHOT_FINISHED_FAILURE(69), Failed to complete snapshot 'Test' creation for VM 'prod-DC1'.

> On paper the situation does not look too bad, as the only discrepancy between the database and vdsm is the status of the image, and since it's legal on vdsm, changing it to legal in the database should work (image status 1)

Using unlock_entity.sh -t all sets the status back to 1 (confirmed in the DB), and trying to create a snapshot afterwards does not change it back to illegal, but trying to delete that snapshot fails and sets it back to 4.

> Active Image is not the same image that has a parentid of all 00000
> Can you elaborate on this? The image with the empty parent is usually the base image (the first active image), the active image will usually be the leaf (unless the VM is in preview or something similar)

This is probably just my misunderstanding of the snapshot structure.

> Of course do not make any changes without backing up first

What's the best way to back up this (running) VM without snapshots? Can I just copy the folder and disk? Is there a way to copy it so it doesn't copy it as a 100G file (when only 20G is used)?
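One way to take such a copy, as a sketch only - the files belong to a running VM, so a copy made this way is at best crash-consistent and should be treated as a last-resort fallback - is to copy the image directory while preserving sparseness, so a 100G sparse volume does not balloon to 100G on the backup target. The path below assumes the usual file-based (NFS) storage domain layout, with <server:_export> standing in for the actual mount point:

# copy the whole image group directory, keeping holes sparse
cp -a --sparse=always /rhev/data-center/mnt/<server:_export>/74c06ce1-94e6-4064-9d7d-69e1d956645b/images/23710238-07c2-46f3-96c0-9061fe1c3e0d /backup/

# or push it to another machine; -S preserves sparseness
rsync -aS /rhev/data-center/mnt/<server:_export>/74c06ce1-94e6-4064-9d7d-69e1d956645b/images/23710238-07c2-46f3-96c0-9061fe1c3e0d backuphost:/backup/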

> [root@ov-engine ~]# tail -f /var/log/ovirt-engine/engine.log | grep ERROR

Grepping for ERROR is OK, but it does not show the reason for the failure, which will probably be on the vdsm host (you can use flow_id 9b2283fe-37cc-436c-89df-37c81abcb2e1 to find the correct file). We need to see the underlying error causing:
VDSGenericException: VDSErrorException: Failed to SnapshotVDS, error = Snapshot failed, code = 48
> Using unlock_entity.sh -t all sets the status back to 1 (confirmed in DB) and then trying to create does not change it back to illegal, but trying to delete that snapshot fails and sets it back to 4.

I see, can you share the removal failure log (similar information as requested above)?
Regarding backup, I don't have a good answer; hopefully someone else has suggestions.
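To locate that failure on the host, something like the following is usually enough - a sketch assuming the default vdsm log location on ov-node1 and the flow_id from the engine log above:

# find which vdsm log file contains the failing flow
grep -l '9b2283fe-37cc-436c-89df-37c81abcb2e1' /var/log/vdsm/vdsm.log*
# rotated vdsm logs may be xz-compressed
xzgrep -l '9b2283fe-37cc-436c-89df-37c81abcb2e1' /var/log/vdsm/vdsm.log.*.xz

# then pull the surrounding context
grep -B 5 -A 30 '9b2283fe-37cc-436c-89df-37c81abcb2e1' /var/log/vdsm/vdsm.log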

Looks like the physical files don't exist:

2020-12-09 22:01:00,122+1000 INFO (jsonrpc/4) [api.virt] START merge(drive={u'imageID': u'23710238-07c2-46f3-96c0-9061fe1c3e0d', u'volumeID': u'4b6f7ca1-b70d-4893-b473-d8d30138bb6b', u'domainID': u'74c06ce1-94e6-4064-9d7d-69e1d956645b', u'poolID': u'e2540c6a-33c7-4ac7-b2a2-175cf51994c2'}, baseVolUUID=u'c3dadf14-bb4e-45a7-8bee-b9a01fe29ae1', topVolUUID=u'a6d4533b-b0b0-475d-a436-26ce99a38d94', bandwidth=u'0', jobUUID=u'ff193892-356b-4db8-b525-e543e8e69d6a') from=::ffff:192.168.5.10,56030, flow_id=c149117a-1080-424c-85d8-3de2103ac4ae, vmId=2a0df965-8434-4074-85cf-df12a69648e7 (api:48)
2020-12-09 22:01:00,122+1000 INFO (jsonrpc/4) [api.virt] FINISH merge return={'status': {'message': 'Drive image file could not be found', 'code': 13}} from=::ffff:192.168.5.10,56030, flow_id=c149117a-1080-424c-85d8-3de2103ac4ae, vmId=2a0df965-8434-4074-85cf-df12a69648e7 (api:54)

Although looking on the physical file system they seem to exist:

[root@ov-node1 23710238-07c2-46f3-96c0-9061fe1c3e0d]# ll
total 56637572
-rw-rw----. 1 vdsm kvm  15936061440 Dec  9 21:51 4b6f7ca1-b70d-4893-b473-d8d30138bb6b
-rw-rw----. 1 vdsm kvm      1048576 Dec  8 01:11 4b6f7ca1-b70d-4893-b473-d8d30138bb6b.lease
-rw-r--r--. 1 vdsm kvm          252 Dec  9 21:37 4b6f7ca1-b70d-4893-b473-d8d30138bb6b.meta
-rw-rw----. 1 vdsm kvm  21521825792 Dec  8 01:47 a6d4533b-b0b0-475d-a436-26ce99a38d94
-rw-rw----. 1 vdsm kvm      1048576 May 17  2020 a6d4533b-b0b0-475d-a436-26ce99a38d94.lease
-rw-r--r--. 1 vdsm kvm          256 Dec  8 01:49 a6d4533b-b0b0-475d-a436-26ce99a38d94.meta
-rw-rw----. 1 vdsm kvm 107374182400 Dec  9 01:13 c3dadf14-bb4e-45a7-8bee-b9a01fe29ae1
-rw-rw----. 1 vdsm kvm      1048576 Feb 24  2020 c3dadf14-bb4e-45a7-8bee-b9a01fe29ae1.lease
-rw-r--r--. 1 vdsm kvm          320 May 17  2020 c3dadf14-bb4e-45a7-8bee-b9a01fe29ae1.meta

The UUIDs match the UUIDs in the snapshot list.

So much happens in vdsm.log that it's hard to pinpoint what's going on, but grepping for 'c149117a-1080-424c-85d8-3de2103ac4ae' (the flow_id) shows pretty much just those two calls and then an XML dump.

Still a bit lost on the most comfortable way forward, unfortunately.
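A quick sanity check at this point would be to ask qemu-img what the on-disk backing chain actually looks like - a sketch, run from the image group directory shown above; the -U/--force-share flag (available on reasonably recent qemu) avoids lock errors when the leaf is in use by the running VM:

# from /rhev/.../74c06ce1-94e6-4064-9d7d-69e1d956645b/images/23710238-07c2-46f3-96c0-9061fe1c3e0d
qemu-img info -U --backing-chain 4b6f7ca1-b70d-4893-b473-d8d30138bb6b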

The VM is running, right? Can you run:

$ virsh -r dumpxml <vm_name>
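If the full XML is too noisy, the disk definitions alone usually show which volume files libvirt has open - a sketch, assuming the libvirt domain name matches the VM name from the engine log (prod-DC1) and an arbitrary grep context window:

virsh -r dumpxml prod-DC1 | grep -A 12 '<disk '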

Attached XML dump.

Looks like it's let me run a 'reboot', but I'm afraid to do a full shutdown at this point.

I have taken a raw copy of the whole image group folder, in the hope that if worst came to worst I'd be able to recreate the disk from the actual files.

All existing files seem to be referenced in the XML dump.
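Should that raw copy ever need to be turned back into a usable disk, one possible approach - a sketch only, run strictly against the copied files, never the live ones, and assuming the copy preserved the directory layout that the qcow2 backing-file references expect - is to check the chain and flatten it from the copied leaf into a single standalone image:

# in the copied image group directory
qemu-img check 4b6f7ca1-b70d-4893-b473-d8d30138bb6b
qemu-img convert -O qcow2 4b6f7ca1-b70d-4893-b473-d8d30138bb6b /backup/prod-DC1-flat.qcow2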
Yes, the VM looks fine. To investigate this further I'd need the full vdsm log with the error - please share it.
Participants (3):
- Benny Zlotnik
- Joseph Goldman
- Magnus Isaksson