On Fri, Jul 30, 2021 at 5:47 AM luwen.zhang <luwen.zhang(a)vinchin.com> wrote:
Sorry I was trying to open a new thread for this issue, but it seems
I failed to submit. Here let me explain how the issue is reproduced.
It’s a regular backup using the CBT + imageio API. After a series of successful backups,
at the beginning of one backup session, we try to obtain the VM config and the snapshot
list (the snapshot list lets us determine whether the VM virtual disk format is RAW or QCOW2)
Why do you need the snapshot list when doing an incremental backup? What you need
is the list of disks in the VM, accessible via:
GET /vms/{vm-id}/diskattachments
For each disk attachment, get the disk using the diskattachment.disk.id:
GET /disks/{disk-id}/
Please check how we do this in backup_vm.py example:
https://github.com/oVirt/ovirt-engine-sdk/blob/master/sdk/examples/backup...
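To illustrate the two GET requests above without the snapshot list, here is a minimal Python sketch that pulls the disk ids out of a disk attachments response and reads the format from a disk response. The sample XML is hand-written and trimmed for the example, the disk id is made up, and `disk_ids` and `disk_format` are hypothetical helper names, not part of the SDK. The SDK example linked above remains the reference.

```python
import xml.etree.ElementTree as ET

# Hand-written, trimmed stand-in for a GET /vms/{vm-id}/diskattachments response.
ATTACHMENTS_XML = """
<disk_attachments>
  <disk_attachment id="11111111-2222-3333-4444-555555555555">
    <disk id="11111111-2222-3333-4444-555555555555"/>
  </disk_attachment>
</disk_attachments>
"""

# Hand-written, trimmed stand-in for a GET /disks/{disk-id} response.
DISK_XML = """
<disk id="11111111-2222-3333-4444-555555555555">
  <format>cow</format>
</disk>
"""


def disk_ids(attachments_xml):
    """Return the disk id referenced by each disk attachment."""
    root = ET.fromstring(attachments_xml.strip())
    return [disk.get("id") for disk in root.iterfind("disk_attachment/disk")]


def disk_format(disk_xml):
    """Return the disk format string ("cow" for qcow2, or "raw")."""
    return ET.fromstring(disk_xml.strip()).findtext("format")


for disk_id in disk_ids(ATTACHMENTS_XML):
    # A real client would fetch GET /disks/{disk_id} here instead of using DISK_XML.
    print(disk_id, disk_format(DISK_XML))
```

This gives the RAW/QCOW2 answer per disk directly from the disk itself.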
by using `GET vms/<vm-id>/snapshots`, but get the following
error.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<fault>
<detail>duplicate key acf1edaa-e950-4c4f-94df-1bd6b3da49c1 (attempted merging
values org.ovirt.engine.core.common.businessentities.storage.diskimage@5103046c and
org.ovirt.engine.core.common.businessentities.storage.diskimage@d973046c)</detail>
<reason>Operation Failed</reason>
</fault>
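For a client that wants to log such a failure instead of crashing on it, a minimal sketch of reading the reason and detail from a fault body like the one above could look like this. `parse_fault` is a hypothetical helper name, and the detail text in the usage is shortened from the error quoted above.

```python
import xml.etree.ElementTree as ET


def parse_fault(body):
    """Return (reason, detail) if body is a <fault> document, else None."""
    root = ET.fromstring(body)
    if root.tag != "fault":
        return None
    return root.findtext("reason"), root.findtext("detail")


fault = parse_fault(
    "<fault>"
    "<detail>duplicate key acf1edaa-e950-4c4f-94df-1bd6b3da49c1</detail>"
    "<reason>Operation Failed</reason>"
    "</fault>"
)
print(fault)
```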
We need much more detailed steps.
This is a typical backup flow:
1. Start incremental backup
2. Wait until backup is ready (phase == READY)
3. Start image transfer for incremental backup
4. Wait until image transfer is ready (phase == TRANSFERRING)
5. Download disk incremental data
6. Finalize transfer
7. Wait until transfer is finished (phase == FINISHED_SUCCESS/FINISHED_FAILURE)
This is not easy, see this example:
https://github.com/oVirt/ovirt-engine-sdk/blob/ac6f05bb5dcd8fdee2a67b2a29...
8. Finalize backup
9. Wait until backup is finished (phase == FINISHED/FAILED)
This is easier, but possible only since 4.4.7:
https://github.com/oVirt/ovirt-engine-sdk/blob/ac6f05bb5dcd8fdee2a67b2a29...
10. Rebase backup image on previous backup (if you store backup as qcow2 layers)
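The "wait until" steps in the flow above (2, 4, 7 and 9) all have the same shape, so they can be sketched as one polling helper. This is only a sketch under assumptions: `wait_for_phase` and its parameters are hypothetical names, `get_phase` stands in for whatever call returns the current phase of the backup or image transfer, and the intermediate phase names in the usage are placeholders. The timeout matters here, since a transfer stuck in finalizing would otherwise block the loop forever.

```python
import time


def wait_for_phase(get_phase, done, failed=(), timeout=300.0, interval=1.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll get_phase() until it returns a phase listed in `done`.

    Raises RuntimeError if a phase listed in `failed` is seen, and
    TimeoutError if `timeout` seconds pass before a `done` phase is seen.
    """
    deadline = clock() + timeout
    while True:
        phase = get_phase()
        if phase in done:
            return phase
        if phase in failed:
            raise RuntimeError(f"entered failed phase: {phase}")
        if clock() >= deadline:
            raise TimeoutError(f"still in phase {phase} after {timeout} seconds")
        sleep(interval)


# Stand-in for a real status query; the last value matches step 2 of the flow.
phases = iter(["INITIALIZING", "STARTING", "READY"])
print(wait_for_phase(lambda: next(phases), done={"READY"},
                     failed={"FAILED"}, interval=0))
```

A backup loop would call this after every start and finalize request, and only start the next backup once the previous transfer has reached a finished phase.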
Where in this flow do you get the snapshot list (and other stuff)?
Getting the snapshot list is likely not needed for backup, but we need to fix it
if it breaks while backups or image transfers are running.
Do you run this flow in a loop? Maybe you do not wait until the previous image
transfer has finished before starting a new backup?
After this, on the oVirt engine web console, the VM shows 2 disks
(it actually has only 1), and the disk status always shows “Finalizing”. It has been more
than 30 hours now, and during this time we cannot modify the VM disk or take snapshots.
Before upgrading oVirt engine to 4.4.7.7-1.el8 this problem happened frequently; after
upgrading, it happens less often.
Here I’m adding the engine logs and vdsm logs.
Engine logs:
https://drive.google.com/file/d/1T3-EOxYYl3oFZOA9VMMBte5WyBoUO48U/view?us...
VDSM logs:
https://drive.google.com/file/d/1x0B8lGqnKEDrgn666CuN3hqUGwD7fcYv/view?us...
Thanks, we will check the logs next week.
Thanks & regards!
On 07/29/2021 19:20, Nir Soffer <nsoffer(a)redhat.com> wrote:
On Thu, Jul 29, 2021 at 10:08 AM luwen.zhang <luwen.zhang(a)vinchin.com> wrote:
The problem occurred yesterday. We have waited for more than 20 hours, and the VM still
shows 2 disks in the Finalizing state.
If the image transfer is "finalizing" it means the image transfer is
trying to finalize, but the operation could not complete.
In this phase the disk remains locked, and it should not be possible
to start a new image transfer (e.g. perform another backup).
Engine and vdsm logs should explain why the image transfer is stuck in
the finalizing phase.
Can you add detailed instructions on how to reproduce this issue?