On Sun, Sep 5, 2021 at 6:00 PM Pavel Bar <pbar@redhat.com> wrote:
Hi,
Please try the instructions below and update whether it helped.

Thank you!

Pavel


Thanks for the input.
If I understand correctly, I have to complete the steps described by Nir and then work at the db level.

Right now what I see in the table is:

engine=# \x
Expanded display is on.
engine=# select * from vm_backups;
-[ RECORD 1 ]------+-------------------------------------
backup_id          | 68f83141-9d03-4cb0-84d4-e71fdd8753bb
from_checkpoint_id |
to_checkpoint_id   | d31e35b6-bd16-46d2-a053-eabb26d283f5
vm_id              | dc386237-1e98-40c8-9d3d-45658163d1e2
phase              | Finalizing
_create_date       | 2021-09-03 15:31:11.447+02
host_id            | cc241ec7-64fc-4c93-8cec-9e0e7005a49d

engine=#

see below my doubts...

On Sun, 5 Sept 2021 at 18:41, Nir Soffer <nsoffer@redhat.com> wrote:
On Sat, Sep 4, 2021 at 1:08 AM Gianluca Cecchi
<gianluca.cecchi@gmail.com> wrote:
...
>>> ovirt_imageio._internal.nbd.ReplyError: Writing to file failed: [Error 28] No space left on device
>> This error is expected if you don't have space to write the data.
> ok.

I forgot to mention that running backup on engine host is not recommended.
It is better to run the backup on the hypervisor, speeding up the data copy.

OK, I will take care of it, thanks.

>>> How can I clean the situation?
>>
>> 1. Stop the current backup

 
>> If stopping the backup failed, stopping the VM will stop the backup.

OK, I will try to fix it with the VM running if possible, before going and stopping it.


> But if I try the stop command I get the error
>
> [g.cecchi@ovmgr1 ~]$ python3 /usr/share/doc/python3-ovirt-engine-sdk4/examples/backup_vm.py -c ovmgr1 stop dc386237-1e98-40c8-9d3d-45658163d1e2 68f83141-9d03-4cb0-84d4-e71fdd8753bb
> [   0.0 ] Finalizing backup '68f83141-9d03-4cb0-84d4-e71fdd8753bb'
> Traceback (most recent call last):
...
> ovirtsdk4.Error: Fault reason is "Operation Failed". Fault detail is "[Cannot stop VM backup. The VM backup is not in READY phase, backup phase is FINALIZING. Please try again when the backup is in READY phase.]". HTTP response code is 409.

So your backup was already finalized, and it is stuck in "finalizing" phase.

Usually this means the backup on libvirt side was already stopped, but engine
failed to detect this and failed to complete the finalize step
(ovirt-engine bug).

You need to check whether the backup was stopped on the vdsm side.

- If the VM was stopped, the backup is not running
- If the VM is running, we can make sure the backup is stopped using

    vdsm-client VM stop_backup vmID=dc386237-1e98-40c8-9d3d-45658163d1e2 backup_id=68f83141-9d03-4cb0-84d4-e71fdd8753bb

The VM is still running.
The host (I see it in its events with relation to backup errors) is ov200. BTW: how can I see the mapping between host id and hostname (from the db and/or api)?
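
For the API side of that question, one possible approach is to ask the engine for the host by id through the Python SDK. This is only a sketch: the helper name host_name_for_id is hypothetical, and it assumes that `connection` is an already opened ovirtsdk4.Connection to the engine.

```python
def host_name_for_id(connection, host_id):
    """Return the name of the host with the given id.

    `connection` is assumed to be an ovirtsdk4.Connection, created for
    example with ovirtsdk4.Connection(url=..., username=..., password=...,
    ca_file=...). The helper itself is illustrative, not part of the SDK.
    """
    # Navigate from the API root to the service for this specific host.
    hosts_service = connection.system_service().hosts_service()
    host = hosts_service.host_service(host_id).get()
    return host.name
```

Calling host_name_for_id(connection, "cc241ec7-64fc-4c93-8cec-9e0e7005a49d") with the host_id from the vm_backups record above should return the name of the host that ran the backup.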

[root@ov200 ~]# vdsm-client VM stop_backup vmID=dc386237-1e98-40c8-9d3d-45658163d1e2 backup_id=68f83141-9d03-4cb0-84d4-e71fdd8753bb
{
    "code": 0,
    "message": "Done"
}
[root@ov200 ~]#


If this succeeds, the backup is not running on vdsm side.

I presume from the output above that the command succeeded, correct?
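
The same check can be scripted. The following is a minimal sketch, assuming the reply is always a JSON object of the form shown above and that "code": 0 indicates success; the function names are hypothetical, and the behaviour of vdsm-client on failure (exit status, output format) is an assumption rather than something confirmed in this thread.

```python
import json
import subprocess


def stop_backup_succeeded(reply_text):
    """Return True if a vdsm-client reply reports code 0 (assumed to mean success)."""
    try:
        reply = json.loads(reply_text)
    except json.JSONDecodeError:
        # Anything that is not JSON is treated as a failure.
        return False
    return isinstance(reply, dict) and reply.get("code") == 0


def stop_backup(vm_id, backup_id):
    """Run the stop_backup command from this thread on the local host."""
    result = subprocess.run(
        ["vdsm-client", "VM", "stop_backup",
         f"vmID={vm_id}", f"backup_id={backup_id}"],
        capture_output=True,
        text=True,
    )
    # Require both a zero exit status and a reply with code 0.
    return result.returncode == 0 and stop_backup_succeeded(result.stdout)
```

stop_backup() has to run as root on the host where the VM is running, as in the session above; stop_backup_succeeded() can be applied to saved output on any machine.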

If this fails, you may need to stop the VM to end the backup.

If the backup was stopped, you may need to delete the scratch disks
used in this backup.
You can find the scratch disk IDs in the engine logs, and delete them
from the engine UI.

Any hints for finding the scratch disk IDs in engine.log?
Here is my engine.log; the timestamp of the backup (as seen in the database above) is 15:31 on 03 September:

https://drive.google.com/file/d/1Ao1CIA2wlFCqMMKeXbxKXrWZXUrnJN2h/view?usp=sharing


Finally, after you have cleaned up the vdsm side, you can delete the backup
from the engine database and unlock the disks.

Pavel, can you provide instructions on how to clean up engine db after
stuck backup?

Can you please try manually updating the "phase" of the problematic backup entry in the "vm_backups" DB table to one of the final phases, which are either "Succeeded" or "Failed"?
This should allow creating a new backup.
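
As a rough sketch of that update, the helper below only builds the statement and its parameters, using the table and column names shown in the vm_backups output earlier in this thread. The helper name is hypothetical, and the %s placeholder style assumes a DB-API driver such as psycopg2; nothing here is executed against the engine database.

```python
FINAL_PHASES = ("Succeeded", "Failed")


def build_phase_update(backup_id, new_phase):
    """Return (sql, params) that set a stuck backup to a final phase."""
    # Refuse anything that is not one of the two final phases named above.
    if new_phase not in FINAL_PHASES:
        raise ValueError(f"phase must be one of {FINAL_PHASES}, got {new_phase!r}")
    sql = "UPDATE vm_backups SET phase = %s WHERE backup_id = %s"
    return sql, (new_phase, backup_id)
```

For the backup in this thread, build_phase_update("68f83141-9d03-4cb0-84d4-e71fdd8753bb", "Failed") returns the statement and parameters to pass to a cursor's execute(); the equivalent statement can also be typed directly at the engine=# prompt.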

After vdsm and engine are cleaned up, a new backup should work normally.

OK, so I will wait for Nir's input about removing the scratch disks and then change the phase column for the backup.


>> 2. File a bug about this
> Filed this one, hope it is correct; I chose ovirt-imageio as the product and Client as the component:

In general, backup bugs should be filed for ovirt-engine. ovirt-imageio
is rarely the cause of a bug. We will move the bug to ovirt-imageio if needed.

> https://bugzilla.redhat.com/show_bug.cgi?id=2001136

Thanks!

Nir

ok.

Gianluca