According to vdsm.log, the host reports checkpoint
'ef0dfe55-c08c-4d9e-ad32-d6b6d5cbdac6' as a defined checkpoint for the VM:
7-28 16:24:06,488+0200 INFO (jsonrpc/3) [api.virt] FINISH list_checkpoints
return={'result': ['cd078706-84c0-4370-a6ec-654ccd6a21aa',
'906fe1fd-4e81-473c-9582-de693985f462',
'd20de661-fcc1-43ff-86a3-6294a0a94e1b',
'805b447e-d033-4eb1-905a-d287ca50c85a',
'9ae68264-a0c9-4f75-9ffa-20c40323abe9',
'8664da63-6d92-44b8-a690-04e17afb483a',
'fb6fa799-3b6f-44fa-a60e-c66ce8c76e35',
'f9f53ee7-2a8c-4dfc-a669-2dd788361eae',
'7a0f37db-02bc-49c4-bfea-a03d3474f1f8',
'95d8d5c7-55fc-4c39-8a72-fc3aaec72c39',
'aa71a6e9-174b-4d8d-a2da-1a12d5e2ab2e',
'ac833f22-c8fa-4925-becc-dcf4dac81d3f',
'7fc9afd1-076f-4985-b412-4ea029fd2af7',
'ae299a18-c4b0-46c4-81e1-f9583e6954f6',
'cee97ef9-6e7c-4518-91b1-dfa3b467f5c4',
'51b3bf23-3cc3-4a16-b6d9-942c6e4f7573',
'0f376950-a25e-477a-93b8-d1d0ffbff716',
'ef0dfe55-c08c-4d9e-ad32-d6b6d5cbdac6'],
'status': {'code': 0, 'message': 'Done'}}
from=::ffff:10.40.1.66,56474, flow_id=32ba9d9d-32c2-4652-ac01-895e92123323,
vmId=116aa6eb-31a1-43db-9b1e-ad6e32fb9260 (api:54)
This means, as Nir said, that the host assumes the corresponding bitmap
also exists in QEMU.
The logs cover only a short period of time, so I cannot tell whether some
bitmap-destructive operation occurred on the VM (live merge, cold snapshot
creation, etc.).
Peter, is there any other log that may assist?
On Wed, 29 Jul 2020 at 11:28, Łukasz Kołaciński <l.kolacinski(a)storware.eu>
wrote:
Hello,
I am sending logs from vdsm; maybe they will help. I cannot find any useful
libvirt logs, and the file libvirt.log doesn't exist.
Best Regards,
Łukasz Kołaciński
Junior Java Developer
e-mail: l.kolacinski(a)storware.eu
------------------------------
*From:* Nir Soffer <nsoffer(a)redhat.com>
*Sent:* Wednesday, 29 July 2020 00:31
*To:* Łukasz Kołaciński <l.kolacinski(a)storware.eu>
*Cc:* Eyal Shenitzky <eshenitz(a)redhat.com>; Mateusz Maziarz <
m.maziarz(a)storware.eu>; Marcin Kubacki <m.kubacki(a)storware.eu>; Peter
Krempa <pkrempa(a)redhat.com>; devel <devel(a)ovirt.org>
*Subject:* Re: Issue with ovirt 4.4 after doing some incremental backups.
On Tue, Jul 28, 2020 at 4:05 PM Łukasz Kołaciński <
l.kolacinski(a)storware.eu> wrote:
Hello,
Hi Łukasz
I moved the discussion to devel(a)ovirt.org since it is more appropriate
for this issue.
After doing a few VM backups, something breaks and I am unable to perform
any operations. I cannot do incremental backups, and even full backups
don't work. This is the third time I have hit this issue. I don't know how
to fix it, so I am currently creating new VMs for testing purposes.
VDSM ovirt44-h2.storware.local command StartVmBackupVDS failed: Backup
Error: {'vm_id': '116aa6eb-31a1-43db-9b1e-ad6e32fb9260',
'backup': <vdsm.virt.backup.BackupConfig object at 0x7f42602bba20>,
'reason': "Error starting backup: internal error: unable to execute QEMU
command 'transaction': Dirty bitmap 'ef0dfe55-c08c-4d9e-ad32-d6b6d5cbdac6'
not found"}
This means that libvirt cannot find the dirty bitmap when starting the
backup.
When we start a backup, we get the list of checkpoints from libvirt and we
redefine all checkpoints. We assume that all redefined checkpoints have
a bitmap in qemu at the time of the redefine.
Peter, is this assumption correct?
If libvirt and engine agree on the existing checkpoints, we start the
backup.
In this case one of the bitmaps was missing, so the backup failed.
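To illustrate the kind of mismatch described here, below is a minimal Python
sketch, not vdsm's actual code. It assumes each checkpoint is backed by a
dirty bitmap with the same name, as the error message above suggests. The
function name and the sample bitmap list are hypothetical; only the
checkpoint IDs are taken from the log.

```python
def find_missing_bitmaps(checkpoint_ids, bitmap_names):
    """Return the checkpoint IDs that have no bitmap of the same name.

    The order of checkpoint_ids is preserved in the result.
    """
    existing = set(bitmap_names)
    return [cp for cp in checkpoint_ids if cp not in existing]


# Last three checkpoints reported by list_checkpoints in the log above.
checkpoints = [
    "51b3bf23-3cc3-4a16-b6d9-942c6e4f7573",
    "0f376950-a25e-477a-93b8-d1d0ffbff716",
    "ef0dfe55-c08c-4d9e-ad32-d6b6d5cbdac6",
]

# Hypothetical bitmap list: the bitmap for the last checkpoint is gone.
bitmaps = [
    "51b3bf23-3cc3-4a16-b6d9-942c6e4f7573",
    "0f376950-a25e-477a-93b8-d1d0ffbff716",
]

missing = find_missing_bitmaps(checkpoints, bitmaps)
print(missing)
```

A non-empty result would mean that an incremental backup based on those
checkpoints cannot start.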
We know about some flows that may cause loss of the bitmaps:
- copying disks (bitmaps are not copied yet)
- live storage migration (it copies the disks)
- deleting snapshots
- live migration may cause this, not tested yet
- unclean shutdown of the vm
- storage is not accessible when vm is terminated
Did you do any of these operations on the tested vm?
Some of these issues can never be fixed, like an unclean shutdown or a
storage issue when qemu tries to write the bitmaps to disk. So your backup
application must be able to recover from this error.
However, oVirt does not provide a useful error that enables recovery. We
have a special error for when a full backup is required, but it seems that
this error is not returned in this case, and instead we return an internal
error.
Also, since you cannot do a full backup after this error, I guess that
engine did not delete the checkpoint with the missing bitmap. This is not
surprising, since the error returned from vdsm is a generic error
(BackupError), so engine cannot tell the reason for the failure.
Did you check the backup events? What was the backup completion event?
See this example of how to get backup events:
https://github.com/oVirt/ovirt-engine-sdk/blob/4a143351fcd3cdb0df8c508889...
https://github.com/oVirt/ovirt-engine-sdk/blob/4a143351fcd3cdb0df8c508889...
Eyal, looking at the API docs:
http://ovirt.github.io/ovirt-engine-api-model/master/#types/event
Event code is an integer. This is not usable for detecting errors since
the value is not part of the API. We need an enum like:
http://ovirt.github.io/ovirt-engine-api-model/4.4/#types/image_transfer_p...
Peter, do we have a specific error code in libvirt for a missing bitmap?
We need this to pass a useful error to engine, and engine needs this error
to pass a useful error to the user.
The error seen here is generated by:
    flags = libvirt.VIR_DOMAIN_BACKUP_BEGIN_REUSE_EXTERNAL
    try:
        dom.backupBegin(backup_xml, checkpoint_xml, flags=flags)
    except libvirt.libvirtError as e:
        raise exception.BackupError(
            reason="Error starting backup: {}".format(e),
            vm_id=vm.id,
            backup=backup_cfg)
Looking in the documentation:
https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBackupBegin
There are no specified errors, so we cannot detect the reason for the
failure and return a meaningful error to our caller.
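Lacking a specific error code, one fragile stopgap would be to match the
error text. The Python sketch below assumes the message keeps the exact
wording seen in this report ("Dirty bitmap '...' not found"), which is not a
stable interface and may change between qemu or libvirt versions. The helper
name is hypothetical and this is not existing vdsm code.

```python
import re

# Matches the wording seen in this report; assumed, not guaranteed stable.
_MISSING_BITMAP = re.compile(r"Dirty bitmap '([^']+)' not found")


def missing_bitmap_from_error(message):
    """Return the bitmap name from a 'not found' error text, or None."""
    match = _MISSING_BITMAP.search(message)
    return match.group(1) if match else None


# Reason string copied from the StartVmBackupVDS failure above.
reason = (
    "Error starting backup: internal error: unable to execute QEMU command "
    "'transaction': Dirty bitmap 'ef0dfe55-c08c-4d9e-ad32-d6b6d5cbdac6' "
    "not found"
)

print(missing_bitmap_from_error(reason))
```

Any message that does not match would still have to be treated as an
unknown failure, so this does not replace a proper error code.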
Peter, how do you suggest we recover from internal errors? How can we
tell whether this is a temporary error that can succeed on the next
attempt, or an error that requires starting from a full backup?
Nir
--
Regards,
Eyal Shenitzky