Hello,
I have ovirt-engine version 4.5.6-1.el9 and nodes on version 4.5.5.
We have about 200 VMs in the environment. So far, backups have been performed using
snapshots, the oVirt API and AWX.
The problem started when, during backups, some machines became unresponsive (the console
could not be reached, and it happened randomly), preallocated disks became
thin-provisioned, and the snapshot could not be deleted.
We decided to bypass all the scripts, create the snapshot directly in ovirt-engine, and
then delete it to trigger the merge.
In many cases these tasks completed successfully, but at random the process hung again
and the machine became unresponsive.
I thought it might be a performance problem (the network to the storage array, the array
itself, etc.), so we set up a dev environment on different hardware (different physical
servers and different iSCSI storage). I copied one of the sample machines that had problems,
took a snapshot again and merged it by deleting the snapshot from the ovirt-engine
interface. This worked a few times, so I decided to run another test.
I took a snapshot of the test VM, logged into the VM, used a file of about 16 GB and also
copied it to another location on the disk, and then tried to merge the snapshot. Again the
VM became unresponsive and the snapshot could not be deleted.
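For anyone trying to reproduce this, here is a minimal sketch that approximates that
in-guest write load: it writes a large file, then copies it elsewhere on the same disk. It
assumes a Linux guest with Python 3. The function name, paths and chunk size are
hypothetical placeholders and not the exact procedure used in the test above.

```python
import os
import shutil


def write_and_copy(src_path: str, dst_path: str, size_bytes: int,
                   chunk_size: int = 4 * 1024 * 1024) -> int:
    """Write size_bytes of random data to src_path, then copy it to dst_path.

    Returns the size in bytes of the copy. Both files are fsync'ed so the
    writes actually reach the VM's (snapshotted) virtual disk.
    """
    remaining = size_bytes
    with open(src_path, "wb") as f:
        while remaining > 0:
            n = min(chunk_size, remaining)
            f.write(os.urandom(n))  # random data so nothing compresses or dedupes
            remaining -= n
        f.flush()
        os.fsync(f.fileno())

    # Second copy on the same disk, mirroring "copied it to another place".
    shutil.copyfile(src_path, dst_path)
    with open(dst_path, "rb+") as f:
        os.fsync(f.fileno())
    return os.path.getsize(dst_path)
```

Usage (inside the guest, after the snapshot is taken and before deleting it from
ovirt-engine): `write_and_copy("/var/tmp/big.bin", "/var/tmp/big-copy.bin", 16 * 1024**3)`.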
Has anyone encountered this problem, and is there a way to solve it?
I am attaching the log from /var/log/ovirt-engine/engine.log:
https://pastebin.pl/view/e6e01717