Update: perhaps I have discovered a bug somewhere?

I started another export after hours (it's very early in the morning right now, and I can tolerate a little downtime on this VM). I saw the same symptoms, but this time I just left it alone. I waited about 45 minutes with no progress.

I then SSH'd to the NFS destination (also on the 10Gbps storage network) and ran tcpdump, but I didn't see any traffic coming across the wire.

So I then powered off my VM, and I immediately began to see a new backup image appear in my NFS export. 

I wonder if the VM was trying to snapshot its memory and there wasn't enough free on the host? The VM has 16GB of RAM, and there are multiple VMs on that host (although the host itself has 64GB of physical RAM, so there should have been plenty).

Sent with ProtonMail Secure Email.

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Friday, September 3rd, 2021 at 4:10 AM, David White via Users <users@ovirt.org> wrote:
In this particular case, I have 1 (one) 250GB virtual disk.

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Tuesday, August 31st, 2021 at 11:21 PM, Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
Hi David,

How big are your VM disks?

I suppose you have several very large ones.


Best Regards,
Strahil Nikolov


On Thu, Aug 26, 2021 at 3:27, David White via Users <users@ovirt.org> wrote:
I have an HCI cluster running on Gluster storage. I exposed an NFS share into oVirt as a storage domain so that I could clone all of my VMs (I'm preparing to move physically to a new datacenter). I got 3-4 VMs cloned perfectly fine yesterday. But then this evening, I tried to clone a big VM, and it caused the disk to lock up. The VM went totally unresponsive, and I didn't see a way to cancel the clone. Nagios NRPE (on the client VM) was reporting a server load above 65, but I was never able to establish an SSH connection.

Eventually, I tried restarting the ovirt-engine, per https://access.redhat.com/solutions/396753. When that didn't work, I powered down the VM completely, but the disks were still locked. I then tried to put the storage domain into maintenance mode, but that wound up putting the entire domain into a "locked" state. Eventually, the disks unlocked, and I was able to power the VM back on.

From start to finish, my VM was down for about 45 minutes, including the time when NRPE was still sending data to Nagios.

What logs should I look at, how can I troubleshoot what went wrong here, and how can I prevent this from happening again?

List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/ASEENELT4TRTXQ7MF4FKB6L75D3H75AN/