The save operation is going at a snail's pace, though.
Using "watch du -skh", I counted about 5-7 seconds per .1 GB (1/10 of 1GB).
It's a virtual disk, but I'm using over 200GB... so at this rate, it'll take a
very long time.
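For reference, the back-of-envelope math (a rough sketch, assuming the midpoint
of that observation, ~0.1 GB per 6 seconds, i.e. ~17 MB/s, holds steady):

    # ~200 GB at ~17 MB/s:
    awk 'BEGIN { printf "%.1f hours\n", 200 * 1024 / 17 / 3600 }'
    # => roughly 3.3 hours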
I wonder if Pascal is on to something, and the export is happening over the
frontend 1Gbps network?
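One way to check (just a sketch; "em1" and "em2" are placeholders for the host's
actual frontend and storage NICs) would be to watch the per-interface TX
counters while the export runs:

    # whichever counter climbs is the NIC carrying the export
    watch -n1 'grep -H . /sys/class/net/em1/statistics/tx_bytes /sys/class/net/em2/statistics/tx_bytes'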
I'm going to cancel this operation, as the VM has now been down for close to an hour.
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Friday, September 3rd, 2021 at 4:33 AM, David White <dmwhite823(a)protonmail.com>
wrote:
Update... perhaps I have discovered a bug somewhere?
I started another export after hours (it's very early morning
hours right now, and I can tolerate a little downtime on this VM). I had the same
symptoms, but this time, I just left it alone. I waited about 45 minutes with no
progress.
I then ssh'd to the NFS destination (also on the 10Gbps storage
network) and ran tcpdump; I didn't see any traffic coming across the wire.
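The capture was along these lines (a sketch; "eth0" stands in for the actual
storage-network interface, and 2049 is the standard NFS port):

    # on the NFS server: watch for NFS traffic arriving over the wire
    tcpdump -ni eth0 port 2049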
So I then powered off my VM, and I immediately began to see a new
backup image appear in my NFS export.
I wonder if the export was trying to snapshot the VM's memory and there wasn't
enough free RAM on the host? The VM has 16GB of RAM, and there are multiple VMs
on that host (although the host itself has 64GB of physical RAM, so there
should have been plenty).
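Next time I'll watch the host's memory and the vdsm log while the export runs
(a sketch; /var/log/vdsm/vdsm.log is the usual vdsm log location on oVirt
hosts):

    # on the host, during the export:
    free -h                           # headroom for a memory snapshot?
    tail -f /var/log/vdsm/vdsm.log    # watch for snapshot/export errors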
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Friday, September 3rd, 2021 at 4:10 AM, David White via Users <users(a)ovirt.org>
wrote:
> In this particular case, I have 1 (one) 250GB virtual disk.
>
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Tuesday, August 31st, 2021 at 11:21 PM, Strahil Nikolov
<hunter86_bg(a)yahoo.com> wrote:
>
> > Hi David,
> >
> > How big are your VM disks?
> >
> > I suppose you have several very large ones.
> >
> > Best Regards,
> > Strahil Nikolov
> >
> > > On Thu, Aug 26, 2021 at 3:27, David White via Users
> > > <users(a)ovirt.org> wrote:
> > > I have an HCI cluster running on Gluster storage. I exposed an NFS share
into oVirt as a storage domain so that I could clone all of my VMs (I'm preparing to
move physically to a new datacenter). I got 3-4 VMs cloned perfectly fine yesterday. But
then this evening, I tried to clone a big VM, and it caused the disk to lock up. The VM
went totally unresponsive, and I didn't see a way to cancel the clone. Nagios NRPE (on
the client VM) was reporting a server load over 65, but I was never able to establish an
SSH connection.
> > >
> > > Eventually, I tried restarting the ovirt-engine, per
https://access.redhat.com/solutions/396753. When that didn't work, I powered down the
VM completely. But the disks were still locked. So I then tried to put the storage domain
into maintenance mode, but that wound up putting the entire domain into a
"locked" state. Finally, eventually, the disks unlocked, and I was able to power
the VM back online.
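> > >
> > > For reference, the restart itself was the standard systemd restart on the
> > > engine VM, roughly:
> > >
> > >     systemctl restart ovirt-engine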
> > >
> > > From start to finish, my VM was down for about 45
minutes, including the time when NRPE was still sending data to Nagios.
> > >
> > > What logs should I look at, and how can I troubleshoot
what went wrong here, and hopefully prevent this from happening again?