Can you check with a test VM whether this happens after a virtual machine migration?
What are your mount options for the storage domain?
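The options actually in effect can be read from /proc/mounts on each host. Below is a minimal sketch, assuming a Linux host and that the storage domain is mounted with filesystem type nfs or nfs4. The function name nfs_mounts is only illustrative and is not part of oVirt.

```python
from pathlib import Path


def nfs_mounts(mounts_text):
    """Return (source, mountpoint, fstype, options) for every NFS entry.

    mounts_text is the content of /proc/mounts, where each line has the form:
    source mountpoint fstype options dump pass
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[2] in ("nfs", "nfs4"):
            found.append((fields[0], fields[1], fields[2], fields[3].split(",")))
    return found


if __name__ == "__main__":
    mounts_file = Path("/proc/mounts")
    if mounts_file.exists():
        for source, mountpoint, fstype, options in nfs_mounts(mounts_file.read_text()):
            print(f"{source} on {mountpoint} ({fstype})")
            print("  " + ",".join(options))
```

Running it on both hosts and comparing the output shows whether they mount the storage domain with the same options.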
Best Regards,
Strahil Nikolov
On Saturday, 28 November 2020, 18:25:15 GMT+2, Vinícius Ferrão via Users
<users(a)ovirt.org> wrote:
Hello,
I’m trying to find out why an oVirt 4.4.3 cluster with two hosts and NFS shared storage on
TrueNAS 12.0 keeps getting XFS corruption inside the VMs.
VMs get corrupted seemingly at random. Sometimes the VM halts; sometimes the corruption is
silent, and after a reboot the system is unable to boot due to “corruption of in-memory
data detected”. Sometimes the corrupted data is “all zeroes”, sometimes there is data
there. In extreme cases XFS superblock 0 gets corrupted and the system cannot even
detect an XFS partition anymore, since the XFS magic number in the first blocks of
the virtual disk is corrupted.
This has been happening for a month now. We had to roll back to some backups, and I no
longer trust the state of the VMs.
Using xfs_db I can see that some VMs have corrupted superblocks while the VM is still up.
One in particular had sb0 corrupted, so I knew the machine would be gone as soon as it
was rebooted, and that’s exactly what happened.
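A quick first-pass check of sb0 can also be scripted. The sketch below only looks at the first sector of a filesystem: an intact XFS primary superblock starts with the four magic bytes "XFSB", and the "all zeroes" case is reported separately. It does not validate the rest of the superblock the way xfs_db does. The function names are illustrative, and the path passed to check_path is whatever block device or raw image holds the filesystem.

```python
XFS_MAGIC = b"XFSB"  # first four bytes of an XFS primary superblock (sb0)


def classify_sb0(first_sector):
    """Classify the first bytes of a filesystem as seen at offset 0."""
    if len(first_sector) < len(XFS_MAGIC):
        return "short read"
    if first_sector[:4] == XFS_MAGIC:
        return "xfs magic present"
    if not any(first_sector):
        return "all zeroes"
    return "unexpected data"


def check_path(path, size=512):
    """Read the first `size` bytes of a device or image and classify them."""
    with open(path, "rb") as handle:
        return classify_sb0(handle.read(size))
```

Reading a block device this way usually requires root. A result other than "xfs magic present" on a filesystem that is still mounted matches the situation described above, where the VM is up but will not survive a reboot.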
Another day I was installing a new CentOS 8 VM for unrelated reasons, and after running
dnf -y update and a reboot the VM was corrupted and needed an XFS repair. That was an
extreme case.
So I looked at the TrueNAS logs, and there is apparently nothing wrong with the system.
No errors are logged in dmesg, there is nothing in /var/log/messages, and there are no
errors on the zpools, not even after scrub operations. We have been monitoring the switch,
a Catalyst 2960X, and all its interfaces. There are no links going “up and down” and zero
errors on all interfaces (we have a 4-port LACP on the TrueNAS side and a 2-port LACP on
each host); everything seems to be fine. The only metric that I was unable to get is
dropped packets, but I don’t know if this can be an issue or not.
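On the oVirt hosts, at least, drop counters are available from the kernel. The following is a minimal sketch that parses /proc/net/dev, assuming a Linux host; it does not cover the switch or the TrueNAS side. The function name drop_counters is illustrative.

```python
from pathlib import Path


def drop_counters(proc_net_dev_text):
    """Return per-interface error and drop counters from /proc/net/dev text.

    After two header lines, each line is "iface: <8 receive> <8 transmit>".
    Receive fields:  bytes packets errs drop fifo frame compressed multicast
    Transmit fields: bytes packets errs drop fifo colls carrier compressed
    """
    counters = {}
    for line in proc_net_dev_text.splitlines()[2:]:
        name, _, rest = line.partition(":")
        values = rest.split()
        if len(values) < 16:
            continue  # skip anything that is not a full counter line
        counters[name.strip()] = {
            "rx_errs": int(values[2]),
            "rx_drop": int(values[3]),
            "tx_errs": int(values[10]),
            "tx_drop": int(values[11]),
        }
    return counters


if __name__ == "__main__":
    stats_file = Path("/proc/net/dev")
    if stats_file.exists():
        for iface, stats in sorted(drop_counters(stats_file.read_text()).items()):
            print(iface, stats)
```

The counters are cumulative since boot, so taking two samples some minutes apart during storage traffic says more than a single reading.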
Finally, on oVirt, I can’t find anything either. I looked at /var/log/messages and
/var/log/sanlock.log but found nothing suspicious.
Is anyone out there experiencing this? Our VMs are mainly CentOS 7/8 with XFS. There are
3 Windows VMs that do not seem to be affected; everything else is affected.
Thanks all.