2016-09-30 15:35 GMT+02:00 Michal Skrivanek <michal.skrivanek@redhat.com>:


That is a very low-level error, really pointing at HW issues. It may or may not be detected by memtest… but I would give it a try.


I left memtest86 running for 2 days and no errors were detected :(
 
The only difference with this host (vmhost01) is that it was the first host installed in my self-hosted engine installation. But I have already reinstalled it from the GUI, and meanwhile I've upgraded from 4.0.3 to 4.0.4.

Does it happen only for the big 96GB VM? The others which you said are working, are they all small?
It might be worth trying other system stability tests, playing with safer/slower settings in the BIOS, using a lower CPU cluster, etc.


Yep, it happens only for the 96GB VM. Other VMs with less RAM (16GB, for example) can be created on or migrated to that host flawlessly. I'll try to play a little with the BIOS settings, but otherwise I'll have the HW replaced. I was only trying to rule out possible oVirt SW problems due to that host being the first one I deployed (from the CLI) when I installed the cluster.

Thanks!

--
Davide Ferrari
Senior Systems Engineer