Hello

Just for the record: after I had that server replaced (only motherboard + RAM + controller, same disks), everything now works fine, so it was definitely a hardware issue.

Thanks everyone for the troubleshooting help!


2016-10-04 18:06 GMT+02:00 Michal Skrivanek <michal.skrivanek@redhat.com>:

On 3 Oct 2016, at 10:39, Davide Ferrari <davide@billymob.com> wrote:



2016-09-30 15:35 GMT+02:00 Michal Skrivanek <michal.skrivanek@redhat.com>:


that is a very low level error really pointing at HW issues. It may or may not be detected by memtest…but I would give it a try


I left memtest86 running for 2 days and no error detected :(
 
The only difference with this host (vmhost01) is that it was the first host installed in my self-hosted engine installation. But I have already reinstalled it from the GUI, and meanwhile I've upgraded from 4.0.3 to 4.0.4.

does it happen only for the big 96GB VM? The others which you said are working, are they all small?
Might be worth trying other system stability tests, playing with safer/slower settings in BIOS, using a lower CPU cluster, etc.


Yep, it happens only for the 96GB VM. Other VMs with less RAM (16GB for example) can be created on or migrated to that host flawlessly. I'll try to play a little with the BIOS settings, but otherwise I'll have the HW replaced. I was only trying to rule out possible oVirt SW problems due to that host being the first one I deployed (from CLI) when I installed the cluster.

I understand. Unfortunately, it really does look like some sort of incompatibility rather than a SW issue :/


Thanks!

--
Davide Ferrari
Senior Systems Engineer

--
Davide Ferrari
Senior Systems Engineer