Once upon a time, Roel de Rooy <RdeRooy(a)motto.nl> said:
We are observing the same thing with our oVirt environment.
At random moments (could be a couple of times a day, once a day or even once every
couple of days), we receive the "VDSNetworkException" message on one of our
nodes.
Haven't seen the "heartbeat exceeded" message, but could be that I
overlooked it within our logs.
On rare occasions, we also see "Host cannot access the Storage Domain(s)
<UNKNOWN> attached to the Data Center", within the GUI.
VMs continue to run normally, and most of the time the nodes are back in
"UP" state within the same minute.
We still haven't found the root cause of this issue.
Our engine is CentOS 6.6 based and it's happening with both CentOS 6 and Fedora 20
nodes.
We are using an LACP bond of 1Gbit ports for our management network.
As we hadn't seen any reports about this before, we are currently checking whether
something network-related is causing this.
I just opened a BZ on it (since it isn't just me):
https://bugzilla.redhat.com/show_bug.cgi?id=1201779
My cluster went a couple of days without hitting this (as soon as I
posted to the list of course), but then it happened several times
overnight. Interestingly, one error logged was communicating with the
node currently running my hosted engine. That should rule out external
network (e.g. switch and such) issues, as those packets should not have
left the physical box.
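
For anyone who wants to check whether they overlooked these messages in their
own logs, here is a rough sketch that tallies them per hour. It is only an
illustration: the function name count_by_hour is made up, the search strings
are copied from the messages quoted in this thread (the exact wording in a
real log may differ), and it assumes each log line starts with a timestamp of
the form "YYYY-MM-DD HH:MM:SS".

```python
import re
from collections import Counter

# Substrings taken from the messages quoted in this thread; the exact wording
# in an actual log may differ, so adjust as needed.
PATTERNS = (
    "VDSNetworkException",
    "heartbeat exceeded",
    "cannot access the Storage Domain",
)

# Assumption: each log line begins with "YYYY-MM-DD HH:MM:SS".
# The capture group keeps the date and hour only.
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}):\d{2}:\d{2}")


def count_by_hour(lines):
    """Count case-insensitive pattern matches, keyed by (date-hour, pattern)."""
    counts = Counter()
    for line in lines:
        match = TS_RE.match(line)
        if not match:
            continue  # skip lines without a leading timestamp (e.g. stack traces)
        lowered = line.lower()
        for pattern in PATTERNS:
            if pattern.lower() in lowered:
                counts[(match.group(1), pattern)] += 1
    return counts
```

To use it, pass an open log file (on the engine this is normally
/var/log/ovirt-engine/engine.log) to count_by_hour() and print the sorted
items. Hours where the errors cluster can then be compared against network or
storage events on the hosts.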
--
Chris Adams <cma(a)cmadams.net>