[ovirt-users] Communication errors between engine and nodes?

Chris Adams cma at cmadams.net
Fri Mar 13 13:39:02 UTC 2015


Once upon a time, Roel de Rooy <RdeRooy at motto.nl> said:
> We are observing the same thing with our oVirt environment.
> At random moments (could be a couple of times a day, once a day, or even once every couple of days), we receive the "VDSNetworkException" message on one of our nodes.
> Haven't seen the "heartbeat exceeded" message, but it could be that I overlooked it in our logs.
> On some rare occasions, we also see "Host cannot access the Storage Domain(s) <UNKNOWN> attached to the Data Center" within the GUI.
> 
> VMs continue to run normally, and most of the time the nodes are back in the "UP" state within the same minute.
> 
> We still haven't found the root cause of this issue.
> Our engine is CentOS 6.6 based and it's happening with both CentOS 6 and Fedora 20 nodes.
> We are using an LACP bond of 1Gbit ports for our management network.
> 
> As we didn't see any reports about this before, we are currently looking into whether something network-related is causing this.

I just opened a BZ on it (since it isn't just me):

https://bugzilla.redhat.com/show_bug.cgi?id=1201779

My cluster went a couple of days without hitting this (as soon as I
posted to the list of course), but then it happened several times
overnight.  Interestingly, one error logged was communicating with the
node currently running my hosted engine.  That should rule out external
network (e.g. switch and such) issues, as those packets should not have
left the physical box.

-- 
Chris Adams <cma at cmadams.net>


