
Setup: oVirt 3.5.1 w/hosted engine, nodes: CentOS 7, engine: CentOS 6

I am periodically seeing errors like this in my engine web UI:

  2015-Mar-10, 04:42 Host node5 is not responding. It will stay in Connecting state for a grace period of 89 seconds and after that an attempt to fence the host will be issued.
  2015-Mar-10, 04:42 Host node3 from cluster c1 was chosen as a proxy to execute Status command on Host node5.
  2015-Mar-10, 04:42 Status of host node5 was set to Up.
  2015-Mar-10, 04:42 Host node5 power management was verified successfully.

The engine.log file has this:

  2015-03-10 04:42:23,310 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.ListVDSCommand] (DefaultQuartzScheduler_Worker-40) [75b9e6d9] Command ListVDSCommand(HostName = node5, HostId = 8dfd0195-f386-4e16-9379-a5287221d5bd, vds=Host[node5,8dfd0195-f386-4e16-9379-a5287221d5bd]) execution failed. Exception: VDSNetworkException: VDSGenericException: VDSNetworkException: Heartbeat exeeded

This seems to happen with a random node sometimes. The VMs on the node stay up and don't appear to experience any problem. I can't find any sign of a network problem on either the node, the engine, the node hosting the engine, or the switches. I don't see anything obvious in the logs on any of the systems involved either.

The node network setup is VLANs on top of a bond of two NICs, each connected to a different switch in a two-switch stack.

-- 
Chris Adams <cma@cmadams.net>