
If I'm not mistaken, the heartbeat interval is set to 10 seconds by default. The command that times out is the one that queries the status of the VMs on a host. Is there any reason to suspect why that is taking so long? Does it happen on specific hosts?

On 11/03/15 18:40, Chris Adams wrote:
Once upon a time, Chris Adams <cma@cmadams.net> said:
2015-03-10 04:42:23,310 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.ListVDSCommand] (DefaultQuartzScheduler_Worker-40) [75b9e6d9] Command ListVDSCommand(HostName = node5, HostId = 8dfd0195-f386-4e16-9379-a5287221d5bd, vds=Host[node5,8dfd0195-f386-4e16-9379-a5287221d5bd]) execution failed. Exception: VDSNetworkException: VDSGenericException: VDSNetworkException: Heartbeat exeeded
I'm trying to dig into this on my own (without knowing oVirt's internals). Can somebody tell me what the timeout is for dispatching commands to vdsm? Different things happen when the engine thinks a node has "gone away", but they all start with the same org.ovirt.engine.core.vdsbroker.vdsbroker prefix (and involve a network timeout of some type).
I don't see anything in common in any of the logs at the time of the error, so I'm trying to work back to when the request was sent (but I don't know how long the engine waited before it timed out and logged the error).
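To narrow things down, a minimal sketch like the following could pull the timestamp, host name and exception out of engine.log lines shaped like the ListVDSCommand error quoted above, count failures per host (to answer whether specific hosts are affected), and step back from the logged time to estimate when the request went out. It assumes the 10-second heartbeat figure recalled above is the relevant timeout, which is not verified, and that every matching line follows the same layout; the function names are hypothetical helpers, not part of oVirt.

```python
import re
from collections import Counter
from datetime import datetime, timedelta

# Matches engine.log ERROR lines shaped like the quoted ListVDSCommand failure.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) ERROR .*?"
    r"HostName = (?P<host>[^,]+),.*?Exception: (?P<exc>.*)$"
)

# ASSUMPTION: 10 s is the default heartbeat interval recalled in the thread,
# not a value confirmed from the engine configuration.
ASSUMED_TIMEOUT = timedelta(seconds=10)


def parse_failure(line):
    """Return (logged_at, host, exception) for a matching error line, else None."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    # %f accepts the 3-digit millisecond field and pads it to microseconds.
    logged_at = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
    return logged_at, m.group("host").strip(), m.group("exc").strip()


def search_window(logged_at, timeout=ASSUMED_TIMEOUT):
    """Interval in which the failed request was probably sent, if the timeout is right."""
    return logged_at - timeout, logged_at


def failures_per_host(lines):
    """Count matching error lines per host name."""
    counts = Counter()
    for line in lines:
        parsed = parse_failure(line)
        if parsed is not None:
            counts[parsed[1]] += 1
    return counts
```

For the node5 error above, `search_window` would point at 04:42:13,310 to 04:42:23,310, which is the span to compare against the vdsm and system logs on that host. If the real timeout turns out to be different, only `ASSUMED_TIMEOUT` needs changing.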