Hi,
I've been trying to understand why we see some heartbeat exceeded messages
with a last response a very short time ago, while the vdsHeartbeatInSeconds
option would allow for much more before triggering the disconnect.
The vdc_option vdsHeartbeatInSeconds is set to 30 by default. It is used
here in the engine, the propagates to vdsm-jsonrpc-java
int heartbeat = Config.<Integer>
getValue(ConfigValues.vdsHeartbeatInSeconds) * 1000;
TransportFactory.createVdsServer()
-> JsonRpcUtils.createStompClient()
-> StompClientPolicy
ClientPolicy
-> createClient()
-> getJsonClient()
-> setClientPolicy()
-> setRetryPolicy()
So I understand ClientPolicy.incomingHearbeat = vdsHeartbeatInSeconds*1000
= 30000ms
If this is the case, why do we see messages such as this, if
isIncomingHeartbeatExceeded() shouldn't return True in case the last
heartbeat was less than 30000ms ago?
2018-09-17 10:46:49,891+03 ERROR
[org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor)
[] Heartbeat exceeded for host '192.168.7.235', last response arrived 1702
ms ago.
if (!this.isInInit() && this.policy.isIncomingHeartbeat() &&
this.isIncomingHeartbeatExceeded()) {
String msg = String.format("Heartbeat exceeded for host '%s',
last response arrived %s ms ago.",
getHostname(),
getHeartbeatTime());
log.error(msg);
this.disconnect(msg);
}
private long getHeartbeatTime() {
return System.currentTimeMillis() - this.lastIncomingHeartbeat;
}
private boolean isIncomingHeartbeatExceeded() {
return this.lastIncomingHeartbeat +
this.policy.getIncomingHeartbeat() < System.currentTimeMillis();
}
public ClientPolicy(int retryTimeOut, int retryNumber, int
incomingHeartbeat,
int outgoingHeartbeat, List<Class<? extends Exception>>
retryableExceptions) {
this.retryNumber = retryNumber;
this.retryTimeOut = retryTimeOut;
setIncomingHeartbeat(incomingHeartbeat);
setOutgoingHeartbeat(outgoingHeartbeat);
this.exceptions =
Collections.unmodifiableList(retryableExceptions);
}
I used byteman to print the value stored in policy.incomingHeartbeat and it
is:
If vdsHeartbeatInSeconds = 45
2018-09-27 15:17:13,673+10 INFO [stdout] (SSL Stomp Reactor)
policy.incomingHeartbeat: 33750
If vdsHeartbeatInSeconds = 30 (default)
2018-09-27 15:21:51,099+10 INFO [stdout] (SSL Stomp Reactor)
policy.incomingHeartbeat: 22500
I was expecting 45000 and 30000. But even with 33750 and 22500, I dont
understand why we see "Heartbeat exceeded" and disconnection if the last
heartbeat arrived just 1700ms ago.
I'm clearly missing something. Could someone please put some light on what
is the missing piece?
Thanks,
Germano