Hello,

Anyone able to help on this?

Thanks

On Thu, Sep 27, 2018 at 3:52 PM Germano Veit Michel <germano@redhat.com> wrote:
Hi,

I've been trying to understand why we see some heartbeat exceeded messages with a last response a very short time ago, while the vdsHeartbeatInSeconds option would allow for much more before triggering the disconnect.

The vdc_option vdsHeartbeatInSeconds is set to 30 by default. It is used here in the engine, the propagates to vdsm-jsonrpc-java

int heartbeat = Config.<Integer> getValue(ConfigValues.vdsHeartbeatInSeconds) * 1000;

TransportFactory.createVdsServer() 
    -> JsonRpcUtils.createStompClient() 
        -> StompClientPolicy 
           ClientPolicy
            -> createClient() 
                -> getJsonClient()
                    -> setClientPolicy()
                    -> setRetryPolicy()

So I understand ClientPolicy.incomingHearbeat = vdsHeartbeatInSeconds*1000 = 30000ms

If this is the case, why do we see messages such as this, if isIncomingHeartbeatExceeded() shouldn't return True in case the last heartbeat was less than 30000ms ago?
2018-09-17 10:46:49,891+03 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Heartbeat exceeded for host '192.168.7.235', last response arrived 1702 ms ago.

        if (!this.isInInit() && this.policy.isIncomingHeartbeat() && this.isIncomingHeartbeatExceeded()) {
            String msg = String.format("Heartbeat exceeded for host '%s', last response arrived %s ms ago.",
                    getHostname(),
                    getHeartbeatTime());
            log.error(msg);
            this.disconnect(msg);
        }

    private long getHeartbeatTime() {                                                                                                                                                          
        return System.currentTimeMillis() - this.lastIncomingHeartbeat;                                                                                                                        
    }  

    private boolean isIncomingHeartbeatExceeded() {
        return this.lastIncomingHeartbeat + this.policy.getIncomingHeartbeat() < System.currentTimeMillis();
    }

    public ClientPolicy(int retryTimeOut, int retryNumber, int incomingHeartbeat,                                                                                                              
            int outgoingHeartbeat, List<Class<? extends Exception>> retryableExceptions) {                                                                                                     
        this.retryNumber = retryNumber;                                                                                                                                                        
        this.retryTimeOut = retryTimeOut;                                                                                                                                                      
        setIncomingHeartbeat(incomingHeartbeat);                                                                                                                                               
        setOutgoingHeartbeat(outgoingHeartbeat);                                                                                                                                               
        this.exceptions = Collections.unmodifiableList(retryableExceptions);                                                                                                                   
    } 

I used byteman to print the value stored in policy.incomingHeartbeat and it is:

If vdsHeartbeatInSeconds = 45
2018-09-27 15:17:13,673+10 INFO  [stdout] (SSL Stomp Reactor) policy.incomingHeartbeat: 33750

If vdsHeartbeatInSeconds = 30 (default)
2018-09-27 15:21:51,099+10 INFO  [stdout] (SSL Stomp Reactor) policy.incomingHeartbeat: 22500

I was expecting 45000 and 30000. But even with 33750 and 22500, I dont understand why we see "Heartbeat exceeded" and disconnection if the last heartbeat arrived just 1700ms ago.

I'm clearly missing something. Could someone please put some light on what is the missing piece?

Thanks,
Germano