On Tue, Mar 19, 2019 at 1:32 PM Juhani Rautiainen <juhani.rautiainen@gmail.com> wrote:
On Tue, Mar 19, 2019 at 1:33 PM Juhani Rautiainen
<juhani.rautiainen@gmail.com> wrote:
>
> On Tue, Mar 19, 2019 at 12:46 PM Juhani Rautiainen
>
> It seems that either our firewall is not responding to pings or
> something else is wrong. This can be seen in the broker.log.
> The curious thing is that the reboot happens even when ping comes back
> within a couple of seconds. Is there a timeout in ping, or does it fire
> them in quick succession?

I don't know much Python, but I think there is a problem with
broker/ping.py. I noticed that these ping failures happen every
fifteen minutes:

[root@ovirt01 ~]# grep Failed /var/log/ovirt-hosted-engine-ha/broker.log
Thread-1::WARNING::2019-03-19
14:04:44,898::ping::63::ping.Ping::(action) Failed to ping 10.168.8.1,
(4 out of 5)
Thread-1::WARNING::2019-03-19
14:19:38,891::ping::63::ping.Ping::(action) Failed to ping 10.168.8.1,
(4 out of 5)

I monitored the firewall and the network traffic on the host, and ping
works, but ping.py somehow thinks that it did not get replies. I can't
see anything obvious in the code. But this is the tcpdump output from
the time frame of that last failure:

14:19:22.598518 IP ovirt01.virt.local > gateway: ICMP echo request, id
19055, seq 1, length 64
14:19:22.598705 IP gateway > ovirt01.virt.local: ICMP echo reply, id
19055, seq 1, length 64
14:19:23.126800 IP ovirt01.virt.local > gateway: ICMP echo request, id
19056, seq 1, length 64
14:19:23.126978 IP gateway > ovirt01.virt.local: ICMP echo reply, id
19056, seq 1, length 64
14:19:23.653544 IP ovirt01.virt.local > gateway: ICMP echo request, id
19057, seq 1, length 64
14:19:23.653731 IP gateway > ovirt01.virt.local: ICMP echo reply, id
19057, seq 1, length 64
14:19:24.180846 IP ovirt01.virt.local > gateway: ICMP echo request, id
19058, seq 1, length 64
14:19:24.181042 IP gateway > ovirt01.virt.local: ICMP echo reply, id
19058, seq 1, length 64
14:19:24.708083 IP ovirt01.virt.local > gateway: ICMP echo request, id
19065, seq 1, length 64
14:19:24.708274 IP gateway > ovirt01.virt.local: ICMP echo reply, id
19065, seq 1, length 64
14:19:32.743986 IP ovirt01.virt.local > gateway: ICMP echo request, id
19141, seq 1, length 64
14:19:35.160398 IP gateway > ovirt01.virt.local: ICMP echo reply, id
19141, seq 1, length 64
14:19:35.271171 IP ovirt01.virt.local > gateway: ICMP echo request, id
19152, seq 1, length 64
14:19:35.365315 IP gateway > ovirt01.virt.local: ICMP echo reply, id
19152, seq 1, length 64
14:19:35.892716 IP ovirt01.virt.local > gateway: ICMP echo request, id
19154, seq 1, length 64
14:19:36.002087 IP gateway > ovirt01.virt.local: ICMP echo reply, id
19154, seq 1, length 64
14:19:36.529263 IP ovirt01.virt.local > gateway: ICMP echo request, id
19156, seq 1, length 64
14:19:38.359281 IP gateway > ovirt01.virt.local: ICMP echo reply, id
19156, seq 1, length 64
14:19:38.887231 IP ovirt01.virt.local > gateway: ICMP echo request, id
19201, seq 1, length 64
14:19:38.889774 IP gateway > ovirt01.virt.local: ICMP echo reply, id
19201, seq 1, length 64
14:19:42.923684 IP ovirt01.virt.local > gateway: ICMP echo request, id
19234, seq 1, length 64
14:19:42.923951 IP gateway > ovirt01.virt.local: ICMP echo reply, id
19234, seq 1, length 64
14:19:43.450788 IP ovirt01.virt.local > gateway: ICMP echo request, id
19235, seq 1, length 64
14:19:43.450968 IP gateway > ovirt01.virt.local: ICMP echo reply, id
19235, seq 1, length 64
14:19:43.977791 IP ovirt01.virt.local > gateway: ICMP echo request, id
19237, seq 1, length 64
14:19:43.977965 IP gateway > ovirt01.virt.local: ICMP echo reply, id
19237, seq 1, length 64
14:19:44.504541 IP ovirt01.virt.local > gateway: ICMP echo request, id
19238, seq 1, length 64
14:19:44.504715 IP gateway > ovirt01.virt.local: ICMP echo reply, id
19238, seq 1, length 64
14:19:45.031570 IP ovirt01.virt.local > gateway: ICMP echo request, id
19244, seq 1, length 64
14:19:45.031752 IP gateway > ovirt01.virt.local: ICMP echo reply, id
19244, seq 1, length 64

No failed pings to be seen. So how does ping.py decide that 4 out of 5 failed?
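
One way to double-check a capture like the one above is to pair each echo request with its reply by ICMP id and look at how long the reply took, since a reply that arrives after ping's own timeout would still show up in tcpdump. The following is only a sketch: the function names are made up for illustration, the 2-second default is borrowed from the `-W 2` used in the test loop suggested in the reply, and it assumes all timestamps fall on the same day.

```python
import re

# Matches one tcpdump ICMP echo line; \s+ also spans the line wraps
# that the mail client introduced into the pasted output.
PACKET = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}\.\d+)\s+IP\s+\S+\s+>\s+\S+:\s+ICMP\s+echo\s+"
    r"(request|reply),\s+id\s+(\d+),\s+seq\s+(\d+)"
)

# Two request/reply pairs copied from the capture above.
SAMPLE = """\
14:19:24.708083 IP ovirt01.virt.local > gateway: ICMP echo request, id
19065, seq 1, length 64
14:19:24.708274 IP gateway > ovirt01.virt.local: ICMP echo reply, id
19065, seq 1, length 64
14:19:32.743986 IP ovirt01.virt.local > gateway: ICMP echo request, id
19141, seq 1, length 64
14:19:35.160398 IP gateway > ovirt01.virt.local: ICMP echo reply, id
19141, seq 1, length 64
"""


def reply_delays(capture):
    """Map (icmp id, seq) to the seconds between request and reply."""
    pending = {}
    delays = {}
    for hh, mm, ss, kind, icmp_id, seq in PACKET.findall(capture):
        stamp = int(hh) * 3600 + int(mm) * 60 + float(ss)
        key = (int(icmp_id), int(seq))
        if kind == "request":
            pending[key] = stamp
        elif key in pending:
            delays[key] = stamp - pending.pop(key)
    return delays


def slow_replies(capture, timeout_s=2.0):
    """Return only the pairs whose reply took longer than timeout_s."""
    return {k: d for k, d in reply_delays(capture).items() if d > timeout_s}


if __name__ == "__main__":
    for (icmp_id, seq), delay in sorted(slow_replies(SAMPLE).items()):
        print("id %d seq %d: reply after %.3f s" % (icmp_id, seq, delay))
```

Fed the whole capture instead of the two sample pairs, this would list every request whose reply came back later than the chosen timeout, which may help to tell lost packets apart from merely late ones.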

It's just calling the system ping utility as an external process and
checking the exit code.
I don't see any issue with that approach.
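
To illustrate the approach described above, here is a minimal sketch of a check that runs the system ping as an external process and counts non-zero exit codes. It is not the actual broker/ping.py code: the function names, the five attempts, the 0.5 s pause and the `-c 1 -W 2` flags are assumptions taken from the log message ("4 out of 5") and from the test loop in this mail.

```python
import subprocess
import time


def ping_once(address, timeout_s=2, run=subprocess.call):
    """Send one echo request; True if ping exits with status 0."""
    # -c 1: send a single packet, -W: seconds to wait for the reply.
    cmd = ["ping", "-c", "1", "-W", str(timeout_s), address]
    rc = run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return rc == 0


def count_failures(address, attempts=5, delay_s=0.5,
                   ping=ping_once, sleep=time.sleep):
    """Ping `attempts` times and return how many attempts failed."""
    failures = 0
    for _ in range(attempts):
        if not ping(address):
            failures += 1
        sleep(delay_s)
    return failures


if __name__ == "__main__":
    target = "10.168.8.1"  # gateway address from the broker.log excerpt
    failed = count_failures(target)
    print("Failed to ping %s, (%d out of 5)" % (target, failed))
```

With a check of this shape, a reply that arrives after the `-W` timeout makes ping exit non-zero, so it would be counted as a failure even though the reply packet is visible on the wire.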

Can you please try executing:

while true; do
    ping -c 1 -W 2 10.168.8.1 > /dev/null; echo $?; sleep 0.5
done
 

Thanks,
  Juhani