[ovirt-users] Debugging why hosted engine flips between EngineUp and EngineBadHealth

Yaniv Kaul ykaul at redhat.com
Wed Dec 13 19:18:39 UTC 2017


On Wed, Dec 13, 2017 at 4:15 PM, Luca 'remix_tj' Lorenzetto <
lorenzetto.luca at gmail.com> wrote:

> Hello,
>
> Today I started troubleshooting the DNS requests in more depth, and
> while I was watching tcpdump an EngineUp -> EngineBadHealth event
> happened.
>
> Looking at the DNS requests, I see this:
>
> [...]
> 14:30:35.909201 IP kvmhost01.intranet.company.it.55654 > dns.company.it.53: 34102+ A? engine01.intranet.company.it. (54)
> 14:30:35.909215 IP kvmhost01.intranet.company.it.55654 > dns.company.it.53: 6242+ AAAA? engine01.intranet.company.it. (54)
> 14:30:40.914285 IP kvmhost01.intranet.company.it.55654 > dns.company.it.53: 34102+ A? engine01.intranet.company.it. (54)
> 14:30:40.914316 IP kvmhost01.intranet.company.it.55654 > dns.company.it.53: 6242+ AAAA? engine01.intranet.company.it. (54)
> 14:30:45.918306 IP kvmhost01.intranet.company.it.54885 > dns.company.it.53: 60263+ A? engine01.intranet.company.it.intranet.company.it. (74)
> 14:30:45.918329 IP kvmhost01.intranet.company.it.54885 > dns.company.it.53: 18681+ AAAA? engine01.intranet.company.it.intranet.company.it. (74)
> 14:30:50.920376 IP kvmhost01.intranet.company.it.54885 > dns.company.it.53: 60263+ A? engine01.intranet.company.it.intranet.company.it. (74)
> 14:30:50.920411 IP kvmhost01.intranet.company.it.54885 > dns.company.it.53: 18681+ AAAA? engine01.intranet.company.it.intranet.company.it. (74)
> 14:30:56.044242 IP kvmhost01.intranet.company.it.58319 > dns.company.it.53: 28413+ A? engine01.intranet.company.it. (54)
> 14:30:56.044267 IP kvmhost01.intranet.company.it.58319 > dns.company.it.53: 29680+ AAAA? engine01.intranet.company.it. (54)
> 14:31:01.049761 IP kvmhost01.intranet.company.it.58319 > dns.company.it.53: 28413+ A? engine01.intranet.company.it. (54)
> 14:31:01.049777 IP kvmhost01.intranet.company.it.58319 > dns.company.it.53: 29680+ AAAA? engine01.intranet.company.it. (54)
> 14:31:06.052635 IP kvmhost01.intranet.company.it.58093 > dns.company.it.53: 24807+ A? engine01.intranet.company.it.intranet.company.it. (74)
> 14:31:06.052649 IP kvmhost01.intranet.company.it.58093 > dns.company.it.53: 53745+ AAAA? engine01.intranet.company.it.intranet.company.it. (74)
> 14:31:11.057724 IP kvmhost01.intranet.company.it.58093 > dns.company.it.53: 24807+ A? engine01.intranet.company.it.intranet.company.it. (74)
> 14:31:11.057745 IP kvmhost01.intranet.company.it.58093 > dns.company.it.53: 53745+ AAAA? engine01.intranet.company.it.intranet.company.it. (74)
> 14:31:16.175204 IP kvmhost01.intranet.company.it.44950 > dns.company.it.53: 63680+ A? engine01.intranet.company.it. (54)
> 14:31:16.175225 IP kvmhost01.intranet.company.it.44950 > dns.company.it.53: 15726+ AAAA? engine01.intranet.company.it. (54)
> 14:31:19.670746 IP kvmhost01.intranet.company.it.54689 > dns.company.it.53: 40999+ A? kvmsvilca01.intranet.company.it. (49)
> 14:31:21.180295 IP kvmhost01.intranet.company.it.44950 > dns.company.it.53: 63680+ A? engine01.intranet.company.it. (54)
> 14:31:21.180337 IP kvmhost01.intranet.company.it.44950 > dns.company.it.53: 15726+ AAAA? engine01.intranet.company.it. (54)
> 14:31:23.771959 IP kvmhost01.intranet.company.it.53741 > dns.company.it.53: 1707+ A? internalmx.intranet.company.it. (48)
> [...]
>
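One way to read the excerpt is to put each wrapped tcpdump line back on one line and group the queries by source port and DNS ID. When the same (port, ID) pair reappears about 5 s later, that most likely means the resolver is resending a query that got no answer, assuming the `[...]` elisions did not drop any replies. Across the excerpt, each lookup makes two attempts for `engine01.intranet.company.it.` 5 s apart, then two more with the search domain appended (the 74-byte queries), and the next lookup starts on a new port roughly 20 s after the previous one (14:30:35.9, 14:30:56.0, 14:31:16.2). That pattern looks consistent with the glibc resolver defaults described in resolv.conf(5) (`timeout:5`, `attempts:2`), but this is an assumption because the host's `/etc/resolv.conf` was not posted. The helper names below are hypothetical, and the regex only covers the query-line format quoted here:

```python
import re
from collections import defaultdict
from datetime import datetime

# One re-joined tcpdump line for an outgoing DNS query, e.g.
# "14:30:35.909201 IP host.55654 > dns.company.it.53: 34102+ A? name. (54)"
QUERY_RE = re.compile(
    r"^(?P<ts>\d{2}:\d{2}:\d{2}\.\d{6}) IP \S+\.(?P<sport>\d+) > \S+\.53: "
    r"(?P<qid>\d+)\+ (?P<qtype>\w+)\? (?P<qname>\S+) \(\d+\)$"
)


def group_queries(lines):
    """Map (source port, DNS id, qtype, qname) to the list of send times.

    A key with more than one timestamp is the same query being resent.
    Lines that are not outgoing queries (replies, other traffic) are skipped.
    """
    groups = defaultdict(list)
    for line in lines:
        m = QUERY_RE.match(line.strip())
        if not m:
            continue
        ts = datetime.strptime(m["ts"], "%H:%M:%S.%f")
        key = (int(m["sport"]), int(m["qid"]), m["qtype"], m["qname"])
        groups[key].append(ts)
    return groups


def retransmissions(groups):
    """Yield (key, gaps in seconds) for every query that was sent more than once."""
    for key, times in groups.items():
        if len(times) > 1:
            gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
            yield key, gaps


# Three lines from the excerpt, each re-joined onto a single line.
sample = [
    "14:30:35.909201 IP kvmhost01.intranet.company.it.55654 > "
    "dns.company.it.53: 34102+ A? engine01.intranet.company.it. (54)",
    "14:30:40.914285 IP kvmhost01.intranet.company.it.55654 > "
    "dns.company.it.53: 34102+ A? engine01.intranet.company.it. (54)",
    "14:30:45.918306 IP kvmhost01.intranet.company.it.54885 > "
    "dns.company.it.53: 60263+ A? engine01.intranet.company.it.intranet.company.it. (74)",
]

for key, gaps in retransmissions(group_queries(sample)):
    print(key, [round(g, 1) for g in gaps])
# prints: (55654, 34102, 'A', 'engine01.intranet.company.it.') [5.0]
```

Run over a full capture that also includes the replies (for example, one written with `tcpdump -n port 53`), any query that still shows up in `retransmissions()` never got an answer in time.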
> The last DNS request succeeds and gets the MX address, and immediately
> after that I get the email reporting the status change.
>

Can you make sure the engine's FQDN doesn't have multiple IPs registered in
DNS? dig or a similar tool should help.
Y.
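Besides `dig +short engine01.intranet.company.it A` (and the same query for `AAAA`), the check can also be run through the host's system resolver. This is presumably the path the HA broker's own lookups take, whereas dig queries the DNS server directly. The sketch below collects every distinct address `getaddrinfo()` returns for a name and times the lookup. More than one IPv4 address, or an unexpected IPv6 one, is what the question above is about. A lookup that takes several seconds points back to resolver timeouts. The function names are hypothetical:

```python
import socket
import time


def distinct_addresses(addrinfo):
    """Sorted, de-duplicated IP addresses from socket.getaddrinfo() output.

    getaddrinfo() returns one 5-tuple per (family, type, proto) combination,
    so the same address normally shows up several times.
    """
    return sorted({sockaddr[0] for _fam, _type, _proto, _canon, sockaddr in addrinfo})


def check_fqdn(fqdn):
    """Resolve fqdn through the system resolver and time the lookup.

    Returns (addresses, elapsed_seconds); raises socket.gaierror on failure.
    """
    start = time.monotonic()
    # Restricting to TCP just avoids duplicate entries per socket type.
    info = socket.getaddrinfo(fqdn, None, proto=socket.IPPROTO_TCP)
    elapsed = time.monotonic() - start
    return distinct_addresses(info), elapsed
```

For example, `check_fqdn("engine01.intranet.company.it")` run on each host returns the address list and the elapsed seconds. It raises `socket.gaierror` if the name cannot be resolved at all.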


>
> This is clearly an issue with name resolution, but that isn't evident
> from the broker.log file. The only related messages I get are:
>
> Thread-16::DEBUG::2017-12-13 14:31:23,657::monitor::126::ovirt_hosted_engine_ha.broker.monitor.Monitor::(get_value) Submonitor engine-health id 139653412040592 current value: {"reason": "failed liveliness check", "health": "bad", "vm": "up", "detail": "up"}
> Thread-16::DEBUG::2017-12-13 14:31:23,657::listener::170::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Response: success {"reason": "failed liveliness check", "health": "bad", "vm": "up", "detail": "up"}
>
>
> But around those messages I see no sign of errors on DNS queries or
> anything similar. Do I need to check other log files?
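Since broker.log does not appear to record the resolver failures themselves, its timestamps can at least be lined up against the capture. The following is a minimal sketch that reports each change in the engine-health submonitor value. It assumes the lines keep exactly the format quoted above: the regex is built from those two lines only, and the function name is hypothetical.

```python
import json
import re
from datetime import datetime

# Built from the quoted line:
# "Thread-16::DEBUG::2017-12-13 14:31:23,657::monitor::126::...::(get_value)
#  Submonitor engine-health id 139653412040592 current value: {...}"
LINE_RE = re.compile(
    r"::(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})::.*"
    r"Submonitor engine-health id \d+ current value: (?P<value>\{.*\})"
)


def health_changes(lines):
    """Yield (timestamp, value_dict) each time the engine-health value changes."""
    previous = None
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # other submonitors, listener responses, etc.
        value = json.loads(m["value"])
        if value != previous:
            ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S,%f")
            yield ts, value
            previous = value
```

Feed it the broker log, which is typically under `/var/log/ovirt-hosted-engine-ha/`. Then compare each `"health": "bad"` timestamp with the DNS retries in the capture. That should show whether every flip coincides with a lookup that timed out.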
>
> Luca
>
>
> On Mon, Dec 11, 2017 at 3:34 PM, Luca 'remix_tj' Lorenzetto <
> lorenzetto.luca at gmail.com> wrote:
>
>> Hi Martin, hi all,
>>
>> *Some minutes* have passed, and I now have the piece of log I'm looking at.
>>
>> broker.log-upbadup:
>> <https://drive.google.com/file/d/1wlWZPuhgtJRBWt4xUZC-Jis8vLWM1jYD/view?usp=drive_web>
>>
>
>
> --
> "It is absurd to employ men of excellent intelligence to do
> calculations that could be entrusted to anyone if machines were used."
> Gottfried Wilhelm von Leibnitz, Philosopher and Mathematician (1646-1716)
>
> "The Internet is the largest library in the world.
> But the problem is that the books are all scattered on the floor."
> John Allen Paulos, Mathematician (b. 1945)
>
> Luca 'remix_tj' Lorenzetto, http://www.remixtj.net , <
> lorenzetto.luca at gmail.com>
>
> _______________________________________________
> Users mailing list
> Users at ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>
>

