[ovirt-users] Debugging why hosted engine flips between EngineUp and EngineBadHealth

Martin Sivak msivak at redhat.com
Wed Dec 13 14:22:00 UTC 2017


Hi,

I am afraid we do not have logs that go that deep into the stack. DNS
resolution issues will definitely affect both the notification system
(unless it uses the SMTP server on localhost) and the engine status checks
(because we use the FQDN).
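
Since the HA logs do not show resolver failures, one way to catch them is
to time lookups of the engine FQDN from the host itself. Below is a minimal
sketch in Python. It assumes that the system resolver (socket.getaddrinfo)
is a reasonable stand-in for what the services on the host see, which may
not match exactly. The names probe and watch, the 10-second interval and
the 1-second "slow" threshold are illustrative choices and are not part of
ovirt-hosted-engine-ha.

```python
import socket
import time


def probe(fqdn, resolver=socket.getaddrinfo):
    """Resolve fqdn once and return (ok, seconds, detail)."""
    start = time.monotonic()
    try:
        infos = resolver(fqdn, None)
    except socket.gaierror as exc:
        # The system resolver reported a failure (including timeouts).
        return False, time.monotonic() - start, str(exc)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address for both IPv4 and IPv6.
    addresses = sorted({info[4][0] for info in infos})
    return True, time.monotonic() - start, ",".join(addresses)


def watch(fqdn, rounds, interval=10.0, slow=1.0):
    """Probe fqdn `rounds` times, printing only failures and slow lookups."""
    for _ in range(rounds):
        ok, seconds, detail = probe(fqdn)
        if not ok or seconds > slow:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            state = "OK" if ok else "FAIL"
            print(f"{stamp} {state} {fqdn} {seconds:.3f}s {detail}")
        time.sleep(interval)
```

Calling watch("engine01.intranet.company.it", rounds=360) on the host would
run for roughly an hour. The timestamps it prints can then be compared with
the "failed liveliness check" entries in broker.log.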

Best regards

Martin

On Wed, Dec 13, 2017 at 3:15 PM, Luca 'remix_tj' Lorenzetto <
lorenzetto.luca at gmail.com> wrote:

> Hello,
>
> Today I started troubleshooting the DNS requests in more depth, and just
> while I was watching tcpdump an EngineUp -> EngineBadHealth transition
> happened.
>
> Looking at the DNS requests I see this:
>
> [...]
> 14:30:35.909201 IP kvmhost01.intranet.company.it.55654 >
> dns.company.it.53: 34102+ A? engine01.intranet.company.it. (54)
> 14:30:35.909215 IP kvmhost01.intranet.company.it.55654 >
> dns.company.it.53: 6242+ AAAA? engine01.intranet.company.it. (54)
> 14:30:40.914285 IP kvmhost01.intranet.company.it.55654 >
> dns.company.it.53: 34102+ A? engine01.intranet.company.it. (54)
> 14:30:40.914316 IP kvmhost01.intranet.company.it.55654 >
> dns.company.it.53: 6242+ AAAA? engine01.intranet.company.it. (54)
> 14:30:45.918306 IP kvmhost01.intranet.company.it.54885 >
> dns.company.it.53: 60263+ A? engine01.intranet.company.it.
> intranet.company.it. (74)
> 14:30:45.918329 IP kvmhost01.intranet.company.it.54885 >
> dns.company.it.53: 18681+ AAAA? engine01.intranet.company.it.
> intranet.company.it. (74)
> 14:30:50.920376 IP kvmhost01.intranet.company.it.54885 >
> dns.company.it.53: 60263+ A? engine01.intranet.company.it.
> intranet.company.it. (74)
> 14:30:50.920411 IP kvmhost01.intranet.company.it.54885 >
> dns.company.it.53: 18681+ AAAA? engine01.intranet.company.it.
> intranet.company.it. (74)
> 14:30:56.044242 IP kvmhost01.intranet.company.it.58319 >
> dns.company.it.53: 28413+ A? engine01.intranet.company.it. (54)
> 14:30:56.044267 IP kvmhost01.intranet.company.it.58319 >
> dns.company.it.53: 29680+ AAAA? engine01.intranet.company.it. (54)
> 14:31:01.049761 IP kvmhost01.intranet.company.it.58319 >
> dns.company.it.53: 28413+ A? engine01.intranet.company.it. (54)
> 14:31:01.049777 IP kvmhost01.intranet.company.it.58319 >
> dns.company.it.53: 29680+ AAAA? engine01.intranet.company.it. (54)
> 14:31:06.052635 IP kvmhost01.intranet.company.it.58093 >
> dns.company.it.53: 24807+ A? engine01.intranet.company.it.
> intranet.company.it. (74)
> 14:31:06.052649 IP kvmhost01.intranet.company.it.58093 >
> dns.company.it.53: 53745+ AAAA? engine01.intranet.company.it.
> intranet.company.it. (74)
> 14:31:11.057724 IP kvmhost01.intranet.company.it.58093 >
> dns.company.it.53: 24807+ A? engine01.intranet.company.it.
> intranet.company.it. (74)
> 14:31:11.057745 IP kvmhost01.intranet.company.it.58093 >
> dns.company.it.53: 53745+ AAAA? engine01.intranet.company.it.
> intranet.company.it. (74)
> 14:31:16.175204 IP kvmhost01.intranet.company.it.44950 >
> dns.company.it.53: 63680+ A? engine01.intranet.company.it. (54)
> 14:31:16.175225 IP kvmhost01.intranet.company.it.44950 >
> dns.company.it.53: 15726+ AAAA? engine01.intranet.company.it. (54)
> 14:31:19.670746 IP kvmhost01.intranet.company.it.54689 >
> dns.company.it.53: 40999+ A? kvmsvilca01.intranet.company.it. (49)
> 14:31:21.180295 IP kvmhost01.intranet.company.it.44950 >
> dns.company.it.53: 63680+ A? engine01.intranet.company.it. (54)
> 14:31:21.180337 IP kvmhost01.intranet.company.it.44950 >
> dns.company.it.53: 15726+ AAAA? engine01.intranet.company.it. (54)
> 14:31:23.771959 IP kvmhost01.intranet.company.it.53741 >
> dns.company.it.53: 1707+ A? internalmx.intranet.company.it. (48)
> [...]
>
> The last DNS request succeeds and returns the address of the mail server,
> and immediately afterwards I get the email reporting the status change.
>
> This is clearly an issue with name resolution, but that is not clear to me
> from the broker.log file. The only message about it that I get is:
>
> Thread-16::DEBUG::2017-12-13 14:31:23,657::monitor::126::ovirt_hosted_engine_ha.broker.monitor.Monitor::(get_value)
> Submonitor engine-health id 139653412040592 current value:
> {"reason": "failed liveliness check", "health": "bad", "vm": "up", "detail": "up"}
> Thread-16::DEBUG::2017-12-13 14:31:23,657::listener::170::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle)
> Response: success
> {"reason": "failed liveliness check", "health": "bad", "vm": "up", "detail": "up"}
>
>
> But around those messages I see no sign of errors in DNS queries or
> anything similar. Do I need to check other log files?
>
> Luca
>
>
> On Mon, Dec 11, 2017 at 3:34 PM, Luca 'remix_tj' Lorenzetto <
> lorenzetto.luca at gmail.com> wrote:
>
>> Hi Martin, Hi all,
>>
>> *Some minutes* have passed and I have the piece of log I'm looking at.
>>
>> broker.log-upbadup
>> <https://drive.google.com/file/d/1wlWZPuhgtJRBWt4xUZC-Jis8vLWM1jYD/view?usp=drive_web>
>>
>
>
> --
> "It is absurd to employ men of excellent intelligence to perform
> calculations that could be entrusted to anyone if machines were
> used"
> Gottfried Wilhelm von Leibnitz, Philosopher and Mathematician (1646-1716)
>
> "The Internet is the world's largest library.
> But the problem is that the books are all scattered on the floor"
> John Allen Paulos, Mathematician (1945-living)
>
> Luca 'remix_tj' Lorenzetto, http://www.remixtj.net , <
> lorenzetto.luca at gmail.com>
>