On 13 Dec 2017 8:19 PM, "Yaniv Kaul" <ykaul@redhat.com> wrote:


On Wed, Dec 13, 2017 at 4:15 PM, Luca 'remix_tj' Lorenzetto <lorenzetto.luca@gmail.com> wrote:
Hello,

Today I started troubleshooting the DNS requests in more depth, and while I was watching tcpdump an EngineUp -> EngineBadHealth transition happened.

Looking at the DNS requests, I see this:

[...]
14:30:35.909201 IP kvmhost01.intranet.company.it.55654 > dns.company.it.53: 34102+ A? engine01.intranet.company.it. (54)
14:30:35.909215 IP kvmhost01.intranet.company.it.55654 > dns.company.it.53: 6242+ AAAA? engine01.intranet.company.it. (54)
14:30:40.914285 IP kvmhost01.intranet.company.it.55654 > dns.company.it.53: 34102+ A? engine01.intranet.company.it. (54)
14:30:40.914316 IP kvmhost01.intranet.company.it.55654 > dns.company.it.53: 6242+ AAAA? engine01.intranet.company.it. (54)
14:30:45.918306 IP kvmhost01.intranet.company.it.54885 > dns.company.it.53: 60263+ A? engine01.intranet.company.it.intranet.company.it. (74)
14:30:45.918329 IP kvmhost01.intranet.company.it.54885 > dns.company.it.53: 18681+ AAAA? engine01.intranet.company.it.intranet.company.it. (74)
14:30:50.920376 IP kvmhost01.intranet.company.it.54885 > dns.company.it.53: 60263+ A? engine01.intranet.company.it.intranet.company.it. (74)
14:30:50.920411 IP kvmhost01.intranet.company.it.54885 > dns.company.it.53: 18681+ AAAA? engine01.intranet.company.it.intranet.company.it. (74)
14:30:56.044242 IP kvmhost01.intranet.company.it.58319 > dns.company.it.53: 28413+ A? engine01.intranet.company.it. (54)
14:30:56.044267 IP kvmhost01.intranet.company.it.58319 > dns.company.it.53: 29680+ AAAA? engine01.intranet.company.it. (54)
14:31:01.049761 IP kvmhost01.intranet.company.it.58319 > dns.company.it.53: 28413+ A? engine01.intranet.company.it. (54)
14:31:01.049777 IP kvmhost01.intranet.company.it.58319 > dns.company.it.53: 29680+ AAAA? engine01.intranet.company.it. (54)
14:31:11.057724 IP kvmhost01.intranet.company.it.58093 > dns.company.it.53: 24807+ A? engine01.intranet.company.it.intranet.company.it. (74)
14:31:11.057745 IP kvmhost01.intranet.company.it.58093 > dns.company.it.53: 53745+ AAAA? engine01.intranet.company.it.intranet.company.it. (74)
14:31:16.175204 IP kvmhost01.intranet.company.it.44950 > dns.company.it.53: 63680+ A? engine01.intranet.company.it. (54)
14:31:16.175225 IP kvmhost01.intranet.company.it.44950 > dns.company.it.53: 15726+ AAAA? engine01.intranet.company.it. (54)
14:31:19.670746 IP kvmhost01.intranet.company.it.54689 > dns.company.it.53: 40999+ A? kvmsvilca01.intranet.company.it. (49)
14:31:21.180295 IP kvmhost01.intranet.company.it.44950 > dns.company.it.53: 63680+ A? engine01.intranet.company.it. (54)
14:31:21.180337 IP kvmhost01.intranet.company.it.44950 > dns.company.it.53: 15726+ AAAA? engine01.intranet.company.it. (54)
14:31:23.771959 IP kvmhost01.intranet.company.it.53741 > dns.company.it.53: 1707+ A? internalmx.intranet.company.it. (48)
[...]

The last DNS request succeeds and returns the MX address, and immediately afterwards I receive the email reporting the status change.
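One way to confirm from the capture alone that replies are missing: group the outgoing queries by source port and DNS query ID. The stub resolver re-sends a query with the same ID when no answer arrives, so an ID that shows up twice about 5 s apart was very likely never answered. The trace also appears to show the resolver falling back to the search-domain form (engine01.intranet.company.it.intranet.company.it.) after the absolute name got no answer. Below is a minimal Python sketch over a few lines copied from the trace above; the helper name `find_retransmissions` is hypothetical, and the ~5 s interval is only what this capture shows (it matches glibc's default resolver timeout, but check /etc/resolv.conf on the host).

```python
import re
from collections import defaultdict
from datetime import datetime

# A few lines copied verbatim from the tcpdump excerpt in this thread.
TRACE = """\
14:30:35.909201 IP kvmhost01.intranet.company.it.55654 > dns.company.it.53: 34102+ A? engine01.intranet.company.it. (54)
14:30:35.909215 IP kvmhost01.intranet.company.it.55654 > dns.company.it.53: 6242+ AAAA? engine01.intranet.company.it. (54)
14:30:40.914285 IP kvmhost01.intranet.company.it.55654 > dns.company.it.53: 34102+ A? engine01.intranet.company.it. (54)
14:30:40.914316 IP kvmhost01.intranet.company.it.55654 > dns.company.it.53: 6242+ AAAA? engine01.intranet.company.it. (54)
14:30:45.918306 IP kvmhost01.intranet.company.it.54885 > dns.company.it.53: 60263+ A? engine01.intranet.company.it.intranet.company.it. (74)
14:30:45.918329 IP kvmhost01.intranet.company.it.54885 > dns.company.it.53: 18681+ AAAA? engine01.intranet.company.it.intranet.company.it. (74)
14:30:50.920376 IP kvmhost01.intranet.company.it.54885 > dns.company.it.53: 60263+ A? engine01.intranet.company.it.intranet.company.it. (74)
14:30:50.920411 IP kvmhost01.intranet.company.it.54885 > dns.company.it.53: 18681+ AAAA? engine01.intranet.company.it.intranet.company.it. (74)
14:31:23.771959 IP kvmhost01.intranet.company.it.53741 > dns.company.it.53: 1707+ A? internalmx.intranet.company.it. (48)
"""

# Matches one outgoing A/AAAA query line in tcpdump's default text format.
QUERY_RE = re.compile(
    r"^(?P<ts>\d{2}:\d{2}:\d{2}\.\d+) IP \S+\.(?P<sport>\d+) > \S+\.53: "
    r"(?P<qid>\d+)\+ (?P<qtype>A|AAAA)\? (?P<name>\S+) \(\d+\)$"
)


def find_retransmissions(trace):
    """Return {(sport, qid, qtype, name): [seconds between re-sends]}.

    Only keys sent more than once are returned; a re-send with the same
    ID from the same port means the resolver gave up waiting for a reply.
    """
    sends = defaultdict(list)
    for line in trace.splitlines():
        m = QUERY_RE.match(line.strip())
        if not m:
            continue
        ts = datetime.strptime(m["ts"], "%H:%M:%S.%f")
        key = (int(m["sport"]), int(m["qid"]), m["qtype"], m["name"])
        sends[key].append(ts)
    return {
        key: [round((b - a).total_seconds(), 3) for a, b in zip(times, times[1:])]
        for key, times in sends.items()
        if len(times) > 1
    }


if __name__ == "__main__":
    for (sport, qid, qtype, name), gaps in find_retransmissions(TRACE).items():
        print(f"port {sport} id {qid} {qtype}? {name} re-sent after {gaps} s")
```

On this excerpt every engine01 query is re-sent once after roughly 5 s, while the final internalmx query is sent only once, which is consistent with it being the first request that was answered.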

Can you make sure it doesn't have multiple IPs registered for it in DNS?
dig or a similar tool should help.
Y.
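The same check can be scripted. This is a minimal Python sketch that assumes going through the host's system resolver is acceptable. `socket.getaddrinfo` also consults /etc/hosts and nsswitch, so it is not identical to dig, which queries the DNS server directly. The helper name `registered_addresses` is hypothetical.

```python
import socket


def registered_addresses(hostname):
    """Return the distinct IPv4/IPv6 addresses the system resolver gives for hostname.

    SOCK_STREAM is requested so each address appears once instead of
    once per socket type.
    """
    infos = socket.getaddrinfo(hostname, None, type=socket.SOCK_STREAM)
    # info[4] is the sockaddr tuple; its first element is the address string.
    return sorted({info[4][0] for info in infos})
```

For example, `registered_addresses("engine01.intranet.company.it")` should return a single entry if only one A record (and no AAAA record) is registered. Note that the call blocks for as long as the resolver keeps retrying, because `getaddrinfo` takes no timeout argument.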
 

No, it doesn't. A single IP is registered. It's definitely a DNS query that isn't getting its replies.

I'm debugging with the network team to find out what's happening.

Anyway, I think the broker log in debug mode should help identify the source of these errors.
Explaining more precisely why the liveness check failed would cut down on trial-and-error troubleshooting.

Luca
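To illustrate the suggestion of reporting *why* the liveness check failed, here is a hypothetical Python sketch that times name resolution separately from the connection attempt, so a DNS stall is reported as such. It is not how ovirt-hosted-engine-ha implements its engine-health check. It only tests TCP reachability, and the function name `diagnose_liveness` and its return fields are assumptions for illustration.

```python
import socket
import time


def diagnose_liveness(fqdn, port=80, timeout=5.0):
    """Hypothetical check: report which step failed and how long DNS took.

    Resolution is timed on its own, so "name resolution failed" (or a very
    large resolve_s) can be told apart from "connect failed".
    """
    start = time.monotonic()
    try:
        # getaddrinfo has no timeout argument; it blocks for as long as
        # the system resolver keeps retrying.
        infos = socket.getaddrinfo(fqdn, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return {"health": "bad", "reason": f"name resolution failed: {exc}",
                "resolve_s": round(time.monotonic() - start, 3)}
    resolve_s = round(time.monotonic() - start, 3)

    family, socktype, proto, _, addr = infos[0]
    try:
        with socket.socket(family, socktype, proto) as sock:
            sock.settimeout(timeout)
            sock.connect(addr)
    except OSError as exc:
        return {"health": "bad",
                "reason": f"connect to {addr[0]}:{port} failed: {exc}",
                "resolve_s": resolve_s}
    return {"health": "good", "reason": "ok", "resolve_s": resolve_s}
```

With the capture above, a check like this would be expected to report a `resolve_s` in the tens of seconds rather than a bare "failed liveliness check".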



This is clearly a name-resolution issue, but that isn't evident from the broker.log file. The only related message I get is:

Thread-16::DEBUG::2017-12-13 14:31:23,657::monitor::126::ovirt_hosted_engine_ha.broker.monitor.Monitor::(get_value) Submonitor engine-health id 139653412040592 current value: {"reason": "failed liveliness check", "health": "bad", "vm": "up", "detail": "up"}
Thread-16::DEBUG::2017-12-13 14:31:23,657::listener::170::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Response: success {"reason": "failed liveliness check", "health": "bad", "vm": "up", "detail": "up"}


But around those messages I see no sign of errors on DNS queries or anything similar. Do I need to check other log files?

Luca
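The broker line carries no DNS detail, but its timestamp can still be lined up with the capture. It reports "bad" at 14:31:23,657, after the engine01 queries from 14:30:35 onwards went unanswered. The following is a minimal Python sketch that pulls the timestamp, health and reason out of engine-health lines. It assumes the line format is exactly the one in the excerpt quoted in this thread, and the helper name `engine_health_events` is hypothetical.

```python
import json
import re
from datetime import datetime

# Format assumed from the broker.log excerpt in this thread.
BROKER_RE = re.compile(
    r"^(?P<thread>[^:]+)::(?P<level>\w+)::"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})::"
    r".*Submonitor engine-health id \d+ current value: (?P<value>\{.*\})$"
)

SAMPLE = (
    "Thread-16::DEBUG::2017-12-13 14:31:23,657::monitor::126::"
    "ovirt_hosted_engine_ha.broker.monitor.Monitor::(get_value) Submonitor "
    'engine-health id 139653412040592 current value: {"reason": "failed '
    'liveliness check", "health": "bad", "vm": "up", "detail": "up"}'
)


def engine_health_events(lines):
    """Yield (timestamp, health, reason) for each engine-health submonitor line."""
    for line in lines:
        m = BROKER_RE.match(line.rstrip("\n"))
        if not m:
            continue  # other broker lines (e.g. listener responses) are skipped
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S,%f")
        value = json.loads(m["value"])
        yield ts, value.get("health"), value.get("reason")


if __name__ == "__main__":
    for ts, health, reason in engine_health_events([SAMPLE]):
        print(ts.time(), health, reason)
```

Passing the lines of a real broker.log (for example an open file object) instead of `[SAMPLE]` gives a list of health transitions whose times can be compared with the retransmitted queries in the tcpdump output.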


On Mon, Dec 11, 2017 at 3:34 PM, Luca 'remix_tj' Lorenzetto <lorenzetto.luca@gmail.com> wrote:
Hi Martin, hi all,

*Some minutes* have passed, and I now have the piece of log I'm looking at.



 

--
"It is absurd to employ men of excellent intelligence to do
calculations that could be entrusted to anyone if machines
were used"
Gottfried Wilhelm von Leibnitz, Philosopher and Mathematician (1646-1716)

"The Internet is the largest library in the world.
But the problem is that the books are all scattered on the floor"
John Allen Paulos, Mathematician (1945-present)
 
Luca 'remix_tj' Lorenzetto, http://www.remixtj.net , <lorenzetto.luca@gmail.com>

_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users