On 13 Dec 2017 8:19 PM, "Yaniv Kaul" <ykaul(a)redhat.com> wrote:
On Wed, Dec 13, 2017 at 4:15 PM, Luca 'remix_tj' Lorenzetto <
lorenzetto.luca(a)gmail.com> wrote:
Hello,
Today I started troubleshooting the DNS requests in more depth, and just
as I was watching tcpdump an EngineUp -> EngineBadHealth event
happened.
Looking at the DNS requests I see this:
[...]
14:30:35.909201 IP kvmhost01.intranet.company.it.55654 >
dns.company.it.53: 34102+ A? engine01.intranet.company.it. (54)
14:30:35.909215 IP kvmhost01.intranet.company.it.55654 >
dns.company.it.53: 6242+ AAAA? engine01.intranet.company.it. (54)
14:30:40.914285 IP kvmhost01.intranet.company.it.55654 >
dns.company.it.53: 34102+ A? engine01.intranet.company.it. (54)
14:30:40.914316 IP kvmhost01.intranet.company.it.55654 >
dns.company.it.53: 6242+ AAAA? engine01.intranet.company.it. (54)
14:30:45.918306 IP kvmhost01.intranet.company.it.54885 >
dns.company.it.53: 60263+ A? engine01.intranet.company.it.intranet.company.it. (74)
14:30:45.918329 IP kvmhost01.intranet.company.it.54885 >
dns.company.it.53: 18681+ AAAA? engine01.intranet.company.it.intranet.company.it. (74)
14:30:50.920376 IP kvmhost01.intranet.company.it.54885 >
dns.company.it.53: 60263+ A? engine01.intranet.company.it.intranet.company.it. (74)
14:30:50.920411 IP kvmhost01.intranet.company.it.54885 >
dns.company.it.53: 18681+ AAAA? engine01.intranet.company.it.intranet.company.it. (74)
14:30:56.044242 IP kvmhost01.intranet.company.it.58319 >
dns.company.it.53: 28413+ A? engine01.intranet.company.it. (54)
14:30:56.044267 IP kvmhost01.intranet.company.it.58319 >
dns.company.it.53: 29680+ AAAA? engine01.intranet.company.it. (54)
14:31:01.049761 IP kvmhost01.intranet.company.it.58319 >
dns.company.it.53: 28413+ A? engine01.intranet.company.it. (54)
14:31:01.049777 IP kvmhost01.intranet.company.it.58319 >
dns.company.it.53: 29680+ AAAA? engine01.intranet.company.it. (54)
14:31:06.052635 IP kvmhost01.intranet.company.it.58093 >
dns.company.it.53: 24807+ A? engine01.intranet.company.it.intranet.company.it. (74)
14:31:06.052649 IP kvmhost01.intranet.company.it.58093 >
dns.company.it.53: 53745+ AAAA? engine01.intranet.company.it.intranet.company.it. (74)
14:31:11.057724 IP kvmhost01.intranet.company.it.58093 >
dns.company.it.53: 24807+ A? engine01.intranet.company.it.intranet.company.it. (74)
14:31:11.057745 IP kvmhost01.intranet.company.it.58093 >
dns.company.it.53: 53745+ AAAA? engine01.intranet.company.it.intranet.company.it. (74)
14:31:16.175204 IP kvmhost01.intranet.company.it.44950 >
dns.company.it.53: 63680+ A? engine01.intranet.company.it. (54)
14:31:16.175225 IP kvmhost01.intranet.company.it.44950 >
dns.company.it.53: 15726+ AAAA? engine01.intranet.company.it. (54)
14:31:19.670746 IP kvmhost01.intranet.company.it.54689 >
dns.company.it.53: 40999+ A? kvmsvilca01.intranet.company.it. (49)
14:31:21.180295 IP kvmhost01.intranet.company.it.44950 >
dns.company.it.53: 63680+ A? engine01.intranet.company.it. (54)
14:31:21.180337 IP kvmhost01.intranet.company.it.44950 >
dns.company.it.53: 15726+ AAAA? engine01.intranet.company.it. (54)
14:31:23.771959 IP kvmhost01.intranet.company.it.53741 >
dns.company.it.53: 1707+ A? internalmx.intranet.company.it. (48)
[...]
The last DNS request succeeds and returns the MX address, and immediately
afterwards I get the email reporting the status change.
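The pattern in the capture above (the same query ID sent again from the same source port about five seconds later) can be picked out mechanically. The following is only a rough sketch: it assumes tcpdump's default text output as shown above, tolerates the line wrapping at ">", and the names QUERY_RE, to_seconds and find_retransmits are made up for this example. A repeated query ID suggests, but does not prove, that no reply arrived before the resolver's timeout.

```python
import re
from collections import defaultdict

# Matches one outgoing DNS query in tcpdump's text output, e.g.
# "14:30:35.909201 IP host.55654 > dns.53: 34102+ A? name. (54)"
QUERY_RE = re.compile(
    r"(?P<ts>\d{2}:\d{2}:\d{2}\.\d+)\s+IP\s+\S+\.(?P<port>\d+)\s+>\s+\S+:\s+"
    r"(?P<qid>\d+)\+\s+(?P<qtype>[A-Z]+)\?\s+(?P<name>\S+)"
)


def to_seconds(ts):
    """Convert an HH:MM:SS.ffffff timestamp to seconds since midnight."""
    hours, minutes, seconds = ts.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def find_retransmits(capture_text):
    """Return [(key, timestamps)] for every query that was sent more than once.

    key is (source port, query id, record type, queried name).
    """
    # Collapse all whitespace so entries wrapped over two lines become one.
    joined = " ".join(capture_text.split())
    seen = defaultdict(list)
    for m in QUERY_RE.finditer(joined):
        key = (m["port"], m["qid"], m["qtype"], m["name"])
        seen[key].append(to_seconds(m["ts"]))
    return [(key, times) for key, times in seen.items() if len(times) > 1]
```

Feeding it the capture text and printing the gap between consecutive timestamps for each key would show whether the retries line up with the resolver timeout.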
Can you ensure it doesn't have multiple IPs registered for it in DNS?
dig or so should help.
Y.
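As a complement to dig, the same check can be scripted. This is a minimal sketch, assuming Python is available on the host; distinct_addresses is a hypothetical helper name, and note that socket.getaddrinfo goes through the system resolver (so /etc/hosts and the search list apply), which is not identical to asking the DNS server directly with dig.

```python
import socket


def distinct_addresses(hostname, resolve=socket.getaddrinfo):
    """Return the sorted, de-duplicated IP addresses reported for hostname.

    The resolve parameter exists only so the lookup can be replaced in tests.
    """
    infos = resolve(hostname, None, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})
```

Calling distinct_addresses("engine01.intranet.company.it") on the host should return a single entry if only one IP is registered for the name.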
No, it doesn't. A single IP is registered. It's certainly a case of DNS
queries not getting their replies.
I'm debugging with the network team to find out what's happening.
Anyway, I think the broker log in debug mode should help identify the
source of these errors.
Explaining more clearly why the liveliness check has failed would probably
reduce the number of troubleshooting experiments needed.
Luca
This is clearly an issue with name resolution, but that is not
evident to me
from the broker.log file. The only message about it that I get is:
Thread-16::DEBUG::2017-12-13 14:31:23,657::monitor::126::ovirt_hosted_engine_ha.broker.monitor.Monitor::(get_value)
Submonitor engine-health id 139653412040592 current value:
{"reason": "failed liveliness check", "health": "bad", "vm": "up", "detail": "up"}
Thread-16::DEBUG::2017-12-13 14:31:23,657::listener::170::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle)
Response: success
{"reason": "failed liveliness check", "health": "bad", "vm": "up", "detail": "up"}
But around those messages I see no sign of errors on DNS queries or
anything similar. Do I need to check other log files?
Luca
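To line up the health changes with the DNS capture, the engine-health entries can be pulled out of broker.log with their timestamps. A rough sketch only: it assumes the debug line layout quoted above, with each entry on a single physical line as in the log file itself (not wrapped as in this mail), and STATUS_RE and engine_health_events are names invented for the example.

```python
import json
import re

# Thread::LEVEL::timestamp::...::(get_value) Submonitor engine-health ... current value: {json}
STATUS_RE = re.compile(
    r"^(?P<thread>[^:]+)::(?P<level>\w+)::(?P<when>[\d-]+ [\d:,]+)::.*?"
    r"Submonitor engine-health .*?current value: (?P<status>\{.*\})"
)


def engine_health_events(lines):
    """Yield (timestamp, status dict) for every engine-health submonitor entry."""
    for line in lines:
        m = STATUS_RE.search(line)
        if m:
            yield m["when"], json.loads(m["status"])
```

Passing an open broker.log file object to engine_health_events and printing the "health" and "reason" fields gives a timeline that can be compared with the tcpdump timestamps.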
On Mon, Dec 11, 2017 at 3:34 PM, Luca 'remix_tj' Lorenzetto <
lorenzetto.luca(a)gmail.com> wrote:
> Hi Martin, Hi all,
>
> *Some minutes* have passed and I have the piece of log I'm looking at.
>
>
> broker.log-upbadup
>
<
https://drive.google.com/file/d/1wlWZPuhgtJRBWt4xUZC-Jis8vLWM1jYD/view?us...
>
>
>
--
"It is absurd to employ men of excellent intelligence to do
calculations that could be entrusted to anyone if machines
were used"
Gottfried Wilhelm von Leibnitz, Philosopher and Mathematician (1646-1716)
"The Internet is the largest library in the world.
But the problem is that the books are all scattered on the floor"
John Allen Paulos, Mathematician (1945-living)
Luca 'remix_tj' Lorenzetto,
http://www.remixtj.net , <
lorenzetto.luca(a)gmail.com>
_______________________________________________
Users mailing list
Users(a)ovirt.org
http://lists.ovirt.org/mailman/listinfo/users