[ovirt-users] Debugging why hosted engine flips between EngineUp and EngineBadHealth

Luca 'remix_tj' Lorenzetto lorenzetto.luca at gmail.com
Sat Dec 2 16:10:43 UTC 2017


Hello,

I had several switches between EngineUp and EngineBadHealth today with
broker.log at DEBUG level. Where should I start to identify the root
cause? The log is somewhat chatty at this level.

Luca

On Fri, Dec 1, 2017 at 1:24 PM, Martin Sivak <msivak at redhat.com> wrote:
> Hi,
>
>> [logger_root]
>> level=INFO
>
>> [handler_logfile]
>> level=DEBUG
>
>> It seems to be set already. broker.log already contains DEBUG messages,
>> but syslog does not (which is good). What about logger_root?
>
> Yeah, I think you should change that one as well to get full debug
> logging. The handler level has no effect if the messages never reach
> the handler, and with the default configuration you have, the root
> logger does not let DEBUG messages through.
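
To illustrate the point about logger and handler levels, here is a minimal
Python sketch using the standard logging module. The ListHandler class and the
emitted() helper are made up for this example and are not part of
ovirt-hosted-engine-ha; the sketch only assumes that the broker's log
configuration follows standard Python logging semantics.

```python
import logging


class ListHandler(logging.Handler):
    """Hypothetical handler that stores records in a list for inspection."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def emit(self, record):
        self.lines.append(f"{record.levelname}:{record.getMessage()}")


def emitted(logger_level, handler_level):
    """Return what the handler receives for one DEBUG and one INFO message."""
    # Standalone logger instance, so the example does not touch global state.
    logger = logging.Logger("demo")
    logger.setLevel(logger_level)      # plays the role of [logger_root] level
    handler = ListHandler()
    handler.setLevel(handler_level)    # plays the role of [handler_logfile] level
    logger.addHandler(handler)
    logger.debug("debug message")
    logger.info("info message")
    return handler.lines


# Logger at INFO, handler at DEBUG: the DEBUG record is dropped by the logger
# before the handler ever sees it.
print(emitted(logging.INFO, logging.DEBUG))
# Logger at DEBUG, handler at DEBUG: both records reach the handler.
print(emitted(logging.DEBUG, logging.DEBUG))
```

With the logger at INFO only the INFO message arrives, whatever the handler
level is; with both at DEBUG, both messages arrive.
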
>
> Best regards
>
> Martin
>
>> Luca
>>
>> On Fri, Dec 1, 2017 at 12:29 PM, Martin Sivak <msivak at redhat.com> wrote:
>>> Hi,
>>>
>>> can you please enable DEBUG logging and then attach broker.log once
>>> the problem reproduces? See /etc/ovirt-hosted-engine-ha/broker-log.conf
>>> for the place to set it (do not forget to restart ovirt-ha-agent and
>>> ovirt-ha-broker afterwards).
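
A rough sketch of how the level change could be scripted, assuming
broker-log.conf uses the standard Python logging config-file layout with a
[logger_root] section (the section names are the ones quoted in this thread).
The set_root_level() helper is hypothetical. Note that configparser drops
comments when it rewrites a file, so editing by hand may be preferable on a
production host.

```python
import configparser
import io


def set_root_level(conf_text, level="DEBUG"):
    """Return conf_text with the [logger_root] level replaced by `level`."""
    # RawConfigParser avoids interpolation, so '%' in format strings is safe.
    parser = configparser.RawConfigParser()
    parser.read_string(conf_text)
    if not parser.has_section("logger_root"):
        raise ValueError("no [logger_root] section found")
    parser.set("logger_root", "level", level)
    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()


# Example input mirroring the fragment quoted in this thread.
sample = "[logger_root]\nlevel=INFO\n\n[handler_logfile]\nlevel=DEBUG\n"
print(set_root_level(sample))
```

The services still have to be restarted afterwards, as noted above.
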
>>>
>>> Name resolution issues might indeed be the cause, because the
>>> broker queries a health endpoint over HTTP. If notifications
>>> failed because of an unresolvable name, there is a high chance
>>> the same happens to the health request every now and then.
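
One way to check for intermittent resolution failures independently of the
broker is to resolve the name in a loop and record which attempts fail. This
is only a sketch: probe_resolution() is a hypothetical helper, port 80 is an
assumption (the thread only says the health query goes over HTTP), and the
hostname has to be replaced with the real engine or SMTP server name.

```python
import socket
import time


def probe_resolution(hostname, attempts=10, interval=1.0,
                     resolver=socket.getaddrinfo, sleep=time.sleep):
    """Resolve hostname repeatedly; return a list of (attempt, error) failures."""
    failures = []
    for attempt in range(1, attempts + 1):
        try:
            # Same call that fails in the traceback quoted in this thread.
            resolver(hostname, 80, 0, socket.SOCK_STREAM)
        except socket.gaierror as exc:
            failures.append((attempt, str(exc)))
        if attempt < attempts:
            sleep(interval)
    return failures
```

Calling probe_resolution("engine.example.com", attempts=600, interval=1.0) on
a host (with a placeholder name here) would show whether lookups fail
sporadically; an empty list means every lookup succeeded during that window.
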
>>>
>>> Best regards
>>>
>>> Martin Sivak
>>>
>>> On Fri, Dec 1, 2017 at 10:50 AM, Luca 'remix_tj' Lorenzetto
>>> <lorenzetto.luca at gmail.com> wrote:
>>>> Hi all,
>>>>
>>>> For some days now my hosted-engine environments (one RHEV 4.0.7, one
>>>> oVirt 4.1.7) have kept sending mails about changes between EngineUp
>>>> and EngineBadHealth.
>>>>
>>>> This is pretty annoying and I'm not able to find the root cause.
>>>>
>>>> The only issue I've seen on the hosts is this error about sending
>>>> mails, which appears at random times.
>>>>
>>>> Thread-1::ERROR::2017-12-01 03:05:05,084::notifications::39::ovirt_hosted_engine_ha.broker.notifications.Notifications::(send_email) [Errno -2] Name or service not known
>>>> Traceback (most recent call last):
>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/notifications.py", line 26, in send_email
>>>>     timeout=float(cfg["smtp-timeout"]))
>>>>   File "/usr/lib64/python2.7/smtplib.py", line 255, in __init__
>>>>     (code, msg) = self.connect(host, port)
>>>>   File "/usr/lib64/python2.7/smtplib.py", line 315, in connect
>>>>     self.sock = self._get_socket(host, port, self.timeout)
>>>>   File "/usr/lib64/python2.7/smtplib.py", line 290, in _get_socket
>>>>     return socket.create_connection((host, port), timeout)
>>>>   File "/usr/lib64/python2.7/socket.py", line 553, in create_connection
>>>>     for res in getaddrinfo(host, port, 0, SOCK_STREAM):
>>>> gaierror: [Errno -2] Name or service not known
>>>> Thread-6::WARNING::2017-12-01 03:05:05,427::engine_health::130::engine_health.CpuLoadNoEngine::(action) bad health status: Hosted Engine is not up!
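
Since the log gets chatty at DEBUG level, a small filter can help to line up
the WARNING and ERROR entries by timestamp. The sketch below assumes each
record sits on a single line in the "::"-separated layout visible in the
excerpt above (the mail client wrapped those lines); the field names in the
regular expression are my own labels, not names taken from the
ovirt-hosted-engine-ha code.

```python
import re

# Layout assumed from the excerpt:
# thread::LEVEL::timestamp::module::lineno::logger::(function) message
LINE_RE = re.compile(
    r"^(?P<thread>[^:]+)::(?P<level>[A-Z]+)::"
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})::"
    r"(?P<module>[^:]+)::(?P<line>\d+)::(?P<logger>[^:]+)::"
    r"\((?P<func>[^)]*)\)\s*(?P<message>.*)$"
)


def parse_line(text):
    """Return a dict of fields for a log record line, or None if no match."""
    match = LINE_RE.match(text)
    return match.groupdict() if match else None


def filter_records(lines, levels=("WARNING", "ERROR")):
    """Yield parsed records whose level is in `levels`, skipping other lines."""
    for text in lines:
        record = parse_line(text.rstrip("\n"))
        if record and record["level"] in levels:
            yield record
```

Feeding it an open broker.log file object, e.g.
filter_records(open("broker.log")), yields only the matching records;
traceback lines do not match the pattern and are skipped.
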
>>>>
>>>> There are no errors in the engine logs, and all the API queries made
>>>> by ovirt-hosted-engine-ha return HTTP code 200.
>>>>
>>>> I suspect the switch between EngineUp and EngineBadHealth status could
>>>> be due to some DNS resolution issues, but there is no clear message in
>>>> the log showing this, which makes it hard for our netadmins to run
>>>> traces.
>>>>
>>>> Is there a way to increase the verbosity of broker.log and agent.log?
>>>>
>>>> Luca
>>>>
>>>> --
>>>> "It is absurd to employ men of excellent intelligence to do
>>>> calculations that could be entrusted to anyone if machines were
>>>> used"
>>>> Gottfried Wilhelm von Leibnitz, Philosopher and Mathematician (1646-1716)
>>>>
>>>> "The Internet is the largest library in the world.
>>>> But the problem is that the books are all scattered on the floor"
>>>> John Allen Paulos, Mathematician (1945-living)
>>>>
>>>> Luca 'remix_tj' Lorenzetto, http://www.remixtj.net , <lorenzetto.luca at gmail.com>




