[ovirt-users] Debugging why hosted engine flips between EngineUp and EngineBadHealth

Martin Sivak msivak at redhat.com
Mon Dec 4 08:31:10 UTC 2017


Hi,

Please attach the log. You can grep out the connected / disconnected lines
to make it less chatty.

Look for the engine health monitor lines.
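
A minimal sketch of that filtering, in case grep is not convenient. It
assumes the chatter lines contain the word "connected" or "disconnected"
and that health monitor lines contain "engine_health", as in the WARNING
line quoted further down in this thread. The first sample line is
hypothetical and only there for illustration.

```python
import re

# Assumption: connection chatter contains "connected" or "disconnected";
# the exact wording in a real broker.log may differ.
NOISE = re.compile(r"\b(?:dis)?connected\b", re.IGNORECASE)


def health_lines(lines):
    """Yield engine health monitor lines, skipping connection chatter."""
    for line in lines:
        if NOISE.search(line):
            continue
        if "engine_health" in line:
            yield line.rstrip("\n")


sample = [
    # Hypothetical chatter line, for illustration only.
    "Thread-2::INFO::2017-12-02 10:00:00,000::listener::1::"
    "example.Listener::(run) Client connected",
    # Line from the log excerpt quoted in this thread, rejoined onto one line.
    "Thread-6::WARNING::2017-12-01 03:05:05,427::engine_health::130::"
    "engine_health.CpuLoadNoEngine::(action) "
    "bad health status: Hosted Engine is not up!",
]

for line in health_lines(sample):
    print(line)
```

To run it against a real log, pass an open file object instead of the
sample list.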

Martin

On Sat, Dec 2, 2017 at 5:10 PM, Luca 'remix_tj' Lorenzetto
<lorenzetto.luca at gmail.com> wrote:
> Hello,
>
> I had several switches between EngineUp and EngineBadHealth today with
> broker.log at DEBUG level. Where should I start looking to identify the
> root cause? The log is somewhat chatty at this level.
>
> Luca
>
> On Fri, Dec 1, 2017 at 1:24 PM, Martin Sivak <msivak at redhat.com> wrote:
>> Hi,
>>
>>> [logger_root]
>>> level=INFO
>>
>>> [handler_logfile]
>>> level=DEBUG
>>
>>> It seems to be set already. broker.log already contains DEBUG messages,
>>> but syslog does not (and this is good). What about logger_root?
>>
>> Yeah, I think you should change that one as well to get full debug
>> logging. The handler level does nothing if the messages do not get to
>> it, and with the default configuration you have (level=INFO) the root
>> logger should not let DEBUG messages through.
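
The interaction between the logger level and the handler level can be
checked with plain Python logging. This is a sketch, assuming
broker-log.conf is read by Python's standard logging module (the section
names [logger_root] and [handler_logfile] match its config file format).
The logger name "demo" and the messages are made up for illustration; a
standalone logger stands in for the root logger.

```python
import io
import logging


def emitted(logger_level):
    """Return the lines a DEBUG-level handler receives for a logger level."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setLevel(logging.DEBUG)      # like [handler_logfile] level=DEBUG
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))

    logger = logging.Logger("demo")      # stands in for the root logger
    logger.setLevel(logger_level)        # like [logger_root] level=...
    logger.addHandler(handler)

    logger.debug("health check details")
    logger.info("engine state changed")
    return stream.getvalue().splitlines()


# The logger drops DEBUG records before the handler ever sees them.
print(emitted(logging.INFO))
# With the logger at DEBUG, both records reach the handler.
print(emitted(logging.DEBUG))
```

The logger level is checked first, so lowering only the handler level
cannot bring back records the logger has already discarded.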
>>
>> Best regards
>>
>> Martin
>>
>>> Luca
>>>
>>> On Fri, Dec 1, 2017 at 12:29 PM, Martin Sivak <msivak at redhat.com> wrote:
>>>> Hi,
>>>>
>>>> Can you please enable DEBUG logging and then attach broker.log once the
>>>> problem reproduces? See /etc/ovirt-hosted-engine-ha/broker-log.conf for
>>>> where to set it (do not forget to restart ovirt-ha-agent and
>>>> ovirt-ha-broker afterwards).
>>>>
>>>> Name resolution issues might indeed be the cause, because the
>>>> broker queries a health endpoint over HTTP. If notifications failed
>>>> because of an unresolvable name, there is a high chance the same
>>>> happens to the health request every now and then.
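
One way to check for intermittent resolution failures is to resolve the
engine name repeatedly from the host and record each failure with a
timestamp. This is only a sketch: the function name probe and its
parameters are hypothetical, and it uses socket.getaddrinfo, the same
call that fails in the traceback quoted in this thread. The resolver and
sleep function are parameters so the loop can be exercised without a
network.

```python
import socket
import time


def probe(host, attempts, delay=1.0,
          resolve=socket.getaddrinfo, sleep=time.sleep):
    """Resolve host repeatedly; return (timestamp, error) for each failure."""
    failures = []
    for i in range(attempts):
        try:
            # Port None: only the name lookup matters here.
            resolve(host, None, 0, socket.SOCK_STREAM)
        except socket.gaierror as exc:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            failures.append((stamp, str(exc)))
        if i + 1 < attempts:
            sleep(delay)
    return failures
```

For example, probe("engine.example.com", attempts=60, delay=5) would
check a placeholder engine name every five seconds for about five
minutes. The timestamps of any failures can then be compared with the
EngineBadHealth transitions in broker.log.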
>>>>
>>>> Best regards
>>>>
>>>> Martin Sivak
>>>>
>>>> On Fri, Dec 1, 2017 at 10:50 AM, Luca 'remix_tj' Lorenzetto
>>>> <lorenzetto.luca at gmail.com> wrote:
>>>>> Hi all,
>>>>>
>>>>> For some days now my hosted-engine environments (one RHEV 4.0.7, one
>>>>> oVirt 4.1.7) have kept sending mails about changes between EngineUp and
>>>>> EngineBadHealth.
>>>>>
>>>>> This is pretty annoying and I'm not able to find the root cause.
>>>>>
>>>>> The only issue I've seen on the hosts is this error about sending mails,
>>>>> which appears at random times.
>>>>>
>>>>> Thread-1::ERROR::2017-12-01 03:05:05,084::notifications::39::ovirt_hosted_engine_ha.broker.notifications.Notifications::(send_email) [Errno -2] Name or service not known
>>>>> Traceback (most recent call last):
>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/notifications.py", line 26, in send_email
>>>>>     timeout=float(cfg["smtp-timeout"]))
>>>>>   File "/usr/lib64/python2.7/smtplib.py", line 255, in __init__
>>>>>     (code, msg) = self.connect(host, port)
>>>>>   File "/usr/lib64/python2.7/smtplib.py", line 315, in connect
>>>>>     self.sock = self._get_socket(host, port, self.timeout)
>>>>>   File "/usr/lib64/python2.7/smtplib.py", line 290, in _get_socket
>>>>>     return socket.create_connection((host, port), timeout)
>>>>>   File "/usr/lib64/python2.7/socket.py", line 553, in create_connection
>>>>>     for res in getaddrinfo(host, port, 0, SOCK_STREAM):
>>>>> gaierror: [Errno -2] Name or service not known
>>>>> Thread-6::WARNING::2017-12-01 03:05:05,427::engine_health::130::engine_health.CpuLoadNoEngine::(action) bad health status: Hosted Engine is not up!
>>>>>
>>>>> There are no errors in the engine logs, and all the API queries made by
>>>>> ovirt-hosted-engine-ha return HTTP code 200.
>>>>>
>>>>> I suspect the switching between EngineUp and EngineBadHealth could
>>>>> be due to DNS resolution issues, but no message in the log clearly
>>>>> shows this, so our netadmins have little to go on when tracing.
>>>>>
>>>>> Is there a way to increase the verbosity of broker.log and agent.log?
>>>>>
>>>>> Luca
>>>>>
>>>>> --
>>>>> "It is absurd to employ men of excellent intelligence to perform
>>>>> calculations that could be entrusted to anyone if machines were
>>>>> used"
>>>>> Gottfried Wilhelm von Leibnitz, Philosopher and Mathematician (1646-1716)
>>>>>
>>>>> "The Internet is the largest library in the world.
>>>>> But the problem is that the books are all scattered on the floor"
>>>>> John Allen Paulos, Mathematician (1945-living)
>>>>>
>>>>> Luca 'remix_tj' Lorenzetto, http://www.remixtj.net , <lorenzetto.luca at gmail.com>
>>>>> _______________________________________________
>>>>> Users mailing list
>>>>> Users at ovirt.org
>>>>> http://lists.ovirt.org/mailman/listinfo/users

