[ovirt-users] Debugging why hosted engine flips between EngineUp and EngineBadHealth
Martin Sivak
msivak at redhat.com
Fri Dec 1 11:29:32 UTC 2017
Hi,
can you please enable DEBUG logging and then attach broker.log once the
problem reproduces? The log level is set in
/etc/ovirt-hosted-engine-ha/broker-log.conf (do not forget to restart
ovirt-ha-agent and ovirt-ha-broker afterwards).
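To confirm the change took effect before restarting, the levels in the config file can be listed. This is only a minimal sketch: it assumes broker-log.conf uses the INI layout read by Python's logging.config.fileConfig (sections carrying a level= key), and logger_levels is a hypothetical helper, not part of ovirt-hosted-engine-ha.

```python
import configparser


def logger_levels(config_text):
    """Return {section_name: level} for every section that defines a level.

    Assumes an INI-style logging config; this layout is an assumption
    about broker-log.conf, not something confirmed in this thread.
    """
    # RawConfigParser does no interpolation, so values such as
    # "%(asctime)s" in format strings are read without errors.
    parser = configparser.RawConfigParser()
    parser.read_string(config_text)
    levels = {}
    for section in parser.sections():
        if parser.has_option(section, "level"):
            levels[section] = parser.get(section, "level").strip().upper()
    return levels
```

Reading the file with open("/etc/ovirt-hosted-engine-ha/broker-log.conf").read() and passing the text to logger_levels should show DEBUG for the sections that were edited.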
Name resolution issues might indeed be the cause, because the broker
queries a health endpoint over HTTP. If notifications failed because of
an unresolvable name, there is a high chance that the same happens to
the health request every now and then.
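To catch intermittent resolution failures independently of the broker, a small probe can resolve the name in a loop and timestamp each failure, which gives the network admins something to correlate with their traces. This is a sketch under assumptions: check_resolution and probe are hypothetical helpers, and the hostname to test (the engine FQDN or the SMTP server) has to be supplied by you.

```python
import socket
import time
from datetime import datetime, timezone


def check_resolution(hostname, resolver=socket.getaddrinfo):
    """Resolve hostname once and return (ok, detail)."""
    try:
        results = resolver(hostname, None)
    except socket.gaierror as exc:
        # Same exception class as in the notifications traceback.
        return False, "gaierror: %s" % exc
    # Each getaddrinfo result ends with a sockaddr tuple; item 0 is the address.
    addresses = sorted({res[4][0] for res in results})
    return True, ", ".join(addresses)


def probe(hostname, attempts, interval,
          resolver=socket.getaddrinfo, sleep=time.sleep):
    """Resolve hostname `attempts` times; return the list of failures."""
    failures = []
    for i in range(attempts):
        ok, detail = check_resolution(hostname, resolver)
        stamp = datetime.now(timezone.utc).isoformat()
        print("%s %s %s %s" % (stamp, hostname, "OK" if ok else "FAIL", detail))
        if not ok:
            failures.append((stamp, detail))
        if i + 1 < attempts:
            sleep(interval)
    return failures
```

For example, probe("engine.example.com", attempts=720, interval=5) would test a placeholder name every five seconds for an hour; any FAIL timestamps can then be compared with the EngineBadHealth entries in broker.log.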
Best regards
Martin Sivak
On Fri, Dec 1, 2017 at 10:50 AM, Luca 'remix_tj' Lorenzetto
<lorenzetto.luca at gmail.com> wrote:
> Hi all,
>
> for some days now my hosted-engine environments (one RHEV 4.0.7, one
> oVirt 4.1.7) have kept sending mails about changes between EngineUp and
> EngineBadHealth.
>
> This is pretty annoying and I have not been able to find the root cause.
>
> The only issue I've seen on the hosts is this error about sending mails,
> which appears at random:
>
> Thread-1::ERROR::2017-12-01 03:05:05,084::notifications::39::ovirt_hosted_engine_ha.broker.notifications.Notifications::(send_email) [Errno -2] Name or service not known
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/notifications.py", line 26, in send_email
>     timeout=float(cfg["smtp-timeout"]))
>   File "/usr/lib64/python2.7/smtplib.py", line 255, in __init__
>     (code, msg) = self.connect(host, port)
>   File "/usr/lib64/python2.7/smtplib.py", line 315, in connect
>     self.sock = self._get_socket(host, port, self.timeout)
>   File "/usr/lib64/python2.7/smtplib.py", line 290, in _get_socket
>     return socket.create_connection((host, port), timeout)
>   File "/usr/lib64/python2.7/socket.py", line 553, in create_connection
>     for res in getaddrinfo(host, port, 0, SOCK_STREAM):
> gaierror: [Errno -2] Name or service not known
> Thread-6::WARNING::2017-12-01 03:05:05,427::engine_health::130::engine_health.CpuLoadNoEngine::(action) bad health status: Hosted Engine is not up!
>
> There are no errors in the engine logs, and all the API queries made by
> ovirt-hosted-engine-ha return HTTP code 200.
>
> I suspect the switch between EngineUp and EngineBadHealth could be due
> to DNS resolution issues, but no message in the log clearly shows this,
> which gives our netadmins nothing to trace.
>
> Is there a way to increase the verbosity of broker.log and agent.log?
>
> Luca
>
> --
> "It is absurd to employ men of excellent intelligence to perform
> calculations that could be entrusted to anyone if machines were
> used"
> Gottfried Wilhelm von Leibnitz, Philosopher and Mathematician (1646-1716)
>
> "The Internet is the largest library in the world.
> But the problem is that the books are all scattered on the floor"
> John Allen Paulos, Mathematician (1945-living)
>
> Luca 'remix_tj' Lorenzetto, http://www.remixtj.net , <lorenzetto.luca at gmail.com>
> _______________________________________________
> Users mailing list
> Users at ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users