Debugging why hosted engine flips between EngineUp and EngineBadHealth

Hi all,

for the past few days my hosted-engine environments (one RHEV 4.0.7, one oVirt 4.1.7) have kept sending mails about changes between EngineUp and EngineBadHealth. This is pretty annoying and I'm not able to find the root cause.

The only issue I've seen on the hosts is this error about sending mails, which appears at random times:

Thread-1::ERROR::2017-12-01 03:05:05,084::notifications::39::ovirt_hosted_engine_ha.broker.notifications.Notifications::(send_email) [Errno -2] Name or service not known
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/notifications.py", line 26, in send_email
    timeout=float(cfg["smtp-timeout"]))
  File "/usr/lib64/python2.7/smtplib.py", line 255, in __init__
    (code, msg) = self.connect(host, port)
  File "/usr/lib64/python2.7/smtplib.py", line 315, in connect
    self.sock = self._get_socket(host, port, self.timeout)
  File "/usr/lib64/python2.7/smtplib.py", line 290, in _get_socket
    return socket.create_connection((host, port), timeout)
  File "/usr/lib64/python2.7/socket.py", line 553, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
gaierror: [Errno -2] Name or service not known
Thread-6::WARNING::2017-12-01 03:05:05,427::engine_health::130::engine_health.CpuLoadNoEngine::(action) bad health status: Hosted Engine is not up!

There are no errors in the engine logs, and all the API queries made by ovirt-hosted-engine-ha return HTTP code 200.

I suspect the switch between EngineUp and EngineBadHealth could be due to DNS resolution issues, but no message in the log shows this clearly, which makes it hard for our network admins to run traces.

Is there a way to increase the verbosity of broker.log and agent.log?
Luca -- "E' assurdo impiegare gli uomini di intelligenza eccellente per fare calcoli che potrebbero essere affidati a chiunque se si usassero delle macchine" Gottfried Wilhelm von Leibnitz, Filosofo e Matematico (1646-1716) "Internet è la più grande biblioteca del mondo. Ma il problema è che i libri sono tutti sparsi sul pavimento" John Allen Paulos, Matematico (1945-vivente) Luca 'remix_tj' Lorenzetto, http://www.remixtj.net , <lorenzetto.luca@gmail.com>

We have had a similar issue, which we resolved by restarting the engine VPS. Not ideal, but it solves the problem for about a month.

/Johan

Hi,

can you please enable DEBUG logging and then attach broker.log once the problem reproduces? See /etc/ovirt-hosted-engine-ha/broker-log.conf for where to set it (do not forget to restart ovirt-ha-agent and ovirt-ha-broker afterwards).

Name resolution issues might indeed be the cause, because the broker queries a health endpoint over HTTP. If notifications failed because of an unresolvable name, then there is a high chance the same happens to the health request every now and then.

Best regards
Martin Sivak

Hi Martin,

I see this in broker-log.conf:

[logger_root]
level=INFO
handlers=syslog,logfile
propagate=0

[handler_syslog]
level=ERROR
class=handlers.SysLogHandler
formatter=sysform
args=('/dev/log', handlers.SysLogHandler.LOG_USER)

[handler_logfile]
class=logging.handlers.TimedRotatingFileHandler
args=('/var/log/ovirt-hosted-engine-ha/broker.log', 'd', 1, 7)
level=DEBUG
formatter=long

It seems to be set already. broker.log already contains DEBUG messages, but syslog does not (and this is good). What about logger_root?

Luca

Hi,

> [logger_root] level=INFO
> [handler_logfile] level=DEBUG
> Seems already set. The file broker.log is already containing DEBUG, but syslog is not (and this is good). What about logger_root?

Yeah, I think you should change that one as well to get full debug logging. The handler level does nothing if the messages never reach the handler, and with the default configuration you have, the root logger does not let DEBUG messages through.

Best regards
Martin
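The interaction between the logger level and the handler level can be reproduced with the standard Python logging module. The sketch below is only an illustration, not the broker's code: the logger name and the messages are made up, and an in-memory buffer stands in for broker.log.

```python
import io
import logging


def captured_lines(logger_level):
    """Log one DEBUG and one INFO message; return what the handler received."""
    buffer = io.StringIO()
    handler = logging.StreamHandler(buffer)
    handler.setLevel(logging.DEBUG)  # mirrors [handler_logfile] level=DEBUG
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))

    # Hypothetical logger name; it stands in for the root logger so the demo
    # does not change the process-wide logging configuration.
    name = "level_demo.%s" % logging.getLevelName(logger_level)
    logger = logging.getLogger(name)
    for old in list(logger.handlers):
        logger.removeHandler(old)
    logger.propagate = False
    logger.setLevel(logger_level)  # mirrors [logger_root] level=...
    logger.addHandler(handler)

    logger.debug("submonitor value read")
    logger.info("engine state changed")
    return buffer.getvalue().splitlines()


if __name__ == "__main__":
    # Logger at INFO: the DEBUG record is dropped before any handler sees it.
    print(captured_lines(logging.INFO))
    # Logger at DEBUG: both records reach the DEBUG-level handler.
    print(captured_lines(logging.DEBUG))
```

A record is first checked against the logger's level and only then against each handler's level, so a DEBUG handler behind an INFO logger never sees DEBUG records.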

Hello,

I had several switches between EngineUp and EngineBadHealth today with broker.log at DEBUG level. Where should I start looking to identify the root cause? The log is somewhat chatty at this level.

Luca

Hi,

please attach the log. You can grep out the connected / disconnected lines. Look for the engine health monitor lines.

Martin
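A filter along those lines could look like the following sketch. The patterns are assumptions: the "engine_health" / "engine-health" markers are taken from the log excerpts quoted in this thread, while the exact wording of the connect/disconnect lines is a guess and may need adjusting to the real log.

```python
import re
import sys

# Assumption: engine health monitor lines mention "engine_health" or
# "engine-health", as in the excerpts quoted in this thread.
KEEP = re.compile(r"engine[_-]health")
# Assumption: client connection chatter contains "connected" or
# "disconnected"; adjust to the wording actually used in broker.log.
DROP = re.compile(r"(dis)?connected", re.IGNORECASE)


def filter_broker_log(lines):
    """Return engine health lines, skipping connection chatter."""
    kept = []
    for line in lines:
        if DROP.search(line):
            continue
        if KEEP.search(line):
            kept.append(line.rstrip("\n"))
    return kept


if __name__ == "__main__":
    # Usage: python filter_broker_log.py /var/log/ovirt-hosted-engine-ha/broker.log
    with open(sys.argv[1], errors="replace") as handle:
        for kept_line in filter_broker_log(handle):
            print(kept_line)
```

Iterating over the file handle reads the log line by line, so a multi-gigabyte broker.log is not loaded into memory at once.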

The log is quite big (about 1.5 GB). I'm extracting the messages around the last report of EngineBadHealth <-> EngineUp. I'll upload them in a few minutes.

Luca

Hi Martin, hi all,

*some minutes* have passed, and I have the piece of log I'm looking at:

broker.log-upbadup <https://drive.google.com/file/d/1wlWZPuhgtJRBWt4xUZC-Jis8vLWM1jYD/view?usp=drive_web>

This morning I got a notification about an EngineBadHealth / EngineUp flip. I can't identify anything that could have caused it, because up to a few seconds before the bad health report everything is OK...

Do you notice anything strange?

Thank you,
Luca

Hello,

today I started troubleshooting the DNS requests in more depth, and an EngineUp -> EngineBadHealth event happened exactly while I was watching tcpdump.

Looking at the DNS requests I see this:

[...]
14:30:35.909201 IP kvmhost01.intranet.company.it.55654 > dns.company.it.53: 34102+ A? engine01.intranet.company.it. (54)
14:30:35.909215 IP kvmhost01.intranet.company.it.55654 > dns.company.it.53: 6242+ AAAA? engine01.intranet.company.it. (54)
14:30:40.914285 IP kvmhost01.intranet.company.it.55654 > dns.company.it.53: 34102+ A? engine01.intranet.company.it. (54)
14:30:40.914316 IP kvmhost01.intranet.company.it.55654 > dns.company.it.53: 6242+ AAAA? engine01.intranet.company.it. (54)
14:30:45.918306 IP kvmhost01.intranet.company.it.54885 > dns.company.it.53: 60263+ A? engine01.intranet.company.it.intranet.company.it. (74)
14:30:45.918329 IP kvmhost01.intranet.company.it.54885 > dns.company.it.53: 18681+ AAAA? engine01.intranet.company.it.intranet.company.it. (74)
14:30:50.920376 IP kvmhost01.intranet.company.it.54885 > dns.company.it.53: 60263+ A? engine01.intranet.company.it.intranet.company.it. (74)
14:30:50.920411 IP kvmhost01.intranet.company.it.54885 > dns.company.it.53: 18681+ AAAA? engine01.intranet.company.it.intranet.company.it. (74)
14:30:56.044242 IP kvmhost01.intranet.company.it.58319 > dns.company.it.53: 28413+ A? engine01.intranet.company.it. (54)
14:30:56.044267 IP kvmhost01.intranet.company.it.58319 > dns.company.it.53: 29680+ AAAA? engine01.intranet.company.it. (54)
14:31:01.049761 IP kvmhost01.intranet.company.it.58319 > dns.company.it.53: 28413+ A? engine01.intranet.company.it. (54)
14:31:01.049777 IP kvmhost01.intranet.company.it.58319 > dns.company.it.53: 29680+ AAAA? engine01.intranet.company.it. (54)
14:31:06.052635 IP kvmhost01.intranet.company.it.58093 > dns.company.it.53: 24807+ A? engine01.intranet.company.it.intranet.company.it. (74)
14:31:06.052649 IP kvmhost01.intranet.company.it.58093 > dns.company.it.53: 53745+ AAAA? engine01.intranet.company.it.intranet.company.it. (74)
14:31:11.057724 IP kvmhost01.intranet.company.it.58093 > dns.company.it.53: 24807+ A? engine01.intranet.company.it.intranet.company.it. (74)
14:31:11.057745 IP kvmhost01.intranet.company.it.58093 > dns.company.it.53: 53745+ AAAA? engine01.intranet.company.it.intranet.company.it. (74)
14:31:16.175204 IP kvmhost01.intranet.company.it.44950 > dns.company.it.53: 63680+ A? engine01.intranet.company.it. (54)
14:31:16.175225 IP kvmhost01.intranet.company.it.44950 > dns.company.it.53: 15726+ AAAA? engine01.intranet.company.it. (54)
14:31:19.670746 IP kvmhost01.intranet.company.it.54689 > dns.company.it.53: 40999+ A? kvmsvilca01.intranet.company.it. (49)
14:31:21.180295 IP kvmhost01.intranet.company.it.44950 > dns.company.it.53: 63680+ A? engine01.intranet.company.it. (54)
14:31:21.180337 IP kvmhost01.intranet.company.it.44950 > dns.company.it.53: 15726+ AAAA? engine01.intranet.company.it. (54)
14:31:23.771959 IP kvmhost01.intranet.company.it.53741 > dns.company.it.53: 1707+ A? internalmx.intranet.company.it. (48)
[...]

The last DNS request succeeds and gets the MX address, and immediately afterwards I get the email reporting the status change.

This is clearly an issue with name resolution, but that is not evident from the broker.log file. The only messages about it that I get are:

Thread-16::DEBUG::2017-12-13 14:31:23,657::monitor::126::ovirt_hosted_engine_ha.broker.monitor.Monitor::(get_value) Submonitor engine-health id 139653412040592 current value: {"reason": "failed liveliness check", "health": "bad", "vm": "up", "detail": "up"}
Thread-16::DEBUG::2017-12-13 14:31:23,657::listener::170::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Response: success {"reason": "failed liveliness check", "health": "bad", "vm": "up", "detail": "up"}

But around those messages I see no sign of errors on DNS queries or anything similar.

Do I need to check other log files?

Luca
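Repeated queries like these can be picked out of a long capture mechanically. The sketch below is a hypothetical helper, not part of any oVirt tool: it assumes tcpdump text output in the format quoted in this thread and groups queries by transaction ID, query type and name. A query that appears more than once with the same ID was resent by the resolver, which usually means no answer arrived in time.

```python
import re
import sys
from collections import defaultdict

# Matches outgoing query lines such as:
# 14:30:35.909201 IP host.55654 > dns.53: 34102+ A? engine01.example. (54)
QUERY = re.compile(
    r"^(\d\d):(\d\d):(\d\d\.\d+) IP \S+ > \S+: "
    r"(?P<qid>\d+)\+? (?P<qtype>[A-Z]+)\? (?P<name>\S+)"
)


def find_retransmits(lines):
    """Map (query id, type, name) to timestamps for queries sent more than once."""
    seen = defaultdict(list)
    for line in lines:
        match = QUERY.match(line.strip())
        if not match:
            continue  # responses and non-DNS lines are ignored
        hours, minutes, seconds = match.group(1, 2, 3)
        stamp = int(hours) * 3600 + int(minutes) * 60 + float(seconds)
        key = (match.group("qid"), match.group("qtype"), match.group("name"))
        seen[key].append(stamp)
    return {key: stamps for key, stamps in seen.items() if len(stamps) > 1}


if __name__ == "__main__":
    # Usage: tcpdump -n port 53 | python find_retransmits.py
    for (qid, qtype, name), stamps in sorted(find_retransmits(sys.stdin).items()):
        gap = stamps[1] - stamps[0]
        print("%s %s id=%s resent after %.3fs" % (qtype, name, qid, gap))
```

In the excerpt above each query is repeated after roughly five seconds, which matches the default glibc resolver timeout, before the resolver moves on to the name with the search domain appended.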

On Wed, Dec 13, 2017 at 3:15 PM, Luca 'remix_tj' Lorenzetto <lorenzetto.luca@gmail.com> wrote:
[...]

Hi,

I am afraid we do not have logs that go that deep into the stack. DNS resolution issues will definitely affect both the notification system (if not using localhost SMTP) and the engine status checks (because we use the FQDN).

Best regards
Martin
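Since the HA logs do not record resolver behaviour, one option is to measure it from the host itself. The sketch below is a hedged illustration, not an oVirt tool: `timed_resolve` is a hypothetical helper that times a single `socket.getaddrinfo` call (the same call that fails in the traceback at the top of the thread) and returns either the addresses or the resolver error.

```python
import socket
import time


def timed_resolve(hostname, port=443, resolver=socket.getaddrinfo):
    """Resolve hostname once and report addresses, error text and duration.

    `resolver` is injectable only so the function can be exercised without
    touching the network; by default the system resolver is used.
    """
    start = time.monotonic()
    try:
        infos = resolver(hostname, port, 0, socket.SOCK_STREAM)
        # Each entry is (family, type, proto, canonname, sockaddr);
        # sockaddr[0] is the textual IP address.
        addresses = sorted({info[4][0] for info in infos})
        error = None
    except socket.gaierror as exc:
        addresses = []
        error = str(exc)
    elapsed = time.monotonic() - start
    return {
        "host": hostname,
        "addresses": addresses,
        "error": error,
        "seconds": elapsed,
    }
```

Calling this every few seconds for the engine FQDN and logging the result with a timestamp would give entries that can be lined up against the EngineBadHealth mails. A lookup that takes 10 seconds or more, or ends in "Name or service not known", would point at the resolver rather than at the engine.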

On Wed, Dec 13, 2017 at 4:15 PM, Luca 'remix_tj' Lorenzetto <lorenzetto.luca@gmail.com> wrote:
[...]
The last DNS request succeeds and returns the MX address, and immediately afterwards I get the email reporting the status change.

Can you ensure it doesn't have multiple IPs registered for it in DNS? dig or so should help.
Y.
_______________________________________________ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
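The check for multiple registered addresses can also be done from Python on the host. This is a rough sketch under stated assumptions: `addresses_by_family` is a hypothetical helper, and it goes through the system resolver (which also consults /etc/hosts), so it is not a direct replacement for querying the DNS server with dig.

```python
import socket


def addresses_by_family(hostname, resolver=socket.getaddrinfo):
    """Return the distinct IPv4 ("A") and IPv6 ("AAAA") addresses of hostname.

    `resolver` is injectable for testing; by default the system resolver
    is used, which may answer from /etc/hosts as well as from DNS.
    """
    result = {"A": set(), "AAAA": set()}
    for family, _type, _proto, _canon, sockaddr in resolver(
        hostname, None, 0, socket.SOCK_STREAM
    ):
        if family == socket.AF_INET:
            result["A"].add(sockaddr[0])
        elif family == socket.AF_INET6:
            result["AAAA"].add(sockaddr[0])
    return result
```

More than one entry in the "A" set for the engine FQDN would indicate multiple registered IPv4 addresses.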

On Dec 13, 2017 8:19 PM, "Yaniv Kaul" <ykaul@redhat.com> wrote:
Can you ensure it doesn't have multiple IPs registered for it in DNS? dig or so should help.
Y.

No, it doesn't. A single IP is registered. It is certainly a DNS query not getting its replies. I'm debugging with the network team to find out what is happening.

Anyway, I think the broker log in debug mode should help identify the source of these errors. Maybe explaining more clearly why the liveliness check has failed would reduce the troubleshooting experiments.

Luca
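To illustrate the kind of explanation asked for above, a liveliness-style check can report which stage failed instead of only "failed liveliness check". The following is a sketch under stated assumptions, not the broker's actual code: `explain_liveliness` is a hypothetical function, and the health URL path in `url_template` is an assumption about where the engine exposes its health page.

```python
import socket
import urllib.error
import urllib.request


def explain_liveliness(
    fqdn,
    url_template="http://{fqdn}/ovirt-engine/services/health",  # assumed path
    timeout=10,
    resolver=socket.getaddrinfo,
    opener=urllib.request.urlopen,
):
    """Return (stage, detail), where stage is "dns", "connect", "http" or "ok".

    `resolver` and `opener` are injectable so each stage can be tested
    without a real engine or DNS server.
    """
    # Stage 1: name resolution, reported separately from everything else.
    try:
        resolver(fqdn, 80, 0, socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return ("dns", str(exc))

    # Stage 2: TCP connection and HTTP request.
    url = url_template.format(fqdn=fqdn)
    try:
        with opener(url, timeout=timeout) as response:
            status = response.getcode()
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status.
        return ("http", "HTTP %d" % exc.code)
    except OSError as exc:
        # Covers URLError, refused connections and socket timeouts.
        return ("connect", str(exc))

    if status != 200:
        return ("http", "HTTP %d" % status)
    return ("ok", "HTTP 200")
```

A result of `("dns", "[Errno -2] Name or service not known")` next to a bad-health transition would show directly that name resolution, not the engine, was the failing step.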
participants (4)
- Johan Bernhardsson
- Luca 'remix_tj' Lorenzetto
- Martin Sivak
- Yaniv Kaul