Hi,

The issue seems to be that host-1 stopped responding, and I can see some fluentd errors that we should look at.

A Jira ticket has been opened to track this issue: https://ovirt-jira.atlassian.net/browse/OVIRT-2363

Martin, I also added you to the Jira ticket. Can you please have a look?

Errors from the host-1 messages log:
Jul 23 05:09:14 lago-basic-suite-master-host-1 fluentd: 2018-07-23 05:09:14 -0400 [warn]: detached forwarding server 'lago-basic-suite-master-engine' host="lago-basic-suite-master-engine" port=24224 phi=16.275347714068506
Jul 23 05:09:14 lago-basic-suite-master-host-1 fluentd: ["lago-basic-suite-master-engine", "lago-basic-suite-master-engine", "lago-basic-suite-master-engine", "lago-basic-suite-master-engine", "lago-basic-suite-master-engine", "lago-basic-suite-master-engine"]
Jul 23 05:09:14 lago-basic-suite-master-host-1 fluentd: 2018-07-23 05:09:14 -0400 fluent.warn: {"host":"lago-basic-suite-master-engine","port":24224,"phi":16.275347714068506,"message":"detached forwarding server 'lago-basic-suite-master-engine' host=\"lago-basic-suite-master-engine\" port=24224 phi=16.275347714068506"}
Jul 23 05:09:15 lago-basic-suite-master-host-1 fluentd: 2018-07-23 05:09:15 -0400 [warn]: detached forwarding server 'lago-basic-suite-master-engine' host="lago-basic-suite-master-engine" port=24224 phi=16.70444149784817
Jul 23 05:09:15 lago-basic-suite-master-host-1 fluentd: ["lago-basic-suite-master-engine", "lago-basic-suite-master-engine", "lago-basic-suite-master-engine", "lago-basic-suite-master-engine", "lago-basic-suite-master-engine", "lago-basic-suite-master-engine"]
Jul 23 05:09:15 lago-basic-suite-master-host-1 fluentd: 2018-07-23 05:09:15 -0400 fluent.warn: {"host":"lago-basic-suite-master-engine","port":24224,"phi":16.70444149784817,"message":"detached forwarding server 'lago-basic-suite-master-engine' host=\"lago-basic-suite-master-engine\" port=24224 phi=16.70444149784817"}
Jul 23 05:09:23 lago-basic-suite-master-host-1 python: ansible-command Invoked with warn=False executable=None _uses_shell=False _raw_params=systemctl is-active 'collectd' removes=None argv=None creates=None chdir=None stdin=None
Jul 23 05:09:25 lago-basic-suite-master-host-1 systemd-logind: New session 29 of user root.
Jul 23 05:09:25 lago-basic-suite-master-host-1 systemd: Started Session 29 of user root.
Jul 23 05:09:25 lago-basic-suite-master-host-1 systemd: Starting Session 29 of user root.
Jul 23 05:09:25 lago-basic-suite-master-host-1 systemd-logind: Removed session 29.
Jul 23 05:09:27 lago-basic-suite-master-host-1 fluentd: 2018-07-23 05:09:27 -0400 [warn]: failed to flush the buffer. error_class="RuntimeError" error="no nodes are available" plugin_id="object:151a620"
Jul 23 05:09:27 lago-basic-suite-master-host-1 fluentd: 2018-07-23 05:09:27 -0400 [warn]: retry count exceededs limit.
Jul 23 05:09:27 lago-basic-suite-master-host-1 fluentd: 2018-07-23 05:09:27 -0400 [warn]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/plugin/out_forward.rb:222:in `write_objects'
Jul 23 05:09:27 lago-basic-suite-master-host-1 fluentd: 2018-07-23 05:09:27 -0400 [warn]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/output.rb:490:in `write'
Jul 23 05:09:27 lago-basic-suite-master-host-1 fluentd: 2018-07-23 05:09:27 -0400 [warn]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/buffer.rb:354:in `write_chunk'
Jul 23 05:09:27 lago-basic-suite-master-host-1 fluentd: 2018-07-23 05:09:27 -0400 [warn]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/buffer.rb:333:in `pop'
Jul 23 05:09:27 lago-basic-suite-master-host-1 fluentd: 2018-07-23 05:09:27 -0400 [warn]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/output.rb:342:in `try_flush'
Jul 23 05:09:27 lago-basic-suite-master-host-1 fluentd: 2018-07-23 05:09:27 -0400 [warn]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/output.rb:149:in `run'
Jul 23 05:09:27 lago-basic-suite-master-host-1 fluentd: 2018-07-23 05:09:27 -0400 [error]: throwing away old logs.
Jul 23 05:09:27 lago-basic-suite-master-host-1 fluentd: 2018-07-23 05:09:27 -0400 fluent.warn: {"error_class":"RuntimeError","error":"no nodes are available","plugin_id":"object:151a620","message":"failed to flush the buffer. error_class=\"RuntimeError\" error=\"no nodes are available\" plugin_id=\"object:151a620\""}
Jul 23 05:09:27 lago-basic-suite-master-host-1 fluentd: 2018-07-23 05:09:27 -0400 fluent.warn: {"message":"retry count exceededs limit."}
Jul 23 05:09:27 lago-basic-suite-master-host-1 fluentd: 2018-07-23 05:09:27 -0400 fluent.error: {"message":"throwing away old logs."}
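
In case it helps with triage, here is a minimal sketch for summarizing the fluentd entries in a messages log like the one above. It assumes the line layout shown in this excerpt (syslog prefix, then "fluentd:", a timestamp, and a bracketed level). The function name summarize_fluentd and the SAMPLE list are only illustrative and are not part of any oVirt or fluentd tooling.

```python
import re
from collections import Counter

# Matches the bracketed-level fluentd entries seen in the excerpt, e.g.
# "... fluentd: 2018-07-23 05:09:27 -0400 [warn]: retry count exceededs limit."
# The "fluent.warn: {...}" JSON echoes of the same events are deliberately not matched.
FLUENTD_LINE = re.compile(
    r"fluentd: \d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} [+-]\d{4} "
    r"\[(?P<level>\w+)\]: (?P<msg>.*)"
)


def summarize_fluentd(lines):
    """Count fluentd messages per level, skipping stack-trace frames."""
    counts = Counter()
    for line in lines:
        match = FLUENTD_LINE.search(line)
        if match is None:
            continue  # not a bracketed-level fluentd entry
        if match.group("msg").startswith("/"):
            continue  # stack frame such as /usr/share/gems/...
        counts[match.group("level")] += 1
    return counts


# A few lines copied from the excerpt above, used only as sample input.
SAMPLE = [
    "Jul 23 05:09:25 lago-basic-suite-master-host-1 systemd: Started Session 29 of user root.",
    "Jul 23 05:09:27 lago-basic-suite-master-host-1 fluentd: 2018-07-23 05:09:27 -0400 [warn]: retry count exceededs limit.",
    "Jul 23 05:09:27 lago-basic-suite-master-host-1 fluentd: 2018-07-23 05:09:27 -0400 [warn]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/output.rb:149:in `run'",
    "Jul 23 05:09:27 lago-basic-suite-master-host-1 fluentd: 2018-07-23 05:09:27 -0400 [error]: throwing away old logs.",
]

counts = summarize_fluentd(SAMPLE)
for level, count in sorted(counts.items()):
    print(level, count)
```

To run it against a real log, the lines of the host's messages file could be passed in instead of SAMPLE (the file location depends on where the test artifacts are stored).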



Thanks.
Dafna



On Mon, Jul 23, 2018 at 10:31 AM, oVirt Jenkins <jenkins@ovirt.org> wrote:
Change 92882,9 (ovirt-engine) is probably the reason behind recent system test
failures in the "ovirt-master" change queue and needs to be fixed.

This change has been removed from the testing queue. Artifacts built from this
change will not be released until it is fixed.

For further details about the change see:
https://gerrit.ovirt.org/#/c/92882/9

For failed test results see:
http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/8764/