[ OST Failure Report ] [ oVirt master ] [ 2017-12-24 ] [add_hosts]

Test failed: [ 002_bootstrap.add_hosts ]

Link to suspected patches:
- Linked test is failing with the failing engine patch: https://gerrit.ovirt.org/#/c/85533/2
- Tests with engine patches had been failing since: https://gerrit.ovirt.org/#/c/85668/2

Link to Job: https://gerrit.ovirt.org/#/c/85668/2
Link to all logs: http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/4497/artifact/...

Error snippet from log:
<error>
'Host lago-upgrade-from-release-suite-master-host0 is in non responsive state
-------------------- >> begin captured logging << --------------------
ovirtlago.testlib: ERROR: * Unhandled exception in <function _host_is_up_4 at 0x45fb938>
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 219, in assert_equals_within
    res = func()
  File "/home/jenkins/workspace/ovirt-master_change-queue-tester/ovirt-system-tests/upgrade-from-release-suite-master/test-scenarios-after-upgrade/002_bootstrap.py", line 178, in _host_is_up_4
    api_host.name)
RuntimeError: Host lago-upgrade-from-release-suite-master-host0 is in non responsive state
--------------------- >> end captured logging << ---------------------
</error>

For some reason there is no host-deploy log, so further analysis is difficult.

-- Barak Korren RHV DevOps team , RHCE, RHCi Red Hat EMEA redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted
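
Note on reading the traceback above: the RuntimeError is raised inside the check function that assert_equals_within calls, so the test stops as soon as the host is seen as non responsive rather than waiting for a timeout. A minimal Python 3 sketch of that kind of polling pattern follows. It is an illustration only and not the ovirtlago or OST code: the helper's signature, make_host_check, the host name and the status strings are all hypothetical.

```python
import time


def assert_equals_within(func, expected, timeout, interval=0.01):
    """Poll func() until it returns `expected` or `timeout` seconds pass.

    Hypothetical stand-in written for this note; not the ovirtlago code.
    """
    deadline = time.monotonic() + timeout
    while True:
        # An exception raised by func() is not caught here, so it ends
        # the polling loop immediately instead of waiting for the timeout.
        result = func()
        if result == expected:
            return
        if time.monotonic() >= deadline:
            raise AssertionError(
                "%r != %r after %s seconds" % (result, expected, timeout)
            )
        time.sleep(interval)


def make_host_check(statuses, name="host0"):
    """Build a check that reports one made-up host status per call."""
    remaining = iter(statuses)

    def host_is_up():
        status = next(remaining)
        if status == "non_responsive":
            raise RuntimeError("Host %s is in non responsive state" % name)
        return status == "up"

    return host_is_up
```

With this sketch, a check that goes from "installing" to "up" returns normally, while one that reaches "non_responsive" raises RuntimeError on that poll, which is the shape of failure reported in the snippet.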

On Sun, Dec 24, 2017 at 5:37 PM, Barak Korren <bkorren@redhat.com> wrote:
Test failed: [ 002_bootstrap.add_hosts ]
Link to suspected patches:
- Linked test is failing with the failing engine patch: https://gerrit.ovirt.org/#/c/85533/2
This patch "webadmin: vnic mapping - initial match target to source" seems utterly unrelated.
- Tests with engine patches had been failing since: https://gerrit.ovirt.org/#/c/85668/2
"core: ansible: Replace include with include_tasks" this is more related to add-host flow.

On 24 December 2017 at 21:40, Dan Kenigsberg <danken@redhat.com> wrote:
On Sun, Dec 24, 2017 at 5:37 PM, Barak Korren <bkorren@redhat.com> wrote:
- Tests with engine patches had been failing since: https://gerrit.ovirt.org/#/c/85668/2
"core: ansible: Replace include with include_tasks" this is more related to add-host flow.
Here is a test with just that patch to confirm it causes the same issue: http://jenkins.ovirt.org/job/ovirt-system-tests_manual/1924
-- Barak Korren RHV DevOps team , RHCE, RHCi Red Hat EMEA redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted

On Mon, Dec 25, 2017 at 12:20 AM, Barak Korren <bkorren@redhat.com> wrote:
In future breakage reports, may I ask you to include the patch title when you declare a patch suspicious? It helps readers understand what (and who) is to blame.

I'm testing a revert of the patch [1]. It does look closely related: the suspicion is that Ansible did not configure firewalld, which caused the failure. Gal was debugging the same issue downstream yesterday.

[1] https://gerrit.ovirt.org/#/c/85723/

On Mon, Dec 25, 2017 at 9:57 AM, Dan Kenigsberg <danken@redhat.com> wrote:
In future breakage reports, may I ask you to include the patch title when you declare a patch suspicious? It helps readers understand what (and who) is to blame.
-- Eyal edri MANAGER RHV DevOps EMEA VIRTUALIZATION R&D Red Hat EMEA <https://www.redhat.com/> <https://red.ht/sig> TRIED. TESTED. TRUSTED. <https://redhat.com/trusted> phone: +972-9-7692018 irc: eedri (on #tlv #rhev-dev #rhev-integ)

On 25 December 2017 at 10:50, Eyal Edri <eedri@redhat.com> wrote:
I'm testing a revert of the patch [1]. It does look closely related: the suspicion is that Ansible did not configure firewalld, which caused the failure. Gal was debugging the same issue downstream yesterday.
Revert patch failed to pass CQ: http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/4518/
I'm rerunning just in case it passes a 2nd time: http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/4520/
Maybe we need to revert the firewalld patch too: https://gerrit.ovirt.org/c/85326 (engine: Let backend choose default firewall type)
-- Barak Korren RHV DevOps team , RHCE, RHCi Red Hat EMEA redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted

On Mon, Dec 25, 2017 at 2:20 PM, Barak Korren <bkorren@redhat.com> wrote:
Revert patch failed to pass CQ: http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/4518/
I'm rerunning just in case it passes a 2nd time: http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/4520/
Maybe we need to revert the firewalld patch too: https://gerrit.ovirt.org/c/85326 (engine: Let backend choose default firewall type)
OK, thanks to Gal for debugging this further. It seems that a different patch was causing the failure:
https://gerrit.ovirt.org/#/c/85611/ - core: ansible: Don't print errors when files not found

I've tested a revert patch [1] (rebased on master) and it passed here [2].
Details on why it caused the failure will be added to the revert patch commit msg.

[1] https://gerrit.ovirt.org/#/c/85737/
[2] http://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_...
-- Eyal edri MANAGER RHV DevOps EMEA VIRTUALIZATION R&D Red Hat EMEA <https://www.redhat.com/> <https://red.ht/sig> TRIED. TESTED. TRUSTED. <https://redhat.com/trusted> phone: +972-9-7692018 irc: eedri (on #tlv #rhev-dev #rhev-integ)

On 25 December 2017 at 16:59, Eyal Edri <eedri@redhat.com> wrote:
OK, thanks to Gal for debugging this further. It seems that a different patch was causing the failure: https://gerrit.ovirt.org/#/c/85611/ - core: ansible: Don't print errors when files not found
I've tested a revert patch [1] (rebased on master) and it passed here [2].
Details on why it caused the failure will be added to the revert patch commit msg.
This has now finally passed OST/CQ: http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/4522/
-- Barak Korren RHV DevOps team , RHCE, RHCi Red Hat EMEA redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted

participants (3)
- Barak Korren
- Dan Kenigsberg
- Eyal Edri