Re: [CQ]: 108705, 9 (ovirt-engine) failed "ovirt-master" system tests, but isn't the failure root cause

On Sun, May 3, 2020 at 12:57 PM oVirt Jenkins <jenkins@ovirt.org> wrote:
A system test invoked by the "ovirt-master" change queue including change 108705,9 (ovirt-engine) failed. However, this change does not seem to be the root cause of this failure. Change 107284,12 (ovirt-engine), which this change depends on or is based on, was detected as the cause of the testing failures.
This change has been removed from the testing queue. Artifacts built from this change will not be released until either change 107284,12 (ovirt-engine) is fixed and this change is updated to refer to or rebased on the fixed version, or this change is modified to no longer depend on it.
For further details about the change see: https://gerrit.ovirt.org/#/c/108705/9
For further details about the change that seems to be the root cause behind the testing failures see: https://gerrit.ovirt.org/#/c/107284/12
For failed test results see: https://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/23505/
The engine did manage to run ssh-copy-id to both host-0 and host-1, but then failed a few seconds later while running ansible:
https://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/23505/artifac...
2020-05-03 05:54:49,779-04 ERROR [org.ovirt.engine.core.common.utils.ansible.AnsibleExecutor] (EE-ManagedThreadFactory-engine-Thread-1) [13ea1de1] Error executing playbook: Failed to add host to inventory: SSH timeout waiting for response from 'lago-basic-suite-master-host-0'
A few other lines alongside this one, as well as similar ones for host-1, do not give (me) more information.
Lago did manage to collect logs from both hosts a few seconds later. The vdsm logs are empty, and the messages log does not give me a clue either.
What is the timeout on trying to ssh (with ansible)? engine.log shows only 1-2 seconds from start to timeout. Perhaps we should make it a bit longer?
Best regards,
-- Didi
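One way to see whether a 1-2 second window is simply too short is to time how long each host takes to answer on the SSH port at all. The sketch below is a minimal illustration, assuming sshd listens on the default port 22; `ssh_banner` is a hypothetical helper, not part of ovirt-engine or ansible-runner, and it only measures the TCP connect plus the server's identification line, not authentication or anything ansible does afterwards.

```python
import socket
import time
from typing import Tuple


def ssh_banner(host: str, port: int = 22, timeout: float = 10.0) -> Tuple[str, float]:
    """Open a TCP connection to host:port and read the SSH identification line.

    Returns (banner, elapsed_seconds). The connect and every recv() are each
    bounded by `timeout`; if the server stays silent, socket.timeout is raised.
    """
    start = time.monotonic()
    # create_connection() resolves the name, tries each returned address,
    # and leaves `timeout` set on the socket for later operations too.
    with socket.create_connection((host, port), timeout=timeout) as sock:
        data = b""
        # An SSH server typically sends its identification string
        # ("SSH-2.0-...") terminated by CR LF right after the TCP handshake.
        while not data.endswith(b"\n"):
            chunk = sock.recv(256)
            if not chunk:  # server closed the connection
                break
            data += chunk
    return data.decode(errors="replace").strip(), time.monotonic() - start
```

For example, `ssh_banner("lago-basic-suite-master-host-0", timeout=30)` run from the engine VM would show whether the host answers at all, and roughly how long it takes.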

Hi Didi,
I managed to reproduce this error locally. Is there something you would like me to check?
Regards, Galit
On Sun, May 3, 2020 at 1:37 PM Yedidyah Bar David <didi@redhat.com> wrote:
Best regards,
-- Didi
_______________________________________________
Devel mailing list -- devel@ovirt.org
To unsubscribe send an email to devel-leave@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/devel@ovirt.org/message/WQ3JDHUPHR42VN...
-- GALIT ROSENTHAL SOFTWARE ENGINEER Red Hat <https://www.redhat.com/> galit@redhat.com T: 972-9-7692230 <https://red.ht/sig>

On Sun, May 3, 2020 at 1:47 PM Galit Rosenthal <grosenth@redhat.com> wrote:
Hi Didi,
I managed to reproduce this error locally. Is there something you would like me to check?
Yes, please! Can you ssh from the engine to the hosts?
-- Didi

I already checked this: ssh does not connect directly. It asks for the password and, if the host is not already listed as known, it also asks to approve the host key.
On Sun, May 3, 2020 at 2:19 PM Yedidyah Bar David <didi@redhat.com> wrote:
-- GALIT ROSENTHAL SOFTWARE ENGINEER Red Hat <https://www.redhat.com/> galit@redhat.com T: 972-9-7692230 <https://red.ht/sig>
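A manual ssh that stops at a password or host-key prompt cannot tell you much, because no scripted run will ever answer those prompts. The sketch below shows a probe that fails fast instead of waiting for input, assuming OpenSSH on the engine and root logins on the hosts. `ssh_probe_argv` and `ssh_probe` are hypothetical helpers, and since this thread does not say which private key the engine uses, it is passed in as the `identity_file` parameter.

```python
import subprocess
from typing import List, Optional


def ssh_probe_argv(host: str, user: str = "root", connect_timeout: int = 10,
                   identity_file: Optional[str] = None) -> List[str]:
    """Build an OpenSSH command line that never waits for interactive input."""
    argv = [
        "ssh",
        # Fail instead of prompting for a password or passphrase.
        "-o", "BatchMode=yes",
        # Give up if the TCP connection is not established within N seconds.
        "-o", f"ConnectTimeout={connect_timeout}",
        # Record unknown host keys automatically instead of asking
        # (OpenSSH 7.6+); changed host keys are still refused.
        "-o", "StrictHostKeyChecking=accept-new",
    ]
    if identity_file is not None:
        argv += ["-i", identity_file]
    # Run a no-op remote command so that exit status 0 means "login worked".
    argv += [f"{user}@{host}", "true"]
    return argv


def ssh_probe(host: str, **kwargs) -> bool:
    """Return True if a non-interactive login to `host` succeeds."""
    result = subprocess.run(ssh_probe_argv(host, **kwargs),
                            capture_output=True, text=True)
    return result.returncode == 0
```

`StrictHostKeyChecking=accept-new` writes to the caller's known_hosts file; drop that option if the probe should have no side effects, in which case an unknown host key makes the probe fail rather than prompt.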

On 3 May 2020, at 13:48, Galit Rosenthal <grosenth@redhat.com> wrote:
I already checked this: ssh does not connect directly. It asks for the password and, if the host is not already listed as known, it also asks to approve the host key.
The engine did manage to run ssh-copy-id to both host-0 and host-1, but then failed, a few seconds later, while running ansible:
https://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/23505/artifact/basic-suite.el7.x86_64/test_logs/basic-suite-master/post-002_bootstrap_pytest.py/lago-basic-suite-master-engine/_var_log/ovirt-engine/engine.log
2020-05-03 05:54:49,779-04 ERROR [org.ovirt.engine.core.common.utils.ansible.AnsibleExecutor] (EE-ManagedThreadFactory-engine-Thread-1) [13ea1de1] Error executing playbook: Failed to add host to inventory: SSH timeout waiting for response from 'lago-basic-suite-master-host-0'
The new ansible-runner prioritizes IPv6, so if the IPv6 configuration is not entirely correct, it is going to fail. This should be fixed by https://gerrit.ovirt.org/#/c/108725/
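The failure mode described here is generic: a host name can resolve to both IPv6 and IPv4 addresses, and a client that tries IPv6 first will stall or fail if the IPv6 path is configured but broken. The sketch below illustrates that mechanism with Python's standard `socket` module only; it is not how ansible-runner works internally or how change 108725 addresses the problem, and `order_addresses` / `connect_in_order` are hypothetical names.

```python
import socket
from typing import Any, List, Optional, Tuple

# One entry as returned by socket.getaddrinfo():
# (family, type, proto, canonname, sockaddr)
AddrInfo = Tuple[Any, ...]


def order_addresses(infos: List[AddrInfo], prefer: int = socket.AF_INET) -> List[AddrInfo]:
    """Stable-sort getaddrinfo() results so the `prefer` family is tried first."""
    return sorted(infos, key=lambda info: info[0] != prefer)


def connect_in_order(host: str, port: int, timeout: float = 5.0,
                     prefer: int = socket.AF_INET) -> socket.socket:
    """Try each resolved address in turn; return the first connected socket."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    last_error: Optional[OSError] = None
    for family, socktype, proto, _canonname, sockaddr in order_addresses(infos, prefer):
        sock = socket.socket(family, socktype, proto)
        sock.settimeout(timeout)
        try:
            sock.connect(sockaddr)
            return sock
        except OSError as exc:
            # e.g. an IPv6 address that resolves but is not actually reachable
            sock.close()
            last_error = exc
    raise last_error or OSError(f"no usable address for {host}")
```

Python's own `socket.create_connection()` likewise walks the `getaddrinfo()` list in the order it is returned, and on glibc-based systems that default order can be tuned system-wide in /etc/gai.conf.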
_______________________________________________
Devel mailing list -- devel@ovirt.org
To unsubscribe send an email to devel-leave@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/devel@ovirt.org/message/ASQJQW6Z44UCKA...
participants (3)
- Galit Rosenthal
- Michal Skrivanek
- Yedidyah Bar David