Re: [oVirt Jenkins] ovirt-system-tests_he-basic-role-remote-suite-master - Build # 406 - Still Failing!

On Sun, May 10, 2020 at 4:35 AM <jenkins@jenkins.phx.ovirt.org> wrote:
Project: https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-role-remote-suite-...
Build: https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-role-remote-suite-...
This has been failing for a long time now, but recently, since he-basic-suite-master was fixed, it fails with:

https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-role-remote-suite-...

03:32:03 TASK [ovirt.hosted_engine_setup : Wait for the local VM] ***********************
04:32:09 fatal: [lago-he-basic-role-remote-suite-master-host-0 -> lago-he-basic-role-remote-suite-master-engine.lago.local]: FAILED! => {"changed": false, "elapsed": 3605, "msg": "timed out waiting for ping module test success: [Errno 24] Too many open files"}

I looked around and failed to find any other relevant information. I guess we should somehow collect the output of 'ulimit -a' and the list of open files (in the playbook or in OST; not sure which is better).

Searching Google for the exact error message (meaning both that the ping module failed and that the reason was too many open files) finds only one result, which does not seem related, so I do not think it's a regression in Ansible.
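A minimal sketch of what collecting that data could look like, assuming a Linux host with /proc and Python 3 available on it; the function names are hypothetical and not part of OST or the ovirt.hosted_engine_setup role. Errno 24 is EMFILE, the per-process descriptor limit, so the relevant numbers are the RLIMIT_NOFILE limits (what `ulimit -n` reports) and per-process descriptor counts.

```python
import os
import resource


def open_files_limit():
    """Return the (soft, hard) RLIMIT_NOFILE pair, as shown by `ulimit -Sn` / `ulimit -Hn`."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def open_fd_count(pid="self"):
    """Count the open file descriptors of a process via /proc (Linux only).

    When pid is "self", the result includes the descriptor used for the listing itself.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


def top_fd_users(limit=5):
    """Return up to `limit` (fd_count, pid, command) tuples, largest count first."""
    users = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            count = open_fd_count(entry)
            with open(f"/proc/{entry}/comm") as f:
                command = f.read().strip()
        except OSError:
            # The process exited during the scan, or we lack permission (non-root).
            continue
        users.append((count, int(entry), command))
    users.sort(reverse=True)
    return users[:limit]


if __name__ == "__main__":
    soft, hard = open_files_limit()
    print(f"RLIMIT_NOFILE: soft={soft} hard={hard}")
    print(f"open fds in this process: {open_fd_count()}")
    for count, pid, command in top_fd_users():
        print(f"{count:6d} {pid:7d} {command}")
```

Run as root on the host (or engine VM) right after a failure, this would show both the limit and which processes are holding the most descriptors.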
Build Number: 406
Build Status: Still Failing
Triggered By: Started by timer
-------------------------------------
Changes Since Last Success:
-------------------------------------
Changes for Build #397
[Sandro Bonazzola] python3: use print function
Changes for Build #398
[Sandro Bonazzola] python3: use print function
Changes for Build #399
[Sandro Bonazzola] python3: use print function
Changes for Build #400
[Sandro Bonazzola] python3: use print function
Changes for Build #401
[Michal Skrivanek] fix list of packages for storage setup
Changes for Build #402
[Lucia Jelinkova] UI test refactoring
Changes for Build #403
[Galit Rosenthal] Remove reposync from upgrade-from-prevrelease-suite-4.3
Changes for Build #404
[Marcin Sobczyk] selenium: podman: Add grid setup retries
Changes for Build #405
[Marcin Sobczyk] selenium: podman: Add grid setup retries
Changes for Build #406
[Marcin Sobczyk] selenium: podman: Add grid setup retries

-----------------
Failed Tests:
-----------------
No tests ran.
-- Didi

Hi,

The issue with `open files` obscured the root cause of the problem [0], which was:

fatal: [lago-he-basic-role-remote-suite-master-host-0 -> lago-he-basic-role-remote-suite-master-engine.lago.local]: FAILED! => {"changed": false, "elapsed": 3605, "msg": "timed out waiting for ping module test success: Data could not be sent to remote host \"lago-he-basic-role-remote-suite-master-engine.lago.local\". Make sure this host can be reached over ssh: Warning: Permanently added 'lago-he-basic-role-remote-suite-master-engine.lago.local' (ECDSA) to the list of known hosts.\r\nunix_listener: path \"/root/.ansible/cp/ansible-ssh-lago-he-basic-role-remote-suite-master-engine.lago.local-22-root.vGo68KSTIX9y2nmm\" too long for Unix domain socket\r\n"}

This patch [1] should handle both issues.

[0] https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-role-remote-suite-...
[1] https://gerrit.ovirt.org/#/c/108919/

(In reply to Yedidyah Bar David <didi@redhat.com>, Sun, May 10, 2020 at 10:45 AM)
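A minimal sketch of why the ssh control socket path above is rejected, assuming Linux (a 108-byte sun_path that must also hold the terminating NUL) and that ssh first binds to the ControlPath plus a "." and 16 random characters, which is what the ".vGo68KSTIX9y2nmm" suffix in the error suggests. In Ansible this path comes from the ssh connection plugin's `control_path` / `control_path_dir` settings; the `%C` alternative below is only an illustration (assuming `%C` expands to a 40-character SHA-1 hex digest), not necessarily what patch [1] does, and the helper names are hypothetical.

```python
import hashlib

# Assumption (Linux): sockaddr_un.sun_path is 108 bytes and must also hold the
# terminating NUL, so a socket path can be at most 107 bytes long.
SUN_PATH_MAX = 107

# ssh first binds to "<ControlPath>.<16 random characters>" (compare the
# ".vGo68KSTIX9y2nmm" suffix in the error), so 17 bytes of headroom are needed.
TEMP_SUFFIX_LEN = 1 + 16


def expand(template, host, port, user):
    """Naively expand a few ssh ControlPath tokens (%C, %h, %p, %r)."""
    # Stand-in for %C: only the 40-character length of the hex digest matters
    # here; ssh's real hash input (%l%h%p%r) differs.
    digest = hashlib.sha1(f"{host}{port}{user}".encode()).hexdigest()
    return (template.replace("%C", digest)
            .replace("%h", host)
            .replace("%p", str(port))
            .replace("%r", user))


def fits_in_socket_path(control_path):
    """True if ssh's temporary socket path for control_path fits in sun_path."""
    return len(control_path.encode()) + TEMP_SUFFIX_LEN <= SUN_PATH_MAX


HOST = "lago-he-basic-role-remote-suite-master-engine.lago.local"

# Template inferred from the path in the error message.
long_path = expand("/root/.ansible/cp/ansible-ssh-%h-%p-%r", HOST, 22, "root")
# Hypothetical alternative built on the %C hash token.
short_path = expand("/root/.ansible/cp/%C", HOST, 22, "root")

print(len(long_path), fits_in_socket_path(long_path))    # 94 False
print(len(short_path), fits_in_socket_path(short_path))  # 58 True
```

The 94-byte control path itself would fit; it is the 17-byte temporary suffix that brings it to 111 bytes, the length of the path in the error, which is why the long lago host name alone was enough to break the connection.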
Participants (2):
- Evgeny Slutsky
- Yedidyah Bar David