On Tue, Mar 16, 2021 at 6:02 PM Michal Skrivanek <mskrivan@redhat.com> wrote:

On 16. 3. 2021, at 15:53, Yedidyah Bar David <didi@redhat.com> wrote:

On Tue, Mar 16, 2021 at 10:09 AM Yedidyah Bar David <didi@redhat.com> wrote:

On Tue, Mar 16, 2021 at 7:06 AM <jenkins@jenkins.phx.ovirt.org> wrote:

Project: https://jenkins.ovirt.org/job/ovirt-system-tests_basic-suite-master_nightly/
Build: https://jenkins.ovirt.org/job/ovirt-system-tests_basic-suite-master_nightly/962/
Build Number: 962
Build Status: Still Failing
Triggered By: Started by timer

-------------------------------------
Changes Since Last Success:
-------------------------------------
Changes for Build #953
[Michal Skrivanek] randomize /dev/shm logcollector tmp directory

Changes for Build #954
[Michal Skrivanek] randomize /dev/shm logcollector tmp directory

Changes for Build #955
[Michal Skrivanek] randomize /dev/shm logcollector tmp directory

Changes for Build #956
[Michal Skrivanek] randomize /dev/shm logcollector tmp directory

Changes for Build #957
[Michal Skrivanek] randomize /dev/shm logcollector tmp directory

Changes for Build #958
[Michal Skrivanek] randomize /dev/shm logcollector tmp directory

Changes for Build #959
[Michal Skrivanek] randomize /dev/shm logcollector tmp directory

Changes for Build #960
[Andrej Cernek] pylint: Upgrade to 2.7

Changes for Build #961
[Andrej Cernek] pylint: Upgrade to 2.7

Changes for Build #962
[Andrej Cernek] pylint: Upgrade to 2.7

-----------------
Failed Tests:
-----------------
1 tests failed.
FAILED: basic-suite-master.test-scenarios.test_001_initialize_engine.test_set_hostnames

Error Message:
failed on setup with "TypeError: __new__() missing 2 required positional arguments: 'version' and 'repo'"

Stack Trace:
ansible_by_hostname = <function module_mapper_for at 0x7ffbad0acc80>

   @pytest.fixture(scope="session", autouse=True)
   def check_installed_packages(ansible_by_hostname):
       vms_pckgs_dict_list = []
       for hostname in backend.default_backend().hostnames():
           vm_pckgs_dict = _get_custom_repos_packages(
             ansible_by_hostname(hostname))

ost_utils/ost_utils/pytest/fixtures/check_repos.py:39:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
ost_utils/ost_utils/pytest/fixtures/check_repos.py:55: in _get_custom_repos_packages
   repo_name)
ost_utils/ost_utils/pytest/fixtures/check_repos.py:69: in _get_installed_packages
   Package(*line) for line in result
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

.0 = <list_iterator object at 0x7ffba6e97860>

     Package(*line) for line in result
   ]
E   TypeError: __new__() missing 2 required positional arguments: 'version' and 'repo'

This failed because 'dnf repo-pkgs' split the entry for one package
across two lines, so the first line didn't include a version [1]:

lago-basic-suite-master-host-1 | CHANGED | rc=0 >>
Installed Packages
ovirt-ansible-collection.noarch 1.3.2-0.1.master.20210315141358.el8 @extra-src-1
python3-ovirt-engine-sdk4.x86_64
                               4.4.10-1.20210315.gitf8b9f2a.el8    @extra-src-1
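
To make the failure concrete, here is a minimal sketch that reproduces it.
It is not the actual check_repos.py code. It assumes that Package is a
namedtuple with the fields name, version and repo (only 'version' and 'repo'
are confirmed by the error message; 'name' is a guess), and that each output
line is split on whitespace:

```python
from collections import namedtuple

# Assumed definition: only 'version' and 'repo' are known from the traceback;
# the first field name is hypothetical.
Package = namedtuple("Package", ["name", "version", "repo"])

# The package lines from the output above, including the wrapped entry.
output = """\
ovirt-ansible-collection.noarch 1.3.2-0.1.master.20210315141358.el8 @extra-src-1
python3-ovirt-engine-sdk4.x86_64
                               4.4.10-1.20210315.gitf8b9f2a.el8    @extra-src-1
"""


def parse_per_line(text):
    # One Package per physical line: this breaks as soon as dnf wraps
    # a long entry, because the first half has only one field.
    return [Package(*line.split()) for line in text.splitlines()]


error = None
try:
    parse_per_line(output)
except TypeError as exc:
    # The exact prefix of the message differs between Python versions.
    error = str(exc)
    print(error)
```

The first line parses fine; the second one has only the package name, so
the namedtuple constructor is called with one argument instead of three.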

We should either give up on this, or rewrite the call to 'dnf repo-pkgs'
in some other language that does not require parsing human-targeted
output (perhaps Python or Ansible), or amend the current code a bit and
hope it will survive longer...
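
For the last option, one possible shape is sketched below. This is only an
illustration under my own assumptions, not necessarily what the patch does:
the Package field names and the parse_wrapped helper are hypothetical. The
idea is to ignore line breaks, collect all whitespace-separated tokens, and
group them in threes:

```python
from collections import namedtuple

# Assumed definition; 'name' is a hypothetical field name.
Package = namedtuple("Package", ["name", "version", "repo"])

sample = """\
Installed Packages
ovirt-ansible-collection.noarch 1.3.2-0.1.master.20210315141358.el8 @extra-src-1
python3-ovirt-engine-sdk4.x86_64
                               4.4.10-1.20210315.gitf8b9f2a.el8    @extra-src-1
"""


def parse_wrapped(text):
    """Parse package entries even if dnf wrapped one across two lines."""
    tokens = []
    for line in text.splitlines():
        if line.strip() == "Installed Packages":
            continue  # skip the header line
        tokens.extend(line.split())
    if len(tokens) % 3 != 0:
        # Still human-targeted output: fail loudly if the shape is unexpected.
        raise ValueError("unexpected number of fields: %d" % len(tokens))
    return [Package(*tokens[i:i + 3]) for i in range(0, len(tokens), 3)]


packages = parse_wrapped(sample)
for pkg in packages:
    print(pkg.name, pkg.version, pkg.repo)
```

This still depends on every entry having exactly three fields without
embedded whitespace, so it only survives longer; it is not robust.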

Trying last one:

https://gerrit.ovirt.org/c/ovirt-system-tests/+/113895

Merged, but we still fail in nightly (which I ran manually):

https://jenkins.ovirt.org/job/ovirt-system-tests_basic-suite-master_nightly/963/console

16:06:44 >           raise RuntimeError('None of user custom repos has been used')
16:06:44 E           RuntimeError: None of user custom repos has been used

I think this is "by design" - this job runs with a "custom repo"
pointing at master-snapshot, and apparently at least in this run it
didn't see updates, so failed.
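
Roughly, the behaviour described here can be sketched as follows. This is
an assumption about the logic, not the real check_repos.py: the function
name, the dict-of-lists input and the 'strict' switch are hypothetical.
The non-strict branch shows what "warn instead of fail" could look like:

```python
import warnings


def check_custom_repos_used(packages_per_host, strict=True):
    """Return True if any host installed a package from a custom repo.

    packages_per_host maps a hostname to the list of packages that came
    from user custom repos (hypothetical structure).
    """
    used = any(packages_per_host.values())
    if not used:
        msg = 'None of user custom repos has been used'
        if strict:
            raise RuntimeError(msg)
        # Softer variant: report the problem but let the suite continue.
        warnings.warn(msg)
    return used


# A run where the custom repo had nothing newer than the image.
nothing_used = {"host-0": [], "host-1": []}
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = check_custom_repos_used(nothing_used, strict=False)
```

With strict=True the same input raises the RuntimeError seen in the console.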

Wouldn’t it fail exactly during the night, when we build a fresh ost-image and then run with a custom repo that obviously has nothing newer, since we just built the image?

In principle, yes, but in practice we did have some runs that did not fail. I looked a bit trying to understand why they didn't fail and gave up, deciding it's not that important. I think it's related to the timing of these jobs and the publisher, ost-images, etc.

I wonder if this is simply a design issue, or we should change the
nightly run to not use a custom repo, or something else.

In any case, perhaps we should consider completely reverting
check_repos.py for now, until we decide what we want. It was a good
idea, but we can't let basic-suite remain red for so long. And then,
we can get back to the issue of ovirt-log-collector...

I already pushed a patch to make it warn instead of fail, but since you didn't merge yet,

We don’t need such radical changes/investment just yet; just dropping it from the nightly config should be enough.
I think it is rather a Jenkins config issue. For nightly we do not need a custom repo.

I now also pushed these:

https://gerrit.ovirt.org/c/jenkins/+/113904 Remove custom repos from ovirt-system-tests

https://gerrit.ovirt.org/c/ovirt-system-tests/+/113905 automation: Add ovirt-master-snapshot

I think they are rather safe to merge right now, as-is, in whatever order.

If something breaks, we can fix it later.

Best regards,

Didi