
On Thu, Jan 14, 2021 at 1:41 PM Yedidyah Bar David <didi@redhat.com> wrote:
On Thu, Jan 14, 2021 at 8:35 AM Yedidyah Bar David <didi@redhat.com> wrote:
On Wed, Jan 13, 2021 at 5:34 PM Yedidyah Bar David <didi@redhat.com> wrote:
On Wed, Jan 13, 2021 at 2:48 PM Yedidyah Bar David <didi@redhat.com> wrote:
On Wed, Jan 13, 2021 at 1:57 PM Marcin Sobczyk <msobczyk@redhat.com> wrote:
Hi,
my guess is it's selinux-related.
Unfortunately, I can't find any meaningful errors in audit.log in a scenario where host deployment fails. However, switching SELinux to permissive mode before adding hosts makes the problem go away, so it's probably not an error somewhere in the logic.
It's getting weirder: Under strace, it succeeds:
https://gerrit.ovirt.org/c/ovirt-system-tests/+/112948
(Can't see the actual log, as I didn't add '-A', so it was overwritten on restart...)
After updating it to use '-A' it indeed shows that it worked:
43664 14:16:55.997639 access("/etc/pki/ovirt-engine/requests", W_OK <unfinished ...>
43664 14:16:55.997695 <... access resumed>) = 0
Weird.
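With several processes traced at once, strace splits an interrupted call into an '<unfinished ...>' line and a later '<... resumed>' line, which makes results like the access() call above harder to grep for. A minimal sketch of joining such pairs back into single lines follows. It assumes each line starts with a PID and a time-of-day timestamp, as in the excerpt above; the function name merge_strace_lines is made up for illustration and is not part of strace or OST.

```python
import re

# "PID HH:MM:SS.micros rest-of-line", the layout seen in the excerpt above.
LINE_RE = re.compile(r"^(?P<pid>\d+)\s+(?P<ts>\d{2}:\d{2}:\d{2}\.\d+)\s+(?P<rest>.*)$")
# The second half of a split call, e.g. "<... access resumed>) = 0".
RESUMED_RE = re.compile(r"^<\.\.\. (?P<name>\w+) resumed>\s?(?P<tail>.*)$")
UNFINISHED = "<unfinished ...>"


def merge_strace_lines(lines):
    """Join '<unfinished ...>' / '<... resumed>' pairs into whole calls.

    Hypothetical helper: lines that do not match the assumed layout are skipped.
    """
    pending = {}  # pid -> (start timestamp, first half of the call)
    merged = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        pid, ts, rest = m.group("pid"), m.group("ts"), m.group("rest")
        if rest.endswith(UNFINISHED):
            # Remember the first half until the same PID resumes.
            pending[pid] = (ts, rest[: -len(UNFINISHED)].rstrip())
            continue
        resumed = RESUMED_RE.match(rest)
        if resumed and pid in pending:
            start_ts, head = pending.pop(pid)
            merged.append(f"{pid} {start_ts} {head}{resumed.group('tail')}")
        else:
            merged.append(f"{pid} {ts} {rest}")
    return merged


sample = [
    '43664 14:16:55.997639 access("/etc/pki/ovirt-engine/requests", W_OK <unfinished ...>',
    "43664 14:16:55.997695 <... access resumed>) = 0",
]
for call in merge_strace_lines(sample):
    print(call)
```

The merged line keeps the timestamp of the start of the call, so failures (lines not ending in '= 0') can then be searched for one call per line.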
Now ran in parallel 'ci test' for this patch and another one from master, for comparison:
Again, the same:
https://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/14916/
With strace, passed,
https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-master/1883/
Without strace, failed.
Last nightly run that passed [1] used:
ost-images-el8-host-installed-1-202101100446.x86_64 ovirt-engine-appliance-4.4-20210109182828.1.el8.x86_64
Trying now with these - not sure it's possible to put specific versions inside automation/*packages, let's see:
Indeed, with a fixed ost-images and with updates removed, it passes. The network suite failed, but he-basic passed:
https://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/14920/...
So I am quite certain this is an OS issue. Not sure how we do not see this in basic-suite. Perhaps it's related to nested-kvm, or to load/slowness caused by that? Weird.
When this fails, we do not collect all of the engine's /var/log, only messages and ovirt-engine/. So it's not easy to get a list of the packages that were updated.
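If package lists from a passing and a failing run can be obtained, comparing them is straightforward. A rough sketch follows, assuming each list was saved as 'name version-release' lines (for example with rpm -qa --qf '%{NAME} %{VERSION}-%{RELEASE}\n'). The function names and the sample package names are hypothetical, not taken from any real run.

```python
def parse_packages(text):
    """Parse 'name version-release' lines into a {name: version} dict.

    Assumption: one installed version per package name. If a name appears
    more than once (e.g. several kernels), the last line wins.
    """
    packages = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            packages[parts[0]] = parts[1]
    return packages


def diff_packages(old_text, new_text):
    """Return (name, old_version, new_version) for every package that differs.

    A version of None means the package is missing from that list.
    """
    old, new = parse_packages(old_text), parse_packages(new_text)
    changes = []
    for name in sorted(old.keys() | new.keys()):
        before, after = old.get(name), new.get(name)
        if before != after:
            changes.append((name, before, after))
    return changes


# Made-up sample data, only to show the shape of the input and output.
passing = "pkg-a 1.0-1.el8\npkg-b 2.0-1.el8\n"
failing = "pkg-a 1.0-2.el8\npkg-b 2.0-1.el8\npkg-c 0.1-1.el8\n"
for name, before, after in diff_packages(passing, failing):
    print(f"{name}: {before} -> {after}")
```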
Pushed now:
https://github.com/oVirt/ovirt-ansible-collection/pull/202
to get all of engine's /var/log, and ran manual HE job with it:
https://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests...
This one I accidentally ran with the wrong repo, then ran another one with the correct repo [1], but:

1. The repo wasn't used. Emailed about this in a separate thread: "manual job does not use custom repo"
2. It passed!

Being what seems like a heisenbug, I understand why it works differently when you run it under strace. But it also behaves differently even if you just intend to collect more logs? :-)

This does not mean "problem solved" - the latest nightly run [2] did fail with the same error.

[1] https://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests...
[2] https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-master/1887/
[1] https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-master/1879/
-- Didi