
On Mon, Jan 18, 2021 at 11:19 AM Marcin Sobczyk <msobczyk@redhat.com> wrote:
On 1/18/21 9:58 AM, Yedidyah Bar David wrote:
On Mon, Jan 18, 2021 at 10:53 AM Martin Perina <mperina@redhat.com> wrote:
On Mon, Jan 18, 2021 at 9:08 AM Yedidyah Bar David <didi@redhat.com> wrote:
On Sun, Jan 17, 2021 at 3:11 PM Yedidyah Bar David <didi@redhat.com> wrote:
On Thu, Jan 14, 2021 at 1:41 PM Yedidyah Bar David <didi@redhat.com> wrote:
On Thu, Jan 14, 2021 at 8:35 AM Yedidyah Bar David <didi@redhat.com> wrote:
> On Wed, Jan 13, 2021 at 5:34 PM Yedidyah Bar David <didi@redhat.com> wrote:
>> On Wed, Jan 13, 2021 at 2:48 PM Yedidyah Bar David <didi@redhat.com> wrote:
>>> On Wed, Jan 13, 2021 at 1:57 PM Marcin Sobczyk <msobczyk@redhat.com> wrote:
>>>> Hi,
>>>>
>>>> my guess is it's selinux-related.
>>>>
>>>> Unfortunately I can't find any meaningful errors in audit.log in a
>>>> scenario where host deployment fails.
>>>> However switching selinux to permissive mode before adding hosts makes
>>>> the problem go away, so it's probably not an error somewhere in logic.
>>> It's getting weirder: Under strace, it succeeds:
>>>
>>> https://gerrit.ovirt.org/c/ovirt-system-tests/+/112948
>>>
>>> (Can't see the actual log, as I didn't add '-A', so it was overwritten
>>> on restart...)
>> After updating it to use '-A' it indeed shows that it worked:
>>
>> 43664 14:16:55.997639 access("/etc/pki/ovirt-engine/requests", W_OK
>> <unfinished ...>
>> 43664 14:16:55.997695 <... access resumed>) = 0
>>
>> Weird.
>>
>> Now ran in parallel 'ci test' for this patch and another one from
>> master, for comparison:
> Again, the same:
>
>> https://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/14916/
> With strace, passed,
>
>> https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-master/1883/
> Without strace, failed.
>
> Last nightly run that passed [1] used:
>
> ost-images-el8-host-installed-1-202101100446.x86_64
> ovirt-engine-appliance-4.4-20210109182828.1.el8.x86_64
>
> Trying now with these - not sure it possible to put specific versions inside
> automation/*packages, let's see:
>
> https://gerrit.ovirt.org/c/ovirt-system-tests/+/112977

Indeed, with a fixed ost-images and removing updates, it passes. network suite failed, but he-basic passed:
https://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/14920/...
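
For reference, the access() check seen in the strace output quoted above can be repeated by hand, without strace attached. The following is only a rough sketch: the helper name check_write_access is made up for illustration, and the path is the one from the strace line. Python's os.access() wraps access(2), so it gives the same yes/no answer, but a negative result does not tell you whether file permissions or selinux caused it.

```python
import os


def check_write_access(path):
    """Return (ok, reason) for a W_OK check on path.

    os.access() wraps access(2), the call that appears in the strace output.
    The reason is coarse: "missing" also covers paths that cannot be reached.
    """
    if os.access(path, os.W_OK):
        return True, "writable"
    if not os.path.exists(path):
        return False, "missing"
    return False, "denied"


if __name__ == "__main__":
    # Path taken from the strace output in this thread.
    path = "/etc/pki/ovirt-engine/requests"
    ok, reason = check_write_access(path)
    print(f"{path}: {reason}")
```

To be meaningful it would have to run on the engine as the same user as the failing process, once in enforcing mode and once in permissive mode.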
So I am quite certain this is an OS issue. Not sure how we do not see this in basic-suite. Perhaps it's related to nested-kvm, or to load/slowness caused by that? Weird.
When this fails, we do not collect all of the engine's /var/log, only messages and ovirt-engine/. So it's not easy to get a list of the packages that were updated.
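
Once package lists from a passing and a failing engine are available, comparing them is simple. A minimal sketch, assuming each list has one "name version-release" pair per line (for example from rpm -qa --qf '%{NAME} %{VERSION}-%{RELEASE}\n'); the function names are hypothetical, and the sample entries in the usage part are invented, not taken from any real run:

```python
def parse_packages(text):
    """Parse lines of 'name version-release' into a dict.

    Lines that do not have exactly two fields are skipped. If a name
    appears more than once (e.g. several kernels), the last one wins.
    """
    packages = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            name, version = parts
            packages[name] = version
    return packages


def changed_packages(before_text, after_text):
    """Return {name: (old, new)} for packages that differ between the lists.

    A value of None means the package is absent from that list.
    """
    before = parse_packages(before_text)
    after = parse_packages(after_text)
    changes = {}
    for name in sorted(set(before) | set(after)):
        old, new = before.get(name), after.get(name)
        if old != new:
            changes[name] = (old, new)
    return changes


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    passing = "pkg-a 1.0-1.el8\npkg-b 2.0-1.el8\n"
    failing = "pkg-a 1.0-1.el8\npkg-b 2.1-1.el8\npkg-c 0.5-1.el8\n"
    for name, (old, new) in changed_packages(passing, failing).items():
        print(f"{name}: {old} -> {new}")
```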
Pushed now:
https://github.com/oVirt/ovirt-ansible-collection/pull/202
to get all of engine's /var/log, and ran manual HE job with it:
https://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests...

I accidentally ran this one with the wrong repo, then ran another one with the correct repo [1]. But:
1. The repo wasn't used. Emailed about this in a separate thread: "manual job does not use custom repo"
2. It passed!

Being what seems like a heisenbug, I understand why it works differently when you run it under strace. But even if you just intend to collect more logs, it also behaves differently? :-)

This does not mean "problem solved" - the latest nightly run [2] did fail with the same error.

Status:
1. he-basic-suite is still failing.
2. Patch to collect all of /var/log from the engine merged.
Dana, can you please update? Did you have any progress?
IMO it's an OS bug. If Marcin says it's an selinux issue, I do not argue :-). So, how do we continue?
Switching to CentOS Stream development/testing is a big effort, I'm not sure we can do this and still deliver all the RFEs/bugs planned for 4.4.5 ...
+1

IMO we should now revert appliance and node to CentOS 8.3, and then continue the discussion. Having he-basic-suite broken for a week is too much.

+1

The testing infrastructure for Stream is here, but if it doesn't work yet then let's stick to the plan and focus on 8.3.
Just to conclude the original issue - a workaround was found, root cause still under investigation. Commented on the bugs (oVirt and Stream) with details.

Best regards,
--
Didi