On Tue, Apr 6, 2021 at 9:24 AM Marcin Sobczyk <msobczyk(a)redhat.com> wrote:
Hi,
On 4/6/21 7:23 AM, Yedidyah Bar David wrote:
> On Mon, Apr 5, 2021 at 5:53 AM <jenkins(a)jenkins.phx.ovirt.org> wrote:
>> Project: https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-master/
>> Build: https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-master/1974/
> FYI: This failed twice in a row (1973 and 1974), for the same reason.
> I reproduced it locally and looked a bit, but failed to find the root
> cause. When I connected to host-1's console, it was stuck in emergency
> mode after reboot. I checked a bit; there was some error about kdump
> failing to read the kernel image
> (/boot/vmlinuz-4.18.0-240.15.1.el8_3.x86_64), but when I tried manually
> as root I did manage to read it. I rebooted, and the VM came up fine.
> I decided to try OST again, cleaned up and ran it, and opened a
> 'lago console' on the VM after it was up, but OST passed. Tried again,
> passed again. Then I manually ran 1975 in CI and it passed, and the
> nightly 1976 also passed. So I am going to ignore this for now.
>
> I think we need a patch to make lago/OST log the consoles of all the
> VMs. I might try to work on this.
Also stumbled upon this. Please take a look at
https://gerrit.ovirt.org/#/c/ovirt-system-tests/+/114050/
Yes, I did notice this change and wondered if it's related...
But it's not merged yet, and HE still passed at least 4 times (twice
locally, twice on CI). Obviously this does not prove that the issue is fixed.
Anyway, in addition to merely fixing it (which perhaps your patch does),
I also wanted to emphasize the importance of making such cases easier
to fix in the future. How did you manage to find the root cause?
Best regards,
--
Didi