On Tue, Oct 13, 2020 at 6:46 PM Nir Soffer <nsoffer(a)redhat.com> wrote:
> On Mon, Oct 12, 2020 at 9:05 AM Yedidyah Bar David <didi(a)redhat.com> wrote:
> > The next run of the job (480) did finish successfully. No idea if it
> > was already fixed by a patch, or is simply a random/env issue.
>
> I think this is an env issue; we run on overloaded VMs with a small
> amount of memory. I have seen such random failures before.
Generally speaking, I think we must aim for zero failures due to "env
issues", rather than dismiss them as such.
It would obviously be nice if we had more hardware in CI.
But I wonder whether stressing the system like we do (due to resource
scarcity) is actually a good thing - that it helps us find bugs that real
users might also hit in perfectly legitimate scenarios, i.e. using the
hardware we recommend, but under a load higher than what a single CI run
generates - since, admittedly, we only have minimal _data_ there.
So: if we decide that some code "worked as designed" and failed due to an
"env issue", I still think we should fix this - either in our code, or
in CI.
For the latter, I do not think it makes sense to just say "the machines
are overloaded and do not have enough memory" - we must come up with
concrete requirements, e.g. "We need at least X MiB RAM".
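
(As a purely illustrative sketch of how such a concrete requirement could
be enforced - this is not existing oVirt CI code, and the script, the
4096 MiB threshold and the messages below are made up:)

#!/usr/bin/env python3
# Hypothetical pre-flight check for a CI job: fail fast with a clear
# message if the runner does not meet a documented minimum, instead of
# letting the suite fail later in a way that looks like a random failure.
# MIN_MEM_MIB is only a placeholder, not a measured requirement.

import sys

MIN_MEM_MIB = 4096


def total_memory_mib():
    """Return total system memory in MiB, read from /proc/meminfo."""
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemTotal:"):
                # Line format: "MemTotal:       16333644 kB"
                return int(line.split()[1]) // 1024
    raise RuntimeError("MemTotal not found in /proc/meminfo")


if __name__ == "__main__":
    mem = total_memory_mib()
    if mem < MIN_MEM_MIB:
        sys.exit("env check failed: %d MiB RAM, need at least %d MiB"
                 % (mem, MIN_MEM_MIB))
    print("env check passed: %d MiB RAM available" % mem)

Running something like this before the suite would turn "env issue" into
an explicit, documented requirement that fails early and loudly.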
For the current issue, if we are certain that it is due to low memory,
it's quite easy to e.g. revert this patch:
https://gerrit.ovirt.org/110530
Obviously it will mean either longer queues or over-committing (higher
load). Not sure which.
But personally, I wouldn't do that without knowing more (e.g. following
the other thread).
Best regards,
--
Didi