Re: [ OST Failure Report ] [ oVirt Master (log-collector/engine/vdsm) ] [ 23/25/26-03-2018 ] [004_basic_sanity.hotplug_cpu ]

29 Mar 2018

Milan Zamazal <mzamazal@redhat.com> writes:
...
Dafna Ron <dron@redhat.com> writes:
...
Can you post the fix that you added in the mail?
I split hotplug_cpu test in https://gerrit.ovirt.org/89551
After some thought, I have a different fix:
https://gerrit.ovirt.org/89589

This one checks whether ssh (and not only the network) is available after
resume and, if not, reboots the VM just as it would if the network weren't
available.
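
A minimal sketch of that check-then-reboot logic, for illustration only. It
is not the code from the patch above: the names ssh_port_open, wait_for and
ensure_guest_reachable are hypothetical, the network check and the reboot
action are passed in as plain callables, and probing TCP port 22 is only an
approximation of "ssh is available" (an open port does not guarantee that a
login succeeds).

```python
import socket
import time


def ssh_port_open(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Assumption: an open ssh port is used as a stand-in for "ssh is available".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for(check, attempts=5, delay=1.0):
    """Call check() up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


def ensure_guest_reachable(network_ok, ssh_ok, reboot_vm, attempts=5, delay=1.0):
    """Reboot the VM when either the network check or the ssh check keeps failing.

    network_ok, ssh_ok and reboot_vm are hypothetical callables supplied by
    the caller. Returns True if a reboot was triggered, False otherwise.
    """
    if wait_for(network_ok, attempts, delay) and wait_for(ssh_ok, attempts, delay):
        return False
    # ssh failure is handled the same way as a network failure: reboot.
    reboot_vm()
    return True
```

In a test, ssh_ok could be something like
lambda: ssh_port_open(guest_address), where guest_address is whatever
address the test already uses to reach the guest.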
...
We'll see whether it is actually useful once the disk_operations failure
is fixed.  If it still causes problems, then hotplug_cpu_guest_check
introduced by the patch should be disabled until we can log into the
guest OS reliably.
...
On Wed, Mar 28, 2018 at 9:23 AM, Milan Zamazal <mzamazal@redhat.com> wrote:
...
Milan Zamazal <mzamazal@redhat.com> writes:
...
Dafna Ron <dron@redhat.com> writes:
...
We have a failure that seems to be random and happening in several
projects.
Has this failure started occurring only recently, or has it been present for ages?
...
From what I can see, we are failing due to a timing issue in the test
itself, because we are querying the VM after it has been destroyed in
engine.
Looking at engine, I can see that the VM was actually shut down,
No, the VM was shut down in another test.  It's already running again in
hotplug_cpu.
...
I would like to disable this test until we can fix the issue, since it
has already failed about 7 different patches from different projects.
I can see that Gal has already increased the timeout.  I think the test
could be split to reduce the waiting delay, I'll post a patch for that.
BTW, I think the primary cause of the trouble is the infamous Cirros
networking recovery problems.