On Fri, Nov 30, 2018 at 2:18 PM Dan Kenigsberg <danken(a)redhat.com> wrote:
On Fri, 30 Nov 2018, 19:33 Dafna Ron <dron(a)redhat.com> wrote:
> Hi,
>
> This mail is to provide the current status of CQ and allow people to
> review status before and after the weekend.
> Please refer to below colour map for further information on the meaning
> of the colours.
>
> *CQ-4.2*: RED (#1)
>
> I checked the last date on which ovirt-engine and vdsm passed and moved
> packages to tested, as they are the bigger projects, and it was on 27-11-2018.
>
> We have been having sporadic failures for most of the projects on the test
> check_snapshot_with_memory.
> We have deduced that this is caused by a code regression in storage,
> based on the following:
> 1. Evgheni and Gal helped debug this issue to rule out lago and infra
> issues as the cause of the failure, and both determined that the issue is a
> code regression - most likely in storage.
> 2. The failure only happens on the 4.2 branch.
> 3. The failure itself is that a VM cannot be run due to low disk space in
> the storage domain, and we cannot see any earlier failures that would leave
> leftovers in the storage domain (one way to check the domains' free space
> is sketched below).
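For anyone who wants to double-check the free-space claim on a running
engine, here is a minimal sketch using the oVirt Python SDK
(ovirt-engine-sdk4); the engine URL and credentials are placeholders, not
values from this thread:

# Minimal sketch: list each storage domain's reported space via the oVirt
# Python SDK. The URL and credentials below are placeholders.
import ovirtsdk4 as sdk

connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',  # placeholder URL
    username='admin@internal',
    password='secret',
    insecure=True,
)
try:
    sds_service = connection.system_service().storage_domains_service()
    for sd in sds_service.list():
        # 'available', 'used' and 'committed' are reported in bytes.
        print(sd.name, 'available:', sd.available,
              'used:', sd.used, 'committed:', sd.committed)
finally:
    connection.close()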
>
> Dan and Ryan are actively
>
Actually, my involvement was a misguided attempt to solve another 4.2
failure that I thought I had seen.
> involved
> in trying to find the regression, but the consensus is that this is a
> storage-related regression and *we are having a problem getting the
> storage team to join us in debugging the issue.*
>
> I prepared a patch to skip the test in case we cannot get cooperation
> from the storage team and resolve this regression in the next few days:
>
> https://gerrit.ovirt.org/#/c/95889/
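(For context only, and not necessarily how the linked patch does it: in a
nose-based suite such as OST, a single test can be skipped by raising
SkipTest at the top of the test function; the signature below is
illustrative.)

# Illustrative only -- not a copy of the gerrit patch above.
from nose.plugins.skip import SkipTest


def check_snapshot_with_memory(prefix):  # signature is illustrative
    # Temporarily skipped while the suspected 4.2 storage regression is
    # investigated; drop this once the regression is resolved.
    raise SkipTest('pending investigation of the 4.2 storage regression')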
>
Why do you consider this? Are we considering a release of 4.2 without live
snapshot?
No, we aren't.
Please do not merge it without an ack from Tal and Ryan.
Until we can bisect it, have you considered simply making a larger iSCSI
volume so OST stops failing there? I know it's an additional burden on
Infra's resources, and it's hopefully something we can revert later, but
it's likely to make OST pass for now so we can identify if/where other
failures are, before we discover that even disabling this test (which I'm
against) doesn't make OST pass and that we've lost a good bisection point.
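To make the suggestion concrete, here is a rough sketch of what growing the
backing LUN could look like, assuming (as is common in Lago/OST setups) an
LVM-backed iSCSI target; the volume group, logical volume and size are
hypothetical and this is not a tested procedure:

# Hypothetical sketch: grow an LVM-backed iSCSI LUN and make the initiator
# notice the new size. Device names and the size are placeholders.
import subprocess

def run(cmd):
    print('$', ' '.join(cmd))
    subprocess.run(cmd, check=True)

# On the storage server: extend the logical volume backing the exported LUN.
run(['lvextend', '--size', '+20G', '/dev/vg_storage/lun_ost'])

# On the initiator (hypervisor) side: rescan iSCSI sessions so the larger
# LUN size is picked up; the storage domain can then be extended from engine.
run(['iscsiadm', '-m', 'session', '--rescan'])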
> *CQ-Master:* YELLOW (#1)
>
> We have failures which CQ is still bisecting, and until it is done we cannot
> point to any specific failing projects.
>
>
> Happy week!
> Dafna
>
>
>
>
-------------------------------------------------------------------------------------------------------------------
> COLOUR MAP
>
> Green = job has been passing successfully
>
> ** green for more than 3 days may suggest we need a review of our test
> coverage
>
>
> 1. 1-3 days GREEN (#1)
> 2. 4-7 days GREEN (#2)
> 3. Over 7 days GREEN (#3)
>
>
> Yellow = intermittent failures for different projects but no lasting or
> current regressions
>
> ** intermittent failures would indicate a healthy project, as we expect a
> number of failures during the week
>
> ** I will not report any of the solved failures or regressions.
>
>
> 1. Solved job failures YELLOW (#1)
> 2. Solved regressions YELLOW (#2)
>
>
> Red = job has been failing
>
> ** Active failures. The colour will change based on the amount of time
> the project(s) have been broken. Only active regressions will be reported.
>
>
> 1. 1-3 days RED (#1)
> 2. 4-7 days RED (#2)
> 3. Over 7 days RED (#3)
>
>
>
--
Ryan Barry
Associate Manager - RHV Virt/SLA
rbarry(a)redhat.com M: +16518159306 IM: rbarry
<https://red.ht/sig>