After some analysis, I think the bug we are seeing here is
https://bugzilla.redhat.com/show_bug.cgi?id=1588061
This applies to suspend/resume and also to a snapshot with memory.
Following the steps, and considering that the iSCSI storage domain is only
20 GB, this should explain reaching ~4 GB of free space.
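To make the space arithmetic concrete, here is a minimal sketch (not from the thread) of how leaked memory-snapshot volumes could drain a 20 GB domain down to ~4 GB free. The VM memory size and base disk usage are assumptions for illustration; only the domain size and the ~4 GB figure come from the report.

```python
# Illustrative arithmetic only: a snapshot with memory writes a memory
# dump roughly the size of the VM's RAM to the storage domain. If those
# volumes are leaked (as in BZ 1588061), free space drops with each run.

DOMAIN_SIZE_GB = 20      # size of the iSCSI storage domain in the report
VM_MEMORY_GB = 2         # assumed VM memory size; not stated in the thread
BASE_DISK_USAGE_GB = 10  # assumed space used by VM disks and templates

def free_after_memory_snapshots(leaked_snapshots):
    """Free space left after `leaked_snapshots` memory volumes accumulate."""
    used = BASE_DISK_USAGE_GB + leaked_snapshots * VM_MEMORY_GB
    return DOMAIN_SIZE_GB - used

# Three leaked 2 GB memory dumps on top of 10 GB of disks leave 4 GB free.
print(free_after_memory_snapshots(3))  # -> 4
```

Under these assumed numbers, only a handful of sporadic test runs would be enough to trigger the low-space failures described below.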
On Fri, Nov 30, 2018 at 10:01 PM Raz Tamir <ratamir(a)redhat.com> wrote:
On Fri, Nov 30, 2018, 21:57 Ryan Barry <rbarry(a)redhat.com> wrote:
>
>
> On Fri, Nov 30, 2018 at 2:31 PM Raz Tamir <ratamir(a)redhat.com> wrote:
>
>>
>>
>> On Fri, Nov 30, 2018, 19:33 Dafna Ron <dron(a)redhat.com> wrote:
>>
>>> Hi,
>>>
>>> This mail is to provide the current status of CQ and allow people to
>>> review status before and after the weekend.
>>> Please refer to the colour map below for further information on the
>>> meaning of the colours.
>>>
>>> *CQ-4.2*: RED (#1)
>>>
>>> I checked the last date that ovirt-engine and vdsm passed and moved
>>> packages to tested, as they are the bigger projects, and it was on
>>> 27-11-2018.
>>>
>>> We have been having sporadic failures for most of the projects on the
>>> test check_snapshot_with_memory.
>>> We have deduced that this is caused by a code regression in storage,
>>> based on the following:
>>> 1. Evgheni and Gal helped debug this issue to rule out lago and infra
>>> issues as the cause of failure, and both determined the issue is a code
>>> regression - most likely in storage.
>>> 2. The failure only happens on the 4.2 branch.
>>> 3. The failure itself is that a VM cannot run due to low disk space in
>>> the storage domain, and we cannot see any failures which would leave
>>> leftovers in the storage domain.
>>>
>> Can you please share the link to the execution?
>>
>
> Here's an example of one run:
>
https://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/3550/
>
> The iSCSI storage domain starts emitting warnings about low storage space
> immediately after removing the VmPool, but it's possible that the storage
> domain is filling up before that from some other call that is still
> running, possibly the VM import.
>
Thanks Ryan, I'll try to help with debugging this issue
>
>
>>
>>> Dan and Ryan are actively involved in trying to find the regression, but
>>> the consensus is that this is a storage-related regression and *we are
>>> having a problem getting the storage team to join us in debugging the
>>> issue.*
>>>
>>> I prepared a patch to skip the test in case we cannot get cooperation
>>> from the storage team and resolve this regression in the next few days:
>>>
https://gerrit.ovirt.org/#/c/95889/
>>>
>>> *CQ-Master:* YELLOW (#1)
>>>
>>> We have failures which CQ is still bisecting, and until it's done we
>>> cannot point to any specific failing projects.
>>>
>>>
>>> Happy week!
>>> Dafna
>>>
>>>
>>>
>>>
-------------------------------------------------------------------------------------------------------------------
>>> COLOUR MAP
>>>
>>> Green = job has been passing successfully
>>>
>>> ** green for more than 3 days may suggest we need a review of our test
>>> coverage
>>>
>>>
>>> 1. 1-3 days GREEN (#1)
>>> 2. 4-7 days GREEN (#2)
>>> 3. Over 7 days GREEN (#3)
>>>
>>>
>>> Yellow = intermittent failures for different projects but no lasting or
>>> current regressions
>>>
>>> ** intermittent failures indicate a healthy project, as we expect a
>>> number of failures during the week
>>>
>>> ** I will not report any of the solved failures or regressions.
>>>
>>>
>>> 1. Solved job failures YELLOW (#1)
>>> 2. Solved regressions YELLOW (#2)
>>>
>>>
>>> Red = job has been failing
>>>
>>> ** Active failures. The colour will change based on the amount of time
>>> the project(s) have been broken. Only active regressions will be reported.
>>>
>>>
>>> 1. 1-3 days RED (#1)
>>> 2. 4-7 days RED (#2)
>>> 3. Over 7 days RED (#3)
>>>
>>>
>>>
>
> --
>
> Ryan Barry
>
> Associate Manager - RHV Virt/SLA
>
> rbarry(a)redhat.com M: +16518159306 IM: rbarry
> <https://red.ht/sig>
>
--
Raz Tamir
Manager, RHV QE