To avoid blocking other patches on CQ, I've sent [1], which will double the
amount of space on the iSCSI SD (with the patch it will have 40GB).
As a side note, the master suite already uses this configuration, which may
explain why we don't see the issue there.
Why did we use different configurations?
Can we extract the configuration to an external file that is shared by both
the master and 4.x suites?
[1]
https://gerrit.ovirt.org/#/c/95922/
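
To illustrate the idea (the file name and variable name below are hypothetical,
not the current OST layout), the shared value could live in one small module
that both suites import instead of hard-coding the size in each suite's
deployment scripts:

    # common/storage_conf.py - hypothetical shared module, single source of
    # truth for the storage sizes used by both the master and 4.x suites.

    ISCSI_SD_SIZE_GB = 40  # total size backing the iSCSI storage domain LUNs

    # A suite's setup code would then use
    #     from common.storage_conf import ISCSI_SD_SIZE_GB
    # rather than repeating the number in each suite.
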
On Sun, Dec 2, 2018 at 5:41 PM Gal Ben Haim <gbenhaim(a)redhat.com> wrote:
> Below you can find two jobs, one that succeeded and one that failed on the
> iSCSI issue.
> Both were triggered by unrelated patches.
>
> Success -
> https://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/3546/
> Failure -
> https://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/3544/
>
>
> On Sun, Dec 2, 2018 at 2:37 PM Gal Ben Haim <gbenhaim(a)redhat.com> wrote:
>
>> Raz, thanks for the investigation.
>> I'll send a patch for increasing the luns size.
>>
>> On Sun, Dec 2, 2018 at 1:27 PM Nir Soffer <nsoffer(a)redhat.com> wrote:
>>
>>> On Sun, Dec 2, 2018, 10:44 Raz Tamir <ratamir(a)redhat.com> wrote:
>>>
>>>> After some analysis, I think the bug we are seeing here is
>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1588061
>>>> This applies to suspend/resume and also to a snapshot with memory.
>>>> Following the steps in the bug, and considering that the iSCSI storage
>>>> domain is only 20GB, this should explain why we reach ~4GB of free space.
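
A rough back-of-the-envelope illustration of that reasoning (only the 20GB
domain size comes from the thread; every other number below is an assumed
example, based on the fact that a snapshot with memory writes a memory volume
roughly the size of the VM's RAM to the storage domain):

    # Hypothetical space math for a 20GB iSCSI storage domain.
    sd_size_gb = 20          # iSCSI SD size in the 4.2 suite (from the thread)
    vm_disks_gb = 10         # assumed space already used by test VM disks
    memory_volume_gb = 2     # assumed VM RAM written out by a memory snapshot
    metadata_gb = 4          # assumed LVM/OVF/metadata overhead

    free_gb = sd_size_gb - vm_disks_gb - memory_volume_gb - metadata_gb
    print("free space left: ~%d GB" % free_gb)  # ~4 GB, matching the warnings
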
>>>>
>>>
>>>
>>> The OST configuration should change so that it will not fail because of
>>> such bugs.
>>>
>>
>> I disagree. The purpose of OST is to catch bugs, not to cover them up.
>>
>>>
>>> iSCSI storage can be created using sparse files, which do not consume any
>>> resources until you write to the LVs, so having a 100G storage domain
>>> costs nothing.
>>>
>>
>> OST uses sparse files.
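
As an illustration of Nir's point (a minimal sketch, not OST code; it only
needs a filesystem with sparse-file support), a sparse file reports a large
apparent size while allocating blocks only for the data actually written:

    import os
    import tempfile

    # Create an empty temporary file to use as the sparse backing file.
    fd, path = tempfile.mkstemp(suffix=".img")
    os.close(fd)

    # Set the apparent size to 100 GiB without writing any data.
    with open(path, "r+b") as f:
        f.truncate(100 * 1024**3)

    st = os.stat(path)
    print("apparent size: %.0f GiB" % (st.st_size / 1024**3))          # 100 GiB
    print("allocated:     %.1f MiB" % (st.st_blocks * 512 / 1024**2))  # ~0 MiB

    # Blocks are allocated only where data is written.
    with open(path, "r+b") as f:
        f.seek(10 * 1024**3)
        f.write(b"\x01" * (4 * 1024**2))  # write 4 MiB somewhere in the file

    st = os.stat(path)
    print("after a 4 MiB write: %.1f MiB allocated" % (st.st_blocks * 512 / 1024**2))

    os.remove(path)

This is why a 100G sparse-backed LUN costs essentially nothing on the host
until a guest actually writes to it.
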
>>
>>>
>>> Nir
>>>
>>>
>>>> On Fri, Nov 30, 2018 at 10:01 PM Raz Tamir <ratamir(a)redhat.com> wrote:
>>>>
>>>>>
>>>>>
>>>>>> On Fri, Nov 30, 2018, 21:57 Ryan Barry <rbarry(a)redhat.com> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Nov 30, 2018 at 2:31 PM Raz Tamir <ratamir(a)redhat.com>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Nov 30, 2018, 19:33 Dafna Ron <dron(a)redhat.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> This mail is to provide the current status of CQ and allow people to
>>>>>>>> review the status before and after the weekend.
>>>>>>>> Please refer to the colour map below for further information on the
>>>>>>>> meaning of the colours.
>>>>>>>>
>>>>>>>> *CQ-4.2*: RED (#1)
>>>>>>>>
>>>>>>>> I checked the last date on which ovirt-engine and vdsm passed and
>>>>>>>> moved packages to tested, as they are the bigger projects, and it
>>>>>>>> was 27-11-2018.
>>>>>>>>
>>>>>>>> We have been having sporadic failures for most of the projects on the
>>>>>>>> test check_snapshot_with_memory.
>>>>>>>> We have deduced that this is caused by a code regression in storage,
>>>>>>>> based on the following:
>>>>>>>> 1. Evgheni and Gal helped debug this issue to rule out lago and infra
>>>>>>>> issues as the cause of the failure, and both determined that the
>>>>>>>> issue is a code regression - most likely in storage.
>>>>>>>> 2. The failure only happens on the 4.2 branch.
>>>>>>>> 3. The failure itself is that a VM cannot be run due to low disk
>>>>>>>> space in the storage domain, and we cannot see any failures which
>>>>>>>> would leave leftovers in the storage domain.
>>>>>>>>
>>>>>>> Can you please share the link to the execution?
>>>>>>>
>>>>>>
>>>>>> Here's an example of one run:
>>>>>> https://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/3550/
>>>>>>
>>>>>> The iSCSI storage domain starts emitting warnings about low storage
>>>>>> space immediately after removing the VmPool, but it's possible that
>>>>>> the storage domain is filling up before that, from some other call
>>>>>> that is still running - possibly the VM import.
>>>>>>
>>>>> Thanks Ryan, I'll try to help with debugging this issue.
>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>> Dan and Ryan are actively involved in trying to find the regression,
>>>>>>>> but the consensus is that this is a storage-related regression and
>>>>>>>> *we are having a problem getting the storage team to join us in
>>>>>>>> debugging the issue.*
>>>>>>>>
>>>>>>>> I prepared a patch to skip the test in case we cannot get cooperation
>>>>>>>> from the storage team and resolve this regression in the next few
>>>>>>>> days:
>>>>>>>> https://gerrit.ovirt.org/#/c/95889/
>>>>>>>>
>>>>>>>> *CQ-Master:* YELLOW (#1)
>>>>>>>>
>>>>>>>> We have failures which CQ is still bisecting, and until it's done we
>>>>>>>> cannot point to any specific failing projects.
>>>>>>>>
>>>>>>>>
>>>>>>>> Happy week!
>>>>>>>> Dafna
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> -------------------------------------------------------------------------------------------------------------------
>>>>>>>> COLOUR MAP
>>>>>>>>
>>>>>>>> Green = job has been passing successfully
>>>>>>>>
>>>>>>>> ** green for more than 3 days may suggest we need a review of our
>>>>>>>> test coverage
>>>>>>>>
>>>>>>>> 1. 1-3 days GREEN (#1)
>>>>>>>> 2. 4-7 days GREEN (#2)
>>>>>>>> 3. Over 7 days GREEN (#3)
>>>>>>>>
>>>>>>>> Yellow = intermittent failures for different projects but no lasting
>>>>>>>> or current regressions
>>>>>>>>
>>>>>>>> ** intermittent would be a healthy project as we expect a number of
>>>>>>>> failures during the week
>>>>>>>>
>>>>>>>> ** I will not report any of the solved failures or regressions.
>>>>>>>>
>>>>>>>> 1. Solved job failures YELLOW (#1)
>>>>>>>> 2. Solved regressions YELLOW (#2)
>>>>>>>>
>>>>>>>> Red = job has been failing
>>>>>>>>
>>>>>>>> ** Active Failures. The colour will change based on the amount of
>>>>>>>> time the project/s has been broken. Only active regressions would be
>>>>>>>> reported.
>>>>>>>>
>>>>>>>> 1. 1-3 days RED (#1)
>>>>>>>> 2. 4-7 days RED (#2)
>>>>>>>> 3. Over 7 days RED (#3)
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Ryan Barry
>>>>>>
>>>>>> Associate Manager - RHV Virt/SLA
>>>>>>
>>>>>> rbarry(a)redhat.com M: +16518159306 IM: rbarry
>>>>>> <https://red.ht/sig>
>>>>>>
>>>>>
>>>>
>>>> --
>>>>
>>>>
>>>> Raz Tamir
>>>> Manager, RHV QE
>>>>
>>>
>>
>>
>> --
>> *GAL bEN HAIM*
>> RHV DEVOPS
>>
>
>
> --
> *GAL bEN HAIM*
> RHV DEVOPS
>
--
*GAL bEN HAIM*
RHV DEVOPS
_______________________________________________
Devel mailing list -- devel(a)ovirt.org
To unsubscribe send an email to devel-leave(a)ovirt.org
Privacy Statement:
https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct:
https://www.ovirt.org/community/about/community-guidelines/
List Archives:
https://lists.ovirt.org/archives/list/devel@ovirt.org/message/MP277EZWHCF...