Change-queue job failures this weekend

Daniel Belenky dbelenky at redhat.com
Sun Jan 21 11:00:49 UTC 2018


Due to a bug with the CPU model type in OST, we currently have only 2
available physical slaves to run tests (srv04 and srv05).
All the other slaves are temporarily offline until the issue is resolved.
Please avoid making any changes to srv04 and srv05.

On Sun, Jan 21, 2018 at 12:50 PM, Eyal Edri <eedri at redhat.com> wrote:

>
>
> On Sun, Jan 21, 2018 at 12:47 PM, Barak Korren <bkorren at redhat.com> wrote:
>
>>
>>
>> On 21 January 2018 at 12:39, Eyal Edri <eedri at redhat.com> wrote:
>>
>>> There is another issue, which is currently failing all CQ, and it's
>>> related to the new IBRS CPU model.
>>> It looks like all of the lago slaves were upgraded to new Libvirt and
>>> kernel on Friday, while we still don't have a fix on lago-ost-plugin for
>>> that.
>>>
>>> I think there was a misunderstanding about what to upgrade, and it might
>>> have been understood that only the BIOS upgrade breaks it and not the
>>> kernel one.
>>>
>>> In any case, we're currently fixing the issue, either by downgrading the
>>> relevant pkgs on lago slaves or adding the mapping to new CPU types from
>>> OST.
>>>
>>> For the future, I suggest a few updates to maintenance work on Jenkins
>>> slaves (VMs or BM):
>>>
>>> 1. Let's avoid doing an upgrade close to a weekend (i.e., not on Thu-Sun),
>>> so the whole team can be around to help if needed or if something
>>> unexpected happens.
>>> 2. When we have a system-wide upgrade scheduled, like all BM slaves or
>>> VMs for a specific OS, let's adopt a gradual upgrade with a window of a
>>> few days in between,
>>>   e.g., if we need to upgrade all Lago slaves, let's upgrade 1-2, wait
>>> to see that nothing breaks, and continue after we verify OST runs (either
>>> by watching CQ or by running it manually).
>>>
>>>
>>> Thoughts?
>>>
>>>
>> We have a staging system - we should be using it for staging....
>>
>
> Do we have OST tests or a manual job available there?
> In any case, this doesn't contradict what I suggested. Even if you test on
> staging, there could be differences from the production system, so we
> should take care when we upgrade regardless.
>
> Another point when scheduling an upgrade is to talk to the infra owner or
> the CI team and understand whether we currently have a large queue in CQ or
> known failures, in which case it might be best to wait a bit until it's
> cleared.
>
>
>>
>>
>> --
>> Barak Korren
>> RHV DevOps team , RHCE, RHCi
>> Red Hat EMEA
>> redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted
>>
>
>
>
> --
>
> Eyal edri
>
>
> MANAGER
>
> RHV DevOps
>
> EMEA VIRTUALIZATION R&D
>
>
> Red Hat EMEA <https://www.redhat.com/>
> <https://red.ht/sig> TRIED. TESTED. TRUSTED. <https://redhat.com/trusted>
> phone: +972-9-7692018
> irc: eedri (on #tlv #rhev-dev #rhev-integ)
>
> _______________________________________________
> Infra mailing list
> Infra at ovirt.org
> http://lists.ovirt.org/mailman/listinfo/infra
>
>
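
The two scheduling suggestions quoted above (no upgrades on Thu-Sun, and
upgrading 1-2 slaves at a time with a few days in between) could be sketched
roughly as below. This is only an illustration and not an existing tool: the
function names, the slave names, the batch size of 2 and the 3-day gap are
all assumptions picked for the example, and "not on Thu-Sun" is read here as
"start upgrades only on Monday-Wednesday".

```python
from datetime import date, timedelta

# Assumption: "not on Thu-Sun" means upgrades may start only Monday-Wednesday.
ALLOWED_WEEKDAYS = {0, 1, 2}  # Monday=0, Tuesday=1, Wednesday=2


def next_allowed_day(day):
    """Return `day` itself if it is allowed, else the next allowed date."""
    while day.weekday() not in ALLOWED_WEEKDAYS:
        day += timedelta(days=1)
    return day


def plan_rollout(slaves, start, batch_size=2, soak_days=3):
    """Split `slaves` into small batches, spaced at least `soak_days` apart."""
    plan = []
    day = next_allowed_day(start)
    for i in range(0, len(slaves), batch_size):
        plan.append((day, slaves[i:i + batch_size]))
        # Earliest candidate for the next batch, pushed past Thu-Sun if needed.
        day = next_allowed_day(day + timedelta(days=soak_days))
    return plan


if __name__ == "__main__":
    # Hypothetical slave names; the start date is a Friday, so it gets deferred.
    hosts = ["lago-01", "lago-02", "lago-03", "lago-04", "lago-05"]
    for day, batch in plan_rollout(hosts, date(2018, 1, 19)):
        print(day.strftime("%a %Y-%m-%d"), batch)
```

The sketch only produces dates. The step of verifying that OST still passes
before continuing to the next batch is not modelled and would remain a manual
(or CQ-based) check between batches.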


-- 

DANIEL BELENKY

RHV DEVOPS