Change-queue job failures this weekend

Sun Jan 21 10:47:00 UTC 2018

On 21 January 2018 at 12:39, Eyal Edri <eedri at redhat.com> wrote:

> There is another issue, which is currently failing all CQ, and it's related
> to the new IBRS CPU model.
> It looks like all of the lago slaves were upgraded to new Libvirt and
> kernel on Friday, while we still don't have a fix on lago-ost-plugin for
> that.
>
> I think there was a misunderstanding about what to upgrade, and it might
> have been understood that only the BIOS upgrade breaks it and not the
> kernel one.
>
> In any case, we're currently fixing the issue, either by downgrading the
> relevant pkgs on lago slaves or adding the mapping to new CPU types from
> OST.
>
> For the future, I suggest a few updates to maintenance work on Jenkins slaves
> (VMs or BM):
>
> 1. Let's avoid doing an upgrade close to a weekend (i.e. not on Thu-Sun),
> so all the team can be around to help if needed or if something unexpected
> happens.
> 2. When we have a system-wide upgrade scheduled, like all BM slaves or VMs
> for a specific OS, let's adopt a gradual upgrade with a window of a few days
> in between,
>   e.g., if we need to upgrade all Lago slaves, let's upgrade 1-2, wait
> to see that nothing breaks, and continue after we verify OST runs (either
> by seeing it on CQ or by running manually).
>
>
> Thoughts?
>
>
We have a staging system - we should be using it for staging....

-- 
Barak Korren
RHV DevOps team , RHCE, RHCi
Red Hat EMEA
redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted