On 21 January 2018 at 12:39, Eyal Edri <eedri(a)redhat.com> wrote:
There is another issue that is currently failing all CQ, and it's related
to the new IBRS CPU model.
It looks like all of the Lago slaves were upgraded to a new libvirt and
kernel on Friday, while we still don't have a fix in lago-ost-plugin for
that.
I think there was a misunderstanding about what to upgrade, and it might
have been understood that only the BIOS upgrade breaks it and not the
kernel one.
In any case, we're currently fixing the issue, either by downgrading the
relevant pkgs on lago slaves or adding the mapping to new CPU types from
OST.
For the future, I suggest a few changes to how we do maintenance work on
Jenkins slaves (VMs or BM):
1. Let's avoid doing an upgrade close to a weekend (i.e. not on Thu-Sun),
so the whole team can be around to help if needed or if something
unexpected happens.
2. When we have a system-wide upgrade scheduled, like all BM slaves or VMs
for a specific OS, let's adopt a gradual upgrade with a window of a few
days in between.
E.g., if we need to upgrade all Lago slaves, let's upgrade 1-2, wait
to see that nothing breaks, and continue only after we verify that OST
runs (either by watching CQ or by running it manually).
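
The two suggestions above could be sketched roughly as follows. This is
only an illustration, not existing tooling: the function names, the
canary size of 2, and the `upgrade` / `verify` callbacks are all
hypothetical, and the "few days" wait is assumed to happen inside
whatever `verify` does (e.g. someone confirming that OST passed).

```python
from datetime import date
from typing import Callable, List

# date.weekday(): Monday is 0, so Thursday..Sunday are 3..6.
BLOCKED_WEEKDAYS = {3, 4, 5, 6}


def upgrade_allowed(day: date) -> bool:
    """Suggestion 1: no upgrades on Thu-Sun."""
    return day.weekday() not in BLOCKED_WEEKDAYS


def batches(hosts: List[str], canary_size: int = 2) -> List[List[str]]:
    """Suggestion 2: a small canary batch first, then everything else."""
    if canary_size < 1:
        raise ValueError("canary_size must be at least 1")
    canary, rest = hosts[:canary_size], hosts[canary_size:]
    return [batch for batch in (canary, rest) if batch]


def gradual_upgrade(
    hosts: List[str],
    upgrade: Callable[[str], None],       # hypothetical: upgrades one slave
    verify: Callable[[List[str]], bool],  # hypothetical: True if OST still passes
    today: date,
) -> List[str]:
    """Upgrade batch by batch and stop as soon as verification fails.

    Returns the list of hosts that were actually upgraded.
    """
    if not upgrade_allowed(today):
        return []
    done: List[str] = []
    for batch in batches(hosts):
        for host in batch:
            upgrade(host)
            done.append(host)
        if not verify(batch):
            break  # leave the remaining slaves untouched
    return done
```

With something like this, a failed verification after the first 1-2
slaves leaves the rest of the pool on the old packages, so CQ keeps
running while the problem is investigated.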
Thoughts?
We have a staging system - we should be using it for staging....
--
Barak Korren
RHV DevOps team , RHCE, RHCi
Red Hat EMEA