<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On 21 January 2018 at 12:50, Eyal Edri <span dir="ltr"><<a href="mailto:eedri@redhat.com" target="_blank">eedri@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote"><span class="">On Sun, Jan 21, 2018 at 12:47 PM, Barak Korren <span dir="ltr"><<a href="mailto:bkorren@redhat.com" target="_blank">bkorren@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote"><span>On 21 January 2018 at 12:39, Eyal Edri <span dir="ltr"><<a href="mailto:eedri@redhat.com" target="_blank">eedri@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">There is another issue, which is currently failing all CQ, and it's related to the new IBRS CPU model.<div>It looks like all of the Lago slaves were upgraded to a new libvirt and kernel on Friday, while we still don't have a fix in lago-ost-plugin for that. </div><div><br></div><div>I think there was a misunderstanding about what to upgrade, and it might have been understood that only the BIOS upgrade breaks it and not the kernel one.</div><div><br></div><div>In any case, we're currently fixing the issue, either by downgrading the relevant packages on the Lago slaves or by adding the mapping to the new CPU types in OST.</div><div><br></div><div>For the future, I suggest a few updates to maintenance work on Jenkins slaves (VMs or BM):</div><div><br></div><div>1. Let's avoid doing an upgrade close to a weekend (i.e., not on Thu-Sun), so the whole team can be around to help if needed or if something unexpected happens.</div><div>2. 
When we have a system-wide upgrade scheduled, like all BM slaves or VMs for a specific OS, let's adopt a gradual upgrade with a window of a few days in between, </div><div> e.g., if we need to upgrade all Lago slaves, let's upgrade 1-2, wait to confirm that nothing breaks, and continue only after we verify that OST runs (either by watching CQ or by running it manually) </div><div><br></div><div><br></div><div>Thoughts? </div><div><br></div></div></blockquote><div><br></div></span><div>We have a staging system - we should be using it for staging.... <br></div></div></div></div></blockquote><div><br></div></span><div>Do we have OST tests or a manual job available there? </div></div></div></div></blockquote><div><br></div><div>We can add them easily, or simply run Lago manually when needed.<br><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><br><div class="gmail_extra"><div class="gmail_quote"><div>In any case, this doesn't contradict what I suggested: even if you test on staging, there could be differences from the production system, so we should take care when we upgrade regardless.</div></div></div></div></blockquote><div><br></div><div>Yes, but at least we'd know we green-lighted the new configuration - I'm sure in this case we could have found at least some of the issues on staging (like the fc27 issues, for example) and could have avoided expensive production failures.<br><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>Another point when scheduling an upgrade is to talk to the infra owner or the CI team and understand whether we currently have a large queue in CQ or known failures, in which case it might be best to wait a bit until it's cleared.</div><span class=""><div><br></div></span></div></div></div></blockquote><div><br></div></div><br>-- <br><div class="gmail_signature" 
data-smartmail="gmail_signature">Barak Korren<br>RHV DevOps team , RHCE, RHCi<br>Red Hat EMEA<br><a href="http://redhat.com" target="_blank">redhat.com</a> | TRIED. TESTED. TRUSTED. | <a href="http://redhat.com/trusted" target="_blank">redhat.com/trusted</a></div>
</div></div>