[JIRA] (OVIRT-1856) Re: Change-queue job failures this weekend

Daniel Belenky (oVirt JIRA) jira at ovirt-jira.atlassian.net
Sun Jan 21 11:38:34 UTC 2018


    [ https://ovirt-jira.atlassian.net/browse/OVIRT-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=35691#comment-35691 ] 

Daniel Belenky commented on OVIRT-1856:
---------------------------------------

As of now, [tester 5029|http://jenkins.ovirt.org/view/Change%20queue%20jobs/job/ovirt-master_change-queue-tester/5029/] has passed with 175 patches.
So at this point we know that there are no unknown regressions in the master repo.

> Re: Change-queue job failures this weekend
> ------------------------------------------
>
>                 Key: OVIRT-1856
>                 URL: https://ovirt-jira.atlassian.net/browse/OVIRT-1856
>             Project: oVirt - virtualization made easy
>          Issue Type: By-EMAIL
>            Reporter: eyal edri
>            Assignee: infra
>
> On Sun, Jan 21, 2018 at 1:01 PM, Barak Korren <bkorren at redhat.com> wrote:
> >
> >
> > On 21 January 2018 at 12:50, Eyal Edri <eedri at redhat.com> wrote:
> >
> >>
> >>
> >> On Sun, Jan 21, 2018 at 12:47 PM, Barak Korren <bkorren at redhat.com>
> >> wrote:
> >>
> >>>
> >>>
> >>> On 21 January 2018 at 12:39, Eyal Edri <eedri at redhat.com> wrote:
> >>>
> >>>> There is another issue, which is currently failing all CQ, and it's
> >>>> related to the new IBRS CPU model.
> >>>> It looks like all of the lago slaves were upgraded to new Libvirt and
> >>>> kernel on Friday, while we still don't have a fix on lago-ost-plugin for
> >>>> that.
> >>>>
> >>>> I think there was a misunderstanding about what to upgrade, and it
> >>>> might have been understood that only the BIOS upgrade breaks it and not the
> >>>> kernel one.
> >>>>
> >>>> In any case, we're currently fixing the issue, either by downgrading
> >>>> the relevant pkgs on lago slaves or adding the mapping to new CPU types
> >>>> from OST.
> >>>>
> >>>> For the future, I suggest a few updates to maintenance work on Jenkins
> >>>> slaves (VMs or BM):
> >>>>
> >>>> 1. Let's avoid doing an upgrade close to a weekend (i.e. not on Thu-Sun),
> >>>> so all the team can be around to help if needed or if something
> >>>> unexpected happens.
> >>>> 2. When we have a system-wide upgrade scheduled, like all BM slaves or
> >>>> VMs for a specific OS, let's adopt a gradual upgrade with a few days window
> >>>> in between,
> >>>>   e.g., if we need to upgrade all Lago slaves, let's upgrade 1-2,
> >>>> wait to see that nothing breaks, and continue after we verify OST runs
> >>>> (either by seeing them on CQ or by running them manually).
> >>>>
> >>>>
> >>>> Thoughts?
> >>>>
> >>>>
> >>> We have a staging system - we should be using it for staging....
> >>>
> >>
> >> Do we have OST tests or a manual job available there?
> >>
> >
> > We can add them easily, or simply run Lago manually when needed.
> >
> >
> >> In any case, this doesn't contradict what I suggested, even if you test
> >> on staging, there could be differences from the production system, so we
> >> should take care when we upgrade regardless.
> >>
> >
> > Yes, but at least we'd know we green-lighted the new configuration - I'm
> > sure in this case we could have found at least some of the issues on
> > staging (like the fc27 issues, for example) and could have avoided expensive
> > production failures.
> >
> > Another point when scheduling an upgrade is to talk to the infra owner or the
> >> CI team and understand if we currently have a large queue in CQ or known
> >> failures, so it might be best to wait a bit until it's cleared.
> >>
> >>
> >
> >
> Adding infra-support so we can gather this info and prepare a maintenance
> / upgrade checklist to add to the oVirt infra docs.
> Let's continue the discussion and suggestions on that ticket.
> > --
> > Barak Korren
> > RHV DevOps team , RHCE, RHCi
> > Red Hat EMEA
> > redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted
> >
> -- 
> Eyal edri
> MANAGER
> RHV DevOps
> EMEA VIRTUALIZATION R&D
> Red Hat EMEA <https://www.redhat.com/>
> <https://red.ht/sig> TRIED. TESTED. TRUSTED. <https://redhat.com/trusted>
> phone: +972-9-7692018
> irc: eedri (on #tlv #rhev-dev #rhev-integ)


