----- Original Message -----
From: "David Caro" <dcaroest(a)redhat.com>
To: "Infra" <infra(a)ovirt.org>
Sent: Tuesday, February 3, 2015 10:50:14 PM
Subject: Re: Major outage
Good news!

I got a vm working on the recently upgraded fc20 host ovirt-srv02. The issue
with the vms seems to be that the default value for the numa setting does not
behave correctly with libvirt. The fc19 vms just show the input/output error,
but the fc20 one also shows the full libvirt error string, and there you can
see that it complains about numa:

libvirtError: internal error: Process exited prior to exec: libvirt: error :
internal error: NUMA memory tuning in 'preferred' mode only supports single node

So what I've done is edit the vm, pin it to a host, set it as non-migratable
(or whatever the spelling is) and change the numa mode from preferred to
strict. I saved, and then edited the vm again, reverting the host pin and the
migration settings, but not changing the numa ones. That allowed me to boot
one of the vms so far (just tested).
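
For anyone who wants to check other vms for the same condition, here is a
minimal Python sketch. It is not taken from vdsm or the engine; the function
names and the sample XML are made up for illustration. It assumes the usual
libvirt domain XML layout, where the tuning lives in a
<numatune><memory mode='...' nodeset='...'/> element, and flags a domain that
asks for 'preferred' mode on more than one node, which is what the error above
complains about.

```python
import xml.etree.ElementTree as ET


def count_nodes(nodeset):
    """Count NUMA nodes in a libvirt-style nodeset such as '0-1,3'."""
    nodes = set()
    for part in nodeset.split(","):
        part = part.strip()
        if not part:
            continue
        if part.startswith("^"):
            # '^N' excludes a node that an earlier range included
            nodes.discard(int(part[1:]))
        elif "-" in part:
            first, last = part.split("-", 1)
            nodes.update(range(int(first), int(last) + 1))
        else:
            nodes.add(int(part))
    return len(nodes)


def has_multinode_preferred(domain_xml):
    """True if the domain asks for 'preferred' tuning on several nodes."""
    root = ET.fromstring(domain_xml)
    memory = root.find("./numatune/memory")
    if memory is None:
        return False
    # Assumption: an absent mode attribute is treated as 'strict'
    mode = memory.get("mode", "strict")
    return mode == "preferred" and count_nodes(memory.get("nodeset", "")) > 1


# Hypothetical sample, not copied from any of our vms
SAMPLE = """
<domain type='kvm'>
  <name>example-vm</name>
  <numatune>
    <memory mode='preferred' nodeset='0-1'/>
  </numatune>
</domain>
"""

print(has_multinode_preferred(SAMPLE))
```

The domain XML to feed it could come from "virsh dumpxml" on the host; the
check itself only reads the XML and changes nothing.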

Some ugly issues:

- The known multipathd message in the logs... it's quite annoying and fills
  up the logs.
- Vdsm messed up the network a couple of times: once it removed all the ifcfg
  files, and the other time it restored old values in the rules/route files.
- Vdsm failed on vdsm-restore-net-config:89 with a non-existing key exception
  instead of just showing an error message and continuing execution.
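
To illustrate the last point, a rough Python sketch of what "show an error and
continue" could look like. This is not the actual vdsm-restore-net-config
code; the function name, the config layout and the 'bridge' key are
hypothetical, chosen only to show the pattern.

```python
import logging

log = logging.getLogger("restore-net-sketch")


def restore_networks(saved_networks, required_key="bridge"):
    """Restore every network that has the required key; skip the rest.

    `saved_networks` maps a network name to a dict of attributes. Both the
    layout and `required_key` are assumptions made for this example.
    """
    restored, skipped = [], []
    for name, attrs in saved_networks.items():
        try:
            value = attrs[required_key]
        except KeyError:
            # Report the problem and carry on instead of aborting the run
            log.error("network %s has no %r entry, skipping", name, required_key)
            skipped.append(name)
            continue
        restored.append((name, value))
    return restored, skipped


restored, skipped = restore_networks({
    "ovirtmgmt": {"bridge": "ovirtmgmt"},
    "broken-net": {},
})
print(restored, skipped)
```

Returning the skipped names keeps the failure visible to the caller, so a
broken entry gets reported without taking the healthy networks down with it.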

I'll triage the above errors tomorrow and forward them to the developers; for
now I'm just sending this to avoid forgetting about them.

I will continue booting the rest of the production vms, run some simple sanity
checks and leave the rest for tomorrow.

On the good side, we now have one fc20 host in each cluster, and 3.5 on all
the production DC hosts! yay \o/

Great news!
Adding some NUMA experts to see if they have any advice on optimizing it on
the DC.

e.

If anything comes up again I'll update in this thread; if not, I'll update
tomorrow morning when the whole environment is working 100%.

PS: Thanks Fabian and Max!!
On 02/03, David Caro wrote:
>
> New update,
>
> Host srv01 is up and running, but 02 and 03 have issues: they can't start
> up any vms.
>
> The error is in libvirt:
>
> libvirtError: Child quit during startup handshake: Input/output error
>
>
> Looking around I saw a thread on the users list that fixed it with:
>
> /usr/lib/systemd/systemd-vdsmd reconfigure force
>
> That worked on srv01, but not on the others. So I'm trying to upgrade one of
> them, srv02, to fc20, hoping the newer libvirt version will not have that
> issue.
>
> Those two hosts are the ones in the production data center, which has the
> foreman vm, so none of the slaves will work properly until that is solved.
>
>
> Will update in ~one hour or when the problem is solved.
>
> Being so late, if I get the production vms running on one host, I'll leave
> the rest for tomorrow.
>
>
> D
>
> On 02/03, David Caro wrote:
> >
> > Ok, update:
> >
> >
> > Not all the servers have been restored; most of the slave vms are up, and
> > all but one host are up.
> >
> > Engine - Ok
> > storage - Ok
> > storage01 - Ok
> > storage02 - Ok
> > srv01 - DOWN
> > srv02 - OUT OF THE POOL (will add when 01 is up)
> > srv03 - OK
> > srv04 - OK
> > srv05 - OK
> > srv06 - OK
> > srv07 - OK
> > srv08 - OK
> >
> >
> > If you need any specific vm I can try to get it up on one of the running
> > hosts, but I'd wait until the last host is up to start all of them.
> >
> >
> > Will update again when finished or in one hour.
> >
> >
> > On 02/03, David Caro wrote:
> > >
> > > We are having a major outage on the phoenix lab, don't expect any
> > > vms/slaves to be working properly yet.
> > >
> > > Will update when solved or in an hour with the status.
> > >
> > >
> > > --
> > > David Caro
> > >
> > > Red Hat S.L.
> > > Continuous Integration Engineer - EMEA ENG Virtualization R&D
> > >
> > > Tel.: +420 532 294 605
> > > Email: dcaro(a)redhat.com
> > > Web: www.redhat.com
> > > RHT Global #: 82-62605
> >
> >
> >
> > > _______________________________________________
> > > Infra mailing list
> > > Infra(a)ovirt.org
> > > http://lists.ovirt.org/mailman/listinfo/infra
--
David Caro
Red Hat S.L.
Continuous Integration Engineer - EMEA ENG Virtualization R&D
Tel.: +420 532 294 605
Email: dcaro(a)redhat.com
Web: www.redhat.com
RHT Global #: 82-62605
_______________________________________________
Infra mailing list
Infra(a)ovirt.org
http://lists.ovirt.org/mailman/listinfo/infra