Major outage

David Caro dcaroest at redhat.com
Tue Feb 3 20:50:14 UTC 2015


Good news!


I got a vm working on the recently upgraded fc20 host, ovirt-srv02. The issue
with the vms seems to be that the default value for the numa setting does not
work with libvirt. On fc19 you just get the input/output error, but fc20 also
shows the full libvirt error string, and there you can see that it complains
about numa:


libvirtError: internal error: Process exited prior to exec: libvirt:  error : internal error: NUMA memory tuning in 'preferred' mode only supports single node


So what I've done is edit the vm, pin it to a host, set it as not migratable
(or whatever the spelling is) and change the numa mode from preferred to
strict. I saved, then edited the vm again and reverted the host pin and the
migration settings, but left the numa mode as it was. That allowed me to boot
one of the vms so far (just tested).
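
For anyone who wants to check other vms for the same condition, here is a
minimal sketch in Python. It assumes the domain XML carries the usual libvirt
<numatune><memory mode=... nodeset=.../></numatune> element and that the
failing vms had a 'preferred' mode with a nodeset spanning more than one node
(inferred from the error above, not verified). The function names and the
sample XML are made up for illustration.

```python
import xml.etree.ElementTree as ET


def parse_nodeset(nodeset):
    """Expand a nodeset string such as '0-1' or '0-3,^2' into a set of ints."""
    included, excluded = set(), set()
    for part in nodeset.split(","):
        part = part.strip()
        if not part:
            continue
        target = included
        if part.startswith("^"):  # '^N' excludes node N from the set
            target = excluded
            part = part[1:]
        if "-" in part:
            start, end = part.split("-", 1)
            target.update(range(int(start), int(end) + 1))
        else:
            target.add(int(part))
    return included - excluded


def numa_mode_problem(domain_xml):
    """Return a short description of the problem, or None if nothing is flagged."""
    root = ET.fromstring(domain_xml)
    memory = root.find("./numatune/memory")
    if memory is None:
        return None  # no NUMA memory tuning configured at all
    nodes = parse_nodeset(memory.get("nodeset", ""))
    if memory.get("mode") == "preferred" and len(nodes) > 1:
        return "preferred mode with %d nodes %s" % (len(nodes), sorted(nodes))
    return None


# Hypothetical domain XML, only for illustration.
SAMPLE = """
<domain type='kvm'>
  <name>example-vm</name>
  <numatune>
    <memory mode='preferred' nodeset='0-1'/>
  </numatune>
</domain>
"""

if __name__ == "__main__":
    print(numa_mode_problem(SAMPLE))
```

This only inspects XML that you feed it (for example the output of
"virsh dumpxml"); it does not change anything on the engine side.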

Some ugly issues:

  The known multipathd message in the logs... it's quite annoying and fills up
  the logs.
  Vdsm messed up the network a couple of times: once it removed all the ifcfg
  files, and the other time it restored old values in the rules/route files.
  Vdsm failed on vdsm-restore-net-config:89 with a non-existing key exception
  instead of just showing an error message and continuing execution.
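
To illustrate the behaviour I'd expect for the last item, here is a rough
sketch in Python. It is not the vdsm code: restore_networks, fake_restore and
the 'nic' key are all hypothetical names, and the only point is that a missing
key gets logged and skipped instead of aborting the whole restore.

```python
import logging

log = logging.getLogger("restore-net-config-sketch")


def restore_networks(persisted, restore_one):
    """Restore each persisted network; log and skip entries with missing keys.

    'persisted' maps a network name to its attribute dict, and 'restore_one'
    is whatever callable actually applies the config (hypothetical here).
    Returns the names of the networks that could not be restored.
    """
    failed = []
    for name, attrs in sorted(persisted.items()):
        try:
            restore_one(name, attrs)
        except KeyError as err:
            # Report the problem and keep going with the remaining networks.
            log.error("Skipping network %s: missing key %s", name, err)
            failed.append(name)
    return failed


def fake_restore(name, attrs):
    """Stand-in for the real restore step; it requires a 'nic' attribute."""
    return (name, attrs["nic"])


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    print(restore_networks({"ovirtmgmt": {"nic": "eth0"}, "broken": {}},
                           fake_restore))
```

Returning the list of failed names lets the caller decide whether a partial
restore is good enough or whether to exit with an error at the end.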

I'll triage the above errors tomorrow and resend them to the devels; for now
I'm just sending this so I don't forget about them.


I'll continue booting the rest of the production vms, do some simple sanity
checks and leave the rest for tomorrow.

On the good side, we now have one fc20 host on each cluster, and 3.5 on all the
production DC hosts! yay \o/

If anything comes up again I'll update this thread; if not, I'll update
tomorrow morning when the whole environment is working 100%.

PS. Thanks Fabian and Max!!

On 02/03, David Caro wrote:
> 
> New update,
> 
> Host srv01 is up and running, but 02 and 03 have issues, they can't start up
> any vms.
> 
> The error is in libvirt:
> 
> libvirtError: Child quit during startup handshake: Input/output error
> 
> 
> Looking around I saw a thread in the users list that fixed it with:
> 
>   /usr/lib/systemd/systemd-vdsmd reconfigure force
> 
> That worked on srv01, but not on the others. So I'm trying to upgrade one of
> them, srv02, to fc20, hoping the newer libvirt version will not have that
> issue.
> 
> Those two hosts are the ones in the production data center, which has the
> foreman vm, so none of the slaves will work properly until that is solved.
> 
> 
> Will update in ~one hour or when the problem is solved.
> 
> Being so late, if I get the production vms running in one host, I'll leave the
> rest for tomorrow.
> 
> 
> D
> 
> On 02/03, David Caro wrote:
> > 
> > Ok, update:
> > 
> > 
> > Not all the servers have been restored, most of the slave vms are up, and all
> > but one host are up.
> > 
> > Engine - Ok
> > storage - Ok
> > storage01 - Ok
> > storage02 - Ok
> > srv01 - DOWN
> > srv02 - OUT OF THE POOL (will add when 01 is up)
> > srv03 - OK
> > srv04 - OK
> > srv05 - OK
> > srv06 - OK
> > srv07 - OK
> > srv08 - OK
> > 
> > 
> > If you need any specific vm I can try to get it up on one of the running hosts,
> > but I'd wait until the last host is up to start all of them.
> > 
> > 
> > Will update again when finished or in one hour.
> > 
> > 
> > On 02/03, David Caro wrote:
> > > 
> > > We are having a major outage on phoenix lab, don't expect any vms/slaves to be
> > > properly working yet.
> > > 
> > > Will update when solved or in an hour with the status.
> > > 
> > > 
> > > -- 
> > > David Caro
> > > 
> > > Red Hat S.L.
> > > Continuous Integration Engineer - EMEA ENG Virtualization R&D
> > > 
> > > Tel.: +420 532 294 605
> > > Email: dcaro at redhat.com
> > > Web: www.redhat.com
> > > RHT Global #: 82-62605
> > 
> > 
> > 
> > > _______________________________________________
> > > Infra mailing list
> > > Infra at ovirt.org
> > > http://lists.ovirt.org/mailman/listinfo/infra
> > 
> > 
> 
> 


-- 
David Caro

Red Hat S.L.
Continuous Integration Engineer - EMEA ENG Virtualization R&D

Tel.: +420 532 294 605
Email: dcaro at redhat.com
Web: www.redhat.com
RHT Global #: 82-62605