
We are having a major outage on the Phoenix lab; don't expect any VMs/slaves to be working properly yet.

Will update when solved, or in an hour with the status.

--
David Caro

Red Hat S.L.
Continuous Integration Engineer - EMEA ENG Virtualization R&D

Tel.: +420 532 294 605
Email: dcaro@redhat.com
Web: www.redhat.com
RHT Global #: 82-62605

Ok, update:

Not all the servers have been restored; most of the slave VMs are up, and all but one host is up.

    Engine    - OK
    storage   - OK
    storage01 - OK
    storage02 - OK
    srv01     - DOWN
    srv02     - OUT OF THE POOL (will add it back when srv01 is up)
    srv03     - OK
    srv04     - OK
    srv05     - OK
    srv06     - OK
    srv07     - OK
    srv08     - OK

If you need any specific VM I can try to get it up on one of the running hosts, but I'd wait until the last host is up before starting all of them.

Will update again when finished, or in one hour.

New update,

Host srv01 is up and running, but srv02 and srv03 have issues: they can't start up any VMs.

The error is in libvirt:

    libvirtError: Child quit during startup handshake: Input/output error

Looking around I saw a thread on the users list that fixed it with:

    /usr/lib/systemd/systemd-vdsmd reconfigure force

That worked on srv01, but not on the others (a sketch of re-running it per host follows at the end of this message). So I'm trying to upgrade one of them, srv02, to fc20, hoping the newer libvirt version will not have that issue.

Those two hosts are the ones in the production data center, which holds the foreman VM, so none of the slaves is working properly until that is solved.

Will update in ~one hour or when the problem is solved.

It being so late, if I get the production VMs running on one host, I'll leave the rest for tomorrow.

D
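A minimal sketch of re-running that workaround on the hosts that still fail, assuming root SSH access and that the host names from the status list above are reachable as written:

    # Hedged sketch: apply the users-list workaround on each still-broken
    # host, then check whether libvirtd and vdsmd came back up.
    for h in srv02 srv03; do
        ssh "root@${h}" '
            /usr/lib/systemd/systemd-vdsmd reconfigure force
            systemctl is-active libvirtd vdsmd
        '
    done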

Good news!

I got a VM working on the recently-upgraded fc20 host, ovirt-srv02. The issue with the VMs seems to be that the default value for the NUMA setting does not behave correctly with libvirt. The fc19 hosts just show the input/output error, but the fc20 one also shows the full libvirt error string, and there you can see that it complains about NUMA:

    libvirtError: internal error: Process exited prior to exec: libvirt: error: internal error: NUMA memory tuning in 'preferred' mode only supports single node

So what I've done is edit the VM, pin it to a node, set it as non-migratable and change the NUMA mode from preferred to strict (a sketch of the corresponding libvirt XML follows at the end of this message). Saved, and then edited the VM again, reverting the host pin and the migration settings but not the NUMA ones. That has allowed me to boot one of the VMs so far (just tested).

Some ugly issues:

  - The known multipathd message in the logs... it's quite annoying and fills up the logs.
  - Vdsm messed up the network a couple of times: once it removed all the ifcfg files, and another time it restored old values in the rules/route files.
  - Vdsm failed at vdsm-restore-net-config:89 with a non-existing-key exception instead of just showing an error message and continuing execution.

I'll triage the above errors tomorrow and resend to the devels; for now I'm just sending this to avoid forgetting about them.

Will continue booting the rest of the production VMs, do some simple sanity checks, and leave the rest for tomorrow.

On the good side, we now have one fc20 host on each cluster, and 3.5 on all the production DC hosts! yay \o/

If anything comes up again I'll update in this thread; if not, tomorrow morning I'll update when the whole environment is working 100%.

PS: Thanks Fabian and Max!!
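A hedged reconstruction of what that NUMA edit amounts to at the libvirt level. The numatune element below is a sketch, not the exact XML the engine generates, and the nodeset is illustrative:

    <!-- Failing config: a multi-node nodeset with mode='preferred',
         which libvirt rejects ("only supports single node"). -->
    <numatune>
      <memory mode='preferred' nodeset='0-1'/>
    </numatune>

    <!-- Working config after the edit: 'strict' mode accepts a
         multi-node nodeset. -->
    <numatune>
      <memory mode='strict' nodeset='0-1'/>
    </numatune>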

----- Original Message -----
From: "David Caro" <dcaroest@redhat.com> To: "Infra" <infra@ovirt.org> Sent: Tuesday, February 3, 2015 10:50:14 PM Subject: Re: Major outage
> [...]
> On the good side, we now have one fc20 host on each cluster, and 3.5 on
> all the production DC hosts! yay \o/
Great news! Adding some NUMA experts to see if they have any advice on optimizing it on the DC.

e.
_______________________________________________
Infra mailing list
Infra@ovirt.org
http://lists.ovirt.org/mailman/listinfo/infra
Participants (2):
  - David Caro
  - Eyal Edri