
Hi all,

We are having downtime on some of the VMs and hosts in the PHX lab. It's caused by an unexpected issue with the engine and DHCP after changing the gateway of the machines to adapt to the new IP range.

It's almost fixed, but we (I) should really get the environment to a really stable status again and finish the upgrade.

There are still some issues, but most of them are already fixed; I will fix the rest in less than one hour.

-- 
David Caro

Red Hat S.L.
Continuous Integration Engineer - EMEA ENG Virtualization R&D

Tel.: +420 532 294 605
Email: dcaro@redhat.com
Web: www.redhat.com
RHT Global #: 82-62605

It took more than one hour :S

Current status is that all the VMs are up and running and all the services are working, but we have one host down: ovirt-srv02 is out of the pool of hosts, with a strange issue resolving names.

When running ping it can't resolve names, but with dig it works ok. That's usually a misconfiguration in the nsswitch.conf file, but it's ok. I also tried selinux and iptables.

Did a strace of ping, and I can see it does open a socket to the nameserver and sends the query, but nothing goes out the interface... (had tcpdump open in another screen)

    socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 4 <0.000786>
    connect(4, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("8.8.8.8")}, 16) = 0 <0.000028>
    poll([{fd=4, events=POLLOUT}], 1, 0) = 1 ([{fd=4, revents=POLLOUT}]) <0.000016>
    sendto(4, "ck\1\0\0\1\0\0\0\0\0\0\3www\6google\3com\0\0\1\0\1", 32, MSG_NOSIGNAL, NULL, 0) = 32 <0.000052>

Any idea is welcome, I'm going to get some sleep now that all the services are back up...

ps. ovirt-srv02 was also the hosted engine master, and was the host that broke during the upgrade the last time. Maybe it was related to this issue, or this issue is related to that...

See you tomorrow!
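For reference, the payload in the sendto() call above can be decoded to check that it is a normal DNS query. This is only a minimal Python sketch: the bytes are copied from the strace output, while the helper name decode_dns_query and the returned dictionary layout are made up for illustration.

```python
import struct

# Payload copied from the sendto() line in the strace output
# (strace prints the bytes with octal escapes; \x escapes are used here).
PAYLOAD = (
    b"ck\x01\x00\x00\x01\x00\x00\x00\x00\x00\x00"
    b"\x03www\x06google\x03com\x00\x00\x01\x00\x01"
)


def decode_dns_query(data):
    """Decode the header and first question of a DNS query (hypothetical helper)."""
    # Fixed 12-byte header: ID, flags, and four 16-bit counters.
    ident, flags, qdcount, ancount, nscount, arcount = struct.unpack("!6H", data[:12])
    labels = []
    pos = 12
    # The name is a sequence of length-prefixed labels ending with a zero byte.
    while data[pos] != 0:
        length = data[pos]
        labels.append(data[pos + 1:pos + 1 + length].decode("ascii"))
        pos += 1 + length
    pos += 1  # skip the terminating zero byte
    qtype, qclass = struct.unpack("!2H", data[pos:pos + 4])
    return {
        "id": ident,
        "is_query": (flags & 0x8000) == 0,          # QR bit clear means query
        "recursion_desired": bool(flags & 0x0100),  # RD bit
        "questions": qdcount,
        "name": ".".join(labels),
        "qtype": qtype,    # 1 is an A record
        "qclass": qclass,  # 1 is the IN (Internet) class
        "length": pos + 4,
    }


query = decode_dns_query(PAYLOAD)
print(query)
```

The query decodes as an A lookup for www.google.com with recursion desired, and sendto() reports that all 32 bytes were handed to the kernel. That suggests the resolver side is fine and the problem is in what happens to the packet afterwards.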
_______________________________________________ Infra mailing list Infra@ovirt.org http://lists.ovirt.org/mailman/listinfo/infra

Mystery solved! Thanks Max!

The issue was that iproute2 can manage multiple routing tables, and the usual command 'ip route show' only shows the routes in the default kernel table (main), while newer vdsm adds a separate routing table alongside it. When I modified the gateway and netmask in the routing tables of the hosts, that extra table was not updated. That led to the strange behavior of ping: the udp requests to the dns server were being routed through the old gateway, while icmp was being routed through the new table.

You can find more info about routing tables and rules here:

    http://linux-ip.net/html/routing-tables.html
    http://linux-ip.net/html/routing-rpdb.html

and some examples here:

    http://linux-ip.net/html/adv-multi-internet.html

Just for future reference, you can see the routes in all the routing tables with:

    ip route show table all
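To spot this kind of mismatch quickly, the output of 'ip route show table all' can be grouped by table and the default gateways compared. This is a rough Python sketch, not what was actually run on the hosts. The sample routes, addresses and table number are invented for illustration. The parsing assumes the usual iproute2 text output, where routes outside the main table carry a "table NAME" token.

```python
from collections import defaultdict

# Hypothetical sample of `ip route show table all` output (invented addresses).
SAMPLE = """\
default via 10.0.1.1 dev eth0
10.0.1.0/24 dev eth0 proto kernel scope link src 10.0.1.20
default via 10.0.0.1 dev eth0 table 100
10.0.0.0/24 via 10.0.0.20 dev eth0 table 100
local 127.0.0.1 dev lo table local proto kernel scope host src 127.0.0.1
"""


def default_gateways(text):
    """Map each routing table name to the gateways of its default routes."""
    gateways = defaultdict(set)
    for line in text.splitlines():
        words = line.split()
        if not words or words[0] != "default" or "via" not in words:
            continue
        # Routes without a "table" token belong to the main table.
        table = words[words.index("table") + 1] if "table" in words else "main"
        gateways[table].add(words[words.index("via") + 1])
    return dict(gateways)


def stale_tables(text):
    """Return tables whose default gateway differs from the main table's."""
    gateways = default_gateways(text)
    main = gateways.get("main", set())
    return {t: g for t, g in gateways.items() if t != "main" and g != main}


print(default_gateways(SAMPLE))
print(stale_tables(SAMPLE))
```

On a real host the text would come from running 'ip route show table all' instead of the SAMPLE string. Any table reported by stale_tables() still points at a different gateway than the main table and is worth a look.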

----- Original Message -----
From: "David Caro" <dcaroest@redhat.com>
To: "Infra" <infra@ovirt.org>
Cc: "Max Kovgan" <mkovgan@redhat.com>
Sent: Tuesday, February 3, 2015 5:32:52 PM
Subject: Re: PHX lab downtime
> Just for future reference, you can see all the routing for all the routing tables with:
>
>     ip route show table all

Do we have the network info documented on the infra page? It's worth documenting it, or enforcing a standard configuration via puppet if possible.

On 02/04, Eyal Edri wrote:
> Do we have the network info documented on the infra page?
> It's worth documenting it, or enforcing a standard configuration via puppet if possible.

The network setup is documented there, but the first time you run vdsm it messes everything up and changes configurations and such, so I don't think it's ok to change it with puppet when vdsm is already managing it (not always well, it seems). That would lead to a race condition between vdsm and puppet changing the network configuration.

But I'll add the troubleshooting tips to the docs.
participants (2)
- David Caro
- Eyal Edri