From: "David Caro" <dcaroest(a)redhat.com>
To: "Infra" <infra(a)ovirt.org>
Cc: "Max Kovgan" <mkovgan(a)redhat.com>
Sent: Tuesday, February 3, 2015 5:32:52 PM
Subject: Re: PHX lab downtime
On 02/03, David Caro wrote:
>
> It took more than one hour :S
>
> Current status is that all the vms are up and running, all the services are
> working, but we have one host down, ovirt-srv02 is out of the pool of hosts,
> with a strange issue resolving names.
>
> When running ping it can't resolve names, but with dig it works ok. That's
> usually a misconfiguration in the nsswitch.conf file, but that file is ok.
> I also tried selinux and iptables.
>
> Did a strace of ping, and I can see it does open a socket to the nameserver
> and sends the query, but nothing goes out the interface... (had tcpdump open
> in another screen)
>
> socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 4 <0.000786>
> connect(4, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("8.8.8.8")}, 16) = 0 <0.000028>
> poll([{fd=4, events=POLLOUT}], 1, 0) = 1 ([{fd=4, revents=POLLOUT}]) <0.000016>
> sendto(4, "ck\1\0\0\1\0\0\0\0\0\0\3www\6google\3com\0\0\1\0\1", 32, MSG_NOSIGNAL, NULL, 0) = 32 <0.000052>
>
> Any idea is welcome, I'm going to get some sleep now that all the services
> are back up...
Mystery solved! Thanks Max!
The issue was that iproute2 can manage multiple routing tables, and the usual
command 'ip route show' only shows the routes in the main kernel table. Newer
vdsm adds a separate routing table next to it, and when I modified the gateway
and netmask in the routing tables of the hosts, that extra side table was not
updated. That led to the strange behavior: the udp requests from ping to the
dns server were being routed through the old gateway, while icmp was being
routed through the new table.
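
To make the "more than one table" part concrete, here is a toy model of how
routing rules pick a table. It is only a sketch, not what the kernel or vdsm
actually run: the rule priorities, the table name "side", the source-based
selector and all the addresses are made up for illustration. The idea is that
rules are tried in priority order, and the first rule that matches the packet
and whose table has a matching route decides the gateway.

```python
import ipaddress

# Hypothetical rules: (priority, source selector or None for "from all", table).
RULES = [
    (32000, "10.0.0.5/32", "side"),
    (32766, None, "main"),
]

# Hypothetical tables: "main" got the new gateway, "side" was left stale.
TABLES = {
    "main": [("0.0.0.0/0", "10.0.0.254")],
    "side": [("0.0.0.0/0", "10.0.0.1")],
}


def lookup(rules, tables, src, dst):
    """Return (table, gateway) for a packet from src to dst, or None."""
    src_ip = ipaddress.ip_address(src)
    dst_ip = ipaddress.ip_address(dst)
    # Lower priority number means the rule is tried earlier.
    for _priority, selector, table in sorted(rules, key=lambda r: r[0]):
        if selector is not None and src_ip not in ipaddress.ip_network(selector):
            continue
        # Longest-prefix match inside the table this rule points at.
        best = None
        for prefix, gateway in tables.get(table, []):
            net = ipaddress.ip_network(prefix)
            if dst_ip in net and (best is None or net.prefixlen > best[0].prefixlen):
                best = (net, gateway)
        if best is not None:
            return table, best[1]
    return None
```

With these made-up values, lookup(RULES, TABLES, "10.0.0.5", "8.8.8.8")
lands in the stale "side" table, while a packet from any other source address
falls through to "main" and gets the updated gateway.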
You can find more info about routing tables and rules here:
http://linux-ip.net/html/routing-tables.html
http://linux-ip.net/html/routing-rpdb.html
and some examples here:
http://linux-ip.net/html/adv-multi-internet.html
Just for future reference, you can see the routes in all the routing tables
with:
    ip route show table all
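
If someone wants to check this automatically, something like the following
could work as a starting point. It is a rough sketch that parses the text
output of 'ip route show table all' and lists the tables whose default gateway
differs from the one you expect. The sample output, the addresses and the table
id are invented, and it assumes that routes outside the main table carry a
"table <name>" token while routes in the main table carry none.

```python
from collections import defaultdict

# Invented sample of 'ip route show table all' output, for illustration only.
SAMPLE = """\
default via 10.0.0.254 dev ovirtmgmt
10.0.0.0/24 dev ovirtmgmt proto kernel scope link src 10.0.0.5
default via 10.0.0.1 dev ovirtmgmt table 167772165
10.0.0.0/24 via 10.0.0.5 dev ovirtmgmt table 167772165
local 127.0.0.1 dev lo table local proto kernel scope host src 127.0.0.1
"""


def default_gateways(ip_route_output):
    """Map each routing table name to the set of default gateways in it."""
    gateways = defaultdict(set)
    for line in ip_route_output.splitlines():
        tokens = line.split()
        # Only default routes that go through a gateway are of interest here.
        if not tokens or tokens[0] != "default" or "via" not in tokens:
            continue
        table = "main"  # assumption: no "table" token means the main table
        if "table" in tokens:
            table = tokens[tokens.index("table") + 1]
        gateways[table].add(tokens[tokens.index("via") + 1])
    return dict(gateways)


def stale_tables(gateways, expected_gateway):
    """Return the tables whose default gateways are not exactly the expected one."""
    return sorted(t for t, gws in gateways.items() if gws != {expected_gateway})
```

Calling stale_tables(default_gateways(SAMPLE), "10.0.0.254") on the sample
above flags only the side table. On a real host the input would be the output
of the command above, and IPv6 default routes would need the same treatment
with their own expected gateway.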
Do we have the network info documented on the infra page?
It would be worth documenting it, or enforcing a standard configuration via
puppet if possible.
>
>
> ps. ovirt-srv02 was also the hosted engine master, and was the host that
> broke during the upgrade the last time. Maybe it was related to this issue,
> or this issue is related to that...
>
> See you tomorrow!
>
>
> On 02/02, David Caro wrote:
> >
> > Hi all,
> >
> > We are having a downtime on some of the vms and hosts on the phx lab. It's
> > caused by an unexpected issue with the engine and dhcp after changing the
> > gateway of the machines to adapt to the new ip range.
> >
> > It's almost fixed, but we (I) should really get the environment to a
> > really stable status again and finish the upgrade.
> >
> > There are still some issues, but most of them are already fixed, will fix
> > the rest in less than one hour.
> >
> >
> >
> > --
> > David Caro
> >
> > Red Hat S.L.
> > Continuous Integration Engineer - EMEA ENG Virtualization R&D
> >
> > Tel.: +420 532 294 605
> > Email: dcaro(a)redhat.com
> > Web: www.redhat.com
> > RHT Global #: 82-62605
>
>
>
> > _______________________________________________
> > Infra mailing list
> > Infra(a)ovirt.org
> > http://lists.ovirt.org/mailman/listinfo/infra
>
>
--
David Caro
Red Hat S.L.
Continuous Integration Engineer - EMEA ENG Virtualization R&D
Tel.: +420 532 294 605
Email: dcaro(a)redhat.com
Web: www.redhat.com
RHT Global #: 82-62605