PHX lab downtime

Eyal Edri eedri at redhat.com
Wed Feb 4 10:18:44 UTC 2015



----- Original Message -----
> From: "David Caro" <dcaroest at redhat.com>
> To: "Infra" <infra at ovirt.org>
> Cc: "Max Kovgan" <mkovgan at redhat.com>
> Sent: Tuesday, February 3, 2015 5:32:52 PM
> Subject: Re: PHX lab downtime
> 
> On 02/03, David Caro wrote:
> > 
> > It took more than one hour :S
> > 
> > Current status: all the VMs are up and running and all the services are
> > working, but one host is down. ovirt-srv02 is out of the pool of hosts,
> > with a strange issue resolving names.
> > 
> > When running ping it can't resolve names, but with dig it works fine. That's
> > usually a misconfiguration in the nsswitch.conf file, but that file is ok,
> > and I also tried SELinux and iptables.
> > 
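For context on the nsswitch.conf hint: ping resolves names through the C library's resolver, which consults the sources listed on the `hosts:` line of /etc/nsswitch.conf. dig sends its query straight to the nameserver and skips that file, which is why the two can disagree. The sketch below is a hypothetical helper (the name `hosts_sources` is ours, not from any tool) that pulls those sources out of the file's text so you can confirm `dns` is listed. The sample contents are illustrative, not copied from ovirt-srv02.

```python
def hosts_sources(nsswitch_text):
    """Return the lookup sources on the 'hosts:' line, in order.

    Comments (# ...) and bracketed action items written without spaces,
    such as [NOTFOUND=return], are ignored.
    """
    for raw_line in nsswitch_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        key, sep, value = line.partition(":")
        if sep and key.strip() == "hosts":
            return [tok for tok in value.split() if not tok.startswith("[")]
    return []  # no hosts: line found


# Illustrative file contents (hypothetical, not taken from the host).
sample = """\
# /etc/nsswitch.conf
passwd:     files
hosts:      files dns myhostname
"""

sources = hosts_sources(sample)
print(sources)          # ['files', 'dns', 'myhostname']
print("dns" in sources) # True
```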
> > I ran strace on ping, and I can see it does open a socket to the nameserver
> > and sends the query, but nothing goes out on the interface (I had tcpdump
> > open in another screen):
> > 
> > socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 4 <0.000786>
> > connect(4, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("8.8.8.8")}, 16) = 0 <0.000028>
> > poll([{fd=4, events=POLLOUT}], 1, 0)    = 1 ([{fd=4, revents=POLLOUT}]) <0.000016>
> > sendto(4, "ck\1\0\0\1\0\0\0\0\0\0\3www\6google\3com\0\0\1\0\1", 32, MSG_NOSIGNAL, NULL, 0) = 32 <0.000052>
> > 
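The 32-byte payload in the sendto() call is an ordinary DNS query. This suggests the resolver side was behaving correctly and the packet was lost after ping handed it to the kernel. As a sanity check, the sketch below decodes that exact payload with Python's `struct` module. The strace escapes `\1`, `\3` and `\6` are octal bytes, so the string can be pasted directly as a Python bytes literal. `decode_dns_query` is a hypothetical helper name. It handles only a single uncompressed question, which is all this packet contains.

```python
import struct

# Payload copied from the strace sendto() line; strace prints non-printable
# bytes as octal escapes, which Python bytes literals interpret the same way.
payload = b"ck\1\0\0\1\0\0\0\0\0\0\3www\6google\3com\0\0\1\0\1"


def decode_dns_query(data):
    """Decode the header and the single question of a DNS query message."""
    # 12-byte header: ID, flags, QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT (big-endian).
    qid, flags, qdcount, _an, _ns, _ar = struct.unpack("!6H", data[:12])
    # QNAME is a sequence of length-prefixed labels ending with a zero byte.
    labels, pos = [], 12
    while data[pos] != 0:
        length = data[pos]
        labels.append(data[pos + 1:pos + 1 + length].decode("ascii"))
        pos += 1 + length
    pos += 1  # skip the terminating zero byte
    qtype, qclass = struct.unpack("!HH", data[pos:pos + 4])
    return {
        "id": qid,
        "recursion_desired": bool(flags & 0x0100),  # RD bit
        "questions": qdcount,
        "name": ".".join(labels),
        "qtype": qtype,    # 1 = A record
        "qclass": qclass,  # 1 = IN (Internet)
        "length": len(data),
    }


print(decode_dns_query(payload))
```

The result is a recursive A query for www.google.com, sent to 8.8.8.8 as the connect() line shows. That is consistent with the problem being in routing rather than in name resolution.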
> > Any idea is welcome. I'm going to get some sleep now that all the services
> > are back up...
> 
> Mystery solved! Thanks Max!
> 
> The issue was that iproute2 can manage multiple routing tables, and the usual
> command 'ip route show' only shows the routes in the main table. Newer vdsm
> adds a separate routing table alongside it, and when I modified the gateway
> and netmask in the routing tables of the hosts, that extra table was not
> updated. That led to the strange behavior of ping: the UDP requests to the DNS
> server were routed through the old gateway, while ICMP was being routed
> through the updated routes.
> 
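To make the failure mode concrete, here is a toy model of Linux policy routing. Rules are checked in priority order. The first rule whose selector matches points at a table, and that table is searched by longest-prefix match. If the table has no matching route, the lookup falls through to the next rule. The addresses (from the 192.0.2.0/24 documentation range), the table name `side`, its priority 100 and its `from` selector are all hypothetical, because the thread does not say exactly which rules vdsm installs. The point is only that two tables can hold different default gateways, and the rules, not `ip route show`, decide which one a packet uses.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    priority: int
    table: str
    src: Optional[ipaddress.IPv4Network] = None  # None means "from all"


# Hypothetical tables: destination prefix -> gateway.
TABLES = {
    # main: the gateway was updated by hand to the new one
    "main": {ipaddress.ip_network("0.0.0.0/0"): "192.0.2.254"},
    # side: the extra table nobody updated, still pointing at the old gateway
    "side": {ipaddress.ip_network("0.0.0.0/0"): "192.0.2.1"},
}

# Lower priority numbers are evaluated first; 32766 is the usual priority of
# the rule that looks up the main table.
RULES = [
    Rule(32766, "main"),
    Rule(100, "side", ipaddress.ip_network("192.0.2.0/24")),
]


def route(dst, src=None):
    """Return (table, gateway) for a packet, or None if nothing matches."""
    dst_ip = ipaddress.ip_address(dst)
    src_ip = ipaddress.ip_address(src) if src is not None else None
    for rule in sorted(RULES, key=lambda r: r.priority):
        # A 'from' selector only matches packets whose source lies inside it.
        if rule.src is not None and (src_ip is None or src_ip not in rule.src):
            continue
        table = TABLES.get(rule.table, {})
        candidates = [net for net in table if dst_ip in net]
        if candidates:
            best = max(candidates, key=lambda net: net.prefixlen)  # longest prefix
            return rule.table, table[best]
        # No route in this table: fall through to the next rule.
    return None


print(route("8.8.8.8", src="192.0.2.10"))  # ('side', '192.0.2.1')
print(route("8.8.8.8"))                    # ('main', '192.0.2.254')
```

On a real host, `ip rule show` lists the rules in priority order. It is the quickest way to see whether any table besides local, main and default is in use.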
> You can find more info about routing tables and rules here:
> 
> http://linux-ip.net/html/routing-tables.html
> http://linux-ip.net/html/routing-rpdb.html
> 
> 
> and some examples here:
> 
> http://linux-ip.net/html/adv-multi-internet.html
> 
> 
> Just for future reference, you can see the routes in all the routing tables
> with:
> 
>   ip route show table all
> 
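To act on that command's output, a quick audit could parse it and compare the default gateways across tables. The sketch below is hypothetical (`default_gateways_by_table` is our name). It assumes the usual iproute2 text format, where routes outside the main table carry a `table <id>` token and main-table routes carry none. The sample output, table id 100 and interface eth0 are made up for illustration.

```python
from collections import defaultdict


def default_gateways_by_table(ip_route_output):
    """Map each routing table to the set of gateways of its default routes."""
    gateways = defaultdict(set)
    for line in ip_route_output.splitlines():
        tokens = line.split()
        if not tokens or tokens[0] != "default" or "via" not in tokens:
            continue  # only 'default via <gw> ...' routes are of interest
        # Routes without an explicit 'table' token belong to the main table.
        table = tokens[tokens.index("table") + 1] if "table" in tokens else "main"
        gateways[table].add(tokens[tokens.index("via") + 1])
    return dict(gateways)


# Made-up output resembling the situation described in the thread.
sample = """\
default via 192.0.2.254 dev eth0
192.0.2.0/24 dev eth0 proto kernel scope link src 192.0.2.10
default via 192.0.2.1 dev eth0 table 100
192.0.2.0/24 dev eth0 table 100 scope link
local 192.0.2.10 dev eth0 table local proto kernel scope host src 192.0.2.10
"""

by_table = default_gateways_by_table(sample)
all_gateways = set().union(*by_table.values())
if len(all_gateways) > 1:
    print("default gateways disagree across tables:", by_table)
```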

Do we have the network info documented on the infra page?
It's worth documenting it, or enforcing a standard configuration via Puppet if
possible.

> 
> > 
> > 
> > PS: ovirt-srv02 was also the hosted engine master, and it was the host that
> > broke during the last upgrade. Maybe that was related to this issue, or this
> > issue is related to that...
> > 
> > See you tomorrow!
> > 
> > 
> > On 02/02, David Caro wrote:
> > > 
> > > Hi all,
> > > 
> > > We are having downtime on some of the VMs and hosts in the PHX lab. It's
> > > caused by an unexpected issue with the engine and DHCP after changing the
> > > gateway of the machines to adapt to the new IP range.
> > > 
> > > It's almost fixed, but we (I) should really get the environment back to a
> > > stable state and finish the upgrade.
> > > 
> > > There are still some issues, but most of them are already fixed; I will fix
> > > the rest in less than one hour.
> > > 
> > > 
> > > 
> > > --
> > > David Caro
> > > 
> > > Red Hat S.L.
> > > Continuous Integration Engineer - EMEA ENG Virtualization R&D
> > > 
> > > Tel.: +420 532 294 605
> > > Email: dcaro at redhat.com
> > > Web: www.redhat.com
> > > RHT Global #: 82-62605
> > 
> > 
> > 
> > > _______________________________________________
> > > Infra mailing list
> > > Infra at ovirt.org
> > > http://lists.ovirt.org/mailman/listinfo/infra