[ovirt-users] Error during hosted-engine-setup for 3.5.1 on F20 (The VDSM host was found in a failed state)

Tue Mar 10 10:20:48 EDT 2015

----- Original Message -----
> From: "Bob Doolittle" <bob at doolittle.us.com>
> To: "Simone Tiraboschi" <stirabos at redhat.com>
> Cc: "users-ovirt" <users at ovirt.org>
> Sent: Tuesday, March 10, 2015 2:40:13 PM
> Subject: Re: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on F20 (The VDSM host was found in a failed state)
> 
> 
> On 03/10/2015 04:58 AM, Simone Tiraboschi wrote:
> >
> > ----- Original Message -----
> >> From: "Bob Doolittle" <bob at doolittle.us.com>
> >> To: "Simone Tiraboschi" <stirabos at redhat.com>
> >> Cc: "users-ovirt" <users at ovirt.org>
> >> Sent: Monday, March 9, 2015 11:48:03 PM
> >> Subject: Re: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on F20 (The VDSM host was found in a failed state)
> >>
> >>
> >> On 03/09/2015 02:47 PM, Bob Doolittle wrote:
> >>> Resending with CC to list (and an update).
> >>>
> >>> On 03/09/2015 01:40 PM, Simone Tiraboschi wrote:
> >>>> ----- Original Message -----
> >>>>> From: "Bob Doolittle" <bob at doolittle.us.com>
> >>>>> To: "Simone Tiraboschi" <stirabos at redhat.com>
> >>>>> Cc: "users-ovirt" <users at ovirt.org>
> >>>>> Sent: Monday, March 9, 2015 6:26:30 PM
> >>>>> Subject: Re: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on F20 (Cannot add the host to cluster ... SSH has failed)
> >>>>>
> ...
> >>>>> OK, I've started over. Simply removing the storage domain was
> >>>>> insufficient; the hosted-engine deploy failed when it found the HA
> >>>>> agent and broker services already configured, so I decided to start
> >>>>> over from scratch, beginning by reinstalling the OS on my host.
> >>>>>
> >>>>> I can't deploy DNS at the moment, so I have to replicate the /etc/hosts
> >>>>> file on my host and engine. I did that this time, but have run into a
> >>>>> new problem:
> >>>>>
> >>>>> [ INFO  ] Engine replied: DB Up!Welcome to Health Status!
> >>>>>           Enter the name of the cluster to which you want to add the host (Default) [Default]:
> >>>>> [ INFO  ] Waiting for the host to become operational in the engine. This may take several minutes...
> >>>>> [ ERROR ] The VDSM host was found in a failed state. Please check engine and bootstrap installation logs.
> >>>>> [ ERROR ] Unable to add ovirt-vm to the manager
> >>>>>           Please shutdown the VM allowing the system to launch it as a monitored service.
> >>>>>           The system will wait until the VM is down.
> >>>>> [ ERROR ] Failed to execute stage 'Closing up': [Errno 111] Connection refused
> >>>>> [ INFO  ] Stage: Clean up
> >>>>> [ ERROR ] Failed to execute stage 'Clean up': [Errno 111] Connection refused
> >>>>>
> >>>>>
> >>>>> I've attached my engine log and the ovirt-hosted-engine-setup log. I
> >>>>> think I had an issue with resolving external hostnames, or else a
> >>>>> connectivity issue during the install.
> >>>> For some reason your engine wasn't able to deploy your host, but this
> >>>> time the SSH session was established.
> >>>> 2015-03-09 13:05:58,514 ERROR
> >>>> [org.ovirt.engine.core.bll.InstallVdsInternalCommand]
> >>>> (org.ovirt.thread.pool-8-thread-3) [3cf91626] Host installation failed
> >>>> for host 217016bb-fdcd-4344-a0ca-4548262d10a8, ovirt-vm.:
> >>>> java.io.IOException: Command returned failure code 1 during SSH session
> >>>> 'root at xion2.smartcity.net'
> >>>>
> >>>> Can you please attach host-deploy logs from the engine VM?
> >>> OK, attached.
> >>>
> >>> Like I said, it looks to me like a name-resolution issue during the yum
> >>> update on the engine. I think I've fixed that, but do you have a better
> >>> suggestion for cleaning up and redeploying than reinstalling the OS on
> >>> my host and starting all over again?
> >> I just finished starting over from scratch, beginning with OS installation
> >> on my host/node, and wound up with a very similar problem: the engine
> >> couldn't reach the hosts during the yum operation. But this time the error
> >> was "Network is unreachable", which is weird, because I can ssh into the
> >> engine and ping many of those hosts after the operation has failed.
> >>
> >> Here's my latest host-deploy log from the engine. I'd appreciate any clues.
> > It seems that now your host is able to resolve those addresses but is
> > not able to connect over HTTP.
> > On your hosts, some of them resolve to IPv6 addresses; can you please try
> > to use curl to get one of the files that it wasn't able to fetch?
> > Can you please check your network configuration before and after
> > host-deploy?
> 
> I can give you the network configuration after host-deploy, at least for the
> host/Node. The engine won't start for me this morning, after I shut down the
> host for the night.
> 
> In order to give you the config before host-deploy (or, apparently, for the
> engine), I'll have to reinstall the OS on the host and start again from
> scratch. Obviously I'd rather not do that unless absolutely necessary.
> 
> Here's the host config after the failed host-deploy:
> 
> Host/Node:
> 
> # ip route
> 169.254.0.0/16 dev ovirtmgmt  scope link  metric 1007
> 172.16.0.0/16 dev ovirtmgmt  proto kernel  scope link  src 172.16.0.58

You are missing a default gateway, and that is the issue.
Are you sure that it was properly configured before you tried to deploy that host?
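A minimal sketch of how to confirm this on the host, assuming a POSIX shell and iproute2's `ip` command. The helper `has_default_route` is hypothetical (not part of oVirt or iproute2); it succeeds only if the `ip route` output contains a line starting with `default`. In the routing table quoted above, the only routes are the link-local 169.254.0.0/16 route and the directly connected 172.16.0.0/16 route, so the kernel has no route to any IPv4 address outside 172.16.0.0/16, and connection attempts to such addresses fail with "Network is unreachable", which would explain the errors during the yum operation.

```shell
#!/bin/sh
# has_default_route: hypothetical helper, not part of oVirt or iproute2.
# Succeeds (exit 0) if the `ip route` output passed as $1 contains a
# default route, fails (exit 1) otherwise.
has_default_route() {
    printf '%s\n' "$1" | grep -q '^default '
}

# Check the live IPv4 routing table on this machine.
if has_default_route "$(ip route 2>/dev/null)"; then
    echo "default route present"
else
    echo "no default route: off-subnet destinations will fail with 'Network is unreachable'"
fi
```

If the check reports no default route, one can be added for the current session with `ip route add default via GATEWAY_IP dev ovirtmgmt`, where GATEWAY_IP is a placeholder for your router's address on 172.16.0.0/16. That change is lost on reboot, so the gateway also needs to be in the host's persistent network configuration before the deploy is retried.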

> # ip addr
> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
>     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>     inet 127.0.0.1/8 scope host lo
>        valid_lft forever preferred_lft forever
>     inet6 ::1/128 scope host
>        valid_lft forever preferred_lft forever
> 2: p3p2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master ovirtmgmt state UP group default qlen 1000
>     link/ether b8:ca:3a:79:22:12 brd ff:ff:ff:ff:ff:ff
>     inet6 fe80::baca:3aff:fe79:2212/64 scope link
>        valid_lft forever preferred_lft forever
> 3: bond0: <NO-CARRIER,BROADCAST,MULTICAST,MASTER,UP> mtu 1500 qdisc noqueue state DOWN group default
>     link/ether 56:56:f7:cf:73:27 brd ff:ff:ff:ff:ff:ff
> 4: wlp2s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
>     link/ether 1c:3e:84:50:8d:c3 brd ff:ff:ff:ff:ff:ff
> 6: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default
>     link/ether 22:a1:01:9e:30:71 brd ff:ff:ff:ff:ff:ff
> 7: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
>     link/ether b8:ca:3a:79:22:12 brd ff:ff:ff:ff:ff:ff
>     inet 172.16.0.58/16 brd 172.16.255.255 scope global ovirtmgmt
>        valid_lft forever preferred_lft forever
>     inet6 fe80::baca:3aff:fe79:2212/64 scope link
>        valid_lft forever preferred_lft forever
> 
> 
> The only unusual thing about my setup that I can think of, from the network
> perspective, is that my physical host has a wireless interface, which I've
> not configured. Could it be confusing hosted-engine --deploy?
> 
> -Bob
> 
>
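On the IPv6 point raised earlier in the thread: the `ip addr` output shows only link-local (fe80::) IPv6 addresses on the host, so connections to mirrors that resolve to a global IPv6 address first are also likely to fail. A minimal sketch for checking which address family a name resolves to first, assuming glibc's `getent ahosts` (which prints getaddrinfo() results in order, with the address in the first column). The helper `first_family` is hypothetical, and the sample addresses come from the documentation ranges 192.0.2.0/24 and 2001:db8::/32, not from this thread.

```shell
#!/bin/sh
# first_family: hypothetical helper, not part of oVirt or glibc.
# Given `getent ahosts NAME` output in $1, print "inet6" if the first
# (preferred) address is IPv6, "inet" if it is IPv4, and nothing if the
# output is empty (the name did not resolve).
first_family() {
    # First field of the first non-empty line is the preferred address.
    addr=$(printf '%s\n' "$1" | awk 'NF { print $1; exit }')
    case "$addr" in
        "") ;;              # no result
        *:*) echo inet6 ;;  # IPv6 literals contain colons
        *) echo inet ;;     # otherwise assume dotted-quad IPv4
    esac
}

# Made-up sample in the `getent ahosts` format (documentation addresses).
sample='2001:db8::10    STREAM mirror.example.org
2001:db8::10    DGRAM
192.0.2.10      STREAM'
first_family "$sample"   # prints inet6
```

On the host, `first_family "$(getent ahosts NAME)"`, with NAME set to one of the mirrors yum failed to reach, shows which family is tried first; `curl -4 URL` versus `curl -6 URL` (standard curl options) then forces the fetch over IPv4 or IPv6 so the two paths can be tested separately.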