----- Original Message -----
From: "Bob Doolittle" <bob(a)doolittle.us.com>
To: "Simone Tiraboschi" <stirabos(a)redhat.com>
Cc: "users-ovirt" <users(a)ovirt.org>
Sent: Tuesday, March 10, 2015 2:40:13 PM
Subject: Re: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on F20 (The VDSM host was found in a failed state)
On 03/10/2015 04:58 AM, Simone Tiraboschi wrote:
>
> ----- Original Message -----
>> From: "Bob Doolittle" <bob(a)doolittle.us.com>
>> To: "Simone Tiraboschi" <stirabos(a)redhat.com>
>> Cc: "users-ovirt" <users(a)ovirt.org>
>> Sent: Monday, March 9, 2015 11:48:03 PM
>> Subject: Re: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on F20 (The VDSM host was found in a failed state)
>>
>>
>> On 03/09/2015 02:47 PM, Bob Doolittle wrote:
>>> Resending with CC to list (and an update).
>>>
>>> On 03/09/2015 01:40 PM, Simone Tiraboschi wrote:
>>>> ----- Original Message -----
>>>>> From: "Bob Doolittle" <bob(a)doolittle.us.com>
>>>>> To: "Simone Tiraboschi" <stirabos(a)redhat.com>
>>>>> Cc: "users-ovirt" <users(a)ovirt.org>
>>>>> Sent: Monday, March 9, 2015 6:26:30 PM
>>>>> Subject: Re: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on F20 (Cannot add the host to cluster ... SSH has failed)
>>>>>
...
>>>>> OK, I've started over. Simply removing the storage domain was
>>>>> insufficient; the hosted-engine deploy failed when it found the HA and
>>>>> Broker services already configured. I decided to start over fresh by
>>>>> re-installing the OS on my host.
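(A hedged aside for anyone retrying at this point: a sketch of the manual
cleanup that usually avoids a full OS re-install, assuming the stock oVirt
3.5 service names and config paths; verify each path on your host before
removing anything:

# systemctl stop ovirt-ha-agent ovirt-ha-broker vdsmd
# systemctl disable ovirt-ha-agent ovirt-ha-broker
# rm -f /etc/ovirt-hosted-engine/hosted-engine.conf
# rm -rf /var/run/ovirt-hosted-engine-ha/*

After this, hosted-engine --deploy should no longer find the HA agent and
broker already configured.)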
>>>>>
>>>>> I can't deploy DNS at the moment, so I have to simply replicate
>>>>> /etc/hosts files on my host/engine. I did that this time, but have run
>>>>> into a new problem:
>>>>>
>>>>> [ INFO ] Engine replied: DB Up!Welcome to Health Status!
>>>>>           Enter the name of the cluster to which you want to add the host (Default) [Default]:
>>>>> [ INFO ] Waiting for the host to become operational in the engine. This may take several minutes...
>>>>> [ ERROR ] The VDSM host was found in a failed state. Please check engine and bootstrap installation logs.
>>>>> [ ERROR ] Unable to add ovirt-vm to the manager
>>>>>           Please shutdown the VM allowing the system to launch it as a monitored service.
>>>>>           The system will wait until the VM is down.
>>>>> [ ERROR ] Failed to execute stage 'Closing up': [Errno 111] Connection refused
>>>>> [ INFO ] Stage: Clean up
>>>>> [ ERROR ] Failed to execute stage 'Clean up': [Errno 111] Connection refused
>>>>>
>>>>>
>>>>> I've attached my engine log and the ovirt-hosted-engine-setup log. I
>>>>> think I had an issue with resolving external hostnames, or else a
>>>>> connectivity issue during the install.
>>>> For some reason your engine wasn't able to deploy your host, but the SSH
>>>> session was established this time.
>>>> 2015-03-09 13:05:58,514 ERROR
>>>> [org.ovirt.engine.core.bll.InstallVdsInternalCommand]
>>>> (org.ovirt.thread.pool-8-thread-3) [3cf91626] Host installation failed
>>>> for host 217016bb-fdcd-4344-a0ca-4548262d10a8, ovirt-vm.:
>>>> java.io.IOException: Command returned failure code 1 during SSH session
>>>> 'root(a)xion2.smartcity.net'
>>>>
>>>> Can you please attach host-deploy logs from the engine VM?
>>> OK, attached.
>>>
>>> Like I said, it looks to me like a name-resolution issue during the yum
>>> update on the engine. I think I've fixed that, but do you have a better
>>> suggestion for cleaning up and re-deploying other than installing the OS
>>> on my host and starting all over again?
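(A quick hedged way to confirm the name-resolution theory from inside the
engine VM; the hostname below is a placeholder for whichever mirror the
failed yum transaction referenced:

# cat /etc/resolv.conf
# getent hosts resources.ovirt.org
# curl -v -o /dev/null http://resources.ovirt.org/

If getent fails, it is a resolution problem; if getent succeeds but curl
cannot connect, it is a connectivity problem.)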
>> I just finished starting over from scratch, beginning with OS installation
>> on my host/node, and wound up with a very similar problem - the engine
>> couldn't reach the repository hosts during the yum operation. But this time
>> the error was "Network is unreachable", which is weird, because I can ssh
>> into the engine and ping many of those hosts after the operation has
>> failed.
>>
>> Here's my latest host-deploy log from the engine. I'd appreciate any
>> clues.
> It seems that your host is now able to resolve those addresses, but it's
> not able to connect over HTTP.
> Some of them resolve as IPv6 addresses on your host; can you please try
> using curl to fetch one of the files it wasn't able to get?
> Can you please check your network configuration before and after
> host-deploy?
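> For example (a hedged sketch; substitute a URL that actually failed in
> your host-deploy log for this placeholder):
>
> # curl -v -4 -o /dev/null http://resources.ovirt.org/pub/ovirt-3.5/rpm/fc20/repodata/repomd.xml
> # curl -v -6 -o /dev/null http://resources.ovirt.org/pub/ovirt-3.5/rpm/fc20/repodata/repomd.xml
>
> Comparing the -4 (IPv4-only) and -6 (IPv6-only) runs will show whether only
> one address family fails to connect.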
I can give you the network configuration after host-deploy, at least for the
host/Node. The engine won't start for me this morning, after I shut down the
host for the night.
In order to give you the config before host-deploy (or, apparently, for the
engine), I'll have to re-install the OS on the host and start again from
scratch. Obviously I'd rather not do that unless absolutely necessary.
Here's the host config after the failed host-deploy:
Host/Node:
# ip route
169.254.0.0/16 dev ovirtmgmt scope link metric 1007
172.16.0.0/16 dev ovirtmgmt proto kernel scope link src 172.16.0.58
You are missing a default gateway, hence the issue.
Are you sure that it was properly configured before trying to deploy that host?
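As a hedged example, assuming 172.16.0.1 is your router (substitute your real
gateway address), the default route can be restored temporarily with:

# ip route add default via 172.16.0.1 dev ovirtmgmt

and made persistent on F20 by setting GATEWAY=172.16.0.1 in
/etc/sysconfig/network-scripts/ifcfg-ovirtmgmt, so that it survives a reboot.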
# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group
default
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: p3p2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master
ovirtmgmt state UP group default qlen 1000
link/ether b8:ca:3a:79:22:12 brd ff:ff:ff:ff:ff:ff
inet6 fe80::baca:3aff:fe79:2212/64 scope link
valid_lft forever preferred_lft forever
3: bond0: <NO-CARRIER,BROADCAST,MULTICAST,MASTER,UP> mtu 1500 qdisc noqueue
state DOWN group default
link/ether 56:56:f7:cf:73:27 brd ff:ff:ff:ff:ff:ff
4: wlp2s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN
group default qlen 1000
link/ether 1c:3e:84:50:8d:c3 brd ff:ff:ff:ff:ff:ff
6: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group
default
link/ether 22:a1:01:9e:30:71 brd ff:ff:ff:ff:ff:ff
7: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state
UP group default
link/ether b8:ca:3a:79:22:12 brd ff:ff:ff:ff:ff:ff
inet 172.16.0.58/16 brd 172.16.255.255 scope global ovirtmgmt
valid_lft forever preferred_lft forever
inet6 fe80::baca:3aff:fe79:2212/64 scope link
valid_lft forever preferred_lft forever
The only unusual thing about my setup that I can think of, from the network
perspective, is that my physical host has a wireless interface, which I've
not configured. Could it be confusing hosted-engine --deploy?
-Bob
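(As a hedged way to rule the wireless interface out, assuming NetworkManager
is running on F20, check whether wlp2s0 holds any connection or routes before
deploying:

# nmcli dev status
# ip route show dev wlp2s0

If wlp2s0 is disconnected and owns no routes, it is unlikely to be what
hosted-engine --deploy tripped over.)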