On 03/10/2015 04:58 AM, Simone Tiraboschi wrote:
----- Original Message -----
> From: "Bob Doolittle" <bob(a)doolittle.us.com>
> To: "Simone Tiraboschi" <stirabos(a)redhat.com>
> Cc: "users-ovirt" <users(a)ovirt.org>
> Sent: Monday, March 9, 2015 11:48:03 PM
> Subject: Re: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on F20 (The VDSM host was found in a failed state)
>
>
> On 03/09/2015 02:47 PM, Bob Doolittle wrote:
>> Resending with CC to list (and an update).
>>
>> On 03/09/2015 01:40 PM, Simone Tiraboschi wrote:
>>> ----- Original Message -----
>>>> From: "Bob Doolittle" <bob(a)doolittle.us.com>
>>>> To: "Simone Tiraboschi" <stirabos(a)redhat.com>
>>>> Cc: "users-ovirt" <users(a)ovirt.org>
>>>> Sent: Monday, March 9, 2015 6:26:30 PM
>>>> Subject: Re: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on F20 (Cannot add the host to cluster ... SSH has failed)
>>>>
...
>>>> OK, I've started over. Simply removing the storage domain was
>>>> insufficient; the hosted-engine deploy failed when it found the HA and
>>>> Broker services already configured. I decided to just start over fresh,
>>>> beginning with re-installing the OS on my host.
>>>>
>>>> I can't deploy DNS at the moment, so I have to simply replicate
>>>> /etc/hosts files on my host/engine. I did that this time, but have run
>>>> into a new problem:
>>>>
>>>> [ INFO ] Engine replied: DB Up!Welcome to Health Status!
>>>>          Enter the name of the cluster to which you want to add the host (Default) [Default]:
>>>> [ INFO ] Waiting for the host to become operational in the engine. This may take several minutes...
>>>> [ ERROR ] The VDSM host was found in a failed state. Please check engine and bootstrap installation logs.
>>>> [ ERROR ] Unable to add ovirt-vm to the manager
>>>>           Please shutdown the VM allowing the system to launch it as a monitored service.
>>>>           The system will wait until the VM is down.
>>>> [ ERROR ] Failed to execute stage 'Closing up': [Errno 111] Connection refused
>>>> [ INFO ] Stage: Clean up
>>>> [ ERROR ] Failed to execute stage 'Clean up': [Errno 111] Connection refused
>>>>
>>>>
>>>> I've attached my engine log and the ovirt-hosted-engine-setup log. I
>>>> think I had an issue with resolving external hostnames, or else a
>>>> connectivity issue during the install.
>>> For some reason your engine wasn't able to deploy your host, but this
>>> time the SSH session was established.
>>> 2015-03-09 13:05:58,514 ERROR
>>> [org.ovirt.engine.core.bll.InstallVdsInternalCommand]
>>> (org.ovirt.thread.pool-8-thread-3) [3cf91626] Host installation failed
>>> for host 217016bb-fdcd-4344-a0ca-4548262d10a8, ovirt-vm.:
>>> java.io.IOException: Command returned failure code 1 during SSH session
>>> 'root(a)xion2.smartcity.net'
>>>
>>> Can you please attach host-deploy logs from the engine VM?
>> OK, attached.
>>
>> Like I said, it looks to me like a name-resolution issue during the yum
>> update on the engine. I think I've fixed that, but do you have a better
>> suggestion for cleaning up and re-deploying other than installing the OS
>> on my host and starting all over again?
> I just finished starting over from scratch, starting with OS installation on
> my host/node, and wound up with a very similar problem - the engine couldn't
> reach the hosts during the yum operation. But this time the error was
> "Network is unreachable". Which is weird, because I can ssh into the
engine
> and ping many of those hosts, after the operation has failed.
>
> Here's my latest host-deploy log from the engine. I'd appreciate any clues.
It seems that your host is now able to resolve those addresses, but it's not
able to connect over HTTP.
On your host, some of them resolve to IPv6 addresses; can you please use curl
to try to fetch one of the files that it wasn't able to fetch?
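For example, something along these lines, forcing IPv4 and then IPv6 so we
can compare (the mirror URL here is only a placeholder; please substitute one
of the URLs that actually failed in your host-deploy log):

  # curl -v -4 -o /dev/null 'http://mirrors.fedoraproject.org/metalink?repo=fedora-20&arch=x86_64'
  # curl -v -6 -o /dev/null 'http://mirrors.fedoraproject.org/metalink?repo=fedora-20&arch=x86_64'

If the -4 call succeeds where the -6 one hangs or is refused, that would
point at broken IPv6 connectivity rather than name resolution.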
Can you please check your network configuration before and after host-deploy?
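A simple way to capture that for comparison (the file names are just a
suggestion):

  # ip addr > /root/net-before.txt && ip route >> /root/net-before.txt
  ... run hosted-engine --deploy ...
  # ip addr > /root/net-after.txt && ip route >> /root/net-after.txt
  # diff /root/net-before.txt /root/net-after.txt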
I can give you the network configuration after host-deploy, at least for the host/Node.
The engine won't start for me this morning, after I shut down the host for the night.
In order to give you the config before host-deploy (or, apparently, any
config for the engine), I'll have to re-install the OS on the host and start
again from scratch. Obviously I'd rather not do that unless absolutely
necessary.
Here's the host config after the failed host-deploy:
Host/Node:
# ip route
169.254.0.0/16 dev ovirtmgmt scope link metric 1007
172.16.0.0/16 dev ovirtmgmt proto kernel scope link src 172.16.0.58
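One thing I notice, for what it's worth: unless I trimmed something, there is
no default route at all in that table, which on its own would explain
"Network is unreachable" for anything outside 172.16.0.0/16. Assuming my
gateway were 172.16.0.1 (substituting the real one), I suppose I could test
with:

  # ip route show default        # prints nothing when no default route is set
  # ip route add default via 172.16.0.1 dev ovirtmgmt    # temporary; lost on reboot

and then retry the ping/curl tests.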
# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: p3p2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master ovirtmgmt state UP group default qlen 1000
    link/ether b8:ca:3a:79:22:12 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::baca:3aff:fe79:2212/64 scope link
       valid_lft forever preferred_lft forever
3: bond0: <NO-CARRIER,BROADCAST,MULTICAST,MASTER,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 56:56:f7:cf:73:27 brd ff:ff:ff:ff:ff:ff
4: wlp2s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 1c:3e:84:50:8d:c3 brd ff:ff:ff:ff:ff:ff
6: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default
    link/ether 22:a1:01:9e:30:71 brd ff:ff:ff:ff:ff:ff
7: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether b8:ca:3a:79:22:12 brd ff:ff:ff:ff:ff:ff
    inet 172.16.0.58/16 brd 172.16.255.255 scope global ovirtmgmt
       valid_lft forever preferred_lft forever
    inet6 fe80::baca:3aff:fe79:2212/64 scope link
       valid_lft forever preferred_lft forever
The only unusual thing about my setup that I can think of, from the network perspective,
is that my physical host has a wireless interface, which I've not configured. Could it
be confusing hosted-engine --deploy?
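(If it would help rule that in or out, I believe something like this should
show whether NetworkManager is still trying to manage wlp2s0:

  # nmcli dev status

Happy to run that, or anything else you suggest, and post the output.)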
-Bob