Resending with CC to list (and an update).
On 03/09/2015 01:40 PM, Simone Tiraboschi wrote:
> ----- Original Message -----
>> From: "Bob Doolittle" <bob(a)doolittle.us.com>
>> To: "Simone Tiraboschi" <stirabos(a)redhat.com>
>> Cc: "users-ovirt" <users(a)ovirt.org>
>> Sent: Monday, March 9, 2015 6:26:30 PM
>> Subject: Re: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on F20
>> (Cannot add the host to cluster ... SSH has failed)
>>
>>
>> On 03/09/2015 12:53 PM, Simone Tiraboschi wrote:
>>> ----- Original Message -----
>>>> From: "Bob Doolittle" <bob(a)doolittle.us.com>
>>>> To: "Simone Tiraboschi" <stirabos(a)redhat.com>
>>>> Cc: "users-ovirt" <users(a)ovirt.org>
>>>> Sent: Monday, March 9, 2015 12:48:37 PM
>>>> Subject: Re: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on
>>>> F20 (Cannot add the host to cluster ... SSH
>>>> has failed)
>>>>
>>>>
>>>> On 03/09/2015 07:12 AM, Simone Tiraboschi wrote:
>>>>> ----- Original Message -----
>>>>>> From: "Bob Doolittle" <bob(a)doolittle.us.com>
>>>>>> To: "Simone Tiraboschi" <stirabos(a)redhat.com>
>>>>>> Sent: Monday, March 9, 2015 12:02:49 PM
>>>>>> Subject: Re: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on
>>>>>> F20 (Cannot add the host to cluster ... SSH has failed)
>>>>>>
>>>>>> On Mar 9, 2015 5:23 AM, "Simone Tiraboschi" <stirabos(a)redhat.com> wrote:
>>>>>>> ----- Original Message -----
>>>>>>>> From: "Bob Doolittle" <bob(a)doolittle.us.com>
>>>>>>>> To: "users-ovirt" <users(a)ovirt.org>
>>>>>>>> Sent: Friday, March 6, 2015 9:21:20 PM
>>>>>>>> Subject: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on
>>>>>>>> F20 (Cannot add the host to cluster ... SSH has failed)
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I'm following the instructions here:
>>>>>>>> http://www.ovirt.org/Hosted_Engine_Howto
>>>>>>>> My self-hosted install failed near the end:
>>>>>>>>
>>>>>>>> To continue make a selection from the options below:
>>>>>>>> (1) Continue setup - engine installation is complete
>>>>>>>> (2) Power off and restart the VM
>>>>>>>> (3) Abort setup
>>>>>>>> (4) Destroy VM and abort setup
>>>>>>>>
>>>>>>>> (1, 2, 3, 4)[1]: 1
>>>>>>>> [ INFO ] Engine replied: DB Up!Welcome to Health Status!
>>>>>>>> Enter the name of the cluster to which you want to add the host
>>>>>>>> (Default) [Default]:
>>>>>>>> [ ERROR ] Cannot automatically add the host to cluster Default: Cannot add
>>>>>>>> Host. Connecting to host via SSH has failed, verify that the host is
>>>>>>>> reachable (IP address, routable address etc.) You may refer to the
>>>>>>>> engine.log file for further details.
>>>>>>>> [ ERROR ] Failed to execute stage 'Closing up': Cannot add the host to
>>>>>>>> cluster Default
>>>>>>>> [ INFO ] Stage: Clean up
>>>>>>>> [ INFO ] Generating answer file
>>>>>>>> '/var/lib/ovirt-hosted-engine-setup/answers/answers-20150306135624.conf'
>>>>>>>> [ INFO ] Stage: Pre-termination
>>>>>>>> [ INFO ] Stage: Termination
>>>>>>>>
>>>>>>>> I can ssh into the engine VM both locally and remotely. There is no
>>>>>>>> /root/.ssh directory, however. Did I need to set that up somehow?
>>>>>>> It's the engine that needs to open an SSH connection to the host
>>>>>>> calling it by its hostname.
>>>>>>> So please be sure that you can SSH to the host from the engine using
>>>>>>> its hostname and not its IP address.
>>>>>>
>>>>>> I'm assuming this should be a password-less login (key-based
>>>>>> authentication?).
>>>>> Yes, it is.
>>>>>
>>>>>> As what user?
>>>>> root
>>>> OK, I see a couple of problems.
>>>> First off, I didn't have my deploying-host hostname in the hosts map for
>>>> my engine.
>>> This is enough by itself to make the deploy procedure fail. If possible,
>>> we recommend relying on a DNS infrastructure, especially if you are deploying
>>> more than one host.
>> OK, I've started over. Simply removing the storage domain was insufficient:
>> the hosted-engine deploy failed when it found the HA and Broker services
>> already configured. I decided to start over fresh, beginning with
>> re-installing the OS on my host.
>>
>> I can't deploy DNS at the moment, so I have to simply replicate /etc/hosts
>> files on my host/engine. I did that this time, but have run into a new
>> problem:
>>
>> [ INFO ] Engine replied: DB Up!Welcome to Health Status!
>> Enter the name of the cluster to which you want to add the host
>> (Default) [Default]:
>> [ INFO ] Waiting for the host to become operational in the engine. This may
>> take several minutes...
>> [ ERROR ] The VDSM host was found in a failed state. Please check engine and
>> bootstrap installation logs.
>> [ ERROR ] Unable to add ovirt-vm to the manager
>> Please shutdown the VM allowing the system to launch it as a
>> monitored service.
>> The system will wait until the VM is down.
>> [ ERROR ] Failed to execute stage 'Closing up': [Errno 111] Connection
>> refused
>> [ INFO ] Stage: Clean up
>> [ ERROR ] Failed to execute stage 'Clean up': [Errno 111] Connection refused
>>
>>
>> I've attached my engine log and the ovirt-hosted-engine-setup log. I think I
>> had an issue with resolving external hostnames, or else a connectivity issue
>> during the install.
> For some reason your engine wasn't able to deploy your hosts but the SSH session
> this time was established.
> 2015-03-09 13:05:58,514 ERROR [org.ovirt.engine.core.bll.InstallVdsInternalCommand]
> (org.ovirt.thread.pool-8-thread-3) [3cf91626] Host installation failed for host
> 217016bb-fdcd-4344-a0ca-4548262d10a8, ovirt-vm.: java.io.IOException: Command returned
> failure code 1 during SSH session 'root(a)xion2.smartcity.net'
>
> Can you please attach host-deploy logs from the engine VM?
OK, attached.
Like I said, it looks to me like a name-resolution issue during the yum update on the
engine. I think I've fixed that, but do you have a better suggestion for cleaning up
and re-deploying other than installing the OS on my host and starting all over again?
Update: I just finished starting over from scratch, beginning with OS installation on my
host/node, and wound up with a very similar problem: the engine couldn't reach the hosts
during the yum operation. This time, though, the error was "Network is unreachable",
which is weird, because I can ssh into the engine and ping many of those hosts after the
operation has failed.
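For anyone trying to narrow this down, here is a minimal sketch (not part of oVirt, and not
what hosted-engine-setup itself runs) of a check that separates the two failure modes seen
so far: a hostname that does not resolve, and a host that resolves but cannot be reached
over TCP. The function name check_host and its defaults are made up for illustration;
port 22 is assumed because the engine contacts the host over SSH.

```python
import socket


def check_host(hostname, port=22, timeout=5.0):
    """Try to resolve hostname and open a TCP connection to hostname:port.

    Returns (ip, error): ip is None if resolution failed, and error is None
    only if the TCP connection succeeded. check_host is a hypothetical helper
    for illustration, not an oVirt API.
    """
    try:
        # Resolve by name (via /etc/hosts or DNS), the way the engine
        # addresses the host.
        infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return None, "name resolution failed: %s" % exc

    # Use the first address returned by the resolver.
    family, socktype, proto, _canonname, sockaddr = infos[0]
    ip = sockaddr[0]
    try:
        with socket.socket(family, socktype, proto) as sock:
            sock.settimeout(timeout)
            # Errors such as "Network is unreachable" or "Connection refused"
            # surface here as OSError.
            sock.connect(sockaddr)
    except OSError as exc:
        return ip, "connect failed: %s" % exc
    return ip, None
```

Run on the engine VM against the host's name, for example
check_host("xion2.smartcity.net"), a result with an IP and no error means name resolution
and the TCP path to port 22 both work at that moment. It says nothing about SSH
authentication or about what yum could reach during the deploy.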
Here's my latest host-deploy log from the engine. I'd appreciate any clues.
Thanks,
Bob