Re: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on F20 (Cannot add the host to cluster ... SSH has failed)

On 03/09/2015 07:12 AM, Simone Tiraboschi wrote:
----- Original Message -----
From: "Bob Doolittle" <bob@doolittle.us.com>
To: "Simone Tiraboschi" <stirabos@redhat.com>
Sent: Monday, March 9, 2015 12:02:49 PM
Subject: Re: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on F20 (Cannot add the host to cluster ... SSH has failed)

On Mar 9, 2015 5:23 AM, "Simone Tiraboschi" <stirabos@redhat.com> wrote:

----- Original Message -----
From: "Bob Doolittle" <bob@doolittle.us.com>
To: "users-ovirt" <users@ovirt.org>
Sent: Friday, March 6, 2015 9:21:20 PM
Subject: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on F20 (Cannot add the host to cluster ... SSH has failed)
Hi,
I'm following the instructions here: http://www.ovirt.org/Hosted_Engine_Howto
My self-hosted install failed near the end:

To continue make a selection from the options below:
(1) Continue setup - engine installation is complete
(2) Power off and restart the VM
(3) Abort setup
(4) Destroy VM and abort setup

(1, 2, 3, 4)[1]: 1
[ INFO  ] Engine replied: DB Up!Welcome to Health Status!
          Enter the name of the cluster to which you want to add the host (Default) [Default]:
[ ERROR ] Cannot automatically add the host to cluster Default: Cannot add Host. Connecting to host via SSH has failed, verify that the host is reachable (IP address, routable address etc.) You may refer to the engine.log file for further details.
[ ERROR ] Failed to execute stage 'Closing up': Cannot add the host to cluster Default
[ INFO  ] Stage: Clean up
[ INFO  ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20150306135624.conf'
[ INFO  ] Stage: Pre-termination
[ INFO  ] Stage: Termination
I can ssh into the engine VM both locally and remotely. There is no /root/.ssh directory, however. Did I need to set that up somehow?

It's the engine that needs to open an SSH connection to the host, calling it by its hostname. So please make sure that you can SSH from the engine to the host using its hostname, not its IP address.
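For example, a quick check run from the engine VM (host.example.com is a placeholder for your host's actual hostname, not a name from this setup):

    # should log in using the engine's key, with no password prompt
    ssh root@host.example.com hostname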
I'm assuming this should be a password-less login (key-based authentication)?

Yes, it is.

As what user?

root
OK, I see a couple of problems.

First off, I didn't have my deploying-host hostname in the hosts map for my engine. After adding it to /etc/hosts (both hostname and FQDN), when I try to ssh from root@engine to root@host it is prompting me for a password.

On my engine, ~root/.ssh does not contain any keys. On my host, ~root/.ssh has authorized_keys, and in it there is a key with the comment "ovirt-engine". It's possible that I inadvertently removed ~root/.ssh on the engine while I was preparing it (I started to set up my own no-password logins and then thought better of it and cleaned up, not realizing that some prior setup affecting that directory had already occurred). That would explain the second issue.

How/when does the key for root@engine get populated into the host's ~root/.ssh/authorized_keys during setup?

-Bob
Until now hosted-engine hosts were simply identified by their IP address, but then we had some bug reports about side effects of that. So now we generate and sign certs using host hostnames, and the engine therefore needs to be able to resolve them correctly.
When I log into the Administration Portal, the engine VM does not appear under the Virtual Machines view (it's empty).

That's because the setup didn't complete.
I've attached what I think are the relevant logs.
Also, when my host reboots, the ovirt-ha-broker and ovirt-ha-agent services do not come up automatically. I have to use systemctl to start them manually.

That's also because the setup didn't complete.
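In the meantime, the two services named above can be brought up by hand on the host, e.g.:

    systemctl start ovirt-ha-broker ovirt-ha-agent
    systemctl status ovirt-ha-broker ovirt-ha-agent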
This is a fresh Fedora 20 machine installing a fresh copy of oVirt 3.5.1. What's the cleanest approach to restore/complete sanity of my setup, please?

The first step is to clarify what went wrong, in order to avoid it in the future. Then, if you want a really sane environment for production use, I'd suggest redeploying: run hosted-engine --vm-poweroff, empty the storage domain share, and deploy again.
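A rough sketch of that cleanup, assuming an NFS storage domain (the export path below is a placeholder, not one taken from this thread):

    # on the host: stop the engine VM managed by hosted-engine
    hosted-engine --vm-poweroff
    # empty the hosted-engine storage domain share
    mount nfs.example.com:/exports/hosted-engine /mnt
    rm -rf /mnt/*
    umount /mnt
    # then run the deployment again
    hosted-engine --deploy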
Thanks, Bob
I've linked 3 files to this email:
server.log (12.4 MB) - Dropbox: https://db.tt/g5p09AaD
vdsm.log (3.2 MB) - Dropbox: https://db.tt/P4572SUm
ovirt-hosted-engine-setup-20150306123622-tad1fy.log (413 KB) - Dropbox: https://db.tt/XAM9ffhi

----- Original Message -----
From: "Bob Doolittle" <bob@doolittle.us.com>
To: "Simone Tiraboschi" <stirabos@redhat.com>
Cc: "users-ovirt" <users@ovirt.org>
Sent: Monday, March 9, 2015 12:48:37 PM
Subject: Re: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on F20 (Cannot add the host to cluster ... SSH has failed)
OK, I see a couple of problems. First off, I didn't have my deploying-host hostname in the hosts map for my engine.
This by itself is enough to make the deploy procedure fail. If possible, we recommend relying on a DNS infrastructure, especially if you are deploying more than one host.
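If DNS really isn't available, a minimal /etc/hosts sketch, kept identical on both the host and the engine VM (the names and addresses below are placeholders, not the ones from this setup):

    192.168.1.10   host1.example.com    host1
    192.168.1.11   engine.example.com   engine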
After adding it to /etc/hosts (both hostname and FQDN), when I try to ssh from root@engine to root@host it is prompting me for a password.
On my engine, ~root/.ssh does not contain any keys. On my host, ~root/.ssh has authorized_keys, and in it there is a key with the comment "ovirt-engine".
It's possible that I inadvertently removed ~root/.ssh on the engine while I was preparing it (I started to set up my own no-password logins and then thought better of it and cleaned up, not realizing that some prior setup affecting that directory had already occurred). That would explain the second issue.
No, it's OK: the private key is contained in /etc/pki/ovirt-engine/keys/engine.p12
How/when does the key for root@engine get populated into the host's ~root/.ssh/authorized_keys during setup?
It's part of the hosted-engine deploy procedure: when the engine setup on the VM is completed, it fetches the engine's SSH public key from http://{enginefqdn}/engine.ssh.key.txt and stores it under ~root/.ssh/authorized_keys, so that the engine can add the host without knowing the host's root password. Then hosted-engine setup contacts the engine via the REST API to trigger the host setup procedure. Since the engine wasn't able to contact the host, due to the bad hostname resolution we pointed out, that part of the deployment could not complete.
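A rough sketch of that step, which can also be run by hand on the host to verify it (enginefqdn is a placeholder for your engine's FQDN):

    # on the host: fetch the engine's public key and authorize it for root
    mkdir -p /root/.ssh && chmod 700 /root/.ssh
    curl -s http://enginefqdn/engine.ssh.key.txt >> /root/.ssh/authorized_keys
    chmod 600 /root/.ssh/authorized_keys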

OK, I've started over. Simply removing the storage domain was insufficient; the hosted-engine deploy failed when it found the HA and Broker services already configured. I decided to just start over fresh, beginning with re-installing the OS on my host.

I can't deploy DNS at the moment, so I have to simply replicate /etc/hosts files on my host/engine. I did that this time, but have run into a new problem:

[ INFO  ] Engine replied: DB Up!Welcome to Health Status!
          Enter the name of the cluster to which you want to add the host (Default) [Default]:
[ INFO  ] Waiting for the host to become operational in the engine. This may take several minutes...
[ ERROR ] The VDSM host was found in a failed state. Please check engine and bootstrap installation logs.
[ ERROR ] Unable to add ovirt-vm to the manager
          Please shutdown the VM allowing the system to launch it as a monitored service.
          The system will wait until the VM is down.
[ ERROR ] Failed to execute stage 'Closing up': [Errno 111] Connection refused
[ INFO  ] Stage: Clean up
[ ERROR ] Failed to execute stage 'Clean up': [Errno 111] Connection refused

I've attached my engine log and the ovirt-hosted-engine-setup log. I think I had an issue with resolving external hostnames, or else a connectivity issue during the install.

So I have to start over again. Do you have any better tips on how I can clean up in order to try this operation yet again?

Thanks, Bob

----- Original Message -----
From: "Bob Doolittle" <bob@doolittle.us.com>
To: "Simone Tiraboschi" <stirabos@redhat.com>
Cc: "users-ovirt" <users@ovirt.org>
Sent: Monday, March 9, 2015 6:26:30 PM
Subject: Re: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on F20 (Cannot add the host to cluster ... SSH has failed)
For some reason your engine wasn't able to deploy your host, but this time the SSH session was established:

2015-03-09 13:05:58,514 ERROR [org.ovirt.engine.core.bll.InstallVdsInternalCommand] (org.ovirt.thread.pool-8-thread-3) [3cf91626] Host installation failed for host 217016bb-fdcd-4344-a0ca-4548262d10a8, ovirt-vm.: java.io.IOException: Command returned failure code 1 during SSH session 'root@xion2.smartcity.net'

Can you please attach the host-deploy logs from the engine VM? Thanks.
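For reference, the host-deploy logs normally live on the engine VM under /var/log/ovirt-engine/host-deploy/ (the usual default location; adjust if your setup differs):

    # on the engine VM
    ls -lt /var/log/ovirt-engine/host-deploy/
    tar czf host-deploy-logs.tar.gz /var/log/ovirt-engine/host-deploy/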

Resending with CC to list (and an update).

On 03/09/2015 01:40 PM, Simone Tiraboschi wrote:
Can you please attach host-deploy logs from the engine VM?
OK, attached. Like I said, it looks to me like a name-resolution issue during the yum update on the engine. I think I've fixed that, but do you have a better suggestion for cleaning up and re-deploying other than installing the OS on my host and starting all over again? -Bob

I just finished starting over from scratch, beginning with OS installation on my host/node, and wound up with a very similar problem: the engine couldn't reach the hosts during the yum operation. But this time the error was "Network is unreachable". Which is weird, because I can ssh into the engine and ping many of those hosts after the operation has failed.

Here's my latest host-deploy log from the engine. I'd appreciate any clues.

Thanks, Bob

----- Original Message -----
From: "Bob Doolittle" <bob@doolittle.us.com>
To: "Simone Tiraboschi" <stirabos@redhat.com>
Cc: "users-ovirt" <users@ovirt.org>
Sent: Monday, March 9, 2015 11:48:03 PM
Subject: Re: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on F20 (The VDSM host was found in a failed state)
It seems that now your host is able to resolve those addresses, but it's not able to connect over HTTP. On your host some of them resolve to IPv6 addresses; can you please try using curl to fetch one of the files it wasn't able to download? Can you please also check your network configuration before and after host-deploy?
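For example, from the host, against one of the repository URLs the failed transaction reported (resources.ovirt.org is the main oVirt repository host; substitute whichever URL your log actually shows):

    # compare the default resolution (possibly IPv6) with forced IPv4
    curl -v -o /dev/null http://resources.ovirt.org/pub/ovirt-3.5/rpm/fc20/noarch/
    curl -4 -v -o /dev/null http://resources.ovirt.org/pub/ovirt-3.5/rpm/fc20/noarch/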

I can give you the network configuration after host-deploy, at least for the host/Node. The engine won't start for me this morning, after I shut down the host for the night.

In order to give you the config before host-deploy (or, apparently for the engine), I'll have to re-install the OS on the host and start again from scratch. Obviously I'd rather not do that unless absolutely necessary.

Here's the host config after the failed host-deploy:

Host/Node:

# ip route
169.254.0.0/16 dev ovirtmgmt scope link metric 1007
172.16.0.0/16 dev ovirtmgmt proto kernel scope link src 172.16.0.58

# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: p3p2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master ovirtmgmt state UP group default qlen 1000
    link/ether b8:ca:3a:79:22:12 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::baca:3aff:fe79:2212/64 scope link
       valid_lft forever preferred_lft forever
3: bond0: <NO-CARRIER,BROADCAST,MULTICAST,MASTER,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 56:56:f7:cf:73:27 brd ff:ff:ff:ff:ff:ff
4: wlp2s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 1c:3e:84:50:8d:c3 brd ff:ff:ff:ff:ff:ff
6: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default
    link/ether 22:a1:01:9e:30:71 brd ff:ff:ff:ff:ff:ff
7: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether b8:ca:3a:79:22:12 brd ff:ff:ff:ff:ff:ff
    inet 172.16.0.58/16 brd 172.16.255.255 scope global ovirtmgmt
       valid_lft forever preferred_lft forever
    inet6 fe80::baca:3aff:fe79:2212/64 scope link
       valid_lft forever preferred_lft forever

The only unusual thing about my setup that I can think of, from the network perspective, is that my physical host has a wireless interface, which I've not configured. Could it be confusing hosted-engine --deploy?

-Bob

----- Original Message -----
From: "Bob Doolittle" <bob@doolittle.us.com>
To: "Simone Tiraboschi" <stirabos@redhat.com>
Cc: "users-ovirt" <users@ovirt.org>
Sent: Tuesday, March 10, 2015 2:40:13 PM
Subject: Re: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on F20 (The VDSM host was found in a failed state)
# ip route
169.254.0.0/16 dev ovirtmgmt scope link metric 1007
172.16.0.0/16 dev ovirtmgmt proto kernel scope link src 172.16.0.58
You are missing a default gateway, and that is the issue. Are you sure it was properly configured before you tried to deploy that host?
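A quick way to check for it and, as a temporary workaround, add it back by hand (172.16.0.1 is the gateway address shown in the fresh-install output later in this thread; substitute your own):

    # there should be a 'default via ...' line in the output
    ip route
    # temporarily restore the default gateway on the management bridge
    ip route add default via 172.16.0.1 dev ovirtmgmt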

On 03/10/2015 10:20 AM, Simone Tiraboschi wrote:
You are missing a default gateway, and that is the issue. Are you sure it was properly configured before you tried to deploy that host?
It should have been; it was a fresh OS install. So I'm starting again, and keeping careful records of my network config.

Here is my initial network config of my host/node, immediately following a new OS install:

% ip route
default via 172.16.0.1 dev p3p1  proto static  metric 1024
172.16.0.0/16 dev p3p1 proto kernel scope link src 172.16.0.58

% ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: p3p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether b8:ca:3a:79:22:12 brd ff:ff:ff:ff:ff:ff
    inet 172.16.0.58/16 brd 172.16.255.255 scope global p3p1
       valid_lft forever preferred_lft forever
    inet6 fe80::baca:3aff:fe79:2212/64 scope link
       valid_lft forever preferred_lft forever
3: wlp2s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 1c:3e:84:50:8d:c3 brd ff:ff:ff:ff:ff:ff

After the VM is first created, the host/node config is:

# ip route
default via 172.16.0.1 dev ovirtmgmt
169.254.0.0/16 dev ovirtmgmt scope link metric 1006
172.16.0.0/16 dev ovirtmgmt proto kernel scope link src 172.16.0.58

# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: p3p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master ovirtmgmt state UP group default qlen 1000
    link/ether b8:ca:3a:79:22:12 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::baca:3aff:fe79:2212/64 scope link
       valid_lft forever preferred_lft forever
3: wlp2s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 1c:3e:84:50:8d:c3 brd ff:ff:ff:ff:ff:ff
4: bond0: <NO-CARRIER,BROADCAST,MULTICAST,MASTER,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 92:cb:9d:97:18:36 brd ff:ff:ff:ff:ff:ff
5: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default
    link/ether 9a:bc:29:52:82:38 brd ff:ff:ff:ff:ff:ff
6: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether b8:ca:3a:79:22:12 brd ff:ff:ff:ff:ff:ff
    inet 172.16.0.58/16 brd 172.16.255.255 scope global ovirtmgmt
       valid_lft forever preferred_lft forever
    inet6 fe80::baca:3aff:fe79:2212/64 scope link
       valid_lft forever preferred_lft forever
7: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master ovirtmgmt state UNKNOWN group default qlen 500
    link/ether fe:16:3e:16:a4:37 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc16:3eff:fe16:a437/64 scope link
       valid_lft forever preferred_lft forever

At this point, I was already seeing a problem on the host/node. I remembered that a newer version of the sos package is delivered from the oVirt repositories.
So I tried to do a "yum update" on my host, and got a similar problem: % sudo yum update [sudo] password for rad: Loaded plugins: langpacks, refresh-packagekit Resolving Dependencies --> Running transaction check ---> Package sos.noarch 0:3.1-1.fc20 will be updated ---> Package sos.noarch 0:3.2-0.2.fc20.ovirt will be an update --> Finished Dependency Resolution Dependencies Resolved ================================================================================================================ Package Arch Version Repository Size ================================================================================================================ Updating: sos noarch 3.2-0.2.fc20.ovirt ovirt-3.5 292 k Transaction Summary ================================================================================================================ Upgrade 1 Package Total download size: 292 k Is this ok [y/d/N]: y Downloading packages: No Presto metadata available for ovirt-3.5 sos-3.2-0.2.fc20.ovirt.noarch. FAILED http://www.gtlib.gatech.edu/pub/oVirt/pub/ovirt-3.5/rpm/fc20/noarch/sos-3.2-...: [Errno 14] curl#6 - "Could not resolve host: www.gtlib.gatech.edu" Trying other mirror. sos-3.2-0.2.fc20.ovirt.noarch. FAILED ftp://ftp.gtlib.gatech.edu/pub/oVirt/pub/ovirt-3.5/rpm/fc20/noarch/sos-3.2-0.2.fc20.ovirt.noarch.rpm: [Errno 14] curl#6 - "Could not resolve host: ftp.gtlib.gatech.edu" Trying other mirror. sos-3.2-0.2.fc20.ovirt.noarch. FAILED http://resources.ovirt.org/pub/ovirt-3.5/rpm/fc20/noarch/sos-3.2-0.2.fc20.ov...: [Errno 14] curl#6 - "Could not resolve host: resources.ovirt.org" Trying other mirror. sos-3.2-0.2.fc20.ovirt.noarch. FAILED http://ftp.snt.utwente.nl/pub/software/ovirt/ovirt-3.5/rpm/fc20/noarch/sos-3...: [Errno 14] curl#6 - "Could not resolve host: ftp.snt.utwente.nl" Trying other mirror. sos-3.2-0.2.fc20.ovirt.noarch. FAILED http://ftp.nluug.nl/os/Linux/virtual/ovirt/ovirt-3.5/rpm/fc20/noarch/sos-3.2...: [Errno 14] curl#6 - "Could not resolve host: ftp.nluug.nl" Trying other mirror. sos-3.2-0.2.fc20.ovirt.noarch. FAILED http://mirror.linux.duke.edu/ovirt/pub/ovirt-3.5/rpm/fc20/noarch/sos-3.2-0.2...: [Errno 14] curl#6 - "Could not resolve host: mirror.linux.duke.edu" Trying other mirror. Error downloading packages: sos-3.2-0.2.fc20.ovirt.noarch: [Errno 256] No more mirrors to try. This was similar to my previous failures. I took a look, and the problem was that /etc/resolv.conf had no nameservers, and the /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt file contained no entries for DNS1 or DOMAIN. So, it appears that when hosted-engine set up my bridged network, it neglected to carry over the DNS configuration necessary to the bridge. Note that I am using *static* network configuration, rather than DHCP. During installation of the OS I am setting up the network configuration as Manual. Perhaps the hosted-engine script is not properly prepared to deal with that? I went ahead and modified the ifcfg-ovirtmgmt network script (for the next service restart/boot) and resolv.conf (I was afraid to restart the network in the middle of hosted-engine execution since I don't know what might already be connected to the engine). This time it got further, but ultimately it still failed at the very end: [ INFO ] Waiting for the host to become operational in the engine. This may take several minutes... [ INFO ] Still waiting for VDSM host to become operational... [ INFO ] The VDSM Host is now operational Please shutdown the VM allowing the system to launch it as a monitored service. 
         The system will wait until the VM is down.
[ ERROR ] Failed to execute stage 'Closing up': Error acquiring VM status
[ INFO ] Stage: Clean up
[ INFO ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20150310140028.conf'
[ INFO ] Stage: Pre-termination
[ INFO ] Stage: Termination

At that point, neither the ovirt-ha-broker nor the ovirt-ha-agent service was running.

Note there was no significant pause after it said "The system will wait until the VM is down".

After the script completed, I shut down the VM, and manually started the ha services, and the VM came up. I could log in to the Administration Portal, and finally see my HostedEngine VM. :-)

I seem to be in a bad state however: The Data Center has *no* storage domains attached. I'm not sure what else might need cleaning up. Any assistance appreciated.

-Bob
# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: p3p2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master ovirtmgmt state UP group default qlen 1000
    link/ether b8:ca:3a:79:22:12 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::baca:3aff:fe79:2212/64 scope link
       valid_lft forever preferred_lft forever
3: bond0: <NO-CARRIER,BROADCAST,MULTICAST,MASTER,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 56:56:f7:cf:73:27 brd ff:ff:ff:ff:ff:ff
4: wlp2s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 1c:3e:84:50:8d:c3 brd ff:ff:ff:ff:ff:ff
6: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default
    link/ether 22:a1:01:9e:30:71 brd ff:ff:ff:ff:ff:ff
7: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether b8:ca:3a:79:22:12 brd ff:ff:ff:ff:ff:ff
    inet 172.16.0.58/16 brd 172.16.255.255 scope global ovirtmgmt
       valid_lft forever preferred_lft forever
    inet6 fe80::baca:3aff:fe79:2212/64 scope link
       valid_lft forever preferred_lft forever
The only unusual thing about my setup that I can think of, from the network perspective, is that my physical host has a wireless interface, which I've not configured. Could it be confusing hosted-engine --deploy?
-Bob
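For reference, the two problems that surface in this message (a missing default route on ovirtmgmt and missing DNS configuration) can be checked and temporarily worked around on the host with a sketch like the following; the gateway 172.16.0.1 comes from the ip route output above, and the nameserver value is only a placeholder for whatever DNS server the site really uses:

ip route show                                        # should contain a "default via ..." line
cat /etc/resolv.conf                                 # should contain at least one nameserver line
ip route add default via 172.16.0.1 dev ovirtmgmt    # temporary: lost on the next network restart
echo "nameserver 172.16.0.1" >> /etc/resolv.conf     # placeholder nameserver; substitute the real one

Making both persistent requires GATEWAY=, DNS1= and DOMAIN= entries in /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt, as discussed later in the thread.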

----- Original Message -----
From: "Bob Doolittle" <bob@doolittle.us.com> To: "Simone Tiraboschi" <stirabos@redhat.com> Cc: "users-ovirt" <users@ovirt.org> Sent: Tuesday, March 10, 2015 7:29:44 PM Subject: Re: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on F20 (The VDSM host was found in a failed state)
On 03/10/2015 10:20 AM, Simone Tiraboschi wrote:
----- Original Message -----
From: "Bob Doolittle" <bob@doolittle.us.com> To: "Simone Tiraboschi" <stirabos@redhat.com> Cc: "users-ovirt" <users@ovirt.org> Sent: Tuesday, March 10, 2015 2:40:13 PM Subject: Re: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on F20 (The VDSM host was found in a failed state)
----- Original Message -----
From: "Bob Doolittle" <bob@doolittle.us.com> To: "Simone Tiraboschi" <stirabos@redhat.com> Cc: "users-ovirt" <users@ovirt.org> Sent: Monday, March 9, 2015 11:48:03 PM Subject: Re: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on F20 (The VDSM host was found in a failed state)
On 03/09/2015 02:47 PM, Bob Doolittle wrote:
Resending with CC to list (and an update).
On 03/09/2015 01:40 PM, Simone Tiraboschi wrote: > ----- Original Message ----- >> From: "Bob Doolittle" <bob@doolittle.us.com> >> To: "Simone Tiraboschi" <stirabos@redhat.com> >> Cc: "users-ovirt" <users@ovirt.org> >> Sent: Monday, March 9, 2015 6:26:30 PM >> Subject: Re: [ovirt-users] Error during hosted-engine-setup for 3.5.1 >> on >> F20 (Cannot add the host to cluster ... SSH >> has failed) >> ... >> OK, I've started over. Simply removing the storage domain was >> insufficient, >> the hosted-engine deploy failed when it found the HA and Broker >> services >> already configured. I decided to just start over fresh starting with >> re-installing the OS on my host. >> >> I can't deploy DNS at the moment, so I have to simply replicate >> /etc/hosts >> files on my host/engine. I did that this time, but have run into a >> new >> problem: >> >> [ INFO ] Engine replied: DB Up!Welcome to Health Status! >> Enter the name of the cluster to which you want to add the >> host >> (Default) [Default]: >> [ INFO ] Waiting for the host to become operational in the engine. >> This >> may >> take several minutes... >> [ ERROR ] The VDSM host was found in a failed state. Please check >> engine >> and >> bootstrap installation logs. >> [ ERROR ] Unable to add ovirt-vm to the manager >> Please shutdown the VM allowing the system to launch it as >> a >> monitored service. >> The system will wait until the VM is down. >> [ ERROR ] Failed to execute stage 'Closing up': [Errno 111] >> Connection >> refused >> [ INFO ] Stage: Clean up >> [ ERROR ] Failed to execute stage 'Clean up': [Errno 111] Connection >> refused >> >> >> I've attached my engine log and the ovirt-hosted-engine-setup log. I >> think I >> had an issue with resolving external hostnames, or else a >> connectivity >> issue >> during the install. > For some reason your engine wasn't able to deploy your hosts but the > SSH > session this time was established. > 2015-03-09 13:05:58,514 ERROR > [org.ovirt.engine.core.bll.InstallVdsInternalCommand] > (org.ovirt.thread.pool-8-thread-3) [3cf91626] Host installation failed > for host 217016bb-fdcd-4344-a0ca-4548262d10a8, ovirt-vm.: > java.io.IOException: Command returned failure code 1 during SSH > session > 'root@xion2.smartcity.net' > > Can you please attach host-deploy logs from the engine VM? OK, attached.
Like I said, it looks to me like a name-resolution issue during the yum update on the engine. I think I've fixed that, but do you have a better suggestion for cleaning up and re-deploying other than installing the OS on my host and starting all over again?

I just finished starting over from scratch, starting with OS installation on my host/node, and wound up with a very similar problem - the engine couldn't reach the hosts during the yum operation. But this time the error was "Network is unreachable". Which is weird, because I can ssh into the engine and ping many of those hosts after the operation has failed.

Here's my latest host-deploy log from the engine. I'd appreciate any clues.

On 03/10/2015 04:58 AM, Simone Tiraboschi wrote:
It seems that now your host is able to resolve those addresses but it's not able to connect over HTTP. Some of those hosts resolve as IPv6 addresses; can you please try to use curl to fetch one of the files that it wasn't able to download? Can you please check your network configuration before and after host-deploy?

I can give you the network configuration after host-deploy, at least for the host/Node. The engine won't start for me this morning, after I shut down the host for the night.
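A sketch of the curl test being suggested here, using one of the mirror URLs from the failed yum transcript purely as an example target; the -4/-6 switches force IPv4 or IPv6 so the two paths can be compared:

curl -4 -v -o /dev/null http://resources.ovirt.org/pub/ovirt-3.5/rpm/fc20/noarch/   # IPv4 only
curl -6 -v -o /dev/null http://resources.ovirt.org/pub/ovirt-3.5/rpm/fc20/noarch/   # IPv6 only

If the IPv4 attempt succeeds while the IPv6 one reports "Network is unreachable", the failures are coming from mirrors that resolve to IPv6 addresses on a network without IPv6 connectivity.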
In order to give you the config before host-deploy (or, apparently for the engine), I'll have to re-install the OS on the host and start again from scratch. Obviously I'd rather not do that unless absolutely necessary.
Here's the host config after the failed host-deploy:
Host/Node:
# ip route
169.254.0.0/16 dev ovirtmgmt scope link metric 1007
172.16.0.0/16 dev ovirtmgmt proto kernel scope link src 172.16.0.58

You are missing a default gateway, and that is the issue. Are you sure that it was properly configured before trying to deploy that host?
It should have been, it was a fresh OS install. So I'm starting again, and keeping careful records of my network config.
Here is my initial network config of my host/node, immediately following a new OS install:
% ip route default via 172.16.0.1 dev p3p1 proto static metric 1024 172.16.0.0/16 dev p3p1 proto kernel scope link src 172.16.0.58
% ip addr 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever 2: p3p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000 link/ether b8:ca:3a:79:22:12 brd ff:ff:ff:ff:ff:ff inet 172.16.0.58/16 brd 172.16.255.255 scope global p3p1 valid_lft forever preferred_lft forever inet6 fe80::baca:3aff:fe79:2212/64 scope link valid_lft forever preferred_lft forever 3: wlp2s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000 link/ether 1c:3e:84:50:8d:c3 brd ff:ff:ff:ff:ff:ff
After the VM is first created, the host/node config is:
# ip route default via 172.16.0.1 dev ovirtmgmt 169.254.0.0/16 dev ovirtmgmt scope link metric 1006 172.16.0.0/16 dev ovirtmgmt proto kernel scope link src 172.16.0.58
# ip addr 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever 2: p3p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master ovirtmgmt state UP group default qlen 1000 link/ether b8:ca:3a:79:22:12 brd ff:ff:ff:ff:ff:ff inet6 fe80::baca:3aff:fe79:2212/64 scope link valid_lft forever preferred_lft forever 3: wlp2s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000 link/ether 1c:3e:84:50:8d:c3 brd ff:ff:ff:ff:ff:ff 4: bond0: <NO-CARRIER,BROADCAST,MULTICAST,MASTER,UP> mtu 1500 qdisc noqueue state DOWN group default link/ether 92:cb:9d:97:18:36 brd ff:ff:ff:ff:ff:ff 5: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default link/ether 9a:bc:29:52:82:38 brd ff:ff:ff:ff:ff:ff 6: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default link/ether b8:ca:3a:79:22:12 brd ff:ff:ff:ff:ff:ff inet 172.16.0.58/16 brd 172.16.255.255 scope global ovirtmgmt valid_lft forever preferred_lft forever inet6 fe80::baca:3aff:fe79:2212/64 scope link valid_lft forever preferred_lft forever 7: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master ovirtmgmt state UNKNOWN group default qlen 500 link/ether fe:16:3e:16:a4:37 brd ff:ff:ff:ff:ff:ff inet6 fe80::fc16:3eff:fe16:a437/64 scope link valid_lft forever preferred_lft forever
At this point, I was already seeing a problem on the host/node. I remembered that a newer version of sos package is delivered from the ovirt repositories. So I tried to do a "yum update" on my host, and got a similar problem:
% sudo yum update [sudo] password for rad: Loaded plugins: langpacks, refresh-packagekit Resolving Dependencies --> Running transaction check ---> Package sos.noarch 0:3.1-1.fc20 will be updated ---> Package sos.noarch 0:3.2-0.2.fc20.ovirt will be an update --> Finished Dependency Resolution
Dependencies Resolved
================================================================================================================ Package Arch Version Repository Size ================================================================================================================ Updating: sos noarch 3.2-0.2.fc20.ovirt ovirt-3.5 292 k
Transaction Summary ================================================================================================================ Upgrade 1 Package
Total download size: 292 k Is this ok [y/d/N]: y Downloading packages: No Presto metadata available for ovirt-3.5 sos-3.2-0.2.fc20.ovirt.noarch. FAILED http://www.gtlib.gatech.edu/pub/oVirt/pub/ovirt-3.5/rpm/fc20/noarch/sos-3.2-...: [Errno 14] curl#6 - "Could not resolve host: www.gtlib.gatech.edu" Trying other mirror. sos-3.2-0.2.fc20.ovirt.noarch. FAILED ftp://ftp.gtlib.gatech.edu/pub/oVirt/pub/ovirt-3.5/rpm/fc20/noarch/sos-3.2-0.2.fc20.ovirt.noarch.rpm: [Errno 14] curl#6 - "Could not resolve host: ftp.gtlib.gatech.edu" Trying other mirror. sos-3.2-0.2.fc20.ovirt.noarch. FAILED http://resources.ovirt.org/pub/ovirt-3.5/rpm/fc20/noarch/sos-3.2-0.2.fc20.ov...: [Errno 14] curl#6 - "Could not resolve host: resources.ovirt.org" Trying other mirror. sos-3.2-0.2.fc20.ovirt.noarch. FAILED http://ftp.snt.utwente.nl/pub/software/ovirt/ovirt-3.5/rpm/fc20/noarch/sos-3...: [Errno 14] curl#6 - "Could not resolve host: ftp.snt.utwente.nl" Trying other mirror. sos-3.2-0.2.fc20.ovirt.noarch. FAILED http://ftp.nluug.nl/os/Linux/virtual/ovirt/ovirt-3.5/rpm/fc20/noarch/sos-3.2...: [Errno 14] curl#6 - "Could not resolve host: ftp.nluug.nl" Trying other mirror. sos-3.2-0.2.fc20.ovirt.noarch. FAILED http://mirror.linux.duke.edu/ovirt/pub/ovirt-3.5/rpm/fc20/noarch/sos-3.2-0.2...: [Errno 14] curl#6 - "Could not resolve host: mirror.linux.duke.edu" Trying other mirror.
Error downloading packages: sos-3.2-0.2.fc20.ovirt.noarch: [Errno 256] No more mirrors to try.
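One quick way to tell a name-resolution failure apart from a repository outage, using hostnames from the mirror list above (getent goes through the same resolver path that yum and curl use):

getent hosts resources.ovirt.org    # prints nothing if name resolution is broken
cat /etc/resolv.conf                # should list at least one nameserver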
This was similar to my previous failures. I took a look, and the problem was that /etc/resolv.conf had no nameservers, and the /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt file contained no entries for DNS1 or DOMAIN.
So, it appears that when hosted-engine set up my bridged network, it neglected to carry over the DNS configuration necessary to the bridge.
Unfortunately you have found a known bug: VDSM doesn't report static DNS entries (DNS1 from /etc/sysconfig/network-scripts/ifcfg-ethX), so we lose them simply by deploying the host:
https://bugzilla.redhat.com/show_bug.cgi?id=1160667
https://bugzilla.redhat.com/show_bug.cgi?id=1160423
We are going to fix it for 3.6; thanks for reporting.
Note that I am using *static* network configuration, rather than DHCP. During installation of the OS I am setting up the network configuration as Manual. Perhaps the hosted-engine script is not properly prepared to deal with that?
I went ahead and modified the ifcfg-ovirtmgmt network script (for the next service restart/boot) and resolv.conf (I was afraid to restart the network in the middle of hosted-engine execution since I don't know what might already be connected to the engine). This time it got further, but ultimately it still failed at the very end:
Manually fixing /etc/resolv.conf is a valid workaround.
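A minimal sketch of that workaround; the nameserver and search domain below are placeholders and need to be replaced with the site's real values:

# /etc/resolv.conf (takes effect immediately)
search example.lan
nameserver 172.16.0.1

# appended to /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt (applies on the next network restart/boot)
DNS1=172.16.0.1
DOMAIN=example.lan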
[ INFO ] Waiting for the host to become operational in the engine. This may take several minutes... [ INFO ] Still waiting for VDSM host to become operational... [ INFO ] The VDSM Host is now operational Please shutdown the VM allowing the system to launch it as a monitored service. The system will wait until the VM is down. [ ERROR ] Failed to execute stage 'Closing up': Error acquiring VM status [ INFO ] Stage: Clean up [ INFO ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20150310140028.conf' [ INFO ] Stage: Pre-termination [ INFO ] Stage: Termination
At that point, neither the ovirt-ha-broker or ovirt-ha-agent services were running.
Note there was no significant pause after it said "The system will wait until the VM is down".
After the script completed, I shut down the VM, and manually started the ha services, and the VM came up. I could login to the Administration Portal, and finally see my HostedEngine VM. :-)
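For anyone who ends up in the same state, the manual recovery described here amounts to roughly the following, run on the host after shutting the engine VM down from inside the guest; hosted-engine --vm-status comes with the hosted-engine tooling and should show whether the agent has brought the VM back up:

systemctl start ovirt-ha-broker ovirt-ha-agent
systemctl status ovirt-ha-broker ovirt-ha-agent
hosted-engine --vm-status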
I seem to be in a bad state however: The Data Center has *no* storage domains attached. I'm not sure what else might need cleaning up. Any assistance appreciated.
No, that's expected: the hosted-engine storage domain is a special one and is currently not reported by the engine because you cannot use it for other VMs. Simply add another storage domain and you are done.
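If the additional data domain is NFS-backed, a common gotcha is export ownership: oVirt expects the export to be readable and writable by vdsm:kvm (uid/gid 36). A rough sketch, with the export path and client network purely as examples:

mkdir -p /exports/data
chown 36:36 /exports/data
echo "/exports/data 172.16.0.0/16(rw)" >> /etc/exports
exportfs -ra

The export can then be attached from the Administration Portal as a new Data / NFS storage domain.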
-Bob
# ip addr 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever 2: p3p2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master ovirtmgmt state UP group default qlen 1000 link/ether b8:ca:3a:79:22:12 brd ff:ff:ff:ff:ff:ff inet6 fe80::baca:3aff:fe79:2212/64 scope link valid_lft forever preferred_lft forever 3: bond0: <NO-CARRIER,BROADCAST,MULTICAST,MASTER,UP> mtu 1500 qdisc noqueue state DOWN group default link/ether 56:56:f7:cf:73:27 brd ff:ff:ff:ff:ff:ff 4: wlp2s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000 link/ether 1c:3e:84:50:8d:c3 brd ff:ff:ff:ff:ff:ff 6: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default link/ether 22:a1:01:9e:30:71 brd ff:ff:ff:ff:ff:ff 7: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default link/ether b8:ca:3a:79:22:12 brd ff:ff:ff:ff:ff:ff inet 172.16.0.58/16 brd 172.16.255.255 scope global ovirtmgmt valid_lft forever preferred_lft forever inet6 fe80::baca:3aff:fe79:2212/64 scope link valid_lft forever preferred_lft forever
The only unusual thing about my setup that I can think of, from the network perspective, is that my physical host has a wireless interface, which I've not configured. Could it be confusing hosted-engine --deploy?
-Bob

For the record, once I added a new storage domain the Data Center came up.

So in the end, this seems to have been due to known bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1160667
https://bugzilla.redhat.com/show_bug.cgi?id=1160423

Effectively, for hosts with static/manual IP addressing (i.e. not DHCP), the DNS and default route information are not set up correctly by hosted-engine-setup. I'm not sure why that's not considered a higher priority bug (e.g. a blocker for 3.5.2?), since I believe the most typical configuration for servers is static IP addressing.

All seems to be working now. Many thanks to Simone for the invaluable assistance.

-Bob

On Mar 10, 2015 2:29 PM, "Bob Doolittle" <bob@doolittle.us.com> wrote:
On 03/10/2015 10:20 AM, Simone Tiraboschi wrote:
----- Original Message -----
From: "Bob Doolittle" <bob@doolittle.us.com <mailto:bob@doolittle.us.com>> To: "Simone Tiraboschi" <stirabos@redhat.com <mailto:stirabos@redhat.com>> Cc: "users-ovirt" <users@ovirt.org <mailto:users@ovirt.org>> Sent: Tuesday, March 10, 2015 2:40:13 PM Subject: Re: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on F20 (The VDSM host was found in a failed state)
On 03/10/2015 04:58 AM, Simone Tiraboschi wrote:
----- Original Message -----
From: "Bob Doolittle" <bob@doolittle.us.com <mailto:bob@doolittle.us.com>> To: "Simone Tiraboschi" <stirabos@redhat.com <mailto:stirabos@redhat.com>> Cc: "users-ovirt" <users@ovirt.org <mailto:users@ovirt.org>> Sent: Monday, March 9, 2015 11:48:03 PM Subject: Re: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on F20 (The VDSM host was found in a failed state)
On 03/09/2015 02:47 PM, Bob Doolittle wrote:
Resending with CC to list (and an update).
On 03/09/2015 01:40 PM, Simone Tiraboschi wrote: > > ----- Original Message ----- >> >> From: "Bob Doolittle" <bob@doolittle.us.com <mailto:bob@doolittle.us.com>> >> To: "Simone Tiraboschi" <stirabos@redhat.com <mailto:stirabos@redhat.com>> >> Cc: "users-ovirt" <users@ovirt.org <mailto:users@ovirt.org>> >> Sent: Monday, March 9, 2015 6:26:30 PM >> Subject: Re: [ovirt-users] Error during hosted-engine-setup for 3.5.1 >> on >> F20 (Cannot add the host to cluster ... SSH >> has failed) >>
...
>> >> OK, I've started over. Simply removing the storage domain was >> insufficient, >> the hosted-engine deploy failed when it found the HA and Broker >> services >> already configured. I decided to just start over fresh starting with >> re-installing the OS on my host. >> >> I can't deploy DNS at the moment, so I have to simply replicate >> /etc/hosts >> files on my host/engine. I did that this time, but have run into a new >> problem: >> >> [ INFO ] Engine replied: DB Up!Welcome to Health Status! >> Enter the name of the cluster to which you want to add the >> host >> (Default) [Default]: >> [ INFO ] Waiting for the host to become operational in the engine. >> This >> may >> take several minutes... >> [ ERROR ] The VDSM host was found in a failed state. Please check >> engine >> and >> bootstrap installation logs. >> [ ERROR ] Unable to add ovirt-vm to the manager >> Please shutdown the VM allowing the system to launch it as a >> monitored service. >> The system will wait until the VM is down. >> [ ERROR ] Failed to execute stage 'Closing up': [Errno 111] Connection >> refused >> [ INFO ] Stage: Clean up >> [ ERROR ] Failed to execute stage 'Clean up': [Errno 111] Connection >> refused >> >> >> I've attached my engine log and the ovirt-hosted-engine-setup log. I >> think I >> had an issue with resolving external hostnames, or else a connectivity >> issue >> during the install. > > For some reason your engine wasn't able to deploy your hosts but the SSH > session this time was established. > 2015-03-09 13:05:58,514 ERROR > [org.ovirt.engine.core.bll.InstallVdsInternalCommand] > (org.ovirt.thread.pool-8-thread-3) [3cf91626] Host installation failed > for host 217016bb-fdcd-4344-a0ca-4548262d10a8, ovirt-vm.: > java.io.IOException: Command returned failure code 1 during SSH session > 'root@xion2.smartcity.net <mailto:root@xion2.smartcity.net>' > > Can you please attach host-deploy logs from the engine VM?
OK, attached.
Like I said, it looks to me like a name-resolution issue during the yum update on the engine. I think I've fixed that, but do you have a better suggestion for cleaning up and re-deploying other than installing the OS on my host and starting all over again?
I just finished starting over from scratch, starting with OS installation on my host/node, and wound up with a very similar problem - the engine couldn't reach the hosts during the yum operation. But this time the error was "Network is unreachable". Which is weird, because I can ssh into the engine and ping many of those hosts, after the operation has failed.
Here's my latest host-deploy log from the engine. I'd appreciate any clues.
It seems that now your host is able to resolve those addresses but it's not able to connect over HTTP. Some of those hosts resolve as IPv6 addresses; can you please try to use curl to fetch one of the files that it wasn't able to download? Can you please check your network configuration before and after host-deploy?
I can give you the network configuration after host-deploy, at least for the host/Node. The engine won't start for me this morning, after I shut down the host for the night.
In order to give you the config before host-deploy (or, apparently for the engine), I'll have to re-install the OS on the host and start again from scratch. Obviously I'd rather not do that unless absolutely necessary.
Here's the host config after the failed host-deploy:
Host/Node:
# ip route
169.254.0.0/16 dev ovirtmgmt scope link metric 1007
172.16.0.0/16 dev ovirtmgmt proto kernel scope link src 172.16.0.58
You are missing a default gateway, and that is the issue. Are you sure that it was properly configured before trying to deploy that host?
It should have been, it was a fresh OS install. So I'm starting again, and keeping careful records of my network config.
Here is my initial network config of my host/node, immediately following a new OS install:
% ip route
default via 172.16.0.1 dev p3p1 proto static metric 1024
172.16.0.0/16 dev p3p1 proto kernel scope link src 172.16.0.58
% ip addr 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 <http://127.0.0.1/8> scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever 2: p3p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000 link/ether b8:ca:3a:79:22:12 brd ff:ff:ff:ff:ff:ff inet 172.16.0.58/16 <http://172.16.0.58/16> brd 172.16.255.255 scope global p3p1 valid_lft forever preferred_lft forever inet6 fe80::baca:3aff:fe79:2212/64 scope link valid_lft forever preferred_lft forever 3: wlp2s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000 link/ether 1c:3e:84:50:8d:c3 brd ff:ff:ff:ff:ff:ff
After the VM is first created, the host/node config is:
# ip route
default via 172.16.0.1 dev ovirtmgmt
169.254.0.0/16 dev ovirtmgmt scope link metric 1006
172.16.0.0/16 dev ovirtmgmt proto kernel scope link src 172.16.0.58
# ip addr 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 <http://127.0.0.1/8> scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever 2: p3p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master ovirtmgmt state UP group default qlen 1000 link/ether b8:ca:3a:79:22:12 brd ff:ff:ff:ff:ff:ff inet6 fe80::baca:3aff:fe79:2212/64 scope link valid_lft forever preferred_lft forever 3: wlp2s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000 link/ether 1c:3e:84:50:8d:c3 brd ff:ff:ff:ff:ff:ff 4: bond0: <NO-CARRIER,BROADCAST,MULTICAST,MASTER,UP> mtu 1500 qdisc noqueue state DOWN group default link/ether 92:cb:9d:97:18:36 brd ff:ff:ff:ff:ff:ff 5: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default link/ether 9a:bc:29:52:82:38 brd ff:ff:ff:ff:ff:ff 6: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default link/ether b8:ca:3a:79:22:12 brd ff:ff:ff:ff:ff:ff inet 172.16.0.58/16 <http://172.16.0.58/16> brd 172.16.255.255 scope global ovirtmgmt valid_lft forever preferred_lft forever inet6 fe80::baca:3aff:fe79:2212/64 scope link valid_lft forever preferred_lft forever 7: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master ovirtmgmt state UNKNOWN group default qlen 500 link/ether fe:16:3e:16:a4:37 brd ff:ff:ff:ff:ff:ff inet6 fe80::fc16:3eff:fe16:a437/64 scope link valid_lft forever preferred_lft forever
At this point, I was already seeing a problem on the host/node. I remembered that a newer version of sos package is delivered from the ovirt repositories. So I tried to do a "yum update" on my host, and got a similar problem:
% sudo yum update [sudo] password for rad: Loaded plugins: langpacks, refresh-packagekit Resolving Dependencies --> Running transaction check ---> Package sos.noarch 0:3.1-1.fc20 will be updated ---> Package sos.noarch 0:3.2-0.2.fc20.ovirt will be an update --> Finished Dependency Resolution
Dependencies Resolved
================================================================================================================ Package Arch Version Repository Size ================================================================================================================ Updating: sos noarch 3.2-0.2.fc20.ovirt ovirt-3.5 292 k
Transaction Summary ================================================================================================================ Upgrade 1 Package
Total download size: 292 k Is this ok [y/d/N]: y Downloading packages: No Presto metadata available for ovirt-3.5 sos-3.2-0.2.fc20.ovirt.noarch. FAILED http://www.gtlib.gatech.edu/pub/oVirt/pub/ovirt-3.5/rpm/fc20/noarch/sos-3.2-...: [Errno 14] curl#6 - "Could not resolve host: www.gtlib.gatech.edu <http://www.gtlib.gatech.edu>" Trying other mirror. sos-3.2-0.2.fc20.ovirt.noarch. FAILED ftp://ftp.gtlib.gatech.edu/pub/oVirt/pub/ovirt-3.5/rpm/fc20/noarch/sos-3.2-0.2.fc20.ovirt.noarch.rpm: [Errno 14] curl#6 - "Could not resolve host: ftp.gtlib.gatech.edu <http://ftp.gtlib.gatech.edu>" Trying other mirror. sos-3.2-0.2.fc20.ovirt.noarch. FAILED http://resources.ovirt.org/pub/ovirt-3.5/rpm/fc20/noarch/sos-3.2-0.2.fc20.ov...: [Errno 14] curl#6 - "Could not resolve host: resources.ovirt.org <http://resources.ovirt.org>" Trying other mirror. sos-3.2-0.2.fc20.ovirt.noarch. FAILED http://ftp.snt.utwente.nl/pub/software/ovirt/ovirt-3.5/rpm/fc20/noarch/sos-3...: [Errno 14] curl#6 - "Could not resolve host: ftp.snt.utwente.nl <http://ftp.snt.utwente.nl>" Trying other mirror. sos-3.2-0.2.fc20.ovirt.noarch. FAILED http://ftp.nluug.nl/os/Linux/virtual/ovirt/ovirt-3.5/rpm/fc20/noarch/sos-3.2...: [Errno 14] curl#6 - "Could not resolve host: ftp.nluug.nl <http://ftp.nluug.nl>" Trying other mirror. sos-3.2-0.2.fc20.ovirt.noarch. FAILED http://mirror.linux.duke.edu/ovirt/pub/ovirt-3.5/rpm/fc20/noarch/sos-3.2-0.2...: [Errno 14] curl#6 - "Could not resolve host: mirror.linux.duke.edu <http://mirror.linux.duke.edu>" Trying other mirror.
Error downloading packages: sos-3.2-0.2.fc20.ovirt.noarch: [Errno 256] No more mirrors to try.
This was similar to my previous failures. I took a look, and the problem was that /etc/resolv.conf had no nameservers, and the /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt file contained no entries for DNS1 or DOMAIN.
So, it appears that when hosted-engine set up my bridged network, it neglected to carry over the DNS configuration necessary to the bridge.
Note that I am using *static* network configuration, rather than DHCP. During installation of the OS I am setting up the network configuration as Manual. Perhaps the hosted-engine script is not properly prepared to deal with that?
I went ahead and modified the ifcfg-ovirtmgmt network script (for the next service restart/boot) and resolv.conf (I was afraid to restart the network in the middle of hosted-engine execution since I don't know what might already be connected to the engine). This time it got further, but ultimately it still failed at the very end:
[ INFO ] Waiting for the host to become operational in the engine. This may take several minutes... [ INFO ] Still waiting for VDSM host to become operational... [ INFO ] The VDSM Host is now operational Please shutdown the VM allowing the system to launch it as a monitored service. The system will wait until the VM is down. [ ERROR ] Failed to execute stage 'Closing up': Error acquiring VM status [ INFO ] Stage: Clean up [ INFO ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20150310140028.conf' [ INFO ] Stage: Pre-termination [ INFO ] Stage: Termination
At that point, neither the ovirt-ha-broker or ovirt-ha-agent services were running.
Note there was no significant pause after it said "The system will wait until the VM is down".
After the script completed, I shut down the VM, and manually started the ha services, and the VM came up. I could login to the Administration Portal, and finally see my HostedEngine VM. :-)
I seem to be in a bad state however: The Data Center has no storage domains attached. I'm not sure what else might need cleaning up. Any assistance appreciated.
-Bob
# ip addr 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 <http://127.0.0.1/8> scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever 2: p3p2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master ovirtmgmt state UP group default qlen 1000 link/ether b8:ca:3a:79:22:12 brd ff:ff:ff:ff:ff:ff inet6 fe80::baca:3aff:fe79:2212/64 scope link valid_lft forever preferred_lft forever 3: bond0: <NO-CARRIER,BROADCAST,MULTICAST,MASTER,UP> mtu 1500 qdisc noqueue state DOWN group default link/ether 56:56:f7:cf:73:27 brd ff:ff:ff:ff:ff:ff 4: wlp2s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000 link/ether 1c:3e:84:50:8d:c3 brd ff:ff:ff:ff:ff:ff 6: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default link/ether 22:a1:01:9e:30:71 brd ff:ff:ff:ff:ff:ff 7: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default link/ether b8:ca:3a:79:22:12 brd ff:ff:ff:ff:ff:ff inet 172.16.0.58/16 <http://172.16.0.58/16> brd 172.16.255.255 scope global ovirtmgmt valid_lft forever preferred_lft forever inet6 fe80::baca:3aff:fe79:2212/64 scope link valid_lft forever preferred_lft forever
The only unusual thing about my setup that I can think of, from the network perspective, is that my physical host has a wireless interface, which I've not configured. Could it be confusing hosted-engine --deploy?
-Bob
--------------010402020902050501080400 Content-Type: text/html; charset=utf-8 Content-Transfer-Encoding: quoted-printable <html> <head> <meta http-equiv=3D"content-type" content=3D"text/html; charset=3Dutf= -8"> </head> <body bgcolor=3D"#FFFFFF" text=3D"#000000"> <p dir=3D"ltr">For the record, once I added a new storage domain the Data center came up.</p> So in the end, this seems to have been due to known bugs:<br> <pre wrap=3D""><a class=3D"moz-txt-link-freetext" href=3D"https://bug= zilla.redhat.com/show_bug.cgi?id=3D1160667">https://bugzilla.redhat.com/s= how_bug.cgi?id=3D1160667</a> <a class=3D"moz-txt-link-freetext" href=3D"https://bugzilla.redhat.com/sh= ow_bug.cgi?id=3D1160423">https://bugzilla.redhat.com/show_bug.cgi?id=3D11= 60423</a></pre> <br> Effectively, for hosts with static/manual IP addressing (i.e. not DHCP), the DNS and default route information are not set up correctly by hosted-engine-setup. I'm not sure why that's not considered a higher priority bug (e.g. blocker for 3.5.2?) since I believe the most typical configuration for servers is static IP addressing.<br> <p dir=3D"ltr">All seems to be working now. Many thanks to Simone for the invaluable assistance.<br> </p> <p dir=3D"ltr">-Bob</p> <p dir=3D"ltr"> On Mar 10, 2015 2:29 PM, "Bob Doolittle" <<a href=3D"mailto:bob@doolittle.us.com">bob@doolittle.us.com</a>> wrote:<br> ><br> ><br> > On 03/10/2015 10:20 AM, Simone Tiraboschi wrote:<br> >><br> >><br> >> ----- Original Message -----<br> >>><br> >>> From: "Bob Doolittle" <<a href=3D"mailto:bob@doolittle.us.com">bob@doolittle.us.com</a>>= <br> >>> To: "Simone Tiraboschi" <<a href=3D"mailto:stirabos@redhat.com">stirabos@redhat.com</a>><b= r> >>> Cc: "users-ovirt" <<a href=3D"mailto:users@ovirt.org">users@ovirt.org</a>><br> >>> Sent: Tuesday, March 10, 2015 2:40:13 PM<br> >>> Subject: Re: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on F20 (The VDSM host was found in a failed<br> >>> state)<br> >>><br> >>><br> >>> On 03/10/2015 04:58 AM, Simone Tiraboschi wrote:<br> >>>><br> >>>> ----- Original Message -----<br> >>>>><br> >>>>> From: "Bob Doolittle" <<a href=3D"mailto:bob@doolittle.us.com">bob@doolittle.us.com</a>>= <br> >>>>> To: "Simone Tiraboschi" <<a href=3D"mailto:stirabos@redhat.com">stirabos@redhat.com</a>><b= r> >>>>> Cc: "users-ovirt" <<a href=3D"mailto:users@ovirt.org">users@ovirt.org</a>><br> >>>>> Sent: Monday, March 9, 2015 11:48:03 PM<br> >>>>> Subject: Re: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on<br> >>>>> F20 (The VDSM host was found in a failed<br> >>>>> state)<br> >>>>><br> >>>>><br> >>>>> On 03/09/2015 02:47 PM, Bob Doolittle wrote:<b= r> >>>>>><br> >>>>>> Resending with CC to list (and an update).<br> >>>>>><br> >>>>>> On 03/09/2015 01:40 PM, Simone Tiraboschi wrote:<br> >>>>>>><br> >>>>>>> ----- Original Message -----<br> >>>>>>>><br> >>>>>>>> From: "Bob Doolittle" <<a href=3D"mailto:bob@doolittle.us.com">bob@doolittle.us.com</a>>= <br> >>>>>>>> To: "Simone Tiraboschi" <<a href=3D"mailto:stirabos@redhat.com">stirabos@redhat.com</a>><b= r> >>>>>>>> Cc: "users-ovirt" <<a href=3D"mailto:users@ovirt.org">users@ovirt.org</a>><br> >>>>>>>> Sent: Monday, March 9, 2015 6:26:30 PM<br> >>>>>>>> Subject: Re: [ovirt-users] Error during hosted-engine-setup for 3.5.1<br> >>>>>>>> on<br> >>>>>>>> F20 (Cannot add the host to cluster ... SSH<br> >>>>>>>> has failed)<br> >>>>>>>><br> >>> ...<br> >>>>>>>><br> >>>>>>>> OK, I've started over. 
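Until those bugs are fixed, the manual workaround amounts to putting back the routing and DNS details that the bridge setup drops. A minimal sketch for a static setup, assuming the management bridge is named ovirtmgmt and using placeholder gateway/DNS/domain values that you would replace with your own (run as root):

# Persist the settings the bridge configuration is missing.
cat >> /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt <<'EOF'
GATEWAY=172.16.0.1
DNS1=172.16.0.10
DOMAIN=example.com
EOF

# Bring the running system in line without restarting the network mid-setup.
ip route add default via 172.16.0.1 dev ovirtmgmt
echo "nameserver 172.16.0.10" >> /etc/resolv.conf
echo "search example.com" >> /etc/resolv.conf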

On 11/03/15 16:37, Bob Doolittle wrote:
For the record, once I added a new storage domain the Data center came up.
So in the end, this seems to have been due to known bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1160667
https://bugzilla.redhat.com/show_bug.cgi?id=1160423
Effectively, for hosts with static/manual IP addressing (i.e. not DHCP), the DNS and default route information are not set up correctly by hosted-engine-setup. I'm not sure why that's not considered a higher priority bug (e.g. blocker for 3.5.2?) since I believe the most typical configuration for servers is static IP addressing.
+1

-- 
Mit freundlichen Grüßen / Regards

Sven Kieske
Systemadministrator

Mittwald CM Service GmbH & Co. KG
Königsberger Straße 6
32339 Espelkamp

T: +49-5772-293-100
F: +49-5772-293-333
https://www.mittwald.de

Geschäftsführer: Robert Meyer
St.Nr.: 331/5721/1033, USt-IdNr.: DE814773217, HRA 6640, AG Bad Oeynhausen
Komplementärin: Robert Meyer Verwaltungs GmbH, HRB 13260, AG Bad Oeynhausen

I've been having this problem for a while, but it's now impossible to get around, as all my browsers have updated to prevent bypassing HSTS. I must be doing something wrong, as I don't see anyone else complaining about this in regard to oVirt... I'm running an oVirt 3.5.4 All-In-One setup. No matter what browser I use, I can no longer get to the GUI: it appears 3.5(?) started enforcing HTTP Strict Transport Security, and since oVirt uses self-signed certs, the browsers refuse to let me through. Newer browsers no longer allow you to click through an untrusted-certificate warning if HSTS is enabled. What do I need to do to get around this? What am I missing, since no one else seems to be having this problem? Thanks...
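One way around this, sketched here as a suggestion rather than taken from the oVirt documentation, is to stop relying on the click-through exception and instead trust the CA that signed the engine certificate. The engine publishes its CA cert over HTTP; the pki-resource URL below is the one recent engines use, but verify it against your own installation, and replace engine.example.com with your engine's FQDN:

# On the workstation you browse from:
curl -o ovirt-engine-ca.pem \
  'http://engine.example.com/ovirt-engine/services/pki-resource?resource=ca-certificate&format=X509-PEM-CA'

# Fedora/RHEL system trust store; browsers that honour it will then accept
# the engine certificate, so HSTS no longer gets in the way. Firefox keeps
# its own store and needs the same file imported via its certificate manager.
sudo cp ovirt-engine-ca.pem /etc/pki/ca-trust/source/anchors/
sudo update-ca-trust extract

This only helps if you browse to the engine by the FQDN the certificate was issued for, not by IP address.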

On 09/03/15 17:53, Simone Tiraboschi wrote:
it gathers the engine SSH public key from http://{enginefqdn}/engine.ssh.key.txt and stores it under ~root/.ssh/authorized_keys, so that the engine can add the host without knowing the host's root password.
Sorry that I'm getting off topic, but: are you sure this is done via _http_ (without the "s")? This should be done via https, imho. Should I open a BZ for this?

-- 
Sven Kieske
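The mechanism quoted above can also be exercised by hand, which is a useful check when the automatic host add fails. A minimal sketch, assuming the engine answers on its FQDN; engine.example.com and host.example.com are placeholders:

# On the host, as root: fetch the engine's public key and authorize it.
mkdir -p -m 700 /root/.ssh
curl -sf http://engine.example.com/engine.ssh.key.txt >> /root/.ssh/authorized_keys
chmod 600 /root/.ssh/authorized_keys

# Afterwards, logging in from the engine VM should not ask for a password:
#   ssh root@host.example.com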

----- Original Message -----
From: "Sven Kieske" <s.kieske@mittwald.de> To: users@ovirt.org Sent: Tuesday, March 10, 2015 10:39:36 AM Subject: Re: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on F20 (Cannot add the host to cluster ... SSH has failed)
On 09/03/15 17:53, Simone Tiraboschi wrote:
it gathers the engine SSH public key from http://{enginefqdn}/engine.ssh.key.txt and stores it under ~root/.ssh/authorized_keys, so that the engine can add the host without knowing the host's root password.
Sorry that I'm getting off topic, but:
are you sure this is done via _http_ (without "s")? this should be done via https imho.
Yes, I am.
should I open a BZ for this?
In my opinion, no: you just installed the engine, and the engine has only just created its CA. In order to trust an https connection to the engine you have to trust its CA, but you don't know it yet because it's a private CA that was just created on the engine from scratch. Blindly downloading the engine CA cert and blindly trusting it is not that different from simply using http to download the public key: in order to fetch it you don't need to send any password or token, and being a public key it doesn't need to be encrypted by definition, so you don't need encryption.
-- Mit freundlichen Grüßen / Regards
Sven Kieske

On 10/03/15 10:53, Simone Tiraboschi wrote:
In order to trust an https connection to the engine you have to trust its CA, but you don't know it yet because it's a private CA that was just created on the engine from scratch.
Can't the setup display the necessary parameters to make sure I trust the right CA when I accept it in my browser? It could even create a consumable file, which I can copy to my workstation and import there.
Blindly downloading the engine CA cert and blindly trusting it is not that different from simply using http to download the public key:
This is correct, but who would do this? Of course you need to check that it is the right CA!
in order to fetch it you don't need to send any password or token, and being a public key it doesn't need to be encrypted by definition, so you don't need encryption.
This is not about keeping the public key secret, but about keeping the channel over which it is transferred secure, so no one can tamper with the key and hand you a different public key pointing at a different machine (DNS spoofing, ARP spoofing, etc.). If you don't check the public key and ensure you connect to the correct machine, there is no need for public keys anyway and you could just skip this step.

Imho this is a security bug; other people would just consider fixing it a hardening measure. Trusting the local network is a security mindset from the '90s. Most LANs have too many hosts, which you might not even know about. You could also be on some shared foreign network where third-party machines belonging to different users can tamper with the traffic. I have seen reports from users who installed oVirt on leased hardware in offsite data centers, where you can't fully trust all local clients.

This should be more secure by default, imho.

-- 
Sven Kieske

----- Original Message -----
From: "Sven Kieske" <s.kieske@mittwald.de> To: "Simone Tiraboschi" <stirabos@redhat.com> Cc: users@ovirt.org Sent: Tuesday, March 10, 2015 11:12:38 AM Subject: Re: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on F20 (Cannot add the host to cluster ... SSH has failed)
On 10/03/15 10:53, Simone Tiraboschi wrote:
In order to trust an https connection to the engine you have to trust its CA, but you don't know it yet because it's a private CA that was just created on the engine from scratch.
Can't the setup display the necessary parameters to make sure I trust the right CA when I accept it in my browser? It could even create a consumable file, which I can copy to my workstation and import there.
This is another thing: having the user explicitly trust the CA cert by manually checking its fingerprint on the host is the right solution, but it is more invasive, and a lot of users are already complaining that hosted-engine involves too many steps.
Blindly downloading the engine CA cert and blindly trusting it is not that different from simply using http to download the public key:
This is correct, but who would do this? Of course you need to check that it is the right CA!
in order to fetch it you don't need to send any password or token, and being a public key it doesn't need to be encrypted by definition, so you don't need encryption.
This is not about keeping the public key secret, but about keeping the channel over which it is transferred secure, so no one can tamper with the key and hand you a different public key pointing at a different machine (DNS spoofing, ARP spoofing, etc.).
If you don't check the public key and ensure you connect to the correct machine, there is no need for public keys anyway and you could just skip this step.
Imho this is a security bug; other people would just consider fixing it a hardening measure. Trusting the local network is a security mindset from the '90s.
No, I didn't say that I trust the network to be secure because it's a local network. I said another thing; please read it carefully and follow me:

1. In order to trust an https connection you need to trust the CA that signed the cert the engine host is using.
2. That CA is by default a private CA that has just been created on the engine VM, so you don't have the CA cert on the host.
3. So, to trust the https connection, you first need to download the CA cert from the engine VM to the host.
4. If you just download the engine CA via http (https is not more secure at this point, because you are still trusting everything, since you don't have the CA cert yet), you have just moved the issue instead of solving it.

So the issue is that the CA cert should reach the host in a secure way; otherwise you are in the same situation: somebody could provide a tampered CA cert and make you trust a tampered https connection. It's just false security: it simply adds complexity without adding real security. It would be different if we asked the user to copy and paste the engine CA cert himself, or at least to validate its fingerprint; without that step it's really the same.
Most LANs have too many hosts, which you might not even know about.
You could also be on some shared foreign network where third-party machines belonging to different users can tamper with the traffic. I have seen reports from users who installed oVirt on leased hardware in offsite data centers, where you can't fully trust all local clients.
This should be more secure by default, imho.
-- Mit freundlichen Grüßen / Regards
Sven Kieske
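For what it's worth, the fingerprint validation mentioned above can already be done by hand. A minimal sketch, assuming the engine keeps its CA at /etc/pki/ovirt-engine/ca.pem (the usual location, but check your install) and with engine.example.com as a placeholder:

# On the engine VM: print the fingerprint of the authoritative CA copy.
openssl x509 -in /etc/pki/ovirt-engine/ca.pem -noout -fingerprint -sha256

# On the host: download the CA cert and print the fingerprint of what arrived.
curl -o /tmp/engine-ca.pem \
  'http://engine.example.com/ovirt-engine/services/pki-resource?resource=ca-certificate&format=X509-PEM-CA'
openssl x509 -in /tmp/engine-ca.pem -noout -fingerprint -sha256

# Only trust the downloaded copy if the two fingerprints match exactly.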
Participants (4):
- Blaster
- Bob Doolittle
- Simone Tiraboschi
- Sven Kieske