----- Original Message -----
From: "Itamar Heim" <iheim(a)redhat.com>
To: "Doron Fediuck" <dfediuck(a)redhat.com>, "Thomas Keppler (PEBA)" <thomas.keppler(a)kit.edu>
Cc: users(a)ovirt.org
Sent: Wednesday, September 10, 2014 1:41:22 PM
Subject: Re: [ovirt-users] issues deploying hosted-engine on mutiple hosts
On 09/10/2014 12:57 PM, Doron Fediuck wrote:
>
>
> ----- Original Message -----
>> From: "Thomas Keppler (PEBA)" <thomas.keppler(a)kit.edu>
>> To: users(a)ovirt.org
>> Sent: Tuesday, September 9, 2014 11:00:23 AM
>> Subject: [ovirt-users] issues deploying hosted-engine on mutiple hosts
>>
>> Dear oVirt-Team,
>>
>> we (as in: our company) currently have a *little* problem regarding your
>> hosted-engine solution.
>> First, I want to tell you the steps we took until the errors occurred:
>>
>> 1.) All four hosts have been prepared with CentOS 7, the EPEL repositories
>> and a glusterfs-volume.
>> 2.) The oVirt 3.5 nightly snapshot was added to each host's yum mirror
>> list and a yum upgrade was performed.
>> 3.) Then, we installed the hosted-engine package and we triggered a
>> hosted-engine --deploy. It stopped there, complaining that there were new
>> packages available and we should perform an upgrade first, so we did that.
>> We ran the --deploy process again, resulting in a working engine-vm, but
>> ending up in an error (all log files of all hosts are attached as a tar.gz
>> package to this mail) - We completed those steps on Friday, 5th Sept. (As
>> vmnode1 is dead by now, sadly, we can't provide any logs for this machine
>> without immense tinkering, but they could be provided if you really desire so).
>> 4.) On Monday, we noticed that the node (xxx-vmnode1), which had the
>> hosted-engine on it, died due to a hardware failure. Not minding this, we
>> decided to give our gluster-fs the good ol' rm -rf in order to get rid of
>> the previously created files and we moved on with three nodes from there.
>> 5.) We decided to deploy the engine on xxx-vmnode4 this time, since it
>> seemed to be the most stable of the rack. Immediately, an error occurred
>> (stating that /etc/pki/vdsm/certs/cacert.pem couldn't be found) which, thanks
>> to sbonazzo's help on IRC, could be worked around by doing a vdsm config
>> --force. Running the deploy process again worked fine and resulted in the
>> same outcome as the first try (see 3rd point), BUT brought up the stated
>> error again.
>> 6.) Now, we tried to add another host (xxx-vmnode3) to our solution in
>> order to make the Engine highly available. This worked fine until the point
>> of entering an ID for the new node, where it complained that the UUID was
>> already in use and we couldn't add this node to the cluster - which is
>> fairly odd, according to sbonazzo, as any machine should have its own,
>> unique UUID.
>> 7.) As this host wouldn't work, we decided to give xxx-vmnode2 a shot and
>> ran the deploy process there, which resulted in ultimate failure. It didn't
>> even get to the steps regarding the path for the resulting VM.
>>
>> Because it might help, I probably should give you an overview of our
>> network setup:
>> It is currently set up so that we have a company-wide WAN and a rack-wide
>> LAN. The WAN is only there for the VMs to communicate with the outside
>> world; management and calling the engine is done via the LAN, which can be
>> accessed through a VPN connection. Therefore, we bridged the engine's
>> "ovirtmgmt" bridge to the internal LAN connection. Because the FQDN for
>> the Engine isn't resolvable through DNS, we hacked it into the hosts file
>> on all nodes prior to deploying the hosted-engine package.
>>
>> This is where we are and where we come from - the oVirt setup worked
>> initially, when the engine was still separated from the nodes. Our bad
>> luck with hardware didn't really help, either.
>> I am really looking forward to hearing from you guys because this project
>> would be a nice successor to our current VMware solution, which is
>> starting to die.
>>
>> Thank you for any time invested into our problems (and probably solutions)
>> ;)
>>
>> --
>> Best regards
>> Thomas Keppler
>>
>> PS: I've just heard that the hosted-engine is **NOT** (really) compatible
>> with the hosted-engine. Are there any recommendations on what to do?
>
> Hi Thomas,
> Just to recap, the main issue as identified and handled by Sandro was
> blade servers with Supermicro boards using buggy BIOSes, which caused
> host-deploy to get the same UUID from all of them.
>
> The solution provided for the UUID was:
>
> # uuidgen >/etc/vdsm/vdsm.id
>
> on the hosts, which resolved it.
>
> Feel free to ping us if there's anything else.
>
> Going forward I'd advise going with the stable releases rather than with
> nightly builds.
Is there a bug, a missing warning/detection, etc.?
https://bugzilla.redhat.com/show_bug.cgi?id=1139742
Discussed it with Alon and Sandro this morning.
We're going to close it with the relevant steps to fix,
since this is a corner case.
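
For reference, the uuidgen workaround quoted above could be wrapped in a small
helper along these lines. This is only a sketch under assumptions, not the
steps from the bug report: the function names and the idea of passing in the
IDs already used by other hosts are hypothetical, and the fallback to
/proc/sys/kernel/random/uuid is an addition for machines without uuidgen. The
UUID a BIOS reports can usually be read with "dmidecode -s system-uuid" to
confirm that hosts really collide.

```shell
#!/bin/sh
# Sketch only: regenerate a host's vdsm ID when it collides with another host's.
# Function names and the "known IDs" argument list are hypothetical.

# Print a fresh UUID; falls back to the kernel's generator if uuidgen is absent.
new_uuid() {
    if command -v uuidgen >/dev/null 2>&1; then
        uuidgen
    else
        cat /proc/sys/kernel/random/uuid
    fi
}

# Usage: regenerate_if_duplicate ID_FILE [KNOWN_ID ...]
# Rewrites ID_FILE with a new UUID if its current content matches any KNOWN_ID,
# or if the file is missing or empty. Otherwise the file is left untouched.
regenerate_if_duplicate() {
    id_file="$1"
    shift
    current=""
    if [ -r "$id_file" ]; then
        current=$(cat "$id_file")
    fi
    for known in "$@"; do
        if [ "$current" = "$known" ]; then
            new_uuid > "$id_file"
            return 0
        fi
    done
    if [ -z "$current" ]; then
        new_uuid > "$id_file"
    fi
}
```

On an affected host this might be called as
regenerate_if_duplicate /etc/vdsm/vdsm.id "$id_of_other_host", run as root
before retrying the deploy; the plain "uuidgen >/etc/vdsm/vdsm.id" from the
thread does the same thing unconditionally.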