[ovirt-users] issues deploying hosted-engine on multiple hosts

Wed Sep 10 08:28:54 EDT 2014

----- Original Message -----
> From: "Itamar Heim" <iheim at redhat.com>
> To: "Doron Fediuck" <dfediuck at redhat.com>, "Thomas Keppler (PEBA)" <thomas.keppler at kit.edu>
> Cc: users at ovirt.org
> Sent: Wednesday, September 10, 2014 1:41:22 PM
> Subject: Re: [ovirt-users] issues deploying hosted-engine on multiple hosts
> 
> On 09/10/2014 12:57 PM, Doron Fediuck wrote:
> >
> >
> > ----- Original Message -----
> >> From: "Thomas Keppler (PEBA)" <thomas.keppler at kit.edu>
> >> To: users at ovirt.org
> >> Sent: Tuesday, September 9, 2014 11:00:23 AM
> >> Subject: [ovirt-users] issues deploying hosted-engine on multiple hosts
> >>
> >> Dear oVirt-Team,
> >>
> >> we (as in: our company) currently have a *little* problem with your
> >> hosted-engine solution.
> >> First, I want to walk you through the steps we took up to the point where
> >> the errors occurred:
> >>
> >> 1.) All four hosts were prepared with CentOS 7, the EPEL repositories and a
> >> GlusterFS volume.
> >> 2.) The oVirt 3.5 nightly snapshot repository was added to each host's yum
> >> repository list, and a yum upgrade was performed.
> >> 3.) Then, we installed the hosted-engine package and triggered a
> >> hosted-engine --deploy. It stopped there, complaining that new packages
> >> were available and that we should perform an upgrade first, so we did that.
> >> We ran the --deploy process again, which resulted in a working engine VM
> >> but ended in an error (the log files of all hosts are attached as a tar.gz
> >> package to this mail). We completed those steps on Friday, 5th Sept. (As
> >> vmnode1 is dead by now, sadly, we can't provide any logs for this machine
> >> without immense tinkering, but we could provide them if you really want.)
> >> 4.) On Monday, we noticed that the node (xxx-vmnode1) which had the
> >> hosted engine on it had died due to a hardware failure. Not minding this,
> >> we decided to give our GlusterFS volume the good ol' rm -rf in order to get
> >> rid of the previously created files, and we moved on with three nodes from
> >> there.
> >> 5.) We decided to deploy the engine on xxx-vmnode4 this time, since it
> >> seemed to be the most stable machine in the rack. Immediately, an error
> >> occurred (stating that /etc/pki/vdsm/certs/cacert.pem couldn't be found),
> >> which, thanks to sbonazzo's help on IRC, could be worked around by running
> >> vdsm config --force. Running the deploy process again worked fine and ended
> >> the same way as the first try (see point 3), BUT it brought up the stated
> >> error again.
> >> 6.) Next, we tried to add another host (xxx-vmnode3) to our setup in order
> >> to make the engine highly available. This worked fine up to the point of
> >> entering an ID for the new node, where it complained that the UUID was
> >> already in use and that we couldn't add this node to the cluster, which is
> >> fairly odd, according to sbonazzo, as every machine should have its own,
> >> unique UUID.
> >> 7.) As this host wouldn't work, we decided to give xxx-vmnode2 a shot and
> >> ran the deploy process there, which failed completely. It didn't even get
> >> to the steps regarding the path for the resulting VM.
> >>
> >> Because it might help, I should probably give you an overview of our
> >> network setup:
> >> We currently have a company-wide WAN and a rack-wide LAN. The WAN is only
> >> there for the VMs to communicate with the outside world; management and
> >> access to the engine go through the LAN, which can be reached via a VPN
> >> connection. Therefore, we bridged the engine's "ovirtmgmt" bridge to the
> >> internal LAN connection. Because the engine's FQDN can't be resolved
> >> through DNS, we hacked it into the hosts file on all nodes prior to
> >> deploying the hosted-engine package.
> >>
> >> This is where we are and where we come from: the oVirt setup worked
> >> initially, when the engine was still separate from the nodes. Our bad luck
> >> with hardware didn't really help, either.
> >> I am really looking forward to hearing from you, because this project
> >> would be a nice successor to our current VMware solution, which is starting
> >> to die.
> >>
> >> Thank you for any time invested in our problems (and hopefully solutions) ;)
> >>
> >> --
> >> Best regards
> >> Thomas Keppler
> >>
> >> PS: I've just heard that the hosted-engine is **NOT** (really) compatible
> >> with GlusterFS. Are there any recommendations on what to do?
> >
> > Hi Thomas,
> > Just to recap, the main issue, as identified and handled by Sandro, was
> > blade servers with Supermicro boards using buggy BIOSes, which caused
> > host-deploy to get the same UUID from all the BIOSes.
> >
> > The solution provided for the UUID issue was to run:
> >
> > # uuidgen >/etc/vdsm/vdsm.id
> >
> > on the hosts, which resolved it.
> >
> > Feel free to ping us if there's anything else.
> >
> > Going forward, I'd advise going with the stable releases rather than
> > nightly builds.
> 
> Is there a bug for this (a missing warning/detection, etc.)?
> 
> 

https://bugzilla.redhat.com/show_bug.cgi?id=1139742

I discussed it with Alon and Sandro this morning.
We're going to close it with the relevant steps to fix it,
since this is a corner case.
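
For reference, the per-host workaround quoted above (uuidgen >/etc/vdsm/vdsm.id) can be wrapped in a small shell function. This is only a sketch, not the fix steps recorded in the bug: the function name regen_vdsm_id, the optional path argument (added so it can be tried on a scratch file) and the fallback to the kernel's /proc/sys/kernel/random/uuid when uuidgen is missing are all assumptions made here.

```shell
# Sketch of the per-host workaround quoted in this thread:
#   uuidgen >/etc/vdsm/vdsm.id
# Hypothetical helper: the name regen_vdsm_id, the optional path
# argument and the /proc fallback are assumptions, not oVirt tooling.
regen_vdsm_id() {
    # Target file; defaults to the path used in the thread.
    id_file="${1:-/etc/vdsm/vdsm.id}"

    if command -v uuidgen >/dev/null 2>&1; then
        # New UUID from uuidgen (util-linux), as in the original command.
        new_id=$(uuidgen)
    else
        # Assumed fallback on Linux: the kernel's random UUID source.
        new_id=$(cat /proc/sys/kernel/random/uuid)
    fi

    # Overwrite the ID file, then print the value that was written.
    printf '%s\n' "$new_id" > "$id_file" || return 1
    printf '%s\n' "$new_id"
}
```

On a real host this would be run as root (plain regen_vdsm_id) once on each affected machine, so that every host ends up with its own ID instead of the duplicated BIOS UUID.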