[ovirt-users] issues deploying hosted-engine on multiple hosts

Doron Fediuck dfediuck at redhat.com
Wed Sep 10 09:57:01 UTC 2014



----- Original Message -----
> From: "Thomas Keppler (PEBA)" <thomas.keppler at kit.edu>
> To: users at ovirt.org
> Sent: Tuesday, September 9, 2014 11:00:23 AM
> Subject: [ovirt-users] issues deploying hosted-engine on multiple hosts
> 
> Dear oVirt-Team,
> 
> we (as in: our company) currently have a *little* problem with your
> hosted-engine solution.
> First, let me walk you through the steps we took until the errors occurred:
> 
> 1.) All four hosts were prepared with CentOS 7, the EPEL repositories
> and a glusterfs-volume.
> 2.) The oVirt 3.5 nightly snapshot repository was added to each host's yum
> repository list, and a yum upgrade was performed.
> 3.) Then we installed the hosted-engine package and triggered a
> hosted-engine --deploy. It stopped there, complaining that new packages
> were available and that we should perform an upgrade first, so we did that.
> We ran the --deploy process again, which resulted in a working engine VM
> but ended with an error (all log files of all hosts are attached to this
> mail as a tar.gz package). We completed those steps on Friday, 5th Sept.
> (As vmnode1 is dead by now, sadly, we can't provide any logs for that
> machine without immense tinkering, but they could be provided if you
> really need them.)
> 4.) On Monday, we noticed that the node which had the hosted-engine on it
> (xxx-vmnode1) had died due to a hardware failure. Undeterred, we gave our
> GlusterFS volume the good ol' rm -rf to get rid of the previously created
> files and moved on with three nodes from there.
> 5.) We decided to deploy the engine on xxx-vmnode4 this time, since it
> seemed to be the most stable machine in the rack. Immediately, an error
> occurred (stating that /etc/pki/vdsm/certs/cacert.pem couldn't be found),
> which, thanks to sbonazzo's help on IRC, could be worked around by forcing
> a vdsm reconfiguration (vdsm config --force). Running the deploy process
> again worked and ended the same way as the first try (see point 3), BUT it
> brought up the stated error again.
> 6.) Next, we tried to add another host (xxx-vmnode3) to our setup in order
> to make the engine highly available. This worked fine until the point of
> entering an id for the new node, where it complained that the UUID was
> already in use and we couldn't add this node to the cluster - which is
> fairly odd, according to sbonazzo, as every machine should have its own
> unique UUID.
> 7.) As this host wouldn't work, we decided to give xxx-vmnode2 a shot and
> ran the deploy process there, which failed completely. It didn't even get
> to the steps asking for the path for the resulting VM.
> 
> Because it might help, I should probably give you an overview of our
> network setup:
> We currently have a company-wide WAN and a rack-wide LAN. The WAN is only
> there for the VMs to communicate with the outside world; management and
> access to the engine are done via the LAN, which can be reached through a
> VPN connection. Therefore, we bridged the engine's "ovirtmgmt" bridge to
> the internal LAN connection. Because the FQDN of the engine isn't
> resolvable through DNS, we added it to the hosts file on all nodes prior
> to deploying the hosted-engine package.
> 
> This is where we are and where we came from - the oVirt setup worked
> initially, when the engine was still separate from the nodes. Our bad luck
> with hardware didn't really help either.
> I am really looking forward to hearing from you guys, because this project
> would be a nice successor to our current VMware solution, which is starting
> to die.
> 
> Thank you for any time invested into our problems (and probably solutions) ;)
> 
> --
> Best regards
> Thomas Keppler
> 
> PS: I've just heard that the hosted-engine is **NOT** (really) compatible
> with GlusterFS storage. Are there any recommendations on what to do?

Hi Thomas,
Just to recap, the main issue, as identified and handled by Sandro, was blade servers
with Supermicro boards running buggy BIOSes, which caused host-deploy to get the
same UUID from all of them.
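
If you want to confirm that on your side, comparing the firmware UUID across
the hosts should show the duplicate (just a suggested check, not something
that was part of the original deployment):

# dmidecode -s system-uuid

If two hosts report the same value there, VDSM will pick up the same host
UUID on both unless it is overridden.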

The solution provided for the UUID issue was to run:

# uuidgen >/etc/vdsm/vdsm.id

on each of the hosts, which resolved it.
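
To double-check, the generated ids should now differ from host to host:

# cat /etc/vdsm/vdsm.id

(If VDSM was already running when the file was written, it may also need a
restart (systemctl restart vdsmd) before it picks up the new id; I haven't
verified whether the deploy flow already does that for you.)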

Feel free to ping us if there's anything else.

Going forward I'd advise going with the stable releases rather than with nightly builds.
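
For example, the 3.5 release repository can be enabled with something along
these lines (please double-check the exact URL on the oVirt download page):

# yum install http://resources.ovirt.org/pub/yum-repo/ovirt-release35.rpm
# yum update
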
Doron


