[ovirt-users] issues deploying hosted-engine on multiple hosts

Doron Fediuck dfediuck at redhat.com
Wed Sep 10 09:57:01 UTC 2014



----- Original Message -----
> From: "Thomas Keppler (PEBA)" <thomas.keppler at kit.edu>
> To: users at ovirt.org
> Sent: Tuesday, September 9, 2014 11:00:23 AM
> Subject: [ovirt-users] issues deploying hosted-engine on multiple hosts
> 
> Dear oVirt-Team,
> 
> we (as in: our company) currently have a *little* problem with your
> hosted-engine solution.
> First, let me walk you through the steps we took until the errors occurred:
> 
> 1.) All four hosts were prepared with CentOS 7, the EPEL repositories
> and a glusterfs-volume.
> 2.) The oVirt 3.5 nightly snapshot repository was added to each host's yum
> repository list, and a yum upgrade was performed.
> 3.) Then we installed the hosted-engine package and triggered a
> hosted-engine --deploy. It stopped there, complaining that new packages
> were available and that we should perform an upgrade first, so we did that.
> We ran the --deploy process again, which resulted in a working engine VM
> but ended with an error (all log files of all hosts are attached to this
> mail as a tar.gz package). We completed those steps on Friday, 5th Sept.
> (As vmnode1 is dead by now, sadly, we can't provide any logs for that
> machine without immense tinkering, but they could be provided if you
> really need them.)
> 4.) On Monday, we noticed that the node which had the hosted-engine on it
> (xxx-vmnode1) had died due to a hardware failure. Undeterred, we gave our
> GlusterFS volume the good ol' rm -rf to get rid of the previously created
> files and moved on with three nodes from there.
> 5.) We decided to deploy the engine on xxx-vmnode4 this time, since it
> seemed to be the most stable machine in the rack. Immediately, an error
> occurred (stating that /etc/pki/vdsm/certs/cacert.pem couldn't be found),
> which, thanks to sbonazzo's help on IRC, could be worked around by forcing
> a vdsm reconfiguration (vdsm config --force). Running the deploy process
> again worked and ended the same way as the first try (see point 3), BUT it
> brought up the stated error again.
> 6.) Next, we tried to add another host (xxx-vmnode3) to our setup in order
> to make the engine highly available. This worked fine until the point of
> entering an id for the new node, where it complained that the UUID was
> already in use and we couldn't add this node to the cluster - which is
> fairly odd, according to sbonazzo, as every machine should have its own
> unique UUID.
> 7.) As this host wouldn't work, we decided to give xxx-vmnode2 a shot and
> ran the deploy process there, which failed completely. It didn't even get
> to the steps asking for the path for the resulting VM.
> 
> Because it might help, I should probably give you an overview of our
> network setup:
> We currently have a company-wide WAN and a rack-wide LAN. The WAN is only
> there for the VMs to communicate with the outside world; management and
> access to the engine are done via the LAN, which can be reached through a
> VPN connection. Therefore, we bridged the engine's "ovirtmgmt" bridge to
> the internal LAN connection. Because the FQDN of the engine isn't
> resolvable through DNS, we added it to the hosts file on all nodes prior
> to deploying the hosted-engine package.
> 
> This is where we are and where we came from - the oVirt setup worked
> initially, when the engine was still separate from the nodes. Our bad luck
> with hardware didn't really help either.
> I am really looking forward to hearing from you guys, because this project
> would be a nice successor to our current VMware solution, which is starting
> to die.
> 
> Thank you for any time invested into our problems (and probably solutions) ;)
> 
> --
> Best regards
> Thomas Keppler
> 
> PS: I've just heard that the hosted-engine is **NOT** (really) compatible
> with GlusterFS storage. Are there any recommendations on what to do?

Hi Thomas,
Just to recap, the main issue, as identified and handled by Sandro, was blade servers
with Supermicro boards running buggy BIOSes, which caused host-deploy to get the
same UUID from all of them.
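
If you want to confirm that on your side, comparing the firmware UUID across
the hosts should show the duplicate (just a suggested check, not something
that was part of the original deployment):

# dmidecode -s system-uuid

If two hosts report the same value there, VDSM will pick up the same host
UUID on both unless it is overridden.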

The solution provided for the UUID issue was to run:

# uuidgen >/etc/vdsm/vdsm.id

on each of the hosts, which resolved it.
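
To double-check, the generated ids should now differ from host to host:

# cat /etc/vdsm/vdsm.id

(If VDSM was already running when the file was written, it may also need a
restart (systemctl restart vdsmd) before it picks up the new id; I haven't
verified whether the deploy flow already does that for you.)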

Feel free to ping us if there's anything else.

Going forward I'd advise going with the stable releases rather than with nightly builds.
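
For example, the 3.5 release repository can be enabled with something along
these lines (please double-check the exact URL on the oVirt download page):

# yum install http://resources.ovirt.org/pub/yum-repo/ovirt-release35.rpm
# yum update
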
Doron


