----- Original Message -----
From: "Thomas Keppler (PEBA)"
<thomas.keppler(a)kit.edu>
To: users(a)ovirt.org
Sent: Tuesday, September 9, 2014 11:00:23 AM
Subject: [ovirt-users] issues deploying hosted-engine on multiple hosts
Dear oVirt-Team,
we (as in: our company) currently have a *little* problem regarding your
hosted-engine solution.
First, I want to tell you the steps we took until the errors occurred:
1.) All four hosts have been prepared with CentOS 7, the EPEL repositories
and a glusterfs-volume.
2.) The oVirt 3.5 nightly snapshot repository was added to each host's yum
repo list, and a yum upgrade was performed.
3.) Then, we installed the hosted-engine package and triggered a
hosted-engine --deploy. It stopped there, complaining that new packages were
available and that we should perform an upgrade first, so we did. We ran the
--deploy process again, resulting in a working engine VM, but ending in an
error (all log files of all hosts are attached to this mail as a tar.gz
package) - We completed those steps on Friday, 5th Sept. (As vmnode1 is dead
by now, sadly, we can't provide any logs for this machine without immense
tinkering, but they could be provided if you really need them.)
4.) On Monday, we noticed that the node (xxx-vmnode1), which had the
hosted-engine on it, had died due to a hardware failure. Not minding this, we
decided to give our GlusterFS volume the good ol' rm -rf in order to get rid
of the previously created files, and we moved on with three nodes from there.
5.) We decided to deploy the engine on xxx-vmnode4 this time, since it
seemed to be the most stable host in the rack. Immediately, an error occurred
(stating that /etc/pki/vdsm/certs/cacert.pem couldn't be found), which,
thanks to sbonazzo's help on IRC, could be worked around with a forced vdsm
configure (vdsm-tool configure --force). Running the deploy process again
worked fine and ended the same way as the first try (see point 3), BUT
brought up the stated error again.
6.) Next, we tried to add another host (xxx-vmnode3) to our setup in order
to make the engine highly available. This worked fine until the point of
entering an ID for the new node, where it complained that the UUID was
already in use and we couldn't add this node to the cluster - which is
fairly odd since, according to sbonazzo, every machine should have its own
unique UUID.
7.) As this host wouldn't work, we decided to give xxx-vmnode2 a shot and
ran the deploy process there, which resulted in complete failure. It didn't
even get to the step asking for the path of the resulting VM.
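For reference, the cacert.pem workaround from step 5, sketched out. We
assume the command referred to is vdsm-tool's configure verb from the vdsm
package; the restart and the final check are our additions, not part of the
original steps:

```shell
# Force VDSM to (re)configure itself, regenerating missing configuration
# and certificates (assumption: this is what the workaround invoked).
vdsm-tool configure --force

# Restart VDSM so it picks up the regenerated files (assumes a systemd host,
# as on CentOS 7).
systemctl restart vdsmd

# Check that the previously missing CA certificate now exists.
ls -l /etc/pki/vdsm/certs/cacert.pem
```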
Because it might help, I should probably give you an overview of our network
setup:
It is currently set up so that we have a company-wide WAN and a rack-wide
LAN. The WAN is only there for the VMs to communicate with the outside
world; management and reaching the engine are done via the LAN, which can be
accessed through a VPN connection. Therefore, we bridged the engine's
"ovirtmgmt" bridge to the internal LAN connection. Because the engine's FQDN
isn't resolvable through DNS, we hacked it into the hosts file on all nodes
prior to deploying the hosted-engine package.
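The hosts-file hack can be sketched like this; the address and FQDN below
are made-up placeholders standing in for our real values:

```shell
# Append a static entry for the engine's FQDN on every node
# (hypothetical values: 10.0.0.10 and engine.example.lan).
echo '10.0.0.10   engine.example.lan' >> /etc/hosts

# Verify that the name now resolves locally:
getent hosts engine.example.lan
```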
This is where we are and where we come from - the oVirt setup worked
initially, when the engine was still separated from the nodes. Our bad luck
with hardware didn't really help, either.
I am really looking forward to hearing from you guys, because this project
would be a nice successor to our current VMware solution, which is starting
to die.
Thank you for any time invested into our problems (and probably solutions) ;)
--
Best regards
Thomas Keppler
PS: I've just heard that the hosted-engine is **NOT** (really) compatible
with GlusterFS. Are there any recommendations on what to do?
Hi Thomas,
Just to re-cap, the main issue, as identified and handled by Sandro, was
blade servers with Supermicro boards running buggy BIOSes, which caused
host-deploy to get the same UUID from all of them.
The solution provided for the uuid was:
# uuidgen >/etc/vdsm/vdsm.id
on the hosts, which resolved it.
Feel free to ping us if there's anything else.
Going forward, I'd advise going with the stable releases rather than with
nightly builds.
Doron