I've just inherited a project in which we need to bring a prototype of
a small oVirt system (single node or 3-node hyperconverged, with
GlusterFS on the same machines and a number of different VMs) running
in an industrial machine into serial production.
This means we want to build a new 1- or 3-node oVirt system between
once and 3 times a day.
In my tests so far, the failure rate of the oVirt engine deployment
(via the included scripts as well as the web UI) has turned out to be
pretty high: between 40% and 60%. In practice, we have to retry the
installation and/or the final engine deployment about 2-4 times before
we get a running system.
So far I have not been able to find clear error messages that tell me
how to fix the problem.
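
As a rough cross-check of the retry estimate above, here is a minimal
sketch. It assumes that each deployment attempt fails independently
with the same probability, which may well not hold here (the failures
could be correlated with a particular host or disk). Under that
assumption, a 40-60% failure rate corresponds to roughly 1.7 to 2.5
attempts on average. The function names are made up for illustration.

```python
def expected_attempts(failure_rate: float) -> float:
    """Mean number of attempts until the first success.

    Assumes independent attempts with a constant failure rate
    (geometric distribution); this is a simplification.
    """
    if not 0.0 <= failure_rate < 1.0:
        raise ValueError("failure_rate must be in [0, 1)")
    return 1.0 / (1.0 - failure_rate)


def prob_success_within(failure_rate: float, attempts: int) -> float:
    """Probability of at least one success within `attempts` tries."""
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    return 1.0 - failure_rate ** attempts


if __name__ == "__main__":
    # Observed range from the tests described above: 40% to 60% failures.
    for rate in (0.4, 0.6):
        print(
            f"failure rate {rate:.0%}: "
            f"mean attempts {expected_attempts(rate):.2f}, "
            f"success within 4 tries {prob_success_within(rate, 4):.1%}"
        )
```

If the observed number of retries is consistently higher than this
model suggests, that would hint that the failures are not random but
tied to something specific in the setup.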
Before going into the details of the errors, I would like to ask
whether people with deeper oVirt experience would consider this a
somewhat normal success rate, or whether it indicates that we are
doing something fundamentally wrong and should definitely spend a few
more hours, or maybe days, on finding the sources of the problems.
More info about the system and errors:
* oVirt 4.3.9 (because the prototype was built and verified with that
version; it would also be interesting to know whether upgrading is
strongly recommended for a more stable installation/deployment)
* The errors alternate between two cases. In the first, the deployment
process seems unable to transfer the "LocalHostedEngine" VM to the
GlusterFS storage to become a "HostedEngine". In the second, the
engine is already up and running but never really connects to the
oVirt system, restarts continuously, and also shows XFS filesystem
errors in its dmesg output.
Any hints on our chances of getting this solved, or requests for more
information about the errors, are welcome. Thanks in advance.