A 40-50% success rate is low. Usually it should be above 90%, although some versions are buggier than others.
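
To put those percentages in perspective, here is a rough sketch (my own illustration, not part of any oVirt tooling). It treats every deployment attempt as an independent trial with a fixed success probability. That independence is an assumption, since real failures may well be correlated, and the function names are made up for this example.

```python
def expected_attempts(p_success):
    """Mean number of attempts until the first success (geometric distribution)."""
    if not 0 < p_success <= 1:
        raise ValueError("p_success must be in (0, 1]")
    return 1 / p_success


def prob_success_within(p_success, attempts):
    """Probability that at least one of `attempts` independent tries succeeds."""
    return 1 - (1 - p_success) ** attempts


if __name__ == "__main__":
    # Compare the observed 40-50% range with the ~90% one would normally expect.
    for p in (0.4, 0.5, 0.9):
        print(f"p={p:.0%}: ~{expected_attempts(p):.2f} attempts on average, "
              f"{prob_success_within(p, 3):.1%} chance of success within 3 tries")
```

Under that assumption, a 40% success rate means about 2.5 attempts per working system on average, while 90% means about 1.1.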

4.3.10 is the latest 4.3 release and the only version supported for migration to 4.4/4.5, so consider changing the deployment process to use it.

When the engine constantly restarts, put the cluster in global maintenance and connect to the HE VM (either via the 'hosted-engine' tool or ssh). Then you can investigate further.
As the deployment uses Ansible, it should be fully idempotent.
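
As a rough sketch of the maintenance step above: the wrapper below only prints the 'hosted-engine' commands by default and runs them when dry_run is False. The wrapper itself (STEPS, run_steps) is hypothetical and just for illustration. The commands are meant to be run as root on one of the hypervisor hosts.

```python
import subprocess

# Commands to run on a hosted-engine host before investigating.
STEPS = [
    # Global maintenance stops the HA agents from restarting the engine VM.
    ["hosted-engine", "--set-maintenance", "--mode=global"],
    # Show the engine VM state as seen by the HA agents.
    ["hosted-engine", "--vm-status"],
]


def run_steps(steps, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    rendered = []
    for cmd in steps:
        line = " ".join(cmd)
        rendered.append(line)
        print(line)
        if not dry_run:
            subprocess.run(cmd, check=True)
    return rendered


if __name__ == "__main__":
    run_steps(STEPS)  # dry run: only prints the commands
```

After that, connect to the engine VM interactively (for example with 'hosted-engine --console' or ssh). When the investigation is done, 'hosted-engine --set-maintenance --mode=none' leaves global maintenance again.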

Best Regards,
Strahil Nikolov

On Mon, Nov 1, 2021 at 13:34, Henning Sprang
<henning.sprang@gmail.com> wrote:
Hi,
Thanks for your reply.

If you say "The process could get some polishing" - does it mean you
confirm that it is somewhat normal to have a success rate of 40-50%?

I inherited this project and have been told I should use 4.3.9 so
far, because that's the version other parts of the system have been
tested with.
But I can check what is necessary to upgrade the PXE/Kickstart bare
metal OS+oVirt install to 4.3.10 if that promises to make it more
reliable.

The symptoms I observed alternated between two failure modes. In the
first, the deployment process seems unable to transfer the
"LocalHostedEngine" VM to the glusterFS storage to become a
"HostedEngine". In the second, the engine is already up and running
but never really connects to the oVirt system, restarts continuously,
and shows XFS filesystem errors in its dmesg output.

There is a whole lot of output, but none that I can identify as
telling me something like "X is wrong, check the config of service Y
please".
The main deployment script says in its logfile "please
look at the logfile..." and states its own filename.

Do you need any specific log file or config?
I have to get them from my next failed install then.

Thanks,
Henning

On Sat, Oct 30, 2021 at 7:29 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
>
> Well, it (the install process) requires some polishing. Any reason not to use 4.3.10? This is the only version supported for migration to 4.4.
>
>
> Can you share what the errors were?
>
>
> Best Regards,
> Strahil Nikolov
>
> On Sat, Oct 30, 2021 at 20:25, Henning Sprang
> <henning.sprang@gmail.com> wrote:
> Hi Strahil,
>
> Thanks for your reply!
>
> Since we are using 4.3.9 from the installation ISO, issues regarding CentOS 8 rolling releases do not apply here.
> We will do that upgrade sooner or later, but then I would try to set up local yum repository mirrors that are always snapshots. More or less how you suggest…
>
> So my success rate of 4 in 10 attempts has nothing to do with changing package or oVirt versions.
>
> It happens when applying exactly the same procedure, scripts, installation media, on the exact same machine.
> So the input is always the same, but the result is at least 2 different error states in 6 cases, and only 4 times a working system.
>
> I would like to understand whether that is normal and everybody needs a handful of attempts to deploy the engine until it succeeds, or whether this is a sign that we are still doing something wrong that we need to track down and fix.
>
> Thanks,
> Henning
>
>
> On Sat 30. Oct 2021 at 14:31, Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
>
> If you want to increase your deployment success, you will need to use repository management and freeze your OS & oVirt repos at a known working level.
> For example, if you use RHEL 8.4 and the current level of oVirt, you will have dependency issues until RHEL 8.5 is released.
>
> Once this happens and your build succeeds, you lock your repos and deploy from them again. Once you have tested each 'batch' of repos and confirmed they work for you, you create a new set of repos ...
>
> oVirt is a dynamic project, with new features constantly coming up (and sometimes going away).
>
> Another approach is to use the oVirt Node image, which is based on CentOS Stream and is validated in the dev infrastructure.
>
>
> Best Regards,
> Strahil Nikolov
>
> On Wed, Oct 27, 2021 at 21:39, Henning Sprang
> <henning.sprang@gmail.com> wrote:
>
> Hello,
>
> I've just inherited a project where we need to bring a prototype of a
> small oVirt system (single node or 3 node hyperconverged, with
> glusterFS on the same machine, a bunch of different VMs) running in
> an industrial machine into serial production.
>
> This means we want to build a new 1 or 3 node oVirt system every day,
> up to 3 times a day.
>
> In my tests so far, the failure rate of the oVirt engine deployment
> (via the included scripts as well as the web UI) turns out to be
> pretty high: between 40 and 60%, meaning that until we have a running
> system, we would have to try the installation and/or final engine
> deployment about 2-4 times.
>
> So far I could not identify clear error messages that let me tell how
> to fix the problem.
>
> Before going into details of the errors, I would like to ask if people
> deeper into oVirt would consider this a somewhat normal success rate,
> or if this indicates we are doing something generally wrong and
> should definitely spend a few more hours or maybe days on finding
> the sources of the problems.
>
> More info about the system and errors
>
> * oVirt 4.3.9 (because the prototype was made and verified with that
> version; it would be interesting to know, too, if upgrading is
> strongly recommended for a more stable installation/deployment)
> * The errors alternate between two cases. In the first, the deployment
> process seems unable to transfer the "LocalHostedEngine" VM to the
> glusterFS storage to become a "HostedEngine". In the second, the
> engine is already up and running but never really connects to the
> oVirt system, restarts continuously, and shows XFS filesystem errors
> in its dmesg output.
>
> Any hints on our chances of getting this solved, or requests for more
> information about the errors, are welcome. Thanks in advance.
>
> Henning
>
> _______________________________________________
>
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-leave@ovirt.org
> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
> List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/PAX7UPTXTGISDSFCABRLBHE63Y5GD6RR/
>
> --
> Henning Sprang
> http://www.sprang.de
