Hi,
On Tue, Oct 11, 2022 at 9:10 AM Matthew J Black <matthew(a)peregrineit.net> wrote:
Hi All,
OK, so after much reading of logs, Ansible files, blog posts, documentation, and much
gnashing of teeth, glasses of bourbon, language to make a sailor blush, tears, blood,
sweat, and various versions of "DOH!", I finally worked out what was wrong -
what I did wrong - and so I'm putting it down here so that the next person who comes
along with the same (or a similar) issue doesn't have to go through what I went
through - and I'm including a couple of suggestions to the devs/doco writers which (I
believe) would have stopped me from making my mistake in the first place.
Much appreciated!
When I did my install I used the command:
~~~
hosted-engine --deploy --4 --ansible-extra-vars=he_ipv4_subnet_prefix=172.16.1
~~~
I did this because we're running an IPv4 network and because the oVirt Engine needs
to be on the 172.16.1.0/24 network - and that's what I thought the
"he_ipv4_subnet_prefix" option did, and I was trying to let the deployment
script know this in advance instead of having to discover this itself.
Now that I've gone back over *all* the doco I realise that the
"he_ipv4_subnet_prefix" option is *not* used for this purpose, but is instead
used for the *temporary* IP address of the deployment engine when the default subnet of
192.168.222.0/24 is not available.
Because I was specifying the 172.16.1.0/24 network (which is already in use) the
deployment failed because it was attempting to create that network as a temporary network
for the initial deployment.
So yes, as I said, my fault - no question about that at all.
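For anyone who wants a quick sanity check before picking a prefix, here is a minimal Python sketch of the overlap problem described above. It assumes the three-octet prefix expands to a /24 (as the 172.16.1 and 192.168.222.0/24 examples suggest). The function names and the `in_use` list are made up for illustration and are not part of hosted-engine or its Ansible code:

```python
import ipaddress

DEFAULT_PREFIX = "192.168.222"  # default temporary prefix mentioned in the docs


def temp_network(prefix: str) -> ipaddress.IPv4Network:
    """Expand a three-octet prefix such as '172.16.1' into its /24 network."""
    return ipaddress.ip_network(f"{prefix}.0/24")


def prefix_is_free(prefix: str, networks_in_use: list[str]) -> bool:
    """Return True if the temporary /24 overlaps none of the in-use networks."""
    candidate = temp_network(prefix)
    return not any(
        candidate.overlaps(ipaddress.ip_network(net)) for net in networks_in_use
    )


# Hypothetical site data: the networks that are already live.
in_use = ["172.16.1.0/24"]

print(prefix_is_free("172.16.1", in_use))      # False - clashes with the real network
print(prefix_is_free(DEFAULT_PREFIX, in_use))  # True - safe as a temporary network
```

In other words, the value passed to he_ipv4_subnet_prefix should be a prefix that is *not* in `in_use`, which is the opposite of what I did.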
Some suggestions:
Although it is stated in the documentation - Installing oVirt As A Self-Hosted Engine
Using The Command Line, section 2.3.2
(https://www.ovirt.org/documentation/installing_ovirt_as_a_self-hosted_eng...)
- (I believe) it is not very clear what is happening here, so a "Note:" or some
sort of statement explicitly stating what this is used for might be in order. For example,
here is the note I made for our team in our internal documentation:
~~~
**Note:** he_ipv4_subnet_prefix=x.x.x: - This is a temporary network prefix if
192.168.222.0/24 (the default) is not available - this is ***NOT*** the final working
subnet of the oVirt Engine.
~~~
I have now read the subsection you linked to above - and IMO the context is
well-presented - if you read the entirety of 2.3.2 (6 lines, in my
browser), it should be clear. But of course - patches are welcome!
This page has, like most others in the website, an "Edit this page"
link at the bottom.
I also believe - quite strongly, in fact - that having the entire deployment hidden
behind the "black box" that is the Ansible deployment - while making things easy
by automating the deployment - makes troubleshooting more difficult. I believe that if
there was a definite "Step-By-Step" list of what was going on behind the scenes
- perhaps as an Appendix to the documentation - then the mistake I made would have been a
lot harder to make - i.e. if there was such a list then I would have been less likely to
make the assumption I made.
I'm thinking something along the lines of (and I am aware that what follows is not
correct):
~~~
1. Collect info - this is stored in "/path/file" temporarily.
2. Install Deployment VM.
3. Deployment VM creates internal bridge - this uses 192.168.222.0/24 by default but can
be overridden by "he_ipv4_subnet_prefix".
4. Deployment Engine creates oVirt Engine.
etc, etc, etc
~~~
Makes sense, but I do not think that doing this well, and on top of that
maintaining it well over time/versions, is going to happen.
We have a very nice presentation from a few years ago, still relevant
even if not up-to-date, which might help get the big picture.
Searching Google for "ovirt hosted-engine deep dive" finds it, for me:
https://www.ovirt.org/media/Hosted-Engine-4.3-deep-dive.pdf
BTW, in the long distant past, hosted-engine deployment was much more
manual (the script guided you through stuff, but you did a lot more by
hand - including installing the OS and engine on the VM, configuring
stuff, etc.) and the move to what we have now (called "node zero" or
"node 0" in some places, including the above PDF) was definitely a huge
improvement.
Anyway, that's my feedback / suggestions / mea culpa / whatever. :-)
Thanks!
Best regards,
--
Didi