On Sun, Jun 21, 2020 at 8:04 PM Gilboa Davara <gilboad(a)gmail.com> wrote:
Hello,
Following the previous email, I think I'm hitting an odd problem, not
sure if it's my mistake or an actual bug.
1. Newly deployed 4.4 self-hosted engine on localhost NFS storage on a
single node.
2. Installation failed during the final phase with a non-descriptive
error message [1].
I agree. Would you like to open a bug about this? It's not always easy
to know the root cause for the failure, nor to pass it through the
various components until it can reach the end-user.
3. Log attached.
4. Even though the installation seemed to have failed, I managed to
connect to the ovirt console, and noticed it failed to connect to the
host.
5. SSHed into the hosted engine, and noticed it cannot resolve the host's hostname.
6. Added the missing /etc/hosts entry, restarted the ovirt-engine
service, and all is green.
7. Looking at the deployment log, I'm seeing the following message:
"[WARNING] Failed to resolve gilboa-wx-ovirt.localdomain using DNS, it
can be resolved only locally", which means ansible was aware that
my DNS server doesn't resolve the host hostname, but didn't add the
missing /etc/hosts entry and/or error out.
Not sure it must abort. In principle, you could have supplied custom
ansible code to be run inside the appliance, to add the items yourself
to /etc/hosts, or in theory it can also happen that you configured stuff
so that the host fails DNS resolution but the engine VM does not.
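For what it's worth, the manual fix from step 6 can be sketched roughly as
below. This is only an illustration, not what the deployment code does: the
function names are made up for this example and 192.0.2.10 is a placeholder
address. Note that resolves() goes through the system resolver, so a name
that is already listed in the local /etc/hosts counts as resolvable even if
DNS does not know it.

```python
import socket


def resolves(hostname):
    """Return True if the system resolver can resolve hostname.

    The system resolver typically consults /etc/hosts as well as DNS,
    so this does not tell the two apart.
    """
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False


def ensure_hosts_entry(hosts_text, ip, hostname):
    """Return hosts_text with an 'ip hostname' line appended.

    The text is returned unchanged if hostname is already listed as a
    name or alias on a non-comment line.
    """
    for line in hosts_text.splitlines():
        # Drop trailing comments, then split into address + names.
        fields = line.split("#", 1)[0].split()
        if hostname in fields[1:]:
            return hosts_text
    if hosts_text and not hosts_text.endswith("\n"):
        hosts_text += "\n"
    return hosts_text + f"{ip} {hostname}\n"
```

On the engine VM one would, as root, read /etc/hosts, pass its content
through ensure_hosts_entry() with the host's real address and
gilboa-wx-ovirt.localdomain, and write the result back only if
resolves() returned False beforehand.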
A. Is it a bug, or is it PBKAC?
It also asked you:
2020-06-21 10:49:18,562-0400 DEBUG otopi.plugins.otopi.dialog.human
dialog.__logString:204 DIALOG:SEND Add lines for the
appliance itself and for this host to /etc/hosts on the engine VM?
2020-06-21 10:49:18,562-0400 DEBUG otopi.plugins.otopi.dialog.human
dialog.__logString:204 DIALOG:SEND Note: ensuring that
this host could resolve the engine VM hostname is still up to you
2020-06-21 10:49:18,563-0400 DEBUG otopi.plugins.otopi.dialog.human
dialog.__logString:204 DIALOG:SEND (Yes, No)[No]
And you accepted the default 'No'.
Perhaps we should change the default to Yes.
Of course - Yes is also a risk - a user not noticing it, then later on
changing the DNS, and not understanding why it "does not work"...
B. What are the chances that I have a working ovirt (test) setup?
In theory, you can examine the ansible code, see which (not very
many) remaining steps it would have run had it not failed there, and do
them yourself (or decide that they are not important). In practice,
I'd personally deploy again cleanly, unless this is for a quick test
or something.
Best regards,
--
Didi