[ovirt-users] Re: Hosted engine deployment fails consistently when trying to download files.

22 Jun 2020

On Mon, Jun 22, 2020 at 8:58 AM Yedidyah Bar David <didi@redhat.com> wrote:
...
On Sun, Jun 21, 2020 at 7:36 PM Gilboa Davara <gilboad@gmail.com> wrote:
...
On Thu, Jun 18, 2020 at 2:54 PM Yedidyah Bar David <didi@redhat.com> wrote:
...
On Thu, Jun 18, 2020 at 2:37 PM Gilboa Davara <gilboad@gmail.com> wrote:
...
On Wed, Jun 17, 2020 at 12:35 PM Yedidyah Bar David <didi@redhat.com> wrote:
...
...
However, when trying to install 4.4 on the test CentOS 8.x (now 8.2
after yesterday's release), either manually (via hosted-engine --deploy)
or by using cockpit, the deployment fails when trying to download
packages (see attached logs) during the hosted engine deployment phase.
Right. Didn't check them - I guess it's the same, no?
Most likely you are correct. That said, the console version is more verbose.
...
...
Just to be clear, it is the hosted engine VM (during the deployment
process) that fails to automatically download packages, _not_ the
host.
Exactly. That's why I asked you (because the logs do not reveal that)
to manually login there and try to install (update) the package, and
see what happens, why it fails, etc. Can you please try that? Thanks.
Sadly enough, the failure comes early in the hosted engine deployment
process, making the VM completely inaccessible.
While I see qemu-kvm briefly start, it usually dies before I have any
chance to access it.
Can I somehow prevent hosted-engine --deploy from destroying the
hosted engine VM, when the deployment fails, giving me access to it?
That is already the normal behavior: a failed deployment does not kill
the VM. Perhaps check the logs and try to find who/what killed it.
Anyway: Earlier today I pushed this patch:
https://gerrit.ovirt.org/109730
I haven't verified it yet. Would you like to try? You can get
an RPM from the CI build linked there, or download the patch and apply
it manually (from the "gitweb" link [1]).
Then, you can do:
hosted-engine --deploy --ansible-extra-vars=he_offline_deployment=true
If you try this, please share the results. Thanks!
[1] https://gerrit.ovirt.org/gitweb?p=ovirt-hosted-engine-setup.git;a=commitdiff...
Now filed https://bugzilla.redhat.com/1849517 for this.
...
...
Best regards,
--
Didi
Good news. I managed to connect to the VM and solve the problem.
Glad to hear that, thanks for the report!
...
For some odd reason our primary DNS server had upstream connection
issues and all the requests were silently handled by our secondary DNS
server.
Not sure I understand why, but while the ovirt host did manage to
silently spill over to the secondary DNS, the hosted engine, at least
during the initial deployment phase (when it still uses the host's
dnsmasq), failed to spill over to the secondary DNS server and the
deployment failed.
Sounds like a bug in dnsmasq, although I am not sure.
That said, DNS/DHCP are out of scope for oVirt. We simply assume they
are robust.
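
The failure mode described above (one resolver silently covering for another) can be made visible by querying each configured nameserver directly instead of going through the system resolver. The following is only a minimal sketch under stated assumptions: it uses nothing but the Python standard library, builds a bare A-record query by hand, and the helper names (parse_nameservers, build_query, probe, report) as well as the hostname in the usage note are hypothetical, not part of oVirt or dnsmasq.

```python
import socket
import struct


def parse_nameservers(resolv_conf_text):
    """Return the nameserver addresses listed in resolv.conf-style text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop trailing comments
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def build_query(hostname, query_id=0x1234):
    """Build a minimal DNS query packet for an A record (RFC 1035 layout)."""
    # Header: id, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def probe(server, hostname, timeout=3.0):
    """Ask a single server over UDP and return a short status string."""
    family = socket.AF_INET6 if ":" in server else socket.AF_INET
    with socket.socket(family, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(hostname), (server, 53))
            reply, _ = sock.recvfrom(512)
        except OSError as exc:  # includes timeouts and unreachable networks
            return f"no answer ({exc})"
    if len(reply) < 12:
        return "short reply"
    rcode = reply[3] & 0x0F  # 0 = no error, 2 = SERVFAIL, 3 = NXDOMAIN
    answers = struct.unpack("!H", reply[6:8])[0]
    return f"rcode={rcode}, answers={answers}"


def report(resolv_conf_path, hostname):
    """Probe every nameserver in the given file; return (server, status) pairs."""
    with open(resolv_conf_path) as handle:
        servers = parse_nameservers(handle.read())
    return [(server, probe(server, hostname)) for server in servers]
```

Calling report("/etc/resolv.conf", "resources.ovirt.org") on the host (the hostname is just an example) lists one status per nameserver. A primary server with broken upstream connectivity would probably show up as a timeout or a non-zero rcode while the secondary answers normally, although that depends on how the server fails.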
In retrospect, what do you think we should have done differently
to make it easier for you to find (and fix) the problem?
Best regards,
--
Didi
In retrospect, the main problem was the non-descriptive error message
generated by DNF (which has nothing to do with the ovirt installer).
That said, this could easily be circumvented by adding a simple
network-test script to the installer playbook.
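
A network test of the kind suggested here could be sketched roughly as follows. This is an illustration only, not code from the oVirt installer: it assumes the check runs inside the engine VM (for example, invoked from a playbook task), that resolving a repository hostname and opening a TCP connection to it is a good enough proxy for "DNF can download packages", and the function names and hostnames are hypothetical.

```python
import socket


def check_host(hostname, port=443, timeout=5.0,
               resolve=socket.getaddrinfo, connect=socket.create_connection):
    """Return (ok, message): does hostname resolve and accept a TCP connection?

    The resolve/connect parameters exist only so the check can be exercised
    without real network access.
    """
    try:
        resolve(hostname, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return False, f"{hostname}: DNS resolution failed ({exc})"
    try:
        with connect((hostname, port), timeout=timeout):
            pass  # connection opened successfully; close it again
    except OSError as exc:
        return False, (f"{hostname}: resolved, but TCP connect to port "
                       f"{port} failed ({exc})")
    return True, f"{hostname}: OK"


def run_checks(hostnames, **kwargs):
    """Check every hostname; return (all_ok, list_of_messages)."""
    results = [check_host(name, **kwargs) for name in hostnames]
    return all(ok for ok, _ in results), [msg for _, msg in results]
```

For example, run_checks(["resources.ovirt.org", "mirrorlist.centos.org"]) (both hostnames are placeholders for whatever repositories the deployment actually uses) returns a flag plus one message per host. The point is that a DNS failure is reported as a DNS failure before any package download is attempted, rather than surfacing later as a generic DNF error.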

Then again, the problem was clearly on my side...

- Gilboa