On Mon, Jun 22, 2020 at 8:58 AM Yedidyah Bar David <didi(a)redhat.com> wrote:
On Sun, Jun 21, 2020 at 7:36 PM Gilboa Davara <gilboad(a)gmail.com> wrote:
>
> On Thu, Jun 18, 2020 at 2:54 PM Yedidyah Bar David <didi(a)redhat.com> wrote:
> >
> > On Thu, Jun 18, 2020 at 2:37 PM Gilboa Davara <gilboad(a)gmail.com> wrote:
> > >
> > > On Wed, Jun 17, 2020 at 12:35 PM Yedidyah Bar David <didi(a)redhat.com> wrote:
> > > > > However, when trying to install 4.4 on the test CentOS 8.x (now 8.2
> > > > > after yesterday's release), either manually (via hosted-engine --deploy)
> > > > > or by using cockpit, it fails when trying to download packages (see
> > > > > attached logs) during the hosted engine deployment phase.
> > > >
> > > > Right. Didn't check them - I guess it's the same, no?
> > >
> > > Most likely you are correct. That said, the console version is more
> > > verbose.
> > >
> > >
> > > > > Just to be clear, it is the hosted engine VM (during the deployment
> > > > > process) that fails to automatically download packages, _not_ the
> > > > > host.
> > > >
> > > > Exactly. That's why I asked you (because the logs do not reveal that)
> > > > to manually log in there and try to install (update) the package, and
> > > > see what happens, why it fails, etc. Can you please try that? Thanks.
> > >
> > > Sadly enough, the failure comes early in the hosted engine deployment
> > > process, making the VM completely inaccessible.
> > > While I see qemu-kvm briefly start, it usually dies before I have any
> > > chance to access it.
> > >
> > > Can I somehow prevent hosted-engine --deploy from destroying the
> > > hosted engine VM, when the deployment fails, giving me access to it?
> >
> > This is how it should behave normally: it does not kill the VM.
> > Perhaps check logs, try to find who/what killed it.
> >
> > Anyway: Earlier today I pushed this patch:
> >
> > https://gerrit.ovirt.org/109730
> >
> > Didn't yet get to try verifying it. Would you like to try? You can get
> > an RPM from the CI build linked there, or download the patch and apply
> > it manually (in the "gitweb" link [1]).
> >
> > Then, you can do:
> >
> > hosted-engine --deploy --ansible-extra-vars=he_offline_deployment=true
> >
> > If you try this, please share the results. Thanks!
> >
> > [1] https://gerrit.ovirt.org/gitweb?p=ovirt-hosted-engine-setup.git;a=commitd...
Now filed https://bugzilla.redhat.com/1849517 for this.
> >
> > Best regards,
> > --
> > Didi
> >
>
> Good news. I managed to connect to the VM and solve the problem.
Glad to hear that, thanks for the report!
>
> For some odd reason our primary DNS server had upstream connection
> issues and all the requests were silently handled by our secondary DNS
> server.
> Not sure I understand why, but while the ovirt host did manage to
> silently spill over to the secondary DNS, the hosted engine, at least
> during the initial deployment phase (when it still uses the host's
> dnsmasq), failed to spill over to the secondary DNS server and the
> deployment failed.
Sounds like a bug in dnsmasq, although I am not sure.
That said, DNS/DHCP are out of scope for oVirt. We simply assume they
are robust.

In retrospect, what do you think we should have done differently
to make it easier for you to find (and fix) the problem?
Best regards,
--
Didi
In retrospect, the main problem was the non-descriptive error message
generated by DNF (which has nothing to do with the oVirt installer).
That said, this could easily be circumvented by adding a simple
network-test script to the installer playbook.
Then again, the problem was clearly on my side...
- Gilboa
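
For illustration only, here is a rough sketch of what such a network test
could look like. It is not part of oVirt or of the installer playbook, and
a real playbook would more likely do this with Ansible tasks. The idea is
to query every nameserver listed in resolv.conf individually, so that a
broken primary is reported even when the system resolver silently falls
back to the secondary. All function names (parse_nameservers, build_query,
server_answers, failing_servers) are made up for this example, and it
assumes plain DNS over UDP port 53.

```python
import socket
import struct

QUERY_ID = 0x1234  # arbitrary ID, used to match the reply to our query


def parse_nameservers(resolv_conf_text):
    """Return the nameserver addresses listed in resolv.conf-style text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop trailing comments
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def build_query(hostname, query_id=QUERY_ID):
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: ID, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def server_answers(server, hostname, timeout=3.0):
    """Return True if this one server gives a successful, non-empty answer."""
    family = socket.AF_INET6 if ":" in server else socket.AF_INET
    with socket.socket(family, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(hostname), (server, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:  # covers timeouts and unreachable servers
            return False
    if len(reply) < 12:
        return False
    reply_id, flags, _qdcount, ancount = struct.unpack("!HHHH", reply[:8])
    rcode = flags & 0x000F  # 0 = no error, 2 = SERVFAIL, 3 = NXDOMAIN
    return reply_id == QUERY_ID and rcode == 0 and ancount > 0


def failing_servers(resolv_conf_text, hostname, probe=server_answers):
    """List every configured nameserver that cannot resolve hostname."""
    return [
        server
        for server in parse_nameservers(resolv_conf_text)
        if not probe(server, hostname)
    ]
```

To use it, one would pass the contents of /etc/resolv.conf and a host name
that the deployment needs (for example a package mirror) to
failing_servers(), and abort with a clear message if the returned list is
not empty. The probe argument is only there so the logic can be exercised
without touching the network.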