[ovirt-devel] Unable to start VMs
Tomas Jelinek
tjelinek at redhat.com
Tue Sep 26 13:08:14 UTC 2017
On Tue, Sep 26, 2017 at 1:58 PM, Alexander Wels <awels at redhat.com> wrote:
> On Tuesday, September 26, 2017 3:26:44 AM EDT Tomas Jelinek wrote:
> > On Tue, Sep 26, 2017 at 9:17 AM, Miroslava Voglova <mvoglova at redhat.com> wrote:
> > > Since 4.0, the architecture family has been renamed by the script
> > > 04_00_0080_rename_architecture_family. So 'HotPlugCpuSupported',
> > > 'HotUnplugCpuSupported', 'HotPlugMemorySupported',
> > > 'HotUnplugMemorySupported', 'IsMigrationSupported',
> > > 'IsMemorySnapshotSupported' and 'IsSuspendSupported' are all in the db
> > > with x86, not x86_64. From my point of view there is nothing wrong with
> > > that particular line in [1].
> > >
> > > It could be that somewhere in the code the host architecture is used
> > > instead of the architecture family when asking for the value of these
> > > ConfigValues. But that would have thrown an exception even before my
> > > patch, because '{"x86:"true","ppc":"true"}' was the default value for
> > > HotPlugMemorySupported.
> >
> > I see a code path where the cluster arch can be set to x86_64 - it is
> > always executed for external VMs (imported from external provider or
> > unmanaged). It does not happen all the time, it is only a fallback if the
> > arch type is not known/reported etc.
> >
> > @Alexander: by any chance, was this VM an unmanaged one? Or imported? In
> > logs you should find something like:
> > "Illegal architecture type: {}, replacing with x86_64" or "null
> > architecture type, replacing with x86_64, {}".
> >
> > Also, if you create a new VM, can you start it?
> >
>
> No, it's an old database though, from pre-4.0 times. These VMs have never
> been unmanaged or imported from external providers. I did not see that in
> the log; I had to manually step through the code to end up in the place that
> causes the NPE. Like I said before, line 23 in FeatureSupported.java is the
> culprit IMO. It does:
>
> String value = archOptions.get(arch.name());
>
> arch is an ArchitectureType, and arch.name() returns x86_64. If I understand
> correctly, they should have used arch.getFamily().name(), which does happen
> two lines below it. Honestly, I don't understand how any VMs are able to run
> with the code like that, since they all check whether memory hot plug is
> possible before starting, and that check runs through this piece of code,
> which, based on the contents of [1], should throw an NPE since the database
> should not contain the x86_64 entries.
>
The reason it did not work is that there was a syntax error in the
vdc_options table, causing
Config.<Map>getValue(feature, version.getValue()) to return null.
The VMs normally run because, if there is no entry for x86_64, it then
checks x86 two lines below.
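
To make the two cases concrete, here is a minimal, self-contained sketch of
the lookup as described in this thread. It is not the engine code: the
ArchLookupSketch class, the reduced ArchitectureType enum and the simplified
supportedInConfig signature are stand-ins I made up for illustration, and
the real method reads the map via Config.<Map>getValue(...) rather than
taking it as a parameter.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: names and signatures are assumptions, not the oVirt sources.
final class ArchLookupSketch {

    // Reduced, hypothetical stand-in for the engine's ArchitectureType enum.
    enum ArchitectureType {
        x86(null),
        x86_64(x86),
        ppc(null),
        ppc64(ppc);

        private final ArchitectureType family;

        ArchitectureType(ArchitectureType family) {
            this.family = family;
        }

        // A family constant is its own family.
        ArchitectureType getFamily() {
            return family == null ? this : family;
        }
    }

    // Look up the concrete arch first, then fall back to its family.
    static boolean supportedInConfig(Map<String, String> archOptions, ArchitectureType arch) {
        // Throws NullPointerException if archOptions itself is null,
        // which is the case of a config value that could not be parsed.
        String value = archOptions.get(arch.name());
        if (value == null) {
            // No x86_64 entry: fall back to the family entry (x86).
            value = archOptions.get(arch.getFamily().name());
        }
        return Boolean.parseBoolean(value);
    }

    public static void main(String[] args) {
        Map<String, String> options = new HashMap<>();
        options.put("x86", "true");
        options.put("ppc", "true");

        // A well-formed map: the family fallback finds the x86 entry.
        System.out.println(supportedInConfig(options, ArchitectureType.x86_64));

        // A null map (unparsable config value): the first get() already fails.
        try {
            supportedInConfig(null, ArchitectureType.x86_64);
        } catch (NullPointerException e) {
            System.out.println("NPE: no map to look up");
        }
    }
}
```

With a well-formed map, a missing x86_64 key is harmless because the family
lookup takes over; the exception only appears when there is no map at all.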
>
> > > [1] https://gerrit.ovirt.org/#/c/81464/7/packaging/dbscripts/upgrade/pre_upgrade/0000_config.sql
> > >
> > > On Tue, Sep 26, 2017 at 9:00 AM, Tomas Jelinek <tjelinek at redhat.com> wrote:
> > >> On Mon, Sep 25, 2017 at 10:08 PM, Roy Golan <rgolan at redhat.com> wrote:
> > >>> On Mon, 25 Sep 2017 at 22:52 Alexander Wels <awels at redhat.com> wrote:
> > >>>> On Monday, September 25, 2017 3:50:56 PM EDT Roy Golan wrote:
> > >>>> > So somewhere in the code somebody used the Arch and not the family.
> > >>>> > See the enum getFamily() method
> > >>>>
> > >>>> Yep, in particular line 23 of FeatureSupported.java.
> > >>>
> > >>> I meant the caller of the method on this line. Do you have it in the
> > >>> trace so we can see who passed x86_64 as arch?
> > >>>
> > >>>> > On Mon, 25 Sep 2017 at 22:31 Alexander Wels <awels at redhat.com> wrote:
> > >>>> > > On Monday, September 25, 2017 3:24:14 PM EDT Roy Golan wrote:
> > >>>> > > > what JRE are you using? any change with that?
> > >>>> > >
> > >>>> > > So I just figured out the problem, and it's really strange. It has
> > >>>> > > nothing to do with SSL, despite what the stack trace mentions. I
> > >>>> > > manually stepped through the code to see what was going on, and it
> > >>>> > > turns out it is failing in FeatureSupported.java, in the
> > >>>> > > supportedInConfig call from hotPlugMemory.
> > >>>> > >
> > >>>> > > Config.<Map>getValue(feature, version.getValue()) (version is 4.2)
> > >>>> > > returns a map containing x86=true and ppc=true. But when that map is
> > >>>> > > looked up with ArchitectureType.name(), it returns null, because
> > >>>> > > .name() returns x86_64. So it appears that sometime during the last
> > >>>> > > few months we dropped the _64 in the ArchitectureType, or at least
> > >>>> > > in the database.
> > >>
> > >> It looks a lot like it was introduced here: https://gerrit.ovirt.org/#/c/81464/
> > >>
> > >> @Mirka: what you think?
> > >>
> > >>>> > > As soon as I added a vdc_options entry that contains an x86_64 value
> > >>>> > > for that key, it started working. I have checked with Greg, who has
> > >>>> > > a fresh database: he can start VMs with no problem, and his database
> > >>>> > > contains x86 instead of x86_64.
> > >>>>
> > >>>> > > > On Mon, 25 Sep 2017 at 21:12 Alexander Wels <awels at redhat.com> wrote:
> > >>>> > > > > Hi guys,
> > >>>> > > > >
> > >>>> > > > > I seem to be having an issue starting VMs with the latest master.
> > >>>> > > > > Whenever I try to start a VM I get a null pointer exception, and
> > >>>> > > > > the VM doesn't start. I have debugged the engine, and it appears
> > >>>> > > > > that the null pointer happens after the engine tries to connect
> > >>>> > > > > to the host. In the stack trace I see SSLPeerUnverifiedException,
> > >>>> > > > > so it appears something went wrong with a certificate somewhere.
> > >>>> > > > >
> > >>>> > > > > I have put my hosts in maintenance and re-enrolled the
> > >>>> > > > > certificate, but that doesn't appear to be helping at all. Is
> > >>>> > > > > there any other place I need to look at to make sure the engine
> > >>>> > > > > can talk to the hosts? This appears to have started after I
> > >>>> > > > > upgraded Wildfly to 11, so it is possible it has something to do
> > >>>> > > > > with that as well.
> > >>>> > > > >
> > >>>> > > > > Any help figuring this out would be appreciated.
> > >>>> > > > >
> > >>>> > > > > Alexander
> > >>>> > > > > _______________________________________________
> > >>>> > > > > Devel mailing list
> > >>>> > > > > Devel at ovirt.org
> > >>>> > > > > http://lists.ovirt.org/mailman/listinfo/devel
> > >>>