On Tue, Sep 26, 2017 at 1:58 PM, Alexander Wels <awels(a)redhat.com> wrote:
On Tuesday, September 26, 2017 3:26:44 AM EDT Tomas Jelinek wrote:
> On Tue, Sep 26, 2017 at 9:17 AM, Miroslava Voglova <mvoglova(a)redhat.com> wrote:
> > Since 4.0 the architecture family has been renamed by the script
> > 04_00_0080_rename_architecture_family. So 'HotPlugCpuSupported',
> > 'HotUnplugCpuSupported', 'HotPlugMemorySupported',
> > 'HotUnplugMemorySupported', 'IsMigrationSupported',
> > 'IsMemorySnapshotSupported' and 'IsSuspendSupported' are all in the db
> > with x86, not x86_64. From my point of view there is nothing wrong with
> > that particular line in [1].
> >
> > It could be that somewhere in the code the host architecture is used
> > instead of the architecture family when asking for the value of these
> > ConfigValues. But that would have thrown an exception even before my
> > patch, because '{"x86:"true","ppc":"true"}' was the default value for
> > HotPlugMemorySupported.
>
> I see a code path where the cluster arch can be set to x86_64 - it is
> always executed for external VMs (imported from external provider or
> unmanaged). It does not happen all the time, it is only a fallback if the
> arch type is not known/reported etc.
>
> @Alexander: by any chance, was this VM an unmanaged one? Or imported? In
> logs you should find something like:
> "Illegal architecture type: {}, replacing with x86_64" or "null
> architecture type, replacing with x86_64, {}".
>
> Also, if you create a new VM, can you start it?
>
No, it's an old database though, from pre-4.0 times. These VMs have never been
unmanaged or imported from external providers. I did not see that in the log;
I had to manually step through the code to end up in the right place that
causes the NPE. Like I said before, line 23 in FeatureSupported.java is the
culprit IMO. It does:
String value = archOptions.get(arch.name());
arch is ArchitectureType, and arch.name() returns x86_64, and if I understand
it right they should have done arch.getFamily().name(), which does happen 2
lines below it. Honestly I don't understand how any VMs are able to run with
the code like that, since they all check to see if you can do memory hot plug
before starting, and that check runs through this piece of code, which based
on the contents of [1] should throw an NPE since the database should not
contain the x86_64 entries.
The reason it did not work is that there was a syntax error in the
vdc_options table, causing
Config.<Map>getValue(feature, version.getValue()) to return null.
The VMs normally run because, if there is no entry for x86_64, then it
checks x86 two lines below.
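
To make the lookup order concrete, here is a rough sketch of the behaviour described above. It is not the actual FeatureSupported.java code: the class ArchLookupSketch, the reduced Arch enum (including the ppc64 value) and the boolean parsing are simplified stand-ins assumed for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical stand-in for the lookup discussed in this thread;
// this is NOT the actual FeatureSupported.java source.
final class ArchLookupSketch {

    // Assumed, reduced version of ArchitectureType with only a few values.
    enum Arch {
        x86, x86_64, ppc, ppc64;

        // Maps a concrete architecture to its family (x86_64 -> x86, ppc64 -> ppc).
        Arch getFamily() {
            switch (this) {
                case x86_64: return x86;
                case ppc64:  return ppc;
                default:     return this;
            }
        }
    }

    // Looks up the exact architecture first, then falls back to its family.
    static boolean supportedInConfig(Map<String, String> archOptions, Arch arch) {
        // Throws NullPointerException if archOptions itself is null, which is
        // what happens when the config value could not be turned into a map.
        String value = archOptions.get(arch.name());
        if (value == null) {
            // No entry for x86_64: check the family name (x86) instead.
            value = archOptions.get(arch.getFamily().name());
        }
        // Boolean.parseBoolean(null) is false, so a missing entry means "unsupported".
        return Boolean.parseBoolean(value);
    }

    public static void main(String[] args) {
        Map<String, String> archOptions = new HashMap<>();
        archOptions.put("x86", "true");
        archOptions.put("ppc", "true");
        // x86_64 has no entry of its own, so the x86 family entry is used.
        System.out.println(supportedInConfig(archOptions, Arch.x86_64)); // prints true
    }
}
```

With a map holding only x86 and ppc, the x86_64 lookup still succeeds through the family fallback. The NPE only shows up when the map itself is null.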
> > [1] https://gerrit.ovirt.org/#/c/81464/7/packaging/dbscripts/upgrade/pre_upgrade/0000_config.sql
> >
> > On Tue, Sep 26, 2017 at 9:00 AM, Tomas Jelinek <tjelinek(a)redhat.com> wrote:
> >> On Mon, Sep 25, 2017 at 10:08 PM, Roy Golan <rgolan(a)redhat.com> wrote:
> >>> On Mon, 25 Sep 2017 at 22:52 Alexander Wels <awels(a)redhat.com> wrote:
> >>>> On Monday, September 25, 2017 3:50:56 PM EDT Roy Golan wrote:
> >>>> > So somewhere in the code somebody used the Arch and not the family.
> >>>> > See the enum getFamily() method
> >>>>
> >>>> Yep, in particular line 23 of FeatureSupported.java.
> >>>
> >>> I meant the caller of the method on this line. Do you have it in the
> >>> trace so we can see who passed x86_64 as arch?
> >>>
> >>>> > On Mon, 25 Sep 2017 at 22:31 Alexander Wels <awels(a)redhat.com> wrote:
> >>>> > > On Monday, September 25, 2017 3:24:14 PM EDT Roy Golan wrote:
> >>>> > > > what JRE are you using? any change with that?
> >>>> > >
> >>>> > > So I just figured out the problem, and it's really strange. It has
> >>>> > > nothing to do with the SSL as the stack trace is mentioning. I
> >>>> > > manually stepped through the code to see what was going on and it
> >>>> > > turns out it is failing in FeatureSupported.java in the
> >>>> > > supportedInConfig call from hotPlugMemory.
> >>>> > >
> >>>> > > The Config.<Map>getValue(feature, version.getValue()) (version is
> >>>> > > 4.2) is returning a map containing x86=true and ppc=true. But when
> >>>> > > it looks this up by ArchitectureType.name() it returns null, because
> >>>> > > .name() returns x86_64. So it appears that sometime during the last
> >>>> > > few months we dropped the _64 in the ArchitectureType, or at least
> >>>> > > in the database.
> >>
> >> It looks a lot like it was introduced here: https://gerrit.ovirt.org/#/c/81464/
> >>
> >> @Mirka: what do you think?
> >>
> >>>> > > As soon as I added a vdc_options entry that contains an x86_64 value
> >>>> > > for that key, it started working. Now I have checked with Greg, who
> >>>> > > has a fresh database, that he can start VMs with no problem, and his
> >>>> > > database contains x86 instead of x86_64.
> >>>>
> >>>> > > > On Mon, 25 Sep 2017 at 21:12 Alexander Wels <awels(a)redhat.com> wrote:
> >>>> > > > > Hi guys,
> >>>> > > > >
> >>>> > > > > I seem to be having an issue starting VMs with the latest master.
> >>>> > > > > Whenever I try to start a VM I get a null pointer exception, and
> >>>> > > > > the VM doesn't start. I have debugged the engine, and it appears
> >>>> > > > > that the null pointer happens after the engine tries to connect
> >>>> > > > > to the host. In the stack trace I see SSLPeerUnverifiedException,
> >>>> > > > > so it appears something went wrong with a certificate somewhere.
> >>>> > > > >
> >>>> > > > > I have put my hosts in maintenance and re-enrolled the
> >>>> > > > > certificate, but that doesn't appear to be helping at all. Is
> >>>> > > > > there any other place I need to look at to make sure the engine
> >>>> > > > > can talk to the hosts? This appears to have started after I
> >>>> > > > > upgraded Wildfly to 11, so it is possible it has something to do
> >>>> > > > > with that as well.
> >>>> > > > >
> >>>> > > > > Any help figuring this out would be appreciated.
> >>>> > > > >
> >>>> > > > > Alexander
> >>>> > > > >
> >>>> > > > > _______________________________________________
> >>>> > > > > Devel mailing list
> >>>> > > > > Devel(a)ovirt.org
> >>>> > > > > http://lists.ovirt.org/mailman/listinfo/devel