
Hi guys,
I seem to be having an issue starting VMs with the latest master. Whenever I try to start a VM I get a null pointer exception, and the VM doesn't start. I have debugged the engine, and it appears that the null pointer happens after the engine tries to connect to the host. In the stack trace I see SSLPeerUnverifiedException, so it appears something went wrong with a certificate somewhere.
I have put my hosts in maintenance and re-enrolled the certificate, but that doesn't appear to help at all. Is there any other place I need to look at to make sure the engine can talk to the hosts? This appears to have started after I upgraded Wildfly to 11, so it is possible it has something to do with that as well.
Any help figuring this out would be appreciated.
Alexander

what JRE are you using? any change with that?
_______________________________________________ Devel mailing list Devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/devel

On Monday, September 25, 2017 3:24:14 PM EDT Roy Golan wrote:
what JRE are you using? any change with that?
So I just figured out the problem, and it's really strange. It has nothing to do with SSL, despite what the stack trace mentions. I manually stepped through the code to see what was going on, and it turns out it is failing in FeatureSupported.java, in the supportedInConfig call from hotPlugMemory.
The Config.<Map>getValue(feature, version.getValue()) call (version is 4.2) is returning a map containing x86=true and ppc=true. But when it compares this against ArchitectureType.name(), it gets null, because .name() returns x86_64. Now it appears that sometime during the last few months we dropped the _64 in the ArchitectureType, or at least in the database.
As soon as I added a vdc_options entry that contains an x86_64 value for that key, it started working. I have checked with Greg, who has a fresh database: he can start VMs with no problem, and his database contains x86 instead of x86_64.
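To make the failure mode described above concrete, here is a minimal Java sketch. It is not the actual oVirt code: it assumes the config value behaves like a Map<String, Boolean> and that the NullPointerException comes from auto-unboxing a missing entry. The names ArchLookupSketch, lookup, isSupported and sampleConfig are hypothetical and exist only for this illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the lookup described in the thread; not oVirt code.
class ArchLookupSketch {

    // Returns the raw map value, which is null when the key is absent.
    static Boolean lookup(Map<String, Boolean> config, String archKey) {
        return config.get(archKey);
    }

    // Null-safe variant: a missing key is treated as "not supported".
    static boolean isSupported(Map<String, Boolean> config, String archKey) {
        return Boolean.TRUE.equals(config.get(archKey));
    }

    // Mirrors the map contents reported in the thread: x86=true, ppc=true.
    static Map<String, Boolean> sampleConfig() {
        Map<String, Boolean> config = new HashMap<>();
        config.put("x86", true);
        config.put("ppc", true);
        return config;
    }

    public static void main(String[] args) {
        Map<String, Boolean> config = sampleConfig();
        System.out.println(lookup(config, "x86"));    // key present
        System.out.println(lookup(config, "x86_64")); // key absent, so null
        try {
            // Assigning a null Boolean to a primitive boolean throws on unboxing.
            boolean supported = lookup(config, "x86_64");
            System.out.println(supported);
        } catch (NullPointerException e) {
            System.out.println("NullPointerException on unboxing");
        }
        System.out.println(isSupported(config, "x86_64"));
    }
}
```

The null-safe variant only hides the symptom; if the key used for the lookup and the keys stored in the database disagree, the feature would silently be reported as unsupported.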

So somewhere in the code somebody used the Arch and not the family. See the enum getFamily() method

On Monday, September 25, 2017 3:50:56 PM EDT Roy Golan wrote:
So somewhere in the code somebody used the Arch and not the family. See the enum getFamily() method
Yep, in particular line 23 of FeatureSupported.java.
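The arch-versus-family distinction discussed here can be sketched roughly as follows. The enum is a hypothetical stand-in: ArchSketch and FamilyLookupSketch are made-up names and the set of constants is an assumption. It only shows the idea of mapping a concrete architecture to its family before using it as a config key.

```java
// Hypothetical enum: each concrete architecture may point at its family.
enum ArchSketch {
    x86(null),
    x86_64(x86),
    ppc(null),
    ppc64(ppc);

    private final ArchSketch family;

    ArchSketch(ArchSketch family) {
        this.family = family;
    }

    // A constant without an explicit family is its own family.
    ArchSketch getFamily() {
        return family == null ? this : family;
    }
}

class FamilyLookupSketch {
    // Builds the key used to look up per-architecture config values.
    static String configKey(ArchSketch arch) {
        return arch.getFamily().name();
    }

    public static void main(String[] args) {
        System.out.println(ArchSketch.x86_64.name());        // the concrete architecture name
        System.out.println(configKey(ArchSketch.x86_64));    // the family name
        System.out.println(configKey(ArchSketch.ppc64));
    }
}
```

With a scheme like this, a caller that passes name() instead of getFamily().name() would ask for a key such as x86_64 that the stored map does not contain.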

On Mon, 25 Sep 2017 at 22:52 Alexander Wels <awels@redhat.com> wrote:
On Monday, September 25, 2017 3:50:56 PM EDT Roy Golan wrote:
So somewhere in the code somebody used the Arch and not the family. See the enum getFamily() method
Yep, in particular line 23 of FeatureSupported.java.
I meant the caller of the method on this line. Do you have it in the trace so we can see who passed x86_64 as arch?

On Mon, 25 Sep 2017 at 22:31 Alexander Wels <awels@redhat.com> wrote:
The Config.<Map>getValue(feature, version.getValue()) call (version is 4.2) is returning a map containing x86=true and ppc=true. But when it compares this against ArchitectureType.name(), it gets null, because .name() returns x86_64. Now it appears that sometime during the last few months we dropped the _64 in the ArchitectureType, or at least in the database.
It looks a lot like this was introduced here: https://gerrit.ovirt.org/#/c/81464/
@Mirka: what do you think?

Since 4.0, the architecture family has been renamed by the script 04_00_0080_rename_architecture_family. So 'HotPlugCpuSupported', 'HotUnplugCpuSupported', 'HotPlugMemorySupported', 'HotUnplugMemorySupported', 'IsMigrationSupported', 'IsMemorySnapshotSupported' and 'IsSuspendSupported' are all in the db with x86, not x86_64. From my point of view there is nothing wrong with that particular line in [1].
It could be that somewhere in the code the host architecture is used instead of the architecture family when asking for the value of these ConfigValues. But that would have thrown an exception even before my patch, because '{"x86:"true","ppc":"true"}' was the default value for HotPlugMemorySupported.
[1] https://gerrit.ovirt.org/#/c/81464/7/packaging/dbscripts/upgrade/pre_upgrade/0000_config.sql

On Tue, Sep 26, 2017 at 9:17 AM, Miroslava Voglova <mvoglova@redhat.com> wrote:
From 4.0 architecture family was renamed in script 04_00_0080_rename_architecture_family. So 'HotPlugCpuSupported', 'HotUnplugCpuSupported', 'HotPlugMemorySupported', 'HotUnplugMemorySupported', 'IsMigrationSupported', 'IsMemorySnapshotSupported' and 'IsSuspendSupported' are all in db with x86 not x86_64. In my point of view nothing wrong with that particular line in [1].
Could be that somewhere in code is not used architecture family, but host architecture, when asked for value of this ConfigValues. But that would throw exception even before my patch, because '{"x86:"true","ppc":"true"}' was default value for HotPlugMemorySupported.
I see a code path where the cluster arch can be set to x86_64 - it is always executed for external VMs (imported from external provider or unmanaged). It does not happen all the time, it is only a fallback if the arch type is not known/reported etc.
@Alexander: by any chance, was this VM an unmanaged one? Or imported? In logs you should find something like: "Illegal architecture type: {}, replacing with x86_64" or "null architecture type, replacing with x86_64, {}".
Also, if you create a new VM, can you start it?
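The fallback described in this message might look roughly like the sketch below. This is a guess at the shape of such a code path, not the engine's implementation: ArchFallbackSketch, resolveArch, the KNOWN list and the vmName parameter are all hypothetical. Only the two log message texts are taken from the message above.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of an "unknown arch falls back to x86_64" code path.
class ArchFallbackSketch {

    // Assumed set of architectures that the reporting side may legitimately send.
    static final List<String> KNOWN = Arrays.asList("x86_64", "ppc64");

    static String resolveArch(String reported, String vmName) {
        if (reported == null) {
            System.out.println("null architecture type, replacing with x86_64, " + vmName);
            return "x86_64";
        }
        if (!KNOWN.contains(reported)) {
            System.out.println("Illegal architecture type: " + reported + ", replacing with x86_64");
            return "x86_64";
        }
        return reported;
    }

    public static void main(String[] args) {
        System.out.println(resolveArch(null, "external-vm"));    // falls back
        System.out.println(resolveArch("sparc", "external-vm")); // falls back
        System.out.println(resolveArch("ppc64", "external-vm")); // kept as reported
    }
}
```

If a value produced this way were later used directly as a key into a map keyed by family (x86, ppc), the lookup would miss, which is why the log lines are worth searching for.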

I have the same issue with a new VM:
2017-09-26 11:07:59,255+03 ERROR [org.ovirt.engine.core.bll.GetArchitectureCapabilitiesQuery] (default task-2) [c066bc1d-4048-4b5f-bdb6-74fd813aa82e] Query 'GetArchitectureCapabilitiesQuery' failed: null
2017-09-26 11:07:59,255+03 ERROR [org.ovirt.engine.core.bll.GetArchitectureCapabilitiesQuery] (default task-2) [c066bc1d-4048-4b5f-bdb6-74fd813aa82e] Exception: java.lang.NullPointerException
    at org.ovirt.engine.core.common.FeatureSupported.supportedInConfig(FeatureSupported.java:23) [common.jar:]
    at org.ovirt.engine.core.common.FeatureSupported.hotUnplugMemory(FeatureSupported.java:43) [common.jar:]
    at org.ovirt.engine.core.bll.GetArchitectureCapabilitiesQuery.isSupported(GetArchitectureCapabilitiesQuery.java:66) [bll.jar:]
    at org.ovirt.engine.core.bll.GetArchitectureCapabilitiesQuery.getMap(GetArchitectureCapabilitiesQuery.java:36) [bll.jar:]
    at org.ovirt.engine.core.bll.GetArchitectureCapabilitiesQuery.executeQueryCommand(GetArchitectureCapabilitiesQuery.java:22) [bll.jar:]
    at org.ovirt.engine.core.bll.QueriesCommandBase.executeCommand(QueriesCommandBase.java:106) [bll.jar:]
    at org.ovirt.engine.core.dal.VdcCommandBase.execute(VdcCommandBase.java:33) [dal.jar:]
    at org.ovirt.engine.core.bll.executor.DefaultBackendQueryExecutor.execute(DefaultBackendQueryExecutor.java:14) [bll.jar:]
Then:
2017-09-26 11:08:08,397+03 ERROR [org.ovirt.engine.core.vdsbroker.CreateVDSCommand] (org.ovirt.thread.EE-ManagedThreadFactory-engine-Thread-11) [9eb76d15-2bd5-49a9-a7d8-c73cfd55282c] Failed to create VM: null
2017-09-26 11:08:08,398+03 ERROR [org.ovirt.engine.core.vdsbroker.CreateVDSCommand] (org.ovirt.thread.EE-ManagedThreadFactory-engine-Thread-11) [9eb76d15-2bd5-49a9-a7d8-c73cfd55282c] Command 'CreateVDSCommand( CreateVDSCommandParameters:{hostId='b6b6a226-8d4f-4929-85d1-b218eceee99e', vmId='f15ddd07-408b-4665-aede-a9efc5716dc7', vm='VM [FEDORA_CINDER]'})' execution failed: java.lang.NullPointerException

So we found where is problem. There was patch before some time that was reverted and had two badly formatted jsons in value of vdc_option. If you have db from that time, the bug will appear, because the values are not updated (because they are already in db). To fix this, its enough to change 'HotPlugMemorySupported' to value '{"x86":"true","ppc":"true"}' for versions 4.0 - 4.2 and 'HotUnplugMemorySupported' to '{"x86":"true","ppc":"true"}' for 4.2. Will do the patch that will include this update in 0000_config in case anyone else had the same problem. On Tue, Sep 26, 2017 at 10:12 AM, Fred Rolland <frolland@redhat.com> wrote:
I have same issue with new VM :
2017-09-26 11:07:59,255+03 ERROR [org.ovirt.engine.core.bll. GetArchitectureCapabilitiesQuery] (default task-2) [c066bc1d-4048-4b5f-bdb6-74fd813aa82e] Query ' GetArchitectureCapabilitiesQuery' failed: null 2017-09-26 11:07:59,255+03 ERROR [org.ovirt.engine.core.bll. GetArchitectureCapabilitiesQuery] (default task-2) [c066bc1d-4048-4b5f-bdb6-74fd813aa82e] Exception: java.lang.NullPointerException at org.ovirt.engine.core.common.FeatureSupported.supportedInConfig(FeatureSupported.java:23) [common.jar:] at org.ovirt.engine.core.common.FeatureSupported.hotUnplugMemory(FeatureSupported.java:43) [common.jar:] at org.ovirt.engine.core.bll.GetArchitectureCapabilitiesQue ry.isSupported(GetArchitectureCapabilitiesQuery.java:66) [bll.jar:] at org.ovirt.engine.core.bll.GetArchitectureCapabilitiesQuery.getMap( GetArchitectureCapabilitiesQuery.java:36) [bll.jar:] at org.ovirt.engine.core.bll.GetArchitectureCapabilitiesQue ry.executeQueryCommand(GetArchitectureCapabilitiesQuery.java:22) [bll.jar:] at org.ovirt.engine.core.bll.QueriesCommandBase.executeCommand(QueriesCommandBase.java:106) [bll.jar:] at org.ovirt.engine.core.dal.VdcCommandBase.execute(VdcCommandBase.java:33) [dal.jar:] at org.ovirt.engine.core.bll.executor.DefaultBackendQueryExecutor. execute(DefaultBackendQueryExecutor.java:14) [bll.jar:]
Then:
2017-09-26 11:08:08,397+03 ERROR [org.ovirt.engine.core.vdsbroker.CreateVDSCommand] (org.ovirt.thread.EE-ManagedThreadFactory-engine-Thread-11) [9eb76d15-2bd5-49a9-a7d8-c73cfd55282c] Failed to create VM: null 2017-09-26 11:08:08,398+03 ERROR [org.ovirt.engine.core.vdsbroker.CreateVDSCommand] (org.ovirt.thread.EE-ManagedThreadFactory-engine-Thread-11) [9eb76d15-2bd5-49a9-a7d8-c73cfd55282c] Command 'CreateVDSCommand( CreateVDSCommandParameters:{hostId='b6b6a226-8d4f-4929-85d1-b218eceee99e', vmId='f15ddd07-408b-4665-aede-a9efc5716dc7', vm='VM [FEDORA_CINDER]'})' execution failed: java.lang.NullPointerException
On Tue, Sep 26, 2017 at 10:26 AM, Tomas Jelinek <tjelinek@redhat.com> wrote:
On Tue, Sep 26, 2017 at 9:17 AM, Miroslava Voglova <mvoglova@redhat.com> wrote:
From 4.0 architecture family was renamed in script 04_00_0080_rename_architecture_family. So 'HotPlugCpuSupported', 'HotUnplugCpuSupported', 'HotPlugMemo rySupported', 'HotUnplugMemorySupported', 'IsMigrationSupported', 'IsMemorySnapshotSupported' and 'IsSuspendSupported' are all in db with x86 not x86_64. In my point of view nothing wrong with that particular line in [1].
Could be that somewhere in code is not used architecture family, but host architecture, when asked for value of this ConfigValues. But that would throw exception even before my patch, because '{"x86:"true","ppc":"true"}' was default value for HotPlugMemorySupported.
I see a code path where the cluster arch can be set to x86_64 - it is always executed for external VMs (imported from external provider or unmanaged). It does not happen all the time, it is only a fallback if the arch type is not known/reported etc.
@Alexander: by any chance, was this VM an unmanaged one? Or imported? In logs you should find something like: "Illegal architecture type: {}, replacing with x86_64" or "null architecture type, replacing with x86_64, {}".
Also, if you create a new VM, can you start it?
[1] *https://gerrit.ovirt.org/#/c/81464/7/packaging/dbscripts/upgrade/pre_upgrade... <https://gerrit.ovirt.org/#/c/81464/7/packaging/dbscripts/upgrade/pre_upgrade/0000_config.sql>*
On Tue, Sep 26, 2017 at 9:00 AM, Tomas Jelinek <tjelinek@redhat.com> wrote:
On Mon, Sep 25, 2017 at 10:08 PM, Roy Golan <rgolan@redhat.com> wrote:
On Mon, 25 Sep 2017 at 22:52 Alexander Wels <awels@redhat.com> wrote:
On Monday, September 25, 2017 3:50:56 PM EDT Roy Golan wrote:
> So somewhere in the code somebody used the Arch and not the family. See the enum getFamily() method
Yep, in particular line 23 of FeatureSupported.java.
I meant the caller of the method on this line. Do you have it in the trace so we can see who passed x86_64 as arch ?
On Mon, 25 Sep 2017 at 22:31 Alexander Wels <awels@redhat.com> wrote:
> On Monday, September 25, 2017 3:24:14 PM EDT Roy Golan wrote:
> > what JRE are you using? any change with that?
>
> So I just figured out the problem, and its really strange. It has nothing to do with the SSL as the stack trace is mentioning. I manually stepped through the code to see what was going on and it turns out it is failing in FeatureSupported.java in supportedInConfig call from hotPlugMemory.
>
> The Config.<Map>getValue(feature, version.getValue()) (version is 4.2) is returning a map containing x86=true and ppc=true. But then it compares this to ArchitectureType.name() it returns null, because .name() return x86_64. No it appears that sometime during the last few months we dropped the _64 in the ArchitectureType, or at least in the database.
It looks a lot like it was introduced here: https://gerrit.ovirt.org/#/c/81464/
@Mirka: what do you think?

Promised patch: https://gerrit.ovirt.org/#/c/82203/
On Tue, Sep 26, 2017 at 11:44 AM, Miroslava Voglova <mvoglova@redhat.com> wrote:
So we found where the problem is. Some time ago there was a patch, later reverted, that had two badly formatted JSON values in vdc_options. If you have a db from that time, the bug will appear, because the values are not updated (they are already in the db).
To fix this, it's enough to change 'HotPlugMemorySupported' to the value '{"x86":"true","ppc":"true"}' for versions 4.0 - 4.2 and 'HotUnplugMemorySupported' to '{"x86":"true","ppc":"true"}' for 4.2.
I will do a patch that includes this update in 0000_config in case anyone else has the same problem.
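For anyone who wants to check their own option values for this kind of damage, here is a minimal sketch, not code from the engine. It assumes the value is a flat JSON object with quoted string keys and values, as in the examples above; the class name ArchOptionValueCheck and the method isWellFormed are made up for illustration, and the regex is deliberately strict rather than a general JSON parser.

```java
import java.util.regex.Pattern;

public class ArchOptionValueCheck {
    // Matches a flat JSON object whose keys and values are all double-quoted
    // strings, e.g. {"x86":"true","ppc":"true"}. Not a general JSON parser.
    private static final Pattern FLAT_STRING_MAP = Pattern.compile(
            "\\{\\s*\"[^\"]*\"\\s*:\\s*\"[^\"]*\""
            + "(\\s*,\\s*\"[^\"]*\"\\s*:\\s*\"[^\"]*\")*\\s*\\}");

    // Returns true only if the whole value has the expected shape.
    public static boolean isWellFormed(String optionValue) {
        return optionValue != null && FLAT_STRING_MAP.matcher(optionValue).matches();
    }

    public static void main(String[] args) {
        // Corrected value from the fix described above.
        System.out.println(isWellFormed("{\"x86\":\"true\",\"ppc\":\"true\"}")); // prints true
        // Value with the missing quote after x86, as quoted in this thread.
        System.out.println(isWellFormed("{\"x86:\"true\",\"ppc\":\"true\"}"));   // prints false
    }
}
```

The missing quote makes the first key run on as `x86:`, so the pattern never finds the key/value separator and the match fails.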
On Tue, Sep 26, 2017 at 10:12 AM, Fred Rolland <frolland@redhat.com> wrote:
I have the same issue with a new VM:
2017-09-26 11:07:59,255+03 ERROR [org.ovirt.engine.core.bll.GetArchitectureCapabilitiesQuery] (default task-2) [c066bc1d-4048-4b5f-bdb6-74fd813aa82e] Query 'GetArchitectureCapabilitiesQuery' failed: null
2017-09-26 11:07:59,255+03 ERROR [org.ovirt.engine.core.bll.GetArchitectureCapabilitiesQuery] (default task-2) [c066bc1d-4048-4b5f-bdb6-74fd813aa82e] Exception: java.lang.NullPointerException
    at org.ovirt.engine.core.common.FeatureSupported.supportedInConfig(FeatureSupported.java:23) [common.jar:]
    at org.ovirt.engine.core.common.FeatureSupported.hotUnplugMemory(FeatureSupported.java:43) [common.jar:]
    at org.ovirt.engine.core.bll.GetArchitectureCapabilitiesQuery.isSupported(GetArchitectureCapabilitiesQuery.java:66) [bll.jar:]
    at org.ovirt.engine.core.bll.GetArchitectureCapabilitiesQuery.getMap(GetArchitectureCapabilitiesQuery.java:36) [bll.jar:]
    at org.ovirt.engine.core.bll.GetArchitectureCapabilitiesQuery.executeQueryCommand(GetArchitectureCapabilitiesQuery.java:22) [bll.jar:]
    at org.ovirt.engine.core.bll.QueriesCommandBase.executeCommand(QueriesCommandBase.java:106) [bll.jar:]
    at org.ovirt.engine.core.dal.VdcCommandBase.execute(VdcCommandBase.java:33) [dal.jar:]
    at org.ovirt.engine.core.bll.executor.DefaultBackendQueryExecutor.execute(DefaultBackendQueryExecutor.java:14) [bll.jar:]
Then:
2017-09-26 11:08:08,397+03 ERROR [org.ovirt.engine.core.vdsbroker.CreateVDSCommand] (org.ovirt.thread.EE-ManagedThreadFactory-engine-Thread-11) [9eb76d15-2bd5-49a9-a7d8-c73cfd55282c] Failed to create VM: null
2017-09-26 11:08:08,398+03 ERROR [org.ovirt.engine.core.vdsbroker.CreateVDSCommand] (org.ovirt.thread.EE-ManagedThreadFactory-engine-Thread-11) [9eb76d15-2bd5-49a9-a7d8-c73cfd55282c] Command 'CreateVDSCommand( CreateVDSCommandParameters:{hostId='b6b6a226-8d4f-4929-85d1-b218eceee99e', vmId='f15ddd07-408b-4665-aede-a9efc5716dc7', vm='VM [FEDORA_CINDER]'})' execution failed: java.lang.NullPointerException

Thanks, it fixed my issue.
On Tue, Sep 26, 2017 at 1:04 PM, Miroslava Voglova <mvoglova@redhat.com> wrote:
Promised patch: https://gerrit.ovirt.org/#/c/82203/

On Tuesday, September 26, 2017 3:26:44 AM EDT Tomas Jelinek wrote:
I see a code path where the cluster arch can be set to x86_64 - it is always executed for external VMs (imported from external provider or unmanaged). It does not happen all the time, it is only a fallback if the arch type is not known/reported etc.
@Alexander: by any chance, was this VM an unmanaged one? Or imported? In logs you should find something like: "Illegal architecture type: {}, replacing with x86_64" or "null architecture type, replacing with x86_64, {}".
Also, if you create a new VM, can you start it?
No, it's an old database though, from pre-4.0 times. These VMs have never been unmanaged or imported from external providers. I did not see that in the log; I had to manually step through the code to end up in the right place that causes the NPE. Like I said before, line 23 in FeatureSupported.java is the culprit IMO. It does:
String value = archOptions.get(arch.name());
arch is an ArchitectureType, and arch.name() returns x86_64; if I understand right, it should have used arch.getFamily().name(), which does happen 2 lines below it. Honestly I don't understand how any VMs are able to run with the code like that, since they all check whether memory hot plug is possible before starting, and that check runs through this piece of code, which based on the contents of [1] should throw an NPE since the database should not contain the x86_64 entries.
[1] https://gerrit.ovirt.org/#/c/81464/7/packaging/dbscripts/upgrade/pre_upgrade/0000_config.sql

On Tue, Sep 26, 2017 at 1:58 PM, Alexander Wels <awels@redhat.com> wrote:
No, it's an old database though, from pre-4.0 times. These VMs have never been unmanaged or imported from external providers. I did not see that in the log; I had to manually step through the code to end up in the right place that causes the NPE. Like I said before, line 23 in FeatureSupported.java is the culprit IMO. It does:
String value = archOptions.get(arch.name());
arch is an ArchitectureType, and arch.name() returns x86_64; if I understand right, it should have used arch.getFamily().name(), which does happen 2 lines below it. Honestly I don't understand how any VMs are able to run with the code like that, since they all check whether memory hot plug is possible before starting, and that check runs through this piece of code, which based on the contents of [1] should throw an NPE since the database should not contain the x86_64 entries.
The reason it did not work is that there was a syntax error in the vdc_options table, causing Config.<Map>getValue(feature, version.getValue()); to return null. The VMs normally run because, if there is no entry for x86_64, it then checks x86 two lines below.
[1] https://gerrit.ovirt.org/#/c/81464/7/packaging/dbscripts/upgrade/pre_upgrade/0000_config.sql
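To make the lookup order described above concrete, here is a rough sketch of it. This is not the real FeatureSupported code: the Arch enum is a simplified stand-in for ArchitectureType, sampleOptions() is a made-up helper, and the null guard is my own assumption about how one could avoid the NPE when the stored value cannot be parsed. The actual fix in this thread was correcting the value in the database.

```java
import java.util.HashMap;
import java.util.Map;

public class SupportedInConfigSketch {
    // Hypothetical, simplified stand-in for the engine's ArchitectureType enum.
    enum Arch {
        x86(null), ppc(null), x86_64(x86), ppc64(ppc);

        private final Arch family;

        Arch(Arch family) {
            this.family = family;
        }

        // A family constant is its own family.
        Arch getFamily() {
            return family == null ? this : family;
        }
    }

    // archOptions is the parsed per-architecture map; in this sketch it is null
    // when the stored value could not be parsed.
    static boolean supportedInConfig(Map<String, String> archOptions, Arch arch) {
        if (archOptions == null) {
            return false; // assumption of this sketch: unparsable config means "not supported"
        }
        String value = archOptions.get(arch.name());          // exact architecture first
        if (value == null) {
            value = archOptions.get(arch.getFamily().name()); // then fall back to the family
        }
        return Boolean.parseBoolean(value);                   // a null value parses to false
    }

    // Hypothetical helper: the map a well-formed {"x86":"true","ppc":"true"} would yield.
    static Map<String, String> sampleOptions() {
        Map<String, String> options = new HashMap<>();
        options.put("x86", "true");
        options.put("ppc", "true");
        return options;
    }

    public static void main(String[] args) {
        System.out.println(supportedInConfig(sampleOptions(), Arch.x86_64)); // prints true
        System.out.println(supportedInConfig(null, Arch.x86_64));            // prints false
    }
}
```

With a well-formed map, x86_64 misses on the first lookup and hits on the family lookup, which is why VMs normally start. Without the guard, a null map would throw on the first get() call.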
participants (5)
- Alexander Wels
- Fred Rolland
- Miroslava Voglova
- Roy Golan
- Tomas Jelinek