[ovirt-devel] Unable to start VMs
Miroslava Voglova
mvoglova at redhat.com
Tue Sep 26 10:04:10 UTC 2017
Promised patch: https://gerrit.ovirt.org/#/c/82203/
On Tue, Sep 26, 2017 at 11:44 AM, Miroslava Voglova <mvoglova at redhat.com>
wrote:
> So we found where the problem is. Some time ago there was a patch, since
> reverted, that wrote two badly formatted JSON values into vdc_options. If
> your database dates from that time, the bug will appear, because those
> values are not updated on upgrade (they are already in the database).
>
> To fix this, it's enough to change 'HotPlugMemorySupported' to the value
> '{"x86":"true","ppc":"true"}' for versions 4.0 - 4.2 and
> 'HotUnplugMemorySupported' to '{"x86":"true","ppc":"true"}' for 4.2.
>
> I will post a patch that includes this update in 0000_config, in case
> anyone else has the same problem.
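A quick way to see why the old rows break the lookup is to check the stored string itself. The Java sketch below assumes the option value is a flat JSON object with string keys and string values; ConfigValueCheck and its regex-based parser are hypothetical stand-ins for illustration, not the engine's config-parsing code. The broken value '{"x86:"true","ppc":"true"}' quoted elsewhere in this thread fails to parse, while the corrected value yields the keys x86 and ppc.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: parses a flat JSON object such as {"x86":"true","ppc":"true"}.
// A simplified stand-in for a real JSON library, assumed here for illustration only.
final class ConfigValueCheck {
    // One "key":"value" pair, with optional whitespace around the tokens.
    private static final Pattern ENTRY =
            Pattern.compile("\\s*\"([^\"]*)\"\\s*:\\s*\"([^\"]*)\"\\s*");

    static Map<String, String> parseFlatObject(String json) {
        String s = json.trim();
        if (!s.startsWith("{") || !s.endsWith("}")) {
            throw new IllegalArgumentException("not a JSON object: " + json);
        }
        Map<String, String> result = new LinkedHashMap<>();
        String body = s.substring(1, s.length() - 1);
        if (body.trim().isEmpty()) {
            return result;
        }
        // Splitting on commas is safe only because these keys and values never contain commas.
        for (String part : body.split(",")) {
            Matcher m = ENTRY.matcher(part);
            if (!m.matches()) {
                throw new IllegalArgumentException("malformed entry " + part + " in " + json);
            }
            result.put(m.group(1), m.group(2));
        }
        return result;
    }

    static boolean isWellFormed(String json) {
        try {
            parseFlatObject(json);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String broken = "{\"x86:\"true\",\"ppc\":\"true\"}"; // closing quote after x86 is missing
        String fixed = "{\"x86\":\"true\",\"ppc\":\"true\"}";
        System.out.println(isWellFormed(broken));    // false
        System.out.println(parseFlatObject(fixed));  // {x86=true, ppc=true}
    }
}
```

A real check would use a proper JSON parser; the regex version only keeps the sketch dependency-free.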
>
> On Tue, Sep 26, 2017 at 10:12 AM, Fred Rolland <frolland at redhat.com>
> wrote:
>
>> I have the same issue with a new VM:
>>
>> 2017-09-26 11:07:59,255+03 ERROR [org.ovirt.engine.core.bll.GetArchitectureCapabilitiesQuery] (default task-2) [c066bc1d-4048-4b5f-bdb6-74fd813aa82e] Query 'GetArchitectureCapabilitiesQuery' failed: null
>> 2017-09-26 11:07:59,255+03 ERROR [org.ovirt.engine.core.bll.GetArchitectureCapabilitiesQuery] (default task-2) [c066bc1d-4048-4b5f-bdb6-74fd813aa82e] Exception: java.lang.NullPointerException
>> at org.ovirt.engine.core.common.FeatureSupported.supportedInConfig(FeatureSupported.java:23) [common.jar:]
>> at org.ovirt.engine.core.common.FeatureSupported.hotUnplugMemory(FeatureSupported.java:43) [common.jar:]
>> at org.ovirt.engine.core.bll.GetArchitectureCapabilitiesQuery.isSupported(GetArchitectureCapabilitiesQuery.java:66) [bll.jar:]
>> at org.ovirt.engine.core.bll.GetArchitectureCapabilitiesQuery.getMap(GetArchitectureCapabilitiesQuery.java:36) [bll.jar:]
>> at org.ovirt.engine.core.bll.GetArchitectureCapabilitiesQuery.executeQueryCommand(GetArchitectureCapabilitiesQuery.java:22) [bll.jar:]
>> at org.ovirt.engine.core.bll.QueriesCommandBase.executeCommand(QueriesCommandBase.java:106) [bll.jar:]
>> at org.ovirt.engine.core.dal.VdcCommandBase.execute(VdcCommandBase.java:33) [dal.jar:]
>> at org.ovirt.engine.core.bll.executor.DefaultBackendQueryExecutor.execute(DefaultBackendQueryExecutor.java:14) [bll.jar:]
>>
>> Then:
>>
>> 2017-09-26 11:08:08,397+03 ERROR [org.ovirt.engine.core.vdsbroker.CreateVDSCommand] (org.ovirt.thread.EE-ManagedThreadFactory-engine-Thread-11) [9eb76d15-2bd5-49a9-a7d8-c73cfd55282c] Failed to create VM: null
>> 2017-09-26 11:08:08,398+03 ERROR [org.ovirt.engine.core.vdsbroker.CreateVDSCommand] (org.ovirt.thread.EE-ManagedThreadFactory-engine-Thread-11) [9eb76d15-2bd5-49a9-a7d8-c73cfd55282c] Command 'CreateVDSCommand(CreateVDSCommandParameters:{hostId='b6b6a226-8d4f-4929-85d1-b218eceee99e', vmId='f15ddd07-408b-4665-aede-a9efc5716dc7', vm='VM [FEDORA_CINDER]'})' execution failed: java.lang.NullPointerException
>>
>>
>> On Tue, Sep 26, 2017 at 10:26 AM, Tomas Jelinek <tjelinek at redhat.com>
>> wrote:
>>
>>>
>>>
>>> On Tue, Sep 26, 2017 at 9:17 AM, Miroslava Voglova <mvoglova at redhat.com>
>>> wrote:
>>>
>>>> Starting with 4.0, the architecture family was renamed by the script
>>>> 04_00_0080_rename_architecture_family. So 'HotPlugCpuSupported',
>>>> 'HotUnplugCpuSupported', 'HotPlugMemorySupported',
>>>> 'HotUnplugMemorySupported', 'IsMigrationSupported',
>>>> 'IsMemorySnapshotSupported' and 'IsSuspendSupported' are all stored in the
>>>> db with x86, not x86_64. In my view, there is nothing wrong with that
>>>> particular line in [1].
>>>>
>>>> It could be that somewhere in the code the host architecture, rather than
>>>> the architecture family, is used when asking for the value of these
>>>> ConfigValues. But that would have thrown an exception even before my
>>>> patch, because '{"x86:"true","ppc":"true"}' was the default value for
>>>> HotPlugMemorySupported.
>>>>
>>>
>>> I see a code path where the cluster arch can be set to x86_64; it is
>>> always executed for external VMs (imported from an external provider or
>>> unmanaged). It does not happen all the time; it is only a fallback if the
>>> arch type is not known or not reported, etc.
>>>
>>> @Alexander: by any chance, was this VM an unmanaged one? Or imported? In
>>> logs you should find something like:
>>> "Illegal architecture type: {}, replacing with x86_64" or "null
>>> architecture type, replacing with x86_64, {}".
>>>
>>> Also, if you create a new VM, can you start it?
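The fallback Tomas describes might look roughly like the Java sketch below. Only the wording of the two warning messages is modeled on this thread; the ArchFallback class, its nested Arch enum, and resolveArchitecture are hypothetical simplifications, not the engine's real code. The relevant detail is that the fallback hands back the concrete x86_64 value rather than the x86 family.

```java
import java.util.logging.Logger;

// Hypothetical sketch of an "unknown architecture" fallback; not the engine's actual code.
final class ArchFallback {
    private static final Logger LOG = Logger.getLogger(ArchFallback.class.getName());

    // Hypothetical subset of concrete architecture names.
    enum Arch { x86_64, ppc64, ppc64le }

    // Resolve a reported architecture string, falling back to x86_64 when it is missing or unknown.
    static Arch resolveArchitecture(String reported) {
        if (reported == null) {
            LOG.warning("null architecture type, replacing with x86_64");
            return Arch.x86_64;
        }
        try {
            return Arch.valueOf(reported);
        } catch (IllegalArgumentException e) {
            LOG.warning("Illegal architecture type: " + reported + ", replacing with x86_64");
            return Arch.x86_64;
        }
    }
}
```

Any later config lookup keyed by the returned value's name would then search for "x86_64", which is not one of the family keys stored in the database.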
>>>
>>>
>>>>
>>>>
>>>> [1] https://gerrit.ovirt.org/#/c/81464/7/packaging/dbscripts/upgrade/pre_upgrade/0000_config.sql
>>>>
>>>> On Tue, Sep 26, 2017 at 9:00 AM, Tomas Jelinek <tjelinek at redhat.com>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Mon, Sep 25, 2017 at 10:08 PM, Roy Golan <rgolan at redhat.com> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, 25 Sep 2017 at 22:52 Alexander Wels <awels at redhat.com> wrote:
>>>>>>
>>>>>>> On Monday, September 25, 2017 3:50:56 PM EDT Roy Golan wrote:
>>>>>>> > So somewhere in the code somebody used the Arch and not the family.
>>>>>>> > See the enum getFamily() method.
>>>>>>> >
>>>>>>>
>>>>>>> Yep, in particular line 23 of FeatureSupported.java.
>>>>>>>
>>>>>> I meant the caller of the method on this line. Do you have it in the
>>>>>> trace so we can see who passed x86_64 as the arch?
>>>>>>
>>>>>>> > On Mon, 25 Sep 2017 at 22:31 Alexander Wels <awels at redhat.com>
>>>>>>> > wrote:
>>>>>>> > > On Monday, September 25, 2017 3:24:14 PM EDT Roy Golan wrote:
>>>>>>> > > > What JRE are you using? Any change with that?
>>>>>>> > >
>>>>>>> > > So I just figured out the problem, and it's really strange. It has
>>>>>>> > > nothing to do with SSL, despite what the stack trace mentions. I
>>>>>>> > > manually stepped through the code to see what was going on, and it
>>>>>>> > > turns out it is failing in FeatureSupported.java, in the
>>>>>>> > > supportedInConfig call from hotPlugMemory.
>>>>>>> > >
>>>>>>> > > The Config.<Map>getValue(feature, version.getValue()) call (version
>>>>>>> > > is 4.2) returns a map containing x86=true and ppc=true. But then it
>>>>>>> > > looks up ArchitectureType.name() in that map and gets null, because
>>>>>>> > > .name() returns x86_64. It appears that sometime during the last few
>>>>>>> > > months we dropped the _64 in the ArchitectureType, or at least in the
>>>>>>> > > database.
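The mismatch described here can be reproduced in isolation with the Java sketch below. It assumes the config value has already been turned into a Map<String, Boolean> keyed by architecture family; the FamilyLookup class and its nested Arch enum are hypothetical, simplified mirrors of FeatureSupported and ArchitectureType, not the engine's actual code. Looking up the concrete name x86_64 returns null, and unboxing that null into a boolean throws a NullPointerException, which is consistent with the trace in this thread; looking up getFamily() instead finds the x86 key.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of a family-keyed feature lookup; not the engine's code.
final class FamilyLookup {
    // Each concrete architecture knows its family; families map to themselves.
    enum Arch {
        x86(null), ppc(null),
        x86_64(x86), ppc64(ppc), ppc64le(ppc);

        private final Arch family;

        Arch(Arch family) {
            this.family = family;
        }

        Arch getFamily() {
            return family == null ? this : family;
        }
    }

    // Buggy variant: keys the lookup by the concrete architecture name ("x86_64").
    static boolean supportedByName(Map<String, Boolean> config, Arch arch) {
        return config.get(arch.name()); // null for "x86_64"; auto-unboxing then throws NPE
    }

    // Fixed variant: keys the lookup by the architecture family ("x86").
    static boolean supportedByFamily(Map<String, Boolean> config, Arch arch) {
        return config.get(arch.getFamily().name());
    }

    public static void main(String[] args) {
        Map<String, Boolean> hotPlugMemory = new HashMap<>();
        hotPlugMemory.put("x86", true);
        hotPlugMemory.put("ppc", true);

        System.out.println(supportedByFamily(hotPlugMemory, Arch.x86_64)); // true
        try {
            supportedByName(hotPlugMemory, Arch.x86_64);
        } catch (NullPointerException e) {
            System.out.println("NullPointerException from the name-keyed lookup");
        }
    }
}
```

Keying by family keeps the lookup valid no matter which concrete architecture (including a fallback x86_64) reaches it.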
>>>>>>>
>>>>>>
>>>>> It looks a lot like it was introduced here:
>>>>> https://gerrit.ovirt.org/#/c/81464/
>>>>>
>>>>> @Mirka: what do you think?
>>>>>
>>>>>
>>>>>>> > >
>>>>>>> > > As soon as I added a vdc_options row that contains an x86_64 value
>>>>>>> > > for that key, it started working. I have now checked with Greg, who
>>>>>>> > > has a fresh database: he can start VMs with no problem, and his
>>>>>>> > > database contains x86 instead of x86_64.
>>>>>>> > >
>>>>>>> > > > On Mon, 25 Sep 2017 at 21:12 Alexander Wels <awels at redhat.com>
>>>>>>> > > > wrote:
>>>>>>> > > > > Hi guys,
>>>>>>> > > > >
>>>>>>> > > > > I seem to be having an issue starting VMs with the latest master.
>>>>>>> > > > > Whenever I try to start a VM I get a null pointer exception, and
>>>>>>> > > > > the VM doesn't start. I have debugged the engine, and it appears
>>>>>>> > > > > that the null pointer happens after the engine tries to connect to
>>>>>>> > > > > the host. In the stack trace I see SSLPeerUnverifiedException, so
>>>>>>> > > > > it appears something went wrong with a certificate somewhere.
>>>>>>> > > > >
>>>>>>> > > > > I have put my hosts in maintenance and re-enrolled the
>>>>>>> > > > > certificate, but that doesn't appear to be helping at all. Is there
>>>>>>> > > > > any other place I need to look to make sure the engine can talk to
>>>>>>> > > > > the hosts? This appears to have started after I upgraded Wildfly
>>>>>>> > > > > to 11, so it is possible it has something to do with that as well.
>>>>>>> > > > >
>>>>>>> > > > > Any help figuring this out would be appreciated.
>>>>>>> > > > >
>>>>>>> > > > > Alexander
>>>>>>> > > > > _______________________________________________
>>>>>>> > > > > Devel mailing list
>>>>>>> > > > > Devel at ovirt.org
>>>>>>> > > > > http://lists.ovirt.org/mailman/listinfo/devel
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>>
>
More information about the Devel
mailing list