[ovirt-devel] OST: HE vm does not restart on HC setup

Francesco Romani fromani at redhat.com
Wed Feb 22 15:34:49 UTC 2017


On 02/22/2017 03:42 PM, Yaniv Kaul wrote:
>
>
> On Wed, Feb 22, 2017 at 4:32 PM Francesco Romani
> <fromani at redhat.com> wrote:
>
>     On 02/22/2017 01:53 PM, Simone Tiraboschi wrote:
>>
>>
>>     On Wed, Feb 22, 2017 at 1:33 PM, Simone Tiraboschi
>>     <stirabos at redhat.com> wrote:
>>
>>         When ovirt-ha-agent checks the status of the engine VM we get:
>>
>>         2017-02-21 22:21:14,738-0500 ERROR (jsonrpc/2) [api] FINISH getStats error=Virtual machine does not exist: {'vmId': u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'} (api:69)
>>         Traceback (most recent call last):
>>           File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 67, in method
>>             ret = func(*args, **kwargs)
>>           File "/usr/share/vdsm/API.py", line 335, in getStats
>>             vm = self.vm
>>           File "/usr/share/vdsm/API.py", line 130, in vm
>>             raise exception.NoSuchVM(vmId=self._UUID)
>>         NoSuchVM: Virtual machine does not exist: {'vmId': u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'}
>>
>>         While in ovirt-ha-agent logs we have:
>>
>>         MainThread::INFO::2017-02-21 22:21:18,583::hosted_engine::453::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state UnknownLocalVmState (score: 3400)
>>
>>         ...
>>
>>         MainThread::INFO::2017-02-21 22:21:31,199::state_decorators::25::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check) Unknown local engine vm status no actions taken
>>
>>         Probably it's a bug or a regression somewhere on master.
>>
>>     On the ovirt-ha-broker side, detection is based on a strict string
>>     match on the error message: it must be exactly
>>     'Virtual machine does not exist' for us to set the down status;
>>     otherwise we set the unknown status, as in this case:
>>     https://gerrit.ovirt.org/gitweb?p=ovirt-hosted-engine-ha.git;a=blob;f=ovirt_hosted_engine_ha/broker/submonitors/engine_health.py;h=d633cb860b811e84021221771bf706a9a4ac1d63;hb=refs/heads/master#l54
>>
>>      
>>     Adding Francesco here to understand if something has recently
>>     changed there on vdsm side.
>     It has changed indeed; we had a series of changes which added
>     context to some exceptions. I believe the straw that broke the
>     camel's back was I32ec3f86f8d53f8412f4c0526fc85e2a42e30ea5. It is
>     unfortunate that this change broke HA. Could you perhaps fix it
>     by checking that the message *begins* with that string, and/or
>     by checking the error code? Bests,
>
>
> On the bright side, this is exactly why we need o-s-t running
> Hosted-Engine - though we probably need to exercise more HE flows
> (global and local maint., for example).
> On the downside, how come I32ec3f86f8d53f8412f4c0526fc85e2a42e30ea5
> was merged on Jan 1st, and we only saw the regression now? Is there
> another bug that hid this one until now?
> Y.
>

It was merged on master on Jan 29 and backported to the 4.1 branch on
Feb 8 (because it was part of the vmleases feature, needed on 4.1.z).
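
For what it's worth, the check suggested above could look roughly like
the sketch below. It is not the actual engine_health.py code:
vm_is_missing, the two constants, and the shape of the status dict are
made up for illustration, and the numeric error code in particular is an
assumption that should be verified against the vdsm error definitions.

```python
# Illustrative sketch only; all names here are hypothetical and not taken
# from ovirt-hosted-engine-ha or vdsm.
NO_SUCH_VM_MESSAGE = 'Virtual machine does not exist'
# Assumed numeric code for "no such VM"; verify against vdsm before use.
NO_SUCH_VM_CODE = 1


def vm_is_missing(status):
    """Return True if a status dict reports that the VM does not exist.

    `status` is assumed to look like {'code': int, 'message': str}.
    """
    # Preferred: the error code does not change when context is added
    # to the human-readable message.
    if status.get('code') == NO_SUCH_VM_CODE:
        return True
    # Fallback: prefix match, so extra context appended after the base
    # message (such as the vmId) does not defeat the check.
    return status.get('message', '').startswith(NO_SUCH_VM_MESSAGE)


old_style = {'code': 1, 'message': 'Virtual machine does not exist'}
new_style = {
    'code': 1,
    'message': "Virtual machine does not exist: "
               "{'vmId': u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'}",
}
```

With an exact comparison, new_style['message'] no longer equals the bare
string, which is what leaves the agent in the unknown state; both the
code check and the prefix check accept it.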

Bests,

-- 
Francesco Romani
Red Hat Engineering Virtualization R & D
IRC: fromani
