[lago-devel] [ovirt-devel] OST: HE vm does not restart on HC setup
Yaniv Kaul
ykaul at redhat.com
Wed Feb 22 14:42:50 UTC 2017
On Wed, Feb 22, 2017 at 4:32 PM Francesco Romani <fromani at redhat.com> wrote:
> On 02/22/2017 01:53 PM, Simone Tiraboschi wrote:
>
>
>
> On Wed, Feb 22, 2017 at 1:33 PM, Simone Tiraboschi <stirabos at redhat.com>
> wrote:
>
> When ovirt-ha-agent checks the status of the engine VM we get:
>
> 2017-02-21 22:21:14,738-0500 ERROR (jsonrpc/2) [api] FINISH getStats error=Virtual machine does not exist: {'vmId': u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'} (api:69)
> Traceback (most recent call last):
> File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 67, in method
> ret = func(*args, **kwargs)
> File "/usr/share/vdsm/API.py", line 335, in getStats
> vm = self.vm
> File "/usr/share/vdsm/API.py", line 130, in vm
> raise exception.NoSuchVM(vmId=self._UUID)
> NoSuchVM: Virtual machine does not exist: {'vmId': u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'}
>
> While in ovirt-ha-agent logs we have:
>
> MainThread::INFO::2017-02-21 22:21:18,583::hosted_engine::453::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state UnknownLocalVmState (score: 3400)
>
> ...
>
> MainThread::INFO::2017-02-21 22:21:31,199::state_decorators::25::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check) Unknown local engine vm status no actions taken
>
> Probably it's a bug or a regression somewhere on master.
>
> On the ovirt-ha-broker side, detection relies on a strict string match: the
> error message must be exactly 'Virtual machine does not exist' for us to
> report the VM as down. Otherwise we report an unknown status, as in this case:
>
> https://gerrit.ovirt.org/gitweb?p=ovirt-hosted-engine-ha.git;a=blob;f=ovirt_hosted_engine_ha/broker/submonitors/engine_health.py;h=d633cb860b811e84021221771bf706a9a4ac1d63;hb=refs/heads/master#l54
>
> Adding Francesco here to understand if something has recently changed
> there on vdsm side.
>
> It has changed indeed; we had a series of changes which added context to
> some exceptions. I believe the straw that broke the camel's back was
> I32ec3f86f8d53f8412f4c0526fc85e2a42e30ea5. It is unfortunate that this
> change broke HA. Could you perhaps fix it by checking that the message
> *begins* with that string, and/or by checking the error code? Bests,
>
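For illustration, the prefix check suggested above might look roughly like
the sketch below. This is not the actual engine_health.py code: the helper
name classify_vm_error, its parameters, and the 'down'/'unknown' return
values are assumptions made up for this example, and the numeric error code
is passed in by the caller rather than hard-coded because its real value
should be verified against vdsm.

```python
# Hypothetical helper; the name, parameters and return values are assumptions
# for illustration, not the real ovirt-hosted-engine-ha implementation.
NO_SUCH_VM_PREFIX = "Virtual machine does not exist"


def classify_vm_error(message, code=None, no_vm_code=None):
    """Return 'down' if the error means the VM is absent, else 'unknown'."""
    # Prefer the error code when the caller knows which code means "no VM".
    if code is not None and no_vm_code is not None and code == no_vm_code:
        return "down"
    # Fall back to a prefix match, which tolerates context appended by vdsm.
    if message.startswith(NO_SUCH_VM_PREFIX):
        return "down"
    return "unknown"


old_style = "Virtual machine does not exist"
new_style = ("Virtual machine does not exist: "
             "{'vmId': u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'}")

# Strict equality recognises only the message without the added context.
print(old_style == NO_SUCH_VM_PREFIX, new_style == NO_SUCH_VM_PREFIX)
# The prefix match classifies both forms the same way.
print(classify_vm_error(old_style), classify_vm_error(new_style))
```

Matching on the error code, where available, is less fragile than matching
on message text, since the text can gain further context again later.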
On the bright side, this is exactly why we need o-s-t running Hosted-Engine
- though we probably need to exercise more HE flows (global and local
maint., for example).
On the downside, how come I32ec3f86f8d53f8412f4c0526fc85e2a42e30ea5 was
merged on Jan 1st, and we only saw the regression now? Is there another bug
that hid this one until now?
Y.
> --
> Francesco Romani
> Red Hat Engineering Virtualization R & D
> IRC: fromani
>