[lago-devel] [ovirt-devel] OST: HE vm does not restart on HC setup

Simone Tiraboschi stirabos at redhat.com
Wed Feb 22 14:26:42 UTC 2017


On Wed, Feb 22, 2017 at 3:22 PM, Michal Skrivanek <
michal.skrivanek at redhat.com> wrote:

>
> On 22 Feb 2017, at 13:53, Simone Tiraboschi <stirabos at redhat.com> wrote:
>
>
>
> On Wed, Feb 22, 2017 at 1:33 PM, Simone Tiraboschi <stirabos at redhat.com>
> wrote:
>
>> When ovirt-ha-agent checks the status of the engine VM we get:
>>
>> 2017-02-21 22:21:14,738-0500 ERROR (jsonrpc/2) [api] FINISH getStats error=Virtual machine does not exist: {'vmId': u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'} (api:69)
>> Traceback (most recent call last):
>>   File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 67, in method
>>     ret = func(*args, **kwargs)
>>   File "/usr/share/vdsm/API.py", line 335, in getStats
>>     vm = self.vm
>>   File "/usr/share/vdsm/API.py", line 130, in vm
>>     raise exception.NoSuchVM(vmId=self._UUID)
>> NoSuchVM: Virtual machine does not exist: {'vmId': u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'}
>>
>>
>> While in ovirt-ha-agent logs we have:
>>
>> MainThread::INFO::2017-02-21 22:21:18,583::hosted_engine::453::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state UnknownLocalVmState (score: 3400)
>>
>> ...
>>
>> MainThread::INFO::2017-02-21 22:21:31,199::state_decorators::25::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check) Unknown local engine vm status no actions taken
>>
>> Probably it's a bug or a regression somewhere on master.
>>
>
> On the ovirt-ha-broker side, detection relies on a strict string match: the
> error message must be exactly 'Virtual machine does not exist' for the VM to
> be reported as down; otherwise the status is set to unknown, as in this case:
> https://gerrit.ovirt.org/gitweb?p=ovirt-hosted-engine-ha.git;a=blob;f=ovirt_hosted_engine_ha/broker/submonitors/engine_health.py;h=d633cb860b811e84021221771bf706a9a4ac1d63;hb=refs/heads/master#l54
>
> Adding Francesco here to check whether something has recently changed on
> the vdsm side.
>
>
> That’s not very robust error handling.
> Yes, the text changed: the VM id was added.
> And yes, I guess it may change again at any time.
>

I agree; we are going to switch to checking the error code instead of the
message text:
https://gerrit.ovirt.org/#/c/72891
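
To illustrate the difference, here is a minimal sketch (not the actual patch).
It assumes the getVmStats reply carries a status dict with 'code' and
'message' keys, and that NoSuchVM maps to numeric code 1; both are assumptions
to verify against the vdsm sources. The function names are hypothetical.

```python
# Assumed numeric code for vdsm's NoSuchVM error; verify against vdsm.
NO_SUCH_VM_CODE = 1

# Exact text the strict string match expects.
NO_SUCH_VM_MESSAGE = "Virtual machine does not exist"


def classify_by_message(status):
    """Strict string match: breaks as soon as the message text changes."""
    if status["message"] == NO_SUCH_VM_MESSAGE:
        return "down"
    return "unknown"


def classify_by_code(status):
    """Code check: independent of the human-readable message."""
    if status["code"] == NO_SUCH_VM_CODE:
        return "down"
    return "unknown"


# Message as the broker expected it.
old_status = {"code": 1, "message": "Virtual machine does not exist"}

# Message as it appears in the vdsm.log above, with the VM id appended.
new_status = {
    "code": 1,
    "message": "Virtual machine does not exist: "
               "{'vmId': u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'}",
}

for label, status in (("old", old_status), ("new", new_status)):
    print(label, classify_by_message(status), classify_by_code(status))
```

With the new message text the string match falls through to "unknown", which
is the state where the agent takes no action, while the code check still
reports the VM as down.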


>
>
>
>>
>> On Wed, Feb 22, 2017 at 1:02 PM, Sandro Bonazzola <sbonazzo at redhat.com>
>> wrote:
>>
>>> Adding Lev
>>>
>>> On Wed, Feb 22, 2017 at 12:59 PM, Sahina Bose <sabose at redhat.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> On the HC setup, the HE VM is not restarted.
>>>> The agent.log has
>>>> MainThread::INFO::2017-02-21 22:09:58,022::state_machine::169::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Global metadata: {}
>>>> MainThread::INFO::2017-02-21 22:09:58,023::state_machine::177::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Local (id 1): {'engine-health': {'reason': 'failed to getVmStats', 'health': 'unknown', 'vm': 'unknown', 'detail': 'unknown'}, 'bridge': True, 'mem-free': 4079.0, 'maintenance': False, 'cpu-load': 0.0491, 'gateway': True}
>>>> ...
>>>> MainThread::INFO::2017-02-21 22:10:29,219::state_decorators::25::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check) Unknown local engine vm status no actions taken
>>>> MainThread::INFO::2017-02-21 22:10:29,219::brokerlink::111::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1487733029.22 type=state_transition detail=ReinitializeFSM-UnknownLocalVmState hostname='lago-hc-basic-suite-master-host0'
>>>> MainThread::INFO::2017-02-21 22:10:29,317::brokerlink::121::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (ReinitializeFSM-UnknownLocalVmState) sent? ignored
>>>>
>>>> and the vdsm.log
>>>>
>>>> 2017-02-21 22:09:11,962-0500 INFO  (libvirt/events) [virt.vm] (vmId='2ccc0ef0-cc31-45b8-8e91-a78fa4cad671') Changed state to Down: User shut down from within the guest (code=7) (vm:1269)
>>>> 2017-02-21 22:09:11,962-0500 INFO  (libvirt/events) [virt.vm] (vmId='2ccc0ef0-cc31-45b8-8e91-a78fa4cad671') Stopping connection (guestagent:429)
>>>>
>>>> 2017-02-21 22:09:29,727-0500 ERROR (jsonrpc/4) [api] FINISH getStats error=Virtual machine does not exist: {'vmId': u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'} (api:69)
>>>> Traceback (most recent call last):
>>>>   File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 67, in method
>>>>     ret = func(*args, **kwargs)
>>>>   File "/usr/share/vdsm/API.py", line 335, in getStats
>>>>     vm = self.vm
>>>>   File "/usr/share/vdsm/API.py", line 130, in vm
>>>>     raise exception.NoSuchVM(vmId=self._UUID)
>>>> NoSuchVM: Virtual machine does not exist: {'vmId': u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'}
>>>>
>>>>
>>>> What should I be looking for to identify the issue?
>>>>
>>>> The logs are at http://jenkins.ovirt.org/job/ovirt_master_hc-system-tests/lastCompletedBuild/artifact/exported-artifacts/test_logs/hc-basic-suite-master/post-002_bootstrap.py/lago-hc-basic-suite-master-host0
>>>>
>>>> thanks
>>>>
>>>> sahina
>>>>
>>>>
>>>> _______________________________________________
>>>> Devel mailing list
>>>> Devel at ovirt.org
>>>> http://lists.ovirt.org/mailman/listinfo/devel
>>>>
>>>
>>>
>>>
>>> --
>>> Sandro Bonazzola
>>> Better technology. Faster innovation. Powered by community collaboration.
>>> See how it works at redhat.com
>>>
>>
>>