[lago-devel] [ovirt-devel] OST: HE vm does not restart on HC setup

Simone Tiraboschi stirabos at redhat.com
Wed Feb 22 12:53:11 UTC 2017


On Wed, Feb 22, 2017 at 1:33 PM, Simone Tiraboschi <stirabos at redhat.com>
wrote:

> When ovirt-ha-agent checks the status of the engine VM we get:
>
> 2017-02-21 22:21:14,738-0500 ERROR (jsonrpc/2) [api] FINISH getStats error=Virtual machine does not exist: {'vmId': u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'} (api:69)
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 67, in method
>     ret = func(*args, **kwargs)
>   File "/usr/share/vdsm/API.py", line 335, in getStats
>     vm = self.vm
>   File "/usr/share/vdsm/API.py", line 130, in vm
>     raise exception.NoSuchVM(vmId=self._UUID)
> NoSuchVM: Virtual machine does not exist: {'vmId': u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'}
>
>
> While in ovirt-ha-agent logs we have:
>
> MainThread::INFO::2017-02-21 22:21:18,583::hosted_engine::453::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state UnknownLocalVmState (score: 3400)
>
> ...
>
> MainThread::INFO::2017-02-21 22:21:31,199::state_decorators::25::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check) Unknown local engine vm status no actions taken
>
> Probably it's a bug or a regression somewhere on master.
>

On the ovirt-ha-broker side, detection is based on a strict string match on
the error message: it must be exactly 'Virtual machine does not exist' for
the status to be set to down; otherwise we set the status to unknown, as in
this case:
https://gerrit.ovirt.org/gitweb?p=ovirt-hosted-engine-ha.git;a=blob;f=ovirt_hosted_engine_ha/broker/submonitors/engine_health.py;h=d633cb860b811e84021221771bf706a9a4ac1d63;hb=refs/heads/master#l54
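
To illustrate why a strict match could end up as unknown here, below is a
minimal sketch. It is not the actual engine_health.py code: the function
names and the constant are hypothetical, and it assumes the message the
broker receives carries the same vmId suffix that appears in the getStats
error quoted above. The prefix variant is only one possible way to make the
check more tolerant, not a proposed patch.

```python
# Hypothetical sketch, NOT the real ovirt-hosted-engine-ha code.
# EXPECTED_MESSAGE, classify_strict and classify_prefix are made-up names.
EXPECTED_MESSAGE = "Virtual machine does not exist"


def classify_strict(error_message):
    """Return 'down' only on an exact match, otherwise 'unknown'."""
    if error_message == EXPECTED_MESSAGE:
        return "down"
    return "unknown"


def classify_prefix(error_message):
    """More tolerant variant: accept any message that starts with the text."""
    if error_message.startswith(EXPECTED_MESSAGE):
        return "down"
    return "unknown"


# Assumption: the message reaches the broker with the vmId detail appended,
# as it is printed in the vdsm.log excerpt.
logged = ("Virtual machine does not exist: "
          "{'vmId': u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'}")

print(classify_strict(EXPECTED_MESSAGE))  # down
print(classify_strict(logged))            # unknown
print(classify_prefix(logged))            # down
```

If that assumption holds, any extra detail appended to the message is enough
to turn a "down" VM into an "unknown" one, which would match the
UnknownLocalVmState seen in the agent log.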

Adding Francesco here to understand whether something has recently changed
there on the vdsm side.


>
> On Wed, Feb 22, 2017 at 1:02 PM, Sandro Bonazzola <sbonazzo at redhat.com>
> wrote:
>
>> Adding Lev
>>
>> On Wed, Feb 22, 2017 at 12:59 PM, Sahina Bose <sabose at redhat.com> wrote:
>>
>>> Hi all,
>>>
>>> On the HC setup, the HE VM is not restarted.
>>> The agent.log has
>>> MainThread::INFO::2017-02-21 22:09:58,022::state_machine::169::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Global metadata: {}
>>> MainThread::INFO::2017-02-21 22:09:58,023::state_machine::177::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Local (id 1): {'engine-health': {'reason': 'failed to getVmStats', 'health': 'unknown', 'vm': 'unknown', 'detail': 'unknown'}, 'bridge': True, 'mem-free': 4079.0, 'maintenance': False, 'cpu-load': 0.0491, 'gateway': True}
>>> ...
>>> MainThread::INFO::2017-02-21 22:10:29,219::state_decorators::25::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check) Unknown local engine vm status no actions taken
>>> MainThread::INFO::2017-02-21 22:10:29,219::brokerlink::111::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1487733029.22 type=state_transition detail=ReinitializeFSM-UnknownLocalVmState hostname='lago-hc-basic-suite-master-host0'
>>> MainThread::INFO::2017-02-21 22:10:29,317::brokerlink::121::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (ReinitializeFSM-UnknownLocalVmState) sent? ignored
>>>
>>> and the vdsm.log
>>>
>>> 2017-02-21 22:09:11,962-0500 INFO  (libvirt/events) [virt.vm] (vmId='2ccc0ef0-cc31-45b8-8e91-a78fa4cad671') Changed state to Down: User shut down from within the guest (code=7) (vm:1269)
>>> 2017-02-21 22:09:11,962-0500 INFO  (libvirt/events) [virt.vm] (vmId='2ccc0ef0-cc31-45b8-8e91-a78fa4cad671') Stopping connection (guestagent:429)
>>>
>>> 2017-02-21 22:09:29,727-0500 ERROR (jsonrpc/4) [api] FINISH getStats error=Virtual machine does not exist: {'vmId': u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'} (api:69)
>>> Traceback (most recent call last):
>>>   File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 67, in method
>>>     ret = func(*args, **kwargs)
>>>   File "/usr/share/vdsm/API.py", line 335, in getStats
>>>     vm = self.vm
>>>   File "/usr/share/vdsm/API.py", line 130, in vm
>>>     raise exception.NoSuchVM(vmId=self._UUID)
>>> NoSuchVM: Virtual machine does not exist: {'vmId': u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'}
>>>
>>>
>>> What should I be looking for to identify the issue?
>>>
>>> The logs are at http://jenkins.ovirt.org/job/ovirt_master_hc-system-tests/lastCompletedBuild/artifact/exported-artifacts/test_logs/hc-basic-suite-master/post-002_bootstrap.py/lago-hc-basic-suite-master-host0
>>>
>>> thanks
>>>
>>> sahina
>>>
>>>
>>>
>>
>>
>>
>> --
>> Sandro Bonazzola
>> Better technology. Faster innovation. Powered by community collaboration.
>> See how it works at redhat.com
>>
>
>