On 22 Feb 2017, at 13:53, Simone Tiraboschi <stirabos@redhat.com> wrote:

On Wed, Feb 22, 2017 at 1:33 PM, Simone Tiraboschi <stirabos@redhat.com> wrote:

When ovirt-ha-agent checks the status of the engine VM we get:

2017-02-21 22:21:14,738-0500 ERROR (jsonrpc/2) [api] FINISH getStats error=Virtual machine does not exist: {'vmId': u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'} (api:69)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 67, in method
    ret = func(*args, **kwargs)
  File "/usr/share/vdsm/API.py", line 335, in getStats
    vm = self.vm
  File "/usr/share/vdsm/API.py", line 130, in vm
    raise exception.NoSuchVM(vmId=self._UUID)
NoSuchVM: Virtual machine does not exist: {'vmId': u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'}

While in ovirt-ha-agent logs we have:

Probably it's a bug or a regression somewhere on master.

MainThread::INFO::2017-02-21 22:21:18,583::hosted_engine::453::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state UnknownLocalVmState (score: 3400)
...
MainThread::INFO::2017-02-21 22:21:31,199::state_decorators::25::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check) Unknown local engine vm status no actions taken

On ovirt-ha-broker side the detection is based on a strict string match on the error message that is expected to be exactly 'Virtual machine does not exist' to set down status otherwise we set unknown status as in this case:

Adding Francesco here to understand if something has recently changed there on vdsm side.

That’s not a very robust code handling.

Yes, the text changed, the vm id was added.

And yes, it may change again any time I guess
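To illustrate the point about the strict match, here is a minimal sketch of why an exact comparison stops working once the vmId is appended to the message. The function names and the prefix comparison are hypothetical and are not taken from ovirt-ha-broker or vdsm; only the two message strings come from this thread.

```python
# Hypothetical sketch: none of these names come from ovirt-hosted-engine-ha
# or vdsm. Only the message texts are taken from the thread.
NO_SUCH_VM_TEXT = "Virtual machine does not exist"


def classify_strict(message):
    """Exact comparison: any text appended to the message yields 'unknown'."""
    return "down" if message == NO_SUCH_VM_TEXT else "unknown"


def classify_tolerant(message):
    """Prefix comparison: tolerates trailing details such as the vmId."""
    return "down" if message.startswith(NO_SUCH_VM_TEXT) else "unknown"


# Message as the strict match expects it, and as it appears in vdsm.log now.
old_text = "Virtual machine does not exist"
new_text = ("Virtual machine does not exist: "
            "{'vmId': u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'}")

print(classify_strict(old_text), classify_strict(new_text))
print(classify_tolerant(old_text), classify_tolerant(new_text))
```

A prefix match would still break if the leading text itself were reworded, so it only narrows the problem. Matching on a stable error code rather than on the text would likely be more robust, assuming the response exposes one.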
On Wed, Feb 22, 2017 at 1:02 PM, Sandro Bonazzola <sbonazzo@redhat.com> wrote:

Adding Lev

On Wed, Feb 22, 2017 at 12:59 PM, Sahina Bose <sabose@redhat.com> wrote:

Hi all,
On the HC setup, the HE VM is not restarted.
The agent.log has
MainThread::INFO::2017-02-21 22:09:58,022::state_machine::169::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Global metadata: {}
MainThread::INFO::2017-02-21 22:09:58,023::state_machine::177::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Local (id 1): {'engine-health': {'reason': 'failed to getVmStats', 'health': 'unknown', 'vm': 'unknown', 'detail': 'unknown'}, 'bridge': True, 'mem-free': 4079.0, 'maintenance': False, 'cpu-load': 0.0491, 'gateway': True}
...
MainThread::INFO::2017-02-21 22:10:29,219::state_decorators::25::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check) Unknown local engine vm status no actions taken
MainThread::INFO::2017-02-21 22:10:29,219::brokerlink::111::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1487733029.22 type=state_transition detail=ReinitializeFSM-UnknownLocalVmState hostname='lago-hc-basic-suite-master-host0'
MainThread::INFO::2017-02-21 22:10:29,317::brokerlink::121::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (ReinitializeFSM-UnknownLocalVmState) sent? ignored

and the vdsm.log
2017-02-21 22:09:11,962-0500 INFO (libvirt/events) [virt.vm] (vmId='2ccc0ef0-cc31-45b8-8e91-a78fa4cad671') Changed state to Down: User shut down from within the guest (code=7) (vm:1269)
2017-02-21 22:09:11,962-0500 INFO (libvirt/events) [virt.vm] (vmId='2ccc0ef0-cc31-45b8-8e91-a78fa4cad671') Stopping connection (guestagent:429)
2017-02-21 22:09:29,727-0500 ERROR (jsonrpc/4) [api] FINISH getStats error=Virtual machine does not exist: {'vmId': u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'} (api:69)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 67, in method
    ret = func(*args, **kwargs)
  File "/usr/share/vdsm/API.py", line 335, in getStats
    vm = self.vm
  File "/usr/share/vdsm/API.py", line 130, in vm
    raise exception.NoSuchVM(vmId=self._UUID)
NoSuchVM: Virtual machine does not exist: {'vmId': u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'}

What should I be looking for to identify the issue?

The logs are at http://jenkins.ovirt.org/job/ovirt_master_hc-system-tests/lastCompletedBuild/artifact/exported-artifacts/test_logs/hc-basic-suite-master/post-002_bootstrap.py/lago-hc-basic-suite-master-host0

thanks
sahina
_________________
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel

--
Sandro Bonazzola
Better technology. Faster innovation. Powered by community collaboration.
See how it works at redhat.com