OST: HE vm does not restart on HC setup

Hi all, On the HC setup, the HE VM is not restarted. The agent.log has MainThread::INFO::2017-02-21 22:09:58,022::state_machine::169::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Global metadata: {} MainThread::INFO::2017-02-21 22:09:58,023::state_machine::177::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Local (id 1): {'engine-health': {'reason': 'failed to getVmStats', 'health': 'unknown', 'vm': 'unknown', 'detail': 'unknown'}, 'bridge': True, 'mem-free': 4079.0, 'maintenance': False, 'cpu-load': 0.0491, 'gateway': True} ... MainThread::INFO::2017-02-21 22:10:29,219::state_decorators::25::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check) Unknown local engine vm status no actions taken MainThread::INFO::2017-02-21 22:10:29,219::brokerlink::111::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1487733029.22 type=state_transition detail=ReinitializeFSM-UnknownLocalVmState hostname='lago-hc-basic-suite-master-host0' MainThread::INFO::2017-02-21 22:10:29,317::brokerlink::121::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (ReinitializeFSM-UnknownLocalVmState) sent? ignored and the vdsm.log 2017-02-21 22:09:11,962-0500 INFO (libvirt/events) [virt.vm] (vmId='2ccc0ef0-cc31-45b8-8e91-a78fa4cad671') Changed state to Down: User shut down from within the guest (code=7) (vm:1269) 2017-02-21 22:09:11,962-0500 INFO (libvirt/events) [virt.vm] (vmId='2ccc0ef0-cc31-45b8-8e91-a78fa4cad671') Stopping connection (guestagent:429) 2017-02-21 22:09:29,727-0500 ERROR (jsonrpc/4) [api] FINISH getStats error=Virtual machine does not exist: {'vmId': u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'} (api:69) Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 67, in method ret = func(*args, **kwargs) File "/usr/share/vdsm/API.py", line 335, in getStats vm = self.vm File "/usr/share/vdsm/API.py", line 130, in vm raise exception.NoSuchVM(vmId=self._UUID) NoSuchVM: Virtual machine does not exist: {'vmId': u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'} What should I be looking for to identify the issue? The logs are at http://jenkins.ovirt.org/job/ovirt_master_hc-system-tests/lastCompletedBuild... thanks sahina

Adding Lev On Wed, Feb 22, 2017 at 12:59 PM, Sahina Bose <sabose@redhat.com> wrote:
Hi all,
On the HC setup, the HE VM is not restarted. The agent.log has MainThread::INFO::2017-02-21 22:09:58,022::state_machine::169::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Global metadata: {} MainThread::INFO::2017-02-21 22:09:58,023::state_machine::177::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Local (id 1): {'engine-health': {'reason': 'failed to getVmStats', 'health': 'unknown', 'vm': 'unknown', 'detail': 'unknown'}, 'bridge': True, 'mem-free': 4079.0, 'maintenance': False, 'cpu-load': 0.0491, 'gateway': True} ... MainThread::INFO::2017-02-21 22:10:29,219::state_decorators::25::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check) Unknown local engine vm status no actions taken MainThread::INFO::2017-02-21 22:10:29,219::brokerlink::111::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1487733029.22 type=state_transition detail=ReinitializeFSM-UnknownLocalVmState hostname='lago-hc-basic-suite-master-host0' MainThread::INFO::2017-02-21 22:10:29,317::brokerlink::121::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (ReinitializeFSM-UnknownLocalVmState) sent? ignored
and the vdsm.log
2017-02-21 22:09:11,962-0500 INFO (libvirt/events) [virt.vm] (vmId='2ccc0ef0-cc31-45b8-8e91-a78fa4cad671') Changed state to Down: User shut down from within the guest (code=7) (vm:1269) 2017-02-21 22:09:11,962-0500 INFO (libvirt/events) [virt.vm] (vmId='2ccc0ef0-cc31-45b8-8e91-a78fa4cad671') Stopping connection (guestagent:429)
2017-02-21 22:09:29,727-0500 ERROR (jsonrpc/4) [api] FINISH getStats error=Virtual machine does not exist: {'vmId': u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'} (api:69) Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 67, in method ret = func(*args, **kwargs) File "/usr/share/vdsm/API.py", line 335, in getStats vm = self.vm File "/usr/share/vdsm/API.py", line 130, in vm raise exception.NoSuchVM(vmId=self._UUID) NoSuchVM: Virtual machine does not exist: {'vmId': u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'}
What should I be looking for to identify the issue?
The logs are at http://jenkins.ovirt.org/job/ovirt_master_hc-system-tests/lastCompletedBuild...
thanks
sahina
_______________________________________________ Devel mailing list Devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/devel
-- Sandro Bonazzola Better technology. Faster innovation. Powered by community collaboration. See how it works at redhat.com

When ovirt-ha-agent checks the status of the engine VM we get: 2017-02-21 22:21:14,738-0500 ERROR (jsonrpc/2) [api] FINISH getStats error=Virtual machine does not exist: {'vmId': u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'} (api:69) Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 67, in method ret = func(*args, **kwargs) File "/usr/share/vdsm/API.py", line 335, in getStats vm = self.vm File "/usr/share/vdsm/API.py", line 130, in vm raise exception.NoSuchVM(vmId=self._UUID) NoSuchVM: Virtual machine does not exist: {'vmId': u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'} While in ovirt-ha-agent logs we have: MainThread::INFO::2017-02-21 22:21:18,583::hosted_engine::453::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state UnknownLocalVmState (score: 3400) ... MainThread::INFO::2017-02-21 22:21:31,199::state_decorators::25::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check) Unknown local engine vm status no actions taken Probably it's a bug or a regression somewhere on master. On Wed, Feb 22, 2017 at 1:02 PM, Sandro Bonazzola <sbonazzo@redhat.com> wrote:
Adding Lev
On Wed, Feb 22, 2017 at 12:59 PM, Sahina Bose <sabose@redhat.com> wrote:
Hi all,
On the HC setup, the HE VM is not restarted. The agent.log has MainThread::INFO::2017-02-21 22:09:58,022::state_machine::169::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Global metadata: {} MainThread::INFO::2017-02-21 22:09:58,023::state_machine::177::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Local (id 1): {'engine-health': {'reason': 'failed to getVmStats', 'health': 'unknown', 'vm': 'unknown', 'detail': 'unknown'}, 'bridge': True, 'mem-free': 4079.0, 'maintenance': False, 'cpu-load': 0.0491, 'gateway': True} ... MainThread::INFO::2017-02-21 22:10:29,219::state_decorators::25::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check) Unknown local engine vm status no actions taken MainThread::INFO::2017-02-21 22:10:29,219::brokerlink::111::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1487733029.22 type=state_transition detail=ReinitializeFSM-UnknownLocalVmState hostname='lago-hc-basic-suite-master-host0' MainThread::INFO::2017-02-21 22:10:29,317::brokerlink::121::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (ReinitializeFSM-UnknownLocalVmState) sent? ignored
and the vdsm.log
2017-02-21 22:09:11,962-0500 INFO (libvirt/events) [virt.vm] (vmId='2ccc0ef0-cc31-45b8-8e91-a78fa4cad671') Changed state to Down: User shut down from within the guest (code=7) (vm:1269) 2017-02-21 22:09:11,962-0500 INFO (libvirt/events) [virt.vm] (vmId='2ccc0ef0-cc31-45b8-8e91-a78fa4cad671') Stopping connection (guestagent:429)
2017-02-21 22:09:29,727-0500 ERROR (jsonrpc/4) [api] FINISH getStats error=Virtual machine does not exist: {'vmId': u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'} (api:69) Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 67, in method ret = func(*args, **kwargs) File "/usr/share/vdsm/API.py", line 335, in getStats vm = self.vm File "/usr/share/vdsm/API.py", line 130, in vm raise exception.NoSuchVM(vmId=self._UUID) NoSuchVM: Virtual machine does not exist: {'vmId': u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'}
What should I be looking for to identify the issue?
The logs are at http://jenkins.ovirt.org/job/ovirt_master_hc-system-tests/lastCompletedBuild...
thanks
sahina
_______________________________________________ Devel mailing list Devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/devel
-- Sandro Bonazzola Better technology. Faster innovation. Powered by community collaboration. See how it works at redhat.com

On Wed, Feb 22, 2017 at 1:33 PM, Simone Tiraboschi <stirabos@redhat.com> wrote:
When ovirt-ha-agent checks the status of the engine VM we get:
2017-02-21 22:21:14,738-0500 ERROR (jsonrpc/2) [api] FINISH getStats error=Virtual machine does not exist: {'vmId': u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'} (api:69) Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 67, in method ret = func(*args, **kwargs) File "/usr/share/vdsm/API.py", line 335, in getStats vm = self.vm File "/usr/share/vdsm/API.py", line 130, in vm raise exception.NoSuchVM(vmId=self._UUID) NoSuchVM: Virtual machine does not exist: {'vmId': u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'}
While in ovirt-ha-agent logs we have:
MainThread::INFO::2017-02-21 22:21:18,583::hosted_engine::453::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state UnknownLocalVmState (score: 3400)
...
MainThread::INFO::2017-02-21 22:21:31,199::state_decorators::25::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check) Unknown local engine vm status no actions taken
Probably it's a bug or a regression somewhere on master.
On ovirt-ha-broker side the detection is based on a strict string match on the error message that is expected to be exactly 'Virtual machine does not exist' to set down status otherwise we set unknown status as in this case: https://gerrit.ovirt.org/gitweb?p=ovirt-hosted-engine-ha.git;a=blob;f=ovirt_... Adding Francesco here to understand if something has recently changed there on vdsm side.
On Wed, Feb 22, 2017 at 1:02 PM, Sandro Bonazzola <sbonazzo@redhat.com> wrote:
Adding Lev
On Wed, Feb 22, 2017 at 12:59 PM, Sahina Bose <sabose@redhat.com> wrote:
Hi all,
On the HC setup, the HE VM is not restarted. The agent.log has MainThread::INFO::2017-02-21 22:09:58,022::state_machine::169::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Global metadata: {} MainThread::INFO::2017-02-21 22:09:58,023::state_machine::177::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Local (id 1): {'engine-health': {'reason': 'failed to getVmStats', 'health': 'unknown', 'vm': 'unknown', 'detail': 'unknown'}, 'bridge': True, 'mem-free': 4079.0, 'maintenance': False, 'cpu-load': 0.0491, 'gateway': True} ... MainThread::INFO::2017-02-21 22:10:29,219::state_decorators::25::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check) Unknown local engine vm status no actions taken MainThread::INFO::2017-02-21 22:10:29,219::brokerlink::111::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1487733029.22 type=state_transition detail=ReinitializeFSM-UnknownLocalVmState hostname='lago-hc-basic-suite-master-host0' MainThread::INFO::2017-02-21 22:10:29,317::brokerlink::121::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (ReinitializeFSM-UnknownLocalVmState) sent? ignored
and the vdsm.log
2017-02-21 22:09:11,962-0500 INFO (libvirt/events) [virt.vm] (vmId='2ccc0ef0-cc31-45b8-8e91-a78fa4cad671') Changed state to Down: User shut down from within the guest (code=7) (vm:1269) 2017-02-21 22:09:11,962-0500 INFO (libvirt/events) [virt.vm] (vmId='2ccc0ef0-cc31-45b8-8e91-a78fa4cad671') Stopping connection (guestagent:429)
2017-02-21 22:09:29,727-0500 ERROR (jsonrpc/4) [api] FINISH getStats error=Virtual machine does not exist: {'vmId': u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'} (api:69) Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 67, in method ret = func(*args, **kwargs) File "/usr/share/vdsm/API.py", line 335, in getStats vm = self.vm File "/usr/share/vdsm/API.py", line 130, in vm raise exception.NoSuchVM(vmId=self._UUID) NoSuchVM: Virtual machine does not exist: {'vmId': u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'}
What should I be looking for to identify the issue?
The logs are at http://jenkins.ovirt.org/job/ovirt_master_hc-system-tests/lastCompletedBuild...
thanks
sahina
_______________________________________________ Devel mailing list Devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/devel
-- Sandro Bonazzola Better technology. Faster innovation. Powered by community collaboration. See how it works at redhat.com

On 22 Feb 2017, at 13:53, Simone Tiraboschi <stirabos@redhat.com> = wrote: =20 =20 =20 On Wed, Feb 22, 2017 at 1:33 PM, Simone Tiraboschi = <stirabos@redhat.com <mailto:stirabos@redhat.com>> wrote: When ovirt-ha-agent checks the status of the engine VM we get: 2017-02-21 22:21:14,738-0500 ERROR (jsonrpc/2) [api] FINISH getStats = error=3DVirtual machine does not exist: {'vmId': = u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'} (api:69) Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 67, = in method ret =3D func(*args, **kwargs) File "/usr/share/vdsm/API.py", line 335, in getStats vm =3D self.vm File "/usr/share/vdsm/API.py", line 130, in vm raise exception.NoSuchVM(vmId=3Dself._UUID) NoSuchVM: Virtual machine does not exist: {'vmId': = u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'} =20 While in ovirt-ha-agent logs we have: MainThread::INFO::2017-02-21 = 22:21:18,583::hosted_engine::453::ovirt_hosted_engine_ha.agent.hosted_engi= ne.HostedEngine::(start_monitoring) Current state UnknownLocalVmState = (score: 3400) ... MainThread::INFO::2017-02-21 = 22:21:31,199::state_decorators::25::ovirt_hosted_engine_ha.agent.hosted_en= gine.HostedEngine::(check) Unknown local engine vm status no actions = taken Probably it's a bug or a regression somewhere on master. =20 On ovirt-ha-broker side the detection is based on a strict string = match on the error message that is expected to be exactly 'Virtual = machine does not exist' to set down status otherwise we set unknown = status as in this case: = https://gerrit.ovirt.org/gitweb?p=3Dovirt-hosted-engine-ha.git;a=3Dblob;f=3D... ovirt_hosted_engine_ha/broker/submonitors/engine_health.py;h=3Dd633cb860b8= 11e84021221771bf706a9a4ac1d63;hb=3Drefs/heads/master#l54 = <https://gerrit.ovirt.org/gitweb?p=3Dovirt-hosted-engine-ha.git;a=3Dblob;f= =3Dovirt_hosted_engine_ha/broker/submonitors/engine_health.py;h=3Dd633cb86= 0b811e84021221771bf706a9a4ac1d63;hb=3Drefs/heads/master#l54> =20 Adding Francesco here to understand if something has recently changed =
--Apple-Mail=_F2D44161-4CD5-4109-BD15-4E9B9ED8862F Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=utf-8 there on vdsm side. That=E2=80=99s not a very robust code handling. Yes, the text changed, the vm id was added. And yes, it may change again any time I guess
=20 =20 =20 On Wed, Feb 22, 2017 at 1:02 PM, Sandro Bonazzola <sbonazzo@redhat.com = <mailto:sbonazzo@redhat.com>> wrote: Adding Lev =20 On Wed, Feb 22, 2017 at 12:59 PM, Sahina Bose <sabose@redhat.com = <mailto:sabose@redhat.com>> wrote: Hi all, =20 On the HC setup, the HE VM is not restarted. The agent.log has=20 MainThread::INFO::2017-02-21 = 22:09:58,022::state_machine::169::ovirt_hosted_engine_ha.agent.hosted_engi= ne.HostedEngine::(refresh) Global metadata: {} MainThread::INFO::2017-02-21 = 22:09:58,023::state_machine::177::ovirt_hosted_engine_ha.agent.hosted_engi= ne.HostedEngine::(refresh) Local (id 1): {'engine-health': {'reason': = 'failed to getVmStats', 'health': 'unknown', 'vm': 'unknown', 'detail': = 'unknown'}, 'bridge': True, 'mem-free': 4079.0, 'maintenance': False, = 'cpu-load': 0.0491, 'gateway': True} ... MainThread::INFO::2017-02-21 = 22:10:29,219::state_decorators::25::ovirt_hosted_engine_ha.agent.hosted_en= gine.HostedEngine::(check) Unknown local engine vm status no actions = taken MainThread::INFO::2017-02-21 = 22:10:29,219::brokerlink::111::ovirt_hosted_engine_ha.lib.br = <http://ovirt_hosted_engine_ha.lib.br/>okerlink.BrokerLink::(notify) = Trying: notify time=3D1487733029.22 type=3Dstate_transition = detail=3DReinitializeFSM-UnknownLocalVmState = hostname=3D'lago-hc-basic-suite-master-host0' MainThread::INFO::2017-02-21 = 22:10:29,317::brokerlink::121::ovirt_hosted_engine_ha.lib.br = <http://ovirt_hosted_engine_ha.lib.br/>okerlink.BrokerLink::(notify) = Success, was notification of state_transition = (ReinitializeFSM-UnknownLocalVmState) sent? ignored and the vdsm.log=20 =20 2017-02-21 22:09:11,962-0500 INFO (libvirt/events) [virt.vm] = (vmId=3D'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671') Changed state to Down: = User shut down from within the guest (code=3D7) (vm:1269) 2017-02-21 22:09:11,962-0500 INFO (libvirt/events) [virt.vm] = (vmId=3D'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671') Stopping connection = (guestagent:429) =20 2017-02-21 22:09:29,727-0500 ERROR (jsonrpc/4) [api] FINISH getStats = error=3DVirtual machine does not exist: {'vmId': = u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'} (api:69) Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 67, = in method ret =3D func(*args, **kwargs) File "/usr/share/vdsm/API.py", line 335, in getStats vm =3D self.vm File "/usr/share/vdsm/API.py", line 130, in vm raise exception.NoSuchVM(vmId=3Dself._UUID) NoSuchVM: Virtual machine does not exist: {'vmId': = u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'} =20 =20 What should I be looking for to identify the issue? =20 The logs are at = http://jenkins.ovirt.org/job/ovirt_master_hc-system-tests/lastCompletedBui= ld/artifact/exported-artifacts/test_logs/hc-basic-suite-master/post-002_bo= otstrap.py/lago-hc-basic-suite-master-host0 = <http://jenkins.ovirt.org/job/ovirt_master_hc-system-tests/lastCompletedBu= ild/artifact/exported-artifacts/test_logs/hc-basic-suite-master/post-002_b= ootstrap.py/lago-hc-basic-suite-master-host0> =20 thanks sahina =20 _______________________________________________ Devel mailing list Devel@ovirt.org <mailto:Devel@ovirt.org> http://lists.ovirt.org/mailman/listinfo/devel = <http://lists.ovirt.org/mailman/listinfo/devel> =20 =20 =20 --=20 Sandro Bonazzola Better technology. Faster innovation. Powered by community =
collaboration. > See how it works at redhat.com <http://redhat.com/> >=20 > _______________________________________________ > Devel mailing list > Devel@ovirt.org > http://lists.ovirt.org/mailman/listinfo/devel
--Apple-Mail=_F2D44161-4CD5-4109-BD15-4E9B9ED8862F Content-Transfer-Encoding: quoted-printable Content-Type: text/html; charset=utf-8 <html><head><meta http-equiv=3D"Content-Type" content=3D"text/html = charset=3Dutf-8"></head><body style=3D"word-wrap: break-word; = -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" = class=3D""><br class=3D""><div><blockquote type=3D"cite" class=3D""><div = class=3D"">On 22 Feb 2017, at 13:53, Simone Tiraboschi <<a = href=3D"mailto:stirabos@redhat.com" class=3D"">stirabos@redhat.com</a>>= wrote:</div><br class=3D"Apple-interchange-newline"><div class=3D""><div = dir=3D"ltr" class=3D""><br class=3D""><div class=3D"gmail_extra"><br = class=3D""><div class=3D"gmail_quote">On Wed, Feb 22, 2017 at 1:33 PM, = Simone Tiraboschi <span dir=3D"ltr" class=3D""><<a = href=3D"mailto:stirabos@redhat.com" target=3D"_blank" = class=3D"">stirabos@redhat.com</a>></span> wrote:<br = class=3D""><blockquote class=3D"gmail_quote" style=3D"margin:0px 0px 0px = 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div = dir=3D"ltr" class=3D"">When ovirt-ha-agent checks the status of the = engine VM we get:<div class=3D""><pre style=3D"" class=3D"">2017-02-21 = 22:21:14,738-0500 ERROR (jsonrpc/2) [api] FINISH getStats error=3DVirtual = machine does not exist: {'vmId': u'2ccc0ef0-cc31-45b8-8e91-<wbr = class=3D"">a78fa4cad671'} (api:69) Traceback (most recent call last): File "/usr/lib/python2.7/site-<wbr = class=3D"">packages/vdsm/common/api.py", line 67, in method ret =3D func(*args, **kwargs) File "/usr/share/vdsm/API.py", line 335, in getStats vm =3D self.vm File "/usr/share/vdsm/API.py", line 130, in vm raise exception.NoSuchVM(vmId=3Dself._<wbr class=3D"">UUID) NoSuchVM: Virtual machine does not exist: {'vmId': = u'2ccc0ef0-cc31-45b8-8e91-<wbr class=3D"">a78fa4cad671'}</pre><pre = style=3D"" class=3D""><br class=3D""></pre><pre class=3D"">While in = ovirt-ha-agent logs we have:<pre style=3D"" class=3D""><pre = class=3D"">MainThread::INFO::2017-02-21 = 22:21:18,583::hosted_engine::<wbr = class=3D"">453::ovirt_hosted_engine_ha.<wbr = class=3D"">agent.hosted_engine.<wbr class=3D"">HostedEngine::(start_<wbr = class=3D"">monitoring) Current state UnknownLocalVmState (score: = 3400)</pre><pre class=3D"">...</pre></pre><pre style=3D"" = class=3D"">MainThread::INFO::2017-02-21 22:21:31,199::state_<wbr = class=3D"">decorators::25::ovirt_hosted_<wbr = class=3D"">engine_ha.agent.hosted_engine.<wbr = class=3D"">HostedEngine::(check) Unknown local engine vm status no = actions taken</pre></pre>Probably it's a bug or a regression somewhere = on master.</div></div></blockquote><div class=3D""><br = class=3D""></div><div class=3D"">On ovirt-ha-broker side the detection = is based on a strict string match on the error message that is expected = to be exactly 'Virtual machine does not exist' to set down status = otherwise we set unknown status as in this case:</div><div class=3D""><a = href=3D"https://gerrit.ovirt.org/gitweb?p=3Dovirt-hosted-engine-ha.git;a=3D= blob;f=3Dovirt_hosted_engine_ha/broker/submonitors/engine_health.py;h=3Dd6= 33cb860b811e84021221771bf706a9a4ac1d63;hb=3Drefs/heads/master#l54" = class=3D"">https://gerrit.ovirt.org/gitweb?p=3Dovirt-hosted-engine-ha.git;= a=3Dblob;f=3Dovirt_hosted_engine_ha/broker/submonitors/engine_health.py;h=3D= d633cb860b811e84021221771bf706a9a4ac1d63;hb=3Drefs/heads/master#l54</a><br= class=3D""></div><div class=3D""> </div><div class=3D"">Adding = Francesco here to understand if something has recently changed there on = vdsm side.</div></div></div></div></div></blockquote><div><br = class=3D""></div>That=E2=80=99s not a very robust code = handling.</div><div>Yes, the text changed, the vm id was = added.</div><div>And yes, it may change again any time I = guess</div><div><br class=3D""></div><div><blockquote type=3D"cite" = class=3D""><div class=3D""><div dir=3D"ltr" class=3D""><div = class=3D"gmail_extra"><div class=3D"gmail_quote"><div class=3D""><br = class=3D""></div><blockquote class=3D"gmail_quote" style=3D"margin:0px = 0px 0px 0.8ex;border-left:1px solid = rgb(204,204,204);padding-left:1ex"><div dir=3D"ltr" class=3D""><div = class=3D""><div class=3D"gmail-h5"><div class=3D""><br class=3D""><div = class=3D"gmail_extra"><br class=3D""><div class=3D"gmail_quote">On Wed, = Feb 22, 2017 at 1:02 PM, Sandro Bonazzola <span dir=3D"ltr" = class=3D""><<a href=3D"mailto:sbonazzo@redhat.com" target=3D"_blank" = class=3D"">sbonazzo@redhat.com</a>></span> wrote:<br = class=3D""><blockquote class=3D"gmail_quote" style=3D"margin:0px 0px 0px = 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div = dir=3D"ltr" class=3D"">Adding Lev</div><div class=3D"gmail_extra"><br = class=3D""><div class=3D"gmail_quote"><div class=3D""><div = class=3D"gmail-m_3795450909556566802gmail-h5">On Wed, Feb 22, 2017 at = 12:59 PM, Sahina Bose <span dir=3D"ltr" class=3D""><<a = href=3D"mailto:sabose@redhat.com" target=3D"_blank" = class=3D"">sabose@redhat.com</a>></span> wrote:<br = class=3D""></div></div><blockquote class=3D"gmail_quote" = style=3D"margin:0px 0px 0px 0.8ex;border-left:1px solid = rgb(204,204,204);padding-left:1ex"><div class=3D""><div = class=3D"gmail-m_3795450909556566802gmail-h5"><div dir=3D"ltr" = class=3D""><pre class=3D"">Hi all,<br class=3D""><br class=3D"">On the = HC setup, the HE VM is not restarted.<br class=3D"">The agent.log has = <br class=3D"">MainThread::INFO::2017-02-21 = 22:09:58,022::state_machine::1<wbr = class=3D"">69::ovirt_hosted_engine_ha.age<wbr = class=3D"">nt.hosted_engine.HostedEngine:<wbr class=3D"">:(refresh) = Global metadata: {} MainThread::INFO::2017-02-21 22:09:58,023::state_machine::1<wbr = class=3D"">77::ovirt_hosted_engine_ha.age<wbr = class=3D"">nt.hosted_engine.HostedEngine:<wbr class=3D"">:(refresh) = Local (id 1): {'engine-health': {'reason': 'failed to getVmStats', = 'health': 'unknown', 'vm': 'unknown', 'detail': 'unknown'}, 'bridge': = True, 'mem-free': 4079.0, 'maintenance': False, 'cpu-load': 0.0491, = 'gateway': True}<br class=3D"">...<br = class=3D"">MainThread::INFO::2017-02-21 = 22:10:29,219::state_decorators<wbr = class=3D"">::25::ovirt_hosted_engine_ha.a<wbr = class=3D"">gent.hosted_engine.HostedEngin<wbr class=3D"">e::(check) = Unknown local engine vm status no actions taken MainThread::INFO::2017-02-21 22:10:29,219::brokerlink::111:<wbr = class=3D"">:<a href=3D"http://ovirt_hosted_engine_ha.lib.br/" = target=3D"_blank" class=3D"">ovirt_hosted_engine_ha.lib.br</a><wbr = class=3D"">okerlink.BrokerLink::(notify) Trying: notify = time=3D1487733029.22 type=3Dstate_transition = detail=3DReinitializeFSM-Unknown<wbr class=3D"">LocalVmState = hostname=3D'lago-hc-basic-suite-<wbr class=3D"">master-host0' MainThread::INFO::2017-02-21 22:10:29,317::brokerlink::121:<wbr = class=3D"">:<a href=3D"http://ovirt_hosted_engine_ha.lib.br/" = target=3D"_blank" class=3D"">ovirt_hosted_engine_ha.lib.br</a><wbr = class=3D"">okerlink.BrokerLink::(notify) Success, was notification of = state_transition (ReinitializeFSM-UnknownLocalV<wbr class=3D"">mState) = sent? ignored<br class=3D""></pre><pre class=3D"">and the vdsm.log <br = class=3D""><br class=3D"">2017-02-21 22:09:11,962-0500 INFO = (libvirt/events) [virt.vm] (vmId=3D'2ccc0ef0-cc31-45b8-8e91<wbr = class=3D"">-a78fa4cad671') Changed state to Down: User shut down from = within the guest (code=3D7) (vm:1269) 2017-02-21 22:09:11,962-0500 INFO (libvirt/events) [virt.vm] = (vmId=3D'2ccc0ef0-cc31-45b8-8e91<wbr class=3D"">-a78fa4cad671') Stopping = connection (guestagent:429)<br class=3D""><br class=3D"">2017-02-21 = 22:09:29,727-0500 ERROR (jsonrpc/4) [api] FINISH getStats error=3DVirtual = machine does not exist: {'vmId': u'2ccc0ef0-cc31-45b8-8e91-a78f<wbr = class=3D"">a4cad671'} (api:69) Traceback (most recent call last): File "/usr/lib/python2.7/site-packa<wbr = class=3D"">ges/vdsm/common/api.py", line 67, in method ret =3D func(*args, **kwargs) File "/usr/share/vdsm/API.py", line 335, in getStats vm =3D self.vm File "/usr/share/vdsm/API.py", line 130, in vm raise exception.NoSuchVM(vmId=3Dself._<wbr class=3D"">UUID) NoSuchVM: Virtual machine does not exist: {'vmId': = u'2ccc0ef0-cc31-45b8-8e91-a78f<wbr class=3D"">a4cad671'}<br class=3D""><br= class=3D""><br class=3D""></pre><pre class=3D"">What should I be = looking for to identify the issue?<br class=3D""><br class=3D""></pre><pre= class=3D"">The logs are at <a = href=3D"http://jenkins.ovirt.org/job/ovirt_master_hc-system-tests/lastComp= letedBuild/artifact/exported-artifacts/test_logs/hc-basic-suite-master/pos= t-002_bootstrap.py/lago-hc-basic-suite-master-host0" target=3D"_blank" = class=3D"">http://jenkins.ovirt.org/job/o<wbr = class=3D"">virt_master_hc-system-tests/la<wbr = class=3D"">stCompletedBuild/artifact/expo<wbr = class=3D"">rted-artifacts/test_logs/hc-ba<wbr = class=3D"">sic-suite-master/post-002_boot<wbr = class=3D"">strap.py/lago-hc-basic-suite-<wbr = class=3D"">master-host0</a><br class=3D""><br class=3D""></pre><pre = class=3D"">thanks<span = class=3D"gmail-m_3795450909556566802gmail-m_-511095684978464768HOEnZb"><fo= nt color=3D"#888888" class=3D""><br class=3D""></font></span></pre><span = class=3D"gmail-m_3795450909556566802gmail-m_-511095684978464768HOEnZb"><fo= nt color=3D"#888888" class=3D""><pre class=3D"">sahina<br = class=3D""></pre></font></span></div> <br class=3D""></div></div>______________________________<wbr = class=3D"">_________________<br class=3D""> Devel mailing list<br class=3D""> <a href=3D"mailto:Devel@ovirt.org" target=3D"_blank" = class=3D"">Devel@ovirt.org</a><br class=3D""> <a href=3D"http://lists.ovirt.org/mailman/listinfo/devel" = rel=3D"noreferrer" target=3D"_blank" = class=3D"">http://lists.ovirt.org/mailman<wbr = class=3D"">/listinfo/devel</a><span = class=3D"gmail-m_3795450909556566802gmail-HOEnZb"><font color=3D"#888888" = class=3D""><br class=3D""></font></span></blockquote></div><span = class=3D"gmail-m_3795450909556566802gmail-HOEnZb"><font color=3D"#888888" = class=3D""><br class=3D""><br clear=3D"all" class=3D""><div class=3D""><br= class=3D""></div>-- <br class=3D""><div = class=3D"gmail-m_3795450909556566802gmail-m_-511095684978464768gmail_signa= ture"><div dir=3D"ltr" class=3D""><div class=3D""><div dir=3D"ltr" = class=3D""><div class=3D""><div dir=3D"ltr" class=3D""><div = class=3D""><div dir=3D"ltr" class=3D"">Sandro Bonazzola<br = class=3D"">Better technology. Faster innovation. Powered by community = collaboration.<br class=3D"">See how it works at <a = href=3D"http://redhat.com/" target=3D"_blank" = class=3D"">redhat.com</a></div></div></div></div></div></div></div></div> </font></span></div> </blockquote></div><br class=3D""></div></div></div></div></div> </blockquote></div><br class=3D""></div></div> _______________________________________________<br class=3D"">Devel = mailing list<br class=3D""><a href=3D"mailto:Devel@ovirt.org" = class=3D"">Devel@ovirt.org</a><br = class=3D"">http://lists.ovirt.org/mailman/listinfo/devel</div></blockquote=
</div><br class=3D""></body></html>=
--Apple-Mail=_F2D44161-4CD5-4109-BD15-4E9B9ED8862F--

On Wed, Feb 22, 2017 at 3:22 PM, Michal Skrivanek < michal.skrivanek@redhat.com> wrote:
On 22 Feb 2017, at 13:53, Simone Tiraboschi <stirabos@redhat.com> wrote:
On Wed, Feb 22, 2017 at 1:33 PM, Simone Tiraboschi <stirabos@redhat.com> wrote:
When ovirt-ha-agent checks the status of the engine VM we get:
2017-02-21 22:21:14,738-0500 ERROR (jsonrpc/2) [api] FINISH getStats error=Virtual machine does not exist: {'vmId': u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'} (api:69) Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 67, in method ret = func(*args, **kwargs) File "/usr/share/vdsm/API.py", line 335, in getStats vm = self.vm File "/usr/share/vdsm/API.py", line 130, in vm raise exception.NoSuchVM(vmId=self._UUID) NoSuchVM: Virtual machine does not exist: {'vmId': u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'}
While in ovirt-ha-agent logs we have:
MainThread::INFO::2017-02-21 22:21:18,583::hosted_engine::453::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state UnknownLocalVmState (score: 3400)
...
MainThread::INFO::2017-02-21 22:21:31,199::state_decorators::25::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check) Unknown local engine vm status no actions taken
Probably it's a bug or a regression somewhere on master.
On ovirt-ha-broker side the detection is based on a strict string match on the error message that is expected to be exactly 'Virtual machine does not exist' to set down status otherwise we set unknown status as in this case: https://gerrit.ovirt.org/gitweb?p=ovirt-hosted-engine- ha.git;a=blob;f=ovirt_hosted_engine_ha/broker/submonitors/ engine_health.py;h=d633cb860b811e84021221771bf706a9a4ac1d63;hb=refs/heads/ master#l54
Adding Francesco here to understand if something has recently changed there on vdsm side.
That’s not a very robust code handling. Yes, the text changed, the vm id was added. And yes, it may change again any time I guess
I agree, we are going to move to code check: https://gerrit.ovirt.org/#/c/72891
On Wed, Feb 22, 2017 at 1:02 PM, Sandro Bonazzola <sbonazzo@redhat.com> wrote:
Adding Lev
On Wed, Feb 22, 2017 at 12:59 PM, Sahina Bose <sabose@redhat.com> wrote:
Hi all,
On the HC setup, the HE VM is not restarted. The agent.log has MainThread::INFO::2017-02-21 22:09:58,022::state_machine::169::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Global metadata: {} MainThread::INFO::2017-02-21 22:09:58,023::state_machine::177::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Local (id 1): {'engine-health': {'reason': 'failed to getVmStats', 'health': 'unknown', 'vm': 'unknown', 'detail': 'unknown'}, 'bridge': True, 'mem-free': 4079.0, 'maintenance': False, 'cpu-load': 0.0491, 'gateway': True} ... MainThread::INFO::2017-02-21 22:10:29,219::state_decorators::25::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check) Unknown local engine vm status no actions taken MainThread::INFO::2017-02-21 22:10:29,219::brokerlink::111::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1487733029.22 type=state_transition detail=ReinitializeFSM-UnknownLocalVmState hostname='lago-hc-basic-suite-master-host0' MainThread::INFO::2017-02-21 22:10:29,317::brokerlink::121::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (ReinitializeFSM-UnknownLocalVmState) sent? ignored
and the vdsm.log
2017-02-21 22:09:11,962-0500 INFO (libvirt/events) [virt.vm] (vmId='2ccc0ef0-cc31-45b8-8e91-a78fa4cad671') Changed state to Down: User shut down from within the guest (code=7) (vm:1269) 2017-02-21 22:09:11,962-0500 INFO (libvirt/events) [virt.vm] (vmId='2ccc0ef0-cc31-45b8-8e91-a78fa4cad671') Stopping connection (guestagent:429)
2017-02-21 22:09:29,727-0500 ERROR (jsonrpc/4) [api] FINISH getStats error=Virtual machine does not exist: {'vmId': u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'} (api:69) Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 67, in method ret = func(*args, **kwargs) File "/usr/share/vdsm/API.py", line 335, in getStats vm = self.vm File "/usr/share/vdsm/API.py", line 130, in vm raise exception.NoSuchVM(vmId=self._UUID) NoSuchVM: Virtual machine does not exist: {'vmId': u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'}
What should I be looking for to identify the issue?
The logs are at http://jenkins.ovirt.org/job/ovirt_master_hc-system-tests/lastCompletedBuild...
thanks
sahina
_______________________________________________ Devel mailing list Devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/devel
-- Sandro Bonazzola Better technology. Faster innovation. Powered by community collaboration. See how it works at redhat.com
_______________________________________________ Devel mailing list Devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/devel

This is a multi-part message in MIME format. --------------2DA879FFF9040AC877FCC61C Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit On 02/22/2017 01:53 PM, Simone Tiraboschi wrote:
On Wed, Feb 22, 2017 at 1:33 PM, Simone Tiraboschi <stirabos@redhat.com <mailto:stirabos@redhat.com>> wrote:
When ovirt-ha-agent checks the status of the engine VM we get:
2017-02-21 22:21:14,738-0500 ERROR (jsonrpc/2) [api] FINISH getStats error=Virtual machine does not exist: {'vmId': u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'} (api:69) Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 67, in method ret = func(*args, **kwargs) File "/usr/share/vdsm/API.py", line 335, in getStats vm = self.vm File "/usr/share/vdsm/API.py", line 130, in vm raise exception.NoSuchVM(vmId=self._UUID) NoSuchVM: Virtual machine does not exist: {'vmId': u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'}
While in ovirt-ha-agent logs we have:
MainThread::INFO::2017-02-21 22:21:18,583::hosted_engine::453::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state UnknownLocalVmState (score: 3400)
...
MainThread::INFO::2017-02-21 22:21:31,199::state_decorators::25::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check) Unknown local engine vm status no actions taken
Probably it's a bug or a regression somewhere on master.
On ovirt-ha-broker side the detection is based on a strict string match on the error message that is expected to be exactly 'Virtual machine does not exist' to set down status otherwise we set unknown status as in this case: https://gerrit.ovirt.org/gitweb?p=ovirt-hosted-engine-ha.git;a=blob;f=ovirt_...
Adding Francesco here to understand if something has recently changed there on vdsm side.
It has changed indeed; we had a series of changes which added context to some exceptions. I believe the straw who broke the camel's back was I32ec3f86f8d53f8412f4c0526fc85e2a42e30ea5 It is unfortunate that this change broke HA. Could you perhaps fixing it checking that the message *begins* with that string, and/or checking the error code. bests, -- Francesco Romani Red Hat Engineering Virtualization R & D IRC: fromani --------------2DA879FFF9040AC877FCC61C Content-Type: text/html; charset=utf-8 Content-Transfer-Encoding: 8bit <html> <head> <meta content="text/html; charset=utf-8" http-equiv="Content-Type"> </head> <body bgcolor="#FFFFFF" text="#000000"> On 02/22/2017 01:53 PM, Simone Tiraboschi wrote:<br> <blockquote cite="mid:CAN8-ONooweDRBtBrPRZ6OgwcTOfRSB4S96eK8=pCo3mv85+C+w@mail.gmail.com" type="cite"> <div dir="ltr"><br> <div class="gmail_extra"><br> <div class="gmail_quote">On Wed, Feb 22, 2017 at 1:33 PM, Simone Tiraboschi <span dir="ltr"><<a moz-do-not-send="true" href="mailto:stirabos@redhat.com" target="_blank">stirabos@redhat.com</a>></span> wrote:<br> <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> <div dir="ltr">When ovirt-ha-agent checks the status of the engine VM we get: <div> <pre style="color:rgb(0,0,0)">2017-02-21 22:21:14,738-0500 ERROR (jsonrpc/2) [api] FINISH getStats error=Virtual machine does not exist: {'vmId': u'2ccc0ef0-cc31-45b8-8e91-<wbr>a78fa4cad671'} (api:69) Traceback (most recent call last): File "/usr/lib/python2.7/site-<wbr>packages/vdsm/common/api.py", line 67, in method ret = func(*args, **kwargs) File "/usr/share/vdsm/API.py", line 335, in getStats vm = self.vm File "/usr/share/vdsm/API.py", line 130, in vm raise exception.NoSuchVM(vmId=self._<wbr>UUID) NoSuchVM: Virtual machine does not exist: {'vmId': u'2ccc0ef0-cc31-45b8-8e91-<wbr>a78fa4cad671'}</pre><pre style="color:rgb(0,0,0)"> </pre><pre>While in ovirt-ha-agent logs we have:<pre style="color:rgb(0,0,0)"><pre>MainThread::<a class="moz-txt-link-freetext" href="INFO::2017-02-21">INFO::2017-02-21</a> 22:21:18,583::hosted_engine::<wbr>453::ovirt_hosted_engine_ha.<wbr>agent.hosted_engine.<wbr>HostedEngine::(start_<wbr>monitoring) Current state UnknownLocalVmState (score: 3400)</pre><pre>...</pre></pre><pre style="color:rgb(0,0,0)">MainThread::<a class="moz-txt-link-freetext" href="INFO::2017-02-21">INFO::2017-02-21</a> 22:21:31,199::state_<wbr>decorators::25::ovirt_hosted_<wbr>engine_ha.agent.hosted_engine.<wbr>HostedEngine::(check) Unknown local engine vm status no actions taken</pre></pre>Probably it's a bug or a regression somewhere on master.</div></div></blockquote><div> </div><div>On ovirt-ha-broker side the detection is based on a strict string match on the error message that is expected to be exactly 'Virtual machine does not exist' to set down status otherwise we set unknown status as in this case:</div><div><a moz-do-not-send="true" href="https://gerrit.ovirt.org/gitweb?p=ovirt-hosted-engine-ha.git;a=blob;f=ovirt_hosted_engine_ha/broker/submonitors/engine_health.py;h=d633cb860b811e84021221771bf706a9a4ac1d63;hb=refs/heads/master#l54">https://gerrit.ovirt.org/gitweb?p=ovirt-hosted-engine-ha.git;a=blob;f=ovirt_hosted_engine_ha/broker/submonitors/engine_health.py;h=d633cb860b811e84021221771bf706a9a4ac1d63;hb=refs/heads/master#l54</a> </div><div> </div><div>Adding Francesco here to understand if something has recently changed there on vdsm side.</div></div></div></div></blockquote> It has changed indeed; we had a series of changes which added context to some exceptions. I believe the straw who broke the camel's back was I32ec3f86f8d53f8412f4c0526fc85e2a42e30ea5 It is unfortunate that this change broke HA. Could you perhaps fixing it checking that the message *begins* with that string, and/or checking the error code. bests, <pre class="moz-signature" cols="72">-- Francesco Romani Red Hat Engineering Virtualization R & D IRC: fromani</pre></body></html> --------------2DA879FFF9040AC877FCC61C--

broke HA. Could you perhaps fixing it checking that the message *begins* with that string, and/or checking the error code. bests,
Already done: https://gerrit.ovirt.org/#/c/72791/ Martin On Wed, Feb 22, 2017 at 3:32 PM, Francesco Romani <fromani@redhat.com> wrote:
On 02/22/2017 01:53 PM, Simone Tiraboschi wrote:
On Wed, Feb 22, 2017 at 1:33 PM, Simone Tiraboschi <stirabos@redhat.com> wrote:
When ovirt-ha-agent checks the status of the engine VM we get:
2017-02-21 22:21:14,738-0500 ERROR (jsonrpc/2) [api] FINISH getStats error=Virtual machine does not exist: {'vmId': u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'} (api:69) Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 67, in method ret = func(*args, **kwargs) File "/usr/share/vdsm/API.py", line 335, in getStats vm = self.vm File "/usr/share/vdsm/API.py", line 130, in vm raise exception.NoSuchVM(vmId=self._UUID) NoSuchVM: Virtual machine does not exist: {'vmId': u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'}
While in ovirt-ha-agent logs we have:
MainThread::INFO::2017-02-21 22:21:18,583::hosted_engine::453::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state UnknownLocalVmState (score: 3400)
...
MainThread::INFO::2017-02-21 22:21:31,199::state_decorators::25::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check) Unknown local engine vm status no actions taken
Probably it's a bug or a regression somewhere on master.
On ovirt-ha-broker side the detection is based on a strict string match on the error message that is expected to be exactly 'Virtual machine does not exist' to set down status otherwise we set unknown status as in this case: https://gerrit.ovirt.org/gitweb?p=ovirt-hosted-engine-ha.git;a=blob;f=ovirt_...
Adding Francesco here to understand if something has recently changed there on vdsm side.
It has changed indeed; we had a series of changes which added context to some exceptions. I believe the straw who broke the camel's back was I32ec3f86f8d53f8412f4c0526fc85e2a42e30ea5 It is unfortunate that this change broke HA. Could you perhaps fixing it checking that the message *begins* with that string, and/or checking the error code. bests,
-- Francesco Romani Red Hat Engineering Virtualization R & D IRC: fromani
_______________________________________________ Devel mailing list Devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/devel

Btw, this whole thing is probably the oldest untouched piece of hosted engine we have. It worked till now :) Martin On Wed, Feb 22, 2017 at 3:38 PM, Martin Sivak <msivak@redhat.com> wrote:
broke HA. Could you perhaps fixing it checking that the message *begins* with that string, and/or checking the error code. bests,
Already done:
https://gerrit.ovirt.org/#/c/72791/
Martin
On Wed, Feb 22, 2017 at 3:32 PM, Francesco Romani <fromani@redhat.com> wrote:
On 02/22/2017 01:53 PM, Simone Tiraboschi wrote:
On Wed, Feb 22, 2017 at 1:33 PM, Simone Tiraboschi <stirabos@redhat.com> wrote:
When ovirt-ha-agent checks the status of the engine VM we get:
2017-02-21 22:21:14,738-0500 ERROR (jsonrpc/2) [api] FINISH getStats error=Virtual machine does not exist: {'vmId': u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'} (api:69) Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 67, in method ret = func(*args, **kwargs) File "/usr/share/vdsm/API.py", line 335, in getStats vm = self.vm File "/usr/share/vdsm/API.py", line 130, in vm raise exception.NoSuchVM(vmId=self._UUID) NoSuchVM: Virtual machine does not exist: {'vmId': u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'}
While in ovirt-ha-agent logs we have:
MainThread::INFO::2017-02-21 22:21:18,583::hosted_engine::453::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state UnknownLocalVmState (score: 3400)
...
MainThread::INFO::2017-02-21 22:21:31,199::state_decorators::25::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check) Unknown local engine vm status no actions taken
Probably it's a bug or a regression somewhere on master.
On ovirt-ha-broker side the detection is based on a strict string match on the error message that is expected to be exactly 'Virtual machine does not exist' to set down status otherwise we set unknown status as in this case: https://gerrit.ovirt.org/gitweb?p=ovirt-hosted-engine-ha.git;a=blob;f=ovirt_...
Adding Francesco here to understand if something has recently changed there on vdsm side.
It has changed indeed; we had a series of changes which added context to some exceptions. I believe the straw who broke the camel's back was I32ec3f86f8d53f8412f4c0526fc85e2a42e30ea5 It is unfortunate that this change broke HA. Could you perhaps fixing it checking that the message *begins* with that string, and/or checking the error code. bests,
-- Francesco Romani Red Hat Engineering Virtualization R & D IRC: fromani
_______________________________________________ Devel mailing list Devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/devel

On Wed, Feb 22, 2017 at 4:32 PM Francesco Romani <fromani@redhat.com> wrote:
On 02/22/2017 01:53 PM, Simone Tiraboschi wrote:
On Wed, Feb 22, 2017 at 1:33 PM, Simone Tiraboschi <stirabos@redhat.com> wrote:
When ovirt-ha-agent checks the status of the engine VM we get:
2017-02-21 22:21:14,738-0500 ERROR (jsonrpc/2) [api] FINISH getStats error=Virtual machine does not exist: {'vmId': u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'} (api:69) Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 67, in method ret = func(*args, **kwargs) File "/usr/share/vdsm/API.py", line 335, in getStats vm = self.vm File "/usr/share/vdsm/API.py", line 130, in vm raise exception.NoSuchVM(vmId=self._UUID) NoSuchVM: Virtual machine does not exist: {'vmId': u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'}
While in ovirt-ha-agent logs we have:
MainThread::INFO::2017-02-21 22:21:18,583::hosted_engine::453::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state UnknownLocalVmState (score: 3400)
...
MainThread::INFO::2017-02-21 22:21:31,199::state_decorators::25::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check) Unknown local engine vm status no actions taken
Probably it's a bug or a regression somewhere on master.
On ovirt-ha-broker side the detection is based on a strict string match on the error message that is expected to be exactly 'Virtual machine does not exist' to set down status otherwise we set unknown status as in this case:
https://gerrit.ovirt.org/gitweb?p=ovirt-hosted-engine-ha.git;a=blob;f=ovirt_...
Adding Francesco here to understand if something has recently changed there on vdsm side.
It has changed indeed; we had a series of changes which added context to some exceptions. I believe the straw who broke the camel's back was I32ec3f86f8d53f8412f4c0526fc85e2a42e30ea5 It is unfortunate that this change broke HA. Could you perhaps fixing it checking that the message *begins* with that string, and/or checking the error code. bests,
On the bright side, this is exactly why we need o-s-t running Hosted-Engine - though we probably need to exercise more HE flows (global and local maint., for example). On the downside, how come I32ec3f86f8d53f8412f4c0526fc85e2a42e30ea5 was merged on Jan1st, and we only saw the regression now? Is there another bug that hid this one until now? Y.
-- Francesco Romani Red Hat Engineering Virtualization R & D IRC: fromani
_______________________________________________ Devel mailing list Devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/devel

This is a multi-part message in MIME format. --------------EAD9CD4367177BDB55FBF586 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit On 02/22/2017 03:42 PM, Yaniv Kaul wrote:
On Wed, Feb 22, 2017 at 4:32 PM Francesco Romani <fromani@redhat.com <mailto:fromani@redhat.com>> wrote:
On 02/22/2017 01:53 PM, Simone Tiraboschi wrote:
On Wed, Feb 22, 2017 at 1:33 PM, Simone Tiraboschi <stirabos@redhat.com <mailto:stirabos@redhat.com>> wrote:
When ovirt-ha-agent checks the status of the engine VM we get:
2017-02-21 22:21:14,738-0500 ERROR (jsonrpc/2) [api] FINISH getStats error=Virtual machine does not exist: {'vmId': u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'} (api:69) Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 67, in method ret = func(*args, **kwargs) File "/usr/share/vdsm/API.py", line 335, in getStats vm = self.vm File "/usr/share/vdsm/API.py", line 130, in vm raise exception.NoSuchVM(vmId=self._UUID) NoSuchVM: Virtual machine does not exist: {'vmId': u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'}
While in ovirt-ha-agent logs we have:
MainThread::INFO::2017-02-21 22:21:18,583::hosted_engine::453::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state UnknownLocalVmState (score: 3400)
...
MainThread::INFO::2017-02-21 22:21:31,199::state_decorators::25::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check) Unknown local engine vm status no actions taken
Probably it's a bug or a regression somewhere on master.
On ovirt-ha-broker side the detection is based on a strict string match on the error message that is expected to be exactly 'Virtual machine does not exist' to set down status otherwise we set unknown status as in this case: https://gerrit.ovirt.org/gitweb?p=ovirt-hosted-engine-ha.git;a=blob;f=ovirt_...
Adding Francesco here to understand if something has recently changed there on vdsm side.
It has changed indeed; we had a series of changes which added context to some exceptions. I believe the straw who broke the camel's back was I32ec3f86f8d53f8412f4c0526fc85e2a42e30ea5 It is unfortunate that this change broke HA. Could you perhaps fixing it checking that the message *begins* with that string, and/or checking the error code. bests,
On the bright side, this is exactly why we need o-s-t running Hosted-Engine - though we probably need to exercise more HE flows (global and local maint., for example). On the downside, how come I32ec3f86f8d53f8412f4c0526fc85e2a42e30ea5 was merged on Jan1st, and we only saw the regression now? Is there another bug that hid this one until now? Y.
It was merged on Jan 29 on master, backported on Feb 8 on 4.1 branch (because it was part of the vmleases feature, needed on 4.1.z). Bests, -- Francesco Romani Red Hat Engineering Virtualization R & D IRC: fromani --------------EAD9CD4367177BDB55FBF586 Content-Type: text/html; charset=utf-8 Content-Transfer-Encoding: 8bit <html> <head> <meta content="text/html; charset=utf-8" http-equiv="Content-Type"> </head> <body bgcolor="#FFFFFF" text="#000000"> On 02/22/2017 03:42 PM, Yaniv Kaul wrote:<br> <blockquote cite="mid:CAJgorsaL15MmmHX-c-7Qr3pb-rBjKp+sBtExD8mH2zh6hiMWWQ@mail.gmail.com" type="cite"> <div dir="ltr"><br> <br> <div class="gmail_quote"> <div dir="ltr">On Wed, Feb 22, 2017 at 4:32 PM Francesco Romani <<a moz-do-not-send="true" href="mailto:fromani@redhat.com">fromani@redhat.com</a>> wrote:<br> </div> <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> <div bgcolor="#FFFFFF" text="#000000" class="gmail_msg"> On 02/22/2017 01:53 PM, Simone Tiraboschi wrote:<br class="gmail_msg"> <blockquote type="cite" class="gmail_msg"> <div dir="ltr" class="gmail_msg"><br class="gmail_msg"> <div class="gmail_extra gmail_msg"><br class="gmail_msg"> <div class="gmail_quote gmail_msg">On Wed, Feb 22, 2017 at 1:33 PM, Simone Tiraboschi <span dir="ltr" class="gmail_msg"><<a moz-do-not-send="true" href="mailto:stirabos@redhat.com" class="gmail_msg" target="_blank">stirabos@redhat.com</a>></span> wrote:<br class="gmail_msg"> <blockquote class="gmail_quote gmail_msg" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> <div dir="ltr" class="gmail_msg">When ovirt-ha-agent checks the status of the engine VM we get: <div class="gmail_msg"> <pre style="color:rgb(0,0,0)" class="gmail_msg">2017-02-21 22:21:14,738-0500 ERROR (jsonrpc/2) [api] FINISH getStats error=Virtual machine does not exist: {'vmId': u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'} (api:69) Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 67, in method ret = func(*args, **kwargs) File "/usr/share/vdsm/API.py", line 335, in getStats vm = self.vm File "/usr/share/vdsm/API.py", line 130, in vm raise exception.NoSuchVM(vmId=self._UUID) NoSuchVM: Virtual machine does not exist: {'vmId': u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'}</pre> <pre class="gmail_msg">While in ovirt-ha-agent logs we have:<pre style="color:rgb(0,0,0)" class="gmail_msg"><pre class="gmail_msg">MainThread::<a moz-do-not-send="true" class="m_-8855782447270567219moz-txt-link-freetext gmail_msg">INFO::2017-02-21</a> 22:21:18,583::hosted_engine::453::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state UnknownLocalVmState (score: 3400)</pre><pre class="gmail_msg">...</pre></pre><pre style="color:rgb(0,0,0)" class="gmail_msg">MainThread::<a moz-do-not-send="true" class="m_-8855782447270567219moz-txt-link-freetext gmail_msg">INFO::2017-02-21</a> 22:21:31,199::state_decorators::25::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check) Unknown local engine vm status no actions taken</pre></pre> Probably it's a bug or a regression somewhere on master.</div> </div> </blockquote> <div class="gmail_msg"> </div> <div class="gmail_msg">On ovirt-ha-broker side the detection is based on a strict string match on the error message that is expected to be exactly 'Virtual machine does not exist' to set down status otherwise we set unknown status as in this case:</div> <div class="gmail_msg"><a moz-do-not-send="true" href="https://gerrit.ovirt.org/gitweb?p=ovirt-hosted-engine-ha.git;a=blob;f=ovirt_..." class="gmail_msg" target="_blank">https://gerrit.ovirt.org/gitweb?p=ovirt-hosted-engine-ha.git;a=blob;f=ovirt_hosted_engine_ha/broker/submonitors/engine_health.py;h=d633cb860b811e84021221771bf706a9a4ac1d63;hb=refs/heads/master#l54</a> </div> <div class="gmail_msg"> </div> <div class="gmail_msg">Adding Francesco here to understand if something has recently changed there on vdsm side.</div> </div> </div> </div> </blockquote> </div> <div bgcolor="#FFFFFF" text="#000000" class="gmail_msg"> It has changed indeed; we had a series of changes which added context to some exceptions. I believe the straw who broke the camel's back was I32ec3f86f8d53f8412f4c0526fc85e2a42e30ea5 It is unfortunate that this change broke HA. Could you perhaps fixing it checking that the message *begins* with that string, and/or checking the error code. bests,</div> </blockquote> <div><br> </div> <div>On the bright side, this is exactly why we need o-s-t running Hosted-Engine - though we probably need to exercise more HE flows (global and local maint., for example).</div> <div>On the downside, how come <span style="font-size:13px;color:rgb(33,33,33)">I32ec3f86f8d53f8412f4c0526fc85</span><span style="font-size:13px;color:rgb(33,33,33)">e2a42e30ea5 was merged on Jan1st, and we only saw the regression now? Is there another bug that hid this one until now?</span></div> <div><font color="#212121">Y.</font></div> <br> </div> </div> </blockquote> <br> It was merged on Jan 29 on master, backported on Feb 8 on 4.1 branch (because it was part of the vmleases feature, needed on 4.1.z).<br> <br> Bests,<br> <br> <pre class="moz-signature" cols="72">-- Francesco Romani Red Hat Engineering Virtualization R & D IRC: fromani</pre> </body> </html> --------------EAD9CD4367177BDB55FBF586--
participants (7)
-
Francesco Romani
-
Martin Sivak
-
Michal Skrivanek
-
Sahina Bose
-
Sandro Bonazzola
-
Simone Tiraboschi
-
Yaniv Kaul