This is a multi-part message in MIME format.
--------------2DA879FFF9040AC877FCC61C
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
On 02/22/2017 01:53 PM, Simone Tiraboschi wrote:
On Wed, Feb 22, 2017 at 1:33 PM, Simone Tiraboschi <stirabos(a)redhat.com> wrote:
When ovirt-ha-agent checks the status of the engine VM we get:
2017-02-21 22:21:14,738-0500 ERROR (jsonrpc/2) [api] FINISH getStats error=Virtual machine does not exist: {'vmId': u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'} (api:69)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 67, in method
    ret = func(*args, **kwargs)
  File "/usr/share/vdsm/API.py", line 335, in getStats
    vm = self.vm
  File "/usr/share/vdsm/API.py", line 130, in vm
    raise exception.NoSuchVM(vmId=self._UUID)
NoSuchVM: Virtual machine does not exist: {'vmId': u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'}
While in ovirt-ha-agent logs we have:
MainThread::INFO::2017-02-21 22:21:18,583::hosted_engine::453::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state UnknownLocalVmState (score: 3400)
...
MainThread::INFO::2017-02-21 22:21:31,199::state_decorators::25::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check) Unknown local engine vm status no actions taken
Probably it's a bug or a regression somewhere on master.
On the ovirt-ha-broker side, detection is based on a strict string
match on the error message: it must be exactly 'Virtual machine does
not exist' for us to set the down status; otherwise we set the unknown
status, as in this case:
https://gerrit.ovirt.org/gitweb?p=ovirt-hosted-engine-ha.git;a=blob;f=ovi...
Adding Francesco here to understand whether something has recently
changed on the vdsm side.
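
To illustrate the strict match described above, here is a minimal sketch.
It is not the actual ovirt-hosted-engine-ha code: the function name
classify_strict and the constant VM_NOT_FOUND are hypothetical. The two
messages are the bare string the broker expects and the message with
added context as it appears in the vdsm log above.

```python
# Hypothetical stand-in for the broker-side check; the real code lives in
# ovirt-hosted-engine-ha and may be structured differently.
VM_NOT_FOUND = 'Virtual machine does not exist'


def classify_strict(message):
    """Return 'down' only when the message matches exactly, else 'unknown'."""
    return 'down' if message == VM_NOT_FOUND else 'unknown'


# The bare message the strict check expects.
old_message = 'Virtual machine does not exist'
# The message with appended context, as seen in the vdsm log.
new_message = ("Virtual machine does not exist: "
               "{'vmId': u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'}")

print(classify_strict(old_message))
print(classify_strict(new_message))
```

With the appended context the equality test no longer holds, so the
status falls through to unknown, which matches the UnknownLocalVmState
reported by the agent.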
It has changed indeed; we had a series of changes which added context to
some exceptions. I believe the straw that broke the camel's back was
I32ec3f86f8d53f8412f4c0526fc85e2a42e30ea5. It is unfortunate that this
change broke HA. Could you perhaps fix it by checking that the message
*begins* with that string, and/or by checking the error code?

Bests,
--
Francesco Romani
Red Hat Engineering Virtualization R & D
IRC: fromani
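
A rough sketch of the suggested fix, under stated assumptions: the
function name classify, the status dict shape ('code' and 'message'
keys), and the value of NO_SUCH_VM_CODE are illustrative guesses and
should be checked against the vdsm and ovirt-hosted-engine-ha sources
before use.

```python
VM_NOT_FOUND = 'Virtual machine does not exist'
# Assumption: numeric error code vdsm reports for NoSuchVM; verify in vdsm.
NO_SUCH_VM_CODE = 1


def classify(status):
    """Map a status dict to 'down' or 'unknown'.

    The VM is treated as down if the error code matches, or if the message
    begins with the expected text, so context appended after the text
    does not break detection.
    """
    code = status.get('code')
    message = status.get('message', '')
    if code == NO_SUCH_VM_CODE or message.startswith(VM_NOT_FOUND):
        return 'down'
    return 'unknown'


# Message with appended context, as seen in the vdsm log.
with_context = {
    'code': NO_SUCH_VM_CODE,
    'message': ("Virtual machine does not exist: "
                "{'vmId': u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'}"),
}
print(classify(with_context))
```

Checking the code is the more robust of the two tests, since it does not
depend on the wording of the message at all; the prefix match is kept as
a fallback for replies that carry no usable code.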
--------------2DA879FFF9040AC877FCC61C--