<div dir="ltr">Oh god, my CPU usage is 80% on host1:<div><br></div><div><div>1729 vdsm      20   0  762m  15m 2884 S <font color="#ff0000" style="background-color:rgb(255,255,255)">297.6</font>  0.1  77:16.70 ovirt-ha-broker</div>
</div><div><br></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">2014-04-23 16:40 GMT+02:00 Kevin Tibi <span dir="ltr">&lt;<a href="mailto:kevintibi@hotmail.com" target="_blank">kevintibi@hotmail.com</a>&gt;</span>:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">In the engine, I have: <table style="padding-right:16px;padding-left:16px;line-height:18px;font-family:&#39;Arial Unicode MS&#39;,Arial,sans-serif;display:block">
<tbody><tr>
<td style="font-family:&#39;Arial Unicode MS&#39;,Arial,sans-serif"><div style="white-space:nowrap;font-family:Arial,sans-serif;padding-top:1px;padding-bottom:2px">Hosted Engine HA: not active    for my host1</div>
<div style="white-space:nowrap;font-family:Arial,sans-serif;padding-top:1px;padding-bottom:2px"><table style="display:block;padding-left:16px;padding-right:16px;font-family:&#39;Arial Unicode MS&#39;,Arial,sans-serif;white-space:normal">

<tbody><tr><td style="font-family:&#39;Arial Unicode MS&#39;,Arial,sans-serif"><div style="white-space:nowrap;font-family:Arial,sans-serif;padding-top:1px;padding-bottom:2px">
Hosted Engine HA: active (score 0)   for my host2</div><div><br></div></td></tr></tbody></table></div><div><br></div></td></tr></tbody></table></div><div class="gmail_extra"><br><br><div class="gmail_quote">2014-04-23 13:52 GMT+02:00 Jiri Moskovcak <span dir="ltr">&lt;<a href="mailto:jmoskovc@redhat.com" target="_blank">jmoskovc@redhat.com</a>&gt;</span>:<div>
<div class="h5"><br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi,<br>
I&#39;m not sure yet what causes the problem, but the workaround should be:<br>
<br>
Open the file /usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/states.py in your favorite editor, go to line 52, and change it:<br>
<br>
from: except ValueError:<br>
to: except (ValueError, TypeError):<br>
<br>
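For illustration: the agent.log traceback quoted below ends in _float_or_default() calling float(value) on lm[&#39;mem-free&#39;], and float() raises TypeError rather than ValueError when handed something that is neither a string nor a number, such as None. A minimal sketch of the helper with the change applied (a standalone function written for this example, assuming the method simply wraps float(); it is not the actual states.py source):<br>
<pre>
```python
def float_or_default(value, default):
    # Hypothetical stand-in for the _float_or_default() method in
    # agent/states.py: return float(value), or `default` when the value
    # cannot be converted.
    try:
        return float(value)
    except (ValueError, TypeError):
        # ValueError: unparsable strings such as "n/a".
        # TypeError: non-string, non-number values such as None; without it
        # in the except clause, the error propagates up to start_monitoring()
        # as in the traceback.
        return default


print(float_or_default("9816", 0))  # 9816.0
print(float_or_default("n/a", 0))   # 0
print(float_or_default(None, 0))    # 0
```
</pre>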
--Jirka<div><div><br>
<br>
On 04/23/2014 12:43 PM, Kevin Tibi wrote:<br>
</div></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div>
Hi,<br>
<br>
/var/log/ovirt-hosted-engine-ha/broker.log<br>
<br>
Host1:<br>
Thread-118327::INFO::2014-04-23 12:34:59,360::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established<br>
Thread-118327::INFO::2014-04-23 12:34:59,375::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed<br>
Thread-118328::INFO::2014-04-23 12:35:14,546::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established<br>
Thread-118328::INFO::2014-04-23 12:35:14,549::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed<br>
<br>
Host2:<br>
Thread-4::INFO::2014-04-23 12:36:08,020::mem_free::53::mem_free.MemFree::(action) memFree: 9816<br>
Thread-3::INFO::2014-04-23 12:36:08,240::mgmt_bridge::59::mgmt_bridge.MgmtBridge::(action) Found bridge ovirtmgmt<br>
Thread-296455::INFO::2014-04-23 12:36:08,678::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established<br>
Thread-296455::INFO::2014-04-23 12:36:08,684::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed<br>
<br>
<br>
<br>
/var/log/ovirt-hosted-engine-ha/agent.log<br>
<br>
host1:<br>
<br>
MainThread::INFO::2014-04-02 17:46:14,856::state_decorators::25::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check) Unknown local engine vm status no actions taken<br>
MainThread::INFO::2014-04-02 17:46:14,857::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1396453574.86 type=state_transition detail=UnknownLocalVmState-UnknownLocalVmState hostname=&#39;host01.ovirt.lan&#39;<br>
MainThread::INFO::2014-04-02 17:46:14,858::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (UnknownLocalVmState-UnknownLocalVmState) sent? ignored<br>
MainThread::WARNING::2014-04-02 17:46:15,463::hosted_engine::334::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Error while monitoring engine: float() argument must be a string or a number<br>
MainThread::WARNING::2014-04-02 17:46:15,464::hosted_engine::337::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Unexpected error<br>
Traceback (most recent call last):<br>
  File &quot;/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py&quot;, line 323, in start_monitoring<br>
    state.score(self._log))<br>
  File &quot;/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/states.py&quot;, line 160, in score<br>
    lm, logger, score, score_cfg)<br>
  File &quot;/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/states.py&quot;, line 61, in _penalize_memory<br>
    if self._float_or_default(lm[&#39;mem-free&#39;], 0) &lt; vm_mem:<br>
  File &quot;/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/states.py&quot;, line 51, in _float_or_default<br>
    return float(value)<br>
TypeError: float() argument must be a string or a number<br>
MainThread::ERROR::2014-04-02 17:46:15,464::hosted_engine::350::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Shutting down the agent because of 3 failures in a row!<br>
MainThread::INFO::2014-04-02 17:46:15,466::agent::116::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down<br></div></div><div><div>
<br>
<br>
host2:<br>
<br>
MainThread::INFO::2014-04-23 12:36:44,800::hosted_engine::323::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineUnexpectedlyDown (score: 0)<br>
MainThread::INFO::2014-04-23 12:36:54,844::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1398249414.84 type=state_transition detail=EngineUnexpectedlyDown-EngineUnexpectedlyDown hostname=&#39;host02.ovirt.lan&#39;<br>
MainThread::INFO::2014-04-23 12:36:54,846::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (EngineUnexpectedlyDown-EngineUnexpectedlyDown) sent? ignored<br>
<br>
/var/log/vdsm/vdsm.log<br>
<br>
host1 :<br>
<br>
Thread-116::DEBUG::2014-04-23<br>
12:40:17,060::fileSD::225::<u></u>Storage.Misc.excCmd::(<u></u>getReadDelay) &#39;/bin/dd<br>
iflag=direct<br>
if=/rhev/data-center/mnt/<u></u>host01.ovirt.lan:_home_iso/<u></u>cc51143e-8ad7-4b0b-a4d2-<u></u>9024dffc1188/dom_md/metadata<br>
bs=4096 count=1&#39; (cwd None)<br>
Thread-116::DEBUG::2014-04-23<br>
12:40:17,070::fileSD::225::<u></u>Storage.Misc.excCmd::(<u></u>getReadDelay) SUCCESS:<br>
&lt;err&gt; = &#39;0+1 records in\n0+1 records out\n343 bytes (343 B) copied,<br>
0.000183642 s, 1.9 MB/s\n&#39;; &lt;rc&gt; = 0<br>
Thread-37::DEBUG::2014-04-23<br>
12:40:17,504::fileSD::225::<u></u>Storage.Misc.excCmd::(<u></u>getReadDelay) &#39;/bin/dd<br>
iflag=direct<br>
if=/rhev/data-center/mnt/<u></u>host01.ovirt.lan:_home_NFS01/<u></u>aea040f8-ab9d-435b-9ecf-<u></u>ddd4272e592f/dom_md/metadata<br>
bs=4096 count=1&#39; (cwd None)<br>
Thread-37::DEBUG::2014-04-23<br>
12:40:17,514::fileSD::225::<u></u>Storage.Misc.excCmd::(<u></u>getReadDelay) SUCCESS:<br>
&lt;err&gt; = &#39;0+1 records in\n0+1 records out\n472 bytes (472 B) copied,<br>
0.000165064 s, 2.9 MB/s\n&#39;; &lt;rc&gt; = 0<br>
Thread-11736::DEBUG::2014-04-<u></u>23<br>
12:40:18,170::task::595::<u></u>TaskManager.Task::(_<u></u>updateState)<br>
Task=`8a3a3e42-6e79-4849-9b1c-<u></u>cad895722884`::moving from state init -&gt;<br>
state preparing<br>
Thread-11736::INFO::2014-04-23<br>
12:40:18,170::logUtils::44::<u></u>dispatcher::(wrapper) Run and protect:<br>
repoStats(options=None)<br>
Thread-11736::INFO::2014-04-23<br>
12:40:18,171::logUtils::47::<u></u>dispatcher::(wrapper) Run and protect:<br>
repoStats, Return response: {&#39;aea040f8-ab9d-435b-9ecf-<u></u>ddd4272e592f&#39;:<br>
{&#39;code&#39;: 0, &#39;version&#39;: 3, &#39;acquired&#39;: True, &#39;delay&#39;: &#39;0.000165064&#39;,<br>
&#39;lastCheck&#39;: &#39;0.7&#39;, &#39;valid&#39;: True},<br>
&#39;5ae613a4-44e4-42cb-89fc-<u></u>7b5d34c1f30f&#39;: {&#39;code&#39;: 0, &#39;version&#39;: 3,<br>
&#39;acquired&#39;: True, &#39;delay&#39;: &#39;0.000174536&#39;, &#39;lastCheck&#39;: &#39;3.0&#39;, &#39;valid&#39;:<br>
True}, &#39;cc51143e-8ad7-4b0b-a4d2-<u></u>9024dffc1188&#39;: {&#39;code&#39;: 0, &#39;version&#39;: 0,<br>
&#39;acquired&#39;: True, &#39;delay&#39;: &#39;0.000183642&#39;, &#39;lastCheck&#39;: &#39;1.1&#39;, &#39;valid&#39;:<br>
True}, &#39;ff98d346-4515-4349-8437-<u></u>fb2f5e9eaadf&#39;: {&#39;code&#39;: 0, &#39;version&#39;: 0,<br>
&#39;acquired&#39;: True, &#39;delay&#39;: &#39;0.00045492&#39;, &#39;lastCheck&#39;: &#39;8.6&#39;, &#39;valid&#39;: True}}<br>
Thread-11736::DEBUG::2014-04-<u></u>23<br>
12:40:18,171::task::1185::<u></u>TaskManager.Task::(prepare)<br>
Task=`8a3a3e42-6e79-4849-9b1c-<u></u>cad895722884`::finished:<br>
{&#39;aea040f8-ab9d-435b-9ecf-<u></u>ddd4272e592f&#39;: {&#39;code&#39;: 0, &#39;version&#39;: 3,<br>
&#39;acquired&#39;: True, &#39;delay&#39;: &#39;0.000165064&#39;, &#39;lastCheck&#39;: &#39;0.7&#39;, &#39;valid&#39;:<br>
True}, &#39;5ae613a4-44e4-42cb-89fc-<u></u>7b5d34c1f30f&#39;: {&#39;code&#39;: 0, &#39;version&#39;: 3,<br>
&#39;acquired&#39;: True, &#39;delay&#39;: &#39;0.000174536&#39;, &#39;lastCheck&#39;: &#39;3.0&#39;, &#39;valid&#39;:<br>
True}, &#39;cc51143e-8ad7-4b0b-a4d2-<u></u>9024dffc1188&#39;: {&#39;code&#39;: 0, &#39;version&#39;: 0,<br>
&#39;acquired&#39;: True, &#39;delay&#39;: &#39;0.000183642&#39;, &#39;lastCheck&#39;: &#39;1.1&#39;, &#39;valid&#39;:<br>
True}, &#39;ff98d346-4515-4349-8437-<u></u>fb2f5e9eaadf&#39;: {&#39;code&#39;: 0, &#39;version&#39;: 0,<br>
&#39;acquired&#39;: True, &#39;delay&#39;: &#39;0.00045492&#39;, &#39;lastCheck&#39;: &#39;8.6&#39;, &#39;valid&#39;: True}}<br>
Thread-11736::DEBUG::2014-04-<u></u>23<br>
12:40:18,172::task::595::<u></u>TaskManager.Task::(_<u></u>updateState)<br>
Task=`8a3a3e42-6e79-4849-9b1c-<u></u>cad895722884`::moving from state preparing<br>
-&gt; state finished<br>
Thread-11736::DEBUG::2014-04-<u></u>23<br>
12:40:18,172::resourceManager:<u></u>:940::ResourceManager.Owner::(<u></u>releaseAll)<br>
Owner.releaseAll requests {} resources {}<br>
Thread-11736::DEBUG::2014-04-<u></u>23<br>
12:40:18,172::resourceManager:<u></u>:977::ResourceManager.Owner::(<u></u>cancelAll)<br>
Owner.cancelAll requests {}<br>
Thread-11736::DEBUG::2014-04-<u></u>23<br>
12:40:18,172::task::990::<u></u>TaskManager.Task::(_decref)<br>
Task=`8a3a3e42-6e79-4849-9b1c-<u></u>cad895722884`::ref 0 aborting False<br>
Thread-299::DEBUG::2014-04-23<br>
12:40:19,599::fileSD::225::<u></u>Storage.Misc.excCmd::(<u></u>getReadDelay) &#39;/bin/dd<br>
iflag=direct<br>
if=/rhev/data-center/mnt/<u></u>host01.ovirt.lan:_home_export/<u></u>ff98d346-4515-4349-8437-<u></u>fb2f5e9eaadf/dom_md/metadata<br>
bs=4096 count=1&#39; (cwd None)<br>
Thread-299::DEBUG::2014-04-23<br>
12:40:19,610::fileSD::225::<u></u>Storage.Misc.excCmd::(<u></u>getReadDelay) SUCCESS:<br>
&lt;err&gt; = &#39;0+1 records in\n0+1 records out\n352 bytes (352 B) copied,<br>
0.000525872 s, 669 kB/s\n&#39;; &lt;rc&gt; = 0<br>
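For reference, the repoStats response above maps each storage domain UUID to its monitoring status; on host1 all four domains report code 0, valid True and acquired True, with sub-millisecond read delays. A minimal sketch for scanning such a response for problem domains (problem_domains() and the 5-second delay threshold are hypothetical, chosen for this example and not vdsm settings; the sample values are copied from the log):<br>
<pre>
```python
# Sample copied from host1's repoStats response; 'delay' and 'lastCheck'
# are reported as strings.
REPO_STATS = {
    'aea040f8-ab9d-435b-9ecf-ddd4272e592f': {
        'code': 0, 'version': 3, 'acquired': True,
        'delay': '0.000165064', 'lastCheck': '0.7', 'valid': True},
    '5ae613a4-44e4-42cb-89fc-7b5d34c1f30f': {
        'code': 0, 'version': 3, 'acquired': True,
        'delay': '0.000174536', 'lastCheck': '3.0', 'valid': True},
    'cc51143e-8ad7-4b0b-a4d2-9024dffc1188': {
        'code': 0, 'version': 0, 'acquired': True,
        'delay': '0.000183642', 'lastCheck': '1.1', 'valid': True},
    'ff98d346-4515-4349-8437-fb2f5e9eaadf': {
        'code': 0, 'version': 0, 'acquired': True,
        'delay': '0.00045492', 'lastCheck': '8.6', 'valid': True},
}

MAX_DELAY_S = 5.0  # hypothetical threshold for this example, not a vdsm setting


def problem_domains(stats, max_delay=MAX_DELAY_S):
    """Return the sorted UUIDs of domains that look unhealthy.

    A domain is flagged if its code is non-zero, it is not valid, it is
    not acquired, or its read delay (seconds) exceeds max_delay.
    """
    flagged = []
    for uuid, st in stats.items():
        bad_status = st['code'] != 0 or not st['valid'] or not st['acquired']
        if bad_status or float(st['delay']) > max_delay:
            flagged.append(uuid)
    return sorted(flagged)


print(problem_domains(REPO_STATS))  # []
```
</pre>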
<br>
<br>
host2 :<br>
<br>
Thread-1688899::DEBUG::2014-04-23 12:41:30,270::task::990::TaskManager.Task::(_decref) Task=`c23aeaf5-aed4-4285-a8c9-2bffadc0240e`::ref 0 aborting False<br>
Thread-159126::DEBUG::2014-04-23 12:41:30,547::fileSD::225::Storage.Misc.excCmd::(getReadDelay) &#39;/bin/dd iflag=direct if=/rhev/data-center/mnt/host01.ovirt.lan:_home_iso/cc51143e-8ad7-4b0b-a4d2-9024dffc1188/dom_md/metadata bs=4096 count=1&#39; (cwd None)<br>
Thread-159126::DEBUG::2014-04-23 12:41:30,569::fileSD::225::Storage.Misc.excCmd::(getReadDelay) SUCCESS: &lt;err&gt; = &#39;0+1 records in\n0+1 records out\n343 bytes (343 B) copied, 0.000480513 s, 714 kB/s\n&#39;; &lt;rc&gt; = 0<br>
Thread-159125::DEBUG::2014-04-23 12:41:30,740::fileSD::225::Storage.Misc.excCmd::(getReadDelay) &#39;/bin/dd iflag=direct if=/rhev/data-center/mnt/host01.ovirt.lan:_home_DATA/5ae613a4-44e4-42cb-89fc-7b5d34c1f30f/dom_md/metadata bs=4096 count=1&#39; (cwd None)<br>
Thread-159125::DEBUG::2014-04-23 12:41:30,762::fileSD::225::Storage.Misc.excCmd::(getReadDelay) SUCCESS: &lt;err&gt; = &#39;0+1 records in\n0+1 records out\n545 bytes (545 B) copied, 0.000382036 s, 1.4 MB/s\n&#39;; &lt;rc&gt; = 0<br>
Thread-159128::DEBUG::2014-04-23 12:41:32,226::fileSD::225::Storage.Misc.excCmd::(getReadDelay) &#39;/bin/dd iflag=direct if=/rhev/data-center/mnt/host01.ovirt.lan:_home_export/ff98d346-4515-4349-8437-fb2f5e9eaadf/dom_md/metadata bs=4096 count=1&#39; (cwd None)<br>
Thread-159128::DEBUG::2014-04-23 12:41:32,245::fileSD::225::Storage.Misc.excCmd::(getReadDelay) SUCCESS: &lt;err&gt; = &#39;0+1 records in\n0+1 records out\n352 bytes (352 B) copied, 0.000648972 s, 542 kB/s\n&#39;; &lt;rc&gt; = 0<br>
<br>
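The getReadDelay entries in both vdsm.log excerpts above show how the storage check works: vdsm reads one 4096-byte block of each domain&#39;s dom_md/metadata with dd iflag=direct (bypassing the page cache), and the time dd prints appears as that domain&#39;s delay in repoStats (0.000183642 s for cc51143e on host1, for example). A minimal sketch of pulling that figure out of dd&#39;s stderr, assuming GNU dd&#39;s &quot;... copied, N s, ...&quot; summary line (read_delay() is a hypothetical helper, not vdsm code):<br>
<pre>
```python
import re

# GNU dd ends its stderr report with e.g.
# "343 bytes (343 B) copied, 0.000183642 s, 1.9 MB/s".
COPIED_RE = re.compile(r'copied, ([0-9.eE+-]+) s,')


def read_delay(dd_stderr):
    """Return the elapsed seconds dd reported, or None if no summary line."""
    match = COPIED_RE.search(dd_stderr)
    if match is None:
        return None
    return float(match.group(1))


# stderr captured in host2's vdsm.log for the ISO domain (cc51143e...).
sample = ('0+1 records in\n0+1 records out\n'
          '343 bytes (343 B) copied, 0.000480513 s, 714 kB/s\n')
print(read_delay(sample))  # 0.000480513
```
</pre>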
<br>
<br>
2014-04-23 0:21 GMT+02:00 Doron Fediuck &lt;<a href="mailto:dfediuck@redhat.com" target="_blank">dfediuck@redhat.com</a>&gt;:<br></div></div><div><br>
<br>
<br>
<br>
    ----- Original Message -----<br>
     &gt; From: &quot;Kevin Tibi&quot; &lt;<a href="mailto:kevintibi@hotmail.com" target="_blank">kevintibi@hotmail.com</a>&gt;<br></div><div>
     &gt; To: &quot;users&quot; &lt;<a href="mailto:users@ovirt.org" target="_blank">users@ovirt.org</a>&gt;<br>
     &gt; Sent: Tuesday, April 22, 2014 2:12:50 PM<br>
     &gt; Subject: [ovirt-users] Hosted Engine error -243<br>
     &gt;<br>
     &gt; Hi all,<br>
     &gt;<br>
     &gt; I have a problem with my hosted engine. Every 10 minutes I get an<br>
     &gt; event in the engine:<br>
     &gt;<br>
     &gt; VM HostedEngine is down. Exit message: internal error Failed to<br>
     &gt; acquire lock: error -243<br>
     &gt;<br>
     &gt; My data is on a local NFS export.<br>
     &gt;<br>
     &gt; Thanks for your help.<br>
     &gt;<br>
     &gt; Kevin.<br>
     &gt;<br>
<br>
    Hi Kevin,<br>
    can you please check the /var/log/ovirt-hosted-* log files on your hosts<br>
    and let us know if you see something else there or in your vdsm log<br>
    file?<br>
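For anyone following this suggestion, a minimal sketch for pulling WARNING and ERROR records out of these logs. It assumes every record follows the Thread::LEVEL::timestamp::module::line::logger::(function) message layout seen in the broker.log, agent.log and vdsm.log excerpts quoted above, and it does not reassemble multi-line entries such as tracebacks; parse_record() and problems() are hypothetical helpers written for this example:<br>
<pre>
```python
import re

# Layout seen in broker.log, agent.log and vdsm.log:
# Thread::LEVEL::date time::module::lineno::logger::(function) message
FIELDS = ('thread', 'level', 'timestamp', 'module', 'lineno', 'logger', 'rest')
FUNC_RE = re.compile(r'\((\w+)\) ?(.*)', re.DOTALL)


def parse_record(line):
    """Split one log record into a dict, or return None if it doesn't fit.

    maxsplit keeps any '::' inside the message (e.g. vdsm Task lines) intact.
    """
    parts = line.rstrip('\n').split('::', len(FIELDS) - 1)
    if len(parts) != len(FIELDS):
        return None  # e.g. a traceback continuation line
    record = dict(zip(FIELDS, parts))
    rest = record.pop('rest')
    match = FUNC_RE.match(rest)
    if match:
        record['function'], record['message'] = match.groups()
    else:
        record['function'], record['message'] = None, rest
    return record


def problems(lines, levels=('WARNING', 'ERROR', 'CRITICAL')):
    """Yield parsed records whose level is in `levels`."""
    for line in lines:
        record = parse_record(line)
        if record is not None and record['level'] in levels:
            yield record


sample = [
    'Thread-118327::INFO::2014-04-23 12:34:59,360::listener::134::'
    'ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) '
    'Connection established',
    'MainThread::WARNING::2014-04-02 17:46:15,463::hosted_engine::334::'
    'ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::'
    '(start_monitoring) Error while monitoring engine: '
    'float() argument must be a string or a number',
]
for rec in problems(sample):
    print('%s %s %s' % (rec['timestamp'], rec['function'], rec['message']))
```
</pre>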
    ______________________________<u></u>_________________<br>
    Users mailing list<br></div>
    <a href="mailto:Users@ovirt.org" target="_blank">Users@ovirt.org</a> &lt;mailto:<a href="mailto:Users@ovirt.org" target="_blank">Users@ovirt.org</a>&gt;<br>
    <a href="http://lists.ovirt.org/mailman/listinfo/users" target="_blank">http://lists.ovirt.org/<u></u>mailman/listinfo/users</a><div><br>
<br>
<br>
</div></blockquote><div><div>
<br>
</div></div></blockquote></div></div></div><br></div>
</blockquote></div><br></div>