<div dir="ltr">Oh god, my CPU usage is 80% on host1:<div><br></div><div><div>1729 vdsm 20 0 762m 15m 2884 S <font color="#ff0000" style="background-color:rgb(255,255,255)">297.6</font> 0.1 77:16.70 ovirt-ha-broker</div>
</div><div><br></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">2014-04-23 16:40 GMT+02:00 Kevin Tibi <span dir="ltr"><<a href="mailto:kevintibi@hotmail.com" target="_blank">kevintibi@hotmail.com</a>></span>:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">In the engine, I have:<table style="padding-right:16px;padding-left:16px;line-height:18px;font-family:'Arial Unicode MS',Arial,sans-serif;display:block">
<tbody><tr>
<td style="font-family:'Arial Unicode MS',Arial,sans-serif"><div style="white-space:nowrap;font-family:Arial,sans-serif;padding-top:1px;padding-bottom:2px">Hosted Engine HA: not active for my host1</div>
<div style="white-space:nowrap;font-family:Arial,sans-serif;padding-top:1px;padding-bottom:2px"><table style="display:block;padding-left:16px;padding-right:16px;font-family:'Arial Unicode MS',Arial,sans-serif;white-space:normal">
<tbody><tr><td style="font-family:'Arial Unicode MS',Arial,sans-serif"><div style="white-space:nowrap;font-family:Arial,sans-serif;padding-top:1px;padding-bottom:2px">
Hosted Engine HA: active (score 0) for my host2</div><div><br></div></td></tr></tbody></table></div><div><br></div></td></tr></tbody></table></div><div class="gmail_extra"><br><br><div class="gmail_quote">2014-04-23 13:52 GMT+02:00 Jiri Moskovcak <span dir="ltr"><<a href="mailto:jmoskovc@redhat.com" target="_blank">jmoskovc@redhat.com</a>></span>:<div>
<div class="h5"><br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi,<br>
I'm not sure yet what causes the problem, but the workaround should be:<br>
<br>
Open the file /usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/states.py in your favorite editor, go to line 52, and change it:<br>
<br>
from: except ValueError:<br>
to: except (ValueError, TypeError):<br>
<br>
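For illustration, here is a minimal sketch of what the patched helper might look like. Only the `return float(value)` line and the `except` clause come from the traceback and the change described above; the rest of the body, including returning `default` on failure, is an assumption inferred from the helper's name and is not the actual ovirt-hosted-engine-ha source:<br>
<br>
```python
def float_or_default(value, default):
    """Hypothetical stand-in for states.py's _float_or_default."""
    try:
        return float(value)
    # float("abc") raises ValueError, but float(None) raises TypeError,
    # which the original `except ValueError:` clause did not catch.
    except (ValueError, TypeError):
        return default  # assumed fallback, inferred from the function name


# A non-numeric reading such as None now falls back to the default
# instead of propagating a TypeError to the caller.
print(float_or_default(None, 0))
print(float_or_default("9816", 0))
```
<br>
If the helper does behave like this, a value that float() cannot convert (for example None) would fall back to the default the caller passes, which is 0 in the `_penalize_memory` call shown in the traceback.<br>
<br>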
--Jirka<div><div><br>
<br>
On 04/23/2014 12:43 PM, Kevin Tibi wrote:<br>
</div></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div>
Hi,<br>
<br>
/var/log/ovirt-hosted-engine-ha/broker.log<br>
<br>
Host1:<br>
Thread-118327::INFO::2014-04-23 12:34:59,360::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established<br>
Thread-118327::INFO::2014-04-23 12:34:59,375::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed<br>
Thread-118328::INFO::2014-04-23 12:35:14,546::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established<br>
Thread-118328::INFO::2014-04-23 12:35:14,549::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed<br>
<br>
Host2:<br>
Thread-4::INFO::2014-04-23 12:36:08,020::mem_free::53::mem_free.MemFree::(action) memFree: 9816<br>
Thread-3::INFO::2014-04-23 12:36:08,240::mgmt_bridge::59::mgmt_bridge.MgmtBridge::(action) Found bridge ovirtmgmt<br>
Thread-296455::INFO::2014-04-23 12:36:08,678::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established<br>
Thread-296455::INFO::2014-04-23 12:36:08,684::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed<br>
<br>
<br>
<br>
/var/log/ovirt-hosted-engine-ha/agent.log<br>
<br>
host1:<br>
<br>
MainThread::INFO::2014-04-02 17:46:14,856::state_decorators::25::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check) Unknown local engine vm status no actions taken<br>
MainThread::INFO::2014-04-02 17:46:14,857::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1396453574.86 type=state_transition detail=UnknownLocalVmState-UnknownLocalVmState hostname='host01.ovirt.lan'<br>
MainThread::INFO::2014-04-02 17:46:14,858::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (UnknownLocalVmState-UnknownLocalVmState) sent? ignored<br>
MainThread::WARNING::2014-04-02 17:46:15,463::hosted_engine::334::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Error while monitoring engine: float() argument must be a string or a number<br>
MainThread::WARNING::2014-04-02 17:46:15,464::hosted_engine::337::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Unexpected error<br>
Traceback (most recent call last):<br>
File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 323, in start_monitoring<br>
state.score(self._log))<br>
File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/states.py", line 160, in score<br>
lm, logger, score, score_cfg)<br>
File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/states.py", line 61, in _penalize_memory<br>
if self._float_or_default(lm['mem-free'], 0) < vm_mem:<br>
File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/states.py", line 51, in _float_or_default<br>
return float(value)<br>
TypeError: float() argument must be a string or a number<br>
MainThread::ERROR::2014-04-02 17:46:15,464::hosted_engine::350::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Shutting down the agent because of 3 failures in a row!<br>
MainThread::INFO::2014-04-02 17:46:15,466::agent::116::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down<br></div></div>
<div><div><br>
<br>
<br>
host2:<br>
<br>
MainThread::INFO::2014-04-23 12:36:44,800::hosted_engine::323::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineUnexpectedlyDown (score: 0)<br>
MainThread::INFO::2014-04-23 12:36:54,844::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1398249414.84 type=state_transition detail=EngineUnexpectedlyDown-EngineUnexpectedlyDown hostname='host02.ovirt.lan'<br>
MainThread::INFO::2014-04-23 12:36:54,846::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (EngineUnexpectedlyDown-EngineUnexpectedlyDown) sent? ignored<br>
<br>
/var/log/vdsm/vdsm.log<br>
<br>
host1 :<br>
<br>
Thread-116::DEBUG::2014-04-23 12:40:17,060::fileSD::225::Storage.Misc.excCmd::(getReadDelay) '/bin/dd iflag=direct if=/rhev/data-center/mnt/host01.ovirt.lan:_home_iso/cc51143e-8ad7-4b0b-a4d2-9024dffc1188/dom_md/metadata bs=4096 count=1' (cwd None)<br>
Thread-116::DEBUG::2014-04-23 12:40:17,070::fileSD::225::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n343 bytes (343 B) copied, 0.000183642 s, 1.9 MB/s\n'; <rc> = 0<br>
Thread-37::DEBUG::2014-04-23 12:40:17,504::fileSD::225::Storage.Misc.excCmd::(getReadDelay) '/bin/dd iflag=direct if=/rhev/data-center/mnt/host01.ovirt.lan:_home_NFS01/aea040f8-ab9d-435b-9ecf-ddd4272e592f/dom_md/metadata bs=4096 count=1' (cwd None)<br>
Thread-37::DEBUG::2014-04-23 12:40:17,514::fileSD::225::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n472 bytes (472 B) copied, 0.000165064 s, 2.9 MB/s\n'; <rc> = 0<br>
Thread-11736::DEBUG::2014-04-23 12:40:18,170::task::595::TaskManager.Task::(_updateState) Task=`8a3a3e42-6e79-4849-9b1c-cad895722884`::moving from state init -> state preparing<br>
Thread-11736::INFO::2014-04-23 12:40:18,170::logUtils::44::dispatcher::(wrapper) Run and protect: repoStats(options=None)<br>
Thread-11736::INFO::2014-04-23 12:40:18,171::logUtils::47::dispatcher::(wrapper) Run and protect: repoStats, Return response: {'aea040f8-ab9d-435b-9ecf-ddd4272e592f': {'code': 0, 'version': 3, 'acquired': True, 'delay': '0.000165064', 'lastCheck': '0.7', 'valid': True}, '5ae613a4-44e4-42cb-89fc-7b5d34c1f30f': {'code': 0, 'version': 3, 'acquired': True, 'delay': '0.000174536', 'lastCheck': '3.0', 'valid': True}, 'cc51143e-8ad7-4b0b-a4d2-9024dffc1188': {'code': 0, 'version': 0, 'acquired': True, 'delay': '0.000183642', 'lastCheck': '1.1', 'valid': True}, 'ff98d346-4515-4349-8437-fb2f5e9eaadf': {'code': 0, 'version': 0, 'acquired': True, 'delay': '0.00045492', 'lastCheck': '8.6', 'valid': True}}<br>
Thread-11736::DEBUG::2014-04-23 12:40:18,171::task::1185::TaskManager.Task::(prepare) Task=`8a3a3e42-6e79-4849-9b1c-cad895722884`::finished: {'aea040f8-ab9d-435b-9ecf-ddd4272e592f': {'code': 0, 'version': 3, 'acquired': True, 'delay': '0.000165064', 'lastCheck': '0.7', 'valid': True}, '5ae613a4-44e4-42cb-89fc-7b5d34c1f30f': {'code': 0, 'version': 3, 'acquired': True, 'delay': '0.000174536', 'lastCheck': '3.0', 'valid': True}, 'cc51143e-8ad7-4b0b-a4d2-9024dffc1188': {'code': 0, 'version': 0, 'acquired': True, 'delay': '0.000183642', 'lastCheck': '1.1', 'valid': True}, 'ff98d346-4515-4349-8437-fb2f5e9eaadf': {'code': 0, 'version': 0, 'acquired': True, 'delay': '0.00045492', 'lastCheck': '8.6', 'valid': True}}<br>
Thread-11736::DEBUG::2014-04-23 12:40:18,172::task::595::TaskManager.Task::(_updateState) Task=`8a3a3e42-6e79-4849-9b1c-cad895722884`::moving from state preparing -> state finished<br>
Thread-11736::DEBUG::2014-04-23 12:40:18,172::resourceManager::940::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}<br>
Thread-11736::DEBUG::2014-04-23 12:40:18,172::resourceManager::977::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}<br>
Thread-11736::DEBUG::2014-04-23 12:40:18,172::task::990::TaskManager.Task::(_decref) Task=`8a3a3e42-6e79-4849-9b1c-cad895722884`::ref 0 aborting False<br>
Thread-299::DEBUG::2014-04-23 12:40:19,599::fileSD::225::Storage.Misc.excCmd::(getReadDelay) '/bin/dd iflag=direct if=/rhev/data-center/mnt/host01.ovirt.lan:_home_export/ff98d346-4515-4349-8437-fb2f5e9eaadf/dom_md/metadata bs=4096 count=1' (cwd None)<br>
Thread-299::DEBUG::2014-04-23 12:40:19,610::fileSD::225::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n352 bytes (352 B) copied, 0.000525872 s, 669 kB/s\n'; <rc> = 0<br>
<br>
<br>
host2 :<br>
<br>
Thread-1688899::DEBUG::2014-04-23 12:41:30,270::task::990::TaskManager.Task::(_decref) Task=`c23aeaf5-aed4-4285-a8c9-2bffadc0240e`::ref 0 aborting False<br>
Thread-159126::DEBUG::2014-04-23 12:41:30,547::fileSD::225::Storage.Misc.excCmd::(getReadDelay) '/bin/dd iflag=direct if=/rhev/data-center/mnt/host01.ovirt.lan:_home_iso/cc51143e-8ad7-4b0b-a4d2-9024dffc1188/dom_md/metadata bs=4096 count=1' (cwd None)<br>
Thread-159126::DEBUG::2014-04-23 12:41:30,569::fileSD::225::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n343 bytes (343 B) copied, 0.000480513 s, 714 kB/s\n'; <rc> = 0<br>
Thread-159125::DEBUG::2014-04-23 12:41:30,740::fileSD::225::Storage.Misc.excCmd::(getReadDelay) '/bin/dd iflag=direct if=/rhev/data-center/mnt/host01.ovirt.lan:_home_DATA/5ae613a4-44e4-42cb-89fc-7b5d34c1f30f/dom_md/metadata bs=4096 count=1' (cwd None)<br>
Thread-159125::DEBUG::2014-04-23 12:41:30,762::fileSD::225::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n545 bytes (545 B) copied, 0.000382036 s, 1.4 MB/s\n'; <rc> = 0<br>
Thread-159128::DEBUG::2014-04-23 12:41:32,226::fileSD::225::Storage.Misc.excCmd::(getReadDelay) '/bin/dd iflag=direct if=/rhev/data-center/mnt/host01.ovirt.lan:_home_export/ff98d346-4515-4349-8437-fb2f5e9eaadf/dom_md/metadata bs=4096 count=1' (cwd None)<br>
Thread-159128::DEBUG::2014-04-23 12:41:32,245::fileSD::225::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n352 bytes (352 B) copied, 0.000648972 s, 542 kB/s\n'; <rc> = 0<br>
<br>
<br>
<br>
2014-04-23 0:21 GMT+02:00 Doron Fediuck <<a href="mailto:dfediuck@redhat.com" target="_blank">dfediuck@redhat.com</a>>:<br></div></div>
<div><br>
<br>
<br>
<br>
----- Original Message -----<br>
> From: "Kevin Tibi" <<a href="mailto:kevintibi@hotmail.com" target="_blank">kevintibi@hotmail.com</a>><br></div><div>
> To: "users" <<a href="mailto:users@ovirt.org" target="_blank">users@ovirt.org</a>><br>
> Sent: Tuesday, April 22, 2014 2:12:50 PM<br>
> Subject: [ovirt-users] Hosted Engine error -243<br>
><br>
> Hi all,<br>
><br>
> I have a problem with my hosted engine. Every 10 minutes I get an event in the engine:<br>
><br>
> VM HostedEngine is down. Exit message: internal error Failed to acquire lock: error -243<br>
><br>
> My data is on a local NFS export.<br>
><br>
> Thanks for your help.<br>
><br>
> Kevin.<br>
><br>
<br>
Hi Kevin,<br>
can you please check the /var/log/ovirt-hosted-* log files in your hosts<br>
and let us know if you see something else there or in your vdsm log<br>
file?<br>
_______________________________________________<br>
Users mailing list<br></div>
<a href="mailto:Users@ovirt.org" target="_blank">Users@ovirt.org</a><br>
<a href="http://lists.ovirt.org/mailman/listinfo/users" target="_blank">http://lists.ovirt.org/mailman/listinfo/users</a><div><br>
<br>
<br>
<br>
<br>
<br>
</div></blockquote><div><div>
<br>
</div></div></blockquote></div></div></div><br></div>
</blockquote></div><br></div>