[ovirt-users] Hosted Engine error -243

Kevin Tibi kevintibi at hotmail.com
Wed Apr 23 14:42:53 UTC 2014


Oh god, my CPU usage is 80% on host1

1729 vdsm      20   0  762m  15m 2884 S 297.6  0.1  77:16.70 ovirt-ha-broker



2014-04-23 16:40 GMT+02:00 Kevin Tibi <kevintibi at hotmail.com>:

> In engine, I have:
> Hosted Engine HA: not active    for my host1
> Hosted Engine HA: active (score 0)   for my host2
>
>
>
>
> 2014-04-23 13:52 GMT+02:00 Jiri Moskovcak <jmoskovc at redhat.com>:
>
>> Hi,
>> I'm not sure yet what causes the problem, but the workaround should be:
>>
>> open file /usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/states.py
>> in your favorite editor, go to line 52 and change it:
>>
>> from: except ValueError:
>> to: except (ValueError, TypeError):
>>
>> --Jirka
>>
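
For reference, the effect of the one-line change above can be sketched as follows. This is only a minimal reconstruction, not the actual states.py source: the helper name and the `return float(value)` line are taken from the traceback quoted further down, while the rest of the body (including returning `default` in the handler) is an assumption for illustration.

```python
def float_or_default(value, default):
    """Return value converted to float, or default if conversion fails.

    Hypothetical stand-in for the _float_or_default helper named in the
    traceback; the real implementation may differ.
    """
    try:
        return float(value)
    except (ValueError, TypeError):
        # ValueError: a string that does not parse as a number, e.g. "n/a".
        # TypeError: a value that is neither a string nor a number, e.g. None,
        # which is the kind of failure the agent.log traceback reports.
        return default


if __name__ == "__main__":
    print(float_or_default("9816", 0))  # numeric string converts normally
    print(float_or_default(None, 0))    # falls back to the default
```

With only `except ValueError:`, a `None` value would raise TypeError out of the helper, which matches the "float() argument must be a string or a number" error in the agent log.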
>>
>> On 04/23/2014 12:43 PM, Kevin Tibi wrote:
>>
>>> Hi,
>>>
>>> /var/log/ovirt-hosted-engine-ha/broker.log
>>>
>>> Host1:
>>> Thread-118327::INFO::2014-04-23 12:34:59,360::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
>>> Thread-118327::INFO::2014-04-23 12:34:59,375::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
>>> Thread-118328::INFO::2014-04-23 12:35:14,546::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
>>> Thread-118328::INFO::2014-04-23 12:35:14,549::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
>>>
>>> Host2:
>>> Thread-4::INFO::2014-04-23 12:36:08,020::mem_free::53::mem_free.MemFree::(action) memFree: 9816
>>> Thread-3::INFO::2014-04-23 12:36:08,240::mgmt_bridge::59::mgmt_bridge.MgmtBridge::(action) Found bridge ovirtmgmt
>>> Thread-296455::INFO::2014-04-23 12:36:08,678::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
>>> Thread-296455::INFO::2014-04-23 12:36:08,684::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
>>>
>>>
>>>
>>> /var/log/ovirt-hosted-engine-ha/agent.log
>>>
>>> host1:
>>>
>>> MainThread::INFO::2014-04-02 17:46:14,856::state_decorators::25::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check) Unknown local engine vm status no actions taken
>>> MainThread::INFO::2014-04-02 17:46:14,857::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1396453574.86 type=state_transition detail=UnknownLocalVmState-UnknownLocalVmState hostname='host01.ovirt.lan'
>>> MainThread::INFO::2014-04-02 17:46:14,858::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (UnknownLocalVmState-UnknownLocalVmState) sent? ignored
>>> MainThread::WARNING::2014-04-02 17:46:15,463::hosted_engine::334::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Error while monitoring engine: float() argument must be a string or a number
>>> MainThread::WARNING::2014-04-02 17:46:15,464::hosted_engine::337::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Unexpected error
>>> Traceback (most recent call last):
>>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 323, in start_monitoring
>>>     state.score(self._log))
>>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/states.py", line 160, in score
>>>     lm, logger, score, score_cfg)
>>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/states.py", line 61, in _penalize_memory
>>>     if self._float_or_default(lm['mem-free'], 0) < vm_mem:
>>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/states.py", line 51, in _float_or_default
>>>     return float(value)
>>> TypeError: float() argument must be a string or a number
>>> MainThread::ERROR::2014-04-02 17:46:15,464::hosted_engine::350::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Shutting down the agent because of 3 failures in a row!
>>> MainThread::INFO::2014-04-02 17:46:15,466::agent::116::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
>>>
>>>
>>> host2:
>>>
>>> MainThread::INFO::2014-04-23 12:36:44,800::hosted_engine::323::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineUnexpectedlyDown (score: 0)
>>> MainThread::INFO::2014-04-23 12:36:54,844::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1398249414.84 type=state_transition detail=EngineUnexpectedlyDown-EngineUnexpectedlyDown hostname='host02.ovirt.lan'
>>> MainThread::INFO::2014-04-23 12:36:54,846::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (EngineUnexpectedlyDown-EngineUnexpectedlyDown) sent? ignored
>>>
>>> /var/log/vdsm/vdsm.log
>>>
>>> host1 :
>>>
>>> Thread-116::DEBUG::2014-04-23 12:40:17,060::fileSD::225::Storage.Misc.excCmd::(getReadDelay) '/bin/dd iflag=direct if=/rhev/data-center/mnt/host01.ovirt.lan:_home_iso/cc51143e-8ad7-4b0b-a4d2-9024dffc1188/dom_md/metadata bs=4096 count=1' (cwd None)
>>> Thread-116::DEBUG::2014-04-23 12:40:17,070::fileSD::225::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n343 bytes (343 B) copied, 0.000183642 s, 1.9 MB/s\n'; <rc> = 0
>>> Thread-37::DEBUG::2014-04-23 12:40:17,504::fileSD::225::Storage.Misc.excCmd::(getReadDelay) '/bin/dd iflag=direct if=/rhev/data-center/mnt/host01.ovirt.lan:_home_NFS01/aea040f8-ab9d-435b-9ecf-ddd4272e592f/dom_md/metadata bs=4096 count=1' (cwd None)
>>> Thread-37::DEBUG::2014-04-23 12:40:17,514::fileSD::225::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n472 bytes (472 B) copied, 0.000165064 s, 2.9 MB/s\n'; <rc> = 0
>>> Thread-11736::DEBUG::2014-04-23 12:40:18,170::task::595::TaskManager.Task::(_updateState) Task=`8a3a3e42-6e79-4849-9b1c-cad895722884`::moving from state init -> state preparing
>>> Thread-11736::INFO::2014-04-23 12:40:18,170::logUtils::44::dispatcher::(wrapper) Run and protect: repoStats(options=None)
>>> Thread-11736::INFO::2014-04-23 12:40:18,171::logUtils::47::dispatcher::(wrapper) Run and protect: repoStats, Return response: {'aea040f8-ab9d-435b-9ecf-ddd4272e592f': {'code': 0, 'version': 3, 'acquired': True, 'delay': '0.000165064', 'lastCheck': '0.7', 'valid': True}, '5ae613a4-44e4-42cb-89fc-7b5d34c1f30f': {'code': 0, 'version': 3, 'acquired': True, 'delay': '0.000174536', 'lastCheck': '3.0', 'valid': True}, 'cc51143e-8ad7-4b0b-a4d2-9024dffc1188': {'code': 0, 'version': 0, 'acquired': True, 'delay': '0.000183642', 'lastCheck': '1.1', 'valid': True}, 'ff98d346-4515-4349-8437-fb2f5e9eaadf': {'code': 0, 'version': 0, 'acquired': True, 'delay': '0.00045492', 'lastCheck': '8.6', 'valid': True}}
>>> Thread-11736::DEBUG::2014-04-23 12:40:18,171::task::1185::TaskManager.Task::(prepare) Task=`8a3a3e42-6e79-4849-9b1c-cad895722884`::finished: {'aea040f8-ab9d-435b-9ecf-ddd4272e592f': {'code': 0, 'version': 3, 'acquired': True, 'delay': '0.000165064', 'lastCheck': '0.7', 'valid': True}, '5ae613a4-44e4-42cb-89fc-7b5d34c1f30f': {'code': 0, 'version': 3, 'acquired': True, 'delay': '0.000174536', 'lastCheck': '3.0', 'valid': True}, 'cc51143e-8ad7-4b0b-a4d2-9024dffc1188': {'code': 0, 'version': 0, 'acquired': True, 'delay': '0.000183642', 'lastCheck': '1.1', 'valid': True}, 'ff98d346-4515-4349-8437-fb2f5e9eaadf': {'code': 0, 'version': 0, 'acquired': True, 'delay': '0.00045492', 'lastCheck': '8.6', 'valid': True}}
>>> Thread-11736::DEBUG::2014-04-23 12:40:18,172::task::595::TaskManager.Task::(_updateState) Task=`8a3a3e42-6e79-4849-9b1c-cad895722884`::moving from state preparing -> state finished
>>> Thread-11736::DEBUG::2014-04-23 12:40:18,172::resourceManager::940::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
>>> Thread-11736::DEBUG::2014-04-23 12:40:18,172::resourceManager::977::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
>>> Thread-11736::DEBUG::2014-04-23 12:40:18,172::task::990::TaskManager.Task::(_decref) Task=`8a3a3e42-6e79-4849-9b1c-cad895722884`::ref 0 aborting False
>>> Thread-299::DEBUG::2014-04-23 12:40:19,599::fileSD::225::Storage.Misc.excCmd::(getReadDelay) '/bin/dd iflag=direct if=/rhev/data-center/mnt/host01.ovirt.lan:_home_export/ff98d346-4515-4349-8437-fb2f5e9eaadf/dom_md/metadata bs=4096 count=1' (cwd None)
>>> Thread-299::DEBUG::2014-04-23 12:40:19,610::fileSD::225::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n352 bytes (352 B) copied, 0.000525872 s, 669 kB/s\n'; <rc> = 0
>>>
>>>
>>> host2 :
>>>
>>> Thread-1688899::DEBUG::2014-04-23 12:41:30,270::task::990::TaskManager.Task::(_decref) Task=`c23aeaf5-aed4-4285-a8c9-2bffadc0240e`::ref 0 aborting False
>>> Thread-159126::DEBUG::2014-04-23 12:41:30,547::fileSD::225::Storage.Misc.excCmd::(getReadDelay) '/bin/dd iflag=direct if=/rhev/data-center/mnt/host01.ovirt.lan:_home_iso/cc51143e-8ad7-4b0b-a4d2-9024dffc1188/dom_md/metadata bs=4096 count=1' (cwd None)
>>> Thread-159126::DEBUG::2014-04-23 12:41:30,569::fileSD::225::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n343 bytes (343 B) copied, 0.000480513 s, 714 kB/s\n'; <rc> = 0
>>> Thread-159125::DEBUG::2014-04-23 12:41:30,740::fileSD::225::Storage.Misc.excCmd::(getReadDelay) '/bin/dd iflag=direct if=/rhev/data-center/mnt/host01.ovirt.lan:_home_DATA/5ae613a4-44e4-42cb-89fc-7b5d34c1f30f/dom_md/metadata bs=4096 count=1' (cwd None)
>>> Thread-159125::DEBUG::2014-04-23 12:41:30,762::fileSD::225::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n545 bytes (545 B) copied, 0.000382036 s, 1.4 MB/s\n'; <rc> = 0
>>> Thread-159128::DEBUG::2014-04-23 12:41:32,226::fileSD::225::Storage.Misc.excCmd::(getReadDelay) '/bin/dd iflag=direct if=/rhev/data-center/mnt/host01.ovirt.lan:_home_export/ff98d346-4515-4349-8437-fb2f5e9eaadf/dom_md/metadata bs=4096 count=1' (cwd None)
>>> Thread-159128::DEBUG::2014-04-23 12:41:32,245::fileSD::225::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n352 bytes (352 B) copied, 0.000648972 s, 542 kB/s\n'; <rc> = 0
>>>
>>>
>>>
>>> 2014-04-23 0:21 GMT+02:00 Doron Fediuck <dfediuck at redhat.com>:
>>>
>>>
>>>
>>>
>>>     ----- Original Message -----
>>>      > From: "Kevin Tibi" <kevintibi at hotmail.com>
>>>      > To: "users" <users at ovirt.org>
>>>      > Sent: Tuesday, April 22, 2014 2:12:50 PM
>>>      > Subject: [ovirt-users] Hosted Engine error -243
>>>      >
>>>      > Hi all,
>>>      >
>>>      > I have a problem with my hosted engine. Every 10 minutes I get
>>>      > an event in engine:
>>>      >
>>>      > VM HostedEngine is down. Exit message: internal error Failed to
>>>      > acquire lock: error -243
>>>      >
>>>      > My data storage is a local NFS export.
>>>      >
>>>      > Thanks for your help.
>>>      >
>>>      > Kevin.
>>>      >
>>>
>>>     Hi Kevin,
>>>     can you please check the /var/log/ovirt-hosted-* log files on your
>>>     hosts and let us know if you see something else there or in your
>>>     vdsm log file?
>>>     _______________________________________________
>>>     Users mailing list
>>>     Users at ovirt.org
>>>     http://lists.ovirt.org/mailman/listinfo/users
>>>
>
>