Hi Jirka,
the patch works. It stabilized the status of my two hosts. The engine migration during failover also works fine. Thanks, guys!
Jaicel
From: "Jiri Moskovcak" <jmoskovc(a)redhat.com>
To: "Jaicel" <jaicel(a)asti.dost.gov.ph>
Cc: "Niels de Vos" <ndevos(a)redhat.com>, "Vijay Bellur" <vbellur(a)redhat.com>, users(a)ovirt.org, "Gluster Devel" <gluster-devel(a)gluster.org>
Sent: Monday, November 3, 2014 3:33:16 PM
Subject: Re: [ovirt-users] Hosted-Engine HA problem
On 11/01/2014 07:43 AM, Jaicel wrote:
> Hi,
>
> My engine runs on Host 1. Current status and agent logs below.
>
> Host 1
Hi,
it seems like you ran into [1]; you can either zero out the metadata
file or apply the patch from [1] manually.
--Jirka
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1158925
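The "zero out the metadata file" workaround above can be sketched as follows. This is a hedged illustration on a stand-in file, not the exact procedure from the Bugzilla: on a real host the target would be the hosted-engine.metadata path visible in the logs below, and ovirt-ha-agent/ovirt-ha-broker must be stopped on ALL hosts before touching it.

```python
# Sketch of the zero-out workaround, demonstrated on a stand-in file.
import os
import shutil
import tempfile

path = os.path.join(tempfile.mkdtemp(), "hosted-engine.metadata")
with open(path, "wb") as f:          # stand-in for the corrupt metadata file
    f.write(b"garbled metadata")

shutil.copy(path, path + ".bak")     # always keep a backup first
size = os.path.getsize(path)
with open(path, "r+b") as f:         # overwrite in place with zeros;
    f.write(b"\x00" * size)          # the agents rebuild valid metadata on restart
```

Restarting the broker and agent afterwards (as Jaicel already did several times in this thread) lets them repopulate the file.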
>
> MainThread::INFO::2014-10-31 16:55:39,918::agent::52::ovirt_hosted_engine_ha.agent.agent.Agent::(run) ovirt-hosted-engine-ha agent 1.1.6 started
> MainThread::INFO::2014-10-31 16:55:39,985::hosted_engine::223::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Found certificate common name: 192.168.12.11
> MainThread::INFO::2014-10-31 16:55:40,228::hosted_engine::367::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Initializing ha-broker connection
> MainThread::INFO::2014-10-31 16:55:40,228::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor ping, options {'addr': '192.168.12.254'}
> MainThread::INFO::2014-10-31 16:55:40,231::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 140634215107920
> MainThread::INFO::2014-10-31 16:55:40,231::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor mgmt-bridge, options {'use_ssl': 'true', 'bridge_name': 'ovirtmgmt', 'address': '0'}
> MainThread::INFO::2014-10-31 16:55:40,237::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 140634215108432
> MainThread::INFO::2014-10-31 16:55:40,237::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor mem-free, options {'use_ssl': 'true', 'address': '0'}
> MainThread::INFO::2014-10-31 16:55:40,240::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 39956688
> MainThread::INFO::2014-10-31 16:55:40,240::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor cpu-load-no-engine, options {'use_ssl': 'true', 'vm_uuid': '41d4aff1-54e1-4946-a812-2e656bb7d3f9', 'address': '0'}
> MainThread::INFO::2014-10-31 16:55:40,243::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 140634215107664
> MainThread::INFO::2014-10-31 16:55:40,244::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor engine-health, options {'use_ssl': 'true', 'vm_uuid': '41d4aff1-54e1-4946-a812-2e656bb7d3f9', 'address': '0'}
> MainThread::INFO::2014-10-31 16:55:40,249::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 140634006879632
> MainThread::INFO::2014-10-31 16:55:40,249::hosted_engine::391::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Broker initialized, all submonitors started
> MainThread::INFO::2014-10-31 16:55:40,298::hosted_engine::476::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock) Ensuring lease for lockspace hosted-engine, host id 1 is acquired (file: /rhev/data-center/mnt/gluster1:_engine/6eb220be-daff-4785-8f78-111cc24139c4/ha_agent/hosted-engine.lockspace)
> MainThread::INFO::2014-10-31 16:55:40,322::state_machine::153::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Global metadata: {'maintenance': False}
> MainThread::INFO::2014-10-31 16:55:40,322::state_machine::158::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Host 192.168.12.12 (id 2): {'live-data': False, 'extra': 'metadata_parse_version=1\nmetadata_feature_version=1\ntimestamp=1413882675 (Tue Oct 21 17:11:15 2014)\nhost-id=2\nscore=2400\nmaintenance=False\nstate=EngineDown\n', 'hostname': '192.168.12.12', 'host-id': 2, 'engine-status': {'reason': 'vm not running on this host', 'health': 'bad', 'vm': 'down', 'detail': 'unknown'}, 'score': 2400, 'maintenance': False, 'host-ts': 1413882675}
> MainThread::INFO::2014-10-31 16:55:40,322::state_machine::161::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Local (id 1): {'engine-health': None, 'bridge': True, 'mem-free': None, 'maintenance': False, 'cpu-load': None, 'gateway': True}
> MainThread::INFO::2014-10-31 16:55:40,323::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1414745740.32 type=state_transition detail=StartState-ReinitializeFSM hostname='ovirt1'
> MainThread::INFO::2014-10-31 16:55:40,392::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (StartState-ReinitializeFSM) sent? ignored
> MainThread::INFO::2014-10-31 16:55:40,675::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state ReinitializeFSM (score: 0)
> MainThread::INFO::2014-10-31 16:55:50,710::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1414745750.71 type=state_transition detail=ReinitializeFSM-EngineUp hostname='ovirt1'
> MainThread::INFO::2014-10-31 16:55:50,710::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (ReinitializeFSM-EngineUp) sent? ignored
> MainThread::INFO::2014-10-31 16:55:51,001::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineUp (score: 2400)
> MainThread::CRITICAL::2014-10-31 16:56:01,033::agent::103::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Could not start ha-agent
> Traceback (most recent call last):
> File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 97, in run
> self._run_agent()
> File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 154, in _run_agent
> hosted_engine.HostedEngine(self.shutdown_requested).start_monitoring()
> File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 307, in start_monitoring
> for old_state, state, delay in self.fsm:
> File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/fsm/machine.py", line 125, in next
> new_data = self.refresh(self._state.data)
> File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/state_machine.py", line 77, in refresh
> stats.update(self.hosted_engine.collect_stats())
> File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 700, in collect_stats
> stats = self.process_remote_metadata(host_id, remote_data)
> File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 747, in process_remote_metadata
> md['engine-status'] = engine_status(md["engine-status"])
> File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 79, in engine_status
> in json.loads(status).iteritems()])
> AttributeError: 'NoneType' object has no attribute 'iteritems'
> [root@ovirt1 ~]# hosted-engine --vm-status
>
>
> --== Host 1 status ==--
>
> Status up-to-date : False
> Hostname : 192.168.12.11
> Host ID : 1
> Engine status : unknown stale-data
> Score : 2400
> Local maintenance : False
> Host timestamp : 1414745750
> Extra metadata (valid at timestamp):
> metadata_parse_version=1
> metadata_feature_version=1
> timestamp=1414745750 (Fri Oct 31 16:55:50 2014)
> host-id=1
> score=2400
> maintenance=False
> state=EngineUp
>
>
> --== Host 2 status ==--
>
> Status up-to-date : False
> Hostname : 192.168.12.12
> Host ID : 2
> Engine status : unknown stale-data
> Score : 2400
> Local maintenance : False
> Host timestamp : 1414745821
> Extra metadata (valid at timestamp):
> metadata_parse_version=1
> metadata_feature_version=1
> timestamp=1414745821 (Fri Oct 31 16:57:01 2014)
> host-id=2
> score=2400
> maintenance=False
> state=EngineStart
> [root@ovirt1 ~]# service ovirt-ha-agent status
> ovirt-ha-agent dead but subsys locked
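The Host 1 crash above boils down to `json.loads()` returning `None` when the peer's metadata slot holds no usable engine-status blob (the real agent code is Python 2 and calls `iteritems()` on the result, hence the `AttributeError`). A hypothetical guard in the spirit of the fix, not the actual Bugzilla patch:

```python
import json

def engine_status(status):
    # Hypothetical reconstruction of the parsing step in hosted_engine.py.
    # If the metadata slot holds the JSON literal "null" (e.g. after a
    # partial or zeroed write), json.loads returns None and iterating it
    # fails exactly as in the traceback above.
    parsed = json.loads(status)
    if parsed is None:
        return {"vm": "unknown", "health": "unknown",
                "detail": "unknown", "reason": "bad metadata"}
    return {str(k): v for k, v in parsed.items()}
```

With a guard like this the agent would degrade to an "unknown" status instead of dying, which is why zeroing the metadata (or applying the patch) resolves the restart loop.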
>
> Host2
>
> MainThread::INFO::2014-10-31 16:55:59,642::agent::52::ovirt_hosted_engine_ha.agent.agent.Agent::(run) ovirt-hosted-engine-ha agent 1.1.6 started
> MainThread::INFO::2014-10-31 16:55:59,678::hosted_engine::223::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Found certificate common name: 192.168.12.12
> MainThread::INFO::2014-10-31 16:55:59,918::hosted_engine::367::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Initializing ha-broker connection
> MainThread::INFO::2014-10-31 16:55:59,919::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor ping, options {'addr': '192.168.12.254'}
> MainThread::INFO::2014-10-31 16:55:59,922::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 25353488
> MainThread::INFO::2014-10-31 16:55:59,922::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor mgmt-bridge, options {'use_ssl': 'true', 'bridge_name': 'ovirtmgmt', 'address': '0'}
> MainThread::INFO::2014-10-31 16:55:59,928::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 25354128
> MainThread::INFO::2014-10-31 16:55:59,928::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor mem-free, options {'use_ssl': 'true', 'address': '0'}
> MainThread::INFO::2014-10-31 16:55:59,931::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 25353552
> MainThread::INFO::2014-10-31 16:55:59,931::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor cpu-load-no-engine, options {'use_ssl': 'true', 'vm_uuid': '41d4aff1-54e1-4946-a812-2e656bb7d3f9', 'address': '0'}
> MainThread::INFO::2014-10-31 16:55:59,934::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 139976608389584
> MainThread::INFO::2014-10-31 16:55:59,934::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor engine-health, options {'use_ssl': 'true', 'vm_uuid': '41d4aff1-54e1-4946-a812-2e656bb7d3f9', 'address': '0'}
> MainThread::INFO::2014-10-31 16:55:59,939::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 139976608447760
> MainThread::INFO::2014-10-31 16:55:59,939::hosted_engine::391::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Broker initialized, all submonitors started
> MainThread::INFO::2014-10-31 16:55:59,983::hosted_engine::476::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock) Ensuring lease for lockspace hosted-engine, host id 2 is acquired (file: /rhev/data-center/mnt/gluster1:_engine/6eb220be-daff-4785-8f78-111cc24139c4/ha_agent/hosted-engine.lockspace)
> MainThread::INFO::2014-10-31 16:56:00,001::state_machine::153::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Global metadata: {'maintenance': False}
> MainThread::INFO::2014-10-31 16:56:00,001::state_machine::158::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Host 192.168.12.11 (id 1): {'live-data': True, 'extra': 'metadata_parse_version=1\nmetadata_feature_version=1\ntimestamp=1414745750 (Fri Oct 31 16:55:50 2014)\nhost-id=1\nscore=2400\nmaintenance=False\nstate=EngineUp\n', 'hostname': '192.168.12.11', 'host-id': 1, 'engine-status': {'health': 'good', 'vm': 'up', 'detail': 'up'}, 'score': 2400, 'maintenance': False, 'host-ts': 1414745750}
> MainThread::INFO::2014-10-31 16:56:00,001::state_machine::161::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Local (id 2): {'engine-health': None, 'bridge': True, 'mem-free': None, 'maintenance': False, 'cpu-load': None, 'gateway': True}
> MainThread::INFO::2014-10-31 16:56:00,002::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1414745760.0 type=state_transition detail=StartState-ReinitializeFSM hostname='ovirt2'
> MainThread::INFO::2014-10-31 16:56:00,045::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (StartState-ReinitializeFSM) sent? ignored
> MainThread::INFO::2014-10-31 16:56:00,325::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state ReinitializeFSM (score: 0)
> MainThread::INFO::2014-10-31 16:56:10,352::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1414745770.35 type=state_transition detail=ReinitializeFSM-EngineDown hostname='ovirt2'
> MainThread::INFO::2014-10-31 16:56:10,353::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (ReinitializeFSM-EngineDown) sent? ignored
> MainThread::INFO::2014-10-31 16:56:10,638::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineDown (score: 2400)
> MainThread::INFO::2014-10-31 16:56:20,663::states::441::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) The engine is not running, but we do not have enough data to decide which hosts are alive
> MainThread::INFO::2014-10-31 16:56:20,663::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1414745780.66 type=state_transition detail=EngineDown-EngineDown hostname='ovirt2'
> MainThread::INFO::2014-10-31 16:56:20,664::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (EngineDown-EngineDown) sent? ignored
> MainThread::INFO::2014-10-31 16:56:20,943::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineDown (score: 2400)
> MainThread::INFO::2014-10-31 16:56:30,968::states::441::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) The engine is not running, but we do not have enough data to decide which hosts are alive
> MainThread::INFO::2014-10-31 16:56:30,969::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1414745790.97 type=state_transition detail=EngineDown-EngineDown hostname='ovirt2'
> MainThread::INFO::2014-10-31 16:56:30,969::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (EngineDown-EngineDown) sent? ignored
> MainThread::INFO::2014-10-31 16:56:31,248::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineDown (score: 2400)
> MainThread::INFO::2014-10-31 16:56:41,274::states::441::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) The engine is not running, but we do not have enough data to decide which hosts are alive
> MainThread::INFO::2014-10-31 16:56:41,275::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1414745801.28 type=state_transition detail=EngineDown-EngineDown hostname='ovirt2'
> MainThread::INFO::2014-10-31 16:56:41,276::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (EngineDown-EngineDown) sent? ignored
> MainThread::INFO::2014-10-31 16:56:41,555::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineDown (score: 2400)
> MainThread::INFO::2014-10-31 16:56:51,583::states::441::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) The engine is not running, but we do not have enough data to decide which hosts are alive
> MainThread::INFO::2014-10-31 16:56:51,584::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1414745811.58 type=state_transition detail=EngineDown-EngineDown hostname='ovirt2'
> MainThread::INFO::2014-10-31 16:56:51,584::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (EngineDown-EngineDown) sent? ignored
> MainThread::INFO::2014-10-31 16:56:51,864::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineDown (score: 2400)
> MainThread::INFO::2014-10-31 16:57:01,897::states::454::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Engine down and local host has best score (2400), attempting to start engine VM
> MainThread::INFO::2014-10-31 16:57:01,898::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1414745821.9 type=state_transition detail=EngineDown-EngineStart hostname='ovirt2'
> MainThread::INFO::2014-10-31 16:57:01,906::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (EngineDown-EngineStart) sent? ignored
> MainThread::INFO::2014-10-31 16:57:02,189::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineStart (score: 2400)
> MainThread::CRITICAL::2014-10-31 16:57:02,207::agent::103::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Could not start ha-agent
> Traceback (most recent call last):
> File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 97, in run
> self._run_agent()
> File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 154, in _run_agent
> hosted_engine.HostedEngine(self.shutdown_requested).start_monitoring()
> File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 307, in start_monitoring
> for old_state, state, delay in self.fsm:
> File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/fsm/machine.py", line 125, in next
> new_data = self.refresh(self._state.data)
> File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/state_machine.py", line 77, in refresh
> stats.update(self.hosted_engine.collect_stats())
> File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 662, in collect_stats
> constants.SERVICE_TYPE)
> File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 171, in get_stats_from_storage
> result = self._checked_communicate(request)
> File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 199, in _checked_communicate
> .format(message or response))
> RequestError: Request failed: <type 'exceptions.OSError'>
>
> [root@ovirt2 ~]# hosted-engine --vm-status
> Traceback (most recent call last):
> File "/usr/lib64/python2.6/runpy.py", line 122, in _run_module_as_main
> "__main__", fname, loader, pkg_name)
> File "/usr/lib64/python2.6/runpy.py", line 34, in _run_code
> exec code in run_globals
> File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 111, in <module>
> if not status_checker.print_status():
> File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 58, in print_status
> all_host_stats = ha_cli.get_all_host_stats()
> File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py", line 137, in get_all_host_stats
> return self.get_all_stats(self.StatModes.HOST)
> File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py", line 86, in get_all_stats
> constants.SERVICE_TYPE)
> File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 171, in get_stats_from_storage
> result = self._checked_communicate(request)
> File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 199, in _checked_communicate
> .format(message or response))
> ovirt_hosted_engine_ha.lib.exceptions.RequestError: Request failed: <type 'exceptions.OSError'>
> [root@ovirt2 ~]# service ovirt-ha-agent status
> ovirt-ha-agent dead but subsys locked
>
>
> Thanks,
> Jaicel
>
> ----- Original Message -----
> From: "Jiri Moskovcak" <jmoskovc(a)redhat.com>
> To: "Jaicel" <jaicel(a)asti.dost.gov.ph>
> Cc: "Niels de Vos" <ndevos(a)redhat.com>, "Vijay Bellur" <vbellur(a)redhat.com>, users(a)ovirt.org, "Gluster Devel" <gluster-devel(a)gluster.org>
> Sent: Friday, October 31, 2014 11:05:32 PM
> Subject: Re: [ovirt-users] Hosted-Engine HA problem
>
> On 10/31/2014 10:26 AM, Jaicel wrote:
>> I've increased the limit and then restarted the agent and broker. The status normalized, but right now it has gone into the "False" state again, with both hosts still having a score of 2400. The agent logs remain the same, and the service still reports "ovirt-ha-agent dead but subsys locked". ha-broker logs below.
>>
>> Thread-138::INFO::2014-10-31 17:24:22,981::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
>> Thread-138::INFO::2014-10-31 17:24:22,991::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
>> Thread-139::INFO::2014-10-31 17:24:38,385::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
>> Thread-139::INFO::2014-10-31 17:24:38,395::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
>> Thread-140::INFO::2014-10-31 17:24:53,816::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
>> Thread-140::INFO::2014-10-31 17:24:53,827::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
>> Thread-141::INFO::2014-10-31 17:25:09,172::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
>> Thread-141::INFO::2014-10-31 17:25:09,182::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
>> Thread-142::INFO::2014-10-31 17:25:24,551::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
>> Thread-142::INFO::2014-10-31 17:25:24,562::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
>>
>> Thanks,
>> Jaicel
>
> OK, now it seems that the broker runs fine, so I need the recent agent.log
> to debug this further.
>
> --Jirka
>
>>
>> ----- Original Message -----
>> From: "Jiri Moskovcak" <jmoskovc(a)redhat.com>
>> To: "Jaicel R. Sabonsolin" <jaicel(a)asti.dost.gov.ph>, "Niels de Vos" <ndevos(a)redhat.com>
>> Cc: "Vijay Bellur" <vbellur(a)redhat.com>, users(a)ovirt.org, "Gluster Devel" <gluster-devel(a)gluster.org>
>> Sent: Friday, October 31, 2014 4:32:02 PM
>> Subject: Re: [ovirt-users] Hosted-Engine HA problem
>>
>> On 10/31/2014 03:53 AM, Jaicel R. Sabonsolin wrote:
>>> Hi guys,
>>>
>>> these logs appear on both hosts, just like the result of --vm-status. I tried tcpdump on the oVirt hosts and Gluster nodes, but only packet exchanges with my monitoring VM (Zabbix) appeared.
>>>
>>> agent.log
>>> new_data = self.refresh(self._state.data)
>>> File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/state_machine.py", line 77, in refresh
>>> stats.update(self.hosted_engine.collect_stats())
>>> File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 662, in collect_stats
>>> constants.SERVICE_TYPE)
>>> File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 171, in get_stats_from_storage
>>> result = self._checked_communicate(request)
>>> File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 199, in _checked_communicate
>>> .format(message or response))
>>> RequestError: Request failed: <type 'exceptions.OSError'>
>>>
>>> broker.log
>>> File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 165, in handle
>>> response = "success " + self._dispatch(data)
>>> File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 261, in _dispatch
>>> .get_all_stats_for_service_type(**options)
>>> File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 41, in get_all_stats_for_service_type
>>> d = self.get_raw_stats_for_service_type(storage_dir, service_type)
>>> File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 74, in get_raw_stats_for_service_type
>>> f = os.open(path, direct_flag | os.O_RDONLY)
>>> OSError: [Errno 24] Too many open files: '/rhev/data-center/mnt/gluster1:_engine/6eb220be-daff-4785-8f78-111cc24139c4/ha_agent/hosted-engine.metadata'
>>
>> - ah, there we go ^^^^^^ you might need to tweak the limit of allowed
>> open files as described here [1], or find out why the app keeps so many files open.
>>
>>
>> --Jirka
>>
>> [1]
>> http://www.cyberciti.biz/faq/linux-increase-the-maximum-number-of-open-fileā¦
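Before changing the limit, it is worth confirming how close the broker actually is to it. A minimal sketch (assumption: a Linux box with /proc mounted; substitute the broker's PID, e.g. the output of `pgrep -f ovirt-ha-broker`, for `os.getpid()`):

```python
# Compare a process's open descriptor count against its RLIMIT_NOFILE.
import os
import resource

pid = os.getpid()  # replace with the ovirt-ha-broker PID on a real host
open_fds = len(os.listdir("/proc/%d/fd" % pid))
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("%d of %d descriptors in use (hard limit %d)" % (open_fds, soft, hard))
```

If the count keeps climbing toward the soft limit, raising it (e.g. a `nofile` entry in /etc/security/limits.conf, as the linked article describes) only buys time; the descriptor leak itself still needs fixing.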
>>
>>> Thread-38160::INFO::2014-10-31 10:28:37,989::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
>>> Thread-38161::INFO::2014-10-31 10:28:53,656::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
>>> Thread-38161::ERROR::2014-10-31 10:28:53,657::listener::190::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Error handling request, data: 'get-stats storage_dir=/rhev/data-center/mnt/gluster1:_engine/6eb220be-daff-4785-8f78-111cc24139c4/ha_agent service_type=hosted-engine'
>>> Traceback (most recent call last):
>>> File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 165, in handle
>>> response = "success " + self._dispatch(data)
>>> File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 261, in _dispatch
>>> .get_all_stats_for_service_type(**options)
>>> File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 41, in get_all_stats_for_service_type
>>> d = self.get_raw_stats_for_service_type(storage_dir, service_type)
>>> File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 74, in get_raw_stats_for_service_type
>>> f = os.open(path, direct_flag | os.O_RDONLY)
>>> OSError: [Errno 24] Too many open files: '/rhev/data-center/mnt/gluster1:_engine/6eb220be-daff-4785-8f78-111cc24139c4/ha_agent/hosted-engine.metadata'
>>> Thread-38161::INFO::2014-10-31 10:28:53,658::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
>>>
>>> Thanks,
>>> Jaicel
>>>
>>> ----- Original Message -----
>>> From: "Niels de Vos" <ndevos(a)redhat.com>
>>> To: "Vijay Bellur" <vbellur(a)redhat.com>
>>> Cc: "Jiri Moskovcak" <jmoskovc(a)redhat.com>, "Jaicel R. Sabonsolin" <jaicel(a)asti.dost.gov.ph>, users(a)ovirt.org, "Gluster Devel" <gluster-devel(a)gluster.org>
>>> Sent: Friday, October 31, 2014 4:11:25 AM
>>> Subject: Re: [ovirt-users] Hosted-Engine HA problem
>>>
>>> On Thu, Oct 30, 2014 at 09:07:24PM +0530, Vijay Bellur wrote:
>>>> On 10/30/2014 06:45 PM, Jiri Moskovcak wrote:
>>>>> On 10/30/2014 09:22 AM, Jaicel R. Sabonsolin wrote:
>>>>>> Hi Guys,
>>>>>>
>>>>>> I need help with my oVirt Hosted-Engine HA setup. I am running 2
>>>>>> oVirt hosts and 2 Gluster nodes with replicated volumes. I already have
>>>>>> VMs running on my hosts, and they migrate normally when, for example,
>>>>>> I power off the host they are running on. The problem is that the
>>>>>> engine can't migrate once I switch off the host that hosts the engine.
>>>>>>
>>>>>> oVirt 3.4.3-1.el6
>>>>>> KVM 0.12.1.2 - 2.415.el6_5.10
>>>>>> LIBVIRT libvirt-0.10.2-29.el6_5.9
>>>>>> VDSM vdsm-4.14.17-0.el6
>>>>>>
>>>>>>
>>>>>> Right now, I get this result from hosted-engine --vm-status.
>>>>>>
>>>>>> File "/usr/lib64/python2.6/runpy.py", line 122, in
>>>>>> _run_module_as_main
>>>>>> "__main__", fname, loader, pkg_name)
>>>>>> File "/usr/lib64/python2.6/runpy.py", line 34, in _run_code
>>>>>> exec code in run_globals
>>>>>> File
>>>>>>
>>>>>> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_setup/vm_status.py",
>>>>>>
>>>>>> line 111, in <module>
>>>>>> if not status_checker.print_status():
>>>>>> File
>>>>>>
>>>>>> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_setup/vm_status.py",
>>>>>>
>>>>>> line 58, in print_status
>>>>>> all_host_stats = ha_cli.get_all_host_stats()
>>>>>> File
>>>>>>
>>>>>> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py",
>>>>>>
>>>>>> line 137, in get_all_host_stats
>>>>>> return self.get_all_stats(self.StatModes.HOST)
>>>>>> File
>>>>>>
>>>>>> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py",
>>>>>>
>>>>>> line 86, in get_all_stats
>>>>>> constants.SERVICE_TYPE)
>>>>>> File
>>>>>>
>>>>>> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py",
>>>>>>
>>>>>> line 171, in get_stats_from_storage
>>>>>> result = self._checked_communicate(request)
>>>>>> File
>>>>>>
>>>>>> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py",
>>>>>>
>>>>>> line 199, in _checked_communicate
>>>>>> .format(message or response))
>>>>>> ovirt_hosted_engine_ha.lib.exceptions.RequestError: Request failed:
>>>>>> <type 'exceptions.OSError'>
>>>>>>
>>>>>>
>>>>>> Restarting ha-broker and ha-agent normalizes the status, but eventually
>>>>>> it becomes "false" again and returns to the result above. I hope you
>>>>>> guys can help me with this.
>>>>>>
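The RequestError above only reports that the broker hit a bare OSError; as later messages in this thread show, the underlying error is "[Errno 24] Too many open files" raised by os.open() on the shared metadata file. The failure mode itself is easy to demonstrate; the sketch below is illustrative only (not oVirt code) and leaks descriptors on purpose until the process hits EMFILE:

```python
import errno
import os
import resource
import tempfile

# Save the current descriptor limit, then tighten it so the demo fails fast.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (64, hard))

# Stand-in for the hosted-engine metadata file the broker keeps reopening.
tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.write(b'metadata')
tmp.close()

leaked, caught = [], None
try:
    while True:
        # Opened but never closed -- the suspected descriptor leak.
        leaked.append(os.open(tmp.name, os.O_RDONLY))
except OSError as exc:
    caught = exc  # expected: errno.EMFILE, "Too many open files"
finally:
    for fd in leaked:
        os.close(fd)
    resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))
    os.unlink(tmp.name)

assert caught is not None and caught.errno == errno.EMFILE
print('hit EMFILE after %d leaked opens' % len(leaked))
```

Once the limit is reached, every subsequent os.open() in the broker fails the same way, which is why the error surfaces in unrelated requests until the process is restarted.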
>>>>>
>>>>> Hi Jaicel,
>>>>> please attach agent.log and broker.log from the host where you are trying
>>>>> to run hosted-engine --vm-status. I have a feeling that you ran into a
>>>>> known problem on gluster - a stalled file descriptor. In that case the
>>>>> only known solution at this time is to restart the broker & agent, as you
>>>>> have already found out.
>>>>>
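The restart workaround described above can be paired with a quick check of how close the broker is to its descriptor limit. A rough sketch, assuming a Linux /proc filesystem and the EL6 service names used in this thread (ovirt-ha-broker, ovirt-ha-agent); the fd_usage helper is hypothetical, not part of the oVirt tooling:

```shell
#!/bin/sh
# Sketch: report "<open fds> <soft limit>" for a PID, e.g. to watch whether
# the broker is drifting toward "[Errno 24] Too many open files":
#   fd_usage "$(pgrep -f ovirt-ha-broker | head -n1)"
fd_usage() {
    pid="$1"
    used=$(ls "/proc/$pid/fd" 2>/dev/null | wc -l)
    # Field 4 of the "Max open files" row is the soft limit.
    limit=$(awk '/Max open files/ {print $4}' "/proc/$pid/limits")
    echo "$used $limit"
}

# The temporary fix from this thread would then be (EL6 init scripts):
#   service ovirt-ha-broker restart
#   service ovirt-ha-agent restart

fd_usage $$   # demo on the current shell
```

If the count keeps climbing toward the limit between restarts, that points at a leak rather than a limit that is simply too low.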
>>>>
>>>> Adding Niels and gluster-devel to troubleshoot from Gluster NFS perspective.
>>>
>>> I'd welcome any details on this "stalled file descriptor" problem. Is
>>> there a bug filed with some details like logs, sysrq-t and maybe even
>>> tcpdumps? If there is an easy way to reproduce this behaviour, I can
>>> surely look into it and hopefully come up with some advice or a fix.
>>>
>>> Thanks,
>>> Niels
>>>
Hi Jirka,

the patch works. it stabilized the status of my two hosts. the engine
migration during failover also works fine. thanks guys!

Jaicel

From: "Jiri Moskovcak" <jmoskovc(a)redhat.com>
To: "Jaicel" <jaicel(a)asti.dost.gov.ph>
Cc: "Niels de Vos" <ndevos(a)redhat.com>, "Vijay Bellur" <vbellur(a)redhat.com>, users(a)ovirt.org, "Gluster Devel" <gluster-devel(a)gluster.org>
Sent: Monday, November 3, 2014 3:33:16 PM
Subject: Re: [ovirt-users] Hosted-Engine HA problem

On 11/01/2014 07:43 AM, Jaicel wrote:
> Hi,
>
> my engine runs on Host1. current status and agent logs below.
>
> Host 1

Hi,
it seems like you ran into [1], you can either zero-out the metadata
file or apply the patch from [1] manually.

--Jirka

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1158925

>
> MainThread::INFO::2014-10-31 16:55:39,918::agent::52::ovirt_hosted_engine_ha.agent.agent.Agent::(run) ovirt-hosted-engine-ha agent 1.1.6 started
> MainThread::INFO::2014-10-31 16:55:39,985::hosted_engine::223::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Found certificate common name: 192.168.12.11
> MainThread::INFO::2014-10-31 16:55:40,228::hosted_engine::367::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Initializing ha-broker connection
> MainThread::INFO::2014-10-31 16:55:40,228::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor ping, options {'addr': '192.168.12.254'}
> MainThread::INFO::2014-10-31 16:55:40,231::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 140634215107920
> MainThread::INFO::2014-10-31 16:55:40,231::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor mgmt-bridge, options {'use_ssl': 'true', 'bridge_name': 'ovirtmgmt', 'address': '0'}
> MainThread::INFO::2014-10-31 16:55:40,237::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 140634215108432
> MainThread::INFO::2014-10-31 16:55:40,237::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor mem-free, options {'use_ssl': 'true', 'address': '0'}
> MainThread::INFO::2014-10-31 16:55:40,240::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 39956688
> MainThread::INFO::2014-10-31 16:55:40,240::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor cpu-load-no-engine, options {'use_ssl': 'true', 'vm_uuid': '41d4aff1-54e1-4946-a812-2e656bb7d3f9', 'address': '0'}
> MainThread::INFO::2014-10-31 16:55:40,243::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 140634215107664
> MainThread::INFO::2014-10-31 16:55:40,244::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor engine-health, options {'use_ssl': 'true', 'vm_uuid': '41d4aff1-54e1-4946-a812-2e656bb7d3f9', 'address': '0'}
> MainThread::INFO::2014-10-31 16:55:40,249::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 140634006879632
> MainThread::INFO::2014-10-31 16:55:40,249::hosted_engine::391::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Broker initialized, all submonitors started
> MainThread::INFO::2014-10-31 16:55:40,298::hosted_engine::476::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock) Ensuring lease for lockspace hosted-engine, host id 1 is acquired (file: /rhev/data-center/mnt/gluster1:_engine/6eb220be-daff-4785-8f78-111cc24139c4/ha_agent/hosted-engine.lockspace)
> MainThread::INFO::2014-10-31 16:55:40,322::state_machine::153::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Global metadata: {'maintenance': False}
> MainThread::INFO::2014-10-31 16:55:40,322::state_machine::158::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Host 192.168.12.12 (id 2): {'live-data': False, 'extra': 'metadata_parse_version=1\nmetadata_feature_version=1\ntimestamp=1413882675 (Tue Oct 21 17:11:15 2014)\nhost-id=2\nscore=2400\nmaintenance=False\nstate=EngineDown\n', 'hostname': '192.168.12.12', 'host-id': 2, 'engine-status': {'reason': 'vm not running on this host', 'health': 'bad', 'vm': 'down', 'detail': 'unknown'}, 'score': 2400, 'maintenance': False, 'host-ts': 1413882675}
> MainThread::INFO::2014-10-31 16:55:40,322::state_machine::161::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Local (id 1): {'engine-health': None, 'bridge': True, 'mem-free': None, 'maintenance': False, 'cpu-load': None, 'gateway': True}
> MainThread::INFO::2014-10-31 16:55:40,323::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1414745740.32 type=state_transition detail=StartState-ReinitializeFSM hostname='ovirt1'
> MainThread::INFO::2014-10-31 16:55:40,392::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (StartState-ReinitializeFSM) sent? ignored
> MainThread::INFO::2014-10-31 16:55:40,675::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state ReinitializeFSM (score: 0)
> MainThread::INFO::2014-10-31 16:55:50,710::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1414745750.71 type=state_transition detail=ReinitializeFSM-EngineUp hostname='ovirt1'
> MainThread::INFO::2014-10-31 16:55:50,710::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (ReinitializeFSM-EngineUp) sent? ignored
> MainThread::INFO::2014-10-31 16:55:51,001::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineUp (score: 2400)
> MainThread::CRITICAL::2014-10-31 16:56:01,033::agent::103::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Could not start ha-agent
> Traceback (most recent call last):
>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 97, in run
>     self._run_agent()
>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 154, in _run_agent
>     hosted_engine.HostedEngine(self.shutdown_requested).start_monitoring()
>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 307, in start_monitoring
>     for old_state, state, delay in self.fsm:
>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/fsm/machine.py", line 125, in next
>     new_data = self.refresh(self._state.data)
>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/state_machine.py", line 77, in refresh
>     stats.update(self.hosted_engine.collect_stats())
>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 700, in collect_stats
>     stats = self.process_remote_metadata(host_id, remote_data)
>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 747, in process_remote_metadata
>     md['engine-status'] = engine_status(md["engine-status"])
>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 79, in engine_status
>     in json.loads(status).iteritems()])
> AttributeError: 'NoneType' object has no attribute 'iteritems'
> [root@ovirt1 ~]# hosted-engine --vm-status
>
> --== Host 1 status ==--
>
> Status up-to-date                  : False
> Hostname                           : 192.168.12.11
> Host ID                            : 1
> Engine status                      : unknown stale-data
> Score                              : 2400
> Local maintenance                  : False
> Host timestamp                     : 1414745750
> Extra metadata (valid at timestamp):
>     metadata_parse_version=1
>     metadata_feature_version=1
>     timestamp=1414745750 (Fri Oct 31 16:55:50 2014)
>     host-id=1
>     score=2400
>     maintenance=False
>     state=EngineUp
>
> --== Host 2 status ==--
>
> Status up-to-date                  : False
> Hostname                           : 192.168.12.12
> Host ID                            : 2
> Engine status                      : unknown stale-data
> Score                              : 2400
> Local maintenance                  : False
> Host timestamp                     : 1414745821
> Extra metadata (valid at timestamp):
>     metadata_parse_version=1
>     metadata_feature_version=1
>     timestamp=1414745821 (Fri Oct 31 16:57:01 2014)
>     host-id=2
>     score=2400
>     maintenance=False
>     state=EngineStart
> [root@ovirt1 ~]# service ovirt-ha-agent status
> ovirt-ha-agent dead but subsys locked
>
> Host2
>
> MainThread::INFO::2014-10-31 16:55:59,642::agent::52::ovirt_hosted_engine_ha.agent.agent.Agent::(run) ovirt-hosted-engine-ha agent 1.1.6 started
> MainThread::INFO::2014-10-31 16:55:59,678::hosted_engine::223::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Found certificate common name: 192.168.12.12
> MainThread::INFO::2014-10-31 16:55:59,918::hosted_engine::367::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Initializing ha-broker connection
> MainThread::INFO::2014-10-31 16:55:59,919::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor ping, options {'addr': '192.168.12.254'}
> MainThread::INFO::2014-10-31 16:55:59,922::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 25353488
> MainThread::INFO::2014-10-31 16:55:59,922::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor mgmt-bridge, options {'use_ssl': 'true', 'bridge_name': 'ovirtmgmt', 'address': '0'}
> MainThread::INFO::2014-10-31 16:55:59,928::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 25354128
> MainThread::INFO::2014-10-31 16:55:59,928::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor mem-free, options {'use_ssl': 'true', 'address': '0'}
> MainThread::INFO::2014-10-31 16:55:59,931::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 25353552
> MainThread::INFO::2014-10-31 16:55:59,931::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor cpu-load-no-engine, options {'use_ssl': 'true', 'vm_uuid': '41d4aff1-54e1-4946-a812-2e656bb7d3f9', 'address': '0'}
> MainThread::INFO::2014-10-31 16:55:59,934::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 139976608389584
> MainThread::INFO::2014-10-31 16:55:59,934::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor engine-health, options {'use_ssl': 'true', 'vm_uuid': '41d4aff1-54e1-4946-a812-2e656bb7d3f9', 'address': '0'}
> MainThread::INFO::2014-10-31 16:55:59,939::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 139976608447760
> MainThread::INFO::2014-10-31 16:55:59,939::hosted_engine::391::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Broker initialized, all submonitors started
> MainThread::INFO::2014-10-31 16:55:59,983::hosted_engine::476::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock) Ensuring lease for lockspace hosted-engine, host id 2 is acquired (file: /rhev/data-center/mnt/gluster1:_engine/6eb220be-daff-4785-8f78-111cc24139c4/ha_agent/hosted-engine.lockspace)
> MainThread::INFO::2014-10-31 16:56:00,001::state_machine::153::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Global metadata: {'maintenance': False}
> MainThread::INFO::2014-10-31 16:56:00,001::state_machine::158::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Host 192.168.12.11 (id 1): {'live-data': True, 'extra': 'metadata_parse_version=1\nmetadata_feature_version=1\ntimestamp=1414745750 (Fri Oct 31 16:55:50 2014)\nhost-id=1\nscore=2400\nmaintenance=False\nstate=EngineUp\n', 'hostname': '192.168.12.11', 'host-id': 1, 'engine-status': {'health': 'good', 'vm': 'up', 'detail': 'up'}, 'score': 2400, 'maintenance': False, 'host-ts': 1414745750}
> MainThread::INFO::2014-10-31 16:56:00,001::state_machine::161::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Local (id 2): {'engine-health': None, 'bridge': True, 'mem-free': None, 'maintenance': False, 'cpu-load': None, 'gateway': True}
> MainThread::INFO::2014-10-31 16:56:00,002::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1414745760.0 type=state_transition detail=StartState-ReinitializeFSM hostname='ovirt2'
> MainThread::INFO::2014-10-31 16:56:00,045::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (StartState-ReinitializeFSM) sent? ignored
> MainThread::INFO::2014-10-31 16:56:00,325::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state ReinitializeFSM (score: 0)
> MainThread::INFO::2014-10-31 16:56:10,352::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1414745770.35 type=state_transition detail=ReinitializeFSM-EngineDown hostname='ovirt2'
> MainThread::INFO::2014-10-31 16:56:10,353::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (ReinitializeFSM-EngineDown) sent? ignored
> MainThread::INFO::2014-10-31 16:56:10,638::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineDown (score: 2400)
> MainThread::INFO::2014-10-31 16:56:20,663::states::441::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) The engine is not running, but we do not have enough data to decide which hosts are alive
> MainThread::INFO::2014-10-31 16:56:20,663::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1414745780.66 type=state_transition detail=EngineDown-EngineDown hostname='ovirt2'
> MainThread::INFO::2014-10-31 16:56:20,664::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (EngineDown-EngineDown) sent? ignored
> MainThread::INFO::2014-10-31 16:56:20,943::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineDown (score: 2400)
> MainThread::INFO::2014-10-31 16:56:30,968::states::441::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) The engine is not running, but we do not have enough data to decide which hosts are alive
> MainThread::INFO::2014-10-31 16:56:30,969::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1414745790.97 type=state_transition detail=EngineDown-EngineDown hostname='ovirt2'
> MainThread::INFO::2014-10-31 16:56:30,969::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (EngineDown-EngineDown) sent? ignored
> MainThread::INFO::2014-10-31 16:56:31,248::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineDown (score: 2400)
> MainThread::INFO::2014-10-31 16:56:41,274::states::441::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) The engine is not running, but we do not have enough data to decide which hosts are alive
> MainThread::INFO::2014-10-31 16:56:41,275::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1414745801.28 type=state_transition detail=EngineDown-EngineDown hostname='ovirt2'
> MainThread::INFO::2014-10-31 16:56:41,276::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (EngineDown-EngineDown) sent? ignored
> MainThread::INFO::2014-10-31 16:56:41,555::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineDown (score: 2400)
> MainThread::INFO::2014-10-31 16:56:51,583::states::441::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) The engine is not running, but we do not have enough data to decide which hosts are alive
> MainThread::INFO::2014-10-31 16:56:51,584::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1414745811.58 type=state_transition detail=EngineDown-EngineDown hostname='ovirt2'
> MainThread::INFO::2014-10-31 16:56:51,584::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (EngineDown-EngineDown) sent? ignored
> MainThread::INFO::2014-10-31 16:56:51,864::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineDown (score: 2400)
> MainThread::INFO::2014-10-31 16:57:01,897::states::454::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Engine down and local host has best score (2400), attempting to start engine VM
> MainThread::INFO::2014-10-31 16:57:01,898::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1414745821.9 type=state_transition detail=EngineDown-EngineStart hostname='ovirt2'
> MainThread::INFO::2014-10-31 16:57:01,906::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (EngineDown-EngineStart) sent? ignored
> MainThread::INFO::2014-10-31 16:57:02,189::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineStart (score: 2400)
> MainThread::CRITICAL::2014-10-31 16:57:02,207::agent::103::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Could not start ha-agent
> Traceback (most recent call last):
>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 97, in run
>     self._run_agent()
>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 154, in _run_agent
>     hosted_engine.HostedEngine(self.shutdown_requested).start_monitoring()
>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 307, in start_monitoring
>     for old_state, state, delay in self.fsm:
>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/fsm/machine.py", line 125, in next
>     new_data = self.refresh(self._state.data)
>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/state_machine.py", line 77, in refresh
>     stats.update(self.hosted_engine.collect_stats())
>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 662, in collect_stats
>     constants.SERVICE_TYPE)
>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 171, in get_stats_from_storage
>     result = self._checked_communicate(request)
>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 199, in _checked_communicate
>     .format(message or response))
> RequestError: Request failed: <type 'exceptions.OSError'>
>
> [root@ovirt2 ~]# hosted-engine --vm-status
> Traceback (most recent call last):
>   File "/usr/lib64/python2.6/runpy.py", line 122, in _run_module_as_main
>     "__main__", fname, loader, pkg_name)
>   File "/usr/lib64/python2.6/runpy.py", line 34, in _run_code
>     exec code in run_globals
>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 111, in <module>
>     if not status_checker.print_status():
>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 58, in print_status
>     all_host_stats = ha_cli.get_all_host_stats()
>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py", line 137, in get_all_host_stats
>     return self.get_all_stats(self.StatModes.HOST)
>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py", line 86, in get_all_stats
>     constants.SERVICE_TYPE)
>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 171, in get_stats_from_storage
>     result = self._checked_communicate(request)
>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 199, in _checked_communicate
>     .format(message or response))
> ovirt_hosted_engine_ha.lib.exceptions.RequestError: Request failed: <type 'exceptions.OSError'>
> [root@ovirt2 ~]# service ovirt-ha-agent status
> ovirt-ha-agent dead but subsys locked
>
> Thanks,
> Jaicel
>
> ----- Original Message -----
> From: "Jiri Moskovcak" <jmoskovc(a)redhat.com>
> To: "Jaicel" <jaicel(a)asti.dost.gov.ph>
> Cc: "Niels de Vos" <ndevos(a)redhat.com>, "Vijay Bellur" <vbellur(a)redhat.com>, users(a)ovirt.org, "Gluster Devel" <gluster-devel(a)gluster.org>
> Sent: Friday, October 31, 2014 11:05:32 PM
> Subject: Re: [ovirt-users] Hosted-Engine HA problem
>
> On 10/31/2014 10:26 AM, Jaicel wrote:
>> i've increased the limit and then restarted agent and broker. status normalize, but then right now it went to "False" state again but still both having 2400 score. agent logs remains the same, with "ovirt-ha-agent dead but subsys locked" status. ha-broker logs below
>>
>> Thread-138::INFO::2014-10-31 17:24:22,981::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
>> Thread-138::INFO::2014-10-31 17:24:22,991::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
>> Thread-139::INFO::2014-10-31 17:24:38,385::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
>> Thread-139::INFO::2014-10-31 17:24:38,395::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
>> Thread-140::INFO::2014-10-31 17:24:53,816::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
>> Thread-140::INFO::2014-10-31 17:24:53,827::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
>> Thread-141::INFO::2014-10-31 17:25:09,172::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
>> Thread-141::INFO::2014-10-31 17:25:09,182::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
>> Thread-142::INFO::2014-10-31 17:25:24,551::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
>> Thread-142::INFO::2014-10-31 17:25:24,562::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
>>
>> Thanks,
>> Jaicel
>
> ok, now it seems that broker runs fine, so I need the recent agent.log
> to debug it more.
>
> --Jirka
>
>>
>> ----- Original Message -----
>> From: "Jiri Moskovcak" <jmoskovc(a)redhat.com>
>> To: "Jaicel R. Sabonsolin" <jaicel(a)asti.dost.gov.ph>, "Niels de Vos" <ndevos(a)redhat.com>
>> Cc: "Vijay Bellur" <vbellur(a)redhat.com>, users(a)ovirt.org, "Gluster Devel" <gluster-devel(a)gluster.org>
>> Sent: Friday, October 31, 2014 4:32:02 PM
>> Subject: Re: [ovirt-users] Hosted-Engine HA problem
>>
>> On 10/31/2014 03:53 AM, Jaicel R. Sabonsolin wrote:
>>> Hi guys,
>>>
>>> these logs appear on both hosts just like the result of --vm-status. tried to tcpdump on ovirt hosts and gluster nodes but only packets exchange with my monitoring VM(zabbix) appeared.
>>>
>>> agent.log
>>>     new_data = self.refresh(self._state.data)
>>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/state_machine.py", line 77, in refresh
>>>     stats.update(self.hosted_engine.collect_stats())
>>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 662, in collect_stats
>>>     constants.SERVICE_TYPE)
>>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 171, in get_stats_from_storage
>>>     result = self._checked_communicate(request)
>>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 199, in _checked_communicate
>>>     .format(message or response))
>>> RequestError: Request failed: <type 'exceptions.OSError'>
>>>
>>> broker.log
>>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 165, in handle
>>>     response = "success " + self._dispatch(data)
>>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 261, in _dispatch
>>>     .get_all_stats_for_service_type(**options)
>>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 41, in get_all_stats_for_service_type
>>>     d = self.get_raw_stats_for_service_type(storage_dir, service_type)
>>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 74, in get_raw_stats_for_service_type
>>>     f = os.open(path, direct_flag | os.O_RDONLY)
>>> OSError: [Errno 24] Too many open files: '/rhev/data-center/mnt/gluster1:_engine/6eb220be-daff-4785-8f78-111cc24139c4/ha_agent/hosted-engine.metadata'
>>
>> - ah, there we go ^^^^^^ you might need to tweak the limit of allowed
>> open files as described here [1] or find the app keeps so many files open
>>
>> --Jirka
>>
>> [1] http://www.cyberciti.biz/faq/linux-increase-the-maximum-number-of-open-files/
>>
>>> Thread-38160::INFO::2014-10-31 10:28:37,989::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
>>> Thread-38161::INFO::2014-10-31 10:28:53,656::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
>>> Thread-38161::ERROR::2014-10-31 10:28:53,657::listener::190::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Error handling request, data: 'get-stats storage_dir=/rhev/data-center/mnt/gluster1:_engine/6eb220be-daff-4785-8f78-111cc24139c4/ha_agent service_type=hosted-engine'
>>> Traceback (most recent call last):
>>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 165, in handle
>>>     response = "success " + self._dispatch(data)
>>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 261, in _dispatch
>>>     .get_all_stats_for_service_type(**options)
>>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 41, in get_all_stats_for_service_type
>>>     d = self.get_raw_stats_for_service_type(storage_dir, service_type)
>>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 74, in get_raw_stats_for_service_type
>>>     f = os.open(path, direct_flag | os.O_RDONLY)
>>> OSError: [Errno 24] Too many open files: '/rhev/data-center/mnt/gluster1:_engine/6eb220be-daff-4785-8f78-111cc24139c4/ha_agent/hosted-engine.metadata'
>>> Thread-38161::INFO::2014-10-31 10:28:53,658::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
>>>
>>> Thanks,
>>> Jaicel
>>>
>>> ----- Original Message -----
>>> From: "Niels de Vos" <ndevos(a)redhat.com>
>>> To: "Vijay Bellur" <vbellur(a)redhat.com>
>>> Cc: "Jiri Moskovcak" <jmoskovc(a)redhat.com>, "Jaicel R. Sabonsolin" <jaicel(a)asti.dost.gov.ph>, users(a)ovirt.org, "Gluster Devel" <gluster-devel(a)gluster.org>
>>> Sent: Friday, October 31, 2014 4:11:25 AM
>>> Subject: Re: [ovirt-users] Hosted-Engine HA problem
>>>
>>> On Thu, Oct 30, 2014 at 09:07:24PM +0530, Vijay Bellur wrote:
>>>> On 10/30/2014 06:45 PM, Jiri Moskovcak wrote:
>>>>> On 10/30/2014 09:22 AM, Jaicel R. Sabonsolin wrote:
>>>>>> Hi Guys,
>>>>>>
>>>>>> I need help with my ovirt Hosted-Engine HA setup. I am running on 2
>>>>>> ovirt hosts and 2 gluster nodes with replicated volumes. i already have
>>>>>> VMs running on my hosts and they can migrate normally once i for example
>>>>>> power off the host that they are running on. the problem is that the
>>>>>> engine can't migrate once i switch off the host that hosts the engine.
>>>>>>
>>>>>> oVirt 3.4.3-1.el6
>>>>>> KVM 0.12.1.2 - 2.415.el6_5.10
>>>>>> LIBVIRT libvirt-0.10.2-29.el6_5.9
>>>>>> VDSM vdsm-4.14.17-0.el6
>>>>>>
>>>>>> right now, i have this result from hosted-engine --vm-status.
>>>>>>
>>>>>>   File "/usr/lib64/python2.6/runpy.py", line 122, in _run_module_as_main
>>>>>>     "__main__", fname, loader, pkg_name)
>>>>>>   File "/usr/lib64/python2.6/runpy.py", line 34, in _run_code
>>>>>>     exec code in run_globals
>>>>>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 111, in <module>
>>>>>>     if not status_checker.print_status():
>>>>>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 58, in print_status
>>>>>>     all_host_stats = ha_cli.get_all_host_stats()
>>>>>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py", line 137, in get_all_host_stats
>>>>>>     return self.get_all_stats(self.StatModes.HOST)
>>>>>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py", line 86, in get_all_stats
>>>>>>     constants.SERVICE_TYPE)
>>>>>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 171, in get_stats_from_storage
>>>>>>     result = self._checked_communicate(request)
>>>>>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py",
; line 199, in _checked_communicate<br>>>>>>&g=
t; .format(message or response))<b=
r>>>>>>> ovirt_hosted_engine_ha.lib.=
exceptions.RequestError: Request failed:<br>>>>>>> =
<type 'exceptions.OSError'><br>>>>>>>=
;<br>>>>>>><br>>>>>>> restarting ha-bro=
ker and ha-agent normalizes the status but eventually<br>>>>>&g=
t;> it would become "false" and then return to the result above. hope yo=
u<br>>>>>>> guys could help me with this.<br>>>>=
>>><br>>>>>><br>>>>>> Hi Jaicel,<br>=
>>>>> please attach agent.log and broker.log from the host w=
here you trying to<br>>>>>> run hosted-engine --vm-status. I=
have a feeling that you ran into a<br>>>>>> known problem o=
n gluster - stalled file descriptor, in that case the<br>>>>>&g=
t; only known solution at this time is to restart the broker & agent as=
you<br>>>>>> have already found out.<br>>>>>>=
;<br>>>>><br>>>>> Adding Niels and gluster-devel to=
troubleshoot from Gluster NFS perspective.<br>>>><br>>>>=
I'd welcome any details on this "stalled file descriptor" problem. Is<br>&=
gt;>> there a bug filed with some details like logs, sysrq-t and mayb=
e even<br>>>> tcpdumps? If there is an easy way to reproduce this =
behaviour, I can<br>>>> surely look into it and hopefully come up =
with some advise or fix.<br>>>><br>>>> Thanks,<br>>>=
;> Niels<br>>>></div></div><br></div></div></body></html>
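The `OSError: [Errno 24] Too many open files` in the broker log means the broker process exhausted its own per-process file-descriptor limit, which fits a descriptor leak against the Gluster mount. A minimal diagnostic sketch for the affected host (the `pgrep` pattern is an assumption; adjust it to the actual process name):

```shell
# Per-process open-files limit for the current shell; EMFILE (Errno 24) is
# raised when a single process exceeds its own limit, not a system-wide one.
ulimit -n

# Count descriptors the broker currently holds; if this number climbs steadily
# between checks, descriptors are leaking rather than being reused.
ls /proc/"$(pgrep -f ovirt-ha-broker | head -n1)"/fd 2>/dev/null | wc -l
```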
Jiří Moskovčák changed the time of the event Ovirt - Hosted Engine iSCSI support (deep dive)
by Google+ (Jiří Moskovčák) 11 Nov '14
Jiří Moskovčák changed the event time to
Wed, November 12, 14:00 CET
This notification was sent to users(a)ovirt.org; to update your
address, go to the notification delivery settings:
https://plus.google.com/_/notifications/ngemlink?&emid=COCoq96I8sECFRKW3Aodā¦
In subscription management you can choose which emails you receive from Google+:
https://plus.google.com/_/notifications/ngemlink?&emid=COCoq96I8sECFRKW3Aodā¦
Google Inc., 1600 Amphitheatre Pkwy, Mountain View, CA 94043 USA
I attempted to make a live snapshot of a Windows VM last night, and got
the message "Failed to create live snapshot <name> for VM <vm>. VM restart
is recommended".
So, I then shut the VM down normally, and attempted to remove the snapshot.
I got an error that the snapshot could not be removed because it was still
in progress. Looking underneath the covers at the 'images' directory for
this VM on the storage filesystem, I don't see any evidence of a snapshot
in progress (the *_MERGE* files).
The Snapshots tab for the VM does not show any snapshots (other than
the 'Active VM' entry).
I can't boot the VM however. When attempting to detach the disk and
mount it on another Windows VM for inspection, I receive the message
"Cannot detach virtual disk. The disk is already configured in a snapshot.
In order to detach it, remove the disk's snapshots".
How can I convince oVirt there is no snapshot so I can move forward with
the rest of resurrecting this important VM?
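A read-only way to see what oVirt still believes about the snapshot is to query the engine database directly. This is a sketch assuming the default `engine` database and the 3.4-era schema ('myvm' is a placeholder for the affected VM's name); verify the table and column names against your own installation, and take a database backup before changing anything:

```shell
# Read-only: list the snapshot records the engine still holds for the VM.
sudo -u postgres psql engine -c "
  SELECT s.snapshot_id, s.snapshot_type, s.status, s.description
    FROM snapshots s JOIN vm_static v ON v.vm_guid = s.vm_id
   WHERE v.vm_name = 'myvm';"

# Disk images stuck in a locked state (imagestatus 2 = LOCKED, 1 = OK).
sudo -u postgres psql engine -c \
  "SELECT image_guid, imagestatus FROM images WHERE imagestatus = 2;"
```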
oVirt Engine Version: 3.4.0-1.fc19 (running on Fedora Core 19 server)
hosts: running vdsm-4.13.0-11.el6.x86_64 on CentOS 6.5
storage: NFS from an in-house NAS based on ZFS on OpenIndiana
Many thanks in advance,
Toby
--
Toby Chappell, RHCE
Director, Enterprise Services
Educational Technology
Georgia Gwinnett College
toby(a)ggc.edu / 678-407-5305
Hello,
I would like to know if there have been improvements in oVirt's support for
mixed clusters (with AMD and Intel CPUs).
I see that this is a feature supported by KVM, because at least PVE/Proxmox and
CloudStack support mixed clusters with migration and HA.
Thanks in advance for any reply!
Mario
Hello,

I tried to install ovirt-engine on CentOS 7, but during setup I got the following error:

[ ERROR ] Failed to execute stage 'Misc configuration': Cannot locate application option SysPrep2K3Path

Greetings
Wolfgang Bucher
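One way to narrow this down is to ask engine-config which sysprep options the installed version actually knows about; a sketch (this only inspects the configuration, it does not fix the setup failure):

```shell
# Try to read the option engine-setup could not locate.
engine-config -g SysPrep2K3Path

# List every known option and filter for the sysprep path keys that do exist.
engine-config -l | grep -i sysprep
```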
Jiří Moskovčák invites you to the event Ovirt - Hosted Engine iSCSI support (deep dive)
by Google+ (Jiří Moskovčák) 10 Nov '14
Jiří Moskovčák invites you to the event Ovirt - Hosted Engine iSCSI support (deep
dive)
Wed, November 12, 15:00 CET
Jiri Kastner, Ales Kozumplik, Jakub Filak and 34 others are invited
View the invitation:
https://plus.google.com/_/notifications/ngemlink?&emid=CNi6v7aX8MECFYs93Qodā¦
In this talk I'm going to describe the changes we made in the code to add
support for iSCSI, then I will describe the changes visible to the user, and
since I'm feeling lucky I will also do a quick live demo of the deployment
on iSCSI :)
We have an oVirt hosted engine, acting at the same time as an NFS server:
node 1, containing the hosted engine
nfs master = node1:/nfs/image
We try to add a second server of the same type, node 2, to the default cluster,
and get a "cannot access storage pool" message.
How can we solve this?
Juan Carlos Lin
Unisoft S.A.
+595-993-288330
---------------------------------------------------
"Before printing, remember your commitment to the environment."
"Notice: This message is directed to its addressee and contains information that may not be used by anyone other than its addressee(s). Retransmission of the content is not authorized outside the context of its delivery and addressee. Unauthorized use of the information in this message is punishable under the laws in force throughout the world. If you have received this message in error, please delete it and notify the sender as soon as possible. The content of this message is not the responsibility of the company and must always be attributed to its author. Thank you."
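A first sanity check for a "cannot access storage pool" error is to confirm node 2 can see and mount the export at all; a sketch using the hostname and path from the message above (vdsm mounts NFS with version 3, and the domain directory must be owned by uid/gid 36:36, vdsm:kvm):

```shell
# From node 2: is the export visible?
showmount -e node1

# Can it be mounted the way vdsm mounts it, and is it owned by 36:36?
mkdir -p /mnt/nfs-test
mount -t nfs -o vers=3 node1:/nfs/image /mnt/nfs-test
ls -ln /mnt/nfs-test
umount /mnt/nfs-test
```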
Hello,
I just wanted to clarify the upgrade procedure for when I am ready to go
from my 3.4 -> 3.5.
I have a single server running oVirt 3.4 (host+storage) with a
self-hosted engine setup. The hosted engine and the host are both on
CentOS 6.5.
According to the release notes for 3.5, I would install the
ovirt-release35.rpm RPM, which gives me the oVirt 3.5 repositories. Is
the upgrade path at that point then as simple as running "yum upgrade"
on both the host and the hosted-engine or do I still need to run "yum
update ovirt-engine-setup" then "engine-setup" after upgrading both?
It isn't clear (to me) if the "yum update ovirt-engine-setup" and
"engine-setup" steps on the 3.5 Release Notes are for a clean install
only, or if they still need to be run on a system being upgraded.
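For reference, the sequence in the 3.5 release notes can be sketched as below; this is a sketch of that documented flow rather than a tested procedure, so check the release notes for your exact setup (hosted-engine deployments also involve maintenance-mode steps not shown here):

```shell
# On the engine VM: enable the 3.5 repositories, upgrade the setup tool
# first, then let engine-setup perform the actual engine upgrade.
yum install ovirt-release35.rpm
yum update ovirt-engine-setup
engine-setup

# Only afterwards update the remaining packages on the engine and the host.
yum update
```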
Thanks! :-)
-Alan
Hello,
Last night I successfully migrated my VMs from KVM to a Gluster-based oVirt 3.5.
That sounded good, but the result was terrible: the GlusterFS volume (two-node
replicated) is far too slow for our VMs. It is a very big problem, so I am
looking for a solution.
I don't know why, but glusterfsd now uses 500-800% of the CPU on each server.
At the moment I have only one Gluster volume for VMs.
I am considering configuring local storage on the second host and copying all
of the VMs from GlusterFS to that volume.
I know that in this case the second host will be in a different data center.
But what will happen to my GlusterFS in that case? Will it break? Will I lose
my data? Could it cause split-brains on Gluster? Will it break the whole
oVirt portal?
If possible, I don't want any more downtime from this procedure.
Is it possible?
My config:
2 Dell R710 nodes, RAID5, 3x bonded 1 Gb/s NICs + one NIC for internet access
for the VMs. GlusterFS uses the same network as ovirtmgmt.
What can I do?
Thanks in advance
Tibor
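One commonly suggested first step for VM workloads on GlusterFS is applying the predefined `virt` option group to the volume; a sketch ("engine-vol" is a placeholder for the real volume name, and whether it reduces the glusterfsd CPU load would need measuring):

```shell
# Apply the virt profile, which sets VM-image-friendly options in one go.
gluster volume set engine-vol group virt

# Review the options now in effect on the volume.
gluster volume info engine-vol
```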
I've been doing lots of unsuccessful 3.5 hosted-engine installs in my lab,
where it's easy for me to re-install the OS if I need to start over. Now I
need to try an install in a remote datacenter where I won't be able to
re-install the OS. So I was wondering if there is a way to 'reset' a failed
install so that another install can be attempted...
My thoughts so far are:
- stop vdsm, supervdsm, and libvirt
- use etckeeper to reset everything under /etc
- delete old log files
- delete hosted_engine storage domain on storage (if install got that far)
- restart vdsm, supervdsm, and libvirt
What am I missing? Maybe some remnants in /var (hmm, probably the vdsm
persistent config)? Anything else?
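The list above might be sketched roughly as follows; the service names are the EL6-era ones, and the vdsm persistence path is an assumption worth double-checking before deleting anything:

```shell
# Stop the daemons touched by the failed deploy.
service vdsmd stop
service supervdsmd stop
service libvirtd stop

# Roll /etc back via etckeeper, then clear logs from the failed attempt.
# (cd /etc && etckeeper vcs reset --hard)   # verify against the etckeeper docs
rm -f /var/log/ovirt-hosted-engine-setup/*.log

# vdsm's persisted configuration is believed to live here (assumption):
# rm -rf /var/lib/vdsm/persistence/*

service libvirtd start
service vdsmd start
```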
Robert
--
Senior Software Engineer @ Parsons