Hi Jirka,
The patch works. It stabilized the status of my two hosts. Engine migration during
failover also works fine. Thanks, guys!
Jaicel
From: "Jiri Moskovcak" <jmoskovc(a)redhat.com>
To: "Jaicel" <jaicel(a)asti.dost.gov.ph>
Cc: "Niels de Vos" <ndevos(a)redhat.com>, "Vijay Bellur"
<vbellur(a)redhat.com>, users(a)ovirt.org, "Gluster Devel"
<gluster-devel(a)gluster.org>
Sent: Monday, November 3, 2014 3:33:16 PM
Subject: Re: [ovirt-users] Hosted-Engine HA problem
On 11/01/2014 07:43 AM, Jaicel wrote:
> Hi,
>
> my engine runs on Host1. current status and agent logs below.
>
> Host 1

Hi,
it seems like you ran into [1]. You can either zero out the metadata
file or apply the patch from [1] manually.

--Jirka

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1158925
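The thread does not say how to zero out the metadata file, so the following is only a rough sketch of one way to overwrite a file with zero bytes while keeping its size. The helper name `zero_out` is hypothetical. It is an assumption here that the ha-agent and ha-broker would need to be stopped on all hosts first, and that a plain buffered write is acceptable for the real file. The demonstration runs on a throwaway temporary file, not on `hosted-engine.metadata`.

```python
import os
import tempfile


def zero_out(path, chunk_size=1024 * 1024):
    """Overwrite an existing file with zero bytes, keeping its size (hypothetical helper)."""
    size = os.path.getsize(path)
    zeros = b"\0" * chunk_size
    with open(path, "r+b") as f:
        remaining = size
        while remaining > 0:
            n = min(chunk_size, remaining)
            f.write(zeros[:n])
            remaining -= n
        # Make sure the zeros actually reach the storage backend.
        f.flush()
        os.fsync(f.fileno())
    return size


# Demonstration on a temporary file only.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"stale metadata" * 100)
    demo_path = tmp.name

demo_size = zero_out(demo_path)
with open(demo_path, "rb") as f:
    demo_content = f.read()
os.remove(demo_path)
```

Applying the patch from [1] is the other option named above, and it is the one reported to work at the top of this thread.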
MainThread::INFO::2014-10-31 16:55:39,918::agent::52::ovirt_hosted_engine_ha.agent.agent.Agent::(run) ovirt-hosted-engine-ha agent 1.1.6 started
MainThread::INFO::2014-10-31 16:55:39,985::hosted_engine::223::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Found certificate common name: 192.168.12.11
MainThread::INFO::2014-10-31 16:55:40,228::hosted_engine::367::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Initializing ha-broker connection
MainThread::INFO::2014-10-31 16:55:40,228::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor ping, options {'addr': '192.168.12.254'}
MainThread::INFO::2014-10-31 16:55:40,231::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 140634215107920
MainThread::INFO::2014-10-31 16:55:40,231::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor mgmt-bridge, options {'use_ssl': 'true', 'bridge_name': 'ovirtmgmt', 'address': '0'}
MainThread::INFO::2014-10-31 16:55:40,237::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 140634215108432
MainThread::INFO::2014-10-31 16:55:40,237::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor mem-free, options {'use_ssl': 'true', 'address': '0'}
MainThread::INFO::2014-10-31 16:55:40,240::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 39956688
MainThread::INFO::2014-10-31 16:55:40,240::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor cpu-load-no-engine, options {'use_ssl': 'true', 'vm_uuid': '41d4aff1-54e1-4946-a812-2e656bb7d3f9', 'address': '0'}
MainThread::INFO::2014-10-31 16:55:40,243::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 140634215107664
MainThread::INFO::2014-10-31 16:55:40,244::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor engine-health, options {'use_ssl': 'true', 'vm_uuid': '41d4aff1-54e1-4946-a812-2e656bb7d3f9', 'address': '0'}
MainThread::INFO::2014-10-31 16:55:40,249::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 140634006879632
MainThread::INFO::2014-10-31 16:55:40,249::hosted_engine::391::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Broker initialized, all submonitors started
MainThread::INFO::2014-10-31 16:55:40,298::hosted_engine::476::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock) Ensuring lease for lockspace hosted-engine, host id 1 is acquired (file: /rhev/data-center/mnt/gluster1:_engine/6eb220be-daff-4785-8f78-111cc24139c4/ha_agent/hosted-engine.lockspace)
MainThread::INFO::2014-10-31 16:55:40,322::state_machine::153::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Global metadata: {'maintenance': False}
MainThread::INFO::2014-10-31 16:55:40,322::state_machine::158::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Host 192.168.12.12 (id 2): {'live-data': False, 'extra': 'metadata_parse_version=1\nmetadata_feature_version=1\ntimestamp=1413882675 (Tue Oct 21 17:11:15 2014)\nhost-id=2\nscore=2400\nmaintenance=False\nstate=EngineDown\n', 'hostname': '192.168.12.12', 'host-id': 2, 'engine-status': {'reason': 'vm not running on this host', 'health': 'bad', 'vm': 'down', 'detail': 'unknown'}, 'score': 2400, 'maintenance': False, 'host-ts': 1413882675}
MainThread::INFO::2014-10-31 16:55:40,322::state_machine::161::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Local (id 1): {'engine-health': None, 'bridge': True, 'mem-free': None, 'maintenance': False, 'cpu-load': None, 'gateway': True}
MainThread::INFO::2014-10-31 16:55:40,323::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1414745740.32 type=state_transition detail=StartState-ReinitializeFSM hostname='ovirt1'
MainThread::INFO::2014-10-31 16:55:40,392::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (StartState-ReinitializeFSM) sent? ignored
MainThread::INFO::2014-10-31 16:55:40,675::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state ReinitializeFSM (score: 0)
MainThread::INFO::2014-10-31 16:55:50,710::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1414745750.71 type=state_transition detail=ReinitializeFSM-EngineUp hostname='ovirt1'
MainThread::INFO::2014-10-31 16:55:50,710::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (ReinitializeFSM-EngineUp) sent? ignored
MainThread::INFO::2014-10-31 16:55:51,001::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineUp (score: 2400)
MainThread::CRITICAL::2014-10-31 16:56:01,033::agent::103::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Could not start ha-agent
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 97, in run
    self._run_agent()
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 154, in _run_agent
    hosted_engine.HostedEngine(self.shutdown_requested).start_monitoring()
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 307, in start_monitoring
    for old_state, state, delay in self.fsm:
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/fsm/machine.py", line 125, in next
    new_data = self.refresh(self._state.data)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/state_machine.py", line 77, in refresh
    stats.update(self.hosted_engine.collect_stats())
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 700, in collect_stats
    stats = self.process_remote_metadata(host_id, remote_data)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 747, in process_remote_metadata
    md['engine-status'] = engine_status(md["engine-status"])
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 79, in engine_status
    in json.loads(status).iteritems()])
AttributeError: 'NoneType' object has no attribute 'iteritems'
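The traceback above ends in `json.loads(status).iteritems()`. `json.loads` returns `None` only when its input is the JSON literal `null`, so the stored engine-status for the other host was presumably `null` rather than a JSON object. Below is a minimal sketch of a parser that tolerates that case. It is not the patch from the bug report: the function name `engine_status_sketch` and the fallback values are made up for illustration, and it is written for Python 3 whereas the agent in the thread runs on Python 2.6.

```python
import json


def engine_status_sketch(status):
    """Parse an engine-status JSON string, tolerating null or unparseable input."""
    try:
        parsed = json.loads(status)
    except (TypeError, ValueError):
        # None input raises TypeError; an empty or garbled string raises ValueError.
        parsed = None
    if not isinstance(parsed, dict):
        # json.loads("null") returns None, which has no .items() / .iteritems().
        # The fallback values below are hypothetical placeholders.
        return {"health": "unknown", "vm": "unknown", "detail": "unparseable"}
    return dict((str(k), str(v)) for k, v in parsed.items())


print(engine_status_sketch("null"))
print(engine_status_sketch('{"health": "good", "vm": "up", "detail": "up"}'))
```

With a guard like this, a bad metadata entry for one host would degrade to an "unknown" status instead of taking the whole agent down.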
[root@ovirt1 ~]# hosted-engine --vm-status
--== Host 1 status ==--
Status up-to-date : False
Hostname : 192.168.12.11
Host ID : 1
Engine status : unknown stale-data
Score : 2400
Local maintenance : False
Host timestamp : 1414745750
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=1414745750 (Fri Oct 31 16:55:50 2014)
host-id=1
score=2400
maintenance=False
state=EngineUp
--== Host 2 status ==--
Status up-to-date : False
Hostname : 192.168.12.12
Host ID : 2
Engine status : unknown stale-data
Score : 2400
Local maintenance : False
Host timestamp : 1414745821
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=1414745821 (Fri Oct 31 16:57:01 2014)
host-id=2
score=2400
maintenance=False
state=EngineStart
[root@ovirt1 ~]# service ovirt-ha-agent status
ovirt-ha-agent dead but subsys locked
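To make the timestamps in the output above easier to reason about, here is a small sketch that parses the `key=value` "extra" metadata block and computes how old an entry is. The input is Host 2's entry as logged by the agent on Host 1 (`'live-data': False`), and the reference time is the `notify time` from the same log. The function names are hypothetical, and this only illustrates the arithmetic, not how the agent itself decides whether data is stale.

```python
def parse_extra(extra):
    """Turn the newline-separated key=value 'extra' block into a dict of strings."""
    fields = {}
    for line in extra.splitlines():
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        fields[key.strip()] = value.strip()
    return fields


def metadata_age_seconds(host_ts, now_ts):
    """Age of a metadata entry, given two epoch timestamps in seconds."""
    return now_ts - host_ts


# Host 2's entry as seen from Host 1 at 16:55:40 (copied from the agent log).
extra = ("metadata_parse_version=1\nmetadata_feature_version=1\n"
         "timestamp=1413882675 (Tue Oct 21 17:11:15 2014)\n"
         "host-id=2\nscore=2400\nmaintenance=False\nstate=EngineDown\n")

fields = parse_extra(extra)
host_ts = int(fields["timestamp"].split()[0])
age = metadata_age_seconds(host_ts, 1414745740)  # notify time=1414745740.32
print("state=%s age=%d s (about %.1f days)" % (fields["state"], age, age / 86400.0))
```

The entry for Host 2 is roughly ten days older than the moment Host 1's agent read it.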
Host2
MainThread::INFO::2014-10-31 16:55:59,642::agent::52::ovirt_hosted_engine_ha.agent.agent.Agent::(run) ovirt-hosted-engine-ha agent 1.1.6 started
MainThread::INFO::2014-10-31 16:55:59,678::hosted_engine::223::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Found certificate common name: 192.168.12.12
MainThread::INFO::2014-10-31 16:55:59,918::hosted_engine::367::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Initializing ha-broker connection
MainThread::INFO::2014-10-31 16:55:59,919::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor ping, options {'addr': '192.168.12.254'}
MainThread::INFO::2014-10-31 16:55:59,922::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 25353488
MainThread::INFO::2014-10-31 16:55:59,922::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor mgmt-bridge, options {'use_ssl': 'true', 'bridge_name': 'ovirtmgmt', 'address': '0'}
MainThread::INFO::2014-10-31 16:55:59,928::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 25354128
MainThread::INFO::2014-10-31 16:55:59,928::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor mem-free, options {'use_ssl': 'true', 'address': '0'}
MainThread::INFO::2014-10-31 16:55:59,931::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 25353552
MainThread::INFO::2014-10-31 16:55:59,931::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor cpu-load-no-engine, options {'use_ssl': 'true', 'vm_uuid': '41d4aff1-54e1-4946-a812-2e656bb7d3f9', 'address': '0'}
MainThread::INFO::2014-10-31 16:55:59,934::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 139976608389584
MainThread::INFO::2014-10-31 16:55:59,934::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor engine-health, options {'use_ssl': 'true', 'vm_uuid': '41d4aff1-54e1-4946-a812-2e656bb7d3f9', 'address': '0'}
MainThread::INFO::2014-10-31 16:55:59,939::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 139976608447760
MainThread::INFO::2014-10-31 16:55:59,939::hosted_engine::391::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Broker initialized, all submonitors started
MainThread::INFO::2014-10-31 16:55:59,983::hosted_engine::476::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock) Ensuring lease for lockspace hosted-engine, host id 2 is acquired (file: /rhev/data-center/mnt/gluster1:_engine/6eb220be-daff-4785-8f78-111cc24139c4/ha_agent/hosted-engine.lockspace)
MainThread::INFO::2014-10-31 16:56:00,001::state_machine::153::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Global metadata: {'maintenance': False}
MainThread::INFO::2014-10-31 16:56:00,001::state_machine::158::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Host 192.168.12.11 (id 1): {'live-data': True, 'extra': 'metadata_parse_version=1\nmetadata_feature_version=1\ntimestamp=1414745750 (Fri Oct 31 16:55:50 2014)\nhost-id=1\nscore=2400\nmaintenance=False\nstate=EngineUp\n', 'hostname': '192.168.12.11', 'host-id': 1, 'engine-status': {'health': 'good', 'vm': 'up', 'detail': 'up'}, 'score': 2400, 'maintenance': False, 'host-ts': 1414745750}
MainThread::INFO::2014-10-31 16:56:00,001::state_machine::161::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Local (id 2): {'engine-health': None, 'bridge': True, 'mem-free': None, 'maintenance': False, 'cpu-load': None, 'gateway': True}
MainThread::INFO::2014-10-31 16:56:00,002::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1414745760.0 type=state_transition detail=StartState-ReinitializeFSM hostname='ovirt2'
MainThread::INFO::2014-10-31 16:56:00,045::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (StartState-ReinitializeFSM) sent? ignored
MainThread::INFO::2014-10-31 16:56:00,325::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state ReinitializeFSM (score: 0)
MainThread::INFO::2014-10-31 16:56:10,352::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1414745770.35 type=state_transition detail=ReinitializeFSM-EngineDown hostname='ovirt2'
MainThread::INFO::2014-10-31 16:56:10,353::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (ReinitializeFSM-EngineDown) sent? ignored
MainThread::INFO::2014-10-31 16:56:10,638::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineDown (score: 2400)
MainThread::INFO::2014-10-31 16:56:20,663::states::441::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) The engine is not running, but we do not have enough data to decide which hosts are alive
MainThread::INFO::2014-10-31 16:56:20,663::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1414745780.66 type=state_transition detail=EngineDown-EngineDown hostname='ovirt2'
MainThread::INFO::2014-10-31 16:56:20,664::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (EngineDown-EngineDown) sent? ignored
MainThread::INFO::2014-10-31 16:56:20,943::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineDown (score: 2400)
MainThread::INFO::2014-10-31 16:56:30,968::states::441::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) The engine is not running, but we do not have enough data to decide which hosts are alive
MainThread::INFO::2014-10-31 16:56:30,969::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1414745790.97 type=state_transition detail=EngineDown-EngineDown hostname='ovirt2'
MainThread::INFO::2014-10-31 16:56:30,969::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (EngineDown-EngineDown) sent? ignored
MainThread::INFO::2014-10-31 16:56:31,248::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineDown (score: 2400)
MainThread::INFO::2014-10-31 16:56:41,274::states::441::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) The engine is not running, but we do not have enough data to decide which hosts are alive
MainThread::INFO::2014-10-31 16:56:41,275::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1414745801.28 type=state_transition detail=EngineDown-EngineDown hostname='ovirt2'
MainThread::INFO::2014-10-31 16:56:41,276::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (EngineDown-EngineDown) sent? ignored
MainThread::INFO::2014-10-31 16:56:41,555::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineDown (score: 2400)
MainThread::INFO::2014-10-31 16:56:51,583::states::441::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) The engine is not running, but we do not have enough data to decide which hosts are alive
MainThread::INFO::2014-10-31 16:56:51,584::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1414745811.58 type=state_transition detail=EngineDown-EngineDown hostname='ovirt2'
MainThread::INFO::2014-10-31 16:56:51,584::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (EngineDown-EngineDown) sent? ignored
MainThread::INFO::2014-10-31 16:56:51,864::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineDown (score: 2400)
MainThread::INFO::2014-10-31 16:57:01,897::states::454::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Engine down and local host has best score (2400), attempting to start engine VM
MainThread::INFO::2014-10-31 16:57:01,898::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1414745821.9 type=state_transition detail=EngineDown-EngineStart hostname='ovirt2'
MainThread::INFO::2014-10-31 16:57:01,906::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (EngineDown-EngineStart) sent? ignored
MainThread::INFO::2014-10-31 16:57:02,189::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineStart (score: 2400)
MainThread::CRITICAL::2014-10-31 16:57:02,207::agent::103::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Could not start ha-agent
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 97, in run
    self._run_agent()
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 154, in _run_agent
    hosted_engine.HostedEngine(self.shutdown_requested).start_monitoring()
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 307, in start_monitoring
    for old_state, state, delay in self.fsm:
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/fsm/machine.py", line 125, in next
    new_data = self.refresh(self._state.data)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/state_machine.py", line 77, in refresh
    stats.update(self.hosted_engine.collect_stats())
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 662, in collect_stats
    constants.SERVICE_TYPE)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 171, in get_stats_from_storage
    result = self._checked_communicate(request)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 199, in _checked_communicate
    .format(message or response))
RequestError: Request failed: <type 'exceptions.OSError'>
[root@ovirt2 ~]# hosted-engine --vm-status
Traceback (most recent call last):
  File "/usr/lib64/python2.6/runpy.py", line 122, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib64/python2.6/runpy.py", line 34, in _run_code
    exec code in run_globals
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 111, in <module>
    if not status_checker.print_status():
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 58, in print_status
    all_host_stats = ha_cli.get_all_host_stats()
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py", line 137, in get_all_host_stats
    return self.get_all_stats(self.StatModes.HOST)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py", line 86, in get_all_stats
    constants.SERVICE_TYPE)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 171, in get_stats_from_storage
    result = self._checked_communicate(request)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 199, in _checked_communicate
    .format(message or response))
ovirt_hosted_engine_ha.lib.exceptions.RequestError: Request failed: <type 'exceptions.OSError'>
[root@ovirt2 ~]# service ovirt-ha-agent status
ovirt-ha-agent dead but subsys locked
Thanks,
Jaicel
----- Original Message -----
From: "Jiri Moskovcak" <jmoskovc(a)redhat.com>
To: "Jaicel" <jaicel(a)asti.dost.gov.ph>
Cc: "Niels de Vos" <ndevos(a)redhat.com>, "Vijay Bellur"
<vbellur(a)redhat.com>, users(a)ovirt.org, "Gluster Devel"
<gluster-devel(a)gluster.org>
Sent: Friday, October 31, 2014 11:05:32 PM
Subject: Re: [ovirt-users] Hosted-Engine HA problem
On 10/31/2014 10:26 AM, Jaicel wrote:
> I've increased the limit and then restarted the agent and broker. The status normalized,
> but it has now gone to the "False" state again, with both hosts still at a score of
> 2400. The agent logs remain the same, with the "ovirt-ha-agent dead but subsys locked"
> status. ha-broker logs below.
>
> Thread-138::INFO::2014-10-31 17:24:22,981::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
> Thread-138::INFO::2014-10-31 17:24:22,991::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
> Thread-139::INFO::2014-10-31 17:24:38,385::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
> Thread-139::INFO::2014-10-31 17:24:38,395::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
> Thread-140::INFO::2014-10-31 17:24:53,816::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
> Thread-140::INFO::2014-10-31 17:24:53,827::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
> Thread-141::INFO::2014-10-31 17:25:09,172::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
> Thread-141::INFO::2014-10-31 17:25:09,182::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
> Thread-142::INFO::2014-10-31 17:25:24,551::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
> Thread-142::INFO::2014-10-31 17:25:24,562::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
>
> Thanks,
> Jaicel
OK, now it seems that the broker runs fine, so I need the recent agent.log
to debug this further.
--Jirka
>
> ----- Original Message -----
> From: "Jiri Moskovcak" <jmoskovc(a)redhat.com>
> To: "Jaicel R. Sabonsolin" <jaicel(a)asti.dost.gov.ph>, "Niels de Vos" <ndevos(a)redhat.com>
> Cc: "Vijay Bellur" <vbellur(a)redhat.com>, users(a)ovirt.org, "Gluster Devel" <gluster-devel(a)gluster.org>
> Sent: Friday, October 31, 2014 4:32:02 PM
> Subject: Re: [ovirt-users] Hosted-Engine HA problem
>
> On 10/31/2014 03:53 AM, Jaicel R. Sabonsolin wrote:
>> Hi guys,
>>
>> These logs appear on both hosts, just like the result of --vm-status. I tried running
>> tcpdump on the ovirt hosts and gluster nodes, but only packets exchanged with my
>> monitoring VM (zabbix) appeared.
>>
>> agent.log
>>     new_data = self.refresh(self._state.data)
>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/state_machine.py", line 77, in refresh
>>     stats.update(self.hosted_engine.collect_stats())
>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 662, in collect_stats
>>     constants.SERVICE_TYPE)
>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 171, in get_stats_from_storage
>>     result = self._checked_communicate(request)
>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 199, in _checked_communicate
>>     .format(message or response))
>> RequestError: Request failed: <type 'exceptions.OSError'>
>>
>> broker.log
>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 165, in handle
>>     response = "success " + self._dispatch(data)
>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 261, in _dispatch
>>     .get_all_stats_for_service_type(**options)
>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 41, in get_all_stats_for_service_type
>>     d = self.get_raw_stats_for_service_type(storage_dir, service_type)
>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 74, in get_raw_stats_for_service_type
>>     f = os.open(path, direct_flag | os.O_RDONLY)
>> OSError: [Errno 24] Too many open files: '/rhev/data-center/mnt/gluster1:_engine/6eb220be-daff-4785-8f78-111cc24139c4/ha_agent/hosted-engine.metadata'
>
> - ah, there we go ^^^^^^ you might need to tweak the limit of allowed
> open files as described here [1], or find out which app keeps so many files open
>
>
> --Jirka
>
> [1] http://www.cyberciti.biz/faq/linux-increase-the-maximum-number-of-open-fi...
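For the "Too many open files" error quoted above (errno 24, `EMFILE`), a quick way to see where a process stands is to compare its descriptor limit with the number of descriptors it currently holds. The sketch below uses only the Python standard library. The helper names are hypothetical, the `/proc/<pid>/fd` listing is Linux-specific, and it is written for Python 3 rather than the Python 2.6 used by the broker in this thread.

```python
import errno
import os
import resource


def fd_limits():
    """Return the (soft, hard) limit on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def open_fd_count(pid="self"):
    """Count open descriptors via /proc (Linux only); return None if unavailable."""
    try:
        return len(os.listdir("/proc/%s/fd" % pid))
    except OSError:
        return None


def is_too_many_open_files(exc):
    """True if exc is the per-process descriptor-limit error (EMFILE)."""
    return isinstance(exc, OSError) and exc.errno == errno.EMFILE


soft, hard = fd_limits()
print("soft limit: %s, hard limit: %s, open now: %s" % (soft, hard, open_fd_count()))
```

Passing the broker's PID to `open_fd_count` (as root) and watching the count over time would show whether descriptors keep growing toward the soft limit. In that case raising the limit only delays the failure, which fits the status going back to "False" after the limit was increased.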
>
>> Thread-38160::INFO::2014-10-31 10:28:37,989::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
>> Thread-38161::INFO::2014-10-31 10:28:53,656::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
>> Thread-38161::ERROR::2014-10-31 10:28:53,657::listener::190::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Error handling request, data: 'get-stats storage_dir=/rhev/data-center/mnt/gluster1:_engine/6eb220be-daff-4785-8f78-111cc24139c4/ha_agent service_type=hosted-engine'
>> Traceback (most recent call last):
>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 165, in handle
>>     response = "success " + self._dispatch(data)
>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 261, in _dispatch
>>     .get_all_stats_for_service_type(**options)
>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 41, in get_all_stats_for_service_type
>>     d = self.get_raw_stats_for_service_type(storage_dir, service_type)
>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 74, in get_raw_stats_for_service_type
>>     f = os.open(path, direct_flag | os.O_RDONLY)
>> OSError: [Errno 24] Too many open files: '/rhev/data-center/mnt/gluster1:_engine/6eb220be-daff-4785-8f78-111cc24139c4/ha_agent/hosted-engine.metadata'
>> Thread-38161::INFO::2014-10-31 10:28:53,658::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
>>
>> Thanks,
>> Jaicel
>>
>> ----- Original Message -----
>> From: "Niels de Vos" <ndevos(a)redhat.com>
>> To: "Vijay Bellur" <vbellur(a)redhat.com>
>> Cc: "Jiri Moskovcak" <jmoskovc(a)redhat.com>, "Jaicel R. Sabonsolin" <jaicel(a)asti.dost.gov.ph>, users(a)ovirt.org, "Gluster Devel" <gluster-devel(a)gluster.org>
>> Sent: Friday, October 31, 2014 4:11:25 AM
>> Subject: Re: [ovirt-users] Hosted-Engine HA problem
>>
>> On Thu, Oct 30, 2014 at 09:07:24PM +0530, Vijay Bellur wrote:
>>> On 10/30/2014 06:45 PM, Jiri Moskovcak wrote:
>>>> On 10/30/2014 09:22 AM, Jaicel R. Sabonsolin wrote:
>>>>> Hi Guys,
>>>>>
>>>>> I need help with my oVirt Hosted-Engine HA setup. I am running 2
>>>>> oVirt hosts and 2 gluster nodes with replicated volumes. I already have
>>>>> VMs running on my hosts, and they migrate normally when I, for example,
>>>>> power off the host they are running on. The problem is that the
>>>>> engine can't migrate once I switch off the host that hosts the engine.
>>>>> oVirt 3.4.3-1.el6
>>>>> KVM 0.12.1.2 - 2.415.el6_5.10
>>>>> LIBVIRT libvirt-0.10.2-29.el6_5.9
>>>>> VDSM vdsm-4.14.17-0.el6
>>>>>
>>>>>
>>>>> right now, i have this result from hosted-engine --vm-status.
>>>>>
>>>>>   File "/usr/lib64/python2.6/runpy.py", line 122, in _run_module_as_main
>>>>>     "__main__", fname, loader, pkg_name)
>>>>>   File "/usr/lib64/python2.6/runpy.py", line 34, in _run_code
>>>>>     exec code in run_globals
>>>>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 111, in <module>
>>>>>     if not status_checker.print_status():
>>>>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 58, in print_status
>>>>>     all_host_stats = ha_cli.get_all_host_stats()
>>>>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py", line 137, in get_all_host_stats
>>>>>     return self.get_all_stats(self.StatModes.HOST)
>>>>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py", line 86, in get_all_stats
>>>>>     constants.SERVICE_TYPE)
>>>>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 171, in get_stats_from_storage
>>>>>     result = self._checked_communicate(request)
>>>>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 199, in _checked_communicate
>>>>>     .format(message or response))
>>>>> ovirt_hosted_engine_ha.lib.exceptions.RequestError: Request failed: <type 'exceptions.OSError'>
>>>>>
>>>>>
>>>>> Restarting ha-broker and ha-agent normalizes the status, but eventually
>>>>> it becomes "false" and then returns to the result above. I hope you
>>>>> guys can help me with this.
>>>>>
>>>>
>>>> Hi Jaicel,
>>>> please attach agent.log and broker.log from the host where you are trying to
>>>> run hosted-engine --vm-status. I have a feeling that you ran into a
>>>> known problem on gluster: a stalled file descriptor. In that case, the
>>>> only known solution at this time is to restart the broker & agent, as you
>>>> have already found out.
>>>>
>>>
>>> Adding Niels and gluster-devel to troubleshoot from Gluster NFS perspective.
>>
>> I'd welcome any details on this "stalled file descriptor" problem. Is
>> there a bug filed with some details like logs, sysrq-t and maybe even
>> tcpdumps? If there is an easy way to reproduce this behaviour, I can
>> surely look into it and hopefully come up with some advice or a fix.
>>
>> Thanks,
>> Niels
>>
------=_Part_891997_988106587.1415681763753--