
Seems we should consider not adding the host if already there. Please open a bug.
Though I really hope in 3.6 to see this done from the GUI.

On Sep 24, 2014 4:23 PM, Stefan Wendler <stefan.wendler@tngtech.com> wrote:
Okay, I'm truncating the previous mails here....

David's hint was the solution. I had already added the ovirt hosts to the cluster and then tried to do the hosted-engine-ha setup on them.

After removing the hosts from the cluster and putting the data domain into maintenance mode, I was able to deploy on all other nodes. I now have an HA'd hosted engine, which can also be migrated \o/

Maybe that is something that could be stated more clearly in the documentation?

Unfortunately, I now have a new problem. The agents crash shortly after startup. The error is the following (/var/log/ovirt-hosted-engine-ha/agent.log):

AttributeError: 'NoneType' object has no attribute 'iteritems'

The whole output follows. The agents were started and I tried a migration of the hosted engine from ovirt host 1 to host 2, which succeeded. But the agents crashed afterwards:

MainThread::INFO::2014-09-24 15:09:24,839::agent::52::ovirt_hosted_engine_ha.agent.agent.Agent::(run) ovirt-hosted-engine-ha agent 1.1.5 started
MainThread::INFO::2014-09-24 15:09:24,871::hosted_engine::223::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Found certificate common name: 10.8.2.101
MainThread::INFO::2014-09-24 15:09:25,081::hosted_engine::367::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Initializing ha-broker connection
MainThread::INFO::2014-09-24 15:09:25,082::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor ping, options {'addr': '10.8.2.1'}
MainThread::INFO::2014-09-24 15:09:25,083::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 25293072
MainThread::INFO::2014-09-24 15:09:25,083::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor mgmt-bridge, options {'use_ssl': 'true', 'bridge_name': 'ovirtmgmt', 'address': '0'}
MainThread::INFO::2014-09-24 15:09:25,086::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 25294160
MainThread::INFO::2014-09-24 15:09:25,086::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor mem-free, options {'use_ssl': 'true', 'address': '0'}
MainThread::INFO::2014-09-24 15:09:25,088::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 25293968
MainThread::INFO::2014-09-24 15:09:25,088::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor cpu-load-no-engine, options {'use_ssl': 'true', 'vm_uuid': 'e1ca293f-09e0-4d2e-8915-221839af1489', 'address': '0'}
MainThread::INFO::2014-09-24 15:09:25,089::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 25360400
MainThread::INFO::2014-09-24 15:09:25,089::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor engine-health, options {'use_ssl': 'true', 'vm_uuid': 'e1ca293f-09e0-4d2e-8915-221839af1489', 'address': '0'}
MainThread::INFO::2014-09-24 15:09:25,091::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 25509776
MainThread::INFO::2014-09-24 15:09:25,091::hosted_engine::391::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Broker initialized, all submonitors started
MainThread::INFO::2014-09-24 15:09:25,125::hosted_engine::476::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock) Ensuring lease for lockspace hosted-engine, host id 2 is acquired (file: /rhev/data-center/mnt/10.8.2.12:_volume1_engine-store/e313da39-594c-46b5-95c9-c445889c745c/ha_agent/hosted-engine.lockspace)
MainThread::INFO::2014-09-24 15:09:25,134::state_machine::153::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Global metadata: {'maintenance': False}
MainThread::INFO::2014-09-24 15:09:25,134::state_machine::158::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Host 10.8.2.100 (id 1): {'live-data': True, 'extra': 'metadata_parse_version=1\nmetadata_feature_version=1\ntimestamp=1411564164 (Wed Sep 24 15:09:24 2014)\nhost-id=1\nscore=2400\nmaintenance=False\nstate=EngineUp\n', 'hostname': '10.8.2.100', 'host-id': 1, 'engine-status': {'health': 'good', 'vm': 'up', 'detail': 'up'}, 'score': 2400, 'maintenance': False, 'host-ts': 1411564164}
MainThread::INFO::2014-09-24 15:09:25,134::state_machine::158::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Host 10.8.2.102 (id 3): {'live-data': False, 'extra': 'metadata_parse_version=1\nmetadata_feature_version=1\ntimestamp=1411562496 (Wed Sep 24 14:41:36 2014)\nhost-id=3\nscore=0\nmaintenance=False\nstate=EngineUnexpectedlyDown\ntimeout=Wed Sep 24 14:50:24 2014\n', 'hostname': '10.8.2.102', 'host-id': 3, 'engine-status': {'reason': 'vm not running on this host', 'health': 'bad', 'vm': 'down', 'detail': 'unknown'}, 'score': 0, 'maintenance': False, 'host-ts': 1411562496}
MainThread::INFO::2014-09-24 15:09:25,134::state_machine::161::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Local (id 2): {'engine-health': None, 'bridge': True, 'mem-free': None, 'maintenance': False, 'cpu-load': None, 'gateway': True}
MainThread::INFO::2014-09-24 15:09:25,135::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1411564165.14 type=state_transition detail=StartState-ReinitializeFSM hostname='ovirt-node-mapconv2.int.tngtech.com'
MainThread::INFO::2014-09-24 15:09:25,170::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (StartState-ReinitializeFSM) sent? sent
MainThread::INFO::2014-09-24 15:09:25,383::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state ReinitializeFSM (score: 0)
MainThread::INFO::2014-09-24 15:09:35,409::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1411564175.41 type=state_transition detail=ReinitializeFSM-EngineDown hostname='ovirt-node-mapconv2.int.tngtech.com'
MainThread::INFO::2014-09-24 15:09:35,410::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (ReinitializeFSM-EngineDown) sent? ignored
MainThread::INFO::2014-09-24 15:09:35,627::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineDown (score: 2400)
MainThread::INFO::2014-09-24 15:09:45,652::states::441::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) The engine is not running, but we do not have enough data to decide which hosts are alive
MainThread::INFO::2014-09-24 15:09:45,653::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1411564185.65 type=state_transition detail=EngineDown-EngineDown hostname='ovirt-node-mapconv2.int.tngtech.com'
MainThread::INFO::2014-09-24 15:09:45,653::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (EngineDown-EngineDown) sent? ignored
MainThread::INFO::2014-09-24 15:09:45,875::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineDown (score: 2400)
MainThread::CRITICAL::2014-09-24 15:09:55,899::agent::103::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Could not start ha-agent
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 97, in run
    self._run_agent()
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 154, in _run_agent
    hosted_engine.HostedEngine(self.shutdown_requested).start_monitoring()
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 307, in start_monitoring
    for old_state, state, delay in self.fsm:
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/fsm/machine.py", line 125, in next
    new_data = self.refresh(self._state.data)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/state_machine.py", line 77, in refresh
    stats.update(self.hosted_engine.collect_stats())
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 700, in collect_stats
    stats = self.process_remote_metadata(host_id, remote_data)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 747, in process_remote_metadata
    md['engine-status'] = engine_status(md["engine-status"])
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 79, in engine_status
    in json.loads(status).iteritems()])
AttributeError: 'NoneType' object has no attribute 'iteritems'
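
For what it's worth, the last frame of the traceback calls json.loads(status).iteritems(), and json.loads() returns None when it is handed the JSON literal "null", which would produce exactly this AttributeError. Below is a minimal sketch of a defensive decode, not the actual ovirt-hosted-engine-ha code or its fix. The name parse_engine_status is hypothetical, and it is written for Python 3 (items() instead of the iteritems() used on python2.6); whether the stored engine-status really was "null" here is an assumption.

```python
import json


def parse_engine_status(raw):
    """Hypothetical helper: decode an engine-status field, or return None.

    Illustration only; this is not the ovirt-hosted-engine-ha implementation.
    """
    try:
        decoded = json.loads(raw)
    except (TypeError, ValueError):
        # raw was not a JSON string at all (None, a dict, or malformed text).
        return None
    if not isinstance(decoded, dict):
        # json.loads("null") yields None, which has no items()/iteritems().
        return None
    return dict((str(key), str(value)) for key, value in decoded.items())
```

A caller could then treat a None result as "status unknown" for that host instead of letting the exception stop the agent.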