Re: [ovirt-users] hosted-engine unknown stale-data

Explored logs on both hosts. broker.log shows no errors.

agent.log is not looking good. On host1 (which is running the hosted engine):

MainThread::ERROR::2018-01-12 21:51:03,883::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 191, in _run_agent
    return action(he)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 64, in action_proper
    return he.start_monitoring()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 411, in start_monitoring
    self._initialize_sanlock()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 749, in _initialize_sanlock
    "Failed to initialize sanlock, the number of errors has"
SanlockInitializationError: Failed to initialize sanlock, the number of errors has exceeded the limit
MainThread::ERROR::2018-01-12 21:51:03,884::agent::206::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent
MainThread::WARNING::2018-01-12 21:51:08,889::agent::209::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Restarting agent, attempt '1'
MainThread::INFO::2018-01-12 21:51:08,919::hosted_engine::242::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Found certificate common name: ovirt1.telia.ru
MainThread::INFO::2018-01-12 21:51:08,921::hosted_engine::604::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm) Initializing VDSM
MainThread::INFO::2018-01-12 21:51:11,398::hosted_engine::630::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Connecting the storage
MainThread::INFO::2018-01-12 21:51:11,399::storage_server::220::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(validate_storage_server) Validating storage server
MainThread::INFO::2018-01-12 21:51:13,725::storage_server::239::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2018-01-12 21:51:18,390::storage_server::246::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2018-01-12 21:51:18,423::storage_server::253::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Refreshing the storage domain
MainThread::INFO::2018-01-12 21:51:18,689::hosted_engine::663::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Preparing images
MainThread::INFO::2018-01-12 21:51:18,690::image::126::ovirt_hosted_engine_ha.lib.image.Image::(prepare_images) Preparing images
MainThread::INFO::2018-01-12 21:51:21,895::hosted_engine::666::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Refreshing vm.conf
MainThread::INFO::2018-01-12 21:51:21,895::config::493::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_vm_conf) Reloading vm.conf from the shared storage domain
MainThread::INFO::2018-01-12 21:51:21,896::config::416::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) Trying to get a fresher copy of vm configuration from the OVF_STORE
MainThread::INFO::2018-01-12 21:51:21,896::ovf_store::132::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) Extracting Engine VM OVF from the OVF_STORE
MainThread::INFO::2018-01-12 21:51:21,897::ovf_store::134::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) OVF_STORE volume path: /var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-498fa4b88162/5cabd8e1-5f4b-469e-becc-227469e03f5c/8048cbd7-77e2-4805-9af4-d109fa36dfcf
MainThread::INFO::2018-01-12 21:51:21,915::config::435::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) Found an OVF for HE VM, trying to convert
MainThread::INFO::2018-01-12 21:51:21,918::config::440::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) Got vm.conf from OVF_STORE
MainThread::INFO::2018-01-12 21:51:21,919::hosted_engine::509::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Initializing ha-broker connection
MainThread::INFO::2018-01-12 21:51:21,919::brokerlink::130::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor ping, options {'addr': '80.239.162.97'}
MainThread::INFO::2018-01-12 21:51:21,922::brokerlink::141::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 140547104457680
MainThread::INFO::2018-01-12 21:51:21,922::brokerlink::130::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor mgmt-bridge, options {'use_ssl': 'true', 'bridge_name': 'ovirtmgmt', 'address': '0'}
MainThread::INFO::2018-01-12 21:51:21,936::brokerlink::141::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 140547104458064
MainThread::INFO::2018-01-12 21:51:21,936::brokerlink::130::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor mem-free, options {'use_ssl': 'true', 'address': '0'}
MainThread::INFO::2018-01-12 21:51:21,938::brokerlink::141::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 140547104458448
MainThread::INFO::2018-01-12 21:51:21,939::brokerlink::130::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor cpu-load-no-engine, options {'use_ssl': 'true', 'vm_uuid': 'b366e466-b0ea-4a09-866b-d0248d7523a6', 'address': '0'}
MainThread::INFO::2018-01-12 21:51:21,940::brokerlink::141::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 140547104457552
MainThread::INFO::2018-01-12 21:51:21,941::brokerlink::130::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor engine-health, options {'use_ssl': 'true', 'vm_uuid': 'b366e466-b0ea-4a09-866b-d0248d7523a6', 'address': '0'}
MainThread::INFO::2018-01-12 21:51:21,942::brokerlink::141::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 140547104459792
MainThread::INFO::2018-01-12 21:51:26,951::brokerlink::179::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(set_storage_domain) Success, id 140546772847056
MainThread::INFO::2018-01-12 21:51:26,952::hosted_engine::601::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Broker initialized, all submonitors started
MainThread::INFO::2018-01-12 21:51:27,049::hosted_engine::704::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock) Ensuring lease for lockspace hosted-engine, host id 1 is acquired (file: /var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-498fa4b88162/093faa75-5e33-4559-84fa-1f1f8d48153b/911c7637-b49d-463e-b186-23b404e50769)
MainThread::INFO::2018-01-12 21:53:48,067::hosted_engine::745::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock) Failed to acquire the lock. Waiting '5's before the next attempt
MainThread::INFO::2018-01-12 21:56:14,088::hosted_engine::745::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock) Failed to acquire the lock. Waiting '5's before the next attempt
MainThread::INFO::2018-01-12 21:58:40,111::hosted_engine::745::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock) Failed to acquire the lock. Waiting '5's before the next attempt
MainThread::INFO::2018-01-12 22:01:06,133::hosted_engine::745::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock) Failed to acquire the lock. Waiting '5's before the next attempt

agent.log from the second host:

MainThread::INFO::2018-01-12 22:01:37,241::hosted_engine::630::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Connecting the storage
MainThread::INFO::2018-01-12 22:01:37,242::storage_server::220::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(validate_storage_server) Validating storage server
MainThread::INFO::2018-01-12 22:01:39,540::hosted_engine::639::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Storage domain reported as valid and reconnect is not forced.
MainThread::INFO::2018-01-12 22:01:41,939::hosted_engine::453::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineUnexpectedlyDown (score: 0)
MainThread::INFO::2018-01-12 22:01:52,150::config::493::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_vm_conf) Reloading vm.conf from the shared storage domain
MainThread::INFO::2018-01-12 22:01:52,150::config::416::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) Trying to get a fresher copy of vm configuration from the OVF_STORE
MainThread::INFO::2018-01-12 22:01:52,151::ovf_store::132::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) Extracting Engine VM OVF from the OVF_STORE
MainThread::INFO::2018-01-12 22:01:52,153::ovf_store::134::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) OVF_STORE volume path: /var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-498fa4b88162/5cabd8e1-5f4b-469e-becc-227469e03f5c/8048cbd7-77e2-4805-9af4-d109fa36dfcf
MainThread::INFO::2018-01-12 22:01:52,174::config::435::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) Found an OVF for HE VM, trying to convert
MainThread::INFO::2018-01-12 22:01:52,179::config::440::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) Got vm.conf from OVF_STORE
MainThread::INFO::2018-01-12 22:01:52,189::hosted_engine::604::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm) Initializing VDSM
MainThread::INFO::2018-01-12 22:01:54,586::hosted_engine::630::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Connecting the storage
MainThread::INFO::2018-01-12 22:01:54,587::storage_server::220::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(validate_storage_server) Validating storage server
MainThread::INFO::2018-01-12 22:01:56,903::hosted_engine::639::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Storage domain reported as valid and reconnect is not forced.
MainThread::INFO::2018-01-12 22:01:59,299::states::682::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score) Score is 0 due to unexpected vm shutdown at Fri Jan 12 21:57:48 2018
MainThread::INFO::2018-01-12 22:01:59,299::hosted_engine::453::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineUnexpectedlyDown (score: 0)
MainThread::INFO::2018-01-12 22:02:09,659::config::493::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_vm_conf) Reloading vm.conf from the shared storage domain
MainThread::INFO::2018-01-12 22:02:09,659::config::416::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) Trying to get a fresher copy of vm configuration from the OVF_STORE
MainThread::INFO::2018-01-12 22:02:09,660::ovf_store::132::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) Extracting Engine VM OVF from the OVF_STORE
MainThread::INFO::2018-01-12 22:02:09,663::ovf_store::134::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) OVF_STORE volume path: /var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-498fa4b88162/5cabd8e1-5f4b-469e-becc-227469e03f5c/8048cbd7-77e2-4805-9af4-d109fa36dfcf
MainThread::INFO::2018-01-12 22:02:09,683::config::435::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) Found an OVF for HE VM, trying to convert
MainThread::INFO::2018-01-12 22:02:09,688::config::440::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) Got vm.conf from OVF_STORE
MainThread::INFO::2018-01-12 22:02:09,698::hosted_engine::604::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm) Initializing VDSM
MainThread::INFO::2018-01-12 22:02:12,112::hosted_engine::630::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Connecting the storage
MainThread::INFO::2018-01-12 22:02:12,113::storage_server::220::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(validate_storage_server) Validating storage server
MainThread::INFO::2018-01-12 22:02:14,444::hosted_engine::639::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Storage domain reported as valid and reconnect is not forced.
MainThread::INFO::2018-01-12 22:02:16,859::states::682::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score) Score is 0 due to unexpected vm shutdown at Fri Jan 12 21:57:47 2018
MainThread::INFO::2018-01-12 22:02:16,859::hosted_engine::453::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineUnexpectedlyDown (score: 0)
MainThread::INFO::2018-01-12 22:02:27,100::config::493::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_vm_conf) Reloading vm.conf from the shared storage domain
MainThread::INFO::2018-01-12 22:02:27,100::config::416::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) Trying to get a fresher copy of vm configuration from the OVF_STORE
MainThread::INFO::2018-01-12 22:02:27,101::ovf_store::132::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) Extracting Engine VM OVF from the OVF_STORE
MainThread::INFO::2018-01-12 22:02:27,103::ovf_store::134::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) OVF_STORE volume path: /var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-498fa4b88162/5cabd8e1-5f4b-469e-becc-227469e03f5c/8048cbd7-77e2-4805-9af4-d109fa36dfcf
MainThread::INFO::2018-01-12 22:02:27,125::config::435::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) Found an OVF for HE VM, trying to convert
MainThread::INFO::2018-01-12 22:02:27,129::config::440::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) Got vm.conf from OVF_STORE
MainThread::INFO::2018-01-12 22:02:27,130::states::667::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Engine down, local host does not have best score
MainThread::INFO::2018-01-12 22:02:27,139::hosted_engine::604::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm) Initializing VDSM
MainThread::INFO::2018-01-12 22:02:29,584::hosted_engine::630::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Connecting the storage
MainThread::INFO::2018-01-12 22:02:29,586::storage_server::220::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(validate_storage_server) Validating storage server

Any suggestions on how to resolve this?

Regards,
Artem
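The agent gives up after sanlock repeatedly fails to acquire the host lease, so the sanlock side is worth inspecting directly. A minimal diagnostic sketch, assuming the standard sanlock CLI on the host (the lease path is the one from the agent.log above):

    # is the daemon healthy, and which lockspaces does it currently hold?
    systemctl status sanlock
    sanlock client status

    # recent sanlock daemon log; storage i/o timeouts usually show up here
    sanlock client log_dump | tail -n 50

    # dump the on-disk lease the agent is trying to acquire (path from agent.log)
    sanlock direct dump /var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-498fa4b88162/093faa75-5e33-4559-84fa-1f1f8d48153b/911c7637-b49d-463e-b186-23b404e50769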
On Fri, Jan 12, 2018 at 7:08 PM, Artem Tambovskiy <artem.tambovskiy@gmail.com> wrote:

Trying to fix one thing I broke another :(
I fixed mnt_options for the hosted engine storage domain and installed the latest security patches on my hosts and the hosted engine. All VMs are up and running, but hosted-engine --vm-status reports issues:
[root@ovirt1 ~]# hosted-engine --vm-status
--== Host 1 status ==--
conf_on_shared_storage : True
Status up-to-date      : False
Hostname               : ovirt2
Host ID                : 1
Engine status          : unknown stale-data
Score                  : 0
stopped                : False
Local maintenance      : False
crc32                  : 193164b8
local_conf_timestamp   : 8350
Host timestamp         : 8350
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=8350 (Fri Jan 12 19:03:54 2018)
    host-id=1
    score=0
    vm_conf_refresh_time=8350 (Fri Jan 12 19:03:54 2018)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineUnexpectedlyDown
    stopped=False
    timeout=Thu Jan 1 05:24:43 1970
--== Host 2 status ==--
conf_on_shared_storage : True
Status up-to-date      : False
Hostname               : ovirt1.telia.ru
Host ID                : 2
Engine status          : unknown stale-data
Score                  : 0
stopped                : True
Local maintenance      : False
crc32                  : c7037c03
local_conf_timestamp   : 7530
Host timestamp         : 7530
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=7530 (Fri Jan 12 16:10:12 2018)
    host-id=2
    score=0
    vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)
    conf_on_shared_storage=True
    maintenance=False
    state=AgentStopped
    stopped=True

[root@ovirt1 ~]#
From the second host the situation looks a bit different:
[root@ovirt2 ~]# hosted-engine --vm-status
--== Host 1 status ==--
conf_on_shared_storage : True
Status up-to-date      : True
Hostname               : ovirt2
Host ID                : 1
Engine status          : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                  : 0
stopped                : False
Local maintenance      : False
crc32                  : 78eabdb6
local_conf_timestamp   : 8403
Host timestamp         : 8402
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=8402 (Fri Jan 12 19:04:47 2018)
    host-id=1
    score=0
    vm_conf_refresh_time=8403 (Fri Jan 12 19:04:47 2018)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineUnexpectedlyDown
    stopped=False
    timeout=Thu Jan 1 05:24:43 1970
--== Host 2 status ==--
conf_on_shared_storage : True
Status up-to-date      : False
Hostname               : ovirt1.telia.ru
Host ID                : 2
Engine status          : unknown stale-data
Score                  : 0
stopped                : True
Local maintenance      : False
crc32                  : c7037c03
local_conf_timestamp   : 7530
Host timestamp         : 7530
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=7530 (Fri Jan 12 16:10:12 2018)
    host-id=2
    score=0
    vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)
    conf_on_shared_storage=True
    maintenance=False
    state=AgentStopped
    stopped=True
The WebGUI shows the engine running on host ovirt1. Gluster looks fine:

[root@ovirt1 ~]# gluster volume status engine
Status of volume: engine
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick ovirt1.telia.ru:/oVirt/engine         49169     0          Y       3244
Brick ovirt2.telia.ru:/oVirt/engine         49179     0          Y       20372
Brick ovirt3.telia.ru:/oVirt/engine         49206     0          Y       16609
Self-heal Daemon on localhost               N/A       N/A        Y       117868
Self-heal Daemon on ovirt2.telia.ru         N/A       N/A        Y       20521
Self-heal Daemon on ovirt3                  N/A       N/A        Y       25093

Task Status of Volume engine
------------------------------------------------------------------------------
There are no active volume tasks
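Bricks being online does not rule out pending heals on the engine volume, which can also keep sanlock from acquiring its lease; it may be worth checking the heal state too (a sketch, using the usual gluster heal commands):

    gluster volume heal engine info
    # any entries listed here need manual resolution
    gluster volume heal engine info split-brain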
How to resolve this issue?
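One commonly suggested way to clear 'unknown stale-data' left behind by a stopped agent is to wipe that host's slot in the shared HE metadata; a sketch, assuming the --clean-metadata option of hosted-engine available in oVirt 4.x (the agent on the affected host must be stopped first):

    # on the host whose slot is stale (Host ID 2 here)
    systemctl stop ovirt-ha-agent
    hosted-engine --clean-metadata --host-id=2 --force-clean

    # then bring the agent back and re-check
    systemctl start ovirt-ha-agent
    hosted-engine --vm-status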

Hello Artem,

Can you check if the glusterd service is running on host1 and all the peers are in the connected state? If yes, can you restart the ovirt-ha-agent and broker services and check if things are working fine?

Thanks,
kasturi
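For reference, those checks and restarts come down to something like this on host1 (a sketch using the standard systemd and gluster CLIs, not an exact transcript):

    # glusterd up, and every peer should report "State: Peer in Cluster (Connected)"
    systemctl status glusterd
    gluster peer status

    # restart the HA services, then watch whether sanlock initialization succeeds this time
    systemctl restart ovirt-ha-broker ovirt-ha-agent
    journalctl -u ovirt-ha-agent -f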

Hello,

I have uploaded 2 archives with all the relevant logs to shared hosting:

files from host 1 (which is currently running all VMs, including the hosted engine) - https://yadi.sk/d/PttRoYV63RTvhK
files from the second host - https://yadi.sk/d/UBducEsV3RTvhc

I have tried to restart both ovirt-ha-agent and ovirt-ha-broker, but it had no effect. I have also tried to shut down the hosted engine VM, stop the ovirt-ha-agent and ovirt-ha-broker services, disconnect the storage and connect it again - no effect either.

I also tried to reinstall the second host from the WebGUI - this led to an interesting situation: hosted-engine --vm-status now shows the same hostname for both hosts.

[root@ovirt1 ~]# hosted-engine --vm-status

--== Host 1 status ==--

conf_on_shared_storage : True
Status up-to-date      : True
Hostname               : ovirt1.telia.ru
Host ID                : 1
Engine status          : {"health": "good", "vm": "up", "detail": "up"}
Score                  : 3400
stopped                : False
Local maintenance      : False
crc32                  : a7758085
local_conf_timestamp   : 259327
Host timestamp         : 259327
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=259327 (Mon Jan 15 14:06:48 2018)
    host-id=1
    score=3400
    vm_conf_refresh_time=259327 (Mon Jan 15 14:06:48 2018)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineUp
    stopped=False

--== Host 2 status ==--

conf_on_shared_storage : True
Status up-to-date      : False
Hostname               : ovirt1.telia.ru
Host ID                : 2
Engine status          : unknown stale-data
Score                  : 0
stopped                : True
Local maintenance      : False
crc32                  : c7037c03
local_conf_timestamp   : 7530
Host timestamp         : 7530
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=7530 (Fri Jan 12 16:10:12 2018)
    host-id=2
    score=0
    vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)
    conf_on_shared_storage=True
    maintenance=False
    state=AgentStopped
    stopped=True

Gluster seems to be working fine; all gluster nodes show the connected state.

Any advice on how to resolve this situation is highly appreciated!

Regards,
Artem
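Two slots reporting the same hostname usually points at conflicting host IDs or stale metadata left over from the reinstall; a quick sanity check, assuming the default layout where each agent reads its ID from the local HE config file:

    # run on each host; the two host_id values must differ
    hostname -f
    grep ^host_id /etc/ovirt-hosted-engine/hosted-engine.conf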
On Mon, Jan 15, 2018 at 11:45 AM, Kasturi Narra <knarra@redhat.com> wrote:

Hello Artem,

Can you check if the glusterd service is running on host1 and all the peers are in the connected state? If yes, can you restart the ovirt-ha-agent and broker services and check if things are working fine?

Thanks,
kasturi
On Sat, Jan 13, 2018 at 12:33 AM, Artem Tambovskiy < artem.tambovskiy@gmail.com> wrote:
Explored logs on both hosts. broker.log shows no errors.
agent.log looking not good:
on host1 (which running hosted engine) :
MainThread::ERROR::2018-01-12 21:51:03,883::agent::205::ovir t_hosted_engine_ha.agent.agent.Agent::(_run_agent) Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 191, in _run_agent return action(he) File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 64, in action_proper return he.start_monitoring() File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 411, in start_monitoring self._initialize_sanlock() File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 749, in _initialize_sanlock "Failed to initialize sanlock, the number of errors has" SanlockInitializationError: Failed to initialize sanlock, the number of errors has exceeded the limit
MainThread::ERROR::2018-01-12 21:51:03,884::agent::206::ovir t_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent MainThread::WARNING::2018-01-12 21:51:08,889::agent::209::ovir t_hosted_engine_ha.agent.agent.Agent::(_run_agent) Restarting agent, attempt '1' MainThread::INFO::2018-01-12 21:51:08,919::hosted_engine::2 42::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Found certificate common name: ovirt1.telia.ru MainThread::INFO::2018-01-12 21:51:08,921::hosted_engine::6 04::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm) Initializing VDSM MainThread::INFO::2018-01-12 21:51:11,398::hosted_engine::6 30::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Connecting the storage MainThread::INFO::2018-01-12 21:51:11,399::storage_server::220:: ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(validate_storage_server) Validating storage server MainThread::INFO::2018-01-12 21:51:13,725::storage_server::239:: ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server MainThread::INFO::2018-01-12 21:51:18,390::storage_server::246:: ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server MainThread::INFO::2018-01-12 21:51:18,423::storage_server::253:: ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Refreshing the storage domain MainThread::INFO::2018-01-12 21:51:18,689::hosted_engine::6 63::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Preparing images MainThread::INFO::2018-01-12 21:51:18,690::image::126::ovir t_hosted_engine_ha.lib.image.Image::(prepare_images) Preparing images MainThread::INFO::2018-01-12 21:51:21,895::hosted_engine::6 66::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Refreshing vm.conf MainThread::INFO::2018-01-12 21:51:21,895::config::493::ovi rt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_vm_conf) Reloading vm.conf from the shared storage domain MainThread::INFO::2018-01-12 21:51:21,896::config::416::ovi rt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config: :(_get_vm_conf_content_from_ovf_store) Trying to get a fresher copy of vm configuration from the OVF_STORE MainThread::INFO::2018-01-12 21:51:21,896::ovf_store::132:: ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) Extracting Engine VM OVF from the OVF_STORE MainThread::INFO::2018-01-12 21:51:21,897::ovf_store::134:: ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) OVF_STORE volume path: /var/run/vdsm/storage/4a7f8717 -9bb0-4d80-8016-498fa4b88162/5cabd8e1-5f4b-469e-becc- 227469e03f5c/8048cbd7-77e2-4805-9af4-d109fa36dfcf MainThread::INFO::2018-01-12 21:51:21,915::config::435::ovi rt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config: :(_get_vm_conf_content_from_ovf_store) Found an OVF for HE VM, trying to convert MainThread::INFO::2018-01-12 21:51:21,918::config::440::ovi rt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config: :(_get_vm_conf_content_from_ovf_store) Got vm.conf from OVF_STORE MainThread::INFO::2018-01-12 21:51:21,919::hosted_engine::5 09::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Initializing ha-broker connection MainThread::INFO::2018-01-12 21:51:21,919::brokerlink::130:: ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor ping, options 
{'addr': '80.239.162.97'} MainThread::INFO::2018-01-12 21:51:21,922::brokerlink::141:: ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 140547104457680 MainThread::INFO::2018-01-12 21:51:21,922::brokerlink::130:: ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor mgmt-bridge, options {'use_ssl': 'true', 'bridge_name': 'ovirtmgmt', 'address': '0'} MainThread::INFO::2018-01-12 21:51:21,936::brokerlink::141:: ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 140547104458064 MainThread::INFO::2018-01-12 21:51:21,936::brokerlink::130:: ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor mem-free, options {'use_ssl': 'true', 'address': '0'} MainThread::INFO::2018-01-12 21:51:21,938::brokerlink::141:: ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 140547104458448 MainThread::INFO::2018-01-12 21:51:21,939::brokerlink::130:: ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor cpu-load-no-engine, options {'use_ssl': 'true', 'vm_uuid': 'b366e466-b0ea-4a09-866b-d0248d7523a6', 'address': '0'} MainThread::INFO::2018-01-12 21:51:21,940::brokerlink::141:: ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 140547104457552 MainThread::INFO::2018-01-12 21:51:21,941::brokerlink::130:: ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor engine-health, options {'use_ssl': 'true', 'vm_uuid': 'b366e466-b0ea-4a09-866b-d0248d7523a6', 'address': '0'} MainThread::INFO::2018-01-12 21:51:21,942::brokerlink::141:: ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 140547104459792 MainThread::INFO::2018-01-12 21:51:26,951::brokerlink::179:: ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(set_storage_domain) Success, id 140546772847056 MainThread::INFO::2018-01-12 21:51:26,952::hosted_engine::6 01::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Broker initialized, all submonitors started MainThread::INFO::2018-01-12 21:51:27,049::hosted_engine::7 04::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock) Ensuring lease for lockspace hosted-engine, host id 1 is acquired (file: /var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-498fa4b88162/ 093faa75-5e33-4559-84fa-1f1f8d48153b/911c7637-b49d- 463e-b186-23b404e50769) MainThread::INFO::2018-01-12 21:53:48,067::hosted_engine::7 45::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock) Failed to acquire the lock. Waiting '5's before the next attempt MainThread::INFO::2018-01-12 21:56:14,088::hosted_engine::7 45::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock) Failed to acquire the lock. Waiting '5's before the next attempt MainThread::INFO::2018-01-12 21:58:40,111::hosted_engine::7 45::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock) Failed to acquire the lock. Waiting '5's before the next attempt MainThread::INFO::2018-01-12 22:01:06,133::hosted_engine::7 45::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock) Failed to acquire the lock. Waiting '5's before the next attempt
agent.log from second host
MainThread::INFO::2018-01-12 22:01:37,241::hosted_engine::6 30::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Connecting the storage MainThread::INFO::2018-01-12 22:01:37,242::storage_server::220:: ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(validate_storage_server) Validating storage server MainThread::INFO::2018-01-12 22:01:39,540::hosted_engine::6 39::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Storage domain reported as valid and reconnect is not forced. MainThread::INFO::2018-01-12 22:01:41,939::hosted_engine::4 53::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineUnexpectedlyDown (score: 0) MainThread::INFO::2018-01-12 22:01:52,150::config::493::ovi rt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_vm_conf) Reloading vm.conf from the shared storage domain MainThread::INFO::2018-01-12 22:01:52,150::config::416::ovi rt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config: :(_get_vm_conf_content_from_ovf_store) Trying to get a fresher copy of vm configuration from the OVF_STORE MainThread::INFO::2018-01-12 22:01:52,151::ovf_store::132:: ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) Extracting Engine VM OVF from the OVF_STORE MainThread::INFO::2018-01-12 22:01:52,153::ovf_store::134:: ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) OVF_STORE volume path: /var/run/vdsm/storage/4a7f8717 -9bb0-4d80-8016-498fa4b88162/5cabd8e1-5f4b-469e-becc- 227469e03f5c/8048cbd7-77e2-4805-9af4-d109fa36dfcf MainThread::INFO::2018-01-12 22:01:52,174::config::435::ovi rt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config: :(_get_vm_conf_content_from_ovf_store) Found an OVF for HE VM, trying to convert MainThread::INFO::2018-01-12 22:01:52,179::config::440::ovi rt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config: :(_get_vm_conf_content_from_ovf_store) Got vm.conf from OVF_STORE MainThread::INFO::2018-01-12 22:01:52,189::hosted_engine::6 04::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm) Initializing VDSM MainThread::INFO::2018-01-12 22:01:54,586::hosted_engine::6 30::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Connecting the storage MainThread::INFO::2018-01-12 22:01:54,587::storage_server::220:: ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(validate_storage_server) Validating storage server MainThread::INFO::2018-01-12 22:01:56,903::hosted_engine::6 39::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Storage domain reported as valid and reconnect is not forced. 
MainThread::INFO::2018-01-12 22:01:59,299::states::682::ovi rt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score) Score is 0 due to unexpected vm shutdown at Fri Jan 12 21:57:48 2018 MainThread::INFO::2018-01-12 22:01:59,299::hosted_engine::4 53::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineUnexpectedlyDown (score: 0) MainThread::INFO::2018-01-12 22:02:09,659::config::493::ovi rt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_vm_conf) Reloading vm.conf from the shared storage domain MainThread::INFO::2018-01-12 22:02:09,659::config::416::ovi rt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config: :(_get_vm_conf_content_from_ovf_store) Trying to get a fresher copy of vm configuration from the OVF_STORE MainThread::INFO::2018-01-12 22:02:09,660::ovf_store::132:: ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) Extracting Engine VM OVF from the OVF_STORE MainThread::INFO::2018-01-12 22:02:09,663::ovf_store::134:: ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) OVF_STORE volume path: /var/run/vdsm/storage/4a7f8717 -9bb0-4d80-8016-498fa4b88162/5cabd8e1-5f4b-469e-becc- 227469e03f5c/8048cbd7-77e2-4805-9af4-d109fa36dfcf MainThread::INFO::2018-01-12 22:02:09,683::config::435::ovi rt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config: :(_get_vm_conf_content_from_ovf_store) Found an OVF for HE VM, trying to convert MainThread::INFO::2018-01-12 22:02:09,688::config::440::ovi rt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config: :(_get_vm_conf_content_from_ovf_store) Got vm.conf from OVF_STORE MainThread::INFO::2018-01-12 22:02:09,698::hosted_engine::6 04::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm) Initializing VDSM MainThread::INFO::2018-01-12 22:02:12,112::hosted_engine::6 30::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Connecting the storage MainThread::INFO::2018-01-12 22:02:12,113::storage_server::220:: ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(validate_storage_server) Validating storage server MainThread::INFO::2018-01-12 22:02:14,444::hosted_engine::6 39::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Storage domain reported as valid and reconnect is not forced. 
MainThread::INFO::2018-01-12 22:02:16,859::states::682::ovi rt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score) Score is 0 due to unexpected vm shutdown at Fri Jan 12 21:57:47 2018 MainThread::INFO::2018-01-12 22:02:16,859::hosted_engine::4 53::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineUnexpectedlyDown (score: 0) MainThread::INFO::2018-01-12 22:02:27,100::config::493::ovi rt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_vm_conf) Reloading vm.conf from the shared storage domain MainThread::INFO::2018-01-12 22:02:27,100::config::416::ovi rt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config: :(_get_vm_conf_content_from_ovf_store) Trying to get a fresher copy of vm configuration from the OVF_STORE MainThread::INFO::2018-01-12 22:02:27,101::ovf_store::132:: ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) Extracting Engine VM OVF from the OVF_STORE MainThread::INFO::2018-01-12 22:02:27,103::ovf_store::134:: ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) OVF_STORE volume path: /var/run/vdsm/storage/4a7f8717 -9bb0-4d80-8016-498fa4b88162/5cabd8e1-5f4b-469e-becc- 227469e03f5c/8048cbd7-77e2-4805-9af4-d109fa36dfcf MainThread::INFO::2018-01-12 22:02:27,125::config::435::ovi rt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config: :(_get_vm_conf_content_from_ovf_store) Found an OVF for HE VM, trying to convert MainThread::INFO::2018-01-12 22:02:27,129::config::440::ovi rt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config: :(_get_vm_conf_content_from_ovf_store) Got vm.conf from OVF_STORE MainThread::INFO::2018-01-12 22:02:27,130::states::667::ovi rt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Engine down, local host does not have best score MainThread::INFO::2018-01-12 22:02:27,139::hosted_engine::6 04::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm) Initializing VDSM MainThread::INFO::2018-01-12 22:02:29,584::hosted_engine::6 30::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Connecting the storage MainThread::INFO::2018-01-12 22:02:29,586::storage_server::220:: ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(validate_storage_server) Validating storage server
Any suggestions how to resolve this .
regards, Artem
On Fri, Jan 12, 2018 at 7:08 PM, Artem Tambovskiy < artem.tambovskiy@gmail.com> wrote:
Trying to fix one thing I broke another :(
I fixed mnt_options for hosted engine storage domain and installed latest security patches to my hosts and hosted engine. All VM's up and running, but hosted_engine --vm-status reports about issues:
[root@ovirt1 ~]# hosted-engine --vm-status
--== Host 1 status ==--
conf_on_shared_storage : True Status up-to-date : False Hostname : ovirt2 Host ID : 1 Engine status : unknown stale-data Score : 0 stopped : False Local maintenance : False crc32 : 193164b8 local_conf_timestamp : 8350 Host timestamp : 8350 Extra metadata (valid at timestamp): metadata_parse_version=1 metadata_feature_version=1 timestamp=8350 (Fri Jan 12 19:03:54 2018) host-id=1 score=0 vm_conf_refresh_time=8350 (Fri Jan 12 19:03:54 2018) conf_on_shared_storage=True maintenance=False state=EngineUnexpectedlyDown stopped=False timeout=Thu Jan 1 05:24:43 1970
--== Host 2 status ==--
conf_on_shared_storage             : True
Status up-to-date                  : False
Hostname                           : ovirt1.telia.ru
Host ID                            : 2
Engine status                      : unknown stale-data
Score                              : 0
stopped                            : True
Local maintenance                  : False
crc32                              : c7037c03
local_conf_timestamp               : 7530
Host timestamp                     : 7530
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=7530 (Fri Jan 12 16:10:12 2018)
        host-id=2
        score=0
        vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)
        conf_on_shared_storage=True
        maintenance=False
        state=AgentStopped
        stopped=True
[root@ovirt1 ~]#
From the second host, the situation looks a bit different:
[root@ovirt2 ~]# hosted-engine --vm-status
--== Host 1 status ==--
conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : ovirt2
Host ID                            : 1
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 0
stopped                            : False
Local maintenance                  : False
crc32                              : 78eabdb6
local_conf_timestamp               : 8403
Host timestamp                     : 8402
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=8402 (Fri Jan 12 19:04:47 2018)
        host-id=1
        score=0
        vm_conf_refresh_time=8403 (Fri Jan 12 19:04:47 2018)
        conf_on_shared_storage=True
        maintenance=False
        state=EngineUnexpectedlyDown
        stopped=False
        timeout=Thu Jan 1 05:24:43 1970
--== Host 2 status ==--
conf_on_shared_storage             : True
Status up-to-date                  : False
Hostname                           : ovirt1.telia.ru
Host ID                            : 2
Engine status                      : unknown stale-data
Score                              : 0
stopped                            : True
Local maintenance                  : False
crc32                              : c7037c03
local_conf_timestamp               : 7530
Host timestamp                     : 7530
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=7530 (Fri Jan 12 16:10:12 2018)
        host-id=2
        score=0
        vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)
        conf_on_shared_storage=True
        maintenance=False
        state=AgentStopped
        stopped=True
The WebGUI shows the engine running on host ovirt1, and Gluster looks fine:

[root@ovirt1 ~]# gluster volume status engine
Status of volume: engine
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick ovirt1.telia.ru:/oVirt/engine         49169     0          Y       3244
Brick ovirt2.telia.ru:/oVirt/engine         49179     0          Y       20372
Brick ovirt3.telia.ru:/oVirt/engine         49206     0          Y       16609
Self-heal Daemon on localhost               N/A       N/A        Y       117868
Self-heal Daemon on ovirt2.telia.ru         N/A       N/A        Y       20521
Self-heal Daemon on ovirt3                  N/A       N/A        Y       25093

Task Status of Volume engine
------------------------------------------------------------------------------
There are no active volume tasks
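(Since the engine volume is a 3-brick replica, it may also be worth ruling out pending self-heals before blaming sanlock. A quick check, assuming the volume name shown above:

[root@ovirt1 ~]# gluster volume heal engine info

If entries are listed there, for example files under the storage domain's dom_md/ directory, sanlock can fail to acquire its lease even though all bricks report online.)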
How to resolve this issue?
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users

Hello,

I now see that your hosted engine is up and running. Can you let me know how you tried reinstalling the host? Below is the procedure to use; I hope you did not miss any step while reinstalling. If not, can you try reinstalling again and see if that works?

1) Move the host to maintenance
2) Click on reinstall
3) Provide the password
4) Uncheck 'automatically configure host firewall'
5) Click on the 'Deploy' tab
6) Set Hosted Engine deployment to 'Deploy'

Once the host installation is done, wait till the active score of the host shows 3400 in the General tab, then check hosted-engine --vm-status.

Thanks,
kasturi
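(A shell-side way to wait for that score instead of polling the General tab; just a convenience sketch using standard tools:

[root@ovirt2 ~]# watch -n 10 'hosted-engine --vm-status | grep -E "Hostname|Score|state"'

The host is ready once its own entry reports score=3400 with an up-to-date status.)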
On Mon, Jan 15, 2018 at 4:57 PM, Artem Tambovskiy <artem.tambovskiy@gmail.com> wrote:

Hello,
I have uploaded two archives with all the relevant logs to shared hosting: files from host 1 (which is currently running all VMs, including hosted_engine) - https://yadi.sk/d/PttRoYV63RTvhK and files from the second host - https://yadi.sk/d/UBducEsV3RTvhc
I have tried restarting both ovirt-ha-agent and ovirt-ha-broker, but it had no effect. I have also tried shutting down the hosted_engine VM, stopping the ovirt-ha-agent and ovirt-ha-broker services, disconnecting the storage and connecting it again - no effect either. I also tried reinstalling the second host from the WebGUI, which led to an interesting situation: hosted-engine --vm-status now shows that both hosts have the same address.
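(For the record, the restart sequence tried here, written out; broker before agent is the commonly recommended order on systemd hosts:

[root@ovirt2 ~]# systemctl restart ovirt-ha-broker
[root@ovirt2 ~]# systemctl restart ovirt-ha-agent
[root@ovirt2 ~]# journalctl -u ovirt-ha-agent -f

Watching the journal shows whether the agent gets past sanlock initialization on the next attempt.)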
[root@ovirt1 ~]# hosted-engine --vm-status
--== Host 1 status ==--
conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : ovirt1.telia.ru
Host ID                            : 1
Engine status                      : {"health": "good", "vm": "up", "detail": "up"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : a7758085
local_conf_timestamp               : 259327
Host timestamp                     : 259327
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=259327 (Mon Jan 15 14:06:48 2018)
        host-id=1
        score=3400
        vm_conf_refresh_time=259327 (Mon Jan 15 14:06:48 2018)
        conf_on_shared_storage=True
        maintenance=False
        state=EngineUp
        stopped=False
--== Host 2 status ==--
conf_on_shared_storage             : True
Status up-to-date                  : False
Hostname                           : ovirt1.telia.ru
Host ID                            : 2
Engine status                      : unknown stale-data
Score                              : 0
stopped                            : True
Local maintenance                  : False
crc32                              : c7037c03
local_conf_timestamp               : 7530
Host timestamp                     : 7530
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=7530 (Fri Jan 12 16:10:12 2018)
        host-id=2
        score=0
        vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)
        conf_on_shared_storage=True
        maintenance=False
        state=AgentStopped
        stopped=True
Gluster seems to be working fine; all gluster nodes are showing a connected state.

Any advice on how to resolve this situation is highly appreciated!
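(One possible direction, offered as a sketch rather than a verified fix: a stale slot like host 2's can usually be wiped from the shared HE metadata so the host can re-register cleanly, along the lines of:

[root@ovirt1 ~]# hosted-engine --clean-metadata --host-id=2 --force-clean

Run it from a healthy host while ovirt-ha-agent is stopped on the host being cleaned, and double-check the exact options against the documentation for your oVirt version.)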
Regards, Artem
On Mon, Jan 15, 2018 at 11:45 AM, Kasturi Narra <knarra@redhat.com> wrote:
Hello Artem,
Can you check if the glusterd service is running on host1 and all the peers are in the connected state? If yes, can you restart the ovirt-ha-agent and broker services and check if things are working fine?
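(Spelled out, that check could look like this on each host; plain systemd and gluster commands, nothing oVirt-specific:

[root@ovirt1 ~]# systemctl status glusterd
[root@ovirt1 ~]# gluster peer status

All peers should report "Peer in Cluster (Connected)".)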
Thanks,
kasturi

Hello,

Yes, I followed exactly the same procedure while reinstalling the hosts (the only difference is that I have an SSH key configured instead of the password).

I just reinstalled the second host one more time; after 20 minutes the host still hasn't reached the active score of 3400 (Hosted Engine HA: Not Active) and I still don't see the crown icon for this host.

hosted-engine --vm-status from the ovirt1 host:

[root@ovirt1 ~]# hosted-engine --vm-status

--== Host 1 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : ovirt1.telia.ru
Host ID                            : 1
Engine status                      : {"health": "good", "vm": "up", "detail": "up"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 3f94156a
local_conf_timestamp               : 349144
Host timestamp                     : 349144
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=349144 (Tue Jan 16 15:03:45 2018)
        host-id=1
        score=3400
        vm_conf_refresh_time=349144 (Tue Jan 16 15:03:45 2018)
        conf_on_shared_storage=True
        maintenance=False
        state=EngineUp
        stopped=False

--== Host 2 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : False
Hostname                           : ovirt1.telia.ru
Host ID                            : 2
Engine status                      : unknown stale-data
Score                              : 0
stopped                            : True
Local maintenance                  : False
crc32                              : c7037c03
local_conf_timestamp               : 7530
Host timestamp                     : 7530
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=7530 (Fri Jan 12 16:10:12 2018)
        host-id=2
        score=0
        vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)
        conf_on_shared_storage=True
        maintenance=False
        state=AgentStopped
        stopped=True

hosted-engine --vm-status output from the ovirt2 host:

[root@ovirt2 ovirt-hosted-engine-ha]# hosted-engine --vm-status

--== Host 1 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : False
Hostname                           : ovirt1.telia.ru
Host ID                            : 1
Engine status                      : unknown stale-data
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 6d3606f1
local_conf_timestamp               : 349264
Host timestamp                     : 349264
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=349264 (Tue Jan 16 15:05:45 2018)
        host-id=1
        score=3400
        vm_conf_refresh_time=349264 (Tue Jan 16 15:05:45 2018)
        conf_on_shared_storage=True
        maintenance=False
        state=EngineUp
        stopped=False

--== Host 2 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : False
Hostname                           : ovirt1.telia.ru
Host ID                            : 2
Engine status                      : unknown stale-data
Score                              : 0
stopped                            : True
Local maintenance                  : False
crc32                              : c7037c03
local_conf_timestamp               : 7530
Host timestamp                     : 7530
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=7530 (Fri Jan 12 16:10:12 2018)
        host-id=2
        score=0
        vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)
        conf_on_shared_storage=True
        maintenance=False
        state=AgentStopped
        stopped=True

Also, I saw some log messages in the WebGUI about time drift, like "Host ovirt2.telia.ru has time-drift of 5305 seconds while maximum configured value is 300 seconds." That is a bit weird, as I haven't touched any time settings since I installed the cluster. Both hosts have the same time and timezone (MSK), but the hosted engine lives in the UTC timezone. Is it mandatory to have everything in sync and in the same timezone?

Regards,
Artem
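(A drift of 5305 seconds is far beyond the 300-second limit the engine itself flags, so it is worth fixing regardless of the metadata issue. The timezone itself should not matter, since timestamps are compared on a UTC basis; it is the absolute clock offset that hurts. A sketch, assuming chrony is the configured time source on these hosts:

[root@ovirt2 ~]# timedatectl
[root@ovirt2 ~]# chronyc tracking
[root@ovirt2 ~]# chronyc makestep

makestep forces an immediate step of the clock instead of a slow slew.)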

Why are both hosts reporting as ovirt1? Look at the hostname fields to see what I mean.

-derek

Sent using my mobile device. Please excuse any typos.
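(That is the crux: both metadata slots carry the hostname ovirt1.telia.ru, so one slot is holding stale identity data. A way to compare what each host actually thinks it is, assuming the standard hosted-engine config location:

[root@ovirt1 ~]# hostname -f; grep ^host_id /etc/ovirt-hosted-engine/hosted-engine.conf
[root@ovirt2 ~]# hostname -f; grep ^host_id /etc/ovirt-hosted-engine/hosted-engine.conf

If the host_id values are distinct but a stale slot keeps the wrong name, cleaning that slot (see the clean-metadata sketch earlier in the thread) and re-deploying the host is the usual way out.)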
Hello,
Yes, I followed exactly the same procedure while reinstalling the hosts (the only difference that I have SSH key configured instead of the password).
Just reinstalled the second host one more time, after 20 min the host still haven't reached active score of 3400 (Hosted Engine HA:Not Active) and I still don't see crown icon for this host.
hosted-engine --vm-status from ovirt1 host
[root@ovirt1 ~]# hosted-engine --vm-status
--== Host 1 status ==--
conf_on_shared_storage : True Status up-to-date : True Hostname : ovirt1.telia.ru Host ID : 1 Engine status : {"health": "good", "vm": "up", "detail": "up"} Score : 3400 stopped : False Local maintenance : False crc32 : 3f94156a local_conf_timestamp : 349144 Host timestamp : 349144 Extra metadata (valid at timestamp): metadata_parse_version=1 metadata_feature_version=1 timestamp=349144 (Tue Jan 16 15:03:45 2018) host-id=1 score=3400 vm_conf_refresh_time=349144 (Tue Jan 16 15:03:45 2018) conf_on_shared_storage=True maintenance=False state=EngineUp stopped=False
--== Host 2 status ==--
conf_on_shared_storage : True Status up-to-date : False Hostname : ovirt1.telia.ru Host ID : 2 Engine status : unknown stale-data Score : 0 stopped : True Local maintenance : False crc32 : c7037c03 local_conf_timestamp : 7530 Host timestamp : 7530 Extra metadata (valid at timestamp): metadata_parse_version=1 metadata_feature_version=1 timestamp=7530 (Fri Jan 12 16:10:12 2018) host-id=2 score=0 vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018) conf_on_shared_storage=True maintenance=False state=AgentStopped stopped=True
hosted-engine --vm-status output from ovirt2 host
[root@ovirt2 ovirt-hosted-engine-ha]# hosted-engine --vm-status
--== Host 1 status ==--
conf_on_shared_storage : True Status up-to-date : False Hostname : ovirt1.telia.ru Host ID : 1 Engine status : unknown stale-data Score : 3400 stopped : False Local maintenance : False crc32 : 6d3606f1 local_conf_timestamp : 349264 Host timestamp : 349264 Extra metadata (valid at timestamp): metadata_parse_version=1 metadata_feature_version=1 timestamp=349264 (Tue Jan 16 15:05:45 2018) host-id=1 score=3400 vm_conf_refresh_time=349264 (Tue Jan 16 15:05:45 2018) conf_on_shared_storage=True maintenance=False state=EngineUp stopped=False
--== Host 2 status ==--
conf_on_shared_storage : True Status up-to-date : False Hostname : ovirt1.telia.ru Host ID : 2 Engine status : unknown stale-data Score : 0 stopped : True Local maintenance : False crc32 : c7037c03 local_conf_timestamp : 7530 Host timestamp : 7530 Extra metadata (valid at timestamp): metadata_parse_version=1 metadata_feature_version=1 timestamp=7530 (Fri Jan 12 16:10:12 2018) host-id=2 score=0 vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018) conf_on_shared_storage=True maintenance=False state=AgentStopped stopped=True
Also I saw some log messages in webGUI about time drift like
"Host ovirt2.telia.ru has time-drift of 5305 seconds while maximum configured value is 300 seconds." that is a bit weird as haven't touched any time settings since I installed the cluster. both host have the same time and timezone (MSK) but hosted engine lives in UTC timezone. Is it mandatory to have everything in sync and in the same timezone?
Regards, Artem
On Tue, Jan 16, 2018 at 2:20 PM, Kasturi Narra <knarra@redhat.com> wrote:
Hello,
I now see that your hosted engine is up and running. Can you let me know how did you try reinstalling the host? Below is the procedure which is used and hope you did not miss any step while reinstalling. If no, can you try reinstalling again and see if that works ?
1) Move the host to maintenance 2) click on reinstall 3) provide the password 4) uncheck 'automatically configure host firewall' 5) click on 'Deploy' tab 6) click Hosted Engine deployment as 'Deploy'
And once the host installation is done, wait till the active score of the host shows 3400 in the general tab then check hosted-engine --vm-status.
Thanks kasturi
On Mon, Jan 15, 2018 at 4:57 PM, Artem Tambovskiy < artem.tambovskiy@gmail.com> wrote:
Hello,
I have uploaded 2 archives with all relevant logs to shared hosting files from host 1 (which is currently running all VM's including hosted_engine) - https://yadi.sk/d/PttRoYV63RTvhK files from second host - https://yadi.sk/d/UBducEsV3RTvhc
I have tried to restart both ovirt-ha-agent and ovirt-ha-broker but it gives no effect. I have also tried to shutdown hosted_engine VM, stop ovirt-ha-agent and ovirt-ha-broker services disconnect storage and connect it again - no effect as well. Also I tried to reinstall second host from WebGUI - this lead to the interesting situation - now hosted-engine --vm-status shows that both hosts have the same address.
[root@ovirt1 ~]# hosted-engine --vm-status
--== Host 1 status ==--
conf_on_shared_storage : True Status up-to-date : True Hostname : ovirt1.telia.ru Host ID : 1 Engine status : {"health": "good", "vm": "up", "detail": "up"} Score : 3400 stopped : False Local maintenance : False crc32 : a7758085 local_conf_timestamp : 259327 Host timestamp : 259327 Extra metadata (valid at timestamp): metadata_parse_version=1 metadata_feature_version=1 timestamp=259327 (Mon Jan 15 14:06:48 2018) host-id=1 score=3400 vm_conf_refresh_time=259327 (Mon Jan 15 14:06:48 2018) conf_on_shared_storage=True maintenance=False state=EngineUp stopped=False
--== Host 2 status ==--
conf_on_shared_storage : True Status up-to-date : False Hostname : ovirt1.telia.ru Host ID : 2 Engine status : unknown stale-data Score : 0 stopped : True Local maintenance : False crc32 : c7037c03 local_conf_timestamp : 7530 Host timestamp : 7530 Extra metadata (valid at timestamp): metadata_parse_version=1 metadata_feature_version=1 timestamp=7530 (Fri Jan 12 16:10:12 2018) host-id=2 score=0 vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018) conf_on_shared_storage=True maintenance=False state=AgentStopped stopped=True
Gluster seems to be working fine; all gluster nodes show a connected state.
Any advice on how to resolve this situation is highly appreciated!
Regards, Artem
On Mon, Jan 15, 2018 at 11:45 AM, Kasturi Narra <knarra@redhat.com> wrote:
Hello Artem,
Can you check if the glusterd service is running on host1 and all the peers are in a connected state? If yes, can you restart the ovirt-ha-agent and broker services and check if things are working fine?
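(From the shell, that check is roughly the following, using stock glusterd/systemd commands:

systemctl is-active glusterd
gluster peer status                      # every peer should report "Peer in Cluster (Connected)"
systemctl restart ovirt-ha-broker
systemctl restart ovirt-ha-agent
)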
Thanks kasturi
On Sat, Jan 13, 2018 at 12:33 AM, Artem Tambovskiy < artem.tambovskiy@gmail.com> wrote:
Explored logs on both hosts. broker.log shows no errors.
agent.log is not looking good:
on host1 (which is running the hosted engine):
MainThread::ERROR::2018-01-12 21:51:03,883::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 191, in _run_agent
    return action(he)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 64, in action_proper
    return he.start_monitoring()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 411, in start_monitoring
    self._initialize_sanlock()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 749, in _initialize_sanlock
    "Failed to initialize sanlock, the number of errors has"
SanlockInitializationError: Failed to initialize sanlock, the number of errors has exceeded the limit

MainThread::ERROR::2018-01-12 21:51:03,884::agent::206::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent
MainThread::WARNING::2018-01-12 21:51:08,889::agent::209::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Restarting agent, attempt '1'
MainThread::INFO::2018-01-12 21:51:08,919::hosted_engine::242::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Found certificate common name: ovirt1.telia.ru
MainThread::INFO::2018-01-12 21:51:08,921::hosted_engine::604::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm) Initializing VDSM
MainThread::INFO::2018-01-12 21:51:11,398::hosted_engine::630::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Connecting the storage
MainThread::INFO::2018-01-12 21:51:11,399::storage_server::220::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(validate_storage_server) Validating storage server
MainThread::INFO::2018-01-12 21:51:13,725::storage_server::239::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2018-01-12 21:51:18,390::storage_server::246::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2018-01-12 21:51:18,423::storage_server::253::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Refreshing the storage domain
MainThread::INFO::2018-01-12 21:51:18,689::hosted_engine::663::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Preparing images
MainThread::INFO::2018-01-12 21:51:18,690::image::126::ovirt_hosted_engine_ha.lib.image.Image::(prepare_images) Preparing images
MainThread::INFO::2018-01-12 21:51:21,895::hosted_engine::666::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Refreshing vm.conf
MainThread::INFO::2018-01-12 21:51:21,895::config::493::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_vm_conf) Reloading vm.conf from the shared storage domain
MainThread::INFO::2018-01-12 21:51:21,896::config::416::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) Trying to get a fresher copy of vm configuration from the OVF_STORE
MainThread::INFO::2018-01-12 21:51:21,896::ovf_store::132::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) Extracting Engine VM OVF from the OVF_STORE
MainThread::INFO::2018-01-12 21:51:21,897::ovf_store::134::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) OVF_STORE volume path: /var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-498fa4b88162/5cabd8e1-5f4b-469e-becc-227469e03f5c/8048cbd7-77e2-4805-9af4-d109fa36dfcf
MainThread::INFO::2018-01-12 21:51:21,915::config::435::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) Found an OVF for HE VM, trying to convert
MainThread::INFO::2018-01-12 21:51:21,918::config::440::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) Got vm.conf from OVF_STORE
MainThread::INFO::2018-01-12 21:51:21,919::hosted_engine::509::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Initializing ha-broker connection
MainThread::INFO::2018-01-12 21:51:21,919::brokerlink::130::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor ping, options {'addr': '80.239.162.97'}
MainThread::INFO::2018-01-12 21:51:21,922::brokerlink::141::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 140547104457680
MainThread::INFO::2018-01-12 21:51:21,922::brokerlink::130::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor mgmt-bridge, options {'use_ssl': 'true', 'bridge_name': 'ovirtmgmt', 'address': '0'}
MainThread::INFO::2018-01-12 21:51:21,936::brokerlink::141::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 140547104458064
MainThread::INFO::2018-01-12 21:51:21,936::brokerlink::130::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor mem-free, options {'use_ssl': 'true', 'address': '0'}
MainThread::INFO::2018-01-12 21:51:21,938::brokerlink::141::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 140547104458448
MainThread::INFO::2018-01-12 21:51:21,939::brokerlink::130::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor cpu-load-no-engine, options {'use_ssl': 'true', 'vm_uuid': 'b366e466-b0ea-4a09-866b-d0248d7523a6', 'address': '0'}
MainThread::INFO::2018-01-12 21:51:21,940::brokerlink::141::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 140547104457552
MainThread::INFO::2018-01-12 21:51:21,941::brokerlink::130::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor engine-health, options {'use_ssl': 'true', 'vm_uuid': 'b366e466-b0ea-4a09-866b-d0248d7523a6', 'address': '0'}
MainThread::INFO::2018-01-12 21:51:21,942::brokerlink::141::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 140547104459792
MainThread::INFO::2018-01-12 21:51:26,951::brokerlink::179::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(set_storage_domain) Success, id 140546772847056
MainThread::INFO::2018-01-12 21:51:26,952::hosted_engine::601::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Broker initialized, all submonitors started
MainThread::INFO::2018-01-12 21:51:27,049::hosted_engine::704::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock) Ensuring lease for lockspace hosted-engine, host id 1 is acquired (file: /var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-498fa4b88162/093faa75-5e33-4559-84fa-1f1f8d48153b/911c7637-b49d-463e-b186-23b404e50769)
MainThread::INFO::2018-01-12 21:53:48,067::hosted_engine::745::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock) Failed to acquire the lock. Waiting '5's before the next attempt
MainThread::INFO::2018-01-12 21:56:14,088::hosted_engine::745::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock) Failed to acquire the lock. Waiting '5's before the next attempt
MainThread::INFO::2018-01-12 21:58:40,111::hosted_engine::745::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock) Failed to acquire the lock. Waiting '5's before the next attempt
MainThread::INFO::2018-01-12 22:01:06,133::hosted_engine::745::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock) Failed to acquire the lock. Waiting '5's before the next attempt
agent.log from the second host:
MainThread::INFO::2018-01-12 22:01:37,241::hosted_engine::630::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Connecting the storage
MainThread::INFO::2018-01-12 22:01:37,242::storage_server::220::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(validate_storage_server) Validating storage server
MainThread::INFO::2018-01-12 22:01:39,540::hosted_engine::639::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Storage domain reported as valid and reconnect is not forced.
MainThread::INFO::2018-01-12 22:01:41,939::hosted_engine::453::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineUnexpectedlyDown (score: 0)
MainThread::INFO::2018-01-12 22:01:52,150::config::493::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_vm_conf) Reloading vm.conf from the shared storage domain
MainThread::INFO::2018-01-12 22:01:52,150::config::416::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) Trying to get a fresher copy of vm configuration from the OVF_STORE
MainThread::INFO::2018-01-12 22:01:52,151::ovf_store::132::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) Extracting Engine VM OVF from the OVF_STORE
MainThread::INFO::2018-01-12 22:01:52,153::ovf_store::134::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) OVF_STORE volume path: /var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-498fa4b88162/5cabd8e1-5f4b-469e-becc-227469e03f5c/8048cbd7-77e2-4805-9af4-d109fa36dfcf
MainThread::INFO::2018-01-12 22:01:52,174::config::435::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) Found an OVF for HE VM, trying to convert
MainThread::INFO::2018-01-12 22:01:52,179::config::440::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) Got vm.conf from OVF_STORE
MainThread::INFO::2018-01-12 22:01:52,189::hosted_engine::604::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm) Initializing VDSM
MainThread::INFO::2018-01-12 22:01:54,586::hosted_engine::630::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Connecting the storage
MainThread::INFO::2018-01-12 22:01:54,587::storage_server::220::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(validate_storage_server) Validating storage server
MainThread::INFO::2018-01-12 22:01:56,903::hosted_engine::639::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Storage domain reported as valid and reconnect is not forced.
MainThread::INFO::2018-01-12 22:01:59,299::states::682::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score) Score is 0 due to unexpected vm shutdown at Fri Jan 12 21:57:48 2018
MainThread::INFO::2018-01-12 22:01:59,299::hosted_engine::453::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineUnexpectedlyDown (score: 0)
MainThread::INFO::2018-01-12 22:02:09,659::config::493::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_vm_conf) Reloading vm.conf from the shared storage domain
MainThread::INFO::2018-01-12 22:02:09,659::config::416::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) Trying to get a fresher copy of vm configuration from the OVF_STORE
MainThread::INFO::2018-01-12 22:02:09,660::ovf_store::132::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) Extracting Engine VM OVF from the OVF_STORE
MainThread::INFO::2018-01-12 22:02:09,663::ovf_store::134::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) OVF_STORE volume path: /var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-498fa4b88162/5cabd8e1-5f4b-469e-becc-227469e03f5c/8048cbd7-77e2-4805-9af4-d109fa36dfcf
MainThread::INFO::2018-01-12 22:02:09,683::config::435::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) Found an OVF for HE VM, trying to convert
MainThread::INFO::2018-01-12 22:02:09,688::config::440::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) Got vm.conf from OVF_STORE
MainThread::INFO::2018-01-12 22:02:09,698::hosted_engine::604::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm) Initializing VDSM
MainThread::INFO::2018-01-12 22:02:12,112::hosted_engine::630::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Connecting the storage
MainThread::INFO::2018-01-12 22:02:12,113::storage_server::220::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(validate_storage_server) Validating storage server
MainThread::INFO::2018-01-12 22:02:14,444::hosted_engine::639::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Storage domain reported as valid and reconnect is not forced.
MainThread::INFO::2018-01-12 22:02:16,859::states::682::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score) Score is 0 due to unexpected vm shutdown at Fri Jan 12 21:57:47 2018
MainThread::INFO::2018-01-12 22:02:16,859::hosted_engine::453::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineUnexpectedlyDown (score: 0)
MainThread::INFO::2018-01-12 22:02:27,100::config::493::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_vm_conf) Reloading vm.conf from the shared storage domain
MainThread::INFO::2018-01-12 22:02:27,100::config::416::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) Trying to get a fresher copy of vm configuration from the OVF_STORE
MainThread::INFO::2018-01-12 22:02:27,101::ovf_store::132::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) Extracting Engine VM OVF from the OVF_STORE
MainThread::INFO::2018-01-12 22:02:27,103::ovf_store::134::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) OVF_STORE volume path: /var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-498fa4b88162/5cabd8e1-5f4b-469e-becc-227469e03f5c/8048cbd7-77e2-4805-9af4-d109fa36dfcf
MainThread::INFO::2018-01-12 22:02:27,125::config::435::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) Found an OVF for HE VM, trying to convert
MainThread::INFO::2018-01-12 22:02:27,129::config::440::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) Got vm.conf from OVF_STORE
MainThread::INFO::2018-01-12 22:02:27,130::states::667::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Engine down, local host does not have best score
MainThread::INFO::2018-01-12 22:02:27,139::hosted_engine::604::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm) Initializing VDSM
MainThread::INFO::2018-01-12 22:02:29,584::hosted_engine::630::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Connecting the storage
MainThread::INFO::2018-01-12 22:02:29,586::storage_server::220::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(validate_storage_server) Validating storage server
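(A few things worth checking when sanlock repeatedly fails to acquire the lease - a sketch with the standard sanlock CLI; the lease path is taken from the log above:

sanlock client status                  # lockspaces and resources currently held on this host
systemctl status sanlock wdmd          # sanlock daemon and its watchdog helper
dd if=/var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-498fa4b88162/093faa75-5e33-4559-84fa-1f1f8d48153b/911c7637-b49d-463e-b186-23b404e50769 of=/dev/null bs=1M count=1   # confirm the lease file is readable
)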
Any suggestions on how to resolve this?
Regards, Artem
On Fri, Jan 12, 2018 at 7:08 PM, Artem Tambovskiy < artem.tambovskiy@gmail.com> wrote:
Trying to fix one thing I broke another :(
I fixed the mnt_options for the hosted engine storage domain and installed the latest security patches on my hosts and the hosted engine. All VMs are up and running, but hosted-engine --vm-status reports issues:
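(For context, the mnt_options setting lives in /etc/ovirt-hosted-engine/hosted-engine.conf on each host; for a replica-3 gluster volume the entry typically has this shape - an illustrative excerpt, the exact hostnames and volume name depend on the setup:

storage=ovirt1.telia.ru:/engine
mnt_options=backup-volfile-servers=ovirt2.telia.ru:ovirt3.telia.ru
)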
[root@ovirt1 ~]# hosted-engine --vm-status
--== Host 1 status ==--
conf_on_shared_storage             : True
Status up-to-date                  : False
Hostname                           : ovirt2
Host ID                            : 1
Engine status                      : unknown stale-data
Score                              : 0
stopped                            : False
Local maintenance                  : False
crc32                              : 193164b8
local_conf_timestamp               : 8350
Host timestamp                     : 8350
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=8350 (Fri Jan 12 19:03:54 2018)
        host-id=1
        score=0
        vm_conf_refresh_time=8350 (Fri Jan 12 19:03:54 2018)
        conf_on_shared_storage=True
        maintenance=False
        state=EngineUnexpectedlyDown
        stopped=False
        timeout=Thu Jan 1 05:24:43 1970
--== Host 2 status ==--
conf_on_shared_storage             : True
Status up-to-date                  : False
Hostname                           : ovirt1.telia.ru
Host ID                            : 2
Engine status                      : unknown stale-data
Score                              : 0
stopped                            : True
Local maintenance                  : False
crc32                              : c7037c03
local_conf_timestamp               : 7530
Host timestamp                     : 7530
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=7530 (Fri Jan 12 16:10:12 2018)
        host-id=2
        score=0
        vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)
        conf_on_shared_storage=True
        maintenance=False
        state=AgentStopped
        stopped=True

[root@ovirt1 ~]#
From the second host, the situation looks a bit different:
[root@ovirt2 ~]# hosted-engine --vm-status
--== Host 1 status ==--
conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : ovirt2
Host ID                            : 1
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 0
stopped                            : False
Local maintenance                  : False
crc32                              : 78eabdb6
local_conf_timestamp               : 8403
Host timestamp                     : 8402
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=8402 (Fri Jan 12 19:04:47 2018)
        host-id=1
        score=0
        vm_conf_refresh_time=8403 (Fri Jan 12 19:04:47 2018)
        conf_on_shared_storage=True
        maintenance=False
        state=EngineUnexpectedlyDown
        stopped=False
        timeout=Thu Jan 1 05:24:43 1970
--== Host 2 status ==--
conf_on_shared_storage             : True
Status up-to-date                  : False
Hostname                           : ovirt1.telia.ru
Host ID                            : 2
Engine status                      : unknown stale-data
Score                              : 0
stopped                            : True
Local maintenance                  : False
crc32                              : c7037c03
local_conf_timestamp               : 7530
Host timestamp                     : 7530
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=7530 (Fri Jan 12 16:10:12 2018)
        host-id=2
        score=0
        vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)
        conf_on_shared_storage=True
        maintenance=False
        state=AgentStopped
        stopped=True
WebGUI shows that the engine is running on host ovirt1. Gluster looks fine:

[root@ovirt1 ~]# gluster volume status engine
Status of volume: engine
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick ovirt1.telia.ru:/oVirt/engine         49169     0          Y       3244
Brick ovirt2.telia.ru:/oVirt/engine         49179     0          Y       20372
Brick ovirt3.telia.ru:/oVirt/engine         49206     0          Y       16609
Self-heal Daemon on localhost               N/A       N/A        Y       117868
Self-heal Daemon on ovirt2.telia.ru         N/A       N/A        Y       20521
Self-heal Daemon on ovirt3                  N/A       N/A        Y       25093
Task Status of Volume engine
------------------------------------------------------------------------------
There are no active volume tasks
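(Beyond volume status, the heal state is worth confirming too, with the stock gluster commands:

gluster volume heal engine info                # entries still pending heal, per brick
gluster volume heal engine info split-brain    # should report zero entries on a healthy volume
)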
How to resolve this issue?
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
----------
Why are both hosts reporting as ovirt1?
Look at the hostname fields to see what I mean.

-derek
Sent using my mobile device. Please excuse any typos.
All VM&= #39;s up and running, but =C2=A0hosted_engine --vm-status reports about iss= ues:=C2=A0<br><br><div>[root@ovirt1 ~]# hosted-engine --vm-status</div><div= div><div>=C2=A0 =C2=A0 =C2=A0 =C2=A0 state=3DEngineUnexpectedlyDown</div><d= iv>=C2=A0 =C2=A0 =C2=A0 =C2=A0 stopped=3DFalse</div><div>=C2=A0 =C2=A0 =C2= =A0 =C2=A0 timeout=3DThu Jan =C2=A01 05:24:43 1970</div><div><br></div><div= div>=C2=A0 =C2=A0 =C2=A0 =C2=A0 maintenance=3DFalse</div><div>=C2=A0 =C2=A0= =C2=A0 =C2=A0 state=3DAgentStopped</div><div>=C2=A0 =C2=A0 =C2=A0 =C2=A0 s= topped=3DTrue</div><div>[root@ovirt1 ~]#=C2=A0</div><div><br></div><div><br= div><div>=C2=A0 =C2=A0 =C2=A0 =C2=A0 vm_conf_refresh_time=3D8403 (Fri Jan 1= 2 19:04:47 2018)</div><div>=C2=A0 =C2=A0 =C2=A0 =C2=A0 conf_on_shared_stora= ge=3DTrue</div><div>=C2=A0 =C2=A0 =C2=A0 =C2=A0 maintenance=3DFalse</div><d= iv>=C2=A0 =C2=A0 =C2=A0 =C2=A0 state=3DEngineUnexpectedlyDown</div><div>=C2= =A0 =C2=A0 =C2=A0 =C2=A0 stopped=3DFalse</div><div>=C2=A0 =C2=A0 =C2=A0 =C2= =A0 timeout=3DThu Jan =C2=A01 05:24:43 1970</div><div><br></div><div><br></= div><div>--=3D=3D Host 2 status =3D=3D--</div><div><br></div><div>conf_on_s= hared_storage =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 : True</div><div>St= atus up-to-date =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0: False</div><div>Hostname =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 : <a href=3D"http://ovirt1.te= lia.ru" target=3D"_blank">ovirt1.telia.ru</a></div><div>Host ID =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0: 2</div><div>Engine status =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0: unknown stale-data</div><div>Sc= ore =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0: 0</div><div>stopped =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0: True</div><div>Local maintenance =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0: False</div><div>crc32 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0: c7037c03</div><div>local_conf_timestamp =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 : 7530</div><div>Host timestamp =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 : 7530</div><div>Ex= tra metadata (valid at timestamp):</div><div>=C2=A0 =C2=A0 =C2=A0 =C2=A0 me= tadata_parse_version=3D1</div><div>=C2=A0 =C2=A0 =C2=A0 =C2=A0 metadata_fea= ture_version=3D1</div><div>=C2=A0 =C2=A0 =C2=A0 =C2=A0 timestamp=3D7530 (Fr= i Jan 12 16:10:12 2018)</div><div>=C2=A0 =C2=A0 =C2=A0 =C2=A0 host-id=3D2</= div><div>=C2=A0 =C2=A0 =C2=A0 =C2=A0 score=3D0</div><div>=C2=A0 =C2=A0 =C2= =A0 =C2=A0 vm_conf_refresh_time=3D7530 (Fri Jan 12 16:10:12 2018)</div><div=
=C2=A0 =C2=A0 =C2=A0 =C2=A0 conf_on_shared_storage=3DTrue</div><div>=C2=A0= =C2=A0 =C2=A0 =C2=A0 maintenance=3DFalse</div><div>=C2=A0 =C2=A0 =C2=A0 = =C2=A0 state=3DAgentStopped</div><div>=C2=A0 =C2=A0 =C2=A0 =C2=A0 stopped= =3DTrue</div></div><div><br></div><div><br></div><div>WebGUI shows that eng= ine running on host ovirt1.=C2=A0<br>Gluster looks fine=C2=A0<br><div>[root= @ovirt1 ~]# gluster volume status engine</div><div>Status of volume: engine= </div><div>Gluster process =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 TCP Port =C2=A0RDMA Port = =C2=A0Online =C2=A0Pid</div><div>------------------------------<wbr>-------= -----------------------<wbr>------------------</div><div>Brick ovirt1.telia= ru:/oVirt/engine =C2=A0 =C2=A0 =C2=A0 =C2=A0 49169 =C2=A0 =C2=A0 0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0Y =C2=A0 =C2=A0 =C2=A0 3244=C2=A0</div><div>Bric= k ovirt2.telia.ru:/oVirt/engine =C2=A0 =C2=A0 =C2=A0 =C2=A0 49179 =C2=A0 = =C2=A0 0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Y =C2=A0 =C2=A0 =C2=A0 20372</di= v><div>Brick ovirt3.telia.ru:/oVirt/engine =C2=A0 =C2=A0 =C2=A0 =C2=A0 4920= 6 =C2=A0 =C2=A0 0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Y =C2=A0 =C2=A0 =C2=A0 = 16609</div><div>Self-heal Daemon on localhost =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 N/A =C2=A0 =C2=A0 =C2=A0 N/A =C2=A0 =C2=A0 =C2=A0 =C2= =A0Y =C2=A0 =C2=A0 =C2=A0 117868</div><div>Self-heal Daemon on <a href=3D"h= ttp://ovirt2.telia.ru" target=3D"_blank">ovirt2.telia.ru</a> =C2=A0 =C2=A0 = =C2=A0 =C2=A0 N/A =C2=A0 =C2=A0 =C2=A0 N/A =C2=A0 =C2=A0 =C2=A0 =C2=A0Y =C2= =A0 =C2=A0 =C2=A0 20521</div><div>Self-heal Daemon on ovirt3 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0N/A =C2=A0 =C2=A0 =C2=A0 N/= A =C2=A0 =C2=A0 =C2=A0 =C2=A0Y =C2=A0 =C2=A0 =C2=A0 25093</div><div>=C2=A0<= /div><div>Task Status of Volume engine</div><div>--------------------------= ----<wbr>------------------------------<wbr>------------------</div><div>Th= ere are no active volume tasks<br><br>How to resolve this issue?</div><br><= /div></div> <br>______________________________<wbr>_________________<br> Users mailing list<br> <a href=3D"mailto:Users@ovirt.org" target=3D"_blank">Users@ovirt.org</a><br=
<a href=3D"http://lists.ovirt.org/mailman/listinfo/users" rel=3D"noreferrer= " target=3D"_blank">http://lists.ovirt.org/mailman<wbr>/listinfo/users</a><= br> <br></blockquote></div><br></div></div></div> <br>______________________________<wbr>_________________<br> Users mailing list<br> <a href=3D"mailto:Users@ovirt.org" target=3D"_blank">Users@ovirt.org</a><br=
<a href=3D"http://lists.ovirt.org/mailman/listinfo/users" rel=3D"noreferrer= " target=3D"_blank">http://lists.ovirt.org/mailman<wbr>/listinfo/users</a><= br> <br></blockquote></div><br></div> </div></div></blockquote></div><br></div> </div></div></blockquote></div><br></div> </div></div></blockquote></div><br></div></div> _______________________________________________<br> Users mailing list<br> <a class=3D"aqm-autolink aqm-autowrap" href=3D"mailto:Users%40ovirt.org">Us= ers@ovirt.org</a><br> <a class=3D"aqm-autolink aqm-autowrap" href=3D"http://lists.ovirt.org/mailm= an/listinfo/users">http://lists.ovirt.org/mailman/listinfo/users</a><br> <br> </blockquote> </div> </div> </body> </html> ------------160fee5d54b636727eac37286c--

Hi everybody,

There are a couple of things to check here:

- What version of the hosted engine agent is this? The logs look like they are coming from 4.1.
- What version of the engine is used?
- Check the host ID in /etc/ovirt-hosted-engine/hosted-engine.conf on both hosts; the numbers must be different.
- It looks like the agent or broker on host 2 is not active (or there would be a report).
- The second host does not see data from the first host (unknown stale-data); wait for a minute and check again, then check the storage connection.

And then the general troubleshooting (commands sketched below):

- Put the hosted engine in global maintenance mode (and check that it is visible from the other host using hosted-engine --vm-status).
- Mount the storage domain (hosted-engine --connect-storage).
- Check sanlock client status to see if the proper lockspaces are present.

Best regards

Martin Sivak
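For reference, the checks above map to the following commands (a sketch; paths and service names assume a stock oVirt 4.1 hosted-engine host):

  # Confirm the host IDs differ (run on each HA host)
  grep host_id /etc/ovirt-hosted-engine/hosted-engine.conf

  # Enter global maintenance and verify it is visible from the other host
  hosted-engine --set-maintenance --mode=global
  hosted-engine --vm-status

  # Mount the hosted-engine storage domain
  hosted-engine --connect-storage

  # Check that the hosted-engine lockspace is present
  sanlock client status

On Tue, Jan 16, 2018 at 1:16 PM, Derek Atkins <derek@ihtfp.com> wrote: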
Why are both hosts reporting as ovirt1? Look at the hostname fields to see what I mean.
-derek

Sent using my mobile device. Please excuse any typos.
On January 16, 2018 7:11:09 AM Artem Tambovskiy <artem.tambovskiy@gmail.com> wrote:
Hello,
Yes, I followed exactly the same procedure while reinstalling the hosts (the only difference is that I have an SSH key configured instead of the password).
I just reinstalled the second host one more time; after 20 min the host still hasn't reached an active score of 3400 (Hosted Engine HA: Not Active) and I still don't see the crown icon for this host.
hosted-engine --vm-status from ovirt1 host
[root@ovirt1 ~]# hosted-engine --vm-status
--== Host 1 status ==--
conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : ovirt1.telia.ru
Host ID                            : 1
Engine status                      : {"health": "good", "vm": "up", "detail": "up"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 3f94156a
local_conf_timestamp               : 349144
Host timestamp                     : 349144
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=349144 (Tue Jan 16 15:03:45 2018)
        host-id=1
        score=3400
        vm_conf_refresh_time=349144 (Tue Jan 16 15:03:45 2018)
        conf_on_shared_storage=True
        maintenance=False
        state=EngineUp
        stopped=False
--== Host 2 status ==--
conf_on_shared_storage             : True
Status up-to-date                  : False
Hostname                           : ovirt1.telia.ru
Host ID                            : 2
Engine status                      : unknown stale-data
Score                              : 0
stopped                            : True
Local maintenance                  : False
crc32                              : c7037c03
local_conf_timestamp               : 7530
Host timestamp                     : 7530
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=7530 (Fri Jan 12 16:10:12 2018)
        host-id=2
        score=0
        vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)
        conf_on_shared_storage=True
        maintenance=False
        state=AgentStopped
        stopped=True
hosted-engine --vm-status output from ovirt2 host
[root@ovirt2 ovirt-hosted-engine-ha]# hosted-engine --vm-status
--== Host 1 status ==--
conf_on_shared_storage             : True
Status up-to-date                  : False
Hostname                           : ovirt1.telia.ru
Host ID                            : 1
Engine status                      : unknown stale-data
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 6d3606f1
local_conf_timestamp               : 349264
Host timestamp                     : 349264
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=349264 (Tue Jan 16 15:05:45 2018)
        host-id=1
        score=3400
        vm_conf_refresh_time=349264 (Tue Jan 16 15:05:45 2018)
        conf_on_shared_storage=True
        maintenance=False
        state=EngineUp
        stopped=False
--== Host 2 status ==--
conf_on_shared_storage             : True
Status up-to-date                  : False
Hostname                           : ovirt1.telia.ru
Host ID                            : 2
Engine status                      : unknown stale-data
Score                              : 0
stopped                            : True
Local maintenance                  : False
crc32                              : c7037c03
local_conf_timestamp               : 7530
Host timestamp                     : 7530
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=7530 (Fri Jan 12 16:10:12 2018)
        host-id=2
        score=0
        vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)
        conf_on_shared_storage=True
        maintenance=False
        state=AgentStopped
        stopped=True
Also I saw some log messages in the webGUI about time drift, like:

"Host ovirt2.telia.ru has time-drift of 5305 seconds while maximum configured value is 300 seconds."

That is a bit weird, as I haven't touched any time settings since I installed the cluster. Both hosts have the same time and timezone (MSK), but the hosted engine lives in the UTC timezone. Is it mandatory to have everything in sync and in the same timezone?
Regards, Artem
On Tue, Jan 16, 2018 at 2:20 PM, Kasturi Narra <knarra@redhat.com> wrote:
Hello,
I now see that your hosted engine is up and running. Can you let me know how you tried reinstalling the host? Below is the procedure which should be used; I hope you did not miss any step while reinstalling. If not, can you try reinstalling again and see if that works?
1) Move the host to maintenance
2) Click on reinstall
3) Provide the password
4) Uncheck 'automatically configure host firewall'
5) Click on the 'Deploy' tab
6) Mark Hosted Engine deployment as 'Deploy'
And once the host installation is done, wait till the active score of the host shows 3400 in the General tab, then check hosted-engine --vm-status.
Thanks kasturi
On Mon, Jan 15, 2018 at 4:57 PM, Artem Tambovskiy <artem.tambovskiy@gmail.com> wrote:
Hello,
I have uploaded 2 archives with all relevant logs to shared hosting:

files from host 1 (which is currently running all VMs, including hosted_engine) - https://yadi.sk/d/PttRoYV63RTvhK
files from the second host - https://yadi.sk/d/UBducEsV3RTvhc
I have tried to restart both ovirt-ha-agent and ovirt-ha-broker, but it had no effect. I have also tried to shut down the hosted_engine VM, stop the ovirt-ha-agent and ovirt-ha-broker services, disconnect the storage and connect it again - no effect as well. I also tried to reinstall the second host from the WebGUI - this led to an interesting situation - hosted-engine --vm-status now shows the same hostname for both hosts.
[root@ovirt1 ~]# hosted-engine --vm-status
--== Host 1 status ==--
conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : ovirt1.telia.ru
Host ID                            : 1
Engine status                      : {"health": "good", "vm": "up", "detail": "up"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : a7758085
local_conf_timestamp               : 259327
Host timestamp                     : 259327
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=259327 (Mon Jan 15 14:06:48 2018)
        host-id=1
        score=3400
        vm_conf_refresh_time=259327 (Mon Jan 15 14:06:48 2018)
        conf_on_shared_storage=True
        maintenance=False
        state=EngineUp
        stopped=False
--== Host 2 status ==--
conf_on_shared_storage             : True
Status up-to-date                  : False
Hostname                           : ovirt1.telia.ru
Host ID                            : 2
Engine status                      : unknown stale-data
Score                              : 0
stopped                            : True
Local maintenance                  : False
crc32                              : c7037c03
local_conf_timestamp               : 7530
Host timestamp                     : 7530
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=7530 (Fri Jan 12 16:10:12 2018)
        host-id=2
        score=0
        vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)
        conf_on_shared_storage=True
        maintenance=False
        state=AgentStopped
        stopped=True
Gluster seems to be working fine; all gluster nodes show a connected state.
Any advice on how to resolve this situation is highly appreciated!
Regards, Artem
On Mon, Jan 15, 2018 at 11:45 AM, Kasturi Narra <knarra@redhat.com> wrote:
Hello Artem,
Can you check if the glusterd service is running on host1 and all the peers are in a connected state? If yes, can you restart the ovirt-ha-agent and broker services and check if things are working fine?
Thanks kasturi
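The checks and restarts Kasturi asks for correspond to these commands (a sketch, run on each host; service names are from a stock oVirt hosted-engine setup):

  # Verify glusterd is running and all peers are connected
  systemctl status glusterd
  gluster peer status

  # Restart the HA services
  systemctl restart ovirt-ha-broker ovirt-ha-agent

  # Re-check the HA state afterwards
  hosted-engine --vm-status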
On Sat, Jan 13, 2018 at 12:33 AM, Artem Tambovskiy <artem.tambovskiy@gmail.com> wrote:
Explored logs on both hosts. broker.log shows no errors.
agent.log is not looking good:

on host1 (which is running the hosted engine):
MainThread::ERROR::2018-01-12 21:51:03,883::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 191, in _run_agent
    return action(he)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 64, in action_proper
    return he.start_monitoring()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 411, in start_monitoring
    self._initialize_sanlock()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 749, in _initialize_sanlock
    "Failed to initialize sanlock, the number of errors has"
SanlockInitializationError: Failed to initialize sanlock, the number of errors has exceeded the limit
MainThread::ERROR::2018-01-12 21:51:03,884::agent::206::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent
MainThread::WARNING::2018-01-12 21:51:08,889::agent::209::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Restarting agent, attempt '1'
MainThread::INFO::2018-01-12 21:51:08,919::hosted_engine::242::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Found certificate common name: ovirt1.telia.ru
MainThread::INFO::2018-01-12 21:51:08,921::hosted_engine::604::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm) Initializing VDSM
MainThread::INFO::2018-01-12 21:51:11,398::hosted_engine::630::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Connecting the storage
MainThread::INFO::2018-01-12 21:51:11,399::storage_server::220::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(validate_storage_server) Validating storage server
MainThread::INFO::2018-01-12 21:51:13,725::storage_server::239::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2018-01-12 21:51:18,390::storage_server::246::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2018-01-12 21:51:18,423::storage_server::253::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Refreshing the storage domain
MainThread::INFO::2018-01-12 21:51:18,689::hosted_engine::663::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Preparing images
MainThread::INFO::2018-01-12 21:51:18,690::image::126::ovirt_hosted_engine_ha.lib.image.Image::(prepare_images) Preparing images
MainThread::INFO::2018-01-12 21:51:21,895::hosted_engine::666::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Refreshing vm.conf
MainThread::INFO::2018-01-12 21:51:21,895::config::493::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_vm_conf) Reloading vm.conf from the shared storage domain
MainThread::INFO::2018-01-12 21:51:21,896::config::416::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) Trying to get a fresher copy of vm configuration from the OVF_STORE
MainThread::INFO::2018-01-12 21:51:21,896::ovf_store::132::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) Extracting Engine VM OVF from the OVF_STORE
MainThread::INFO::2018-01-12 21:51:21,897::ovf_store::134::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) OVF_STORE volume path: /var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-498fa4b88162/5cabd8e1-5f4b-469e-becc-227469e03f5c/8048cbd7-77e2-4805-9af4-d109fa36dfcf
MainThread::INFO::2018-01-12 21:51:21,915::config::435::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) Found an OVF for HE VM, trying to convert
MainThread::INFO::2018-01-12 21:51:21,918::config::440::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) Got vm.conf from OVF_STORE
MainThread::INFO::2018-01-12 21:51:21,919::hosted_engine::509::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Initializing ha-broker connection
MainThread::INFO::2018-01-12 21:51:21,919::brokerlink::130::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor ping, options {'addr': '80.239.162.97'}
MainThread::INFO::2018-01-12 21:51:21,922::brokerlink::141::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 140547104457680
MainThread::INFO::2018-01-12 21:51:21,922::brokerlink::130::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor mgmt-bridge, options {'use_ssl': 'true', 'bridge_name': 'ovirtmgmt', 'address': '0'}
MainThread::INFO::2018-01-12 21:51:21,936::brokerlink::141::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 140547104458064
MainThread::INFO::2018-01-12 21:51:21,936::brokerlink::130::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor mem-free, options {'use_ssl': 'true', 'address': '0'}
MainThread::INFO::2018-01-12 21:51:21,938::brokerlink::141::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 140547104458448
MainThread::INFO::2018-01-12 21:51:21,939::brokerlink::130::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor cpu-load-no-engine, options {'use_ssl': 'true', 'vm_uuid': 'b366e466-b0ea-4a09-866b-d0248d7523a6', 'address': '0'}
MainThread::INFO::2018-01-12 21:51:21,940::brokerlink::141::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 140547104457552
MainThread::INFO::2018-01-12 21:51:21,941::brokerlink::130::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor engine-health, options {'use_ssl': 'true', 'vm_uuid': 'b366e466-b0ea-4a09-866b-d0248d7523a6', 'address': '0'}
MainThread::INFO::2018-01-12 21:51:21,942::brokerlink::141::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 140547104459792
MainThread::INFO::2018-01-12 21:51:26,951::brokerlink::179::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(set_storage_domain) Success, id 140546772847056
MainThread::INFO::2018-01-12 21:51:26,952::hosted_engine::601::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Broker initialized, all submonitors started
MainThread::INFO::2018-01-12 21:51:27,049::hosted_engine::704::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock) Ensuring lease for lockspace hosted-engine, host id 1 is acquired (file: /var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-498fa4b88162/093faa75-5e33-4559-84fa-1f1f8d48153b/911c7637-b49d-463e-b186-23b404e50769)
MainThread::INFO::2018-01-12 21:53:48,067::hosted_engine::745::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock) Failed to acquire the lock. Waiting '5's before the next attempt
MainThread::INFO::2018-01-12 21:56:14,088::hosted_engine::745::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock) Failed to acquire the lock. Waiting '5's before the next attempt
MainThread::INFO::2018-01-12 21:58:40,111::hosted_engine::745::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock) Failed to acquire the lock. Waiting '5's before the next attempt
MainThread::INFO::2018-01-12 22:01:06,133::hosted_engine::745::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock) Failed to acquire the lock. Waiting '5's before the next attempt
agent.log from second host
MainThread::INFO::2018-01-12 22:01:37,241::hosted_engine::630::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Connecting the storage
MainThread::INFO::2018-01-12 22:01:37,242::storage_server::220::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(validate_storage_server) Validating storage server
MainThread::INFO::2018-01-12 22:01:39,540::hosted_engine::639::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Storage domain reported as valid and reconnect is not forced.
MainThread::INFO::2018-01-12 22:01:41,939::hosted_engine::453::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineUnexpectedlyDown (score: 0)
MainThread::INFO::2018-01-12 22:01:52,150::config::493::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_vm_conf) Reloading vm.conf from the shared storage domain
MainThread::INFO::2018-01-12 22:01:52,150::config::416::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) Trying to get a fresher copy of vm configuration from the OVF_STORE
MainThread::INFO::2018-01-12 22:01:52,151::ovf_store::132::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) Extracting Engine VM OVF from the OVF_STORE
MainThread::INFO::2018-01-12 22:01:52,153::ovf_store::134::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) OVF_STORE volume path: /var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-498fa4b88162/5cabd8e1-5f4b-469e-becc-227469e03f5c/8048cbd7-77e2-4805-9af4-d109fa36dfcf
MainThread::INFO::2018-01-12 22:01:52,174::config::435::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) Found an OVF for HE VM, trying to convert
MainThread::INFO::2018-01-12 22:01:52,179::config::440::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) Got vm.conf from OVF_STORE
MainThread::INFO::2018-01-12 22:01:52,189::hosted_engine::604::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm) Initializing VDSM
MainThread::INFO::2018-01-12 22:01:54,586::hosted_engine::630::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Connecting the storage
MainThread::INFO::2018-01-12 22:01:54,587::storage_server::220::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(validate_storage_server) Validating storage server
MainThread::INFO::2018-01-12 22:01:56,903::hosted_engine::639::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Storage domain reported as valid and reconnect is not forced.
MainThread::INFO::2018-01-12 22:01:59,299::states::682::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score) Score is 0 due to unexpected vm shutdown at Fri Jan 12 21:57:48 2018
MainThread::INFO::2018-01-12 22:01:59,299::hosted_engine::453::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineUnexpectedlyDown (score: 0)
MainThread::INFO::2018-01-12 22:02:09,659::config::493::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_vm_conf) Reloading vm.conf from the shared storage domain
MainThread::INFO::2018-01-12 22:02:09,659::config::416::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) Trying to get a fresher copy of vm configuration from the OVF_STORE
MainThread::INFO::2018-01-12 22:02:09,660::ovf_store::132::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) Extracting Engine VM OVF from the OVF_STORE
MainThread::INFO::2018-01-12 22:02:09,663::ovf_store::134::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) OVF_STORE volume path: /var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-498fa4b88162/5cabd8e1-5f4b-469e-becc-227469e03f5c/8048cbd7-77e2-4805-9af4-d109fa36dfcf
MainThread::INFO::2018-01-12 22:02:09,683::config::435::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) Found an OVF for HE VM, trying to convert
MainThread::INFO::2018-01-12 22:02:09,688::config::440::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) Got vm.conf from OVF_STORE
MainThread::INFO::2018-01-12 22:02:09,698::hosted_engine::604::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm) Initializing VDSM
MainThread::INFO::2018-01-12 22:02:12,112::hosted_engine::630::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Connecting the storage
MainThread::INFO::2018-01-12 22:02:12,113::storage_server::220::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(validate_storage_server) Validating storage server
MainThread::INFO::2018-01-12 22:02:14,444::hosted_engine::639::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Storage domain reported as valid and reconnect is not forced.
MainThread::INFO::2018-01-12 22:02:16,859::states::682::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score) Score is 0 due to unexpected vm shutdown at Fri Jan 12 21:57:47 2018
MainThread::INFO::2018-01-12 22:02:16,859::hosted_engine::453::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineUnexpectedlyDown (score: 0)
MainThread::INFO::2018-01-12 22:02:27,100::config::493::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_vm_conf) Reloading vm.conf from the shared storage domain
MainThread::INFO::2018-01-12 22:02:27,100::config::416::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) Trying to get a fresher copy of vm configuration from the OVF_STORE
MainThread::INFO::2018-01-12 22:02:27,101::ovf_store::132::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) Extracting Engine VM OVF from the OVF_STORE
MainThread::INFO::2018-01-12 22:02:27,103::ovf_store::134::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) OVF_STORE volume path: /var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-498fa4b88162/5cabd8e1-5f4b-469e-becc-227469e03f5c/8048cbd7-77e2-4805-9af4-d109fa36dfcf
MainThread::INFO::2018-01-12 22:02:27,125::config::435::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) Found an OVF for HE VM, trying to convert
MainThread::INFO::2018-01-12 22:02:27,129::config::440::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) Got vm.conf from OVF_STORE
MainThread::INFO::2018-01-12 22:02:27,130::states::667::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Engine down, local host does not have best score
MainThread::INFO::2018-01-12 22:02:27,139::hosted_engine::604::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm) Initializing VDSM
MainThread::INFO::2018-01-12 22:02:29,584::hosted_engine::630::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Connecting the storage
MainThread::INFO::2018-01-12 22:02:29,586::storage_server::220::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(validate_storage_server) Validating storage server
Any suggestions on how to resolve this?

Regards, Artem
On Fri, Jan 12, 2018 at 7:08 PM, Artem Tambovskiy <artem.tambovskiy@gmail.com> wrote:

Trying to fix one thing I broke another :(

I fixed mnt_options for the hosted engine storage domain and installed the latest security patches to my hosts and hosted engine. All VMs are up and running, but hosted-engine --vm-status reports issues:

[root@ovirt1 ~]# hosted-engine --vm-status

--== Host 1 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : False
Hostname                           : ovirt2
Host ID                            : 1
Engine status                      : unknown stale-data
Score                              : 0
stopped                            : False
Local maintenance                  : False
crc32                              : 193164b8
local_conf_timestamp               : 8350
Host timestamp                     : 8350
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=8350 (Fri Jan 12 19:03:54 2018)
        host-id=1
        score=0
        vm_conf_refresh_time=8350 (Fri Jan 12 19:03:54 2018)
        conf_on_shared_storage=True
        maintenance=False
        state=EngineUnexpectedlyDown
        stopped=False
        timeout=Thu Jan  1 05:24:43 1970

--== Host 2 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : False
Hostname                           : ovirt1.telia.ru
Host ID                            : 2
Engine status                      : unknown stale-data
Score                              : 0
stopped                            : True
Local maintenance                  : False
crc32                              : c7037c03
local_conf_timestamp               : 7530
Host timestamp                     : 7530
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=7530 (Fri Jan 12 16:10:12 2018)
        host-id=2
        score=0
        vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)
        conf_on_shared_storage=True
        maintenance=False
        state=AgentStopped
        stopped=True
[root@ovirt1 ~]#

From the second host the situation looks a bit different:

[root@ovirt2 ~]# hosted-engine --vm-status

--== Host 1 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : ovirt2
Host ID                            : 1
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 0
stopped                            : False
Local maintenance                  : False
crc32                              : 78eabdb6
local_conf_timestamp               : 8403
Host timestamp                     : 8402
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=8402 (Fri Jan 12 19:04:47 2018)
        host-id=1
        score=0
        vm_conf_refresh_time=8403 (Fri Jan 12 19:04:47 2018)
        conf_on_shared_storage=True
        maintenance=False
        state=EngineUnexpectedlyDown
        stopped=False
        timeout=Thu Jan  1 05:24:43 1970

--== Host 2 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : False
Hostname                           : ovirt1.telia.ru
Host ID                            : 2
Engine status                      : unknown stale-data
Score                              : 0
stopped                            : True
Local maintenance                  : False
crc32                              : c7037c03
local_conf_timestamp               : 7530
Host timestamp                     : 7530
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=7530 (Fri Jan 12 16:10:12 2018)
        host-id=2
        score=0
        vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)
        conf_on_shared_storage=True
        maintenance=False
        state=AgentStopped
        stopped=True

The WebGUI shows the engine running on host ovirt1.
Gluster looks fine:

[root@ovirt1 ~]# gluster volume status engine
Status of volume: engine
Gluster process                              TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick ovirt1.telia.ru:/oVirt/engine          49169     0          Y       3244
Brick ovirt2.telia.ru:/oVirt/engine          49179     0          Y       20372
Brick ovirt3.telia.ru:/oVirt/engine          49206     0          Y       16609
Self-heal Daemon on localhost                N/A       N/A        Y       117868
Self-heal Daemon on ovirt2.telia.ru          N/A       N/A        Y       20521
Self-heal Daemon on ovirt3                   N/A       N/A        Y       25093

Task Status of Volume engine
------------------------------------------------------------------------------
There are no active volume tasks

How to resolve this issue?

Hi Martin,

Thanks for the feedback. All hosts and the hosted-engine are running the 4.1.8 release.

The strange thing: I can see that the host ID is set to 1 on both hosts in the /etc/ovirt-hosted-engine/hosted-engine.conf file. I have no idea how this happened; the only thing I have changed recently is the mnt_options, in order to add backup-volfile-servers by using the hosted-engine --set-shared-config command.

Both agent and broker are running on the second host:

[root@ovirt2 ovirt-hosted-engine-ha]# ps -ef | grep ovirt-ha-
vdsm      42331      1 26 14:40 ?        00:31:35 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker --no-daemon
vdsm      42332      1  0 14:40 ?        00:00:16 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent --no-daemon

but I saw some tracebacks during the broker start:

[root@ovirt2 ovirt-hosted-engine-ha]# systemctl status ovirt-ha-broker -l
● ovirt-ha-broker.service - oVirt Hosted Engine High Availability Communications Broker
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2018-01-16 14:40:15 MSK; 1h 58min ago
 Main PID: 42331 (ovirt-ha-broker)
   CGroup: /system.slice/ovirt-ha-broker.service
           └─42331 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker --no-daemon

Jan 16 14:40:15 ovirt2.telia.ru systemd[1]: Started oVirt Hosted Engine High Availability Communications Broker.
Jan 16 14:40:15 ovirt2.telia.ru systemd[1]: Starting oVirt Hosted Engine High Availability Communications Broker...
Jan 16 14:40:16 ovirt2.telia.ru ovirt-ha-broker[42331]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.listener.ConnectionHandler ERROR Error handling request, data: 'set-storage-domain FilesystemBackend dom_type=glusterfs sd_uuid=4a7f8717-9bb0-4d80-8016-498fa4b88162'
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 166, in handle
    data)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 299, in _dispatch
    .set_storage_domain(client, sd_type, **options)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 66, in set_storage_domain
    self._backends[client].connect()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 462, in connect
    self._dom_type)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 107, in get_domain_path
    " in {1}".format(sd_uuid, parent))
BackendFailureException: path to storage domain 4a7f8717-9bb0-4d80-8016-498fa4b88162 not found in /rhev/data-center/mnt/glusterSD

I have tried to issue hosted-engine --connect-storage on the second host, followed by an agent & broker restart, but there are no visible improvements.

Regards,
Artem
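Given Martin's point that the host IDs must differ, a minimal sketch of verifying and correcting the conflict on the second host (setting host_id=2 is an assumption based on the --vm-status metadata above; the sed call keeps a backup of the file):

  # On ovirt2: confirm the conflicting value (currently host_id=1)
  grep host_id /etc/ovirt-hosted-engine/hosted-engine.conf

  # Set the ID this host is expected to own, per its --vm-status metadata
  sed -i.bak 's/^host_id=.*/host_id=2/' /etc/ovirt-hosted-engine/hosted-engine.conf

  # Restart the HA services so the agent acquires the correct sanlock host id
  systemctl restart ovirt-ha-broker ovirt-ha-agent
  hosted-engine --vm-status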
> Validating storage server > MainThread::INFO::2018-01-12 > 22:02:14,444::hosted_engine::639::ovirt_hosted_engine_ha. agent.hosted_engine.HostedEngine::(_initialize_storage_images) > Storage domain reported as valid and reconnect is not forced. > MainThread::INFO::2018-01-12 > 22:02:16,859::states::682::ovirt_hosted_engine_ha.agent. hosted_engine.HostedEngine::(score) > Score is 0 due to unexpected vm shutdown at Fri Jan 12 21:57:47 2018 > MainThread::INFO::2018-01-12 > 22:02:16,859::hosted_engine::453::ovirt_hosted_engine_ha. agent.hosted_engine.HostedEngine::(start_monitoring) > Current state EngineUnexpectedlyDown (score: 0) > MainThread::INFO::2018-01-12 > 22:02:27,100::config::493::ovirt_hosted_engine_ha.agent. hosted_engine.HostedEngine.config::(refresh_vm_conf) > Reloading vm.conf from the shared storage domain > MainThread::INFO::2018-01-12 > 22:02:27,100::config::416::ovirt_hosted_engine_ha.agent. hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) > Trying to get a fresher copy of vm configuration from the OVF_STORE > MainThread::INFO::2018-01-12 > 22:02:27,101::ovf_store::132::ovirt_hosted_engine_ha.lib. ovf.ovf_store.OVFStore::(getEngineVMOVF) > Extracting Engine VM OVF from the OVF_STORE > MainThread::INFO::2018-01-12 > 22:02:27,103::ovf_store::134::ovirt_hosted_engine_ha.lib. ovf.ovf_store.OVFStore::(getEngineVMOVF) > OVF_STORE volume path: > /var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016- 498fa4b88162/5cabd8e1-5f4b-469e-becc-227469e03f5c/8048cbd7-77e2-4805-9af4- d109fa36dfcf > MainThread::INFO::2018-01-12 > 22:02:27,125::config::435::ovirt_hosted_engine_ha.agent. hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) > Found an OVF for HE VM, trying to convert > MainThread::INFO::2018-01-12 > 22:02:27,129::config::440::ovirt_hosted_engine_ha.agent. hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) > Got vm.conf from OVF_STORE > MainThread::INFO::2018-01-12 > 22:02:27,130::states::667::ovirt_hosted_engine_ha.agent. hosted_engine.HostedEngine::(consume) > Engine down, local host does not have best score > MainThread::INFO::2018-01-12 > 22:02:27,139::hosted_engine::604::ovirt_hosted_engine_ha. agent.hosted_engineHostedEngine::(_initialize_vdsm) > Initializing VDSM > MainThread::INFO::2018-01-12 > 22:02:29,584::hosted_engine::630::ovirt_hosted_engine_ha. agent.hosted_engine.HostedEngine::(_initialize_storage_images) > Connecting the storage > MainThread::INFO::2018-01-12 > 22:02:29,586::storage_server::220::ovirt_hosted_engine_ha.
On Tue, Jan 16, 2018 at 1:16 PM, Derek Atkins <derek@ihtfp.com> wrote: the lib.storage_server.StorageServer::(validate_storage_server) libstorage_server.StorageServer::(connect_storage_server) lib.storage_server.StorageServer::(connect_storage_server) lib.storage_server.StorageServer::(connect_storage_server) lib.storage_server.StorageServer::(validate_storage_server) lib.storage_server.StorageServer::(validate_storage_server) lib.storage_server.StorageServer::(validate_storage_server) lib.storage_server.StorageServer::(validate_storage_server)
>
> Any suggestions on how to resolve this?
>
> regards,
> Artem
>
> On Fri, Jan 12, 2018 at 7:08 PM, Artem Tambovskiy
> <artem.tambovskiy@gmail.com> wrote:
>>
>> [original message snipped; quoted in full earlier in the thread]
Hello,

Any further suggestions on how to fix the issue and get the HA setup working? Could the complete removal of the second host from the cluster (with complete removal of the oVirt configuration files and packages) and adding it again solve the issue? Or might it completely ruin the cluster?

Regards,
Artem

On Jan 16, 2018 at 17:00, "Artem Tambovskiy" <artem.tambovskiy@gmail.com> wrote:
Hi Martin,
Thanks for the feedback.
All hosts and the hosted engine are running the 4.1.8 release. The strange thing: I can see that the host ID is set to 1 on both hosts in the /etc/ovirt-hosted-engine/hosted-engine.conf file. I have no idea how this happened; the only thing I have changed recently is mnt_options, in order to add backup-volfile-servers, using the hosted-engine --set-shared-config command.
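For reference, the change was along these lines (the server list here is just an example from this cluster, and the --type value is from memory; double-check both against hosted-engine --help for your version):

hosted-engine --set-shared-config mnt_options \
    backup-volfile-servers=ovirt2.telia.ru:ovirt3.telia.ru --type=he_shared
# read the value back on both hosts to confirm what was stored:
hosted-engine --get-shared-config mnt_options --type=he_shared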
Both the agent and the broker are running on the second host:

[root@ovirt2 ovirt-hosted-engine-ha]# ps -ef | grep ovirt-ha-
vdsm     42331     1 26 14:40 ?        00:31:35 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker --no-daemon
vdsm     42332     1  0 14:40 ?        00:00:16 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent --no-daemon
but I saw some tracebacks during the broker start
[root@ovirt2 ovirt-hosted-engine-ha]# systemctl status ovirt-ha-broker -l
● ovirt-ha-broker.service - oVirt Hosted Engine High Availability Communications Broker
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2018-01-16 14:40:15 MSK; 1h 58min ago
 Main PID: 42331 (ovirt-ha-broker)
   CGroup: /system.slice/ovirt-ha-broker.service
           └─42331 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker --no-daemon

Jan 16 14:40:15 ovirt2.telia.ru systemd[1]: Started oVirt Hosted Engine High Availability Communications Broker.
Jan 16 14:40:15 ovirt2.telia.ru systemd[1]: Starting oVirt Hosted Engine High Availability Communications Broker...
Jan 16 14:40:16 ovirt2.telia.ru ovirt-ha-broker[42331]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.listener.ConnectionHandler ERROR Error handling request, data: 'set-storage-domain FilesystemBackend dom_type=glusterfs sd_uuid=4a7f8717-9bb0-4d80-8016-498fa4b88162'
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 166, in handle
    data)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 299, in _dispatch
    .set_storage_domain(client, sd_type, **options)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 66, in set_storage_domain
    self._backends[client].connect()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 462, in connect
    self._dom_type)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 107, in get_domain_path
    " in {1}".format(sd_uuid, parent))
BackendFailureException: path to storage domain 4a7f8717-9bb0-4d80-8016-498fa4b88162 not found in /rhev/data-center/mnt/glusterSD
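A quick way to see what is actually mounted under the path from that exception:

ls -l /rhev/data-center/mnt/glusterSD/    # the storage domain UUID should appear below the gluster mount
mount | grep glusterfs                    # confirm the engine volume is mounted at all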
I have tried issuing hosted-engine --connect-storage on the second host, followed by an agent and broker restart, but there are no visible improvements.
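For completeness, that sequence is simply:

hosted-engine --connect-storage
systemctl restart ovirt-ha-broker ovirt-ha-agent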
Regards, Artem
On Tue, Jan 16, 2018 at 4:18 PM, Martin Sivak <msivak@redhat.com> wrote:
Hi everybody,
there are a couple of things to check here:

- What version of the hosted engine agent is this? The logs look like they come from 4.1.
- What version of the engine is used?
- Check the host ID in /etc/ovirt-hosted-engine/hosted-engine.conf on both hosts; the numbers must be different (a quick check is shown below).
- It looks like the agent or broker on host 2 is not active (or there would be a report).
- The second host does not see data from the first host (unknown stale-data); wait for a minute and check again, then check the storage connection.
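For the host ID check, something like this on each host is enough (the key is host_id in that file):

grep host_id /etc/ovirt-hosted-engine/hosted-engine.conf   # the two values must differ between hosts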
And then the general troubleshooting:
- Put the hosted engine in global maintenance mode (and check that it is visible from the other host using hosted-engine --vm-status).
- Mount the storage domain (hosted-engine --connect-storage).
- Check sanlock client status to see if the proper lockspaces are present.
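In command form, roughly (exact option spelling per hosted-engine --help on your version; the hosted-engine lockspace name comes from the agent log above):

hosted-engine --set-maintenance --mode=global
hosted-engine --vm-status      # repeat on the other host to confirm it sees the flag
hosted-engine --connect-storage
sanlock client status          # the hosted-engine lockspace should be listed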
Best regards
Martin Sivak
On Tue, Jan 16, 2018 at 1:16 PM, Derek Atkins <derek@ihtfp.com> wrote:

Why are both hosts reporting as ovirt1? Look at the Hostname fields to see what I mean.
-derek
Sent using my mobile device. Please excuse any typos.
On January 16, 2018 7:11:09 AM Artem Tambovskiy <artem.tambovskiy@gmail.com> wrote:
Hello,
Yes, I followed exactly the same procedure while reinstalling the hosts (the only difference is that I have an SSH key configured instead of the password).
Just reinstalled the second host one more time; after 20 minutes the host still hasn't reached the active score of 3400 (Hosted Engine HA: Not Active) and I still don't see the crown icon for this host.
hosted-engine --vm-status from ovirt1 host
[root@ovirt1 ~]# hosted-engine --vm-status
--== Host 1 status ==--
conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : ovirt1.telia.ru
Host ID                            : 1
Engine status                      : {"health": "good", "vm": "up", "detail": "up"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 3f94156a
local_conf_timestamp               : 349144
Host timestamp                     : 349144
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=349144 (Tue Jan 16 15:03:45 2018)
        host-id=1
        score=3400
        vm_conf_refresh_time=349144 (Tue Jan 16 15:03:45 2018)
        conf_on_shared_storage=True
        maintenance=False
        state=EngineUp
        stopped=False
--== Host 2 status ==--
conf_on_shared_storage             : True
Status up-to-date                  : False
Hostname                           : ovirt1.telia.ru
Host ID                            : 2
Engine status                      : unknown stale-data
Score                              : 0
stopped                            : True
Local maintenance                  : False
crc32                              : c7037c03
local_conf_timestamp               : 7530
Host timestamp                     : 7530
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=7530 (Fri Jan 12 16:10:12 2018)
        host-id=2
        score=0
        vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)
        conf_on_shared_storage=True
        maintenance=False
        state=AgentStopped
        stopped=True
hosted-engine --vm-status output from ovirt2 host
[root@ovirt2 ovirt-hosted-engine-ha]# hosted-engine --vm-status
--== Host 1 status ==--
conf_on_shared_storage             : True
Status up-to-date                  : False
Hostname                           : ovirt1.telia.ru
Host ID                            : 1
Engine status                      : unknown stale-data
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 6d3606f1
local_conf_timestamp               : 349264
Host timestamp                     : 349264
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=349264 (Tue Jan 16 15:05:45 2018)
        host-id=1
        score=3400
        vm_conf_refresh_time=349264 (Tue Jan 16 15:05:45 2018)
        conf_on_shared_storage=True
        maintenance=False
        state=EngineUp
        stopped=False
--== Host 2 status ==--
conf_on_shared_storage             : True
Status up-to-date                  : False
Hostname                           : ovirt1.telia.ru
Host ID                            : 2
Engine status                      : unknown stale-data
Score                              : 0
stopped                            : True
Local maintenance                  : False
crc32                              : c7037c03
local_conf_timestamp               : 7530
Host timestamp                     : 7530
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=7530 (Fri Jan 12 16:10:12 2018)
        host-id=2
        score=0
        vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)
        conf_on_shared_storage=True
        maintenance=False
        state=AgentStopped
        stopped=True
Also, I saw some log messages in the webGUI about time drift, like:

"Host ovirt2.telia.ru has time-drift of 5305 seconds while maximum configured value is 300 seconds."

That is a bit weird, as I haven't touched any time settings since I installed the cluster. Both hosts have the same time and timezone (MSK), but the hosted engine lives in the UTC timezone. Is it mandatory to have everything in sync and in the same timezone?
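For what it's worth, I am comparing the clocks with nothing more than the following (assuming chrony, the CentOS 7 default, is the time sync daemon here):

date -u              # run on both hosts and inside the engine VM
chronyc tracking     # or 'ntpq -p' if ntpd is used instead of chrony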
Regards, Artem
On Tue, Jan 16, 2018 at 2:20 PM, Kasturi Narra <knarra@redhat.com> wrote:
Hello,
I now see that your hosted engine is up and running. Can you let me know how you tried reinstalling the host? Below is the procedure to be used; I hope you did not miss any step while reinstalling. If not, can you try reinstalling again and see if that works?
1) Move the host to maintenance
2) Click on reinstall
3) Provide the password
4) Uncheck 'automatically configure host firewall'
5) Click on the 'Deploy' tab
6) Set Hosted Engine deployment to 'Deploy'
And once the host installation is done, wait till the active score of the host shows 3400 in the general tab, then check hosted-engine --vm-status.
Thanks kasturi
On Mon, Jan 15, 2018 at 4:57 PM, Artem Tambovskiy <artem.tambovskiy@gmail.com> wrote:
Hello,
I have uploaded 2 archives with all relevant logs to shared hosting:
files from host 1 (which is currently running all VMs, including the hosted engine) - https://yadi.sk/d/PttRoYV63RTvhK
files from the second host - https://yadi.sk/d/UBducEsV3RTvhc
I have tried to restart both ovirt-ha-agent and ovirt-ha-broker, but it has no effect. I have also tried to shut down the hosted_engine VM, stop the ovirt-ha-agent and ovirt-ha-broker services, disconnect the storage and connect it again - no effect as well. I also tried to reinstall the second host from the WebGUI - this led to an interesting situation: now hosted-engine --vm-status shows that both hosts have the same address.
[root@ovirt1 ~]# hosted-engine --vm-status
--== Host 1 status ==--
conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : ovirt1.telia.ru
Host ID                            : 1
Engine status                      : {"health": "good", "vm": "up", "detail": "up"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : a7758085
local_conf_timestamp               : 259327
Host timestamp                     : 259327
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=259327 (Mon Jan 15 14:06:48 2018)
        host-id=1
        score=3400
        vm_conf_refresh_time=259327 (Mon Jan 15 14:06:48 2018)
        conf_on_shared_storage=True
        maintenance=False
        state=EngineUp
        stopped=False
--== Host 2 status ==--
conf_on_shared_storage             : True
Status up-to-date                  : False
Hostname                           : ovirt1.telia.ru
Host ID                            : 2
Engine status                      : unknown stale-data
Score                              : 0
stopped                            : True
Local maintenance                  : False
crc32                              : c7037c03
local_conf_timestamp               : 7530
Host timestamp                     : 7530
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=7530 (Fri Jan 12 16:10:12 2018)
        host-id=2
        score=0
        vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)
        conf_on_shared_storage=True
        maintenance=False
        state=AgentStopped
        stopped=True
Gluster seems to be working fine; all gluster nodes show the connected state.
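The checks behind that statement were simply:

gluster peer status             # every peer should report 'Peer in Cluster (Connected)'
gluster volume status engine    # all bricks and self-heal daemons should show Online: Y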
Any advice on how to resolve this situation is highly appreciated!
Regards, Artem
On Mon, Jan 15, 2018 at 11:45 AM, Kasturi Narra <knarra@redhat.com> wrote:
>
> Hello Artem,
>
> Can you check if the glusterd service is running on host1 and all
> the peers are in a connected state? If yes, can you restart the
> ovirt-ha-agent and broker services and check if things are working fine?
>
> Thanks
> kasturi
>
> On Sat, Jan 13, 2018 at 12:33 AM, Artem Tambovskiy
> <artem.tambovskiy@gmail.com> wrote:
>>
>> [original message with agent.log excerpts snipped; quoted in full at the top of the thread]
Hi,

Ok, I decided to remove the second host from the cluster. I reinstalled it from the webUI with the hosted-engine action UNDEPLOY, and removed it from the cluster afterwards. All VMs are fine and the hosted engine is running OK, but hosted-engine --vm-status still shows 2 hosts. How can I clean the traces of the second host in a correct way?

--== Host 1 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : ovirt1.telia.ru
Host ID                            : 1
Engine status                      : {"health": "good", "vm": "up", "detail": "up"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 1b1b6f6d
local_conf_timestamp               : 545385
Host timestamp                     : 545385
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=545385 (Thu Jan 18 21:34:25 2018)
        host-id=1
        score=3400
        vm_conf_refresh_time=545385 (Thu Jan 18 21:34:25 2018)
        conf_on_shared_storage=True
        maintenance=False
        state=GlobalMaintenance
        stopped=False

--== Host 2 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : False
Hostname                           : ovirt1.telia.ru
Host ID                            : 2
Engine status                      : unknown stale-data
Score                              : 0
stopped                            : True
Local maintenance                  : False
crc32                              : c7037c03
local_conf_timestamp               : 7530
Host timestamp                     : 7530
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=7530 (Fri Jan 12 16:10:12 2018)
        host-id=2
        score=0
        vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)
        conf_on_shared_storage=True
        maintenance=False
        state=AgentStopped
        stopped=True

!! Cluster is in GLOBAL MAINTENANCE mode !!

Thank you in advance!

Regards,
Artem
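One option I am looking at for clearing the stale Host 2 slot is the agent's metadata cleanup; I have not run it yet, so treat the exact flags as something to verify against the hosted-engine man page for your version:

# on the remaining host, while the removed host's agent stays stopped
hosted-engine --clean-metadata --host-id=2 --force-clean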
On Wed, Jan 17, 2018 at 6:47 PM, Artem Tambovskiy
<artem.tambovskiy@gmail.com> wrote:

> [quoted copy of the Jan 17 message and the earlier thread history snipped; see the messages above]
>>> MainThread::INFO::2018-01-12 >>> 22:01:41,939::hosted_engine::453::ovirt_hosted_engine_ha.age nt.hosted_engine.HostedEngine::(start_monitoring) >>> Current state EngineUnexpectedlyDown (score: 0) >>> MainThread::INFO::2018-01-12 >>> 22:01:52,150::config::493::ovirt_hosted_engine_ha.agent.host ed_engine.HostedEngine.config::(refresh_vm_conf) >>> Reloading vm.conf from the shared storage domain >>> MainThread::INFO::2018-01-12 >>> 22:01:52,150::config::416::ovirt_hosted_engine_ha.agent.host ed_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) >>> Trying to get a fresher copy of vm configuration from the OVF_STORE >>> MainThread::INFO::2018-01-12 >>> 22:01:52,151::ovf_store::132::ovirt_hosted_engine_ha.lib.ovf .ovf_store.OVFStore::(getEngineVMOVF) >>> Extracting Engine VM OVF from the OVF_STORE >>> MainThread::INFO::2018-01-12 >>> 22:01:52,153::ovf_store::134::ovirt_hosted_engine_ha.lib.ovf .ovf_store.OVFStore::(getEngineVMOVF) >>> OVF_STORE volume path: >>> /var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-498fa4b88162/5 cabd8e1-5f4b-469e-becc-227469e03f5c/8048cbd7-77e2-4805-9af4-d109fa36dfcf >>> MainThread::INFO::2018-01-12 >>> 22:01:52,174::config::435::ovirt_hosted_engine_ha.agent.host ed_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) >>> Found an OVF for HE VM, trying to convert >>> MainThread::INFO::2018-01-12 >>> 22:01:52,179::config::440::ovirt_hosted_engine_ha.agent.host ed_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) >>> Got vm.conf from OVF_STORE >>> MainThread::INFO::2018-01-12 >>> 22:01:52,189::hosted_engine::604::ovirt_hosted_engine_ha.age nt.hosted_engine.HostedEngine::(_initialize_vdsm) >>> Initializing VDSM >>> MainThread::INFO::2018-01-12 >>> 22:01:54,586::hosted_engine::630::ovirt_hosted_engine_ha.age nt.hosted_engine.HostedEngine::(_initialize_storage_images) >>> Connecting the storage >>> MainThread::INFO::2018-01-12 >>> 22:01:54,587::storage_server::220::ovirt_hosted_engine_ha.li b.storage_server.StorageServer::(validate_storage_server) >>> Validating storage server >>> MainThread::INFO::2018-01-12 >>> 22:01:56,903::hosted_engine::639::ovirt_hosted_engine_ha.age nt.hosted_engine.HostedEngine::(_initialize_storage_images) >>> Storage domain reported as valid and reconnect is not forced. 
>>> MainThread::INFO::2018-01-12 >>> 22:01:59,299::states::682::ovirt_hosted_engine_ha.agent.host ed_engine.HostedEngine::(score) >>> Score is 0 due to unexpected vm shutdown at Fri Jan 12 21:57:48 2018 >>> MainThread::INFO::2018-01-12 >>> 22:01:59,299::hosted_engine::453::ovirt_hosted_engine_ha.age nt.hosted_engine.HostedEngine::(start_monitoring) >>> Current state EngineUnexpectedlyDown (score: 0) >>> MainThread::INFO::2018-01-12 >>> 22:02:09,659::config::493::ovirt_hosted_engine_ha.agent.host ed_engine.HostedEngine.config::(refresh_vm_conf) >>> Reloading vm.conf from the shared storage domain >>> MainThread::INFO::2018-01-12 >>> 22:02:09,659::config::416::ovirt_hosted_engine_ha.agent.host ed_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) >>> Trying to get a fresher copy of vm configuration from the OVF_STORE >>> MainThread::INFO::2018-01-12 >>> 22:02:09,660::ovf_store::132::ovirt_hosted_engine_ha.lib.ovf .ovf_store.OVFStore::(getEngineVMOVF) >>> Extracting Engine VM OVF from the OVF_STORE >>> MainThread::INFO::2018-01-12 >>> 22:02:09,663::ovf_store::134::ovirt_hosted_engine_ha.lib.ovf .ovf_store.OVFStore::(getEngineVMOVF) >>> OVF_STORE volume path: >>> /var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-498fa4b88162/5 cabd8e1-5f4b-469e-becc-227469e03f5c/8048cbd7-77e2-4805-9af4-d109fa36dfcf >>> MainThread::INFO::2018-01-12 >>> 22:02:09,683::config::435::ovirt_hosted_engine_ha.agent.host ed_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) >>> Found an OVF for HE VM, trying to convert >>> MainThread::INFO::2018-01-12 >>> 22:02:09,688::config::440::ovirt_hosted_engine_ha.agent.host ed_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) >>> Got vm.conf from OVF_STORE >>> MainThread::INFO::2018-01-12 >>> 22:02:09,698::hosted_engine::604::ovirt_hosted_engine_ha.age nt.hosted_engine.HostedEngine::(_initialize_vdsm) >>> Initializing VDSM >>> MainThread::INFO::2018-01-12 >>> 22:02:12,112::hosted_engine::630::ovirt_hosted_engine_ha.age nt.hosted_engine.HostedEngine::(_initialize_storage_images) >>> Connecting the storage >>> MainThread::INFO::2018-01-12 >>> 22:02:12,113::storage_server::220::ovirt_hosted_engine_ha.li b.storage_server.StorageServer::(validate_storage_server) >>> Validating storage server >>> MainThread::INFO::2018-01-12 >>> 22:02:14,444::hosted_engine::639::ovirt_hosted_engine_ha.age nt.hosted_engine.HostedEngine::(_initialize_storage_images) >>> Storage domain reported as valid and reconnect is not forced. 
>>> MainThread::INFO::2018-01-12 >>> 22:02:16,859::states::682::ovirt_hosted_engine_ha.agent.host ed_engine.HostedEngine::(score) >>> Score is 0 due to unexpected vm shutdown at Fri Jan 12 21:57:47 2018 >>> MainThread::INFO::2018-01-12 >>> 22:02:16,859::hosted_engine::453::ovirt_hosted_engine_ha.age nt.hosted_engine.HostedEngine::(start_monitoring) >>> Current state EngineUnexpectedlyDown (score: 0) >>> MainThread::INFO::2018-01-12 >>> 22:02:27,100::config::493::ovirt_hosted_engine_ha.agent.host ed_engine.HostedEngine.config::(refresh_vm_conf) >>> Reloading vm.conf from the shared storage domain >>> MainThread::INFO::2018-01-12 >>> 22:02:27,100::config::416::ovirt_hosted_engine_ha.agent.host ed_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) >>> Trying to get a fresher copy of vm configuration from the OVF_STORE >>> MainThread::INFO::2018-01-12 >>> 22:02:27,101::ovf_store::132::ovirt_hosted_engine_ha.lib.ovf .ovf_store.OVFStore::(getEngineVMOVF) >>> Extracting Engine VM OVF from the OVF_STORE >>> MainThread::INFO::2018-01-12 >>> 22:02:27,103::ovf_store::134::ovirt_hosted_engine_ha.lib.ovf .ovf_store.OVFStore::(getEngineVMOVF) >>> OVF_STORE volume path: >>> /var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-498fa4b88162/5 cabd8e1-5f4b-469e-becc-227469e03f5c/8048cbd7-77e2-4805-9af4-d109fa36dfcf >>> MainThread::INFO::2018-01-12 >>> 22:02:27,125::config::435::ovirt_hosted_engine_ha.agent.host ed_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) >>> Found an OVF for HE VM, trying to convert >>> MainThread::INFO::2018-01-12 >>> 22:02:27,129::config::440::ovirt_hosted_engine_ha.agent.host ed_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) >>> Got vm.conf from OVF_STORE >>> MainThread::INFO::2018-01-12 >>> 22:02:27,130::states::667::ovirt_hosted_engine_ha.agent.host ed_engine.HostedEngine::(consume) >>> Engine down, local host does not have best score >>> MainThread::INFO::2018-01-12 >>> 22:02:27,139::hosted_engine::604::ovirt_hosted_engine_ha.age nt.hosted_engineHostedEngine::(_initialize_vdsm) >>> Initializing VDSM >>> MainThread::INFO::2018-01-12 >>> 22:02:29,584::hosted_engine::630::ovirt_hosted_engine_ha.age nt.hosted_engine.HostedEngine::(_initialize_storage_images) >>> Connecting the storage >>> MainThread::INFO::2018-01-12 >>> 22:02:29,586::storage_server::220::ovirt_hosted_engine_ha.li b.storage_server.StorageServer::(validate_storage_server) >>> Validating storage server >>> >>> >>> Any suggestions how to resolve this . >>> >>> regards, >>> Artem >>> >>> >>> On Fri, Jan 12, 2018 at 7:08 PM, Artem Tambovskiy >>> <artem.tambovskiy@gmail.com> wrote: >>>> >>>> Trying to fix one thing I broke another :( >>>> >>>> I fixed mnt_options for hosted engine storage domain and installed >>>> latest security patches to my hosts and hosted engine. 
All VM's up and >>>> running, but hosted_engine --vm-status reports about issues: >>>> >>>> [root@ovirt1 ~]# hosted-engine --vm-status >>>> >>>> >>>> --== Host 1 status ==-- >>>> >>>> conf_on_shared_storage : True >>>> Status up-to-date : False >>>> Hostname : ovirt2 >>>> Host ID : 1 >>>> Engine status : unknown stale-data >>>> Score : 0 >>>> stopped : False >>>> Local maintenance : False >>>> crc32 : 193164b8 >>>> local_conf_timestamp : 8350 >>>> Host timestamp : 8350 >>>> Extra metadata (valid at timestamp): >>>> metadata_parse_version=1 >>>> metadata_feature_version=1 >>>> timestamp=8350 (Fri Jan 12 19:03:54 2018) >>>> host-id=1 >>>> score=0 >>>> vm_conf_refresh_time=8350 (Fri Jan 12 19:03:54 2018) >>>> conf_on_shared_storage=True >>>> maintenance=False >>>> state=EngineUnexpectedlyDown >>>> stopped=False >>>> timeout=Thu Jan 1 05:24:43 1970 >>>> >>>> >>>> --== Host 2 status ==-- >>>> >>>> conf_on_shared_storage : True >>>> Status up-to-date : False >>>> Hostname : ovirt1.telia.ru >>>> Host ID : 2 >>>> Engine status : unknown stale-data >>>> Score : 0 >>>> stopped : True >>>> Local maintenance : False >>>> crc32 : c7037c03 >>>> local_conf_timestamp : 7530 >>>> Host timestamp : 7530 >>>> Extra metadata (valid at timestamp): >>>> metadata_parse_version=1 >>>> metadata_feature_version=1 >>>> timestamp=7530 (Fri Jan 12 16:10:12 2018) >>>> host-id=2 >>>> score=0 >>>> vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018) >>>> conf_on_shared_storage=True >>>> maintenance=False >>>> state=AgentStopped >>>> stopped=True >>>> [root@ovirt1 ~]# >>>> >>>> >>>> >>>> from second host situation looks a bit different: >>>> >>>> >>>> [root@ovirt2 ~]# hosted-engine --vm-status >>>> >>>> >>>> --== Host 1 status ==-- >>>> >>>> conf_on_shared_storage : True >>>> Status up-to-date : True >>>> Hostname : ovirt2 >>>> Host ID : 1 >>>> Engine status : {"reason": "vm not running on >>>> this host", "health": "bad", "vm": "down", "detail": "unknown"} >>>> Score : 0 >>>> stopped : False >>>> Local maintenance : False >>>> crc32 : 78eabdb6 >>>> local_conf_timestamp : 8403 >>>> Host timestamp : 8402 >>>> Extra metadata (valid at timestamp): >>>> metadata_parse_version=1 >>>> metadata_feature_version=1 >>>> timestamp=8402 (Fri Jan 12 19:04:47 2018) >>>> host-id=1 >>>> score=0 >>>> vm_conf_refresh_time=8403 (Fri Jan 12 19:04:47 2018) >>>> conf_on_shared_storage=True >>>> maintenance=False >>>> state=EngineUnexpectedlyDown >>>> stopped=False >>>> timeout=Thu Jan 1 05:24:43 1970 >>>> >>>> >>>> --== Host 2 status ==-- >>>> >>>> conf_on_shared_storage : True >>>> Status up-to-date : False >>>> Hostname : ovirt1.telia.ru >>>> Host ID : 2 >>>> Engine status : unknown stale-data >>>> Score : 0 >>>> stopped : True >>>> Local maintenance : False >>>> crc32 : c7037c03 >>>> local_conf_timestamp : 7530 >>>> Host timestamp : 7530 >>>> Extra metadata (valid at timestamp): >>>> metadata_parse_version=1 >>>> metadata_feature_version=1 >>>> timestamp=7530 (Fri Jan 12 16:10:12 2018) >>>> host-id=2 >>>> score=0 >>>> vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018) >>>> conf_on_shared_storage=True >>>> maintenance=False >>>> state=AgentStopped >>>> stopped=True >>>> >>>> >>>> WebGUI shows that engine running on host ovirt1. >>>> Gluster looks fine >>>> [root@ovirt1 ~]# gluster volume status engine >>>> Status of volume: engine >>>> Gluster process TCP Port RDMA Port >>>> Online Pid >>>> >>>> ------------------------------------------------------------
>>>> Brick ovirt1.teliaru:/oVirt/engine 49169 0 Y >>>> 3244 >>>> Brick ovirt2.telia.ru:/oVirt/engine 49179 0 Y >>>> 20372 >>>> Brick ovirt3.telia.ru:/oVirt/engine 49206 0 Y >>>> 16609 >>>> Self-heal Daemon on localhost N/A N/A Y >>>> 117868 >>>> Self-heal Daemon on ovirt2.telia.ru N/A N/A Y >>>> 20521 >>>> Self-heal Daemon on ovirt3 N/A N/A Y >>>> 25093 >>>> >>>> Task Status of Volume engine >>>> >>>> ------------------------------------------------------------
>>>> There are no active volume tasks >>>> >>>> How to resolve this issue? >>>> >>>> >>>> _______________________________________________ >>>> Users mailing list >>>> Users@ovirt.org >>>> http://lists.ovirt.org/mailman/listinfo/users >>>> >>> >>> >>> _______________________________________________ >>> Users mailing list >>> Users@ovirt.org >>> http://lists.ovirt.org/mailman/listinfo/users >>> >> >
_______________________________________________ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
_______________________________________________ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users

Hello Artem,

Any reason why you chose the hosted-engine undeploy action for the second host? I see that the cluster is in global maintenance mode; was this intended?

The command to clear the entries from hosted-engine --vm-status is "hosted-engine --clean-metadata --host-id=<old_host_id> --force-clean".

Hope this helps!

Thanks
kasturi

On Fri, Jan 19, 2018 at 12:07 AM, Artem Tambovskiy <artem.tambovskiy@gmail.com> wrote:
Hi,
Ok, I decided to remove the second host from the cluster. I reinstalled it from the webUI with the hosted-engine action UNDEPLOY, and removed it from the cluster afterwards. All VMs are fine and the hosted engine is running OK, but hosted-engine --vm-status is still showing 2 hosts.
How can I clean the traces of the second host in a correct way?
--== Host 1 status ==--
conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : ovirt1.telia.ru
Host ID                            : 1
Engine status                      : {"health": "good", "vm": "up", "detail": "up"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 1b1b6f6d
local_conf_timestamp               : 545385
Host timestamp                     : 545385
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=545385 (Thu Jan 18 21:34:25 2018)
    host-id=1
    score=3400
    vm_conf_refresh_time=545385 (Thu Jan 18 21:34:25 2018)
    conf_on_shared_storage=True
    maintenance=False
    state=GlobalMaintenance
    stopped=False
--== Host 2 status ==--
conf_on_shared_storage             : True
Status up-to-date                  : False
Hostname                           : ovirt1.telia.ru
Host ID                            : 2
Engine status                      : unknown stale-data
Score                              : 0
stopped                            : True
Local maintenance                  : False
crc32                              : c7037c03
local_conf_timestamp               : 7530
Host timestamp                     : 7530
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=7530 (Fri Jan 12 16:10:12 2018)
    host-id=2
    score=0
    vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)
    conf_on_shared_storage=True
    maintenance=False
    state=AgentStopped
    stopped=True
!! Cluster is in GLOBAL MAINTENANCE mode !!
Thank you in advance! Regards, Artem
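For reference, the metadata cleanup Kasturi describes would look something like this for the stale Host 2 above (a sketch; --clean-metadata expects the agent on the host being cleaned to be stopped, or the host to be gone already, as it is here):

[root@ovirt1 ~]# hosted-engine --clean-metadata --host-id=2 --force-clean
[root@ovirt1 ~]# hosted-engine --vm-status    # the Host 2 section should no longer be listed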
On Wed, Jan 17, 2018 at 6:47 PM, Artem Tambovskiy <artem.tambovskiy@gmail.com> wrote:
Hello,
Any further suggestions on how to fix the issue and get the HA setup working? Could the complete removal of the second host from the cluster (including removal of the oVirt configuration files and packages) and adding it again solve the issue? Or might it completely ruin the cluster?
Regards, Artem
On 16 Jan 2018 at 17:00, "Artem Tambovskiy" <artem.tambovskiy@gmail.com> wrote:
Hi Martin,
Thanks for the feedback.
All hosts and the hosted engine are running the 4.1.8 release. The strange thing: I can see that the host ID is set to 1 on both hosts in the /etc/ovirt-hosted-engine/hosted-engine.conf file. I have no idea how this happened; the only thing I have changed recently is the mnt_options, in order to add backup-volfile-servers by using the hosted-engine --set-shared-config command.
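A quick way to confirm the clash is to compare the IDs directly on both hosts; with the misconfiguration described above, both show the same value:

[root@ovirt1 ~]# grep host_id /etc/ovirt-hosted-engine/hosted-engine.conf
host_id=1
[root@ovirt2 ~]# grep host_id /etc/ovirt-hosted-engine/hosted-engine.conf
host_id=1

Each host needs a unique host_id; two hosts with the same ID will contend for the same sanlock lease.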
Both the agent and the broker are running on the second host:
[root@ovirt2 ovirt-hosted-engine-ha]# ps -ef | grep ovirt-ha-
vdsm     42331     1 26 14:40 ?        00:31:35 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker --no-daemon
vdsm     42332     1  0 14:40 ?        00:00:16 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent --no-daemon
but I saw some tracebacks during the broker start:
[root@ovirt2 ovirt-hosted-engine-ha]# systemctl status ovirt-ha-broker -l
● ovirt-ha-broker.service - oVirt Hosted Engine High Availability Communications Broker
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2018-01-16 14:40:15 MSK; 1h 58min ago
 Main PID: 42331 (ovirt-ha-broker)
   CGroup: /system.slice/ovirt-ha-broker.service
           └─42331 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker --no-daemon
Jan 16 14:40:15 ovirt2.telia.ru systemd[1]: Started oVirt Hosted Engine High Availability Communications Broker.
Jan 16 14:40:15 ovirt2.telia.ru systemd[1]: Starting oVirt Hosted Engine High Availability Communications Broker...
Jan 16 14:40:16 ovirt2.telia.ru ovirt-ha-broker[42331]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.listener.ConnectionHandler ERROR Error handling request, data: 'set-storage-domain FilesystemBackend dom_type=glusterfs sd_uuid=4a7f8717-9bb0-4d80-8016-498fa4b88162'
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 166, in handle
    data)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 299, in _dispatch
    .set_storage_domain(client, sd_type, **options)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 66, in set_storage_domain
    self._backends[client].connect()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 462, in connect
    self._dom_type)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 107, in get_domain_path
    " in {1}".format(sd_uuid, parent))
BackendFailureException: path to storage domain 4a7f8717-9bb0-4d80-8016-498fa4b88162 not found in /rhev/data-center/mnt/glusterSD
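The path named in the exception can be inspected directly to see what the broker sees (the glusterSD mount directory is only populated once the storage server is connected):

[root@ovirt2 ~]# ls /rhev/data-center/mnt/glusterSD/
[root@ovirt2 ~]# find /rhev/data-center/mnt/glusterSD/ -maxdepth 2 -name 4a7f8717-9bb0-4d80-8016-498fa4b88162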
I have tried to issue hosted-engine --connect-storage on the second host, followed by an agent & broker restart, but there are no visible improvements.
Regards, Artem
On Tue, Jan 16, 2018 at 4:18 PM, Martin Sivak <msivak@redhat.com> wrote:
Hi everybody,
there are a couple of things to check here.
- What version of the hosted engine agent is this? The logs look like they are coming from 4.1.
- What version of the engine is used?
- Check the host ID in /etc/ovirt-hosted-engine/hosted-engine.conf on both hosts; the numbers must be different.
- It looks like the agent or broker on host 2 is not active (or there would be a report).
- The second host does not see data from the first host (unknown stale-data); wait for a minute and check again, then check the storage connection.
And then the general troubleshooting:
- Put the hosted engine in global maintenance mode (and check that it is visible from the other host using hosted-engine --vm-status).
- Mount the storage domain (hosted-engine --connect-storage).
- Check sanlock client status to see if the proper lockspaces are present.
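Concretely, that sequence would look something like this (a sketch using the host names from this thread, with the checks run on the problem host):

[root@ovirt1 ~]# hosted-engine --set-maintenance --mode=global
[root@ovirt2 ~]# hosted-engine --vm-status        # should report "!! Cluster is in GLOBAL MAINTENANCE mode !!"
[root@ovirt2 ~]# hosted-engine --connect-storage
[root@ovirt2 ~]# sanlock client status            # a lockspace named hosted-engine should be listed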
Best regards
Martin Sivak
On Tue, Jan 16, 2018 at 1:16 PM, Derek Atkins <derek@ihtfp.com> wrote:

Why are both hosts reporting as ovirt1? Look at the hostname fields to see what I mean.

-derek
Sent using my mobile device. Please excuse any typos.
On January 16, 2018 7:11:09 AM, Artem Tambovskiy <artem.tambovskiy@gmail.com> wrote:
Hello,
Yes, I followed exactly the same procedure while reinstalling the hosts (the only difference is that I have an SSH key configured instead of the password).
I just reinstalled the second host one more time; after 20 min the host still hasn't reached the active score of 3400 (Hosted Engine HA: Not Active), and I still don't see the crown icon for this host.
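A convenient way to watch for that, rather than re-running the command by hand, is something like:

[root@ovirt1 ~]# watch -n 30 'hosted-engine --vm-status'

waiting for Host 2 to show "Status up-to-date : True" and a score of 3400.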
hosted-engine --vm-status from ovirt1 host
[root@ovirt1 ~]# hosted-engine --vm-status
--== Host 1 status ==--
conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : ovirt1.telia.ru
Host ID                            : 1
Engine status                      : {"health": "good", "vm": "up", "detail": "up"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 3f94156a
local_conf_timestamp               : 349144
Host timestamp                     : 349144
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=349144 (Tue Jan 16 15:03:45 2018)
    host-id=1
    score=3400
    vm_conf_refresh_time=349144 (Tue Jan 16 15:03:45 2018)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineUp
    stopped=False
--== Host 2 status ==--
conf_on_shared_storage             : True
Status up-to-date                  : False
Hostname                           : ovirt1.telia.ru
Host ID                            : 2
Engine status                      : unknown stale-data
Score                              : 0
stopped                            : True
Local maintenance                  : False
crc32                              : c7037c03
local_conf_timestamp               : 7530
Host timestamp                     : 7530
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=7530 (Fri Jan 12 16:10:12 2018)
    host-id=2
    score=0
    vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)
    conf_on_shared_storage=True
    maintenance=False
    state=AgentStopped
    stopped=True
hosted-engine --vm-status output from ovirt2 host
[root@ovirt2 ovirt-hosted-engine-ha]# hosted-engine --vm-status
--== Host 1 status ==--
conf_on_shared_storage             : True
Status up-to-date                  : False
Hostname                           : ovirt1.telia.ru
Host ID                            : 1
Engine status                      : unknown stale-data
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 6d3606f1
local_conf_timestamp               : 349264
Host timestamp                     : 349264
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=349264 (Tue Jan 16 15:05:45 2018)
    host-id=1
    score=3400
    vm_conf_refresh_time=349264 (Tue Jan 16 15:05:45 2018)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineUp
    stopped=False
--== Host 2 status ==--
conf_on_shared_storage             : True
Status up-to-date                  : False
Hostname                           : ovirt1.telia.ru
Host ID                            : 2
Engine status                      : unknown stale-data
Score                              : 0
stopped                            : True
Local maintenance                  : False
crc32                              : c7037c03
local_conf_timestamp               : 7530
Host timestamp                     : 7530
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=7530 (Fri Jan 12 16:10:12 2018)
    host-id=2
    score=0
    vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)
    conf_on_shared_storage=True
    maintenance=False
    state=AgentStopped
    stopped=True
Also, I saw some log messages in the webGUI about time drift, like "Host ovirt2.telia.ru has time-drift of 5305 seconds while maximum configured value is 300 seconds." That is a bit weird, as I haven't touched any time settings since I installed the cluster. Both hosts have the same time and timezone (MSK), but the hosted engine uses the UTC timezone. Is it mandatory to have everything in sync and in the same timezone?
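The actual offset can be checked on each host and on the engine VM with, e.g., timedatectl, plus chronyc tracking if chronyd is managing NTP (as on a stock CentOS 7 host):

[root@ovirt2 ~]# timedatectl
[root@ovirt2 ~]# chronyc tracking    # reports the measured offset from the NTP source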
Regards, Artem
On Tue, Jan 16, 2018 at 2:20 PM, Kasturi Narra <knarra@redhat.com> wrote:

Hello,

I now see that your hosted engine is up and running. Can you let me know how you tried reinstalling the host? Below is the procedure which is used; hope you did not miss any step while reinstalling. If not, can you try reinstalling again and see if that works?

1) Move the host to maintenance
2) Click on reinstall
3) Provide the password
4) Uncheck 'automatically configure host firewall'
5) Click on the 'Deploy' tab
6) Set Hosted Engine deployment to 'Deploy'

And once the host installation is done, wait till the active score of the host shows 3400 in the general tab, then check hosted-engine --vm-status.

Thanks
kasturi

Hello Kasturi,

Yes, I set global maintenance mode intentionally. I ran out of ideas troubleshooting my cluster, so I decided to undeploy the hosted engine from the second host, clean the installation, and add it to the cluster again. I also cleaned the metadata with hosted-engine --clean-metadata --host-id=2 --force-clean. But once I added the second host to the cluster again, it doesn't show the capability to run the hosted engine, and it doesn't even appear in the output of hosted-engine --vm-status.

[root@ovirt1 ~]# hosted-engine --vm-status

--== Host 1 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : ovirt1.telia.ru
Host ID                            : 1
Engine status                      : {"health": "good", "vm": "up", "detail": "up"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : a23c7cbd
local_conf_timestamp               : 848931
Host timestamp                     : 848930
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=848930 (Mon Jan 22 09:53:29 2018)
    host-id=1
    score=3400
    vm_conf_refresh_time=848931 (Mon Jan 22 09:53:29 2018)
    conf_on_shared_storage=True
    maintenance=False
    state=GlobalMaintenance
    stopped=False

On the redeployed second host I see unknown stale-data again, and the second host doesn't show up as hosted-engine capable.

[root@ovirt2 ~]# hosted-engine --vm-status

--== Host 1 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : False
Hostname                           : ovirt1.telia.ru
Host ID                            : 1
Engine status                      : unknown stale-data
Score                              : 0
stopped                            : False
Local maintenance                  : False
crc32                              : 18765f68
local_conf_timestamp               : 848951
Host timestamp                     : 848951
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=848951 (Mon Jan 22 09:53:49 2018)
    host-id=1
    score=0
    vm_conf_refresh_time=848951 (Mon Jan 22 09:53:50 2018)
    conf_on_shared_storage=True
    maintenance=False
    state=ReinitializeFSM
    stopped=False

Really strange situation ...

Regards,
Artem
Hello Artem,
Any reason why you chose hosted-engine undeploy action for the second host ? I see that the cluster is in global maintenance mode, was this intended ?
command to clear the entries from hosted-engine --vm-status is "hosted-engine --clean-metadata --host-id=<old_host_id> --force-clean"
Hope this helps !!
Thanks kasturi
On Fri, Jan 19, 2018 at 12:07 AM, Artem Tambovskiy < artem.tambovskiy@gmail.com> wrote:
Hi,
Ok, i decided to remove second host from the cluster. I reinstalled from webUI it with hosted-engine action UNDEPLOY, and removed it from the cluster aftewards. All VM's are fine hosted engine running ok, But hosted-engine --vm-status still showing 2 hosts.
How I can clean the traces of second host in a correct way?
--== Host 1 status ==--
conf_on_shared_storage : True Status up-to-date : True Hostname : ovirt1.telia.ru Host ID : 1 Engine status : {"health": "good", "vm": "up", "detail": "up"} Score : 3400 stopped : False Local maintenance : False crc32 : 1b1b6f6d local_conf_timestamp : 545385 Host timestamp : 545385 Extra metadata (valid at timestamp): metadata_parse_version=1 metadata_feature_version=1 timestamp=545385 (Thu Jan 18 21:34:25 2018) host-id=1 score=3400 vm_conf_refresh_time=545385 (Thu Jan 18 21:34:25 2018) conf_on_shared_storage=True maintenance=False state=GlobalMaintenance stopped=False
--== Host 2 status ==--
conf_on_shared_storage : True Status up-to-date : False Hostname : ovirt1.telia.ru Host ID : 2 Engine status : unknown stale-data Score : 0 stopped : True Local maintenance : False crc32 : c7037c03 local_conf_timestamp : 7530 Host timestamp : 7530 Extra metadata (valid at timestamp): metadata_parse_version=1 metadata_feature_version=1 timestamp=7530 (Fri Jan 12 16:10:12 2018) host-id=2 score=0 vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018) conf_on_shared_storage=True maintenance=False state=AgentStopped stopped=True
!! Cluster is in GLOBAL MAINTENANCE mode !!
Thank you in advance! Regards, Artem
On Wed, Jan 17, 2018 at 6:47 PM, Artem Tambovskiy < artem.tambovskiy@gmail.com> wrote:
Hello,
Any further suggestions on how to fix the issue and make HA setup working? Can the complete removal of second host (with complete removal ovirt configuration files and packages) from cluster and adding it again solve the issue? Or it might completly ruin the cluster?
Regards, Artem
On Jan 16, 2018 at 17:00, "Artem Tambovskiy" <artem.tambovskiy@gmail.com> wrote:
Hi Martin,
Thanks for feedback.
All hosts and the hosted engine are running the 4.1.8 release. The strange thing: I can see that the host ID is set to 1 on both hosts in the /etc/ovirt-hosted-engine/hosted-engine.conf file. I have no idea how this happened; the only thing I have changed recently is the mnt_options, in order to add backup-volfile-servers by using the hosted-engine --set-shared-config command.
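As a quick check (a minimal sketch, assuming the default config location), the IDs can be compared by running on each host:

    grep host_id /etc/ovirt-hosted-engine/hosted-engine.conf

The two hosts must report different values; if both say host_id=1, the second host's file needs to be corrected and ovirt-ha-agent/ovirt-ha-broker restarted so that it registers under its own slot.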
Both the agent and the broker are running on the second host:
[root@ovirt2 ovirt-hosted-engine-ha]# ps -ef | grep ovirt-ha-
vdsm     42331     1 26 14:40 ?        00:31:35 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker --no-daemon
vdsm     42332     1  0 14:40 ?        00:00:16 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent --no-daemon
but I saw some tracebacks during the broker start
[root@ovirt2 ovirt-hosted-engine-ha]# systemctl status ovirt-ha-broker -l
● ovirt-ha-broker.service - oVirt Hosted Engine High Availability Communications Broker
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2018-01-16 14:40:15 MSK; 1h 58min ago
 Main PID: 42331 (ovirt-ha-broker)
   CGroup: /system.slice/ovirt-ha-broker.service
           └─42331 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker --no-daemon
Jan 16 14:40:15 ovirt2.telia.ru systemd[1]: Started oVirt Hosted Engine High Availability Communications Broker.
Jan 16 14:40:15 ovirt2.telia.ru systemd[1]: Starting oVirt Hosted Engine High Availability Communications Broker...
Jan 16 14:40:16 ovirt2.telia.ru ovirt-ha-broker[42331]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.listener.ConnectionHandler ERROR Error handling request, data: 'set-storage-domain FilesystemBackend dom_type=glusterfs sd_uuid=4a7f8717-9bb0-4d80-8016-498fa4b88162'
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 166, in handle
    data)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 299, in _dispatch
    .set_storage_domain(client, sd_type, **options)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 66, in set_storage_domain
    self._backends[client].connect()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 462, in connect
    self._dom_type)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 107, in get_domain_path
    " in {1}".format(sd_uuid, parent))
BackendFailureException: path to storage domain 4a7f8717-9bb0-4d80-8016-498fa4b88162 not found in /rhev/data-center/mnt/glusterSD
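One way to see what the broker actually finds at that path (a sketch, assuming the default vdsm mount root named in the traceback):

    ls -l /rhev/data-center/mnt/glusterSD/
    mount | grep glusterSD

The storage domain UUID 4a7f8717-9bb0-4d80-8016-498fa4b88162 should appear under one of the mounted glusterSD directories; if the mount is missing, hosted-engine --connect-storage should recreate it.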
I have tried to issue hosted-engine --connect-storage on the second host, followed by an agent & broker restart, but there is no visible improvement.
Regards, Artem
On Tue, Jan 16, 2018 at 4:18 PM, Martin Sivak <msivak@redhat.com> wrote:
Hi everybody,
there are a couple of things to check here.
- what version of the hosted engine agent is this? The logs look like they are coming from 4.1
- what version of the engine is used?
- check the host ID in /etc/ovirt-hosted-engine/hosted-engine.conf on both hosts; the numbers must be different
- it looks like the agent or broker on host 2 is not active (or there would be a report)
- the second host does not see data from the first host (unknown stale-data); wait for a minute and check again, then check the storage connection
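For the version and service checks in the list above, something like the following should do (package and service names as shipped by oVirt; a sketch):

    rpm -q ovirt-hosted-engine-ha
    systemctl status ovirt-ha-agent ovirt-ha-broker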
And then the general troubleshooting:
- put hosted engine in global maintenance mode (and check that it is visible from the other host using hosted-engine --vm-status)
- mount the storage domain (hosted-engine --connect-storage)
- check sanlock client status to see if the proper lockspaces are present
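Mapped to commands, those steps are roughly (standard hosted-engine and sanlock CLIs; a sketch):

    hosted-engine --set-maintenance --mode=global
    hosted-engine --vm-status          # run on the other host as well
    hosted-engine --connect-storage
    sanlock client status              # the hosted-engine lockspace should be listed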
Best regards
Martin Sivak
On Tue, Jan 16, 2018 at 1:16 PM, Derek Atkins <derek@ihtfp.com> wrote:
Why are both hosts reporting as ovirt 1? Look at the hostname fields to see what I mean.
-derek
Sent using my mobile device. Please excuse any typos.
On January 16, 2018 7:11:09 AM Artem Tambovskiy < artem.tambovskiy@gmail.com> wrote: > > Hello, > > Yes, I followed exactly the same procedure while reinstalling the hosts > (the only difference that I have SSH key configured instead of the > password). > > Just reinstalled the second host one more time, after 20 min the host > still haven't reached active score of 3400 (Hosted Engine HA:Not Active) and > I still don't see crown icon for this host. > > hosted-engine --vm-status from ovirt1 host > > [root@ovirt1 ~]# hosted-engine --vm-status > > > --== Host 1 status ==-- > > conf_on_shared_storage : True > Status up-to-date : True > Hostname : ovirt1.telia.ru > Host ID : 1 > Engine status : {"health": "good", "vm": "up", > "detail": "up"} > Score : 3400 > stopped : False > Local maintenance : False > crc32 : 3f94156a > local_conf_timestamp : 349144 > Host timestamp : 349144 > Extra metadata (valid at timestamp): > metadata_parse_version=1 > metadata_feature_version=1 > timestamp=349144 (Tue Jan 16 15:03:45 2018) > host-id=1 > score=3400 > vm_conf_refresh_time=349144 (Tue Jan 16 15:03:45 2018) > conf_on_shared_storage=True > maintenance=False > state=EngineUp > stopped=False > > > --== Host 2 status ==-- > > conf_on_shared_storage : True > Status up-to-date : False > Hostname : ovirt1.telia.ru > Host ID : 2 > Engine status : unknown stale-data > Score : 0 > stopped : True > Local maintenance : False > crc32 : c7037c03 > local_conf_timestamp : 7530 > Host timestamp : 7530 > Extra metadata (valid at timestamp): > metadata_parse_version=1 > metadata_feature_version=1 > timestamp=7530 (Fri Jan 12 16:10:12 2018) > host-id=2 > score=0 > vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018) > conf_on_shared_storage=True > maintenance=False > state=AgentStopped > stopped=True > > > hosted-engine --vm-status output from ovirt2 host > > [root@ovirt2 ovirt-hosted-engine-ha]# hosted-engine --vm-status > > > --== Host 1 status ==-- > > conf_on_shared_storage : True > Status up-to-date : False > Hostname : ovirt1.telia.ru > Host ID : 1 > Engine status : unknown stale-data > Score : 3400 > stopped : False > Local maintenance : False > crc32 : 6d3606f1 > local_conf_timestamp : 349264 > Host timestamp : 349264 > Extra metadata (valid at timestamp): > metadata_parse_version=1 > metadata_feature_version=1 > timestamp=349264 (Tue Jan 16 15:05:45 2018) > host-id=1 > score=3400 > vm_conf_refresh_time=349264 (Tue Jan 16 15:05:45 2018) > conf_on_shared_storage=True > maintenance=False > state=EngineUp > stopped=False > > > --== Host 2 status ==-- > > conf_on_shared_storage : True > Status up-to-date : False > Hostname : ovirt1.telia.ru > Host ID : 2 > Engine status : unknown stale-data > Score : 0 > stopped : True > Local maintenance : False > crc32 : c7037c03 > local_conf_timestamp : 7530 > Host timestamp : 7530 > Extra metadata (valid at timestamp): > metadata_parse_version=1 > metadata_feature_version=1 > timestamp=7530 (Fri Jan 12 16:10:12 2018) > host-id=2 > score=0 > vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018) > conf_on_shared_storage=True > maintenance=False > state=AgentStopped > stopped=True > > > Also I saw some log messages in webGUI about time drift like > > "Host ovirt2.telia.ru has time-drift of 5305 seconds while maximum > configured value is 300 seconds." that is a bit weird as haven't touched any > time settings since I installed the cluster. > both host have the same time and timezone (MSK) but hosted engine
> lives in the
> UTC timezone. Is it mandatory to have everything in sync and in the same > timezone? > > Regards, > Artem > > > > > > > On Tue, Jan 16, 2018 at 2:20 PM, Kasturi Narra <knarra@redhat.com> wrote: >> >> Hello, >> >> I now see that your hosted engine is up and running. Can you let me >> know how did you try reinstalling the host? Below is the procedure which is >> used and hope you did not miss any step while reinstalling. If no, can you >> try reinstalling again and see if that works ? >> >> 1) Move the host to maintenance >> 2) click on reinstall >> 3) provide the password >> 4) uncheck 'automatically configure host firewall' >> 5) click on 'Deploy' tab >> 6) click Hosted Engine deployment as 'Deploy' >> >> And once the host installation is done, wait till the active score of the >> host shows 3400 in the general tab then check hosted-engine --vm-status. >> >> Thanks >> kasturi >> >> On Mon, Jan 15, 2018 at 4:57 PM, Artem Tambovskiy >> <artem.tambovskiy@gmail.com> wrote: >>> >>> Hello, >>> >>> I have uploaded 2 archives with all relevant logs to shared hosting >>> files from host 1 (which is currently running all VM's including >>> hosted_engine) - https://yadi.sk/d/PttRoYV63RTvhK >>> files from second host - https://yadi.sk/d/UBducEsV3RTvhc >>> >>> I have tried to restart both ovirt-ha-agent and ovirt-ha-broker but it >>> gives no effect. I have also tried to shutdown hosted_engine VM, stop >>> ovirt-ha-agent and ovirt-ha-broker services disconnect storage and connect >>> it again - no effect as well. >>> Also I tried to reinstall second host from WebGUI - this lead to
>>> interesting situation - now hosted-engine --vm-status shows
>>> that both
>>> hosts have the same address. >>> >>> [root@ovirt1 ~]# hosted-engine --vm-status >>> >>> --== Host 1 status ==-- >>> >>> conf_on_shared_storage : True >>> Status up-to-date : True >>> Hostname : ovirt1.telia.ru >>> Host ID : 1 >>> Engine status : {"health": "good", "vm": "up", >>> "detail": "up"} >>> Score : 3400 >>> stopped : False >>> Local maintenance : False >>> crc32 : a7758085 >>> local_conf_timestamp : 259327 >>> Host timestamp : 259327 >>> Extra metadata (valid at timestamp): >>> metadata_parse_version=1 >>> metadata_feature_version=1 >>> timestamp=259327 (Mon Jan 15 14:06:48 2018) >>> host-id=1 >>> score=3400 >>> vm_conf_refresh_time=259327 (Mon Jan 15 14:06:48 2018) >>> conf_on_shared_storage=True >>> maintenance=False >>> state=EngineUp >>> stopped=False >>> >>> >>> --== Host 2 status ==-- >>> >>> conf_on_shared_storage : True >>> Status up-to-date : False >>> Hostname : ovirt1.telia.ru >>> Host ID : 2 >>> Engine status : unknown stale-data >>> Score : 0 >>> stopped : True >>> Local maintenance : False >>> crc32 : c7037c03 >>> local_conf_timestamp : 7530 >>> Host timestamp : 7530 >>> Extra metadata (valid at timestamp): >>> metadata_parse_version=1 >>> metadata_feature_version=1 >>> timestamp=7530 (Fri Jan 12 16:10:12 2018) >>> host-id=2 >>> score=0 >>> vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018) >>> conf_on_shared_storage=True >>> maintenance=False >>> state=AgentStopped >>> stopped=True >>> >>> Gluster seems working fine. all gluster nodes showing connected state. >>> >>> Any advises on how to resolve this situation are highly appreciated! >>> >>> Regards, >>> Artem >>> >>> >>> On Mon, Jan 15, 2018 at 11:45 AM, Kasturi Narra < knarra@redhat.com> >>> wrote: >>>> >>>> Hello Artem, >>>> >>>> Can you check if glusterd service is running on host1 and all >>>> the peers are in connected state ? If yes, can you restart ovirt-ha-agent >>>> and broker services and check if things are working fine ? >>>> >>>> Thanks >>>> kasturi >>>> >>>> On Sat, Jan 13, 2018 at 12:33 AM, Artem Tambovskiy >>>> <artem.tambovskiy@gmail.com> wrote: >>>>> >>>>> Explored logs on both hosts. >>>>> broker.log shows no errors. 
>>>>> >>>>> agent.log looking not good: >>>>> >>>>> on host1 (which running hosted engine) : >>>>> >>>>> MainThread::ERROR::2018-01-12 >>>>> 21:51:03,883::agent::205::ovirt_hosted_engine_ha.agent.agent .Agent::(_run_agent) >>>>> Traceback (most recent call last): >>>>> File >>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/age nt/agent.py", >>>>> line 191, in _run_agent >>>>> return action(he) >>>>> File >>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/age nt/agent.py", >>>>> line 64, in action_proper >>>>> return he.start_monitoring() >>>>> File >>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/age nt/hosted_engine.py", >>>>> line 411, in start_monitoring >>>>> self._initialize_sanlock() >>>>> File >>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/age nt/hosted_engine.py", >>>>> line 749, in _initialize_sanlock >>>>> "Failed to initialize sanlock, the number of errors has" >>>>> SanlockInitializationError: Failed to initialize sanlock, the number >>>>> of errors has exceeded the limit >>>>> >>>>> MainThread::ERROR::2018-01-12 >>>>> 21:51:03,884::agent::206::ovirt_hosted_engine_ha.agent.agent .Agent::(_run_agent) >>>>> Trying to restart agent >>>>> MainThread::WARNING::2018-01-12 >>>>> 21:51:08,889::agent::209::ovirt_hosted_engine_ha.agent.agent .Agent::(_run_agent) >>>>> Restarting agent, attempt '1' >>>>> MainThread::INFO::2018-01-12 >>>>> 21:51:08,919::hosted_engine::242::ovirt_hosted_engine_ha.age nthosted_engine.HostedEngine::(_get_hostname) >>>>> Found certificate common name: ovirt1.telia.ru >>>>> MainThread::INFO::2018-01-12 >>>>> 21:51:08,921::hosted_engine::604::ovirt_hosted_engine_ha.age nt.hosted_engine.HostedEngine::(_initialize_vdsm) >>>>> Initializing VDSM >>>>> MainThread::INFO::2018-01-12 >>>>> 21:51:11,398::hosted_engine::630::ovirt_hosted_engine_ha.age nt.hosted_engine.HostedEngine::(_initialize_storage_images) >>>>> Connecting the storage >>>>> MainThread::INFO::2018-01-12 >>>>> 21:51:11,399::storage_server::220::ovirt_hosted_engine_ha.li b.storage_server.StorageServer::(validate_storage_server) >>>>> Validating storage server >>>>> MainThread::INFO::2018-01-12 >>>>> 21:51:13,725::storage_server::239::ovirt_hosted_engine_ha.li bstorage_server.StorageServer::(connect_storage_server) >>>>> Connecting storage server >>>>> MainThread::INFO::2018-01-12 >>>>> 21:51:18,390::storage_server::246::ovirt_hosted_engine_ha.li b.storage_server.StorageServer::(connect_storage_server) >>>>> Connecting storage server >>>>> MainThread::INFO::2018-01-12 >>>>> 21:51:18,423::storage_server::253::ovirt_hosted_engine_ha.li b.storage_server.StorageServer::(connect_storage_server) >>>>> Refreshing the storage domain >>>>> MainThread::INFO::2018-01-12 >>>>> 21:51:18,689::hosted_engine::663::ovirt_hosted_engine_ha.age nt.hosted_engine.HostedEngine::(_initialize_storage_images) >>>>> Preparing images >>>>> MainThread::INFO::2018-01-12 >>>>> 21:51:18,690::image::126::ovirt_hosted_engine_ha.lib.image.I mage::(prepare_images) >>>>> Preparing images >>>>> MainThread::INFO::2018-01-12 >>>>> 21:51:21,895::hosted_engine::666::ovirt_hosted_engine_ha.age nt.hosted_engine.HostedEngine::(_initialize_storage_images) >>>>> Refreshing vm.conf >>>>> MainThread::INFO::2018-01-12 >>>>> 21:51:21,895::config::493::ovirt_hosted_engine_ha.agent.host ed_engine.HostedEngine.config::(refresh_vm_conf) >>>>> Reloading vm.conf from the shared storage domain >>>>> MainThread::INFO::2018-01-12 >>>>> 21:51:21,896::config::416::ovirt_hosted_engine_ha.agent.host 
ed_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) >>>>> Trying to get a fresher copy of vm configuration from the OVF_STORE >>>>> MainThread::INFO::2018-01-12 >>>>> 21:51:21,896::ovf_store::132::ovirt_hosted_engine_ha.lib.ovf .ovf_store.OVFStore::(getEngineVMOVF) >>>>> Extracting Engine VM OVF from the OVF_STORE >>>>> MainThread::INFO::2018-01-12 >>>>> 21:51:21,897::ovf_store::134::ovirt_hosted_engine_ha.lib.ovf .ovf_store.OVFStore::(getEngineVMOVF) >>>>> OVF_STORE volume path: >>>>> /var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-498fa4b88162/5 cabd8e1-5f4b-469e-becc-227469e03f5c/8048cbd7-77e2-4805-9af4- d109fa36dfcf >>>>> MainThread::INFO::2018-01-12 >>>>> 21:51:21,915::config::435::ovirt_hosted_engine_ha.agent.host ed_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) >>>>> Found an OVF for HE VM, trying to convert >>>>> MainThread::INFO::2018-01-12 >>>>> 21:51:21,918::config::440::ovirt_hosted_engine_ha.agent.host ed_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) >>>>> Got vm.conf from OVF_STORE >>>>> MainThread::INFO::2018-01-12 >>>>> 21:51:21,919::hosted_engine::509::ovirt_hosted_engine_ha.age nt.hosted_engine.HostedEngine::(_initialize_broker) >>>>> Initializing ha-broker connection >>>>> MainThread::INFO::2018-01-12 >>>>> 21:51:21,919::brokerlink::130::ovirt_hosted_engine_ha.lib.br okerlink.BrokerLink::(start_monitor) >>>>> Starting monitor ping, options {'addr': '80.239.162.97'} >>>>> MainThread::INFO::2018-01-12 >>>>> 21:51:21,922::brokerlink::141::ovirt_hosted_engine_ha.lib.br okerlink.BrokerLink::(start_monitor) >>>>> Success, id 140547104457680 >>>>> MainThread::INFO::2018-01-12 >>>>> 21:51:21,922::brokerlink::130::ovirt_hosted_engine_ha.lib.br okerlink.BrokerLink::(start_monitor) >>>>> Starting monitor mgmt-bridge, options {'use_ssl': 'true', 'bridge_name': >>>>> 'ovirtmgmt', 'address': '0'} >>>>> MainThread::INFO::2018-01-12 >>>>> 21:51:21,936::brokerlink::141::ovirt_hosted_engine_ha.lib.br okerlinkBrokerLink::(start_monitor) >>>>> Success, id 140547104458064 >>>>> MainThread::INFO::2018-01-12 >>>>> 21:51:21,936::brokerlink::130::ovirt_hosted_engine_ha.lib.br okerlink.BrokerLink::(start_monitor) >>>>> Starting monitor mem-free, options {'use_ssl': 'true', 'address': '0'} >>>>> MainThread::INFO::2018-01-12 >>>>> 21:51:21,938::brokerlink::141::ovirt_hosted_engine_ha.lib.br okerlink.BrokerLink::(start_monitor) >>>>> Success, id 140547104458448 >>>>> MainThread::INFO::2018-01-12 >>>>> 21:51:21,939::brokerlink::130::ovirt_hosted_engine_ha.lib.br okerlinkBrokerLink::(start_monitor) >>>>> Starting monitor cpu-load-no-engine, options {'use_ssl': 'true', 'vm_uuid': >>>>> 'b366e466-b0ea-4a09-866b-d0248d7523a6', 'address': '0'} >>>>> MainThread::INFO::2018-01-12 >>>>> 21:51:21,940::brokerlink::141::ovirt_hosted_engine_ha.lib.br okerlink.BrokerLink::(start_monitor) >>>>> Success, id 140547104457552 >>>>> MainThread::INFO::2018-01-12 >>>>> 21:51:21,941::brokerlink::130::ovirt_hosted_engine_ha.lib.br okerlink.BrokerLink::(start_monitor) >>>>> Starting monitor engine-health, options {'use_ssl': 'true', 'vm_uuid': >>>>> 'b366e466-b0ea-4a09-866b-d0248d7523a6', 'address': '0'} >>>>> MainThread::INFO::2018-01-12 >>>>> 21:51:21,942::brokerlink::141::ovirt_hosted_engine_ha.lib.br okerlink.BrokerLink::(start_monitor) >>>>> Success, id 140547104459792 >>>>> MainThread::INFO::2018-01-12 >>>>> 21:51:26,951::brokerlink::179::ovirt_hosted_engine_ha.lib.br okerlink.BrokerLink::(set_storage_domain) >>>>> Success, id 140546772847056 >>>>> 
MainThread::INFO::2018-01-12 >>>>> 21:51:26,952::hosted_engine::601::ovirt_hosted_engine_ha.age nt.hosted_engine.HostedEngine::(_initialize_broker) >>>>> Broker initialized, all submonitors started >>>>> MainThread::INFO::2018-01-12 >>>>> 21:51:27,049::hosted_engine::704::ovirt_hosted_engine_ha.age nt.hosted_engine.HostedEngine::(_initialize_sanlock) >>>>> Ensuring lease for lockspace hosted-engine, host id 1 is acquired (file: >>>>> /var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-498fa4b88162/0 93faa75-5e33-4559-84fa-1f1f8d48153b/911c7637-b49d-463e-b186- 23b404e50769) >>>>> MainThread::INFO::2018-01-12 >>>>> 21:53:48,067::hosted_engine::745::ovirt_hosted_engine_ha.age nt.hosted_engine.HostedEngine::(_initialize_sanlock) >>>>> Failed to acquire the lock. Waiting '5's before the next attempt >>>>> MainThread::INFO::2018-01-12 >>>>> 21:56:14,088::hosted_engine::745::ovirt_hosted_engine_ha.age nt.hosted_engine.HostedEngine::(_initialize_sanlock) >>>>> Failed to acquire the lock. Waiting '5's before the next attempt >>>>> MainThread::INFO::2018-01-12 >>>>> 21:58:40,111::hosted_engine::745::ovirt_hosted_engine_ha.age nt.hosted_engine.HostedEngine::(_initialize_sanlock) >>>>> Failed to acquire the lock. Waiting '5's before the next attempt >>>>> MainThread::INFO::2018-01-12 >>>>> 22:01:06,133::hosted_engine::745::ovirt_hosted_engine_ha.age nt.hosted_engine.HostedEngine::(_initialize_sanlock) >>>>> Failed to acquire the lock. Waiting '5's before the next attempt >>>>> >>>>> >>>>> agent.log from second host >>>>> >>>>> MainThread::INFO::2018-01-12 >>>>> 22:01:37,241::hosted_engine::630::ovirt_hosted_engine_ha.age nt.hosted_engine.HostedEngine::(_initialize_storage_images) >>>>> Connecting the storage >>>>> MainThread::INFO::2018-01-12 >>>>> 22:01:37,242::storage_server::220::ovirt_hosted_engine_ha.li b.storage_server.StorageServer::(validate_storage_server) >>>>> Validating storage server >>>>> MainThread::INFO::2018-01-12 >>>>> 22:01:39,540::hosted_engine::639::ovirt_hosted_engine_ha.age nt.hosted_engine.HostedEngine::(_initialize_storage_images) >>>>> Storage domain reported as valid and reconnect is not forced. 
>>>>> MainThread::INFO::2018-01-12 >>>>> 22:01:41,939::hosted_engine::453::ovirt_hosted_engine_ha.age nt.hosted_engine.HostedEngine::(start_monitoring) >>>>> Current state EngineUnexpectedlyDown (score: 0) >>>>> MainThread::INFO::2018-01-12 >>>>> 22:01:52,150::config::493::ovirt_hosted_engine_ha.agent.host ed_engine.HostedEngine.config::(refresh_vm_conf) >>>>> Reloading vm.conf from the shared storage domain >>>>> MainThread::INFO::2018-01-12 >>>>> 22:01:52,150::config::416::ovirt_hosted_engine_ha.agent.host ed_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) >>>>> Trying to get a fresher copy of vm configuration from the OVF_STORE >>>>> MainThread::INFO::2018-01-12 >>>>> 22:01:52,151::ovf_store::132::ovirt_hosted_engine_ha.lib.ovf .ovf_store.OVFStore::(getEngineVMOVF) >>>>> Extracting Engine VM OVF from the OVF_STORE >>>>> MainThread::INFO::2018-01-12 >>>>> 22:01:52,153::ovf_store::134::ovirt_hosted_engine_ha.lib.ovf .ovf_store.OVFStore::(getEngineVMOVF) >>>>> OVF_STORE volume path: >>>>> /var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-498fa4b88162/5 cabd8e1-5f4b-469e-becc-227469e03f5c/8048cbd7-77e2-4805-9af4- d109fa36dfcf >>>>> MainThread::INFO::2018-01-12 >>>>> 22:01:52,174::config::435::ovirt_hosted_engine_ha.agent.host ed_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) >>>>> Found an OVF for HE VM, trying to convert >>>>> MainThread::INFO::2018-01-12 >>>>> 22:01:52,179::config::440::ovirt_hosted_engine_ha.agent.host ed_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) >>>>> Got vm.conf from OVF_STORE >>>>> MainThread::INFO::2018-01-12 >>>>> 22:01:52,189::hosted_engine::604::ovirt_hosted_engine_ha.age nt.hosted_engine.HostedEngine::(_initialize_vdsm) >>>>> Initializing VDSM >>>>> MainThread::INFO::2018-01-12 >>>>> 22:01:54,586::hosted_engine::630::ovirt_hosted_engine_ha.age nt.hosted_engine.HostedEngine::(_initialize_storage_images) >>>>> Connecting the storage >>>>> MainThread::INFO::2018-01-12 >>>>> 22:01:54,587::storage_server::220::ovirt_hosted_engine_ha.li b.storage_server.StorageServer::(validate_storage_server) >>>>> Validating storage server >>>>> MainThread::INFO::2018-01-12 >>>>> 22:01:56,903::hosted_engine::639::ovirt_hosted_engine_ha.age nt.hosted_engine.HostedEngine::(_initialize_storage_images) >>>>> Storage domain reported as valid and reconnect is not forced. 
>>>>> MainThread::INFO::2018-01-12 >>>>> 22:01:59,299::states::682::ovirt_hosted_engine_ha.agent.host ed_engine.HostedEngine::(score) >>>>> Score is 0 due to unexpected vm shutdown at Fri Jan 12 21:57:48 2018 >>>>> MainThread::INFO::2018-01-12 >>>>> 22:01:59,299::hosted_engine::453::ovirt_hosted_engine_ha.age nt.hosted_engine.HostedEngine::(start_monitoring) >>>>> Current state EngineUnexpectedlyDown (score: 0) >>>>> MainThread::INFO::2018-01-12 >>>>> 22:02:09,659::config::493::ovirt_hosted_engine_ha.agent.host ed_engine.HostedEngine.config::(refresh_vm_conf) >>>>> Reloading vm.conf from the shared storage domain >>>>> MainThread::INFO::2018-01-12 >>>>> 22:02:09,659::config::416::ovirt_hosted_engine_ha.agent.host ed_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) >>>>> Trying to get a fresher copy of vm configuration from the OVF_STORE >>>>> MainThread::INFO::2018-01-12 >>>>> 22:02:09,660::ovf_store::132::ovirt_hosted_engine_ha.lib.ovf .ovf_store.OVFStore::(getEngineVMOVF) >>>>> Extracting Engine VM OVF from the OVF_STORE >>>>> MainThread::INFO::2018-01-12 >>>>> 22:02:09,663::ovf_store::134::ovirt_hosted_engine_ha.lib.ovf .ovf_store.OVFStore::(getEngineVMOVF) >>>>> OVF_STORE volume path: >>>>> /var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-498fa4b88162/5 cabd8e1-5f4b-469e-becc-227469e03f5c/8048cbd7-77e2-4805-9af4- d109fa36dfcf >>>>> MainThread::INFO::2018-01-12 >>>>> 22:02:09,683::config::435::ovirt_hosted_engine_ha.agent.host ed_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) >>>>> Found an OVF for HE VM, trying to convert >>>>> MainThread::INFO::2018-01-12 >>>>> 22:02:09,688::config::440::ovirt_hosted_engine_ha.agent.host ed_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) >>>>> Got vm.conf from OVF_STORE >>>>> MainThread::INFO::2018-01-12 >>>>> 22:02:09,698::hosted_engine::604::ovirt_hosted_engine_ha.age nt.hosted_engine.HostedEngine::(_initialize_vdsm) >>>>> Initializing VDSM >>>>> MainThread::INFO::2018-01-12 >>>>> 22:02:12,112::hosted_engine::630::ovirt_hosted_engine_ha.age nt.hosted_engine.HostedEngine::(_initialize_storage_images) >>>>> Connecting the storage >>>>> MainThread::INFO::2018-01-12 >>>>> 22:02:12,113::storage_server::220::ovirt_hosted_engine_ha.li b.storage_server.StorageServer::(validate_storage_server) >>>>> Validating storage server >>>>> MainThread::INFO::2018-01-12 >>>>> 22:02:14,444::hosted_engine::639::ovirt_hosted_engine_ha.age nt.hosted_engine.HostedEngine::(_initialize_storage_images) >>>>> Storage domain reported as valid and reconnect is not forced. 
>>>>> MainThread::INFO::2018-01-12 >>>>> 22:02:16,859::states::682::ovirt_hosted_engine_ha.agent.host ed_engine.HostedEngine::(score) >>>>> Score is 0 due to unexpected vm shutdown at Fri Jan 12 21:57:47 2018 >>>>> MainThread::INFO::2018-01-12 >>>>> 22:02:16,859::hosted_engine::453::ovirt_hosted_engine_ha.age nt.hosted_engine.HostedEngine::(start_monitoring) >>>>> Current state EngineUnexpectedlyDown (score: 0) >>>>> MainThread::INFO::2018-01-12 >>>>> 22:02:27,100::config::493::ovirt_hosted_engine_ha.agent.host ed_engine.HostedEngine.config::(refresh_vm_conf) >>>>> Reloading vm.conf from the shared storage domain >>>>> MainThread::INFO::2018-01-12 >>>>> 22:02:27,100::config::416::ovirt_hosted_engine_ha.agent.host ed_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) >>>>> Trying to get a fresher copy of vm configuration from the OVF_STORE >>>>> MainThread::INFO::2018-01-12 >>>>> 22:02:27,101::ovf_store::132::ovirt_hosted_engine_ha.lib.ovf .ovf_store.OVFStore::(getEngineVMOVF) >>>>> Extracting Engine VM OVF from the OVF_STORE >>>>> MainThread::INFO::2018-01-12 >>>>> 22:02:27,103::ovf_store::134::ovirt_hosted_engine_ha.lib.ovf .ovf_store.OVFStore::(getEngineVMOVF) >>>>> OVF_STORE volume path: >>>>> /var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-498fa4b88162/5 cabd8e1-5f4b-469e-becc-227469e03f5c/8048cbd7-77e2-4805-9af4- d109fa36dfcf >>>>> MainThread::INFO::2018-01-12 >>>>> 22:02:27,125::config::435::ovirt_hosted_engine_ha.agent.host ed_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) >>>>> Found an OVF for HE VM, trying to convert >>>>> MainThread::INFO::2018-01-12 >>>>> 22:02:27,129::config::440::ovirt_hosted_engine_ha.agent.host ed_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) >>>>> Got vm.conf from OVF_STORE >>>>> MainThread::INFO::2018-01-12 >>>>> 22:02:27,130::states::667::ovirt_hosted_engine_ha.agent.host ed_engine.HostedEngine::(consume) >>>>> Engine down, local host does not have best score >>>>> MainThread::INFO::2018-01-12 >>>>> 22:02:27,139::hosted_engine::604::ovirt_hosted_engine_ha.age nt.hosted_engineHostedEngine::(_initialize_vdsm) >>>>> Initializing VDSM >>>>> MainThread::INFO::2018-01-12 >>>>> 22:02:29,584::hosted_engine::630::ovirt_hosted_engine_ha.age nt.hosted_engine.HostedEngine::(_initialize_storage_images) >>>>> Connecting the storage >>>>> MainThread::INFO::2018-01-12 >>>>> 22:02:29,586::storage_server::220::ovirt_hosted_engine_ha.li b.storage_server.StorageServer::(validate_storage_server) >>>>> Validating storage server >>>>> >>>>> >>>>> Any suggestions how to resolve this . >>>>> >>>>> regards, >>>>> Artem >>>>> >>>>> >>>>> On Fri, Jan 12, 2018 at 7:08 PM, Artem Tambovskiy >>>>> <artem.tambovskiy@gmail.com> wrote: >>>>>> >>>>>> Trying to fix one thing I broke another :( >>>>>> >>>>>> I fixed mnt_options for hosted engine storage domain and installed >>>>>> latest security patches to my hosts and hosted engine. 
All VM's up and >>>>>> running, but hosted_engine --vm-status reports about issues: >>>>>> >>>>>> [root@ovirt1 ~]# hosted-engine --vm-status >>>>>> >>>>>> >>>>>> --== Host 1 status ==-- >>>>>> >>>>>> conf_on_shared_storage : True >>>>>> Status up-to-date : False >>>>>> Hostname : ovirt2 >>>>>> Host ID : 1 >>>>>> Engine status : unknown stale-data >>>>>> Score : 0 >>>>>> stopped : False >>>>>> Local maintenance : False >>>>>> crc32 : 193164b8 >>>>>> local_conf_timestamp : 8350 >>>>>> Host timestamp : 8350 >>>>>> Extra metadata (valid at timestamp): >>>>>> metadata_parse_version=1 >>>>>> metadata_feature_version=1 >>>>>> timestamp=8350 (Fri Jan 12 19:03:54 2018) >>>>>> host-id=1 >>>>>> score=0 >>>>>> vm_conf_refresh_time=8350 (Fri Jan 12 19:03:54 2018) >>>>>> conf_on_shared_storage=True >>>>>> maintenance=False >>>>>> state=EngineUnexpectedlyDown >>>>>> stopped=False >>>>>> timeout=Thu Jan 1 05:24:43 1970 >>>>>> >>>>>> >>>>>> --== Host 2 status ==-- >>>>>> >>>>>> conf_on_shared_storage : True >>>>>> Status up-to-date : False >>>>>> Hostname : ovirt1.telia.ru >>>>>> Host ID : 2 >>>>>> Engine status : unknown stale-data >>>>>> Score : 0 >>>>>> stopped : True >>>>>> Local maintenance : False >>>>>> crc32 : c7037c03 >>>>>> local_conf_timestamp : 7530 >>>>>> Host timestamp : 7530 >>>>>> Extra metadata (valid at timestamp): >>>>>> metadata_parse_version=1 >>>>>> metadata_feature_version=1 >>>>>> timestamp=7530 (Fri Jan 12 16:10:12 2018) >>>>>> host-id=2 >>>>>> score=0 >>>>>> vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018) >>>>>> conf_on_shared_storage=True >>>>>> maintenance=False >>>>>> state=AgentStopped >>>>>> stopped=True >>>>>> [root@ovirt1 ~]# >>>>>> >>>>>> >>>>>> >>>>>> from second host situation looks a bit different: >>>>>> >>>>>> >>>>>> [root@ovirt2 ~]# hosted-engine --vm-status >>>>>> >>>>>> >>>>>> --== Host 1 status ==-- >>>>>> >>>>>> conf_on_shared_storage : True >>>>>> Status up-to-date : True >>>>>> Hostname : ovirt2 >>>>>> Host ID : 1 >>>>>> Engine status : {"reason": "vm not running on >>>>>> this host", "health": "bad", "vm": "down", "detail": "unknown"} >>>>>> Score : 0 >>>>>> stopped : False >>>>>> Local maintenance : False >>>>>> crc32 : 78eabdb6 >>>>>> local_conf_timestamp : 8403 >>>>>> Host timestamp : 8402 >>>>>> Extra metadata (valid at timestamp): >>>>>> metadata_parse_version=1 >>>>>> metadata_feature_version=1 >>>>>> timestamp=8402 (Fri Jan 12 19:04:47 2018) >>>>>> host-id=1 >>>>>> score=0 >>>>>> vm_conf_refresh_time=8403 (Fri Jan 12 19:04:47 2018) >>>>>> conf_on_shared_storage=True >>>>>> maintenance=False >>>>>> state=EngineUnexpectedlyDown >>>>>> stopped=False >>>>>> timeout=Thu Jan 1 05:24:43 1970 >>>>>> >>>>>> >>>>>> --== Host 2 status ==-- >>>>>> >>>>>> conf_on_shared_storage : True >>>>>> Status up-to-date : False >>>>>> Hostname : ovirt1.telia.ru >>>>>> Host ID : 2 >>>>>> Engine status : unknown stale-data >>>>>> Score : 0 >>>>>> stopped : True >>>>>> Local maintenance : False >>>>>> crc32 : c7037c03 >>>>>> local_conf_timestamp : 7530 >>>>>> Host timestamp : 7530 >>>>>> Extra metadata (valid at timestamp): >>>>>> metadata_parse_version=1 >>>>>> metadata_feature_version=1 >>>>>> timestamp=7530 (Fri Jan 12 16:10:12 2018) >>>>>> host-id=2 >>>>>> score=0 >>>>>> vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018) >>>>>> conf_on_shared_storage=True >>>>>> maintenance=False >>>>>> state=AgentStopped >>>>>> stopped=True >>>>>> >>>>>> >>>>>> WebGUI shows that engine running on host ovirt1. 
>>>>>> Gluster looks fine >>>>>> [root@ovirt1 ~]# gluster volume status engine >>>>>> Status of volume: engine >>>>>> Gluster process TCP Port RDMA Port >>>>>> Online Pid >>>>>> >>>>>> ------------------------------------------------------------
>>>>>> Brick ovirt1.teliaru:/oVirt/engine 49169 0 Y >>>>>> 3244 >>>>>> Brick ovirt2.telia.ru:/oVirt/engine 49179 0 Y >>>>>> 20372 >>>>>> Brick ovirt3.telia.ru:/oVirt/engine 49206 0 Y >>>>>> 16609 >>>>>> Self-heal Daemon on localhost N/A N/A Y >>>>>> 117868 >>>>>> Self-heal Daemon on ovirt2.telia.ru N/A N/A Y >>>>>> 20521 >>>>>> Self-heal Daemon on ovirt3 N/A N/A Y >>>>>> 25093 >>>>>> >>>>>> Task Status of Volume engine >>>>>> >>>>>> ------------------------------------------------------------
>>>>>> There are no active volume tasks >>>>>> >>>>>> How to resolve this issue? >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> Users mailing list >>>>>> Users@ovirt.org >>>>>> http://lists.ovirt.org/mailman/listinfo/users >>>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> Users mailing list >>>>> Users@ovirt.org >>>>> http://lists.ovirt.org/mailman/listinfo/users >>>>> >>>> >>> >> > > _______________________________________________ > Users mailing list > Users@ovirt.org > http://lists.ovirt.org/mailman/listinfo/users >
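(For reference, the gluster health checks mentioned earlier in the thread are typically:

    gluster peer status
    systemctl status glusterd

both of which should show all peers connected and glusterd active.)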
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users

Hi Artem,

Make sure the IDs are different; change them manually if you must! That is all you need to do to get the agent up, I think.

The symlink issue is probably related to another change we did (it happens when a new hosted engine node is deployed by the engine), and a simple broker restart should fix it too.

Best regards

Martin Sivak
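A minimal sketch of the manual ID fix Martin describes, assuming host 2 should own ID 2 (the file and key are the ones quoted earlier in the thread):

    [root@ovirt2 ~]# grep host_id /etc/ovirt-hosted-engine/hosted-engine.conf
    host_id=1
    [root@ovirt2 ~]# sed -i 's/^host_id=1$/host_id=2/' /etc/ovirt-hosted-engine/hosted-engine.conf
    [root@ovirt2 ~]# systemctl restart ovirt-ha-broker ovirt-ha-agent

After the restart, hosted-engine --vm-status on both hosts should show two distinct host IDs.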
Hello Kasturi,
Yes, I set global maintenance mode intentionally, I'm run out of the ideas troubleshooting my cluster and decided to undeploy the hosted engine from second host, clean the installation and add again to the cluster. Also I cleaned the metadata with hosted-engine --clean-metadata --host-id=2 --force-clean But once I added the second host to the cluster again it doesn't show the capability to run hosted engine. And doesn't even appear in the output hosted-engine --vm-status [root@ovirt1 ~]#hosted-engine --vm-status --== Host 1 status ==-- conf_on_shared_storage : True Status up-to-date : True Hostname : ovirt1.telia.ru Host ID : 1 Engine status : {"health": "good", "vm": "up", "detail": "up"} Score : 3400 stopped : False Local maintenance : False crc32 : a23c7cbd local_conf_timestamp : 848931 Host timestamp : 848930 Extra metadata (valid at timestamp): metadata_parse_version=1 metadata_feature_version=1 timestamp=848930 (Mon Jan 22 09:53:29 2018) host-id=1 score=3400 vm_conf_refresh_time=848931 (Mon Jan 22 09:53:29 2018) conf_on_shared_storage=True maintenance=False state=GlobalMaintenance stopped=False
On redeployed second host I see unknown-stale-data again, and second host doesn't show up as a hosted-engine capable. [root@ovirt2 ~]# hosted-engine --vm-status
--== Host 1 status ==--
conf_on_shared_storage : True Status up-to-date : False Hostname : ovirt1.telia.ru Host ID : 1 Engine status : unknown stale-data Score : 0 stopped : False Local maintenance : False crc32 : 18765f68 local_conf_timestamp : 848951 Host timestamp : 848951 Extra metadata (valid at timestamp): metadata_parse_version=1 metadata_feature_version=1 timestamp=848951 (Mon Jan 22 09:53:49 2018) host-id=1 score=0 vm_conf_refresh_time=848951 (Mon Jan 22 09:53:50 2018) conf_on_shared_storage=True maintenance=False state=ReinitializeFSM stopped=False
Really strange situation ...
Regards, Artem
On Mon, Jan 22, 2018 at 9:46 AM, Kasturi Narra <knarra@redhat.com> wrote:
Hello Artem,
Any reason why you chose hosted-engine undeploy action for the second host ? I see that the cluster is in global maintenance mode, was this intended ?
command to clear the entries from hosted-engine --vm-status is "hosted-engine --clean-metadata --host-id=<old_host_id> --force-clean"
Hope this helps !!
Thanks kasturi
On Fri, Jan 19, 2018 at 12:07 AM, Artem Tambovskiy <artem.tambovskiy@gmail.com> wrote:
Hi,
Ok, i decided to remove second host from the cluster. I reinstalled from webUI it with hosted-engine action UNDEPLOY, and removed it from the cluster aftewards. All VM's are fine hosted engine running ok, But hosted-engine --vm-status still showing 2 hosts.
How I can clean the traces of second host in a correct way?
--== Host 1 status ==--
conf_on_shared_storage : True Status up-to-date : True Hostname : ovirt1.telia.ru Host ID : 1 Engine status : {"health": "good", "vm": "up", "detail": "up"} Score : 3400 stopped : False Local maintenance : False crc32 : 1b1b6f6d local_conf_timestamp : 545385 Host timestamp : 545385 Extra metadata (valid at timestamp): metadata_parse_version=1 metadata_feature_version=1 timestamp=545385 (Thu Jan 18 21:34:25 2018) host-id=1 score=3400 vm_conf_refresh_time=545385 (Thu Jan 18 21:34:25 2018) conf_on_shared_storage=True maintenance=False state=GlobalMaintenance stopped=False
--== Host 2 status ==--
conf_on_shared_storage : True Status up-to-date : False Hostname : ovirt1.telia.ru Host ID : 2 Engine status : unknown stale-data Score : 0 stopped : True Local maintenance : False crc32 : c7037c03 local_conf_timestamp : 7530 Host timestamp : 7530 Extra metadata (valid at timestamp): metadata_parse_version=1 metadata_feature_version=1 timestamp=7530 (Fri Jan 12 16:10:12 2018) host-id=2 score=0 vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018) conf_on_shared_storage=True maintenance=False state=AgentStopped stopped=True
!! Cluster is in GLOBAL MAINTENANCE mode !!
Thank you in advance! Regards, Artem
On Wed, Jan 17, 2018 at 6:47 PM, Artem Tambovskiy <artem.tambovskiy@gmail.com> wrote:
Hello,
Any further suggestions on how to fix the issue and make HA setup working? Can the complete removal of second host (with complete removal ovirt configuration files and packages) from cluster and adding it again solve the issue? Or it might completly ruin the cluster?
Regards, Artem
16 янв. 2018 г. 17:00 пользователь "Artem Tambovskiy" <artem.tambovskiy@gmail.com> написал:
Hi Martin,
Thanks for feedback.
All hosts and hosted-engine running 4.1.8 release. The strange thing : I can see that host ID is set to 1 on both hosts at /etc/ovirt-hosted-engine/hosted-engine.conf file. I have no idea how this happen, the only thing I have changed recently is that I have changed mnt_options in order to add backup-volfile-servers by using hosted-engine --set-shared-config command
Both agent and broker are running on second host
[root@ovirt2 ovirt-hosted-engine-ha]# ps -ef | grep ovirt-ha- vdsm 42331 1 26 14:40 ? 00:31:35 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker --no-daemon vdsm 42332 1 0 14:40 ? 00:00:16 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent --no-daemon
but I saw some tracebacks during the broker start
[root@ovirt2 ovirt-hosted-engine-ha]# systemctl status ovirt-ha-broker -l ● ovirt-ha-broker.service - oVirt Hosted Engine High Availability Communications Broker Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service; enabled; vendor preset: disabled) Active: active (running) since Tue 2018-01-16 14:40:15 MSK; 1h 58min ago Main PID: 42331 (ovirt-ha-broker) CGroup: /system.slice/ovirt-ha-broker.service └─42331 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker --no-daemon
Jan 16 14:40:15 ovirt2.telia.ru systemd[1]: Started oVirt Hosted Engine High Availability Communications Broker. Jan 16 14:40:15 ovirt2.telia.ru systemd[1]: Starting oVirt Hosted Engine High Availability Communications Broker... Jan 16 14:40:16 ovirt2.telia.ru ovirt-ha-broker[42331]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.listener.ConnectionHandler ERROR Error handling request, data: 'set-storage-domain FilesystemBackend dom_type=glusterfs sd_uuid=4a7f8717-9bb0-4d80-8016-498fa4b88162' Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 166, in handle data) File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 299, in _dispatch
.set_storage_domain(client, sd_type, **options) File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 66, in set_storage_domain
self._backends[client].connect() File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 462, in connect
self._dom_type) File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 107, in get_domain_path " in {1}".format(sd_uuid, parent))
BackendFailureException: path to storage domain 4a7f8717-9bb0-4d80-8016-498fa4b88162 not found in /rhev/data-center/mnt/glusterSD
I have tried to issue hosted-engine --connect-storage on second host followed by agent & broker restart But there is no any visible improvements.
Regards, Artem
On Tue, Jan 16, 2018 at 4:18 PM, Martin Sivak <msivak@redhat.com> wrote:
Hi everybody,
there are couple of things to check here.
- what version of hosted engine agent is this? The logs look like coming from 4.1 - what version of engine is used? - check the host ID in /etc/ovirt-hosted-engine/hosted-engine.conf on both hosts, the numbers must be different - it looks like the agent or broker on host 2 is not active (or there would be a report) - the second host does not see data from the first host (unknown stale-data), wait for a minute and check again, then check the storage connection
And then the general troubleshooting:
- put hosted engine in global maintenance mode (and check that it is visible from the other host using he --vm-status) - mount storage domain (hosted-engine --connect-storage) - check sanlock client status to see if proper lockspaces are present
Best regards
Martin Sivak
On Tue, Jan 16, 2018 at 1:16 PM, Derek Atkins <derek@ihtfp.com> wrote: > Why are both hosts reporting as ovirt 1? > Look at the hostname fields to see what mean. > > -derek > Sent using my mobile device. Please excuse any typos. > > On January 16, 2018 7:11:09 AM Artem Tambovskiy > <artem.tambovskiy@gmail.com> > wrote: >> >> Hello, >> >> Yes, I followed exactly the same procedure while reinstalling the >> hosts >> (the only difference that I have SSH key configured instead of the >> password). >> >> Just reinstalled the second host one more time, after 20 min the >> host >> still haven't reached active score of 3400 (Hosted Engine HA:Not >> Active) and >> I still don't see crown icon for this host. >> >> hosted-engine --vm-status from ovirt1 host >> >> [root@ovirt1 ~]# hosted-engine --vm-status >> >> >> --== Host 1 status ==-- >> >> conf_on_shared_storage : True >> Status up-to-date : True >> Hostname : ovirt1.telia.ru >> Host ID : 1 >> Engine status : {"health": "good", "vm": "up", >> "detail": "up"} >> Score : 3400 >> stopped : False >> Local maintenance : False >> crc32 : 3f94156a >> local_conf_timestamp : 349144 >> Host timestamp : 349144 >> Extra metadata (valid at timestamp): >> metadata_parse_version=1 >> metadata_feature_version=1 >> timestamp=349144 (Tue Jan 16 15:03:45 2018) >> host-id=1 >> score=3400 >> vm_conf_refresh_time=349144 (Tue Jan 16 15:03:45 2018) >> conf_on_shared_storage=True >> maintenance=False >> state=EngineUp >> stopped=False >> >> >> --== Host 2 status ==-- >> >> conf_on_shared_storage : True >> Status up-to-date : False >> Hostname : ovirt1.telia.ru >> Host ID : 2 >> Engine status : unknown stale-data >> Score : 0 >> stopped : True >> Local maintenance : False >> crc32 : c7037c03 >> local_conf_timestamp : 7530 >> Host timestamp : 7530 >> Extra metadata (valid at timestamp): >> metadata_parse_version=1 >> metadata_feature_version=1 >> timestamp=7530 (Fri Jan 12 16:10:12 2018) >> host-id=2 >> score=0 >> vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018) >> conf_on_shared_storage=True >> maintenance=False >> state=AgentStopped >> stopped=True >> >> >> hosted-engine --vm-status output from ovirt2 host >> >> [root@ovirt2 ovirt-hosted-engine-ha]# hosted-engine --vm-status >> >> >> --== Host 1 status ==-- >> >> conf_on_shared_storage : True >> Status up-to-date : False >> Hostname : ovirt1.telia.ru >> Host ID : 1 >> Engine status : unknown stale-data >> Score : 3400 >> stopped : False >> Local maintenance : False >> crc32 : 6d3606f1 >> local_conf_timestamp : 349264 >> Host timestamp : 349264 >> Extra metadata (valid at timestamp): >> metadata_parse_version=1 >> metadata_feature_version=1 >> timestamp=349264 (Tue Jan 16 15:05:45 2018) >> host-id=1 >> score=3400 >> vm_conf_refresh_time=349264 (Tue Jan 16 15:05:45 2018) >> conf_on_shared_storage=True >> maintenance=False >> state=EngineUp >> stopped=False >> >> >> --== Host 2 status ==-- >> >> conf_on_shared_storage : True >> Status up-to-date : False >> Hostname : ovirt1.telia.ru >> Host ID : 2 >> Engine status : unknown stale-data >> Score : 0 >> stopped : True >> Local maintenance : False >> crc32 : c7037c03 >> local_conf_timestamp : 7530 >> Host timestamp : 7530 >> Extra metadata (valid at timestamp): >> metadata_parse_version=1 >> metadata_feature_version=1 >> timestamp=7530 (Fri Jan 12 16:10:12 2018) >> host-id=2 >> score=0 >> vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018) >> conf_on_shared_storage=True >> maintenance=False >> state=AgentStopped >> stopped=True >> >> >> Also I saw some log messages in 
webGUI about time drift like >> >> "Host ovirt2.telia.ru has time-drift of 5305 seconds while maximum >> configured value is 300 seconds." that is a bit weird as haven't >> touched any >> time settings since I installed the cluster. >> both host have the same time and timezone (MSK) but hosted engine >> lives in >> UTC timezone. Is it mandatory to have everything in sync and in the >> same >> timezone? >> >> Regards, >> Artem >> >> >> >> >> >> >> On Tue, Jan 16, 2018 at 2:20 PM, Kasturi Narra <knarra@redhat.com> >> wrote: >>> >>> Hello, >>> >>> I now see that your hosted engine is up and running. Can you >>> let me >>> know how did you try reinstalling the host? Below is the procedure >>> which is >>> used and hope you did not miss any step while reinstalling. If no, >>> can you >>> try reinstalling again and see if that works ? >>> >>> 1) Move the host to maintenance >>> 2) click on reinstall >>> 3) provide the password >>> 4) uncheck 'automatically configure host firewall' >>> 5) click on 'Deploy' tab >>> 6) click Hosted Engine deployment as 'Deploy' >>> >>> And once the host installation is done, wait till the active score >>> of the >>> host shows 3400 in the general tab then check hosted-engine >>> --vm-status. >>> >>> Thanks >>> kasturi >>> >>> On Mon, Jan 15, 2018 at 4:57 PM, Artem Tambovskiy >>> <artem.tambovskiy@gmail.com> wrote: >>>> >>>> Hello, >>>> >>>> I have uploaded 2 archives with all relevant logs to shared >>>> hosting >>>> files from host 1 (which is currently running all VM's including >>>> hosted_engine) - https://yadi.sk/d/PttRoYV63RTvhK >>>> files from second host - https://yadi.sk/d/UBducEsV3RTvhc >>>> >>>> I have tried to restart both ovirt-ha-agent and ovirt-ha-broker >>>> but it >>>> gives no effect. I have also tried to shutdown hosted_engine VM, >>>> stop >>>> ovirt-ha-agent and ovirt-ha-broker services disconnect storage >>>> and connect >>>> it again - no effect as well. >>>> Also I tried to reinstall second host from WebGUI - this lead to >>>> the >>>> interesting situation - now hosted-engine --vm-status shows >>>> that both >>>> hosts have the same address. 
>>>> >>>> [root@ovirt1 ~]# hosted-engine --vm-status >>>> >>>> --== Host 1 status ==-- >>>> >>>> conf_on_shared_storage : True >>>> Status up-to-date : True >>>> Hostname : ovirt1.telia.ru >>>> Host ID : 1 >>>> Engine status : {"health": "good", "vm": >>>> "up", >>>> "detail": "up"} >>>> Score : 3400 >>>> stopped : False >>>> Local maintenance : False >>>> crc32 : a7758085 >>>> local_conf_timestamp : 259327 >>>> Host timestamp : 259327 >>>> Extra metadata (valid at timestamp): >>>> metadata_parse_version=1 >>>> metadata_feature_version=1 >>>> timestamp=259327 (Mon Jan 15 14:06:48 2018) >>>> host-id=1 >>>> score=3400 >>>> vm_conf_refresh_time=259327 (Mon Jan 15 14:06:48 2018) >>>> conf_on_shared_storage=True >>>> maintenance=False >>>> state=EngineUp >>>> stopped=False >>>> >>>> >>>> --== Host 2 status ==-- >>>> >>>> conf_on_shared_storage : True >>>> Status up-to-date : False >>>> Hostname : ovirt1.telia.ru >>>> Host ID : 2 >>>> Engine status : unknown stale-data >>>> Score : 0 >>>> stopped : True >>>> Local maintenance : False >>>> crc32 : c7037c03 >>>> local_conf_timestamp : 7530 >>>> Host timestamp : 7530 >>>> Extra metadata (valid at timestamp): >>>> metadata_parse_version=1 >>>> metadata_feature_version=1 >>>> timestamp=7530 (Fri Jan 12 16:10:12 2018) >>>> host-id=2 >>>> score=0 >>>> vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018) >>>> conf_on_shared_storage=True >>>> maintenance=False >>>> state=AgentStopped >>>> stopped=True >>>> >>>> Gluster seems working fine. all gluster nodes showing connected >>>> state. >>>> >>>> Any advises on how to resolve this situation are highly >>>> appreciated! >>>> >>>> Regards, >>>> Artem >>>> >>>> >>>> On Mon, Jan 15, 2018 at 11:45 AM, Kasturi Narra >>>> <knarra@redhat.com> >>>> wrote: >>>>> >>>>> Hello Artem, >>>>> >>>>> Can you check if glusterd service is running on host1 >>>>> and all >>>>> the peers are in connected state ? If yes, can you restart >>>>> ovirt-ha-agent >>>>> and broker services and check if things are working fine ? >>>>> >>>>> Thanks >>>>> kasturi >>>>> >>>>> On Sat, Jan 13, 2018 at 12:33 AM, Artem Tambovskiy >>>>> <artem.tambovskiy@gmail.com> wrote: >>>>>> >>>>>> Explored logs on both hosts. >>>>>> broker.log shows no errors. 
>>>>>> >>>>>> agent.log looking not good: >>>>>> >>>>>> on host1 (which running hosted engine) : >>>>>> >>>>>> MainThread::ERROR::2018-01-12 >>>>>> >>>>>> 21:51:03,883::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) >>>>>> Traceback (most recent call last): >>>>>> File >>>>>> >>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", >>>>>> line 191, in _run_agent >>>>>> return action(he) >>>>>> File >>>>>> >>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", >>>>>> line 64, in action_proper >>>>>> return he.start_monitoring() >>>>>> File >>>>>> >>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", >>>>>> line 411, in start_monitoring >>>>>> self._initialize_sanlock() >>>>>> File >>>>>> >>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", >>>>>> line 749, in _initialize_sanlock >>>>>> "Failed to initialize sanlock, the number of errors has" >>>>>> SanlockInitializationError: Failed to initialize sanlock, the >>>>>> number >>>>>> of errors has exceeded the limit >>>>>> >>>>>> MainThread::ERROR::2018-01-12 >>>>>> >>>>>> 21:51:03,884::agent::206::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) >>>>>> Trying to restart agent >>>>>> MainThread::WARNING::2018-01-12 >>>>>> >>>>>> 21:51:08,889::agent::209::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) >>>>>> Restarting agent, attempt '1' >>>>>> MainThread::INFO::2018-01-12 >>>>>> >>>>>> 21:51:08,919::hosted_engine::242::ovirt_hosted_engine_ha.agenthosted_engine.HostedEngine::(_get_hostname) >>>>>> Found certificate common name: ovirt1.telia.ru >>>>>> MainThread::INFO::2018-01-12 >>>>>> >>>>>> 21:51:08,921::hosted_engine::604::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm) >>>>>> Initializing VDSM >>>>>> MainThread::INFO::2018-01-12 >>>>>> >>>>>> 21:51:11,398::hosted_engine::630::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) >>>>>> Connecting the storage >>>>>> MainThread::INFO::2018-01-12 >>>>>> >>>>>> 21:51:11,399::storage_server::220::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(validate_storage_server) >>>>>> Validating storage server >>>>>> MainThread::INFO::2018-01-12 >>>>>> >>>>>> 21:51:13,725::storage_server::239::ovirt_hosted_engine_ha.libstorage_server.StorageServer::(connect_storage_server) >>>>>> Connecting storage server >>>>>> MainThread::INFO::2018-01-12 >>>>>> >>>>>> 21:51:18,390::storage_server::246::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) >>>>>> Connecting storage server >>>>>> MainThread::INFO::2018-01-12 >>>>>> >>>>>> 21:51:18,423::storage_server::253::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) >>>>>> Refreshing the storage domain >>>>>> MainThread::INFO::2018-01-12 >>>>>> >>>>>> 21:51:18,689::hosted_engine::663::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) >>>>>> Preparing images >>>>>> MainThread::INFO::2018-01-12 >>>>>> >>>>>> 21:51:18,690::image::126::ovirt_hosted_engine_ha.lib.image.Image::(prepare_images) >>>>>> Preparing images >>>>>> MainThread::INFO::2018-01-12 >>>>>> >>>>>> 21:51:21,895::hosted_engine::666::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) >>>>>> Refreshing vm.conf >>>>>> MainThread::INFO::2018-01-12 >>>>>> >>>>>> 
21:51:21,895::config::493::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_vm_conf) Reloading vm.conf from the shared storage domain
MainThread::INFO::2018-01-12 21:51:21,896::config::416::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) Trying to get a fresher copy of vm configuration from the OVF_STORE
MainThread::INFO::2018-01-12 21:51:21,896::ovf_store::132::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) Extracting Engine VM OVF from the OVF_STORE
MainThread::INFO::2018-01-12 21:51:21,897::ovf_store::134::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) OVF_STORE volume path: /var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-498fa4b88162/5cabd8e1-5f4b-469e-becc-227469e03f5c/8048cbd7-77e2-4805-9af4-d109fa36dfcf
MainThread::INFO::2018-01-12 21:51:21,915::config::435::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) Found an OVF for HE VM, trying to convert
MainThread::INFO::2018-01-12 21:51:21,918::config::440::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) Got vm.conf from OVF_STORE
MainThread::INFO::2018-01-12 21:51:21,919::hosted_engine::509::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Initializing ha-broker connection
MainThread::INFO::2018-01-12 21:51:21,919::brokerlink::130::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor ping, options {'addr': '80.239.162.97'}
MainThread::INFO::2018-01-12 21:51:21,922::brokerlink::141::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 140547104457680
MainThread::INFO::2018-01-12 21:51:21,922::brokerlink::130::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor mgmt-bridge, options {'use_ssl': 'true', 'bridge_name': 'ovirtmgmt', 'address': '0'}
MainThread::INFO::2018-01-12 21:51:21,936::brokerlink::141::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 140547104458064
MainThread::INFO::2018-01-12 21:51:21,936::brokerlink::130::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor mem-free, options {'use_ssl': 'true', 'address': '0'}
MainThread::INFO::2018-01-12 21:51:21,938::brokerlink::141::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 140547104458448
MainThread::INFO::2018-01-12 21:51:21,939::brokerlink::130::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor cpu-load-no-engine, options {'use_ssl': 'true', 'vm_uuid': 'b366e466-b0ea-4a09-866b-d0248d7523a6', 'address': '0'}
MainThread::INFO::2018-01-12 21:51:21,940::brokerlink::141::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 140547104457552
MainThread::INFO::2018-01-12 21:51:21,941::brokerlink::130::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor engine-health, options {'use_ssl': 'true', 'vm_uuid': 'b366e466-b0ea-4a09-866b-d0248d7523a6', 'address': '0'}
MainThread::INFO::2018-01-12 21:51:21,942::brokerlink::141::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 140547104459792
MainThread::INFO::2018-01-12 21:51:26,951::brokerlink::179::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(set_storage_domain) Success, id 140546772847056
MainThread::INFO::2018-01-12 21:51:26,952::hosted_engine::601::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Broker initialized, all submonitors started
MainThread::INFO::2018-01-12 21:51:27,049::hosted_engine::704::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock) Ensuring lease for lockspace hosted-engine, host id 1 is acquired (file: /var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-498fa4b88162/093faa75-5e33-4559-84fa-1f1f8d48153b/911c7637-b49d-463e-b186-23b404e50769)
MainThread::INFO::2018-01-12 21:53:48,067::hosted_engine::745::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock) Failed to acquire the lock. Waiting '5's before the next attempt
MainThread::INFO::2018-01-12 21:56:14,088::hosted_engine::745::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock) Failed to acquire the lock. Waiting '5's before the next attempt
MainThread::INFO::2018-01-12 21:58:40,111::hosted_engine::745::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock) Failed to acquire the lock. Waiting '5's before the next attempt
MainThread::INFO::2018-01-12 22:01:06,133::hosted_engine::745::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock) Failed to acquire the lock. Waiting '5's before the next attempt
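The host 1 log ends in a sanlock acquisition loop, so before restarting anything it can help to ask sanlock directly what it sees. A minimal check sketch, run on host 1 (this assumes the stock sanlock CLI is installed; the lockspace name and lease path are the ones printed in the log above):

    # lockspaces and resources this sanlock daemon currently knows about
    sanlock client status
    # per-host-id view of the hosted-engine lockspace
    sanlock client host_status -s hosted-engine
    # dump the on-disk lease to see which host id, if any, holds it
    sanlock direct dump /var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-498fa4b88162/093faa75-5e33-4559-84fa-1f1f8d48153b/911c7637-b49d-463e-b186-23b404e50769

If the lease shows up as held by a different live host id, check that the two hosts are not configured with the same hosted-engine host id, since that is a known cause of exactly this acquisition timeout.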
agent.log from the second host:

MainThread::INFO::2018-01-12 22:01:37,241::hosted_engine::630::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Connecting the storage
MainThread::INFO::2018-01-12 22:01:37,242::storage_server::220::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(validate_storage_server) Validating storage server
MainThread::INFO::2018-01-12 22:01:39,540::hosted_engine::639::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Storage domain reported as valid and reconnect is not forced.
MainThread::INFO::2018-01-12 22:01:41,939::hosted_engine::453::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineUnexpectedlyDown (score: 0)
MainThread::INFO::2018-01-12 22:01:52,150::config::493::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_vm_conf) Reloading vm.conf from the shared storage domain
MainThread::INFO::2018-01-12 22:01:52,150::config::416::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) Trying to get a fresher copy of vm configuration from the OVF_STORE
MainThread::INFO::2018-01-12 22:01:52,151::ovf_store::132::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) Extracting Engine VM OVF from the OVF_STORE
MainThread::INFO::2018-01-12 22:01:52,153::ovf_store::134::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) OVF_STORE volume path: /var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-498fa4b88162/5cabd8e1-5f4b-469e-becc-227469e03f5c/8048cbd7-77e2-4805-9af4-d109fa36dfcf
MainThread::INFO::2018-01-12 22:01:52,174::config::435::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) Found an OVF for HE VM, trying to convert
MainThread::INFO::2018-01-12 22:01:52,179::config::440::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) Got vm.conf from OVF_STORE
MainThread::INFO::2018-01-12 22:01:52,189::hosted_engine::604::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm) Initializing VDSM
MainThread::INFO::2018-01-12 22:01:54,586::hosted_engine::630::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Connecting the storage
MainThread::INFO::2018-01-12 22:01:54,587::storage_server::220::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(validate_storage_server) Validating storage server
MainThread::INFO::2018-01-12 22:01:56,903::hosted_engine::639::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Storage domain reported as valid and reconnect is not forced.
MainThread::INFO::2018-01-12 22:01:59,299::states::682::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score) Score is 0 due to unexpected vm shutdown at Fri Jan 12 21:57:48 2018
MainThread::INFO::2018-01-12 22:01:59,299::hosted_engine::453::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineUnexpectedlyDown (score: 0)
MainThread::INFO::2018-01-12 22:02:09,659::config::493::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_vm_conf) Reloading vm.conf from the shared storage domain
MainThread::INFO::2018-01-12 22:02:09,659::config::416::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) Trying to get a fresher copy of vm configuration from the OVF_STORE
MainThread::INFO::2018-01-12 22:02:09,660::ovf_store::132::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) Extracting Engine VM OVF from the OVF_STORE
MainThread::INFO::2018-01-12 22:02:09,663::ovf_store::134::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) OVF_STORE volume path: /var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-498fa4b88162/5cabd8e1-5f4b-469e-becc-227469e03f5c/8048cbd7-77e2-4805-9af4-d109fa36dfcf
MainThread::INFO::2018-01-12 22:02:09,683::config::435::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) Found an OVF for HE VM, trying to convert
MainThread::INFO::2018-01-12 22:02:09,688::config::440::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) Got vm.conf from OVF_STORE
MainThread::INFO::2018-01-12 22:02:09,698::hosted_engine::604::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm) Initializing VDSM
MainThread::INFO::2018-01-12 22:02:12,112::hosted_engine::630::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Connecting the storage
MainThread::INFO::2018-01-12 22:02:12,113::storage_server::220::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(validate_storage_server) Validating storage server
MainThread::INFO::2018-01-12 22:02:14,444::hosted_engine::639::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Storage domain reported as valid and reconnect is not forced.
MainThread::INFO::2018-01-12 22:02:16,859::states::682::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score) Score is 0 due to unexpected vm shutdown at Fri Jan 12 21:57:47 2018
MainThread::INFO::2018-01-12 22:02:16,859::hosted_engine::453::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineUnexpectedlyDown (score: 0)
MainThread::INFO::2018-01-12 22:02:27,100::config::493::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_vm_conf) Reloading vm.conf from the shared storage domain
MainThread::INFO::2018-01-12 22:02:27,100::config::416::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) Trying to get a fresher copy of vm configuration from the OVF_STORE
MainThread::INFO::2018-01-12 22:02:27,101::ovf_store::132::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) Extracting Engine VM OVF from the OVF_STORE
MainThread::INFO::2018-01-12 22:02:27,103::ovf_store::134::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) OVF_STORE volume path: /var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-498fa4b88162/5cabd8e1-5f4b-469e-becc-227469e03f5c/8048cbd7-77e2-4805-9af4-d109fa36dfcf
MainThread::INFO::2018-01-12 22:02:27,125::config::435::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) Found an OVF for HE VM, trying to convert
MainThread::INFO::2018-01-12 22:02:27,129::config::440::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) Got vm.conf from OVF_STORE
MainThread::INFO::2018-01-12 22:02:27,130::states::667::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Engine down, local host does not have best score
MainThread::INFO::2018-01-12 22:02:27,139::hosted_engine::604::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm) Initializing VDSM
MainThread::INFO::2018-01-12 22:02:29,584::hosted_engine::630::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Connecting the storage
MainThread::INFO::2018-01-12 22:02:29,586::storage_server::220::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(validate_storage_server) Validating storage server

Any suggestions on how to resolve this?

regards,
Artem
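For what it is worth, one recovery path that is commonly suggested for a stuck hosted-engine lockspace, sketched here with the caveat that it assumes your ovirt-hosted-engine-setup version provides --reinitialize-lockspace and that a short HA-monitoring outage is acceptable:

    # on one host: keep the agents from acting while you work
    hosted-engine --set-maintenance --mode=global
    # on both hosts
    systemctl stop ovirt-ha-agent ovirt-ha-broker
    # on one host: recreate the sanlock lockspace on the shared storage
    hosted-engine --reinitialize-lockspace
    # on both hosts
    systemctl start ovirt-ha-broker ovirt-ha-agent
    # once hosted-engine --vm-status looks sane again
    hosted-engine --set-maintenance --mode=none

Treat this as a sketch rather than a recipe: if the lease is actually held by a live host (see the sanlock checks earlier), reinitializing the lockspace is the wrong tool.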
On Fri, Jan 12, 2018 at 7:08 PM, Artem Tambovskiy <artem.tambovskiy@gmail.com> wrote:

> Trying to fix one thing I broke another :(
>
> I fixed mnt_options for the hosted-engine storage domain and installed the
> latest security patches on my hosts and the hosted engine. All VMs are up
> and running, but hosted-engine --vm-status reports issues:
>
> [root@ovirt1 ~]# hosted-engine --vm-status
>
> --== Host 1 status ==--
>
> conf_on_shared_storage             : True
> Status up-to-date                  : False
> Hostname                           : ovirt2
> Host ID                            : 1
> Engine status                      : unknown stale-data
> Score                              : 0
> stopped                            : False
> Local maintenance                  : False
> crc32                              : 193164b8
> local_conf_timestamp               : 8350
> Host timestamp                     : 8350
> Extra metadata (valid at timestamp):
>     metadata_parse_version=1
>     metadata_feature_version=1
>     timestamp=8350 (Fri Jan 12 19:03:54 2018)
>     host-id=1
>     score=0
>     vm_conf_refresh_time=8350 (Fri Jan 12 19:03:54 2018)
>     conf_on_shared_storage=True
>     maintenance=False
>     state=EngineUnexpectedlyDown
>     stopped=False
>     timeout=Thu Jan  1 05:24:43 1970
>
> --== Host 2 status ==--
>
> conf_on_shared_storage             : True
> Status up-to-date                  : False
> Hostname                           : ovirt1.telia.ru
> Host ID                            : 2
> Engine status                      : unknown stale-data
> Score                              : 0
> stopped                            : True
> Local maintenance                  : False
> crc32                              : c7037c03
> local_conf_timestamp               : 7530
> Host timestamp                     : 7530
> Extra metadata (valid at timestamp):
>     metadata_parse_version=1
>     metadata_feature_version=1
>     timestamp=7530 (Fri Jan 12 16:10:12 2018)
>     host-id=2
>     score=0
>     vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)
>     conf_on_shared_storage=True
>     maintenance=False
>     state=AgentStopped
>     stopped=True
> [root@ovirt1 ~]#
>
> From the second host the situation looks a bit different:
>
> [root@ovirt2 ~]# hosted-engine --vm-status
>
> --== Host 1 status ==--
>
> conf_on_shared_storage             : True
> Status up-to-date                  : True
> Hostname                           : ovirt2
> Host ID                            : 1
> Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
> Score                              : 0
> stopped                            : False
> Local maintenance                  : False
> crc32                              : 78eabdb6
> local_conf_timestamp               : 8403
> Host timestamp                     : 8402
> Extra metadata (valid at timestamp):
>     metadata_parse_version=1
>     metadata_feature_version=1
>     timestamp=8402 (Fri Jan 12 19:04:47 2018)
>     host-id=1
>     score=0
>     vm_conf_refresh_time=8403 (Fri Jan 12 19:04:47 2018)
>     conf_on_shared_storage=True
>     maintenance=False
>     state=EngineUnexpectedlyDown
>     stopped=False
>     timeout=Thu Jan  1 05:24:43 1970
>
> --== Host 2 status ==--
>
> conf_on_shared_storage             : True
> Status up-to-date                  : False
> Hostname                           : ovirt1.telia.ru
> Host ID                            : 2
> Engine status                      : unknown stale-data
> Score                              : 0
> stopped                            : True
> Local maintenance                  : False
> crc32                              : c7037c03
> local_conf_timestamp               : 7530
> Host timestamp                     : 7530
> Extra metadata (valid at timestamp):
>     metadata_parse_version=1
>     metadata_feature_version=1
>     timestamp=7530 (Fri Jan 12 16:10:12 2018)
>     host-id=2
>     score=0
>     vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)
>     conf_on_shared_storage=True
>     maintenance=False
>     state=AgentStopped
>     stopped=True
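A side note on reading the output above: "Status up-to-date : False" and "unknown stale-data" mean the agent on that host has not refreshed its metadata block on the shared storage recently, so the other hosts only see old data. Given host 2 reports state=AgentStopped, a first hedged step is simply to confirm the HA daemons are running and restart them if not (service names as shipped by ovirt-hosted-engine-ha):

    # on each host: are the HA daemons alive?
    systemctl status ovirt-ha-broker ovirt-ha-agent
    # if stopped or flapping, restart the broker first, then the agent
    systemctl restart ovirt-ha-broker ovirt-ha-agent
    # give the agents a minute or two, then re-check
    hosted-engine --vm-status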
> WebGUI shows that the engine is running on host ovirt1. Gluster looks fine:
>
> [root@ovirt1 ~]# gluster volume status engine
> Status of volume: engine
> Gluster process                       TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick ovirt1.telia.ru:/oVirt/engine   49169     0          Y       3244
> Brick ovirt2.telia.ru:/oVirt/engine   49179     0          Y       20372
> Brick ovirt3.telia.ru:/oVirt/engine   49206     0          Y       16609
> Self-heal Daemon on localhost         N/A       N/A        Y       117868
> Self-heal Daemon on ovirt2.telia.ru   N/A       N/A        Y       20521
> Self-heal Daemon on ovirt3            N/A       N/A        Y       25093
>
> Task Status of Volume engine
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
> How to resolve this issue?
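Note that "gluster volume status" only confirms that bricks and self-heal daemons are online; it says nothing about pending heals, which can still stall sanlock I/O on the engine volume. A quick additional check (a sketch; the volume name is taken from the output above):

    # entries still waiting to be healed, per brick
    gluster volume heal engine info
    # entries in split-brain, if any
    gluster volume heal engine info split-brain

If either list is non-empty for the hosted-engine images, the lock failures on host 1 may simply be waiting on the heal to finish.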
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
participants (4):
- Artem Tambovskiy
- Derek Atkins
- Kasturi Narra
- Martin Sivak