Hello Artem,Can you check if glusterd service is running on host1 and all the peers are in connected state ? If yes, can you restart ovirt-ha-agent and broker services and check if things are working fine ?ThankskasturiOn Sat, Jan 13, 2018 at 12:33 AM, Artem Tambovskiy <artem.tambovskiy@gmail.com> wrote:Explored logs on both hosts.broker.log shows no errors.agent.log looking not good:
on host1 (which running hosted engine) :MainThread::ERROR::2018-01-12 21:51:03,883::agent::205::ovirt_hosted_engine_ha.agent.agent .Agent::(_run_agent) Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/ agent/agent.py", line 191, in _run_agent return action(he)File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/ agent/agent.py", line 64, in action_proper return he.start_monitoring()File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/ agent/hosted_engine.py", line 411, in start_monitoring self._initialize_sanlock()File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/ agent/hosted_engine.py", line 749, in _initialize_sanlock "Failed to initialize sanlock, the number of errors has"SanlockInitializationError: Failed to initialize sanlock, the number of errors has exceeded the limitMainThread::ERROR::2018-01-12 21:51:03,884::agent::206::ovirt_hosted_engine_ha.agent.agent .Agent::(_run_agent) Trying to restart agent MainThread::WARNING::2018-01-12 21:51:08,889::agent::209::ovir t_hosted_engine_ha.agent.agent .Agent::(_run_agent) Restarting agent, attempt '1' MainThread::INFO::2018-01-12 21:51:08,919::hosted_engine::242::ovirt_hosted_engine_ha.age nt.hosted_engine.HostedEngine: :(_get_hostname) Found certificate common name: ovirt1.telia.ru MainThread::INFO::2018-01-12 21:51:08,921::hosted_engine::604::ovirt_hosted_engine_ha.age nt.hosted_engine.HostedEngine: :(_initialize_vdsm) Initializing VDSM MainThread::INFO::2018-01-12 21:51:11,398::hosted_engine::630::ovirt_hosted_engine_ha.age nt.hosted_engine.HostedEngine: :(_initialize_storage_images) Connecting the storage MainThread::INFO::2018-01-12 21:51:11,399::storage_server::220::ovirt_hosted_engine_ha.li b.storage_server.StorageServer ::(validate_storage_server) Validating storage server MainThread::INFO::2018-01-12 21:51:13,725::storage_server::239::ovirt_hosted_engine_ha.li b.storage_server.StorageServer ::(connect_storage_server) Connecting storage server MainThread::INFO::2018-01-12 21:51:18,390::storage_server::246::ovirt_hosted_engine_ha.li b.storage_server.StorageServer ::(connect_storage_server) Connecting storage server MainThread::INFO::2018-01-12 21:51:18,423::storage_server::253::ovirt_hosted_engine_ha.li b.storage_server.StorageServer ::(connect_storage_server) Refreshing the storage domain MainThread::INFO::2018-01-12 21:51:18,689::hosted_engine::663::ovirt_hosted_engine_ha.age nt.hosted_engine.HostedEngine: :(_initialize_storage_images) Preparing images MainThread::INFO::2018-01-12 21:51:18,690::image::126::ovirt_hosted_engine_ha.lib.image. Image::(prepare_images) Preparing images MainThread::INFO::2018-01-12 21:51:21,895::hosted_engine::666::ovirt_hosted_engine_ha.age nt.hosted_engine.HostedEngine: :(_initialize_storage_images) Refreshing vm.conf MainThread::INFO::2018-01-12 21:51:21,895::config::493::ovirt_hosted_engine_ha.agent.host ed_engine.HostedEngine.config: :(refresh_vm_conf) Reloading vm.conf from the shared storage domain MainThread::INFO::2018-01-12 21:51:21,896::config::416::ovirt_hosted_engine_ha.agent.host ed_engine.HostedEngine.config: :(_get_vm_conf_content_from_ ovf_store) Trying to get a fresher copy of vm configuration from the OVF_STORE MainThread::INFO::2018-01-12 21:51:21,896::ovf_store::132::ovirt_hosted_engine_ha.lib.ovf .ovf_store.OVFStore::(getEngin eVMOVF) Extracting Engine VM OVF from the OVF_STORE MainThread::INFO::2018-01-12 21:51:21,897::ovf_store::134::ovirt_hosted_engine_ha.lib.ovf .ovf_store.OVFStore::(getEngin eVMOVF) OVF_STORE volume path: /var/run/vdsm/storage/4a7f8717 -9bb0-4d80-8016-498fa4b88162/ 5cabd8e1-5f4b-469e-becc- 227469e03f5c/8048cbd7-77e2- 4805-9af4-d109fa36dfcf MainThread::INFO::2018-01-12 21:51:21,915::config::435::ovirt_hosted_engine_ha.agent.host ed_engine.HostedEngine.config: :(_get_vm_conf_content_from_ ovf_store) Found an OVF for HE VM, trying to convert MainThread::INFO::2018-01-12 21:51:21,918::config::440::ovirt_hosted_engine_ha.agent.host ed_engine.HostedEngine.config: :(_get_vm_conf_content_from_ ovf_store) Got vm.conf from OVF_STORE MainThread::INFO::2018-01-12 21:51:21,919::hosted_engine::509::ovirt_hosted_engine_ha.age nt.hosted_engine.HostedEngine: :(_initialize_broker) Initializing ha-broker connection MainThread::INFO::2018-01-12 21:51:21,919::brokerlink::130::ovirt_hosted_engine_ha.lib.br okerlink.BrokerLink::(start_mo nitor) Starting monitor ping, options {'addr': '80.239.162.97'} MainThread::INFO::2018-01-12 21:51:21,922::brokerlink::141::ovirt_hosted_engine_ha.lib.br okerlink.BrokerLink::(start_mo nitor) Success, id 140547104457680 MainThread::INFO::2018-01-12 21:51:21,922::brokerlink::130::ovirt_hosted_engine_ha.lib.br okerlink.BrokerLink::(start_mo nitor) Starting monitor mgmt-bridge, options {'use_ssl': 'true', 'bridge_name': 'ovirtmgmt', 'address': '0'} MainThread::INFO::2018-01-12 21:51:21,936::brokerlink::141::ovirt_hosted_engine_ha.lib.br okerlink.BrokerLink::(start_mo nitor) Success, id 140547104458064 MainThread::INFO::2018-01-12 21:51:21,936::brokerlink::130::ovirt_hosted_engine_ha.lib.br okerlink.BrokerLink::(start_mo nitor) Starting monitor mem-free, options {'use_ssl': 'true', 'address': '0'} MainThread::INFO::2018-01-12 21:51:21,938::brokerlink::141::ovirt_hosted_engine_ha.lib.br okerlink.BrokerLink::(start_mo nitor) Success, id 140547104458448 MainThread::INFO::2018-01-12 21:51:21,939::brokerlink::130::ovirt_hosted_engine_ha.lib.br okerlink.BrokerLink::(start_mo nitor) Starting monitor cpu-load-no-engine, options {'use_ssl': 'true', 'vm_uuid': 'b366e466-b0ea-4a09-866b-d0248 d7523a6', 'address': '0'} MainThread::INFO::2018-01-12 21:51:21,940::brokerlink::141::ovirt_hosted_engine_ha.lib.br okerlink.BrokerLink::(start_mo nitor) Success, id 140547104457552 MainThread::INFO::2018-01-12 21:51:21,941::brokerlink::130::ovirt_hosted_engine_ha.lib.br okerlink.BrokerLink::(start_mo nitor) Starting monitor engine-health, options {'use_ssl': 'true', 'vm_uuid': 'b366e466-b0ea-4a09-866b-d0248 d7523a6', 'address': '0'} MainThread::INFO::2018-01-12 21:51:21,942::brokerlink::141::ovirt_hosted_engine_ha.lib.br okerlink.BrokerLink::(start_mo nitor) Success, id 140547104459792 MainThread::INFO::2018-01-12 21:51:26,951::brokerlink::179::ovirt_hosted_engine_ha.lib.br okerlink.BrokerLink::(set_stor age_domain) Success, id 140546772847056 MainThread::INFO::2018-01-12 21:51:26,952::hosted_engine::601::ovirt_hosted_engine_ha.age nt.hosted_engine.HostedEngine: :(_initialize_broker) Broker initialized, all submonitors started MainThread::INFO::2018-01-12 21:51:27,049::hosted_engine::704::ovirt_hosted_engine_ha.age nt.hosted_engine.HostedEngine: :(_initialize_sanlock) Ensuring lease for lockspace hosted-engine, host id 1 is acquired (file: /var/run/vdsm/storage/4a7f8717 -9bb0-4d80-8016-498fa4b88162/ 093faa75-5e33-4559-84fa- 1f1f8d48153b/911c7637-b49d- 463e-b186-23b404e50769) MainThread::INFO::2018-01-12 21:53:48,067::hosted_engine::745::ovirt_hosted_engine_ha.age nt.hosted_engine.HostedEngine: :(_initialize_sanlock) Failed to acquire the lock. Waiting '5's before the next attempt MainThread::INFO::2018-01-12 21:56:14,088::hosted_engine::745::ovirt_hosted_engine_ha.age nt.hosted_engine.HostedEngine: :(_initialize_sanlock) Failed to acquire the lock. Waiting '5's before the next attempt MainThread::INFO::2018-01-12 21:58:40,111::hosted_engine::745::ovirt_hosted_engine_ha.age nt.hosted_engine.HostedEngine: :(_initialize_sanlock) Failed to acquire the lock. Waiting '5's before the next attempt MainThread::INFO::2018-01-12 22:01:06,133::hosted_engine::745::ovirt_hosted_engine_ha.age nt.hosted_engine.HostedEngine: :(_initialize_sanlock) Failed to acquire the lock. Waiting '5's before the next attempt
agent.log from second hostMainThread::INFO::2018-01-12 22:01:37,241::hosted_engine::630::ovirt_hosted_engine_ha.age nt.hosted_engine.HostedEngine: :(_initialize_storage_images) Connecting the storage MainThread::INFO::2018-01-12 22:01:37,242::storage_server::220::ovirt_hosted_engine_ha.li b.storage_server.StorageServer ::(validate_storage_server) Validating storage server MainThread::INFO::2018-01-12 22:01:39,540::hosted_engine::639::ovirt_hosted_engine_ha.age nt.hosted_engine.HostedEngine: :(_initialize_storage_images) Storage domain reported as valid and reconnect is not forced. MainThread::INFO::2018-01-12 22:01:41,939::hosted_engine::453::ovirt_hosted_engine_ha.age nt.hosted_engine.HostedEngine: :(start_monitoring) Current state EngineUnexpectedlyDown (score: 0) MainThread::INFO::2018-01-12 22:01:52,150::config::493::ovirt_hosted_engine_ha.agent.host ed_engine.HostedEngine.config: :(refresh_vm_conf) Reloading vm.conf from the shared storage domain MainThread::INFO::2018-01-12 22:01:52,150::config::416::ovirt_hosted_engine_ha.agent.host ed_engine.HostedEngine.config: :(_get_vm_conf_content_from_ ovf_store) Trying to get a fresher copy of vm configuration from the OVF_STORE MainThread::INFO::2018-01-12 22:01:52,151::ovf_store::132::ovirt_hosted_engine_ha.lib.ovf .ovf_store.OVFStore::(getEngin eVMOVF) Extracting Engine VM OVF from the OVF_STORE MainThread::INFO::2018-01-12 22:01:52,153::ovf_store::134::ovirt_hosted_engine_ha.lib.ovf .ovf_store.OVFStore::(getEngin eVMOVF) OVF_STORE volume path: /var/run/vdsm/storage/4a7f8717 -9bb0-4d80-8016-498fa4b88162/ 5cabd8e1-5f4b-469e-becc- 227469e03f5c/8048cbd7-77e2- 4805-9af4-d109fa36dfcf MainThread::INFO::2018-01-12 22:01:52,174::config::435::ovirt_hosted_engine_ha.agent.host ed_engine.HostedEngine.config: :(_get_vm_conf_content_from_ ovf_store) Found an OVF for HE VM, trying to convert MainThread::INFO::2018-01-12 22:01:52,179::config::440::ovirt_hosted_engine_ha.agent.host ed_engine.HostedEngine.config: :(_get_vm_conf_content_from_ ovf_store) Got vm.conf from OVF_STORE MainThread::INFO::2018-01-12 22:01:52,189::hosted_engine::604::ovirt_hosted_engine_ha.age nt.hosted_engine.HostedEngine: :(_initialize_vdsm) Initializing VDSM MainThread::INFO::2018-01-12 22:01:54,586::hosted_engine::630::ovirt_hosted_engine_ha.age nt.hosted_engine.HostedEngine: :(_initialize_storage_images) Connecting the storage MainThread::INFO::2018-01-12 22:01:54,587::storage_server::220::ovirt_hosted_engine_ha.li b.storage_server.StorageServer ::(validate_storage_server) Validating storage server MainThread::INFO::2018-01-12 22:01:56,903::hosted_engine::639::ovirt_hosted_engine_ha.age nt.hosted_engine.HostedEngine: :(_initialize_storage_images) Storage domain reported as valid and reconnect is not forced. MainThread::INFO::2018-01-12 22:01:59,299::states::682::ovirt_hosted_engine_ha.agent.host ed_engine.HostedEngine::(score ) Score is 0 due to unexpected vm shutdown at Fri Jan 12 21:57:48 2018 MainThread::INFO::2018-01-12 22:01:59,299::hosted_engine::453::ovirt_hosted_engine_ha.age nt.hosted_engine.HostedEngine: :(start_monitoring) Current state EngineUnexpectedlyDown (score: 0) MainThread::INFO::2018-01-12 22:02:09,659::config::493::ovirt_hosted_engine_ha.agent.host ed_engine.HostedEngine.config: :(refresh_vm_conf) Reloading vm.conf from the shared storage domain MainThread::INFO::2018-01-12 22:02:09,659::config::416::ovirt_hosted_engine_ha.agent.host ed_engine.HostedEngine.config: :(_get_vm_conf_content_from_ ovf_store) Trying to get a fresher copy of vm configuration from the OVF_STORE MainThread::INFO::2018-01-12 22:02:09,660::ovf_store::132::ovirt_hosted_engine_ha.lib.ovf .ovf_store.OVFStore::(getEngin eVMOVF) Extracting Engine VM OVF from the OVF_STORE MainThread::INFO::2018-01-12 22:02:09,663::ovf_store::134::ovirt_hosted_engine_ha.lib.ovf .ovf_store.OVFStore::(getEngin eVMOVF) OVF_STORE volume path: /var/run/vdsm/storage/4a7f8717 -9bb0-4d80-8016-498fa4b88162/ 5cabd8e1-5f4b-469e-becc- 227469e03f5c/8048cbd7-77e2- 4805-9af4-d109fa36dfcf MainThread::INFO::2018-01-12 22:02:09,683::config::435::ovirt_hosted_engine_ha.agent.host ed_engine.HostedEngine.config: :(_get_vm_conf_content_from_ ovf_store) Found an OVF for HE VM, trying to convert MainThread::INFO::2018-01-12 22:02:09,688::config::440::ovirt_hosted_engine_ha.agent.host ed_engine.HostedEngine.config: :(_get_vm_conf_content_from_ ovf_store) Got vm.conf from OVF_STORE MainThread::INFO::2018-01-12 22:02:09,698::hosted_engine::604::ovirt_hosted_engine_ha.age nt.hosted_engine.HostedEngine: :(_initialize_vdsm) Initializing VDSM MainThread::INFO::2018-01-12 22:02:12,112::hosted_engine::630::ovirt_hosted_engine_ha.age nt.hosted_engine.HostedEngine: :(_initialize_storage_images) Connecting the storage MainThread::INFO::2018-01-12 22:02:12,113::storage_server::220::ovirt_hosted_engine_ha.li b.storage_server.StorageServer ::(validate_storage_server) Validating storage server MainThread::INFO::2018-01-12 22:02:14,444::hosted_engine::639::ovirt_hosted_engine_ha.age nt.hosted_engine.HostedEngine: :(_initialize_storage_images) Storage domain reported as valid and reconnect is not forced. MainThread::INFO::2018-01-12 22:02:16,859::states::682::ovirt_hosted_engine_ha.agent.host ed_engine.HostedEngine::(score ) Score is 0 due to unexpected vm shutdown at Fri Jan 12 21:57:47 2018 MainThread::INFO::2018-01-12 22:02:16,859::hosted_engine::453::ovirt_hosted_engine_ha.age nt.hosted_engine.HostedEngine: :(start_monitoring) Current state EngineUnexpectedlyDown (score: 0) MainThread::INFO::2018-01-12 22:02:27,100::config::493::ovirt_hosted_engine_ha.agent.host ed_engine.HostedEngine.config: :(refresh_vm_conf) Reloading vm.conf from the shared storage domain MainThread::INFO::2018-01-12 22:02:27,100::config::416::ovirt_hosted_engine_ha.agent.host ed_engine.HostedEngine.config: :(_get_vm_conf_content_from_ ovf_store) Trying to get a fresher copy of vm configuration from the OVF_STORE MainThread::INFO::2018-01-12 22:02:27,101::ovf_store::132::ovirt_hosted_engine_ha.lib.ovf .ovf_store.OVFStore::(getEngin eVMOVF) Extracting Engine VM OVF from the OVF_STORE MainThread::INFO::2018-01-12 22:02:27,103::ovf_store::134::ovirt_hosted_engine_ha.lib.ovf .ovf_store.OVFStore::(getEngin eVMOVF) OVF_STORE volume path: /var/run/vdsm/storage/4a7f8717 -9bb0-4d80-8016-498fa4b88162/ 5cabd8e1-5f4b-469e-becc- 227469e03f5c/8048cbd7-77e2- 4805-9af4-d109fa36dfcf MainThread::INFO::2018-01-12 22:02:27,125::config::435::ovirt_hosted_engine_ha.agent.host ed_engine.HostedEngine.config: :(_get_vm_conf_content_from_ ovf_store) Found an OVF for HE VM, trying to convert MainThread::INFO::2018-01-12 22:02:27,129::config::440::ovirt_hosted_engine_ha.agent.host ed_engine.HostedEngine.config: :(_get_vm_conf_content_from_ ovf_store) Got vm.conf from OVF_STORE MainThread::INFO::2018-01-12 22:02:27,130::states::667::ovirt_hosted_engine_ha.agent.host ed_engine.HostedEngine::(consu me) Engine down, local host does not have best score MainThread::INFO::2018-01-12 22:02:27,139::hosted_engine::604::ovirt_hosted_engine_ha.age nt.hosted_engine.HostedEngine: :(_initialize_vdsm) Initializing VDSM MainThread::INFO::2018-01-12 22:02:29,584::hosted_engine::630::ovirt_hosted_engine_ha.age nt.hosted_engine.HostedEngine: :(_initialize_storage_images) Connecting the storage MainThread::INFO::2018-01-12 22:02:29,586::storage_server::220::ovirt_hosted_engine_ha.li b.storage_server.StorageServer ::(validate_storage_server) Validating storage server
Any suggestions how to resolve this .
regards,ArtemOn Fri, Jan 12, 2018 at 7:08 PM, Artem Tambovskiy <artem.tambovskiy@gmail.com> wrote:Trying to fix one thing I broke another :(
I fixed mnt_options for hosted engine storage domain and installed latest security patches to my hosts and hosted engine. All VM's up and running, but hosted_engine --vm-status reports about issues:[root@ovirt1 ~]# hosted-engine --vm-status--== Host 1 status ==--conf_on_shared_storage : TrueStatus up-to-date : FalseHostname : ovirt2Host ID : 1Engine status : unknown stale-dataScore : 0stopped : FalseLocal maintenance : Falsecrc32 : 193164b8local_conf_timestamp : 8350Host timestamp : 8350Extra metadata (valid at timestamp):metadata_parse_version=1metadata_feature_version=1timestamp=8350 (Fri Jan 12 19:03:54 2018)host-id=1score=0vm_conf_refresh_time=8350 (Fri Jan 12 19:03:54 2018)conf_on_shared_storage=Truemaintenance=Falsestate=EngineUnexpectedlyDownstopped=Falsetimeout=Thu Jan 1 05:24:43 1970--== Host 2 status ==--conf_on_shared_storage : TrueStatus up-to-date : FalseHostname : ovirt1.telia.ruHost ID : 2Engine status : unknown stale-dataScore : 0stopped : TrueLocal maintenance : Falsecrc32 : c7037c03local_conf_timestamp : 7530Host timestamp : 7530Extra metadata (valid at timestamp):metadata_parse_version=1metadata_feature_version=1timestamp=7530 (Fri Jan 12 16:10:12 2018)host-id=2score=0vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)conf_on_shared_storage=Truemaintenance=Falsestate=AgentStoppedstopped=True[root@ovirt1 ~]#from second host situation looks a bit different:[root@ovirt2 ~]# hosted-engine --vm-status--== Host 1 status ==--conf_on_shared_storage : TrueStatus up-to-date : TrueHostname : ovirt2Host ID : 1Engine status : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}Score : 0stopped : FalseLocal maintenance : Falsecrc32 : 78eabdb6local_conf_timestamp : 8403Host timestamp : 8402Extra metadata (valid at timestamp):metadata_parse_version=1metadata_feature_version=1timestamp=8402 (Fri Jan 12 19:04:47 2018)host-id=1score=0vm_conf_refresh_time=8403 (Fri Jan 12 19:04:47 2018)conf_on_shared_storage=Truemaintenance=Falsestate=EngineUnexpectedlyDownstopped=Falsetimeout=Thu Jan 1 05:24:43 1970--== Host 2 status ==--conf_on_shared_storage : TrueStatus up-to-date : FalseHostname : ovirt1.telia.ruHost ID : 2Engine status : unknown stale-dataScore : 0stopped : TrueLocal maintenance : Falsecrc32 : c7037c03local_conf_timestamp : 7530Host timestamp : 7530Extra metadata (valid at timestamp):metadata_parse_version=1metadata_feature_version=1timestamp=7530 (Fri Jan 12 16:10:12 2018)host-id=2score=0vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)conf_on_shared_storage=Truemaintenance=Falsestate=AgentStoppedstopped=TrueWebGUI shows that engine running on host ovirt1.
Gluster looks fine[root@ovirt1 ~]# gluster volume status engineStatus of volume: engineGluster process TCP Port RDMA Port Online Pid------------------------------------------------------------ ------------------ Brick ovirt1.telia.ru:/oVirt/engine 49169 0 Y 3244Brick ovirt2.telia.ru:/oVirt/engine 49179 0 Y 20372Brick ovirt3.telia.ru:/oVirt/engine 49206 0 Y 16609Self-heal Daemon on localhost N/A N/A Y 117868Self-heal Daemon on ovirt2.telia.ru N/A N/A Y 20521Self-heal Daemon on ovirt3 N/A N/A Y 25093Task Status of Volume engine------------------------------------------------------------ ------------------ There are no active volume tasks
How to resolve this issue?
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users