Hello Artem,
Can you check if glusterd service is running on host1 and all the
peers are in connected state ? If yes, can you restart ovirt-ha-agent and
broker services and check if things are working fine ?
Thanks
kasturi
On Sat, Jan 13, 2018 at 12:33 AM, Artem Tambovskiy <
artem.tambovskiy(a)gmail.com> wrote:
Explored logs on both hosts.
broker.log shows no errors.
agent.log looking not good:
on host1 (which running hosted engine) :
MainThread::ERROR::2018-01-12 21:51:03,883::agent::205::
ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Traceback (most
recent call last):
File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py",
line 191, in _run_agent
return action(he)
File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py",
line 64, in action_proper
return he.start_monitoring()
File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
line 411, in start_monitoring
self._initialize_sanlock()
File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
line 749, in _initialize_sanlock
"Failed to initialize sanlock, the number of errors has"
SanlockInitializationError: Failed to initialize sanlock, the number of
errors has exceeded the limit
MainThread::ERROR::2018-01-12 21:51:03,884::agent::206::
ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart
agent
MainThread::WARNING::2018-01-12 21:51:08,889::agent::209::
ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Restarting agent,
attempt '1'
MainThread::INFO::2018-01-12 21:51:08,919::hosted_engine::
242::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname)
Found certificate common name: ovirt1.telia.ru
MainThread::INFO::2018-01-12 21:51:08,921::hosted_engine::
604::ovirt_hosted_engine_ha.agent.hosted_engine.
HostedEngine::(_initialize_vdsm) Initializing VDSM
MainThread::INFO::2018-01-12 21:51:11,398::hosted_engine::
630::ovirt_hosted_engine_ha.agent.hosted_engine.
HostedEngine::(_initialize_storage_images) Connecting the storage
MainThread::INFO::2018-01-12 21:51:11,399::storage_server::
220::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(validate_storage_server)
Validating storage server
MainThread::INFO::2018-01-12 21:51:13,725::storage_server::
239::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
Connecting storage server
MainThread::INFO::2018-01-12 21:51:18,390::storage_server::
246::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
Connecting storage server
MainThread::INFO::2018-01-12 21:51:18,423::storage_server::
253::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
Refreshing the storage domain
MainThread::INFO::2018-01-12 21:51:18,689::hosted_engine::
663::ovirt_hosted_engine_ha.agent.hosted_engine.
HostedEngine::(_initialize_storage_images) Preparing images
MainThread::INFO::2018-01-12 21:51:18,690::image::126::
ovirt_hosted_engine_ha.lib.image.Image::(prepare_images) Preparing images
MainThread::INFO::2018-01-12 21:51:21,895::hosted_engine::
666::ovirt_hosted_engine_ha.agent.hosted_engine.
HostedEngine::(_initialize_storage_images) Refreshing vm.conf
MainThread::INFO::2018-01-12 21:51:21,895::config::493::
ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_vm_conf)
Reloading vm.conf from the shared storage domain
MainThread::INFO::2018-01-12 21:51:21,896::config::416::
ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.
config::(_get_vm_conf_content_from_ovf_store) Trying to get a fresher
copy of vm configuration from the OVF_STORE
MainThread::INFO::2018-01-12 21:51:21,896::ovf_store::132::
ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
Extracting Engine VM OVF from the OVF_STORE
MainThread::INFO::2018-01-12 21:51:21,897::ovf_store::134::
ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
OVF_STORE volume path: /var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-
498fa4b88162/5cabd8e1-5f4b-469e-becc-227469e03f5c/8048cbd7-77e2-4805-9af4-
d109fa36dfcf
MainThread::INFO::2018-01-12 21:51:21,915::config::435::
ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.
config::(_get_vm_conf_content_from_ovf_store) Found an OVF for HE VM,
trying to convert
MainThread::INFO::2018-01-12 21:51:21,918::config::440::
ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.
config::(_get_vm_conf_content_from_ovf_store) Got vm.conf from OVF_STORE
MainThread::INFO::2018-01-12 21:51:21,919::hosted_engine::
509::ovirt_hosted_engine_ha.agent.hosted_engine.
HostedEngine::(_initialize_broker) Initializing ha-broker connection
MainThread::INFO::2018-01-12 21:51:21,919::brokerlink::130:
:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
Starting monitor ping, options {'addr': '80.239.162.97'}
MainThread::INFO::2018-01-12 21:51:21,922::brokerlink::141:
:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
Success, id 140547104457680
MainThread::INFO::2018-01-12 21:51:21,922::brokerlink::130:
:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
Starting monitor mgmt-bridge, options {'use_ssl': 'true',
'bridge_name':
'ovirtmgmt', 'address': '0'}
MainThread::INFO::2018-01-12 21:51:21,936::brokerlink::141:
:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
Success, id 140547104458064
MainThread::INFO::2018-01-12 21:51:21,936::brokerlink::130:
:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
Starting monitor mem-free, options {'use_ssl': 'true', 'address':
'0'}
MainThread::INFO::2018-01-12 21:51:21,938::brokerlink::141:
:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
Success, id 140547104458448
MainThread::INFO::2018-01-12 21:51:21,939::brokerlink::130:
:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
Starting monitor cpu-load-no-engine, options {'use_ssl': 'true',
'vm_uuid':
'b366e466-b0ea-4a09-866b-d0248d7523a6', 'address': '0'}
MainThread::INFO::2018-01-12 21:51:21,940::brokerlink::141:
:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
Success, id 140547104457552
MainThread::INFO::2018-01-12 21:51:21,941::brokerlink::130:
:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
Starting monitor engine-health, options {'use_ssl': 'true',
'vm_uuid':
'b366e466-b0ea-4a09-866b-d0248d7523a6', 'address': '0'}
MainThread::INFO::2018-01-12 21:51:21,942::brokerlink::141:
:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
Success, id 140547104459792
MainThread::INFO::2018-01-12 21:51:26,951::brokerlink::179:
:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(set_storage_domain)
Success, id 140546772847056
MainThread::INFO::2018-01-12 21:51:26,952::hosted_engine::
601::ovirt_hosted_engine_ha.agent.hosted_engine.
HostedEngine::(_initialize_broker) Broker initialized, all submonitors
started
MainThread::INFO::2018-01-12 21:51:27,049::hosted_engine::
704::ovirt_hosted_engine_ha.agent.hosted_engine.
HostedEngine::(_initialize_sanlock) Ensuring lease for lockspace
hosted-engine, host id 1 is acquired (file: /var/run/vdsm/storage/
4a7f8717-9bb0-4d80-8016-498fa4b88162/093faa75-5e33-4559-84fa-1f1f8d48153b/
911c7637-b49d-463e-b186-23b404e50769)
MainThread::INFO::2018-01-12 21:53:48,067::hosted_engine::
745::ovirt_hosted_engine_ha.agent.hosted_engine.
HostedEngine::(_initialize_sanlock) Failed to acquire the lock. Waiting
'5's before the next attempt
MainThread::INFO::2018-01-12 21:56:14,088::hosted_engine::
745::ovirt_hosted_engine_ha.agent.hosted_engine.
HostedEngine::(_initialize_sanlock) Failed to acquire the lock. Waiting
'5's before the next attempt
MainThread::INFO::2018-01-12 21:58:40,111::hosted_engine::
745::ovirt_hosted_engine_ha.agent.hosted_engine.
HostedEngine::(_initialize_sanlock) Failed to acquire the lock. Waiting
'5's before the next attempt
MainThread::INFO::2018-01-12 22:01:06,133::hosted_engine::
745::ovirt_hosted_engine_ha.agent.hosted_engine.
HostedEngine::(_initialize_sanlock) Failed to acquire the lock. Waiting
'5's before the next attempt
agent.log from second host
MainThread::INFO::2018-01-12 22:01:37,241::hosted_engine::
630::ovirt_hosted_engine_ha.agent.hosted_engine.
HostedEngine::(_initialize_storage_images) Connecting the storage
MainThread::INFO::2018-01-12 22:01:37,242::storage_server::
220::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(validate_storage_server)
Validating storage server
MainThread::INFO::2018-01-12 22:01:39,540::hosted_engine::
639::ovirt_hosted_engine_ha.agent.hosted_engine.
HostedEngine::(_initialize_storage_images) Storage domain reported as
valid and reconnect is not forced.
MainThread::INFO::2018-01-12 22:01:41,939::hosted_engine::
453::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Current state EngineUnexpectedlyDown (score: 0)
MainThread::INFO::2018-01-12 22:01:52,150::config::493::
ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_vm_conf)
Reloading vm.conf from the shared storage domain
MainThread::INFO::2018-01-12 22:01:52,150::config::416::
ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.
config::(_get_vm_conf_content_from_ovf_store) Trying to get a fresher
copy of vm configuration from the OVF_STORE
MainThread::INFO::2018-01-12 22:01:52,151::ovf_store::132::
ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
Extracting Engine VM OVF from the OVF_STORE
MainThread::INFO::2018-01-12 22:01:52,153::ovf_store::134::
ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
OVF_STORE volume path: /var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-
498fa4b88162/5cabd8e1-5f4b-469e-becc-227469e03f5c/8048cbd7-77e2-4805-9af4-
d109fa36dfcf
MainThread::INFO::2018-01-12 22:01:52,174::config::435::
ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.
config::(_get_vm_conf_content_from_ovf_store) Found an OVF for HE VM,
trying to convert
MainThread::INFO::2018-01-12 22:01:52,179::config::440::
ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.
config::(_get_vm_conf_content_from_ovf_store) Got vm.conf from OVF_STORE
MainThread::INFO::2018-01-12 22:01:52,189::hosted_engine::
604::ovirt_hosted_engine_ha.agent.hosted_engine.
HostedEngine::(_initialize_vdsm) Initializing VDSM
MainThread::INFO::2018-01-12 22:01:54,586::hosted_engine::
630::ovirt_hosted_engine_ha.agent.hosted_engine.
HostedEngine::(_initialize_storage_images) Connecting the storage
MainThread::INFO::2018-01-12 22:01:54,587::storage_server::
220::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(validate_storage_server)
Validating storage server
MainThread::INFO::2018-01-12 22:01:56,903::hosted_engine::
639::ovirt_hosted_engine_ha.agent.hosted_engine.
HostedEngine::(_initialize_storage_images) Storage domain reported as
valid and reconnect is not forced.
MainThread::INFO::2018-01-12 22:01:59,299::states::682::
ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score) Score is
0 due to unexpected vm shutdown at Fri Jan 12 21:57:48 2018
MainThread::INFO::2018-01-12 22:01:59,299::hosted_engine::
453::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Current state EngineUnexpectedlyDown (score: 0)
MainThread::INFO::2018-01-12 22:02:09,659::config::493::
ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_vm_conf)
Reloading vm.conf from the shared storage domain
MainThread::INFO::2018-01-12 22:02:09,659::config::416::
ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.
config::(_get_vm_conf_content_from_ovf_store) Trying to get a fresher
copy of vm configuration from the OVF_STORE
MainThread::INFO::2018-01-12 22:02:09,660::ovf_store::132::
ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
Extracting Engine VM OVF from the OVF_STORE
MainThread::INFO::2018-01-12 22:02:09,663::ovf_store::134::
ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
OVF_STORE volume path: /var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-
498fa4b88162/5cabd8e1-5f4b-469e-becc-227469e03f5c/8048cbd7-77e2-4805-9af4-
d109fa36dfcf
MainThread::INFO::2018-01-12 22:02:09,683::config::435::
ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.
config::(_get_vm_conf_content_from_ovf_store) Found an OVF for HE VM,
trying to convert
MainThread::INFO::2018-01-12 22:02:09,688::config::440::
ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.
config::(_get_vm_conf_content_from_ovf_store) Got vm.conf from OVF_STORE
MainThread::INFO::2018-01-12 22:02:09,698::hosted_engine::
604::ovirt_hosted_engine_ha.agent.hosted_engine.
HostedEngine::(_initialize_vdsm) Initializing VDSM
MainThread::INFO::2018-01-12 22:02:12,112::hosted_engine::
630::ovirt_hosted_engine_ha.agent.hosted_engine.
HostedEngine::(_initialize_storage_images) Connecting the storage
MainThread::INFO::2018-01-12 22:02:12,113::storage_server::
220::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(validate_storage_server)
Validating storage server
MainThread::INFO::2018-01-12 22:02:14,444::hosted_engine::
639::ovirt_hosted_engine_ha.agent.hosted_engine.
HostedEngine::(_initialize_storage_images) Storage domain reported as
valid and reconnect is not forced.
MainThread::INFO::2018-01-12 22:02:16,859::states::682::
ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score) Score is
0 due to unexpected vm shutdown at Fri Jan 12 21:57:47 2018
MainThread::INFO::2018-01-12 22:02:16,859::hosted_engine::
453::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Current state EngineUnexpectedlyDown (score: 0)
MainThread::INFO::2018-01-12 22:02:27,100::config::493::
ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_vm_conf)
Reloading vm.conf from the shared storage domain
MainThread::INFO::2018-01-12 22:02:27,100::config::416::
ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.
config::(_get_vm_conf_content_from_ovf_store) Trying to get a fresher
copy of vm configuration from the OVF_STORE
MainThread::INFO::2018-01-12 22:02:27,101::ovf_store::132::
ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
Extracting Engine VM OVF from the OVF_STORE
MainThread::INFO::2018-01-12 22:02:27,103::ovf_store::134::
ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
OVF_STORE volume path: /var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-
498fa4b88162/5cabd8e1-5f4b-469e-becc-227469e03f5c/8048cbd7-77e2-4805-9af4-
d109fa36dfcf
MainThread::INFO::2018-01-12 22:02:27,125::config::435::
ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.
config::(_get_vm_conf_content_from_ovf_store) Found an OVF for HE VM,
trying to convert
MainThread::INFO::2018-01-12 22:02:27,129::config::440::
ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.
config::(_get_vm_conf_content_from_ovf_store) Got vm.conf from OVF_STORE
MainThread::INFO::2018-01-12 22:02:27,130::states::667::
ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Engine
down, local host does not have best score
MainThread::INFO::2018-01-12 22:02:27,139::hosted_engine::
604::ovirt_hosted_engine_ha.agent.hosted_engine.
HostedEngine::(_initialize_vdsm) Initializing VDSM
MainThread::INFO::2018-01-12 22:02:29,584::hosted_engine::
630::ovirt_hosted_engine_ha.agent.hosted_engine.
HostedEngine::(_initialize_storage_images) Connecting the storage
MainThread::INFO::2018-01-12 22:02:29,586::storage_server::
220::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(validate_storage_server)
Validating storage server
Any suggestions how to resolve this .
regards,
Artem
On Fri, Jan 12, 2018 at 7:08 PM, Artem Tambovskiy <
artem.tambovskiy(a)gmail.com> wrote:
> Trying to fix one thing I broke another :(
>
> I fixed mnt_options for hosted engine storage domain and installed latest
> security patches to my hosts and hosted engine. All VM's up and running,
> but hosted_engine --vm-status reports about issues:
>
> [root@ovirt1 ~]# hosted-engine --vm-status
>
>
> --== Host 1 status ==--
>
> conf_on_shared_storage : True
> Status up-to-date : False
> Hostname : ovirt2
> Host ID : 1
> Engine status : unknown stale-data
> Score : 0
> stopped : False
> Local maintenance : False
> crc32 : 193164b8
> local_conf_timestamp : 8350
> Host timestamp : 8350
> Extra metadata (valid at timestamp):
> metadata_parse_version=1
> metadata_feature_version=1
> timestamp=8350 (Fri Jan 12 19:03:54 2018)
> host-id=1
> score=0
> vm_conf_refresh_time=8350 (Fri Jan 12 19:03:54 2018)
> conf_on_shared_storage=True
> maintenance=False
> state=EngineUnexpectedlyDown
> stopped=False
> timeout=Thu Jan 1 05:24:43 1970
>
>
> --== Host 2 status ==--
>
> conf_on_shared_storage : True
> Status up-to-date : False
> Hostname : ovirt1.telia.ru
> Host ID : 2
> Engine status : unknown stale-data
> Score : 0
> stopped : True
> Local maintenance : False
> crc32 : c7037c03
> local_conf_timestamp : 7530
> Host timestamp : 7530
> Extra metadata (valid at timestamp):
> metadata_parse_version=1
> metadata_feature_version=1
> timestamp=7530 (Fri Jan 12 16:10:12 2018)
> host-id=2
> score=0
> vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)
> conf_on_shared_storage=True
> maintenance=False
> state=AgentStopped
> stopped=True
> [root@ovirt1 ~]#
>
>
>
> from second host situation looks a bit different:
>
>
> [root@ovirt2 ~]# hosted-engine --vm-status
>
>
> --== Host 1 status ==--
>
> conf_on_shared_storage : True
> Status up-to-date : True
> Hostname : ovirt2
> Host ID : 1
> Engine status : {"reason": "vm not running on
this
> host", "health": "bad", "vm": "down",
"detail": "unknown"}
> Score : 0
> stopped : False
> Local maintenance : False
> crc32 : 78eabdb6
> local_conf_timestamp : 8403
> Host timestamp : 8402
> Extra metadata (valid at timestamp):
> metadata_parse_version=1
> metadata_feature_version=1
> timestamp=8402 (Fri Jan 12 19:04:47 2018)
> host-id=1
> score=0
> vm_conf_refresh_time=8403 (Fri Jan 12 19:04:47 2018)
> conf_on_shared_storage=True
> maintenance=False
> state=EngineUnexpectedlyDown
> stopped=False
> timeout=Thu Jan 1 05:24:43 1970
>
>
> --== Host 2 status ==--
>
> conf_on_shared_storage : True
> Status up-to-date : False
> Hostname : ovirt1.telia.ru
> Host ID : 2
> Engine status : unknown stale-data
> Score : 0
> stopped : True
> Local maintenance : False
> crc32 : c7037c03
> local_conf_timestamp : 7530
> Host timestamp : 7530
> Extra metadata (valid at timestamp):
> metadata_parse_version=1
> metadata_feature_version=1
> timestamp=7530 (Fri Jan 12 16:10:12 2018)
> host-id=2
> score=0
> vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)
> conf_on_shared_storage=True
> maintenance=False
> state=AgentStopped
> stopped=True
>
>
> WebGUI shows that engine running on host ovirt1.
> Gluster looks fine
> [root@ovirt1 ~]# gluster volume status engine
> Status of volume: engine
> Gluster process TCP Port RDMA Port Online
> Pid
> ------------------------------------------------------------
> ------------------
> Brick ovirt1.telia.ru:/oVirt/engine 49169 0 Y
> 3244
> Brick ovirt2.telia.ru:/oVirt/engine 49179 0 Y
> 20372
> Brick ovirt3.telia.ru:/oVirt/engine 49206 0 Y
> 16609
> Self-heal Daemon on localhost N/A N/A Y
> 117868
> Self-heal Daemon on ovirt2.telia.ru N/A N/A Y
> 20521
> Self-heal Daemon on ovirt3 N/A N/A Y
> 25093
>
> Task Status of Volume engine
> ------------------------------------------------------------
> ------------------
> There are no active volume tasks
>
> How to resolve this issue?
>
>
> _______________________________________________
> Users mailing list
> Users(a)ovirt.org
>
http://lists.ovirt.org/mailman/listinfo/users
>
>
_______________________________________________
Users mailing list
Users(a)ovirt.org
http://lists.ovirt.org/mailman/listinfo/users