Hi everybody,
there are couple of things to check here.
- what version of hosted engine agent is this? The logs look like
coming from 4.1
- what version of engine is used?
- check the host ID in /etc/ovirt-hosted-engine/hosted-engine.conf on
both hosts, the numbers must be different
- it looks like the agent or broker on host 2 is not active (or there
would be a report)
- the second host does not see data from the first host (unknown
stale-data), wait for a minute and check again, then check the storage
connection
And then the general troubleshooting:
- put hosted engine in global maintenance mode (and check that it is
visible from the other host using he --vm-status)
- mount storage domain (hosted-engine --connect-storage)
- check sanlock client status to see if proper lockspaces are present
Best regards
Martin Sivak
On Tue, Jan 16, 2018 at 1:16 PM, Derek Atkins <derek(a)ihtfp.com> wrote:
Why are both hosts reporting as ovirt 1?
Look at the hostname fields to see what mean.
-derek
Sent using my mobile device. Please excuse any typos.
On January 16, 2018 7:11:09 AM Artem Tambovskiy <artem.tambovskiy(a)gmail.com>
wrote:
>
> Hello,
>
> Yes, I followed exactly the same procedure while reinstalling the hosts
> (the only difference that I have SSH key configured instead of the
> password).
>
> Just reinstalled the second host one more time, after 20 min the host
> still haven't reached active score of 3400 (Hosted Engine HA:Not Active) and
> I still don't see crown icon for this host.
>
> hosted-engine --vm-status from ovirt1 host
>
> [root@ovirt1 ~]# hosted-engine --vm-status
>
>
> --== Host 1 status ==--
>
> conf_on_shared_storage : True
> Status up-to-date : True
> Hostname : ovirt1.telia.ru
> Host ID : 1
> Engine status : {"health": "good",
"vm": "up",
> "detail": "up"}
> Score : 3400
> stopped : False
> Local maintenance : False
> crc32 : 3f94156a
> local_conf_timestamp : 349144
> Host timestamp : 349144
> Extra metadata (valid at timestamp):
> metadata_parse_version=1
> metadata_feature_version=1
> timestamp=349144 (Tue Jan 16 15:03:45 2018)
> host-id=1
> score=3400
> vm_conf_refresh_time=349144 (Tue Jan 16 15:03:45 2018)
> conf_on_shared_storage=True
> maintenance=False
> state=EngineUp
> stopped=False
>
>
> --== Host 2 status ==--
>
> conf_on_shared_storage : True
> Status up-to-date : False
> Hostname : ovirt1.telia.ru
> Host ID : 2
> Engine status : unknown stale-data
> Score : 0
> stopped : True
> Local maintenance : False
> crc32 : c7037c03
> local_conf_timestamp : 7530
> Host timestamp : 7530
> Extra metadata (valid at timestamp):
> metadata_parse_version=1
> metadata_feature_version=1
> timestamp=7530 (Fri Jan 12 16:10:12 2018)
> host-id=2
> score=0
> vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)
> conf_on_shared_storage=True
> maintenance=False
> state=AgentStopped
> stopped=True
>
>
> hosted-engine --vm-status output from ovirt2 host
>
> [root@ovirt2 ovirt-hosted-engine-ha]# hosted-engine --vm-status
>
>
> --== Host 1 status ==--
>
> conf_on_shared_storage : True
> Status up-to-date : False
> Hostname : ovirt1.telia.ru
> Host ID : 1
> Engine status : unknown stale-data
> Score : 3400
> stopped : False
> Local maintenance : False
> crc32 : 6d3606f1
> local_conf_timestamp : 349264
> Host timestamp : 349264
> Extra metadata (valid at timestamp):
> metadata_parse_version=1
> metadata_feature_version=1
> timestamp=349264 (Tue Jan 16 15:05:45 2018)
> host-id=1
> score=3400
> vm_conf_refresh_time=349264 (Tue Jan 16 15:05:45 2018)
> conf_on_shared_storage=True
> maintenance=False
> state=EngineUp
> stopped=False
>
>
> --== Host 2 status ==--
>
> conf_on_shared_storage : True
> Status up-to-date : False
> Hostname : ovirt1.telia.ru
> Host ID : 2
> Engine status : unknown stale-data
> Score : 0
> stopped : True
> Local maintenance : False
> crc32 : c7037c03
> local_conf_timestamp : 7530
> Host timestamp : 7530
> Extra metadata (valid at timestamp):
> metadata_parse_version=1
> metadata_feature_version=1
> timestamp=7530 (Fri Jan 12 16:10:12 2018)
> host-id=2
> score=0
> vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)
> conf_on_shared_storage=True
> maintenance=False
> state=AgentStopped
> stopped=True
>
>
> Also I saw some log messages in webGUI about time drift like
>
> "Host ovirt2.telia.ru has time-drift of 5305 seconds while maximum
> configured value is 300 seconds." that is a bit weird as haven't touched
any
> time settings since I installed the cluster.
> both host have the same time and timezone (MSK) but hosted engine lives in
> UTC timezone. Is it mandatory to have everything in sync and in the same
> timezone?
>
> Regards,
> Artem
>
>
>
>
>
>
> On Tue, Jan 16, 2018 at 2:20 PM, Kasturi Narra <knarra(a)redhat.com> wrote:
>>
>> Hello,
>>
>> I now see that your hosted engine is up and running. Can you let me
>> know how did you try reinstalling the host? Below is the procedure which is
>> used and hope you did not miss any step while reinstalling. If no, can you
>> try reinstalling again and see if that works ?
>>
>> 1) Move the host to maintenance
>> 2) click on reinstall
>> 3) provide the password
>> 4) uncheck 'automatically configure host firewall'
>> 5) click on 'Deploy' tab
>> 6) click Hosted Engine deployment as 'Deploy'
>>
>> And once the host installation is done, wait till the active score of the
>> host shows 3400 in the general tab then check hosted-engine --vm-status.
>>
>> Thanks
>> kasturi
>>
>> On Mon, Jan 15, 2018 at 4:57 PM, Artem Tambovskiy
>> <artem.tambovskiy(a)gmail.com> wrote:
>>>
>>> Hello,
>>>
>>> I have uploaded 2 archives with all relevant logs to shared hosting
>>> files from host 1 (which is currently running all VM's including
>>> hosted_engine) -
https://yadi.sk/d/PttRoYV63RTvhK
>>> files from second host -
https://yadi.sk/d/UBducEsV3RTvhc
>>>
>>> I have tried to restart both ovirt-ha-agent and ovirt-ha-broker but it
>>> gives no effect. I have also tried to shutdown hosted_engine VM, stop
>>> ovirt-ha-agent and ovirt-ha-broker services disconnect storage and connect
>>> it again - no effect as well.
>>> Also I tried to reinstall second host from WebGUI - this lead to the
>>> interesting situation - now hosted-engine --vm-status shows that both
>>> hosts have the same address.
>>>
>>> [root@ovirt1 ~]# hosted-engine --vm-status
>>>
>>> --== Host 1 status ==--
>>>
>>> conf_on_shared_storage : True
>>> Status up-to-date : True
>>> Hostname : ovirt1.telia.ru
>>> Host ID : 1
>>> Engine status : {"health": "good",
"vm": "up",
>>> "detail": "up"}
>>> Score : 3400
>>> stopped : False
>>> Local maintenance : False
>>> crc32 : a7758085
>>> local_conf_timestamp : 259327
>>> Host timestamp : 259327
>>> Extra metadata (valid at timestamp):
>>> metadata_parse_version=1
>>> metadata_feature_version=1
>>> timestamp=259327 (Mon Jan 15 14:06:48 2018)
>>> host-id=1
>>> score=3400
>>> vm_conf_refresh_time=259327 (Mon Jan 15 14:06:48 2018)
>>> conf_on_shared_storage=True
>>> maintenance=False
>>> state=EngineUp
>>> stopped=False
>>>
>>>
>>> --== Host 2 status ==--
>>>
>>> conf_on_shared_storage : True
>>> Status up-to-date : False
>>> Hostname : ovirt1.telia.ru
>>> Host ID : 2
>>> Engine status : unknown stale-data
>>> Score : 0
>>> stopped : True
>>> Local maintenance : False
>>> crc32 : c7037c03
>>> local_conf_timestamp : 7530
>>> Host timestamp : 7530
>>> Extra metadata (valid at timestamp):
>>> metadata_parse_version=1
>>> metadata_feature_version=1
>>> timestamp=7530 (Fri Jan 12 16:10:12 2018)
>>> host-id=2
>>> score=0
>>> vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)
>>> conf_on_shared_storage=True
>>> maintenance=False
>>> state=AgentStopped
>>> stopped=True
>>>
>>> Gluster seems working fine. all gluster nodes showing connected state.
>>>
>>> Any advises on how to resolve this situation are highly appreciated!
>>>
>>> Regards,
>>> Artem
>>>
>>>
>>> On Mon, Jan 15, 2018 at 11:45 AM, Kasturi Narra <knarra(a)redhat.com>
>>> wrote:
>>>>
>>>> Hello Artem,
>>>>
>>>> Can you check if glusterd service is running on host1 and all
>>>> the peers are in connected state ? If yes, can you restart
ovirt-ha-agent
>>>> and broker services and check if things are working fine ?
>>>>
>>>> Thanks
>>>> kasturi
>>>>
>>>> On Sat, Jan 13, 2018 at 12:33 AM, Artem Tambovskiy
>>>> <artem.tambovskiy(a)gmail.com> wrote:
>>>>>
>>>>> Explored logs on both hosts.
>>>>> broker.log shows no errors.
>>>>>
>>>>> agent.log looking not good:
>>>>>
>>>>> on host1 (which running hosted engine) :
>>>>>
>>>>> MainThread::ERROR::2018-01-12
>>>>>
21:51:03,883::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
>>>>> Traceback (most recent call last):
>>>>> File
>>>>>
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py",
>>>>> line 191, in _run_agent
>>>>> return action(he)
>>>>> File
>>>>>
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py",
>>>>> line 64, in action_proper
>>>>> return he.start_monitoring()
>>>>> File
>>>>>
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
>>>>> line 411, in start_monitoring
>>>>> self._initialize_sanlock()
>>>>> File
>>>>>
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
>>>>> line 749, in _initialize_sanlock
>>>>> "Failed to initialize sanlock, the number of errors
has"
>>>>> SanlockInitializationError: Failed to initialize sanlock, the number
>>>>> of errors has exceeded the limit
>>>>>
>>>>> MainThread::ERROR::2018-01-12
>>>>>
21:51:03,884::agent::206::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
>>>>> Trying to restart agent
>>>>> MainThread::WARNING::2018-01-12
>>>>>
21:51:08,889::agent::209::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
>>>>> Restarting agent, attempt '1'
>>>>> MainThread::INFO::2018-01-12
>>>>>
21:51:08,919::hosted_engine::242::ovirt_hosted_engine_ha.agenthosted_engine.HostedEngine::(_get_hostname)
>>>>> Found certificate common name: ovirt1.telia.ru
>>>>> MainThread::INFO::2018-01-12
>>>>>
21:51:08,921::hosted_engine::604::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm)
>>>>> Initializing VDSM
>>>>> MainThread::INFO::2018-01-12
>>>>>
21:51:11,398::hosted_engine::630::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images)
>>>>> Connecting the storage
>>>>> MainThread::INFO::2018-01-12
>>>>>
21:51:11,399::storage_server::220::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(validate_storage_server)
>>>>> Validating storage server
>>>>> MainThread::INFO::2018-01-12
>>>>>
21:51:13,725::storage_server::239::ovirt_hosted_engine_ha.libstorage_server.StorageServer::(connect_storage_server)
>>>>> Connecting storage server
>>>>> MainThread::INFO::2018-01-12
>>>>>
21:51:18,390::storage_server::246::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
>>>>> Connecting storage server
>>>>> MainThread::INFO::2018-01-12
>>>>>
21:51:18,423::storage_server::253::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
>>>>> Refreshing the storage domain
>>>>> MainThread::INFO::2018-01-12
>>>>>
21:51:18,689::hosted_engine::663::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images)
>>>>> Preparing images
>>>>> MainThread::INFO::2018-01-12
>>>>>
21:51:18,690::image::126::ovirt_hosted_engine_ha.lib.image.Image::(prepare_images)
>>>>> Preparing images
>>>>> MainThread::INFO::2018-01-12
>>>>>
21:51:21,895::hosted_engine::666::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images)
>>>>> Refreshing vm.conf
>>>>> MainThread::INFO::2018-01-12
>>>>>
21:51:21,895::config::493::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_vm_conf)
>>>>> Reloading vm.conf from the shared storage domain
>>>>> MainThread::INFO::2018-01-12
>>>>>
21:51:21,896::config::416::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
>>>>> Trying to get a fresher copy of vm configuration from the OVF_STORE
>>>>> MainThread::INFO::2018-01-12
>>>>>
21:51:21,896::ovf_store::132::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
>>>>> Extracting Engine VM OVF from the OVF_STORE
>>>>> MainThread::INFO::2018-01-12
>>>>>
21:51:21,897::ovf_store::134::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
>>>>> OVF_STORE volume path:
>>>>>
/var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-498fa4b88162/5cabd8e1-5f4b-469e-becc-227469e03f5c/8048cbd7-77e2-4805-9af4-d109fa36dfcf
>>>>> MainThread::INFO::2018-01-12
>>>>>
21:51:21,915::config::435::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
>>>>> Found an OVF for HE VM, trying to convert
>>>>> MainThread::INFO::2018-01-12
>>>>>
21:51:21,918::config::440::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
>>>>> Got vm.conf from OVF_STORE
>>>>> MainThread::INFO::2018-01-12
>>>>>
21:51:21,919::hosted_engine::509::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker)
>>>>> Initializing ha-broker connection
>>>>> MainThread::INFO::2018-01-12
>>>>>
21:51:21,919::brokerlink::130::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
>>>>> Starting monitor ping, options {'addr':
'80.239.162.97'}
>>>>> MainThread::INFO::2018-01-12
>>>>>
21:51:21,922::brokerlink::141::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
>>>>> Success, id 140547104457680
>>>>> MainThread::INFO::2018-01-12
>>>>>
21:51:21,922::brokerlink::130::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
>>>>> Starting monitor mgmt-bridge, options {'use_ssl':
'true', 'bridge_name':
>>>>> 'ovirtmgmt', 'address': '0'}
>>>>> MainThread::INFO::2018-01-12
>>>>>
21:51:21,936::brokerlink::141::ovirt_hosted_engine_ha.lib.brokerlinkBrokerLink::(start_monitor)
>>>>> Success, id 140547104458064
>>>>> MainThread::INFO::2018-01-12
>>>>>
21:51:21,936::brokerlink::130::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
>>>>> Starting monitor mem-free, options {'use_ssl':
'true', 'address': '0'}
>>>>> MainThread::INFO::2018-01-12
>>>>>
21:51:21,938::brokerlink::141::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
>>>>> Success, id 140547104458448
>>>>> MainThread::INFO::2018-01-12
>>>>>
21:51:21,939::brokerlink::130::ovirt_hosted_engine_ha.lib.brokerlinkBrokerLink::(start_monitor)
>>>>> Starting monitor cpu-load-no-engine, options {'use_ssl':
'true', 'vm_uuid':
>>>>> 'b366e466-b0ea-4a09-866b-d0248d7523a6', 'address':
'0'}
>>>>> MainThread::INFO::2018-01-12
>>>>>
21:51:21,940::brokerlink::141::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
>>>>> Success, id 140547104457552
>>>>> MainThread::INFO::2018-01-12
>>>>>
21:51:21,941::brokerlink::130::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
>>>>> Starting monitor engine-health, options {'use_ssl':
'true', 'vm_uuid':
>>>>> 'b366e466-b0ea-4a09-866b-d0248d7523a6', 'address':
'0'}
>>>>> MainThread::INFO::2018-01-12
>>>>>
21:51:21,942::brokerlink::141::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
>>>>> Success, id 140547104459792
>>>>> MainThread::INFO::2018-01-12
>>>>>
21:51:26,951::brokerlink::179::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(set_storage_domain)
>>>>> Success, id 140546772847056
>>>>> MainThread::INFO::2018-01-12
>>>>>
21:51:26,952::hosted_engine::601::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker)
>>>>> Broker initialized, all submonitors started
>>>>> MainThread::INFO::2018-01-12
>>>>>
21:51:27,049::hosted_engine::704::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock)
>>>>> Ensuring lease for lockspace hosted-engine, host id 1 is acquired
(file:
>>>>>
/var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-498fa4b88162/093faa75-5e33-4559-84fa-1f1f8d48153b/911c7637-b49d-463e-b186-23b404e50769)
>>>>> MainThread::INFO::2018-01-12
>>>>>
21:53:48,067::hosted_engine::745::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock)
>>>>> Failed to acquire the lock. Waiting '5's before the next
attempt
>>>>> MainThread::INFO::2018-01-12
>>>>>
21:56:14,088::hosted_engine::745::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock)
>>>>> Failed to acquire the lock. Waiting '5's before the next
attempt
>>>>> MainThread::INFO::2018-01-12
>>>>>
21:58:40,111::hosted_engine::745::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock)
>>>>> Failed to acquire the lock. Waiting '5's before the next
attempt
>>>>> MainThread::INFO::2018-01-12
>>>>>
22:01:06,133::hosted_engine::745::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock)
>>>>> Failed to acquire the lock. Waiting '5's before the next
attempt
>>>>>
>>>>>
>>>>> agent.log from second host
>>>>>
>>>>> MainThread::INFO::2018-01-12
>>>>>
22:01:37,241::hosted_engine::630::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images)
>>>>> Connecting the storage
>>>>> MainThread::INFO::2018-01-12
>>>>>
22:01:37,242::storage_server::220::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(validate_storage_server)
>>>>> Validating storage server
>>>>> MainThread::INFO::2018-01-12
>>>>>
22:01:39,540::hosted_engine::639::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images)
>>>>> Storage domain reported as valid and reconnect is not forced.
>>>>> MainThread::INFO::2018-01-12
>>>>>
22:01:41,939::hosted_engine::453::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>>>>> Current state EngineUnexpectedlyDown (score: 0)
>>>>> MainThread::INFO::2018-01-12
>>>>>
22:01:52,150::config::493::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_vm_conf)
>>>>> Reloading vm.conf from the shared storage domain
>>>>> MainThread::INFO::2018-01-12
>>>>>
22:01:52,150::config::416::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
>>>>> Trying to get a fresher copy of vm configuration from the OVF_STORE
>>>>> MainThread::INFO::2018-01-12
>>>>>
22:01:52,151::ovf_store::132::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
>>>>> Extracting Engine VM OVF from the OVF_STORE
>>>>> MainThread::INFO::2018-01-12
>>>>>
22:01:52,153::ovf_store::134::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
>>>>> OVF_STORE volume path:
>>>>>
/var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-498fa4b88162/5cabd8e1-5f4b-469e-becc-227469e03f5c/8048cbd7-77e2-4805-9af4-d109fa36dfcf
>>>>> MainThread::INFO::2018-01-12
>>>>>
22:01:52,174::config::435::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
>>>>> Found an OVF for HE VM, trying to convert
>>>>> MainThread::INFO::2018-01-12
>>>>>
22:01:52,179::config::440::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
>>>>> Got vm.conf from OVF_STORE
>>>>> MainThread::INFO::2018-01-12
>>>>>
22:01:52,189::hosted_engine::604::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm)
>>>>> Initializing VDSM
>>>>> MainThread::INFO::2018-01-12
>>>>>
22:01:54,586::hosted_engine::630::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images)
>>>>> Connecting the storage
>>>>> MainThread::INFO::2018-01-12
>>>>>
22:01:54,587::storage_server::220::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(validate_storage_server)
>>>>> Validating storage server
>>>>> MainThread::INFO::2018-01-12
>>>>>
22:01:56,903::hosted_engine::639::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images)
>>>>> Storage domain reported as valid and reconnect is not forced.
>>>>> MainThread::INFO::2018-01-12
>>>>>
22:01:59,299::states::682::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score)
>>>>> Score is 0 due to unexpected vm shutdown at Fri Jan 12 21:57:48 2018
>>>>> MainThread::INFO::2018-01-12
>>>>>
22:01:59,299::hosted_engine::453::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>>>>> Current state EngineUnexpectedlyDown (score: 0)
>>>>> MainThread::INFO::2018-01-12
>>>>>
22:02:09,659::config::493::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_vm_conf)
>>>>> Reloading vm.conf from the shared storage domain
>>>>> MainThread::INFO::2018-01-12
>>>>>
22:02:09,659::config::416::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
>>>>> Trying to get a fresher copy of vm configuration from the OVF_STORE
>>>>> MainThread::INFO::2018-01-12
>>>>>
22:02:09,660::ovf_store::132::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
>>>>> Extracting Engine VM OVF from the OVF_STORE
>>>>> MainThread::INFO::2018-01-12
>>>>>
22:02:09,663::ovf_store::134::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
>>>>> OVF_STORE volume path:
>>>>>
/var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-498fa4b88162/5cabd8e1-5f4b-469e-becc-227469e03f5c/8048cbd7-77e2-4805-9af4-d109fa36dfcf
>>>>> MainThread::INFO::2018-01-12
>>>>>
22:02:09,683::config::435::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
>>>>> Found an OVF for HE VM, trying to convert
>>>>> MainThread::INFO::2018-01-12
>>>>>
22:02:09,688::config::440::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
>>>>> Got vm.conf from OVF_STORE
>>>>> MainThread::INFO::2018-01-12
>>>>>
22:02:09,698::hosted_engine::604::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm)
>>>>> Initializing VDSM
>>>>> MainThread::INFO::2018-01-12
>>>>>
22:02:12,112::hosted_engine::630::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images)
>>>>> Connecting the storage
>>>>> MainThread::INFO::2018-01-12
>>>>>
22:02:12,113::storage_server::220::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(validate_storage_server)
>>>>> Validating storage server
>>>>> MainThread::INFO::2018-01-12
>>>>>
22:02:14,444::hosted_engine::639::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images)
>>>>> Storage domain reported as valid and reconnect is not forced.
>>>>> MainThread::INFO::2018-01-12
>>>>>
22:02:16,859::states::682::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score)
>>>>> Score is 0 due to unexpected vm shutdown at Fri Jan 12 21:57:47 2018
>>>>> MainThread::INFO::2018-01-12
>>>>>
22:02:16,859::hosted_engine::453::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>>>>> Current state EngineUnexpectedlyDown (score: 0)
>>>>> MainThread::INFO::2018-01-12
>>>>>
22:02:27,100::config::493::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_vm_conf)
>>>>> Reloading vm.conf from the shared storage domain
>>>>> MainThread::INFO::2018-01-12
>>>>>
22:02:27,100::config::416::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
>>>>> Trying to get a fresher copy of vm configuration from the OVF_STORE
>>>>> MainThread::INFO::2018-01-12
>>>>>
22:02:27,101::ovf_store::132::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
>>>>> Extracting Engine VM OVF from the OVF_STORE
>>>>> MainThread::INFO::2018-01-12
>>>>>
22:02:27,103::ovf_store::134::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
>>>>> OVF_STORE volume path:
>>>>>
/var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-498fa4b88162/5cabd8e1-5f4b-469e-becc-227469e03f5c/8048cbd7-77e2-4805-9af4-d109fa36dfcf
>>>>> MainThread::INFO::2018-01-12
>>>>>
22:02:27,125::config::435::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
>>>>> Found an OVF for HE VM, trying to convert
>>>>> MainThread::INFO::2018-01-12
>>>>>
22:02:27,129::config::440::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
>>>>> Got vm.conf from OVF_STORE
>>>>> MainThread::INFO::2018-01-12
>>>>>
22:02:27,130::states::667::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume)
>>>>> Engine down, local host does not have best score
>>>>> MainThread::INFO::2018-01-12
>>>>>
22:02:27,139::hosted_engine::604::ovirt_hosted_engine_ha.agent.hosted_engineHostedEngine::(_initialize_vdsm)
>>>>> Initializing VDSM
>>>>> MainThread::INFO::2018-01-12
>>>>>
22:02:29,584::hosted_engine::630::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images)
>>>>> Connecting the storage
>>>>> MainThread::INFO::2018-01-12
>>>>>
22:02:29,586::storage_server::220::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(validate_storage_server)
>>>>> Validating storage server
>>>>>
>>>>>
>>>>> Any suggestions how to resolve this .
>>>>>
>>>>> regards,
>>>>> Artem
>>>>>
>>>>>
>>>>> On Fri, Jan 12, 2018 at 7:08 PM, Artem Tambovskiy
>>>>> <artem.tambovskiy(a)gmail.com> wrote:
>>>>>>
>>>>>> Trying to fix one thing I broke another :(
>>>>>>
>>>>>> I fixed mnt_options for hosted engine storage domain and
installed
>>>>>> latest security patches to my hosts and hosted engine. All
VM's up and
>>>>>> running, but hosted_engine --vm-status reports about issues:
>>>>>>
>>>>>> [root@ovirt1 ~]# hosted-engine --vm-status
>>>>>>
>>>>>>
>>>>>> --== Host 1 status ==--
>>>>>>
>>>>>> conf_on_shared_storage : True
>>>>>> Status up-to-date : False
>>>>>> Hostname : ovirt2
>>>>>> Host ID : 1
>>>>>> Engine status : unknown stale-data
>>>>>> Score : 0
>>>>>> stopped : False
>>>>>> Local maintenance : False
>>>>>> crc32 : 193164b8
>>>>>> local_conf_timestamp : 8350
>>>>>> Host timestamp : 8350
>>>>>> Extra metadata (valid at timestamp):
>>>>>> metadata_parse_version=1
>>>>>> metadata_feature_version=1
>>>>>> timestamp=8350 (Fri Jan 12 19:03:54 2018)
>>>>>> host-id=1
>>>>>> score=0
>>>>>> vm_conf_refresh_time=8350 (Fri Jan 12 19:03:54 2018)
>>>>>> conf_on_shared_storage=True
>>>>>> maintenance=False
>>>>>> state=EngineUnexpectedlyDown
>>>>>> stopped=False
>>>>>> timeout=Thu Jan 1 05:24:43 1970
>>>>>>
>>>>>>
>>>>>> --== Host 2 status ==--
>>>>>>
>>>>>> conf_on_shared_storage : True
>>>>>> Status up-to-date : False
>>>>>> Hostname : ovirt1.telia.ru
>>>>>> Host ID : 2
>>>>>> Engine status : unknown stale-data
>>>>>> Score : 0
>>>>>> stopped : True
>>>>>> Local maintenance : False
>>>>>> crc32 : c7037c03
>>>>>> local_conf_timestamp : 7530
>>>>>> Host timestamp : 7530
>>>>>> Extra metadata (valid at timestamp):
>>>>>> metadata_parse_version=1
>>>>>> metadata_feature_version=1
>>>>>> timestamp=7530 (Fri Jan 12 16:10:12 2018)
>>>>>> host-id=2
>>>>>> score=0
>>>>>> vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)
>>>>>> conf_on_shared_storage=True
>>>>>> maintenance=False
>>>>>> state=AgentStopped
>>>>>> stopped=True
>>>>>> [root@ovirt1 ~]#
>>>>>>
>>>>>>
>>>>>>
>>>>>> from second host situation looks a bit different:
>>>>>>
>>>>>>
>>>>>> [root@ovirt2 ~]# hosted-engine --vm-status
>>>>>>
>>>>>>
>>>>>> --== Host 1 status ==--
>>>>>>
>>>>>> conf_on_shared_storage : True
>>>>>> Status up-to-date : True
>>>>>> Hostname : ovirt2
>>>>>> Host ID : 1
>>>>>> Engine status : {"reason":
"vm not running on
>>>>>> this host", "health": "bad",
"vm": "down", "detail": "unknown"}
>>>>>> Score : 0
>>>>>> stopped : False
>>>>>> Local maintenance : False
>>>>>> crc32 : 78eabdb6
>>>>>> local_conf_timestamp : 8403
>>>>>> Host timestamp : 8402
>>>>>> Extra metadata (valid at timestamp):
>>>>>> metadata_parse_version=1
>>>>>> metadata_feature_version=1
>>>>>> timestamp=8402 (Fri Jan 12 19:04:47 2018)
>>>>>> host-id=1
>>>>>> score=0
>>>>>> vm_conf_refresh_time=8403 (Fri Jan 12 19:04:47 2018)
>>>>>> conf_on_shared_storage=True
>>>>>> maintenance=False
>>>>>> state=EngineUnexpectedlyDown
>>>>>> stopped=False
>>>>>> timeout=Thu Jan 1 05:24:43 1970
>>>>>>
>>>>>>
>>>>>> --== Host 2 status ==--
>>>>>>
>>>>>> conf_on_shared_storage : True
>>>>>> Status up-to-date : False
>>>>>> Hostname : ovirt1.telia.ru
>>>>>> Host ID : 2
>>>>>> Engine status : unknown stale-data
>>>>>> Score : 0
>>>>>> stopped : True
>>>>>> Local maintenance : False
>>>>>> crc32 : c7037c03
>>>>>> local_conf_timestamp : 7530
>>>>>> Host timestamp : 7530
>>>>>> Extra metadata (valid at timestamp):
>>>>>> metadata_parse_version=1
>>>>>> metadata_feature_version=1
>>>>>> timestamp=7530 (Fri Jan 12 16:10:12 2018)
>>>>>> host-id=2
>>>>>> score=0
>>>>>> vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)
>>>>>> conf_on_shared_storage=True
>>>>>> maintenance=False
>>>>>> state=AgentStopped
>>>>>> stopped=True
>>>>>>
>>>>>>
>>>>>> WebGUI shows that engine running on host ovirt1.
>>>>>> Gluster looks fine
>>>>>> [root@ovirt1 ~]# gluster volume status engine
>>>>>> Status of volume: engine
>>>>>> Gluster process TCP Port RDMA Port
>>>>>> Online Pid
>>>>>>
>>>>>>
------------------------------------------------------------------------------
>>>>>> Brick ovirt1.teliaru:/oVirt/engine 49169 0
Y
>>>>>> 3244
>>>>>> Brick ovirt2.telia.ru:/oVirt/engine 49179 0
Y
>>>>>> 20372
>>>>>> Brick ovirt3.telia.ru:/oVirt/engine 49206 0
Y
>>>>>> 16609
>>>>>> Self-heal Daemon on localhost N/A N/A
Y
>>>>>> 117868
>>>>>> Self-heal Daemon on ovirt2.telia.ru N/A N/A
Y
>>>>>> 20521
>>>>>> Self-heal Daemon on ovirt3 N/A N/A
Y
>>>>>> 25093
>>>>>>
>>>>>> Task Status of Volume engine
>>>>>>
>>>>>>
------------------------------------------------------------------------------
>>>>>> There are no active volume tasks
>>>>>>
>>>>>> How to resolve this issue?
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Users mailing list
>>>>>> Users(a)ovirt.org
>>>>>>
http://lists.ovirt.org/mailman/listinfo/users
>>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Users mailing list
>>>>> Users(a)ovirt.org
>>>>>
http://lists.ovirt.org/mailman/listinfo/users
>>>>>
>>>>
>>>
>>
>
> _______________________________________________
> Users mailing list
> Users(a)ovirt.org
>
http://lists.ovirt.org/mailman/listinfo/users
>
_______________________________________________
Users mailing list
Users(a)ovirt.org
http://lists.ovirt.org/mailman/listinfo/users