[ovirt-users] hosted-engine unknown stale-data

Derek Atkins derek at ihtfp.com
Tue Jan 16 12:16:03 UTC 2018


Why are both hosts reporting as ovirt1?
Look at the Hostname fields to see what I mean.
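One remedy often suggested for a stale slot like this is to stop the HA services on the stale host and wipe its metadata record from the healthy one. This is only a sketch under assumptions: host 2 is the stale slot, host 1 currently runs the engine VM, and your ovirt-hosted-engine-ha version supports `--clean-metadata`. The DRY_RUN guard prints the commands instead of executing them, since wiping the wrong slot destroys live HA state:

```shell
#!/bin/sh
# Sketch only: clears the (assumed) stale host-2 metadata slot.
# DRY_RUN=1 (the default) just prints each command; set DRY_RUN=0 to
# really execute, and only after double-checking the host id.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" -eq 1 ]; then
        echo "would run: $*"      # preview the command
    else
        "$@"                      # execute for real
    fi
}

# 1. Quiesce the HA services on the stale host first.
run ssh root@ovirt2.telia.ru systemctl stop ovirt-ha-agent ovirt-ha-broker
# 2. From the healthy host, drop host id 2's record from shared storage.
run hosted-engine --clean-metadata --host-id=2 --force-clean
# 3. Re-check: the stale "Host 2" slot should disappear from the output.
run hosted-engine --vm-status
```

With the default DRY_RUN=1 this is safe to paste and review before committing to anything.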

-derek
Sent using my mobile device. Please excuse any typos.



On January 16, 2018 7:11:09 AM Artem Tambovskiy 
<artem.tambovskiy at gmail.com> wrote:

> Hello,
>
> Yes, I followed exactly the same procedure while reinstalling the hosts
> (the only difference is that I have an SSH key configured instead of a
> password).
>
> I just reinstalled the second host one more time; after 20 minutes the
> host still hasn't reached the active score of 3400 (Hosted Engine HA:
> Not Active), and I still don't see the crown icon for this host.
>
> hosted-engine --vm-status output from the ovirt1 host:
>
> [root@ovirt1 ~]# hosted-engine --vm-status
>
>
> --== Host 1 status ==--
>
> conf_on_shared_storage             : True
> Status up-to-date                  : True
> Hostname                           : ovirt1.telia.ru
> Host ID                            : 1
> Engine status                      : {"health": "good", "vm": "up",
> "detail": "up"}
> Score                              : 3400
> stopped                            : False
> Local maintenance                  : False
> crc32                              : 3f94156a
> local_conf_timestamp               : 349144
> Host timestamp                     : 349144
> Extra metadata (valid at timestamp):
>         metadata_parse_version=1
>         metadata_feature_version=1
>         timestamp=349144 (Tue Jan 16 15:03:45 2018)
>         host-id=1
>         score=3400
>         vm_conf_refresh_time=349144 (Tue Jan 16 15:03:45 2018)
>         conf_on_shared_storage=True
>         maintenance=False
>         state=EngineUp
>         stopped=False
>
>
> --== Host 2 status ==--
>
> conf_on_shared_storage             : True
> Status up-to-date                  : False
> Hostname                           : ovirt1.telia.ru
> Host ID                            : 2
> Engine status                      : unknown stale-data
> Score                              : 0
> stopped                            : True
> Local maintenance                  : False
> crc32                              : c7037c03
> local_conf_timestamp               : 7530
> Host timestamp                     : 7530
> Extra metadata (valid at timestamp):
>         metadata_parse_version=1
>         metadata_feature_version=1
>         timestamp=7530 (Fri Jan 12 16:10:12 2018)
>         host-id=2
>         score=0
>         vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)
>         conf_on_shared_storage=True
>         maintenance=False
>         state=AgentStopped
>         stopped=True
>
>
> hosted-engine --vm-status output from the ovirt2 host:
>
> [root@ovirt2 ovirt-hosted-engine-ha]# hosted-engine --vm-status
>
>
> --== Host 1 status ==--
>
> conf_on_shared_storage             : True
> Status up-to-date                  : False
> Hostname                           : ovirt1.telia.ru
> Host ID                            : 1
> Engine status                      : unknown stale-data
> Score                              : 3400
> stopped                            : False
> Local maintenance                  : False
> crc32                              : 6d3606f1
> local_conf_timestamp               : 349264
> Host timestamp                     : 349264
> Extra metadata (valid at timestamp):
>         metadata_parse_version=1
>         metadata_feature_version=1
>         timestamp=349264 (Tue Jan 16 15:05:45 2018)
>         host-id=1
>         score=3400
>         vm_conf_refresh_time=349264 (Tue Jan 16 15:05:45 2018)
>         conf_on_shared_storage=True
>         maintenance=False
>         state=EngineUp
>         stopped=False
>
>
> --== Host 2 status ==--
>
> conf_on_shared_storage             : True
> Status up-to-date                  : False
> Hostname                           : ovirt1.telia.ru
> Host ID                            : 2
> Engine status                      : unknown stale-data
> Score                              : 0
> stopped                            : True
> Local maintenance                  : False
> crc32                              : c7037c03
> local_conf_timestamp               : 7530
> Host timestamp                     : 7530
> Extra metadata (valid at timestamp):
>         metadata_parse_version=1
>         metadata_feature_version=1
>         timestamp=7530 (Fri Jan 12 16:10:12 2018)
>         host-id=2
>         score=0
>         vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)
>         conf_on_shared_storage=True
>         maintenance=False
>         state=AgentStopped
>         stopped=True
>
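The striking thing in both outputs above is that the Host 1 and Host 2 slots carry the same Hostname. A small awk-based check (my own sketch, not an oVirt tool) surfaces that directly from `--vm-status` text:

```shell
# Print any hostname that appears in more than one host slot of
# `hosted-engine --vm-status` output; a duplicate means a stale slot.
dup_hostnames() {
    printf '%s\n' "$1" | awk -F': *' '/^Hostname/ {print $2}' | sort | uniq -d
}

# Sample taken from the status output above:
sample='Hostname                           : ovirt1.telia.ru
Hostname                           : ovirt1.telia.ru'
dup_hostnames "$sample"    # prints: ovirt1.telia.ru

# Live usage (assumption: run on an HA host):
#   dup_hostnames "$(hosted-engine --vm-status)"
```

An empty result means every slot has a distinct hostname, which is what a healthy two-host deployment should show.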
>
> Also, I saw some log messages in the web GUI about time drift, like:
>
> "Host ovirt2.telia.ru has time-drift of 5305 seconds while maximum
> configured value is 300 seconds." That is a bit weird, as I haven't
> touched any time settings since I installed the cluster.
> Both hosts have the same time and timezone (MSK), but the hosted engine
> lives in the UTC timezone. Is it mandatory to have everything in sync
> and in the same timezone?
>
> Regards,
> Artem
>
>
>
>
>
>
> On Tue, Jan 16, 2018 at 2:20 PM, Kasturi Narra <knarra at redhat.com> wrote:
>
>> Hello,
>>
>>      I now see that your hosted engine is up and running. Can you let me
>> know how you tried reinstalling the host? Below is the procedure that is
>> used; I hope you did not miss any step while reinstalling. If you missed
>> one, can you try reinstalling again and see if that works?
>>
>> 1) Move the host to maintenance.
>> 2) Click on 'Reinstall'.
>> 3) Provide the password.
>> 4) Uncheck 'Automatically configure host firewall'.
>> 5) Click on the 'Deploy' tab.
>> 6) Select 'Deploy' for Hosted Engine deployment.
>>
>> Once the host installation is done, wait until the active score of the
>> host shows 3400 in the General tab, then check hosted-engine --vm-status.
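The waiting step can also be scripted instead of watching the UI. A small parser over `--vm-status` text (my sketch; the awk column layout is taken from the outputs in this thread) plus a poll loop:

```shell
# score_of STATUS_TEXT HOST_ID -> prints that host's score from
# `hosted-engine --vm-status` text.
score_of() {
    printf '%s\n' "$1" | awk -v id="$2" '
        /Host ID/ {h = $NF}
        /^Score/  {if (h == id) print $NF}'
}

# Sample trimmed from real --vm-status output:
sample='Host ID                            : 1
Score                              : 3400
Host ID                            : 2
Score                              : 0'
score_of "$sample" 2    # prints: 0

# Poll until the reinstalled host reaches 3400 (run on any HA host):
#   until [ "$(score_of "$(hosted-engine --vm-status)" 2)" = "3400" ]; do
#       sleep 30
#   done
```

If the score stays at 0 for more than a few minutes, the agent logs are the next place to look.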
>>
>> Thanks
>> kasturi
>>
>> On Mon, Jan 15, 2018 at 4:57 PM, Artem Tambovskiy <
>> artem.tambovskiy at gmail.com> wrote:
>>
>>> Hello,
>>>
>>> I have uploaded two archives with all relevant logs to shared hosting:
>>> files from host 1 (which is currently running all VMs, including the
>>> hosted engine) - https://yadi.sk/d/PttRoYV63RTvhK
>>> files from the second host - https://yadi.sk/d/UBducEsV3RTvhc
>>>
>>> I have tried to restart both ovirt-ha-agent and ovirt-ha-broker, but it
>>> has no effect. I have also tried to shut down the hosted engine VM, stop
>>> the ovirt-ha-agent and ovirt-ha-broker services, disconnect the storage,
>>> and connect it again - no effect either.
>>> I also tried to reinstall the second host from the web GUI - this led to
>>> an interesting situation: now hosted-engine --vm-status shows that both
>>> hosts have the same hostname.
>>>
>>> [root@ovirt1 ~]# hosted-engine --vm-status
>>>
>>> --== Host 1 status ==--
>>>
>>> conf_on_shared_storage             : True
>>> Status up-to-date                  : True
>>> Hostname                           : ovirt1.telia.ru
>>> Host ID                            : 1
>>> Engine status                      : {"health": "good", "vm": "up",
>>> "detail": "up"}
>>> Score                              : 3400
>>> stopped                            : False
>>> Local maintenance                  : False
>>> crc32                              : a7758085
>>> local_conf_timestamp               : 259327
>>> Host timestamp                     : 259327
>>> Extra metadata (valid at timestamp):
>>>         metadata_parse_version=1
>>>         metadata_feature_version=1
>>>         timestamp=259327 (Mon Jan 15 14:06:48 2018)
>>>         host-id=1
>>>         score=3400
>>>         vm_conf_refresh_time=259327 (Mon Jan 15 14:06:48 2018)
>>>         conf_on_shared_storage=True
>>>         maintenance=False
>>>         state=EngineUp
>>>         stopped=False
>>>
>>>
>>> --== Host 2 status ==--
>>>
>>> conf_on_shared_storage             : True
>>> Status up-to-date                  : False
>>> Hostname                           : ovirt1.telia.ru
>>> Host ID                            : 2
>>> Engine status                      : unknown stale-data
>>> Score                              : 0
>>> stopped                            : True
>>> Local maintenance                  : False
>>> crc32                              : c7037c03
>>> local_conf_timestamp               : 7530
>>> Host timestamp                     : 7530
>>> Extra metadata (valid at timestamp):
>>>         metadata_parse_version=1
>>>         metadata_feature_version=1
>>>         timestamp=7530 (Fri Jan 12 16:10:12 2018)
>>>         host-id=2
>>>         score=0
>>>         vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)
>>>         conf_on_shared_storage=True
>>>         maintenance=False
>>>         state=AgentStopped
>>>         stopped=True
>>>
>>> Gluster seems to be working fine; all Gluster nodes show a connected state.
>>>
>>> Any advice on how to resolve this situation is highly appreciated!
>>>
>>> Regards,
>>> Artem
>>>
>>>
>>> On Mon, Jan 15, 2018 at 11:45 AM, Kasturi Narra <knarra at redhat.com>
>>> wrote:
>>>
>>>> Hello Artem,
>>>>
>>>>         Can you check if the glusterd service is running on host1 and
>>>> all the peers are in a connected state? If yes, can you restart the
>>>> ovirt-ha-agent and ovirt-ha-broker services and check if things are
>>>> working fine?
>>>>
>>>> Thanks
>>>> kasturi
>>>>
>>>> On Sat, Jan 13, 2018 at 12:33 AM, Artem Tambovskiy <
>>>> artem.tambovskiy at gmail.com> wrote:
>>>>
>>>>> I explored the logs on both hosts.
>>>>> broker.log shows no errors.
>>>>>
>>>>> agent.log is not looking good.
>>>>>
>>>>> On host1 (which is running the hosted engine):
>>>>>
>>>>> MainThread::ERROR::2018-01-12 21:51:03,883::agent::205::ovir
>>>>> t_hosted_engine_ha.agent.agent.Agent::(_run_agent) Traceback (most
>>>>> recent call last):
>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py",
>>>>> line 191, in _run_agent
>>>>>     return action(he)
>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py",
>>>>> line 64, in action_proper
>>>>>     return he.start_monitoring()
>>>>>   File 
>>>>>   "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
>>>>> line 411, in start_monitoring
>>>>>     self._initialize_sanlock()
>>>>>   File 
>>>>>   "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
>>>>> line 749, in _initialize_sanlock
>>>>>     "Failed to initialize sanlock, the number of errors has"
>>>>> SanlockInitializationError: Failed to initialize sanlock, the number of
>>>>> errors has exceeded the limit
>>>>>
>>>>> MainThread::ERROR::2018-01-12 21:51:03,884::agent::206::ovir
>>>>> t_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart
>>>>> agent
>>>>> MainThread::WARNING::2018-01-12 21:51:08,889::agent::209::ovir
>>>>> t_hosted_engine_ha.agent.agent.Agent::(_run_agent) Restarting agent,
>>>>> attempt '1'
>>>>> MainThread::INFO::2018-01-12 21:51:08,919::hosted_engine::2
>>>>> 42::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname)
>>>>> Found certificate common name: ovirt1.telia.ru
>>>>> MainThread::INFO::2018-01-12 21:51:08,921::hosted_engine::6
>>>>> 04::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm)
>>>>> Initializing VDSM
>>>>> MainThread::INFO::2018-01-12 21:51:11,398::hosted_engine::6
>>>>> 30::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images)
>>>>> Connecting the storage
>>>>> MainThread::INFO::2018-01-12 21:51:11,399::storage_server::220::
>>>>> ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(validate_storage_server)
>>>>> Validating storage server
>>>>> MainThread::INFO::2018-01-12 21:51:13,725::storage_server::239::
>>>>> ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
>>>>> Connecting storage server
>>>>> MainThread::INFO::2018-01-12 21:51:18,390::storage_server::246::
>>>>> ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
>>>>> Connecting storage server
>>>>> MainThread::INFO::2018-01-12 21:51:18,423::storage_server::253::
>>>>> ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
>>>>> Refreshing the storage domain
>>>>> MainThread::INFO::2018-01-12 21:51:18,689::hosted_engine::6
>>>>> 63::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images)
>>>>> Preparing images
>>>>> MainThread::INFO::2018-01-12 21:51:18,690::image::126::ovir
>>>>> t_hosted_engine_ha.lib.image.Image::(prepare_images) Preparing images
>>>>> MainThread::INFO::2018-01-12 21:51:21,895::hosted_engine::6
>>>>> 66::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images)
>>>>> Refreshing vm.conf
>>>>> MainThread::INFO::2018-01-12 21:51:21,895::config::493::ovi
>>>>> rt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_vm_conf)
>>>>> Reloading vm.conf from the shared storage domain
>>>>> MainThread::INFO::2018-01-12 21:51:21,896::config::416::ovi
>>>>> rt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config:
>>>>> :(_get_vm_conf_content_from_ovf_store) Trying to get a fresher copy of
>>>>> vm configuration from the OVF_STORE
>>>>> MainThread::INFO::2018-01-12 21:51:21,896::ovf_store::132::
>>>>> ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
>>>>> Extracting Engine VM OVF from the OVF_STORE
>>>>> MainThread::INFO::2018-01-12 21:51:21,897::ovf_store::134::
>>>>> ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
>>>>> OVF_STORE volume path: /var/run/vdsm/storage/4a7f8717
>>>>> -9bb0-4d80-8016-498fa4b88162/5cabd8e1-5f4b-469e-becc-227469e
>>>>> 03f5c/8048cbd7-77e2-4805-9af4-d109fa36dfcf
>>>>> MainThread::INFO::2018-01-12 21:51:21,915::config::435::ovi
>>>>> rt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config:
>>>>> :(_get_vm_conf_content_from_ovf_store) Found an OVF for HE VM, trying
>>>>> to convert
>>>>> MainThread::INFO::2018-01-12 21:51:21,918::config::440::ovi
>>>>> rt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config:
>>>>> :(_get_vm_conf_content_from_ovf_store) Got vm.conf from OVF_STORE
>>>>> MainThread::INFO::2018-01-12 21:51:21,919::hosted_engine::5
>>>>> 09::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker)
>>>>> Initializing ha-broker connection
>>>>> MainThread::INFO::2018-01-12 21:51:21,919::brokerlink::130::
>>>>> ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
>>>>> Starting monitor ping, options {'addr': '80.239.162.97'}
>>>>> MainThread::INFO::2018-01-12 21:51:21,922::brokerlink::141::
>>>>> ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
>>>>> Success, id 140547104457680
>>>>> MainThread::INFO::2018-01-12 21:51:21,922::brokerlink::130::
>>>>> ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
>>>>> Starting monitor mgmt-bridge, options {'use_ssl': 'true', 'bridge_name':
>>>>> 'ovirtmgmt', 'address': '0'}
>>>>> MainThread::INFO::2018-01-12 21:51:21,936::brokerlink::141::
>>>>> ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
>>>>> Success, id 140547104458064
>>>>> MainThread::INFO::2018-01-12 21:51:21,936::brokerlink::130::
>>>>> ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
>>>>> Starting monitor mem-free, options {'use_ssl': 'true', 'address': '0'}
>>>>> MainThread::INFO::2018-01-12 21:51:21,938::brokerlink::141::
>>>>> ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
>>>>> Success, id 140547104458448
>>>>> MainThread::INFO::2018-01-12 21:51:21,939::brokerlink::130::
>>>>> ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
>>>>> Starting monitor cpu-load-no-engine, options {'use_ssl': 'true', 'vm_uuid':
>>>>> 'b366e466-b0ea-4a09-866b-d0248d7523a6', 'address': '0'}
>>>>> MainThread::INFO::2018-01-12 21:51:21,940::brokerlink::141::
>>>>> ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
>>>>> Success, id 140547104457552
>>>>> MainThread::INFO::2018-01-12 21:51:21,941::brokerlink::130::
>>>>> ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
>>>>> Starting monitor engine-health, options {'use_ssl': 'true', 'vm_uuid':
>>>>> 'b366e466-b0ea-4a09-866b-d0248d7523a6', 'address': '0'}
>>>>> MainThread::INFO::2018-01-12 21:51:21,942::brokerlink::141::
>>>>> ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
>>>>> Success, id 140547104459792
>>>>> MainThread::INFO::2018-01-12 21:51:26,951::brokerlink::179::
>>>>> ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(set_storage_domain)
>>>>> Success, id 140546772847056
>>>>> MainThread::INFO::2018-01-12 21:51:26,952::hosted_engine::6
>>>>> 01::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker)
>>>>> Broker initialized, all submonitors started
>>>>> MainThread::INFO::2018-01-12 21:51:27,049::hosted_engine::7
>>>>> 04::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock)
>>>>> Ensuring lease for lockspace hosted-engine, host id 1 is acquired (file:
>>>>> /var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-498fa4b88162/0
>>>>> 93faa75-5e33-4559-84fa-1f1f8d48153b/911c7637-b49d-463e-b186-
>>>>> 23b404e50769)
>>>>> MainThread::INFO::2018-01-12 21:53:48,067::hosted_engine::7
>>>>> 45::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock)
>>>>> Failed to acquire the lock. Waiting '5's before the next attempt
>>>>> MainThread::INFO::2018-01-12 21:56:14,088::hosted_engine::7
>>>>> 45::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock)
>>>>> Failed to acquire the lock. Waiting '5's before the next attempt
>>>>> MainThread::INFO::2018-01-12 21:58:40,111::hosted_engine::7
>>>>> 45::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock)
>>>>> Failed to acquire the lock. Waiting '5's before the next attempt
>>>>> MainThread::INFO::2018-01-12 22:01:06,133::hosted_engine::7
>>>>> 45::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock)
>>>>> Failed to acquire the lock. Waiting '5's before the next attempt
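Counting the retries in an agent.log excerpt shows how persistent the sanlock failure is; repeated "Failed to acquire the lock" lines usually mean the lease for that host id is held elsewhere, for instance when two hosts end up using the same host id. A quick triage sketch (the live log path and the `sanlock` command are from a standard install, but treat them as assumptions for your setup):

```shell
# Count sanlock acquisition retries in an agent.log excerpt.
lock_failures() { grep -c 'Failed to acquire the lock' "$1"; }

# Sample file built from the log lines above:
cat > /tmp/agent-excerpt.log <<'EOF'
MainThread::INFO::2018-01-12 21:53:48,067 Failed to acquire the lock. Waiting '5's before the next attempt
MainThread::INFO::2018-01-12 21:56:14,088 Failed to acquire the lock. Waiting '5's before the next attempt
MainThread::INFO::2018-01-12 21:58:40,111 Failed to acquire the lock. Waiting '5's before the next attempt
MainThread::INFO::2018-01-12 22:01:06,133 Failed to acquire the lock. Waiting '5's before the next attempt
EOF
lock_failures /tmp/agent-excerpt.log    # prints: 4

# On a real host (assumption: default log path):
#   lock_failures /var/log/ovirt-hosted-engine-ha/agent.log
#   sanlock client status    # shows which lockspaces/leases are held
```

A steadily growing count across the retry timestamps above is what eventually trips the "number of errors has exceeded the limit" traceback.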
>>>>>
>>>>>
>>>>> agent.log from the second host:
>>>>>
>>>>> MainThread::INFO::2018-01-12 22:01:37,241::hosted_engine::6
>>>>> 30::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images)
>>>>> Connecting the storage
>>>>> MainThread::INFO::2018-01-12 22:01:37,242::storage_server::220::
>>>>> ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(validate_storage_server)
>>>>> Validating storage server
>>>>> MainThread::INFO::2018-01-12 22:01:39,540::hosted_engine::6
>>>>> 39::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images)
>>>>> Storage domain reported as valid and reconnect is not forced.
>>>>> MainThread::INFO::2018-01-12 22:01:41,939::hosted_engine::4
>>>>> 53::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>>>>> Current state EngineUnexpectedlyDown (score: 0)
>>>>> MainThread::INFO::2018-01-12 22:01:52,150::config::493::ovi
>>>>> rt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_vm_conf)
>>>>> Reloading vm.conf from the shared storage domain
>>>>> MainThread::INFO::2018-01-12 22:01:52,150::config::416::ovi
>>>>> rt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config:
>>>>> :(_get_vm_conf_content_from_ovf_store) Trying to get a fresher copy of
>>>>> vm configuration from the OVF_STORE
>>>>> MainThread::INFO::2018-01-12 22:01:52,151::ovf_store::132::
>>>>> ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
>>>>> Extracting Engine VM OVF from the OVF_STORE
>>>>> MainThread::INFO::2018-01-12 22:01:52,153::ovf_store::134::
>>>>> ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
>>>>> OVF_STORE volume path: /var/run/vdsm/storage/4a7f8717
>>>>> -9bb0-4d80-8016-498fa4b88162/5cabd8e1-5f4b-469e-becc-227469e
>>>>> 03f5c/8048cbd7-77e2-4805-9af4-d109fa36dfcf
>>>>> MainThread::INFO::2018-01-12 22:01:52,174::config::435::ovi
>>>>> rt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config:
>>>>> :(_get_vm_conf_content_from_ovf_store) Found an OVF for HE VM, trying
>>>>> to convert
>>>>> MainThread::INFO::2018-01-12 22:01:52,179::config::440::ovi
>>>>> rt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config:
>>>>> :(_get_vm_conf_content_from_ovf_store) Got vm.conf from OVF_STORE
>>>>> MainThread::INFO::2018-01-12 22:01:52,189::hosted_engine::6
>>>>> 04::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm)
>>>>> Initializing VDSM
>>>>> MainThread::INFO::2018-01-12 22:01:54,586::hosted_engine::6
>>>>> 30::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images)
>>>>> Connecting the storage
>>>>> MainThread::INFO::2018-01-12 22:01:54,587::storage_server::220::
>>>>> ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(validate_storage_server)
>>>>> Validating storage server
>>>>> MainThread::INFO::2018-01-12 22:01:56,903::hosted_engine::6
>>>>> 39::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images)
>>>>> Storage domain reported as valid and reconnect is not forced.
>>>>> MainThread::INFO::2018-01-12 22:01:59,299::states::682::ovi
>>>>> rt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score) Score is
>>>>> 0 due to unexpected vm shutdown at Fri Jan 12 21:57:48 2018
>>>>> MainThread::INFO::2018-01-12 22:01:59,299::hosted_engine::4
>>>>> 53::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>>>>> Current state EngineUnexpectedlyDown (score: 0)
>>>>> MainThread::INFO::2018-01-12 22:02:09,659::config::493::ovi
>>>>> rt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_vm_conf)
>>>>> Reloading vm.conf from the shared storage domain
>>>>> MainThread::INFO::2018-01-12 22:02:09,659::config::416::ovi
>>>>> rt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config:
>>>>> :(_get_vm_conf_content_from_ovf_store) Trying to get a fresher copy of
>>>>> vm configuration from the OVF_STORE
>>>>> MainThread::INFO::2018-01-12 22:02:09,660::ovf_store::132::
>>>>> ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
>>>>> Extracting Engine VM OVF from the OVF_STORE
>>>>> MainThread::INFO::2018-01-12 22:02:09,663::ovf_store::134::
>>>>> ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
>>>>> OVF_STORE volume path: /var/run/vdsm/storage/4a7f8717
>>>>> -9bb0-4d80-8016-498fa4b88162/5cabd8e1-5f4b-469e-becc-227469e
>>>>> 03f5c/8048cbd7-77e2-4805-9af4-d109fa36dfcf
>>>>> MainThread::INFO::2018-01-12 22:02:09,683::config::435::ovi
>>>>> rt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config:
>>>>> :(_get_vm_conf_content_from_ovf_store) Found an OVF for HE VM, trying
>>>>> to convert
>>>>> MainThread::INFO::2018-01-12 22:02:09,688::config::440::ovi
>>>>> rt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config:
>>>>> :(_get_vm_conf_content_from_ovf_store) Got vm.conf from OVF_STORE
>>>>> MainThread::INFO::2018-01-12 22:02:09,698::hosted_engine::6
>>>>> 04::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm)
>>>>> Initializing VDSM
>>>>> MainThread::INFO::2018-01-12 22:02:12,112::hosted_engine::6
>>>>> 30::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images)
>>>>> Connecting the storage
>>>>> MainThread::INFO::2018-01-12 22:02:12,113::storage_server::220::
>>>>> ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(validate_storage_server)
>>>>> Validating storage server
>>>>> MainThread::INFO::2018-01-12 22:02:14,444::hosted_engine::6
>>>>> 39::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images)
>>>>> Storage domain reported as valid and reconnect is not forced.
>>>>> MainThread::INFO::2018-01-12 22:02:16,859::states::682::ovi
>>>>> rt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score) Score is
>>>>> 0 due to unexpected vm shutdown at Fri Jan 12 21:57:47 2018
>>>>> MainThread::INFO::2018-01-12 22:02:16,859::hosted_engine::4
>>>>> 53::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>>>>> Current state EngineUnexpectedlyDown (score: 0)
>>>>> MainThread::INFO::2018-01-12 22:02:27,100::config::493::ovi
>>>>> rt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_vm_conf)
>>>>> Reloading vm.conf from the shared storage domain
>>>>> MainThread::INFO::2018-01-12 22:02:27,100::config::416::ovi
>>>>> rt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config:
>>>>> :(_get_vm_conf_content_from_ovf_store) Trying to get a fresher copy of
>>>>> vm configuration from the OVF_STORE
>>>>> MainThread::INFO::2018-01-12 22:02:27,101::ovf_store::132::
>>>>> ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
>>>>> Extracting Engine VM OVF from the OVF_STORE
>>>>> MainThread::INFO::2018-01-12 22:02:27,103::ovf_store::134::
>>>>> ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
>>>>> OVF_STORE volume path: /var/run/vdsm/storage/4a7f8717
>>>>> -9bb0-4d80-8016-498fa4b88162/5cabd8e1-5f4b-469e-becc-227469e
>>>>> 03f5c/8048cbd7-77e2-4805-9af4-d109fa36dfcf
>>>>> MainThread::INFO::2018-01-12 22:02:27,125::config::435::ovi
>>>>> rt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config:
>>>>> :(_get_vm_conf_content_from_ovf_store) Found an OVF for HE VM, trying
>>>>> to convert
>>>>> MainThread::INFO::2018-01-12 22:02:27,129::config::440::ovi
>>>>> rt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config:
>>>>> :(_get_vm_conf_content_from_ovf_store) Got vm.conf from OVF_STORE
>>>>> MainThread::INFO::2018-01-12 22:02:27,130::states::667::ovi
>>>>> rt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Engine
>>>>> down, local host does not have best score
>>>>> MainThread::INFO::2018-01-12 22:02:27,139::hosted_engine::6
>>>>> 04::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm)
>>>>> Initializing VDSM
>>>>> MainThread::INFO::2018-01-12 22:02:29,584::hosted_engine::6
>>>>> 30::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images)
>>>>> Connecting the storage
>>>>> MainThread::INFO::2018-01-12 22:02:29,586::storage_server::220::
>>>>> ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(validate_storage_server)
>>>>> Validating storage server
>>>>>
>>>>>
>>>>> Any suggestions on how to resolve this?
>>>>>
>>>>> regards,
>>>>> Artem
>>>>>
>>>>>
>>>>> On Fri, Jan 12, 2018 at 7:08 PM, Artem Tambovskiy <
>>>>> artem.tambovskiy at gmail.com> wrote:
>>>>>
>>>>>> Trying to fix one thing, I broke another :(
>>>>>>
>>>>>> I fixed the mnt_options for the hosted engine storage domain and
>>>>>> installed the latest security patches on my hosts and the hosted
>>>>>> engine. All VMs are up and running, but hosted-engine --vm-status
>>>>>> reports issues:
>>>>>>
>>>>>> [root@ovirt1 ~]# hosted-engine --vm-status
>>>>>>
>>>>>>
>>>>>> --== Host 1 status ==--
>>>>>>
>>>>>> conf_on_shared_storage             : True
>>>>>> Status up-to-date                  : False
>>>>>> Hostname                           : ovirt2
>>>>>> Host ID                            : 1
>>>>>> Engine status                      : unknown stale-data
>>>>>> Score                              : 0
>>>>>> stopped                            : False
>>>>>> Local maintenance                  : False
>>>>>> crc32                              : 193164b8
>>>>>> local_conf_timestamp               : 8350
>>>>>> Host timestamp                     : 8350
>>>>>> Extra metadata (valid at timestamp):
>>>>>>         metadata_parse_version=1
>>>>>>         metadata_feature_version=1
>>>>>>         timestamp=8350 (Fri Jan 12 19:03:54 2018)
>>>>>>         host-id=1
>>>>>>         score=0
>>>>>>         vm_conf_refresh_time=8350 (Fri Jan 12 19:03:54 2018)
>>>>>>         conf_on_shared_storage=True
>>>>>>         maintenance=False
>>>>>>         state=EngineUnexpectedlyDown
>>>>>>         stopped=False
>>>>>>         timeout=Thu Jan  1 05:24:43 1970
>>>>>>
>>>>>>
>>>>>> --== Host 2 status ==--
>>>>>>
>>>>>> conf_on_shared_storage             : True
>>>>>> Status up-to-date                  : False
>>>>>> Hostname                           : ovirt1.telia.ru
>>>>>> Host ID                            : 2
>>>>>> Engine status                      : unknown stale-data
>>>>>> Score                              : 0
>>>>>> stopped                            : True
>>>>>> Local maintenance                  : False
>>>>>> crc32                              : c7037c03
>>>>>> local_conf_timestamp               : 7530
>>>>>> Host timestamp                     : 7530
>>>>>> Extra metadata (valid at timestamp):
>>>>>>         metadata_parse_version=1
>>>>>>         metadata_feature_version=1
>>>>>>         timestamp=7530 (Fri Jan 12 16:10:12 2018)
>>>>>>         host-id=2
>>>>>>         score=0
>>>>>>         vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)
>>>>>>         conf_on_shared_storage=True
>>>>>>         maintenance=False
>>>>>>         state=AgentStopped
>>>>>>         stopped=True
>>>>>> [root@ovirt1 ~]#
>>>>>>
>>>>>>
>>>>>>
>>>>>> From the second host, the situation looks a bit different:
>>>>>>
>>>>>>
>>>>>> [root@ovirt2 ~]# hosted-engine --vm-status
>>>>>>
>>>>>>
>>>>>> --== Host 1 status ==--
>>>>>>
>>>>>> conf_on_shared_storage             : True
>>>>>> Status up-to-date                  : True
>>>>>> Hostname                           : ovirt2
>>>>>> Host ID                            : 1
>>>>>> Engine status                      : {"reason": "vm not running on
>>>>>> this host", "health": "bad", "vm": "down", "detail": "unknown"}
>>>>>> Score                              : 0
>>>>>> stopped                            : False
>>>>>> Local maintenance                  : False
>>>>>> crc32                              : 78eabdb6
>>>>>> local_conf_timestamp               : 8403
>>>>>> Host timestamp                     : 8402
>>>>>> Extra metadata (valid at timestamp):
>>>>>>         metadata_parse_version=1
>>>>>>         metadata_feature_version=1
>>>>>>         timestamp=8402 (Fri Jan 12 19:04:47 2018)
>>>>>>         host-id=1
>>>>>>         score=0
>>>>>>         vm_conf_refresh_time=8403 (Fri Jan 12 19:04:47 2018)
>>>>>>         conf_on_shared_storage=True
>>>>>>         maintenance=False
>>>>>>         state=EngineUnexpectedlyDown
>>>>>>         stopped=False
>>>>>>         timeout=Thu Jan  1 05:24:43 1970
>>>>>>
>>>>>>
>>>>>> --== Host 2 status ==--
>>>>>>
>>>>>> conf_on_shared_storage             : True
>>>>>> Status up-to-date                  : False
>>>>>> Hostname                           : ovirt1.telia.ru
>>>>>> Host ID                            : 2
>>>>>> Engine status                      : unknown stale-data
>>>>>> Score                              : 0
>>>>>> stopped                            : True
>>>>>> Local maintenance                  : False
>>>>>> crc32                              : c7037c03
>>>>>> local_conf_timestamp               : 7530
>>>>>> Host timestamp                     : 7530
>>>>>> Extra metadata (valid at timestamp):
>>>>>>         metadata_parse_version=1
>>>>>>         metadata_feature_version=1
>>>>>>         timestamp=7530 (Fri Jan 12 16:10:12 2018)
>>>>>>         host-id=2
>>>>>>         score=0
>>>>>>         vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)
>>>>>>         conf_on_shared_storage=True
>>>>>>         maintenance=False
>>>>>>         state=AgentStopped
>>>>>>         stopped=True
>>>>>>
>>>>>>
>>>>>> The web GUI shows that the engine is running on host ovirt1.
>>>>>> Gluster looks fine:
>>>>>> [root@ovirt1 ~]# gluster volume status engine
>>>>>> Status of volume: engine
>>>>>> Gluster process                             TCP Port  RDMA Port  Online  Pid
>>>>>> ------------------------------------------------------------------------------
>>>>>> Brick ovirt1.telia.ru:/oVirt/engine         49169     0          Y       3244
>>>>>> Brick ovirt2.telia.ru:/oVirt/engine         49179     0          Y       20372
>>>>>> Brick ovirt3.telia.ru:/oVirt/engine         49206     0          Y       16609
>>>>>> Self-heal Daemon on localhost               N/A       N/A        Y       117868
>>>>>> Self-heal Daemon on ovirt2.telia.ru         N/A       N/A        Y       20521
>>>>>> Self-heal Daemon on ovirt3                  N/A       N/A        Y       25093
>>>>>>
>>>>>> Task Status of Volume engine
>>>>>> ------------------------------------------------------------------------------
>>>>>> There are no active volume tasks
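"Looks fine" can be checked mechanically: every Brick and Self-heal line should report Online = Y. A sketch parser over `gluster volume status` text (column positions assumed from the output above; `gluster volume heal ... info` is the standard follow-up command):

```shell
# all_online STATUS_TEXT -> OK if every brick/daemon reports Online=Y
all_online() {
    printf '%s\n' "$1" | awk '
        /^(Brick|Self-heal)/ { if ($(NF-1) != "Y") bad = 1 }
        END { print (bad ? "DEGRADED" : "OK") }'
}

# Sample rows matching the status output above:
sample='Brick ovirt1.telia.ru:/oVirt/engine         49169     0          Y       3244
Self-heal Daemon on localhost               N/A       N/A        Y       117868'
all_online "$sample"    # prints: OK

# Live usage, plus a pending-heal check:
#   all_online "$(gluster volume status engine)"
#   gluster volume heal engine info
```

If this prints OK and heal info shows no pending entries, the Gluster layer can reasonably be ruled out and attention shifts back to the HA metadata.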
>>>>>>
>>>>>> How can I resolve this issue?
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Users mailing list
>>>>>> Users at ovirt.org
>>>>>> http://lists.ovirt.org/mailman/listinfo/users
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
>
>
>

