[ovirt-users] hosted-engine unknow stale-data

Artem Tambovskiy artem.tambovskiy at gmail.com
Tue Jan 16 14:00:41 UTC 2018


Hi Martin,

Thanks for feedback.

All hosts and hosted-engine running 4.1.8 release.
The strange thing : I can see that host ID is set to 1 on both hosts at
/etc/ovirt-hosted-engine/hosted-engine.conf file.
I have no idea how this happen, the only thing I have changed recently is
that I have changed mnt_options in order to add backup-volfile-servers
by using hosted-engine --set-shared-config command

Both agent and broker are running on second host

[root at ovirt2 ovirt-hosted-engine-ha]# ps -ef | grep ovirt-ha-
vdsm      42331      1 26 14:40 ?        00:31:35 /usr/bin/python
/usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker --no-daemon
vdsm      42332      1  0 14:40 ?        00:00:16 /usr/bin/python
/usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent --no-daemon

but I saw some tracebacks during the broker start

[root at ovirt2 ovirt-hosted-engine-ha]# systemctl status ovirt-ha-broker -l
● ovirt-ha-broker.service - oVirt Hosted Engine High Availability
Communications Broker
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service;
enabled; vendor preset: disabled)
   Active: active (running) since Tue 2018-01-16 14:40:15 MSK; 1h 58min ago
 Main PID: 42331 (ovirt-ha-broker)
   CGroup: /system.slice/ovirt-ha-broker.service
           └─42331 /usr/bin/python
/usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker --no-daemon

Jan 16 14:40:15 ovirt2.telia.ru systemd[1]: Started oVirt Hosted Engine
High Availability Communications Broker.
Jan 16 14:40:15 ovirt2.telia.ru systemd[1]: Starting oVirt Hosted Engine
High Availability Communications Broker...
Jan 16 14:40:16 ovirt2.telia.ru ovirt-ha-broker[42331]: ovirt-ha-broker
ovirt_hosted_engine_ha.broker.listener.ConnectionHandler ERROR Error
handling request, data: 'set-storage-domain FilesystemBackend
dom_type=glusterfs sd_uuid=4a7f8717-9bb0-4d80-8016-498fa4b88162'
                                                        Traceback (most
recent call last):
                                                          File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py",
line 166, in handle
                                                            data)
                                                          File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py",
line 299, in _dispatch

.set_storage_domain(client, sd_type, **options)
                                                          File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py",
line 66, in set_storage_domain

self._backends[client].connect()
                                                          File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py",
line 462, in connect
                                                            self._dom_type)
                                                          File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py",
line 107, in get_domain_path
                                                            " in
{1}".format(sd_uuid, parent))

BackendFailureException: path to storage domain
4a7f8717-9bb0-4d80-8016-498fa4b88162 not found in
/rhev/data-center/mnt/glusterSD



I have tried to issue hosted-engine --connect-storage on second host
followed by agent & broker restart
But there is no any visible improvements.

Regards,
Artem







On Tue, Jan 16, 2018 at 4:18 PM, Martin Sivak <msivak at redhat.com> wrote:

> Hi everybody,
>
> there are couple of things to check here.
>
> - what version of hosted engine agent is this? The logs look like
> coming from 4.1
> - what version of engine is used?
> - check the host ID in /etc/ovirt-hosted-engine/hosted-engine.conf on
> both hosts, the numbers must be different
> - it looks like the agent or broker on host 2 is not active (or there
> would be a report)
> - the second host does not see data from the first host (unknown
> stale-data), wait for a minute and check again, then check the storage
> connection
>
> And then the general troubleshooting:
>
> - put hosted engine in global maintenance mode (and check that it is
> visible from the other host using he --vm-status)
> - mount storage domain (hosted-engine --connect-storage)
> - check sanlock client status to see if proper lockspaces are present
>
> Best regards
>
> Martin Sivak
>
> On Tue, Jan 16, 2018 at 1:16 PM, Derek Atkins <derek at ihtfp.com> wrote:
> > Why are both hosts reporting as ovirt 1?
> > Look at the hostname fields to see what mean.
> >
> > -derek
> > Sent using my mobile device. Please excuse any typos.
> >
> > On January 16, 2018 7:11:09 AM Artem Tambovskiy <
> artem.tambovskiy at gmail.com>
> > wrote:
> >>
> >> Hello,
> >>
> >> Yes, I followed exactly the same procedure while reinstalling the hosts
> >> (the only difference that I have SSH key configured instead of the
> >> password).
> >>
> >> Just reinstalled the second host one more time, after 20 min the host
> >> still haven't reached active score of 3400 (Hosted Engine HA:Not
> Active) and
> >> I still don't see crown icon for this host.
> >>
> >> hosted-engine --vm-status  from ovirt1 host
> >>
> >> [root at ovirt1 ~]# hosted-engine --vm-status
> >>
> >>
> >> --== Host 1 status ==--
> >>
> >> conf_on_shared_storage             : True
> >> Status up-to-date                  : True
> >> Hostname                           : ovirt1.telia.ru
> >> Host ID                            : 1
> >> Engine status                      : {"health": "good", "vm": "up",
> >> "detail": "up"}
> >> Score                              : 3400
> >> stopped                            : False
> >> Local maintenance                  : False
> >> crc32                              : 3f94156a
> >> local_conf_timestamp               : 349144
> >> Host timestamp                     : 349144
> >> Extra metadata (valid at timestamp):
> >>         metadata_parse_version=1
> >>         metadata_feature_version=1
> >>         timestamp=349144 (Tue Jan 16 15:03:45 2018)
> >>         host-id=1
> >>         score=3400
> >>         vm_conf_refresh_time=349144 (Tue Jan 16 15:03:45 2018)
> >>         conf_on_shared_storage=True
> >>         maintenance=False
> >>         state=EngineUp
> >>         stopped=False
> >>
> >>
> >> --== Host 2 status ==--
> >>
> >> conf_on_shared_storage             : True
> >> Status up-to-date                  : False
> >> Hostname                           : ovirt1.telia.ru
> >> Host ID                            : 2
> >> Engine status                      : unknown stale-data
> >> Score                              : 0
> >> stopped                            : True
> >> Local maintenance                  : False
> >> crc32                              : c7037c03
> >> local_conf_timestamp               : 7530
> >> Host timestamp                     : 7530
> >> Extra metadata (valid at timestamp):
> >>         metadata_parse_version=1
> >>         metadata_feature_version=1
> >>         timestamp=7530 (Fri Jan 12 16:10:12 2018)
> >>         host-id=2
> >>         score=0
> >>         vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)
> >>         conf_on_shared_storage=True
> >>         maintenance=False
> >>         state=AgentStopped
> >>         stopped=True
> >>
> >>
> >> hosted-engine --vm-status output from ovirt2 host
> >>
> >> [root at ovirt2 ovirt-hosted-engine-ha]# hosted-engine --vm-status
> >>
> >>
> >> --== Host 1 status ==--
> >>
> >> conf_on_shared_storage             : True
> >> Status up-to-date                  : False
> >> Hostname                           : ovirt1.telia.ru
> >> Host ID                            : 1
> >> Engine status                      : unknown stale-data
> >> Score                              : 3400
> >> stopped                            : False
> >> Local maintenance                  : False
> >> crc32                              : 6d3606f1
> >> local_conf_timestamp               : 349264
> >> Host timestamp                     : 349264
> >> Extra metadata (valid at timestamp):
> >>         metadata_parse_version=1
> >>         metadata_feature_version=1
> >>         timestamp=349264 (Tue Jan 16 15:05:45 2018)
> >>         host-id=1
> >>         score=3400
> >>         vm_conf_refresh_time=349264 (Tue Jan 16 15:05:45 2018)
> >>         conf_on_shared_storage=True
> >>         maintenance=False
> >>         state=EngineUp
> >>         stopped=False
> >>
> >>
> >> --== Host 2 status ==--
> >>
> >> conf_on_shared_storage             : True
> >> Status up-to-date                  : False
> >> Hostname                           : ovirt1.telia.ru
> >> Host ID                            : 2
> >> Engine status                      : unknown stale-data
> >> Score                              : 0
> >> stopped                            : True
> >> Local maintenance                  : False
> >> crc32                              : c7037c03
> >> local_conf_timestamp               : 7530
> >> Host timestamp                     : 7530
> >> Extra metadata (valid at timestamp):
> >>         metadata_parse_version=1
> >>         metadata_feature_version=1
> >>         timestamp=7530 (Fri Jan 12 16:10:12 2018)
> >>         host-id=2
> >>         score=0
> >>         vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)
> >>         conf_on_shared_storage=True
> >>         maintenance=False
> >>         state=AgentStopped
> >>         stopped=True
> >>
> >>
> >> Also I saw some log messages in webGUI about time drift like
> >>
> >> "Host ovirt2.telia.ru has time-drift of 5305 seconds while maximum
> >> configured value is 300 seconds." that is a bit weird as haven't
> touched any
> >> time settings since I installed the cluster.
> >> both host have the same time and timezone (MSK) but hosted engine lives
> in
> >> UTC timezone. Is it mandatory to have everything in sync and in the same
> >> timezone?
> >>
> >> Regards,
> >> Artem
> >>
> >>
> >>
> >>
> >>
> >>
> >> On Tue, Jan 16, 2018 at 2:20 PM, Kasturi Narra <knarra at redhat.com>
> wrote:
> >>>
> >>> Hello,
> >>>
> >>>      I now see that your hosted engine is up and running. Can you let
> me
> >>> know how did you try reinstalling the host? Below is the procedure
> which is
> >>> used and hope you did not miss any step while reinstalling. If no, can
> you
> >>> try reinstalling again and see if that works ?
> >>>
> >>> 1) Move the host to maintenance
> >>> 2) click on reinstall
> >>> 3) provide the password
> >>> 4) uncheck 'automatically configure host firewall'
> >>> 5) click on 'Deploy' tab
> >>> 6) click Hosted Engine deployment as 'Deploy'
> >>>
> >>> And once the host installation is done, wait till the active score of
> the
> >>> host shows 3400 in the general tab then check hosted-engine
> --vm-status.
> >>>
> >>> Thanks
> >>> kasturi
> >>>
> >>> On Mon, Jan 15, 2018 at 4:57 PM, Artem Tambovskiy
> >>> <artem.tambovskiy at gmail.com> wrote:
> >>>>
> >>>> Hello,
> >>>>
> >>>> I have uploaded 2 archives with all relevant logs to shared hosting
> >>>> files from host 1  (which is currently running all VM's including
> >>>> hosted_engine)  -  https://yadi.sk/d/PttRoYV63RTvhK
> >>>> files from second host - https://yadi.sk/d/UBducEsV3RTvhc
> >>>>
> >>>> I have tried to restart both ovirt-ha-agent and ovirt-ha-broker but it
> >>>> gives no effect. I have also tried to shutdown hosted_engine VM, stop
> >>>> ovirt-ha-agent and ovirt-ha-broker  services disconnect storage and
> connect
> >>>> it again  - no effect as well.
> >>>> Also I tried to reinstall second host from WebGUI - this lead to the
> >>>> interesting situation - now  hosted-engine --vm-status  shows that
> both
> >>>> hosts have the same address.
> >>>>
> >>>> [root at ovirt1 ~]# hosted-engine --vm-status
> >>>>
> >>>> --== Host 1 status ==--
> >>>>
> >>>> conf_on_shared_storage             : True
> >>>> Status up-to-date                  : True
> >>>> Hostname                           : ovirt1.telia.ru
> >>>> Host ID                            : 1
> >>>> Engine status                      : {"health": "good", "vm": "up",
> >>>> "detail": "up"}
> >>>> Score                              : 3400
> >>>> stopped                            : False
> >>>> Local maintenance                  : False
> >>>> crc32                              : a7758085
> >>>> local_conf_timestamp               : 259327
> >>>> Host timestamp                     : 259327
> >>>> Extra metadata (valid at timestamp):
> >>>>         metadata_parse_version=1
> >>>>         metadata_feature_version=1
> >>>>         timestamp=259327 (Mon Jan 15 14:06:48 2018)
> >>>>         host-id=1
> >>>>         score=3400
> >>>>         vm_conf_refresh_time=259327 (Mon Jan 15 14:06:48 2018)
> >>>>         conf_on_shared_storage=True
> >>>>         maintenance=False
> >>>>         state=EngineUp
> >>>>         stopped=False
> >>>>
> >>>>
> >>>> --== Host 2 status ==--
> >>>>
> >>>> conf_on_shared_storage             : True
> >>>> Status up-to-date                  : False
> >>>> Hostname                           : ovirt1.telia.ru
> >>>> Host ID                            : 2
> >>>> Engine status                      : unknown stale-data
> >>>> Score                              : 0
> >>>> stopped                            : True
> >>>> Local maintenance                  : False
> >>>> crc32                              : c7037c03
> >>>> local_conf_timestamp               : 7530
> >>>> Host timestamp                     : 7530
> >>>> Extra metadata (valid at timestamp):
> >>>>         metadata_parse_version=1
> >>>>         metadata_feature_version=1
> >>>>         timestamp=7530 (Fri Jan 12 16:10:12 2018)
> >>>>         host-id=2
> >>>>         score=0
> >>>>         vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)
> >>>>         conf_on_shared_storage=True
> >>>>         maintenance=False
> >>>>         state=AgentStopped
> >>>>         stopped=True
> >>>>
> >>>> Gluster seems working fine. all gluster nodes showing connected state.
> >>>>
> >>>> Any advises on how to resolve this situation are highly appreciated!
> >>>>
> >>>> Regards,
> >>>> Artem
> >>>>
> >>>>
> >>>> On Mon, Jan 15, 2018 at 11:45 AM, Kasturi Narra <knarra at redhat.com>
> >>>> wrote:
> >>>>>
> >>>>> Hello Artem,
> >>>>>
> >>>>>         Can you check if glusterd service is running on host1 and all
> >>>>> the peers are in connected state ? If yes, can you restart
> ovirt-ha-agent
> >>>>> and broker services and check if things are working fine ?
> >>>>>
> >>>>> Thanks
> >>>>> kasturi
> >>>>>
> >>>>> On Sat, Jan 13, 2018 at 12:33 AM, Artem Tambovskiy
> >>>>> <artem.tambovskiy at gmail.com> wrote:
> >>>>>>
> >>>>>> Explored logs on both hosts.
> >>>>>> broker.log shows no errors.
> >>>>>>
> >>>>>> agent.log looking not good:
> >>>>>>
> >>>>>> on host1 (which running hosted engine) :
> >>>>>>
> >>>>>> MainThread::ERROR::2018-01-12
> >>>>>> 21:51:03,883::agent::205::ovirt_hosted_engine_ha.agent.
> agent.Agent::(_run_agent)
> >>>>>> Traceback (most recent call last):
> >>>>>>   File
> >>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_
> ha/agent/agent.py",
> >>>>>> line 191, in _run_agent
> >>>>>>     return action(he)
> >>>>>>   File
> >>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_
> ha/agent/agent.py",
> >>>>>> line 64, in action_proper
> >>>>>>     return he.start_monitoring()
> >>>>>>   File
> >>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_
> ha/agent/hosted_engine.py",
> >>>>>> line 411, in start_monitoring
> >>>>>>     self._initialize_sanlock()
> >>>>>>   File
> >>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_
> ha/agent/hosted_engine.py",
> >>>>>> line 749, in _initialize_sanlock
> >>>>>>     "Failed to initialize sanlock, the number of errors has"
> >>>>>> SanlockInitializationError: Failed to initialize sanlock, the number
> >>>>>> of errors has exceeded the limit
> >>>>>>
> >>>>>> MainThread::ERROR::2018-01-12
> >>>>>> 21:51:03,884::agent::206::ovirt_hosted_engine_ha.agent.
> agent.Agent::(_run_agent)
> >>>>>> Trying to restart agent
> >>>>>> MainThread::WARNING::2018-01-12
> >>>>>> 21:51:08,889::agent::209::ovirt_hosted_engine_ha.agent.
> agent.Agent::(_run_agent)
> >>>>>> Restarting agent, attempt '1'
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 21:51:08,919::hosted_engine::242::ovirt_hosted_engine_ha.
> agenthosted_engine.HostedEngine::(_get_hostname)
> >>>>>> Found certificate common name: ovirt1.telia.ru
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 21:51:08,921::hosted_engine::604::ovirt_hosted_engine_ha.
> agent.hosted_engine.HostedEngine::(_initialize_vdsm)
> >>>>>> Initializing VDSM
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 21:51:11,398::hosted_engine::630::ovirt_hosted_engine_ha.
> agent.hosted_engine.HostedEngine::(_initialize_storage_images)
> >>>>>> Connecting the storage
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 21:51:11,399::storage_server::220::ovirt_hosted_engine_ha.
> lib.storage_server.StorageServer::(validate_storage_server)
> >>>>>> Validating storage server
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 21:51:13,725::storage_server::239::ovirt_hosted_engine_ha.
> libstorage_server.StorageServer::(connect_storage_server)
> >>>>>> Connecting storage server
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 21:51:18,390::storage_server::246::ovirt_hosted_engine_ha.
> lib.storage_server.StorageServer::(connect_storage_server)
> >>>>>> Connecting storage server
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 21:51:18,423::storage_server::253::ovirt_hosted_engine_ha.
> lib.storage_server.StorageServer::(connect_storage_server)
> >>>>>> Refreshing the storage domain
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 21:51:18,689::hosted_engine::663::ovirt_hosted_engine_ha.
> agent.hosted_engine.HostedEngine::(_initialize_storage_images)
> >>>>>> Preparing images
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 21:51:18,690::image::126::ovirt_hosted_engine_ha.lib.
> image.Image::(prepare_images)
> >>>>>> Preparing images
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 21:51:21,895::hosted_engine::666::ovirt_hosted_engine_ha.
> agent.hosted_engine.HostedEngine::(_initialize_storage_images)
> >>>>>> Refreshing vm.conf
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 21:51:21,895::config::493::ovirt_hosted_engine_ha.agent.
> hosted_engine.HostedEngine.config::(refresh_vm_conf)
> >>>>>> Reloading vm.conf from the shared storage domain
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 21:51:21,896::config::416::ovirt_hosted_engine_ha.agent.
> hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
> >>>>>> Trying to get a fresher copy of vm configuration from the OVF_STORE
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 21:51:21,896::ovf_store::132::ovirt_hosted_engine_ha.lib.
> ovf.ovf_store.OVFStore::(getEngineVMOVF)
> >>>>>> Extracting Engine VM OVF from the OVF_STORE
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 21:51:21,897::ovf_store::134::ovirt_hosted_engine_ha.lib.
> ovf.ovf_store.OVFStore::(getEngineVMOVF)
> >>>>>> OVF_STORE volume path:
> >>>>>> /var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-
> 498fa4b88162/5cabd8e1-5f4b-469e-becc-227469e03f5c/8048cbd7-77e2-4805-9af4-
> d109fa36dfcf
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 21:51:21,915::config::435::ovirt_hosted_engine_ha.agent.
> hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
> >>>>>> Found an OVF for HE VM, trying to convert
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 21:51:21,918::config::440::ovirt_hosted_engine_ha.agent.
> hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
> >>>>>> Got vm.conf from OVF_STORE
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 21:51:21,919::hosted_engine::509::ovirt_hosted_engine_ha.
> agent.hosted_engine.HostedEngine::(_initialize_broker)
> >>>>>> Initializing ha-broker connection
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 21:51:21,919::brokerlink::130::ovirt_hosted_engine_ha.lib.
> brokerlink.BrokerLink::(start_monitor)
> >>>>>> Starting monitor ping, options {'addr': '80.239.162.97'}
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 21:51:21,922::brokerlink::141::ovirt_hosted_engine_ha.lib.
> brokerlink.BrokerLink::(start_monitor)
> >>>>>> Success, id 140547104457680
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 21:51:21,922::brokerlink::130::ovirt_hosted_engine_ha.lib.
> brokerlink.BrokerLink::(start_monitor)
> >>>>>> Starting monitor mgmt-bridge, options {'use_ssl': 'true',
> 'bridge_name':
> >>>>>> 'ovirtmgmt', 'address': '0'}
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 21:51:21,936::brokerlink::141::ovirt_hosted_engine_ha.lib.
> brokerlinkBrokerLink::(start_monitor)
> >>>>>> Success, id 140547104458064
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 21:51:21,936::brokerlink::130::ovirt_hosted_engine_ha.lib.
> brokerlink.BrokerLink::(start_monitor)
> >>>>>> Starting monitor mem-free, options {'use_ssl': 'true', 'address':
> '0'}
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 21:51:21,938::brokerlink::141::ovirt_hosted_engine_ha.lib.
> brokerlink.BrokerLink::(start_monitor)
> >>>>>> Success, id 140547104458448
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 21:51:21,939::brokerlink::130::ovirt_hosted_engine_ha.lib.
> brokerlinkBrokerLink::(start_monitor)
> >>>>>> Starting monitor cpu-load-no-engine, options {'use_ssl': 'true',
> 'vm_uuid':
> >>>>>> 'b366e466-b0ea-4a09-866b-d0248d7523a6', 'address': '0'}
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 21:51:21,940::brokerlink::141::ovirt_hosted_engine_ha.lib.
> brokerlink.BrokerLink::(start_monitor)
> >>>>>> Success, id 140547104457552
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 21:51:21,941::brokerlink::130::ovirt_hosted_engine_ha.lib.
> brokerlink.BrokerLink::(start_monitor)
> >>>>>> Starting monitor engine-health, options {'use_ssl': 'true',
> 'vm_uuid':
> >>>>>> 'b366e466-b0ea-4a09-866b-d0248d7523a6', 'address': '0'}
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 21:51:21,942::brokerlink::141::ovirt_hosted_engine_ha.lib.
> brokerlink.BrokerLink::(start_monitor)
> >>>>>> Success, id 140547104459792
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 21:51:26,951::brokerlink::179::ovirt_hosted_engine_ha.lib.
> brokerlink.BrokerLink::(set_storage_domain)
> >>>>>> Success, id 140546772847056
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 21:51:26,952::hosted_engine::601::ovirt_hosted_engine_ha.
> agent.hosted_engine.HostedEngine::(_initialize_broker)
> >>>>>> Broker initialized, all submonitors started
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 21:51:27,049::hosted_engine::704::ovirt_hosted_engine_ha.
> agent.hosted_engine.HostedEngine::(_initialize_sanlock)
> >>>>>> Ensuring lease for lockspace hosted-engine, host id 1 is acquired
> (file:
> >>>>>> /var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-
> 498fa4b88162/093faa75-5e33-4559-84fa-1f1f8d48153b/911c7637-b49d-463e-b186-
> 23b404e50769)
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 21:53:48,067::hosted_engine::745::ovirt_hosted_engine_ha.
> agent.hosted_engine.HostedEngine::(_initialize_sanlock)
> >>>>>> Failed to acquire the lock. Waiting '5's before the next attempt
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 21:56:14,088::hosted_engine::745::ovirt_hosted_engine_ha.
> agent.hosted_engine.HostedEngine::(_initialize_sanlock)
> >>>>>> Failed to acquire the lock. Waiting '5's before the next attempt
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 21:58:40,111::hosted_engine::745::ovirt_hosted_engine_ha.
> agent.hosted_engine.HostedEngine::(_initialize_sanlock)
> >>>>>> Failed to acquire the lock. Waiting '5's before the next attempt
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 22:01:06,133::hosted_engine::745::ovirt_hosted_engine_ha.
> agent.hosted_engine.HostedEngine::(_initialize_sanlock)
> >>>>>> Failed to acquire the lock. Waiting '5's before the next attempt
> >>>>>>
> >>>>>>
> >>>>>> agent.log from second host
> >>>>>>
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 22:01:37,241::hosted_engine::630::ovirt_hosted_engine_ha.
> agent.hosted_engine.HostedEngine::(_initialize_storage_images)
> >>>>>> Connecting the storage
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 22:01:37,242::storage_server::220::ovirt_hosted_engine_ha.
> lib.storage_server.StorageServer::(validate_storage_server)
> >>>>>> Validating storage server
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 22:01:39,540::hosted_engine::639::ovirt_hosted_engine_ha.
> agent.hosted_engine.HostedEngine::(_initialize_storage_images)
> >>>>>> Storage domain reported as valid and reconnect is not forced.
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 22:01:41,939::hosted_engine::453::ovirt_hosted_engine_ha.
> agent.hosted_engine.HostedEngine::(start_monitoring)
> >>>>>> Current state EngineUnexpectedlyDown (score: 0)
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 22:01:52,150::config::493::ovirt_hosted_engine_ha.agent.
> hosted_engine.HostedEngine.config::(refresh_vm_conf)
> >>>>>> Reloading vm.conf from the shared storage domain
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 22:01:52,150::config::416::ovirt_hosted_engine_ha.agent.
> hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
> >>>>>> Trying to get a fresher copy of vm configuration from the OVF_STORE
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 22:01:52,151::ovf_store::132::ovirt_hosted_engine_ha.lib.
> ovf.ovf_store.OVFStore::(getEngineVMOVF)
> >>>>>> Extracting Engine VM OVF from the OVF_STORE
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 22:01:52,153::ovf_store::134::ovirt_hosted_engine_ha.lib.
> ovf.ovf_store.OVFStore::(getEngineVMOVF)
> >>>>>> OVF_STORE volume path:
> >>>>>> /var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-
> 498fa4b88162/5cabd8e1-5f4b-469e-becc-227469e03f5c/8048cbd7-77e2-4805-9af4-
> d109fa36dfcf
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 22:01:52,174::config::435::ovirt_hosted_engine_ha.agent.
> hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
> >>>>>> Found an OVF for HE VM, trying to convert
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 22:01:52,179::config::440::ovirt_hosted_engine_ha.agent.
> hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
> >>>>>> Got vm.conf from OVF_STORE
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 22:01:52,189::hosted_engine::604::ovirt_hosted_engine_ha.
> agent.hosted_engine.HostedEngine::(_initialize_vdsm)
> >>>>>> Initializing VDSM
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 22:01:54,586::hosted_engine::630::ovirt_hosted_engine_ha.
> agent.hosted_engine.HostedEngine::(_initialize_storage_images)
> >>>>>> Connecting the storage
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 22:01:54,587::storage_server::220::ovirt_hosted_engine_ha.
> lib.storage_server.StorageServer::(validate_storage_server)
> >>>>>> Validating storage server
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 22:01:56,903::hosted_engine::639::ovirt_hosted_engine_ha.
> agent.hosted_engine.HostedEngine::(_initialize_storage_images)
> >>>>>> Storage domain reported as valid and reconnect is not forced.
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 22:01:59,299::states::682::ovirt_hosted_engine_ha.agent.
> hosted_engine.HostedEngine::(score)
> >>>>>> Score is 0 due to unexpected vm shutdown at Fri Jan 12 21:57:48 2018
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 22:01:59,299::hosted_engine::453::ovirt_hosted_engine_ha.
> agent.hosted_engine.HostedEngine::(start_monitoring)
> >>>>>> Current state EngineUnexpectedlyDown (score: 0)
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 22:02:09,659::config::493::ovirt_hosted_engine_ha.agent.
> hosted_engine.HostedEngine.config::(refresh_vm_conf)
> >>>>>> Reloading vm.conf from the shared storage domain
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 22:02:09,659::config::416::ovirt_hosted_engine_ha.agent.
> hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
> >>>>>> Trying to get a fresher copy of vm configuration from the OVF_STORE
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 22:02:09,660::ovf_store::132::ovirt_hosted_engine_ha.lib.
> ovf.ovf_store.OVFStore::(getEngineVMOVF)
> >>>>>> Extracting Engine VM OVF from the OVF_STORE
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 22:02:09,663::ovf_store::134::ovirt_hosted_engine_ha.lib.
> ovf.ovf_store.OVFStore::(getEngineVMOVF)
> >>>>>> OVF_STORE volume path:
> >>>>>> /var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-
> 498fa4b88162/5cabd8e1-5f4b-469e-becc-227469e03f5c/8048cbd7-77e2-4805-9af4-
> d109fa36dfcf
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 22:02:09,683::config::435::ovirt_hosted_engine_ha.agent.
> hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
> >>>>>> Found an OVF for HE VM, trying to convert
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 22:02:09,688::config::440::ovirt_hosted_engine_ha.agent.
> hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
> >>>>>> Got vm.conf from OVF_STORE
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 22:02:09,698::hosted_engine::604::ovirt_hosted_engine_ha.
> agent.hosted_engine.HostedEngine::(_initialize_vdsm)
> >>>>>> Initializing VDSM
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 22:02:12,112::hosted_engine::630::ovirt_hosted_engine_ha.
> agent.hosted_engine.HostedEngine::(_initialize_storage_images)
> >>>>>> Connecting the storage
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 22:02:12,113::storage_server::220::ovirt_hosted_engine_ha.
> lib.storage_server.StorageServer::(validate_storage_server)
> >>>>>> Validating storage server
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 22:02:14,444::hosted_engine::639::ovirt_hosted_engine_ha.
> agent.hosted_engine.HostedEngine::(_initialize_storage_images)
> >>>>>> Storage domain reported as valid and reconnect is not forced.
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 22:02:16,859::states::682::ovirt_hosted_engine_ha.agent.
> hosted_engine.HostedEngine::(score)
> >>>>>> Score is 0 due to unexpected vm shutdown at Fri Jan 12 21:57:47 2018
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 22:02:16,859::hosted_engine::453::ovirt_hosted_engine_ha.
> agent.hosted_engine.HostedEngine::(start_monitoring)
> >>>>>> Current state EngineUnexpectedlyDown (score: 0)
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 22:02:27,100::config::493::ovirt_hosted_engine_ha.agent.
> hosted_engine.HostedEngine.config::(refresh_vm_conf)
> >>>>>> Reloading vm.conf from the shared storage domain
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 22:02:27,100::config::416::ovirt_hosted_engine_ha.agent.
> hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
> >>>>>> Trying to get a fresher copy of vm configuration from the OVF_STORE
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 22:02:27,101::ovf_store::132::ovirt_hosted_engine_ha.lib.
> ovf.ovf_store.OVFStore::(getEngineVMOVF)
> >>>>>> Extracting Engine VM OVF from the OVF_STORE
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 22:02:27,103::ovf_store::134::ovirt_hosted_engine_ha.lib.
> ovf.ovf_store.OVFStore::(getEngineVMOVF)
> >>>>>> OVF_STORE volume path:
> >>>>>> /var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-
> 498fa4b88162/5cabd8e1-5f4b-469e-becc-227469e03f5c/8048cbd7-77e2-4805-9af4-
> d109fa36dfcf
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 22:02:27,125::config::435::ovirt_hosted_engine_ha.agent.
> hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
> >>>>>> Found an OVF for HE VM, trying to convert
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 22:02:27,129::config::440::ovirt_hosted_engine_ha.agent.
> hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
> >>>>>> Got vm.conf from OVF_STORE
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 22:02:27,130::states::667::ovirt_hosted_engine_ha.agent.
> hosted_engine.HostedEngine::(consume)
> >>>>>> Engine down, local host does not have best score
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 22:02:27,139::hosted_engine::604::ovirt_hosted_engine_ha.
> agent.hosted_engineHostedEngine::(_initialize_vdsm)
> >>>>>> Initializing VDSM
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 22:02:29,584::hosted_engine::630::ovirt_hosted_engine_ha.
> agent.hosted_engine.HostedEngine::(_initialize_storage_images)
> >>>>>> Connecting the storage
> >>>>>> MainThread::INFO::2018-01-12
> >>>>>> 22:02:29,586::storage_server::220::ovirt_hosted_engine_ha.
> lib.storage_server.StorageServer::(validate_storage_server)
> >>>>>> Validating storage server
> >>>>>>
> >>>>>>
> >>>>>> Any suggestions how to resolve this .
> >>>>>>
> >>>>>> regards,
> >>>>>> Artem
> >>>>>>
> >>>>>>
> >>>>>> On Fri, Jan 12, 2018 at 7:08 PM, Artem Tambovskiy
> >>>>>> <artem.tambovskiy at gmail.com> wrote:
> >>>>>>>
> >>>>>>> Trying to fix one thing I broke another :(
> >>>>>>>
> >>>>>>> I fixed mnt_options for hosted engine storage domain and installed
> >>>>>>> latest security patches to my hosts and hosted engine. All VM's up
> and
> >>>>>>> running, but  hosted_engine --vm-status reports about issues:
> >>>>>>>
> >>>>>>> [root at ovirt1 ~]# hosted-engine --vm-status
> >>>>>>>
> >>>>>>>
> >>>>>>> --== Host 1 status ==--
> >>>>>>>
> >>>>>>> conf_on_shared_storage             : True
> >>>>>>> Status up-to-date                  : False
> >>>>>>> Hostname                           : ovirt2
> >>>>>>> Host ID                            : 1
> >>>>>>> Engine status                      : unknown stale-data
> >>>>>>> Score                              : 0
> >>>>>>> stopped                            : False
> >>>>>>> Local maintenance                  : False
> >>>>>>> crc32                              : 193164b8
> >>>>>>> local_conf_timestamp               : 8350
> >>>>>>> Host timestamp                     : 8350
> >>>>>>> Extra metadata (valid at timestamp):
> >>>>>>>         metadata_parse_version=1
> >>>>>>>         metadata_feature_version=1
> >>>>>>>         timestamp=8350 (Fri Jan 12 19:03:54 2018)
> >>>>>>>         host-id=1
> >>>>>>>         score=0
> >>>>>>>         vm_conf_refresh_time=8350 (Fri Jan 12 19:03:54 2018)
> >>>>>>>         conf_on_shared_storage=True
> >>>>>>>         maintenance=False
> >>>>>>>         state=EngineUnexpectedlyDown
> >>>>>>>         stopped=False
> >>>>>>>         timeout=Thu Jan  1 05:24:43 1970
> >>>>>>>
> >>>>>>>
> >>>>>>> --== Host 2 status ==--
> >>>>>>>
> >>>>>>> conf_on_shared_storage             : True
> >>>>>>> Status up-to-date                  : False
> >>>>>>> Hostname                           : ovirt1.telia.ru
> >>>>>>> Host ID                            : 2
> >>>>>>> Engine status                      : unknown stale-data
> >>>>>>> Score                              : 0
> >>>>>>> stopped                            : True
> >>>>>>> Local maintenance                  : False
> >>>>>>> crc32                              : c7037c03
> >>>>>>> local_conf_timestamp               : 7530
> >>>>>>> Host timestamp                     : 7530
> >>>>>>> Extra metadata (valid at timestamp):
> >>>>>>>         metadata_parse_version=1
> >>>>>>>         metadata_feature_version=1
> >>>>>>>         timestamp=7530 (Fri Jan 12 16:10:12 2018)
> >>>>>>>         host-id=2
> >>>>>>>         score=0
> >>>>>>>         vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)
> >>>>>>>         conf_on_shared_storage=True
> >>>>>>>         maintenance=False
> >>>>>>>         state=AgentStopped
> >>>>>>>         stopped=True
> >>>>>>> [root at ovirt1 ~]#
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> from second host situation looks a bit different:
> >>>>>>>
> >>>>>>>
> >>>>>>> [root at ovirt2 ~]# hosted-engine --vm-status
> >>>>>>>
> >>>>>>>
> >>>>>>> --== Host 1 status ==--
> >>>>>>>
> >>>>>>> conf_on_shared_storage             : True
> >>>>>>> Status up-to-date                  : True
> >>>>>>> Hostname                           : ovirt2
> >>>>>>> Host ID                            : 1
> >>>>>>> Engine status                      : {"reason": "vm not running on
> >>>>>>> this host", "health": "bad", "vm": "down", "detail": "unknown"}
> >>>>>>> Score                              : 0
> >>>>>>> stopped                            : False
> >>>>>>> Local maintenance                  : False
> >>>>>>> crc32                              : 78eabdb6
> >>>>>>> local_conf_timestamp               : 8403
> >>>>>>> Host timestamp                     : 8402
> >>>>>>> Extra metadata (valid at timestamp):
> >>>>>>>         metadata_parse_version=1
> >>>>>>>         metadata_feature_version=1
> >>>>>>>         timestamp=8402 (Fri Jan 12 19:04:47 2018)
> >>>>>>>         host-id=1
> >>>>>>>         score=0
> >>>>>>>         vm_conf_refresh_time=8403 (Fri Jan 12 19:04:47 2018)
> >>>>>>>         conf_on_shared_storage=True
> >>>>>>>         maintenance=False
> >>>>>>>         state=EngineUnexpectedlyDown
> >>>>>>>         stopped=False
> >>>>>>>         timeout=Thu Jan  1 05:24:43 1970
> >>>>>>>
> >>>>>>>
> >>>>>>> --== Host 2 status ==--
> >>>>>>>
> >>>>>>> conf_on_shared_storage             : True
> >>>>>>> Status up-to-date                  : False
> >>>>>>> Hostname                           : ovirt1.telia.ru
> >>>>>>> Host ID                            : 2
> >>>>>>> Engine status                      : unknown stale-data
> >>>>>>> Score                              : 0
> >>>>>>> stopped                            : True
> >>>>>>> Local maintenance                  : False
> >>>>>>> crc32                              : c7037c03
> >>>>>>> local_conf_timestamp               : 7530
> >>>>>>> Host timestamp                     : 7530
> >>>>>>> Extra metadata (valid at timestamp):
> >>>>>>>         metadata_parse_version=1
> >>>>>>>         metadata_feature_version=1
> >>>>>>>         timestamp=7530 (Fri Jan 12 16:10:12 2018)
> >>>>>>>         host-id=2
> >>>>>>>         score=0
> >>>>>>>         vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)
> >>>>>>>         conf_on_shared_storage=True
> >>>>>>>         maintenance=False
> >>>>>>>         state=AgentStopped
> >>>>>>>         stopped=True
> >>>>>>>
> >>>>>>>
> >>>>>>> WebGUI shows that engine running on host ovirt1.
> >>>>>>> Gluster looks fine
> >>>>>>> [root at ovirt1 ~]# gluster volume status engine
> >>>>>>> Status of volume: engine
> >>>>>>> Gluster process                             TCP Port  RDMA Port
> >>>>>>> Online  Pid
> >>>>>>>
> >>>>>>> ------------------------------------------------------------
> ------------------
> >>>>>>> Brick ovirt1.teliaru:/oVirt/engine         49169     0          Y
> >>>>>>> 3244
> >>>>>>> Brick ovirt2.telia.ru:/oVirt/engine         49179     0          Y
> >>>>>>> 20372
> >>>>>>> Brick ovirt3.telia.ru:/oVirt/engine         49206     0          Y
> >>>>>>> 16609
> >>>>>>> Self-heal Daemon on localhost               N/A       N/A        Y
> >>>>>>> 117868
> >>>>>>> Self-heal Daemon on ovirt2.telia.ru         N/A       N/A        Y
> >>>>>>> 20521
> >>>>>>> Self-heal Daemon on ovirt3                  N/A       N/A        Y
> >>>>>>> 25093
> >>>>>>>
> >>>>>>> Task Status of Volume engine
> >>>>>>>
> >>>>>>> ------------------------------------------------------------
> ------------------
> >>>>>>> There are no active volume tasks
> >>>>>>>
> >>>>>>> How to resolve this issue?
> >>>>>>>
> >>>>>>>
> >>>>>>> _______________________________________________
> >>>>>>> Users mailing list
> >>>>>>> Users at ovirt.org
> >>>>>>> http://lists.ovirt.org/mailman/listinfo/users
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>> _______________________________________________
> >>>>>> Users mailing list
> >>>>>> Users at ovirt.org
> >>>>>> http://lists.ovirt.org/mailman/listinfo/users
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> >> _______________________________________________
> >> Users mailing list
> >> Users at ovirt.org
> >> http://lists.ovirt.org/mailman/listinfo/users
> >>
> >
> > _______________________________________________
> > Users mailing list
> > Users at ovirt.org
> > http://lists.ovirt.org/mailman/listinfo/users
> >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.ovirt.org/pipermail/users/attachments/20180116/6d75a85c/attachment.html>


More information about the Users mailing list