[ovirt-users] hosted-engine unknown stale-data
Martin Sivak
msivak at redhat.com
Mon Jan 22 08:51:03 UTC 2018
Hi Artem,
Make sure the IDs are different; change them manually if you must!
That is all you need to do to get the agent up, I think. The symlink
issue is probably related to another change we made (it happens when a
new hosted engine node is deployed by the engine), and a simple broker
restart should fix it too.
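
Something like this should do it (a minimal sketch, assuming the host_id
key and service names seen earlier in this thread; run it on the second
host and pick an ID that is not in use):

    grep host_id /etc/ovirt-hosted-engine/hosted-engine.conf
    # if it reports host_id=1 like on the first host, change it:
    sed -i 's/^host_id=1/host_id=2/' /etc/ovirt-hosted-engine/hosted-engine.conf
    systemctl restart ovirt-ha-broker ovirt-ha-agent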
Best regards
Martin Sivak
On Mon, Jan 22, 2018 at 8:03 AM, Artem Tambovskiy
<artem.tambovskiy at gmail.com> wrote:
> Hello Kasturi,
>
> Yes, I set global maintenance mode intentionally,
> I've run out of ideas troubleshooting my cluster, so I decided to undeploy
> the hosted engine from the second host, clean the installation, and add it
> back to the cluster.
> I also cleaned the metadata with hosted-engine --clean-metadata --host-id=2
> --force-clean. But once I added the second host to the cluster again, it
> doesn't show the capability to run the hosted engine, and it doesn't even
> appear in the output of hosted-engine --vm-status:
> [root at ovirt1 ~]# hosted-engine --vm-status
>
>
> --== Host 1 status ==--
>
> conf_on_shared_storage : True
> Status up-to-date : True
> Hostname : ovirt1.telia.ru
> Host ID : 1
> Engine status : {"health": "good", "vm": "up", "detail": "up"}
> Score : 3400
> stopped : False
> Local maintenance : False
> crc32 : a23c7cbd
> local_conf_timestamp : 848931
> Host timestamp : 848930
> Extra metadata (valid at timestamp):
> metadata_parse_version=1
> metadata_feature_version=1
> timestamp=848930 (Mon Jan 22 09:53:29 2018)
> host-id=1
> score=3400
> vm_conf_refresh_time=848931 (Mon Jan 22 09:53:29 2018)
> conf_on_shared_storage=True
> maintenance=False
> state=GlobalMaintenance
> stopped=False
>
> On the redeployed second host I see unknown stale-data again, and the
> second host doesn't show up as hosted-engine capable.
> [root at ovirt2 ~]# hosted-engine --vm-status
>
>
> --== Host 1 status ==--
>
> conf_on_shared_storage : True
> Status up-to-date : False
> Hostname : ovirt1.telia.ru
> Host ID : 1
> Engine status : unknown stale-data
> Score : 0
> stopped : False
> Local maintenance : False
> crc32 : 18765f68
> local_conf_timestamp : 848951
> Host timestamp : 848951
> Extra metadata (valid at timestamp):
> metadata_parse_version=1
> metadata_feature_version=1
> timestamp=848951 (Mon Jan 22 09:53:49 2018)
> host-id=1
> score=0
> vm_conf_refresh_time=848951 (Mon Jan 22 09:53:50 2018)
> conf_on_shared_storage=True
> maintenance=False
> state=ReinitializeFSM
> stopped=False
>
>
> Really strange situation ...
>
> Regards,
> Artem
>
>
>
> On Mon, Jan 22, 2018 at 9:46 AM, Kasturi Narra <knarra at redhat.com> wrote:
>>
>> Hello Artem,
>>
>> Any reason why you chose the hosted-engine UNDEPLOY action for the second
>> host? I see that the cluster is in global maintenance mode; was this
>> intended?
>>
>> The command to clear the entries from hosted-engine --vm-status is
>> "hosted-engine --clean-metadata --host-id=<old_host_id> --force-clean".
>>
>> Hope this helps !!
>>
>> Thanks
>> kasturi
>>
>>
>> On Fri, Jan 19, 2018 at 12:07 AM, Artem Tambovskiy
>> <artem.tambovskiy at gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> OK, I decided to remove the second host from the cluster.
>>> I reinstalled it from the web UI with the hosted-engine action UNDEPLOY,
>>> and removed it from the cluster afterwards.
>>> All VMs are fine and the hosted engine is running OK,
>>> but hosted-engine --vm-status still shows 2 hosts.
>>>
>>> How can I clean up the traces of the second host the correct way?
>>>
>>>
>>> --== Host 1 status ==--
>>>
>>> conf_on_shared_storage : True
>>> Status up-to-date : True
>>> Hostname : ovirt1.telia.ru
>>> Host ID : 1
>>> Engine status : {"health": "good", "vm": "up",
>>> "detail": "up"}
>>> Score : 3400
>>> stopped : False
>>> Local maintenance : False
>>> crc32 : 1b1b6f6d
>>> local_conf_timestamp : 545385
>>> Host timestamp : 545385
>>> Extra metadata (valid at timestamp):
>>> metadata_parse_version=1
>>> metadata_feature_version=1
>>> timestamp=545385 (Thu Jan 18 21:34:25 2018)
>>> host-id=1
>>> score=3400
>>> vm_conf_refresh_time=545385 (Thu Jan 18 21:34:25 2018)
>>> conf_on_shared_storage=True
>>> maintenance=False
>>> state=GlobalMaintenance
>>> stopped=False
>>>
>>>
>>> --== Host 2 status ==--
>>>
>>> conf_on_shared_storage : True
>>> Status up-to-date : False
>>> Hostname : ovirt1.telia.ru
>>> Host ID : 2
>>> Engine status : unknown stale-data
>>> Score : 0
>>> stopped : True
>>> Local maintenance : False
>>> crc32 : c7037c03
>>> local_conf_timestamp : 7530
>>> Host timestamp : 7530
>>> Extra metadata (valid at timestamp):
>>> metadata_parse_version=1
>>> metadata_feature_version=1
>>> timestamp=7530 (Fri Jan 12 16:10:12 2018)
>>> host-id=2
>>> score=0
>>> vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)
>>> conf_on_shared_storage=True
>>> maintenance=False
>>> state=AgentStopped
>>> stopped=True
>>>
>>>
>>> !! Cluster is in GLOBAL MAINTENANCE mode !!
>>>
>>> Thank you in advance!
>>> Regards,
>>> Artem
>>>
>>>
>>> On Wed, Jan 17, 2018 at 6:47 PM, Artem Tambovskiy
>>> <artem.tambovskiy at gmail.com> wrote:
>>>>
>>>> Hello,
>>>>
>>>> Any further suggestions on how to fix the issue and get the HA setup
>>>> working? Could a complete removal of the second host from the cluster
>>>> (including all oVirt configuration files and packages) and adding it
>>>> again solve the issue? Or might it completely ruin the cluster?
>>>>
>>>> Regards,
>>>> Artem
>>>>
>>>> On Jan 16, 2018 at 17:00, "Artem Tambovskiy"
>>>> <artem.tambovskiy at gmail.com> wrote:
>>>>
>>>>> Hi Martin,
>>>>>
>>>>> Thanks for the feedback.
>>>>>
>>>>> All hosts and the hosted engine are running the 4.1.8 release.
>>>>> The strange thing: I can see that the host ID is set to 1 on both hosts
>>>>> in the /etc/ovirt-hosted-engine/hosted-engine.conf file.
>>>>> I have no idea how this happened; the only thing I have changed recently
>>>>> is mnt_options, in order to add backup-volfile-servers using the
>>>>> hosted-engine --set-shared-config command.
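>>>>>
>>>>> Roughly like this, with the other gluster nodes as backup servers (I am
>>>>> reproducing it from memory, so the exact value may have differed):
>>>>>
>>>>> hosted-engine --set-shared-config mnt_options backup-volfile-servers=ovirt2.telia.ru:ovirt3.telia.ru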
>>>>>
>>>>> Both the agent and the broker are running on the second host:
>>>>>
>>>>> [root at ovirt2 ovirt-hosted-engine-ha]# ps -ef | grep ovirt-ha-
>>>>> vdsm 42331 1 26 14:40 ? 00:31:35 /usr/bin/python
>>>>> /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker --no-daemon
>>>>> vdsm 42332 1 0 14:40 ? 00:00:16 /usr/bin/python
>>>>> /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent --no-daemon
>>>>>
>>>>> but I saw some tracebacks during broker startup:
>>>>>
>>>>> [root at ovirt2 ovirt-hosted-engine-ha]# systemctl status ovirt-ha-broker
>>>>> -l
>>>>> ● ovirt-ha-broker.service - oVirt Hosted Engine High Availability
>>>>> Communications Broker
>>>>> Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service;
>>>>> enabled; vendor preset: disabled)
>>>>> Active: active (running) since Tue 2018-01-16 14:40:15 MSK; 1h 58min
>>>>> ago
>>>>> Main PID: 42331 (ovirt-ha-broker)
>>>>> CGroup: /system.slice/ovirt-ha-broker.service
>>>>> └─42331 /usr/bin/python
>>>>> /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker --no-daemon
>>>>>
>>>>> Jan 16 14:40:15 ovirt2.telia.ru systemd[1]: Started oVirt Hosted Engine
>>>>> High Availability Communications Broker.
>>>>> Jan 16 14:40:15 ovirt2.telia.ru systemd[1]: Starting oVirt Hosted
>>>>> Engine High Availability Communications Broker...
>>>>> Jan 16 14:40:16 ovirt2.telia.ru ovirt-ha-broker[42331]: ovirt-ha-broker
>>>>> ovirt_hosted_engine_ha.broker.listener.ConnectionHandler ERROR Error
>>>>> handling request, data: 'set-storage-domain FilesystemBackend
>>>>> dom_type=glusterfs sd_uuid=4a7f8717-9bb0-4d80-8016-498fa4b88162'
>>>>> Traceback (most recent call last):
>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 166, in handle
>>>>>     data)
>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 299, in _dispatch
>>>>>     .set_storage_domain(client, sd_type, **options)
>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 66, in set_storage_domain
>>>>>     self._backends[client].connect()
>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 462, in connect
>>>>>     self._dom_type)
>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 107, in get_domain_path
>>>>>     " in {1}".format(sd_uuid, parent))
>>>>> BackendFailureException: path to storage domain
>>>>> 4a7f8717-9bb0-4d80-8016-498fa4b88162 not found in
>>>>> /rhev/data-center/mnt/glusterSD
>>>>>
>>>>>
>>>>>
>>>>> I have tried issuing hosted-engine --connect-storage on the second host,
>>>>> followed by an agent & broker restart,
>>>>> but there is no visible improvement.
>>>>>
>>>>> Regards,
>>>>> Artem
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Jan 16, 2018 at 4:18 PM, Martin Sivak <msivak at redhat.com>
>>>>> wrote:
>>>>>>
>>>>>> Hi everybody,
>>>>>>
>>>>>> there are a couple of things to check here.
>>>>>>
>>>>>> - what version of the hosted engine agent is this? The logs look like
>>>>>> they come from 4.1
>>>>>> - what version of the engine is used?
>>>>>> - check the host ID in /etc/ovirt-hosted-engine/hosted-engine.conf on
>>>>>> both hosts, the numbers must be different
>>>>>> - it looks like the agent or broker on host 2 is not active (or there
>>>>>> would be a report)
>>>>>> - the second host does not see data from the first host (unknown
>>>>>> stale-data), wait for a minute and check again, then check the storage
>>>>>> connection
>>>>>>
>>>>>> And then the general troubleshooting:
>>>>>>
>>>>>> - put hosted engine in global maintenance mode (and check that it is
>>>>>> visible from the other host using he --vm-status)
>>>>>> - mount storage domain (hosted-engine --connect-storage)
>>>>>> - check sanlock client status to see if proper lockspaces are present
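>>>>>>
>>>>>> In shell terms, those steps boil down to something like this (all
>>>>>> standard hosted-engine and sanlock commands; run the status and the
>>>>>> host-id check on both hosts):
>>>>>>
>>>>>> hosted-engine --set-maintenance --mode=global
>>>>>> hosted-engine --vm-status
>>>>>> hosted-engine --connect-storage
>>>>>> sanlock client status
>>>>>> grep host_id /etc/ovirt-hosted-engine/hosted-engine.conf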
>>>>>>
>>>>>> Best regards
>>>>>>
>>>>>> Martin Sivak
>>>>>>
>>>>>> On Tue, Jan 16, 2018 at 1:16 PM, Derek Atkins <derek at ihtfp.com> wrote:
>>>>>> > Why are both hosts reporting as ovirt1?
>>>>>> > Look at the hostname fields to see what I mean.
>>>>>> >
>>>>>> > -derek
>>>>>> > Sent using my mobile device. Please excuse any typos.
>>>>>> >
>>>>>> > On January 16, 2018 7:11:09 AM Artem Tambovskiy
>>>>>> > <artem.tambovskiy at gmail.com>
>>>>>> > wrote:
>>>>>> >>
>>>>>> >> Hello,
>>>>>> >>
>>>>>> >> Yes, I followed exactly the same procedure while reinstalling the
>>>>>> >> hosts (the only difference is that I have an SSH key configured
>>>>>> >> instead of a password).
>>>>>> >>
>>>>>> >> I just reinstalled the second host one more time; after 20 minutes
>>>>>> >> the host still hasn't reached the active score of 3400 (Hosted Engine
>>>>>> >> HA: Not Active) and I still don't see the crown icon for this host.
>>>>>> >>
>>>>>> >> hosted-engine --vm-status output from the ovirt1 host:
>>>>>> >>
>>>>>> >> [root at ovirt1 ~]# hosted-engine --vm-status
>>>>>> >>
>>>>>> >>
>>>>>> >> --== Host 1 status ==--
>>>>>> >>
>>>>>> >> conf_on_shared_storage : True
>>>>>> >> Status up-to-date : True
>>>>>> >> Hostname : ovirt1.telia.ru
>>>>>> >> Host ID : 1
>>>>>> >> Engine status : {"health": "good", "vm": "up",
>>>>>> >> "detail": "up"}
>>>>>> >> Score : 3400
>>>>>> >> stopped : False
>>>>>> >> Local maintenance : False
>>>>>> >> crc32 : 3f94156a
>>>>>> >> local_conf_timestamp : 349144
>>>>>> >> Host timestamp : 349144
>>>>>> >> Extra metadata (valid at timestamp):
>>>>>> >> metadata_parse_version=1
>>>>>> >> metadata_feature_version=1
>>>>>> >> timestamp=349144 (Tue Jan 16 15:03:45 2018)
>>>>>> >> host-id=1
>>>>>> >> score=3400
>>>>>> >> vm_conf_refresh_time=349144 (Tue Jan 16 15:03:45 2018)
>>>>>> >> conf_on_shared_storage=True
>>>>>> >> maintenance=False
>>>>>> >> state=EngineUp
>>>>>> >> stopped=False
>>>>>> >>
>>>>>> >>
>>>>>> >> --== Host 2 status ==--
>>>>>> >>
>>>>>> >> conf_on_shared_storage : True
>>>>>> >> Status up-to-date : False
>>>>>> >> Hostname : ovirt1.telia.ru
>>>>>> >> Host ID : 2
>>>>>> >> Engine status : unknown stale-data
>>>>>> >> Score : 0
>>>>>> >> stopped : True
>>>>>> >> Local maintenance : False
>>>>>> >> crc32 : c7037c03
>>>>>> >> local_conf_timestamp : 7530
>>>>>> >> Host timestamp : 7530
>>>>>> >> Extra metadata (valid at timestamp):
>>>>>> >> metadata_parse_version=1
>>>>>> >> metadata_feature_version=1
>>>>>> >> timestamp=7530 (Fri Jan 12 16:10:12 2018)
>>>>>> >> host-id=2
>>>>>> >> score=0
>>>>>> >> vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)
>>>>>> >> conf_on_shared_storage=True
>>>>>> >> maintenance=False
>>>>>> >> state=AgentStopped
>>>>>> >> stopped=True
>>>>>> >>
>>>>>> >>
>>>>>> >> hosted-engine --vm-status output from ovirt2 host
>>>>>> >>
>>>>>> >> [root at ovirt2 ovirt-hosted-engine-ha]# hosted-engine --vm-status
>>>>>> >>
>>>>>> >>
>>>>>> >> --== Host 1 status ==--
>>>>>> >>
>>>>>> >> conf_on_shared_storage : True
>>>>>> >> Status up-to-date : False
>>>>>> >> Hostname : ovirt1.telia.ru
>>>>>> >> Host ID : 1
>>>>>> >> Engine status : unknown stale-data
>>>>>> >> Score : 3400
>>>>>> >> stopped : False
>>>>>> >> Local maintenance : False
>>>>>> >> crc32 : 6d3606f1
>>>>>> >> local_conf_timestamp : 349264
>>>>>> >> Host timestamp : 349264
>>>>>> >> Extra metadata (valid at timestamp):
>>>>>> >> metadata_parse_version=1
>>>>>> >> metadata_feature_version=1
>>>>>> >> timestamp=349264 (Tue Jan 16 15:05:45 2018)
>>>>>> >> host-id=1
>>>>>> >> score=3400
>>>>>> >> vm_conf_refresh_time=349264 (Tue Jan 16 15:05:45 2018)
>>>>>> >> conf_on_shared_storage=True
>>>>>> >> maintenance=False
>>>>>> >> state=EngineUp
>>>>>> >> stopped=False
>>>>>> >>
>>>>>> >>
>>>>>> >> --== Host 2 status ==--
>>>>>> >>
>>>>>> >> conf_on_shared_storage : True
>>>>>> >> Status up-to-date : False
>>>>>> >> Hostname : ovirt1.telia.ru
>>>>>> >> Host ID : 2
>>>>>> >> Engine status : unknown stale-data
>>>>>> >> Score : 0
>>>>>> >> stopped : True
>>>>>> >> Local maintenance : False
>>>>>> >> crc32 : c7037c03
>>>>>> >> local_conf_timestamp : 7530
>>>>>> >> Host timestamp : 7530
>>>>>> >> Extra metadata (valid at timestamp):
>>>>>> >> metadata_parse_version=1
>>>>>> >> metadata_feature_version=1
>>>>>> >> timestamp=7530 (Fri Jan 12 16:10:12 2018)
>>>>>> >> host-id=2
>>>>>> >> score=0
>>>>>> >> vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)
>>>>>> >> conf_on_shared_storage=True
>>>>>> >> maintenance=False
>>>>>> >> state=AgentStopped
>>>>>> >> stopped=True
>>>>>> >>
>>>>>> >>
>>>>>> >> Also I saw some log messages in the web GUI about time drift, like
>>>>>> >>
>>>>>> >> "Host ovirt2.telia.ru has time-drift of 5305 seconds while maximum
>>>>>> >> configured value is 300 seconds." That is a bit weird, as I haven't
>>>>>> >> touched any time settings since I installed the cluster.
>>>>>> >> Both hosts have the same time and timezone (MSK), but the hosted
>>>>>> >> engine lives in the UTC timezone. Is it mandatory to have everything
>>>>>> >> in sync and in the same timezone?
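>>>>>> >>
>>>>>> >> For reference, a quick way to compare the clocks on each host
>>>>>> >> (assuming chronyd; on ntpd-based setups ntpq -p is the equivalent):
>>>>>> >>
>>>>>> >> timedatectl
>>>>>> >> chronyc tracking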
>>>>>> >>
>>>>>> >> Regards,
>>>>>> >> Artem
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >> On Tue, Jan 16, 2018 at 2:20 PM, Kasturi Narra <knarra at redhat.com>
>>>>>> >> wrote:
>>>>>> >>>
>>>>>> >>> Hello,
>>>>>> >>>
>>>>>> >>> I now see that your hosted engine is up and running. Can you
>>>>>> >>> let me know how you tried reinstalling the host? Below is the
>>>>>> >>> procedure to use; I hope you did not miss any step while
>>>>>> >>> reinstalling. If you did, can you try reinstalling again and see
>>>>>> >>> if that works?
>>>>>> >>>
>>>>>> >>> 1) Move the host to maintenance
>>>>>> >>> 2) click on reinstall
>>>>>> >>> 3) provide the password
>>>>>> >>> 4) uncheck 'automatically configure host firewall'
>>>>>> >>> 5) click on 'Deploy' tab
>>>>>> >>> 6) click Hosted Engine deployment as 'Deploy'
>>>>>> >>>
>>>>>> >>> And once the host installation is done, wait until the active score
>>>>>> >>> of the host shows 3400 in the General tab, then check hosted-engine
>>>>>> >>> --vm-status.
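>>>>>> >>>
>>>>>> >>> For example, a simple way to watch for that from the shell:
>>>>>> >>>
>>>>>> >>> watch -n10 'hosted-engine --vm-status | grep -E "Hostname|Score"'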
>>>>>> >>>
>>>>>> >>> Thanks
>>>>>> >>> kasturi
>>>>>> >>>
>>>>>> >>> On Mon, Jan 15, 2018 at 4:57 PM, Artem Tambovskiy
>>>>>> >>> <artem.tambovskiy at gmail.com> wrote:
>>>>>> >>>>
>>>>>> >>>> Hello,
>>>>>> >>>>
>>>>>> >>>> I have uploaded 2 archives with all relevant logs to shared
>>>>>> >>>> hosting:
>>>>>> >>>> files from host 1 (which is currently running all VMs including
>>>>>> >>>> hosted_engine) - https://yadi.sk/d/PttRoYV63RTvhK
>>>>>> >>>> files from the second host - https://yadi.sk/d/UBducEsV3RTvhc
>>>>>> >>>>
>>>>>> >>>> I have tried to restart both ovirt-ha-agent and ovirt-ha-broker,
>>>>>> >>>> but it has no effect. I have also tried to shut down the
>>>>>> >>>> hosted_engine VM, stop the ovirt-ha-agent and ovirt-ha-broker
>>>>>> >>>> services, disconnect the storage, and connect it again - no effect
>>>>>> >>>> as well.
>>>>>> >>>> I also tried to reinstall the second host from the web GUI - this
>>>>>> >>>> led to an interesting situation - now hosted-engine --vm-status
>>>>>> >>>> shows that both hosts have the same address.
>>>>>> >>>>
>>>>>> >>>> [root at ovirt1 ~]# hosted-engine --vm-status
>>>>>> >>>>
>>>>>> >>>> --== Host 1 status ==--
>>>>>> >>>>
>>>>>> >>>> conf_on_shared_storage : True
>>>>>> >>>> Status up-to-date : True
>>>>>> >>>> Hostname : ovirt1.telia.ru
>>>>>> >>>> Host ID : 1
>>>>>> >>>> Engine status : {"health": "good", "vm":
>>>>>> >>>> "up",
>>>>>> >>>> "detail": "up"}
>>>>>> >>>> Score : 3400
>>>>>> >>>> stopped : False
>>>>>> >>>> Local maintenance : False
>>>>>> >>>> crc32 : a7758085
>>>>>> >>>> local_conf_timestamp : 259327
>>>>>> >>>> Host timestamp : 259327
>>>>>> >>>> Extra metadata (valid at timestamp):
>>>>>> >>>> metadata_parse_version=1
>>>>>> >>>> metadata_feature_version=1
>>>>>> >>>> timestamp=259327 (Mon Jan 15 14:06:48 2018)
>>>>>> >>>> host-id=1
>>>>>> >>>> score=3400
>>>>>> >>>> vm_conf_refresh_time=259327 (Mon Jan 15 14:06:48 2018)
>>>>>> >>>> conf_on_shared_storage=True
>>>>>> >>>> maintenance=False
>>>>>> >>>> state=EngineUp
>>>>>> >>>> stopped=False
>>>>>> >>>>
>>>>>> >>>>
>>>>>> >>>> --== Host 2 status ==--
>>>>>> >>>>
>>>>>> >>>> conf_on_shared_storage : True
>>>>>> >>>> Status up-to-date : False
>>>>>> >>>> Hostname : ovirt1.telia.ru
>>>>>> >>>> Host ID : 2
>>>>>> >>>> Engine status : unknown stale-data
>>>>>> >>>> Score : 0
>>>>>> >>>> stopped : True
>>>>>> >>>> Local maintenance : False
>>>>>> >>>> crc32 : c7037c03
>>>>>> >>>> local_conf_timestamp : 7530
>>>>>> >>>> Host timestamp : 7530
>>>>>> >>>> Extra metadata (valid at timestamp):
>>>>>> >>>> metadata_parse_version=1
>>>>>> >>>> metadata_feature_version=1
>>>>>> >>>> timestamp=7530 (Fri Jan 12 16:10:12 2018)
>>>>>> >>>> host-id=2
>>>>>> >>>> score=0
>>>>>> >>>> vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)
>>>>>> >>>> conf_on_shared_storage=True
>>>>>> >>>> maintenance=False
>>>>>> >>>> state=AgentStopped
>>>>>> >>>> stopped=True
>>>>>> >>>>
>>>>>> >>>> Gluster seems to be working fine; all gluster nodes show the
>>>>>> >>>> connected state.
>>>>>> >>>>
>>>>>> >>>> Any advice on how to resolve this situation is highly
>>>>>> >>>> appreciated!
>>>>>> >>>>
>>>>>> >>>> Regards,
>>>>>> >>>> Artem
>>>>>> >>>>
>>>>>> >>>>
>>>>>> >>>> On Mon, Jan 15, 2018 at 11:45 AM, Kasturi Narra
>>>>>> >>>> <knarra at redhat.com>
>>>>>> >>>> wrote:
>>>>>> >>>>>
>>>>>> >>>>> Hello Artem,
>>>>>> >>>>>
>>>>>> >>>>> Can you check if the glusterd service is running on host1
>>>>>> >>>>> and all the peers are in the connected state? If yes, can you
>>>>>> >>>>> restart the ovirt-ha-agent and broker services and check if
>>>>>> >>>>> things are working fine?
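>>>>>> >>>>>
>>>>>> >>>>> That is, something along these lines on host1 (standard service
>>>>>> >>>>> and CLI names):
>>>>>> >>>>>
>>>>>> >>>>> systemctl status glusterd
>>>>>> >>>>> gluster peer status
>>>>>> >>>>> systemctl restart ovirt-ha-broker ovirt-ha-agent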
>>>>>> >>>>>
>>>>>> >>>>> Thanks
>>>>>> >>>>> kasturi
>>>>>> >>>>>
>>>>>> >>>>> On Sat, Jan 13, 2018 at 12:33 AM, Artem Tambovskiy
>>>>>> >>>>> <artem.tambovskiy at gmail.com> wrote:
>>>>>> >>>>>>
>>>>>> >>>>>> I explored the logs on both hosts.
>>>>>> >>>>>> broker.log shows no errors.
>>>>>> >>>>>>
>>>>>> >>>>>> agent.log is not looking good:
>>>>>> >>>>>>
>>>>>> >>>>>> On host1 (which is running the hosted engine):
>>>>>> >>>>>>
>>>>>> >>>>>> MainThread::ERROR::2018-01-12 21:51:03,883::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
>>>>>> >>>>>> Traceback (most recent call last):
>>>>>> >>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 191, in _run_agent
>>>>>> >>>>>>     return action(he)
>>>>>> >>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 64, in action_proper
>>>>>> >>>>>>     return he.start_monitoring()
>>>>>> >>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 411, in start_monitoring
>>>>>> >>>>>>     self._initialize_sanlock()
>>>>>> >>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 749, in _initialize_sanlock
>>>>>> >>>>>>     "Failed to initialize sanlock, the number of errors has"
>>>>>> >>>>>> SanlockInitializationError: Failed to initialize sanlock, the number
>>>>>> >>>>>> of errors has exceeded the limit
>>>>>> >>>>>> of errors has exceeded the limit
>>>>>> >>>>>>
>>>>>> >>>>>> MainThread::ERROR::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 21:51:03,884::agent::206::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
>>>>>> >>>>>> Trying to restart agent
>>>>>> >>>>>> MainThread::WARNING::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 21:51:08,889::agent::209::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
>>>>>> >>>>>> Restarting agent, attempt '1'
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 21:51:08,919::hosted_engine::242::ovirt_hosted_engine_ha.agenthosted_engine.HostedEngine::(_get_hostname)
>>>>>> >>>>>> Found certificate common name: ovirt1.telia.ru
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 21:51:08,921::hosted_engine::604::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm)
>>>>>> >>>>>> Initializing VDSM
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 21:51:11,398::hosted_engine::630::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images)
>>>>>> >>>>>> Connecting the storage
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 21:51:11,399::storage_server::220::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(validate_storage_server)
>>>>>> >>>>>> Validating storage server
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 21:51:13,725::storage_server::239::ovirt_hosted_engine_ha.libstorage_server.StorageServer::(connect_storage_server)
>>>>>> >>>>>> Connecting storage server
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 21:51:18,390::storage_server::246::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
>>>>>> >>>>>> Connecting storage server
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 21:51:18,423::storage_server::253::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
>>>>>> >>>>>> Refreshing the storage domain
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 21:51:18,689::hosted_engine::663::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images)
>>>>>> >>>>>> Preparing images
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 21:51:18,690::image::126::ovirt_hosted_engine_ha.lib.image.Image::(prepare_images)
>>>>>> >>>>>> Preparing images
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 21:51:21,895::hosted_engine::666::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images)
>>>>>> >>>>>> Refreshing vm.conf
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 21:51:21,895::config::493::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_vm_conf)
>>>>>> >>>>>> Reloading vm.conf from the shared storage domain
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 21:51:21,896::config::416::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
>>>>>> >>>>>> Trying to get a fresher copy of vm configuration from the
>>>>>> >>>>>> OVF_STORE
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 21:51:21,896::ovf_store::132::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
>>>>>> >>>>>> Extracting Engine VM OVF from the OVF_STORE
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 21:51:21,897::ovf_store::134::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
>>>>>> >>>>>> OVF_STORE volume path:
>>>>>> >>>>>>
>>>>>> >>>>>> /var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-498fa4b88162/5cabd8e1-5f4b-469e-becc-227469e03f5c/8048cbd7-77e2-4805-9af4-d109fa36dfcf
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 21:51:21,915::config::435::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
>>>>>> >>>>>> Found an OVF for HE VM, trying to convert
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 21:51:21,918::config::440::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
>>>>>> >>>>>> Got vm.conf from OVF_STORE
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 21:51:21,919::hosted_engine::509::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker)
>>>>>> >>>>>> Initializing ha-broker connection
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 21:51:21,919::brokerlink::130::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
>>>>>> >>>>>> Starting monitor ping, options {'addr': '80.239.162.97'}
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 21:51:21,922::brokerlink::141::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
>>>>>> >>>>>> Success, id 140547104457680
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 21:51:21,922::brokerlink::130::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
>>>>>> >>>>>> Starting monitor mgmt-bridge, options {'use_ssl': 'true',
>>>>>> >>>>>> 'bridge_name':
>>>>>> >>>>>> 'ovirtmgmt', 'address': '0'}
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 21:51:21,936::brokerlink::141::ovirt_hosted_engine_ha.lib.brokerlinkBrokerLink::(start_monitor)
>>>>>> >>>>>> Success, id 140547104458064
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 21:51:21,936::brokerlink::130::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
>>>>>> >>>>>> Starting monitor mem-free, options {'use_ssl': 'true',
>>>>>> >>>>>> 'address': '0'}
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 21:51:21,938::brokerlink::141::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
>>>>>> >>>>>> Success, id 140547104458448
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 21:51:21,939::brokerlink::130::ovirt_hosted_engine_ha.lib.brokerlinkBrokerLink::(start_monitor)
>>>>>> >>>>>> Starting monitor cpu-load-no-engine, options {'use_ssl':
>>>>>> >>>>>> 'true', 'vm_uuid':
>>>>>> >>>>>> 'b366e466-b0ea-4a09-866b-d0248d7523a6', 'address': '0'}
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 21:51:21,940::brokerlink::141::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
>>>>>> >>>>>> Success, id 140547104457552
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 21:51:21,941::brokerlink::130::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
>>>>>> >>>>>> Starting monitor engine-health, options {'use_ssl': 'true',
>>>>>> >>>>>> 'vm_uuid':
>>>>>> >>>>>> 'b366e466-b0ea-4a09-866b-d0248d7523a6', 'address': '0'}
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 21:51:21,942::brokerlink::141::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
>>>>>> >>>>>> Success, id 140547104459792
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 21:51:26,951::brokerlink::179::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(set_storage_domain)
>>>>>> >>>>>> Success, id 140546772847056
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 21:51:26,952::hosted_engine::601::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker)
>>>>>> >>>>>> Broker initialized, all submonitors started
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 21:51:27,049::hosted_engine::704::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock)
>>>>>> >>>>>> Ensuring lease for lockspace hosted-engine, host id 1 is
>>>>>> >>>>>> acquired (file:
>>>>>> >>>>>>
>>>>>> >>>>>> /var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-498fa4b88162/093faa75-5e33-4559-84fa-1f1f8d48153b/911c7637-b49d-463e-b186-23b404e50769)
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 21:53:48,067::hosted_engine::745::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock)
>>>>>> >>>>>> Failed to acquire the lock. Waiting '5's before the next
>>>>>> >>>>>> attempt
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 21:56:14,088::hosted_engine::745::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock)
>>>>>> >>>>>> Failed to acquire the lock. Waiting '5's before the next
>>>>>> >>>>>> attempt
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 21:58:40,111::hosted_engine::745::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock)
>>>>>> >>>>>> Failed to acquire the lock. Waiting '5's before the next
>>>>>> >>>>>> attempt
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 22:01:06,133::hosted_engine::745::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock)
>>>>>> >>>>>> Failed to acquire the lock. Waiting '5's before the next
>>>>>> >>>>>> attempt
>>>>>> >>>>>>
>>>>>> >>>>>>
>>>>>> >>>>>> agent.log from second host
>>>>>> >>>>>>
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 22:01:37,241::hosted_engine::630::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images)
>>>>>> >>>>>> Connecting the storage
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 22:01:37,242::storage_server::220::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(validate_storage_server)
>>>>>> >>>>>> Validating storage server
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 22:01:39,540::hosted_engine::639::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images)
>>>>>> >>>>>> Storage domain reported as valid and reconnect is not forced.
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 22:01:41,939::hosted_engine::453::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>>>>>> >>>>>> Current state EngineUnexpectedlyDown (score: 0)
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 22:01:52,150::config::493::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_vm_conf)
>>>>>> >>>>>> Reloading vm.conf from the shared storage domain
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 22:01:52,150::config::416::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
>>>>>> >>>>>> Trying to get a fresher copy of vm configuration from the
>>>>>> >>>>>> OVF_STORE
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 22:01:52,151::ovf_store::132::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
>>>>>> >>>>>> Extracting Engine VM OVF from the OVF_STORE
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 22:01:52,153::ovf_store::134::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
>>>>>> >>>>>> OVF_STORE volume path:
>>>>>> >>>>>>
>>>>>> >>>>>> /var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-498fa4b88162/5cabd8e1-5f4b-469e-becc-227469e03f5c/8048cbd7-77e2-4805-9af4-d109fa36dfcf
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 22:01:52,174::config::435::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
>>>>>> >>>>>> Found an OVF for HE VM, trying to convert
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 22:01:52,179::config::440::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
>>>>>> >>>>>> Got vm.conf from OVF_STORE
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 22:01:52,189::hosted_engine::604::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm)
>>>>>> >>>>>> Initializing VDSM
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 22:01:54,586::hosted_engine::630::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images)
>>>>>> >>>>>> Connecting the storage
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 22:01:54,587::storage_server::220::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(validate_storage_server)
>>>>>> >>>>>> Validating storage server
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 22:01:56,903::hosted_engine::639::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images)
>>>>>> >>>>>> Storage domain reported as valid and reconnect is not forced.
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 22:01:59,299::states::682::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score)
>>>>>> >>>>>> Score is 0 due to unexpected vm shutdown at Fri Jan 12 21:57:48
>>>>>> >>>>>> 2018
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 22:01:59,299::hosted_engine::453::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>>>>>> >>>>>> Current state EngineUnexpectedlyDown (score: 0)
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 22:02:09,659::config::493::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_vm_conf)
>>>>>> >>>>>> Reloading vm.conf from the shared storage domain
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 22:02:09,659::config::416::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
>>>>>> >>>>>> Trying to get a fresher copy of vm configuration from the
>>>>>> >>>>>> OVF_STORE
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 22:02:09,660::ovf_store::132::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
>>>>>> >>>>>> Extracting Engine VM OVF from the OVF_STORE
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 22:02:09,663::ovf_store::134::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
>>>>>> >>>>>> OVF_STORE volume path:
>>>>>> >>>>>>
>>>>>> >>>>>> /var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-498fa4b88162/5cabd8e1-5f4b-469e-becc-227469e03f5c/8048cbd7-77e2-4805-9af4-d109fa36dfcf
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 22:02:09,683::config::435::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
>>>>>> >>>>>> Found an OVF for HE VM, trying to convert
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 22:02:09,688::config::440::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
>>>>>> >>>>>> Got vm.conf from OVF_STORE
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 22:02:09,698::hosted_engine::604::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm)
>>>>>> >>>>>> Initializing VDSM
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 22:02:12,112::hosted_engine::630::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images)
>>>>>> >>>>>> Connecting the storage
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 22:02:12,113::storage_server::220::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(validate_storage_server)
>>>>>> >>>>>> Validating storage server
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 22:02:14,444::hosted_engine::639::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images)
>>>>>> >>>>>> Storage domain reported as valid and reconnect is not forced.
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 22:02:16,859::states::682::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score)
>>>>>> >>>>>> Score is 0 due to unexpected vm shutdown at Fri Jan 12 21:57:47
>>>>>> >>>>>> 2018
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 22:02:16,859::hosted_engine::453::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>>>>>> >>>>>> Current state EngineUnexpectedlyDown (score: 0)
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 22:02:27,100::config::493::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_vm_conf)
>>>>>> >>>>>> Reloading vm.conf from the shared storage domain
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 22:02:27,100::config::416::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
>>>>>> >>>>>> Trying to get a fresher copy of vm configuration from the
>>>>>> >>>>>> OVF_STORE
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 22:02:27,101::ovf_store::132::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
>>>>>> >>>>>> Extracting Engine VM OVF from the OVF_STORE
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 22:02:27,103::ovf_store::134::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
>>>>>> >>>>>> OVF_STORE volume path:
>>>>>> >>>>>>
>>>>>> >>>>>> /var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-498fa4b88162/5cabd8e1-5f4b-469e-becc-227469e03f5c/8048cbd7-77e2-4805-9af4-d109fa36dfcf
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 22:02:27,125::config::435::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
>>>>>> >>>>>> Found an OVF for HE VM, trying to convert
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 22:02:27,129::config::440::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
>>>>>> >>>>>> Got vm.conf from OVF_STORE
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 22:02:27,130::states::667::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume)
>>>>>> >>>>>> Engine down, local host does not have best score
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 22:02:27,139::hosted_engine::604::ovirt_hosted_engine_ha.agent.hosted_engineHostedEngine::(_initialize_vdsm)
>>>>>> >>>>>> Initializing VDSM
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 22:02:29,584::hosted_engine::630::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images)
>>>>>> >>>>>> Connecting the storage
>>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>>> >>>>>>
>>>>>> >>>>>> 22:02:29,586::storage_server::220::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(validate_storage_server)
>>>>>> >>>>>> Validating storage server
>>>>>> >>>>>>
>>>>>> >>>>>>
>>>>>> >>>>>> Any suggestions on how to resolve this?
>>>>>> >>>>>>
>>>>>> >>>>>> regards,
>>>>>> >>>>>> Artem
>>>>>> >>>>>>
>>>>>> >>>>>>
>>>>>> >>>>>> On Fri, Jan 12, 2018 at 7:08 PM, Artem Tambovskiy
>>>>>> >>>>>> <artem.tambovskiy at gmail.com> wrote:
>>>>>> >>>>>>>
>>>>>> >>>>>>> Trying to fix one thing, I broke another :(
>>>>>> >>>>>>>
>>>>>> >>>>>>> I fixed the mnt_options for the hosted engine storage domain and
>>>>>> >>>>>>> installed the latest security patches on my hosts and the hosted
>>>>>> >>>>>>> engine. All VMs are up and running, but hosted-engine --vm-status
>>>>>> >>>>>>> reports issues:
>>>>>> >>>>>>>
>>>>>> >>>>>>> [root at ovirt1 ~]# hosted-engine --vm-status
>>>>>> >>>>>>>
>>>>>> >>>>>>>
>>>>>> >>>>>>> --== Host 1 status ==--
>>>>>> >>>>>>>
>>>>>> >>>>>>> conf_on_shared_storage : True
>>>>>> >>>>>>> Status up-to-date : False
>>>>>> >>>>>>> Hostname : ovirt2
>>>>>> >>>>>>> Host ID : 1
>>>>>> >>>>>>> Engine status : unknown stale-data
>>>>>> >>>>>>> Score : 0
>>>>>> >>>>>>> stopped : False
>>>>>> >>>>>>> Local maintenance : False
>>>>>> >>>>>>> crc32 : 193164b8
>>>>>> >>>>>>> local_conf_timestamp : 8350
>>>>>> >>>>>>> Host timestamp : 8350
>>>>>> >>>>>>> Extra metadata (valid at timestamp):
>>>>>> >>>>>>> metadata_parse_version=1
>>>>>> >>>>>>> metadata_feature_version=1
>>>>>> >>>>>>> timestamp=8350 (Fri Jan 12 19:03:54 2018)
>>>>>> >>>>>>> host-id=1
>>>>>> >>>>>>> score=0
>>>>>> >>>>>>> vm_conf_refresh_time=8350 (Fri Jan 12 19:03:54 2018)
>>>>>> >>>>>>> conf_on_shared_storage=True
>>>>>> >>>>>>> maintenance=False
>>>>>> >>>>>>> state=EngineUnexpectedlyDown
>>>>>> >>>>>>> stopped=False
>>>>>> >>>>>>> timeout=Thu Jan 1 05:24:43 1970
>>>>>> >>>>>>>
>>>>>> >>>>>>>
>>>>>> >>>>>>> --== Host 2 status ==--
>>>>>> >>>>>>>
>>>>>> >>>>>>> conf_on_shared_storage : True
>>>>>> >>>>>>> Status up-to-date : False
>>>>>> >>>>>>> Hostname : ovirt1.telia.ru
>>>>>> >>>>>>> Host ID : 2
>>>>>> >>>>>>> Engine status : unknown stale-data
>>>>>> >>>>>>> Score : 0
>>>>>> >>>>>>> stopped : True
>>>>>> >>>>>>> Local maintenance : False
>>>>>> >>>>>>> crc32 : c7037c03
>>>>>> >>>>>>> local_conf_timestamp : 7530
>>>>>> >>>>>>> Host timestamp : 7530
>>>>>> >>>>>>> Extra metadata (valid at timestamp):
>>>>>> >>>>>>> metadata_parse_version=1
>>>>>> >>>>>>> metadata_feature_version=1
>>>>>> >>>>>>> timestamp=7530 (Fri Jan 12 16:10:12 2018)
>>>>>> >>>>>>> host-id=2
>>>>>> >>>>>>> score=0
>>>>>> >>>>>>> vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)
>>>>>> >>>>>>> conf_on_shared_storage=True
>>>>>> >>>>>>> maintenance=False
>>>>>> >>>>>>> state=AgentStopped
>>>>>> >>>>>>> stopped=True
>>>>>> >>>>>>> [root at ovirt1 ~]#
>>>>>> >>>>>>>
>>>>>> >>>>>>>
>>>>>> >>>>>>>
>>>>>> >>>>>>> From the second host, the situation looks a bit different:
>>>>>> >>>>>>>
>>>>>> >>>>>>>
>>>>>> >>>>>>> [root at ovirt2 ~]# hosted-engine --vm-status
>>>>>> >>>>>>>
>>>>>> >>>>>>>
>>>>>> >>>>>>> --== Host 1 status ==--
>>>>>> >>>>>>>
>>>>>> >>>>>>> conf_on_shared_storage : True
>>>>>> >>>>>>> Status up-to-date : True
>>>>>> >>>>>>> Hostname : ovirt2
>>>>>> >>>>>>> Host ID : 1
>>>>>> >>>>>>> Engine status : {"reason": "vm not
>>>>>> >>>>>>> running on
>>>>>> >>>>>>> this host", "health": "bad", "vm": "down", "detail":
>>>>>> >>>>>>> "unknown"}
>>>>>> >>>>>>> Score : 0
>>>>>> >>>>>>> stopped : False
>>>>>> >>>>>>> Local maintenance : False
>>>>>> >>>>>>> crc32 : 78eabdb6
>>>>>> >>>>>>> local_conf_timestamp : 8403
>>>>>> >>>>>>> Host timestamp : 8402
>>>>>> >>>>>>> Extra metadata (valid at timestamp):
>>>>>> >>>>>>> metadata_parse_version=1
>>>>>> >>>>>>> metadata_feature_version=1
>>>>>> >>>>>>> timestamp=8402 (Fri Jan 12 19:04:47 2018)
>>>>>> >>>>>>> host-id=1
>>>>>> >>>>>>> score=0
>>>>>> >>>>>>> vm_conf_refresh_time=8403 (Fri Jan 12 19:04:47 2018)
>>>>>> >>>>>>> conf_on_shared_storage=True
>>>>>> >>>>>>> maintenance=False
>>>>>> >>>>>>> state=EngineUnexpectedlyDown
>>>>>> >>>>>>> stopped=False
>>>>>> >>>>>>> timeout=Thu Jan 1 05:24:43 1970
>>>>>> >>>>>>>
>>>>>> >>>>>>>
>>>>>> >>>>>>> --== Host 2 status ==--
>>>>>> >>>>>>>
>>>>>> >>>>>>> conf_on_shared_storage : True
>>>>>> >>>>>>> Status up-to-date : False
>>>>>> >>>>>>> Hostname : ovirt1.telia.ru
>>>>>> >>>>>>> Host ID : 2
>>>>>> >>>>>>> Engine status : unknown stale-data
>>>>>> >>>>>>> Score : 0
>>>>>> >>>>>>> stopped : True
>>>>>> >>>>>>> Local maintenance : False
>>>>>> >>>>>>> crc32 : c7037c03
>>>>>> >>>>>>> local_conf_timestamp : 7530
>>>>>> >>>>>>> Host timestamp : 7530
>>>>>> >>>>>>> Extra metadata (valid at timestamp):
>>>>>> >>>>>>> metadata_parse_version=1
>>>>>> >>>>>>> metadata_feature_version=1
>>>>>> >>>>>>> timestamp=7530 (Fri Jan 12 16:10:12 2018)
>>>>>> >>>>>>> host-id=2
>>>>>> >>>>>>> score=0
>>>>>> >>>>>>> vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)
>>>>>> >>>>>>> conf_on_shared_storage=True
>>>>>> >>>>>>> maintenance=False
>>>>>> >>>>>>> state=AgentStopped
>>>>>> >>>>>>> stopped=True
>>>>>> >>>>>>>
>>>>>> >>>>>>>
>>>>>> >>>>>>> The web GUI shows that the engine is running on host ovirt1.
>>>>>> >>>>>>> Gluster looks fine:
>>>>>> >>>>>>> [root at ovirt1 ~]# gluster volume status engine
>>>>>> >>>>>>> Status of volume: engine
>>>>>> >>>>>>> Gluster process                       TCP Port  RDMA Port  Online  Pid
>>>>>> >>>>>>> ------------------------------------------------------------------------------
>>>>>> >>>>>>> Brick ovirt1.telia.ru:/oVirt/engine   49169     0          Y       3244
>>>>>> >>>>>>> Brick ovirt2.telia.ru:/oVirt/engine   49179     0          Y       20372
>>>>>> >>>>>>> Brick ovirt3.telia.ru:/oVirt/engine   49206     0          Y       16609
>>>>>> >>>>>>> Self-heal Daemon on localhost         N/A       N/A        Y       117868
>>>>>> >>>>>>> Self-heal Daemon on ovirt2.telia.ru   N/A       N/A        Y       20521
>>>>>> >>>>>>> Self-heal Daemon on ovirt3            N/A       N/A        Y       25093
>>>>>> >>>>>>>
>>>>>> >>>>>>> Task Status of Volume engine
>>>>>> >>>>>>>
>>>>>> >>>>>>>
>>>>>> >>>>>>> ------------------------------------------------------------------------------
>>>>>> >>>>>>> There are no active volume tasks
>>>>>> >>>>>>>
>>>>>> >>>>>>> How to resolve this issue?