Hi Artem,
make sure the IDs are different, change them manually if you must!
That is all you need to do to get the agent up I think. The symlink
issue is probably related to another change we did (it happens when a
new hosted engine node is deployed by the engine) and a simple broker
restart should fix it too.
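Roughly what I mean, as a sketch (standard hosted-engine paths; the host names are the ones from this thread, adjust as needed):

```shell
# On each host, check the configured host ID -- the two hosts must differ:
grep host_id /etc/ovirt-hosted-engine/hosted-engine.conf
# e.g. expect host_id=1 on ovirt1 and host_id=2 on ovirt2;
# edit the file manually if both hosts report the same ID.

# Then restart the HA services so the new ID (and the symlink) are picked up:
systemctl restart ovirt-ha-broker ovirt-ha-agent
```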
Best regards
Martin Sivak
On Mon, Jan 22, 2018 at 8:03 AM, Artem Tambovskiy
<artem.tambovskiy(a)gmail.com> wrote:
Hello Kasturi,
Yes, I set global maintenance mode intentionally.
I ran out of ideas while troubleshooting my cluster and decided to undeploy
the hosted engine from the second host, clean the installation, and add it
to the cluster again.
I also cleaned the metadata with hosted-engine --clean-metadata --host-id=2
--force-clean. But once I added the second host to the cluster again, it
doesn't show the capability to run the hosted engine, and doesn't even
appear in the output of hosted-engine --vm-status.
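For reference, the full sequence I used was roughly this (commands as discussed earlier in this thread; host ID 2 is the slot of the removed host):

```shell
# The cluster was already in global maintenance:
hosted-engine --set-maintenance --mode=global

# Drop the stale metadata slot of the removed host:
hosted-engine --clean-metadata --host-id=2 --force-clean

# After redeploying the host, check whether it reports its own section:
hosted-engine --vm-status
```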
[root@ovirt1 ~]# hosted-engine --vm-status

--== Host 1 status ==--

conf_on_shared_storage : True
Status up-to-date : True
Hostname : ovirt1.telia.ru
Host ID : 1
Engine status : {"health": "good", "vm": "up", "detail": "up"}
Score : 3400
stopped : False
Local maintenance : False
crc32 : a23c7cbd
local_conf_timestamp : 848931
Host timestamp : 848930
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=848930 (Mon Jan 22 09:53:29 2018)
host-id=1
score=3400
vm_conf_refresh_time=848931 (Mon Jan 22 09:53:29 2018)
conf_on_shared_storage=True
maintenance=False
state=GlobalMaintenance
stopped=False
On the redeployed second host I see unknown stale-data again, and the
second host doesn't show up as hosted-engine capable.
[root@ovirt2 ~]# hosted-engine --vm-status
--== Host 1 status ==--
conf_on_shared_storage : True
Status up-to-date : False
Hostname : ovirt1.telia.ru
Host ID : 1
Engine status : unknown stale-data
Score : 0
stopped : False
Local maintenance : False
crc32 : 18765f68
local_conf_timestamp : 848951
Host timestamp : 848951
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=848951 (Mon Jan 22 09:53:49 2018)
host-id=1
score=0
vm_conf_refresh_time=848951 (Mon Jan 22 09:53:50 2018)
conf_on_shared_storage=True
maintenance=False
state=ReinitializeFSM
stopped=False
Really strange situation ...
Regards,
Artem
On Mon, Jan 22, 2018 at 9:46 AM, Kasturi Narra <knarra(a)redhat.com> wrote:
>
> Hello Artem,
>
> Any reason why you chose the hosted-engine undeploy action for the second
> host? I see that the cluster is in global maintenance mode; was this
> intended?
>
> The command to clear the entries from hosted-engine --vm-status is
> "hosted-engine --clean-metadata --host-id=<old_host_id> --force-clean"
>
> Hope this helps!
>
> Thanks
> kasturi
>
>
> On Fri, Jan 19, 2018 at 12:07 AM, Artem Tambovskiy
> <artem.tambovskiy(a)gmail.com> wrote:
>>
>> Hi,
>>
>> Ok, I decided to remove the second host from the cluster.
>> I reinstalled it from the web UI with the hosted-engine action UNDEPLOY,
>> and removed it from the cluster afterwards.
>> All VMs are fine and the hosted engine is running OK,
>> but hosted-engine --vm-status is still showing 2 hosts.
>>
>> How can I clean up the traces of the second host in a correct way?
>>
>>
>> --== Host 1 status ==--
>>
>> conf_on_shared_storage : True
>> Status up-to-date : True
>> Hostname : ovirt1.telia.ru
>> Host ID : 1
>> Engine status : {"health": "good", "vm": "up", "detail": "up"}
>> Score : 3400
>> stopped : False
>> Local maintenance : False
>> crc32 : 1b1b6f6d
>> local_conf_timestamp : 545385
>> Host timestamp : 545385
>> Extra metadata (valid at timestamp):
>> metadata_parse_version=1
>> metadata_feature_version=1
>> timestamp=545385 (Thu Jan 18 21:34:25 2018)
>> host-id=1
>> score=3400
>> vm_conf_refresh_time=545385 (Thu Jan 18 21:34:25 2018)
>> conf_on_shared_storage=True
>> maintenance=False
>> state=GlobalMaintenance
>> stopped=False
>>
>>
>> --== Host 2 status ==--
>>
>> conf_on_shared_storage : True
>> Status up-to-date : False
>> Hostname : ovirt1.telia.ru
>> Host ID : 2
>> Engine status : unknown stale-data
>> Score : 0
>> stopped : True
>> Local maintenance : False
>> crc32 : c7037c03
>> local_conf_timestamp : 7530
>> Host timestamp : 7530
>> Extra metadata (valid at timestamp):
>> metadata_parse_version=1
>> metadata_feature_version=1
>> timestamp=7530 (Fri Jan 12 16:10:12 2018)
>> host-id=2
>> score=0
>> vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)
>> conf_on_shared_storage=True
>> maintenance=False
>> state=AgentStopped
>> stopped=True
>>
>>
>> !! Cluster is in GLOBAL MAINTENANCE mode !!
>>
>> Thank you in advance!
>> Regards,
>> Artem
>>
>>
>> On Wed, Jan 17, 2018 at 6:47 PM, Artem Tambovskiy
>> <artem.tambovskiy(a)gmail.com> wrote:
>>>
>>> Hello,
>>>
>>> Any further suggestions on how to fix the issue and make the HA setup
>>> work? Could a complete removal of the second host from the cluster (with
>>> complete removal of the oVirt configuration files and packages) and
>>> adding it again solve the issue? Or might it completely ruin the cluster?
>>>
>>> Regards,
>>> Artem
>>>
>>> On Jan 16, 2018 at 17:00, "Artem Tambovskiy"
>>> <artem.tambovskiy(a)gmail.com> wrote:
>>>
>>>> Hi Martin,
>>>>
>>>> Thanks for feedback.
>>>>
>>>> All hosts and the hosted engine are running the 4.1.8 release.
>>>> The strange thing: I can see that the host ID is set to 1 on both hosts
>>>> in the /etc/ovirt-hosted-engine/hosted-engine.conf file.
>>>> I have no idea how this happened; the only thing I have changed recently
>>>> is the mnt_options, in order to add backup-volfile-servers by using the
>>>> hosted-engine --set-shared-config command.
>>>>
>>>> Both the agent and the broker are running on the second host:
>>>>
>>>> [root@ovirt2 ovirt-hosted-engine-ha]# ps -ef | grep ovirt-ha-
>>>> vdsm 42331 1 26 14:40 ? 00:31:35 /usr/bin/python
>>>> /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker --no-daemon
>>>> vdsm 42332 1 0 14:40 ? 00:00:16 /usr/bin/python
>>>> /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent --no-daemon
>>>>
>>>> But I saw some tracebacks during the broker start:
>>>>
>>>> [root@ovirt2 ovirt-hosted-engine-ha]# systemctl status ovirt-ha-broker
>>>> -l
>>>> ● ovirt-ha-broker.service - oVirt Hosted Engine High Availability
>>>> Communications Broker
>>>> Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service;
>>>> enabled; vendor preset: disabled)
>>>> Active: active (running) since Tue 2018-01-16 14:40:15 MSK; 1h 58min
>>>> ago
>>>> Main PID: 42331 (ovirt-ha-broker)
>>>> CGroup: /system.slice/ovirt-ha-broker.service
>>>> └─42331 /usr/bin/python
>>>> /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker --no-daemon
>>>>
>>>> Jan 16 14:40:15 ovirt2.telia.ru systemd[1]: Started oVirt Hosted Engine
>>>> High Availability Communications Broker.
>>>> Jan 16 14:40:15 ovirt2.telia.ru systemd[1]: Starting oVirt Hosted
>>>> Engine High Availability Communications Broker...
>>>> Jan 16 14:40:16 ovirt2.telia.ru ovirt-ha-broker[42331]: ovirt-ha-broker
>>>> ovirt_hosted_engine_ha.broker.listener.ConnectionHandler ERROR Error
>>>> handling request, data: 'set-storage-domain FilesystemBackend
>>>> dom_type=glusterfs sd_uuid=4a7f8717-9bb0-4d80-8016-498fa4b88162'
>>>> Traceback (most recent call last):
>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 166, in handle
>>>>     data)
>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 299, in _dispatch
>>>>     .set_storage_domain(client, sd_type, **options)
>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 66, in set_storage_domain
>>>>     self._backends[client].connect()
>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 462, in connect
>>>>     self._dom_type)
>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 107, in get_domain_path
>>>>     " in {1}".format(sd_uuid, parent))
>>>> BackendFailureException: path to storage domain
>>>> 4a7f8717-9bb0-4d80-8016-498fa4b88162 not found in
>>>> /rhev/data-center/mnt/glusterSD
>>>>
>>>>
>>>>
>>>> I have tried to issue hosted-engine --connect-storage on the second
>>>> host, followed by an agent & broker restart,
>>>> but there is no visible improvement.
>>>>
>>>> Regards,
>>>> Artem
>>>>
>>>>
>>>> On Tue, Jan 16, 2018 at 4:18 PM, Martin Sivak <msivak(a)redhat.com>
>>>> wrote:
>>>>>
>>>>> Hi everybody,
>>>>>
>>>>> there are couple of things to check here.
>>>>>
>>>>> - what version of the hosted engine agent is this? The logs look like
>>>>> they are coming from 4.1
>>>>> - what version of the engine is used?
>>>>> - check the host ID in /etc/ovirt-hosted-engine/hosted-engine.conf on
>>>>> both hosts, the numbers must be different
>>>>> - it looks like the agent or broker on host 2 is not active (or there
>>>>> would be a report)
>>>>> - the second host does not see data from the first host (unknown
>>>>> stale-data), wait for a minute and check again, then check the storage
>>>>> connection
>>>>>
>>>>> And then the general troubleshooting:
>>>>>
>>>>> - put the hosted engine in global maintenance mode (and check that it
>>>>> is visible from the other host using he --vm-status)
>>>>> - mount the storage domain (hosted-engine --connect-storage)
>>>>> - check sanlock client status to see if the proper lockspaces are
>>>>> present
>>>>>
>>>>> Best regards
>>>>>
>>>>> Martin Sivak
>>>>>
>>>>> > On Tue, Jan 16, 2018 at 1:16 PM, Derek Atkins <derek(a)ihtfp.com> wrote:
>>>>> > Why are both hosts reporting as ovirt1?
>>>>> > Look at the hostname fields to see what I mean.
>>>>> >
>>>>> > -derek
>>>>> > Sent using my mobile device. Please excuse any typos.
>>>>> >
>>>>> > On January 16, 2018 7:11:09 AM Artem Tambovskiy
>>>>> > <artem.tambovskiy(a)gmail.com>
>>>>> > wrote:
>>>>> >>
>>>>> >> Hello,
>>>>> >>
>>>>> >> Yes, I followed exactly the same procedure while reinstalling the
>>>>> >> hosts (the only difference is that I have an SSH key configured
>>>>> >> instead of the password).
>>>>> >>
>>>>> >> Just reinstalled the second host one more time; after 20 min the
>>>>> >> host still hasn't reached the active score of 3400 (Hosted Engine
>>>>> >> HA: Not Active), and I still don't see the crown icon for this host.
>>>>> >>
>>>>> >> hosted-engine --vm-status from ovirt1 host
>>>>> >>
>>>>> >> [root@ovirt1 ~]# hosted-engine --vm-status
>>>>> >>
>>>>> >>
>>>>> >> --== Host 1 status ==--
>>>>> >>
>>>>> >> conf_on_shared_storage : True
>>>>> >> Status up-to-date : True
>>>>> >> Hostname : ovirt1.telia.ru
>>>>> >> Host ID : 1
>>>>> >> Engine status : {"health": "good", "vm": "up", "detail": "up"}
>>>>> >> Score : 3400
>>>>> >> stopped : False
>>>>> >> Local maintenance : False
>>>>> >> crc32 : 3f94156a
>>>>> >> local_conf_timestamp : 349144
>>>>> >> Host timestamp : 349144
>>>>> >> Extra metadata (valid at timestamp):
>>>>> >> metadata_parse_version=1
>>>>> >> metadata_feature_version=1
>>>>> >> timestamp=349144 (Tue Jan 16 15:03:45 2018)
>>>>> >> host-id=1
>>>>> >> score=3400
>>>>> >> vm_conf_refresh_time=349144 (Tue Jan 16 15:03:45 2018)
>>>>> >> conf_on_shared_storage=True
>>>>> >> maintenance=False
>>>>> >> state=EngineUp
>>>>> >> stopped=False
>>>>> >>
>>>>> >>
>>>>> >> --== Host 2 status ==--
>>>>> >>
>>>>> >> conf_on_shared_storage : True
>>>>> >> Status up-to-date : False
>>>>> >> Hostname : ovirt1.telia.ru
>>>>> >> Host ID : 2
>>>>> >> Engine status : unknown stale-data
>>>>> >> Score : 0
>>>>> >> stopped : True
>>>>> >> Local maintenance : False
>>>>> >> crc32 : c7037c03
>>>>> >> local_conf_timestamp : 7530
>>>>> >> Host timestamp : 7530
>>>>> >> Extra metadata (valid at timestamp):
>>>>> >> metadata_parse_version=1
>>>>> >> metadata_feature_version=1
>>>>> >> timestamp=7530 (Fri Jan 12 16:10:12 2018)
>>>>> >> host-id=2
>>>>> >> score=0
>>>>> >> vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)
>>>>> >> conf_on_shared_storage=True
>>>>> >> maintenance=False
>>>>> >> state=AgentStopped
>>>>> >> stopped=True
>>>>> >>
>>>>> >>
>>>>> >> hosted-engine --vm-status output from ovirt2 host
>>>>> >>
>>>>> >> [root@ovirt2 ovirt-hosted-engine-ha]# hosted-engine --vm-status
>>>>> >>
>>>>> >>
>>>>> >> --== Host 1 status ==--
>>>>> >>
>>>>> >> conf_on_shared_storage : True
>>>>> >> Status up-to-date : False
>>>>> >> Hostname : ovirt1.telia.ru
>>>>> >> Host ID : 1
>>>>> >> Engine status : unknown stale-data
>>>>> >> Score : 3400
>>>>> >> stopped : False
>>>>> >> Local maintenance : False
>>>>> >> crc32 : 6d3606f1
>>>>> >> local_conf_timestamp : 349264
>>>>> >> Host timestamp : 349264
>>>>> >> Extra metadata (valid at timestamp):
>>>>> >> metadata_parse_version=1
>>>>> >> metadata_feature_version=1
>>>>> >> timestamp=349264 (Tue Jan 16 15:05:45 2018)
>>>>> >> host-id=1
>>>>> >> score=3400
>>>>> >> vm_conf_refresh_time=349264 (Tue Jan 16 15:05:45 2018)
>>>>> >> conf_on_shared_storage=True
>>>>> >> maintenance=False
>>>>> >> state=EngineUp
>>>>> >> stopped=False
>>>>> >>
>>>>> >>
>>>>> >> --== Host 2 status ==--
>>>>> >>
>>>>> >> conf_on_shared_storage : True
>>>>> >> Status up-to-date : False
>>>>> >> Hostname : ovirt1.telia.ru
>>>>> >> Host ID : 2
>>>>> >> Engine status : unknown stale-data
>>>>> >> Score : 0
>>>>> >> stopped : True
>>>>> >> Local maintenance : False
>>>>> >> crc32 : c7037c03
>>>>> >> local_conf_timestamp : 7530
>>>>> >> Host timestamp : 7530
>>>>> >> Extra metadata (valid at timestamp):
>>>>> >> metadata_parse_version=1
>>>>> >> metadata_feature_version=1
>>>>> >> timestamp=7530 (Fri Jan 12 16:10:12 2018)
>>>>> >> host-id=2
>>>>> >> score=0
>>>>> >> vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)
>>>>> >> conf_on_shared_storage=True
>>>>> >> maintenance=False
>>>>> >> state=AgentStopped
>>>>> >> stopped=True
>>>>> >>
>>>>> >>
>>>>> >> Also I saw some log messages in the web GUI about time drift, like
>>>>> >> "Host ovirt2.telia.ru has time-drift of 5305 seconds while maximum
>>>>> >> configured value is 300 seconds.", which is a bit weird, as I
>>>>> >> haven't touched any time settings since I installed the cluster.
>>>>> >> Both hosts have the same time and timezone (MSK), but the hosted
>>>>> >> engine lives in the UTC timezone. Is it mandatory to have everything
>>>>> >> in sync and in the same timezone?
>>>>> >>
>>>>> >> Regards,
>>>>> >> Artem
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> On Tue, Jan 16, 2018 at 2:20 PM, Kasturi Narra <knarra(a)redhat.com>
>>>>> >> wrote:
>>>>> >>>
>>>>> >>> Hello,
>>>>> >>>
>>>>> >>> I now see that your hosted engine is up and running. Can you
>>>>> >>> let me know how you tried reinstalling the host? Below is the
>>>>> >>> procedure that is used; I hope you did not miss any step while
>>>>> >>> reinstalling. If not, can you try reinstalling again and see if
>>>>> >>> that works?
>>>>> >>>
>>>>> >>> 1) Move the host to maintenance
>>>>> >>> 2) click on reinstall
>>>>> >>> 3) provide the password
>>>>> >>> 4) uncheck 'automatically configure host firewall'
>>>>> >>> 5) click on 'Deploy' tab
>>>>> >>> 6) click Hosted Engine deployment as 'Deploy'
>>>>> >>>
>>>>> >>> And once the host installation is done, wait till the active score
>>>>> >>> of the host shows 3400 in the general tab, then check
>>>>> >>> hosted-engine --vm-status.
>>>>> >>>
>>>>> >>> Thanks
>>>>> >>> kasturi
>>>>> >>>
>>>>> >>> On Mon, Jan 15, 2018 at 4:57 PM, Artem Tambovskiy
>>>>> >>> <artem.tambovskiy(a)gmail.com> wrote:
>>>>> >>>>
>>>>> >>>> Hello,
>>>>> >>>>
>>>>> >>>> I have uploaded 2 archives with all the relevant logs to shared
>>>>> >>>> hosting:
>>>>> >>>> files from host 1 (which is currently running all VMs including
>>>>> >>>> hosted_engine) - https://yadi.sk/d/PttRoYV63RTvhK
>>>>> >>>> files from the second host - https://yadi.sk/d/UBducEsV3RTvhc
>>>>> >>>>
>>>>> >>>> I have tried to restart both ovirt-ha-agent and ovirt-ha-broker,
>>>>> >>>> but it has no effect. I have also tried to shut down the
>>>>> >>>> hosted_engine VM, stop the ovirt-ha-agent and ovirt-ha-broker
>>>>> >>>> services, disconnect the storage and connect it again - no effect
>>>>> >>>> as well.
>>>>> >>>> Also I tried to reinstall the second host from the web GUI - this
>>>>> >>>> led to an interesting situation - now hosted-engine --vm-status
>>>>> >>>> shows that both hosts have the same address.
>>>>> >>>>
>>>>> >>>> [root@ovirt1 ~]# hosted-engine --vm-status
>>>>> >>>>
>>>>> >>>> --== Host 1 status ==--
>>>>> >>>>
>>>>> >>>> conf_on_shared_storage : True
>>>>> >>>> Status up-to-date : True
>>>>> >>>> Hostname : ovirt1.telia.ru
>>>>> >>>> Host ID : 1
>>>>> >>>> Engine status : {"health": "good", "vm": "up", "detail": "up"}
>>>>> >>>> Score : 3400
>>>>> >>>> stopped : False
>>>>> >>>> Local maintenance : False
>>>>> >>>> crc32 : a7758085
>>>>> >>>> local_conf_timestamp : 259327
>>>>> >>>> Host timestamp : 259327
>>>>> >>>> Extra metadata (valid at timestamp):
>>>>> >>>> metadata_parse_version=1
>>>>> >>>> metadata_feature_version=1
>>>>> >>>> timestamp=259327 (Mon Jan 15 14:06:48 2018)
>>>>> >>>> host-id=1
>>>>> >>>> score=3400
>>>>> >>>> vm_conf_refresh_time=259327 (Mon Jan 15 14:06:48 2018)
>>>>> >>>> conf_on_shared_storage=True
>>>>> >>>> maintenance=False
>>>>> >>>> state=EngineUp
>>>>> >>>> stopped=False
>>>>> >>>>
>>>>> >>>>
>>>>> >>>> --== Host 2 status ==--
>>>>> >>>>
>>>>> >>>> conf_on_shared_storage : True
>>>>> >>>> Status up-to-date : False
>>>>> >>>> Hostname : ovirt1.telia.ru
>>>>> >>>> Host ID : 2
>>>>> >>>> Engine status : unknown stale-data
>>>>> >>>> Score : 0
>>>>> >>>> stopped : True
>>>>> >>>> Local maintenance : False
>>>>> >>>> crc32 : c7037c03
>>>>> >>>> local_conf_timestamp : 7530
>>>>> >>>> Host timestamp : 7530
>>>>> >>>> Extra metadata (valid at timestamp):
>>>>> >>>> metadata_parse_version=1
>>>>> >>>> metadata_feature_version=1
>>>>> >>>> timestamp=7530 (Fri Jan 12 16:10:12 2018)
>>>>> >>>> host-id=2
>>>>> >>>> score=0
>>>>> >>>> vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)
>>>>> >>>> conf_on_shared_storage=True
>>>>> >>>> maintenance=False
>>>>> >>>> state=AgentStopped
>>>>> >>>> stopped=True
>>>>> >>>>
>>>>> >>>> Gluster seems to be working fine; all gluster nodes show the
>>>>> >>>> connected state.
>>>>> >>>>
>>>>> >>>> Any advice on how to resolve this situation is highly
>>>>> >>>> appreciated!
>>>>> >>>>
>>>>> >>>> Regards,
>>>>> >>>> Artem
>>>>> >>>>
>>>>> >>>>
>>>>> >>>> On Mon, Jan 15, 2018 at 11:45 AM, Kasturi Narra
>>>>> >>>> <knarra(a)redhat.com>
>>>>> >>>> wrote:
>>>>> >>>>>
>>>>> >>>>> Hello Artem,
>>>>> >>>>>
>>>>> >>>>> Can you check if the glusterd service is running on host1
>>>>> >>>>> and all the peers are in the connected state? If yes, can you
>>>>> >>>>> restart the ovirt-ha-agent and broker services and check if
>>>>> >>>>> things are working fine?
>>>>> >>>>>
>>>>> >>>>> Thanks
>>>>> >>>>> kasturi
>>>>> >>>>>
>>>>> >>>>> On Sat, Jan 13, 2018 at 12:33 AM, Artem Tambovskiy
>>>>> >>>>> <artem.tambovskiy(a)gmail.com> wrote:
>>>>> >>>>>>
>>>>> >>>>>> I explored the logs on both hosts.
>>>>> >>>>>> broker.log shows no errors.
>>>>> >>>>>>
>>>>> >>>>>> agent.log is not looking good:
>>>>> >>>>>>
>>>>> >>>>>> on host1 (which is running the hosted engine):
>>>>> >>>>>>
>>>>> >>>>>> MainThread::ERROR::2018-01-12 21:51:03,883::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
>>>>> >>>>>> Traceback (most recent call last):
>>>>> >>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 191, in _run_agent
>>>>> >>>>>>     return action(he)
>>>>> >>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 64, in action_proper
>>>>> >>>>>>     return he.start_monitoring()
>>>>> >>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 411, in start_monitoring
>>>>> >>>>>>     self._initialize_sanlock()
>>>>> >>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 749, in _initialize_sanlock
>>>>> >>>>>>     "Failed to initialize sanlock, the number of errors has"
>>>>> >>>>>> SanlockInitializationError: Failed to initialize sanlock, the
>>>>> >>>>>> number of errors has exceeded the limit
>>>>> >>>>>>
>>>>> >>>>>> MainThread::ERROR::2018-01-12 21:51:03,884::agent::206::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
>>>>> >>>>>> Trying to restart agent
>>>>> >>>>>> MainThread::WARNING::2018-01-12 21:51:08,889::agent::209::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
>>>>> >>>>>> Restarting agent, attempt '1'
>>>>> >>>>>> MainThread::INFO::2018-01-12 21:51:08,919::hosted_engine::242::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname)
>>>>> >>>>>> Found certificate common name: ovirt1.telia.ru
>>>>> >>>>>> MainThread::INFO::2018-01-12 21:51:08,921::hosted_engine::604::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm)
>>>>> >>>>>> Initializing VDSM
>>>>> >>>>>> MainThread::INFO::2018-01-12 21:51:11,398::hosted_engine::630::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images)
>>>>> >>>>>> Connecting the storage
>>>>> >>>>>> MainThread::INFO::2018-01-12 21:51:11,399::storage_server::220::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(validate_storage_server)
>>>>> >>>>>> Validating storage server
>>>>> >>>>>> MainThread::INFO::2018-01-12 21:51:13,725::storage_server::239::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
>>>>> >>>>>> Connecting storage server
>>>>> >>>>>> MainThread::INFO::2018-01-12 21:51:18,390::storage_server::246::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
>>>>> >>>>>> Connecting storage server
>>>>> >>>>>> MainThread::INFO::2018-01-12 21:51:18,423::storage_server::253::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
>>>>> >>>>>> Refreshing the storage domain
>>>>> >>>>>> MainThread::INFO::2018-01-12 21:51:18,689::hosted_engine::663::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images)
>>>>> >>>>>> Preparing images
>>>>> >>>>>> MainThread::INFO::2018-01-12 21:51:18,690::image::126::ovirt_hosted_engine_ha.lib.image.Image::(prepare_images)
>>>>> >>>>>> Preparing images
>>>>> >>>>>> MainThread::INFO::2018-01-12 21:51:21,895::hosted_engine::666::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images)
>>>>> >>>>>> Refreshing vm.conf
>>>>> >>>>>> MainThread::INFO::2018-01-12 21:51:21,895::config::493::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_vm_conf)
>>>>> >>>>>> Reloading vm.conf from the shared storage domain
>>>>> >>>>>> MainThread::INFO::2018-01-12 21:51:21,896::config::416::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
>>>>> >>>>>> Trying to get a fresher copy of vm configuration from the OVF_STORE
>>>>> >>>>>> MainThread::INFO::2018-01-12 21:51:21,896::ovf_store::132::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
>>>>> >>>>>> Extracting Engine VM OVF from the OVF_STORE
>>>>> >>>>>> MainThread::INFO::2018-01-12 21:51:21,897::ovf_store::134::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
>>>>> >>>>>> OVF_STORE volume path: /var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-498fa4b88162/5cabd8e1-5f4b-469e-becc-227469e03f5c/8048cbd7-77e2-4805-9af4-d109fa36dfcf
>>>>> >>>>>> MainThread::INFO::2018-01-12 21:51:21,915::config::435::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
>>>>> >>>>>> Found an OVF for HE VM, trying to convert
>>>>> >>>>>> MainThread::INFO::2018-01-12 21:51:21,918::config::440::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
>>>>> >>>>>> Got vm.conf from OVF_STORE
>>>>> >>>>>> MainThread::INFO::2018-01-12 21:51:21,919::hosted_engine::509::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker)
>>>>> >>>>>> Initializing ha-broker connection
>>>>> >>>>>> MainThread::INFO::2018-01-12 21:51:21,919::brokerlink::130::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
>>>>> >>>>>> Starting monitor ping, options {'addr': '80.239.162.97'}
>>>>> >>>>>> MainThread::INFO::2018-01-12 21:51:21,922::brokerlink::141::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
>>>>> >>>>>> Success, id 140547104457680
>>>>> >>>>>> MainThread::INFO::2018-01-12 21:51:21,922::brokerlink::130::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
>>>>> >>>>>> Starting monitor mgmt-bridge, options {'use_ssl': 'true', 'bridge_name': 'ovirtmgmt', 'address': '0'}
>>>>> >>>>>> MainThread::INFO::2018-01-12 21:51:21,936::brokerlink::141::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
>>>>> >>>>>> Success, id 140547104458064
>>>>> >>>>>> MainThread::INFO::2018-01-12 21:51:21,936::brokerlink::130::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
>>>>> >>>>>> Starting monitor mem-free, options {'use_ssl': 'true', 'address': '0'}
>>>>> >>>>>> MainThread::INFO::2018-01-12 21:51:21,938::brokerlink::141::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
>>>>> >>>>>> Success, id 140547104458448
>>>>> >>>>>> MainThread::INFO::2018-01-12 21:51:21,939::brokerlink::130::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
>>>>> >>>>>> Starting monitor cpu-load-no-engine, options {'use_ssl': 'true', 'vm_uuid': 'b366e466-b0ea-4a09-866b-d0248d7523a6', 'address': '0'}
>>>>> >>>>>> MainThread::INFO::2018-01-12 21:51:21,940::brokerlink::141::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
>>>>> >>>>>> Success, id 140547104457552
>>>>> >>>>>> MainThread::INFO::2018-01-12 21:51:21,941::brokerlink::130::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
>>>>> >>>>>> Starting monitor engine-health, options {'use_ssl': 'true', 'vm_uuid': 'b366e466-b0ea-4a09-866b-d0248d7523a6', 'address': '0'}
>>>>> >>>>>> MainThread::INFO::2018-01-12 21:51:21,942::brokerlink::141::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
>>>>> >>>>>> Success, id 140547104459792
>>>>> >>>>>> MainThread::INFO::2018-01-12 21:51:26,951::brokerlink::179::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(set_storage_domain)
>>>>> >>>>>> Success, id 140546772847056
>>>>> >>>>>> MainThread::INFO::2018-01-12 21:51:26,952::hosted_engine::601::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker)
>>>>> >>>>>> Broker initialized, all submonitors started
>>>>> >>>>>> MainThread::INFO::2018-01-12 21:51:27,049::hosted_engine::704::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock)
>>>>> >>>>>> Ensuring lease for lockspace hosted-engine, host id 1 is acquired (file: /var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-498fa4b88162/093faa75-5e33-4559-84fa-1f1f8d48153b/911c7637-b49d-463e-b186-23b404e50769)
>>>>> >>>>>> MainThread::INFO::2018-01-12 21:53:48,067::hosted_engine::745::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock)
>>>>> >>>>>> Failed to acquire the lock. Waiting '5's before the next attempt
>>>>> >>>>>> MainThread::INFO::2018-01-12 21:56:14,088::hosted_engine::745::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock)
>>>>> >>>>>> Failed to acquire the lock. Waiting '5's before the next attempt
>>>>> >>>>>> MainThread::INFO::2018-01-12 21:58:40,111::hosted_engine::745::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock)
>>>>> >>>>>> Failed to acquire the lock. Waiting '5's before the next attempt
>>>>> >>>>>> MainThread::INFO::2018-01-12 22:01:06,133::hosted_engine::745::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock)
>>>>> >>>>>> Failed to acquire the lock. Waiting '5's before the next attempt
>>>>> >>>>>>
>>>>> >>>>>>
>>>>> >>>>>> agent.log from the second host:
>>>>> >>>>>>
>>>>> >>>>>> MainThread::INFO::2018-01-12 22:01:37,241::hosted_engine::630::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images)
>>>>> >>>>>> Connecting the storage
>>>>> >>>>>> MainThread::INFO::2018-01-12 22:01:37,242::storage_server::220::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(validate_storage_server)
>>>>> >>>>>> Validating storage server
>>>>> >>>>>> MainThread::INFO::2018-01-12 22:01:39,540::hosted_engine::639::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images)
>>>>> >>>>>> Storage domain reported as valid and reconnect is not forced.
>>>>> >>>>>> MainThread::INFO::2018-01-12 22:01:41,939::hosted_engine::453::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>>>>> >>>>>> Current state EngineUnexpectedlyDown (score: 0)
>>>>> >>>>>> MainThread::INFO::2018-01-12 22:01:52,150::config::493::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_vm_conf)
>>>>> >>>>>> Reloading vm.conf from the shared storage domain
>>>>> >>>>>> MainThread::INFO::2018-01-12 22:01:52,150::config::416::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
>>>>> >>>>>> Trying to get a fresher copy of vm configuration from the OVF_STORE
>>>>> >>>>>> MainThread::INFO::2018-01-12 22:01:52,151::ovf_store::132::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
>>>>> >>>>>> Extracting Engine VM OVF from the OVF_STORE
>>>>> >>>>>> MainThread::INFO::2018-01-12 22:01:52,153::ovf_store::134::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
>>>>> >>>>>> OVF_STORE volume path: /var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-498fa4b88162/5cabd8e1-5f4b-469e-becc-227469e03f5c/8048cbd7-77e2-4805-9af4-d109fa36dfcf
>>>>> >>>>>> MainThread::INFO::2018-01-12 22:01:52,174::config::435::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
>>>>> >>>>>> Found an OVF for HE VM, trying to convert
>>>>> >>>>>> MainThread::INFO::2018-01-12 22:01:52,179::config::440::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
>>>>> >>>>>> Got vm.conf from OVF_STORE
>>>>> >>>>>> MainThread::INFO::2018-01-12 22:01:52,189::hosted_engine::604::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm)
>>>>> >>>>>> Initializing VDSM
>>>>> >>>>>> MainThread::INFO::2018-01-12 22:01:54,586::hosted_engine::630::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images)
>>>>> >>>>>> Connecting the storage
>>>>> >>>>>> MainThread::INFO::2018-01-12 22:01:54,587::storage_server::220::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(validate_storage_server)
>>>>> >>>>>> Validating storage server
>>>>> >>>>>> MainThread::INFO::2018-01-12 22:01:56,903::hosted_engine::639::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images)
>>>>> >>>>>> Storage domain reported as valid and reconnect is not forced.
>>>>> >>>>>> MainThread::INFO::2018-01-12 22:01:59,299::states::682::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score)
>>>>> >>>>>> Score is 0 due to unexpected vm shutdown at Fri Jan 12 21:57:48 2018
>>>>> >>>>>> MainThread::INFO::2018-01-12 22:01:59,299::hosted_engine::453::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>>>>> >>>>>> Current state EngineUnexpectedlyDown (score: 0)
>>>>> >>>>>> MainThread::INFO::2018-01-12 22:02:09,659::config::493::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_vm_conf)
>>>>> >>>>>> Reloading vm.conf from the shared storage domain
>>>>> >>>>>> MainThread::INFO::2018-01-12 22:02:09,659::config::416::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
>>>>> >>>>>> Trying to get a fresher copy of vm configuration from the OVF_STORE
>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>> >>>>>>
>>>>> >>>>>>
22:02:09,660::ovf_store::132::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
>>>>> >>>>>> Extracting Engine VM OVF from the OVF_STORE
>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>> >>>>>>
>>>>> >>>>>>
22:02:09,663::ovf_store::134::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
>>>>> >>>>>> OVF_STORE volume path:
>>>>> >>>>>>
>>>>> >>>>>>
/var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-498fa4b88162/5cabd8e1-5f4b-469e-becc-227469e03f5c/8048cbd7-77e2-4805-9af4-d109fa36dfcf
>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>> >>>>>>
>>>>> >>>>>>
22:02:09,683::config::435::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
>>>>> >>>>>> Found an OVF for HE VM, trying to convert
>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>> >>>>>>
>>>>> >>>>>>
22:02:09,688::config::440::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
>>>>> >>>>>> Got vm.conf from OVF_STORE
>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>> >>>>>>
>>>>> >>>>>>
22:02:09,698::hosted_engine::604::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm)
>>>>> >>>>>> Initializing VDSM
>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>> >>>>>>
>>>>> >>>>>>
22:02:12,112::hosted_engine::630::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images)
>>>>> >>>>>> Connecting the storage
>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>> >>>>>>
>>>>> >>>>>>
22:02:12,113::storage_server::220::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(validate_storage_server)
>>>>> >>>>>> Validating storage server
>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>> >>>>>>
>>>>> >>>>>>
22:02:14,444::hosted_engine::639::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images)
>>>>> >>>>>> Storage domain reported as valid and
reconnect is not forced.
>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>> >>>>>>
>>>>> >>>>>>
22:02:16,859::states::682::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score)
>>>>> >>>>>> Score is 0 due to unexpected vm shutdown at
Fri Jan 12 21:57:47
>>>>> >>>>>> 2018
>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>> >>>>>>
>>>>> >>>>>>
22:02:16,859::hosted_engine::453::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>>>>> >>>>>> Current state EngineUnexpectedlyDown (score:
0)
>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>> >>>>>>
>>>>> >>>>>>
22:02:27,100::config::493::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_vm_conf)
>>>>> >>>>>> Reloading vm.conf from the shared storage
domain
>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>> >>>>>>
>>>>> >>>>>>
22:02:27,100::config::416::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
>>>>> >>>>>> Trying to get a fresher copy of vm
configuration from the
>>>>> >>>>>> OVF_STORE
>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>> >>>>>>
>>>>> >>>>>>
22:02:27,101::ovf_store::132::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
>>>>> >>>>>> Extracting Engine VM OVF from the OVF_STORE
>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>> >>>>>>
>>>>> >>>>>>
22:02:27,103::ovf_store::134::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
>>>>> >>>>>> OVF_STORE volume path:
>>>>> >>>>>>
>>>>> >>>>>>
/var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-498fa4b88162/5cabd8e1-5f4b-469e-becc-227469e03f5c/8048cbd7-77e2-4805-9af4-d109fa36dfcf
>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>> >>>>>>
>>>>> >>>>>>
22:02:27,125::config::435::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
>>>>> >>>>>> Found an OVF for HE VM, trying to convert
>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>> >>>>>>
>>>>> >>>>>>
22:02:27,129::config::440::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
>>>>> >>>>>> Got vm.conf from OVF_STORE
>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>> >>>>>>
>>>>> >>>>>>
22:02:27,130::states::667::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume)
>>>>> >>>>>> Engine down, local host does not have best
score
>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>> >>>>>>
>>>>> >>>>>>
22:02:27,139::hosted_engine::604::ovirt_hosted_engine_ha.agent.hosted_engineHostedEngine::(_initialize_vdsm)
>>>>> >>>>>> Initializing VDSM
>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>> >>>>>>
>>>>> >>>>>>
22:02:29,584::hosted_engine::630::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images)
>>>>> >>>>>> Connecting the storage
>>>>> >>>>>> MainThread::INFO::2018-01-12
>>>>> >>>>>>
>>>>> >>>>>>
22:02:29,586::storage_server::220::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(validate_storage_server)
>>>>> >>>>>> Validating storage server
>>>>> >>>>>>
>>>>> >>>>>>
>>>>> >>>>>> Any suggestions on how to resolve this?
>>>>> >>>>>>
>>>>> >>>>>> regards,
>>>>> >>>>>> Artem
>>>>> >>>>>>
>>>>> >>>>>>
>>>>> >>>>>> On Fri, Jan 12, 2018 at 7:08 PM, Artem
Tambovskiy
>>>>> >>>>>> <artem.tambovskiy(a)gmail.com> wrote:
>>>>> >>>>>>>
>>>>> >>>>>>> Trying to fix one thing, I broke another :(
>>>>> >>>>>>>
>>>>> >>>>>>> I fixed the mnt_options for the hosted engine
>>>>> >>>>>>> storage domain and installed the latest security
>>>>> >>>>>>> patches on my hosts and the hosted engine. All
>>>>> >>>>>>> VMs are up and running, but
>>>>> >>>>>>> hosted-engine --vm-status reports issues:
>>>>> >>>>>>>
>>>>> >>>>>>> [root@ovirt1 ~]# hosted-engine --vm-status
>>>>> >>>>>>>
>>>>> >>>>>>>
>>>>> >>>>>>> --== Host 1 status ==--
>>>>> >>>>>>>
>>>>> >>>>>>> conf_on_shared_storage             : True
>>>>> >>>>>>> Status up-to-date                  : False
>>>>> >>>>>>> Hostname                           : ovirt2
>>>>> >>>>>>> Host ID                            : 1
>>>>> >>>>>>> Engine status                      : unknown stale-data
>>>>> >>>>>>> Score                              : 0
>>>>> >>>>>>> stopped                            : False
>>>>> >>>>>>> Local maintenance                  : False
>>>>> >>>>>>> crc32                              : 193164b8
>>>>> >>>>>>> local_conf_timestamp               : 8350
>>>>> >>>>>>> Host timestamp                     : 8350
>>>>> >>>>>>> Extra metadata (valid at timestamp):
>>>>> >>>>>>>         metadata_parse_version=1
>>>>> >>>>>>>         metadata_feature_version=1
>>>>> >>>>>>>         timestamp=8350 (Fri Jan 12 19:03:54 2018)
>>>>> >>>>>>>         host-id=1
>>>>> >>>>>>>         score=0
>>>>> >>>>>>>         vm_conf_refresh_time=8350 (Fri Jan 12 19:03:54 2018)
>>>>> >>>>>>>         conf_on_shared_storage=True
>>>>> >>>>>>>         maintenance=False
>>>>> >>>>>>>         state=EngineUnexpectedlyDown
>>>>> >>>>>>>         stopped=False
>>>>> >>>>>>>         timeout=Thu Jan  1 05:24:43 1970
>>>>> >>>>>>>
>>>>> >>>>>>>
>>>>> >>>>>>> --== Host 2 status ==--
>>>>> >>>>>>>
>>>>> >>>>>>> conf_on_shared_storage             : True
>>>>> >>>>>>> Status up-to-date                  : False
>>>>> >>>>>>> Hostname                           : ovirt1.telia.ru
>>>>> >>>>>>> Host ID                            : 2
>>>>> >>>>>>> Engine status                      : unknown stale-data
>>>>> >>>>>>> Score                              : 0
>>>>> >>>>>>> stopped                            : True
>>>>> >>>>>>> Local maintenance                  : False
>>>>> >>>>>>> crc32                              : c7037c03
>>>>> >>>>>>> local_conf_timestamp               : 7530
>>>>> >>>>>>> Host timestamp                     : 7530
>>>>> >>>>>>> Extra metadata (valid at timestamp):
>>>>> >>>>>>>         metadata_parse_version=1
>>>>> >>>>>>>         metadata_feature_version=1
>>>>> >>>>>>>         timestamp=7530 (Fri Jan 12 16:10:12 2018)
>>>>> >>>>>>>         host-id=2
>>>>> >>>>>>>         score=0
>>>>> >>>>>>>         vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)
>>>>> >>>>>>>         conf_on_shared_storage=True
>>>>> >>>>>>>         maintenance=False
>>>>> >>>>>>>         state=AgentStopped
>>>>> >>>>>>>         stopped=True
>>>>> >>>>>>> [root@ovirt1 ~]#
>>>>> >>>>>>>
>>>>> >>>>>>>
>>>>> >>>>>>>
>>>>> >>>>>>> From the second host, the situation looks a bit different:
>>>>> >>>>>>>
>>>>> >>>>>>>
>>>>> >>>>>>> [root@ovirt2 ~]# hosted-engine --vm-status
>>>>> >>>>>>>
>>>>> >>>>>>>
>>>>> >>>>>>> --== Host 1 status ==--
>>>>> >>>>>>>
>>>>> >>>>>>> conf_on_shared_storage             : True
>>>>> >>>>>>> Status up-to-date                  : True
>>>>> >>>>>>> Hostname                           : ovirt2
>>>>> >>>>>>> Host ID                            : 1
>>>>> >>>>>>> Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
>>>>> >>>>>>> Score                              : 0
>>>>> >>>>>>> stopped                            : False
>>>>> >>>>>>> Local maintenance                  : False
>>>>> >>>>>>> crc32                              : 78eabdb6
>>>>> >>>>>>> local_conf_timestamp               : 8403
>>>>> >>>>>>> Host timestamp                     : 8402
>>>>> >>>>>>> Extra metadata (valid at timestamp):
>>>>> >>>>>>>         metadata_parse_version=1
>>>>> >>>>>>>         metadata_feature_version=1
>>>>> >>>>>>>         timestamp=8402 (Fri Jan 12 19:04:47 2018)
>>>>> >>>>>>>         host-id=1
>>>>> >>>>>>>         score=0
>>>>> >>>>>>>         vm_conf_refresh_time=8403 (Fri Jan 12 19:04:47 2018)
>>>>> >>>>>>>         conf_on_shared_storage=True
>>>>> >>>>>>>         maintenance=False
>>>>> >>>>>>>         state=EngineUnexpectedlyDown
>>>>> >>>>>>>         stopped=False
>>>>> >>>>>>>         timeout=Thu Jan  1 05:24:43 1970
>>>>> >>>>>>>
>>>>> >>>>>>>
>>>>> >>>>>>> --== Host 2 status ==--
>>>>> >>>>>>>
>>>>> >>>>>>> conf_on_shared_storage             : True
>>>>> >>>>>>> Status up-to-date                  : False
>>>>> >>>>>>> Hostname                           : ovirt1.telia.ru
>>>>> >>>>>>> Host ID                            : 2
>>>>> >>>>>>> Engine status                      : unknown stale-data
>>>>> >>>>>>> Score                              : 0
>>>>> >>>>>>> stopped                            : True
>>>>> >>>>>>> Local maintenance                  : False
>>>>> >>>>>>> crc32                              : c7037c03
>>>>> >>>>>>> local_conf_timestamp               : 7530
>>>>> >>>>>>> Host timestamp                     : 7530
>>>>> >>>>>>> Extra metadata (valid at timestamp):
>>>>> >>>>>>>         metadata_parse_version=1
>>>>> >>>>>>>         metadata_feature_version=1
>>>>> >>>>>>>         timestamp=7530 (Fri Jan 12 16:10:12 2018)
>>>>> >>>>>>>         host-id=2
>>>>> >>>>>>>         score=0
>>>>> >>>>>>>         vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)
>>>>> >>>>>>>         conf_on_shared_storage=True
>>>>> >>>>>>>         maintenance=False
>>>>> >>>>>>>         state=AgentStopped
>>>>> >>>>>>>         stopped=True
>>>>> >>>>>>>
>>>>> >>>>>>>
>>>>> >>>>>>> The WebGUI shows the engine running on host ovirt1.
>>>>> >>>>>>> Gluster looks fine:
>>>>> >>>>>>> [root@ovirt1 ~]# gluster volume status engine
>>>>> >>>>>>> Status of volume: engine
>>>>> >>>>>>> Gluster process                        TCP Port  RDMA Port  Online  Pid
>>>>> >>>>>>> ------------------------------------------------------------------------------
>>>>> >>>>>>> Brick ovirt1.telia.ru:/oVirt/engine    49169     0          Y       3244
>>>>> >>>>>>> Brick ovirt2.telia.ru:/oVirt/engine    49179     0          Y       20372
>>>>> >>>>>>> Brick ovirt3.telia.ru:/oVirt/engine    49206     0          Y       16609
>>>>> >>>>>>> Self-heal Daemon on localhost          N/A       N/A        Y       117868
>>>>> >>>>>>> Self-heal Daemon on ovirt2.telia.ru    N/A       N/A        Y       20521
>>>>> >>>>>>> Self-heal Daemon on ovirt3             N/A       N/A        Y       25093
>>>>> >>>>>>>
>>>>> >>>>>>> Task Status of Volume engine
>>>>> >>>>>>> ------------------------------------------------------------------------------
>>>>> >>>>>>> There are no active volume tasks
>>>>> >>>>>>>
>>>>> >>>>>>> How to resolve this issue?
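[Editor's note: the resolution suggested at the top of this thread is a restart of the HA broker and agent, so that they republish fresh metadata to the shared storage and the "unknown stale-data" entries clear. A rough sketch of that step, on the affected host, assuming the standard ovirt-ha-broker / ovirt-ha-agent service names of an oVirt hosted-engine deployment; verify against your own install before running:]

```shell
# Restart the HA services so the host rewrites its metadata slot
# (assumption: systemd-managed oVirt hosted-engine host).
systemctl restart ovirt-ha-broker
systemctl restart ovirt-ha-agent

# Give the agent a minute to publish, then re-check from any host;
# the stale entries should be replaced by a live score/status.
hosted-engine --vm-status
```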
>>>>> >>>>>>>
>>>>> >>>>>>>
>>>>> >>>>>>>
>>>>> >>>>>>> _______________________________________________
>>>>> >>>>>>> Users mailing list
>>>>> >>>>>>> Users(a)ovirt.org
>>>>> >>>>>>> http://lists.ovirt.org/mailman/listinfo/users