It's very hard to understand your flow when time moves backwards.
Please try again from a clean state. Make sure all hosts have the same clock.
Then document the exact time of everything you do: starting/stopping a host,
checking status, etc.
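To see where a log actually jumps back in time, a rough Python sketch like
the one below might help. It assumes every agent.log entry is a single line
starting with thread::LEVEL::YYYY-MM-DD HH:MM:SS,mmm:: (the layout of the
excerpts in this mail, before the mail client wrapped them). The function
names are made up for illustration and are not part of
ovirt-hosted-engine-ha.

```python
import re
from datetime import datetime

# Assumed line layout: thread::LEVEL::YYYY-MM-DD HH:MM:SS,mmm::module::...
TS_RE = re.compile(
    r"^[^:]+::[A-Z]+::(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})::"
)


def parse_ts(line):
    """Return the timestamp of a log line, or None if it has none."""
    m = TS_RE.match(line)
    if m is None:
        return None
    return datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")


def find_backward_jumps(lines):
    """Return (line number, previous ts, current ts) for each backwards step."""
    jumps = []
    prev = None
    for lineno, line in enumerate(lines, start=1):
        ts = parse_ts(line)
        if ts is None:
            continue  # continuation lines, tracebacks, etc.
        if prev is not None and ts < prev:
            jumps.append((lineno, prev, ts))
        prev = ts
    return jumps
```

Called as find_backward_jumps(open("agent.log")), it lists every place where
an entry is older than the one before it, which is a hint that the clock was
changed while the agent was running.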
Some things to check from your logs:
in agent.host01.log:
MainThread::INFO::2016-04-25 15:32:41,370::states::488::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Engine down and local host has best score (3400), attempting to start engine VM
...
MainThread::INFO::2016-04-25 15:32:44,276::hosted_engine::1147::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_start_engine_vm) Engine VM started on localhost
...
MainThread::INFO::2016-04-25 15:32:58,478::states::672::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score) Score is 0 due to unexpected vm shutdown at Mon Apr 25 15:32:58 2016
Why?
Also, in agent.host03.log:
MainThread::INFO::2016-04-25 15:29:53,218::states::488::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Engine down and local host has best score (3400), attempting to start engine VM
MainThread::INFO::2016-04-25 15:29:53,223::brokerlink::111::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1461572993.22 type=state_transition detail=EngineDown-EngineStart hostname='host03.ovirt.forest.go.th'
MainThread::ERROR::2016-04-25 15:30:23,253::brokerlink::279::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(_communicate) Connection closed: Connection timed out
Why?
Also, in addition to the actions you stated, you changed maintenance mode a lot.
You can try something like this to get some interesting lines from agent.log:
egrep -i 'start eng|shut|vm started|vm running|vm is running on|maintenance detected|migra' agent.log
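If you want the same filter applied to all hosts at once, with the matches
sorted into one timeline, something along these lines could work. It is only
a sketch: it assumes each log entry is a single line in the
thread::LEVEL::timestamp:: layout of the excerpts in this mail, it reuses the
keyword list from the egrep line, and the function names and host labels are
hypothetical. The ordering is only meaningful once the host clocks agree.

```python
import re
from datetime import datetime

# Assumed line layout: thread::LEVEL::YYYY-MM-DD HH:MM:SS,mmm::module::...
TS_RE = re.compile(
    r"^[^:]+::[A-Z]+::(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})::"
)

# Same keywords as the egrep -i pattern.
KEYWORDS = re.compile(
    r"start eng|shut|vm started|vm running|vm is running on|"
    r"maintenance detected|migra",
    re.IGNORECASE,
)


def interesting_events(host, lines):
    """Yield (timestamp, host, line) for lines matching the keywords."""
    for line in lines:
        m = TS_RE.match(line)
        if m and KEYWORDS.search(line):
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")
            yield ts, host, line.rstrip("\n")


def merged_timeline(logs):
    """logs maps a host label to an iterable of that host's log lines."""
    events = []
    for host, lines in logs.items():
        events.extend(interesting_events(host, lines))
    # Sort by timestamp only; ties keep their original order.
    return sorted(events, key=lambda event: event[0])
```

For example, merged_timeline({"host01": open("agent.host01.log"),
"host03": open("agent.host03.log")}) returns the matching entries of both
hosts in chronological order.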
Best,
On Mon, Apr 25, 2016 at 12:27 PM, Wee Sritippho <wee.s(a)forest.go.th> wrote:
The hosted engine storage is located in an external Fibre Channel
SAN.
On 25/4/2559 16:19, Martin Sivak wrote:
>
> Hi,
>
> it seems that all nodes lost access to storage for some reason after
> the host was killed. Where is your hosted engine storage located?
>
> Regards
>
> --
> Martin Sivak
> SLA / oVirt
>
>
> On Mon, Apr 25, 2016 at 10:58 AM, Wee Sritippho <wee.s(a)forest.go.th>
> wrote:
>>
>> Hi,
>>
>> From the hosted-engine FAQ, the engine VM should be up and running about
>> 5 minutes after its host is forcibly powered off. However, after updating
>> oVirt from 3.6.4 to 3.6.5, the engine VM won't restart automatically even
>> after 10+ minutes (I already made sure that global maintenance mode is set
>> to none). I initially thought it was a time sync issue, so I installed and
>> enabled ntp on the hosts and engine. However, the issue still persists.
>>
>> ###Versions:
>> [root@host01 ~]# rpm -qa | grep ovirt
>> libgovirt-0.3.3-1.el7_2.1.x86_64
>> ovirt-vmconsole-1.0.0-1.el7.centos.noarch
>> ovirt-vmconsole-host-1.0.0-1.el7.centos.noarch
>> ovirt-hosted-engine-ha-1.3.5.3-1.el7.centos.noarch
>> ovirt-host-deploy-1.4.1-1.el7.centos.noarch
>> ovirt-engine-sdk-python-3.6.5.0-1.el7.centos.noarch
>> ovirt-hosted-engine-setup-1.3.5.0-1.el7.centos.noarch
>> ovirt-release36-007-1.noarch
>> ovirt-setup-lib-1.0.1-1.el7.centos.noarch
>> [root@host01 ~]# rpm -qa | grep vdsm
>> vdsm-infra-4.17.26-0.el7.centos.noarch
>> vdsm-jsonrpc-4.17.26-0.el7.centos.noarch
>> vdsm-gluster-4.17.26-0.el7.centos.noarch
>> vdsm-python-4.17.26-0.el7.centos.noarch
>> vdsm-yajsonrpc-4.17.26-0.el7.centos.noarch
>> vdsm-4.17.26-0.el7.centos.noarch
>> vdsm-cli-4.17.26-0.el7.centos.noarch
>> vdsm-xmlrpc-4.17.26-0.el7.centos.noarch
>> vdsm-hook-vmfex-dev-4.17.26-0.el7.centos.noarch
>>
>> ###Log files:
>>
>> https://app.box.com/s/fkurmwagogwkv5smkwwq7i4ztmwf9q9r
>>
>> ###After host02 was killed:
>> [root@host03 wees]# hosted-engine --vm-status
>>
>>
>> --== Host 1 status ==--
>>
>> Status up-to-date : True
>> Hostname : host01.ovirt.forest.go.th
>> Host ID : 1
>> Engine status : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
>> Score : 3400
>> stopped : False
>> Local maintenance : False
>> crc32 : 396766e0
>> Host timestamp : 4391
>>
>>
>> --== Host 2 status ==--
>>
>> Status up-to-date : True
>> Hostname : host02.ovirt.forest.go.th
>> Host ID : 2
>> Engine status : {"health": "good", "vm": "up", "detail": "up"}
>> Score : 0
>> stopped : True
>> Local maintenance : False
>> crc32 : 3a345b65
>> Host timestamp : 1458
>>
>>
>> --== Host 3 status ==--
>>
>> Status up-to-date : True
>> Hostname : host03.ovirt.forest.go.th
>> Host ID : 3
>> Engine status : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
>> Score : 3400
>> stopped : False
>> Local maintenance : False
>> crc32 : 4c34b0ed
>> Host timestamp : 11958
>>
>> ###After host02 was killed for a while:
>> [root@host03 wees]# hosted-engine --vm-status
>>
>>
>> --== Host 1 status ==--
>>
>> Status up-to-date : False
>> Hostname : host01.ovirt.forest.go.th
>> Host ID : 1
>> Engine status : unknown stale-data
>> Score : 3400
>> stopped : False
>> Local maintenance : False
>> crc32 : 72e4e418
>> Host timestamp : 4415
>>
>>
>> --== Host 2 status ==--
>>
>> Status up-to-date : False
>> Hostname : host02.ovirt.forest.go.th
>> Host ID : 2
>> Engine status : unknown stale-data
>> Score : 0
>> stopped : True
>> Local maintenance : False
>> crc32 : 3a345b65
>> Host timestamp : 1458
>>
>>
>> --== Host 3 status ==--
>>
>> Status up-to-date : False
>> Hostname : host03.ovirt.forest.go.th
>> Host ID : 3
>> Engine status : unknown stale-data
>> Score : 3400
>> stopped : False
>> Local maintenance : False
>> crc32 : 4c34b0ed
>> Host timestamp : 11958
>>
>> ###After host02 was up again completely:
>> [root@host03 wees]# hosted-engine --vm-status
>>
>>
>> --== Host 1 status ==--
>>
>> Status up-to-date : True
>> Hostname : host01.ovirt.forest.go.th
>> Host ID : 1
>> Engine status : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
>> Score : 0
>> stopped : False
>> Local maintenance : False
>> crc32 : f5728fca
>> Host timestamp : 5555
>>
>>
>> --== Host 2 status ==--
>>
>> Status up-to-date : True
>> Hostname : host02.ovirt.forest.go.th
>> Host ID : 2
>> Engine status : {"health": "good", "vm": "up", "detail": "up"}
>> Score : 3400
>> stopped : False
>> Local maintenance : False
>> crc32 : e5284763
>> Host timestamp : 715
>>
>>
>> --== Host 3 status ==--
>>
>> Status up-to-date : True
>> Hostname : host03.ovirt.forest.go.th
>> Host ID : 3
>> Engine status : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
>> Score : 3400
>> stopped : False
>> Local maintenance : False
>> crc32 : bc10c7fc
>> Host timestamp : 13119
>>
>> --
>> Wee
>>
>> _______________________________________________
>> Users mailing list
>> Users(a)ovirt.org
>>
>> http://lists.ovirt.org/mailman/listinfo/users
--
Wee Sritippho
Computer Technical Officer, Practitioner Level
Information Center, Royal Forest Department
Tel. 025614292-3 ext. 5621
Mobile 0864678919
--
Didi