It is as I expected:
Engine status : {"reason": "failed liveliness check"
The host can't talk to the ovirt-engine service. Please make sure the
host can reach the engine FQDN as configured on the fqdn= line of
/etc/ovirt-hosted-engine/hosted-engine.conf.
You can check it manually by running `hosted-engine --check-liveliness`
on the host.
Best regards
Martin Sivak
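Martin's check comes down to whether this host can resolve and reach the name on the fqdn= line. Below is a minimal sketch of the first half of that check. It assumes hosted-engine.conf consists of simple key=value lines. The helper names read_engine_fqdn and resolve_fqdn are hypothetical and not part of any oVirt tool, and `hosted-engine --check-liveliness` remains the authoritative check.

```python
import socket


def read_engine_fqdn(conf_text):
    """Return the value of the fqdn= line from hosted-engine.conf text, or None.

    Hypothetical helper; assumes plain key=value lines, skipping blank lines
    and '#' comments.
    """
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "fqdn":
            return value.strip()
    return None


def resolve_fqdn(fqdn):
    """Return the sorted IP addresses the local resolver gives for fqdn.

    Hypothetical helper. An empty list means the name does not resolve from
    this host, which is one way the liveliness check can fail.
    """
    try:
        infos = socket.getaddrinfo(fqdn, None)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})
```

On a host, passing the contents of /etc/ovirt-hosted-engine/hosted-engine.conf to read_engine_fqdn and the result to resolve_fqdn should show whether the name resolves at all. It should also show whether the name resolves to the engine VM's own address rather than to one of the hosts.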
On Wed, Apr 25, 2018 at 12:51 PM, <dhy336(a)sina.com> wrote:
Hi,
Two nodes:
192.168.122.66 hosted-engine1
192.168.122.223 hosted-engine2
I powered off hosted-engine1, so I cannot attach hosted-engine1's log.
[root@hosted-engine2 ~]# hosted-engine --vm-status
--== Host 1 status ==--
conf_on_shared_storage : True
Status up-to-date : False
Hostname : hosted-engine1
Host ID : 1
Engine status : unknown stale-data
Score : 3400
stopped : False
Local maintenance : False
crc32 : a7af0afa
local_conf_timestamp : 11485
Host timestamp : 11485
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=11485 (Wed Apr 25 10:08:34 2018)
host-id=1
score=3400
vm_conf_refresh_time=11485 (Wed Apr 25 10:08:34 2018)
conf_on_shared_storage=True
maintenance=False
state=EngineUp
stopped=False
--== Host 2 status ==--
conf_on_shared_storage : True
Status up-to-date : True
Hostname : hosted-engine2
Host ID : 2
Engine status : {"reason": "failed liveliness check", "health": "bad", "vm": "up", "detail": "Up"}
Score : 3000
stopped : False
Local maintenance : False
crc32 : a2e82883
local_conf_timestamp : 6278
Host timestamp : 6278
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=6278 (Wed Apr 25 10:37:44 2018)
host-id=2
score=3000
vm_conf_refresh_time=6278 (Wed Apr 25 10:37:44 2018)
conf_on_shared_storage=True
maintenance=False
state=EngineStop
stopped=False
timeout=Thu Jan 1 09:49:38 1970
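When comparing hosts, it can help to extract the fields from this output programmatically. Below is a minimal sketch. It assumes the layout shown above, with each field on a single line. Mail clients often wrap the Engine status JSON, which happened in this thread. parse_vm_status is a hypothetical name, not part of the hosted-engine tooling.

```python
import json
import re

# Section header printed once per host, e.g. "--== Host 2 status ==--".
HOST_HEADER = re.compile(r"^--== Host (\d+) status ==--$")


def parse_vm_status(text):
    """Split `hosted-engine --vm-status` output into {host_id: {field: value}}.

    Hypothetical helper. Only "key : value" lines are kept; the key=value lines
    under "Extra metadata" are skipped. "Engine status" is decoded as JSON when
    possible and otherwise kept as text (e.g. "unknown stale-data").
    """
    hosts = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = HOST_HEADER.match(line)
        if header:
            current = hosts.setdefault(int(header.group(1)), {})
            continue
        if current is None or " : " not in line:
            continue
        key, _, value = line.partition(" : ")
        key, value = key.strip(), value.strip()
        if key == "Engine status" and value.startswith("{"):
            try:
                value = json.loads(value)
            except ValueError:
                pass  # wrapped or truncated JSON: keep the raw text
        current[key] = value
    return hosts
```

Applied to the output above, this would report host 1 as "unknown stale-data" with "Status up-to-date" False, which fits host 1 being powered off. It would report host 2 with health "bad" and reason "failed liveliness check".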
----- Original Message -----
From: Martin Sivak <msivak(a)redhat.com>
To: dhy336 <dhy336(a)sina.com>, users <users(a)ovirt.org>
Subject: Re: Re: Re: Re: Re: [ovirt-users] Reply: Re: Hosted-engine can not_switch
Date: 2018-04-25 17:41
Please attach the output of hosted-engine --vm-status and the
/var/log/ovirt-hosted-engine-ha/agent.log file from both hosts.
The VM will restart if the ovirt-engine service does not become
available within the timeout. That can mean a couple of things: the
FQDN of the engine is wrong, the engine needs something that was only
available on the dead host (A), such as some storage, or host B cannot
ping the gateway.
Best regards
Martin Sivak
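Two of the causes listed above, a wrong engine FQDN and missing network connectivity from host B, can be narrowed down from host B itself. Below is a minimal sketch of a TCP reachability probe. tcp_reachable is a hypothetical helper, and probing the engine on port 443 (HTTPS) is an assumption about where the engine listens. The probe does not replace pinging the gateway, because ICMP needs raw sockets or the ping binary.

```python
import socket


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) opens within timeout seconds.

    Hypothetical helper. DNS failures, refused connections and timeouts all
    raise OSError subclasses in Python 3, so they all map to False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, running `tcp_reachable(fqdn, 443)` on host B, with fqdn taken from hosted-engine.conf, separates a name-resolution or routing problem from a problem further up the stack.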
On Wed, Apr 25, 2018 at 11:33 AM, <dhy336(a)sina.com> wrote:
> Sorry, I described it incorrectly.
>
> I have two nodes with hosted-engine: A (192.168.122.65) and B (192.168.122.66).
>
> Testing engine HA:
>
> At first both nodes are up and the hosted-engine VM runs on A. Then I power
> off A, and after 3 minutes B starts its hosted-engine VM.
> But its ovirt-engine keeps connecting to host A for about 10 minutes,
> and then the hosted-engine VM restarts.
> ----- Original Message -----
> From: Martin Sivak <msivak(a)redhat.com>
> To: dhy336 <dhy336(a)sina.com>
> Subject: Re: Re: Re: Re: [ovirt-users] Reply: Re: Hosted-engine can not_switch
> Date: 2018-04-25 17:11
>
>
> Your hosted engine VM has its own address that does not depend on
> which host it is currently running on. So it should be available at the
> same address no matter where the VM is running.
> Best regards
> Martin Sivak
> On Wed, Apr 25, 2018 at 9:07 AM, <dhy336(a)sina.com> wrote:
>>>> I deployed two nodes for hosted engine. At first the hosted-engine VM runs on
>>>> 192.168.122.65. When I power off this host, the hosted-engine VM switches to
>>>> another host, but ovirt-engine still connects to 192.168.122.65. If I restart
>>>> the ovirt-engine service, it works.
>>
>> I think this behavior is a bug: the hosted-engine VM has already powered up on
>> another host (192.168.122.66), so the engine should connect to host
>> 192.168.122.66, not to host 192.168.122.65, shouldn't it?
>>
>> Thanks
>>
>> ----- Original Message -----
>> From: Martin Sivak <msivak(a)redhat.com>
>> To: dhy336 <dhy336(a)sina.com>
>> Cc: users <users(a)ovirt.org>
>> Subject: Re: Re: Re: [ovirt-users] Reply: Re: Hosted-engine can not_switch
>> Date: 2018-04-20 18:28
>>
>>
>> Hi,
>> No, this is not an error. You killed the host without moving it to
>> maintenance first. The engine has no way to distinguish this from, for
>> example, a temporary network failure. Give it some time: the host will move
>> to one of the error states, and the engine will then handle the highly
>> available VMs on it (if fencing is properly configured).
>> Best regards
>> Martin Sivak
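Martin's point is that a powered-off host and a briefly unreachable host look the same to the engine until a grace period has passed. Below is a toy model of that reasoning. It is not the engine's actual monitoring code. HostMonitor, its two states and the 300-second grace period are all hypothetical; the 300 s value echoes the "5 minutes or so" mentioned in the thread.

```python
from dataclasses import dataclass


@dataclass
class HostMonitor:
    """Toy model of a host monitor (hypothetical; not oVirt code).

    The monitor only knows when it last heard from the host, so it cannot tell
    a powered-off host from a short network outage until grace_seconds pass.
    """
    grace_seconds: float
    last_seen: float = 0.0

    def heartbeat(self, now):
        # Record a successful poll of the host at time `now` (seconds).
        self.last_seen = now

    def status(self, now):
        if now - self.last_seen <= self.grace_seconds:
            return "up"  # could still be a temporary network failure
        # In this model, only from here on would fencing / HA VM handling start.
        return "unresponsive"
```

In this model, a host powered off right after a heartbeat at t=0 keeps reporting "up" until t=300. This matches the delay described in the thread, and it is why moving a host to maintenance first avoids the wait.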
>> On Fri, Apr 20, 2018 at 12:13 PM, <dhy336(a)sina.com> wrote:
>>> So this behavior is not an error?
>>> ----- Original Message -----
>>> From: Martin Sivak <msivak(a)redhat.com>
>>> To: dhy336 <dhy336(a)sina.com>
>>> Cc: users <users(a)ovirt.org>
>>> Subject: Re: Re: [ovirt-users] Reply: Re: Hosted-engine can not_switch
>>> Date: 2018-04-20 18:05
>>>
>>>
>>> Hi,
>>> the engine does not know you killed the host. It will notice
>>> eventually and handle the situation. Just give it time (5 minutes or
>>> so).
>>> Best regards
>>> --
>>> Martin Sivak
>>> SLA / oVirt
>>> On Fri, Apr 20, 2018 at 12:00 PM, <dhy336(a)sina.com> wrote:
>>>> Hi, thanks for your feedback. I have another question.
>>>>
>>>> I deployed two nodes for hosted engine. At first the hosted-engine VM runs on
>>>> 192.168.122.65. When I power off this host, the hosted-engine VM switches to
>>>> another host, but ovirt-engine still connects to 192.168.122.65. If I restart
>>>> the ovirt-engine service, it works.
>>>>
>>>>
>>>> 2018-04-20 17:13:04,692+08 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-98) [] Command 'GetAllVmStatsVDSCommand(HostName = hosted-engine2, VdsIdVDSCommandParametersBase:{hostId='a5428ef7-9df6-4a86-91de-7e36fda340fa'})' execution failed: java.net.NoRouteToHostException: No route to host
>>>> 2018-04-20 17:13:04,693+08 INFO [org.ovirt.engine.core.vdsbroker.monitoring.PollVmStatsRefresher] (EE-ManagedThreadFactory-engineScheduled-Thread-98) [] Failed to fetch vms info for host 'hosted-engin2' - skipping VMs monitoring.
>>>> 2018-04-20 17:13:19,710+08 INFO [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Connecting to hosted-engine2/192.168.122.65
>>>> 2018-04-20 17:13:22,730+08 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-45) [] Command 'GetAllVmStatsVDSCommand(HostName = hosted-engine-tchyp2, VdsIdVDSCommandParametersBase:{hostId='a5428ef7-9df6-4a86-91de-7e36fda340fa'})' execution failed: java.net.NoRouteToHostException: No route to host
>>>> 2018-04-20 17:13:22,732+08 INFO [org.ovirt.engine.core.vdsbroker.monitoring.PollVmStatsRefresher] (EE-ManagedThreadFactory-engineScheduled-Thread-45) [] Failed to fetch vms info for host 'hosted-engine2' - skipping VMs monitoring.
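Log lines like these are easier to reason about once they are counted per host. Below is a minimal sketch that assumes the engine.log layout shown above: timestamp, level, [logger], (thread), [correlation id], then the message. The regular expressions and the name no_route_by_host are hypothetical, not an oVirt API.

```python
import re
from collections import Counter

# timestamp, level, [logger], (thread), [correlation id], message
ENGINE_LINE = re.compile(
    r"^(?P<ts>\S+ \S+) (?P<level>[A-Z]+)\s+"
    r"\[(?P<logger>[^\]]+)\] \((?P<thread>.*?)\) \[(?P<corr>[^\]]*)\] (?P<msg>.*)$"
)
# Host name as it appears inside VDS command parameters, e.g. "HostName = hosted-engine2,"
HOST_NAME = re.compile(r"HostName = ([^,)]+)")


def no_route_by_host(lines):
    """Count NoRouteToHostException failures per HostName (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = ENGINE_LINE.match(line)
        if not match or "NoRouteToHostException" not in match.group("msg"):
            continue
        host = HOST_NAME.search(match.group("msg"))
        counts[host.group(1) if host else "?"] += 1
    return counts
```

A steady stream of "No route to host" errors for the powered-off host is expected until the engine declares that host unreachable. Any such errors for a host that is still up would point at a real network problem.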
>>>>
>>>> ----- Original Message -----
>>>> From: Martin Sivak <msivak(a)redhat.com>
>>>> To: dhy336 <dhy336(a)sina.com>
>>>> Cc: users <users(a)ovirt.org>
>>>> Subject: Re: [ovirt-users] Reply: Re: Hosted-engine can not_switch
>>>> Date: 2018-04-20 16:40
>>>>
>>>>
>>>> Hi,
>>>> your ovirt-hosted-engine-ha package is too old. You need at least
>>>> 2.1.9 to properly support a 4.2 engine. The same applies to vdsm. Please
>>>> upgrade the node.
>>>> Best regards
>>>> Martin Sivak
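Comparing dotted versions as strings goes wrong as soon as a component reaches two digits ("2.1.10" sorts before "2.1.9"). A check against the 2.1.9 minimum Martin mentions is therefore safer on integer tuples. Below is a minimal sketch. parse_version and at_least are hypothetical helpers. Only the 2.1.9 figure comes from the thread; no vdsm minimum is given there.

```python
def parse_version(version):
    """Turn '2.1.9' (or '2.1.9-1.el7') into (2, 1, 9).

    Hypothetical helper: reads numeric components in order and stops at the
    first component that is not purely numeric, keeping its numeric prefix.
    """
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) != len(piece):
            break  # e.g. '9-1': keep 9, ignore the release suffix
    return tuple(parts)


def at_least(installed, minimum):
    """True if `installed` is the same as or newer than `minimum`."""
    return parse_version(installed) >= parse_version(minimum)
```

The installed version can be read with `rpm -q --queryformat '%{VERSION}' ovirt-hosted-engine-ha`. For the 2.1.5 package reported in this thread, `at_least("2.1.5", "2.1.9")` is False.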
>>>> On Fri, Apr 20, 2018 at 3:58 AM, <dhy336(a)sina.com> wrote:
>>>>> Hi, I found some error logs in /var/log/ovirt-hosted-engine-ha/broker.
>>>>>
>>>>> [root@hosted-engine2 ~]# ll /rhev/data-center/mnt
>>>>> total 0
>>>>> drwxr-xr-x. 3 vdsm kvm 76 Apr 18 22:28 192.168.122.218:_exports_data
>>>>> drwxr-xr-x. 3 vdsm kvm 76 Apr 18 22:12
>>>>> 192.168.122.218:_exports_hosted-engine-test1
>>>>> [root@hosted-engine2 ~]# ll
>>>>> /rhev/data-center/mnt/192.168.122.218\:_exports_hosted-engine-test1/
>>>>> total 0
>>>>> drwxr-xr-x. 5 vdsm kvm 50 Apr 18 22:14
>>>>> 8a734205-65b7-4801-b7f0-d380eb45dbae
>>>>> -rwxr-xr-x. 1 vdsm kvm 0 Apr 20 09:54 __DIRECT_IO_TEST__
>>>>>
>>>>> The UUID 8a734205-65b7-4801-b7f0-d380eb45dbae is in
>>>>> /rhev/data-center/mnt/192.168.122.218\:_exports_hosted-engine-test1/,
>>>>> but the broker looks for it in /rhev/data-center/mnt. Is the problem
>>>>> my package versions? My ovirt-hosted-engine-ha version is 2.1.5, vdsm is
>>>>> 4.20.5, and ovirt-engine is 4.2.
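The listing shows the domain directory one level below /rhev/data-center/mnt, inside the NFS mount directory. The error message names only /rhev/data-center/mnt. Below is a minimal sketch of a lookup that searches each mount directory for a child named after the domain UUID. find_domain_path is hypothetical and only illustrates the directory layout; it is not the broker's code.

```python
import os


def find_domain_path(parent, sd_uuid):
    """Return <parent>/<mount dir>/<sd_uuid> for the first mount dir containing it.

    Hypothetical helper: looks exactly one level below `parent`, matching the
    `ll` output where each NFS export is mounted as its own directory.
    Returns None when no mount directory contains the domain.
    """
    for entry in sorted(os.listdir(parent)):
        candidate = os.path.join(parent, entry, sd_uuid)
        if os.path.isdir(candidate):
            return candidate
    return None
```

Against the layout above, such a search would return the path ending in 192.168.122.218:_exports_hosted-engine-test1/8a734205-65b7-4801-b7f0-d380eb45dbae. (The `\:` in the shell listing is only escaping.) So the directory sits where a search of this kind would expect it.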
>>>>>
>>>>> MainThread::INFO::2018-04-19 19:26:31,479::listener::41::ovirt_hosted_engine_ha.broker.listener.Listener::(__init__) Initializing SocketServer
>>>>> MainThread::INFO::2018-04-19 19:26:31,480::listener::56::ovirt_hosted_engine_ha.broker.listener.Listener::(__init__) SocketServer ready
>>>>> Thread-1::INFO::2018-04-19 19:26:31,558::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
>>>>> Thread-1::ERROR::2018-04-19 19:26:31,559::listener::192::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Error handling request, data: 'set-storage-domain FilesystemBackend dom_type=nfs3 sd_uuid=8a734205-65b7-4801-b7f0-d380eb45dbae'
>>>>> Traceback (most recent call last):
>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 166, in handle
>>>>>     data)
>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 299, in _dispatch
>>>>>     .set_storage_domain(client, sd_type, **options)
>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 66, in set_storage_domain
>>>>>     self._backends[client].connect()
>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 462, in connect
>>>>>     self._dom_type)
>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 107, in get_domain_path
>>>>>     " in {1}".format(sd_uuid, parent))
>>>>> BackendFailureException: path to storage domain 8a734205-65b7-4801-b7f0-d380eb45dbae not found in /rhev/data-center/mnt
>>>>> Thread-1::INFO::2018-04-19 19:26:31,563::listener::186::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
>>>>> Thread-2::INFO::2018-04-19 19:26:44,601::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
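The broker and agent logs share a '::'-separated layout, so ERROR entries like the set-storage-domain failure above can be filtered out mechanically. Below is a minimal sketch that assumes that layout. parse_ha_line is a hypothetical helper, and the field names are my own labels, not names used by ovirt-hosted-engine-ha.

```python
def parse_ha_line(line):
    """Split one ovirt-hosted-engine-ha agent/broker log line into fields.

    Hypothetical helper, assuming the '::'-separated layout seen above:
    thread::LEVEL::timestamp::module::lineno::logger::(function) message
    Returns None for lines that do not follow it (e.g. traceback lines).
    """
    parts = line.split("::", 6)  # message may itself contain '::'
    if len(parts) != 7 or not parts[4].isdigit():
        return None
    thread, level, timestamp, module, lineno, logger, rest = parts
    func, _, message = rest.partition(") ")
    return {
        "thread": thread,
        "level": level,
        "timestamp": timestamp,
        "module": module,
        "lineno": int(lineno),
        "logger": logger,
        "function": func.strip("()"),
        "message": message,
    }
```

For example, `[r for r in map(parse_ha_line, lines) if r and r["level"] == "ERROR"]` keeps only the error entries. The traceback lines that follow each error have to be read from the raw log.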
>>>>>
>>>>> ----- Original Message -----
>>>>> From: <dhy336(a)sina.com>
>>>>> To: "Martin Sivak" <msivak(a)redhat.com>
>>>>> Cc: users <users(a)ovirt.org>
>>>>> Subject: [ovirt-users] Reply: Re: Hosted-engine can not_switch
>>>>> Date: 2018-04-20 09:30
>>>>>
>>>>> libvirt has no error logs. I only found some errors in vdsm.
>>>>> The vdsm log is:
>>>>> 2018-04-20 09:24:52,610+0800 INFO (jsonrpc/1) [vdsm.api] FINISH getVolumeInfo return={'info': {'status': 'OK', 'domain': '8a734205-65b7-4801-b7f0-d380eb45dbae', 'voltype': 'LEAF', 'description': 'hosted-engine.lockspace', 'parent': '00000000-0000-0000-0000-000000000000', 'format': 'RAW', 'generation': 0, 'image': '611272bd-c2cc-42bc-94e2-9aa52e754c35', 'ctime': '1524032037', 'disktype': '2', 'legality': 'LEGAL', 'mtime': '0', 'apparentsize': '1048576', 'children': [], 'pool': '', 'capacity': '1048576', 'uuid': u'7037aac6-7c8e-4efd-82f7-ca618c953fe6', 'truesize': '1048576', 'type': 'PREALLOCATED', 'lease': {'owners': [], 'version': None}}} from=::1,48306, task_id=03a7938e-8afb-4b16-b8dd-126c2b1f5d52 (api:52)
>>>>> 2018-04-20 09:24:52,611+0800 INFO (jsonrpc/1) [jsonrpc.JsonRpcServer] RPC call Volume.getInfo succeeded in 0.03 seconds (__init__:630)
>>>>> 2018-04-20 09:24:54,113+0800 ERROR (periodic/3) [virt.periodic.Operation] <vdsm.virt.sampling.VMBulkstatsMonitor object at 0x1e92f90> operation failed (periodic:215)
>>>>> Traceback (most recent call last):
>>>>>   File "/usr/lib/python2.7/site-packages/vdsm/virt/periodic.py", line 213, in __call__
>>>>>     self._func()
>>>>>   File "/usr/lib/python2.7/site-packages/vdsm/virt/sampling.py", line 522, in __call__
>>>>>     self._send_metrics()
>>>>>   File "/usr/lib/python2.7/site-packages/vdsm/virt/sampling.py", line 538, in _send_metrics
>>>>>     vm_sample.interval)
>>>>>   File "/usr/lib/python2.7/site-packages/vdsm/virt/vmstats.py", line 45, in produce
>>>>>     networks(vm, stats, first_sample, last_sample, interval)
>>>>>   File "/usr/lib/python2.7/site-packages/vdsm/virt/vmstats.py", line 322, in networks
>>>>>     if nic.name.startswith('hostdev'):
>>>>> AttributeError: name
>>>>> 2018-04-20 09:24:54,800+0800 INFO (Reactor thread) [ProtocolDetector.AcceptorImpl] Accepted connection from ::1:48308 (protocoldetector:61)
>>>>> 2018-04-20 09:24:54,810+0800 INFO (Reactor thread) [ProtocolDetector.Detector] Detected protocol stomp from ::1:48308 (protocoldetector:125)
>>>>> 2018-04-20 09:24:54,810+0800 INFO (Reactor thread) [Broker.StompAdapter] Processing CONNECT request (stompreactor:103)
>>>>> 2018-04-20 09:24:54,818+0800 INFO (JsonRpc (StompReactor)) [Broker.StompAdapter] Subscribe command received (stompreactor:132)
>>>>> 2018-04-20 09:24:55,119+0800 INFO (jsonrpc/6) [api.host] START getHardwareInfo() from=::1,48308 (api:46)
>>>>>
>>>>> ----- Original Message -----
>>>>> From: Martin Sivak <msivak(a)redhat.com>
>>>>> To: dhy336 <dhy336(a)sina.com>
>>>>> Cc: users <users(a)ovirt.org>
>>>>> Subject: Re: [ovirt-users] Hosted-engine can not switch
>>>>> Date: 2018-04-19 20:16
>>>>>
>>>>>
>>>>> We need more than just this small log snippet. Please check the vdsm
>>>>> and libvirt logs as well.
>>>>> Best regards
>>>>> Martin Sivak
>>>>> On Thu, Apr 19, 2018 at 2:05 PM, <dhy336(a)sina.com> wrote:
>>>>>> Hi,
>>>>>> I deployed three nodes with hosted engine. I forcibly shut down the node
>>>>>> on which the hosted-engine VM was running, but the hosted-engine VM
>>>>>> cannot run on the other nodes.
>>>>>>
>>>>>> I found some errors in /var/log/ovirt-hosted-engine-ha/agent.log
>>>>>>
>>>>>> MainThread::INFO::2018-04-19 19:56:35,787::hosted_engine::1192::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_clean_vdsm_state) Cleaning state for non-running VM
>>>>>> MainThread::INFO::2018-04-19 19:56:42,587::hosted_engine::1176::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_clean_vdsm_state) Vdsm state for VM clean
>>>>>> MainThread::INFO::2018-04-19 19:56:42,589::hosted_engine::1125::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_start_engine_vm) Starting vm using `/usr/sbin/hosted-engine --vm-start`
>>>>>> MainThread::INFO::2018-04-19 19:56:47,599::hosted_engine::1131::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_start_engine_vm) stdout:
>>>>>> MainThread::INFO::2018-04-19 19:56:47,600::hosted_engine::1132::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_start_engine_vm) stderr: Virtual machine does not exist: {'vmId': u'08bbd680-a8a7-4267-82e7-89f36e87e930'}
>>>>>>
>>>>>> MainThread::INFO::2018-04-19 19:56:47,600::hosted_engine::1144::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_start_engine_vm) Engine VM started on localhost
>>>>>> MainThread::INFO::2018-04-19 19:56:47,609::brokerlink::111::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1524139007.61 type=state_transition detail=EngineStart-EngineStarting hostname='hosted-engine2'
>>>>>> MainThread::INFO::2018-04-19 19:56:47,670::brokerlink::121::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (EngineStart-EngineStarting) sent? sent
>>>>>> MainThread::INFO::2018-04-19 19:56:47,670::hosted_engine::604::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm) Initializing VDSM
>>>>>> MainThread::INFO::2018-04-19 19:56:50,095::hosted_engine::630::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Connecting the storage
>>>>>> MainThread::INFO::2018-04-19 19:56:50,096::storage_server::220::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(validate_storage_server) Validating storage server
>>>>>> MainThread::INFO::2018-04-19 19:56:52,449::hosted_engine::639::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Storage domain reported as valid and reconnect is not forced.
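The agent announces each state change through the broker (here EngineStart to EngineStarting). Listing those transitions in order gives a compact timeline of what the agent on each host did after the failover. Below is a minimal sketch that assumes the notify line format shown above. state_transitions is a hypothetical helper.

```python
import re

# Matches the notify payload, e.g.
# "time=1524139007.61 type=state_transition detail=EngineStart-EngineStarting"
TRANSITION = re.compile(
    r"time=(\d+(?:\.\d+)?) type=state_transition detail=(\w+)-(\w+)"
)


def state_transitions(lines):
    """Yield (epoch_time, from_state, to_state) for each state_transition notify line.

    Hypothetical helper based only on the agent.log lines shown above; the
    "Success, was notification ..." lines carry no time= field and are skipped.
    """
    for line in lines:
        match = TRANSITION.search(line)
        if match:
            yield float(match.group(1)), match.group(2), match.group(3)
```

The epoch value can be converted with `datetime.datetime.fromtimestamp`. 1524139007.61 is 11:56:47 UTC, which matches the 19:56:47 local timestamp on that line if the host runs at UTC+8, as the engine and vdsm log offsets suggest.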
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Users mailing list
>>>>>> Users(a)ovirt.org
>>>>>> http://lists.ovirt.org/mailman/listinfo/users
>>>>>>