From: "Cong Yue" <Cong_Yue(a)alliedtelesis.com>
To: "Simone Tiraboschi" <stirabos(a)redhat.com>
Cc: users(a)ovirt.org
Sent: Friday, December 19, 2014 9:25:32 PM
Subject: RE: [ovirt-users] VM failover with ovirt3.5
The documentation at
http://www.ovirt.org/OVirt_Administration_Guide#.E2.81.A0Improving_Uptime...
says:
To enable the migration of highly available virtual machines:
Power management must be configured for the hosts running the highly
available virtual machines.
hosted-engine and VM HA are not really the same feature, because other VMs can be managed by
the engine while the engine VM itself cannot (chicken-and-egg problem).
No, it's not mandatory for hosted engine, but it would still be better to configure it for
power management itself.
Thanks,
Cong
-----Original Message-----
From: Yue, Cong
Sent: Friday, December 19, 2014 10:22 AM
To: 'Simone Tiraboschi'
Cc: users(a)ovirt.org
Subject: RE: [ovirt-users] VM failover with ovirt3.5
Thanks for the information. These are the logs from my three ovirt nodes.
The output of hosted-engine --vm-status shows that the engine state for
my 2nd and 3rd ovirt nodes is DOWN.
Is this the reason why VM failover does not work in my environment? How can I make
the engine also work on my 2nd and 3rd ovirt nodes?
--
--== Host 1 status ==--
Status up-to-date : True
Hostname : 10.0.0.94
Host ID : 1
Engine status : {"health": "good", "vm": "up", "detail": "up"}
Score : 2400
Local maintenance : False
Host timestamp : 150475
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=150475 (Fri Dec 19 13:12:18 2014)
host-id=1
score=2400
maintenance=False
state=EngineUp
--== Host 2 status ==--
Status up-to-date : True
Hostname : 10.0.0.93
Host ID : 2
Engine status : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score : 2400
Local maintenance : False
Host timestamp : 1572
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=1572 (Fri Dec 19 10:12:18 2014)
host-id=2
score=2400
maintenance=False
state=EngineDown
--== Host 3 status ==--
Status up-to-date : False
Hostname : 10.0.0.92
Host ID : 3
Engine status : unknown stale-data
Score : 2400
Local maintenance : False
Host timestamp : 987
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=987 (Fri Dec 19 10:09:58 2014)
host-id=3
score=2400
maintenance=False
state=EngineDown
--
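A minimal sketch for picking out hosts whose status is not up to date from output like the above. It assumes the "Key : value" layout and the "--== Host N status ==--" headers shown here; the function names parse_vm_status and stale_hosts are made up for illustration and are not part of the hosted-engine tooling. The sample text is a shortened copy of the output above.

```python
import re

def parse_vm_status(text):
    """Split `hosted-engine --vm-status` style text into one dict per host."""
    hosts = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        header = re.match(r"--== Host (\d+) status ==--", line)
        if header:
            # A new host section starts here.
            current = {"Host": int(header.group(1))}
            hosts.append(current)
        elif current is not None and " : " in line:
            # "Key : value" pairs; the key=value metadata lines are skipped.
            key, value = line.split(" : ", 1)
            current[key.strip()] = value.strip()
    return hosts

def stale_hosts(hosts):
    """Return hostnames whose 'Status up-to-date' field is not True."""
    return [h["Hostname"] for h in hosts if h.get("Status up-to-date") != "True"]

# Shortened copy of the output above (assumption: same layout on a real host).
SAMPLE = """\
--== Host 1 status ==--
Status up-to-date : True
Hostname : 10.0.0.94
Score : 2400
--== Host 3 status ==--
Status up-to-date : False
Hostname : 10.0.0.92
Engine status : unknown stale-data
Score : 2400
"""

hosts = parse_vm_status(SAMPLE)
print(stale_hosts(hosts))
```

On a real host the text could be captured by running hosted-engine --vm-status and passing its output to parse_vm_status; here only host 3 (10.0.0.92) would be reported, since it is the one marked "Status up-to-date : False".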
The /var/log/ovirt-hosted-engine-ha/agent.log contents for the three ovirt nodes
are as follows:
--
10.0.0.94 (hosted-engine-1)
---
MainThread::INFO::2014-12-19
13:09:33,716::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Current state EngineUp (score: 2400)
MainThread::INFO::2014-12-19
13:09:33,716::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Best remote host 10.0.0.93 (id: 2, score: 2400)
MainThread::INFO::2014-12-19
13:09:44,017::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Current state EngineUp (score: 2400)
MainThread::INFO::2014-12-19
13:09:44,017::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Best remote host 10.0.0.93 (id: 2, score: 2400)
MainThread::INFO::2014-12-19
13:09:54,303::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Current state EngineUp (score: 2400)
MainThread::INFO::2014-12-19
13:09:54,303::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Best remote host 10.0.0.93 (id: 2, score: 2400)
MainThread::INFO::2014-12-19
13:10:04,342::states::394::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume)
Engine vm running on localhost
MainThread::INFO::2014-12-19
13:10:04,617::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Current state EngineUp (score: 2400)
MainThread::INFO::2014-12-19
13:10:04,617::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Best remote host 10.0.0.93 (id: 2, score: 2400)
MainThread::INFO::2014-12-19
13:10:14,657::state_machine::160::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh)
Global metadata: {'maintenance': False}
MainThread::INFO::2014-12-19
13:10:14,657::state_machine::165::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh)
Host 10.0.0.93 (id 2): {'extra': 'metadata_parse_version=1\nmetadata_feature_version=1\ntimestamp=1448 (Fri Dec 19 10:10:14 2014)\nhost-id=2\nscore=2400\nmaintenance=False\nstate=EngineDown\n', 'hostname': '10.0.0.93', 'alive': True, 'host-id': 2, 'engine-status': {'reason': 'vm not running on this host', 'health': 'bad', 'vm': 'down', 'detail': 'unknown'}, 'score': 2400, 'maintenance': False, 'host-ts': 1448}
MainThread::INFO::2014-12-19
13:10:14,657::state_machine::165::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh)
Host 10.0.0.92 (id 3): {'extra': 'metadata_parse_version=1\nmetadata_feature_version=1\ntimestamp=987 (Fri Dec 19 10:09:58 2014)\nhost-id=3\nscore=2400\nmaintenance=False\nstate=EngineDown\n', 'hostname': '10.0.0.92', 'alive': True, 'host-id': 3, 'engine-status': {'reason': 'vm not running on this host', 'health': 'bad', 'vm': 'down', 'detail': 'unknown'}, 'score': 2400, 'maintenance': False, 'host-ts': 987}
MainThread::INFO::2014-12-19
13:10:14,658::state_machine::168::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh)
Local (id 1): {'engine-health': {'health': 'good', 'vm': 'up', 'detail': 'up'}, 'bridge': True, 'mem-free': 1079.0, 'maintenance': False, 'cpu-load': 0.0269, 'gateway': True}
MainThread::INFO::2014-12-19
13:10:14,904::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Current state EngineUp (score: 2400)
MainThread::INFO::2014-12-19
13:10:14,904::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Best remote host 10.0.0.93 (id: 2, score: 2400)
MainThread::INFO::2014-12-19
13:10:25,210::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Current state EngineUp (score: 2400)
MainThread::INFO::2014-12-19
13:10:25,210::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Best remote host 10.0.0.93 (id: 2, score: 2400)
MainThread::INFO::2014-12-19
13:10:35,499::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Current state EngineUp (score: 2400)
MainThread::INFO::2014-12-19
13:10:35,499::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Best remote host 10.0.0.93 (id: 2, score: 2400)
MainThread::INFO::2014-12-19
13:10:45,784::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Current state EngineUp (score: 2400)
MainThread::INFO::2014-12-19
13:10:45,785::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Best remote host 10.0.0.93 (id: 2, score: 2400)
MainThread::INFO::2014-12-19
13:10:56,070::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Current state EngineUp (score: 2400)
MainThread::INFO::2014-12-19
13:10:56,070::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Best remote host 10.0.0.93 (id: 2, score: 2400)
MainThread::INFO::2014-12-19
13:11:06,109::states::394::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume)
Engine vm running on localhost
MainThread::INFO::2014-12-19
13:11:06,359::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Current state EngineUp (score: 2400)
MainThread::INFO::2014-12-19
13:11:06,359::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Best remote host 10.0.0.93 (id: 2, score: 2400)
MainThread::INFO::2014-12-19
13:11:16,658::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Current state EngineUp (score: 2400)
MainThread::INFO::2014-12-19
13:11:16,658::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Best remote host 10.0.0.93 (id: 2, score: 2400)
MainThread::INFO::2014-12-19
13:11:26,991::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Current state EngineUp (score: 2400)
MainThread::INFO::2014-12-19
13:11:26,991::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Best remote host 10.0.0.93 (id: 2, score: 2400)
MainThread::INFO::2014-12-19
13:11:37,341::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Current state EngineUp (score: 2400)
MainThread::INFO::2014-12-19
13:11:37,341::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Best remote host 10.0.0.93 (id: 2, score: 2400)
----
10.0.0.93 (hosted-engine-2)
MainThread::INFO::2014-12-19
10:12:18,339::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Current state EngineDown (score: 2400)
MainThread::INFO::2014-12-19
10:12:18,339::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Best remote host 10.0.0.94 (id: 1, score: 2400)
MainThread::INFO::2014-12-19
10:12:28,651::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Current state EngineDown (score: 2400)
MainThread::INFO::2014-12-19
10:12:28,652::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Best remote host 10.0.0.94 (id: 1, score: 2400)
MainThread::INFO::2014-12-19
10:12:39,010::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Current state EngineDown (score: 2400)
MainThread::INFO::2014-12-19
10:12:39,010::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Best remote host 10.0.0.94 (id: 1, score: 2400)
MainThread::INFO::2014-12-19
10:12:49,338::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Current state EngineDown (score: 2400)
MainThread::INFO::2014-12-19
10:12:49,338::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Best remote host 10.0.0.94 (id: 1, score: 2400)
MainThread::INFO::2014-12-19
10:12:59,642::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Current state EngineDown (score: 2400)
MainThread::INFO::2014-12-19
10:12:59,642::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Best remote host 10.0.0.94 (id: 1, score: 2400)
MainThread::INFO::2014-12-19
10:13:10,010::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Current state EngineDown (score: 2400)
MainThread::INFO::2014-12-19
10:13:10,010::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Best remote host 10.0.0.94 (id: 1, score: 2400)
10.0.0.92 (hosted-engine-3)
Same as 10.0.0.93.
--
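A rough sketch for condensing agent.log excerpts like these into state changes only, so the repeated "Current state" lines collapse into one entry per transition. It assumes each record sits on a single line in the actual file (the excerpts above were wrapped by the mail client) and that the field layout is the one visible here; LOG_RE, STATE_RE and state_changes are illustrative names, not part of ovirt-hosted-engine-ha.

```python
import re
from datetime import datetime

# Assumed record layout, inferred from the excerpts above:
# Thread::LEVEL::date time,msec::module::lineno::logger::(function) message
LOG_RE = re.compile(
    r"^(?P<thread>\w+)::(?P<level>\w+)::"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})::"
    r"(?P<module>\w+)::(?P<lineno>\d+)::(?P<logger>[\w.]+)::"
    r"\((?P<func>\w+)\) (?P<msg>.*)$"
)
STATE_RE = re.compile(r"Current state (\w+) \(score: (\d+)\)")

def state_changes(lines):
    """Yield (timestamp, state, score) whenever the reported state or score changes."""
    last = None
    for line in lines:
        record = LOG_RE.match(line.strip())
        if not record:
            continue  # not a log record in the assumed layout
        state = STATE_RE.match(record.group("msg"))
        if not state:
            continue  # some other message, e.g. "Best remote host ..."
        current = (state.group(1), int(state.group(2)))
        if current != last:
            ts = datetime.strptime(record.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
            yield ts, current[0], current[1]
            last = current

# Records taken from the 10.0.0.94 excerpt above, rejoined onto single lines.
PREFIX = "::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) "
SAMPLE_LINES = [
    "MainThread::INFO::2014-12-19 13:09:33,716::hosted_engine::327" + PREFIX
    + "Current state EngineUp (score: 2400)",
    "MainThread::INFO::2014-12-19 13:09:33,716::hosted_engine::332" + PREFIX
    + "Best remote host 10.0.0.93 (id: 2, score: 2400)",
    "MainThread::INFO::2014-12-19 13:09:44,017::hosted_engine::327" + PREFIX
    + "Current state EngineUp (score: 2400)",
]

for ts, state, score in state_changes(SAMPLE_LINES):
    print(ts, state, score)
```

For a whole file, the lines could be read with open("/var/log/ovirt-hosted-engine-ha/agent.log") and passed to state_changes in the same way. With the excerpts above, each host would produce a single entry, because the state never changes within the captured window.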
-----Original Message-----
From: Simone Tiraboschi [mailto:stirabos@redhat.com]
Sent: Friday, December 19, 2014 12:28 AM
To: Yue, Cong
Cc: users(a)ovirt.org
Subject: Re: [ovirt-users] VM failover with ovirt3.5
----- Original Message -----
> From: "Cong Yue" <Cong_Yue(a)alliedtelesis.com>
> To: users(a)ovirt.org
> Sent: Friday, December 19, 2014 2:14:33 AM
> Subject: [ovirt-users] VM failover with ovirt3.5
>
>
>
> Hi
>
>
>
> In my environment, I have 3 ovirt nodes in one cluster. On top of
> host-1, there is one VM that hosts the ovirt engine.
>
> I also have one external storage that the cluster uses as the data domain
> for the engine and the data.
>
> I confirmed live migration works well in my environment.
>
> But VM failover seems very buggy when I force one ovirt node to shut
> down. Sometimes the VM on the node that was shut down migrates to
> another host, but it takes several minutes.
>
> Sometimes it cannot migrate at all. Sometimes the VM only begins to
> move once the host is back.
Can you please check or share the logs under /var/log/ovirt-hosted-engine-ha/?
> Is there any documentation explaining how VM failover works? And
> are there any reported bugs related to this?
http://www.ovirt.org/Features/Self_Hosted_Engine#Agent_State_Diagram
> Thanks in advance,
>
> Cong
>
>
>
>
> This e-mail message is for the sole use of the intended recipient(s)
> and may contain confidential and privileged information. Any
> unauthorized review, use, disclosure or distribution is prohibited. If
> you are not the intended recipient, please contact the sender by reply
> e-mail and destroy all copies of the original message. If you are the
> intended recipient, please be advised that the content of this message
> is subject to access, review and disclosure by the sender's e-mail System
> Administrator.
>
> _______________________________________________
> Users mailing list
> Users(a)ovirt.org
>
http://lists.ovirt.org/mailman/listinfo/users
>