[ovirt-users] VM failover with ovirt3.5

Simone Tiraboschi stirabos at redhat.com
Mon Dec 22 13:33:36 UTC 2014



----- Original Message -----
> From: "Cong Yue" <Cong_Yue at alliedtelesis.com>
> To: "Simone Tiraboschi" <stirabos at redhat.com>
> Cc: users at ovirt.org
> Sent: Friday, December 19, 2014 9:25:32 PM
> Subject: RE: [ovirt-users] VM failover with ovirt3.5
> 
> In the documentation of
> http://www.ovirt.org/OVirt_Administration_Guide#.E2.81.A0Improving_Uptime_with_Virtual_Machine_High_Availability
>  it says
> To enable the migration of highly available virtual machines:
> Power management must be configured for the hosts running the highly
> available virtual machines.

Hosted engine and VM HA are not really the same feature: other VMs can be managed by the engine, while the engine VM itself cannot (a chicken-and-egg problem).

> Does this mean I need to configure power management for all oVirt nodes?

No, it's not mandatory for hosted engine, but it would still be better to configure it for its own sake: as the guide you quote says, HA for the regular VMs depends on power management.

> Thanks,
> Cong
> 
> -----Original Message-----
> From: Yue, Cong
> Sent: Friday, December 19, 2014 10:22 AM
> To: 'Simone Tiraboschi'
> Cc: users at ovirt.org
> Subject: RE: [ovirt-users] VM failover with ovirt3.5
> 
> Thanks for the information. This is the log for my three oVirt nodes.
> From the output of hosted-engine --vm-status, the engine state for
> my 2nd and 3rd oVirt nodes is DOWN.
> Is this the reason why VM failover does not work in my environment? How can I
> make the engine also work on my 2nd and 3rd oVirt nodes?
> --
> --== Host 1 status ==--
> 
> Status up-to-date                  : True
> Hostname                           : 10.0.0.94
> Host ID                            : 1
> Engine status                      : {"health": "good", "vm": "up", "detail": "up"}
> Score                              : 2400
> Local maintenance                  : False
> Host timestamp                     : 150475
> Extra metadata (valid at timestamp):
> metadata_parse_version=1
> metadata_feature_version=1
> timestamp=150475 (Fri Dec 19 13:12:18 2014)
> host-id=1
> score=2400
> maintenance=False
> state=EngineUp
> 
> 
> --== Host 2 status ==--
> 
> Status up-to-date                  : True
> Hostname                           : 10.0.0.93
> Host ID                            : 2
> Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
> Score                              : 2400
> Local maintenance                  : False
> Host timestamp                     : 1572
> Extra metadata (valid at timestamp):
> metadata_parse_version=1
> metadata_feature_version=1
> timestamp=1572 (Fri Dec 19 10:12:18 2014)
> host-id=2
> score=2400
> maintenance=False
> state=EngineDown
> 
> 
> --== Host 3 status ==--
> 
> Status up-to-date                  : False
> Hostname                           : 10.0.0.92
> Host ID                            : 3
> Engine status                      : unknown stale-data
> Score                              : 2400
> Local maintenance                  : False
> Host timestamp                     : 987
> Extra metadata (valid at timestamp):
> metadata_parse_version=1
> metadata_feature_version=1
> timestamp=987 (Fri Dec 19 10:09:58 2014)
> host-id=3
> score=2400
> maintenance=False
> state=EngineDown
> 
> --
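
A note on reading that status output: the engine VM only ever runs on one host at a time, so "vm": "down" with the reason "vm not running on this host" on host 2 is expected and is not a failure by itself. The entry that looks worth a closer look is host 3, which reports "Status up-to-date: False" and "unknown stale-data". If you want to check this repeatedly, here is a minimal Python sketch that parses the text form of the output. The function names and the trimmed SAMPLE are my own illustrations, not part of hosted-engine, and I am assuming the "Key : value" layout stays as you pasted it.

```python
import json
import re

# Trimmed copy of the status quoted above, one field per line (illustrative).
SAMPLE = """\
--== Host 1 status ==--

Status up-to-date                  : True
Hostname                           : 10.0.0.94
Engine status                      : {"health": "good", "vm": "up", "detail": "up"}
Score                              : 2400

--== Host 2 status ==--

Status up-to-date                  : True
Hostname                           : 10.0.0.93
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 2400

--== Host 3 status ==--

Status up-to-date                  : False
Hostname                           : 10.0.0.92
Engine status                      : unknown stale-data
Score                              : 2400
"""

HEADER = re.compile(r"^--== Host (\d+) status ==--$")


def parse_vm_status(text):
    """Return {host_id: {field: value}} from the plain-text status output."""
    hosts = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        match = HEADER.match(line)
        if match:
            current = hosts.setdefault(int(match.group(1)), {})
            continue
        if current is None or " : " not in line:
            continue
        # Split on the first " : " only; the JSON value has no space before its colons.
        key, value = line.split(" : ", 1)
        current[key.strip()] = value.strip()
    return hosts


def engine_vm_state(host):
    """Return the 'vm' field of the engine status, or 'unknown' if it is not JSON."""
    try:
        return json.loads(host.get("Engine status", "")).get("vm", "unknown")
    except ValueError:
        return "unknown"


def stale_hosts(hosts):
    """Return the ids of hosts whose metadata is not reported as up to date."""
    return sorted(h for h, fields in hosts.items()
                  if fields.get("Status up-to-date") != "True")


if __name__ == "__main__":
    hosts = parse_vm_status(SAMPLE)
    for host_id in sorted(hosts):
        print(host_id, hosts[host_id]["Hostname"], engine_vm_state(hosts[host_id]))
    print("stale:", stale_hosts(hosts))
```

In practice you would feed it the captured output of hosted-engine --vm-status instead of SAMPLE.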
> And the /var/log/ovirt-hosted-engine-ha/agent.log contents for the three
> oVirt nodes are as follows:
> --
> 10.0.0.94(hosted-engine-1)
> ---
> MainThread::INFO::2014-12-19
> 13:09:33,716::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
> Current state EngineUp (score: 2400)
> MainThread::INFO::2014-12-19
> 13:09:33,716::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
> Best remote host 10.0.0.93 (id: 2, score: 2400)
> MainThread::INFO::2014-12-19
> 13:09:44,017::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
> Current state EngineUp (score: 2400)
> MainThread::INFO::2014-12-19
> 13:09:44,017::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
> Best remote host 10.0.0.93 (id: 2, score: 2400)
> MainThread::INFO::2014-12-19
> 13:09:54,303::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
> Current state EngineUp (score: 2400)
> MainThread::INFO::2014-12-19
> 13:09:54,303::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
> Best remote host 10.0.0.93 (id: 2, score: 2400)
> MainThread::INFO::2014-12-19
> 13:10:04,342::states::394::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume)
> Engine vm running on localhost
> MainThread::INFO::2014-12-19
> 13:10:04,617::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
> Current state EngineUp (score: 2400)
> MainThread::INFO::2014-12-19
> 13:10:04,617::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
> Best remote host 10.0.0.93 (id: 2, score: 2400)
> MainThread::INFO::2014-12-19
> 13:10:14,657::state_machine::160::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh)
> Global metadata: {'maintenance': False}
> MainThread::INFO::2014-12-19
> 13:10:14,657::state_machine::165::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh)
> Host 10.0.0.93 (id 2): {'extra':
> 'metadata_parse_version=1\nmetadata_feature_version=1\ntimestamp=1448
> (Fri Dec 19 10:10:14
> 2014)\nhost-id=2\nscore=2400\nmaintenance=False\nstate=EngineDown\n',
> 'hostname': '10.0.0.93', 'alive': True, 'host-id': 2, 'engine-status':
> {'reason': 'vm not running on this host', 'health': 'bad', 'vm':
> 'down', 'detail': 'unknown'}, 'score': 2400, 'maintenance': False,
> 'host-ts': 1448}
> MainThread::INFO::2014-12-19
> 13:10:14,657::state_machine::165::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh)
> Host 10.0.0.92 (id 3): {'extra':
> 'metadata_parse_version=1\nmetadata_feature_version=1\ntimestamp=987
> (Fri Dec 19 10:09:58
> 2014)\nhost-id=3\nscore=2400\nmaintenance=False\nstate=EngineDown\n',
> 'hostname': '10.0.0.92', 'alive': True, 'host-id': 3, 'engine-status':
> {'reason': 'vm not running on this host', 'health': 'bad', 'vm':
> 'down', 'detail': 'unknown'}, 'score': 2400, 'maintenance': False,
> 'host-ts': 987}
> MainThread::INFO::2014-12-19
> 13:10:14,658::state_machine::168::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh)
> Local (id 1): {'engine-health': {'health': 'good', 'vm': 'up',
> 'detail': 'up'}, 'bridge': True, 'mem-free': 1079.0, 'maintenance':
> False, 'cpu-load': 0.0269, 'gateway': True}
> MainThread::INFO::2014-12-19
> 13:10:14,904::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
> Current state EngineUp (score: 2400)
> MainThread::INFO::2014-12-19
> 13:10:14,904::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
> Best remote host 10.0.0.93 (id: 2, score: 2400)
> MainThread::INFO::2014-12-19
> 13:10:25,210::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
> Current state EngineUp (score: 2400)
> MainThread::INFO::2014-12-19
> 13:10:25,210::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
> Best remote host 10.0.0.93 (id: 2, score: 2400)
> MainThread::INFO::2014-12-19
> 13:10:35,499::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
> Current state EngineUp (score: 2400)
> MainThread::INFO::2014-12-19
> 13:10:35,499::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
> Best remote host 10.0.0.93 (id: 2, score: 2400)
> MainThread::INFO::2014-12-19
> 13:10:45,784::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
> Current state EngineUp (score: 2400)
> MainThread::INFO::2014-12-19
> 13:10:45,785::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
> Best remote host 10.0.0.93 (id: 2, score: 2400)
> MainThread::INFO::2014-12-19
> 13:10:56,070::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
> Current state EngineUp (score: 2400)
> MainThread::INFO::2014-12-19
> 13:10:56,070::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
> Best remote host 10.0.0.93 (id: 2, score: 2400)
> MainThread::INFO::2014-12-19
> 13:11:06,109::states::394::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume)
> Engine vm running on localhost
> MainThread::INFO::2014-12-19
> 13:11:06,359::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
> Current state EngineUp (score: 2400)
> MainThread::INFO::2014-12-19
> 13:11:06,359::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
> Best remote host 10.0.0.93 (id: 2, score: 2400)
> MainThread::INFO::2014-12-19
> 13:11:16,658::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
> Current state EngineUp (score: 2400)
> MainThread::INFO::2014-12-19
> 13:11:16,658::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
> Best remote host 10.0.0.93 (id: 2, score: 2400)
> MainThread::INFO::2014-12-19
> 13:11:26,991::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
> Current state EngineUp (score: 2400)
> MainThread::INFO::2014-12-19
> 13:11:26,991::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
> Best remote host 10.0.0.93 (id: 2, score: 2400)
> MainThread::INFO::2014-12-19
> 13:11:37,341::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
> Current state EngineUp (score: 2400)
> MainThread::INFO::2014-12-19
> 13:11:37,341::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
> Best remote host 10.0.0.93 (id: 2, score: 2400)
> ----
> 
> 10.0.0.93 (hosted-engine-2)
> MainThread::INFO::2014-12-19
> 10:12:18,339::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
> Current state EngineDown (score: 2400)
> MainThread::INFO::2014-12-19
> 10:12:18,339::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
> Best remote host 10.0.0.94 (id: 1, score: 2400)
> MainThread::INFO::2014-12-19
> 10:12:28,651::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
> Current state EngineDown (score: 2400)
> MainThread::INFO::2014-12-19
> 10:12:28,652::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
> Best remote host 10.0.0.94 (id: 1, score: 2400)
> MainThread::INFO::2014-12-19
> 10:12:39,010::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
> Current state EngineDown (score: 2400)
> MainThread::INFO::2014-12-19
> 10:12:39,010::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
> Best remote host 10.0.0.94 (id: 1, score: 2400)
> MainThread::INFO::2014-12-19
> 10:12:49,338::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
> Current state EngineDown (score: 2400)
> MainThread::INFO::2014-12-19
> 10:12:49,338::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
> Best remote host 10.0.0.94 (id: 1, score: 2400)
> MainThread::INFO::2014-12-19
> 10:12:59,642::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
> Current state EngineDown (score: 2400)
> MainThread::INFO::2014-12-19
> 10:12:59,642::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
> Best remote host 10.0.0.94 (id: 1, score: 2400)
> MainThread::INFO::2014-12-19
> 10:13:10,010::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
> Current state EngineDown (score: 2400)
> MainThread::INFO::2014-12-19
> 10:13:10,010::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
> Best remote host 10.0.0.94 (id: 1, score: 2400)
> 
> 
> 10.0.0.92(hosted-engine-3)
> same as 10.0.0.93
> --
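
To see when an agent actually changes state, rather than scrolling through the repeated "Current state" lines, something like the following could help. It is only a sketch: the regular expression is inferred from the excerpts you pasted, it assumes each record is a single line in the file on disk (the mail wrapped them here), and state_changes is a name I made up.

```python
import re
from datetime import datetime

# Pattern inferred from the agent.log excerpts quoted above (an assumption).
STATE_RE = re.compile(
    r"^(?P<thread>\w+)::(?P<level>\w+)::"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+::"
    r".*Current state (?P<state>\w+) \(score: (?P<score>\d+)\)"
)

# Three records from the hosted-engine-2 excerpt, joined back onto single lines.
SAMPLE_LINES = [
    "MainThread::INFO::2014-12-19 10:12:18,339::hosted_engine::327::"
    "ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) "
    "Current state EngineDown (score: 2400)",
    "MainThread::INFO::2014-12-19 10:12:18,339::hosted_engine::332::"
    "ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) "
    "Best remote host 10.0.0.94 (id: 1, score: 2400)",
    "MainThread::INFO::2014-12-19 10:12:28,651::hosted_engine::327::"
    "ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) "
    "Current state EngineDown (score: 2400)",
]


def state_changes(lines):
    """Yield (timestamp, state, score) whenever the state or score changes."""
    previous = None
    for line in lines:
        match = STATE_RE.match(line)
        if not match:
            continue  # not a "Current state" record
        current = (match.group("state"), int(match.group("score")))
        if current != previous:
            when = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S")
            yield (when, current[0], current[1])
            previous = current


if __name__ == "__main__":
    # On a host you would iterate over the real file instead, for example:
    #   with open("/var/log/ovirt-hosted-engine-ha/agent.log") as log:
    #       for change in state_changes(log): ...
    for when, state, score in state_changes(SAMPLE_LINES):
        print(when.isoformat(), state, score)
```

Running this on each host around the moment you power one off should make it easier to compare how long each agent takes to react.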
> 
> -----Original Message-----
> From: Simone Tiraboschi [mailto:stirabos at redhat.com]
> Sent: Friday, December 19, 2014 12:28 AM
> To: Yue, Cong
> Cc: users at ovirt.org
> Subject: Re: [ovirt-users] VM failover with ovirt3.5
> 
> 
> 
> ----- Original Message -----
> > From: "Cong Yue" <Cong_Yue at alliedtelesis.com>
> > To: users at ovirt.org
> > Sent: Friday, December 19, 2014 2:14:33 AM
> > Subject: [ovirt-users] VM failover with ovirt3.5
> >
> >
> >
> > Hi
> >
> >
> >
> > In my environment, I have 3 oVirt nodes in one cluster. On top of
> > host-1, there is one VM that hosts the oVirt engine.
> >
> > I also have one external storage for the cluster, used as the data
> > domain for both the engine and the data.
> >
> > I confirmed live migration works well in my environment.
> >
> > But VM failover seems very buggy if I force one oVirt node to shut
> > down. Sometimes the VM on the node that was shut down does migrate to
> > another host, but it takes several minutes.
> >
> > Sometimes it cannot migrate at all. Sometimes the VM only begins to
> > move once the host is back.
> 
> Can you please check or share the logs under
> /var/log/ovirt-hosted-engine-ha/ ?
> 
> > Is there any documentation that explains how VM failover works? And
> > are there any reported bugs related to this?
> 
> http://www.ovirt.org/Features/Self_Hosted_Engine#Agent_State_Diagram
> 
> > Thanks in advance,
> >
> > Cong
> >
> >
> >
> >
> > This e-mail message is for the sole use of the intended recipient(s)
> > and may contain confidential and privileged information. Any
> > unauthorized review, use, disclosure or distribution is prohibited. If
> > you are not the intended recipient, please contact the sender by reply
> > e-mail and destroy all copies of the original message. If you are the
> > intended recipient, please be advised that the content of this message
> > is subject to access, review and disclosure by the sender's e-mail System
> > Administrator.
> >
> 
> 


