[Users] High Availability

René Koch r.koch at ovido.at
Tue Apr 16 12:31:48 UTC 2013


 
-----Original message-----
> From:suporte at logicworks.pt <suporte at logicworks.pt>
> Sent: Tuesday 16th April 2013 14:03
> To: Gianluca Cecchi <gianluca.cecchi at gmail.com>
> Cc: René Koch <r.koch at ovido.at>; users <Users at ovirt.org>
> Subject: Re: [Users] High Availability
> 
> Well, we had also disconnected the iLO NIC cable. We did another test and disconnected all NIC cables except the iLO one, and voilà, HA took about 3 minutes to migrate the VM to the other host. We also noticed that the manager rebooted the failed host. For a more realistic scenario we disconnected the power cable from the host, and after about 2 or 3 minutes the manager put the host in the non-responsive state and the VM in the unknown state. Is this the correct behavior?


Fencing means that the non-responsive host gets reset (powered off and on).
If fencing isn't working (you disconnected the power cable, so iLO can't send a success message), the VMs won't get started on another host.
In your example this may seem strange, but let's have a look at the following scenario:
- You have 2 datacenters with 1 hypervisor in DC 1 and 1 hypervisor in DC 2; ovirt-engine is running in DC 1
- Connection between the DCs is lost
- Fencing isn't working
- VM is running on the host in DC 2
- If the VM were started on the host in DC 1 without successful fencing, your VM disk would be corrupted (the hosts in DC 2 and DC 1 would both be writing to the same storage file)

Maybe there are better examples than this one (it would be interesting to know what your storage metro-cluster does in this split-brain situation), but I hope it's clear why fencing works the way it does and what could happen if it were less restrictive...
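
To make the rule concrete, here is a minimal Python sketch of the decision described above. It is not oVirt's actual code: the Host class, its fields, and the functions fence() and may_restart_ha_vms() are hypothetical names chosen for illustration, and the model assumes that a fence attempt succeeds exactly when the power management device answers.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    responsive: bool                  # can the engine still talk to the host?
    fence_device_reachable: bool      # does the power management card (e.g. iLO) answer?
    manually_confirmed: bool = False  # admin confirmed "host has been rebooted"


def fence(host: Host) -> bool:
    """Assumed model: fencing succeeds only if the fence device answers."""
    return host.fence_device_reachable


def may_restart_ha_vms(host: Host) -> bool:
    """Return True only when no second copy of a VM can still be running."""
    if host.responsive:
        return False  # host is healthy, its VMs are still running there
    if host.manually_confirmed:
        return True   # a human vouches that the host is really down
    return fence(host)  # otherwise only a confirmed fence makes a restart safe


scenarios = {
    "data NICs unplugged, iLO reachable": Host("host1", False, True),
    "power cable pulled, iLO dead": Host("host1", False, False),
    "power cable pulled, admin confirmed reboot": Host("host1", False, False, True),
}
for label, host in scenarios.items():
    print(f"{label}: restart VMs elsewhere = {may_restart_ha_vms(host)}")
```

The second scenario is the one from your test: with the power cable pulled, nothing can confirm that the host is really off, so the sketch refuses to restart the VM until someone confirms it manually.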


Regards,
René


> 
> Regards
> Jose
> 
> ----- Original message -----
> From: "Gianluca Cecchi" <gianluca.cecchi at gmail.com>
> To: suporte at logicworks.pt
> Cc: "René Koch (ovido)" <r.koch at ovido.at>, "users" <Users at ovirt.org>
> Sent: Tuesday, 16 April, 2013 12:12:43
> Subject: Re: [Users] High Availability
> 
> On Tue, Apr 16, 2013 at 12:56 PM,  suporte wrote:
> > Hi,
> >
> > We have 2 Fujitsu servers and one iSCSI storage domain. The servers have the power management configured with ilo3.
> > We can live migrate a VM and when rebooting the host of that VM it does the migration to the other host.
> >
> > To test high availability we disconnected all NIC cables of the VM's host. The VM did not migrate to the other host; we had to manually confirm that the host had been rebooted, and only then did the migration happen.
> >
> > Is this the correct behavior? Do we have to manually confirm that the host has been rebooted for HA to happen?
> >
> > Regards
> > Jose
> 
> Hello,
> when you say "we disconnected all NIC cables" you mean "we
> disconnected all NIC cables but the ones connected to the iLO
> interface", correct?
> Because to know that one host has successfully fenced the problematic
> one, it has to send a get status message and see that it is off or
> that it has been successfully rebooted.....
> 
> For example, in RHCS, if you configure iLO as a fencing device it
> remains indefinitely in a state similar to
> 
> wait for fence to complete
> 
> if the "fencer" is not able to get an acknowledgement of the operation
> or to reach the other node's iLO.
> Probably you can find something in your logs...
> 
> Gianluca
> 


