[Users] HA

Fri Apr 4 13:35:46 UTC 2014

----- Original Message -----
> From: "Itamar Heim" <iheim at redhat.com>
> To: "Koen Vanoppen" <vanoppen.koen at gmail.com>, "Doron Fediuck" <dfediuck at redhat.com>, users at ovirt.org
> Sent: Friday, April 4, 2014 3:27:07 PM
> Subject: Re: [Users] HA
> 
> On 04/04/2014 03:21 PM, Koen Vanoppen wrote:
> > So... is a fully automatic migration of the VM to another hypervisor
> > possible if the storage connection fails?
> > How can we make this happen? At the moment, when we tested this
> > situation, the VMs stayed in the paused state.
> > (Test situation:
> >
> >   * Unplug the 2 fibre cables from the hypervisor
> >   * VMs go into the paused state
> >   * VMs stayed in the paused state until the failure was solved
> >
> > )
> 
> The KVM team advised that this would be an unsafe migration. IIRC, IO
> can be stuck at the kernel level, pending a write to the storage, which
> would cause corruption if the storage recovers while the VM is already
> running on another machine.

Correct.

Migrating a VM that was paused due to EIO is deemed unsafe and might lead to data corruption.

There is a feature that automatically resumes the VM once storage connectivity is regained.

In addition, you can manually fence the host (if you have a fencing device configured) and then run the VM somewhere else (or you can define the VM as highly available and the engine will run it again for you).

Anyway, just to be on the safe side: I saw earlier in the thread a comment about "host has been rebooted".
Do not use it unless you have actually rebooted the host. Otherwise you risk the split-brain scenario Doron mentioned.
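
To make the rules above concrete, here is a rough sketch of the decision for a single VM. This is only an illustration: the names (VmState, next_action and the returned action strings) are made up for this mail and are not the engine's actual code or API.

```python
from enum import Enum


class VmState(Enum):
    RUNNING = "running"
    PAUSED_EIO = "paused_eio"  # paused because of a storage I/O error


def next_action(vm_state, storage_ok, host_fenced, vm_is_ha):
    """Pick a safe action for one VM (hypothetical helper, not oVirt code)."""
    if vm_state is not VmState.PAUSED_EIO:
        return "none"
    if storage_ok:
        # Storage connectivity is back: the VM can be resumed in place.
        return "resume"
    if host_fenced:
        # The host is known to be down, so no stuck write can reach the
        # storage later. An HA VM is restarted by the engine, otherwise
        # the admin starts it on another host.
        return "engine_restart" if vm_is_ha else "manual_restart"
    # Never migrate here: I/O may still be pending in the host's kernel.
    return "wait"


if __name__ == "__main__":
    print(next_action(VmState.PAUSED_EIO, False, False, True))
```

The important branch is the last one: without fencing (or a real reboot of the host), waiting for the storage to come back is the only safe choice.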

> 
> >
> >
> > They only returned when we restored the fiber connection to the
> > Hypervisor...
> >
> > Kind Regards,
> >
> > Koen
> >
> >
> >
> >     2014-04-03 16:53 GMT+02:00 Koen Vanoppen <vanoppen.koen at gmail.com>:
> >
> >         ---------- Forwarded message ----------
> >         From: "Doron Fediuck" <dfediuck at redhat.com>
> >         Date: Apr 3, 2014 4:51 PM
> >         Subject: Re: [Users] HA
> >         To: "Koen Vanoppen" <vanoppen.koen at gmail.com>
> >         Cc: "Omer Frenkel" <ofrenkel at redhat.com>, <users at ovirt.org>,
> >         "Federico Simoncelli" <fsimonce at redhat.com>,
> >         "Allon Mureinik" <amureini at redhat.com>
> >
> >
> >
> >         ----- Original Message -----
> >          > From: "Koen Vanoppen" <vanoppen.koen at gmail.com>
> >          > To: "Omer Frenkel" <ofrenkel at redhat.com>, users at ovirt.org
> >          > Sent: Wednesday, April 2, 2014 4:17:36 PM
> >          > Subject: Re: [Users] HA
> >          >
> >          > Yes, indeed. I meant not-operational. Sorry.
> >          > So, if I understand this correctly: if we ever end up in a
> >          > situation where we lose both storage connections on our
> >          > hypervisor, we will have to manually restore the connections
> >          > first?
> >          >
> >          > And thanks for the tip for speeding things up :-).
> >          >
> >          > Kind regards,
> >          >
> >          > Koen
> >          >
> >          >
> >          > 2014-04-02 15:14 GMT+02:00 Omer Frenkel <ofrenkel at redhat.com>:
> >          >
> >          >
> >          >
> >          >
> >          >
> >          > ----- Original Message -----
> >          > > From: "Koen Vanoppen" <vanoppen.koen at gmail.com>
> >          > > To: users at ovirt.org
> >          > > Sent: Wednesday, April 2, 2014 4:07:19 PM
> >          > > Subject: [Users] HA
> >          > >
> >          > > Dear All,
> >          > >
> >          > > During our acceptance testing, we discovered something.
> >          > > (Document will follow.)
> >          > > When we disable one fiber path, no problem: multipath finds
> >          > > its way and no pings are lost.
> >          > > BUT when we disabled both fiber paths (so one of the storage
> >          > > domains is gone on this host, but still available on the
> >          > > other host), VMs go into paused mode... It chooses a new SPM
> >          > > (can we speed this up?), puts the host in non-responsive
> >          > > (can we speed this up? more important) and the VMs stay in
> >          > > paused mode... I would expect that they would be migrated
> >          > > (yes, HA is
> >          >
> >          > I guess you mean the host moves to not-operational (in
> >          > contrast to non-responsive)?
> >          > If so, the engine will not migrate VMs that are paused due to
> >          > an I/O error, because of the data corruption risk.
> >          >
> >          > To speed this up you can look at the storage domain monitoring
> >          > timeout:
> >          > engine-config --get StorageDomainFalureTimeoutInMinutes
> >          >
> >          >
> >          > > enabled) to the other host and rebooted there... Any
> >          > > solution? We are still using oVirt 3.3.1, but we are
> >          > > planning an upgrade to 3.4 after the Easter holiday.
> >          > >
> >          > > Kind Regards,
> >          > >
> >          > > Koen
> >          > >
> >
> >         Hi Koen,
> >         Resuming from paused due to I/O issues is supported (adding
> >         relevant folks).
> >         Regardless, if you did not define power management, you should
> >         manually confirm that the source host was rebooted in order for
> >         migration to proceed. Otherwise we risk a split-brain scenario.
> >
> >         Doron
> >
> >
> >
> >
> >
> > _______________________________________________
> > Users mailing list
> > Users at ovirt.org
> > http://lists.ovirt.org/mailman/listinfo/users
> >
> 