Power management is configured correctly. And as long as the host that loses its storage isn't the SPM, there is no problem.
If I can make it so that, when a VM is paused, it gets switched off and (HA-style) reboots itself, I'm perfectly happy :-).

Kind regards,


---------- Forwarded message ----------
From: Koen Vanoppen <vanoppen.koen@gmail.com>
Date: 2014-04-11 14:47 GMT+02:00
Subject: Re: [ovirt-users] [Users] HA
To: Michal Skrivanek <michal.skrivanek@redhat.com>


Power management is configured correctly. And as long as the host that loses its storage isn't the SPM, there is no problem.
If I can make it so that, when a VM is paused, it gets switched off and (HA-style) reboots itself, I'm perfectly happy :-).

Kind regards,




2014-04-11 9:37 GMT+02:00 Michal Skrivanek <michal.skrivanek@redhat.com>:


On 11 Apr 2014, at 09:00, Koen Vanoppen wrote:

Hi All,

Any news about this? A VDSM hook or anything?
Thanks!

Kind regards


2014-04-09 9:37 GMT+02:00 Omer Frenkel <ofrenkel@redhat.com>:


----- Original Message -----
> From: "Koen Vanoppen" <vanoppen.koen@gmail.com>
> To: users@ovirt.org
> Sent: Tuesday, April 8, 2014 3:41:02 PM
> Subject: Re: [Users] HA
>
> Or in other words, the SPM and the VM should move almost immediately after
> the storage connections on the hypervisor are gone. I know, maybe I'm asking
> too much, but we would be very happy :-) :-).
>
> So sketch:
>
> Mercury1 (SPM)
> Mercury2
>
> Mercury1 loses both fibre connections --> goes non-operational, and the VM
> goes into a paused state and stays that way until I manually reboot the host
> so it fences.
>
> What I would like is that when Mercury1 loses both fibre connections, it
> fences immediately, so the VMs are also moved almost instantly... If this is
> possible... :-)
>
> Kind regards and thanks for all the help!
>

Michal, is there a vdsm hook for a VM moving to pause?
If so, you could send KILL to it, and the engine will identify the VM as killed + HA,
so it will be restarted, with no need to reboot the host; it will stay non-operational until the storage is fixed.
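As a sketch of that idea (not from the thread): a hook script that sends SIGKILL to the paused VM's qemu process so the engine sees the VM as killed and, with HA enabled, restarts it elsewhere. The hook point directory and the `vmId` environment variable are assumptions; verify both against your vdsm version's hook documentation before trying anything like this.

```shell
#!/bin/sh
# Hypothetical sketch of the KILL-on-pause idea above. The hook point
# (e.g. /usr/libexec/vdsm/hooks/after_vm_pause/) and the $vmId variable
# are assumptions -- check your vdsm version.

kill_paused_vm() {
    # $1: VM UUID; find the matching qemu process and SIGKILL it.
    pid=$(pgrep -f "qemu.*$1" | head -n 1)
    [ -n "$pid" ] && kill -9 "$pid"
    return 0
}

# vdsm is assumed to pass the VM UUID in $vmId
if [ -n "${vmId:-}" ]; then
    kill_paused_vm "$vmId"
fi
```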

You have to differentiate: if only the VMs were paused, yes, you can do anything (you can also change the error reporting policy to not pause the VM).
But if the host becomes non-operational, then it simply doesn't work; vdsm got stuck somewhere (often in getting block device stats).
A proper power management config should fence it.
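For reference, a sketch of triggering such a fence manually through the oVirt 3.x REST API instead of rebooting the host by hand. The URL, credentials and host id below are placeholders, not values from the thread, and the exact API shape should be checked against your engine version.

```shell
#!/bin/sh
# Sketch only: manually fence (restart) a host via the engine REST API.
# All concrete values here are placeholders.

fence_body() {
    # Build the XML action body for a fence request
    # ($1 is one of: status, start, stop, restart).
    printf '<action><fence_type>%s</fence_type></action>' "$1"
}

fence_host() {
    # $1: engine base URL, $2: host id, $3: fence type
    curl -k -u admin@internal:password \
         -H 'Content-Type: application/xml' \
         -d "$(fence_body "$3")" \
         "$1/api/hosts/$2/fence"
}

# Example (placeholder values):
# fence_host https://engine.example.com HOST-UUID restart
```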

Thanks,
michal


>
>
> 2014-04-08 14:26 GMT+02:00 Koen Vanoppen < vanoppen.koen@gmail.com > :
>
>
>
> Ok,
> Thanks already for all the help. I adapted some things for a quicker response:
> engine-config --get FenceQuietTimeBetweenOperationsInSec-->180
> engine-config --set FenceQuietTimeBetweenOperationsInSec=60
>
> engine-config --get StorageDomainFalureTimeoutInMinutes-->180
> engine-config --set StorageDomainFalureTimeoutInMinutes=1
>
> engine-config --get SpmCommandFailOverRetries-->5
> engine-config --set SpmCommandFailOverRetries
>
> engine-config --get SPMFailOverAttempts-->3
> engine-config --set SPMFailOverAttempts=1
>
> engine-config --get NumberOfFailedRunsOnVds-->3
> engine-config --set NumberOfFailedRunsOnVds=1
>
> engine-config --get vdsTimeout-->180
> engine-config --set vdsTimeout=30
>
> engine-config --get VDSAttemptsToResetCount-->2
> engine-config --set VDSAttemptsToResetCount=1
>
> engine-config --get TimeoutToResetVdsInSeconds-->60
> engine-config --set TimeoutToResetVdsInSeconds=30
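The get/set pairs above can be applied in one reviewable pass; here is a sketch (not from the original mail) that prints the set commands so they can be checked before piping to a shell. Key names are copied verbatim from the thread (note that "Falure" is the actual spelling of that oVirt key); the SpmCommandFailOverRetries line is left out because its new value was not given above.

```shell
#!/bin/sh
# Sketch: emit the engine-config changes from the mail above so they can
# be reviewed first, then applied with: emit_engine_tuning | sh
# (engine-config changes need an ovirt-engine restart to take effect).

emit_engine_tuning() {
    cat <<'EOF'
engine-config --set FenceQuietTimeBetweenOperationsInSec=60
engine-config --set StorageDomainFalureTimeoutInMinutes=1
engine-config --set SPMFailOverAttempts=1
engine-config --set NumberOfFailedRunsOnVds=1
engine-config --set vdsTimeout=30
engine-config --set VDSAttemptsToResetCount=1
engine-config --set TimeoutToResetVdsInSeconds=30
EOF
}
```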
>
> Now the result of this is that when the VM is not running on the SPM, it
> will migrate before going into pause mode.
> But when we tried it with the VM running on the SPM, it gets into paused
> mode (for safety reasons, I know ;-) ) and stays there until the host gets
> MANUALLY fenced by rebooting it. So now my question is... How can I make the
> hypervisor fence (so it reboots and the VM is moved) quicker?
>
> Kind regards,
>
> Koen
>
>
> 2014-04-04 16:28 GMT+02:00 Koen Vanoppen < vanoppen.koen@gmail.com > :
>
>
>
>
>
> Yes, that's true. But I was driving... So I'm just forwarding it then :-). I
> have already adjusted the timeout. It was set to 5 minutes before it would
> time out. It is now set to 2 minutes.
> On Apr 4, 2014 4:14 PM, "David Van Zeebroeck" <
> david.van.zeebroeck@brusselsairport.be > wrote:
>
>
>
>
>
>
>
>
> I have them too, you know.
>
> But normally the fencing should have worked, as I read it.
>
> So something went wrong somewhere, by the looks of it.
>
>
>
> From: Koen Vanoppen [mailto: vanoppen.koen@gmail.com ]
> Sent: vrijdag 4 april 2014 16:07
> To: David Van Zeebroeck
> Subject: Fwd: Re: [Users] HA
>
> David Van Zeebroeck
>
> Product Manager Unix Infrastructure
>
> Information & Communication Technology
>
> Brussels Airport Company
>
> T +32 (0)2 753 66 24
>
> M +32 (0)497 02 17 31
>
> david.van.zeebroeck@brusselsairport.be
>
>
>
> www.brusselsairport.be
>
>
> ---------- Forwarded message ----------
> From: "Michal Skrivanek" < michal.skrivanek@redhat.com >
> Date: Apr 4, 2014 3:39 PM
> Subject: Re: [Users] HA
> To: "Koen Vanoppen" < vanoppen.koen@gmail.com >
> Cc: "ovirt-users Users" < users@ovirt.org >
>
>
>
>
>
>
> On 4 Apr 2014, at 15:14, Sander Grendelman wrote:
>
>
>
>
>
>
> Do you have power management configured?
>
>
> Was the "failed" host fenced/rebooted?
>
>
>
>
>
> On Fri, Apr 4, 2014 at 2:21 PM, Koen Vanoppen < vanoppen.koen@gmail.com >
> wrote:
>
>
> So... Is it possible to fully automatically migrate the VM to another
> hypervisor in case the storage connection fails?
>
>
> How can we make this happen? Because for the moment, when we tested the
> situation, the VMs stayed in the paused state.
>
>
> (Test situation:
>
>     * Unplug the 2 fibre cables from the hypervisor
>     * VMs go into the paused state
>     * VMs stayed in the paused state until the failure was solved
>
>
>
>
>
> As said before, it's not safe, hence we (try to) not migrate them.
>
>
> They only get paused when they actually access the storage, which may not
> always be the case. I.e. the storage connection is severed, the host is deemed
> NonOperational and VMs are being migrated from it; then some of them will
> succeed if they didn't access that "bad" storage … the paused VMs will
> remain (mostly; it can still happen that they end up paused on the other
> host, when the disk access occurs only at the last stage of migration)
>
>
>
>
>
>
>
>
> So in other words, if you want to migrate the VMs without interruption, it
> is sometimes not possible.
>
>
> If you are fine with the VMs being restarted within a short time on another
> host, then power management/fencing will help here.
>
>
>
>
>
> Thanks,
>
>
> michal
>
>
>
>
>
>
> )
>
>
>
>
> They only returned when we restored the fibre connection to the hypervisor…
>
>
>
>
>
> yes, since 3.3 we have the autoresume feature
>
>
>
>
>
> Thanks,
>
>
> michal
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> Kind Regards,
>
> Koen
>
>
>
>
>
>
>
>
> 2014-04-04 13:52 GMT+02:00 Koen Vanoppen < vanoppen.koen@gmail.com >:
>
>
> So... Is it possible to fully automatically migrate the VM to another
> hypervisor in case the storage connection fails?
>
>
> How can we make this happen? Because for the moment, when we tested the
> situation, the VMs stayed in the paused state.
>
>
> (Test situation:
>
>     * Unplug the 2 fibre cables from the hypervisor
>     * VMs go into the paused state
>     * VMs stayed in the paused state until the failure was solved
>
>
> )
>
>
>
>
> They only returned when we restored the fibre connection to the hypervisor...
>
>
> Kind Regards,
>
> Koen
>
>
>
>
>
> 2014-04-03 16:53 GMT+02:00 Koen Vanoppen < vanoppen.koen@gmail.com >:
>
>
>
>
>
>
>
> ---------- Forwarded message ----------
> From: "Doron Fediuck" < dfediuck@redhat.com >
> Date: Apr 3, 2014 4:51 PM
> Subject: Re: [Users] HA
>
>
> To: "Koen Vanoppen" < vanoppen.koen@gmail.com >
> Cc: "Omer Frenkel" < ofrenkel@redhat.com >, < users@ovirt.org >, "Federico
> Simoncelli" < fsimonce@redhat.com >, "Allon Mureinik" < amureini@redhat.com
> >
>
>
>
> ----- Original Message -----
> > From: "Koen Vanoppen" < vanoppen.koen@gmail.com >
> > To: "Omer Frenkel" < ofrenkel@redhat.com >, users@ovirt.org
> > Sent: Wednesday, April 2, 2014 4:17:36 PM
> > Subject: Re: [Users] HA
> >
> > Yes, indeed. I meant non-operational. Sorry.
> > So, if I understand this correctly: if we ever get into a situation where we
> > lose both storage connections on our hypervisor, we will have to manually
> > restore the connections first?
> >
> > And thanks for the tip for speeding things up :-).
> >
> > Kind regards,
> >
> > Koen
> >
> >
> > 2014-04-02 15:14 GMT+02:00 Omer Frenkel < ofrenkel@redhat.com > :
> >
> >
> >
> >
> >
> > ----- Original Message -----
> > > From: "Koen Vanoppen" < vanoppen.koen@gmail.com >
> > > To: users@ovirt.org
> > > Sent: Wednesday, April 2, 2014 4:07:19 PM
> > > Subject: [Users] HA
> > >
> > > Dear All,
> > >
> > > During our acceptance testing, we discovered something. (Document will
> > > follow.)
> > > When we disable one fibre path, no problem: multipath finds its way, no
> > > pings are lost.
> > > BUT when we disabled both fibre paths (so one of the storage domains is
> > > gone on this host, but still available on the other host), VMs go into
> > > paused mode... It chooses a new SPM (can we speed this up?), puts the host
> > > in non-responsive (can we speed this up? more important) and the VMs stay
> > > in paused mode... I would expect that they would be migrated (yes, HA is
> >
> > I guess you mean the host moves to non-operational (in contrast to
> > non-responsive)?
> > If so, the engine will not migrate VMs that are paused due to an I/O error,
> > because of the data corruption risk.
> >
> > To speed things up, you can look at the storage domain monitoring timeout:
> > engine-config --get StorageDomainFalureTimeoutInMinutes
> >
> >
> > > enabled) to the other host and reboot there... Any solution? We are still
> > > using oVirt 3.3.1, but we are planning an upgrade to 3.4 after the Easter
> > > holiday.
> > >
> > > Kind Regards,
> > >
> > > Koen
> > >
>
> Hi Koen,
> Resuming from paused due to I/O issues is supported (adding relevant folks).
> Regardless, if you did not define power management, you should manually
> confirm that the source host was rebooted in order for migration to proceed.
> Otherwise we risk a split-brain scenario.
>
> Doron
>
>
>
>
>
>
>
>
>
> _______________________________________________
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
