[ovirt-users] [Users] HA

Michal Skrivanek michal.skrivanek at redhat.com
Fri Apr 11 13:11:33 UTC 2014


On 11 Apr 2014, at 14:47, Koen Vanoppen wrote:

> Power management is configured correctly. And as long as the host that loses its storage isn't the SPM, there is no problem.

ah, I see

> If I can make it work so that, when the VM is paused, it gets switched off and (the HA way) restarts itself, I'm perfectly happy :-).

I'm not entirely sure that the after_vm_pause() hook gets invoked in this case. It was not intended for involuntary pauses… but give it a try! :)
Otherwise… well, you can always do a periodic query, though that's not very effective.
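A minimal Python sketch of the "periodic query" fallback. It does not use any real oVirt or vdsm API: `VmInfo` and its fields (`status`, `pause_reason`, `highly_available`) are hypothetical, and the caller supplies `fetch` (how VM states are obtained) and `kill` (how a VM is stopped). Each round selects only HA VMs that are paused because of an I/O error, so that the engine can restart them on another host.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class VmInfo:
    # Hypothetical record; field names are illustrative, not a vdsm/oVirt API.
    name: str
    status: str            # e.g. "Up", "Paused"
    pause_reason: str      # e.g. "EIO" when paused by a storage I/O error
    highly_available: bool


def select_vms_to_kill(vms: Iterable[VmInfo]) -> List[str]:
    """Return the names of HA VMs that are paused because of an I/O error."""
    return [
        vm.name
        for vm in vms
        if vm.status == "Paused" and vm.pause_reason == "EIO" and vm.highly_available
    ]


def poll_once(fetch: Callable[[], Iterable[VmInfo]],
              kill: Callable[[str], None]) -> List[str]:
    """One polling round: query VM states, kill the selected VMs, return their names."""
    victims = select_vms_to_kill(fetch())
    for name in victims:
        kill(name)
    return victims


def poll_forever(fetch: Callable[[], Iterable[VmInfo]],
                 kill: Callable[[str], None],
                 interval_s: float = 10.0) -> None:
    # A paused VM can wait up to interval_s seconds before it is noticed.
    while True:
        poll_once(fetch, kill)
        time.sleep(interval_s)
```

The polling interval is the inefficiency Michal hints at: a shorter interval reacts faster but queries more often.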

Thanks,
michal

> 
> Kind regards,
> 
> 
> ---------- Forwarded message ----------
> From: Koen Vanoppen <vanoppen.koen at gmail.com>
> Date: 2014-04-11 14:47 GMT+02:00
> Subject: Re: [ovirt-users] [Users] HA
> To: Michal Skrivanek <michal.skrivanek at redhat.com>
> 
> 
> Power management is configured correctly. And as long as the host that loses its storage isn't the SPM, there is no problem.
> If I can make it work so that, when the VM is paused, it gets switched off and (the HA way) restarts itself, I'm perfectly happy :-).
> 
> Kind regards,
> 
> 
> 
> 
> 2014-04-11 9:37 GMT+02:00 Michal Skrivanek <michal.skrivanek at redhat.com>:
> 
> 
> On 11 Apr 2014, at 09:00, Koen Vanoppen wrote:
> 
>> Hi All,
>> 
>> Any news about this? A VDSM hook or anything?
>> Thanks!
>> 
>> Kind regards
>> 
>> 
>> 2014-04-09 9:37 GMT+02:00 Omer Frenkel <ofrenkel at redhat.com>:
>> 
>> 
>> ----- Original Message -----
>> > From: "Koen Vanoppen" <vanoppen.koen at gmail.com>
>> > To: users at ovirt.org
>> > Sent: Tuesday, April 8, 2014 3:41:02 PM
>> > Subject: Re: [Users] HA
>> >
>> > In other words, the SPM and the VM should move almost immediately after
>> > the storage connections on the hypervisor are gone. I know I'm maybe asking
>> > too much, but we would be very happy :-) :-).
>> >
>> > So, a sketch:
>> >
>> > Mercury1 SPM
>> > Mercury2
>> >
>> > Mercury1 loses both fibre connections --> it goes non-operational and the VM
>> > goes into the paused state and stays that way until I manually reboot the
>> > host so that it gets fenced.
>> >
>> > What I would like is that when Mercury1 loses both fibre connections, it is
>> > fenced immediately, so the VMs are also moved almost instantly... if this is
>> > possible... :-)
>> >
>> > Kind regards and thanks for all the help!
>> >
>> 
>> Michal, is there a vdsm hook for a VM that moves to paused?
>> If so, you could send KILL to it, and the engine will identify that the VM was killed and is HA,
>> so it will be restarted. There is no need to reboot the host; it will stay non-operational until the storage is fixed.
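Omer's suggestion comes down to sending SIGKILL to the paused VM's qemu process. A minimal Python sketch for a POSIX host, assuming the caller already knows that process's PID (how a hook or script would look it up is not shown and is an assumption). SIGKILL cannot be caught or ignored, so the process ends immediately; whether the engine then restarts the VM depends on it being HA, as Omer describes.

```python
import os
import signal


def kill_vm_process(pid: int) -> bool:
    """Send SIGKILL to a (hypothetical) qemu process; return False if it is already gone."""
    try:
        # SIGKILL cannot be caught, so the guest stops at once, as if it crashed.
        os.kill(pid, signal.SIGKILL)
    except ProcessLookupError:
        # No process with this PID exists any more; nothing to do.
        return False
    return True
```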
> 
> You have to differentiate. If only the VMs are paused, yes, you can do anything (including changing the error reporting policy so the VM is not paused).
> But if the host becomes non-operational, then it simply doesn't work: vdsm got stuck somewhere (often in getting block device stats).
> A proper power management configuration should fence it.
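In libvirt domain XML, the error reporting policy is the `error_policy` attribute of a disk's `<driver>` element: `stop` pauses the guest on an I/O error, while `report` passes the error on to the guest (`ignore` and `enospace` also exist). A minimal Python sketch, using only the standard library, that rewrites the attribute on every `device='disk'` entry of a domain XML string. Wiring it into vdsm (for example from a hook that edits the domain XML before the VM starts) is an assumption to verify for your version. With `report`, the guest OS has to cope with the failed disk I/O itself instead of being paused.

```python
import xml.etree.ElementTree as ET

# error_policy values accepted by libvirt for a disk <driver> element.
VALID_POLICIES = {"stop", "report", "ignore", "enospace"}


def set_disk_error_policy(domain_xml: str, policy: str) -> str:
    """Set error_policy on the <driver> of every device='disk' entry."""
    if policy not in VALID_POLICIES:
        raise ValueError(f"unknown error_policy: {policy}")
    root = ET.fromstring(domain_xml)
    # CD-ROMs and other non-disk devices are left untouched.
    for driver in root.findall("./devices/disk[@device='disk']/driver"):
        driver.set("error_policy", policy)
    return ET.tostring(root, encoding="unicode")
```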
> 
> Thanks,
> michal
> 
>> 
>> >
>> >
>> > 2014-04-08 14:26 GMT+02:00 Koen Vanoppen < vanoppen.koen at gmail.com > :
>> >
>> >
>> >
>> > OK,
>> > Thanks already for all the help. I adjusted some settings for a quicker response:
>> > engine-config --get FenceQuietTimeBetweenOperationsInSec-->180
>> > engine-config --set FenceQuietTimeBetweenOperationsInSec=60
>> >
>> > engine-config --get StorageDomainFalureTimeoutInMinutes-->180
>> > engine-config --set StorageDomainFalureTimeoutInMinutes=1
>> >
>> > engine-config --get SpmCommandFailOverRetries-->5
>> > engine-config --set SpmCommandFailOverRetries
>> >
>> > engine-config --get SPMFailOverAttempts-->3
>> > engine-config --set SPMFailOverAttempts=1
>> >
>> > engine-config --get NumberOfFailedRunsOnVds-->3
>> > engine-config --set NumberOfFailedRunsOnVds=1
>> >
>> > engine-config --get vdsTimeout-->180
>> > engine-config --set vdsTimeout=30
>> >
>> > engine-config --get VDSAttemptsToResetCount-->2
>> > engine-config --set VDSAttemptsToResetCount=1
>> >
>> > engine-config --get TimeoutToResetVdsInSeconds-->60
>> > engine-config --set TimeoutToResetVdsInSeconds=30
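The settings above can also be kept in one place and applied as a batch. A minimal Python sketch that builds `engine-config --set KEY=VALUE` command lines from a dictionary and, unless `dry_run` is true, runs them with `subprocess`; it assumes `engine-config` is on the PATH of the engine host. `SpmCommandFailOverRetries` is left out because the list gives no new value for it. As far as I know, the engine only picks up changed values after the `ovirt-engine` service is restarted.

```python
import shlex
import subprocess
from typing import Dict, List

# New values from the list above (SpmCommandFailOverRetries omitted: no value given).
SETTINGS: Dict[str, str] = {
    "FenceQuietTimeBetweenOperationsInSec": "60",
    "StorageDomainFalureTimeoutInMinutes": "1",
    "SPMFailOverAttempts": "1",
    "NumberOfFailedRunsOnVds": "1",
    "vdsTimeout": "30",
    "VDSAttemptsToResetCount": "1",
    "TimeoutToResetVdsInSeconds": "30",
}


def build_commands(settings: Dict[str, str]) -> List[List[str]]:
    """Build one 'engine-config --set KEY=VALUE' argv per setting."""
    return [["engine-config", "--set", f"{key}={value}"] for key, value in settings.items()]


def apply_settings(settings: Dict[str, str], dry_run: bool = True) -> List[str]:
    """Return the commands as shell strings; execute them only when dry_run is False."""
    lines = []
    for argv in build_commands(settings):
        lines.append(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)
    return lines
```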
>> >
>> > The result of this is that when the VM is not running on the SPM, it
>> > migrates before going into paused mode.
>> > But when we tried it with the VM running on the SPM, it goes into paused
>> > mode (for safety reasons, I know ;-) ) and stays there until the host gets
>> > MANUALLY fenced by rebooting it. So now my question is: how can I make the
>> > hypervisor get fenced (i.e. rebooted, so the VM is moved) more quickly?
>> >
>> > Kind regards,
>> >
>> > Koen
>> >
>> >
>> > 2014-04-04 16:28 GMT+02:00 Koen Vanoppen < vanoppen.koen at gmail.com > :
>> >
>> > Yes, that's true. But I was driving... So I'm just forwarding it then :-). I
>> > have already adjusted the timeout. It was set to 5 min before it would
>> > time out. It's now set to 2 min.
>> > On Apr 4, 2014 4:14 PM, "David Van Zeebroeck" <
>> > david.van.zeebroeck at brusselsairport.be > wrote:
>> >
>> > I have them too, you know.
>> >
>> > But normally the fencing should have worked, the way I read it.
>> >
>> > So something went wrong somewhere, from the looks of it.
>> >
>> > From: Koen Vanoppen [mailto: vanoppen.koen at gmail.com ]
>> > Sent: Friday 4 April 2014 16:07
>> > To: David Van Zeebroeck
>> > Subject: Fwd: Re: [Users] HA
>> >
>> > David Van Zeebroeck
>> > Product Manager Unix Infrastructure
>> > Information & Communication Technology
>> > Brussels Airport Company
>> > T +32 (0)2 753 66 24
>> > M +32 (0)497 02 17 31
>> > david.van.zeebroeck at brusselsairport.be
>> > www.brusselsairport.be
>> >
>> >
>> >
>> > ---------- Forwarded message ----------
>> > From: "Michal Skrivanek" < michal.skrivanek at redhat.com >
>> > Date: Apr 4, 2014 3:39 PM
>> > Subject: Re: [Users] HA
>> > To: "Koen Vanoppen" < vanoppen.koen at gmail.com >
>> > Cc: "ovirt-users Users" < users at ovirt.org >
>> >
>> > On 4 Apr 2014, at 15:14, Sander Grendelman wrote:
>> >
>> > Do you have power management configured?
>> >
>> > Was the "failed" host fenced/rebooted?
>> >
>> > On Fri, Apr 4, 2014 at 2:21 PM, Koen Vanoppen < vanoppen.koen at gmail.com >
>> > wrote:
>> >
>> > So... is it possible to have a fully automatic migration of the VM to another
>> > hypervisor in case the storage connection fails?
>> >
>> > How can we make this happen? Because for the moment, when we tested the
>> > situation, they stayed in the paused state.
>> >
>> > (Test situation:
>> >
>> >     * Unplug the 2 fibre cables from the hypervisor
>> >     * VMs go into the paused state
>> >     * VMs stayed in the paused state until the failure was resolved
>> >
>> > As said before, it's not safe, hence we (try to) not migrate them.
>> >
>> > They only get paused when they actually access the storage, which may not
>> > always be the case. E.g. the storage connection is severed, the host is
>> > deemed NonOperational and VMs are migrated away from it; some of them will
>> > succeed if they didn't access that "bad" storage… the paused VMs will
>> > remain (mostly; it can still happen that they end up paused on the other
>> > host after migrating, when the disk access occurs only at the last stage of
>> > migration).
>> >
>> > So in other words, if you want to migrate the VMs without interruption, that
>> > is sometimes not possible.
>> >
>> > If you are fine with the VMs being restarted within a short time on another
>> > host, then power management/fencing will help here.
>> >
>> > Thanks,
>> > michal
>> >
>> > )
>> >
>> > They only resumed when we restored the fibre connection to the hypervisor…
>> >
>> > Yes, since 3.3 we have the autoresume feature.
>> >
>> > Thanks,
>> > michal
>> >
>> > Kind Regards,
>> >
>> > Koen
>> >
>> > 2014-04-04 13:52 GMT+02:00 Koen Vanoppen < vanoppen.koen at gmail.com >:
>> >
>> > So... is it possible to have a fully automatic migration of the VM to another
>> > hypervisor in case the storage connection fails?
>> >
>> > How can we make this happen? Because for the moment, when we tested the
>> > situation, they stayed in the paused state.
>> >
>> > (Test situation:
>> >
>> >     * Unplug the 2 fibre cables from the hypervisor
>> >     * VMs go into the paused state
>> >     * VMs stayed in the paused state until the failure was resolved
>> > )
>> >
>> > They only resumed when we restored the fibre connection to the hypervisor...
>> >
>> >
>> > Kind Regards,
>> >
>> > Koen
>> >
>> >
>> >
>> > 2014-04-03 16:53 GMT+02:00 Koen Vanoppen < vanoppen.koen at gmail.com >:
>> >
>> > ---------- Forwarded message ----------
>> > From: "Doron Fediuck" < dfediuck at redhat.com >
>> > Date: Apr 3, 2014 4:51 PM
>> > Subject: Re: [Users] HA
>> > To: "Koen Vanoppen" < vanoppen.koen at gmail.com >
>> > Cc: "Omer Frenkel" < ofrenkel at redhat.com >, < users at ovirt.org >, "Federico
>> > Simoncelli" < fsimonce at redhat.com >, "Allon Mureinik" < amureini at redhat.com >
>> >
>> >
>> >
>> > ----- Original Message -----
>> > > From: "Koen Vanoppen" < vanoppen.koen at gmail.com >
>> > > To: "Omer Frenkel" < ofrenkel at redhat.com >, users at ovirt.org
>> > > Sent: Wednesday, April 2, 2014 4:17:36 PM
>> > > Subject: Re: [Users] HA
>> > >
>> > > Yes, indeed, I meant non-operational. Sorry.
>> > > So, if I understand this correctly: if we ever end up in a situation where
>> > > we lose both storage connections on our hypervisor, will we have to
>> > > manually restore the connections first?
>> > >
>> > > And thanks for the tip on speeding things up :-).
>> > >
>> > > Kind regards,
>> > >
>> > > Koen
>> > >
>> > >
>> > > 2014-04-02 15:14 GMT+02:00 Omer Frenkel < ofrenkel at redhat.com > :
>> > >
>> > > ----- Original Message -----
>> > > > From: "Koen Vanoppen" < vanoppen.koen at gmail.com >
>> > > > To: users at ovirt.org
>> > > > Sent: Wednesday, April 2, 2014 4:07:19 PM
>> > > > Subject: [Users] HA
>> > > >
>> > > > Dear All,
>> > > >
>> > > > During our acceptance testing, we discovered something (a document will
>> > > > follow).
>> > > > When we disable one fibre path, there is no problem: multipath finds its
>> > > > way and no pings are lost.
>> > > > BUT when we disabled both fibre paths (so one of the storage domains is
>> > > > gone on this host, but still available on the other host), the VMs go
>> > > > into paused mode... It chooses a new SPM (can we speed this up?), puts
>> > > > the host in non-responsive (can we speed this up? more important) and
>> > > > the VMs stay in paused mode... I would expect that they would be
>> > > > migrated (yes, HA is
>> > >
>> > > I guess you mean the host moves to non-operational (in contrast to
>> > > non-responsive)?
>> > > If so, the engine will not migrate VMs that are paused due to an I/O error,
>> > > because of the risk of data corruption.
>> > >
>> > > To speed this up, you can look at the storage domain monitoring timeout:
>> > > engine-config --get StorageDomainFalureTimeoutInMinutes
>> > >
>> > >
>> > > > enabled) to the other host and restarted there... Any solution? We are
>> > > > still using oVirt 3.3.1, but we are planning an upgrade to 3.4 after the
>> > > > Easter holiday.
>> > > >
>> > > > Kind Regards,
>> > > >
>> > > > Koen
>> > > >
>> >
>> > Hi Koen,
>> > Resuming from a pause caused by I/O issues is supported (adding the relevant folks).
>> > Regardless, if you did not define power management, you have to manually
>> > confirm that the source host was rebooted in order for the migration to
>> > proceed. Otherwise we risk a split-brain scenario.
>> >
>> > Doron
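Doron's rule can be written as a small decision function. This is a simplified Python model of the reasoning, not the engine's implementation, and the parameter names are hypothetical: an HA VM may be started on another host only once the source host is known to be down, either because power management fenced it or because an administrator confirmed the reboot. Otherwise both hosts could run the VM against the same disks (split brain).

```python
def may_restart_elsewhere(pm_configured: bool,
                          fence_succeeded: bool,
                          admin_confirmed_reboot: bool) -> bool:
    """Simplified model (not engine code): may an HA VM start on another host?"""
    if pm_configured and fence_succeeded:
        # Power management has proven that the source host was rebooted/powered off.
        return True
    # Without a successful fence, only an administrator's confirmation that the
    # host was rebooted rules out two copies of the VM writing to the same disks.
    return admin_confirmed_reboot
```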
>> >
>> > _______________________________________________
>> > Users mailing list
>> > Users at ovirt.org
>> > http://lists.ovirt.org/mailman/listinfo/users
>> >
>> 