[Users] HA

Koen Vanoppen vanoppen.koen at gmail.com
Tue Apr 8 12:26:20 UTC 2014


OK,
Thanks already for all the help. I adapted some settings for a quicker response:
engine-config --get FenceQuietTimeBetweenOperationsInSec  --> 180
engine-config --set FenceQuietTimeBetweenOperationsInSec=60

engine-config --get StorageDomainFalureTimeoutInMinutes  --> 180
engine-config --set StorageDomainFalureTimeoutInMinutes=1

engine-config --get SpmCommandFailOverRetries  --> 5
engine-config --set SpmCommandFailOverRetries

engine-config --get SPMFailOverAttempts  --> 3
engine-config --set SPMFailOverAttempts=1

engine-config --get NumberOfFailedRunsOnVds  --> 3
engine-config --set NumberOfFailedRunsOnVds=1

engine-config --get vdsTimeout  --> 180
engine-config --set vdsTimeout=30

engine-config --get VDSAttemptsToResetCount  --> 2
engine-config --set VDSAttemptsToResetCount=1

engine-config --get TimeoutToResetVdsInSeconds  --> 60
engine-config --set TimeoutToResetVdsInSeconds=30
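The changes above can also be applied in one pass. The sketch below is a dry run: it only prints the `engine-config` commands (drop the `echo` to actually apply them on the engine host). Note that ovirt-engine generally needs a restart before new values take effect; `SpmCommandFailOverRetries` is left out because no value was given above.

```shell
# Dry-run sketch: print one engine-config --set command per key/value pair.
# Keys and values are taken from the list above.
cmds=$(
  while read -r key val; do
    echo "engine-config --set ${key}=${val}"
  done <<'EOF'
FenceQuietTimeBetweenOperationsInSec 60
StorageDomainFalureTimeoutInMinutes 1
SPMFailOverAttempts 1
NumberOfFailedRunsOnVds 1
vdsTimeout 30
VDSAttemptsToResetCount 1
TimeoutToResetVdsInSeconds 30
EOF
)
echo "$cmds"
```

After applying for real, `engine-config --get <key>` should echo back the new value for each key.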

Now the result of this is that when the VM is not running on the SPM, it
will migrate before going into paused mode.
But when we tried it with the VM running on the SPM, it gets into paused
mode (for safety reasons, I know ;-) ) and stays there until the host gets
MANUALLY fenced by rebooting it. So now my question is... How can I make
the hypervisor fence (i.e. reboot, so the VM is moved) quicker?
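Fencing will only fire automatically if power management is configured and working on each host. A quick manual sanity check of the fence device might look like the following; the agent choice, BMC address and credentials below are placeholders for illustration, not values from this thread:

```shell
# Hypothetical check: ask the host's BMC for its power status via the same
# fence agent oVirt would use (fence_ipmilan ships with fence-agents).
# 10.0.0.1 / admin / secret are placeholders for your BMC details.
fence_ipmilan -a 10.0.0.1 -l admin -p secret -o status
```

If this reports the power status correctly, the same agent configured under the host's Power Management tab lets the engine reboot a non-responsive host automatically instead of waiting for a manual fence.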

Kind regards,

Koen


2014-04-04 16:28 GMT+02:00 Koen Vanoppen <vanoppen.koen at gmail.com>:

> Yes, that's true. But I was driving... so I'm just forwarding it :-). I
> have already adjusted the timeout. It was set to 5 min before it would
> time out; it's now at 2 min.
> On Apr 4, 2014 4:14 PM, "David Van Zeebroeck" <
> david.van.zeebroeck at brusselsairport.be> wrote:
>
>>   I have them too, you know
>>
>> But normally the fencing should have worked, as I read it
>>
>> So something went wrong somewhere, by the looks of it
>>
>>
>>
>> *From:* Koen Vanoppen [mailto:vanoppen.koen at gmail.com]
>> *Sent:* Friday, April 4, 2014 16:07
>> *To:* David Van Zeebroeck
>> *Subject:* Fwd: Re: [Users] HA
>>
>>
>>
>>
>>
>> David Van Zeebroeck
>>
>> Product Manager Unix Infrastructure
>>
>> Information & Communication Technology
>>
>> *Brussels Airport Company*
>>
>> T +32 (0)2 753 66 24
>>
>> M +32 (0)497 02 17 31
>>
>> david.van.zeebroeck at brusselsairport.be
>>
>> *www.brusselsairport.be*
>>
>>
>>
>>  ---------- Forwarded message ----------
>> From: "Michal Skrivanek" <michal.skrivanek at redhat.com>
>> Date: Apr 4, 2014 3:39 PM
>> Subject: Re: [Users] HA
>> To: "Koen Vanoppen" <vanoppen.koen at gmail.com>
>> Cc: "ovirt-users Users" <users at ovirt.org>
>>
>>
>>
>> On 4 Apr 2014, at 15:14, Sander Grendelman wrote:
>>
>>
>>
>>   Do you have power management configured?
>>
>> Was the "failed" host fenced/rebooted?
>>
>>
>>
>> On Fri, Apr 4, 2014 at 2:21 PM, Koen Vanoppen <vanoppen.koen at gmail.com>
>> wrote:
>>
>> So... is it possible to have a fully automatic migration of the VM to
>> another hypervisor in case the storage connection fails?
>>
>> How can we make this happen? Because for the moment, when we tested this
>> situation, the VMs stayed in paused state.
>>
>> (Test situation:
>>
>>    - Unplug the 2 fibre cables from the hypervisor
>>    - VMs go into paused state
>>    - VMs stayed in paused state until the failure was solved
>>
>>
>>
>> as said before, it's not safe, hence we (try to) avoid migrating them.
>>
>> They only get paused when they actually access the storage, which may not
>> always be the case. I.e. the storage connection is severed, the host is
>> deemed NonOperational and VMs are getting migrated from it; some of them
>> will succeed if they didn't access that "bad" storage … the paused VMs
>> will remain (mostly; it can still happen that they appear paused on the
>> other host when the disk access occurs only at the last stage of migration)
>>
>>
>>
>>
>>
>> so in other words, if you want to migrate the VMs without interruption,
>> it's sometimes not possible
>>
>> if you are fine with the VMs being restarted on another host after a short
>> time, then power management/fencing will help here
>>
>>
>>
>> Thanks,
>>
>> michal
>>
>>     )
>>
>>
>>
>> They only returned when we restored the fiber connection to the
>> Hypervisor…
>>
>>
>>
>> yes, since 3.3 we have the autoresume feature
>>
>>
>>
>> Thanks,
>>
>> michal
>>
>>
>>
>>
>>
>>
>>
>> Kind Regards,
>>
>> Koen
>>
>>
>>
>>
>>
>> 2014-04-04 13:52 GMT+02:00 Koen Vanoppen <vanoppen.koen at gmail.com>:
>>
>> So... is it possible to have a fully automatic migration of the VM to
>> another hypervisor in case the storage connection fails?
>>
>> How can we make this happen? Because for the moment, when we tested this
>> situation, the VMs stayed in paused state.
>>
>> (Test situation:
>>
>>    - Unplug the 2 fibre cables from the hypervisor
>>    - VMs go into paused state
>>    - VMs stayed in paused state until the failure was solved
>>
>> )
>>
>>
>>
>> They only returned when we restored the fiber connection to the
>> Hypervisor...
>>
>> Kind Regards,
>>
>> Koen
>>
>>
>>
>> 2014-04-03 16:53 GMT+02:00 Koen Vanoppen <vanoppen.koen at gmail.com>:
>>
>>
>>
>> ---------- Forwarded message ----------
>> From: "Doron Fediuck" <dfediuck at redhat.com>
>> Date: Apr 3, 2014 4:51 PM
>> Subject: Re: [Users] HA
>>
>> To: "Koen Vanoppen" <vanoppen.koen at gmail.com>
>> Cc: "Omer Frenkel" <ofrenkel at redhat.com>, <users at ovirt.org>, "Federico
>> Simoncelli" <fsimonce at redhat.com>, "Allon Mureinik" <amureini at redhat.com>
>>
>>
>>
>> ----- Original Message -----
>> > From: "Koen Vanoppen" <vanoppen.koen at gmail.com>
>> > To: "Omer Frenkel" <ofrenkel at redhat.com>, users at ovirt.org
>> > Sent: Wednesday, April 2, 2014 4:17:36 PM
>> > Subject: Re: [Users] HA
>> >
>> > Yes, indeed. I meant not-operational. Sorry.
>> > So, if I understand this correctly: if we ever end up in a situation
>> > where we lose both storage connections on our hypervisor, we will have
>> > to manually restore the connections first?
>> >
>> > And thanks for the tip for speeding things up :-).
>> >
>> > Kind regards,
>> >
>> > Koen
>> >
>> >
>> > 2014-04-02 15:14 GMT+02:00 Omer Frenkel < ofrenkel at redhat.com > :
>> >
>> >
>> >
>> >
>> >
>> > ----- Original Message -----
>> > > From: "Koen Vanoppen" < vanoppen.koen at gmail.com >
>> > > To: users at ovirt.org
>> > > Sent: Wednesday, April 2, 2014 4:07:19 PM
>> > > Subject: [Users] HA
>> > >
>> > > Dear All,
>> > >
>> > > During our acceptance testing, we discovered something. (Document will
>> > > follow.)
>> > > When we disable one fibre path, no problem: multipath finds its way and
>> > > no pings are lost.
>> > > BUT when we disabled both fibre paths (so one of the storage domains is
>> > > gone on this host, but still available on the other host), VMs go into
>> > > paused mode... It chooses a new SPM (can we speed this up?), puts the
>> > > host in non-responsive (can we speed this up? more important) and the
>> > > VMs stay in paused mode... I would expect that they would be migrated
>> > > (yes, HA is
>> >
>> > i guess you mean the host moves to not-operational (in contrast to
>> > non-responsive)?
>> > if so, the engine will not migrate vms that are paused due to an io
>> > error, because of the data corruption risk.
>> >
>> > to speed things up you can look at the storage domain monitoring timeout:
>> > engine-config --get StorageDomainFalureTimeoutInMinutes
>> >
>> >
>> > > enabled) to the other host and reboot there... Any solution? We are
>> > > still using oVirt 3.3.1, but we are planning an upgrade to 3.4 after
>> > > the Easter holiday.
>> > >
>> > > Kind Regards,
>> > >
>> > > Koen
>> > >
>>
>> Hi Koen,
>> Resuming from paused due to io issues is supported (adding relevant
>> folks).
>> Regardless, if you did not define power management, you have to manually
>> confirm that the source host was rebooted in order for migration to
>> proceed. Otherwise we risk a split-brain scenario.
>>
>> Doron
>>
>>
>>
>>
>>
>>
>> _______________________________________________
>> Users mailing list
>> Users at ovirt.org
>> http://lists.ovirt.org/mailman/listinfo/users
>>
>>
>>
>>
>>
>>

