On 11 Apr 2014, at 14:47, Koen Vanoppen wrote:
The power management is configured correctly. And as long as the host
that loses its storage isn't the SPM, there is no problem.
ah, I see
If I can make it work so that, when the VM is paused, it gets switched off
and (the HA way) reboots itself, I'm perfectly happy :-).
I'm not entirely sure that the after_vm_pause() hook gets invoked in
this case. It was not intended for involuntary pauses… but give it a
try! :)
Otherwise… well, you can always do a periodic query… not very
efficient though.
Thanks,
michal
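The periodic-query fallback mentioned above could be sketched like this (a minimal sketch; how the VM list is fetched, e.g. via vdsClient or the engine REST API, and what is actually done with the paused VMs are left as stubs to fill in for your setup):

```python
# Minimal sketch of a periodic query for paused VMs. The fetch_vms and
# handle_paused callables are stubs: in a real setup fetch_vms would ask
# vdsClient or the engine REST API, and handle_paused would kill/restart
# the VMs. Both are placeholders, not oVirt API calls.
import time

def paused_vms(vms):
    """Return the names of VMs whose status is 'paused' from (name, status) pairs."""
    return [name for name, status in vms if status == 'paused']

def poll(fetch_vms, handle_paused, interval=60, rounds=None):
    """Check fetch_vms() every `interval` seconds; report paused VMs to handle_paused()."""
    done = 0
    while rounds is None or done < rounds:
        stuck = paused_vms(fetch_vms())
        if stuck:
            handle_paused(stuck)
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval)
```

As noted, polling is not very efficient; the interval is a trade-off between reaction time and load on the engine.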
>
> Kind regards,
>
> ---------- Forwarded message ----------
> From: Koen Vanoppen <vanoppen.koen(a)gmail.com>
> Date: 2014-04-11 14:47 GMT+02:00
> Subject: Re: [ovirt-users] [Users] HA
> To: Michal Skrivanek <michal.skrivanek(a)redhat.com>
>
The power management is configured correctly. And as long as the host
that loses its storage isn't the SPM, there is no problem.
If I can make it work so that, when the VM is paused, it gets switched off
and (the HA way) reboots itself, I'm perfectly happy :-).

Kind regards,

2014-04-11 9:37 GMT+02:00 Michal Skrivanek <michal.skrivanek(a)redhat.com>:

On 11 Apr 2014, at 09:00, Koen Vanoppen wrote:

> Hi All,
>
> Any news about this? A VDSM hook or anything?
> Thanks!
>
> Kind regards
>
>
> 2014-04-09 9:37 GMT+02:00 Omer Frenkel <ofrenkel(a)redhat.com>:
>
>
> ----- Original Message -----
> > From: "Koen Vanoppen" <vanoppen.koen(a)gmail.com>
> > To: users(a)ovirt.org
> > Sent: Tuesday, April 8, 2014 3:41:02 PM
> > Subject: Re: [Users] HA
> >
> > Or in other words, the SPM and the VM should move almost immediately after
> > the storage connections on the hypervisor are gone. I know I'm maybe asking
> > too much, but we would be very happy :-) :-).
> >
> > So sketch:
> >
> > Mercury1 SPM
> > Mercury 2
> >
> > Mercury1 loses both fibre connections --> it goes non-operational, and the VM
> > goes into paused state and stays that way until I manually reboot the host so
> > it fences.
> >
> > What I would like is that when Mercury1 loses both fibre connections, it
> > fences immediately so the VMs are also moved almost instantly... if this is
> > possible... :-)
> >
> > Kind regards and thanks for all the help!
> >
>
> Michal, is there a VDSM hook for a VM moved to pause?
> If so, you could send KILL to it, and the engine will identify that the VM was
> killed + HA, so it will be restarted with no need to reboot the host; it will
> stay non-operational until the storage is fixed.

you have to differentiate - if only the VMs were paused, yes, you can do
anything (also change the error reporting policy to not pause the VM).
But if the host becomes non-operational then it simply doesn't work;
vdsm got stuck somewhere (often in getting block device stats).
Proper power management config should fence it.

Thanks,
michal

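For reference, the idea of a hook that kills the paused VM might look roughly like the sketch below. Everything host-specific here is an assumption to verify against your VDSM version: that an after_vm_pause hook receives the libvirt domain XML via the file named in the `_hook_domxml` environment variable, and that `vdsClient -s 0 destroy <uuid>` is the right host-side way to kill the VM so the engine restarts it elsewhere (HA).

```python
# Hypothetical after_vm_pause hook sketch. Assumptions to verify against
# your VDSM version: the hook mechanism hands the libvirt domain XML over
# via the file named in the _hook_domxml environment variable, and
# `vdsClient -s 0 destroy <uuid>` kills the VM so the engine sees it as
# killed + HA and restarts it without rebooting the whole host.
import os
import subprocess
import xml.etree.ElementTree as ET

def vm_uuid_from_domxml(domxml):
    """Extract the VM UUID from a libvirt domain XML string."""
    return ET.fromstring(domxml).findtext('uuid')

def main():
    with open(os.environ['_hook_domxml']) as f:
        uuid = vm_uuid_from_domxml(f.read())
    # Kill the paused VM so the engine can restart it on another host.
    subprocess.check_call(['vdsClient', '-s', '0', 'destroy', uuid])

if __name__ == '__main__' and '_hook_domxml' in os.environ:
    main()
```

The script would live under the after_vm_pause hook directory on each host; as discussed above, whether this hook fires on an involuntary pause at all is exactly the open question.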
>
> >
> >
> > 2014-04-08 14:26 GMT+02:00 Koen Vanoppen <vanoppen.koen(a)gmail.com>:
> >
> >
> >
> > Ok,
> > Thanks already for all the help. I adapted some things for a quicker response:
> > engine-config --get FenceQuietTimeBetweenOperationsInSec --> 180
> > engine-config --set FenceQuietTimeBetweenOperationsInSec=60
> >
> > engine-config --get StorageDomainFalureTimeoutInMinutes --> 180
> > engine-config --set StorageDomainFalureTimeoutInMinutes=1
> >
> > engine-config --get SpmCommandFailOverRetries --> 5
> > engine-config --set SpmCommandFailOverRetries
> >
> > engine-config --get SPMFailOverAttempts --> 3
> > engine-config --set SPMFailOverAttempts=1
> >
> > engine-config --get NumberOfFailedRunsOnVds --> 3
> > engine-config --set NumberOfFailedRunsOnVds=1
> >
> > engine-config --get vdsTimeout --> 180
> > engine-config --set vdsTimeout=30
> >
> > engine-config --get VDSAttemptsToResetCount --> 2
> > engine-config --set VDSAttemptsToResetCount=1
> >
> > engine-config --get TimeoutToResetVdsInSeconds --> 60
> > engine-config --set TimeoutToResetVdsInSeconds=30
> >
> > Now the result of this is that when the VM is not running on the SPM,
> > it will migrate before going into pause mode.
> > But when we tried it with the VM running on the SPM, it gets into paused
> > mode (for safety reasons, I know ;-) ) and stays there until the host gets
> > MANUALLY fenced by rebooting it. So now my question is... how can I make
> > the hypervisor fence (so it reboots, and the VM is moved) quicker?
> >
> > Kind regards,
> >
> > Koen
> >
> >
> > 2014-04-04 16:28 GMT+02:00 Koen Vanoppen <vanoppen.koen(a)gmail.com>:
> >
> > Yes, that's true. But I was driving... so I'm just forwarding it then :-). I
> > have already adjusted the timeout. It was set to 5 minutes before it would
> > time out; it's now set to 2 minutes.
> > On Apr 4, 2014 4:14 PM, "David Van Zeebroeck"
> > <david.van.zeebroeck(a)brusselsairport.be> wrote:
> >
> > I have them too, eh.
> >
> > But normally the fencing should have worked, as I read it.
> >
> > So something went wrong somewhere, by the looks of it.
> >
> > From: Koen Vanoppen [mailto:vanoppen.koen(a)gmail.com]
> > Sent: Friday, 4 April 2014 16:07
> > To: David Van Zeebroeck
> > Subject: Fwd: Re: [Users] HA
> >
> > David Van Zeebroeck
> >
> > Product Manager Unix Infrastructure
> >
> > Information & Communication Technology
> >
> > Brussels Airport Company
> >
> > T +32 (0)2 753 66 24
> >
> > M +32 (0)497 02 17 31
> >
> > david.van.zeebroeck(a)brusselsairport.be
> >
> >
> > www.brusselsairport.be
> >
> >
> > ---------- Forwarded message ----------
> > From: "Michal Skrivanek" <michal.skrivanek(a)redhat.com>
> > Date: Apr 4, 2014 3:39 PM
> > Subject: Re: [Users] HA
> > To: "Koen Vanoppen" <vanoppen.koen(a)gmail.com>
> > Cc: "ovirt-users Users" <users(a)ovirt.org>
> >
> > On 4 Apr 2014, at 15:14, Sander Grendelman wrote:
> >
> > Do you have power management configured?
> >
> > Was the "failed" host fenced/rebooted?
> >
> > On Fri, Apr 4, 2014 at 2:21 PM, Koen Vanoppen <vanoppen.koen(a)gmail.com>
> > wrote:
> >
> >
> > So... Is a fully automatic migration of the VM to another hypervisor
> > possible in case the storage connection fails?
> >
> >
> > How can we make this happen? Because for the moment, when we tested the
> > situation, they stayed in paused state.
> >
> >
> > (Test situation:
> >
> > * Unplug the 2 fibre cables from the hypervisor
> > * VMs go into paused state
> > * VMs stayed in paused state until the failure was solved
> >
> > As said before, it's not safe, hence we (try to) not migrate them.
> >
> >
> > They only get paused when they actually access the storage, which may not
> > always be the case. I.e. the storage connection is severed, the host is
> > deemed NonOperational and VMs are getting migrated from it; then some of
> > them will succeed if they didn't access that "bad" storage… the paused VMs
> > will remain (mostly; it can still happen that they appear paused-migrated
> > on another host when the disk access occurs only at the last stage of
> > migration)
> >
> > so in other words, if you want to migrate the VMs without interruption,
> > it's sometimes not possible
> >
> >
> > if you are fine with the VMs being restarted in a short time on another
> > host, then power management/fencing will help here
> >
> > Thanks,
> >
> >
> > michal
> >
> > )
> >
> > They only returned when we restored the fibre connection to the hypervisor…
> >
> > yes, since 3.3 we have the autoresume feature
> >
> > Thanks,
> >
> >
> > michal
> >
> > Kind Regards,
> >
> > Koen
> >
> >
> > 2014-04-04 13:52 GMT+02:00 Koen Vanoppen <vanoppen.koen(a)gmail.com>:
> >
> >
> > So... Is a fully automatic migration of the VM to another hypervisor
> > possible in case the storage connection fails?
> >
> >
> > How can we make this happen? Because for the moment, when we tested the
> > situation, they stayed in paused state.
> >
> >
> > (Test situation:
> >
> > * Unplug the 2 fibre cables from the hypervisor
> > * VMs go into paused state
> > * VMs stayed in paused state until the failure was solved
> >
> > )
> >
> > They only returned when we restored the fibre connection to the hypervisor...
> >
> >
> > Kind Regards,
> >
> > Koen
> >
> > 2014-04-03 16:53 GMT+02:00 Koen Vanoppen <vanoppen.koen(a)gmail.com>:
> >
> > ---------- Forwarded message ----------
> > From: "Doron Fediuck" <dfediuck(a)redhat.com>
> > Date: Apr 3, 2014 4:51 PM
> > Subject: Re: [Users] HA
> >
> > To: "Koen Vanoppen" <vanoppen.koen(a)gmail.com>
> > Cc: "Omer Frenkel" <ofrenkel(a)redhat.com>, <users(a)ovirt.org>, "Federico
> > Simoncelli" <fsimonce(a)redhat.com>, "Allon Mureinik" <amureini(a)redhat.com>
> >
> >
> >
> > ----- Original Message -----
> > > From: "Koen Vanoppen" <vanoppen.koen(a)gmail.com>
> > > To: "Omer Frenkel" <ofrenkel(a)redhat.com>, users(a)ovirt.org
> > > Sent: Wednesday, April 2, 2014 4:17:36 PM
> > > Subject: Re: [Users] HA
> > >
> > > Yes, indeed. I meant not-operational. Sorry.
> > > So, if I understand this correctly: whenever we come into a situation
> > > where we lose both storage connections on our hypervisor, we will have
> > > to manually restore the connections first?
> > >
> > > And thanks for the tip for speeding things up :-).
> > >
> > > Kind regards,
> > >
> > > Koen
> > >
> > >
> > > 2014-04-02 15:14 GMT+02:00 Omer Frenkel <ofrenkel(a)redhat.com>:
> > >
> > > ----- Original Message -----
> > > > From: "Koen Vanoppen" <vanoppen.koen(a)gmail.com>
> > > > To: users(a)ovirt.org
> > > > Sent: Wednesday, April 2, 2014 4:07:19 PM
> > > > Subject: [Users] HA
> > > >
> > > > Dear All,
> > > >
> > > > Due to our acceptance testing, we discovered something. (Document will
> > > > follow.)
> > > > When we disable one fibre path, no problem: multipath finds its way,
> > > > no pings are lost.
> > > > BUT when we disabled both fibre paths (so one of the storage domains
> > > > is gone on this host, but still available on the other host), VMs go
> > > > into paused mode... It chooses a new SPM (can we speed this up?), puts
> > > > the host in non-responsive (can we speed this up? more important), and
> > > > the VMs stay in paused mode... I would expect that they would be
> > > > migrated (yes, HA is
> > >
> > > I guess you mean the host moves to not-operational (in contrast to
> > > non-responsive)?
> > > If so, the engine will not migrate VMs that are paused due to an IO
> > > error, because of the data corruption risk.
> > >
> > > To speed things up, you can look at the storage domain monitoring
> > > timeout:
> > > engine-config --get StorageDomainFalureTimeoutInMinutes
> > >
> > >
> > > > enabled) to the other host and reboot there... Any solution? We are
> > > > still using oVirt 3.3.1, but we are planning an upgrade to 3.4 after
> > > > the Easter holiday.
> > > >
> > > > Kind Regards,
> > > >
> > > > Koen
> > > >
> >
> > Hi Koen,
> > Resuming from paused due to IO issues is supported (adding relevant folks).
> > Regardless, if you did not define power management, you should manually
> > approve that the source host was rebooted in order for migration to
> > proceed. Otherwise we risk a split-brain scenario.
> >
> > Doron
> >
> >
> > _______________________________________________
> > Users mailing list
> > Users(a)ovirt.org
> > http://lists.ovirt.org/mailman/listinfo/users
> >