
Ok, thanks already for all the help. I adapted some things for a quicker response:
engine-config --get FenceQuietTimeBetweenOperationsInSec --> 180
engine-config --set FenceQuietTimeBetweenOperationsInSec=60
engine-config --get StorageDomainFalureTimeoutInMinutes --> 180
engine-config --set StorageDomainFalureTimeoutInMinutes=1
engine-config --get SpmCommandFailOverRetries --> 5
engine-config --set SpmCommandFailOverRetries
engine-config --get SPMFailOverAttempts --> 3
engine-config --set SPMFailOverAttempts=1
engine-config --get NumberOfFailedRunsOnVds --> 3
engine-config --set NumberOfFailedRunsOnVds=1
engine-config --get vdsTimeout --> 180
engine-config --set vdsTimeout=30
engine-config --get VDSAttemptsToResetCount --> 2
engine-config --set VDSAttemptsToResetCount=1
engine-config --get TimeoutToResetVdsInSeconds --> 60
engine-config --set TimeoutToResetVdsInSeconds=30
The result of this is that when the VM is not running on the SPM, it migrates before going into pause mode. But when we tried it with the VM running on the SPM, it goes into paused mode (for safety reasons, I know ;-) ) and stays there until the host gets MANUALLY fenced by rebooting it. So now my question is: how can I make the hypervisor fence (so it reboots, so the VM is moved) quicker?
Kind regards,
Koen
2014-04-04 16:28 GMT+02:00 Koen Vanoppen <vanoppen.koen@gmail.com>:
Yes, that's true. But I was driving... So I'll just forward it then :-). I have already adjusted the timeout. It was set to 5 minutes before it would time out. It is now set to 2 minutes.
On Apr 4, 2014 4:14 PM, "David Van Zeebroeck" <david.van.zeebroeck@brusselsairport.be> wrote:
I have them too.
But normally the fencing should have worked, as I read it.
So it looks like something went wrong somewhere.
*From:* Koen Vanoppen [mailto:vanoppen.koen@gmail.com]
*Sent:* Friday, 4 April 2014 16:07
*To:* David Van Zeebroeck
*Subject:* Fwd: Re: [Users] HA
David Van Zeebroeck
Product Manager Unix Infrastructure
Information & Communication Technology
*Brussels Airport Company*
T +32 (0)2 753 66 24
M +32 (0)497 02 17 31
david.van.zeebroeck@brusselsairport.be
*www.brusselsairport.be <http://www.brusselsairport.be>*
---------- Forwarded message ----------
From: "Michal Skrivanek" <michal.skrivanek@redhat.com>
Date: Apr 4, 2014 3:39 PM
Subject: Re: [Users] HA
To: "Koen Vanoppen" <vanoppen.koen@gmail.com>
Cc: "ovirt-users Users" <users@ovirt.org>
On 4 Apr 2014, at 15:14, Sander Grendelman wrote:
Do you have power management configured?
Was the "failed" host fenced/rebooted?
On Fri, Apr 4, 2014 at 2:21 PM, Koen Vanoppen <vanoppen.koen@gmail.com> wrote:
So... Is a fully automatic migration of the VM to another hypervisor possible in case the storage connection fails?
How can we make this happen? Because for the moment, when we tested the situation, they stayed in pause state.
(Test situation:
- Unplug the 2 fibre cables from the hypervisor
- VMs go into pause state
- VMs stayed in pause state until the failure was solved
as said before, it's not safe, hence we (try to) not migrate them.
They only get paused when they actually access the storage, which may not always be the case. I.e. the storage connection is severed, the host is deemed NonOperational and VMs are getting migrated from it; then some of them will succeed if they didn't access that "bad" storage … the paused VMs will remain (mostly; it can still happen that they appear paused and migrated on the other host when the disk access occurs only at the last stage of migration)
so in other words, if you want to migrate the VMs without interruption, it's sometimes not possible
if you are fine with the VMs being restarted in a short time on another host, then power management/fencing will help here
Thanks,
michal
)
They only returned when we restored the fiber connection to the Hypervisor…
yes, since 3.3 we have the autoresume feature
Thanks,
michal
Kind Regards,
Koen
2014-04-03 16:53 GMT+02:00 Koen Vanoppen <vanoppen.koen@gmail.com>:
---------- Forwarded message ---------- From: "Doron Fediuck" <dfediuck@redhat.com> Date: Apr 3, 2014 4:51 PM Subject: Re: [Users] HA
To: "Koen Vanoppen" <vanoppen.koen@gmail.com> Cc: "Omer Frenkel" <ofrenkel@redhat.com>, <users@ovirt.org>, "Federico Simoncelli" <fsimonce@redhat.com>, "Allon Mureinik" <amureini@redhat.com>
From: "Koen Vanoppen" <vanoppen.koen@gmail.com> To: "Omer Frenkel" <ofrenkel@redhat.com>, users@ovirt.org Sent: Wednesday, April 2, 2014 4:17:36 PM Subject: Re: [Users] HA
Yes, indeed. I meant not-operational. Sorry. So, if I understand this correctly: if we ever end up in a situation where we lose both storage connections on our hypervisor, we will have to manually restore the connections first?
And thanks for the tip for speeding things up :-).
Kind regards,
Koen
2014-04-02 15:14 GMT+02:00 Omer Frenkel < ofrenkel@redhat.com > :
----- Original Message -----
From: "Koen Vanoppen" < vanoppen.koen@gmail.com > To: users@ovirt.org Sent: Wednesday, April 2, 2014 4:07:19 PM Subject: [Users] HA
Dear All,
During our acceptance testing, we discovered something. (Document will follow.) When we disable one fiber path, no problem: multipath finds its way and no pings are lost. BUT when we disabled both fiber paths (so one of the storage domains is gone on this host, but still available on the other host), the VMs go into paused mode... It chooses a new SPM (can we speed this up?), puts the host in non-responsive (can we speed this up? more important) and the VMs stay in paused mode... I would expect that they would be migrated (yes, HA is enabled) to the other host and rebooted there... Any solution? We are still using oVirt 3.3.1, but we are planning an upgrade to 3.4 after the Easter holiday.
Kind Regards,
Koen
i guess you mean the host moves to not-operational (in contrast to non-responsive)? if so, the engine will not migrate VMs that are paused due to an I/O error, because of the data corruption risk.
to speed up, you can look at the storage domain monitoring timeout: engine-config --get StorageDomainFalureTimeoutInMinutes
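The engine-config tip quoted above (checking the storage domain monitoring timeout) could be scripted roughly as follows. This is only a sketch, not something posted in the thread: the ENGINE_CONFIG and DRY_RUN variables and the run/show_and_set helpers are made up for illustration, the value 1 is the one Koen reports trying later in the thread, and the service name ovirt-engine is an assumption about the installation. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only: inspect one engine-config key and optionally lower it.
# ENGINE_CONFIG and DRY_RUN are hypothetical helper variables for this sketch.
ENGINE_CONFIG="${ENGINE_CONFIG:-engine-config}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, anything else = execute

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# show_and_set KEY VALUE: show the current value, then set the new one.
show_and_set() {
    run "$ENGINE_CONFIG" --get "$1"
    run "$ENGINE_CONFIG" --set "$1=$2"
}

show_and_set StorageDomainFalureTimeoutInMinutes 1

# Assumption: the engine service is called ovirt-engine and has to be
# restarted before a changed value takes effect.
run service ovirt-engine restart
```

Running it with DRY_RUN=0 on the engine machine would execute the commands instead of printing them; keeping the dry run as the default makes it harder to change a production value by accident.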
Hi Koen,
Resuming from paused due to I/O issues is supported (adding relevant folks). Regardless, if you did not define power management, you should manually confirm that the source host was rebooted in order for the migration to proceed. Otherwise we risk a split-brain scenario.
Doron
_______________________________________________ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users

Or in other words, the SPM and the VM should move almost immediately after the storage connections on the hypervisor are gone. I know, maybe I'm asking too much, but we would be very happy :-) :-).
So, a sketch:
Mercury1 SPM Mercury 2
Mercury1 loses both fibre connections --> goes into non-operational, and the VM goes into paused state and stays that way until I manually reboot the host so it fences.
What I would like is that when Mercury1 loses both fibre connections, it fences immediately, so the VMs are also moved almost instantly... If this is possible... :-)
Kind regards and thanks for all the help!

----- Original Message -----
From: "Koen Vanoppen" <vanoppen.koen@gmail.com> To: users@ovirt.org Sent: Tuesday, April 8, 2014 3:41:02 PM Subject: Re: [Users] HA
Or in other words, the SPM and the VM should move almost immediately after the storage connections on the hypervisor are gone. I know, maybe I'm asking too much, but we would be very happy :-) :-).
So, a sketch:
Mercury1 SPM Mercury 2
Mercury1 loses both fibre connections --> goes into non-operational, and the VM goes into paused state and stays that way until I manually reboot the host so it fences.
What I would like is that when Mercury1 loses both fibre connections, it fences immediately, so the VMs are also moved almost instantly... If this is possible... :-)
Kind regards and thanks for all the help!
Michal, is there a vdsm hook for a VM moved to paused? If so, you could send KILL to it, and the engine will identify that the VM was killed and is HA, so it will be restarted. There is no need to reboot the host; it will stay in non-operational until the storage is fixed.

Hi All,
Any news about this? A VDSM hook or anything?
Thanks!
Kind regards

On 11 Apr 2014, at 09:00, Koen Vanoppen wrote:
Hi All,
Any news about this? A vdsm hook or anything? Thanx!
Kind regards
2014-04-09 9:37 GMT+02:00 Omer Frenkel <ofrenkel@redhat.com>:
----- Original Message -----
From: "Koen Vanoppen" <vanoppen.koen@gmail.com> To: users@ovirt.org Sent: Tuesday, April 8, 2014 3:41:02 PM Subject: Re: [Users] HA
Or in other words, the SPM and the VM should move almost immediately after the storage connections on the hypervisor are gone. I know, maybe I'm asking too much, but we would be very happy :-) :-).
So, a sketch:
Mercury1 SPM
Mercury 2
Mercury1 loses both fibre connections --> goes into non-operational, and the VM goes into paused state and stays that way until I manually reboot the host so it fences.
What I would like is that when Mercury1 loses both fibre connections, it fences immediately so the VMs are also moved almost instantly... If this is possible... :-)
Kind regards and thanks for all the help!
Michal, is there a vdsm hook for a VM moved to paused? If so, you could send KILL to it, and the engine will identify that the VM was killed and is HA, so it will be restarted, with no need to reboot the host; the host will stay in non-operational until storage is fixed.
2014-04-08 14:26 GMT+02:00 Koen Vanoppen <vanoppen.koen@gmail.com>:
Ok, Thanx already for all the help. I adapted some things for a quicker response:
engine-config --get FenceQuietTimeBetweenOperationsInSec --> 180
engine-config --set FenceQuietTimeBetweenOperationsInSec=60
engine-config --get StorageDomainFalureTimeoutInMinutes --> 180
engine-config --set StorageDomainFalureTimeoutInMinutes=1
engine-config --get SpmCommandFailOverRetries --> 5
engine-config --set SpmCommandFailOverRetries
engine-config --get SPMFailOverAttempts --> 3
engine-config --set SPMFailOverAttempts=1
engine-config --get NumberOfFailedRunsOnVds --> 3
engine-config --set NumberOfFailedRunsOnVds=1
engine-config --get vdsTimeout --> 180
engine-config --set vdsTimeout=30
engine-config --get VDSAttemptsToResetCount --> 2
engine-config --set VDSAttemptsToResetCount=1
engine-config --get TimeoutToResetVdsInSeconds --> 60
engine-config --set TimeoutToResetVdsInSeconds=30
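For anyone wanting to repeat the changes listed above, here is a minimal sketch that replays them in one go. It is not an official procedure: the names DRY_RUN, ENGINE_CONFIG, apply_setting and apply_all are made up for illustration, the values are simply the ones quoted in this thread, and the restart command assumes a SysV-style init (as on EL6). By default it only prints what it would run.

```shell
#!/bin/sh
# Sketch only: replay the engine-config values quoted in this thread.
# DRY_RUN, ENGINE_CONFIG, apply_setting and apply_all are illustrative names,
# not part of oVirt.
DRY_RUN="${DRY_RUN:-1}"                        # 1 = only print the commands
ENGINE_CONFIG="${ENGINE_CONFIG:-engine-config}"

apply_setting() {
    # $1 = configuration key, $2 = new value
    if [ "$DRY_RUN" = "1" ]; then
        echo "$ENGINE_CONFIG --set $1=$2"
    else
        "$ENGINE_CONFIG" --set "$1=$2"
    fi
}

apply_all() {
    apply_setting FenceQuietTimeBetweenOperationsInSec 60
    apply_setting StorageDomainFalureTimeoutInMinutes 1
    apply_setting SPMFailOverAttempts 1
    apply_setting NumberOfFailedRunsOnVds 1
    apply_setting vdsTimeout 30
    apply_setting VDSAttemptsToResetCount 1
    apply_setting TimeoutToResetVdsInSeconds 30
    # SpmCommandFailOverRetries is left out: the thread gives no new value for it.
}

apply_all

# engine-config changes only take effect after the engine service restarts.
# Assumption: SysV-style service management on the engine machine.
if [ "$DRY_RUN" = "1" ]; then
    echo "service ovirt-engine restart"
else
    service ovirt-engine restart
fi
```

Run it once as-is to review the printed commands, then with DRY_RUN=0 on the engine machine to actually apply them. Values this aggressive trade stability for speed, so they are probably best tried in a test setup first.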
Now the result of this is that when the VM is not running on the SPM, it will migrate before going into pause mode. But when we tried it with the VM running on the SPM, it goes into paused mode (for safety reasons, I know ;-) ) and stays there until the host gets MANUALLY fenced by rebooting it. So now my question is... How can I make the hypervisor fence (so it reboots, so the VM is moved) quicker?
Kind regards,
Koen
2014-04-04 16:28 GMT+02:00 Koen Vanoppen <vanoppen.koen@gmail.com>:
Yes, that's true. But I was driving... so I'm just forwarding it then :-). I have already adjusted the timeout. It was set to 5 minutes before it would give the timeout. It is now set to 2 minutes.
On Apr 4, 2014 4:14 PM, "David Van Zeebroeck" <david.van.zeebroeck@brusselsairport.be> wrote:
I have them too
But normally the fencing should have worked, as I read it
So something went wrong somewhere, by the looks of it
From: Koen Vanoppen [mailto:vanoppen.koen@gmail.com] Sent: Friday 4 April 2014 16:07 To: David Van Zeebroeck Subject: Fwd: Re: [Users] HA
David Van Zeebroeck
Product Manager Unix Infrastructure
Information & Communication Technology
Brussels Airport Company
T +32 (0)2 753 66 24
M +32 (0)497 02 17 31
david.van.zeebroeck@brusselsairport.be
www.brusselsairport.be
---------- Forwarded message ---------- From: "Michal Skrivanek" < michal.skrivanek@redhat.com > Date: Apr 4, 2014 3:39 PM Subject: Re: [Users] HA To: "Koen Vanoppen" < vanoppen.koen@gmail.com > Cc: "ovirt-users Users" < users@ovirt.org >
On 4 Apr 2014, at 15:14, Sander Grendelman wrote:
Do you have power management configured?
Was the "failed" host fenced/rebooted?
On Fri, Apr 4, 2014 at 2:21 PM, Koen Vanoppen <vanoppen.koen@gmail.com> wrote:
So... is a fully automatic migration of the VM to another hypervisor possible in case the storage connection fails?
How can we make this happen? Because for the moment, when we tested the situation, they stayed in pause state.
(Test situation:
* Unplug the 2 fibre cables from the hypervisor
* VMs go into pause state
* VMs stayed in pause state until the failure was solved
As said before, it's not safe, hence we (try to) not migrate them.
They only get paused when they actually access the storage, which may not always be the case. I.e. the storage connection is severed, the host is deemed NonOperational and VMs are getting migrated from it; then some of them will succeed if they didn't access that "bad" storage... the paused VMs will remain (mostly; it can still happen that they appear paused after being migrated to the other host, when the disk access occurs only at the last stage of migration).
So in other words, if you want to migrate the VMs without interruption, it's sometimes not possible.
If you are fine with the VMs being restarted in a short time on another host, then power management/fencing will help here.
Thanks,
michal
)
They only returned when we restored the fibre connection to the hypervisor...
yes, since 3.3 we have the autoresume feature
Thanks,
michal
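Since both Sander and Michal point at power management, one thing worth checking by hand is whether a host's power-management card answers at all. The sketch below is only an illustration under loud assumptions: the thread never says which fence agent is in use, so fence_ipmilan (an IPMI-based BMC) is a guess, and FENCE_AGENT, fence_status and check_fencing are made-up names. It asks for the power status only and never powers anything off.

```shell
#!/bin/sh
# Sketch only: query the power status of a host's management card.
# Assumption: an IPMI BMC reachable via the fence_ipmilan agent; swap in the
# agent that matches your hardware. Function and variable names are illustrative.
FENCE_AGENT="${FENCE_AGENT:-fence_ipmilan}"

fence_status() {
    # $1 = BMC address, $2 = login, $3 = password
    # Note: a password given on the command line is visible in the process list.
    "$FENCE_AGENT" -a "$1" -l "$2" -p "$3" -o status
}

check_fencing() {
    if fence_status "$1" "$2" "$3" >/dev/null 2>&1; then
        echo "fence device at $1 answered"
    else
        echo "fence device at $1 did not answer (or reports the host as off)"
        return 1
    fi
}

# Example call (placeholder address and credentials):
#   check_fencing 192.0.2.10 admin secret
```

If this check fails from the host that would act as fence proxy, the engine is unlikely to be able to fence the failed host either, which would match the behaviour described in this thread.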
Kind Regards,
Koen
2014-04-04 13:52 GMT+02:00 Koen Vanoppen <vanoppen.koen@gmail.com>:
So... is a fully automatic migration of the VM to another hypervisor possible in case the storage connection fails?
How can we make this happen? Because for the moment, when we tested the situation, they stayed in pause state.
(Test situation:
* Unplug the 2 fibre cables from the hypervisor
* VMs go into pause state
* VMs stayed in pause state until the failure was solved
)
They only returned when we restored the fibre connection to the hypervisor...
Kind Regards,
Koen
2014-04-03 16:53 GMT+02:00 Koen Vanoppen <vanoppen.koen@gmail.com>:
---------- Forwarded message ---------- From: "Doron Fediuck" <dfediuck@redhat.com> Date: Apr 3, 2014 4:51 PM Subject: Re: [Users] HA
To: "Koen Vanoppen" <vanoppen.koen@gmail.com> Cc: "Omer Frenkel" <ofrenkel@redhat.com>, <users@ovirt.org>, "Federico Simoncelli" <fsimonce@redhat.com>, "Allon Mureinik" <amureini@redhat.com>
----- Original Message -----
From: "Koen Vanoppen" < vanoppen.koen@gmail.com > To: "Omer Frenkel" < ofrenkel@redhat.com >, users@ovirt.org Sent: Wednesday, April 2, 2014 4:17:36 PM Subject: Re: [Users] HA
Yes, indeed. I meant not-operational. Sorry. So, if I understand this correctly: if we ever end up in a situation where we lose both storage connections on our hypervisor, we will have to manually restore the connections first?
And thanx for the tip for speeding things up :-).
Kind regards,
Koen
2014-04-02 15:14 GMT+02:00 Omer Frenkel < ofrenkel@redhat.com > :
----- Original Message -----
From: "Koen Vanoppen" < vanoppen.koen@gmail.com > To: users@ovirt.org Sent: Wednesday, April 2, 2014 4:07:19 PM Subject: [Users] HA
Dear All,
During our acceptance testing, we discovered something (document will follow). When we disable one fibre path, there is no problem: multipath finds its way and no pings are lost. BUT when we disable both fibre paths (so one of the storage domains is gone on this host, but still available on the other host), the VMs go into paused mode... It chooses a new SPM (can we speed this up?), puts the host in non-responsive (can we speed this up? This is more important) and the VMs stay in paused mode... I would expect that they would be migrated (yes, HA is
I guess you mean the host moves to not-operational (in contrast to non-responsive)? If so, the engine will not migrate VMs that are paused due to an I/O error, because of the data corruption risk.
To speed this up, you can look at the storage domain monitoring timeout: engine-config --get StorageDomainFalureTimeoutInMinutes
enabled) to the other host and reboot there... Any solution? We are still using oVirt 3.3.1, but we are planning an upgrade to 3.4 after the Easter holiday.
You have to differentiate: if only the VMs are paused, then yes, you can do anything (also change the error reporting policy to not pause the VM). But if the host becomes non-operational then it simply doesn't work; vdsm got stuck somewhere (often in getting block device stats). A proper power management config should fence it.
Thanks,
michal
Kind Regards,
Koen
Hi Koen, Resuming from paused due to I/O issues is supported (adding relevant folks). Regardless, if you did not define power management, you must manually confirm that the source host was rebooted in order for migration to proceed. Otherwise we risk a split-brain scenario.
Doron
_______________________________________________ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
participants (3)
- Koen Vanoppen
- Michal Skrivanek
- Omer Frenkel