<div dir="ltr"><div><div>Power management is configured correctly. And as long as the host that loses its storage isn't the SPM, there is no problem.<br></div>If I can make it so that, when a VM is paused, it gets switched off and (HA-style) reboots itself, I'm perfectly happy :-).<br>
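<br>A minimal sketch of that "switch off on pause" behavior, assuming a hypothetical vdsm pause hook point (the after_vm_pause name and the stdin domain-XML convention below are assumptions, not confirmed vdsm API): kill the paused HA VM so the engine restarts it on another host.<br>

```shell
#!/bin/sh
# Hypothetical hook sketch, e.g. /usr/libexec/vdsm/hooks/after_vm_pause/50_kill_paused_ha
# The after_vm_pause hook point is an assumption and may not exist in this
# vdsm version. vdsm hooks receive the libvirt domain XML on stdin.

# Extract the <uuid> element from a domain XML file.
extract_uuid() {
    sed -n 's|.*<uuid>\([^<]*\)</uuid>.*|\1|p' "$1" | head -n 1
}

domxml=$(mktemp)
trap 'rm -f "$domxml"' EXIT
cat > "$domxml"

uuid=$(extract_uuid "$domxml")
if [ -n "$uuid" ]; then
    # Destroying the paused domain lets the engine detect the HA VM as
    # killed and restart it on another host.
    virsh destroy "$uuid"
fi
```

Whether the engine then treats the VM as killed+HA and restarts it is exactly the question raised further down in this thread.<br>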
<br></div>Kind regards,<br><br><br><div class="gmail_quote">---------- Forwarded message ----------<br>From: <b class="gmail_sendername">Koen Vanoppen</b> <span dir="ltr"><<a href="mailto:vanoppen.koen@gmail.com">vanoppen.koen@gmail.com</a>></span><br>
Date: 2014-04-11 14:47 GMT+02:00<br>Subject: Re: [ovirt-users] [Users] HA<br>To: Michal Skrivanek <<a href="mailto:michal.skrivanek@redhat.com">michal.skrivanek@redhat.com</a>><br><br><br><div dir="ltr"><div><div>Power management is configured correctly. And as long as the host that loses its storage isn't the SPM, there is no problem.<br>
</div>If I can make it so that, when a VM is paused, it gets switched off and (HA-style) reboots itself, I'm perfectly happy :-).<br>
<br></div>Kind regards,<br><br><br></div><div class="gmail_extra"><br><br><div class="gmail_quote">2014-04-11 9:37 GMT+02:00 Michal Skrivanek <span dir="ltr"><<a href="mailto:michal.skrivanek@redhat.com" target="_blank">michal.skrivanek@redhat.com</a>></span>:<div>
<div class="h5"><br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="word-wrap:break-word"><br><div><div><div><div>On 11 Apr 2014, at 09:00, Koen Vanoppen wrote:</div>
<br><blockquote type="cite"><div dir="ltr"><div><div>Hi All,<br><br>Any news about this? A VDSM hook or anything?<br></div>Thanks!<br><br></div>Kind regards<br></div><div class="gmail_extra"><br><br><div class="gmail_quote">
2014-04-09 9:37 GMT+02:00 Omer Frenkel <span dir="ltr"><<a href="mailto:ofrenkel@redhat.com" target="_blank">ofrenkel@redhat.com</a>></span>:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><br>
<br>
----- Original Message -----<br>
> From: "Koen Vanoppen" <<a href="mailto:vanoppen.koen@gmail.com" target="_blank">vanoppen.koen@gmail.com</a>><br>
> To: <a href="mailto:users@ovirt.org" target="_blank">users@ovirt.org</a><br>
</div><div>> Sent: Tuesday, April 8, 2014 3:41:02 PM<br>
> Subject: Re: [Users] HA<br>
><br>
</div><div>> Or in other words, the SPM and the VM should move almost immediately after<br>
> the storage connections on the hypervisor are gone. I know, I'm maybe asking too<br>
> much, but we would be very happy :-) :-).<br>
><br>
> So sketch:<br>
><br>
> Mercury1 SPM<br>
> Mercury 2<br>
><br>
> Mercury1 loses both fibre connections --> goes non-operational and the VM<br>
> goes into paused state and stays that way until I manually reboot the host so<br>
> it fences.<br>
><br>
> What I would like is that when Mercury1 loses both fibre connections, it<br>
> fences immediately so the VMs are also moved almost instantly... If this is<br>
> possible... :-)<br>
><br>
> Kind regards and thanks for all the help!<br>
><br>
<br>
</div>Michal, is there a vdsm hook for a VM moved to pause?<br>
if so, you could send KILL to it, and the engine will identify the VM as killed+HA,<br>
so it will be restarted, and there is no need to reboot the host; it will stay non-operational until the storage is fixed.<br></blockquote></div></div></blockquote><div><br></div></div></div>you have to differentiate - if only the VMs get paused, yes, you can do anything (you can also change the error reporting policy to not pause the VM)</div>
<div>but if the host becomes non-operational then it simply doesn't work; vdsm got stuck somewhere (often in getting block device stats)</div><div>a proper power management config should fence it</div><div><br></div><div>Thanks,</div>
<div>michal</div><div><div><div><br><blockquote type="cite"><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div><div><br>
><br>
><br>
> 2014-04-08 14:26 GMT+02:00 Koen Vanoppen < <a href="mailto:vanoppen.koen@gmail.com" target="_blank">vanoppen.koen@gmail.com</a> > :<br>
><br>
><br>
><br>
> Ok,<br>
> Thanks already for all the help. I adapted some things for a quicker response:<br>
> engine-config --get FenceQuietTimeBetweenOperationsInSec-->180<br>
> engine-config --set FenceQuietTimeBetweenOperationsInSec=60<br>
><br>
> engine-config --get StorageDomainFalureTimeoutInMinutes-->180<br>
> engine-config --set StorageDomainFalureTimeoutInMinutes=1<br>
><br>
> engine-config --get SpmCommandFailOverRetries-->5<br>
> engine-config --set SpmCommandFailOverRetries<br>
><br>
> engine-config --get SPMFailOverAttempts-->3<br>
> engine-config --set SPMFailOverAttempts=1<br>
><br>
> engine-config --get NumberOfFailedRunsOnVds-->3<br>
> engine-config --set NumberOfFailedRunsOnVds=1<br>
><br>
> engine-config --get vdsTimeout-->180<br>
> engine-config --set vdsTimeout=30<br>
><br>
> engine-config --get VDSAttemptsToResetCount-->2<br>
> engine-config --set VDSAttemptsToResetCount=1<br>
><br>
> engine-config --get TimeoutToResetVdsInSeconds-->60<br>
> engine-config --set TimeoutToResetVdsInSeconds=30<br>
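><br>
> For reference, the pairs above show the current value after --> and then the value set; they can be applied in one pass (the "Falure" spelling matches the key name used throughout this thread). A sketch, omitting the SpmCommandFailOverRetries line whose new value isn't shown above:<br>

```shell
# Sketch: apply the timeout tuning from this thread in one pass.
# These values are the ones chosen above, not general recommendations.
set -e
engine-config --set FenceQuietTimeBetweenOperationsInSec=60
engine-config --set StorageDomainFalureTimeoutInMinutes=1
engine-config --set SPMFailOverAttempts=1
engine-config --set NumberOfFailedRunsOnVds=1
engine-config --set vdsTimeout=30
engine-config --set VDSAttemptsToResetCount=1
engine-config --set TimeoutToResetVdsInSeconds=30
# engine-config changes take effect only after an engine restart:
service ovirt-engine restart
```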
><br>
> Now the result of this is that when the VM is not running on the SPM, it<br>
> will migrate before going into pause mode.<br>
> But when we tried it with the VM running on the SPM, it gets into paused<br>
> mode (for safety reasons, I know ;-) ). And stays there until the host gets<br>
> MANUALLY fenced by rebooting it. So now my question is... How can I make the<br>
> hypervisor fence (i.e. reboot, so the VM is moved) quicker?<br>
><br>
> Kind regards,<br>
><br>
> Koen<br>
><br>
><br>
> 2014-04-04 16:28 GMT+02:00 Koen Vanoppen < <a href="mailto:vanoppen.koen@gmail.com" target="_blank">vanoppen.koen@gmail.com</a> > :<br>
><br>
><br>
><br>
><br>
><br>
> Yes, that's true. But I was driving... So I'm just forwarding it then :-). I have<br>
> already adjusted the timeout. It was set to 5 min before it would time<br>
> out. It is now set to 2 min.<br>
> On Apr 4, 2014 4:14 PM, "David Van Zeebroeck" <<br>
> <a href="mailto:david.van.zeebroeck@brusselsairport.be" target="_blank">david.van.zeebroeck@brusselsairport.be</a> > wrote:<br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
> I have them too, you know<br>
><br>
><br>
><br>
> But normally the fencing should have worked, as I read it<br>
><br>
> So something went wrong somewhere there, by the looks of it<br>
><br>
><br>
><br>
> From: Koen Vanoppen [mailto: <a href="mailto:vanoppen.koen@gmail.com" target="_blank">vanoppen.koen@gmail.com</a> ]<br>
> Sent: Friday, April 4, 2014 16:07<br>
> To: David Van Zeebroeck<br>
> Subject: Fwd: Re: [Users] HA<br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
> David Van Zeebroeck<br>
><br>
> Product Manager Unix Infrastructure<br>
><br>
> Information & Communication Technology<br>
><br>
> Brussels Airport Company<br>
><br>
> T <a href="tel:%2B32%20%280%292%20753%2066%2024" value="+3227536624" target="_blank">+32 (0)2 753 66 24</a><br>
><br>
> M <a href="tel:%2B32%20%280%29497%2002%2017%2031" value="+32497021731" target="_blank">+32 (0)497 02 17 31</a><br>
><br>
> <a href="mailto:david.van.zeebroeck@brusselsairport.be" target="_blank">david.van.zeebroeck@brusselsairport.be</a><br>
><br>
><br>
><br>
> <a href="http://www.brusselsairport.be/" target="_blank">www.brusselsairport.be</a><br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
</div></div>><br>
<div>><br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
> ---------- Forwarded message ----------<br>
> From: "Michal Skrivanek" < <a href="mailto:michal.skrivanek@redhat.com" target="_blank">michal.skrivanek@redhat.com</a> ><br>
> Date: Apr 4, 2014 3:39 PM<br>
> Subject: Re: [Users] HA<br>
> To: "Koen Vanoppen" < <a href="mailto:vanoppen.koen@gmail.com" target="_blank">vanoppen.koen@gmail.com</a> ><br>
> Cc: "ovirt-users Users" < <a href="mailto:users@ovirt.org" target="_blank">users@ovirt.org</a> ><br>
><br>
><br>
><br>
><br>
><br>
><br>
> On 4 Apr 2014, at 15:14, Sander Grendelman wrote:<br>
><br>
><br>
><br>
><br>
><br>
><br>
> Do you have power management configured?<br>
><br>
><br>
> Was the "failed" host fenced/rebooted?<br>
><br>
><br>
><br>
><br>
><br>
> On Fri, Apr 4, 2014 at 2:21 PM, Koen Vanoppen < <a href="mailto:vanoppen.koen@gmail.com" target="_blank">vanoppen.koen@gmail.com</a> ><br>
> wrote:<br>
><br>
><br>
> So... Is it possible to fully automatically migrate the VM to another<br>
> hypervisor in case the storage connection fails?<br>
><br>
><br>
> How can we make this happen? Because for the moment, when we tested this<br>
> situation, the VMs stayed in paused state.<br>
><br>
><br>
> (Test situation:<br>
><br>
</div>> * Unplug the 2 fibre cables from the hypervisor<br>
> * VMs go into paused state<br>
> * VMs stayed in paused state until the failure was solved<br>
<div><div>><br>
><br>
><br>
><br>
><br>
> as said before, it's not safe, hence we try not to migrate them.<br>
><br>
><br>
> They only get paused when they actually access the storage, which may not<br>
> always be the case. I.e. the storage connection is severed, the host is deemed<br>
> NonOperational and VMs are getting migrated from it, then some of them will<br>
> succeed if they didn't access that "bad" storage … the paused VMs will<br>
> remain (mostly, it can still happen that they appear paused migrated on<br>
> other host when the disk access occurs only at the last stage of migration)<br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
> so in other words, if you want to migrate the VMs without interruption it's<br>
> sometimes not possible<br>
><br>
><br>
> if you are fine with the VMs being restarted after a short time on another host, then power<br>
> management/fencing will help here<br>
><br>
><br>
><br>
><br>
><br>
> Thanks,<br>
><br>
><br>
> michal<br>
><br>
><br>
><br>
><br>
><br>
><br>
> )<br>
><br>
><br>
><br>
><br>
> They only resumed when we restored the fibre connection to the hypervisor…<br>
><br>
><br>
><br>
><br>
><br>
> yes, since 3.3 we have the autoresume feature<br>
><br>
><br>
><br>
><br>
><br>
> Thanks,<br>
><br>
><br>
> michal<br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
> Kind Regards,<br>
><br>
> Koen<br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
> 2014-04-04 13:52 GMT+02:00 Koen Vanoppen < <a href="mailto:vanoppen.koen@gmail.com" target="_blank">vanoppen.koen@gmail.com</a> >:<br>
><br>
><br>
> So... Is it possible to fully automatically migrate the VM to another<br>
> hypervisor in case the storage connection fails?<br>
><br>
><br>
> How can we make this happen? Because for the moment, when we tested this<br>
> situation, the VMs stayed in paused state.<br>
><br>
><br>
> (Test situation:<br>
><br>
</div></div>> * Unplug the 2 fibre cables from the hypervisor<br>
> * VMs go into paused state<br>
> * VMs stayed in paused state until the failure was solved<br>
<div><div>><br>
><br>
> )<br>
><br>
><br>
><br>
><br>
> They only resumed when we restored the fibre connection to the hypervisor...<br>
><br>
><br>
> Kind Regards,<br>
><br>
> Koen<br>
><br>
><br>
><br>
><br>
><br>
> 2014-04-03 16:53 GMT+02:00 Koen Vanoppen < <a href="mailto:vanoppen.koen@gmail.com" target="_blank">vanoppen.koen@gmail.com</a> >:<br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
> ---------- Forwarded message ----------<br>
> From: "Doron Fediuck" < <a href="mailto:dfediuck@redhat.com" target="_blank">dfediuck@redhat.com</a> ><br>
> Date: Apr 3, 2014 4:51 PM<br>
> Subject: Re: [Users] HA<br>
><br>
><br>
> To: "Koen Vanoppen" < <a href="mailto:vanoppen.koen@gmail.com" target="_blank">vanoppen.koen@gmail.com</a> ><br>
> Cc: "Omer Frenkel" < <a href="mailto:ofrenkel@redhat.com" target="_blank">ofrenkel@redhat.com</a> >, < <a href="mailto:users@ovirt.org" target="_blank">users@ovirt.org</a> >, "Federico<br>
> Simoncelli" < <a href="mailto:fsimonce@redhat.com" target="_blank">fsimonce@redhat.com</a> >, "Allon Mureinik" < <a href="mailto:amureini@redhat.com" target="_blank">amureini@redhat.com</a><br>
> ><br>
><br>
><br>
><br>
> ----- Original Message -----<br>
> > From: "Koen Vanoppen" < <a href="mailto:vanoppen.koen@gmail.com" target="_blank">vanoppen.koen@gmail.com</a> ><br>
> > To: "Omer Frenkel" < <a href="mailto:ofrenkel@redhat.com" target="_blank">ofrenkel@redhat.com</a> >, <a href="mailto:users@ovirt.org" target="_blank">users@ovirt.org</a><br>
> > Sent: Wednesday, April 2, 2014 4:17:36 PM<br>
> > Subject: Re: [Users] HA<br>
> ><br>
> > Yes, indeed. I meant not-operational. Sorry.<br>
> > So, if I understand this correctly: if we ever get into a situation where<br>
> > we<br>
> > lose both storage connections on our hypervisor, we will have to manually<br>
> > restore the connections first?<br>
> ><br>
> > And thanks for the tip for speeding things up :-).<br>
> ><br>
> > Kind regards,<br>
> ><br>
> > Koen<br>
> ><br>
> ><br>
> > 2014-04-02 15:14 GMT+02:00 Omer Frenkel < <a href="mailto:ofrenkel@redhat.com" target="_blank">ofrenkel@redhat.com</a> > :<br>
> ><br>
> ><br>
> ><br>
> ><br>
> ><br>
> > ----- Original Message -----<br>
> > > From: "Koen Vanoppen" < <a href="mailto:vanoppen.koen@gmail.com" target="_blank">vanoppen.koen@gmail.com</a> ><br>
> > > To: <a href="mailto:users@ovirt.org" target="_blank">users@ovirt.org</a><br>
> > > Sent: Wednesday, April 2, 2014 4:07:19 PM<br>
> > > Subject: [Users] HA<br>
> > ><br>
> > > Dear All,<br>
> > ><br>
> > > During our acceptance testing, we discovered something. (Document will<br>
> > > follow).<br>
> > > When we disable one fibre path, there is no problem: multipath finds its way, no<br>
> > > pings<br>
> > > are lost.<br>
> > > BUT when we disabled both fibre paths (so one of the storage domains<br>
> > > is<br>
> > > gone on this host, but still available on the other host), VMs go into<br>
> > > paused<br>
> > > mode... It chooses a new SPM (can we speed this up?), puts the host in<br>
> > > non-responsive (more importantly, can we speed this up?) and the VMs stay<br>
> > > in<br>
> > > paused mode... I would expect that they would be migrated (yes, HA is<br>
> ><br>
> > i guess you mean the host moves to not-operational (in contrast to<br>
> > non-responsive)?<br>
> > if so, the engine will not migrate vms that are paused due to io error,<br>
> > because of data corruption risk.<br>
> ><br>
> > to speed up you can look at the storage domain monitoring timeout:<br>
> > engine-config --get StorageDomainFalureTimeoutInMinutes<br>
> ><br>
> ><br>
> > > enabled) to the other host and reboot there... Any solution? We are still<br>
> > > using oVirt 3.3.1, but we are planning an upgrade to 3.4 after the Easter<br>
> > > holiday.<br>
> > ><br>
> > > Kind Regards,<br>
> > ><br>
> > > Koen<br>
> > ><br>
><br>
> Hi Koen,<br>
> Resuming from paused due to io issues is supported (adding relevant folks).<br>
> Regardless, if you did not define power management, you should manually<br>
> approve<br>
> that the source host was rebooted in order for migration to proceed. Otherwise we risk<br>
> a split-brain scenario.<br>
><br>
> Doron<br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
> _______________________________________________<br>
> Users mailing list<br>
> <a href="mailto:Users@ovirt.org" target="_blank">Users@ovirt.org</a><br>
> <a href="http://lists.ovirt.org/mailman/listinfo/users" target="_blank">http://lists.ovirt.org/mailman/listinfo/users</a><br>
><br>
><br>
><br>
><br>
><br>
> _______________________________________________<br>
> Users mailing list<br>
> <a href="mailto:Users@ovirt.org" target="_blank">Users@ovirt.org</a><br>
> <a href="http://lists.ovirt.org/mailman/listinfo/users" target="_blank">http://lists.ovirt.org/mailman/listinfo/users</a><br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
> _______________________________________________<br>
> Users mailing list<br>
> <a href="mailto:Users@ovirt.org" target="_blank">Users@ovirt.org</a><br>
> <a href="http://lists.ovirt.org/mailman/listinfo/users" target="_blank">http://lists.ovirt.org/mailman/listinfo/users</a><br>
><br>
</div></div></blockquote></div><br></div>
_______________________________________________<br>Users mailing list<br><a href="mailto:Users@ovirt.org" target="_blank">Users@ovirt.org</a><br><a href="http://lists.ovirt.org/mailman/listinfo/users" target="_blank">http://lists.ovirt.org/mailman/listinfo/users</a><br>
</blockquote></div><br></div></div></div></blockquote></div></div></div><br></div>
</div><br></div>