On 4 Oct 2016, at 09:51, Gary Lloyd <g.lloyd(a)keele.ac.uk> wrote:
Hi
We have oVirt 3.65 with a Dell EqualLogic SAN, and we use Direct LUNs for all our VMs.
Over the weekend, in the early hours, an EqualLogic controller on one of our arrays failed over to its standby, and this caused about 20 of our VMs to be paused due to I/O problems.
Since we moved to oVirt 3.65, I have also noticed that the same thing happens during EqualLogic firmware upgrades.
As recommended by Dell, disk timeouts within the VMs are set to 60 seconds when they are hosted on an EqualLogic SAN.
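For reference, a minimal sketch of how the 60-second guest disk timeout can be applied inside a Linux guest. It assumes the guest's disks appear as SCSI devices (sda, sdb, ...) exposing the standard sysfs attribute /sys/block/<dev>/device/timeout (in seconds) and that it runs as root inside the guest; the function name set_scsi_disk_timeouts is hypothetical, and virtio-blk disks (vda, ...) are not covered because they have no such attribute.

```python
"""Sketch: set the SCSI command timeout for every sd* disk in a Linux guest.

Assumes disks show up as SCSI devices under /sys/block (sda, sdb, ...).
Disks without a device/timeout attribute are skipped.
"""

from pathlib import Path

DEFAULT_TIMEOUT_S = 60  # value recommended by Dell for guests on EqualLogic storage


def set_scsi_disk_timeouts(timeout_s=DEFAULT_TIMEOUT_S, sysfs_block="/sys/block"):
    """Write timeout_s to <sysfs_block>/sd*/device/timeout; return the disks updated."""
    updated = []
    for disk in sorted(Path(sysfs_block).glob("sd*")):
        timeout_file = disk / "device" / "timeout"
        # Only touch disks that actually expose a per-device SCSI timeout.
        if timeout_file.is_file():
            timeout_file.write_text(f"{timeout_s}\n")
            updated.append(disk.name)
    return updated
```

Calling set_scsi_disk_timeouts() applies 60 seconds to every matching disk. Values written to sysfs are lost on reboot, so a udev rule is the usual way to make the setting persistent.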
Is there any other timeout value that we can configure in vdsm.conf to stop VMs from getting paused when a controller fails over?
Not really. But things are not so different when you look at it from the guest's perspective. If the intention is to hide the fact that there is a problem, so that the guest just sees a delay instead of having to deal with an error, then pausing and unpausing is the right behavior. From the guest's point of view, this is just a delay.
Also, is there anything that we can tweak to automatically unpause the VMs once connectivity with the arrays is re-established?
That should happen when storage domain monitoring detects the error and then detects the domain's reactivation (http://gerrit.ovirt.org/16244). It may be that, since you have Direct LUNs, it doesn't work for those... I don't know; the storage people should chime in, I guess.
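While it is unclear whether the automatic resume covers Direct LUNs, a minimal diagnostic sketch using the libvirt Python bindings can show which VMs on a host libvirt reports as paused because of an I/O error, and optionally resume them. The helper names are hypothetical. It assumes direct access to the host's libvirt daemon (on oVirt hosts libvirt connections may require SASL credentials), and since VDSM normally manages these VMs, resuming them behind its back should be treated as a last resort rather than a fix.

```python
"""Sketch: list (and optionally resume) VMs paused because of an I/O error.

Assumes the libvirt Python bindings and access to the host's libvirt daemon.
On an oVirt host VDSM normally owns these VMs; treat this as a diagnostic aid.
"""

# Mirrors libvirt's enum values VIR_DOMAIN_PAUSED (virDomainState) == 3 and
# VIR_DOMAIN_PAUSED_IOERROR (virDomainPausedReason) == 5, copied here so the
# selection logic can be exercised without libvirt installed.
VIR_DOMAIN_PAUSED = 3
VIR_DOMAIN_PAUSED_IOERROR = 5


def paused_on_io_error(state, reason):
    """True if a (state, reason) pair from virDomain.state() means 'paused on I/O error'."""
    return state == VIR_DOMAIN_PAUSED and reason == VIR_DOMAIN_PAUSED_IOERROR


def io_paused_domains(conn):
    """Return the domains on this connection that are paused due to an I/O error."""
    result = []
    for dom in conn.listAllDomains():
        state, reason = dom.state()  # libvirt returns [state, reason]
        if paused_on_io_error(state, reason):
            result.append(dom)
    return result


def main(uri="qemu:///system", resume=False):
    """Print I/O-paused VMs; resume them only when resume=True."""
    import libvirt  # imported lazily so the helpers above work without it

    conn = libvirt.open(uri)
    try:
        for dom in io_paused_domains(conn):
            print("paused on I/O error:", dom.name())
            if resume:
                dom.resume()
    finally:
        conn.close()
```

Calling main() only lists the affected VMs; main(resume=True) also calls resume() on each one.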
Thanks,
michal
At the moment we are running a customized version of storageServer.py, as oVirt has yet to include iSCSI multipath support for Direct LUNs out of the box.
Many thanks
Gary Lloyd
________________________________________________
I.T. Systems:Keele University
Finance & IT Directorate
Keele:Staffs:IC1 Building:ST5 5NB:UK
+44 1782 733063
________________________________________________
_______________________________________________
Users mailing list
Users(a)ovirt.org
http://lists.ovirt.org/mailman/listinfo/users