[ovirt-users] Seamless SAN HA failovers with oVirt?

Doug Ingham dougti at gmail.com
Tue Jun 6 23:41:25 UTC 2017


Hey Matthew,
 I think it's VDSM that handles the pausing & resuming of the VMs.

An analogous small-scale scenario...the Gluster layer for one of our
smaller oVirt clusters temporarily lost quorum the other week, locking all
I/O for about 30 minutes. The VMs all went into pause & then resumed
automatically when quorum was restored.

To my surprise/relief, not a single one of the 10-odd VMs reported any
errors.
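
If you want to see which VMs are sitting in pause during an outage like
that, a rough sketch follows. It assumes you capture the text output of
"virsh -r list --all" on a host (Id / Name / State columns); the function
name and the sample output are made up for illustration, and it does not
cope with VM names that contain spaces.

```python
def paused_domains(virsh_output):
    """Return names of domains listed as 'paused' in `virsh list --all` text.

    Assumes the usual three-column layout: Id, Name, State.
    """
    paused = []
    for line in virsh_output.splitlines():
        # Split into at most three fields so a two-word state
        # such as "shut off" stays in one piece.
        parts = line.split(None, 2)
        # Skip blank lines, the dashed separator, and the header row.
        if len(parts) < 3 or parts[0] == "Id":
            continue
        name, state = parts[1], parts[2].strip()
        if state == "paused":
            paused.append(name)
    return paused


# Hypothetical sample output, for illustration only.
SAMPLE = """\
 Id   Name   State
-----------------------
 1    vm1    running
 2    vm2    paused
 -    vm3    shut off
"""

print(paused_domains(SAMPLE))
```

This only reports state as libvirt sees it; it says nothing about why a VM
paused or whether it will resume on its own.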

YMMV

Doug

On 6 June 2017 at 13:45, Matthew Trent <Matthew.Trent at lewiscountywa.gov>
wrote:

> Thanks for the replies, all!
>
> Yep, Chris is right. TrueNAS HA is active/passive and there isn't a way
> around that when failing between heads.
>
> Sven: In my experience with iX support, they have directed me to reboot
> the active node to initiate failover. There are "hactl takeover" and "hactl
> giveback" commands, but reboot seems to be their preferred method.
>
> VMs going into a paused state and resuming when storage is back online
> sounds great. As long as oVirt's pause/resume isn't significantly slower
> than the 30-or-so seconds the TrueNAS takes to complete its failover,
> that's a pretty tolerable interruption for my needs. So my next questions
> are:
>
> 1) Assuming the SAN failover DOES work correctly, can anyone comment on
> their experience with oVirt pausing/thawing VMs in an NFS-based
> active/passive SAN failover scenario? Does it work reliably without
> intervention? Is it reasonably fast?
>
> 2) Is there anything else in the oVirt stack that might cause it to "freak
> out" rather than gracefully pause/unpause VMs?
>
> 2a) Particularly: I'm running hosted engine on the same TrueNAS storage.
> Does that change anything WRT timeouts and oVirt's HA and fencing and
> sanlock and such?
>
> 2b) Is there a limit to how long oVirt will wait for storage before doing
> something more drastic than just pausing VMs?
>
> --
> Matthew Trent
> Network Engineer
> Lewis County IT Services
> 360.740.1247 - Helpdesk
> 360.740.3343 - Direct line
>
> ________________________________________
> From: users-bounces at ovirt.org <users-bounces at ovirt.org> on behalf of
> Chris Adams <cma at cmadams.net>
> Sent: Tuesday, June 6, 2017 7:21 AM
> To: users at ovirt.org
> Subject: Re: [ovirt-users] Seamless SAN HA failovers with oVirt?
>
> Once upon a time, Juan Pablo <pablo.localhost at gmail.com> said:
> > Chris, if you have active-active with multipath: you upgrade one system,
> > reboot it, check it came active again, then upgrade the other.
>
> Yes, but that's still not how a TrueNAS (and most other low- to
> mid-range SANs) works, so is not relevant.  The TrueNAS only has a
> single active node talking to the hard drives at a time, because having
> two nodes talking to the same storage at the same time is a hard problem
> to solve (typically requires custom hardware with active cache coherency
> and such).
>
> You can (and should) use multipath between servers and a TrueNAS, and
> that protects against NIC, cable, and switch failures, but does not help
> with a controller failure/reboot/upgrade.  Multipath is also used to
> provide better bandwidth sharing between links than ethernet LAGs.
>
> --
> Chris Adams <cma at cmadams.net>
> _______________________________________________
> Users mailing list
> Users at ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>



-- 
Doug