[ovirt-users] Seamless SAN HA failovers with oVirt?
Yaniv Kaul
ykaul at redhat.com
Fri Jun 9 00:34:05 UTC 2017
On Tue, Jun 6, 2017 at 1:45 PM, Matthew Trent <
Matthew.Trent at lewiscountywa.gov> wrote:
> Thanks for the replies, all!
>
> Yep, Chris is right. TrueNAS HA is active/passive and there isn't a way
> around that when failing between heads.
>
General comment - 30 seconds is A LOT. Many application-level IO operations
may time out. Most storage systems strive to stay below that.
>
> Sven: In my experience with iX support, they have directed me to reboot
> the active node to initiate failover. There's "hactl takeover" and "hactl
> giveback" commands, but reboot seems to be their preferred method.
>
> VMs going into a paused state and resuming when storage is back online
> sounds great. As long as oVirt's pause/resume isn't significantly slower
> than the 30-or-so seconds the TrueNAS takes to complete its failover,
> that's a pretty tolerable interruption for my needs. So my next questions
> are:
>
> 1) Assuming the SAN failover DOES work correctly, can anyone comment on
> their experience with oVirt pausing/thawing VMs in an NFS-based
> active/passive SAN failover scenario? Does it work reliably without
> intervention? Is it reasonably fast?
>
oVirt does not pause VMs. qemu-kvm pauses the specific VM whose IO request
is stuck. The reason is that the VM cannot safely continue without risking
data loss (the data is in flight somewhere, right? host kernel, NIC
buffers, etc.)
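
To illustrate, here is a rough sketch (not anything oVirt ships) of how one
could check on a host which VMs qemu has paused because of an IO error. It
assumes the libvirt-python bindings and the qemu:///system URI; the helper
names are made up for this example, and the two constants are meant to
mirror libvirt's VIR_DOMAIN_PAUSED and VIR_DOMAIN_PAUSED_IOERROR values.

```python
# Assumed to mirror libvirt's virDomainState / virDomainPausedReason enums;
# with libvirt-python installed, prefer libvirt.VIR_DOMAIN_PAUSED and
# libvirt.VIR_DOMAIN_PAUSED_IOERROR instead of these literals.
VIR_DOMAIN_PAUSED = 3
VIR_DOMAIN_PAUSED_IOERROR = 5


def paused_on_io_error(domain_states):
    """Return the sorted names of VMs that are paused due to an IO error.

    domain_states maps a VM name to the (state, reason) pair that
    virDomain.state() reports for it.
    """
    return sorted(
        name
        for name, (state, reason) in domain_states.items()
        if state == VIR_DOMAIN_PAUSED and reason == VIR_DOMAIN_PAUSED_IOERROR
    )


def collect_states(uri="qemu:///system"):
    """Query libvirt read-only for every domain's (state, reason) pair.

    Hypothetical helper; needs libvirt-python and a running libvirtd.
    """
    import libvirt  # imported here so the pure function above works without it

    conn = libvirt.openReadOnly(uri)
    try:
        return {dom.name(): tuple(dom.state()) for dom in conn.listAllDomains()}
    finally:
        conn.close()
```

On a host, paused_on_io_error(collect_states()) would list the VMs waiting
for storage; once storage returns they should no longer show up there.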
>
> 2) Is there anything else in the oVirt stack that might cause it to "freak
> out" rather than gracefully pause/unpause VMs?
>
We do monitor storage domain health regularly. We are working on ignoring
short hiccups (see https://bugzilla.redhat.com/show_bug.cgi?id=1459370 for
example).
>
> 2a) Particularly: I'm running hosted engine on the same TrueNAS storage.
> Does that change anything WRT timeouts and oVirt's HA and fencing and
> sanlock and such?
>
> 2b) Is there a limit to how long oVirt will wait for storage before doing
> something more drastic than just pausing VMs?
>
As explained above, generally, no. We can't do much tbh, and we'd like to
ensure there is no data loss.
That being said, in extreme cases hosts may become unresponsive - if you
have fencing they may even be fenced (there's an option to fence a host
which cannot renew its storage lease). We have not seen that happen for
quite some time, though, and I don't expect short storage hiccups to cause
it.
Depending on your application, fencing may be the right thing to do, btw.
Y.
>
> --
> Matthew Trent
> Network Engineer
> Lewis County IT Services
> 360.740.1247 - Helpdesk
> 360.740.3343 - Direct line
>
> ________________________________________
> From: users-bounces at ovirt.org <users-bounces at ovirt.org> on behalf of
> Chris Adams <cma at cmadams.net>
> Sent: Tuesday, June 6, 2017 7:21 AM
> To: users at ovirt.org
> Subject: Re: [ovirt-users] Seamless SAN HA failovers with oVirt?
>
> Once upon a time, Juan Pablo <pablo.localhost at gmail.com> said:
> > Chris, if you have active-active with multipath: you upgrade one system,
> > reboot it, check it came active again, then upgrade the other.
>
> Yes, but that's still not how a TrueNAS (and most other low- to
> mid-range SANs) works, so is not relevant. The TrueNAS only has a
> single active node talking to the hard drives at a time, because having
> two nodes talking to the same storage at the same time is a hard problem
> to solve (typically requires custom hardware with active cache coherency
> and such).
>
> You can (and should) use multipath between servers and a TrueNAS, and
> that protects against NIC, cable, and switch failures, but does not help
> with a controller failure/reboot/upgrade. Multipath is also used to
> provide better bandwidth sharing between links than ethernet LAGs.
>
> --
> Chris Adams <cma at cmadams.net>
> _______________________________________________
> Users mailing list
> Users at ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>