[ovirt-users] Seamless SAN HA failovers with oVirt?

Dan Yasny dyasny at gmail.com
Mon Jun 5 23:55:41 UTC 2017


As soon as your NAS goes down, the qemu processes running the VMs will start
getting EIO errors and the VMs will pause, so as not to lose any data. If the
NAS upgrade isn't a very long procedure, you might as well complete the
updates, bring the NAS back online, and unpause the VMs.
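
For a rough feel of what the NFS mount options quoted below
(proto=tcp,timeo=600,retrans=6) mean in wall-clock time, here is a small
sketch. It assumes the behaviour described in nfs(5): timeo is given in
tenths of a second, and over TCP the client backs off linearly, adding timeo
after each retransmission up to a 600 second ceiling. The function names are
made up for illustration, the model is simplified, and it says nothing about
VDSM or sanlock timeouts, which are separate.

```python
def nfs_retry_waits(timeo_ds, retrans, cap_s=600.0):
    """Seconds the client waits before each retry (simplified model).

    timeo_ds is the mount option value in deciseconds; the wait grows by
    one timeo per retransmission and is capped at cap_s (assumed 600 s).
    """
    base_s = timeo_ds / 10.0
    return [min(base_s * (i + 1), cap_s) for i in range(retrans)]


def pause_fits_first_timeout(pause_s, timeo_ds):
    """True if an I/O pause ends before the first NFS retry would fire."""
    return pause_s < timeo_ds / 10.0


if __name__ == "__main__":
    print(nfs_retry_waits(600, 6))
    for pause_s in (30, 60):
        print(pause_s, pause_fits_first_timeout(pause_s, 600))
```

Under these assumptions the first retry is only due after 60 seconds, so a
30-ish second failover would end before the NFS client retransmits anything.
Whether the rest of the stack is equally patient is a different question.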

On Mon, Jun 5, 2017 at 5:47 PM, Matthew Trent <
Matthew.Trent at lewiscountywa.gov> wrote:

> I'm using two TrueNAS HA SANs (FreeBSD-based ZFS) to provide storage via
> NFS to 7 oVirt boxes and about 25 VMs.
>
> For SAN system upgrades I've always scheduled a maintenance window, shut
> down all the oVirt stuff, upgraded the SANs, and spun everything back up.
> It's pretty disruptive, but I assumed that was the thing to do.
>
> However, in talking with the TrueNAS vendor they said the majority of
> their customers are using VMware and they almost always do TrueNAS updates
> in production. They just upgrade one head of the TrueNAS HA pair then
> failover to the other head and upgrade it too. There's a 30-ish second
> pause in I/O while the disk arrays are taken over by the other HA head, but
> VMware just tolerates it and continues without skipping a beat. They say
> this is standard procedure in the SAN world and virtualization systems
> should tolerate 30-60 seconds of I/O pause for HA failovers seamlessly.
>
> It sounds great to me, but I wanted to pick this list's brain -- is anyone
> doing this with oVirt? Are you able to failover your HA SAN with 30-60
> seconds of no I/O without oVirt freaking out?
>
> If not, are there any tunables relating to this? I see the default NFS
> mount options look fairly tolerant (proto=tcp,timeo=600,retrans=6), but
> are there VDSM or sanlock or some other oVirt timeouts that will kick in
> and start putting storage domains into error states, fencing hosts or
> something before that? I've never timed anything, but my recollection is
> that the oVirt hosted engine started showing errors almost immediately
> when we've had SAN issues in the past.
>
> Thanks!
>
> --
> Matthew Trent
> Network Engineer
> Lewis County IT Services
> _______________________________________________
> Users mailing list
> Users at ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>