<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Jun 6, 2017 at 1:45 PM, Matthew Trent <span dir="ltr"><<a href="mailto:Matthew.Trent@lewiscountywa.gov" target="_blank">Matthew.Trent@lewiscountywa.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Thanks for the replies, all!<br>
<br>
Yep, Chris is right. TrueNAS HA is active/passive and there isn't a way around that when failing between heads.<br></blockquote><div><br></div><div>General comment: 30 seconds is A LOT. Many application-level IOs might time out. Most storage systems strive to stay below that.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
Sven: In my experience with iX support, they have directed me to reboot the active node to initiate failover. There are "hactl takeover" and "hactl giveback" commands, but rebooting seems to be their preferred method.<br>
<br>
VMs going into a paused state and resuming when storage is back online sounds great. As long as oVirt's pause/resume isn't significantly slower than the 30-or-so seconds the TrueNAS takes to complete its failover, that's a pretty tolerable interruption for my needs. So my next questions are:<br>
<br>
1) Assuming the SAN failover DOES work correctly, can anyone comment on their experience with oVirt pausing/thawing VMs in an NFS-based active/passive SAN failover scenario? Does it work reliably without intervention? Is it reasonably fast?<br></blockquote><div><br></div><div>oVirt is not the one pausing VMs. qemu-kvm pauses the specific VM that issued an IO which is now stuck. The reason is that the VM cannot reliably continue without risking data loss (the data is in flight somewhere, right? Host kernel, NIC buffers, etc.).</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
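</blockquote><div>To illustrate the per-VM point, here is a toy model (not qemu's actual code; the function names, VM names and timings are all hypothetical, and resuming exactly when storage returns is an assumption): a VM is paused only if it issues an IO while storage is unavailable, so a VM that does no IO during the outage keeps running.</div>
<pre>
```python
from typing import Dict, List, Optional, Tuple

# (pause_time, resume_time) in seconds, or None if the VM never pauses.
Window = Optional[Tuple[float, float]]


def pause_window(io_times: List[float], outage_start: float,
                 outage_end: float) -> Window:
    """Toy rule for one VM: it pauses at its first IO issued while storage
    is unavailable and (by assumption) resumes when storage returns."""
    stuck = [t for t in io_times if outage_start <= t < outage_end]
    if not stuck:
        return None  # no IO hit the outage, so the VM keeps running
    return (min(stuck), outage_end)


def simulate(vms: Dict[str, List[float]], outage_start: float,
             outage_end: float) -> Dict[str, Window]:
    """Apply the toy rule to each VM independently: pausing is per VM."""
    return {name: pause_window(times, outage_start, outage_end)
            for name, times in vms.items()}


if __name__ == "__main__":
    # Hypothetical VMs with the times (s) at which they issue disk IO,
    # and a hypothetical 30-second head failover from t=100 to t=130.
    vms = {"db": [95.0, 101.0, 140.0], "idle": [50.0, 200.0]}
    for name, window in simulate(vms, 100.0, 130.0).items():
        print(name, window)
```
</pre>
<div>Here "db" is paused from its first stuck IO (t=101) until storage is back (t=130), while "idle" never pauses at all.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">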
<br>
2) Is there anything else in the oVirt stack that might cause it to "freak out" rather than gracefully pause/unpause VMs?<br></blockquote><div><br></div><div>We do monitor storage domain health regularly. We are working on ignoring short hiccups (see <a href="https://bugzilla.redhat.com/show_bug.cgi?id=1459370">https://bugzilla.redhat.com/show_bug.cgi?id=1459370</a> for example).</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
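</blockquote><div>Roughly, "ignoring short hiccups" means a debounce like the following. This is a hypothetical sketch, not the vdsm implementation; the check format and the grace_s parameter are made up for illustration. A domain is flagged only once its checks have failed continuously for at least a grace period.</div>
<pre>
```python
from typing import Iterable, Optional, Tuple


def flag_time(checks: Iterable[Tuple[float, bool]],
              grace_s: float) -> Optional[float]:
    """Return when a storage domain would be flagged, or None.

    checks: (timestamp, ok) monitoring results in time order.
    grace_s: hypothetical grace period; a failure streak is reported only
    once it has lasted at least this long, so shorter hiccups are ignored.
    """
    streak_start: Optional[float] = None
    for t, ok in checks:
        if ok:
            streak_start = None  # storage answered again: drop the streak
            continue
        if streak_start is None:
            streak_start = t  # first failed check of a new streak
        if t - streak_start >= grace_s:
            return t
    return None


if __name__ == "__main__":
    # Hypothetical checks every 10 s with a 30 s grace period.
    short = [(90, True), (100, False), (110, False), (120, False), (130, True)]
    sustained = [(90, True), (100, False), (110, False), (120, False),
                 (130, False), (140, True)]
    print(flag_time(short, 30))      # failed checks span 20 s → None
    print(flag_time(sustained, 30))  # failed checks span 30 s → 130
```
</pre>
<div>The grace period trades detection latency for fewer false alarms during a short failover.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">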
<br>
2a) Particularly: I'm running hosted engine on the same TrueNAS storage. Does that change anything WRT timeouts and oVirt's HA and fencing and sanlock and such?<br>
<br>
2b) Is there a limit to how long oVirt will wait for storage before doing something more drastic than just pausing VMs?<br></blockquote><div><br></div><div>As explained above, generally, no. We can't do much tbh, and we'd like to ensure there is no data loss.</div><div>That being said, in extreme cases hosts may become unresponsive; if you have fencing configured, they may even be fenced (there's an option to fence a host that cannot renew its storage lease). We have not seen that happen for quite some time, though, and I don't expect short storage hiccups to cause it.</div><div>Depending on your application, fencing may be the right thing to do, btw.</div><div>Y.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
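</blockquote><div>As a rough illustration of why a short hiccup should not cost a host its storage lease, here is a toy model. The renewal interval and expiry values are placeholders, not sanlock's real defaults, and the function is hypothetical: the host attempts a renewal periodically, each attempt needs working storage, and the lease is lost only if no renewal succeeds within the expiry window.</div>
<pre>
```python
from typing import Optional


def lease_expiry(renew_interval: int, expiry_s: int, outage_start: int,
                 outage_end: int, horizon: int) -> Optional[int]:
    """Return the time (s) at which a toy storage lease expires, or None.

    Toy model: a renewal is attempted every renew_interval seconds from
    t=0; an attempt succeeds unless storage is down (outage_start to
    outage_end, end exclusive). The lease stays valid for expiry_s seconds
    after the last successful renewal.
    """
    last_ok = 0  # assume the lease was acquired at t=0
    for t in range(0, horizon + 1, renew_interval):
        if t - last_ok > expiry_s:
            return last_ok + expiry_s  # expired before this attempt
        if not (outage_start <= t < outage_end):
            last_ok = t  # storage reachable: renewal succeeds
    if horizon - last_ok > expiry_s:
        return last_ok + expiry_s
    return None


if __name__ == "__main__":
    # Placeholder timings: renew every 20 s, lease valid for 80 s.
    print(lease_expiry(20, 80, 100, 130, 300))  # 30 s outage → None
    print(lease_expiry(20, 80, 100, 200, 300))  # 100 s outage → 160
```
</pre>
<div>With these placeholders a 30-second failover is absorbed, while a 100-second outage loses the lease at t=160, which is the point where fencing could come into play.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">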
<span class="gmail-"><br>
--<br>
Matthew Trent<br>
Network Engineer<br>
Lewis County IT Services<br>
</span><a href="tel:360.740.1247" value="+13607401247">360.740.1247</a> - Helpdesk<br>
<a href="tel:360.740.3343" value="+13607403343">360.740.3343</a> - Direct line<br>
<br>
______________________________<wbr>__________<br>
From: <a href="mailto:users-bounces@ovirt.org">users-bounces@ovirt.org</a> <<a href="mailto:users-bounces@ovirt.org">users-bounces@ovirt.org</a>> on behalf of Chris Adams <<a href="mailto:cma@cmadams.net">cma@cmadams.net</a>><br>
Sent: Tuesday, June 6, 2017 7:21 AM<br>
To: <a href="mailto:users@ovirt.org">users@ovirt.org</a><br>
Subject: Re: [ovirt-users] Seamless SAN HA failovers with oVirt?<br>
<div class="gmail-HOEnZb"><div class="gmail-h5"><br>
Once upon a time, Juan Pablo <<a href="mailto:pablo.localhost@gmail.com">pablo.localhost@gmail.com</a>> said:<br>
> Chris, if you have active-active with multipath: you upgrade one system,<br>
> reboot it, check it came active again, then upgrade the other.<br>
<br>
Yes, but that's still not how a TrueNAS (and most other low- to<br>
mid-range SANs) works, so it is not relevant. The TrueNAS only has a<br>
single active node talking to the hard drives at a time, because having<br>
two nodes talking to the same storage at the same time is a hard problem<br>
to solve (typically requires custom hardware with active cache coherency<br>
and such).<br>
<br>
You can (and should) use multipath between servers and a TrueNAS, and<br>
that protects against NIC, cable, and switch failures, but does not help<br>
with a controller failure/reboot/upgrade. Multipath is also used to<br>
provide better bandwidth sharing between links than ethernet LAGs.<br>
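<br>
A toy model of that distinction (hypothetical path and component names, not TrueNAS or multipath configuration): multipath can route around any failed NIC, cable or switch, but every path ends at the one active controller, so while that controller is rebooting or failing over there is no usable path at all.<br>
<pre>
```python
from typing import List, Set, Tuple

# Hypothetical topology: two paths from one host, each through its own
# NIC, cable and switch. Both end on whichever controller is active.
Path = Tuple[str, str, str]
PATHS: List[Path] = [("nic0", "cable0", "switchA"),
                     ("nic1", "cable1", "switchB")]


def usable_paths(failed: Set[str], controller_up: bool) -> List[Path]:
    """Return the paths that can still carry IO in this toy model."""
    if not controller_up:
        # Every path terminates on the single active controller, so while
        # it reboots or fails over, multipath has nothing to fall back to.
        return []
    # Multipath can use any path none of whose components has failed.
    return [p for p in PATHS if failed.isdisjoint(p)]


if __name__ == "__main__":
    print(usable_paths({"switchA"}, controller_up=True))  # one path left
    print(usable_paths(set(), controller_up=False))       # no path at all
```
</pre>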
<br>
--<br>
Chris Adams <<a href="mailto:cma@cmadams.net">cma@cmadams.net</a>><br>
______________________________<wbr>_________________<br>
Users mailing list<br>
<a href="mailto:Users@ovirt.org">Users@ovirt.org</a><br>
<a href="http://lists.ovirt.org/mailman/listinfo/users" rel="noreferrer" target="_blank">http://lists.ovirt.org/<wbr>mailman/listinfo/users</a><br>
</div></div></blockquote></div><br></div></div>