[ovirt-users] Automatically migrate VM between hosts in the same cluster
Alex Crow
acrow at integrafin.co.uk
Thu Sep 17 19:30:41 UTC 2015
I don't really think this is practical:
> - If the PSU failed, your UPS could alert you. If you have one...
If you have only one PSU in a host, a UPS is not going to stop you
losing all the VMs on that host. OK, if you had N+1 PSUs, you may be
able to monitor for this (IPMI/LOM/DRAC etc.) and use the API to put a
host into maintenance. Also a lot of people rely on low-cost white-box
servers and decide that it's OK if a single PSU in a host dies, as,
well, we have HA to start on other hosts. If they have N+1 PSUs in the
hosts do they really have to migrate everything off? Swings and
roundabouts really.
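
As a rough sketch of the N+1 case above, assuming something already polls
the PSU status over IPMI/LOM/DRAC: the decision itself is small. The
function name, the return strings and the deactivate_host callback are all
hypothetical here; the callback stands in for whatever API call puts the
host into maintenance, and nothing below is an existing oVirt feature.

```python
from typing import Callable, Sequence


def react_to_psu_state(psu_ok: Sequence[bool],
                       deactivate_host: Callable[[], None]) -> str:
    """Decide what to do given one health flag per PSU in a host.

    psu_ok          -- hypothetical input, e.g. parsed from IPMI sensor data
    deactivate_host -- hypothetical callback that asks the engine to move
                       the host to maintenance (not a real SDK call)
    """
    if not psu_ok:
        raise ValueError("no PSU readings supplied")

    working = sum(1 for ok in psu_ok if ok)

    if working == len(psu_ok):
        # Everything is fine, nothing to do.
        return "healthy"
    if working == 0:
        # Single-PSU host (or all PSUs gone): the host is already down,
        # so there is nothing left to migrate; HA has to restart the VMs.
        return "host-lost"

    # Redundancy is degraded but the host is still up, so there is time
    # to drain it. Whether that is worth doing is a policy choice.
    deactivate_host()
    return "maintenance-requested"
```

Note that the single-PSU case never reaches the callback, which is the
point made above: without redundancy there is no window to migrate in.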
I'm also not sure I've seen any practical DC setups where a UPS can
monitor the load for every single attached physical machine and figure
out that one of the redundant PSUs in it has failed - I'd love to know
if there are as that would be really cool.
> - If the machine is going down in an ordinary flow, surely it can be
> done.
Isn't that what "Maintenance mode" is for?
>
> Even if it was a network failure and the host was still up, how
> would you live migrate a VM from a host you can't even talk to?
>
>
> It could be suspended to disk (local) - if the disk is available.
> Then the decision if it is to be resumed from local disk or not (as it
> might be HA'ed and is running elsewhere) needs to be taken later, of
> course.
Yes, but that's not even remotely possible with oVirt right now. I was
trying to be practical as the OP has only just started using oVirt and I
think it might be a bit much to ask him to start coding up what he'd like.
>
>
> The only way you could do it was if you somehow magically knew far
> enough in advance that the host was about to fail (!) and that
> gave enough time to migrate the machines off. But how would you
> ever know that "machine quux.bar.net is
> going to fail in 7 minutes"?
>
>
> I completely agree there are situations in which you can't foresee the
> failure.
> But in many, you can. In those cases, it makes sense for the host to
> self-initiate 'move to maintenance' mode. The policy of what to do
> when 'self-moving-to-maintenance-mode' could be pre-fetched from the
> engine.
> Y.
Hmm, I would love that to be true. But I've seen so many so-called
"corner-cases" that I now think the failure area in a datacenter is a
fractal with infinite corners. Yes, you could monitor SMART on local
drives, pick up uncorrected ECC errors, use "sensors" to check for
sagging voltages or high temps, but I don't think you can ever hope to
catch everything, and you could end up doing a migration "storm" for .
I've had more than enough of "Enterprise Spec" switches suddenly going
nuts and spamming corrupt MACs all over the LAN to know you can't ever
account for everything.
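
If hosts did self-evacuate on SMART, ECC or sensor warnings, one way to
limit a migration storm would be to cap how many evacuations are allowed
in a given time window. A minimal sketch, assuming a central place where
every evacuation request is checked first; the class name, its parameters
and the numbers in the usage note are made up for illustration and do not
correspond to anything in oVirt.

```python
from collections import deque


class EvacuationGuard:
    """Allow at most max_evacuations host evacuations per sliding window."""

    def __init__(self, max_evacuations: int, window_seconds: float) -> None:
        self.max_evacuations = max_evacuations
        self.window_seconds = window_seconds
        # Timestamps (in seconds) of evacuations that were allowed,
        # oldest first.
        self._recent = deque()

    def allow(self, now: float) -> bool:
        """Return True if an evacuation requested at time `now` may start."""
        # Forget evacuations that have aged out of the window.
        while self._recent and now - self._recent[0] >= self.window_seconds:
            self._recent.popleft()

        if len(self._recent) >= self.max_evacuations:
            # Too many hosts are already draining: a shared cause (bad
            # switch, flaky sensor firmware) is more likely than a real
            # fault on each host, so refuse and leave it to a human.
            return False

        self._recent.append(now)
        return True
```

With EvacuationGuard(2, 600), a third host asking to evacuate within ten
minutes of the first two is refused. This only bounds the damage from
false alarms; it does nothing for the failures nobody thought to monitor.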
I think it's better to adopt the model of redundancy in software and
services, so no-one even notices if a VM host goes away, there's always
something else to take up the slack. Just like the origins of the
Internet - the network should be dumb and the applications should cope
with it! Any infrastructure that can't cope with the loss of a few VMs
for a few minutes probably needs a refresh.
Cheers
Alex