Re: [ovirt-users] Automatically migrate VM between hosts in the same cluster

17 Sep 2015

There are PDUs that let you monitor power draw per port, and that would
kind of tell you if a PSU failed, as the load would be 0.
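
A rough sketch of that idea, assuming you already have per-outlet
wattage readings from the PDU (how you fetch them depends on the PDU and
is not shown). The outlet-to-host mapping, the function name and the 5 W
threshold are all made up for illustration:

```python
# Hypothetical sketch: flag hosts where one PSU feed draws ~0 W while
# another feed on the same host is still under load.
from collections import defaultdict

IDLE_WATTS = 5.0  # assumed threshold; below this an outlet counts as "dead"


def hosts_with_dead_psu(outlet_watts, outlet_to_host):
    """Return a sorted list of hosts with at least one dead and one live feed.

    outlet_watts:   {outlet_id: watts} as read from the PDU(s)
    outlet_to_host: {outlet_id: host_name} (assumed to be kept by hand)
    """
    per_host = defaultdict(list)
    for outlet, watts in outlet_watts.items():
        host = outlet_to_host.get(outlet)
        if host is not None:
            per_host[host].append(watts)

    suspects = []
    for host, draws in per_host.items():
        dead = any(w < IDLE_WATTS for w in draws)
        live = any(w >= IDLE_WATTS for w in draws)
        # All feeds at zero means the host is off, not that one PSU died.
        if dead and live:
            suspects.append(host)
    return sorted(suspects)
```

A PSU that is healthy but idling as a spare may also show a very low
draw, so treat a hit as a reason to look, not as proof of a failure.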

From: users-bounces@ovirt.org [mailto:users-bounces@ovirt.org] On Behalf Of Alex Crow
Sent: Thursday, September 17, 2015 12:31 PM
To: Yaniv Kaul <ykaul@redhat.com>
Cc: users@ovirt.org
Subject: Re: [ovirt-users] Automatically migrate VM between hosts in the same cluster

I don't really think this is practical:

> - If the PSU failed, your UPS could alert you. If you have one...

If you have only one PSU in a host, a UPS is not going to stop you
losing all the VMs on that host. OK, if you had N+1 PSUs, you may be
able to monitor for this (IPMI/LOM/DRAC etc.) and use the API to put a
host into maintenance. Also, a lot of people rely on low-cost white-box
servers and decide that it's OK if a single PSU in a host dies, as,
well, we have HA to start the VMs on other hosts. If they have N+1 PSUs
in the hosts, do they really have to migrate everything off? Swings and
roundabouts really.
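
To make the trade-off concrete, here is a minimal sketch of the decision
logic only. Reading the PSU states (IPMI/LOM/DRAC) and the actual
engine API call are not shown; `put_in_maintenance` is a placeholder
callable you would supply, and the action names and the
`migrate_on_lost_redundancy` switch are invented for this example:

```python
# Hypothetical sketch: decide what to do about a host from its PSU states.


def psu_action(psu_ok, migrate_on_lost_redundancy=True):
    """Map a list of per-PSU health flags to an action string.

    psu_ok: list of booleans, one per PSU, True meaning healthy.
    """
    healthy = sum(1 for ok in psu_ok if ok)
    if healthy == len(psu_ok):
        return "none"
    if healthy == 0:
        # Host is already down; only an HA restart elsewhere can help.
        return "rely-on-ha"
    # Some but not all PSUs failed: redundancy is gone, host still runs.
    return "maintenance" if migrate_on_lost_redundancy else "alert-only"


def handle_host(host, psu_ok, put_in_maintenance, **policy):
    """Call put_in_maintenance(host) only when the policy says so."""
    action = psu_action(psu_ok, **policy)
    if action == "maintenance":
        put_in_maintenance(host)
    return action
```

With a single PSU the only possible outcomes are "none" and
"rely-on-ha", which is the white-box case above: there is nothing to
migrate in time, so HA does the work.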

I'm also not sure I've seen any practical DC setups where a UPS can
monitor the load for every single attached physical machine and figure
out that one of the redundant PSUs in it has failed - I'd love to know
if there are, as that would be really cool.

> - If the machine is going down in an ordinary flow, surely it can be
> done.

Isn't that what "Maintenance mode" is for?

> > Even if it was a network failure and the host was still up, how would
> > you live migrate a VM from a host you can't even talk to?

> It could be suspended to disk (local) - if the disk is available.
>
> Then the decision whether it is to be resumed from local disk or not
> (as it might be HA'ed and running elsewhere) needs to be taken later,
> of course.

Yes, but that's not even remotely possible with oVirt right now. I was
trying to be practical, as the OP has only just started using oVirt and I
think it might be a bit much to ask him to start coding up what he'd
like.

> > The only way you could do it was if you somehow magically knew far
> > enough in advance that the host was about to fail (!) and that gave
> > enough time to migrate the machines off. But how would you ever know
> > that "machine quux.bar.net is going to fail in 7 minutes"?

> I completely agree there are situations in which you can't foresee the
> failure.
>
> But in many, you can. In those cases, it makes sense for the host to
> self-initiate 'move to maintenance' mode. The policy of what to do when
> 'self-moving-to-maintenance-mode' could be pre-fetched from the engine.
>
> Y.

Hmm, I would love that to be true. But I've seen so many so-called
"corner-cases" that I now think the failure area in a datacenter is a
fractal with infinite corners. Yes, you could monitor SMART on local
drives, pick up uncorrected ECC errors, use "sensors" to check for
sagging voltages or high temps, but I don't think you can ever hope to
catch everything, and you could end up doing a migration "storm" for .
I've had more than enough of "Enterprise Spec" switches suddenly going
nuts and spamming corrupt MACs all over the LAN to know you can't ever
account for everything.
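
One way to keep health-triggered evacuation from turning into a storm is
to cap how many hosts may be in maintenance at once. This is only a
sketch of that cap; the function name, the 25% default and the inputs
are assumptions for illustration, not anything oVirt provides:

```python
# Hypothetical sketch: limit how many hosts get evacuated at the same time.


def hosts_to_evacuate(unhealthy, already_in_maintenance, total_hosts,
                      max_fraction=0.25):
    """Pick unhealthy hosts to evacuate without exceeding a cluster-wide cap.

    unhealthy:              ordered list of host names flagged by monitoring
    already_in_maintenance: set of host names currently in maintenance
    total_hosts:            number of hosts in the cluster
    max_fraction:           assumed cap on hosts in maintenance at once
    """
    budget = int(total_hosts * max_fraction) - len(already_in_maintenance)
    candidates = [h for h in unhealthy if h not in already_in_maintenance]
    # A negative budget means the cap is already exceeded: evacuate nothing.
    return candidates[:max(budget, 0)]
```

If a bad sensor or a misbehaving switch flags half the cluster at once,
the cap leaves most hosts alone and you get an alert instead of a mass
migration.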

I think it's better to adopt the model of redundancy in software and
services, so no-one even notices if a VM host goes away; there's always
something else to take up the slack. Just like the origins of the
Internet - the network should be dumb and the applications should cope
with it! Any infrastructure that can't cope with the loss of a few VMs
for a few minutes probably needs a refresh.

Cheers

Alex


Re: [ovirt-users] Automatically migrate VM between hosts in the same cluster

matthew lagoe