
There are PDUs that let you monitor power draw per port, and that would kind of tell you if a PSU failed, as the load would be 0.

From: users-bounces@ovirt.org [mailto:users-bounces@ovirt.org] On Behalf Of Alex Crow
Sent: Thursday, September 17, 2015 12:31 PM
To: Yaniv Kaul <ykaul@redhat.com>
Cc: users@ovirt.org
Subject: Re: [ovirt-users] Automatically migrate VM between hosts in the same cluster

I don't really think this is practical:

> - If the PSU failed, your UPS could alert you. If you have one...

If you have only one PSU in a host, a UPS is not going to stop you losing all the VMs on that host. OK, if you had N+1 PSUs, you may be able to monitor for this (IPMI/LOM/DRAC etc.) and use the API to put a host into maintenance. Also, a lot of people rely on low-cost white-box servers and decide that it's OK if a single PSU in a host dies, as, well, we have HA to start the VMs on other hosts. If they have N+1 PSUs in the hosts, do they really have to migrate everything off? Swings and roundabouts really.

I'm also not sure I've seen any practical DC setups where a UPS can monitor the load for every single attached physical machine and figure out that one of the redundant PSUs in it has failed - I'd love to know if there are, as that would be really cool.

> - If the machine is going down in an ordinary flow, surely it can be done.

Isn't that what "Maintenance mode" is for?

>> Even if it was a network failure and the host was still up, how would you live migrate a VM from a host you can't even talk to?
>
> It could be suspended to disk (local) - if the disk is available.
> Then the decision whether it is to be resumed from local disk or not (as it might be HA'ed and running elsewhere) needs to be taken later, of course.

Yes, but that's not even remotely possible with oVirt right now. I was trying to be practical, as the OP has only just started using oVirt and I think it might be a bit much to ask him to start coding up what he'd like.

>> The only way you could do it was if you somehow magically knew far enough in advance that the host was about to fail (!) and that gave enough time to migrate the machines off. But how would you ever know that "machine quux.bar.net is going to fail in 7 minutes"?
>
> I completely agree there are situations in which you can't foresee the failure.
> But in many, you can. In those cases, it makes sense for the host to self-initiate 'move to maintenance' mode. The policy of what to do when 'self-moving-to-maintenance-mode' could be pre-fetched from the engine.
> Y.

Hmm, I would love that to be true. But I've seen so many so-called "corner-cases" that I now think the failure area in a datacenter is a fractal with infinite corners. Yes, you could monitor SMART on local drives, pick up uncorrected ECC errors, use "sensors" to check for sagging voltages or high temps, but I don't think you can ever hope to catch everything, and you could end up doing a migration "storm" for ... I've had more than enough of "Enterprise Spec" switches suddenly going nuts and spamming corrupt MACs all over the LAN to know you can't ever account for everything.

I think it's better to adopt the model of redundancy in software and services, so no-one even notices if a VM host goes away; there's always something else to take up the slack. Just like the origins of the Internet - the network should be dumb and the applications should cope with it! Any infrastructure that can't cope with the loss of a few VMs for a few minutes probably needs a refresh.

Cheers

Alex
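
On the per-port PDU idea at the top of this message, here is a rough sketch of how the "load would be 0" check might look. It is only an illustration: the outlet names, the host-to-outlet mapping and the 1 W threshold are all made up, and how you actually read the per-outlet draw (SNMP, a vendor API, etc.) depends on the PDU and is not shown here.

```python
from typing import Dict, List

# Hypothetical threshold: an outlet drawing less than this counts as "no load".
NO_LOAD_WATTS = 1.0


def classify_hosts(
    outlet_watts: Dict[str, float],
    host_outlets: Dict[str, List[str]],
    no_load_watts: float = NO_LOAD_WATTS,
) -> Dict[str, str]:
    """Label each host 'ok', 'psu-suspect' or 'no-power' from per-outlet draw.

    outlet_watts: latest reading per PDU outlet (assumed already collected).
    host_outlets: which outlets feed which host (one entry per PSU).
    """
    status: Dict[str, str] = {}
    for host, outlets in host_outlets.items():
        # A missing reading is treated the same as 0 W.
        idle = [o for o in outlets if outlet_watts.get(o, 0.0) < no_load_watts]
        if not idle:
            status[host] = "ok"
        elif len(idle) == len(outlets):
            # Every feed is idle: the host is off or has lost all power.
            status[host] = "no-power"
        else:
            # Some feeds carry load and others do not: one PSU may have failed.
            status[host] = "psu-suspect"
    return status


# Example with invented readings: host2 has one idle feed out of two.
readings = {"A1": 180.0, "B1": 175.0, "A2": 350.0, "B2": 0.0, "A3": 0.0}
hosts = {"host1": ["A1", "B1"], "host2": ["A2", "B2"], "host3": ["A3"]}
result = classify_hosts(readings, hosts)
```

A 'psu-suspect' result only means something for hosts with redundant PSUs; with a single PSU the first thing you would see is 'no-power', by which point the VMs are already gone, which is Alex's point above.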