Hi!
As you pointed out, it wouldn't be safe if oVirt only tried to communicate with the host.
This is why oVirt (and other cluster software) includes a way to power cycle hosts. If a host
is not responding, for any reason (software or hardware failure), it will be fenced (powered
off and on again) by another host in the cluster before the cluster migrates its VMs.
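To illustrate the ordering described above (fence first, move the VMs only afterwards), here
is a minimal Python sketch. It is not oVirt code: the Host class, the recover_vms function and
the fence callback are hypothetical names made up for this example, and the real engine logic
is certainly more involved.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Host:
    """Hypothetical stand-in for a cluster host (not an oVirt class)."""
    name: str
    responding: bool = True
    vms: List[str] = field(default_factory=list)


def recover_vms(failed: Host, target: Host, fence: Callable[[Host], bool]) -> List[str]:
    """Move the failed host's VMs to target, but only after fencing succeeds.

    fence is assumed to return True only when the power cycle was confirmed.
    """
    if failed.responding:
        return []  # host is healthy, nothing to recover
    if not fence(failed):
        # Power-off could not be confirmed: the VMs may still be running,
        # so starting them elsewhere could run the same VM twice.
        return []
    moved = list(failed.vms)
    target.vms.extend(moved)
    failed.vms.clear()
    return moved
```

The point of the sketch is only the order of the checks: if fencing cannot be confirmed,
nothing is restarted, which is exactly the situation where a manual confirmation is needed.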
You need some special hardware to manage power cycling, and you need to configure it in the
"Power Management" section of the host parameters.
On the HP servers we use, we can configure iLO as the power management method. This way, oVirt
can control the power of the hosts just as if we used the server's web interface to power
cycle it. It has worked very well for us (since oVirt 3.4 or 3.5) and does not require any
additional hardware, but I think it will not work if a hardware failure affects the iLO
hardware itself (I don't know whether that can happen).
You can also add dedicated hardware to manage the power supplies. I used to buy the APC
MasterSwitch (replaced by the APC AP7900) for this purpose, but this requires additional
hardware and is not as convenient as iLO (IMHO).
In both cases, you have to make sure that your servers, your switches, and your power
management hardware always have power: use dual power supplies connected to different power
lines, different UPSes, and so on. If a single power failure can turn off both your server
and your power cycling mechanism, then oVirt won't be able to recover.
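The rule above can be checked on paper for each host. Here is a small Python sketch of that
check; the function names and the feed labels ("ups-a", "ups-b") are hypothetical, and it
assumes a device stays up as long as at least one of its power feeds is alive.

```python
from typing import List, Set


def is_off(feeds: Set[str], failed_feed: str) -> bool:
    """A device loses power only when every feed it is plugged into has failed."""
    return feeds <= {failed_feed}


def risky_feeds(host_feeds: Set[str], fence_feeds: Set[str]) -> List[str]:
    """Return the feeds whose failure alone turns off both the host and its fence device."""
    all_feeds = sorted(host_feeds | fence_feeds)
    return [f for f in all_feeds if is_off(host_feeds, f) and is_off(fence_feeds, f)]
```

For example, risky_feeds({"ups-a"}, {"ups-a"}) reports "ups-a" as a single point of failure,
while a host and a fence device that are both plugged into "ups-a" and "ups-b" give an empty
list.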
----- Original Message -----
From: "Kostyrev Aleksandr" <kostyrev(a)tutu.ru>
To: users(a)ovirt.org
Sent: Sunday, 18 January 2015 11:17:27
Subject: [ovirt-users] ovirt 3.5 and power outages
Good day, everybody!
I've got a three-node cluster with power management enabled.
As far as I understand, to restart VMs on another host in the cluster
when a host has suffered a power outage,
the engine has to be able to connect to the host (specifically to vdsm) to
be sure that the host has been rebooted and is not running any VMs.
But what if I'm running a lot of VMs on the host, it's 3 o'clock in
the morning, and
1) the engine has rebooted the host but the host cannot boot because of some
hardware problem, or a new kernel gives a kernel panic?
2) the host's motherboard has burned out and it cannot boot?
Then the engine will never connect to the host, and therefore all the VMs that
were running on that host won't migrate to another node in the cluster.
So my cluster is useless in that case, because I'm not there to press
"confirm host has been rebooted".
--
Best regards,
Kostyrev Aleksandr,
system administrator