Hi all,
We've got a three-node "hyper-converged" oVirt 4.1.2 + GlusterFS cluster
on brand new hardware. It's not quite in production yet but, as these
things always go, we already have some important VMs on it.
Last night the servers (which aren't yet on UPS) suffered a brief power
failure. They all booted up cleanly and the hosted engine started up ~10
minutes afterwards (presumably once the engine GlusterFS volume was
sufficiently healed and the HA stack realised). So far so good.
As soon as the HostedEngine started up, it tried to start all our Highly
Available VMs. Unfortunately our master storage domain was still
inactive, presumably because GlusterFS was still healing it.
About 10 minutes later the master domain was activated and
"reconstructed" and an SPM was selected, but oVirt had tried and failed
to start all the HA VMs already and didn't bother trying again.
All the VMs started just fine this morning once we realised what had
happened and logged in to oVirt to start them.
Is this known and/or expected behaviour? Can we do anything to delay
starting HA VMs until the storage domains are available? Can we get
oVirt to keep retrying HA VMs that fail to start?
Is there a bug for this already or should I be raising one?
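In case it helps frame the discussion, here's a rough sketch of the kind
of external retry wrapper we could script as a stopgap. All names and
timings here are made up, and the two callables are placeholders for
whatever real checks one would do (e.g. via the oVirt REST API or SDK):

```python
import time

def wait_then_start(domain_is_active, start_vm, vm_names,
                    poll_secs=30, max_polls=40):
    """Poll until the master storage domain reports active, then try to
    start each HA VM, retrying any that fail.

    `domain_is_active` and `start_vm` are caller-supplied callables
    (hypothetical stand-ins for real oVirt API calls); `start_vm`
    returns True once the VM has started.
    """
    # Wait for the master storage domain before touching any VMs.
    for _ in range(max_polls):
        if domain_is_active():
            break
        time.sleep(poll_secs)
    else:
        raise RuntimeError("master storage domain never became active")

    # Keep retrying the VMs that failed to start.
    pending = list(vm_names)
    while pending:
        pending = [vm for vm in pending if not start_vm(vm)]
        if pending:
            time.sleep(poll_secs)
```

Obviously this is no substitute for oVirt itself doing the retrying,
which is what I'm really asking about above.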
Thanks,
Chris
--
Chris Boot
bootc(a)bootc.net