All,
After a power failure, followed by a generator failure, I lost my cluster, and the hosted engine
refused to restart after power was restored. I would expect the hosted engine to come back
online without much of a fight once storage is available. In practice, because the SPM went
down as well, there is no (clearly documented) way to clear the stale locks, and so no way to
recover the hosted engine or the cluster.
I have spent the last 12 hours trying to get a functional hosted engine back online on a
new node, and each attempt hits a new error. These range from the installer not understanding
that 16384 MB of dedicated VM memory, out of 192 GB free on the host, is indeed more than
4096 MB, to Ansible dying on an error like this: "Error while executing action: Cannot add
Storage Connection. Storage connection already exists."
The memory error referenced above shows up as:
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg":
"Available memory ( {'failed': False, 'changed': False,
'ansible_facts': {u'max_mem': u'180746'}}MB ) is less then the
minimal requirement (4096MB). Be aware that 512MB is reserved for the host and cannot be
allocated to the engine VM."}
This is what I typically get when I follow the steps in "CHAPTER 7. RECOVERING A SELF-HOSTED
ENGINE FROM AN EXISTING BACKUP" on the Red Hat Customer Portal.
I have tried this in numerous ways, and the cluster still remains in a bad state, with the
hosted engine 100% inoperable.
What I do have are the two hosts that are part of the cluster and can host the engine, plus
backups of the original hosted engine: both the disk and an engine-backup archive. I am not
sure what to try next to recover this cluster; any suggestions would be appreciated.
Regards,
Seann