A memory error on the storage array caused Postgres to become corrupted. Attempting to restore from backup before the DIMM was replaced was ill-advised, and now the whole HE is trash.

We're having a long talk with the vendor of the affected piece of kit.

Regards,
Callum

--

Callum Smith
Research Computing Core
Wellcome Trust Centre for Human Genetics
University of Oxford
e. callum@well.ox.ac.uk

On 17 Dec 2018, at 15:45, femi adegoke <ovirt@fateknollogee.com> wrote:

Curious question: How did the hardware corrupt the HE?

On Dec 17 2018, at 5:02 am, Callum Smith <callum@well.ox.ac.uk> wrote:
Dear All,

So we've had some major disk corruption on our hosted engine (hardware to blame), and we have taken backups. However, the hosted-engine VM will no longer boot at all, the database is thoroughly corrupted, and we need to rebuild the thing. Just a sanity check on the best route:

Preamble:
VMs are still running fine - only the hosted engine is affected
VMs are distributed across our entire 3-node cluster
All 3 nodes are registered as hosted engine candidates

1. Do another hosted-engine --deploy on one of the existing hosts, and then restore the backup into that
2. Build a new host, deploy the hosted engine on it, then restore the backup there

Regards,
Callum

_______________________________________________
Users mailing list -- users@ovirt.org