Disk corruption on hosted engine

Dear All,

So we've had some major disk corruption on our hosted engine (hardware to blame), and we have taken backups. However, the hosted-engine VM will no longer boot at all, database is thoroughly corrupted, and we need to rebuild the thing. Just a sanity check on the best route:

Preamble:
- VMs are still running fine - only hosted engine affected
- VMs are distributed across our entire 3 node cluster
- All 3 nodes are registered as hosted engine candidates

1. Do another hosted-engine --deploy on one of the existing hosts, and then restore the backup into that
2. Build a new host, deploy the hosted-engine, then restore a backup on a fresh node

Regards,
Callum

--
Callum Smith
Research Computing Core
Wellcome Trust Centre for Human Genetics
University of Oxford
e. callum@well.ox.ac.uk

Curious question: How did the hardware corrupt the HE?
_______________________________________________ Users mailing list -- users@ovirt.org To unsubscribe send an email to users-leave@ovirt.org Privacy Statement: https://www.ovirt.org/site/privacy-policy/ oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/VYQM7WWIBP7Z7F...

A memory error on the storage array caused postgres to become corrupted. Attempting to restore from backup before the DIMM was replaced was ill-advised, and now the whole HE is trash. We're having a long talk with the vendor of the affected piece of kit.

Regards,
Callum

Hi,

since 4.2.7 you can use

hosted-engine --deploy --restore-from-file=backup.tar.gz

and the deployment will restore your backup on the fly.

Technically you can also use one of the existing hosts with running VMs, but if you want to be on the safe side and you have a spare host, I'd suggest using that one. You will be asked to create a new SD (storage domain) for the new engine VM; the previous HE SD will still be visible in the engine in case you have to migrate other disks stored there.
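For anyone following the restore route described above, here is a minimal sketch of a pre-flight check. The only things taken from the thread are the 4.2.7 minimum version and the hosted-engine --deploy --restore-from-file option; the function names, the example version string and the backup path are hypothetical, and the script only prints the command rather than running it. It assumes GNU sort (for -V), as found on a typical EL host.

```shell
#!/bin/sh
# Sketch only: decide whether the --restore-from-file route is available.
# MIN_VERSION is the version quoted in the thread; all other names are illustrative.

MIN_VERSION="4.2.7"

# version_ge A B: succeeds when version A >= version B (uses GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# restore_command VERSION BACKUP_FILE: print the deploy command for that backup,
# or fail with a message on stderr if VERSION predates the restore option.
restore_command() {
    if version_ge "$1" "$MIN_VERSION"; then
        printf 'hosted-engine --deploy --restore-from-file=%s\n' "$2"
    else
        printf 'version %s is older than %s: --restore-from-file unavailable\n' \
            "$1" "$MIN_VERSION" >&2
        return 1
    fi
}

# Dry run with a hypothetical version string and backup path.
restore_command "4.2.7" "/root/engine-backup.tar.gz"
```

The version string is passed in by hand here; how you obtain it on your hosts is left open. Printing instead of executing keeps the check safe to run on a host that still carries production VMs.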

Dear Simone,

Thanks for the response. Unfortunately I didn't get the install up to 4.2.7 in time for this event, so we might go down the spare host route just to be triple safe, pending the arrival of some networking kit.

Regards,
Callum
participants (3)
- Callum Smith
- femi adegoke
- Simone Tiraboschi