On Tue, Mar 27, 2018 at 9:14 PM, Sven Achtelik <Sven.Achtelik(a)eps.aero> wrote:
Hi All,
I’m still facing issues with my HE engine. Here are the steps that I took to
end up in this situation:
- Update Engine from 4.1.7 to 4.1.9
o That worked as expected
- Automatic Backup of Engine DB in the night
- Upgraded Engine from 4.1.9 to 4.2.1
o That worked fine
- Noticed Issues with the HA support for HE
o Cause was not having the latest ovirt-ha agent/broker version on hosts
- After updating the first host with the latest packages for the
Agent/Broker, the engine was started twice
o As a result the Engine VM Disk was corrupted and there is no Backup of
the Disk
o There is also no Backup of the Engine DB with version 4.2
- VM disk was repaired with fsck.ext4, but DB is corrupt
o Can’t restore the Engine DB because the Backup DB is from Engine v4.1
- Rolled back all changes on Engine VM to 4.1.9 and imported Backup
o Checked for HA VMs to set as disabled and started the Engine
- Login is fine but the Engine is having trouble picking up any
information from the Hosts
o No information on running VMs or hosts status
- Final Situation
o 2 Hosts have VMs still running and I can’t stop those
o I still have the image of my corrupted Engine VM (v4.2)
Since there were no major changes after upgrading from 4.1 to 4.2, would it
be possible to manually restore the 4.1 DB to the 4.2 Engine VM to get this up
and running again or are there modifications made to the DB on upgrading
that are relevant for this ?
engine-backup requires restoring to the same version used to take the backup,
with a single exception - on 4.0, it can restore 3.6.
It's very easy to patch it to allow also 4.1->4.2, search inside it for
"VALID_BACKUP_RESTORE_PAIRS". However, I do not think anyone ever tested
this, so I have no idea what might break. In 3.6->4.0 days, we did have to fix a few
other things, notably apache httpd and iptables->firewalld:
https://bugzilla.redhat.com/show_bug.cgi?id=1318580
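
To make the version rule above concrete, here is a minimal sketch in Python. It is not engine-backup's actual code: the names ALLOWED_PAIRS, major_minor and can_restore are made up for illustration, and comparing only major.minor is an assumption. The only cross-version pair taken from this thread is 3.6 -> 4.0; the last call shows what the untested 4.1 -> 4.2 patch would amount to.

```python
# Sketch of the restore rule described above; NOT engine-backup's real logic.

# Hypothetical representation of the allowed cross-version restores.
# The only pair stated in this thread is "a 3.6 backup can be restored on 4.0".
ALLOWED_PAIRS = {("3.6", "4.0")}


def major_minor(version):
    """Reduce a version such as '4.1.9' to '4.1' (assumed granularity)."""
    return ".".join(version.split(".")[:2])


def can_restore(backup_version, engine_version, allowed_pairs=ALLOWED_PAIRS):
    """Return True if a backup taken on backup_version may be restored on engine_version."""
    src = major_minor(backup_version)
    dst = major_minor(engine_version)
    return src == dst or (src, dst) in allowed_pairs


print(can_restore("4.1.9", "4.1.9"))  # same version
print(can_restore("4.1.9", "4.2.1"))  # the situation in this thread
# Effect of the untested patch: adding a 4.1 -> 4.2 pair.
print(can_restore("4.1.9", "4.2.1", ALLOWED_PAIRS | {("4.1", "4.2")}))
```

Passing this check only means the tool would accept the backup. It says nothing about whether a 4.1 database actually works under 4.2.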
All my work on rolling back to 4.1.9 with the
DB restore failed as the Engine is not capable of picking up information
from the hosts.
No idea why, but not sure it's related to your restore flow.
Lesson learned is to always make a copy/snapshot of the
engine VM disk before upgrading anything.
If it's a hosted-engine, this isn't supported - see my reply on the
list ~ 1 hour ago...
What are my options on getting
back to a working environment ? Any help or hint is greatly appreciated.
Restore again with either method - what you tried, or patching engine-backup
and restore directly into 4.2 - and if the engine fails to talk to the hosts,
try to debug/fix this.
If you suspect corruption more severe than just the db, you can install a
fresh engine machine from scratch and restore to it. If it's a hosted-engine,
you'll need to deploy hosted-engine from scratch, check docs about hosted-engine
backup/restore.
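
As a rough illustration of the "fresh machine" restore for a standalone (non-hosted) engine, the sketch below only assembles the command line and does not run it. The file paths and the helper name restore_command are hypothetical. The flags are the usual engine-backup restore options, but please check `engine-backup --help` on your version before relying on them.

```python
import shlex


def restore_command(backup_file, log_file):
    """Build (but do not run) an engine-backup restore command line."""
    args = [
        "engine-backup",
        "--mode=restore",
        f"--file={backup_file}",
        f"--log={log_file}",
        "--provision-db",         # create the engine DB and user on the fresh machine
        "--restore-permissions",  # restore DB permissions recorded in the backup
    ]
    # Quote each argument so the printed line is safe to paste into a shell.
    return " ".join(shlex.quote(a) for a in args)


# Hypothetical paths, for illustration only.
print(restore_command("/root/engine-4.1.9.backup", "/root/engine-restore.log"))
```

After a restore like this, engine-setup normally still has to be run. The hosted-engine flow has additional steps, so follow the hosted-engine backup/restore docs for that case.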
Best regards,
--
Didi