For the record,
After fixing the major fs issues with guestfish, the DB was still not
starting up, so I removed everything from the DB data dir and recreated it
as below:
rm -rf /var/opt/rh/rh-postgresql10/lib/pgsql/data/*
/opt/rh/rh-postgresql10/root/usr/bin/postgresql-setup --initdb
systemctl restart rh-postgresql10-postgresql.service
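A more cautious variant of the same reset, as a sketch only (this is not what
was run above): archive the broken data dir before emptying it, in case
something in it is needed later. The paths and the service name are the ones
above; the run and reset_pgdata helpers, the DRY_RUN switch and the /root
archive location are assumptions made for this example. With DRY_RUN left at
its default of 1 the commands are only printed, not executed.

```shell
#!/bin/sh
PGDATA=/var/opt/rh/rh-postgresql10/lib/pgsql/data
PGSETUP=/opt/rh/rh-postgresql10/root/usr/bin/postgresql-setup
PGSERVICE=rh-postgresql10-postgresql.service

# Hypothetical helper: print the command when DRY_RUN=1 (default), else run it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: archive the old data dir, empty it, re-init, restart.
# $1 is an optional tag for the archive name (defaults to a timestamp).
# The chain stops at the first command that fails.
reset_pgdata() {
    tag=${1:-$(date +%Y%m%d%H%M%S)}
    run systemctl stop "$PGSERVICE" &&
    run tar -czf "/root/pgdata-broken-$tag.tar.gz" -C "$PGDATA" . &&
    # Unlike 'rm -rf data/*', this also removes dotfiles in the data dir.
    run find "$PGDATA" -mindepth 1 -delete &&
    run "$PGSETUP" --initdb &&
    run systemctl restart "$PGSERVICE"
}
```

Calling reset_pgdata prints the five commands for review; DRY_RUN=0
reset_pgdata would actually run them as root.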
Then I proceeded with the restore, asking it to provision all missing
databases:
engine-backup --mode=restore --file=engine-backup.gz \
  --provision-all-databases \
  --log=restore.log --restore-permissions
Following this, I ran engine-setup, as instructed by the restore operation.
I regained access to the engine web UI and saw that the VMs which had kept
running were shown as up, without issues.
Only one VM was unable to start, due to an illegal volume, but that's
another story.
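
Since the earlier engine-setup attempt failed with "Cannot connect to Engine
database", the restore and engine-setup steps can be gated on PostgreSQL
actually being up. A minimal sketch, assuming the same service name and
engine-backup options as above; run, is_active and restore_engine are
hypothetical helper names, and DRY_RUN=1 (the default) only prints the
commands.

```shell
#!/bin/sh
PGSERVICE=rh-postgresql10-postgresql.service

# Hypothetical helper: print the command when DRY_RUN=1 (default), else run it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Thin wrapper around systemctl so the check can be replaced when testing.
is_active() {
    systemctl is-active --quiet "$1"
}

# Hypothetical helper: refuse to restore unless the PostgreSQL unit is active,
# then restore and run engine-setup, stopping if the restore fails.
restore_engine() {
    backup=${1:?usage: restore_engine BACKUP_FILE}
    if ! is_active "$PGSERVICE"; then
        echo "$PGSERVICE is not active; fix PostgreSQL first" >&2
        return 1
    fi
    run engine-backup --mode=restore --file="$backup" \
        --provision-all-databases \
        --log=restore.log --restore-permissions &&
    run engine-setup
}
```

For example, restore_engine engine-backup.gz prints the two commands, and
DRY_RUN=0 restore_engine engine-backup.gz runs them.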
On Thu, Nov 19, 2020 at 9:42 PM Alex K <rightkicktech(a)gmail.com> wrote:
On Thu, Nov 19, 2020 at 5:31 PM Alex K <rightkicktech(a)gmail.com> wrote:
> Hi Didi,
>
> On Thu, Nov 19, 2020 at 5:13 PM Yedidyah Bar David <didi(a)redhat.com>
> wrote:
>
>> On Thu, Nov 19, 2020 at 4:37 PM Alex K <rightkicktech(a)gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> I have a corrupt self-hosted engine (with several file system errors,
>>> postgres not able to start) and thus it does not give access to the web UI.
>>> This happened following an unlucky split brain resolution (I am running 2
>>> nodes). The two hosts are running VMs also which I would like to keep
>>> running as they are needed.
>>>
>>> When trying to boot into rescue mode (using
>>> systemd.unit=emergency.target boot parameter) I get a cursor and nothing
>>> else.
>>>
>>
>> This means that more than just the DB is corrupt...
>>
>>
>>>
>>> I have backups of engine files with scope all (using the engine-backup
>>> tool).
>>> What is the best approach to try and fix the engine, or to redeploy?
>>>
>>
>> If you are careful, and know what you are doing, you can try something
>> like the following. I am not giving many details, hopefully you can find on
>> the net tutorials about how to use the things I suggest:
>>
>> 1. Move to global maintenance
>>
>> 2. Stop the current dead vm (if needed)
>>
>> 3. Find current vm conf, edit it to boot from a rescue iso image of your
>> preference or from net/PXE etc., and start the vm with '--vm-conf'
>> pointing to your edited file.
>>
>> 4. Connect a console (hosted-engine --console, or 'virsh console', or
>> use '--add-console-password' and remote viewer, if needed)
>>
>> 5. Clean the disk and install the OS, oVirt, etc.
>>
>> 6. Copy your backup into the vm and restore with engine-backup
>>
>> 7. Then cleanly stop the machine, exit global maint, and let HA start it
>> (or start it yourself with --vm-start).
>>
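The host-side part of steps 1-4 and 7 above could look roughly like the
sketch below (I went the guestfish route instead, so this is untested here).
Assumptions: it runs as root on one of the hosted-engine hosts, the current VM
config is at /var/run/ovirt-hosted-engine-ha/vm.conf, and /root/rescue-vm.conf
is a made-up path for the edited copy. The run helper, the three function
names and DRY_RUN are hypothetical too; with DRY_RUN=1 (the default) commands
are only printed. How to edit the config to boot a rescue ISO is left out on
purpose.

```shell
#!/bin/sh
VM_CONF=/var/run/ovirt-hosted-engine-ha/vm.conf    # assumed location
EDITED_CONF=${EDITED_CONF:-/root/rescue-vm.conf}   # hypothetical path

# Hypothetical helper: print the command when DRY_RUN=1 (default), else run it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Steps 1-3: global maintenance, stop the dead VM, copy its config for editing.
prepare_rescue() {
    run hosted-engine --set-maintenance --mode=global || return 1
    run hosted-engine --vm-poweroff || true   # step 2 is "if needed"
    run cp "$VM_CONF" "$EDITED_CONF"
}

# Steps 3-4: once $EDITED_CONF has been edited by hand, boot from it and
# attach a console.
start_rescue() {
    run hosted-engine --vm-start --vm-conf="$EDITED_CONF" &&
    run hosted-engine --console
}

# Step 7: stop the VM cleanly and let HA take over again.
finish_rescue() {
    run hosted-engine --vm-shutdown &&
    run hosted-engine --set-maintenance --mode=none
}
```

The split into three functions is deliberate: the config has to be edited
between prepare_rescue and start_rescue, and steps 5 and 6 happen inside the
VM before finish_rescue.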
>> At the time, we had a bug [1] to document this. The result is [2]. It
>> does not detail how to boot/reinstall os/etc., only restore (if e.g. db is
>> dead but fs is ok).
>> For something somewhat similar to what you want, see also [3], which
>> uses guestfish. Might be useful, depending on how badly your disk is
>> corrupted.
>>
> I went with the guestfish approach. It has fixed some fs issues and now
> yum etc. seem fine, apart from postgres.
> I had previously tried to uninstall/install packages, so I ended up
> installing them again with yum install ovirt\*setup\*.
> Now I think I have to run engine-setup, but I get the error:
>
> Failed to execute stage 'Environment setup': Cannot connect to Engine
> database using existing credentials: engine@localhost:5432
>
It seems that I need to have PostgreSQL running to be able to run
engine-backup --mode=restore. Are there any steps for manually preparing
pgsql for oVirt, so that I can attempt the restoration?
>
> So I guess I need to follow [2]. What do you think?
>
>
>> How did you run into a split brain? There is a lock on the shared
>> storage that should prevent this.
>>
>> Good luck and best regards,
>>
>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1482710
>> [2] https://www.ovirt.org/documentation/administration_guide/#Overwriting_a_S...
>> [3] https://bugzilla.redhat.com/show_bug.cgi?id=1569827#c4
>> --
>> Didi
>>
>