Shabbat Shalom,

On Wed, Aug 11, 2021 at 10:03 AM Yedidyah Bar David <didi@redhat.com> wrote:
On Tue, Aug 10, 2021 at 9:20 PM Gilboa Davara <gilboad@gmail.com> wrote:
>
> Hello,
>
> Many thanks again for taking the time to try and help me recover this machine (even though it would have been far easier to simply redeploy it...)
>
>> >
>> >
>> > Sadly enough, it seems that --clean-metadata requires an active agent.
>> > E.g.
>> > $ hosted-engine --clean-metadata
>> > The hosted engine configuration has not been retrieved from shared storage. Please ensure that ovirt-ha-agent
>> > is running and the storage server is reachable.
>>
>> Did you try to search the net/list archives?
>
>
> Yes. All of them seem to repeat the same clean-metadata command (which fails).

I suppose we need better documentation. Sorry. Perhaps open a
bug/issue about that.

Done.
https://bugzilla.redhat.com/show_bug.cgi?id=1993575
 

>
>>
>>
>> >
>> > Can I manually delete the metadata state files?
>>
>> Yes, see e.g.:
>>
>> https://lists.ovirt.org/pipermail/users/2016-April/072676.html
>>
>> As an alternative to the 'find' command there, you can also find the IDs with:
>>
>> $ grep metadata /etc/ovirt-hosted-engine/hosted-engine.conf
>>
>> Best regards,
>> --
>> Didi
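
For reference, a minimal sketch of that lookup; the key names below are the ones a typical hosted-engine.conf carries, and the placeholder values are illustrative, not taken from this setup:

$ grep metadata /etc/ovirt-hosted-engine/hosted-engine.conf
metadata_volume_UUID=<volume uuid>
metadata_image_UUID=<image uuid>

The metadata volume itself then sits under the storage domain mount, roughly at /run/vdsm/storage/<storage domain uuid>/<image uuid>/<volume uuid>.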
>
>
> Yippie! Success (At least it seems that way...)
>
> Following https://lists.ovirt.org/pipermail/users/2016-April/072676.html,
> I stopped the broker and agent services, archived the existing hosted-engine metadata files, and created an empty 1GB metadata file using dd (dd if=/dev/zero of=/run/vdsm/storage/<uuid>/<uuid> bs=1M count=1024), making double sure the permissions (0660 / 0644), owner (vdsm:kvm) and SELinux labels (restorecon, just in case) stayed the same.
> Let everything settle down.
> Restarted the services....
> ... and everything is up again :)
>
> I plan to let the engine run overnight with zero VMs (making sure all backups are fully up-to-date).
> Once done, I'll return to normal (until I replace this setup with a normal multi-node setup).
>
> Many thanks again!
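
A hedged sketch of the recovery described above, assuming the standard ovirt-ha-agent/ovirt-ha-broker service names and the file path from the thread (the <uuid> parts and the 1GB size must match your own metadata volume):

$ systemctl stop ovirt-ha-agent ovirt-ha-broker
# keep a copy of the old (possibly corrupted) metadata volume
$ cp /run/vdsm/storage/<uuid>/<uuid> /root/he-metadata.bak
# recreate it zero-filled at the same size
$ dd if=/dev/zero of=/run/vdsm/storage/<uuid>/<uuid> bs=1M count=1024
# restore owner, permissions and SELinux labels
$ chown vdsm:kvm /run/vdsm/storage/<uuid>/<uuid>
$ chmod 0660 /run/vdsm/storage/<uuid>/<uuid>
$ restorecon /run/vdsm/storage/<uuid>/<uuid>
$ systemctl start ovirt-ha-broker ovirt-ha-agent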

Glad to hear that, welcome, thanks for the report!

More tests you might want to do before starting your real VMs:

- Set and later clear global maintenance from each host, and see that this
propagates to the others (check both 'hosted-engine --vm-status' and agent.log)

- Migrate the engine VM between the hosts and see that this propagates as well

- Shut down the engine VM without global maintenance and see that it's started
automatically.

But I do not think all of this is mandatory, if 'hosted-engine --vm-status'
looks ok on all hosts.
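
For reference, the maintenance and shutdown checks map to roughly the following commands (a sketch; run them from each host in turn, watching 'hosted-engine --vm-status' and /var/log/ovirt-hosted-engine-ha/agent.log on the others); migrating the engine VM is normally done from the Administration Portal:

$ hosted-engine --set-maintenance --mode=global   # set global maintenance
$ hosted-engine --vm-status                       # should show it on every host
$ hosted-engine --set-maintenance --mode=none     # clear it again
$ hosted-engine --vm-shutdown                     # without global maintenance, the
                                                  # agent should restart the engine VM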

I'd still be careful with other things that might have been corrupted,
though - obviously I can't tell you what/where...


Host is back to normal.
The log looks clean (minus some odd SMTP errors).

Either way, I'm already in the process of replacing this setup with a real 3-host + Gluster setup, so I just need this machine to survive the next couple of weeks :)

- Gilboa