Shabbat Shalom,
On Wed, Aug 11, 2021 at 10:03 AM Yedidyah Bar David <didi(a)redhat.com> wrote:
On Tue, Aug 10, 2021 at 9:20 PM Gilboa Davara
<gilboad(a)gmail.com> wrote:
>
> Hello,
>
> Many thanks again for taking the time to try and help me recover this
> machine (even though it would have been far easier to simply redeploy it...)
>
>> >
>> >
>> > Sadly enough, it seems that --clean-metadata requires an active agent.
>> > E.g.
>> > $ hosted-engine --clean-metadata
>> > The hosted engine configuration has not been retrieved from shared
>> > storage. Please ensure that ovirt-ha-agent
>> > is running and the storage server is reachable.
>>
>> Did you try to search the net/list archives?
>
>
> Yes. All of them seem to repeat the same clean-metadata command (which
> fails).
I suppose we need better documentation. Sorry. Perhaps open a
bug/issue about that.
Done.
https://bugzilla.redhat.com/show_bug.cgi?id=1993575
>
>>
>>
>> >
>> > Can I manually delete the metadata state files?
>>
>> Yes, see e.g.:
>>
>>
>> https://lists.ovirt.org/pipermail/users/2016-April/072676.html
>>
>> As an alternative to the 'find' command there, you can also find the
>> IDs with:
>>
>> $ grep metadata /etc/ovirt-hosted-engine/hosted-engine.conf
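(For reference, on a typical hosted-engine deployment the output of that grep
should look roughly like this; the key names are my assumption based on common
hosted-engine.conf contents, and the UUIDs are placeholders:

$ grep metadata /etc/ovirt-hosted-engine/hosted-engine.conf
metadata_volume_UUID=<volume-uuid>
metadata_image_UUID=<image-uuid>
)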
>>
>> Best regards,
>> --
>> Didi
>
>
> Yippie! Success (At least it seems that way...)
>
> Following
> https://lists.ovirt.org/pipermail/users/2016-April/072676.html,
> I stopped the broker and agent services, archived the existing hosted-engine
> metadata files, and created an empty 1GB metadata file using dd (dd
> if=/dev/zero of=/run/vdsm/storage/<uuid>/<uuid> bs=1M count=1024), making
> double sure that the permissions (0660 / 0644), owner (vdsm:kvm) and SELinux
> labels (restorecon, just in case) stayed the same.
> Let everything settle down.
> Restarted the services....
> ... and everything is up again :)
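(Condensed into commands, the procedure above would look roughly like this.
Treat it as a sketch, not an exact transcript: the service names are the usual
ovirt-hosted-engine-ha systemd units, and the UUID path is a placeholder.

# stop the HA services
$ systemctl stop ovirt-ha-agent ovirt-ha-broker
# archive the current metadata volume
$ cp /run/vdsm/storage/<uuid>/<uuid> /root/he-metadata.bak
# overwrite it with an empty 1GB file
$ dd if=/dev/zero of=/run/vdsm/storage/<uuid>/<uuid> bs=1M count=1024
# make sure ownership, permissions and SELinux labels are unchanged
$ chown vdsm:kvm /run/vdsm/storage/<uuid>/<uuid>
$ chmod 0660 /run/vdsm/storage/<uuid>/<uuid>
$ restorecon /run/vdsm/storage/<uuid>/<uuid>
# restart the services
$ systemctl start ovirt-ha-broker ovirt-ha-agent
)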
>
> I plan to let the engine run overnight with zero VMs (making sure all
> backups are fully up-to-date).
> Once done, I'll return to normal (until I replace this setup with a
> normal multi-node setup).
>
> Many thanks again!
Glad to hear that, welcome, thanks for the report!
Some more tests you might want to run before starting your real VMs:
- Set and later clear global maintenance from each host, and see that this
propagates to the others (check both 'hosted-engine --vm-status' and agent.log)
- Migrate the engine VM between the hosts and see that this propagates
- Shut down the engine VM without global maintenance and see that it's started
automatically.
But I do not think all of this is mandatory, if 'hosted-engine --vm-status'
looks ok on all hosts.
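(For the first test, the commands would be along these lines; --set-maintenance
and --vm-status are standard hosted-engine options, but treat this as a sketch
and check the man page for your version:

$ hosted-engine --set-maintenance --mode=global
$ hosted-engine --vm-status   # run on every host; all should report global maintenance
$ hosted-engine --set-maintenance --mode=none
)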
I'd still be careful with other things that might have been corrupted,
though - obviously I can't tell you what/where...
The host is back to normal.
The logs look clean (minus some odd SMTP errors).
Either way, I'm already in the process of replacing this setup with a real
3-host + Gluster setup, so I just need this machine to survive the next
couple of weeks :)
- Gilboa