On Tue, Aug 10, 2021 at 9:20 PM Gilboa Davara <gilboad(a)gmail.com> wrote:
Hello,
Many thanks again for taking the time to try and help me recover this machine (even
though it would have been far easier to simply redeploy it...)
> >
> >
> > Sadly enough, it seems that --clean-metadata requires an active agent.
> > E.g.
> > $ hosted-engine --clean-metadata
> > The hosted engine configuration has not been retrieved from shared storage.
> > Please ensure that ovirt-ha-agent is running and the storage server is reachable.
>
> Did you try to search the net/list archives?
Yes. All of them seem to repeat the same clean-metadata command (which fails).
I suppose we need better documentation. Sorry. Perhaps open a
bug/issue about that.
>
>
> >
> > Can I manually delete the metadata state files?
>
> Yes, see e.g.:
>
> https://lists.ovirt.org/pipermail/users/2016-April/072676.html
>
> As an alternative to the 'find' command there, you can also find the IDs with:
>
> $ grep metadata /etc/ovirt-hosted-engine/hosted-engine.conf
>
> Best regards,
> --
> Didi
Yippie! Success (At least it seems that way...)
Following
https://lists.ovirt.org/pipermail/users/2016-April/072676.html,
I stopped the broker and agent services, archived the existing hosted metadata files,
and created an empty 1GB metadata file using dd (dd if=/dev/zero
of=/run/vdsm/storage/<uuid>/<uuid> bs=1M count=1024), making doubly sure that
permissions (0660 / 0644), owner (vdsm:kvm) and SELinux labels (restorecon, just in case)
stayed the same.
Let everything settle down.
Restarted the services....
... and everything is up again :)
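For anyone finding this in the archives, the steps above can be sketched roughly as the
shell function below. This is only a sketch under assumptions, not an official oVirt
procedure: the function name, its two arguments and the RUN dry-run switch are
hypothetical, the <uuid> path has to be filled in from your own setup, and the file mode
is copied from the archived file rather than hard-coded.

```shell
#!/bin/sh
# Sketch of the recovery steps described above; not an official oVirt procedure.
# RUN defaults to "echo", so the commands are only printed (dry run).
# Set RUN="" to execute them for real, as root, on the affected host.
RUN=${RUN-echo}

# reset_metadata META BACKUP
#   META   - hypothetical argument: the metadata file (/run/vdsm/storage/<uuid>/<uuid>)
#   BACKUP - hypothetical argument: where to archive the old contents
reset_metadata() {
    meta=$1
    backup=$2
    # Stop the HA services so nothing writes to the file in the meantime.
    $RUN systemctl stop ovirt-ha-agent ovirt-ha-broker &&
    # Archive the old file, preserving mode, owner and timestamps.
    $RUN cp -p "$meta" "$backup" &&
    # Overwrite it with 1024 x 1 MiB of zeros (same dd invocation as in the message).
    $RUN dd if=/dev/zero of="$meta" bs=1M count=1024 &&
    # Re-apply owner, mode (taken from the archived copy) and SELinux label.
    $RUN chown vdsm:kvm "$meta" &&
    $RUN chmod --reference="$backup" "$meta" &&
    $RUN restorecon "$meta" &&
    # Bring the services back.
    $RUN systemctl start ovirt-ha-broker ovirt-ha-agent
}
```

Calling reset_metadata with the metadata path and a backup path prints the seven commands
in order; review them, and only then rerun with RUN="" if they look right for your storage.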
I plan to let the engine run overnight with zero VMs (making sure all backups are fully
up-to-date).
Once done, I'll return to normal (until I replace this setup with a normal multi-node
setup).
Many thanks again!
Glad to hear that, welcome, thanks for the report!
More tests you might want to do before starting your real VMs:
- Set and later clear global maintenance from each host, see that this
propagates to the others (both 'hosted-engine --vm-status' and agent.log)
- Migrate the engine VM between the hosts and see this propagates
- Shut down the engine VM without global maintenance and see that it's started
automatically.
But I do not think all of this is mandatory, if 'hosted-engine --vm-status'
looks ok on all hosts.
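The global maintenance check could look roughly like the sketch below. The function name,
the RUN dry-run switch and the AGENT_LOG variable are hypothetical, and the log path is
the usual default, which I assume applies to your hosts as well.

```shell
#!/bin/sh
# Sketch of the global maintenance check; run it on each host in turn.
# RUN defaults to "echo", so the commands are only printed (dry run).
# Set RUN="" to execute them for real.
RUN=${RUN-echo}
# Assumed default location of the HA agent log.
AGENT_LOG=${AGENT_LOG-/var/log/ovirt-hosted-engine-ha/agent.log}

check_global_maintenance() {
    # Enter global maintenance, then look at the status and the agent log.
    $RUN hosted-engine --set-maintenance --mode=global &&
    $RUN hosted-engine --vm-status &&
    $RUN tail -n 20 "$AGENT_LOG" &&
    # Leave global maintenance and look at the status again.
    $RUN hosted-engine --set-maintenance --mode=none &&
    $RUN hosted-engine --vm-status
}
```

The state takes a little while to propagate, so when running this for real it is better to
issue the commands one at a time and compare 'hosted-engine --vm-status' on the other
hosts in between.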
I'd still be careful with other things that might have been corrupted,
though - obviously can't tell you what/where...
Best regards,
--
Didi