Re: Power failure makes cluster and hosted engine unusable
by Roman Bednar
Hi Seann,
On Mon, Mar 29, 2021 at 8:31 PM Seann G. Clark via Users <users(a)ovirt.org>
wrote:
> All,
>
>
>
> After a power failure, and generator failure I lost my cluster, and the
> Hosted engine refused to restart after power was restored. I would expect,
> once storage comes up that the hosted engine comes back online without too
> much of a fight. In practice because the SPM went down as well, there is no
> (clearly documented) way to clear any of the stale locks, and no way to
> recover both the hosted engine and the cluster.
>
Could you provide more details/logs about the storage not coming up? Also, more
information about the current locks would be great. Is there any procedure you
already tried for cleaning them up that did not work?
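As a starting point, the usual read-only checks on one of the hosts would be
something like this (a rough sketch, assuming the hosted-engine HA packages and
sanlock are still installed on that host):

    hosted-engine --vm-status
    sanlock client status

The first shows the HA agents' view of the engine VM and its storage, the second
lists the sanlock leases currently held on the host, which is where stale locks
would show up.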
> I have spent the last 12 hours trying to get a functional hosted-engine
> back online, on a new node and each attempt hits a new error, from the
> installer not understanding that 16384mb of dedicated VM memory out of
> 192GB free on the host is indeed bigger than 4096MB, to ansible dying on
> an error like this “Error while executing action: Cannot add Storage
> Connection. Storage connection already exists.”
>
> The memory error referenced above shows up as:
>
> [ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg":
> "Available memory ( {'failed': False, 'changed': False, 'ansible_facts':
> {u'max_mem': u'180746'}}MB ) is less then the minimal requirement (4096MB).
> Be aware that 512MB is reserved for the host and cannot be allocated to the
> engine VM."}
>
> That is what I typically get when I try the steps outlined in the KB
> “CHAPTER 7. RECOVERING A SELF-HOSTED ENGINE FROM AN EXISTING BACKUP” from
> the RH Customer portal. I have tried this numerous ways, and the cluster
> still remains in a bad state, with the hosted engine being 100% inoperable.
>
This could be a bug in the ansible role. Did it happen during
"hosted-engine --deploy" or in another part of the recovery guide? Please provide
logs here as well; it seems like a completely separate issue, though.
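For reference, the restore flow from that chapter is driven by the following
command (a sketch only; the backup path is my assumption, adjust it to wherever
your engine-backup file actually lives):

    # path below is just an example, point it at your real engine-backup file
    hosted-engine --deploy --restore-from-file=/root/engine-backup.tar.gz

The setup logs for each attempt end up under /var/log/ovirt-hosted-engine-setup/
on the host running the deploy; those are the ones that would help here.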
>
> What I do have are the two hosts that are part of the cluster and can host
> the engine, and backups of the original hosted engine, both disk and
> engine-backup generated. I am not sure what I can do next to recover this
> cluster; any suggestions would be appreciated.
>
>
>
> Regards,
>
> Seann
>
>
>
>
> _______________________________________________
> Users mailing list -- users(a)ovirt.org
> To unsubscribe send an email to users-leave(a)ovirt.org
> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/JLDIFTKYDPQ...
>
3 years, 8 months
[ANN] oVirt 4.4.6 Second Release Candidate is now available for testing
by Lev Veyde
oVirt 4.4.6 Second Release Candidate is now available for testing
The oVirt Project is pleased to announce the availability of oVirt 4.4.6
Second Release Candidate for testing, as of April 1st, 2021.
This update is the sixth in a series of stabilization updates to the 4.4
series.
How to prevent hosts entering emergency mode after upgrade from oVirt 4.4.1
Note: Upgrading from 4.4.2 GA or later should not require redoing these steps
if they were already performed while upgrading from 4.4.1 to 4.4.2 GA. They
only need to be done once.
Due to Bug 1837864 <https://bugzilla.redhat.com/show_bug.cgi?id=1837864> -
Host enter emergency mode after upgrading to latest build
If you have your root file system on a multipath device on your hosts, be aware
that after upgrading from 4.4.1 to 4.4.6 your host may enter emergency mode.
In order to prevent this, be sure to upgrade oVirt Engine first, and then on
your hosts (a command-level sketch of the sequence follows the list):
1. Remove the current lvm filter while still on 4.4.1, or in emergency mode
(if rebooted).
2. Reboot.
3. Upgrade to 4.4.6 (redeploy in case of already being on 4.4.6).
4. Run vdsm-tool config-lvm-filter to confirm there is a new filter in place.
5. Only if not using oVirt Node: run "dracut --force --add multipath" to
rebuild the initramfs with the correct filter configuration.
6. Reboot.
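For an EL (non-Node) host, the steps above map roughly to the following (a
sketch under the assumption that the old filter lives in the devices section of
/etc/lvm/lvm.conf and that the host is upgraded through the engine UI or dnf;
adapt as needed):

    # Still on 4.4.1 (or from emergency mode): remove the old filter= line
    vi /etc/lvm/lvm.conf
    reboot

    # After upgrading the host to 4.4.6:
    vdsm-tool config-lvm-filter      # confirm/create the new LVM filter
    dracut --force --add multipath   # EL hosts only: rebuild the initramfs
    reboot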
Documentation
- If you want to try oVirt as quickly as possible, follow the instructions on
the Download <https://ovirt.org/download/> page.
- For complete installation, administration, and usage instructions, see the
oVirt Documentation <https://ovirt.org/documentation/>.
- For upgrading from a previous version, see the oVirt Upgrade Guide
<https://ovirt.org/documentation/upgrade_guide/>.
- For a general overview of oVirt, see About oVirt
<https://ovirt.org/community/about.html>.
Important notes before you try it
Please note this is a pre-release build.
The oVirt Project makes no guarantees as to its suitability or usefulness.
This pre-release must not be used in production.
Installation instructions
For installation instructions and additional information please refer to:
https://ovirt.org/documentation/
This release is available now on x86_64 architecture for:
* Red Hat Enterprise Linux 8.3 or newer
* CentOS Linux (or similar) 8.3 or newer
This release supports Hypervisor Hosts on x86_64 and ppc64le architectures
for:
* Red Hat Enterprise Linux 8.3 or newer
* CentOS Linux (or similar) 8.3 or newer
* oVirt Node 4.4 based on CentOS Linux 8.3 (available for x86_64 only)
See the release notes [1] for installation instructions and a list of new
features and bugs fixed.
Notes:
- oVirt Appliance is already available for CentOS Linux 8
- oVirt Node NG is already available for CentOS Linux 8
- We found a few issues while testing on CentOS Stream so we are still
basing oVirt 4.4.6 Node and Appliance on CentOS Linux.
Additional Resources:
* Read more about the oVirt 4.4.6 release highlights:
http://www.ovirt.org/release/4.4.6/
* Get more oVirt project updates on Twitter: https://twitter.com/ovirt
* Check out the latest project news on the oVirt blog:
http://www.ovirt.org/blog/
[1] http://www.ovirt.org/release/4.4.6/
[2] http://resources.ovirt.org/pub/ovirt-4.4-pre/iso/
--
Lev Veyde
Senior Software Engineer, RHCE | RHCVA | MCITP
Red Hat Israel
<https://www.redhat.com>
lev(a)redhat.com | lveyde(a)redhat.com
<https://red.ht/sig>
TRIED. TESTED. TRUSTED. <https://redhat.com/trusted>
3 years, 8 months
Locked disks
by Giulio Casella
Since yesterday I have found a couple of VMs with locked disks. I don't know the
reason; I suspect some interaction by our backup system (vprotect, snapshot
based), even though it has been working for more than a year.
I'd give the unlock_entity.sh script a try, but it reports:
CAUTION, this operation may lead to data corruption and should be used
with care. Please contact support prior to running this command
Do you think I should trust it? Is it safe? The VMs are in production...
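If I read the script's options correctly (this is an assumption on my part,
based on its help text), there is a query mode that only lists the locked
entities without changing anything, and the actual unlock then takes the entity
type and id, something like:

    /usr/share/ovirt-engine/setup/dbutils/unlock_entity.sh -t disk -q
    /usr/share/ovirt-engine/setup/dbutils/unlock_entity.sh -t disk <disk-uuid>

(replace <disk-uuid> with the id reported by the query). Does that sound like
the right approach?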
My manager is 4.4.4.7-1.el8 (CentOS Stream 8), and the hosts are oVirt Node 4.4.4.
TIA,
Giulio
3 years, 8 months