Accidentally changed VM Assigned Bridge for Hosted-Engine
by phunyguy@neverserio.us
Hello. Exactly as the title suggests, I did something stupid, and I am not sure how to fix it. I also can't be certain this was the exact cause of the problem, as I was troubleshooting another issue with a different VM refusing to start.
What I did by accident was select the hosted-engine VM, rather than the VM I was attempting to troubleshoot, and change the assigned network of its VM adapter to something else. Of course, I then lost connectivity to the engine VM. This caused HA to kick in and restart the engine; however, like the other VM, it is now refusing to start.
Can anyone give me any pointers? agent/broker/vdsm.log are not showing anything useful that I can see. Thanks in advance.
-phunyguy
3 years, 8 months
Problem: VDSM command failed: Received fatal alert certificate_expire
by fmendoza@red.com.sv
Recently, oVirt sent certificate expiration messages, and it no longer lets me manage the server's VMs. I have looked for information about this and cannot find anything clear. The version of oVirt I am running is 3.6.3.4-1.el7.centos. I have at least 4 operational VMs of vital importance.
Thanks for the support
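Not a fix for the certificates themselves, but to confirm exactly which ones have lapsed, a quick stdlib-Python check of an OpenSSL-style notAfter date can help (the sample dates below are made up; feed in the dates that `openssl x509 -enddate -noout -in <cert>` prints for your own files):

```python
import ssl
import time

def cert_expired(not_after, now=None):
    """True if an OpenSSL-style notAfter date (e.g. the output of
    `openssl x509 -enddate -noout`) is already in the past."""
    expiry = ssl.cert_time_to_seconds(not_after)
    return (time.time() if now is None else now) > expiry

print(cert_expired("Mar 29 12:00:00 2020 GMT"))  # a past date -> True
print(cert_expired("Jan 1 00:00:00 2099 GMT"))   # far future -> False
```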
3 years, 8 months
Problems after upgrade from 4.4.3 to 4.4.4
by tferic@swissonline.ch
Hi
I have problems after upgrading my 2-node cluster from 4.4.3 to 4.4.4.
Initially, I performed the upgrade of the oVirt hosts using the oVirt GUI (I wasn't planning any changes).
It appears that the upgrade broke the system.
On host1, the ovirt-engine was configured to run on the oVirt host itself (not self-hosted engine).
After the upgrade, the oVirt GUI didn't load in the Browser anymore.
I tried to fix the issue by migrating to a self-hosted engine, which did not work, so I restored the engine from backup and ran engine-setup in order to get back to the initial state.
I am now able to login to the oVirt GUI again, but I am having the following problems:
host1 is in status "Unassigned" and holds the SPM role. It cannot be put into maintenance mode or re-installed from the GUI, but I am able to reboot the host from oVirt.
All storage domains are inactive (all NFS).
In the /var/log/messages log, I can see the following message appearing frequently: "vdsm[5935]: ERROR ssl handshake: socket error, address: ::ffff:192.168.100.61"
The cluster is down and no VMs can be run. I don't know how to fix either of these issues.
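As a triage starting point, the failing peers can be pulled straight out of /var/log/messages. A small stdlib sketch; the regex is inferred from the single log line quoted above, so it may need adjusting for your vdsm version:

```python
import re

# Pattern inferred from the one sample line above; real vdsm log lines
# may vary slightly between versions.
HANDSHAKE_ERR = re.compile(
    r"vdsm\[\d+\]: ERROR ssl handshake: socket error, address: ::ffff:([\d.]+)"
)

def failing_peers(log_lines):
    """Return the unique client IPs behind vdsm's ssl-handshake errors."""
    return sorted({m.group(1) for line in log_lines
                   if (m := HANDSHAKE_ERR.search(line))})
```

Run over the messages file, this tells you which hosts (or which engine) are failing the TLS handshake, which usually narrows the problem to the certificates on those specific peers.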
Does anyone have an idea?
I am appending a tar file containing log files to this email.
http://gofile.me/5fp92/d7iGEqh3H
Many thanks
Toni
3 years, 8 months
No option to remove OpenStack Glance storage domain.
by Gary Taylor
Hi,
I was playing around a few months ago and added an ovirt-image-repository storage domain. There wasn't a real purpose for doing it, just trying to learn and play around. I tried to remove it earlier but couldn't figure out how because the Remove button is greyed out. I got busy and forgot about it. I am trying to clean it up now but the Remove button is still greyed out for that domain. How do I get rid of it now? It isn't being used. It's unattached. I'm the admin.
https://imgur.com/KBKUu16.png
oVirt Open Virtualization Manager
Software Version:4.4.4.7-1.el8
Thank-you,
Gary
3 years, 8 months
Hosted-Engine vs Standalone Engine
by Ian Easter
Hello Folks,
I have had to install a Hosted-Engine a few times in my environment. There
have been some hardware issues and power issues that left the HE
unrecoverable.
In this situation, would a standalone engine install be more viable and
less prone to becoming inoperable due to issues like these?
My assumption would be to have a dedicated bare-metal server run the engine to
control and maintain my blades.
*Thank you,*
*Ian*
3 years, 8 months
Upgrade to release for oVirt 4.4.5 failing
by Gary Pedretty
I upgraded my hosted-engine installation to the latest version, 4.4.5, today, and the engine will not come online. It starts, and I can ssh in and connect via cockpit, but it never passes the liveness check according to the command-line vm-status, and the engine's web portal never loads. All processes appear to be running and nothing jumps out at me in the logs, but the HA agent keeps rebooting it on different hosts with the same result. All 8 hosts in the cluster are also updated to the latest updates across the board. This is CentOS Stream 8.
Ideas?
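For comparing what each of the 8 hosts believes, the per-host "Engine status" JSON in `hosted-engine --vm-status` is the key field. A small parser like this can help diff them (the output layout is assumed from 4.4-era tooling, so adjust the pattern if yours differs):

```python
import json
import re

# 'Engine status' lines as printed by `hosted-engine --vm-status`
# (format assumed from 4.4-era output).
STATUS_LINE = re.compile(r"^Engine status\s*:\s*(\{.*\})\s*$", re.MULTILINE)

def engine_health(vm_status_text):
    """Return the 'health' field from each host's Engine-status JSON."""
    return [json.loads(m.group(1)).get("health")
            for m in STATUS_LINE.finditer(vm_status_text)]
```

If every host reports `"vm": "up"` but `"health": "bad"`, the VM itself is running and it is the engine's health servlet that is not answering, which points at the engine VM rather than the HA agents.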
Gary
_______________________________
Gary Pedretty
IT Manager
Ravn Alaska
Office: 907-266-8451
Mobile: 907-388-2247
Email: gary.pedretty(a)ravnalaska.com
"We call Alaska......Home!"
3 years, 8 months
Re: Power failure makes cluster and hosted engine unusable
by Roman Bednar
Hi Seann,
On Mon, Mar 29, 2021 at 8:31 PM Seann G. Clark via Users <users(a)ovirt.org>
wrote:
> All,
>
>
>
> After a power failure, and generator failure I lost my cluster, and the
> Hosted engine refused to restart after power was restored. I would expect,
> once storage comes up that the hosted engine comes back online without too
> much of a fight. In practice because the SPM went down as well, there is no
> (clearly documented) way to clear any of the stale locks, and no way to
> recover both the hosted engine and the cluster.
>
Could you provide more details/logs on the storage not coming up? Also, more
information about the current locks would be great. Is there any procedure
you tried for cleaning those up that did not work?
I have spent the last 12 hours trying to get a functional hosted-engine
> back online, on a new node and each attempt hits a new error, from the
> installer not understanding that 16384mb of dedicated VM memory out of
> 192GB free on the host is indeed bigger than 4096MB, to ansible dying on
> an error like this “Error while executing action: Cannot add Storage
> Connection. Storage connection already exists.”
>
> The memory error referenced above shows up as:
>
> [ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg":
> "Available memory ( {'failed': False, 'changed': False, 'ansible_facts':
> {u'max_mem': u'180746'}}MB ) is less then the minimal requirement (4096MB).
> Be aware that 512MB is reserved for the host and cannot be allocated to the
> engine VM."}
>
> That is what I typically get when I try the steps outlined in the KB
> “CHAPTER 7. RECOVERING A SELF-HOSTED ENGINE FROM AN EXISTING BACKUP” from
> the RH Customer portal. I have tried this numerous ways, and the cluster
> still remains in a bad state, with the hosted engine being 100% inoperable.
>
This could be a bug in the ansible role. Did that happen during
"hosted-engine --deploy" or another part of the recovery guide? Please provide
logs here as well; it seems like a completely separate issue, though.
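One classic way to end up with "180746 MB is less than 4096 MB" is a string comparison where an integer one was intended. Purely as an illustration of that failure mode (an assumption, not a confirmed diagnosis of the role):

```python
# Ansible facts often arrive as strings; compared as strings,
# "180746" really is "less than" "4096" because '1' sorts before '4'.
max_mem = "180746"   # from the quoted fact {u'max_mem': u'180746'}
required = "4096"

print(max_mem < required)             # True  -- lexicographic comparison
print(int(max_mem) < int(required))   # False -- correct numeric comparison
```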
>
> What I do have are the two host that are part of the cluster and can host
> the engine, and backups of the original hosted engine, both disk and
> engine-backup generated. I am not sure what I can do next, to recover this
> cluster; any suggestions would be appreciated.
>
>
>
> Regards,
>
> Seann
>
>
>
>
> _______________________________________________
> Users mailing list -- users(a)ovirt.org
> To unsubscribe send an email to users-leave(a)ovirt.org
> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/JLDIFTKYDPQ...
>
3 years, 8 months
Locked disks
by Giulio Casella
Since yesterday I have found a couple of VMs with locked disks. I don't know
the reason; I suspect some interaction by our backup system (vProtect,
snapshot based), despite it having worked for more than a year.
I'd like to give the unlock_entity.sh script a chance, but it reports:
CAUTION, this operation may lead to data corruption and should be used
with care. Please contact support prior to running this command
Do you think I should trust it? Is it safe? The VMs are in production...
My manager is 4.4.4.7-1.el8 (CentOS stream 8), hosts are oVirt Node 4.4.4
TIA,
Giulio
3 years, 8 months
Hyperconverged engine high availability?
by David White
I just finished deploying oVirt 4.4.5 onto a 3-node hyperconverged cluster running on Red Hat 8.3 OS.
Over the course of the setup, I noticed that I had to set up the storage for the engine separately from the gluster bricks.
It looks like the engine was installed onto /rhev/data-center/ on the first host, whereas the gluster bricks for all 3 hosts are on /gluster_bricks/.
I fear that I may already know the answer to this, but:
Is it possible to make the engine highly available?
Also, thinking hypothetically here: what would happen to my VMs physically on the first server if that server crashed? The engine is what handles high availability, correct? So what if a VM was running on the first host? There would be nothing left to automatically "move" it to one of the remaining healthy hosts.
Or am I misunderstanding something here?
Sent with ProtonMail Secure Email.
3 years, 8 months
Deployment issues
by Valerio Luccio
Hello all,
last September I deployed oVirt on a CentOS 8 server, with storage on
our gluster (replica 3). I then added some VMs, etc. A few days ago I
managed to screw everything up and, after banging my head against it for a
couple of days, decided to start from scratch.
I made a copy of all the data under the storage to a safe place, then ran
ovirt-hosted-engine-cleanup, deleted everything under the storage, and
tried to create a new hosted engine (I tried both from the cockpit and
from the command line). Everything seems to work fine (I can ssh to the
engine) until it tries to save the engine to storage, when it fails with
the error:
FAILED! => {"changed": false, "msg": "Fault reason is \"Operation Failed\". Fault detail is \"[Error creating a storage domain's metadata]\". HTTP response code is 400."}
I don't get any more details.
I'm using exactly the same parameters I used before. I have no problem
reaching the gluster storage, and the process does create the top-level
directory and <top-level>/dom_md/ids with the correct ownership. I
looked at the glusterfs log files, including the
rhev-data-center-mnt-glusterSD-<host:_root>.log file, but I don't spot
any specific error.
What am I doing wrong? Is there something else I need to clean up
before trying a new deployment? Should I just delete all of the
oVirt configuration files? Which ones?
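One thing worth ruling out before redeploying: the storage-domain export must be truly empty, since stale files are a common trigger for "Error creating a storage domain's metadata". A tiny check along these lines (the mount path in the comment is hypothetical):

```python
import os

def is_clean_export(path):
    """True if `path` is a directory with no leftover entries
    (hidden files included) from a previous deployment."""
    return os.path.isdir(path) and not os.listdir(path)

# e.g. is_clean_export("/mnt/gluster/ovirt-he")  # hypothetical mount point
```

Note this has to be run against the gluster mount itself, so that stale entries on any replica are visible through the client view.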
Thanks,
--
Valerio Luccio (212) 998-8736
Center for Brain Imaging 4 Washington Place, Room 158
New York University New York, NY 10003
"In an open world, who needs windows or gates ?"
3 years, 8 months