March 2021 - Users - Ovirt List Archives

Hyperconverged engine high availability?

by David White

I just finished deploying oVirt 4.4.5 onto a 3-node hyperconverged cluster running on Red Hat 8.3 OS. Over the course of the setup, I noticed that I had to setup the storage for the engine separately from the gluster bricks. It looks like the engine was installed onto /rhev/data-center/ on the first host, whereas the gluster bricks for all 3 hosts are on /gluster_bricks/. I fear that I may already know the answer to this, but: Is it possible to make the engine highly available? Also, thinking hypothetically here, what would happen to my VMs that are physically on the first server, if the first server crashed? The engine is what handles the high availability, correct? So what if a VM was running on the first host? There would be nothing to automatically "move" it to one of the remaining healthy hosts. Or am I misunderstanding something here? Sent with ProtonMail Secure Email.

4 years, 3 months

4
6
0 / 0

Deployment issues

by Valerio Luccio

Hello all, last September I deployed Ovirt on a CentOS 8 server, with storage on our gluster (replica 3). I then added some VMs, etc. A few days ago I managed to screw everything up and, after bannging my head for a couple of days, decided to start from scratch. I made a copy of all the data under the storage to a safe space, the ran ovirt-hosted-engine-cleanup and deleted everything under the storage and tried to create a new hosted engine (I tried both from the cockpit and from command line). Everything seems to work fine (I can ssh to the engine) until it tries to save the engine to storage and it fails with the error: FAILED! => {"changed": false, "msg": "Fault reason is \"Operation Failed\". Fault detail is \"[Error creating a storage domain's metadata]\". HTTP response code is 400."} I don't get any more details. I'm using exactly the same parameters I used before. I have no problems reaching the gluster storage and the process does create the top-level directory and <top-level>/dom_md/ids with the correct ownership. I looked at the glusterfs log files, including the rhev-data-center-mnt-glusterSD-<host:_root>.log file, but I don't spot any specific error. What am I doing wrong ? Is there something else I need to clean up before trying a new deployment ? Should I just try to delete all of the ovirt configuration files ? Which ones ? Thanks, -- As a result of Coronavirus-related precautions, NYU and the Center for Brain Imaging operations will be managed remotely until further notice. All telephone calls and e-mail correspondence are being monitored remotely during our normal business hours of 9am-5pm, Monday through Friday. For MRI scanner-related emergency, please contact: Keith Sanzenbach at keith.sanzenbach(a)nyu.edu and/or Pablo Velasco at pablo.velasco(a)nyu.edu For computer/hardware/software emergency, please contact: Valerio Luccio at valerio.luccio(a)nyu.edu For TMS/EEG-related emergency, please contact: Chrysa Papadaniil at chrysa(a)nyu.edu For CBI-related administrative emergency, please contact: Jennifer Mangan at jennifer.mangan(a)nyu.edu Valerio Luccio (212) 998-8736 Center for Brain Imaging 4 Washington Place, Room 158 New York University New York, NY 10003 "In an open world, who needs windows or gates ?"

4 years, 3 months

4
12
0 / 0

Re: Power failure makes cluster and hosted engine unusable

by Thomas Hoberg

Yup, that's a bug in the ansible code, I've come across on hosts that had 512GB of RAM. I quite simply deleted the checks from the ansible code and re-ran the wizard. I can't read YAML or Python or whatever it is that Ansible uses, but my impression is that things are 'cast' or converted into an INT data type on these checks that overflows at that point. I wound up commenting out the entire set of checks to get past this, because I could see no easy way to fix this. I just checked that the commands used to retrieve the memory size returned the proper number of kilobytes and then rolled my eyes at what seemd like a type cast operation. I never went deeper, because at that point I have a hard time keeping the beast inside me at bay, that sees how Ansible can bring a Xeon Scalable Gold system to what seems slower than a 6502 executing BASIC. The hosted-setup takes what seems like an hour, no matter how fast or slow your hardware. I was so fed up at the speed of Ansible and the quality of oVirt QA, I couldn't bring myself to open a ticket, I hope you're better motivated. BTW things aren't much better at the low end either. While Ansible doesn't seem that much slower on an Atom farm I also operate, the hosted-engine setup does fail on them, so replug their SSDs for that part into an i7 and then replug the SSDs to the Atoms aftwards. Once, up and running oVirt is just fine on Atoms (mine have 32GB of RAM each). I am almost ready to donate my Atoms to the project, because I keep thinking that oVirt's major chance would be as edge HCI, but they are asking for 128GB RAM minimum... BTW. I was running oVirt 4.3 on CentOS7 when hitting that error. No idea if it's still the same with 4.4/COS8 as my 'perhaps-next-but-more-likely-never' generation test farm runs on NUCs with 64GB.

4 years, 3 months

1
0
0 / 0

Network issue

by Valerio Luccio

I'm slowly getting things back up, but I ran into a perplexing network problem. I created a new vNIC profile in the engine web UI and attached it to the hosted engine to test it, which in retrospect was not a good idea. I realized that I needed to change something in the parameters of the vNIC, but it won't let me do it because it's attached to the running engine; it will not let me remove the NIC from the engine because it's running. If I shut down the engine, how can I then change the configuration of it ? Seems like a Catch 22 situation. Thanks, -- As a result of Coronavirus-related precautions, NYU and the Center for Brain Imaging operations will be managed remotely until further notice. All telephone calls and e-mail correspondence are being monitored remotely during our normal business hours of 9am-5pm, Monday through Friday. For MRI scanner-related emergency, please contact: Keith Sanzenbach at keith.sanzenbach(a)nyu.edu and/or Pablo Velasco at pablo.velasco(a)nyu.edu For computer/hardware/software emergency, please contact: Valerio Luccio at valerio.luccio(a)nyu.edu For TMS/EEG-related emergency, please contact: Chrysa Papadaniil at chrysa(a)nyu.edu For CBI-related administrative emergency, please contact: Jennifer Mangan at jennifer.mangan(a)nyu.edu Valerio Luccio (212) 998-8736 Center for Brain Imaging 4 Washington Place, Room 158 New York University New York, NY 10003 "In an open world, who needs windows or gates ?"

4 years, 3 months

1
1
0 / 0

Import VMs from disk

by valerio.luccio＠nyu.edu

As I've described before I had issues with my deployment and had to recreate the engine. I never had a chance to export my VMs, but I did back up the files. Now I would like to import the old VMs into the new engine. Is there a way to 'slurp' in the old images ? I see different way to import from an export domain, form VMWare, etc., but I don't see a way to import directly from disk. Do I need to convert them ? To what and how ? Thanks,

4 years, 3 months

6
14
0 / 0

Re: Power failure makes cluster and hosted engine unusable

by Vincent Royer

Seann, If this happens again, try doing nothing (seriously) Each time I've had a power failure, the engine takes a really long time to come back up. I don't know if it's by design or what. Host logs are flooded with errors, everything seemingly storage related. However, my Gluster setup is on fast SSDs and gets back and running pretty much straight away. It takes maybe 5 minutes for the nodes to re-join and the volumes to show 'UP' with no heals. However, it still takes the hosted-engine a good hour or two to simmer down and finally start up. Sometimes I try to help by restarting ha-broker and ha-agent, or plunking in other random commands from the mess of documentation, but it seems to sort itself out on its own time, regardless of my tinkering. I wish I could get more insight into the process, but definitely, doing nothing and waiting has been the most successful troubleshooting step I have taken. Cheers! On Mon, Mar 29, 2021 at 11:32 AM Seann G. Clark via Users <users(a)ovirt.org> wrote: > All, > > > > After a power failure, and generator failure I lost my cluster, and the > Hosted engine refused to restart after power was restored. I would expect, > once storage comes up that the hosted engine comes back online without too > much of a fight. In practice because the SPM went down as well, there is no > (clearly documented) way to clear any of the stale locks, and no way to > recover both the hosted engine and the cluster. > > > > I have spent the last 12 hours trying to get a functional hosted-engine > back online, on a new node and each attempt hits a new error, from the > installer not understanding that 16384mb of dedicated VM memory out of > 192GB free on the host is indeed bigger than 4096MB, to ansible dying on > an error like this “Error while executing action: Cannot add Storage > Connection. Storage connection already exists.” > > The memory error referenced above shows up as: > > [ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": > "Available memory ( {'failed': False, 'changed': False, 'ansible_facts': > {u'max_mem': u'180746'}}MB ) is less then the minimal requirement (4096MB). > Be aware that 512MB is reserved for the host and cannot be allocated to the > engine VM."} > > That is what I typically get when I try the steps outlined in the KB > “CHAPTER 7. RECOVERING A SELF-HOSTED ENGINE FROM AN EXISTING BACKUP” from > the RH Customer portal. I have tried this numerous ways, and the cluster > still remains in a bad state, with the hosted engine being 100% inoperable. > > > > What I do have are the two host that are part of the cluster and can host > the engine, and backups of the original hosted engine, both disk and > engine-backup generated. I am not sure what I can do next, to recover this > cluster, any suggestions would be apricated. > > > > Regards, > > Seann > > > > > _______________________________________________ > Users mailing list -- users(a)ovirt.org > To unsubscribe send an email to users-leave(a)ovirt.org > Privacy Statement: https://www.ovirt.org/privacy-policy.html > oVirt Code of Conduct: > https://www.ovirt.org/community/about/community-guidelines/ > List Archives: > https://lists.ovirt.org/archives/list/users@ovirt.org/message/JLDIFTKYDPQ... >

4 years, 3 months

1
0
0 / 0

Q: Set host CPU type to kvm64 for a single VM

by Andrei Verovski

Hi ! Is it possible to set host CPU type to kvm64 for a single VM ? Thanks. Andrei

4 years, 3 months

4
4
0 / 0

Power failure makes cluster and hosted engine unusable

by Seann G. Clark

All, After a power failure, and generator failure I lost my cluster, and the Hosted engine refused to restart after power was restored. I would expect, once storage comes up that the hosted engine comes back online without too much of a fight. In practice because the SPM went down as well, there is no (clearly documented) way to clear any of the stale locks, and no way to recover both the hosted engine and the cluster. I have spent the last 12 hours trying to get a functional hosted-engine back online, on a new node and each attempt hits a new error, from the installer not understanding that 16384mb of dedicated VM memory out of 192GB free on the host is indeed bigger than 4096MB, to ansible dying on an error like this "Error while executing action: Cannot add Storage Connection. Storage connection already exists." The memory error referenced above shows up as: [ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "Available memory ( {'failed': False, 'changed': False, 'ansible_facts': {u'max_mem': u'180746'}}MB ) is less then the minimal requirement (4096MB). Be aware that 512MB is reserved for the host and cannot be allocated to the engine VM."} That is what I typically get when I try the steps outlined in the KB "CHAPTER 7. RECOVERING A SELF-HOSTED ENGINE FROM AN EXISTING BACKUP" from the RH Customer portal. I have tried this numerous ways, and the cluster still remains in a bad state, with the hosted engine being 100% inoperable. What I do have are the two host that are part of the cluster and can host the engine, and backups of the original hosted engine, both disk and engine-backup generated. I am not sure what I can do next, to recover this cluster, any suggestions would be apricated. Regards, Seann

4 years, 3 months

1
0
0 / 0

VLAN Trunking to Cluster Nodes

by Jason Alexander Hazen Valliant-Saunders

Good Day Ovirt Users; I am running a the following Setup: [image: image.png] [image: image.png] [image: image.png] I need to setup the BOND1 connector in the Datacenter with 9 VLAN's they Exist on the Ovirt Node already and the Bond is UP: [image: image.png] (bond0 is 10GB Chelsio Storage Network) The idea is to pass through Vlan's on bond1 (bond1.1, 1.2, 1.3, 1.n ) where n = vlanid. However I'm unsure as to how I go about setting up the Provider in the datacenter, so that bond1 is used to trunk vlans into the DC. From what I have read the VLAN"s need to exist on the cluster node (they do); but the ovirt-engine must then have some kind of provider configured (just like it does for ovirt mgt) then i can map the vNic Profiles to that new provider as per their vlan assignment. Regards, Hazen

4 years, 3 months

1
0
0 / 0

Cinderlib problem after upgrade from 4.3.10 to 4.4.5

by Marc-Christian Schröer

Hello all, first of all thank you very much for this stable virtualization environment. It has been a pillar for our company’s business for more than 5 years now and after migrating from version 3 to 4 it has been so stable ever since. Anyway, I ran into a problem I cannot fix on my own yesterday: After a lot of consideration and hesitation since this is a production environment I followed the upgrade guide (https://www.ovirt.org/documentation/upgrade_guide/ <https://www.ovirt.org/documentation/upgrade_guide/>), configured a vanilla CentOS 8 server as controller, decommissioned the old 4.3 controller and fired up the new one. It worked like a charm until I tried to migrate VMs, start new ones or even create new disks. We use Ceph as managed storage, providing a SSD only and a HDD only pool. The UI simply told me that there was an error. I started investigating the issue and found corresponding log entries in ovirt-engine.log: 2021-03-22 10:36:37,247+01 ERROR [org.ovirt.engine.core.common.utils.cinderlib.CinderlibExecutor] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-24) [67bf193c] cinderlib execution failed: But that was all the engine had to say about the issue. There was no stack trace or additional information. There is no logfile in /var/log/ovirt-engine/cinderlib/, the directory simply is empty while on the other controller it was frequently filed with annoying „already mounted“ messages. Can anyone help me with that issue? I searched the web for a solution or someone else with the same problem, but came up empty. Is there a way to turn up the log level for cinderlib? Are there any dependencies I have to install besides the ovirt packages? Any help is very much appreciated! Kind regards and stay healthy, Marc -- ________________________________________________________________________ Dipl.-Inform. Marc-Christian Schröer schroeer(a)ingenit.com Geschäftsführer / CEO ---------------------------------------------------------------------- ingenit GmbH & Co. KG Tel. +49 (0)231 58 698-120 Emil-Figge-Strasse 76-80 Fax. +49 (0)231 58 698-121 D-44227 Dortmund www.ingenit.com Registergericht: Amtsgericht Dortmund, HRA 13 914 Gesellschafter : Thomas Klute, Marc-Christian Schröer ________________________________________________________________________

4 years, 3 months

2
3
0 / 0

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

Users March 2021