October 2021 - Users - Ovirt List Archives

question about engine deployment success rate

by Henning Sprang

Hello,

I've just inherited a project where we need to bring a prototype of a small oVirt system (single node or 3-node hyperconverged, with GlusterFS on the same machine and a bunch of different VMs) running in an industrial machine into serial production. This means we want to build a new 1- or 3-node oVirt system every day, up to 3 times a day.

In my tests so far, the failure rate of the oVirt engine deployment (via the included scripts as well as the web UI) turns out to be pretty high: between 40 and 60%. This means we would have to try the installation and/or final engine deployment about 2-4 times before we have a running system. So far I could not identify clear error messages that tell me how to fix the problem.

Before going into the details of the errors, I would like to ask whether people with deeper oVirt knowledge would consider this a somewhat normal success rate, or whether it indicates we are doing something generally wrong and should definitely spend a few more hours or maybe days on finding the sources of the problems.

More info about the system and errors:

* oVirt 4.3.9 (because the prototype was made and verified with that version; it would also be interesting to know whether upgrading is strongly recommended for a more stable installation/deployment)
* The errors alternate between two cases. In one, the deployment process seems unable to transfer the "LocalHostedEngine" VM to the GlusterFS storage to become a "HostedEngine". In the other, the engine is already up and running but never really connects to the oVirt system, restarts continuously, and also shows XFS filesystem errors in its dmesg output.

Any hints on our chances of getting this solved, or requests for more information about the errors, are welcome. Thanks in advance.

Henning
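As a rough check on the numbers in this post, the sketch below relates a per-attempt failure rate to the number of deployment attempts needed. It assumes each attempt fails independently with the same probability (a geometric model), which may not hold if failures share a systematic cause. The function names are illustrative only and are not part of any oVirt tooling.

```python
def expected_attempts(failure_rate: float) -> float:
    """Mean number of attempts until the first success (geometric model)."""
    if not 0.0 <= failure_rate < 1.0:
        raise ValueError("failure_rate must be in [0, 1)")
    return 1.0 / (1.0 - failure_rate)


def prob_success_within(failure_rate: float, attempts: int) -> float:
    """Probability of at least one success within the given number of attempts."""
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    return 1.0 - failure_rate ** attempts


if __name__ == "__main__":
    # Failure rates taken from the post: 40% and 60%.
    for rate in (0.4, 0.6):
        print(
            f"failure rate {rate:.0%}: "
            f"expected attempts {expected_attempts(rate):.2f}, "
            f"P(success within 4 tries) {prob_success_within(rate, 4):.3f}"
        )
```

Under this assumption the average lands between roughly 1.7 and 2.5 attempts, and an unlucky run needing 4 or more attempts remains plausible at the higher failure rate.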

3 years, 8 months

The Engine VM (/32) and this host (/32) will not be in the same IP subnet.

by notify.sina＠gmail.com

Hi list,

'The Engine VM (10.200.30.5/32) and this host (10.200.30.3/32) will not be in the same IP subnet.'

This happens when I select a static IP address configuration for the engine. I've been struggling with setting up ovirt-hosted-engine on CentOS 8 servers in the Equinix network, using hosted-engine --deploy.

Why would the setup consider that the engine and the host it is being set up on are not in the same subnet? Can someone please help me understand what prerequisites I am missing?
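To illustrate why two addresses carrying a /32 prefix can never share a subnet, here is a minimal sketch using Python's standard ipaddress module. It only demonstrates the subnet arithmetic; it is not a claim about how hosted-engine --deploy performs its check. The /24 prefix in the second call is a hypothetical value chosen for contrast and does not come from the post.

```python
import ipaddress


def same_subnet(a: str, b: str) -> bool:
    """Return True if both 'address/prefix' strings fall in the same network."""
    return ipaddress.ip_interface(a).network == ipaddress.ip_interface(b).network


if __name__ == "__main__":
    # Addresses and /32 prefix exactly as quoted in the error message.
    print(same_subnet("10.200.30.5/32", "10.200.30.3/32"))
    # Hypothetical /24 prefix, shown only for comparison.
    print(same_subnet("10.200.30.5/24", "10.200.30.3/24"))
```

A /32 network contains exactly one address, so the first comparison is false regardless of how close the two addresses are.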

3 years, 8 months

Issue upgrading VDSM in 4.4.9

by ling＠aliko.com

Hello,

I am trying to perform an oVirt host upgrade, ran into this issue, and am looking for advice. If I try to run dnf update vdsm directly:

Problem: cannot install the best update candidate for package vdsm-4.40.80.5-1.el8.x86_64
 - nothing provides libvirt-daemon-kvm >= 7.6.0-2 needed by vdsm-4.40.90.3-1.el8.x86_64
(try to add '--skip-broken' to skip uninstallable packages or '--nobest' to use not only best candidate packages)

After updating ovirt-release to 4.4.9, this is what I see available for libvirt-daemon:

[root@vmserver02 yum.repos.d]# dnf list libvirt-daemon-kvm
Last metadata expiration check: 2:09:25 ago on Wed 20 Oct 2021 01:28:56 PM PDT.
Installed Packages
libvirt-daemon-kvm.x86_64    7.0.0-14.1.el8    @ovirt-4.4-advanced-virtualization

And the advanced-virtualization repo points to this:

[ovirt-4.4-advanced-virtualization]
name=Advanced Virtualization packages for $basearch
mirrorlist=http://mirrorlist.centos.org/?arch=$basearch&release=8&repo=vi...
enabled=1
gpgcheck=1
gpgkey=https://www.centos.org/keys/RPM-GPG-KEY-CentOS-SIG-Virtualization
module_hotfixes=1

I am trying this on CentOS 8 and also RHEL 8 machines. Both give the same errors.

Thanks.
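The dependency error comes down to a version comparison: vdsm-4.40.90.3 needs libvirt-daemon-kvm >= 7.6.0-2, while the installed package is 7.0.0-14.1.el8. The sketch below makes that comparison explicit. It is a deliberate simplification that compares only the numeric upstream version and ignores epoch and release; real RPM version ordering is more involved, and the helper name is hypothetical.

```python
from typing import Tuple


def upstream_version(evr: str) -> Tuple[int, ...]:
    """Extract the numeric upstream version from '[epoch:]version-release'.

    Simplification: epoch and release are dropped, and the version is
    assumed to consist of dot-separated integers only.
    """
    version = evr.split(":")[-1].split("-")[0]
    return tuple(int(part) for part in version.split("."))


# Version strings copied from the dnf output in the post.
installed = upstream_version("7.0.0-14.1.el8")
required = upstream_version("7.6.0-2")
satisfied = installed >= required

if __name__ == "__main__":
    print(f"installed={installed} required={required} satisfied={satisfied}")
```

Since 7.0.0 is lower than 7.6.0, the enabled repository as listed cannot satisfy the requirement, which matches the "nothing provides" message.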

3 years, 8 months

hosted-engine deploy - Network error during communication with the Host. (NFS)

by Matyi Szabolcs

Hi,

I get the following error when running hosted-engine --deploy:

[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Fetch cluster facts]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Fetch Datacenter facts]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Fetch Datacenter ID]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Fetch Datacenter name]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Add NFS storage domain]
[ ERROR ] ovirtsdk4.Error: Fault reason is "Operation Failed". Fault detail is "Network error during communication with the Host.". HTTP response code is 400.
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "Fault reason is \"Operation Failed\". Fault detail is \"Network error during communication with the Host.\". HTTP response code is 400."}
Please specify the storage you would like to use (glusterfs, iscsi, fc, nfs)[nfs]:

The NFS share is fine: I can mount it from a VM and also from localhost.

Thanks!

3 years, 8 months

Gluster Install Fail again :(

by admin＠foundryserver.com

I have been working on getting this up and running for about a week now and I am totally frustrated. I am not even sure where to begin. Here is the error I get when it fails:

TASK [gluster.features/roles/gluster_hci : Create the GlusterFS volumes] *******
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: NoneType: None
failed: [storage1.private.net] (item={'volname': 'engine', 'brick': '/gluster_bricks/engine/engine', 'arbiter': 0}) => {"ansible_loop_var": "item", "changed": false, "item": {"arbiter": 0, "brick": "/gluster_bricks/engine/engine", "volname": "engine"}, "msg": "error running gluster (/usr/sbin/gluster --mode=script volume create engine replica 3 transport tcp storage1.private.net:/gluster_bricks/engine/engine storage2.private.net:/gluster_bricks/engine/engine storage3.private.net:/gluster_bricks/engine/engine force) command (rc=1): volume create: engine: failed: Staging failed on storage3.private.net. Error: Host storage1.private.net not connected\nStaging failed on storage2.private.net. Error: Host storage1.private.net not connected\n"}
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: NoneType: None
failed: [storage1.private.net] (item={'volname': 'data', 'brick': '/gluster_bricks/data/data', 'arbiter': 0}) => {"ansible_loop_var": "item", "changed": false, "item": {"arbiter": 0, "brick": "/gluster_bricks/data/data", "volname": "data"}, "msg": "error running gluster (/usr/sbin/gluster --mode=script volume create data replica 3 transport tcp storage1.private.net:/gluster_bricks/data/data storage2.private.net:/gluster_bricks/data/data storage3.private.net:/gluster_bricks/data/data force) command (rc=1): volume create: data: failed: Staging failed on storage2.private.net. Error: Host storage1.private.net not connected\nStaging failed on storage3.private.net. Error: Host storage1.private.net not connected\n"}
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: NoneType: None
failed: [storage1.private.net] (item={'volname': 'vmstore', 'brick': '/gluster_bricks/vmstore/vmstore', 'arbiter': 0}) => {"ansible_loop_var": "item", "changed": false, "item": {"arbiter": 0, "brick": "/gluster_bricks/vmstore/vmstore", "volname": "vmstore"}, "msg": "error running gluster (/usr/sbin/gluster --mode=script volume create vmstore replica 3 transport tcp storage1.private.net:/gluster_bricks/vmstore/vmstore storage2.private.net:/gluster_bricks/vmstore/vmstore storage3.private.net:/gluster_bricks/vmstore/vmstore force) command (rc=1): volume create: vmstore: failed: Staging failed on storage3.private.net. Error: Host storage1.private.net not connected\nStaging failed on storage2.private.net. Error: Host storage1.private.net is not in 'Peer in Cluster' state\n"}

Here are the facts:

- Using oVirt 4.4.9.
- Using the oVirt Node OS.
- Partition for gluster: /dev/vda4, > 4T of unformatted space.
- Able to ssh into each host on private.net; known hosts and FQDN checks pass fine.
- On the volume page: all default settings.
- On the bricks page: JBOD / Blacklist true / storage host storage1.private.net / default LVM, except the device is /dev/sda4.

I really need to get this set up. The first failure was the filter error, so I edited /etc/lvm/lvm.conf to comment out the filter line. Then, without doing a cleanup, I reran the deployment and got the above error.

Thanks in advance,
Brad

3 years, 8 months

Q: oVirt guest agent + spice-vdagent on Debian 11 Bullseye

by Andrei Verovski

Hi,

Has anyone compiled these deb packages for Debian 11 Bullseye?

oVirt guest agent + spice-vdagent

The packages from Buster can’t be installed on Bullseye because of broken libnl dependencies.

Thanks in advance.
Andrei

3 years, 8 months

Engine VM FQDN will not validate.

by admin＠foundryserver.com

Hello,

I am trying to install the hosted engine using the wizard. I am NOT using the hyperconverged setup. When I add the FQDN vmengine1.domain.com, I get the error "localhost is not a valid address". When I add the host FQDN in the advanced area it validates, and if I put vmengine1.domain.com in the host section it validates.

Facts:
- I have the domain in DNS, confirmed by dig to resolve to the public IP of the host.
- I have the domain in the hosts file, confirmed by using ping and dig.
- I have generated an ssh key for the root user on the host and I have added it to the known hosts file.
- I have logged in via ssh using all forms of addressing, so the known hosts file has been updated:

localhost ecdsa-sha2-nistp256 AAAA
host1 ecdsa-sha2-nistp256 AAAA
host1.localhost ecdsa-sha2-nistp256 AAAA
vmengine.localhost ecdsa-sha2-nistp256 AAAA
host1.domain.com ecdsa-sha2-nistp256 AAAA
vmengine1.domain.com ecdsa-sha2-nistp256 AAAA

What I have read suggests that the error doesn't mean what it says, but rather that ssh is not able to log in to the host. I have been able to log in to the host with ssh manually from the CLI.

Any help would be greatly appreciated.

Thanks,
Brad

3 years, 8 months

The service NetworkManager-wait-online fail to start

by amazirs＠hotmail.com

Hello,

I have installed a new oVirt node. The service NetworkManager-wait-online fails to start:

10:21 Failed to start Network Manager Wait Online. systemd
10:21 NetworkManager-wait-online.service: Failed with result 'exit-code'. systemd
10:21 NetworkManager-wait-online.service: Main process exited, code=exited, status=1/FAILURE systemd
10:20 Starting Network Manager Wait Online... systemd

When I try sh /usr/lib/systemd/system/NetworkManager-wait-online.service, I get:

/usr/lib/systemd/system/NetworkManager-wait-online.service: line 1: [Unit]: comm
/usr/lib/systemd/system/NetworkManager-wait-online.service: line 2: Manager: com
/usr/lib/systemd/system/NetworkManager-wait-online.service: line 3: syntax error
/usr/lib/systemd/system/NetworkManager-wait-online.service: line 3: `Documentati

I'm desperate to find a solution!

Thanks

3 years, 8 months

Snapshot and disk size allocation

by jorgevisentini＠gmail.com

Hello everyone.

I would like to know how disk size and snapshot allocation work, because every time I create a new snapshot, the VM's disk size increases by 1 GB, and when I remove the snapshot, that space is not returned to the storage domain.

I'm using oVirt 4.3.10.

How do I reprovision the VM disk?

Thank you all.

3 years, 8 months

[ANN] Async oVirt engine release for oVirt 4.4.9

by Lev Veyde

On October 28th 2021 the oVirt project released an async update of oVirt engine (4.4.9.4).

Changes:
- Require EAP 7.4.2 to use supported upgrade path from EAP 7.3 (Fixes BZ#1963748 <https://bugzilla.redhat.com/1963748>)
- Enable IOMMU caching_mode when mdev devices are present (Fixes BZ#2013752 <https://bugzilla.redhat.com/2013752>)

--
Lev Veyde
Senior Software Engineer, RHCE | RHCVA | MCITP
Red Hat Israel <https://www.redhat.com>
lev(a)redhat.com | lveyde(a)redhat.com
<https://red.ht/sig>
TRIED. TESTED. TRUSTED. <https://redhat.com/trusted>

3 years, 8 months
