question about engine deployment success rate
by Henning Sprang
Hello,
I've just inherited a project where we need to bring a prototype of a
small oVirt system (single node or 3-node hyperconverged, with
GlusterFS on the same machine and a number of different VMs), running in
an industrial machine, into serial production.
This means we want to build a new 1- or 3-node oVirt system up to
three times a day.
In my tests so far, the failure rate of the oVirt engine deployment
(via the included scripts as well as the web UI) turns out to be
pretty high: between 40% and 60%. That means we would have to try
the installation and/or final engine deployment about 2-4 times
before we have a running system.
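As a rough cross-check of that estimate, here is a minimal sketch of the arithmetic. It assumes each deployment attempt fails independently with the same probability, which may not hold if the failures have a common cause; the function names are only illustrative.

```python
def expected_attempts(failure_rate):
    """Mean number of attempts until the first success (geometric distribution)."""
    return 1.0 / (1.0 - failure_rate)


def attempts_for_confidence(failure_rate, confidence):
    """Smallest n such that at least one of n attempts succeeds
    with probability >= confidence."""
    n = 1
    while failure_rate ** n > 1.0 - confidence:
        n += 1
    return n


for rate in (0.4, 0.6):
    print(rate, round(expected_attempts(rate), 2), attempts_for_confidence(rate, 0.95))
```

Under these assumptions the average is roughly 1.7 to 2.5 attempts per working system, but planning for a 95% chance of success means budgeting 4 to 6 attempts.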
So far I could not identify clear error messages that let me tell how
to fix the problem.
Before going into details of the errors, I would like to ask whether people
more familiar with oVirt would consider this a somewhat normal success rate,
or whether it indicates we are doing something generally wrong and
should definitely spend a few more hours, or maybe days, on finding
the sources of the problems.
More info about the system and errors:
* oVirt 4.3.9 (because the prototype was made and verified with that
version; it would also be interesting to know whether upgrading is
strongly recommended for a more stable installation/deployment)
* The errors alternate between two cases. In one, the deployment process
seems unable to transfer the "LocalHostedEngine" VM to the
GlusterFS storage to become a "HostedEngine". In the other, the engine
is already up and running but never really connects to the oVirt
system, restarts continuously, and also shows XFS filesystem errors
in its dmesg output.
Any hints on our chances of getting this solved, or requests for more
information about the errors, are welcome. Thanks in advance.
Henning
3 years, 5 months
The Engine VM (/32) and this host (/32) will not be in the same IP subnet.
by notify.sina@gmail.com
Hi list,
'The Engine VM (10.200.30.5/32) and this host (10.200.30.3/32) will not be in the same IP subnet.'
This happens when I select a static IP address configuration for the engine.
I've been struggling with setting up ovirt-hosted-engine on CentOS 8 servers in the Equinix network, using hosted-engine --deploy.
Why does the setup conclude that the engine and the host it is being set up on will not be in the same subnet?
Can someone please help me understand what prerequisites I am missing?
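For illustration, a /32 prefix describes a network containing exactly one address, so any two distinct /32 addresses fall in different subnets by definition. A minimal sketch with Python's standard ipaddress module; the /24 prefix in the second call is a hypothetical value chosen for comparison, and this is not the installer's actual check.

```python
import ipaddress


def same_subnet(a, b):
    """True if two 'address/prefix' strings belong to the same IP network."""
    return ipaddress.ip_interface(a).network == ipaddress.ip_interface(b).network


# Addresses and prefixes exactly as reported in the error message.
print(same_subnet("10.200.30.5/32", "10.200.30.3/32"))  # False

# Same addresses with a hypothetical /24 prefix.
print(same_subnet("10.200.30.5/24", "10.200.30.3/24"))  # True
```

If the host interface itself is configured with a /32 prefix, that may be where the prefix in the message comes from.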
3 years, 5 months
Issue upgrading VDSM in 4.4.9
by ling@aliko.com
Hello, I am trying to perform an oVirt host upgrade and ran into this issue; I am looking for advice.
If I try to run dnf update vdsm directly:
Problem: cannot install the best update candidate for package vdsm-4.40.80.5-1.el8.x86_64
- nothing provides libvirt-daemon-kvm >= 7.6.0-2 needed by vdsm-4.40.90.3-1.el8.x86_64
(try to add '--skip-broken' to skip uninstallable packages or '--nobest' to use not only best candidate packages)
After updating ovirt-release to 4.4.9, this is what I see available for libvirt-daemon:
[root@vmserver02 yum.repos.d]# dnf list libvirt-daemon-kvm
Last metadata expiration check: 2:09:25 ago on Wed 20 Oct 2021 01:28:56 PM PDT.
Installed Packages
libvirt-daemon-kvm.x86_64 7.0.0-14.1.el8 @ovirt-4.4-advanced-virtualization
And the advanced-virtualization repo points to this:
[ovirt-4.4-advanced-virtualization]
name=Advanced Virtualization packages for $basearch
mirrorlist=http://mirrorlist.centos.org/?arch=$basearch&release=8&repo=vi...
enabled=1
gpgcheck=1
gpgkey=https://www.centos.org/keys/RPM-GPG-KEY-CentOS-SIG-Virtualization
module_hotfixes=1
I am trying this on CentOS 8 and also on RHEL 8 machines. Both give the same errors.
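The dnf error quoted earlier comes down to a version comparison: vdsm-4.40.90.3 needs libvirt-daemon-kvm >= 7.6.0-2, while the only version listed is 7.0.0-14.1.el8. A deliberately simplified sketch of that comparison follows. It looks only at the dotted upstream version and ignores epoch and release, so it is not RPM's real comparison algorithm; the function names are illustrative.

```python
def upstream_version(evr):
    """Return the upstream version of a 'version-release' string as a tuple of ints."""
    version = evr.split("-", 1)[0]
    return tuple(int(part) for part in version.split("."))


def satisfies_minimum(available, required):
    """Simplified check: compares upstream versions only, not epoch or release."""
    return upstream_version(available) >= upstream_version(required)


# Values taken from the dnf output in this post.
print(satisfies_minimum("7.0.0-14.1.el8", "7.6.0-2"))  # False
```

So the update can only resolve once an enabled repository offers a libvirt-daemon-kvm of at least 7.6.0.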
Thanks.
3 years, 5 months
hosted-engine deploy - Network error during communication with the Host. (NFS)
by Matyi Szabolcs
Hi,
I get the following error when running hosted-engine --deploy:
[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Fetch cluster facts]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Fetch Datacenter facts]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Fetch Datacenter ID]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Fetch Datacenter name]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Add NFS storage domain]
[ ERROR ] ovirtsdk4.Error: Fault reason is "Operation Failed". Fault detail is "Network error during communication with the Host.". HTTP response code is 400.
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "Fault reason is \"Operation Failed\". Fault detail is \"Network error during communication with the Host.\". HTTP response code is 400."}
Please specify the storage you would like to use (glusterfs, iscsi, fc, nfs)[nfs]:
The NFS share itself is fine: I can mount it from a VM and also from localhost.
Thanks!
3 years, 5 months
Gluster Install Fail again :(
by admin@foundryserver.com
I have been working on getting this up and running for about a week now and I am totally frustrated. I am not even sure where to begin. Here is the error I get when it fails:
TASK [gluster.features/roles/gluster_hci : Create the GlusterFS volumes] *******
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: NoneType: None
failed: [storage1.private.net] (item={'volname': 'engine', 'brick': '/gluster_bricks/engine/engine', 'arbiter': 0}) => {"ansible_loop_var": "item", "changed": false, "item": {"arbiter": 0, "brick": "/gluster_bricks/engine/engine", "volname": "engine"}, "msg": "error running gluster (/usr/sbin/gluster --mode=script volume create engine replica 3 transport tcp storage1.private.net:/gluster_bricks/engine/engine storage2.private.net:/gluster_bricks/engine/engine storage3.private.net:/gluster_bricks/engine/engine force) command (rc=1): volume create: engine: failed: Staging failed on storage3.private.net. Error: Host storage1.private.net not connected\nStaging failed on storage2.private.net. Error: Host storage1.private.net not connected\n"}
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: NoneType: None
failed: [storage1.private.net] (item={'volname': 'data', 'brick': '/gluster_bricks/data/data', 'arbiter': 0}) => {"ansible_loop_var": "item", "changed": false, "item": {"arbiter": 0, "brick": "/gluster_bricks/data/data", "volname": "data"}, "msg": "error running gluster (/usr/sbin/gluster --mode=script volume create data replica 3 transport tcp storage1.private.net:/gluster_bricks/data/data storage2.private.net:/gluster_bricks/data/data storage3.private.net:/gluster_bricks/data/data force) command (rc=1): volume create: data: failed: Staging failed on storage2.private.net. Error: Host storage1.private.net not connected\nStaging failed on storage3.private.net. Error: Host storage1.private.net not connected\n"}
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: NoneType: None
failed: [storage1.private.net] (item={'volname': 'vmstore', 'brick': '/gluster_bricks/vmstore/vmstore', 'arbiter': 0}) => {"ansible_loop_var": "item", "changed": false, "item": {"arbiter": 0, "brick": "/gluster_bricks/vmstore/vmstore", "volname": "vmstore"}, "msg": "error running gluster (/usr/sbin/gluster --mode=script volume create vmstore replica 3 transport tcp storage1.private.net:/gluster_bricks/vmstore/vmstore storage2.private.net:/gluster_bricks/vmstore/vmstore storage3.private.net:/gluster_bricks/vmstore/vmstore force) command (rc=1): volume create: vmstore: failed: Staging failed on storage3.private.net. Error: Host storage1.private.net not connected\nStaging failed on storage2.private.net. Error: Host storage1.private.net is not in 'Peer in Cluster' state\n"}
Here are the facts:
Using oVirt 4.4.9.
Using oVirt Node as the OS.
Partition for gluster: /dev/vda4, > 4T of unformatted space.
I am able to ssh into each host on private.net; the known hosts and FQDN checks pass fine.
On the volume page:
all default settings.
On the bricks page:
JBOD / Blacklist true / storage host storage1.private.net / default lvm except the device is /dev/sda4
I really need to get this set up. The first failure was the filter error, so I edited /etc/lvm/lvm.conf to comment out the filter line. Then, without doing a cleanup, I reran the deployment and got the above error.
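Every volume create attempt above fails with peers reporting "Host storage1.private.net not connected", so one thing that may be worth checking before rerunning the deployment is whether the hosts can reach each other's glusterd. A minimal sketch, assuming glusterd listens on its default management port 24007/tcp and that the check is run from each of the three hosts in turn. The helper names are made up for illustration, and the code only tests TCP reachability, not gluster peer state.

```python
import socket

GLUSTERD_PORT = 24007  # assumption: default glusterd management port


def port_open(host, port=GLUSTERD_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def unreachable_peers(hosts, port=GLUSTERD_PORT):
    """Return the hosts whose glusterd port could not be reached from here."""
    return [host for host in hosts if not port_open(host, port)]
```

Calling unreachable_peers(["storage1.private.net", "storage2.private.net", "storage3.private.net"]) on each host should return an empty list if the management port is reachable everywhere. A non-empty result would point at firewall, routing or name resolution on the storage network rather than at the deployment itself.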
Thanks in advance
Brad
3 years, 5 months
Q: oVirt guest agent + spice-vdagent on Debian 11 Bullseye
by Andrei Verovski
Hi,
Has anyone compiled these deb packages for Debian 11 Bullseye?
oVirt guest agent + spice-vdagent
Packages from Buster can’t be installed on Bullseye because of broken libnl dependencies.
Thanks in advance.
Andrei
3 years, 5 months
Engine VM FQDN will not validate.
by admin@foundryserver.com
Hello,
I am trying to install the hosted engine using the wizard. I am NOT using the hyperconverged setup. When I add the FQDN vmengine1.domain.com, I get the error "localhost is not a valid address". When I add the host FQDN in the advanced area it validates, and if I put vmengine1.domain.com in the host section it validates.
Facts:
- I have the domain in dns, confirmed by dig to resolve to the public IP of the host.
- I have the domain in hosts file confirmed by using ping and dig
- I have generated an SSH key for the root user on the host and I have added it to the known hosts file.
- I have logged in via ssh using every form of the address, so the known hosts file has been updated:
localhost ecdsa-sha2-nistp256 AAAA
host1 ecdsa-sha2-nistp256 AAAA
host1.localhost ecdsa-sha2-nistp256 AAAA
vmengine.localhost ecdsa-sha2-nistp256 AAAA
host1.domain.com ecdsa-sha2-nistp256 AAAA
vmengine1.domain.com ecdsa-sha2-nistp256 AAAA
What I have read suggests that the error doesn't mean what it says, but rather that ssh is not able to log in to the host. I have been able to log in to the host with ssh manually from the CLI. Any help would be greatly appreciated.
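Since the message mentions localhost, one quick sanity check is to see what each name resolves to from the host itself and whether any result is a loopback address. This is only a sketch using the Python standard library; it does not reproduce the wizard's validation, and the function names are illustrative.

```python
import ipaddress
import socket


def resolve_all(name):
    """Return the sorted set of IP addresses that `name` resolves to on this machine."""
    infos = socket.getaddrinfo(name, None)
    return sorted({info[4][0] for info in infos})


def loopback_addresses(addresses):
    """Return the addresses that are loopback (127.0.0.0/8 or ::1)."""
    result = []
    for addr in addresses:
        # Strip an IPv6 scope id such as '%eth0' before parsing.
        if ipaddress.ip_address(addr.split("%")[0]).is_loopback:
            result.append(addr)
    return result
```

For example, loopback_addresses(resolve_all("vmengine1.domain.com")) returning anything other than an empty list would mean the name maps to the local machine, for instance through an /etc/hosts entry.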
thanks
Brad
3 years, 5 months
The service NetworkManager-wait-online fails to start
by amazirs@hotmail.com
Hello,
I have installed a new oVirt node.
The service NetworkManager-wait-online fails to start:
10:21 Failed to start Network Manager Wait Online. systemd
10:21 NetworkManager-wait-online.service: Failed with result 'exit-code'. systemd
10:21 NetworkManager-wait-online.service: Main process exited, code=exited, status=1/FAILURE systemd
10:20 Starting Network Manager Wait Online... systemd
When I try:
sh /usr/lib/systemd/system/NetworkManager-wait-online.service :
/usr/lib/systemd/system/NetworkManager-wait-online.service: line 1: [Unit]: comm
/usr/lib/systemd/system/NetworkManager-wait-online.service: line 2: Manager: com
/usr/lib/systemd/system/NetworkManager-wait-online.service: line 3: syntax error
/usr/lib/systemd/system/NetworkManager-wait-online.service: line 3: `Documentati
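These sh errors are what one would expect in any case: a systemd unit file is an INI-style configuration read by systemd, not a shell script, so running it with sh says nothing about why the service failed. The sketch below parses an illustrative unit text with Python's configparser to show the structure. The sample content is an assumption modelled on the error lines above, not a copy of the real file.

```python
import configparser

# Illustrative unit text (assumption), not the actual file from the node.
SAMPLE_UNIT = """\
[Unit]
Description=Network Manager Wait Online
Documentation=man:nm-online(1)

[Service]
Type=oneshot
ExecStart=/usr/bin/nm-online -s -q
"""


def parse_unit(text):
    """Parse INI-style unit text into {section: {key: value}}."""
    parser = configparser.ConfigParser(
        delimiters=("=",), interpolation=None, strict=False
    )
    parser.optionxform = str  # unit file keys are case-sensitive
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


unit = parse_unit(SAMPLE_UNIT)
print(unit["Service"]["ExecStart"])
```

The actual failure reason is more likely to show up in the output of systemctl status NetworkManager-wait-online.service or journalctl -u NetworkManager-wait-online.service.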
I'm desperate to find a solution!
thanks
3 years, 5 months
Snapshot and disk size allocation
by jorgevisentini@gmail.com
Hello everyone.
I would like to know how disk size and snapshot allocation work, because every time I create a new snapshot, the VM's disk size increases by 1 GB, and when I remove the snapshot, that space is not returned to the storage domain.
I'm using oVirt 4.3.10.
How do I reprovision the VM disk?
Thank you all.
3 years, 5 months