4.5.4 with Ceph only storage
by Maurice Burrows
Hey ... Long story short ... I have an existing Red Hat Virt / Gluster hyperconverged solution that I am moving away from.
I have an existing Ceph cluster that I primarily use for OpenStack and a small requirement for S3 via RGW.
I'm planning to build a new oVirt 4.5.4 cluster on RHEL 9 using Ceph for all storage requirements. I've read many online articles on oVirt and Ceph, and they all seem to use the Ceph iSCSI gateway, which is now in maintenance mode, so I'm not really keen to commit to iSCSI.
So my question is: is there any reason I cannot use CephFS for both the hosted-engine storage domain and a data storage domain?
I'm currently running Ceph Pacific FWIW.
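For reference, what I had in mind (monitor name, client name and secret file below are just placeholders) is mounting CephFS the same way a POSIX-compliant FS domain is mounted, something along these lines:
# quick sanity check that a host can mount the filesystem
mount -t ceph mon1.example.com:6789:/ /mnt/cephfs-test -o name=ovirt,secretfile=/etc/ceph/ovirt.secret
# and in the Administration Portal, the equivalent data domain would be added as:
#   Storage Type:  POSIX compliant FS
#   Path:          mon1.example.com:6789:/
#   VFS Type:      ceph
#   Mount Options: name=ovirt,secretfile=/etc/ceph/ovirt.secret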
Cheers
8 months, 2 weeks
Changing disk QoS causes segfault with IO-Threads enabled (oVirt 4.3.0.4-1.el7)
by jloh@squiz.net
We recently upgraded to 4.3.0 and have found that changing disk QoS settings on a VM whilst IO-Threads is enabled causes QEMU to segfault and the VM to reboot. We've been able to replicate this across several VMs. VMs with IO-Threads disabled/turned off do not segfault when changing the QoS.
Mar 1 11:49:06 srvXX kernel: IO iothread1[30468]: segfault at fffffffffffffff8 ip 0000557649f2bd24 sp 00007f80de832f60 error 5 in qemu-kvm[5576498dd000+a03000]
Mar 1 11:49:06 srvXX abrt-hook-ccpp: invalid number 'iothread1'
Mar 1 11:49:11 srvXX libvirtd: 2019-03-01 00:49:11.116+0000: 13365: error : qemuMonitorIORead:609 : Unable to read from monitor: Connection reset by peer
Happy to supply some more logs to someone if they'll help but just wondering whether anyone else has experienced this or knows of a current fix other than turning io-threads off.
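In case it's useful for anyone trying to reproduce this, the affected VMs can be spotted with something like the following (the VM name is just an example):
# read-only check of the domain XML for the io-threads setting
virsh -r dumpxml my-vm | grep -i iothread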
Cheers.
10 months, 3 weeks
Deploy oVirt Engine fail behind proxy
by Matteo Bonardi
Hi,
I am trying to deploy the oVirt engine following the self-hosted engine installation procedure in the documentation.
The deployment servers are behind a proxy, and I set it in the environment and in yum.conf before running the deploy.
The deploy fails because the oVirt engine VM cannot resolve the AppStream repository URL:
[ INFO ] TASK [ovirt.engine-setup : Install oVirt Engine package]
[ ERROR ] fatal: [localhost -> ovirt-manager.mydomain]: FAILED! => {"changed": false, "msg": "Failed to download metadata for repo 'AppStream': Cannot prepare internal mirrorlist: Curl error (6): Couldn't resolve host name for http://mirrorlist.centos.org/?release=8&arch=x86_64&repo=AppStream&infra=... [Could not resolve host: mirrorlist.centos.org]", "rc": 1, "results": []}
[ ERROR ] Failed to execute stage 'Closing up': Failed executing ansible-playbook
[ INFO ] Stage: Clean up
[ INFO ] Cleaning temporary resources
[ INFO ] TASK [ovirt.hosted_engine_setup : Execute just a specific set of steps]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Force facts gathering]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Fetch logs from the engine VM]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Set destination directory path]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Create destination directory]
[ INFO ] changed: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : include_tasks]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Find the local appliance image]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Set local_vm_disk_path]
[ INFO ] skipping: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Give the vm time to flush dirty buffers]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Copy engine logs]
[ INFO ] TASK [ovirt.hosted_engine_setup : include_tasks]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Remove local vm dir]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Remove temporary entry in /etc/hosts for the local VM]
[ INFO ] changed: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Clean local storage pools]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Destroy local storage-pool {{ he_local_vm_dir | basename }}]
[ INFO ] TASK [ovirt.hosted_engine_setup : Undefine local storage-pool {{ he_local_vm_dir | basename }}]
[ INFO ] TASK [ovirt.hosted_engine_setup : Destroy local storage-pool {{ local_vm_disk_path.split('/')[5] }}]
[ INFO ] TASK [ovirt.hosted_engine_setup : Undefine local storage-pool {{ local_vm_disk_path.split('/')[5] }}]
[ INFO ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20201109165237.conf'
[ INFO ] Stage: Pre-termination
[ INFO ] Stage: Termination
[ ERROR ] Hosted Engine deployment failed: please check the logs for the issue, fix accordingly or re-deploy from scratch.
Log file is located at /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20201109164244-b3e8sd.log
How can I set the proxy for the engine VM?
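The only workaround I can think of (an untested guess; the proxy host and port below are examples) is to set the proxy inside the engine VM itself once it is reachable:
# inside the engine VM (reachable from the host via the temporary /etc/hosts
# entry the installer adds), point dnf at the proxy
echo "proxy=http://proxy.mydomain:3128" >> /etc/dnf/dnf.conf
# and/or export it for everything else
cat >> /etc/environment <<'EOF'
http_proxy=http://proxy.mydomain:3128
https_proxy=http://proxy.mydomain:3128
EOF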
Ovirt version:
[root@myhost ~]# rpm -qa | grep ovirt-engine-appliance
ovirt-engine-appliance-4.4-20200916125954.1.el8.x86_64
[root@myhost ~]# rpm -qa | grep ovirt-hosted-engine-setup
ovirt-hosted-engine-setup-2.4.6-1.el8.noarch
OS version:
[root@myhost ~]# cat /etc/centos-release
CentOS Linux release 8.2.2004 (Core)
[root@myhost ~]# uname -a
Linux myhost.mydomain 4.18.0-193.28.1.el8_2.x86_64 #1 SMP Thu Oct 22 00:20:22 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Thanks for the help.
Regards,
Matteo
11 months, 3 weeks
The oVirt Counter
by Sandro Bonazzola
Hi, for those who remember the Linux Counter project: if you'd like others
to know you're using oVirt, and to share some details about your deployment,
here's a way to count yourself in:
https://ovirt.org/community/ovirt-counter.html
Enjoy!
--
Sandro Bonazzola
MANAGER, SOFTWARE ENGINEERING, EMEA R&D PERFORMANCE & SCALE
Red Hat EMEA <https://www.redhat.com/>
sbonazzo(a)redhat.com
<https://www.redhat.com/>
*Red Hat respects your work life balance. Therefore there is no need to
answer this email out of your office hours.*
1 year
Cannot restart ovirt after massive failure.
by Gilboa Davara
Hello all,
During the night, one of my (smaller) setups, a single-node self-hosted
engine (localhost NFS), crashed due to what looks like a massive disk
failure (software RAID6 with 10 drives + spare).
After a reboot, I let the RAID resync with a fresh drive and went on to
start oVirt.
However, no such luck.
Two issues:
1. ovirt-ha-broker fails due to broken hosted engine state (log attached).
2. ovirt-ha-agent fails due to network test (tcp) even though both
remote-host and DNS servers are active. (log attached).
Two questions:
1. Can I somehow force the agent to disable the network liveliness test?
2. Can I somehow force the broker to rebuild / fix the hosted engine state?
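For reference, this is what I was thinking of trying (the option names are from memory, so please correct me if they're wrong):
# 1. check / relax the network liveliness test via the shared config
hosted-engine --get-shared-config network_test --type=he_shared
hosted-engine --set-shared-config network_test none --type=he_shared
# 2. rebuild the hosted engine lockspace / state
hosted-engine --reinitialize-lockspace --force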
- Gilboa
1 year, 1 month
Please, Please Help - New oVirt Install/Deployment Failing - "Host is not up..."
by Matthew J Black
Hi Everyone,
Could someone please help me - I've been trying to do an install of oVirt for *weeks* (including false starts and self-inflicted wounds/errors) and it is still not working.
My setup:
- oVirt v4.5.3
- A brand new fresh vanilla install of RockyLinux 8.6 - all working AOK
- 2*NICs in a bond (802.3ad) with a couple of sub-Interfaces/VLANs - all working AOK
- All relevant IPv4 Address in DNS with Reverse Lookups - all working AOK
- All relevant IPv4 Address in "/etc/hosts" file - all working AOK
- IPv6 (using "method=auto" in the interface config file) enabled on the relevant sub-Interface/VLAN - I'm not using IPv6 on the network, only IPv4, but I'm trying to cover all the bases.
- All relevant Ports (as per the oVirt documentation) set up on the firewall
- ie firewall-cmd --add-service={libvirt-tls,ovirt-imageio,ovirt-vmconsole,vdsm} (one --add-service per service)
- All the relevant Repositories installed (ie RockyLinux BaseOS, AppStream, & PowerTools, and the EPEL, plus the ones from the oVirt documentation)
I have followed the oVirt documentation (including the special RHEL-instructions and RockyLinux-instructions) to the letter - no deviations, no special settings, exactly as they are written.
All the dnf installs, etc, went off without a hitch, including the "dnf install centos-release-ovirt45", "dnf install ovirt-engine-appliance", and "dnf install ovirt-hosted-engine-setup" - no errors anywhere.
Here is the results of a "dnf repolist":
- appstream Rocky Linux 8 - AppStream
- baseos Rocky Linux 8 - BaseOS
- centos-ceph-pacific CentOS-8-stream - Ceph Pacific
- centos-gluster10 CentOS-8-stream - Gluster 10
- centos-nfv-openvswitch CentOS-8 - NFV OpenvSwitch
- centos-opstools CentOS-OpsTools - collectd
- centos-ovirt45 CentOS Stream 8 - oVirt 4.5
- cs8-extras CentOS Stream 8 - Extras
- cs8-extras-common CentOS Stream 8 - Extras common packages
- epel Extra Packages for Enterprise Linux 8 - x86_64
- epel-modular Extra Packages for Enterprise Linux Modular 8 - x86_64
- ovirt-45-centos-stream-openstack-yoga CentOS Stream 8 - oVirt 4.5 - OpenStack Yoga Repository
- ovirt-45-upstream oVirt upstream for CentOS Stream 8 - oVirt 4.5
- powertools Rocky Linux 8 - PowerTools
So I kicked off the oVirt deployment with: "hosted-engine --deploy --4 --ansible-extra-vars=he_offline_deployment=true".
I used "--ansible-extra-vars=he_offline_deployment=true" because without that flag I was getting "DNF timout" issues (see my previous post `Local (Deployment) VM Can't Reach "centos-ceph-pacific" Repo`).
I answered the defaults to all of the questions the script asked, or entered the deployment-relevant answers where appropriate. In doing this I double-checked every answer before hitting <Enter>. Everything progressed smoothly until the deployment reached the "Wait for the host to be up" task... which then hung for more than 30 minutes before failing.
From the ovirt-hosted-engine-setup... log file:
- 2022-10-20 17:54:26,285+1100 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:113 fatal: [localhost]: FAILED! => {"changed": false, "msg": "Host is not up, please check logs, perhaps also on the engine machine"}
I checked the following log files and found all of the relevant ERROR lines, then checked several tens of preceding and succeeding lines trying to determine what was going wrong, but I could not determine anything.
- ovirt-hosted-engine-setup...
- ovirt-hosted-engine-setup-ansible-bootstrap_local_vm...
- ovirt-hosted-engine-setup-ansible-final_clean... - not really relevant, I believe
I can include the log files (or the relevant parts of the log files) if people want - but they are very large: several hundred kilobytes each.
I also googled "oVirt Host is not up" and found several entries, but after reading them all the most relevant seems to be a thread from this mailing list: `Install of RHV 4.4 failing - "Host is not up, please check logs, perhaps also on the engine machine"` - but this seems to be talking about an upgrade and I didn't glean anything useful from it - I could, of course, be wrong about that.
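In case it helps, these are the other places I assume are worth checking on the host (happy to be told I'm looking in the wrong spot):
# vdsm / supervdsm messages on the host itself
journalctl -u vdsmd -u supervdsmd --since "1 hour ago"
tail -n 200 /var/log/vdsm/vdsm.log
# is the local bootstrap engine VM actually running?
virsh -r list --all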
So my questions are:
- Where else should I be looking (ie other log files, etc, and possibly where to find them)?
- Does anyone have any idea why this isn't working?
- Does anyone have a work-around (including a completely manual process to get things working - I don't mind working in the CLI with virsh, etc)?
- What am I doing wrong?
Please, I'm really stumped with this, and I really do need help.
Cheers
Dulux-Oz
1 year, 1 month
how to renew expired ovirt node vdsm cert manually ?
by dhanaraj.ramesh@yahoo.com
below are the steps to renew the expired vdsm cert of ovirt node
# To check CERT expired
# openssl x509 -in /etc/pki/vdsm/certs/vdsmcert.pem -noout -dates
1. Backup vdsm folder
# cd /etc/pki
# mv vdsm vdsm.orig
# mkdir vdsm ; chown vdsm:kvm vdsm
# cd vdsm
# mkdir libvirt-vnc certs keys libvirt-spice libvirt-migrate
# chown vdsm:kvm libvirt-vnc certs keys libvirt-spice libvirt-migrate
2. Regenerate cert & keys
# vdsm-tool configure --module certificates
3. Copy the cert to destination location
chmod 440 /etc/pki/vdsm/keys/vdsmkey.pem
chown root /etc/pki/vdsm/certs/*pem
chmod 644 /etc/pki/vdsm/certs/*pem
cp /etc/pki/vdsm/certs/cacert.pem /etc/pki/vdsm/libvirt-spice/ca-cert.pem
cp /etc/pki/vdsm/keys/vdsmkey.pem /etc/pki/vdsm/libvirt-spice/server-key.pem
cp /etc/pki/vdsm/certs/vdsmcert.pem /etc/pki/vdsm/libvirt-spice/server-cert.pem
cp /etc/pki/vdsm/certs/cacert.pem /etc/pki/vdsm/libvirt-vnc/ca-cert.pem
cp /etc/pki/vdsm/keys/vdsmkey.pem /etc/pki/vdsm/libvirt-vnc/server-key.pem
cp /etc/pki/vdsm/certs/vdsmcert.pem /etc/pki/vdsm/libvirt-vnc/server-cert.pem
cp -p /etc/pki/vdsm/certs/cacert.pem /etc/pki/vdsm/libvirt-migrate/ca-cert.pem
cp -p /etc/pki/vdsm/keys/vdsmkey.pem /etc/pki/vdsm/libvirt-migrate/server-key.pem
cp -p /etc/pki/vdsm/certs/vdsmcert.pem /etc/pki/vdsm/libvirt-migrate/server-cert.pem
chown root:qemu /etc/pki/vdsm/libvirt-migrate/server-key.pem
cp -p /etc/pki/vdsm.orig/keys/libvirt_password /etc/pki/vdsm/keys/
mv /etc/pki/libvirt/clientcert.pem /etc/pki/libvirt/clientcert.pem.orig
mv /etc/pki/libvirt/private/clientkey.pem /etc/pki/libvirt/private/clientkey.pem.orig
mv /etc/pki/CA/cacert.pem /etc/pki/CA/cacert.pem.orig
cp -p /etc/pki/vdsm/certs/vdsmcert.pem /etc/pki/libvirt/clientcert.pem
cp -p /etc/pki/vdsm/keys/vdsmkey.pem /etc/pki/libvirt/private/clientkey.pem
cp -p /etc/pki/vdsm/certs/cacert.pem /etc/pki/CA/cacert.pem
4. Cross-check the backup folder /etc/pki/vdsm.orig vs /etc/pki/vdsm
# refer to /etc/pki/vdsm.orig/*/ and set the correct owner & group permission in /etc/pki/vdsm/*/
5. Restart services # Make sure both services are up
systemctl restart vdsmd libvirtd
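A quick sanity check afterwards (assuming both services come back cleanly) is to confirm the new dates and that vdsm answers locally:
# openssl x509 -in /etc/pki/vdsm/certs/vdsmcert.pem -noout -dates
# vdsm-client Host getCapabilities | head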
1 year, 2 months
Unable to install oVirt on RHEL7.5
by SS00514758@techmahindra.com
Hi All,
I am unable to install oVirt on RHEL7.5. To install it I am following the link below:
https://www.ovirt.org/documentation/install-guide/chap-Installing_oVirt.html
However, it is not working for me: a couple of dependencies are not getting installed, and because of this I am not able to run ovirt-engine. Below are the dependency packages that fail to install:
Error: Package: collectd-write_http-5.8.0-6.1.el7.x86_64 (@ovirt-4.2-centos-opstools)
Requires: collectd(x86-64) = 5.8.0-6.1.el7
Removing: collectd-5.8.0-6.1.el7.x86_64 (@ovirt-4.2-centos-opstools)
collectd(x86-64) = 5.8.0-6.1.el7
Updated By: collectd-5.8.1-1.el7.x86_64 (epel)
collectd(x86-64) = 5.8.1-1.el7
Available: collectd-5.7.2-1.el7.x86_64 (ovirt-4.2-centos-opstools)
collectd(x86-64) = 5.7.2-1.el7
Available: collectd-5.7.2-3.el7.x86_64 (ovirt-4.2-centos-opstools)
collectd(x86-64) = 5.7.2-3.el7
Available: collectd-5.8.0-2.el7.x86_64 (ovirt-4.2-centos-opstools)
collectd(x86-64) = 5.8.0-2.el7
Available: collectd-5.8.0-3.el7.x86_64 (ovirt-4.2-centos-opstools)
collectd(x86-64) = 5.8.0-3.el7
Available: collectd-5.8.0-5.el7.x86_64 (ovirt-4.2-centos-opstools)
collectd(x86-64) = 5.8.0-5.el7
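From what I can tell, EPEL's newer collectd (5.8.1-1.el7) is shadowing the collectd 5.8.0-6.1.el7 from the opstools repo that collectd-write_http needs. One workaround I assume should help (untested) is to stop EPEL from providing collectd at all:
# add this line to the [epel] section of /etc/yum.repos.d/epel.repo
exclude=collectd*
# then retry
yum clean all
yum install ovirt-engine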
Please help me to install this.
I am looking forward to resolving this issue.
Regards
Sumit Sahay
1 year, 2 months
Grafana - Origin Not Allowed
by Maton, Brett
oVirt 4.5.0.8-1.el8
I tried to connect to Grafana via the monitoring portal link from the dashboard,
and all panels fail to display any data, with varying error messages
that all include 'Origin Not Allowed'.
I navigated to Data Sources and ran a test on the PostgreSQL connection
(localhost), which threw the same 'Origin Not Allowed' error message.
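For context, I'm wondering whether this is Grafana's CSRF/origin check when it is reached through the engine's Apache proxy, and whether something along these lines in /etc/grafana/grafana.ini would be the right direction (the key name and FQDN are my guesses):
# under the [security] section of /etc/grafana/grafana.ini
csrf_trusted_origins = ovirt-engine.example.com
# then
systemctl restart grafana-server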
Any suggestions?
1 year, 2 months
Multiple hosts stuck in Connecting state waiting for storage pool to go up.
by ivan.lezhnjov.iv@gmail.com
Hi!
We have a problem with multiple hosts stuck in Connecting state, which I hoped somebody here could help us wrap our heads around.
All hosts, except one, seem to have very similar symptoms but I'll focus on one host that represents the rest.
So, the host is stuck in Connecting state and this is what we see in the oVirt log files.
/var/log/ovirt-engine/engine.log:
2023-04-20 09:51:53,021+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesAsyncVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-37) [] Command 'GetCapabilitiesAsyncVDSCommand(HostName = ABC010-176-XYZ, VdsIdAndVdsVDSCommandParametersBase:{hostId='2c458562-3d4d-4408-afc9-9a9484984a91', vds='Host[ABC010-176-XYZ,2c458562-3d4d-4408-afc9-9a9484984a91]'})' execution failed: org.ovirt.vdsm.jsonrpc.client.ClientConnectionException: SSL session is invalid
2023-04-20 09:55:16,556+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-67) [] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM ABC010-176-XYZ command Get Host Capabilities failed: Message timeout which can be caused by communication issues
/var/log/vdsm/vdsm.log:
2023-04-20 17:48:51,977+0300 INFO (vmrecovery) [vdsm.api] START getConnectedStoragePoolsList() from=internal, task_id=ebce7c8c-6ded-454e-9aee-86edf72764ef (api:31)
2023-04-20 17:48:51,977+0300 INFO (vmrecovery) [vdsm.api] FINISH getConnectedStoragePoolsList return={'poollist': []} from=internal, task_id=ebce7c8c-6ded-454e-9aee-86edf72764ef (api:37)
2023-04-20 17:48:51,978+0300 INFO (vmrecovery) [vds] recovery: waiting for storage pool to go up (clientIF:723)
Both engine.log and vdsm.log are flooded with these messages. They are repeated at regular intervals ad infinitum. This is one common symptom shared by multiple hosts in our deployment: they all have these message loops in engine.log and vdsm.log.
Running vdsm-client Host getConnectedStoragePools also returns an empty list, represented by [], on all hosts (but interestingly there is one host that showed a storage pool UUID and yet was still stuck in Connecting state).
This particular host (ABC010-176-XYZ) is connected to 3 CEPH iSCSI Storage Domains and lsblk shows 3 block devices with matching UUIDs in their device components. So, the storage seems to be connected but the Storage Pool is not? How is that even possible?
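For what it's worth, the next checks we plan to run on that host (mostly guesses, and the second vdsm-client verb is from memory) are along these lines:
# does vdsm itself still see the pool / domains?
vdsm-client Host getConnectedStoragePools
vdsm-client Host getStorageDomains
# the engine reports "SSL session is invalid", so rule out an expired vdsm certificate
openssl x509 -in /etc/pki/vdsm/certs/vdsmcert.pem -noout -dates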
Now, what's even more weird is that we tried rebooting the host (via Administrator Portal) and it didn't help. We even tried removing and re-adding the host in Administrator Portal but to no avail.
Additionally, the host refused to go into Maintenance mode so we had to enforce it by manually updating Engine DB.
We also tried reinstalling the host via the Administrator Portal and ran into another weird problem, which I'm not sure is related or deserves a dedicated discussion thread, but basically the underlying Ansible playbook exited with the following error message:
"stdout" : "fatal: [10.10.10.176]: UNREACHABLE! => {\"changed\": false, \"msg\": \"Data could not be sent to remote host \\\"10.10.10.176\\\". Make sure this host can be reached over ssh: \", \"unreachable\": true}",
Counterintuitively, just before running Reinstall via Administrator Portal we had been able to reboot the same host (which as you know oVirt does via Ansible as well). So, no changes on the host in between just different Ansible playbooks. To confirm that we actually had access to the host over ssh we successfully ran ssh -p $PORT root(a)10.10.10.176 -i /etc/pki/ovirt-engine/keys/engine_id_rsa and it worked.
That made us scratch our heads for a while, but what seems to have fixed Ansible's ssh access problem was a manual stop of all VDSM-related systemd services on the host. It was just a wild guess, but as soon as we stopped all VDSM services Ansible stopped complaining about not being able to reach the target host and successfully did its job.
I'm sure you'd like to see more logs but I'm not certain what exactly is relevant. There are a ton of logs as this deployment is comprised of nearly 80 hosts. So, I guess it's best if you just request to see specific logs, messages or configuration details and I'll cherry-pick what's relevant.
We don't really understand what's going on and would appreciate any help. We have tried just about everything we could think of to resolve this issue and are running out of ideas for what to do next.
If you have any questions just ask and I'll do my best to answer them.
1 year, 4 months