May 2020 - Users - oVirt List Archives

oVirt 4.4.0 Release is now generally available
by Sandro Bonazzola 12 Aug '20

oVirt 4.4.0 Release is now generally available

The oVirt Project is excited to announce the general availability of the oVirt 4.4.0 Release, as of May 20th, 2020.

This release unleashes an altogether more powerful and flexible open source virtualization solution that encompasses hundreds of individual changes and a wide range of enhancements across the engine, storage, network, user interface, and analytics, as compared to oVirt 4.3.

Important notes before you install / upgrade

Some of the features included in the oVirt 4.4.0 release require content that will be available in CentOS Linux 8.2 but cannot be tested on RHEL 8.2 yet due to some incompatibility in the openvswitch package that is shipped in CentOS Virt SIG, which requires rebuilding openvswitch on top of CentOS 8.2. The cluster switch type OVS is not implemented for CentOS 8 hosts.

Please note that oVirt 4.4 only supports clusters and datacenters with compatibility version 4.2 and above. If clusters or datacenters are running with an older compatibility version, you need to upgrade them to at least 4.2 (4.3 is recommended).

Please note that in RHEL 8 / CentOS 8 several devices that worked on EL7 are no longer supported. For example, the megaraid_sas driver is removed.
If you use Enterprise Linux 8 hosts you can try to provide the necessary drivers for the deprecated hardware using the DUD method (see the users mailing list thread on this at https://lists.ovirt.org/archives/list/users@ovirt.org/thread/NDSVUZSESOXEFJ… ).

Installation instructions

For the engine: either use the oVirt appliance or install CentOS Linux 8 minimal by following these steps:
- Install the CentOS Linux 8 image from http://centos.mirror.garr.it/centos/8.1.1911/isos/x86_64/CentOS-8.1.1911-x8…
- dnf install https://resources.ovirt.org/pub/yum-repo/ovirt-release44.rpm
- dnf update (reboot if needed)
- dnf module enable -y javapackages-tools pki-deps postgresql:12
- dnf install ovirt-engine
- engine-setup

For the nodes: either use oVirt Node ISO or:
- Install CentOS Linux 8 from http://centos.mirror.garr.it/centos/8.1.1911/isos/x86_64/CentOS-8.1.1911-x8…, selecting the minimal installation.
- dnf install https://resources.ovirt.org/pub/yum-repo/ovirt-release44.rpm
- dnf update (reboot if needed)
- Attach the host to the engine and let it be deployed.

Update instructions

Update from oVirt 4.4 Release Candidate

On the engine side and on CentOS hosts, you’ll need to switch from ovirt44-pre to ovirt44 repositories. In order to do so, you need to:
1. dnf remove ovirt-release44-pre
2. rm -f /etc/yum.repos.d/ovirt-4.4-pre-dependencies.repo
3. rm -f /etc/yum.repos.d/ovirt-4.4-pre.repo
4. dnf install https://resources.ovirt.org/pub/yum-repo/ovirt-release44.rpm
5. dnf update

On the engine side you’ll need to run engine-setup only if you were not already on the latest release candidate.

On oVirt Node, you’ll need to upgrade with:
1. Move node to maintenance
2. dnf install https://resources.ovirt.org/pub/ovirt-4.4/rpm/el8/noarch/ovirt-node-ng-imag…
3. Reboot
4. Activate the host

Update from oVirt 4.3

oVirt 4.4 is available only for CentOS 8. In-place upgrades from previous installations, based on CentOS 7, are not possible.
For the engine, take a backup and restore it into a new engine. Nodes will need to be reinstalled. A 4.4 engine can still manage existing 4.3 hosts, but you can’t add new ones.

For a standalone engine, please refer to the upgrade procedure at https://ovirt.org/documentation/upgrade_guide/#Upgrading_from_4-3

If needed, run ovirt-engine-rename (see the engine rename tool documentation at https://www.ovirt.org/documentation/admin-guide/chap-Utilities.html ).

When upgrading hosts, you need to upgrade one host at a time:
1. Turn host to maintenance. Virtual machines on that host should migrate automatically to a different host.
2. Remove it from the engine
3. Re-install it with el8 or oVirt Node as per installation instructions
4. Re-add the host to the engine

Please note that you may see some issues live migrating VMs from el7 to el8. If you hit such a case, please turn off the VM on the el7 host and start it on the new el8 host so that you can move the next el7 host to maintenance.

What’s new in oVirt 4.4.0 Release?
- Hypervisors based on CentOS Linux 8 (rebuilt from award winning RHEL8), for both oVirt Node and standalone CentOS Linux hosts.
- Easier network management and configuration flexibility with NetworkManager.
- VMs based on a more modern Q35 chipset with legacy SeaBIOS and UEFI firmware.
- Support for direct passthrough of local host disks to VMs.
- Live migration improvements for High Performance guests.
- New Windows guest tools installer based on WiX framework now moved to VirtioWin project.
- Dropped support for cluster level prior to 4.2.
- Dropped API/SDK v3 support deprecated in past versions.
- 4K block disk support only for file-based storage. iSCSI/FC storage does not support 4K disks yet.
- You can export a VM to a data domain.
- You can edit floating disks.
- Ansible Runner (ansible-runner) is integrated within the engine, enabling more detailed monitoring of playbooks executed from the engine.
- Adding and reinstalling hosts is now completely based on Ansible, replacing ovirt-host-deploy, which is not used anymore.
- The OpenStack Neutron Agent cannot be configured by oVirt anymore; it should be configured by TripleO instead.

This release is available now on x86_64 architecture for:
* Red Hat Enterprise Linux 8.1
* CentOS Linux (or similar) 8.1

This release supports Hypervisor Hosts on x86_64 and ppc64le architectures for:
* Red Hat Enterprise Linux 8.1
* CentOS Linux (or similar) 8.1
* oVirt Node 4.4 based on CentOS Linux 8.1 (available for x86_64 only)

See the release notes [1] for installation instructions and a list of new features and bugs fixed.

If you manage more than one oVirt instance, OKD or RDO we also recommend to try ManageIQ <http://manageiq.org/>. In such a case, please be sure to take the qc2 image and not the ova image.

Notes:
- oVirt Appliance is already available for CentOS Linux 8
- oVirt Node NG is already available for CentOS Linux 8

Additional Resources:
* Read more about the oVirt 4.4.0 release highlights: http://www.ovirt.org/release/4.4.0/
* Get more oVirt project updates on Twitter: https://twitter.com/ovirt
* Check out the latest project news on the oVirt blog: http://www.ovirt.org/blog/

[1] http://www.ovirt.org/release/4.4.0/
[2] http://resources.ovirt.org/pub/ovirt-4.4/iso/

--
Sandro Bonazzola
MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV
Red Hat EMEA <https://www.redhat.com/>
sbonazzo(a)redhat.com
*Red Hat respects your work life balance. Therefore there is no need to answer this email out of your office hours.*
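The compatibility rule in the announcement (clusters and datacenters must be at compatibility version 4.2 or above, with 4.3 recommended) can be sketched as a small pre-upgrade check. This is only an illustration: the function names and the input dictionary are hypothetical, and reading the real levels from the engine is not shown.

```python
from typing import Dict, List, Tuple

MINIMUM_LEVEL = (4, 2)      # lowest compatibility version oVirt 4.4 supports, per the announcement
RECOMMENDED_LEVEL = (4, 3)  # level the announcement recommends reaching first


def parse_level(text: str) -> Tuple[int, int]:
    """Turn a string such as '4.3' into a comparable (major, minor) tuple."""
    major, minor = text.strip().split(".")
    return int(major), int(minor)


def below_level(levels: Dict[str, str], threshold: Tuple[int, int]) -> List[str]:
    """Return the sorted names whose compatibility level is below the threshold.

    `levels` is a hypothetical mapping of cluster or datacenter name to its
    compatibility version string.
    """
    return sorted(name for name, level in levels.items()
                  if parse_level(level) < threshold)


if __name__ == "__main__":
    example = {"Default": "4.3", "Old": "4.1", "Lab": "4.2"}
    print("must upgrade first:", below_level(example, MINIMUM_LEVEL))
    print("below recommended:", below_level(example, RECOMMENDED_LEVEL))
```

Comparing tuples rather than strings avoids mistakes such as treating "4.10" as lower than "4.2".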

4.4 HCI Install Failure - Missing /etc/pki/CA/cacert.pem
by Stephen Panicho 09 Aug '20

Hi all!

I'm using Cockpit to perform an HCI install, and it fails at the hosted engine deploy. Libvirtd can't restart because of a missing /etc/pki/CA/cacert.pem file.

The log (tasks seemingly from /usr/share/ansible/roles/ovirt.hosted_engine_setup/tasks/initial_clean.yml):

[ INFO ] TASK [ovirt.hosted_engine_setup : Stop libvirt service]
[ INFO ] changed: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Drop vdsm config statements]
[ INFO ] changed: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Restore initial abrt config files]
[ INFO ] changed: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Restart abrtd service]
[ INFO ] changed: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Drop libvirt sasl2 configuration by vdsm]
[ INFO ] changed: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Stop and disable services]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Restore initial libvirt default network configuration]
[ INFO ] changed: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Start libvirt]
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "Unable to start service libvirtd: Job for libvirtd.service failed because the control process exited with error code.\nSee \"systemctl status libvirtd.service\" and \"journalctl -xe\" for details.\n"}

journalctl -u libvirtd:

May 22 04:33:25 node1 libvirtd[26392]: libvirt version: 5.6.0, package: 10.el8 (CBS <cbs(a)centos.org>, 2020-02-27-01:09:46, )
May 22 04:33:25 node1 libvirtd[26392]: hostname: node1
May 22 04:33:25 node1 libvirtd[26392]: Cannot read CA certificate '/etc/pki/CA/cacert.pem': No such file or directory
May 22 04:33:25 node1 systemd[1]: libvirtd.service: Main process exited, code=exited, status=6/NOTCONFIGURED
May 22 04:33:25 node1 systemd[1]: libvirtd.service: Failed with result 'exit-code'.
May 22 04:33:25 node1 systemd[1]: Failed to start Virtualization daemon.

From a fresh CentOS 8.1 minimal install, I've installed the following:
- The 4.4 repo
- cockpit
- ovirt-cockpit-dashboard
- vdsm-gluster (providing glusterfs-server and allowing the Gluster Wizard to complete)
- gluster-ansible-roles (only on the bootstrap host)

I'm not exactly sure what that initial bit of the playbook does. Comparing the bootstrap node with another that has yet to be touched, /etc/libvirt/libvirtd.conf and /etc/sysconfig/libvirtd are identical on both hosts. Yet the bootstrap host can no longer start libvirtd while the other host can. Neither host has the /etc/pki/CA/cacert.pem file.

Please let me know if I can provide any more information.

Thanks!
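
When comparing hosts as described above, it can help to pull the missing certificate paths out of the journal mechanically instead of reading it by eye. The following is a minimal sketch, assuming the output of `journalctl -u libvirtd` has already been captured into a string; the function name is hypothetical and the pattern is taken only from the message format quoted in this post.

```python
import re
from typing import List

# Pattern based on the libvirtd message quoted in the post; other libvirt
# versions may word the error differently.
_CA_ERROR = re.compile(
    r"Cannot read CA certificate '([^']+)': No such file or directory"
)


def missing_ca_certificates(journal_text: str) -> List[str]:
    """Return each distinct CA certificate path libvirtd reported as missing,
    in order of first appearance."""
    paths: List[str] = []
    for match in _CA_ERROR.finditer(journal_text):
        path = match.group(1)
        if path not in paths:
            paths.append(path)
    return paths


if __name__ == "__main__":
    sample = (
        "May 22 04:33:25 node1 libvirtd[26392]: hostname: node1\n"
        "May 22 04:33:25 node1 libvirtd[26392]: Cannot read CA certificate "
        "'/etc/pki/CA/cacert.pem': No such file or directory\n"
    )
    print(missing_ca_certificates(sample))
```

Running the same extraction against the journal of the untouched host would show whether it ever attempts to read that file, which is one way to narrow down which configuration change makes libvirtd ask for TLS material.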

oVirt thrashes Docker network during installation
by thomas＠hoberg.net 30 Jul '20

I want to run containers and VMs side by side and not necessarily nested. The main reason for that is GPUs, Voltas mostly, used for CUDA machine learning and not for VDI, which is what most of the VM orchestrators like oVirt or vSphere seem to focus on. And CUDA drivers are notorious for refusing to work under KVM unless you pay $esla.

oVirt is more of a side show in my environment, used to run some smaller functional VMs alongside bigger containers, but also in order to consolidate and re-distribute the local compute node storage as a Gluster storage pool: Kibbutz storage and compute, if you want, very much how I understand the HCI philosophy behind oVirt.

The full integration of containers and VMs is still very much on the roadmap I believe, but I was surprised to see that even co-existence seems to be a problem currently.

So I set up a 3-node HCI on CentOS7 (GPU-less and older) hosts and then added additional (beefier GPGPU) CentOS7 hosts that have been running CUDA workloads on the latest Docker-CE v19 something.

The installation works fine, I can migrate VMs to these extra hosts etc., but to my dismay Docker containers on these hosts lose access to the local network, that is the entire subnet the host is in. For some strange reason I can still ping Internet hosts, perhaps even everything behind the host's gateway, but local connections are blocked.

It would seem that the ovirtmgmt network that the oVirt installation puts in breaks the docker0 bridge that Docker put there first.

I'd consider that a bug, but I'd like to gather some feedback first, in case anyone else has run into this problem.

I've repeated this several times in completely distinct environments with the same results: simply add a host with a working Docker-CE as an oVirt host to an existing DC/cluster and then check whether you can still ping anyone on that net, including the Docker host, from a busybox container afterwards (you should try that ping just before you actually add it).

No, I didn't try this with podman yet, because that's a separate challenge with CUDA: I would love to know if that is part of QA for oVirt already.

Non storage nodes erroneously included in quota calculations for HCI?
by thomas＠hoberg.net 30 Jul '20

For my home-lab I operate a 3 node HCI cluster on 100% passive Atoms, mostly to run light infrastructure services such as LDAP and NextCloud. I then add workstations or even laptops as pure compute hosts to the cluster for bigger but temporary things; these might actually run a different OS most of the time or just be shut off. From oVirt's point of view, these are just first put into maintenance and then shut down until needed again. No fencing or power management, all manual.

All nodes, even the HCI ones, run CentOS7 with more of a workstation configuration, so updates pile up pretty quickly. After I recently upgraded one of these extra compute nodes, I found my three node HCI cluster not just faltering, but indeed very hard to reactivate at all.

The faltering is a distinct issue: I have the impression that reboots of oVirt nodes cause broadcast storms on my rather simplistic 10Gbit L2 switch, which a normal CentOS instance (or any other OS) doesn't, but that's for another post.

No, what struck me was that the gluster daemons on the three HCI nodes kept complaining about a lack of quorum long after the network was all back to normal, even though all three of them were there, saw each other perfectly on "gluster show status all", and were ready without any healing issues pending at all. Glusterd would complain on all three nodes that there was no quota for the bricks and stop them.

That went away as soon as I started one additional compute node, a node that was a gluster peer (because an oVirt host added to an HCI cluster always gets put into the Gluster pool, even if it's not contributing storage) but had no bricks. Immediately the gluster daemon on the three nodes with contributing bricks would report back good quota and launch the volumes (and thus all the rest of oVirt), even if in terms of *storage bricks* nothing had changed.

I am afraid that downing the extra compute-only oVirt node will bring down the HCI: clearly not the type of redundancy it's designed to deliver.

Evidently such compute-only hosts (and gluster members) get included in some quorum deliberations even if they hold not a single brick, neither storage nor arbitration. To me that seems like a bug, if that is indeed what happens: there I need your advice and suggestions.

AFAIK HCI is a late addition to oVirt/RHEV, as storage and compute were originally designed to be completely distinct. In fact there are still remnants of documentation which seem to prohibit using a node for both compute and storage... which is what HCI is all about.

And I have seen compute nodes with "matching" storage (parts of a distinct HCI setup that was taken down but still had all the storage and Gluster elements operable) being happily absorbed into an HCI cluster with all Gluster storage appearing in the GUI etc., without any manual creation or inclusion of bricks: fully automatic (and undocumented)!

In that case it makes sense to widen the scope of quota calculations when additional nodes are hyperconverged elements with contributing bricks. It also seems the only way to turn a 3 node HCI into a 6 or 9 node one. But if you really just want to add compute nodes without bricks, those shouldn't get "quota votes" when they have no storage to play a role in the redundancy.

I can easily imagine the missing "if then else" in the code here, but I was actually very surprised to see those failure and success messages coming from glusterd itself, which to my understanding is pretty unrelated to oVirt on top. Not from the management engine (it wasn't running anyway), not from VDSM.

Re-creating the scenario is very scary even if I have gone through this three times already, trying to just bring my HCI back up. And then there are such verbose logs all over the place that I'd like some advice on which ones I should post.

But simply speaking: Gluster peers should get no quota voting rights on volumes unless they contribute bricks. That rule seems broken.

Those in the know, please let me know if I am on a goose chase or if there is a real issue here that deserves a bug report.
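
The behaviour described in this post would be consistent with a quorum rule that counts every peer in the trusted storage pool, whether or not it holds bricks. The sketch below only works through that arithmetic. It rests on assumptions that the thread does not confirm: that glusterd's server-side quorum is in effect, that it requires strictly more than a configurable percentage (assumed here to be 50) of all pool peers to be active, and that three compute-only peers were in the pool. The function name and the peer counts are illustrative.

```python
def server_quorum_met(active_peers: int, total_peers: int,
                      ratio_percent: float = 50.0) -> bool:
    """Return True if strictly more than `ratio_percent` of all peers are active.

    Assumption: every peer in the pool counts, including peers without bricks.
    """
    if total_peers <= 0:
        raise ValueError("total_peers must be positive")
    return active_peers * 100 > ratio_percent * total_peers


if __name__ == "__main__":
    # Hypothetical pool: 3 HCI nodes with bricks plus 3 compute-only peers.
    total = 6
    for active in (3, 4):
        state = "met" if server_quorum_met(active, total) else "NOT met"
        print(f"{active} of {total} peers active: quorum {state}")
```

Under these assumptions, three active brick nodes out of six peers sit at exactly 50 percent and fail the check, while starting a single compute-only peer lifts the pool above the threshold, which matches the symptom reported above.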

Shutdown procedure for single host HCI Gluster
by Gianluca Cecchi 25 Jul '20

Hello,

I'm testing the single node HCI with the ovirt-node-ng 4.3.9 iso. Very nice and many improvements over the last time I tried it. Good!

I have a question about the shutdown procedure of the server. Here are my steps:
- Shutdown all VMs (except engine)
- Put the data and vmstore domains into maintenance
- Enable Global HA Maintenance
- Shutdown engine
- Shutdown hypervisor

It seems that the last step never completes, and I had to forcibly power off the hypervisor. Here is the screenshot showing the endless failure to unmount /gluster_bricks/engine:
https://drive.google.com/file/d/1ee0HG21XmYVA0t7LYo5hcFx1iLxZdZ-E/view?usp=…

What would be the right step to do before the final shutdown of the hypervisor?

Thanks,
Gianluca

Upgrade ovirt from 3.4 to 4.3
by lu.alfonsi＠almaviva.it 19 Jun '20

Good morning,

I have a difficult environment with 20 hypervisors based on oVirt 3.4.3-1 and I would like to reach version 4.3. What are the best steps to achieve this objective?

Thanks in advance,
Luigi

PKIX path error
by Stack Korora 11 Jun '20

Greetings,

I have a running oVirt install that's been working for almost 2 years. I'm building a _completely_ new install. I mention it because it is useful for me to compare configurations when I run into issues like this one.

Right now there are three physical hosts:
1x management where I run the engine and db
2x hypervisor nodes

I had it up and installed and running smoothly this morning on 4.3.9.4-1.el7 on Scientific Linux 7.8 (fully patched).

I copied over our 3rd party certs from the running system and restarted httpd. Perfect. SSL is running!
/etc/pki/ovirt-engine/apache-ca.pem
/etc/pki/ovirt-engine/certs/apache.cer
/etc/pki/ovirt-engine/keys/apache.key.nopass

Next I used ovirt-engine-extension-aaa-ldap-setup to point to our ldap server. I did the login and search test and both passed on the command line! Hooray!

Then I went to the web interface...

sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target

I'm digging through logs and I don't see anything close to this error except the nearly identical message in engine.log.

ERROR [org.ovirt.engine.core.aaa.servlet.SslPostLoginServlet] (default task-2) [] server_error: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target

I can't log in via the web at all, I only get that message (so I can't even test out the local admin).

The aaa ldap configuration it generated is darn near perfectly identical (just a name change). The certs are the same. Even when I look in the keystore, the sha1 hashes are the same between the two environments!

After over an hour poking at this, I'm completely stumped. Can someone please give me a pointer on what I should try next?

Thanks!
~Stack~

First ovirt 4.4 installation failing
by wart＠caltech.edu 09 Jun '20

I'm having some trouble setting up my first oVirt system. I have the CentOS 8 installation on the bare metal (ovirt1.ldas.ligo-la.caltech.edu) with the ovirt4.4 packages installed, and then try running 'hosted-engine --deploy' to set up my engine (ovirt-engine1.ldas.ligo-la.caltech.edu). For this initial deployment, I accept almost all of the defaults (other than local network-specific settings).

However, the hosted-engine deployment fails with:

[ INFO ] TASK [ovirt.hosted_engine_setup : Obtain SSO token using username/password credentials]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Wait for the host to be up]
[ ERROR ] fatal: [localhost]: FAILED! => {"attempts": 120, "changed": false, "ovirt_hosts": []}
[...cleanup...]
[ INFO ] TASK [ovirt.hosted_engine_setup : Notify the user about a failure]
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "The system may not be provisioned according to the playbook results: please check the logs for the issue, fix accordingly or re-deploy from scratch.\n"}

However, when I run 'virsh list', I can still see a HostedEngine1 vm running.

In virt-hosted-engine-setup-20200522153439-e7iw3k.log I see the error:

2020-05-25 11:57:03,897-0500 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:103 {'changed': False, 'ovirt_hosts': [], 'invocation': {'module_args': {'pattern': 'name=ovirt1.ldas.ligo-la.caltech.edu', 'fetch_nested': False, 'nested_attributes': [], 'all_content': False, 'cluster_version': None}}, '_ansible_no_log': False, 'attempts': 120}
2020-05-25 11:57:03,998-0500 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:107 fatal: [localhost]: FAILED! => {"attempts": 120, "changed": false, "ovirt_hosts": []}

In ovirt-hosted-engine-setup-ansible-bootstrap_local_vm-20200525112504-y2mmzu.log I see the following ansible errors:

2020-05-25 11:36:22,300-0500 DEBUG ansible on_any args localhostTASK: ovirt.hosted_engine_setup : Always revoke the SSO token kwargs
2020-05-25 11:36:23,766-0500 ERROR ansible failed { "ansible_host": "localhost", "ansible_playbook": "/usr/share/ovirt-hosted-engine-setup/ansible/trigger_role.yml", "ansible_result": { "_ansible_no_log": false, "changed": false, "invocation": { "module_args": { "ca_file": null, "compress": true, "headers": null, "hostname": null, "insecure": null, "kerberos": false, "ovirt_auth": { "ansible_facts": { "ovirt_auth": { "ca_file": null, "compress": true, "headers": null, "insecure": true, "kerberos": false, "timeout": 0, "token": "tF4ZMU0Q23zS13W2vzyhkswGMB4XAXZCFiPg9IVvbJXkPq9MFmne40wvCKaQOJO_TkYOpfxe78r9HHJcSrUWCQ", "url": "https://ovirt-engine1.ldas.ligo-la.caltech.edu/ovirt-engine/api" } }, "attempts": 1, "changed": false, "failed": false }, "password": null, "state": "absent", "timeout": 0, "token": null, "url": null, "username": null } }, "msg": "You must specify either 'url' or 'hostname'." }, "ansible_task": "Always revoke the SSO token", "ansible_type": "task", "status": "FAILED", "task_duration": 2 }
2020-05-25 11:36:23,767-0500 DEBUG ansible on_any args <ansible.executor.task_result.TaskResult object at 0x7f15adaffa58> kwargs ignore_errors:True

Then further down:

2020-05-25 11:57:05,063-0500 DEBUG var changed: host "localhost" var "ansible_failed_result" type "<class 'dict'>" value: "{ "_ansible_no_log": false, "_ansible_parsed": true, "attempts": 120, "changed": false, "failed": true, "invocation": { "module_args": { "all_content": false, "cluster_version": null, "fetch_nested": false, "nested_attributes": [], "pattern": "name=ovirt1.ldas.ligo-la.caltech.edu" } }, "ovirt_hosts": [] }"
2020-05-25 11:57:05,063-0500 ERROR ansible failed { "ansible_host": "localhost", "ansible_playbook": "/usr/share/ovirt-hosted-engine-setup/ansible/trigger_role.yml", "ansible_result": { "_ansible_no_log": false, "attempts": 120, "changed": false, "invocation": { "module_args": { "all_content": false, "cluster_version": null, "fetch_nested": false, "nested_attributes": [], "pattern": "name=ovirt1.ldas.ligo-la.caltech.edu" } }, "ovirt_hosts": [] }, "ansible_task": "Wait for the host to be up", "ansible_type": "task", "status": "FAILED", "task_duration": 1235 }
2020-05-25 11:57:05,063-0500 DEBUG ansible on_any args <ansible.executor.task_result.TaskResult object at 0x7f15ad92dcc0> kwargs ignore_errors:None

Not being very familiar with ansible, I'm not sure where to look next for the root cause of the problem.

--Michael Thomas
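
The two "ansible failed" records quoted above differ in one detail worth noticing: the first is followed by `ignore_errors:True`, the second by `ignore_errors:None`. A minimal sketch of separating ignored failures from blocking ones is shown below. It assumes the records have already been parsed into dictionaries and that the `ignore_errors` value from the following log line has been merged in under a key of the same name; that merged layout and the function name are hypothetical, and only the other field names come from the log.

```python
from typing import Any, Dict, List


def blocking_failures(records: List[Dict[str, Any]]) -> List[str]:
    """Summarise failed tasks whose errors were not ignored by the playbook."""
    summaries: List[str] = []
    for record in records:
        if record.get("status") != "FAILED":
            continue
        if record.get("ignore_errors"):  # True means the playbook carried on
            continue
        attempts = record.get("ansible_result", {}).get("attempts")
        summaries.append(
            f"{record.get('ansible_task')} "
            f"(attempts={attempts}, task_duration={record.get('task_duration')})"
        )
    return summaries


if __name__ == "__main__":
    # Values copied from the post; the 'ignore_errors' key is the assumed merge.
    records = [
        {"ansible_task": "Always revoke the SSO token", "status": "FAILED",
         "task_duration": 2, "ansible_result": {"changed": False},
         "ignore_errors": True},
        {"ansible_task": "Wait for the host to be up", "status": "FAILED",
         "task_duration": 1235, "ansible_result": {"attempts": 120},
         "ignore_errors": None},
    ]
    for line in blocking_failures(records):
        print(line)
```

Read this way, the log itself suggests that the "Always revoke the SSO token" error was ignored by the playbook, and that the failure which stopped the deployment is the host never showing up in `ovirt_hosts` after 120 attempts.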

Q: Which types of tests and tools are used?
by Juergen Novak 05 Jun '20

Hi, can anybody help me to find some information about test types used in the project and tools used? Particularly interesting would be tools and tests used for the Python coding, but also any information about Java would be appreciated. I already scanned the documentation, but I mainly found only information about Mocking tools. Thank you! /juergen

Ovirt 4.4 Migration assistance needed.
by Strahil Nikolov 05 Jun '20

Hello All,

I would like to ask for some assistance with the planning of the upgrade to 4.4. I have issues with the OVN (it doesn't work at all), thus I would like to start fresh with the HE.

The plan so far (downtime is not an issue):
1. Reinstall the nodes one by one and rejoin them in the Gluster TSP
2. Wipe the HostedEngine's gluster volume
3. Deploy a fresh hosted engine
4. Import the storage domains (gluster) back to the engine and import the VMs

Do you see any issues with the plan? Any problems expected if the VMs do have snapshots? What about the storage domain version?

Thanks in advance.

Best Regards,
Strahil Nikolov
