boot from cdrom & error code 0005
I have created a new storage domain (data domain, storage type NFS) to upload ISO images to.
I then uploaded a new ISO and attached it to a new VM.
But when I try to boot the VM I get this error:
booting from dvd/cd...
boot failed: could not read from cdrom (code 0005)
no bootable device
The ISO file was uploaded successfully to the data storage domain, and the VM lets me attach the ISO in the boot settings.
Can you help me?
Thank you
6 months, 2 weeks
VM Migration Failed
by KSNull Zero
Running oVirt 4.4.5
VM cannot migrate between hosts.
vdsm.log contains the following error:
libvirt.libvirtError: operation failed: Failed to connect to remote libvirt URI qemu+tls://ovhost01.local/system: authentication failed: Failed to verify peer's certificate
Certificates on the hosts were renewed some time ago. How can this issue be fixed?
Thank you.
7 months, 4 weeks
How to re-enroll (or renew) host certificates for a single-host hosted-engine deployment?
by Derek Atkins
I've got a single-host hosted-engine deployment that I originally
installed with 4.0 and have upgraded over the years to 4.3.10. I and some
of my users have upgraded remote-viewer and now I get an error when I try
to view the console of my VMs:
(remote-viewer:8252): Spice-WARNING **: 11:30:41.806:
../subprojects/spice-common/common/ssl_verify.c:477:openssl_verify: Error
in server certificate verification: CA signature digest algorithm too weak
(num=68:depth0:/O=<My Org Name>/CN=<Host's Name>)
I am 99.99% sure this is because the old certs use SHA1.
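One way to check that suspicion, as a minimal sketch: read the signature algorithm from the text that `openssl x509 -noout -text` prints for a certificate and flag SHA-1 or MD5. The helper names `sig_alg` and `is_weak_alg` are made up for illustration, and the certificate path in the usage comment is a placeholder, not a known location on the host.

```shell
# Print the first "Signature Algorithm" value found on stdin.
# Expects the output of `openssl x509 -noout -text`.
sig_alg() {
    awk -F': ' '/Signature Algorithm/ { print $2; exit }'
}

# Succeed (exit status 0) if the algorithm name mentions SHA-1 or MD5.
is_weak_alg() {
    case "$1" in
        *sha1*|*SHA1*|*md5*|*MD5*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage (placeholder path, replace with the certificate to inspect):
#   alg=$(openssl x509 -in /path/to/cert.pem -noout -text | sig_alg)
#   if is_weak_alg "$alg"; then echo "weak digest: $alg"; fi
```

Running this against each certificate that was not renewed should show which ones still carry the old digest.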
I reran engine-setup on the engine and it asked me if I wanted to renew
the PKI, and I answered yes. This replaced many[1] of the certificates in
/etc/pki/ovirt-engine/certs on the engine, but it did not update the
Host's certificate.
All the documentation I've seen says that to refresh this certificate I
need to put the host into maintenance mode and then re-enroll. However I
cannot do that, because this is a single-host system so I cannot put the
host in local mode -- there is no place to migrate the VMs (let alone the
Engine VM).
So... is there a command-line way to re-enroll manually and update the
host certs? Or some other way to get all the leftover certs renewed?
[1] Not only did it not update the Host's cert, it did not update any of
the vmconsole-proxy certs, nor the certs in /etc/pki/ovirt-vmconsole/, and
obviously nothing in /etc/pki/ on the host itself.
Derek Atkins 617-623-3745
Computer and Internet Security Consultant
8 months, 1 week
4.5.4 with Ceph only storage
by Maurice Burrows
Hey ... A long story short ... I have an existing Red Hat Virt / Gluster hyperconverged solution that I am moving away from.
I have an existing Ceph cluster that I primarily use for OpenStack and a small requirement for S3 via RGW.
I'm planning to build a new oVirt 4.5.4 cluster on RHEL9 using Ceph for all storage requirements. I've read many online articles on oVirt and Ceph, and they all seem to use the Ceph iSCSI gateway, which is now in maintenance, so I'm not real keen to commit to iSCSI.
So my question is: is there any reason I cannot use CephFS for both the hosted engine and as a data storage domain?
I'm currently running Ceph Pacific FWIW.
8 months, 2 weeks
I can't access the console with noVNC or a VNC client (console.vv)
I installed the oVirt 4.5 engine on CentOS Stream 9 and added an oVirt node (oVirt Node 4.5 ISO) to this engine. I am going to run my VMs on this node. I followed the instructions to create the data center, the cluster, and the storage domain, and to upload the image. Everything was fine. But after I created a VM with an Ubuntu image attached, I found that I can't reach the console. With noVNC, it says "Something went wrong, connection is closed"; with virt-viewer over VNC, it says "Failed to complete handshake Error in the pull function". I tried changing the console type to the Bochs one and the same thing happened. I changed to QXL mode and the VM can't start any more. I checked the log, and it says "unsupported configuration: domain configuration does not support video model 'qxl'".
So now I can't reach my VM in any way. I deployed the engine following the official instructions and kept most options at their defaults, so why do I still have this issue? And why does noVNC say "Something went wrong" instead of telling me what is actually wrong?
9 months, 3 weeks
Oracle Virtualization Manager 4.5 anyone?
by Thomas Hoberg
Red Hat's decision to shut down RHV caught Oracle pretty unprepared, I'd guess, as they had just shut down their own vSphere clone in favor of an RHV clone a couple of years ago.
Oracle is even less vocal about their "Oracle Virtualization" strategy; they don't even seem to have a proper naming convention or branding.
But for the last few years they have been pushing out OV releases, without a publicly announced EOL, almost a year behind Red Hat.
And after a 4.4 release in September 22, release 4.5 was actually made public a few days ago, on December 12th.
I've operated oVirt 4.3 with significant quality issues for some years and failed to make oVirt 4.4 work with any degree of acceptable stability, but Oracle's variant of 4.4 proved to be rather better than 4.3 on CentOS 7, with no noticeable bugs, especially in the hyperconverged setup that I am using with GlusterFS.
I assumed that this was because Oracle based their 4.4 in fact on RHV 4.4 and not oVirt, but since they're not telling, who knows?
One issue with 4.4 was that Oracle is pushing their UE-Kernel and that created immediate issues e.g. with VDO missing modules for UEK and other stuff, but that was solved easily enough by using the RHEL kernel.
With 4.5 Oracle obviously can't use RHV 4.5 as a base, because there is no such thing now that RHV has been declared EOL. According to Oracle, their 4.5 is based on oVirt 4.5.4, which makes the quality of that release somewhat questionable, but perhaps they have spent the year that has passed since productively killing bugs... only to be caught by surprise again, I presume, by oVirt release 4.5.5 on December 1st, which no one saw coming!
Long story slightly shorter, I've been testing Oracle's 4.5 variant a bit and it's not without issues.
But much worse, Oracle's variant of oVirt seems to be entirely without any community that I could find.
Now oVirt has been a somewhat secret society for years, but compared to what's going on with Oracle this forum is teeming with life!
So did I just not look around enough? Is there a secret lair where all those OV users are hiding?
Anyhow, here is what I've tested so far and where I'd love to have some feedback:
1. Setting up a three node HCI cluster from scratch using OL8.9 and OV 4.5
Since I don't have extra physical hardware for a 3 node HCI I'm using VMware Workstation 17.5 on a workstation running Windows 2022, a test platform that has worked for all kinds of virtualization tests, from VMware ESXi via XCP-ng to oVirt.
Created three VMs with OL8.9 minimal and then installed OV 4.5. I used the UEK default kernels and then had an issue when Ansible was trying to create the (local) management engine: the VM simply could not reach the Oracle repo servers to install the packages inside the ME. Since that VM is entirely under the control of Ansible and no console access of any type is possible in that installation phase, I couldn't do diagnostics.
But with 4.4 I used to have similar issues and there switching back to the Red Hat kernel for the ME (and the hosts) resolved them.
But with 4.5 it seems that UEK has become a baked-in dependency: the OV team doesn't even seem to do any testing with the Red Hat kernel any more. Or not with the HCI setup, which has become deprecated somewhere in oVirt 4.4... Or not with the Cockpit wizard, which might be in a totally untested state, or....
Doing the same install on OL 8.9 with OV 4.4, however, did work just fine and I was even able to update to 4.5 afterwards, which was a nice surprise...
...that I could not repeat on my physical test farm using three Atoms. There switching to the UEK kernel on the hosts caused issues: hosts were becoming unresponsive and file systems inaccessible, even if they were perfectly fine at the Gluster CLI level, and in the end the ME VM simply would no longer start. Switching back to the Red Hat kernel resolved things there.
In short, switching between the Red Hat kernel and UEK, which should be 100% transparent to all things userland including hypervisors, doesn't work.
But my attempts to go with a clean install of 4.5 on a Red Hat kernel or UEK are also facing issues. So far the only things that have worked were a single node HCI install using UEK and OV 4.5, and upgrading to OV 4.5 on a virtualized triple node OV 4.4 HCI cluster.
Anyone else out there trying these things?
I was mostly determined to move to Proxmox VE, but Oracle's OV 4.5 seemed to be handing a bit of a life-line to oVirt and the base architecture is just much more powerful (or less manual) than Proxmox, which doesn't have a management engine.
10 months, 1 week
Changing disk QoS causes segfault with IO-Threads enabled (oVirt
We recently upgraded to 4.3.0 and have found that changing disk QoS settings on VMs whilst IO-Threads is enabled causes the VMs to segfault and reboot. We've been able to replicate this across several VMs. VMs with IO-Threads disabled do not segfault when changing the QoS.
Mar 1 11:49:06 srvXX kernel: IO iothread1[30468]: segfault at fffffffffffffff8 ip 0000557649f2bd24 sp 00007f80de832f60 error 5 in qemu-kvm[5576498dd000+a03000]
Mar 1 11:49:06 srvXX abrt-hook-ccpp: invalid number 'iothread1'
Mar 1 11:49:11 srvXX libvirtd: 2019-03-01 00:49:11.116+0000: 13365: error : qemuMonitorIORead:609 : Unable to read from monitor: Connection reset by peer
Happy to supply some more logs to someone if they'll help but just wondering whether anyone else has experienced this or knows of a current fix other than turning io-threads off.
10 months, 2 weeks
virt-v2v cannot authenticate with oVirt engine API with OAuth2
I've been reading through the archives but have not been able to find what I need. Essentially, what I'm trying to do is migrate a large number of VMs from our OVM environment to a new OLVM setup. In an effort to avoid a lot of replication and copying of disk images (export, convert, copy over, import etc.), I found this article, which shows a pretty slick way to do it in one shot.
The main command behind it all is the virt-v2v that makes it possible. It looks something like this:
virt-v2v -i libvirtxml vm-test1.xml -o rhv-upload -oc https://<OLVM-server>/ovirt-engine/api -os <my storage> -op /tmp/ovirt-admin-password -of raw -oo rhv-cluster=Default -oo rhv-cafile=/root/ca.pem
The problem I'm having is that I cannot authenticate with my new OLVM server at the ovirt-engine/api URL. Since user/password is deprecated and you must use OAuth 2.0 with a token, I'm stuck.
I have OLVM 4.5.4-1.0.27.el8, and from what I've read, oVirt 4.5 (not sure in what version it started) uses Keycloak OAuth 2.0 and the older ovirt-aaa-jdbc-tool is now deprecated.
In doing some testing I found I can use curl and authenticate against the ovirt-engine/api and get a token like this:
TOKEN=$(curl -k -X POST -H "Accept: application/json" -H "Content-Type: application/x-www-form-urlencoded" -d "grant_type=password&username=$USERNAME&password=$PASSWORD&scope=ovirt-app-api" "$OVIRT_ENGINE_URL/sso/oauth/token" | jq -r '.access_token')
I was then able to query the API to validate my token works
curl -k -H "Accept: application/json" -H "Authorization: Bearer $TOKEN" "$OVIRT_ENGINE_URL/api/clusters?search=name=$CLUSTER_NAME"
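One caveat with the token request above: the form body is built by plain string interpolation, so a password containing characters such as `&` or `=` would corrupt it. A minimal sketch that lets curl percent-encode each value with `--data-urlencode` instead; `token_request` is a hypothetical helper name, and the endpoint path and form fields are taken from the command above rather than from any official reference.

```shell
# Request an SSO token; prints the raw JSON response on stdout.
#   $1 = user name, $2 = password, $3 = engine base URL
#   (e.g. the value of $OVIRT_ENGINE_URL used above)
token_request() {
    curl -sS -k -X POST \
        -H "Accept: application/json" \
        --data-urlencode "grant_type=password" \
        --data-urlencode "username=$1" \
        --data-urlencode "password=$2" \
        --data-urlencode "scope=ovirt-app-api" \
        "$3/sso/oauth/token"
}

# Usage, extracting the token with jq as in the original command:
#   TOKEN=$(token_request "$USERNAME" "$PASSWORD" "$OVIRT_ENGINE_URL" | jq -r '.access_token')
```

With `--data-urlencode`, curl encodes only the part after the first `=`, so the field names stay as written and the values are passed safely.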
The problem is virt-v2v does not support posting any form information or the token to authenticate. Best I can tell the -oc option is strictly the URL and if you want a username in there it's in the form of https://<name>@<server>. So even if I wrote a script and used curl to authenticate and get a token I still can't find a way to make virt-v2v use it.
So I'm stuck: how do I get virt-v2v working? Is there a way to re-enable the deprecated user/pass method of accessing the ovirt-engine/api? Or, as a last resort, a way to get virt-v2v to support the token?
Thanks for any insight
11 months, 2 weeks
Deploy oVirt Engine fail behind proxy
by Matteo Bonardi
I am trying to deploy the oVirt engine following the self-hosted engine installation procedure in the documentation.
The deployment servers are behind a proxy, and I set it in the environment and in yum.conf before running the deployment.
The deployment fails because the oVirt engine VM cannot resolve the AppStream repository URL:
[ INFO ] TASK [ovirt.engine-setup : Install oVirt Engine package]
[ ERROR ] fatal: [localhost -> ovirt-manager.mydomain]: FAILED! => {"changed": false, "msg": "Failed to download metadata for repo 'AppStream': Cannot prepare internal mirrorlist: Curl error (6): Couldn't resolve host name for [Could not resolve host:]", "rc": 1, "results": []}
[ ERROR ] Failed to execute stage 'Closing up': Failed executing ansible-playbook
[ INFO ] Stage: Clean up
[ INFO ] Cleaning temporary resources
[ INFO ] TASK [ovirt.hosted_engine_setup : Execute just a specific set of steps]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Force facts gathering]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Fetch logs from the engine VM]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Set destination directory path]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Create destination directory]
[ INFO ] changed: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : include_tasks]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Find the local appliance image]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Set local_vm_disk_path]
[ INFO ] skipping: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Give the vm time to flush dirty buffers]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Copy engine logs]
[ INFO ] TASK [ovirt.hosted_engine_setup : include_tasks]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Remove local vm dir]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Remove temporary entry in /etc/hosts for the local VM]
[ INFO ] changed: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Clean local storage pools]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Destroy local storage-pool {{ he_local_vm_dir | basename }}]
[ INFO ] TASK [ovirt.hosted_engine_setup : Undefine local storage-pool {{ he_local_vm_dir | basename }}]
[ INFO ] TASK [ovirt.hosted_engine_setup : Destroy local storage-pool {{ local_vm_disk_path.split('/')[5] }}]
[ INFO ] TASK [ovirt.hosted_engine_setup : Undefine local storage-pool {{ local_vm_disk_path.split('/')[5] }}]
[ INFO ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20201109165237.conf'
[ INFO ] Stage: Pre-termination
[ INFO ] Stage: Termination
[ ERROR ] Hosted Engine deployment failed: please check the logs for the issue, fix accordingly or re-deploy from scratch.
Log file is located at /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20201109164244-b3e8sd.log
How can I set a proxy for the engine VM?
oVirt version:
[root@myhost ~]# rpm -qa | grep ovirt-engine-appliance
[root@myhost ~]# rpm -qa | grep ovirt-hosted-engine-setup
OS version:
[root@myhost ~]# cat /etc/centos-release
CentOS Linux release 8.2.2004 (Core)
[root@myhost ~]# uname -a
Linux myhost.mydomain 4.18.0-193.28.1.el8_2.x86_64 #1 SMP Thu Oct 22 00:20:22 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Thanks for the help.
11 months, 2 weeks
oVirt 4.5 HA over NFS fails when a single host goes down
We recently installed oVirt as a hosted engine with high availability on six nodes over NFS storage (no Gluster), with power management through an on-board IPMI device, and the setup was successful. All the nodes (from Supermicro) are identical in every aspect, so no hardware differences exist and no modifications to the servers' hardware were performed. The hosted engine was deployed on a second host as well, since only two of the six hosts were required to host the HE VM.
The network interface on each node is a bond of two physical fiber-optic NICs in LACP mode with a VLAN on top, serving as the sole networking interface for the node. No separate VM or storage networks were needed, as the host OS, the hosted-engine VM, and the storage are required to be on the same network and VLAN.
We started by testing the high availability of the hosted-engine VM (as it was deployed on two of the six nodes) by rebooting or powering off one of the hosts, and the VM would migrate successfully to the second HE node. The main goal of our experiments is to test the robustness of the setup, as the cluster is required to remain functional even when up to two hosts are brought down (whether due to a network or power issue). However, when rebooting or powering off one of the hosts, the HE VM goes down and takes the entire cluster with it, to the point where we can't even access the web portal. Once the host is rebooted, the HE VM and the cluster become functional again. Sometimes the HE VM stays down for a set amount of time (5 to 6 minutes) and then comes back up, and sometimes it stays down until the problematic host is back up. This behavior affects other VMs as well, not just the HE VM.
We suspected an issue with the NFS storage. However, during oVirt operation it is mounted properly under /rhev/data-center/mnt/<nfs:directory>, and the expected behavior is for the cluster to stay operational and for any other VMs to be migrated to other hosts. During one of the tests, we mounted the NFS storage on a different directory without any problem: we were able to run commands such as ls, write a text file at the directory's root, and modify it normally.
We suspected a couple of things. The first is that the HE is unable to fence the problematic host (the one we took down); however, power management is set up properly.
The other thing we suspected is that the cluster hosts (after one of them is taken down) are unable to acquire the storage lease, which is odd since the host in question is down and non-operational, hence no locks should be in place. The reason behind this suspicion is the following two errors, which we receive frequently in the engine\ovirt-engine\engine.log file when one or more hosts go down:
1- "EVENT_ID: VM_DOWN_ERROR(119), VM HostedEngine is down with error. Exit message: resource busy: Failed to acquire lock: Lease is held by another host."
2- "[<id>] Command 'GetVmLeaseInfoVDSCommand( VmLeaseVDSParameters:{expectedEngineErrors='[NoSuchVmLeaseOnDomain]', storagePoolId='<pool-id>', ignoreFailoverLimit='false', leaseId='<lease-id>', storageDomainId='<domain-id>'})' execution failed: IRSGenericException: IRSErrorException: No such lease: 'lease=<lease-id>'"
A third warning comes from the /var/log/vdsm/vdsm.log file:
3- "WARN (check/loop) [storage.check] Checker '/rhev/data-center/mnt/<nfs-domain:/directory>/<id>/dom_md/metadata' is blocked for 310.00 seconds (check:265)"
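To see how long and how often the storage checker was blocked, these warnings can be pulled out of vdsm.log with a small filter. This is a sketch based only on the message format quoted above; `blocked_checks` is a hypothetical helper name. For reference, 310 seconds is a little over 5 minutes, which is in the same range as the 5 to 6 minutes of downtime described earlier.

```shell
# Print "<seconds> <path>" for every "Checker ... is blocked for N seconds"
# warning read from stdin; all other lines are ignored.
blocked_checks() {
    sed -n "s/.*Checker '\([^']*\)' is blocked for \([0-9.]*\) seconds.*/\2 \1/p"
}

# Usage: list the longest blocks last
#   blocked_checks < /var/log/vdsm/vdsm.log | sort -n | tail
```

Comparing the timestamps of the longest blocks with the moments a host was powered off may help tell a storage stall apart from a fencing or lease problem.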
All the tests are done without setting nodes into maintenance mode, as we are simulating an emergency situation. No HE configuration was modified via the engine-config command; the default values are used.
Is this normal behavior? Are we missing something? Do we need to tweak a certain configuration using the engine-config command to get better behavior (e.g., a shorter down period)?
Best regards
11 months, 4 weeks