VM migration failed after upgrade to 4.4
by joris dedieu
Hi list,
I'm troubleshooting an issue with live migration in an upgrade context.
I have a 4.3 cluster that I am trying to upgrade to 4.4. The engine is OK on
4.4 and I have already upgraded a single host, which kindly
reintegrated the cluster.
The host is OK. I can start VMs on it and migrate VMs from it to other hosts.
But if I try to migrate a VM from a 4.3 host to the 4.4 host, it fails with:
* on the host : qemu-kvm: terminating on signal 15 from pid XXXX
(<unknown process>)
* on the engine : VM 'VM_UUID' was unexpectedly detected as 'Down' on
VDS 'HOST_UUID'
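For context: "terminating on signal 15" means qemu was deliberately torn down (normally by libvirt after the migration handshake failed), so the underlying error is usually logged slightly earlier on the destination host. A minimal sketch for pulling those lines out (default vdsm log path assumed; run on the destination host):

```shell
# Signal 15 = qemu was killed, typically by libvirt after the migration
# failed; the real cause is normally logged just before it in vdsm.log.
LOG=/var/log/vdsm/vdsm.log
if [ -f "$LOG" ]; then
    # show recent migration-related errors with line numbers
    grep -n -iE 'migration.*(error|fail)' "$LOG" | tail -20
else
    echo "vdsm log not found at $LOG (run this on the destination host)"
fi
```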
Can anyone here help me find out what's going on? That would be great.
Regards
Joris
1 year, 9 months
Cannot start VM with vGPU (NVIDIA)
by pawel.osadtschy01@dhl.com
Hi, I need some help!
Maybe the problem is very simple, but I haven't found a solution:
- oVirt 4.5.4 on RHEL 8.7
- amd_iommu is enabled on the host with an NVIDIA A40 card
I can see my cards in the vGPU settings for Windows VMs (Windows 10 Pro).
Without the vGPU device activated, the VM starts normally.
With one active vGPU device the VM cannot start. Error message on the host:
Jun 07 13:39:51 depotlsa8ovh1 kernel: [nvidia-vgpu-vfio] e80ae200-4cea-4213-9e78-ebf0b86a756a: start failed. status: 0x0 Timeout Occured
Jun 07 13:39:51 depotlsa8ovh1 libvirtd[3278]: Cannot read from monitor: Connection reset by peer
Jun 07 13:39:51 depotlsa8ovh1 libvirtd[3278]: internal error: qemu unexpectedly closed the monitor: 2023-06-07T11:39:51.494056Z qemu-kvm: -device vfio-pci-nohotplug,id=hostdev0,sysfsdev=/sys/bus/mdev/devices/e80ae200-4cea-4213-9e78-ebf0b86a756a,display=on,ramfb=on,bus=pci.7,addr=0x0: vfio e80ae200-4cea-4213-9e78-ebf0b86a756a: error getting device from group 156: Connection timed out. Verify all devices in group 156 are bound to vfio-<bus> or pci-stub and not already in use.
It looks like libvirtd cannot get the device. Is this a problem with the NVIDIA settings on the host, or with the VM settings?
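The qemu error explicitly asks to verify that every device in IOMMU group 156 is bound to a vfio driver. One quick way to check that on the host (group number taken from the error message; standard Linux sysfs layout assumed):

```shell
# List every device sharing the IOMMU group from the error message and
# show which kernel driver each one is currently bound to.
GROUP=156   # from the qemu-kvm error; adjust for your host
DIR=/sys/kernel/iommu_groups/$GROUP/devices
if [ -d "$DIR" ]; then
    for dev in "$DIR"/*; do
        addr=$(basename "$dev")
        # the 'driver' entry is a symlink to the bound driver, if any
        drv=$(basename "$(readlink "/sys/bus/pci/devices/$addr/driver" 2>/dev/null)")
        echo "$addr -> ${drv:-no driver bound}"
    done
else
    echo "IOMMU group $GROUP not present on this machine"
fi
```

If anything in the group shows a driver other than vfio-pci (or nvidia's vgpu stack) or is already in use, that would match the "error getting device from group 156" message.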
Thank you
Paolo
Re: Resending with Log: Hosted Engine Deploy Fails at "Waiting for Host to be up"
by Gilboa Davara
Hello,
I'm seeing the same thing when trying to install the nightly on a high-end
server (either directly over NFS or on a Gluster cluster).
It seems that oVirt deployment fails to change the HE IP address after
deployment.
I'm thinking about trying other options, such as switching to Oracle (ugh!)
/ oVirt 4.4, which seems to be stable and has long-term support.
- Gilboa
On Sun, Jun 11, 2023 at 4:30 PM Angel, Christopher <
christopher.angel(a)usask.ca> wrote:
> I’ve installed 3 Ovirt Nodes and am trying to set up the hosted engine.
> Every time I run it however, it fails at the ‘Waiting for host to be up’
> stage. I’ve attached the relevant log.
>
>
>
> --
>
> Christopher Angel, B.Eng, B.Sc
>
> Laboratory Systems Analyst, Computer Science Department
>
> University of Saskatchewan
>
> Christopher.angel(a)usask.ca
>
> 3069661434
>
>
> _______________________________________________
> Users mailing list -- users(a)ovirt.org
> To unsubscribe send an email to users-leave(a)ovirt.org
> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/VN7NXEPRDA7...
>
Unable to find a match: cockpit-ovirt-dashboard
by R A
Hello,
maybe someone can help me.
I am using an oVirt 4.5 standalone engine with a local database on Rocky Linux 9 on server A. On servers B, C and D I have installed oVirt Node 4.5, and now I want to use Gluster hyperconverged via Cockpit with 4 servers (one for the standalone engine).
Ovirt 4.5 Standalone Engine is already running on Server A.
My problem is that I can't install cockpit-ovirt-dashboard:
[root@pluto volrath]# yum install cockpit-ovirt-dashboard
Last metadata expiration check: 1:41:33 ago on Thu Jun 15 19:36:53 2023.
No match for argument: cockpit-ovirt-dashboard
Error: Unable to find a match: cockpit-ovirt-dashboard
I have followed the instructions in the section "Installing on RHEL 9.0 or derivates" from https://www.ovirt.org/download/install_on_rhel.html, but I am still not able to find the Cockpit plugin.
ansible-runner is installed too.
What am I missing? Or does Rocky Linux not support the Cockpit plugin?
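One way to narrow this down is to ask dnf whether any enabled repository carries the package at all, which distinguishes "wrong/missing repo" from a local installation problem. A sketch (plain dnf, no oVirt-specific tooling assumed):

```shell
# Query the enabled repos for the package. An empty repoquery result
# means no enabled repository ships it for this release at all.
PKG=cockpit-ovirt-dashboard
if command -v dnf >/dev/null 2>&1; then
    dnf -q repoquery "$PKG" || true
    dnf -q provides "$PKG" 2>/dev/null || echo "no match for $PKG in enabled repos"
else
    echo "dnf not available on this machine"
fi
```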
BR
Peter
Re: Resending with Log: Hosted Engine Deploy Fails at "Waiting for Host to be up"
by Thomas Hoberg
I have seen this type of behavior when building a HCI cluster on Atoms.
The problem is that at this point the VM generated for the management engine has a machine type above what the hardware actually supports.
Since it's not the first VM run during the setup process, it's not really intuitive, but the libvirt configuration for that prototypical management engine is created by code that assumes too modern a default (I never found the source, but since all development has ceased it won't matter any more).
While modern Atoms are actually more like Core CPUs in terms of the instruction set they support, my Goldmont Plus/Gemini Lake Atoms were treated by KVM as roughly a Nehalem CPU, and the VM would refuse to start.
It's very hard to find in the logs, because it is actually a KVM issue (created by the oVirt ME creation mechanism, of course).
I got out of that fix using removable storage: I ran the setup on an i7-7700K (which was a bit faster, too) and then changed the machine type of the management engine (and the lowest common denominator for the cluster) to Nehalem before transplanting the SSD back into the Atom.
I've gone through that exercise with oVirt 4.3 and again with Oracle's 4.4 variant, which is by far the most stable oVirt/RHEV available right now.
Windows VMs pause often when disk images are located on CephFS storage
by change_jeeringly679@dralias.com
Hey guys,
I have tried to write on the IRC channel regarding this issue, but I'm not sure if I'm doing it wrong or if there are simply not many people watching the #ovirt channel.
We have deployed a 6-node oVirt cluster and have moved roughly 100 VMs onto it. We started the cluster on NFS storage while we were building and testing our Ceph cluster. Ceph has been ready for a few months now, and we have tested running oVirt VMs on it using RBD, CephFS, and the NFS Ganesha server that can be enabled on Ceph.
Initially I tested RBD using the cinderlib functionality in oVirt. This appears to work fine, and I can live with the fact that live storage migrations from POSIX storage are not possible. The biggest hurdle I encountered is that the backup APIs in oVirt cannot be used to create and download backups. This breaks our RHV Veeam backup solution, which we have grown quite fond of. But even my earlier homegrown backup solutions don't work, as they use the same APIs.
For this reason we have now changed to CephFS. It has a different set of problems: we can only supply one monitor when mounting the CephFS storage, making it less robust than it should be, and it makes metadata access dependent on the MDS. As far as I understand, data access still goes directly to the OSDs where the data resides. I have multiple MDS containers on the Ceph cluster for load balancing and redundancy, but it still feels less tidy than RBD with native Ceph support. The good thing is that, as CephFS is a POSIX filesystem, the backup APIs work and so does Veeam backup.
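On the single-monitor point: the kernel CephFS client itself accepts a comma-separated monitor list in the mount source, so it may be worth testing whether the same string works as the path of the oVirt POSIX storage domain. A sketch with placeholder monitor names and secret file (dry run: it prints the command rather than executing it, since it needs a real cluster):

```shell
# The kernel CephFS client takes several monitors in one mount source,
# removing the single point of failure at mount time.
# Monitor hostnames and the secret file below are placeholders.
MONS="mon1:6789,mon2:6789,mon3:6789"
CMD="mount -t ceph $MONS:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret"
# Dry run: print the invocation instead of executing it.
echo "$CMD"
```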
The biggest problem I am struggling with is the unexplained pausing of VMs, exclusively those running the Windows operating system. This is why I didn't notice it initially, as I had been testing the storage with Linux VMs. The machines pause at a random moment with a "Storage Error". I cannot resume a paused machine, not even using virsh on the host it is running on. The only thing I can do is power it off and reboot, and in many cases it then pauses again during the boot process. I have been watching the VDSM logs and couldn't work out why this happens. When I move the disk to plain NFS storage (not on Ceph), this never happens. The result is that most Windows-based VMs have not yet been moved to CephFS. I have 0 occurrences of this with Linux VMs, of which I have many (I think we have 80 Linux VMs vs 20 Windows VMs).
The Ceph cluster does not show any problems before, during or after this happens. Is there anyone with experience with oVirt and Ceph who can share experiences or help me find the root cause of this problem? I have a couple of things I can proceed with:
1. Move a couple of Windows VMs to RBD, and install a Veeam agent on the servers instead. This just opens a new can of worms, as it requires some work on each new server I add, and I also need to open the network from the servers to the backup repos. It is just a little less neat than the RHV Veeam backup solution using the hypervisor.
2. Move a couple of Windows VMs to the NFS Ganesha implementation. This means ALL traffic goes through the NFS containers created on the Ceph cluster, and I lose the distributed nature of the oVirt hosts talking directly to the OSDs. If I were to go this way, I should probably create some NFS Ganesha servers that connect to Ceph natively on one end and provide NFS services to oVirt on the other.
Both tests would still exercise Ceph, but via an alternative method to CephFS. My preferred solution really is 1., were it not for the backup APIs being rendered useless. Is work still being carried out on development in these areas, or has oVirt/Ceph development ceased?
Thoughts and comments are welcome. Looking forward to sparring with someone that has experience with this :-)
With kind regards
Jelle
RHEL 9 (rocky/centos/..) guest console problem
by marek
Hi,
I know SPICE/QXL is deprecated,
so I have a guest VM with the console configured as VGA/VNC (tried Bochs/VNC too).
virt-viewer from Windows cannot connect to the console (the "Console" button in
the oVirt admin portal).
If I use the VNC parameters from "console.vv" I can connect with the TigerVNC
client (TightVNC does not work).
Any ideas what could be wrong?
By the way, is there an option to prolong the password validity in console.vv? There is
a note in the file: "# Password is valid for 120 seconds."
Marek
Trying to recover local domain from failed host
by Tim Tuck
Hi all,
I had a catastrophic failure of a host, but the disk with the storage domain
on it is fine.
I've put the disk in a new machine, mounted it on /backups and I can see
this structure ...
[root@max backups]# ls -laR *
2f2da2d7-596f-4a53-8a1b-a301f84b3b74:
total 20
drwxr-xr-x. 5 vdsm kvm 4096 Sep 15 2020 .
drwxr-xr-x. 4 vdsm kvm 4096 Jun 10 11:53 ..
drwxr-xr-x. 2 vdsm kvm 4096 Sep 15 2020 dom_md
drwxr-xr-x. 23 vdsm kvm 4096 May 20 11:35 images
drwxr-xr-x. 4 vdsm kvm 4096 Sep 15 2020 master
2f2da2d7-596f-4a53-8a1b-a301f84b3b74/dom_md:
total 32780
drwxr-xr-x. 2 vdsm kvm 4096 Sep 15 2020 .
drwxr-xr-x. 5 vdsm kvm 4096 Sep 15 2020 ..
-rwxr-xr-x. 1 vdsm kvm 0 Sep 15 2020 ids
-rwxr-xr-x. 1 vdsm kvm 16777216 Sep 15 2020 inbox
-rwxr-xr-x. 1 vdsm kvm 0 Sep 15 2020 leases
-rwxr-xr-x. 1 vdsm kvm 486 Sep 15 2020 metadata
-rwxr-xr-x. 1 vdsm kvm 16777216 Sep 15 2020 outbox
2f2da2d7-596f-4a53-8a1b-a301f84b3b74/images:
total 92
drwxr-xr-x. 23 vdsm kvm 4096 May 20 11:35 .
drwxr-xr-x. 5 vdsm kvm 4096 Sep 15 2020 ..
drwxr-xr-x. 2 vdsm kvm 4096 Feb 3 2021 1eb2b682-ef45-409d-9341-abcf61418619
drwxr-xr-x. 2 vdsm kvm 4096 Aug 11 2022 283838fd-43b5-42b9-a332-a7b980613188
drwxr-xr-x. 2 vdsm kvm 4096 Oct 4 2020 36237183-54a2-4232-b0b3-6d1a9457c23a
drwxr-xr-x. 2 vdsm kvm 4096 Jan 5 2022 366e7b29-2123-45c0-9190-dafead431651
drwxr-xr-x. 2 vdsm kvm 4096 May 20 09:29 3ea98df5-3ee8-4d21-8607-94284e7fff37
drwxr-xr-x. 2 vdsm kvm 4096 May 20 09:29 44c3611b-1205-48ca-9a84-007eaae03cbb
drwxr-xr-x. 2 vdsm kvm 4096 May 18 23:49 674b364c-b74a-4fff-a38b-ca1089103241
drwxr-xr-x. 2 vdsm kvm 4096 Oct 15 2021 6949a9e0-4cc0-4eee-be99-e88795cd6447
drwxr-xr-x. 2 vdsm kvm 4096 Jan 5 2022 8b618bad-6dfb-48de-90ad-f325bf50cd45
drwxr-xr-x. 2 vdsm kvm 4096 Feb 6 2021 91f1f606-0f26-4ef0-a2fe-cccde6c78e89
drwxr-xr-x. 2 vdsm kvm 4096 May 20 11:35 94f4bf41-2b2f-42bc-ae21-0e80205ba19a
drwxr-xr-x. 2 vdsm kvm 4096 May 20 11:35 a18f0c48-b3ac-4b89-9e52-fddaddbc8f4a
drwxr-xr-x. 2 vdsm kvm 4096 May 26 15:51 ac4a428d-961e-46ae-a279-f282fd9ecf94
drwxr-xr-x. 2 vdsm kvm 4096 May 26 15:51 ba1a624c-6995-4109-b770-7bbd1efc340e
drwxr-xr-x. 2 vdsm kvm 4096 Sep 15 2020 dec14b5b-69fe-4c51-abb4-d8296c547a0b
drwxr-xr-x. 2 vdsm kvm 4096 Sep 15 2020 e7a17f3b-7567-4c90-b58e-18f34d4686c0
drwxr-xr-x. 2 vdsm kvm 4096 May 20 09:29 ea786bee-b98e-44c0-a44a-a63af63b8c51
drwxr-xr-x. 2 vdsm kvm 4096 Apr 21 2021 ec37f395-3b4b-41df-8f5b-8a22b7e1273b
drwxr-xr-x. 2 vdsm kvm 4096 Feb 25 2022 f3e641c7-a2cc-42af-bcd1-b5ecdf437e03
drwxr-xr-x. 2 vdsm kvm 4096 May 18 23:55 f4c221a6-6ce4-44cc-8d32-d673e40aa915
drwxr-xr-x. 2 vdsm kvm 4096 May 20 11:35 f539c098-3f16-410f-938f-3de6ef37b017
2f2da2d7-596f-4a53-8a1b-a301f84b3b74/images/1eb2b682-ef45-409d-9341-abcf61418619:
total 104857624
drwxr-xr-x. 2 vdsm kvm 4096 Feb 3 2021 .
drwxr-xr-x. 23 vdsm kvm 4096 May 20 11:35 ..
-rwxr-xr-x. 1 vdsm kvm 107374182400 Feb 3 2021 e29b005f-d272-49f8-a162-41f908ca67a4
-rwxr-xr-x. 1 vdsm kvm 300 Feb 3 2021 e29b005f-d272-49f8-a162-41f908ca67a4.meta
2f2da2d7-596f-4a53-8a1b-a301f84b3b74/images/283838fd-43b5-42b9-a332-a7b980613188:
total 23223760
drwxr-xr-x. 2 vdsm kvm 4096 Aug 11 2022 .
drwxr-xr-x. 23 vdsm kvm 4096 May 20 11:35 ..
-rwxr-xr-x. 1 vdsm kvm 23781113856 Aug 11 2022 3273221e-1649-4e4a-816c-2ffec18e51c7
-rwxr-xr-x. 1 vdsm kvm 251 Aug 11 2022 3273221e-1649-4e4a-816c-2ffec18e51c7.meta
.
. etc
So... is there a way to get this back?
I tried "Import Domain" both with the disk manually mounted and
unmounted; both attempts failed, and I get errors in vdsm.log like this:
2023-06-15 15:57:55,166+1000 INFO (jsonrpc/0)
[storage.StorageServer.MountConnection] Creating directory
'/rhev/data-center/mnt/_backups_2f2da2d7-596f-4a53-8a1b-a301f84b3b74'
(storageServer:167)
2023-06-15 15:57:55,166+1000 INFO (jsonrpc/0) [storage.fileUtils]
Creating directory:
/rhev/data-center/mnt/_backups_2f2da2d7-596f-4a53-8a1b-a301f84b3b74
mode: None (fileUtils:201)
2023-06-15 15:57:55,166+1000 INFO (jsonrpc/0) [storage.Mount] mounting
/backups/2f2da2d7-596f-4a53-8a1b-a301f84b3b74 at
/rhev/data-center/mnt/_backups_2f2da2d7-596f-4a53-8a1b-a301f84b3b74
(mount:207)
2023-06-15 15:57:55,176+1000 ERROR (jsonrpc/0) [storage.HSM] Could not
connect to storageServer (hsm:2374)
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/vdsm/storage/hsm.py", line
2371, in connectStorageServer
conObj.connect()
File
"/usr/lib/python3.6/site-packages/vdsm/storage/storageServer.py", line
180, in connect
six.reraise(t, v, tb)
File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
raise value
File
"/usr/lib/python3.6/site-packages/vdsm/storage/storageServer.py", line
171, in connect
self._mount.mount(self.options, self._vfsType, cgroup=self.CGROUP)
File "/usr/lib/python3.6/site-packages/vdsm/storage/mount.py", line
210, in mount
cgroup=cgroup)
File "/usr/lib/python3.6/site-packages/vdsm/common/supervdsm.py",
line 56, in __call__
return callMethod()
File "/usr/lib/python3.6/site-packages/vdsm/common/supervdsm.py",
line 54, in <lambda>
**kwargs)
File "<string>", line 2, in mount
File "/usr/lib64/python3.6/multiprocessing/managers.py", line 772, in
_callmethod
raise convert_to_error(kind, result)
vdsm.storage.mount.MountError: Command ['/usr/bin/mount', '-t', 'posix',
'/backups/2f2da2d7-596f-4a53-8a1b-a301f84b3b74',
'/rhev/data-center/mnt/_backups_2f2da2d7-596f-4a53-8a1b-a301f84b3b74']
failed with rc=32 out=b'' err=b"mount:
/rhev/data-center/mnt/_backups_2f2da2d7-596f-4a53-8a1b-a301f84b3b74:
unknown filesystem type 'posix'.\n"
2023-06-15 15:57:55,176+1000 INFO (jsonrpc/0)
[storage.StorageDomainCache] Invalidating storage domain cache (sdc:74)
2023-06-15 15:57:55,176+1000 INFO (jsonrpc/0) [vdsm.api] FINISH
connectStorageServer return={'statuslist': [{'id':
'00000000-0000-0000-0000-000000000000', 'status': 477}]}
from=::ffff:172.20.1.160,47702,
flow_id=6693c76d-b62b-4e62-9b70-c87f2a199705,
task_id=682abcff-3040-46d7-aaa1-f2445f0a6698 (api:54)
2023-06-15 15:57:55,283+1000 INFO (jsonrpc/2) [vdsm.api] START
disconnectStorageServer(domType=6,
spUUID='00000000-0000-0000-0000-000000000000', conList=[{'password':
'********', 'vfs_type': 'posix', 'port': '', 'iqn': '', 'connection':
'/backups/2f2da2d7-596f-4a53-8a1b-a301f84b3b74', 'ipv6_enabled':
'false', 'id': '00000000-0000-0000-0000-000000000000', 'user': '',
'tpgt': '1'}]) from=::ffff:172.20.1.160,47702,
flow_id=0d3d4a3e-3523-4436-93b3-e22bff47f082,
task_id=11870123-365d-4b40-b481-be5d44098fc2 (api:48)
2023-06-15 15:57:55,284+1000 INFO (jsonrpc/2) [storage.Mount]
unmounting
/rhev/data-center/mnt/_backups_2f2da2d7-596f-4a53-8a1b-a301f84b3b74
(mount:215)
2023-06-15 15:57:55,290+1000 ERROR (jsonrpc/2) [storage.HSM] Could not
disconnect from storageServer (hsm:2480)
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/vdsm/storage/hsm.py", line
2476, in disconnectStorageServer
conObj.disconnect()
File
"/usr/lib/python3.6/site-packages/vdsm/storage/storageServer.py", line
202, in disconnect
self._mount.umount(True, True)
File "/usr/lib/python3.6/site-packages/vdsm/storage/mount.py", line
217, in umount
umount(self.fs_file, force=force, lazy=lazy, freeloop=freeloop)
File "/usr/lib/python3.6/site-packages/vdsm/common/supervdsm.py",
line 56, in __call__
return callMethod()
File "/usr/lib/python3.6/site-packages/vdsm/common/supervdsm.py",
line 54, in <lambda>
**kwargs)
File "<string>", line 2, in umount
File "/usr/lib64/python3.6/multiprocessing/managers.py", line 772, in
_callmethod
raise convert_to_error(kind, result)
vdsm.storage.mount.MountError: Command ['/usr/bin/umount', '-f', '-l',
'/rhev/data-center/mnt/_backups_2f2da2d7-596f-4a53-8a1b-a301f84b3b74']
failed with rc=32 out=b'' err=b'umount:
/rhev/data-center/mnt/_backups_2f2da2d7-596f-4a53-8a1b-a301f84b3b74: no
mount point specified.\n'
2023-06-15 15:57:55,290+1000 INFO (jsonrpc/2)
[storage.StorageDomainCache] Refreshing storage domain cache
(resize=False) (sdc:80)
2023-06-15 15:57:55,290+1000 INFO (jsonrpc/2) [storage.ISCSI] Scanning
iSCSI devices (iscsi:442)
2023-06-15 15:57:55,328+1000 INFO (jsonrpc/2) [storage.ISCSI] Scanning
iSCSI devices: 0.04 seconds (utils:390)
2023-06-15 15:57:55,328+1000 INFO (jsonrpc/2) [storage.HBA] Scanning FC
devices (hba:60)
2023-06-15 15:57:55,416+1000 INFO (jsonrpc/2) [storage.HBA] Scanning FC
devices: 0.09 seconds (utils:390)
2023-06-15 15:57:55,416+1000 INFO (jsonrpc/2) [storage.Multipath]
Waiting until multipathd is ready (multipath:112)
2023-06-15 15:57:57,458+1000 INFO (jsonrpc/2) [storage.Multipath]
Waited 2.04 seconds for multipathd (tries=2, ready=2) (multipath:139)
2023-06-15 15:57:57,458+1000 INFO (jsonrpc/2)
[storage.StorageDomainCache] Refreshing storage domain cache: 2.17
seconds (utils:390)
2023-06-15 15:57:57,458+1000 INFO (jsonrpc/2) [vdsm.api] FINISH
disconnectStorageServer return={'statuslist': [{'id':
'00000000-0000-0000-0000-000000000000', 'status': 477}]}
from=::ffff:172.20.1.160,47702,
flow_id=0d3d4a3e-3523-4436-93b3-e22bff47f082,
task_id=11870123-365d-4b40-b481-be5d44098fc2 (api:54)
2023-06-15 15:57:57,458+1000 INFO (jsonrpc/2) [jsonrpc.JsonRpcServer]
RPC call StoragePool.disconnectStorageServer took more than 1.00 seconds
to succeed: 2.18 (__init__:316)
any help appreciated
thanks
Tim
self-hosted engine with Local Storage
by Jorge Visentini
Hello.
I know that there is no logic to it, that there is no HA, redundancy and all
that... but is there any possibility that I can deploy the oVirt
self-hosted engine using an array of local disks on the host?
*In my scenario I don't need HA.*
*What I need:*
1 host with a self-hosted engine on a local storage domain, that's all. On
this host I will locally run some old VMs. Isolated VMs.
--
Att,
Jorge Visentini
+55 55 98432-9868