VM migration failed after upgrade to 4.4
by joris dedieu
Hi list,
I'm troubleshooting an issue with live migration in an upgrade context.
I have a 4.3 cluster that I am trying to upgrade to 4.4. The engine is OK on
4.4 and I have already upgraded a single host, which kindly
reintegrated the cluster.
The host is OK. I can start VMs on it and migrate VMs from it to other hosts.
But if I try to migrate a VM from a 4.3 host to the 4.4 host, it fails with:
* on the host : qemu-kvm: terminating on signal 15 from pid XXXX
(<unknown process>)
* on the engine : VM 'VM_UUID' was unexpectedly detected as 'Down' on
VDS 'HOST_UUID'
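For context: "terminating on signal 15" means qemu was deliberately torn down (normally by libvirt after the migration handshake failed), so the underlying error is usually logged slightly earlier on the destination host. A minimal sketch for pulling those lines out (default vdsm log path assumed; run on the destination host):

```shell
# Signal 15 = qemu was killed, typically by libvirt after the migration
# failed; the real cause is normally logged just before it in vdsm.log.
LOG=/var/log/vdsm/vdsm.log
if [ -f "$LOG" ]; then
    # show recent migration-related errors with line numbers
    grep -n -iE 'migration.*(error|fail)' "$LOG" | tail -20
else
    echo "vdsm log not found at $LOG (run this on the destination host)"
fi
```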
Can anyone here help me find out what's going on? That would be great.
Regards
Joris
1 year, 9 months
Cannot start VM with vGPU (NVIDIA)
by pawel.osadtschy01@dhl.com
Hi, I need some help!
Maybe the problem is very simple, but I haven't found a solution:
- oVirt 4.5.4 on RHEL 8.7
- amd_iommu is enabled on the host with an NVIDIA A40 card
I can see my cards in the vGPU settings for Windows VMs (Windows 10 Pro).
Without the vGPU device activated, the VM starts normally.
With one active vGPU device the VM cannot start. Error message on the host:
Jun 07 13:39:51 depotlsa8ovh1 kernel: [nvidia-vgpu-vfio] e80ae200-4cea-4213-9e78-ebf0b86a756a: start failed. status: 0x0 Timeout Occured
Jun 07 13:39:51 depotlsa8ovh1 libvirtd[3278]: Cannot read from monitor: Connection reset by peer
Jun 07 13:39:51 depotlsa8ovh1 libvirtd[3278]: internal error: qemu unexpectedly closed the monitor: 2023-06-07T11:39:51.494056Z qemu-kvm: -device vfio-pci-nohotplug,id=hostdev0,sysfsdev=/sys/bus/mdev/devices/e80ae200-4cea-4213-9e78-ebf0b86a756a,display=on,ramfb=on,bus=pci.7,addr=0x0: vfio e80ae200-4cea-4213-9e78-ebf0b86a756a: error getting device from group 156: Connection timed out. Verify all devices in group 156 are bound to vfio-<bus> or pci-stub and not already in use.
It looks like libvirtd cannot get the device. Is this a problem with the NVIDIA settings on the host, or with the VM settings?
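The qemu error explicitly asks to verify that every device in IOMMU group 156 is bound to a vfio driver. One quick way to check that on the host (group number taken from the error message; standard Linux sysfs layout assumed):

```shell
# List every device sharing the IOMMU group from the error message and
# show which kernel driver each one is currently bound to.
GROUP=156   # from the qemu-kvm error; adjust for your host
DIR=/sys/kernel/iommu_groups/$GROUP/devices
if [ -d "$DIR" ]; then
    for dev in "$DIR"/*; do
        addr=$(basename "$dev")
        # the 'driver' entry is a symlink to the bound driver, if any
        drv=$(basename "$(readlink "/sys/bus/pci/devices/$addr/driver" 2>/dev/null)")
        echo "$addr -> ${drv:-no driver bound}"
    done
else
    echo "IOMMU group $GROUP not present on this machine"
fi
```

If anything in the group shows a driver other than vfio-pci (or nvidia's vgpu stack) or is already in use, that would match the "error getting device from group 156" message.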
Thank you
Paolo
Re: Resending with Log: Hosted Engine Deploy Fails at "Waiting for Host to be up"
by Gilboa Davara
Hello,
I'm seeing the same thing when trying to install the nightly on a high-end
server (either directly over NFS or on a Gluster cluster).
It seems that oVirt deployment fails to change the HE IP address after
deployment.
I'm thinking about trying other options, such as switching to Oracle (ugh!)
/ oVirt 4.4, which seems to be stable and has long-term support.
- Gilboa
On Sun, Jun 11, 2023 at 4:30 PM Angel, Christopher <
christopher.angel(a)usask.ca> wrote:
> I’ve installed 3 Ovirt Nodes and am trying to set up the hosted engine.
> Every time I run it however, it fails at the ‘Waiting for host to be up’
> stage. I’ve attached the relevant log.
>
>
>
> --
>
> Christopher Angel, B.Eng, B.Sc
>
> Laboratory Systems Analyst, Computer Science Department
>
> University of Saskatchewan
>
> Christopher.angel(a)usask.ca
>
> 3069661434
>
>
> _______________________________________________
> Users mailing list -- users(a)ovirt.org
> To unsubscribe send an email to users-leave(a)ovirt.org
> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/VN7NXEPRDA7...
>
Unable to find a match: cockpit-ovirt-dashboard
by R A
Hello,
maybe someone can help me.
I am using an oVirt 4.5 standalone engine with a local database on Rocky Linux 9 on server A. On servers B, C and D I have installed oVirt Node 4.5, and now I want to use Gluster hyperconverged via Cockpit with 4 servers (one for the standalone engine).
Ovirt 4.5 Standalone Engine is already running on Server A.
My problem is that I can't install cockpit-ovirt-dashboard:
[root@pluto volrath]# yum install cockpit-ovirt-dashboard
Last metadata expiration check: 1:41:33 ago on Thu Jun 15 19:36:53 2023.
No match for argument: cockpit-ovirt-dashboard
Error: Unable to find a match: cockpit-ovirt-dashboard
I have followed the instructions in the section "Installing on RHEL 9.0 or derivates" from https://www.ovirt.org/download/install_on_rhel.html, but I am still not able to find the Cockpit plugin.
ansible-runner is installed too.
What am I missing? Or does Rocky Linux not support the Cockpit plugin?
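One way to narrow this down is to ask dnf whether any enabled repository carries the package at all, which distinguishes "wrong/missing repo" from a local installation problem. A sketch (plain dnf, no oVirt-specific tooling assumed):

```shell
# Query the enabled repos for the package. An empty repoquery result
# means no enabled repository ships it for this release at all.
PKG=cockpit-ovirt-dashboard
if command -v dnf >/dev/null 2>&1; then
    dnf -q repoquery "$PKG" || true
    dnf -q provides "$PKG" 2>/dev/null || echo "no match for $PKG in enabled repos"
else
    echo "dnf not available on this machine"
fi
```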
BR
Peter
Re: Resending with Log: Hosted Engine Deploy Fails at "Waiting for Host to be up"
by Thomas Hoberg
I have seen this type of behavior when building a HCI cluster on Atoms.
The problem is that at this point the VM generated for the management engine has a machine type above what the hardware actually supports.
Since it's not the first VM run during the setup process, it's not really intuitive, but the libvirt configuration for that prototypical management engine is created by code that assumes too modern a default (I never found the source, but since all development has ceased it won't matter any more).
While modern Atoms are actually more like Core CPUs in terms of the instruction set they support, my Goldmont Plus/Gemini Lake Atoms were treated by KVM as roughly a Nehalem CPU, and the VM would refuse to start.
It's very hard to find in the logs, because it is actually a KVM issue (created by the oVirt ME creation mechanism, of course).
I got out of that fix using removable storage: I ran the setup on an i7-7700K (which was a bit faster, too) and then changed the machine type of the management engine (and the lowest common denominator for the cluster) to Nehalem before transplanting the SSD back into the Atom.
I've gone through that exercise with oVirt 4.3 and again with Oracle's 4.4 variant, which is by far the most stable oVirt/RHEV available right now.
Windows VMs pause often when disk images are located on CephFS storage
by change_jeeringly679@dralias.com
Hey guys,
I have tried to write on the IRC channel regarding this issue, but I'm not sure if I'm doing it wrong or if there are simply not many people watching the #ovirt channel.
We have deployed a 6-node oVirt cluster and have moved roughly 100 VMs onto it. We started the cluster on NFS storage while we were building and testing our Ceph cluster. Ceph has been ready for a few months now, and we have tested running oVirt VMs on it using RBD, CephFS, and the NFS Ganesha server that can be enabled on Ceph.
Initially I tested RBD using the cinderlib functionality in oVirt. This appears to work fine, and I can live with the fact that live storage migrations from POSIX storage are not possible. The biggest hurdle I encountered is that the backup APIs in oVirt cannot be used to create and download backups. This breaks our RHV Veeam backup solution, which we have grown quite fond of. But even my earlier homegrown backup solutions don't work, as they use the same APIs.
For this reason we have now changed to CephFS. It has a different set of problems: we can only supply one monitor when mounting the CephFS storage, making it less robust than it should be, and it makes metadata access dependent on the MDS. As far as I understand, data access still goes directly to the OSDs where the data resides. I have multiple MDS containers on the Ceph cluster for load balancing and redundancy, but it still feels less tidy than RBD with native Ceph support. The good thing is that, as CephFS is a POSIX filesystem, the backup APIs work and so does Veeam backup.
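On the single-monitor point: the kernel CephFS client itself accepts a comma-separated monitor list in the mount source, so it may be worth testing whether the same string works as the path of the oVirt POSIX storage domain. A sketch with placeholder monitor names and secret file (dry run: it prints the command rather than executing it, since it needs a real cluster):

```shell
# The kernel CephFS client takes several monitors in one mount source,
# removing the single point of failure at mount time.
# Monitor hostnames and the secret file below are placeholders.
MONS="mon1:6789,mon2:6789,mon3:6789"
CMD="mount -t ceph $MONS:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret"
# Dry run: print the invocation instead of executing it.
echo "$CMD"
```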
The biggest problem I am struggling with is the unexplained pausing of VMs, exclusively those running the Windows operating system. This is why I didn't notice it initially, as I had been testing the storage with Linux VMs. The machines pause at a random moment with a "Storage Error". I cannot resume a paused machine, not even using virsh on the host it is running on. The only thing I can do is power it off and reboot, and in many cases it then pauses again during the boot process. I have been watching the VDSM logs and couldn't work out why this happens. When I move the disk to plain NFS storage (not on Ceph), this never happens. The result is that most Windows-based VMs have not yet been moved to CephFS. I have 0 occurrences of this with Linux VMs, of which I have many (I think we have 80 Linux VMs vs 20 Windows VMs).
The Ceph cluster does not show any problems before, during or after this happens. Is there anyone with experience with oVirt and Ceph who can share experiences or help me find the root cause of this problem? I have a couple of things I can proceed with:
1. Move a couple of Windows VMs to RBD, and install a Veeam agent on the servers instead. This just opens a new can of worms, as it requires some work on each new server I add, and I also need to open the network from the servers to the backup repos. It is just a little less neat than the RHV Veeam backup solution using the hypervisor.
2. Move a couple of Windows VMs to the NFS Ganesha implementation. This means ALL traffic goes through the NFS containers created on the Ceph cluster, and I lose the distributed nature of the oVirt hosts talking directly to the OSDs. If I were to go this way, I should probably create some NFS Ganesha servers that connect to Ceph natively on one end and provide NFS services to oVirt on the other.
Both tests would still exercise Ceph, but via an alternative method to CephFS. My preferred solution really is 1., were it not for the backup APIs being rendered useless. Is work still being carried out on development in these areas, or has oVirt/Ceph development ceased?
Thoughts and comments are welcome. Looking forward to sparring with someone that has experience with this :-)
With kind regards
Jelle
RHEL 9 (rocky/centos/..) guest console problem
by marek
Hi,
I know SPICE/QXL is deprecated,
so I have a guest VM with the console configured as VGA/VNC (tried Bochs/VNC too).
virt-viewer from Windows cannot connect to the console (the "Console" button in
the oVirt admin portal).
If I use the VNC parameters from "console.vv" I can connect with the TigerVNC
client (TightVNC does not work).
Any ideas what could be wrong?
By the way, is there an option to prolong the password validity in console.vv? There is
a note in the file: "# Password is valid for 120 seconds."
Marek
Trying to recover local domain from failed host
by Tim Tuck
Hi all,
I had a catastrophic failure of a host, but the disk with the storage domain
on it is fine.
I've put the disk in a new machine, mounted it on /backups and I can see
this structure ...
[root@max backups]# ls -laR *
2f2da2d7-596f-4a53-8a1b-a301f84b3b74:
total 20
drwxr-xr-x. 5 vdsm kvm 4096 Sep 15 2020 .
drwxr-xr-x. 4 vdsm kvm 4096 Jun 10 11:53 ..
drwxr-xr-x. 2 vdsm kvm 4096 Sep 15 2020 dom_md
drwxr-xr-x. 23 vdsm kvm 4096 May 20 11:35 images
drwxr-xr-x. 4 vdsm kvm 4096 Sep 15 2020 master
2f2da2d7-596f-4a53-8a1b-a301f84b3b74/dom_md:
total 32780
drwxr-xr-x. 2 vdsm kvm 4096 Sep 15 2020 .
drwxr-xr-x. 5 vdsm kvm 4096 Sep 15 2020 ..
-rwxr-xr-x. 1 vdsm kvm 0 Sep 15 2020 ids
-rwxr-xr-x. 1 vdsm kvm 16777216 Sep 15 2020 inbox
-rwxr-xr-x. 1 vdsm kvm 0 Sep 15 2020 leases
-rwxr-xr-x. 1 vdsm kvm 486 Sep 15 2020 metadata
-rwxr-xr-x. 1 vdsm kvm 16777216 Sep 15 2020 outbox
2f2da2d7-596f-4a53-8a1b-a301f84b3b74/images:
total 92
drwxr-xr-x. 23 vdsm kvm 4096 May 20 11:35 .
drwxr-xr-x. 5 vdsm kvm 4096 Sep 15 2020 ..
drwxr-xr-x. 2 vdsm kvm 4096 Feb 3 2021 1eb2b682-ef45-409d-9341-abcf61418619
drwxr-xr-x. 2 vdsm kvm 4096 Aug 11 2022 283838fd-43b5-42b9-a332-a7b980613188
drwxr-xr-x. 2 vdsm kvm 4096 Oct 4 2020 36237183-54a2-4232-b0b3-6d1a9457c23a
drwxr-xr-x. 2 vdsm kvm 4096 Jan 5 2022 366e7b29-2123-45c0-9190-dafead431651
drwxr-xr-x. 2 vdsm kvm 4096 May 20 09:29 3ea98df5-3ee8-4d21-8607-94284e7fff37
drwxr-xr-x. 2 vdsm kvm 4096 May 20 09:29 44c3611b-1205-48ca-9a84-007eaae03cbb
drwxr-xr-x. 2 vdsm kvm 4096 May 18 23:49 674b364c-b74a-4fff-a38b-ca1089103241
drwxr-xr-x. 2 vdsm kvm 4096 Oct 15 2021 6949a9e0-4cc0-4eee-be99-e88795cd6447
drwxr-xr-x. 2 vdsm kvm 4096 Jan 5 2022 8b618bad-6dfb-48de-90ad-f325bf50cd45
drwxr-xr-x. 2 vdsm kvm 4096 Feb 6 2021 91f1f606-0f26-4ef0-a2fe-cccde6c78e89
drwxr-xr-x. 2 vdsm kvm 4096 May 20 11:35 94f4bf41-2b2f-42bc-ae21-0e80205ba19a
drwxr-xr-x. 2 vdsm kvm 4096 May 20 11:35 a18f0c48-b3ac-4b89-9e52-fddaddbc8f4a
drwxr-xr-x. 2 vdsm kvm 4096 May 26 15:51 ac4a428d-961e-46ae-a279-f282fd9ecf94
drwxr-xr-x. 2 vdsm kvm 4096 May 26 15:51 ba1a624c-6995-4109-b770-7bbd1efc340e
drwxr-xr-x. 2 vdsm kvm 4096 Sep 15 2020 dec14b5b-69fe-4c51-abb4-d8296c547a0b
drwxr-xr-x. 2 vdsm kvm 4096 Sep 15 2020 e7a17f3b-7567-4c90-b58e-18f34d4686c0
drwxr-xr-x. 2 vdsm kvm 4096 May 20 09:29 ea786bee-b98e-44c0-a44a-a63af63b8c51
drwxr-xr-x. 2 vdsm kvm 4096 Apr 21 2021 ec37f395-3b4b-41df-8f5b-8a22b7e1273b
drwxr-xr-x. 2 vdsm kvm 4096 Feb 25 2022 f3e641c7-a2cc-42af-bcd1-b5ecdf437e03
drwxr-xr-x. 2 vdsm kvm 4096 May 18 23:55 f4c221a6-6ce4-44cc-8d32-d673e40aa915
drwxr-xr-x. 2 vdsm kvm 4096 May 20 11:35 f539c098-3f16-410f-938f-3de6ef37b017
2f2da2d7-596f-4a53-8a1b-a301f84b3b74/images/1eb2b682-ef45-409d-9341-abcf61418619:
total 104857624
drwxr-xr-x. 2 vdsm kvm 4096 Feb 3 2021 .
drwxr-xr-x. 23 vdsm kvm 4096 May 20 11:35 ..
-rwxr-xr-x. 1 vdsm kvm 107374182400 Feb 3 2021 e29b005f-d272-49f8-a162-41f908ca67a4
-rwxr-xr-x. 1 vdsm kvm 300 Feb 3 2021 e29b005f-d272-49f8-a162-41f908ca67a4.meta
2f2da2d7-596f-4a53-8a1b-a301f84b3b74/images/283838fd-43b5-42b9-a332-a7b980613188:
total 23223760
drwxr-xr-x. 2 vdsm kvm 4096 Aug 11 2022 .
drwxr-xr-x. 23 vdsm kvm 4096 May 20 11:35 ..
-rwxr-xr-x. 1 vdsm kvm 23781113856 Aug 11 2022 3273221e-1649-4e4a-816c-2ffec18e51c7
-rwxr-xr-x. 1 vdsm kvm 251 Aug 11 2022 3273221e-1649-4e4a-816c-2ffec18e51c7.meta
.
. etc
So... is there a way to get this back?
I tried "Import Domain" both with the disk manually mounted and
unmounted; both attempts failed, and I get errors in vdsm.log like this:
2023-06-15 15:57:55,166+1000 INFO (jsonrpc/0)
[storage.StorageServer.MountConnection] Creating directory
'/rhev/data-center/mnt/_backups_2f2da2d7-596f-4a53-8a1b-a301f84b3b74'
(storageServer:167)
2023-06-15 15:57:55,166+1000 INFO (jsonrpc/0) [storage.fileUtils]
Creating directory:
/rhev/data-center/mnt/_backups_2f2da2d7-596f-4a53-8a1b-a301f84b3b74
mode: None (fileUtils:201)
2023-06-15 15:57:55,166+1000 INFO (jsonrpc/0) [storage.Mount] mounting
/backups/2f2da2d7-596f-4a53-8a1b-a301f84b3b74 at
/rhev/data-center/mnt/_backups_2f2da2d7-596f-4a53-8a1b-a301f84b3b74
(mount:207)
2023-06-15 15:57:55,176+1000 ERROR (jsonrpc/0) [storage.HSM] Could not
connect to storageServer (hsm:2374)
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/vdsm/storage/hsm.py", line
2371, in connectStorageServer
conObj.connect()
File
"/usr/lib/python3.6/site-packages/vdsm/storage/storageServer.py", line
180, in connect
six.reraise(t, v, tb)
File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
raise value
File
"/usr/lib/python3.6/site-packages/vdsm/storage/storageServer.py", line
171, in connect
self._mount.mount(self.options, self._vfsType, cgroup=self.CGROUP)
File "/usr/lib/python3.6/site-packages/vdsm/storage/mount.py", line
210, in mount
cgroup=cgroup)
File "/usr/lib/python3.6/site-packages/vdsm/common/supervdsm.py",
line 56, in __call__
return callMethod()
File "/usr/lib/python3.6/site-packages/vdsm/common/supervdsm.py",
line 54, in <lambda>
**kwargs)
File "<string>", line 2, in mount
File "/usr/lib64/python3.6/multiprocessing/managers.py", line 772, in
_callmethod
raise convert_to_error(kind, result)
vdsm.storage.mount.MountError: Command ['/usr/bin/mount', '-t', 'posix',
'/backups/2f2da2d7-596f-4a53-8a1b-a301f84b3b74',
'/rhev/data-center/mnt/_backups_2f2da2d7-596f-4a53-8a1b-a301f84b3b74']
failed with rc=32 out=b'' err=b"mount:
/rhev/data-center/mnt/_backups_2f2da2d7-596f-4a53-8a1b-a301f84b3b74:
unknown filesystem type 'posix'.\n"
2023-06-15 15:57:55,176+1000 INFO (jsonrpc/0)
[storage.StorageDomainCache] Invalidating storage domain cache (sdc:74)
2023-06-15 15:57:55,176+1000 INFO (jsonrpc/0) [vdsm.api] FINISH
connectStorageServer return={'statuslist': [{'id':
'00000000-0000-0000-0000-000000000000', 'status': 477}]}
from=::ffff:172.20.1.160,47702,
flow_id=6693c76d-b62b-4e62-9b70-c87f2a199705,
task_id=682abcff-3040-46d7-aaa1-f2445f0a6698 (api:54)
2023-06-15 15:57:55,283+1000 INFO (jsonrpc/2) [vdsm.api] START
disconnectStorageServer(domType=6,
spUUID='00000000-0000-0000-0000-000000000000', conList=[{'password':
'********', 'vfs_type': 'posix', 'port': '', 'iqn': '', 'connection':
'/backups/2f2da2d7-596f-4a53-8a1b-a301f84b3b74', 'ipv6_enabled':
'false', 'id': '00000000-0000-0000-0000-000000000000', 'user': '',
'tpgt': '1'}]) from=::ffff:172.20.1.160,47702,
flow_id=0d3d4a3e-3523-4436-93b3-e22bff47f082,
task_id=11870123-365d-4b40-b481-be5d44098fc2 (api:48)
2023-06-15 15:57:55,284+1000 INFO (jsonrpc/2) [storage.Mount]
unmounting
/rhev/data-center/mnt/_backups_2f2da2d7-596f-4a53-8a1b-a301f84b3b74
(mount:215)
2023-06-15 15:57:55,290+1000 ERROR (jsonrpc/2) [storage.HSM] Could not
disconnect from storageServer (hsm:2480)
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/vdsm/storage/hsm.py", line
2476, in disconnectStorageServer
conObj.disconnect()
File
"/usr/lib/python3.6/site-packages/vdsm/storage/storageServer.py", line
202, in disconnect
self._mount.umount(True, True)
File "/usr/lib/python3.6/site-packages/vdsm/storage/mount.py", line
217, in umount
umount(self.fs_file, force=force, lazy=lazy, freeloop=freeloop)
File "/usr/lib/python3.6/site-packages/vdsm/common/supervdsm.py",
line 56, in __call__
return callMethod()
File "/usr/lib/python3.6/site-packages/vdsm/common/supervdsm.py",
line 54, in <lambda>
**kwargs)
File "<string>", line 2, in umount
File "/usr/lib64/python3.6/multiprocessing/managers.py", line 772, in
_callmethod
raise convert_to_error(kind, result)
vdsm.storage.mount.MountError: Command ['/usr/bin/umount', '-f', '-l',
'/rhev/data-center/mnt/_backups_2f2da2d7-596f-4a53-8a1b-a301f84b3b74']
failed with rc=32 out=b'' err=b'umount:
/rhev/data-center/mnt/_backups_2f2da2d7-596f-4a53-8a1b-a301f84b3b74: no
mount point specified.\n'
2023-06-15 15:57:55,290+1000 INFO (jsonrpc/2)
[storage.StorageDomainCache] Refreshing storage domain cache
(resize=False) (sdc:80)
2023-06-15 15:57:55,290+1000 INFO (jsonrpc/2) [storage.ISCSI] Scanning
iSCSI devices (iscsi:442)
2023-06-15 15:57:55,328+1000 INFO (jsonrpc/2) [storage.ISCSI] Scanning
iSCSI devices: 0.04 seconds (utils:390)
2023-06-15 15:57:55,328+1000 INFO (jsonrpc/2) [storage.HBA] Scanning FC
devices (hba:60)
2023-06-15 15:57:55,416+1000 INFO (jsonrpc/2) [storage.HBA] Scanning FC
devices: 0.09 seconds (utils:390)
2023-06-15 15:57:55,416+1000 INFO (jsonrpc/2) [storage.Multipath]
Waiting until multipathd is ready (multipath:112)
2023-06-15 15:57:57,458+1000 INFO (jsonrpc/2) [storage.Multipath]
Waited 2.04 seconds for multipathd (tries=2, ready=2) (multipath:139)
2023-06-15 15:57:57,458+1000 INFO (jsonrpc/2)
[storage.StorageDomainCache] Refreshing storage domain cache: 2.17
seconds (utils:390)
2023-06-15 15:57:57,458+1000 INFO (jsonrpc/2) [vdsm.api] FINISH
disconnectStorageServer return={'statuslist': [{'id':
'00000000-0000-0000-0000-000000000000', 'status': 477}]}
from=::ffff:172.20.1.160,47702,
flow_id=0d3d4a3e-3523-4436-93b3-e22bff47f082,
task_id=11870123-365d-4b40-b481-be5d44098fc2 (api:54)
2023-06-15 15:57:57,458+1000 INFO (jsonrpc/2) [jsonrpc.JsonRpcServer]
RPC call StoragePool.disconnectStorageServer took more than 1.00 seconds
to succeed: 2.18 (__init__:316)
any help appreciated
thanks
Tim
self-hosted engine with Local Storage
by Jorge Visentini
Hello.
I know that there is no logic to it, that there is no HA, redundancy and all
that... but is there any possibility that I can deploy the oVirt
self-hosted engine using an array of local disks on the host?
*In my scenario I don't need HA.*
*What I need:*
1 host with a self-hosted engine on a local storage domain, that's all. On
this host I will locally run some old VMs. Isolated VMs.
--
Att,
Jorge Visentini
+55 55 98432-9868