Cannot start VM with vGPU (NVIDIA)

Hi, I need some help! Maybe the problem is very simple, but I have not found a solution:

- oVirt 4.5.4 on RHEL 8.7
- amd_iommu is on on the host, which has an NVIDIA A40 card

I can see my cards in the vGPU settings for Windows VMs (Windows 10 Pro). Without the vGPU device activated, the VM starts normally. With one active vGPU card, the VM cannot start. Error messages on the host:

Jun 07 13:39:51 depotlsa8ovh1 kernel: [nvidia-vgpu-vfio] e80ae200-4cea-4213-9e78-ebf0b86a756a: start failed. status: 0x0 Timeout Occured
Jun 07 13:39:51 depotlsa8ovh1 libvirtd[3278]: Unable to read from monitor: Connection reset by peer
Jun 07 13:39:51 depotlsa8ovh1 libvirtd[3278]: internal error: qemu unexpectedly closed the monitor: 2023-06-07T11:39:51.494056Z qemu-kvm: -device vfio-pci-nohotplug,id=hostdev0,sysfsdev=/sys/bus/mdev/devices/e80ae200-4cea-4213-9e78-ebf0b86a756a,display=on,ramfb=on,bus=pci.7,addr=0x0: vfio e80ae200-4cea-4213-9e78-ebf0b86a756a: error getting device from group 156: Connection timed out. Verify all devices in group 156 are bound to vfio-<bus> or pci-stub and not already in use.

It looks like libvirtd cannot get any device. Is this a problem with the NVIDIA settings on the host or with the VM settings?

Thank you
Paolo
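The QEMU message above names both the mdev UUID and the IOMMU group, and asks to verify which drivers the devices in that group are bound to. A minimal Python sketch of that check follows. It assumes the usual sysfs layout (/sys/kernel/iommu_groups/<group>/devices/<device>/driver); the function names parse_vfio_error and group_drivers are made up for this illustration and are not part of oVirt, libvirt, or the NVIDIA tooling.

```python
import os
import re

# Matches the QEMU vfio error quoted in the post (UUID and group number).
VFIO_ERR = re.compile(
    r"vfio (?P<uuid>[0-9a-f-]{36}): error getting device from group (?P<group>\d+)"
)


def parse_vfio_error(line):
    """Return (mdev_uuid, iommu_group) from a QEMU vfio error line, or None."""
    match = VFIO_ERR.search(line)
    if match is None:
        return None
    return match.group("uuid"), int(match.group("group"))


def group_drivers(group, sysfs_root="/sys"):
    """Map each device in an IOMMU group to its bound driver name (or None).

    Assumption: devices are listed under
    <sysfs_root>/kernel/iommu_groups/<group>/devices and each has an optional
    'driver' symlink. sysfs_root is a parameter only to make this testable.
    """
    devices_dir = os.path.join(
        sysfs_root, "kernel", "iommu_groups", str(group), "devices"
    )
    drivers = {}
    if not os.path.isdir(devices_dir):
        return drivers  # group does not exist (or sysfs is not mounted here)
    for name in sorted(os.listdir(devices_dir)):
        link = os.path.join(devices_dir, name, "driver")
        if os.path.islink(link):
            drivers[name] = os.path.basename(os.readlink(link))
        else:
            drivers[name] = None  # no driver bound
    return drivers


if __name__ == "__main__":
    log_line = (
        "vfio e80ae200-4cea-4213-9e78-ebf0b86a756a: "
        "error getting device from group 156: Connection timed out."
    )
    parsed = parse_vfio_error(log_line)
    if parsed is not None:
        uuid, group = parsed
        print(uuid, group, group_drivers(group))
```

Run on the host, this only reports what is bound where; it does not explain the "Timeout Occured" from the nvidia-vgpu-vfio driver itself.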

You mention amd_iommu but you are using Nvidia and I doubt this is the correct approach. Have you checked the procedure at: https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.4/htm... It seems quite extensive and if your Nvidia supports vGPU, you can give it a try.

Best Regards,
Strahil Nikolov

_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-leave@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/55NGZTV55BFLRM...

Hi Strahil,

thank you for your email. I repeated all the steps from the manual. I think my problem had to do with rebuilding the initramfs (dracut --force). After repeating it, the VM can be started and the NVIDIA card can be configured.

During start I can see other errors on the host:

kernel: [nvidia-vgpu-vfio] b21e49d5-c13c-4695-83b8-212cd2f0830a: vGPU migration disabled
vdsm[4487]: ERROR Error getting managedvolume connector info: Managed Volume Helper failed….. No module named 'importlib_resources'

But my first problem looks solved.

Regards
Pawel
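The remaining vdsm error says that the Python module importlib_resources cannot be found. A quick way to confirm this is a small check like the sketch below. It assumes it is run with the same Python interpreter that the failing helper uses, which the thread does not specify; the function name module_available is made up for this illustration.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module can be located by this interpreter."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Module name taken from the vdsm error message in the post.
    for name in ("importlib_resources",):
        state = "found" if module_available(name) else "MISSING"
        print(f"{name}: {state}")
```

If the module is reported as missing, that matches the error in the log; which package provides it on this host is not stated in the thread.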

Hi Strahil,

maybe you can help me again: I have set an alias name with SSO_ALTERNATE_ENGINE_FQDNS=….. I can reach the first page under the alias name, but if I click, for example, on "Administration Portal", I receive a 500 error. The error_log shows something like:

auth_openidc:error…the "state" and "session" cookies will not be shared between the two!

But that does not seem to be the problem.

Thank you in advance
Regards
Pawel

participants (3)
- Osadtschy,Pawel IT4 Logistics
- pawel.osadtschy01@dhl.com
- Strahil Nikolov