vGPU with NVIDIA M60 mdev_type not showing

Hi all,

I have a host with 2 M60 cards with the latest supported driver installed and working, as you can see:

[root@esxh-03 vdsm]# lsmod | grep vfio
nvidia_vgpu_vfio       49475  0
nvidia              16633974  1 nvidia_vgpu_vfio
vfio_mdev              12841  0
mdev                   20336  2 vfio_mdev,nvidia_vgpu_vfio
vfio_iommu_type1       22300  0
vfio                   32656  3 vfio_mdev,nvidia_vgpu_vfio,vfio_iommu_type1

[root@esxh-03 vdsm]# nvidia-smi
Mon Jan 14 17:39:30 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.91       Driver Version: 410.91       CUDA Version: N/A      |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M60           Off  | 00000000:05:00.0 Off |                  Off |
| 16%   27C    P0    41W / 120W |     14MiB /  8191MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla M60           Off  | 00000000:06:00.0 Off |                  Off |
| 17%   24C    P0    39W / 120W |     14MiB /  8191MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla M60           Off  | 00000000:84:00.0 Off |                  Off |
| 15%   28C    P0    41W / 120W |     14MiB /  8191MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla M60           Off  | 00000000:85:00.0 Off |                  Off |
| 16%   25C    P0    40W / 120W |     14MiB /  8191MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

But the issue is that when I do:

# vdsm-client Host hostdevListByCaps

I don't see any "mdev" device. Also the directory /sys/class/mdev_bus does not exist. Am I missing something?

Cheers.
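[Editor's note: for anyone debugging the same symptom, the sysfs side can be checked directly, without going through VDSM. A minimal sketch, assuming the standard kernel mdev sysfs layout; the PCI address is one of the GPUs from the nvidia-smi output above:]

```shell
# One of the GPU PCI addresses from the nvidia-smi output above.
GPU=0000:05:00.0

# When the vGPU driver is working, it registers mediated device types
# in sysfs; if /sys/class/mdev_bus is missing, mdev is not set up yet.
if [ -d /sys/class/mdev_bus ]; then
    ls "/sys/bus/pci/devices/$GPU/mdev_supported_types"
else
    echo "no mdev bus: the driver has not registered any mdev types"
fi
```

If the directory exists but the listing is empty, the driver loaded without vGPU support; if the directory is missing entirely, as here, the mediated device framework was never engaged by the driver.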

Hi,

I am using CentOS 7.6 and the latest oVirt release. Is it possible that the package vdsm-hook-vfio-mdev is needed? As far as I understand it is already deprecated, but I cannot find anything in the documentation.

[root@esxh-03 ~]# yum install vdsm-hook-vfio-mdev
Loaded plugins: enabled_repos_upload, fastestmirror, package_upload, product-id, search-disabled-repos, subscription-manager, vdsmupgrade
This system is not registered with an entitlement server. You can use subscription-manager to register.
Loading mirror speeds from cached hostfile
 * base: mirror2.hs-esslingen.de
 * extras: mirror2.hs-esslingen.de
 * ovirt-4.2: ftp.plusline.net
 * ovirt-4.2-epel: ftp-stud.hs-esslingen.de
 * updates: ftp.fau.de
Package vdsm-hook-vfio-mdev-4.20.35-1.el7.noarch is obsoleted by vdsm-4.20.43-1.el7.x86_64 which is already installed
Nothing to do

OS Version: RHEL - 7 - 6.1810.2.el7.centos
OS Description: CentOS Linux 7 (Core)
Kernel Version: 3.10.0 - 957.1.3.el7.x86_64
KVM Version: 2.12.0 - 18.el7_6.1.1
LIBVIRT Version: libvirt-4.5.0-10.el7_6.3
VDSM Version: vdsm-4.20.43-1.el7
SPICE Version: 0.14.0 - 6.el7
CEPH Version: librbd1-10.2.5-4.el7
Open vSwitch Version: openvswitch-2.9.0-4.el7
Kernel Features: PTI: 1, IBRS: 0, RETP: 1

Cheers

On 14/1/19 17:45, Josep Manel Andrés Moscardó wrote:
Hi all, I have a host with 2 M60 with the latest supported driver installed, and working as you can see:
[...]
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-leave@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/YR4TYFM24TYQP6...
--
Josep Manel Andrés Moscardó
Systems Engineer, IT Operations
EMBL Heidelberg
T +49 6221 387-8394

Josep Manel Andrés Moscardó <josep.moscardo@embl.de> writes:
Hi all, I have a host with 2 M60 with the latest supported driver installed, and working as you can see:
Hi, all looks fine and the same as on my setup, which is working.

How about the kernel command line (cat /proc/cmdline)? It's important to have intel_iommu=on there (assuming an Intel machine).

[...]
Is it possible that the package vdsm-hook-vfio-mdev is needed? As far as I understand it is already deprecated, but I cannot find anything on the documentation.
The hook is no longer needed, nor should it be installed.

Regards,
Milan
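[Editor's note: Milan's check can be scripted. A minimal sketch; the commented-out grubby invocation is a common way to make the flag persistent on CentOS 7, shown inactive because it modifies the bootloader and requires root:]

```shell
# Verify the running kernel was booted with IOMMU enabled (Intel host).
if grep -qw 'intel_iommu=on' /proc/cmdline; then
    echo "intel_iommu is enabled"
else
    echo "intel_iommu is NOT enabled"
    # To enable it persistently on CentOS 7, then reboot (requires root):
    # grubby --update-kernel=ALL --args="intel_iommu=on"
    # reboot
fi
```

Note that editing /etc/default/grub and regenerating grub.cfg achieves the same thing; either way, the flag only takes effect after a reboot.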

Hi Milan,

I have re-deployed the server and now it is booting with:

[root@esxh-03 ~]# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-3.10.0-957.1.3.el7.x86_64 root=/dev/mapper/centos_esxh--03-root ro nofb splash=quiet crashkernel=auto rd.lvm.lv=centos_esxh-03/root rd.lvm.lv=centos_esxh-03/swap rhgb quiet nouveau.modeset=0 intel_iommu=on

But I am still not able to see mdev:

[root@esxh-03 ~]# vdsm-client Host hostdevListByCaps | grep -i mdev
[root@esxh-03 ~]#

Do you have any other idea about what could be going on?

Cheers.

On 16/1/19 13:03, Milan Zamazal wrote:
[...]

Hi,

I found what was going on. As Milan stated, the order is:

- Enable intel_iommu=on
- Reinstall the host
- Install the NVIDIA driver

I had the NVIDIA driver installed before enabling the IOMMU in the kernel...

Cheers.

On 16/1/19 14:16, Josep Manel Andrés Moscardó wrote:
[...]
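[Editor's note: the resolution above boils down to an ordering constraint: the NVIDIA driver only registers mdev types if the IOMMU was already enabled when the driver was installed and loaded. Once the types appear, the generic kernel mdev interface can create a vGPU instance by writing a UUID to the type's "create" node. A minimal sketch; the nvidia-18 type name is a hypothetical example, since the available profiles vary per board and driver:]

```shell
# One of the GPU PCI addresses from the nvidia-smi output earlier in
# the thread; the mdev_supported_types path is the standard kernel
# mediated-device sysfs layout.
GPU=0000:05:00.0
TYPES=/sys/bus/pci/devices/$GPU/mdev_supported_types

if [ -d "$TYPES" ]; then
    ls "$TYPES"    # lists the available vGPU profiles, e.g. nvidia-18 ...
    # Creating an instance would be (requires root; type name is a
    # hypothetical example, commented out because it allocates a vGPU):
    # uuidgen > "$TYPES/nvidia-18/create"
else
    echo "mdev types still missing; check the driver installation order"
fi
```

On an oVirt host this manual step is normally unnecessary, since `vdsm-client Host hostdevListByCaps` should report the mdev-capable devices once the types are registered; the sysfs check just confirms the driver side independently of VDSM.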
participants (2)
- Josep Manel Andrés Moscardó
- Milan Zamazal