Yes, passthrough. I think for vGPU you have to pay NVIDIA for a driver
upgrade; I haven't tried that and don't know the price, and it wasn't easy
to find information on it the last time I looked.
I have used passthrough on both legacy and UEFI boot machines. I don't know
the chipsets off the top of my head; I will look on Monday.
On Fri, 4 Sep 2020, 20:56 Vinícius Ferrão, <ferrao(a)versatushpc.com.br>
wrote:
Thanks Michael and Arman.
To make things clear, you guys are using passthrough, right? It’s not
vGPU. The 4x GPUs are added on the “Host Devices” tab of the VM.
What I’m trying to achieve is to add the 4x V100 directly to one specific VM.
And finally, can you guys confirm which BIOS type is being used in your
machines? I’m using the Q35 chipset with UEFI BIOS. I haven’t tested it with
legacy; perhaps I’ll give it a try.
Thanks again.
On 4 Sep 2020, at 14:09, Michael Jones <mj(a)mikejonesey.co.uk> wrote:
Also using multiple T4s, plus P4s and Titans, with no issues, but I have
never used NVLink.
On Fri, 4 Sep 2020, 16:02 Arman Khalatyan, <arm2arm(a)gmail.com> wrote:
> Hi,
> With the 2x T4 we haven't seen any trouble. We have no NVLink there.
>
> Did you try disabling NVLink?
>
>
>
> Vinícius Ferrão via Users <users(a)ovirt.org> wrote on Fri, 4 Sep 2020,
> 08:39:
>
>> Hello, here we go again.
>>
>> I’m trying to pass through 4x NVIDIA Tesla V100 GPUs (with NVLink) to a
>> single VM, but things aren’t going well. Only one GPU is usable in the VM.
>> lspci lists all four GPUs, but three of them are unusable:
>>
>> 08:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 16GB] (rev a1)
>> 09:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 16GB] (rev a1)
>> 0a:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 16GB] (rev a1)
>> 0b:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 16GB] (rev a1)
>>
>> There are some errors in dmesg regarding a misconfigured BIOS:
>>
>> [ 27.295972] nvidia: loading out-of-tree module taints kernel.
>> [ 27.295980] nvidia: module license 'NVIDIA' taints kernel.
>> [ 27.295981] Disabling lock debugging due to kernel taint
>> [ 27.304180] nvidia: module verification failed: signature and/or required key missing - tainting kernel
>> [ 27.364244] nvidia-nvlink: Nvlink Core is being initialized, major device number 241
>> [ 27.579261] nvidia 0000:09:00.0: enabling device (0000 -> 0002)
>> [ 27.579560] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
>> NVRM: BAR1 is 0M @ 0x0 (PCI:0000:09:00.0)
>> [ 27.579560] NVRM: The system BIOS may have misconfigured your GPU.
>> [ 27.579566] nvidia: probe of 0000:09:00.0 failed with error -1
>> [ 27.580727] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
>> NVRM: BAR0 is 0M @ 0x0 (PCI:0000:0a:00.0)
>> [ 27.580729] NVRM: The system BIOS may have misconfigured your GPU.
>> [ 27.580734] nvidia: probe of 0000:0a:00.0 failed with error -1
>> [ 27.581299] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
>> NVRM: BAR0 is 0M @ 0x0 (PCI:0000:0b:00.0)
>> [ 27.581300] NVRM: The system BIOS may have misconfigured your GPU.
>> [ 27.581305] nvidia: probe of 0000:0b:00.0 failed with error -1
>> [ 27.581333] NVRM: The NVIDIA probe routine failed for 3 device(s).
>> [ 27.581334] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 450.51.06 Sun Jul 19 20:02:54 UTC 2020
>> [ 27.649128] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 450.51.06 Sun Jul 19 20:06:42 UTC 2020
>>
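For anyone triaging a similar log, a rough sketch of pulling the failing devices out of the dmesg text might look like the following. It assumes the NVRM lines have the exact shape quoted above and are not wrapped by the mail client; the function name `find_invalid_bars` and the `SAMPLE` string are made up for illustration and are not part of any NVIDIA or oVirt tooling.

```python
import re

# Matches the NVRM line that names the BAR, its size, its base and the PCI address,
# e.g. "NVRM: BAR1 is 0M @ 0x0 (PCI:0000:09:00.0)".
BAR_RE = re.compile(
    r"NVRM:\s+BAR(?P<bar>\d+) is (?P<size>\d+)M @ (?P<base>0x[0-9a-fA-F]+) "
    r"\(PCI:(?P<addr>[0-9a-fA-F:.]+)\)"
)


def find_invalid_bars(dmesg_text):
    """Return (pci_address, bar_index) pairs for BARs reported as 0M @ 0x0."""
    found = []
    for m in BAR_RE.finditer(dmesg_text):
        if int(m.group("size")) == 0 and int(m.group("base"), 16) == 0:
            found.append((m.group("addr"), int(m.group("bar"))))
    return found


# Hypothetical sample, trimmed from the log lines quoted in this thread.
SAMPLE = """\
[ 27.579560] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR1 is 0M @ 0x0 (PCI:0000:09:00.0)
[ 27.580727] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR0 is 0M @ 0x0 (PCI:0000:0a:00.0)
[ 27.581299] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR0 is 0M @ 0x0 (PCI:0000:0b:00.0)
"""

if __name__ == "__main__":
    for addr, bar in find_invalid_bars(SAMPLE):
        print(f"{addr}: BAR{bar} has no address assigned")
```

In this log the three failing devices are 09:00.0, 0a:00.0 and 0b:00.0, which leaves 08:00.0 as the one GPU that probed successfully.
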
>> The host is Secure Intel Skylake (x86_64). The VM is running the Q35
>> chipset with UEFI (pc-q35-rhel8.2.0).
>>
>> I’ve tried changing the I/O mapping options on the host: 56TB and 12TB
>> both gave the same results. I didn’t try 512GB, since the machine has
>> 768GB of system RAM.
>>
>> Blacklisting nouveau on the host made no difference.
>> Installing the NVIDIA drivers on the host made no difference either.
>>
>> On the host I can use all four V100s, but inside a single VM it’s
>> impossible.
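One way to compare what the host and the guest each assigned to a GPU is to read the device's `resource` file under sysfs, where each line holds a start address, an end address and flags in hex. The sketch below only parses that text. The function name `parse_resource` and the sample values are invented for illustration and are not taken from the machines in this thread.

```python
def parse_resource(text):
    """Parse the contents of a sysfs PCI 'resource' file.

    Each line holds three hex fields: start, end, flags. A line of all
    zeros means the region is unassigned. Returns a list of
    (line_index, start, size_in_bytes, flags) for assigned regions only.
    """
    regions = []
    for index, line in enumerate(text.splitlines()):
        fields = line.split()
        if len(fields) != 3:
            continue  # skip anything that is not a start/end/flags triple
        start, end, flags = (int(f, 16) for f in fields)
        if start == 0 and end == 0:
            continue  # unassigned region
        regions.append((index, start, end - start + 1, flags))
    return regions


# Made-up sample: a 16 MiB region, a 16 GiB region, and one unassigned region.
SAMPLE = """\
0x00000000fa000000 0x00000000faffffff 0x0000000000040200
0x0000380000000000 0x00003803ffffffff 0x000000000014220c
0x0000000000000000 0x0000000000000000 0x0000000000000000
"""

if __name__ == "__main__":
    for index, start, size, flags in parse_resource(SAMPLE):
        print(f"region {index}: {size // (1024 * 1024)} MiB at {start:#x}")
```

On a Linux system the input would come from a path such as `/sys/bus/pci/devices/0000:09:00.0/resource`. As far as I know, the first six lines correspond to the BAR registers, so a region that is present on the host but missing in the guest would line up with the "0M @ 0x0" messages in dmesg.
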
>>
>> Any suggestions?
>>
>>
>>
>> _______________________________________________
>> Users mailing list -- users(a)ovirt.org
>> To unsubscribe send an email to users-leave(a)ovirt.org
>> Privacy Statement: https://www.ovirt.org/privacy-policy.html
>> oVirt Code of Conduct:
>> https://www.ovirt.org/community/about/community-guidelines/
>> List Archives:
>> https://lists.ovirt.org/archives/list/users@ovirt.org/message/73CXU27AX6N...
>>