Any progress on this GPU question?
In our setup we have Supermicro boards with Intel Xeon Gold 6146 + 2x T4.
We add an extra line in /etc/default/grub:
"rd.driver.blacklist=nouveau nouveau.modeset=0 pci-stub.ids=xxx:xxx
intel_iommu=on"
It would be interesting to know whether the NVLink is the showstopper.
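For reference, a sketch of how the kernel parameters above typically get applied on an EL8-based host (the xxx:xxx IDs are placeholders as in the quote above, and the grub.cfg path differs between legacy BIOS and UEFI installs):

```shell
# /etc/default/grub -- append the passthrough parameters to the existing
# GRUB_CMDLINE_LINUX line (xxx:xxx stands in for the GPU's vendor:device
# ID, which you can read from `lspci -nn`):
#
#   GRUB_CMDLINE_LINUX="... rd.driver.blacklist=nouveau nouveau.modeset=0 pci-stub.ids=xxx:xxx intel_iommu=on"

# Then regenerate the grub configuration and reboot:
grub2-mkconfig -o /boot/grub2/grub.cfg            # legacy BIOS hosts
grub2-mkconfig -o /boot/efi/EFI/redhat/grub.cfg   # UEFI hosts (path varies by distro)
```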
Arman Khalatyan <arm2arm(a)gmail.com> wrote on Sat, 5 Sep 2020, 00:38:
Same here ☺️; I'll check them on Monday.
Michael Jones <mj(a)mikejonesey.co.uk> wrote on Fri, 4 Sep 2020, 22:01:
> Yeah, passthrough. I think for vGPU you have to pay NVIDIA for a driver
> upgrade; I've not tried that and don't know the price. Getting info on
> it wasn't easy last time I tried.
>
> I have used it in both legacy and UEFI boot machines; I don't know the
> chipsets off the top of my head, will look on Monday.
>
>
> On Fri, 4 Sep 2020, 20:56 Vinícius Ferrão, <ferrao(a)versatushpc.com.br>
> wrote:
>
>> Thanks Michael and Arman.
>>
>> To make things clear, you guys are using passthrough, right? It’s not
>> vGPU. The 4x GPUs are added on the “Host Devices” tab of the VM.
>> What I’m trying to achieve is to add the 4x V100 directly to one
>> specific VM.
>>
>> And finally, can you guys confirm which BIOS type is being used in your
>> machines? I’m using the Q35 chipset with UEFI. I haven’t tested with
>> legacy; perhaps I’ll give it a try.
>>
>> Thanks again.
>>
>> On 4 Sep 2020, at 14:09, Michael Jones <mj(a)mikejonesey.co.uk> wrote:
>>
>> I also use multiple T4s, as well as P4s and Titans, with no issues, but
>> I have never used the NVLink.
>>
>> On Fri, 4 Sep 2020, 16:02 Arman Khalatyan, <arm2arm(a)gmail.com> wrote:
>>
>>> Hi,
>>> with the 2x T4 we haven't seen any trouble; we have no NVLink there.
>>>
>>> Did you try to disable the NVLink?
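On disabling NVLink (asked above): a minimal sketch, assuming the proprietary driver honors the NVreg_NvLinkDisable module parameter (NVIDIA documents a regkey of that name for vGPU hosts; verify it against your driver branch before relying on it):

```conf
# /etc/modprobe.d/disable-nvlink.conf
# Assumption: NVreg_NvLinkDisable is honored by this driver version.
options nvidia NVreg_NvLinkDisable=1
```

Rebuild the initramfs (e.g. `dracut -f`) and reboot for the option to take effect.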
>>>
>>>
>>>
>>> Vinícius Ferrão via Users <users(a)ovirt.org> wrote on Fri, 4 Sep
>>> 2020, 08:39:
>>>
>>>> Hello, here we go again.
>>>>
>>>> I’m trying to pass through 4x NVIDIA Tesla V100 GPUs (with NVLink) to a
>>>> single VM, but things aren’t going well. Only one GPU shows up in the VM;
>>>> lspci shows all of them, but three are unusable:
>>>>
>>>> 08:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2
>>>> 16GB] (rev a1)
>>>> 09:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2
>>>> 16GB] (rev a1)
>>>> 0a:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2
>>>> 16GB] (rev a1)
>>>> 0b:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2
>>>> 16GB] (rev a1)
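A quick way to confirm which of those functions got usable BARs inside the guest (slot numbers taken from the lspci output above; `flag_unassigned_bars` is a hypothetical helper, and the `<unassigned>`/`<ignored>` markers are what `lspci -vv` prints when the kernel could not place a region):

```shell
# Print Region lines whose BAR the kernel could not assign.
# lspci -vv marks such regions with "<unassigned>" or "<ignored>".
flag_unassigned_bars() {
    grep -E 'Region [0-9]+: Memory at (<unassigned>|<ignored>)' "$1"
}

# Usage on the guest, per GPU function:
#   lspci -vv -s 09:00.0 > /tmp/gpu.txt
#   flag_unassigned_bars /tmp/gpu.txt
```

On a healthy passthrough, BAR1 of a V100 16GB should show up as a large 64-bit prefetchable region, not size 0.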
>>>>
>>>> There are some errors in dmesg regarding a misconfigured BIOS:
>>>>
>>>> [ 27.295972] nvidia: loading out-of-tree module taints kernel.
>>>> [ 27.295980] nvidia: module license 'NVIDIA' taints kernel.
>>>> [ 27.295981] Disabling lock debugging due to kernel taint
>>>> [ 27.304180] nvidia: module verification failed: signature and/or
>>>> required key missing - tainting kernel
>>>> [ 27.364244] nvidia-nvlink: Nvlink Core is being initialized, major
>>>> device number 241
>>>> [ 27.579261] nvidia 0000:09:00.0: enabling device (0000 -> 0002)
>>>> [ 27.579560] NVRM: This PCI I/O region assigned to your NVIDIA
>>>> device is invalid:
>>>> NVRM: BAR1 is 0M @ 0x0 (PCI:0000:09:00.0)
>>>> [ 27.579560] NVRM: The system BIOS may have misconfigured your GPU.
>>>> [ 27.579566] nvidia: probe of 0000:09:00.0 failed with error -1
>>>> [ 27.580727] NVRM: This PCI I/O region assigned to your NVIDIA
>>>> device is invalid:
>>>> NVRM: BAR0 is 0M @ 0x0 (PCI:0000:0a:00.0)
>>>> [ 27.580729] NVRM: The system BIOS may have misconfigured your GPU.
>>>> [ 27.580734] nvidia: probe of 0000:0a:00.0 failed with error -1
>>>> [ 27.581299] NVRM: This PCI I/O region assigned to your NVIDIA
>>>> device is invalid:
>>>> NVRM: BAR0 is 0M @ 0x0 (PCI:0000:0b:00.0)
>>>> [ 27.581300] NVRM: The system BIOS may have misconfigured your GPU.
>>>> [ 27.581305] nvidia: probe of 0000:0b:00.0 failed with error -1
>>>> [ 27.581333] NVRM: The NVIDIA probe routine failed for 3 device(s).
>>>> [ 27.581334] NVRM: loading NVIDIA UNIX x86_64 Kernel Module
>>>> 450.51.06 Sun Jul 19 20:02:54 UTC 2020
>>>> [ 27.649128] nvidia-modeset: Loading NVIDIA Kernel Mode Setting
>>>> Driver for UNIX platforms 450.51.06 Sun Jul 19 20:06:42 UTC 2020
>>>>
>>>> The host is Secure Intel Skylake (x86_64). The VM is running with the
>>>> Q35 chipset and UEFI (pc-q35-rhel8.2.0).
>>>>
>>>> I’ve tried changing the I/O mapping options on the host: tried with
>>>> 56TB and 12TB without success, same results. Didn’t try 512GB, since
>>>> the machine has 768GB of system RAM.
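One avenue that may be worth checking for the "BAR1 is 0M" symptom: four 16GB BARs need a large 64-bit MMIO window in the guest, and QEMU's q35-pcihost device exposes a pci-hole64-size property to grow it. A sketch of a libvirt domain-XML override (assumptions: oVirt lets you inject extra QEMU arguments, e.g. via a VDSM hook; the 2048G value is purely illustrative):

```xml
<!-- Sketch: grow the 64-bit PCI hole so all four V100 BARs fit.
     Assumption: extra QEMU args can be injected into the oVirt VM. -->
<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
  <!-- ... existing devices ... -->
  <qemu:commandline>
    <qemu:arg value='-global'/>
    <qemu:arg value='q35-pcihost.pci-hole64-size=2048G'/>
  </qemu:commandline>
</domain>
```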
>>>>
>>>> Tried blacklisting the nouveau driver on the host: nothing.
>>>> Installed the NVIDIA drivers on the host: nothing.
>>>>
>>>> On the host I can use the 4x V100, but inside a single VM it’s
>>>> impossible.
>>>>
>>>> Any suggestions?
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Users mailing list -- users(a)ovirt.org
>>>> To unsubscribe send an email to users-leave(a)ovirt.org
>>>> Privacy Statement: https://www.ovirt.org/privacy-policy.html
>>>> oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
>>>> List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/73CXU27AX6N...
>>>>