Hi,
On 05/17/2018 10:56 AM, Callum Smith wrote:
> In an attempt not to mislead you guys as well, there appears to be a
> separate, vGPU specific, issue.
>
> https://www.dropbox.com/s/hlymmf9d6rn12tq/vdsm.vgpu.log?dl=0
>
> I've uploaded the full vdsm.log to dropbox. Most recently I tried
> unmounting all network devices from the VM and booting it, and I get a
> different issue around the vGPU:
>
> 2018-05-17 09:48:24,806+0100 INFO (vm/1bc9dae8) [root]
> /usr/libexec/vdsm/hooks/before_vm_start/50_hostedengine: rc=0 err= (hooks:110)
> 2018-05-17 09:48:24,953+0100 INFO (vm/1bc9dae8) [root]
> /usr/libexec/vdsm/hooks/before_vm_start/50_vfio_mdev: rc=1 err=vgpu: No device with type nvidia-61 is available.
> (hooks:110)
> 2018-05-17 09:48:25,069+0100 INFO (vm/1bc9dae8) [root]
> /usr/libexec/vdsm/hooks/before_vm_start/50_vhostmd: rc=0 err= (hooks:110)
> 2018-05-17 09:48:25,070+0100 ERROR (vm/1bc9dae8) [virt.vm]
> (vmId='1bc9dae8-a0ea-44b3-9103-5805100648d0') The vm start process failed (vm:943)
> Traceback (most recent call last):
> File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 872,
> in _startUnderlyingVm
> self._run()
> File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2862,
> in _run
> self._custom)
> File "/usr/lib/python2.7/site-packages/vdsm/common/hooks.py", line
> 153, in before_vm_start
> return _runHooksDir(domxml, 'before_vm_start', vmconf=vmconf)
> File "/usr/lib/python2.7/site-packages/vdsm/common/hooks.py", line
> 120, in _runHooksDir
> raise exception.HookError(err)
> HookError: Hook Error: ('',)
>
> Despite the nvidia-61 being an option on the GPU:
> https://pastebin.com/bucw21DG
Let's tackle one issue at a time :)
From the shared logs, the VM start failed because of:
2018-05-17 10:11:12,681+0100 INFO (vm/1bc9dae8) [root]
/usr/libexec/vdsm/hooks/before_vm_start/50_hostedengine: rc=0 err= (hooks:110)
2018-05-17 10:11:12,837+0100 INFO (vm/1bc9dae8) [root]
/usr/libexec/vdsm/hooks/before_vm_start/50_vfio_mdev: rc=1 err=vgpu: No device with type nvidia-53 is available.
Maybe Martin can shed some light here?
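To pull every failing hook out of a long vdsm.log rather than reading it by eye, something like the following may help. It is only a sketch: the regular expression is an assumption based on the hook lines quoted in this thread, and `failed_hooks` is a made-up helper name, not part of Vdsm.

```python
import re

# Assumed line layout, taken from the hook result lines quoted in this thread:
#   ... [root] /usr/libexec/vdsm/hooks/<stage>/<hook>: rc=<N> err=<text>
HOOK_RE = re.compile(
    r"(?P<hook>/usr/libexec/vdsm/hooks/\S+): rc=(?P<rc>\d+) err=(?P<err>.*)"
)


def failed_hooks(lines):
    """Yield (hook_path, rc, err) for every hook that exited non-zero."""
    for line in lines:
        match = HOOK_RE.search(line)
        if match and int(match.group("rc")) != 0:
            yield (match.group("hook"),
                   int(match.group("rc")),
                   match.group("err").strip())
```

Feeding it an open vdsm.log file object should list the 50_vfio_mdev failure together with its error text, provided each hook result sits on a single line in the real log.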
Given that the actual slice is available in sysfs (as indicated by one
of the other branches of this thread), I fear we may be facing some
weird issue with the driver itself.
Can you create the mdev manually? Writing a UUID to the type's "create" node
$ uuidgen > /sys/class/mdev_bus/${DEVICE_ADDR}/mdev_supported_types/nvidia-61/create
should be enough for a test.
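If you prefer a slightly more defensive version of the same test, here is a rough sketch that first reads the type's available_instances and only then writes a fresh UUID to its create node. The sysfs layout is the standard mdev one; the function names are made up for illustration and are not part of Vdsm, and the `bus` argument exists only so the code can be pointed at a scratch directory.

```python
import os
import uuid

MDEV_BUS = "/sys/class/mdev_bus"  # standard sysfs location of mdev-capable devices


def available_instances(device_addr, mdev_type, bus=MDEV_BUS):
    """Return how many more mdevs of mdev_type the device reports it can host."""
    path = os.path.join(bus, device_addr, "mdev_supported_types",
                        mdev_type, "available_instances")
    with open(path) as f:
        return int(f.read().strip())


def create_mdev(device_addr, mdev_type, bus=MDEV_BUS):
    """Create one mdev by writing a fresh UUID to the type's 'create' node."""
    if available_instances(device_addr, mdev_type, bus) < 1:
        raise RuntimeError("no free %s instance on %s" % (mdev_type, device_addr))
    mdev_uuid = str(uuid.uuid4())
    create_node = os.path.join(bus, device_addr, "mdev_supported_types",
                               mdev_type, "create")
    with open(create_node, "w") as f:
        f.write(mdev_uuid)
    return mdev_uuid
```

Run it as root on the host with the PCI address of the GPU and the type name, e.g. create_mdev(DEVICE_ADDR, "nvidia-61"). If available_instances is already 0 while no mdev exists, that would point at the driver rather than at Vdsm.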
>Callum, please share Vdsm logs showing the network failure
>
>
>Bests,
>
>--
>Francesco Romani
>Senior SW Eng., Virtualization R&D
>Red Hat
>IRC: fromani github: @fromanirh
>