Regards,

Callum

--

Callum Smith

Research Computing Core

Wellcome Trust Centre for Human Genetics
University of Oxford
e. callum@well.ox.ac.uk

On 18 May 2018, at 13:05, Martin Polednik <mpolednik@redhat.com> wrote:

On 18/05/18 13:42 +0200, Francesco Romani wrote:

Hi,

On 05/17/2018 10:56 AM, Callum Smith wrote:

So as not to mislead you guys, there also appears to be a
separate, vGPU-specific issue.

https://www.dropbox.com/s/hlymmf9d6rn12tq/vdsm.vgpu.log?dl=0

I've uploaded the full vdsm.log to Dropbox. Most recently I tried
unmounting all network devices from the VM and booting it, and I get a
different issue around the vGPU:

2018-05-17 09:48:24,806+0100 INFO (vm/1bc9dae8) [root] /usr/libexec/vdsm/hooks/before_vm_start/50_hostedengine: rc=0 err= (hooks:110)
2018-05-17 09:48:24,953+0100 INFO (vm/1bc9dae8) [root] /usr/libexec/vdsm/hooks/before_vm_start/50_vfio_mdev: rc=1 err=vgpu: No device with type nvidia-61 is available. (hooks:110)
2018-05-17 09:48:25,069+0100 INFO (vm/1bc9dae8) [root] /usr/libexec/vdsm/hooks/before_vm_start/50_vhostmd: rc=0 err= (hooks:110)
2018-05-17 09:48:25,070+0100 ERROR (vm/1bc9dae8) [virt.vm] (vmId='1bc9dae8-a0ea-44b3-9103-5805100648d0') The vm start process failed (vm:943)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 872, in _startUnderlyingVm
    self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2862, in _run
    self._custom)
  File "/usr/lib/python2.7/site-packages/vdsm/common/hooks.py", line 153, in before_vm_start
    return _runHooksDir(domxml, 'before_vm_start', vmconf=vmconf)
  File "/usr/lib/python2.7/site-packages/vdsm/common/hooks.py", line 120, in _runHooksDir
    raise exception.HookError(err)
HookError: Hook Error: ('',)

This is despite nvidia-61 being listed as an option on the
GPU: https://pastebin.com/bucw21DG

Let's tackle one issue at a time :)
From the shared logs, the VM start failed because of:

2018-05-17 10:11:12,681+0100 INFO (vm/1bc9dae8) [root] /usr/libexec/vdsm/hooks/before_vm_start/50_hostedengine: rc=0 err= (hooks:110)
2018-05-17 10:11:12,837+0100 INFO (vm/1bc9dae8) [root] /usr/libexec/vdsm/hooks/before_vm_start/50_vfio_mdev: rc=1 err=vgpu: No device with type nvidia-53 is available.

maybe Martin can shed some light here?

Given that the actual slice is available in sysfs (as indicated by one
of the other branches of this thread), I fear we may be facing some
weird issue with the driver itself.
Can you create the mdev manually?

$ uuidgen > /sys/class/mdev_bus/${DEVICE_ADDR}/mdev_supported_types/nvidia-61/create

should be enough for a test.
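
Before creating the mdev by hand, it may also help to see what the driver
itself reports for each type. Below is a minimal sketch, assuming the
standard mdev sysfs layout where every type directory under
mdev_supported_types carries an available_instances attribute. The function
name list_mdev_types and its root parameter are made up for illustration;
this is not what the Vdsm hook does internally.

```python
import os


def list_mdev_types(root="/sys/class/mdev_bus"):
    """Return (device, mdev_type, available_instances) tuples found under root.

    Hypothetical helper: it only reads sysfs attributes and creates nothing.
    """
    found = []
    if not os.path.isdir(root):
        # No mdev-capable device is registered (or the driver is not loaded).
        return found
    for device in sorted(os.listdir(root)):
        types_dir = os.path.join(root, device, "mdev_supported_types")
        if not os.path.isdir(types_dir):
            continue
        for mdev_type in sorted(os.listdir(types_dir)):
            path = os.path.join(types_dir, mdev_type, "available_instances")
            try:
                with open(path) as f:
                    available = int(f.read().strip())
            except (IOError, OSError, ValueError):
                # Attribute missing, unreadable, or not a number.
                available = None
            found.append((device, mdev_type, available))
    return found


if __name__ == "__main__":
    for device, mdev_type, available in list_mdev_types():
        print("%s %s available_instances=%s" % (device, mdev_type, available))
```

If nvidia-61 is listed with available_instances=0, the driver is reporting
that no further instance of that type can be created on that device, so a
manual create attempt would be expected to fail as well.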

Callum, please share the Vdsm logs showing the network failure.

Bests,

--
Francesco Romani
Senior SW Eng., Virtualization R&D
Red Hat
IRC: fromani github: @fromanirh