Dear All,
I'm still having the same problems, is this a bug or something that's configured
incorrectly?
Regards,
Callum
--
Callum Smith
Research Computing Core
Wellcome Trust Centre for Human Genetics
University of Oxford
e. callum@well.ox.ac.uk
On 18 May 2018, at 13:22, Callum Smith
<callum@well.ox.ac.uk> wrote:
Yep, creating the mdev manually works, and in fact, as I said previously, the VM does
actually create an mdev successfully: you can see the UUID of the device, and it is
correctly identifiable through
/sys/class/mdev_bus/${DEVICE_ADDR}/${UUID}/mdev_type/name
To help with matching the logs: the UUID generated is consistently the same (even after
manual deletion), namely "f5dc8396-dad5-3893-9eb4-94eedf60a881".
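For reference, a minimal sketch of that check, with ${DEVICE_ADDR} standing in for the
GPU's PCI address and ${UUID} for the generated UUID (both placeholders, not literal
values from my setup):
# the mdev_type symlink points back at the type backing this mdev;
# its name attribute holds the human-readable profile name
$ cat /sys/class/mdev_bus/${DEVICE_ADDR}/${UUID}/mdev_type/name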
The VM then fails to start because of the MTU issue. Restarting the VM on the node then
produces the "device not available" issue (because the device with the previous UUID
still exists, and the type has max_instance=1). So it's the first VM start, with the MTU
issue, that needs resolving, with the added complication that the MTU (network) issue is
caused by the mdev being set. The same error does not happen when mdev is not set.
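For the record, the stale mdev can be cleared by hand through its sysfs remove
attribute, which frees the instance again; a minimal sketch, using the UUID above:
# writing 1 to the remove attribute tears the mdev down
$ echo 1 > /sys/bus/mdev/devices/f5dc8396-dad5-3893-9eb4-94eedf60a881/remove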
PS. In fact this was the guide I followed, so thank you, Martin, for writing it; without
it, getting this far would have been very difficult:
https://mpolednik.github.io/2017/09/13/vgpu-in-ovirt/
Regards,
Callum
--
Callum Smith
Research Computing Core
Wellcome Trust Centre for Human Genetics
University of Oxford
e. callum@well.ox.ac.uk
On 18 May 2018, at 13:05, Martin Polednik
<mpolednik@redhat.com> wrote:
On 18/05/18 13:42 +0200, Francesco Romani wrote:
Hi,
On 05/17/2018 10:56 AM, Callum Smith wrote:
In an attempt not to mislead you guys as well: there appears to be a
separate, vGPU-specific issue.
https://www.dropbox.com/s/hlymmf9d6rn12tq/vdsm.vgpu.log?dl=0
I've uploaded the full vdsm.log to Dropbox. Most recently I tried
removing all network devices from the VM and booting it, and I get a
different issue around the vGPU:
2018-05-17 09:48:24,806+0100 INFO (vm/1bc9dae8) [root] /usr/libexec/vdsm/hooks/before_vm_start/50_hostedengine: rc=0 err= (hooks:110)
2018-05-17 09:48:24,953+0100 INFO (vm/1bc9dae8) [root] /usr/libexec/vdsm/hooks/before_vm_start/50_vfio_mdev: rc=1 err=vgpu: No device with type nvidia-61 is available. (hooks:110)
2018-05-17 09:48:25,069+0100 INFO (vm/1bc9dae8) [root] /usr/libexec/vdsm/hooks/before_vm_start/50_vhostmd: rc=0 err= (hooks:110)
2018-05-17 09:48:25,070+0100 ERROR (vm/1bc9dae8) [virt.vm] (vmId='1bc9dae8-a0ea-44b3-9103-5805100648d0') The vm start process failed (vm:943)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 872, in _startUnderlyingVm
    self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2862, in _run
    self._custom)
  File "/usr/lib/python2.7/site-packages/vdsm/common/hooks.py", line 153, in before_vm_start
    return _runHooksDir(domxml, 'before_vm_start', vmconf=vmconf)
  File "/usr/lib/python2.7/site-packages/vdsm/common/hooks.py", line 120, in _runHooksDir
    raise exception.HookError(err)
HookError: Hook Error: ('',)
Despite nvidia-61 being listed as a supported type on the GPU:
https://pastebin.com/bucw21DG
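For anyone reproducing, the supported types and their names can be enumerated with
something like the following (${DEVICE_ADDR} is again a placeholder for the GPU's
PCI address):
# one directory per profile, e.g. nvidia-53, nvidia-61, ...
$ ls /sys/class/mdev_bus/${DEVICE_ADDR}/mdev_supported_types
# human-readable profile name of a given type
$ cat /sys/class/mdev_bus/${DEVICE_ADDR}/mdev_supported_types/nvidia-61/name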
Let's tackle one issue at a time :)
From the shared logs, the VM start failed because of:
2018-05-17 10:11:12,681+0100 INFO (vm/1bc9dae8) [root] /usr/libexec/vdsm/hooks/before_vm_start/50_hostedengine: rc=0 err= (hooks:110)
2018-05-17 10:11:12,837+0100 INFO (vm/1bc9dae8) [root] /usr/libexec/vdsm/hooks/before_vm_start/50_vfio_mdev: rc=1 err=vgpu: No device with type nvidia-53 is available.
Maybe Martin can shed some light here?
Given that the actual slice is available in sysfs (as indicated by one
of the other branches of this thread), I fear we may be facing some
weird issue with the driver itself.
Can you create the mdev manually?
$ uuidgen > \
/sys/class/mdev_bus/${DEVICE_ADDR}/mdev_supported_types/nvidia-61/create
should be enough for a test (the UUID is written into the type's create attribute).
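If the write succeeds, the new mdev should show up under the type's devices directory,
and available_instances should drop accordingly:
# mdevs currently backed by this type (symlinks named by UUID)
$ ls /sys/class/mdev_bus/${DEVICE_ADDR}/mdev_supported_types/nvidia-61/devices
# how many more instances of this type can still be created
$ cat /sys/class/mdev_bus/${DEVICE_ADDR}/mdev_supported_types/nvidia-61/available_instances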
Callum, please share the Vdsm logs showing the network failure.
Best,
--
Francesco Romani
Senior SW Eng., Virtualization R&D
Red Hat
IRC: fromani github: @fromanirh
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-leave@ovirt.org