So as not to mislead you, there also appears to be a separate, vGPU-specific issue.
I've uploaded the full vdsm.log to Dropbox:
https://www.dropbox.com/s/hlymmf9d6rn12tq/vdsm.vgpu.log?dl=0
Most recently I tried detaching all network devices from the VM and booting it, and I
get a different error around the vGPU:
2018-05-17 09:48:24,806+0100 INFO (vm/1bc9dae8) [root] /usr/libexec/vdsm/hooks/before_vm_start/50_hostedengine: rc=0 err= (hooks:110)
2018-05-17 09:48:24,953+0100 INFO (vm/1bc9dae8) [root] /usr/libexec/vdsm/hooks/before_vm_start/50_vfio_mdev: rc=1 err=vgpu: No device with type nvidia-61 is available. (hooks:110)
2018-05-17 09:48:25,069+0100 INFO (vm/1bc9dae8) [root] /usr/libexec/vdsm/hooks/before_vm_start/50_vhostmd: rc=0 err= (hooks:110)
2018-05-17 09:48:25,070+0100 ERROR (vm/1bc9dae8) [virt.vm] (vmId='1bc9dae8-a0ea-44b3-9103-5805100648d0') The vm start process failed (vm:943)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 872, in _startUnderlyingVm
    self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2862, in _run
    self._custom)
  File "/usr/lib/python2.7/site-packages/vdsm/common/hooks.py", line 153, in before_vm_start
    return _runHooksDir(domxml, 'before_vm_start', vmconf=vmconf)
  File "/usr/lib/python2.7/site-packages/vdsm/common/hooks.py", line 120, in _runHooksDir
    raise exception.HookError(err)
HookError: Hook Error: ('',)
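In a long vdsm.log it can help to pull out only the hooks that exited non-zero. The following is a minimal sketch, not part of vdsm: the function name `failed_hooks` and the regular expression are illustrative assumptions based on the line format shown above, and it assumes each log record sits on a single line, as it would in the vdsm.log file itself.

```python
import re

# Assumed line shape, taken from the excerpt above:
#   <timestamp> INFO (vm/...) [root] <hook path>: rc=<n> err=<text> (hooks:<line>)
HOOK_LINE = re.compile(
    r"(?P<hook>/\S+/hooks/\S+): rc=(?P<rc>-?\d+) err=(?P<err>.*?)\s*\(hooks:\d+\)\s*$"
)


def failed_hooks(log_text):
    """Return (hook_path, rc, err) for every hook line with a non-zero rc."""
    failures = []
    for line in log_text.splitlines():
        match = HOOK_LINE.search(line)
        if match and int(match.group("rc")) != 0:
            failures.append(
                (match.group("hook"), int(match.group("rc")), match.group("err"))
            )
    return failures


# The three hook lines from the excerpt above, one record per line.
sample = (
    "2018-05-17 09:48:24,806+0100 INFO (vm/1bc9dae8) [root] "
    "/usr/libexec/vdsm/hooks/before_vm_start/50_hostedengine: rc=0 err= (hooks:110)\n"
    "2018-05-17 09:48:24,953+0100 INFO (vm/1bc9dae8) [root] "
    "/usr/libexec/vdsm/hooks/before_vm_start/50_vfio_mdev: rc=1 "
    "err=vgpu: No device with type nvidia-61 is available. (hooks:110)\n"
    "2018-05-17 09:48:25,069+0100 INFO (vm/1bc9dae8) [root] "
    "/usr/libexec/vdsm/hooks/before_vm_start/50_vhostmd: rc=0 err= (hooks:110)\n"
)

for hook, rc, err in failed_hooks(sample):
    print("%s failed with rc=%d: %s" % (hook, rc, err))
```

On the excerpt above this reports only 50_vfio_mdev, which matches the HookError raised right after the hooks run.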
This is despite nvidia-61 being listed as an available type on the GPU:
https://pastebin.com/bucw21DG
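A type can be listed as supported and still have no free instances. The kernel's mediated-device sysfs interface exposes a per-type `available_instances` file under `/sys/class/mdev_bus/<parent>/mdev_supported_types/<type>/`. The sketch below reads that file for one type on every parent device. It is only an assumption that this is what the 50_vfio_mdev hook checks (the thread does not confirm it), and the function name `mdev_availability` is illustrative.

```python
import os


def mdev_availability(mdev_type, root="/sys/class/mdev_bus"):
    """Map each parent device to its available_instances count for mdev_type.

    Parent devices that do not list the type at all are skipped.
    """
    result = {}
    if not os.path.isdir(root):
        # No mediated-device capable parents are registered on this host.
        return result
    for parent in sorted(os.listdir(root)):
        path = os.path.join(
            root, parent, "mdev_supported_types", mdev_type, "available_instances"
        )
        try:
            with open(path) as f:
                result[parent] = int(f.read().strip())
        except (IOError, OSError, ValueError):
            # Type not offered by this parent, or the file is unreadable.
            continue
    return result


if __name__ == "__main__":
    for parent, free in mdev_availability("nvidia-61").items():
        print("%s: %d instance(s) of nvidia-61 available" % (parent, free))
```

If every parent reports 0 for nvidia-61, the type is known to the driver but cannot currently be instantiated, which would be consistent with the hook's error message.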
So I think we have two issues here, one relating to the network and one to GPU.
Thanks all for your rapid and very useful help!
Regards,
Callum
--
Callum Smith
Research Computing Core
Wellcome Trust Centre for Human Genetics
University of Oxford
e. callum@well.ox.ac.uk
On 17 May 2018, at 09:28, Ales Musil <amusil@redhat.com> wrote:
This seems to be a vdsm problem with XML generation.
+Francesco
On Thu, May 17, 2018 at 10:20 AM, Callum Smith <callum@well.ox.ac.uk> wrote:
PS. Some other WARNs that come up on the host:
WARN File: /var/lib/libvirt/qemu/channels/1bc9dae8-a0ea-44b3-9103-5805100648d0.org.qemu.guest_agent.0 already removed (vdsm)
WARN Attempting to remove a non existing net user: ovirtmgmt/1bc9dae8-a0ea-44b3-9103-5805100648d0 (vdsm)
WARN Attempting to remove a non existing network: ovirtmgmt/1bc9dae8-a0ea-44b3-9103-5805100648d0 (vdsm)
WARN File: /var/lib/libvirt/qemu/channels/1bc9dae8-a0ea-44b3-9103-5805100648d0.ovirt-guest-agent.0 already removed (vdsm)
WARN Attempting to add an existing net user: ovirtmgmt/1bc9dae8-a0ea-44b3-9103-5805100648d0 (vdsm)
Regards,
Callum
On 17 May 2018, at 09:16, Callum Smith <callum@well.ox.ac.uk> wrote:
We use the OVN network provider, and the node is running 4.2.3 (specifically 2018051606,
a clean install from last night).
Regards,
Callum
On 17 May 2018, at 07:47, Ales Musil <amusil@redhat.com> wrote:
On Thu, May 17, 2018 at 12:01 AM, Callum Smith <callum@well.ox.ac.uk> wrote:
Dear All,
Our vGPU installation is progressing, though the VM is failing to start.
2018-05-16 22:57:34,328+0100 ERROR (vm/1bc9dae8) [virt.vm] (vmId='1bc9dae8-a0ea-44b3-9103-5805100648d0') The vm start process failed (vm:943)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 872, in _startUnderlyingVm
    self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2872, in _run
    dom.createWithFlags(flags)
  File "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py", line 130, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 92, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1099, in createWithFlags
    if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
libvirtError: Cannot get interface MTU on '': No such device
That is the specific error; here is some other information. The GPU 'allocation' of a
UUID against the nvidia-xx mdev type seems to proceed correctly, and the device is
created when the VM is instantiated, but the VM fails to come up with the error above.
Are there any other logs or information that would help diagnose this?
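The empty name in "Cannot get interface MTU on ''" suggests that libvirt was handed an interface with no device or bridge name. One way to check this, assuming you can extract the domain XML for the VM (for example from vdsm.log), is to list the interface elements whose source has no name. This is a rough sketch only: `interfaces_without_source` is a made-up helper, and the sample XML is hand-written for illustration, not taken from this setup.

```python
import xml.etree.ElementTree as ET


def interfaces_without_source(domain_xml):
    """Return MAC addresses (or None) of <interface> devices lacking a source name."""
    root = ET.fromstring(domain_xml)
    broken = []
    for iface in root.findall("./devices/interface"):
        source = iface.find("source")
        name = None
        if source is not None:
            # Which attribute carries the name depends on the interface type.
            name = source.get("bridge") or source.get("network") or source.get("dev")
        if not name:
            mac = iface.find("mac")
            broken.append(mac.get("address") if mac is not None else None)
    return broken


# Hypothetical domain XML for illustration only; not from the thread.
sample = """
<domain type='kvm'>
  <devices>
    <interface type='bridge'>
      <mac address='00:1a:4a:16:01:51'/>
      <source bridge=''/>
    </interface>
    <interface type='bridge'>
      <mac address='00:1a:4a:16:01:52'/>
      <source bridge='ovirtmgmt'/>
    </interface>
  </devices>
</domain>
"""

print(interfaces_without_source(sample))
```

For the hand-written sample this prints only the MAC of the first interface, the one with the empty bridge attribute.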
Regards,
Callum
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-leave@ovirt.org
Hi Callum,
Can you share which version your setup is running?
Also, do you use the OVS switch type in the cluster?
Regards,
Ales.
--
ALES MUSIL
INTERN - rhv network
Red Hat
EMEA <https://www.redhat.com/>
amusil@redhat.com IM: amusil