Hi Strahil,
Color me surprised, too, especially considering where things are supposed to be going
roadmap-wise.
Then again, both oVirt and Docker could be excused for thinking that they "own the
hardware" they run on, because it's a rather natural assumption, even if there are
good reasons to run VMs and containers side by side as well as nested.
Yes, I believe I have also done the reverse and put Docker on a system that was already
running as an oVirt compute host, and the results weren't any better. The biggest
challenge is repairing the node without having to re-install the whole thing, and I
have gone through quite a few wobbles there, with colleagues who weren't too
appreciative of having their ML jobs fail (those tend to be rather lengthy...).
I've managed to get CUDA working on plain KVM VMs by twiddling the domain XML config
files in the ways documented on the Web. But with oVirt those KVM XML configs get
generated on the fly in Python, so I'd have to fiddle with the code that does that.
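Just to illustrate what I mean by twiddling the XML: on plain KVM the change boils down
to adding a <hostdev> PCI passthrough stanza to the domain definition. Here is a minimal
sketch using the libvirt Python bindings; the PCI address and the VM name are made-up
placeholders, and this is how I did it outside of oVirt, not through vdsm:

import libvirt

# Made-up PCI address of the GPU on the host; find yours with
# `lspci -nn | grep -i nvidia` and adjust domain/bus/slot/function.
GPU_HOSTDEV_XML = """
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x3b' slot='0x00' function='0x0'/>
  </source>
</hostdev>
"""

conn = libvirt.open("qemu:///system")
dom = conn.lookupByName("ml-worker")  # placeholder VM name

# Attach to the persistent definition so the GPU is there on the next boot.
dom.attachDeviceFlags(GPU_HOSTDEV_XML, libvirt.VIR_DOMAIN_AFFECT_CONFIG)
conn.close()

Under oVirt, the equivalent stanza would have to come out of the Python code that
generates the domain XML, which is where I got stuck.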
Actually, I *did* try doing that at one point about a year or two ago, but I never found
the right place the code was actually taken from. You see, there are copies of that code
on every node, but also on the management engine, and you know how Ansible squirts code
from machine to machine to do its magic. That's where I stopped, because running ML
workloads containerized was more natural anyway and I was happy to have CPU-only VMs at
their side.
Besides, GPU access also disables live migration, and the ability to move these
long-running functional VMs around to manage resources for ML jobs is exactly what
attracts me to oVirt.
Currently I am mostly probing around here to see if what I'm trying would be considered
totally esoteric or irresponsible, or if it's worth reporting as a bug.