You're welcome!
The machine learning team members I maintain oVirt for tend to load training
data in large sequential batches, which means bandwidth is nice to have. While I give them
local SSD storage on the compute nodes, I also give them lots of HDD/VDO based gluster
file space, which might do miserably on OLTP, but pipes out sequential data at rates at
least similar to SATA SSDs over a 10Gbit network. Seems to work for them, because to CUDA
applications even RAM is barely faster than block storage.
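As a quick sanity check of that "large sequential batches" workload, here is a minimal sketch of how one might measure sequential read throughput from Python, the way a data loader would see it (the file path and sizes are made up for the demo; a real comparison would point it at a file on the gluster mount vs. the local SSD):

```python
import os
import time

def seq_read_mbps(path, block_size=1 << 20):
    """Read `path` front to back in `block_size` chunks, return MB/s."""
    total = 0
    start = time.monotonic()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.monotonic() - start
    return total / (1024 * 1024) / elapsed

if __name__ == "__main__":
    # Scratch file just for the demo; for a meaningful benchmark use a
    # file much larger than RAM, or the numbers mostly reflect page cache.
    path = "scratch.bin"
    with open(path, "wb") as f:
        f.write(os.urandom(64 * 1024 * 1024))  # 64 MiB of test data
    print(f"{seq_read_mbps(path):.0f} MB/s sequential read")
    os.remove(path)
```

Keep in mind the page cache will inflate the result on a warm read; tools like fio with direct I/O are the proper way to benchmark the storage itself.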
PCIe 4.0 NVMe at 8GB/s per device becomes a challenge for any block storage abstraction,
inside or outside a VM. And when we are talking about NVMe storage with "native"
KV APIs, like FusionIO offered back then, PCI pass-through will be a necessity, unless somebody
comes up with a new hardware abstraction layer.