My apologies for the duplicate posts - they initially got stuck, and I was keen to reach the group with my query to try and uncover any unknowns.

Passing through the whole PCI NVMe device is fine, because the VM is locked to the host by the GPU PCI pass-through anyway. I will implement a mechanism to protect the data on the single disk in both cases.
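
In case it helps anyone reading along, this is roughly how I'd confirm on the host which PCI device the NVMe is and which driver owns it; the PCI address below is only an example, not my real one:

# Find the NVMe controller's PCI address (the one used below is illustrative)
lspci -nn | grep -i nvme

# Check which driver currently owns it; once the device is handed to the
# VM for pass-through, this should point at vfio-pci
readlink /sys/bus/pci/devices/0000:02:00.0/driver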

I'm not exactly sure what type of disk writes are being used; it's a learning model being trained by the GPUs, so I'll try to find out more. After I finished the config I searched online for a basic throughput test for the disk. Here are the commands and results taken at that time (below).

Test on host with "local storage" (using a disk image on the nvme drive)
# dd if=/dev/zero of=test1.img bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 1.92561 s, 558 MB/s

Test on host with NVMe pass-through
# dd if=/dev/zero of=/mnt/nvme/tmpflag bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 1.42554 s, 753 MB/s
In both cases the NVMe was used as an additional mounted drive. The OS boots from a different disk image, which is located in a Storage Domain over iSCSI.

I'm nothing close to a storage expert, but I understand the gist of the descriptions I find when reading up on the dd parameters. Since it looks like both configurations will be OK for longevity, I'll aim to test both scenarios live and choose the one which gives the best result for the workload.
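
Since the dd runs above only measure one big sequential write, I may also try fio once I know more about the training job's I/O pattern. A rough sketch, where the small-random-write pattern is a guess until I find out what the workload really does:

# Random-write test with fio; block size, depth and file path are
# placeholders to adjust once the real I/O pattern is known
fio --name=randwrite-test --filename=/mnt/nvme/fio.tmp \
    --rw=randwrite --bs=4k --size=1G \
    --ioengine=libaio --direct=1 --iodepth=16

(Remembering to remove /mnt/nvme/fio.tmp afterwards.)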

Thanks a lot for your reply and help :)

Tony Pearce


On Fri, 6 Aug 2021 at 03:28, Thomas Hoberg <thomas@hoberg.net> wrote:
You gave some different details in your other post, but here you mention using GPU pass-through.

Any pass-through will lose you live migration, but unfortunately with GPUs that's just how it is these days: in theory VMs could be moved when the GPUs are identical (because their amount of state is limited to VRAM size), but the support code (and kernel interfaces?) simply does not exist today.

In that scenario, a pass-through storage device won't cost you anything you haven't already lost.

But you'll have to remember that PCI pass-through works only at the granularity of a whole PCI device. That's fine with (an entire) NVMe drive, because it combines "disk" and "controller" in one device; not so fine with individual disks behind a SATA or SCSI controller. And you certainly can't pass through partitions!
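
One way to see that granularity on the host is to list the NVMe's IOMMU group, since every device in the group has to be handed to the guest together (the PCI address here is just a placeholder):

# Resolve the IOMMU group of the NVMe controller, then list all the
# devices sharing it - they can only be passed through as a set
group=$(basename "$(readlink /sys/bus/pci/devices/0000:02:00.0/iommu_group)")
ls /sys/kernel/iommu_groups/"$group"/devices/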

It gets to be really fun with cascaded USB, and I haven't really tried Thunderbolt either (mostly because I have given up on CentOS 8/oVirt 4.4).

But generally the VirtIO-SCSI interface imposes so little overhead that it only becomes noticeable when you run massive amounts of tiny I/O on NVMe. Play with the block sizes and the sync flag in your dd tests to see the differences; I've had lots of fun (and some disillusionment) with that, mostly with Gluster storage over TCP/IP on Ethernet.
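
Something along these lines, for example; the mount point and counts are placeholders to adapt:

# Sweep block sizes with synchronous writes to see where the overhead
# starts to bite; total bytes written differ per run since count is
# fixed, so adjust to taste
for bs in 4k 64k 1M 16M; do
    echo "bs=$bs"
    dd if=/dev/zero of=/mnt/test/dd.tmp bs=$bs count=256 oflag=dsync 2>&1 | tail -n 1
done
rm -f /mnt/test/dd.tmp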

If that's really where your bottlenecks are coming from, you may want to look at architecture rather than pass-through.