My apologies for the duplicate posts - they initially got stuck and I really wanted to reach the group with the query to try and uncover any unknowns.
Passing through the whole PCI NVMe device is fine, because the VM is locked to the host due to the GPU PCI passthrough anyway. I will implement a mechanism to protect the data on the single disk in both cases.
I'm not exactly sure what type of disk writes are being used; it's a learning model being trained by the GPUs. I'll try and find out more. After I finished the config I searched online for a basic throughput test for the disk. Here are the commands and results taken at that time (below).
Test on host with "local storage" (using a disk image on the NVMe drive)
# dd if=/dev/zero of=test1.img bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 1.92561 s, 558 MB/s
Test on host with NVMe passthrough
# dd if=/dev/zero of=/mnt/nvme/tmpflag bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 1.42554 s, 753 MB/s
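As a side note, a single large sequential write from /dev/zero with oflag=dsync mostly measures streaming throughput, so the numbers above may not say much about how the training workload behaves. If I repeat the comparison, something like the following fio run (small random writes, direct I/O) might be closer - the file name, size and runtime below are just placeholder values, not from my actual setup:

fio --name=randwrite-test \
    --filename=/mnt/nvme/fio-test.bin \
    --rw=randwrite --bs=4k --size=2G \
    --ioengine=libaio --iodepth=32 --direct=1 \
    --runtime=60 --time_based --group_reporting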
In both cases the NVMe was used as an additional mounted drive. The OS boots from a different disk image, which is located in a Storage Domain over iSCSI.
I'm nothing close to a storage expert, but I understand the gist of the descriptions I found for the dd parameters. Since it looks like both configurations will be OK for longevity, I'll test both scenarios live and choose the one which gives the best result for the workload.
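To figure out what kind of writes the training actually does, the simplest thing I can think of is watching the device while a job is running, e.g. with iostat from the sysstat package (assuming the drive shows up as /dev/nvme0n1 - adjust to whatever the real device is):

iostat -dxm 5 /dev/nvme0n1

The write request sizes and MB/s reported there should give a rough idea of whether the workload is mostly small random writes or large sequential ones.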
Thanks a lot for your reply and help :)