I noticed that this document, https://docs.nvidia.com/vgpu/16.0/grid-vgpu-release-notes-generic-linux-kvm/index.html#all-nvlink-gpus-must-be-passed-through-to-same-vm, has this to say:
In pass through mode, all GPUs connected to each other through NVLink must be assigned to the same VM. If a subset of GPUs connected to each other through NVLink is passed through to a VM, unrecoverable error XID 74 occurs when the VM is booted. This error corrupts the NVLink state on the physical GPUs and, as a result, the NVLink bridge between the GPUs is unusable.
So you may need to pass through all of the GPUs that are connected to each other over NVLink to the same VM, rather than only a subset of them.
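
If it helps, here is a rough Python sketch for working out which GPUs would have to go to the same VM. It is not from the NVIDIA document. It assumes you run it on the host, that `nvidia-smi topo -m` is available there, that the GPU columns come first in each row of the matrix it prints, and that NVLink connections appear as cells like `NV1` or `NV4`. The function names `nvlink_groups` and `host_nvlink_groups` are my own, so treat this as a starting point and check the result against your actual topology output.

```python
import re
import subprocess

ANSI = re.compile(r"\x1b\[[0-9;]*m")   # some driver versions underline the header
GPU = re.compile(r"^GPU\d+$")
NVLINK = re.compile(r"^NV\d+$")        # assumed cell format for NVLink connections


def nvlink_groups(topo_text):
    """Group GPUs that are linked (directly or indirectly) by NVLink.

    topo_text is text in the matrix layout printed by `nvidia-smi topo -m`.
    Returns a list of groups, each a list of GPU labels such as "GPU0".
    """
    gpus, links = [], {}
    for raw in topo_text.splitlines():
        tokens = ANSI.sub("", raw).split()
        if not tokens or not GPU.match(tokens[0]):
            continue  # legend lines, NIC rows, blank lines
        if not gpus:
            # The first line starting with a GPU label is taken as the header.
            gpus = [t for t in tokens if GPU.match(t)]
            links = {g: set() for g in gpus}
            continue
        row = tokens[0]
        if row not in links:
            continue
        # Assumption: the first len(gpus) cells line up with the GPU columns.
        for col, cell in zip(gpus, tokens[1:]):
            if NVLINK.match(cell):
                links[row].add(col)
                links[col].add(row)

    # Connected components: each one is a set of GPUs to keep together.
    groups, seen = [], set()
    for gpu in gpus:
        if gpu in seen:
            continue
        stack, component = [gpu], []
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            component.append(current)
            stack.extend(links[current] - seen)
        groups.append(sorted(component, key=lambda name: int(name[3:])))
    return groups


def host_nvlink_groups():
    """Run `nvidia-smi topo -m` on this host and group its GPUs."""
    result = subprocess.run(
        ["nvidia-smi", "topo", "-m"],
        capture_output=True, text=True, check=True,
    )
    return nvlink_groups(result.stdout)
```

Calling `host_nvlink_groups()` on the host returns one list per NVLink-connected set. Any group with more than one GPU would need to be passed through to a single VM as a whole.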