I'm trying to figure out how to keep a "broken" NFS mount point from causing the entire HCI cluster to crash.

HCI is working beautifully.

Last night, I finished adding some NFS storage to the cluster. This is storage that I don't necessarily need to be HA; I was hoping to use it for backups and less-important VMs, since space on my Gluster (SSD) storage is pretty limited.

But as a test, after I got everything set up, I stopped nfs-server.
This caused the entire cluster to go down, and several VMs that are not stored on the NFS storage went belly up.

Once I started the NFS server process again, HCI did what it was supposed to do and recovered automatically.

My concern is that NFS becomes a single point of failure: if VMs that don't even rely on that storage are affected when the NFS storage goes away, then I don't want anything to do with it.

On the other hand, I'm still struggling to come up with a good way to run on-site backups and snapshots without using up more Gluster space on my (more expensive) SSD storage.

Is there any way to set up NFS storage for a Backup Domain, as well as a Data Domain (for less important VMs), such that if the NFS server crashed, all of my non-NFS stuff would be unaffected?
