In the end we concluded that the server itself was stuck and needed a hard reset. This raises the question of why we didn't have a watchdog device configured on the VM to automatically detect and deal with such issues.
This experience made it clear that resources currently fulfills far too many critical roles for us to responsibly keep it as a simple single VM. I've created the following Epic to track and discuss work on improving the infrastructure behind
resources.ovirt.org to make it less fragile and more reliable:
https://ovirt-jira.atlassian.net/browse/OVIRT-2344