I've got an issue with one of my oVirt 4.3.10.4 clusters. The OS is CentOS 7.7 and the storage
is GlusterFS. Whenever I try to create a snapshot of any of my VMs, I (eventually) get this
error:
VDSM track04.yard.psc.edu command HSMGetAllTasksStatusesVDS failed: Could not acquire
resource. Probably resource factory threw an exception.: ()
After a bit of investigation, I believe the root cause is that the snapshot file that gets
created does not end up owned by vdsm:kvm. Looking at the directories in GlusterFS, I see that
some of the disk images are owned by root:root and some by qemu:qemu. In fact, if I watch the
VM's directory, I can see that the files are owned by vdsm:kvm when they are created, then
change to qemu:qemu, and eventually end up owned by root:root. This is entirely repeatable.
oVirt can't read the disk images once they are owned by root, which explains why the snapshot
fails. The question, then, is why the ownership is being changed out from under the creation
process. The gluster volume info shows:
Volume Name: engine
Type: Replicate
Volume ID: 00951055-74b5-463a-84c0-59fa03be7478
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.200.0.131:/gluster_bricks/engine/engine
Brick2: 10.200.0.134:/gluster_bricks/engine/engine
Brick3: 10.200.0.135:/gluster_bricks/engine/engine
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.low-prio-threads: 32
network.remote-dio: off
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
features.shard: on
user.cifs: off
storage.owner-uid: 36
storage.owner-gid: 36
network.ping-timeout: 30
performance.strict-o-direct: on
cluster.granular-entry-heal: enable
I see that the owner UID and GID are correct (36:36, i.e. vdsm:kvm). Note that snapshots did
work at some point in the past: there are snapshot images from a year ago. Any ideas where to
look to correct this? Thanks!
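For anyone who wants to reproduce the ownership flip, here is a minimal sketch of a polling
script that logs every ownership change in a directory. It is an illustration only, not the
method used to make the observation above. The function names are made up for this example,
the directory to watch is whatever image directory you care about on the mounted volume, and
the expected owner 36:36 is taken from storage.owner-uid and storage.owner-gid in the volume
info. Because it polls, it can miss a state that lasts less than the interval.

```python
import os
import time

# Expected owner, taken from storage.owner-uid / storage.owner-gid (vdsm:kvm).
EXPECTED = (36, 36)


def snapshot_ownership(directory):
    """Return {filename: (uid, gid)} for every entry in the directory."""
    owners = {}
    with os.scandir(directory) as entries:
        for entry in entries:
            try:
                st = entry.stat(follow_symlinks=False)
            except FileNotFoundError:
                continue  # entry vanished between listing and stat
            owners[entry.name] = (st.st_uid, st.st_gid)
    return owners


def diff_ownership(old, new):
    """Return (name, old_owner, new_owner) for entries that are new or changed.

    old_owner is None when the entry was not present in the previous poll.
    """
    changes = []
    for name, owner in sorted(new.items()):
        previous = old.get(name)
        if previous != owner:
            changes.append((name, previous, owner))
    return changes


def watch(directory, interval=0.5):
    """Poll the directory forever and print each ownership change with a timestamp."""
    seen = {}
    while True:
        current = snapshot_ownership(directory)
        for name, before, after in diff_ownership(seen, current):
            flag = "" if after == EXPECTED else "  <-- not 36:36"
            stamp = time.strftime("%H:%M:%S")
            print(f"{stamp} {name}: {before} -> {after}{flag}", flush=True)
        seen = current
        time.sleep(interval)
```

Calling watch() on the VM's image directory while taking a snapshot should print one line per
change, which makes it easier to line the timestamps up against the vdsm and libvirt logs.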
Tod Pike