On Wed, Dec 15, 2021 at 8:25 AM <tpike@psc.edu> wrote:
I've got an issue with one of my oVirt 4.3.10.4 clusters. The OS is CentOS 7.7 and the storage is GlusterFS. Whenever I try to create a snapshot of any of my VMs, the operation eventually fails with this error:

VDSM track04.yard.psc.edu command HSMGetAllTasksStatusesVDS failed: Could not acquire resource. Probably resource factory threw an exception.: ()

After a bit of investigation, I believe the root cause is that the newly created snapshot file is not owned by vdsm:kvm. Looking at the directories on glusterfs, some of the disk images are owned by root:root and some by qemu:qemu. If we watch the VM's image directory, we can see the files being created as vdsm:kvm, then changed to qemu:qemu, and eventually changed to root:root (a rough sketch of the check is at the end of this message). This is entirely repeatable. Needless to say, oVirt can't read the disk images once they are owned by root, which explains why the snapshot fails. The question, then, is why the ownership is being changed out from under the creation process. Checking the gluster volume info shows:

Volume Name: engine
Type: Replicate
Volume ID: 00951055-74b5-463a-84c0-59fa03be7478
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.200.0.131:/gluster_bricks/engine/engine
Brick2: 10.200.0.134:/gluster_bricks/engine/engine
Brick3: 10.200.0.135:/gluster_bricks/engine/engine
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.low-prio-threads: 32
network.remote-dio: off
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
features.shard: on
user.cifs: off
storage.owner-uid: 36
storage.owner-gid: 36
network.ping-timeout: 30
performance.strict-o-direct: on
cluster.granular-entry-heal: enable
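
For reference, UID/GID 36 should map to vdsm:kvm on the hosts; something like the following (using the volume name from the output above) can confirm the mapping and the effective volume options:

getent passwd 36
getent group 36
gluster volume get engine storage.owner-uid
gluster volume get engine storage.owner-gid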

I see that the owner UID and GID are correct (vdsm:kvm). Note that snapshots worked at one point in the past - I see that there are snapshot images from a year ago. Any ideas where to look to correct this? Thanks!
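
For reference, the ownership flip can be watched with something along these lines; the path below is only a placeholder for the actual glusterSD storage-domain mount and image UUIDs on the hosts:

watch -n 1 'ls -ln /rhev/data-center/mnt/glusterSD/<server>:_<volume>/<domain-uuid>/images/<image-uuid>/'

find /rhev/data-center/mnt/glusterSD/<server>:_<volume>/<domain-uuid>/images -user root -ls
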
Hi Tod,
I don't know which libvirt version you are running, but we had a relevant bug in the past: https://bugzilla.redhat.com/show_bug.cgi?id=1851016
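For example, something like this on one of the hosts will list the installed libvirt packages so you can compare them against the fixed-in versions noted in the bug:

rpm -qa 'libvirt*' | sort
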
To re-use your current snapshots (those that are root:root) you may need to change the ownership.
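If you go that route, a rough sketch would be something like the following, run on a host where the storage domain is mounted (the path is only a placeholder, and the affected images should not be in use while you change them):

find /rhev/data-center/mnt/glusterSD/<server>:_<volume>/<domain-uuid>/images -user root -exec chown vdsm:kvm {} +
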
Please check if the bug above is the one you are hitting.

Regards,
Liran

Tod Pike
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-leave@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/C3RBNTD6RCFRMDZJLDM6KGBK2XITIA54/