
Hi everybody,

we are running an oVirt 4.3.10 production cluster with 9 hosts and 5 storage domains. Since yesterday we get the following error messages every hour:

26.02.2021 02:00:48 VDSM command SetVolumeDescriptionVDS failed: Could not acquire resource. Probably resource factory threw an exception.: ()
26.02.2021 02:00:48 Failed to update OVF disks 5aa438e3-8d22-4b6c-bccf-a843151ca0be, OVF data isn't updated on those OVF stores (Data Center datacenter01, Storage Domain vmstore13).
26.02.2021 02:00:48 Failed to update VMs/Templates OVF data for Storage Domain vmstore13 in Data Center datacenter01.

Only one domain (vmstore13) is affected; like three others in the cluster, it is based on GlusterFS. Trying to update the OVFs manually from the engine web GUI leads to the same result, followed by the (misleading) message

26.02.2021 02:00:49 OVF_STORE for domain vmstore13 was updated by admin@internal-authz.

The VMs with disks on the affected domain are running fine and snapshots are working. So far I have tried moving the SPM role to another host; this succeeded, but the error messages persist.

For each error, the vdsm log on the SPM host contains something like

2021-02-26 03:00:57,701+0100 INFO (jsonrpc/2) [vdsm.api] START setVolumeDescription(sdUUID=u'9f731135-f5d9-4609-9e3b-fa9cae75e314', spUUID=u'33e8dc9e-8bc8-11ea-bd76-00163e741033', imgUUID=u'5aa438e3-8d22-4b6c-bccf-a843151ca0be', volUUID=u'0795e58c-4960-413a-a0b4-e8a6d547fda5', description=u'{"Updated":false,"Last Updated":"Wed Feb 24 17:48:17 CET 2021","Storage Domains":[{"uuid":"9f731135-f5d9-4609-9e3b-fa9cae75e314"}],"Disk Description":"OVF_STORE"}', options=None) from=::ffff:10.70.1.1,46968, flow_id=1f314676, task_id=9101db01-b4f0-447e-a5a9-b6af76278d55 (api:48)
2021-02-26 03:00:57,712+0100 ERROR (jsonrpc/2) [storage.VolumeManifest] [Errno 116] Stale file handle (fileVolume:155)

I have attached the relevant part of vdsm.log. Has anyone experienced this behaviour before and can provide help?

-- juergen
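To correlate the hourly engine events with the stale-handle errors, the vdsm log on the SPM host can be filtered for the two symptoms quoted above. This is only a hedged sketch: the helper name is made up, the patterns are taken from the log lines in this thread, and /var/log/vdsm/vdsm.log is the stock vdsm log location.

```shell
#!/bin/sh
# filter_vdsm_ovf_errors LOGFILE
# Print the vdsm.log lines matching the two symptoms reported above:
# the failing setVolumeDescription call and the ESTALE error.
# (Helper name and approach are illustrative, not an oVirt tool.)
filter_vdsm_ovf_errors() {
    grep -E 'setVolumeDescription|Stale file handle' "$1"
}

# Typical invocation on the SPM host:
# filter_vdsm_ovf_errors /var/log/vdsm/vdsm.log
```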

Managed to fix it myself :)

Did a

cat /rhev/data-center/mnt/glusterSD/10.70.7.17\:_vmstore13/9f731135-f5d9-4609-9e3b-fa9cae75e314/images/5aa438e3-8d22-4b6c-bccf-a843151ca0be/0795e58c-4960-413a-a0b4-e8a6d547fda5.meta

on the host the gluster file system was mounted from (vmhost17, IP 10.70.7.17), got "file not found", repeated the same command, this time successfully, and the problem went away.

Probably some strange GlusterFS problem and not directly related to oVirt. Just in case someone stumbles across the same thing ...
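The manual fix above boils down to re-reading the volume .meta file until the GlusterFS FUSE client refreshes its stale handle. A minimal, hedged sketch of that retry (the path in the comment is the one from this thread; the helper name, retry count, and sleep are assumptions for illustration, not an oVirt or Gluster mechanism):

```shell
#!/bin/sh
# read_meta_with_retry FILE [RETRIES]
# Try to cat a volume .meta file; on failure (e.g. ESTALE,
# "Stale file handle") wait briefly and try again, since a second
# lookup often makes the FUSE client refresh the handle.
read_meta_with_retry() {
    file=$1
    tries=${2:-3}   # assumed default, not from the thread
    i=0
    while [ "$i" -lt "$tries" ]; do
        if cat "$file" 2>/dev/null; then
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    return 1
}

# Path from this thread, run on the host the gluster volume is mounted from:
# read_meta_with_retry "/rhev/data-center/mnt/glusterSD/10.70.7.17:_vmstore13/9f731135-f5d9-4609-9e3b-fa9cae75e314/images/5aa438e3-8d22-4b6c-bccf-a843151ca0be/0795e58c-4960-413a-a0b4-e8a6d547fda5.meta"
```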

On Fri, Feb 26, 2021 at 3:41 PM Jürgen Walch via Users <users@ovirt.org> wrote:
managed to fix it myself :)
Did a
cat /rhev/data-center/mnt/glusterSD/10.70.7.17\:_vmstore13/9f731135-f5d9-4609-9e3b-fa9cae75e314/images/5aa438e3-8d22-4b6c-bccf-a843151ca0be/0795e58c-4960-413a-a0b4-e8a6d547fda5.meta
on the host the gluster file system was mounted from (vmhost17, IP 10.70.7.17), got "file not found", repeated the same command, this time successfully, and the problem went away.
Probably some strange GlusterFS problem and not directly related to oVirt. Just in case someone stumbles across the same thing ...
Please file a bug; we may need to improve storage monitoring with Gluster to handle [Errno 116] Stale file handle.

Nir