I suspect you have network issues.
Check the client-side Gluster log: /var/log/glusterfs/rhev-data-center-mnt-glusterSD-<node_name>:_<volume_name>.log
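
That log can be long, so a small filter may help. The following is only a rough sketch, not an official tool. It assumes each log line starts with a bracketed timestamp followed by a single-letter severity (E, W, C, I, ...), which is how Gluster client logs usually look; adjust the regular expression if your version differs. The names problem_lines and scan_file are made up for this example.

```python
import re

# Assumed line shape: "[<timestamp>] <LEVEL> <rest of line>",
# where LEVEL is a single capital letter such as E, W, C or I.
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+(?P<rest>.*)$")


def problem_lines(lines, levels=("E", "W", "C")):
    """Return (timestamp, level, rest) for lines whose severity is in levels."""
    found = []
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match and match.group("level") in levels:
            found.append((match.group("ts"), match.group("level"), match.group("rest")))
    return found


def scan_file(path, levels=("E", "W", "C")):
    """Open a log file (hypothetical helper) and filter it with problem_lines."""
    with open(path, errors="replace") as handle:
        return problem_lines(handle, levels)
```

Calling scan_file() with the log path above and comparing the timestamps against the 14:08:27 errors in vdsm.log should show whether the client lost contact with a brick around that time.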

Best Regards,
Strahil Nikolov 

On Tue, Sep 6, 2022 at 17:19, Diego Ercolani
<diego.ercolani@ssis.sm> wrote:
I really don't understand this. I was monitoring vdsm.log on one node (node2)
and I saw these errors:
2022-09-06 14:08:27,105+0000 ERROR (check/loop) [storage.monitor] Error checking path /rhev/data-center/mnt/glusterSD/ovirt-node2.ovirt:_gv1/45b4f14c-8323-482f-90ab-99d8fd610018/dom_md/metadata (monitor:511)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/monitor.py", line 509, in _pathChecked
    delay = result.delay()
  File "/usr/lib/python3.6/site-packages/vdsm/storage/check.py", line 398, in delay
    raise exception.MiscFileReadException(self.path, self.rc, self.err)
vdsm.storage.exception.MiscFileReadException: Internal file read failure: ('/rhev/data-center/mnt/glusterSD/ovirt-node2.ovirt:_gv1/45b4f14c-8323-482f-90ab-99d8fd610018/dom_md/metadata', 1, 'Read timeout')
2022-09-06 14:08:27,105+0000 INFO  (check/loop) [storage.monitor] Domain 45b4f14c-8323-482f-90ab-99d8fd610018 became INVALID (monitor:482)
2022-09-06 14:08:27,149+0000 ERROR (check/loop) [storage.monitor] Error checking path /rhev/data-center/mnt/glusterSD/localhost:_gv0/60b7f172-08ed-4a22-8414-31fd5b100d72/dom_md/metadata (monitor:511)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/monitor.py", line 509, in _pathChecked
    delay = result.delay()
  File "/usr/lib/python3.6/site-packages/vdsm/storage/check.py", line 398, in delay
    raise exception.MiscFileReadException(self.path, self.rc, self.err)
vdsm.storage.exception.MiscFileReadException: Internal file read failure: ('/rhev/data-center/mnt/glusterSD/localhost:_gv0/60b7f172-08ed-4a22-8414-31fd5b100d72/dom_md/metadata', 1, 'Read timeout')
2022-09-06 14:08:27,814+0000 INFO  (jsonrpc/5) [api.virt] START getStats() from=::1,54242, vmId=8486ed73-df34-4c58-bfdc-7025dec63b7f (api:48)
2022-09-06 14:08:27,814+0000 INFO  (jsonrpc/5) [api] FINISH getStats error=Virtual machine does not exist: {'vmId': '8486ed73-df34-4c58-bfdc-7025dec63b7f'} (api:129)
2022-09-06 14:08:27,814+0000 INFO  (jsonrpc/5) [api.virt] FINISH getStats return={'status': {'code': 1, 'message': "Virtual machine does not exist: {'vmId': '8486ed73-df34-4c58-bfdc-7025dec63b7f'}"}} from=::1,54242, vmId=8486ed73-df34-4c58-bfdc-7025dec63b7f (api:54)
2022-09-06 14:08:27,814+0000 INFO  (jsonrpc/5) [jsonrpc.JsonRpcServer] RPC call VM.getStats failed (error 1) in 0.00 seconds (__init__:312)
2022-09-06 14:08:31,357+0000 ERROR (check/loop) [storage.monitor] Error checking path /rhev/data-center/mnt/glusterSD/localhost:_glen/3577c21e-f757-4405-97d1-0f827c9b4e22/dom_md/metadata (monitor:511)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/monitor.py", line 509, in _pathChecked
    delay = result.delay()
  File "/usr/lib/python3.6/site-packages/vdsm/storage/check.py", line 398, in delay
    raise exception.MiscFileReadException(self.path, self.rc, self.err)
vdsm.storage.exception.MiscFileReadException: Internal file read failure: ('/rhev/data-center/mnt/glusterSD/localhost:_glen/3577c21e-f757-4405-97d1-0f827c9b4e22/dom_md/metadata', 1, 'Read timeout')
2022-09-06 14:08:32,918+0000 INFO  (periodic/5) [Executor] Worker was discarded (executor:305)

But on the same node, from the command line, I can run a simple cat without any problem:
[root@ovirt-node2 ~]# cat "/rhev/data-center/mnt/glusterSD/localhost:_gv0/60b7f172-08ed-4a22-8414-31fd5b100d72/dom_md/metadata"
ALIGNMENT=1048576
BLOCK_SIZE=512
CLASS=Data
DESCRIPTION=gv0
IOOPTIMEOUTSEC=10
LEASERETRIES=3
LEASETIMESEC=60
LOCKPOLICY=
LOCKRENEWALINTERVALSEC=5
POOL_UUID=da146814-f823-40e0-bd7b-8478dcfa38cd
REMOTE_PATH=localhost:/gv0
ROLE=Regular
SDUUID=60b7f172-08ed-4a22-8414-31fd5b100d72
TYPE=GLUSTERFS
VERSION=5
_SHA_CKSUM=a63324fa9b3030c3ffa35891c2d6c4e129c76af9

and the same with single quotes:
[root@ovirt-node2 ~]# cat '/rhev/data-center/mnt/glusterSD/localhost:_gv0/60b7f172-08ed-4a22-8414-31fd5b100d72/dom_md/metadata'
ALIGNMENT=1048576
BLOCK_SIZE=512
CLASS=Data
DESCRIPTION=gv0
IOOPTIMEOUTSEC=10
LEASERETRIES=3
LEASETIMESEC=60
LOCKPOLICY=
LOCKRENEWALINTERVALSEC=5
POOL_UUID=da146814-f823-40e0-bd7b-8478dcfa38cd
REMOTE_PATH=localhost:/gv0
ROLE=Regular
SDUUID=60b7f172-08ed-4a22-8414-31fd5b100d72
TYPE=GLUSTERFS
VERSION=5
_SHA_CKSUM=a63324fa9b3030c3ffa35891c2d6c4e129c76af9

After a while, I retried the same cat and the host console hung. So, sometimes, Gluster revokes access to the file? Why?

I think this "hang" is the source of all my problems.
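
To catch the moment the hang starts, a timed read in a loop may be more useful than a manual cat. The sketch below is an assumption-laden example, not what vdsm itself runs: it reads the first 4 KiB of a file with dd and gives up after a timeout. The direct-I/O flag is there because, as far as I know, the vdsm path checker also reads with direct I/O, while a plain cat can be served from cache; treat that as an assumption. The 10-second default simply mirrors IOOPTIMEOUTSEC=10 from the metadata above. The function names timed_read and watch are invented for this example.

```python
import subprocess
import time


def timed_read(path, timeout=10.0, direct=True):
    """Read the first 4 KiB of path with dd; return (ok, seconds, detail)."""
    cmd = ["dd", f"if={path}", "of=/dev/null", "bs=4096", "count=1"]
    if direct:
        # Bypass the page cache so the read really goes to the mount.
        cmd.append("iflag=direct")
    start = time.monotonic()
    proc = subprocess.Popen(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.PIPE)
    try:
        _, err = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Kill but do not wait: a reader stuck on the mount may not exit promptly.
        proc.kill()
        return False, time.monotonic() - start, "read timeout"
    elapsed = time.monotonic() - start
    if proc.returncode != 0:
        return False, elapsed, err.decode(errors="replace").strip()
    return True, elapsed, ""


def watch(path, interval=10.0, rounds=None):
    """Print one timestamped (UTC) result per read; rounds=None loops forever."""
    count = 0
    while rounds is None or count < rounds:
        ok, secs, detail = timed_read(path)
        stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime())
        print(f"{stamp} ok={ok} {secs:.3f}s {detail}", flush=True)
        count += 1
        time.sleep(interval)
```

Running watch() on the dom_md/metadata path and leaving it in the background gives UTC timestamps for each slow or failed read, which can then be lined up with vdsm.log and the Gluster client log.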
