On Tue, Apr 7, 2020 at 3:59 PM Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:

OK. So I set logging to at least INFO level on all subsystems and tried a redeploy of OpenShift with 3 master nodes and 7 worker nodes.
One worker hit the error and its VM went into paused state:

Apr 7, 2020, 3:27:28 PM VM worker-6 has been paused due to unknown storage error.

The VM has only one 100 GB virtual disk, on the Gluster volume named vmstore.


Below are the logs from around that time at the different layers.
Let me know if you need any other log file that I have not included.

From what I see, the matching error is found in

- rhev-data-center-mnt-glusterSD-ovirtst.mydomain.storage:_vmstore.log

[2020-04-07 13:27:28.721262] E [MSGID: 133010] [shard.c:2327:shard_common_lookup_shards_cbk] 0-vmstore-shard: Lookup on shard 523 failed. Base file gfid = d22530cf-2e50-4059-8924-0aafe38497b1 [No such file or directory]
[2020-04-07 13:27:28.721432] W [fuse-bridge.c:2918:fuse_writev_cbk] 0-glusterfs-fuse: 4435189: WRITE => -1 gfid=d22530cf-2e50-4059-8924-0aafe38497b1 fd=0x7f3c4c07ab38 (No such file or directory)

and

- gluster_bricks-vmstore-vmstore.log

[2020-04-07 13:27:28.719391] W [MSGID: 113020] [posix-helpers.c:1051:posix_gfid_set] 0-vmstore-posix: setting GFID on /gluster_bricks/vmstore/vmstore/.shard/d22530cf-2e50-4059-8924-0aafe38497b1.523 failed  [File exists]
[2020-04-07 13:27:28.719978] E [MSGID: 113020] [posix-entry-ops.c:517:posix_mknod] 0-vmstore-posix: setting gfid on /gluster_bricks/vmstore/vmstore/.shard/d22530cf-2e50-4059-8924-0aafe38497b1.523 failed [File exists]
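
To find every occurrence of this pattern in a large mount log, a small script along these lines might help. It is only a sketch: the regular expression is modelled on the MSGID 133010 line quoted above, and the shard path is derived using the `.shard/<gfid>.<shard number>` naming visible in the brick log. The function name `shard_failures` and the `sample` list are made up for illustration.

```python
import re

# Pattern modelled on the shard lookup failure line from the mount log above.
SHARD_FAIL = re.compile(
    r"^\[(?P<ts>[\d-]+ [\d:.]+)\] E \[MSGID: 133010\].*"
    r"Lookup on shard (?P<shard>\d+) failed\. "
    r"Base file gfid = (?P<gfid>[0-9a-f-]{36})"
)

def shard_failures(lines):
    """Yield (timestamp, gfid, shard number, shard path) per failed lookup.

    The path is relative to the brick root and follows the
    .shard/<gfid>.<shard number> naming seen in the brick log.
    """
    for line in lines:
        m = SHARD_FAIL.match(line)
        if m:
            gfid = m.group("gfid")
            shard = int(m.group("shard"))
            yield m.group("ts"), gfid, shard, f".shard/{gfid}.{shard}"

# Illustrative input: the first mount log line quoted above.
sample = [
    "[2020-04-07 13:27:28.721262] E [MSGID: 133010] "
    "[shard.c:2327:shard_common_lookup_shards_cbk] 0-vmstore-shard: "
    "Lookup on shard 523 failed. Base file gfid = "
    "d22530cf-2e50-4059-8924-0aafe38497b1 [No such file or directory]",
]

for ts, gfid, shard, path in shard_failures(sample):
    print(ts, path)
```

The printed path can then be searched for in the brick log (or checked on the brick itself) to see whether the same shard shows the "File exists" error at the same time.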


Below are all the files I checked.
Any hint?

Gianluca


During a sort of stress test installing OCP worker nodes, the same thing happened two more times with different VMs, always with thin-provisioned disks and with the same lookup error in the log:

errors on rhev-data-center-mnt-glusterSD-ovirtst.mydomain.storage:_vmstore.log

[2020-04-07 14:38:55.505093] E [MSGID: 133010] [shard.c:2327:shard_common_lookup_shards_cbk] 0-vmstore-shard: Lookup on shard 732 failed. Base file gfid = d99c2f26-ec99-43a9-8a5b-887c38220e1a [No such file or directory]
[2020-04-07 14:38:55.505455] W [fuse-bridge.c:2918:fuse_writev_cbk] 0-glusterfs-fuse: 7219169: WRITE => -1 gfid=d99c2f26-ec99-43a9-8a5b-887c38220e1a fd=0x7f3c4c09aa48 (No such file or directory)
[2020-04-07 14:38:55.505461] W [fuse-bridge.c:2918:fuse_writev_cbk] 0-glusterfs-fuse: 7219175: WRITE => -1 gfid=d99c2f26-ec99-43a9-8a5b-887c38220e1a fd=0x7f3c4c09aa48 (No such file or directory)
[2020-04-07 14:38:55.505432] E [MSGID: 133010] [shard.c:2327:shard_common_lookup_shards_cbk] 0-vmstore-shard: Lookup on shard 732 failed. Base file gfid = d99c2f26-ec99-43a9-8a5b-887c38220e1a [No such file or directory]
[2020-04-07 14:39:18.292135] W [MSGID: 114031] [client-rpc-fops_v2.c:680:client4_0_writev_cbk] 0-vmstore-client-0: remote operation failed [Invalid argument]

and

[2020-04-07 15:31:02.224194] E [MSGID: 133010] [shard.c:2327:shard_common_lookup_shards_cbk] 0-vmstore-shard: Lookup on shard 1363 failed. Base file gfid = 0bee6cf6-da1d-4a37-8afb-3459815986f5 [No such file or directory]
[2020-04-07 15:31:02.224393] W [fuse-bridge.c:2918:fuse_writev_cbk] 0-glusterfs-fuse: 9662285: WRITE => -1 gfid=0bee6cf6-da1d-4a37-8afb-3459815986f5 fd=0x7f3c4c024178 (No such file or directory)
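
To get a rough idea of where in each disk image the failing writes landed, the shard numbers can be converted to byte ranges. This is a sketch under an assumption: the volume's features.shard-block-size is not shown in this thread, so 64 MiB is assumed here and should be replaced with the value actually configured on vmstore. The helper name `shard_byte_range` is made up for illustration.

```python
MIB = 2**20
GIB = 2**30

# ASSUMPTION: 64 MiB shard block size; the real value for the vmstore
# volume is not stated in this thread and must be checked.
ASSUMED_SHARD_BLOCK_SIZE = 64 * MIB

def shard_byte_range(shard_number, block_size=ASSUMED_SHARD_BLOCK_SIZE):
    """Return (start, end) byte offsets in the base file covered by a shard.

    Shard N holds bytes [N * block_size, (N + 1) * block_size);
    the first block lives in the base file itself.
    """
    start = shard_number * block_size
    return start, start + block_size

# Shard numbers taken from the three failures reported above.
for shard in (523, 732, 1363):
    start, end = shard_byte_range(shard)
    print(f"shard {shard}: {start / GIB:.2f} GiB to {end / GIB:.2f} GiB")
```

If the assumed block size is right, all three offsets fall inside the 100 GB disk, in regions that a thin-provisioned image would be allocating for the first time during the install.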

Gianluca