
I forgot the additional logs. Please, guys, any help... (insert scream here).

On 03/02/2020 01:20, Christian Reiss wrote:
Hey folks,
Oh Jesus. 3-way HCI. Gluster itself shows no issues:
[root@node01:/var/log/glusterfs] # gluster vol info ssd_storage
Volume Name: ssd_storage
Type: Replicate
Volume ID: d84ec99a-5db9-49c6-aab4-c7481a1dc57b
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: node01.company.com:/gluster_bricks/ssd_storage/ssd_storage
Brick2: node02.company.com:/gluster_bricks/ssd_storage/ssd_storage
Brick3: node03.company.com:/gluster_bricks/ssd_storage/ssd_storage
Options Reconfigured:
performance.client-io-threads: on
nfs.disable: on
transport.address-family: inet
performance.strict-o-direct: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.low-prio-threads: 32
network.remote-dio: off
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
features.shard: on
user.cifs: off
cluster.choose-local: off
client.event-threads: 4
server.event-threads: 4
network.ping-timeout: 30
storage.owner-uid: 36
storage.owner-gid: 36
cluster.granular-entry-heal: enab
[root@node01:/var/log/glusterfs] # gluster vol status ssd_storage
Status of volume: ssd_storage
Gluster process                                                    TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick node01.company.com:/gluster_bricks/ssd_storage/ssd_storage   49152     0          Y       63488
Brick node02.company.com:/gluster_bricks/ssd_storage/ssd_storage   49152     0          Y       18860
Brick node03.company.com:/gluster_bricks/ssd_storage/ssd_storage   49152     0          Y       15262
Self-heal Daemon on localhost                                      N/A       N/A        Y       63511
Self-heal Daemon on node03.dc-dus.dalason.net                      N/A       N/A        Y       15285
Self-heal Daemon on 10.100.200.12                                  N/A       N/A        Y       18883

Task Status of Volume ssd_storage
------------------------------------------------------------------------------
There are no active volume tasks
[root@node01:/var/log/glusterfs] # gluster vol heal ssd_storage info
Brick node01.company.com:/gluster_bricks/ssd_storage/ssd_storage
Status: Connected
Number of entries: 0

Brick node02.company.com:/gluster_bricks/ssd_storage/ssd_storage
Status: Connected
Number of entries: 0

Brick node03.company.com:/gluster_bricks/ssd_storage/ssd_storage
Status: Connected
Number of entries: 0
And everything is mounted where it's supposed to be, but no VM starts due to an I/O error. I checked the md5 of a gluster-based file (a CentOS ISO) against a local copy, and it matches. One VM managed to start at one point but failed on all subsequent starts. The data/disks themselves seem okay.
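For reference, the md5 check was roughly the following (the ISO paths below are placeholders, not the exact paths on our storage domain):

[root@node01:~] # md5sum "/rhev/data-center/mnt/glusterSD/node01.company.com:_ssd__storage/<path-to-CentOS-iso>"   # copy on the gluster mount (placeholder path)
[root@node01:~] # md5sum "/root/<local-copy-of-CentOS-iso>"                                                        # local reference copy (placeholder path)

Both sums came out identical.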
The mount log /var/log/glusterfs/"rhev-data-center-mnt-glusterSD-node01.company.com:_ssd__storage.log-20200202" has entries like:
[2020-02-01 23:15:15.449902] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-ssd_storage-client-1: remote operation failed. Path: /.shard/86da0289-f74f-4200-9284-678e7bd76195.1405 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2020-02-01 23:15:15.484363] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-ssd_storage-client-1: remote operation failed. Path: /.shard/86da0289-f74f-4200-9284-678e7bd76195.1400 (00000000-0000-0000-0000-000000000000) [Permission denied]
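In case it helps with diagnosis, I can check one of the shards named in the log directly on each node's brick to see its ownership, permissions and xattrs, roughly like this (shard name taken from the log above, brick path from the volume info; I'm assuming the shard actually exists under .shard on every brick):

[root@node01:~] # stat /gluster_bricks/ssd_storage/ssd_storage/.shard/86da0289-f74f-4200-9284-678e7bd76195.1405
[root@node01:~] # getfattr -d -m . -e hex /gluster_bricks/ssd_storage/ssd_storage/.shard/86da0289-f74f-4200-9284-678e7bd76195.1405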
Before this happened we had put one host into maintenance mode; it all started during the migration.
Any help? We're sweating blood here.
-- with kind regards, mit freundlichen Gruessen, Christian Reiss