Hey folks,
oh Jesus. This is a 3-way HCI setup, and Gluster itself reports no issues:
[root@node01:/var/log/glusterfs] # gluster vol info ssd_storage
Volume Name: ssd_storage
Type: Replicate
Volume ID: d84ec99a-5db9-49c6-aab4-c7481a1dc57b
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: node01.company.com:/gluster_bricks/ssd_storage/ssd_storage
Brick2: node02.company.com:/gluster_bricks/ssd_storage/ssd_storage
Brick3: node03.company.com:/gluster_bricks/ssd_storage/ssd_storage
Options Reconfigured:
performance.client-io-threads: on
nfs.disable: on
transport.address-family: inet
performance.strict-o-direct: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.low-prio-threads: 32
network.remote-dio: off
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
features.shard: on
user.cifs: off
cluster.choose-local: off
client.event-threads: 4
server.event-threads: 4
network.ping-timeout: 30
storage.owner-uid: 36
storage.owner-gid: 36
cluster.granular-entry-heal: enable
[root@node01:/var/log/glusterfs] # gluster vol status ssd_storage
Status of volume: ssd_storage
Gluster process                                                   TCP Port  RDMA Port  Online  Pid
---------------------------------------------------------------------------------------------------
Brick node01.company.com:/gluster_bricks/ssd_storage/ssd_storage  49152     0          Y       63488
Brick node02.company.com:/gluster_bricks/ssd_storage/ssd_storage  49152     0          Y       18860
Brick node03.company.com:/gluster_bricks/ssd_storage/ssd_storage  49152     0          Y       15262
Self-heal Daemon on localhost                                     N/A       N/A        Y       63511
Self-heal Daemon on node03.dc-dus.dalason.net                     N/A       N/A        Y       15285
Self-heal Daemon on 10.100.200.12                                 N/A       N/A        Y       18883
Task Status of Volume ssd_storage
------------------------------------------------------------------------------
There are no active volume tasks
[root@node01:/var/log/glusterfs] # gluster vol heal ssd_storage info
Brick node01.company.com:/gluster_bricks/ssd_storage/ssd_storage
Status: Connected
Number of entries: 0

Brick node02.company.com:/gluster_bricks/ssd_storage/ssd_storage
Status: Connected
Number of entries: 0

Brick node03.company.com:/gluster_bricks/ssd_storage/ssd_storage
Status: Connected
Number of entries: 0
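For completeness, the explicit split-brain view can be queried the same way; I'm happy to paste its output as well if that helps:

[root@node01:/var/log/glusterfs] # gluster vol heal ssd_storage info split-brain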
Everything is mounted where it's supposed to be, but no VMs will start; they all fail with an I/O error. I checked the md5 sum of a Gluster-hosted file (a CentOS ISO) against a local copy, and it matches. One VM managed to start at one point, but failed all subsequent starts. The data and disks themselves seem okay.
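For reference, the md5 comparison was along these lines (the path to the ISO inside the storage domain is abbreviated here, and the location of the local copy is illustrative):

[root@node01:~] # md5sum '/rhev/data-center/mnt/glusterSD/node01.company.com:_ssd__storage/<path-to-iso>'
[root@node01:~] # md5sum /root/<local-copy-of-iso>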
However, /var/log/glusterfs/rhev-data-center-mnt-glusterSD-node01.company.com:_ssd__storage.log-20200202 has entries like:
[2020-02-01 23:15:15.449902] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-ssd_storage-client-1: remote operation failed. Path: /.shard/86da0289-f74f-4200-9284-678e7bd76195.1405 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2020-02-01 23:15:15.484363] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-ssd_storage-client-1: remote operation failed. Path: /.shard/86da0289-f74f-4200-9284-678e7bd76195.1400 (00000000-0000-0000-0000-000000000000) [Permission denied]
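In case it helps diagnosis: if I read the client index right, 0-ssd_storage-client-1 should be the second brick (node02). We can stat the affected shards directly on the bricks and dump their xattrs on all three nodes, e.g. (shard name taken from the log above; per the storage.owner settings, ownership should be 36:36, i.e. vdsm:kvm):

[root@node02:~] # stat /gluster_bricks/ssd_storage/ssd_storage/.shard/86da0289-f74f-4200-9284-678e7bd76195.1405
[root@node02:~] # getfattr -d -m . -e hex /gluster_bricks/ssd_storage/ssd_storage/.shard/86da0289-f74f-4200-9284-678e7bd76195.1405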
Before this happened we had put one host into maintenance mode; it all started during the migration.
Any help? We're sweating blood here.
--
with kind regards,
Christian Reiss