I forgot the additional logs.
Please, guys, any help... (insert scream here).
On 03/02/2020 01:20, Christian Reiss wrote:
Hey folks,
Oh Jesus. 3-way HCI, Gluster without any issues:
[root@node01:/var/log/glusterfs] # gluster vol info ssd_storage
Volume Name: ssd_storage
Type: Replicate
Volume ID: d84ec99a-5db9-49c6-aab4-c7481a1dc57b
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: node01.company.com:/gluster_bricks/ssd_storage/ssd_storage
Brick2: node02.company.com:/gluster_bricks/ssd_storage/ssd_storage
Brick3: node03.company.com:/gluster_bricks/ssd_storage/ssd_storage
Options Reconfigured:
performance.client-io-threads: on
nfs.disable: on
transport.address-family: inet
performance.strict-o-direct: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.low-prio-threads: 32
network.remote-dio: off
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
features.shard: on
user.cifs: off
cluster.choose-local: off
client.event-threads: 4
server.event-threads: 4
network.ping-timeout: 30
storage.owner-uid: 36
storage.owner-gid: 36
cluster.granular-entry-heal: enable
[root@node01:/var/log/glusterfs] # gluster vol status ssd_storage
Status of volume: ssd_storage
Gluster process                                                    TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick node01.company.com:/gluster_bricks/ssd_storage/ssd_storage   49152     0          Y       63488
Brick node02.company.com:/gluster_bricks/ssd_storage/ssd_storage   49152     0          Y       18860
Brick node03.company.com:/gluster_bricks/ssd_storage/ssd_storage   49152     0          Y       15262
Self-heal Daemon on localhost                                      N/A       N/A        Y       63511
Self-heal Daemon on node03.dc-dus.dalason.net                      N/A       N/A        Y       15285
Self-heal Daemon on 10.100.200.12                                  N/A       N/A        Y       18883
Task Status of Volume ssd_storage
------------------------------------------------------------------------------
There are no active volume tasks
[root@node01:/var/log/glusterfs] # gluster vol heal ssd_storage info
Brick node01.company.com:/gluster_bricks/ssd_storage/ssd_storage
Status: Connected
Number of entries: 0
Brick node02.company.com:/gluster_bricks/ssd_storage/ssd_storage
Status: Connected
Number of entries: 0
Brick node03.company.com:/gluster_bricks/ssd_storage/ssd_storage
Status: Connected
Number of entries: 0
And everything is mounted where it's supposed to be. But no VMs start, due to an IO error. I checked the md5 of a Gluster-based file (a CentOS ISO) against a local copy, and it matches (the exact check is sketched further below). One VM managed to start at one point, but failed on all subsequent starts. The data/disks seem okay, yet
/var/log/glusterfs/"rhev-data-center-mnt-glusterSD-node01.company.com:_ssd__storage.log-20200202"
has entries like:
[2020-02-01 23:15:15.449902] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-ssd_storage-client-1: remote operation failed. Path: /.shard/86da0289-f74f-4200-9284-678e7bd76195.1405 (00000000-0000-0000-0000-000000000000) [Permission denied]
[2020-02-01 23:15:15.484363] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-ssd_storage-client-1: remote operation failed. Path: /.shard/86da0289-f74f-4200-9284-678e7bd76195.1400 (00000000-0000-0000-0000-000000000000) [Permission denied]
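In case it helps, this is how we plan to inspect one of the shards named above directly on the bricks (shard path taken from the first log line; given the storage.owner-uid/gid 36 settings above we would expect 36:36 ownership; output not included here):

[root@node01:/var/log/glusterfs] # ls -ln /gluster_bricks/ssd_storage/ssd_storage/.shard/86da0289-f74f-4200-9284-678e7bd76195.1405
[root@node01:/var/log/glusterfs] # getfattr -d -m . -e hex /gluster_bricks/ssd_storage/ssd_storage/.shard/86da0289-f74f-4200-9284-678e7bd76195.1405

We can run this on all three nodes and post the output if that is useful.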
Before this happened, we had put one host into maintenance mode; it all started during the migration.
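For reference, the checksum comparison mentioned above was along these lines (the ISO path is just a placeholder, not the actual file name, and the mount point is the standard oVirt path derived from the log file name above):

[root@node01:/var/log/glusterfs] # md5sum /rhev/data-center/mnt/glusterSD/node01.company.com:_ssd__storage/<path-to>/CentOS.iso
[root@node01:/var/log/glusterfs] # md5sum /root/CentOS.iso

Both sums matched, which is why we think the data itself is intact.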
Any help? We're sweating blood here.
--
with kind regards,
mit freundlichen Gruessen,
Christian Reiss