В вторник, 4 февруари 2020 г., 14:40:32 ч. Гринуич+2, Christian Reiss <email@christian-reiss.de> написа:

Hello folks,

I am running a 3-way, no arbiter Gluster setup using oVirt and contained

Gluster 6.7. After a crash we are unable to start any VMs due to Storage

IO error.

After much, much backtracking and debugging we are closing in on the

symptons, albeit not the issue.

Conditions:

- gluster volume is healthy,

- No outstanding heal or split-brain files,

- 3 way without arbiter nodes (3 copies),

- I already ran several "heal full" commands.

Gluster Volume Info

Volume Name: ssd_storage

Type: Replicate

Volume ID: d84ec99a-5db9-49c6-aab4-c7481a1dc57b

Status: Started

Snapshot Count: 0

Number of Bricks: 1 x 3 = 3

Transport-type: tcp

Bricks:

Brick1: node01.company.com:/gluster_bricks/ssd_storage/ssd_storage

Brick2: node02.company.com:/gluster_bricks/ssd_storage/ssd_storage

Brick3: node03.company.com:/gluster_bricks/ssd_storage/ssd_storage

Options Reconfigured:

cluster.self-heal-daemon: enable

cluster.granular-entry-heal: enable

storage.owner-gid: 36

storage.owner-uid: 36

network.ping-timeout: 30

server.event-threads: 4

client.event-threads: 4

cluster.choose-local: off

user.cifs: off

features.shard: on

cluster.shd-wait-qlength: 10000

cluster.shd-max-threads: 8

cluster.locking-scheme: granular

cluster.data-self-heal-algorithm: full

cluster.server-quorum-type: server

cluster.quorum-type: auto

cluster.eager-lock: enable

network.remote-dio: off

performance.low-prio-threads: 32

performance.io-cache: off

performance.read-ahead: off

performance.quick-read: off

performance.strict-o-direct: on

transport.address-family: inet

nfs.disable: on

performance.client-io-threads: on

Gluster Volume Status

Status of volume: ssd_storage

Gluster process TCP Port RDMA Port

Online Pid

------------------------------------------------------------------------------

Brick node01.company.com:/gluster_br

icks/ssd_storage/ssd_storage 49152 0 Y

8218

Brick node02.company.com:/gluster_br

icks/ssd_storage/ssd_storage 49152 0 Y

23595

Brick node03.company.com:/gluster_br

icks/ssd_storage/ssd_storage 49152 0 Y

8080

Self-heal Daemon on localhost N/A N/A Y

66028

Self-heal Daemon on 10.100.200.12 N/A N/A Y

52087

Self-heal Daemon on node03.company.com

et N/A N/A Y

8372

Task Status of Volume ssd_storage

------------------------------------------------------------------------------

There are no active volume tasks

The mounted path where the oVirt vm files reside is 100% okay, we copied

all the images out there onto standalone hosts and the images run just

fine. There is no obvious data corruption. However launching any VM out

of oVirt fails with "IO Storage Error".

This is where everything gets funny.

oVirt uses a vdsm user to access all the files.

Findings:

- root can read, edit and write all files inside the ovirt mounted

gluster path.

- vdsm user can write to new files regardless of size without any

issues; changes get replicated instantly to other nodes.

- vdsm user can append to existing files regardless of size without

any issues; changes get replicated instantly to other nodes.

- vdsm user can read files if those files are smaller than 64mb.

- vdsm user gets permission denied errors if the file to be read is

65mb or bigger.

- vdsm user gets permission denied errors if the requests crosses a

gluster shard-file boundary.

- if root does a "dd if=file_larger_than64mb" of=/dev/null" on any

large file, the file can then be read by the vdsm user on that single

node. Changes do not get replicated to other nodes.

Example:

id of the vdsm user & sudo to them:

[vdsm@node01:/rhev/data-center/mnt/glusterSD/node01.company.com:_ssd__storage/fec2eb5e-21b5-496b-9ea5-f718b2cb5556/test]

$ id

uid=36(vdsm) gid=36(kvm) groups=36(kvm),107(qemu),179(sanlock)

context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023

[vdsm@node02:/rhev/data-center/mnt/glusterSD/node01.company.com:_ssd__storage/fec2eb5e-21b5-496b-9ea5-f718b2cb5556/test]

$ id

uid=36(vdsm) gid=36(kvm) groups=36(kvm),107(qemu),179(sanlock)

context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023

[vdsm@node03:/rhev/data-center/mnt/glusterSD/node01.company.com:_ssd__storage/fec2eb5e-21b5-496b-9ea5-f718b2cb5556/test]

$ id

uid=36(vdsm) gid=36(kvm) groups=36(kvm),107(qemu),179(sanlock)

context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023

Create a file >64mb on one node:

[vdsm@node03:/rhev/data-center/mnt/glusterSD/node01.company.com:_ssd__storage/fec2eb5e-21b5-496b-9ea5-f718b2cb5556/test]

$ base64 /dev/urandom | head -c 200000000 > file.txt

[vdsm@node03:/rhev/data-center/mnt/glusterSD/node01.company.com:_ssd__storage/fec2eb5e-21b5-496b-9ea5-f718b2cb5556/test]

$ ls -lha

total 191M

drwxr-xr-x. 2 vdsm kvm 30 Feb 4 13:10 .

drwxr-xr-x. 6 vdsm kvm 80 Jan 1 1970 ..

-rw-r--r--. 1 vdsm kvm 191M Feb 4 13:10 file.txt

File is instantly available on another node:

[vdsm@node01:/rhev/data-center/mnt/glusterSD/node01.company.com:_ssd__storage/fec2eb5e-21b5-496b-9ea5-f718b2cb5556/test]

$ ls -lha

total 191M

drwxr-xr-x. 2 vdsm kvm 30 Feb 4 13:10 .

drwxr-xr-x. 6 vdsm kvm 80 Jan 1 1970 ..

-rw-r--r--. 1 vdsm kvm 191M Feb 4 13:10 file.txt

Accessing the whole file fails:

[vdsm@node01:] dd if=file.txt of=/dev/null

dd: error reading ‘file.txt’: Permission denied

131072+0 records in

131072+0 records out

67108864 bytes (67 MB) copied, 0.0651919 s, 1.0 GB/s

Reading first 64mb works, 65mb (crossing boundary) does not:

[vdsm@node01:] $ dd if=file.txt bs=1M count=64 of=/dev/null

64+0 records in

64+0 records out

67108864 bytes (67 MB) copied, 0.00801663 s, 8.4 GB/s

[vdsm@node01:] $ dd if=file.txt bs=1M count=65 of=/dev/null

dd: error reading ‘file.txt’: Permission denied

64+0 records in

64+0 records out

67108864 bytes (67 MB) copied, 0.00908712 s, 7.4 GB/s

Attaching/ appending to the file works (not crossing bounary):

[vdsm@node01:] $ date >> file.txt

[vdsm@node01:] $

[vdsm@node02:] $ tail -n2 file.txt

E16ACZaLqLhx2oUUUov5JHvQcVFohn6HH+eog6XZCiTaG0Tue 4 Feb 13:18:37 CET 2020

Reading the file beginning & end works, if it crosses the boundary not

so much:

[vdsm@node02:] $ head file.txt

jrZOxGaGvwfpGSwn1BKWWmFC4556KNzXsD2BCwY78tnV1mRY54IxnE+hbnszRyWgVuXhBpVRoJTp

xvVwktZwSytMyvJjsSt7pQbXbHSY66tRe/rvrw5dHr3RNJn9HjqtlKQ9mHVX4ch1HkU5posSmDbg

vwzxBTXWfxLDMmIghyTgBTSFiI9Xg8W6htxDpxrbO+10EzlnaN1Am5tAlTkfrorNLyihpiQhUPGG

ag6tJUcFj3IySGRTAxnStFRQoBXN5dlyx1Sqc4s/Tpl7gkgR8+I7UcdRKISjgcGcpW+zrXKqFF/H

Dwv6ql+2ysPRrtlbt2V8Zf697VsNX5DTgZS9BKmWlAeqejNYaqG5Rsuhn7szbCfkkmsjedk+Rdcv

A3SHMBeHXdtfBHS0AlbEwKgeml08NmCUcwnifhrQywCnu8NN9+RQ3cUxGvIuLLSzi3915wC6hbxr

8xArckQfSUfKA/hrHvoiiCGZU9D23xj3XXtsjdbIIDXATDnCPrKANdvGN5LTKal8bT0jXORfAz1z

MniqVUgvWVNcviPgQ9BfT5qpGo8g7LaoBMGamAGVX6Ezrs04rk8jQ1yz1bB/8URfTRLZdyYkMh0u

MB4xMylnyavgusoi7Duf5RuYJvNaL0g8Lx/cfGpGsGwdD2Lj/qRC45ammn6wCxDVfiJV6Z/TzJcY

PBvzWK5xT++PQgMV8EwtXwA1kFqaGrcuiDHejMQ8O82Edjr+eBCBe0B7bRddoMD6oOlhNm1YsSNt

[vdsm@node02:] $ tail file.txt

9JX8OWCJwbyvEPDyyI30H1/jPZfDo1sS11dZ2JjiO7qhB45VaU8+irG45D0GGJhFf8wE8TD9EGWG

8346QHLX9ZSFsbjpuh71hr5Ju1UduVdvIDwwP8WDBtRUbMAVvsyGR33rkpijepmUjmYl/jeZ7rsC

VyUVlmG5PxrI7KKxz5dSkzApqVHKKgsf93JMDAdPwvXTq4hhZdUJ581w9FC/f9k2wWldEGkAcyB0

cCKp+VJl2vx989KUoqAJzsrvYdK0X7itruqYdpC29JXode+7NixUflhKvPdKmitBYyCEgCcyxUyn

eyMOdaan2x8d8MztLLoWLpp+gLzl2Hev7y3OXq6I9SVN2t+hcVIz8Llmumy0cD+VC4u2/UZszYqS

nDaSSMs35agGUUgIpHjPxCRf/yqnfrJJMTGAcxSEqHtpEdsjEmkf4QkyEgEZ13f4oi7P/DFCIIvV

JBsHzOLDoetnFzAA2/RqbDflPrVWcAR7tXVqGLACCj2s19uUFSNb8nBWmEk8fFz31iJhuL43v0WE

78/THl49T0hhzHQp6kdIiw5p1zPUIFGBZ0BS4mBCHxu+tMlPZe1zWJMJZdPnvDNtHZ4gQ6LFgU4w

E16ACZaLqLhx2oUUUov5JHvQcVFohn6HH+eog6XZCiTaG0Tue 4 Feb 13:18:37 CET 2020

[vdsm@node02:] $ dd if=file.txt of=/dev/null

dd: error reading ‘file.txt’: Permission denied

131072+0 records in

131072+0 records out

67108864 bytes (67 MB) copied, 0.106097 s, 633 MB/s

if root does dd first, all is peachy:

[root@node02] # dd if=file.txt of=/dev/null

390625+1 records in

390625+1 records out

200000058 bytes (200 MB) copied, 0.345906 s, 578 MB/s

[vdsm@node02] $ dd if=file.txt of=/dev/null

390625+1 records in

390625+1 records out

200000058 bytes (200 MB) copied, 0.188451 s, 1.1 GB/s

Error in the gluster.log:

[2020-02-04 12:27:57.915356] W [MSGID: 114031]

[client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-ssd_storage-client-1:

remote operation failed. Path:

/.shard/57200f4f-537d-4e56-9258-38fe6ac64c4e.2

(00000000-0000-0000-0000-000000000000) [Permission denied]

[2020-02-04 12:27:57.915404] W [MSGID: 114031]

[client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-ssd_storage-client-0:

remote operation failed. Path:

/.shard/57200f4f-537d-4e56-9258-38fe6ac64c4e.2

(00000000-0000-0000-0000-000000000000) [Permission denied]

[2020-02-04 12:27:57.915472] W [MSGID: 114031]

[client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-ssd_storage-client-2:

remote operation failed. Path:

/.shard/57200f4f-537d-4e56-9258-38fe6ac64c4e.2

(00000000-0000-0000-0000-000000000000) [Permission denied]

[2020-02-04 12:27:57.915490] E [MSGID: 133010]

[shard.c:2327:shard_common_lookup_shards_cbk] 0-ssd_storage-shard:

Lookup on shard 2 failed. Base file gfid =

57200f4f-537d-4e56-9258-38fe6ac64c4e [Permission denied]

What we tried:

- restarting single hosts,

- restarting the entire cluster,

- doing stuff like find /rhev .. exec stats{}\ ;

- dd'ing (read) all of the mount dir...

We are out of ideas and also no experts on either gluster nor ovirt, it

seems.

And this is supposed to be a production HA environment. Any help would

be appreciated.

I hope I did think of all the relevant data and logs.

with kind regards,

mit freundlichen Gruessen,

Christian Reiss

_______________________________________________

Users mailing list -- users@ovirt.org

To unsubscribe send an email to users-leave@ovirt.org

Privacy Statement: https://www.ovirt.org/site/privacy-policy/

oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/

List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/BOYMT4BP6P5FICFTMY75U4EDDL3HFNQK/