On Sat, Mar 28, 2020 at 2:21 AM Gianluca Cecchi <gianluca.cecchi(a)gmail.com>
wrote:
Hello,
having deployed oVirt 4.3.9 single host HCI with Gluster, I sometimes see
VMs going into a paused state with the error above, and I need to manually
resume them (sometimes this resume operation fails).
So far it has only happened with an empty (thin provisioned) disk under
sudden high I/O during the initial phase of an OS install; it has not
happened during normal operation (even with 600MB/s of throughput).
I suspect something related to metadata extension not being able to keep
pace with how quickly the physical disk grows, similar to what happens on
block-based storage domains, where the LVM layer has to extend the logical
volume backing the virtual disk.
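As a rough check, assuming the disk is a sparse file under the brick path
(the UUID path components below are placeholders for the actual storage
domain/image/volume IDs), the allocation lag can be watched during the
install by comparing virtual size against allocated size:

[root@ovirt tmp]# IMG=/gluster_bricks/vmstore/vmstore/<sd_uuid>/images/<img_uuid>/<vol_uuid>
[root@ovirt tmp]# qemu-img info "$IMG"                         # virtual size vs "disk size"
[root@ovirt tmp]# du -h --apparent-size "$IMG" ; du -h "$IMG"  # apparent size vs allocated blocks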
My real-world reproduction of the error is during the install of an OCP
4.3.8 master node, when Red Hat CoreOS boots from the network, wipes the
disk, and then (I think) transfers an image, generating high I/O right away.
The VM used as master node was created with a 120 GB thin provisioned disk
(virtio-scsi type) and starts with the disk just initialized and empty,
going through a PXE install.
I get this line in the events for the VM:
Mar 27, 2020, 12:35:23 AM VM master01 has been paused due to unknown
storage error.
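Since the engine event only says "unknown storage error", a way to get the
pause reason as libvirt sees it (read-only query on the host; VM name as
above) should be:

[root@ovirt tmp]# virsh -r domstate master01 --reason

and, if I'm not mistaken, vdsm exposes the same thing as pauseCode in the VM
stats (vdsm-client VM getStats with the VM's UUID).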
Here are the logs around the time frame above:
- engine.log
https://drive.google.com/file/d/1zpNo5IgFVTAlKXHiAMTL-uvaoXSNMVRO/view?us...
- vdsm.log
https://drive.google.com/file/d/1v8kR0N6PdHBJ5hYzEYKl4-m7v1Lb_cYX/view?us...
Any suggestions?
The disk of the VM is on the vmstore storage domain, and its gluster volume
settings are:
[root@ovirt tmp]# gluster volume info vmstore
Volume Name: vmstore
Type: Distribute
Volume ID: a6203d77-3b9d-49f9-94c5-9e30562959c4
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: ovirtst.mydomain.storage:/gluster_bricks/vmstore/vmstore
Options Reconfigured:
performance.low-prio-threads: 32
storage.owner-gid: 36
performance.read-ahead: off
user.cifs: off
storage.owner-uid: 36
performance.io-cache: off
performance.quick-read: off
network.ping-timeout: 30
features.shard: on
network.remote-dio: off
cluster.eager-lock: enable
performance.strict-o-direct: on
transport.address-family: inet
nfs.disable: on
[root@ovirt tmp]#
Are there any optimizations I should make to the configuration above, given
that this is a single-host deployment? And how does it compare with the virt
group of options:
[root@ovirt tmp]# cat /var/lib/glusterd/groups/virt
performance.quick-read=off
performance.read-ahead=off
performance.io-cache=off
performance.low-prio-threads=32
network.remote-dio=enable
cluster.eager-lock=enable
cluster.quorum-type=auto
cluster.server-quorum-type=server
cluster.data-self-heal-algorithm=full
cluster.locking-scheme=granular
cluster.shd-max-threads=8
cluster.shd-wait-qlength=10000
features.shard=on
user.cifs=off
cluster.choose-local=off
client.event-threads=4
server.event-threads=4
performance.client-io-threads=on
[root@ovirt tmp]#
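The visible differences include, for example, network.remote-dio (off on my
volume, enable in the group) and cluster.choose-local (off in the group,
unset here). If aligning with the group turns out to be the right move, I
believe the whole group file can be applied in one shot, or single options
set individually (a sketch, to be confirmed before running on a live
volume):

[root@ovirt tmp]# gluster volume set vmstore group virt                 # apply /var/lib/glusterd/groups/virt
[root@ovirt tmp]# gluster volume set vmstore network.remote-dio enable  # or one option at a time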
Thanks,
Gianluca
Further information.
Here is what I see around the time frame in the gluster brick log file
gluster_bricks-vmstore-vmstore.log (the timestamps in the log file are 1
hour behind):
[2020-03-27 23:30:38.575808] I [MSGID: 101055]
[client_t.c:436:gf_client_unref] 0-vmstore-server: Shutting down connection
CTX_ID:6e8f70b8-1946-4505-860f-be90e5807cb3-GRAPH_ID:0-PID:223418-HOST:ovirt.mydomain.local-PC_NAME:vmstore-client-0-RECON_NO:-0
[2020-03-27 23:35:15.281449] E [MSGID: 113072]
[posix-inode-fd-ops.c:1886:posix_writev] 0-vmstore-posix: write failed:
offset 0, [Invalid argument]
[2020-03-27 23:35:15.281545] E [MSGID: 115067]
[server-rpc-fops_v2.c:1373:server4_writev_cbk] 0-vmstore-server: 34139378:
WRITEV 10 (00d9fe81-8a31-498e-8401-7b9d1477378e), client:
CTX_ID:d04437ba-ef98-43df-864f-5e9d3738620a-GRAPH_ID:0-PID:27687-HOST:ovirt.mydomain.local-PC_NAME:vmstore-client-0-RECON_NO:-0,
error-xlator: vmstore-posix [Invalid argument]
[2020-03-27 23:40:15.415794] E [MSGID: 113072]
[posix-inode-fd-ops.c:1886:posix_writev] 0-vmstore-posix: write failed:
offset 0, [Invalid argument]
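posix_writev failing with EINVAL is what a misaligned O_DIRECT write would
produce, and with performance.strict-o-direct on (and network.remote-dio
off) the writes should reach the brick with O_DIRECT, so that is my working
hypothesis. A quick check of the brick filesystem's alignment requirement
(the test file is hypothetical and should be removed afterwards):

[root@ovirt tmp]# dd if=/dev/zero of=/gluster_bricks/vmstore/vmstore/ddtest bs=512 count=8 oflag=direct
[root@ovirt tmp]# dd if=/dev/zero of=/gluster_bricks/vmstore/vmstore/ddtest bs=4096 count=8 oflag=direct
[root@ovirt tmp]# rm /gluster_bricks/vmstore/vmstore/ddtest

If the 512-byte direct writes fail with "Invalid argument" while the
4096-byte ones succeed, the underlying device has 4K logical sectors and
every direct write has to be 4K-aligned.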
My gluster component versions:
gluster-ansible-cluster-1.0.0-1.el7.noarch
gluster-ansible-features-1.0.5-3.el7.noarch
gluster-ansible-infra-1.0.4-3.el7.noarch
gluster-ansible-maintenance-1.0.1-1.el7.noarch
gluster-ansible-repositories-1.0.1-1.el7.noarch
gluster-ansible-roles-1.0.5-7.el7.noarch
glusterfs-6.8-1.el7.x86_64
glusterfs-api-6.8-1.el7.x86_64
glusterfs-cli-6.8-1.el7.x86_64
glusterfs-client-xlators-6.8-1.el7.x86_64
glusterfs-events-6.8-1.el7.x86_64
glusterfs-fuse-6.8-1.el7.x86_64
glusterfs-geo-replication-6.8-1.el7.x86_64
glusterfs-libs-6.8-1.el7.x86_64
glusterfs-rdma-6.8-1.el7.x86_64
glusterfs-server-6.8-1.el7.x86_64
libvirt-daemon-driver-storage-gluster-4.5.0-23.el7_7.6.x86_64
python2-gluster-6.8-1.el7.x86_64
vdsm-gluster-4.30.43-1.el7.x86_64
And for completeness, here is the whole set of parameters for the volume,
in case there are further performance-related suggestions for a single-node
environment:
[root@ovirt tmp]# gluster volume get vmstore all
Option Value
------ -----
cluster.lookup-unhashed on
cluster.lookup-optimize on
cluster.min-free-disk 10%
cluster.min-free-inodes 5%
cluster.rebalance-stats off
cluster.subvols-per-directory (null)
cluster.readdir-optimize off
cluster.rsync-hash-regex (null)
cluster.extra-hash-regex (null)
cluster.dht-xattr-name trusted.glusterfs.dht
cluster.randomize-hash-range-by-gfid off
cluster.rebal-throttle normal
cluster.lock-migration off
cluster.force-migration off
cluster.local-volume-name (null)
cluster.weighted-rebalance on
cluster.switch-pattern (null)
cluster.entry-change-log on
cluster.read-subvolume (null)
cluster.read-subvolume-index -1
cluster.read-hash-mode 1
cluster.background-self-heal-count 8
cluster.metadata-self-heal off
cluster.data-self-heal off
cluster.entry-self-heal off
cluster.self-heal-daemon on
cluster.heal-timeout 600
cluster.self-heal-window-size 1
cluster.data-change-log on
cluster.metadata-change-log on
cluster.data-self-heal-algorithm (null)
cluster.eager-lock enable
disperse.eager-lock on
disperse.other-eager-lock on
disperse.eager-lock-timeout 1
disperse.other-eager-lock-timeout 1
cluster.quorum-type none
cluster.quorum-count (null)
cluster.choose-local true
cluster.self-heal-readdir-size 1KB
cluster.post-op-delay-secs 1
cluster.ensure-durability on
cluster.consistent-metadata no
cluster.heal-wait-queue-length 128
cluster.favorite-child-policy none
cluster.full-lock yes
diagnostics.latency-measurement off
diagnostics.dump-fd-stats off
diagnostics.count-fop-hits off
diagnostics.brick-log-level INFO
diagnostics.client-log-level INFO
diagnostics.brick-sys-log-level CRITICAL
diagnostics.client-sys-log-level CRITICAL
diagnostics.brick-logger (null)
diagnostics.client-logger (null)
diagnostics.brick-log-format (null)
diagnostics.client-log-format (null)
diagnostics.brick-log-buf-size 5
diagnostics.client-log-buf-size 5
diagnostics.brick-log-flush-timeout 120
diagnostics.client-log-flush-timeout 120
diagnostics.stats-dump-interval 0
diagnostics.fop-sample-interval 0
diagnostics.stats-dump-format json
diagnostics.fop-sample-buf-size 65535
diagnostics.stats-dnscache-ttl-sec 86400
performance.cache-max-file-size 0
performance.cache-min-file-size 0
performance.cache-refresh-timeout 1
performance.cache-priority
performance.cache-size 32MB
performance.io-thread-count 16
performance.high-prio-threads 16
performance.normal-prio-threads 16
performance.low-prio-threads 32
performance.least-prio-threads 1
performance.enable-least-priority on
performance.iot-watchdog-secs (null)
performance.iot-cleanup-disconnected-reqs off
performance.iot-pass-through false
performance.io-cache-pass-through false
performance.cache-size 128MB
performance.qr-cache-timeout 1
performance.cache-invalidation false
performance.ctime-invalidation false
performance.flush-behind on
performance.nfs.flush-behind on
performance.write-behind-window-size 1MB
performance.resync-failed-syncs-after-fsync off
performance.nfs.write-behind-window-size 1MB
performance.strict-o-direct on
performance.nfs.strict-o-direct off
performance.strict-write-ordering off
performance.nfs.strict-write-ordering off
performance.write-behind-trickling-writes on
performance.aggregate-size 128KB
performance.nfs.write-behind-trickling-writes on
performance.lazy-open yes
performance.read-after-open yes
performance.open-behind-pass-through false
performance.read-ahead-page-count 4
performance.read-ahead-pass-through false
performance.readdir-ahead-pass-through false
performance.md-cache-pass-through false
performance.md-cache-timeout 1
performance.cache-swift-metadata true
performance.cache-samba-metadata false
performance.cache-capability-xattrs true
performance.cache-ima-xattrs true
performance.md-cache-statfs off
performance.xattr-cache-list
performance.nl-cache-pass-through false
features.encryption off
network.frame-timeout 1800
network.ping-timeout 30
network.tcp-window-size (null)
client.ssl off
network.remote-dio off
client.event-threads 2
client.tcp-user-timeout 0
client.keepalive-time 20
client.keepalive-interval 2
client.keepalive-count 9
network.tcp-window-size (null)
network.inode-lru-limit 16384
auth.allow *
auth.reject (null)
transport.keepalive 1
server.allow-insecure on
server.root-squash off
server.all-squash off
server.anonuid 65534
server.anongid 65534
server.statedump-path /var/run/gluster
server.outstanding-rpc-limit 64
server.ssl off
auth.ssl-allow *
server.manage-gids off
server.dynamic-auth on
client.send-gids on
server.gid-timeout 300
server.own-thread (null)
server.event-threads 2
server.tcp-user-timeout 42
server.keepalive-time 20
server.keepalive-interval 2
server.keepalive-count 9
transport.listen-backlog 1024
transport.address-family inet
performance.write-behind on
performance.read-ahead off
performance.readdir-ahead on
performance.io-cache off
performance.open-behind on
performance.quick-read off
performance.nl-cache off
performance.stat-prefetch on
performance.client-io-threads on
performance.nfs.write-behind on
performance.nfs.read-ahead off
performance.nfs.io-cache off
performance.nfs.quick-read off
performance.nfs.stat-prefetch off
performance.nfs.io-threads off
performance.force-readdirp true
performance.cache-invalidation false
performance.global-cache-invalidation true
features.uss off
features.snapshot-directory .snaps
features.show-snapshot-directory off
features.tag-namespaces off
network.compression off
network.compression.window-size -15
network.compression.mem-level 8
network.compression.min-size 0
network.compression.compression-level -1
network.compression.debug false
features.default-soft-limit 80%
features.soft-timeout 60
features.hard-timeout 5
features.alert-time 86400
features.quota-deem-statfs off
geo-replication.indexing off
geo-replication.indexing off
geo-replication.ignore-pid-check off
geo-replication.ignore-pid-check off
features.quota off
features.inode-quota off
features.bitrot disable
debug.trace off
debug.log-history no
debug.log-file no
debug.exclude-ops (null)
debug.include-ops (null)
debug.error-gen off
debug.error-failure (null)
debug.error-number (null)
debug.random-failure off
debug.error-fops (null)
nfs.disable on
features.read-only off
features.worm off
features.worm-file-level off
features.worm-files-deletable on
features.default-retention-period 120
features.retention-mode relax
features.auto-commit-period 180
storage.linux-aio off
storage.batch-fsync-mode reverse-fsync
storage.batch-fsync-delay-usec 0
storage.owner-uid 36
storage.owner-gid 36
storage.node-uuid-pathinfo off
storage.health-check-interval 30
storage.build-pgfid off
storage.gfid2path on
storage.gfid2path-separator :
storage.reserve 1
storage.health-check-timeout 10
storage.fips-mode-rchecksum off
storage.force-create-mode 0000
storage.force-directory-mode 0000
storage.create-mask 0777
storage.create-directory-mask 0777
storage.max-hardlinks 100
features.ctime on
config.gfproxyd off
cluster.server-quorum-type off
cluster.server-quorum-ratio 0
changelog.changelog off
changelog.changelog-dir {{ brick.path }}/.glusterfs/changelogs
changelog.encoding ascii
changelog.rollover-time 15
changelog.fsync-interval 5
changelog.changelog-barrier-timeout 120
changelog.capture-del-path off
features.barrier disable
features.barrier-timeout 120
features.trash off
features.trash-dir .trashcan
features.trash-eliminate-path (null)
features.trash-max-filesize 5MB
features.trash-internal-op off
cluster.enable-shared-storage disable
locks.trace off
locks.mandatory-locking off
cluster.disperse-self-heal-daemon enable
cluster.quorum-reads no
client.bind-insecure (null)
features.shard on
features.shard-block-size 64MB
features.shard-lru-limit 16384
features.shard-deletion-rate 100
features.scrub-throttle lazy
features.scrub-freq biweekly
features.scrub false
features.expiry-time 120
features.cache-invalidation off
features.cache-invalidation-timeout 60
features.leases off
features.lease-lock-recall-timeout 60
disperse.background-heals 8
disperse.heal-wait-qlength 128
cluster.heal-timeout 600
dht.force-readdirp on
disperse.read-policy gfid-hash
cluster.shd-max-threads 1
cluster.shd-wait-qlength 1024
cluster.locking-scheme full
cluster.granular-entry-heal no
features.locks-revocation-secs 0
features.locks-revocation-clear-all false
features.locks-revocation-max-blocked 0
features.locks-monkey-unlocking false
features.locks-notify-contention no
features.locks-notify-contention-delay 5
disperse.shd-max-threads 1
disperse.shd-wait-qlength 1024
disperse.cpu-extensions auto
disperse.self-heal-window-size 1
cluster.use-compound-fops off
performance.parallel-readdir off
performance.rda-request-size 131072
performance.rda-low-wmark 4096
performance.rda-high-wmark 128KB
performance.rda-cache-limit 10MB
performance.nl-cache-positive-entry false
performance.nl-cache-limit 10MB
performance.nl-cache-timeout 60
cluster.brick-multiplex off
cluster.max-bricks-per-process 250
disperse.optimistic-change-log on
disperse.stripe-cache 4
cluster.halo-enabled False
cluster.halo-shd-max-latency 99999
cluster.halo-nfsd-max-latency 5
cluster.halo-max-latency 5
cluster.halo-max-replicas 99999
cluster.halo-min-replicas 2
features.selinux on
cluster.daemon-log-level INFO
debug.delay-gen off
delay-gen.delay-percentage 10%
delay-gen.delay-duration 100000
delay-gen.enable
disperse.parallel-writes on
features.sdfs off
features.cloudsync off
features.ctime on
ctime.noatime on
feature.cloudsync-storetype (null)
features.enforce-mandatory-lock off
[root@ovirt tmp]#
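In case it is easier to review just the deltas, this is roughly the loop I
would use to compare the virt group file (key=value format) against the
live values, one gluster volume get per key (a sketch, untested):

[root@ovirt tmp]# while IFS='=' read -r k v; do
>   cur=$(gluster volume get vmstore "$k" | awk -v k="$k" '$1==k {print $2}')
>   printf '%-45s group=%-10s current=%s\n' "$k" "$v" "$cur"
> done < /var/lib/glusterd/groups/virt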
Gianluca