On Sat, Mar 28, 2020 at 2:21 AM Gianluca Cecchi <gianluca.cecchi(a)gmail.com>
wrote:
Hello,
having deployed oVirt 4.3.9 single host HCI with Gluster, I sometimes see
VMs going into a paused state with the error above, and I need to manually
resume them (sometimes this resume operation fails).
So far it has only happened with an empty (thin provisioned) disk under
sudden high I/O during the initial phase of an OS install; it has not
happened during normal operation (even with 600MB/s of throughput).
I suspect something related to metadata extension not being able to keep
pace with how quickly the physical disk grows, similar to what happens on
block-based storage domains, where the LVM layer has to extend the logical
volume backing the virtual disk.
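As a rough check, assuming the disk is a sparse file under the brick path
(the UUID path components below are placeholders for the actual storage
domain/image/volume IDs), the allocation lag can be watched during the
install by comparing virtual size against allocated size:

[root@ovirt tmp]# IMG=/gluster_bricks/vmstore/vmstore/<sd_uuid>/images/<img_uuid>/<vol_uuid>
[root@ovirt tmp]# qemu-img info "$IMG"                         # virtual size vs "disk size"
[root@ovirt tmp]# du -h --apparent-size "$IMG" ; du -h "$IMG"  # apparent size vs allocated blocks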
My real-world reproduction of the error is during the install of an OCP
4.3.8 master node, when Red Hat CoreOS boots from the network, wipes the
disk, and then (I think) transfers an image, generating high I/O right away.
The VM used as master node was created with a 120 GB thin provisioned disk
(virtio-scsi type) and starts with the disk just initialized and empty,
going through a PXE install.
I get this line in the events for the VM:
Mar 27, 2020, 12:35:23 AM VM master01 has been paused due to unknown
storage error.
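Since the engine event only says "unknown storage error", a way to get the
pause reason as libvirt sees it (read-only query on the host; VM name as
above) should be:

[root@ovirt tmp]# virsh -r domstate master01 --reason

and, if I'm not mistaken, vdsm exposes the same thing as pauseCode in the VM
stats (vdsm-client VM getStats with the VM's UUID).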
Here are the logs around the time frame above:
- engine.log
https://drive.google.com/file/d/1zpNo5IgFVTAlKXHiAMTL-uvaoXSNMVRO/view?us...
- vdsm.log
https://drive.google.com/file/d/1v8kR0N6PdHBJ5hYzEYKl4-m7v1Lb_cYX/view?us...
Any suggestions?
The disk of the VM is on the vmstore storage domain, and its gluster volume
settings are:
[root@ovirt tmp]# gluster volume info vmstore
Volume Name: vmstore
Type: Distribute
Volume ID: a6203d77-3b9d-49f9-94c5-9e30562959c4
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: ovirtst.mydomain.storage:/gluster_bricks/vmstore/vmstore
Options Reconfigured:
performance.low-prio-threads: 32
storage.owner-gid: 36
performance.read-ahead: off
user.cifs: off
storage.owner-uid: 36
performance.io-cache: off
performance.quick-read: off
network.ping-timeout: 30
features.shard: on
network.remote-dio: off
cluster.eager-lock: enable
performance.strict-o-direct: on
transport.address-family: inet
nfs.disable: on
[root@ovirt tmp]#
Are there any optimizations I should make to the configuration above, given
that this is a single-host deployment? And how does it compare with the virt
group of options:
[root@ovirt tmp]# cat /var/lib/glusterd/groups/virt
performance.quick-read=off
performance.read-ahead=off
performance.io-cache=off
performance.low-prio-threads=32
network.remote-dio=enable
cluster.eager-lock=enable
cluster.quorum-type=auto
cluster.server-quorum-type=server
cluster.data-self-heal-algorithm=full
cluster.locking-scheme=granular
cluster.shd-max-threads=8
cluster.shd-wait-qlength=10000
features.shard=on
user.cifs=off
cluster.choose-local=off
client.event-threads=4
server.event-threads=4
performance.client-io-threads=on
[root@ovirt tmp]#
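The visible differences include, for example, network.remote-dio (off on my
volume, enable in the group) and cluster.choose-local (off in the group,
unset here). If aligning with the group turns out to be the right move, I
believe the whole group file can be applied in one shot, or single options
set individually (a sketch, to be confirmed before running on a live
volume):

[root@ovirt tmp]# gluster volume set vmstore group virt                 # apply /var/lib/glusterd/groups/virt
[root@ovirt tmp]# gluster volume set vmstore network.remote-dio enable  # or one option at a time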
Thanks,
Gianluca
Further information.
Here is what I see around the time frame in the gluster brick log file
gluster_bricks-vmstore-vmstore.log (the timestamps in the log file are 1
hour behind):
[2020-03-27 23:30:38.575808] I [MSGID: 101055]
[client_t.c:436:gf_client_unref] 0-vmstore-server: Shutting down connection
CTX_ID:6e8f70b8-1946-4505-860f-be90e5807cb3-GRAPH_ID:0-PID:223418-HOST:ovirt.mydomain.local-PC_NAME:vmstore-client-0-RECON_NO:-0
[2020-03-27 23:35:15.281449] E [MSGID: 113072]
[posix-inode-fd-ops.c:1886:posix_writev] 0-vmstore-posix: write failed:
offset 0, [Invalid argument]
[2020-03-27 23:35:15.281545] E [MSGID: 115067]
[server-rpc-fops_v2.c:1373:server4_writev_cbk] 0-vmstore-server: 34139378:
WRITEV 10 (00d9fe81-8a31-498e-8401-7b9d1477378e), client:
CTX_ID:d04437ba-ef98-43df-864f-5e9d3738620a-GRAPH_ID:0-PID:27687-HOST:ovirt.mydomain.local-PC_NAME:vmstore-client-0-RECON_NO:-0,
error-xlator: vmstore-posix [Invalid argument]
[2020-03-27 23:40:15.415794] E [MSGID: 113072]
[posix-inode-fd-ops.c:1886:posix_writev] 0-vmstore-posix: write failed:
offset 0, [Invalid argument]
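posix_writev failing with EINVAL is what a misaligned O_DIRECT write would
produce, and with performance.strict-o-direct on (and network.remote-dio
off) the writes should reach the brick with O_DIRECT, so that is my working
hypothesis. A quick check of the brick filesystem's alignment requirement
(the test file is hypothetical and should be removed afterwards):

[root@ovirt tmp]# dd if=/dev/zero of=/gluster_bricks/vmstore/vmstore/ddtest bs=512 count=8 oflag=direct
[root@ovirt tmp]# dd if=/dev/zero of=/gluster_bricks/vmstore/vmstore/ddtest bs=4096 count=8 oflag=direct
[root@ovirt tmp]# rm /gluster_bricks/vmstore/vmstore/ddtest

If the 512-byte direct writes fail with "Invalid argument" while the
4096-byte ones succeed, the underlying device has 4K logical sectors and
every direct write has to be 4K-aligned.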
My gluster component versions:
gluster-ansible-cluster-1.0.0-1.el7.noarch
gluster-ansible-features-1.0.5-3.el7.noarch
gluster-ansible-infra-1.0.4-3.el7.noarch
gluster-ansible-maintenance-1.0.1-1.el7.noarch
gluster-ansible-repositories-1.0.1-1.el7.noarch
gluster-ansible-roles-1.0.5-7.el7.noarch
glusterfs-6.8-1.el7.x86_64
glusterfs-api-6.8-1.el7.x86_64
glusterfs-cli-6.8-1.el7.x86_64
glusterfs-client-xlators-6.8-1.el7.x86_64
glusterfs-events-6.8-1.el7.x86_64
glusterfs-fuse-6.8-1.el7.x86_64
glusterfs-geo-replication-6.8-1.el7.x86_64
glusterfs-libs-6.8-1.el7.x86_64
glusterfs-rdma-6.8-1.el7.x86_64
glusterfs-server-6.8-1.el7.x86_64
libvirt-daemon-driver-storage-gluster-4.5.0-23.el7_7.6.x86_64
python2-gluster-6.8-1.el7.x86_64
vdsm-gluster-4.30.43-1.el7.x86_64
And for completeness, here is the whole set of parameters for the volume,
in case there are further performance-related suggestions for a single-node
environment:
[root@ovirt tmp]# gluster volume get vmstore all
Option Value
------ -----
cluster.lookup-unhashed on
cluster.lookup-optimize on
cluster.min-free-disk 10%
cluster.min-free-inodes 5%
cluster.rebalance-stats off
cluster.subvols-per-directory (null)
cluster.readdir-optimize off
cluster.rsync-hash-regex (null)
cluster.extra-hash-regex (null)
cluster.dht-xattr-name trusted.glusterfs.dht
cluster.randomize-hash-range-by-gfid off
cluster.rebal-throttle normal
cluster.lock-migration off
cluster.force-migration off
cluster.local-volume-name (null)
cluster.weighted-rebalance on
cluster.switch-pattern (null)
cluster.entry-change-log on
cluster.read-subvolume (null)
cluster.read-subvolume-index -1
cluster.read-hash-mode 1
cluster.background-self-heal-count 8
cluster.metadata-self-heal off
cluster.data-self-heal off
cluster.entry-self-heal off
cluster.self-heal-daemon on
cluster.heal-timeout 600
cluster.self-heal-window-size 1
cluster.data-change-log on
cluster.metadata-change-log on
cluster.data-self-heal-algorithm (null)
cluster.eager-lock enable
disperse.eager-lock on
disperse.other-eager-lock on
disperse.eager-lock-timeout 1
disperse.other-eager-lock-timeout 1
cluster.quorum-type none
cluster.quorum-count (null)
cluster.choose-local true
cluster.self-heal-readdir-size 1KB
cluster.post-op-delay-secs 1
cluster.ensure-durability on
cluster.consistent-metadata no
cluster.heal-wait-queue-length 128
cluster.favorite-child-policy none
cluster.full-lock yes
diagnostics.latency-measurement off
diagnostics.dump-fd-stats off
diagnostics.count-fop-hits off
diagnostics.brick-log-level INFO
diagnostics.client-log-level INFO
diagnostics.brick-sys-log-level CRITICAL
diagnostics.client-sys-log-level CRITICAL
diagnostics.brick-logger (null)
diagnostics.client-logger (null)
diagnostics.brick-log-format (null)
diagnostics.client-log-format (null)
diagnostics.brick-log-buf-size 5
diagnostics.client-log-buf-size 5
diagnostics.brick-log-flush-timeout 120
diagnostics.client-log-flush-timeout 120
diagnostics.stats-dump-interval 0
diagnostics.fop-sample-interval 0
diagnostics.stats-dump-format json
diagnostics.fop-sample-buf-size 65535
diagnostics.stats-dnscache-ttl-sec 86400
performance.cache-max-file-size 0
performance.cache-min-file-size 0
performance.cache-refresh-timeout 1
performance.cache-priority
performance.cache-size 32MB
performance.io-thread-count 16
performance.high-prio-threads 16
performance.normal-prio-threads 16
performance.low-prio-threads 32
performance.least-prio-threads 1
performance.enable-least-priority on
performance.iot-watchdog-secs (null)
performance.iot-cleanup-disconnected-reqs off
performance.iot-pass-through false
performance.io-cache-pass-through false
performance.cache-size 128MB
performance.qr-cache-timeout 1
performance.cache-invalidation false
performance.ctime-invalidation false
performance.flush-behind on
performance.nfs.flush-behind on
performance.write-behind-window-size 1MB
performance.resync-failed-syncs-after-fsync off
performance.nfs.write-behind-window-size 1MB
performance.strict-o-direct on
performance.nfs.strict-o-direct off
performance.strict-write-ordering off
performance.nfs.strict-write-ordering off
performance.write-behind-trickling-writes on
performance.aggregate-size 128KB
performance.nfs.write-behind-trickling-writes on
performance.lazy-open yes
performance.read-after-open yes
performance.open-behind-pass-through false
performance.read-ahead-page-count 4
performance.read-ahead-pass-through false
performance.readdir-ahead-pass-through false
performance.md-cache-pass-through false
performance.md-cache-timeout 1
performance.cache-swift-metadata true
performance.cache-samba-metadata false
performance.cache-capability-xattrs true
performance.cache-ima-xattrs true
performance.md-cache-statfs off
performance.xattr-cache-list
performance.nl-cache-pass-through false
features.encryption off
network.frame-timeout 1800
network.ping-timeout 30
network.tcp-window-size (null)
client.ssl off
network.remote-dio off
client.event-threads 2
client.tcp-user-timeout 0
client.keepalive-time 20
client.keepalive-interval 2
client.keepalive-count 9
network.tcp-window-size (null)
network.inode-lru-limit 16384
auth.allow *
auth.reject (null)
transport.keepalive 1
server.allow-insecure on
server.root-squash off
server.all-squash off
server.anonuid 65534
server.anongid 65534
server.statedump-path /var/run/gluster
server.outstanding-rpc-limit 64
server.ssl off
auth.ssl-allow *
server.manage-gids off
server.dynamic-auth on
client.send-gids on
server.gid-timeout 300
server.own-thread (null)
server.event-threads 2
server.tcp-user-timeout 42
server.keepalive-time 20
server.keepalive-interval 2
server.keepalive-count 9
transport.listen-backlog 1024
transport.address-family inet
performance.write-behind on
performance.read-ahead off
performance.readdir-ahead on
performance.io-cache off
performance.open-behind on
performance.quick-read off
performance.nl-cache off
performance.stat-prefetch on
performance.client-io-threads on
performance.nfs.write-behind on
performance.nfs.read-ahead off
performance.nfs.io-cache off
performance.nfs.quick-read off
performance.nfs.stat-prefetch off
performance.nfs.io-threads off
performance.force-readdirp true
performance.cache-invalidation false
performance.global-cache-invalidation true
features.uss off
features.snapshot-directory .snaps
features.show-snapshot-directory off
features.tag-namespaces off
network.compression off
network.compression.window-size -15
network.compression.mem-level 8
network.compression.min-size 0
network.compression.compression-level -1
network.compression.debug false
features.default-soft-limit 80%
features.soft-timeout 60
features.hard-timeout 5
features.alert-time 86400
features.quota-deem-statfs off
geo-replication.indexing off
geo-replication.indexing off
geo-replication.ignore-pid-check off
geo-replication.ignore-pid-check off
features.quota off
features.inode-quota off
features.bitrot disable
debug.trace off
debug.log-history no
debug.log-file no
debug.exclude-ops (null)
debug.include-ops (null)
debug.error-gen off
debug.error-failure (null)
debug.error-number (null)
debug.random-failure off
debug.error-fops (null)
nfs.disable on
features.read-only off
features.worm off
features.worm-file-level off
features.worm-files-deletable on
features.default-retention-period 120
features.retention-mode relax
features.auto-commit-period 180
storage.linux-aio off
storage.batch-fsync-mode reverse-fsync
storage.batch-fsync-delay-usec 0
storage.owner-uid 36
storage.owner-gid 36
storage.node-uuid-pathinfo off
storage.health-check-interval 30
storage.build-pgfid off
storage.gfid2path on
storage.gfid2path-separator :
storage.reserve 1
storage.health-check-timeout 10
storage.fips-mode-rchecksum off
storage.force-create-mode 0000
storage.force-directory-mode 0000
storage.create-mask 0777
storage.create-directory-mask 0777
storage.max-hardlinks 100
features.ctime on
config.gfproxyd off
cluster.server-quorum-type off
cluster.server-quorum-ratio 0
changelog.changelog off
changelog.changelog-dir {{ brick.path }}/.glusterfs/changelogs
changelog.encoding ascii
changelog.rollover-time 15
changelog.fsync-interval 5
changelog.changelog-barrier-timeout 120
changelog.capture-del-path off
features.barrier disable
features.barrier-timeout 120
features.trash off
features.trash-dir .trashcan
features.trash-eliminate-path (null)
features.trash-max-filesize 5MB
features.trash-internal-op off
cluster.enable-shared-storage disable
locks.trace off
locks.mandatory-locking off
cluster.disperse-self-heal-daemon enable
cluster.quorum-reads no
client.bind-insecure (null)
features.shard on
features.shard-block-size 64MB
features.shard-lru-limit 16384
features.shard-deletion-rate 100
features.scrub-throttle lazy
features.scrub-freq biweekly
features.scrub false
features.expiry-time 120
features.cache-invalidation off
features.cache-invalidation-timeout 60
features.leases off
features.lease-lock-recall-timeout 60
disperse.background-heals 8
disperse.heal-wait-qlength 128
cluster.heal-timeout 600
dht.force-readdirp on
disperse.read-policy gfid-hash
cluster.shd-max-threads 1
cluster.shd-wait-qlength 1024
cluster.locking-scheme full
cluster.granular-entry-heal no
features.locks-revocation-secs 0
features.locks-revocation-clear-all false
features.locks-revocation-max-blocked 0
features.locks-monkey-unlocking false
features.locks-notify-contention no
features.locks-notify-contention-delay 5
disperse.shd-max-threads 1
disperse.shd-wait-qlength 1024
disperse.cpu-extensions auto
disperse.self-heal-window-size 1
cluster.use-compound-fops off
performance.parallel-readdir off
performance.rda-request-size 131072
performance.rda-low-wmark 4096
performance.rda-high-wmark 128KB
performance.rda-cache-limit 10MB
performance.nl-cache-positive-entry false
performance.nl-cache-limit 10MB
performance.nl-cache-timeout 60
cluster.brick-multiplex off
cluster.max-bricks-per-process 250
disperse.optimistic-change-log on
disperse.stripe-cache 4
cluster.halo-enabled False
cluster.halo-shd-max-latency 99999
cluster.halo-nfsd-max-latency 5
cluster.halo-max-latency 5
cluster.halo-max-replicas 99999
cluster.halo-min-replicas 2
features.selinux on
cluster.daemon-log-level INFO
debug.delay-gen off
delay-gen.delay-percentage 10%
delay-gen.delay-duration 100000
delay-gen.enable
disperse.parallel-writes on
features.sdfs off
features.cloudsync off
features.ctime on
ctime.noatime on
feature.cloudsync-storetype (null)
features.enforce-mandatory-lock off
[root@ovirt tmp]#
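In case it is easier to review just the deltas, this is roughly the loop I
would use to compare the virt group file (key=value format) against the
live values, one gluster volume get per key (a sketch, untested):

[root@ovirt tmp]# while IFS='=' read -r k v; do
>   cur=$(gluster volume get vmstore "$k" | awk -v k="$k" '$1==k {print $2}')
>   printf '%-45s group=%-10s current=%s\n' "$k" "$v" "$cur"
> done < /var/lib/glusterd/groups/virt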
Gianluca