
On Sat, Mar 28, 2020 at 2:21 AM Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
Hello, having deployed an oVirt 4.3.9 single-host HCI setup with Gluster, I sometimes see a VM go into paused state with the error above, requiring a manual resume (an operation that itself sometimes fails). So far it has only happened with an empty thin-provisioned disk under sudden high I/O during the initial phase of an OS install; it has not happened during normal operation (even at 600 MB/s of throughput). I suspect something related to metadata extension not being able to keep pace with the growth of the physical disk, similar to what happens on block-based storage domains, where the LVM layer has to extend the logical volume backing the virtual disk.
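If that is the cause, the allocated size of the image file should visibly lag behind the guest's writes during the burst. This is a minimal way I would watch it while the install runs (the path is hypothetical; the image lives somewhere under the vmstore fuse mount):

IMG=/rhev/data-center/mnt/glusterSD/ovirtst.mydomain.storage:_vmstore/<sd-uuid>/images/<disk-uuid>/<volume-uuid>
# apparent (virtual) size vs blocks actually allocated for the thin image
watch -n1 "ls -lh $IMG; du -h $IMG"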
My real-world reproduction of the error is during the install of an OCP 4.3.8 master node: Red Hat CoreOS boots from the network, wipes the disk, and then (I think) transfers an image, so there is high I/O right away. The VM used as master node was created with a 120 GB thin-provisioned disk (virtio-scsi) and goes through the PXE install with the disk just initialized and empty. I get this line in the events for the VM:
Mar 27, 2020, 12:35:23 AM VM master01 has been paused due to unknown storage error.
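For what it's worth, the burst can probably be reproduced without a full OCP install. Something like this from a scratch VM booted off a live ISO should generate a similar write pattern against a fresh thin disk (a sketch; the device name is an assumption):

# assuming the empty 120 GB thin-provisioned disk shows up as /dev/sda in
# the guest: sequential direct writes, similar to the image-copy phase of
# the CoreOS installer
dd if=/dev/zero of=/dev/sda bs=4M count=2048 oflag=direct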
Here are the logs around the time frame above:
- engine.log
https://drive.google.com/file/d/1zpNo5IgFVTAlKXHiAMTL-uvaoXSNMVRO/view?usp=s...
- vdsm.log
https://drive.google.com/file/d/1v8kR0N6PdHBJ5hYzEYKl4-m7v1Lb_cYX/view?usp=s...
Any suggestions?
The disk of the VM is on the vmstore storage domain; its gluster volume settings are:
[root@ovirt tmp]# gluster volume info vmstore
Volume Name: vmstore
Type: Distribute
Volume ID: a6203d77-3b9d-49f9-94c5-9e30562959c4
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: ovirtst.mydomain.storage:/gluster_bricks/vmstore/vmstore
Options Reconfigured:
performance.low-prio-threads: 32
storage.owner-gid: 36
performance.read-ahead: off
user.cifs: off
storage.owner-uid: 36
performance.io-cache: off
performance.quick-read: off
network.ping-timeout: 30
features.shard: on
network.remote-dio: off
cluster.eager-lock: enable
performance.strict-o-direct: on
transport.address-family: inet
nfs.disable: on
[root@ovirt tmp]#
What about the configuration above: are there possible optimizations to apply, given the single-host setup? And how does it compare with the virt group of options?

[root@ovirt tmp]# cat /var/lib/glusterd/groups/virt
performance.quick-read=off
performance.read-ahead=off
performance.io-cache=off
performance.low-prio-threads=32
network.remote-dio=enable
cluster.eager-lock=enable
cluster.quorum-type=auto
cluster.server-quorum-type=server
cluster.data-self-heal-algorithm=full
cluster.locking-scheme=granular
cluster.shd-max-threads=8
cluster.shd-wait-qlength=10000
features.shard=on
user.cifs=off
cluster.choose-local=off
client.event-threads=4
server.event-threads=4
performance.client-io-threads=on
[root@ovirt tmp]#
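To make the comparison concrete, this is roughly the loop I would use to list which virt-group settings differ from the live volume (a sketch; note that "gluster volume set vmstore group virt" would apply the whole group file in one shot, which as far as I know is what the engine's "Optimize for Virt Store" action does):

# print every option where the live vmstore value differs from the virt group file
while IFS='=' read -r opt val; do
  cur=$(gluster volume get vmstore "$opt" | awk 'NR>2 {print $2}')
  [ "$cur" != "$val" ] && echo "$opt: current=$cur virt-group=$val"
done < /var/lib/glusterd/groups/virt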
Thanks,
Gianluca
Further information. This is what I see around the time frame in the gluster brick log file gluster_bricks-vmstore-vmstore.log (the timestamps in the log file are 1 hour behind):

[2020-03-27 23:30:38.575808] I [MSGID: 101055] [client_t.c:436:gf_client_unref] 0-vmstore-server: Shutting down connection CTX_ID:6e8f70b8-1946-4505-860f-be90e5807cb3-GRAPH_ID:0-PID:223418-HOST:ovirt.mydomain.local-PC_NAME:vmstore-client-0-RECON_NO:-0
[2020-03-27 23:35:15.281449] E [MSGID: 113072] [posix-inode-fd-ops.c:1886:posix_writev] 0-vmstore-posix: write failed: offset 0, [Invalid argument]
[2020-03-27 23:35:15.281545] E [MSGID: 115067] [server-rpc-fops_v2.c:1373:server4_writev_cbk] 0-vmstore-server: 34139378: WRITEV 10 (00d9fe81-8a31-498e-8401-7b9d1477378e), client: CTX_ID:d04437ba-ef98-43df-864f-5e9d3738620a-GRAPH_ID:0-PID:27687-HOST:ovirt.mydomain.local-PC_NAME:vmstore-client-0-RECON_NO:-0, error-xlator: vmstore-posix [Invalid argument]
[2020-03-27 23:40:15.415794] E [MSGID: 113072] [posix-inode-fd-ops.c:1886:posix_writev] 0-vmstore-posix: write failed: offset 0, [Invalid argument]

My Gluster components' versions:

gluster-ansible-cluster-1.0.0-1.el7.noarch
gluster-ansible-features-1.0.5-3.el7.noarch
gluster-ansible-infra-1.0.4-3.el7.noarch
gluster-ansible-maintenance-1.0.1-1.el7.noarch
gluster-ansible-repositories-1.0.1-1.el7.noarch
gluster-ansible-roles-1.0.5-7.el7.noarch
glusterfs-6.8-1.el7.x86_64
glusterfs-api-6.8-1.el7.x86_64
glusterfs-cli-6.8-1.el7.x86_64
glusterfs-client-xlators-6.8-1.el7.x86_64
glusterfs-events-6.8-1.el7.x86_64
glusterfs-fuse-6.8-1.el7.x86_64
glusterfs-geo-replication-6.8-1.el7.x86_64
glusterfs-libs-6.8-1.el7.x86_64
glusterfs-rdma-6.8-1.el7.x86_64
glusterfs-server-6.8-1.el7.x86_64
libvirt-daemon-driver-storage-gluster-4.5.0-23.el7_7.6.x86_64
python2-gluster-6.8-1.el7.x86_64
vdsm-gluster-4.30.43-1.el7.x86_64

And for completeness, here is the whole set of parameters for the volume, in case there are further performance-related suggestions for a single-node environment:

[root@ovirt tmp]# gluster volume get vmstore all
Option  Value
------  -----
cluster.lookup-unhashed  on
cluster.lookup-optimize  on
cluster.min-free-disk  10%
cluster.min-free-inodes  5%
cluster.rebalance-stats  off
cluster.subvols-per-directory  (null)
cluster.readdir-optimize  off
cluster.rsync-hash-regex  (null)
cluster.extra-hash-regex  (null)
cluster.dht-xattr-name  trusted.glusterfs.dht
cluster.randomize-hash-range-by-gfid  off
cluster.rebal-throttle  normal
cluster.lock-migration  off
cluster.force-migration  off
cluster.local-volume-name  (null)
cluster.weighted-rebalance  on
cluster.switch-pattern  (null)
cluster.entry-change-log  on
cluster.read-subvolume  (null)
cluster.read-subvolume-index  -1
cluster.read-hash-mode  1
cluster.background-self-heal-count  8
cluster.metadata-self-heal  off
cluster.data-self-heal  off
cluster.entry-self-heal  off
cluster.self-heal-daemon  on
cluster.heal-timeout  600
cluster.self-heal-window-size  1
cluster.data-change-log  on
cluster.metadata-change-log  on
cluster.data-self-heal-algorithm  (null)
cluster.eager-lock  enable
disperse.eager-lock  on
disperse.other-eager-lock  on
disperse.eager-lock-timeout  1
disperse.other-eager-lock-timeout  1
cluster.quorum-type  none
cluster.quorum-count  (null)
cluster.choose-local  true
cluster.self-heal-readdir-size  1KB
cluster.post-op-delay-secs  1
cluster.ensure-durability  on
cluster.consistent-metadata  no
cluster.heal-wait-queue-length  128
cluster.favorite-child-policy  none
cluster.full-lock  yes
diagnostics.latency-measurement  off
diagnostics.dump-fd-stats  off
diagnostics.count-fop-hits  off
diagnostics.brick-log-level  INFO
diagnostics.client-log-level  INFO
diagnostics.brick-sys-log-level  CRITICAL
diagnostics.client-sys-log-level  CRITICAL
diagnostics.brick-logger  (null)
diagnostics.client-logger  (null)
diagnostics.brick-log-format  (null)
diagnostics.client-log-format  (null)
diagnostics.brick-log-buf-size  5
diagnostics.client-log-buf-size  5
diagnostics.brick-log-flush-timeout  120
diagnostics.client-log-flush-timeout  120
diagnostics.stats-dump-interval  0
diagnostics.fop-sample-interval  0
diagnostics.stats-dump-format  json
diagnostics.fop-sample-buf-size  65535
diagnostics.stats-dnscache-ttl-sec  86400
performance.cache-max-file-size  0
performance.cache-min-file-size  0
performance.cache-refresh-timeout  1
performance.cache-priority
performance.cache-size  32MB
performance.io-thread-count  16
performance.high-prio-threads  16
performance.normal-prio-threads  16
performance.low-prio-threads  32
performance.least-prio-threads  1
performance.enable-least-priority  on
performance.iot-watchdog-secs  (null)
performance.iot-cleanup-disconnected-reqs  off
performance.iot-pass-through  false
performance.io-cache-pass-through  false
performance.cache-size  128MB
performance.qr-cache-timeout  1
performance.cache-invalidation  false
performance.ctime-invalidation  false
performance.flush-behind  on
performance.nfs.flush-behind  on
performance.write-behind-window-size  1MB
performance.resync-failed-syncs-after-fsync  off
performance.nfs.write-behind-window-size  1MB
performance.strict-o-direct  on
performance.nfs.strict-o-direct  off
performance.strict-write-ordering  off
performance.nfs.strict-write-ordering  off
performance.write-behind-trickling-writes  on
performance.aggregate-size  128KB
performance.nfs.write-behind-trickling-writes  on
performance.lazy-open  yes
performance.read-after-open  yes
performance.open-behind-pass-through  false
performance.read-ahead-page-count  4
performance.read-ahead-pass-through  false
performance.readdir-ahead-pass-through  false
performance.md-cache-pass-through  false
performance.md-cache-timeout  1
performance.cache-swift-metadata  true
performance.cache-samba-metadata  false
performance.cache-capability-xattrs  true
performance.cache-ima-xattrs  true
performance.md-cache-statfs  off
performance.xattr-cache-list
performance.nl-cache-pass-through  false
features.encryption  off
network.frame-timeout  1800
network.ping-timeout  30
network.tcp-window-size  (null)
client.ssl  off
network.remote-dio  off
client.event-threads  2
client.tcp-user-timeout  0
client.keepalive-time  20
client.keepalive-interval  2
client.keepalive-count  9
network.tcp-window-size  (null)
network.inode-lru-limit  16384
auth.allow  *
auth.reject  (null)
transport.keepalive  1
server.allow-insecure  on
server.root-squash  off
server.all-squash  off
server.anonuid  65534
server.anongid  65534
server.statedump-path  /var/run/gluster
server.outstanding-rpc-limit  64
server.ssl  off
auth.ssl-allow  *
server.manage-gids  off
server.dynamic-auth  on
client.send-gids  on
server.gid-timeout  300
server.own-thread  (null)
server.event-threads  2
server.tcp-user-timeout  42
server.keepalive-time  20
server.keepalive-interval  2
server.keepalive-count  9
transport.listen-backlog  1024
transport.address-family  inet
performance.write-behind  on
performance.read-ahead  off
performance.readdir-ahead  on
performance.io-cache  off
performance.open-behind  on
performance.quick-read  off
performance.nl-cache  off
performance.stat-prefetch  on
performance.client-io-threads  on
performance.nfs.write-behind  on
performance.nfs.read-ahead  off
performance.nfs.io-cache  off
performance.nfs.quick-read  off
performance.nfs.stat-prefetch  off
performance.nfs.io-threads  off
performance.force-readdirp  true
performance.cache-invalidation  false
performance.global-cache-invalidation  true
features.uss  off
features.snapshot-directory  .snaps
features.show-snapshot-directory  off
features.tag-namespaces  off
network.compression  off
network.compression.window-size  -15
network.compression.mem-level  8
network.compression.min-size  0
network.compression.compression-level  -1
network.compression.debug  false
features.default-soft-limit  80%
features.soft-timeout  60
features.hard-timeout  5
features.alert-time  86400
features.quota-deem-statfs  off
geo-replication.indexing  off
geo-replication.indexing  off
geo-replication.ignore-pid-check  off
geo-replication.ignore-pid-check  off
features.quota  off
features.inode-quota  off
features.bitrot  disable
debug.trace  off
debug.log-history  no
debug.log-file  no
debug.exclude-ops  (null)
debug.include-ops  (null)
debug.error-gen  off
debug.error-failure  (null)
debug.error-number  (null)
debug.random-failure  off
debug.error-fops  (null)
nfs.disable  on
features.read-only  off
features.worm  off
features.worm-file-level  off
features.worm-files-deletable  on
features.default-retention-period  120
features.retention-mode  relax
features.auto-commit-period  180
storage.linux-aio  off
storage.batch-fsync-mode  reverse-fsync
storage.batch-fsync-delay-usec  0
storage.owner-uid  36
storage.owner-gid  36
storage.node-uuid-pathinfo  off
storage.health-check-interval  30
storage.build-pgfid  off
storage.gfid2path  on
storage.gfid2path-separator  :
storage.reserve  1
storage.health-check-timeout  10
storage.fips-mode-rchecksum  off
storage.force-create-mode  0000
storage.force-directory-mode  0000
storage.create-mask  0777
storage.create-directory-mask  0777
storage.max-hardlinks  100
features.ctime  on
config.gfproxyd  off
cluster.server-quorum-type  off
cluster.server-quorum-ratio  0
changelog.changelog  off
changelog.changelog-dir  {{ brick.path }}/.glusterfs/changelogs
changelog.encoding  ascii
changelog.rollover-time  15
changelog.fsync-interval  5
changelog.changelog-barrier-timeout  120
changelog.capture-del-path  off
features.barrier  disable
features.barrier-timeout  120
features.trash  off
features.trash-dir  .trashcan
features.trash-eliminate-path  (null)
features.trash-max-filesize  5MB
features.trash-internal-op  off
cluster.enable-shared-storage  disable
locks.trace  off
locks.mandatory-locking  off
cluster.disperse-self-heal-daemon  enable
cluster.quorum-reads  no
client.bind-insecure  (null)
features.shard  on
features.shard-block-size  64MB
features.shard-lru-limit  16384
features.shard-deletion-rate  100
features.scrub-throttle  lazy
features.scrub-freq  biweekly
features.scrub  false
features.expiry-time  120
features.cache-invalidation  off
features.cache-invalidation-timeout  60
features.leases  off
features.lease-lock-recall-timeout  60
disperse.background-heals  8
disperse.heal-wait-qlength  128
cluster.heal-timeout  600
dht.force-readdirp  on
disperse.read-policy  gfid-hash
cluster.shd-max-threads  1
cluster.shd-wait-qlength  1024
cluster.locking-scheme  full
cluster.granular-entry-heal  no
features.locks-revocation-secs  0
features.locks-revocation-clear-all  false
features.locks-revocation-max-blocked  0
features.locks-monkey-unlocking  false
features.locks-notify-contention  no
features.locks-notify-contention-delay  5
disperse.shd-max-threads  1
disperse.shd-wait-qlength  1024
disperse.cpu-extensions  auto
disperse.self-heal-window-size  1
cluster.use-compound-fops  off
performance.parallel-readdir  off
performance.rda-request-size  131072
performance.rda-low-wmark  4096
performance.rda-high-wmark  128KB
performance.rda-cache-limit  10MB
performance.nl-cache-positive-entry  false
performance.nl-cache-limit  10MB
performance.nl-cache-timeout  60
cluster.brick-multiplex  off
cluster.max-bricks-per-process  250
disperse.optimistic-change-log  on
disperse.stripe-cache  4
cluster.halo-enabled  False
cluster.halo-shd-max-latency  99999
cluster.halo-nfsd-max-latency  5
cluster.halo-max-latency  5
cluster.halo-max-replicas  99999
cluster.halo-min-replicas  2
features.selinux  on
cluster.daemon-log-level  INFO
debug.delay-gen  off
delay-gen.delay-percentage  10%
delay-gen.delay-duration  100000
delay-gen.enable
disperse.parallel-writes  on
features.sdfs  off
features.cloudsync  off
features.ctime  on
ctime.noatime  on
feature.cloudsync-storetype  (null)
features.enforce-mandatory-lock  off
[root@ovirt tmp]#

Gianluca
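P.S.: given performance.strict-o-direct on and network.remote-dio off in the settings above, the posix_writev failure with [Invalid argument] looks to me like an O_DIRECT alignment problem. If that is right, it should be reproducible from the fuse mount with a direct write whose size is not a multiple of the brick's sector size; a quick check I would try (the target path is hypothetical):

# aligned direct write, expected to succeed
dd if=/dev/zero of=/rhev/data-center/mnt/glusterSD/ovirtst.mydomain.storage:_vmstore/ddtest.bin bs=4096 count=16 oflag=direct
# unaligned direct write (bs not a multiple of the sector size), expected to
# fail with "Invalid argument" like the writes in the brick log
dd if=/dev/zero of=/rhev/data-center/mnt/glusterSD/ovirtst.mydomain.storage:_vmstore/ddtest.bin bs=1000 count=16 oflag=direct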