Hello,
having deployed oVirt 4.3.9 single host HCI with Gluster, I sometimes see a
VM going into a paused state with the error above, needing to be manually
resumed (and sometimes this resume operation fails).
So far it has only happened with an empty (thin provisioned) disk under
sudden high I/O during the initial phase of OS installation; it didn't
happen later during normal operation (even with 600 MB/s of throughput).
I suspect something related to metadata extension not being able to keep
pace with the speed at which the physical disk grows, similar to what
happens for block-based storage domains, where the LVM layer has to extend
the logical volume representing the virtual disk.
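For what it's worth, while the install runs I can watch the actual
allocation of the sparse file growing on the brick. A minimal sketch (the
UUID path components are placeholders for my environment, and my
understanding is that on file-based domains a thin provisioned disk is a raw
sparse file):
[root@ovirt tmp]# IMG=/gluster_bricks/vmstore/vmstore/<sd_uuid>/images/<img_uuid>/<vol_uuid>
[root@ovirt tmp]# # apparent (virtual) size vs blocks actually allocated, refreshed every second
[root@ovirt tmp]# watch -n1 "du -h --apparent-size $IMG; du -h $IMG"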
My real-world reproduction of the error is during the install of an OCP
4.3.8 master node, when Red Hat CoreOS boots from the network, wipes the
disk, and then (I think) transfers an image, generating high sustained I/O.
The VM used as master node was created with a 120 GB thin provisioned disk
(virtio-scsi type) and starts with the disk just initialized and empty,
going through a PXE install.
I get this line in the events for the VM:
Mar 27, 2020, 12:35:23 AM VM master01 has been paused due to unknown
storage error.
Here are the logs around the time frame above:
- engine.log
https://drive.google.com/file/d/1zpNo5IgFVTAlKXHiAMTL-uvaoXSNMVRO/view?us...
- vdsm.log
https://drive.google.com/file/d/1v8kR0N6PdHBJ5hYzEYKl4-m7v1Lb_cYX/view?us...
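In vdsm.log the pause shows up around the same timestamp; to narrow it down
I grepped along these lines (the exact "abnormal vm stop" message text is my
assumption of what vdsm logs when a VM is paused on an I/O error):
[root@ovirt tmp]# grep -iE "abnormal vm stop|pause" /var/log/vdsm/vdsm.log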
Any suggestions?
The disk of the VM is on the vmstore storage domain, and its Gluster volume
settings are:
[root@ovirt tmp]# gluster volume info vmstore
Volume Name: vmstore
Type: Distribute
Volume ID: a6203d77-3b9d-49f9-94c5-9e30562959c4
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: ovirtst.mydomain.storage:/gluster_bricks/vmstore/vmstore
Options Reconfigured:
performance.low-prio-threads: 32
storage.owner-gid: 36
performance.read-ahead: off
user.cifs: off
storage.owner-uid: 36
performance.io-cache: off
performance.quick-read: off
network.ping-timeout: 30
features.shard: on
network.remote-dio: off
cluster.eager-lock: enable
performance.strict-o-direct: on
transport.address-family: inet
nfs.disable: on
[root@ovirt tmp]#
Regarding the config above, are there any possible optimizations to be made
given that this is a single-host setup?
And how does it compare with the virt group of options?
[root@ovirt tmp]# cat /var/lib/glusterd/groups/virt
performance.quick-read=off
performance.read-ahead=off
performance.io-cache=off
performance.low-prio-threads=32
network.remote-dio=enable
cluster.eager-lock=enable
cluster.quorum-type=auto
cluster.server-quorum-type=server
cluster.data-self-heal-algorithm=full
cluster.locking-scheme=granular
cluster.shd-max-threads=8
cluster.shd-wait-qlength=10000
features.shard=on
user.cifs=off
cluster.choose-local=off
client.event-threads=4
server.event-threads=4
performance.client-io-threads=on
[root@ovirt tmp]#
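If it turns out the volume should simply follow the virt profile, my
understanding is that the whole group file can be applied in one shot
(command sketch, not yet run on this host):
[root@ovirt tmp]# gluster volume set vmstore group virt
Notably, network.remote-dio differs (off here vs enable in the group) and
performance.strict-o-direct=on is set here but absent from the group, so I'm
not sure whether applying it blindly is the right move on a single-host HCI
setup.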
Thanks Gianluca