Sharding decision for oVirt

8 May 2021

      Description of problem:

Intermittent VM pause and Qcow image corruption after add new bricks.

I'm suffered an issue on image corruption on oVirt 4.3 caused by default gluster ovirt profile, and intermittent VM pause. the problem is similar to #2246 #2254 in glusterfs issue and VM pause issue report in ovirt user group. The gluster vol did not have pending heal object, vol appear in good shape, xfs is healthy, no hardware issue. Sadly few VM have mystery corruption after new bricks added.

Afterwards, I try to simulate the problem with or without "cluster.lookup-optimize off" few time, but the problem is not 100% reproducible with lookup-optimize on, I got 1 of 3 attempt that able to reproduce it. It really depend on the workloads and cache status at that moment and the number of object after rebalance as well.

Also I tried to disable all sharding features, it ran very solid, write performance increase by far, no corruption, no VM pause when the gluster under stress.

So, here is a decision question on shard or not shard.

IMO, even recommendation document saying it break large file into smaller chunk that allow healing to complete faster, a larger file can spread over multiple bricks. But there are uncovered issue compared to full large file in this case, I'd like to further deep dive into the reason why recommend shard as default for oVirt? Especially from the reliability and performance perspective, sharding seems losing this end for ovirt/kvm workloads. Is it more appropriate to just tell ovirt user to ensure underlying single bricks shall be large enough to hold the largest chunk instead? Besides, anything i'm overlooked for the shard setting? I'm really doubt to enable sharding on the volume after disaster.

levin＠mydream.com.hk

Strahil Nikolov

levindecaro＠gmail.com

Strahil Nikolov

levindecaro＠gmail.com

Strahil Nikolov

Ernest Clyde Chua

levindecaro＠gmail.com

Strahil Nikolov

Ernest Clyde Chua

levindecaro＠gmail.com

Strahil Nikolov

levindecaro＠gmail.com

tags

participants (4)