
Description of problem: intermittent VM pauses and qcow2 image corruption after adding new bricks.

I hit image corruption on oVirt 4.3 caused by the default gluster oVirt profile, along with intermittent VM pauses. The problem is similar to glusterfs issues #2246 and #2254 and to the VM pause reports in the oVirt users group. The gluster volume had no pending heal objects, the volume appeared to be in good shape, XFS was healthy, and there was no hardware issue. Sadly, a few VMs ended up with mysterious corruption after the new bricks were added.

Afterwards, I tried to reproduce the problem with and without "cluster.lookup-optimize off" a few times, but it is not 100% reproducible with lookup-optimize on; I managed to reproduce it in 1 of 3 attempts. It really depends on the workload, the cache status at that moment, and the number of objects involved in the rebalance. I also tried disabling the sharding feature entirely: the volume ran very solidly, write performance increased by far, and there was no corruption and no VM pause while gluster was under stress.

So here is the decision question: to shard or not to shard. The recommendation documents say sharding breaks large files into smaller chunks, which lets healing complete faster and allows a large file to be spread over multiple bricks. But given the issues uncovered here compared to keeping the full large file, I'd like to dig deeper into why sharding is recommended as the default for oVirt. From a reliability and performance perspective, sharding seems to lose out for oVirt/KVM workloads. Would it be more appropriate to simply tell oVirt users to make sure each underlying brick is large enough to hold the largest image instead? Also, is there anything I have overlooked in the shard settings? After this disaster I really hesitate to enable sharding on the volume again.
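For reference, these are the two volume options in play; commands along these lines should show and toggle them (the volume name "data" below is just a placeholder for your own volume, and turning sharding off is only safe on a volume that does not yet contain sharded files):

  # show current values of the two options discussed here
  gluster volume get data cluster.lookup-optimize
  gluster volume get data features.shard

  # what I was toggling between test runs
  gluster volume set data cluster.lookup-optimize off
  gluster volume set data features.shard off   # never do this on a volume that already holds sharded images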

I would still recommend sharding. Imagine that you have a 2 TB disk for a VM and one of the oVirt hosts needs maintenance. When gluster has to heal that 2 TB file, your VM won't be able to access the file for a very long time and will fail. Sharding is important for no-downtime maintenance.

Yet I have one question: did you use preallocated VM disks, or thin provisioning for the qcow2?

Best Regards,
Strahil Nikolov

Hi Strahil,

I'm sorry to say I don't entirely agree with the VM pause behaviour you describe during healing. The VMs did not actually pause, but healing was very slow indeed, about 1 GB per minute, and VM synchronous write operations slowed down while healing was in progress; it was comparable with sharding enabled. The chance of a pause is higher when you run parallel "heal info" queries at the same time, and the pause situation is amplified when you have millions of shards. Once I turned off the periodic heal info in oVirt, the pauses were almost gone.

Levin
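P.S. If anyone wants to keep an eye on pending heals without paying for a full "heal info" walk over millions of shards, newer Gluster releases have lighter-weight queries; something like the following should work ("data" is a placeholder volume name, and availability depends on your Gluster version):

  # per-brick count of entries still needing heal, without listing every shard
  gluster volume heal data statistics heal-count

  # condensed summary instead of enumerating each pending entry
  gluster volume heal data info summary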

Also, keep in mind that RHHI uses shards of 512 MB, which reduces the shard count. I'm glad that there are no stalls in the newer versions of Gluster. Gluster v9 also brings changes to the healing mechanisms, so we will see.

Anyway, what makes you think that sharding was your problem?

Best Regards,
Strahil Nikolov
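P.S. For reference, the shard size is a per-volume option; a sketch like the one below would check and raise it ("data" is a placeholder volume name, and a new block size only applies to files created after the change, not to existing images):

  # inspect and raise the shard size; existing sharded files keep their old block size
  gluster volume get data features.shard-block-size
  gluster volume set data features.shard-block-size 512MB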

It is because of a serious bug involving cluster.lookup-optimize, which corrupted a few of my VM images after a new brick was added. In theory cluster.lookup-optimize affects all files, not just shards, yet after many rounds of verification tests the corruption does not happen when sharding is disabled. That is why I'm interested in why sharding is considered essential in the oVirt defaults.
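For anyone checking for the same kind of damage, something like the following should spot qcow2-level corruption in an image after a rebalance; the path components are placeholders, the exact mount layout may differ on your setup, and it is best run through the fuse mount while the VM is powered off:

  # report qcow2 metadata errors without modifying the image
  qemu-img check /rhev/data-center/mnt/glusterSD/server:_data/<sd-uuid>/images/<disk-uuid>/<volume-uuid>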

A quote from https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.5/ht... :

"Sharding has one supported use case: in the context of providing Red Hat Gluster Storage as a storage domain for Red Hat Enterprise Virtualization, to provide storage for live virtual machine images. Note that sharding is also a requirement for this use case, as it provides significant performance improvements over previous implementations."

Also, FUSE will be able to read multiple shards from multiple bricks, so the load should be properly spread among the bricks and performance should be optimal.

I also don't see that option in https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.5/ht... . How did this option get onto your volume? Was the volume created by oVirt or manually?

Best Regards,
Strahil Nikolov

Good day,

The distributed volume was created manually. Currently I'm thinking of creating a replica across the two new servers, where one server temporarily holds two bricks and gets replaced later, and then recreating that server's two bricks as a single brick. I also found the image location at /gluster_bricks/data/data/19cdda62-da1c-4821-9e27-2b2585ededff/images, but I'm not sure how to transfer it to a new instance of the engine.

For maximum safety, create a new gluster layout and a new storage domain, and slowly migrate the VMs into the new domain. If you go with any other workaround, you should test it very carefully beforehand.
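As a rough illustration of the new-layout route, the sketch below creates a fresh replica 3 volume to back a new storage domain; the host names, brick paths and volume name are placeholders, and you would still import it through the oVirt UI (or apply the virt option group) so it gets the expected virtualization settings:

  # create and start a fresh replica 3 volume for the new storage domain
  gluster volume create newdata replica 3 \
      host1:/gluster_bricks/newdata/newdata \
      host2:/gluster_bricks/newdata/newdata \
      host3:/gluster_bricks/newdata/newdata
  gluster volume start newdata

  # optionally apply the stock virtualization option group before importing into oVirt
  gluster volume set newdata group virt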

The problem with sparse qcow2 images is that the Gluster shard xlator might not cope with the random I/O nature of the workload, as it has to create a lot of shards in a short period of time (at the 64 MB shard size) for small I/Os; for example, 50 x 512-byte I/O requests could cause 50 shards to be created simultaneously. With VDO enabled, preallocated disk images take only a fraction of the space, yet the qcow2 metadata and the gluster metadata (the shard files) already exist, so that problem should not occur at all.

Can you try to reproduce the bug with Gluster v9.1? If it still exists, let's open a separate thread on the gluster mailing list.

Best Regards,
Strahil Nikolov
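P.S. To illustrate the sparse vs. preallocated distinction outside of oVirt, a qemu-img sketch like the one below shows the two flavours; the file names and size are made up, and in practice oVirt creates the images itself when you choose "Preallocated" in the disk dialog:

  # sparse qcow2: clusters are allocated on demand, so random guest writes keep triggering new shard creation
  qemu-img create -f qcow2 sparse-disk.qcow2 100G

  # preallocated qcow2: space is reserved up front, so the shard files exist before the guest starts writing
  qemu-img create -f qcow2 -o preallocation=falloc prealloc-disk.qcow2 100G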

Sorry, I replied to the wrong thread 😳

Hi Strahil,

cluster.lookup-optimize has been on by default since, I think, Gluster 6, which corresponds to oVirt 4.3, so oVirt inherits this setting regardless of the oVirt preset. My volumes were provisioned through the oVirt UI.

Yes, in theory shards improve read performance, but writes don't benefit; they slow down in general. In my lab environment (4 x 1.8 TB SAS in RAID 10, replica 2 + 1 arbiter, JBOD, 10GbE) I get 400 MB/s read but only 70 MB/s sequential write with sharding under a 1M block / queue depth 4, 70% read / 30% write workload, versus 380 MB/s read and 120 MB/s write without shards. Under an 8k block / queue depth 16, 70% read / 30% write workload I get double the performance when not using shards. That makes me re-think theoretical versus real-world scenarios.

As for the Red Hat quote about the sharding use case: yes, I've verified that without sharding you lose live storage migration flexibility. Thanks for pointing that out.
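The exact benchmark tool isn't the point, but for reference a fio job along these lines would approximate the 1M4Q 70R/30W mix described above (the file path, size and runtime are placeholders, and whether the original runs were sequential or random mixes isn't spelled out; swap in --bs=8k --iodepth=16 for the second workload):

  # mixed sequential read/write, 70% reads, 1 MiB blocks, queue depth 4
  fio --name=mixed-1m-qd4 --filename=/mnt/gluster-test/fio-testfile \
      --ioengine=libaio --direct=1 --rw=rw --rwmixread=70 \
      --bs=1M --iodepth=4 --size=10G --runtime=120 --time_based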

Hm... were those tests done with sharding plus full disk preallocation? If yes, this is quite interesting.

Storage migration should still be possible, as oVirt creates a snapshot, then migrates the disks and consolidates them at the new storage location.

Best Regards,
Strahil Nikolov

Right, I re-tested and the shard setting does not interfere with the migration; my previous test failure was caused by the root file-privilege reset bug in 4.3. All my tests use sparse qcow files. I'm afraid I won't go into a preallocated-file comparison because it is not practical for our usage; preallocation is good for performance, but we won't use it anyway.
participants (4)
- Ernest Clyde Chua
- levin@mydream.com.hk
- levindecaro@gmail.com
- Strahil Nikolov