On Sat, Mar 7, 2020 at 2:52 AM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:

On March 7, 2020 1:09:37 AM GMT+02:00, Jayme <jaymef@gmail.com> wrote:
>Strahil,
>
>Thanks for your suggestions. The config is pretty standard HCI setup
>with
>cockpit and hosts are oVirt node. XFS was handled by the deployment
>automatically. The gluster volumes were optimized for virt store.
>
>I tried noop on the SSDs, that made zero difference in the tests I was
>running above. I took a look at the random-io-profile and it looks like
>it
>really only sets vm.dirty_background_ratio = 2 & vm.dirty_ratio = 5 --
>my
>hosts already appear to have those sysctl values, and by default are
>using virtual-host tuned profile.
>
>I'm curious what a test like "dd if=/dev/zero of=test2.img bs=512
>count=1000 oflag=dsync" on one of your VMs would show for results?
>
>I haven't done much with gluster profiling but will take a look and see
>if
>I can make sense of it. Otherwise, the setup is pretty stock oVirt HCI
>deployment with SSD backed storage and 10Gbe storage network. I'm not
>coming anywhere close to maxing network throughput.
>
>The NFS export I was testing was an export from a local server
>exporting a
>single SSD (same type as in the oVirt hosts).
>
>I might end up switching storage to NFS and ditching gluster if
>performance
>is really this much better...
>
>
>On Fri, Mar 6, 2020 at 5:06 PM Strahil Nikolov <hunter86_bg@yahoo.com>
>wrote:
>
>> On March 6, 2020 6:02:03 PM GMT+02:00, Jayme <jaymef@gmail.com>
>wrote:
>> >I have 3 server HCI with Gluster replica 3 storage (10GBe and SSD
>> >disks).
>> >Small file performance inner-vm is pretty terrible compared to a
>> >similar
>> >spec'ed VM using NFS mount (10GBe network, SSD disk)
>> >
>> >VM with gluster storage:
>> >
>> ># dd if=/dev/zero of=test2.img bs=512 count=1000 oflag=dsync
>> >1000+0 records in
>> >1000+0 records out
>> >512000 bytes (512 kB) copied, 53.9616 s, 9.5 kB/s
>> >
>> >VM with NFS:
>> >
>> ># dd if=/dev/zero of=test2.img bs=512 count=1000 oflag=dsync
>> >1000+0 records in
>> >1000+0 records out
>> >512000 bytes (512 kB) copied, 2.20059 s, 233 kB/s
>> >
>> >This is a very big difference, 2 seconds to copy 1000 files on NFS
>VM
>> >VS 53
>> >seconds on the other.
>> >
>> >Aside from enabling libgfapi is there anything I can tune on the
>> >gluster or
>> >VM side to improve small file performance? I have seen some guides
>by
>> >Redhat in regards to small file performance but I'm not sure what/if
>> >any of
>> >it applies to oVirt's implementation of gluster in HCI.
>>
>> You can use the rhgs-random-io tuned profile from
>>
>ftp://ftp.redhat.com/redhat/linux/enterprise/7Server/en/RHS/SRPMS/redhat-storage-server-3.4.2.0-1.el7rhgs.src.rpm
>> and try with that on your hosts.
>> In my case, I have modified it so it's a mixture between
>rhgs-random-io
>> and the profile for Virtualization Host.
>>
>> Also,ensure that your bricks are using XFS with relatime/noatime
>mount
>> option and your scheduler for the SSDs is either 'noop' or 'none'
>.The
>> default I/O scheduler for RHEL7 is deadline which is giving
>preference to
>> reads and your workload is definitely 'write'.
>>
>> Ensure that the virt settings are enabled for your gluster volumes:
>> 'gluster volume set <volname> group virt'
>>
>> Also, are you running on fully allocated disks for the VM or you
>started
>> thin ?
>> I'm asking as creation of new shards at gluster level is a slow
>task.
>>
>> Have you checked gluster profiling the volume? It can clarify what
>is
>> going on.
>>
>>
>> Also are you comparing apples to apples ?
>> For example, 1 ssd mounted and exported as NFS and a replica 3
>volume
>> of the same type of ssd ? If not, the NFS can have more iops due to
>> multiple disks behind it, while Gluster has to write the same thing
>on all
>> nodes.
>>
>> Best Regards,
>> Strahil Nikolov
>>
>>

Hi Jayme,

My test are not quite good ,as I have a different setup:

NVME - VDO - 4 thin LVs -XFS - 4 Gluster volumes (replica 2 arbiter 1) - 4 storage domains - striped LV in each VM

RHEL7 VM (fully stock):
[root@node1 ~]# dd if=/dev/zero of=test2.img bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 19.8195 s, 25.8 kB/s
[root@node1 ~]#

Brick:
[root@ovirt1 data_fast]# dd if=/dev/zero of=test2.img bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 1.41192 s, 363 kB/s

As I use VDO with compression (on 1/4 of the NVMe) - I cannot expect any performance from it.

Is your app really using dsync ? I have seen many times that performance testing with the wrong tools/tests cause more trouble than it should.

I would recommend you to test with a real workload before deciding to change the architecture.

I forgot to mention that you need to disable c states for your systems if you are chasing performance.
Run a gluster profile while you run real workload in your VMs and then provide that for analysis.

Which version of Gluster are you using ?

Best Regards,
Strahil Nikolov