On 2019-04-14 12:07, Alex McWhirter wrote:
> On 2019-04-13 03:15, Strahil wrote:
>> Hi,
>>
>> What is your dirty cache settings on the gluster servers ?
>>
>> Best Regards,
>> Strahil Nikolov
>>
>> On Apr 13, 2019 00:44, Alex McWhirter <alex(a)triadic.us> wrote:
>>>
>>> I have 8 machines acting as gluster servers. They each have 12 drives
>>> RAID 50'd together (three sets of four drives in RAID 5, then striped
>>> together as one).
>>>
>>> They connect to the compute hosts and to each other over LACP'd 10GbE
>>> connections split across two Cisco Nexus switches with vPC.
>>>
>>> Gluster has the following options set:
>>>
>>> performance.write-behind-window-size: 4MB
>>> performance.flush-behind: on
>>> performance.stat-prefetch: on
>>> server.event-threads: 4
>>> client.event-threads: 8
>>> performance.io-thread-count: 32
>>> network.ping-timeout: 30
>>> cluster.granular-entry-heal: enable
>>> performance.strict-o-direct: on
>>> storage.owner-gid: 36
>>> storage.owner-uid: 36
>>> features.shard: on
>>> cluster.shd-wait-qlength: 10000
>>> cluster.shd-max-threads: 8
>>> cluster.locking-scheme: granular
>>> cluster.data-self-heal-algorithm: full
>>> cluster.server-quorum-type: server
>>> cluster.quorum-type: auto
>>> cluster.eager-lock: enable
>>> network.remote-dio: off
>>> performance.low-prio-threads: 32
>>> performance.io-cache: off
>>> performance.read-ahead: off
>>> performance.quick-read: off
>>> auth.allow: *
>>> user.cifs: off
>>> transport.address-family: inet
>>> nfs.disable: off
>>> performance.client-io-threads: on
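>>>
>>> These were applied per-volume with the standard CLI, along the lines
>>> of (<volname> being a placeholder):
>>>
>>> gluster volume set <volname> performance.write-behind-window-size 4MB
>>> gluster volume get <volname> all
>>>
>>> the latter just to confirm the effective values.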
>>>
>>>
>>> I have the following sysctl values on the gluster clients and servers
>>> (using libgfapi, MTU 9000):
>>>
>>> net.core.rmem_max = 134217728
>>> net.core.wmem_max = 134217728
>>> net.ipv4.tcp_rmem = 4096 87380 134217728
>>> net.ipv4.tcp_wmem = 4096 65536 134217728
>>> net.core.netdev_max_backlog = 300000
>>> net.ipv4.tcp_moderate_rcvbuf = 1
>>> net.ipv4.tcp_no_metrics_save = 1
>>> net.ipv4.tcp_congestion_control = htcp
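>>>
>>> Persisted in a sysctl drop-in and loaded along these lines (the file
>>> name is illustrative):
>>>
>>> sysctl -p /etc/sysctl.d/99-gluster-net.conf
>>> sysctl net.ipv4.tcp_congestion_control
>>>
>>> the second command just verifies the value took.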
>>>
>>> Reads with this setup are perfect: benchmarked in a VM at about
>>> 770 MB/s sequential, with disk access times under 1 ms. Writes, on the
>>> other hand, are all over the place. They peak around 320 MB/s
>>> sequential, which is what I expect, but it seems as if there is some
>>> blocking going on.
>>>
>>> During the write test I will hit 320 MB/s briefly, then 0 MB/s as disk
>>> access times shoot to over 3000 ms, then back to 320 MB/s. It averages
>>> out to about 110 MB/s.
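>>>
>>> For reproduction, the sequential tests were of roughly this shape
>>> (illustrative; not necessarily the exact tool or flags used):
>>>
>>> fio --name=seqwrite --rw=write --bs=1M --size=4G --ioengine=libaio \
>>>     --direct=1 --filename=/path/to/testfile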
>>>
>>> Gluster version is 3.12.15; oVirt is 4.2.7.5.
>>>
>>> Any ideas on what I could tune to eliminate or minimize that
>>> blocking?
>
> Just the VDSM defaults:
>
> vm.dirty_ratio = 5
> vm.dirty_background_ratio = 2
>
> These boxes only have 8 GB of RAM as well, so those percentages work
> out to very small absolute thresholds.
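>
> Back-of-the-envelope, approximating against total RAM: 8192 MB * 0.05
> ~= 410 MB of dirty pages before writers block, and 8192 MB * 0.02 ~=
> 164 MB before background writeback kicks in.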
Doing a gluster profile, my bricks give me some odd numbers.
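
For reference, output like the below comes from the standard profiling
commands (<volname> being a placeholder):

  gluster volume profile <volname> start
  gluster volume profile <volname> info
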
%-latency     Avg-latency     Min-Latency     Max-Latency   No. of calls   Fop
---------     -----------     -----------     -----------   ------------   --------
     0.00       131.00 us       131.00 us       131.00 us              1   FSTAT
     0.01       104.50 us        77.00 us       118.00 us             14   STATFS
     0.01        95.38 us        45.00 us       130.00 us             16   STAT
     0.10       252.39 us       124.00 us       329.00 us             61   LOOKUP
     0.22        55.68 us        16.00 us       180.00 us            635   FINODELK
     0.43       543.41 us        50.00 us      1760.00 us            125   FSYNC
     1.52       573.75 us        76.00 us      5463.00 us            422   FXATTROP
    97.72      7443.50 us       184.00 us     34917.00 us           2092   WRITE

%-latency     Avg-latency     Min-Latency     Max-Latency   No. of calls   Fop
---------     -----------     -----------     -----------   ------------   --------
     0.00         0.00 us         0.00 us         0.00 us             70   FORGET
     0.00         0.00 us         0.00 us         0.00 us           1792   RELEASE
     0.00         0.00 us         0.00 us         0.00 us          23422   RELEASEDIR
     0.01       126.20 us        80.00 us       210.00 us             20   FSTAT
     0.06       102.81 us        26.00 us       162.00 us            230   STATFS
     0.06        93.51 us        18.00 us       174.00 us            261   STAT
     0.57       239.13 us       103.00 us       391.00 us            997   LOOKUP
     0.59        59.07 us        15.00 us      6554.00 us           4208   FINODELK
     1.31       506.71 us        50.00 us      2735.00 us           1077   FSYNC
     2.53       389.07 us        65.00 us      5510.00 us           2720   FXATTROP
    28.24       498.18 us       134.00 us      3513.00 us          23688   READ
    66.64      4971.59 us       184.00 us     34917.00 us           5601   WRITE

%-latency     Avg-latency     Min-Latency     Max-Latency   No. of calls   Fop
---------     -----------     -----------     -----------   ------------   --------
     0.00        92.33 us        83.00 us        97.00 us              3   FSTAT
     0.01        87.81 us        35.00 us       123.00 us             16   STAT
     0.01       101.64 us        67.00 us       133.00 us             14   STATFS
     0.11       235.67 us       149.00 us       320.00 us             51   LOOKUP
     0.17       497.46 us       170.00 us       771.00 us             35   FSYNC
     0.43       247.58 us        81.00 us       983.00 us            181   FXATTROP
     0.43        49.37 us        14.00 us       177.00 us            914   FINODELK
    98.83      5591.06 us       192.00 us     29586.00 us           1850   WRITE

%-latency     Avg-latency     Min-Latency     Max-Latency   No. of calls   Fop
---------     -----------     -----------     -----------   ------------   --------
     0.00       102.40 us        69.00 us       130.00 us              5   FSTAT
     0.00        94.50 us        32.00 us       130.00 us             14   STATFS
     0.03       231.25 us        97.00 us       332.00 us             55   LOOKUP
     0.05       985.54 us       402.00 us      1371.00 us             24   READ
     0.09       397.99 us        89.00 us      1072.00 us            113   FSYNC
     0.23       384.93 us        68.00 us      3276.00 us            286   FXATTROP
    11.66      4835.83 us       214.00 us     25386.00 us           1158   WRITE
    87.93     87398.97 us        16.00 us   1325513.00 us            483   FINODELK

%-latency     Avg-latency     Min-Latency     Max-Latency   No. of calls   Fop
---------     -----------     -----------     -----------   ------------   --------
     0.00         0.00 us         0.00 us         0.00 us             83   FORGET
     0.00         0.00 us         0.00 us         0.00 us           2103   RELEASE
     0.00         0.00 us         0.00 us         0.00 us          23419   RELEASEDIR
     0.01       114.54 us        51.00 us       175.00 us             80   FSTAT
     0.02        94.78 us        28.00 us       176.00 us            230   STATFS
     0.18       364.51 us        51.00 us      1072.00 us            531   FSYNC
     0.19       221.18 us        97.00 us       432.00 us            936   LOOKUP
     0.34       273.10 us        68.00 us      3276.00 us           1354   FXATTROP
    12.70      3875.57 us       179.00 us     29246.00 us           3534   WRITE
    12.76       560.97 us       141.00 us      4705.00 us          24547   READ
    73.80     44651.79 us        12.00 us   1984451.00 us           1783   FINODELK

%-latency     Avg-latency     Min-Latency     Max-Latency   No. of calls   Fop
---------     -----------     -----------     -----------   ------------   --------
     0.00       130.50 us       127.00 us       134.00 us              2   FSTAT
     0.02        87.12 us        36.00 us       113.00 us             16   STAT
     0.02       107.86 us        76.00 us       117.00 us             14   STATFS
     0.05       136.09 us        26.00 us       630.00 us             32   READ
     0.14       235.45 us       115.00 us       315.00 us             55   LOOKUP
     0.35        65.89 us        18.00 us      1283.00 us            477   FINODELK
     0.81       648.49 us       105.00 us      3673.00 us            113   FSYNC
     1.98       624.26 us        74.00 us      5532.00 us            286   FXATTROP
    96.63      7515.45 us       263.00 us     37343.00 us           1158   WRITE

%-latency     Avg-latency     Min-Latency     Max-Latency   No. of calls   Fop
---------     -----------     -----------     -----------   ------------   --------
     0.00         0.00 us         0.00 us         0.00 us             83   FORGET
     0.00         0.00 us         0.00 us         0.00 us           2103   RELEASE
     0.00         0.00 us         0.00 us         0.00 us          23422   RELEASEDIR
     0.01       123.21 us        49.00 us       194.00 us             29   FSTAT
     0.09       101.08 us        33.00 us       149.00 us            230   STATFS
     0.10        94.62 us        30.00 us       325.00 us            261   STAT
     0.49        71.46 us        15.00 us      1283.00 us           1779   FINODELK
     0.86       239.23 us        72.00 us       397.00 us            936   LOOKUP
     0.92       447.62 us        41.00 us      3673.00 us            531   FSYNC
     1.80       344.20 us        71.00 us      5532.00 us           1354   FXATTROP
    28.40       519.98 us        23.00 us      8811.00 us          14159   READ
    67.33      4939.29 us       177.00 us     37343.00 us           3534   WRITE

Looks like two of the bricks are seeing excessive latency compared to
the rest, which are more or less the same within +/- 1-3 ms. Looks like
I need to debug those two bricks? Obviously +/- 50 ms is unacceptable,
but is +/- 3 ms also unreasonable for HDDs?
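
One way to sanity-check the raw bricks, bypassing gluster entirely
(paths illustrative):

  dd if=/dev/zero of=/path/to/brick/ddtest bs=1M count=4096 oflag=direct
  rm /path/to/brick/ddtest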
Just tested each brick individually; all came back roughly the same. The
odd part I see is this:
Host 1 - Bad Latency
Host 2 - Good Latency
Host 3 - Bad Latency
Host 4 - Good Latency
Host 5 - Bad Latency
Host 6 - Good Latency
Host 7 - Bad Latency
Host 8 - Good Latency
To me it looks like the actual write latency from VM -> server is bad,
but the replication of that data (replica 2) is speedy. Could the client
be sending less-than-ideal block sizes, or something similar?
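
If it is block sizes, the per-brick block-size histogram near the top of
"gluster volume profile <volname> info" output (the Block Size / No. of
Reads / No. of Writes section) should show what the clients are actually
sending.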