Re: Tuning Gluster Writes

On Apr 13, 2019 00:44, Alex McWhirter <alex@triadic.us> wrote:

I have 8 machines acting as gluster servers. They each have 12 drives in RAID 50 (three sets of 4 drives in RAID 5, striped together as RAID 0).
They connect to the compute hosts and to each other over LACP'd 10GbE connections split across two Cisco Nexus switches with vPC.
Gluster has the following options set:
performance.write-behind-window-size: 4MB
performance.flush-behind: on
performance.stat-prefetch: on
server.event-threads: 4
client.event-threads: 8
performance.io-thread-count: 32
network.ping-timeout: 30
cluster.granular-entry-heal: enable
performance.strict-o-direct: on
storage.owner-gid: 36
storage.owner-uid: 36
features.shard: on
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
cluster.locking-scheme: granular
cluster.data-self-heal-algorithm: full
cluster.server-quorum-type: server
cluster.quorum-type: auto
cluster.eager-lock: enable
network.remote-dio: off
performance.low-prio-threads: 32
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
auth.allow: *
user.cifs: off
transport.address-family: inet
nfs.disable: off
performance.client-io-threads: on
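
(For reference, options like these are applied per volume; a minimal sketch, with a placeholder volume name:)

gluster volume set myvol performance.write-behind-window-size 4MB
gluster volume get myvol performance.write-behind-window-size    # confirm the running value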
I have the following sysctl values on the gluster clients and servers, using libgfapi, MTU 9K:
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.ipv4.tcp_rmem = 4096 87380 134217728
net.ipv4.tcp_wmem = 4096 65536 134217728
net.core.netdev_max_backlog = 300000
net.ipv4.tcp_moderate_rcvbuf = 1
net.ipv4.tcp_no_metrics_save = 1
net.ipv4.tcp_congestion_control = htcp
Reads with this setup are perfect: benchmarked in a VM at about 770MB/s sequential, with disk access times under 1ms. Writes, on the other hand, are all over the place. They peak around 320MB/s sequential, which is what I expect, but it seems as if there is some blocking going on.
During the write test I will hit 320MB/s briefly, then 0MB/s as disk access times shoot to over 3000ms, then back to 320MB/s. It averages out to about 110MB/s.
Gluster version is 3.12.15; oVirt is 4.2.7.5.
Any ideas on what I could tune to eliminate or minimize that blocking?

On 2019-04-13 03:15, Strahil wrote:
Hi,
What are your dirty cache settings on the gluster servers?
Best Regards,
Strahil Nikolov

On 2019-04-14 12:07, Alex McWhirter wrote:
Just the vdsm defaults:

vm.dirty_ratio = 5
vm.dirty_background_ratio = 2

These boxes only have 8GB of RAM as well, so those percentages should work out to fairly small absolute amounts.
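
(Back-of-envelope for what those percentages mean on an 8GB box, taking total RAM as an upper bound since the kernel actually computes them against available memory:)

echo "$(( 8192 * 5 / 100 )) MB"    # vm.dirty_ratio ceiling: ~409 MB
echo "$(( 8192 * 2 / 100 )) MB"    # vm.dirty_background_ratio threshold: ~163 MB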

On 2019-04-14 13:05, Alex McWhirter wrote:
Doing a gluster profile, my bricks give me some odd numbers:
%-latency   Avg-latency    Min-Latency    Max-Latency   No. of calls   Fop
---------   -----------    -----------    -----------   ------------   ----
     0.00     131.00 us      131.00 us      131.00 us              1   FSTAT
     0.01     104.50 us       77.00 us      118.00 us             14   STATFS
     0.01      95.38 us       45.00 us      130.00 us             16   STAT
     0.10     252.39 us      124.00 us      329.00 us             61   LOOKUP
     0.22      55.68 us       16.00 us      180.00 us            635   FINODELK
     0.43     543.41 us       50.00 us     1760.00 us            125   FSYNC
     1.52     573.75 us       76.00 us     5463.00 us            422   FXATTROP
    97.72    7443.50 us      184.00 us    34917.00 us           2092   WRITE

%-latency   Avg-latency    Min-Latency    Max-Latency   No. of calls   Fop
---------   -----------    -----------    -----------   ------------   ----
     0.00       0.00 us        0.00 us        0.00 us             70   FORGET
     0.00       0.00 us        0.00 us        0.00 us           1792   RELEASE
     0.00       0.00 us        0.00 us        0.00 us          23422   RELEASEDIR
     0.01     126.20 us       80.00 us      210.00 us             20   FSTAT
     0.06     102.81 us       26.00 us      162.00 us            230   STATFS
     0.06      93.51 us       18.00 us      174.00 us            261   STAT
     0.57     239.13 us      103.00 us      391.00 us            997   LOOKUP
     0.59      59.07 us       15.00 us     6554.00 us           4208   FINODELK
     1.31     506.71 us       50.00 us     2735.00 us           1077   FSYNC
     2.53     389.07 us       65.00 us     5510.00 us           2720   FXATTROP
    28.24     498.18 us      134.00 us     3513.00 us          23688   READ
    66.64    4971.59 us      184.00 us    34917.00 us           5601   WRITE

%-latency   Avg-latency    Min-Latency    Max-Latency   No. of calls   Fop
---------   -----------    -----------    -----------   ------------   ----
     0.00      92.33 us       83.00 us       97.00 us              3   FSTAT
     0.01      87.81 us       35.00 us      123.00 us             16   STAT
     0.01     101.64 us       67.00 us      133.00 us             14   STATFS
     0.11     235.67 us      149.00 us      320.00 us             51   LOOKUP
     0.17     497.46 us      170.00 us      771.00 us             35   FSYNC
     0.43     247.58 us       81.00 us      983.00 us            181   FXATTROP
     0.43      49.37 us       14.00 us      177.00 us            914   FINODELK
    98.83    5591.06 us      192.00 us    29586.00 us           1850   WRITE

%-latency   Avg-latency    Min-Latency    Max-Latency   No. of calls   Fop
---------   -----------    -----------    -----------   ------------   ----
     0.00      92.33 us       83.00 us       97.00 us              3   FSTAT
     0.01      87.81 us       35.00 us      123.00 us             16   STAT
     0.01     101.64 us       67.00 us      133.00 us             14   STATFS
     0.11     235.67 us      149.00 us      320.00 us             51   LOOKUP
     0.17     497.46 us      170.00 us      771.00 us             35   FSYNC
     0.43     247.58 us       81.00 us      983.00 us            181   FXATTROP
     0.43      49.37 us       14.00 us      177.00 us            914   FINODELK
    98.83    5591.06 us      192.00 us    29586.00 us           1850   WRITE

%-latency   Avg-latency    Min-Latency    Max-Latency   No. of calls   Fop
---------   -----------    -----------    -----------   ------------   ----
     0.00     102.40 us       69.00 us      130.00 us              5   FSTAT
     0.00      94.50 us       32.00 us      130.00 us             14   STATFS
     0.03     231.25 us       97.00 us      332.00 us             55   LOOKUP
     0.05     985.54 us      402.00 us     1371.00 us             24   READ
     0.09     397.99 us       89.00 us     1072.00 us            113   FSYNC
     0.23     384.93 us       68.00 us     3276.00 us            286   FXATTROP
    11.66    4835.83 us      214.00 us    25386.00 us           1158   WRITE
    87.93   87398.97 us       16.00 us  1325513.00 us            483   FINODELK

%-latency   Avg-latency    Min-Latency    Max-Latency   No. of calls   Fop
---------   -----------    -----------    -----------   ------------   ----
     0.00       0.00 us        0.00 us        0.00 us             83   FORGET
     0.00       0.00 us        0.00 us        0.00 us           2103   RELEASE
     0.00       0.00 us        0.00 us        0.00 us          23419   RELEASEDIR
     0.01     114.54 us       51.00 us      175.00 us             80   FSTAT
     0.02      94.78 us       28.00 us      176.00 us            230   STATFS
     0.18     364.51 us       51.00 us     1072.00 us            531   FSYNC
     0.19     221.18 us       97.00 us      432.00 us            936   LOOKUP
     0.34     273.10 us       68.00 us     3276.00 us           1354   FXATTROP
    12.70    3875.57 us      179.00 us    29246.00 us           3534   WRITE
    12.76     560.97 us      141.00 us     4705.00 us          24547   READ
    73.80   44651.79 us       12.00 us  1984451.00 us           1783   FINODELK

%-latency   Avg-latency    Min-Latency    Max-Latency   No. of calls   Fop
---------   -----------    -----------    -----------   ------------   ----
     0.00     130.50 us      127.00 us      134.00 us              2   FSTAT
     0.02      87.12 us       36.00 us      113.00 us             16   STAT
     0.02     107.86 us       76.00 us      117.00 us             14   STATFS
     0.05     136.09 us       26.00 us      630.00 us             32   READ
     0.14     235.45 us      115.00 us      315.00 us             55   LOOKUP
     0.35      65.89 us       18.00 us     1283.00 us            477   FINODELK
     0.81     648.49 us      105.00 us     3673.00 us            113   FSYNC
     1.98     624.26 us       74.00 us     5532.00 us            286   FXATTROP
    96.63    7515.45 us      263.00 us    37343.00 us           1158   WRITE

%-latency   Avg-latency    Min-Latency    Max-Latency   No. of calls   Fop
---------   -----------    -----------    -----------   ------------   ----
     0.00       0.00 us        0.00 us        0.00 us             83   FORGET
     0.00       0.00 us        0.00 us        0.00 us           2103   RELEASE
     0.00       0.00 us        0.00 us        0.00 us          23422   RELEASEDIR
     0.01     123.21 us       49.00 us      194.00 us             29   FSTAT
     0.09     101.08 us       33.00 us      149.00 us            230   STATFS
     0.10      94.62 us       30.00 us      325.00 us            261   STAT
     0.49      71.46 us       15.00 us     1283.00 us           1779   FINODELK
     0.86     239.23 us       72.00 us      397.00 us            936   LOOKUP
     0.92     447.62 us       41.00 us     3673.00 us            531   FSYNC
     1.80     344.20 us       71.00 us     5532.00 us           1354   FXATTROP
    28.40     519.98 us       23.00 us     8811.00 us          14159   READ
    67.33    4939.29 us      177.00 us    37343.00 us           3534   WRITE
Looks like two of the bricks are seeing excessive latency, while the rest are more or less the same, +/- 1-3ms.

Looks like I need to debug those two bricks? Obviously +/- 50ms is unacceptable, but is +/- 3ms also unreasonable for HDDs?

On 2019-04-14, Alex McWhirter wrote:

Just tested each brick individually; all came back roughly the same. The odd part I see is this:

Host 1 - Bad latency
Host 2 - Good latency
Host 3 - Bad latency
Host 4 - Good latency
Host 5 - Bad latency
Host 6 - Good latency
Host 7 - Bad latency
Host 8 - Good latency

To me it looks like the actual write latency from VM -> server is bad, but the replication of that data (replica 2) is speedy. Could the client be sending less than ideal block sizes, or something similar?
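
(One way to check what block sizes the clients are actually sending is the block-size histogram that gluster volume profile prints above the fop latency tables; the volume name here is a placeholder:)

gluster volume profile myvol start
# run the write test, then:
gluster volume profile myvol info    # per-brick "Block Size: ... No. of Writes" histogram
gluster volume profile myvol stop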


On 2019-04-14 17:07, Strahil Nikolov wrote:
Some kernels do not like values below 5%, so I prefer to use vm.dirty_bytes and vm.dirty_background_bytes. Try the following (comment out the vdsm.conf values):
vm.dirty_background_bytes = 200000000
vm.dirty_bytes = 450000000

It's more like shooting in the dark, but it might help.
Best Regards, Strahil Nikolov
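
(A minimal sketch of trying these values live and then persisting them, assuming a distro that reads /etc/sysctl.d; setting the *_bytes knobs automatically zeroes their *_ratio counterparts:)

sysctl -w vm.dirty_background_bytes=200000000
sysctl -w vm.dirty_bytes=450000000
sysctl vm.dirty_ratio vm.dirty_background_ratio    # both should now report 0

cat > /etc/sysctl.d/90-dirty-bytes.conf <<'EOF'
vm.dirty_background_bytes = 200000000
vm.dirty_bytes = 450000000
EOF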

On 2019-04-14 22:47, Alex McWhirter wrote:
I will try this.

I went in and disabled TCP offload on all the NICs, and got a huge performance boost: sequential writes went from 110MB/s to 240MB/s. Reads lost a bit of performance, going down to 680MB/s, but that's a decent trade-off. Latency is still really high though, so I need to work on that. I think some more TCP tuning might help.
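
(To see which offloads a NIC currently has enabled before toggling them, ethtool -k lists the current state per interface; the name here is one of those used in the script later in the thread:)

ethtool -k ens2f0 | grep -E 'checksumming|segmentation-offload|receive-offload'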
On Apr 15, 2019 at 11:37 AM, Alex McWhirter wrote:

Those changes didn't do a whole lot, but I ended up enabling performance.read-ahead on the gluster volume. My blockdev read-ahead values were already 8192, which seemed good enough; not sure if oVirt set those, or if it's just the defaults of my RAID controller. Anyway, I'm up to 350MB/s writes and 700MB/s reads, which happens to correlate with the saturation of my 10G network. Latency is still a slight issue, but at least now I'm not blocking :)
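
(The two read-ahead layers mentioned here, for anyone checking their own values; the device and volume names are placeholders:)

blockdev --getra /dev/sda      # kernel read-ahead, in 512-byte sectors (8192 = 4MB)
blockdev --setra 8192 /dev/sda
gluster volume set myvol performance.read-ahead on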


On 2019-04-15 12:43, Darrell Budic wrote:
Interesting. Whose 10g cards, and which offload settings did you disable? Did you do that on the servers, the VM host clients, or both?

On 2019-04-15 12:58, Alex McWhirter wrote:
These are dual port QLogic QLGE cards, plugging into dual Cisco Nexus 3064s with vPC to allow me to LACP across two switches. These are FCoE/10GbE cards, so on the Cisco switches I had to disable LLDP on the ports to stop FCoE initiator errors from disabling the ports (as I don't use FCoE at the moment).

Bond options are "mode=4 lacp_rate=1 miimon=100 xmit_hash_policy=1", and then I have the following /sbin/ifup-local script that triggers on storage network creation:

#!/bin/bash
case "$1" in
Storage)
    # Disable TX/RX checksum offload plus TCP/generic segmentation offload
    /sbin/ethtool -K ens2f0 tx off rx off tso off gso off
    /sbin/ethtool -K ens2f1 tx off rx off tso off gso off
    # Deeper transmit queues on the slave NICs, the bond, and the Storage network device
    /sbin/ip link set dev ens2f0 txqueuelen 10000
    /sbin/ip link set dev ens2f1 txqueuelen 10000
    /sbin/ip link set dev bond2 txqueuelen 10000
    /sbin/ip link set dev Storage txqueuelen 10000
    ;;
*)
    ;;
esac
exit 0

If you have LRO, disable it too, IMO; these cards do not do LRO, so it's not applicable to me. This did cut down my read performance by about 50MB/s, but my writes went from 98-110MB/s to about 240MB/s, and then enabling read-ahead got me to the 350MB/s it should have been.

Oh, and I did it on both the VM hosts and the storage machines. Same cards in all of them.
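
(A note for anyone replicating the hook: EL7-style initscripts run /sbin/ifup-local with the interface name on each ifup, but only if the file exists and is executable:)

chmod +x /sbin/ifup-local    # ifup-post skips the hook unless it is executable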
participants (4)
- Alex McWhirter
- Darrell Budic
- Strahil
- Strahil Nikolov