
Chris-

You probably need to head over to gluster-users@gluster.org for help with performance issues.

That said, what kind of performance are you getting, via some form of testing like bonnie++ or even dd runs? Raw bricks vs gluster performance is useful to determine what kind of performance you're actually getting.
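For the dd side of that, something along these lines is usually enough to show whether the gap is in the bricks themselves or in the gluster layer. This is only a sketch: the brick path matches the layout quoted below, but the scratch mount point, test file names and sizes are placeholders to adjust.

  # Raw brick, bypassing gluster entirely (O_DIRECT keeps the page cache out of it)
  dd if=/dev/zero of=/gluster/ssd0_vmssd/brick/ddtest.bin bs=1M count=4096 oflag=direct

  # The same write through a gluster FUSE mount of the volume
  mkdir -p /mnt/vmssd-test
  mount -t glusterfs ovirt1:/vmssd /mnt/vmssd-test
  dd if=/dev/zero of=/mnt/vmssd-test/ddtest.bin bs=1M count=4096 conv=fsync

  # Clean up the test files and the scratch mount straight away
  rm -f /gluster/ssd0_vmssd/brick/ddtest.bin /mnt/vmssd-test/ddtest.bin
  umount /mnt/vmssd-test

For the random-IO side of the complaint, fio runs against the same two locations will tell you more than dd will.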
Beyond that, I'd recommend dropping the arbiter bricks and re-adding them as full replicas; they can't serve distributed data in this configuration and may be slowing things down for you.
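Roughly, that conversion would look like the following, using the volume and brick names from the gluster volume info quoted below. This is from memory and untested, so treat it as an outline rather than a procedure: check the remove-brick/add-brick documentation for your gluster version, make sure all heals are finished before you start, and remember ovirt2 will then need room for full copies of the data.

  # Drop the two arbiter bricks, reducing each replica set to plain replica 2
  gluster volume remove-brick vmssd replica 2 \
      ovirt2:/gluster/ssd0_vmssd/brick ovirt2:/gluster/ssd1_vmssd/brick force

  # Wipe the old arbiter bricks on ovirt2 before reusing them
  # (empty them, remove the .glusterfs directory and clear the
  # trusted.glusterfs.volume-id / trusted.gfid xattrs, or simply mkfs the thin LVs)

  # Re-add the same bricks as full data replicas
  gluster volume add-brick vmssd replica 3 \
      ovirt2:/gluster/ssd0_vmssd/brick ovirt2:/gluster/ssd1_vmssd/brick

  # Self-heal then copies the data onto the new bricks; watch progress with
  gluster volume heal vmssd info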
If you've got a storage network set up, make sure it's using the largest MTU it can (a quick end-to-end check is sketched below the option list). Also consider adding/testing these settings that I use on my main storage volume:

performance.io-thread-count: 32
client.event-threads: 8
server.event-threads: 3
performance.stat-prefetch: on
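For reference, those options are applied per volume with gluster volume set, e.g. against the vmssd volume below. Worth changing one at a time so you can tell which of them actually helps:

  gluster volume set vmssd performance.io-thread-count 32
  gluster volume set vmssd client.event-threads 8
  gluster volume set vmssd server.event-threads 3
  gluster volume set vmssd performance.stat-prefetch on

  # Confirm what the volume is actually running with
  gluster volume get vmssd all | grep -E 'io-thread-count|event-threads|stat-prefetch'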
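And if you do move the storage network to jumbo frames, it's worth confirming every hop really carries them. The interface name, the 9000-byte MTU and the peer hostname here are all assumptions to replace with your own, and the switch ports have to allow jumbo frames as well:

  # On each host, raise the MTU on the storage interface (example name)
  ip link set dev ens1f0 mtu 9000

  # 8972 bytes of ICMP payload + 28 bytes of IP/ICMP headers = 9000,
  # and -M do forbids fragmentation, so this only succeeds end-to-end with jumbo frames
  ping -M do -s 8972 -c 4 ovirt2-storage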
Good luck,

  -Darrell

> On Jun 19, 2017, at 9:46 AM, Chris Boot <bootc@bootc.net> wrote:
>
> Hi folks,
>
> I have 3x servers in a "hyper-converged" oVirt 4.1.2 + GlusterFS 3.10
> configuration. My VMs run off a replica 3 arbiter 1 volume comprised of
> 6 bricks, which themselves live on two SSDs in each of the servers (one
> brick per SSD). The bricks are XFS on LVM thin volumes straight onto the
> SSDs. Connectivity is 10G Ethernet.
>
> Performance within the VMs is pretty terrible. I experience very low
> throughput and random IO is really bad: it feels like a latency issue.
> On my oVirt nodes the SSDs are not generally very busy. The 10G network
> seems to run without errors (iperf3 gives bandwidth measurements of
> >= 9.20 Gbits/sec between the three servers).
>
> To put this into perspective: I was getting better behaviour from NFS4
> on a gigabit connection than I am with GlusterFS on 10G: that doesn't
> feel right at all.
>
> My volume configuration looks like this:
>
> Volume Name: vmssd
> Type: Distributed-Replicate
> Volume ID: d5a5ddd1-a140-4e0d-b514-701cfe464853
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 2 x (2 + 1) = 6
> Transport-type: tcp
> Bricks:
> Brick1: ovirt3:/gluster/ssd0_vmssd/brick
> Brick2: ovirt1:/gluster/ssd0_vmssd/brick
> Brick3: ovirt2:/gluster/ssd0_vmssd/brick (arbiter)
> Brick4: ovirt3:/gluster/ssd1_vmssd/brick
> Brick5: ovirt1:/gluster/ssd1_vmssd/brick
> Brick6: ovirt2:/gluster/ssd1_vmssd/brick (arbiter)
> Options Reconfigured:
> nfs.disable: on
> transport.address-family: inet6
> performance.quick-read: off
> performance.read-ahead: off
> performance.io-cache: off
> performance.stat-prefetch: off
> performance.low-prio-threads: 32
> network.remote-dio: off
> cluster.eager-lock: enable
> cluster.quorum-type: auto
> cluster.server-quorum-type: server
> cluster.data-self-heal-algorithm: full
> cluster.locking-scheme: granular
> cluster.shd-max-threads: 8
> cluster.shd-wait-qlength: 10000
> features.shard: on
> user.cifs: off
> storage.owner-uid: 36
> storage.owner-gid: 36
> features.shard-block-size: 128MB
> performance.strict-o-direct: on
> network.ping-timeout: 30
> cluster.granular-entry-heal: enable
>
> I would really appreciate some guidance on this to try to improve things
> because at this rate I will need to reconsider using GlusterFS altogether.
>
> Cheers,
> Chris
>
> --
> Chris Boot
> bootc@bootc.net
> _______________________________________________
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users