
Chris-

You probably need to head over to gluster-users@gluster.org for help with performance issues.

That said, what kind of performance are you getting, via some form of testing like bonnie++ or even dd runs? Raw bricks vs gluster performance is useful to determine what kind of performance you're actually getting.
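For the dd side of that, something along these lines is usually enough to show whether the gap is in the bricks themselves or in the gluster layer. This is only a sketch: the brick path matches the layout quoted below, but the scratch mount point, test file names and sizes are placeholders to adjust.

  # Raw brick, bypassing gluster entirely (O_DIRECT keeps the page cache out of it)
  dd if=/dev/zero of=/gluster/ssd0_vmssd/brick/ddtest.bin bs=1M count=4096 oflag=direct

  # The same write through a gluster FUSE mount of the volume
  mkdir -p /mnt/vmssd-test
  mount -t glusterfs ovirt1:/vmssd /mnt/vmssd-test
  dd if=/dev/zero of=/mnt/vmssd-test/ddtest.bin bs=1M count=4096 conv=fsync

  # Clean up the test files and the scratch mount straight away
  rm -f /gluster/ssd0_vmssd/brick/ddtest.bin /mnt/vmssd-test/ddtest.bin
  umount /mnt/vmssd-test

For the random-IO side of the complaint, fio runs against the same two locations will tell you more than dd will.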
Beyond that, I'd recommend dropping the arbiter bricks and re-adding them as full replicas; they can't serve distributed data in this configuration and may be slowing things down for you.
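Roughly, that conversion would look like the following, using the volume and brick names from the gluster volume info quoted below. This is from memory and untested, so treat it as an outline rather than a procedure: check the remove-brick/add-brick documentation for your gluster version, make sure all heals are finished before you start, and remember ovirt2 will then need room for full copies of the data.

  # Drop the two arbiter bricks, reducing each replica set to plain replica 2
  gluster volume remove-brick vmssd replica 2 \
      ovirt2:/gluster/ssd0_vmssd/brick ovirt2:/gluster/ssd1_vmssd/brick force

  # Wipe the old arbiter bricks on ovirt2 before reusing them
  # (empty them, remove the .glusterfs directory and clear the
  # trusted.glusterfs.volume-id / trusted.gfid xattrs, or simply mkfs the thin LVs)

  # Re-add the same bricks as full data replicas
  gluster volume add-brick vmssd replica 3 \
      ovirt2:/gluster/ssd0_vmssd/brick ovirt2:/gluster/ssd1_vmssd/brick

  # Self-heal then copies the data onto the new bricks; watch progress with
  gluster volume heal vmssd info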
If you've got a storage network set up, make sure it's using the largest MTU it can (a quick end-to-end check is sketched below the option list). Also consider adding/testing these settings that I use on my main storage volume:

performance.io-thread-count: 32
client.event-threads: 8
server.event-threads: 3
performance.stat-prefetch: on
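For reference, those options are applied per volume with gluster volume set, e.g. against the vmssd volume below. Worth changing one at a time so you can tell which of them actually helps:

  gluster volume set vmssd performance.io-thread-count 32
  gluster volume set vmssd client.event-threads 8
  gluster volume set vmssd server.event-threads 3
  gluster volume set vmssd performance.stat-prefetch on

  # Confirm what the volume is actually running with
  gluster volume get vmssd all | grep -E 'io-thread-count|event-threads|stat-prefetch'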
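And if you do move the storage network to jumbo frames, it's worth confirming every hop really carries them. The interface name, the 9000-byte MTU and the peer hostname here are all assumptions to replace with your own, and the switch ports have to allow jumbo frames as well:

  # On each host, raise the MTU on the storage interface (example name)
  ip link set dev ens1f0 mtu 9000

  # 8972 bytes of ICMP payload + 28 bytes of IP/ICMP headers = 9000,
  # and -M do forbids fragmentation, so this only succeeds end-to-end with jumbo frames
  ping -M do -s 8972 -c 4 ovirt2-storage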
Good luck,

  -Darrell

> On Jun 19, 2017, at 9:46 AM, Chris Boot <bootc@bootc.net> wrote:
>
> Hi folks,
>
> I have 3x servers in a "hyper-converged" oVirt 4.1.2 + GlusterFS 3.10
> configuration. My VMs run off a replica 3 arbiter 1 volume comprised of
> 6 bricks, which themselves live on two SSDs in each of the servers (one
> brick per SSD). The bricks are XFS on LVM thin volumes straight onto the
> SSDs. Connectivity is 10G Ethernet.
>
> Performance within the VMs is pretty terrible. I experience very low
> throughput and random IO is really bad: it feels like a latency issue.
> On my oVirt nodes the SSDs are not generally very busy. The 10G network
> seems to run without errors (iperf3 gives bandwidth measurements of
> >= 9.20 Gbits/sec between the three servers).
>
> To put this into perspective: I was getting better behaviour from NFS4
> on a gigabit connection than I am with GlusterFS on 10G: that doesn't
> feel right at all.
>
> My volume configuration looks like this:
>
> Volume Name: vmssd
> Type: Distributed-Replicate
> Volume ID: d5a5ddd1-a140-4e0d-b514-701cfe464853
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 2 x (2 + 1) = 6
> Transport-type: tcp
> Bricks:
> Brick1: ovirt3:/gluster/ssd0_vmssd/brick
> Brick2: ovirt1:/gluster/ssd0_vmssd/brick
> Brick3: ovirt2:/gluster/ssd0_vmssd/brick (arbiter)
> Brick4: ovirt3:/gluster/ssd1_vmssd/brick
> Brick5: ovirt1:/gluster/ssd1_vmssd/brick
> Brick6: ovirt2:/gluster/ssd1_vmssd/brick (arbiter)
> Options Reconfigured:
> nfs.disable: on
> transport.address-family: inet6
> performance.quick-read: off
> performance.read-ahead: off
> performance.io-cache: off
> performance.stat-prefetch: off
> performance.low-prio-threads: 32
> network.remote-dio: off
> cluster.eager-lock: enable
> cluster.quorum-type: auto
> cluster.server-quorum-type: server
> cluster.data-self-heal-algorithm: full
> cluster.locking-scheme: granular
> cluster.shd-max-threads: 8
> cluster.shd-wait-qlength: 10000
> features.shard: on
> user.cifs: off
> storage.owner-uid: 36
> storage.owner-gid: 36
> features.shard-block-size: 128MB
> performance.strict-o-direct: on
> network.ping-timeout: 30
> cluster.granular-entry-heal: enable
>
> I would really appreciate some guidance on this to try to improve things
> because at this rate I will need to reconsider using GlusterFS altogether.
>
> Cheers,
> Chris
>
> --
> Chris Boot
> bootc@bootc.net
> _______________________________________________
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users