Re: [External] : Re: Poor gluster performances over 10Gbps network
by Marcos Sungaila
Hi Mathieu,
Using dd for disk performance tests is not recommended. Besides needing the oflag and iflag options, dd performs a sequential write, which is not a realistic scenario.
To evaluate your servers, use fio or bonnie++ for performance tests.
These tools exercise a more realistic workload and are better suited to validating your environment.
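For example, a minimal fio sketch for a random-write test (the file path, size, runtime and job counts here are only placeholders to adapt to your environment):
# fio --name=writetest --filename=/mnt/test/fio.dat --size=1g --rw=randwrite --bs=4k --direct=1 --ioengine=libaio --iodepth=32 --numjobs=4 --runtime=60 --time_based --group_reporting
--direct=1 bypasses the page cache, so the numbers reflect the storage instead of RAM.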
Regards,
Marcos
From: Staniforth, Paul <P.Staniforth(a)leedsbeckett.ac.uk>
Sent: quarta-feira, 8 de setembro de 2021 08:17
To: Mathieu Valois <mvalois(a)teicee.com>; users <users(a)ovirt.org>
Subject: [External] : [ovirt-users] Re: Poor gluster performances over 10Gbps network
Hi Mathieu,
with a Linux VM, using dd without oflag=sync means it writes through the disk buffers, hence the faster throughput at the beginning until the buffers are full.
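For example (illustrative only, the output path is a placeholder):
# dd if=/dev/zero of=/mnt/test/ddtest bs=1M count=1024
versus
# dd if=/dev/zero of=/mnt/test/ddtest bs=1M count=1024 oflag=dsync
where the second form syncs every write and reports the real storage throughput.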
Regards,
Paul S.
________________________________
From: Mathieu Valois <mvalois(a)teicee.com>
Sent: 08 September 2021 11:56
To: Staniforth, Paul <P.Staniforth(a)leedsbeckett.ac.uk>; users <users(a)ovirt.org>
Subject: Re: [ovirt-users] Poor gluster performances over 10Gbps network
Hi Paul,
thank you for your answer.
Indeed I did a `dd` inside a VM to measure the Gluster disk performance. I've also tried `dd` on a hypervisor, writing into one of the replicated gluster bricks, which gives good performance (similar to the logical volume's).
On 08/09/2021 at 12:51, Staniforth, Paul wrote:
Hi Mathieu,
How are you measuring the Gluster disk performance?
Also, when using dd you should use oflag=dsync to avoid buffer caching.
Regards,
Paul S
________________________________
From: Mathieu Valois <mvalois(a)teicee.com>
Sent: 08 September 2021 10:12
To: users <users(a)ovirt.org>
Subject: [ovirt-users] Poor gluster performances over 10Gbps network
Sorry for the double post, but I don't know whether this mail was received.
Hello everyone,
I know this issue was already treated on this mailing list. However, none of the proposed solutions satisfies me.
Here is my situation: I've got 3 hyperconverged gluster oVirt nodes, each with 6 network interfaces bonded in pairs (management, VMs and gluster). The gluster network is on a dedicated bond whose 2 interfaces are directly connected to the 2 other oVirt nodes. Gluster is apparently using it:
# gluster volume status vmstore
Status of volume: vmstore
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick gluster-ov1:/gluster_bricks
/vmstore/vmstore 49152 0 Y 3019
Brick gluster-ov2:/gluster_bricks
/vmstore/vmstore 49152 0 Y 3009
Brick gluster-ov3:/gluster_bricks
/vmstore/vmstore
where 'gluster-ov{1,2,3}' are domain names referencing nodes in the gluster network. This network has 10Gbps capability:
# iperf3 -c gluster-ov3
Connecting to host gluster-ov3, port 5201
[ 5] local 10.20.0.50 port 46220 connected to 10.20.0.51 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 1.16 GBytes 9.92 Gbits/sec 17 900 KBytes
[ 5] 1.00-2.00 sec 1.15 GBytes 9.90 Gbits/sec 0 900 KBytes
[ 5] 2.00-3.00 sec 1.15 GBytes 9.90 Gbits/sec 4 996 KBytes
[ 5] 3.00-4.00 sec 1.15 GBytes 9.90 Gbits/sec 1 996 KBytes
[ 5] 4.00-5.00 sec 1.15 GBytes 9.89 Gbits/sec 0 996 KBytes
[ 5] 5.00-6.00 sec 1.15 GBytes 9.90 Gbits/sec 0 996 KBytes
[ 5] 6.00-7.00 sec 1.15 GBytes 9.90 Gbits/sec 0 996 KBytes
[ 5] 7.00-8.00 sec 1.15 GBytes 9.91 Gbits/sec 0 996 KBytes
[ 5] 8.00-9.00 sec 1.15 GBytes 9.90 Gbits/sec 0 996 KBytes
[ 5] 9.00-10.00 sec 1.15 GBytes 9.90 Gbits/sec 0 996 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 11.5 GBytes 9.90 Gbits/sec 22 sender
[ 5] 0.00-10.04 sec 11.5 GBytes 9.86 Gbits/sec receiver
iperf Done.
However, VMs stored on the vmstore gluster volume have poor write performance, oscillating between 100KBps and 30MBps. I almost always observe a write spike (180Mbps) at the beginning, until around 500MB has been written; then it drastically falls to 10MBps, sometimes even less (100KBps). Hypervisors have 32 threads (2 sockets, 8 cores per socket, 2 threads per core).
Here are the volume settings:
Volume Name: vmstore
Type: Replicate
Volume ID: XXX
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: gluster-ov1:/gluster_bricks/vmstore/vmstore
Brick2: gluster-ov2:/gluster_bricks/vmstore/vmstore
Brick3: gluster-ov3:/gluster_bricks/vmstore/vmstore
Options Reconfigured:
performance.io-thread-count: 32 # was 16 by default.
cluster.granular-entry-heal: enable
storage.owner-gid: 36
storage.owner-uid: 36
cluster.lookup-optimize: off
server.keepalive-count: 5
server.keepalive-interval: 2
server.keepalive-time: 10
server.tcp-user-timeout: 20
network.ping-timeout: 30
server.event-threads: 4
client.event-threads: 8 # was 4 by default
cluster.choose-local: off
features.shard: on
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
cluster.locking-scheme: granular
cluster.data-self-heal-algorithm: full
cluster.server-quorum-type: server
cluster.quorum-type: auto
cluster.eager-lock: enable
performance.strict-o-direct: on
network.remote-dio: off
performance.low-prio-threads: 32
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
auth.allow: *
user.cifs: off
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: on
When I naively write directly to the logical volume, which sits on a hardware RAID5 3-disk array, I get good performance:
# dd if=/dev/zero of=a bs=4M count=2048
2048+0 records in
2048+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 17.2485 s, 498 MB/s #urandom gives around 200MBps
Moreover, the hypervisors have SSDs configured as lvcache, but I'm unsure how to test it efficiently.
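One check I can think of (assuming a recent LVM; vg/lv below is a placeholder for the actual cached volume) is reading the cache counters reported by lvs:
# lvs -a -o name,cache_total_blocks,cache_dirty_blocks,cache_read_hits,cache_read_misses,cache_write_hits,cache_write_misses vg/lv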
I can't find where the problem is, as every piece of the chain is apparently doing well...
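If useful, I can capture per-brick latency with gluster's built-in profiler (standard gluster CLI, run while a VM is writing):
# gluster volume profile vmstore start
# gluster volume profile vmstore info
# gluster volume profile vmstore stop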
Thanks to anyone who can help me :)
--
Mathieu Valois
Bureau Caen: Quartier Kœnig - 153, rue Géraldine MOCK - 14760 Bretteville-sur-Odon
Bureau Vitré: Zone de la baratière - 12, route de Domalain - 35500 Vitré
02 72 34 13 20 | www.teicee.com
Re: Time Drift Issues
by Gianluca Cecchi
Just to clarify: when using oVirt Node 4.4, during installation you have
the usual way to configure NTP inside Anaconda. In 4.4, based on CentOS
(Stream) 8, doing it that way will configure chronyd.
In the recent past there was a bug where installation sometimes crashed
when configuring NTP under certain conditions.
If you don't configure NTP at install time, you can configure it later by
modifying /etc/chrony.conf and starting/enabling the service from the
Cockpit web interface (or from the command line with the usual commands). It
will persist across reboots and/or image updates.
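For example, a minimal sketch (the pool address is only a placeholder for your own time source):
pool 2.centos.pool.ntp.org iburst    <- line to add in /etc/chrony.conf
# systemctl enable --now chronyd
# chronyc sources -v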
HTH,
Gianluca
Re: Poor gluster performances over 10Gbps network
by Mathieu Valois
Hi Paul,
thank you for your answer.
Indeed I did a `dd` inside a VM to measure the Gluster disk performance.
I've also tried `dd` on a hypervisor, writing into one of the replicated
gluster bricks, which gives good performance (similar to the logical
volume's).
Re: Time Drift Issues
by Nur Imam Febrianto
Hi Marcos,
Want to clarify one thing. If I'm using an oVirt Node based host (not an EL Linux based one), does it already come configured with an NTP client, or can I just configure any NTP client on the host? If I configure an NTP client on the host, either chrony or ntpd, will it persist after upgrading the node?
Thanks before.
Regards,
Nur Imam Febrianto
mom-vdsm doesn't restart after upgrade
by Nathanaël Blanchet
Hello,
I performed a host upgrade from 4.4.5 to 4.4.8, and vdsmd failed to start
with this log:
Sep 07 16:16:52 kamen systemd[1]: mom-vdsm.service: Job
mom-vdsm.service/start failed with result 'dependency'.
Sep 07 16:16:54 kamen systemd[1]: Dependency failed for MOM instance
configured for VDSM purposes.
Following this ticket
https://bugzilla.redhat.com/show_bug.cgi?id=1557735, I managed to start
mom-vdsm (and therefore vdsmd) by running:
vdsm-tool configure
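In my case that was enough; if it is not, I suppose forcing the reconfiguration and restarting the service could be tried (an untested guess on my side):
# vdsm-tool configure --force
# systemctl restart vdsmd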
--
Nathanaël Blanchet
Supervision réseau
SIRE
227 avenue Professeur-Jean-Louis-Viala
34193 MONTPELLIER CEDEX 5
Tél. 33 (0)4 67 54 84 55
Fax 33 (0)4 67 54 84 14
blanchet(a)abes.fr
GlusterFS Monitoring/Alerting
by simon@justconnect.ie
Hi All,
Does anyone have recommendations for GlusterFS monitoring/alerting software and/or plugins?
Kind regards
Simon...
Re: Time Drift Issues
by Marcos Sungaila
Hi Nur,
ntpd has a 300-second limit for synchronization. If you want a more flexible NTP client, use chronyd.
If you prefer to use ntpd, you should run ntpdate as a boot-time client and ntpd as a runtime client.
When the server boots, the ntpdate client will sync the time no matter how large the difference is. As a boot-time client, it runs only once and exits. Then the ntpd service starts and keeps your server clock in sync with an external source.
On distributions like CentOS and similar, ntpdate reads the external time source configuration from ntpd.conf. You need both packages installed and both services enabled.
I prefer chronyd since it does not need any extra service or procedure to keep the clock synchronized.
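For a clock that has already drifted far (like the 672 seconds reported), chronyd can also be told to step it immediately; a quick sketch with standard chronyc commands, run as root:
# chronyc makestep
# chronyc tracking
makestep applies one immediate correction; tracking then shows the remaining offset.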
Regards,
Marcos
From: Nur Imam Febrianto <nur_imam(a)outlook.com>
Sent: terça-feira, 7 de setembro de 2021 01:08
To: oVirt Users <users(a)ovirt.org>
Subject: [External] : [ovirt-users] Time Drift Issues
Hi All,
Recently I got an warning in our cluster about time drift :
Host xxxxx has time-drift of 672 seconds while maximum configured value is 300 seconds.
What should I do to address this issue ? Should I reconfigure / configure ntp client on all host ?
Thanks before.
Regards,
Nur Imam Febrianto
Re: Time Drift Issues
by dhanaraj.ramesh@yahoo.com
Change the host and engine time to the local time zone and ensure the NTP client points to the same NTP server.
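For example (the time zone below is just a placeholder for yours):
# timedatectl set-timezone Asia/Jakarta
# timedatectl status
# chronyc sources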