Thank you for the analysis.
The version is the latest one distributed in the ovirt@centos8 distribution:
[root@ovirt-node2 ~]# rpm -qa | grep '\(glusterfs-server\|ovirt-node\)'
ovirt-node-ng-image-update-placeholder-4.5.2-1.el8.noarch
glusterfs-server-10.2-1.el8s.x86_64
ovirt-node-ng-nodectl-4.4.2-1.el8.noarch
python3-ovirt-node-ng-nodectl-4.4.2-1.el8.noarch
ovirt-node-ng-image-update-4.5.2-1.el8.noarch
[root@ovirt-node3 ~]# rpm -qa | grep '\(glusterfs-server\|ovirt-node\)'
ovirt-node-ng-image-update-placeholder-4.5.2-1.el8.noarch
glusterfs-server-10.2-1.el8s.x86_64
ovirt-node-ng-nodectl-4.4.2-1.el8.noarch
python3-ovirt-node-ng-nodectl-4.4.2-1.el8.noarch
ovirt-node-ng-image-update-4.5.2-1.el8.noarch
[root@ovirt-node4 ~]# rpm -qa | grep '\(glusterfs-server\|ovirt-node\)'
ovirt-node-ng-image-update-placeholder-4.5.2-1.el8.noarch
glusterfs-server-10.2-1.el8s.x86_64
ovirt-node-ng-nodectl-4.4.2-1.el8.noarch
python3-ovirt-node-ng-nodectl-4.4.2-1.el8.noarch
ovirt-node-ng-image-update-4.5.2-1.el8.noarch
During backups (or whenever there is some I/O, even not too intensive judging by the SSD
LEDs), the only thing I noticed is that sometimes there is a sort of lag:
I issue "gluster volume heal glen|gv0|gv1 info" and it takes 4-5 seconds before
answering... even though the answer reports 0 missing objects... and the nodes are always shown as connected.
e.g.
Brick ovirt-node2.ovirt:/brickhe/_glen
Status: Connected
Number of entries: 0
Brick ovirt-node3.ovirt:/brickhe/glen
Status: Connected
Number of entries: 0
Brick ovirt-node4.ovirt:/dati/_glen
Status: Connected
Number of entries: 0
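As a side note, to put a number on that lag I could simply time the command for each
volume (a minimal sketch, using the same volume names as above):

    for vol in glen gv0 gv1; do
        echo "== $vol =="
        # time how long 'heal info' takes to come back with its answer
        time gluster volume heal "$vol" info > /dev/null
    done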
Regarding the "rate limit": I did not work on QoS, but the backup destination is an NFS
SATA RAID5 NAS published via a 1Gb link, so I think I have a ~20MB/s "cap" by
architecture; the gluster bricks are all built on SATA SSD drives, where I recorded a
throughput of 200MB/s.
I also tried to monitor performance via iotop but did not record any bandwidth problem,
and I monitored the network via iftop as well, recording no bandwidth saturation
and no errors.
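For reference, the monitoring was along these lines (a sketch; the exact flags and the
interface name are placeholders, not the literal invocations I used):

    # disk side: only show processes actually doing I/O, with accumulated totals
    iotop -o -a
    # network side: watch the storage interface (replace em1 with the real one)
    iftop -i em1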
Searching the gluster mailing list
(
https://lists.gluster.org/pipermail/gluster-users/2022-September/040063.html) I tried the
same test, but writing and reading every 1/10 of a second:
[root@ovirt-node2 ~]# su - vdsm -s /bin/bash
Last login: Wed Sep 14 15:33:45 UTC 2022 on pts/1
nodectl must be run as root!
nodectl must be run as root!
cd /rhev/data-center/mnt/glusterSD/ovirt-node2.ovirt:_gv1; while sleep 0.1; do date
+'%s.%N' | tee testfile ; done
[root@ovirt-node3 ~]# su - vdsm -s /bin/bash
nodectl must be run as root!
nodectl must be run as root!
[vdsm@ovirt-node3 ~]$ cd /rhev/data-center/mnt/glusterSD/ovirt-node2.ovirt:_gv1; while
sleep 0.1 ; do date +' %s.%N'; cat testfile ; done
[root@ovirt-node4 ~]# su - vdsm -s /bin/bash
Last login: Wed Aug 24 16:52:55 UTC 2022
nodectl must be run as root!
nodectl must be run as root!
[vdsm@ovirt-node4 ~]$ cd /rhev/data-center/mnt/glusterSD/ovirt-node2.ovirt:_gv1; while
sleep 0.1 ; do date +' %s.%N'; cat testfile ; done
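(As a variant, to make this comparison easier to repeat, the writer and reader output
could be logged to files and compared afterwards; just a sketch, the log file names are
arbitrary:)

    # on the writing node
    while sleep 0.1; do date +'%s.%N' | tee testfile; done | tee /tmp/writer.log
    # on each reading node: local timestamp and file content on the same line
    while sleep 0.1; do echo "$(date +'%s.%N') $(cat testfile)"; done | tee /tmp/reader.log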
The result is that the nodes reading from glusterfs see the file updated only about once
per second, more or less.
To report the test I selected the timestamps for node2 (the writing node) between 1663228352 and
1663228356, and for node3 and node4 between 1663228353 and 1663228356 (for the readers, each local timestamp is followed by the content read from testfile):
node2:
1663228352.589998302
1663228352.695887198
1663228352.801681699
1663228352.907548634
1663228353.011931276
1663228353.115904115
1663228353.222383590
1663228353.329941123
1663228353.436480791
1663228353.540536995
1663228353.644858473
1663228353.749470221
1663228353.853969491
1663228353.958703186
1663228354.062732971
1663228354.166616934
1663228354.270398507
1663228354.373989214
1663228354.477149100
1663228354.581862187
1663228354.686177524
1663228354.790362507
1663228354.894673446
1663228354.999136257
1663228355.102889616
1663228355.207043913
1663228355.312522545
1663228355.416667384
1663228355.520897473
1663228355.624582255
1663228355.728590069
1663228355.832979634
1663228355.937309737
1663228356.042289521
1663228356.146565174
1663228356.250773672
1663228356.356361818
1663228356.460048755
1663228356.565054968
1663228356.669126850
1663228356.773807899
1663228356.878011739
1663228356.983842597
node3:
1663228353.027991911
1663228352.064562785
1663228353.129696675
1663228353.115904115
1663228353.232351572
1663228353.115904115
1663228353.334188748
1663228353.115904115
1663228353.436208688
1663228353.115904115
1663228353.538268493
1663228353.115904115
1663228353.641266519
1663228353.115904115
1663228353.743094997
1663228353.115904115
1663228353.845244131
1663228353.115904115
1663228353.947049766
1663228353.115904115
1663228354.048876741
1663228353.115904115
1663228354.150979017
1663228354.062732971
1663228354.254198339
1663228354.062732971
1663228354.356197640
1663228354.270398507
1663228354.459541685
1663228354.270398507
1663228354.561548541
1663228354.270398507
1663228354.664280563
1663228354.270398507
1663228354.766557007
1663228354.270398507
1663228354.868323950
1663228354.270398507
1663228354.970093106
1663228354.270398507
1663228355.072195391
1663228354.270398507
1663228355.174244725
1663228354.270398507
1663228355.276335829
1663228355.207043913
1663228355.380891924
1663228355.207043913
1663228355.483239210
1663228355.207043913
1663228355.585240135
1663228355.207043913
1663228355.687486532
1663228355.207043913
1663228355.789322563
1663228355.207043913
1663228355.891375906
1663228355.207043913
1663228355.993212340
1663228355.207043913
1663228356.094918478
1663228355.207043913
1663228356.196910915
1663228355.207043913
1663228356.299065941
1663228356.250773672
1663228356.402899261
1663228356.250773672
1663228356.504898603
1663228356.250773672
1663228356.606802284
1663228356.250773672
1663228356.709301567
1663228356.250773672
1663228356.811021872
1663228356.250773672
1663228356.913016384
node4:
1663228353.091321741
1663228352.589998302
1663228353.192613374
1663228352.589998302
1663228353.293885581
1663228352.589998302
1663228353.395206449
1663228352.589998302
1663228353.496548707
1663228352.589998302
1663228353.597914229
1663228352.589998302
1663228353.699232292
1663228353.644858473
1663228353.801913729
1663228353.644858473
1663228353.903240939
1663228353.644858473
1663228354.005266462
1663228353.644858473
1663228354.106513772
1663228353.644858473
1663228354.207908519
1663228353.644858473
1663228354.309278430
1663228353.644858473
1663228354.410932083
1663228353.644858473
1663228354.512335006
1663228353.644858473
1663228354.613632691
1663228353.644858473
1663228354.714993594
1663228354.686177524
1663228354.817491576
1663228354.686177524
1663228354.918799429
1663228354.686177524
1663228355.020439267
1663228354.686177524
1663228355.121832050
1663228354.686177524
1663228355.223172355
1663228354.686177524
1663228355.324540271
1663228354.686177524
1663228355.425842454
1663228354.686177524
1663228355.527215380
1663228354.686177524
1663228355.628587564
1663228354.686177524
1663228355.729968575
1663228355.832452340
1663228355.933988683
1663228356.036161934
1663228356.137618036
1663228356.239021910
1663228356.340352667
1663228356.441720413
1663228356.543116683
1663228356.644490180
1663228356.745882022
1663228356.669126850
1663228356.848421632
1663228356.669126850
1663228356.949864539
Analyzing this test I can note these issues:
1. node2 does not seem to have written the 1663228352.064562785 value that node3 read...
2. node4 seems to have been disconnected between 1663228355.628587564 and 1663228356.745882022
3. node3 and node4 only see updates to the test file that are ~1 second old (see the check sketched below)
I think that by digging we have found the tip of the iceberg...
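One thing I would like to check (just a guess on my part) is whether the ~1 second
staleness comes from the client-side caching / metadata-cache settings of the fuse
mount; the current values can be listed with something like (gv1 as an example):

    # list the caching-related options of the volume
    gluster volume get gv1 all | grep -Ei 'stat-prefetch|quick-read|read-ahead|io-cache|md-cache'

If I read the docs correctly, some of these caches default to a 1 second timeout, which
would roughly match what node3 and node4 see, but I may be wrong.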