From: "Arman Khalatyan" <arm2arm(a)gmail.com>
To: "Juan Pablo" <pablo.localhost(a)gmail.com>
Cc: "users" <users(a)ovirt.org>, "FERNANDO FREDIANI"
<fernando.frediani(a)upx.com>
Sent: Friday, March 3, 2017 8:32:31 PM
Subject: Re: [ovirt-users] Replicated Glusterfs on top of ZFS
The problem itself is not streaming data performance, and a dd of zeros also does not tell you much on a production ZFS running with compression.
The main problem starts when Gluster begins to do something with that data: it uses xattrs heavily, and accessing extended attributes on ZFS is probably slower than on XFS.
Even a simple find or an ls -l inside the .glusterfs folders takes ages.
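For what it is worth, the Gluster xattrs on a brick can be inspected (and crudely timed) like this; the brick path here is only illustrative:

time getfattr -d -m . -e hex /zclei22/01/brick
time find /zclei22/01/brick/.glusterfs -type f | wc -l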
Right now I can see that the arbiter host has almost 100% ARC cache misses during the rebuild, which is natural since it is always reading new data:
[root@clei26 ~]# arcstat.py 1
    time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz      c
15:57:31    29    29    100    29  100     0    0    29  100   685M    31G
15:57:32   530   476     89   476   89     0    0   457   89   685M    31G
15:57:33   480   467     97   467   97     0    0   463   97   685M    31G
15:57:34   452   443     98   443   98     0    0   435   97   685M    31G
15:57:35   582   547     93   547   93     0    0   536   94   685M    31G
15:57:36   439   417     94   417   94     0    0   393   94   685M    31G
15:57:38   435   392     90   392   90     0    0   374   89   685M    31G
15:57:39   364   352     96   352   96     0    0   352   96   685M    31G
15:57:40   408   375     91   375   91     0    0   360   91   685M    31G
15:57:41   552   539     97   539   97     0    0   539   97   685M    31G
It looks like we cannot have both performance and reliability in the same system :(
The simple final conclusion is that with a single disk + SSD, even ZFS does not help to speed up the GlusterFS healing.
I will stop here :)
On Fri, Mar 3, 2017 at 3:35 PM, Juan Pablo < pablo.localhost(a)gmail.com >
wrote:
cd into the pool path, then run:
dd if=/dev/zero of=test.tt bs=1M
Leave it running for 5-10 minutes, then Ctrl+C and paste the result here.
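If you prefer a bounded run that stops on its own and flushes before reporting a rate, something like this also works (just a sketch; test.tt is a scratch file you can delete afterwards):

cd /zclei22/01
dd if=/dev/zero of=test.tt bs=1M count=10240 conv=fdatasync
rm test.tt

Keep in mind that with compression enabled on the dataset, a stream of zeros compresses away almost entirely, so the reported rate says little about real disk throughput.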
2017-03-03 11:30 GMT-03:00 Arman Khalatyan < arm2arm(a)gmail.com > :
No, I have one pool made of a single disk, with the SSD as cache and log device.
I have 3 GlusterFS bricks on 3 separate hosts: volume type Replicate (Arbiter), i.e. replica 2 + 1.
That is as much as you can push into the compute nodes (they have only 3 disk slots).
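For clarity, the per-host layout is roughly equivalent to this (a sketch; the device paths are illustrative, taken from the zpool status and lvs output quoted below):

zpool create zclei22 /dev/disk/by-id/ata-HGST_HUS724040ALA640_PN2334PBJ4SV6T1 \
    log /dev/vg_cache/lv_slog \
    cache /dev/vg_cache/lv_cache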
On Fri, Mar 3, 2017 at 3:19 PM, Juan Pablo < pablo.localhost(a)gmail.com >
wrote:
OK, you have 3 pools (zclei22, logs and cache); that's wrong. You should have 1 pool with slog + cache if you are looking for performance.
Also, don't mix drives.
What is the performance issue you are facing?
Regards,
2017-03-03 11:00 GMT-03:00 Arman Khalatyan < arm2arm(a)gmail.com > :
This is CentOS 7.3 with ZoL version 0.6.5.9-1.
[root@clei22 ~]# lsscsi
[2:0:0:0]  disk  ATA  INTEL SSDSC2CW24  400i  /dev/sda
[3:0:0:0]  disk  ATA  HGST HUS724040AL  AA70  /dev/sdb
[4:0:0:0]  disk  ATA  WDC WD2002FYPS-0  1G01  /dev/sdc

[root@clei22 ~]# pvs; vgs; lvs
  PV                                                  VG            Fmt  Attr PSize   PFree
  /dev/mapper/INTEL_SSDSC2CW240A3_CVCV306302RP240CGN  vg_cache      lvm2 a--  223.57g      0
  /dev/sdc2                                           centos_clei22 lvm2 a--    1.82t 64.00m

  VG            #PV #LV #SN Attr   VSize   VFree
  centos_clei22   1   3   0 wz--n-   1.82t 64.00m
  vg_cache        1   2   0 wz--n- 223.57g      0

  LV       VG            Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  home     centos_clei22 -wi-ao----   1.74t
  root     centos_clei22 -wi-ao----  50.00g
  swap     centos_clei22 -wi-ao----  31.44g
  lv_cache vg_cache      -wi-ao---- 213.57g
  lv_slog  vg_cache      -wi-ao----  10.00g
[root@clei22 ~]# zpool status -v
  pool: zclei22
 state: ONLINE
  scan: scrub repaired 0 in 0h0m with 0 errors on Tue Feb 28 14:16:07 2017
config:

        NAME                                    STATE   READ WRITE CKSUM
        zclei22                                 ONLINE     0     0     0
          HGST_HUS724040ALA640_PN2334PBJ4SV6T1  ONLINE     0     0     0
        logs
          lv_slog                               ONLINE     0     0     0
        cache
          lv_cache                              ONLINE     0     0     0

errors: No known data errors
ZFS config:
[root@clei22 ~]# zfs get all zclei22/01
NAME PROPERTY VALUE SOURCE
zclei22/01 type filesystem -
zclei22/01 creation Tue Feb 28 14:06 2017 -
zclei22/01 used 389G -
zclei22/01 available 3.13T -
zclei22/01 referenced 389G -
zclei22/01 compressratio 1.01x -
zclei22/01 mounted yes -
zclei22/01 quota none default
zclei22/01 reservation none default
zclei22/01 recordsize 128K local
zclei22/01 mountpoint /zclei22/01 default
zclei22/01 sharenfs off default
zclei22/01 checksum on default
zclei22/01 compression off local
zclei22/01 atime on default
zclei22/01 devices on default
zclei22/01 exec on default
zclei22/01 setuid on default
zclei22/01 readonly off default
zclei22/01 zoned off default
zclei22/01 snapdir hidden default
zclei22/01 aclinherit restricted default
zclei22/01 canmount on default
zclei22/01 xattr sa local
zclei22/01 copies 1 default
zclei22/01 version 5 -
zclei22/01 utf8only off -
zclei22/01 normalization none -
zclei22/01 casesensitivity sensitive -
zclei22/01 vscan off default
zclei22/01 nbmand off default
zclei22/01 sharesmb off default
zclei22/01 refquota none default
zclei22/01 refreservation none default
zclei22/01 primarycache metadata local
zclei22/01 secondarycache metadata local
zclei22/01 usedbysnapshots 0 -
zclei22/01 usedbydataset 389G -
zclei22/01 usedbychildren 0 -
zclei22/01 usedbyrefreservation 0 -
zclei22/01 logbias latency default
zclei22/01 dedup off default
zclei22/01 mlslabel none default
zclei22/01 sync disabled local
zclei22/01 refcompressratio 1.01x -
zclei22/01 written 389G -
zclei22/01 logicalused 396G -
zclei22/01 logicalreferenced 396G -
zclei22/01 filesystem_limit none default
zclei22/01 snapshot_limit none default
zclei22/01 filesystem_count none default
zclei22/01 snapshot_count none default
zclei22/01 snapdev hidden default
zclei22/01 acltype off default
zclei22/01 context none default
zclei22/01 fscontext none default
zclei22/01 defcontext none default
zclei22/01 rootcontext none default
zclei22/01 relatime off default
zclei22/01 redundant_metadata all default
zclei22/01 overlay off default
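For reference, the properties above with SOURCE=local would have been set with something along these lines (a sketch, not the exact command history):

zfs set recordsize=128K zclei22/01
zfs set compression=off zclei22/01
zfs set xattr=sa zclei22/01
zfs set primarycache=metadata zclei22/01
zfs set secondarycache=metadata zclei22/01
zfs set sync=disabled zclei22/01

Note that primarycache=metadata keeps file data out of the ARC entirely, which would fit the near-100% data miss rates in the arcstat output elsewhere in this thread.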
On Fri, Mar 3, 2017 at 2:52 PM, Juan Pablo < pablo.localhost(a)gmail.com >
wrote:
Which operating system version are you using for your ZFS storage?
Do:
zfs get all your-pool-name
Use arc_summary.py from the FreeNAS git repo if you wish.
2017-03-03 10:33 GMT-03:00 Arman Khalatyan < arm2arm(a)gmail.com > :
Pool load:
[root@clei21 ~]# zpool iostat -v 1
                                           capacity     operations    bandwidth
pool                                     alloc   free   read  write   read  write
---------------------------------------  -----  -----  -----  -----  -----  -----
zclei21                                  10.1G  3.62T      0    112    823  8.82M
  HGST_HUS724040ALA640_PN2334PBJ52XWT1   10.1G  3.62T      0     46    626  4.40M
logs                                         -      -      -      -      -      -
  lv_slog                                 225M  9.72G      0     66    198  4.45M
cache                                        -      -      -      -      -      -
  lv_cache                                9.81G   204G      0     46     56  4.13M
---------------------------------------  -----  -----  -----  -----  -----  -----

                                           capacity     operations    bandwidth
pool                                     alloc   free   read  write   read  write
---------------------------------------  -----  -----  -----  -----  -----  -----
zclei21                                  10.1G  3.62T      0    191      0  12.8M
  HGST_HUS724040ALA640_PN2334PBJ52XWT1   10.1G  3.62T      0      0      0      0
logs                                         -      -      -      -      -      -
  lv_slog                                 225M  9.72G      0    191      0  12.8M
cache                                        -      -      -      -      -      -
  lv_cache                                9.83G   204G      0    218      0  20.0M
---------------------------------------  -----  -----  -----  -----  -----  -----

                                           capacity     operations    bandwidth
pool                                     alloc   free   read  write   read  write
---------------------------------------  -----  -----  -----  -----  -----  -----
zclei21                                  10.1G  3.62T      0    191      0  12.7M
  HGST_HUS724040ALA640_PN2334PBJ52XWT1   10.1G  3.62T      0      0      0      0
logs                                         -      -      -      -      -      -
  lv_slog                                 225M  9.72G      0    191      0  12.7M
cache                                        -      -      -      -      -      -
  lv_cache                                9.83G   204G      0     72      0  7.68M
---------------------------------------  -----  -----  -----  -----  -----  -----
On Fri, Mar 3, 2017 at 2:32 PM, Arman Khalatyan < arm2arm(a)gmail.com > wrote:
GlusterFS is now in healing mode.
Receiver:
[root@clei21 ~]# arcstat.py 1
    time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz      c
13:24:49     0     0      0     0    0     0    0     0    0   4.6G    31G
13:24:50   154    80     51    80   51     0    0    80   51   4.6G    31G
13:24:51   179    62     34    62   34     0    0    62   42   4.6G    31G
13:24:52   148    68     45    68   45     0    0    68   45   4.6G    31G
13:24:53   140    64     45    64   45     0    0    64   45   4.6G    31G
13:24:54   124    48     38    48   38     0    0    48   38   4.6G    31G
13:24:55   157    80     50    80   50     0    0    80   50   4.7G    31G
13:24:56   202    68     33    68   33     0    0    68   41   4.7G    31G
13:24:57   127    54     42    54   42     0    0    54   42   4.7G    31G
13:24:58   126    50     39    50   39     0    0    50   39   4.7G    31G
13:24:59   116    40     34    40   34     0    0    40   34   4.7G    31G
Sender:
[root@clei22 ~]# arcstat.py 1
    time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz      c
13:28:37     8     2     25     2   25     0    0     2   25   468M    31G
13:28:38  1.2K   727     62   727   62     0    0   525   54   469M    31G
13:28:39   815   508     62   508   62     0    0   376   55   469M    31G
13:28:40   994   624     62   624   62     0    0   450   54   469M    31G
13:28:41   783   456     58   456   58     0    0   338   50   470M    31G
13:28:42   916   541     59   541   59     0    0   390   50   470M    31G
13:28:43   768   437     56   437   57     0    0   313   48   471M    31G
13:28:44   877   534     60   534   60     0    0   393   53   470M    31G
13:28:45   957   630     65   630   65     0    0   450   57   470M    31G
13:28:46   819   479     58   479   58     0    0   357   51   471M    31G
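To follow the heal itself from the Gluster side (the volume name is taken from the volume details quoted further down), something like this works:

gluster volume heal GluReplica info
gluster volume heal GluReplica statistics heal-count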
On Thu, Mar 2, 2017 at 7:18 PM, Juan Pablo < pablo.localhost(a)gmail.com >
wrote:
Hey,
what are you using for ZFS? Get an ARC status and post it, please.
2017-03-02 9:57 GMT-03:00 Arman Khalatyan < arm2arm(a)gmail.com > :
No, ZFS itself is not on top of LVM. Only the SSD was split by LVM into a slog (10G) and a cache (the rest).
But in any case the SSD does not help much with the GlusterFS/oVirt load; it sees almost 100% cache misses... :( (terrible performance compared with NFS)
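For reference, the SSD split looks roughly like this (a sketch; the device name is illustrative, the sizes come from the lvs output quoted above):

pvcreate /dev/sda
vgcreate vg_cache /dev/sda
lvcreate -L 10G -n lv_slog vg_cache
lvcreate -l 100%FREE -n lv_cache vg_cache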
On Thu, Mar 2, 2017 at 1:47 PM, FERNANDO FREDIANI < fernando.frediani(a)upx.com
> wrote:
Am I understanding correctly that you have Gluster on top of ZFS, which is on top of LVM? If so, why was the use of LVM necessary? I have ZFS without any need for LVM.
Fernando
On 02/03/2017 06:19, Arman Khalatyan wrote:
Hi,
I use 3 nodes with ZFS and GlusterFS.
Are there any suggestions to optimize it?
Host ZFS config (4TB HDD + 250GB SSD):
[root@clei22 ~]# zpool status
  pool: zclei22
 state: ONLINE
  scan: scrub repaired 0 in 0h0m with 0 errors on Tue Feb 28 14:16:07 2017
config:

        NAME                                    STATE   READ WRITE CKSUM
        zclei22                                 ONLINE     0     0     0
          HGST_HUS724040ALA640_PN2334PBJ4SV6T1  ONLINE     0     0     0
        logs
          lv_slog                               ONLINE     0     0     0
        cache
          lv_cache                              ONLINE     0     0     0

errors: No known data errors
Name:                    GluReplica
Volume ID:               ee686dfe-203a-4caa-a691-26353460cc48
Volume Type:             Replicate (Arbiter)
Replica Count:           2 + 1
Number of Bricks:        3
Transport Types:         TCP, RDMA
Maximum no of snapshots: 256
Capacity:                3.51 TiB total, 190.56 GiB used, 3.33 TiB free
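(The summary above looks like the oVirt volume details view; a rough CLI cross-check, using the same volume name, would be:

gluster volume info GluReplica
gluster volume status GluReplica detail
)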
_______________________________________________
Users mailing list
Users(a)ovirt.org
http://lists.ovirt.org/mailman/listinfo/users