On Wed, Jul 5, 2017 at 3:10 AM, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Tue, Jul 4, 2017 at 2:57 PM, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:

No, it's not. One option is to update glusterfs packages to 3.10.

Is it fully supported in oVirt to use the CentOS Storage SIG packages instead of the oVirt-provided ones? I imagine that is what you mean, correct?

If that is the case, would I have to go with Gluster 3.9 (non-LTS)

or Gluster 3.10 (LTS)?

I suppose the latter...
Would there be any problem then with updates of oVirt itself, e.g. going from 4.1.2 to 4.1.3?

Thanks
Gianluca

Will Gluster 3.9 packages perhaps be provided when updating to the upcoming 4.1.3?

Never mind, I will verify. After all, this is a test system.
I put the nodes in maintenance one by one and then installed glusterfs 3.10 with:

yum install centos-release-gluster
yum update
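
To double-check the installed version on each node afterwards, something like this can be run:

rpm -q glusterfs-server
gluster --version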

All volumes were then able to self-heal, and I see the 4 storage domains (engine, data, iso, export) up and running.
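
To check the heal state of a single volume (here the engine volume as an example; the others work the same way), something like:

gluster volume heal engine info
gluster volume status engine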
See some notes at the end of the e-mail.
Now I'm ready to test changing the network used for gluster traffic.

In my case the current hostnames, which also match the ovirtmgmt network, are ovirt0N.localdomain.local with N=1,2,3.

On my vlan2, which has the gluster network role in the cluster, I have defined (in each node's /etc/hosts file) these hostnames:

10.10.2.102 gl01.localdomain.local gl01
10.10.2.103 gl02.localdomain.local gl02
10.10.2.104 gl03.localdomain.local gl03
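
To be sure these names resolve the same way from every node, a quick check like this can be run on each of the three hosts:

for h in gl01 gl02 gl03; do getent hosts ${h}.localdomain.local; done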

I need more details about the commands to run.

Currently I have

[root@ovirt03 ~]# gluster peer status
Number of Peers: 2

Hostname: ovirt01.localdomain.local
Uuid: e9717281-a356-42aa-a579-a4647a29a0bc
State: Peer in Cluster (Connected)
Other names:
10.10.2.102

Hostname: ovirt02.localdomain.local
Uuid: b89311fe-257f-4e44-8e15-9bff6245d689
State: Peer in Cluster (Connected)
Other names:
10.10.2.103

Suppose I start from the export volume, which has this info:

[root@ovirt03 ~]# gluster volume info export
 
Volume Name: export
Type: Replicate
Volume ID: b00e5839-becb-47e7-844f-6ce6ce1b7153
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: ovirt01.localdomain.local:/gluster/brick3/export
Brick2: ovirt02.localdomain.local:/gluster/brick3/export
Brick3: ovirt03.localdomain.local:/gluster/brick3/export (arbiter)
...

then the commands I need to run would be:

gluster volume reset-brick export ovirt01.localdomain.local:/gluster/brick3/export start
gluster volume reset-brick export ovirt01.localdomain.local:/gluster/brick3/export gl01.localdomain.local:/gluster/brick3/export commit force

Correct?

Yes, correct. gl01.localdomain.local should resolve correctly on all 3 nodes.


Is it sufficient to run it on a single node? And then, on the same node, also run it for the other bricks of the same volume:

Yes, it is sufficient to run it on a single node. You can run reset-brick for all bricks from the same node.
 

gluster volume reset-brick export ovirt02.localdomain.local:/gluster/brick3/export start
gluster volume reset-brick export ovirt02.localdomain.local:/gluster/brick3/export gl02.localdomain.local:/gluster/brick3/export commit force

and

gluster volume reset-brick export ovirt03.localdomain.local:/gluster/brick3/export start
gluster volume reset-brick export ovirt03.localdomain.local:/gluster/brick3/export gl03.localdomain.local:/gluster/brick3/export commit force

Correct? Do I have to wait for self-heal after each commit command before proceeding with the other ones?

Ideally, gluster should recognize this as the same brick as before, and heal will not be needed. Please confirm that this is indeed the case before proceeding.
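
For example, after each commit something like this should show no pending heal entries and the brick online under the new hostname:

gluster volume heal export info
gluster volume status export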
 

Thanks in advance for your input so that I can test it.

Gianluca


NOTE: during the update of the gluster packages from 3.8 to 3.10 I got these warnings:

warning: /var/lib/glusterd/vols/engine/engine.ovirt01.localdomain.local.gluster-brick1-engine.vol saved as /var/lib/glusterd/vols/engine/engine.ovirt01.localdomain.local.gluster-brick1-engine.vol.rpmsave
warning: /var/lib/glusterd/vols/engine/engine.ovirt02.localdomain.local.gluster-brick1-engine.vol saved as /var/lib/glusterd/vols/engine/engine.ovirt02.localdomain.local.gluster-brick1-engine.vol.rpmsave
warning: /var/lib/glusterd/vols/engine/engine.ovirt03.localdomain.local.gluster-brick1-engine.vol saved as /var/lib/glusterd/vols/engine/engine.ovirt03.localdomain.local.gluster-brick1-engine.vol.rpmsave
warning: /var/lib/glusterd/vols/engine/trusted-engine.tcp-fuse.vol saved as /var/lib/glusterd/vols/engine/trusted-engine.tcp-fuse.vol.rpmsave
warning: /var/lib/glusterd/vols/engine/engine.tcp-fuse.vol saved as /var/lib/glusterd/vols/engine/engine.tcp-fuse.vol.rpmsave
warning: /var/lib/glusterd/vols/data/data.ovirt01.localdomain.local.gluster-brick2-data.vol saved as /var/lib/glusterd/vols/data/data.ovirt01.localdomain.local.gluster-brick2-data.vol.rpmsave
warning: /var/lib/glusterd/vols/data/data.ovirt02.localdomain.local.gluster-brick2-data.vol saved as /var/lib/glusterd/vols/data/data.ovirt02.localdomain.local.gluster-brick2-data.vol.rpmsave
warning: /var/lib/glusterd/vols/data/data.ovirt03.localdomain.local.gluster-brick2-data.vol saved as /var/lib/glusterd/vols/data/data.ovirt03.localdomain.local.gluster-brick2-data.vol.rpmsave
warning: /var/lib/glusterd/vols/data/trusted-data.tcp-fuse.vol saved as /var/lib/glusterd/vols/data/trusted-data.tcp-fuse.vol.rpmsave
warning: /var/lib/glusterd/vols/data/data.tcp-fuse.vol saved as /var/lib/glusterd/vols/data/data.tcp-fuse.vol.rpmsave
warning: /var/lib/glusterd/vols/export/export.ovirt01.localdomain.local.gluster-brick3-export.vol saved as /var/lib/glusterd/vols/export/export.ovirt01.localdomain.local.gluster-brick3-export.vol.rpmsave
warning: /var/lib/glusterd/vols/export/export.ovirt02.localdomain.local.gluster-brick3-export.vol saved as /var/lib/glusterd/vols/export/export.ovirt02.localdomain.local.gluster-brick3-export.vol.rpmsave
warning: /var/lib/glusterd/vols/export/export.ovirt03.localdomain.local.gluster-brick3-export.vol saved as /var/lib/glusterd/vols/export/export.ovirt03.localdomain.local.gluster-brick3-export.vol.rpmsave
warning: /var/lib/glusterd/vols/export/trusted-export.tcp-fuse.vol saved as /var/lib/glusterd/vols/export/trusted-export.tcp-fuse.vol.rpmsave
warning: /var/lib/glusterd/vols/export/export.tcp-fuse.vol saved as /var/lib/glusterd/vols/export/export.tcp-fuse.vol.rpmsave
warning: /var/lib/glusterd/vols/iso/iso.ovirt01.localdomain.local.gluster-brick4-iso.vol saved as /var/lib/glusterd/vols/iso/iso.ovirt01.localdomain.local.gluster-brick4-iso.vol.rpmsave
warning: /var/lib/glusterd/vols/iso/iso.ovirt02.localdomain.local.gluster-brick4-iso.vol saved as /var/lib/glusterd/vols/iso/iso.ovirt02.localdomain.local.gluster-brick4-iso.vol.rpmsave
warning: /var/lib/glusterd/vols/iso/iso.ovirt03.localdomain.local.gluster-brick4-iso.vol saved as /var/lib/glusterd/vols/iso/iso.ovirt03.localdomain.local.gluster-brick4-iso.vol.rpmsave
warning: /var/lib/glusterd/vols/iso/trusted-iso.tcp-fuse.vol saved as /var/lib/glusterd/vols/iso/trusted-iso.tcp-fuse.vol.rpmsave
warning: /var/lib/glusterd/vols/iso/iso.tcp-fuse.vol saved as /var/lib/glusterd/vols/iso/iso.tcp-fuse.vol.rpmsave
  Installing : python2-gluster-3.10.3-1.el7.x86_64                                                             9/20
  Installing : python-prettytable-0.7.2-2.el7.centos.noarch                                                   10/20
  Updating   : glusterfs-geo-replication-3.10.3-1.el7.x86_64                                                  11/20
Warning: glusterd.service changed on disk. Run 'systemctl daemon-reload' to reload units.
  Cleanup    : glusterfs-geo-replication-3.8.13-1.el7.x86_64                                                  12/20
Warning: glusterd.service changed on disk. Run 'systemctl daemon-reload' to reload units.

For each volume the differences were these:

[root@ovirt02 engine]# diff engine.ovirt01.localdomain.local.gluster-brick1-engine.vol engine.ovirt01.localdomain.local.gluster-brick1-engine.vol.rpmsave
19,20c19,20
<     option sql-db-wal-autocheckpoint 25000
<     option sql-db-cachesize 12500
---
>     option sql-db-wal-autocheckpoint 1000
>     option sql-db-cachesize 1000
127c127
< volume engine-io-stats
---
> volume /gluster/brick1/engine
132d131
<     option unique-id /gluster/brick1/engine
136c135
< volume /gluster/brick1/engine
---
> volume engine-decompounder
138c137
<     subvolumes engine-io-stats
---
>     subvolumes /gluster/brick1/engine
149c148
<     subvolumes /gluster/brick1/engine
---
>     subvolumes engine-decompounder
[root@ovirt02 engine]#


[root@ovirt02 engine]# diff trusted-engine.tcp-fuse.vol trusted-engine.tcp-fuse.vol.rpmsave
39d38
<     option use-compound-fops off
70,72d68
<     option rda-cache-limit 10MB
<     option rda-request-size 131072
<     option parallel-readdir off
[root@ovirt02 engine]#



[root@ovirt02 engine]# diff engine.tcp-fuse.vol engine.tcp-fuse.vol.rpmsave
33d32
<     option use-compound-fops off
64,66d62
<     option rda-cache-limit 10MB
<     option rda-request-size 131072
<     option parallel-readdir off
[root@ovirt02 engine]#
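
These appear to be new options/defaults introduced by the newer gluster version. The currently effective values of those options can also be checked on the volume with something like:

gluster volume get engine all | grep -E 'compound-fops|parallel-readdir|rda-'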


The message about the glusterd service was misleading: I verified that the file /usr/lib/systemd/system/glusterd.service was actually the same as before.
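
In any case, reloading the unit files by hand is harmless, e.g.:

systemctl daemon-reload
systemctl status glusterd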