On Mon, May 8, 2017 at 10:46 PM, Ryan Housand <rhousand@empoweredbenefits.com> wrote:

We have three gluster shares (_data, _engine, _export), each created from a brick on each of three of our VM hosts. See the output of "gluster volume info" below:

Volume Name: data
Type: Replicate
Volume ID: c07fdf43-b838-4e4b-bb26-61dbf406cb57
Status: Started
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: vmhost01-chi.empoweredbenefits.com:/gluster/brick2/data
Brick2: vmhost02-chi.empoweredbenefits.com:/gluster/brick2/data
Brick3: vmhost03-chi.empoweredbenefits.com:/gluster/brick2/data (arbiter)
Options Reconfigured:
performance.readdir-ahead: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: off
cluster.quorum-type: auto
cluster.server-quorum-type: server
storage.owner-uid: 36
storage.owner-gid: 36
features.shard: on
features.shard-block-size: 512MB
performance.low-prio-threads: 32
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 6
network.ping-timeout: 30
user.cifs: off
nfs.disable: on
performance.strict-o-direct: on

Volume Name: engine
Type: Distributed-Replicate
Volume ID: 25455f13-75ba-4bc6-926a-d06ee7c5859a
Status: Started
Number of Bricks: 2 x (2 + 1) = 6
Transport-type: tcp
Bricks:
Brick1: vmhost01-chi.empoweredbenefits.com:/gluster/brick1/engine
Brick2: vmhost02-chi.empoweredbenefits.com:/gluster/brick1/engine
Brick3: vmhost03-chi.empoweredbenefits.com:/gluster/brick1/engine (arbiter)
Brick4: vmhost04-chi:/mnt/engine
Brick5: vmhost05-chi:/mnt/engine
Brick6: vmhost06-chi:/mnt/engine (arbiter)
Options Reconfigured:
performance.readdir-ahead: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: off
cluster.quorum-type: auto
cluster.server-quorum-type: server
storage.owner-uid: 36
storage.owner-gid: 36
features.shard: on
features.shard-block-size: 512MB
performance.low-prio-threads: 32
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 6
network.ping-timeout: 30
user.cifs: off
nfs.disable: on
performance.strict-o-direct: on

Volume Name: export
Type: Replicate
Volume ID: a4c3a49a-fa83-4a62-9523-989c8e016c35
Status: Started
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: vmhost01-chi.empoweredbenefits.com:/gluster/brick3/export
Brick2: vmhost02-chi.empoweredbenefits.com:/gluster/brick3/export
Brick3: vmhost03-chi.empoweredbenefits.com:/gluster/brick3/export (arbiter)
Options Reconfigured:
performance.readdir-ahead: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: off
cluster.quorum-type: auto
cluster.server-quorum-type: server
storage.owner-uid: 36
storage.owner-gid: 36
features.shard: on
features.shard-block-size: 512MB
performance.low-prio-threads: 32
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 6
network.ping-timeout: 30
user.cifs: off
nfs.disable: on
performance.strict-o-direct: on

Our issue is that we ran out of space on our gluster engine bricks, which caused our Hosted Engine VM to crash. We added additional bricks from new VM hosts (see vmhost04 to vmhost06 above), but we are still unable to restart our Hosted Engine because the space on the first three bricks is depleted. My understanding is that I need to extend the bricks that are 100% full on our engine partition. Is it best practice to stop the glusterd service, or can I use "gluster volume stop engine" to stop only the volume I need to extend? Also, if I need to stop glusterd, will the VMs hosted on my oVirt cluster be affected by the export and data mount points being offline?

Adding the three bricks to the engine volume does not redistribute the existing data. You need to run a rebalance on the engine gluster volume for that. Note that there is currently a bug that causes corruption when a rebalance is performed with ongoing I/O on the volume.
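
To illustrate why the new bricks are still empty: the "Number of Bricks: 2 x (2 + 1) = 6" line for the engine volume describes two distribute subvolumes, each made of two data bricks plus one arbiter. The existing files stay on the first subvolume until a rebalance moves them. Below is a minimal Python sketch that reads that line. The function name parse_brick_layout is hypothetical, and it only handles the arbiter form that appears in the output above.

```python
import re


def parse_brick_layout(line):
    """Parse a 'Number of Bricks' line from `gluster volume info` output.

    Only the arbiter form 'D x (N + A) = T' is handled (an assumption
    made for this sketch; other volume types print other forms).
    """
    match = re.search(r"(\d+)\s*x\s*\((\d+)\s*\+\s*(\d+)\)\s*=\s*(\d+)", line)
    if match is None:
        raise ValueError(f"unrecognised brick layout: {line!r}")
    subvolumes, data, arbiters, total = map(int, match.groups())
    # Sanity check: the arithmetic printed by gluster must add up.
    if subvolumes * (data + arbiters) != total:
        raise ValueError(f"inconsistent brick count: {line!r}")
    return {
        "subvolumes": subvolumes,
        "data_bricks_per_subvolume": data,
        "arbiters_per_subvolume": arbiters,
        "total_bricks": total,
    }


# The engine volume after the three new bricks were added.
print(parse_brick_layout("Number of Bricks: 2 x (2 + 1) = 6"))
```

For the data and export volumes ("1 x (2 + 1) = 3") the same function reports a single subvolume, so there is nothing to rebalance there.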

I think the best way for you to do this is to put hosted-engine into global maintenance, stop the hosted-engine VM, and then rebalance the engine gluster volume.
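
The order of those steps can be sketched as follows. This is a rough Python outline and not a tested procedure: the helper names rebalance_plan and run_plan are hypothetical, it assumes the hosted-engine and gluster CLIs are both available on the host where it runs, and it defaults to a dry run that only prints the commands.

```python
import subprocess


def rebalance_plan(volume="engine"):
    """Return the commands, in order, for the suggested approach.

    Assumption: run on a host that is both a hosted-engine host and a
    gluster node, so that both CLIs are available.
    """
    return [
        # 1. Stop the HA agents from restarting the engine VM.
        ["hosted-engine", "--set-maintenance", "--mode=global"],
        # 2. Shut down the engine VM so there is no I/O on the volume.
        ["hosted-engine", "--vm-shutdown"],
        # 3. Confirm the VM is really down before touching the volume.
        ["hosted-engine", "--vm-status"],
        # 4. Redistribute the existing data onto the new bricks.
        ["gluster", "volume", "rebalance", volume, "start"],
        # 5. Check progress; repeat until the rebalance has completed.
        ["gluster", "volume", "rebalance", volume, "status"],
    ]


def run_plan(plan, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in plan:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


# Dry run: prints the commands without executing anything.
run_plan(rebalance_plan("engine"), dry_run=True)
```

Once the rebalance status reports completion, the engine VM would presumably be started again with "hosted-engine --vm-start" and global maintenance left with "hosted-engine --set-maintenance --mode=none". Those two steps are not spelled out above, so treat them as an assumption.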

What was the original size of the engine gluster volume? (I'm curious to understand why you ran out of space.)

The VMs running on the data gluster volume should not be affected by this.

Thanks,

Ryan
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users