Gluster Bricks Out of Space

We have three gluster shares (_data, _engine, _export), each backed by bricks on three of our VM hosts. See output from "gluster volume info" below:

Volume Name: data
Type: Replicate
Volume ID: c07fdf43-b838-4e4b-bb26-61dbf406cb57
Status: Started
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: vmhost01-chi.empoweredbenefits.com:/gluster/brick2/data
Brick2: vmhost02-chi.empoweredbenefits.com:/gluster/brick2/data
Brick3: vmhost03-chi.empoweredbenefits.com:/gluster/brick2/data (arbiter)
Options Reconfigured:
performance.readdir-ahead: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: off
cluster.quorum-type: auto
cluster.server-quorum-type: server
storage.owner-uid: 36
storage.owner-gid: 36
features.shard: on
features.shard-block-size: 512MB
performance.low-prio-threads: 32
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 6
network.ping-timeout: 30
user.cifs: off
nfs.disable: on
performance.strict-o-direct: on

Volume Name: engine
Type: Distributed-Replicate
Volume ID: 25455f13-75ba-4bc6-926a-d06ee7c5859a
Status: Started
Number of Bricks: 2 x (2 + 1) = 6
Transport-type: tcp
Bricks:
Brick1: vmhost01-chi.empoweredbenefits.com:/gluster/brick1/engine
Brick2: vmhost02-chi.empoweredbenefits.com:/gluster/brick1/engine
Brick3: vmhost03-chi.empoweredbenefits.com:/gluster/brick1/engine (arbiter)
Brick4: vmhost04-chi:/mnt/engine
Brick5: vmhost05-chi:/mnt/engine
Brick6: vmhost06-chi:/mnt/engine (arbiter)
Options Reconfigured:
performance.readdir-ahead: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: off
cluster.quorum-type: auto
cluster.server-quorum-type: server
storage.owner-uid: 36
storage.owner-gid: 36
features.shard: on
features.shard-block-size: 512MB
performance.low-prio-threads: 32
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 6
network.ping-timeout: 30
user.cifs: off
nfs.disable: on
performance.strict-o-direct: on

Volume Name: export
Type: Replicate
Volume ID: a4c3a49a-fa83-4a62-9523-989c8e016c35
Status: Started
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: vmhost01-chi.empoweredbenefits.com:/gluster/brick3/export
Brick2: vmhost02-chi.empoweredbenefits.com:/gluster/brick3/export
Brick3: vmhost03-chi.empoweredbenefits.com:/gluster/brick3/export (arbiter)
Options Reconfigured:
performance.readdir-ahead: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: off
cluster.quorum-type: auto
cluster.server-quorum-type: server
storage.owner-uid: 36
storage.owner-gid: 36
features.shard: on
features.shard-block-size: 512MB
performance.low-prio-threads: 32
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 6
network.ping-timeout: 30
user.cifs: off
nfs.disable: on
performance.strict-o-direct: on

Our issue is that we ran out of space on our gluster engine bricks, which caused our Hosted Engine VM to crash. We added additional bricks from new VM hosts (see vmhost04 to vmhost06 above), but we are still unable to restart our Hosted Engine because the first three bricks are full. My understanding is that I need to extend the bricks that are 100% full on our engine partition. Is it best practice to stop the glusterd service, or can I use "gluster volume stop engine" to stop only the volume I need to extend? Also, if I need to stop glusterd, will the VMs hosted on my oVirt cluster be affected by the export and data mount points being offline?

Thanks,
Ryan

On Mon, May 8, 2017 at 10:46 PM, Ryan Housand < rhousand@empoweredbenefits.com> wrote:
Our issue is that we ran out of space on our gluster engine bricks, which caused our Hosted Engine VM to crash. We added additional bricks from new VM hosts (see vmhost04 to vmhost06 above), but we are still unable to restart our Hosted Engine because the first three bricks are full. My understanding is that I need to extend the bricks that are 100% full on our engine partition. Is it best practice to stop the glusterd service, or can I use "gluster volume stop engine" to stop only the volume I need to extend? Also, if I need to stop glusterd, will the VMs hosted on my oVirt cluster be affected by the export and data mount points being offline?
Adding the 3 bricks to engine does not redistribute the existing data; you need to run a rebalance on the engine gluster volume for that. Note that there is currently a bug where rebalance can cause corruption when performed while there is ongoing I/O on the volume. I think the best way for you to do this is to put hosted-engine into global maintenance, stop the Hosted Engine VM, and then rebalance the engine gluster volume.

What was the original size of the engine gluster volume? (Curious to understand why you ran out of space.)

The VMs running on the data gluster volume should not be affected by this.
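For reference, the sequence described above could look roughly like the following. This is a sketch, not a tested procedure; it assumes the standard oVirt hosted-engine CLI and the gluster CLI, and you should verify each step against the documentation for your versions before running it:

```shell
# Put the Hosted Engine HA stack into global maintenance so the HA
# agents do not try to restart the engine VM during the rebalance.
hosted-engine --set-maintenance --mode=global

# Shut down the engine VM (it may already be down if the bricks are full).
hosted-engine --vm-shutdown

# Start redistributing existing data across all six engine bricks.
gluster volume rebalance engine start

# Check progress; repeat until the status shows "completed" on all nodes.
gluster volume rebalance engine status

# Once the rebalance has completed, leave maintenance and start the engine.
hosted-engine --set-maintenance --mode=none
hosted-engine --vm-start
```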
Thanks,
Ryan
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
participants (2)
- Ryan Housand
- Sahina Bose