Hello Everyone,
I ran some fio performance tests on a particular VM and noticed things
I do not understand about how data is placed across the bricks. I am sure
this is a lack of knowledge on my part, but I would really appreciate any
help in understanding it. I did a bit of research on the internet, but I
just couldn't find anything relevant.
I have one replica 3 distributed-replicated arbitrated volume across 18
bricks (9 nodes, 2 JBODs per node).
The volume was created with the bricks in this order:
node1-brick1, node2-brick1, ... node9-brick1, node1-brick2, node2-brick2,
... node9-brick2
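For context, the create command would have looked something like the sketch below (volume name and brick paths are placeholders, not my exact ones; only the brick ordering matters here). As far as I understand, with "replica 3 arbiter 1" every third brick in the list becomes the arbiter of its replica set:

```shell
# Placeholder volume name and brick paths; only the ordering is the point.
# Each consecutive triple forms one replica set; the third brick of each
# triple is the arbiter.
gluster volume create myvol replica 3 arbiter 1 \
  node1:/gluster/brick1 node2:/gluster/brick1 node3:/gluster/brick1 \
  node4:/gluster/brick1 node5:/gluster/brick1 node6:/gluster/brick1 \
  node7:/gluster/brick1 node8:/gluster/brick1 node9:/gluster/brick1 \
  node1:/gluster/brick2 node2:/gluster/brick2 node3:/gluster/brick2 \
  node4:/gluster/brick2 node5:/gluster/brick2 node6:/gluster/brick2 \
  node7:/gluster/brick2 node8:/gluster/brick2 node9:/gluster/brick2
```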
As far as I've understood, under the hood there are sets of three-way
replicated data (subvolumes): the first set is assigned to the first 3
bricks in the list, the next set to the next 3 bricks, and so on.
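To make sure I have the grouping right, I sketched it in a quick script (brick paths are placeholders); this is my understanding of how the 18 bricks fall into 6 replica-3 subvolumes, with the third brick of each triple acting as the arbiter:

```python
# Sketch of my understanding (not Gluster code): consecutive triples of
# bricks, in the order given at create time, form one subvolume each.
bricks = [f"node{n}:/brick1" for n in range(1, 10)] + \
         [f"node{n}:/brick2" for n in range(1, 10)]

# Group consecutive triples into replica sets.
subvols = [bricks[i:i + 3] for i in range(0, len(bricks), 3)]

for idx, (data1, data2, arbiter) in enumerate(subvols):
    # In an arbiter-1 volume the third brick of each set holds
    # metadata only.
    print(f"subvol-{idx}: data={data1}, data={data2}, arbiter={arbiter}")
```

If this is right, the first subvolume would live on node1/node2/node3 (brick1), the fourth on node1/node2/node3 (brick2), and so on.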
Now, I have a test VM running on node 1.
When I started the fio test, I noticed increased Gluster traffic from
node 1 to nodes 2, 4 and 5.
So I assumed that the VM disk's data resides on a subvolume allocated to
bricks on these hosts.
Then I migrated the VM to node 2 and ran the same test. Now the increased
traffic goes from node 2 to nodes 1, 4 and 5.
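If it helps, I believe the placement can also be confirmed directly, instead of inferring it from traffic, by reading the pathinfo xattr of the disk image on the FUSE mount (mount path and file name below are just examples from my head, not my actual paths):

```shell
# On a client with the volume FUSE-mounted; this xattr reports the
# exact bricks (and hosts) that hold the file's replicas.
getfattr -n trusted.glusterfs.pathinfo /mnt/myvol/images/vm-disk.img
```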
What I do not understand is:
- why is the Gluster client (the oVirt host) sending data to 3 bricks if
the volume is arbitrated - shouldn't it send to only 2 of them?
- why are there 4 bricks involved in this subvolume?
- what would the fault tolerance level be in this setup, i.e. how many
hosts can I take down and still have the volume serving I/O requests? Can
they be random ones?
I am sorry for my lack of knowledge; I am just trying to understand what
is happening so I can deploy a decent, properly set up HCI environment.
Thank you,
Leo
--
Best regards, Leo David