[ovirt-users] ovirt with glusterfs - big test - unwanted results
paf1 at email.cz
Thu Mar 31 12:09:05 UTC 2016
Hello,
we ran the following test - with unwanted results.
Input:
5-node Gluster cluster
A = replica 3 with arbiter 1 (node1 + node2, arbiter on node5)
B = replica 3 with arbiter 1 (node3 + node4, arbiter on node5)
C = distributed replica 3 with arbiter 1 (two replica sets: node1 + node2
and node3 + node4, each with its arbiter on node5)
node5 carries only arbiter bricks (4x) - rough create commands below
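For reference, the volumes were created roughly like this (a sketch only -
the brick paths are illustrative, not the exact ones we used):

  gluster volume create A replica 3 arbiter 1 \
      node1:/bricks/a node2:/bricks/a node5:/bricks/a-arb
  gluster volume create B replica 3 arbiter 1 \
      node3:/bricks/b node4:/bricks/b node5:/bricks/b-arb
  # distributed-replicated: two replica sets, the 3rd brick of each set
  # is the arbiter on node5
  gluster volume create C replica 3 arbiter 1 \
      node1:/bricks/c node2:/bricks/c node5:/bricks/c-arb1 \
      node3:/bricks/c node4:/bricks/c node5:/bricks/c-arb2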
TEST:
1) directly reboot one node - OK (it does not matter whether it is a
data node or the arbiter node)
2) directly reboot two nodes - OK (as long as the nodes are not from the
same replica set)
3) directly reboot three nodes - yes, this is the main problem and the
source of our questions ...
- we rebooted all three nodes of replica "B" (not very likely, but who
knows ...)
- all VMs with data on this replica were paused (no data access) - OK
- all VMs running on the replica "B" nodes were lost (we started them
manually later; their data is on other replicas) - acceptable
BUT
- !!! ALL oVirt storage domains went down !!! - even though the master
domain is on replica "A", which lost only one of its three members !!!
We did not expect all domains to go down, especially not the master,
which still had 2 live members.
Results:
- the whole cluster was unreachable until ALL domains were up again,
which depends on ALL nodes being up !!! (check commands below)
- all paused VMs resumed - OK
- all remaining VMs were rebooted and are running - OK
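For anyone reproducing this: while the nodes rejoin, volume and self-heal
state can be watched with e.g. (volume "B" as named above):

  gluster volume status B
  gluster volume heal B info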
Questions:
1) Why did all domains go down when the master domain (on replica "A")
had two running members (2 of 3)?
2) How can we recover from such a collapse without waiting for all nodes
to come up? (worst case: a node with a HW error, for example)
3) Which oVirt cluster policy can prevent this situation? (if any - see
our quorum guess below)
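Regarding 3): our guess (only a guess) is that quorum enforcement is
involved - with 3 of 5 peers down, server-side quorum could stop bricks
even on the surviving replica sets. The current settings can be inspected
with e.g.:

  gluster volume get A cluster.quorum-type         # client-side quorum
  gluster volume get A cluster.server-quorum-type  # server-side quorum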
Regards,
Pavel