oVirt with GlusterFS - big test - unwanted results

Hello,
we tried the following test - with unwanted results.

Input:
5-node Gluster setup
A = replica 3 with arbiter 1 (node1 + node2 + arbiter on node 5)
B = replica 3 with arbiter 1 (node3 + node4 + arbiter on node 5)
C = distributed replica 3 with arbiter 1 (node1 + node2, node3 + node4, each arbiter on node 5)
Node 5 holds only arbiter bricks (4x).

TEST:
1) Hard reboot of one node - OK (it does not matter which one, data node or arbiter node).
2) Hard reboot of two nodes - OK (as long as the nodes are not from the same replica).
3) Hard reboot of three nodes - this is the main problem and the source of our questions:
- we rebooted all three nodes of replica "B" (not very likely in practice, but who knows ...)
- all VMs with data on this replica were paused (no data access) - OK
- all VMs running on the replica "B" nodes were lost (started manually later; their data is on other replicas) - acceptable
BUT
- ALL oVirt storage domains went down, even though the master domain is on replica "A", which lost only one member out of three. We did not expect every domain to go down, especially not the master, which still had 2 live members.

Results:
- the whole cluster was unreachable until all domains came back up, which depended on all nodes being up
- all paused VMs started again - OK
- the rest of the VMs were rebooted and are running - OK

Questions:
1) Why did all domains go down when the master domain (on replica "A") still had two running members (2 of 3)?
2) How can we recover from such a collapse without waiting for all nodes to come up (in the worst case a node has a hardware failure, for example)?
3) Which oVirt cluster policy can prevent this situation (if any)?

regs.
Pavel
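For readers unfamiliar with the layout described above, here is a minimal sketch of how volumes like A, B and C are typically created with the replica 3 arbiter 1 syntax (GlusterFS 3.7 or later). The node names follow the description above; the brick paths are placeholders, not the ones used in this test:

    # Volume A: data bricks on node1 and node2, arbiter brick on node5
    gluster volume create volA replica 3 arbiter 1 \
        node1:/bricks/a node2:/bricks/a node5:/bricks/arb-a
    # Volume B: data bricks on node3 and node4, arbiter brick on node5
    gluster volume create volB replica 3 arbiter 1 \
        node3:/bricks/b node4:/bricks/b node5:/bricks/arb-b
    # Volume C: distributed-replicate, two replica sets, each with its arbiter on node5
    gluster volume create volC replica 3 arbiter 1 \
        node1:/bricks/c node2:/bricks/c node5:/bricks/arb-c1 \
        node3:/bricks/c node4:/bricks/c node5:/bricks/arb-c2

With this layout, every third brick in the list becomes the arbiter of its replica set, which is why node 5 ends up carrying only arbiter bricks.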

Hi Pavel,

Thanks for the report. Can you begin with a more accurate description of your environment? Begin with host, oVirt and Gluster versions. Then continue with the exact setup (what are 'A', 'B', 'C' - domains? Volumes? What is the mapping between domains and volumes?).

Are there any logs you can share with us?

I'm sure with more information, we'd be happy to look at the issue.
Y.

Hello,
some environment answers:
*****************************************************
OS = RHEL 7 - 2.151
kernel = 3.10.0-327.10.1.el7.x86_64
KVM = 2.3.0-31.el7_2.7.1
libvirt = libvirt-1.2.17-13.el7_2.3
vdsm = vdsm-4.17.23.2-0.el7
glusterfs = glusterfs-3.7.9-1.el7
ovirt = 3.5.6.2-1
*****************************************************
# gluster peer status
Number of Peers: 4

Hostname: 1hp2
Uuid: 8e87cf18-8958-41b7-8d24-7ee420a1ef9f
State: Peer in Cluster (Connected)

Hostname: 2hp2
Uuid: b1d987d8-0b42-4ce4-b85f-83b4072e0990
State: Peer in Cluster (Connected)

Hostname: 2hp1
Uuid: a1cbe1a8-88ad-4e89-8a0e-d2bb2b6786d8
State: Peer in Cluster (Connected)

Hostname: kvmarbiter
Uuid: bb1d63f1-7757-4c07-b70d-aa2f68449e21
State: Peer in Cluster (Connected)
*****************************************************
== "C" ==
Volume Name: 12HP12-D2R3A1P2
Type: Distributed-Replicate
Volume ID: 3c22d3dc-7c6e-4e37-9e0b-78410873ed6d
Status: Started
Number of Bricks: 2 x (2 + 1) = 6
Transport-type: tcp
Bricks:
Brick1: 1hp1:/STORAGES/P2/GFS
Brick2: 1hp2:/STORAGES/P2/GFS
Brick3: kvmarbiter:/STORAGES/P2-1/GFS (arbiter)
Brick4: 2hp1:/STORAGES/P2/GFS
Brick5: 2hp2:/STORAGES/P2/GFS
Brick6: kvmarbiter:/STORAGES/P2-2/GFS (arbiter)
Options Reconfigured:
performance.readdir-ahead: on
*****************************************************
== "A" ==
Volume Name: 1HP12-R3A1P1
Type: Replicate
Volume ID: e4121610-6128-4ecc-86d3-1429ab3b8356
Status: Started
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: 1hp1:/STORAGES/P1/GFS
Brick2: 1hp2:/STORAGES/P1/GFS
Brick3: kvmarbiter:/STORAGES/P1-1/GFS (arbiter)
Options Reconfigured:
performance.readdir-ahead: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
storage.owner-uid: 36
storage.owner-gid: 36
features.shard: on
features.shard-block-size: 512MB
cluster.data-self-heal-algorithm: full
performance.write-behind: on
performance.low-prio-threads: 32
performance.write-behind-window-size: 128MB
network.ping-timeout: 10
*****************************************************
== "B" ==
Volume Name: 2HP12-R3A1P1
Type: Replicate
Volume ID: d3d260cd-455f-42d6-9580-d88ae6df0519
Status: Started
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: 2hp1:/STORAGES/P1/GFS
Brick2: 2hp2:/STORAGES/P1/GFS
Brick3: kvmarbiter:/STORAGES/P1-2/GFS (arbiter)
Options Reconfigured:
performance.readdir-ahead: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
storage.owner-uid: 36
storage.owner-gid: 36
features.shard: on
features.shard-block-size: 512MB
cluster.data-self-heal-algorithm: full
performance.write-behind: on
performance.low-prio-threads: 32
performance.write-behind-window-size: 128MB
network.ping-timeout: 10
*****************************************************

The oVirt storage domains have the same names as the Gluster volumes (e.g. "B" = 2HP12-R3A1P1 (oVirt storage domain) = 2HP12-R3A1P1 (Gluster volume name)).
In the test, the master volume was "A" = 1HP12-R3A1P1.

regs.
Pavel
PS: logs will follow as a web-storage link ... this takes some time
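For reference, the quorum- and VM-store-related options listed above are usually applied with gluster volume set, one key at a time. A minimal sketch for volume "B", using only option names already shown in the listing (the volume get call assumes GlusterFS 3.7+):

    # apply the quorum / VM-store settings to volume "B"
    gluster volume set 2HP12-R3A1P1 cluster.quorum-type auto
    gluster volume set 2HP12-R3A1P1 cluster.server-quorum-type server
    gluster volume set 2HP12-R3A1P1 features.shard on
    gluster volume set 2HP12-R3A1P1 network.ping-timeout 10
    # confirm what is currently in effect
    gluster volume get 2HP12-R3A1P1 all | grep -E 'quorum|ping-timeout|shard'

With cluster.quorum-type set to auto on a 2+1 arbiter set, the client stops allowing writes as soon as fewer than two bricks (counting the arbiter) are reachable, which is the behaviour relevant to the test described in this thread.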

Hi,
the rest of the logs:
www.uschovna.cz/en/zasilka/HYGXR57CNHM3TP39-L3W

The TEST is the last big event in the logs.
TEST TIME: about 14:00-14:30 CET

regs.
Pavel

Thank you Pavel for the interesting test report and for sharing the logs.

You are right - the master domain should not go down if 2 of 3 bricks are available from volume A (1HP12-R3A1P1).

I notice that host kvmarbiter was not responsive at 2016-03-31 13:27:19, but the ConnectStorageServerVDSCommand executed on the kvmarbiter node returned success at 2016-03-31 13:27:26.

Could you also share the vdsm logs from the 1hp1, 1hp2 and kvmarbiter nodes during this time?

Ravi, Krutika - could you take a look at the gluster logs?
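If it helps with gathering those, the logs usually live in the following default locations on each hypervisor (file names derived from the mount point may differ on your hosts; note also that gluster logs timestamps in UTC while vdsm logs use the host's local time, which matters when correlating the two):

    # vdsm log (local-time timestamps)
    less /var/log/vdsm/vdsm.log
    # FUSE mount log of the master volume (UTC timestamps); <server> is whichever
    # host name the storage domain path points at
    less /var/log/glusterfs/rhev-data-center-mnt-glusterSD-<server>:_1HP12-R3A1P1.log
    # brick logs and the glusterd log
    ls /var/log/glusterfs/bricks/
    less /var/log/glusterfs/etc-glusterfs-glusterd.vol.log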

Hello Sahina,
please find attached the logs which you requested.

regs.
Pavel

So I looked at the vdsm logs, and since multiple tests were done it was difficult to isolate which error to track down. You mentioned a test between 14:00-14:30 CET - but the gluster logs that were attached end at 11:29 UTC.

Tracking down the errors from when the master domain (gluster volume 1HP12-R3A1P1) went inactive, for the time period where the corresponding gluster volume log was available, they all seem to correspond to an issue where gluster volume quorum was not met.

Can you confirm whether this was the test performed - or provide logs from the correct time period (both vdsm and gluster mount logs are required, from the hypervisors where the master domain is mounted)?

For the master domain:

On 1hp1, vdsm.log:
Thread-35::ERROR::2016-03-31 13:21:27,225::monitor::276::Storage.Monitor::(_monitorDomain) Error monitoring domain 14995860-1127-4dc4-b8c8-b540b89f9313
Traceback (most recent call last):
...
  File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 454, in statvfs
    resdict = self._sendCommand("statvfs", {"path": path}, self.timeout)
  File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 427, in _sendCommand
    raise OSError(errcode, errstr)
OSError: [Errno 107] Transport endpoint is not connected
Thread-35::INFO::2016-03-31 13:21:27,267::monitor::299::Storage.Monitor::(_notifyStatusChanges) Domain 14995860-1127-4dc4-b8c8-b540b89f9313 became INVALID

-- and I see a corresponding entry in the gluster mount log:
[2016-03-31 11:21:16.027090] W [MSGID: 108001] [afr-common.c:4093:afr_notify] 0-1HP12-R3A1P1-replicate-0: Client-quorum is not met

jsonrpc.Executor/0::DEBUG::2016-03-31 13:23:34,110::__init__::533::jsonrpc.JsonRpcServer::(_serveRequest) Return 'GlusterVolume.status' in bridge with {'volumeStatus': {'bricks': [{'status': 'OFFLINE', 'hostuuid': 'f6568a3b-3d65-4f4f-be9f-14a5935e37a4', 'pid': '-1', 'rdma_port': 'N/A', 'brick': '1hp1:/STORAGES/P1/GFS', 'port': 'N/A'}, {'status': 'OFFLINE', 'hostuuid': '8e87cf18-8958-41b7-8d24-7ee420a1ef9f', 'pid': '-1', 'rdma_port': 'N/A', 'brick': '1hp2:/STORAGES/P1/GFS', 'port': 'N/A'}], 'nfs': [{'status': 'OFFLINE', 'hostuuid': 'f6568a3b-3d65-4f4f-be9f-14a5935e37a4', 'hostname': '172.16.5.151/24', 'pid': '-1', 'rdma_port': 'N/A', 'port': 'N/A'}, {'status': 'OFFLINE', 'hostuuid': '8e87cf18-8958-41b7-8d24-7ee420a1ef9f', 'hostname': '1hp2', 'pid': '-1', 'rdma_port': 'N/A', 'port': 'N/A'}], 'shd': [{'status': 'ONLINE', 'hostname': '172.16.5.151/24', 'pid': '2148', 'hostuuid': 'f6568a3b-3d65-4f4f-be9f-14a5935e37a4'}, {'status': 'ONLINE', 'hostname': '1hp2', 'pid': '2146', 'hostuuid': '8e87cf18-8958-41b7-8d24-7ee420a1ef9f'}], 'name': '1HP12-R3A1P1'}}

-- 2 bricks were offline. I think the arbiter brick is not reported in the xml output - this is a bug.

Similarly on 1hp2:
Thread-35::ERROR::2016-03-31 13:21:14,284::monitor::276::Storage.Monitor::(_monitorDomain) Error monitoring domain 14995860-1127-4dc4-b8c8-b540b89f9313
Traceback (most recent call last):
...
    raise OSError(errcode, errstr)
OSError: [Errno 2] No such file or directory
Thread-35::INFO::2016-03-31 13:21:14,285::monitor::299::Storage.Monitor::(_notifyStatusChanges) Domain 14995860-1127-4dc4-b8c8-b540b89f9313 became INVALID

Corresponding gluster mount log:
[2016-03-31 11:21:16.027640] W [MSGID: 108001] [afr-common.c:4093:afr_notify] 0-1HP12-R3A1P1-replicate-0: Client-quorum is not met
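A short sketch of checks that can confirm the client-quorum picture described above, run on a hypervisor where the master domain is mounted (the mount point follows oVirt's usual glusterSD naming; the exact path and log file names may differ on your hosts):

    # brick status as glusterd sees it (should match the two OFFLINE bricks in the log above)
    gluster volume status 1HP12-R3A1P1
    # pending self-heals once the bricks return
    gluster volume heal 1HP12-R3A1P1 info
    # vdsm's domain monitor effectively calls statvfs() on the domain mount point;
    # "Transport endpoint is not connected" (errno 107) here means the FUSE client
    # stopped serving the volume, e.g. after client-quorum was lost
    stat -f /rhev/data-center/mnt/glusterSD/1hp1:_1HP12-R3A1P1
    # the matching warning in the FUSE mount log
    grep "Client-quorum is not met" /var/log/glusterfs/rhev-data-center-mnt-glusterSD-1hp1:_1HP12-R3A1P1.log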

Hello Yaniv,

we tried another small test - rebooting two nodes from the replica 3 + arbiter 1 volume (1HP12-R3A1P1) which holds the master domain.
All domains went down (master included), but the master domain did not move to another available domain (e.g. 2HP12-R3A1P1).

It looks like the "master domain" management isn't working correctly (is this a bug??).

regs.
Pavel
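For anyone who wants to follow the master-domain role while reproducing this, a rough sketch of where to look (vdsClient ships with vdsm on the hosts; the pool UUID below is a placeholder, and the exact output fields can vary between vdsm versions):

    # on a host: which storage pool(s) this host is connected to
    vdsClient -s 0 getConnectedStoragePoolsList
    # pool info, including which domain vdsm currently treats as the master
    vdsClient -s 0 getStoragePoolInfo <pool-uuid>
    # on the engine: whether a master-domain reconstruct was attempted
    grep -i reconstructmaster /var/log/ovirt-engine/engine.log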
participants (3)
- paf1@email.cz
- Sahina Bose
- Yaniv Kaul