<html>

  <head>

    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">

  </head>

  <body text="#000066" bgcolor="#FFFFFF">

    Hello Yaniv,<br>

    <br>

    We tried another small test: we rebooted two nodes from the replica 3 arbiter 1 set (1HP12-R3A1P1) that holds the master domain.<br>

    All domains went down (the master domain was down), but the master role did not move to another available domain (e.g. 2HP12-R3A1P1).<br>

    <br>

    It looks like the "master domain" handling is not correct (a bug?).<br>

    <br>

    regs.<br>

    Pavel<br>

    <br>

    <br>

    <div class="moz-cite-prefix">On 31.3.2016 14:30, Yaniv Kaul wrote:<br>

    </div>

    <blockquote

cite="mid:CAJgorsaOUQ_42GUSPh-H1vGUgJ114JYcUHR8vHwvmcWR+w8Jmw@mail.gmail.com"

      type="cite">

      <div dir="ltr">Hi Pavel,

        <div><br>

        </div>

        <div>Thanks for the report. Can you begin with a more accurate

          description of your environment?</div>

        <div>Begin with host, oVirt and Gluster versions. Then continue

          with the exact setup (what are 'A', 'B', 'C' - domains?

          Volumes? What is the mapping between domains and volumes?).</div>

        <div><br>

        </div>

        <div>Are there any logs you can share with us?</div>

        <div><br>

        </div>

        <div>I'm sure with more information, we'd be happy to look at

          the issue.</div>

        <div>Y.</div>

        <div><br>

        </div>

      </div>

      <div class="gmail_extra"><br>

        <div class="gmail_quote">On Thu, Mar 31, 2016 at 3:09 PM, <a

            moz-do-not-send="true" href="mailto:paf1@email.cz">paf1@email.cz</a>

          <span dir="ltr">&lt;<a moz-do-not-send="true"

              href="mailto:paf1@email.cz" target="_blank">paf1@email.cz</a>&gt;</span>

          wrote:<br>

          <blockquote class="gmail_quote" style="margin:0 0 0

            .8ex;border-left:1px #ccc solid;padding-left:1ex">

            <div text="#000066" bgcolor="#FFFFFF"> Hello, <br>

              We tried the following test, with unwanted results:<br>

              <br>

              Input:<br>
              5-node Gluster cluster<br>
              A = replica 3 with arbiter 1 (node1 + node2, arbiter on node 5)<br>
              B = replica 3 with arbiter 1 (node3 + node4, arbiter on node 5)<br>
              C = distributed replica 3 with arbiter 1 (node1 + node2 and node3 + node4, each with its arbiter on node 5)<br>
              node 5 holds only arbiter bricks (4 of them)<br>

              <br>

              TEST:<br>

              1) Directly reboot one node: OK (regardless of whether it is a data node or the arbiter node)<br>

              2) Directly reboot two nodes: OK (if the nodes are not from the same replica set)<br>

              3) Directly reboot three nodes: this is the main problem and the source of our questions:<br>

                  - we rebooted all three nodes of replica "B" (unlikely in practice, but who knows...)<br>

                  - all VMs with data on this replica were paused (no data access): OK<br>

                  - all VMs running on the replica "B" nodes were lost (started manually later; their data is on other replicas): acceptable<br>

              BUT<br>

                  - !!! all oVirt domains went down !!! The master domain is on replica "A", which lost only one of its three members !!!<br>

                  So we did not expect all domains to go down, especially the master domain with 2 live members.<br>

                  <br>

              Results: <br>

                  - the whole cluster was unreachable until all domains were up again, which depended on all nodes being up !!!<br>

                  - all paused VMs resumed: OK<br>

                  - the rest of the VMs rebooted and are running: OK<br>

              <br>

              Questions:<br>

                  1) Why did all domains go down if the master domain (on replica "A") still had two running members (2 of 3)?<br>

                  2) How can we recover from that collapse without waiting for all nodes to come up (e.g. in the worst case, if a node has a hardware failure)?<br>

                  3) Which oVirt cluster policy, if any, can prevent that situation?<br>

              <br>

              regs.<br>

              Pavel<br>

              <br>

              <br>

            </div>

            <br>

            _______________________________________________<br>

            Users mailing list<br>

            <a moz-do-not-send="true" href="mailto:Users@ovirt.org">Users@ovirt.org</a><br>

            <a moz-do-not-send="true"

              href="http://lists.ovirt.org/mailman/listinfo/users"

              rel="noreferrer" target="_blank">http://lists.ovirt.org/mailman/listinfo/users</a><br>

            <br>

          </blockquote>

        </div>

        <br>

      </div>

    </blockquote>

    <br>

  </body>

</html>