
Hello,
some environment answers:
*****************************************************
OS = RHEL - 7 - 2.151
kernel = 3.10.0-327.10.1.el7.x86_64
KVM = 2.3.0-31.el7_2.7.1
libvirt = libvirt-1.2.17-13.el7_2.3
vdsm = vdsm-4.17.23.2-0.el7
glusterfs = glusterfs-3.7.9-1.el7
ovirt = 3.5.6.2-1
*****************************************************
# gluster peer status
Number of Peers: 4

Hostname: 1hp2
Uuid: 8e87cf18-8958-41b7-8d24-7ee420a1ef9f
State: Peer in Cluster (Connected)

Hostname: 2hp2
Uuid: b1d987d8-0b42-4ce4-b85f-83b4072e0990
State: Peer in Cluster (Connected)

Hostname: 2hp1
Uuid: a1cbe1a8-88ad-4e89-8a0e-d2bb2b6786d8
State: Peer in Cluster (Connected)

Hostname: kvmarbiter
Uuid: bb1d63f1-7757-4c07-b70d-aa2f68449e21
State: Peer in Cluster (Connected)
*****************************************************
== "C" ==
Volume Name: 12HP12-D2R3A1P2
Type: Distributed-Replicate
Volume ID: 3c22d3dc-7c6e-4e37-9e0b-78410873ed6d
Status: Started
Number of Bricks: 2 x (2 + 1) = 6
Transport-type: tcp
Bricks:
Brick1: 1hp1:/STORAGES/P2/GFS
Brick2: 1hp2:/STORAGES/P2/GFS
Brick3: kvmarbiter:/STORAGES/P2-1/GFS (arbiter)
Brick4: 2hp1:/STORAGES/P2/GFS
Brick5: 2hp2:/STORAGES/P2/GFS
Brick6: kvmarbiter:/STORAGES/P2-2/GFS (arbiter)
Options Reconfigured:
performance.readdir-ahead: on
*****************************************************
== "A" ==
Volume Name: 1HP12-R3A1P1
Type: Replicate
Volume ID: e4121610-6128-4ecc-86d3-1429ab3b8356
Status: Started
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: 1hp1:/STORAGES/P1/GFS
Brick2: 1hp2:/STORAGES/P1/GFS
Brick3: kvmarbiter:/STORAGES/P1-1/GFS (arbiter)
Options Reconfigured:
performance.readdir-ahead: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
storage.owner-uid: 36
storage.owner-gid: 36
features.shard: on
features.shard-block-size: 512MB
cluster.data-self-heal-algorithm: full
performance.write-behind: on
performance.low-prio-threads: 32
performance.write-behind-window-size: 128MB
network.ping-timeout: 10
*****************************************************
== "B" ==
Volume Name: 2HP12-R3A1P1
Type: Replicate
Volume ID: d3d260cd-455f-42d6-9580-d88ae6df0519
Status: Started
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: 2hp1:/STORAGES/P1/GFS
Brick2: 2hp2:/STORAGES/P1/GFS
Brick3: kvmarbiter:/STORAGES/P1-2/GFS (arbiter)
Options Reconfigured:
performance.readdir-ahead: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
storage.owner-uid: 36
storage.owner-gid: 36
features.shard: on
features.shard-block-size: 512MB
cluster.data-self-heal-algorithm: full
performance.write-behind: on
performance.low-prio-threads: 32
performance.write-behind-window-size: 128MB
network.ping-timeout: 10
*****************************************************

The oVirt storage domains have the same names as the gluster volumes (e.g. "B" = 2HP12-R3A1P1 (oVirt storage domain) = 2HP12-R3A1P1 (gluster volume name)).
In the test, the master domain was on "A" = 1HP12-R3A1P1.

Regards,
Pavel

PS: logs will follow as a webstore pointer ... this takes some time.
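For reference, an arbiter volume like "A" can be created and tuned with commands along these lines. This is only a sketch: the brick paths and option values are taken from the volume info above, but the exact commands actually used may have differed.

# gluster volume create 1HP12-R3A1P1 replica 3 arbiter 1 \
    1hp1:/STORAGES/P1/GFS 1hp2:/STORAGES/P1/GFS kvmarbiter:/STORAGES/P1-1/GFS
# gluster volume set 1HP12-R3A1P1 cluster.quorum-type auto
# gluster volume set 1HP12-R3A1P1 cluster.server-quorum-type server
# gluster volume set 1HP12-R3A1P1 network.ping-timeout 10
# gluster volume start 1HP12-R3A1P1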
On 31.3.2016 14:30, Yaniv Kaul wrote:

Hi Pavel,
Thanks for the report. Can you begin with a more accurate description of your environment? Begin with host, oVirt and Gluster versions. Then continue with the exact setup (what are 'A', 'B', 'C' - domains? Volumes? What is the mapping between domains and volumes?).
Are there any logs you can share with us?
I'm sure with more information, we'd be happy to look at the issue. Y.
On Thu, Mar 31, 2016 at 3:09 PM, paf1@email.cz <paf1@email.cz> wrote:
Hello,
we tried the following test, with unwanted results:
Input:
5-node gluster cluster
A = replica 3 with arbiter 1 (node1 + node2 + arbiter on node 5)
B = replica 3 with arbiter 1 (node3 + node4 + arbiter on node 5)
C = distributed replica 3 with arbiter 1 (node1 + node2, node3 + node4, each arbiter on node 5)
node 5 carries only arbiter bricks (4x)
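For illustration, the distributed-replicate layout of "C" (two replica sets, each with its arbiter brick on node 5) corresponds to a create command roughly like this. It is a sketch only; the brick paths are the ones shown in the volume info earlier in this thread, and the actual command may have differed.

# gluster volume create 12HP12-D2R3A1P2 replica 3 arbiter 1 \
    1hp1:/STORAGES/P2/GFS 1hp2:/STORAGES/P2/GFS kvmarbiter:/STORAGES/P2-1/GFS \
    2hp1:/STORAGES/P2/GFS 2hp2:/STORAGES/P2/GFS kvmarbiter:/STORAGES/P2-2/GFS

Bricks are grouped in threes, so every third brick (the kvmarbiter ones) becomes the arbiter of its replica set; together with the arbiters of "A" and "B", that gives node 5 its four arbiter bricks.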
TEST:
1) directly reboot one node - OK (it does not matter which one, data node or arbiter node)
2) directly reboot two nodes - OK (as long as the nodes are not from the same replica)
3) directly reboot three nodes - yes, this is the main problem and the source of the questions ...
   - rebooted all three nodes of replica "B" (not very likely, but who knows ...)
   - all VMs with data on this replica were paused (no data access) - OK
   - all VMs running on the replica "B" nodes were lost (started manually later; their data is on other replicas) - acceptable
   BUT
   - !!! all oVirt domains went down !! - the master domain is on replica "A", which lost only one member out of three !!!
   We did not expect all domains to go down, especially not the master domain with 2 live members.
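After the three "B" nodes go down, the state of the surviving replica "A" can be checked from any remaining node, for example with the commands below (illustrative only, not a transcript of the test):

# gluster volume status 1HP12-R3A1P1      (are all three bricks on 1hp1, 1hp2 and kvmarbiter still online?)
# gluster volume heal 1HP12-R3A1P1 info   (any entries pending self-heal?)
# gluster peer status                     (how many peers does glusterd still see as connected?)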
Results:
- the whole cluster was unreachable until all domains were up again - which depends on all nodes being up !!!
- all paused VMs started back - OK
- the rest of the VMs were rebooted and are running - OK
Questions:
1) why did all domains go down if the master domain (on replica "A") still had two running members (2 of 3)?
2) how can we recover from that collapse without waiting for all nodes to come back up? (in the worst case a node has a HW error, for example)
3) which oVirt cluster policy can prevent this situation? (if any)
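(For context on question 3: client- and server-side quorum behaviour is controlled by the options already set on these volumes - cluster.quorum-type: auto and cluster.server-quorum-type: server - plus the cluster-wide server quorum ratio. A sketch of how these can be inspected and adjusted follows; the 51% value is only an example, not a recommendation:

# gluster volume info 1HP12-R3A1P1 | grep quorum
# gluster volume set all cluster.server-quorum-ratio 51%

With cluster.server-quorum-type set to server, glusterd stops a volume's local bricks whenever the node can no longer see more than the configured ratio of peers in the pool, which is worth keeping in mind when analysing why domains went down.)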
Regards,
Pavel