<html>
  <head>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    Hi Chris,<br>
    <br>
    Replies inline.<br>
    <br>
    <div class="moz-cite-prefix">On 09/22/2015 09:31 AM, Sahina Bose
      wrote:<br>
    </div>
    <blockquote cite="mid:5600D288.8090608@redhat.com" type="cite">
      <meta http-equiv="content-type" content="text/html; charset=utf-8">
      <br>
      <div class="moz-forward-container"><br>
        <br>
        -------- Forwarded Message --------
        <table class="moz-email-headers-table" border="0"
          cellpadding="0" cellspacing="0">
          <tbody>
            <tr>
              <th nowrap="nowrap" valign="BASELINE" align="RIGHT">Subject:

              </th>
              <td>Re: [ovirt-users] urgent issue</td>
            </tr>
            <tr>
              <th nowrap="nowrap" valign="BASELINE" align="RIGHT">Date:
              </th>
              <td>Wed, 9 Sep 2015 08:31:07 -0700</td>
            </tr>
            <tr>
              <th nowrap="nowrap" valign="BASELINE" align="RIGHT">From:
              </th>
              <td>Chris Liebman <a moz-do-not-send="true"
                  class="moz-txt-link-rfc2396E"
                  href="mailto:chris.l@taboola.com">&lt;chris.l@taboola.com&gt;</a></td>
            </tr>
            <tr>
              <th nowrap="nowrap" valign="BASELINE" align="RIGHT">To: </th>
              <td>users <a moz-do-not-send="true"
                  class="moz-txt-link-rfc2396E"
                  href="mailto:users@ovirt.org">&lt;users@ovirt.org&gt;</a></td>
            </tr>
          </tbody>
        </table>
        <br>
        <br>
        <div dir="ltr">Ok - I think I'm going to switch to local storage
          - I've had way too many unexplainable issues with glusterfs
          :-(.  Is there any reason I can't add local storage to the
          existing shared-storage cluster?  I see that the menu item is
          greyed out....
          <div><br>
          </div>
          <div><br>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
    What version of gluster and ovirt are you using? <br>
    <br>
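    In case it helps, something like the following on one of the nodes
    (plus `rpm -q ovirt-engine` on the engine host) should show both -
    just a quick sketch, adjust the package names to whatever your
    install actually uses:<br>
    <pre>gluster --version | head -1         # gluster version on the node
rpm -qa | grep -E 'glusterfs|vdsm'  # related packages installed on the node</pre>
    <br>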
    <blockquote cite="mid:5600D288.8090608@redhat.com" type="cite">
      <div class="moz-forward-container">
        <div dir="ltr">
          <div> </div>
          <div>
            <div><br>
            </div>
            <div><br>
            </div>
          </div>
        </div>
        <div class="gmail_extra"><br>
          <div class="gmail_quote">On Tue, Sep 8, 2015 at 4:19 PM, Chris
            Liebman <span dir="ltr">&lt;<a moz-do-not-send="true"
                href="mailto:chris.l@taboola.com" target="_blank">chris.l@taboola.com</a>&gt;</span>
            wrote:<br>
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div dir="ltr">It's possible that this is specific to just
                one gluster volume...  I've moved a few VM disks off of
                that volume and am able to start them fine.  My
                recollection is that any VM started on the "bad" volume
                causes it to be disconnected and forces the ovirt node
                to be marked down until Maint-&gt;Activate.</div>
              <div class="HOEnZb">
                <div class="h5">
                  <div class="gmail_extra"><br>
                    <div class="gmail_quote">On Tue, Sep 8, 2015 at 3:52
                      PM, Chris Liebman <span dir="ltr">&lt;<a
                          moz-do-not-send="true"
                          class="moz-txt-link-abbreviated"
                          href="mailto:chris.l@taboola.com"><a class="moz-txt-link-abbreviated" href="mailto:chris.l@taboola.com">chris.l@taboola.com</a></a>&gt;</span>
                      wrote:<br>
                      <blockquote class="gmail_quote" style="margin:0 0
                        0 .8ex;border-left:1px #ccc
                        solid;padding-left:1ex">
                        <div dir="ltr">In attempting to put an ovirt
                          cluster in production I'm running into some
                          odd errors with gluster, it looks like.  It's
                          12 hosts, each with one brick, in
                          distributed-replicate (actually 2 bricks,
                          but they are separate volumes).
                          <div><br>
                          </div>
                        </div>
                      </blockquote>
                    </div>
                  </div>
                </div>
              </div>
            </blockquote>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
    These 12 nodes in the dist-rep config - are they replica 2 or replica
    3? The latter is what is recommended for VM use-cases. Could you
    share the output of `gluster volume info`?<br>
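    For reference, something like this on any of the gluster nodes shows
    the layout at a glance (a sketch - LADC-TBX-V02 is the volume name
    taken from your logs, substitute the volume you are testing):<br>
    <pre>gluster volume info LADC-TBX-V02 | grep -E '^Type|^Number of Bricks'
# a brick count of "N x 2" means replica 2, "N x 3" means replica 3</pre>
    <br>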
    <blockquote cite="mid:5600D288.8090608@redhat.com" type="cite">
      <div class="moz-forward-container">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div class="HOEnZb">
                <div class="h5">
                  <div class="gmail_extra">
                    <div class="gmail_quote">
                      <blockquote class="gmail_quote" style="margin:0 0
                        0 .8ex;border-left:1px #ccc
                        solid;padding-left:1ex">
                        <div dir="ltr">
                          <div> </div>
                          <div>
                            <p><span>[root@ovirt-node268 glusterfs]# rpm
                                -qa | grep vdsm</span></p>
                            <p><span>vdsm-jsonrpc-4.16.20-0.el6.noarch</span></p>
                            <p><span>vdsm-gluster-4.16.20-0.el6.noarch</span></p>
                            <p><span>vdsm-xmlrpc-4.16.20-0.el6.noarch</span></p>
                            <p><span>vdsm-yajsonrpc-4.16.20-0.el6.noarch</span></p>
                            <p><span>vdsm-4.16.20-0.el6.x86_64</span></p>
                            <p><span>vdsm-python-zombiereaper-4.16.20-0.el6.noarch</span></p>
                            <p><span>vdsm-python-4.16.20-0.el6.noarch</span></p>
                            <p><span>vdsm-cli-4.16.20-0.el6.noarch</span></p>
                            <p><br>
                            </p>
                             <p>Everything was fine last week;
                               however, today various clients in the
                               gluster cluster seem to get "client
                               quorum not met" periodically - when they
                               get this they take one of the bricks
                               offline - this causes VM migrations to be
                               attempted - sometimes 20 at a time.  That
                               takes a long time :-(. I've tried disabling
                               automatic migration and the VMs get
                               paused when this happens - resuming gets
                               nothing at that point as the volume mount
                               on the server hosting the VM is not
                               connected:</p>
                            <div><br>
                            </div>
                            <div>
                              <p>from
rhev-data-center-mnt-glusterSD-ovirt-node268.la.taboolasyndication.com:_LADC-TBX-V02.log:</p>
                              <p><span>[2015-09-08 21:18:42.920771] W
                                  [MSGID: 108001]
                                  [afr-common.c:4043:afr_notify]
                                  2-LADC-TBX-V02-replicate-2:
                                  Client-quorum is </span><span>not met</span></p>
                            </div>
                          </div>
                        </div>
                      </blockquote>
                    </div>
                  </div>
                </div>
              </div>
            </blockquote>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
    When client-quorum is not met (due to network disconnects, gluster
    brick processes going down, etc.), gluster makes the volume
    read-only. This is expected behavior and prevents split-brains. It's
    probably a bit late, but do you have the gluster fuse mount logs to
    confirm this was indeed the issue?<br>
    <br>
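    If you still have them, a quick grep on the affected node should
    pull out the relevant lines (assuming the default client log
    location under /var/log/glusterfs/ - adjust the file name to match
    your mount):<br>
    <pre>grep -E 'quorum|disconnected' \
  "/var/log/glusterfs/rhev-data-center-mnt-glusterSD-ovirt-node268.la.taboolasyndication.com:_LADC-TBX-V02.log"</pre>
    <br>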
    <blockquote cite="mid:5600D288.8090608@redhat.com" type="cite">
      <div class="moz-forward-container">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div class="HOEnZb">
                <div class="h5">
                  <div class="gmail_extra">
                    <div class="gmail_quote">
                      <blockquote class="gmail_quote" style="margin:0 0
                        0 .8ex;border-left:1px #ccc
                        solid;padding-left:1ex">
                        <div dir="ltr">
                          <div>
                            <div>
                              <p><span>[2015-09-08 21:18:42.931751] I
                                  [fuse-bridge.c:4900:fuse_thread_proc]
                                  0-fuse: unmounting
/rhev/data-center/mnt/glusterSD/ovirt-node268.la.taboolasyndication.com:_LADC-TBX-V02</span></p>
                              <p><span>[2015-09-08 21:18:42.931836] W
                                  [glusterfsd.c:1219:cleanup_and_exit]
                                  (--&gt;/lib64/libpthread.so.0(+0x7a51)
                                  [0x7f1bebc84a51]
                                  --&gt;/usr/sbin/glusterfs(glusterfs_sigwaiter+0xcd)
                                  [0x405e4d]
                                  --&gt;/usr/sbin/glusterfs(cleanup_and_exit+0x65)
                                  [0x4059b5] ) 0-: received signum
                                  (15), shutting down</span></p>
                              <p><span>[2015-09-08 21:18:42.931858] I
                                  [fuse-bridge.c:5595:fini] 0-fuse:
                                  Unmounting
'/rhev/data-center/mnt/glusterSD/ovirt-node268.la.taboolasyndication.com:_LADC-TBX-V02'.</span></p>
                            </div>
                          </div>
                        </div>
                      </blockquote>
                    </div>
                  </div>
                </div>
              </div>
            </blockquote>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
    The VM pause you saw could be because of the unmount. I understand
    that a fix (<a class="moz-txt-link-freetext" href="https://gerrit.ovirt.org/#/c/40240/">https://gerrit.ovirt.org/#/c/40240/</a>) went in for oVirt
    3.6 (vdsm-4.17) to prevent vdsm from unmounting the gluster volume
    when vdsm exits/restarts. <br>
    Is it possible to run a test setup on 3.6 and see if this is still
    happening?<br>
    <br>
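    As a quick sanity check before and after upgrading, the vdsm version
    on the node tells you whether the fix is in (per the gerrit change
    above it landed in vdsm-4.17):<br>
    <pre>rpm -q vdsm   # 4.16.x, as in your output above, predates the fix; 4.17+ should include it</pre>
    <br>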
    <blockquote cite="mid:5600D288.8090608@redhat.com" type="cite">
      <div class="moz-forward-container">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div class="HOEnZb">
                <div class="h5">
                  <div class="gmail_extra">
                    <div class="gmail_quote">
                      <blockquote class="gmail_quote" style="margin:0 0
                        0 .8ex;border-left:1px #ccc
                        solid;padding-left:1ex">
                        <div dir="ltr">
                          <div>
                            <div>
                              <p><span><br>
                                </span></p>
                              <p><span>And the mount is broken at that
                                  point:</span></p>
                            </div>
                            <div>
                              <p><span>[root@ovirt-node267 ~]# df</span></p>
                              <p><span><font color="#ff0000"><b>df:
                                      `/rhev/data-center/mnt/glusterSD/ovirt-node268.la.taboolasyndication.com:_LADC-TBX-V02':

                                      Transport endpoint is not
                                      connected</b></font></span></p>
                            </div>
                          </div>
                        </div>
                      </blockquote>
                    </div>
                  </div>
                </div>
              </div>
            </blockquote>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
    Yes, because it received a SIGTERM, as seen above.<br>
    <br>
    Thanks,<br>
    Ravi<br>
    <blockquote cite="mid:5600D288.8090608@redhat.com" type="cite">
      <div class="moz-forward-container">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div class="HOEnZb">
                <div class="h5">
                  <div class="gmail_extra">
                    <div class="gmail_quote">
                      <blockquote class="gmail_quote" style="margin:0 0
                        0 .8ex;border-left:1px #ccc
                        solid;padding-left:1ex">
                        <div dir="ltr">
                          <div>
                            <div>
                              <pre>Filesystem                  1K-blocks       Used  Available Use% Mounted on
/dev/sda3                    51475068    1968452   46885176   5% /
tmpfs                       132210244          0  132210244   0% /dev/shm
/dev/sda2                      487652      32409     429643   8% /boot
/dev/sda1                      204580        260     204320   1% /boot/efi
/dev/sda5                  1849960960  156714056 1599267616   9% /data1
/dev/sdb1                  1902274676   18714468 1786923588   2% /data2
ovirt-node268.la.taboolasyndication.com:/LADC-TBX-V01
                           9249804800  727008640 8052899712   9% /rhev/data-center/mnt/glusterSD/ovirt-node268.la.taboolasyndication.com:_LADC-TBX-V01
ovirt-node251.la.taboolasyndication.com:/LADC-TBX-V03
                           1849960960      73728 1755907968   1% /rhev/data-center/mnt/glusterSD/ovirt-node251.la.taboolasyndication.com:_LADC-TBX-V03</pre>
                              <p>The fix for that is to put the server
                                in maintenance mode and then activate it
                                again. But all VMs need to be migrated
                                or stopped for that to work.</p>
                            </div>
                            <div><br>
                            </div>
                            <div>I'm not seeing any obvious network or
                              disk errors... </div>
                          </div>
                          <div><br>
                          </div>
                          <div>Are there configuration options I'm
                            missing?</div>
                          <div><br>
                          </div>
                        </div>
                      </blockquote>
                    </div>
                    <br>
                  </div>
                </div>
              </div>
            </blockquote>
          </div>
          <br>
        </div>
        <br>
      </div>
      <br>
    </blockquote>
    <br>
  </body>
</html>