<html>
  <head>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    Hi Chris,<br>
    <br>
    Replies inline.<br>
    <br>
    <div class="moz-cite-prefix">On 09/22/2015 09:31 AM, Sahina Bose
      wrote:<br>
    </div>
    <blockquote cite="mid:5600D288.8090608@redhat.com" type="cite">
      <meta http-equiv="content-type" content="text/html; charset=utf-8">
      <br>
      <div class="moz-forward-container"><br>
        <br>
        -------- Forwarded Message --------
        <table class="moz-email-headers-table" border="0"
          cellpadding="0" cellspacing="0">
          <tbody>
            <tr>
              <th nowrap="nowrap" valign="BASELINE" align="RIGHT">Subject:

              </th>
              <td>Re: [ovirt-users] urgent issue</td>
            </tr>
            <tr>
              <th nowrap="nowrap" valign="BASELINE" align="RIGHT">Date:
              </th>
              <td>Wed, 9 Sep 2015 08:31:07 -0700</td>
            </tr>
            <tr>
              <th nowrap="nowrap" valign="BASELINE" align="RIGHT">From:
              </th>
              <td>Chris Liebman <a moz-do-not-send="true"
                  class="moz-txt-link-rfc2396E"
                  href="mailto:chris.l@taboola.com">&lt;chris.l@taboola.com&gt;</a></td>
            </tr>
            <tr>
              <th nowrap="nowrap" valign="BASELINE" align="RIGHT">To: </th>
              <td>users <a moz-do-not-send="true"
                  class="moz-txt-link-rfc2396E"
                  href="mailto:users@ovirt.org">&lt;users@ovirt.org&gt;</a></td>
            </tr>
          </tbody>
        </table>
        <br>
        <br>
        <div dir="ltr">Ok - I think I'm going to switch to local storage
          - I've had way too many unexplainable issues with glusterfs
          :-(.  Is there any reason I can't add local storage to the
          existing shared-storage cluster?  I see that the menu item is
          greyed out....
          <div><br>
          </div>
          <div><br>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
    What version of gluster and ovirt are you using? <br>
    <br>
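    In case it helps, something like the following on one of the nodes
    (plus `rpm -q ovirt-engine` on the engine host) should show both -
    just a quick sketch, adjust the package names to whatever your
    install actually uses:<br>
    <pre>gluster --version | head -1         # gluster version on the node
rpm -qa | grep -E 'glusterfs|vdsm'  # related packages installed on the node</pre>
    <br>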
    <blockquote cite="mid:5600D288.8090608@redhat.com" type="cite">
      <div class="moz-forward-container">
        <div dir="ltr">
          <div> </div>
          <div>
            <div><br>
            </div>
            <div><br>
            </div>
          </div>
        </div>
        <div class="gmail_extra"><br>
          <div class="gmail_quote">On Tue, Sep 8, 2015 at 4:19 PM, Chris
            Liebman <span dir="ltr">&lt;<a moz-do-not-send="true"
                href="mailto:chris.l@taboola.com" target="_blank">chris.l@taboola.com</a>&gt;</span>
            wrote:<br>
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div dir="ltr">It's possible that this is specific to just
                one gluster volume...  I've moved a few VM disks off of
                that volume and am able to start them fine.  My
                recollection is that any VM started on the "bad" volume
                causes it to be disconnected and forces the ovirt node
                to be marked down until Maint-&gt;Activate.</div>
              <div class="HOEnZb">
                <div class="h5">
                  <div class="gmail_extra"><br>
                    <div class="gmail_quote">On Tue, Sep 8, 2015 at 3:52
                      PM, Chris Liebman <span dir="ltr">&lt;<a
                          moz-do-not-send="true"
                          class="moz-txt-link-abbreviated"
                          href="mailto:chris.l@taboola.com"><a class="moz-txt-link-abbreviated" href="mailto:chris.l@taboola.com">chris.l@taboola.com</a></a>&gt;</span>
                      wrote:<br>
                      <blockquote class="gmail_quote" style="margin:0 0
                        0 .8ex;border-left:1px #ccc
                        solid;padding-left:1ex">
                        <div dir="ltr">In attempting to put an ovirt
                          cluster in production I'm running into some
                          odd errors with gluster, it looks like.  It's
                          12 hosts, each with one brick, in
                          distributed-replicate (actually 2 bricks,
                          but they are separate volumes).
                          <div><br>
                          </div>
                        </div>
                      </blockquote>
                    </div>
                  </div>
                </div>
              </div>
            </blockquote>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
    These 12 nodes in the dist-rep config - are they replica 2 or replica
    3? The latter is what is recommended for VM use-cases. Could you
    share the output of `gluster volume info`?<br>
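    For reference, something like this on any of the gluster nodes shows
    the layout at a glance (a sketch - LADC-TBX-V02 is the volume name
    taken from your logs, substitute the volume you are testing):<br>
    <pre>gluster volume info LADC-TBX-V02 | grep -E '^Type|^Number of Bricks'
# a brick count of "N x 2" means replica 2, "N x 3" means replica 3</pre>
    <br>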
    <blockquote cite="mid:5600D288.8090608@redhat.com" type="cite">
      <div class="moz-forward-container">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div class="HOEnZb">
                <div class="h5">
                  <div class="gmail_extra">
                    <div class="gmail_quote">
                      <blockquote class="gmail_quote" style="margin:0 0
                        0 .8ex;border-left:1px #ccc
                        solid;padding-left:1ex">
                        <div dir="ltr">
                          <div> </div>
                          <div>
                            <p><span>[root@ovirt-node268 glusterfs]# rpm
                                -qa | grep vdsm</span></p>
                            <p><span>vdsm-jsonrpc-4.16.20-0.el6.noarch</span></p>
                            <p><span>vdsm-gluster-4.16.20-0.el6.noarch</span></p>
                            <p><span>vdsm-xmlrpc-4.16.20-0.el6.noarch</span></p>
                            <p><span>vdsm-yajsonrpc-4.16.20-0.el6.noarch</span></p>
                            <p><span>vdsm-4.16.20-0.el6.x86_64</span></p>
                            <p><span>vdsm-python-zombiereaper-4.16.20-0.el6.noarch</span></p>
                            <p><span>vdsm-python-4.16.20-0.el6.noarch</span></p>
                            <p><span>vdsm-cli-4.16.20-0.el6.noarch</span></p>
                            <p><br>
                            </p>
                             <p>Everything was fine last week;
                               however, today various clients in the
                               gluster cluster seem to get "client
                               quorum not met" periodically - when they
                               get this they take one of the bricks
                               offline - this causes VM migrations to be
                               attempted - sometimes 20 at a time.  That
                               takes a long time :-(. I've tried disabling
                               automatic migration and the VMs get
                               paused when this happens - resuming gets
                               nothing at that point as the volume mount
                               on the server hosting the VM is not
                               connected:</p>
                            <div><br>
                            </div>
                            <div>
                              <p>from
rhev-data-center-mnt-glusterSD-ovirt-node268.la.taboolasyndication.com:_LADC-TBX-V02.log:</p>
                              <p><span>[2015-09-08 21:18:42.920771] W
                                  [MSGID: 108001]
                                  [afr-common.c:4043:afr_notify]
                                  2-LADC-TBX-V02-replicate-2:
                                  Client-quorum is </span><span>not met</span></p>
                            </div>
                          </div>
                        </div>
                      </blockquote>
                    </div>
                  </div>
                </div>
              </div>
            </blockquote>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
    When client-quorum is not met (due to network disconnects, gluster
    brick processes going down, etc.), gluster makes the volume
    read-only. This is expected behavior and prevents split-brains. It's
    probably a bit late, but do you have the gluster fuse mount logs to
    confirm this was indeed the issue?<br>
    <br>
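    If you still have them, a quick grep on the affected node should
    pull out the relevant lines (assuming the default client log
    location under /var/log/glusterfs/ - adjust the file name to match
    your mount):<br>
    <pre>grep -E 'quorum|disconnected' \
  "/var/log/glusterfs/rhev-data-center-mnt-glusterSD-ovirt-node268.la.taboolasyndication.com:_LADC-TBX-V02.log"</pre>
    <br>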
    <blockquote cite="mid:5600D288.8090608@redhat.com" type="cite">
      <div class="moz-forward-container">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div class="HOEnZb">
                <div class="h5">
                  <div class="gmail_extra">
                    <div class="gmail_quote">
                      <blockquote class="gmail_quote" style="margin:0 0
                        0 .8ex;border-left:1px #ccc
                        solid;padding-left:1ex">
                        <div dir="ltr">
                          <div>
                            <div>
                              <p><span>[2015-09-08 21:18:42.931751] I
                                  [fuse-bridge.c:4900:fuse_thread_proc]
                                  0-fuse: unmounting
/rhev/data-center/mnt/glusterSD/ovirt-node268.la.taboolasyndication.com:_LADC-TBX-V02</span></p>
                              <p><span>[2015-09-08 21:18:42.931836] W
                                  [glusterfsd.c:1219:cleanup_and_exit]
                                  (--&gt;/lib64/libpthread.so.0(+0x7a51)
                                  [0x7f1bebc84a51]
                                  --&gt;/usr/sbin/glusterfs(glusterfs_sigwaiter+0xcd)
                                  [0x405e4d]
                                  --&gt;/usr/sbin/glusterfs(cleanup_and_exit+0x65)
                                  [0x4059b5] ) 0-: received signum
                                  (15), shutting down</span></p>
                              <p><span>[2015-09-08 21:18:42.931858] I
                                  [fuse-bridge.c:5595:fini] 0-fuse:
                                  Unmounting
'/rhev/data-center/mnt/glusterSD/ovirt-node268.la.taboolasyndication.com:_LADC-TBX-V02'.</span></p>
                            </div>
                          </div>
                        </div>
                      </blockquote>
                    </div>
                  </div>
                </div>
              </div>
            </blockquote>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
    The VM pause you saw could be because of the unmount. I understand
    that a fix (<a class="moz-txt-link-freetext" href="https://gerrit.ovirt.org/#/c/40240/">https://gerrit.ovirt.org/#/c/40240/</a>) went in for oVirt
    3.6 (vdsm-4.17) to prevent vdsm from unmounting the gluster volume
    when vdsm exits/restarts. <br>
    Is it possible to run a test setup on 3.6 and see if this is still
    happening?<br>
    <br>
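    As a quick sanity check before and after upgrading, the vdsm version
    on the node tells you whether the fix is in (per the gerrit change
    above it landed in vdsm-4.17):<br>
    <pre>rpm -q vdsm   # 4.16.x, as in your output above, predates the fix; 4.17+ should include it</pre>
    <br>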
    <blockquote cite="mid:5600D288.8090608@redhat.com" type="cite">
      <div class="moz-forward-container">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div class="HOEnZb">
                <div class="h5">
                  <div class="gmail_extra">
                    <div class="gmail_quote">
                      <blockquote class="gmail_quote" style="margin:0 0
                        0 .8ex;border-left:1px #ccc
                        solid;padding-left:1ex">
                        <div dir="ltr">
                          <div>
                            <div>
                              <p><span><br>
                                </span></p>
                              <p><span>And the mount is broken at that
                                  point:</span></p>
                            </div>
                            <div>
                              <p><span>[root@ovirt-node267 ~]# df</span></p>
                              <p><span><font color="#ff0000"><b>df:
                                      `/rhev/data-center/mnt/glusterSD/ovirt-node268.la.taboolasyndication.com:_LADC-TBX-V02':

                                      Transport endpoint is not
                                      connected</b></font></span></p>
                            </div>
                          </div>
                        </div>
                      </blockquote>
                    </div>
                  </div>
                </div>
              </div>
            </blockquote>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
    Yes, because it received a SIGTERM, as seen above.<br>
    <br>
    Thanks,<br>
    Ravi<br>
    <blockquote cite="mid:5600D288.8090608@redhat.com" type="cite">
      <div class="moz-forward-container">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div class="HOEnZb">
                <div class="h5">
                  <div class="gmail_extra">
                    <div class="gmail_quote">
                      <blockquote class="gmail_quote" style="margin:0 0
                        0 .8ex;border-left:1px #ccc
                        solid;padding-left:1ex">
                        <div dir="ltr">
                          <div>
                            <div>
                              <pre>Filesystem                  1K-blocks       Used  Available Use% Mounted on
/dev/sda3                    51475068    1968452   46885176   5% /
tmpfs                       132210244          0  132210244   0% /dev/shm
/dev/sda2                      487652      32409     429643   8% /boot
/dev/sda1                      204580        260     204320   1% /boot/efi
/dev/sda5                  1849960960  156714056 1599267616   9% /data1
/dev/sdb1                  1902274676   18714468 1786923588   2% /data2
ovirt-node268.la.taboolasyndication.com:/LADC-TBX-V01
                           9249804800  727008640 8052899712   9% /rhev/data-center/mnt/glusterSD/ovirt-node268.la.taboolasyndication.com:_LADC-TBX-V01
ovirt-node251.la.taboolasyndication.com:/LADC-TBX-V03
                           1849960960      73728 1755907968   1% /rhev/data-center/mnt/glusterSD/ovirt-node251.la.taboolasyndication.com:_LADC-TBX-V03</pre>
                              <p>The fix for that is to put the server
                                in maintenance mode and then activate it
                                again. But all VMs need to be migrated
                                or stopped for that to work.</p>
                            </div>
                            <div><br>
                            </div>
                            <div>I'm not seeing any obvious network or
                              disk errors... </div>
                          </div>
                          <div><br>
                          </div>
                          <div>Are there configuration options I'm
                            missing?</div>
                          <div><br>
                          </div>
                        </div>
                      </blockquote>
                    </div>
                    <br>
                  </div>
                </div>
              </div>
            </blockquote>
          </div>
          <br>
        </div>
        <br>
      </div>
      <br>
    </blockquote>
    <br>
  </body>
</html>