<div dir="ltr">Sorry - its too late - all hosts have been re-imaged and are setup as local storage.</div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Sep 21, 2015 at 10:38 PM, Ravishankar N <span dir="ltr"><<a href="mailto:ravishankar@redhat.com" target="_blank">ravishankar@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF">
Hi Chris,<br>
<br>
Replies inline..<br>
<br>
<div>On 09/22/2015 09:31 AM, Sahina Bose
wrote:<br>
</div>
<blockquote type="cite">
<br>
<div><br>
<br>
-------- Forwarded Message --------
<table border="0" cellpadding="0" cellspacing="0">
<tbody>
<tr>
<th nowrap valign="BASELINE" align="RIGHT">Subject:
</th>
<td>Re: [ovirt-users] urgent issue</td>
</tr>
<tr>
<th nowrap valign="BASELINE" align="RIGHT">Date:
</th>
<td>Wed, 9 Sep 2015 08:31:07 -0700</td>
</tr>
<tr>
<th nowrap valign="BASELINE" align="RIGHT">From:
</th>
<td>Chris Liebman <a href="mailto:chris.l@taboola.com" target="_blank"><chris.l@taboola.com></a></td>
</tr>
<tr>
<th nowrap valign="BASELINE" align="RIGHT">To: </th>
<td>users <a href="mailto:users@ovirt.org" target="_blank"><users@ovirt.org></a></td>
</tr>
</tbody>
</table>
<br>
<br>
<div dir="ltr">Ok - I think I'm going to switch to local storage
- I've had way to many unexplainable issue with glusterfs
 :-(. Is there any reason I cant add local storage to the
existing shared-storage cluster? I see that the menu item is
greyed out....
<div><br>
</div>
<div><br>
</div>
</div>
</div>
</blockquote>
<br>
What version of gluster and ovirt are you using? <br>
<br>
<blockquote type="cite">
<div>
<div dir="ltr">
<div> </div>
<div>
<div><br>
</div>
<div><br>
</div>
</div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote"><span class="">On Tue, Sep 8, 2015 at 4:19 PM, Chris
Liebman <span dir="ltr"><<a href="mailto:chris.l@taboola.com" target="_blank">chris.l@taboola.com</a>></span>
wrote:<br>
</span><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">Its possible that this is specific to just
one gluster volume... I've moved a few VM disks off of
that volume and am able to start them fine. My
recolection is that any VM started on the "bad" volume
causes it to be disconnected and forces the ovirt node
to be marked down until Maint->Activate.</div>
<div>
<div>
<div class="gmail_extra"><br>
<div class="gmail_quote"><span class="">On Tue, Sep 8, 2015 at 3:52
PM, Chris Liebman <span dir="ltr"><<a href="mailto:chris.l@taboola.com" target="_blank"></a><a href="mailto:chris.l@taboola.com" target="_blank">chris.l@taboola.com</a>></span>
wrote:<br>
</span><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">In attempting to put an ovirt
cluster in production I'm running into some
off errors with gluster it looks like. Its
12 hosts each with one brick in
distributed-replicate. Â (actually 2 bricks
but they are separate volumes)
<div><br>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</blockquote>
<br>
These 12 nodes in dist-rep config, are they in replica 2 or replica
3? The latter is what is recommended for VM use-cases. Could you
give the output of `gluster volume info` ?<br>
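For reference, a replica 3 volume for VM storage would be set up along these lines (the volume and host names below are only placeholders, not taken from your environment):<br>
<br>
# 12 bricks with 3-way replication gives a 4 x 3 distributed-replicate volume<br>
gluster volume create VOLNAME replica 3 node1:/data1/brick node2:/data1/brick [...] node12:/data1/brick<br>
gluster volume set VOLNAME group virt    # apply the recommended virt-store option group<br>
gluster volume info VOLNAME              # should report "Number of Bricks: 4 x 3 = 12"<br>
<br>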
<blockquote type="cite">
<div>
<div class="gmail_extra">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div>
<div>
<div class="gmail_extra">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">
<div> </div>
<div><span class="">
<p><span>[root@ovirt-node268 glusterfs]# rpm
-qa | grep vdsm</span></p>
<p><span>vdsm-jsonrpc-4.16.20-0.el6.noarch</span></p>
<p><span>vdsm-gluster-4.16.20-0.el6.noarch</span></p>
<p><span>vdsm-xmlrpc-4.16.20-0.el6.noarch</span></p>
<p><span>vdsm-yajsonrpc-4.16.20-0.el6.noarch</span></p>
<p><span>vdsm-4.16.20-0.el6.x86_64</span></p>
<p><span>vdsm-python-zombiereaper-4.16.20-0.el6.noarch</span></p>
<p><span>vdsm-python-4.16.20-0.el6.noarch</span></p>
<p><span>vdsm-cli-4.16.20-0.el6.noarch</span></p>
<p><br>
</p>
</span><p>Everything was fine last week;
however, today various clients in the
gluster cluster periodically get "client
quorum not met" - when they get this
they take one of the bricks offline, which
causes VMs to be migrated -
sometimes 20 at a time. That takes a
long time :-(. I've tried disabling
automatic migration and the VMs get
paused when this happens - resuming does
nothing at that point, as the volume mount
on the server hosting the VM is not
connected:</p>
<div><br>
</div>
<div>
<p>from
rhev-data-center-mnt-glusterSD-ovirt-node268.la.taboolasyndication.com:_LADC-TBX-V02.log:</p>
<p><span>[2015-09-08 21:18:42.920771] W
[MSGID: 108001]
[afr-common.c:4043:afr_notify]
2-LADC-TBX-V02-replicate-2:
Client-quorum is </span><span>not met</span></p>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</blockquote>
<br>
When client-quorum is not met (due to network disconnects,
gluster brick processes going down, etc.), gluster makes the volume
read-only. This is expected behavior and prevents split-brains. It's
probably a bit late, but do you have the gluster fuse mount logs to
confirm this indeed was the issue?<span class=""><br>
<br>
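If you still have a similar volume around, a rough sketch of how to check the quorum settings (VOLNAME is a placeholder; `gluster volume get` only exists on newer gluster releases, older ones list reconfigured options in `volume info`):<br>
<br>
gluster volume info VOLNAME | grep -i quorum       # shows quorum options if they were reconfigured<br>
gluster volume get VOLNAME cluster.quorum-type     # 'auto' enforces client-quorum on replica volumes<br>
gluster volume get VOLNAME cluster.server-quorum-type<br>
<br>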
<blockquote type="cite">
<div>
<div class="gmail_extra">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div>
<div>
<div class="gmail_extra">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">
<div>
<div>
<p><span>[2015-09-08 21:18:42.931751] I
[fuse-bridge.c:4900:fuse_thread_proc]
0-fuse: unmounting
/rhev/data-center/mnt/glusterSD/ovirt-node268.la.taboolasyndication.com:_LADC-TBX-V02</span></p>
<p><span>[2015-09-08 21:18:42.931836] W
[glusterfsd.c:1219:cleanup_and_exit]
(-->/lib64/libpthread.so.0(+0x7a51)
[0x7f1bebc84a51]
-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xcd)
[0x405e4d]
-->/usr/sbin/glusterfs(cleanup_and_exit+0x</span></p>
<p><span>65) [0x4059b5] ) 0-: received
signum (15), shutting down</span></p>
<p><span>[2015-09-08 21:18:42.931858] I
[fuse-bridge.c:5595:fini] 0-fuse:
Unmounting
'/rhev/data-center/mnt/glusterSD/ovirt-node268.la.taboolasyndication.com:_LADC-TBX-V02'.</span></p>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</blockquote>
<br></span>
The VM pause you saw could be because of the unmount. I understand
that a fix (<a href="https://gerrit.ovirt.org/#/c/40240/" target="_blank">https://gerrit.ovirt.org/#/c/40240/</a>) went in for oVirt
3.6 (vdsm-4.17) to prevent vdsm from unmounting the gluster volume
when vdsm exits/restarts. <br>
Is it possible to run a test setup on 3.6 and see if this is still
happening?<span class=""><br>
<br>
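If you do get a chance to test on 3.6, something like the following would confirm the vdsm build and whether vdsm was restarting around the time of the unmount (standard log paths assumed; adjust if yours differ):<br>
<br>
rpm -q vdsm                                        # the unmount fix is in vdsm-4.17.x and later<br>
grep "cleanup_and_exit\|unmounting" /var/log/glusterfs/rhev-data-center-mnt-glusterSD-*.log<br>
grep "21:18" /var/log/vdsm/vdsm.log                # look for a vdsm stop/restart near the unmount timestamp<br>
<br>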
<blockquote type="cite">
<div>
<div class="gmail_extra">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div>
<div>
<div class="gmail_extra">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">
<div>
<div>
<p><span><br>
</span></p>
<p><span>And the mount is broken at that
point:</span></p>
</div>
<div>
<p><span>[root@ovirt-node267 ~]# df</span></p>
<p><span><font color="#ff0000"><b>df:
`/rhev/data-center/mnt/glusterSD/ovirt-node268.la.taboolasyndication.com:_LADC-TBX-V02':
Transport endpoint is not
connected</b></font></span></p>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</blockquote>
<br></span>
Yes, because it received a SIGTERM above.<br>
<br>
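As an aside, when a fuse mount is stuck in that "Transport endpoint is not connected" state, it can often be cleared without a full maintenance/activate cycle, roughly like this (SERVER and VOLNAME are placeholders for your actual mount):<br>
<br>
umount -l /rhev/data-center/mnt/glusterSD/SERVER:_VOLNAME     # lazily drop the dead fuse mount<br>
mount -t glusterfs SERVER:/VOLNAME /rhev/data-center/mnt/glusterSD/SERVER:_VOLNAME   # or let ovirt re-activate the domain<br>
<br>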
Thanks,<br>
Ravi<br>
<blockquote type="cite">
<div>
<div class="gmail_extra">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div>
<div>
<div class="gmail_extra">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">
<div>
<div>
<p><span>Filesystem                               1K-blocks      Used  Available Use% Mounted on</span></p>
<p><span>/dev/sda3                                 51475068   1968452   46885176   5% /</span></p>
<p><span>tmpfs                                    132210244         0  132210244   0% /dev/shm</span></p>
<p><span>/dev/sda2                                   487652     32409     429643   8% /boot</span></p>
<p><span>/dev/sda1                                   204580       260     204320   1% /boot/efi</span></p>
<p><span>/dev/sda5                               1849960960 156714056 1599267616   9% /data1</span></p>
<p><span>/dev/sdb1                               1902274676  18714468 1786923588   2% /data2</span></p>
<p><span>ovirt-node268.la.taboolasyndication.com:/LADC-TBX-V01</span></p>
<p><span>                                        9249804800 727008640 8052899712   9% /rhev/data-center/mnt/glusterSD/ovirt-node268.la.taboolasyndication.com:_LADC-TBX-V01</span></p>
<p><span>ovirt-node251.la.taboolasyndication.com:/LADC-TBX-V03</span></p>
<p><span>                                        1849960960     73728 1755907968   1% /rhev/data-center/mnt/glusterSD/ovirt-node251.la.taboolasyndication.com:_LADC-TBX-V03</span></p><span class="">
<p>The fix for that is to put the server
in maintenance mode and then activate it
again. But all VMs need to be migrated
or stopped for that to work.</p>
</span></div>
<div><br>
</div>
<div>I'm not seeing any obvious network or
disk errors...... </div>
</div><span class="">
<div><br>
</div>
<div>Are there configuration options I'm
missing?</div>
<div><br>
</div>
</span></div>
</blockquote>
</div>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
<br>
</div>
<br>
</blockquote>
<br>
</div>
</blockquote></div><br></div>