<div dir="ltr">Sorry - it&#39;s too late - all hosts have been re-imaged and are set up as local storage.</div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Sep 21, 2015 at 10:38 PM, Ravishankar N <span dir="ltr">&lt;<a href="mailto:ravishankar@redhat.com" target="_blank">ravishankar@redhat.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
  
    
  
  <div text="#000000" bgcolor="#FFFFFF">
    Hi Chris,<br>
    <br>
    Replies inline..<br>
    <br>
    <div>On 09/22/2015 09:31 AM, Sahina Bose
      wrote:<br>
    </div>
    <blockquote type="cite">
      
      <br>
      <div><br>
        <br>
        -------- Forwarded Message --------
        <table border="0" cellpadding="0" cellspacing="0">
          <tbody>
            <tr>
              <th nowrap valign="BASELINE" align="RIGHT">Subject:

              </th>
              <td>Re: [ovirt-users] urgent issue</td>
            </tr>
            <tr>
              <th nowrap valign="BASELINE" align="RIGHT">Date:
              </th>
              <td>Wed, 9 Sep 2015 08:31:07 -0700</td>
            </tr>
            <tr>
              <th nowrap valign="BASELINE" align="RIGHT">From:
              </th>
              <td>Chris Liebman <a href="mailto:chris.l@taboola.com" target="_blank">&lt;chris.l@taboola.com&gt;</a></td>
            </tr>
            <tr>
              <th nowrap valign="BASELINE" align="RIGHT">To: </th>
              <td>users <a href="mailto:users@ovirt.org" target="_blank">&lt;users@ovirt.org&gt;</a></td>
            </tr>
          </tbody>
        </table>
        <br>
        <br>
        <div dir="ltr">Ok - I think I&#39;m going to switch to local storage
          - I&#39;ve had way to many unexplainable issue with glusterfs
          Â :-(.  Is there any reason I cant add local storage to the
          existing shared-storage cluster?  I see that the menu item is
          greyed out....
          <div><br>
          </div>
          <div><br>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
    What version of gluster and ovirt are you using? <br>
    <br>
    <blockquote type="cite">
      <div>
        <div dir="ltr">
          <div> </div>
          <div>
            <div><br>
            </div>
            <div><br>
            </div>
          </div>
        </div>
        <div class="gmail_extra"><br>
          <div class="gmail_quote"><span class="">On Tue, Sep 8, 2015 at 4:19 PM, Chris
            Liebman <span dir="ltr">&lt;<a href="mailto:chris.l@taboola.com" target="_blank">chris.l@taboola.com</a>&gt;</span>
            wrote:<br>
            </span><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">It&#39;s possible that this is specific to just
                one gluster volume...  I&#39;ve moved a few VM disks off of
                that volume and am able to start them fine.  My
                recollection is that any VM started on the &quot;bad&quot; volume
                causes it to be disconnected and forces the ovirt node
                to be marked down until Maint-&gt;Activate.</div>
              <div>
                <div>
                  <div class="gmail_extra"><br>
                    <div class="gmail_quote"><span class="">On Tue, Sep 8, 2015 at 3:52
                      PM, Chris Liebman <span dir="ltr">&lt;<a href="mailto:chris.l@taboola.com" target="_blank"></a><a href="mailto:chris.l@taboola.com" target="_blank">chris.l@taboola.com</a>&gt;</span>
                      wrote:<br>
                      </span><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">In attempting to put an ovirt
                          cluster into production I&#39;m running into some
                          odd errors, with gluster it looks like.  It&#39;s
                          12 hosts, each with one brick in
                          distributed-replicate (actually 2 bricks,
                          but they are in separate volumes).
                          <div><br>
                          </div>
                        </div>
                      </blockquote>
                    </div>
                  </div>
                </div>
              </div>
            </blockquote>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
    These 12 nodes in dist-rep config, are they in replica 2 or replica
    3? The latter is what is recommended for VM use-cases. Could you
    give the output of `gluster volume info` ?<br>
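    For reference, a sketch of how the replica count shows up in `gluster volume info` - the output below is hypothetical (volume name taken from the logs later in this thread; the brick layout is illustrative), and the replica count is the number after the "x":<br>

```shell
# Hypothetical `gluster volume info` output for a 12-brick volume; the
# "6 x 2 = 12" line would mean replica 2 (Ravi is asking whether this
# is 2 or 3). Piped through grep to pick out the relevant fields.
cat <<'EOF' | grep -E 'Type|Number of Bricks'
Volume Name: LADC-TBX-V02
Type: Distributed-Replicate
Number of Bricks: 6 x 2 = 12
EOF
```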
    <blockquote type="cite">
      <div>
        <div class="gmail_extra">
          <div class="gmail_quote">
            <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div>
                <div>
                  <div class="gmail_extra">
                    <div class="gmail_quote">
                      <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                        <div dir="ltr">
                          <div> </div>
                          <div><span class="">
                            <p><span>[root@ovirt-node268 glusterfs]# rpm
                                -qa | grep vdsm</span></p>
                            <p><span>vdsm-jsonrpc-4.16.20-0.el6.noarch</span></p>
                            <p><span>vdsm-gluster-4.16.20-0.el6.noarch</span></p>
                            <p><span>vdsm-xmlrpc-4.16.20-0.el6.noarch</span></p>
                            <p><span>vdsm-yajsonrpc-4.16.20-0.el6.noarch</span></p>
                            <p><span>vdsm-4.16.20-0.el6.x86_64</span></p>
                            <p><span>vdsm-python-zombiereaper-4.16.20-0.el6.noarch</span></p>
                            <p><span>vdsm-python-4.16.20-0.el6.noarch</span></p>
                            <p><span>vdsm-cli-4.16.20-0.el6.noarch</span></p>
                            <p><br>
                            </p>
</span><p>Everything was fine last week;
                              however, today various clients in the
                              gluster cluster seem to get &quot;client quorum
                              not met&quot; periodically - when they get this
                              they take one of the bricks offline - this
                              causes VMs to attempt to migrate -
                              sometimes 20 at a time.  That takes a
                              long time :-(. I&#39;ve tried disabling
                              automatic migration and the VMs get
                              paused when this happens - resuming gets
                              nothing at that point, as the volume mount
                              on the server hosting the VM is not
                              connected:</p>
                            <div><br>
                            </div>
                            <div>
                              <p>from
rhev-data-center-mnt-glusterSD-ovirt-node268.la.taboolasyndication.com:_LADC-TBX-V02.log:</p>
                              <p><span>[2015-09-08 21:18:42.920771] W
                                  [MSGID: 108001]
                                  [afr-common.c:4043:afr_notify]
                                  2-LADC-TBX-V02-replicate-2:
                                  Client-quorum is </span><span>not met</span></p>
                            </div>
                          </div>
                        </div>
                      </blockquote>
                    </div>
                  </div>
                </div>
              </div>
            </blockquote>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
    When client-quorum is not met (due to network disconnects, or
    gluster brick processes going down etc), gluster makes the volume
    read-only. This is expected behavior and prevents split-brains. It&#39;s
    probably a bit late, but do you have the gluster fuse mount logs to
    confirm this indeed was the issue?<span class=""><br>
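    A sketch of what that check could look like, assuming the usual mount-log location under /var/log/glusterfs/ on the hypervisor (run here against the sample line quoted above rather than a live log):<br>

```shell
# Sketch: count client-quorum loss events. Against a real host you would
# grep the rhev-data-center-mnt-glusterSD-*.log under /var/log/glusterfs/;
# here we use the log line quoted earlier in this thread as sample input.
line='[2015-09-08 21:18:42.920771] W [MSGID: 108001] [afr-common.c:4043:afr_notify] 2-LADC-TBX-V02-replicate-2: Client-quorum is not met'
printf '%s\n' "$line" | grep -c 'Client-quorum is not met'   # prints 1
```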
    <br>
    <blockquote type="cite">
      <div>
        <div class="gmail_extra">
          <div class="gmail_quote">
            <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div>
                <div>
                  <div class="gmail_extra">
                    <div class="gmail_quote">
                      <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                        <div dir="ltr">
                          <div>
                            <div>
                              <p><span>[2015-09-08 21:18:42.931751] I
                                  [fuse-bridge.c:4900:fuse_thread_proc]
                                  0-fuse: unmounting
/rhev/data-center/mnt/glusterSD/ovirt-node268.la.taboolasyndication.com:_LADC-TBX-V02</span></p>
                              <p><span>[2015-09-08 21:18:42.931836] W
                                  [glusterfsd.c:1219:cleanup_and_exit]
                                  (--&gt;/lib64/libpthread.so.0(+0x7a51)
                                  [0x7f1bebc84a51]
                                  --&gt;/usr/sbin/glusterfs(glusterfs_sigwaiter+0xcd)
                                  [0x405e4d]
                                  --&gt;/usr/sbin/glusterfs(cleanup_and_exit+0x</span></p>
                              <p><span>65) [0x4059b5] ) 0-: received
                                  signum (15), shutting down</span></p>
                              <p><span>[2015-09-08 21:18:42.931858] I
                                  [fuse-bridge.c:5595:fini] 0-fuse:
                                  Unmounting
&#39;/rhev/data-center/mnt/glusterSD/ovirt-node268.la.taboolasyndication.com:_LADC-TBX-V02&#39;.</span></p>
                            </div>
                          </div>
                        </div>
                      </blockquote>
                    </div>
                  </div>
                </div>
              </div>
            </blockquote>
          </div>
        </div>
      </div>
    </blockquote>
    <br></span>
    The VM pause you saw could be because of the unmount. I understand
    that a fix (<a href="https://gerrit.ovirt.org/#/c/40240/" target="_blank">https://gerrit.ovirt.org/#/c/40240/</a>) went in for oVirt
    3.6 (vdsm-4.17) to prevent vdsm from unmounting the gluster volume
    when vdsm exits/restarts. <br>
    Is it possible to run a test setup on 3.6 and see if this is still
    happening?<span class=""><br>
    <br>
    <blockquote type="cite">
      <div>
        <div class="gmail_extra">
          <div class="gmail_quote">
            <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div>
                <div>
                  <div class="gmail_extra">
                    <div class="gmail_quote">
                      <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                        <div dir="ltr">
                          <div>
                            <div>
                              <p><span><br>
                                </span></p>
                              <p><span>And the mount is broken at that
                                  point:</span></p>
                            </div>
                            <div>
                              <p><span>[root@ovirt-node267 ~]# df</span></p>
                              <p><span><font color="#ff0000"><b>df:
                                      `/rhev/data-center/mnt/glusterSD/ovirt-node268.la.taboolasyndication.com:_LADC-TBX-V02&#39;:
                                      Transport endpoint is not
                                      connected</b></font></span></p>
                            </div>
                          </div>
                        </div>
                      </blockquote>
                    </div>
                  </div>
                </div>
              </div>
            </blockquote>
          </div>
        </div>
      </div>
    </blockquote>
    <br></span>
    Yes, because it received a SIGTERM above.<br>
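    As an aside, a minimal sketch (our own helper, not part of vdsm or gluster) for spotting a mount in that state - stat on the mount point fails with an error once the transport is gone:<br>

```shell
# Sketch: flag a mount point that can no longer be stat-ed, e.g. a gluster
# FUSE mount reporting "Transport endpoint is not connected".
# check_mount is a hypothetical helper name, not an oVirt/gluster command.
check_mount() {
  if stat "$1" >/dev/null 2>&1; then
    echo "$1: ok"
  else
    echo "$1: broken or missing"
  fi
}
check_mount /tmp    # prints "/tmp: ok" on a healthy system
```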
    <br>
    Thanks,<br>
    Ravi<br>
    <blockquote type="cite">
      <div>
        <div class="gmail_extra">
          <div class="gmail_quote">
            <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div>
                <div>
                  <div class="gmail_extra">
                    <div class="gmail_quote">
                      <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                        <div dir="ltr">
                          <div>
                            <div>
<p><span>Filesystem           1K-blocks      Used  Available Use% Mounted on</span></p>
                              <p><span>/dev/sda3             51475068   1968452   46885176   5% /</span></p>
                              <p><span>tmpfs                132210244         0  132210244   0% /dev/shm</span></p>
                              <p><span>/dev/sda2               487652     32409     429643   8% /boot</span></p>
                              <p><span>/dev/sda1               204580       260     204320   1% /boot/efi</span></p>
                              <p><span>/dev/sda5           1849960960 156714056 1599267616   9% /data1</span></p>
                              <p><span>/dev/sdb1           1902274676  18714468 1786923588   2% /data2</span></p>
                              <p><span>ovirt-node268.la.taboolasyndication.com:/LADC-TBX-V01</span></p>
                              <p><span>                    9249804800 727008640 8052899712   9% /rhev/data-center/mnt/glusterSD/ovirt-node268.la.taboolasyndication.com:_LADC-TBX-V01</span></p>
                              <p><span>ovirt-node251.la.taboolasyndication.com:/LADC-TBX-V03</span></p>
                              <p><span>                    1849960960     73728 1755907968   1% /rhev/data-center/mnt/glusterSD/ovirt-node251.la.taboolasyndication.com:_LADC-TBX-V03</span></p><span class="">
                              <p>The fix for that is to put the server
                                into maintenance mode and then activate it
                                again. But all VMs need to be migrated
                                or stopped for that to work.</p>
                            </span></div>
                            <div><br>
                            </div>
                            <div>I&#39;m not seeing any obvious network or
                              disk errors... </div>
                          </div><span class="">
                          <div><br>
                          </div>
                          <div>Are there configuration options I&#39;m
                            missing?</div>
                          <div><br>
                          </div>
                        </span></div>
                      </blockquote>
                    </div>
                    <br>
                  </div>
                </div>
              </div>
            </blockquote>
          </div>
          <br>
        </div>
        <br>
      </div>
      <br>
    </blockquote>
    <br>
  </div>

</blockquote></div><br></div>