<div dir="ltr">Sorry - its too late - all hosts have been re-imaged and are setup as local storage.</div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Sep 21, 2015 at 10:38 PM, Ravishankar N <span dir="ltr"><<a href="mailto:ravishankar@redhat.com" target="_blank">ravishankar@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF">
Hi Chris,<br>
<br>
Replies inline..<br>
<br>
<div>On 09/22/2015 09:31 AM, Sahina Bose
wrote:<br>
</div>
<blockquote type="cite">
<br>
<div><br>
<br>
-------- Forwarded Message --------
<table border="0" cellpadding="0" cellspacing="0">
<tbody>
<tr>
<th nowrap valign="BASELINE" align="RIGHT">Subject:
</th>
<td>Re: [ovirt-users] urgent issue</td>
</tr>
<tr>
<th nowrap valign="BASELINE" align="RIGHT">Date:
</th>
<td>Wed, 9 Sep 2015 08:31:07 -0700</td>
</tr>
<tr>
<th nowrap valign="BASELINE" align="RIGHT">From:
</th>
<td>Chris Liebman <a href="mailto:chris.l@taboola.com" target="_blank"><chris.l@taboola.com></a></td>
</tr>
<tr>
<th nowrap valign="BASELINE" align="RIGHT">To: </th>
<td>users <a href="mailto:users@ovirt.org" target="_blank"><users@ovirt.org></a></td>
</tr>
</tbody>
</table>
<br>
<br>
<div dir="ltr">Ok - I think I'm going to switch to local storage
- I've had way to many unexplainable issue with glusterfs
 :-(. Is there any reason I cant add local storage to the
existing shared-storage cluster? I see that the menu item is
greyed out....
<div><br>
</div>
<div><br>
</div>
</div>
</div>
</blockquote>
<br>
What version of gluster and ovirt are you using? <br>
<br>
<blockquote type="cite">
<div>
<div dir="ltr">
<div> </div>
<div>
<div><br>
</div>
<div><br>
</div>
</div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote"><span class="">On Tue, Sep 8, 2015 at 4:19 PM, Chris
Liebman <span dir="ltr"><<a href="mailto:chris.l@taboola.com" target="_blank">chris.l@taboola.com</a>></span>
wrote:<br>
</span><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">Its possible that this is specific to just
one gluster volume... I've moved a few VM disks off of
that volume and am able to start them fine. My
recolection is that any VM started on the "bad" volume
causes it to be disconnected and forces the ovirt node
to be marked down until Maint->Activate.</div>
<div>
<div>
<div class="gmail_extra"><br>
<div class="gmail_quote"><span class="">On Tue, Sep 8, 2015 at 3:52
PM, Chris Liebman <span dir="ltr"><<a href="mailto:chris.l@taboola.com" target="_blank"></a><a href="mailto:chris.l@taboola.com" target="_blank">chris.l@taboola.com</a>></span>
wrote:<br>
</span><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">In attempting to put an ovirt
cluster in production I'm running into some
off errors with gluster it looks like. Its
12 hosts each with one brick in
distributed-replicate. Â (actually 2 bricks
but they are separate volumes)
<div><br>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</blockquote>
<br>
These 12 nodes in dist-rep config, are they in replica 2 or replica
3? The latter is what is recommended for VM use-cases. Could you
give the output of `gluster volume info` ?<br>
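For reference, a replica 3 volume for VM storage would be set up along these lines (the volume and host names below are only placeholders, not taken from your environment):<br>
<br>
# 12 bricks with 3-way replication gives a 4 x 3 distributed-replicate volume<br>
gluster volume create VOLNAME replica 3 node1:/data1/brick node2:/data1/brick [...] node12:/data1/brick<br>
gluster volume set VOLNAME group virt    # apply the recommended virt-store option group<br>
gluster volume info VOLNAME              # should report "Number of Bricks: 4 x 3 = 12"<br>
<br>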
<blockquote type="cite">
<div>
<div class="gmail_extra">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div>
<div>
<div class="gmail_extra">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">
<div> </div>
<div><span class="">
<p><span>[root@ovirt-node268 glusterfs]# rpm
-qa | grep vdsm</span></p>
<p><span>vdsm-jsonrpc-4.16.20-0.el6.noarch</span></p>
<p><span>vdsm-gluster-4.16.20-0.el6.noarch</span></p>
<p><span>vdsm-xmlrpc-4.16.20-0.el6.noarch</span></p>
<p><span>vdsm-yajsonrpc-4.16.20-0.el6.noarch</span></p>
<p><span>vdsm-4.16.20-0.el6.x86_64</span></p>
<p><span>vdsm-python-zombiereaper-4.16.20-0.el6.noarch</span></p>
<p><span>vdsm-python-4.16.20-0.el6.noarch</span></p>
<p><span>vdsm-cli-4.16.20-0.el6.noarch</span></p>
<p><br>
</p>
</span><p>Everything was fine last week;
however, today various clients in the
gluster cluster periodically get "client
quorum not met" - when they get this
they take one of the bricks offline, which
causes VMs to be migrated -
sometimes 20 at a time. That takes a
long time :-(. I've tried disabling
automatic migration and the VMs get
paused when this happens - resuming does
nothing at that point, as the volume mount
on the server hosting the VM is not
connected:</p>
<div><br>
</div>
<div>
<p>from
rhev-data-center-mnt-glusterSD-ovirt-node268.la.taboolasyndication.com:_LADC-TBX-V02.log:</p>
<p><span>[2015-09-08 21:18:42.920771] W
[MSGID: 108001]
[afr-common.c:4043:afr_notify]
2-LADC-TBX-V02-replicate-2:
Client-quorum is </span><span>not met</span></p>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</blockquote>
<br>
When client-quorum is not met (due to network disconnects,
gluster brick processes going down, etc.), gluster makes the volume
read-only. This is expected behavior and prevents split-brains. It's
probably a bit late, but do you have the gluster fuse mount logs to
confirm this indeed was the issue?<span class=""><br>
<br>
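If you still have a similar volume around, a rough sketch of how to check the quorum settings (VOLNAME is a placeholder; `gluster volume get` only exists on newer gluster releases, older ones list reconfigured options in `volume info`):<br>
<br>
gluster volume info VOLNAME | grep -i quorum       # shows quorum options if they were reconfigured<br>
gluster volume get VOLNAME cluster.quorum-type     # 'auto' enforces client-quorum on replica volumes<br>
gluster volume get VOLNAME cluster.server-quorum-type<br>
<br>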
<blockquote type="cite">
<div>
<div class="gmail_extra">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div>
<div>
<div class="gmail_extra">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">
<div>
<div>
<p><span>[2015-09-08 21:18:42.931751] I
[fuse-bridge.c:4900:fuse_thread_proc]
0-fuse: unmounting
/rhev/data-center/mnt/glusterSD/ovirt-node268.la.taboolasyndication.com:_LADC-TBX-V02</span></p>
<p><span>[2015-09-08 21:18:42.931836] W
[glusterfsd.c:1219:cleanup_and_exit]
(-->/lib64/libpthread.so.0(+0x7a51)
[0x7f1bebc84a51]
-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xcd)
[0x405e4d]
-->/usr/sbin/glusterfs(cleanup_and_exit+0x</span></p>
<p><span>65) [0x4059b5] ) 0-: received
signum (15), shutting down</span></p>
<p><span>[2015-09-08 21:18:42.931858] I
[fuse-bridge.c:5595:fini] 0-fuse:
Unmounting
'/rhev/data-center/mnt/glusterSD/ovirt-node268.la.taboolasyndication.com:_LADC-TBX-V02'.</span></p>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</blockquote>
<br></span>
The VM pause you saw could be because of the unmount. I understand
that a fix (<a href="https://gerrit.ovirt.org/#/c/40240/" target="_blank">https://gerrit.ovirt.org/#/c/40240/</a>) went in for oVirt
3.6 (vdsm-4.17) to prevent vdsm from unmounting the gluster volume
when vdsm exits/restarts. <br>
Is it possible to run a test setup on 3.6 and see if this is still
happening?<span class=""><br>
<br>
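If you do get a chance to test on 3.6, something like the following would confirm the vdsm build and whether vdsm was restarting around the time of the unmount (standard log paths assumed; adjust if yours differ):<br>
<br>
rpm -q vdsm                                        # the unmount fix is in vdsm-4.17.x and later<br>
grep "cleanup_and_exit\|unmounting" /var/log/glusterfs/rhev-data-center-mnt-glusterSD-*.log<br>
grep "21:18" /var/log/vdsm/vdsm.log                # look for a vdsm stop/restart near the unmount timestamp<br>
<br>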
<blockquote type="cite">
<div>
<div class="gmail_extra">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div>
<div>
<div class="gmail_extra">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">
<div>
<div>
<p><span><br>
</span></p>
<p><span>And the mount is broken at that
point:</span></p>
</div>
<div>
<p><span>[root@ovirt-node267 ~]# df</span></p>
<p><span><font color="#ff0000"><b>df:
`/rhev/data-center/mnt/glusterSD/ovirt-node268.la.taboolasyndication.com:_LADC-TBX-V02':
Transport endpoint is not
connected</b></font></span></p>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</blockquote>
<br></span>
Yes, because it received a SIGTERM above.<br>
<br>
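As an aside, when a fuse mount is stuck in that "Transport endpoint is not connected" state, it can often be cleared without a full maintenance/activate cycle, roughly like this (SERVER and VOLNAME are placeholders for your actual mount):<br>
<br>
umount -l /rhev/data-center/mnt/glusterSD/SERVER:_VOLNAME     # lazily drop the dead fuse mount<br>
mount -t glusterfs SERVER:/VOLNAME /rhev/data-center/mnt/glusterSD/SERVER:_VOLNAME   # or let ovirt re-activate the domain<br>
<br>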
Thanks,<br>
Ravi<br>
<blockquote type="cite">
<div>
<div class="gmail_extra">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div>
<div>
<div class="gmail_extra">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">
<div>
<div>
<p><span>Filesystem                               1K-blocks      Used  Available Use% Mounted on</span></p>
<p><span>/dev/sda3                                 51475068   1968452   46885176   5% /</span></p>
<p><span>tmpfs                                    132210244         0  132210244   0% /dev/shm</span></p>
<p><span>/dev/sda2                                   487652     32409     429643   8% /boot</span></p>
<p><span>/dev/sda1                                   204580       260     204320   1% /boot/efi</span></p>
<p><span>/dev/sda5                               1849960960 156714056 1599267616   9% /data1</span></p>
<p><span>/dev/sdb1                               1902274676  18714468 1786923588   2% /data2</span></p>
<p><span>ovirt-node268.la.taboolasyndication.com:/LADC-TBX-V01</span></p>
<p><span>                                        9249804800 727008640 8052899712   9% /rhev/data-center/mnt/glusterSD/ovirt-node268.la.taboolasyndication.com:_LADC-TBX-V01</span></p>
<p><span>ovirt-node251.la.taboolasyndication.com:/LADC-TBX-V03</span></p>
<p><span>                                        1849960960     73728 1755907968   1% /rhev/data-center/mnt/glusterSD/ovirt-node251.la.taboolasyndication.com:_LADC-TBX-V03</span></p><span class="">
<p>The fix for that is to put the server
in maintenance mode and then activate it
again. But all VMs need to be migrated
or stopped for that to work.</p>
</span></div>
<div><br>
</div>
<div>I'm not seeing any obvious network or
disk errors...... </div>
</div><span class="">
<div><br>
</div>
<div>Are there configuration options I'm
missing?</div>
<div><br>
</div>
</span></div>
</blockquote>
</div>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
<br>
</div>
<br>
</blockquote>
<br>
</div>
</blockquote></div><br></div>