On Thu, Sep 24, 2015 at 9:06 AM, Ramesh Nachimuthu <rnachimu@redhat.com> wrote:


On 09/24/2015 11:28 AM, Nir Soffer wrote:
On Thu, Sep 24, 2015 at 7:37 AM, Ramesh Nachimuthu <rnachimu@redhat.com> wrote:


On 09/24/2015 02:38 AM, Darrell Budic wrote:
This is a known issue in ovirt 3.5.x and below. It’s been solved in the upcoming ovirt 3.6.

Related to https://bugzilla.redhat.com/show_bug.cgi?id=1172905, the fix involved setting up a special cgroup for the mount, but I can’t find the exact details at the moment.


I already have vdsm 4.17.6-0.el7.centos installed on the hosts, so I am not sure the fix for bug 1172905 covers this case.

I think the root cause is the same - qemu cannot recover from glusterfs unmount, and the only way to resume the vm is to restart it with a fresh mount.

The mentioned bug handles the case where stopping vdsm kills the glusterfs mount helper. This issue is fixed in 3.6.

The issue here seems different. I suggest you open a bug so the gluster guys can investigate this.


Seems like I am hitting the issue reported in bz https://bugzilla.redhat.com/show_bug.cgi?id=1171261.

Indeed.

I would open an ovirt bug anyway and make it depend on the glusterfs bug.

We need a way to track this issue, and having no ovirt/rhev bug hides it.
 

Regards,
Ramesh


Nir


 

Regards,
Ramesh



On Sep 23, 2015, at 7:38 AM, Ramesh Nachimuthu <rnachimu@redhat.com> wrote:



On 09/22/2015 05:57 PM, Alastair Neil wrote:
You need to set cluster.server-quorum-ratio to 51%.
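
For readers who want to script this suggestion, here is a minimal Python sketch. It assumes the option is spelled cluster.server-quorum-ratio in the gluster CLI and that it is set pool-wide through the special volume name "all"; the function names and the dry_run flag are hypothetical and not part of any gluster tooling.

```python
import subprocess


def quorum_ratio_cmd(percent):
    """Build the argv for setting the server quorum ratio (assumed CLI form)."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range 1..100")
    # Assumption: the ratio is a pool-wide option, hence the volume name "all".
    return ["gluster", "volume", "set", "all",
            "cluster.server-quorum-ratio", f"{percent}%"]


def apply_quorum_ratio(percent, dry_run=True):
    """Return the command; run it only when dry_run is False."""
    cmd = quorum_ratio_cmd(percent)
    if not dry_run:
        # Needs the gluster CLI on a node of the trusted pool.
        subprocess.run(cmd, check=True)
    return cmd
```

Calling apply_quorum_ratio(51) only returns the command so it can be reviewed before it is run on a storage node.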


I did that, but I am still facing the same issue. The VM gets paused when I run some I/O using fio on disks backed by gluster, and I am not able to resume the VM after this. The only way now is to shut the VM down and start it again. It then runs successfully on the same host without any issue.

Regards,
Ramesh

On 22 September 2015 at 08:25, Ramesh Nachimuthu <rnachimu@redhat.com> wrote:


On 09/22/2015 05:43 PM, Alastair Neil wrote:
What are the gluster quorum-type and cluster.server-quorum-ratio settings on the volume?


cluster.server-quorum-type: server
cluster.quorum-type: auto
cluster.server-quorum-ratio is not set.

One brick process was purposely killed, but the remaining two bricks are up and running.
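
To illustrate why two surviving bricks out of three should still accept writes, here is a minimal Python sketch of the client-quorum rule for cluster.quorum-type auto, as I understand it from the GlusterFS documentation: writes are allowed when more than half of the bricks in the replica set are reachable, or exactly half when that half includes the first brick. The function name and its input are hypothetical; this is not gluster's own code.

```python
def client_quorum_met(bricks_up):
    """bricks_up: one boolean per brick in the replica set, in volume order."""
    total = len(bricks_up)
    if total == 0:
        return False
    up = sum(bricks_up)
    if up * 2 > total:
        # Strict majority of bricks reachable.
        return True
    # Even-sized set with exactly half up: the first brick breaks the tie.
    return up * 2 == total and bricks_up[0]
```

With replica 3 and one brick killed, client_quorum_met([True, True, False]) is True. If the client then loses contact with a second brick, client_quorum_met([True, False, False]) is False, which is the state in which the mount reports "Failing WRITE as quorum is not met".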

Regards,
Ramesh

On 22 September 2015 at 06:24, Ramesh Nachimuthu <rnachimu@redhat.com> wrote:
Hi,

   I am not able to resume a VM which was paused because of a gluster client quorum issue. Here is what happened in my setup.

1. Created a gluster storage domain which is backed by gluster volume with replica 3.
2. Killed one brick process. So only two bricks are running in replica 3 setup.
3. Created two VMs
4. Started some IO using fio on both of the VMs
5. After some time I got the following errors in the gluster mount log and the VMs moved to paused state.
      "server 10.70.45.17:49217 has not responded in the last 42 seconds, disconnecting."
      "vmstore-replicate-0: e16d1e40-2b6e-4f19-977d-e099f465dfc6: Failing WRITE as quorum is not met"
      more gluster mount logs at http://pastebin.com/UmiUQq0F
6. After some time gluster quorum is active again and I am able to write to the gluster file system.
7. When I try to resume the VM it doesn't work and I get the following error in the vdsm log.
      http://pastebin.com/aXiamY15
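
When reproducing the steps above, it can help to pull the two kinds of events out of the gluster mount log automatically. The Python sketch below is only an assumption-laden helper: the regular expressions are derived solely from the two messages quoted in step 5, and the function and group names are hypothetical.

```python
import re

# Pattern for the ping-timeout disconnect message quoted in step 5.
DISCONNECT_RE = re.compile(
    r"server (?P<server>\S+) has not responded in the last "
    r"(?P<secs>\d+) seconds, disconnecting"
)
# Pattern for the quorum failure message quoted in step 5.
QUORUM_RE = re.compile(
    r"(?P<subvol>\S+): (?P<ident>[0-9a-f-]{36}): "
    r"Failing (?P<op>\w+) as quorum is not met"
)


def scan_mount_log(lines):
    """Return a list of (event, subject, detail) tuples found in the log lines."""
    events = []
    for line in lines:
        m = DISCONNECT_RE.search(line)
        if m:
            events.append(("disconnect", m.group("server"), int(m.group("secs"))))
            continue
        m = QUORUM_RE.search(line)
        if m:
            events.append(("quorum_lost", m.group("subvol"), m.group("op")))
    return events
```

Feeding it the lines of the mount log, for example scan_mount_log(open(path)) with path pointing at a local copy of the log, gives a timeline of disconnects and failed operations that can be compared with the time the VM was paused.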


Regards,
Ramesh


_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users







