[ovirt-users] Not able to resume a VM which was paused because of gluster quorum issue

Ramesh Nachimuthu rnachimu at redhat.com
Thu Sep 24 06:06:11 UTC 2015



On 09/24/2015 11:28 AM, Nir Soffer wrote:
> On Thu, Sep 24, 2015 at 7:37 AM, Ramesh Nachimuthu 
> <rnachimu at redhat.com> wrote:
>
>
>
>     On 09/24/2015 02:38 AM, Darrell Budic wrote:
>>     This is a known issue in ovirt 3.5.x and below. It’s been solved
>>     in the upcoming ovirt 3.6.
>>
>>     Related to https://bugzilla.redhat.com/show_bug.cgi?id=1172905,
>>     the fix involved setting up a special cgroup for the mount, but I
>>     can’t find the exact details at the moment.
>>
>
>     I already have vdsm 4.17.6-0.el7.centos installed on the hosts, so
>     I am not sure that the fix for bug 1172905 mentioned above covers
>     this case.
>
>
> I think the root cause is the same - qemu cannot recover from 
> glusterfs unmount, and the only way to resume the vm is to restart it 
> with a fresh mount.
>
> The mentioned bug handles the case where stopping vdsm kills the 
> glusterfs mount helper. This issue is fixed in 3.6.
>
> The issue here seems different. I suggest you open a bug so gluster 
> guys can investigate this.
>

Seems like I am hitting the issue reported in bz 
https://bugzilla.redhat.com/show_bug.cgi?id=1171261.
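
For context, here is a minimal sketch of the client-quorum rule as I
understand it from the gluster documentation for cluster.quorum-type set
to auto: writes are allowed when more than half of the bricks in a replica
set are up, or when exactly half are up and the first brick is among them.
The function name and the list-of-booleans input are hypothetical, made up
for illustration only; this is not gluster code.

```python
from typing import Sequence


def client_quorum_met(bricks_up: Sequence[bool]) -> bool:
    """Illustrative model of cluster.quorum-type=auto (an assumption, not gluster code).

    bricks_up[i] is True when the client is connected to brick i of the
    replica set; index 0 is the first brick.
    """
    total = len(bricks_up)
    up = sum(1 for is_up in bricks_up if is_up)
    if 2 * up > total:
        # More than half of the bricks are reachable.
        return True
    if total > 0 and 2 * up == total:
        # Exactly half are reachable: the first brick breaks the tie.
        return bool(bricks_up[0])
    return False


if __name__ == "__main__":
    # Replica 3 with one brick killed, as in the quoted report.
    print(client_quorum_met([True, True, False]))
    # Replica 3 with a second brick unreachable from the client.
    print(client_quorum_met([True, False, False]))
```

With the replica 3 volume from the quoted report and only one brick
killed, this rule says quorum should still hold. The "quorum is not met"
error therefore suggests the client also lost its connection to a second
brick, which would fit the 42-second disconnect in the mount log.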

Regards,
Ramesh

> Nir
>
>
>
>     Regards,
>     Ramesh
>
>
>>
>>>     On Sep 23, 2015, at 7:38 AM, Ramesh Nachimuthu
>>>     <rnachimu at redhat.com> wrote:
>>>
>>>
>>>
>>>     On 09/22/2015 05:57 PM, Alastair Neil wrote:
>>>>     You need to set cluster.server-quorum-ratio to 51%
>>>>
>>>
>>>     I did that, but I am still facing the same issue. The VM gets
>>>     paused when I do some I/O using fio on disks backed by gluster,
>>>     and I am not able to resume it afterwards. The only way out now
>>>     is to shut down the VM and run it again. It then runs
>>>     successfully on the same host without any issue.
>>>
>>>     Regards,
>>>     Ramesh
>>>
>>>>     On 22 September 2015 at 08:25, Ramesh Nachimuthu
>>>>     <rnachimu at redhat.com> wrote:
>>>>
>>>>
>>>>
>>>>         On 09/22/2015 05:43 PM, Alastair Neil wrote:
>>>>>         What are the cluster.quorum-type
>>>>>         and cluster.server-quorum-ratio settings on the volume?
>>>>>
>>>>
>>>>         cluster.server-quorum-type: server
>>>>         cluster.quorum-type: auto
>>>>         cluster.server-quorum-ratio is not set.
>>>>
>>>>         One brick process was deliberately killed, but the remaining
>>>>         two bricks are up and running.
>>>>
>>>>         Regards,
>>>>         Ramesh
>>>>
>>>>>         On 22 September 2015 at 06:24, Ramesh Nachimuthu
>>>>>         <rnachimu at redhat.com> wrote:
>>>>>
>>>>>             Hi,
>>>>>
>>>>>                I am not able to resume a VM which was paused
>>>>>             because of a gluster client quorum issue. Here is what
>>>>>             happened in my setup.
>>>>>
>>>>>             1. Created a gluster storage domain backed by a
>>>>>             gluster volume with replica 3.
>>>>>             2. Killed one brick process, so only two bricks are
>>>>>             running in the replica 3 setup.
>>>>>             3. Created two VMs.
>>>>>             4. Started some I/O using fio on both VMs.
>>>>>             5. After some time, got the following errors in the
>>>>>             gluster mount log, and the VMs moved to the paused state:
>>>>>                   "server 10.70.45.17:49217 has not responded in the
>>>>>             last 42 seconds, disconnecting."
>>>>>                   "vmstore-replicate-0:
>>>>>             e16d1e40-2b6e-4f19-977d-e099f465dfc6: Failing WRITE as
>>>>>             quorum is not met"
>>>>>                   more gluster mount logs at
>>>>>             http://pastebin.com/UmiUQq0F
>>>>>             6. After some time, gluster quorum is active again and I
>>>>>             am able to write to the gluster file system.
>>>>>             7. When I try to resume the VM, it doesn't work, and I
>>>>>             get the following error in the vdsm log:
>>>>>             http://pastebin.com/aXiamY15
>>>>>
>>>>>
>>>>>             Regards,
>>>>>             Ramesh
>>>>>
>>>>>
>>>>>             _______________________________________________
>>>>>             Users mailing list
>>>>>             Users at ovirt.org <mailto:Users at ovirt.org>
>>>>>             http://lists.ovirt.org/mailman/listinfo/users
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>
>
>
>
