Not able to resume a VM which was paused because of gluster quorum issue

Hi,

I am not able to resume a VM that was paused because of a gluster client quorum issue. Here is what happened in my setup.

1. Created a gluster storage domain backed by a gluster volume with replica 3.
2. Killed one brick process, so only two bricks are running in the replica 3 setup.
3. Created two VMs.
4. Started some IO using fio on both of the VMs.
5. After some time I got the following errors in the gluster mount log, and the VMs moved to the paused state.
   "server 10.70.45.17:49217 has not responded in the last 42 seconds, disconnecting."
   "vmstore-replicate-0: e16d1e40-2b6e-4f19-977d-e099f465dfc6: Failing WRITE as quorum is not met"
   More gluster mount logs at http://pastebin.com/UmiUQq0F
6. After some time gluster quorum is active again and I am able to write to the gluster file system.
7. When I try to resume the VM it doesn't work, and I get the following error in the vdsm log: http://pastebin.com/aXiamY15

Regards,
Ramesh
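The two messages quoted in step 5 describe two different events: a brick connection being dropped after 42 seconds without a response, and the replicate translator then refusing a write because quorum was lost. As a rough illustration, the Python sketch below sorts mount-log lines into those two cases. The patterns are derived only from the two messages quoted above; `classify_mount_log_line` and the event labels are names made up for this example, not part of any gluster tooling.

```python
import re

# Patterns derived from the two messages quoted in the report. Real mount logs
# prefix each line with a timestamp, level and source location, which
# re.search() simply skips over.
DISCONNECT_RE = re.compile(
    r"server (?P<host>[\d.]+):(?P<port>\d+) has not responded "
    r"in the last (?P<seconds>\d+) seconds, disconnecting"
)
QUORUM_RE = re.compile(
    r"(?P<subvolume>[\w-]+): (?P<gfid>[0-9a-f-]{36}): "
    r"Failing (?P<fop>[A-Z]+) as quorum is not met"
)


def classify_mount_log_line(line):
    """Return (event, details) for a known message, or (None, {}) otherwise."""
    match = DISCONNECT_RE.search(line)
    if match:
        return "brick-disconnected", match.groupdict()
    match = QUORUM_RE.search(line)
    if match:
        return "quorum-lost", match.groupdict()
    return None, {}


if __name__ == "__main__":
    samples = [
        "server 10.70.45.17:49217 has not responded in the last 42 seconds, disconnecting.",
        "vmstore-replicate-0: e16d1e40-2b6e-4f19-977d-e099f465dfc6: "
        "Failing WRITE as quorum is not met",
    ]
    for sample in samples:
        print(classify_mount_log_line(sample))
```

Run over the full mount log, a classifier like this would show whether each quorum failure was preceded by a disconnect of a second brick, which is the detail the short excerpt cannot confirm on its own.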

What are the gluster-quorum-type and gluster.server-quorum-ratio settings on the volume?

On 22 September 2015 at 06:24, Ramesh Nachimuthu <rnachimu@redhat.com> wrote:
I am not able to resume a VM which was paused because of gluster client quorum issue. [...]

On 09/22/2015 05:43 PM, Alastair Neil wrote:
what are the gluster-quorum-type and gluster.server-quorum-ratio settings on the volume?

cluster.server-quorum-type: server
cluster.quorum-type: auto
gluster.server-quorum-ratio is not set.

One brick process was purposefully killed, but the remaining two bricks are up and running.

Regards,
Ramesh
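For reference, with `cluster.quorum-type` set to `auto`, client-side quorum for a replica set is generally described in the GlusterFS documentation as met when more than half of the bricks in the set are reachable from the client; with an even replica count, exactly half is enough only if the first brick is among them. The sketch below works through that rule for this setup. `client_quorum_met` is a hypothetical helper written for this example, not a GlusterFS API, and it models a single replica set only.

```python
def client_quorum_met(bricks_up, replica_count, first_brick_up=True):
    """Model of cluster.quorum-type=auto for one replica set (illustrative only)."""
    if replica_count < 1:
        raise ValueError("replica_count must be at least 1")
    if not 0 <= bricks_up <= replica_count:
        raise ValueError("bricks_up must be between 0 and replica_count")
    if 2 * bricks_up > replica_count:
        # A strict majority of bricks is reachable.
        return True
    if 2 * bricks_up == replica_count:
        # Exactly half: the first brick acts as the tie-breaker.
        return first_brick_up
    return False


if __name__ == "__main__":
    for up in range(4):
        met = client_quorum_met(up, 3)
        print(f"replica 3, {up} brick(s) reachable -> quorum met: {met}")
```

Under this model, two reachable bricks out of three still satisfy quorum, so a "quorum is not met" failure would suggest that the client had also lost contact with a second brick at that moment, which would be consistent with the disconnect message in the mount log.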

You need to set the gluster.server-quorum-ratio to 51%

On 22 September 2015 at 08:25, Ramesh Nachimuthu <rnachimu@redhat.com> wrote:
cluster.server-quorum-type: server
cluster.quorum-type: auto
gluster.server-quorum-ratio is not set.
[...]
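Note that in the gluster CLI the option is spelled `cluster.server-quorum-ratio`, and it is a pool-wide option set on the special volume name `all` rather than on a single volume. The Python sketch below only builds the corresponding command lines and prints them by default. It assumes the `gluster` binary is on PATH and uses the volume name `vmstore`, which is inferred from the `vmstore-replicate-0` prefix in the mount log; `ratio_command`, `info_command` and `run` are illustrative helper names.

```python
import subprocess

GLUSTER = "gluster"  # assumption: the gluster CLI is on PATH


def ratio_command(percent):
    """Build the command that sets the pool-wide server quorum ratio."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return [GLUSTER, "volume", "set", "all",
            "cluster.server-quorum-ratio", f"{percent}%"]


def info_command(volume):
    """Build the command that prints a volume's info and reconfigured options."""
    return [GLUSTER, "volume", "info", volume]


def run(command, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(" ".join(command))
    if dry_run:
        return None
    result = subprocess.run(command, check=True, capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    run(ratio_command(51))
    run(info_command("vmstore"))  # volume name assumed from the log prefix
```

The dry-run default is deliberate: changing quorum settings on a live pool affects every volume, so the commands are worth reviewing before anything is executed.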

On 09/22/2015 05:57 PM, Alastair Neil wrote:
You need to set the gluster.server-quorum-ratio to 51%

I did that, but I am still facing the same issue. The VM gets paused when I do some I/O using fio on disks backed by gluster, and I am not able to resume the VM after this. The only way now is to bring down the VM and run it again. It then runs successfully on the same host without any issue.

Regards,
Ramesh

This is a known issue in oVirt 3.5.x and below. It's been solved in the upcoming oVirt 3.6.

Related to https://bugzilla.redhat.com/show_bug.cgi?id=1172905, the fix involved setting up a special cgroup for the mount, but I can't find the exact details at the moment.

On Sep 23, 2015, at 7:38 AM, Ramesh Nachimuthu <rnachimu@redhat.com> wrote:
I did that. But still I am facing the same issue. VM get paused when I do some I/O using fio on some disks backed by gluster. I am not able to resume the VM after this. [...]

On 09/24/2015 02:38 AM, Darrell Budic wrote:
This is a known issue in oVirt 3.5.x and below. It’s been solved in the upcoming oVirt 3.6.
Related to https://bugzilla.redhat.com/show_bug.cgi?id=1172905, the fix involved setting up a special cgroup for the mount, but I can’t find the exact details at the moment.
I already have vdsm 4.17.6-0.el7.centos installed on the hosts, so I am not sure that the fix for bug 1172905 <https://bugzilla.redhat.com/show_bug.cgi?id=1172905> resolves this issue.
Regards, Ramesh
On Sep 23, 2015, at 7:38 AM, Ramesh Nachimuthu <rnachimu@redhat.com> wrote:
On 09/22/2015 05:57 PM, Alastair Neil wrote:
You need to set the gluster.server-quorum-ratio to 51%
I did that, but I am still facing the same issue. The VMs get paused when I run I/O with fio on disks backed by gluster, and I am not able to resume them afterwards. The only way out now is to bring the VM down and run it again. It then runs successfully on the same host without any issue.
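For reference, the quorum options discussed here are normally set with the gluster CLI. The sketch below only builds the command lines and prints them; it does not run anything. Two assumptions: the volume name `vmstore` is inferred from the `vmstore-replicate-0` log prefix, and the ratio option is spelled `cluster.server-quorum-ratio` (the name used in the Gluster documentation), which is presumably what "gluster.server-quorum-ratio" refers to in this thread.

```python
import shlex

# Assumption: volume name inferred from the "vmstore-replicate-0" log prefix.
VOLUME = "vmstore"


def quorum_commands(volume, ratio="51%"):
    """Return gluster CLI command lines for the quorum options in this thread."""
    return [
        # The server-side quorum ratio is a cluster-wide option, so it is set on "all".
        ["gluster", "volume", "set", "all", "cluster.server-quorum-ratio", ratio],
        # Server-side quorum enforcement for the volume.
        ["gluster", "volume", "set", volume, "cluster.server-quorum-type", "server"],
        # Client-side quorum for the replica set.
        ["gluster", "volume", "set", volume, "cluster.quorum-type", "auto"],
    ]


# Print the commands for review instead of executing them.
for argv in quorum_commands(VOLUME):
    print(shlex.join(argv))
```

Printing rather than executing keeps the sketch safe to run on a machine without gluster installed; the lists can be handed to `subprocess.run` on a storage node if that is really wanted.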
Regards, Ramesh
On 22 September 2015 at 08:25, Ramesh Nachimuthu <rnachimu@redhat.com> wrote:
On 09/22/2015 05:43 PM, Alastair Neil wrote:
What are the gluster-quorum-type and gluster.server-quorum-ratio settings on the volume?
cluster.server-quorum-type: server
cluster.quorum-type: auto
gluster.server-quorum-ratio is not set.
One brick process was purposefully killed, but the remaining two bricks are up and running.
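A rough sketch of the client-quorum arithmetic may help here. It assumes the commonly documented behaviour of `cluster.quorum-type auto`: writes are allowed when more than half of the bricks in a replica set are reachable, or exactly half including the first brick. The function name and the list-of-booleans representation are made up for illustration and are not gluster code.

```python
def client_quorum_met(bricks_up):
    """Approximate cluster.quorum-type=auto for a single replica set.

    bricks_up holds one boolean per brick, in brick order
    (True means the client can reach that brick).
    """
    total = len(bricks_up)
    active = sum(bricks_up)
    if 2 * active > total:
        # More than half of the bricks are reachable.
        return True
    if 2 * active == total:
        # Exactly half: the first brick acts as the tie-breaker.
        return bricks_up[0]
    return False


# Replica 3 with one brick killed: two of three bricks are still reachable.
print(client_quorum_met([True, True, False]))
# Replica 3 where the client also loses a second brick.
print(client_quorum_met([True, False, False]))
```

Under these assumptions, killing one brick of a replica 3 volume should not stop writes by itself. The "Failing WRITE as quorum is not met" message would fit a second brick becoming unreachable from the client, for example after the 42-second disconnect seen in the log, if that server was one of the two remaining bricks.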
Regards, Ramesh
On 22 September 2015 at 06:24, Ramesh Nachimuthu <rnachimu@redhat.com> wrote:
Hi,
I am not able to resume a VM which was paused because of gluster client quorum issue. Here is what happened in my setup.
1. Created a gluster storage domain backed by a gluster volume with replica 3.
2. Killed one brick process, so only two bricks are running in the replica 3 setup.
3. Created two VMs.
4. Started some I/O using fio on both VMs.
5. After some time, got the following errors in the gluster mount, and the VMs moved to the paused state:
   "server 10.70.45.17:49217 has not responded in the last 42 seconds, disconnecting."
   "vmstore-replicate-0: e16d1e40-2b6e-4f19-977d-e099f465dfc6: Failing WRITE as quorum is not met"
   More gluster mount logs at http://pastebin.com/UmiUQq0F
6. After some time, gluster quorum was active again and I was able to write to the gluster file system.
7. When I tried to resume the VM, it did not work, and I got the following error in the vdsm log: http://pastebin.com/aXiamY15
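The two messages quoted in step 5 can be picked out of a mount log with a small script. This is a minimal sketch, not a gluster tool: the regular expressions are written against the two lines quoted above only, and real log lines carry timestamps and other prefixes that are not shown in this thread. The 42 seconds matches gluster's default `network.ping-timeout`, as far as I know.

```python
import re

# Patterns derived from the two messages quoted in this thread (assumption:
# real log lines contain these fragments somewhere after their prefix).
PING_TIMEOUT = re.compile(
    r"server (?P<server>\S+) has not responded in the last "
    r"(?P<seconds>\d+) seconds, disconnecting"
)
QUORUM_LOST = re.compile(
    r"(?P<subvol>\S+): (?P<gfid>[0-9a-f-]{36}): "
    r"Failing (?P<fop>\w+) as quorum is not met"
)


def scan_mount_log(lines):
    """Return (kind, name, detail) tuples for disconnects and quorum failures."""
    events = []
    for line in lines:
        match = PING_TIMEOUT.search(line)
        if match:
            events.append(("disconnect", match.group("server"), int(match.group("seconds"))))
            continue
        match = QUORUM_LOST.search(line)
        if match:
            events.append(("quorum_lost", match.group("subvol"), match.group("fop")))
    return events


sample = [
    "server 10.70.45.17:49217 has not responded in the last 42 seconds, disconnecting.",
    "vmstore-replicate-0: e16d1e40-2b6e-4f19-977d-e099f465dfc6: Failing WRITE as quorum is not met",
]
for event in scan_mount_log(sample):
    print(event)
```

Listing disconnects and quorum failures in order makes it easier to see whether a brick disconnect preceded the failed writes.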
Regards, Ramesh
_______________________________________________ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users

On Thu, Sep 24, 2015 at 7:37 AM, Ramesh Nachimuthu <rnachimu@redhat.com> wrote:
On 09/24/2015 02:38 AM, Darrell Budic wrote:
This is a known issue in oVirt 3.5.x and below. It’s been solved in the upcoming oVirt 3.6.
Related to https://bugzilla.redhat.com/show_bug.cgi?id=1172905, the fix involved setting up a special cgroup for the mount, but I can’t find the exact details at the moment.
I already have vdsm 4.17.6-0.el7.centos installed on the hosts, so I am not sure that the fix for bug 1172905 <https://bugzilla.redhat.com/show_bug.cgi?id=1172905> resolves this issue.
I think the root cause is the same - qemu cannot recover from a glusterfs unmount, and the only way to resume the VM is to restart it with a fresh mount.
The mentioned bug handles the case where stopping vdsm kills the glusterfs mount helper. That issue is fixed in 3.6.
The issue here seems different. I suggest you open a bug so the gluster guys can investigate this.
Nir

On 09/24/2015 11:28 AM, Nir Soffer wrote:
On Thu, Sep 24, 2015 at 7:37 AM, Ramesh Nachimuthu <rnachimu@redhat.com> wrote:
On 09/24/2015 02:38 AM, Darrell Budic wrote:
This is a known issue in oVirt 3.5.x and below. It’s been solved in the upcoming oVirt 3.6.
Related to https://bugzilla.redhat.com/show_bug.cgi?id=1172905, the fix involved setting up a special cgroup for the mount, but I can’t find the exact details at the moment.
I already have vdsm 4.17.6-0.el7.centos installed on the hosts, so I am not sure that the fix for bug 1172905 <https://bugzilla.redhat.com/show_bug.cgi?id=1172905> resolves this issue.
I think the root cause is the same - qemu cannot recover from a glusterfs unmount, and the only way to resume the VM is to restart it with a fresh mount.
The mentioned bug handles the case where stopping vdsm kills the glusterfs mount helper. That issue is fixed in 3.6.
The issue here seems different. I suggest you open a bug so the gluster guys can investigate this.
It seems I am hitting the issue reported in bz https://bugzilla.redhat.com/show_bug.cgi?id=1171261.
Regards, Ramesh

On Thu, Sep 24, 2015 at 9:06 AM, Ramesh Nachimuthu <rnachimu@redhat.com> wrote:
On 09/24/2015 11:28 AM, Nir Soffer wrote:
On Thu, Sep 24, 2015 at 7:37 AM, Ramesh Nachimuthu <rnachimu@redhat.com> wrote:
On 09/24/2015 02:38 AM, Darrell Budic wrote:
This is a known issue in oVirt 3.5.x and below. It’s been solved in the upcoming oVirt 3.6.
Related to https://bugzilla.redhat.com/show_bug.cgi?id=1172905, the fix involved setting up a special cgroup for the mount, but I can’t find the exact details at the moment.
I already have vdsm 4.17.6-0.el7.centos installed on the hosts, so I am not sure that the fix for bug 1172905 <https://bugzilla.redhat.com/show_bug.cgi?id=1172905> resolves this issue.
I think the root cause is the same - qemu cannot recover from a glusterfs unmount, and the only way to resume the VM is to restart it with a fresh mount.
The mentioned bug handles the case where stopping vdsm kills the glusterfs mount helper. That issue is fixed in 3.6.
The issue here seems different. I suggest you open a bug so the gluster guys can investigate this.
It seems I am hitting the issue reported in bz https://bugzilla.redhat.com/show_bug.cgi?id=1171261.
Indeed. I would open an oVirt bug anyway and make it depend on the glusterfs bug. We need a way to track this issue, and having no oVirt/RHEV bug hides it.

The details are here: https://gerrit.ovirt.org/#/c/40240
The link exists on the bug, of course.
On Thu, Sep 24, 2015 at 12:08 AM, Darrell Budic <budic@onholyground.com> wrote:
This is a known issue in oVirt 3.5.x and below. It’s been solved in the upcoming oVirt 3.6.
Related to https://bugzilla.redhat.com/show_bug.cgi?id=1172905, the fix involved setting up a special cgroup for the mount, but I can’t find the exact details at the moment.
participants (4)
- Alastair Neil
- Darrell Budic
- Nir Soffer
- Ramesh Nachimuthu