
What version of ovirt and gluster? Sounds like something I just saw with
gluster 3.12.x, are you using libgfapi or just fuse mounts?

From: Sahina Bose <sabose@redhat.com>
Subject: Re: [ovirt-users] gluster self-heal takes cluster offline
Date: March 23, 2018 at 1:26:01 AM CDT
To: Jim Kusznir
Cc: Ravishankar Narayanankutty; users

On Fri, Mar 16, 2018 at 2:45 AM, Jim Kusznir <jim@palousetech.com> wrote:

> Hi all:
>
> I'm trying to understand why/how (and most importantly, how to fix) a
> substantial issue I had last night. This happened one other time, but I
> didn't know/understand all the parts associated with it until last night.
>
> I have a 3-node hyperconverged (self-hosted engine, Gluster on each node)
> cluster. Gluster is Replica 2 + arbiter. The current network
> configuration is 2x GigE in load balance ("LAG group" on the switch),
> plus one GigE from each server on a separate VLAN, intended for Gluster
> (but not used). Server hardware is Dell R610s; each server has an SSD in
> it. Servers 1 and 2 have the full replica, server 3 is the arbiter.
>
> I put server 2 into maintenance so I could work on the hardware,
> including turning it off and such. In the course of the work, I found
> that I needed to reconfigure the SSD's partitioning somewhat, and it
> resulted in wiping the data partition (storing VM images). I figured it
> was no big deal; gluster would rebuild that in short order. I did take
> care of the extended attribute settings and the like, and when I booted
> it up, gluster came up as expected and began rebuilding the disk.

How big was the data on this partition? What was the shard size set on the
gluster volume?
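(For reference, both of those can be read straight off the volume. A minimal
check, assuming the volume is named "data"; substitute the real volume name:

  # shard settings (shard-block-size only matters if sharding is enabled)
  gluster volume get data features.shard
  gluster volume get data features.shard-block-size

  # entries still pending heal, per brick
  gluster volume heal data info

With sharding enabled, only the shards that changed need healing rather than
whole multi-GB image files, which usually shortens the answer to the "how
long" question considerably.)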
Out of curiosity, how long did it take to heal and come back to operational?

> The problem is that suddenly my entire cluster got very sluggish. The
> engine was marking nodes and VMs failed and un-failing them throughout
> the system, fairly randomly. It didn't matter what node the engine or VM
> was on. At one point, it power cycled server 1 for "non-responsive" (even
> though everything was running on it, and the gluster rebuild was working
> on it). As a result of this, about 6 VMs were killed and my entire
> gluster system went down hard (suspending all remaining VMs and the
> engine), as there were no remaining full copies of the data. After
> several minutes (these are Dell servers, after all...), server 1 came
> back up, gluster resumed the rebuild, and it came back online in the
> cluster. I had to manually (virsh command) unpause the engine, and then
> struggle through trying to get critical VMs back up. Everything was super
> slow, and load averages on the servers were often seen in excess of 80
> (these are 8 core / 16 thread boxes). Actual CPU usage (reported by top)
> was rarely above 40% (inclusive of all CPUs) for any one server.
> Glusterfs was often seen using 180%-350% of a CPU on servers 1 and 2.
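(For reference, checking on and resuming a paused hosted engine from a host
shell looks roughly like this; the domain name "HostedEngine" is the usual
default but should be verified locally, and non-read-only virsh operations on
an oVirt node need the vdsm libvirt SASL credentials:

  # read-only listing works without credentials
  virsh -r list --all

  # resuming requires authenticating to libvirt; the hosted engine domain
  # is typically named "HostedEngine" (confirm against the list output)
  virsh resume HostedEngine
)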
> I ended up putting the cluster in global HA maintenance mode and
> disabling power fencing on the nodes until the process finished. It
> appeared that on at least two occasions a functional node was marked bad,
> and had fencing not been disabled, a node would have rebooted, just
> further exacerbating the problem.
>
> It's clear that the gluster rebuild overloaded things and caused the
> problem. I don't know why the load was so high (even IOWait was low), but
> load averages were definitely tied to the glusterfs CPU utilization %. At
> no point did I have any problems pinging any machine (host or VM) unless
> the engine decided it was dead and killed it.
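(The global HA maintenance step mentioned above can be toggled from any
hosted-engine host; a minimal sketch below. The power-management/fencing
toggle itself is per host in the Administration Portal (Edit Host, Power
Management) rather than on the command line.

  hosted-engine --vm-status                      # shows whether global maintenance is active
  hosted-engine --set-maintenance --mode=global  # HA agents stop restarting/migrating the engine VM
  # ... do the disruptive work ...
  hosted-engine --set-maintenance --mode=none
)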
> Why did my system bite it so hard with the rebuild? I babied it along
> until the rebuild was complete, after which it returned to normal
> operation.
>
> As of this event, all networking (host/engine management, gluster, and VM
> network) was on the same VLAN. I'd love to move things off, but so far
> any attempt to do so breaks my cluster. How can I move my management
> interfaces to a separate VLAN/IP space? I also want to move Gluster to
> its own private space, but it seems if I change anything in the peers
> file, the entire gluster cluster goes down. The dedicated gluster network
> is listed as a secondary hostname for all peers already.
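(A quick way to see whether the dedicated hostnames are actually known to
gluster, and whether the bricks reference them; "data" is again just a
placeholder volume name. Gluster traffic only moves to the storage VLAN if
the brick definitions, or the name resolution for them, point at the
storage-network addresses, and hand-editing the files under /var/lib/glusterd
is what tends to take the whole peer group down.

  gluster pool list
  gluster peer status          # each peer lists alternate hostnames under "Other names:"
  gluster volume info data | grep -i brick
)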
> Will the above network reconfigurations be enough? I got the impression
> that the issue may not have been purely network based, but possibly
> server IO overload. Is this likely / right?
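(If it is heal-induced overload rather than the network, the self-heal daemon
can be throttled per volume. These tunables exist in gluster 3.12; the values
below are only illustrative, and the oVirt/gdeploy "virt" profile often raises
them, so check the current settings first:

  gluster volume get data cluster.shd-max-threads
  gluster volume get data cluster.shd-wait-qlength

  # fewer parallel heals means less CPU/IO pressure, at the cost of a longer heal
  gluster volume set data cluster.shd-max-threads 1
)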
> I appreciate input. I don't think gluster's recovery is supposed to do as
> much damage as it did the last two or three times any healing was
> required.
>
> Thanks!
> --Jim

_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users