What version of oVirt and Gluster? This sounds like something I just saw with Gluster 3.12.x. Are you using libgfapi or just FUSE mounts?
From: Sahina Bose <sabose(a)redhat.com>
Subject: Re: [ovirt-users] gluster self-heal takes cluster offline
Date: March 23, 2018 at 1:26:01 AM CDT
To: Jim Kusznir
Cc: Ravishankar Narayanankutty; users
On Fri, Mar 16, 2018 at 2:45 AM, Jim Kusznir <jim(a)palousetech.com> wrote:
Hi all:

I'm trying to understand why/how (and most importantly, how to fix) a substantial issue I had last night. This happened one other time, but I didn't know/understand all the parts associated with it until last night.

I have a 3-node hyperconverged (self-hosted engine, Gluster on each node) cluster. Gluster is replica 2 + arbiter. The current network configuration is 2x GigE in a load-balance bond ("LAG group" on the switch), plus one GigE from each server on a separate VLAN, intended for Gluster (but not used). Server hardware is Dell R610s; each server has an SSD in it. Servers 1 and 2 hold the full replica, and server 3 is the arbiter.
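(For context, this kind of replica 2 + arbiter layout is normally created along these lines; the volume name, hostnames, and brick paths here are placeholders, not my actual values:

    gluster volume create data replica 3 arbiter 1 \
        server1:/gluster/brick1/data \
        server2:/gluster/brick1/data \
        server3:/gluster/brick1/data
    gluster volume start data

Servers 1 and 2 hold full copies of the data; the arbiter brick on server 3 stores only file metadata, enough to keep quorum.)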
I put server 2 into maintenance so I could work on the hardware, including turning it off and such. In the course of the work, I found that I needed to reconfigure the SSD's partitioning somewhat, and that resulted in wiping the data partition (which stores the VM images). I figured it was no big deal; Gluster would rebuild it in short order. I did take care of the extended attribute settings and the like, and when I booted the server up, Gluster came up as expected and began rebuilding the disk.
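(The extended-attribute fix-up I mean is the usual recipe for re-creating a wiped brick: stamp the empty brick with the volume ID from a healthy brick and let the self-heal daemon repopulate it. Roughly, with placeholder volume name and paths rather than my exact steps:

    # on a healthy node, read the volume ID off a good brick
    getfattr -n trusted.glusterfs.volume-id -e hex /gluster/brick1/data
    # on the rebuilt node, apply the same ID to the empty brick directory
    setfattr -n trusted.glusterfs.volume-id -v 0x<id-from-above> /gluster/brick1/data
    # restart glusterd so the brick comes up, then kick off a full heal
    gluster volume heal data full

The <id-from-above> value is the hex string returned by getfattr.)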
How big was the data on this partition? What was the shard size set on the gluster volume?
Out of curiosity, how long did it take to heal and come back to operational?
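Both are easy to check if you haven't already; something like this (volume name is a placeholder):

    gluster volume get <VOLNAME> features.shard-block-size
    gluster volume heal <VOLNAME> info

with the heal info command run periodically to watch the pending-entry count drop.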
The problem is that suddenly my entire cluster got very sluggish. The engine was marking nodes and VMs as failed and then un-failing them throughout the system, fairly randomly. It didn't matter which node the engine or VM was on. At one point it power-cycled server 1 for being "non-responsive" (even though everything was running on it, and the gluster rebuild was working on it). As a result, about 6 VMs were killed and my entire gluster system went down hard (suspending all remaining VMs and the engine), as there were no remaining full copies of the data. After several minutes (these are Dell servers, after all...), server 1 came back up, gluster resumed the rebuild, and it came back online in the cluster. I had to manually (with the virsh command) unpause the engine, and then struggle through trying to get critical VMs back up. Everything was super slow, and load averages on the servers were often seen in excess of 80 (these are 8-core / 16-thread boxes). Actual CPU usage (as reported by top) was rarely above 40% (across all CPUs) for any one server. glusterfs was often seen using 180%-350% of a CPU on servers 1 and 2.
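(The manual unpause was just plain libvirt, roughly:

    virsh list --all          # find the paused HostedEngine VM
    virsh resume <vm-name>

with the VM name being whatever the hosted engine domain is listed as.)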
I ended up putting the cluster into global HA maintenance mode and disabling power fencing on the nodes until the process finished. On at least two occasions it appeared that a functional node was marked bad, and had fencing not been disabled, a node would have rebooted, further exacerbating the problem.
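(Global maintenance itself is a one-liner from any host:

    hosted-engine --set-maintenance --mode=global
    # and back to normal afterwards:
    hosted-engine --set-maintenance --mode=none

Fencing, as far as I know, has to be toggled per host under Edit Host -> Power Management in the web UI; I'm not aware of a single CLI switch for it.)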
It's clear that the gluster rebuild overloaded things and caused the problem. I don't know why the load was so high (even iowait was low), but the load averages were definitely tied to the glusterfs CPU utilization. At no point did I have any problems pinging any machine (host or VM) unless the engine decided it was dead and killed it.

Why did my system bite it so hard with the rebuild? I babied it along until the rebuild was complete, after which it returned to normal operation.
As of this event, all networking (host/engine management, gluster, and the VM network) was on the same VLAN. I'd love to move things off, but so far any attempt to do so breaks my cluster. How can I move my management interfaces to a separate VLAN/IP space? I also want to move Gluster to its own private space, but it seems that if I change anything in the peers file, the entire gluster cluster goes down. The dedicated gluster network is already listed as a secondary hostname for all peers.
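(For reference, secondary hostnames normally get into the peer info by probing an already-known peer again by its alternate name, e.g.

    gluster peer probe <peer-gluster-vlan-hostname>

which adds the extra address to the existing peer entry rather than creating a new peer; the hostname here is a placeholder.)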
Will the above network reconfigurations be enough? I got the impression that the issue may not have been purely network-based, but possibly server I/O overload. Is that likely / right?

I appreciate any input. I don't think gluster's recovery is supposed to do as much damage as it did the last two or three times any healing was required.

Thanks!
--Jim
_______________________________________________
Users mailing list
Users(a)ovirt.org
http://lists.ovirt.org/mailman/listinfo/users