On 07/21/2017 02:55 PM, yayo (j) wrote:
2017-07-20 14:48 GMT+02:00 Ravishankar N <ravishankar@redhat.com>:
But it does say something. All these gfids of completed heals in
the log below are for the ones that you have given the
getfattr output of. So what is likely happening is that there is an
intermittent connection problem between your mount and the brick
process, leading to pending heals again after the heal gets
completed, which is why the numbers vary each time. You
would need to check why that is the case.
Hope this helps,
Ravi
>
> [2017-07-20 09:58:46.573079] I [MSGID: 108026]
> [afr-self-heal-common.c:1254:afr_log_selfheal]
> 0-engine-replicate-0: Completed data selfheal on
> e6dfd556-340b-4b76-b47b-7b6f5bd74327. sources=[0] 1 sinks=2
> [2017-07-20 09:59:22.995003] I [MSGID: 108026]
> [afr-self-heal-metadata.c:51:__afr_selfheal_metadata_do]
> 0-engine-replicate-0: performing metadata selfheal on
> f05b9742-2771-484a-85fc-5b6974bcef81
> [2017-07-20 09:59:22.999372] I [MSGID: 108026]
> [afr-self-heal-common.c:1254:afr_log_selfheal]
> 0-engine-replicate-0: Completed metadata selfheal on
> f05b9742-2771-484a-85fc-5b6974bcef81. sources=[0] 1 sinks=2
>
Hi,
But we have 2 gluster volumes on the same network and the other one
(the "Data" gluster) doesn't have any problems. Why do you think there
is a network problem?
Because pending self-heals come into the picture when I/O from the
clients (mounts) does not succeed on some bricks. They are mostly due to
(a) the client losing connection to some bricks (likely), or
(b) the I/O failing on the bricks themselves (unlikely).
If most of the I/O is also going to the 3rd brick (since you say the
files are already present on all bricks and the I/O is successful), then
it is likely to be (a).
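A quick way to see whether the mount is currently connected to all
bricks is the volume status output. A rough sketch (run on any gluster
node; 'engine' is the volume name taken from your logs):

  # list the clients connected to each brick of the engine volume
  gluster volume status engine clients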
How can I check this on a gluster infrastructure?
In the fuse mount logs for the engine volume, check if there are any
messages for brick disconnects. Something along the lines of
"disconnected from volname-client-x".
Just guessing here, but maybe even the 'data' volume did experience
disconnects and subsequent self-heals, but you did not observe them when
you ran heal info. Check the glustershd log or mount log for self-heal
completion messages on 0-data-replicate-0 as well.
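Something like this should surface them, assuming the default log
location (adjust the path if your setup differs):

  # self-heal daemon log, checked on each gluster node
  grep "Completed .* selfheal" /var/log/glusterfs/glustershd.log | grep data-replicate-0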
Regards,
Ravi
Thank you