On 19/12/2016 at 08:28, Sahina Bose wrote:
> On Fri, Dec 16, 2016 at 11:00 PM, Nathanaël Blanchet <blanchet@abes.fr> wrote:
>> On 16/12/2016 at 16:34, Sahina Bose wrote:
>>> Failed to find host
>>> 'Host[guadalupe1,7a30c899-a317-479a-b07b-244bc2374485]' in
>>> gluster peer list from
>>> 'Host[guadalupe1,7a30c899-a317-479a-b07b-244bc2374485]' on attempt 2
>>> It looks like the gluster UUID saved in the oVirt engine DB does
>>> not match the one returned from the CLI.
>>>
>>> Was this host reinstalled?
>>> You may need to remove the host from the engine and add it again.
>>> If that doesn't work, you may need to manually change the UUID
>>> value in the database (gluster_server table).
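Below is a minimal Python sketch of the comparison Sahina describes. It makes two assumptions. First, the node's own gluster UUID is stored as a `UUID=` line in `/var/lib/glusterd/glusterd.info`, which is the usual glusterd location but worth confirming on your install. Second, the UUID the engine recorded for the host has already been read from the `gluster_server` table by some other means. Both function names are hypothetical.

```python
import uuid


def read_local_gluster_uuid(info_text: str) -> str:
    """Return the UUID found in the text of a glusterd.info file.

    Assumes a line of the form "UUID=<uuid>"; other key=value lines are ignored.
    """
    for line in info_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "UUID":
            # Round-trip through uuid.UUID to validate the value and normalise its case.
            return str(uuid.UUID(value.strip()))
    raise ValueError("no UUID= line found in glusterd.info contents")


def uuid_matches_engine(info_text: str, engine_uuid: str) -> bool:
    """True if the node's gluster UUID equals the UUID stored by the engine."""
    return read_local_gluster_uuid(info_text) == str(uuid.UUID(engine_uuid))
```

On a node you would pass in the contents of `glusterd.info`. A `False` result would correspond to the mismatch Sahina suspects.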
>> Removing the host did nothing. In fact, I had to go to the
>> gluster_server table to remove every disconnected host UUID, but that
>> was not enough. Then I had to remove the host and reinstall it as a
>> new host.
>> Thank you, I've spent a lot of time trying to solve this issue.
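The clean-up described here, dropping the UUIDs of hosts that are no longer connected, can be sketched as a set difference. The sketch assumes each engine record can be reduced to a (host id, gluster UUID) pair. The exact `gluster_server` columns are not shown in this thread, so the tuple layout and the function name are hypothetical. The set of live UUIDs should include the local node's own UUID, because `gluster peer status` lists only the other peers.

```python
from typing import Iterable, List, Tuple


def find_stale_uuids(
    engine_rows: Iterable[Tuple[str, str]], live_uuids: Iterable[str]
) -> List[Tuple[str, str]]:
    """Return (host_id, gluster_uuid) rows whose UUID no live gluster node reports."""
    # Compare case-insensitively; UUIDs may be stored in either case.
    live = {u.lower() for u in live_uuids}
    return [(host_id, g_uuid) for host_id, g_uuid in engine_rows
            if g_uuid.lower() not in live]
```

The rows returned are only candidates. As this message shows, removing them was not enough on its own: the host still had to be removed and added again.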
> Sorry to hear that you had trouble with this. Could you explain a bit
> how you got into this state?
> Was it because you re-provisioned one of the gluster nodes and the
> gluster UUID was reset (without oVirt being aware of it)? I would like
> to either fix or enhance the engine to handle this if it's a common
> enough use case.
When I looked at the gluster_server table, I realized that some
(disconnected) hosts had been probed with their gluster network IP. At
the beginning, I didn't use the gluster network, so my hosts were probed
on the management network and everything was fine. When I decided to
move gluster traffic to a dedicated network (you answered me about it
here: https://www.mail-archive.com/users@ovirt.org/msg37742.html), I
expected the hosts to be probed with the new network IP, but they
weren't. So I manually probed them with the gluster IP, and I think all
my troubles come from there. I reinstalled vdsm, and nothing has worked
since then.
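One way to spot the situation described here is to look for peers that gluster knows only by an address rather than by a host name. In the supervdsm.log peerStatus output quoted below, one peer appears as '10.34.101.56/24'. The sketch assumes peers arrive as dicts shaped like that log output, with `'hostname'`, `'uuid'` and `'status'` keys. The function name is hypothetical.

```python
import ipaddress
from typing import Dict, List


def peers_recorded_by_address(peers: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return the peers whose 'hostname' is an IP address, with or without a /prefix."""
    flagged = []
    for peer in peers:
        try:
            # ip_interface accepts "10.34.101.56" as well as "10.34.101.56/24".
            ipaddress.ip_interface(peer["hostname"])
        except ValueError:
            continue  # Not parseable as an address, so treat it as a DNS name.
        flagged.append(peer)
    return flagged
```

Applied to the three peers in the log, this flags only the entry with UUID c259c09b-8d7c-4b12-8745-677199877583.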
>>>
>>> On Fri, Dec 16, 2016 at 7:00 PM, Nathanaël Blanchet
>>> <blanchet@abes.fr> wrote:
>>>
>>>> Here is an extract of the latest engine logs, thank you.
>>>>
>>>> On 16/12/2016 at 14:02, Sahina Bose wrote:
>>>>> Could you attach the engine log with this error?
>>>>>
>>>>> On Fri, Dec 16, 2016 at 4:29 PM, Nathanaël Blanchet
>>>>> <blanchet@abes.fr> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I used to run a replica 3 gluster volume successfully, but since the
>>>>>> last 4.0.5 update the hosts can't connect to each other, with the
>>>>>> message: gluster [gluster peer status guadalupe1.v100.abes.fr]
>>>>>> command failed on server guadalupe2.v100.abes.fr.
>>>>>>
>>>>>> So host guadalupe1 can never come up.
>>>>>>
>>>>>> When I run gluster peer probe, they are connected as expected. I
>>>>>> reinstalled vdsm and gluster, but it is still the same.
>>>>>>
>>>>>> I found this in supervdsm.log on guadalupe2:
>>>>>>
>>>>>> MainProcess|jsonrpc.Executor/6::DEBUG::2016-12-16
>>>>>> 11:53:21,429::supervdsmServer::99::SuperVdsm.ServerCallback::(wrapper)
>>>>>> return peerStatus with [{'status': 'CONNECTED',
>>>>>> 'hostname': '10.34.101.56/24',
>>>>>> 'uuid': 'c259c09b-8d7c-4b12-8745-677199877583'},
>>>>>> {'status': 'CONNECTED', 'hostname': 'guadalupe3.v100.abes.fr',
>>>>>> 'uuid': '6af67cd3-7931-446d-aaa2-ffea51325adc'},
>>>>>> {'status': 'CONNECTED', 'hostname': 'guadalupe1.v100.abes.fr',
>>>>>> 'uuid': '8eb485cd-31c4-4c3a-a315-3dc6d3ddc0c9'}]
>>>>>> MainProcess|jsonrpc.Executor/7::DEBUG::2016-12-16
>>>>>> 11:53:21,490::supervdsmServer::92::SuperVdsm.ServerCallback::(wrapper)
>>>>>> call peerProbe with () {}
>>>>>> MainProcess|jsonrpc.Executor/7::DEBUG::2016-12-16
>>>>>> 11:53:21,491::commands::68::root::(execCmd)
>>>>>> /usr/bin/taskset --cpu-list 0-63 /usr/sbin/gluster
>>>>>> --mode=script peer probe guadalupe1.v100.abes.fr --xml (cwd None)
>>>>>> MainProcess|jsonrpc.Executor/7::DEBUG::2016-12-16
>>>>>> 11:53:21,570::commands::86::root::(execCmd) SUCCESS:
>>>>>> <err> = ''; <rc> = 0
>>>>>> MainProcess|jsonrpc.Executor/7::DEBUG::2016-12-16
>>>>>> 11:53:21,570::supervdsmServer::99::SuperVdsm.ServerCallback::(wrapper)
>>>>>> return peerProbe with True
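vdsm runs the gluster CLI with `--xml` (see the `peer probe ... --xml` call above) and returns parsed results such as the `peerStatus` list. The sketch below shows that kind of conversion for `gluster peer status --xml`. It assumes the output is a `<cliOutput>` document with `<opRet>` and `<opErrstr>` elements, plus one `<peer>` element per peer carrying `<uuid>`, `<hostname>` and `<connected>` children. Check this against the output of your gluster version. This is not vdsm's actual parser, and the 'DISCONNECTED' label is this sketch's own choice.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def parse_peer_status_xml(xml_text: str) -> List[Dict[str, str]]:
    """Turn `gluster peer status --xml` output into a list of peer dicts (assumed schema)."""
    root = ET.fromstring(xml_text)
    # A non-zero opRet signals a CLI failure; opErrstr carries the message.
    if root.findtext("opRet", default="0").strip() != "0":
        raise RuntimeError(root.findtext("opErrstr") or "gluster command failed")
    peers = []
    for peer in root.iter("peer"):
        peers.append({
            "uuid": peer.findtext("uuid", default=""),
            # "hostname" matches only the direct child, not entries under <hostnames>.
            "hostname": peer.findtext("hostname", default=""),
            # Assumed: <connected> is "1" for a connected peer.
            "status": "CONNECTED" if peer.findtext("connected") == "1" else "DISCONNECTED",
        })
    return peers
```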
>>>>>>
>>>>>> We can see that guadalupe2 sees guadalupe1, but vdsm still runs
>>>>>> gluster peer probe against guadalupe1 (wrapped in taskset), and
>>>>>> gluster answers "Host guadalupe1.v100.abes.fr port 24007 already
>>>>>> in peer list".
>>>>>>
>>>>>> How can I tell guadalupe2 to stop trying to probe guadalupe1?
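Below is a simplified model of the loop described here, consistent with Sahina's diagnosis earlier in the thread. Suppose a manager looks a host up in the peer list by a stored gluster UUID. A stale UUID then makes the lookup fail even though the host name is present, so the manager keeps requesting a probe, and gluster answers that the host is already in the peer list. This is an illustration with hypothetical function names, not the engine's actual code.

```python
from typing import Dict, List, Optional


def find_peer(peers: List[Dict[str, str]], hostname: str,
              stored_uuid: Optional[str] = None) -> Optional[Dict[str, str]]:
    """Find a host in a peer list: by UUID when one is stored, otherwise by hostname."""
    for peer in peers:
        if stored_uuid is not None:
            if peer["uuid"].lower() == stored_uuid.lower():
                return peer
        elif peer["hostname"] == hostname:
            return peer
    return None


def needs_probe(peers: List[Dict[str, str]], hostname: str,
                stored_uuid: Optional[str] = None) -> bool:
    """Request a probe only when the lookup finds nothing."""
    return find_peer(peers, hostname, stored_uuid) is None
```

With the peer list from the log, `needs_probe(peers, "guadalupe1.v100.abes.fr")` is `False`. Passing a stale UUID instead of 8eb485cd-31c4-4c3a-a315-3dc6d3ddc0c9 makes it `True`, which reproduces the repeated-probe symptom.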
>>>>>>
>>>>>> --
>>>>>> Nathanaël Blanchet
>>>>>>
>>>>>> Network Supervision
>>>>>> IT Infrastructure Department
>>>>>> 227 avenue Professeur-Jean-Louis-Viala
>>>>>> 34193 MONTPELLIER CEDEX 5
>>>>>> Tel. 33 (0)4 67 54 84 55
>>>>>> Fax 33 (0)4 67 54 84 14
>>>>>> blanchet@abes.fr
>>>>>>
>>>>>> _______________________________________________
>>>>>> Users mailing list
>>>>>> Users@ovirt.org
>>>>>> http://lists.ovirt.org/mailman/listinfo/users