
On 19/12/2016 at 08:28, Sahina Bose wrote:
On Fri, Dec 16, 2016 at 11:00 PM, Nathanaël Blanchet <blanchet@abes.fr> wrote:
On 16/12/2016 at 16:34, Sahina Bose wrote:
Failed to find host 'Host[guadalupe1,7a30c899-a317-479a-b07b-244bc2374485]' in gluster peer list from 'Host[guadalupe1,7a30c899-a317-479a-b07b-244bc2374485]' on attempt 2
It looks like the gluster UUID saved in the oVirt engine DB does not match the one returned from the CLI.
Was this host reinstalled? You may need to remove the host from the engine and add it again. If that doesn't work, you may need to manually change the uuid value in the database (gluster_server table).
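The mismatch described above can be checked by hand. The following is only a minimal sketch: the peer list is copied from the supervdsm.log excerpt in the original report quoted later in this thread, the helper name `find_peer_by_uuid` is hypothetical (it is not part of oVirt or VDSM), and the identifier from the "Failed to find host" message is used purely as a stand-in, since the thread does not confirm that it is the value stored in gluster_server.

```python
# Stand-in for the UUID stored on the engine side. ASSUMPTION: taken from the
# "Failed to find host" message; the thread does not confirm this is the
# value held in the gluster_server table.
STORED_UUID = "7a30c899-a317-479a-b07b-244bc2374485"

# Peer list as printed by peerStatus in the supervdsm.log excerpt.
PEERS = [
    {"status": "CONNECTED", "hostname": "10.34.101.56/24",
     "uuid": "c259c09b-8d7c-4b12-8745-677199877583"},
    {"status": "CONNECTED", "hostname": "guadalupe3.v100.abes.fr",
     "uuid": "6af67cd3-7931-446d-aaa2-ffea51325adc"},
    {"status": "CONNECTED", "hostname": "guadalupe1.v100.abes.fr",
     "uuid": "8eb485cd-31c4-4c3a-a315-3dc6d3ddc0c9"},
]


def find_peer_by_uuid(peers, uuid):
    """Return the peer dict whose 'uuid' matches (case-insensitive), else None."""
    wanted = uuid.lower()
    for peer in peers:
        if peer["uuid"].lower() == wanted:
            return peer
    return None


if __name__ == "__main__":
    match = find_peer_by_uuid(PEERS, STORED_UUID)
    if match is None:
        print("stored UUID is not in the peer list reported by gluster")
    else:
        print("stored UUID belongs to", match["hostname"])
```

If the stored value is absent from the list while the host itself shows up under a different UUID, that would be consistent with the stale-UUID explanation given above.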
Removing the host did nothing. I did have to go to the gluster_server table and remove every disconnected host uuid, but that was not enough. I then had to remove the host and reinstall it as a new host. Thank you, I had spent a lot of time trying to solve this issue.
Sorry to hear that you had trouble with this. Could you explain a bit how you got into this state?
Was it because you re-provisioned one of the gluster nodes and the gluster UUID was reset (without oVirt being aware of it)? I would like to either fix or enhance the engine to handle this if it's a common enough use case.

When I looked at the gluster_server table, I realized that some (disconnected) hosts had been probed with the gluster network IP. At the beginning, I didn't use the gluster network, so my hosts were probed on the management network and all was fine. When I decided to move the gluster traffic to a dedicated network (you answered me about it here: https://www.mail-archive.com/users@ovirt.org/msg37742.html), I believed the hosts would be probed with the new network IP, but they weren't. So I manually probed them with the gluster IP, and I think all my troubles come from there. I reinstalled vdsm, and nothing has worked since then.
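The peerStatus output quoted in the original report lists one peer as '10.34.101.56/24' rather than by name. As a rough illustration of how to spot such entries, here is a minimal sketch using Python's standard `ipaddress` module; the helper names `recorded_by_ip` and `split_peers` are hypothetical, and the hostnames are copied from that log excerpt. It only separates IP-style entries from name-style ones and does not claim to identify the cause of the problem.

```python
import ipaddress

# Hostnames as they appear in the peerStatus output from supervdsm.log.
PEER_HOSTNAMES = [
    "10.34.101.56/24",
    "guadalupe3.v100.abes.fr",
    "guadalupe1.v100.abes.fr",
]


def recorded_by_ip(hostname):
    """True if the entry parses as an IP address, with or without a /prefix."""
    try:
        # ip_interface accepts both "10.0.0.1" and "10.0.0.1/24".
        ipaddress.ip_interface(hostname)
    except ValueError:
        return False
    return True


def split_peers(hostnames):
    """Return (entries recorded by IP, entries recorded by name)."""
    by_ip = [h for h in hostnames if recorded_by_ip(h)]
    by_name = [h for h in hostnames if not recorded_by_ip(h)]
    return by_ip, by_name


if __name__ == "__main__":
    by_ip, by_name = split_peers(PEER_HOSTNAMES)
    print("recorded by IP:  ", by_ip)
    print("recorded by name:", by_name)
```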
On Fri, Dec 16, 2016 at 7:00 PM, Nathanaël Blanchet <blanchet@abes.fr> wrote:
Here is an extract of the latest engine logs, thank you.
On 16/12/2016 at 14:02, Sahina Bose wrote:
Could you attach the engine log with this error?
On Fri, Dec 16, 2016 at 4:29 PM, Nathanaël Blanchet <blanchet@abes.fr> wrote:
Hi,
I used to run a replica 3 gluster volume successfully, but since the last 4.0.5 update, the hosts can't connect to each other, and I get the message: gluster [gluster peer status guadalupe1.v100.abes.fr] command failed on server guadalupe2.v100.abes.fr.
So host guadalupe1 can never come up.
When I run gluster peer probe, the hosts are connected as expected. I reinstalled vdsm and gluster, but it is still the same.
I found this in supervdsm.log on guadalupe2:
MainProcess|jsonrpc.Executor/6::DEBUG::2016-12-16 11:53:21,429::supervdsmServer::99::SuperVdsm.ServerCallback::(wrapper) return peerStatus with [{'status': 'CONNECTED', 'hostname': '10.34.101.56/24', 'uuid': 'c259c09b-8d7c-4b12-8745-677199877583'}, {'status': 'CONNECTED', 'hostname': 'guadalupe3.v100.abes.fr', 'uuid': '6af67cd3-7931-446d-aaa2-ffea51325adc'}, {'status': 'CONNECTED', 'hostname': 'guadalupe1.v100.abes.fr', 'uuid': '8eb485cd-31c4-4c3a-a315-3dc6d3ddc0c9'}]
MainProcess|jsonrpc.Executor/7::DEBUG::2016-12-16 11:53:21,490::supervdsmServer::92::SuperVdsm.ServerCallback::(wrapper) call peerProbe with () {}
MainProcess|jsonrpc.Executor/7::DEBUG::2016-12-16 11:53:21,491::commands::68::root::(execCmd) /usr/bin/taskset --cpu-list 0-63 /usr/sbin/gluster --mode=script peer probe guadalupe1.v100.abes.fr --xml (cwd None)
MainProcess|jsonrpc.Executor/7::DEBUG::2016-12-16 11:53:21,570::commands::86::root::(execCmd) SUCCESS: <err> = ''; <rc> = 0
MainProcess|jsonrpc.Executor/7::DEBUG::2016-12-16 11:53:21,570::supervdsmServer::99::SuperVdsm.ServerCallback::(wrapper) return peerProbe with True
We can see that guadalupe2 already sees guadalupe1, but the taskset command still runs a peer probe against guadalupe1, which returns the message "Host guadalupe1.v100.abes.fr port 24007 already in peer list".
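To read the peer list out of such a log line without eyeballing it, something like the following could be used. This is a minimal sketch that assumes the line has exactly the shape shown in the excerpt above and that the list after "return peerStatus with" is a plain Python literal; `parse_peer_status` and `is_connected` are hypothetical helper names, not VDSM functions.

```python
import ast
import re

# The peerStatus line from the supervdsm.log excerpt, split for readability.
LOG_LINE = (
    "MainProcess|jsonrpc.Executor/6::DEBUG::2016-12-16 11:53:21,429::"
    "supervdsmServer::99::SuperVdsm.ServerCallback::(wrapper) "
    "return peerStatus with ["
    "{'status': 'CONNECTED', 'hostname': '10.34.101.56/24', "
    "'uuid': 'c259c09b-8d7c-4b12-8745-677199877583'}, "
    "{'status': 'CONNECTED', 'hostname': 'guadalupe3.v100.abes.fr', "
    "'uuid': '6af67cd3-7931-446d-aaa2-ffea51325adc'}, "
    "{'status': 'CONNECTED', 'hostname': 'guadalupe1.v100.abes.fr', "
    "'uuid': '8eb485cd-31c4-4c3a-a315-3dc6d3ddc0c9'}]"
)

# Capture the bracketed list that follows "return peerStatus with".
PEER_STATUS_RE = re.compile(r"return peerStatus with (\[.*\])\s*$")


def parse_peer_status(line):
    """Return the list of peer dicts found in the line, or None if absent."""
    match = PEER_STATUS_RE.search(line)
    if match is None:
        return None
    # literal_eval only evaluates Python literals, never arbitrary code.
    return ast.literal_eval(match.group(1))


def is_connected(peers, hostname):
    """True if a peer with this hostname is reported as CONNECTED."""
    return any(
        p["hostname"] == hostname and p["status"] == "CONNECTED"
        for p in peers
    )


if __name__ == "__main__":
    peers = parse_peer_status(LOG_LINE)
    for peer in peers:
        print(peer["hostname"], peer["uuid"], peer["status"])
    print(is_connected(peers, "guadalupe1.v100.abes.fr"))
```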
How can I tell guadalupe2 to stop trying to probe guadalupe1?
-- 
Nathanaël Blanchet
Supervision réseau
Pôle Infrastrutures Informatiques
227 avenue Professeur-Jean-Louis-Viala
34193 MONTPELLIER CEDEX 5
Tél. 33 (0)4 67 54 84 55
Fax 33 (0)4 67 54 84 14
blanchet@abes.fr
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users