On Fri, Dec 16, 2016 at 11:00 PM, Nathanaël Blanchet <blanchet@abes.fr> wrote:



On 16/12/2016 at 16:34, Sahina Bose wrote:
Failed to find host 'Host[guadalupe1,7a30c899-a317-479a-b07b-244bc2374485]' in gluster peer list from 'Host[guadalupe1,7a30c899-a317-479a-b07b-244bc2374485]' on attempt 2
It looks like the gluster UUID saved in the oVirt engine DB does not match the one returned by the CLI.

Was this host reinstalled?
You may need to remove the host from the engine and add it again. If that doesn't work, you may need to manually change the UUID value in the database (gluster_server table).
Removing the host did nothing. I had to go into the gluster_server table and remove the UUIDs of any disconnected hosts, but that was not enough: I then had to remove the host and reinstall it as a new host.
Thank you, I had spent a lot of time trying to solve this issue.

Sorry to hear that you had trouble with this. Could you explain a bit about how you got into this state?

Was it because you re-provisioned one of the gluster nodes and the gluster UUID was reset (without oVirt being aware of it)? We would like to fix or enhance the engine to handle this if it's a common enough use case.
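
To check whether a node's gluster UUID was reset, one option is to compare the UUID the node currently reports with the value the engine has stored. The sketch below is only an illustration under assumptions: it assumes the node keeps its UUID as a "UUID=..." line in glusterd.info (usually /var/lib/glusterd/glusterd.info), and the function names and the way the stored value is obtained are hypothetical, not part of oVirt or gluster.

```python
def read_gluster_uuid(info_text):
    """Return the UUID from the text of a glusterd.info file, or None.

    Assumes the file holds simple key=value lines, one of them "UUID=...".
    """
    for line in info_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "UUID":
            return value.strip()
    return None


def uuid_was_reset(info_text, stored_uuid):
    """True if the node's current UUID differs from the one stored elsewhere.

    stored_uuid is whatever value was recorded for this host, for example
    one read by hand from the engine database (hypothetical input).
    """
    current = read_gluster_uuid(info_text)
    if current is None:
        # No UUID line found: treat it as a mismatch so it gets looked at.
        return True
    return current.lower() != stored_uuid.strip().lower()
```

On a node, the text could be read with open("/var/lib/glusterd/glusterd.info").read(), assuming that path applies to the installation.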
 


On Fri, Dec 16, 2016 at 7:00 PM, Nathanaël Blanchet <blanchet@abes.fr> wrote:
Here is an extract of the latest engine logs, thank you.


On 16/12/2016 at 14:02, Sahina Bose wrote:
Could you attach the engine log with this error?

On Fri, Dec 16, 2016 at 4:29 PM, Nathanaël Blanchet <blanchet@abes.fr> wrote:
Hi,

I used to successfully run a replica 3 gluster volume, but since the last 4.0.5 update the hosts can't connect to each other, failing with the message: gluster [gluster peer status guadalupe1.v100.abes.fr] command failed on server guadalupe2.v100.abes.fr.

So host guadalupe1 can never come up.

When I run gluster peer probe, the hosts connect as expected. I reinstalled vdsm and gluster, but nothing changed.

I found this in supervdsm.log on guadalupe2:

MainProcess|jsonrpc.Executor/6::DEBUG::2016-12-16 11:53:21,429::supervdsmServer::99::SuperVdsm.ServerCallback::(wrapper) return peerStatus with [{'status': 'CONNECTED', 'hostname': '10.34.101.56/24', 'uuid': 'c259c09b-8d7c-4b12-8745-677199877583'}, {'status': 'CONNECTED', 'hostname': 'guadalupe3.v100.abes.fr', 'uuid': '6af67cd3-7931-446d-aaa2-ffea51325adc'}, {'status': 'CONNECTED', 'hostname': 'guadalupe1.v100.abes.fr', 'uuid': '8eb485cd-31c4-4c3a-a315-3dc6d3ddc0c9'}]
MainProcess|jsonrpc.Executor/7::DEBUG::2016-12-16 11:53:21,490::supervdsmServer::92::SuperVdsm.ServerCallback::(wrapper) call peerProbe with () {}
MainProcess|jsonrpc.Executor/7::DEBUG::2016-12-16 11:53:21,491::commands::68::root::(execCmd) /usr/bin/taskset --cpu-list 0-63 /usr/sbin/gluster --mode=script peer probe guadalupe1.v100.abes.fr --xml (cwd None)
MainProcess|jsonrpc.Executor/7::DEBUG::2016-12-16 11:53:21,570::commands::86::root::(execCmd) SUCCESS: <err> = ''; <rc> = 0
MainProcess|jsonrpc.Executor/7::DEBUG::2016-12-16 11:53:21,570::supervdsmServer::99::SuperVdsm.ServerCallback::(wrapper) return peerProbe with True
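
The peer UUIDs can be pulled out of a "return peerStatus with [...]" line like the one above and compared with a UUID recorded elsewhere. This is a rough sketch, assuming the log line keeps the format shown in this extract; the function names are made up for illustration and are not part of vdsm.

```python
import ast
import re

# Captures the Python-style list that follows "return peerStatus with".
PEER_STATUS_RE = re.compile(r"return peerStatus with (\[.*\])\s*$")


def parse_peer_status(log_line):
    """Return {hostname: uuid} from a supervdsm 'return peerStatus' line.

    Returns an empty dict if the line does not contain a peer list.
    """
    match = PEER_STATUS_RE.search(log_line)
    if match is None:
        return {}
    # The list is printed as a Python literal, so literal_eval can read it
    # without executing anything.
    peers = ast.literal_eval(match.group(1))
    return {peer["hostname"]: peer["uuid"] for peer in peers}


def host_uuid_matches(peers, hostname, stored_uuid):
    """True if the peer list reports stored_uuid for hostname."""
    reported = peers.get(hostname)
    return reported is not None and reported.lower() == stored_uuid.lower()
```

Feeding it the peerStatus line from this log gives one entry per peer, so the UUID reported for guadalupe1.v100.abes.fr can be checked against whatever value is stored for that host.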

We can see that guadalupe2 already sees guadalupe1 as connected, yet a peer probe to guadalupe1 is still executed (via taskset), returning the message "Host guadalupe1.v100.abes.fr port 24007 already in peer list".

How can I tell guadalupe2 to stop trying to probe guadalupe1?


--
Nathanaël Blanchet

Supervision réseau
Pôle Infrastrutures Informatiques
227 avenue Professeur-Jean-Louis-Viala
34193 MONTPELLIER CEDEX 5       
Tél. 33 (0)4 67 54 84 55
Fax  33 (0)4 67 54 84 14
blanchet@abes.fr
