[ovirt-users] Unable to reactivate host after reboot due to failed Gluster probe

Jan Siml jsiml at plusline.net
Thu Jan 29 04:39:36 EST 2015


Hello,

We are seeing strange behavior in an oVirt cluster. The version is 3.5.1, 
the engine runs on an EL6 machine, and the hosts use EL7 as their operating 
system. The cluster uses a GlusterFS-backed storage domain, among 
others. Three of the four hosts are peers in the Gluster cluster (3 bricks, 
replica 3).

When all hosts are restarted (for example, after a power outage), the engine 
cannot activate them again because the Gluster probe fails. The message shown 
in the UI is:

"Gluster command [gluster peer node-03] failed on server node-03."

Checking the Gluster peer and volume status on each host confirms that the 
Gluster peers know each other and that the volume is up.

node-03:~ $ gluster peer status
Number of Peers: 2

Hostname: node-02
Uuid: 3fc36f55-d3a2-4efc-b2f0-31f83ed709d9
State: Peer in Cluster (Connected)

Hostname: node-01
Uuid: 18027b35-971b-4b21-bb3d-df252b4dd525
State: Peer in Cluster (Connected)

node-03:~ $ gluster volume status
Status of volume: glusterfs-1
Gluster process					Port	Online	Pid
------------------------------------------------------------------------------
Brick node-01:/export/glusterfs/brick           49152	Y	12409
Brick node-02:/export/glusterfs/brick		49153	Y	9978
Brick node-03:/export/glusterfs/brick		49152	Y	10001
Self-heal Daemon on localhost			N/A	Y	10003
Self-heal Daemon on node-01			N/A	Y	11590
Self-heal Daemon on node-02			N/A	Y	9988

Task Status of Volume glusterfs-1
------------------------------------------------------------------------------
There are no active volume tasks

The storage domain in the oVirt UI is fine (active and green) and usable, but 
neither the Gluster volume nor any of its bricks is visible in the UI.

If I run the command shown in the UI manually, it returns:

root@node-03:~ $ gluster peer probe node-03
peer probe: success. Probe on localhost not needed

root@node-03:~ $ gluster --mode=script peer probe node-03 --xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<cliOutput>
   <opRet>0</opRet>
   <opErrno>1</opErrno>
   <opErrstr>(null)</opErrstr>
   <output>Probe on localhost not needed</output>
</cliOutput>

Could this be just an engine-side parsing error?
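
To illustrate the suspicion, here is a minimal Python sketch. It is not the
engine's or VDSM's actual code, and I have not checked how either of them
evaluates the result; the function names are hypothetical. It parses the XML
output above and shows that a check requiring both opRet and opErrno to be
zero would report a failure, while a check on opRet alone would not:

```python
import xml.etree.ElementTree as ET

# XML exactly as returned by the probe on localhost (see above).
CLI_OUTPUT = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<cliOutput>
   <opRet>0</opRet>
   <opErrno>1</opErrno>
   <opErrstr>(null)</opErrstr>
   <output>Probe on localhost not needed</output>
</cliOutput>
"""


def parse_cli_output(xml_text):
    """Extract the status fields from a gluster --xml reply (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    return {
        "op_ret": int(root.findtext("opRet")),
        "op_errno": int(root.findtext("opErrno")),
        "op_errstr": root.findtext("opErrstr"),
        "output": root.findtext("output"),
    }


def strict_success(result):
    # Assumption: a strict caller treats any non-zero opErrno as an error.
    return result["op_ret"] == 0 and result["op_errno"] == 0


def lenient_success(result):
    # Assumption: a lenient caller looks at opRet only.
    return result["op_ret"] == 0


result = parse_cli_output(CLI_OUTPUT)
print("strict check passes: ", strict_success(result))
print("lenient check passes:", lenient_success(result))
```

If the caller evaluates the reply along the lines of the strict check, the
non-zero opErrno alone would explain the failure message, even though the
probe itself returned success.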

-- 
Kind regards

Jan Siml

