Hello,
finally I got the nodes online. What helped was probing the previously
unneeded peer node-04 (it serves no brick) from one of the other cluster
nodes. Once that node became a Gluster peer, I was able to activate
every oVirt node that serves bricks.
Therefore I assume the error message which the UI returns comes from
node-04:
root@node-04:~ $ gluster peer probe node-01
peer probe: failed: Probe returned with unknown errno 107
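For reference: assuming the hosts run Linux, errno 107 is ENOTCONN
("Transport endpoint is not connected"), i.e. glusterd on the probe
target couldn't be reached over the network at that moment. A quick way
to check (illustration only; errno values are platform specific):

```python
import errno
import os

# On Linux, errno 107 is ENOTCONN; os.strerror() gives the
# human-readable text for the numeric code from the Gluster error.
code = errno.ENOTCONN
message = os.strerror(code)
print(code, message)  # 107 Transport endpoint is not connected
```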
root@node-03:~ $ gluster peer status
Number of Peers: 2
Hostname: node-01
Uuid: 18027b35-971b-4b21-bb3d-df252b4dd525
State: Peer in Cluster (Connected)
Hostname: node-02
Uuid: 3fc36f55-d3a2-4efc-b2f0-31f83ed709d9
State: Peer in Cluster (Connected)
root@node-03:~ $ gluster peer probe node-04
peer probe: success.
root@node-03:~ $ gluster peer status
Number of Peers: 3
Hostname: node-01
Uuid: 18027b35-971b-4b21-bb3d-df252b4dd525
State: Peer in Cluster (Connected)
Hostname: node-02
Uuid: 3fc36f55-d3a2-4efc-b2f0-31f83ed709d9
State: Peer in Cluster (Connected)
Hostname: node-04
Uuid: 9cdefc68-d710-4346-93b1-76b5307e258b
State: Peer in Cluster (Connected)
This (oVirt's behavior) seems to be reproducible.
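The --xml output quoted further down (opRet 0, but opErrno 1 and
"Probe on localhost not needed") is what made me suspect a parsing
problem on the engine side. A minimal sketch of how such a reply can be
inspected (the XML is copied from the quoted mail; the parsing code is
just an illustration, not the engine's actual code):

```python
import xml.etree.ElementTree as ET

# Reply captured from "gluster --mode=script peer probe node-03 --xml"
reply = b"""<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<cliOutput>
  <opRet>0</opRet>
  <opErrno>1</opErrno>
  <opErrstr>(null)</opErrstr>
  <output>Probe on localhost not needed</output>
</cliOutput>"""

root = ET.fromstring(reply)
op_ret = int(root.findtext("opRet"))
op_errno = int(root.findtext("opErrno"))

# opRet is 0 (success) even though opErrno is non-zero -- a consumer
# keying on opErrno alone would wrongly treat this reply as a failure.
print(op_ret, op_errno)  # 0 1
```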
On 01/29/2015 11:10 AM, Jan Siml wrote:
Hello,
when looking into engine.log, I can see that "gluster probe" returned
errno 107, but I can't figure out why:
2015-01-29 10:40:03,546 ERROR
[org.ovirt.engine.core.bll.InitVdsOnUpCommand]
(DefaultQuartzScheduler_Worker-59) [5977aac5] Could not peer probe the
gluster server node-03. Error: VdcBLLException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException:
VDSErrorException: Failed to AddGlusterServerVDS, error = Add host failed
error: Probe returned with unknown errno 107
Just for the record: we use the /etc/hosts method because there is no
way to choose the network interface Gluster should use. The three
Gluster peer hosts have modified /etc/hosts files with addresses bound
to a different interface than the ovirtmgmt addresses.
Example:
root@node-03:~ $ cat /etc/hosts
192.168.200.195 node-01
192.168.200.196 node-02
192.168.200.198 node-03
The /etc/hosts file on the engine host isn't modified.
On 01/29/2015 10:39 AM, Jan Siml wrote:
> Hello,
>
> we have some strange behavior within an oVirt cluster. The version is
> 3.5.1, the engine is running on an EL6 machine and the hosts use EL7 as
> operating system. The cluster uses a GlusterFS-backed storage domain
> amongst others. Three of the four hosts are peers in the Gluster
> cluster (3 bricks, replica 3).
>
> When all hosts are restarted (e.g. due to a power outage), the engine
> can't activate them again, because the Gluster peer probe fails. The
> message given in the UI is:
>
> "Gluster command [gluster peer node-03] failed on server node-03."
>
> Checking Gluster peer and volume status on each host confirms that
> Gluster peers are known to each other and volume is up.
>
> node-03:~ $ gluster peer status
> Number of Peers: 2
>
> Hostname: node-02
> Uuid: 3fc36f55-d3a2-4efc-b2f0-31f83ed709d9
> State: Peer in Cluster (Connected)
>
> Hostname: node-01
> Uuid: 18027b35-971b-4b21-bb3d-df252b4dd525
> State: Peer in Cluster (Connected)
>
> node-03:~ $ gluster volume status
> Status of volume: glusterfs-1
> Gluster process Port Online Pid
> ------------------------------------------------------------------------------
>
> Brick node-01:/export/glusterfs/brick 49152 Y 12409
> Brick node-02:/export/glusterfs/brick 49153 Y 9978
> Brick node-03:/export/glusterfs/brick 49152 Y 10001
> Self-heal Daemon on localhost N/A Y 10003
> Self-heal Daemon on node-01 N/A Y 11590
> Self-heal Daemon on node-02 N/A Y 9988
>
> Task Status of Volume glusterfs-1
> ------------------------------------------------------------------------------
>
> There are no active volume tasks
>
> Storage domain in oVirt UI is fine (active and green) and usable. But
> neither Gluster volume nor any brick is visible in UI.
>
> If I try the command shown in the UI, it returns:
>
> root@node-03:~ $ gluster peer probe node-03
> peer probe: success. Probe on localhost not needed
>
> root@node-03:~ $ gluster --mode=script peer probe node-03 --xml
> <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
> <cliOutput>
> <opRet>0</opRet>
> <opErrno>1</opErrno>
> <opErrstr>(null)</opErrstr>
> <output>Probe on localhost not needed</output>
> </cliOutput>
>
> Is this maybe just an engine-side parsing error?
>
--
Kind regards
Jan Siml