[ovirt-users] Unable to reactivate host after reboot due to failed Gluster probe

Thu Jan 29 10:56:00 UTC 2015

Hello,

I finally got the nodes online. What helped was probing the unneeded 
peer node-04 (no brick) from one of the other cluster nodes. Once that 
node became a Gluster peer, I was able to activate any oVirt node that 
serves bricks.

I therefore assume that the error message returned by the UI originates 
from node-04:

root@node-04:~ $ gluster peer probe node-01
peer probe: failed: Probe returned with unknown errno 107
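
A note on the error code: on Linux, errno 107 should be ENOTCONN
("Transport endpoint is not connected"), which would point to a
connection problem between the glusterd instances rather than a
Gluster-specific error. That interpretation is my assumption, not
something the logs confirm. A minimal Python sketch to look up the code
on a given host (the helper name describe_errno is made up for
illustration, and errno values are platform-specific):

```python
import errno
import os
from typing import Tuple


def describe_errno(code: int) -> Tuple[str, str]:
    """Return (symbolic name, message) for an errno value on this platform."""
    # errno.errorcode maps numeric values to names such as 'ENOTCONN'.
    name = errno.errorcode.get(code, "UNKNOWN")
    # os.strerror returns the platform's human-readable message.
    return name, os.strerror(code)


if __name__ == "__main__":
    # 107 is the value reported by the failed probe; run this on the
    # affected host, because the numbering differs between platforms.
    print(describe_errno(107))
```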

root@node-03:~ $ gluster peer status
Number of Peers: 2

Hostname: node-01
Uuid: 18027b35-971b-4b21-bb3d-df252b4dd525
State: Peer in Cluster (Connected)

Hostname: node-02
Uuid: 3fc36f55-d3a2-4efc-b2f0-31f83ed709d9
State: Peer in Cluster (Connected)

root@node-03:~ $ gluster peer probe node-04
peer probe: success.

root@node-03:~ $ gluster peer status
Number of Peers: 3

Hostname: node-01
Uuid: 18027b35-971b-4b21-bb3d-df252b4dd525
State: Peer in Cluster (Connected)

Hostname: node-02
Uuid: 3fc36f55-d3a2-4efc-b2f0-31f83ed709d9
State: Peer in Cluster (Connected)

Hostname: node-04
Uuid: 9cdefc68-d710-4346-93b1-76b5307e258b
State: Peer in Cluster (Connected)

This (oVirt's behavior) seems to be reproducible.
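
The manual check above can be sketched as a small Python helper that
parses the plain-text output of "gluster peer status" and reports which
expected hosts are not listed as connected peers. It assumes the output
format shown above. The function names and the expected-host list are
hypothetical, and since the local node never appears in its own peer
list, it has to be left out of the expected hosts.

```python
from typing import Dict, Iterable, List


def parse_peer_status(text: str) -> List[Dict[str, str]]:
    """Parse plain-text `gluster peer status` output into one dict per peer."""
    peers: List[Dict[str, str]] = []
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank lines and anything without "key: value"
        key, value = key.strip(), value.strip()
        if key == "Hostname":
            # Each "Hostname:" line starts a new peer record.
            peers.append({"hostname": value})
        elif key in ("Uuid", "State") and peers:
            peers[-1][key.lower()] = value
    return peers


def missing_peers(text: str, expected: Iterable[str]) -> List[str]:
    """Return expected hosts that are not listed as connected peers."""
    connected = {
        p["hostname"]
        for p in parse_peer_status(text)
        if p.get("state", "").endswith("(Connected)")
    }
    return sorted(set(expected) - connected)


if __name__ == "__main__":
    # Output of node-03 before node-04 was probed (copied from above).
    sample = """Number of Peers: 2

Hostname: node-01
Uuid: 18027b35-971b-4b21-bb3d-df252b4dd525
State: Peer in Cluster (Connected)

Hostname: node-02
Uuid: 3fc36f55-d3a2-4efc-b2f0-31f83ed709d9
State: Peer in Cluster (Connected)
"""
    # Hypothetical expectation: every other oVirt host should be a peer.
    print(missing_peers(sample, ["node-01", "node-02", "node-04"]))
```

Any host reported as missing would then be a candidate for a
"gluster peer probe" from a node that is already in the cluster.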

On 01/29/2015 11:10 AM, Jan Siml wrote:
> Hello,
>
> Looking into engine.log, I can see that "gluster probe" returned
> errno 107, but I can't figure out why:
>
> 2015-01-29 10:40:03,546 ERROR
> [org.ovirt.engine.core.bll.InitVdsOnUpCommand]
> (DefaultQuartzScheduler_Worker-59) [5977aac5] Could not peer probe the
> gluster server node-03. Error: VdcBLLException:
> org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException:
> VDSGenericException: VDSErrorException: Failed to AddGlusterServerVDS,
> error = Add host failed error: Probe returned with unknown errno 107
>
> Just for the record: We use the /etc/hosts method because there is no
> way to choose the network interface for Gluster. The three
> Gluster peer hosts have modified /etc/hosts files with addresses bound
> to a different interface than the ovirtmgmt addresses.
>
> Example:
>
> root@node-03:~ $ cat /etc/hosts
> 192.168.200.195  node-01
> 192.168.200.196  node-02
> 192.168.200.198  node-03
>
> The /etc/hosts file on the engine host isn't modified.
>
>
> On 01/29/2015 10:39 AM, Jan Siml wrote:
>> Hello,
>>
>> we see strange behavior in an oVirt cluster. The version is 3.5.1,
>> the engine runs on an EL6 machine, and the hosts use EL7 as their
>> operating system. The cluster uses a GlusterFS-backed storage domain,
>> among others. Three of the four hosts are peers in the Gluster cluster
>> (3 bricks, replica 3).
>>
>> When all hosts are restarted (for example after a power outage), the
>> engine can't activate them again because the Gluster probe fails. The
>> message shown in the UI is:
>>
>> "Gluster command [gluster peer node-03] failed on server node-03."
>>
>> Checking Gluster peer and volume status on each host confirms that
>> the Gluster peers know each other and the volume is up.
>>
>> node-03:~ $ gluster peer status
>> Number of Peers: 2
>>
>> Hostname: node-02
>> Uuid: 3fc36f55-d3a2-4efc-b2f0-31f83ed709d9
>> State: Peer in Cluster (Connected)
>>
>> Hostname: node-01
>> Uuid: 18027b35-971b-4b21-bb3d-df252b4dd525
>> State: Peer in Cluster (Connected)
>>
>> node-03:~ $ gluster volume status
>> Status of volume: glusterfs-1
>> Gluster process                          Port    Online  Pid
>> ------------------------------------------------------------------------------
>> Brick node-01:/export/glusterfs/brick    49152   Y       12409
>> Brick node-02:/export/glusterfs/brick    49153   Y       9978
>> Brick node-03:/export/glusterfs/brick    49152   Y       10001
>> Self-heal Daemon on localhost            N/A     Y       10003
>> Self-heal Daemon on node-01              N/A     Y       11590
>> Self-heal Daemon on node-02              N/A     Y       9988
>>
>> Task Status of Volume glusterfs-1
>> ------------------------------------------------------------------------------
>> There are no active volume tasks
>>
>> The storage domain in the oVirt UI is fine (active and green) and
>> usable, but neither the Gluster volume nor any brick is visible in the UI.
>>
>> If I run the command shown in the UI, it returns:
>>
>> root@node-03:~ $ gluster peer probe node-03
>> peer probe: success. Probe on localhost not needed
>>
>> root@node-03:~ $ gluster --mode=script peer probe node-03 --xml
>> <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
>> <cliOutput>
>>    <opRet>0</opRet>
>>    <opErrno>1</opErrno>
>>    <opErrstr>(null)</opErrstr>
>>    <output>Probe on localhost not needed</output>
>> </cliOutput>
>>
>> Could this just be an engine-side parsing error?
>>
>
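
Regarding the XML output quoted above: here is a minimal sketch of how a
caller might interpret it, assuming that opRet is the overall return
status and that a non-zero opErrno alone does not mean failure. This is
only my assumption about the semantics. I have not checked how the
engine or VDSM actually evaluates the reply, and the function names are
hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional, Union

# Reply copied from the quoted "gluster --mode=script peer probe ... --xml" call.
SAMPLE = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<cliOutput>
   <opRet>0</opRet>
   <opErrno>1</opErrno>
   <opErrstr>(null)</opErrstr>
   <output>Probe on localhost not needed</output>
</cliOutput>"""


def parse_probe_result(xml_text: str) -> Dict[str, Union[int, Optional[str]]]:
    """Extract the status fields from a <cliOutput> document."""
    # Encode to bytes so the parser honours the declared encoding.
    root = ET.fromstring(xml_text.encode("utf-8"))
    return {
        "op_ret": int(root.findtext("opRet", default="-1")),
        "op_errno": int(root.findtext("opErrno", default="0")),
        "op_errstr": root.findtext("opErrstr"),
        "output": root.findtext("output"),
    }


def probe_succeeded(result: Dict[str, Union[int, Optional[str]]]) -> bool:
    """Assumption: opRet == 0 means success, whatever opErrno says."""
    return result["op_ret"] == 0


if __name__ == "__main__":
    result = parse_probe_result(SAMPLE)
    print(result, probe_succeeded(result))
```

A parser that treats any non-zero opErrno as an error would reject this
reply, which is the kind of engine-side misreading I was wondering about.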

-- 
Kind regards

Jan Siml