[ovirt-users] Unable to reactivate host after reboot due to failed Gluster probe

Thu Jan 29 12:06:53 UTC 2015

On 01/29/2015 04:26 PM, Jan Siml wrote:
> Hello,
>
> finally I got the nodes online. What helped was probing the unneeded
> peer node-04 (no brick) from one of the other cluster nodes. Once the
> node becomes a Gluster peer, I am able to activate any oVirt node
> which serves bricks.
>
> Therefore I assume that the error message the UI returns comes from
> node-04:

Yes, this could be the issue: in all other successful cases, opErrno is
returned as 0 and opErrstr is blank.
I suspect this scenario is treated as an error on the engine side.
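
To make the case concrete, here is a minimal sketch of a tolerant parser
for the XML output quoted in the original report further down. It is not
the actual engine or VDSM code; the function name parse_probe_result and
the policy of treating opRet alone as the success indicator are
assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# XML copied from the "gluster --mode=script peer probe node-03 --xml"
# output quoted in this thread.
SAMPLE = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<cliOutput>
   <opRet>0</opRet>
   <opErrno>1</opErrno>
   <opErrstr>(null)</opErrstr>
   <output>Probe on localhost not needed</output>
</cliOutput>
"""


def parse_probe_result(xml_text):
    """Hypothetical helper: decide success from opRet only.

    Assumption: opRet == 0 means the probe succeeded, even when
    opErrno is non-zero (as in the "localhost" case above).
    """
    root = ET.fromstring(xml_text.encode("utf-8"))
    op_ret = int(root.findtext("opRet", default="-1"))
    op_errno = int(root.findtext("opErrno", default="0"))
    op_errstr = root.findtext("opErrstr", default="") or ""
    output = root.findtext("output", default="") or ""
    # The CLI prints the literal string "(null)" for an empty error string.
    if op_errstr == "(null)":
        op_errstr = ""
    return {
        "ok": op_ret == 0,
        "errno": op_errno,
        "errstr": op_errstr,
        "output": output,
    }


result = parse_probe_result(SAMPLE)
print(result)
```

With this policy the localhost probe counts as a success although
opErrno is 1. A parser that requires opErrno to be 0 as well would flag
the same output as a failure.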

>
> root@node-04:~ $ gluster peer probe node-01
> peer probe: failed: Probe returned with unknown errno 107
>
> root@node-03:~ $ gluster peer status
> Number of Peers: 2
>
> Hostname: node-01
> Uuid: 18027b35-971b-4b21-bb3d-df252b4dd525
> State: Peer in Cluster (Connected)
>
> Hostname: node-02
> Uuid: 3fc36f55-d3a2-4efc-b2f0-31f83ed709d9
> State: Peer in Cluster (Connected)
>
> root@node-03:~ $ gluster peer probe node-04
> peer probe: success.
>
> root@node-03:~ $ gluster peer status
> Number of Peers: 3
>
> Hostname: node-01
> Uuid: 18027b35-971b-4b21-bb3d-df252b4dd525
> State: Peer in Cluster (Connected)
>
> Hostname: node-02
> Uuid: 3fc36f55-d3a2-4efc-b2f0-31f83ed709d9
> State: Peer in Cluster (Connected)
>
> Hostname: node-04
> Uuid: 9cdefc68-d710-4346-93b1-76b5307e258b
> State: Peer in Cluster (Connected)
>
> This (oVirt's behavior) seems to be reproducible.
>
> On 01/29/2015 11:10 AM, Jan Siml wrote:
>> Hello,
>>
>> looking into engine.log, I can see that "gluster probe" returned
>> errno 107, but I can't figure out why:
>>
>> 2015-01-29 10:40:03,546 ERROR
>> [org.ovirt.engine.core.bll.InitVdsOnUpCommand]
>> (DefaultQuartzScheduler_Worker-59) [5977aac5] Could not peer probe the
>> gluster server node-03. Error: VdcBLLException:
>> org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException:
>> VDSGenericException: VDSErrorException: Failed to AddGlusterServerVDS,
>> error = Add host failed
>> error: Probe returned with unknown errno 107
>>
>> Just for the record: we use the /etc/hosts method because there is no
>> way to choose the network interface for Gluster. The three
>> Gluster peer hosts have modified /etc/hosts files with addresses bound
>> to a different interface than the ovirtmgmt addresses.
>>
>> Example:
>>
>> root@node-03:~ $ cat /etc/hosts
>> 192.168.200.195  node-01
>> 192.168.200.196  node-02
>> 192.168.200.198  node-03
>>
>> The /etc/hosts file on the engine host isn't modified.
>>
>>
>> On 01/29/2015 10:39 AM, Jan Siml wrote:
>>> Hello,
>>>
>>> we see strange behavior within an oVirt cluster. The version is 3.5.1,
>>> the engine is running on an EL6 machine and the hosts are using EL7 as
>>> their operating system. The cluster uses a GlusterFS-backed storage
>>> domain, amongst others. Three of the four hosts are peers in the Gluster
>>> cluster (3 bricks, replica 3).
>>>
>>> When all hosts are restarted (e.g. due to a power outage), the engine
>>> can't activate them again because the Gluster probe fails. The message
>>> given in the UI is:
>>>
>>> "Gluster command [gluster peer node-03] failed on server node-03."
>>>
>>> Checking Gluster peer and volume status on each host confirms that
>>> Gluster peers are known to each other and volume is up.
>>>
>>> node-03:~ $ gluster peer status
>>> Number of Peers: 2
>>>
>>> Hostname: node-02
>>> Uuid: 3fc36f55-d3a2-4efc-b2f0-31f83ed709d9
>>> State: Peer in Cluster (Connected)
>>>
>>> Hostname: node-01
>>> Uuid: 18027b35-971b-4b21-bb3d-df252b4dd525
>>> State: Peer in Cluster (Connected)
>>>
>>> node-03:~ $ gluster volume status
>>> Status of volume: glusterfs-1
>>> Gluster process                         Port    Online  Pid
>>> ------------------------------------------------------------------
>>> Brick node-01:/export/glusterfs/brick   49152   Y       12409
>>> Brick node-02:/export/glusterfs/brick   49153   Y       9978
>>> Brick node-03:/export/glusterfs/brick   49152   Y       10001
>>> Self-heal Daemon on localhost           N/A     Y       10003
>>> Self-heal Daemon on node-01             N/A     Y       11590
>>> Self-heal Daemon on node-02             N/A     Y       9988
>>>
>>> Task Status of Volume glusterfs-1
>>> ------------------------------------------------------------------
>>> There are no active volume tasks
>>>
>>> The storage domain in the oVirt UI is fine (active and green) and
>>> usable. But neither the Gluster volume nor any brick is visible in the UI.
>>>
>>> If I run the command shown in the UI, it returns:
>>>
>>> root@node-03:~ $ gluster peer probe node-03
>>> peer probe: success. Probe on localhost not needed
>>>
>>> root@node-03:~ $ gluster --mode=script peer probe node-03 --xml
>>> <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
>>> <cliOutput>
>>>    <opRet>0</opRet>
>>>    <opErrno>1</opErrno>
>>>    <opErrstr>(null)</opErrstr>
>>>    <output>Probe on localhost not needed</output>
>>> </cliOutput>
>>>
>>> Is this maybe just an engine side parsing error?
>>>
>>
>
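
For reference, the "unknown errno 107" in the probe output and in
engine.log matches ENOTCONN ("Transport endpoint is not connected") in
the Linux errno numbering. The sketch below only maps a number to its
symbolic name and message on the machine where it runs; the helper name
describe_errno is an assumption, and the lookup does not by itself
explain why the probe from node-04 failed.

```python
import errno
import os


def describe_errno(code):
    """Hypothetical helper: map an errno number to its name and message.

    The mapping is platform-specific. On Linux, 107 is ENOTCONN; other
    systems number their errors differently.
    """
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{code} {name}: {os.strerror(code)}"


# Run this on one of the EL7 hosts to decode the value from engine.log.
print(describe_errno(107))
```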