[ovirt-users] Cannot connect to gluster storage after HE installation

NUNIN Roberto Roberto.Nunin at comifar.it
Tue May 30 13:26:39 UTC 2017


Hi

Ovirt-node-ng installation using ISO image 20170526.
I've made five attempts, each one ending with a different failure, while following http://www.ovirt.org/blog/2017/04/up-and-running-with-ovirt-4-1-and-gluster-storage/

The last attempt completed successfully (I hope), after taking care of the following:

1)    Configuring networking before setting date & time, so that chronyd is up & running
2)    Modifying the gdeploy-generated script, which still references ntpd instead of chronyd
3)    Since this is a Gluster-based cluster, creating a partition on each data disk (sdb > sdb1, type 8e, then partprobe)
4)    Blacklisting the devices in multipath.conf on all nodes
5)    Double-checking whether leftovers from previous attempts were still present (for example the Gluster volume group: vgremove -f -y <vg_name>); see the cleanup sketch after this list.
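
For reference, this is roughly the per-node cleanup/preparation involved (a sketch; /dev/sdb and the VG name are placeholders from my layout):

# Remove leftovers from a previous gdeploy attempt, if any.
vgremove -f -y <vg_name>

# Create a single LVM partition on the data disk (MBR type 8e)
# and re-read the partition table.
parted -s /dev/sdb mklabel msdos mkpart primary 0% 100%
parted -s /dev/sdb set 1 lvm on
partprobe /dev/sdb

# Keep multipathd off the local disks: blacklist stanza in
# /etc/multipath.conf (a blanket blacklist here; adjust if you
# actually have multipath devices).
cat >> /etc/multipath.conf <<'EOF'
blacklist {
    devnode "*"
}
EOF
systemctl restart multipathd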

After the HE installation and restart, there was no prompt about additional servers to add to the cluster, so I added them manually as new servers. Successfully.

Now I must add storage, but unfortunately nothing is shown in the Gluster drop-down list, even if I change the host.
I've chosen "Use managed gluster".

At a first look, glusterd is up & running (but disabled at system startup!):

aps-te65-mng.mydomain.it:    Loaded: loaded (/usr/lib/systemd/system/glusterd.service; disabled; vendor preset: disabled)
aps-te65-mng.mydomain.it:    Active: active (running) since Tue 2017-05-30 09:54:23 CEST; 4h 40min ago

aps-te66-mng.mydomain.it:    Loaded: loaded (/usr/lib/systemd/system/glusterd.service; disabled; vendor preset: disabled)
aps-te66-mng.mydomain.it:    Active: active (running) since Tue 2017-05-30 09:54:24 CEST; 4h 40min ago

aps-te67-mng.mydomain.it:    Loaded: loaded (/usr/lib/systemd/system/glusterd.service; disabled; vendor preset: disabled)
aps-te67-mng.mydomain.it:    Active: active (running) since Tue 2017-05-30 09:54:24 CEST; 4h 40min ago
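
Side note on the "disabled" state above: glusterd will not come back after a reboot like this. Assuming that is not intentional, enabling it per node is a one-liner:

# On each node: enable glusterd at boot (it is already running;
# '--now' would also start it if it were stopped).
systemctl enable glusterd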

The data gluster volume is OK:

[root@aps-te65-mng ~]# gluster volume info data

Volume Name: data
Type: Replicate
Volume ID: ea6a2c9f-b042-42b4-9c0e-1f776e50b828
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: aps-te65-mng.mydomain.it:/gluster_bricks/data/data
Brick2: aps-te66-mng.mydomain.it:/gluster_bricks/data/data
Brick3: aps-te67-mng.mydomain.it:/gluster_bricks/data/data (arbiter)
Options Reconfigured:
cluster.granular-entry-heal: enable
performance.strict-o-direct: on
network.ping-timeout: 30
storage.owner-gid: 36
storage.owner-uid: 36
user.cifs: off
features.shard: on
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
cluster.locking-scheme: granular
cluster.data-self-heal-algorithm: full
cluster.server-quorum-type: server
cluster.quorum-type: auto
cluster.eager-lock: enable
network.remote-dio: off
performance.low-prio-threads: 32
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
[root@aps-te65-mng ~]#

[root@aps-te65-mng ~]# gluster volume status data
Status of volume: data
Gluster process                                           TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------------------
Brick aps-te65-mng.mydomain.it:/gluster_bricks/data/data  49153     0          Y       52710
Brick aps-te66-mng.mydomain.it:/gluster_bricks/data/data  49153     0          Y       45265
Brick aps-te67-mng.mydomain.it:/gluster_bricks/data/data  49153     0          Y       45366
Self-heal Daemon on localhost                             N/A       N/A        Y       57491
Self-heal Daemon on aps-te67-mng.mydomain.it              N/A       N/A        Y       46488
Self-heal Daemon on aps-te66-mng.mydomain.it              N/A       N/A        Y       46384

Task Status of Volume data
------------------------------------------------------------------------------
There are no active volume tasks

[root@aps-te65-mng ~]#
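
To rule out a plain GlusterFS access problem, the volume can also be mounted by hand from one of the hosts (a sketch; /mnt/gtest is just a scratch mount point):

# Manual mount test of the data volume (sketch).
mkdir -p /mnt/gtest
mount -t glusterfs aps-te65-mng.mydomain.it:/data /mnt/gtest
df -h /mnt/gtest
umount /mnt/gtest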

Any hints on this? May I send logs?
In the hosted-engine log, apart from fencing problems with the HPE iLO3 agent, I can find only these errors:

2017-05-30 11:58:56,981+02 ERROR [org.ovirt.engine.core.utils.servlet.ServletUtils] (default task-23) [] Can't read file '/usr/share/ovirt-engine/files/spice/SpiceVersion.txt' for request '/ovirt-engine/services/files/spice/SpiceVersion.txt', will send a 404 error response.
2017-05-30 13:49:51,033+02 ERROR [org.ovirt.engine.core.utils.servlet.ServletUtils] (default task-64) [] Can't read file '/usr/share/ovirt-engine/files/spice/SpiceVersion.txt' for request '/ovirt-engine/services/files/spice/SpiceVersion.txt', will send a 404 error response.
2017-05-30 13:58:01,569+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.PollVDSCommand] (org.ovirt.thread.pool-6-thread-25) [677ae254] Command 'PollVDSCommand(HostName = aps-te66-mng.mydomain.it, VdsIdVDSCommandParametersBase:{runAsync='true', hostId='3fea5320-33f5-4479-89ce-7d3bc575cd49'})' execution failed: VDSGenericException: VDSNetworkException: Timeout during rpc call
2017-05-30 13:58:01,574+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.PollVDSCommand] (org.ovirt.thread.pool-6-thread-25) [677ae254] Timeout waiting for VDSM response: Internal timeout occured
2017-05-30 13:58:02,079+02 ERROR [org.ovirt.vdsm.jsonrpc.client.JsonRpcClient] (ResponseWorker) [] Not able to update response for "2f3f5da7-48ec-43c9-a4c7-f935412f70ad"
2017-05-30 13:58:14,371+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler10) [49d9ff38] EVENT_ID: VDS_FENCE_STATUS_FAILED(497), Correlation ID: 2f9cdb60, Call Stack: null, Custom Event ID: -1, Message: Failed to verify Host aps-te66-mng.mydomain.it power management.
2017-05-30 14:02:14,613+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.PollVDSCommand] (org.ovirt.thread.pool-6-thread-2) [2334b588] Command 'PollVDSCommand(HostName = aps-te67-mng.mydomain.it, VdsIdVDSCommandParametersBase:{runAsync='true', hostId='d335c802-7e83-492f-855a-bcb31257bad1'})' execution failed: VDSGenericException: VDSNetworkException: Timeout during rpc call
2017-05-30 14:02:19,700+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.PollVDSCommand] (org.ovirt.thread.pool-6-thread-2) [2334b588] Timeout waiting for VDSM response: Internal timeout occured
2017-05-30 14:02:19,741+02 ERROR [org.ovirt.vdsm.jsonrpc.client.JsonRpcClient] (ResponseWorker) [] Not able to update response for "49804adf-283f-4c5c-be8c-907100d2ace5"
2017-05-30 15:05:30,066+02 ERROR [org.ovirt.engine.core.utils.servlet.ServletUtils] (default task-61) [] Can't read file '/usr/share/ovirt-engine/files/spice/SpiceVersion_x64.txt' for request '/ovirt-engine/services/files/spice/SpiceVersion_x64.txt', will send a 404 error response.
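
For the iLO3 part, the fence agent can be exercised by hand from one of the hosts (a sketch; the iLO address and credentials below are placeholders):

# Query power status straight through the fence agent used for iLO3
# (placeholders for address/user/password).
fence_ilo3 -a <ilo_ip> -l <ilo_user> -p <ilo_password> -o status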

The vdsmd service is up & running on all three hosts.
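
Since vdsmd itself is fine, the PollVDSCommand timeouts above may be network-side; reachability of the VDSM port from the engine VM can be checked with plain bash (a sketch; 54321 is the standard VDSM port):

# From the hosted-engine VM: probe the VDSM port on each host
# (bash's /dev/tcp, no extra tools needed).
for h in aps-te65-mng.mydomain.it aps-te66-mng.mydomain.it aps-te67-mng.mydomain.it; do
    timeout 3 bash -c "exec 3<>/dev/tcp/$h/54321" \
        && echo "$h: 54321 reachable" \
        || echo "$h: 54321 NOT reachable"
done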

Roberto



________________________________

This message is for the designated recipient only and may contain privileged, proprietary, or otherwise private information. If you have received it in error, please notify the sender immediately, deleting the original and all copies and destroying any hard copies. Any other use is strictly prohibited and may be unlawful.


More information about the Users mailing list