[ovirt-users] 4.0 - 2nd node fails on deploy

Wed Oct 5 10:31:18 UTC 2016

On Wed, Oct 5, 2016 at 1:56 PM, Jason Jeffrey <jason at sudo.co.uk> wrote:

> Hi,
>
>
>
> Logs attached
>

Have you probed two interfaces for the same host, that is, dcasrv02 and
dcastor02? Does "gluster peer status" recognize both names as belonging to
the same host?

From the glusterd logs and the mount logs, the connection between the peers
is lost and quorum is lost, which confirms what Simone said earlier.
The logs seem to indicate network issues, so please check the direct link
setup. See below.

From mount logs:
[2016-10-04 17:26:15.718300] E [socket.c:2292:socket_connect_finish] 0-engine-client-2: connection to 10.100.103.3:24007 failed (No route to host)
[2016-10-04 17:26:15.718345] W [MSGID: 108001] [afr-common.c:4379:afr_notify] 0-engine-replicate-0: Client-quorum is not met
[2016-10-04 17:26:16.428290] E [socket.c:2292:socket_connect_finish] 0-engine-client-1: connection to 10.100.101.2:24007 failed (No route to host)
[2016-10-04 17:26:16.428336] E [MSGID: 108006] [afr-common.c:4321:afr_notify] 0-engine-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up

And in glusterd logs:
[2016-10-04 17:24:39.522402] E [socket.c:2292:socket_connect_finish] 0-management: connection to 10.100.50.82:24007 failed (No route to host)
[2016-10-04 17:24:39.522578] I [MSGID: 106004] [glusterd-handler.c:5201:__glusterd_peer_rpc_notify] 0-management: Peer <dcasrv02> (<1e788fc9-dfe9-4753-92c7-76a95c8d0891>), in state <Peer in Cluster>, has disconnected from glusterd.
[2016-10-04 17:24:39.523272] C [MSGID: 106002] [glusterd-server-quorum.c:351:glusterd_do_volume_quorum_action] 0-management: Server quorum lost for volume engine. Stopping local bricks.
[2016-10-04 17:24:39.523314] I [MSGID: 106132] [glusterd-utils.c:1560:glusterd_service_stop] 0-management: brick already stopped
[2016-10-04 17:24:39.526188] E [socket.c:2292:socket_connect_finish] 0-management: connection to 10.100.103.3:24007 failed (No route to host)
[2016-10-04 17:24:39.526219] I [MSGID: 106004] [glusterd-handler.c:5201:__glusterd_peer_rpc_notify] 0-management: Peer <dcastor03> (<9a9c037e-96cd-4f73-9800-a1df5cdd2818>), in state <Peer in Cluster>, has disconnected from glusterd.
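
If it helps to see at a glance which endpoints were unreachable, the socket errors can be tallied with a few lines of Python. This is only a sketch: it assumes the log format shown in the excerpts above, the function name is made up for illustration, and the sample lines are copied from this mail with the line wrapping undone.

```python
import re
from collections import Counter

# Matches the socket errors quoted in this thread, e.g.
# "... connection to 10.100.103.3:24007 failed (No route to host)"
FAILED_CONN = re.compile(
    r"connection to (\d+\.\d+\.\d+\.\d+):(\d+) failed \(([^)]*)\)"
)


def failed_connections(lines):
    """Count failed connection attempts per (address, reason)."""
    counts = Counter()
    for line in lines:
        match = FAILED_CONN.search(line)
        if match:
            address, _port, reason = match.groups()
            counts[(address, reason)] += 1
    return counts


# Sample lines taken from the glusterd and mount log excerpts.
sample = [
    "[2016-10-04 17:24:39.522402] E [socket.c:2292:socket_connect_finish] "
    "0-management: connection to 10.100.50.82:24007 failed (No route to host)",
    "[2016-10-04 17:24:39.526188] E [socket.c:2292:socket_connect_finish] "
    "0-management: connection to 10.100.103.3:24007 failed (No route to host)",
    "[2016-10-04 17:26:15.718300] E [socket.c:2292:socket_connect_finish] "
    "0-engine-client-2: connection to 10.100.103.3:24007 failed (No route to host)",
]

for (address, reason), count in sorted(failed_connections(sample).items()):
    print(f"{address}: {count} x {reason}")
```

To run it against a real log, pass the open file object instead of the sample list; each entry is expected to sit on a single line there.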

> Thanks
>
>
>
> *From:* Sahina Bose [mailto:sabose at redhat.com]
> *Sent:* 05 October 2016 08:11
> *To:* Jason Jeffrey <jason at sudo.co.uk>; gluster-users at gluster.org;
> Ravishankar Narayanankutty <ravishankar at redhat.com>
> *Cc:* Simone Tiraboschi <stirabos at redhat.com>; users <users at ovirt.org>
>
> *Subject:* Re: [ovirt-users] 4.0 - 2nd node fails on deploy
>
>
>
> [Adding gluster-users ML]
>
> The brick logs are filled with errors:
> [2016-10-05 19:30:28.659061] E [MSGID: 113077] [posix-handle.c:309:posix_handle_pump] 0-engine-posix: malformed internal link /var/run/vdsm/storage/0a021563-91b5-4f49-9c6b-fff45e85a025/d84f0551-0f2b-457c-808c-6369c6708d43/1b5a5e34-818c-4914-8192-2f05733b5583 for /xpool/engine/brick/.glusterfs/b9/8e/b98ed8d2-3bf9-4b11-92fd-ca5324e131a8
> [2016-10-05 19:30:28.659069] E [MSGID: 113091] [posix.c:180:posix_lookup] 0-engine-posix: Failed to create inode handle for path <gfid:b98ed8d2-3bf9-4b11-92fd-ca5324e131a8>
> The message "E [MSGID: 113018] [posix.c:198:posix_lookup] 0-engine-posix: lstat on null failed" repeated 3 times between [2016-10-05 19:30:28.656529] and [2016-10-05 19:30:28.659076]
> [2016-10-05 19:30:28.659087] W [MSGID: 115005] [server-resolve.c:126:resolve_gfid_cbk] 0-engine-server: b98ed8d2-3bf9-4b11-92fd-ca5324e131a8: failed to resolve (Success)
>
> - Ravi, the above are from the data brick of the arbiter volume. Can you
> take a look?
>
>
>
> Jason,
>
> Could you also provide the mount logs from the first host
> (/var/log/glusterfs/rhev-data-center-mnt-glusterSD*engine.log) and the
> glusterd log (/var/log/glusterfs/etc-glusterfs-glusterd.vol.log) from around
> the same time frame?
>
>
>
>
>
> On Wed, Oct 5, 2016 at 3:28 AM, Jason Jeffrey <jason at sudo.co.uk> wrote:
>
> Hi,
>
>
>
> The servers are powered off when I’m not looking at the problem.
>
>
>
> There may have been instances where all three were not powered on, during
> the same period.
>
>
>
> Glusterhd log attached. The xpool-engine-brick log is over 1 GB in size, so
> I’ve taken a sample of the last couple of days; it looks highly repetitive.
>
>
>
> Cheers
>
>
>
> Jason
>
>
>
>
>
>
>
>
>
> *From:* Simone Tiraboschi [mailto:stirabos at redhat.com]
> *Sent:* 04 October 2016 16:50
>
>
> *To:* Jason Jeffrey <jason at sudo.co.uk>
> *Cc:* users <users at ovirt.org>
> *Subject:* Re: [ovirt-users] 4.0 - 2nd node fails on deploy
>
>
>
>
>
>
>
> On Tue, Oct 4, 2016 at 5:22 PM, Jason Jeffrey <jason at sudo.co.uk> wrote:
>
> Hi,
>
>
>
> DCASTORXX is a hosts entry for dedicated direct 10GB links (each a private
> /28) between the x3 servers (i.e. 1 => 2 & 3, 2 => 1 & 3, etc.), planned to
> be used solely for storage.
>
>
>
> I.e.
>
>
>
> 10.100.50.81    dcasrv01
>
> 10.100.101.1    dcastor01
>
> 10.100.50.82    dcasrv02
>
> 10.100.101.2    dcastor02
>
> 10.100.50.83    dcasrv03
>
> 10.100.103.3    dcastor03
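
One way to sanity-check this addressing is to group the entries by the /28 network each address falls into. The sketch below uses Python's standard ipaddress module and assumes the /28 prefix mentioned above applies to every address in the list, which the thread does not confirm for the dcasrv entries. It only shows which names share a subnet on paper; it says nothing about how the interfaces or routes are actually configured.

```python
import ipaddress
from collections import defaultdict

# Hosts entries as quoted in this thread.
HOSTS = {
    "dcasrv01": "10.100.50.81",
    "dcastor01": "10.100.101.1",
    "dcasrv02": "10.100.50.82",
    "dcastor02": "10.100.101.2",
    "dcasrv03": "10.100.50.83",
    "dcastor03": "10.100.103.3",
}


def group_by_network(hosts, prefix_len=28):
    """Map each network of the given prefix length to the names inside it."""
    groups = defaultdict(list)
    for name, address in hosts.items():
        # strict=False accepts a host address and returns its enclosing network.
        network = ipaddress.ip_network(f"{address}/{prefix_len}", strict=False)
        groups[str(network)].append(name)
    return dict(groups)


for network, names in sorted(group_by_network(HOSTS).items()):
    print(network, names)
```

With these entries, dcastor01 and dcastor02 land in one /28 and dcastor03 in another, so reaching 10.100.103.3 depends on each host having an interface or route for that second network.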
>
>
>
> These were set up with the gluster commands:
>
> · gluster volume create iso replica 3 arbiter 1 dcastor01:/xpool/iso/brick dcastor02:/xpool/iso/brick dcastor03:/xpool/iso/brick
>
> · gluster volume create export replica 3 arbiter 1 dcastor02:/xpool/export/brick dcastor03:/xpool/export/brick dcastor01:/xpool/export/brick
>
> · gluster volume create engine replica 3 arbiter 1 dcastor01:/xpool/engine/brick dcastor02:/xpool/engine/brick dcastor03:/xpool/engine/brick
>
> · gluster volume create data replica 3 arbiter 1 dcastor01:/xpool/data/brick dcastor03:/xpool/data/brick dcastor02:/xpool/data/bricky
>
>
>
>
>
> So yes, DCASRV01 is the server (primary) and has local bricks accessed
> through the DCASTOR01 interface.
>
>
>
> Is the issue here not the incorrect soft link?
>
>
>
> No, this should be fine.
>
>
>
> The issue is that periodically your gluster volume loses its server
> quorum and becomes unavailable.
>
> It happened more than once from your logs.
>
>
>
> Can you please also attach the gluster logs for that volume?
>
>
>
>
>
> lrwxrwxrwx. 1 vdsm kvm  132 Oct  3 17:27 hosted-engine.metadata -> /var/run/vdsm/storage/bbb70623-194a-46d2-a164-76a4876ecaaf/fd44dbf9-473a-496a-9996-c8abe3278390/cee9440c-4eb8-453b-bc04-c47e6f9cbc93
>
> [root at dcasrv01 /]# ls -al /var/run/vdsm/storage/bbb70623-194a-46d2-a164-76a4876ecaaf/
> ls: cannot access /var/run/vdsm/storage/bbb70623-194a-46d2-a164-76a4876ecaaf/: No such file or directory
>
> But the data does exist
>
> [root at dcasrv01 fd44dbf9-473a-496a-9996-c8abe3278390]# ls -al
> drwxr-xr-x. 2 vdsm kvm    4096 Oct  3 17:17 .
> drwxr-xr-x. 6 vdsm kvm    4096 Oct  3 17:17 ..
> -rw-rw----. 2 vdsm kvm 1028096 Oct  3 20:48 cee9440c-4eb8-453b-bc04-c47e6f9cbc93
> -rw-rw----. 2 vdsm kvm 1048576 Oct  3 17:17 cee9440c-4eb8-453b-bc04-c47e6f9cbc93.lease
> -rw-r--r--. 2 vdsm kvm     283 Oct  3 17:17 cee9440c-4eb8-453b-bc04-c47e6f9cbc93.meta
>
>
>
>
> Thanks
>
>
>
> Jason
>
>
>
>
>
>
>
> *From:* Simone Tiraboschi [mailto:stirabos at redhat.com]
> *Sent:* 04 October 2016 14:40
>
>
> *To:* Jason Jeffrey <jason at sudo.co.uk>
> *Cc:* users <users at ovirt.org>
> *Subject:* Re: [ovirt-users] 4.0 - 2nd node fails on deploy
>
>
>
>
>
>
>
> On Tue, Oct 4, 2016 at 10:51 AM, Simone Tiraboschi <stirabos at redhat.com>
> wrote:
>
>
>
>
>
> On Mon, Oct 3, 2016 at 11:56 PM, Jason Jeffrey <jason at sudo.co.uk> wrote:
>
> Hi,
>
>
>
> Another problem has appeared, after rebooting the primary the VM will not
> start.
>
>
>
> Appears the symlink is broken between gluster mount ref and vdsm
>
>
>
> The first host was correctly deployed but it seems that you are facing some
> issue connecting to the storage.
>
> Can you please attach vdsm logs and /var/log/messages from the first host?
>
>
>
> Thanks Jason,
>
> I suspect that your issue is related to this:
>
> Oct  4 18:24:39 dcasrv01 etc-glusterfs-glusterd.vol[2252]: [2016-10-04 17:24:39.522620] C [MSGID: 106002] [glusterd-server-quorum.c:351:glusterd_do_volume_quorum_action] 0-management: Server quorum lost for volume data. Stopping local bricks.
>
> Oct  4 18:24:39 dcasrv01 etc-glusterfs-glusterd.vol[2252]: [2016-10-04 17:24:39.523272] C [MSGID: 106002] [glusterd-server-quorum.c:351:glusterd_do_volume_quorum_action] 0-management: Server quorum lost for volume engine. Stopping local bricks.
>
>
>
> and for some time your gluster volume has been working.
>
>
>
> But then:
>
> Oct  4 19:02:09 dcasrv01 systemd: Started /usr/bin/mount -t glusterfs -o backup-volfile-servers=dcastor02:dcastor03 dcastor01:engine /rhev/data-center/mnt/glusterSD/dcastor01:engine.
> Oct  4 19:02:09 dcasrv01 systemd: Starting /usr/bin/mount -t glusterfs -o backup-volfile-servers=dcastor02:dcastor03 dcastor01:engine /rhev/data-center/mnt/glusterSD/dcastor01:engine.
> Oct  4 19:02:11 dcasrv01 ovirt-ha-agent: /usr/lib/python2.7/site-packages/yajsonrpc/stomp.py:352: DeprecationWarning: Dispatcher.pending is deprecated. Use Dispatcher.socket.pending instead.
> Oct  4 19:02:11 dcasrv01 ovirt-ha-agent: pending = getattr(dispatcher, 'pending', lambda: 0)
> Oct  4 19:02:11 dcasrv01 ovirt-ha-agent: /usr/lib/python2.7/site-packages/yajsonrpc/stomp.py:352: DeprecationWarning: Dispatcher.pending is deprecated. Use Dispatcher.socket.pending instead.
> Oct  4 19:02:11 dcasrv01 ovirt-ha-agent: pending = getattr(dispatcher, 'pending', lambda: 0)
> Oct  4 19:02:11 dcasrv01 journal: vdsm vds.dispatcher ERROR SSL error during reading data: unexpected eof
> Oct  4 19:02:11 dcasrv01 journal: ovirt-ha-agent ovirt_hosted_engine_ha.agent.agent.Agent ERROR Error: 'Connection to storage server failed' - trying to restart agent
> Oct  4 19:02:11 dcasrv01 ovirt-ha-agent: ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Error: 'Connection to storage server failed' - trying to restart agent
> Oct  4 19:02:12 dcasrv01 etc-glusterfs-glusterd.vol[2252]: [2016-10-04 18:02:12.384611] C [MSGID: 106003] [glusterd-server-quorum.c:346:glusterd_do_volume_quorum_action] 0-management: Server quorum regained for volume data. Starting local bricks.
> Oct  4 19:02:12 dcasrv01 etc-glusterfs-glusterd.vol[2252]: [2016-10-04 18:02:12.388981] C [MSGID: 106003] [glusterd-server-quorum.c:346:glusterd_do_volume_quorum_action] 0-management: Server quorum regained for volume engine. Starting local bricks.
>
>
>
> And at that point VDSM started complaining that the hosted-engine-storage
> domain doesn't exist anymore:
>
> Oct  4 19:02:30 dcasrv01 journal: ovirt-ha-agent ovirt_hosted_engine_ha.lib.image.Image ERROR Error fetching volumes list: Storage domain does not exist: (u'bbb70623-194a-46d2-a164-76a4876ecaaf',)
> Oct  4 19:02:30 dcasrv01 ovirt-ha-agent: ERROR:ovirt_hosted_engine_ha.lib.image.Image:Error fetching volumes list: Storage domain does not exist: (u'bbb70623-194a-46d2-a164-76a4876ecaaf',)
>
>
>
> I see from the logs that the ovirt-ha-agent is trying to mount the
> hosted-engine storage domain as:
>
> /usr/bin/mount -t glusterfs -o backup-volfile-servers=dcastor02:dcastor03
> dcastor01:engine /rhev/data-center/mnt/glusterSD/dcastor01:engine.
>
>
>
> Pointing to dcastor01, dcastor02 and dcastor03 while your server is
> dcasrv01.
>
> But at the same time it seems that dcasrv01 also has local bricks for the
> same engine volume.
>
>
>
> So, is dcasrv01 just an alias for dcastor01? If not, you probably have some
> issue with the configuration of your gluster volume.
>
>
>
>
>
>
>
> From broker.log
>
>
>
> Thread-169::ERROR::2016-10-04 22:44:16,189::storage_broker::138::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(get_raw_stats_for_service_type) Failed to read metadata from /rhev/data-center/mnt/glusterSD/dcastor01:engine/bbb70623-194a-46d2-a164-76a4876ecaaf/ha_agent/hosted-engine.metadata
>
>
>
> [root at dcasrv01 ovirt-hosted-engine-ha]# ls -al /rhev/data-center/mnt/glusterSD/dcastor01\:engine/bbb70623-194a-46d2-a164-76a4876ecaaf/ha_agent/
> total 9
> drwxrwx---. 2 vdsm kvm 4096 Oct  3 17:27 .
> drwxr-xr-x. 5 vdsm kvm 4096 Oct  3 17:17 ..
> lrwxrwxrwx. 1 vdsm kvm  132 Oct  3 17:27 hosted-engine.lockspace -> /var/run/vdsm/storage/bbb70623-194a-46d2-a164-76a4876ecaaf/23d81b73-bcb7-4742-abde-128522f43d78/11d6a3e1-1817-429d-b2e0-9051a3cf41a4
> lrwxrwxrwx. 1 vdsm kvm  132 Oct  3 17:27 hosted-engine.metadata -> /var/run/vdsm/storage/bbb70623-194a-46d2-a164-76a4876ecaaf/fd44dbf9-473a-496a-9996-c8abe3278390/cee9440c-4eb8-453b-bc04-c47e6f9cbc93
>
> [root at dcasrv01 /]# ls -al /var/run/vdsm/storage/bbb70623-194a-46d2-a164-76a4876ecaaf/
> ls: cannot access /var/run/vdsm/storage/bbb70623-194a-46d2-a164-76a4876ecaaf/: No such file or directory
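
The same check (a symlink is present but its target is missing) can be scripted so it is easy to repeat after each reboot. A minimal Python sketch using only the standard library follows. The function name is made up for illustration, and the demonstration runs in a throwaway temporary directory with a hypothetical target name, not against the real ha_agent path.

```python
import os
import tempfile


def dangling_symlinks(directory):
    """Return (link_path, target) pairs for symlinks whose target is missing."""
    broken = []
    for entry in sorted(os.listdir(directory)):
        path = os.path.join(directory, entry)
        # os.path.exists() follows symlinks, so it is False for a dangling link.
        if os.path.islink(path) and not os.path.exists(path):
            broken.append((path, os.readlink(path)))
    return broken


# Demonstration in a temporary directory: create one link to a missing target.
with tempfile.TemporaryDirectory() as tmp:
    os.symlink(
        os.path.join(tmp, "no-such-target"),
        os.path.join(tmp, "hosted-engine.metadata"),
    )
    for link, target in dangling_symlinks(tmp):
        print(f"{os.path.basename(link)} -> {target} (target missing)")
```

Pointing dangling_symlinks() at the ha_agent directory on the gluster mount would list the links whose targets are missing.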
>
>
>
> Though the file appears to be there
>
>
>
> Gluster is set up as xpool/engine
>
>
>
> [root at dcasrv01 fd44dbf9-473a-496a-9996-c8abe3278390]# pwd
> /xpool/engine/brick/bbb70623-194a-46d2-a164-76a4876ecaaf/images/fd44dbf9-473a-496a-9996-c8abe3278390
>
> [root at dcasrv01 fd44dbf9-473a-496a-9996-c8abe3278390]# ls -al
> total 2060
> drwxr-xr-x. 2 vdsm kvm    4096 Oct  3 17:17 .
> drwxr-xr-x. 6 vdsm kvm    4096 Oct  3 17:17 ..
> -rw-rw----. 2 vdsm kvm 1028096 Oct  3 20:48 cee9440c-4eb8-453b-bc04-c47e6f9cbc93
> -rw-rw----. 2 vdsm kvm 1048576 Oct  3 17:17 cee9440c-4eb8-453b-bc04-c47e6f9cbc93.lease
> -rw-r--r--. 2 vdsm kvm     283 Oct  3 17:17 cee9440c-4eb8-453b-bc04-c47e6f9cbc93.meta
>
>
>
>
>
>
> [root at dcasrv01 fd44dbf9-473a-496a-9996-c8abe3278390]# gluster volume info
>
>
>
> Volume Name: data
>
> Type: Replicate
>
> Volume ID: 54fbcafc-fed9-4bce-92ec-fa36cdcacbd4
>
> Status: Started
>
> Number of Bricks: 1 x (2 + 1) = 3
>
> Transport-type: tcp
>
> Bricks:
>
> Brick1: dcastor01:/xpool/data/brick
>
> Brick2: dcastor03:/xpool/data/brick
>
> Brick3: dcastor02:/xpool/data/bricky (arbiter)
>
> Options Reconfigured:
>
> performance.readdir-ahead: on
>
> performance.quick-read: off
>
> performance.read-ahead: off
>
> performance.io-cache: off
>
> performance.stat-prefetch: off
>
> cluster.eager-lock: enable
>
> network.remote-dio: enable
>
> cluster.quorum-type: auto
>
> cluster.server-quorum-type: server
>
> storage.owner-uid: 36
>
> storage.owner-gid: 36
>
>
>
> Volume Name: engine
>
> Type: Replicate
>
> Volume ID: dd4c692d-03aa-4fc6-9011-a8dad48dad96
>
> Status: Started
>
> Number of Bricks: 1 x (2 + 1) = 3
>
> Transport-type: tcp
>
> Bricks:
>
> Brick1: dcastor01:/xpool/engine/brick
>
> Brick2: dcastor02:/xpool/engine/brick
>
> Brick3: dcastor03:/xpool/engine/brick (arbiter)
>
> Options Reconfigured:
>
> performance.readdir-ahead: on
>
> performance.quick-read: off
>
> performance.read-ahead: off
>
> performance.io-cache: off
>
> performance.stat-prefetch: off
>
> cluster.eager-lock: enable
>
> network.remote-dio: enable
>
> cluster.quorum-type: auto
>
> cluster.server-quorum-type: server
>
> storage.owner-uid: 36
>
> storage.owner-gid: 36
>
>
>
> Volume Name: export
>
> Type: Replicate
>
> Volume ID: 23f14730-d264-4cc2-af60-196b943ecaf3
>
> Status: Started
>
> Number of Bricks: 1 x (2 + 1) = 3
>
> Transport-type: tcp
>
> Bricks:
>
> Brick1: dcastor02:/xpool/export/brick
>
> Brick2: dcastor03:/xpool/export/brick
>
> Brick3: dcastor01:/xpool/export/brick (arbiter)
>
> Options Reconfigured:
>
> performance.readdir-ahead: on
>
> storage.owner-uid: 36
>
> storage.owner-gid: 36
>
>
>
> Volume Name: iso
>
> Type: Replicate
>
> Volume ID: b2d3d7e2-9919-400b-8368-a0443d48e82a
>
> Status: Started
>
> Number of Bricks: 1 x (2 + 1) = 3
>
> Transport-type: tcp
>
> Bricks:
>
> Brick1: dcastor01:/xpool/iso/brick
>
> Brick2: dcastor02:/xpool/iso/brick
>
> Brick3: dcastor03:/xpool/iso/brick (arbiter)
>
> Options Reconfigured:
>
> performance.readdir-ahead: on
>
> storage.owner-uid: 36
>
> storage.owner-gid: 36
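
In the output above, only the data and engine volumes carry cluster.server-quorum-type: server, and those are the two volumes named in the "Server quorum lost" messages. To pull that out of a longer listing, something like the following Python sketch could be used. It assumes the "gluster volume info" text layout shown in this thread (with the quote markers and blank lines stripped), the function name is illustrative, and the sample is an abbreviated copy of two of the volumes.

```python
def parse_volume_options(text):
    """Return {volume_name: {option: value}} from 'gluster volume info' text."""
    volumes = {}
    current = None
    in_options = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Volume Name:"):
            name = line.split(":", 1)[1].strip()
            current = volumes.setdefault(name, {})
            in_options = False
        elif line == "Options Reconfigured:":
            in_options = True
        elif in_options and current is not None and ":" in line:
            key, value = line.split(":", 1)
            current[key.strip()] = value.strip()
    return volumes


# Abbreviated excerpt of the listing quoted in this thread.
SAMPLE = """\
Volume Name: data
Type: Replicate
Status: Started
Bricks:
Brick1: dcastor01:/xpool/data/brick
Options Reconfigured:
cluster.quorum-type: auto
cluster.server-quorum-type: server
storage.owner-uid: 36

Volume Name: iso
Type: Replicate
Status: Started
Bricks:
Brick1: dcastor01:/xpool/iso/brick
Options Reconfigured:
performance.readdir-ahead: on
storage.owner-uid: 36
"""

for name, options in parse_volume_options(SAMPLE).items():
    quorum = options.get("cluster.server-quorum-type", "not set")
    print(f"{name}: cluster.server-quorum-type = {quorum}")
```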
>
>
>
>
>
> [root at dcasrv01 fd44dbf9-473a-496a-9996-c8abe3278390]# gluster volume status
>
> Status of volume: data
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick dcastor01:/xpool/data/brick           49153     0          Y       3076
> Brick dcastor03:/xpool/data/brick           49153     0          Y       3019
> Brick dcastor02:/xpool/data/bricky          49153     0          Y       3857
> NFS Server on localhost                     2049      0          Y       3097
> Self-heal Daemon on localhost               N/A       N/A        Y       3088
> NFS Server on dcastor03                     2049      0          Y       3039
> Self-heal Daemon on dcastor03               N/A       N/A        Y       3114
> NFS Server on dcasrv02                      2049      0          Y       3871
> Self-heal Daemon on dcasrv02                N/A       N/A        Y       3864
>
> Task Status of Volume data
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
>
>
> Status of volume: engine
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick dcastor01:/xpool/engine/brick         49152     0          Y       3131
> Brick dcastor02:/xpool/engine/brick         49152     0          Y       3852
> Brick dcastor03:/xpool/engine/brick         49152     0          Y       2992
> NFS Server on localhost                     2049      0          Y       3097
> Self-heal Daemon on localhost               N/A       N/A        Y       3088
> NFS Server on dcastor03                     2049      0          Y       3039
> Self-heal Daemon on dcastor03               N/A       N/A        Y       3114
> NFS Server on dcasrv02                      2049      0          Y       3871
> Self-heal Daemon on dcasrv02                N/A       N/A        Y       3864
>
> Task Status of Volume engine
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
>
>
> Status of volume: export
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick dcastor02:/xpool/export/brick         49155     0          Y       3872
> Brick dcastor03:/xpool/export/brick         49155     0          Y       3147
> Brick dcastor01:/xpool/export/brick         49155     0          Y       3150
> NFS Server on localhost                     2049      0          Y       3097
> Self-heal Daemon on localhost               N/A       N/A        Y       3088
> NFS Server on dcastor03                     2049      0          Y       3039
> Self-heal Daemon on dcastor03               N/A       N/A        Y       3114
> NFS Server on dcasrv02                      2049      0          Y       3871
> Self-heal Daemon on dcasrv02                N/A       N/A        Y       3864
>
> Task Status of Volume export
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
>
>
> Status of volume: iso
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick dcastor01:/xpool/iso/brick            49154     0          Y       3152
> Brick dcastor02:/xpool/iso/brick            49154     0          Y       3881
> Brick dcastor03:/xpool/iso/brick            49154     0          Y       3146
> NFS Server on localhost                     2049      0          Y       3097
> Self-heal Daemon on localhost               N/A       N/A        Y       3088
> NFS Server on dcastor03                     2049      0          Y       3039
> Self-heal Daemon on dcastor03               N/A       N/A        Y       3114
> NFS Server on dcasrv02                      2049      0          Y       3871
> Self-heal Daemon on dcasrv02                N/A       N/A        Y       3864
>
> Task Status of Volume iso
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
>
>
>
> Thanks
>
>
>
> Jason
>
>
>
>
>
>
>
> *From:* users-bounces at ovirt.org [mailto:users-bounces at ovirt.org] *On
> Behalf Of *Jason Jeffrey
> *Sent:* 03 October 2016 18:40
> *To:* users at ovirt.org
>
>
> *Subject:* Re: [ovirt-users] 4.0 - 2nd node fails on deploy
>
>
>
> Hi,
>
>
>
> Setup log attached for primary
>
>
>
> Regards
>
>
>
> Jason
>
>
>
> *From:* Simone Tiraboschi [mailto:stirabos at redhat.com
> <stirabos at redhat.com>]
> *Sent:* 03 October 2016 09:27
> *To:* Jason Jeffrey <jason at sudo.co.uk>
> *Cc:* users <users at ovirt.org>
> *Subject:* Re: [ovirt-users] 4.0 - 2nd node fails on deploy
>
>
>
>
>
>
>
> On Mon, Oct 3, 2016 at 12:45 AM, Jason Jeffrey <jason at sudo.co.uk> wrote:
>
> Hi,
>
>
>
> I am trying to build a x3 HC cluster, with a self-hosted engine using
> gluster.
>
>
>
> I have successfully built the 1st node; however, when I attempt to run
> hosted-engine --deploy on node 2, I get the following error:
>
>
>
> [WARNING] A configuration file must be supplied to deploy Hosted Engine on an additional host.
> [ ERROR ] 'version' is not stored in the HE configuration image
> [ ERROR ] Unable to get the answer file from the shared storage
> [ ERROR ] Failed to execute stage 'Environment customization': Unable to get the answer file from the shared storage
> [ INFO  ] Stage: Clean up
> [ INFO  ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20161002232505.conf'
> [ INFO  ] Stage: Pre-termination
> [ INFO  ] Stage: Termination
> [ ERROR ] Hosted Engine deployment failed
>
>
>
> Looking at the failure in the log file..
>
>
>
> Can you please attach hosted-engine-setup logs from the first host?
>
>
>
>
>
> 2016-10-02 23:25:05 WARNING otopi.plugins.gr_he_common.core.remote_answerfile remote_answerfile._customization:151 A configuration file must be supplied to deploy Hosted Engine on an additional host.
> 2016-10-02 23:25:05 DEBUG otopi.plugins.gr_he_common.core.remote_answerfile remote_answerfile._fetch_answer_file:61 _fetch_answer_file
> 2016-10-02 23:25:05 DEBUG otopi.plugins.gr_he_common.core.remote_answerfile remote_answerfile._fetch_answer_file:69 fetching from: /rhev/data-center/mnt/glusterSD/dcastor02:engine/0a021563-91b5-4f49-9c6b-fff45e85a025/images/f055216c-02f9-4cd1-a22c-d6b56a0a8e9b/78cb2527-a2e2-489a-9fad-465a72221b37
> 2016-10-02 23:25:05 DEBUG otopi.plugins.gr_he_common.core.remote_answerfile heconflib._dd_pipe_tar:69 executing: 'sudo -u vdsm dd if=/rhev/data-center/mnt/glusterSD/dcastor02:engine/0a021563-91b5-4f49-9c6b-fff45e85a025/images/f055216c-02f9-4cd1-a22c-d6b56a0a8e9b/78cb2527-a2e2-489a-9fad-465a72221b37 bs=4k'
> 2016-10-02 23:25:05 DEBUG otopi.plugins.gr_he_common.core.remote_answerfile heconflib._dd_pipe_tar:70 executing: 'tar -tvf -'
> 2016-10-02 23:25:05 DEBUG otopi.plugins.gr_he_common.core.remote_answerfile heconflib._dd_pipe_tar:88 stdout:
> 2016-10-02 23:25:05 DEBUG otopi.plugins.gr_he_common.core.remote_answerfile heconflib._dd_pipe_tar:89 stderr:
> 2016-10-02 23:25:05 ERROR otopi.plugins.gr_he_common.core.remote_answerfile heconflib.validateConfImage:111 'version' is not stored in the HE configuration image
> 2016-10-02 23:25:05 ERROR otopi.plugins.gr_he_common.core.remote_answerfile remote_answerfile._fetch_answer_file:73 Unable to get the answer file from the shared storage
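
The log shows the setup piping the image through 'tar -tvf -', getting empty stdout, and then reporting that 'version' is not stored. A comparable check can be reproduced by hand with Python's tarfile module. This is a sketch only, not the hosted-engine-setup code: it assumes the configuration image is a plain tar archive and that the check looks for a member named version, which is inferred from the log messages rather than confirmed. The demo files and their contents are placeholders.

```python
import io
import os
import tarfile
import tempfile


def archive_member_names(path):
    """List member names of an uncompressed tar archive; [] if none readable."""
    try:
        with tarfile.open(path, "r:") as archive:
            return archive.getnames()
    except tarfile.ReadError:
        # Raised, for example, for an empty file or a file that is not a tar.
        return []


def has_version_member(path):
    """True if some member's base name is 'version' (assumed member name)."""
    names = archive_member_names(path)
    return any(name.rsplit("/", 1)[-1] == "version" for name in names)


# Demonstration with two throwaway files: a tar holding a 'version' member
# and a zero-filled file standing in for an image that was never written.
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.img")
    with tarfile.open(good, "w:") as archive:
        payload = b"placeholder\n"
        info = tarfile.TarInfo(name="version")
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))

    blank = os.path.join(tmp, "blank.img")
    with open(blank, "wb") as handle:
        handle.write(b"\0" * 4096)

    print("good:", has_version_member(good))
    print("blank:", has_version_member(blank))
```

Run as the vdsm user against the path in the log, an empty member list would match the empty stdout seen above.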
>
>
>
> Looking at the detected gluster path - /rhev/data-center/mnt/glusterSD/dcastor02:engine/0a021563-91b5-4f49-9c6b-fff45e85a025/images/f055216c-02f9-4cd1-a22c-d6b56a0a8e9b/
>
> [root at dcasrv02 ~]# ls -al /rhev/data-center/mnt/glusterSD/dcastor02:engine/0a021563-91b5-4f49-9c6b-fff45e85a025/images/f055216c-02f9-4cd1-a22c-d6b56a0a8e9b/
> total 1049609
> drwxr-xr-x. 2 vdsm kvm       4096 Oct  2 04:46 .
> drwxr-xr-x. 6 vdsm kvm       4096 Oct  2 04:46 ..
> -rw-rw----. 1 vdsm kvm 1073741824 Oct  2 04:46 78cb2527-a2e2-489a-9fad-465a72221b37
> -rw-rw----. 1 vdsm kvm    1048576 Oct  2 04:46 78cb2527-a2e2-489a-9fad-465a72221b37.lease
> -rw-r--r--. 1 vdsm kvm        294 Oct  2 04:46 78cb2527-a2e2-489a-9fad-465a72221b37.meta
>
>
>
>
> 78cb2527-a2e2-489a-9fad-465a72221b37 is a 1 GB file. Is this the engine
> VM?
>
>
>
> Copying the answers file from the primary (/etc/ovirt-hosted-engine/answers.conf)
> to node 2 and rerunning produces the same error :(
>
> (hosted-engine --deploy --config-append=/root/answers.conf)
>
>
>
> Also tried on node 3, same issues
>
>
>
> Happy to provide logs and other debugs
>
>
>
> Thanks
>
>
>
> Jason
>
> _______________________________________________
> Users mailing list
> Users at ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>