[ovirt-users] 4.0 - 2nd node fails on deploy

Simone Tiraboschi stirabos at redhat.com
Tue Oct 4 13:40:25 UTC 2016


On Tue, Oct 4, 2016 at 10:51 AM, Simone Tiraboschi <stirabos at redhat.com>
wrote:

>
>
> On Mon, Oct 3, 2016 at 11:56 PM, Jason Jeffrey <jason at sudo.co.uk> wrote:
>
>> Hi,
>>
>>
>>
>> Another problem has appeared, after rebooting the primary the VM will not
>> start.
>>
>>
>>
>> Appears the symlink is broken between gluster mount ref and vdsm
>>
>
> The first host was correctly deployed but it seems that you are facing some
> issue connecting to the storage.
> Can you please attach vdsm logs and /var/log/messages from the first host?
>

Thanks Jason,
I suspect that your issue is related to this:
Oct  4 18:24:39 dcasrv01 etc-glusterfs-glusterd.vol[2252]: [2016-10-04
17:24:39.522620] C [MSGID: 106002]
[glusterd-server-quorum.c:351:glusterd_do_volume_quorum_action]
0-management: Server quorum lost for volume data. Stopping local bricks.
Oct  4 18:24:39 dcasrv01 etc-glusterfs-glusterd.vol[2252]: [2016-10-04
17:24:39.523272] C [MSGID: 106002]
[glusterd-server-quorum.c:351:glusterd_do_volume_quorum_action]
0-management: Server quorum lost for volume engine. Stopping local bricks.

and for some time your gluster volume has been working.

But then:
Oct  4 19:02:09 dcasrv01 systemd: Started /usr/bin/mount -t glusterfs -o
backup-volfile-servers=dcastor02:dcastor03 dcastor01:engine
/rhev/data-center/mnt/glusterSD/dcastor01:engine.
Oct  4 19:02:09 dcasrv01 systemd: Starting /usr/bin/mount -t glusterfs -o
backup-volfile-servers=dcastor02:dcastor03 dcastor01:engine
/rhev/data-center/mnt/glusterSD/dcastor01:engine.
Oct  4 19:02:11 dcasrv01 ovirt-ha-agent:
/usr/lib/python2.7/site-packages/yajsonrpc/stomp.py:352:
DeprecationWarning: Dispatcher.pending is deprecated. Use
Dispatcher.socket.pending instead.
Oct  4 19:02:11 dcasrv01 ovirt-ha-agent: pending = getattr(dispatcher,
'pending', lambda: 0)
Oct  4 19:02:11 dcasrv01 ovirt-ha-agent:
/usr/lib/python2.7/site-packages/yajsonrpc/stomp.py:352:
DeprecationWarning: Dispatcher.pending is deprecated. Use
Dispatcher.socket.pending instead.
Oct  4 19:02:11 dcasrv01 ovirt-ha-agent: pending = getattr(dispatcher,
'pending', lambda: 0)
Oct  4 19:02:11 dcasrv01 journal: vdsm vds.dispatcher ERROR SSL error
during reading data: unexpected eof
Oct  4 19:02:11 dcasrv01 journal: ovirt-ha-agent
ovirt_hosted_engine_ha.agent.agent.Agent ERROR Error: 'Connection to
storage server failed' - trying to restart agent
Oct  4 19:02:11 dcasrv01 ovirt-ha-agent:
ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Error: 'Connection to
storage server failed' - trying to restart agent
Oct  4 19:02:12 dcasrv01 etc-glusterfs-glusterd.vol[2252]: [2016-10-04
18:02:12.384611] C [MSGID: 106003]
[glusterd-server-quorum.c:346:glusterd_do_volume_quorum_action]
0-management: Server quorum regained for volume data. Starting local bricks.
Oct  4 19:02:12 dcasrv01 etc-glusterfs-glusterd.vol[2252]: [2016-10-04
18:02:12.388981] C [MSGID: 106003]
[glusterd-server-quorum.c:346:glusterd_do_volume_quorum_action]
0-management: Server quorum regained for volume engine. Starting local
bricks.
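
To line up the quorum transitions against the agent errors, the glusterd entries can be pulled out of /var/log/messages with a few lines of Python. This is only a minimal sketch: it assumes each syslog entry is on a single line (the archive has wrapped them here), and the function name quorum_events and the sample list are made up for illustration.

```python
import re

# Pattern for the glusterd server-quorum messages quoted above, assuming each
# syslog entry sits on one line (the mail archive wrapped them).
QUORUM_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s.*"
    r"Server quorum (?P<state>lost|regained) for volume (?P<volume>[^\s.]+)\."
)


def quorum_events(lines):
    """Return (timestamp, host, volume, state) for every quorum transition."""
    events = []
    for line in lines:
        match = QUORUM_RE.search(line)
        if match:
            events.append(
                (match["stamp"], match["host"], match["volume"], match["state"])
            )
    return events


# Two entries from the excerpt above, rejoined, plus one unrelated line.
sample = [
    "Oct  4 18:24:39 dcasrv01 etc-glusterfs-glusterd.vol[2252]: "
    "[2016-10-04 17:24:39.522620] C [MSGID: 106002] "
    "[glusterd-server-quorum.c:351:glusterd_do_volume_quorum_action] "
    "0-management: Server quorum lost for volume data. Stopping local bricks.",
    "Oct  4 19:02:09 dcasrv01 systemd: Started /usr/bin/mount -t glusterfs",
    "Oct  4 19:02:12 dcasrv01 etc-glusterfs-glusterd.vol[2252]: "
    "[2016-10-04 18:02:12.388981] C [MSGID: 106003] "
    "[glusterd-server-quorum.c:346:glusterd_do_volume_quorum_action] "
    "0-management: Server quorum regained for volume engine. Starting local bricks.",
]

for event in quorum_events(sample):
    print(event)
```

On the host itself the same function can be fed an open file object for /var/log/messages instead of the sample list.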

And at that point VDSM started complaining that the hosted-engine-storage
domain doesn't exist anymore:
Oct  4 19:02:30 dcasrv01 journal: ovirt-ha-agent
ovirt_hosted_engine_ha.lib.image.Image ERROR Error fetching volumes list:
Storage domain does not exist: (u'bbb70623-194a-46d2-a164-76a4876ecaaf',)
Oct  4 19:02:30 dcasrv01 ovirt-ha-agent:
ERROR:ovirt_hosted_engine_ha.lib.image.Image:Error fetching volumes list:
Storage domain does not exist: (u'bbb70623-194a-46d2-a164-76a4876ecaaf',)

I see from the logs that the ovirt-ha-agent is trying to mount the
hosted-engine storage domain as:
/usr/bin/mount -t glusterfs -o backup-volfile-servers=dcastor02:dcastor03
dcastor01:engine /rhev/data-center/mnt/glusterSD/dcastor01:engine.

The mount points to dcastor01, dcastor02 and dcastor03, while your server is
dcasrv01.
But at the same time it seems that dcasrv01 also has local bricks for the
same engine volume.

So, is dcasrv01 just an alias for dcastor01? If not, you probably have some
issue with the configuration of your gluster volume.
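
One rough way to check the alias question is to compare what the two names resolve to. The sketch below is an assumption-laden illustration, not a definitive test: resolve and same_host are hypothetical helper names, and only socket.gethostbyname_ex from the Python standard library is relied on. Two names that share an address are very likely the same host; two names with different addresses may still be the same machine if, for example, the dcastor names live on a separate storage interface.

```python
import socket


def resolve(name, lookup=socket.gethostbyname_ex):
    """Return the set of IPv4 addresses for name (empty if it does not resolve)."""
    try:
        _hostname, _aliases, addresses = lookup(name)
    except OSError:  # socket.gaierror and socket.herror are subclasses of OSError
        return set()
    return set(addresses)


def same_host(name_a, name_b, lookup=socket.gethostbyname_ex):
    """True if both names resolve and share at least one address."""
    return bool(resolve(name_a, lookup) & resolve(name_b, lookup))


# Example, to be run on the affected host:
#   print(same_host("dcasrv01", "dcastor01"))
```

The lookup parameter exists only so the helper can be exercised without touching DNS; in normal use it is left at its default.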



>
>>
>> From broker.log
>>
>>
>>
>> Thread-169::ERROR::2016-10-04 22:44:16,189::storage_broker::138::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(get_raw_stats_for_service_type) Failed to read metadata from /rhev/data-center/mnt/glusterSD/dcastor01:engine/bbb70623-194a-46d2-a164-76a4876ecaaf/ha_agent/hosted-engine.metadata
>>
>>
>>
>> [root at dcasrv01 ovirt-hosted-engine-ha]# ls -al /rhev/data-center/mnt/glusterSD/dcastor01\:engine/bbb70623-194a-46d2-a164-76a4876ecaaf/ha_agent/
>>
>> total 9
>>
>> drwxrwx---. 2 vdsm kvm 4096 Oct  3 17:27 .
>>
>> drwxr-xr-x. 5 vdsm kvm 4096 Oct  3 17:17 ..
>>
>> lrwxrwxrwx. 1 vdsm kvm  132 Oct  3 17:27 hosted-engine.lockspace -> /var/run/vdsm/storage/bbb70623-194a-46d2-a164-76a4876ecaaf/23d81b73-bcb7-4742-abde-128522f43d78/11d6a3e1-1817-429d-b2e0-9051a3cf41a4
>>
>> lrwxrwxrwx. 1 vdsm kvm  132 Oct  3 17:27 hosted-engine.metadata -> /var/run/vdsm/storage/bbb70623-194a-46d2-a164-76a4876ecaaf/fd44dbf9-473a-496a-9996-c8abe3278390/cee9440c-4eb8-453b-bc04-c47e6f9cbc93
>>
>>
>>
>>
>> [root at dcasrv01 /]# ls -al /var/run/vdsm/storage/bbb70623-194a-46d2-a164-76a4876ecaaf/
>>
>> ls: cannot access /var/run/vdsm/storage/bbb70623-194a-46d2-a164-76a4876ecaaf/: No such file or directory
>>
>>
>>
>> Though file appears to be there
>>
>>
>>
>> Gluster is setup as xpool/engine
>>
>>
>>
>> [root at dcasrv01 fd44dbf9-473a-496a-9996-c8abe3278390]# pwd
>>
>> /xpool/engine/brick/bbb70623-194a-46d2-a164-76a4876ecaaf/images/fd44dbf9-473a-496a-9996-c8abe3278390
>>
>> [root at dcasrv01 fd44dbf9-473a-496a-9996-c8abe3278390]# ls -al
>>
>> total 2060
>>
>> drwxr-xr-x. 2 vdsm kvm    4096 Oct  3 17:17 .
>>
>> drwxr-xr-x. 6 vdsm kvm    4096 Oct  3 17:17 ..
>>
>> -rw-rw----. 2 vdsm kvm 1028096 Oct  3 20:48 cee9440c-4eb8-453b-bc04-c47e6f9cbc93
>>
>> -rw-rw----. 2 vdsm kvm 1048576 Oct  3 17:17 cee9440c-4eb8-453b-bc04-c47e6f9cbc93.lease
>>
>> -rw-r--r--. 2 vdsm kvm     283 Oct  3 17:17 cee9440c-4eb8-453b-bc04-c47e6f9cbc93.meta
>>
>>
>>
>>
>>
>>
>> [root at dcasrv01 fd44dbf9-473a-496a-9996-c8abe3278390]# gluster volume info
>>
>>
>>
>> Volume Name: data
>>
>> Type: Replicate
>>
>> Volume ID: 54fbcafc-fed9-4bce-92ec-fa36cdcacbd4
>>
>> Status: Started
>>
>> Number of Bricks: 1 x (2 + 1) = 3
>>
>> Transport-type: tcp
>>
>> Bricks:
>>
>> Brick1: dcastor01:/xpool/data/brick
>>
>> Brick2: dcastor03:/xpool/data/brick
>>
>> Brick3: dcastor02:/xpool/data/bricky (arbiter)
>>
>> Options Reconfigured:
>>
>> performance.readdir-ahead: on
>>
>> performance.quick-read: off
>>
>> performance.read-ahead: off
>>
>> performance.io-cache: off
>>
>> performance.stat-prefetch: off
>>
>> cluster.eager-lock: enable
>>
>> network.remote-dio: enable
>>
>> cluster.quorum-type: auto
>>
>> cluster.server-quorum-type: server
>>
>> storage.owner-uid: 36
>>
>> storage.owner-gid: 36
>>
>>
>>
>> Volume Name: engine
>>
>> Type: Replicate
>>
>> Volume ID: dd4c692d-03aa-4fc6-9011-a8dad48dad96
>>
>> Status: Started
>>
>> Number of Bricks: 1 x (2 + 1) = 3
>>
>> Transport-type: tcp
>>
>> Bricks:
>>
>> Brick1: dcastor01:/xpool/engine/brick
>>
>> Brick2: dcastor02:/xpool/engine/brick
>>
>> Brick3: dcastor03:/xpool/engine/brick (arbiter)
>>
>> Options Reconfigured:
>>
>> performance.readdir-ahead: on
>>
>> performance.quick-read: off
>>
>> performance.read-ahead: off
>>
>> performance.io-cache: off
>>
>> performance.stat-prefetch: off
>>
>> cluster.eager-lock: enable
>>
>> network.remote-dio: enable
>>
>> cluster.quorum-type: auto
>>
>> cluster.server-quorum-type: server
>>
>> storage.owner-uid: 36
>>
>> storage.owner-gid: 36
>>
>>
>>
>> Volume Name: export
>>
>> Type: Replicate
>>
>> Volume ID: 23f14730-d264-4cc2-af60-196b943ecaf3
>>
>> Status: Started
>>
>> Number of Bricks: 1 x (2 + 1) = 3
>>
>> Transport-type: tcp
>>
>> Bricks:
>>
>> Brick1: dcastor02:/xpool/export/brick
>>
>> Brick2: dcastor03:/xpool/export/brick
>>
>> Brick3: dcastor01:/xpool/export/brick (arbiter)
>>
>> Options Reconfigured:
>>
>> performance.readdir-ahead: on
>>
>> storage.owner-uid: 36
>>
>> storage.owner-gid: 36
>>
>>
>>
>> Volume Name: iso
>>
>> Type: Replicate
>>
>> Volume ID: b2d3d7e2-9919-400b-8368-a0443d48e82a
>>
>> Status: Started
>>
>> Number of Bricks: 1 x (2 + 1) = 3
>>
>> Transport-type: tcp
>>
>> Bricks:
>>
>> Brick1: dcastor01:/xpool/iso/brick
>>
>> Brick2: dcastor02:/xpool/iso/brick
>>
>> Brick3: dcastor03:/xpool/iso/brick (arbiter)
>>
>> Options Reconfigured:
>>
>> performance.readdir-ahead: on
>>
>> storage.owner-uid: 36
>>
>> storage.owner-gid: 36
>>
>>
>>
>>
>>
>> [root at dcasrv01 fd44dbf9-473a-496a-9996-c8abe3278390]# gluster volume status
>>
>> Status of volume: data
>>
>> Gluster process                             TCP Port  RDMA Port  Online  Pid
>> ------------------------------------------------------------------------------
>> Brick dcastor01:/xpool/data/brick           49153     0          Y       3076
>> Brick dcastor03:/xpool/data/brick           49153     0          Y       3019
>> Brick dcastor02:/xpool/data/bricky          49153     0          Y       3857
>> NFS Server on localhost                     2049      0          Y       3097
>> Self-heal Daemon on localhost               N/A       N/A        Y       3088
>> NFS Server on dcastor03                     2049      0          Y       3039
>> Self-heal Daemon on dcastor03               N/A       N/A        Y       3114
>> NFS Server on dcasrv02                      2049      0          Y       3871
>> Self-heal Daemon on dcasrv02                N/A       N/A        Y       3864
>>
>> Task Status of Volume data
>>
>> ------------------------------------------------------------------------------
>>
>> There are no active volume tasks
>>
>>
>>
>> Status of volume: engine
>>
>> Gluster process                             TCP Port  RDMA Port  Online  Pid
>> ------------------------------------------------------------------------------
>> Brick dcastor01:/xpool/engine/brick         49152     0          Y       3131
>> Brick dcastor02:/xpool/engine/brick         49152     0          Y       3852
>> Brick dcastor03:/xpool/engine/brick         49152     0          Y       2992
>> NFS Server on localhost                     2049      0          Y       3097
>> Self-heal Daemon on localhost               N/A       N/A        Y       3088
>> NFS Server on dcastor03                     2049      0          Y       3039
>> Self-heal Daemon on dcastor03               N/A       N/A        Y       3114
>> NFS Server on dcasrv02                      2049      0          Y       3871
>> Self-heal Daemon on dcasrv02                N/A       N/A        Y       3864
>>
>> Task Status of Volume engine
>>
>> ------------------------------------------------------------------------------
>>
>> There are no active volume tasks
>>
>>
>>
>> Status of volume: export
>>
>> Gluster process                             TCP Port  RDMA Port  Online  Pid
>> ------------------------------------------------------------------------------
>> Brick dcastor02:/xpool/export/brick         49155     0          Y       3872
>> Brick dcastor03:/xpool/export/brick         49155     0          Y       3147
>> Brick dcastor01:/xpool/export/brick         49155     0          Y       3150
>> NFS Server on localhost                     2049      0          Y       3097
>> Self-heal Daemon on localhost               N/A       N/A        Y       3088
>> NFS Server on dcastor03                     2049      0          Y       3039
>> Self-heal Daemon on dcastor03               N/A       N/A        Y       3114
>> NFS Server on dcasrv02                      2049      0          Y       3871
>> Self-heal Daemon on dcasrv02                N/A       N/A        Y       3864
>>
>> Task Status of Volume export
>>
>> ------------------------------------------------------------------------------
>>
>> There are no active volume tasks
>>
>>
>>
>> Status of volume: iso
>>
>> Gluster process                             TCP Port  RDMA Port  Online  Pid
>> ------------------------------------------------------------------------------
>> Brick dcastor01:/xpool/iso/brick            49154     0          Y       3152
>> Brick dcastor02:/xpool/iso/brick            49154     0          Y       3881
>> Brick dcastor03:/xpool/iso/brick            49154     0          Y       3146
>> NFS Server on localhost                     2049      0          Y       3097
>> Self-heal Daemon on localhost               N/A       N/A        Y       3088
>> NFS Server on dcastor03                     2049      0          Y       3039
>> Self-heal Daemon on dcastor03               N/A       N/A        Y       3114
>> NFS Server on dcasrv02                      2049      0          Y       3871
>> Self-heal Daemon on dcasrv02                N/A       N/A        Y       3864
>>
>> Task Status of Volume iso
>>
>> ------------------------------------------------------------------------------
>>
>> There are no active volume tasks
>>
>>
>>
>>
>> Thanks
>>
>>
>>
>> Jason
>>
>>
>>
>>
>>
>>
>>
>> *From:* users-bounces at ovirt.org [mailto:users-bounces at ovirt.org] *On
>> Behalf Of *Jason Jeffrey
>> *Sent:* 03 October 2016 18:40
>> *To:* users at ovirt.org
>>
>> *Subject:* Re: [ovirt-users] 4.0 - 2nd node fails on deploy
>>
>>
>>
>> Hi,
>>
>>
>>
>> Setup log attached for primary
>>
>>
>>
>> Regards
>>
>>
>>
>> Jason
>>
>>
>>
>> *From:* Simone Tiraboschi [mailto:stirabos at redhat.com
>> <stirabos at redhat.com>]
>> *Sent:* 03 October 2016 09:27
>> *To:* Jason Jeffrey <jason at sudo.co.uk>
>> *Cc:* users <users at ovirt.org>
>> *Subject:* Re: [ovirt-users] 4.0 - 2nd node fails on deploy
>>
>>
>>
>>
>>
>>
>>
>> On Mon, Oct 3, 2016 at 12:45 AM, Jason Jeffrey <jason at sudo.co.uk> wrote:
>>
>> Hi,
>>
>>
>>
>> I am trying to build a x3 HC cluster, with a self hosted engine using
>> gluster.
>>
>>
>>
>> I have successfully built the 1st node, however when I attempt to run
>> hosted-engine --deploy on node 2, I get the following error
>>
>>
>>
>> [WARNING] A configuration file must be supplied to deploy Hosted Engine
>> on an additional host.
>>
>> [ ERROR ] 'version' is not stored in the HE configuration image
>>
>> [ ERROR ] Unable to get the answer file from the shared storage
>>
>> [ ERROR ] Failed to execute stage 'Environment customization': Unable to
>> get the answer file from the shared storage
>>
>> [ INFO  ] Stage: Clean up
>>
>> [ INFO  ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20161002232505.conf'
>>
>> [ INFO  ] Stage: Pre-termination
>>
>> [ INFO  ] Stage: Termination
>>
>> [ ERROR ] Hosted Engine deployment failed
>>
>>
>>
>> Looking at the failure in the log file..
>>
>>
>>
>> Can you please attach hosted-engine-setup logs from the first host?
>>
>>
>>
>>
>>
>> 2016-10-02 23:25:05 WARNING otopi.plugins.gr_he_common.core.remote_answerfile remote_answerfile._customization:151 A configuration file must be supplied to deploy Hosted Engine on an additional host.
>>
>> 2016-10-02 23:25:05 DEBUG otopi.plugins.gr_he_common.core.remote_answerfile remote_answerfile._fetch_answer_file:61 _fetch_answer_file
>>
>> 2016-10-02 23:25:05 DEBUG otopi.plugins.gr_he_common.core.remote_answerfile remote_answerfile._fetch_answer_file:69 fetching from: /rhev/data-center/mnt/glusterSD/dcastor02:engine/0a021563-91b5-4f49-9c6b-fff45e85a025/images/f055216c-02f9-4cd1-a22c-d6b56a0a8e9b/78cb2527-a2e2-489a-9fad-465a72221b37
>>
>> 2016-10-02 23:25:05 DEBUG otopi.plugins.gr_he_common.core.remote_answerfile heconflib._dd_pipe_tar:69 executing: 'sudo -u vdsm dd if=/rhev/data-center/mnt/glusterSD/dcastor02:engine/0a021563-91b5-4f49-9c6b-fff45e85a025/images/f055216c-02f9-4cd1-a22c-d6b56a0a8e9b/78cb2527-a2e2-489a-9fad-465a72221b37 bs=4k'
>>
>> 2016-10-02 23:25:05 DEBUG otopi.plugins.gr_he_common.core.remote_answerfile heconflib._dd_pipe_tar:70 executing: 'tar -tvf -'
>>
>> 2016-10-02 23:25:05 DEBUG otopi.plugins.gr_he_common.core.remote_answerfile heconflib._dd_pipe_tar:88 stdout:
>>
>> 2016-10-02 23:25:05 DEBUG otopi.plugins.gr_he_common.core.remote_answerfile heconflib._dd_pipe_tar:89 stderr:
>>
>> 2016-10-02 23:25:05 ERROR otopi.plugins.gr_he_common.core.remote_answerfile heconflib.validateConfImage:111 'version' is not stored in the HE configuration image
>>
>> 2016-10-02 23:25:05 ERROR otopi.plugins.gr_he_common.core.remote_answerfile remote_answerfile._fetch_answer_file:73 Unable to get the answer file from the shared storage
>>
>>
>>
>> Looking at the detected gluster path - /rhev/data-center/mnt/glusterSD/dcastor02:engine/0a021563-91b5-4f49-9c6b-fff45e85a025/images/f055216c-02f9-4cd1-a22c-d6b56a0a8e9b/
>>
>>
>>
>> [root at dcasrv02 ~]# ls -al /rhev/data-center/mnt/glusterSD/dcastor02:engine/0a021563-91b5-4f49-9c6b-fff45e85a025/images/f055216c-02f9-4cd1-a22c-d6b56a0a8e9b/
>>
>> total 1049609
>>
>> drwxr-xr-x. 2 vdsm kvm       4096 Oct  2 04:46 .
>>
>> drwxr-xr-x. 6 vdsm kvm       4096 Oct  2 04:46 ..
>>
>> -rw-rw----. 1 vdsm kvm 1073741824 Oct  2 04:46 78cb2527-a2e2-489a-9fad-465a72221b37
>>
>> -rw-rw----. 1 vdsm kvm    1048576 Oct  2 04:46 78cb2527-a2e2-489a-9fad-465a72221b37.lease
>>
>> -rw-r--r--. 1 vdsm kvm        294 Oct  2 04:46 78cb2527-a2e2-489a-9fad-465a72221b37.meta
>>
>>
>>
>> 78cb2527-a2e2-489a-9fad-465a72221b37 is  a 1 GB file, is this the engine
>> VM ?
>>
>>
>>
>> Copying the answers file from primary (/etc/ovirt-hosted-engine/answers.conf)
>> to node 2 and rerunning produces the same error :(
>>
>> (hosted-engine --deploy  --config-append=/root/answers.conf )
>>
>>
>>
>> Also tried on node 3, same issues
>>
>>
>>
>> Happy to provide logs and other debugs
>>
>>
>>
>> Thanks
>>
>>
>>
>> Jason
>>
>> _______________________________________________
>> Users mailing list
>> Users at ovirt.org
>> http://lists.ovirt.org/mailman/listinfo/users
>>
>