[ovirt-users] Hosted engine

Joel Diaz mrjoeldiaz at gmail.com
Fri Jun 16 12:44:59 UTC 2017


Good morning,

Info requested below.

[root@ovirt-hyp-02 ~]# hosted-engine --vm-start

Exception in thread Client localhost:54321 (most likely raised during
interpreter shutdown):VM exists and its status is Up



[root@ovirt-hyp-02 ~]# ping engine

PING engine.example.lan (192.168.170.149) 56(84) bytes of data.

From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=1 Destination Host Unreachable

From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=2 Destination Host Unreachable

From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=3 Destination Host Unreachable

From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=4 Destination Host Unreachable

From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=5 Destination Host Unreachable

From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=6 Destination Host Unreachable

From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=7 Destination Host Unreachable

From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=8 Destination Host Unreachable





[root@ovirt-hyp-02 ~]# gluster volume status engine

Status of volume: engine

Gluster process                                        TCP Port  RDMA Port  Online  Pid
----------------------------------------------------------------------------------------
Brick 192.168.170.141:/gluster_bricks/engine/engine    49159     0          Y       1799
Brick 192.168.170.143:/gluster_bricks/engine/engine    49159     0          Y       2900
Self-heal Daemon on localhost                          N/A       N/A        Y       2914
Self-heal Daemon on ovirt-hyp-01.example.lan           N/A       N/A        Y       1854

Task Status of Volume engine
----------------------------------------------------------------------------------------
There are no active volume tasks
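The status output shows both data bricks and both self-heal daemons online for the engine volume. If you want to script that same check across volumes, a sketch that flags any brick whose Online column is not Y (the helper name is mine, not a gluster subcommand):

```shell
#!/bin/sh
# offline_bricks: read `gluster volume status <vol>` text on stdin and
# print the host:/path of every brick whose Online column is not "Y".
# Brick rows look like:
#   Brick <host>:/<path>   <tcp-port>  <rdma-port>  <Y/N>  <pid>
offline_bricks() {
    awk '/^Brick / && $(NF-1) != "Y" { print $2 }'
}
# e.g. gluster volume status engine | offline_bricks
```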



[root@ovirt-hyp-02 ~]# gluster volume heal engine info

Brick 192.168.170.141:/gluster_bricks/engine/engine

Status: Connected

Number of entries: 0



Brick 192.168.170.143:/gluster_bricks/engine/engine

Status: Connected

Number of entries: 0



Brick 192.168.170.147:/gluster_bricks/engine/engine

Status: Connected

Number of entries: 0
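Zero pending entries on all three bricks means the engine volume is healthy, so split-brain is unlikely to be what keeps the engine down. To watch that number while the HA agents cycle, the same output can be reduced to a single count (the helper is an illustration, not a gluster subcommand):

```shell
#!/bin/sh
# heal_entries: read `gluster volume heal <volume> info` text on stdin
# and print the total number of pending heal entries across all bricks;
# 0 means nothing is waiting to heal.
heal_entries() {
    awk -F': ' '/^Number of entries:/ { sum += $2 } END { print sum + 0 }'
}
# e.g. gluster volume heal engine info | heal_entries
```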



[root@ovirt-hyp-02 ~]# cat /var/log/glusterfs/rhev-data-center-mnt-glusterSD-ovirt-hyp-01.example.lan\:engine.log

[2017-06-15 13:37:02.009436] I [glusterfsd-mgmt.c:1600:mgmt_getspec_cbk]
0-glusterfs: No change in volfile, continuing





Each of the three hosts sends out the following notifications about every 15 minutes.

Hosted engine host: ovirt-hyp-01.example.lan changed state: EngineDown-EngineStart.

Hosted engine host: ovirt-hyp-01.example.lan changed state: EngineStart-EngineStarting.

Hosted engine host: ovirt-hyp-01.example.lan changed state: EngineStarting-EngineForceStop.

Hosted engine host: ovirt-hyp-01.example.lan changed state: EngineForceStop-EngineDown.
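That Down -> Start -> Starting -> ForceStop -> Down loop is the HA state machine starting the VM, failing its engine liveliness check, and force-stopping it on every cycle. The reason is usually logged within a line or two of each transition in agent.log; a hedged sketch of pulling that context out (the log path is the ovirt-hosted-engine-ha default, and the function is just a wrapper for illustration):

```shell
#!/bin/sh
# force_stop_context: print every EngineForceStop transition in an
# agent.log-style file with one line of context on each side, which is
# normally where the failed liveliness/health check is named.
force_stop_context() {
    grep -B1 -A1 'EngineForceStop' "$1"
}
# typical use: force_stop_context /var/log/ovirt-hosted-engine-ha/agent.log
```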

Please let me know if you need any additional information.

Thank you,

Joel



On Jun 16, 2017 2:52 AM, "Sahina Bose" <sabose at redhat.com> wrote:

> From the agent.log,
> MainThread::INFO::2017-06-15 11:16:50,583::states::473::
> ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Engine
> vm is running on host ovirt-hyp-02.reis.com (id 2)
>
> It looks like the HE VM was started successfully. Is it possible that the
> ovirt-engine service could not be started on the HE VM? Could you try to
> start the HE VM using the command below and then log into the VM console?
> #hosted-engine --vm-start
>
> Also, please check
> # gluster volume status engine
> # gluster volume heal engine info
>
> Please also check if there are errors in gluster mount logs - at
> /var/log/glusterfs/rhev-data-center-mnt..<engine>.log
>
>
> On Thu, Jun 15, 2017 at 8:53 PM, Joel Diaz <mrjoeldiaz at gmail.com> wrote:
>
>> Sorry. I forgot to attached the requested logs in the previous email.
>>
>> Thanks,
>>
>> On Jun 15, 2017 9:38 AM, "Joel Diaz" <mrjoeldiaz at gmail.com> wrote:
>>
>> Good morning,
>>
>> Requested info below. Along with some additional info.
>>
>> You'll notice the data volume is not mounted.
>>
>> Any help in getting HE back running would be greatly appreciated.
>>
>> Thank you,
>>
>> Joel
>>
>> [root@ovirt-hyp-01 ~]# hosted-engine --vm-status
>>
>>
>>
>>
>>
>> --== Host 1 status ==--
>>
>>
>>
>> conf_on_shared_storage             : True
>>
>> Status up-to-date                  : False
>>
>> Hostname                           : ovirt-hyp-01.example.lan
>>
>> Host ID                            : 1
>>
>> Engine status                      : unknown stale-data
>>
>> Score                              : 3400
>>
>> stopped                            : False
>>
>> Local maintenance                  : False
>>
>> crc32                              : 5558a7d3
>>
>> local_conf_timestamp               : 20356
>>
>> Host timestamp                     : 20341
>>
>> Extra metadata (valid at timestamp):
>>
>>         metadata_parse_version=1
>>
>>         metadata_feature_version=1
>>
>>         timestamp=20341 (Fri Jun  9 14:38:57 2017)
>>
>>         host-id=1
>>
>>         score=3400
>>
>>         vm_conf_refresh_time=20356 (Fri Jun  9 14:39:11 2017)
>>
>>         conf_on_shared_storage=True
>>
>>         maintenance=False
>>
>>         state=EngineDown
>>
>>         stopped=False
>>
>>
>>
>>
>>
>> --== Host 2 status ==--
>>
>>
>>
>> conf_on_shared_storage             : True
>>
>> Status up-to-date                  : False
>>
>> Hostname                           : ovirt-hyp-02.example.lan
>>
>> Host ID                            : 2
>>
>> Engine status                      : unknown stale-data
>>
>> Score                              : 3400
>>
>> stopped                            : False
>>
>> Local maintenance                  : False
>>
>> crc32                              : 936d4cf3
>>
>> local_conf_timestamp               : 20351
>>
>> Host timestamp                     : 20337
>>
>> Extra metadata (valid at timestamp):
>>
>>         metadata_parse_version=1
>>
>>         metadata_feature_version=1
>>
>>         timestamp=20337 (Fri Jun  9 14:39:03 2017)
>>
>>         host-id=2
>>
>>         score=3400
>>
>>         vm_conf_refresh_time=20351 (Fri Jun  9 14:39:17 2017)
>>
>>         conf_on_shared_storage=True
>>
>>         maintenance=False
>>
>>         state=EngineDown
>>
>>         stopped=False
>>
>>
>>
>>
>>
>> --== Host 3 status ==--
>>
>>
>>
>> conf_on_shared_storage             : True
>>
>> Status up-to-date                  : False
>>
>> Hostname                           : ovirt-hyp-03.example.lan
>>
>> Host ID                            : 3
>>
>> Engine status                      : unknown stale-data
>>
>> Score                              : 3400
>>
>> stopped                            : False
>>
>> Local maintenance                  : False
>>
>> crc32                              : f646334e
>>
>> local_conf_timestamp               : 20391
>>
>> Host timestamp                     : 20377
>>
>> Extra metadata (valid at timestamp):
>>
>>         metadata_parse_version=1
>>
>>         metadata_feature_version=1
>>
>>         timestamp=20377 (Fri Jun  9 14:39:37 2017)
>>
>>         host-id=3
>>
>>         score=3400
>>
>>         vm_conf_refresh_time=20391 (Fri Jun  9 14:39:51 2017)
>>
>>         conf_on_shared_storage=True
>>
>>         maintenance=False
>>
>>         state=EngineStop
>>
>>         stopped=False
>>
>>         timeout=Thu Jan  1 00:43:08 1970
>>
>>
>>
>>
>>
>> [root@ovirt-hyp-01 ~]# gluster peer status
>>
>> Number of Peers: 2
>>
>>
>>
>> Hostname: 192.168.170.143
>>
>> Uuid: b2b30d05-cf91-4567-92fd-022575e082f5
>>
>> State: Peer in Cluster (Connected)
>>
>> Other names:
>>
>> 10.0.0.2
>>
>>
>>
>> Hostname: 192.168.170.147
>>
>> Uuid: 4e50acc4-f3cb-422d-b499-fb5796a53529
>>
>> State: Peer in Cluster (Connected)
>>
>> Other names:
>>
>> 10.0.0.3
>>
>>
>>
>> [root@ovirt-hyp-01 ~]# gluster volume info all
>>
>>
>>
>> Volume Name: data
>>
>> Type: Replicate
>>
>> Volume ID: 1d6bb110-9be4-4630-ae91-36ec1cf6cc02
>>
>> Status: Started
>>
>> Snapshot Count: 0
>>
>> Number of Bricks: 1 x (2 + 1) = 3
>>
>> Transport-type: tcp
>>
>> Bricks:
>>
>> Brick1: 192.168.170.141:/gluster_bricks/data/data
>>
>> Brick2: 192.168.170.143:/gluster_bricks/data/data
>>
>> Brick3: 192.168.170.147:/gluster_bricks/data/data (arbiter)
>>
>> Options Reconfigured:
>>
>> nfs.disable: on
>>
>> performance.readdir-ahead: on
>>
>> transport.address-family: inet
>>
>> performance.quick-read: off
>>
>> performance.read-ahead: off
>>
>> performance.io-cache: off
>>
>> performance.stat-prefetch: off
>>
>> performance.low-prio-threads: 32
>>
>> network.remote-dio: off
>>
>> cluster.eager-lock: enable
>>
>> cluster.quorum-type: auto
>>
>> cluster.server-quorum-type: server
>>
>> cluster.data-self-heal-algorithm: full
>>
>> cluster.locking-scheme: granular
>>
>> cluster.shd-max-threads: 8
>>
>> cluster.shd-wait-qlength: 10000
>>
>> features.shard: on
>>
>> user.cifs: off
>>
>> storage.owner-uid: 36
>>
>> storage.owner-gid: 36
>>
>> network.ping-timeout: 30
>>
>> performance.strict-o-direct: on
>>
>> cluster.granular-entry-heal: enable
>>
>>
>>
>> Volume Name: engine
>>
>> Type: Replicate
>>
>> Volume ID: b160f0b2-8bd3-4ff2-a07c-134cab1519dd
>>
>> Status: Started
>>
>> Snapshot Count: 0
>>
>> Number of Bricks: 1 x (2 + 1) = 3
>>
>> Transport-type: tcp
>>
>> Bricks:
>>
>> Brick1: 192.168.170.141:/gluster_bricks/engine/engine
>>
>> Brick2: 192.168.170.143:/gluster_bricks/engine/engine
>>
>> Brick3: 192.168.170.147:/gluster_bricks/engine/engine (arbiter)
>>
>> Options Reconfigured:
>>
>> nfs.disable: on
>>
>> performance.readdir-ahead: on
>>
>> transport.address-family: inet
>>
>> performance.quick-read: off
>>
>> performance.read-ahead: off
>>
>> performance.io-cache: off
>>
>> performance.stat-prefetch: off
>>
>> performance.low-prio-threads: 32
>>
>> network.remote-dio: off
>>
>> cluster.eager-lock: enable
>>
>> cluster.quorum-type: auto
>>
>> cluster.server-quorum-type: server
>>
>> cluster.data-self-heal-algorithm: full
>>
>> cluster.locking-scheme: granular
>>
>> cluster.shd-max-threads: 8
>>
>> cluster.shd-wait-qlength: 10000
>>
>> features.shard: on
>>
>> user.cifs: off
>>
>> storage.owner-uid: 36
>>
>> storage.owner-gid: 36
>>
>> network.ping-timeout: 30
>>
>> performance.strict-o-direct: on
>>
>> cluster.granular-entry-heal: enable
>>
>>
>>
>>
>>
>> [root@ovirt-hyp-01 ~]# df -h
>>
>> Filesystem                                    Size  Used Avail Use%
>> Mounted on
>>
>> /dev/mapper/centos_ovirt--hyp--01-root         50G  4.1G   46G   9% /
>>
>> devtmpfs                                      7.7G     0  7.7G   0% /dev
>>
>> tmpfs                                         7.8G     0  7.8G   0%
>> /dev/shm
>>
>> tmpfs                                         7.8G  8.7M  7.7G   1% /run
>>
>> tmpfs                                         7.8G     0  7.8G   0%
>> /sys/fs/cgroup
>>
>> /dev/mapper/centos_ovirt--hyp--01-home         61G   33M   61G   1% /home
>>
>> /dev/mapper/gluster_vg_sdb-gluster_lv_engine   50G  7.6G   43G  16%
>> /gluster_bricks/engine
>>
>> /dev/mapper/gluster_vg_sdb-gluster_lv_data    730G  157G  574G  22%
>> /gluster_bricks/data
>>
>> /dev/sda1                                     497M  173M  325M  35% /boot
>>
>> ovirt-hyp-01.example.lan:engine                   50G  7.6G   43G  16%
>> /rhev/data-center/mnt/glusterSD/ovirt-hyp-01.example.lan:engine
>>
>> tmpfs                                         1.6G     0  1.6G   0%
>> /run/user/0
>>
>>
>>
>> [root@ovirt-hyp-01 ~]# systemctl list-unit-files|grep ovirt
>>
>> ovirt-ha-agent.service                        enabled
>>
>> ovirt-ha-broker.service                       enabled
>>
>> ovirt-imageio-daemon.service                  disabled
>>
>> ovirt-vmconsole-host-sshd.service             enabled
>>
>>
>>
>> [root@ovirt-hyp-01 ~]# systemctl status ovirt-ha-agent.service
>>
>> ● ovirt-ha-agent.service - oVirt Hosted Engine High Availability
>> Monitoring Agent
>>
>>   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service;
>> enabled; vendor preset: disabled)
>>
>>    Active: active (running) since Thu 2017-06-15 08:56:15 EDT; 21min ago
>>
>> Main PID: 3150 (ovirt-ha-agent)
>>
>>    CGroup: /system.slice/ovirt-ha-agent.service
>>
>>            └─3150 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent
>> --no-daemon
>>
>>
>>
>> Jun 15 08:56:15 ovirt-hyp-01.example.lan systemd[1]: Started oVirt Hosted
>> Engine High Availability Monitoring Agent.
>>
>> Jun 15 08:56:15 ovirt-hyp-01.example.lan systemd[1]: Starting oVirt
>> Hosted Engine High Availability Monitoring Agent...
>>
>> Jun 15 09:17:18 ovirt-hyp-01.example.lan ovirt-ha-agent[3150]:
>> ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine
>> ERROR Engine VM stopped on localhost
>>
>> [root@ovirt-hyp-01 ~]# systemctl status ovirt-ha-broker.service
>>
>> ● ovirt-ha-broker.service - oVirt Hosted Engine High Availability
>> Communications Broker
>>
>>    Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service;
>> enabled; vendor preset: disabled)
>>
>>    Active: active (running) since Thu 2017-06-15 08:54:06 EDT; 24min ago
>>
>> Main PID: 968 (ovirt-ha-broker)
>>
>>    CGroup: /system.slice/ovirt-ha-broker.service
>>
>>            └─968 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker
>> --no-daemon
>>
>>
>>
>> Jun 15 08:54:06 ovirt-hyp-01.example.lan systemd[1]: Started oVirt Hosted
>> Engine High Availability Communications Broker.
>>
>> Jun 15 08:54:06 ovirt-hyp-01.example.lan systemd[1]: Starting oVirt
>> Hosted Engine High Availability Communications Broker...
>>
>> Jun 15 08:56:16 ovirt-hyp-01.example.lan ovirt-ha-broker[968]:
>> ovirt-ha-broker ovirt_hosted_engine_ha.broker.listener.ConnectionHandler
>> ERROR Error handling request, data: '...1b55bcf76'
>>
>>                                                             Traceback
>> (most recent call last):
>>
>>                                                               File
>> "/usr/lib/python2.7/site-packages/ovirt...
>>
>> Hint: Some lines were ellipsized, use -l to show in full.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> [root@ovirt-hyp-01 ~]# systemctl restart ovirt-ha-agent.service
>>
>> [root@ovirt-hyp-01 ~]# systemctl status ovirt-ha-agent.service
>>
>> ● ovirt-ha-agent.service - oVirt Hosted Engine High Availability
>> Monitoring Agent
>>
>>    Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service;
>> enabled; vendor preset: disabled)
>>
>>    Active: active (running) since Thu 2017-06-15 09:19:21 EDT; 26s ago
>>
>> Main PID: 8563 (ovirt-ha-agent)
>>
>>    CGroup: /system.slice/ovirt-ha-agent.service
>>
>>            └─8563 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent
>> --no-daemon
>>
>>
>>
>> Jun 15 09:19:21 ovirt-hyp-01.example.lan systemd[1]: Started oVirt Hosted
>> Engine High Availability Monitoring Agent.
>>
>> Jun 15 09:19:21 ovirt-hyp-01.example.lan systemd[1]: Starting oVirt
>> Hosted Engine High Availability Monitoring Agent...
>>
>> [root@ovirt-hyp-01 ~]# systemctl restart ovirt-ha-broker.service
>>
>> [root@ovirt-hyp-01 ~]# systemctl status ovirt-ha-broker.service
>>
>> ● ovirt-ha-broker.service - oVirt Hosted Engine High Availability
>> Communications Broker
>>
>>    Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service;
>> enabled; vendor preset: disabled)
>>
>>    Active: active (running) since Thu 2017-06-15 09:20:59 EDT; 28s ago
>>
>> Main PID: 8844 (ovirt-ha-broker)
>>
>>    CGroup: /system.slice/ovirt-ha-broker.service
>>
>>            └─8844 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker
>> --no-daemon
>>
>>
>>
>> Jun 15 09:20:59 ovirt-hyp-01.example.lan systemd[1]: Started oVirt Hosted
>> Engine High Availability Communications Broker.
>>
>> Jun 15 09:20:59 ovirt-hyp-01.example.lan systemd[1]: Starting oVirt
>> Hosted Engine High Availability Communications Broker...
>>
>>
>> On Jun 14, 2017 4:45 AM, "Sahina Bose" <sabose at redhat.com> wrote:
>>
>>> What's the output of "hosted-engine --vm-status" and "gluster volume
>>> status engine" tell you? Are all the bricks running as per gluster vol
>>> status?
>>>
>>> Can you try to restart the ovirt-ha-agent and ovirt-ha-broker services?
>>>
>>> If HE still has issues powering up, please provide agent.log and
>>> broker.log from /var/log/ovirt-hosted-engine-ha and gluster mount logs
>>> from /var/log/glusterfs/rhev-data-center-mnt <engine>.log
>>>
>>> On Thu, Jun 8, 2017 at 6:57 PM, Joel Diaz <mrjoeldiaz at gmail.com> wrote:
>>>
>>>> Good morning oVirt community,
>>>>
>>>> I'm running a three host gluster environment with hosted engine.
>>>>
>>>> Yesterday the engine went down and has not been able to come up
>>>> properly. It tries to start on all three hosts.
>>>>
>>>> I have two gluster volumes, data and engine. The data storage domain
>>>> volume is no longer mounted, but the engine volume is up. I've restarted
>>>> the gluster service and made sure both volumes were running. The data
>>>> volume will not mount.
>>>>
>>>> How can I get the engine running properly again?
>>>>
>>>> Thanks,
>>>>
>>>> Joel
>>>>
>>>> _______________________________________________
>>>> Users mailing list
>>>> Users at ovirt.org
>>>> http://lists.ovirt.org/mailman/listinfo/users
>>>>
>>>>
>>>
>>
>