I don't notice anything wrong on the gluster end.
Maybe Simone can help take a look at HE behaviour?
On Fri, Jun 16, 2017 at 6:14 PM, Joel Diaz <mrjoeldiaz(a)gmail.com> wrote:
Good morning,
Info requested below.
[root@ovirt-hyp-02 ~]# hosted-engine --vm-start
Exception in thread Client localhost:54321 (most likely raised during interpreter shutdown): VM exists and its status is Up
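Since vdsm reports that the VM already exists and is Up while the engine itself is
unreachable, it may be worth confirming whether the HostedEngine VM is really running
on this host and attaching to its console. A minimal sketch, assuming the standard
libvirt and hosted-engine tooling shipped with the host (commands only, not output
from this environment):

[root@ovirt-hyp-02 ~]# virsh -r list --all          # read-only; look for a HostedEngine domain
[root@ovirt-hyp-02 ~]# hosted-engine --vm-status    # what the HA agents think of the VM
[root@ovirt-hyp-02 ~]# hosted-engine --console      # attach to the VM console to watch boot/engine errors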
[root@ovirt-hyp-02 ~]# ping engine
PING engine.example.lan (192.168.170.149) 56(84) bytes of data.
From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=1 Destination Host Unreachable
From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=2 Destination Host Unreachable
From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=3 Destination Host Unreachable
From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=4 Destination Host Unreachable
From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=5 Destination Host Unreachable
From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=6 Destination Host Unreachable
From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=7 Destination Host Unreachable
From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=8 Destination Host Unreachable
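Those "Destination Host Unreachable" replies come from the host itself, which usually
means the engine IP is not answering ARP at all, i.e. the VM's network never came up
or its vNIC is not attached to the right bridge. A couple of quick, generic checks
(ovirtmgmt is assumed here as the management bridge name; adjust if yours differs):

[root@ovirt-hyp-02 ~]# ip neigh show 192.168.170.149     # FAILED/INCOMPLETE = no ARP reply
[root@ovirt-hyp-02 ~]# ip link show master ovirtmgmt     # host NIC and VM vnet interfaces on the bridge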
[root@ovirt-hyp-02 ~]# gluster volume status engine
Status of volume: engine
Gluster process                                       TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 192.168.170.141:/gluster_bricks/engine/engine   49159     0          Y       1799
Brick 192.168.170.143:/gluster_bricks/engine/engine   49159     0          Y       2900
Self-heal Daemon on localhost                         N/A       N/A        Y       2914
Self-heal Daemon on ovirt-hyp-01.example.lan          N/A       N/A        Y       1854

Task Status of Volume engine
------------------------------------------------------------------------------
There are no active volume tasks
[root@ovirt-hyp-02 ~]# gluster volume heal engine info
Brick 192.168.170.141:/gluster_bricks/engine/engine
Status: Connected
Number of entries: 0
Brick 192.168.170.143:/gluster_bricks/engine/engine
Status: Connected
Number of entries: 0
Brick 192.168.170.147:/gluster_bricks/engine/engine
Status: Connected
Number of entries: 0
[root@ovirt-hyp-02 ~]# cat /var/log/glusterfs/rhev-data-center-mnt-glusterSD-ovirt-hyp-01.example.lan\:engine.log
[2017-06-15 13:37:02.009436] I [glusterfsd-mgmt.c:1600:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
Each of the three hosts sends out the following notifications about every 15 minutes:
Hosted engine host: ovirt-hyp-01.example.lan changed state: EngineDown-EngineStart.
Hosted engine host: ovirt-hyp-01.example.lan changed state: EngineStart-EngineStarting.
Hosted engine host: ovirt-hyp-01.example.lan changed state: EngineStarting-EngineForceStop.
Hosted engine host: ovirt-hyp-01.example.lan changed state: EngineForceStop-EngineDown.
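The EngineStarting -> EngineForceStop transition normally means the agent can power the
VM on but the engine liveliness/health check never succeeds within its timeout, so the
VM is force-stopped and the cycle repeats. Something like the grep below against the HA
agent log may show the reason; the exact message wording varies between versions, so
treat the pattern as a rough filter only:

[root@ovirt-hyp-02 ~]# grep -E 'EngineStarting|ForceStop|health|liveliness' \
      /var/log/ovirt-hosted-engine-ha/agent.log | tail -n 40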
Please let me know if you need any additional information.
Thank you,
Joel
On Jun 16, 2017 2:52 AM, "Sahina Bose" <sabose(a)redhat.com> wrote:
> From the agent.log,
> MainThread::INFO::2017-06-15 11:16:50,583::states::473::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Engine vm is running on host ovirt-hyp-02.reis.com (id 2)
>
> It looks like the HE VM was started successfully. Is it possible that the
> ovirt-engine service could not be started on the HE VM? Could you try to
> start the HE VM using the command below and then log into the VM console?
> #hosted-engine --vm-start
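>
> If the HA agents keep force-stopping the VM while you are looking at it, it may
> also help to put the cluster into global maintenance first so a manual start is
> left alone. A sketch, using the standard hosted-engine maintenance modes:
>
> # hosted-engine --set-maintenance --mode=global
> # hosted-engine --vm-start
> # hosted-engine --set-maintenance --mode=none    # once the engine is healthy again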
>
> Also, please check
> # gluster volume status engine
> # gluster volume heal engine info
>
> Please also check if there are errors in gluster mount logs - at
> /var/log/glusterfs/rhev-data-center-mnt..<engine>.log
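>
> A rough way to pull just the errors and warnings out of that mount log (gluster
> log lines carry a one-letter severity after the timestamp, so this is only a
> coarse filter; the wildcarded file name below matches the naming seen on your hosts):
>
> # grep -E '\] (E|W) \[' /var/log/glusterfs/rhev-data-center-mnt-glusterSD-*.log | tail -n 50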
>
>
> On Thu, Jun 15, 2017 at 8:53 PM, Joel Diaz <mrjoeldiaz(a)gmail.com> wrote:
>
>> Sorry, I forgot to attach the requested logs in the previous email.
>>
>> Thanks,
>>
>> On Jun 15, 2017 9:38 AM, "Joel Diaz" <mrjoeldiaz(a)gmail.com> wrote:
>>
>> Good morning,
>>
>> Requested info below, along with some additional info.
>>
>> You'll notice the data volume is not mounted.
>>
>> Any help in getting HE back running would be greatly appreciated.
>>
>> Thank you,
>>
>> Joel
>>
>> [root@ovirt-hyp-01 ~]# hosted-engine --vm-status
>>
>>
>>
>>
>>
>> --== Host 1 status ==--
>>
>>
>>
>> conf_on_shared_storage : True
>>
>> Status up-to-date : False
>>
>> Hostname : ovirt-hyp-01.example.lan
>>
>> Host ID : 1
>>
>> Engine status : unknown stale-data
>>
>> Score : 3400
>>
>> stopped : False
>>
>> Local maintenance : False
>>
>> crc32 : 5558a7d3
>>
>> local_conf_timestamp : 20356
>>
>> Host timestamp : 20341
>>
>> Extra metadata (valid at timestamp):
>>
>> metadata_parse_version=1
>>
>> metadata_feature_version=1
>>
>> timestamp=20341 (Fri Jun 9 14:38:57 2017)
>>
>> host-id=1
>>
>> score=3400
>>
>> vm_conf_refresh_time=20356 (Fri Jun 9 14:39:11 2017)
>>
>> conf_on_shared_storage=True
>>
>> maintenance=False
>>
>> state=EngineDown
>>
>> stopped=False
>>
>>
>>
>>
>>
>> --== Host 2 status ==--
>>
>>
>>
>> conf_on_shared_storage : True
>>
>> Status up-to-date : False
>>
>> Hostname : ovirt-hyp-02.example.lan
>>
>> Host ID : 2
>>
>> Engine status : unknown stale-data
>>
>> Score : 3400
>>
>> stopped : False
>>
>> Local maintenance : False
>>
>> crc32 : 936d4cf3
>>
>> local_conf_timestamp : 20351
>>
>> Host timestamp : 20337
>>
>> Extra metadata (valid at timestamp):
>>
>> metadata_parse_version=1
>>
>> metadata_feature_version=1
>>
>> timestamp=20337 (Fri Jun 9 14:39:03 2017)
>>
>> host-id=2
>>
>> score=3400
>>
>> vm_conf_refresh_time=20351 (Fri Jun 9 14:39:17 2017)
>>
>> conf_on_shared_storage=True
>>
>> maintenance=False
>>
>> state=EngineDown
>>
>> stopped=False
>>
>>
>>
>>
>>
>> --== Host 3 status ==--
>>
>>
>>
>> conf_on_shared_storage : True
>>
>> Status up-to-date : False
>>
>> Hostname : ovirt-hyp-03.example.lan
>>
>> Host ID : 3
>>
>> Engine status : unknown stale-data
>>
>> Score : 3400
>>
>> stopped : False
>>
>> Local maintenance : False
>>
>> crc32 : f646334e
>>
>> local_conf_timestamp : 20391
>>
>> Host timestamp : 20377
>>
>> Extra metadata (valid at timestamp):
>>
>> metadata_parse_version=1
>>
>> metadata_feature_version=1
>>
>> timestamp=20377 (Fri Jun 9 14:39:37 2017)
>>
>> host-id=3
>>
>> score=3400
>>
>> vm_conf_refresh_time=20391 (Fri Jun 9 14:39:51 2017)
>>
>> conf_on_shared_storage=True
>>
>> maintenance=False
>>
>> state=EngineStop
>>
>> stopped=False
>>
>> timeout=Thu Jan 1 00:43:08 1970
>>
>>
>>
>>
>>
>> [root@ovirt-hyp-01 ~]# gluster peer status
>>
>> Number of Peers: 2
>>
>>
>>
>> Hostname: 192.168.170.143
>>
>> Uuid: b2b30d05-cf91-4567-92fd-022575e082f5
>>
>> State: Peer in Cluster (Connected)
>>
>> Other names:
>>
>> 10.0.0.2
>>
>>
>>
>> Hostname: 192.168.170.147
>>
>> Uuid: 4e50acc4-f3cb-422d-b499-fb5796a53529
>>
>> State: Peer in Cluster (Connected)
>>
>> Other names:
>>
>> 10.0.0.3
>>
>>
>>
>> [root@ovirt-hyp-01 ~]# gluster volume info all
>>
>>
>>
>> Volume Name: data
>>
>> Type: Replicate
>>
>> Volume ID: 1d6bb110-9be4-4630-ae91-36ec1cf6cc02
>>
>> Status: Started
>>
>> Snapshot Count: 0
>>
>> Number of Bricks: 1 x (2 + 1) = 3
>>
>> Transport-type: tcp
>>
>> Bricks:
>>
>> Brick1: 192.168.170.141:/gluster_bricks/data/data
>>
>> Brick2: 192.168.170.143:/gluster_bricks/data/data
>>
>> Brick3: 192.168.170.147:/gluster_bricks/data/data (arbiter)
>>
>> Options Reconfigured:
>>
>> nfs.disable: on
>>
>> performance.readdir-ahead: on
>>
>> transport.address-family: inet
>>
>> performance.quick-read: off
>>
>> performance.read-ahead: off
>>
>> performance.io-cache: off
>>
>> performance.stat-prefetch: off
>>
>> performance.low-prio-threads: 32
>>
>> network.remote-dio: off
>>
>> cluster.eager-lock: enable
>>
>> cluster.quorum-type: auto
>>
>> cluster.server-quorum-type: server
>>
>> cluster.data-self-heal-algorithm: full
>>
>> cluster.locking-scheme: granular
>>
>> cluster.shd-max-threads: 8
>>
>> cluster.shd-wait-qlength: 10000
>>
>> features.shard: on
>>
>> user.cifs: off
>>
>> storage.owner-uid: 36
>>
>> storage.owner-gid: 36
>>
>> network.ping-timeout: 30
>>
>> performance.strict-o-direct: on
>>
>> cluster.granular-entry-heal: enable
>>
>>
>>
>> Volume Name: engine
>>
>> Type: Replicate
>>
>> Volume ID: b160f0b2-8bd3-4ff2-a07c-134cab1519dd
>>
>> Status: Started
>>
>> Snapshot Count: 0
>>
>> Number of Bricks: 1 x (2 + 1) = 3
>>
>> Transport-type: tcp
>>
>> Bricks:
>>
>> Brick1: 192.168.170.141:/gluster_bricks/engine/engine
>>
>> Brick2: 192.168.170.143:/gluster_bricks/engine/engine
>>
>> Brick3: 192.168.170.147:/gluster_bricks/engine/engine (arbiter)
>>
>> Options Reconfigured:
>>
>> nfs.disable: on
>>
>> performance.readdir-ahead: on
>>
>> transport.address-family: inet
>>
>> performance.quick-read: off
>>
>> performance.read-ahead: off
>>
>> performance.io-cache: off
>>
>> performance.stat-prefetch: off
>>
>> performance.low-prio-threads: 32
>>
>> network.remote-dio: off
>>
>> cluster.eager-lock: enable
>>
>> cluster.quorum-type: auto
>>
>> cluster.server-quorum-type: server
>>
>> cluster.data-self-heal-algorithm: full
>>
>> cluster.locking-scheme: granular
>>
>> cluster.shd-max-threads: 8
>>
>> cluster.shd-wait-qlength: 10000
>>
>> features.shard: on
>>
>> user.cifs: off
>>
>> storage.owner-uid: 36
>>
>> storage.owner-gid: 36
>>
>> network.ping-timeout: 30
>>
>> performance.strict-o-direct: on
>>
>> cluster.granular-entry-heal: enable
>>
>>
>>
>>
>>
>> [root@ovirt-hyp-01 ~]# df -h
>>
>> Filesystem                                     Size  Used  Avail  Use%  Mounted on
>> /dev/mapper/centos_ovirt--hyp--01-root          50G  4.1G    46G    9%  /
>> devtmpfs                                        7.7G     0   7.7G    0%  /dev
>> tmpfs                                           7.8G     0   7.8G    0%  /dev/shm
>> tmpfs                                           7.8G  8.7M   7.7G    1%  /run
>> tmpfs                                           7.8G     0   7.8G    0%  /sys/fs/cgroup
>> /dev/mapper/centos_ovirt--hyp--01-home           61G   33M    61G    1%  /home
>> /dev/mapper/gluster_vg_sdb-gluster_lv_engine     50G  7.6G    43G   16%  /gluster_bricks/engine
>> /dev/mapper/gluster_vg_sdb-gluster_lv_data      730G  157G   574G   22%  /gluster_bricks/data
>> /dev/sda1                                       497M  173M   325M   35%  /boot
>> ovirt-hyp-01.example.lan:engine                  50G  7.6G    43G   16%  /rhev/data-center/mnt/glusterSD/ovirt-hyp-01.example.lan:engine
>> tmpfs                                           1.6G     0   1.6G    0%  /run/user/0
>>
>>
>>
>> [root@ovirt-hyp-01 ~]# systemctl list-unit-files|grep ovirt
>>
>> ovirt-ha-agent.service enabled
>>
>> ovirt-ha-broker.service enabled
>>
>> ovirt-imageio-daemon.service disabled
>>
>> ovirt-vmconsole-host-sshd.service enabled
>>
>>
>>
>> [root@ovirt-hyp-01 ~]# systemctl status ovirt-ha-agent.service
>>
>> ● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
>>
>> Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor preset: disabled)
>>
>> Active: active (running) since Thu 2017-06-15 08:56:15 EDT; 21min ago
>>
>> Main PID: 3150 (ovirt-ha-agent)
>>
>> CGroup: /system.slice/ovirt-ha-agent.service
>>
>> └─3150 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent --no-daemon
>>
>> Jun 15 08:56:15 ovirt-hyp-01.example.lan systemd[1]: Started oVirt Hosted Engine High Availability Monitoring Agent.
>>
>> Jun 15 08:56:15 ovirt-hyp-01.example.lan systemd[1]: Starting oVirt Hosted Engine High Availability Monitoring Agent...
>>
>> Jun 15 09:17:18 ovirt-hyp-01.example.lan ovirt-ha-agent[3150]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine VM stopped on localhost
>>
>> [root@ovirt-hyp-01 ~]# systemctl status ovirt-ha-broker.service
>>
>> ● ovirt-ha-broker.service - oVirt Hosted Engine High Availability Communications Broker
>>
>> Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service; enabled; vendor preset: disabled)
>>
>> Active: active (running) since Thu 2017-06-15 08:54:06 EDT; 24min ago
>>
>> Main PID: 968 (ovirt-ha-broker)
>>
>> CGroup: /system.slice/ovirt-ha-broker.service
>>
>> └─968 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker --no-daemon
>>
>> Jun 15 08:54:06 ovirt-hyp-01.example.lan systemd[1]: Started oVirt Hosted Engine High Availability Communications Broker.
>>
>> Jun 15 08:54:06 ovirt-hyp-01.example.lan systemd[1]: Starting oVirt Hosted Engine High Availability Communications Broker...
>>
>> Jun 15 08:56:16 ovirt-hyp-01.example.lan ovirt-ha-broker[968]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.listener.ConnectionHandler ERROR Error handling request, data: '...1b55bcf76'
>>
>> Traceback (most recent call last):
>>
>> File "/usr/lib/python2.7/site-packages/ovirt...
>>
>> Hint: Some lines were ellipsized, use -l to show in full.
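>>
>> The status output above truncates the broker error; either of these should show
>> the full traceback (plain systemd/journal tooling plus the broker.log path already
>> mentioned in this thread):
>>
>> # journalctl -u ovirt-ha-broker -l --no-pager | tail -n 60
>> # less /var/log/ovirt-hosted-engine-ha/broker.log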
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> [root@ovirt-hyp-01 ~]# systemctl restart ovirt-ha-agent.service
>>
>> [root@ovirt-hyp-01 ~]# systemctl status ovirt-ha-agent.service
>>
>> ● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
>>
>> Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor preset: disabled)
>>
>> Active: active (running) since Thu 2017-06-15 09:19:21 EDT; 26s ago
>>
>> Main PID: 8563 (ovirt-ha-agent)
>>
>> CGroup: /system.slice/ovirt-ha-agent.service
>>
>> └─8563 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent --no-daemon
>>
>> Jun 15 09:19:21 ovirt-hyp-01.example.lan systemd[1]: Started oVirt Hosted Engine High Availability Monitoring Agent.
>>
>> Jun 15 09:19:21 ovirt-hyp-01.example.lan systemd[1]: Starting oVirt Hosted Engine High Availability Monitoring Agent...
>>
>> [root@ovirt-hyp-01 ~]# systemctl restart ovirt-ha-broker.service
>>
>> [root@ovirt-hyp-01 ~]# systemctl status ovirt-ha-broker.service
>>
>> ● ovirt-ha-broker.service - oVirt Hosted Engine High Availability Communications Broker
>>
>> Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service; enabled; vendor preset: disabled)
>>
>> Active: active (running) since Thu 2017-06-15 09:20:59 EDT; 28s ago
>>
>> Main PID: 8844 (ovirt-ha-broker)
>>
>> CGroup: /system.slice/ovirt-ha-broker.service
>>
>> └─8844 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker --no-daemon
>>
>> Jun 15 09:20:59 ovirt-hyp-01.example.lan systemd[1]: Started oVirt Hosted Engine High Availability Communications Broker.
>>
>> Jun 15 09:20:59 ovirt-hyp-01.example.lan systemd[1]: Starting oVirt Hosted Engine High Availability Communications Broker...
>>
>>
>> On Jun 14, 2017 4:45 AM, "Sahina Bose" <sabose(a)redhat.com> wrote:
>>
>>> What do "hosted-engine --vm-status" and "gluster volume status engine" tell
>>> you? Are all the bricks running according to the volume status output?
>>>
>>> Can you try to restart the ovirt-ha-agent and ovirt-ha-broker services?
>>>
>>> If HE still has issues powering up, please provide agent.log and
>>> broker.log from /var/log/ovirt-hosted-engine-ha and gluster mount logs
>>> from /var/log/glusterfs/rhev-data-center-mnt <engine>.log
>>>
>>> On Thu, Jun 8, 2017 at 6:57 PM, Joel Diaz <mrjoeldiaz(a)gmail.com> wrote:
>>>
>>>> Good morning oVirt community,
>>>>
>>>> I'm running a three-host gluster environment with hosted engine.
>>>>
>>>> Yesterday the engine went down and has not been able to come back up
>>>> properly. It tries to start on all three hosts.
>>>>
>>>> I have two gluster volumes, data and engine. The data storage domain
>>>> volume is no longer mounted but the engine volume is up. I've restarted
>>>> the gluster service and made sure both volumes were running. The data
>>>> volume still will not mount.
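>>>>
>>>> (One way to tell whether the data volume problem is on the gluster side or on
>>>> the vdsm/storage-domain side is to try mounting it by hand on one host — a
>>>> sketch only, with the test mount point purely illustrative:
>>>>
>>>> # gluster volume status data
>>>> # mkdir -p /mnt/data-test
>>>> # mount -t glusterfs ovirt-hyp-01.example.lan:/data /mnt/data-test
>>>>
>>>> If the manual mount works, the volume itself is probably fine and the question
>>>> becomes why vdsm cannot activate the domain; if it fails, the corresponding
>>>> mount log under /var/log/glusterfs/ should say why.)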
>>>>
>>>> How can I get the engine running properly again?
>>>>
>>>> Thanks,
>>>>
>>>> Joel
>>>>
>>>> _______________________________________________
>>>> Users mailing list
>>>> Users(a)ovirt.org
>>>> http://lists.ovirt.org/mailman/listinfo/users
>>>>
>>>>
>>>
>>
>