Good morning,
Info requested below.
[root@ovirt-hyp-02 ~]# hosted-engine --vm-start
Exception in thread Client localhost:54321 (most likely raised during interpreter shutdown):VM exists and its status is Up
[root@ovirt-hyp-02 ~]# ping engine
PING engine.example.lan (192.168.170.149) 56(84) bytes of data.
From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=1 Destination Host Unreachable
From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=2 Destination Host Unreachable
From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=3 Destination Host Unreachable
From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=4 Destination Host Unreachable
From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=5 Destination Host Unreachable
From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=6 Destination Host Unreachable
From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=7 Destination Host Unreachable
From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=8 Destination Host Unreachable
[root@ovirt-hyp-02 ~]# gluster volume status engine
Status of volume: engine
Gluster process                                      TCP Port  RDMA Port  Online  Pid
--------------------------------------------------------------------------------------
Brick 192.168.170.141:/gluster_bricks/engine/engine  49159     0          Y       1799
Brick 192.168.170.143:/gluster_bricks/engine/engine  49159     0          Y       2900
Self-heal Daemon on localhost                        N/A       N/A        Y       2914
Self-heal Daemon on ovirt-hyp-01.example.lan         N/A       N/A        Y       1854

Task Status of Volume engine
--------------------------------------------------------------------------------------
There are no active volume tasks
[root@ovirt-hyp-02 ~]# gluster volume heal engine info
Brick 192.168.170.141:/gluster_bricks/engine/engine
Status: Connected
Number of entries: 0
Brick 192.168.170.143:/gluster_bricks/engine/engine
Status: Connected
Number of entries: 0
Brick 192.168.170.147:/gluster_bricks/engine/engine
Status: Connected
Number of entries: 0
[root@ovirt-hyp-02 ~]# cat /var/log/glusterfs/rhev-data-center-mnt-glusterSD-ovirt-hyp-01.example.lan\:engine.log
[2017-06-15 13:37:02.009436] I [glusterfsd-mgmt.c:1600:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
Each of the three hosts sends out the following notifications about every 15
minutes.
Hosted engine host: ovirt-hyp-01.example.lan changed state: EngineDown-EngineStart.
Hosted engine host: ovirt-hyp-01.example.lan changed state: EngineStart-EngineStarting.
Hosted engine host: ovirt-hyp-01.example.lan changed state: EngineStarting-EngineForceStop.
Hosted engine host: ovirt-hyp-01.example.lan changed state: EngineForceStop-EngineDown.
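To make the repeating cycle easier to see across all three hosts, the notifications can be tallied per host and transition. This is only a rough sketch: the function name `summarize_transitions` is made up for illustration, and it assumes the notification text has been saved with one notification per line, in the form shown above.

```shell
# Tally hosted-engine state-change notifications read from stdin.
# Assumes one notification per line, in the form quoted in this message:
#   Hosted engine host: <host> changed state: <From>-<To>.
# Prints "<count> <host> <From> -> <To>" for each distinct transition.
summarize_transitions() {
    sed -n 's/^Hosted engine host: \(.*\) changed state: \(.*\)-\(.*\)\.$/\1 \2 -> \3/p' |
        sort | uniq -c
}

# Example (notifications.txt is a hypothetical file of saved notifications):
# summarize_transitions < notifications.txt
```

A host that shows the same four transitions with equal counts is most likely stuck in the start/force-stop loop rather than failing in different ways each time.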
Please let me know if you need any additional information.
Thank you,
Joel
On Jun 16, 2017 2:52 AM, "Sahina Bose" <sabose(a)redhat.com> wrote:
From the agent.log,
MainThread::INFO::2017-06-15 11:16:50,583::states::473::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Engine vm is running on host ovirt-hyp-02.reis.com (id 2)
It looks like the HE VM was started successfully? Is it possible that the
ovirt-engine service could not be started on the HE VM? Could you try
starting the HE VM with the command below and then logging into the VM console?
# hosted-engine --vm-start
Also, please check
# gluster volume status engine
# gluster volume heal engine info
Please also check if there are errors in gluster mount logs - at
/var/log/glusterfs/rhev-data-center-mnt..<engine>.log
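Since the mount logs can be long, one way to check them for errors is to filter on the severity letter that follows the timestamp. The sketch below is illustrative only: the function name `gluster_log_problems` is made up, and it assumes the usual gluster log layout of `[timestamp] <severity> [source] message`, keeping only E (error) and W (warning) entries.

```shell
# Print only error (E) and warning (W) entries from a GlusterFS log file.
# Assumes lines of the form: [timestamp] I [file.c:line:function] message
# $1: path to the log file to scan
gluster_log_problems() {
    grep -E '^\[[^]]+\] (E|W) \[' "$1"
}

# Example (substitute the actual mount log name on the host):
# gluster_log_problems '/var/log/glusterfs/rhev-data-center-mnt-glusterSD-ovirt-hyp-01.example.lan:engine.log'
```

If this prints nothing, the log holds no entries at those two severities, though informational lines may still be worth reading around the time the engine went down.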
On Thu, Jun 15, 2017 at 8:53 PM, Joel Diaz <mrjoeldiaz(a)gmail.com> wrote:
> Sorry. I forgot to attach the requested logs in the previous email.
>
> Thanks,
>
> On Jun 15, 2017 9:38 AM, "Joel Diaz" <mrjoeldiaz(a)gmail.com> wrote:
>
> Good morning,
>
> Requested info is below, along with some additional info.
>
> You'll notice the data volume is not mounted.
>
> Any help in getting HE back running would be greatly appreciated.
>
> Thank you,
>
> Joel
>
> [root@ovirt-hyp-01 ~]# hosted-engine --vm-status
>
>
>
>
>
> --== Host 1 status ==--
>
>
>
> conf_on_shared_storage : True
>
> Status up-to-date : False
>
> Hostname : ovirt-hyp-01.example.lan
>
> Host ID : 1
>
> Engine status : unknown stale-data
>
> Score : 3400
>
> stopped : False
>
> Local maintenance : False
>
> crc32 : 5558a7d3
>
> local_conf_timestamp : 20356
>
> Host timestamp : 20341
>
> Extra metadata (valid at timestamp):
>
> metadata_parse_version=1
>
> metadata_feature_version=1
>
> timestamp=20341 (Fri Jun 9 14:38:57 2017)
>
> host-id=1
>
> score=3400
>
> vm_conf_refresh_time=20356 (Fri Jun 9 14:39:11 2017)
>
> conf_on_shared_storage=True
>
> maintenance=False
>
> state=EngineDown
>
> stopped=False
>
>
>
>
>
> --== Host 2 status ==--
>
>
>
> conf_on_shared_storage : True
>
> Status up-to-date : False
>
> Hostname : ovirt-hyp-02.example.lan
>
> Host ID : 2
>
> Engine status : unknown stale-data
>
> Score : 3400
>
> stopped : False
>
> Local maintenance : False
>
> crc32 : 936d4cf3
>
> local_conf_timestamp : 20351
>
> Host timestamp : 20337
>
> Extra metadata (valid at timestamp):
>
> metadata_parse_version=1
>
> metadata_feature_version=1
>
> timestamp=20337 (Fri Jun 9 14:39:03 2017)
>
> host-id=2
>
> score=3400
>
> vm_conf_refresh_time=20351 (Fri Jun 9 14:39:17 2017)
>
> conf_on_shared_storage=True
>
> maintenance=False
>
> state=EngineDown
>
> stopped=False
>
>
>
>
>
> --== Host 3 status ==--
>
>
>
> conf_on_shared_storage : True
>
> Status up-to-date : False
>
> Hostname : ovirt-hyp-03.example.lan
>
> Host ID : 3
>
> Engine status : unknown stale-data
>
> Score : 3400
>
> stopped : False
>
> Local maintenance : False
>
> crc32 : f646334e
>
> local_conf_timestamp : 20391
>
> Host timestamp : 20377
>
> Extra metadata (valid at timestamp):
>
> metadata_parse_version=1
>
> metadata_feature_version=1
>
> timestamp=20377 (Fri Jun 9 14:39:37 2017)
>
> host-id=3
>
> score=3400
>
> vm_conf_refresh_time=20391 (Fri Jun 9 14:39:51 2017)
>
> conf_on_shared_storage=True
>
> maintenance=False
>
> state=EngineStop
>
> stopped=False
>
> timeout=Thu Jan 1 00:43:08 1970
>
>
>
>
>
> [root@ovirt-hyp-01 ~]# gluster peer status
>
> Number of Peers: 2
>
>
>
> Hostname: 192.168.170.143
>
> Uuid: b2b30d05-cf91-4567-92fd-022575e082f5
>
> State: Peer in Cluster (Connected)
>
> Other names:
>
> 10.0.0.2
>
>
>
> Hostname: 192.168.170.147
>
> Uuid: 4e50acc4-f3cb-422d-b499-fb5796a53529
>
> State: Peer in Cluster (Connected)
>
> Other names:
>
> 10.0.0.3
>
>
>
> [root@ovirt-hyp-01 ~]# gluster volume info all
>
>
>
> Volume Name: data
>
> Type: Replicate
>
> Volume ID: 1d6bb110-9be4-4630-ae91-36ec1cf6cc02
>
> Status: Started
>
> Snapshot Count: 0
>
> Number of Bricks: 1 x (2 + 1) = 3
>
> Transport-type: tcp
>
> Bricks:
>
> Brick1: 192.168.170.141:/gluster_bricks/data/data
>
> Brick2: 192.168.170.143:/gluster_bricks/data/data
>
> Brick3: 192.168.170.147:/gluster_bricks/data/data (arbiter)
>
> Options Reconfigured:
>
> nfs.disable: on
>
> performance.readdir-ahead: on
>
> transport.address-family: inet
>
> performance.quick-read: off
>
> performance.read-ahead: off
>
> performance.io-cache: off
>
> performance.stat-prefetch: off
>
> performance.low-prio-threads: 32
>
> network.remote-dio: off
>
> cluster.eager-lock: enable
>
> cluster.quorum-type: auto
>
> cluster.server-quorum-type: server
>
> cluster.data-self-heal-algorithm: full
>
> cluster.locking-scheme: granular
>
> cluster.shd-max-threads: 8
>
> cluster.shd-wait-qlength: 10000
>
> features.shard: on
>
> user.cifs: off
>
> storage.owner-uid: 36
>
> storage.owner-gid: 36
>
> network.ping-timeout: 30
>
> performance.strict-o-direct: on
>
> cluster.granular-entry-heal: enable
>
>
>
> Volume Name: engine
>
> Type: Replicate
>
> Volume ID: b160f0b2-8bd3-4ff2-a07c-134cab1519dd
>
> Status: Started
>
> Snapshot Count: 0
>
> Number of Bricks: 1 x (2 + 1) = 3
>
> Transport-type: tcp
>
> Bricks:
>
> Brick1: 192.168.170.141:/gluster_bricks/engine/engine
>
> Brick2: 192.168.170.143:/gluster_bricks/engine/engine
>
> Brick3: 192.168.170.147:/gluster_bricks/engine/engine (arbiter)
>
> Options Reconfigured:
>
> nfs.disable: on
>
> performance.readdir-ahead: on
>
> transport.address-family: inet
>
> performance.quick-read: off
>
> performance.read-ahead: off
>
> performance.io-cache: off
>
> performance.stat-prefetch: off
>
> performance.low-prio-threads: 32
>
> network.remote-dio: off
>
> cluster.eager-lock: enable
>
> cluster.quorum-type: auto
>
> cluster.server-quorum-type: server
>
> cluster.data-self-heal-algorithm: full
>
> cluster.locking-scheme: granular
>
> cluster.shd-max-threads: 8
>
> cluster.shd-wait-qlength: 10000
>
> features.shard: on
>
> user.cifs: off
>
> storage.owner-uid: 36
>
> storage.owner-gid: 36
>
> network.ping-timeout: 30
>
> performance.strict-o-direct: on
>
> cluster.granular-entry-heal: enable
>
>
>
>
>
> [root@ovirt-hyp-01 ~]# df -h
>
> Filesystem                                    Size  Used Avail Use% Mounted on
> /dev/mapper/centos_ovirt--hyp--01-root         50G  4.1G   46G   9% /
> devtmpfs                                      7.7G     0  7.7G   0% /dev
> tmpfs                                         7.8G     0  7.8G   0% /dev/shm
> tmpfs                                         7.8G  8.7M  7.7G   1% /run
> tmpfs                                         7.8G     0  7.8G   0% /sys/fs/cgroup
> /dev/mapper/centos_ovirt--hyp--01-home         61G   33M   61G   1% /home
> /dev/mapper/gluster_vg_sdb-gluster_lv_engine   50G  7.6G   43G  16% /gluster_bricks/engine
> /dev/mapper/gluster_vg_sdb-gluster_lv_data    730G  157G  574G  22% /gluster_bricks/data
> /dev/sda1                                     497M  173M  325M  35% /boot
> ovirt-hyp-01.example.lan:engine                50G  7.6G   43G  16% /rhev/data-center/mnt/glusterSD/ovirt-hyp-01.example.lan:engine
> tmpfs                                         1.6G     0  1.6G   0% /run/user/0
>
>
>
> [root@ovirt-hyp-01 ~]# systemctl list-unit-files|grep ovirt
>
> ovirt-ha-agent.service enabled
>
> ovirt-ha-broker.service enabled
>
> ovirt-imageio-daemon.service disabled
>
> ovirt-vmconsole-host-sshd.service enabled
>
>
>
> [root@ovirt-hyp-01 ~]# systemctl status ovirt-ha-agent.service
>
> ● ovirt-ha-agent.service - oVirt Hosted Engine High Availability
> Monitoring Agent
>
> Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service;
> enabled; vendor preset: disabled)
>
> Active: active (running) since Thu 2017-06-15 08:56:15 EDT; 21min ago
>
> Main PID: 3150 (ovirt-ha-agent)
>
> CGroup: /system.slice/ovirt-ha-agent.service
>
> └─3150 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent
> --no-daemon
>
>
>
> Jun 15 08:56:15 ovirt-hyp-01.example.lan systemd[1]: Started oVirt Hosted
> Engine High Availability Monitoring Agent.
>
> Jun 15 08:56:15 ovirt-hyp-01.example.lan systemd[1]: Starting oVirt
> Hosted Engine High Availability Monitoring Agent...
>
> Jun 15 09:17:18 ovirt-hyp-01.example.lan ovirt-ha-agent[3150]:
> ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine
> ERROR Engine VM stopped on localhost
>
> [root@ovirt-hyp-01 ~]# systemctl status ovirt-ha-broker.service
>
> ● ovirt-ha-broker.service - oVirt Hosted Engine High Availability
> Communications Broker
>
> Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service;
> enabled; vendor preset: disabled)
>
> Active: active (running) since Thu 2017-06-15 08:54:06 EDT; 24min ago
>
> Main PID: 968 (ovirt-ha-broker)
>
> CGroup: /system.slice/ovirt-ha-broker.service
>
> └─968 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker
> --no-daemon
>
>
>
> Jun 15 08:54:06 ovirt-hyp-01.example.lan systemd[1]: Started oVirt Hosted
> Engine High Availability Communications Broker.
>
> Jun 15 08:54:06 ovirt-hyp-01.example.lan systemd[1]: Starting oVirt
> Hosted Engine High Availability Communications Broker...
>
> Jun 15 08:56:16 ovirt-hyp-01.example.lan ovirt-ha-broker[968]:
> ovirt-ha-broker ovirt_hosted_engine_ha.broker.listener.ConnectionHandler
> ERROR Error handling request, data: '...1b55bcf76'
>
> Traceback
> (most recent call last):
>
> File
> "/usr/lib/python2.7/site-packages/ovirt...
>
> Hint: Some lines were ellipsized, use -l to show in full.
>
>
>
>
>
>
>
>
>
> [root@ovirt-hyp-01 ~]# systemctl restart ovirt-ha-agent.service
>
> [root@ovirt-hyp-01 ~]# systemctl status ovirt-ha-agent.service
>
> ● ovirt-ha-agent.service - oVirt Hosted Engine High Availability
> Monitoring Agent
>
> Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service;
> enabled; vendor preset: disabled)
>
> Active: active (running) since Thu 2017-06-15 09:19:21 EDT; 26s ago
>
> Main PID: 8563 (ovirt-ha-agent)
>
> CGroup: /system.slice/ovirt-ha-agent.service
>
> └─8563 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent
> --no-daemon
>
>
>
> Jun 15 09:19:21 ovirt-hyp-01.example.lan systemd[1]: Started oVirt Hosted
> Engine High Availability Monitoring Agent.
>
> Jun 15 09:19:21 ovirt-hyp-01.example.lan systemd[1]: Starting oVirt
> Hosted Engine High Availability Monitoring Agent...
>
> [root@ovirt-hyp-01 ~]# systemctl restart ovirt-ha-broker.service
>
> [root@ovirt-hyp-01 ~]# systemctl status ovirt-ha-broker.service
>
> ● ovirt-ha-broker.service - oVirt Hosted Engine High Availability
> Communications Broker
>
> Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service;
> enabled; vendor preset: disabled)
>
> Active: active (running) since Thu 2017-06-15 09:20:59 EDT; 28s ago
>
> Main PID: 8844 (ovirt-ha-broker)
>
> CGroup: /system.slice/ovirt-ha-broker.service
>
> └─8844 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker
> --no-daemon
>
>
>
> Jun 15 09:20:59 ovirt-hyp-01.example.lan systemd[1]: Started oVirt Hosted
> Engine High Availability Communications Broker.
>
> Jun 15 09:20:59 ovirt-hyp-01.example.lan systemd[1]: Starting oVirt
> Hosted Engine High Availability Communications Broker...
>
>
> On Jun 14, 2017 4:45 AM, "Sahina Bose" <sabose(a)redhat.com> wrote:
>
>> What does the output of "hosted-engine --vm-status" and "gluster volume
>> status engine" tell you? Are all the bricks running as per gluster vol
>> status?
>>
>> Can you try to restart the ovirt-ha-agent and ovirt-ha-broker services?
>>
>> If HE still has issues powering up, please provide agent.log and
>> broker.log from /var/log/ovirt-hosted-engine-ha and gluster mount logs
>> from /var/log/glusterfs/rhev-data-center-mnt <engine>.log
>>
>> On Thu, Jun 8, 2017 at 6:57 PM, Joel Diaz <mrjoeldiaz(a)gmail.com> wrote:
>>
>>> Good morning oVirt community,
>>>
>>> I'm running a three-host gluster environment with hosted engine.
>>>
>>> Yesterday the engine went down and has not been able to come up
>>> properly. It tries to start on all three hosts.
>>>
>>> I have two gluster volumes, data and engine. The data storage domain
>>> volume is no longer mounted, but the engine volume is up. I've restarted
>>> the gluster service and made sure both volumes were running. The data
>>> volume will not mount.
>>>
>>> How can I get the engine running properly again?
>>>
>>> Thanks,
>>>
>>> Joel
>>>
>>
>