Ok.
Simone,
Please let me know if I can provide any additional log files.
Thanks for taking the time to look into this.
Joel
On Jun 16, 2017 8:59 AM, "Sahina Bose" <sabose(a)redhat.com> wrote:
I don't notice anything wrong on the gluster end.
Maybe Simone can help take a look at HE behaviour?
On Fri, Jun 16, 2017 at 6:14 PM, Joel Diaz <mrjoeldiaz(a)gmail.com> wrote:
> Good morning,
>
> Info requested below.
>
> [root@ovirt-hyp-02 ~]# hosted-engine --vm-start
>
> Exception in thread Client localhost:54321 (most likely raised during
> interpreter shutdown):VM exists and its status is Up
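(For context, a hedged sketch: the "VM exists and its status is Up" exception means vdsm already tracks the engine VM as running, so the start request was refused; the problem is inside the guest or its network, not the start path. The follow-up commands named in the comments are the standard hosted-engine/libvirt CLIs, assumed here.)

```shell
# "VM exists and its status is Up" => vdsm already considers the engine VM
# running; extract the reported status from the exception message:
msg='Exception in thread Client localhost:54321 (most likely raised during interpreter shutdown):VM exists and its status is Up'
status=${msg##*its status is }
echo "vdsm-reported VM status: $status"
# Next steps on the host (hedged -- standard hosted-engine/libvirt commands):
#   virsh -r list --all        # the HostedEngine domain should show "running"
#   hosted-engine --console    # attach to the guest to check ovirt-engine itself
```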
>
>
>
> [root@ovirt-hyp-02 ~]# ping engine
>
> PING engine.example.lan (192.168.170.149) 56(84) bytes of data.
> From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=1 Destination Host Unreachable
> From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=2 Destination Host Unreachable
> From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=3 Destination Host Unreachable
> From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=4 Destination Host Unreachable
> From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=5 Destination Host Unreachable
> From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=6 Destination Host Unreachable
> From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=7 Destination Host Unreachable
> From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=8 Destination Host Unreachable
>
>
>
>
> [root@ovirt-hyp-02 ~]# gluster volume status engine
>
> Status of volume: engine
> Gluster process                                       TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick 192.168.170.141:/gluster_bricks/engine/engine   49159     0          Y       1799
> Brick 192.168.170.143:/gluster_bricks/engine/engine   49159     0          Y       2900
> Self-heal Daemon on localhost                         N/A       N/A        Y       2914
> Self-heal Daemon on ovirt-hyp-01.example.lan          N/A       N/A        Y       1854
>
> Task Status of Volume engine
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
>
>
> [root@ovirt-hyp-02 ~]# gluster volume heal engine info
>
> Brick 192.168.170.141:/gluster_bricks/engine/engine
> Status: Connected
> Number of entries: 0
>
> Brick 192.168.170.143:/gluster_bricks/engine/engine
> Status: Connected
> Number of entries: 0
>
> Brick 192.168.170.147:/gluster_bricks/engine/engine
> Status: Connected
> Number of entries: 0
>
>
>
> [root@ovirt-hyp-02 ~]# cat /var/log/glusterfs/rhev-data-center-mnt-glusterSD-ovirt-hyp-01.example.lan\:engine.log
>
> [2017-06-15 13:37:02.009436] I [glusterfsd-mgmt.c:1600:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
>
>
>
>
>
> Each of the three hosts sends out the following notifications about every
> 15 minutes.
>
> Hosted engine host: ovirt-hyp-01.example.lan changed state: EngineDown-EngineStart.
> Hosted engine host: ovirt-hyp-01.example.lan changed state: EngineStart-EngineStarting.
> Hosted engine host: ovirt-hyp-01.example.lan changed state: EngineStarting-EngineForceStop.
> Hosted engine host: ovirt-hyp-01.example.lan changed state: EngineForceStop-EngineDown.
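(For reference: those four notifications are one complete retry cycle of the HA agent's state machine, EngineDown → EngineStart → EngineStarting → EngineForceStop → EngineDown, i.e. the agent starts the VM, the engine liveness check fails, and it force-stops and retries. A small sketch counting completed cycles from captured notifications; the sample is inlined, in practice pipe the notification mail or agent.log in.)

```shell
# Count completed start/force-stop cycles: each ends with the
# EngineForceStop-EngineDown transition.
notifications='Hosted engine host: ovirt-hyp-01.example.lan changed state: EngineDown-EngineStart.
Hosted engine host: ovirt-hyp-01.example.lan changed state: EngineStart-EngineStarting.
Hosted engine host: ovirt-hyp-01.example.lan changed state: EngineStarting-EngineForceStop.
Hosted engine host: ovirt-hyp-01.example.lan changed state: EngineForceStop-EngineDown.'
cycles=$(printf '%s\n' "$notifications" | grep -c 'EngineForceStop-EngineDown')
echo "completed restart cycles: $cycles"
```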
>
> Please let me know if you need any additional information.
>
> Thank you,
>
> Joel
>
>
>
> On Jun 16, 2017 2:52 AM, "Sahina Bose" <sabose(a)redhat.com> wrote:
>
>> From the agent.log,
>> MainThread::INFO::2017-06-15 11:16:50,583::states::473::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume)
>> Engine vm is running on host ovirt-hyp-02.reis.com (id 2)
>>
>> It looks like the HE VM was started successfully. Is it possible that
>> the ovirt-engine service could not be started on the HE VM? Could you try
>> to start the HE VM using the command below and then log into the VM console?
>> #hosted-engine --vm-start
>>
>> Also, please check
>> # gluster volume status engine
>> # gluster volume heal engine info
>>
>> Please also check if there are errors in gluster mount logs - at
>> /var/log/glusterfs/rhev-data-center-mnt..<engine>.log
>>
>>
>> On Thu, Jun 15, 2017 at 8:53 PM, Joel Diaz <mrjoeldiaz(a)gmail.com> wrote:
>>
>>> Sorry. I forgot to attached the requested logs in the previous email.
>>>
>>> Thanks,
>>>
>>> On Jun 15, 2017 9:38 AM, "Joel Diaz" <mrjoeldiaz(a)gmail.com> wrote:
>>>
>>> Good morning,
>>>
>>> Requested info below. Along with some additional info.
>>>
>>> You'll notice the data volume is not mounted.
>>>
>>> Any help in getting HE back running would be greatly appreciated.
>>>
>>> Thank you,
>>>
>>> Joel
>>>
>>> [root@ovirt-hyp-01 ~]# hosted-engine --vm-status
>>>
>>> --== Host 1 status ==--
>>>
>>> conf_on_shared_storage             : True
>>> Status up-to-date                  : False
>>> Hostname                           : ovirt-hyp-01.example.lan
>>> Host ID                            : 1
>>> Engine status                      : unknown stale-data
>>> Score                              : 3400
>>> stopped                            : False
>>> Local maintenance                  : False
>>> crc32                              : 5558a7d3
>>> local_conf_timestamp               : 20356
>>> Host timestamp                     : 20341
>>> Extra metadata (valid at timestamp):
>>>     metadata_parse_version=1
>>>     metadata_feature_version=1
>>>     timestamp=20341 (Fri Jun 9 14:38:57 2017)
>>>     host-id=1
>>>     score=3400
>>>     vm_conf_refresh_time=20356 (Fri Jun 9 14:39:11 2017)
>>>     conf_on_shared_storage=True
>>>     maintenance=False
>>>     state=EngineDown
>>>     stopped=False
>>>
>>> --== Host 2 status ==--
>>>
>>> conf_on_shared_storage             : True
>>> Status up-to-date                  : False
>>> Hostname                           : ovirt-hyp-02.example.lan
>>> Host ID                            : 2
>>> Engine status                      : unknown stale-data
>>> Score                              : 3400
>>> stopped                            : False
>>> Local maintenance                  : False
>>> crc32                              : 936d4cf3
>>> local_conf_timestamp               : 20351
>>> Host timestamp                     : 20337
>>> Extra metadata (valid at timestamp):
>>>     metadata_parse_version=1
>>>     metadata_feature_version=1
>>>     timestamp=20337 (Fri Jun 9 14:39:03 2017)
>>>     host-id=2
>>>     score=3400
>>>     vm_conf_refresh_time=20351 (Fri Jun 9 14:39:17 2017)
>>>     conf_on_shared_storage=True
>>>     maintenance=False
>>>     state=EngineDown
>>>     stopped=False
>>>
>>> --== Host 3 status ==--
>>>
>>> conf_on_shared_storage             : True
>>> Status up-to-date                  : False
>>> Hostname                           : ovirt-hyp-03.example.lan
>>> Host ID                            : 3
>>> Engine status                      : unknown stale-data
>>> Score                              : 3400
>>> stopped                            : False
>>> Local maintenance                  : False
>>> crc32                              : f646334e
>>> local_conf_timestamp               : 20391
>>> Host timestamp                     : 20377
>>> Extra metadata (valid at timestamp):
>>>     metadata_parse_version=1
>>>     metadata_feature_version=1
>>>     timestamp=20377 (Fri Jun 9 14:39:37 2017)
>>>     host-id=3
>>>     score=3400
>>>     vm_conf_refresh_time=20391 (Fri Jun 9 14:39:51 2017)
>>>     conf_on_shared_storage=True
>>>     maintenance=False
>>>     state=EngineStop
>>>     stopped=False
>>>     timeout=Thu Jan 1 00:43:08 1970
>>>
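(One hedged reading of the dump above: every host shows "Status up-to-date: False" and "unknown stale-data" because no agent has refreshed its shared-storage metadata since the Jun 9 timestamps, days before this Jun 15 capture. The timestamps are relative values as printed by --vm-status; comparing them numerically:)

```shell
# Host timestamps copied from the --vm-status dump above:
t1=20341; t2=20337; t3=20377
newest=$t3
echo "host1 lag: $((newest - t1))s, host2 lag: $((newest - t2))s"
# All three reports were written within ~40s of one another, then stopped --
# consistent with the agents (or their access to shared storage) failing
# cluster-wide rather than a single bad host.
```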
>>> [root@ovirt-hyp-01 ~]# gluster peer status
>>>
>>> Number of Peers: 2
>>>
>>> Hostname: 192.168.170.143
>>> Uuid: b2b30d05-cf91-4567-92fd-022575e082f5
>>> State: Peer in Cluster (Connected)
>>> Other names:
>>> 10.0.0.2
>>>
>>> Hostname: 192.168.170.147
>>> Uuid: 4e50acc4-f3cb-422d-b499-fb5796a53529
>>> State: Peer in Cluster (Connected)
>>> Other names:
>>> 10.0.0.3
>>>
>>>
>>>
>>> [root@ovirt-hyp-01 ~]# gluster volume info all
>>>
>>> Volume Name: data
>>> Type: Replicate
>>> Volume ID: 1d6bb110-9be4-4630-ae91-36ec1cf6cc02
>>> Status: Started
>>> Snapshot Count: 0
>>> Number of Bricks: 1 x (2 + 1) = 3
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: 192.168.170.141:/gluster_bricks/data/data
>>> Brick2: 192.168.170.143:/gluster_bricks/data/data
>>> Brick3: 192.168.170.147:/gluster_bricks/data/data (arbiter)
>>> Options Reconfigured:
>>> nfs.disable: on
>>> performance.readdir-ahead: on
>>> transport.address-family: inet
>>> performance.quick-read: off
>>> performance.read-ahead: off
>>> performance.io-cache: off
>>> performance.stat-prefetch: off
>>> performance.low-prio-threads: 32
>>> network.remote-dio: off
>>> cluster.eager-lock: enable
>>> cluster.quorum-type: auto
>>> cluster.server-quorum-type: server
>>> cluster.data-self-heal-algorithm: full
>>> cluster.locking-scheme: granular
>>> cluster.shd-max-threads: 8
>>> cluster.shd-wait-qlength: 10000
>>> features.shard: on
>>> user.cifs: off
>>> storage.owner-uid: 36
>>> storage.owner-gid: 36
>>> network.ping-timeout: 30
>>> performance.strict-o-direct: on
>>> cluster.granular-entry-heal: enable
>>>
>>> Volume Name: engine
>>> Type: Replicate
>>> Volume ID: b160f0b2-8bd3-4ff2-a07c-134cab1519dd
>>> Status: Started
>>> Snapshot Count: 0
>>> Number of Bricks: 1 x (2 + 1) = 3
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: 192.168.170.141:/gluster_bricks/engine/engine
>>> Brick2: 192.168.170.143:/gluster_bricks/engine/engine
>>> Brick3: 192.168.170.147:/gluster_bricks/engine/engine (arbiter)
>>> Options Reconfigured:
>>> nfs.disable: on
>>> performance.readdir-ahead: on
>>> transport.address-family: inet
>>> performance.quick-read: off
>>> performance.read-ahead: off
>>> performance.io-cache: off
>>> performance.stat-prefetch: off
>>> performance.low-prio-threads: 32
>>> network.remote-dio: off
>>> cluster.eager-lock: enable
>>> cluster.quorum-type: auto
>>> cluster.server-quorum-type: server
>>> cluster.data-self-heal-algorithm: full
>>> cluster.locking-scheme: granular
>>> cluster.shd-max-threads: 8
>>> cluster.shd-wait-qlength: 10000
>>> features.shard: on
>>> user.cifs: off
>>> storage.owner-uid: 36
>>> storage.owner-gid: 36
>>> network.ping-timeout: 30
>>> performance.strict-o-direct: on
>>> cluster.granular-entry-heal: enable
>>>
>>>
>>>
>>>
>>>
>>> [root@ovirt-hyp-01 ~]# df -h
>>>
>>> Filesystem                                    Size  Used  Avail  Use%  Mounted on
>>> /dev/mapper/centos_ovirt--hyp--01-root         50G  4.1G    46G    9%  /
>>> devtmpfs                                      7.7G     0   7.7G    0%  /dev
>>> tmpfs                                         7.8G     0   7.8G    0%  /dev/shm
>>> tmpfs                                         7.8G  8.7M   7.7G    1%  /run
>>> tmpfs                                         7.8G     0   7.8G    0%  /sys/fs/cgroup
>>> /dev/mapper/centos_ovirt--hyp--01-home         61G   33M    61G    1%  /home
>>> /dev/mapper/gluster_vg_sdb-gluster_lv_engine   50G  7.6G    43G   16%  /gluster_bricks/engine
>>> /dev/mapper/gluster_vg_sdb-gluster_lv_data    730G  157G   574G   22%  /gluster_bricks/data
>>> /dev/sda1                                     497M  173M   325M   35%  /boot
>>> ovirt-hyp-01.example.lan:engine                50G  7.6G    43G   16%  /rhev/data-center/mnt/glusterSD/ovirt-hyp-01.example.lan:engine
>>> tmpfs                                         1.6G     0   1.6G    0%  /run/user/0
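(Note what is missing above: the engine volume is mounted under /rhev/data-center/mnt/glusterSD/, but there is no corresponding mount for the data volume, consistent with the data storage domain being down. A small sketch listing mounted gluster domains from df output; the sample line is inlined, and the manual-mount command in the comments uses an illustrative mount point.)

```shell
# Extract the gluster volume name from a df line of the form host:volume.
# Live, use: df -h | grep glusterSD
df_line='ovirt-hyp-01.example.lan:engine  50G  7.6G  43G  16%  /rhev/data-center/mnt/glusterSD/ovirt-hyp-01.example.lan:engine'
vols=$(printf '%s\n' "$df_line" | awk '{print $1}' | cut -d: -f2)
echo "mounted gluster volumes: $vols"
# A manual test mount separates gluster-side problems from vdsm-side problems:
#   mkdir -p /mnt/data-test && mount -t glusterfs ovirt-hyp-01.example.lan:/data /mnt/data-test
```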
>>>
>>>
>>>
>>> [root@ovirt-hyp-01 ~]# systemctl list-unit-files | grep ovirt
>>>
>>> ovirt-ha-agent.service              enabled
>>> ovirt-ha-broker.service             enabled
>>> ovirt-imageio-daemon.service        disabled
>>> ovirt-vmconsole-host-sshd.service   enabled
>>>
>>>
>>>
>>> [root@ovirt-hyp-01 ~]# systemctl status ovirt-ha-agent.service
>>>
>>> ● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
>>>    Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor preset: disabled)
>>>    Active: active (running) since Thu 2017-06-15 08:56:15 EDT; 21min ago
>>>  Main PID: 3150 (ovirt-ha-agent)
>>>    CGroup: /system.slice/ovirt-ha-agent.service
>>>            └─3150 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent --no-daemon
>>>
>>> Jun 15 08:56:15 ovirt-hyp-01.example.lan systemd[1]: Started oVirt Hosted Engine High Availability Monitoring Agent.
>>> Jun 15 08:56:15 ovirt-hyp-01.example.lan systemd[1]: Starting oVirt Hosted Engine High Availability Monitoring Agent...
>>> Jun 15 09:17:18 ovirt-hyp-01.example.lan ovirt-ha-agent[3150]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine VM stopped on localhost
>>>
>>> [root@ovirt-hyp-01 ~]# systemctl status ovirt-ha-broker.service
>>>
>>> ● ovirt-ha-broker.service - oVirt Hosted Engine High Availability Communications Broker
>>>    Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service; enabled; vendor preset: disabled)
>>>    Active: active (running) since Thu 2017-06-15 08:54:06 EDT; 24min ago
>>>  Main PID: 968 (ovirt-ha-broker)
>>>    CGroup: /system.slice/ovirt-ha-broker.service
>>>            └─968 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker --no-daemon
>>>
>>> Jun 15 08:54:06 ovirt-hyp-01.example.lan systemd[1]: Started oVirt Hosted Engine High Availability Communications Broker.
>>> Jun 15 08:54:06 ovirt-hyp-01.example.lan systemd[1]: Starting oVirt Hosted Engine High Availability Communications Broker...
>>> Jun 15 08:56:16 ovirt-hyp-01.example.lan ovirt-ha-broker[968]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.listener.ConnectionHandler ERROR Error handling request, data: '...1b55bcf76'
>>>                                                                Traceback (most recent call last):
>>>                                                                  File "/usr/lib/python2.7/site-packages/ovirt...
>>>
>>> Hint: Some lines were ellipsized, use -l to show in full.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> [root@ovirt-hyp-01 ~]# systemctl restart ovirt-ha-agent.service
>>> [root@ovirt-hyp-01 ~]# systemctl status ovirt-ha-agent.service
>>>
>>> ● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
>>>    Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor preset: disabled)
>>>    Active: active (running) since Thu 2017-06-15 09:19:21 EDT; 26s ago
>>>  Main PID: 8563 (ovirt-ha-agent)
>>>    CGroup: /system.slice/ovirt-ha-agent.service
>>>            └─8563 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent --no-daemon
>>>
>>> Jun 15 09:19:21 ovirt-hyp-01.example.lan systemd[1]: Started oVirt Hosted Engine High Availability Monitoring Agent.
>>> Jun 15 09:19:21 ovirt-hyp-01.example.lan systemd[1]: Starting oVirt Hosted Engine High Availability Monitoring Agent...
>>>
>>> [root@ovirt-hyp-01 ~]# systemctl restart ovirt-ha-broker.service
>>> [root@ovirt-hyp-01 ~]# systemctl status ovirt-ha-broker.service
>>>
>>> ● ovirt-ha-broker.service - oVirt Hosted Engine High Availability Communications Broker
>>>    Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service; enabled; vendor preset: disabled)
>>>    Active: active (running) since Thu 2017-06-15 09:20:59 EDT; 28s ago
>>>  Main PID: 8844 (ovirt-ha-broker)
>>>    CGroup: /system.slice/ovirt-ha-broker.service
>>>            └─8844 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker --no-daemon
>>>
>>> Jun 15 09:20:59 ovirt-hyp-01.example.lan systemd[1]: Started oVirt Hosted Engine High Availability Communications Broker.
>>> Jun 15 09:20:59 ovirt-hyp-01.example.lan systemd[1]: Starting oVirt Hosted Engine High Availability Communications Broker...
>>>
>>>
>>> On Jun 14, 2017 4:45 AM, "Sahina Bose" <sabose(a)redhat.com> wrote:
>>>
>>>> What do "hosted-engine --vm-status" and "gluster volume status engine"
>>>> tell you? Are all the bricks running as per gluster volume status?
>>>>
>>>> Can you try to restart the ovirt-ha-agent and ovirt-ha-broker services?
>>>>
>>>> If HE still has issues powering up, please provide agent.log and
>>>> broker.log from /var/log/ovirt-hosted-engine-ha and gluster mount
>>>> logs from /var/log/glusterfs/rhev-data-center-mnt <engine>.log
>>>>
>>>> On Thu, Jun 8, 2017 at 6:57 PM, Joel Diaz <mrjoeldiaz(a)gmail.com>
>>>> wrote:
>>>>
>>>>> Good morning oVirt community,
>>>>>
>>>>> I'm running a three host gluster environment with hosted engine.
>>>>>
>>>>> Yesterday the engine went down and has not been able to come back up
>>>>> properly. It tries to start on all three hosts.
>>>>>
>>>>> I have two gluster volumes, data and engine. The data storage domain
>>>>> volume is no longer mounted but the engine volume is up. I've restarted
>>>>> the gluster service and made sure both volumes were running. The data
>>>>> volume will not mount.
>>>>>
>>>>> How can I get the engine running properly again?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Joel
>>>>>
>>>>> _______________________________________________
>>>>> Users mailing list
>>>>> Users(a)ovirt.org
>>>>> http://lists.ovirt.org/mailman/listinfo/users
>>>>>
>>>>>
>>>>
>>>
>>