[ovirt-users] Hosted engine

Sahina Bose sabose at redhat.com
Fri Jun 16 12:59:03 UTC 2017


I don't notice anything wrong on the gluster end.

Maybe Simone can help take a look at HE behaviour?

On Fri, Jun 16, 2017 at 6:14 PM, Joel Diaz <mrjoeldiaz at gmail.com> wrote:

> Good morning,
>
> Info requested below.
>
> [root at ovirt-hyp-02 ~]# hosted-engine --vm-start
>
> Exception in thread Client localhost:54321 (most likely raised during
> interpreter shutdown):VM exists and its status is Up
>
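> As a read-only cross-check of what libvirt reports for the engine VM (the
> domain name HostedEngine is the usual default; it may differ here):
>
> # virsh -r list --all
>
> # virsh -r domstate HostedEngine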
>
>
> [root at ovirt-hyp-02 ~]# ping engine
>
> PING engine.example.lan (192.168.170.149) 56(84) bytes of data.
>
> From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=1 Destination
> Host Unreachable
>
> From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=2 Destination
> Host Unreachable
>
> From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=3 Destination
> Host Unreachable
>
> From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=4 Destination
> Host Unreachable
>
> From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=5 Destination
> Host Unreachable
>
> From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=6 Destination
> Host Unreachable
>
> From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=7 Destination
> Host Unreachable
>
> From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=8 Destination
> Host Unreachable
>
>
>
>
>
> [root at ovirt-hyp-02 ~]# gluster volume status engine
>
> Status of volume: engine
>
> Gluster process                                        TCP Port  RDMA Port  Online  Pid
>
> ------------------------------------------------------------------------------
>
> Brick 192.168.170.141:/gluster_bricks/engine/engine    49159     0          Y       1799
>
> Brick 192.168.170.143:/gluster_bricks/engine/engine    49159     0          Y       2900
>
> Self-heal Daemon on localhost                          N/A       N/A        Y       2914
>
> Self-heal Daemon on ovirt-hyp-01.example.lan           N/A       N/A        Y       1854
>
>
>
> Task Status of Volume engine
>
> ------------------------------------------------------------------------------
>
> There are no active volume tasks
>
>
>
> [root at ovirt-hyp-02 ~]# gluster volume heal engine info
>
> Brick 192.168.170.141:/gluster_bricks/engine/engine
>
> Status: Connected
>
> Number of entries: 0
>
>
>
> Brick 192.168.170.143:/gluster_bricks/engine/engine
>
> Status: Connected
>
> Number of entries: 0
>
>
>
> Brick 192.168.170.147:/gluster_bricks/engine/engine
>
> Status: Connected
>
> Number of entries: 0
>
>
>
> [root at ovirt-hyp-02 ~]# cat /var/log/glusterfs/rhev-data-center-mnt-glusterSD-ovirt-hyp-01.example.lan\:engine.log
>
> [2017-06-15 13:37:02.009436] I [glusterfsd-mgmt.c:1600:mgmt_getspec_cbk]
> 0-glusterfs: No change in volfile, continuing
>
>
>
>
>
> Each of the three hosts sends out the following notifications about every
> 15 minutes (a quick way to watch these transitions live is noted after the
> list).
>
> Hosted engine host: ovirt-hyp-01.example.lan changed state:
> EngineDown-EngineStart.
>
> Hosted engine host: ovirt-hyp-01.example.lan changed state:
> EngineStart-EngineStarting.
>
> Hosted engine host: ovirt-hyp-01.example.lan changed state:
> EngineStarting-EngineForceStop.
>
> Hosted engine host: ovirt-hyp-01.example.lan changed state:
> EngineForceStop-EngineDown.
>
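> One way to watch these transitions as they happen (the log path is the same
> one mentioned elsewhere in this thread; adjust if your install differs):
>
> # tail -f /var/log/ovirt-hosted-engine-ha/agent.log
>
> # watch -n 10 hosted-engine --vm-status
>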
> Please let me know if you need any additional information.
>
> Thank you,
>
> Joel
>
>
>
> On Jun 16, 2017 2:52 AM, "Sahina Bose" <sabose at redhat.com> wrote:
>
>> From the agent.log,
>> MainThread::INFO::2017-06-15 11:16:50,583::states::473::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume)
>> Engine vm is running on host ovirt-hyp-02.reis.com (id 2)
>>
>> It looks like the HE VM was started successfully. Is it possible that the
>> ovirt-engine service could not be started on the HE VM? Could you try
>> starting the HE VM with the command below and then logging into the VM console?
>> #hosted-engine --vm-start
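>>
>> If the VM comes up but the web UI is still unreachable, one way to get at
>> its console (assuming a reasonably recent hosted-engine CLI; the options
>> may vary by version) and check the engine service inside the VM:
>> # hosted-engine --add-console-password
>> # hosted-engine --console
>> # systemctl status ovirt-engine        (run inside the engine VM)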
>>
>> Also, please check
>> # gluster volume status engine
>> # gluster volume heal engine info
>>
>> Please also check if there are errors in the gluster mount logs at
>> /var/log/glusterfs/rhev-data-center-mnt..<engine>.log
>>
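>> A quick way to pull just the warning/error lines out of those logs (the
>> glob below assumes the default mount-log naming; adjust the path as needed):
>>
>> # grep -E ' [EW] \[' /var/log/glusterfs/rhev-data-center-mnt*.log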
>>
>> On Thu, Jun 15, 2017 at 8:53 PM, Joel Diaz <mrjoeldiaz at gmail.com> wrote:
>>
>>> Sorry, I forgot to attach the requested logs in the previous email.
>>>
>>> Thanks,
>>>
>>> On Jun 15, 2017 9:38 AM, "Joel Diaz" <mrjoeldiaz at gmail.com> wrote:
>>>
>>> Good morning,
>>>
>>> Requested info below, along with some additional info.
>>>
>>> You'll notice the data volume is not mounted.
>>>
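>>> A quick test of whether the data volume can be mounted at all (the mount
>>> point below is just a scratch directory for this check):
>>>
>>> # mkdir -p /mnt/data-test
>>>
>>> # mount -t glusterfs ovirt-hyp-01.example.lan:/data /mnt/data-test
>>>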
>>> Any help in getting HE back running would be greatly appreciated.
>>>
>>> Thank you,
>>>
>>> Joel
>>>
>>> [root at ovirt-hyp-01 ~]# hosted-engine --vm-status
>>>
>>>
>>>
>>>
>>>
>>> --== Host 1 status ==--
>>>
>>>
>>>
>>> conf_on_shared_storage             : True
>>>
>>> Status up-to-date                  : False
>>>
>>> Hostname                           : ovirt-hyp-01.example.lan
>>>
>>> Host ID                            : 1
>>>
>>> Engine status                      : unknown stale-data
>>>
>>> Score                              : 3400
>>>
>>> stopped                            : False
>>>
>>> Local maintenance                  : False
>>>
>>> crc32                              : 5558a7d3
>>>
>>> local_conf_timestamp               : 20356
>>>
>>> Host timestamp                     : 20341
>>>
>>> Extra metadata (valid at timestamp):
>>>
>>>         metadata_parse_version=1
>>>
>>>         metadata_feature_version=1
>>>
>>>         timestamp=20341 (Fri Jun  9 14:38:57 2017)
>>>
>>>         host-id=1
>>>
>>>         score=3400
>>>
>>>         vm_conf_refresh_time=20356 (Fri Jun  9 14:39:11 2017)
>>>
>>>         conf_on_shared_storage=True
>>>
>>>         maintenance=False
>>>
>>>         state=EngineDown
>>>
>>>         stopped=False
>>>
>>>
>>>
>>>
>>>
>>> --== Host 2 status ==--
>>>
>>>
>>>
>>> conf_on_shared_storage             : True
>>>
>>> Status up-to-date                  : False
>>>
>>> Hostname                           : ovirt-hyp-02.example.lan
>>>
>>> Host ID                            : 2
>>>
>>> Engine status                      : unknown stale-data
>>>
>>> Score                              : 3400
>>>
>>> stopped                            : False
>>>
>>> Local maintenance                  : False
>>>
>>> crc32                              : 936d4cf3
>>>
>>> local_conf_timestamp               : 20351
>>>
>>> Host timestamp                     : 20337
>>>
>>> Extra metadata (valid at timestamp):
>>>
>>>         metadata_parse_version=1
>>>
>>>         metadata_feature_version=1
>>>
>>>         timestamp=20337 (Fri Jun  9 14:39:03 2017)
>>>
>>>         host-id=2
>>>
>>>         score=3400
>>>
>>>         vm_conf_refresh_time=20351 (Fri Jun  9 14:39:17 2017)
>>>
>>>         conf_on_shared_storage=True
>>>
>>>         maintenance=False
>>>
>>>         state=EngineDown
>>>
>>>         stopped=False
>>>
>>>
>>>
>>>
>>>
>>> --== Host 3 status ==--
>>>
>>>
>>>
>>> conf_on_shared_storage             : True
>>>
>>> Status up-to-date                  : False
>>>
>>> Hostname                           : ovirt-hyp-03.example.lan
>>>
>>> Host ID                            : 3
>>>
>>> Engine status                      : unknown stale-data
>>>
>>> Score                              : 3400
>>>
>>> stopped                            : False
>>>
>>> Local maintenance                  : False
>>>
>>> crc32                              : f646334e
>>>
>>> local_conf_timestamp               : 20391
>>>
>>> Host timestamp                     : 20377
>>>
>>> Extra metadata (valid at timestamp):
>>>
>>>         metadata_parse_version=1
>>>
>>>         metadata_feature_version=1
>>>
>>>         timestamp=20377 (Fri Jun  9 14:39:37 2017)
>>>
>>>         host-id=3
>>>
>>>         score=3400
>>>
>>>         vm_conf_refresh_time=20391 (Fri Jun  9 14:39:51 2017)
>>>
>>>         conf_on_shared_storage=True
>>>
>>>         maintenance=False
>>>
>>>         state=EngineStop
>>>
>>>         stopped=False
>>>
>>>         timeout=Thu Jan  1 00:43:08 1970
>>>
>>>
>>>
>>>
>>>
>>> [root at ovirt-hyp-01 ~]# gluster peer status
>>>
>>> Number of Peers: 2
>>>
>>>
>>>
>>> Hostname: 192.168.170.143
>>>
>>> Uuid: b2b30d05-cf91-4567-92fd-022575e082f5
>>>
>>> State: Peer in Cluster (Connected)
>>>
>>> Other names:
>>>
>>> 10.0.0.2
>>>
>>>
>>>
>>> Hostname: 192.168.170.147
>>>
>>> Uuid: 4e50acc4-f3cb-422d-b499-fb5796a53529
>>>
>>> State: Peer in Cluster (Connected)
>>>
>>> Other names:
>>>
>>> 10.0.0.3
>>>
>>>
>>>
>>> [root at ovirt-hyp-01 ~]# gluster volume info all
>>>
>>>
>>>
>>> Volume Name: data
>>>
>>> Type: Replicate
>>>
>>> Volume ID: 1d6bb110-9be4-4630-ae91-36ec1cf6cc02
>>>
>>> Status: Started
>>>
>>> Snapshot Count: 0
>>>
>>> Number of Bricks: 1 x (2 + 1) = 3
>>>
>>> Transport-type: tcp
>>>
>>> Bricks:
>>>
>>> Brick1: 192.168.170.141:/gluster_bricks/data/data
>>>
>>> Brick2: 192.168.170.143:/gluster_bricks/data/data
>>>
>>> Brick3: 192.168.170.147:/gluster_bricks/data/data (arbiter)
>>>
>>> Options Reconfigured:
>>>
>>> nfs.disable: on
>>>
>>> performance.readdir-ahead: on
>>>
>>> transport.address-family: inet
>>>
>>> performance.quick-read: off
>>>
>>> performance.read-ahead: off
>>>
>>> performance.io-cache: off
>>>
>>> performance.stat-prefetch: off
>>>
>>> performance.low-prio-threads: 32
>>>
>>> network.remote-dio: off
>>>
>>> cluster.eager-lock: enable
>>>
>>> cluster.quorum-type: auto
>>>
>>> cluster.server-quorum-type: server
>>>
>>> cluster.data-self-heal-algorithm: full
>>>
>>> cluster.locking-scheme: granular
>>>
>>> cluster.shd-max-threads: 8
>>>
>>> cluster.shd-wait-qlength: 10000
>>>
>>> features.shard: on
>>>
>>> user.cifs: off
>>>
>>> storage.owner-uid: 36
>>>
>>> storage.owner-gid: 36
>>>
>>> network.ping-timeout: 30
>>>
>>> performance.strict-o-direct: on
>>>
>>> cluster.granular-entry-heal: enable
>>>
>>>
>>>
>>> Volume Name: engine
>>>
>>> Type: Replicate
>>>
>>> Volume ID: b160f0b2-8bd3-4ff2-a07c-134cab1519dd
>>>
>>> Status: Started
>>>
>>> Snapshot Count: 0
>>>
>>> Number of Bricks: 1 x (2 + 1) = 3
>>>
>>> Transport-type: tcp
>>>
>>> Bricks:
>>>
>>> Brick1: 192.168.170.141:/gluster_bricks/engine/engine
>>>
>>> Brick2: 192.168.170.143:/gluster_bricks/engine/engine
>>>
>>> Brick3: 192.168.170.147:/gluster_bricks/engine/engine (arbiter)
>>>
>>> Options Reconfigured:
>>>
>>> nfs.disable: on
>>>
>>> performance.readdir-ahead: on
>>>
>>> transport.address-family: inet
>>>
>>> performance.quick-read: off
>>>
>>> performance.read-ahead: off
>>>
>>> performance.io-cache: off
>>>
>>> performance.stat-prefetch: off
>>>
>>> performance.low-prio-threads: 32
>>>
>>> network.remote-dio: off
>>>
>>> cluster.eager-lock: enable
>>>
>>> cluster.quorum-type: auto
>>>
>>> cluster.server-quorum-type: server
>>>
>>> cluster.data-self-heal-algorithm: full
>>>
>>> cluster.locking-scheme: granular
>>>
>>> cluster.shd-max-threads: 8
>>>
>>> cluster.shd-wait-qlength: 10000
>>>
>>> features.shard: on
>>>
>>> user.cifs: off
>>>
>>> storage.owner-uid: 36
>>>
>>> storage.owner-gid: 36
>>>
>>> network.ping-timeout: 30
>>>
>>> performance.strict-o-direct: on
>>>
>>> cluster.granular-entry-heal: enable
>>>
>>>
>>>
>>>
>>>
>>> [root at ovirt-hyp-01 ~]# df -h
>>>
>>> Filesystem                                     Size  Used Avail Use% Mounted on
>>>
>>> /dev/mapper/centos_ovirt--hyp--01-root          50G  4.1G   46G   9% /
>>>
>>> devtmpfs                                       7.7G     0  7.7G   0% /dev
>>>
>>> tmpfs                                          7.8G     0  7.8G   0% /dev/shm
>>>
>>> tmpfs                                          7.8G  8.7M  7.7G   1% /run
>>>
>>> tmpfs                                          7.8G     0  7.8G   0% /sys/fs/cgroup
>>>
>>> /dev/mapper/centos_ovirt--hyp--01-home          61G   33M   61G   1% /home
>>>
>>> /dev/mapper/gluster_vg_sdb-gluster_lv_engine    50G  7.6G   43G  16% /gluster_bricks/engine
>>>
>>> /dev/mapper/gluster_vg_sdb-gluster_lv_data     730G  157G  574G  22% /gluster_bricks/data
>>>
>>> /dev/sda1                                      497M  173M  325M  35% /boot
>>>
>>> ovirt-hyp-01.example.lan:engine                 50G  7.6G   43G  16% /rhev/data-center/mnt/glusterSD/ovirt-hyp-01.example.lan:engine
>>>
>>> tmpfs                                          1.6G     0  1.6G   0% /run/user/0
>>>
>>>
>>>
>>> [root at ovirt-hyp-01 ~]# systemctl list-unit-files|grep ovirt
>>>
>>> ovirt-ha-agent.service                        enabled
>>>
>>> ovirt-ha-broker.service                       enabled
>>>
>>> ovirt-imageio-daemon.service                  disabled
>>>
>>> ovirt-vmconsole-host-sshd.service             enabled
>>>
>>>
>>>
>>> [root at ovirt-hyp-01 ~]# systemctl status ovirt-ha-agent.service
>>>
>>> ● ovirt-ha-agent.service - oVirt Hosted Engine High Availability
>>> Monitoring Agent
>>>
>>>   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service;
>>> enabled; vendor preset: disabled)
>>>
>>>    Active: active (running) since Thu 2017-06-15 08:56:15 EDT; 21min ago
>>>
>>> Main PID: 3150 (ovirt-ha-agent)
>>>
>>>    CGroup: /system.slice/ovirt-ha-agent.service
>>>
>>>            └─3150 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent
>>> --no-daemon
>>>
>>>
>>>
>>> Jun 15 08:56:15 ovirt-hyp-01.example.lan systemd[1]: Started oVirt
>>> Hosted Engine High Availability Monitoring Agent.
>>>
>>> Jun 15 08:56:15 ovirt-hyp-01.example.lan systemd[1]: Starting oVirt
>>> Hosted Engine High Availability Monitoring Agent...
>>>
>>> Jun 15 09:17:18 ovirt-hyp-01.example.lan ovirt-ha-agent[3150]:
>>> ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine
>>> ERROR Engine VM stopped on localhost
>>>
>>> [root at ovirt-hyp-01 ~]# systemctl status ovirt-ha-broker.service
>>>
>>> ● ovirt-ha-broker.service - oVirt Hosted Engine High Availability
>>> Communications Broker
>>>
>>>    Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service;
>>> enabled; vendor preset: disabled)
>>>
>>>    Active: active (running) since Thu 2017-06-15 08:54:06 EDT; 24min ago
>>>
>>> Main PID: 968 (ovirt-ha-broker)
>>>
>>>    CGroup: /system.slice/ovirt-ha-broker.service
>>>
>>>            └─968 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker
>>> --no-daemon
>>>
>>>
>>>
>>> Jun 15 08:54:06 ovirt-hyp-01.example.lan systemd[1]: Started oVirt
>>> Hosted Engine High Availability Communications Broker.
>>>
>>> Jun 15 08:54:06 ovirt-hyp-01.example.lan systemd[1]: Starting oVirt
>>> Hosted Engine High Availability Communications Broker...
>>>
>>> Jun 15 08:56:16 ovirt-hyp-01.example.lan ovirt-ha-broker[968]:
>>> ovirt-ha-broker ovirt_hosted_engine_ha.broker.listener.ConnectionHandler
>>> ERROR Error handling request, data: '...1b55bcf76'
>>>
>>>                                                             Traceback
>>> (most recent call last):
>>>
>>>                                                               File
>>> "/usr/lib/python2.7/site-packages/ovirt...
>>>
>>> Hint: Some lines were ellipsized, use -l to show in full.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> [root at ovirt-hyp-01 ~]# systemctl restart ovirt-ha-agent.service
>>>
>>> [root at ovirt-hyp-01 ~]# systemctl status ovirt-ha-agent.service
>>>
>>> ● ovirt-ha-agent.service - oVirt Hosted Engine High Availability
>>> Monitoring Agent
>>>
>>>    Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service;
>>> enabled; vendor preset: disabled)
>>>
>>>    Active: active (running) since Thu 2017-06-15 09:19:21 EDT; 26s ago
>>>
>>> Main PID: 8563 (ovirt-ha-agent)
>>>
>>>    CGroup: /system.slice/ovirt-ha-agent.service
>>>
>>>            └─8563 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent
>>> --no-daemon
>>>
>>>
>>>
>>> Jun 15 09:19:21 ovirt-hyp-01.example.lan systemd[1]: Started oVirt
>>> Hosted Engine High Availability Monitoring Agent.
>>>
>>> Jun 15 09:19:21 ovirt-hyp-01.example.lan systemd[1]: Starting oVirt
>>> Hosted Engine High Availability Monitoring Agent...
>>>
>>> [root at ovirt-hyp-01 ~]# systemctl restart ovirt-ha-broker.service
>>>
>>> [root at ovirt-hyp-01 ~]# systemctl status ovirt-ha-broker.service
>>>
>>> ● ovirt-ha-broker.service - oVirt Hosted Engine High Availability
>>> Communications Broker
>>>
>>>    Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service;
>>> enabled; vendor preset: disabled)
>>>
>>>    Active: active (running) since Thu 2017-06-15 09:20:59 EDT; 28s ago
>>>
>>> Main PID: 8844 (ovirt-ha-broker)
>>>
>>>    CGroup: /system.slice/ovirt-ha-broker.service
>>>
>>>            └─8844 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker
>>> --no-daemon
>>>
>>>
>>>
>>> Jun 15 09:20:59 ovirt-hyp-01.example.lan systemd[1]: Started oVirt
>>> Hosted Engine High Availability Communications Broker.
>>>
>>> Jun 15 09:20:59 ovirt-hyp-01.example.lan systemd[1]: Starting oVirt
>>> Hosted Engine High Availability Communications Broker...
>>>
>>>
>>> On Jun 14, 2017 4:45 AM, "Sahina Bose" <sabose at redhat.com> wrote:
>>>
>>>> What does the output of "hosted-engine --vm-status" and "gluster volume
>>>> status engine" tell you? Are all the bricks running as per gluster vol
>>>> status?
>>>>
>>>> Can you try to restart the ovirt-ha-agent and ovirt-ha-broker services?
>>>>
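>>>> For reference, the restart would be something like the following (broker
>>>> first, then agent, is one commonly suggested order):
>>>>
>>>> # systemctl restart ovirt-ha-broker
>>>> # systemctl restart ovirt-ha-agent
>>>>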
>>>> If HE still has issues powering up, please provide agent.log and
>>>> broker.log from /var/log/ovirt-hosted-engine-ha and gluster mount logs
>>>> from /var/log/glusterfs/rhev-data-center-mnt <engine>.log
>>>>
>>>> On Thu, Jun 8, 2017 at 6:57 PM, Joel Diaz <mrjoeldiaz at gmail.com> wrote:
>>>>
>>>>> Good morning oVirt community,
>>>>>
>>>>> I'm running a three host gluster environment with hosted engine.
>>>>>
>>>>> Yesterday the engine went down and has not been able to come up
>>>>> properly. It tries to start on all three hosts.
>>>>>
>>>>> I have two gluster volumes, data and engine. The data storage domain
>>>>> volume is no longer mounted but the engine volume is up. I've restarted the
>>>>> gluster service and made sure both volumes were running. The data volume
>>>>> will not mount.
>>>>>
>>>>> How can I get the engine running properly again?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Joel
>>>>>
>>>>> _______________________________________________
>>>>> Users mailing list
>>>>> Users at ovirt.org
>>>>> http://lists.ovirt.org/mailman/listinfo/users
>>>>>
>>>>>
>>>>
>>>
>>