Ok.

Simone,

Please let me know if I can provide any additional log files.

Thanks for taking the time to look into this.

Joel

On Jun 16, 2017 8:59 AM, "Sahina Bose" <sabose@redhat.com> wrote:
I don't notice anything wrong on the gluster end.

Maybe Simone can help take a look at HE behaviour?

On Fri, Jun 16, 2017 at 6:14 PM, Joel Diaz <mrjoeldiaz@gmail.com> wrote:
Good morning,

Info requested below.

[root@ovirt-hyp-02 ~]# hosted-engine --vm-start

Exception in thread Client localhost:54321 (most likely raised during interpreter shutdown):VM exists and its status is Up

 

[root@ovirt-hyp-02 ~]# ping engine

PING engine.example.lan (192.168.170.149) 56(84) bytes of data.

From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=1 Destination Host Unreachable

From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=2 Destination Host Unreachable

From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=3 Destination Host Unreachable

From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=4 Destination Host Unreachable

From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=5 Destination Host Unreachable

From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=6 Destination Host Unreachable

From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=7 Destination Host Unreachable

From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=8 Destination Host Unreachable
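
For reference, a quick way to double-check that the HE VM process is actually running on this host (the domain name HostedEngine is the usual default for hosted-engine deployments and may differ here) would be:

# virsh -r list --all
# ps -ef | grep -i qemu | grep -i hostedengine

If the VM is listed as running but the engine FQDN stays unreachable, the problem is more likely inside the VM (networking or the ovirt-engine service) than with starting the VM itself.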

 

 

[root@ovirt-hyp-02 ~]# gluster volume status engine

Status of volume: engine

Gluster process                             TCP Port  RDMA Port  Online  Pid

------------------------------------------------------------------------------

Brick 192.168.170.141:/gluster_bricks/engine/engine  49159     0          Y       1799

Brick 192.168.170.143:/gluster_bricks/engine/engine  49159     0          Y       2900

Self-heal Daemon on localhost               N/A       N/A        Y       2914

Self-heal Daemon on ovirt-hyp-01.example.lan   N/A       N/A        Y       1854

 

Task Status of Volume engine

------------------------------------------------------------------------------

There are no active volume tasks

 

[root@ovirt-hyp-02 ~]# gluster volume heal engine info

Brick 192.168.170.141:/gluster_bricks/engine/engine

Status: Connected

Number of entries: 0

 

Brick 192.168.170.143:/gluster_bricks/engine/engine

Status: Connected

Number of entries: 0

 

Brick 192.168.170.147:/gluster_bricks/engine/engine

Status: Connected

Number of entries: 0

 

[root@ovirt-hyp-02 ~]# cat /var/log/glusterfs/rhev-data-center-mnt-glusterSD-ovirt-hyp-01.example.lan\:engine.log

[2017-06-15 13:37:02.009436] I [glusterfsd-mgmt.c:1600:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing

 

 

Each of the three hosts sends out the following notifications about every 15 minutes; a way to follow these transitions directly on the hosts is sketched below the notifications.

Hosted engine host: ovirt-hyp-01.example.lan changed state: EngineDown-EngineStart.

Hosted engine host: ovirt-hyp-01.example.lan changed state: EngineStart-EngineStarting.

Hosted engine host: ovirt-hyp-01.example.lan changed state: EngineStarting-EngineForceStop.

Hosted engine host: ovirt-hyp-01.example.lan changed state: EngineForceStop-EngineDown.
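
One way to follow these transitions directly on a host (log path per the standard ovirt-hosted-engine-ha layout; the exact log message text may vary between versions) is:

# grep -E 'EngineStart|EngineForceStop' /var/log/ovirt-hosted-engine-ha/agent.log | tail -n 20
# tail -f /var/log/ovirt-hosted-engine-ha/broker.log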

Please let me know if you need any additional information.

Thank you,

Joel




On Jun 16, 2017 2:52 AM, "Sahina Bose" <sabose@redhat.com> wrote:
From the agent.log, 
MainThread::INFO::2017-06-15 11:16:50,583::states::473::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Engine vm is running on host ovirt-hyp-02.example.lan (id 2)

It looks like the HE VM was started successfully. Is it possible that the ovirt-engine service could not be started on the HE VM? Could you try starting the HE VM with the command below and then logging into the VM console?
#hosted-engine --vm-start
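
If the VM comes up, checking the engine service from the console (assuming the serial console is configured for the HE VM) would look something like:

# hosted-engine --console
(then, inside the HE VM)
# systemctl status ovirt-engine -l
# journalctl -u ovirt-engine --since today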

Also, please check
# gluster volume status engine
# gluster volume heal engine info

Please also check if there are errors in the gluster mount logs, at /var/log/glusterfs/rhev-data-center-mnt..<engine>.log


On Thu, Jun 15, 2017 at 8:53 PM, Joel Diaz <mrjoeldiaz@gmail.com> wrote:
Sorry, I forgot to attach the requested logs in the previous email.

Thanks,

On Jun 15, 2017 9:38 AM, "Joel Diaz" <mrjoeldiaz@gmail.com> wrote:
Good morning,

Requested info below, along with some additional info.

You'll notice the data volume is not mounted.

Any help in getting HE back running would be greatly appreciated.

Thank you,

Joel

[root@ovirt-hyp-01 ~]# hosted-engine --vm-status

 

 

--== Host 1 status ==--

 

conf_on_shared_storage             : True

Status up-to-date                  : False

Hostname                           : ovirt-hyp-01.example.lan

Host ID                            : 1

Engine status                      : unknown stale-data

Score                              : 3400

stopped                            : False

Local maintenance                  : False

crc32                              : 5558a7d3

local_conf_timestamp               : 20356

Host timestamp                     : 20341

Extra metadata (valid at timestamp):

        metadata_parse_version=1

        metadata_feature_version=1

        timestamp=20341 (Fri Jun  9 14:38:57 2017)

        host-id=1

        score=3400

        vm_conf_refresh_time=20356 (Fri Jun  9 14:39:11 2017)

        conf_on_shared_storage=True

        maintenance=False

        state=EngineDown

        stopped=False

 

 

--== Host 2 status ==--

 

conf_on_shared_storage             : True

Status up-to-date                  : False

Hostname                           : ovirt-hyp-02.example.lan

Host ID                            : 2

Engine status                      : unknown stale-data

Score                              : 3400

stopped                            : False

Local maintenance                  : False

crc32                              : 936d4cf3

local_conf_timestamp               : 20351

Host timestamp                     : 20337

Extra metadata (valid at timestamp):

        metadata_parse_version=1

        metadata_feature_version=1

        timestamp=20337 (Fri Jun  9 14:39:03 2017)

        host-id=2

        score=3400

        vm_conf_refresh_time=20351 (Fri Jun  9 14:39:17 2017)

        conf_on_shared_storage=True

        maintenance=False

        state=EngineDown

        stopped=False

 

 

--== Host 3 status ==--

 

conf_on_shared_storage             : True

Status up-to-date                  : False

Hostname                           : ovirt-hyp-03.example.lan

Host ID                            : 3

Engine status                      : unknown stale-data

Score                              : 3400

stopped                            : False

Local maintenance                  : False

crc32                              : f646334e

local_conf_timestamp               : 20391

Host timestamp                     : 20377

Extra metadata (valid at timestamp):

        metadata_parse_version=1

        metadata_feature_version=1

        timestamp=20377 (Fri Jun  9 14:39:37 2017)

        host-id=3

        score=3400

        vm_conf_refresh_time=20391 (Fri Jun  9 14:39:51 2017)

        conf_on_shared_storage=True

        maintenance=False

        state=EngineStop

        stopped=False

        timeout=Thu Jan  1 00:43:08 1970

 

 

[root@ovirt-hyp-01 ~]# gluster peer status

Number of Peers: 2

 

Hostname: 192.168.170.143

Uuid: b2b30d05-cf91-4567-92fd-022575e082f5

State: Peer in Cluster (Connected)

Other names:

10.0.0.2

 

Hostname: 192.168.170.147

Uuid: 4e50acc4-f3cb-422d-b499-fb5796a53529

State: Peer in Cluster (Connected)

Other names:

10.0.0.3

 

[root@ovirt-hyp-01 ~]# gluster volume info all

 

Volume Name: data

Type: Replicate

Volume ID: 1d6bb110-9be4-4630-ae91-36ec1cf6cc02

Status: Started

Snapshot Count: 0

Number of Bricks: 1 x (2 + 1) = 3

Transport-type: tcp

Bricks:

Brick1: 192.168.170.141:/gluster_bricks/data/data

Brick2: 192.168.170.143:/gluster_bricks/data/data

Brick3: 192.168.170.147:/gluster_bricks/data/data (arbiter)

Options Reconfigured:

nfs.disable: on

performance.readdir-ahead: on

transport.address-family: inet

performance.quick-read: off

performance.read-ahead: off

performance.io-cache: off

performance.stat-prefetch: off

performance.low-prio-threads: 32

network.remote-dio: off

cluster.eager-lock: enable

cluster.quorum-type: auto

cluster.server-quorum-type: server

cluster.data-self-heal-algorithm: full

cluster.locking-scheme: granular

cluster.shd-max-threads: 8

cluster.shd-wait-qlength: 10000

features.shard: on

user.cifs: off

storage.owner-uid: 36

storage.owner-gid: 36

network.ping-timeout: 30

performance.strict-o-direct: on

cluster.granular-entry-heal: enable

 

Volume Name: engine

Type: Replicate

Volume ID: b160f0b2-8bd3-4ff2-a07c-134cab1519dd

Status: Started

Snapshot Count: 0

Number of Bricks: 1 x (2 + 1) = 3

Transport-type: tcp

Bricks:

Brick1: 192.168.170.141:/gluster_bricks/engine/engine

Brick2: 192.168.170.143:/gluster_bricks/engine/engine

Brick3: 192.168.170.147:/gluster_bricks/engine/engine (arbiter)

Options Reconfigured:

nfs.disable: on

performance.readdir-ahead: on

transport.address-family: inet

performance.quick-read: off

performance.read-ahead: off

performance.io-cache: off

performance.stat-prefetch: off

performance.low-prio-threads: 32

network.remote-dio: off

cluster.eager-lock: enable

cluster.quorum-type: auto

cluster.server-quorum-type: server

cluster.data-self-heal-algorithm: full

cluster.locking-scheme: granular

cluster.shd-max-threads: 8

cluster.shd-wait-qlength: 10000

features.shard: on

user.cifs: off

storage.owner-uid: 36

storage.owner-gid: 36

network.ping-timeout: 30

performance.strict-o-direct: on

cluster.granular-entry-heal: enable

 

 

[root@ovirt-hyp-01 ~]# df -h

Filesystem                                    Size  Used Avail Use% Mounted on

/dev/mapper/centos_ovirt--hyp--01-root         50G  4.1G   46G   9% /

devtmpfs                                      7.7G     0  7.7G   0% /dev

tmpfs                                         7.8G     0  7.8G   0% /dev/shm

tmpfs                                         7.8G  8.7M  7.7G   1% /run

tmpfs                                         7.8G     0  7.8G   0% /sys/fs/cgroup

/dev/mapper/centos_ovirt--hyp--01-home         61G   33M   61G   1% /home

/dev/mapper/gluster_vg_sdb-gluster_lv_engine   50G  7.6G   43G  16% /gluster_bricks/engine

/dev/mapper/gluster_vg_sdb-gluster_lv_data    730G  157G  574G  22% /gluster_bricks/data

/dev/sda1                                     497M  173M  325M  35% /boot

ovirt-hyp-01.example.lan:engine                   50G  7.6G   43G  16% /rhev/data-center/mnt/glusterSD/ovirt-hyp-01.example.lan:engine

tmpfs                                         1.6G     0  1.6G   0% /run/user/0

 

[root@ovirt-hyp-01 ~]# systemctl list-unit-files|grep ovirt

ovirt-ha-agent.service                        enabled

ovirt-ha-broker.service                       enabled

ovirt-imageio-daemon.service                  disabled

ovirt-vmconsole-host-sshd.service             enabled

 

[root@ovirt-hyp-01 ~]# systemctl status ovirt-ha-agent.service

● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent

   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor preset: disabled)

   Active: active (running) since Thu 2017-06-15 08:56:15 EDT; 21min ago

Main PID: 3150 (ovirt-ha-agent)

   CGroup: /system.slice/ovirt-ha-agent.service

           └─3150 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent --no-daemon

 

Jun 15 08:56:15 ovirt-hyp-01.example.lan systemd[1]: Started oVirt Hosted Engine High Availability Monitoring Agent.

Jun 15 08:56:15 ovirt-hyp-01.example.lan systemd[1]: Starting oVirt Hosted Engine High Availability Monitoring Agent...

Jun 15 09:17:18 ovirt-hyp-01.example.lan ovirt-ha-agent[3150]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine VM stopped on localhost

[root@ovirt-hyp-01 ~]# systemctl status ovirt-ha-broker.service

● ovirt-ha-broker.service - oVirt Hosted Engine High Availability Communications Broker

   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service; enabled; vendor preset: disabled)

   Active: active (running) since Thu 2017-06-15 08:54:06 EDT; 24min ago

Main PID: 968 (ovirt-ha-broker)

   CGroup: /system.slice/ovirt-ha-broker.service

           └─968 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker --no-daemon

 

Jun 15 08:54:06 ovirt-hyp-01.example.lan systemd[1]: Started oVirt Hosted Engine High Availability Communications Broker.

Jun 15 08:54:06 ovirt-hyp-01.example.lan systemd[1]: Starting oVirt Hosted Engine High Availability Communications Broker...

Jun 15 08:56:16 ovirt-hyp-01.example.lan ovirt-ha-broker[968]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.listener.ConnectionHandler ERROR Error handling request, data: '...1b55bcf76'

                                                            Traceback (most recent call last):

                                                              File "/usr/lib/python2.7/site-packages/ovirt...

Hint: Some lines were ellipsized, use -l to show in full.
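
To see that broker error and its traceback in full rather than ellipsized, something like the following should work:

# systemctl status ovirt-ha-broker -l
# journalctl -u ovirt-ha-broker --no-pager -n 50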

 

 

 

 

[root@ovirt-hyp-01 ~]# systemctl restart ovirt-ha-agent.service

[root@ovirt-hyp-01 ~]# systemctl status ovirt-ha-agent.service

● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent

   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor preset: disabled)

   Active: active (running) since Thu 2017-06-15 09:19:21 EDT; 26s ago

Main PID: 8563 (ovirt-ha-agent)

   CGroup: /system.slice/ovirt-ha-agent.service

           └─8563 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent --no-daemon

 

Jun 15 09:19:21 ovirt-hyp-01.example.lan systemd[1]: Started oVirt Hosted Engine High Availability Monitoring Agent.

Jun 15 09:19:21 ovirt-hyp-01.example.lan systemd[1]: Starting oVirt Hosted Engine High Availability Monitoring Agent...

[root@ovirt-hyp-01 ~]# systemctl restart ovirt-ha-broker.service

[root@ovirt-hyp-01 ~]# systemctl status ovirt-ha-broker.service

● ovirt-ha-broker.service - oVirt Hosted Engine High Availability Communications Broker

   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service; enabled; vendor preset: disabled)

   Active: active (running) since Thu 2017-06-15 09:20:59 EDT; 28s ago

Main PID: 8844 (ovirt-ha-broker)

   CGroup: /system.slice/ovirt-ha-broker.service

           └─8844 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker --no-daemon

 

Jun 15 09:20:59 ovirt-hyp-01.example.lan systemd[1]: Started oVirt Hosted Engine High Availability Communications Broker.

Jun 15 09:20:59 ovirt-hyp-01.example.lan systemd[1]: Starting oVirt Hosted Engine High Availability Communications Broker...



On Jun 14, 2017 4:45 AM, "Sahina Bose" <sabose@redhat.com> wrote:
What does the output of "hosted-engine --vm-status" and "gluster volume status engine" tell you? Are all the bricks running according to gluster volume status?

Can you try to restart the ovirt-ha-agent and ovirt-ha-broker services?
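
One possible restart sequence (broker first, then the agent), followed by watching the agent log, would be:

# systemctl restart ovirt-ha-broker
# systemctl restart ovirt-ha-agent
# tail -f /var/log/ovirt-hosted-engine-ha/agent.log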

If HE still has issues powering up, please provide agent.log and broker.log from /var/log/ovirt-hosted-engine-ha and gluster mount logs from /var/log/glusterfs/rhev-data-center-mnt <engine>.log

On Thu, Jun 8, 2017 at 6:57 PM, Joel Diaz <mrjoeldiaz@gmail.com> wrote:
Good morning oVirt community,

I'm running a three host gluster environment with hosted engine.

Yesterday the engine went down and has not been able to come up properly since. It tries to start on all three hosts.

I have two gluster volumes, data and engine. The data storage domain volume is no longer mounted, but the engine volume is up. I've restarted the gluster service and made sure both volumes were running. The data volume will not mount.
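
A manual check and test mount of the data volume (using a scratch mount point such as /mnt/datatest, created just for the test) would look something like this:

# gluster volume status data
# mkdir -p /mnt/datatest
# mount -t glusterfs ovirt-hyp-01.example.lan:/data /mnt/datatest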

How can I get the engine running properly again?

Thanks,

Joel
