Ok. Simone, please let me know if I can provide any additional log files. Thanks for taking the time to look into this.

Joel

On Jun 16, 2017 8:59 AM, "Sahina Bose" <sabose@redhat.com> wrote:

I don't notice anything wrong on the gluster end. Maybe Simone can help take a look at HE behaviour?

On Fri, Jun 16, 2017 at 6:14 PM, Joel Diaz <mrjoeldiaz@gmail.com> wrote:

Good morning,

Info requested below.

[root@ovirt-hyp-02 ~]# hosted-engine --vm-start
Exception in thread Client localhost:54321 (most likely raised during interpreter shutdown):
VM exists and its status is Up
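Since --vm-start reports the VM as already up while the engine FQDN does not respond (see the ping output below), the next thing worth checking is the VM itself through its console. A minimal sketch, assuming the standard hosted-engine console options of oVirt 4.x and that the service inside the HE VM is named ovirt-engine; adjust for your version:

# hosted-engine --add-console-password
# hosted-engine --console
(then, once logged into the HE VM)
# systemctl status ovirt-engine -l
# journalctl -u ovirt-engine --since today --no-pager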
[root@ovirt-hyp-02 ~]# ping engine
PING engine.example.lan (192.168.170.149) 56(84) bytes of data.
From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=1 Destination Host Unreachable
From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=2 Destination Host Unreachable
From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=3 Destination Host Unreachable
From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=4 Destination Host Unreachable
From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=5 Destination Host Unreachable
From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=6 Destination Host Unreachable
From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=7 Destination Host Unreachable
From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=8 Destination Host Unreachable
[root@ovirt-hyp-02 ~]# gluster volume status engine
Status of volume: engine
Gluster process                                        TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 192.168.170.141:/gluster_bricks/engine/engine    49159     0          Y       1799
Brick 192.168.170.143:/gluster_bricks/engine/engine    49159     0          Y       2900
Self-heal Daemon on localhost                          N/A       N/A        Y       2914
Self-heal Daemon on ovirt-hyp-01.example.lan           N/A       N/A        Y       1854

Task Status of Volume engine
------------------------------------------------------------------------------
There are no active volume tasks
[root@ovirt-hyp-02 ~]# gluster volume heal engine info
Brick 192.168.170.141:/gluster_bricks/engine/engine
Status: Connected
Number of entries: 0

Brick 192.168.170.143:/gluster_bricks/engine/engine
Status: Connected
Number of entries: 0

Brick 192.168.170.147:/gluster_bricks/engine/engine
Status: Connected
Number of entries: 0
[root@ovirt-hyp-02 ~]# cat /var/log/glusterfs/rhev-data-center-mnt-glusterSD-ovirt-hyp-01.example.lan\:engine.log
[2017-06-15 13:37:02.009436] I [glusterfsd-mgmt.c:1600:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
Each of the three hosts sends out the following notifications about every 15 minutes:

Hosted engine host: ovirt-hyp-01.example.lan changed state: EngineDown-EngineStart.
Hosted engine host: ovirt-hyp-01.example.lan changed state: EngineStart-EngineStarting.
Hosted engine host: ovirt-hyp-01.example.lan changed state: EngineStarting-EngineForceStop.
Hosted engine host: ovirt-hyp-01.example.lan changed state: EngineForceStop-EngineDown.
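The same EngineDown/EngineStart/EngineStarting/EngineForceStop loop can also be followed live on each host; a small sketch, assuming the agent log location referenced later in this thread (/var/log/ovirt-hosted-engine-ha):

# tail -f /var/log/ovirt-hosted-engine-ha/agent.log
# hosted-engine --vm-status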
Please let me know if you need any additional information.
Thank you,
Joel
On Jun 16, 2017 2:52 AM, "Sahina Bose" <sabose@redhat.com> wrote:

From the agent.log,

MainThread::INFO::2017-06-15 11:16:50,583::states::473::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Engine vm is running on host ovirt-hyp-02.reis.com (id 2)

it looks like the HE VM was started successfully. Is it possible that the ovirt-engine service could not be started on the HE VM? Could you try to start the HE VM using the command below and then log into the VM console?

# hosted-engine --vm-start

Also, please check:

# gluster volume status engine
# gluster volume heal engine info

Please also check if there are errors in the gluster mount logs at /var/log/glusterfs/rhev-data-center-mnt..<engine>.log

On Thu, Jun 15, 2017 at 8:53 PM, Joel Diaz <mrjoeldiaz@gmail.com> wrote:

Sorry, I forgot to attach the requested logs in the previous email.

Thanks,

On Jun 15, 2017 9:38 AM, "Joel Diaz" <mrjoeldiaz@gmail.com> wrote:

Good morning,

Requested info below, along with some additional info. You'll notice the data volume is not mounted. Any help in getting HE back running would be greatly appreciated.

Thank you,
Joel

[root@ovirt-hyp-01 ~]# hosted-engine --vm-status
--== Host 1 status ==--
conf_on_shared_storage             : True
Status up-to-date                  : False
Hostname                           : ovirt-hyp-01.example.lan
Host ID                            : 1
Engine status                      : unknown stale-data
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 5558a7d3
local_conf_timestamp               : 20356
Host timestamp                     : 20341
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=20341 (Fri Jun 9 14:38:57 2017)
host-id=1
score=3400
vm_conf_refresh_time=20356 (Fri Jun 9 14:39:11 2017)
conf_on_shared_storage=True
maintenance=False
state=EngineDown
stopped=False
--== Host 2 status ==--
conf_on_shared_storage             : True
Status up-to-date                  : False
Hostname                           : ovirt-hyp-02.example.lan
Host ID                            : 2
Engine status                      : unknown stale-data
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 936d4cf3
local_conf_timestamp               : 20351
Host timestamp                     : 20337
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=20337 (Fri Jun 9 14:39:03 2017)
host-id=2
score=3400
vm_conf_refresh_time=20351 (Fri Jun 9 14:39:17 2017)
conf_on_shared_storage=True
maintenance=False
state=EngineDown
stopped=False
--== Host 3 status ==--
conf_on_shared_storage             : True
Status up-to-date                  : False
Hostname                           : ovirt-hyp-03.example.lan
Host ID                            : 3
Engine status                      : unknown stale-data
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : f646334e
local_conf_timestamp               : 20391
Host timestamp                     : 20377
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=20377 (Fri Jun 9 14:39:37 2017)
host-id=3
score=3400
vm_conf_refresh_time=20391 (Fri Jun 9 14:39:51 2017)
conf_on_shared_storage=True
maintenance=False
state=EngineStop
stopped=False
timeout=Thu Jan 1 00:43:08 1970
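All three hosts report "unknown stale-data" and their timestamps date from Jun 9, which suggests the HA agents have not refreshed the shared metadata since then. A quick way to confirm whether they are writing at all (a sketch using standard tooling, nothing oVirt-specific assumed beyond the service name) is to compare Host timestamp across repeated runs and check the agent journal:

# hosted-engine --vm-status | grep -i timestamp
# journalctl -u ovirt-ha-agent --since "30 min ago" --no-pager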
[root@ovirt-hyp-01 ~]# gluster peer status
Number of Peers: 2
Hostname: 192.168.170.143
Uuid: b2b30d05-cf91-4567-92fd-022575e082f5
State: Peer in Cluster (Connected)
Other names:
10.0.0.2
Hostname: 192.168.170.147
Uuid: 4e50acc4-f3cb-422d-b499-fb5796a53529
State: Peer in Cluster (Connected)
Other names:
10.0.0.3
[root@ovirt-hyp-01 ~]# gluster volume info all
Volume Name: data
Type: Replicate
Volume ID: 1d6bb110-9be4-4630-ae91-36ec1cf6cc02
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: 192.168.170.141:/gluster_bricks/data/data
Brick2: 192.168.170.143:/gluster_bricks/data/data
Brick3: 192.168.170.147:/gluster_bricks/data/data (arbiter)
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
performance.low-prio-threads: 32
network.remote-dio: off
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
features.shard: on
user.cifs: off
storage.owner-uid: 36
storage.owner-gid: 36
network.ping-timeout: 30
performance.strict-o-direct: on
cluster.granular-entry-heal: enable
Volume Name: engine
Type: Replicate
Volume ID: b160f0b2-8bd3-4ff2-a07c-134cab1519dd
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: 192.168.170.141:/gluster_bricks/engine/engine
Brick2: 192.168.170.143:/gluster_bricks/engine/engine
Brick3: 192.168.170.147:/gluster_bricks/engine/engine (arbiter)
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
performance.low-prio-threads: 32
network.remote-dio: off
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
features.shard: on
user.cifs: off
storage.owner-uid: 36
storage.owner-gid: 36
network.ping-timeout: 30
performance.strict-o-direct: on
cluster.granular-entry-heal: enable
[root@ovirt-hyp-01 ~]# df -h
Filesystem                                     Size  Used  Avail  Use%  Mounted on
/dev/mapper/centos_ovirt--hyp--01-root          50G  4.1G    46G    9%  /
devtmpfs                                       7.7G     0   7.7G    0%  /dev
tmpfs                                          7.8G     0   7.8G    0%  /dev/shm
tmpfs                                          7.8G  8.7M   7.7G    1%  /run
tmpfs                                          7.8G     0   7.8G    0%  /sys/fs/cgroup
/dev/mapper/centos_ovirt--hyp--01-home          61G   33M    61G    1%  /home
/dev/mapper/gluster_vg_sdb-gluster_lv_engine    50G  7.6G    43G   16%  /gluster_bricks/engine
/dev/mapper/gluster_vg_sdb-gluster_lv_data     730G  157G   574G   22%  /gluster_bricks/data
/dev/sda1                                      497M  173M   325M   35%  /boot
ovirt-hyp-01.example.lan:engine                 50G  7.6G    43G   16%  /rhev/data-center/mnt/glusterSD/ovirt-hyp-01.example.lan:engine
tmpfs                                          1.6G     0   1.6G    0%  /run/user/0
[root@ovirt-hyp-01 ~]# systemctl list-unit-files|grep ovirt
ovirt-ha-agent.service                  enabled
ovirt-ha-broker.service                 enabled
ovirt-imageio-daemon.service            disabled
ovirt-vmconsole-host-sshd.service       enabled
[root@ovirt-hyp-01 ~]# systemctl status ovirt-ha-agent.service
● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2017-06-15 08:56:15 EDT; 21min ago
 Main PID: 3150 (ovirt-ha-agent)
   CGroup: /system.slice/ovirt-ha-agent.service
           └─3150 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent --no-daemon

Jun 15 08:56:15 ovirt-hyp-01.example.lan systemd[1]: Started oVirt Hosted Engine High Availability Monitoring Agent.
Jun 15 08:56:15 ovirt-hyp-01.example.lan systemd[1]: Starting oVirt Hosted Engine High Availability Monitoring Agent...
Jun 15 09:17:18 ovirt-hyp-01.example.lan ovirt-ha-agent[3150]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine VM stopped on localhost

[root@ovirt-hyp-01 ~]# systemctl status ovirt-ha-broker.service
● ovirt-ha-broker.service - oVirt Hosted Engine High Availability Communications Broker
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2017-06-15 08:54:06 EDT; 24min ago
 Main PID: 968 (ovirt-ha-broker)
   CGroup: /system.slice/ovirt-ha-broker.service
           └─968 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker --no-daemon

Jun 15 08:54:06 ovirt-hyp-01.example.lan systemd[1]: Started oVirt Hosted Engine High Availability Communications Broker.
Jun 15 08:54:06 ovirt-hyp-01.example.lan systemd[1]: Starting oVirt Hosted Engine High Availability Communications Broker...
Jun 15 08:56:16 ovirt-hyp-01.example.lan ovirt-ha-broker[968]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.listener.ConnectionHandler ERROR Error handling request, data: '...1b55bcf76'
                                                               Traceback (most recent call last):
                                                                 File "/usr/lib/python2.7/site-packages/ovirt...
Hint: Some lines were ellipsized, use -l to show in full.
[root@ovirt-hyp-01 ~]# systemctl restart ovirt-ha-agent.service
[root@ovirt-hyp-01 ~]# systemctl status ovirt-ha-agent.service
● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2017-06-15 09:19:21 EDT; 26s ago
 Main PID: 8563 (ovirt-ha-agent)
   CGroup: /system.slice/ovirt-ha-agent.service
           └─8563 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent --no-daemon
Jun 15 09:19:21 ovirt-hyp-01.example.lan systemd[1]: Started oVirt Hosted Engine High Availability Monitoring Agent.
Jun 15 09:19:21 ovirt-hyp-01.example.lan systemd[1]: Starting oVirt Hosted Engine High Availability Monitoring Agent...
[root@ovirt-hyp-01 ~]# systemctl restart ovirt-ha-broker.service
[root@ovirt-hyp-01 ~]# systemctl status ovirt-ha-broker.service
● ovirt-ha-broker.service - oVirt Hosted Engine High Availability Communications Broker
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2017-06-15 09:20:59 EDT; 28s ago
 Main PID: 8844 (ovirt-ha-broker)
   CGroup: /system.slice/ovirt-ha-broker.service
           └─8844 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker --no-daemon
Jun 15 09:20:59 ovirt-hyp-01.example.lan systemd[1]: Started oVirt Hosted Engine High Availability Communications Broker.
Jun 15 09:20:59 ovirt-hyp-01.example.lan systemd[1]: Starting oVirt Hosted Engine High Availability Communications Broker...
On Jun 14, 2017 4:45 AM, "Sahina Bose" <sabose@redhat.com> wrote:

What does the output of "hosted-engine --vm-status" and "gluster volume status engine" tell you? Are all the bricks running as per gluster vol status?

Can you try to restart the ovirt-ha-agent and ovirt-ha-broker services?

If HE still has issues powering up, please provide agent.log and broker.log from /var/log/ovirt-hosted-engine-ha and gluster mount logs from /var/log/glusterfs/rhev-data-center-mnt <engine>.log

On Thu, Jun 8, 2017 at 6:57 PM, Joel Diaz <mrjoeldiaz@gmail.com> wrote:

Good morning oVirt community,

I'm running a three-host gluster environment with hosted engine. Yesterday the engine went down and has not been able to come up properly. It tries to start on all three hosts.

I have two gluster volumes, data and engine. The data storage domain volume is no longer mounted but the engine volume is up. I've restarted the gluster service and made sure both volumes were running. The data volume will not mount.

How can I get the engine running properly again?

Thanks,
Joel
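For the data volume that refuses to mount, a manual test mount outside of vdsm usually shows the underlying error; a sketch, where /mnt/datatest is an arbitrary scratch directory and the gluster client log file is named after the mount path:

# mkdir -p /mnt/datatest
# mount -t glusterfs ovirt-hyp-01.example.lan:/data /mnt/datatest
# tail -n 50 /var/log/glusterfs/mnt-datatest.log
# umount /mnt/datatest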
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users