On Tue, Dec 22, 2015 at 4:03 PM, Will Dennis <wdennis(a)nec-labs.com> wrote:
I believe IPtables may be the culprit...
Host 1:
-------
[root@ovirt-node-01 ~]# iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination
ACCEPT     all  --  anywhere             anywhere             state RELATED,ESTABLISHED
ACCEPT     icmp --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere
ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:54321
ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:sunrpc
ACCEPT     udp  --  anywhere             anywhere             udp dpt:sunrpc
ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:ssh
ACCEPT     udp  --  anywhere             anywhere             udp dpt:snmp
ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:16514
ACCEPT     tcp  --  anywhere             anywhere             multiport dports rockwell-csp2
ACCEPT     tcp  --  anywhere             anywhere             multiport dports rfb:6923
ACCEPT     tcp  --  anywhere             anywhere             multiport dports 49152:49216
REJECT     all  --  anywhere             anywhere             reject-with icmp-host-prohibited

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination
REJECT     all  --  anywhere             anywhere             PHYSDEV match ! --physdev-is-bridged reject-with icmp-host-prohibited

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
Host 2:
-------
[root@ovirt-node-02 ~]# iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination
ACCEPT     all  --  anywhere             anywhere             state RELATED,ESTABLISHED
ACCEPT     icmp --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere
ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:54321
ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:sunrpc
ACCEPT     udp  --  anywhere             anywhere             udp dpt:sunrpc
ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:ssh
ACCEPT     udp  --  anywhere             anywhere             udp dpt:snmp
ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:16514
ACCEPT     tcp  --  anywhere             anywhere             multiport dports rockwell-csp2
ACCEPT     tcp  --  anywhere             anywhere             multiport dports rfb:6923
ACCEPT     tcp  --  anywhere             anywhere             multiport dports 49152:49216
REJECT     all  --  anywhere             anywhere             reject-with icmp-host-prohibited

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination
REJECT     all  --  anywhere             anywhere             PHYSDEV match ! --physdev-is-bridged reject-with icmp-host-prohibited

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
Host 3:
-------
[root@ovirt-node-03 ~]# iptables -L
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain FORWARD (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
An example of my Gluster engine volume status (from host #2):
[root@ovirt-node-02 ~]# gluster volume status
Status of volume: engine
Gluster process                                    TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick ovirt-node-02:/gluster_brick2/engine_brick   49217     0          Y       2973
Brick ovirt-node-03:/gluster_brick3/engine_brick   N/A       N/A        N       N/A
Brick ovirt-node-02:/gluster_brick4/engine_brick   49218     0          Y       2988
Brick ovirt-node-03:/gluster_brick5/engine_brick   N/A       N/A        N       N/A
NFS Server on localhost                            2049      0          Y       3007
Self-heal Daemon on localhost                      N/A       N/A        Y       3012
NFS Server on ovirt-node-03                        2049      0          Y       1671
Self-heal Daemon on ovirt-node-03                  N/A       N/A        Y       1707
I had changed the base port number per the instructions found at
http://www.ovirt.org/Features/Self_Hosted_Engine_Hyper_Converged_Gluster_... :
“By default gluster uses a port that vdsm also wants, so we need to change
base-port setting avoiding the clash between the two daemons. We need to add
option base-port 49217
to /etc/glusterfs/glusterd.vol
and ensure glusterd service is enabled and started before proceeding.”
So I did that on all the hosts:
[root@ovirt-node-02 ~]# cat /etc/glusterfs/glusterd.vol
volume management
type mgmt/glusterd
option working-directory /var/lib/glusterd
option transport-type socket,rdma
option transport.socket.keepalive-time 10
option transport.socket.keepalive-interval 2
option transport.socket.read-fail-log off
option ping-timeout 30
# option base-port 49152
option base-port 49217
option rpc-auth-allow-insecure on
end-volume
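As a side note, a minimal sketch of applying that change on each host (assuming a systemd-based node such as CentOS 7, and that restarting glusterd one node at a time is acceptable):

systemctl enable glusterd
systemctl restart glusterd
gluster volume status engine   # bricks should now register on ports starting at 49217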
Question: does oVirt really need iptables to be enforcing rules, or can I
just set everything wide open? If I can, how do I specify that in setup?
hosted-engine-setup asks:
iptables was detected on your computer, do you wish setup to
configure it? (Yes, No)[Yes]:
You just have to say no here.
If you say no, it's completely up to you to configure it, either opening the
required ports or, if you don't care, disabling it entirely.
The issue with the gluster ports is that hosted-engine-setup simply configures
iptables for the ports it knows you'll need, and on 3.6 it always assumes that
the gluster volume is served by external hosts.
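If you keep iptables but answer no, a minimal sketch of opening the relocated brick ports by hand could look like the lines below. The upper bound of the range is an assumption (gluster allocates one TCP port per brick starting at base-port, so pick a range wide enough for the bricks each host will ever serve), and persisting the rule assumes the iptables-services package on EL7:

iptables -I INPUT -p tcp -m multiport --dports 49217:49281 -j ACCEPT
service iptables save   # persist the rule across reboots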
W.
*From:* Sahina Bose [mailto:sabose@redhat.com]
*Sent:* Tuesday, December 22, 2015 9:19 AM
*To:* Will Dennis; Simone Tiraboschi; Dan Kenigsberg
*Subject:* Re: [ovirt-users] Hosted Engine crash - state =
EngineUp-EngineUpBadHealth
On 12/22/2015 07:47 PM, Sahina Bose wrote:
On 12/22/2015 07:28 PM, Will Dennis wrote:
See attached for requested log files
From gluster logs
[2015-12-22 00:40:53.501341] W [MSGID: 108001] [afr-common.c:3924:afr_notify] 0-engine-replicate-1: Client-quorum is not met
[2015-12-22 00:40:53.502288] W [socket.c:588:__socket_rwv] 0-engine-client-2: readv on 138.15.200.93:49217 failed (No data available)
[2015-12-22 00:41:17.667302] W [fuse-bridge.c:2292:fuse_writev_cbk] 0-glusterfs-fuse: 3875597: WRITE => -1 (Read-only file system)
Could you check if the gluster ports are open on all nodes?
It's possible you ran into this:
https://bugzilla.redhat.com/show_bug.cgi?id=1288979
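A quick sketch of such a check, using only the brick ports reported by "gluster volume status" above and plain bash (so it doesn't assume nc or telnet is installed):

iptables -L INPUT -n | grep 492                                        # do the ACCEPT rules cover 49217/49218?
timeout 2 bash -c '</dev/tcp/ovirt-node-02/49217' && echo open || echo blocked   # repeat from each node, for each brick port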
*From:* Sahina Bose [mailto:sabose@redhat.com]
*Sent:* Tuesday, December 22, 2015 4:59 AM
*To:* Simone Tiraboschi; Will Dennis; Dan Kenigsberg
*Cc:* users
*Subject:* Re: [ovirt-users] Hosted Engine crash - state =
EngineUp-EngineUpBadHealth
On 12/22/2015 02:38 PM, Simone Tiraboschi wrote:
On Tue, Dec 22, 2015 at 2:31 AM, Will Dennis <wdennis(a)nec-labs.com> wrote:
OK, another problem :(
I was having the same problem with my second oVirt host that I had with my
first one: after "hosted-engine --deploy" completed successfully on it, I was
experiencing a ~50sec lag when SSH'ing into the node…
vpnp71:~ will$ time ssh root@ovirt-node-02 uptime
19:36:06 up 4 days, 8:31, 0 users, load average: 0.68, 0.70, 0.67
real 0m50.540s
user 0m0.025s
sys 0m0.008s
So, in the oVirt web admin console, I put the "ovirt-node-02" node into
Maintenance mode, then SSH'd to the server and rebooted it. Sure enough,
after the server came back up, SSH was fine (no delay), which again was the
same experience I had had with the first oVirt host. So, I went back to the
web console and chose the "Confirm host has been rebooted" option, which
I thought would be the right action to take after a reboot. The system
opened a dialog box with a spinner, which never stopped spinning… So
finally, I closed the dialog box with the upper-right (X) symbol, and then
for this same host chose "Activate" from the menu. It was then that I noticed
I had received a state transition email notifying me of
"EngineUp-EngineUpBadHealth", and sure enough, the web UI was then
unresponsive. I checked on the first oVirt host; the VM with the name
"HostedEngine" is still running, but obviously isn't working…
So, looks like I need to restart the HostedEngine VM or take whatever
action is needed to return oVirt to operation… Hate to keep asking this
question, but what’s the correct action at this point?
ovirt-ha-agent should always restart it for you after a few minutes, but
the point is that the network configuration doesn't seem to be that stable.
I know from another thread that you are trying to deploy hosted-engine
over GlusterFS in a hyperconverged way and this, as I said, is currently
not supported.
I think it may also require some specific configuration on the network
side.
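If the agent does not bring it back on its own, a small sketch of checking and starting it manually (standard hosted-engine CLI options, though the exact flow on your setup may differ):

hosted-engine --vm-status   # shows how each HA host currently sees the engine VM and its score
hosted-engine --vm-start    # only if the agent has not restarted it by itself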
For hyperconverged gluster+engine, it should work without any specific
configuration on the network side. However, if the network is flaky, it is
possible that there are errors with gluster volume access. Could you
provide the ovirt-ha-agent logs as well as the gluster mount logs?
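In case it helps locate them, the usual default locations (treat the exact paths as assumptions for your install) are:

ls /var/log/ovirt-hosted-engine-ha/   # agent.log and broker.log
ls /var/log/glusterfs/                # the client mount log is named after the engine volume's mount point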
Adding Sahina and Dan here.
Thanks, again,
Will
_______________________________________________
Users mailing list
Users(a)ovirt.org
http://lists.ovirt.org/mailman/listinfo/users