[ovirt-users] Hosted Engine crash - state = EngineUp-EngineUpBadHealth

Simone Tiraboschi stirabos at redhat.com
Tue Dec 22 15:23:27 UTC 2015


On Tue, Dec 22, 2015 at 4:03 PM, Will Dennis <wdennis at nec-labs.com> wrote:

> I believe IPtables may be the culprit...
>
>
>
> Host 1:
>
> -------
>
> [root at ovirt-node-01 ~]# iptables -L
>
> Chain INPUT (policy ACCEPT)
> target     prot opt source               destination
> ACCEPT     all  --  anywhere             anywhere             state RELATED,ESTABLISHED
> ACCEPT     icmp --  anywhere             anywhere
> ACCEPT     all  --  anywhere             anywhere
> ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:54321
> ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:sunrpc
> ACCEPT     udp  --  anywhere             anywhere             udp dpt:sunrpc
> ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:ssh
> ACCEPT     udp  --  anywhere             anywhere             udp dpt:snmp
> ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:16514
> ACCEPT     tcp  --  anywhere             anywhere             multiport dports rockwell-csp2
> ACCEPT     tcp  --  anywhere             anywhere             multiport dports rfb:6923
> ACCEPT     tcp  --  anywhere             anywhere             multiport dports 49152:49216
> REJECT     all  --  anywhere             anywhere             reject-with icmp-host-prohibited
>
> Chain FORWARD (policy ACCEPT)
> target     prot opt source               destination
> REJECT     all  --  anywhere             anywhere             PHYSDEV match ! --physdev-is-bridged reject-with icmp-host-prohibited
>
> Chain OUTPUT (policy ACCEPT)
> target     prot opt source               destination
>
> Host 2:
>
> -------
>
> [root at ovirt-node-02 ~]# iptables -L
>
> Chain INPUT (policy ACCEPT)
> target     prot opt source               destination
> ACCEPT     all  --  anywhere             anywhere             state RELATED,ESTABLISHED
> ACCEPT     icmp --  anywhere             anywhere
> ACCEPT     all  --  anywhere             anywhere
> ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:54321
> ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:sunrpc
> ACCEPT     udp  --  anywhere             anywhere             udp dpt:sunrpc
> ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:ssh
> ACCEPT     udp  --  anywhere             anywhere             udp dpt:snmp
> ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:16514
> ACCEPT     tcp  --  anywhere             anywhere             multiport dports rockwell-csp2
> ACCEPT     tcp  --  anywhere             anywhere             multiport dports rfb:6923
> ACCEPT     tcp  --  anywhere             anywhere             multiport dports 49152:49216
> REJECT     all  --  anywhere             anywhere             reject-with icmp-host-prohibited
>
> Chain FORWARD (policy ACCEPT)
> target     prot opt source               destination
> REJECT     all  --  anywhere             anywhere             PHYSDEV match ! --physdev-is-bridged reject-with icmp-host-prohibited
>
> Chain OUTPUT (policy ACCEPT)
> target     prot opt source               destination
>
> Host 3:
>
> -------
>
> [root at ovirt-node-03 ~]# iptables -L
>
> Chain INPUT (policy ACCEPT)
> target     prot opt source               destination
>
> Chain FORWARD (policy ACCEPT)
> target     prot opt source               destination
>
> Chain OUTPUT (policy ACCEPT)
> target     prot opt source               destination
>
>
> An example of my Gluster engine volume status (off host #2):
>
>
>
> [root at ovirt-node-02 ~]# gluster volume status
>
> Status of volume: engine
> Gluster process                                     TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick ovirt-node-02:/gluster_brick2/engine_brick    49217     0          Y       2973
> Brick ovirt-node-03:/gluster_brick3/engine_brick    N/A       N/A        N       N/A
> Brick ovirt-node-02:/gluster_brick4/engine_brick    49218     0          Y       2988
> Brick ovirt-node-03:/gluster_brick5/engine_brick    N/A       N/A        N       N/A
> NFS Server on localhost                             2049      0          Y       3007
> Self-heal Daemon on localhost                       N/A       N/A        Y       3012
> NFS Server on ovirt-node-03                         2049      0          Y       1671
> Self-heal Daemon on ovirt-node-03                   N/A       N/A        Y       1707
>
>
> I had changed the base port # per instructions found at
> http://www.ovirt.org/Features/Self_Hosted_Engine_Hyper_Converged_Gluster_Support
> :
>
> “By default gluster uses a port that vdsm also wants, so we need to change
> base-port setting avoiding the clash between the two daemons. We need to add
>
>
>
> option base-port 49217
>
> to /etc/glusterfs/glusterd.vol
>
>
>
> and ensure glusterd service is enabled and started before proceeding.”
>
>
>
> So I did that on all the hosts:
>
>
>
> [root at ovirt-node-02 ~]# cat /etc/glusterfs/glusterd.vol
>
> volume management
>     type mgmt/glusterd
>     option working-directory /var/lib/glusterd
>     option transport-type socket,rdma
>     option transport.socket.keepalive-time 10
>     option transport.socket.keepalive-interval 2
>     option transport.socket.read-fail-log off
>     option ping-timeout 30
> #   option base-port 49152
>     option base-port 49217
>     option rpc-auth-allow-insecure on
> end-volume
>
>
> Question: does oVirt really need IPtables to be enforcing rules, or can I
> just set everything wide open? If I can, how do I specify that in setup?
>

hosted-engine-setup asks:
          iptables was detected on your computer, do you wish setup to configure it? (Yes, No)[Yes]:

You just have to say no here.

If you say no, it's completely up to you to configure it yourself: either
open the required ports, or disable it entirely if you don't care.
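
If you simply don't care about the firewall on these hosts, a quick sketch
(assuming CentOS/RHEL 7 with the iptables service from iptables-services,
not firewalld) would be something like:

   # stop the firewall now and keep it off across reboots
   systemctl stop iptables
   systemctl disable iptables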

The issue with the gluster ports is that hosted-engine-setup simply
configures iptables for what it knows you'll need, and on 3.6 it always
assumes that the gluster volume is served by external hosts.
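
If you keep iptables enabled on a hyperconverged setup, you'll have to open
the gluster ports yourself on every host. Just a rough sketch, assuming the
default glusterd management ports and the base-port 49217 you configured
(the width of the brick range below is only an example, adjust it to the
number of bricks you actually have):

   # glusterd management ports
   iptables -I INPUT -p tcp --dport 24007:24008 -j ACCEPT
   # brick ports (base-port was raised to 49217 in glusterd.vol)
   iptables -I INPUT -p tcp --dport 49217:49316 -j ACCEPT
   # persist the rules across reboots (iptables-services)
   service iptables save

Please double check the ports against your 'gluster volume status' output
before relying on this.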

>
>
>
> W.
>
>
>
>
>
> *From:* Sahina Bose [mailto:sabose at redhat.com]
> *Sent:* Tuesday, December 22, 2015 9:19 AM
> *To:* Will Dennis; Simone Tiraboschi; Dan Kenigsberg
>
> *Subject:* Re: [ovirt-users] Hosted Engine crash - state =
> EngineUp-EngineUpBadHealth
>
>
>
>
>
> On 12/22/2015 07:47 PM, Sahina Bose wrote:
>
>
>
> On 12/22/2015 07:28 PM, Will Dennis wrote:
>
> See attached for requested log files
>
>
> From gluster logs
>
> [2015-12-22 00:40:53.501341] W [MSGID: 108001] [afr-common.c:3924:afr_notify] 0-engine-replicate-1: Client-quorum is not met
> [2015-12-22 00:40:53.502288] W [socket.c:588:__socket_rwv] 0-engine-client-2: readv on 138.15.200.93:49217 failed (No data available)
>
> [2015-12-22 00:41:17.667302] W [fuse-bridge.c:2292:fuse_writev_cbk] 0-glusterfs-fuse: 3875597: WRITE => -1 (Read-only file system)
>
> Could you check if the gluster ports are open on all nodes?
>
>
> It's possible you ran into this ? -
> https://bugzilla.redhat.com/show_bug.cgi?id=1288979
>
>
>
>
>
>
>
> *From:* Sahina Bose [mailto:sabose at redhat.com <sabose at redhat.com>]
> *Sent:* Tuesday, December 22, 2015 4:59 AM
> *To:* Simone Tiraboschi; Will Dennis; Dan Kenigsberg
> *Cc:* users
> *Subject:* Re: [ovirt-users] Hosted Engine crash - state =
> EngineUp-EngineUpBadHealth
>
>
>
>
>
> On 12/22/2015 02:38 PM, Simone Tiraboschi wrote:
>
>
>
>
>
> On Tue, Dec 22, 2015 at 2:31 AM, Will Dennis <wdennis at nec-labs.com> wrote:
>
> OK, another problem :(
>
> I was having the same problem with my second oVirt host that I had with my
> first one, where after I ran “hosted-engine --deploy” on it and it
> completed successfully, I was experiencing a ~50sec lag when SSH’ing
> into the node…
>
> vpnp71:~ will$ time ssh root at ovirt-node-02 uptime
>  19:36:06 up 4 days,  8:31,  0 users,  load average: 0.68, 0.70, 0.67
>
> real  0m50.540s
> user  0m0.025s
> sys 0m0.008s
>
>
> So, in the oVirt web admin console, I put the “ovirt-node-02” node into
> Maintenance mode, then SSH’d to the server and rebooted it. Sure enough,
> after the server came back up, SSH was fine (no delay), which again was the
> same experience I had had with the first oVirt host. So, I went back to the
> web console, and chose the “Confirm host has been rebooted” option, which
> I thought would be the right action to take after a reboot. The system
> opened a dialog box with a spinner, which never stopped spinning… So
> finally, I closed the dialog box with the upper right (X) symbol, and then
> for this same host chose “Activate” from the menu. It was then I noticed I
> had received a state transition email notifying me that
> “EngineUp-EngineUpBadHealth” and, sure enough, the web UI was then
> unresponsive. I checked on the first oVirt host, and the VM with the name
> “HostedEngine” is still running, but obviously isn’t working…
>
> So, looks like I need to restart the HostedEngine VM or take whatever
> action is needed to return oVirt to operation… Hate to keep asking this
> question, but what’s the correct action at this point?
>
>
>
> ovirt-ha-agent should always restart it for you after a few minutes, but
> the point is that the network configuration seems not to be that stable.
>
>
>
> I know from another thread that you are trying to deploy hosted-engine
> over GlusterFS in a hyperconverged way, and this, as I said, is currently
> not supported.
>
> I think that it may also require some specific configuration on the
> network side.
>
>
> For hyperconverged gluster+engine, it should work without any specific
> configuration on the network side. However, if the network is flaky, it is
> possible that there are errors with gluster volume access. Could you
> provide the ovirt-ha-agent logs as well as the gluster mount logs?
>
>
>
>
> Adding Sahina and Dan here.
>
>
>
> Thanks, again,
> Will
>
> _______________________________________________
> Users mailing list
> Users at ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>
>
>
>
>
>
>
>
>