I think I found the answer to glusterd not starting.
https://bugzilla.redhat.com/show_bug.cgi?id=1472267
Apparently the version of gluster (3.12.15) that comes packaged with
ovirt-node 4.2.8 has a known issue where gluster tries to come up before
networking, fails, and crashes. This was apparently fixed in gluster
3.13.0. Do devs peruse this list? Any chance someone who can update
the gluster package might read this?
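
Until a fixed package is available, one possible workaround is a systemd
drop-in that orders glusterd after the network is online. The following is an
untested sketch, not something taken from the bug report. The unit name
glusterd.service, the drop-in file name wait-for-network.conf, and both helper
function names are assumptions made for illustration.

```python
from pathlib import Path

# Assumed location: a drop-in directory for a unit named glusterd.service.
# The file name "wait-for-network.conf" is hypothetical.
DROPIN_PATH = Path("/etc/systemd/system/glusterd.service.d/wait-for-network.conf")


def render_dropin() -> str:
    """Return drop-in text that orders the unit after network-online.target."""
    return "\n".join([
        "[Unit]",
        # Pull network-online.target into the boot transaction...
        "Wants=network-online.target",
        # ...and do not start this unit until that target has been reached.
        "After=network-online.target",
        "",
    ])


def write_dropin(path: Path = DROPIN_PATH) -> Path:
    """Write the drop-in to `path`, creating the parent directory if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(render_dropin())
    return path


if __name__ == "__main__":
    # Print the drop-in instead of writing it, so it can be reviewed first.
    print(render_dropin(), end="")
```

If the drop-in is written to disk (as root), systemd needs a
"systemctl daemon-reload" before it takes effect. network-online.target only
delays startup when a wait-online service such as
NetworkManager-wait-online.service is enabled, so this may not help on every
setup.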
On Mon, Feb 4, 2019 at 2:38 AM Simone Tiraboschi <stirabos(a)redhat.com>
wrote:
On Sat, Feb 2, 2019 at 7:32 PM feral <blistovmhz(a)gmail.com> wrote:
> How is an oVirt hyperconverged cluster supposed to come back to life
> after a power outage to all 3 nodes?
>
> Running ovirt-node (ovirt-node-ng-installer-4.2.0-2019013006.el7.iso) to
> get things going, but I've run into multiple issues.
>
> 1. During the gluster setup, the volume sizes I specify are not
> reflected in the deployment configuration. The auto-populated values are
> used every time. I manually hacked on the config to get the volume sizes
> correct. I also noticed that if I create the deployment config with "sdb"
> by accident, then click back and change it to "vdb", the changes are again
> not reflected in the config.
> My deployment config does seem to work. All volumes are created (though
> the xfs options used don't make sense as you end up with stripe sizes that
> aren't a multiple of the block size).
> Once gluster is deployed, I deploy the hosted engine, and everything
> works.
>
> 2. Reboot all nodes. I was testing for power outage response. All nodes
> come up, but glusterd is not running (seems to have failed for some
> reason). I can manually restart glusterd on all nodes and it comes up and
> starts communicating normally. However, the engine does not come online. So
> I figured out where it last lived and tried to start it manually through the
> web interface. This fails because vdsm-ovirtmgmt is not up. I figured out
> the correct way to start up the engine would be through the CLI via
> hosted-engine --vm-start.
>
This is not required at all.
Are you sure that your cluster is not set in global maintenance mode?
Can you please share /var/log/ovirt-hosted-engine-ha/agent.log and
broker.log from your hosts?
> This does work, but it takes a very long time, and it usually starts up
> on any node other than the one I told it to start on.
>
> So I guess two (or three) questions. What is the expected operation after
> a full cluster reboot (i.e., in the event of a power failure)? Why doesn't
> the engine start automatically, and what might be causing glusterd to fail,
> when it can be restarted manually and works fine?
>
> --
> _____
> Fact:
> 1. Ninjas are mammals.
> 2. Ninjas fight ALL the time.
> 3. The purpose of the ninja is to flip out and kill people.