Hi,
Thanks, great work.
Why not add this to the wiki?
2014-04-25 21:08 GMT+08:00 Daniel Helgenberger <daniel.helgenberger(a)m-box.de>:
Hello ovirt-users,
after playing around with my ovirt 3.4 hosted engine two node HA cluster
I have devised a procedure on how to restart the whole cluster after a
power loss / normal shutdown. This assumes all HA-Nodes have been taken
offline. This also applies partly to rebooted HA nodes.
Please feel free to ask questions and/or comment on improvements. Most
of the things should be obsoleted by future updates anyway.
Note 1:
The problem, IMHO, seems to be the unconnected NFS storage domain,
which makes the HA agent crash or hang. The ha-broker service should be
up and running all the time; please check this.
Note 2:
My setup consists of two nodes; 'all nodes' means the task has to be
performed on every HA node in the cluster.
Note 3:
By 'Login' I mean SSH or local access.
Part A: SHUTDOWN THE CLUSTER
Prerequisite: oVirt HE cluster running, should be taken offline for
maintenance:
1. In oVirt, shut down all VMs except HostedEngine.
2. Log in to one cluster node and run 'hosted-engine
--set-maintenance --mode=global' to put the cluster into global
maintenance.
3. Log in to the oVirt engine VM and shut it down with 'shutdown -h now'.
4. Log in to one cluster node and run 'hosted-engine --vm-status' to
check that the engine is really down.
5. Then shut down all HA nodes.
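Steps 2-4 of Part A can be sketched as a small script. This is only an illustration, not a tested tool: it prints the commands by default (dry run), and ENGINE_HOST is a hypothetical placeholder for the engine VM's hostname. Step 1 (stopping the VMs in oVirt) and step 5 (powering off the nodes) are left to the admin.

```shell
#!/bin/sh
# Dry run by default: RUN=echo only prints each command.
# Run with RUN= (empty) on a cluster node to execute them for real.
RUN=${RUN-echo}

# ASSUMPTION: hypothetical hostname of the engine VM, reachable via SSH.
ENGINE_HOST=${ENGINE_HOST:-engine.example.com}

shutdown_cluster_steps() {
    # Step 2: put the cluster into global maintenance.
    $RUN hosted-engine --set-maintenance --mode=global
    # Step 3: shut the engine VM down from inside the guest.
    $RUN ssh "root@$ENGINE_HOST" 'shutdown -h now'
    # Step 4: check that the engine is really down before
    # powering off the HA nodes.
    $RUN hosted-engine --vm-status
}

shutdown_cluster_steps
```

In real use there should be a pause between steps 3 and 4, since the engine VM needs some time to go down.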
Part B: STARTING THE CLUSTER
Prerequisite: oVirt HE cluster down, NFS storage server running and
exporting the vdsm share.
1. Start all nodes and wait for them to boot up.
2. Log in to one cluster node. Check the status of the following
services: vdsm, ovirt-ha-agent, ovirt-ha-broker. All of them
should be running except ovirt-ha-agent, which is down and in a
'locked' state.
3. Run 'hosted-engine --vm-status'; at this point it should fail with
a Python stack trace (crash).
4. On all cluster nodes, connect the storage pool: 'hosted-engine
--connect-storage'. Now, 'hosted-engine --vm-status' runs and
reports 'up to date: False' and 'unknown-stale-data' for all
nodes.
5. On all cluster nodes, start the 'ovirt-ha-agent' service:
'service ovirt-ha-agent start'
6. Wait a few minutes for the ha-broker and the agent to collect
the cluster state.
7. Log in to one cluster node. Check 'hosted-engine --vm-status'
until all cluster nodes report 'status-up-to-date: True' and
'score: 2400'.
8. If you shut the cluster down yourself and it is in global
maintenance, remove the maintenance mode with 'hosted-engine
--set-maintenance --mode=none'. Now, the system should do an FSM
reinitialize and start the HostedEngine by itself.¹ If it was
not in maintenance (e.g. power failure), the engine should be started
as soon as one host gets a score of 2400.
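The waiting in steps 6-7 could be automated roughly as below. This is a sketch under assumptions: it expects 'hosted-engine --vm-status' to print one "...up-to-date: True/False" line and one "Score: N" line per host, which may not match the output of your version exactly. The function names all_hosts_ready and wait_for_cluster are made up for this example.

```shell
#!/bin/sh
# Reads 'hosted-engine --vm-status' output on stdin and succeeds only if
# every host reports up-to-date True and a score of 2400.
# ASSUMPTION: one "...up-to-date ... : True|False" line and one
# "Score ... : N" line per host; adjust the patterns to your output.
all_hosts_ready() {
    awk -F': *' '
        tolower($1) ~ /up-to-date/ { hosts++; if ($2 ~ /^True/) fresh++ }
        tolower($1) ~ /^ *score/   { if ($2 + 0 == 2400) good++ }
        END { exit !(hosts > 0 && fresh == hosts && good == hosts) }
    '
}

# Polls once a minute and gives up after $1 tries (default 10).
wait_for_cluster() {
    tries=${1:-10}
    while [ "$tries" -gt 0 ]; do
        hosted-engine --vm-status 2>/dev/null | all_hosts_ready && return 0
        tries=$((tries - 1))
        sleep 60
    done
    return 1
}
```

After 'hosted-engine --connect-storage' and 'service ovirt-ha-agent start' have been run on all nodes, calling 'wait_for_cluster' on one node returns success once the cluster state has been collected, at which point step 8 applies.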
Part C: STARTING A SINGLE NODE
Prerequisite: oVirt HE cluster up, HostedEngine running. One HA node was
taken offline by local maintenance in oVirt and rebooted.
1. Follow steps 1-5 of Part B
2. In oVirt, navigate to Cluster, Hosts and activate the node
previously in maintenance.
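For a single rebooted node, the per-node part of Part B (steps 2, 4 and 5) might look like the sketch below. It is a dry run by default and only prints the commands. Note one assumption: the mail lists the service as 'vdsm', while the script uses 'vdsmd' as the init script name; override SERVICES if your install differs. Step 3 (the crashing status check) is skipped here.

```shell
#!/bin/sh
# Dry run by default: RUN=echo only prints each command.
# Run with RUN= (empty) on the rebooted node to execute them for real.
RUN=${RUN-echo}

# ASSUMPTION: service names; the mail says 'vdsm', the init script
# is assumed to be called 'vdsmd' here.
SERVICES=${SERVICES:-"vdsmd ovirt-ha-agent ovirt-ha-broker"}

rejoin_node() {
    # Step 2: look at the three services; ovirt-ha-agent is
    # expected to be down at this point.
    for svc in $SERVICES; do
        $RUN service "$svc" status
    done
    # Step 4: connect the hosted-engine storage.
    $RUN hosted-engine --connect-storage
    # Step 5: start the HA agent.
    $RUN service ovirt-ha-agent start
}

rejoin_node
```

Activating the node in oVirt (step 2 of Part C) is still done by hand in the web UI.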
---
1 I observed the following things:
* If you use the command 'hosted-engine --vm-shutdown' instead of
logging in to the oVirt HE and doing a local shutdown, the Default
Data Center is set to non-responsive and shows as Contending after
the reboot. I strongly suspect the command causes an unclean
shutdown. It also waits about two minutes before shutting down.
* If you use the command 'hosted-engine --vm-start' on a cluster
in global maintenance, wait for a successful start ({'health':
'good', 'vm': 'up', 'detail': 'up'}) and then remove the maintenance
status, the engine gets restarted once. If you remove the
maintenance first and let ha-agent do the work, the engine
is not restarted.
Cheers,
Daniel
--
Daniel Helgenberger
m box bewegtbild GmbH
P: +49/30/2408781-22
F: +49/30/2408781-10
ACKERSTR. 19
D-10115 BERLIN
www.m-box.de www.monkeymen.tv
Geschäftsführer: Martin Retschitzegger / Michaela Göllner
Handelsregister: Amtsgericht Charlottenburg / HRB 112767