
Hi,

Thanks, great work. Why not add this to the wiki page?

2014-04-25 21:08 GMT+08:00 Daniel Helgenberger <daniel.helgenberger@m-box.de>:
Hello ovirt-users,
After playing around with my oVirt 3.4 hosted engine two-node HA cluster, I have devised a procedure for restarting the whole cluster after a power loss or a normal shutdown. It assumes all HA nodes have been taken offline. Parts of it also apply to rebooted HA nodes.
Please feel free to ask questions and/or suggest improvements. Most of this should be made obsolete by future updates anyway.
Note 1: The problem, IMHO, seems to be the unconnected NFS storage domain, which results in the HA agent crashing or hanging. The ha-broker service should be up and running all the time. Please check this.
Note 2: My setup consists of two nodes; 'all nodes' means the task has to be performed on every HA node in the cluster.
Note 3: By 'Login' I mean SSH or local access.
Part A: SHUTDOWN THE CLUSTER
Prerequisite: oVirt HE cluster running, to be taken offline for maintenance.
1. In oVirt, shut down all VMs except HostedEngine.
2. Login to one cluster node and run 'hosted-engine --set-maintenance --mode=global' to put the cluster into global maintenance.
3. Login to the oVirt engine VM and shut it down with 'shutdown -h now'.
4. Login to one cluster node and run 'hosted-engine --vm-status' to check that the engine is really down.
5. Then shut down all HA nodes.
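
The node-side commands of Part A (steps 2 and 4) could be wrapped in a small helper like the one below. This is only a sketch: the function names, the run() wrapper and the DRY_RUN switch are made up for illustration and are not part of oVirt. Steps 1, 3 and 5 still have to be done by hand in the web UI, on the engine VM and on each node.

```shell
#!/bin/sh
# Sketch of Part A, steps 2 and 4. Run on ONE cluster node.
# run(), DRY_RUN and the function names are hypothetical helpers.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Step 2: put the whole cluster into global maintenance.
enter_global_maintenance() {
    run hosted-engine --set-maintenance --mode=global
}

# Step 4: show the HA status so you can confirm the engine VM is down.
show_engine_status() {
    run hosted-engine --vm-status
}
```

With DRY_RUN=1 set, the functions only print what they would run, which is a cheap way to review the sequence before touching a production cluster.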
Part B: STARTING THE CLUSTER
Prerequisite: oVirt HE cluster down, NFS storage server running and exporting the vdsm share.
1. Start all nodes and wait for them to boot up.
2. Login to one cluster node. Check the status of the following services: vdsm, ovirt-ha-agent, ovirt-ha-broker. All should be running except ovirt-ha-agent, which is down and in a 'locked' state.
3. Check 'hosted-engine --vm-status'; this should result in a Python stack trace (crash).
4. On all cluster nodes, connect the storage pool: 'hosted-engine --connect-storage'. Now 'hosted-engine --vm-status' runs and reports 'up to date: False' and 'unknown-stale-data' for all nodes.
5. On all cluster nodes, start the 'ovirt-ha-agent' service: 'service ovirt-ha-agent start'.
6. Wait a few minutes for the ha-broker and the agent to collect the cluster state.
7. Login to one cluster node. Check 'hosted-engine --vm-status' until the cluster nodes report 'status-up-to-date: True' and 'score: 2400'.
8. If you shut the cluster down yourself and it is in global maintenance, remove the maintenance mode with 'hosted-engine --set-maintenance --mode=none'. The system should now reinitialize its FSM and start the HostedEngine by itself.¹ If it was not in maintenance (e.g. after a power failure), the engine should be started as soon as one host gets a score of 2400.
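
Steps 4 to 7 of Part B could be scripted roughly as follows. Treat it as a sketch under assumptions: the function names are hypothetical, the score check assumes the status output contains a line resembling the 'score: 2400' quoted in step 7 (case and spacing are ignored), and it does not look at 'status-up-to-date', so a manual look at 'hosted-engine --vm-status' is still advisable.

```shell
#!/bin/sh
# Sketch of Part B, steps 4-7. All function names are hypothetical.

# Steps 4 and 5: run on EVERY HA node.
start_ha_node() {
    hosted-engine --connect-storage &&
    service ovirt-ha-agent start
}

# Reads 'hosted-engine --vm-status' output on stdin and succeeds if a line
# reports a score of exactly 2400.
# ASSUMPTION: the line resembles 'score: 2400' as quoted in step 7.
has_full_score() {
    grep -Eiq 'score[[:space:]]*:[[:space:]]*2400([^0-9]|$)'
}

# Step 7: poll on ONE node until a host reaches the full score.
# $1 = number of attempts (default 20), 30 seconds apart.
wait_for_full_score() {
    tries=${1:-20}
    while [ "$tries" -gt 0 ]; do
        if hosted-engine --vm-status 2>/dev/null | has_full_score; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 30
    done
    return 1
}
```

A possible use is 'start_ha_node' on each node, then 'wait_for_full_score' on one of them before removing global maintenance in step 8.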
Part C: STARTING A SINGLE NODE
Prerequisite: oVirt HE cluster up, HostedEngine running. One HA node was taken offline by local maintenance in oVirt and rebooted.
1. Follow steps 1-5 of Part B.
2. In oVirt, navigate to Cluster, Hosts and activate the node previously in maintenance.
---
¹ I observed the following:
* If you use the command 'hosted-engine --vm-shutdown' instead of logging in to the oVirt HE and doing a local shutdown, the Default Data Center is set to Non Responsive and then Contending after the reboot. I strongly suspect the command causes an unclean shutdown. Also, it waits about two minutes before shutting down.
* If you use the command 'hosted-engine --vm-start' on a cluster in global maintenance, wait for a successful start ({'health': 'good', 'vm': 'up', 'detail': 'up'}) and then remove the maintenance status, the engine gets restarted once. If you remove the maintenance first and let ha-agent do the work, the engine is not restarted.
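
If you do start the engine manually, the "successful start" state quoted above can be checked from a script. The sketch below assumes that 'hosted-engine --vm-status' prints the fragments 'health': 'good' and 'vm': 'up' exactly as quoted; the function name is hypothetical.

```shell
#!/bin/sh
# Reads 'hosted-engine --vm-status' output on stdin and succeeds only if it
# contains both fragments quoted in the footnote.
# ASSUMPTION: the output uses exactly this quoting and spacing.
engine_is_up() {
    status=$(cat)
    printf '%s\n' "$status" | grep -Fq "'health': 'good'" &&
    printf '%s\n' "$status" | grep -Fq "'vm': 'up'"
}
```

For example: hosted-engine --vm-status | engine_is_up && echo "engine is up"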
Cheers,
Daniel
--
Daniel Helgenberger
m box bewegtbild GmbH
P: +49/30/2408781-22 F: +49/30/2408781-10
ACKERSTR. 19 D-10115 BERLIN
www.m-box.de www.monkeymen.tv
Managing directors: Martin Retschitzegger / Michaela Göllner
Commercial register: Amtsgericht Charlottenburg / HRB 112767
--
Independence of thought, freedom of spirit. -- Chen Yinke (陈寅恪)