<div dir="ltr">Hi,<div>Thanks.</div><div>Great work.</div><div><br></div><div>Why not add this to the wiki?</div><div><br></div><div> </div></div><div class="gmail_extra"><br><br><div class="gmail_quote">2014-04-25 21:08 GMT+08:00 Daniel Helgenberger <span dir="ltr"><<a href="mailto:daniel.helgenberger@m-box.de" target="_blank">daniel.helgenberger@m-box.de</a>></span>:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hello ovirt-users,<br>
<br>
after playing around with my oVirt 3.4 hosted engine two-node HA cluster,<br>
I have devised a procedure for restarting the whole cluster after a<br>
power loss or a normal shutdown. It assumes all HA nodes have been taken<br>
offline. Parts of it also apply to a single rebooted HA node.<br>
<br>
Please feel free to ask questions and/or suggest improvements. Most<br>
of this should be made obsolete by future updates anyway.<br>
<br>
Note 1:<br>
The problem, IMHO, seems to be the unconnected NFS storage domain,<br>
which makes the HA agent crash or hang. The ovirt-ha-broker service<br>
should be up and running all the time. Please check this.<br>
<br>
Note 2:<br>
My setup consists of two nodes; 'all nodes' means the task has to be<br>
performed on every HA node in the cluster.<br>
<br>
Note 3:<br>
By 'Login' I mean SSH or local access.<br>
<br>
<br>
Part A: SHUTDOWN THE CLUSTER<br>
Prerequisite: oVirt HE cluster running, should be taken offline for<br>
maintenance:<br>
1. In oVirt, shut down all VMs except HostedEngine.<br>
2. Login to one cluster node and run 'hosted-engine<br>
--set-maintenance --mode=global' to put the cluster into global<br>
maintenance<br>
3. Login to ovirt engine VM and shut it down with 'shutdown -h now'<br>
4. Login to one cluster node and run 'hosted-engine --vm-status' to<br>
check if the engine is really down.<br>
5. Finally, shut down all HA nodes.<br>
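<br>
A minimal sketch of steps 2 to 4, meant to be run from one cluster node.<br>
It assumes root SSH access to the engine VM. ENGINE_HOST, its default value<br>
and the helper names 'run' and 'shutdown_cluster_steps' are hypothetical<br>
placeholders. By default the script only prints what it would run<br>
(DRY_RUN=1). Steps 1 and 5 stay manual.<br>
<pre>
```shell
#!/bin/sh
# Sketch only: ENGINE_HOST and both function names are hypothetical.

# Print the command in dry-run mode (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

shutdown_cluster_steps() {
    # Step 2: put the cluster into global maintenance.
    run hosted-engine --set-maintenance --mode=global
    # Step 3: shut the engine VM down from the inside (assumes root SSH access).
    run ssh "root@${ENGINE_HOST:-engine.example.com}" shutdown -h now
    # Step 4: show the status so you can confirm the engine is really down.
    run hosted-engine --vm-status
}
```
</pre>
Call 'shutdown_cluster_steps' to preview the commands. With DRY_RUN=0 they<br>
are executed; the engine VM needs some time to power off, so repeat the<br>
status check by hand before shutting down the nodes.<br>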
<br>
<br>
Part B: STARTING THE CLUSTER<br>
Prerequisite: oVirt HE cluster down, NFS storage server running and<br>
exporting the vdsm share.<br>
1. Start all nodes and wait for them to boot up.<br>
2. Login to one cluster node. Check the status of the following<br>
services: vdsm, ovirt-ha-agent, ovirt-ha-broker. All of them<br>
should be running except ovirt-ha-agent, which is down and in a<br>
'locked' state.<br>
3. Check 'hosted-engine --vm-status'; this should result in a<br>
Python stack trace (crash).<br>
4. On all cluster nodes, connect the storage pool: 'hosted-engine<br>
--connect-storage'. Now, 'hosted-engine --vm-status' runs and<br>
reports 'up to date: False' and 'unknown-stale-data' for all<br>
nodes.<br>
5. On all cluster nodes, start the 'ovirt-ha-agent' service:<br>
'service ovirt-ha-agent start'<br>
6. Wait a few minutes for the ha-broker and the agent to collect<br>
the cluster state.<br>
7. Login to one cluster node. Check 'hosted-engine --vm-status'<br>
until you have cluster nodes 'status-up-to-date: True' and<br>
'score: 2400'<br>
8. If you shut the cluster down yourself and it is in global<br>
maintenance, remove the maintenance mode with 'hosted-engine<br>
--set-maintenance --mode=none'. The system should then reinitialize<br>
its FSM and start the HostedEngine by itself.¹ If it was<br>
not in maintenance (e.g. power failure), the engine should be started<br>
as soon as one host gets a score of 2400.<br>
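<br>
Steps 6 and 7 could be automated roughly as follows. The sketch assumes that<br>
'hosted-engine --vm-status' prints one 'Status up-to-date : ...' line and one<br>
'Score : ...' line per host, each starting at the beginning of the line. This<br>
layout is an assumption and may differ between versions. 'cluster_ready' and<br>
'wait_for_cluster' are hypothetical helper names.<br>
<pre>
```shell
#!/bin/sh
# Sketch only: the parsed output layout is an assumption, see above.

# Reads vm-status output on stdin. Succeeds only if at least one host is
# listed and every host reports up-to-date True and a score of 2400.
cluster_ready() {
    awk -F':' '
        BEGIN { hosts = 0; fresh = 0; full = 0 }
        /^Status up-to-date/ {
            hosts++
            gsub(/[ \t]/, "", $2)
            if ($2 == "True") fresh++
        }
        /^Score/ {
            gsub(/[ \t]/, "", $2)
            if ($2 == "2400") full++
        }
        END {
            if (hosts == 0) exit 1
            if (fresh != hosts) exit 1
            if (full != hosts) exit 1
        }
    '
}

# Polls once a minute; gives up after $1 attempts (default 15).
wait_for_cluster() {
    tries="${1:-15}"
    while [ "$tries" -gt 0 ]; do
        if hosted-engine --vm-status | cluster_ready; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 60
    done
    return 1
}
```
</pre>
After steps 4 and 5 have been done on all nodes, run 'wait_for_cluster' on<br>
one node and continue with step 8 once it returns successfully.<br>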
<br>
<br>
Part C: STARTING A SINGLE NODE<br>
Prerequisite: oVirt HE cluster up, HostedEngine running. One HA node was<br>
put into local maintenance in oVirt, taken offline and rebooted.<br>
1. Follow steps 1-5 of Part B<br>
2. In oVirt, navigate to Cluster, Hosts and activate the node<br>
previously in maintenance.<br>
<br>
---<br>
¹ I observed the following:<br>
* If you use the command 'hosted-engine --vm-shutdown' instead of<br>
logging in to the oVirt HE VM and doing a local shutdown, the Default<br>
Data Center is set to Non Responsive and is Contending after<br>
the reboot. I strongly suspect the command causes an unclean<br>
shutdown. Further, it waits about two minutes before shutting down.<br>
* If you use the command 'hosted-engine --vm-start' on a cluster<br>
in global maintenance, wait for a successful start ({'health':<br>
'good', 'vm': 'up', 'detail': 'up'}) and then remove the maintenance<br>
status, the engine gets restarted once. If you remove the<br>
maintenance first and let ha-agent do the work, the engine<br>
is not restarted.<br>
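<br>
To wait for the 'successful start' state mentioned above without reading the<br>
output by eye, a small check like this could help. It assumes the engine<br>
status dictionary is printed on a single line and accepts single or double<br>
quotes; 'engine_is_up' is a hypothetical helper name.<br>
<pre>
```shell
#!/bin/sh
# Sketch only: assumes the engine status dictionary sits on one line.

# Reads vm-status output on stdin. Succeeds if some line reports both
# health 'good' and vm 'up'.
engine_is_up() {
    grep -E "[\"']health[\"']: *[\"']good[\"']" |
        grep -Eq "[\"']vm[\"']: *[\"']up[\"']"
}
```
</pre>
Usage: 'hosted-engine --vm-status | engine_is_up'. The inverse result can<br>
serve as the check in step 4 of Part A.<br>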
<br>
<br>
Cheers,<br>
Daniel<br>
--<br>
<br>
Daniel Helgenberger<br>
m box bewegtbild GmbH<br>
<br>
P: +49/30/2408781-22<br>
F: +49/30/2408781-10<br>
<br>
ACKERSTR. 19<br>
D-10115 BERLIN<br>
<br>
<br>
<a href="http://www.m-box.de" target="_blank">www.m-box.de</a> <a href="http://www.monkeymen.tv" target="_blank">www.monkeymen.tv</a><br>
<br>
Managing directors: Martin Retschitzegger / Michaela Göllner<br>
Commercial register: Amtsgericht Charlottenburg / HRB 112767<br>
<br></blockquote></div><br><br clear="all"><div><br></div>-- <br>Independence of thought, freedom of spirit.<br> -- Chen Yinke (陈寅恪)
</div>