[ovirt-users] oVirt 3.4 - Hosted Engine: Cluster Reboot procedure

适兕 lijiangsheng1 at gmail.com
Fri Apr 25 21:30:52 EDT 2014


Hi,
   Thanks, great work.

   Why not add this to the wiki?




2014-04-25 21:08 GMT+08:00 Daniel Helgenberger <daniel.helgenberger at m-box.de>:

> Hello ovirt-users,
>
> after playing around with my oVirt 3.4 hosted-engine two-node HA cluster,
> I have devised a procedure for restarting the whole cluster after a
> power loss or normal shutdown. It assumes all HA nodes have been taken
> offline. Parts of it also apply to individually rebooted HA nodes.
>
> Please feel free to ask questions and/or suggest improvements. Most
> of this should be made obsolete by future updates anyway.
>
> Note 1:
> The problem, IMHO, seems to be the NFS storage domain not being connected,
> which makes the HA agent crash or hang. The ha-broker service should be
> up and running all the time. Please check this.
>
> Note 2:
> My setup consists of two nodes; 'all nodes' means the task has to be
> performed on every HA node in the cluster.
>
> Note 3:
> By 'Login' I mean SSH or local access.
>
>
> Part A: SHUTDOWN THE CLUSTER
> Prerequisite: oVirt HE cluster running, to be taken offline for
> maintenance:
>      1. In oVirt, shut down all VMs except HostedEngine.
>      2. Login to one cluster node and run 'hosted-engine
>         --set-maintenance --mode=global' to put the cluster into global
>         maintenance.
>      3. Login to the oVirt engine VM and shut it down with 'shutdown -h now'.
>      4. Login to one cluster node and run 'hosted-engine --vm-status' to
>         check that the engine is really down.
>      5. Then shut down all HA nodes.
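Part A above can be written down as a small checklist runner. This is only a sketch: the two 'hosted-engine' commands are taken verbatim from the steps, while the names PART_A and run_plan and the "manual" / "node" / "engine-vm" labels are made up for illustration. It defaults to a dry run and never touches the steps that happen in the web UI or on the engine VM.

```python
import shlex
import subprocess

# Steps of Part A in order. Only the "node" entries are commands meant to be
# run on a cluster node; the labels themselves are hypothetical.
PART_A = [
    ("manual", "shut down all VMs except HostedEngine in oVirt"),
    ("node", "hosted-engine --set-maintenance --mode=global"),
    ("engine-vm", "shutdown -h now"),
    ("node", "hosted-engine --vm-status"),
    ("manual", "shut down all HA nodes"),
]


def run_plan(plan, execute=False):
    """Print every step; run only the 'node' steps, and only if execute=True."""
    ran = []
    for where, step in plan:
        print(f"[{where}] {step}")
        if execute and where == "node":
            # check=True stops the plan as soon as one command fails.
            subprocess.run(shlex.split(step), check=True)
            ran.append(step)
    return ran


if __name__ == "__main__":
    run_plan(PART_A)  # dry run: prints the checklist, executes nothing
```

Calling run_plan(PART_A, execute=True) on a cluster node would run the two node-side commands in order, but the manual steps in between still have to be done by hand, so the dry-run checklist is probably the more useful form.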
>
>
> Part B: STARTING THE CLUSTER
> Prerequisite: oVirt HE cluster down, NFS storage server running and
> exporting the vdsm share.
>      1. Start all nodes and wait for them to boot up.
>      2. Login to one cluster node. Check the status of the following
>         services: vdsm, ovirt-ha-agent, ovirt-ha-broker. All of them
>         should be running except ovirt-ha-agent, which is down and in
>         'locked' state.
>      3. Run 'hosted-engine --vm-status'; at this point it should fail
>         with a Python stack trace (crash).
>      4. On all cluster nodes, connect the storage pool: 'hosted-engine
>         --connect-storage'. Now, 'hosted-engine --vm-status' runs and
>         reports 'up to date: False' and 'unknown-stale-data' for all
>         nodes.
>      5. On all cluster nodes, start the 'ovirt-ha-agent' service:
>         'service ovirt-ha-agent start'
>      6. Wait a few minutes for the ha-broker and the agent to collect
>         the cluster state.
>      7. Login to one cluster node. Repeat 'hosted-engine --vm-status'
>         until the cluster nodes report 'status-up-to-date: True' and
>         'score: 2400'.
>      8. If you shut the cluster down yourself and it is in global
>         maintenance, remove the maintenance mode with 'hosted-engine
>         --set-maintenance --mode=none'. The system should then
>         reinitialize the FSM and start the HostedEngine by itself.¹ If it
>         was not in maintenance (e.g. power failure), the engine should be
>         started as soon as one host gets a score of 2400.
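The waiting in steps 6 and 7 of Part B can be scripted as a polling loop. The sketch below assumes that the output of 'hosted-engine --vm-status' contains an "up-to-date: True/False" field and a "score: N" field for each host, as quoted in the steps; the exact layout of the real output is not shown in this thread, so the regular expressions are deliberately loose about case, spacing and hyphens. The function names, the 30-second interval and the rule that every host must report 2400 are assumptions, not part of the original procedure.

```python
import re
import subprocess
import time

# Loose patterns: match 'up to date: False' as well as 'status-up-to-date: True'.
UP_TO_DATE = re.compile(r"up[- ]to[- ]date\s*:\s*(True|False)", re.IGNORECASE)
SCORE = re.compile(r"score\s*:\s*(\d+)", re.IGNORECASE)


def cluster_ready(status_text, wanted_score=2400):
    """True if every host entry is up to date and has the wanted score."""
    flags = [m.lower() == "true" for m in UP_TO_DATE.findall(status_text)]
    scores = [int(s) for s in SCORE.findall(status_text)]
    return (bool(flags) and all(flags)
            and bool(scores) and all(s == wanted_score for s in scores))


def vm_status():
    """Return the text printed by 'hosted-engine --vm-status' (may be empty)."""
    result = subprocess.run(["hosted-engine", "--vm-status"],
                            capture_output=True, text=True)
    return result.stdout


def wait_until_ready(get_status=vm_status, interval=30, attempts=20,
                     sleep=time.sleep):
    """Poll get_status() until cluster_ready() holds or attempts run out."""
    for _ in range(attempts):
        if cluster_ready(get_status()):
            return True
        sleep(interval)
    return False
```

Once wait_until_ready() returns True, step 8 ('hosted-engine --set-maintenance --mode=none') would follow. Passing get_status and sleep as parameters keeps the loop testable on a machine without oVirt installed.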
>
>
> Part C: STARTING A SINGLE NODE
> Prerequisite: oVirt HE cluster up, HostedEngine running. One HA node was
> taken offline via local maintenance in oVirt and rebooted.
>      1. Follow steps 1-5 of Part B.
>      2. In oVirt, navigate to Cluster, Hosts and activate the node
>         previously in maintenance.
>
> ---
> 1 I observed the following things:
>       * If you use the command 'hosted-engine --vm-shutdown' instead of
>         logging in to the oVirt HE and doing a local shutdown, the Default
>         Data Center is set to Non Responsive and is Contending after
>         the reboot. I strongly suspect the command causes an unclean
>         shutdown. Also, the shutdown is delayed by about two minutes.
>       * If you use the command 'hosted-engine --vm-start' on a cluster
>         in global maintenance, wait for a successful start ({'health':
>         'good', 'vm': 'up', 'detail': 'up'}) and then remove the
>         maintenance status, the engine gets restarted once. If you remove
>         the maintenance first and let ha-agent do the work, the engine
>         is not restarted.
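The "successful start" check in the second observation can be expressed in a few lines. This sketch assumes the engine status is available as the Python-style dict literal quoted above; how that string is pulled out of the 'hosted-engine --vm-status' output is not covered here, and the function name engine_started is made up.

```python
import ast


def engine_started(engine_status):
    """True if the status literal reports the VM as up and healthy."""
    # literal_eval only parses literals, so it is safe on untrusted text.
    status = ast.literal_eval(engine_status)
    return status.get("vm") == "up" and status.get("health") == "good"


if __name__ == "__main__":
    print(engine_started("{'health': 'good', 'vm': 'up', 'detail': 'up'}"))
```

Any other combination, for example a dict whose 'vm' value is not 'up', makes the function return False.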
>
>
> Cheers,
> Daniel
> --
>
> Daniel Helgenberger
> m box bewegtbild GmbH
>
> P: +49/30/2408781-22
> F: +49/30/2408781-10
>
> ACKERSTR. 19
> D-10115 BERLIN
>
>
> www.m-box.de  www.monkeymen.tv
>
> Managing Directors: Martin Retschitzegger / Michaela Göllner
> Commercial Register: Amtsgericht Charlottenburg / HRB 112767
>
> _______________________________________________
> Users mailing list
> Users at ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>
>


-- 
Independence of thought, freedom of spirit.
                        --陈寅恪 (Chen Yinke)