[ovirt-users] oVirt 3.4 - Hosted Engine: Cluster Reboot procedure

Daniel Helgenberger daniel.helgenberger at m-box.de
Fri Apr 25 09:08:45 EDT 2014


Hello ovirt-users,

After playing around with my oVirt 3.4 hosted-engine two-node HA
cluster, I have devised a procedure for restarting the whole cluster
after a power loss or a normal shutdown. It assumes all HA nodes have
been taken offline. Parts of it also apply to single rebooted HA nodes.

Please feel free to ask questions and/or suggest improvements. Most
of this should become obsolete with future updates anyway.

Note 1:
The problem, IMHO, seems to be that the NFS storage domain is not
connected after boot, which makes the HA agent crash or hang. The
ha-broker service should be up and running the whole time; please
check this.

Note 2: 
My setup consists of two nodes; 'all nodes' means the task has to be
performed on every HA node in the cluster.

Note 3:
By 'Login' I mean SSH or local access.


Part A: SHUTDOWN THE CLUSTER
Prerequisite: the oVirt HE cluster is running and is to be taken offline
for maintenance:
     1. In oVirt, shut down all VMs except HostedEngine.
     2. Login to one cluster node and run 'hosted-engine
        --set-maintenance --mode=global' to put the cluster into global
        maintenance
     3. Login to ovirt engine VM and shut it down with 'shutdown -h now'
     4. Login to one cluster node and run 'hosted-engine --vm-status' to
        check if the engine is really down. 
     5. Afterwards, shut down all HA nodes.
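
The node-side commands of Part A (steps 2 and 4) could be wrapped roughly
like this. This is only a sketch: the run() helper, the DRY_RUN switch
and the function names are my own invention for illustration and are not
part of oVirt. By default it only prints the commands; set DRY_RUN=0 on
a real HA node to execute them.

```shell
#!/bin/sh
# Sketch of Part A, steps 2 and 4, to be run on one HA node.
# ASSUMPTION: run(), DRY_RUN and the function names are hypothetical helpers.
# DRY_RUN defaults to 1 (print only); set DRY_RUN=0 to really execute.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Step 2: put the cluster into global maintenance.
enter_global_maintenance() {
    run hosted-engine --set-maintenance --mode=global
}

# Step 4: show the engine VM state (after shutting the engine VM down).
show_engine_status() {
    run hosted-engine --vm-status
}

enter_global_maintenance
echo "Now log in to the engine VM and run 'shutdown -h now' (step 3)."
echo "Afterwards call show_engine_status to confirm the engine is down."
```

Steps 1, 3 and 5 stay manual, since they happen in the oVirt UI, inside
the engine VM and on each node's console respectively.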


Part B: STARTING THE CLUSTER
Prerequisite: oVirt HE cluster down, NFS storage server running and
exporting the vdsm share.
     1. Start all nodes and wait for them to boot up.
     2. Login to one cluster node. Check the status of the following
        services: vdsm, ovirt-ha-agent, ovirt-ha-broker. All of them
        should be running except ovirt-ha-agent, which is down and in
        'locked' state.
     3. Run 'hosted-engine --vm-status'; at this point it should fail
        with a python stack trace (crash).
     4. On all cluster nodes, connect the storage pool: 'hosted-engine
        --connect-storage'. Now, 'hosted-engine --vm-status' runs and
        reports 'up to date: False' and 'unknown-stale-data' for all
        nodes.
     5. On all cluster nodes, start the 'ovirt-ha-agent' service:
        'service ovirt-ha-agent start'
     6. Wait a few minutes for the ha-broker and the agent to collect
        the cluster state.
     7. Login to one cluster node. Check 'hosted-engine --vm-status'
        until the cluster nodes report 'status-up-to-date: True' and
        'score: 2400'.
     8. If you shut the cluster down yourself and it is in global
        maintenance, remove the maintenance mode with 'hosted-engine
        --set-maintenance --mode=none'. Now the system should
        reinitialize the FSM and start the HostedEngine by itself.¹ If
        it was not in maintenance (e.g. power failure), the engine
        should be started as soon as one host gets a score of 2400.
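
Steps 4 to 8 above could be scripted per node roughly as follows. Again
only a sketch under assumptions: run(), DRY_RUN and the function names
are hypothetical helpers, and has_full_score assumes the status output
contains a line of the form 'score: 2400' (any case, any spacing around
the colon), which may not match your version exactly. By default the
commands are only printed; set DRY_RUN=0 on a real HA node.

```shell
#!/bin/sh
# Sketch of Part B, steps 4-8, for one HA node.
# ASSUMPTION: run(), DRY_RUN and the function names are hypothetical helpers.
# DRY_RUN defaults to 1 (print only); set DRY_RUN=0 to really execute.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Steps 4 and 5: connect the storage, then start the HA agent.
# Run this on every HA node.
bring_up_node() {
    run hosted-engine --connect-storage &&
    run service ovirt-ha-agent start
}

# Step 7: read status text on stdin; succeed if a line reports score 2400.
# ASSUMPTION: the line looks like "score: 2400" (case and spacing may vary).
has_full_score() {
    grep -Eiq 'score[[:space:]]*:[[:space:]]*2400([^0-9]|$)'
}

# Step 8: leave global maintenance so the agents start the engine themselves.
leave_global_maintenance() {
    run hosted-engine --set-maintenance --mode=none
}

bring_up_node
```

On a real node, after waiting a few minutes (step 6), something like
'hosted-engine --vm-status | has_full_score && leave_global_maintenance'
would cover steps 7 and 8. I would still read the full status output by
hand before leaving maintenance.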


Part C: STARTING A SINGLE NODE
Prerequisite: oVirt HE cluster up, HostedEngine running. One HA node was
put into local maintenance in oVirt and rebooted.
     1. Follow steps 1-5 of Part B.
     2. In oVirt, navigate to Cluster, Hosts and activate the node
        previously in maintenance.

---
1 I observed the following things:
      * If you use the command 'hosted-engine --vm-shutdown' instead of
        logging in to the oVirt HE VM and shutting it down locally, the
        Default Data Center is set to Non Responsive and goes into
        Contending after the reboot. I strongly suspect that the
        command causes an unclean shutdown. Furthermore, it waits about
        two minutes before shutting down.
      * If you use the command 'hosted-engine --vm-start' on a cluster
        in global maintenance, wait for a successful start ({'health':
        'good', 'vm': 'up', 'detail': 'up'}) and then remove the
        maintenance status, the engine gets restarted once. If you
        remove the maintenance first and let ha-agent do the work, the
        engine is not restarted.


Cheers,
Daniel
-- 

Daniel Helgenberger 
m box bewegtbild GmbH 

P: +49/30/2408781-22
F: +49/30/2408781-10

ACKERSTR. 19 
D-10115 BERLIN 


www.m-box.de  www.monkeymen.tv 

Geschäftsführer: Martin Retschitzegger / Michaela Göllner
Handeslregister: Amtsgericht Charlottenburg / HRB 112767 