[ovirt-users] [Users] Migrate cluster 3.3 -> 3.4 hosted on existing hosts

Fri May 23 18:40:50 UTC 2014

On 4/2/2014 1:58 AM, Yedidyah Bar David wrote:
> ----- Original Message -----
>> From: "Ted Miller" <tmiller at hcjb.org>
>> To: "users" <users at ovirt.org>
>> Sent: Tuesday, April 1, 2014 10:40:38 PM
>> Subject: [Users] Migrate cluster 3.3 -> 3.4 hosted on existing hosts
>>
>> Current setup:
>>      * 3 identical hosts running on HP GL180 g5 servers
>>          * gluster running 5 volumes in replica 3
>>      * engine running on VMWare Server on another computer (that computer is
>>      NOT available to convert to a host)
>>
>> Where I want to end up:
>>      * 3 identical hosted-engine hosts running on HP GL180 g5 servers
>>          * gluster running 6 volumes in replica 3
>>              * new volume will be nfs storage for engine VM
>>      * hosted engine in oVirt VM
>>      * as few changes to current setup as possible
>>
>> The two pages I found on the wiki are: Hosted Engine Howto and Migrate to
>> Hosted Engine . Both were written during the testing process, and have not
>> been updated to reflect production status. I don't know if anything in the
>> process has changed since they were written.
> Basically things remained the same, with some details changing perhaps.
>
>> Process outlined in above two pages (as I understand it):
>>
>> have nfs file store ready to hold VM
>>
>> Do minimal install (not clear if ovirt node, Centos, or Fedora was used--I am
>> Centos-based)
> Fedora/Centos/RHEL are supposed to work. ovirt node is currently not
> supported - iirc it's planned to be supported soon, not sure.
>
>> # yum install ovirt-hosted-engine-setup
>> # hosted-engine --deploy
>>
>>
>> Install OS on VM
>>
>>
>> return to host console
>>
>>
>> at "Please install the engine in the VM" prompt on host
>>
>>
>> on VM console
>> # yum install ovirt-engine
>>
>>
>> on old engine:
>> service ovirt-engine stop
>> chkconfig ovirt-engine off
>>
>> set up dns for new engine
>>
>>
>> # engine-backup --mode=backup --file=backup1 --log=backup1.log
>> scp backup file to new engine VM
>>
>>
>> on new VM:
> Please see [1]. Specifically, if you had a local db, you'll first have
> to create it yourself.
>
> [1] http://www.ovirt.org/Ovirt-engine-backup#Howto
>
>> # engine-backup --mode=restore --file=backup1 --log=backup1-restore.log
>> --change-db-credentials --db-host=didi-lap --db-user=engine --db-password
>> --db-name=engine
> The above assumes a db was already created and ready to use (access etc)
> using the supplied credentials. You'll naturally have to provide your own.
>
>> # engine-setup
>>
>> on host:
>> run script until: "The system will wait until the VM is down."
>>
>> on new VM:
>> # reboot
>>
>> on Host: finish script
>>
>> My questions:
>>
>> 1. Is the above still the recommended way to do a hosted-engine install?
> Yes.
>
>> 2. Will it blow up at me if I use my existing host (with glusterfs all set
>> up, etc) as the starting point, instead of a clean install?
> a. Probably yes, for now. I did not hear much about testing such a migration
> using an existing host - ovirt or gluster or both. I did not test that myself
> either.
>
> If at all possible, you should use a new clean host. Do plan well and test.
>
> Also see discussions on the mailing lists, e.g. this one:
>
> http://lists.ovirt.org/pipermail/users/2014-March/thread.html#22441
>
> Good luck, and please report back!
I have good news and bad news.

I migrated the 3-host cluster from 3.4 to 3.4 hosted.  The process went
fairly smoothly.  The engine ran, I was able to add the three hosts to the
engine's domain, etc.  That was all working by about Thursday.  (I did not
get fencing set up.)

Friday, at the end of the day, I shut down the entire system (it is not yet 
in production) because I was leaving for a week's vacation/holiday.  I am 
fairly certain that I put the system into global maintenance mode before 
shutting down.  I know I shut down the engine before shutting down the hosts.

Monday (10 days later) I came back from vacation and powered up the three 
machines.  The hosts came up fine, but the engine will not start.  (I found 
some gluster split-brain errors, and chased that for a couple of days, until 
I realized that the split-brain was not the fundamental problem.)

During bootup /var/log/messages shows:

May 21 19:22:00 s2 ovirt-ha-broker mgmt_bridge.MgmtBridge ERROR Failed to getVdsCapabilities: VDSM initialization timeout
May 21 19:22:00 s2 ovirt-ha-broker mem_free.MemFree ERROR Failed to getVdsStats: VDSM initialization timeout
May 21 19:22:00 s2 ovirt-ha-broker cpu_load_no_engine.EngineHealth ERROR Failed to getVmStats: VDSM initialization timeout
May 21 19:22:00 s2 ovirt-ha-broker engine_health.CpuLoadNoEngine ERROR Failed to getVmStats: VDSM initialization timeout
May 21 19:22:03 s2 vdsm vds WARNING Unable to load the json rpc server module. Please make sure it is installed.

and then /var/log/ovirt-hosted-engine-ha/agent.log shows:

MainThread::ERROR::2014-05-21 19:22:04,198::hosted_engine::414::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm) Failed trying to connect storage:
MainThread::CRITICAL::2014-05-21 19:22:04,199::agent::103::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Could not start ha-agent
Traceback (most recent call last):
   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 97, in run
     self._run_agent()
   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 154, in _run_agent
     hosted_engine.HostedEngine(self.shutdown_requested).start_monitoring()
   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 299, in start_monitoring
     self._initialize_vdsm()
   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 415, in _initialize_vdsm
     raise Exception("Failed trying to connect storage")
Exception: Failed trying to connect storage
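
To pull just the failure records out of a long agent.log, something like the following may help. It is a minimal sketch that assumes every record uses the "::"-separated layout seen in the excerpt above (thread, level, timestamp, module, line, logger, message); agent_errors is a hypothetical helper name, and a message that itself contains "::" would be cut short.

```shell
#!/bin/sh
# Sketch: list ERROR and CRITICAL records from an ovirt-hosted-engine-ha
# agent log. The field layout is inferred from the excerpt above, not
# from any documented format.
agent_errors() {
    # $1: path to agent.log
    # Fields split on "::" -> $2 is the level, $3 the timestamp,
    # and the last field is the "(function) message" part.
    awk -F'::' '$2 == "ERROR" || $2 == "CRITICAL" { print $3 " " $2 " " $NF }' "$1"
}
```

Usage would be along the lines of: agent_errors /var/log/ovirt-hosted-engine-ha/agent.log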

I can manually mount and start the storage, but then the sanlock/gluster
split-brain problem bites me, and I have found no instructions on how to
recover from this situation.
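
For checking which volumes report split-brain entries, a loop like the one below is one possible starting point. It is a sketch that assumes the gluster CLI subcommands "volume list" and "volume heal VOLNAME info split-brain" are available in the installed gluster version; GLUSTER_CMD and check_split_brain are hypothetical names introduced here. It only reports entries and does not attempt any recovery.

```shell
#!/bin/sh
# Sketch: print the split-brain report for every gluster volume.
# GLUSTER_CMD is a hypothetical override for testing; by default the
# real gluster CLI is called.
GLUSTER_CMD=${GLUSTER_CMD:-gluster}

check_split_brain() {
    # Volume names come from "gluster volume list", one per line.
    for vol in $("$GLUSTER_CMD" volume list); do
        echo "== $vol =="
        # Lists the files gluster currently considers to be in split-brain.
        "$GLUSTER_CMD" volume heal "$vol" info split-brain
    done
}
```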

I am starting over the "sanitary" way.  The only thing getting reused is the 
gluster file system.  I will migrate one host at a time, and will get the 
gluster system running under the hosts as I go, adding the host back into the 
system as I go along.

You were right. :(
Ted Miller
Elkhart, IN, USA

P.S. I hope that the install scripts can eventually be modified to migrate 
existing hosts, but I understand that first you must have it stable from a 
clean install.