I finished the process... but I think there must be a global revision of the
architecture... too intricate and definitely not consumer ready.
This is what I did:
-1. In my environment I have a Pacemaker cluster between the nodes (to work around Gluster I
also tried to implement an HA NFS), and one node is my "software and helping
repository" (of course using VDO to compress and deduplicate), so it has a cluster of
three 10TB SATA disks to publish some "space" with Linux iSCSI, all mapped on
the 10Gb/s management interface (the VM VLANs are on the same physical interface,
and so is the iSCSI initiator for an external storage; since every node has to have its
own address, the "datacenter" complains that the network isn't
synced... (!) )
0. A subset of the virtual machines was moved to a node that would become -temporarily- an
orphan of the old cluster.
1. Put the cluster into global maintenance and stopped the hosted engine.
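A minimal sketch of the commands I mean for this step, assuming a standard self-hosted
engine setup (run on one of the hosted-engine hosts):
    hosted-engine --set-maintenance --mode=global
    hosted-engine --vm-shutdown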
2. On the node (free of VMs) where I wanted to deploy the new engine I issued
ovirt-hosted-engine-cleanup (as stated in
https://access.redhat.com/solutions/6529691) and
verified with "sanlock client status" that the storages were left unlocked.
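Roughly (the exact output depends on your setup; the point is that no lockspace or
resource lines remain for the old hosted-engine storage):
    ovirt-hosted-engine-cleanup
    sanlock client status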
3. Deployed the new hosted engine (hosted-engine --deploy), which is time consuming, onto the new
external NFS storage.
4. Logged in to the new engine and manually defined all the networks.
5. On the other nodes, stopped the virtual machines, unlocked the sanlock domains and
manually umounted everything mounted under /rhev/.* (to stop the machines, virsh is your friend):
virsh -c qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf
To check which storage belongs to which VM, the commands are:
list - to see which VMs are running or paused
domblklist --domain <name> - to see where in the local filesystem the virtual qemu
disks are mapped
shutdown <name> --mode acpi - to stop the VM gracefully
destroy <name> - to stop a "hung" (!) VM forcefully
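For example, for one hypothetical VM named vm01 (the name is just a placeholder):
    virsh -c 'qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf' list --all
    virsh -c 'qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf' domblklist --domain vm01
    virsh -c 'qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf' shutdown vm01 --mode acpi
    virsh -c 'qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf' destroy vm01   # only if it hangs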
6. On every surviving orphan node you have to unlock the domains you want to import (the sanlock
command is your friend, but I didn't find a command to unlock only one storage at a
time, so I simply stopped the daemon with "sanlock client shutdown -f 1") and
then umounted the mountpoints: "umount /rhev/[....]"
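A rough sketch of this step on one node (oVirt normally mounts the domains under
/rhev/data-center/mnt/, but check with "mount | grep rhev"; the last path is only a placeholder):
    sanlock client status
    sanlock client shutdown -f 1
    mount | grep rhev
    umount /rhev/data-center/mnt/<your_storage_mountpoint>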
7. That is the "magic": in the new engine, under Storage > Domains, you can
select "Import Domain" and define the domain you want to import; oVirt recognizes
the domain as an already initialized domain and warns you that you can lose data... fingers crossed,
acknowledge and go on.
8. Selecting the domain, you find new "tabs": "{VM|Template|Disk}
Import"; under them you can import the objects, following the instructions from oVirt and
addressing the network warnings (probably the MAC addresses of the old VMs are outside the new
MAC pool interval).
Hints and other things:
So your -basic- cluster will be up and running again.
Obviously I didn't address restoring the other elements of the oVirt environment...
probably someone could write a script so that, starting from a backup of the engine, you can
selectively restore objects (without Ansible, please).
VMs with pending snapshots are probably not imported; you have to "commit" the
disks under the domain ("qemu-img commit <file of the snapshot>"), then
select "Scan Disks" on the storage domain page and continue the
import, as sketched below.
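A rough sketch of what I mean, with hypothetical paths (the idea is to find the top of the
snapshot chain with qemu-img info and merge it back into its backing file before rescanning):
    qemu-img info --backing-chain /rhev/data-center/mnt/<storage>/<sd_uuid>/images/<img_uuid>/<top_volume>
    qemu-img commit /rhev/data-center/mnt/<storage>/<sd_uuid>/images/<img_uuid>/<top_volume>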
This method gives "a sort of" control, while the method -that should work under
normal circumstances- gives you only frustration and service downtime.
I hope this note is useful to someone and will be welcomed by the developers who try to
make things consistent. Thank you for your work.
Diego