4.5.4 Hosted-Engine: change hosted-engine storage

ovirt-engine-appliance-4.5-20221206133948

Hello, I have some trouble with my Gluster instance that hosts the hosted-engine. I would like to copy the data from that hosted-engine storage and restore it to another hosted-engine storage (I will try NFS). I think the main method is to put oVirt in global maintenance mode, stop the hosted-engine VM, and unmount the hosted-engine storage:

  systemctl stop vdsmd supervdsmd ovirt-ha-broker ovirt-ha-agent
  hosted-engine --disconnect-storage
  umount /rhev/data-center/mnt/glusterSD/localhost:_glen/

and then redeploy the hosted engine from the current backup.

Method 1 [doesn't work; the Ansible script fails after the pause]:

  ovirt-hosted-engine-setup --4 \
    --config-append=/var/lib/ovirt-hosted-engine-setup/answers/answers-20230116184834.conf \
    --restore-from-file=230116-backup.tar.gz \
    --ansible-extra-vars=he_pause_before_engine_setup=true

Method 2 [I cannot find the glitch]:
I saw that every node has a configuration file under /etc/ovirt-hosted-engine/hosted-engine.conf with some interesting entries:

  vm_disk_id=0a1a501c-fc45-430f-bfd3-076172cec406
  vm_disk_vol_id=f65dab86-67f1-46fa-87c0-f9076f479741
  storage=localhost:/glen
  domainType=glusterfs
  metadata_volume_UUID=4e64e155-ee11-4fdc-b9a0-a7cbee5e4181
  metadata_image_UUID=20234090-ea40-4614-ae95-8f91b339ba3e
  lockspace_volume_UUID=6a975f46-4126-4c2a-b444-6e5a34872cf6
  lockspace_image_UUID=893a1fc1-9a1d-44fc-a02f-8fdac19afc18
  conf_volume_UUID=206e505f-1bb8-4cc4-abd9-942654c47612
  conf_image_UUID=9871a483-8e7b-4f52-bf71-a6a8adc2309b

The conf_volume under the conf_image directory is a tar archive:

  /rhev/data-center/mnt/glusterSD/localhost:_glen/3577c21e-f757-4405-97d1-0f827c9b4e22/images/9871a483-8e7b-4f52-bf71-a6a8adc2309b/206e505f-1bb8-4cc4-abd9-942654c47612: POSIX tar archive (GNU)

which I think holds the common (shared) configuration. I copied the structure (from the 3577c21e-f757-4405-97d1-0f827c9b4e22 directory) to the new storage (NFS), then changed the entries in the local configuration and the "shared configuration":

  from storage=localhost:/glen to storage=<server>:/directory (writable by the vdsm user and kvm group)
  from domainType=glusterfs to domainType=nfs

and issued:

  1. hosted-engine --connect-storage   <- works
  2. hosted-engine --vm-start          <- doesn't work; it complains that ovirt-ha-agent is not starting

What can I do? Is there any documentation somewhere?

Diego
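P.S. For reference, this is roughly how I inspected the shared configuration in method 2 (a sketch: the mount path and UUIDs are from my setup and will differ on yours):

  # Read the storage pointers from the local configuration
  grep -E '^(storage|domainType|conf_volume_UUID|conf_image_UUID)=' \
      /etc/ovirt-hosted-engine/hosted-engine.conf

  # Locate the conf volume inside the storage domain and confirm it is a tar
  SD=/rhev/data-center/mnt/glusterSD/localhost:_glen/3577c21e-f757-4405-97d1-0f827c9b4e22
  CONF="$SD/images/9871a483-8e7b-4f52-bf71-a6a8adc2309b/206e505f-1bb8-4cc4-abd9-942654c47612"
  file "$CONF"

  # List the shared configuration files without extracting them
  tar -tvf "$CONF"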

Hi,
Check out https://access.redhat.com/solutions/6529691. I would say that the hacks you have done are in the 'vicinity' of the correct approach, but the correct approach is a bit simpler than what you came up with. The idea of the procedure is that, to a significant extent, you 'forget' about the original image files. The procedure is: "launch a brand-new hosted-engine, but restore the engine database from a backup."

Your answer file should correspond to what you need (or you simply don't provide one and answer the questions interactively). Obviously you should not use glusterfs as the domainType, but instead:

  OVEHOSTED_STORAGE/domainType=str:nfs

and, essentially:

  OVEHOSTED_STORAGE/storageDomainConnection=str:<IP of NFS server>:/<name of NFS volume>

Maybe the owner/group of the NFS directory was not changed to vdsm/kvm; the solution above contains a reference to that (see also the P.S. below).

BR,
Konstantin
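P.S. Preparing the NFS export would look roughly like this (a sketch; /exports/he is a hypothetical path, and it relies on oVirt hosts using UID 36 / GID 36 for vdsm/kvm):

  # On the NFS server: the directory must be owned by vdsm:kvm (36:36),
  # set numerically even if those accounts don't exist on the server
  mkdir -p /exports/he
  chown 36:36 /exports/he
  chmod 0755 /exports/he

  # Matching /etc/exports entry (one line), then re-export:
  #   /exports/he  *(rw,sync,no_subtree_check,anonuid=36,anongid=36)
  exportfs -ra

  # Quick check from an oVirt host: mount it and verify vdsm can write
  mount -t nfs <IP of NFS server>:/exports/he /mnt
  sudo -u vdsm touch /mnt/write_test && rm -f /mnt/write_test
  umount /mnt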

Thank you, I'm currently trying to accomplish what you reported, but I'm stuck. I launched this:

  hosted-engine --deploy --4 \
    --restore-from-file=/root/deploy_hosted_engine_230117/230117-scopeall-backup.tar.gz \
    --config-append=/root/deploy_hosted_engine_230117/hosted_engine_deploy.answer.conf \
    --ansible-extra-vars=Debug=99 \
    --ansible-extra-vars=pauseonRestore=true \
    --ansible-extra-vars=he_pause_before_engine_setup=true \
    --otopi-environment="OVESETUP_CONFIG/keycloakEnable=bool:False"

The ovirt-engine installs correctly on the local VM, and it launches engine-setup with the restore of the database, but the setup then fails on a missing key, OVESETUP_OVN/ovirtProviderOvnSecret, as can be seen on the engine under /var/log/ovirt-engine/setup/:

[...]
2023-01-17 15:58:05,667+0000 DEBUG otopi.context context._executeMethod:127 Stage misc METHOD otopi.plugins.ovirt_engine_setup.ovirt_engine.network.ovirtproviderovn.Plugin._misc_configure_provider
2023-01-17 15:58:05,667+0000 DEBUG otopi.context context._executeMethod:145 method exception
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/otopi/context.py", line 132, in _executeMethod
    method['method']()
  File "/usr/share/ovirt-engine/setup/bin/../plugins/ovirt-engine-setup/ovirt-engine/network/ovirtproviderovn.py", line 1124, in _misc_configure_provider
    self._configure_ovirt_provider_ovn()
  File "/usr/share/ovirt-engine/setup/bin/../plugins/ovirt-engine-setup/ovirt-engine/network/ovirtproviderovn.py", line 807, in _configure_ovirt_provider_ovn
    content = self._create_config_content()
  File "/usr/share/ovirt-engine/setup/bin/../plugins/ovirt-engine-setup/ovirt-engine/network/ovirtproviderovn.py", line 772, in _create_config_content
    OvnEnv.OVIRT_PROVIDER_OVN_SECRET
KeyError: 'OVESETUP_OVN/ovirtProviderOvnSecret'
2023-01-17 15:58:05,669+0000 ERROR otopi.context context._executeMethod:154 Failed to execute stage 'Misc configuration': 'OVESETUP_OVN/ovirtProviderOvnSecret'
2023-01-17 15:58:05,669+0000 DEBUG otopi.transaction transaction.abort:124 aborting 'DNF Transaction'
[...]

I also have a problem with Keycloak: it seems to be enabled even though I disable it, and the same goes for OVN. I tried to disable OVN as well, and on the engine under /etc/ovirt-engine-setup.conf.d/20-setup-ovirt-post.conf I have:

[...]
20-setup-ovirt-post.conf:OVESETUP_OVN/ovirtProviderOvn=bool:True
20-setup-ovirt-post.conf:OVESETUP_OVN/ovirtProviderOvnId=str:e6b92384-b112-40e0-8d6f-2c6e4536cd1a
[...]

but no ovirtProviderOvnSecret appears anywhere. By the way, I don't use Keycloak (I don't have any Keycloak tables in the engine dump), and I don't use OVN either. I will try to set OVESETUP_OVN/ovirtProviderOvn from bool:True to False before Ansible launches engine-setup.
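Concretely, something like this (a sketch; I have not verified that forcing the OVN key this way survives the restore path):

  # Same deploy command, additionally forcing the OVN provider off so that
  # engine-setup should never reach the OVN configuration stage.
  # (The otopi environment string is a space-separated list of
  # key=type:value entries.)
  hosted-engine --deploy --4 \
    --restore-from-file=/root/deploy_hosted_engine_230117/230117-scopeall-backup.tar.gz \
    --config-append=/root/deploy_hosted_engine_230117/hosted_engine_deploy.answer.conf \
    --ansible-extra-vars=he_pause_before_engine_setup=true \
    --otopi-environment="OVESETUP_CONFIG/keycloakEnable=bool:False OVESETUP_OVN/ovirtProviderOvn=bool:False"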

Hi,
The problem(s) don't seem to be related to hosted-engine storage in any way. As you don't use OVN, I would make sure to remove any parameters in the answer files that mention 'ovn' or anything about external providers (and let engine-setup prompt for those values interactively; see the P.S. below), and make sure that none of the networks in the engine GUI has the 'external network' checkmark with OVN as provider (but yes, I see that you explicitly said you don't have OVN in your dump...).

For Keycloak you are probably seeing the change described in https://www.mail-archive.com/users@ovirt.org/msg70682.html, and thus it is indeed likely that you need to use OVESETUP_CONFIG/keycloakEnable=bool:False.

All in all, there are three primary things to think of: the backup, the answers file, and the oVirt version (and of course things like OS/networking aspects...). You might narrow down the troubleshooting by using a test host and then deploying without the backup, with an adapted backup, or without the answers file.

BR,
Konstantin
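P.S. Cleaning the answer file could look roughly like this (a sketch; I'm assuming the file name from your command line):

  cd /root/deploy_hosted_engine_230117
  cp hosted_engine_deploy.answer.conf hosted_engine_deploy.answer.conf.bak

  # Drop every OVN-related key so engine-setup prompts for those values
  grep -iv 'ovn' hosted_engine_deploy.answer.conf.bak > hosted_engine_deploy.answer.conf

  # Verify nothing OVN-related is left
  grep -i 'ovn' hosted_engine_deploy.answer.conf || echo "no OVN keys remain"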

Thank you very much. I think the process is very overcomplicated... I successfully set up the engine by installing a fresh engine and then restoring the backup... but then, when I tried to register the new storage, everything went wrong. There should be some shortcuts that let you recover from problems in some manner... it's not acceptable that one error in Ansible forces you to restart from the beginning. It's frustrating: megabytes and megabytes of unreadable logs... why these overcomplications?

Currently I'm at the point where I have restored the engine on local storage, but I need to copy it to the destination storage... and Ansible prefers to exit with an error... frustrating.

This is an extract of /var/log/ovirt-engine/engine.log:

[...] '5b474fe5-b354-4afd-b555-082dd9820274'
2023-01-18 19:19:25,366Z INFO [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engine-Thread-64) [fc89774] BaseAsyncTask::removeTaskFromDB: Removed task 'a700b2be-e607-4b29-a8cc-33a6534b120b' from DataBase
2023-01-18 19:19:25,366Z INFO [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engine-Thread-64) [fc89774] CommandAsyncTask::HandleEndActionResult [within thread]: Removing CommandMultiAsyncTasks object for entity 'd44d1f8d-e597-4e63-bc5c-4d1d4a607a0d'
2023-01-18 19:19:25,790Z ERROR [org.ovirt.engine.core.sso.service.SsoService] (default task-1) [] OAuthException access_denied: Cannot authenticate user null.
2023-01-18 19:19:26,940Z INFO [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Connecting to ovirt-node3.ovirt/192.168.123.13
2023-01-18 19:19:26,940Z INFO [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Connected to ovirt-node3.ovirt/192.168.123.13:54321
2023-01-18 19:19:26,941Z ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-34) [] Unable to RefreshCapabilities: ConnectException: Connection refused
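To dig through these logs I narrow them down like this (a sketch; nothing oVirt-specific, just filtering):

  # Only errors/warnings from the engine log, most recent last
  grep -E ' (ERROR|WARN) ' /var/log/ovirt-engine/engine.log | tail -n 50

  # Same idea for the hosted-engine deploy logs on the host
  # (Ansible failures are marked 'fatal:')
  grep -E 'ERROR|fatal:' /var/log/ovirt-hosted-engine-setup/*.log | tail -n 50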

I finished the process... but I think there must be a global revision of the architecture... too intricate and definitely not consumer-ready. This is what I did:

-1. Context: in my environment I have a Pacemaker cluster between the nodes (to work around Gluster I tried to implement an HA NFS as well). One node is my "software and helping repository" (of course using VDO to compress and deduplicate), so it has a cluster of 3 SATA 10TB disks to have some "space", published with Linux iSCSI, all mapped on the 10Gb/s management interface. The VM VLANs are on the same physical interface, and so is the iSCSI initiator for an external storage; so, as every node has to have its own address, the "datacenter" complains that the network isn't synced... (!)

0. Moved a subset of the virtual machines to a node that would temporarily become an orphan of the old cluster.

1. Put the cluster in global maintenance and stopped the hosted-engine.

2. On the node (free of VMs) where I wanted to deploy the new engine, I issued ovirt-hosted-engine-cleanup (as stated in https://access.redhat.com/solutions/6529691) and verified with "sanlock client status" that the storages were left unlocked.

3. Deployed the new hosted-engine (hosted-engine --deploy), time-consuming, on the new external NFS storage.

4. Logged into the new engine and defined all the networks manually.

5. On the other nodes, stopped the virtual machines, released the sanlock domains, and manually unmounted everything mounted under /rhev/ (to stop machines, virsh is your friend; see the sketch after this list):

     virsh -c qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf

   To check which storage belongs to which VM, the commands are:

     list                        - see which VMs are running or paused
     domblklist --domain <name>  - see where in the local filesystem the virtual qemu disks are mapped
     shutdown <name> --mode acpi - stop the VM gracefully
     destroy <name>              - stop a "hung" (!) VM forcefully

6. On every surviving orphan node, release the domain you want to import (the sanlock command is your friend, but I didn't find a command to unlock only one storage at a time, so I simply stopped the daemon with "sanlock client shutdown -f 1") and then unmount the mountpoint: umount /rhev/[....]

7. This is the "magic": in the new engine under Storage > Domains you can select "Import Domain" and define the domain you want to import. oVirt recognizes it as an initialized domain and warns you that you may lose data... fingers crossed, acknowledge and go on.

8. Selecting the domain, you find new tabs: "{VM|Template|Disk} Import". There you can import the objects; follow the instructions from oVirt and address the network warnings (probably the MAC addresses of the old VMs are outside the new interval).

Hints and notes: at this point your -basic- cluster will be up and running again. Obviously I didn't address the restore of the other elements of the oVirt environment... probably someone could write a script that, starting from an engine backup, selectively restores the objects (without Ansible, please). VMs with pending snapshots are probably not imported; you have to commit the snapshot disk under the domain with "qemu-img commit <file of the snapshot>", then select "Scan Disks" under the storage domain page and continue the import. This method gives "a sort of" control, while the method that should work under normal circumstances gives you only frustration and service downtime.

Hope this note is useful to someone and is welcomed by the developers who try to keep things consistent.
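For steps 5 and 6, the per-node sequence looked roughly like this (a sketch; 'myvm' is a placeholder name, adapt names and paths to your setup):

  VIRSH='virsh -c qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf'

  $VIRSH list --all                  # which VMs are running or paused
  $VIRSH domblklist --domain myvm    # where the qemu disks are mapped
  $VIRSH shutdown myvm --mode acpi   # graceful stop
  # $VIRSH destroy myvm              # only if the VM hangs

  sanlock client status              # which lockspaces are still held
  sanlock client shutdown -f 1       # brute force: stops the whole daemon

  # Unmount everything still mounted under /rhev/
  for m in $(grep ' /rhev/' /proc/mounts | awk '{print $2}'); do
    umount "$m"
  done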
Thank you for your work. Diego