oVirt 4.4 hosted engine deploy fails - repository issues

Hi,

I want to upgrade oVirt 4.3 to oVirt 4.4, so I have to reinstall one node on EL8 and deploy the engine with restore. I get this error message during deploy:

[ INFO ] TASK [ovirt.ovirt.engine_setup : Install oVirt Engine package]
[ ERROR ] fatal: [localhost -> 192.168.2.143]: FAILED! => {"changed": false, "msg": "Failed to download metadata for repo 'ovirt-4.4-centos-ceph-pacific': Cannot prepare internal mirrorlist: Curl error (56): Failure when receiving data from the peer for http://mirrorlist.centos.org/?release=8-stream&arch=x86_64&repo=storage-ceph-pacific [Recv failure: Connection reset by peer]", "rc": 1, "results": []}

Since I use our Satellite server, this URL is not included in the repositories I provided. A repository named 'ovirt-4.4-centos-ceph-pacific' is definitely provided and available. How do I get the deploy to use the correct repositories?

I hope someone can help me out,
best regards

Hi!

On Tue, Feb 14, 2023 at 12:40 PM <lars.stolpe@bvg.de> wrote:
Hi, I want to upgrade oVirt 4.3 to oVirt 4.4, so I have to reinstall one node on EL8 and deploy the engine with restore.
It's not completely clear what you are trying to do. I suppose you refer to hosted-engine deployment with --restore-from-file.
I get this error message during deploy:

[ INFO ] TASK [ovirt.ovirt.engine_setup : Install oVirt Engine package]
[ ERROR ] fatal: [localhost -> 192.168.2.143]: FAILED! => {"changed": false, "msg": "Failed to download metadata for repo 'ovirt-4.4-centos-ceph-pacific': Cannot prepare internal mirrorlist: Curl error (56): Failure when receiving data from the peer for http://mirrorlist.centos.org/?release=8-stream&arch=x86_64&repo=storage-ceph-pacific [Recv failure: Connection reset by peer]", "rc": 1, "results": []}
Since I use our Satellite server, this URL is not included in the repositories I provided. A repository named 'ovirt-4.4-centos-ceph-pacific' is definitely provided and available. How do I get the deploy to use the correct repositories?
IIUC this code runs inside the engine VM, not the host, so your customizations on the host do not apply there. The appliance image used for creating the engine VM includes a suitable ovirt-release package, which also includes the above failing repo. Does the engine VM (via the host it's running on) have direct access to the Internet? Or do you force all communication to go through your Satellite proxy?

Anyway, some ways you can try to overcome the situation:

- Run it offline (--ansible-extra-vars=he_offline_deployment=true).
- Provide a custom enginevm_before_engine_setup hook to patch the configuration. See e.g. how this is done in ovirt-system-tests [1].
- Pass (similarly to above) he_pause_before_engine_setup=true. Should work since 4.4.7, https://bugzilla.redhat.com/show_bug.cgi?id=1959273 .
- See also [2].

[1] https://github.com/oVirt/ovirt-system-tests Search the code for 'proxy', 'socks', etc.
[2] https://github.com/didib/ovirt-ansible-collection/tree/master/roles/hosted_e...
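[Editor's note: for concreteness, a minimal sketch of how these variables combine on the hosted-engine command line; the backup file path is a placeholder.]

hosted-engine --deploy \
  --restore-from-file=/root/backup.bck \
  --ansible-extra-vars=he_offline_deployment=true \
  --ansible-extra-vars=he_pause_before_engine_setup=true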
I hope someone can help me out,
Good luck and best regards, -- Didi

Hi, thank you for your answer; your hints are helpful.
Hi!
On Tue, Feb 14, 2023 at 12:40 PM <lars.stolpe@bvg.de> wrote:
It's not completely clear what you are trying to do. I suppose you refer to hosted-engine deployment with --restore-from-file.
I am doing an upgrade from 4.3 to 4.4 according to the official oVirt upgrade guide. The deploy command is as follows:

hosted-engine --deploy --restore-from-file=/root/backup_ovirtman13.bck
IIUC this code runs inside the engine VM, not the host, so your customizations on the host do not apply there. The appliance image used for creating the engine VM includes a suitable ovirt-release package, which also includes the above failing repo. Does the engine VM (via the host it's running on) have direct access to the Internet? Or do you force all communication to go through your Satellite proxy?
The hosts do not have direct access to the Internet. The new engine has, by default, no access to the MAN either. The 4.3 appliance did not need access for installation - why would one change that...
Anyway, some ways you can try to overcome the situation:

- Run it offline (--ansible-extra-vars=he_offline_deployment=true).
- Provide a custom enginevm_before_engine_setup hook to patch the configuration. See e.g. how this is done in ovirt-system-tests [1].
- Pass (similarly to above) he_pause_before_engine_setup=true. Should work since 4.4.7, https://bugzilla.redhat.com/show_bug.cgi?id=1959273 .
What is "offline deployment" doing differently - are the necessary packages transferred internally?
- See also [2]
[1] https://github.com/oVirt/ovirt-system-tests Search the code for 'proxy', 'socks', etc. [2] https://github.com/didib/ovirt-ansible-collection/tree/master/roles/hoste...
Good luck and best regards,
Can the management bridge be switched to another interface after installation? That opens up some easier ways to enable access to repositories. I know that in 4.3 it is not possible to do that.

...I miss the option to install the engine during deploy from an OS .iso file ;)

At least that pause parameter will give me the chance to fix the repositories.

best regards

On Tue, Feb 14, 2023 at 1:36 PM <lars.stolpe@bvg.de> wrote:
Hi, thank you for your answer; your hints are helpful.
Hi!
On Tue, Feb 14, 2023 at 12:40 PM <lars.stolpe@bvg.de> wrote:
It's not completely clear what you are trying to do. I suppose you refer to hosted-engine deployment with --restore-from-file.
I am doing an upgrade from 4.3 to 4.4 according to the official oVirt upgrade guide. The deploy command is as follows:

hosted-engine --deploy --restore-from-file=/root/backup_ovirtman13.bck
OK. That's a hosted-engine upgrade. A standalone engine is somewhat different.
IIUC this code runs inside the engine VM, not the host, so your customizations on the host do not apply there. The appliance image used for creating the engine VM includes a suitable ovirt-release package, which also includes the above failing repo. Does the engine VM (via the host it's running on) have direct access to the Internet? Or do you force all communication to go through your Satellite proxy?
The hosts do not have direct access to the Internet. The new engine has, by default, no access to the MAN either. The 4.3 appliance did not need access for installation - why would one change that...
Many software packages upgrade themselves during installation; I think that's more-or-less the norm these days, no? A concrete reason: if your current engine is newer than the one included in the appliance, engine-backup inside the appliance will refuse to restore your backup. Upgrading to the latest before running engine-backup mitigates this issue.
Anyway, some ways you can try to overcome the situation:

- Run it offline (--ansible-extra-vars=he_offline_deployment=true).
- Provide a custom enginevm_before_engine_setup hook to patch the configuration. See e.g. how this is done in ovirt-system-tests [1].
- Pass (similarly to above) he_pause_before_engine_setup=true. Should work since 4.4.7, https://bugzilla.redhat.com/show_bug.cgi?id=1959273 .
What is "offline deployment" doing differently - are the necessary packages transferred internally?
In this context, it means it's "offline" - it does not require Internet access. In practice it means that whatever is included in the appliance is going to be used for setup/deployment, without updating. You are welcome to search the ovirt-engine-appliance source code for he_offline_deployment to see the actual details - there are not that many.
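[Editor's note: for example, a quick way to locate those details in local checkouts; the paths are placeholders.]

# search checked-out sources for the variable and its handling
grep -rn he_offline_deployment ~/src/ovirt-engine-appliance ~/src/ovirt-ansible-collection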
- See also [2]
[1] https://github.com/oVirt/ovirt-system-tests Search the code for 'proxy', 'socks', etc. [2] https://github.com/didib/ovirt-ansible-collection/tree/master/roles/hoste...
Good luck and best regards,
Can the management bridge be switched to another interface after installation? That opens up some easier ways to enable access to repositories.
The point where your deployment failed is before using the management bridge configured by the engine - it's more-or-less the default 'default' libvirt network.
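[Editor's note: if it helps to verify this, standard libvirt tooling on the host shows what the bootstrap VM is attached to at that stage; 'default' is the stock NAT network name, and this is a sketch, not part of the original reply.]

# run as root on the host
virsh net-list --all
virsh net-dumpxml default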
I know that in 4.3 it is not possible to do that.
...I miss the option to install the engine during deploy from an OS .iso file ;)
At the time I also felt bad about this. Now, I can easily say it was worth it. IMO, the number of issues/problems that people ran into greatly diminished after we introduced the appliance and removed the option to do this manually.
At least that pause parameter will give me the chance to fix the repositories.
Good luck, -- Didi

The backed-up system is the latest 4.3, and the install sources of 4.4 are also the latest available.

I used "offline" and "pause", set up squid on the deploy host, and modified the repository files in the fresh temporary engine VM to point to our Satellite. It worked fine up to the point where the deploy ignored the "offline" parameter and updated the repository package, overwriting my changes... and then it aborted due to an unreachable repository.

Maybe I could put the repository package on the ignore list.

Providing my own OVA may be a solution: install a VM, run engine-setup to install all needed packages, export it as an OVA, provide it to the deploy process, and let it refill the engine with my backup file. Is it as simple as that?

I need to upgrade to 4.5 soon anyway; a direct upgrade from 4.3 to 4.5 is not possible, I assume?

best regards

On Wed, Feb 15, 2023 at 12:41 PM <lars.stolpe@bvg.de> wrote:
The backed up system is the latest 4.3 and the install sources of 4.4 are also the latest available.
I used "offline" and "pause", set up squid on the deploy host, and modified the repository files in the fresh temporary engine VM to point to our Satellite. It worked fine up to the point where the deploy ignored the "offline" parameter and updated the repository package, overwriting my changes...
Sorry, I do not remember where this is done. Did you check the logs?
...and then it aborted due to an unreachable repository.
Maybe I could put the repository package on the ignore list.
Maybe
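[Editor's note: a minimal sketch of that ignore-list idea, using dnf's standard exclude option in [main]; the package name is the release package Lars identifies later in the thread, so treat it as an assumption at this point.]

# on the engine VM: stop dnf from pulling the repo-overwriting release package
echo 'exclude=ovirt-release44' >> /etc/dnf/dnf.conf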
Providing my own OVA may be a solution: install a VM, run engine-setup to install all needed packages, export it as an OVA, provide it to the deploy process, and let it refill the engine with my backup file. Is it as simple as that?
More-or-less yes, in principle. See also https://github.com/oVirt/ovirt-appliance/ .
I need to upgrade to 4.5 soon anyway; a direct upgrade from 4.3 to 4.5 is not possible, I assume?
Sorry, I do not remember either... A quick search finds:

https://bugzilla.redhat.com/show_bug.cgi?id=2087735
https://bugzilla.redhat.com/show_bug.cgi?id=2072881
https://github.com/oVirt/ovirt-engine/pull/244

So it might work.

Good luck and best regards, -- Didi

Well, that's interesting... The deploy abort did not destroy the locally running VM as usual. So I disabled CentOS-Stream-PowerTools.repo again (it was re-enabled by the deploy) and provided my modified oVirt repositories. I ran engine-setup on the running local VM; setup checked for updates, found nothing to be done, and then ran fine through the complete setup. Now I have a local engine VM running fine, the hosts are recognized as "up", and VMs and storage domains are fine as well.

Is there a way to make the deploy jump to that stage and resume? Or: can I do the engine-setup (with restore or without) myself after providing my modified repositories? If the deploy finds nothing to be updated, all should run afterwards?

Since this is the test for upgrading production, I should not depend on "I hope the local VM is still running after the deploy aborts".

All I could find in the engine-setup log is that the repository could not be reached.

best regards

On Wed, Feb 15, 2023 at 2:16 PM <lars.stolpe@bvg.de> wrote:
Well, that's interesting... The deploy abort did not destroy the locally running VM as usual.
When was it usual to destroy it? I think it remains up since at least 4.3, perhaps much longer.
So I disabled CentOS-Stream-PowerTools.repo again (it was re-enabled by the deploy),
Did you try to check where/when/what did this? Perhaps look on the host in /var/log/ovirt-hosted-engine-setup, not on the engine VM itself.
and provided my modified oVirt repositories.
I ran engine-setup on the running local VM; setup checked for updates,
engine-setup also has an option '--offline'. Running HE deploy with offline should also use this.
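[Editor's note: so after fixing the repositories by hand, a manual run inside the engine VM would be simply the following; the flag is the one named above.]

# inside the engine VM; --offline skips the package-update checks
engine-setup --offline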
nothing to be done, and then it ran fine through the complete setup. Now I have a local engine VM running fine, the hosts are recognized as "up", and VMs and storage domains are fine as well.
Good!
Is there a way to make the deploy jump to that stage and resume?
Definitely not easily. If you just want to try this as a learning game, you can try. If you want to automate this, or plan for production, I'd use a different approach.
Or: can I do the engine-setup (with restore or without) myself after providing my modified repositories?
There is no way to skip engine-setup and let you run it manually. You can provide before/after hooks.
If the deploy finds nothing to be updated, all should run afterwards?
In principle yes, and this might be a good approach - replace all repos with yours (e.g. in a before hook), and make sure yours do not include a release package that will overwrite your repos.
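[Editor's note: a rough sketch of such a before hook, assuming the hook drop-in directory used by the hosted_engine_setup role; the path, file name, and repo contents below are illustrative assumptions, not a tested recipe.]

# on the host, before running the deploy (directory path is an assumption)
HOOKS=/usr/share/ansible/collections/ansible_collections/ovirt/ovirt/roles/hosted_engine_setup/hooks
cat > "$HOOKS/enginevm_before_engine_setup/replace_repos.yml" <<'EOF'
# tasks run on the engine VM before engine-setup: swap shipped repos for ours
- name: Drop the shipped repo files
  shell: rm -f /etc/yum.repos.d/*.repo
- name: Install a Satellite-backed repo file (URL is a placeholder)
  copy:
    dest: /etc/yum.repos.d/satellite.repo
    content: |
      [ovirt-4.4-satellite]
      name=oVirt 4.4 via Satellite
      baseurl=http://satellite.example.com/ovirt-4.4/
      gpgcheck=0
EOF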
Since this is the test for upgrading production, I should not depend on "I hope the local VM is still running after the deploy aborts".
Agreed, in principle.
All I could find in the engine-setup log is that the repository could not be reached.
Good luck and best regards, -- Didi

On Wed, Feb 15, 2023 at 2:16 PM <lars.stolpe@bvg.de> wrote:
When was it usual to destroy it? I think it remains up since at least 4.3, perhaps much longer.
While I was moving the hosted storage to a new SAN (reinstall with restore, on 4.3), the local VM was shut down several times before I could take action.
Did you try to check where/when/what did this? Perhaps look on the host in /var/log/ovirt-hosted-engine-setup, not on the engine VM itself.
engine-setup also has an option '--offline'. Running HE deploy with offline should also use this.
Good!
Definitely not easily. If you just want to try this as a learning game, you can try. If you want to automate this, or plan for production, I'd use a different approach.
There is no way to skip engine-setup and let you run it manually. You can provide before/after hooks.
In principle yes, and this might be a good approach - replace all repos with yours (e.g. in a before hook), make sure yours do not include a release package that will overwrite your repos.
Agreed, in principle.
Good luck and best regards,
Almost... The deploy went well, until this host was added. There is a bug that prevents vdsm from reading the local configuration. I used RHEL 8.5 and did not update, because the latest 7.9 update had made oVirt 4.2 inoperable, so I decided to use the version mentioned in the documentation. Update vdsm and back to square one. I'm confident it will work afterwards.

best regards

On Thu, Feb 16, 2023 at 12:19 PM <lars.stolpe@bvg.de> wrote:
Almost... The deploy went well, until this host was added. There is a bug that prevents vdsm from reading the local configuration. I used RHEL 8.5 and did not update, because the latest 7.9 update had made oVirt 4.2 inoperable, so I decided to use the version mentioned in the documentation.
8.5? Where? RHV officially supports only RHEL 8.6, AFAIR. oVirt is tested regularly on Stream, so should most likely work on 8.7, or 8.8 when available.
Update vdsm and back to square 1
Not sure what you mean. vdsm is on the host, not on the engine VM. -- Didi

On Thu, Feb 16, 2023 at 12:19 PM <lars.stolpe@bvg.de> wrote:
8.5? Where?
RHV officially supports only RHEL 8.6, AFAIR.
oVirt is tested regularly on Stream, so should most likely work on 8.7, or 8.8 when available.
Not sure what you mean. vdsm is on the host, not on the engine VM.

The engine could not determine the host's capabilities; some research revealed an already-solved bug.
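[Editor's note: to see what the host side actually reports, one can use the same getCapabilities call that appears in the final write-up below; the grep patterns are examples, not guaranteed field names.]

# on the host: confirm the vdsm version and what it reports for the CPU
rpm -q vdsm
vdsm-client Host getCapabilities | grep -i -E 'cpuModel|cpuFlags'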
Now I get "500 - Internal Server Error" during "Check if Engine health page is up". I'm getting grey hair...

https://lists.ovirt.org/archives/list/devel@ovirt.org/message/SBCWNXLFLJBKTA...

That internal server error was caused by updating to the latest available updates xD

Next issue: CPU and microcode... In oVirt 4.3, Broadwell and Skylake are supported. In EL8, microcode updates disable the processor compatibility. This host cannot be activated because CPU requirements are not met. I hope I can downgrade the microcode package far enough to re-enable CPU support.

The deploy finished successfully now.

Task: upgrade oVirt 4.3 to oVirt 4.4.
Boundary conditions: access to repositories via a repository server only (RH Satellite); encapsulated management network (no connection outside this segment).

Short steps I have done, starting with an already installed host:

1. # dnf downgrade microcode_ctl
until
# vdsm-client Host getCapabilities | grep -i flags
shows a matching CPU type (the same as in the oVirt cluster configuration).

2. Install and start squid on the deploy host; no configuration necessary, since 192.168.0.0/16 is already allowed.

3. # hosted-engine --deploy --restore-from-file=/root/backup_ovirtman.bck --ansible-extra-vars=he_offline_deployment=true --ansible-extra-vars=he_pause_before_engine_setup=true
gateway: <any pingable device within the management network>
check method: ping
Pause after adding the host? (Yes, No)[No]: yes (don't miss that "yes")

4. In the local VM (a shell sketch of these edits follows after the list):
- modify repositories as needed to match the repo server
- add exclude=postgresql-jdbc to CentOS-Stream-AppStream.repo
- add exclude=ovirt-release44 to ovirt-4.4.repo
- add a proxy entry to /etc/dnf/dnf.conf (proxy=http://<deploy host IP>:3128/)
- add hostnames to /etc/hosts as needed

5. # dnf upgrade --nobest

6. Check that all repos are still working; make corrections if needed. Check postgresql-jdbc:
# dnf list postgresql-jdbc
must not be 42.2.14*, should be 42.2.3-3.el8_2

7. Delete the lock file.

8. At the next stop, start Firefox on the host and log in to the engine. Set up the host network as required; go on only if all hosts and all storage domains are up. After saving the network configuration the host routing may be broken; check on the host:
# cat /proc/sys/net/ipv4/ip_forward
If it prints 0, do:
# echo 1 > /proc/sys/net/ipv4/ip_forward
All hosts and domains should be up again.

9. Delete the lock file.

10. Choose hosted_storage and complete the deploy.

Thanks to Didi for the hints and the patience!
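[Editor's note: a minimal shell sketch of the step-4 edits, run inside the local engine VM. The IPs and hostnames are placeholders, and the appended exclude= lines assume single-section .repo files - they must end up inside the right [section], so hand-editing may be safer.]

# keep dnf from replacing the pinned package and the repo files
echo 'exclude=postgresql-jdbc' >> /etc/yum.repos.d/CentOS-Stream-AppStream.repo
echo 'exclude=ovirt-release44' >> /etc/yum.repos.d/ovirt-4.4.repo
# route dnf through the squid instance on the deploy host (placeholder IP)
echo 'proxy=http://192.168.2.1:3128/' >> /etc/dnf/dnf.conf
# make the Satellite server resolvable (placeholder entry)
echo '192.168.2.50 satellite.example.com' >> /etc/hosts
dnf upgrade --nobest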

Instead of downgrading the microcode, I eventually enabled the needed CPU flag:

#> grubby --update-kernel=ALL --args="tsx=on"
#> grub2-mkconfig -o /etc/grub2.cfg
#> reboot

Reverse the changes after changing the default CPU type of the cluster: set the host in maintenance, then

#> grubby --update-kernel=ALL --remove-args="tsx=on"
#> grub2-mkconfig -o /etc/grub2.cfg
#> reboot
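[Editor's note: a quick post-reboot check that the flag took effect; hle and rtm are the CPU flags TSX exposes in /proc/cpuinfo, but exact availability varies by CPU, so treat them as an assumption.]

# confirm the kernel argument and look for TSX-related CPU flags
grep -o 'tsx=on' /proc/cmdline
grep -m1 -o -w -E 'hle|rtm' /proc/cpuinfo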

Hi,

Great job! Perhaps you'd like to post this somewhere more noticeable/findable? I'd love to say "E.g. on the oVirt blog", but I have absolutely no idea how that is updated. Adding Sandro...

Best regards,

On Thu, Mar 9, 2023 at 6:46 AM <lars.stolpe@bvg.de> wrote:
Instead of downgrading the microcode, I eventually enabled the needed CPU flag:

#> grubby --update-kernel=ALL --args="tsx=on"
#> grub2-mkconfig -o /etc/grub2.cfg
#> reboot

Reverse the changes after changing the default CPU type of the cluster: set the host in maintenance, then

#> grubby --update-kernel=ALL --remove-args="tsx=on"
#> grub2-mkconfig -o /etc/grub2.cfg
#> reboot
-- Didi

On Thu, Mar 9, 2023 at 7:43 AM Yedidyah Bar David <didi@redhat.com> wrote:
Hi,
Great job!
Perhaps you'd like to post this somewhere more noticeable/findable?
I'd love to say "E.g. on the oVirt blog", but I have absolutely no idea how that is updated. Adding Sandro...
For the oVirt blog we need to create an account for Lars so he can publish a blog post. An alternative is to add the procedure to the upgrade guide, I guess.
Best regards,
On Thu, Mar 9, 2023 at 6:46 AM <lars.stolpe@bvg.de> wrote:
Instead of downgrading the microcode, I eventually enabled the needed CPU flag:

#> grubby --update-kernel=ALL --args="tsx=on"
#> grub2-mkconfig -o /etc/grub2.cfg
#> reboot

Reverse the changes after changing the default CPU type of the cluster: set the host in maintenance, then

#> grubby --update-kernel=ALL --remove-args="tsx=on"
#> grub2-mkconfig -o /etc/grub2.cfg
#> reboot
-- Didi
--
Sandro Bonazzola
MANAGER, SOFTWARE ENGINEERING - Red Hat In-Vehicle Operating System
Red Hat EMEA <https://www.redhat.com/>
sbonazzo@redhat.com

Red Hat respects your work life balance. Therefore there is no need to answer this email out of your office hours.