Botched 3.6 -> 4.0/4.1/4.2 upgrade, how to recover

Hello, an old 3.6 self-hosted one-system installation was half upgraded to 4.0 and I took on the task to continue upgrading it. I managed to set up a new engine and upgraded step by step until 4.2, where the host needs to be upgraded. During that time the host was "non-responsive", but I thought that this had to do with the upgrade process. Upon trying to upgrade the host, the setup broke to pieces. I tried to downgrade everything and restart the old engine, to no avail.

I am now trying the way forward: start the new engine, or create a completely new one. I don't care much about the old engine's inherited information - all I need is to resurrect the VMs again. At the end of the day the setup is supposed to migrate to a three-system hyperconverged setup anyway.

How should I proceed to get to a working state that fires up the VMs? Is it safe to install a hosted engine from scratch and reattach the domain? Probably the VM configuration will be lost and I will have to puzzle together the disks. I still have the 3.6 engine's backup file, if that is of any help. Should I perhaps recreate a 4.0 engine with that file and try to continue from there? I guess all information on which VMs are attached to which images etc. is in the engine's DB, so I either have to get this info off the 3.6 backup or wire them up again by inspection.

BTW, this is not pure oVirt but RHV with a self-supported evaluation; however, I believe that any pure oVirt solution will apply. Many thanks in advance.
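For reference, if the restore route is taken, an engine-backup restore generally has this shape (a sketch only - exact options vary between versions, a backup can only be restored onto an engine of the matching version, and the file/log names below are placeholders):

    # On a freshly installed engine of the matching version, restore the
    # database and configuration from the backup file, then re-run setup:
    engine-backup --mode=restore \
        --file=engine-3.6.bck \
        --log=engine-restore.log \
        --provision-db \
        --restore-permissions
    engine-setup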

If you can: restart your setup from scratch with at least 3 hosts. Really, if you don't face harsh consequences for doing so - recovering this just isn't worth the hassle and pain. Use at least 3 hosts because a node can become corrupt for many reasons, natural causes or self-inflicted ones, and a corrupt node can only recover through re-installation. That re-installation is triggered by the engine, with the VMs safe on the other nodes. On a single-node system, how could this possibly work, with only the corrupted node at hand? If you really can only stick to a one-host setup, you could try Proxmox; they have a community edition. For multi-host setups choose oVirt. It is just great.

Hi, update: I managed to get the old engine running (3.6), but it is not communicating with the host. Do I need to downgrade the host and/or somehow re-enroll it to the engine?

On Tue, May 14, 2019 at 9:20 AM <axel.thimm@01lgc.com> wrote:
Hi,
update: I managed to get the old engine running (3.6), but it is not communicating with the host. Do I need to downgrade the host and/or somehow re-enroll it to the engine?
I didn't understand the current state of your host, but if it's (partially? upgraded to) 4.2, then IIRC you need to downgrade. Also, please provide more details when you ask for help - what happens when you try to activate it (if the UI allows that at all)? What errors do you get in the UI and in the logs (engine and vdsm)? etc. Thanks and best regards, -- Didi
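(For reference, the logs referred to here live in the standard locations on each machine:)

    # On the engine machine:
    tail -f /var/log/ovirt-engine/engine.log
    # On the host:
    tail -f /var/log/vdsm/vdsm.log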

Hi, yes, the host seems to have been updated:

vdsm-api-4.30.13-1.el7ev.noarch
vdsm-http-4.30.13-1.el7ev.noarch
vdsm-hook-vmfex-dev-4.30.13-1.el7ev.noarch
vdsm-network-4.30.13-1.el7ev.x86_64
vdsm-common-4.30.13-1.el7ev.noarch
vdsm-yajsonrpc-4.30.13-1.el7ev.noarch
vdsm-jsonrpc-4.30.13-1.el7ev.noarch
vdsm-hook-ethtool-options-4.30.13-1.el7ev.noarch
vdsm-hook-openstacknet-4.30.13-1.el7ev.noarch
vdsm-hook-vhostmd-4.30.13-1.el7ev.noarch
vdsm-client-4.30.13-1.el7ev.noarch
vdsm-hook-fcoe-4.30.13-1.el7ev.noarch
vdsm-python-4.30.13-1.el7ev.noarch
vdsm-4.30.13-1.el7ev.x86_64

I see some communication happening between the engine and the host's vdsmd on port 54321, but it is encrypted. It does break off immediately, so I wonder if this is an SSL issue. I already downgraded gnutls due to https://bugzilla.redhat.com/show_bug.cgi?id=1648190

The engine just complains about communication errors, e.g.:

2019-05-14 09:30:56,275 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler_Worker-83) [] Command 'GetCapabilitiesVDSCommand(HostName = hetzner-XXXXXX, VdsIdAndVdsVDSCommandParametersBase:{runAsync='true', hostId='2bc7bd41-aaa2-4973-95ab-402a0fe32daf', vds='Host[hetzner-XXXXXX,2bc7bd41-aaa2-4973-95ab-402a0fe32daf]'})' execution failed: VDSGenericException: VDSNetworkException: Message timeout which can be caused by communication issues
2019-05-14 09:30:56,275 ERROR [org.ovirt.engine.core.vdsbroker.HostMonitoring] (DefaultQuartzScheduler_Worker-83) [] Failure to refresh Vds runtime info: VDSGenericException: VDSNetworkException: Message timeout which can be caused by communication issues
2019-05-14 09:30:56,275 ERROR [org.ovirt.engine.core.vdsbroker.HostMonitoring] (DefaultQuartzScheduler_Worker-83) [] Exception: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: VDSGenericException: VDSNetworkException: Message timeout which can be caused by communication issues

Would trying

hosted-engine --upgrade-appliance

make any sense w/o restoring the communication on vdsm level?
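(One way to tell whether the immediate break-off is a TLS failure rather than a protocol mismatch is to probe the vdsm port by hand from the engine machine - a sketch, using the placeholder hostname from the logs and the engine CA's default path; vdsm normally expects a client certificate, so a clean handshake is not guaranteed, but the failure mode is often informative:)

    # Probe vdsm's TLS listener on port 54321 and watch the handshake:
    openssl s_client -connect hetzner-XXXXXX:54321 \
        -CAfile /etc/pki/ovirt-engine/ca.pem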

On Tue, May 14, 2019 at 11:10 AM <axel.thimm@01lgc.com> wrote:
Hi,
yes, the host seems to have been updated.
[...]
Would trying
hosted-engine --upgrade-appliance
make any sense w/o restoring the communication on vdsm level?
Might not make sense at all - --upgrade-appliance is only for el6/3.6 -> el7/4.0. Later upgrades you do inside the VM. I'd personally first try to fix the engine<->vdsm comm issue. How did you upgrade the host? yum update? Is it ovirt-node? If you are ok debugging/fixing yourself and only need the occasional tip, fine. Otherwise, please provide more details - exact versions of engine/vdsm, last version that worked, whether you can reinstall the host or need anything on it, etc. Best regards, -- Didi
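(To check what a downgrade would even have to work with, it helps to list every vdsm build the enabled repos still carry - a sketch; 4.17 was the vdsm series that shipped with 3.6, so that is the illustrative version glob assumed below:)

    # List all available vdsm versions, not just the latest:
    yum --showduplicates list vdsm
    # If a 3.6-era build is still reachable, a downgrade attempt would
    # look roughly like:
    yum downgrade "vdsm-4.17*" "vdsm-python-4.17*"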

I'd personally first try to fix the engine<->vdsm comm issue.
I fully agree. I might then take exports of all VMs and do a fresh start over. I'm not sure the upgrade instructions work, as the upgrade for 3.6->4.0 already activates much newer content. For example, the quoted --upgrade-appliance option does not even exist, probably because a later package removed it. One would have to manually versionlock the 4.0 hypervisor packages (see the sketch below).
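(If one did pin a host's packages, the versionlock plugin would be the usual tool - a sketch; this locks whatever versions are currently installed, so it would be run right after installing the 4.0-era packages:)

    # Install the plugin and lock the vdsm stack at its current version,
    # so a plain 'yum update' cannot pull in a newer minor release:
    yum install yum-plugin-versionlock
    yum versionlock add "vdsm*"
    # Inspect or remove the locks later:
    yum versionlock list
    yum versionlock clear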
How did you upgrade the host? yum update? Is it ovirt-node?
Currently the state is as follows: the engine is running on the 3.6 branch, all updates applied [1]. The hosting RHEL host is on the RHV 4 branch with all updates in, except the gnutls packages mentioned [2]. I managed to flag the host as in maintenance mode, so I could ask it to re-enroll the certs. I can see from the host's logs that the ssh connection works and the certs have been updated. But now I see in the host's vdsm log that the SSL connection is being dropped [3].
If you are ok debugging/fixing yourself and only need the occasional tip, fine. Otherwise, please provide more details - exact versions of engine/vdsm, last version that worked, whether you can reinstall the host or need anything on it, etc.
I'm infinitely grateful for any assistance! I didn't want to spam the list with my first mail. I hope I provided some better information now. Thanks!

In the long run I will decommission this host. I need the VMs on it to migrate to a fresh 4.3 three-host hyperconverged setup which is yet to be built. Actually I am abusing one of the three nodes at the moment as a backup system for this failed upgrade, so I'm blocked from going forward with that setup as well. ... :(

[1] [root@engine ~]# engine-upgrade-check
VERB: queue package rhevm-setup for update
VERB: Downloading: repomdtDvMe7tmp.xml (0%)
VERB: Downloading: repomdtDvMe7tmp.xml 3.4 k(100%)
VERB: Downloading: jb-eap-6-for-rhel-6-server-rpms/primary_db (0%)
VERB: Downloading: jb-eap-6-for-rhel-6-server-rpms/primary_db 1.2 M(100%)
VERB: Downloading: repomd__rFIUtmp.xml (0%)
VERB: Downloading: repomd__rFIUtmp.xml 3.4 k(100%)
VERB: Downloading: repomdXw2WPVtmp.xml (0%)
VERB: Downloading: repomdXw2WPVtmp.xml 4.0 k(100%)
VERB: Downloading: repomdF3AGBOtmp.xml (0%)
VERB: Downloading: repomdF3AGBOtmp.xml 3.5 k(100%)
VERB: Downloading: repomd9DvU9vtmp.xml (0%)
VERB: Downloading: repomd9DvU9vtmp.xml 3.4 k(100%)
VERB: Downloading: rhel-6-server-supplementary-rpms/primary_db (0%)
VERB: Downloading: rhel-6-server-supplementary-rpms/primary_db 851 k(100%)
VERB: Building transaction
VERB: Empty transaction
VERB: Transaction Summary: No upgrade

[2] # subscription-manager repos --list-enabled; yum check-update
+----------------------------------------------------------+
    Available Repositories in /etc/yum.repos.d/redhat.repo
+----------------------------------------------------------+
Repo ID:   rhel-7-server-ansible-2-rpms
Repo Name: Red Hat Ansible Engine 2 RPMs for Red Hat Enterprise Linux 7 Server
Repo URL:  https://cdn.redhat.com/content/dist/rhel/server/7/7Server/$basearch/ansible/2/os
Enabled:   1

Repo ID:   rhel-7-server-rhev-mgmt-agent-rpms
Repo Name: Red Hat Enterprise Virtualization Management Agents for RHEL 7 (RPMs)
Repo URL:  https://cdn.redhat.com/content/dist/rhel/server/7/$releasever/$basearch/rhev-mgmt-agent/3/os
Enabled:   1

Repo ID:   rhel-7-server-rhv-4-mgmt-agent-rpms
Repo Name: Red Hat Virtualization 4 Management Agents for RHEL 7 (RPMs)
Repo URL:  https://cdn.redhat.com/content/dist/rhel/server/7/$releasever/$basearch/rhv-mgmt-agent/4/os
Enabled:   1

Repo ID:   rhel-7-server-rpms
Repo Name: Red Hat Enterprise Linux 7 Server (RPMs)
Repo URL:  https://cdn.redhat.com/content/dist/rhel/server/7/$releasever/$basearch/os
Enabled:   1

Repo ID:   rhel-7-server-optional-rpms
Repo Name: Red Hat Enterprise Linux 7 Server - Optional (RPMs)
Repo URL:  https://cdn.redhat.com/content/dist/rhel/server/7/$releasever/$basearch/optional/os
Enabled:   1

Loaded plugins: enabled_repos_upload, package_upload, product-id, search-disabled-repos, subscription-manager, vdsmupgrade
gnutls.x86_64        3.3.29-9.el7_6    rhel-7-server-rpms
gnutls-dane.x86_64   3.3.29-9.el7_6    rhel-7-server-rpms
gnutls-utils.x86_64  3.3.29-9.el7_6    rhel-7-server-rpms
Uploading Enabled Repositories Report
Loaded plugins: product-id, subscription-manager
Unable to upload Enabled Repositories Report

[3] May 14 10:57:18 hetzner-XXXXXX systemd[1]: Starting Virtual Desktop Server Manager...
May 14 10:57:18 hetzner-XXXXXX vdsmd_init_common.sh[8177]: vdsm: Running mkdirs
May 14 10:57:18 hetzner-XXXXXX vdsmd_init_common.sh[8177]: vdsm: Running configure_coredump
May 14 10:57:18 hetzner-XXXXXX vdsmd_init_common.sh[8177]: vdsm: Running configure_vdsm_logs
May 14 10:57:18 hetzner-XXXXXX vdsmd_init_common.sh[8177]: vdsm: Running wait_for_network
May 14 10:57:18 hetzner-XXXXXX vdsmd_init_common.sh[8177]: vdsm: Running run_init_hooks
May 14 10:57:19 hetzner-XXXXXX vdsmd_init_common.sh[8177]: vdsm: Running check_is_configured
May 14 10:57:19 hetzner-XXXXXX sasldblistusers2[8203]: DIGEST-MD5 common mech free
May 14 10:57:20 hetzner-XXXXXX vdsmd_init_common.sh[8177]: abrt is already configured for vdsm
May 14 10:57:20 hetzner-XXXXXX vdsmd_init_common.sh[8177]: Managed volume database is already configured
May 14 10:57:20 hetzner-XXXXXX vdsmd_init_common.sh[8177]: lvm is configured for vdsm
May 14 10:57:20 hetzner-XXXXXX vdsmd_init_common.sh[8177]: libvirt is already configured for vdsm
May 14 10:57:20 hetzner-XXXXXX vdsmd_init_common.sh[8177]: Current revision of multipath.conf detected, preserving
May 14 10:57:20 hetzner-XXXXXX vdsmd_init_common.sh[8177]: vdsm: Running validate_configuration
May 14 10:57:20 hetzner-XXXXXX vdsmd_init_common.sh[8177]: SUCCESS: ssl configured to true. No conflicts
May 14 10:57:20 hetzner-XXXXXX vdsmd_init_common.sh[8177]: vdsm: Running prepare_transient_repository
May 14 10:57:21 hetzner-XXXXXX vdsmd_init_common.sh[8177]: vdsm: Running syslog_available
May 14 10:57:21 hetzner-XXXXXX vdsmd_init_common.sh[8177]: vdsm: Running nwfilter
May 14 10:57:21 hetzner-XXXXXX vdsmd_init_common.sh[8177]: vdsm: Running dummybr
May 14 10:57:21 hetzner-XXXXXX vdsmd_init_common.sh[8177]: vdsm: Running tune_system
May 14 10:57:21 hetzner-XXXXXX vdsmd_init_common.sh[8177]: vdsm: Running test_space
May 14 10:57:21 hetzner-XXXXXX vdsmd_init_common.sh[8177]: vdsm: Running test_lo
May 14 10:57:21 hetzner-XXXXXX systemd[1]: Started Virtual Desktop Server Manager.
May 14 10:57:22 hetzner-XXXXXX vdsm[8252]: WARN unhandled write event
May 14 10:57:22 hetzner-XXXXXX vdsm[8252]: WARN Not ready yet, ignoring event '|virt|VM_status|4f28af23-dd7e-413e-a331-1875f4dd18b3' args={'4f28af23-dd7e-413e-a331-1875f4dd18b3': {'status': 'Down', 'displayInfo': [{'tlsPort': '-1', 'ipAddress': '0', 'type': 'vnc', 'port': '-1'}], 'hash': '-8231387692555228201', 'exitMessage': 'VM terminated with error', 'cpuUser': '0.00', 'monitorResponse': '0', 'vmId': '4f28af23-dd7e-413e-a331-1875f4dd18b3', 'exitReason': 1, 'cpuUsage': '0.00', 'elapsedTime': '8420', 'cpuSys': '0.00', 'timeOffset': '0', 'clientIp': '', 'exitCode': 1}}
May 14 10:57:22 hetzner-XXXXXX vdsm[8252]: WARN MOM not available.
May 14 10:57:22 hetzner-XXXXXX vdsm[8252]: WARN MOM not available, KSM stats will be missing.
May 14 10:58:47 hetzner-XXXXXX vdsm[8252]: WARN File: /var/lib/libvirt/qemu/channels/4f28af23-dd7e-413e-a331-1875f4dd18b3.com.redhat.rhevm.vdsm already removed
May 14 10:58:47 hetzner-XXXXXX vdsm[8252]: WARN File: /var/lib/libvirt/qemu/channels/4f28af23-dd7e-413e-a331-1875f4dd18b3.org.qemu.guest_agent.0 already removed
May 14 11:00:52 hetzner-XXXXXX vdsm[8252]: ERROR ssl handshake: SSLError, address: ::ffff:192.168.111.10
May 14 11:01:54 hetzner-XXXXXX vdsm[8252]: ERROR ssl handshake: SSLError, address: ::ffff:192.168.111.10
May 14 11:02:06 hetzner-XXXXXX vdsm[8252]: ERROR ssl handshake: SSLError, address: ::ffff:192.168.111.10
May 14 11:05:18 hetzner-XXXXXX vdsm[8252]: ERROR ssl handshake: SSLError, address: ::ffff:192.168.111.10
May 14 11:05:29 hetzner-XXXXXX vdsm[8252]: ERROR ssl handshake: SSLError, address: ::ffff:192.168.111.10
May 14 11:08:41 hetzner-XXXXXX vdsm[8252]: ERROR ssl handshake: SSLError, address: ::ffff:192.168.111.10
May 14 11:08:53 hetzner-XXXXXX vdsm[8252]: ERROR ssl handshake: SSLError, address: ::ffff:192.168.111.10
May 14 11:12:04 hetzner-XXXXXX vdsm[8252]: ERROR ssl handshake: SSLError, address: ::ffff:192.168.111.10
May 14 11:12:16 hetzner-XXXXXX vdsm[8252]: ERROR ssl handshake: SSLError, address: ::ffff:192.168.111.10
May 14 11:15:28 hetzner-XXXXXX vdsm[8252]: ERROR ssl handshake: SSLError, address: ::ffff:192.168.111.10
May 14 11:15:40 hetzner-XXXXXX vdsm[8252]: ERROR ssl handshake: SSLError, address: ::ffff:192.168.111.10
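(Given the repeated "ssl handshake: SSLError" entries in [3], it may be worth verifying by hand that the re-enrolled host certificate is valid and chains to the CA copy the host holds - a sketch using vdsm's default PKI paths:)

    # On the host: check validity window, issuer and subject of the
    # vdsm certificate the engine re-enrolled:
    openssl x509 -in /etc/pki/vdsm/certs/vdsmcert.pem -noout -dates -issuer -subject
    # Confirm it verifies against the host's copy of the engine CA:
    openssl verify -CAfile /etc/pki/vdsm/certs/cacert.pem /etc/pki/vdsm/certs/vdsmcert.pem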

On Tue, May 14, 2019 at 12:25 PM <axel.thimm@01lgc.com> wrote:
I'd personally first try to fix the engine<->vdsm comm issue.
I fully agree. I might then take exports of all VMs and do a fresh start over. I'm not sure the upgrade instructions work, as the upgrade for 3.6->4.0 already activates much newer content. For example, the quoted --upgrade-appliance option does not even exist, probably because a later package removed it. One would have to manually versionlock the 4.0 hypervisor packages.
How did you upgrade the host? yum update? Is it ovirt-node?
Currently the state is as follows: the engine is running on the 3.6 branch, all updates applied [1]. The hosting RHEL host is on the RHV 4 branch with all updates in, except the gnutls packages mentioned [2].
I managed to flag the host as in maintenance mode, so I could ask it to re-enroll the certs.
Did you? Please do.
I can see from the host's logs that the ssh connection works and the certs have been updated. But now I see in the host's vdsm log that the SSL connection is being dropped.
This can also be because a 3.6 engine defaults to xmlrpc and a new 4.3 host talks only jsonrpc. Not sure; perhaps you can change the host (on the engine side) to use jsonrpc and it would work, although a 4.3 host is definitely not supposed to support a 3.6 engine. If all you want is to export the VMs, I'd downgrade the host to 3.6.
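(In the 3.6 webadmin this is the "Use JSON protocol" option in the host's advanced settings. To see what the engine currently has recorded, one can peek at the database - a sketch under the assumption that the 3.6-era schema has a vds_static.protocol column where 0 means xmlrpc and 1 means jsonrpc; verify the column on your version before trusting it:)

    # On the engine machine, list each host's configured protocol:
    su - postgres -c 'psql engine -c "SELECT vds_name, protocol FROM vds_static;"'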
If you are ok debugging/fixing yourself and only need the occasional tip, fine. Otherwise, please provide more details - exact versions of engine/vdsm, last version that worked, whether you can reinstall the host or need anything on it, etc.
I'm infinitely grateful for any assistance! I didn't want to spam the list with my first mail. I hope I provided some better information now. Thanks!
In the long run I will decommission this host. I need the VMs on it to migrate to a fresh 4.3 three-host hyperconverged setup which is yet to be built. Actually I am abusing one of the three nodes at the moment as a backup system for this failed upgrade, so I'm blocked from going forward with that setup as well. ... :(
Another option you might attempt is to import the data domain directly from the new engine. Obviously, do this first on a test copy...

Re [2]: I'll use this opportunity to clarify something which is not obvious, since you use RHV and not oVirt. There is a significant difference between oVirt and RHV regarding repos/channels. In oVirt, each minor version has its own repos, and users can choose freely which one they want. In RHV, this is the same for the engine channels, but not for hosts. For hosts, there is only a single channel per major version. So if you use the RHV 4 host channel, you get the latest. Older versions are still available there, but you have to play with yum to choose them, and in RHV this is considered unsupported, other than using RHVH (the RHV downstream of oVirt node), which can be used for specific cases (perhaps like yours, but not sure it would help). This has the advantage, for RHV customers, of not having to add a new repo to get updates, and the disadvantage of not being able to stay on a specific minor version when upgrading. Good luck and best regards,
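(The import itself is driven from the Admin Portal - Storage, then "Import Domain" - but the REST API is handy for checking what the new engine can see first; a sketch with placeholder engine FQDN and password:)

    # List storage domains known to the new engine (-k skips CA
    # verification, acceptable for a quick check only):
    curl -k -u 'admin@internal:PASSWORD' \
        -H 'Accept: application/xml' \
        'https://new-engine.example.com/ovirt-engine/api/storagedomains'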
[...]
-- Didi

On Tue, May 14, 2019 at 12:25 PM Yedidyah Bar David <didi@redhat.com> wrote:
On Tue, May 14, 2019 at 12:25 PM <axel.thimm@01lgc.com> wrote:
[...]

Older versions are still available there, but you have to play with yum to choose them, and in RHV this is considered unsupported, other than using RHVH (the RHV downstream of oVirt node), which can be used for specific cases (perhaps like yours, but not sure it would help).
Yes, for this specific case the best option is to use latest RHV-H from 4.2 time.
-- Simone Tiraboschi, Principal Software Engineer, Red Hat, stirabos@redhat.com

Hi, thanks to both - downgrading did not work; there is too much that would need to be removed, and the old repos are deprecated and only available via EUS. So, I'll set up the three-node cluster first and try to import the old domain there.
Yes, for this specific case the best option is to use latest RHV-H from 4.2 time.
Are 4.3 and RHEL hosts also a good option, or is there something specific in 4.2 that makes 3.6 domains better to import/attach?

On Tue, May 14, 2019 at 2:33 PM <axel.thimm@01lgc.com> wrote:
Hi,
thanks to both - downgrading did not work; there is too much that would need to be removed, and the old repos are deprecated and only available via EUS. So, I'll set up the three-node cluster first and try to import the old domain there.
Yes, for this specific case the best option is to use latest RHV-H from 4.2 time.
Are 4.3 and RHEL hosts also a good option, or is there something specific in 4.2 that makes 3.6 domains better to import/attach?
In 4.3 we completely removed the support for 3.6 and 4.0 datacenter/cluster levels, as per https://bugzilla.redhat.com/show_bug.cgi?id=1655115. There is no way to do it with 4.3 bits (that's why we also removed --upgrade-appliance from hosted-engine in 4.3). Your only option now is to use 4.2-only repos with oVirt, or RHV-H from 4.2 if on RHV.
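(On the oVirt side, pinning a machine to the 4.2 repos means installing the 4.2 release package rather than the latest one - a sketch:)

    # Configure the oVirt 4.2 repos, then install the host packages
    # from there:
    yum install https://resources.ovirt.org/pub/yum-repo/ovirt-release42.rpm
    yum install ovirt-host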
-- Simone Tiraboschi, Principal Software Engineer, Red Hat, stirabos@redhat.com
participants (5)
- Andreas Elvers
- Axel.Thimm@01lgc.com
- axel.thimm@01lgc.com
- Simone Tiraboschi
- Yedidyah Bar David