Unable to live migrate a VM from 4.4.2 to 4.4.3 CentOS Linux host

Hello, I was able to update an external CentOS Linux 8.2 standalone engine from 4.4.2 to 4.4.3 (see the dedicated thread). Then I was able to put one 4.4.2 host (CentOS Linux 8.2 based, not oVirt Node NG) into maintenance and run:

[root@ov301 ~]# dnf update
Last metadata expiration check: 0:27:11 ago on Wed 11 Nov 2020 08:48:04 PM CET.
Dependencies resolved.
======================================================================================================================
 Package                        Arch     Version                               Repository                        Size
======================================================================================================================
Installing:
 kernel                         x86_64   4.18.0-193.28.1.el8_2                 BaseOS                           2.8 M
 kernel-core                    x86_64   4.18.0-193.28.1.el8_2                 BaseOS                            28 M
 kernel-modules                 x86_64   4.18.0-193.28.1.el8_2                 BaseOS                            23 M
 ovirt-ansible-collection       noarch   1.2.1-1.el8                           ovirt-4.4                        276 k
     replacing  ovirt-ansible-engine-setup.noarch 1.2.4-1.el8
     replacing  ovirt-ansible-hosted-engine-setup.noarch 1.1.8-1.el8
Upgrading:
 ansible                        noarch   2.9.15-2.el8                          ovirt-4.4-centos-ovirt44          17 M
 bpftool                        x86_64   4.18.0-193.28.1.el8_2                 BaseOS                           3.4 M
 cockpit-ovirt-dashboard        noarch   0.14.13-1.el8                         ovirt-4.4                        3.5 M
 ioprocess                      x86_64   1.4.2-1.el8                           ovirt-4.4                         37 k
 kernel-tools                   x86_64   4.18.0-193.28.1.el8_2                 BaseOS                           3.0 M
 kernel-tools-libs              x86_64   4.18.0-193.28.1.el8_2                 BaseOS                           2.8 M
 libiscsi                       x86_64   1.18.0-8.module_el8.2.0+524+f765f7e0  AppStream                         89 k
 nftables                       x86_64   1:0.9.3-12.el8_2.1                    BaseOS                           311 k
 ovirt-hosted-engine-ha         noarch   2.4.5-1.el8                           ovirt-4.4                        325 k
 ovirt-hosted-engine-setup      noarch   2.4.8-1.el8                           ovirt-4.4                        227 k
 ovirt-imageio-client           x86_64   2.1.1-1.el8                           ovirt-4.4                         21 k
 ovirt-imageio-common           x86_64   2.1.1-1.el8                           ovirt-4.4                        155 k
 ovirt-imageio-daemon           x86_64   2.1.1-1.el8                           ovirt-4.4                         15 k
 ovirt-provider-ovn-driver      noarch   1.2.32-1.el8                          ovirt-4.4                         27 k
 ovirt-release44                noarch   4.4.3-1.el8                           ovirt-4.4                         17 k
 python3-ioprocess              x86_64   1.4.2-1.el8                           ovirt-4.4                         33 k
 python3-nftables               x86_64   1:0.9.3-12.el8_2.1                    BaseOS                            25 k
 python3-ovirt-engine-sdk4      x86_64   4.4.6-1.el8                           ovirt-4.4                        560 k
 python3-perf                   x86_64   4.18.0-193.28.1.el8_2                 BaseOS                           2.9 M
 python3-pyasn1                 noarch   0.4.6-3.el8                           ovirt-4.4-centos-opstools        140 k
 python3-pyasn1-modules         noarch   0.4.6-3.el8                           ovirt-4.4-centos-opstools        151 k
 qemu-img                       x86_64   15:4.2.0-29.el8.6                     ovirt-4.4-advanced-virtualization 1.0 M
 qemu-kvm                       x86_64   15:4.2.0-29.el8.6                     ovirt-4.4-advanced-virtualization 118 k
 qemu-kvm-block-curl            x86_64   15:4.2.0-29.el8.6                     ovirt-4.4-advanced-virtualization 129 k
 qemu-kvm-block-gluster         x86_64   15:4.2.0-29.el8.6                     ovirt-4.4-advanced-virtualization 131 k
 qemu-kvm-block-iscsi           x86_64   15:4.2.0-29.el8.6                     ovirt-4.4-advanced-virtualization 136 k
 qemu-kvm-block-rbd             x86_64   15:4.2.0-29.el8.6                     ovirt-4.4-advanced-virtualization 130 k
 qemu-kvm-block-ssh             x86_64   15:4.2.0-29.el8.6                     ovirt-4.4-advanced-virtualization 131 k
 qemu-kvm-common                x86_64   15:4.2.0-29.el8.6                     ovirt-4.4-advanced-virtualization 1.2 M
 qemu-kvm-core                  x86_64   15:4.2.0-29.el8.6                     ovirt-4.4-advanced-virtualization 3.4 M
 selinux-policy                 noarch   3.14.3-41.el8_2.8                     BaseOS                           615 k
 selinux-policy-targeted        noarch   3.14.3-41.el8_2.8                     BaseOS                            15 M
 spice-server                   x86_64   0.14.2-1.el8_2.1                      AppStream                        404 k
 tzdata                         noarch   2020d-1.el8                           BaseOS                           471 k
 vdsm                           x86_64   4.40.35.1-1.el8                       ovirt-4.4                        1.4 M
 vdsm-api                       noarch   4.40.35.1-1.el8                       ovirt-4.4                        106 k
 vdsm-client                    noarch   4.40.35.1-1.el8                       ovirt-4.4                         24 k
 vdsm-common                    noarch   4.40.35.1-1.el8                       ovirt-4.4                        136 k
 vdsm-hook-ethtool-options      noarch   4.40.35.1-1.el8                       ovirt-4.4                        9.8 k
 vdsm-hook-fcoe                 noarch   4.40.35.1-1.el8                       ovirt-4.4                         10 k
 vdsm-hook-openstacknet         noarch   4.40.35.1-1.el8                       ovirt-4.4                         18 k
 vdsm-hook-vhostmd              noarch   4.40.35.1-1.el8                       ovirt-4.4                         17 k
 vdsm-hook-vmfex-dev            noarch   4.40.35.1-1.el8                       ovirt-4.4                         11 k
 vdsm-http                      noarch   4.40.35.1-1.el8                       ovirt-4.4                         15 k
 vdsm-jsonrpc                   noarch   4.40.35.1-1.el8                       ovirt-4.4                         31 k
 vdsm-network                   x86_64   4.40.35.1-1.el8                       ovirt-4.4                        331 k
 vdsm-python                    noarch   4.40.35.1-1.el8                       ovirt-4.4                        1.3 M
 vdsm-yajsonrpc                 noarch   4.40.35.1-1.el8                       ovirt-4.4                         40 k
Installing dependencies:
 NetworkManager-ovs             x86_64   1:1.22.14-1.el8                       ovirt-4.4-copr:copr.fedorainfracloud.org:networkmanager:NetworkManager-1.22  144 k

Transaction Summary
======================================================================================================================
Install   5 Packages
Upgrade  48 Packages

Total download size: 116 M

After the reboot I can activate the host (strange that I see many pop-up messages about "finished activating host") and the host is shown as:

OS Version: RHEL - 8.2 - 2.2004.0.2.el8
OS Description: CentOS Linux 8 (Core)
Kernel Version: 4.18.0 - 193.28.1.el8_2.x86_64
KVM Version: 4.2.0 - 29.el8.6
LIBVIRT Version: libvirt-6.0.0-25.2.el8
VDSM Version: vdsm-4.40.35.1-1.el8
SPICE Version: 0.14.2 - 1.el8_2.1
GlusterFS Version: [N/A]
CEPH Version: librbd1-12.2.7-9.el8
Open vSwitch Version: [N/A]
Nmstate Version: nmstate-0.2.10-1.el8
Kernel Features: MDS: (Vulnerable: Clear CPU buffers attempted, no microcode; SMT vulnerable), L1TF: (Mitigation: PTE Inversion; VMX: conditional cache flushes, SMT vulnerable), SRBDS: (Not affected), MELTDOWN: (Mitigation: PTI), SPECTRE_V1: (Mitigation: usercopy/swapgs barriers and __user pointer sanitization), SPECTRE_V2: (Mitigation: Full generic retpoline, IBPB: conditional, IBRS_FW, STIBP: conditional, RSB filling), ITLB_MULTIHIT: (KVM: Mitigation: Split huge pages), TSX_ASYNC_ABORT: (Not affected), SPEC_STORE_BYPASS: (Mitigation: Speculative Store Bypass disabled via prctl and seccomp)
VNC Encryption: Disabled
FIPS mode enabled: Disabled

while the other host is still on 4.4.2:

OS Version: RHEL - 8.2 - 2.2004.0.2.el8
OS Description: CentOS Linux 8 (Core)
Kernel Version: 4.18.0 - 193.19.1.el8_2.x86_64
KVM Version: 4.2.0 - 29.el8.3
LIBVIRT Version: libvirt-6.0.0-25.2.el8
VDSM Version: vdsm-4.40.26.3-1.el8
SPICE Version: 0.14.2 - 1.el8
GlusterFS Version: [N/A]
CEPH Version: librbd1-12.2.7-9.el8
Open vSwitch Version: [N/A]
Nmstate Version: nmstate-0.2.10-1.el8
Kernel Features: MDS: (Vulnerable: Clear CPU buffers attempted, no microcode; SMT vulnerable), L1TF: (Mitigation: PTE Inversion; VMX: conditional cache flushes, SMT vulnerable), SRBDS: (Not affected), MELTDOWN: (Mitigation: PTI), SPECTRE_V1: (Mitigation: usercopy/swapgs barriers and __user pointer sanitization), SPECTRE_V2: (Mitigation: Full generic retpoline, IBPB: conditional, IBRS_FW, STIBP: conditional, RSB filling), ITLB_MULTIHIT: (KVM: Mitigation: Split huge pages), TSX_ASYNC_ABORT: (Not affected), SPEC_STORE_BYPASS: (Mitigation: Speculative Store Bypass disabled via prctl and seccomp)
VNC Encryption: Disabled
FIPS mode enabled: Disabled

But if I try to move VMs away from the 4.4.2 host to the 4.4.3 one, I get this error:

Failed to migrate VM c8client to Host ov301 . Trying to migrate to another Host.
No available host was found to migrate VM c8client to.

(BTW: there is no other active host; there is an ov300 host, but it is in maintenance.)

It seems the root error in engine.log is:

2020-11-11 21:44:42,487+01 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-11) [] Migration of VM 'c8client' to host 'ov301' failed: VM destroyed during the startup.
On the target host, in /var/log/libvirt/qemu/c8client.log, I see:

2020-11-11 20:44:40.981+0000: shutting down, reason=failed

In the target host's vdsm.log:

2020-11-11 21:44:39,958+0100 INFO (jsonrpc/1) [jsonrpc.JsonRpcServer] RPC call VM.migrationCreate took more than 1.00 seconds to succeed: 1.97 (__init__:316)
2020-11-11 21:44:40,230+0100 INFO (periodic/3) [vdsm.api] START repoStats(domains=()) from=internal, task_id=cb51fd4a-09d3-4d77-821b-391da2467487 (api:48)
2020-11-11 21:44:40,231+0100 INFO (periodic/3) [vdsm.api] FINISH repoStats return={'fa33df49-b09d-4f86-9719-ede649542c21': {'code': 0, 'lastCheck': '4.1', 'delay': '0.000836715', 'valid': True, 'version': 4, 'acquired': True, 'actual': True}} from=internal, task_id=cb51fd4a-09d3-4d77-821b-391da2467487 (api:54)
2020-11-11 21:44:41,929+0100 INFO (jsonrpc/5) [api.virt] START destroy(gracefulAttempts=1) from=::ffff:10.4.192.32,52266, vmId=c95da734-7ed1-4caa-bacb-3fa24f4efb56 (api:48)
2020-11-11 21:44:41,930+0100 INFO (jsonrpc/5) [virt.vm] (vmId='c95da734-7ed1-4caa-bacb-3fa24f4efb56') Release VM resources (vm:4666)
2020-11-11 21:44:41,930+0100 INFO (jsonrpc/5) [virt.vm] (vmId='c95da734-7ed1-4caa-bacb-3fa24f4efb56') Stopping connection (guestagent:444)
2020-11-11 21:44:41,930+0100 INFO (jsonrpc/5) [vdsm.api] START teardownImage(sdUUID='fa33df49-b09d-4f86-9719-ede649542c21', spUUID='ef17cad6-7724-4cd8-96e3-9af6e529db51', imgUUID='ff10a405-cc61-4d00-a83f-3ee04b19f381', volUUID=None) from=::ffff:10.4.192.32,52266, task_id=177461c0-83d6-4c90-9c5c-3cc8ee9150c7 (api:48)

It seems that during the host update the OVN configuration has not been maintained. Right now all my active VMs have at least one vNIC on OVN, so I cannot test the scenario of migrating a VM without an OVN-based vNIC. In fact, on the engine I see only the currently active host on 4.4.2 (ov200) and another host that is in maintenance (it is still on 4.3.10; I wanted to update it to 4.4.2 but then I realized that 4.4.3 had come out...):

[root@ovmgr1 ovirt-engine]# ovn-sbctl show
Chassis "6a46b802-5a50-4df5-b1af-e73f58a57164"
    hostname: "ov200.mydomain"
    Encap geneve
        ip: "10.4.192.32"
        options: {csum="true"}
    Port_Binding "2ae7391b-4297-4247-a315-99312f6392e6"
    Port_Binding "c1ec60a4-b4f3-4cb5-8985-43c086156e83"
    Port_Binding "174b69f8-00ed-4e25-96fc-7db11ea8a8b9"
    Port_Binding "66359e79-56c4-47e0-8196-2241706329f6"
    Port_Binding "ccbd6188-78eb-437b-9df9-9929e272974b"
Chassis "ddecf0da-4708-4f93-958b-6af365a5eeca"
    hostname: "ov300.mydomain"
    Encap geneve
        ip: "10.4.192.33"
        options: {csum="true"}
[root@ovmgr1 ovirt-engine]#

Any hint about the reason for losing the OVN config on ov301, and the correct procedure to get it back and have it persist across future updates?

NOTE: this was a cluster on 4.3.10, and when I updated it to 4.4.2 I noticed that the OVN config was not retained and I had to run this on the hosts:

[root@ov200 ~]# vdsm-tool ovn-config engine_ip ov200_ip_on_mgmt
Using default PKI files
Created symlink /etc/systemd/system/multi-user.target.wants/openvswitch.service → /usr/lib/systemd/system/openvswitch.service.
Created symlink /etc/systemd/system/multi-user.target.wants/ovn-controller.service → /usr/lib/systemd/system/ovn-controller.service.
[root@ov200 ~]#

Now it seems the problem persists... Why do I have to run it each time?

Gianluca
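
P.S.: for completeness, this is roughly how I check on a host whether the OVN configuration survived an upgrade (just a sketch; the external_ids keys are the standard ones that vdsm-tool ovn-config writes, and the expected values are what I see on a working host):

  systemctl is-active openvswitch ovn-controller
  ovs-vsctl get Open_vSwitch . external_ids:ovn-remote       # typically ssl:<engine_ip>:6642 in oVirt
  ovs-vsctl get Open_vSwitch . external_ids:ovn-encap-type   # geneve
  ovs-vsctl get Open_vSwitch . external_ids:ovn-encap-ip     # the host's IP on the management network

If these come back empty or with an error, the host does not register as a chassis in the southbound DB, and incoming migrations of VMs with OVN vNICs fail like the one above.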

On Wed, Nov 11, 2020 at 10:01 PM Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
NOTE: this was a cluster on 4.3.10, and when I updated it to 4.4.2 I noticed that the OVN config was not retained and I had to run this on the hosts:
[root@ov200 ~]# vdsm-tool ovn-config engine_ip ov200_ip_on_mgmt
Using default PKI files
Created symlink /etc/systemd/system/multi-user.target.wants/openvswitch.service → /usr/lib/systemd/system/openvswitch.service.
Created symlink /etc/systemd/system/multi-user.target.wants/ovn-controller.service → /usr/lib/systemd/system/ovn-controller.service.
[root@ov200 ~]#
Now it seems the problem persists... Why do I have to run it each time?
Gianluca
In the meantime I can confirm that the manual step below on ov301 made it show up again among the chassis of the OVN southbound DB on the engine, and I was able to migrate the VMs in order to update the other host, and then, for example, to successfully ping between VMs on OVN across the two hosts:

[root@ov301 vdsm]# vdsm-tool ovn-config 10.4.192.43 10.4.192.34
Using default PKI files
Created symlink /etc/systemd/system/multi-user.target.wants/openvswitch.service → /usr/lib/systemd/system/openvswitch.service.
Created symlink /etc/systemd/system/multi-user.target.wants/ovn-controller.service → /usr/lib/systemd/system/ovn-controller.service.
[root@ov301 vdsm]#
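
For the record, after re-adding a host this way I also double-check from the engine that it is really back before migrating anything (just a sketch; these are the standard OVN tools, ovn-sbctl being the same one used above):

  ovn-sbctl show            # the re-configured host should appear again as a Chassis
  ovn-sbctl list Chassis    # full chassis records, including encap type and tunnel IP
  ovn-nbctl show            # the logical switches/ports defined by ovirt-provider-ovn

and then a simple ping between two VMs on the same OVN network but on different hosts, as mentioned above.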

On Thu, Nov 12, 2020 at 11:08 AM Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Wed, Nov 11, 2020 at 10:01 PM Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
NOTE: this was a cluster on 4.3.10, and when I updated it to 4.4.2 I noticed that the OVN config was not retained and I had to run this on the hosts:
[root@ov200 ~]# vdsm-tool ovn-config engine_ip ov200_ip_on_mgmt
Using default PKI files
Created symlink /etc/systemd/system/multi-user.target.wants/openvswitch.service → /usr/lib/systemd/system/openvswitch.service.
Created symlink /etc/systemd/system/multi-user.target.wants/ovn-controller.service → /usr/lib/systemd/system/ovn-controller.service.
[root@ov200 ~]#
Now it seems the problem persists... Why do I have to run it each time?
Gianluca
In the meantime I can confirm that the manual step below on ov301 made it show up again among the chassis of the OVN southbound DB on the engine, and I was able to migrate the VMs in order to update the other host, and then, for example, to successfully ping between VMs on OVN across the two hosts:
[root@ov301 vdsm]# vdsm-tool ovn-config 10.4.192.43 10.4.192.34
Using default PKI files
Created symlink /etc/systemd/system/multi-user.target.wants/openvswitch.service → /usr/lib/systemd/system/openvswitch.service.
Created symlink /etc/systemd/system/multi-user.target.wants/ovn-controller.service → /usr/lib/systemd/system/ovn-controller.service.
[root@ov301 vdsm]#
One further update. On the other 4.4.2 host I applied the now recommended approach of updating from the web admin GUI, after putting the host into maintenance:

Hosts --> Select Host --> Installation --> Upgrade

I deselected the "reboot host" option and the update completed successfully. Then I manually rebooted the host from the web admin GUI:

Management --> SSH Management --> Restart

After the reboot all is OK and I still see the host as one of the southbound chassis. I can activate the host (why at least 10 popups with the same message "Finished Activating Host ov200"???).

If I compare with diff the packages installed on the two hosts I see (a sketch of one way to produce such a comparison follows the output):

< = ov200 (the one updated from the web admin GUI)
> = ov301 (the one updated through dnf update)

19c19
< ansible-2.9.14-1.el8.noarch
---
> ansible-2.9.15-2.el8.noarch
262d261
< gpg-pubkey-56863776-5f117571
658c657
< NetworkManager-1.26.2-1.el8.x86_64
---
> NetworkManager-1.22.14-1.el8.x86_64
660,663c659,662
< NetworkManager-libnm-1.26.2-1.el8.x86_64
< NetworkManager-ovs-1.26.2-1.el8.x86_64
< NetworkManager-team-1.26.2-1.el8.x86_64
< NetworkManager-tui-1.26.2-1.el8.x86_64
---
> NetworkManager-libnm-1.22.14-1.el8.x86_64
> NetworkManager-ovs-1.22.14-1.el8.x86_64
> NetworkManager-team-1.22.14-1.el8.x86_64
> NetworkManager-tui-1.22.14-1.el8.x86_64
1079d1077
< yum-utils-4.0.12-4.el8_2.noarch
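
(For clarity, the comparison above is just a diff of the sorted rpm lists of the two hosts, produced with something like the following sketch; the file names are arbitrary:

  ssh root@ov200 'rpm -qa | sort' > ov200-packages.txt
  ssh root@ov301 'rpm -qa | sort' > ov301-packages.txt
  diff ov200-packages.txt ov301-packages.txt
)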
Any comments?

On the host updated through the web admin GUI, if I run dnf update I am offered:

Dependencies resolved.
======================================================================================================================
 Package                        Arch     Version          Repository                                                                     Size
======================================================================================================================
Upgrading:
 NetworkManager-config-server   noarch   1:1.26.2-1.el8   ovirt-4.4-copr:copr.fedorainfracloud.org:networkmanager:NetworkManager-1.26   117 k
 ansible                        noarch   2.9.15-2.el8     ovirt-4.4-centos-ovirt44                                                       17 M
 nmstate                        noarch   0.3.6-2.el8      ovirt-4.4-copr:copr.fedorainfracloud.org:nmstate:nmstate-0.3                   34 k
 python3-libnmstate             noarch   0.3.6-2.el8      ovirt-4.4-copr:copr.fedorainfracloud.org:nmstate:nmstate-0.3                  178 k
Installing dependencies:
 python3-varlink                noarch   29.0.0-1.el8     BaseOS                                                                          49 k

Transaction Summary
======================================================================================================================
Install   1 Package
Upgrade   4 Packages

Total download size: 18 M

Why hasn't ansible been updated? Probably on a plain CentOS Linux host I shouldn't run any "dnf update" command at all? Or what is a clear statement for managing plain CentOS Linux hosts in 4.4? Couldn't some sort of global version lock be put in place to prevent "dnf update" commands (a rough sketch follows below)?

Thanks,
Gianluca
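
P.S.: the kind of "global version lock" I have in mind would be something like the stock dnf versionlock plugin (only a sketch; the plugin is the standard EL8 one, nothing oVirt-specific, and the package globs are just examples):

  dnf install python3-dnf-plugin-versionlock
  dnf versionlock add 'vdsm*' 'qemu-kvm*' 'libvirt*' 'ovirt-*'
  dnf versionlock list                # show what is currently pinned
  dnf versionlock delete 'vdsm*'      # unpin again before a planned upgrade

With locks like these in place, a plain "dnf update" would leave the pinned virtualization packages alone and only touch the rest of the OS.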

On Thu, Nov 12, 2020 at 11:08 AM Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Wed, Nov 11, 2020 at 10:01 PM Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
NOTE: this was a cluster on 4.3.10, and when I updated it to 4.4.2 I noticed that the OVN config was not retained and I had to run this on the hosts:
[root@ov200 ~]# vdsm-tool ovn-config engine_ip ov200_ip_on_mgmt
Using default PKI files
Created symlink /etc/systemd/system/multi-user.target.wants/openvswitch.service → /usr/lib/systemd/system/openvswitch.service.
Created symlink /etc/systemd/system/multi-user.target.wants/ovn-controller.service → /usr/lib/systemd/system/ovn-controller.service.
[root@ov200 ~]#
Now it seems the problem persists... Why do I have to run it each time?
Gianluca
In the meantime I can confirm that the manual step below on ov301 made it show up again among the chassis of the OVN southbound DB on the engine, and I was able to migrate the VMs in order to update the other host, and then, for example, to successfully ping between VMs on OVN across the two hosts:
[root@ov301 vdsm]# vdsm-tool ovn-config 10.4.192.43 10.4.192.34
Using default PKI files
Created symlink /etc/systemd/system/multi-user.target.wants/openvswitch.service → /usr/lib/systemd/system/openvswitch.service.
Created symlink /etc/systemd/system/multi-user.target.wants/ovn-controller.service → /usr/lib/systemd/system/ovn-controller.service.
[root@ov301 vdsm]#
As this OVN-related problem seems to impact me often, I created this Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1910340

Thanks for watching,
Gianluca