Different CPU flags preventing live migration

Hello,
I would like to share a problem I'm finding on an environment based on RHV, which could also impact oVirt-based environments. I have an open case to understand the root cause.
I have Dell R730 hypervisors (with Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz CPUs) and some VMs configured as "customized" high performance, in the sense that I configure the VM as High Performance type and then:
- I enable the graphical console (I remove the "Headless Mode" flag in the Console section);
- in the Host section I set "Start Running On" to "Any Host in Cluster" and allow manual and automatic migration;
- I also enable HA in the High Availability section, accepting the default values for the lease, resume behaviour, etc.
These settings, however, leave the "Pass-Through Host CPU" flag enabled in the Host section.
I noticed that, with the same BIOS etc., with the kernel of 4.3.8 (which seems to be the same in oVirt and RHV):
[g.cecchi@ov200 ~]$ uname -r
3.10.0-1062.12.1.el7.x86_64
[g.cecchi@ov200 ~]$
I have this CPU flag set: invpcid_single
Instead, on the same host with 4.3.5 (e.g. kernel 3.10.0-1062.el7.x86_64) the flag is not set.
This creates a problem during the upgrade from 4.3.5 to 4.3.8: first I empty one host and update it, but then I cannot live migrate the VMs back to it in order to update the other one, because the pass-through flag requires the CPU flag sets to match exactly.
So cross-check on a test environment in case you have "Pass-Through Host CPU" set in the Host section for any critical VM that you cannot shut down.
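A minimal sketch of that cross-check for one hypervisor, assuming shell access and read-only libvirt access (virsh -r) on the host: in the libvirt domain XML, "Pass-Through Host CPU" shows up as a cpu element with mode='host-passthrough'. The function names are hypothetical helpers, not part of oVirt, RHV or libvirt.

```shell
# Hypothetical helper: succeed (exit status 0) if the libvirt domain XML read
# on stdin has a <cpu> element with mode='host-passthrough' (either quote style).
uses_host_passthrough() {
    grep -Eq "<cpu[^>]*mode=['\"]host-passthrough['\"]"
}

# Hypothetical helper: print the name of every VM currently running on this
# host whose libvirt definition uses host-passthrough (read-only virsh access).
list_passthrough_vms() {
    virsh -r list --name | while IFS= read -r dom; do
        [ -n "$dom" ] || continue    # the name list ends with an empty line
        if virsh -r dumpxml "$dom" | uses_host_passthrough; then
            printf '%s\n' "$dom"
        fi
    done
}
```

Running list_passthrough_vms on each host before the upgrade lists the running VMs that may not be live-migratable across hosts with different CPU flags.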
BTW: the flag is somewhat tricky to remove if you set the VM as High Performance type; you have to:
- edit VM --> Host section --> "Start Running On": change it to a specific host among the available ones;
- now you can remove the "Pass-Through Host CPU" flag: remove it;
- "Start Running On": change it again to "Any Host in Cluster";
- Save.
Now you can live migrate this "customized" High Performance VM. But the problem is that if the VM is running, you have to shut it down for the change to take effect.
HH, Gianluca

On 2 Mar 2020, at 12:10, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
> Hello,
> I would like to share a problem I'm finding on an environment based on RHV, which could also impact oVirt-based environments. I have an open case to understand the root cause.
> I have Dell R730 hypervisors (with Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz CPUs) and some VMs configured as "customized" high performance, in the sense that I configure the VM as High Performance type and then:
> - I enable the graphical console (I remove the "Headless Mode" flag in the Console section);
> - in the Host section I set "Start Running On" to "Any Host in Cluster" and allow manual and automatic migration;
> - I also enable HA in the High Availability section, accepting the default values for the lease, resume behaviour, etc.
> These settings, however, leave the "Pass-Through Host CPU" flag enabled in the Host section.
> I noticed that, with the same BIOS etc., with the kernel of 4.3.8 (which seems to be the same in oVirt and RHV):
> [g.cecchi@ov200 ~]$ uname -r
> 3.10.0-1062.12.1.el7.x86_64
> [g.cecchi@ov200 ~]$
> I have this CPU flag set: invpcid_single
> Instead, on the same host with 4.3.5 (e.g. kernel 3.10.0-1062.el7.x86_64) the flag is not set.
Sometimes there are kernel changes like that, but more often it's just the microcode version on each host (not the BIOS). Did you check that too?
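To answer that per host, one can record the running kernel, the loaded microcode revision and whether invpcid_single is reported. A minimal POSIX sh sketch, assuming the standard /proc/cpuinfo layout ("microcode" and "flags" lines per CPU) and homogeneous CPUs so that the first CPU is representative; the function names are hypothetical.

```shell
# Hypothetical helper: print the "microcode" field of the first CPU in a
# cpuinfo-format file (default /proc/cpuinfo), i.e. the loaded revision.
microcode_rev() {
    awk -F': *' '/^microcode/ { print $2; exit }' "${1:-/proc/cpuinfo}"
}

# Hypothetical helper: succeed if FLAG is listed in the "flags" line of the
# first CPU in a cpuinfo-format file (default /proc/cpuinfo).
has_cpu_flag() {
    awk -v f="$1" '
        /^flags/ { for (i = 3; i <= NF; i++) if ($i == f) found = 1; exit }
        END { exit !found }' "${2:-/proc/cpuinfo}"
}
```

On each host, printf '%s %s\n' "$(uname -r)" "$(microcode_rev)", together with rpm -q microcode_ctl and has_cpu_flag invpcid_single && echo set, gives the values to compare between upgraded and non-upgraded hosts.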
> This creates a problem during the upgrade from 4.3.5 to 4.3.8: first I empty one host and update it, but then I cannot live migrate the VMs back to it in order to update the other one, because the pass-through flag requires the CPU flag sets to match exactly.
Yes, they need to match completely, and even when they do, it's a good idea to do this only across the same hardware, same microcode and same kernel versions.
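Since the flag sets must match completely, a quick way to spot differences is to compare the "flags" line of /proc/cpuinfo saved from two hosts. A minimal POSIX sh sketch with hypothetical function names; it is only a rough check and not necessarily identical to the comparison oVirt or libvirt perform before a migration.

```shell
# Hypothetical helper: print the flags of the first CPU in a cpuinfo-format
# file, one per line, sorted in the C locale (the order comm(1) expects).
cpu_flags() {
    awk '/^flags/ { for (i = 3; i <= NF; i++) print $i; exit }' "$1" |
        LC_ALL=C sort -u
}

# Hypothetical helper: compare two cpuinfo snapshots (e.g. copied from two
# hosts). Prints "< flag" for flags only in the first file and "> flag" for
# flags only in the second; exit status 0 only if the flag sets are identical.
diff_cpu_flags() {
    _fa=$(mktemp) && _fb=$(mktemp) || return 2
    cpu_flags "$1" > "$_fa"
    cpu_flags "$2" > "$_fb"
    LC_ALL=C comm -23 "$_fa" "$_fb" | sed 's/^/< /'
    LC_ALL=C comm -13 "$_fa" "$_fb" | sed 's/^/> /'
    cmp -s "$_fa" "$_fb"
    _rc=$?
    rm -f "$_fa" "$_fb"
    return "$_rc"
}
```

For example, save /proc/cpuinfo from the upgraded and the not-yet-upgraded host (hostA.cpuinfo and hostB.cpuinfo are hypothetical file names) and run diff_cpu_flags hostA.cpuinfo hostB.cpuinfo. Some entries, such as pti, are set by the kernel itself rather than read from the CPU, so a difference is a hint to investigate rather than proof of what will block a migration.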
> So cross-check on a test environment in case you have "Pass-Through Host CPU" set in the Host section for any critical VM that you cannot shut down.
> BTW: the flag is somewhat tricky to remove if you set the VM as High Performance type; you have to:
> - edit VM --> Host section --> "Start Running On": change it to a specific host among the available ones;
> - now you can remove the "Pass-Through Host CPU" flag: remove it;
But then you're probably losing the biggest differentiator of "High Performance", so maybe don't use that profile at all?
> - "Start Running On": change it again to "Any Host in Cluster";
> - Save.
> Now you can live migrate this "customized" High Performance VM. But the problem is that if the VM is running, you have to shut it down for the change to take effect.
> HH, Gianluca
> _______________________________________________
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-leave@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
> List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/DB4FSA6MIDBJ35...

On Mon, Mar 2, 2020 at 4:51 PM Michal Skrivanek <michal.skrivanek@redhat.com> wrote:
>> I have this CPU flag set: invpcid_single
>> Instead, on the same host with 4.3.5 (e.g. kernel 3.10.0-1062.el7.x86_64) the flag is not set.
> Sometimes there are kernel changes like that, but more often it's just the microcode version on each host (not the BIOS). Did you check that too?
I think so too. From 4.3.5 to 4.3.8 the microcode package is updated from microcode_ctl-2.1-53.el7.x86_64 to microcode_ctl-2.1-53.7.el7_7.x86_64. And in the meantime this solution article has been published, which confirms that the flag was introduced by the Meltdown mitigations: https://access.redhat.com/solutions/4866021
>> Now you can remove the "Pass-Through Host CPU" flag --> remove it
> But then you're probably losing the biggest differentiator of "High Performance", so maybe don't use that profile at all?
Yes, you are right ;-) I used it mainly to enable I/O threads and to test huge pages (and for the different icon display...).
Gianluca

Hi Gianluca,
I'm also using Pass-Through Host CPU on my AMD cluster and I have never seen such behaviour. This points me to the Spectre/Meltdown mitigations that keep affecting Intel CPUs.
As per https://access.redhat.com/solutions/4866021 you have to either power the VM off and then on again, or find which mitigation is enforcing this flag and disable it (if you think the added security is not necessary).
Most probably, avoiding the intel-ucode upgrade would save you the headaches, but at a price (security is important, and how important depends on your oVirt usage).
Best Regards,
Strahil Nikolov
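As a starting point for finding which mitigation is active, Linux kernels that provide the /sys/devices/system/cpu/vulnerabilities directory report one status line per known CPU vulnerability there, and /proc/cmdline shows any mitigation-related boot parameters. A minimal POSIX sh sketch (the function name is a hypothetical helper); it only reports status and does not map a mitigation to the invpcid_single flag.

```shell
# Hypothetical helper: print "name: status" for every entry in the kernel's
# CPU vulnerability directory (default: the standard sysfs location).
show_cpu_mitigations() {
    _dir=${1:-/sys/devices/system/cpu/vulnerabilities}
    for _f in "$_dir"/*; do
        [ -f "$_f" ] || continue
        printf '%s: %s\n' "${_f##*/}" "$(cat "$_f")"
    done
}
```

Running show_cpu_mitigations and cat /proc/cmdline on an upgraded and a non-upgraded host shows whether the mitigation status or the boot parameters differ between them.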
participants (3)
- Gianluca Cecchi
- Michal Skrivanek
- Strahil Nikolov