High performance VM cannot migrate due to TSC frequency

Hello, I'm on oVirt 4.4.3 and CentOS 8.3 with 3 hosts. I have a high performance VM that is running on ov300 and is configured to run on any host. Whether or not I set the option "Migrate only to hosts with the same TSC frequency", I am always unable to migrate the VM, and in engine.log I see this:
2020-12-11 15:56:03,424+01 INFO [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (default task-36) [e4801b28-c832-4474-aa53-4ebfd7c6e2d0] Candidate host 'ov301' ('382bfc8f-60d5-4e06-8571-7dae1700574d') was filtered out by 'VAR__FILTERTYPE__INTERNAL' filter 'Migration-Tsc-Frequency' (correlation id: null)
2020-12-11 15:56:03,424+01 INFO [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (default task-36) [e4801b28-c832-4474-aa53-4ebfd7c6e2d0] Candidate host 'ov200' ('949d0087-2c24-4759-8427-f9eade1dd2cc') was filtered out by 'VAR__FILTERTYPE__INTERNAL' filter 'Migration-Tsc-Frequency' (correlation id: null)
Can you verify whether it is only my problem?
Apart from the problem itself, what is "TSC frequency" and how can I check whether my 3 hosts actually differ?
Normal VMs are able to migrate without problems.
Thanks, Gianluca

Gianluca Cecchi <gianluca.cecchi@gmail.com> writes:
Hello, I'm on oVirt 4.4.3 and CentOS 8.3 with 3 hosts.
I have a high performance VM that is running on ov300 and is configured to run on any host.
Whether or not I set the option
"Migrate only to hosts with the same TSC frequency",
I am always unable to migrate the VM, and in engine.log I see this:
2020-12-11 15:56:03,424+01 INFO [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (default task-36) [e4801b28-c832-4474-aa53-4ebfd7c6e2d0] Candidate host 'ov301' ('382bfc8f-60d5-4e06-8571-7dae1700574d') was filtered out by 'VAR__FILTERTYPE__INTERNAL' filter 'Migration-Tsc-Frequency' (correlation id: null)
2020-12-11 15:56:03,424+01 INFO [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (default task-36) [e4801b28-c832-4474-aa53-4ebfd7c6e2d0] Candidate host 'ov200' ('949d0087-2c24-4759-8427-f9eade1dd2cc') was filtered out by 'VAR__FILTERTYPE__INTERNAL' filter 'Migration-Tsc-Frequency' (correlation id: null)
Can you verify whether it is only my problem?
Apart from the problem itself, what is "TSC frequency" and how can I check whether my 3 hosts actually differ?
TSC frequency is the frequency with which the Time Stamp Counter register is updated, typically the nominal CPU frequency (see https://en.wikipedia.org/wiki/Time_Stamp_Counter for more details).
You can check the value oVirt gets from libvirt by running
# virsh -r capabilities
and looking for a line like
<counter name='tsc' frequency='2133409000' scaling='no'/>
in the output. Unless frequency scaling is available, the host frequencies must be almost the same in order to be able to migrate high performance VMs among them.
Note there is a bug that may cause a migration failure for VMs even between hosts with the same frequencies (https://bugzilla.redhat.com/1821199). But this is apparently not your case, since the migration is already prevented by Engine.
Regards, Milan
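To compare hosts without reading the whole capabilities XML, the frequency can be pulled out of the `virsh -r capabilities` output with a small filter. This is a minimal sketch, assuming a POSIX shell with `sed` and the single-line `<counter name='tsc' .../>` format shown above; the function name `tsc_freq` is hypothetical, and the commented SSH loop assumes root SSH access to the three hosts named in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print the TSC frequency (Hz) found in
# `virsh -r capabilities` output read from stdin; prints nothing if absent.
tsc_freq() {
  sed -n "s/.*<counter name='tsc' frequency='\([0-9]*\)'.*/\1/p"
}

# Usage against the hosts themselves (assumes root SSH access):
#   for h in ov200 ov300 ov301; do
#     printf '%s: ' "$h"
#     ssh "root@$h" virsh -r capabilities | tsc_freq
#   done

# Self-contained demo using the sample line quoted above:
echo "<counter name='tsc' frequency='2133409000' scaling='no'/>" | tsc_freq
```

Printing one bare number per host makes it easy to spot differences at a glance or to feed the values into further comparisons.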
Normal VMs are able to migrate without problems.
Thanks, Gianluca
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-leave@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/2HYSCVHSVZS6KX...

On Fri, Dec 11, 2020 at 5:39 PM Milan Zamazal <mzamazal@redhat.com> wrote:
TSC frequency is the frequency with which the Time Stamp Counter register is updated, typically the nominal CPU frequency (see https://en.wikipedia.org/wiki/Time_Stamp_Counter for more details).
You can check the value oVirt gets from libvirt by running
# virsh -r capabilities
and looking for a line like
<counter name='tsc' frequency='2133409000' scaling='no'/>
in the output. Unless frequency scaling is available, the host frequencies must be almost the same in order to be able to migrate high performance VMs among them.
Note there is a bug that may cause a migration failure for VMs even between hosts with the same frequencies (https://bugzilla.redhat.com/1821199). But this is apparently not your case, since the migration is already prevented by Engine.
Regards, Milan
See here:
[root@ov200 ~]# virsh -r capabilities | grep "name='tsc'"
<counter name='tsc' frequency='3457996000' scaling='no'/>
[root@ov300 ~]# virsh -r capabilities | grep "name='tsc'"
<counter name='tsc' frequency='3457988000' scaling='no'/>
[root@ov301 ~]# virsh -r capabilities | grep "name='tsc'"
<counter name='tsc' frequency='3457997000' scaling='no'/>
The three hosts have the same CPU model (Model name: Intel(R) Xeon(R) CPU X5690 @ 3.47GHz) and slightly different actual frequencies at any given moment...
But what is the point of the checkbox "Migrate only to hosts with the same TSC frequency" if the migration is prevented even when I don't check it?
BTW the command lscpu produces exactly the same output on the three hosts, apart from "CPU MHz" and the corresponding "BogoMIPS", which change slightly each time I run the command.
And the flags for all are:
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 popcnt aes lahf_lm pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid dtherm ida arat flush_l1d
Gianluca
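To put the three frequencies reported above in perspective, their differences can be expressed in parts per million (ppm). This is only an illustrative sketch: it assumes a threshold of 250 ppm, the default of the KVM kernel module parameter `tsc_tolerance_ppm`; nothing in this thread says which rule oVirt Engine's 'Migration-Tsc-Frequency' filter actually applies, and the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: absolute difference between two TSC frequencies (Hz),
# expressed in parts per million of the second one (integer, truncated).
tsc_ppm_diff() {
  d=$(( $1 - $2 ))
  if [ "$d" -lt 0 ]; then d=$(( -d )); fi
  echo $(( d * 1000000 / $2 ))
}

# Assumption: KVM's default tsc_tolerance_ppm (250) is used as the threshold.
# oVirt Engine's Migration-Tsc-Frequency filter may compare differently.
TOLERANCE_PPM=250

# Exit status 0 when the two frequencies differ by at most TOLERANCE_PPM.
tsc_compatible() {
  [ "$(tsc_ppm_diff "$1" "$2")" -le "$TOLERANCE_PPM" ]
}

# Frequencies reported above for ov200, ov300 and ov301.
printf 'ov300 vs ov200: %s ppm\n' "$(tsc_ppm_diff 3457988000 3457996000)"
printf 'ov300 vs ov301: %s ppm\n' "$(tsc_ppm_diff 3457988000 3457997000)"
if tsc_compatible 3457988000 3457997000; then
  echo "ov300 and ov301 are within ${TOLERANCE_PPM} ppm"
fi
```

Integer division truncates, so both comparisons print 2 ppm (the exact values are about 2.3 and 2.6 ppm), roughly a hundred times below the assumed threshold. Plain POSIX shell arithmetic is used so the check runs on a stock host without extra tools.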

Gianluca Cecchi <gianluca.cecchi@gmail.com> writes:
On Fri, Dec 11, 2020 at 5:39 PM Milan Zamazal <mzamazal@redhat.com> wrote:
TSC frequency is the frequency with which the Time Stamp Counter register is updated, typically the nominal CPU frequency (see https://en.wikipedia.org/wiki/Time_Stamp_Counter for more details).
You can check the value oVirt gets from libvirt by running
# virsh -r capabilities
and looking for a line like
<counter name='tsc' frequency='2133409000' scaling='no'/>
in the output. Unless frequency scaling is available, the host frequencies must be almost the same in order to be able to migrate high performance VMs among them.
Note there is a bug that may cause a migration failure for VMs even between hosts with the same frequencies (https://bugzilla.redhat.com/1821199). But this is apparently not your case, since the migration is already prevented by Engine.
Regards, Milan
See here:
[root@ov200 ~]# virsh -r capabilities | grep "name='tsc'"
<counter name='tsc' frequency='3457996000' scaling='no'/>
[root@ov300 ~]# virsh -r capabilities | grep "name='tsc'"
<counter name='tsc' frequency='3457988000' scaling='no'/>
[root@ov301 ~]# virsh -r capabilities | grep "name='tsc'"
<counter name='tsc' frequency='3457997000' scaling='no'/>
The three hosts have the same CPU model (Model name: Intel(R) Xeon(R) CPU X5690 @ 3.47GHz) and slightly different actual frequencies at any given moment...
OK, so this is actually https://bugzilla.redhat.com/1821199.
But what is the point of the checkbox
"Migrate only to hosts with the same TSC frequency"
if the migration is prevented even when I don't check it?
If the checkbox is unchecked, the migration shouldn't be prevented. I think the TSC frequency shouldn't be written to the VM domain XML in such a case and then there should be no restrictions (and no guarantees) on the frequency. Do you mean you can't migrate even with the checkbox unchecked? If so, what error message do you get in such a case?
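One way to confirm whether a running VM was actually started with a TSC frequency is to inspect its live domain XML on the host that runs it. This sketch assumes the frequency appears as a libvirt `<timer name='tsc' frequency='...'/>` element inside `<clock>`, which is how libvirt's domain XML expresses it; that oVirt writes it exactly this way is an assumption here, and both the function name and the VM name `myvm` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the TSC frequency pinned in a libvirt domain XML
# read from stdin; prints nothing when no TSC frequency is set.
domain_tsc_freq() {
  sed -n "s/.*<timer name='tsc'[^>]*frequency='\([0-9]*\)'.*/\1/p"
}

# Usage on the host currently running the VM ("myvm" is a placeholder):
#   virsh -r dumpxml myvm | domain_tsc_freq

# Demo with an illustrative <clock> element (not taken from a real VM):
printf '%s\n' "<clock offset='utc'>" \
  "  <timer name='tsc' frequency='3457996000'/>" \
  "</clock>" | domain_tsc_freq
```

An empty result for a running VM would be consistent with the checkbox being unchecked, i.e. no TSC frequency restriction in the domain itself.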
BTW the command lscpu produces exactly the same output on the three hosts, apart from "CPU MHz" and the corresponding "BogoMIPS", which change slightly each time I run the command.
Yes, the TSC frequency is measured on each boot and may differ across reboots on the same host.
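Because the value is calibrated at boot, the kernel log of each host usually records what was measured. A minimal sketch, assuming the kernel prints its usual "tsc: Refined TSC clocksource calibration: ... MHz" message (it may be absent on some hosts, for example when no refinement is performed); the function name is hypothetical, and the sample line is illustrative, modeled on ov200's value.

```shell
#!/bin/sh
# Hypothetical helper: print the refined TSC calibration (MHz) from kernel
# log text read on stdin, one line per matching message.
refined_tsc_mhz() {
  sed -n 's/.*Refined TSC clocksource calibration: \([0-9.]*\) MHz.*/\1/p'
}

# Usage on a host (reading the journal may require root):
#   journalctl -k -b | refined_tsc_mhz      # current boot
#   journalctl -k -b -1 | refined_tsc_mhz   # previous boot, if the journal is persistent

# Demo with an illustrative kernel message:
echo "tsc: Refined TSC clocksource calibration: 3457.996 MHz" | refined_tsc_mhz
```

Comparing the value across boots of the same host shows how much the calibration moves between reboots.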
And the flags for all are:
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 popcnt aes lahf_lm pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid dtherm ida arat flush_l1d
Gianluca
Regards, Milan

On Wed, Dec 16, 2020 at 8:59 PM Milan Zamazal <mzamazal@redhat.com> wrote:
If the checkbox is unchecked, the migration shouldn't be prevented. I think the TSC frequency shouldn't be written to the VM domain XML in such a case and then there should be no restrictions (and no guarantees) on the frequency.
Do you mean you can't migrate even with the checkbox unchecked? If so, what error message do you get in such a case?
Yes, exactly. I powered off the VM, disabled the check, and then powered on the VM again; it is now running on host ov301. And I have two other hosts: ov300 and ov200. From the web admin GUI, if I select the VM and click the "migrate" button, I cannot select the destination host: inside the box there are the words "No available host to migrate VMs to". In engine.log, as soon as I click the "migrate" button, I see these new lines:
2020-12-16 23:13:27,949+01 INFO [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (default task-41) [308a29e2-2c4f-45fe-bdce-b032b36d4656] Candidate host 'ov300' ('07b979fb-4779-4477-89f2-6a96093c06f7') was filtered out by 'VAR__FILTERTYPE__INTERNAL' filter 'Migration-Tsc-Frequency' (correlation id: null)
2020-12-16 23:13:27,949+01 INFO [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (default task-41) [308a29e2-2c4f-45fe-bdce-b032b36d4656] Candidate host 'ov200' ('949d0087-2c24-4759-8427-f9eade1dd2cc') was filtered out by 'VAR__FILTERTYPE__INTERNAL' filter 'Migration-Tsc-Frequency' (correlation id: null)
2020-12-16 23:13:28,032+01 INFO [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (default task-38) [5837b695-c70d-4f45-a452-2c7c1b4ea69b] Candidate host 'ov300' ('07b979fb-4779-4477-89f2-6a96093c06f7') was filtered out by 'VAR__FILTERTYPE__INTERNAL' filter 'Migration-Tsc-Frequency' (correlation id: null)
2020-12-16 23:13:28,032+01 INFO [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (default task-38) [5837b695-c70d-4f45-a452-2c7c1b4ea69b] Candidate host 'ov200' ('949d0087-2c24-4759-8427-f9eade1dd2cc') was filtered out by 'VAR__FILTERTYPE__INTERNAL' filter 'Migration-Tsc-Frequency' (correlation id: null)
On all three nodes I have these running kernel and package versions:
[root@ov300 vdsm]# rpm -q qemu-kvm libvirt-daemon systemd
qemu-kvm-4.2.0-34.module_el8.3.0+555+a55c8938.x86_64
libvirt-daemon-6.0.0-28.module_el8.3.0+555+a55c8938.x86_64
systemd-239-41.el8_3.x86_64
and
[root@ov300 vdsm]# uname -r
4.18.0-240.1.1.el8_3.x86_64
Gianluca
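When several migration attempts pile up in engine.log, a short filter can summarize which host was rejected by which scheduling filter. A minimal sketch, assuming the "Candidate host '...' ... filter '...'" line format quoted above and the default Engine log location `/var/log/ovirt-engine/engine.log`; the function name `filtered_hosts` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: from engine.log text on stdin, print "<host> <filter>"
# for every "Candidate host ... was filtered out by ... filter ..." line.
filtered_hosts() {
  sed -n "s/.*Candidate host '\([^']*\)'.* filter '\([^']*\)'.*/\1 \2/p"
}

# Usage on the Engine machine (default log location assumed), counting how
# often each host/filter pair appears:
#   filtered_hosts < /var/log/ovirt-engine/engine.log | sort | uniq -c

# Demo with one of the log lines quoted above:
printf '%s\n' "2020-12-16 23:13:27,949+01 INFO [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (default task-41) [308a29e2-2c4f-45fe-bdce-b032b36d4656] Candidate host 'ov300' ('07b979fb-4779-4477-89f2-6a96093c06f7') was filtered out by 'VAR__FILTERTYPE__INTERNAL' filter 'Migration-Tsc-Frequency' (correlation id: null)" | filtered_hosts
```

If every rejected host shows only 'Migration-Tsc-Frequency', the TSC filter is the sole reason no destination is offered.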

Gianluca Cecchi <gianluca.cecchi@gmail.com> writes:
On Wed, Dec 16, 2020 at 8:59 PM Milan Zamazal <mzamazal@redhat.com> wrote:
If the checkbox is unchecked, the migration shouldn't be prevented. I think the TSC frequency shouldn't be written to the VM domain XML in such a case and then there should be no restrictions (and no guarantees) on the frequency.
Do you mean you can't migrate even with the checkbox unchecked? If so, what error message do you get in such a case?
Yes, exactly. I powered off the VM, disabled the check, and then powered on the VM again; it is now running on host ov301. And I have two other hosts: ov300 and ov200. From the web admin GUI, if I select the VM and click the "migrate" button, I cannot select the destination host: inside the box there are the words "No available host to migrate VMs to". In engine.log, as soon as I click the "migrate" button, I see these new lines:
I see, I can reproduce it. It looks like a bug in Engine. While the VM is correctly started without TSC frequency set, the migration filter in Engine apparently still applies.
I'll add a note about it to the TSC migration bug.
Regards, Milan
2020-12-16 23:13:27,949+01 INFO [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (default task-41) [308a29e2-2c4f-45fe-bdce-b032b36d4656] Candidate host 'ov300' ('07b979fb-4779-4477-89f2-6a96093c06f7') was filtered out by 'VAR__FILTERTYPE__INTERNAL' filter 'Migration-Tsc-Frequency' (correlation id: null)
2020-12-16 23:13:27,949+01 INFO [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (default task-41) [308a29e2-2c4f-45fe-bdce-b032b36d4656] Candidate host 'ov200' ('949d0087-2c24-4759-8427-f9eade1dd2cc') was filtered out by 'VAR__FILTERTYPE__INTERNAL' filter 'Migration-Tsc-Frequency' (correlation id: null)
2020-12-16 23:13:28,032+01 INFO [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (default task-38) [5837b695-c70d-4f45-a452-2c7c1b4ea69b] Candidate host 'ov300' ('07b979fb-4779-4477-89f2-6a96093c06f7') was filtered out by 'VAR__FILTERTYPE__INTERNAL' filter 'Migration-Tsc-Frequency' (correlation id: null)
2020-12-16 23:13:28,032+01 INFO [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (default task-38) [5837b695-c70d-4f45-a452-2c7c1b4ea69b] Candidate host 'ov200' ('949d0087-2c24-4759-8427-f9eade1dd2cc') was filtered out by 'VAR__FILTERTYPE__INTERNAL' filter 'Migration-Tsc-Frequency' (correlation id: null)
On all three nodes I have these running kernel and package versions:
[root@ov300 vdsm]# rpm -q qemu-kvm libvirt-daemon systemd
qemu-kvm-4.2.0-34.module_el8.3.0+555+a55c8938.x86_64
libvirt-daemon-6.0.0-28.module_el8.3.0+555+a55c8938.x86_64
systemd-239-41.el8_3.x86_64
and
[root@ov300 vdsm]# uname -r
4.18.0-240.1.1.el8_3.x86_64
Gianluca

On Thu, Dec 17, 2020 at 5:30 PM Milan Zamazal <mzamazal@redhat.com> wrote:
Gianluca Cecchi <gianluca.cecchi@gmail.com> writes:
On Wed, Dec 16, 2020 at 8:59 PM Milan Zamazal <mzamazal@redhat.com> wrote:
If the checkbox is unchecked, the migration shouldn't be prevented. I think the TSC frequency shouldn't be written to the VM domain XML in such a case and then there should be no restrictions (and no guarantees) on the frequency.
Do you mean you can't migrate even with the checkbox unchecked? If so, what error message do you get in such a case?
Yes, exactly. I powered off the VM, disabled the check, and then powered on the VM again; it is now running on host ov301. And I have two other hosts: ov300 and ov200. From the web admin GUI, if I select the VM and click the "migrate" button, I cannot select the destination host: inside the box there are the words "No available host to migrate VMs to". In engine.log, as soon as I click the "migrate" button, I see these new lines:
I see, I can reproduce it. It looks like a bug in Engine. While the VM is correctly started without TSC frequency set, the migration filter in Engine apparently still applies.
I'll add a note about it to the TSC migration bug.
Regards, Milan
OK, thanks. In the meantime, is there any workaround that would let me migrate the VM? E.g., I could set the VM as non-high-performance, or is there a better option?
Gianluca

Gianluca Cecchi <gianluca.cecchi@gmail.com> writes:
On Thu, Dec 17, 2020 at 5:30 PM Milan Zamazal <mzamazal@redhat.com> wrote:
Gianluca Cecchi <gianluca.cecchi@gmail.com> writes:
On Wed, Dec 16, 2020 at 8:59 PM Milan Zamazal <mzamazal@redhat.com> wrote:
If the checkbox is unchecked, the migration shouldn't be prevented. I think the TSC frequency shouldn't be written to the VM domain XML in such a case and then there should be no restrictions (and no guarantees) on the frequency.
Do you mean you can't migrate even with the checkbox unchecked? If so, what error message do you get in such a case?
Yes, exactly. I powered off the VM, disabled the check, and then powered on the VM again; it is now running on host ov301. And I have two other hosts: ov300 and ov200. From the web admin GUI, if I select the VM and click the "migrate" button, I cannot select the destination host: inside the box there are the words "No available host to migrate VMs to". In engine.log, as soon as I click the "migrate" button, I see these new lines:
I see, I can reproduce it. It looks like a bug in Engine. While the VM is correctly started without TSC frequency set, the migration filter in Engine apparently still applies.
I'll add a note about it to the TSC migration bug.
Regards, Milan
OK, thanks. In the meantime, is there any workaround that would let me migrate the VM? E.g., I could set the VM as non-high-performance, or is there a better option?
Non-high-performance VMs should migrate fine, but changing the VM kind requires a restart. Once a high performance VM is running, I don't know of any good way to avoid the TSC constraint.
Regards, Milan

I'd put my money on a fall-through error condition where TSC is simply the last one with a 'good' error message attached. I have clusters with CPUs that are 10 years and 10x apart in performance migrating between each other quite happily (from dual quad-core Sandy Bridge to 56-core Skylake), as long as you make sure the cluster and default machine types are set low enough. OK, this is still 4.3, but... So what if you start the VM on the 'weak' host first? Can it then move freely?
participants (3)
- Gianluca Cecchi
- Milan Zamazal
- thomas@hoberg.net