freezing VMs

13 Oct 2024

Hey,

I'm wondering if anyone else is experiencing freezing VMs, especially Windows servers and especially ones with a lot of RAM.

I found the following here:
https://forum.proxmox.com/threads/vms-freeze-with-100-cpu.127459/page-11

Post #218 and the posts that follow are really interesting. The conclusion is that there might be a problem in kernels from 4.18.0-372.26.1 all the way up to the mainline 6.3 or LTS 6.1 kernels. The problem appears to be that mmu_notifier_seq is handled as an integer in the is_page_fault_stale() function, causing KVM to freeze once the counter runs past the maximum signed 32-bit integer value, 2,147,483,647.
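To make the suspected failure mode concrete, here is a small Python sketch of how I understand it. It is only a model, not the kernel code: the helper names (as_c_int, is_stale_buggy, is_stale_fixed) are made up for illustration, and I am assuming the snapshot of the sequence counter gets squeezed through a signed 32-bit int before it is compared with the full-width counter again.

```python
import ctypes

INT_MAX = 2**31 - 1  # 2,147,483,647


def as_c_int(value: int) -> int:
    """Model what happens when a wide counter is passed through a C 'int'."""
    return ctypes.c_int32(value).value


def is_stale_buggy(current_seq: int, snapshot_seq: int) -> bool:
    # Hypothetical model of the bug: the snapshot is truncated to 32 bits,
    # then compared against the full-width counter.
    return current_seq != as_c_int(snapshot_seq)


def is_stale_fixed(current_seq: int, snapshot_seq: int) -> bool:
    # Hypothetical model of the fix: both sides keep their full width.
    return current_seq != snapshot_seq


# While the counter still fits in an int, an unchanged counter is "not stale".
print(is_stale_buggy(INT_MAX, INT_MAX))

# One step further, the truncated snapshot becomes negative and can never
# match again, so the fault would be treated as stale on every retry.
print(as_c_int(INT_MAX + 1))
print(is_stale_buggy(INT_MAX + 1, INT_MAX + 1))
print(is_stale_fixed(INT_MAX + 1, INT_MAX + 1))
```

If the model is right, a page fault that is always considered stale would be retried forever, which would match the 100% CPU freezes described in the linked thread.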

Post #220 seems to state that it was fixed in kernel commit ba6e3fe25543.

The problem is that my nodes on CentOS 8 are running the latest kernel available to them, 4.18.0-408.el8.x86_64, which is affected. I could try to downgrade the kernel on those nodes, but that would lock me to the unsupported CentOS 8 oVirt nodes.

I tried a new oVirt node based on CentOS 9, and it comes with 5.14.0-514.el9.x86_64, which is also affected by the looks of it. I tried upgrading the kernel on CentOS 9 to the latest 6.1 LTS kernel or the latest 6.11 mainline kernel, and while the node itself works fine, it does not work for oVirt: the node cannot be activated once the new kernel is in use.

As I'm a fixer and not a developer, I think the task might be too big for me to fix oVirt and make it work with 6.1/6.3 kernels. My last attempt is going to be to backport the fix to the 5.14 kernel supplied with the CentOS 9 based oVirt nodes.

I know... I should probably look for a new solution, but oVirt has been running our many VMs quite well, at an affordable price. Yes, we have more work fixing the various issues that pop up from time to time, but if left alone, it works quite well and is stable.

Has anyone else encountered these issues?

//J

change_jeeringly679＠dralias.com
