Hi Gianluca,

I am facing two different cases. Let's call the first case "stuck VM" and the second "fake 100% CPU". In both cases I have verified that I have no storage issues: the Gluster volumes are up and accessible, and other VMs (Windows 10 and Windows Server 2016) are running normally. I have observed the "stuck VM" case more rarely. For the "fake 100% CPU" case, I suspect it could be something with the guest agent drivers or something between qemu and Windows 10, since I've never seen this with Windows Server 2016 or Linux VMs.
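
For what it's worth, the storage check I ran amounts to the commands below (just a rough sketch, wrapped in Python only so it can be reused from a monitoring script; the volume name "data" is a placeholder for my actual volumes):

    import subprocess

    # Minimal sketch: run the usual gluster CLI health checks for one volume.
    # "data" is a placeholder volume name; adjust to the real volume(s).
    VOLUME = "data"

    for cmd in (
        ["gluster", "volume", "status", VOLUME],        # are all bricks online?
        ["gluster", "volume", "heal", VOLUME, "info"],  # any entries pending heal?
    ):
        print("$ " + " ".join(cmd))
        result = subprocess.run(cmd, capture_output=True, text=True)
        print(result.stdout)
        if result.returncode != 0:
            print("command failed:", result.stderr)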

Alex

On Mon, Jan 1, 2018 at 9:56 PM, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Mon, Jan 1, 2018 at 8:43 PM, Alex K <rightkicktech@gmail.com> wrote:
Hi all and Happy New Year!

I have an oVirt 4.1.3.5 cluster (running with 3 nodes and shared Gluster storage).
I have randomly observed that some Windows 10 64-bit VMs are reported by the engine dashboard as running at 100% CPU, while connecting to the VM shows that the CPU utilization inside is normal.
Sometimes, when a VM is reported at 100% CPU, I cannot get a console to it (the console gives a black screen) and I have to force shut down the VM and start it up again. The only warning I see is in the qemu logs of the guest, reporting that some CPUs are not present in any NUMA node.
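
In case it helps to reproduce the discrepancy, the engine-side CPU figures can be compared with the guest-side ones through the oVirt Python SDK, roughly as below (only a sketch: the engine URL, credentials, VM name and the exact statistic names are placeholders/assumptions and may differ between versions):

    import ovirtsdk4 as sdk

    # Sketch only: dump the per-VM CPU statistics the engine knows about,
    # to compare with what Task Manager shows inside the guest.
    # URL, credentials and VM name below are placeholders.
    connection = sdk.Connection(
        url="https://engine.example.com/ovirt-engine/api",
        username="admin@internal",
        password="secret",
        insecure=True,  # or ca_file="/etc/pki/ovirt-engine/ca.pem"
    )

    vms_service = connection.system_service().vms_service()
    vm = vms_service.list(search="name=win10-test")[0]
    stats_service = vms_service.vm_service(vm.id).statistics_service()

    for stat in stats_service.list():
        # Names such as cpu.current.total / cpu.current.guest may vary between
        # oVirt versions, so just print everything CPU-related and compare.
        if stat.name.startswith("cpu."):
            print(stat.name, stat.values[0].datum)

    connection.close()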

Any ideas how to tackle this?

Thanx,
Alex

 
Hi Alex,
I have seen something similar, but in an iSCSI domain environment rather than a GlusterFS one, when I had problems with the storage array (in my case it was a firmware update that took too long) and the VMs were paused and then reactivated again after a few seconds.
For some of them the related qemu-kvm process went to a fixed 100% CPU usage and I was unable to open a SPICE console (black screen). But in my case the VM itself was also stuck: I was unable to reach it via network or ping it.
I had to force power off the VM and power it on again. Some other VMs resumed from the paused state without any apparent problem (apart from their clocks being out of sync).
Both the good and the bad VMs had the oVirt guest agent running; they were CentOS 6.5 VMs.
Perhaps your situation is something in between... verify that you didn't have any problem with your storage and that your problematic VM had not been paused/resumed because of that.
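
For example, a rough way to spot pause/resume events for the VM in the engine event log with the oVirt Python SDK (connection details and the VM name are placeholders, and I match on the event descriptions rather than on specific event codes):

    import ovirtsdk4 as sdk

    # Sketch: list recent engine events attached to one VM and keep those
    # that look like pause/resume notifications (e.g. after an I/O error).
    # URL, credentials and VM name are placeholders.
    connection = sdk.Connection(
        url="https://engine.example.com/ovirt-engine/api",
        username="admin@internal",
        password="secret",
        insecure=True,
    )

    system_service = connection.system_service()
    vm = system_service.vms_service().list(search="name=win10-test")[0]

    for event in system_service.events_service().list(max=500):
        if event.vm is not None and event.vm.id == vm.id:
            desc = (event.description or "").lower()
            if "paused" in desc or "resumed" in desc or "recovered" in desc:
                print(event.time, event.description)

    connection.close()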

Gianluca