[ovirt-users] Re: Problem with Ovirt Machines

Sunday, 31 May 2020

On 31 May 2020 at 15:52:14 GMT+03:00, aigini82(a)gmail.com wrote:
...
Hi,

Our company uses oVirt to host some of its virtual machines. The
version used is 4.2.6.4-1.el7. There are about 36 virtual hosts in it.
The specifications of the host machine are 30G RAM and 6 CPUs.
Some of the VMs on the oVirt host run with 4 CPUs, some with 2 CPUs.

The problem I face now is that recently there was a need for high CPU
and memory specs to set up a VM for DR. I created a VM with 16G RAM and
6 CPUs, without first checking the CPUs available in the host. After
DR, the VM was brought down. Later, another person in the
team brought the VM back up for a different DR use, a much larger
DB restoration.

This caused the VM to pause due to storage error. And then worse things
happened, whereby 2 other VMs inadvertently went down. Although I
assumed that this was caused by storage errors/problems, the senior
admins in the team concluded that the problem was due to fencing
because of the max allotted CPU for the host being used for the VM. 

Check the libvirt logs on the host where the VM was running. In the engine, you can
check the logs for any fencing, but I have never seen "excessive CPU allocation"
cause fencing.
Either the VM passes the checks (overcommit rules, scheduling, etc.) and starts
running, or the engine refuses to power it up.
Also check via journalctl for any messages from 'sanlock.service' at that time.
Any issues (storage unavailable or high latency detected) will be reported by the
sanlock service on the affected node.
If you use multipath, check whether it also reported any failing paths.
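
As a rough sketch of the journalctl check above, the snippet below only builds the command line for a given time window and prints it, so it can be reviewed before being run on the affected host. The helper name and the timestamps are hypothetical; the window should be replaced with the time the VM actually paused.

```python
import shlex


def sanlock_journal_cmd(since, until):
    """Build a journalctl command limited to sanlock.service and a time window."""
    return [
        "journalctl", "--no-pager",
        "-u", "sanlock.service",
        "--since", since,
        "--until", until,
    ]


# Hypothetical window; replace with the time the VM paused.
cmd = sanlock_journal_cmd("2020-05-31 12:00:00", "2020-05-31 13:00:00")

# Print the command so it can be copied to the affected node.
print(shlex.join(cmd))
```

If multipath is in use, `multipath -ll` on the same host lists the current state of each path.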

...
Now what I need to know is how to properly allocate CPU resources to a
host to run multiple virtual machines in it like the situation above.

The best way is to start with as few CPUs as possible.
Here is a short (or maybe not) example:
The hypervisor has 8 CPUs/8 threads. The first VM has 1 CPU, the second has 6 CPUs
allocated, and a third VM has 8 CPUs allocated.
For the hypervisor to allocate CPU time to the third (beefy) VM, it needs to have all 8
CPUs available. As the host itself has 8 cores and usually some OS stuff is going on,
the third VM will receive far less CPU time than the first and second VMs.
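
To put numbers on the example above, here is a minimal sketch that adds up the allocated vCPUs and flags any VM that is as wide as the host. The VM names and helper functions are hypothetical, and this is plain arithmetic, not how the oVirt scheduler itself decides placement.

```python
# Numbers taken from the example above: an 8-thread host
# running three VMs with 1, 6 and 8 vCPUs. VM names are hypothetical.
HOST_THREADS = 8
VM_VCPUS = {"vm1": 1, "vm2": 6, "vm3": 8}


def overcommit_ratio(host_threads, vm_vcpus):
    """Return total allocated vCPUs divided by host threads."""
    return sum(vm_vcpus.values()) / host_threads


def widest_vms(host_threads, vm_vcpus):
    """Return VMs whose vCPU count equals or exceeds the host thread count."""
    return sorted(name for name, n in vm_vcpus.items() if n >= host_threads)


ratio = overcommit_ratio(HOST_THREADS, VM_VCPUS)
print(f"allocated vCPUs: {sum(VM_VCPUS.values())}, ratio: {ratio:.2f}")
print("VMs as wide as the host:", widest_vms(HOST_THREADS, VM_VCPUS))
```

A VM that shows up in the second list is the kind that has to wait for every host thread to be free, which is why starting small and growing the vCPU count only when needed tends to work better.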

...
I even tried to look for errors in vdsm.log, but this log was not
available on the host machine or in the affected VM. My colleague
asked me to check the "Events" section of the oVirt management interface to
see the past events. However, I don't find many details about the
fencing activity, how the fencing occurred, or what caused the
fencing.

And how did they conclude that the CPU count caused the fencing and not
the storage? 

Interesting question... I think they just assumed. In the worst-case scenario (CPU
starvation), the vdsmd service might not respond to the engine, but then a 'soft'
reboot will happen, where the engine restarts this service over ssh.
...

Best Regards,
Strahil Nikolov
