Thanks for writing this. Some thoughts inline, below. Also, cc'ing some lists
in case other folks want to participate in the discussion.
On Tue, Oct 09, 2012 at 01:12:30PM -0400, Noam Slomianko wrote:
Greetings,
I've fiddled around with ballooning and wanted to raise a question for debate.
Currently, as long as the host is under memory pressure, MOM will try to reclaim
memory from all guests with more free memory than a given threshold.
Main issue: Guest allocated memory is not the same as the resident (physical) memory used
by qemu.
This means that when memory is reclaimed (the balloon is inflated) we might not get
as much memory back as planned (or none at all).
*Example 1: no memory is reclaimed

name | allocated memory | used by the vm | resident memory used in the host by qemu
Vm1  | 4G               | 4G             | 4G
Vm2  | 4G               | 1G             | 1G

- MOM will inflate the balloon in Vm2 (since Vm1 has no free memory to reclaim) and will gain no memory
One thing to keep in mind is that VMs having less host RSS than their memory
allocation is a temporary condition. All VMs will eventually consume their full
allocation if allowed to run. I'd be curious to know how long this process
takes in general.
We might be able to handle this case by refusing to inflate the balloon if:
(VM free memory - planned balloon inflation) > host RSS
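A minimal sketch of that refusal rule in Python (the function name, parameter names, and the assumption that all values are in kB are illustrative, not existing MOM code):

```python
def should_inflate(vm_free_kb, planned_inflation_kb, host_rss_kb):
    """Return True when inflating the balloon is expected to actually
    give memory back to the host.

    Encodes the proposed rule: refuse the inflation when
    (VM free memory - planned balloon inflation) > host RSS, i.e. when
    the pages we would reclaim are likely not resident in the host anyway.
    """
    return (vm_free_kb - planned_inflation_kb) <= host_rss_kb
```

With the numbers from Example 1 above, inflating Vm2's balloon would be refused because its free guest memory far exceeds its host RSS.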
*Example 2: memory is reclaimed only partially

name | allocated memory | used by the vm | resident memory used in the host by qemu
Vm1  | 4G               | 4G             | 4G
Vm2  | 4G               | 1G             | 1G
Vm3  | 4G               | 1G             | 4G

- MOM will inflate the balloon in Vm2 and Vm3, slowly gaining memory only from Vm3
The above rule extension may help here too.
This behaviour might cause us to:
* spend time reclaiming memory from many guests when we can reclaim only from a
subgroup
* be under the impression that we have more potential memory to reclaim than we
actually do
* bring inactive VMs dangerously low as they are constantly reclaimed (I've had
guests crash from kernel out-of-memory errors)
To address this I suggest that we collect guest memory stats from libvirt as well, so we
have the option to use them in our calculations.
This can be achieved with the command "virsh dommemstat <domain>", which returns:

    actual 3915372    (kB allocated to the guest)
    rss 2141580       (kB of resident memory used by qemu)
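For illustration, a small sketch of turning that output into numbers a policy could use; the helper name is made up, and note that the libvirt Python bindings' virDomain.memoryStats() call returns the same data as a dict without shelling out to virsh:

```python
def parse_dommemstat(output):
    """Parse the plain-text output of `virsh dommemstat <domain>` into a
    dict of integer values in kB, e.g. {'actual': 3915372, 'rss': 2141580}.

    Lines that are not a "name value" pair are ignored.
    """
    stats = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats
```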
I would suggest adding these two fields to the VmStats that are collected by
vdsm. Then, to try it out, add the fields to the GuestMemory Collector. (Note:
MOM does have a collector that gathers RSS for VMs. It's called GuestQemuProc).
You can then extend the Balloon policy with a snippet that checks whether a
proposed balloon adjustment should actually be carried out. The
change_big_enough function would be a natural place for that logic.
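MOM policies are written in MOM's own policy language rather than plain Python, so the following is only a sketch of the decision logic such an extended check might implement; the threshold value and all parameter names are invented for illustration:

```python
CHANGE_THRESHOLD = 0.005  # hypothetical: ignore changes under 0.5% of current size

def change_big_enough(current_kb, proposed_kb, guest_free_kb, host_rss_kb):
    """Sketch of change_big_enough extended with an RSS-aware guard:
    an inflation only proceeds when it is both large enough to matter
    and expected to return resident memory to the host."""
    delta = current_kb - proposed_kb          # positive when inflating
    big_enough = abs(delta) > current_kb * CHANGE_THRESHOLD
    if delta > 0:
        # Inflating: skip it if the reclaimed pages were never resident.
        returns_memory = (guest_free_kb - delta) <= host_rss_kb
        return big_enough and returns_memory
    return big_enough
```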
Additional topic:
* should we include per-guest config? (for example, a hard minimum memory cap:
"this VM cannot run effectively with less than 1G of memory")
Yes. This is probably something we want to do. There is a whole topic around
VM tagging that we should consider. In the future we will want to be able to do
many different things in policy based on a VM's tag. For example, some VMs may
be completely exempt from ballooning. Others may have a minimum limit.
I want to avoid passing in the raw guest configuration because MOM needs to work
with direct libvirt VMs and with ovirt/vdsm VMs. Therefore, we want to think
carefully about the abstractions we use when presenting VM properties to MOM.
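As a sketch of what such an abstraction might look like (all names and fields here are hypothetical, not an existing MOM interface):

```python
from dataclasses import dataclass

@dataclass
class VmBalloonPolicy:
    """Hypothetical per-VM properties MOM could expose to policies,
    independent of whether the VM comes from plain libvirt or ovirt/vdsm."""
    exempt: bool = False          # never balloon this VM
    min_guaranteed_kb: int = 0    # hard floor for the balloon target

def clamp_balloon_target(policy, proposed_kb, current_kb):
    """Apply the per-VM constraints to a proposed balloon target (kB)."""
    if policy.exempt:
        return current_kb         # leave the balloon where it is
    return max(proposed_kb, policy.min_guaranteed_kb)
```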
--
Adam Litke <agl(a)us.ibm.com>
IBM Linux Technology Center