[Engine-devel] Mom Balloon policy issue
Yaniv Kaul
ykaul at redhat.com
Wed Oct 10 09:33:20 UTC 2012
On 10/10/2012 11:23 AM, Noam Slomianko wrote:
> Regarding qemu-kvm memory allocation behaviour, this is the clearest explanation I found:
>
> "Unless using explicit hugepages for the guest, KVM will
> *not* initially allocate all RAM from the host OS. Each
> page of RAM is only allocated when the guest OS touches
> that page. So if you set memory=8G and currentMemory=4G,
> although you might expect resident usage of KVM on the
> host to be 4 G, it can in fact be pretty much any value
> between 0 and 4 GB depending on whether the guest OS
> has touched the memory pages, or as much as 8 GB if the
> guest OS has no balloon driver and touches all memory
> immediately."[1]
Windows zeros out all memory on startup, so it essentially touches all
of its pages right away.
Y.
>
> And as far as I've seen, it behaves the same after the guest is ballooned and then deflated.
>
> Conclusion: the time at which a guest is allocated its resident memory is not deterministic; it depends on usage and on the guest OS.
> For example, Windows 7 uses all available memory as cache while Fedora does not.
> So an inactive guest running Windows 7 will consume all possible resident memory right away, while an inactive guest running Fedora might never grow in memory size.
>
> [1] http://www.redhat.com/archives/virt-tools-list/2011-January/msg00001.html
>
> ------------------
> Noam Slomianko
> Red Hat Enterprise Virtualization, SLA team
>
> ----- Original Message -----
> From: "Adam Litke" <agl at us.ibm.com>
> To: "Noam Slomianko" <nslomian at redhat.com>
> Cc: "Doron Fediuck" <dfediuck at redhat.com>, vdsm-devel at lists.fedorahosted.org, engine-devel at ovirt.org
> Sent: Tuesday, October 9, 2012 8:06:02 PM
> Subject: Re: Mom Balloon policy issue
>
> Thanks for writing this. Some thoughts inline, below. Also, cc'ing some lists
> in case other folks want to participate in the discussion.
>
> On Tue, Oct 09, 2012 at 01:12:30PM -0400, Noam Slomianko wrote:
>> Greetings,
>>
>> I've fiddled around with ballooning and wanted to raise a question for debate.
>>
>> Currently, as long as the host is under memory pressure, MOM will try to reclaim memory from every guest with more free memory than a given threshold.
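>>
>> A minimal sketch of that selection rule in Python (the guest attribute names are illustrative, not MOM's actual stats):
>>
>>     def balloon_candidates(guests, free_threshold):
>>         # Under host memory pressure, MOM targets every guest whose
>>         # free memory exceeds the configured threshold.
>>         return [g for g in guests if g.free_mem > free_threshold]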
>>
>> Main issue: guest allocated memory is not the same as the resident (physical) memory used by qemu.
>> This means that when memory is reclaimed (the balloon is inflated) we might not get as much memory back as planned (or none at all).
>>
>> *Example 1: no memory is reclaimed:
>> name | allocated memory | used by the VM | resident memory used on the host by qemu
>> Vm1  | 4G               | 4G             | 4G
>> Vm2  | 4G               | 1G             | 1G
>> - MOM will inflate the balloon in Vm2 (as Vm1 has no free memory) and will gain no memory, since Vm2's qemu RSS is already only 1G
> One thing to keep in mind is that VMs having less host RSS than their memory
> allocation is a temporary condition. All VMs will eventually consume their full
> allocation if allowed to run. I'd be curious to know how long this process
> takes in general.
>
> We might be able to handle this case by refusing to inflate the balloon if:
> (VM free memory - planned balloon inflation) > host RSS
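>
> A minimal sketch of that check in Python (variable names are illustrative):
>
>     def should_inflate(vm_free, planned_inflation, host_rss):
>         # If the guest's free memory minus the planned inflation still
>         # exceeds the qemu RSS on the host, the pages we would reclaim
>         # are largely not resident anyway, so refuse to inflate.
>         return (vm_free - planned_inflation) <= host_rss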
>
>
>> *Example 2: memory is reclaimed only partially:
>> name | allocated memory | used by the VM | resident memory used on the host by qemu
>> Vm1  | 4G               | 4G             | 4G
>> Vm2  | 4G               | 1G             | 1G
>> Vm3  | 4G               | 1G             | 4G
>> - MOM will inflate the balloons in Vm2 and Vm3, slowly gaining memory only from Vm3
> The above rule extension may help here too.
>
>> This behaviour might cause us to:
>> * spend time reclaiming memory from many guests when we can actually reclaim only from a subset
>> * be under the impression that we have more potential memory to reclaim than we actually do
>> * bring inactive VMs dangerously low on memory as they are constantly reclaimed from (I've had guests crash from kernel out-of-memory errors)
>>
>>
>> To address this I suggest that we collect guest memory stats from libvirt as well, so we have the option to use them in our calculations.
>> This can be achieved with the command "virsh dommemstat <domain>", which returns (values in KiB):
>> actual 3915372 (allocated)
>> rss 2141580 (resident memory used by qemu)
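>>
>> The same two numbers are also available programmatically through the libvirt Python bindings, e.g. (the domain name here is only an example):
>>
>>     import libvirt
>>
>>     conn = libvirt.open('qemu:///system')
>>     dom = conn.lookupByName('Vm2')
>>     stats = dom.memoryStats()    # values are reported in KiB
>>     actual = stats['actual']     # balloon size, e.g. 3915372
>>     rss = stats.get('rss')       # resident memory used by qemu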
> I would suggest adding these two fields to the VmStats that are collected by
> vdsm. Then, to try it out, add the fields to the GuestMemory Collector. (Note:
> MOM does have a collector that gathers RSS for VMs; it's called GuestQemuProc.)
> You can then extend the Balloon policy with a snippet that checks whether the
> proposed balloon adjustment should be carried out. You could add the logic to
> the change_big_enough function.
>
>> additional topic:
>> * should we include per guest config (for example a hard minimum memory cap, this vm cannot run effectively with less then 1G memory)
> Yes. This is probably something we want to do. There is a whole topic around
> VM tagging that we should consider. In the future we will want to be able to do
> many different things in policy based on a VM's tag. For example, some VMs may
> be completely exempt from ballooning. Others may have a minimum limit.
>
> I want to avoid passing in the raw guest configuration because MOM needs to work
> with direct libvirt VMs and with ovirt/vdsm VMs. Therefore, we want to think
> carefully about the abstractions we use when presenting VM properties to MOM.
>