[Engine-devel] [vdsm] Mom Balloon policy issue
Dor Laor
dlaor at redhat.com
Wed Oct 10 12:36:11 UTC 2012
On 10/10/2012 11:23 AM, Noam Slomianko wrote:
> Regarding qemu-kvm memory allocation behaviour, this is the clearest explanation I found:
>
> "Unless using explicit hugepages for the guest, KVM will
> *not* initially allocate all RAM from the host OS. Each
> page of RAM is only allocated when the guest OS touches
> that page. So if you set memory=8G and currentMemory=4G,
> although you might expect resident usage of KVM on the
> host to be 4 G, it can in fact be pretty much any value
> between 0 and 4 GB depending on whether the guest OS
> has touched the memory pages, or as much as 8 GB if the
> guest OS has no balloon driver and touches all memory
> immediately."[1]
A simpler explanation is that guest RAM is allocated with demand paging,
so no page is instantiated before the guest first touches it. The same
holds for huge pages as long as prealloc isn't used.
>
> And as far as I've seen, it behaves the same after the guest is ballooned and then deflated.
>
> Conclusion: the time at which the guest's resident memory is actually allocated is non-deterministic and depends on workload and OS.
> For example, Windows 7 uses all available memory as cache while Fedora does not.
(Not as cache; Windows zeroes the pages.)
> So an inactive guest running windows7 will consume all possible resident memory right away while an inactive guest running fedora might never grow in memory size.
It takes time for win7 to touch all the pages (during its boot) and for
KVM to allocate them, so it's not instantaneous.
One can measure the exact current size of qemu's RSS so MOM can be
aware of it.
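As an aside, qemu's current RSS is readable straight from /proc; a minimal sketch (how the pid is obtained from libvirt/vdsm is left out here):

```python
def read_rss_kb(pid):
    """Return the process's resident set size in kB, as reported by
    the VmRSS field of /proc/<pid>/status, or None if not found."""
    with open("/proc/%d/status" % pid) as f:
        for line in f:
            if line.startswith("VmRSS:"):
                # Line looks like: "VmRSS:   2141580 kB"
                return int(line.split()[1])
    return None
```

MOM could sample this per qemu process to know exactly how much host memory each guest really occupies.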
>
> [1] http://www.redhat.com/archives/virt-tools-list/2011-January/msg00001.html
>
> ------------------
> Noam Slomianko
> Red Hat Enterprise Virtualization, SLA team
>
> ----- Original Message -----
> From: "Adam Litke" <agl at us.ibm.com>
> To: "Noam Slomianko" <nslomian at redhat.com>
> Cc: "Doron Fediuck" <dfediuck at redhat.com>, vdsm-devel at lists.fedorahosted.org, engine-devel at ovirt.org
> Sent: Tuesday, October 9, 2012 8:06:02 PM
> Subject: Re: Mom Balloon policy issue
>
> Thanks for writing this. Some thoughts inline, below. Also, cc'ing some lists
> in case other folks want to participate in the discussion.
>
> On Tue, Oct 09, 2012 at 01:12:30PM -0400, Noam Slomianko wrote:
>> Greetings,
>>
>> I've fiddled around with ballooning and wanted to raise a question for debate.
>>
>> Currently, as long as the host is under memory pressure, MOM will try to reclaim memory from every guest whose free memory exceeds a given threshold.
>>
>> Main issue: the memory allocated to a guest is not the same as the resident (physical) memory used by qemu.
>> This means that when memory is reclaimed (the balloon is inflated) we might not get as much memory back as planned (or none at all).
>>
>> *Example 1: no memory is reclaimed:
>> name | allocated memory | used by the vm | resident memory used in the host by qemu
>> Vm1 | 4G | 4G, | 4G
>> Vm2 | 4G | 1G | 1G
>> - MOM will inflate the balloon in vm2 (since vm1 has no free memory) and will gain no host memory
>
> One thing to keep in mind is that VMs having less host RSS than their memory
> allocation is a temporary condition. All VMs will eventually consume their full
> allocation if allowed to run. I'd be curious to know how long this process
> takes in general.
>
> We might be able to handle this case by refusing to inflate the balloon if:
> (VM free memory - planned balloon inflation) > host RSS
>
>
>> *Example 2: memory is reclaimed only partially:
>> name | allocated memory | used by the vm | resident memory used in the host by qemu
>> Vm1 | 4G | 4G, | 4G
>> Vm2 | 4G | 1G | 1G
>> Vm3 | 4G | 1G | 4G
>> - MOM will inflate the balloons in vm2 and vm3, slowly gaining memory only from vm3
>
> The above rule extension may help here too.
>
>> This behaviour might cause us to:
>> * spend time reclaiming memory from many guests when we could reclaim from only a subgroup
>> * be under the impression that we have more potential memory to reclaim than we actually do
>> * push inactive VMs dangerously low as they are repeatedly ballooned (I've had guests crash from kernel out-of-memory errors)
>>
>>
>> To address this I suggest that we collect guest memory stats from libvirt as well, so we have the option to use them in our calculations.
>> This can be achieved with the command "virsh dommemstat <domain>", which returns:
>> actual 3915372 (allocated)
>> rss 2141580 (resident memory used by qemu)
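The same numbers are also available programmatically via the libvirt Python binding (virDomain.memoryStats()), which would avoid shelling out. If we did parse the virsh text output instead, a hypothetical helper could look like:

```python
def parse_dommemstat(output):
    """Parse `virsh dommemstat <domain>` output into a {stat: kB} dict."""
    stats = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats

sample = "actual 3915372\nrss 2141580\n"
# parse_dommemstat(sample) -> {'actual': 3915372, 'rss': 2141580}
```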
>
> I would suggest adding these two fields to the VmStats that are collected by
> vdsm. Then, to try it out, add the fields to the GuestMemory Collector. (Note:
> MOM does have a collector that gathers RSS for VMs. It's called GuestQemuProc).
> You can then extend the Balloon policy to add a snippet to check if the proposed
> balloon adjustment should be carried out. You could add the logic to the
> change_big_enough function.
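The actual Balloon policy is written in MOM's policy language, but the RSS-aware variant of that check could be sketched in Python roughly like this (thresholds and names are made up for illustration):

```python
def change_big_enough(current_kb, target_kb, host_rss_kb,
                      min_change_pct=0.25):
    """Sketch of an RSS-aware change_big_enough: skip balloon adjustments
    that are too small to matter, and skip inflations whose target is
    above what qemu actually has resident (nothing would be reclaimed)."""
    delta = abs(target_kb - current_kb)
    if delta < current_kb * min_change_pct / 100.0:
        return False  # change too small to bother the guest with
    if target_kb < current_kb and host_rss_kb < target_kb:
        return False  # guest RSS is already below the target: no host gain
    return True
```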
>
>> additional topic:
>> * should we include per-guest config (for example a hard minimum memory cap: this VM cannot run effectively with less than 1G of memory)?
>
> Yes. This is probably something we want to do. There is a whole topic around
> VM tagging that we should consider. In the future we will want to be able to do
> many different things in policy based on a VM's tag. For example, some VMs may
> be completely exempt from ballooning. Others may have a minimum limit.
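Such per-VM limits could be applied as a final clamp on whatever balloon target the policy computes; a hypothetical sketch (all property names are invented here):

```python
DEFAULT_MIN_KB = 512 * 1024  # illustrative default floor, in kB

def clamp_balloon_target(target_kb, vm_props):
    """Clamp a proposed balloon target to the VM's configured floor.
    vm_props is a hypothetical per-VM property dict; VMs tagged as
    exempt from ballooning keep their current allocation untouched."""
    if vm_props.get("balloon_exempt"):
        return vm_props["current_kb"]
    floor = vm_props.get("min_guaranteed_kb", DEFAULT_MIN_KB)
    return max(target_kb, floor)
```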
>
> I want to avoid passing in the raw guest configuration because MOM needs to work
> with direct libvirt VMs and with ovirt/vdsm VMs. Therefore, we want to think
> carefully about the abstractions we use when presenting VM properties to MOM.
>