Understanding oVirt memory management, which appears incorrect

Hi All,

A question regarding memory management with oVirt. I know memory can be complicated, hence I'm asking the experts. :)

Here are two examples where it looks - to me - like memory management from the oVirt perspective is incorrect. This is resulting in us not getting as much out of a host as we'd expect.

## Example 1:

Host: dev-cluster-04

I understand the memory on the host to be:

- 128G total (physical)
- 68G used
- 53G available
- 56G buff/cache

I understand, therefore, that roughly 53G should still be available to allocate (approximately, minus a few things).

```
DEV [root@dev-cluster-04:~] # free -m
              total        used        free      shared  buff/cache   available
Mem:         128741       68295        4429        4078       56016       53422
Swap:         12111        1578       10533

DEV [root@dev-cluster-04:~] # cat /proc/meminfo
MemTotal:       131831292 kB
MemFree:          4540852 kB
MemAvailable:    54709832 kB
Buffers:             3104 kB
Cached:           5174136 kB
SwapCached:        835012 kB
Active:          66943552 kB
Inactive:         5980340 kB
Active(anon):    66236968 kB
Inactive(anon):   5713972 kB
Active(file):      706584 kB
Inactive(file):    266368 kB
Unevictable:        50036 kB
Mlocked:            54132 kB
SwapTotal:       12402684 kB
SwapFree:        10786688 kB
Dirty:                812 kB
Writeback:              0 kB
AnonPages:       67068548 kB
Mapped:            143880 kB
Shmem:            4176328 kB
Slab:            52183680 kB
SReclaimable:    49822156 kB
SUnreclaim:       2361524 kB
KernelStack:        20000 kB
PageTables:        213628 kB
NFS_Unstable:           0 kB
Bounce:                 0 kB
WritebackTmp:           0 kB
CommitLimit:     78318328 kB
Committed_AS:   110589076 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       859104 kB
VmallocChunk:   34291324976 kB
HardwareCorrupted:      0 kB
AnonHugePages:     583680 kB
CmaTotal:               0 kB
CmaFree:                0 kB
HugePages_Total:        0
HugePages_Free:         0
HugePages_Rsvd:         0
HugePages_Surp:         0
Hugepagesize:        2048 kB
DirectMap4k:       621088 kB
DirectMap2M:     44439552 kB
DirectMap1G:     91226112 kB
```

The oVirt engine's Compute -> Hosts view shows s4-dev-cluster-01 as 93% memory utilised. Clicking on the node says:

Physical Memory: 128741 MB total, 119729 MB used, 9012 MB free

So the oVirt engine says 9G free. The OS reports 4G free but 53G available. Surely oVirt should be looking at available memory?

This is a problem, for instance, when trying to run a VM called dev-cassandra-01 (memory size 24576 MB, max memory 24576 MB, memory guarantee 10240 MB) on this host. It fails with:

```
Cannot run VM. There is no host that satisfies current scheduling constraints. See below for details:
The host dev-cluster-04.fnb.co.za did not satisfy internal filter Memory because its available memory is too low (19884 MB) to run the VM.
```

To me this looks blatantly wrong. The host has 53G available according to free -m. I'm guessing I'm missing something, unless this is some sort of bug?

Versions:

```
engine: 4.3.7.2-1.el7

host:
OS Version: RHEL - 7 - 6.1810.2.el7.centos
OS Description: CentOS Linux 7 (Core)
Kernel Version: 3.10.0 - 957.12.1.el7.x86_64
KVM Version: 2.12.0 - 18.el7_6.3.1
LIBVIRT Version: libvirt-4.5.0-10.el7_6.7
VDSM Version: vdsm-4.30.13-1.el7
SPICE Version: 0.14.0 - 6.el7_6.1
GlusterFS Version: [N/A]
CEPH Version: librbd1-10.2.5-4.el7
Open vSwitch Version: openvswitch-2.10.1-3.el7
Kernel Features: PTI: 1, IBRS: 0, RETP: 1, SSBD: 3
VNC Encryption: Disabled
```
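As a quick sanity check on the numbers above, here is a rough comparison straight from /proc/meminfo. This is only a sketch, and it assumes (I have not read the VDSM code) that the engine counts just MemFree + Buffers + Cached as free and ignores reclaimable slab, whereas the kernel's MemAvailable estimate includes it:

```
# Sketch: compare the kernel's MemAvailable estimate with the naive
# MemFree + Buffers + Cached sum (assumption: something close to this
# naive sum is what the engine reports as "free").
awk '/^MemFree:/      {free=$2}
     /^Buffers:/      {buf=$2}
     /^Cached:/       {cached=$2}
     /^SReclaimable:/ {srec=$2}
     /^MemAvailable:/ {avail=$2}
     END {
       naive = free + buf + cached
       printf "naive free (MemFree+Buffers+Cached): %d MB\n", naive/1024
       printf "MemAvailable (kernel estimate):      %d MB\n", avail/1024
       printf "gap (SReclaimable is %d MB):         %d MB\n", srec/1024, (avail - naive)/1024
     }' /proc/meminfo
```

On dev-cluster-04 the naive sum works out to roughly 9.5G, suspiciously close to the 9012 MB the engine shows as free, while MemAvailable is roughly 53G; almost all of the gap is the ~48G of reclaimable slab (SReclaimable). So it looks as if the engine's figure simply does not count reclaimable slab as available.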
## Example 2:

An oVirt host with two VMs.

According to the host, it has 128G of physical memory, of which 56G is used, 69G is buff/cache and 65G is available, as shown here:

```
LIVE [root@prod-cluster-01:~] # cat /proc/meminfo
MemTotal:       131326836 kB
MemFree:          2630812 kB
MemAvailable:    66573596 kB
Buffers:             2376 kB
Cached:           5670628 kB
SwapCached:        151072 kB
Active:          59106140 kB
Inactive:         2744176 kB
Active(anon):    58099732 kB
Inactive(anon):   2327428 kB
Active(file):     1006408 kB
Inactive(file):    416748 kB
Unevictable:        40004 kB
Mlocked:            42052 kB
SwapTotal:        4194300 kB
SwapFree:         3579492 kB
Dirty:                  0 kB
Writeback:              0 kB
AnonPages:       56085040 kB
Mapped:            121816 kB
Shmem:            4231808 kB
Slab:            65143868 kB
SReclaimable:    63145684 kB
SUnreclaim:       1998184 kB
KernelStack:        25296 kB
PageTables:        148336 kB
NFS_Unstable:           0 kB
Bounce:                 0 kB
WritebackTmp:           0 kB
CommitLimit:     69857716 kB
Committed_AS:    76533164 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       842296 kB
VmallocChunk:   34291404724 kB
HardwareCorrupted:      0 kB
AnonHugePages:      55296 kB
CmaTotal:               0 kB
CmaFree:                0 kB
HugePages_Total:        0
HugePages_Free:         0
HugePages_Rsvd:         0
HugePages_Surp:         0
Hugepagesize:        2048 kB
DirectMap4k:       722208 kB
DirectMap2M:     48031744 kB
DirectMap1G:     87031808 kB

LIVE [root@prod-cluster-01:~] # free -m
              total        used        free      shared  buff/cache   available
Mem:         128248       56522        2569        4132       69157       65013
Swap:          4095         600        3495
```

However, the Compute -> Hosts oVirt screen shows this node at 94% memory. Clicking Compute -> Hosts -> prod-cluster-01 -> General says:

Physical Memory: 128248 MB total, 120553 MB used, 7695 MB free
Swap Size: 4095 MB total, 600 MB used, 3495 MB free

The physical memory figure above makes no sense to me, unless it includes caches, which I would think it shouldn't.

This host has just two VMs:

```
LIVE [root@prod-cluster-01:~] # virsh -c qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf list
 Id    Name                           State
----------------------------------------------------
 35    prod-box-18                    running
 36    prod-box-11                    running
```

Moreover, each VM has 32G of memory set, in every possible place from what I can see:

```
LIVE [root@prod-cluster-01:~] # virsh -c qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf dumpxml prod-box-11|grep -i mem
    <ovirt-vm:memGuaranteedSize type="int">32768</ovirt-vm:memGuaranteedSize>
    <ovirt-vm:minGuaranteedMemoryMb type="int">32768</ovirt-vm:minGuaranteedMemoryMb>
  <memory unit='KiB'>33554432</memory>
  <currentMemory unit='KiB'>33554432</currentMemory>
      <cell id='0' cpus='0-27' memory='33554432' unit='KiB'/>
    <suspend-to-mem enabled='no'/>
      <model type='qxl' ram='65536' vram='32768' vgamem='16384' heads='1' primary='yes'/>
    <memballoon model='virtio'>
    </memballoon>
```

prod-box-11 is, however, set as a high performance VM; that could cause a problem. Same for the other VM:

```
LIVE [root@prod-cluster-01:~] # virsh -c qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf dumpxml prod-box-18|grep -i mem
    <ovirt-vm:memGuaranteedSize type="int">32768</ovirt-vm:memGuaranteedSize>
    <ovirt-vm:minGuaranteedMemoryMb type="int">32768</ovirt-vm:minGuaranteedMemoryMb>
  <memory unit='KiB'>33554432</memory>
  <currentMemory unit='KiB'>33554432</currentMemory>
      <cell id='0' cpus='0-27' memory='33554432' unit='KiB'/>
    <suspend-to-mem enabled='no'/>
      <model type='qxl' ram='65536' vram='32768' vgamem='16384' heads='1' primary='yes'/>
    <memballoon model='virtio'>
    </memballoon>
```

So I understand that two VMs, each allocated 32G of RAM, should consume approximately 64G of RAM on the host. The host has 128G of RAM, so usage should be around 50%. However, oVirt is reporting 94% usage.
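To sanity-check where the 94% might come from, here is another rough sketch over the same /proc/meminfo output, computing "used" two ways. Again, this assumes (unverified) that the engine treats everything except MemFree + Buffers + Cached as used, i.e. that it does not subtract reclaimable slab, while the MemAvailable-based view is what I would intuitively expect:

```
# Sketch: derive "used" memory two ways and print each as a percentage of
# MemTotal (assumption: the engine-style figure ignores reclaimable slab).
awk '/^MemTotal:/     {total=$2}
     /^MemFree:/      {free=$2}
     /^Buffers:/      {buf=$2}
     /^Cached:/       {cached=$2}
     /^MemAvailable:/ {avail=$2}
     END {
       used_naive = total - (free + buf + cached)   # page cache excluded, reclaimable slab still counted as used
       used_avail = total - avail                   # what the kernel thinks is genuinely in use
       printf "engine-style used (assumed): %d MB (%.0f%%)\n", used_naive/1024, 100 * used_naive / total
       printf "MemAvailable-based used:     %d MB (%.0f%%)\n", used_avail/1024, 100 * used_avail / total
     }' /proc/meminfo
```

Plugging in the prod-cluster-01 numbers above gives roughly 120G (about 94%) for the first figure and roughly 63G (about 49%) for the second. In other words, the engine's 120553 MB / 94% lines up with a view that still counts the ~63G of reclaimable slab as used, while the ~50% I would expect from two 32G VMs lines up with MemAvailable.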
Versions:

```
engine: 4.3.5.5-1.el7

host:
OS Version: RHEL - 7 - 6.1810.2.el7.centos
OS Description: CentOS Linux 7 (Core)
Kernel Version: 3.10.0 - 957.10.1.el7.x86_64
KVM Version: 2.12.0 - 18.el7_6.3.1
LIBVIRT Version: libvirt-4.5.0-10.el7_6.6
VDSM Version: vdsm-4.30.11-1.el7
SPICE Version: 0.14.0 - 6.el7_6.1
GlusterFS Version: [N/A]
CEPH Version: librbd1-10.2.5-4.el7
Open vSwitch Version: openvswitch-2.10.1-3.el7
Kernel Features: PTI: 1, IBRS: 0, RETP: 1
VNC Encryption: Disabled
```

Thanks for any insights!

--
Divan Santana
https://divansantana.com

Maybe this could help: https://lists.ovirt.org/pipermail/users/2017-August/083692.html

I did read that thread before posting, but I didn't see anything in it that gave any useful insight into this issue.

I've seen similar behavior before. Have you tried putting the host into maintenance and, once all VMs are migrated away, rebooting it?

Best Regards,
Strahil Nikolov

We did so last night, and it _did_ fix the issue!

Shutting down the VMs on the host and putting the host in and out of maintenance mode did not help; in fact, it further illustrated the problem: that only took the host from 94% memory used down to 50%, even though nothing at all was running on the host and in reality ~99% of the memory was available. Rebooting the host is what resolved the issue.

I'm going to apply the latest updates to a cluster and see if the issue persists. This therefore sounds like a bug, which is quite bad - unless there is some communication issue between the engine and the host which the reboot helps with?

Hi,

you have probably hit this bug: https://bugzilla.redhat.com/show_bug.cgi?id=1749630

It has been fixed in version 4.3.7.2.

Best regards,
Andrej

I see you are using engine version 4.3.7.2, but this fix is in VDSM, starting with version vdsm-4.30.34.
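In case it helps anyone else hitting this: a quick way to check whether a given host already has the fix (assuming, as above, that it first shipped in vdsm-4.30.34) is to look at the installed package:

```
# On the host: show the installed VDSM package version. Anything older than
# vdsm-4.30.34 presumably predates the fix for bz#1749630.
rpm -q vdsm
```

The dev host in the original report shows vdsm-4.30.13-1.el7, which is well before that version.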

Hi Andrej,

Thanks for the helpful reply. We patched and applied the latest updates, and it seems the issue is gone.
participants (5)

- Amit Bawer
- Andrej Krejcir
- Divan Santana
- divan@santanas.co.za
- Strahil Nikolov