Using oVirt VM pools in oVirt infra

13 Jan 2016

      VM Pools are a nice feature of oVirt.
A VM pool lets you quickly create a pool of stateles VMs all based on
the same template.
A VM pool also seems to currently be the only way to create
template-based thin QCOW2 VMs in oVirt. (Cloning from template creates
a thick copy, this is why its relatively slow)
With the autostart [1] feature, you can have the VMs auto-started when
the pool is started, it also means VMs get started automatically a few
minutes after they are shut down.
What this comes down to is that if you run 'shutdown' in a VM from a
pool, you will automatically get back a clean VM a few minutes later.

Unfortunately VM pools are not without their short comings, I've
documented two of these in BZ#1298235 [2] and BZ#1298232 [3].
When this means in essence is that oVirt does not give you a way to
predictably assign names or IPs to VMs in a pool.

So how do we solve this?

Since the ultimate goal for VMs in a pool is to become Jenkins slaves,
one solution is to use the swarm plugin [4].
With the swarm plugin, the actual name and address of the slave VM
becomes not very important.
We could quite easily setup the cloud-init invoked for VMs in the pool
to download the swarm plugin client and then run it to register to
Jenkins while setting labels according to various system properties.

The question remains how to assign IP addresses and names, to the pool VMs.
We will probably need a range of IP addresses that is pre-assigned to
a range of DNS records an that will be assigned to pool VMs as they
boot up.

Currently our DHCP and DNS servers in PHX is managed by Foreman in a
semi-random fashion.
As we've seen in the past this is subject to various failures such as
the MAC address of the foreman record getting out of sync with the one
of the VM (for example due to Facter reporting a bad address after a
particularity nasty VDSM test run), or the DNS record going out of
sync with the VM's host name and address in the DHCP.
At this point I think we've enough evidence against Foreman's style of
managing DNS and DHCP, I suggest we will:
1. Cease from creating new VMs in PHX via Foreman for a while.
2. Shutdown the PHX foreman proxy to disconnect it from managing the
DNS and DHCP.
3. Map out our currently active MAC->IP->HOSTNAME combinations and
create static DNS and DHCP configuration files (I suggest we also
migrate from BIND+ISC DHCPD to Dnsmasq which is far easier to
configure and provides very tight DNS, DHCP and TFTP integration)
4. Add configuration for a dynamically assigned IP range as described above.

Another way to resolve the current problem of coming up with a
dynamically assignable range of IPs, is to create a new VLAN in PHX
for the new pools of VMs.

One more issue we need to consider is how to use Puppet on the pool
VMs, we will probably still like Puppet to run in order to setup SSH
access for us, as well as other things needed on the slave.
Possibly we would also like for the swarm plugin client to be actually
installed and activated by Puppet, as that would grant us easy access
to Facter facts for determining the labels the slave should have while
also ensuring the slave will not become available to Jenkins until it
is actually ready for use.
It is easy enough to get Puppet running via a cloud-init script, but
the issue here is how to select classes for the new VMs.
Since they are not created in Foreman, they will not get assigned to
hostgroups, and therefore class assignment by way of hostgroup
membership will not work.
I see a few ways to resolve this:
1. An a 'node' entry in 'site.pp' to detect pool VMs (with a name
regex) and assgin classes to them
2. Use 'hiera_include' [5] in 'site.pp' to assign classes by facts via Hiera
3. Use a combination of the two methods above to ensure
'hiera_include' gets applied to and only to pool VMs.

These are my thoughts about this so far, I am working on building a
POC for this, but I would be happy to hear other thoughts and opinions
at this point.

[1]: http://www.ovirt.org/Features/PrestartedVm
[2]: https://bugzilla.redhat.com/show_bug.cgi?id=1298235
[3]: https://bugzilla.redhat.com/show_bug.cgi?id=1298232
[4]: https://wiki.jenkins-ci.org/display/JENKINS/Swarm+Plugin
[5]: https://docs.puppetlabs.com/hiera/1/puppet.html#assigning-classes-to-nodes-w...

-- 
Barak Korren
bkorren@redhat.com
RHEV-CI Team

Barak Korren

David Caro

Anton Marchukov

David Caro

Barak Korren

Barak Korren

David Caro

Barak Korren

Anton Marchukov

Barak Korren

David Caro

Anton Marchukov

David Caro

Anton Marchukov

David Caro

tags

participants (3)