Re: ngn build jobs take more than twice (x) as long as in the last days

25 May 2016

On 25 May 2016 at 14:52, David Caro <dcaro@redhat.com> wrote:
...
On 05/25 14:42, Barak Korren wrote:
...
On 25 May 2016 at 12:44, Eyal Edri <eedri@redhat.com> wrote:
...
OK,
I suggest testing with a VM on a local disk (preferably on a host with an SSD
configured). If it's working,
let's expedite moving all VMs, or at least a large number of them, to it until
we see network load reduced.
This is not that easy, oVirt doesn't support mixing local disk and
storage in the same cluster, so we will need to move hosts to a new
cluster for this.
Also we will lose the ability to use templates, or otherwise have to
create the templates on each and every disk.
The scratch disk is a good solution for this, where you can have the
OS image on the central storage and the ephemeral data on the local
disk.
WRT the storage architecture - a single huge (10.9T) ext4 is used
as the FS on top of the DRBD, this is probably not the most efficient
thing one can do (XFS would probably have been better, RAW via iSCSI -
even better).
That was done >3 years ago, when XFS was not yet that stable, widely used or
supported.
AFAIK it pre-dates EXT4, in any case this does not detract from the
fact that the current configuration is not as efficient as we can make
it.
...
...
I'm guessing that those 10.9TB are not made from a single disk but
with a hardware RAID of some sort. In this case deactivating the
hardware RAID and re-exposing it as multiple separate iSCSI LUNs (that
are then re-joined to a single storage domain in oVirt) will enable
different VMs to concurrently work on different disks. This should
lower the per-vm storage latency.
That would get rid of the drbd too, it's a totally different setup, from
scratch (no nfs either).
We can and should still use DRBD, just set up a device for each disk.
But yeah, NFS should probably go away.
(We are seeing dramatically better performance for iSCSI in
integration-engine)
...
...
Looking at the storage machine I see a strong indication that it is IO bound
- the load average is ~12 while there are just 1-5 working processes
and the CPU is ~80% idle and the rest is IO wait.
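The reasoning above can be sketched as a small heuristic. This is only an illustration, not a diagnostic tool: it relies on the fact that the Linux load average counts tasks in uninterruptible (typically IO) sleep as well as runnable ones. The function name and the thresholds are made up for the example; the input numbers are the ones quoted above.

```python
def io_bound_hint(load_avg, running, idle_pct, iowait_pct):
    """Rough check: is the load mostly made of tasks waiting on IO?

    load_avg   - 1-minute load average (runnable + uninterruptible tasks)
    running    - number of processes actually running
    idle_pct   - CPU idle percentage
    iowait_pct - CPU iowait percentage
    """
    # Whatever part of the load is not running is, on average, waiting.
    waiting = max(load_avg - running, 0.0)
    # CPU time that is neither idle nor iowait is real work.
    busy_pct = max(100.0 - idle_pct - iowait_pct, 0.0)
    return {
        "waiting_tasks": waiting,
        "cpu_busy_pct": busy_pct,
        # Hypothetical thresholds: more waiters than runners, and more
        # time spent in iowait than doing work.
        "io_bound": waiting > running and iowait_pct > busy_pct,
    }


# Numbers from the storage machine: load ~12, at most 5 working
# processes, ~80% idle, the rest iowait.
hint = io_bound_hint(load_avg=12, running=5, idle_pct=80, iowait_pct=20)
```

Even when taking the upper bound of 5 working processes, about 7 tasks on average are waiting rather than running, which is what points at the disks.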
Running 'du *' at:
/srv/ovirt_storage/jenkins-dc/658e5b87-1207-4226-9fcc-4e5fa02b86b4/images
one can see that most images are ~40G in size (that is _real_ 40G not
sparse!). This means that despite having most VMs created based on
templates, the VMs are full template copies rather than COW clones.
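For reference, the "real vs. sparse" distinction that 'du' shows can also be checked from a script. A minimal sketch, assuming Linux, where st_blocks is reported in 512-byte units; the helper names and the 0.9 tolerance are arbitrary choices made for the example.

```python
import os


def allocation(path):
    """Return (apparent_bytes, allocated_bytes) for a file."""
    st = os.stat(path)
    apparent = st.st_size
    # Assumption: st_blocks counts 512-byte units, as it does on Linux.
    allocated = st.st_blocks * 512
    return apparent, allocated


def is_sparse(apparent, allocated, tolerance=0.9):
    """True if noticeably fewer bytes are allocated than the file claims."""
    return allocated < apparent * tolerance


# Usage, for each image file under the images directory:
#   apparent, allocated = allocation(image_path)
#   print(image_path, is_sparse(apparent, allocated))
```

A 40G image with 40G allocated, as seen here, is not sparse; a COW clone would normally have far less allocated than the size of the template it is based on.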
That should not be the case. Maybe the templates are wrongly configured? Or
the Foreman images?
This is the expected behaviour when creating a VM from template in the
oVirt admin UI. I thought Foreman might behave differently, but it
seems it does not.

This behaviour is determined by the parameters you pass to the engine
API when instantiating a VM, so it most probably doesn't have anything
to do with the template configuration.
...
...
What this means is that using pools (where all VMs are COW copies of
the single pool template) is expected to significantly reduce the
storage utilization and therefore the IO load on it (the less you
store, the less you need to read back).
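A back-of-the-envelope sketch of the difference. Only the 40G image size comes from the 'du' output above; the VM count and the per-VM amount of changed data are hypothetical numbers picked for illustration.

```python
def full_copy_gb(num_vms, template_gb):
    """Template plus one full, non-sparse copy per VM."""
    return template_gb * (num_vms + 1)


def cow_gb(num_vms, template_gb, per_vm_delta_gb):
    """One shared template plus only the changed data of each VM."""
    return template_gb + num_vms * per_vm_delta_gb


# Hypothetical: 50 VMs, each changing 5G on top of a 40G template.
full = full_copy_gb(num_vms=50, template_gb=40)
cow = cow_gb(num_vms=50, template_gb=40, per_vm_delta_gb=5)
print(full, cow)
```

Under these assumptions the full copies need 2040G while the COW layout needs 290G, and all the reads of unmodified blocks hit the same 40G of template data.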
That should happen too without pools, with normal qcow templates.
Not unless you create all the VMs via the API and pass the right
parameters. Pools are the easiest way to ensure you never mess that
up...
...
And in any case, that will not lower the normal io, when not actually
creating vms, as any read and write will still hit the disk anyhow, it
only alleviates the io when creating new vms.
Since you are reading the same bits over and over (for different VMs)
you enable the various buffer caches along the way (in the storage
machines and in the hypervisors) to do what they are supposed to.
...
The local disk (scratch disk) is the best option
imo, now and for the foreseeable future.
This is not an either/or thing, IMO we need to do both.

-- 
Barak Korren
bkorren@redhat.com
RHEV-CI Team