On 05/25 17:06, David Caro wrote:
On 05/25 16:09, Barak Korren wrote:
> On 25 May 2016 at 14:52, David Caro <dcaro(a)redhat.com> wrote:
> > On 05/25 14:42, Barak Korren wrote:
> >> On 25 May 2016 at 12:44, Eyal Edri <eedri(a)redhat.com> wrote:
> >> > OK,
> >> > I suggest testing with a VM on local disk (preferably on a host with
> >> > an SSD configured); if it works, let's expedite moving all VMs, or at
> >> > least a large number of them, to it until we see the network load
> >> > reduced.
> >> >
> >>
> >> This is not that easy: oVirt doesn't support mixing local disks and
> >> shared storage in the same cluster, so we would need to move hosts to a
> >> new cluster for this.
> >> We would also lose the ability to use templates, or otherwise have to
> >> create the templates on each and every local disk.
> >>
> >> The scratch disk is a good solution for this: you keep the OS image on
> >> the central storage and only the ephemeral data on the local disk.
> >>
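For reference, attaching such a scratch disk can be scripted against the
engine API; a rough sketch with the oVirt Python SDK (ovirtsdk4), where the
VM name and the 'local-scratch' storage domain are made-up placeholders,
not something we have today:

    # Attach an extra, ephemeral "scratch" disk to an existing VM; the OS
    # disk stays on the shared storage domain.
    import ovirtsdk4 as sdk
    import ovirtsdk4.types as types

    connection = sdk.Connection(
        url='https://engine.example.com/ovirt-engine/api',
        username='admin@internal',
        password='secret',
        insecure=True,
    )
    vms_service = connection.system_service().vms_service()
    vm = vms_service.list(search='name=jenkins-slave-01')[0]
    attachments = vms_service.vm_service(vm.id).disk_attachments_service()
    attachments.add(
        types.DiskAttachment(
            disk=types.Disk(
                name='scratch',
                format=types.DiskFormat.COW,
                provisioned_size=40 * 2**30,  # 40 GiB of build workspace
                # Hypothetical storage domain backed by the hosts' local
                # disks:
                storage_domains=[types.StorageDomain(name='local-scratch')],
            ),
            interface=types.DiskInterface.VIRTIO,
            bootable=False,
            active=True,
        )
    )
    connection.close()
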
> >> WRT the storage architecture - a single huge (10.9T) ext4 filesystem is
> >> used on top of the DRBD. This is probably not the most efficient thing
> >> one can do (XFS would probably have been better, raw LUNs via iSCSI
> >> even better).
> >
> > That was done >3 years ago; XFS was not as stable, widely used or
> > supported back then.
> >
> AFAIK it pre-dates EXT4

It does, but on el6 it performed considerably worse and had more bugs
(according to the reviews of it at the time).

> In any case this does not detract from the fact that the current
> configuration is not as efficient as we can make it.
>

It does not; I agree it's better to focus on what we can do from now on,
not on what should have been done back then.

>
> >>
> >> I'm guessing that those 10.9TB are not made from a single disk but
> >> with a hardware RAID of some sort. In that case, deactivating the
> >> hardware RAID and re-exposing the disks as multiple separate iSCSI LUNs
> >> (that are then re-joined into a single storage domain in oVirt) will
> >> enable different VMs to concurrently work on different disks. This
> >> should lower the per-VM storage latency.
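
For reference, re-joining several LUNs into one oVirt data domain can also
be driven from the engine API; a rough sketch with the oVirt Python SDK
(ovirtsdk4), where the domain name, host, target IQN and LUN IDs are
illustrative placeholders:

    # Create a single iSCSI data domain out of several LUNs (one per
    # physical disk); oVirt turns them into PVs of one VG backing the domain.
    import ovirtsdk4 as sdk
    import ovirtsdk4.types as types

    connection = sdk.Connection(
        url='https://engine.example.com/ovirt-engine/api',
        username='admin@internal',
        password='secret',
        insecure=True,
    )
    sds_service = connection.system_service().storage_domains_service()
    sds_service.add(
        types.StorageDomain(
            name='jenkins-iscsi',
            type=types.StorageDomainType.DATA,
            host=types.Host(name='host01.example.com'),
            storage=types.HostStorage(
                type=types.StorageType.ISCSI,
                logical_units=[
                    types.LogicalUnit(
                        id='36001405aaaaaaaaaaaaaaaaaaaaaa01',  # placeholder
                        address='storage01.example.com',
                        port=3260,
                        target='iqn.2016-05.com.example:jenkins',
                    ),
                    types.LogicalUnit(
                        id='36001405aaaaaaaaaaaaaaaaaaaaaa02',  # placeholder
                        address='storage01.example.com',
                        port=3260,
                        target='iqn.2016-05.com.example:jenkins',
                    ),
                ],
            ),
        )
    )
    connection.close()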
> >
> > That would get rid of the DRBD too; it's a totally different setup from
> > scratch (no NFS either).
>
> We can and should still use DRBD, just set up a device for each disk.
> But yeah, NFS should probably go away.
> (We are seeing dramatically better performance for iSCSI in
> integration-engine)

I don't understand then what you said about splitting the hardware RAIDs;
do you mean to set up one DRBD device on top of each hard drive instead?
Though I really think we should move to Gluster/Ceph instead for the Jenkins
VMs; does anyone know what the current status of the hyperconverged setup
is? That would give us better, more scalable distributed storage and make
proper use of the hosts' local disks (we currently have more space on the
combined hosts than on the storage servers).

btw, I think the NFS is also used for something more than just the engine
storage domain (just to keep in mind that it has to be checked if we are
going to get rid of it).

>
> >
> >>
> >> Looking at the storage machine I see strong indication it is IO bound
> >> - the load average is ~12 while there are just 1-5 working processes
> >> and the CPU is ~80% idle and the rest is IO wait.
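
A rough sketch of pulling those numbers from a script, for anyone who wants
to keep an eye on it (assumes the third-party psutil package is installed
on the storage machine; iowait is reported in the CPU times on Linux):

    import os
    import psutil

    # 1/5/15 minute load averages, same as 'uptime' shows.
    load1, load5, load15 = os.getloadavg()
    # Sample CPU usage over 5 seconds; 'idle' and 'iowait' are percentages.
    cpu = psutil.cpu_times_percent(interval=5)

    print('load average: %.1f %.1f %.1f' % (load1, load5, load15))
    print('cpu idle: %.1f%%  iowait: %.1f%%' % (cpu.idle, cpu.iowait))
    # High load with a mostly idle CPU and a big iowait share means the box
    # is IO bound rather than CPU bound.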
> >>
> >> Running 'du *' at:
> >> /srv/ovirt_storage/jenkins-dc/658e5b87-1207-4226-9fcc-4e5fa02b86b4/images
> >> one can see that most images are ~40G in size (that is _real_ 40G, not
> >> sparse!). This means that despite most VMs being created from templates,
> >> the VMs are full template copies rather than COW clones.
> >
> > It should not be like that; maybe the templates are wrongly configured?
> > Or the Foreman images?
>
> This is the expected behaviour when creating a VM from template in the
> oVirt admin UI. I thought Foreman might behave differently, but it
> seems it does not.
>
> This behaviour is determined by the parameters you pass to the engine
> API when instantiating a VM, so it most probably doesn't have anything
> to do with the template configuration.

So maybe a misconfiguration in Foreman?

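For reference, the relevant knob when going through the API directly seems
to be the 'clone' flag at VM-creation time; a rough sketch with the oVirt
Python SDK (ovirtsdk4), where the VM, cluster and template names are
placeholders (how Foreman maps to this flag would need to be checked):

    import ovirtsdk4 as sdk
    import ovirtsdk4.types as types

    connection = sdk.Connection(
        url='https://engine.example.com/ovirt-engine/api',
        username='admin@internal',
        password='secret',
        insecure=True,
    )
    vms_service = connection.system_service().vms_service()
    vms_service.add(
        types.Vm(
            name='jenkins-slave-02',
            cluster=types.Cluster(name='jenkins'),
            template=types.Template(name='el7-base'),
        ),
        # clone=False keeps the disks as qcow2 (COW) layers on top of the
        # template; clone=True produces the full ~40G copies seen with 'du'.
        clone=False,
    )
    connection.close()
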
>
> >
> >> What this means is that using pools (where all VMs are COW copies of
> >> the single pool template) is expected to significantly reduce the
> >> storage utilization and therefore the IO load on it (the less you
> >> store, the less you need to read back).
> >
> > That should happen too without pools, with normal qcow templates.
>
> Not unless you create all the VMs via the API and pass the right
> parameters. Pools are the easiest way to ensure you never mess that
> up...

That was the idea

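For completeness, a rough sketch of what creating such a pool looks like
with the oVirt Python SDK (ovirtsdk4); the pool, cluster and template names
are placeholders:

    import ovirtsdk4 as sdk
    import ovirtsdk4.types as types

    connection = sdk.Connection(
        url='https://engine.example.com/ovirt-engine/api',
        username='admin@internal',
        password='secret',
        insecure=True,
    )
    pools_service = connection.system_service().vm_pools_service()
    pools_service.add(
        types.VmPool(
            name='jenkins-el7',
            size=20,  # every pool VM is a COW copy of the pool template
            cluster=types.Cluster(name='jenkins'),
            template=types.Template(name='el7-base'),
        )
    )
    connection.close()
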
>
> > And in any case, that will not lower the normal IO when we are not
> > actually creating VMs, as any read and write will still hit the disk
> > anyhow; it only alleviates the IO when creating new VMs.
>
> Since you are reading the same bits over and over (for different VMs),
> you enable the various buffer caches along the way (in the storage
> machines and in the hypervisors) to do what they are supposed to.

Once the VM is started, almost everything it needs is in RAM, so there are
not that many reads from disk unless you start writing to it, and that's
mostly what we are hitting: lots of writes.

>
> > The local disk (scratch disk) is the best option
> > imo, now and for the foreseeable future.
>
> This is not an either/or thing, IMO we need to do both.

I think that it's way more useful, because it will solve our current issues
faster and for longer, so IMO it should get more attention sooner.

Any improvement that does not remove the current bottleneck is not really
giving any value to the overall infra (even if it might become valuable
later).

>
> --
> Barak Korren
> bkorren(a)redhat.com
> RHEV-CI Team

--
David Caro

Red Hat S.L.
Continuous Integration Engineer - EMEA ENG Virtualization R&D

Tel.: +420 532 294 605
Email: dcaro(a)redhat.com
IRC: dcaro|dcaroest@{freenode|oftc|redhat}
Web: www.redhat.com
RHT Global #: 82-62605