On Sun, Jun 26, 2016 at 11:49 AM, Nicolás <nicolas(a)devels.es> wrote:
Hi Nir,
On 25/06/16 at 22:57, Nir Soffer wrote:
On Sat, Jun 25, 2016 at 11:47 PM, Nicolás <nicolas(a)devels.es> wrote:
Hi,
We're using Ceph along with an iSCSI gateway, so our storage domain is
actually an iSCSI backend. So far, we have had zero issues with roughly 50
high-I/O VMs. Perhaps [1] might shed some light on how to set it up.
Can you share more details on this setup and how you integrate with ovirt?
For example, are you using ceph luns in regular iscsi storage domain, or
attaching luns directly to vms?
Fernando Frediani (responding to this thread) hit the nail on the head.
We have a 3-node Ceph infrastructure. We created a few volumes on the
Ceph side (RBD) and exported them over iSCSI, so it's oVirt that creates
the LVs on top; this way we don't need to attach LUNs directly.
Once the volumes are exported on the iSCSI side, adding an iSCSI domain on
oVirt is enough to make the whole thing work.
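For readers wanting to reproduce this, the export path described above can be sketched roughly as follows, assuming a LIO/targetcli gateway using the kernel RBD client; the pool/image names, size, and IQN are made-up placeholders, not taken from this thread:

```shell
# Sketch of the RBD -> iSCSI export described above (names, size, and IQN
# are illustrative assumptions).

# On the Ceph side: create an RBD image and map it on the gateway host
rbd create rbd/ovirt-data01 --size 2048000   # ~2 TB image
rbd map rbd/ovirt-data01                     # appears as e.g. /dev/rbd0

# On the gateway: expose the mapped device as an iSCSI LUN with targetcli
targetcli /backstores/block create ovirt-data01 /dev/rbd0
targetcli /iscsi create iqn.2016-06.com.example:ovirt-gw
targetcli /iscsi/iqn.2016-06.com.example:ovirt-gw/tpg1/luns \
    create /backstores/block/ovirt-data01
```

From there, adding the target as a regular iSCSI storage domain in the oVirt UI is all that's needed, as described above.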
As for experience, we have done a few tests and so far we've had zero
issues:
The main bottleneck is the iSCSI gateway's network bandwidth. In our case
we have a balance-alb bond over two 1G network interfaces. We later
realized this kind of bonding is of little use here, because the MAC
addresses won't change, so in practice at most 1G is used. In some heavy
tests (e.g., powering on 50 VMs at a time) we've hit this limit at
specific points, but it didn't affect performance significantly.
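The balance-alb limitation above is inherent to a single iSCSI session, which always lands on one slave NIC. A common alternative for iSCSI is to skip bonding and use two separate portals with dm-multipath instead; a rough sketch, with made-up portal addresses:

```shell
# Sketch: two independent 1G paths to the gateway instead of a bond
# (portal IPs are illustrative assumptions).
iscsiadm -m discovery -t sendtargets -p 192.168.10.1:3260
iscsiadm -m discovery -t sendtargets -p 192.168.20.1:3260
iscsiadm -m node --login

# Verify that both paths show up for the LUN
multipath -ll
```

With round-robin path selection this can actually drive both links for a single LUN, which neither a balance-alb nor even an LACP bond can do for one TCP session.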
Doing some additional heavy tests (powering all VMs on and off at once),
we've peaked at roughly 1200 IOPS. In normal conditions we don't exceed
200 IOPS, even when these 50 VMs do lots of disk operations.
We've also done some tolerance tests, like removing one or more disks from
a Ceph node, reinserting them, suddenly shutting down one node, restoring
it, and so on. The only problem we've experienced is slower access to the
iSCSI backend, which results in a warning in the oVirt manager, something
like "Storage is taking too long to respond...", lasting maybe 15-20
seconds. We got no VM pauses at any time, though, nor any other
significant issue.
This setup works, but you are not fully using Ceph's potential.
You are actually using iSCSI storage, so you are limited to 350 LVs per
storage domain (for performance reasons). You are also using oVirt thin
provisioning instead of Ceph thin provisioning, so all your VMs depend
on the SPM to extend VM disks when needed, and your VMs may pause
from time to time if the SPM cannot extend the disks fast enough.
When cloning disks (e.g. create VM from template), you are copying the
data from Ceph to the SPM node, and back to Ceph. With cinder/ceph,
this operation happens inside the Ceph cluster and is much more efficient,
possibly not copying anything at all.
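The efficient in-cluster cloning mentioned here maps to RBD's copy-on-write layering, which Cinder uses under the hood. A minimal sketch, with illustrative pool and image names:

```shell
# Copy-on-write clone inside the Ceph cluster -- what cinder/ceph does for
# "create VM from template" (pool and image names are assumptions).
rbd snap create rbd/template-disk@base
rbd snap protect rbd/template-disk@base   # clones require a protected snapshot
rbd clone rbd/template-disk@base rbd/vm01-disk
```

The clone is usable immediately and shares unmodified data with the template, so no data leaves the cluster and often nothing is copied at all.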
Performance is limited by the iSCSI gateway(s); when using native Ceph,
each VM talks directly to the OSDs keeping its data, so reads and writes
are spread across multiple hosts.
On the other hand, you are not limited by the missing features in our
current Ceph implementation (e.g. no live storage migration, no copying
disks from other storage domains, no monitoring).
It would be interesting to compare cinder/ceph with your system. You
can install a VM with Cinder and the rest of the components, add another
pool for Cinder, and compare VMs using native Ceph with VMs using
iSCSI/Ceph.
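If you try this, adding a separate pool (and a Cinder client key) on the existing cluster could look roughly like this; the pool name, PG count, and keyring path are assumptions:

```shell
# Sketch: dedicated pool plus auth key for Cinder on the existing cluster
# (pool name, PG count, and keyring path are illustrative assumptions).
ceph osd pool create cinder-volumes 128
ceph auth get-or-create client.cinder \
    mon 'allow r' \
    osd 'allow class-read object_prefix rbd_children, allow rwx pool=cinder-volumes' \
    -o /etc/ceph/ceph.client.cinder.keyring
```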
You may want to check this project, which provides production-ready
OpenStack containers:
https://github.com/openstack/kolla
Nir