[ovirt-users] Multi-node cluster with local storage

Sahina Bose sabose at redhat.com
Fri Mar 4 10:50:22 UTC 2016



On 03/04/2016 04:13 PM, Pavel Gashev wrote:
>
>
>
> On 04/03/16 12:22, "Sahina Bose" <sabose at redhat.com> wrote:
>> On 03/04/2016 02:14 AM, Pavel Gashev wrote:
>>> Unfortunately, oVirt doesn't support multi-node local storage clusters.
>>> And Gluster/CEPH don't work well over a 1G network. It looks like the
>>> only way to use oVirt in a three-node cluster is to share the local
>>> storage over NFS. At least that makes it possible to migrate VMs and
>>> move disks among hardware nodes.
>>
>> Do you know of reported problems with Gluster over 1Gb network? I think
>> 10Gb is recommended, but 1Gb can also be used for gluster.
>> (We use it in our lab setup, and haven't encountered any issues so far
>> but of course, the workload may be different - hence the question)
> Let's calculate. In a three-node replicated gluster volume, every block written on one node is copied to the other two nodes, so maximal write performance can't exceed 50MB/s. Even if that's acceptable for my workload, things get worse in a failure-recovery scenario. Gluster works with files: when a node fails and then recovers (even if it's just a plain reboot), gluster copies the whole file over the network if that file changed during the outage. So if I have a 100GB VM disk and the guest has written a single 512-byte block to it, the whole 100GB gets copied during recovery. That might take 20 minutes for 100GB, and 3 hours for 1TB. The network is 100% busy during recovery, so VMs on the other nodes spend most of their time waiting for I/O. In other words, a plain reboot of one node can leave the datacenter out of service for several hours.
>
> Things might be better with a distributed+replicated gluster volume, but that requires at least six nodes. And things are still bad when you rebalance the volume after adding new bricks, or when a node has really failed and been replaced.
>
> Thus, a 1Gb network is OK for a lab, but it's not OK for production. IMHO.
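The arithmetic above can be sanity-checked with a short script. This is purely illustrative: the ~100 MB/s figure is an assumed usable throughput for a 1Gb link, and it assumes a full-file heal (no sharding).

```python
LINK_MBS = 100  # assumed usable throughput of a 1Gb/s link, in MB/s

# Replica 3: the writing node must send each block to the 2 other
# replicas over the same link, so client-visible write throughput
# is roughly link / (replicas - 1).
replicas = 3
write_mbs = LINK_MBS / (replicas - 1)
print(f"max write throughput ~ {write_mbs:.0f} MB/s")

# Full-file heal: without sharding, a touched image is recopied whole.
for size_gb in (100, 1024):
    minutes = size_gb * 1024 / LINK_MBS / 60
    print(f"healing {size_gb} GB takes ~ {minutes:.0f} minutes")
```

Which reproduces the 50MB/s write ceiling and roughly the 20-minute / 3-hour heal times quoted above.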

Most of the problems that you outline here - those related to healing and 
replacement - are addressed by the sharding translator. Sharding breaks 
the large image file into smaller files, so that the entire file does 
not have to be copied. More details here - 
http://blog.gluster.org/2015/12/introducing-shard-translator/
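A rough illustration of why sharding shrinks heal traffic (the 512MB shard size and 100GB image are hypothetical example values; it assumes only shards touched during the outage need to be recopied):

```python
image_gb = 100
shard_mb = 512  # hypothetical shard size for the VM image

# Without sharding, one dirtied block forces a heal of the whole image.
heal_without = image_gb * 1024  # MB to copy

# With sharding, only the single shard containing the changed block
# is out of sync, so only that shard is healed.
heal_with = shard_mb  # MB to copy

print(f"heal without sharding: {heal_without} MB")
print(f"heal with sharding:    {heal_with} MB")
```

So for a single small write during a node outage, recovery traffic drops from the full image size to one shard.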





