[ovirt-users] Multi-node cluster with local storage

Sahina Bose sabose at redhat.com
Fri Mar 4 13:39:55 UTC 2016



On 03/04/2016 05:30 PM, Pavel Gashev wrote:
> On 04/03/16 13:50, "Sahina Bose" <sabose at redhat.com> wrote:
>> On 03/04/2016 04:13 PM, Pavel Gashev wrote:
>>> On 04/03/16 12:22, "Sahina Bose" <sabose at redhat.com> wrote:
>>>> On 03/04/2016 02:14 AM, Pavel Gashev wrote:
>>>>> Unfortunately, oVirt doesn't support multi-node local storage clusters.
>>>>> And Gluster/CEPH doesn't work well over a 1G network. It looks like
>>>>> the only way to use oVirt in a three-node cluster is to share local
>>>>> storage over NFS. At least that makes it possible to migrate VMs and
>>>>> move disks among hardware nodes.
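
(As a hedged aside, the NFS workaround sketched above roughly looks like this
on each node - the export path is an example, and oVirt expects the exported
directory to be owned by vdsm:kvm, i.e. UID/GID 36:36:)

    # export local storage for use as an oVirt NFS storage domain
    mkdir -p /data/images
    chown 36:36 /data/images
    echo '/data/images *(rw,sync,no_subtree_check)' >> /etc/exports
    exportfs -ra
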
>>>> Do you know of reported problems with Gluster over a 1Gb network? I
>>>> think 10Gb is recommended, but 1Gb can also be used for gluster.
>>>> (We use it in our lab setup and haven't encountered any issues so far,
>>>> but of course the workload may be different - hence the question)
>>> Let's calculate. If I have a three-node replicated gluster volume, each
>>> block written on a node is copied to the other two nodes, so the 1Gb link
>>> has to carry two copies of every write and maximal write performance
>>> can't be above ~50MB/s. Even if that's acceptable for my workload, things
>>> get worse in a failure-recovery scenario. Gluster works with files. When
>>> a node fails and then recovers (even if it's just a plain reboot),
>>> gluster copies the whole file over the network if the file changed during
>>> the outage. So if I have a 100GB VM disk and the guest system has written
>>> a single 512-byte block to it, the whole 100GB will be copied during
>>> recovery. It might take 20 minutes for 100GB, and 3 hours for 1TB. The
>>> network will be 100% busy during recovery, so VMs on other nodes will
>>> spend most of their time waiting for I/O. In other words, a plain reboot
>>> of a node would take the datacenter out of service for several hours.
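
(A rough, hedged sketch of the arithmetic above - all numbers are
illustrative:)

    # 1Gb/s link ~ 125 MB/s raw; with replica 3, every write leaves the
    # node twice, so usable write throughput ~ 125/2 ~ 62 MB/s in theory,
    # ~50 MB/s in practice.
    # A full heal streams one copy over the same link (~85 MB/s realistic):
    echo $(( 100 * 1024 / 85 / 60 ))   # 100GB image: ~20 minutes
    echo $(( 1024 * 1024 / 85 / 60 ))  # 1TB image: ~205 minutes, ~3.5 hours
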
>>>
>>> Things might be better if you have a distributed+replicated gluster
>>> volume. It requires at least six nodes. But things are still bad when
>>> you try to rebalance the volume after adding new bricks, or when a node
>>> has actually failed and been replaced.
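
(For concreteness, a hedged sketch of creating such a six-node
distributed-replicated volume - the hostnames, the volume name "vmstore",
and the brick paths are all hypothetical:)

    # 2x3 distributed-replicated volume: two replica-3 sets,
    # with files distributed between them
    gluster volume create vmstore replica 3 \
        node1:/bricks/b1 node2:/bricks/b1 node3:/bricks/b1 \
        node4:/bricks/b1 node5:/bricks/b1 node6:/bricks/b1
    gluster volume start vmstore
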
>>>
>>> Thus, a 1Gb network is OK for a lab, but it's not OK for production. IMHO.
>> Most of the problems that you outline here - related to healing and
>> replacement - are addressed by the sharding translator. Sharding breaks
>> the large image file into smaller files, so that the entire file does
>> not have to be copied. More details here -
>> http://blog.gluster.org/2015/12/introducing-shard-translator/
> Sure, I meant the same thing by mentioning distributed+replicated volumes. Actually, distributed+striped+replicated - https://access.redhat.com/documentation/en-US/Red_Hat_Storage/2.0/html/Administration_Guide/sect-User_Guide-Setting_Volumes-Distributed_Striped_Replicated.html

Ok. Sharding is not the same as striped volumes in gluster. With
striping, as you mentioned, you would require more nodes to form the
striped set in addition to the replica set (so 6 nodes, since you need
replica 3).
Sharding, however, can work with 3 nodes - on the replica 3 gluster
volume that you create, you can enable the volume option
"features.shard" to turn on this feature.

