[ovirt-users] Ovirt/Gluster
Sander Hoentjen
sander at hoentjen.eu
Fri Aug 21 14:27:30 UTC 2015
On 08/21/2015 02:21 PM, Ravishankar N wrote:
>
>
> On 08/21/2015 04:32 PM, Sander Hoentjen wrote:
>>
>>
>> On 08/21/2015 11:30 AM, Ravishankar N wrote:
>>>
>>>
>>> On 08/21/2015 01:21 PM, Sander Hoentjen wrote:
>>>>
>>>>
>>>> On 08/21/2015 09:28 AM, Ravishankar N wrote:
>>>>>
>>>>>
>>>>> On 08/20/2015 02:14 PM, Sander Hoentjen wrote:
>>>>>>
>>>>>>
>>>>>> On 08/19/2015 09:04 AM, Ravishankar N wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 08/18/2015 04:22 PM, Ramesh Nachimuthu wrote:
>>>>>>>> + Ravi from gluster.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Ramesh
>>>>>>>>
>>>>>>>> ----- Original Message -----
>>>>>>>> From: "Sander Hoentjen" <sander at hoentjen.eu>
>>>>>>>> To: users at ovirt.org
>>>>>>>> Sent: Tuesday, August 18, 2015 3:30:35 PM
>>>>>>>> Subject: [ovirt-users] Ovirt/Gluster
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> We are looking for some easy-to-manage, self-contained VM
>>>>>>>> hosting. Ovirt with GlusterFS seems to fit that bill perfectly.
>>>>>>>> I installed it and then started kicking the tires. First
>>>>>>>> results looked promising, but now I can get a VM to pause
>>>>>>>> indefinitely fairly easily:
>>>>>>>>
>>>>>>>> My setup is 3 hosts that are in a Virt and Gluster cluster.
>>>>>>>> Gluster is set up as replica 3. The gluster export is used as
>>>>>>>> the storage domain for the VMs.
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> What version of gluster and ovirt are you using?
>>>>>> glusterfs-3.7.3-1.el7.x86_64
>>>>>> vdsm-4.16.20-0.el7.centos.x86_64
>>>>>> ovirt-engine-3.5.3.1-1.el7.centos.noarch
>>>>>>>
>>>>>>>>
>>>>>>>> Now when I start the VM all is good, and performance is good
>>>>>>>> enough that we are happy. I then start bonnie++ to generate
>>>>>>>> some load. I have a VM running on host 1, host 2 is SPM, and
>>>>>>>> all 3 hosts are seeing some network traffic courtesy of
>>>>>>>> gluster.
>>>>>>>>
>>>>>>>> Now, for fun, the network on host3 suddenly goes bad (iptables
>>>>>>>> -I OUTPUT -m statistic --mode random --probability 0.75 -j
>>>>>>>> REJECT).
>>>>>>>> Some time later I see the guest has a small "hiccup"; I'm
>>>>>>>> guessing that is when gluster decides host 3 is not allowed to
>>>>>>>> play anymore. No big deal anyway.
>>>>>>>> After a while, only 25% of packets getting through just isn't
>>>>>>>> good enough for Ovirt anymore, so the host gets fenced.
>>>>>>>
>>>>>>> I'm not sure what fencing means w.r.t. ovirt and what it
>>>>>>> actually fences. As far as gluster is concerned, since only one
>>>>>>> node is blocked, the VM image should still be accessible by the
>>>>>>> VM running on host1.
>>>>>> Fencing means (at least in this case) that the IPMI of the server
>>>>>> does a power reset.
>>>>>>>> After a reboot the VM will *sometimes* be paused, and even
>>>>>>>> after the gluster self-heal is complete it cannot be unpaused;
>>>>>>>> it has to be restarted.
>>>>>>>
>>>>>>> Could you provide the gluster mount (fuse?) logs and the brick
>>>>>>> logs of all 3 nodes when the VM is paused? That should give us
>>>>>>> some clue.
>>>>>>>
>>>>>> Logs are attached. The problem was at around 8:15 - 8:20 UTC.
>>>>>> This time, however, the VM stopped even without a reboot of
>>>>>> hyp03.
>>>>>
>>>>>
>>>>> The mount logs (rhev-data-center-mnt-glusterSD*) are indicating
>>>>> frequent disconnects to the bricks with
>>>>> 'clnt_ping_timer_expired', 'Client-quorum is not met' and
>>>>> 'Read-only file system' messages.
>>>>> Client-quorum is enabled by default for replica 3 volumes. So if
>>>>> the mount cannot connect to at least 2 bricks, quorum is lost and
>>>>> the gluster volume becomes read-only. That seems to be the reason
>>>>> why the VMs are pausing.
>>>>> I'm not sure if the frequent disconnects are due to a flaky
>>>>> network or to the bricks not responding to the mount's ping timer
>>>>> because their epoll threads are busy with I/O (unlikely). Can you
>>>>> also share the output of `gluster volume info <volname>`?
>>>> The frequent disconnects are probably because I intentionally broke
>>>> the network on hyp03 (dropped 75% of outgoing packets). In my
>>>> opinion this should not affect the VM on hyp02. Am I wrong to think
>>>> that?
>>>
>>>
>>> For client-quorum: if a client (mount) cannot connect to enough
>>> bricks to achieve quorum, the client becomes read-only. So if the
>>> client on hyp02 can see the bricks on itself and on hyp01, it
>>> shouldn't be affected.
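>>>
>>> As a sketch, you can inspect the quorum-related settings from any
>>> node (volume name VMS taken from this thread; `volume get` needs a
>>> reasonably recent gluster, 3.7 should have it):
>>>
>>> ```shell
>>> # Client-quorum settings for the VMS volume. With replica 3 and
>>> # quorum-type "auto", the mount needs at least 2 of the 3 bricks
>>> # to stay writable.
>>> gluster volume get VMS cluster.quorum-type
>>> gluster volume get VMS cluster.quorum-count
>>> ```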
>> But it was, and I only "broke" hyp03.
>
> Beats me then. I see "[2015-08-18 15:15:27.922998] W [MSGID: 108001]
> [afr-common.c:4043:afr_notify] 0-VMS-replicate-0: Client-quorum is not
> met" on hyp02's mount log but the time stamp is earlier than when you
> say you observed the hang (2015-08-20, around 8:15 - 8:20 UTC?).
> (they do occur in that time on hyp03 though).
Yeah, that event is from before. For your information: this setup is
used for testing, so I try to break it and hope I don't succeed.
Unfortunately, I succeeded.
>
>>>
>>>>
>>>> [root at hyp01 ~]# gluster volume info VMS
>>>>
>>>> Volume Name: VMS
>>>> Type: Replicate
>>>> Volume ID: 9e6657e7-8520-4720-ba9d-78b14a86c8ca
>>>> Status: Started
>>>> Number of Bricks: 1 x 3 = 3
>>>> Transport-type: tcp
>>>> Bricks:
>>>> Brick1: 10.99.50.20:/brick/VMS
>>>> Brick2: 10.99.50.21:/brick/VMS
>>>> Brick3: 10.99.50.22:/brick/VMS
>>>> Options Reconfigured:
>>>> performance.readdir-ahead: on
>>>> nfs.disable: on
>>>> user.cifs: disable
>>>> auth.allow: *
>>>> performance.quick-read: off
>>>> performance.read-ahead: off
>>>> performance.io-cache: off
>>>> performance.stat-prefetch: off
>>>> cluster.eager-lock: enable
>>>> network.remote-dio: enable
>>>> cluster.quorum-type: auto
>>>> cluster.server-quorum-type: server
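>>>>
>>>> (For reference: most of these options match what gluster's "virt"
>>>> group profile applies, which can be set in one go -- assuming the
>>>> "virt" group file ships with your gluster packages:)
>>>>
>>>> ```shell
>>>> # Apply the virt option group (quick-read/read-ahead/io-cache off,
>>>> # eager-lock, remote-dio, quorum settings) to the VMS volume.
>>>> gluster volume set VMS group virt
>>>> ```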
>>>
>>> I see that you have enabled server-quorum too. Since you blocked
>>> hyp03, if the glusterd on that node cannot see the other 2 nodes
>>> due to the iptables rules, it will kill all brick processes on that
>>> node. See the "7 How To Test" section in
>>> http://www.gluster.org/community/documentation/index.php/Features/Server-quorum
>>> to get a better idea of server-quorum.
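>>>
>>> A sketch of the knobs involved (volume name VMS from this thread;
>>> these are standard gluster options, shown only to illustrate):
>>>
>>> ```shell
>>> # Per-volume: let glusterd enforce server-quorum for this volume.
>>> gluster volume set VMS cluster.server-quorum-type server
>>>
>>> # Cluster-wide: percentage of nodes glusterd must see before it
>>> # allows bricks to keep running (default is just over 50%).
>>> gluster volume set all cluster.server-quorum-ratio 51%
>>> ```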
>>>
>> Yes but it should only kill the bricks on hyp03, right? So then why
>> does the VM on hyp02 die? I don't like the fact that a problem on any
>> one of the hosts can bring down any VM on any host.
>>
>
> Right. Well, from a gluster point of view, if you don't want quorum
> enforcement you can turn it off, at the risk of files ending up in
> split-brain.
But I *do* want quorum enforcement (I want as little chance of a
split-brain as possible), and I also want my VMs on different nodes not
to be affected when one host has issues (and I want a pony).
Is there anybody out there who does not have this issue? I googled some
more and found
http://www.gluster.org/pipermail/gluster-users/2014-February/016015.html
which looks like the same issue, but unfortunately without a solution.
Maybe I should formulate some clear questions:
1) Am I correct in assuming that an issue on one of 3 gluster nodes
should not cause downtime for VMs on other nodes?
2) What can I/we do to fix the issue I am seeing?
3) Can anybody else reproduce my issue?
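
For anyone who wants to try question 3, what I did boils down to this
(the iptables rule from my earlier mail, plus the matching cleanup; VMS
is my volume name):

```shell
# On hyp03: randomly reject 75% of outgoing packets to simulate a
# flaky network.
iptables -I OUTPUT -m statistic --mode random --probability 0.75 -j REJECT

# Watch heal/quorum state from one of the healthy nodes:
gluster volume heal VMS info

# Restore the network on hyp03 afterwards:
iptables -D OUTPUT -m statistic --mode random --probability 0.75 -j REJECT
```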
--
Sander