[ovirt-users] Ovirt/Gluster

Ravishankar N ravishankar at redhat.com
Fri Aug 21 07:28:25 UTC 2015



On 08/20/2015 02:14 PM, Sander Hoentjen wrote:
>
>
> On 08/19/2015 09:04 AM, Ravishankar N wrote:
>>
>>
>> On 08/18/2015 04:22 PM, Ramesh Nachimuthu wrote:
>>> + Ravi from gluster.
>>>
>>> Regards,
>>> Ramesh
>>>
>>> ----- Original Message -----
>>> From: "Sander Hoentjen" <sander at hoentjen.eu>
>>> To: users at ovirt.org
>>> Sent: Tuesday, August 18, 2015 3:30:35 PM
>>> Subject: [ovirt-users] Ovirt/Gluster
>>>
>>> Hi,
>>>
>>> We are looking for some easy-to-manage, self-contained VM hosting. Ovirt
>>> with GlusterFS seems to fit that bill perfectly. I installed it and then
>>> started kicking the tires. First results looked promising, but now I
>>> can get a VM to pause indefinitely fairly easily:
>>>
>>> My setup is 3 hosts that are in a Virt and Gluster cluster. Gluster is
>>> set up as replica-3. The gluster export is used as the storage domain
>>> for the VMs.
>>
>> Hi,
>>
>> What version of gluster and ovirt are you using?
> glusterfs-3.7.3-1.el7.x86_64
> vdsm-4.16.20-0.el7.centos.x86_64
> ovirt-engine-3.5.3.1-1.el7.centos.noarch
>>
>>>
>>> Now when I start the VM all is good, performance is good enough so we
>>> are happy. I then start bonnie++ to generate some load. I have a VM
>>> running on host 1, host 2 is SPM, and all 3 hosts are seeing some
>>> network traffic courtesy of gluster.
>>>
>>> Now, for fun, suddenly the network on host3 goes bad (iptables -I
>>> OUTPUT -m statistic --mode random --probability 0.75 -j REJECT).
>>> Some time later I see the guest has a small "hiccup"; I'm guessing that
>>> is when gluster decides host 3 is not allowed to play anymore. No big
>>> deal anyway.
>>> After a while, 25% of packets just isn't good enough for Ovirt anymore,
>>> so the host will be fenced.
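As an aside, the iptables 'statistic' match quoted above rejects each outgoing packet independently with the given probability. A quick simulation (a sketch for illustration only, not part of the original test setup) shows roughly what host3's link looked like under that rule:

```shell
# Simulate the iptables rule: each "packet" is independently rejected
# with probability 0.75, so only ~25% of traffic gets through.
awk 'BEGIN {
  srand(42)                       # fixed seed so the run is repeatable
  n = 10000; rejected = 0
  for (i = 0; i < n; i++)
    if (rand() < 0.75)            # same probability as the REJECT rule
      rejected++
  printf "rejected %d of %d packets (%.1f%%)\n", rejected, n, 100 * rejected / n
}'
```

With roughly three out of four packets rejected, TCP connections stall and retransmit rather than failing cleanly, which is why both gluster's ping timer and Ovirt's host monitoring eventually give up on the host.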
>>
>> I'm not sure what fencing means w.r.t. ovirt and what it actually 
>> fences. As far as gluster is concerned, since only one node is 
>> blocked, the VM image should still be accessible to the VM running on 
>> host1.
> Fencing means (at least in this case) that the IPMI of the server does 
> a power reset.
>>> After a reboot *sometimes* the VM will be
>>> paused, and even after the gluster self-heal is complete it cannot be
>>> unpaused; it has to be restarted.
>>
>> Could you provide the gluster mount (fuse?) logs and the brick logs 
>> of all 3 nodes when the VM is paused? That should give us some clue.
>>
> Logs are attached. Problem was at around 8:15 - 8:20 UTC
> This time however the vm stopped even without a reboot of hyp03


The mount logs (rhev-data-center-mnt-glusterSD*) indicate frequent 
disconnects from the bricks, with 'clnt_ping_timer_expired', 
'Client-quorum is not met' and 'Read-only file system' messages.
Client-quorum is enabled by default for replica 3 volumes, so if the 
mount cannot connect to at least 2 bricks, quorum is lost and the 
gluster volume becomes read-only. That seems to be why the VMs are 
pausing.
I'm not sure whether the frequent disconnects are due to a flaky 
network or to the bricks not responding to the mount's ping timer 
because their epoll threads are busy with I/O (unlikely). Can you also 
share the output of `gluster volume info <volname>`?

Regards,
Ravi

>
>> Regards,
>> Ravi
>>>
>>> Is there anything I can do to prevent the VM from being paused?
>>>
>>> Regards,
>>> Sander
>>>
>>> _______________________________________________
>>> Users mailing list
>>> Users at ovirt.org
>>> http://lists.ovirt.org/mailman/listinfo/users
>>
>



