On 08/20/2015 02:14 PM, Sander Hoentjen wrote:
On 08/19/2015 09:04 AM, Ravishankar N wrote:
>
>
> On 08/18/2015 04:22 PM, Ramesh Nachimuthu wrote:
>> + Ravi from gluster.
>>
>> Regards,
>> Ramesh
>>
>> ----- Original Message -----
>> From: "Sander Hoentjen" <sander(a)hoentjen.eu>
>> To: users(a)ovirt.org
>> Sent: Tuesday, August 18, 2015 3:30:35 PM
>> Subject: [ovirt-users] Ovirt/Gluster
>>
>> Hi,
>>
>> We are looking for some easy-to-manage, self-contained VM hosting. oVirt
>> with GlusterFS seems to fit that bill perfectly. I installed it and then
>> started kicking the tires. First results looked promising, but now I
>> can get a VM to pause indefinitely fairly easily:
>>
>> My setup is 3 hosts that are in a Virt and Gluster cluster. Gluster is
>> set up as replica 3. The gluster export is used as the storage domain
>> for the VMs.
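For reference, a minimal sketch of how such a volume is typically created
(host and brick paths here are made up; the 'virt' group applies the option
set recommended for VM images, assuming that group file ships with the
packages):

    gluster volume create vmstore replica 3 \
        host1:/gluster/brick1 host2:/gluster/brick1 host3:/gluster/brick1
    # apply the recommended options for virt workloads
    gluster volume set vmstore group virt
    gluster volume start vmstore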
>
> Hi,
>
> What version of gluster and ovirt are you using?
glusterfs-3.7.3-1.el7.x86_64
vdsm-4.16.20-0.el7.centos.x86_64
ovirt-engine-3.5.3.1-1.el7.centos.noarch
>
>>
>> Now when I start the VM all is good; performance is good enough, so we
>> are happy. I then start bonnie++ to generate some load. I have a VM
>> running on host 1, host 2 is the SPM, and all 3 hosts are seeing some
>> network traffic courtesy of gluster.
>>
>> Now, for fun, suddenly the network on host3 goes bad (iptables -I OUTPUT
>> -m statistic --mode random --probability 0.75 -j REJECT).
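For anyone reproducing this, the injected rule can be listed and removed
again with the same rule spec (sketch; run as root on host3):

    # show the statistic rule at the top of the OUTPUT chain
    iptables -L OUTPUT -n --line-numbers
    # remove the fault-injection rule once the test is done
    iptables -D OUTPUT -m statistic --mode random --probability 0.75 -j REJECT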
>> Some time later I see the guest has a small "hiccup"; I'm guessing that
>> is when gluster decides host 3 is not allowed to play anymore. No big
>> deal anyway.
>> After a while, only 25% of packets getting through just isn't good
>> enough for oVirt anymore, so the host will be fenced.
>
> I'm not sure what fencing means w.r.t. oVirt and what it actually
> fences. As far as gluster is concerned, since only one node is
> blocked, the VM image should still be accessible by the VM running on
> host1.
Fencing means (at least in this case) that the IPMI of the server does
a power reset.
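Concretely, such a reset usually amounts to something like the following
(BMC address and credentials are placeholders):

    # power-cycle the host via its BMC, as oVirt's fence agent would
    ipmitool -I lanplus -H <bmc-address> -U <user> -P <password> chassis power reset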
>> After a reboot *sometimes* the VM will be
>> paused, and even after the gluster self-heal is complete it cannot be
>> unpaused; it has to be restarted.
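For illustration, the paused state is also visible at the libvirt level
(virsh shown only as a sketch; on an oVirt host the unpause normally goes
through the engine):

    # the affected guest shows up as 'paused'
    virsh list --all
    # attempt to resume it; in this scenario only a restart helps
    virsh resume <vm-name>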
>
> Could you provide the gluster mount (fuse?) logs and the brick logs
> of all 3 nodes when the VM is paused? That should give us some clue.
>
Logs are attached. The problem was at around 8:15-8:20 UTC.
This time, however, the VM stopped even without a reboot of hyp03.

The mount logs (rhev-data-center-mnt-glusterSD*) indicate frequent
disconnects from the bricks, with 'clnt_ping_timer_expired',
'Client-quorum is not met' and 'Read-only file system' messages.
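The relevant messages can be pulled out of the logs quickly with a grep;
the paths below are the usual locations and may differ on your setup:

    grep -E 'clnt_ping_timer_expired|Client-quorum is not met|Read-only file system' \
        /var/log/glusterfs/rhev-data-center-mnt-glusterSD*.log \
        /var/log/glusterfs/bricks/*.log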
Client-quorum is enabled by default for replica 3 volumes, so if the
mount cannot connect to at least 2 bricks, quorum is lost and the
gluster volume becomes read-only. That seems to be the reason why the
VMs are pausing.
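The behaviour is governed by the cluster.quorum-* volume options (sketch;
<volname> is a placeholder, and note that weakening quorum trades the pause
for a split-brain risk, so it is not something I'd recommend):

    # 'auto' (the replica 3 default) needs a majority of bricks reachable
    gluster volume set <volname> cluster.quorum-type auto
    # 'fixed' uses an explicit brick count instead
    gluster volume set <volname> cluster.quorum-type fixed
    gluster volume set <volname> cluster.quorum-count 2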
I'm not sure if the frequent disconnects are due to a flaky network or
the bricks not responding to the mount's ping timer because their epoll
threads are busy with I/O (unlikely). Can you also share the output of
`gluster volume info <volname>`?
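Something along these lines should capture the volume state I'm after:

    gluster volume info <volname>
    # brick and self-heal daemon status as the cluster sees it
    gluster volume status <volname>
    gluster peer status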
Regards,
Ravi
> Regards,
> Ravi
>>
>> Is there anything I can do to prevent the VM from being paused?
>>
>> Regards,
>> Sander
>>
>> _______________________________________________
>> Users mailing list
>> Users(a)ovirt.org
>> http://lists.ovirt.org/mailman/listinfo/users
>