
On 08/21/2015 02:21 PM, Ravishankar N wrote:
On 08/21/2015 04:32 PM, Sander Hoentjen wrote:
On 08/21/2015 11:30 AM, Ravishankar N wrote:
On 08/21/2015 01:21 PM, Sander Hoentjen wrote:
On 08/21/2015 09:28 AM, Ravishankar N wrote:
On 08/20/2015 02:14 PM, Sander Hoentjen wrote:
On 08/19/2015 09:04 AM, Ravishankar N wrote:
>
> On 08/18/2015 04:22 PM, Ramesh Nachimuthu wrote:
>> + Ravi from gluster.
>>
>> Regards,
>> Ramesh
>>
>> ----- Original Message -----
>> From: "Sander Hoentjen" <sander@hoentjen.eu>
>> To: users@ovirt.org
>> Sent: Tuesday, August 18, 2015 3:30:35 PM
>> Subject: [ovirt-users] Ovirt/Gluster
>>
>> Hi,
>>
>> We are looking for some easy to manage, self-contained VM hosting.
>> Ovirt with GlusterFS seems to fit that bill perfectly. I installed it
>> and then started kicking the tires. First results looked promising,
>> but now I can get a VM to pause indefinitely fairly easily:
>>
>> My setup is 3 hosts that are in a Virt and Gluster cluster. Gluster
>> is set up as replica-3. The gluster export is used as the storage
>> domain for the VMs.
>
> Hi,
>
> What version of gluster and ovirt are you using?

glusterfs-3.7.3-1.el7.x86_64
vdsm-4.16.20-0.el7.centos.x86_64
ovirt-engine-3.5.3.1-1.el7.centos.noarch

>> Now when I start the VM all is good, performance is good enough so we
>> are happy. I then start bonnie++ to generate some load. I have a VM
>> running on host 1, host 2 is SPM and all 3 VMs are seeing some
>> network traffic courtesy of gluster.
>>
>> Now, for fun, suddenly the network on host3 goes bad (iptables -I
>> OUTPUT -m statistic --mode random --probability 0.75 -j REJECT).
>> Some time later I see the guest has a small "hiccup"; I'm guessing
>> that is when gluster decides host 3 is not allowed to play anymore.
>> No big deal anyway.
>> After a while, 25% of packets just isn't good enough for Ovirt
>> anymore, so the host will be fenced.
>
> I'm not sure what fencing means w.r.t. ovirt and what it actually
> fences. As far as gluster is concerned, since only one node is
> blocked, the VM image should still be accessible by the VM running on
> host1.

Fencing means (at least in this case) that the IPMI of the server does a
power reset.

>> After a reboot *sometimes* the VM will be paused, and even after the
>> gluster self-heal is complete it cannot be unpaused; it has to be
>> restarted.
>
> Could you provide the gluster mount (fuse?) logs and the brick logs
> of all 3 nodes when the VM is paused? That should give us some clue.

Logs are attached. Problem was at around 8:15 - 8:20 UTC. This time,
however, the VM stopped even without a reboot of hyp03.
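For anyone who wants to reproduce this: the breakage is a single iptables
rule, inserted and then removed as below (the delete assumes the rule is
still at position 1 in the OUTPUT chain).

  # inject: randomly REJECT ~75% of outgoing packets on hyp03
  iptables -I OUTPUT -m statistic --mode random --probability 0.75 -j REJECT
  # revert once the test is done
  iptables -D OUTPUT 1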
The mount logs (rhev-data-center-mnt-glusterSD*) indicate frequent disconnects from the bricks, with 'clnt_ping_timer_expired', 'Client-quorum is not met' and 'Read-only file system' messages. Client-quorum is enabled by default for replica 3 volumes, so if the mount cannot connect to at least 2 bricks, quorum is lost and the gluster volume becomes read-only. That seems to be the reason why the VMs are pausing. I'm not sure if the frequent disconnects are due to a flaky network or to the bricks not responding to the mount's ping timer because their epoll threads are busy with I/O (unlikely). Can you also share the output of `gluster volume info <volname>`?
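If it helps narrow things down, something like the following should show
whether all brick processes are up and which clients each brick still sees
connected:

  # are all brick processes up, and on which ports?
  gluster volume status <volname>
  # which clients does each brick currently see?
  gluster volume status <volname> clients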
The frequent disconnects are probably because I intentionally broke the network on hyp03 (dropped 75% of outgoing packets). In my opinion this should not affect the VM on hyp02. Am I wrong to think that?
For client-quorum: if a client (mount) cannot connect to enough bricks to achieve quorum, the client becomes read-only. So if the client on hyp02 can still see the bricks on hyp02 and hyp01, it shouldn't be affected.
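Spelling out the arithmetic for a replica 3 volume with the default 'auto'
quorum-type (a majority of bricks must be reachable):

  replica count    : 3
  quorum (auto)    : floor(3/2) + 1 = 2 reachable bricks per client
  client on hyp02  : reaches bricks on hyp01 and hyp02 -> 2 of 3, quorum met, read-write
  client on hyp03  : reaches only its own brick        -> 1 of 3, quorum lost, read-only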
But it was, and I only "broke" hyp03.
Beats me then. I see "[2015-08-18 15:15:27.922998] W [MSGID: 108001] [afr-common.c:4043:afr_notify] 0-VMS-replicate-0: Client-quorum is not met" in hyp02's mount log, but the timestamp is earlier than when you say you observed the hang (2015-08-20, around 8:15 - 8:20 UTC?). (They do occur at that time on hyp03, though.)
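To line the hosts' logs up, it might help to pull every quorum transition
out with its timestamp, e.g. (log path assumed from the file names above):

  grep -H "Client-quorum" /var/log/glusterfs/rhev-data-center-mnt-glusterSD*.log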
Yeah, that event is from before. For your information: this setup is used for testing, so I try to break it and hope I don't succeed. Unfortunately I succeeded.
[root@hyp01 ~]# gluster volume info VMS
Volume Name: VMS
Type: Replicate
Volume ID: 9e6657e7-8520-4720-ba9d-78b14a86c8ca
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.99.50.20:/brick/VMS
Brick2: 10.99.50.21:/brick/VMS
Brick3: 10.99.50.22:/brick/VMS
Options Reconfigured:
performance.readdir-ahead: on
nfs.disable: on
user.cifs: disable
auth.allow: *
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
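For what it's worth, I believe these quorum and performance options are
what gluster's stock virt profile applies, i.e. the equivalent of:

  # applies the option group shipped in /var/lib/glusterd/groups/virt
  gluster volume set VMS group virt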
I see that you have enabled server-quorum too. Since you blocked hyp03, if the glusterd on that node cannot see the other 2 nodes due to the iptables rules, it will kill all brick processes on that node. See the "7 How To Test" section in http://www.gluster.org/community/documentation/index.php/Features/Server-quo... to get a better idea of server-quorum.
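For reference, server-quorum has two knobs: a per-volume type and a
cluster-wide ratio (the values below are just examples, not
recommendations):

  # per volume: 'server' enforces it (your current setting), 'none' disables it
  gluster volume set VMS cluster.server-quorum-type server
  # cluster wide: percentage of peers that must be up for bricks to stay running
  gluster volume set all cluster.server-quorum-ratio 51%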
Yes, but it should only kill the bricks on hyp03, right? So then why does the VM on hyp02 die? I don't like the fact that a problem on any one of the hosts can bring down any VM on any host.
Right. Well, from a gluster point of view, if you don't want quorum enforcement you can turn it off, possibly at the risk of files ending up in split-brain.
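If you did want to try that, it would be something like the following, with
the split-brain caveat above:

  gluster volume set VMS cluster.quorum-type none          # disable client-side quorum
  gluster volume set VMS cluster.server-quorum-type none   # disable server-side quorum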
But I *do* want quorum enforcement (I want to have as little chance as possible of a split-brain), and I also want my VMs on different nodes not to be affected when one host has issues (and I want a pony). Is there anybody out there that does not have this issue? I googled some more and now I also found http://www.gluster.org/pipermail/gluster-users/2014-February/016015.html which looks like the same issue, but unfortunately without a solution.

Maybe I should formulate some clear questions:
1) Am I correct in assuming that an issue on one of the 3 gluster nodes should not cause downtime for VMs on other nodes?
2) What can I/we do to fix the issue I am seeing?
3) Can anybody else reproduce my issue?

-- 
Sander