On Mon, Jul 25, 2016 at 9:58 AM, Krutika Dhananjay <kdhananj(a)redhat.com>
wrote:
OK, could you try the following:
i. Set network.remote-dio to off
# gluster volume set <VOL> network.remote-dio off
ii. Set performance.strict-o-direct to on
# gluster volume set <VOL> performance.strict-o-direct on
iii. Stop the affected vm(s) and start again
and tell me if you notice any improvement?
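If it helps, you can double-check that the options took effect with
`gluster volume get` (available since 3.7):
# gluster volume get <VOL> network.remote-dio
# gluster volume get <VOL> performance.strict-o-direct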
-Krutika

The previous install I had the issue with is still on gluster 3.7.11.
My test install of ovirt 3.6.7 and gluster 3.7.13, with 3 bricks on a local
disk, isn't allowing me to add the gluster storage at all right now.
I keep getting some type of UI error:
2016-07-25 12:49:09,277 ERROR
[org.ovirt.engine.ui.frontend.server.gwt.OvirtRemoteLoggingService]
(default task-33) [] Permutation name: 430985F23DFC1C8BE1C7FDD91EDAA785
2016-07-25 12:49:09,277 ERROR
[org.ovirt.engine.ui.frontend.server.gwt.OvirtRemoteLoggingService]
(default task-33) [] Uncaught exception: : java.lang.ClassCastException
at Unknown.ps(
On Mon, Jul 25, 2016 at 4:57 PM, Samuli Heinonen <samppah(a)neutraali.net>
wrote:
> Hi,
>
> > On 25 Jul 2016, at 12:34, David Gossage <dgossage(a)carouselchecks.com>
> wrote:
> >
> > On Mon, Jul 25, 2016 at 1:01 AM, Krutika Dhananjay <kdhananj(a)redhat.com>
> wrote:
> > Hi,
> >
> > Thanks for the logs. So I have identified one issue from the logs for
> which the fix is this:
> http://review.gluster.org/#/c/14669/. Because of
> a bug in the code, ENOENT was getting converted to EPERM and being
> propagated up the stack causing the reads to bail out early with 'Operation
> not permitted' errors.
> > I still need to find out two things:
> > i) why there was a readv() sent on a non-existent (ENOENT) file (this
> is important since some of the other users have not faced or reported this
> issue on gluster-users with 3.7.13)
> > ii) need to see if there's a way to work around this issue.
> >
> > Do you mind sharing the steps needed to reproduce this issue? This is
> so that we can apply our patches, test, and ensure they fix the problem.
>
>
> Unfortunately I can’t test this right away, nor give exact steps for how
> to test it. What follows is just a theory, but please correct me if you see
> any mistakes.
>
> oVirt uses cache=none settings for VMs by default, which requires direct
> I/O. oVirt also uses dd with iflag=direct to check that storage has direct
> I/O enabled. Problems exist with GlusterFS with sharding enabled and bricks
> running on ZFS on Linux. Everything seems to be fine with GlusterFS 3.7.11,
> and problems exist at least with versions .12 and .13. There have been some
> posts saying that GlusterFS 3.8.x is also affected.
>
> Steps to reproduce:
> 1. Sharded file is created with GlusterFS 3.7.11. Everything works ok.
> 2. GlusterFS is upgraded to 3.7.12+
> 3. Sharded file cannot be read or written with direct I/O enabled. (I.e.
> oVirt checks the storage connection with the command "dd
> if=/rhev/data-center/00000001-0001-0001-0001-0000000002b6/mastersd/dom_md/inbox
> iflag=direct,fullblock count=1 bs=1024000")
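>
> To check whether the ZFS brick itself rejects O_DIRECT (ZFS on Linux does
> not support O_DIRECT at the time of writing), a quick dd directly against a
> file on a brick might help; the path here is only an example:
>
> # dd if=/gluster1/BRICK1/1/testfile of=/dev/null iflag=direct bs=1M count=1
>
> If the filesystem lacks O_DIRECT support, this should fail with "Invalid
> argument" rather than "Operation not permitted".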
>
> Please let me know if you need more information.
>
> -samuli
>
> > Well, after the upgrade of gluster, all I did was start the ovirt hosts
> up, which launched and started their ha-agent and broker processes. I don't
> believe I started getting any errors till it mounted GLUSTER1. I had enabled
> sharding but had no sharded disk images yet. Not sure if the check for
> shards would have caused that. Unfortunately I can't just update this
> cluster and try to see what caused it, as it has some VMs users expect to
> be available in a few hours.
> >
> > I can see if I can get my test setup to recreate it. I think I'll need
> to de-activate the data center so I can detach the storage that's on xfs and
> attach the one that's over zfs with sharding enabled. My test is 3 bricks
> on the same local machine, with 3 different volumes, but I think I'm running
> into a sanlock issue or something, as it won't mount more than one volume
> that was created locally.
> >
> >
> > -Krutika
> >
> > On Fri, Jul 22, 2016 at 7:17 PM, David Gossage <
> dgossage(a)carouselchecks.com> wrote:
> > Trimmed the logs down to just about when I was shutting down the ovirt
> servers for updates, which was 14:30 UTC 2016-07-09.
> >
> > Pre-update settings were
> >
> > Volume Name: GLUSTER1
> > Type: Replicate
> > Volume ID: 167b8e57-28c3-447a-95cc-8410cbdf3f7f
> > Status: Started
> > Number of Bricks: 1 x 3 = 3
> > Transport-type: tcp
> > Bricks:
> > Brick1: ccgl1.gl.local:/gluster1/BRICK1/1
> > Brick2: ccgl2.gl.local:/gluster1/BRICK1/1
> > Brick3: ccgl3.gl.local:/gluster1/BRICK1/1
> > Options Reconfigured:
> > performance.readdir-ahead: on
> > storage.owner-uid: 36
> > storage.owner-gid: 36
> > performance.quick-read: off
> > performance.read-ahead: off
> > performance.io-cache: off
> > performance.stat-prefetch: off
> > cluster.eager-lock: enable
> > network.remote-dio: enable
> > cluster.quorum-type: auto
> > cluster.server-quorum-type: server
> > server.allow-insecure: on
> > cluster.self-heal-window-size: 1024
> > cluster.background-self-heal-count: 16
> > performance.strict-write-ordering: off
> > nfs.disable: on
> > nfs.addr-namelookup: off
> > nfs.enable-ino32: off
> >
> > At the time of the updates, ccgl3 was offline from a bad NIC on the
> server, but had been so for about a week with no issues in the volume.
> >
> > Shortly after the update I added these settings to enable sharding, but
> did not yet have any VM images sharded:
> > features.shard-block-size: 64MB
> > features.shard: on
> >
> >
> >
> >
> > David Gossage
> > Carousel Checks Inc. | System Administrator
> > Office 708.613.2284
> >
> > On Fri, Jul 22, 2016 at 5:00 AM, Krutika Dhananjay <kdhananj(a)redhat.com>
> wrote:
> > Hi David,
> >
> > Could you also share the brick logs from the affected volume? They're
> located at
> /var/log/glusterfs/bricks/<hyphenated-path-to-the-brick-directory>.log.
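> > For example, for a brick at /gluster1/BRICK1/1 the log would be
> /var/log/glusterfs/bricks/gluster1-BRICK1-1.log.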
> >
> > Also, could you share the volume configuration (output of `gluster
> volume info <VOL>`) for the affected volume(s), as it was at the time you
> actually saw the issue?
> >
> > -Krutika
> >
> >
> >
> >
> > On Thu, Jul 21, 2016 at 11:23 PM, David Gossage <
> dgossage(a)carouselchecks.com> wrote:
> > On Thu, Jul 21, 2016 at 11:47 AM, Scott <romracer(a)gmail.com> wrote:
> > Hi David,
> >
> > My backend storage is ZFS.
> >
> > I thought about moving from FUSE to NFS mounts for my Gluster volumes
> to help test. But since I use hosted engine this would be a real pain.
> It's difficult to modify the storage domain type/path in
> hosted-engine.conf, and I don't want to go through the process of
> re-deploying hosted engine.
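> >
> > (For reference, the relevant lines in
> /etc/ovirt-hosted-engine/hosted-engine.conf look roughly like the following;
> the key names are from memory and the values are placeholders:
> >
> > storage=host1.example.com:/GLUSTER1
> > domainType=glusterfs
> >
> > Editing them by hand isn't really supported, hence my reluctance.)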
> >
> >
> > I found this:
> >
> > https://bugzilla.redhat.com/show_bug.cgi?id=1347553
> >
> > Not sure if related.
> >
> > But I also have a zfs backend. Another user on the gluster mailing list
> had issues and used a zfs backend, although she used proxmox and got it
> working by changing the disk to writeback cache, I think it was.
> >
> > I also use hosted engine, but I run my gluster volume for HE on an LVM
> separate from zfs, on xfs, and if I recall it did not have the issues my
> gluster on zfs did. I'm wondering now if the issue was the zfs settings.
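> >
> > If anyone wants to compare zfs settings, the properties most often
> discussed for gluster bricks can be dumped with (the dataset name is just
> an example):
> >
> > # zfs get xattr,acltype,sync,primarycache gluster1/BRICK1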
> >
> > Hopefully I should have a test machine up soon that I can play around
> with more.
> >
> > Scott
> >
> > On Thu, Jul 21, 2016 at 11:36 AM David Gossage <
> dgossage(a)carouselchecks.com> wrote:
> > What back end storage do you run gluster on? xfs/zfs/ext4 etc?
> >
> > David Gossage
> > Carousel Checks Inc. | System Administrator
> > Office 708.613.2284
> >
> > On Thu, Jul 21, 2016 at 8:18 AM, Scott <romracer(a)gmail.com> wrote:
> > I get similar problems with oVirt 4.0.1 and hosted engine. After
> upgrading all my hosts to Gluster 3.7.13 (client and server), I get the
> following:
> >
> > $ sudo hosted-engine --set-maintenance --mode=none
> > Traceback (most recent call last):
> >   File "/usr/lib64/python2.7/runpy.py", line 162, in _run_module_as_main
> >     "__main__", fname, loader, pkg_name)
> >   File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
> >     exec code in run_globals
> >   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/set_maintenance.py", line 73, in <module>
> >     if not maintenance.set_mode(sys.argv[1]):
> >   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/set_maintenance.py", line 61, in set_mode
> >     value=m_global,
> >   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 259, in set_maintenance_mode
> >     str(value))
> >   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 204, in set_global_md_flag
> >     all_stats = broker.get_stats_from_storage(service)
> >   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 232, in get_stats_from_storage
> >     result = self._checked_communicate(request)
> >   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 260, in _checked_communicate
> >     .format(message or response))
> > ovirt_hosted_engine_ha.lib.exceptions.RequestError: Request failed: failed to read metadata: [Errno 1] Operation not permitted
> >
> > If I only upgrade one host, then things will continue to work but my
> nodes are constantly healing shards. My logs are also flooded with:
> >
> > [2016-07-21 13:15:14.137734] W [fuse-bridge.c:2227:fuse_readv_cbk]
> 0-glusterfs-fuse: 274714: READ => -1
> gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0041d0 (Operation not
> permitted)
> > The message "W [MSGID: 114031]
> [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-0: remote
> operation failed [Operation not permitted]" repeated 6 times between
> [2016-07-21 13:13:24.134985] and [2016-07-21 13:15:04.132226]
> > The message "W [MSGID: 114031]
> [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-1: remote
> operation failed [Operation not permitted]" repeated 8 times between
> [2016-07-21 13:13:34.133116] and [2016-07-21 13:15:14.137178]
> > The message "W [MSGID: 114031]
> [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-2: remote
> operation failed [Operation not permitted]" repeated 7 times between
> [2016-07-21 13:13:24.135071] and [2016-07-21 13:15:14.137666]
> > [2016-07-21 13:15:24.134647] W [MSGID: 114031]
> [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-0: remote
> operation failed [Operation not permitted]
> > [2016-07-21 13:15:24.134764] W [MSGID: 114031]
> [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-2: remote
> operation failed [Operation not permitted]
> > [2016-07-21 13:15:24.134793] W [fuse-bridge.c:2227:fuse_readv_cbk]
> 0-glusterfs-fuse: 274741: READ => -1
> gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0038f4 (Operation not
> permitted)
> > [2016-07-21 13:15:34.135413] W [fuse-bridge.c:2227:fuse_readv_cbk]
> 0-glusterfs-fuse: 274756: READ => -1
> gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0041d0 (Operation not
> permitted)
> > [2016-07-21 13:15:44.141062] W [fuse-bridge.c:2227:fuse_readv_cbk]
> 0-glusterfs-fuse: 274818: READ => -1
> gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0038f4 (Operation not
> permitted)
> > [2016-07-21 13:15:54.133582] W [MSGID: 114031]
> [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-1: remote
> operation failed [Operation not permitted]
> > [2016-07-21 13:15:54.133629] W [fuse-bridge.c:2227:fuse_readv_cbk]
> 0-glusterfs-fuse: 274853: READ => -1
> gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0036d8 (Operation not
> permitted)
> > [2016-07-21 13:16:04.133666] W [fuse-bridge.c:2227:fuse_readv_cbk]
> 0-glusterfs-fuse: 274879: READ => -1
> gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0041d0 (Operation not
> permitted)
> > [2016-07-21 13:16:14.134954] W [fuse-bridge.c:2227:fuse_readv_cbk]
> 0-glusterfs-fuse: 274894: READ => -1
> gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0036d8 (Operation not
> permitted)
> >
> > Scott
> >
> >
> > On Thu, Jul 21, 2016 at 6:57 AM Frank Rothenstein <
> f.rothenstein(a)bodden-kliniken.de> wrote:
> > Hey David,
> >
> > I have the very same problem on my test cluster, despite running
> ovirt 4.0.
> > If you access your volumes via NFS all is fine; the problem is FUSE. I
> stayed on 3.7.13, but have no solution yet, so for now I use NFS.
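> >
> > In case it helps: gluster's built-in NFS server only speaks NFSv3, so I
> mount it roughly like this (server and volume names are placeholders, and
> nfs.disable must be off on the volume):
> >
> > # mount -t nfs -o vers=3,nolock server1:/GLUSTER1 /mnt/gluster1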
> >
> > Frank
> >
> > Am Donnerstag, den 21.07.2016, 04:28 -0500 schrieb David Gossage:
> >> Anyone running one of the recent 3.6.x lines and gluster using 3.7.13? I
> am looking to upgrade gluster from 3.7.11->3.7.13 for some bug fixes, but
> have been told by users on the gluster mailing list that, due to some
> gluster changes, I'd need to change the disk parameters to use writeback
> cache. Something to do with aio support being removed.
> >>
> >> I believe this could be done with custom parameters? But I believe
> storage tests are done using dd, so would they fail with current settings
> then? On the last upgrade to 3.7.13 I had to roll back to 3.7.11 due to
> stability issues where gluster storage would go into the down state and
> always show N/A as space available/used, even though hosts still saw the
> storage and VMs were running on it on all 3 hosts.
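> >>
> >> (If it is doable via a custom property or libvirt hook, I assume it
> would end up as something like cache='writeback' on the disk's <driver>
> element in the libvirt XML, e.g.:
> >>
> >> <driver name='qemu' type='raw' cache='writeback' io='threads'/>
> >>
> >> instead of cache='none'. I haven't tried it, though.)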
> >>
> >> I saw a lot of messages like these, which went away once the gluster
> rollback finished:
> >>
> >> [2016-07-09 15:27:46.935694] I [fuse-bridge.c:4083:fuse_init]
> 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.22 kernel
> 7.22
> >> [2016-07-09 15:27:49.555466] W [MSGID: 114031]
> [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-GLUSTER1-client-1: remote
> operation failed [Operation not permitted]
> >> [2016-07-09 15:27:49.556574] W [MSGID: 114031]
> [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-GLUSTER1-client-0: remote
> operation failed [Operation not permitted]
> >> [2016-07-09 15:27:49.556659] W [fuse-bridge.c:2227:fuse_readv_cbk]
> 0-glusterfs-fuse: 80: READ => -1 gfid=deb61291-5176-4b81-8315-3f1cf8e3534d
> fd=0x7f5224002f68 (Operation not permitted)
> >> [2016-07-09 15:27:59.612477] W [MSGID: 114031]
> [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-GLUSTER1-client-1: remote
> operation failed [Operation not permitted]
> >> [2016-07-09 15:27:59.613700] W [MSGID: 114031]
> [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-GLUSTER1-client-0: remote
> operation failed [Operation not permitted]
> >> [2016-07-09 15:27:59.613781] W [fuse-bridge.c:2227:fuse_readv_cbk]
> 0-glusterfs-fuse: 168: READ => -1 gfid=deb61291-5176-4b81-8315-3f1cf8e3534d
> fd=0x7f5224002f68 (Operation not permitted)
> >>
> >> David Gossage
> >> Carousel Checks Inc. | System Administrator
> >> Office 708.613.2284