On Mon, Jul 25, 2016 at 9:58 AM, Krutika Dhananjay <kdhananj(a)redhat.com>
wrote:
OK, could you try the following:
i. Set network.remote-dio to off
# gluster volume set <VOL> network.remote-dio off
ii. Set performance.strict-o-direct to on
# gluster volume set <VOL> performance.strict-o-direct on
iii. Stop the affected vm(s) and start again
and tell me if you notice any improvement?
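If it helps, you can double-check that the options took effect with
`gluster volume get` (available since 3.7):
# gluster volume get <VOL> network.remote-dio
# gluster volume get <VOL> performance.strict-o-direct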
-Krutika

The previous install I had the issue with is still on gluster 3.7.11.
My test install of ovirt 3.6.7 and gluster 3.7.13, with 3 bricks on a local
disk, isn't allowing me to add the gluster storage at all right now.
I keep getting some type of UI error:
2016-07-25 12:49:09,277 ERROR
[org.ovirt.engine.ui.frontend.server.gwt.OvirtRemoteLoggingService]
(default task-33) [] Permutation name: 430985F23DFC1C8BE1C7FDD91EDAA785
2016-07-25 12:49:09,277 ERROR
[org.ovirt.engine.ui.frontend.server.gwt.OvirtRemoteLoggingService]
(default task-33) [] Uncaught exception: : java.lang.ClassCastException
at Unknown.ps(
On Mon, Jul 25, 2016 at 4:57 PM, Samuli Heinonen <samppah(a)neutraali.net>
wrote:
> Hi,
>
> > On 25 Jul 2016, at 12:34, David Gossage <dgossage(a)carouselchecks.com>
> wrote:
> >
> > On Mon, Jul 25, 2016 at 1:01 AM, Krutika Dhananjay <kdhananj(a)redhat.com>
> wrote:
> > Hi,
> >
> > Thanks for the logs. So I have identified one issue from the logs for
> which the fix is this:
> http://review.gluster.org/#/c/14669/. Because of
> a bug in the code, ENOENT was getting converted to EPERM and being
> propagated up the stack causing the reads to bail out early with 'Operation
> not permitted' errors.
> > I still need to find out two things:
> > i) why there was a readv() sent on a non-existent (ENOENT) file (this
> is important since some of the other users have not faced or reported this
> issue on gluster-users with 3.7.13)
> > ii) need to see if there's a way to work around this issue.
> >
> > Do you mind sharing the steps needed to reproduce this issue? This is
> so that we can apply our patches, test, and ensure they fix the problem.
>
>
> Unfortunately I can’t test this right away, nor give exact steps for how
> to test it. What follows is just a theory, but please correct me if you see
> any mistakes.
>
> oVirt uses cache=none settings for VMs by default, which requires direct
> I/O. oVirt also uses dd with iflag=direct to check that storage has direct
> I/O enabled. Problems exist with GlusterFS with sharding enabled and bricks
> running on ZFS on Linux. Everything seems to be fine with GlusterFS 3.7.11,
> and problems exist at least with versions .12 and .13. There have been some
> posts saying that GlusterFS 3.8.x is also affected.
>
> Steps to reproduce:
> 1. Sharded file is created with GlusterFS 3.7.11. Everything works ok.
> 2. GlusterFS is upgraded to 3.7.12+
> 3. Sharded file cannot be read or written with direct I/O enabled. (I.e.
> oVirt checks the storage connection with the command "dd
> if=/rhev/data-center/00000001-0001-0001-0001-0000000002b6/mastersd/dom_md/inbox
> iflag=direct,fullblock count=1 bs=1024000")
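>
> To check whether the ZFS brick itself rejects O_DIRECT (ZFS on Linux does
> not support O_DIRECT at the time of writing), a quick dd directly against a
> file on a brick might help; the path here is only an example:
>
> # dd if=/gluster1/BRICK1/1/testfile of=/dev/null iflag=direct bs=1M count=1
>
> If the filesystem lacks O_DIRECT support, this should fail with "Invalid
> argument" rather than "Operation not permitted".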
>
> Please let me know if you need more information.
>
> -samuli
>
> > Well, after the upgrade of gluster, all I did was start the ovirt hosts
> up, which launched and started their ha-agent and broker processes. I don't
> believe I started getting any errors till it mounted GLUSTER1. I had enabled
> sharding but had no sharded disk images yet. Not sure if the check for
> shards would have caused that. Unfortunately I can't just update this
> cluster and try to see what caused it, as it has some VMs users expect to
> be available in a few hours.
> >
> > I can see if I can get my test setup to recreate it. I think I'll need
> to de-activate the data center so I can detach the storage that's on xfs and
> attach the one that's over zfs with sharding enabled. My test is 3 bricks
> on the same local machine, with 3 different volumes, but I think I'm running
> into a sanlock issue or something, as it won't mount more than one volume
> that was created locally.
> >
> >
> > -Krutika
> >
> > On Fri, Jul 22, 2016 at 7:17 PM, David Gossage <
> dgossage(a)carouselchecks.com> wrote:
> > Trimmed the logs down to just about when I was shutting down the ovirt
> servers for updates, which was 14:30 UTC 2016-07-09.
> >
> > Pre-update settings were
> >
> > Volume Name: GLUSTER1
> > Type: Replicate
> > Volume ID: 167b8e57-28c3-447a-95cc-8410cbdf3f7f
> > Status: Started
> > Number of Bricks: 1 x 3 = 3
> > Transport-type: tcp
> > Bricks:
> > Brick1: ccgl1.gl.local:/gluster1/BRICK1/1
> > Brick2: ccgl2.gl.local:/gluster1/BRICK1/1
> > Brick3: ccgl3.gl.local:/gluster1/BRICK1/1
> > Options Reconfigured:
> > performance.readdir-ahead: on
> > storage.owner-uid: 36
> > storage.owner-gid: 36
> > performance.quick-read: off
> > performance.read-ahead: off
> > performance.io-cache: off
> > performance.stat-prefetch: off
> > cluster.eager-lock: enable
> > network.remote-dio: enable
> > cluster.quorum-type: auto
> > cluster.server-quorum-type: server
> > server.allow-insecure: on
> > cluster.self-heal-window-size: 1024
> > cluster.background-self-heal-count: 16
> > performance.strict-write-ordering: off
> > nfs.disable: on
> > nfs.addr-namelookup: off
> > nfs.enable-ino32: off
> >
> > At the time of the updates, ccgl3 was offline from a bad NIC on the
> server, but had been so for about a week with no issues in the volume.
> >
> > Shortly after the update I added these settings to enable sharding, but
> did not yet have any VM images sharded:
> > features.shard-block-size: 64MB
> > features.shard: on
> >
> >
> >
> >
> > David Gossage
> > Carousel Checks Inc. | System Administrator
> > Office 708.613.2284
> >
> > On Fri, Jul 22, 2016 at 5:00 AM, Krutika Dhananjay <kdhananj(a)redhat.com>
> wrote:
> > Hi David,
> >
> > Could you also share the brick logs from the affected volume? They're
> located at
> /var/log/glusterfs/bricks/<hyphenated-path-to-the-brick-directory>.log.
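> > For example, for a brick at /gluster1/BRICK1/1 the log would be
> /var/log/glusterfs/bricks/gluster1-BRICK1-1.log.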
> >
> > Also, could you share the volume configuration (output of `gluster
> volume info <VOL>`) for the affected volume(s), as it was at the time you
> actually saw the issue?
> >
> > -Krutika
> >
> >
> >
> >
> > On Thu, Jul 21, 2016 at 11:23 PM, David Gossage <
> dgossage(a)carouselchecks.com> wrote:
> > On Thu, Jul 21, 2016 at 11:47 AM, Scott <romracer(a)gmail.com> wrote:
> > Hi David,
> >
> > My backend storage is ZFS.
> >
> > I thought about moving from FUSE to NFS mounts for my Gluster volumes
> to help test. But since I use hosted engine this would be a real pain.
> It's difficult to modify the storage domain type/path in
> hosted-engine.conf, and I don't want to go through the process of
> re-deploying hosted engine.
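> >
> > (For reference, the relevant lines in
> /etc/ovirt-hosted-engine/hosted-engine.conf look roughly like the following;
> the key names are from memory and the values are placeholders:
> >
> > storage=host1.example.com:/GLUSTER1
> > domainType=glusterfs
> >
> > Editing them by hand isn't really supported, hence my reluctance.)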
> >
> >
> > I found this:
> >
> > https://bugzilla.redhat.com/show_bug.cgi?id=1347553
> >
> > Not sure if related.
> >
> > But I also have a zfs backend. Another user on the gluster mailing list
> had issues and used a zfs backend, although she used proxmox and got it
> working by changing the disk to writeback cache, I think it was.
> >
> > I also use hosted engine, but I run my gluster volume for HE on an LVM
> separate from zfs, on xfs, and if I recall it did not have the issues my
> gluster on zfs did. I'm wondering now if the issue was the zfs settings.
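> >
> > If anyone wants to compare zfs settings, the properties most often
> discussed for gluster bricks can be dumped with (the dataset name is just
> an example):
> >
> > # zfs get xattr,acltype,sync,primarycache gluster1/BRICK1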
> >
> > Hopefully I should have a test machine up soon that I can play around
> with more.
> >
> > Scott
> >
> > On Thu, Jul 21, 2016 at 11:36 AM David Gossage <
> dgossage(a)carouselchecks.com> wrote:
> > What back end storage do you run gluster on? xfs/zfs/ext4 etc?
> >
> > David Gossage
> > Carousel Checks Inc. | System Administrator
> > Office 708.613.2284
> >
> > On Thu, Jul 21, 2016 at 8:18 AM, Scott <romracer(a)gmail.com> wrote:
> > I get similar problems with oVirt 4.0.1 and hosted engine. After
> upgrading all my hosts to Gluster 3.7.13 (client and server), I get the
> following:
> >
> > $ sudo hosted-engine --set-maintenance --mode=none
> > Traceback (most recent call last):
> >   File "/usr/lib64/python2.7/runpy.py", line 162, in _run_module_as_main
> >     "__main__", fname, loader, pkg_name)
> >   File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
> >     exec code in run_globals
> >   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/set_maintenance.py", line 73, in <module>
> >     if not maintenance.set_mode(sys.argv[1]):
> >   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/set_maintenance.py", line 61, in set_mode
> >     value=m_global,
> >   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 259, in set_maintenance_mode
> >     str(value))
> >   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 204, in set_global_md_flag
> >     all_stats = broker.get_stats_from_storage(service)
> >   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 232, in get_stats_from_storage
> >     result = self._checked_communicate(request)
> >   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 260, in _checked_communicate
> >     .format(message or response))
> > ovirt_hosted_engine_ha.lib.exceptions.RequestError: Request failed: failed to read metadata: [Errno 1] Operation not permitted
> >
> > If I only upgrade one host, then things will continue to work but my
> nodes are constantly healing shards. My logs are also flooded with:
> >
> > [2016-07-21 13:15:14.137734] W [fuse-bridge.c:2227:fuse_readv_cbk]
> 0-glusterfs-fuse: 274714: READ => -1
> gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0041d0 (Operation not
> permitted)
> > The message "W [MSGID: 114031]
> [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-0: remote
> operation failed [Operation not permitted]" repeated 6 times between
> [2016-07-21 13:13:24.134985] and [2016-07-21 13:15:04.132226]
> > The message "W [MSGID: 114031]
> [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-1: remote
> operation failed [Operation not permitted]" repeated 8 times between
> [2016-07-21 13:13:34.133116] and [2016-07-21 13:15:14.137178]
> > The message "W [MSGID: 114031]
> [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-2: remote
> operation failed [Operation not permitted]" repeated 7 times between
> [2016-07-21 13:13:24.135071] and [2016-07-21 13:15:14.137666]
> > [2016-07-21 13:15:24.134647] W [MSGID: 114031]
> [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-0: remote
> operation failed [Operation not permitted]
> > [2016-07-21 13:15:24.134764] W [MSGID: 114031]
> [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-2: remote
> operation failed [Operation not permitted]
> > [2016-07-21 13:15:24.134793] W [fuse-bridge.c:2227:fuse_readv_cbk]
> 0-glusterfs-fuse: 274741: READ => -1
> gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0038f4 (Operation not
> permitted)
> > [2016-07-21 13:15:34.135413] W [fuse-bridge.c:2227:fuse_readv_cbk]
> 0-glusterfs-fuse: 274756: READ => -1
> gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0041d0 (Operation not
> permitted)
> > [2016-07-21 13:15:44.141062] W [fuse-bridge.c:2227:fuse_readv_cbk]
> 0-glusterfs-fuse: 274818: READ => -1
> gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0038f4 (Operation not
> permitted)
> > [2016-07-21 13:15:54.133582] W [MSGID: 114031]
> [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-1: remote
> operation failed [Operation not permitted]
> > [2016-07-21 13:15:54.133629] W [fuse-bridge.c:2227:fuse_readv_cbk]
> 0-glusterfs-fuse: 274853: READ => -1
> gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0036d8 (Operation not
> permitted)
> > [2016-07-21 13:16:04.133666] W [fuse-bridge.c:2227:fuse_readv_cbk]
> 0-glusterfs-fuse: 274879: READ => -1
> gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0041d0 (Operation not
> permitted)
> > [2016-07-21 13:16:14.134954] W [fuse-bridge.c:2227:fuse_readv_cbk]
> 0-glusterfs-fuse: 274894: READ => -1
> gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0036d8 (Operation not
> permitted)
> >
> > Scott
> >
> >
> > On Thu, Jul 21, 2016 at 6:57 AM Frank Rothenstein <
> f.rothenstein(a)bodden-kliniken.de> wrote:
> > Hey David,
> >
> > I have the very same problem on my test cluster, despite running
> ovirt 4.0.
> > If you access your volumes via NFS all is fine; the problem is FUSE. I
> stayed on 3.7.13, but have no solution yet, so for now I use NFS.
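> >
> > In case it helps: gluster's built-in NFS server only speaks NFSv3, so I
> mount it roughly like this (server and volume names are placeholders, and
> nfs.disable must be off on the volume):
> >
> > # mount -t nfs -o vers=3,nolock server1:/GLUSTER1 /mnt/gluster1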
> >
> > Frank
> >
> > Am Donnerstag, den 21.07.2016, 04:28 -0500 schrieb David Gossage:
> >> Anyone running one of the recent 3.6.x lines and gluster using 3.7.13? I
> am looking to upgrade gluster from 3.7.11->3.7.13 for some bug fixes, but
> have been told by users on the gluster mailing list that, due to some
> gluster changes, I'd need to change the disk parameters to use writeback
> cache. Something to do with aio support being removed.
> >>
> >> I believe this could be done with custom parameters? But I believe
> storage tests are done using dd, so would they fail with current settings
> then? On the last upgrade to 3.7.13 I had to roll back to 3.7.11 due to
> stability issues where gluster storage would go into the down state and
> always show N/A as space available/used, even though hosts still saw the
> storage and VMs were running on it on all 3 hosts.
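> >>
> >> (If it is doable via a custom property or libvirt hook, I assume it
> would end up as something like cache='writeback' on the disk's <driver>
> element in the libvirt XML, e.g.:
> >>
> >> <driver name='qemu' type='raw' cache='writeback' io='threads'/>
> >>
> >> instead of cache='none'. I haven't tried it, though.)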
> >>
> >> I saw a lot of messages like these, which went away once the gluster
> rollback finished:
> >>
> >> [2016-07-09 15:27:46.935694] I [fuse-bridge.c:4083:fuse_init]
> 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.22 kernel
> 7.22
> >> [2016-07-09 15:27:49.555466] W [MSGID: 114031]
> [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-GLUSTER1-client-1: remote
> operation failed [Operation not permitted]
> >> [2016-07-09 15:27:49.556574] W [MSGID: 114031]
> [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-GLUSTER1-client-0: remote
> operation failed [Operation not permitted]
> >> [2016-07-09 15:27:49.556659] W [fuse-bridge.c:2227:fuse_readv_cbk]
> 0-glusterfs-fuse: 80: READ => -1 gfid=deb61291-5176-4b81-8315-3f1cf8e3534d
> fd=0x7f5224002f68 (Operation not permitted)
> >> [2016-07-09 15:27:59.612477] W [MSGID: 114031]
> [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-GLUSTER1-client-1: remote
> operation failed [Operation not permitted]
> >> [2016-07-09 15:27:59.613700] W [MSGID: 114031]
> [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-GLUSTER1-client-0: remote
> operation failed [Operation not permitted]
> >> [2016-07-09 15:27:59.613781] W [fuse-bridge.c:2227:fuse_readv_cbk]
> 0-glusterfs-fuse: 168: READ => -1 gfid=deb61291-5176-4b81-8315-3f1cf8e3534d
> fd=0x7f5224002f68 (Operation not permitted)
> >>
> >> David Gossage
> >> Carousel Checks Inc. | System Administrator
> >> Office 708.613.2284