[ovirt-users] ovirt 3.6.6 and gluster 3.7.13
David Gossage
dgossage at carouselchecks.com
Mon Jul 25 14:23:42 EDT 2016
On Mon, Jul 25, 2016 at 1:07 PM, David Gossage <dgossage at carouselchecks.com>
wrote:
>
> On Mon, Jul 25, 2016 at 1:00 PM, David Gossage <
> dgossage at carouselchecks.com> wrote:
>
>>
>> On Mon, Jul 25, 2016 at 9:58 AM, Krutika Dhananjay <kdhananj at redhat.com>
>> wrote:
>>
>>> OK, could you try the following:
>>>
>>> i. Set network.remote-dio to off
>>> # gluster volume set <VOL> network.remote-dio off
>>>
>>> ii. Set performance.strict-o-direct to on
>>> # gluster volume set <VOL> performance.strict-o-direct on
>>>
>>> iii. Stop the affected vm(s) and start again
>>>
>>> and tell me if you notice any improvement?
>>>
>>
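(Side note, and just a guess on my part: after setting those, I believe the
values can be confirmed with something like
# gluster volume get <VOL> network.remote-dio
# gluster volume get <VOL> performance.strict-o-direct
if I have the volume get syntax right for 3.7.x.)
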
Not sure if this is helpful, but on the gluster mount it creates (even though
it won't attach to the data center) I get the following error from the brick
log when running this dd command:
dd if=/dev/zero
of=/rhev/data-center/mnt/glusterSD/192.168.71.10\:_glustershard/5b8a4477-4d87-43a1-aa52-b664b1bd9e08/images/test
oflag=direct count=100 bs=1M
dd: error writing
‘/rhev/data-center/mnt/glusterSD/192.168.71.10:_glustershard/5b8a4477-4d87-43a1-aa52-b664b1bd9e08/images/test’:
Invalid argument
dd: closing output file
‘/rhev/data-center/mnt/glusterSD/192.168.71.10:_glustershard/5b8a4477-4d87-43a1-aa52-b664b1bd9e08/images/test’:
Invalid argument
[2016-07-25 18:20:19.393121] E [MSGID: 113039] [posix.c:2939:posix_open]
0-glustershard-posix: open on
/gluster2/brick1/1/.glusterfs/02/f4/02f4783b-2799-46d9-b787-53e4ccd9a052,
flags: 16385 [Invalid argument]
[2016-07-25 18:20:19.393204] E [MSGID: 115070]
[server-rpc-fops.c:1568:server_open_cbk] 0-glustershard-server: 120: OPEN
/5b8a4477-4d87-43a1-aa52-b664b1bd9e08/images/test
(02f4783b-2799-46d9-b787-53e4ccd9a052) ==> (Invalid argument) [Invalid
argument]
and this in
/var/log/glusterfs/rhev-data-center-mnt-glusterSD-192.168.71.10\:_glustershard.log:
[2016-07-25 18:20:19.393275] E [MSGID: 114031]
[client-rpc-fops.c:466:client3_3_open_cbk] 0-glustershard-client-0: remote
operation failed. Path: /5b8a4477-4d87-43a1-aa52-b664b1bd9e08/images/test
(02f4783b-2799-46d9-b787-53e4ccd9a052) [Invalid argument]
[2016-07-25 18:20:19.393270] E [MSGID: 114031]
[client-rpc-fops.c:466:client3_3_open_cbk] 0-glustershard-client-1: remote
operation failed. Path: /5b8a4477-4d87-43a1-aa52-b664b1bd9e08/images/test
(02f4783b-2799-46d9-b787-53e4ccd9a052) [Invalid argument]
[2016-07-25 18:20:19.393317] E [MSGID: 114031]
[client-rpc-fops.c:466:client3_3_open_cbk] 0-glustershard-client-2: remote
operation failed. Path: /5b8a4477-4d87-43a1-aa52-b664b1bd9e08/images/test
(02f4783b-2799-46d9-b787-53e4ccd9a052) [Invalid argument]
[2016-07-25 18:20:19.393357] W [fuse-bridge.c:2311:fuse_writev_cbk]
0-glusterfs-fuse: 117: WRITE => -1
gfid=02f4783b-2799-46d9-b787-53e4ccd9a052 fd=0x7f5fec0ba08c (Invalid
argument)
[2016-07-25 18:20:19.393389] W [fuse-bridge.c:2311:fuse_writev_cbk]
0-glusterfs-fuse: 118: WRITE => -1
gfid=02f4783b-2799-46d9-b787-53e4ccd9a052 fd=0x7f5fec0ba08c (Invalid
argument)
[2016-07-25 18:20:19.393611] W [fuse-bridge.c:2311:fuse_writev_cbk]
0-glusterfs-fuse: 119: WRITE => -1
gfid=02f4783b-2799-46d9-b787-53e4ccd9a052 fd=0x7f5fec0ba08c (Invalid
argument)
[2016-07-25 18:20:19.393708] W [fuse-bridge.c:2311:fuse_writev_cbk]
0-glusterfs-fuse: 120: WRITE => -1
gfid=02f4783b-2799-46d9-b787-53e4ccd9a052 fd=0x7f5fec0ba08c (Invalid
argument)
[2016-07-25 18:20:19.393771] W [fuse-bridge.c:2311:fuse_writev_cbk]
0-glusterfs-fuse: 121: WRITE => -1
gfid=02f4783b-2799-46d9-b787-53e4ccd9a052 fd=0x7f5fec0ba08c (Invalid
argument)
[2016-07-25 18:20:19.393840] W [fuse-bridge.c:2311:fuse_writev_cbk]
0-glusterfs-fuse: 122: WRITE => -1
gfid=02f4783b-2799-46d9-b787-53e4ccd9a052 fd=0x7f5fec0ba08c (Invalid
argument)
[2016-07-25 18:20:19.393914] W [fuse-bridge.c:2311:fuse_writev_cbk]
0-glusterfs-fuse: 123: WRITE => -1
gfid=02f4783b-2799-46d9-b787-53e4ccd9a052 fd=0x7f5fec0ba08c (Invalid
argument)
[2016-07-25 18:20:19.393982] W [fuse-bridge.c:2311:fuse_writev_cbk]
0-glusterfs-fuse: 124: WRITE => -1
gfid=02f4783b-2799-46d9-b787-53e4ccd9a052 fd=0x7f5fec0ba08c (Invalid
argument)
[2016-07-25 18:20:19.394045] W [fuse-bridge.c:709:fuse_truncate_cbk]
0-glusterfs-fuse: 125: FTRUNCATE() ERR => -1 (Invalid argument)
[2016-07-25 18:20:19.394338] W [fuse-bridge.c:1290:fuse_err_cbk]
0-glusterfs-fuse: 126: FLUSH() ERR => -1 (Invalid argument)
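For what it's worth, the "flags: 16385" in the posix_open error above looks
like O_WRONLY|O_DIRECT (on x86_64), so the brick seems to be rejecting the
O_DIRECT open of the file. A rough way to check whether the brick filesystem
itself accepts O_DIRECT, independent of gluster, would be something like this
on one of the brick hosts (the test file name is just an example):

# dd if=/dev/zero of=/gluster2/brick1/1/odirect-test bs=1M count=10 oflag=direct
# rm -f /gluster2/brick1/1/odirect-test

If that dd fails with the same 'Invalid argument', the EINVAL is presumably
coming from the backing filesystem rather than from gluster.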
>>>
>> The previous install I had the issue with is still on gluster 3.7.11.
>>
>> My test install of ovirt 3.6.7 and gluster 3.7.13, with 3 bricks on a
>> local disk, right now isn't allowing me to add the gluster storage at all.
>>
>> I keep getting some type of UI error:
>>
>> 2016-07-25 12:49:09,277 ERROR
>> [org.ovirt.engine.ui.frontend.server.gwt.OvirtRemoteLoggingService]
>> (default task-33) [] Permutation name: 430985F23DFC1C8BE1C7FDD91EDAA785
>> 2016-07-25 12:49:09,277 ERROR
>> [org.ovirt.engine.ui.frontend.server.gwt.OvirtRemoteLoggingService]
>> (default task-33) [] Uncaught exception: : java.lang.ClassCastException
>> at Unknown.ps(
>> https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@3837)
>> at Unknown.ts(
>> https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@20)
>> at Unknown.vs(
>> https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@18)
>> at Unknown.iJf(
>> https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@19)
>> at Unknown.Xab(
>> https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@48)
>> at Unknown.P8o(
>> https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@4447)
>> at Unknown.jQr(
>> https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@21)
>> at Unknown.A8o(
>> https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@51)
>> at Unknown.u8o(
>> https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@101)
>> at Unknown.Eap(
>> https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@10718)
>> at Unknown.p8n(
>> https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@161)
>> at Unknown.Cao(
>> https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@31)
>> at Unknown.Bap(
>> https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@10469)
>> at Unknown.kRn(
>> https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@49)
>> at Unknown.nRn(
>> https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@438)
>> at Unknown.eVn(
>> https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@40)
>> at Unknown.hVn(
>> https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@25827)
>> at Unknown.MTn(
>> https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@25)
>> at Unknown.PTn(
>> https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@24052)
>> at Unknown.KJe(
>> https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@21125)
>> at Unknown.Izk(
>> https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@10384)
>> at Unknown.P3(
>> https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@137)
>> at Unknown.g4(
>> https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@8271)
>> at Unknown.<anonymous>(
>> https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@65)
>> at Unknown._t(
>> https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@29)
>> at Unknown.du(
>> https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@57)
>> at Unknown.<anonymous>(
>> https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@54
>> )
>>
>>
>
> If I add it from the storage tab it creates the storage domain, but it won't
> attach to a data center.
>
> Error while executing action Attach Storage Domain: AcquireHostIdFailure
> From engine.log:
> 2016-07-25 13:04:45,186 ERROR
> [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStoragePoolVDSCommand]
> (default task-90) [4e0e7cbd] Failed in 'CreateStoragePoolVDS' method
> 2016-07-25 13:04:45,211 ERROR
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (default task-90) [4e0e7cbd] Correlation ID: null, Call Stack: null, Custom
> Event ID: -1, Message: VDSM local command failed: Cannot acquire host id:
> (u'5b8a4477-4d87-43a1-aa52-b664b1bd9e08', SanlockException(1, 'Sanlock
> lockspace add failure', 'Operation not permitted'))
> 2016-07-25 13:04:45,211 INFO
> [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStoragePoolVDSCommand]
> (default task-90) [4e0e7cbd] Command
> 'org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStoragePoolVDSCommand'
> return value 'StatusOnlyReturnForXmlRpc [status=StatusForXmlRpc [code=661,
> message=Cannot acquire host id: (u'5b8a4477-4d87-43a1-aa52-b664b1bd9e08',
> SanlockException(1, 'Sanlock lockspace add failure', 'Operation not
> permitted'))]]'
> 2016-07-25 13:04:45,211 INFO
> [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStoragePoolVDSCommand]
> (default task-90) [4e0e7cbd] HostName = local
> 2016-07-25 13:04:45,212 ERROR
> [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStoragePoolVDSCommand]
> (default task-90) [4e0e7cbd] Command 'CreateStoragePoolVDSCommand(HostName
> = local, CreateStoragePoolVDSCommandParameters:{runAsync='true',
> hostId='b4d03420-3de8-45b8-a671-45bbe7c05e06',
> storagePoolId='7fe4f6ec-71aa-485b-8bba-958e493b66eb',
> storagePoolName='NewDefault',
> masterDomainId='5b8a4477-4d87-43a1-aa52-b664b1bd9e08',
> domainsIdList='[5b8a4477-4d87-43a1-aa52-b664b1bd9e08]',
> masterVersion='4'})' execution failed: VDSGenericException:
> VDSErrorException: Failed to CreateStoragePoolVDS, error = Cannot acquire
> host id: (u'5b8a4477-4d87-43a1-aa52-b664b1bd9e08', SanlockException(1,
> 'Sanlock lockspace add failure', 'Operation not permitted')), code = 661
> 2016-07-25 13:04:45,212 INFO
> [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStoragePoolVDSCommand]
> (default task-90) [4e0e7cbd] FINISH, CreateStoragePoolVDSCommand, log id:
> 2ed8b2b6
> 2016-07-25 13:04:45,212 ERROR
> [org.ovirt.engine.core.bll.storage.AddStoragePoolWithStoragesCommand]
> (default task-90) [4e0e7cbd] Command
> 'org.ovirt.engine.core.bll.storage.AddStoragePoolWithStoragesCommand'
> failed: EngineException:
> org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException:
> VDSGenericException: VDSErrorException: Failed to CreateStoragePoolVDS,
> error = Cannot acquire host id: (u'5b8a4477-4d87-43a1-aa52-b664b1bd9e08',
> SanlockException(1, 'Sanlock lockspace add failure', 'Operation not
> permitted')), code = 661 (Failed with error AcquireHostIdFailure and code
> 661)
> 2016-07-25 13:04:45,220 ERROR
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (default task-90) [4e0e7cbd] Correlation ID: 4f77f0e0, Job ID:
> 6aae65f2-ff61-4bec-a513-18b31828442b, Call Stack: null, Custom Event ID:
> -1, Message: Failed to attach Storage Domains to Data Center NewDefault.
> (User: admin at internal)
> 2016-07-25 13:04:45,228 INFO
> [org.ovirt.engine.core.bll.storage.AddStoragePoolWithStoragesCommand]
> (default task-90) [4e0e7cbd] Lock freed to object
> 'EngineLock:{exclusiveLocks='[5b8a4477-4d87-43a1-aa52-b664b1bd9e08=<STORAGE,
> ACTION_TYPE_FAILED_OBJECT_LOCKED>]', sharedLocks='null'}'
> 2016-07-25 13:04:45,229 INFO
> [org.ovirt.engine.core.bll.storage.AttachStorageDomainToPoolCommand]
> (default task-90) [4e0e7cbd] Command
> [id=d08f24d6-f0f9-4df8-aa34-3718ab44f454]: Compensating
> DELETED_OR_UPDATED_ENTITY of
> org.ovirt.engine.core.common.businessentities.StoragePool; snapshot:
> id=7fe4f6ec-71aa-485b-8bba-958e493b66eb.
> 2016-07-25 13:04:45,231 INFO
> [org.ovirt.engine.core.bll.storage.AttachStorageDomainToPoolCommand]
> (default task-90) [4e0e7cbd] Command
> [id=d08f24d6-f0f9-4df8-aa34-3718ab44f454]: Compensating NEW_ENTITY_ID of
> org.ovirt.engine.core.common.businessentities.StoragePoolIsoMap; snapshot:
> StoragePoolIsoMapId:{storagePoolId='7fe4f6ec-71aa-485b-8bba-958e493b66eb',
> storageId='5b8a4477-4d87-43a1-aa52-b664b1bd9e08'}.
> 2016-07-25 13:04:45,231 INFO
> [org.ovirt.engine.core.bll.storage.AttachStorageDomainToPoolCommand]
> (default task-90) [4e0e7cbd] Command
> [id=d08f24d6-f0f9-4df8-aa34-3718ab44f454]: Compensating
> DELETED_OR_UPDATED_ENTITY of
> org.ovirt.engine.core.common.businessentities.StorageDomainStatic;
> snapshot: id=5b8a4477-4d87-43a1-aa52-b664b1bd9e08.
> 2016-07-25 13:04:45,245 ERROR
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (default task-90) [4e0e7cbd] Correlation ID: 6cae9150, Job ID:
> 6aae65f2-ff61-4bec-a513-18b31828442b, Call Stack: null, Custom Event ID:
> -1, Message: Failed to attach Storage Domain newone to Data Center
> NewDefault. (User: admin at internal)
> 2016-07-25 13:04:45,253 WARN
> [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (default task-90)
> [4e0e7cbd] Trying to release exclusive lock which does not exist, lock key:
> '5b8a4477-4d87-43a1-aa52-b664b1bd9e08STORAGE'
> 2016-07-25 13:04:45,253 INFO
> [org.ovirt.engine.core.bll.storage.AttachStorageDomainToPoolCommand]
> (default task-90) [4e0e7cbd] Lock freed to object
> 'EngineLock:{exclusiveLocks='[5b8a4477-4d87-43a1-aa52-b664b1bd9e08=<STORAGE,
> ACTION_TYPE_FAILED_OBJECT_LOCKED>]', sharedLocks='null'}'
>
>
>
>
>> -Krutika
>>>
>>> On Mon, Jul 25, 2016 at 4:57 PM, Samuli Heinonen <samppah at neutraali.net>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> > On 25 Jul 2016, at 12:34, David Gossage <dgossage at carouselchecks.com>
>>>> wrote:
>>>> >
>>>> > On Mon, Jul 25, 2016 at 1:01 AM, Krutika Dhananjay <
>>>> kdhananj at redhat.com> wrote:
>>>> > Hi,
>>>> >
>>>> > Thanks for the logs. So I have identified one issue from the logs for
>>>> which the fix is this: http://review.gluster.org/#/c/14669/. Because
>>>> of a bug in the code, ENOENT was getting converted to EPERM and being
>>>> propagated up the stack causing the reads to bail out early with 'Operation
>>>> not permitted' errors.
>>>> > I still need to find out two things:
>>>> > i) why there was a readv() sent on a non-existent (ENOENT) file (this
>>>> is important since some of the other users have not faced or reported this
>>>> issue on gluster-users with 3.7.13)
>>>> > ii) need to see if there's a way to work around this issue.
>>>> >
>>>> > Do you mind sharing the steps needed to be executed to run into this
>>>> issue? This is so that we can apply our patches, test and ensure they fix
>>>> the problem.
>>>>
>>>>
>>>> Unfortunately I can't test this right away, nor give exact steps for how to
>>>> test it. This is just a theory, but please correct me if you see any
>>>> mistakes.
>>>>
>>>> oVirt uses cache=none settings for VMs by default, which requires
>>>> direct I/O. oVirt also uses dd with iflag=direct to check that the storage has
>>>> direct I/O enabled. Problems exist with GlusterFS with sharding enabled and
>>>> bricks running on ZFS on Linux. Everything seems to be fine with GlusterFS
>>>> 3.7.11 and problems exist at least with versions .12 and .13. There have been
>>>> some posts saying that GlusterFS 3.8.x is also affected.
>>>>
>>>> Steps to reproduce:
>>>> 1. Sharded file is created with GlusterFS 3.7.11. Everything works ok.
>>>> 2. GlusterFS is upgraded to 3.7.12+
>>>> 3. The sharded file cannot be read or written with direct I/O enabled. (I.e.
>>>> the command oVirt uses to check the storage connection is "dd
>>>> if=/rhev/data-center/00000001-0001-0001-0001-0000000002b6/mastersd/dom_md/inbox
>>>> iflag=direct,fullblock count=1 bs=1024000")
>>>>
>>>> Please let me know if you need more information.
>>>>
>>>> -samuli
>>>>
>>>> > Well, after the upgrade of gluster all I did was start the ovirt hosts up,
>>>> which launched and started their ha-agent and broker processes. I don't
>>>> believe I started getting any errors till it mounted GLUSTER1. I had
>>>> enabled sharding but had no sharded disk images yet. Not sure if the check
>>>> for shards would have caused that. Unfortunately I can't just update this
>>>> cluster and try to see what caused it, as it has some VMs users expect to
>>>> be available in a few hours.
>>>> >
>>>> > I can see if I can get my test setup to recreate it. I think I'll
>>>> need to de-activate the data center so I can detach the storage that's on xfs
>>>> and attach the one that's on zfs with sharding enabled. My test is 3
>>>> bricks on the same local machine, with 3 different volumes, but I think I'm
>>>> running into a sanlock issue or something, as it won't mount more than one
>>>> volume that was created locally.
>>>> >
>>>> >
>>>> > -Krutika
>>>> >
>>>> > On Fri, Jul 22, 2016 at 7:17 PM, David Gossage <
>>>> dgossage at carouselchecks.com> wrote:
>>>> > Trimmed the logs down to about when I was shutting down the ovirt
>>>> servers for updates, which was 14:30 UTC 2016-07-09.
>>>> >
>>>> > Pre-update settings were
>>>> >
>>>> > Volume Name: GLUSTER1
>>>> > Type: Replicate
>>>> > Volume ID: 167b8e57-28c3-447a-95cc-8410cbdf3f7f
>>>> > Status: Started
>>>> > Number of Bricks: 1 x 3 = 3
>>>> > Transport-type: tcp
>>>> > Bricks:
>>>> > Brick1: ccgl1.gl.local:/gluster1/BRICK1/1
>>>> > Brick2: ccgl2.gl.local:/gluster1/BRICK1/1
>>>> > Brick3: ccgl3.gl.local:/gluster1/BRICK1/1
>>>> > Options Reconfigured:
>>>> > performance.readdir-ahead: on
>>>> > storage.owner-uid: 36
>>>> > storage.owner-gid: 36
>>>> > performance.quick-read: off
>>>> > performance.read-ahead: off
>>>> > performance.io-cache: off
>>>> > performance.stat-prefetch: off
>>>> > cluster.eager-lock: enable
>>>> > network.remote-dio: enable
>>>> > cluster.quorum-type: auto
>>>> > cluster.server-quorum-type: server
>>>> > server.allow-insecure: on
>>>> > cluster.self-heal-window-size: 1024
>>>> > cluster.background-self-heal-count: 16
>>>> > performance.strict-write-ordering: off
>>>> > nfs.disable: on
>>>> > nfs.addr-namelookup: off
>>>> > nfs.enable-ino32: off
>>>> >
>>>> > At the time of the updates ccgl3 was offline due to a bad NIC on the server,
>>>> but it had been so for about a week with no issues in the volume.
>>>> >
>>>> > Shortly after the update I added these settings to enable sharding, but
>>>> did not yet have any sharded VM images:
>>>> > features.shard-block-size: 64MB
>>>> > features.shard: on
>>>> >
>>>> >
>>>> >
>>>> >
>>>> > David Gossage
>>>> > Carousel Checks Inc. | System Administrator
>>>> > Office 708.613.2284
>>>> >
>>>> > On Fri, Jul 22, 2016 at 5:00 AM, Krutika Dhananjay <
>>>> kdhananj at redhat.com> wrote:
>>>> > Hi David,
>>>> >
>>>> > Could you also share the brick logs from the affected volume? They're
>>>> located at
>>>> /var/log/glusterfs/bricks/<hyphenated-path-to-the-brick-directory>.log.
>>>> >
>>>> > Also, could you share the volume configuration (output of `gluster
>>>> volume info <VOL>`) for the affected volume(s), as it was at the time you actually
>>>> saw this issue?
>>>> >
>>>> > -Krutika
>>>> >
>>>> >
>>>> >
>>>> >
>>>> > On Thu, Jul 21, 2016 at 11:23 PM, David Gossage <
>>>> dgossage at carouselchecks.com> wrote:
>>>> > On Thu, Jul 21, 2016 at 11:47 AM, Scott <romracer at gmail.com> wrote:
>>>> > Hi David,
>>>> >
>>>> > My backend storage is ZFS.
>>>> >
>>>> > I thought about moving from FUSE to NFS mounts for my Gluster volumes
>>>> to help test. But since I use hosted engine this would be a real pain.
>>>> It's difficult to modify the storage domain type/path in the
>>>> hosted-engine.conf. And I don't want to go through the process of
>>>> re-deploying hosted engine.
>>>> >
>>>> >
>>>> > I found this
>>>> >
>>>> > https://bugzilla.redhat.com/show_bug.cgi?id=1347553
>>>> >
>>>> > Not sure if related.
>>>> >
>>>> > But I also have a zfs backend. Another user on the gluster mailing list had
>>>> issues and also used a zfs backend, although she used proxmox and got it working by
>>>> changing the disk to writeback cache, I think it was.
>>>> >
>>>> > I also use hosted engine, but I actually run my gluster volume for HE
>>>> on an LVM separate from zfs, on xfs, and if I recall it did not have the
>>>> issues my gluster on zfs did. I'm wondering now if the issue was zfs
>>>> settings.
>>>> >
>>>> > Hopefully I should have a test machine up soon that I can play around with
>>>> more.
>>>> >
>>>> > Scott
>>>> >
>>>> > On Thu, Jul 21, 2016 at 11:36 AM David Gossage <
>>>> dgossage at carouselchecks.com> wrote:
>>>> > What back end storage do you run gluster on? xfs/zfs/ext4 etc?
>>>> >
>>>> > David Gossage
>>>> > Carousel Checks Inc. | System Administrator
>>>> > Office 708.613.2284
>>>> >
>>>> > On Thu, Jul 21, 2016 at 8:18 AM, Scott <romracer at gmail.com> wrote:
>>>> > I get similar problems with oVirt 4.0.1 and hosted engine. After
>>>> upgrading all my hosts to Gluster 3.7.13 (client and server), I get the
>>>> following:
>>>> >
>>>> > $ sudo hosted-engine --set-maintenance --mode=none
>>>> > Traceback (most recent call last):
>>>> > File "/usr/lib64/python2.7/runpy.py", line 162, in
>>>> _run_module_as_main
>>>> > "__main__", fname, loader, pkg_name)
>>>> > File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
>>>> > exec code in run_globals
>>>> > File
>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/set_maintenance.py",
>>>> line 73, in <module>
>>>> > if not maintenance.set_mode(sys.argv[1]):
>>>> > File
>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/set_maintenance.py",
>>>> line 61, in set_mode
>>>> > value=m_global,
>>>> > File
>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py",
>>>> line 259, in set_maintenance_mode
>>>> > str(value))
>>>> > File
>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py",
>>>> line 204, in set_global_md_flag
>>>> > all_stats = broker.get_stats_from_storage(service)
>>>> > File
>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py",
>>>> line 232, in get_stats_from_storage
>>>> > result = self._checked_communicate(request)
>>>> > File
>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py",
>>>> line 260, in _checked_communicate
>>>> > .format(message or response))
>>>> > ovirt_hosted_engine_ha.lib.exceptions.RequestError: Request failed:
>>>> failed to read metadata: [Errno 1] Operation not permitted
>>>> >
>>>> > If I only upgrade one host, then things will continue to work but my
>>>> nodes are constantly healing shards. My logs are also flooded with:
>>>> >
>>>> > [2016-07-21 13:15:14.137734] W [fuse-bridge.c:2227:fuse_readv_cbk]
>>>> 0-glusterfs-fuse: 274714: READ => -1 gfid=4
>>>> > 41f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0041d0 (Operation not
>>>> permitted)
>>>> > The message "W [MSGID: 114031]
>>>> [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-0: remote
>>>> operation failed [Operation not permitted]" repeated 6 times between
>>>> [2016-07-21 13:13:24.134985] and [2016-07-21 13:15:04.132226]
>>>> > The message "W [MSGID: 114031]
>>>> [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-1: remote
>>>> operation failed [Operation not permitted]" repeated 8 times between
>>>> [2016-07-21 13:13:34.133116] and [2016-07-21 13:15:14.137178]
>>>> > The message "W [MSGID: 114031]
>>>> [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-2: remote
>>>> operation failed [Operation not permitted]" repeated 7 times between
>>>> [2016-07-21 13:13:24.135071] and [2016-07-21 13:15:14.137666]
>>>> > [2016-07-21 13:15:24.134647] W [MSGID: 114031]
>>>> [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-0: remote
>>>> operation failed [Operation not permitted]
>>>> > [2016-07-21 13:15:24.134764] W [MSGID: 114031]
>>>> [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-2: remote
>>>> operation failed [Operation not permitted]
>>>> > [2016-07-21 13:15:24.134793] W [fuse-bridge.c:2227:fuse_readv_cbk]
>>>> 0-glusterfs-fuse: 274741: READ => -1
>>>> gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0038f4 (Operation not
>>>> permitted)
>>>> > [2016-07-21 13:15:34.135413] W [fuse-bridge.c:2227:fuse_readv_cbk]
>>>> 0-glusterfs-fuse: 274756: READ => -1
>>>> gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0041d0 (Operation not
>>>> permitted)
>>>> > [2016-07-21 13:15:44.141062] W [fuse-bridge.c:2227:fuse_readv_cbk]
>>>> 0-glusterfs-fuse: 274818: READ => -1
>>>> gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0038f4 (Operation not
>>>> permitted)
>>>> > [2016-07-21 13:15:54.133582] W [MSGID: 114031]
>>>> [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-1: remote
>>>> operation failed [Operation not permitted]
>>>> > [2016-07-21 13:15:54.133629] W [fuse-bridge.c:2227:fuse_readv_cbk]
>>>> 0-glusterfs-fuse: 274853: READ => -1
>>>> gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0036d8 (Operation not
>>>> permitted)
>>>> > [2016-07-21 13:16:04.133666] W [fuse-bridge.c:2227:fuse_readv_cbk]
>>>> 0-glusterfs-fuse: 274879: READ => -1
>>>> gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0041d0 (Operation not
>>>> permitted)
>>>> > [2016-07-21 13:16:14.134954] W [fuse-bridge.c:2227:fuse_readv_cbk]
>>>> 0-glusterfs-fuse: 274894: READ => -1
>>>> gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0036d8 (Operation not
>>>> permitted)
>>>> >
>>>> > Scott
>>>> >
>>>> >
>>>> > On Thu, Jul 21, 2016 at 6:57 AM Frank Rothenstein <
>>>> f.rothenstein at bodden-kliniken.de> wrote:
>>>> > Hey David,
>>>> >
>>>> > I have the very same problem on my test cluster, despite running
>>>> ovirt 4.0.
>>>> > If you access your volumes via NFS all is fine; the problem is FUSE. I
>>>> stayed on 3.7.13, but have no solution yet, so now I use NFS.
>>>> >
>>>> > Frank
>>>> >
>>>> > Am Donnerstag, den 21.07.2016, 04:28 -0500 schrieb David Gossage:
>>>> >> Is anyone running one of the recent 3.6.x lines with gluster 3.7.13?
>>>> I am looking to upgrade gluster from 3.7.11 to 3.7.13 for some bug fixes, but
>>>> have been told by users on the gluster mailing list that, due to some gluster
>>>> changes, I'd need to change the disk parameters to use writeback cache. Something
>>>> to do with aio support being removed.
>>>> >>
>>>> >> I believe this could be done with custom parameters? But I believe the
>>>> storage tests are done using dd, so would they fail with the current settings
>>>> then? On my last upgrade to 3.7.13 I had to roll back to 3.7.11 due to stability
>>>> issues where the gluster storage would go into a down state and always show N/A as
>>>> space available/used, even though the hosts still saw the storage and VMs were
>>>> running on it on all 3 hosts.
>>>> >>
>>>> >> I saw a lot of messages like these, which went away once the gluster
>>>> rollback finished:
>>>> >>
>>>> >> [2016-07-09 15:27:46.935694] I [fuse-bridge.c:4083:fuse_init]
>>>> 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.22 kernel
>>>> 7.22
>>>> >> [2016-07-09 15:27:49.555466] W [MSGID: 114031]
>>>> [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-GLUSTER1-client-1: remote
>>>> operation failed [Operation not permitted]
>>>> >> [2016-07-09 15:27:49.556574] W [MSGID: 114031]
>>>> [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-GLUSTER1-client-0: remote
>>>> operation failed [Operation not permitted]
>>>> >> [2016-07-09 15:27:49.556659] W [fuse-bridge.c:2227:fuse_readv_cbk]
>>>> 0-glusterfs-fuse: 80: READ => -1 gfid=deb61291-5176-4b81-8315-3f1cf8e3534d
>>>> fd=0x7f5224002f68 (Operation not permitted)
>>>> >> [2016-07-09 15:27:59.612477] W [MSGID: 114031]
>>>> [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-GLUSTER1-client-1: remote
>>>> operation failed [Operation not permitted]
>>>> >> [2016-07-09 15:27:59.613700] W [MSGID: 114031]
>>>> [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-GLUSTER1-client-0: remote
>>>> operation failed [Operation not permitted]
>>>> >> [2016-07-09 15:27:59.613781] W [fuse-bridge.c:2227:fuse_readv_cbk]
>>>> 0-glusterfs-fuse: 168: READ => -1 gfid=deb61291-5176-4b81-8315-3f1cf8e3534d
>>>> fd=0x7f5224002f68 (Operation not permitted)
>>>> >>
>>>> >> David Gossage
>>>> >> Carousel Checks Inc. | System Administrator
>>>> >> Office 708.613.2284
>>>> >> _______________________________________________
>>>> >> Users mailing list
>>>> >>
>>>> >> Users at ovirt.org
>>>> >> http://lists.ovirt.org/mailman/listinfo/users
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> ______________________________________________________________________________
>>>> > BODDEN-KLINIKEN Ribnitz-Damgarten GmbH
>>>> > Sandhufe 2
>>>> > 18311 Ribnitz-Damgarten
>>>> >
>>>> > Phone: 03821-700-0
>>>> > Fax: 03821-700-240
>>>> >
>>>> > E-Mail: info at bodden-kliniken.de Internet:
>>>> http://www.bodden-kliniken.de
>>>> >
>>>> > Registered office: Ribnitz-Damgarten, Register court: Amtsgericht Stralsund,
>>>> HRB 2919, Tax no.: 079/133/40188
>>>> > Chairwoman of the supervisory board: Carmen Schröter, Managing director: Dr. Falko
>>>> Milski
>>>> >
>>>> > The content of this e-mail is intended exclusively for the named addressee.
>>>> If you are not the intended addressee of this e-mail or their representative,
>>>> > please note that any form of publication, reproduction or forwarding of the
>>>> content of this e-mail is not permitted. We ask you to inform the
>>>> > sender immediately and to delete the e-mail.
>>>> >
>>>> >
>>>> > Bodden-Kliniken Ribnitz-Damgarten GmbH 2016
>>>> > *** Virus-free thanks to Kerio Mail Server and Sophos Antivirus ***
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>>
>>>>
>>>
>>
>