On 07/19/2014 08:58 AM, Pranith Kumar Karampuri wrote:
On 07/19/2014 11:25 AM, Andrew Lau wrote:
>
>
> On Sat, Jul 19, 2014 at 12:03 AM, Pranith Kumar Karampuri
> <pkarampu(a)redhat.com <mailto:pkarampu@redhat.com>> wrote:
>
>
> On 07/18/2014 05:43 PM, Andrew Lau wrote:
>>
>>
>> On Fri, Jul 18, 2014 at 10:06 PM, Vijay Bellur
>> <vbellur(a)redhat.com <mailto:vbellur@redhat.com>> wrote:
>>
>> [Adding gluster-devel]
>>
>>
>> On 07/18/2014 05:20 PM, Andrew Lau wrote:
>>
>> Hi all,
>>
>> As most of you have got hints from previous messages,
>> hosted engine
>> won't work on gluster. A quote from BZ1097639:
>>
>> "Using hosted engine with Gluster backed storage is
>> currently something
>> we really warn against.
>>
>>
>> I think this bug should be closed or re-targeted at
>> documentation, because there is nothing we can do here.
>> Hosted engine assumes that all writes are atomic and
>> (immediately) available for all hosts in the cluster.
>> Gluster violates those assumptions.
>> "
>>
>> I tried going through BZ1097639 but could not find much
>> detail with respect to gluster there.
>>
>> A few questions around the problem:
>>
>> 1. Can somebody please explain in detail the scenario that
>> causes the problem?
>>
>> 2. Is hosted engine performing synchronous writes to ensure
>> that writes are durable?
>>
>> Also, if there is any documentation that details the hosted
>> engine architecture that would help in enhancing our
>> understanding of its interactions with gluster.
>>
>>
>>
>>
>> Now my question: does this theory rule out a scenario in
>> which a gluster replicated volume is mounted as a glusterfs
>> filesystem and then re-exported as a native kernel NFS
>> share for the hosted-engine to consume? It would then be
>> possible to add ctdb to provide a last-resort failover
>> solution. I have tried this myself and suggested it to two
>> people who are running a similar setup. They are now using
>> the native kernel NFS server for hosted-engine and haven't
>> reported as many issues. Could anyone validate my theory
>> on this?
>>
>>
>> If we obtain more details on the use case and obtain gluster
>> logs from the failed scenarios, we should be able to
>> understand the problem better. That could be the first step
>> in validating your theory or evolving further recommendations :).
>>
>>
>> I'm not sure how useful this is, but Jiri Moskovcak tracked
>> this down in an off list message.
>>
>> Message Quote:
>>
>> ==
>>
>> We were able to track it down to this (thanks Andrew for
>> providing the testing setup):
>>
>> -b686-4363-bb7e-dba99e5789b6/ha_agent service_type=hosted-engine'
>> Traceback (most recent call last):
>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 165, in handle
>>     response = "success " + self._dispatch(data)
>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 261, in _dispatch
>>     .get_all_stats_for_service_type(**options)
>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 41, in get_all_stats_for_service_type
>>     d = self.get_raw_stats_for_service_type(storage_dir, service_type)
>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 74, in get_raw_stats_for_service_type
>>     f = os.open(path, direct_flag | os.O_RDONLY)
>> OSError: [Errno 116] Stale file handle: '/rhev/data-center/mnt/localhost:_mnt_hosted-engine/c898fd2a-b686-4363-bb7e-dba99e5789b6/ha_agent/hosted-engine.metadata'
> Andrew/Jiri,
> Would it be possible to post gluster logs of both the
> mount and bricks on the bz? I can take a look at it once. If I
> gather nothing then probably I will ask for your help in
> re-creating the issue.
>
> Pranith
>
>
> Unfortunately, I don't have the logs for that setup any more. I'll
> try to replicate it when I get a chance. If I understand the comment
> from the BZ, I don't think it's a gluster bug per se, more just how
> gluster does its replication.
hi Andrew,
Thanks for that. I couldn't come to any conclusions because no
logs were available. It is unlikely that self-heal is involved because
there were no bricks going down/up according to the bug description.
Hi,
I've never had such a setup. I guessed the problem was with gluster
based on "OSError: [Errno 116] Stale file handle:", which happens when
a file opened by an application on the client gets removed on the
server. I'm pretty sure we (hosted-engine) don't remove that file, so
I think it's some gluster magic moving the data around...
--Jirka
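
For what it's worth, one way a reader of that metadata file could
tolerate a transient ESTALE is to retry the open, so the client gets a
chance to look the path up again. A minimal sketch, assuming Python 3;
open_with_estale_retry and its retries/delay/opener parameters are
hypothetical names for illustration and are not part of
ovirt_hosted_engine_ha. Retrying only masks the symptom, it does not
explain why the handle went stale.

```python
import errno
import os
import time


def open_with_estale_retry(path, flags=os.O_RDONLY, retries=3, delay=0.1,
                           opener=os.open):
    """Open path, retrying when the call fails with ESTALE.

    Hypothetical helper for illustration only. `opener` defaults to
    os.open and can be swapped out for testing.
    """
    for attempt in range(retries + 1):
        try:
            return opener(path, flags)
        except OSError as exc:
            # Re-raise anything that is not ESTALE, and ESTALE itself
            # once the retry budget is used up.
            if exc.errno != errno.ESTALE or attempt == retries:
                raise
            time.sleep(delay)
```

The caller still has to os.close() the returned descriptor, and any
extra flags (such as the direct-I/O flag seen in the traceback) would
be OR-ed into `flags` by the caller.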
Pranith
>
>
>>
>> It's definitely connected to the storage, which leads us to
>> gluster. I'm not very familiar with gluster, so I need to
>> check this with our gluster gurus.
>>
>> ==
>>
>> Thanks,
>> Vijay
>>
>>
>>
>>
>
>