On Tue, Jul 25, 2017 at 11:12 AM, Kasturi Narra <knarra(a)redhat.com> wrote:
These errors appear because glusternw is not assigned to the correct
interface. Once you attach it, these errors should go away. This has
nothing to do with the problem you are seeing.
Sahina, any idea about the engine not showing the correct volume info?
Please provide the vdsm.log (containing the gluster volume info) and the
engine.log.
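
For reference, attaching the gluster role to a logical network can also be
done through the REST API; here is a rough, untested sketch (NETWORK_ID is a
placeholder for the id of the logical network on the gluster-dedicated
interface; the cluster id below is the one appearing in your logs):

    # Hypothetical sketch: mark an existing cluster network for gluster traffic.
    curl -k -u 'admin@internal:PASSWORD' -X PUT \
        -H 'Content-Type: application/xml' \
        'https://ENGINE_FQDN/ovirt-engine/api/clusters/00000002-0002-0002-0002-00000000017a/networks/NETWORK_ID' \
        -d '<network><usages><usage>gluster</usage></usages></network>'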
On Mon, Jul 24, 2017 at 7:30 PM, yayo (j) <jaganz(a)gmail.com>
wrote:
> Hi,
>
> UI refreshed but the problem still remains ...
>
> No specific error; I see only the errors below, but I've read that this
> kind of error is harmless:
>
>
> 2017-07-24 15:53:59,823+02 INFO [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] (DefaultQuartzScheduler2) [b7590c4] START, GlusterServersListVDSCommand(HostName = node01.localdomain.local, VdsIdVDSCommandParametersBase:{runAsync='true', hostId='4c89baa5-e8f7-4132-a4b3-af332247570c'}), log id: 29a62417
> 2017-07-24 15:54:01,066+02 INFO [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] (DefaultQuartzScheduler2) [b7590c4] FINISH, GlusterServersListVDSCommand, return: [10.10.20.80/24:CONNECTED, node02.localdomain.local:CONNECTED, gdnode04:CONNECTED], log id: 29a62417
> 2017-07-24 15:54:01,076+02 INFO [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand] (DefaultQuartzScheduler2) [b7590c4] START, GlusterVolumesListVDSCommand(HostName = node01.localdomain.local, GlusterVolumesListVDSParameters:{runAsync='true', hostId='4c89baa5-e8f7-4132-a4b3-af332247570c'}), log id: 7fce25d3
> 2017-07-24 15:54:02,209+02 WARN [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturn] (DefaultQuartzScheduler2) [b7590c4] Could not associate brick 'gdnode01:/gluster/engine/brick' of volume 'd19c19e3-910d-437b-8ba7-4f2a23d17515' with correct network as no gluster network found in cluster '00000002-0002-0002-0002-00000000017a'
> 2017-07-24 15:54:02,212+02 WARN [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturn] (DefaultQuartzScheduler2) [b7590c4] Could not associate brick 'gdnode02:/gluster/engine/brick' of volume 'd19c19e3-910d-437b-8ba7-4f2a23d17515' with correct network as no gluster network found in cluster '00000002-0002-0002-0002-00000000017a'
> 2017-07-24 15:54:02,215+02 WARN [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturn] (DefaultQuartzScheduler2) [b7590c4] Could not associate brick 'gdnode04:/gluster/engine/brick' of volume 'd19c19e3-910d-437b-8ba7-4f2a23d17515' with correct network as no gluster network found in cluster '00000002-0002-0002-0002-00000000017a'
> 2017-07-24 15:54:02,218+02 WARN [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturn] (DefaultQuartzScheduler2) [b7590c4] Could not associate brick 'gdnode01:/gluster/data/brick' of volume 'c7a5dfc9-3e72-4ea1-843e-c8275d4a7c2d' with correct network as no gluster network found in cluster '00000002-0002-0002-0002-00000000017a'
> 2017-07-24 15:54:02,221+02 WARN [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturn] (DefaultQuartzScheduler2) [b7590c4] Could not associate brick 'gdnode02:/gluster/data/brick' of volume 'c7a5dfc9-3e72-4ea1-843e-c8275d4a7c2d' with correct network as no gluster network found in cluster '00000002-0002-0002-0002-00000000017a'
> 2017-07-24 15:54:02,224+02 WARN [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturn] (DefaultQuartzScheduler2) [b7590c4] Could not associate brick 'gdnode04:/gluster/data/brick' of volume 'c7a5dfc9-3e72-4ea1-843e-c8275d4a7c2d' with correct network as no gluster network found in cluster '00000002-0002-0002-0002-00000000017a'
> 2017-07-24 15:54:02,224+02 INFO [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand] (DefaultQuartzScheduler2) [b7590c4] FINISH, GlusterVolumesListVDSCommand, return: {d19c19e3-910d-437b-8ba7-4f2a23d17515=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@fdc91062, c7a5dfc9-3e72-4ea1-843e-c8275d4a7c2d=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@999a6f23}, log id: 7fce25d3
>
>
> Thank you
>
>
> 2017-07-24 8:12 GMT+02:00 Kasturi Narra <knarra(a)redhat.com>:
>
>> Hi,
>>
>> Regarding the UI showing incorrect information about the engine and data
>> volumes, can you please refresh the UI and see if the issue persists, and
>> check for any errors in the engine.log file?
>>
>> Thanks
>> kasturi
>>
>> On Sat, Jul 22, 2017 at 11:43 AM, Ravishankar N <ravishankar(a)redhat.com>
>> wrote:
>>
>>>
>>> On 07/21/2017 11:41 PM, yayo (j) wrote:
>>>
>>> Hi,
>>>
>>> Sorry for following up again, but while checking the oVirt interface I've
>>> found that oVirt reports the "engine" volume as an "arbiter" configuration
>>> and the "data" volume as a fully replicated volume. Check these screenshots:
>>>
>>>
>>> This is probably some refresh bug in the UI, Sahina might be able to
>>> tell you.
>>>
>>>
>>>
>>> https://drive.google.com/drive/folders/0ByUV7xQtP1gCTE8tUTFfVmR5aDQ?usp=sharing
>>>
>>> But the "gluster volume info" command reports that both volumes are fully
>>> replicated:
>>>
>>>
>>> Volume Name: data
>>> Type: Replicate
>>> Volume ID: c7a5dfc9-3e72-4ea1-843e-c8275d4a7c2d
>>> Status: Started
>>> Snapshot Count: 0
>>> Number of Bricks: 1 x 3 = 3
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: gdnode01:/gluster/data/brick
>>> Brick2: gdnode02:/gluster/data/brick
>>> Brick3: gdnode04:/gluster/data/brick
>>> Options Reconfigured:
>>> nfs.disable: on
>>> performance.readdir-ahead: on
>>> transport.address-family: inet
>>> storage.owner-uid: 36
>>> performance.quick-read: off
>>> performance.read-ahead: off
>>> performance.io-cache: off
>>> performance.stat-prefetch: off
>>> performance.low-prio-threads: 32
>>> network.remote-dio: enable
>>> cluster.eager-lock: enable
>>> cluster.quorum-type: auto
>>> cluster.server-quorum-type: server
>>> cluster.data-self-heal-algorithm: full
>>> cluster.locking-scheme: granular
>>> cluster.shd-max-threads: 8
>>> cluster.shd-wait-qlength: 10000
>>> features.shard: on
>>> user.cifs: off
>>> storage.owner-gid: 36
>>> features.shard-block-size: 512MB
>>> network.ping-timeout: 30
>>> performance.strict-o-direct: on
>>> cluster.granular-entry-heal: on
>>> auth.allow: *
>>> server.allow-insecure: on
>>>
>>>
>>>
>>>
>>>
>>> Volume Name: engine
>>> Type: Replicate
>>> Volume ID: d19c19e3-910d-437b-8ba7-4f2a23d17515
>>> Status: Started
>>> Snapshot Count: 0
>>> Number of Bricks: 1 x 3 = 3
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: gdnode01:/gluster/engine/brick
>>> Brick2: gdnode02:/gluster/engine/brick
>>> Brick3: gdnode04:/gluster/engine/brick
>>> Options Reconfigured:
>>> nfs.disable: on
>>> performance.readdir-ahead: on
>>> transport.address-family: inet
>>> storage.owner-uid: 36
>>> performance.quick-read: off
>>> performance.read-ahead: off
>>> performance.io-cache: off
>>> performance.stat-prefetch: off
>>> performance.low-prio-threads: 32
>>> network.remote-dio: off
>>> cluster.eager-lock: enable
>>> cluster.quorum-type: auto
>>> cluster.server-quorum-type: server
>>> cluster.data-self-heal-algorithm: full
>>> cluster.locking-scheme: granular
>>> cluster.shd-max-threads: 8
>>> cluster.shd-wait-qlength: 10000
>>> features.shard: on
>>> user.cifs: off
>>> storage.owner-gid: 36
>>> features.shard-block-size: 512MB
>>> network.ping-timeout: 30
>>> performance.strict-o-direct: on
>>> cluster.granular-entry-heal: on
>>> auth.allow: *
>>> server.allow-insecure: on
>>>
>>>
>>> 2017-07-21 19:13 GMT+02:00 yayo (j) <jaganz(a)gmail.com>:
>>>
>>>> 2017-07-20 14:48 GMT+02:00 Ravishankar N <ravishankar(a)redhat.com>:
>>>>
>>>>>
>>>>> But it does say something. All these gfids of completed heals in the
>>>>> log below are for the ones that you have given the getfattr output of.
>>>>> So what is likely happening is that there is an intermittent connection
>>>>> problem between your mount and the brick process, leading to pending
>>>>> heals again after the heal gets completed, which is why the numbers
>>>>> vary each time. You would need to check why that is the case.
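>>>>> If it helps, one quick way to see whether heals keep reappearing (the
>>>>> volume name "engine" is taken from your logs; both commands are
>>>>> standard gluster CLI):
>>>>>
>>>>> # list the pending heal entries per brick, refreshed every minute
>>>>> watch -n 60 'gluster volume heal engine info | grep "Number of entries"'
>>>>> # cumulative per-crawl counts of healed and pending entries
>>>>> gluster volume heal engine statistics heal-count
>>>>>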
>>>>> Hope this helps,
>>>>> Ravi
>>>>>
>>>>>
>>>>>
>>>>> [2017-07-20 09:58:46.573079] I [MSGID: 108026] [afr-self-heal-common.c:1254:afr_log_selfheal] 0-engine-replicate-0: Completed data selfheal on e6dfd556-340b-4b76-b47b-7b6f5bd74327. sources=[0] 1 sinks=2
>>>>> [2017-07-20 09:59:22.995003] I [MSGID: 108026] [afr-self-heal-metadata.c:51:__afr_selfheal_metadata_do] 0-engine-replicate-0: performing metadata selfheal on f05b9742-2771-484a-85fc-5b6974bcef81
>>>>> [2017-07-20 09:59:22.999372] I [MSGID: 108026] [afr-self-heal-common.c:1254:afr_log_selfheal] 0-engine-replicate-0: Completed metadata selfheal on f05b9742-2771-484a-85fc-5b6974bcef81. sources=[0] 1 sinks=2
>>>>>
>>>>>
>>>>
>>>> Hi,
>>>>
>>>> following your suggestion, I've checked the "peer" status and found
>>>> that there are too many names for the hosts; I don't know if this can
>>>> be the problem or part of it:
>>>>
>>>> gluster peer status on NODE01:
>>>> Number of Peers: 2
>>>>
>>>> Hostname: dnode02.localdomain.local
>>>> Uuid: 7c0ebfa3-5676-4d3f-9bfa-7fff6afea0dd
>>>> State: Peer in Cluster (Connected)
>>>> Other names:
>>>> 192.168.10.52
>>>> dnode02.localdomain.local
>>>> 10.10.20.90
>>>> 10.10.10.20
>>>>
>>>>
>>>> gluster peer status on NODE02:
>>>> Number of Peers: 2
>>>>
>>>> Hostname: dnode01.localdomain.local
>>>> Uuid: a568bd60-b3e4-4432-a9bc-996c52eaaa12
>>>> State: Peer in Cluster (Connected)
>>>> Other names:
>>>> gdnode01
>>>> 10.10.10.10
>>>>
>>>> Hostname: gdnode04
>>>> Uuid: ce6e0f6b-12cf-4e40-8f01-d1609dfc5828
>>>> State: Peer in Cluster (Connected)
>>>> Other names:
>>>> 192.168.10.54
>>>> 10.10.10.40
>>>>
>>>>
>>>> gluster peer status on NODE04:
>>>> Number of Peers: 2
>>>>
>>>> Hostname: dnode02.neridom.dom
>>>> Uuid: 7c0ebfa3-5676-4d3f-9bfa-7fff6afea0dd
>>>> State: Peer in Cluster (Connected)
>>>> Other names:
>>>> 10.10.20.90
>>>> gdnode02
>>>> 192.168.10.52
>>>> 10.10.10.20
>>>>
>>>> Hostname: dnode01.localdomain.local
>>>> Uuid: a568bd60-b3e4-4432-a9bc-996c52eaaa12
>>>> State: Peer in Cluster (Connected)
>>>> Other names:
>>>> gdnode01
>>>> 10.10.10.10
>>>>
>>>>
>>>>
>>>> All these IPs are pingable and the hosts are resolvable across all 3
>>>> nodes, but only the 10.10.10.0 network is the dedicated network for
>>>> gluster (resolved using the gdnode* host names) ... Do you think that
>>>> removing the other entries can fix the problem? And, sorry, but how
>>>> can I remove the other entries?
>>>>
>>> I don't think having extra entries could be a problem. Did you check
>>> the fuse mount logs for disconnect messages that I referred to in the other
>>> email?
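>>> For example, something along these lines (the path below is the usual
>>> log location for oVirt gluster mounts and may differ on your hosts):
>>>
>>> grep -iE "disconnect|connection refused" \
>>>     /var/log/glusterfs/rhev-data-center-mnt-glusterSD-*.log | tail -n 50
>>>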
>>>
>>>
>>>> And what about SELinux?
>>>>
>>> Not sure about this. See if there are disconnect messages in the mount
>>> logs first.
>>> -Ravi
>>>
>>>
>>>> Thank you
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Linux User: 369739 http://counter.li.org
>>>
>>>
>>>
>>> _______________________________________________
>>> Users mailing list
>>> Users(a)ovirt.org
>>> http://lists.ovirt.org/mailman/listinfo/users
>>>
>>>
>>
>
>
> --
> Linux User: 369739 http://counter.li.org
>
_______________________________________________
Gluster-users mailing list
Gluster-users(a)gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users