[ovirt-users] Fwd: Having issues with Hosted Engine

Simone Tiraboschi stirabos at redhat.com
Fri Apr 29 07:25:37 UTC 2016


On Fri, Apr 29, 2016 at 4:44 AM, Luiz Claudio Prazeres Goncalves
<luizcpg at gmail.com> wrote:
> Hi Simone, I was reviewing the changelog of 3.6.6 at the link below, but I
> was not able to find the bug (https://bugzilla.redhat.com/1327516) listed as
> fixed. According to Bugzilla the target is really 3.6.6, so what's
> wrong?
>
>
> http://www.ovirt.org/release/3.6.6/

That page says 'oVirt 3.6.6 first release candidate', so it's still not the GA release.

> Thanks
> Luiz
>
> On Thu, Apr 28, 2016 at 11:33, Luiz Claudio Prazeres Goncalves
> <luizcpg at gmail.com> wrote:
>>
>> Nice!... so, I'll survive a bit more with these issues until the version
>> 3.6.6 gets released...
>>
>>
>> Thanks
>> -Luiz
>>
>> 2016-04-28 4:50 GMT-03:00 Simone Tiraboschi <stirabos at redhat.com>:
>>>
>>> On Thu, Apr 28, 2016 at 8:32 AM, Sahina Bose <sabose at redhat.com> wrote:
>>> > This seems like the issue reported in
>>> > https://bugzilla.redhat.com/show_bug.cgi?id=1327121
>>> >
>>> > Nir, Simone?
>>>
>>> The issue is here:
>>> MainThread::INFO::2016-04-27 03:26:27,185::storage_server::229::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(disconnect_storage_server) Disconnecting storage server
>>> MainThread::INFO::2016-04-27 03:26:27,816::upgrade::983::ovirt_hosted_engine_ha.lib.upgrade.StorageServer::(fix_storage_path) Fixing storage path in conf file
>>>
>>> And it's tracked here: https://bugzilla.redhat.com/1327516
>>>
>>> We already have a patch; it will be fixed in 3.6.6.
>>>
>>> As far as I can see, this issue only causes a lot of noise in the logs
>>> and some false alerts, but it's basically harmless.
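
To gauge how much of this noise a host is producing, the records can be counted per function. The following is only a minimal sketch: the record layout is inferred from the two lines quoted above (assuming each record sits on a single line in the log file itself, mail wrapping aside), and count_by_function is a hypothetical helper name, not part of ovirt-hosted-engine-ha.

```python
import re
from collections import Counter

# Assumed record layout, inferred from the quoted lines:
# thread::LEVEL::timestamp::module::lineno::logger::(function) message
RECORD = re.compile(
    r"^(?P<thread>[^:]+)::(?P<level>[A-Z]+)::"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})::"
    r"(?P<module>[^:]+)::(?P<lineno>\d+)::"
    r"(?P<logger>[^:]+)::\((?P<func>[^)]+)\)\s*(?P<msg>.*)$"
)


def count_by_function(lines):
    """Count log records per logging function; lines that do not parse are skipped."""
    counts = Counter()
    for line in lines:
        match = RECORD.match(line.strip())
        if match:
            counts[match.group("func")] += 1
    return counts
```

Passing an open agent log file to count_by_function and looking at the counts for fix_storage_path and disconnect_storage_server gives a rough idea of how often the loop fires. The log location varies by installation, so it is left to the caller.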
>>>
>>> > On 04/28/2016 05:35 AM, Luiz Claudio Prazeres Goncalves wrote:
>>> >
>>> >
>>> > Hi everyone,
>>> >
>>> > Until today my environment was fully updated (3.6.5 + CentOS 7.2) with
>>> > 3 nodes (hosts kvm1, kvm2 and kvm3). I also have 3 external gluster
>>> > nodes (hosts gluster-root1, gluster1 and gluster2), replica 3, on top
>>> > of which the engine storage domain sits (3.7.11 fully updated +
>>> > CentOS 7.2).
>>> >
>>> > For some weird reason I've been receiving emails from oVirt with
>>> > EngineUnexpectedDown (attached picture) more or less on a daily basis,
>>> > but the engine seems to be working fine and my VMs are up and running
>>> > normally. I've never had any issue accessing the user interface to
>>> > manage the VMs.
>>> >
>>> > Today I ran "yum update" on the nodes and realised that vdsm was
>>> > outdated, so I updated the kvm hosts and they are now fully updated
>>> > again.
>>> >
>>> >
>>> > Reviewing the logs, it seems to be an intermittent connectivity issue
>>> > when trying to access the gluster engine storage domain, as you can see
>>> > below. I don't have any network issue in place and I'm 100% sure about
>>> > it. I have another oVirt cluster using the same network and an engine
>>> > storage domain on top of an iSCSI storage array with no issues.
>>> >
>>> > Here seems to be the issue:
>>> >
>>> > Thread-1111::INFO::2016-04-27 23:01:27,864::fileSD::357::Storage.StorageDomain::(validate) sdUUID=03926733-1872-4f85-bb21-18dc320560db
>>> > Thread-1111::DEBUG::2016-04-27 23:01:27,865::persistentDict::234::Storage.PersistentDict::(refresh) read lines (FileMetadataRW)=[]
>>> > Thread-1111::DEBUG::2016-04-27 23:01:27,865::persistentDict::252::Storage.PersistentDict::(refresh) Empty metadata
>>> > Thread-1111::ERROR::2016-04-27 23:01:27,865::task::866::Storage.TaskManager.Task::(_setError) Task=`d2acf575-1a60-4fa0-a5bb-cd4363636b94`::Unexpected error
>>> > Traceback (most recent call last):
>>> >   File "/usr/share/vdsm/storage/task.py", line 873, in _run
>>> >     return fn(*args, **kargs)
>>> >   File "/usr/share/vdsm/logUtils.py", line 49, in wrapper
>>> >     res = f(*args, **kwargs)
>>> >   File "/usr/share/vdsm/storage/hsm.py", line 2835, in getStorageDomainInfo
>>> >     dom = self.validateSdUUID(sdUUID)
>>> >   File "/usr/share/vdsm/storage/hsm.py", line 278, in validateSdUUID
>>> >     sdDom.validate()
>>> >   File "/usr/share/vdsm/storage/fileSD.py", line 360, in validate
>>> >     raise se.StorageDomainAccessError(self.sdUUID)
>>> > StorageDomainAccessError: Domain is either partially accessible or entirely inaccessible: (u'03926733-1872-4f85-bb21-18dc320560db',)
>>> > Thread-1111::DEBUG::2016-04-27 23:01:27,865::task::885::Storage.TaskManager.Task::(_run) Task=`d2acf575-1a60-4fa0-a5bb-cd4363636b94`::Task._run: d2acf575-1a60-4fa0-a5bb-cd4363636b94 ('03926733-1872-4f85-bb21-18dc320560db',) {} failed - stopping task
>>> > Thread-1111::DEBUG::2016-04-27 23:01:27,865::task::1246::Storage.TaskManager.Task::(stop) Task=`d2acf575-1a60-4fa0-a5bb-cd4363636b94`::stopping in state preparing (force False)
>>> > Thread-1111::DEBUG::2016-04-27 23:01:27,865::task::993::Storage.TaskManager.Task::(_decref) Task=`d2acf575-1a60-4fa0-a5bb-cd4363636b94`::ref 1 aborting True
>>> > Thread-1111::INFO::2016-04-27 23:01:27,865::task::1171::Storage.TaskManager.Task::(prepare) Task=`d2acf575-1a60-4fa0-a5bb-cd4363636b94`::aborting: Task is aborted: 'Domain is either partially accessible or entirely inaccessible' - code 379
>>> > Thread-1111::DEBUG::2016-04-27 23:01:27,866::task::1176::Storage.TaskManager.Task::(prepare) Task=`d2acf575-1a60-4fa0-a5bb-cd4363636b94`::Prepare: aborted: Domain is either partially accessible or entirely inaccessible
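
To line these failures up against the EngineUnexpectedDown emails, the timestamps of each StorageDomainAccessError can be pulled out of the vdsm log. This is only a sketch under assumptions: it relies on the record header and traceback layout visible in the excerpt above (one record per line in the original file), and access_errors is a hypothetical helper, not a vdsm API.

```python
import re

# Assumed start of a vdsm record: thread::LEVEL::timestamp::...
HEADER = re.compile(
    r"^(?P<thread>[^:]+)::(?P<level>[A-Z]+)::"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})::"
)

# Final traceback line, carrying the storage domain UUID.
ACCESS_ERROR = re.compile(
    r"^StorageDomainAccessError:.*?"
    r"(?P<uuid>[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})"
)


def access_errors(lines):
    """Yield (timestamp, thread, sd_uuid) for each StorageDomainAccessError.

    The timestamp and thread come from the closest preceding record header,
    since traceback lines carry no header of their own.
    """
    last_header = None
    for line in lines:
        line = line.strip()
        header = HEADER.match(line)
        if header:
            last_header = (header.group("ts"), header.group("thread"))
            continue
        error = ACCESS_ERROR.match(line)
        if error and last_header:
            yield last_header[0], last_header[1], error.group("uuid")
```

Calling list(access_errors(open_log_file)) returns one tuple per failure, which can then be compared with the times the alert emails arrived.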
>>> >
>>> >
>>> > Question: Does anyone know what might be happening? I have several
>>> > gluster configs, as you can see below. All the storage domains are
>>> > using the same configs.
>>> >
>>> >
>>> > More information:
>>> >
>>> > I have the "engine" storage domain, "vmos1" storage domain and "master"
>>> > storage domain, so everything looks good.
>>> >
>>> > [root at kvm1 vdsm]# vdsClient -s 0 getStorageDomainsList
>>> > 03926733-1872-4f85-bb21-18dc320560db
>>> > 35021ff4-fb95-43d7-92a3-f538273a3c2e
>>> > e306e54e-ca98-468d-bb04-3e8900f8840c
>>> >
>>> >
>>> > Gluster config:
>>> >
>>> > [root at gluster-root1 ~]# gluster volume info
>>> >
>>> > Volume Name: engine
>>> > Type: Replicate
>>> > Volume ID: 64b413d2-c42e-40fd-b356-3e6975e941b0
>>> > Status: Started
>>> > Number of Bricks: 1 x 3 = 3
>>> > Transport-type: tcp
>>> > Bricks:
>>> > Brick1: gluster1.xyz.com:/gluster/engine/brick1
>>> > Brick2: gluster2.xyz.com:/gluster/engine/brick1
>>> > Brick3: gluster-root1.xyz.com:/gluster/engine/brick1
>>> > Options Reconfigured:
>>> > performance.cache-size: 1GB
>>> > performance.write-behind-window-size: 4MB
>>> > performance.write-behind: off
>>> > performance.quick-read: off
>>> > performance.read-ahead: off
>>> > performance.io-cache: off
>>> > performance.stat-prefetch: off
>>> > cluster.eager-lock: enable
>>> > cluster.quorum-type: auto
>>> > network.remote-dio: enable
>>> > cluster.server-quorum-type: server
>>> > cluster.data-self-heal-algorithm: full
>>> > performance.low-prio-threads: 32
>>> > features.shard-block-size: 512MB
>>> > features.shard: on
>>> > storage.owner-gid: 36
>>> > storage.owner-uid: 36
>>> > performance.readdir-ahead: on
>>> >
>>> >
>>> > Volume Name: master
>>> > Type: Replicate
>>> > Volume ID: 20164808-7bbe-4eeb-8770-d222c0e0b830
>>> > Status: Started
>>> > Number of Bricks: 1 x 3 = 3
>>> > Transport-type: tcp
>>> > Bricks:
>>> > Brick1: gluster1.xyz.com:/home/storage/master/brick1
>>> > Brick2: gluster2.xyz.com:/home/storage/master/brick1
>>> > Brick3: gluster-root1.xyz.com:/home/storage/master/brick1
>>> > Options Reconfigured:
>>> > performance.readdir-ahead: on
>>> > performance.quick-read: off
>>> > performance.read-ahead: off
>>> > performance.io-cache: off
>>> > performance.stat-prefetch: off
>>> > cluster.eager-lock: enable
>>> > network.remote-dio: enable
>>> > cluster.quorum-type: auto
>>> > cluster.server-quorum-type: server
>>> > storage.owner-uid: 36
>>> > storage.owner-gid: 36
>>> > features.shard: on
>>> > features.shard-block-size: 512MB
>>> > performance.low-prio-threads: 32
>>> > cluster.data-self-heal-algorithm: full
>>> > performance.write-behind: off
>>> > performance.write-behind-window-size: 4MB
>>> > performance.cache-size: 1GB
>>> >
>>> >
>>> > Volume Name: vmos1
>>> > Type: Replicate
>>> > Volume ID: ea8fb50e-7bc8-4de3-b775-f3976b6b4f13
>>> > Status: Started
>>> > Number of Bricks: 1 x 3 = 3
>>> > Transport-type: tcp
>>> > Bricks:
>>> > Brick1: gluster1.xyz.com:/gluster/vmos1/brick1
>>> > Brick2: gluster2.xyz.com:/gluster/vmos1/brick1
>>> > Brick3: gluster-root1.xyz.com:/gluster/vmos1/brick1
>>> > Options Reconfigured:
>>> > network.ping-timeout: 60
>>> > performance.readdir-ahead: on
>>> > performance.quick-read: off
>>> > performance.read-ahead: off
>>> > performance.io-cache: off
>>> > performance.stat-prefetch: off
>>> > cluster.eager-lock: enable
>>> > network.remote-dio: enable
>>> > cluster.quorum-type: auto
>>> > cluster.server-quorum-type: server
>>> > storage.owner-uid: 36
>>> > storage.owner-gid: 36
>>> > features.shard: on
>>> > features.shard-block-size: 512MB
>>> > performance.low-prio-threads: 32
>>> > cluster.data-self-heal-algorithm: full
>>> > performance.write-behind: off
>>> > performance.write-behind-window-size: 4MB
>>> > performance.cache-size: 1GB
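
Since the options are listed in a different order for each volume, comparing them by eye is error-prone. In the output above, for instance, vmos1 lists network.ping-timeout: 60 while engine and master do not list that option. Whether that difference has anything to do with the errors is not established here. The following is a rough sketch for diffing the "Options Reconfigured" sections of saved `gluster volume info` output. Both function names are hypothetical, and the parser assumes the plain "key: value" layout shown above.

```python
def parse_volume_info(text):
    """Return {volume_name: {option: value}} from `gluster volume info` text.

    Only lines after "Options Reconfigured:" are treated as options; leading
    mail-quote markers should be stripped by the caller beforehand.
    """
    volumes = {}
    current = None
    in_options = False
    for raw in text.splitlines():
        line = raw.strip()
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or no "key: value" shape
        key, value = key.strip(), value.strip()
        if key == "Volume Name":
            current = volumes.setdefault(value, {})
            in_options = False
        elif key == "Options Reconfigured":
            in_options = True
        elif in_options and current is not None:
            current[key] = value
    return volumes


def option_differences(volumes):
    """Return {option: {volume: value_or_None}} for options that differ."""
    all_options = set()
    for options in volumes.values():
        all_options.update(options)
    differences = {}
    for option in sorted(all_options):
        values = {name: opts.get(option) for name, opts in volumes.items()}
        if len(set(values.values())) > 1:
            differences[option] = values
    return differences
```

A value of None in the result means the volume does not list that option at all, so gluster's default applies to it.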
>>> >
>>> >
>>> >
>>> > All the logs are attached.
>>> >
>>> >
>>> >
>>> > Thanks
>>> >
>>> > -Luiz
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>
>>
>


