[ovirt-users] Fwd: Having issues with Hosted Engine

Luiz Claudio Prazeres Goncalves luizcpg at gmail.com
Fri Apr 29 09:59:22 UTC 2016


Got it. So it should be included by the 3.6.6 GA.

Thanks
Luiz

On Fri, Apr 29, 2016 at 04:26, Simone Tiraboschi <stirabos at redhat.com>
wrote:

> On Fri, Apr 29, 2016 at 4:44 AM, Luiz Claudio Prazeres Goncalves
> <luizcpg at gmail.com> wrote:
> > Hi Simone, I was reviewing the changelog of 3.6.6 on the link below,
> > but I was not able to find the bug (https://bugzilla.redhat.com/1327516)
> > listed as fixed. According to Bugzilla the target really is 3.6.6, so
> > what's wrong?
> >
> >
> > http://www.ovirt.org/release/3.6.6/
>
> That page is for the 'oVirt 3.6.6 first release candidate', so it's still not the GA.
>
> > Thanks
> > Luiz
> >
> > On Thu, Apr 28, 2016 at 11:33, Luiz Claudio Prazeres Goncalves
> > <luizcpg at gmail.com> wrote:
> >>
> >> Nice! So I'll live with these issues a bit longer until version 3.6.6
> >> gets released...
> >>
> >>
> >> Thanks
> >> -Luiz
> >>
> >> 2016-04-28 4:50 GMT-03:00 Simone Tiraboschi <stirabos at redhat.com>:
> >>>
> >>> On Thu, Apr 28, 2016 at 8:32 AM, Sahina Bose <sabose at redhat.com> wrote:
> >>> > This seems like issue reported in
> >>> > https://bugzilla.redhat.com/show_bug.cgi?id=1327121
> >>> >
> >>> > Nir, Simone?
> >>>
> >>> The issue is here:
> >>>
> >>> MainThread::INFO::2016-04-27 03:26:27,185::storage_server::229::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(disconnect_storage_server) Disconnecting storage server
> >>> MainThread::INFO::2016-04-27 03:26:27,816::upgrade::983::ovirt_hosted_engine_ha.lib.upgrade.StorageServer::(fix_storage_path) Fixing storage path in conf file
> >>>
> >>> And it's tracked here: https://bugzilla.redhat.com/1327516
> >>>
> >>> We already have a patch; it will be fixed in 3.6.6.
> >>>
> >>> As far as I saw, this issue only causes a lot of noise in the logs
> >>> and some false alerts, but it's basically harmless.
> >>>
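> >>> To confirm you are only hitting this cosmetic issue, a quick check
> >>> (just a suggestion, assuming the default HA agent log location) is to
> >>> grep the agent log for the two messages above, e.g.:
> >>>
> >>>   grep -E 'disconnect_storage_server|fix_storage_path' \
> >>>       /var/log/ovirt-hosted-engine-ha/agent.log | tail
> >>>
> >>> If those entries keep repeating while the engine VM stays up, it
> >>> matches the behaviour tracked in the bug above.
> >>>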
> >>> > On 04/28/2016 05:35 AM, Luiz Claudio Prazeres Goncalves wrote:
> >>> >
> >>> >
> >>> > Hi everyone,
> >>> >
> >>> > Until today my environment was fully updated (3.6.5 + CentOS 7.2) with
> >>> > 3 nodes (the kvm1, kvm2 and kvm3 hosts). I also have 3 external gluster
> >>> > nodes (the gluster-root1, gluster1 and gluster2 hosts), replica 3,
> >>> > which the engine storage domain is sitting on top of (3.7.11 fully
> >>> > updated + CentOS 7.2).
> >>> >
> >>> > For some weird reason I've been receiving emails from oVirt with
> >>> > EngineUnexpectedDown (attached picture) on a more or less daily basis,
> >>> > but the engine seems to be working fine and my VMs are up and running
> >>> > normally. I've never had any issue accessing the User Interface to
> >>> > manage the VMs.
> >>> >
> >>> > Today I ran "yum update" on the nodes and realised that vdsm was
> >>> > outdated, so I updated the kvm hosts and they are now, again, fully
> >>> > updated.
> >>> >
> >>> >
> >>> > Reviewing the logs, it seems to be an intermittent connectivity issue
> >>> > when trying to access the gluster engine storage domain, as you can
> >>> > see below. I don't have any network issue in place and I'm 100% sure
> >>> > about it. I have another oVirt cluster using the same network, with an
> >>> > engine storage domain on top of an iSCSI storage array, and it has no
> >>> > issues.
> >>> >
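> >>> > To rule out a network problem, a rough check (just a sketch, nothing
> >>> > authoritative) is to confirm from one of the gluster nodes that all
> >>> > bricks of the engine volume stay online and that the peers remain
> >>> > connected:
> >>> >
> >>> >   gluster volume status engine
> >>> >   gluster peer status
> >>> >
> >>> > and, on the kvm hosts, that the glusterSD mount of the engine domain
> >>> > under /rhev/data-center/mnt/glusterSD/ is still browsable.
> >>> >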
> >>> > Here is what seems to be the issue:
> >>> >
> >>> > Thread-1111::INFO::2016-04-27
> >>> > 23:01:27,864::fileSD::357::Storage.StorageDomain::(validate)
> >>> > sdUUID=03926733-1872-4f85-bb21-18dc320560db
> >>> >
> >>> > Thread-1111::DEBUG::2016-04-27
> >>> > 23:01:27,865::persistentDict::234::Storage.PersistentDict::(refresh)
> >>> > read lines (FileMetadataRW)=[]
> >>> >
> >>> > Thread-1111::DEBUG::2016-04-27
> >>> > 23:01:27,865::persistentDict::252::Storage.PersistentDict::(refresh)
> >>> > Empty metadata
> >>> >
> >>> > Thread-1111::ERROR::2016-04-27
> >>> > 23:01:27,865::task::866::Storage.TaskManager.Task::(_setError)
> >>> > Task=`d2acf575-1a60-4fa0-a5bb-cd4363636b94`::Unexpected error
> >>> >
> >>> > Traceback (most recent call last):
> >>> >
> >>> >   File "/usr/share/vdsm/storage/task.py", line 873, in _run
> >>> >
> >>> >     return fn(*args, **kargs)
> >>> >
> >>> >   File "/usr/share/vdsm/logUtils.py", line 49, in wrapper
> >>> >
> >>> >     res = f(*args, **kwargs)
> >>> >
> >>> >   File "/usr/share/vdsm/storage/hsm.py", line 2835, in
> >>> > getStorageDomainInfo
> >>> >
> >>> >     dom = self.validateSdUUID(sdUUID)
> >>> >
> >>> >   File "/usr/share/vdsm/storage/hsm.py", line 278, in validateSdUUID
> >>> >
> >>> >     sdDom.validate()
> >>> >
> >>> >   File "/usr/share/vdsm/storage/fileSD.py", line 360, in validate
> >>> >
> >>> >     raise se.StorageDomainAccessError(self.sdUUID)
> >>> >
> >>> > StorageDomainAccessError: Domain is either partially accessible or
> >>> > entirely inaccessible: (u'03926733-1872-4f85-bb21-18dc320560db',)
> >>> >
> >>> > Thread-1111::DEBUG::2016-04-27
> >>> > 23:01:27,865::task::885::Storage.TaskManager.Task::(_run)
> >>> > Task=`d2acf575-1a60-4fa0-a5bb-cd4363636b94`::Task._run:
> >>> > d2acf575-1a60-4fa0-a5bb-cd4363636b94
> >>> > ('03926733-1872-4f85-bb21-18dc320560db',) {} failed - stopping task
> >>> >
> >>> > Thread-1111::DEBUG::2016-04-27
> >>> > 23:01:27,865::task::1246::Storage.TaskManager.Task::(stop)
> >>> > Task=`d2acf575-1a60-4fa0-a5bb-cd4363636b94`::stopping in state
> >>> > preparing (force False)
> >>> >
> >>> > Thread-1111::DEBUG::2016-04-27
> >>> > 23:01:27,865::task::993::Storage.TaskManager.Task::(_decref)
> >>> > Task=`d2acf575-1a60-4fa0-a5bb-cd4363636b94`::ref 1 aborting True
> >>> >
> >>> > Thread-1111::INFO::2016-04-27
> >>> > 23:01:27,865::task::1171::Storage.TaskManager.Task::(prepare)
> >>> > Task=`d2acf575-1a60-4fa0-a5bb-cd4363636b94`::aborting: Task is aborted:
> >>> > 'Domain is either partially accessible or entirely inaccessible' -
> >>> > code 379
> >>> >
> >>> > Thread-1111::DEBUG::2016-04-27
> >>> > 23:01:27,866::task::1176::Storage.TaskManager.Task::(prepare)
> >>> > Task=`d2acf575-1a60-4fa0-a5bb-cd4363636b94`::Prepare: aborted: Domain
> >>> > is either partially accessible or entirely inaccessible
> >>> >
> >>> >
> >>> > Question: does anyone know what might be happening? I have several
> >>> > gluster configs, as you can see below. All the storage domains are
> >>> > using the same config.
> >>> >
> >>> >
> >>> > More information:
> >>> >
> >>> > I have the "engine" storage domain, "vmos1" storage domain and "master"
> >>> > storage domain, so everything looks good.
> >>> >
> >>> > [root at kvm1 vdsm]# vdsClient -s 0 getStorageDomainsList
> >>> >
> >>> > 03926733-1872-4f85-bb21-18dc320560db
> >>> > 35021ff4-fb95-43d7-92a3-f538273a3c2e
> >>> > e306e54e-ca98-468d-bb04-3e8900f8840c
> >>> >
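> >>> > The UUID from the error above (03926733-1872-4f85-bb21-18dc320560db)
> >>> > is the first one in this list. To map each UUID back to a domain name,
> >>> > it can be queried directly (a sketch, using the same vdsClient tool):
> >>> >
> >>> >   vdsClient -s 0 getStorageDomainInfo 03926733-1872-4f85-bb21-18dc320560db
> >>> >
> >>> > which prints, among other fields, the domain name, type and role.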
> >>> >
> >>> > Gluster config:
> >>> >
> >>> > [root at gluster-root1 ~]# gluster volume info
> >>> >
> >>> >
> >>> >
> >>> > Volume Name: engine
> >>> > Type: Replicate
> >>> > Volume ID: 64b413d2-c42e-40fd-b356-3e6975e941b0
> >>> > Status: Started
> >>> > Number of Bricks: 1 x 3 = 3
> >>> > Transport-type: tcp
> >>> > Bricks:
> >>> > Brick1: gluster1.xyz.com:/gluster/engine/brick1
> >>> > Brick2: gluster2.xyz.com:/gluster/engine/brick1
> >>> > Brick3: gluster-root1.xyz.com:/gluster/engine/brick1
> >>> > Options Reconfigured:
> >>> > performance.cache-size: 1GB
> >>> > performance.write-behind-window-size: 4MB
> >>> > performance.write-behind: off
> >>> > performance.quick-read: off
> >>> > performance.read-ahead: off
> >>> > performance.io-cache: off
> >>> > performance.stat-prefetch: off
> >>> > cluster.eager-lock: enable
> >>> > cluster.quorum-type: auto
> >>> > network.remote-dio: enable
> >>> > cluster.server-quorum-type: server
> >>> > cluster.data-self-heal-algorithm: full
> >>> > performance.low-prio-threads: 32
> >>> > features.shard-block-size: 512MB
> >>> > features.shard: on
> >>> > storage.owner-gid: 36
> >>> > storage.owner-uid: 36
> >>> > performance.readdir-ahead: on
> >>> >
> >>> >
> >>> > Volume Name: master
> >>> > Type: Replicate
> >>> > Volume ID: 20164808-7bbe-4eeb-8770-d222c0e0b830
> >>> > Status: Started
> >>> > Number of Bricks: 1 x 3 = 3
> >>> > Transport-type: tcp
> >>> > Bricks:
> >>> > Brick1: gluster1.xyz.com:/home/storage/master/brick1
> >>> > Brick2: gluster2.xyz.com:/home/storage/master/brick1
> >>> > Brick3: gluster-root1.xyz.com:/home/storage/master/brick1
> >>> > Options Reconfigured:
> >>> > performance.readdir-ahead: on
> >>> > performance.quick-read: off
> >>> > performance.read-ahead: off
> >>> > performance.io-cache: off
> >>> > performance.stat-prefetch: off
> >>> > cluster.eager-lock: enable
> >>> > network.remote-dio: enable
> >>> > cluster.quorum-type: auto
> >>> > cluster.server-quorum-type: server
> >>> > storage.owner-uid: 36
> >>> > storage.owner-gid: 36
> >>> > features.shard: on
> >>> > features.shard-block-size: 512MB
> >>> > performance.low-prio-threads: 32
> >>> > cluster.data-self-heal-algorithm: full
> >>> > performance.write-behind: off
> >>> > performance.write-behind-window-size: 4MB
> >>> > performance.cache-size: 1GB
> >>> >
> >>> >
> >>> > Volume Name: vmos1
> >>> > Type: Replicate
> >>> > Volume ID: ea8fb50e-7bc8-4de3-b775-f3976b6b4f13
> >>> > Status: Started
> >>> > Number of Bricks: 1 x 3 = 3
> >>> > Transport-type: tcp
> >>> > Bricks:
> >>> > Brick1: gluster1.xyz.com:/gluster/vmos1/brick1
> >>> > Brick2: gluster2.xyz.com:/gluster/vmos1/brick1
> >>> > Brick3: gluster-root1.xyz.com:/gluster/vmos1/brick1
> >>> > Options Reconfigured:
> >>> > network.ping-timeout: 60
> >>> > performance.readdir-ahead: on
> >>> > performance.quick-read: off
> >>> > performance.read-ahead: off
> >>> > performance.io-cache: off
> >>> > performance.stat-prefetch: off
> >>> > cluster.eager-lock: enable
> >>> > network.remote-dio: enable
> >>> > cluster.quorum-type: auto
> >>> > cluster.server-quorum-type: server
> >>> > storage.owner-uid: 36
> >>> > storage.owner-gid: 36
> >>> > features.shard: on
> >>> > features.shard-block-size: 512MB
> >>> > performance.low-prio-threads: 32
> >>> > cluster.data-self-heal-algorithm: full
> >>> > performance.write-behind: off
> >>> > performance.write-behind-window-size: 4MB
> >>> > performance.cache-size: 1GB
> >>> >
> >>> >
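> >>> > Comparing the three outputs above, they are indeed nearly identical:
> >>> > the only option that differs is network.ping-timeout, which is set to
> >>> > 60 on vmos1 and left at its default on engine and master. A single
> >>> > option can be checked per volume with something like the following
> >>> > (assuming gluster 3.7's "volume get" command is available):
> >>> >
> >>> >   gluster volume get engine network.ping-timeout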
> >>> >
> >>> > All the logs are attached...
> >>> >
> >>> >
> >>> >
> >>> > Thanks
> >>> >
> >>> > -Luiz
> >>> >
> >>> >
> >>> >
> >>> >
> >>> >
> >>> >
> >>
> >>
> >
>