[ovirt-users] Fwd: Having issues with Hosted Engine

Luiz Claudio Prazeres Goncalves luizcpg at gmail.com
Fri Apr 29 02:44:54 UTC 2016


Hi Simone, I was reviewing the 3.6.6 changelog at the link below, but I
was not able to find the bug (https://bugzilla.redhat.com/1327516) listed as
fixed. According to Bugzilla the target really is 3.6.6, so what's
wrong?


http://www.ovirt.org/release/3.6.6/


Thanks
Luiz

On Thu, Apr 28, 2016 at 11:33, Luiz Claudio Prazeres Goncalves <
luizcpg at gmail.com> wrote:

> Nice! So I'll live with these issues a bit longer, until version
> 3.6.6 is released.
>
>
> Thanks
> -Luiz
>
> 2016-04-28 4:50 GMT-03:00 Simone Tiraboschi <stirabos at redhat.com>:
>
>> On Thu, Apr 28, 2016 at 8:32 AM, Sahina Bose <sabose at redhat.com> wrote:
>> > This seems like the issue reported in
>> > https://bugzilla.redhat.com/show_bug.cgi?id=1327121
>> >
>> > Nir, Simone?
>>
>> The issue is here:
>> MainThread::INFO::2016-04-27 03:26:27,185::storage_server::229::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(disconnect_storage_server) Disconnecting storage server
>> MainThread::INFO::2016-04-27 03:26:27,816::upgrade::983::ovirt_hosted_engine_ha.lib.upgrade.StorageServer::(fix_storage_path) Fixing storage path in conf file
>>
>> And it's tracked here: https://bugzilla.redhat.com/1327516
>>
>> We already have a patch; it will be fixed in 3.6.6.
>>
>> As far as I can see, this issue only causes a lot of noise in the logs
>> and some false alerts, but it's basically harmless.
>>
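To check how often the two messages Simone points to occur, and whether they line up with the EngineUnexpectedDown alerts, one could filter the hosted-engine HA agent log by function name. This is a minimal sketch, assuming the agent log is at /var/log/ovirt-hosted-engine-ha/agent.log (pass the path as the first argument) and that each record sits on one line in the `thread::LEVEL::date time::module::line::logger::(function) message` layout seen in the excerpts; the script name and the `find_records` helper are hypothetical, not part of oVirt.

```python
import re
import sys
from typing import Iterable, Iterator, NamedTuple, Set

# One record per line, in the layout seen in the quoted excerpts:
#   thread::LEVEL::YYYY-MM-DD HH:MM:SS,mmm::module::lineno::logger::(function) message
RECORD_RE = re.compile(
    r"^(?P<thread>[^:]+)::(?P<level>[A-Z]+)::"
    r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2},\d{3})::"
    r"(?P<module>[^:]+)::(?P<lineno>\d+)::(?P<logger>[^:]+)::"
    r"\((?P<func>[^)]*)\)\s*(?P<message>.*)$"
)


class Record(NamedTuple):
    thread: str
    level: str
    date: str
    time: str
    module: str
    lineno: str
    logger: str
    func: str
    message: str


def find_records(lines: Iterable[str], funcs: Set[str]) -> Iterator[Record]:
    """Yield parsed records whose (function) field is one of `funcs`."""
    for line in lines:
        match = RECORD_RE.match(line.rstrip("\n"))
        if match and match.group("func") in funcs:
            yield Record(**match.groupdict())


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 find_ha_records.py /var/log/ovirt-hosted-engine-ha/agent.log
    wanted = {"disconnect_storage_server", "fix_storage_path"}
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for rec in find_records(log, wanted):
            print(rec.date, rec.time, rec.func, rec.message)
```

Matching on the `(function)` field rather than on free text keeps the filter stable if the message wording changes between versions.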
>> > On 04/28/2016 05:35 AM, Luiz Claudio Prazeres Goncalves wrote:
>> >
>> >
>> > Hi everyone,
>> >
>> > Until today my environment was fully updated (3.6.5 + CentOS 7.2), with 3
>> > nodes (the kvm1, kvm2 and kvm3 hosts). I also have 3 external gluster nodes
>> > (the gluster-root1, gluster1 and gluster2 hosts, replica 3, fully updated
>> > to 3.7.11 + CentOS 7.2), on which the engine storage domain sits.
>> >
>> > For some weird reason I've been receiving EngineUnexpectedDown emails from
>> > oVirt (attached picture) more or less daily, but the engine seems to be
>> > working fine and my VMs are up and running normally. I've never had any
>> > issue accessing the user interface to manage the VMs.
>> >
>> > Today I ran "yum update" on the nodes and realised that vdsm was outdated,
>> > so I updated the KVM hosts and they are now fully updated again.
>> >
>> >
>> > Reviewing the logs, it seems to be an intermittent connectivity issue when
>> > trying to access the gluster engine storage domain, as you can see below. I
>> > don't have any network issue in place, and I'm 100% sure about it: I have
>> > another oVirt cluster using the same network, with an engine storage domain
>> > on top of an iSCSI storage array, and it has no issues.
>> >
>> > Here seems to be the issue:
>> >
>> > Thread-1111::INFO::2016-04-27 23:01:27,864::fileSD::357::Storage.StorageDomain::(validate) sdUUID=03926733-1872-4f85-bb21-18dc320560db
>> > Thread-1111::DEBUG::2016-04-27 23:01:27,865::persistentDict::234::Storage.PersistentDict::(refresh) read lines (FileMetadataRW)=[]
>> > Thread-1111::DEBUG::2016-04-27 23:01:27,865::persistentDict::252::Storage.PersistentDict::(refresh) Empty metadata
>> > Thread-1111::ERROR::2016-04-27 23:01:27,865::task::866::Storage.TaskManager.Task::(_setError) Task=`d2acf575-1a60-4fa0-a5bb-cd4363636b94`::Unexpected error
>> > Traceback (most recent call last):
>> >   File "/usr/share/vdsm/storage/task.py", line 873, in _run
>> >     return fn(*args, **kargs)
>> >   File "/usr/share/vdsm/logUtils.py", line 49, in wrapper
>> >     res = f(*args, **kwargs)
>> >   File "/usr/share/vdsm/storage/hsm.py", line 2835, in getStorageDomainInfo
>> >     dom = self.validateSdUUID(sdUUID)
>> >   File "/usr/share/vdsm/storage/hsm.py", line 278, in validateSdUUID
>> >     sdDom.validate()
>> >   File "/usr/share/vdsm/storage/fileSD.py", line 360, in validate
>> >     raise se.StorageDomainAccessError(self.sdUUID)
>> > StorageDomainAccessError: Domain is either partially accessible or entirely inaccessible: (u'03926733-1872-4f85-bb21-18dc320560db',)
>> > Thread-1111::DEBUG::2016-04-27 23:01:27,865::task::885::Storage.TaskManager.Task::(_run) Task=`d2acf575-1a60-4fa0-a5bb-cd4363636b94`::Task._run: d2acf575-1a60-4fa0-a5bb-cd4363636b94 ('03926733-1872-4f85-bb21-18dc320560db',) {} failed - stopping task
>> > Thread-1111::DEBUG::2016-04-27 23:01:27,865::task::1246::Storage.TaskManager.Task::(stop) Task=`d2acf575-1a60-4fa0-a5bb-cd4363636b94`::stopping in state preparing (force False)
>> > Thread-1111::DEBUG::2016-04-27 23:01:27,865::task::993::Storage.TaskManager.Task::(_decref) Task=`d2acf575-1a60-4fa0-a5bb-cd4363636b94`::ref 1 aborting True
>> > Thread-1111::INFO::2016-04-27 23:01:27,865::task::1171::Storage.TaskManager.Task::(prepare) Task=`d2acf575-1a60-4fa0-a5bb-cd4363636b94`::aborting: Task is aborted: 'Domain is either partially accessible or entirely inaccessible' - code 379
>> > Thread-1111::DEBUG::2016-04-27 23:01:27,866::task::1176::Storage.TaskManager.Task::(prepare) Task=`d2acf575-1a60-4fa0-a5bb-cd4363636b94`::Prepare: aborted: Domain is either partially accessible or entirely inaccessible
>> >
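To see whether these "code 379" aborts are really intermittent and roughly daily, like the EngineUnexpectedDown emails, the vdsm log could be bucketed by hour. This is a minimal sketch, assuming vdsm writes to /var/log/vdsm/vdsm.log (pass the path as the first argument) and that each record is on a single line; the script name and the `access_errors_per_hour` helper are hypothetical, not part of vdsm.

```python
import re
import sys
from collections import Counter
from typing import Iterable

# The task-abort record vdsm logs after a StorageDomainAccessError (code 379),
# as in the excerpt above; its timestamp prefix supplies the date and hour.
ABORT_RE = re.compile(
    r"::(?P<date>\d{4}-\d{2}-\d{2}) (?P<hour>\d{2}):\d{2}:\d{2},\d{3}::.*"
    r"Task is aborted: 'Domain is either partially accessible or entirely "
    r"inaccessible' - code 379"
)


def access_errors_per_hour(lines: Iterable[str]) -> Counter:
    """Count code-379 task aborts per 'YYYY-MM-DD HH:00' bucket."""
    counts: Counter = Counter()
    for line in lines:
        match = ABORT_RE.search(line)
        if match:
            counts[f"{match.group('date')} {match.group('hour')}:00"] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 count_access_errors.py /var/log/vdsm/vdsm.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for bucket, count in sorted(access_errors_per_hour(log).items()):
            print(bucket, count)
```

Only the "Task is aborted" line is counted, so each failed task contributes once even though it produces several related log lines.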
>> >
>> > Question: Does anyone know what might be happening? I have several gluster
>> > volumes, configured as you can see below. All the storage domains use the
>> > same configuration.
>> >
>> >
>> > More information:
>> >
>> > I have the "engine", "vmos1" and "master" storage domains, so everything
>> > looks good:
>> >
>> > [root@kvm1 vdsm]# vdsClient -s 0 getStorageDomainsList
>> > 03926733-1872-4f85-bb21-18dc320560db
>> > 35021ff4-fb95-43d7-92a3-f538273a3c2e
>> > e306e54e-ca98-468d-bb04-3e8900f8840c
>> >
>> >
>> > Gluster config:
>> >
>> > [root@gluster-root1 ~]# gluster volume info
>> >
>> > Volume Name: engine
>> > Type: Replicate
>> > Volume ID: 64b413d2-c42e-40fd-b356-3e6975e941b0
>> > Status: Started
>> > Number of Bricks: 1 x 3 = 3
>> > Transport-type: tcp
>> > Bricks:
>> > Brick1: gluster1.xyz.com:/gluster/engine/brick1
>> > Brick2: gluster2.xyz.com:/gluster/engine/brick1
>> > Brick3: gluster-root1.xyz.com:/gluster/engine/brick1
>> > Options Reconfigured:
>> > performance.cache-size: 1GB
>> > performance.write-behind-window-size: 4MB
>> > performance.write-behind: off
>> > performance.quick-read: off
>> > performance.read-ahead: off
>> > performance.io-cache: off
>> > performance.stat-prefetch: off
>> > cluster.eager-lock: enable
>> > cluster.quorum-type: auto
>> > network.remote-dio: enable
>> > cluster.server-quorum-type: server
>> > cluster.data-self-heal-algorithm: full
>> > performance.low-prio-threads: 32
>> > features.shard-block-size: 512MB
>> > features.shard: on
>> > storage.owner-gid: 36
>> > storage.owner-uid: 36
>> > performance.readdir-ahead: on
>> >
>> > Volume Name: master
>> > Type: Replicate
>> > Volume ID: 20164808-7bbe-4eeb-8770-d222c0e0b830
>> > Status: Started
>> > Number of Bricks: 1 x 3 = 3
>> > Transport-type: tcp
>> > Bricks:
>> > Brick1: gluster1.xyz.com:/home/storage/master/brick1
>> > Brick2: gluster2.xyz.com:/home/storage/master/brick1
>> > Brick3: gluster-root1.xyz.com:/home/storage/master/brick1
>> > Options Reconfigured:
>> > performance.readdir-ahead: on
>> > performance.quick-read: off
>> > performance.read-ahead: off
>> > performance.io-cache: off
>> > performance.stat-prefetch: off
>> > cluster.eager-lock: enable
>> > network.remote-dio: enable
>> > cluster.quorum-type: auto
>> > cluster.server-quorum-type: server
>> > storage.owner-uid: 36
>> > storage.owner-gid: 36
>> > features.shard: on
>> > features.shard-block-size: 512MB
>> > performance.low-prio-threads: 32
>> > cluster.data-self-heal-algorithm: full
>> > performance.write-behind: off
>> > performance.write-behind-window-size: 4MB
>> > performance.cache-size: 1GB
>> >
>> > Volume Name: vmos1
>> > Type: Replicate
>> > Volume ID: ea8fb50e-7bc8-4de3-b775-f3976b6b4f13
>> > Status: Started
>> > Number of Bricks: 1 x 3 = 3
>> > Transport-type: tcp
>> > Bricks:
>> > Brick1: gluster1.xyz.com:/gluster/vmos1/brick1
>> > Brick2: gluster2.xyz.com:/gluster/vmos1/brick1
>> > Brick3: gluster-root1.xyz.com:/gluster/vmos1/brick1
>> > Options Reconfigured:
>> > network.ping-timeout: 60
>> > performance.readdir-ahead: on
>> > performance.quick-read: off
>> > performance.read-ahead: off
>> > performance.io-cache: off
>> > performance.stat-prefetch: off
>> > cluster.eager-lock: enable
>> > network.remote-dio: enable
>> > cluster.quorum-type: auto
>> > cluster.server-quorum-type: server
>> > storage.owner-uid: 36
>> > storage.owner-gid: 36
>> > features.shard: on
>> > features.shard-block-size: 512MB
>> > performance.low-prio-threads: 32
>> > cluster.data-self-heal-algorithm: full
>> > performance.write-behind: off
>> > performance.write-behind-window-size: 4MB
>> > performance.cache-size: 1GB
>> >
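Whether the three volumes really share the same configuration can be checked mechanically by parsing the "Options Reconfigured" section of `gluster volume info`. This is a minimal sketch, assuming the plain-text output layout shown above, saved to a file passed as the first argument (e.g. `gluster volume info > volinfo.txt`); the script name and the `parse_volume_info` / `differing_options` helpers are hypothetical.

```python
import sys
from typing import Dict, Optional


def parse_volume_info(text: str) -> Dict[str, Dict[str, str]]:
    """Map each volume name to its 'Options Reconfigured' key/value pairs."""
    volumes: Dict[str, Dict[str, str]] = {}
    current: Optional[str] = None
    in_options = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Volume Name:"):
            # A new volume starts; options seen from now on belong to it.
            current = line.split(":", 1)[1].strip()
            volumes[current] = {}
            in_options = False
        elif line == "Options Reconfigured:":
            in_options = True
        elif in_options and current is not None and ":" in line:
            key, value = line.split(":", 1)
            volumes[current][key.strip()] = value.strip()
    return volumes


def differing_options(
    volumes: Dict[str, Dict[str, str]]
) -> Dict[str, Dict[str, Optional[str]]]:
    """Return options whose value (or presence) differs between volumes."""
    all_keys = sorted({key for opts in volumes.values() for key in opts})
    diff: Dict[str, Dict[str, Optional[str]]] = {}
    for key in all_keys:
        # None marks a volume on which the option is not reconfigured.
        values = {name: opts.get(key) for name, opts in volumes.items()}
        if len(set(values.values())) > 1:
            diff[key] = values
    return diff


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 compare_volume_options.py volinfo.txt
    with open(sys.argv[1], encoding="utf-8") as fh:
        diff = differing_options(parse_volume_info(fh.read()))
    for key, values in diff.items():
        print(key, values)
    if not diff:
        print("All volumes share the same reconfigured options.")
```

Applied to the output above, it reports a single difference: network.ping-timeout is set (to 60) only on vmos1; every other option has the same value on all three volumes.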
>> >
>> >
>> > All the logs are attached.
>> >
>> >
>> >
>> > Thanks
>> >
>> > -Luiz
>> >
>> >
>> >
>> >
>> > _______________________________________________
>> > Users mailing list
>> > Users at ovirt.org
>> > http://lists.ovirt.org/mailman/listinfo/users
>> >
>> >
>>
>
>

