On 01/24/2013 12:44 AM, Dead Horse wrote:
I narrowed down the commit where the originally reported issue
crept in:
commit fc3a44f71d2ef202cff18d7203b9e4165b546621. Building and testing with
this commit or subsequent commits yields the original issue.
Interesting.. it might be related to this commit, and we're trying to
reproduce it.
Did you try to remove that code and run again? Does it work without the
addition of zombieReaper?
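For readers not familiar with the term, the usual zombie-reaping pattern looks roughly like this; this is only a generic Python illustration, not the vdsm zombieReaper code from that commit:

# Generic illustration of zombie reaping -- not the vdsm zombieReaper module.
import os
import signal

def _reap_children(signum, frame):
    # Collect exit statuses of finished child processes so they do not
    # linger as zombies. A process-wide handler like this can also swallow
    # exit statuses that other code expected to collect itself, which is
    # why we ask whether the issue disappears without it.
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except OSError:      # no children left to wait for
            return
        if pid == 0:         # children exist, but none has exited yet
            return

signal.signal(signal.SIGCHLD, _reap_children)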
Does the connectivity to the storage work well? When you run 'ls' on the
mounted folder, do you see the files without a long delay? It might be
related to a too-long timeout when validating access to this mount..
we are working on that.. any additional info can help.
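For illustration, here is a minimal sketch of the kind of mount validation meant above: list the mounted directory from a helper process and treat the mount as unresponsive if it does not answer within a timeout. This is only a sketch with a placeholder path and timeout, not the vdsm implementation:

# Illustrative sketch only -- not vdsm's storage validation code.
import multiprocessing
import os

def _probe(path):
    os.listdir(path)  # raises (or hangs) if the mount is broken

def mount_is_responsive(path, timeout=10.0):
    """Return True if listing `path` completes cleanly within `timeout` seconds."""
    proc = multiprocessing.Process(target=_probe, args=(path,))
    proc.start()
    proc.join(timeout)
    if proc.is_alive():           # the probe hung -- likely a stale or slow mount
        proc.terminate()
        proc.join()
        return False
    return proc.exitcode == 0     # non-zero means listdir raised

if __name__ == '__main__':
    # Hypothetical local mount point for the NFS export seen in the engine log.
    print(mount_is_responsive('/rhev/data-center/mnt/192.168.0.1:_ovirt_ds'))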
Thanks.
- DHC
On Wed, Jan 23, 2013 at 3:56 PM, Dead Horse
<deadhorseconsulting(a)gmail.com>wrote:
> Indeed, reverting back to an older vdsm clears up the above issue. However,
> now the issue I see is:
> Thread-18::ERROR::2013-01-23
> 15:50:42,885::task::833::TaskManager.Task::(_setError)
> Task=`08709e68-bcbc-40d8-843a-d69d4df40ac6`::Unexpected error
>
> Traceback (most recent call last):
> File "/usr/share/vdsm/storage/task.py", line 840, in _run
> return fn(*args, **kargs)
> File "/usr/share/vdsm/logUtils.py", line 42, in wrapper
> res = f(*args, **kwargs)
> File "/usr/share/vdsm/storage/hsm.py", line 923, in connectStoragePool
> masterVersion, options)
> File "/usr/share/vdsm/storage/hsm.py", line 970, in _connectStoragePool
> res = pool.connect(hostID, scsiKey, msdUUID, masterVersion)
> File "/usr/share/vdsm/storage/sp.py", line 643, in connect
> self.__rebuild(msdUUID=msdUUID, masterVersion=masterVersion)
> File "/usr/share/vdsm/storage/sp.py", line 1167, in __rebuild
> self.masterDomain = self.getMasterDomain(msdUUID=msdUUID,
> masterVersion=masterVersion)
> File "/usr/share/vdsm/storage/sp.py", line 1506, in getMasterDomain
> raise se.StoragePoolMasterNotFound(self.spUUID, msdUUID)
> StoragePoolMasterNotFound: Cannot find master domain:
> 'spUUID=f90a0d1c-06ca-11e2-a05b-00151712f280,
> msdUUID=67534cca-1327-462a-b455-a04464084b31'
> Thread-18::DEBUG::2013-01-23
> 15:50:42,887::task::852::TaskManager.Task::(_run)
> Task=`08709e68-bcbc-40d8-843a-d69d4df40ac6`::Task._run:
> 08709e68-bcbc-40d8-843a-d69d4df40ac6
> ('f90a0d1c-06ca-11e2-a05b-00151712f280', 2,
> 'f90a0d1c-06ca-11e2-a05b-00151712f280',
> '67534cca-1327-462a-b455-a04464084b31', 433) {} failed - stopping task
>
> This is with vdsm built from
> commit 25a2d8572ad32352227c98a86631300fbd6523c1
> - DHC
>
>
> On Wed, Jan 23, 2013 at 10:44 AM, Dead Horse <
> deadhorseconsulting(a)gmail.com> wrote:
>
>> VDSM was built from:
>> commit 166138e37e75767b32227746bb671b1dab9cdd5e
>>
>> Attached is the full vdsm log
>>
>> I should also note that, from the engine's perspective, the master
>> storage domain appears as locked and the others as unknown.
>>
>>
>> On Wed, Jan 23, 2013 at 2:49 AM, Dan Kenigsberg <danken(a)redhat.com>wrote:
>>
>>> On Tue, Jan 22, 2013 at 04:02:24PM -0600, Dead Horse wrote:
>>>> Any ideas on this one? (from VDSM log):
>>>> Thread-25::DEBUG::2013-01-22
>>>> 15:35:29,065::BindingXMLRPC::914::vds::(wrapper) client
>>> [3.57.111.30]::call
>>>> getCapabilities with () {}
>>>> Thread-25::ERROR::2013-01-22 15:35:29,113::netinfo::159::root::(speed)
>>>> cannot read ib0 speed
>>>> Traceback (most recent call last):
>>>> File "/usr/lib64/python2.6/site-packages/vdsm/netinfo.py", line 155, in speed
>>>> s = int(file('/sys/class/net/%s/speed' % dev).read())
>>>> IOError: [Errno 22] Invalid argument
>>>>
>>>> Causes VDSM to fail to attach storage
>>>
>>> I doubt that this is the cause of the failure, as vdsm has always
>>> reported "0" for ib devices, and still does.
It happens only when you call getCapabilities, so it isn't related
to the flow, and it can't affect the storage.
Dan: I guess this is not the issue, but why the IOError?
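For reference, a minimal sketch of the behaviour Dan describes (not a copy of vdsm's netinfo.py): the sysfs read is wrapped so a device such as ib0 that rejects the read is reported as 0 instead of propagating the error:

# Illustrative sketch, not vdsm's actual netinfo.speed() code.
import logging

def speed(dev):
    """Best-effort link speed in Mbps; 0 when the device does not expose it."""
    try:
        # ib devices (and links that are down) can fail this read with
        # EINVAL, i.e. the reported IOError: [Errno 22] Invalid argument.
        with open('/sys/class/net/%s/speed' % dev) as f:
            s = int(f.read())
        if s > 0:
            return s
    except (IOError, ValueError):
        logging.exception('cannot read %s speed', dev)
    return 0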
>>>
>>> Does a former version work with your Engine?
>>> Could you share more of your vdsm.log? I suppose the culprit lies in
>>> one of the storage-related commands, not in statistics retrieval.
>>>
>>>>
>>>> Engine side sees:
>>>> ERROR [org.ovirt.engine.core.bll.storage.NFSStorageHelper]
>>>> (QuartzScheduler_Worker-96) [553ef26e] The connection with details
>>>> 192.168.0.1:/ovirt/ds failed because of error code 100 and error
>>> message
>>>> is: general exception
>>>> 2013-01-22 15:35:30,160 INFO
>>>> [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand]
>>>> (QuartzScheduler_Worker-96) [1ab78378] Running command:
>>>> SetNonOperationalVdsCommand internal: true. Entities affected : ID:
>>>> 8970b3fe-1faf-11e2-bc1f-00151712f280 Type: VDS
>>>> 2013-01-22 15:35:30,200 INFO
>>>> [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand]
>>>> (QuartzScheduler_Worker-96) [1ab78378] START,
>>>> SetVdsStatusVDSCommand(HostName = kezan, HostId =
>>>> 8970b3fe-1faf-11e2-bc1f-00151712f280, status=NonOperational,
>>>> nonOperationalReason=STORAGE_DOMAIN_UNREACHABLE), log id: 4af5c4cd
>>>> 2013-01-22 15:35:30,211 INFO
>>>> [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand]
>>>> (QuartzScheduler_Worker-96) [1ab78378] FINISH, SetVdsStatusVDSCommand,
>>> log
>>>> id: 4af5c4cd
>>>> 2013-01-22 15:35:30,242 ERROR
>>>> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
>>>> (QuartzScheduler_Worker-96) [1ab78378] Try to add duplicate audit log
>>>> values with the same name. Type: VDS_SET_NONOPERATIONAL_DOMAIN. Value:
>>>> storagepoolname
>>>>
>>>> Engine = latest master
>>>> VDSM = latest master
>>>
>>> Since "latest master" is an unstable reference by definition, I'm sure
>>> that History would thank you if you posted the exact version (git hash?)
>>> of the code.
>>>
>>>> node = el6
>>>
>>>
>>
>
--
Yaniv Bronhaim.
RedHat, Israel
09-7692289
054-7744187