
I narrowed down the commit where the originally reported issue crept in:
commit fc3a44f71d2ef202cff18d7203b9e4165b546621
Building and testing with this commit or subsequent commits yields the original issue.
- DHC

On Wed, Jan 23, 2013 at 3:56 PM, Dead Horse <deadhorseconsulting@gmail.com> wrote:
Indeed, reverting back to an older vdsm clears up the above issue. However, the issue I now see is:

Thread-18::ERROR::2013-01-23 15:50:42,885::task::833::TaskManager.Task::(_setError) Task=`08709e68-bcbc-40d8-843a-d69d4df40ac6`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 840, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 42, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 923, in connectStoragePool
    masterVersion, options)
  File "/usr/share/vdsm/storage/hsm.py", line 970, in _connectStoragePool
    res = pool.connect(hostID, scsiKey, msdUUID, masterVersion)
  File "/usr/share/vdsm/storage/sp.py", line 643, in connect
    self.__rebuild(msdUUID=msdUUID, masterVersion=masterVersion)
  File "/usr/share/vdsm/storage/sp.py", line 1167, in __rebuild
    self.masterDomain = self.getMasterDomain(msdUUID=msdUUID, masterVersion=masterVersion)
  File "/usr/share/vdsm/storage/sp.py", line 1506, in getMasterDomain
    raise se.StoragePoolMasterNotFound(self.spUUID, msdUUID)
StoragePoolMasterNotFound: Cannot find master domain: 'spUUID=f90a0d1c-06ca-11e2-a05b-00151712f280, msdUUID=67534cca-1327-462a-b455-a04464084b31'
Thread-18::DEBUG::2013-01-23 15:50:42,887::task::852::TaskManager.Task::(_run) Task=`08709e68-bcbc-40d8-843a-d69d4df40ac6`::Task._run: 08709e68-bcbc-40d8-843a-d69d4df40ac6 ('f90a0d1c-06ca-11e2-a05b-00151712f280', 2, 'f90a0d1c-06ca-11e2-a05b-00151712f280', '67534cca-1327-462a-b455-a04464084b31', 433) {} failed - stopping task
This is with vdsm built from commit 25a2d8572ad32352227c98a86631300fbd6523c1
- DHC

On Wed, Jan 23, 2013 at 10:44 AM, Dead Horse <deadhorseconsulting@gmail.com> wrote:
VDSM was built from: commit 166138e37e75767b32227746bb671b1dab9cdd5e
Attached is the full vdsm log
I should also note that, from the engine's perspective, the master storage domain shows as locked and the others as unknown.

On Wed, Jan 23, 2013 at 2:49 AM, Dan Kenigsberg <danken@redhat.com> wrote:
On Tue, Jan 22, 2013 at 04:02:24PM -0600, Dead Horse wrote:
Any ideas on this one? (from VDSM log):

Thread-25::DEBUG::2013-01-22 15:35:29,065::BindingXMLRPC::914::vds::(wrapper) client [3.57.111.30]::call getCapabilities with () {}
Thread-25::ERROR::2013-01-22 15:35:29,113::netinfo::159::root::(speed) cannot read ib0 speed
Traceback (most recent call last):
  File "/usr/lib64/python2.6/site-packages/vdsm/netinfo.py", line 155, in speed
    s = int(file('/sys/class/net/%s/speed' % dev).read())
IOError: [Errno 22] Invalid argument
Causes VDSM to fail to attach storage
I doubt that this is the cause of the failure, as vdsm has always reported "0" for ib devices, and still does.
Does a former version work with your Engine? Could you share more of your vdsm.log? I suppose the culprit lies in one of the storage-related commands, not in statistics retrieval.
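To illustrate why the logged traceback need not be fatal, here is a minimal sketch of a speed read that falls back to 0 when sysfs refuses the read. It is not the actual vdsm netinfo code: the function name read_nic_speed and the sysfs_root parameter are hypothetical, and it is written for Python 3 rather than the Python 2.6 shown in the traceback.

```python
import logging
import os


def read_nic_speed(dev, sysfs_root="/sys/class/net"):
    """Return the speed reported for `dev` in sysfs, or 0 if it cannot be read.

    Hypothetical helper for illustration only; `sysfs_root` is a parameter
    here so that the function can be exercised against a temporary directory.
    """
    path = os.path.join(sysfs_root, dev, "speed")
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        # OSError covers a failed read such as "[Errno 22] Invalid argument"
        # or a missing file; ValueError covers non-numeric content.
        # Log the problem and report 0 instead of propagating the error.
        logging.debug("cannot read %s speed", dev, exc_info=True)
        return 0
```

With a fallback of this kind the error is logged but the caller still gets a value, so a failed speed read on ib0 would not by itself abort the call that requested it.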
Engine side sees:

ERROR [org.ovirt.engine.core.bll.storage.NFSStorageHelper] (QuartzScheduler_Worker-96) [553ef26e] The connection with details 192.168.0.1:/ovirt/ds failed because of error code 100 and error message is: general exception
2013-01-22 15:35:30,160 INFO [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand] (QuartzScheduler_Worker-96) [1ab78378] Running command: SetNonOperationalVdsCommand internal: true. Entities affected : ID: 8970b3fe-1faf-11e2-bc1f-00151712f280 Type: VDS
2013-01-22 15:35:30,200 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (QuartzScheduler_Worker-96) [1ab78378] START, SetVdsStatusVDSCommand(HostName = kezan, HostId = 8970b3fe-1faf-11e2-bc1f-00151712f280, status=NonOperational, nonOperationalReason=STORAGE_DOMAIN_UNREACHABLE), log id: 4af5c4cd
2013-01-22 15:35:30,211 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (QuartzScheduler_Worker-96) [1ab78378] FINISH, SetVdsStatusVDSCommand, log id: 4af5c4cd
2013-01-22 15:35:30,242 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (QuartzScheduler_Worker-96) [1ab78378] Try to add duplicate audit log values with the same name. Type: VDS_SET_NONOPERATIONAL_DOMAIN. Value: storagepoolname
Engine = latest master
VDSM = latest master
Since "latest master" is an unstable reference by definition, I'm sure that History would thank you if you post the exact version (git hash?) of the code.
node = el6