>Thread-18::DEBUG::2013-01-22 10:41:03,570::misc::85::Storage.Misc.excCmd::(<lambda>) '/usr/bin/sudo -n /bin/mount -t nfs -o soft,nosharecache,timeo=600,retrans=6,nfsvers=3 192.168.0.1:/ovirt/silvermoon /rhev/data-center/mnt/192.168.0.1:_ovirt_silvermoon' (cwd None)
>Thread-18::DEBUG::2013-01-22 10:41:03,607::misc::85::Storage.Misc.excCmd::(<lambda>) '/usr/bin/sudo -n /bin/mount -t nfs -o soft,nosharecache,timeo=600,retrans=6,nfsvers=3 192.168.0.1:/ovirt/undercity /rhev/data-center/mnt/192.168.0.1:_ovirt_undercity' (cwd None)
>Thread-18::ERROR::2013-01-22 10:41:03,627::hsm::2215::Storage.HSM::(connectStorageServer) Could not connect to storageServer
>Traceback (most recent call last):
>  File "/usr/share/vdsm/storage/hsm.py", line 2211, in connectStorageServer
>    conObj.connect()
>  File "/usr/share/vdsm/storage/storageServer.py", line 303, in connect
>    return self._mountCon.connect()
>  File "/usr/share/vdsm/storage/storageServer.py", line 209, in connect
>    fileSD.validateDirAccess(self.getMountObj().getRecord().fs_file)
>  File "/usr/share/vdsm/storage/fileSD.py", line 55, in validateDirAccess
>    (os.R_OK | os.X_OK))
>  File "/usr/share/vdsm/supervdsm.py", line 81, in __call__
>    return callMethod()
>  File "/usr/share/vdsm/supervdsm.py", line 72, in <lambda>
>    **kwargs)
>  File "<string>", line 2, in validateAccess
>  File "/usr/lib64/python2.6/multiprocessing/managers.py", line 740, in _callmethod
>    raise convert_to_error(kind, result)

>---------------------------------------------------------------------------
>Traceback (most recent call last):
>  File "/usr/lib64/python2.6/multiprocessing/managers.py", line 214, in serve_client
>    request = recv()
>IOError: [Errno 4] Interrupted system call
>---------------------------------------------------------------------------

                    msg = ('#RETURN', res)

        except AttributeError:
            if methodname is None:
                msg = ('#TRACEBACK', format_exc())
            else:
                try:
                    fallback_func = self.fallback_mapping[methodname]
                    result = fallback_func(
                        self, conn, ident, obj, *args, **kwds
                        )
                    msg = ('#RETURN', result)
                except Exception:
                    msg = ('#TRACEBACK', format_exc())

        except EOFError:
            util.debug('got EOF -- exiting thread serving %r',
                       threading.current_thread().name)
            sys.exit(0)

        except Exception:  # <------ an IOError with EINTR from recv() ends up here and is sent back as a traceback; it should retry recv() instead
            msg = ('#TRACEBACK', format_exc())

-------- Original Message --------

Subject:	Re: [Users] latest vdsm cannot read ib device speeds causing storage attach fail
Resent-Date:	Thu, 24 Jan 2013 12:24:10 +0200
Resent-From:	Dan Kenigsberg <danken@redhat.com>
Resent-To:	Royce Lv <lvroyce@linux.vnet.ibm.com>
Date:	Wed, 23 Jan 2013 10:44:57 -0600
From:	Dead Horse <deadhorseconsulting@gmail.com>
To:	Dan Kenigsberg <danken@redhat.com>
CC:	<users@ovirt.org> <users@ovirt.org>

VDSM was built from:
commit 166138e37e75767b32227746bb671b1dab9cdd5e

Attached is the full VDSM log.

I should also note that, from the engine's perspective, the master storage
domain is locked and the others are unknown.


On Wed, Jan 23, 2013 at 2:49 AM, Dan Kenigsberg <danken@redhat.com> wrote:

> On Tue, Jan 22, 2013 at 04:02:24PM -0600, Dead Horse wrote:
> > Any ideas on this one? (from VDSM log):
> > Thread-25::DEBUG::2013-01-22 15:35:29,065::BindingXMLRPC::914::vds::(wrapper) client [3.57.111.30]::call getCapabilities with () {}
> > Thread-25::ERROR::2013-01-22 15:35:29,113::netinfo::159::root::(speed) cannot read ib0 speed
> > Traceback (most recent call last):
> >   File "/usr/lib64/python2.6/site-packages/vdsm/netinfo.py", line 155, in speed
> >     s = int(file('/sys/class/net/%s/speed' % dev).read())
> > IOError: [Errno 22] Invalid argument
> >
> > This causes VDSM to fail to attach storage.
>
> I doubt that this is the cause of the failure, as vdsm has always
> reported "0" for ib devices, and still does.
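The point here is that a failed speed read is reported as 0 rather than breaking anything. A minimal sketch of that fallback, assuming a hypothetical helper `read_link_speed` with a `sysfs_root` parameter added only to make it testable; it is not VDSM's actual `netinfo.speed`, which the ib0 traceback shows calling `int(file(...).read())` directly.

```python
import logging


def read_link_speed(dev, sysfs_root='/sys/class/net'):
    """Return the link speed of network device `dev` in Mb/s, or 0 if unreadable.

    Reading <sysfs_root>/<dev>/speed can fail; in this thread ib0 rejects
    the read with EINVAL. Such failures are logged and reported as speed 0
    instead of being propagated to the caller.
    """
    path = '%s/%s/speed' % (sysfs_root, dev)
    try:
        with open(path) as f:
            return int(f.read())  # int() tolerates the trailing newline
    except (IOError, OSError, ValueError):
        logging.getLogger('netinfo').error('cannot read %s speed', dev,
                                           exc_info=True)
        return 0
```

Logging the traceback while returning 0 matches what the log shows: an ERROR line for ib0 during `getCapabilities`, but no exception escaping to the storage code.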
>
> Does an earlier version work with your Engine?
> Could you share more of your vdsm.log? I suspect the culprit lies in
> one of the storage-related commands, not in statistics retrieval.
>
> >
> > Engine side sees:
> > ERROR [org.ovirt.engine.core.bll.storage.NFSStorageHelper] (QuartzScheduler_Worker-96) [553ef26e] The connection with details 192.168.0.1:/ovirt/ds failed because of error code 100 and error message is: general exception
> > 2013-01-22 15:35:30,160 INFO [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand] (QuartzScheduler_Worker-96) [1ab78378] Running command: SetNonOperationalVdsCommand internal: true. Entities affected :  ID: 8970b3fe-1faf-11e2-bc1f-00151712f280 Type: VDS
> > 2013-01-22 15:35:30,200 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (QuartzScheduler_Worker-96) [1ab78378] START, SetVdsStatusVDSCommand(HostName = kezan, HostId = 8970b3fe-1faf-11e2-bc1f-00151712f280, status=NonOperational, nonOperationalReason=STORAGE_DOMAIN_UNREACHABLE), log id: 4af5c4cd
> > 2013-01-22 15:35:30,211 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (QuartzScheduler_Worker-96) [1ab78378] FINISH, SetVdsStatusVDSCommand, log id: 4af5c4cd
> > 2013-01-22 15:35:30,242 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (QuartzScheduler_Worker-96) [1ab78378] Try to add duplicate audit log values with the same name. Type: VDS_SET_NONOPERATIONAL_DOMAIN. Value: storagepoolname
> >
> > Engine = latest master
> > VDSM = latest master
>
> Since "latest master" is an unstable reference by definition, I'm sure
> that History would thank you if you post the exact version (git hash?)
> of the code.
>
> > node = el6
>
>