[ovirt-users] SPM error

Liron Aravot laravot at redhat.com
Wed Apr 16 14:30:11 UTC 2014



----- Original Message -----
> From: "Maurice James" <mjames at media-node.com>
> To: "Liron Aravot" <laravot at redhat.com>
> Cc: users at ovirt.org
> Sent: Wednesday, April 16, 2014 4:49:18 PM
> Subject: Re: [ovirt-users] SPM error
> 
> After a few sleepless nights, I went through my entire system again and found
> an interface on one of my hosts that had already been removed from the UI.
> Even after multiple "service network restart" runs, it still showed up when I
> ran "ip addr". I ended up forcefully removing it with
> rm -rf /etc/sysconfig/network-scripts/ifcfg-<interface>. After that I rebooted
> the node and the SPM came out of contention. I can't make sense of it, but it
> worked.
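Maurice's fix was deleting a leftover ifcfg file for an interface that had already been removed in the UI. As a minimal sketch (assuming Python 3 and the EL-style /etc/sysconfig/network-scripts layout), the following lists every ifcfg-* file on a host and the DEVICE it configures, so leftovers can be compared against the interfaces the engine shows for that host. It only reads files; deciding what to delete stays manual. The function names are hypothetical, not part of oVirt or VDSM.

```python
import glob
import os

# Assumed EL6/EL7-style location of interface config files.
NETWORK_SCRIPTS = "/etc/sysconfig/network-scripts"


def parse_ifcfg(text):
    """Parse simple KEY=value lines from an ifcfg file, stripping quotes."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip("\"'")
    return settings


def ifcfg_devices(directory=NETWORK_SCRIPTS):
    """Map each ifcfg-* file name to the DEVICE it configures.

    Falls back to the suffix of the file name when DEVICE is not set.
    """
    devices = {}
    for path in sorted(glob.glob(os.path.join(directory, "ifcfg-*"))):
        with open(path) as f:
            settings = parse_ifcfg(f.read())
        name = os.path.basename(path)
        devices[name] = settings.get("DEVICE", name[len("ifcfg-"):])
    return devices


if __name__ == "__main__":
    for name, device in ifcfg_devices().items():
        print("%-25s DEVICE=%s" % (name, device))
```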

OK, so we might have a bug here. What OS are you running?

It seems the initial SPM issue is the one from the bug I provided earlier, so
the bug you opened on that issue can probably be closed as a duplicate. Adding
Federico to verify that there's no further sanlock issue there.
> 
> ----- Original Message -----
> From: "Liron Aravot" <laravot at redhat.com>
> To: "Maurice \"Moe\" James" <mjames at media-node.com>
> Cc: users at ovirt.org
> Sent: Wednesday, April 16, 2014 8:49:05 AM
> Subject: Re: [ovirt-users] SPM error
> 
> Hi Maurice,
> Any updates on the above?
> 
> thanks, Liron
> 
> ----- Original Message -----
> > From: "Liron Aravot" <laravot at redhat.com>
> > To: "Maurice \"Moe\" James" <mjames at media-node.com>
> > Cc: users at ovirt.org
> > Sent: Tuesday, April 15, 2014 11:53:40 AM
> > Subject: Re: [ovirt-users] SPM error
> > 
> > 
> > 
> > ----- Original Message -----
> > > From: "Maurice \"Moe\" James" <mjames at media-node.com>
> > > To: "Liron Aravot" <laravot at redhat.com>
> > > Cc: "Itamar Heim" <iheim at redhat.com>, users at ovirt.org
> > > Sent: Tuesday, April 15, 2014 3:14:16 AM
> > > Subject: Re: [ovirt-users] SPM error
> > > 
> > > Sorry, forgot to paste:
> > > https://bugzilla.redhat.com/show_bug.cgi?id=1086951
> > 
> > Hi Maurice,
> > The issue is that the host doesn't have access to all the storage domains,
> > which causes the SPM start process to fail.
> > There's a bug open for that issue:
> > https://bugzilla.redhat.com/show_bug.cgi?id=1072900
> > so it seems we'll be able to close the one you opened as a duplicate, but
> > let's hold off on that until your issue is solved.
> > From looking at the logs, it seems that host has problems accessing two
> > storage domains:
> > 3406665e-4adc-4fd4-aa1e-037547b29adb
> > f3b51811-4a7f-43af-8633-322b3db23c48
> > 
> > Can you verify that the host can access those domains? From the log, it
> > seems the NFS paths for those are:
> > shtistg01.suprtekstic.com:/storage/infrastructure
> > shtistg01.suprtekstic.com:/storage/exports
> > 
> > 
> > Log snippet:
> > 1.
> > Thread-14::DEBUG::2014-04-11 22:54:44,331::mount::226::Storage.Misc.excCmd::(_runcmd) '/usr/bin/sudo -n /bin/mount -t nfs -o soft,nosharecache,timeo=600,retrans=6,nfsvers=3 ashtistg01.suprtekstic.com:/storage/exports /rhev/data-center/mnt/ashtistg01.suprtekstic.com:_storage_exports' (cwd None)
> > Thread-14::ERROR::2014-04-11 22:55:36,659::storageServer::209::StorageServer.MountConnection::(connect) Mount failed: (32, ';mount.nfs: Failed to resolve server ashtistg01.suprtekstic.com: Name or service not known\n')
> > Traceback (most recent call last):
> >   File "/usr/share/vdsm/storage/storageServer.py", line 207, in connect
> >     self._mount.mount(self.options, self._vfsType)
> >   File "/usr/share/vdsm/storage/mount.py", line 222, in mount
> >     return self._runcmd(cmd, timeout)
> >   File "/usr/share/vdsm/storage/mount.py", line 238, in _runcmd
> >     raise MountError(rc, ";".join((out, err)))
> > MountError: (32, ';mount.nfs: Failed to resolve server ashtistg01.suprtekstic.com: Name or service not known\n')
> > Thread-14::ERROR::2014-04-11 22:55:36,705::hsm::2379::Storage.HSM::(connectStorageServer) Could not connect to storageServer
> > Traceback (most recent call last):
> >   File "/usr/share/vdsm/storage/hsm.py", line 2376, in connectStorageServer
> >     conObj.connect()
> >   File "/usr/share/vdsm/storage/storageServer.py", line 320, in connect
> >     return self._mountCon.connect()
> >   File "/usr/share/vdsm/storage/storageServer.py", line 215, in connect
> >     raise e
> > MountError: (32, ';mount.nfs: Failed to resolve server ashtistg01.suprtekstic.com: Name or service not known\n')
> > 
> > 2.
> > Thread-14::ERROR::2014-04-11 22:56:29,307::storageServer::209::StorageServer.MountConnection::(connect) Mount failed: (32, ';mount.nfs: Failed to resolve server ashtistg01.suprtekstic.com: Name or service not known\n')
> > Traceback (most recent call last):
> >   File "/usr/share/vdsm/storage/storageServer.py", line 207, in connect
> >     self._mount.mount(self.options, self._vfsType)
> >   File "/usr/share/vdsm/storage/mount.py", line 222, in mount
> >     return self._runcmd(cmd, timeout)
> >   File "/usr/share/vdsm/storage/mount.py", line 238, in _runcmd
> >     raise MountError(rc, ";".join((out, err)))
> > MountError: (32, ';mount.nfs: Failed to resolve server ashtistg01.suprtekstic.com: Name or service not known\n')
> > Thread-14::ERROR::2014-04-11 22:56:29,309::hsm::2379::Storage.HSM::(connectStorageServer) Could not connect to storageServer
> > Traceback (most recent call last):
> >   File "/usr/share/vdsm/storage/hsm.py", line 2376, in connectStorageServer
> >     conObj.connect()
> >   File "/usr/share/vdsm/storage/storageServer.py", line 320, in connect
> >     return self._mountCon.connect()
> >   File "/usr/share/vdsm/storage/storageServer.py", line 215, in connect
> >     raise e
> > MountError: (32, ';mount.nfs: Failed to resolve server ashtistg01.suprtekstic.com: Name or service not known\n')
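Both failures above are name-resolution errors (mount.nfs: Failed to resolve server ashtistg01.suprtekstic.com), so a first check is whether the host can resolve the NFS server names at all. A minimal Python 3 sketch, assuming the server:/export strings are copied from the vdsm log (the list below holds the one from the DEBUG line above). It only exercises the system resolver (DNS and /etc/hosts), not NFS reachability or export permissions, and the helper names are hypothetical.

```python
import socket

# server:/export strings as they appear in the vdsm mount command above;
# edit this list to match your own storage domains.
NFS_CONNECTIONS = [
    "ashtistg01.suprtekstic.com:/storage/exports",
]


def split_nfs_spec(spec):
    """Split 'server:/export/path' into ('server', '/export/path').

    Does not handle bracketed IPv6 literals such as '[::1]:/path'.
    """
    server, sep, export = spec.partition(":")
    if not sep or not server or not export.startswith("/"):
        raise ValueError("not a server:/path spec: %r" % spec)
    return server, export


def resolve(host):
    """Return the sorted addresses host resolves to, or [] on failure."""
    try:
        infos = socket.getaddrinfo(host, None)
    except OSError:  # socket.gaierror, e.g. "Name or service not known"
        return []
    # info[4] is the sockaddr tuple; its first element is the address string.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    for spec in NFS_CONNECTIONS:
        server, export = split_nfs_spec(spec)
        addrs = resolve(server)
        status = ", ".join(addrs) if addrs else "DOES NOT RESOLVE"
        print("%s (%s): %s" % (server, export, status))
```

Run it on the affected host. A name reported as DOES NOT RESOLVE points at DNS or /etc/hosts on that host rather than at sanlock; `getent hosts <name>` gives the same answer from the shell.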
> > 
> > 
> > Regardless of that, there are sanlock errors throughout the log when trying
> > to acquire the host-id.
> > I'd start by checking the connectivity issues to the storage domains above;
> > later on you can check that sanlock is running and operational and/or
> > attach the sanlock logs.
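For the sanlock side, a minimal sketch that collects the two things asked for here: whether the sanlock daemon answers `sanlock client status`, and the tail of its log. It assumes Python 3, that it runs as root (the sanlock client typically needs privileges to reach the daemon), and that the log lives at /var/log/sanlock.log; adjust the path if your distribution logs elsewhere. The helper names are hypothetical.

```python
import subprocess

# Assumed default sanlock log location.
SANLOCK_LOG = "/var/log/sanlock.log"


def run(cmd, timeout=30):
    """Run cmd and return (returncode, combined stdout/stderr).

    returncode is None when the command cannot be started or times out.
    """
    try:
        proc = subprocess.run(cmd, stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT,
                              universal_newlines=True, timeout=timeout)
    except OSError as exc:  # e.g. sanlock not installed or not executable
        return None, "could not run %s: %s\n" % (cmd[0], exc)
    except subprocess.TimeoutExpired:
        return None, "timed out after %ss: %s\n" % (timeout, " ".join(cmd))
    return proc.returncode, proc.stdout


def tail(path, n=50):
    """Return the last n lines of path, or [] if it cannot be read."""
    try:
        with open(path, errors="replace") as f:
            return f.readlines()[-n:]
    except OSError:
        return []


if __name__ == "__main__":
    rc, out = run(["sanlock", "client", "status"])
    print("sanlock client status -> rc=%s" % rc)
    print(out)
    print("--- last lines of %s ---" % SANLOCK_LOG)
    print("".join(tail(SANLOCK_LOG)), end="")
```

A missing or non-zero return code suggests the daemon isn't reachable at all; otherwise the status output and the log tail are what you'd attach to the thread.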
> > 
> > 
> > > 
> > > On Mon, 2014-04-14 at 17:11 -0400, Liron Aravot wrote:
> > > > 
> > > > ----- Original Message -----
> > > > > From: "Maurice \"Moe\" James" <mjames at media-node.com>
> > > > > To: "Itamar Heim" <iheim at redhat.com>
> > > > > Cc: users at ovirt.org
> > > > > Sent: Sunday, April 13, 2014 2:28:45 AM
> > > > > Subject: Re: [ovirt-users] SPM error
> > > > > 
> > > > > Were you able to find out anything? Is there anything that I can check in
> > > > > the meantime?
> > > > > 
> > > > 
> > > > Hi Maurice,
> > > > Can you please attach the oVirt engine/VDSM logs?
> > > > thanks,
> > > > Liron
> > > > > 
> > > > > On Sat, 2014-04-12 at 19:23 +0300, Itamar Heim wrote:
> > > > > > On 04/12/2014 03:40 PM, Maurice James wrote:
> > > > > > > What did you do to try to fix the sanlock? Anything is better than nothing
> > > > > > > at this point.
> > > > > > >
> > > > > > > ----- Original Message -----
> > > > > > > From: "Ted Miller" <tmiller at hcjb.org>
> > > > > > > To: "Maurice James" <mjames at media-node.com>
> > > > > > > Sent: Friday, April 11, 2014 7:27:24 PM
> > > > > > > Subject: Re: [ovirt-users] SPM error
> > > > > > >
> > > > > > > I did receive some help on one stage of rebuilding my sanlock, but
> > > > > > > there were too many other things wrong to get it started again. The
> > > > > > > only advice I have is: look at your sanlock logs and see if you can
> > > > > > > find anything there that is helpful.
> > > > > > >
> > > > > > > On 4/11/2014 7:23 PM, Maurice James wrote:
> > > > > > >> Nooooooo.
> > > > > > >>
> > > > > > >>
> > > > > > >> Sent from my Galaxy S®III
> > > > > > >>
> > > > > > >> -------- Original message --------
> > > > > > >> From: Ted Miller <tmiller at hcjb.org>
> > > > > > >> Date:04/11/2014  7:08 PM  (GMT-05:00)
> > > > > > >> To: Maurice James <mjames at media-node.com>
> > > > > > >> Subject: Re: [ovirt-users] SPM error
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >> I didn't, really. I did something wrong along the way, and ended up
> > > > > > >> having to rebuild the engine and hosts. (My problems were due to a
> > > > > > >> glusterfs split-brain.)
> > > > > > >> Ted Miller
> > > > > > >>
> > > > > > >> On 4/11/2014 6:03 PM, Maurice James wrote:
> > > > > > >>> How did you fix it?
> > > > > > >>>
> > > > > > >>> -------- Original message --------
> > > > > > >>> From: Ted Miller <tmiller at hcjb.org>
> > > > > > >>> Date:04/11/2014  6:00 PM  (GMT-05:00)
> > > > > > >>> To: users at ovirt.org
> > > > > > >>> Subject: Re: [ovirt-users] SPM error
> > > > > > >>>
> > > > > > >>>
> > > > > > >>>
> > > > > > >>> On 4/11/2014 2:05 PM, Maurice James wrote:
> > > > > > >>>> I have an error trying to bring the master DC back online. After several
> > > > > > >>>> reboots, no luck. I took the other cluster members offline to try to
> > > > > > >>>> troubleshoot. The remaining host is constantly in contention with itself
> > > > > > >>>> for SPM.
> > > > > > >>>>
> > > > > > >>>>
> > > > > > >>>> ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (DefaultQuartzScheduler_Worker-40) [38d400ea] IrsBroker::Failed::GetStoragePoolInfoVDS due to: IrsSpmStartFailedException: IRSGenericException: IRSErrorException: SpmStart failed
> > > > > > >>>>
> > > > > > >>> I'm no expert, but the last time I beat my head on that rock, something
> > > > > > >>> was wrong with my sanlock storage. YMMV
> > > > > > >>> Ted Miller
> > > > > > >>> Elkhart, IN, USA
> > > > > > >>>
> > > > > > >
> > > > > > 
> > > > > > Maurice, which type of storage is this?
> > > > > 
> > > > > 
> > > > > _______________________________________________
> > > > > Users mailing list
> > > > > Users at ovirt.org
> > > > > http://lists.ovirt.org/mailman/listinfo/users
> > > > > 
> > > 
> > > 
> > > 
> > 
> 


