OST Network suite is failing on "OSError: [Errno 28] No space left on device"

Dominik Holler dholler at redhat.com
Mon Mar 19 16:56:45 UTC 2018


Thanks Gal, I expect the problem is fixed until something eats
all the space in /dev/shm again.
But since the usage of /dev/shm is logged in the output, we will be able
to detect the problem instantly next time.

From my point of view it would be good to know why /dev/shm was full,
to prevent this situation in the future.
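
For anyone adding a similar check to another job, logging the usage can be
as small as this (a minimal sketch using os.statvfs; the 90% warning
threshold is an arbitrary choice of mine):

  import os

  st = os.statvfs('/dev/shm')
  total = st.f_blocks * st.f_frsize
  free = st.f_bavail * st.f_frsize
  used_pct = 100.0 * (total - free) / total
  print('/dev/shm: %.1f%% used, %d bytes free' % (used_pct, free))
  if used_pct > 90.0:
      print('WARNING: /dev/shm is nearly full')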


On Mon, 19 Mar 2018 18:44:54 +0200
Gal Ben Haim <gbenhaim at redhat.com> wrote:

> I see that this failure happens a lot on "ovirt-srv19.phx.ovirt.org
> <http://jenkins.ovirt.org/computer/ovirt-srv19.phx.ovirt.org>", and in
> different projects that use ansible.
> Not sure whether it is related, but I found (and removed) a stale lago
> environment in "/dev/shm" that was created by
> ovirt-system-tests_he-basic-iscsi-suite-master
> <http://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_he-basic-iscsi-suite-master/>.
> The stale environment prevented the suite from running in "/dev/shm".
> The maximum number of semaphore arrays on both ovirt-srv19.phx.ovirt.org
> <http://jenkins.ovirt.org/computer/ovirt-srv19.phx.ovirt.org> and
> ovirt-srv23.phx.ovirt.org
> <http://jenkins.ovirt.org/computer/ovirt-srv23.phx.ovirt.org> (which
> runs the ansible suite successfully) is 128.
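> 
> In case it recurs, stale entries could be flagged with something like
> this (a sketch; the 24-hour threshold, and the assumption that anything
> that old in /dev/shm is leftover, are mine):
> 
>   import os
>   import time
> 
>   now = time.time()
>   for name in os.listdir('/dev/shm'):
>       path = os.path.join('/dev/shm', name)
>       age_h = (now - os.path.getmtime(path)) / 3600.0
>       if age_h > 24:
>           print('possibly stale: %s (untouched for %.0f hours)'
>                 % (path, age_h))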
> 
> On Mon, Mar 19, 2018 at 3:37 PM, Yedidyah Bar David <didi at redhat.com>
> wrote:
> 
> > Failed also here:
> >
> > http://jenkins.ovirt.org/job/ovirt-system-tests_master_
> > check-patch-el7-x86_64/4540/
> >
> > The patch triggering this affects many suites, and the job failed
> > during ansible-suite-master.
> >
> > On Mon, Mar 19, 2018 at 3:10 PM, Eyal Edri <eedri at redhat.com> wrote:
> >  
> >> Gal and Daniel are looking into it; strange that it's not affecting
> >> all suites.
> >>
> >> On Mon, Mar 19, 2018 at 2:11 PM, Dominik Holler
> >> <dholler at redhat.com> wrote:
> >>  
> >>> Looks like /dev/shm has run out of space.
> >>>
> >>> On Mon, 19 Mar 2018 13:33:28 +0200
> >>> Leon Goldberg <lgoldber at redhat.com> wrote:
> >>>  
> >>> > Hey, any updates?
> >>> >
> >>> > On Sun, Mar 18, 2018 at 10:44 AM, Edward Haas <ehaas at redhat.com>
> >>> > wrote:
> >>> >  
> >>> > > We are doing nothing special there, just executing ansible
> >>> > > through its API.
> >>> > >
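> >>> > > For reference, "through its API" means roughly this flow (a
> >>> > > sketch of the Ansible-2.4-era Python API, with names taken from
> >>> > > the traceback below; the Options fields differ between releases):
> >>> > >
> >>> > >   from collections import namedtuple
> >>> > >
> >>> > >   from ansible.executor.playbook_executor import PlaybookExecutor
> >>> > >   from ansible.inventory.manager import InventoryManager
> >>> > >   from ansible.parsing.dataloader import DataLoader
> >>> > >   from ansible.vars.manager import VariableManager
> >>> > >
> >>> > >   loader = DataLoader()
> >>> > >   inventory = InventoryManager(loader=loader, sources='localhost,')
> >>> > >   variable_manager = VariableManager(loader=loader,
> >>> > >                                      inventory=inventory)
> >>> > >
> >>> > >   # TaskQueueManager reads many attributes from `options`; a bare
> >>> > >   # namedtuple with the commonly required fields stands in for
> >>> > >   # the CLI option object here.
> >>> > >   Options = namedtuple('Options', [
> >>> > >       'listtags', 'listtasks', 'listhosts', 'syntax', 'connection',
> >>> > >       'module_path', 'forks', 'remote_user', 'private_key_file',
> >>> > >       'ssh_common_args', 'ssh_extra_args', 'sftp_extra_args',
> >>> > >       'scp_extra_args', 'become', 'become_method', 'become_user',
> >>> > >       'verbosity', 'check', 'diff'])
> >>> > >   options = Options(listtags=False, listtasks=False,
> >>> > >                     listhosts=False, syntax=False,
> >>> > >                     connection='local', module_path=None, forks=5,
> >>> > >                     remote_user=None, private_key_file=None,
> >>> > >                     ssh_common_args=None, ssh_extra_args=None,
> >>> > >                     sftp_extra_args=None, scp_extra_args=None,
> >>> > >                     become=None, become_method=None,
> >>> > >                     become_user=None, verbosity=0, check=False,
> >>> > >                     diff=False)
> >>> > >
> >>> > >   # PlaybookExecutor.__init__ builds a TaskQueueManager, and its
> >>> > >   # multiprocessing.Queue() is where the failing SemLock is made.
> >>> > >   pbex = PlaybookExecutor(playbooks=['create_scenario.yml'],
> >>> > >                           inventory=inventory,
> >>> > >                           variable_manager=variable_manager,
> >>> > >                           loader=loader,
> >>> > >                           options=options,
> >>> > >                           passwords={})
> >>> > >   pbex.run()
> >>> > >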
> >>> > > On Sun, Mar 18, 2018 at 10:42 AM, Daniel Belenky
> >>> > > <dbelenky at redhat.com> wrote:
> >>> > >  
> >>> > >> It's not a space issue. Other suites ran successfully on that
> >>> > >> slave after your suite.
> >>> > >> I think that the problem is the limit on the maximum number of
> >>> > >> semaphores, though I don't know what you're doing to reach that
> >>> > >> limit.
> >>> > >>
> >>> > >> [dbelenky at ovirt-srv18 ~]$ ipcs -ls
> >>> > >>
> >>> > >> ------ Semaphore Limits --------
> >>> > >> max number of arrays = 128
> >>> > >> max semaphores per array = 250
> >>> > >> max semaphores system wide = 32000
> >>> > >> max ops per semop call = 32
> >>> > >> semaphore max value = 32767
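> >>> > >>
> >>> > >> To compare current usage against that 128-array limit, something
> >>> > >> like this works (a sketch; it counts the per-array lines of
> >>> > >> `ipcs -s`, each of which starts with the 0x key):
> >>> > >>
> >>> > >>   import subprocess
> >>> > >>
> >>> > >>   out = subprocess.check_output(['ipcs', '-s'])
> >>> > >>   arrays = [l for l in out.splitlines() if l.startswith('0x')]
> >>> > >>   print('%d of 128 semaphore arrays in use' % len(arrays))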
> >>> > >>
> >>> > >>
> >>> > >> On Sun, Mar 18, 2018 at 10:31 AM, Edward Haas
> >>> > >> <ehaas at redhat.com> wrote:  
> >>> > >>> http://jenkins.ovirt.org/job/ovirt-system-tests_network-suite-master/
> >>> > >>>
> >>> > >>> On Sun, Mar 18, 2018 at 10:24 AM, Daniel Belenky
> >>> > >>> <dbelenky at redhat.com> wrote:
> >>> > >>>  
> >>> > >>>> Hi Edi,
> >>> > >>>>
> >>> > >>>> Are there any logs? Where are you running the suite? May I
> >>> > >>>> have a link?
> >>> > >>>>
> >>> > >>>> On Sun, Mar 18, 2018 at 8:20 AM, Edward Haas
> >>> > >>>> <ehaas at redhat.com> wrote:  
> >>> > >>>>> Good morning,
> >>> > >>>>>
> >>> > >>>>> We are running a test module with Ansible in the OST network
> >>> > >>>>> suite, and during the weekend it started failing with
> >>> > >>>>> "OSError: [Errno 28] No space left on device" when
> >>> > >>>>> attempting to take a lock in the Python multiprocessing
> >>> > >>>>> module.
> >>> > >>>>>
> >>> > >>>>> It smells like a slave resource problem, could someone
> >>> > >>>>> help investigate this?
> >>> > >>>>>
> >>> > >>>>> Thanks,
> >>> > >>>>> Edy.
> >>> > >>>>>
> >>> > >>>>> =================================== FAILURES ===================================
> >>> > >>>>> _____________________ test_ovn_provider_create_scenario _______________________
> >>> > >>>>>
> >>> > >>>>> os_client_config = None
> >>> > >>>>>
> >>> > >>>>>     def test_ovn_provider_create_scenario(os_client_config):
> >>> > >>>>> >       _test_ovn_provider('create_scenario.yml')
> >>> > >>>>>
> >>> > >>>>> network-suite-master/tests/test_ovn_provider.py:68:
> >>> > >>>>> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> >>> > >>>>> network-suite-master/tests/test_ovn_provider.py:78: in _test_ovn_provider
> >>> > >>>>>     playbook.run()
> >>> > >>>>> network-suite-master/lib/ansiblelib.py:127: in run
> >>> > >>>>>     self._run_playbook_executor()
> >>> > >>>>> network-suite-master/lib/ansiblelib.py:138: in _run_playbook_executor
> >>> > >>>>>     pbex = PlaybookExecutor(**self._pbex_args)
> >>> > >>>>> /usr/lib/python2.7/site-packages/ansible/executor/playbook_executor.py:60: in __init__
> >>> > >>>>>     self._tqm = TaskQueueManager(inventory=inventory,
> >>> > >>>>>         variable_manager=variable_manager, loader=loader,
> >>> > >>>>>         options=options, passwords=self.passwords)
> >>> > >>>>> /usr/lib/python2.7/site-packages/ansible/executor/task_queue_manager.py:104: in __init__
> >>> > >>>>>     self._final_q = multiprocessing.Queue()
> >>> > >>>>> /usr/lib64/python2.7/multiprocessing/__init__.py:218: in Queue
> >>> > >>>>>     return Queue(maxsize)
> >>> > >>>>> /usr/lib64/python2.7/multiprocessing/queues.py:63: in __init__
> >>> > >>>>>     self._rlock = Lock()
> >>> > >>>>> /usr/lib64/python2.7/multiprocessing/synchronize.py:147: in __init__
> >>> > >>>>>     SemLock.__init__(self, SEMAPHORE, 1, 1)
> >>> > >>>>> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> >>> > >>>>>
> >>> > >>>>> self = <Lock(owner=unknown)>, kind = 1, value = 1, maxvalue = 1
> >>> > >>>>>
> >>> > >>>>>     def __init__(self, kind, value, maxvalue):
> >>> > >>>>> >       sl = self._semlock = _multiprocessing.SemLock(kind, value, maxvalue)
> >>> > >>>>> E       OSError: [Errno 28] No space left on device
> >>> > >>>>>
> >>> > >>>>> /usr/lib64/python2.7/multiprocessing/synchronize.py:75: OSError
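> >>> > >>>>>
> >>> > >>>>> The SemLock that fails here is a POSIX semaphore, which glibc
> >>> > >>>>> backs with a file in /dev/shm, so a full /dev/shm makes lock
> >>> > >>>>> creation fail with ENOSPC even when the root filesystem has
> >>> > >>>>> plenty of space. A minimal sketch that reproduces it (only on
> >>> > >>>>> a throwaway host, since it deliberately fills /dev/shm):
> >>> > >>>>>
> >>> > >>>>>   import errno
> >>> > >>>>>   import multiprocessing
> >>> > >>>>>   import os
> >>> > >>>>>
> >>> > >>>>>   filler = '/dev/shm/filler'
> >>> > >>>>>   try:
> >>> > >>>>>       with open(filler, 'wb') as f:
> >>> > >>>>>           try:
> >>> > >>>>>               while True:           # fill the tmpfs
> >>> > >>>>>                   f.write('\0' * (1 << 20))
> >>> > >>>>>           except IOError as e:
> >>> > >>>>>               if e.errno != errno.ENOSPC:
> >>> > >>>>>                   raise
> >>> > >>>>>       multiprocessing.Lock()        # sem_open() -> ENOSPC
> >>> > >>>>>   except OSError as e:
> >>> > >>>>>       print('reproduced: %s' % e)
> >>> > >>>>>   finally:
> >>> > >>>>>       os.remove(filler)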
> >>> > >>>>>
> >>> > >>>>>  
> >>> > >>>>
> >>> > >>>>
> >>> > >>>
> >>> > >>>  
> >>> > >>
> >>> > >>
> >>> > >> --
> >>> > >>
> >>> > >> DANIEL BELENKY
> >>> > >>
> >>> > >> RHV DEVOPS
> >>> > >>  
> >>> > >
> >>> > >  
> >>>
> >>>  
> >>
> >>
> >>
> >> --
> >>
> >> Eyal edri
> >>
> >>
> >> MANAGER
> >>
> >> RHV DevOps
> >>
> >> EMEA VIRTUALIZATION R&D
> >>
> >>
> >> Red Hat EMEA <https://www.redhat.com/>
> >> phone: +972-9-7692018
> >> irc: eedri (on #tlv #rhev-dev #rhev-integ)
> >>
> >>
> >>  
> >
> >
> > --
> > Didi
> >  
> 
> 
> 


