
Thanks Gal. I expect the problem is fixed until something eats all the space in /dev/shm again. But since the usage of /dev/shm is logged in the output, we would be able to detect the problem instantly next time. From my point of view it would be good to know why /dev/shm was full, to prevent this situation in the future.
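For reference, a minimal sketch of the kind of /dev/shm usage check that can be logged with the job output (illustrative only; not necessarily the exact check the CI jobs run):

    import os

    st = os.statvfs('/dev/shm')
    total_mb = st.f_blocks * st.f_frsize / (1024.0 * 1024.0)
    free_mb = st.f_bavail * st.f_frsize / (1024.0 * 1024.0)
    used_pct = 100.0 * (1.0 - float(st.f_bavail) / st.f_blocks)
    # One line like this in the console log is enough to spot a full tmpfs.
    print('/dev/shm: %.0f MiB total, %.0f MiB free (%.0f%% used)'
          % (total_mb, free_mb, used_pct))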
On Mon, 19 Mar 2018 18:44:54 +0200 Gal Ben Haim <gbenhaim@redhat.com> wrote:
I see that this failure happens a lot on "ovirt-srv19.phx.ovirt.org" <http://jenkins.ovirt.org/computer/ovirt-srv19.phx.ovirt.org>, and in different projects that use ansible. Not sure it is related, but I found (and removed) a stale lago environment in "/dev/shm" that was created by ovirt-system-tests_he-basic-iscsi-suite-master <http://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_he-basic-iscsi-suite-master/>. The stale environment prevented the suite from running in "/dev/shm". The maximum number of semaphore arrays on both ovirt-srv19.phx.ovirt.org <http://jenkins.ovirt.org/computer/ovirt-srv19.phx.ovirt.org> and ovirt-srv23.phx.ovirt.org <http://jenkins.ovirt.org/computer/ovirt-srv19.phx.ovirt.org> (which runs the ansible suite successfully) is 128.
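A rough sketch of how such stale leftovers could be spotted on a slave; the path and the "print everything" approach are just illustrative:

    import os
    import time

    now = time.time()
    for name in sorted(os.listdir('/dev/shm')):
        # Anything old in /dev/shm (e.g. a leftover lago environment or stray
        # sem.* files) is a candidate for cleanup.
        age_hours = (now - os.path.getmtime(os.path.join('/dev/shm', name))) / 3600.0
        print('%-50s last modified %5.1f hours ago' % (name, age_hours))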
On Mon, Mar 19, 2018 at 3:37 PM, Yedidyah Bar David <didi@redhat.com> wrote:
Failed also here:
http://jenkins.ovirt.org/job/ovirt-system-tests_master_check-patch-el7-x86_64/4540/
The patch triggering this affects many suites, and the job failed during ansible-suite-master.
On Mon, Mar 19, 2018 at 3:10 PM, Eyal Edri <eedri@redhat.com> wrote:
Gal and Daniel are looking into it; strange that it's not affecting all suites.
On Mon, Mar 19, 2018 at 2:11 PM, Dominik Holler <dholler@redhat.com> wrote:
Looks like /dev/shm has run out of space.
On Mon, 19 Mar 2018 13:33:28 +0200 Leon Goldberg <lgoldber@redhat.com> wrote:
Hey, any updates?
On Sun, Mar 18, 2018 at 10:44 AM, Edward Haas <ehaas@redhat.com> wrote:
We are doing nothing special there, just executing ansible through its API.
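For context, a minimal sketch of the kind of call the suite makes through the ansible 2.x Python API. The inventory, the Options fields, and the playbook wiring here follow the common 2.4-era recipe and are assumptions, not the suite's actual ansiblelib.py code:

    from collections import namedtuple

    from ansible.executor.playbook_executor import PlaybookExecutor
    from ansible.inventory.manager import InventoryManager
    from ansible.parsing.dataloader import DataLoader
    from ansible.vars.manager import VariableManager

    # The exact set of Options fields PlaybookExecutor expects varies between
    # ansible versions; this set matches the commonly used 2.4-era recipe.
    Options = namedtuple('Options', [
        'listtags', 'listtasks', 'listhosts', 'syntax', 'connection',
        'module_path', 'forks', 'remote_user', 'private_key_file',
        'ssh_common_args', 'ssh_extra_args', 'sftp_extra_args',
        'scp_extra_args', 'become', 'become_method', 'become_user',
        'verbosity', 'check', 'diff'])

    loader = DataLoader()
    inventory = InventoryManager(loader=loader, sources='localhost,')
    variable_manager = VariableManager(loader=loader, inventory=inventory)
    options = Options(listtags=False, listtasks=False, listhosts=False,
                      syntax=False, connection='local', module_path=None,
                      forks=1, remote_user=None, private_key_file=None,
                      ssh_common_args=None, ssh_extra_args=None,
                      sftp_extra_args=None, scp_extra_args=None, become=None,
                      become_method=None, become_user=None, verbosity=0,
                      check=False, diff=False)

    # PlaybookExecutor builds a TaskQueueManager, which allocates a
    # multiprocessing.Queue(); that is where the SemLock in the traceback
    # further down comes from, and where a full /dev/shm turns into Errno 28.
    pbex = PlaybookExecutor(playbooks=['create_scenario.yml'],
                            inventory=inventory,
                            variable_manager=variable_manager,
                            loader=loader,
                            options=options,
                            passwords={})
    rc = pbex.run()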
On Sun, Mar 18, 2018 at 10:42 AM, Daniel Belenky <dbelenky@redhat.com> wrote:
It's not a space issue. Other suites ran successfully on that slave after your suite. I think that the problem is the setting for max semaphores, though I don't know what you're doing to reach that limit.

[dbelenky@ovirt-srv18 ~]$ ipcs -ls

------ Semaphore Limits --------
max number of arrays = 128
max semaphores per array = 250
max semaphores system wide = 32000
max ops per semop call = 32
semaphore max value = 32767

On Sun, Mar 18, 2018 at 10:31 AM, Edward Haas <ehaas@redhat.com> wrote:
http://jenkins.ovirt.org/job/ovirt-system-tests_network-suite-master/

On Sun, Mar 18, 2018 at 10:24 AM, Daniel Belenky <dbelenky@redhat.com> wrote:
Hi Edi,
Are there any logs? Where are you running the suite? May I have a link?

On Sun, Mar 18, 2018 at 8:20 AM, Edward Haas <ehaas@redhat.com> wrote:
Good morning,

We are running a test module with Ansible in the OST network suite, and it started failing during the weekend with "OSError: [Errno 28] No space left on device" when attempting to take a lock in the multiprocessing python module.

It smells like a slave resource problem, could someone help investigate this?

Thanks,
Edy.

=================================== FAILURES ===================================
______________________ test_ovn_provider_create_scenario _______________________

os_client_config = None

    def test_ovn_provider_create_scenario(os_client_config):
>       _test_ovn_provider('create_scenario.yml')

network-suite-master/tests/test_ovn_provider.py:68:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
network-suite-master/tests/test_ovn_provider.py:78: in _test_ovn_provider
    playbook.run()
network-suite-master/lib/ansiblelib.py:127: in run
    self._run_playbook_executor()
network-suite-master/lib/ansiblelib.py:138: in _run_playbook_executor
    pbex = PlaybookExecutor(**self._pbex_args)
/usr/lib/python2.7/site-packages/ansible/executor/playbook_executor.py:60: in __init__
    self._tqm = TaskQueueManager(inventory=inventory, variable_manager=variable_manager, loader=loader, options=options, passwords=self.passwords)
/usr/lib/python2.7/site-packages/ansible/executor/task_queue_manager.py:104: in __init__
    self._final_q = multiprocessing.Queue()
/usr/lib64/python2.7/multiprocessing/__init__.py:218: in Queue
    return Queue(maxsize)
/usr/lib64/python2.7/multiprocessing/queues.py:63: in __init__
    self._rlock = Lock()
/usr/lib64/python2.7/multiprocessing/synchronize.py:147: in __init__
    SemLock.__init__(self, SEMAPHORE, 1, 1)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <Lock(owner=unknown)>, kind = 1, value = 1, maxvalue = 1

    def __init__(self, kind, value, maxvalue):
>       sl = self._semlock = _multiprocessing.SemLock(kind, value, maxvalue)
E       OSError: [Errno 28] No space left on device

/usr/lib64/python2.7/multiprocessing/synchronize.py:75: OSError

--
DANIEL BELENKY
RHV DEVOPS
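The traceback bottoms out in _multiprocessing.SemLock: on a Linux slave with glibc, multiprocessing locks and queues are backed by POSIX semaphores, which sem_open() creates as small files under /dev/shm, so a full /dev/shm fails in exactly this way. A minimal sketch that hits the same failure point when /dev/shm is already full:

    import errno
    import multiprocessing

    try:
        # Same call chain as TaskQueueManager's multiprocessing.Queue():
        # Lock() -> SemLock() -> sem_open(), which needs space under /dev/shm.
        lock = multiprocessing.Lock()
    except OSError as e:
        if e.errno == errno.ENOSPC:
            print('Errno 28: no room in /dev/shm for a new POSIX semaphore')
        else:
            raise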
--
Eyal edri
MANAGER
RHV DevOps
EMEA VIRTUALIZATION R&D
Red Hat EMEA <https://www.redhat.com/>
phone: +972-9-7692018
irc: eedri (on #tlv #rhev-dev #rhev-integ)
-- Didi