Thanks Gal, I expect the problem is fixed until something eats
all the space in /dev/shm again.
But the usage of /dev/shm is logged in the output, so next time we would
be able to detect the problem instantly.
From my point of view, it would be good to know why /dev/shm was full,
to prevent this situation in the future.
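For the record, a minimal sketch of such a usage check (the function name `shm_usage` is my own illustration, not what the jenkins jobs actually run; `os.statvfs` works on both the Python 2.7 on the slaves and Python 3):

```python
import os

def shm_usage(path='/dev/shm'):
    """Return (used_bytes, total_bytes, percent_used) for the tmpfs at path."""
    st = os.statvfs(path)
    total = st.f_blocks * st.f_frsize   # total size of the mount
    free = st.f_bavail * st.f_frsize    # space available to unprivileged users
    used = total - free
    percent = 100.0 * used / total if total else 0.0
    return used, total, percent

if __name__ == '__main__':
    used, total, percent = shm_usage()
    print('/dev/shm: %d of %d bytes used (%.1f%%)' % (used, total, percent))
```

Logging this once at job start would have pointed straight at the stale environment filling the tmpfs.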
On Mon, 19 Mar 2018 18:44:54 +0200 Gal Ben Haim <gbenhaim(a)redhat.com> wrote:
I see that this failure happens a lot on ovirt-srv19.phx.ovirt.org
<http://jenkins.ovirt.org/computer/ovirt-srv19.phx.ovirt.org>, and in
different projects that use ansible.

Not sure it's related, but I've found (and removed) a stale lago
environment in "/dev/shm" that was created by
ovirt-system-tests_he-basic-iscsi-suite-master
<http://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tes...>.
The stale environment caused the suite to not run in "/dev/shm".

The maximum number of semaphore arrays on both ovirt-srv19.phx.ovirt.org
<http://jenkins.ovirt.org/computer/ovirt-srv19.phx.ovirt.org> and
ovirt-srv23.phx.ovirt.org
<http://jenkins.ovirt.org/computer/ovirt-srv19.phx.ovirt.org> (which
runs the ansible suite successfully) is 128.
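One note on the semaphore angle: `ipcs` reports System V semaphore limits, but Python's multiprocessing uses POSIX semaphores, which glibc backs with small `sem.*` files under /dev/shm. A full tmpfs therefore makes semaphore creation fail with ENOSPC, matching the traceback further down the thread. A minimal sketch (the helper `allocate_locks` is illustrative only, not code from the suite):

```python
import multiprocessing

def allocate_locks(n):
    """Allocate n multiprocessing locks.  On Linux each one is a POSIX
    semaphore created via sem_open(), i.e. a small sem.* file in /dev/shm;
    with that tmpfs full, creating one raises
    OSError: [Errno 28] No space left on device."""
    return [multiprocessing.Lock() for _ in range(n)]

if __name__ == '__main__':
    locks = allocate_locks(8)
    print('allocated %d semaphore-backed locks' % len(locks))
```

So the stale lago environment filling /dev/shm is sufficient to explain the failure, independent of the System V limits.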
On Mon, Mar 19, 2018 at 3:37 PM, Yedidyah Bar David <didi(a)redhat.com> wrote:
> Failed also here:
>
> http://jenkins.ovirt.org/job/ovirt-system-tests_master_check-patch-el7-x86_64/4540/
>
> The patch triggering this affects many suites, and the job failed
> during ansible-suite-master.
>
> On Mon, Mar 19, 2018 at 3:10 PM, Eyal Edri <eedri(a)redhat.com> wrote:
>
>> Gal and Daniel are looking into it; strange that it's not affecting all
>> suites.
>>
>> On Mon, Mar 19, 2018 at 2:11 PM, Dominik Holler
>> <dholler(a)redhat.com> wrote:
>>
>>> Looks like /dev/shm has run out of space.
>>>
>>> On Mon, 19 Mar 2018 13:33:28 +0200
>>> Leon Goldberg <lgoldber(a)redhat.com> wrote:
>>>
>>> > Hey, any updates?
>>> >
>>> > On Sun, Mar 18, 2018 at 10:44 AM, Edward Haas <ehaas(a)redhat.com>
>>> > wrote:
>>> >
>>> > > We are doing nothing special there, just executing ansible
>>> > > through their API.
>>> > >
>>> > > On Sun, Mar 18, 2018 at 10:42 AM, Daniel Belenky
>>> > > <dbelenky(a)redhat.com> wrote:
>>> > >
>>> > >> It's not a space issue. Other suites ran successfully on that
>>> > >> slave after your suite.
>>> > >> I think that the problem is the setting for max semaphores,
>>> > >> though I don't know what you're doing to reach that limit.
>>> > >>
>>> > >> [dbelenky@ovirt-srv18 ~]$ ipcs -ls
>>> > >>
>>> > >> ------ Semaphore Limits --------
>>> > >> max number of arrays = 128
>>> > >> max semaphores per array = 250
>>> > >> max semaphores system wide = 32000
>>> > >> max ops per semop call = 32
>>> > >> semaphore max value = 32767
>>> > >>
>>> > >>
>>> > >> On Sun, Mar 18, 2018 at 10:31 AM, Edward Haas
>>> > >> <ehaas(a)redhat.com> wrote:
>>> > >>>
>>> > >>> http://jenkins.ovirt.org/job/ovirt-system-tests_network-suite-master/
>>> > >>>
>>> > >>> On Sun, Mar 18, 2018 at 10:24 AM, Daniel Belenky
>>> > >>> <dbelenky(a)redhat.com> wrote:
>>> > >>>
>>> > >>>> Hi Edi,
>>> > >>>>
>>> > >>>> Are there any logs? Where are you running the suite? May I
>>> > >>>> have a link?
>>> > >>>>
>>> > >>>> On Sun, Mar 18, 2018 at 8:20 AM, Edward Haas
>>> > >>>> <ehaas(a)redhat.com> wrote:
>>> > >>>>> Good morning,
>>> > >>>>>
>>> > >>>>> We are running a test module with Ansible in the OST network
>>> > >>>>> suite, and it started failing during the weekend on
>>> > >>>>> "OSError: [Errno 28] No space left on device" when
>>> > >>>>> attempting to take a lock in the multiprocessing python
>>> > >>>>> module.
>>> > >>>>>
>>> > >>>>> It smells like a slave resource problem; could someone
>>> > >>>>> help investigate this?
>>> > >>>>>
>>> > >>>>> Thanks,
>>> > >>>>> Edy.
>>> > >>>>>
>>> > >>>>> =================================== FAILURES ===================================
>>> > >>>>> ______________________ test_ovn_provider_create_scenario _______________________
>>> > >>>>>
>>> > >>>>> os_client_config = None
>>> > >>>>>
>>> > >>>>>     def test_ovn_provider_create_scenario(os_client_config):
>>> > >>>>> >       _test_ovn_provider('create_scenario.yml')
>>> > >>>>>
>>> > >>>>> network-suite-master/tests/test_ovn_provider.py:68:
>>> > >>>>> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>>> > >>>>> network-suite-master/tests/test_ovn_provider.py:78: in _test_ovn_provider
>>> > >>>>>     playbook.run()
>>> > >>>>> network-suite-master/lib/ansiblelib.py:127: in run
>>> > >>>>>     self._run_playbook_executor()
>>> > >>>>> network-suite-master/lib/ansiblelib.py:138: in _run_playbook_executor
>>> > >>>>>     pbex = PlaybookExecutor(**self._pbex_args)
>>> > >>>>> /usr/lib/python2.7/site-packages/ansible/executor/playbook_executor.py:60:
>>> > >>>>> in __init__
>>> > >>>>>     self._tqm = TaskQueueManager(inventory=inventory,
>>> > >>>>>         variable_manager=variable_manager, loader=loader,
>>> > >>>>>         options=options, passwords=self.passwords)
>>> > >>>>> /usr/lib/python2.7/site-packages/ansible/executor/task_queue_manager.py:104:
>>> > >>>>> in __init__
>>> > >>>>>     self._final_q = multiprocessing.Queue()
>>> > >>>>> /usr/lib64/python2.7/multiprocessing/__init__.py:218: in Queue
>>> > >>>>>     return Queue(maxsize)
>>> > >>>>> /usr/lib64/python2.7/multiprocessing/queues.py:63: in __init__
>>> > >>>>>     self._rlock = Lock()
>>> > >>>>> /usr/lib64/python2.7/multiprocessing/synchronize.py:147: in __init__
>>> > >>>>>     SemLock.__init__(self, SEMAPHORE, 1, 1)
>>> > >>>>> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>>> > >>>>>
>>> > >>>>> self = <Lock(owner=unknown)>, kind = 1, value = 1, maxvalue = 1
>>> > >>>>>
>>> > >>>>>     def __init__(self, kind, value, maxvalue):
>>> > >>>>> >       sl = self._semlock = _multiprocessing.SemLock(kind, value, maxvalue)
>>> > >>>>> E       OSError: [Errno 28] No space left on device
>>> > >>>>>
>>> > >>>>> /usr/lib64/python2.7/multiprocessing/synchronize.py:75: OSError
>>> > >>>>>
>>> > >>>>>
>>> > >>>>>
>>> > >>>>
>>> > >>>>
>>> > >>>> --
>>> > >>>>
>>> > >>>> DANIEL BELENKY
>>> > >>>>
>>> > >>>> RHV DEVOPS
>>> > >>>>
>>> > >>>
>>> > >>>
>>> > >>
>>> > >>
>>> > >
>>> > >
>>>
>>> _______________________________________________
>>> Infra mailing list
>>> Infra(a)ovirt.org
>>> http://lists.ovirt.org/mailman/listinfo/infra
>>>
>>
>>
>>
>> --
>>
>> Eyal edri
>>
>> MANAGER
>>
>> RHV DevOps
>>
>> EMEA VIRTUALIZATION R&D
>>
>> Red Hat EMEA <https://www.redhat.com/>
>> <https://red.ht/sig> TRIED. TESTED. TRUSTED. <https://redhat.com/trusted>
>> phone: +972-9-7692018 irc: eedri (on #tlv #rhev-dev #rhev-integ)
>>
>>
>
>
> --
> Didi
>