
Thanks Gal. I expect the problem is fixed until something eats all the space in /dev/shm again. But since the usage of /dev/shm is logged in the output, we would be able to detect the problem instantly next time. From my point of view it would be good to know why /dev/shm was full, to prevent this situation in the future.
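For reference, a minimal sketch of the kind of /dev/shm usage check that can be logged with the job output (illustrative only; not necessarily the exact check the CI jobs run):

    import os

    st = os.statvfs('/dev/shm')
    total_mb = st.f_blocks * st.f_frsize / (1024.0 * 1024.0)
    free_mb = st.f_bavail * st.f_frsize / (1024.0 * 1024.0)
    used_pct = 100.0 * (1.0 - float(st.f_bavail) / st.f_blocks)
    # One line like this in the console log is enough to spot a full tmpfs.
    print('/dev/shm: %.0f MiB total, %.0f MiB free (%.0f%% used)'
          % (total_mb, free_mb, used_pct))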
On Mon, 19 Mar 2018 18:44:54 +0200 Gal Ben Haim <gbenhaim@redhat.com> wrote:
I see that this failure happens a lot on "ovirt-srv19.phx.ovirt.org" <http://jenkins.ovirt.org/computer/ovirt-srv19.phx.ovirt.org>, and in different projects that use ansible. Not sure it is related, but I found (and removed) a stale lago environment in "/dev/shm" that was created by ovirt-system-tests_he-basic-iscsi-suite-master <http://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_he-basic-iscsi-suite-master/>. The stale environment prevented the suite from running in "/dev/shm". The maximum number of semaphore arrays on both ovirt-srv19.phx.ovirt.org <http://jenkins.ovirt.org/computer/ovirt-srv19.phx.ovirt.org> and ovirt-srv23.phx.ovirt.org <http://jenkins.ovirt.org/computer/ovirt-srv19.phx.ovirt.org> (which runs the ansible suite successfully) is 128.
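A rough sketch of how such stale leftovers could be spotted on a slave; the path and the "print everything" approach are just illustrative:

    import os
    import time

    now = time.time()
    for name in sorted(os.listdir('/dev/shm')):
        # Anything old in /dev/shm (e.g. a leftover lago environment or stray
        # sem.* files) is a candidate for cleanup.
        age_hours = (now - os.path.getmtime(os.path.join('/dev/shm', name))) / 3600.0
        print('%-50s last modified %5.1f hours ago' % (name, age_hours))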
On Mon, Mar 19, 2018 at 3:37 PM, Yedidyah Bar David <didi@redhat.com> wrote:
Failed also here:
http://jenkins.ovirt.org/job/ovirt-system-tests_master_check-patch-el7-x86_64/4540/
The patch triggering this affects many suites, and the job failed during ansible-suite-master.
On Mon, Mar 19, 2018 at 3:10 PM, Eyal Edri <eedri@redhat.com> wrote:
Gal and Daniel are looking into it; strange that it's not affecting all suites.
On Mon, Mar 19, 2018 at 2:11 PM, Dominik Holler <dholler@redhat.com> wrote:
Looks like /dev/shm has run out of space.
On Mon, 19 Mar 2018 13:33:28 +0200 Leon Goldberg <lgoldber@redhat.com> wrote:
Hey, any updates?
On Sun, Mar 18, 2018 at 10:44 AM, Edward Haas <ehaas@redhat.com> wrote:
We are doing nothing special there, just executing ansible through its API.
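For context, a minimal sketch of the kind of call the suite makes through the ansible 2.x Python API. The inventory, the Options fields, and the playbook wiring here follow the common 2.4-era recipe and are assumptions, not the suite's actual ansiblelib.py code:

    from collections import namedtuple

    from ansible.executor.playbook_executor import PlaybookExecutor
    from ansible.inventory.manager import InventoryManager
    from ansible.parsing.dataloader import DataLoader
    from ansible.vars.manager import VariableManager

    # The exact set of Options fields PlaybookExecutor expects varies between
    # ansible versions; this set matches the commonly used 2.4-era recipe.
    Options = namedtuple('Options', [
        'listtags', 'listtasks', 'listhosts', 'syntax', 'connection',
        'module_path', 'forks', 'remote_user', 'private_key_file',
        'ssh_common_args', 'ssh_extra_args', 'sftp_extra_args',
        'scp_extra_args', 'become', 'become_method', 'become_user',
        'verbosity', 'check', 'diff'])

    loader = DataLoader()
    inventory = InventoryManager(loader=loader, sources='localhost,')
    variable_manager = VariableManager(loader=loader, inventory=inventory)
    options = Options(listtags=False, listtasks=False, listhosts=False,
                      syntax=False, connection='local', module_path=None,
                      forks=1, remote_user=None, private_key_file=None,
                      ssh_common_args=None, ssh_extra_args=None,
                      sftp_extra_args=None, scp_extra_args=None, become=None,
                      become_method=None, become_user=None, verbosity=0,
                      check=False, diff=False)

    # PlaybookExecutor builds a TaskQueueManager, which allocates a
    # multiprocessing.Queue(); that is where the SemLock in the traceback
    # further down comes from, and where a full /dev/shm turns into Errno 28.
    pbex = PlaybookExecutor(playbooks=['create_scenario.yml'],
                            inventory=inventory,
                            variable_manager=variable_manager,
                            loader=loader,
                            options=options,
                            passwords={})
    rc = pbex.run()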
On Sun, Mar 18, 2018 at 10:42 AM, Daniel Belenky <dbelenky@redhat.com> wrote:
It's not a space issue. Other suites ran successfully on that slave after your suite. I think that the problem is the setting for max semaphores, though I don't know what you're doing to reach that limit.

[dbelenky@ovirt-srv18 ~]$ ipcs -ls

------ Semaphore Limits --------
max number of arrays = 128
max semaphores per array = 250
max semaphores system wide = 32000
max ops per semop call = 32
semaphore max value = 32767

On Sun, Mar 18, 2018 at 10:31 AM, Edward Haas <ehaas@redhat.com> wrote:
http://jenkins.ovirt.org/job/ovirt-system-tests_network-suite-master/

On Sun, Mar 18, 2018 at 10:24 AM, Daniel Belenky <dbelenky@redhat.com> wrote:
Hi Edi,
Are there any logs? Where are you running the suite? May I have a link?

On Sun, Mar 18, 2018 at 8:20 AM, Edward Haas <ehaas@redhat.com> wrote:
Good morning,

We are running a test module with Ansible in the OST network suite, and it started failing during the weekend with "OSError: [Errno 28] No space left on device" when attempting to take a lock in the multiprocessing python module.

It smells like a slave resource problem, could someone help investigate this?

Thanks,
Edy.

=================================== FAILURES ===================================
______________________ test_ovn_provider_create_scenario _______________________

os_client_config = None

    def test_ovn_provider_create_scenario(os_client_config):
>       _test_ovn_provider('create_scenario.yml')

network-suite-master/tests/test_ovn_provider.py:68:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
network-suite-master/tests/test_ovn_provider.py:78: in _test_ovn_provider
    playbook.run()
network-suite-master/lib/ansiblelib.py:127: in run
    self._run_playbook_executor()
network-suite-master/lib/ansiblelib.py:138: in _run_playbook_executor
    pbex = PlaybookExecutor(**self._pbex_args)
/usr/lib/python2.7/site-packages/ansible/executor/playbook_executor.py:60: in __init__
    self._tqm = TaskQueueManager(inventory=inventory, variable_manager=variable_manager, loader=loader, options=options, passwords=self.passwords)
/usr/lib/python2.7/site-packages/ansible/executor/task_queue_manager.py:104: in __init__
    self._final_q = multiprocessing.Queue()
/usr/lib64/python2.7/multiprocessing/__init__.py:218: in Queue
    return Queue(maxsize)
/usr/lib64/python2.7/multiprocessing/queues.py:63: in __init__
    self._rlock = Lock()
/usr/lib64/python2.7/multiprocessing/synchronize.py:147: in __init__
    SemLock.__init__(self, SEMAPHORE, 1, 1)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <Lock(owner=unknown)>, kind = 1, value = 1, maxvalue = 1

    def __init__(self, kind, value, maxvalue):
>       sl = self._semlock = _multiprocessing.SemLock(kind, value, maxvalue)
E       OSError: [Errno 28] No space left on device

/usr/lib64/python2.7/multiprocessing/synchronize.py:75: OSError

--
DANIEL BELENKY
RHV DEVOPS
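The traceback bottoms out in _multiprocessing.SemLock: on a Linux slave with glibc, multiprocessing locks and queues are backed by POSIX semaphores, which sem_open() creates as small files under /dev/shm, so a full /dev/shm fails in exactly this way. A minimal sketch that hits the same failure point when /dev/shm is already full:

    import errno
    import multiprocessing

    try:
        # Same call chain as TaskQueueManager's multiprocessing.Queue():
        # Lock() -> SemLock() -> sem_open(), which needs space under /dev/shm.
        lock = multiprocessing.Lock()
    except OSError as e:
        if e.errno == errno.ENOSPC:
            print('Errno 28: no room in /dev/shm for a new POSIX semaphore')
        else:
            raise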
--
Eyal edri
MANAGER
RHV DevOps
EMEA VIRTUALIZATION R&D
Red Hat EMEA <https://www.redhat.com/>
phone: +972-9-7692018
irc: eedri (on #tlv #rhev-dev #rhev-integ)
-- Didi