[ovirt-devel] [VDSM] stuck tests in ci
Nir Soffer
nsoffer at redhat.com
Tue May 31 17:35:45 UTC 2016
On Tue, May 31, 2016 at 8:16 PM, Piotr Kliczewski
<piotr.kliczewski at gmail.com> wrote:
> All,
>
> I just noticed one more build [1] which got stuck with:
>
> 15:46:40 Traceback (most recent call last):
> 15:46:40 File "/usr/lib64/python2.7/threading.py", line 804, in
> __bootstrap_inner
> 15:46:40 File "/usr/lib64/python2.7/threading.py", line 757, in run
> 15:46:40 File
> "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 181, in
> _communicate
> 15:46:40 <type 'exceptions.AttributeError'>: 'NoneType' object has no
> attribute 'close'
Yes, we still have not fixed this issue. We need a backtrace to see which
thread is blocking Python.
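
A minimal sketch of how such a backtrace could be collected from inside the
test process, assuming we can call a helper at the end of the test run, before
the interpreter starts shutting down. The helper name format_thread_stacks is
made up for illustration; it is not part of vdsm or ioprocess. It relies only
on sys._current_frames() and the threading and traceback modules.

```python
import sys
import threading
import traceback


def format_thread_stacks():
    """Return a string with the current stack of every live thread.

    Hypothetical helper for illustration only.
    """
    # Map thread idents to Thread objects so we can report name and daemon flag.
    threads = dict((t.ident, t) for t in threading.enumerate())
    lines = []
    for ident, frame in sys._current_frames().items():
        thread = threads.get(ident)
        name = thread.name if thread else "unknown"
        daemon = thread.daemon if thread else "?"
        lines.append("Thread %s (ident=%s, daemon=%s)" % (name, ident, daemon))
        lines.extend(l.rstrip() for l in traceback.format_stack(frame))
    return "\n".join(lines)
```

Writing the result to stderr just before the test runner exits should show
which threads are still alive and whether any of them is a non-daemon thread.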
>
> Thanks,
> Piotr
>
> [1] http://jenkins.ovirt.org/job/vdsm_master_check-patch-fc23-x86_64/2380/console
>
> On Sat, May 21, 2016 at 8:28 PM, Nir Soffer <nsoffer at redhat.com> wrote:
>> The issue is a non-daemon thread blocking the python process during
>> shutdown of the tests.
>>
>> Current ioprocess does not create such a thread, but we still see this
>> issue today:
>> http://jenkins.ovirt.org/job/vdsm_master_check-patch-el7-x86_64/1841/console
>>
>> If the builds are using the latest ioprocess build (0.16.0-1), built after
>> Sun May 15 21:29:24 2016 +0300, this is probably not related to ioprocess.
>>
>> To understand this issue we need to get a stacktrace from the stuck python
>> process.
>>
>> See relevant log below.
>>
>>
>> Nir
>>
>> ----
>>
>> 11:49:30 --------------------------------------------------------------------------------------------------------------------------------------------
>> 11:49:30 TOTAL 40672 21072 48%
>> 11:49:30 ----------------------------------------------------------------------
>> 11:49:30 Ran 2182 tests in 147.661s
>> 11:49:30
>> 11:49:30 OK (SKIP=94)
>> 11:49:30 Exception AttributeError: "'NoneType' object has no attribute
>> 'udev_unref'" in <bound method Context.__del__ of <pyudev.core.Context
>> object at 0x7312610>> ignored
>> 11:49:30 Exception AttributeError: "'NoneType' object has no attribute
>> 'write'" in <bound method IOProcess.__del__ of <ioprocess.IOProcess
>> object at 0x5435350>> ignored
>> 11:49:30 Exception AttributeError: "'NoneType' object has no attribute
>> 'write'" in <bound method IOProcess.__del__ of <ioprocess.IOProcess
>> object at 0x5420850>> ignored
>> 11:49:30 Exception AttributeError: "'NoneType' object has no attribute
>> 'write'" in <bound method IOProcess.__del__ of <ioprocess.IOProcess
>> object at 0x7269910>> ignored
>> 11:49:30 Exception AttributeError: "'NoneType' object has no attribute
>> 'write'" in <bound method IOProcess.__del__ of <ioprocess.IOProcess
>> object at 0x5420f90>> ignored
>> 11:49:30 Exception AttributeError: "'NoneType' object has no attribute
>> 'write'" in <bound method IOProcess.__del__ of <ioprocess.IOProcess
>> object at 0x5419c10>> ignored
>> 11:49:30 Exception AttributeError: "'NoneType' object has no attribute
>> 'write'" in <bound method IOProcess.__del__ of <ioprocess.IOProcess
>> object at 0x610e610>> ignored
>> 11:49:30 Exception AttributeError: "'NoneType' object has no attribute
>> 'write'" in <bound method IOProcess.__del__ of <ioprocess.IOProcess
>> object at 0x6dc4d50>> ignored
>> 11:49:30 Exception AttributeError: "'NoneType' object has no attribute
>> 'write'" in <bound method IOProcess.__del__ of <ioprocess.IOProcess
>> object at 0x610f390>> ignored
>> 11:49:30 Exception AttributeError: "'NoneType' object has no attribute
>> 'write'" in <bound method IOProcess.__del__ of <ioprocess.IOProcess
>> object at 0x54195d0>> ignored
>> 11:49:30 Exception AttributeError: "'NoneType' object has no attribute
>> 'write'" in <bound method IOProcess.__del__ of <ioprocess.IOProcess
>> object at 0x5432510>> ignored
>> 11:49:30 Exception AttributeError: "'NoneType' object has no attribute
>> 'write'" in <bound method IOProcess.__del__ of <ioprocess.IOProcess
>> object at 0x6110250>> ignored
>> 11:49:30 Exception AttributeError: "'NoneType' object has no attribute
>> 'write'" in <bound method IOProcess.__del__ of <ioprocess.IOProcess
>> object at 0x5414750>> ignored
>> 11:49:30 Exception AttributeError: "'NoneType' object has no attribute
>> 'write'" in <bound method IOProcess.__del__ of <ioprocess.IOProcess
>> object at 0x5414150>> ignored
>> 11:49:30 Exception AttributeError: "'NoneType' object has no attribute
>> 'write'" in <bound method IOProcess.__del__ of <ioprocess.IOProcess
>> object at 0x6a33390>> ignored
>> 11:49:30 Exception AttributeError: "'NoneType' object has no attribute
>> 'write'" in <bound method IOProcess.__del__ of <ioprocess.IOProcess
>> object at 0x543d590>> ignored
>> 11:49:30 Exception AttributeError: "'NoneType' object has no attribute
>> 'write'" in <bound method IOProcess.__del__ of <ioprocess.IOProcess
>> object at 0x6ba8390>> ignored
>> 11:49:30 Exception AttributeError: "'NoneType' object has no attribute
>> 'write'" in <bound method IOProcess.__del__ of <ioprocess.IOProcess
>> object at 0x5432d50>> ignored
>> 11:49:30 Exception AttributeError: "'NoneType' object has no attribute
>> 'write'" in <bound method IOProcess.__del__ of <ioprocess.IOProcess
>> object at 0x5435a90>> ignored
>> 11:49:30 Exception AttributeError: "'NoneType' object has no attribute
>> 'write'" in <bound method IOProcess.__del__ of <ioprocess.IOProcess
>> object at 0x503f350>> ignored
>> 11:49:30 Exception AttributeError: "'NoneType' object has no attribute
>> 'write'" in <bound method IOProcess.__del__ of <ioprocess.IOProcess
>> object at 0x5420250>> ignored
>> 11:49:30 Exception AttributeError: "'NoneType' object has no attribute
>> 'write'" in <bound method IOProcess.__del__ of <ioprocess.IOProcess
>> object at 0x5442e90>> ignored
>> 11:49:30 Exception AttributeError: "'NoneType' object has no attribute
>> 'write'" in <bound method IOProcess.__del__ of <ioprocess.IOProcess
>> object at 0x503f950>> ignored
>> 11:49:30 Exception AttributeError: "'NoneType' object has no attribute
>> 'write'" in <bound method IOProcess.__del__ of <ioprocess.IOProcess
>> object at 0x610e7d0>> ignored
>> 11:49:30 Exception AttributeError: "'NoneType' object has no attribute
>> 'write'" in <bound method IOProcess.__del__ of <ioprocess.IOProcess
>> object at 0x5414d90>> ignored
>> 11:49:30 Exception AttributeError: "'NoneType' object has no attribute
>> 'write'" in <bound method IOProcess.__del__ of <ioprocess.IOProcess
>> object at 0x543d450>> ignored
>> 11:49:30 Exception in thread ioprocess communication (8008) (most
>> likely raised during interpreter shutdown):
>> 11:49:30 Traceback (most recent call last):
>> 11:49:30 File "/usr/lib64/python2.7/threading.py", line 811, in
>> __bootstrap_inner
>> 11:49:30 File "/usr/lib64/python2.7/threading.py", line 764, in run
>> 11:49:30 File
>> "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 180, in
>> _communicate
>> 11:49:30 <type 'exceptions.AttributeError'>: 'NoneType' object has no
>> attribute 'close'
>> 17:42:17 Build timed out (after 360 minutes). Marking the build as failed.
>> 17:42:17 Build was aborted
>>
>> On Fri, May 20, 2016 at 10:30 AM, Piotr Kliczewski
>> <piotr.kliczewski at gmail.com> wrote:
>>> Eyal,
>>>
>>> This was an ioprocess issue that still occurred after the fix was
>>> provided. I haven't seen it since build #1389.
>>>
>>> Thanks,
>>> Piotr
>>>
>>> On Thu, May 19, 2016 at 3:00 PM, Eyal Edri <eedri at redhat.com> wrote:
>>>> Was that resolved?
>>>> Was it an infra issue, or a problem with the tests?
>>>>
>>>> On Mon, May 16, 2016 at 3:27 PM, Piotr Kliczewski
>>>> <piotr.kliczewski at gmail.com> wrote:
>>>>>
>>>>> and one more:
>>>>>
>>>>>
>>>>> http://jenkins.ovirt.org/job/vdsm_master_check-patch-fc23-x86_64/1389/console
>>>>>
>>>>> On Mon, May 16, 2016 at 1:46 PM, Piotr Kliczewski
>>>>> <piotr.kliczewski at gmail.com> wrote:
>>>>> > One more occurrence of the issue [1]
>>>>> >
>>>>> >
>>>>> > [1]
>>>>> > http://jenkins.ovirt.org/job/vdsm_master_check-patch-el7-x86_64/1359/console
>>>>> >
>>>>> > On Sun, May 15, 2016 at 8:37 PM, Nir Soffer <nsoffer at redhat.com> wrote:
>>>>> >> The ioprocess issue is fixed in https://gerrit.ovirt.org/57473
>>>>> >>
>>>>> >> It will be merged soon and available via ovirt-release-master.
>>>>> >>
>>>>> >> Nir
>>>>> >>
>>>>> >> On Sun, May 15, 2016 at 7:45 PM, Nir Soffer <nsoffer at redhat.com> wrote:
>>>>> >>> Hi all,
>>>>> >>>
>>>>> >>> I found another stuck build today:
>>>>> >>>
>>>>> >>> http://jenkins.ovirt.org/job/vdsm_master_check-patch-fc23-x86_64/1151/console
>>>>> >>>
>>>>> >>> 11:27:18 ------------------------------------------------------------------------------------------------------------------------------------------------
>>>>> >>> 11:27:18 TOTAL 40513 21121 48%
>>>>> >>> 11:27:18 ----------------------------------------------------------------------
>>>>> >>> 11:27:18 Ran 2169 tests in 145.934s
>>>>> >>> 11:27:18
>>>>> >>> 11:27:18 OK (SKIP=88)
>>>>> >>> 11:27:18 Exception AttributeError: "'NoneType' object has no attribute
>>>>> >>> 'write'" in <bound method IOProcess.__del__ of <ioprocess.IOProcess
>>>>> >>> object at 0x7fd7c9f2d3d0>> ignored
>>>>> >>> [...]
>>>>> >>> 11:27:18 Exception AttributeError: "'NoneType' object has no attribute
>>>>> >>> 'write'" in <bound method IOProcess.__del__ of <ioprocess.IOProcess
>>>>> >>> object at 0x7fd7c9f15550>> ignored
>>>>> >>> 11:27:18 Exception in thread ioprocess communication (6533) (most
>>>>> >>> likely raised during interpreter shutdown):
>>>>> >>> 11:27:18 Traceback (most recent call last):
>>>>> >>> 11:27:18 File "/usr/lib64/python2.7/threading.py", line 804, in
>>>>> >>> __bootstrap_inner
>>>>> >>> 11:27:18 File "/usr/lib64/python2.7/threading.py", line 757, in run
>>>>> >>> 11:27:18 File
>>>>> >>> "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 180, in
>>>>> >>> _communicate
>>>>> >>> 11:27:18 <type 'exceptions.AttributeError'>: 'NoneType' object has no
>>>>> >>> attribute 'close'
>>>>> >>>
>>>>> >>> This smells like a non-daemon thread started by some code,
>>>>> >>> blocking the test process.
>>>>> >>>
>>>>> >>> I suspect ioprocess is starting such a thread; I'm looking into it.
>>>>> >>>
>>>>> >>> Meanwhile, please:
>>>>> >>> - verify that all threads in actual code and in the tests are daemon
>>>>> >>> threads
>>>>> >>> - convert your threads to use vdsm.concurrent.thread instead of
>>>>> >>> threading.Thread (daemon by default)
>>>>> >>> - watch your builds and abort stuck builds
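
The first two items in the list above can be checked with something like the
sketch below. Both function names are hypothetical and only illustrate the
idea; start_daemon_thread is a stand-in for a daemon-by-default helper and is
not the actual vdsm.concurrent.thread API.

```python
import threading


def non_daemon_threads():
    """Return live non-daemon threads, excluding the calling thread."""
    current = threading.current_thread()
    return [t for t in threading.enumerate()
            if t is not current and not t.daemon]


def start_daemon_thread(func, args=(), name=None):
    """Start func in a thread that will not block interpreter shutdown."""
    t = threading.Thread(target=func, args=args, name=name)
    t.daemon = True  # must be set before start()
    t.start()
    return t
```

Calling non_daemon_threads() from the main thread at the end of a test run,
and failing if the list is not empty, should point at the code that left a
blocking thread behind.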
>>>>> >>>
>>>>> >>> David, we need a timeout in the CI that aborts the job after a
>>>>> >>> per-project timeout, maybe defined in the project yaml.
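
As a rough illustration of the timeout idea, here is a sketch of a wrapper
that runs a command and gives up after a fixed number of seconds. This is not
how the Jenkins jobs are configured; run_with_timeout is a made-up name, and
the sketch assumes Python 3.5 or later for subprocess.run().

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout):
    """Run cmd; return its exit code, or None if it was killed on timeout.

    Hypothetical helper. Only the direct child is killed; processes it
    spawned may survive.
    """
    try:
        return subprocess.run(cmd, timeout=timeout).returncode
    except subprocess.TimeoutExpired:
        return None


if __name__ == "__main__":
    # Example: give a sleeping child one second, then abort it.
    code = run_with_timeout(
        [sys.executable, "-c", "import time; time.sleep(60)"], 1)
    print("timed out" if code is None else "exit code %d" % code)
```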
>>>>> >>>
>>>>> >>> Cheers,
>>>>> >>> Nir
>>>>> >> _______________________________________________
>>>>> >> Devel mailing list
>>>>> >> Devel at ovirt.org
>>>>> >> http://lists.ovirt.org/mailman/listinfo/devel
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Eyal Edri
>>>> Associate Manager
>>>> RHEV DevOps
>>>> EMEA ENG Virtualization R&D
>>>> Red Hat Israel
>>>>
>>>> phone: +972-9-7692018
>>>> irc: eedri (on #tlv #rhev-dev #rhev-integ)