Hi all,
I found another stuck build today:
http://jenkins.ovirt.org/job/vdsm_master_check-patch-fc23-x86_64/1151/con...
11:27:18 ------------------------------------------------------------------------------------------------------------------------------------------------
11:27:18 TOTAL 40513 21121 48%
11:27:18 ----------------------------------------------------------------------
11:27:18 Ran 2169 tests in 145.934s
11:27:18
11:27:18 OK (SKIP=88)
11:27:18 Exception AttributeError: "'NoneType' object has no attribute
'write'" in <bound method IOProcess.__del__ of <ioprocess.IOProcess
object at 0x7fd7c9f2d3d0>> ignored
[...]
11:27:18 Exception AttributeError: "'NoneType' object has no attribute
'write'" in <bound method IOProcess.__del__ of <ioprocess.IOProcess
object at 0x7fd7c9f15550>> ignored
11:27:18 Exception in thread ioprocess communication (6533) (most
likely raised during interpreter shutdown):
11:27:18 Traceback (most recent call last):
11:27:18 File "/usr/lib64/python2.7/threading.py", line 804, in
__bootstrap_inner
11:27:18 File "/usr/lib64/python2.7/threading.py", line 757, in run
11:27:18 File
"/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 180, in
_communicate
11:27:18 <type 'exceptions.AttributeError'>: 'NoneType' object has no
attribute 'close'
This smells like a non-daemon thread, started by some code, that is
blocking the test process from exiting.
I suspect ioprocess starts such a thread; I am looking into it.
Meanwhile, please:
- verify that all threads in actual code and in the tests are daemon threads
- convert your threads to use vdsm.concurrent.thread instead of
threading.Thread (daemon by default)
- watch your builds and abort stuck builds
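
To help with the first item above, here is a minimal sketch (Python 3, standard library only) that lists live non-daemon threads, since these are the ones that keep the interpreter from exiting. The helper names non_daemon_threads and start_thread are hypothetical and exist only for this example; start_thread is a stand-in for the "daemon by default" idea and is not the vdsm.concurrent.thread implementation.

```python
import threading


def non_daemon_threads():
    """Return live threads, other than the main thread, that are not daemons."""
    main = threading.main_thread()
    return [t for t in threading.enumerate()
            if t is not main and not t.daemon]


def start_thread(func, daemon=True):
    """Hypothetical helper: start a thread that is a daemon unless told otherwise."""
    t = threading.Thread(target=func)
    t.daemon = daemon
    t.start()
    return t


if __name__ == "__main__":
    stop = threading.Event()
    start_thread(stop.wait, daemon=False)  # would block interpreter exit
    start_thread(stop.wait)                # daemon, does not block exit
    for t in non_daemon_threads():
        print("non-daemon thread still running:", t.name)
    stop.set()  # release both threads so this demo can exit
```

Calling something like non_daemon_threads() at the end of a test run and failing if it returns anything is one possible way to catch such threads before they hang a build.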
David, we need a timeout in the CI that aborts a job after a
per-project timeout, maybe defined in the project yaml.
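
Until the CI has such a timeout, a wrapper around the test command is one possible stopgap. This is only a sketch, assuming Python 3 and a hypothetical helper name run_with_timeout; it says nothing about how the Jenkins jobs are actually configured.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_seconds):
    """Run cmd; return its exit code, or None if it was killed on timeout."""
    try:
        # subprocess.run kills the child and waits for it before
        # raising TimeoutExpired.
        return subprocess.run(cmd, timeout=timeout_seconds).returncode
    except subprocess.TimeoutExpired:
        return None


if __name__ == "__main__":
    # Illustrative command only: a child that sleeps longer than the timeout.
    code = run_with_timeout(
        [sys.executable, "-c", "import time; time.sleep(30)"], 1)
    print("timed out" if code is None else "exit code %d" % code)
```

Note that this only kills the direct child process; anything that child spawned would need separate handling, for example a process group.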
Cheers,
Nir