Debugging stuck vdsm jobs
Nir Soffer
nsoffer at redhat.com
Sat May 28 23:10:48 UTC 2016
This is what it looks like when the tests time out:
23:04:44 miscTests.EventTests
23:04:44 testEmit OK
23:04:44 testEmitCallbackException OK
23:04:49 testEmitStale OK
23:04:49 testInstanceMethod OK
23:04:50 testInstanceMethodDead OK
23:04:55 testOneShot
23:04:55 ========================================================================
23:04:55 = Timeout completing tests - extracting stacktrace =
23:04:55 ========================================================================
23:04:55
23:04:55 attach: No such file or directory.
23:04:55 [New LWP 7887]
23:04:55 [New LWP 7880]
23:04:55 [New LWP 7873]
23:04:55 [New LWP 7866]
23:04:55 [New LWP 7859]
23:04:55 [New LWP 7852]
23:04:55 [New LWP 7845]
23:04:55 [Thread debugging using libthread_db enabled]
23:04:55 Using host libthread_db library "/lib64/libthread_db.so.1".
23:04:56 0x00007f17f0a1fa82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
23:04:56
23:04:56 Thread 8 (Thread 0x7f17df860700 (LWP 7845)):
23:04:56 Undefined command: "py-bt". Try "help".
23:04:56 OK
23:04:56 testUnregister
23:04:56 ========================================================================
23:04:56 = Aborting tests =
23:04:56 ========================================================================
23:04:56 ../tests/run_tests_local.sh: line 35: 7743 Killed "$PYTHON_EXE" ../tests/testrunner.py --local-modules $@
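
The 'Undefined command: "py-bt"' line in the log above is how a missing
py-bt command shows up in the gdb output. A minimal sketch of a check for
that case follows. The function name is hypothetical, and the only string
it looks for is the one printed in the log above:

```python
# Error text copied from the gdb output in the log above.
PY_BT_MISSING = 'Undefined command: "py-bt"'


def py_bt_missing(gdb_output):
    """Return True if gdb reported that the py-bt command is not defined."""
    return PY_BT_MISSING in gdb_output
```

A job could use a check like this to report that the backtrace is unusable
instead of silently printing an empty one.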
On Sun, May 29, 2016 at 2:07 AM, Nir Soffer <nsoffer at redhat.com> wrote:
> On Thu, May 26, 2016 at 11:08 PM, Nir Soffer <nsoffer at redhat.com> wrote:
>> Hi all,
>>
>> We had 2 issues causing vdsm check-patch and check-merge jobs to get stuck.
>>
>> I fixed the one that caused most trouble:
>> https://gerrit.ovirt.org/57993
>>
>> The other issue may be related to ioprocess; I fixed a related issue:
>> https://gerrit.ovirt.org/57473
>>
>> But I have seen stuck jobs after this change, so the issue may not
>> be fixed yet.
>>
>> If you see a stuck vdsm job (a job that runs for more than 15 minutes), please
>> get me a backtrace:
>>
>> 1. locate the testrunner.py process pid:
>>
>> $ ps aux | grep testrunner.py | grep -v grep
>> nsoffer 26297 82.6 0.9 389592 111144 pts/3 R+ 22:52 0:02
>> /usr/bin/python ../tests/testrunner.py ...
>>
>> 2. save a backtrace:
>>
>> gdb -p 26297 --batch -ex "thread apply all py-bt" > py-bt.out
>
> This requires the python-debuginfo package, typically installed using:
>
> dnf debuginfo-install python
>
> I sent a patch that detects stuck vdsm tests, prints a backtrace, and kills
> the stuck process:
> https://gerrit.ovirt.org/58212
>
> It works, but we don't get a backtrace, since python-debuginfo is not installed
> although I require it. We probably need to add the fedora-debug repository
> to check-patch.repos. I tried to use the urls from /etc/yum.repos.d/fedora.repo,
> but none of them work.
>
> I will need help from infra to get it working.
>
> Nir
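
The approach described above (detect a stuck test run, save a backtrace,
kill the process) can be sketched roughly as follows. This is not the code
from the patch linked above, only an illustration in Python 3. The names
run_with_timeout and dump_backtrace are hypothetical, and the gdb
invocation assumes gdb and python-debuginfo are installed:

```python
import subprocess


def dump_backtrace(pid):
    """Hypothetical helper: ask gdb for a Python backtrace of all threads."""
    cmd = ["gdb", "-p", str(pid), "--batch", "-ex", "thread apply all py-bt"]
    try:
        result = subprocess.run(cmd, stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT,
                                universal_newlines=True)
    except OSError as e:
        # gdb is not installed or cannot be executed.
        return "cannot run gdb: %s" % e
    return result.stdout


def run_with_timeout(cmd, timeout, dump=dump_backtrace):
    """Run cmd; if it runs longer than timeout seconds, dump and kill it.

    Returns (returncode, backtrace). backtrace is None when cmd finished
    before the timeout expired.
    """
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=timeout), None
    except subprocess.TimeoutExpired:
        # Take the backtrace while the process is still alive, then kill it.
        backtrace = dump(proc.pid)
        proc.kill()
        return proc.wait(), backtrace
```

For example, run_with_timeout(["python", "../tests/testrunner.py"], 900)
would give a test run 15 minutes before dumping a backtrace and killing it.
The dump argument can be replaced to try the timeout logic without gdb.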