
On 4/1/20 11:54 AM, Martin Perina wrote:
On Wed, Apr 1, 2020 at 11:15 AM Marcin Sobczyk <msobczyk@redhat.com> wrote:
On 4/1/20 11:06 AM, Marcin Sobczyk wrote:
>
>
> On 4/1/20 9:51 AM, Marcin Sobczyk wrote:
>> Hi,
>>
>> On 4/1/20 8:44 AM, Yedidyah Bar David wrote:
>>> On Wed, Apr 1, 2020 at 6:21 AM <jenkins@jenkins.phx.ovirt.org> wrote:
>>>> Project:
>>>> https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-master/
>>>>
>>>> Build:
>>>> https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-master/1548/
>>> Previous build 1547 passed!, after many months of failing, thanks to
>>> Evgeny's work
>>> in recent weeks. Above one failed.
>>> I think the root cause is that the engine tried to connect to vdsm
>>> right after
>>> successfully finishing ansible host-deploy, but failed. vdsm.log has:
>>>
>>> https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-master/1548/...
>>>
>>> 2020-03-31 22:58:49,773-0400 ERROR (Reactor thread) [vds.dispatcher]
>>> uncaptured python exception, closing channel
>>> <yajsonrpc.betterAsyncore.Dispatcher connected
>>> ('::ffff:192.168.222.76', 46754, 0, 0) at 0x7f416c150a90> (<class
>>> 'ssl.SSLError'>:[X509] no certificate or crl found (_ssl.c:3771)
>>> [/usr/lib64/python3.6/asyncore.py|readwrite|110]
>>> [/usr/lib64/python3.6/asyncore.py|handle_write_event|442]
>>> [/usr/lib/python3.6/site-packages/yajsonrpc/betterAsyncore.py|handle_write|74]
>>> [/usr/lib/python3.6/site-packages/yajsonrpc/betterAsyncore.py|_delegate_call|168]
>>> [/usr/lib/python3.6/site-packages/vdsm/sslutils.py|handle_write|190]
>>> [/usr/lib/python3.6/site-packages/vdsm/sslutils.py|_handle_io|194]
>>> [/usr/lib/python3.6/site-packages/vdsm/sslutils.py|_set_up_socket|154])
>>> (betterAsyncore:179)
>>>
>>> Not sure what might have caused this. Can anyone have a look? Thanks.
>> Probably caused by https://gerrit.ovirt.org/108016
>> Looking into this.
>>
> Turns out that the patch is not the cause of the error per se - it simply
> uncovered a different problem - the CA on the hosts is broken:
>
> [root@lago-basic-suite-master-host-0 certs]# openssl x509 -in
> /etc/pki/vdsm/certs/cacert.pem -text
> unable to load certificate
> 139987452258112:error:0909006C:PEM routines:get_name:no start
> line:crypto/pem/pem_lib.c:745:Expecting: TRUSTED CERTIFICATE

It looks like the certificate files have spaces instead of newlines. When I manually replaced the spaces with newlines, openssl was able to read them.
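
For anyone who wants to reproduce the manual fix, here is a rough sketch of it in Python. It is not part of vdsm or the engine; `repair_pem` is a hypothetical helper name, and it assumes the only damage is that newlines were turned into spaces. Note that a blind space-to-newline replacement would also split the "BEGIN CERTIFICATE" marker, so the sketch keeps the markers intact and only re-wraps the base64 body at 64 characters.

```python
import re

# Matches one PEM block; the label (e.g. "CERTIFICATE") must be the same
# in the BEGIN and END markers.
PEM_BLOCK = re.compile(
    r"-----BEGIN ([A-Z0-9 ]+)-----(.*?)-----END \1-----",
    re.DOTALL,
)


def repair_pem(text: str) -> str:
    """Rebuild PEM blocks whose newlines were replaced by spaces.

    Hypothetical helper: assumes the base64 payload itself is intact and
    only the whitespace between its chunks is wrong.
    """
    blocks = []
    for match in PEM_BLOCK.finditer(text):
        label, body = match.group(1), match.group(2)
        # Drop every kind of whitespace from the base64 payload.
        b64 = "".join(body.split())
        # Re-wrap at 64 characters per line, the usual PEM line width.
        lines = [b64[i:i + 64] for i in range(0, len(b64), 64)]
        blocks.append(
            "-----BEGIN {0}-----\n{1}\n-----END {0}-----\n".format(
                label, "\n".join(lines)
            )
        )
    if not blocks:
        raise ValueError("no PEM block found")
    return "".join(blocks)
```

One would read /etc/pki/vdsm/certs/cacert.pem, pass its contents through `repair_pem`, write the result to a separate file, and re-run the `openssl x509 -in ... -text` check on that copy before replacing anything on the host.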
Martin/Dana, couldn't this be caused by any recent changes in ansible-runner integrations?
This looks like a suspect to me: https://gerrit.ovirt.org/#/c/107683/5/backend/manager/modules/bll/src/main/j...
>
>>>
>>>> Build Number: 1548
>>>> Build Status: Failure
>>>> Triggered By: Started by timer
>>>>
>>>> -------------------------------------
>>>> Changes Since Last Success:
>>>> -------------------------------------
>>>> Changes for Build #1548
>>>> [Galit Rosenthal] Fix the repo for suites that weren't moved to no
>>>> reposync
>>>>
>>>>
>>>>
>>>>
>>>> -----------------
>>>> Failed Tests:
>>>> -----------------
>>>> No tests ran.
>>>
>>>
>>
>
--
Martin Perina
Manager, Software Engineering
Red Hat Czech s.r.o.