On Wed, Apr 1, 2020 at 11:15 AM Marcin Sobczyk <msobczyk@redhat.com> wrote:


On 4/1/20 11:06 AM, Marcin Sobczyk wrote:
>
>
> On 4/1/20 9:51 AM, Marcin Sobczyk wrote:
>> Hi,
>>
>> On 4/1/20 8:44 AM, Yedidyah Bar David wrote:
>>> On Wed, Apr 1, 2020 at 6:21 AM <jenkins@jenkins.phx.ovirt.org> wrote:
>>>> Project:
>>>> https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-master/
>>>>
>>>> Build:
>>>> https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-master/1548/
>>> The previous build, 1547, passed after many months of failing, thanks to
>>> Evgeny's work in recent weeks. The one above failed.
>>> I think the root cause is that the engine tried to connect to vdsm right
>>> after ansible host-deploy finished successfully, but the connection
>>> failed. vdsm.log has:
>>>
>>> https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-master/1548/artifact/exported-artifacts/test_logs/he-basic-suite-master/post-he_deploy/lago-he-basic-suite-master-host-0/_var_log/vdsm/vdsm.log
>>>
>>>
>>> 2020-03-31 22:58:49,773-0400 ERROR (Reactor thread) [vds.dispatcher]
>>> uncaptured python exception, closing channel
>>> <yajsonrpc.betterAsyncore.Dispatcher connected
>>> ('::ffff:192.168.222.76', 46754, 0, 0) at 0x7f416c150a90> (<class
>>> 'ssl.SSLError'>:[X509] no certificate or crl found (_ssl.c:3771)
>>> [/usr/lib64/python3.6/asyncore.py|readwrite|110]
>>> [/usr/lib64/python3.6/asyncore.py|handle_write_event|442]
>>> [/usr/lib/python3.6/site-packages/yajsonrpc/betterAsyncore.py|handle_write|74]
>>> [/usr/lib/python3.6/site-packages/yajsonrpc/betterAsyncore.py|_delegate_call|168]
>>> [/usr/lib/python3.6/site-packages/vdsm/sslutils.py|handle_write|190]
>>> [/usr/lib/python3.6/site-packages/vdsm/sslutils.py|_handle_io|194]
>>> [/usr/lib/python3.6/site-packages/vdsm/sslutils.py|_set_up_socket|154])
>>> (betterAsyncore:179)
>>>
>>> Not sure what might have caused this. Can anyone have a look? Thanks.
>> Probably caused by https://gerrit.ovirt.org/108016
>> Looking into this.
>>
> Turns out that the patch is not the cause of the error per se - it simply
> uncovered a different problem - the CA on the hosts is broken:
>
> [root@lago-basic-suite-master-host-0 certs]# openssl x509 -in /etc/pki/vdsm/certs/cacert.pem -text
> unable to load certificate
> 139987452258112:error:0909006C:PEM routines:get_name:no start line:crypto/pem/pem_lib.c:745:Expecting: TRUSTED CERTIFICATE
It looks like the certificate files have spaces instead of newlines.
When I manually replaced the spaces with newlines, openssl was able to read
them.
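
For reference, the manual fix can be sketched in Python roughly as below. This is only an illustration, not what host-deploy does: `repair_pem` is a hypothetical helper name, and it assumes the file contains nothing but PEM blocks whose line breaks were turned into spaces (anything outside the BEGIN/END markers is dropped).

```python
import re

# Matches one PEM block; the label (e.g. "CERTIFICATE") must be the same
# in the BEGIN and END markers.
PEM_BLOCK = re.compile(
    r"-----BEGIN ([A-Z0-9 ]+)-----(.*?)-----END \1-----",
    re.DOTALL,
)


def repair_pem(text, width=64):
    """Rebuild PEM blocks whose line breaks were replaced by spaces.

    Hypothetical helper for illustration only.
    """
    repaired = []
    for label, body in PEM_BLOCK.findall(text):
        # Base64 data never contains whitespace, so all of it can be dropped.
        b64 = "".join(body.split())
        # Re-wrap the base64 payload at the conventional 64 characters.
        lines = [b64[i:i + width] for i in range(0, len(b64), width)]
        repaired.append(
            "\n".join(
                ["-----BEGIN %s-----" % label]
                + lines
                + ["-----END %s-----" % label]
            )
        )
    # Anything outside the BEGIN/END markers is discarded.
    return "\n".join(repaired) + "\n" if repaired else ""
```

To try it, read /etc/pki/vdsm/certs/cacert.pem, pass the contents through `repair_pem`, write the result to a separate file, and check that file with the same `openssl x509 -in <file> -text` command as above.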

Martin/Dana, could this be caused by recent changes in the ansible-runner integration?

>
>>>
>>>> Build Number: 1548
>>>> Build Status:  Failure
>>>> Triggered By: Started by timer
>>>>
>>>> -------------------------------------
>>>> Changes Since Last Success:
>>>> -------------------------------------
>>>> Changes for Build #1548
>>>> [Galit Rosenthal] Fix the repo for suites that weren't moved to no
>>>> reposync
>>>>
>>>>
>>>>
>>>>
>>>> -----------------
>>>> Failed Tests:
>>>> -----------------
>>>> No tests ran.
>>>
>>>
>>
>



--
Martin Perina
Manager, Software Engineering
Red Hat Czech s.r.o.