On 4/1/20 11:06 AM, Marcin Sobczyk wrote:
>
>
> On 4/1/20 9:51 AM, Marcin Sobczyk wrote:
>> Hi,
>>
>> On 4/1/20 8:44 AM, Yedidyah Bar David wrote:
>>> On Wed, Apr 1, 2020 at 6:21 AM <jenkins@jenkins.phx.ovirt.org> wrote:
>>>> Project:
>>>> https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-master/
>>>>
>>>> Build:
>>>> https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-master/1548/
>>> Previous build 1547 passed, after many months of failing, thanks to
>>> Evgeny's work in recent weeks. The build above failed.
>>> I think the root cause is that the engine tried to connect to vdsm
>>> right after successfully finishing ansible host-deploy, but failed.
>>> vdsm.log has:
>>>
>>> https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-master/1548/artifact/exported-artifacts/test_logs/he-basic-suite-master/post-he_deploy/lago-he-basic-suite-master-host-0/_var_log/vdsm/vdsm.log
>>>
>>>
>>> 2020-03-31 22:58:49,773-0400 ERROR (Reactor thread) [vds.dispatcher]
>>> uncaptured python exception, closing channel
>>> <yajsonrpc.betterAsyncore.Dispatcher connected
>>> ('::ffff:192.168.222.76', 46754, 0, 0) at 0x7f416c150a90> (<class
>>> 'ssl.SSLError'>:[X509] no certificate or crl found (_ssl.c:3771)
>>> [/usr/lib64/python3.6/asyncore.py|readwrite|110]
>>> [/usr/lib64/python3.6/asyncore.py|handle_write_event|442]
>>> [/usr/lib/python3.6/site-packages/yajsonrpc/betterAsyncore.py|handle_write|74]
>>> [/usr/lib/python3.6/site-packages/yajsonrpc/betterAsyncore.py|_delegate_call|168]
>>> [/usr/lib/python3.6/site-packages/vdsm/sslutils.py|handle_write|190]
>>> [/usr/lib/python3.6/site-packages/vdsm/sslutils.py|_handle_io|194]
>>> [/usr/lib/python3.6/site-packages/vdsm/sslutils.py|_set_up_socket|154])
>>> (betterAsyncore:179)
>>>
>>> Not sure what might have caused this. Can anyone have a look? Thanks.
>> Probably caused by https://gerrit.ovirt.org/108016
>> Looking into this.
>>
> Turns out that the patch is not the cause of the error per se - it simply
> uncovered a different problem - the CA on the hosts is broken:
>
> [root@lago-basic-suite-master-host-0 certs]# openssl x509 -in /etc/pki/vdsm/certs/cacert.pem -text
> unable to load certificate
> 139987452258112:error:0909006C:PEM routines:get_name:no start line:crypto/pem/pem_lib.c:745:Expecting: TRUSTED CERTIFICATE
It looks like the certificates have spaces instead of newlines.
When I manually replaced the spaces with newlines, openssl was able to
read them.
Martin/Dana, couldn't this be caused by any recent changes in
ansible-runner integrations?
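
For anyone who wants to repeat the manual fix, here is a minimal Python
sketch of the same idea. It is only an illustration, not code from vdsm or
the engine: repair_pem is a hypothetical helper name, and re-wrapping the
payload at 64 characters per line is my assumption (the usual PEM
convention), not necessarily the layout the original files had. Note that
the BEGIN/END markers themselves contain a space, so a blind replace of
every space would break them; the sketch only strips whitespace from the
base64 payload between the markers.

```python
import re

# Matches one PEM block. The label (e.g. "CERTIFICATE" or
# "TRUSTED CERTIFICATE") may itself contain spaces.
_PEM_BLOCK = re.compile(
    r"-----BEGIN ([A-Z0-9 ]+)-----(.*?)-----END \1-----", re.DOTALL
)


def repair_pem(text, width=64):
    """Rebuild PEM blocks whose newlines were flattened into spaces.

    Hypothetical helper for illustration; not part of vdsm or ovirt-engine.
    """
    blocks = []
    for match in _PEM_BLOCK.finditer(text):
        label, body = match.group(1), match.group(2)
        # Base64 never contains whitespace, so drop all of it and
        # re-wrap the payload at `width` characters per line.
        payload = "".join(body.split())
        lines = [payload[i:i + width] for i in range(0, len(payload), width)]
        blocks.append(
            "-----BEGIN {0}-----\n{1}\n-----END {0}-----\n".format(
                label, "\n".join(lines)
            )
        )
    return "".join(blocks)
```

Anything outside the BEGIN/END markers is dropped, so this is only suitable
for files that contain nothing but PEM blocks. The result can be checked
with the same "openssl x509 -in ... -text" command as above.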