
I have debugged the flow up to the moment the request is sent via the HTTP client to the ansible runner service, and up to that point it was correct. The JSON did contain a correctly formatted ovirt_ca_cert.
Artur
On Wed, Apr 1, 2020 at 12:26 PM Marcin Sobczyk <msobczyk@redhat.com> wrote:
On 4/1/20 11:54 AM, Martin Perina wrote:
On Wed, Apr 1, 2020 at 11:15 AM Marcin Sobczyk <msobczyk@redhat.com> wrote:
On 4/1/20 11:06 AM, Marcin Sobczyk wrote:
On 4/1/20 9:51 AM, Marcin Sobczyk wrote:
Hi,
On 4/1/20 8:44 AM, Yedidyah Bar David wrote:
On Wed, Apr 1, 2020 at 6:21 AM <jenkins@jenkins.phx.ovirt.org> wrote:
Project:
https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-master/
Build:
https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-master/1548/
Previous build 1547 passed, after many months of failing, thanks to Evgeny's work in recent weeks. The one above failed. I think the root cause is that the engine tried to connect to vdsm right after successfully finishing ansible host-deploy, but failed. vdsm.log has:
https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-master/1548/...
2020-03-31 22:58:49,773-0400 ERROR (Reactor thread) [vds.dispatcher] uncaptured python exception, closing channel <yajsonrpc.betterAsyncore.Dispatcher connected ('::ffff:192.168.222.76', 46754, 0, 0) at 0x7f416c150a90> (<class 'ssl.SSLError'>:[X509] no certificate or crl found (_ssl.c:3771) [/usr/lib64/python3.6/asyncore.py|readwrite|110] [/usr/lib64/python3.6/asyncore.py|handle_write_event|442]
[/usr/lib/python3.6/site-packages/yajsonrpc/betterAsyncore.py|handle_write|74]
[/usr/lib/python3.6/site-packages/yajsonrpc/betterAsyncore.py|_delegate_call|168]
[/usr/lib/python3.6/site-packages/vdsm/sslutils.py|handle_write|190] [/usr/lib/python3.6/site-packages/vdsm/sslutils.py|_handle_io|194]
[/usr/lib/python3.6/site-packages/vdsm/sslutils.py|_set_up_socket|154])
(betterAsyncore:179)
Not sure what might have caused this. Can anyone have a look? Thanks.
Probably caused by https://gerrit.ovirt.org/108016
Looking into this.
Turns out that the patch is not the cause of the error per se - it simply uncovered a different problem - the CA on the hosts is broken:
[root@lago-basic-suite-master-host-0 certs]# openssl x509 -in /etc/pki/vdsm/certs/cacert.pem -text
unable to load certificate
139987452258112:error:0909006C:PEM routines:get_name:no start line:crypto/pem/pem_lib.c:745:Expecting: TRUSTED CERTIFICATE
It looks like the certificates have spaces instead of newlines. When I manually replaced the spaces with newlines, openssl was able to read them.
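For anyone who wants to reproduce the manual repair described above, here is a minimal Python sketch. It is not the fix applied in oVirt: the function name repair_pem is hypothetical, and it assumes the only damage is that line breaks were turned into spaces. Blindly replacing every space would also split the "BEGIN CERTIFICATE" marker, so the sketch keeps the markers intact and only re-wraps the base64 body at 64 characters.

```python
import re

# Matches one PEM block; the label (e.g. "CERTIFICATE") may itself contain spaces.
PEM_RE = re.compile(
    r"-----BEGIN (?P<label>[A-Z0-9 ]+)-----(?P<body>.*?)-----END (?P=label)-----",
    re.DOTALL,
)


def repair_pem(text: str) -> str:
    """Rebuild PEM blocks whose line breaks were replaced by spaces (hypothetical helper)."""
    blocks = []
    for match in PEM_RE.finditer(text):
        label = match.group("label")
        # Base64 never contains whitespace, so any whitespace in the body
        # is a lost line break and can be dropped before re-wrapping.
        b64 = "".join(match.group("body").split())
        lines = [b64[i:i + 64] for i in range(0, len(b64), 64)]
        blocks.append(
            "\n".join([f"-----BEGIN {label}-----", *lines, f"-----END {label}-----"])
        )
    if not blocks:
        raise ValueError("no PEM block found")
    return "\n".join(blocks) + "\n"
```

The result can be written to a temporary file and checked with the same openssl x509 command as above before replacing anything under /etc/pki/vdsm/certs.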
Martin/Dana, couldn't this be caused by any recent changes in ansible-runner integrations?
This looks like a suspect to me:
https://gerrit.ovirt.org/#/c/107683/5/backend/manager/modules/bll/src/main/j...
Build Number: 1548
Build Status: Failure
Triggered By: Started by timer
-------------------------------------
Changes Since Last Success:
-------------------------------------
Changes for Build #1548
[Galit Rosenthal] Fix the repo for suites that weren't moved to no reposync
-----------------
Failed Tests:
-----------------
No tests ran.
--
Martin Perina
Manager, Software Engineering
Red Hat Czech s.r.o.
--
Artur Socha
Senior Software Engineer, RHV
Red Hat <https://www.redhat.com>