It's possible that the issue was introduced in patch [1], but Artur's
logs showed a properly formatted ovirt_ca_cert, so I'm not sure about it.
Artur/Marcin, could you please check the command in
ovirt-engine/share/ovirt-engine/ansible-runner-service-project/artifacts?
You should see there the variables with which ansible-playbook is executed.
They should be the same as the ones you linked, but I still want to make
sure there isn't an issue. You can also check the stdout file for errors.
[1]
Martin Necas
On Wed, Apr 1, 2020 at 1:22 PM Artur Socha <asocha(a)redhat.com> wrote:
Posting a public pastebin URL [1]. Apologies for using the private one
before.
[1]
https://pastebin.com/wrw5ME7j
A.
On Wed, Apr 1, 2020 at 12:31 PM Artur Socha <asocha(a)redhat.com> wrote:
>
> Adding request content:
>
> http://pastebin.test.redhat.com/850652
>
> A.
>
> On Wed, Apr 1, 2020 at 12:28 PM Artur Socha <asocha(a)redhat.com> wrote:
>>
>> I have debugged the flow up to the moment the request is sent via the
>> HTTP client to ansible-runner-service, and up to that point it was correct.
>> The JSON did contain a correctly formatted ovirt_ca_cert.
>> Artur
>>
>> On Wed, Apr 1, 2020 at 12:26 PM Marcin Sobczyk <msobczyk(a)redhat.com>
>> wrote:
>>>
>>>
>>>
>>> On 4/1/20 11:54 AM, Martin Perina wrote:
>>>
>>>
>>>
>>> On Wed, Apr 1, 2020 at 11:15 AM Marcin Sobczyk <msobczyk(a)redhat.com>
>>> wrote:
>>>>
>>>>
>>>>
>>>> On 4/1/20 11:06 AM, Marcin Sobczyk wrote:
>>>> >
>>>> >
>>>> > On 4/1/20 9:51 AM, Marcin Sobczyk wrote:
>>>> >> Hi,
>>>> >>
>>>> >> On 4/1/20 8:44 AM, Yedidyah Bar David wrote:
>>>> >>> On Wed, Apr 1, 2020 at 6:21 AM <jenkins(a)jenkins.phx.ovirt.org>
>>>> >>> wrote:
>>>> >>>> Project:
>>>> >>>>
>>>> >>>> https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-master/
>>>> >>>>
>>>> >>>> Build:
>>>> >>>>
>>>> >>>> https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-master/1548/
>>>> >>> Previous build 1547 passed, after many months of failing, thanks to
>>>> >>> Evgeny's work in recent weeks. The one above failed.
>>>> >>> I think the root cause is that the engine tried to connect to vdsm
>>>> >>> right after successfully finishing ansible host-deploy, but failed.
>>>> >>> vdsm.log has:
>>>> >>>
>>>> >>>
>>>> >>> https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-master/15...
>>>> >>>
>>>> >>>
>>>> >>> 2020-03-31 22:58:49,773-0400 ERROR (Reactor thread) [vds.dispatcher]
>>>> >>> uncaptured python exception, closing channel
>>>> >>> <yajsonrpc.betterAsyncore.Dispatcher connected
>>>> >>> ('::ffff:192.168.222.76', 46754, 0, 0) at 0x7f416c150a90> (<class
>>>> >>> 'ssl.SSLError'>:[X509] no certificate or crl found (_ssl.c:3771)
>>>> >>> [/usr/lib64/python3.6/asyncore.py|readwrite|110]
>>>> >>> [/usr/lib64/python3.6/asyncore.py|handle_write_event|442]
>>>> >>> [/usr/lib/python3.6/site-packages/yajsonrpc/betterAsyncore.py|handle_write|74]
>>>> >>> [/usr/lib/python3.6/site-packages/yajsonrpc/betterAsyncore.py|_delegate_call|168]
>>>> >>> [/usr/lib/python3.6/site-packages/vdsm/sslutils.py|handle_write|190]
>>>> >>> [/usr/lib/python3.6/site-packages/vdsm/sslutils.py|_handle_io|194]
>>>> >>> [/usr/lib/python3.6/site-packages/vdsm/sslutils.py|_set_up_socket|154])
>>>> >>> (betterAsyncore:179)
>>>> >>>
>>>> >>> Not sure what might have caused this. Can anyone have a look?
>>>> >>> Thanks.
>>>> >> Probably caused by https://gerrit.ovirt.org/108016
>>>> >> Looking into this.
>>>> >>
>>>> > It turns out that the patch is not the cause of the error per se; it
>>>> > simply uncovered a different problem: the CA certificate on the hosts
>>>> > is broken:
>>>> >
>>>> > [root@lago-basic-suite-master-host-0 certs]# openssl x509 -in
>>>> > /etc/pki/vdsm/certs/cacert.pem -text
>>>> > unable to load certificate
>>>> > 139987452258112:error:0909006C:PEM routines:get_name:no start
>>>> > line:crypto/pem/pem_lib.c:745:Expecting: TRUSTED CERTIFICATE
>>>> It looks like the certificates have spaces instead of newlines.
>>>> When I manually replaced the spaces with newlines, openssl was able to
>>>> read them.
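A minimal Python sketch of the repair Marcin describes, assuming the only damage to cacert.pem is that its line breaks were turned into spaces. Blindly replacing every space is not quite enough, because the BEGIN/END lines themselves contain a space, so the sketch extracts each block, strips all whitespace from the base64 body and re-wraps it at 64 characters as RFC 7468 specifies. normalize_pem is a hypothetical helper for illustration, not part of vdsm or ovirt-engine.

```python
import base64
import re

# Hypothetical helper, not part of vdsm or ovirt-engine.
# Matches one PEM block; the body may contain spaces where newlines used to be.
PEM_BLOCK = re.compile(
    r"-----BEGIN (?P<label>[A-Z0-9 ]+)-----"
    r"(?P<body>.*?)"
    r"-----END (?P=label)-----",
    re.DOTALL,
)


def normalize_pem(text: str) -> str:
    """Rebuild PEM blocks whose line breaks were flattened into spaces.

    Assumes the base64 payload itself is intact; only whitespace is repaired.
    """
    blocks = []
    for match in PEM_BLOCK.finditer(text):
        label = match.group("label")
        # Drop every whitespace character (spaces, stray newlines) from the payload.
        body = "".join(match.group("body").split())
        # Fail loudly (binascii.Error) if what is left is not valid base64.
        base64.b64decode(body, validate=True)
        # RFC 7468: encapsulated text is wrapped at 64 characters per line.
        lines = [body[i:i + 64] for i in range(0, len(body), 64)]
        blocks.append(
            f"-----BEGIN {label}-----\n"
            + "\n".join(lines)
            + f"\n-----END {label}-----\n"
        )
    if not blocks:
        raise ValueError("no PEM block found")
    return "".join(blocks)
```

Writing the result of normalize_pem() for the broken file to a new file should give something the openssl x509 check above can parse, provided the base64 payload was not damaged as well.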
>>>
>>>
>>> Martin/Dana, couldn't this be caused by some recent change in the
>>> ansible-runner integration?
>>>
>>> This looks like a suspect to me:
>>>
>>>
>>> https://gerrit.ovirt.org/#/c/107683/5/backend/manager/modules/bll/src/mai...
>>>
>>>>
>>>> >
>>>> >>>
>>>> >>>> Build Number: 1548
>>>> >>>> Build Status: Failure
>>>> >>>> Triggered By: Started by timer
>>>> >>>>
>>>> >>>> -------------------------------------
>>>> >>>> Changes Since Last Success:
>>>> >>>> -------------------------------------
>>>> >>>> Changes for Build #1548
>>>> >>>> [Galit Rosenthal] Fix the repo for suites that weren't moved to
>>>> >>>> no reposync
>>>> >>>>
>>>> >>>>
>>>> >>>>
>>>> >>>>
>>>> >>>> -----------------
>>>> >>>> Failed Tests:
>>>> >>>> -----------------
>>>> >>>> No tests ran.
>>>> >>>
>>>> >>>
>>>> >>
>>>> >
>>>>
>>>
>>>
>>> --
>>> Martin Perina
>>> Manager, Software Engineering
>>> Red Hat Czech s.r.o.
>>>
>>>
>>
>>
>> --
>>
>> Artur Socha
>>
>> Senior Software Engineer, RHV
>>
>> Red Hat
>
>
>