Nadav,

Here is the code [1] that is responsible for this check, and here is the vdsm log [2], where I added a logging statement to see what the commonName value is (it was 'engine').

Here are the steps performed during the check:
1. We get the client peer name by calling socket.getpeername()[0], which is:
::ffff:192.168.201.3
2. We get the common name from the certificate. It should be the engine's FQDN, but according to the log we get 'engine'.
3. When names are used, we compare the common name with the result of a name lookup based on the IP. I pushed a patch [3] to normalize the IP (it still requires my attention).
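
To make the three steps concrete, here is a rough sketch of the idea. It is not the actual vdsm code: the function names and the injectable lookup parameter are made up for illustration, and it assumes Python 3 for the ipaddress module.

```python
import ipaddress
import socket


def normalize_peer_address(addr):
    """Return the plain IPv4 form of an IPv4-mapped IPv6 address.

    Any other value is returned unchanged.
    """
    try:
        ip = ipaddress.ip_address(addr)
    except ValueError:
        return addr  # not an IP literal, leave it as-is
    if ip.version == 6 and ip.ipv4_mapped is not None:
        return str(ip.ipv4_mapped)
    return addr


def peer_matches_common_name(peer_addr, common_name,
                             lookup=socket.gethostbyaddr):
    """Check whether the certificate common name fits the peer address.

    The common name may be the peer IP itself or one of the names the
    IP reverse-resolves to. 'lookup' is injectable only to make the
    sketch easy to exercise without DNS.
    """
    addr = normalize_peer_address(peer_addr)
    if common_name == addr:
        return True
    try:
        hostname, aliases, _ = lookup(addr)
    except OSError:
        # Covers socket.herror and socket.gaierror: no reverse entry.
        return False
    return common_name in [hostname] + list(aliases)
```

With a lookup that maps 192.168.201.3 only to lago-basic-suite-master-engine, a certificate whose common name is 'engine' would fail this comparison, which is consistent with what the log shows.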

Based on the logs, it seems that 192.168.201.3 does not resolve to the name 'engine'.
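
If you want to confirm this on the host, something along these lines should list the names the address reverse-resolves to. It is only a small diagnostic sketch; the helper name is made up and it uses nothing beyond the standard socket module.

```python
import socket


def reverse_names(ip):
    """Return the hostname and aliases that 'ip' reverse-resolves to.

    An empty list means the resolver has no reverse entry for it.
    """
    try:
        hostname, aliases, _ = socket.gethostbyaddr(ip)
    except OSError:
        # socket.herror and socket.gaierror are both OSError subclasses.
        return []
    return [hostname] + list(aliases)
```

Running reverse_names('192.168.201.3') on host0 and checking whether 'engine' is in the result would tell us if the lookup is what makes the check fail.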

Thanks,
Piotr

On Sun, Apr 30, 2017 at 12:52 PM, Nadav Goldin <ngoldin@redhat.com> wrote:
Looking at the failure, I'm not sure what is wrong here on the setup
side. The FQDN (lago-basic-suite-master-engine) should be resolvable on
the hosts, at least from what I tested locally. In the engine
setup.log I see this was the generated certificate (if we're talking
about the same one here):

2017-04-30 06:30:41,308-0400 DEBUG
otopi.plugins.ovirt_engine_setup.ovirt_engine.pki.ca
plugin.executeRaw:813 execute:
('/usr/share/ovirt-engine/bin/pki-enroll-pkcs12.sh', '--name=engine',
'--password=**FILTERED**',
'--subject=/C=US/O=Test/CN=lago-basic-suite-master-engine'),
executable='None', cwd='None', env=None
2017-04-30 06:30:44,542-0400 DEBUG
otopi.plugins.ovirt_engine_setup.ovirt_engine.pki.ca
plugin.executeRaw:863 execute-result:
('/usr/share/ovirt-engine/bin/pki-enroll-pkcs12.sh', '--name=engine',
'--password=**FILTERED**',
'--subject=/C=US/O=Test/CN=lago-basic-suite-master-engine'), rc=0
2017-04-30 06:30:44,543-0400 DEBUG
otopi.plugins.ovirt_engine_setup.ovirt_engine.pki.ca
plugin.execute:921 execute-output:
('/usr/share/ovirt-engine/bin/pki-enroll-pkcs12.sh', '--name=engine',
'--password=**FILTERED**',
'--subject=/C=US/O=Test/CN=lago-basic-suite-master-engine')


Do we expect the '--name' parameter to be the same as the hostname? My
thought was that it should use the engine FQDN, and that should match
the certificate name.

If that is not the problem, can you make the output in the vdsm logs
more verbose, so we'll know exactly what name it is looking for?


Thanks

Nadav.

On Sun, Apr 30, 2017 at 1:43 PM, Piotr Kliczewski <pkliczew@redhat.com> wrote:
> The job failed.
>
> Just to be clear: we need to either resolve the engine name on the host
> side or use the IP address.
>
> Thanks,
> Piotr
>
> On Sun, Apr 30, 2017 at 12:23 PM, Piotr Kliczewski <pkliczew@redhat.com>
> wrote:
>>
>> Here is the link
>>
>> http://jenkins.ovirt.org/job/ovirt-system-tests_manual/331/
>>
>> On Sun, Apr 30, 2017 at 12:17 PM, Piotr Kliczewski <pkliczew@redhat.com>
>> wrote:
>>>
>>> Sure, will test
>>>
>>> On 30 Apr 2017 12:14, "Nadav Goldin" <ngoldin@redhat.com> wrote:
>>>>
>>>> It is in progress in [1]. As it requires cross-changes in all suites, it
>>>> takes a while to test it and cover all changes, though basic-suite-master
>>>> already passed.
>>>> Can you test it by running OST manual with your changes and the OST
>>>> patch (i.e. also put refs/changes/25/76225/7 in GERRIT_REFSPEC)?
>>>>
>>>>
>>>>
>>>> [1] https://gerrit.ovirt.org/76225
>>>>
>>>> On Sun, Apr 30, 2017 at 1:09 PM, Yaniv Kaul <ykaul@redhat.com> wrote:
>>>> >
>>>> >
>>>> > On Sun, Apr 30, 2017 at 1:03 PM, Piotr Kliczewski
>>>> > <piotr.kliczewski@gmail.com> wrote:
>>>> >>
>>>> >> When can we have it fixed? I checked a few minutes ago and the problem
>>>> >> is still there.
>>>> >
>>>> >
>>>> > https://gerrit.ovirt.org/#/c/76225/ should cover this.
>>>> >
>>>> > What I wonder is what caused this in the first place. The SSL change?
>>>> > Y.
>>>> >
>>>> >>
>>>> >>
>>>> >> Thanks,
>>>> >> Piotr
>>>> >>
>>>> >> On Sat, Apr 29, 2017 at 11:18 AM, Piotr Kliczewski
>>>> >> <pkliczew@redhat.com>
>>>> >> wrote:
>>>> >> > Nadav,
>>>> >> >
>>>> >> > Yes, vdsm is not able to resolve 'engine' which is used in engine's
>>>> >> > certificate.
>>>> >> >
>>>> >> > Thanks,
>>>> >> > Piotr
>>>> >> >
>>>> >> > On 29 Apr 2017 00:37, "Nadav Goldin" <ngoldin@redhat.com> wrote:
>>>> >> >
>>>> >> > Hi Piotr,
>>>> >> > Can you clarify what you noticed is not resolvable - the 'engine'
>>>> >> > FQDN from host0?
>>>> >> >
>>>> >> > Thanks,
>>>> >> > Nadav.
>>>> >> >
>>>> >> >
>>>> >> > On Fri, Apr 28, 2017 at 4:15 PM, Piotr Kliczewski
>>>> >> > <pkliczew@redhat.com>
>>>> >> > wrote:
>>>> >> >> I started to investigate the issue [1] and it seems like there is
>>>> >> >> an issue in the Lago setup we use.
>>>> >> >>
>>>> >> >> During the handshake we have a step to verify whether the client
>>>> >> >> certificate was issued for a specific host (there is no such
>>>> >> >> functionality in the m2crypto code base). It works fine when using
>>>> >> >> either IP addresses or FQDNs, but in this particular setup we use
>>>> >> >> a mix of both.
>>>> >> >>
>>>> >> >> After adding logging I see that the engine certificate uses the
>>>> >> >> name 'engine', which is not resolvable on the host side, so the
>>>> >> >> check fails.
>>>> >> >> I posted a patch [2] which fixes the IPv4-mapped address issue, but
>>>> >> >> we need to fix the setup issue.
>>>> >> >>
>>>> >> >> Thanks,
>>>> >> >> Piotr
>>>> >> >>
>>>> >> >> [1] http://jenkins.ovirt.org/job/ovirt-system-tests_manual/326/
>>>> >> >> [2] https://gerrit.ovirt.org/#/c/76197/
>>>> >> >>
>>>> >> >> On Thu, Apr 27, 2017 at 3:39 PM, Piotr Kliczewski
>>>> >> >> <pkliczew@redhat.com>
>>>> >> >> wrote:
>>>> >> >>>
>>>> >> >>>
>>>> >> >>>
>>>> >> >>> On Thu, Apr 27, 2017 at 3:13 PM, Evgheni Dereveanchin
>>>> >> >>> <ederevea@redhat.com> wrote:
>>>> >> >>>>
>>>> >> >>>> Test failed: 002_bootstrap/add_hosts
>>>> >> >>>>
>>>> >> >>>> Link to suspected patches:
>>>> >> >>>>  https://gerrit.ovirt.org/76107 - ssl: change default library
>>>> >> >>>>
>>>> >> >>>> Link to job:
>>>> >> >>>> http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/6491/
>>>> >> >>>>
>>>> >> >>>> VDSM log:
>>>> >> >>>> http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/6491/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-host0/_var_log/vdsm/vdsm.log
>>>> >> >>>>
>>>> >> >>>> Error snippet from the VDSM log; this repeats on each connection
>>>> >> >>>> attempt from the Engine side:
>>>> >> >>>>
>>>> >> >>>> <error>
>>>> >> >>>>
>>>> >> >>>> 2017-04-27 06:39:27,768-0400 INFO  (Reactor thread)
>>>> >> >>>> [ProtocolDetector.AcceptorImpl] Accepted connection from
>>>> >> >>>> ::ffff:192.168.201.3:49530 (protocoldetector:74)
>>>> >> >>>> 2017-04-27 06:39:27,898-0400 ERROR (Reactor thread)
>>>> >> >>>> [vds.dispatcher] uncaptured python exception, closing channel
>>>> >> >>>> <yajsonrpc.betterAsyncore.Dispatcher connected
>>>> >> >>>> ('::ffff:192.168.201.3', 49530, 0, 0) at 0x1cc3b00>
>>>> >> >>>> (<class 'socket.error'>:Address family not supported by protocol
>>>> >> >>>> [/usr/lib64/python2.7/asyncore.py|readwrite|110]
>>>> >> >>>> [/usr/lib64/python2.7/asyncore.py|handle_write_event|468]
>>>> >> >>>> [/usr/lib/python2.7/site-packages/yajsonrpc/betterAsyncore.py|handle_write|70]
>>>> >> >>>> [/usr/lib/python2.7/site-packages/yajsonrpc/betterAsyncore.py|_delegate_call|149]
>>>> >> >>>> [/usr/lib/python2.7/site-packages/vdsm/sslutils.py|handle_write|213]
>>>> >> >>>> [/usr/lib/python2.7/site-packages/vdsm/sslutils.py|_handle_io|223]
>>>> >> >>>> [/usr/lib/python2.7/site-packages/vdsm/sslutils.py|_verify_host|237]
>>>> >> >>>> [/usr/lib/python2.7/site-packages/vdsm/sslutils.py|compare_names|249])
>>>> >> >>>> (betterAsyncore:160)
>>>> >> >>>>
>>>> >> >>>> </error>
>>>> >> >>>
>>>> >> >>>
>>>> >> >>> This means that what we have in the certificate does not match the
>>>> >> >>> source address we get. I suspect that we issue the certificate for
>>>> >> >>> 192.168.201.3 but the address we get is ::ffff:192.168.201.3.
>>>> >> >>> The change was verified in an environment where IPv4 is used. I
>>>> >> >>> pushed a revert [1] for now so we can work on fixing the issue.
>>>> >> >>>
>>>> >> >>> [1] https://gerrit.ovirt.org/#/c/76160
>>>> >> >>>
>>>> >> >>>>
>>>> >> >>>> --
>>>> >> >>>> Regards,
>>>> >> >>>> Evgheni Dereveanchin
>>>> >> >>>
>>>> >> >>>
>>>> >> >>
>>>> >> >>
>>>> >> >> _______________________________________________
>>>> >> >> Devel mailing list
>>>> >> >> Devel@ovirt.org
>>>> >> >> http://lists.ovirt.org/mailman/listinfo/devel
>>>> >> >
>>>> >> >
>>>> >> >
>>>> >
>>>> >
>>>> >
>>
>>
>