[ovirt-users] Engine setup: insistent DNS demand
Yedidyah Bar David
didi at redhat.com
Tue Nov 10 08:26:24 UTC 2015
On Mon, Nov 9, 2015 at 8:27 PM, Jamie Lawrence
<jlawrence at squaretrade.com> wrote:
> On 8 Nov 2015, at 1:32, Yedidyah Bar David wrote:
>
>> On Sat, Nov 7, 2015 at 3:00 AM, Jamie Lawrence
>> <jlawrence at squaretrade.com> wrote:
>
>
>>> I’m attempting to run engine-setup, and get to the DNS reverse lookup of the
>>> FQDN. The machine has two (bonded) interfaces, one for storage and one for
>>> everything else. The “everything else” network has DNS service, the storage
>>> network doesn’t, and this seems to make engine-setup cranky. /etc/hosts is
>>> properly set up for the storage network, but that apparently doesn’t count.
>>> I tried running with the -offline flag, but that apparently still expects
>>> DNS.
(--offline affects only package management, e.g. to prevent a package upgrade.)
>>
>>
>> IIUC engine-setup never fails on missing DNS resolution, only warns.
Sorry, that was wrong. Let me sketch the flow:
We prompt for the FQDN; let's call the answer FQDN.
We then check what FQDN resolves to using getaddrinfo; let's call the result
resolvedAddresses.
We also look up FQDN in DNS (using 'dig'). If it does not resolve, we only warn.
If it does resolve in DNS, *and* a special variable is set, we do a reverse
lookup (dig -x) of each address in resolvedAddresses, and if none of the
results matches FQDN, we fail with the error you see.
This special variable is set by default only if you configured all-in-one.
This is the point where it failed for you. If it hadn't, we would then do the
following:
If another special variable is set (which is also set by default only in
all-in-one), we check whether resolvedAddresses is a subset of the addresses
of the non-loopback local interfaces, and fail if it is not. This is the only
place where we actually check local addresses.
Using getaddrinfo means we effectively use the local resolver as you
configured it in /etc/nsswitch.conf, which by default looks first in
/etc/hosts and then in DNS.
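The flow above can be sketched in Python as follows. This is a simplified illustration, not the actual engine-setup code: the names validate_fqdn, dns_forward, dns_reverse, check_reverse and check_local are hypothetical, the two "special variables" are modelled as boolean flags, and the lookups are passed in as functions so the logic can be exercised without a network. It also assumes the local-address check runs whether or not FQDN was found in DNS. Only resolve_with_local_resolver touches the system, via socket.getaddrinfo.

```python
import socket
from typing import Callable, List, Set

# Hypothetical type: a lookup function returning a set of strings.
Lookup = Callable[[str], Set[str]]


def resolve_with_local_resolver(fqdn: str) -> Set[str]:
    """resolvedAddresses: getaddrinfo follows /etc/nsswitch.conf (usually files, then dns)."""
    return {info[4][0] for info in socket.getaddrinfo(fqdn, None)}


def validate_fqdn(
    fqdn: str,
    resolved: Set[str],
    dns_forward: Lookup,
    dns_reverse: Lookup,
    local_addresses: Set[str],
    check_reverse: bool = False,  # engine-setup enables this by default only for all-in-one
    check_local: bool = False,    # likewise, by default only for all-in-one
) -> List[str]:
    """Return warnings; raise RuntimeError where engine-setup would fail."""
    warnings: List[str] = []
    if not dns_forward(fqdn):
        # FQDN is not in DNS at all: this only produces a warning.
        warnings.append("%s did not resolve in DNS" % fqdn)
    elif check_reverse:
        # Reverse-resolve every address; fail if none of the names is FQDN.
        names: Set[str] = set()
        for address in resolved:
            names |= {n.rstrip(".").lower() for n in dns_reverse(address)}
        if fqdn.lower() not in names:
            raise RuntimeError(
                "The following addresses: %s did not reverse resolve into %s"
                % (", ".join(sorted(resolved)), fqdn)
            )
    if check_local and not resolved <= local_addresses:
        # resolvedAddresses must be a subset of the non-loopback local addresses.
        raise RuntimeError(
            "%s resolves to %s, which are not all local addresses"
            % (fqdn, ", ".join(sorted(resolved)))
        )
    return warnings


if __name__ == "__main__":
    # Replay of the log: /etc/hosts gives 172.16.1.13, DNS gives 10.180.202.13,
    # and the reverse query for 172.16.1.13 gets no answer (modelled as empty).
    try:
        validate_fqdn(
            "box-3.squaretrade.com",
            resolved={"172.16.1.13"},
            dns_forward=lambda name: {"10.180.202.13"},
            dns_reverse=lambda address: set(),
            local_addresses={"172.16.1.13", "10.180.202.13"},  # assumed
            check_reverse=True,
            check_local=True,
        )
    except RuntimeError as err:
        print(err)
        # prints: The following addresses: 172.16.1.13 did not reverse resolve into box-3.squaretrade.com
```

Injecting the lookups keeps the decision logic separate from dig and the resolver, which is what makes it possible to replay a logged failure like the one below.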
>
>
> I may well be missing something or otherwise being bone-headed, but I am
> getting [ Error ] messages, which it doesn’t allow me to skip.
>
>>> Details:
>>> ovirt-engine.noarch 0:3.6.0.3-1.el7.centos
>>> ovirt-engine-setup-plugin-allinone.noarch 0:3.6.0.3-1.el7.centos
>>>
>>> CentOS Linux release 7.1.1503 (Core)
>
>
>> Please check/post setup logs. Thanks!
>
>
> The full log is pushing 350k; unless you want, I’m not going to do that to
> the mailing list.
Well, you can use other means for that, such as pastebins, file-sharing
sites or whatever.
>
> The relevant portion (starting a bit before) seems to be:
>
> - - - Snip - - -
> 2015-11-06 16:52:26 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:219 DIALOG:SEND Local storage domain name [local_storage]:
> 2015-11-06 16:52:27 DEBUG otopi.context context.dumpEnvironment:500 ENVIRONMENT DUMP - BEGIN
> 2015-11-06 16:52:27 DEBUG otopi.context context.dumpEnvironment:510 ENV OVESETUP_AIO/storageDomainDir=str:'/mnt/gluster/vm-img-brick-1/gv0'
> 2015-11-06 16:52:27 DEBUG otopi.context context.dumpEnvironment:510 ENV OVESETUP_AIO/storageDomainName=str:'local_storage'
> 2015-11-06 16:52:27 DEBUG otopi.context context.dumpEnvironment:510 ENV OVESETUP_SYSTEM/selinuxContexts=list:'[{'pattern': '/mnt/gluster/vm-img-brick-1/gv0(/.*)?', 'type': 'public_content_rw_t'}]'
> 2015-11-06 16:52:27 DEBUG otopi.context context.dumpEnvironment:510 ENV OVESETUP_SYSTEM/selinuxRestorePaths=list:'['/mnt/gluster/vm-img-brick-1/gv0']'
> 2015-11-06 16:52:27 DEBUG otopi.context context.dumpEnvironment:514 ENVIRONMENT DUMP - END
> 2015-11-06 16:52:27 DEBUG otopi.context context._executeMethod:142 Stage customization METHOD otopi.plugins.ovirt_engine_setup.ovirt_engine_common.dialog.titles.Plugin._title_e_allinone
> 2015-11-06 16:52:27 DEBUG otopi.context context._executeMethod:142 Stage customization METHOD otopi.plugins.ovirt_engine_setup.ovirt_engine_common.dialog.titles.Plugin._title_s_network
> 2015-11-06 16:52:27 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:219 DIALOG:SEND
> 2015-11-06 16:52:27 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:219 DIALOG:SEND --== NETWORK CONFIGURATION ==--
> 2015-11-06 16:52:27 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:219 DIALOG:SEND
> 2015-11-06 16:52:27 DEBUG otopi.context context._executeMethod:142 Stage customization METHOD otopi.plugins.ovirt_engine_common.base.network.hostname.Plugin._customization
> 2015-11-06 16:52:27 DEBUG otopi.plugins.otopi.dialog.human human.queryString:156 query OVESETUP_NETWORK_FQDN_this
> DIALOG:SEND Host fully qualified DNS name of this server [box-3.squaretrade.com]:
So here you were asked for the FQDN, and accepted the default, which was
box-3.squaretrade.com.
> 2015-11-06 16:52:29 DEBUG otopi.plugins.ovirt_engine_common.base.network.hostname hostname._validateFQDNresolvability:195 box-3.squaretrade.com resolves to: set(['172.16.1.13'])
That's resolvedAddresses, which contains exactly one address,
172.16.1.13.
> 2015-11-06 16:52:29 DEBUG otopi.plugins.ovirt_engine_common.base.network.hostname plugin.executeRaw:828 execute: ['/bin/dig', 'box-3.squaretrade.com'], executable='None', cwd='None', env=None
> 2015-11-06 16:52:29 DEBUG otopi.plugins.ovirt_engine_common.base.network.hostname plugin.executeRaw:878 execute-result: ['/bin/dig', 'box-3.squaretrade.com'], rc=0
> 2015-11-06 16:52:29 DEBUG otopi.plugins.ovirt_engine_common.base.network.hostname plugin.execute:936 execute-output: ['/bin/dig', 'box-3.squaretrade.com'] stdout:
>
> ; <<>> DiG 9.9.4-RedHat-9.9.4-18.el7_1.5 <<>> box-3.squaretrade.com
> ;; global options: +cmd
> ;; Got answer:
> ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 62281
> ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
>
> ;; QUESTION SECTION:
> ;box-3.squaretrade.com. IN A
>
> ;; ANSWER SECTION:
> box-3.squaretrade.com. 83752 IN A 10.180.202.13
That's the result of a DNS lookup for FQDN: 10.180.202.13.
As you can see, we might already have a problem here, as this address differs
from resolvedAddresses. I admit I did not fully understand what you are trying
to do, but this looks like a contradiction between what you put in DNS and
what you put in /etc/hosts. As explained above, this would eventually have
caused a failure, had we not failed earlier.
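This kind of contradiction can be spotted before running engine-setup by comparing what the local resolver returns with what DNS alone returns. A minimal Python 3 sketch, assuming the dig binary is available; the function names (local_resolver_addresses, parse_dig_short, dns_addresses, report_mismatch) are hypothetical helpers, not part of engine-setup.

```python
import ipaddress
import socket
import subprocess
from typing import Set


def local_resolver_addresses(name: str) -> Set[str]:
    """Addresses from getaddrinfo, i.e. /etc/hosts and DNS as ordered in nsswitch.conf."""
    return {info[4][0] for info in socket.getaddrinfo(name, None)}


def parse_dig_short(output: str) -> Set[str]:
    """Keep only the IP addresses from `dig +short` output."""
    addresses = set()
    for token in output.split():
        try:
            addresses.add(str(ipaddress.ip_address(token)))
        except ValueError:
            pass  # e.g. a CNAME target such as "alias.example.com."
    return addresses


def dns_addresses(name: str) -> Set[str]:
    """Ask DNS directly, bypassing /etc/hosts (requires the dig binary)."""
    result = subprocess.run(["dig", "+short", name], capture_output=True, text=True)
    return parse_dig_short(result.stdout)


def report_mismatch(name: str, local: Set[str], dns: Set[str]) -> str:
    """Describe whether the local resolver and DNS agree on the addresses of name."""
    if local == dns:
        return "%s: local resolver and DNS agree on %s" % (name, sorted(local))
    return "%s: local resolver says %s, DNS says %s" % (name, sorted(local), sorted(dns))
```

For example, report_mismatch(fqdn, local_resolver_addresses(fqdn), dns_addresses(fqdn)) with fqdn set to box-3.squaretrade.com should, judging by the two log entries above, have reported 172.16.1.13 on the local side and 10.180.202.13 on the DNS side.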
>
> ;; Query time: 0 msec
> ;; SERVER: 10.22.10.253#53(10.22.10.253)
> ;; WHEN: Fri Nov 06 16:52:29 PST 2015
> ;; MSG SIZE rcvd: 65
>
>
> 2015-11-06 16:52:29 DEBUG otopi.plugins.ovirt_engine_common.base.network.hostname plugin.execute:941 execute-output: ['/bin/dig', 'box-3.squaretrade.com'] stderr:
>
>
> 2015-11-06 16:52:29 DEBUG otopi.plugins.ovirt_engine_common.base.network.hostname plugin.executeRaw:828 execute: ['/bin/dig', '-x', '172.16.1.13'], executable='None', cwd='None', env=None
> 2015-11-06 16:52:44 DEBUG otopi.plugins.ovirt_engine_common.base.network.hostname plugin.executeRaw:878 execute-result: ['/bin/dig', '-x', '172.16.1.13'], rc=9
> 2015-11-06 16:52:44 DEBUG otopi.plugins.ovirt_engine_common.base.network.hostname plugin.execute:936 execute-output: ['/bin/dig', '-x', '172.16.1.13'] stdout:
>
> ; <<>> DiG 9.9.4-RedHat-9.9.4-18.el7_1.5 <<>> -x 172.16.1.13
> ;; global options: +cmd
> ;; connection timed out; no servers could be reached
Here we try to reverse-resolve 172.16.1.13 in DNS, and the query times out
(dig exits with rc=9). I'm not sure why. Perhaps your DNS server is configured
not to reply to queries about private addresses, or something like that; I'd
expect an NXDOMAIN reply here (or an actual answer).
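The rc=9 above is dig's own exit status. When checking the reverse zone by hand, a small Python 3 sketch like the following can tell a timeout apart from an NXDOMAIN. It assumes dig is installed; the exit-status table follows the dig(1) man page, while the function names (describe_dig_rc, reverse_lookup) and the choice of +short, +time and +tries are illustrative.

```python
import subprocess
from typing import List, Tuple

# Exit statuses as documented in the dig(1) man page.
DIG_EXIT_STATUS = {
    0: "DNS response received, including NXDOMAIN status",
    1: "usage error",
    8: "couldn't open batch file",
    9: "no reply from server",
    10: "internal error",
}


def describe_dig_rc(rc: int) -> str:
    """Translate a dig exit status into words."""
    return DIG_EXIT_STATUS.get(rc, "unknown exit status %d" % rc)


def reverse_lookup(address: str, timeout: int = 5, tries: int = 1) -> Tuple[List[str], str]:
    """Run `dig +short -x ADDRESS` and return (PTR names, status description).

    An empty name list with status 0 means the server answered (e.g. NXDOMAIN);
    an empty list with status 9 means no server replied at all, as in the log.
    """
    cmd = ["dig", "+short", "+time=%d" % timeout, "+tries=%d" % tries, "-x", address]
    result = subprocess.run(cmd, capture_output=True, text=True)
    # PTR answers end with a dot; dig's ";; connection timed out" text does not.
    names = [token.rstrip(".") for token in result.stdout.split() if token.endswith(".")]
    return names, describe_dig_rc(result.returncode)
```

On the machine in the log, reverse_lookup("172.16.1.13") would presumably return an empty list with "no reply from server", unless the DNS server's behaviour has changed since. The 15 seconds between the two log entries (16:52:29 to 16:52:44) are consistent with dig's defaults of 3 tries with a 5-second timeout each.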
>
> 2015-11-06 16:52:44 DEBUG otopi.plugins.ovirt_engine_common.base.network.hostname plugin.execute:941 execute-output: ['/bin/dig', '-x', '172.16.1.13'] stderr:
>
>
> 2015-11-06 16:52:44 DEBUG otopi.plugins.ovirt_engine_common.base.network.hostname hostname.test_hostname:323 test_hostname exception
> Traceback (most recent call last):
>   File "/usr/share/ovirt-engine/setup/ovirt_engine_setup/hostname.py", line 320, in test_hostname
>     self._validateFQDNresolvability(name)
>   File "/usr/share/ovirt-engine/setup/ovirt_engine_setup/hostname.py", line 232, in _validateFQDNresolvability
>     fqdn=fqdn
> RuntimeError: The following addresses: 172.16.1.13 did not reverse resolve into sfo-mgmt-prod-3.squaretrade.com
> 2015-11-06 16:52:44 ERROR otopi.plugins.ovirt_engine_common.base.network.hostname dialog.queryEnvKey:115 Host name is not valid: The following addresses: 172.16.1.13 did not reverse resolve into box-3.squaretrade.com
And this is the error message you received, which I think makes sense even
without understanding how engine-setup works.
> 2015-11-06 16:52:44 DEBUG otopi.plugins.otopi.dialog.human human.queryString:156 query OVESETUP_NETWORK_FQDN_this
> 2015-11-06 16:52:44 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:219 DIALOG:SEND Host fully qualified DNS name of this server [box-3.squaretrade.com]:
> 2015-11-06 16:52:47 DEBUG otopi.context context._executeMethod:156 method exception
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/otopi/context.py", line 146, in _executeMethod
>     method['method']()
>   File "/usr/share/ovirt-engine/setup/bin/../plugins/ovirt-engine-common/base/network/hostname.py", line 81, in _customization
>     supply_default=True,
>   File "/usr/share/ovirt-engine/setup/ovirt_engine_setup/hostname.py", line 349, in getHostname
>     'test': test_hostname,
>   File "/usr/share/ovirt-engine/setup/ovirt_engine_setup/dialog.py", line 103, in queryEnvKey
>     default=default,
>   File "/usr/share/otopi/plugins/otopi/dialog/human.py", line 174, in queryString
>     value = self._readline(hidden=hidden)
>   File "/usr/lib/python2.7/site-packages/otopi/dialog.py", line 261, in _readline
>     value = self.__input.readline()
>   File "/usr/lib/python2.7/site-packages/otopi/main.py", line 63, in _signal
>     raise RuntimeError("SIG%s" % signum)
> RuntimeError: SIG2
> 2015-11-06 16:52:47 ERROR otopi.context context._executeMethod:165 Failed to execute stage 'Environment customization': SIG2
> 2015-11-06 16:52:47 DEBUG otopi.context context.dumpEnvironment:500 ENVIRONMENT DUMP - BEGIN
>
> - - - Snip - - -
>
> It looks like it resolves the non-storage network fine, tries to do the same
> for the storage network, can’t, and errors out.
No, it only checks the addresses that FQDN itself resolves to.
As I wrote above, I do not fully understand what you are trying to do.
May I suggest that you simply use different names for the different
addresses everywhere, both in DNS and in /etc/hosts.
E.g. if your storage interface is 172.16.1.13, give it a different name, say
box-3-storage.squaretrade.com or whatever. Do this in DNS, in /etc/hosts, or
in both, but if both, make them identical. Do the same for the other address.
And when asked for the FQDN, enter the name you want to use to access the
engine.
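To keep the naming consistent as suggested above, a small check can compare an /etc/hosts-style file with the answers DNS gives. This is a hypothetical helper, not part of engine-setup: parse_hosts and naming_problems are illustrative names, and the DNS answers are passed in as a plain dictionary (they could be gathered with something like `dig +short NAME`).

```python
from typing import Dict, List, Set


def parse_hosts(text: str) -> Dict[str, Set[str]]:
    """Map each name (canonical or alias) in /etc/hosts-style text to its addresses."""
    names: Dict[str, Set[str]] = {}
    for raw in text.splitlines():
        fields = raw.split("#", 1)[0].split()  # drop comments, then split on whitespace
        if len(fields) < 2:
            continue  # blank or comment-only line
        address, hostnames = fields[0], fields[1:]
        for name in hostnames:
            names.setdefault(name.lower(), set()).add(address)
    return names


def naming_problems(hosts: Dict[str, Set[str]], dns: Dict[str, Set[str]]) -> List[str]:
    """Flag names with several addresses in hosts, or that disagree with DNS."""
    problems = []
    for name in sorted(set(hosts) | set(dns)):
        in_hosts, in_dns = hosts.get(name, set()), dns.get(name, set())
        if len(in_hosts) > 1:
            problems.append("%s has several addresses in /etc/hosts: %s" % (name, sorted(in_hosts)))
        if in_hosts and in_dns and in_hosts != in_dns:
            problems.append("%s: /etc/hosts %s != DNS %s" % (name, sorted(in_hosts), sorted(in_dns)))
    return problems
```

With a hypothetical /etc/hosts that gives the storage address its own name (10.180.202.13 as box-3.squaretrade.com, 172.16.1.13 as box-3-storage.squaretrade.com) and DNS returning 10.180.202.13 for box-3.squaretrade.com, the list comes back empty; mapping box-3.squaretrade.com to 172.16.1.13 in /etc/hosts instead would be flagged as disagreeing with DNS.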
>
> Thanks for looking.
Thanks for the report.
If, after reading the above analysis, you feel that engine-setup should behave
differently, by all means go ahead and open a bug. But if you do, please
describe exactly what you want.
And if what you want is "Please add a flag or whatever that will allow me
to override all this name lookup mess and just make engine-setup do what I
say", please consider that the current behavior actually found something
which I personally think is unintended. It helped you catch it now, instead
of perhaps spending much more time on it later, in a much less comfortable
situation, when something actually breaks because of it.
Best,
--
Didi