On Thu, 7 Nov 2019 at 11:23, Nathanaël Blanchet <blanchet@abes.fr> wrote:
On 07/11/2019 at 07:18, Roy Golan wrote:
>
>
> On Thu, 7 Nov 2019 at 00:10, Nathanaël Blanchet <blanchet@abes.fr> wrote:
>
>
> On 05/11/2019 at 21:50, Roy Golan wrote:
>>
>>
>> On Tue, 5 Nov 2019 at 22:46, Roy Golan <rgolan@redhat.com> wrote:
>>
>>
>>
>> On Tue, 5 Nov 2019 at 20:28, Nathanaël Blanchet <blanchet@abes.fr> wrote:
>>
>>
>> On 05/11/2019 at 18:22, Roy Golan wrote:
>>>
>>>
>>> On Tue, 5 Nov 2019 at 19:12, Nathanaël Blanchet <blanchet@abes.fr> wrote:
>>>
>>>
>>> On 05/11/2019 at 13:54, Roy Golan wrote:
>>>>
>>>>
>>>> On Tue, 5 Nov 2019 at 14:52, Nathanaël Blanchet <blanchet@abes.fr> wrote:
>>>>
>>>> I tried openshift-install after compiling, but no oVirt provider is
>>>> available... So what do you mean when you say "give it a try"? Maybe
>>>> only provisioning oVirt with the Terraform module?
>>>>
>>>> [root@vm5 installer]# bin/openshift-install create cluster
>>>> ? Platform [Use arrows to move, space to select, type to filter, ? for more help]
>>>> > aws
>>>> azure
>>>> gcp
>>>> openstack
>>>>
>>>>
>>>> It's not merged yet. Please pull this image and work with it as a
>>>> container: quay.io/rgolangh/openshift-installer
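>>>>
>>>> Something along these lines, for example (just a sketch - the /output mount
>>>> point and the ssh key location are assumptions about how the image expects
>>>> its install directory to be laid out):
>>>>
>>>> # keep the install dir on the host and make an ssh key reachable by the installer
>>>> docker pull quay.io/rgolangh/openshift-installer
>>>> mkdir -p install-dir/.ssh && cp ~/.ssh/id_rsa install-dir/.ssh/
>>>> docker run -it -v $PWD/install-dir:/output quay.io/rgolangh/openshift-installer create cluster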
>>>
>>> A little feedback as you asked:
>>>
>>> [root@openshift-installer ~]# docker run -it 56e5b667100f create cluster
>>> ? Platform ovirt
>>> ? Enter oVirt's api endpoint URL https://air-dev.v100.abes.fr/ovirt-engine/api
>>> ? Enter ovirt-engine username admin@internal
>>> ? Enter password **********
>>> ? Pick the oVirt cluster Default
>>> ? Pick a VM template centos7.x
>>> ? Enter the internal API Virtual IP 10.34.212.200
>>> ? Enter the internal DNS Virtual IP 10.34.212.100
>>> ? Enter the ingress IP 10.34.212.50
>>> ? Base Domain oc4.localdomain
>>> ? Cluster Name test
>>> ? Pull Secret [? for help] *************************************
>>> INFO Creating infrastructure resources...
>>> INFO Waiting up to 30m0s for the Kubernetes API at https://api.test.oc4.localdomain:6443...
>>> ERROR Attempted to gather ClusterOperator status after installation failure: listing ClusterOperator objects: Get https://api.test.oc4.localdomain:6443/apis/config.openshift.io/v1/cluster...: dial tcp: lookup api.test.oc4.localdomain on 10.34.212.100:53: no such host
>>> INFO Pulling debug logs from the bootstrap machine
>>> ERROR Attempted to gather debug logs after installation failure: failed to create SSH client, ensure the proper ssh key is in your keyring or specify with --key: failed to initialize the SSH agent: failed to read directory "/output/.ssh": open /output/.ssh: no such file or directory
>>> FATAL Bootstrap failed to complete: waiting for Kubernetes API: context deadline exceeded
>>>
>>> * 6 VMs are successfully created, thin-provisioned from the template
>>> * each VM is provisioned by cloud-init
>>> * the step "INFO Waiting up to 30m0s for the Kubernetes API at
>>> https://api.test.oc4.localdomain:6443..." fails. It seems that the DNS pod
>>> is not up at this time.
>>> * at this point there is no visibility into what is being done or what
>>> goes wrong... what's happening there? Presumably some kind of playbook
>>> downloading some kind of images...
>>> * the "pull secret" step is not clear: we must have a Red Hat account on
>>> https://cloud.redhat.com/openshift/install/ to get a key like:
>>>
>>> {"auths":{"cloud.openshift.com":{"auth":"b3BlbnNoaWZ0...","email":"exploit@abes.fr"},"quay.io":{"auth":"b3BlbnNoaWZ0LXJlbGVhc2Ut...","email":"exploit@abes.fr"},"registry.connect.redhat.com":{"auth":"NTI0...","email":"exploit@abes.fr"},"registry.redhat.io":{"auth":"NTI0MjkwMnx1a...","email":"exploit@abes.fr"}}}
>>>
>>> Can you tell me if I'm doing something wrong?
>>>
>>>
>>> What is the template you are using? I don't think it's an RHCOS (Red Hat
>>> CoreOS) template, it looks like CentOS?
>>>
>>> Use this gist to import the template:
>>> https://gist.github.com/rgolangh/adccf6d6b5eaecaebe0b0aeba9d3331b
>> Unfortunately, the result is the same with the RHCOS
>> template...
>>
>>
>> Make sure that:
>> - the IPs supplied are taken from, and belong to, the VM network of the master VMs
>> - a localdomain or local domain suffix shouldn't be used
>> - your ovirt-engine is version 4.3.7 or master
>>
>> I didn't mention that you can provide any domain name, even a non-existing one.
>> When the bootstrap phase is done, the installation will tear down the
>> bootstrap machine.
>> At that stage, if you are using a non-existing domain, you need to add the
>> DNS Virtual IP you provided to your resolv.conf so the installation can
>> resolve api.$CLUSTER_NAME.$CLUSTER_DOMAIN.
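>>
>> For example (just a sketch, using the values from this run - the DNS Virtual IP
>> 10.34.212.100 and the test.oc4.localdomain cluster domain):
>>
>> # on the machine running the installer, point resolution at the DNS VIP
>> echo "nameserver 10.34.212.100" >> /etc/resolv.conf
>> # the API name should then resolve
>> dig +short api.test.oc4.localdomain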
>>
>> Also, you have a log under your
>> $INSTALL_DIR/.openshift_install.log
>
> I tried several things with your advice, but I'm still stuck at the
> https://api.test.oc4.localdomain:6443/version?timeout=32s test
>
> with logs:
>
> time="2019-11-06T20:21:15Z" level=debug msg="Still waiting
> for the Kubernetes API: the server could not find the
> requested resource"
>
> So it means DNS resolution and networking are now good and ignition
> provisioning is OK, but something goes wrong with the bootstrap VM.
>
> Now if I log into the bootstrap VM, I can see an SELinux message, but it
> may not be relevant...
>
> SELinux: mount invalid. Same Superblock, different security
> settings for (dev nqueue, type nqueue).
>
> Some other clues with journalctl:
>
> journalctl -b -f -u bootkube
>
> Nov 06 21:55:40 localhost bootkube.sh[2101]: {"level":"warn","ts":"2019-11-06T21:55:40.661Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-7beef51d-daad-4b46-9497-8e135e528f7c/etcd-1.test.oc4.localdomain:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest connection error: connection error: desc = \"transport: Error while dialing dial tcp: lookup etcd-1.test.oc4.localdomain on 10.34.212.101:53: no such host\""}
> Nov 06 21:55:40 localhost bootkube.sh[2101]: {"level":"warn","ts":"2019-11-06T21:55:40.662Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-03992fc6-5a87-4160-9b87-44ec6e82f7cd/etcd-2.test.oc4.localdomain:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest connection error: connection error: desc = \"transport: Error while dialing dial tcp: lookup etcd-2.test.oc4.localdomain on 10.34.212.101:53: no such host\""}
> Nov 06 21:55:40 localhost bootkube.sh[2101]: {"level":"warn","ts":"2019-11-06T21:55:40.662Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-00db28a7-5188-4666-896b-e37c88ad3ae9/etcd-0.test.oc4.localdomain:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest connection error: connection error: desc = \"transport: Error while dialing dial tcp: lookup etcd-0.test.oc4.localdomain on 10.34.212.101:53: no such host\""}
> Nov 06 21:55:40 localhost bootkube.sh[2101]: https://etcd-1.test.oc4.localdomain:2379 is unhealthy: failed to commit proposal: context deadline exceeded
> Nov 06 21:55:40 localhost bootkube.sh[2101]: https://etcd-2.test.oc4.localdomain:2379 is unhealthy: failed to commit proposal: context deadline exceeded
> Nov 06 21:55:40 localhost bootkube.sh[2101]: https://etcd-0.test.oc4.localdomain:2379 is unhealthy: failed to commit proposal: context deadline exceeded
> Nov 06 21:55:40 localhost bootkube.sh[2101]: Error: unhealthy cluster
> Nov 06 21:55:40 localhost podman[61210]: 2019-11-06 21:55:40.720514151 +0000 UTC m=+5.813853296 container died 7db3014e3f19c61775bac2a7a155eeb8521a6b78fea0d512384dd965cb0b8b01 (image=registry.svc.ci.openshift.org/origin/4.3-2019-10-29-180250@sha256:6..., name=etcdctl)
> Nov 06 21:55:40 localhost podman[61210]: 2019-11-06 21:55:40.817475095 +0000 UTC m=+5.910814273 container remove 7db3014e3f19c61775bac2a7a155eeb8521a6b78fea0d512384dd965cb0b8b01 (image=registry.svc.ci.openshift.org/origin/4.3-2019-10-29-180250@sha256:6..., name=etcdctl)
> Nov 06 21:55:40 localhost bootkube.sh[2101]: etcdctl failed. Retrying in 5 seconds...
>
> It seems to be a DNS resolution issue again.
>
> [user1@localhost ~]$ dig api.test.oc4.localdomain +short
> 10.34.212.201
>
> [user1@localhost ~]$ dig etcd-2.test.oc4.localdomain +short
> (no result)
>
>
> So what do you think about that?
>
>
> The key here is the masters: they need to boot, get ignition from the
> bootstrap machine and start publishing their IPs and hostnames.
>
> Connect to a master, check its hostname, and check its running or failing
> containers with `crictl ps -a` as the root user.
You were right:
# crictl ps -a
CONTAINER ID    IMAGE                                                              CREATED          STATE     NAME        ATTEMPT   POD ID
744cb8e654705   e77034cf36baff5e625acbba15331db68e1d84571f977d254fd833341158daa8  4 minutes ago    Running   discovery   75        9462e9a8ca478
912ba9db736c3   e77034cf36baff5e625acbba15331db68e1d84571f977d254fd833341158daa8  14 minutes ago   Exited    discovery   74        9462e9a8ca478
# crictl logs 744cb8e654705
E1107 08:10:04.262330 1 run.go:67] error looking up self for candidate IP 10.34.212.227: lookup _etcd-server-ssl._tcp.test.oc4.localdomain on 10.34.212.51:53: no such host
# hostname
localhost
Conclusion: discovery didn't publish the IPs and hostname to CoreDNS because
the master didn't get its name master-0.test.oc4.localdomain during the
provisioning phase.
I changed the master-0 hostname and re-initiated ignition to verify:
# hostnamectl set-hostname master-0.test.oc4.localdomain
# touch /boot/ignition.firstboot && rm -rf /etc/machine-id && reboot
After the reboot completed, there is no longer an exited discovery container:
CONTAINER ID    IMAGE                                                                            CREATED          STATE     NAME                 ATTEMPT   POD ID
e701efa8bc583   77ec5e26cc676ef2bf5c42dd40e55394a11fb45a3e2d7e95cbaf233a1eef472f                20 seconds ago   Running   coredns              1         cbabc53322ac8
2c7bc6abb5b65   d73eca122bd567a3a1f70fa5021683bc17dd87003d05d88b1cdd0215c55049f6                20 seconds ago   Running   mdns-publisher       1         6f8914ff9db35
b3f619d5afa2c   7aa184de043265814f9a775968234ac3280a285056da773f1aba0917e9615370                21 seconds ago   Running   haproxy-monitor      1         0e5c209496787
07769ce79b032   7aa184de043265814f9a775968234ac3280a285056da773f1aba0917e9615370                21 seconds ago   Running   keepalived-monitor   1         02cf141d01a29
fb20d66b81254   e77034cf36baff5e625acbba15331db68e1d84571f977d254fd833341158daa8                21 seconds ago   Running   discovery            77        562f32067e0a7
476b07599260e   86a34bc5edd3e70073313f97bfd51ed8937658b341dc52334fb98ea6896ebdc2                22 seconds ago   Running   haproxy              1         0e5c209496787
26b53050a412b   9f94e500f85a735ec212ffb7305e0b63f7151a5346e41c2d5d293c8456f6fa42                22 seconds ago   Running   keepalived           1         02cf141d01a29
30ce48453854b   7aa184de043265814f9a775968234ac3280a285056da773f1aba0917e9615370                22 seconds ago   Exited    render-config        1         cbabc53322ac8
ad3ab0ae52077   7aa184de043265814f9a775968234ac3280a285056da773f1aba0917e9615370                22 seconds ago   Exited    render-config        1         6f8914ff9db35
650d62765e9e1   registry.svc.ci.openshift.org/origin/4.3-2019-10-29-180250@sha256:9a7e829...    13 hours ago     Exited    coredns              0         2ae0512b3b6ac
481969ce49bb9   registry.svc.ci.openshift.org/origin/4.3-2019-10-29-180250@sha256:7681941...    13 hours ago     Exited    mdns-publisher       0         d49754042b792
3594d9d261ca7   registry.svc.ci.openshift.org/origin/4.3-2019-10-29-180250@sha256:8c3b022...    13 hours ago     Exited    haproxy-monitor      0         3476219058ba8
88b13ec02a5c1   7aa184de043265814f9a775968234ac3280a285056da773f1aba0917e9615370                13 hours ago     Exited    keepalived-monitor   0         a3e13cf07c04f
1ab721b5599ed   registry.svc.ci.openshift.org/origin/4.3-2019-10-29-180250@sha256:629d73f...    13 hours ago
because DNS registration is OK:
[user1@master-0 ~]$ dig etcd-0.test.oc4.localdomain +short
10.34.212.227
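The SRV record the discovery container was looking up can be checked the same
way (a sketch; the record name is the one from the crictl log above):

[user1@master-0 ~]$ dig +short SRV _etcd-server-ssl._tcp.test.oc4.localdomain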
CONCLUSION:
* none of the RHCOS VMs is correctly provisioned with its target hostname, so
they all stay as localhost.
What is your engine version? The hostname support for ignition is merged into
4.3.7 and master.
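A quick way to check it on the engine machine, for example:

rpm -q ovirt-engine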
4.3.7.1-1.el7
I only upgraded the engine and not vdsm on the hosts, but I suppose the hosts
are not important for ignition.
* The cloud-init syntax for the hostname is OK, but it is not provisioned by
ignition.
Why not provision these hostnames with a JSON snippet like this, or something
similar?
{
  "ignition": { "version": "2.2.0" },
  "storage": {
    "files": [
      {
        "filesystem": "root",
        "path": "/etc/hostname",
        "mode": 420,
        "contents": { "source": "data:,master-0.test.oc4.localdomain" }
      }
    ]
  }
}
>
>
>>
>>
>>>
>>>
>>>
>>>> On 05/11/2019 at 12:24, Roy Golan wrote:
>>>>>
>>>>>
>>>>> On Tue, 5 Nov 2019 at 13:22, Nathanaël Blanchet <blanchet@abes.fr> wrote:
>>>>>
>>>>> Hello,
>>>>>
>>>>> I'm interested in installing OKD on oVirt with the official openshift
>>>>> installer (https://github.com/openshift/installer), but oVirt is not
>>>>> yet supported.
>>>>>
>>>>>
>>>>> If you want to give it a try and provide feedback, I'll be glad.
>>>>>
>>>>> Regarding https://bugzilla.redhat.com/show_bug.cgi?id=1578255 and
>>>>> https://lists.ovirt.org/archives/list/users@ovirt.org/thread/EF7OQUVTY53G...,
>>>>> how should oVirt 4.3.7 integrate the openshift installer with Terraform?
>>>>>
>>>>>
>>>>> Terraform is part of it, yes. It is what we use to spin up the first 3
>>>>> masters, plus a bootstrapping machine.
>>>>>
--
Nathanaël Blanchet
Supervision réseau
Pôle Infrastructures Informatiques
227 avenue Professeur-Jean-Louis-Viala
34193 MONTPELLIER CEDEX 5
Tél. 33 (0)4 67 54 84 55
Fax 33 (0)4 67 54 84 14
blanchet@abes.fr