On Thu, 7 Nov 2019 at 00:10, Nathanaël Blanchet <blanchet(a)abes.fr> wrote:
On 05/11/2019 at 21:50, Nathanaël Blanchet wrote:
On Tue, 5 Nov 2019 at 22:46, Roy Golan <rgolan(a)redhat.com> wrote:
>
>
> On Tue, 5 Nov 2019 at 20:28, Nathanaël Blanchet <blanchet(a)abes.fr> wrote:
>
>>
>> On 05/11/2019 at 18:22, Roy Golan wrote:
>>
>>
>>
>> On Tue, 5 Nov 2019 at 19:12, Nathanaël Blanchet <blanchet(a)abes.fr>
>> wrote:
>>
>>>
>>> On 05/11/2019 at 13:54, Roy Golan wrote:
>>>
>>>
>>>
>>> On Tue, 5 Nov 2019 at 14:52, Nathanaël Blanchet <blanchet(a)abes.fr>
>>> wrote:
>>>
>>>> I tried openshift-install after compiling, but no ovirt provider is
>>>> available... So what do you mean when you say "give it a try"? Maybe
>>>> only provisioning oVirt with the terraform module?
>>>>
>>>> [root@vm5 installer]# bin/openshift-install create cluster
>>>> ? Platform [Use arrows to move, space to select, type to filter, ?
>>>> for more help]
>>>> > aws
>>>> azure
>>>> gcp
>>>> openstack
>>>>
>>>>
>>>>
>>> It's not merged yet. Please pull this image and work with it as a
>>> container:
>>> quay.io/rgolangh/openshift-installer
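>>>
>>> A minimal way to run it (a sketch; the assets path and mounts are
>>> assumptions - the container keeps its working files under /output,
>>> per the /output/.ssh error below):
>>>
>>> docker pull quay.io/rgolangh/openshift-installer
>>> # keep the generated assets on the host, and give the installer an
>>> # SSH key it can load when gathering bootstrap logs
>>> mkdir -p ~/ocp-assets
>>> docker run -it \
>>>   -v ~/ocp-assets:/output \
>>>   -v ~/.ssh:/output/.ssh \
>>>   quay.io/rgolangh/openshift-installer create cluster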
>>>
>>> A little feedback as you asked:
>>>
>>> [root@openshift-installer ~]# docker run -it 56e5b667100f create cluster
>>> ? Platform ovirt
>>> ? Enter oVirt's api endpoint URL https://air-dev.v100.abes.fr/ovirt-engine/api
>>> ? Enter ovirt-engine username admin@internal
>>> ? Enter password **********
>>> ? Pick the oVirt cluster Default
>>> ? Pick a VM template centos7.x
>>> ? Enter the internal API Virtual IP 10.34.212.200
>>> ? Enter the internal DNS Virtual IP 10.34.212.100
>>> ? Enter the ingress IP 10.34.212.50
>>> ? Base Domain oc4.localdomain
>>> ? Cluster Name test
>>> ? Pull Secret [? for help] *************************************
>>> INFO Creating infrastructure resources...
>>> INFO Waiting up to 30m0s for the Kubernetes API at
>>> https://api.test.oc4.localdomain:6443...
>>> ERROR Attempted to gather ClusterOperator status after installation
>>> failure: listing ClusterOperator objects: Get
>>> https://api.test.oc4.localdomain:6443/apis/config.openshift.io/v1/cluster...:
>>> dial tcp: lookup api.test.oc4.localdomain on 10.34.212.100:53: no such
>>> host
>>> INFO Pulling debug logs from the bootstrap machine
>>> ERROR Attempted to gather debug logs after installation failure: failed
>>> to create SSH client, ensure the proper ssh key is in your keyring or
>>> specify with --key: failed to initialize the SSH agent: failed to read
>>> directory "/output/.ssh": open /output/.ssh: no such file or directory
>>> FATAL Bootstrap failed to complete: waiting for Kubernetes API: context
>>> deadline exceeded
>>>
>>> - 6 VMs are successfully created, thin-provisioned from the template
>>> - each VM is provisioned by cloud-init
>>> - the step "INFO Waiting up to 30m0s for the Kubernetes API at
>>> https://api.test.oc4.localdomain:6443..." fails. It seems that the
>>> DNS pod is not up at this time.
>>> - From this point on, there is no visibility into what is being done
>>> or what goes wrong... what's happening there? I suppose some kind of
>>> playbook is downloading some kind of images...
>>> - The "pull secret" step is not clear: we must have a Red Hat account
>>> on https://cloud.redhat.com/openshift/install/ to get a key like:
>>> {"auths":{"cloud.openshift.com
>>>
":{"auth":"b3BlbnNoaWZ0LXJlbGVhc2UtZGV2K2V4cGxvaXRhYmVzZnIxdGN0ZnR0dmFnMHpuazMxd2IwMnIwenV1MDg6TE9XVzFQODM1NzNJWlI4MlZDSUEyTFdEVlJJS0U5VTVWM0NTSUdOWjJH********************==","email":
>>> "exploit(a)abes.fr" <exploit(a)abes.fr>},"quay.io
>>>
":{"auth":"b3BlbnNoaWZ0LXJlbGVhc2UtZGV2K2V4cGxvaXRhYmVzZnIxdGN0ZnR0dmFnMHpuazMxd2IwMnIwenV1MDg6TE9XVzFQODM1NzNJWlI4MlZDSUEyTFdEVlJJS0U5VTVWM0NTSUdOWjJH********************==","email":
>>> "exploit(a)abes.fr"
<exploit(a)abes.fr>},"registry.connect.redhat.com
>>>
":{"auth":"NTI0MjkwMnx1aGMtMVRDVEZUVFZBRzBaTkszMXdCMDJSMFp1VTA4OmV5SmhiR2NpT2lKU1V6VXhNaUo5LmV5SnpkV0lpT2lJMk4ySTJNREV3WXpObE1HSTBNbVE0T1RGbVpUZGxa**********************","email":
>>> "exploit(a)abes.fr"
<exploit(a)abes.fr>},"registry.redhat.io
>>>
":{"auth":"NTI0MjkwMnx1aGMtMVRDVEZUVFZBRzBaTkszMXdCMDJSMFp1VTA4OmV5SmhiR2NpT2lKU1V6VXhNaUo5LmV5SnpkV0lpT2lJMk4ySTJNREV3WXpObE1HSTBNbVE0T1RGbVpUZGxa**********************","email":
>>> "exploit(a)abes.fr" <exploit(a)abes.fr>}}}
>>>
>>>
>>> Can you tell me if I'm doing something wrong?
>>>
>>
>> What template are you using? I don't think it's an RHCOS (Red Hat
>> CoreOS) template; it looks like CentOS?
>>
>> Use this gist to import the template:
>> https://gist.github.com/rgolangh/adccf6d6b5eaecaebe0b0aeba9d3331b
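>>
>> For context, a minimal sketch of what the import boils down to (the
>> image version, URL and workflow here are assumptions - the gist is the
>> authoritative version):
>>
>> # download an RHCOS image (version and filename are examples)
>> curl -LO https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.2/latest/rhcos-4.2.0-x86_64-openstack.qcow2
>> # upload it as a disk (e.g. with the ovirt-engine-sdk upload_disk.py
>> # example), attach the disk to a new blank VM, then make that VM a
>> # template to select at the "Pick a VM template" prompt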
>>
>> Unfortunately, the result is the same with the RHCOS template...
>>
>
> Make sure that:
> - the IPs supplied are reserved for the cluster (not used by anything
> else) and belong to the VM network of those master VMs
> - a .localdomain or .local domain suffix shouldn't be used
> - your ovirt-engine is version 4.3.7 or master
>
> I didn't mention that you can provide any domain name, even a
> non-existing one. When the bootstrap phase is done, the installation
> will tear down the bootstrap machine. At that stage, if you are using a
> non-existing domain, you need to add the DNS Virtual IP you provided to
> your resolv.conf so the installation can resolve
> api.$CLUSTER_NAME.$CLUSTER_DOMAIN.
> Also, you have a log under $INSTALL_DIR/.openshift_install.log
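> For example, with the values from your run (a sketch, for the machine
> you run the installer from):
>
> # /etc/resolv.conf on the installer machine
> nameserver 10.34.212.100   # the DNS Virtual IP you entered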
I tried several things following your advice, but I'm still stuck at the
https://api.test.oc4.localdomain:6443/version?timeout=32s test,
with these logs:
time="2019-11-06T20:21:15Z" level=debug msg="Still waiting for the
Kubernetes API: the server could not find the requested resource"
So DNS resolution and networking are now good, and the ignition
provisioning is OK, but something goes wrong on the bootstrap VM.
If I log into the bootstrap VM, I can see an SELinux message, but it may
not be relevant...
SELinux: mount invalid. Same superblock, different security settings for
(dev mqueue, type mqueue).
Some other clues with journalctl:
journalctl -b -f -u bootkube
Nov 06 21:55:40 localhost bootkube.sh[2101]:
{"level":"warn","ts":"2019-11-06T21:55:40.661Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying
of unary invoker
failed","target":"endpoint://client-7beef51d-daad-4b46-9497-8e135e528f7c/etcd-1.test.oc4.localdomain:2379","attempt":0,"error":"rpc
error: code = DeadlineExceeded desc = latest connection error: connection
error: desc = \"transport: Error while dialing dial tcp: lookup
etcd-1.test.oc4.localdomain on 10.34.212.101:53: no such host\""}
Nov 06 21:55:40 localhost bootkube.sh[2101]:
{"level":"warn","ts":"2019-11-06T21:55:40.662Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying
of unary invoker
failed","target":"endpoint://client-03992fc6-5a87-4160-9b87-44ec6e82f7cd/etcd-2.test.oc4.localdomain:2379","attempt":0,"error":"rpc
error: code = DeadlineExceeded desc = latest connection error: connection
error: desc = \"transport: Error while dialing dial tcp: lookup
etcd-2.test.oc4.localdomain on 10.34.212.101:53: no such host\""}
Nov 06 21:55:40 localhost bootkube.sh[2101]:
{"level":"warn","ts":"2019-11-06T21:55:40.662Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying
of unary invoker
failed","target":"endpoint://client-00db28a7-5188-4666-896b-e37c88ad3ae9/etcd-0.test.oc4.localdomain:2379","attempt":0,"error":"rpc
error: code = DeadlineExceeded desc = latest connection error: connection
error: desc = \"transport: Error while dialing dial tcp: lookup
etcd-0.test.oc4.localdomain on 10.34.212.101:53: no such host\""}
Nov 06 21:55:40 localhost bootkube.sh[2101]:
https://etcd-1.test.oc4.localdomain:2379 is unhealthy: failed to commit
proposal: context deadline exceeded
Nov 06 21:55:40 localhost bootkube.sh[2101]:
https://etcd-2.test.oc4.localdomain:2379 is unhealthy: failed to commit
proposal: context deadline exceeded
Nov 06 21:55:40 localhost bootkube.sh[2101]:
https://etcd-0.test.oc4.localdomain:2379 is unhealthy: failed to commit
proposal: context deadline exceeded
Nov 06 21:55:40 localhost bootkube.sh[2101]: Error: unhealthy cluster
Nov 06 21:55:40 localhost podman[61210]: 2019-11-06 21:55:40.720514151
+0000 UTC m=+5.813853296 container died
7db3014e3f19c61775bac2a7a155eeb8521a6b78fea0d512384dd965cb0b8b01 (image=
registry.svc.ci.openshift.org/origin/4.3-2019-10-29-180250@sha256:696a0ba...,
name=etcdctl)
Nov 06 21:55:40 localhost podman[61210]: 2019-11-06 21:55:40.817475095
+0000 UTC m=+5.910814273 container remove
7db3014e3f19c61775bac2a7a155eeb8521a6b78fea0d512384dd965cb0b8b01 (image=
registry.svc.ci.openshift.org/origin/4.3-2019-10-29-180250@sha256:696a0ba...,
name=etcdctl)
Nov 06 21:55:40 localhost bootkube.sh[2101]: etcdctl failed. Retrying in 5
seconds...
It seems to be a DNS resolution issue again.
[user1@localhost ~]$ dig api.test.oc4.localdomain +short
10.34.212.201
[user1@localhost ~]$ dig etcd-2.test.oc4.localdomain +short
(nothing)
So what do you think about that?
The key here is the masters: they need to boot, get their ignition config
from the bootstrap machine, and start publishing their IPs and hostnames.
Connect to a master, check its hostname, and check its running or failing
containers with `crictl ps -a` as the root user.
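For example (a sketch; names and IPs come from your survey answers):

ssh core@<master-ip>              # RHCOS ships with the "core" user
hostname                          # should not be "localhost"
sudo crictl ps -a                 # look for exited/crash-looping containers
sudo crictl logs <container-id>   # inspect a failing one
# and verify the etcd records are served by the DNS VIP:
dig @10.34.212.100 etcd-0.test.oc4.localdomain +short
dig @10.34.212.100 _etcd-server-ssl._tcp.test.oc4.localdomain SRV +short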
>
>>
>>
>> On 05/11/2019 at 12:24, Roy Golan wrote:
>>>>
>>>>
>>>>
>>>> On Tue, 5 Nov 2019 at 13:22, Nathanaël Blanchet <blanchet(a)abes.fr>
>>>> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I'm interested in installing OKD on oVirt with the official
>>>>> openshift installer (https://github.com/openshift/installer), but
>>>>> oVirt is not yet supported.
>>>>>
>>>>>
>>>> If you want to give it a try and supply feedback, I'll be glad.
>>>>
>>>>
>>>>> Regarding https://bugzilla.redhat.com/show_bug.cgi?id=1578255 and
>>>>> https://lists.ovirt.org/archives/list/users@ovirt.org/thread/EF7OQUVTY53G...,
>>>>> how should oVirt 4.3.7 integrate the openshift installer with
>>>>> terraform?
>>>>>
>>>>>
>>>> Terraform is part of it, yes. It is what we use to spin up the first
>>>> 3 masters, plus a bootstrapping machine.
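>>>> Roughly, the generated Terraform looks like this (an illustrative
>>>> sketch only; attribute names follow the terraform-provider-ovirt,
>>>> and the real values come from the install-config):
>>>>
>>>> provider "ovirt" {
>>>>   url      = "https://engine.example.com/ovirt-engine/api"
>>>>   username = "admin@internal"
>>>>   password = "secret" # example only
>>>> }
>>>>
>>>> # three masters cloned from the RHCOS template; each one gets its
>>>> # ignition config injected at boot
>>>> resource "ovirt_vm" "master" {
>>>>   count       = 3
>>>>   name        = "master-${count.index}"
>>>>   cluster_id  = "<cluster-uuid>"
>>>>   template_id = "<rhcos-template-uuid>"
>>>> }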
>>>>
--
Nathanaël Blanchet

Supervision réseau
Pôle Infrastructures Informatiques
227 avenue Professeur-Jean-Louis-Viala
34193 MONTPELLIER CEDEX 5
Tél. 33 (0)4 67 54 84 55
Fax 33 (0)4 67 54 84 14
blanchet(a)abes.fr