On Thu, 7 Nov 2019 at 12:28, Nathanaël Blanchet <blanchet(a)abes.fr> wrote:
On 07/11/2019 at 11:16, Roy Golan wrote:
On Thu, 7 Nov 2019 at 11:23, Nathanaël Blanchet <blanchet(a)abes.fr> wrote:
>
> On 07/11/2019 at 07:18, Roy Golan wrote:
>
>
>
> On Thu, 7 Nov 2019 at 00:10, Nathanaël Blanchet <blanchet(a)abes.fr> wrote:
>
>>
>> On 05/11/2019 at 21:50, Roy Golan wrote:
>>
>>
>>
>> On Tue, 5 Nov 2019 at 22:46, Roy Golan <rgolan(a)redhat.com> wrote:
>>
>>>
>>>
>>> On Tue, 5 Nov 2019 at 20:28, Nathanaël Blanchet <blanchet(a)abes.fr>
>>> wrote:
>>>
>>>>
>>>> On 05/11/2019 at 18:22, Roy Golan wrote:
>>>>
>>>>
>>>>
>>>> On Tue, 5 Nov 2019 at 19:12, Nathanaël Blanchet <blanchet(a)abes.fr>
>>>> wrote:
>>>>
>>>>>
>>>>> On 05/11/2019 at 13:54, Roy Golan wrote:
>>>>>
>>>>>
>>>>>
>>>>> On Tue, 5 Nov 2019 at 14:52, Nathanaël Blanchet <blanchet(a)abes.fr> wrote:
>>>>>
>>>>>> I tried openshift-install after compiling it, but no ovirt provider is
>>>>>> available... So what do you mean when you say "give a try"? Maybe only
>>>>>> provisioning ovirt with the terraform module?
>>>>>>
>>>>>> [root@vm5 installer]# bin/openshift-install create cluster
>>>>>> ? Platform [Use arrows to move, space to select, type to filter, ? for more help]
>>>>>> > aws
>>>>>> azure
>>>>>> gcp
>>>>>> openstack
>>>>>>
>>>>>>
>>>>>>
>>>>> It's not merged yet. Please pull this image and work with it as a
>>>>> container:
>>>>> quay.io/rgolangh/openshift-installer
>>>>>
>>>>> A little feedback as you asked:
>>>>>
>>>>> [root@openshift-installer ~]# docker run -it 56e5b667100f create cluster
>>>>> ? Platform ovirt
>>>>> ? Enter oVirt's api endpoint URL https://air-dev.v100.abes.fr/ovirt-engine/api
>>>>> ? Enter ovirt-engine username admin@internal
>>>>> ? Enter password **********
>>>>> ? Pick the oVirt cluster Default
>>>>> ? Pick a VM template centos7.x
>>>>> ? Enter the internal API Virtual IP 10.34.212.200
>>>>> ? Enter the internal DNS Virtual IP 10.34.212.100
>>>>> ? Enter the ingress IP 10.34.212.50
>>>>> ? Base Domain oc4.localdomain
>>>>> ? Cluster Name test
>>>>> ? Pull Secret [? for help] *************************************
>>>>> INFO Creating infrastructure resources...
>>>>> INFO Waiting up to 30m0s for the Kubernetes API at https://api.test.oc4.localdomain:6443...
>>>>> ERROR Attempted to gather ClusterOperator status after installation failure: listing ClusterOperator objects: Get https://api.test.oc4.localdomain:6443/apis/config.openshift.io/v1/cluster...: dial tcp: lookup api.test.oc4.localdomain on 10.34.212.100:53: no such host
>>>>> INFO Pulling debug logs from the bootstrap machine
>>>>> ERROR Attempted to gather debug logs after installation failure: failed to create SSH client, ensure the proper ssh key is in your keyring or specify with --key: failed to initialize the SSH agent: failed to read directory "/output/.ssh": open /output/.ssh: no such file or directory
>>>>> FATAL Bootstrap failed to complete: waiting for Kubernetes API: context deadline exceeded
>>>>>
>>>>> - 6 VMs are successfully created as thin VMs, dependent on the template
>>>>> - each VM is provisioned by cloud-init
>>>>> - the step "INFO Waiting up to 30m0s for the Kubernetes API at
>>>>>   https://api.test.oc4.localdomain:6443..." fails. It seems that
>>>>>   the DNS pod is not up at this time.
>>>>> - From this moment on, there is no more visibility into what is being
>>>>>   done or what goes wrong... What is happening there? I suppose a kind
>>>>>   of playbook is downloading some images...
>>>>> - The "pull secret" step is not clear: we must have a Red Hat
>>>>>   account on https://cloud.redhat.com/openshift/install/ to get a
>>>>>   key like
>>>>> -
>>>>> {"auths":{"cloud.openshift.com
>>>>>
":{"auth":"b3BlbnNoaWZ0LXJlbGVhc2UtZGV2K2V4cGxvaXRhYmVzZnIxdGN0ZnR0dmFnMHpuazMxd2IwMnIwenV1MDg6TE9XVzFQODM1NzNJWlI4MlZDSUEyTFdEVlJJS0U5VTVWM0NTSUdOWjJH********************==","email":
>>>>> "exploit(a)abes.fr"
<exploit(a)abes.fr>},"quay.io
>>>>>
":{"auth":"b3BlbnNoaWZ0LXJlbGVhc2UtZGV2K2V4cGxvaXRhYmVzZnIxdGN0ZnR0dmFnMHpuazMxd2IwMnIwenV1MDg6TE9XVzFQODM1NzNJWlI4MlZDSUEyTFdEVlJJS0U5VTVWM0NTSUdOWjJH********************==","email":
>>>>> "exploit(a)abes.fr"
<exploit(a)abes.fr>},"registry.connect.redhat.com
>>>>>
":{"auth":"NTI0MjkwMnx1aGMtMVRDVEZUVFZBRzBaTkszMXdCMDJSMFp1VTA4OmV5SmhiR2NpT2lKU1V6VXhNaUo5LmV5SnpkV0lpT2lJMk4ySTJNREV3WXpObE1HSTBNbVE0T1RGbVpUZGxa**********************","email":
>>>>> "exploit(a)abes.fr"
<exploit(a)abes.fr>},"registry.redhat.io
>>>>>
":{"auth":"NTI0MjkwMnx1aGMtMVRDVEZUVFZBRzBaTkszMXdCMDJSMFp1VTA4OmV5SmhiR2NpT2lKU1V6VXhNaUo5LmV5SnpkV0lpT2lJMk4ySTJNREV3WXpObE1HSTBNbVE0T1RGbVpUZGxa**********************","email":
>>>>> "exploit(a)abes.fr" <exploit(a)abes.fr>}}}
>>>>>
>>>>>
>>>>> Can you tell me if I'm doing wrong?
>>>>>
>>>>
>>>> What template are you using? I don't think it's an RHCOS (Red Hat
>>>> CoreOS) template; it looks like CentOS.
>>>>
>>>> Use this gist to import the template:
>>>> https://gist.github.com/rgolangh/adccf6d6b5eaecaebe0b0aeba9d3331b
>>>>
>>>> Unfortunately, the result is the same with the RHCOS template...
>>>>
>>>
>>> Make sure that:
>>> - the IPs supplied are taken, and belong to the VM network of those
>>> master VMs
>>> - localdomain or a local domain suffix isn't used
>>> - your ovirt-engine is version 4.3.7 or master
>>>
>>> I didn't mention that you can provide any domain name, even a
>> non-existing one.
>> When the bootstrap phase is done, the installation will tear down the
>> bootstrap machine.
>> At this stage, if you are using a non-existing domain, you need to
>> add the DNS Virtual IP you provided to your resolv.conf so that the
>> installation can resolve api.$CLUSTER_NAME.$CLUSTER_DOMAIN.
>>
>> Also, you have a log under $INSTALL_DIR/.openshift_install.log
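To illustrate the resolution check described above, here is a minimal Python sketch. It only builds the api.$CLUSTER_NAME.$CLUSTER_DOMAIN name and asks the local resolver for it; the function names are made up for this example and are not part of the installer.

```python
import socket


def api_hostname(cluster_name, base_domain):
    """Build the API name the installer waits for: api.<cluster>.<domain>."""
    return f"api.{cluster_name}.{base_domain}"


def resolves(hostname):
    """Return True if the local resolver (resolv.conf) can resolve hostname."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False


if __name__ == "__main__":
    # Values taken from the install run quoted in this thread.
    name = api_hostname("test", "oc4.localdomain")
    status = "resolves" if resolves(name) else "does NOT resolve"
    print(f"{name} {status}")
```

If the name does not resolve from the machine running the installer, the DNS Virtual IP is probably missing from its resolv.conf.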
>>
>> I tried several things following your advice, but I'm still stuck at the
>> https://api.test.oc4.localdomain:6443/version?timeout=32s test
>>
>> with logs:
>>
>> time="2019-11-06T20:21:15Z" level=debug msg="Still waiting for the Kubernetes API: the server could not find the requested resource"
>>
>> So it means DNS resolution and network are now good and ignition
>> provisioning is OK, but something goes wrong with the bootstrap VM.
>>
>> Now if I log into the bootstrap VM, I can see an SELinux message, but it
>> may not be relevant...
>>
>> SELinux: mount invalid. Same Superblock, different security settings for
>> (dev nqueue, type nqueue).
>>
>> Some other clues with journalctl:
>>
>> journalctl -b -f -u bootkube
>>
>> Nov 06 21:55:40 localhost bootkube.sh[2101]: {"level":"warn","ts":"2019-11-06T21:55:40.661Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-7beef51d-daad-4b46-9497-8e135e528f7c/etcd-1.test.oc4.localdomain:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest connection error: connection error: desc = \"transport: Error while dialing dial tcp: lookup etcd-1.test.oc4.localdomain on 10.34.212.101:53: no such host\""}
>> Nov 06 21:55:40 localhost bootkube.sh[2101]: {"level":"warn","ts":"2019-11-06T21:55:40.662Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-03992fc6-5a87-4160-9b87-44ec6e82f7cd/etcd-2.test.oc4.localdomain:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest connection error: connection error: desc = \"transport: Error while dialing dial tcp: lookup etcd-2.test.oc4.localdomain on 10.34.212.101:53: no such host\""}
>> Nov 06 21:55:40 localhost bootkube.sh[2101]: {"level":"warn","ts":"2019-11-06T21:55:40.662Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-00db28a7-5188-4666-896b-e37c88ad3ae9/etcd-0.test.oc4.localdomain:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest connection error: connection error: desc = \"transport: Error while dialing dial tcp: lookup etcd-0.test.oc4.localdomain on 10.34.212.101:53: no such host\""}
>> Nov 06 21:55:40 localhost bootkube.sh[2101]: https://etcd-1.test.oc4.localdomain:2379 is unhealthy: failed to commit proposal: context deadline exceeded
>> Nov 06 21:55:40 localhost bootkube.sh[2101]: https://etcd-2.test.oc4.localdomain:2379 is unhealthy: failed to commit proposal: context deadline exceeded
>> Nov 06 21:55:40 localhost bootkube.sh[2101]: https://etcd-0.test.oc4.localdomain:2379 is unhealthy: failed to commit proposal: context deadline exceeded
>> Nov 06 21:55:40 localhost bootkube.sh[2101]: Error: unhealthy cluster
>> Nov 06 21:55:40 localhost podman[61210]: 2019-11-06 21:55:40.720514151 +0000 UTC m=+5.813853296 container died 7db3014e3f19c61775bac2a7a155eeb8521a6b78fea0d512384dd965cb0b8b01 (image=registry.svc.ci.openshift.org/origin/4.3-2019-10-29-180250@sha256:696a0ba..., name=etcdctl)
>> Nov 06 21:55:40 localhost podman[61210]: 2019-11-06 21:55:40.817475095 +0000 UTC m=+5.910814273 container remove 7db3014e3f19c61775bac2a7a155eeb8521a6b78fea0d512384dd965cb0b8b01 (image=registry.svc.ci.openshift.org/origin/4.3-2019-10-29-180250@sha256:696a0ba..., name=etcdctl)
>> Nov 06 21:55:40 localhost bootkube.sh[2101]: etcdctl failed. Retrying in 5 seconds...
>>
>> It seems to be again a dns resolution issue.
>>
>> [user1@localhost ~]$ dig api.test.oc4.localdomain +short
>> 10.34.212.201
>>
>> [user1@localhost ~]$ dig etcd-2.test.oc4.localdomain +short
>> nothing
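As an aside for anyone reading such bootkube output, the failing names can be pulled out of the journal mechanically. The following Python sketch is only an illustration: the regular expression is derived from the "lookup ... on ...:53: no such host" messages quoted in this thread, and the function name is hypothetical.

```python
import re

# Matches the resolver error text seen in the bootkube journal above.
LOOKUP_FAILURE = re.compile(
    r"lookup (?P<name>\S+) on (?P<server>[\d.]+):53: no such host"
)


def failed_lookups(log_text):
    """Return the sorted, de-duplicated (hostname, dns_server) pairs that failed."""
    return sorted(
        {(m["name"], m["server"]) for m in LOOKUP_FAILURE.finditer(log_text)}
    )


if __name__ == "__main__":
    # Shortened sample built from the log lines in this thread.
    sample = (
        "dial tcp: lookup etcd-1.test.oc4.localdomain on 10.34.212.101:53: no such host\n"
        "dial tcp: lookup etcd-2.test.oc4.localdomain on 10.34.212.101:53: no such host\n"
    )
    for name, server in failed_lookups(sample):
        print(f"{name} not known to {server}")
```

In practice the text would come from something like the journalctl command shown above, saved to a file and passed to failed_lookups().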
>>
>>
>> So what do you think about that?
>>
>>
>> The key here is the masters: they need to boot, get ignition from the
> bootstrap machine and start publishing their IPs and hostnames.
>
> Connect to a master, check its hostname, and check its running or failing
> containers with `crictl ps -a` as root.
>
> You were right:
> # crictl ps -a
> CONTAINER ID   IMAGE                                                             CREATED         STATE    NAME       ATTEMPT  POD ID
> 744cb8e654705  e77034cf36baff5e625acbba15331db68e1d84571f977d254fd833341158daa8  4 minutes ago   Running  discovery  75       9462e9a8ca478
> 912ba9db736c3  e77034cf36baff5e625acbba15331db68e1d84571f977d254fd833341158daa8  14 minutes ago  Exited   discovery  74       9462e9a8ca478
>
> # crictl logs 744cb8e654705
> E1107 08:10:04.262330 1 run.go:67] error looking up self for candidate IP 10.34.212.227: lookup _etcd-server-ssl._tcp.test.oc4.localdomain on 10.34.212.51:53: no such host
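An ATTEMPT count of 75 is the telltale sign of a container that keeps restarting. As a rough sketch, the same check can be scripted. It assumes that `crictl ps -a -o json` prints a top-level "containers" list whose entries carry "metadata" (with "name" and "attempt") and "state"; that layout is an assumption to verify against your crictl version, and the function name and threshold are invented for this example.

```python
import json


def restarting_containers(crictl_json, min_attempts=5):
    """List (name, attempt, state) for containers restarted at least min_attempts times.

    crictl_json is the text assumed to come from `crictl ps -a -o json`.
    """
    data = json.loads(crictl_json)
    found = []
    for container in data.get("containers", []):
        meta = container.get("metadata", {})
        attempt = int(meta.get("attempt", 0))
        if attempt >= min_attempts:
            found.append((meta.get("name", ""), attempt, container.get("state", "")))
    # Highest restart count first.
    return sorted(found, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical sample shaped like the table above, not real crictl output.
    sample = json.dumps({"containers": [
        {"metadata": {"name": "discovery", "attempt": 75}, "state": "CONTAINER_RUNNING"},
        {"metadata": {"name": "discovery", "attempt": 74}, "state": "CONTAINER_EXITED"},
        {"metadata": {"name": "coredns", "attempt": 1}, "state": "CONTAINER_RUNNING"},
    ]})
    for row in restarting_containers(sample):
        print(row)
```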
>
> # hostname
> localhost
>
> Conclusion: discovery didn't publish IPs and hostnames to CoreDNS because
> the master didn't get its name master-0.test.oc4.localdomain during the
> provisioning phase.
>
> I changed the master-0 hostname and re-triggered ignition to verify:
>
> # hostnamectl set-hostname master-0.test.oc4.localdomain
>
> # touch /boot/ignition.firstboot && rm -rf /etc/machine-id && reboot
>
> After the reboot completed, there is no longer a failing discovery container:
>
> CONTAINER ID   IMAGE                                                             CREATED         STATE    NAME                ATTEMPT  POD ID
> e701efa8bc583  77ec5e26cc676ef2bf5c42dd40e55394a11fb45a3e2d7e95cbaf233a1eef472f  20 seconds ago  Running  coredns             1        cbabc53322ac8
> 2c7bc6abb5b65  d73eca122bd567a3a1f70fa5021683bc17dd87003d05d88b1cdd0215c55049f6  20 seconds ago  Running  mdns-publisher      1        6f8914ff9db35
> b3f619d5afa2c  7aa184de043265814f9a775968234ac3280a285056da773f1aba0917e9615370  21 seconds ago  Running  haproxy-monitor     1        0e5c209496787
> 07769ce79b032  7aa184de043265814f9a775968234ac3280a285056da773f1aba0917e9615370  21 seconds ago  Running  keepalived-monitor  1        02cf141d01a29
> fb20d66b81254  e77034cf36baff5e625acbba15331db68e1d84571f977d254fd833341158daa8  21 seconds ago  Running  discovery           77       562f32067e0a7
> 476b07599260e  86a34bc5edd3e70073313f97bfd51ed8937658b341dc52334fb98ea6896ebdc2  22 seconds ago  Running  haproxy             1        0e5c209496787
> 26b53050a412b  9f94e500f85a735ec212ffb7305e0b63f7151a5346e41c2d5d293c8456f6fa42  22 seconds ago  Running  keepalived          1        02cf141d01a29
> 30ce48453854b  7aa184de043265814f9a775968234ac3280a285056da773f1aba0917e9615370  22 seconds ago  Exited   render-config       1        cbabc53322ac8
> ad3ab0ae52077  7aa184de043265814f9a775968234ac3280a285056da773f1aba0917e9615370  22 seconds ago  Exited   render-config       1        6f8914ff9db35
> 650d62765e9e1  registry.svc.ci.openshift.org/origin/4.3-2019-10-29-180250@sha256:9a7e829...  13 hours ago  Exited  coredns  0  2ae0512b3b6ac
> 481969ce49bb9  registry.svc.ci.openshift.org/origin/4.3-2019-10-29-180250@sha256:7681941...  13 hours ago  Exited  mdns-publisher  0  d49754042b792
> 3594d9d261ca7  registry.svc.ci.openshift.org/origin/4.3-2019-10-29-180250@sha256:8c3b022...  13 hours ago  Exited  haproxy-monitor  0  3476219058ba8
> 88b13ec02a5c1  7aa184de043265814f9a775968234ac3280a285056da773f1aba0917e9615370  13 hours ago    Exited   keepalived-monitor  0        a3e13cf07c04f
> 1ab721b5599ed  registry.svc.ci.openshift.org/origin/4.3-2019-10-29-180250@sha256:629d73f...  13 hours ago
>
> because DNS registration is OK:
>
> [user1@master-0 ~]$ dig etcd-0.test.oc4.localdomain +short
> 10.34.212.227
>
> CONCLUSION:
>
> - none of the RHCOS VMs is correctly provisioned with its target
> hostname, so they all stay as localhost.
>
>
What is your engine version? The hostname support for ignition is merged
into 4.3.7 and master.
4.3.7.1-1.el7
It was merged 2 days ago, so it will appear in
4.3.7.2.
Sandro, when is 4.3.7.2 due?
I only upgraded engine and not vdsm on hosts, but I suppose hosts are not
> - Cloud-init syntax for the hostname is OK, but it is not provisioned
> by ignition:
>
> Why not provision these hostnames with a JSON snippet or something similar?
>
> {
>   "ignition": { "version": "2.2.0" },
>   "storage": {
>     "files": [{
>       "filesystem": "root",
>       "path": "/etc/hostname",
>       "mode": 420,
>       "contents": { "source": "data:,master-0.test.oc4.localdomain" }
>     }]
>   }
> }
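For what it's worth, a snippet like the one above could be generated per master rather than written by hand. The following Python sketch is only an illustration of that idea, not how the installer or oVirt does it; the function name is hypothetical. It also makes explicit that the decimal mode 420 in the JSON is the octal file mode 0644.

```python
import json
from urllib.parse import quote


def hostname_ignition(hostname):
    """Build an Ignition (spec 2.2.0) config that writes hostname to /etc/hostname."""
    return {
        "ignition": {"version": "2.2.0"},
        "storage": {
            "files": [{
                "filesystem": "root",
                "path": "/etc/hostname",
                # Ignition takes the mode as a decimal integer: 0o644 == 420.
                "mode": 0o644,
                # The content is passed inline as a "data:" URL.
                "contents": {"source": "data:," + quote(hostname)},
            }]
        },
    }


if __name__ == "__main__":
    # Hostnames follow the master-N.<cluster>.<domain> pattern used in this thread.
    for index in range(3):
        name = f"master-{index}.test.oc4.localdomain"
        print(json.dumps(hostname_ignition(name)))
```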
>
>
>
>
>
>>
>>>
>>>>
>>>>
>>>> On 05/11/2019 at 12:24, Roy Golan wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, 5 Nov 2019 at 13:22, Nathanaël Blanchet <blanchet(a)abes.fr> wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> I'm interested in installing OKD on oVirt with the official
>>>>>>> openshift installer (https://github.com/openshift/installer),
>>>>>>> but oVirt is not yet supported.
>>>>>>>
>>>>>>>
>>>>>> If you want to give it a try and supply feedback, I'll be glad.
>>>>>>
>>>>>>
>>>>>>> Regarding https://bugzilla.redhat.com/show_bug.cgi?id=1578255 and
>>>>>>> https://lists.ovirt.org/archives/list/users@ovirt.org/thread/EF7OQUVTY53G...
>>>>>>> , how should oVirt 4.3.7 integrate the openshift installer
>>>>>>> with terraform?
>>>>>>>
>>>>>>>
>>>>>> Terraform is part of it, yes. It is what we use to spin up the first 3
>>>>>> masters, plus a bootstrapping machine.
>>>>>>
>>>>>> --
>>>>>>> Nathanaël Blanchet
>>>>>>>
>>>>>>> Supervision réseau
>>>>>>> Pôle Infrastrutures Informatiques
>>>>>>> 227 avenue Professeur-Jean-Louis-Viala
>>>>>>> 34193 MONTPELLIER CEDEX 5
>>>>>>> Tél. 33 (0)4 67 54 84 55
>>>>>>> Fax 33 (0)4 67 54 84 14
>>>>>>> blanchet(a)abes.fr
>>>>>>>
> --
Nathanaël Blanchet
Supervision réseau
Pôle Infrastrutures Informatiques
227 avenue Professeur-Jean-Louis-Viala
34193 MONTPELLIER CEDEX 5
Tél. 33 (0)4 67 54 84 55
Fax 33 (0)4 67 54 84 14
blanchet(a)abes.fr