On Thu, 7 Nov 2019 at 12:28, Nathanaël Blanchet <blanchet(a)abes.fr> wrote:
>
> On 07/11/2019 at 11:16, Roy Golan wrote:
>
>
>
> On Thu, 7 Nov 2019 at 11:23, Nathanaël Blanchet <blanchet(a)abes.fr> wrote:
>
>>
>> On 07/11/2019 at 07:18, Roy Golan wrote:
>>
>>
>>
>> On Thu, 7 Nov 2019 at 00:10, Nathanaël Blanchet <blanchet(a)abes.fr>
>> wrote:
>>
>>>
>>> On 05/11/2019 at 21:50, Roy Golan wrote:
>>>
>>>
>>>
>>> On Tue, 5 Nov 2019 at 22:46, Roy Golan <rgolan(a)redhat.com> wrote:
>>>
>>>>
>>>>
>>>> On Tue, 5 Nov 2019 at 20:28, Nathanaël Blanchet <blanchet(a)abes.fr>
>>>> wrote:
>>>>
>>>>>
>>>>> On 05/11/2019 at 18:22, Roy Golan wrote:
>>>>>
>>>>>
>>>>>
>>>>> On Tue, 5 Nov 2019 at 19:12, Nathanaël Blanchet <blanchet(a)abes.fr>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> On 05/11/2019 at 13:54, Roy Golan wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, 5 Nov 2019 at 14:52, Nathanaël Blanchet <blanchet(a)abes.fr>
>>>>>> wrote:
>>>>>>
>>>>>>> I tried openshift-install after compiling, but no oVirt provider is
>>>>>>> available... So what do you mean when you say "give it a try"? Maybe
>>>>>>> only provisioning oVirt with the Terraform module?
>>>>>>>
>>>>>>> [root@vm5 installer]# bin/openshift-install create cluster
>>>>>>> ? Platform [Use arrows to move, space to select, type to filter, ? for more help]
>>>>>>> > aws
>>>>>>> azure
>>>>>>> gcp
>>>>>>> openstack
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> It's not merged yet. Please pull this image and work with it as a
>>>>>> container:
>>>>>> quay.io/rgolangh/openshift-installer
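>>>>>>
>>>>>> For example, a minimal sketch (assuming the container keeps its install
>>>>>> assets under /output and looks for the SSH key in /output/.ssh; adjust
>>>>>> to your setup):
>>>>>>
>>>>>> # prepare an install dir on the host, including the SSH key the
>>>>>> # installer should use to reach the bootstrap machine
>>>>>> mkdir -p install-dir/.ssh
>>>>>> cp ~/.ssh/id_rsa ~/.ssh/id_rsa.pub install-dir/.ssh/
>>>>>> docker pull quay.io/rgolangh/openshift-installer
>>>>>> # mount the dir as /output so generated assets and logs survive the run
>>>>>> docker run -it -v "$PWD/install-dir:/output" \
>>>>>>     quay.io/rgolangh/openshift-installer create cluster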
>>>>>>
>>>>>> A little feedback as you asked:
>>>>>>
>>>>>> [root@openshift-installer ~]# docker run -it 56e5b667100f create cluster
>>>>>> ? Platform ovirt
>>>>>> ? Enter oVirt's api endpoint URL https://air-dev.v100.abes.fr/ovirt-engine/api
>>>>>> ? Enter ovirt-engine username admin@internal
>>>>>> ? Enter password **********
>>>>>> ? Pick the oVirt cluster Default
>>>>>> ? Pick a VM template centos7.x
>>>>>> ? Enter the internal API Virtual IP 10.34.212.200
>>>>>> ? Enter the internal DNS Virtual IP 10.34.212.100
>>>>>> ? Enter the ingress IP 10.34.212.50
>>>>>> ? Base Domain oc4.localdomain
>>>>>> ? Cluster Name test
>>>>>> ? Pull Secret [? for help] *************************************
>>>>>> INFO Creating infrastructure resources...
>>>>>> INFO Waiting up to 30m0s for the Kubernetes API at https://api.test.oc4.localdomain:6443...
>>>>>> ERROR Attempted to gather ClusterOperator status after installation failure: listing ClusterOperator objects: Get https://api.test.oc4.localdomain:6443/apis/config.openshift.io/v1/cluster...: dial tcp: lookup api.test.oc4.localdomain on 10.34.212.100:53: no such host
>>>>>> INFO Pulling debug logs from the bootstrap machine
>>>>>> ERROR Attempted to gather debug logs after installation failure: failed to create SSH client, ensure the proper ssh key is in your keyring or specify with --key: failed to initialize the SSH agent: failed to read directory "/output/.ssh": open /output/.ssh: no such file or directory
>>>>>> FATAL Bootstrap failed to complete: waiting for Kubernetes API: context deadline exceeded
>>>>>>
>>>>>> - 6 VMs are successfully created, thin-provisioned from the template
>>>>>>
>>>>>>
>>>>>> - each VM is provisioned by cloud-init
>>>>>> - the step "INFO Waiting up to 30m0s for the Kubernetes API at
>>>>>> https://api.test.oc4.localdomain:6443..." fails. It seems that the
>>>>>> DNS pod is not up at this time.
>>>>>> - At this point, there is no visibility into what is being done or
>>>>>> what goes wrong... what's happening there? I suppose some kind of
>>>>>> playbook is downloading some kind of images...
>>>>>> - The" pull secret step" is not clear: we must have
a redhat
>>>>>> account to
https://cloud.redhat.com/openshift/install/ to get
a
>>>>>> key like
>>>>>> -
>>>>>> {"auths":{"cloud.openshift.com
>>>>>>
":{"auth":"b3BlbnNoaWZ0LXJlbGVhc2UtZGV2K2V4cGxvaXRhYmVzZnIxdGN0ZnR0dmFnMHpuazMxd2IwMnIwenV1MDg6TE9XVzFQODM1NzNJWlI4MlZDSUEyTFdEVlJJS0U5VTVWM0NTSUdOWjJH********************==","email":
>>>>>> "exploit(a)abes.fr"
<exploit(a)abes.fr>},"quay.io
>>>>>>
":{"auth":"b3BlbnNoaWZ0LXJlbGVhc2UtZGV2K2V4cGxvaXRhYmVzZnIxdGN0ZnR0dmFnMHpuazMxd2IwMnIwenV1MDg6TE9XVzFQODM1NzNJWlI4MlZDSUEyTFdEVlJJS0U5VTVWM0NTSUdOWjJH********************==","email":
>>>>>> "exploit(a)abes.fr"
<exploit(a)abes.fr>},"registry.connect.redhat.com
>>>>>>
":{"auth":"NTI0MjkwMnx1aGMtMVRDVEZUVFZBRzBaTkszMXdCMDJSMFp1VTA4OmV5SmhiR2NpT2lKU1V6VXhNaUo5LmV5SnpkV0lpT2lJMk4ySTJNREV3WXpObE1HSTBNbVE0T1RGbVpUZGxa**********************","email":
>>>>>> "exploit(a)abes.fr"
<exploit(a)abes.fr>},"registry.redhat.io
>>>>>>
":{"auth":"NTI0MjkwMnx1aGMtMVRDVEZUVFZBRzBaTkszMXdCMDJSMFp1VTA4OmV5SmhiR2NpT2lKU1V6VXhNaUo5LmV5SnpkV0lpT2lJMk4ySTJNREV3WXpObE1HSTBNbVE0T1RGbVpUZGxa**********************","email":
>>>>>> "exploit(a)abes.fr" <exploit(a)abes.fr>}}}
>>>>>>
>>>>>>
>>>>>> Can you tell me what I'm doing wrong?
>>>>>>
>>>>>
>>>>> What is the template you are using? I don't think it's an RHCOS (Red
>>>>> Hat CoreOS) template; it looks like CentOS?
>>>>>
>>>>> Use this gist to import the template:
>>>>> https://gist.github.com/rgolangh/adccf6d6b5eaecaebe0b0aeba9d3331b
>>>>>
>>>>> Unfortunately, the result is the same with the RHCOS template...
>>>>>
>>>>
>>>> Make sure that (see the quick check after this list):
>>>> - the IPs supplied are taken from, and belong to, the VM network of
>>>> those master VMs
>>>> - a "localdomain" or ".local" domain suffix shouldn't be used
>>>> - your ovirt-engine is version 4.3.7 or master
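>>>>
>>>> A quick pre-flight sketch for the first and last points (IPs and engine
>>>> URL taken from the transcript above; adjust to your setup):
>>>>
>>>> # the three virtual IPs should not answer before the installation starts
>>>> for ip in 10.34.212.200 10.34.212.100 10.34.212.50; do
>>>>     ping -c1 -W1 "$ip" >/dev/null 2>&1 && echo "$ip is already in use"
>>>> done
>>>> # check the engine version over the REST API (curl prompts for the password)
>>>> curl -sk -u admin@internal https://air-dev.v100.abes.fr/ovirt-engine/api \
>>>>     | grep -o '<full_version>[^<]*</full_version>'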
>>>>
>>> I didn't mention that you can provide any domain name, even a
>>> non-existing one.
>>> When the bootstrap phase is done, the installation will tear down the
>>> bootstrap machine.
>>> At this stage, if you are using a non-existing domain, you need to add
>>> the DNS Virtual IP you provided to your resolv.conf so the installation
>>> can resolve api.$CLUSTER_NAME.$CLUSTER_DOMAIN.
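>>>
>>> For example, on the machine running the installer (a sketch;
>>> 10.34.212.100 is the DNS Virtual IP entered during "create cluster"):
>>>
>>> echo "nameserver 10.34.212.100" >> /etc/resolv.conf
>>> dig +short api.test.oc4.localdomain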
>>>
>>> Also, you have a log under your $INSTALL_DIR/.openshift_install.log
>>>
>>> I tried several things following your advice, but I'm still stuck at the
>>> https://api.test.oc4.localdomain:6443/version?timeout=32s test, with
>>> these logs:
>>>
>>> time="2019-11-06T20:21:15Z" level=debug msg="Still waiting for
the
>>> Kubernetes API: the server could not find the requested resource"
>>>
>>> So it means DNS resolution and the network are now good and Ignition
>>> provisioning is OK, but something goes wrong with the bootstrap VM.
>>>
>>> Now if I log into the bootstrap VM, I can see an SELinux message, but it
>>> may not be relevant...
>>>
>>> SELinux: mount invalid. Same superblock, different security settings
>>> for (dev mqueue, type mqueue).
>>>
>>> Some other clues with journalctl:
>>>
>>> journalctl -b -f -u bootkube
>>>
>>> Nov 06 21:55:40 localhost bootkube.sh[2101]: {"level":"warn","ts":"2019-11-06T21:55:40.661Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-7beef51d-daad-4b46-9497-8e135e528f7c/etcd-1.test.oc4.localdomain:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest connection error: connection error: desc = \"transport: Error while dialing dial tcp: lookup etcd-1.test.oc4.localdomain on 10.34.212.101:53: no such host\""}
>>> Nov 06 21:55:40 localhost bootkube.sh[2101]: {"level":"warn","ts":"2019-11-06T21:55:40.662Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-03992fc6-5a87-4160-9b87-44ec6e82f7cd/etcd-2.test.oc4.localdomain:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest connection error: connection error: desc = \"transport: Error while dialing dial tcp: lookup etcd-2.test.oc4.localdomain on 10.34.212.101:53: no such host\""}
>>> Nov 06 21:55:40 localhost bootkube.sh[2101]: {"level":"warn","ts":"2019-11-06T21:55:40.662Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-00db28a7-5188-4666-896b-e37c88ad3ae9/etcd-0.test.oc4.localdomain:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest connection error: connection error: desc = \"transport: Error while dialing dial tcp: lookup etcd-0.test.oc4.localdomain on 10.34.212.101:53: no such host\""}
>>> Nov 06 21:55:40 localhost bootkube.sh[2101]: https://etcd-1.test.oc4.localdomain:2379 is unhealthy: failed to commit proposal: context deadline exceeded
>>> Nov 06 21:55:40 localhost bootkube.sh[2101]: https://etcd-2.test.oc4.localdomain:2379 is unhealthy: failed to commit proposal: context deadline exceeded
>>> Nov 06 21:55:40 localhost bootkube.sh[2101]: https://etcd-0.test.oc4.localdomain:2379 is unhealthy: failed to commit proposal: context deadline exceeded
>>> Nov 06 21:55:40 localhost bootkube.sh[2101]: Error: unhealthy cluster
>>> Nov 06 21:55:40 localhost podman[61210]: 2019-11-06 21:55:40.720514151 +0000 UTC m=+5.813853296 container died 7db3014e3f19c61775bac2a7a155eeb8521a6b78fea0d512384dd965cb0b8b01 (image=registry.svc.ci.openshift.org/origin/4.3-2019-10-29-180250@sha256:696a0ba..., name=etcdctl)
>>> Nov 06 21:55:40 localhost podman[61210]: 2019-11-06 21:55:40.817475095 +0000 UTC m=+5.910814273 container remove 7db3014e3f19c61775bac2a7a155eeb8521a6b78fea0d512384dd965cb0b8b01 (image=registry.svc.ci.openshift.org/origin/4.3-2019-10-29-180250@sha256:696a0ba..., name=etcdctl)
>>> Nov 06 21:55:40 localhost bootkube.sh[2101]: etcdctl failed. Retrying in 5 seconds...
>>>
>>> It seems to be a DNS resolution issue again.
>>>
>>> [user1@localhost ~]$ dig api.test.oc4.localdomain +short
>>> 10.34.212.201
>>>
>>> [user1@localhost ~]$ dig etcd-2.test.oc4.localdomain +short
>>> nothing
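>>>
>>> All three etcd records can be checked at once with a loop, e.g.:
>>>
>>> for i in 0 1 2; do echo -n "etcd-$i: "; dig +short etcd-$i.test.oc4.localdomain; done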
>>>
>>>
>>> So what do you think about that?
>>>
>>>
>> The key here is the masters - they need to boot, get their Ignition config
>> from the bootstrap machine, and start publishing their IPs and hostnames.
>>
>> Connect to a master, check its hostname, and check its running or failing
>> containers with `crictl ps -a` as the root user.
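>>
>> For example (a sketch; RHCOS images typically use the "core" SSH user, and
>> <master-ip>/<container-id> are placeholders):
>>
>> ssh core@<master-ip>
>> sudo -i
>> hostname
>> crictl ps -a
>> crictl logs <container-id>   # for any container stuck in Exited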
>>
>> You were right:
>> # crictl ps -a
>> CONTAINER ID   IMAGE                                                             CREATED         STATE    NAME       ATTEMPT  POD ID
>> 744cb8e654705  e77034cf36baff5e625acbba15331db68e1d84571f977d254fd833341158daa8  4 minutes ago   Running  discovery  75       9462e9a8ca478
>> 912ba9db736c3  e77034cf36baff5e625acbba15331db68e1d84571f977d254fd833341158daa8  14 minutes ago  Exited   discovery  74       9462e9a8ca478
>>
>> # crictl logs 744cb8e654705
>> E1107 08:10:04.262330 1 run.go:67] error looking up self for candidate IP 10.34.212.227: lookup _etcd-server-ssl._tcp.test.oc4.localdomain on 10.34.212.51:53: no such host
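>>
>> The SRV record the discovery container is querying can be checked
>> directly against the same DNS server, e.g.:
>>
>> dig +short SRV _etcd-server-ssl._tcp.test.oc4.localdomain @10.34.212.51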
>>
>> # hostname
>> localhost
>>
>> Conclusion: discovery didn't publish IPs and hostnames to CoreDNS because
>> the master didn't get its name master-0.test.oc4.localdomain during the
>> provisioning phase.
>>
>> I changed the master-0 hostname and re-initiated Ignition to verify:
>>
>> # hostnamectl set-hostname master-0.test.oc4.localdomain
>>
>> # touch /boot/ignition.firstboot && rm -rf /etc/machine-id && reboot
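>>
>> (touching /boot/ignition.firstboot makes the node run Ignition again on
>> the next boot, and removing /etc/machine-id forces it to be regenerated)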
>>
>> After the reboot completed, there is no longer an exited discovery
>> container:
>>
>> CONTAINER ID   IMAGE                                                                         CREATED         STATE    NAME                ATTEMPT  POD ID
>> e701efa8bc583  77ec5e26cc676ef2bf5c42dd40e55394a11fb45a3e2d7e95cbaf233a1eef472f              20 seconds ago  Running  coredns             1        cbabc53322ac8
>> 2c7bc6abb5b65  d73eca122bd567a3a1f70fa5021683bc17dd87003d05d88b1cdd0215c55049f6              20 seconds ago  Running  mdns-publisher      1        6f8914ff9db35
>> b3f619d5afa2c  7aa184de043265814f9a775968234ac3280a285056da773f1aba0917e9615370              21 seconds ago  Running  haproxy-monitor     1        0e5c209496787
>> 07769ce79b032  7aa184de043265814f9a775968234ac3280a285056da773f1aba0917e9615370              21 seconds ago  Running  keepalived-monitor  1        02cf141d01a29
>> fb20d66b81254  e77034cf36baff5e625acbba15331db68e1d84571f977d254fd833341158daa8              21 seconds ago  Running  discovery           77       562f32067e0a7
>> 476b07599260e  86a34bc5edd3e70073313f97bfd51ed8937658b341dc52334fb98ea6896ebdc2              22 seconds ago  Running  haproxy             1        0e5c209496787
>> 26b53050a412b  9f94e500f85a735ec212ffb7305e0b63f7151a5346e41c2d5d293c8456f6fa42              22 seconds ago  Running  keepalived          1        02cf141d01a29
>> 30ce48453854b  7aa184de043265814f9a775968234ac3280a285056da773f1aba0917e9615370              22 seconds ago  Exited   render-config       1        cbabc53322ac8
>> ad3ab0ae52077  7aa184de043265814f9a775968234ac3280a285056da773f1aba0917e9615370              22 seconds ago  Exited   render-config       1        6f8914ff9db35
>> 650d62765e9e1  registry.svc.ci.openshift.org/origin/4.3-2019-10-29-180250@sha256:9a7e829...  13 hours ago    Exited   coredns             0        2ae0512b3b6ac
>> 481969ce49bb9  registry.svc.ci.openshift.org/origin/4.3-2019-10-29-180250@sha256:7681941...  13 hours ago    Exited   mdns-publisher      0        d49754042b792
>> 3594d9d261ca7  registry.svc.ci.openshift.org/origin/4.3-2019-10-29-180250@sha256:8c3b022...  13 hours ago    Exited   haproxy-monitor     0        3476219058ba8
>> 88b13ec02a5c1  7aa184de043265814f9a775968234ac3280a285056da773f1aba0917e9615370              13 hours ago    Exited   keepalived-monitor  0        a3e13cf07c04f
>> 1ab721b5599ed  registry.svc.ci.openshift.org/origin/4.3-2019-10-29-180250@sha256:629d73f...  13 hours ago
>>
>> because DNS registration is OK:
>>
>> [user1@master-0 ~]$ dig etcd-0.test.oc4.localdomain +short
>> 10.34.212.227
>>
>> CONCLUSION:
>>
>> - none of the RHCOS VMs is correctly provisioned with its target
>> hostname, so they all stay at localhost.
>>
>>
> What is your engine version? The hostname support for Ignition is merged
> into 4.3.7 and master.
>
> 4.3.7.1-1.el7
>
https://gerrit.ovirt.org/c/100397/ merged 2 days ago, so it will appear in
4.3.7.2. Sandro, when is 4.3.7.2 due?
You can also use the nightly 4.3 snapshot - it's not really nightly anymore,
it's updated on every run of the CI Change-Queue, IIUC.
> I only upgraded the engine and not vdsm on the hosts, but I suppose the
> hosts are not important for Ignition.
>
Correct.
>
>
>> - The cloud-init syntax for the hostname is OK, but it is not provisioned
>> by Ignition.
>>
>> Why not provision these hostnames with a JSON snippet or something
>> similar?
>>
>> {
>>   "ignition": { "version": "2.2.0" },
>>   "storage": {
>>     "files": [{
>>       "filesystem": "root",
>>       "path": "/etc/hostname",
>>       "mode": 420,
>>       "contents": { "source": "data:,master-0.test.oc4.localdomain" }
>>     }]
>>   }
>> }
>>
>>
>>
>>
>>
>>>
>>>>
>>>>>
>>>>>
>>>>> On 05/11/2019 at 12:24, Roy Golan wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, 5 Nov 2019 at 13:22, Nathanaël Blanchet <blanchet(a)abes.fr>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> I'm interested in installing OKD on oVirt with the official
>>>>>>>> openshift installer (https://github.com/openshift/installer), but
>>>>>>>> oVirt is not yet supported.
>>>>>>>>
>>>>>>>>
>>>>>>> If you want to give it a try and supply feedback, I'll be glad.
>>>>>>>
>>>>>>>
>>>>>>>> Regarding https://bugzilla.redhat.com/show_bug.cgi?id=1578255 and
>>>>>>>> https://lists.ovirt.org/archives/list/users@ovirt.org/thread/EF7OQUVTY53G...
>>>>>>>> , how will oVirt 4.3.7 integrate the openshift installer with
>>>>>>>> Terraform?
>>>>>>>>
>>>>>>>>
>>>>>>> Terraform is part of it, yes. It is what we use to spin up the first
>>>>>>> 3 masters, plus a bootstrapping machine.
>>>>>>>
> --
> Nathanaël Blanchet
>
> Supervision réseau
> Pôle Infrastructures Informatiques
> 227 avenue Professeur-Jean-Louis-Viala
> 34193 MONTPELLIER CEDEX 5
> Tél. 33 (0)4 67 54 84 55
> Fax 33 (0)4 67 54 84 14
> blanchet(a)abes.fr
>
_______________________________________________
Users mailing list -- users(a)ovirt.org
To unsubscribe send an email to users-leave(a)ovirt.org
Privacy Statement:
https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct:
https://www.ovirt.org/community/about/community-guidelines/
List Archives:
https://lists.ovirt.org/archives/list/users@ovirt.org/message/GZ64UU7KYDY...