On Thu, Nov 7, 2019 at 12:57 PM Roy Golan <rgolan@redhat.com> wrote:
On Thu, 7 Nov 2019 at 12:28, Nathanaël Blanchet <blanchet@abes.fr> wrote:
On 07/11/2019 at 11:16, Roy Golan wrote:
>
>
> On Thu, 7 Nov 2019 at 11:23, Nathanaël Blanchet <blanchet@abes.fr> wrote:
>
>
> On 07/11/2019 at 07:18, Roy Golan wrote:
>>
>>
>> On Thu, 7 Nov 2019 at 00:10, Nathanaël Blanchet <blanchet@abes.fr> wrote:
>>
>>
>> On 05/11/2019 at 21:50, Roy Golan wrote:
>>>
>>>
>>> On Tue, 5 Nov 2019 at 22:46, Roy Golan <rgolan@redhat.com> wrote:
>>>
>>>
>>>
>>> On Tue, 5 Nov 2019 at 20:28, Nathanaël Blanchet <blanchet@abes.fr> wrote:
>>>
>>>
>>> On 05/11/2019 at 18:22, Roy Golan wrote:
>>>>
>>>>
>>>> On Tue, 5 Nov 2019 at 19:12, Nathanaël Blanchet <blanchet@abes.fr> wrote:
>>>>
>>>>
>>>> On 05/11/2019 at 13:54, Roy Golan wrote:
>>>>>
>>>>>
>>>>> On Tue, 5 Nov 2019 at 14:52, Nathanaël Blanchet <blanchet@abes.fr> wrote:
>>>>>
>>>>> I tried openshift-install after compiling, but no ovirt provider
>>>>> is available... So what do you mean when you say "give it a try"?
>>>>> Maybe only provisioning ovirt with the terraform module?
>>>>>
>>>>> [root@vm5 installer]# bin/openshift-install create cluster
>>>>> ? Platform [Use arrows to move, space to select, type to filter, ? for more help]
>>>>> > aws
>>>>> azure
>>>>> gcp
>>>>> openstack
>>>>>
>>>>> It's not merged yet. Please pull this image and work with it as a
>>>>> container: quay.io/rgolangh/openshift-installer
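A minimal, untested sketch of running that image as a container - the
/output working directory and the mounts are my assumptions (based on the
"/output/.ssh" error further down in this thread), not a documented interface:

  mkdir -p ~/ocp4-install/.ssh
  cp ~/.ssh/id_rsa ~/ocp4-install/.ssh/       # assumed: lets the installer reach the bootstrap VM over SSH to gather logs
  docker run -it --rm \
      -v ~/ocp4-install:/output:Z \           # keep install-config, cluster state and logs on the host
      quay.io/rgolangh/openshift-installer create cluster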
>>>>
>>>> A little feedback as you asked:
>>>>
>>>> [root@openshift-installer ~]# docker run -it 56e5b667100f create cluster
>>>> ? Platform ovirt
>>>> ? Enter oVirt's api endpoint URL https://air-dev.v100.abes.fr/ovirt-engine/api
>>>> ? Enter ovirt-engine username admin@internal
>>>> ? Enter password **********
>>>> ? Pick the oVirt cluster Default
>>>> ? Pick a VM template centos7.x
>>>> ? Enter the internal API Virtual IP 10.34.212.200
>>>> ? Enter the internal DNS Virtual IP 10.34.212.100
>>>> ? Enter the ingress IP 10.34.212.50
>>>> ? Base Domain oc4.localdomain
>>>> ? Cluster Name test
>>>> ? Pull Secret [? for help] *************************************
>>>> INFO Creating infrastructure resources...
>>>> INFO Waiting up to 30m0s for the Kubernetes API at https://api.test.oc4.localdomain:6443...
>>>> ERROR Attempted to gather ClusterOperator status after installation failure:
>>>> listing ClusterOperator objects: Get
>>>> https://api.test.oc4.localdomain:6443/apis/config.openshift.io/v1/cluster...:
>>>> dial tcp: lookup api.test.oc4.localdomain on 10.34.212.100:53: no such host
>>>> INFO Pulling debug logs from the bootstrap machine
>>>> ERROR Attempted to gather debug logs after installation failure: failed to
>>>> create SSH client, ensure the proper ssh key is in your keyring or specify
>>>> with --key: failed to initialize the SSH agent: failed to read directory
>>>> "/output/.ssh": open /output/.ssh: no such file or directory
>>>> FATAL Bootstrap failed to complete: waiting for Kubernetes API: context deadline exceeded
>>>>
>>>> * 6 VMs are successfully created, thin-provisioned from the template
>>>> * each VM is provisioned by cloud-init
>>>> * the step "INFO Waiting up to 30m0s for the Kubernetes API at
>>>>   https://api.test.oc4.localdomain:6443..." fails. It seems that the DNS pod
>>>>   is not up at this time.
>>>> * At this point there is no more visibility into what is being done or what
>>>>   goes wrong... what's happening there? Presumably some kind of playbook
>>>>   downloading some kind of images...
>>>> * The "pull secret" step is not clear: we must have a Red Hat account at
>>>>   https://cloud.redhat.com/openshift/install/ to get a key like:
>>>> {"auths":{"cloud.openshift.com":{"auth":"b3BlbnNoaWZ0...","email":"exploit@abes.fr"},
>>>>  "quay.io":{"auth":"b3BlbnNoaWZ0LXJlbGVhc2Ut...","email":"exploit@abes.fr"},
>>>>  "registry.connect.redhat.com":{"auth":"NTI0...","email":"exploit@abes.fr"},
>>>>  "registry.redhat.io":{"auth":"NTI0MjkwMnx1a...","email":"exploit@abes.fr"}}}
>>>>
>>>> Can you tell me if I'm doing something wrong?
>>>>
>>>>
>>>> What is the template you are using? I don't think it's an RHCOS (Red Hat
>>>> CoreOS) template, it looks like CentOS?
>>>>
>>>> Use this gist to import the template:
>>>> https://gist.github.com/rgolangh/adccf6d6b5eaecaebe0b0aeba9d3331b
>>> Unfortunately, the result is the same with
>>> the RHCOS template...
>>>
>>>
>>> Make sure that:
>>> - the IPs supplied are taken, and belong to the
>>> VM network of those master VMs
>>> - localdomain or local domain suffix shouldn't
>>> be used
>>> - your ovirt-engine is version 4.3.7 or master
>>>
>>> I didn't mention that you can provide any domain
>>> name, even a non-existing one.
>>> When the bootstrap phase is done, the
>>> installation will tear down the bootstrap machine.
>>> At that stage, if you are using a non-existing
>>> domain, you need to add the DNS Virtual IP
>>> you provided to your resolv.conf so the
>>> installation can resolve
>>> api.$CLUSTER_NAME.$CLUSTER_DOMAIN (a minimal sketch follows below).
>>>
>>> Also, you have a log under your
>>> $INSTALL_DIR/.openshift_install.log
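A minimal sketch of that resolv.conf step, using the DNS Virtual IP and the
cluster/base domain from the example earlier in this thread (adjust to your
own values):

  # on the machine running openshift-install, as root
  echo "nameserver 10.34.212.100" >> /etc/resolv.conf   # the DNS Virtual IP entered at install time
  dig +short api.test.oc4.localdomain                   # should now resolve via that VIP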
>>
>> I tried several things following your advice, but I'm
>> still stuck at the
>> https://api.test.oc4.localdomain:6443/version?timeout=32s test
>>
>> with logs:
>>
>> time="2019-11-06T20:21:15Z" level=debug msg="Still
>> waiting for the Kubernetes API: the server could not
>> find the requested resource"
>>
>> So it means DNS resolution and the network are now good
>> and ignition provisioning is OK, but something
>> goes wrong with the bootstrap VM.
>>
>> Now if I log into the bootstrap VM, I can see an
>> SELinux message, but it may not be relevant...
>>
>> SELinux: mount invalid. Same Superblock, different
>> security settings for (dev nqueue, type nqueue).
>>
>> Some other clues with journalctl:
>>
>> journalctl -b -f -u bootkube
>>
>> Nov 06 21:55:40 localhost bootkube.sh[2101]:
>> {"level":"warn","ts":"2019-11-06T21:55:40.661Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying
>> of unary invoker
>> failed","target":"endpoint://client-7beef51d-daad-4b46-9497-8e135e528f7c/etcd-1.test.oc4.localdomain:2379","attempt":0,"error":"rpc
>> error: code = DeadlineExceeded desc = latest connection error: connection error: desc =
>> \"transport: Error while dialing dial tcp: lookup
>> etcd-1.test.oc4.localdomain on 10.34.212.101:53: no such host\""}
>> Nov 06 21:55:40 localhost bootkube.sh[2101]:
>> {"level":"warn","ts":"2019-11-06T21:55:40.662Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying
>> of unary invoker
>> failed","target":"endpoint://client-03992fc6-5a87-4160-9b87-44ec6e82f7cd/etcd-2.test.oc4.localdomain:2379","attempt":0,"error":"rpc
>> error: code = DeadlineExceeded desc = latest connection error: connection error: desc =
>> \"transport: Error while dialing dial tcp: lookup
>> etcd-2.test.oc4.localdomain on 10.34.212.101:53: no such host\""}
>> Nov 06 21:55:40 localhost bootkube.sh[2101]:
>> {"level":"warn","ts":"2019-11-06T21:55:40.662Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying
>> of unary invoker
>> failed","target":"endpoint://client-00db28a7-5188-4666-896b-e37c88ad3ae9/etcd-0.test.oc4.localdomain:2379","attempt":0,"error":"rpc
>> error: code = DeadlineExceeded desc = latest connection error: connection error: desc =
>> \"transport: Error while dialing dial tcp: lookup
>> etcd-0.test.oc4.localdomain on 10.34.212.101:53: no such host\""}
>> Nov 06 21:55:40 localhost bootkube.sh[2101]: https://etcd-1.test.oc4.localdomain:2379 is
>> unhealthy: failed to commit proposal: context deadline exceeded
>> Nov 06 21:55:40 localhost bootkube.sh[2101]: https://etcd-2.test.oc4.localdomain:2379 is
>> unhealthy: failed to commit proposal: context deadline exceeded
>> Nov 06 21:55:40 localhost bootkube.sh[2101]: https://etcd-0.test.oc4.localdomain:2379 is
>> unhealthy: failed to commit proposal: context deadline exceeded
>> Nov 06 21:55:40 localhost bootkube.sh[2101]: Error: unhealthy cluster
>> Nov 06 21:55:40 localhost podman[61210]: 2019-11-06
>> 21:55:40.720514151 +0000 UTC m=+5.813853296 container died
>> 7db3014e3f19c61775bac2a7a155eeb8521a6b78fea0d512384dd965cb0b8b01
>> (image=registry.svc.ci.openshift.org/origin/4.3-2019-10-29-180250@sha256:6..., name=etcdctl)
>> Nov 06 21:55:40 localhost podman[61210]: 2019-11-06
>> 21:55:40.817475095 +0000 UTC m=+5.910814273 container remove
>> 7db3014e3f19c61775bac2a7a155eeb8521a6b78fea0d512384dd965cb0b8b01
>> (image=registry.svc.ci.openshift.org/origin/4.3-2019-10-29-180250@sha256:6..., name=etcdctl)
>> Nov 06 21:55:40 localhost bootkube.sh[2101]: etcdctl failed. Retrying in 5 seconds...
>>
>> It seems to be a DNS resolution issue again.
>>
>> [user1@localhost ~]$ dig api.test.oc4.localdomain +short
>> 10.34.212.201
>>
>> [user1@localhost ~]$ dig etcd-2.test.oc4.localdomain +short
>> (nothing)
>>
>>
>> So what do you think about that?
>>
>>
>> The key here is the masters - they need to boot, get
>> their ignition from the bootstrap machine, and start
>> publishing their IPs and hostnames.
>>
>> Connect to a master, check its hostname, and check its running
>> or failing containers with `crictl ps -a` as the root user.
>
> You were right:
>
> # crictl ps -a
> CONTAINER ID    IMAGE                                                             CREATED         STATE    NAME       ATTEMPT  POD ID
> 744cb8e654705   e77034cf36baff5e625acbba15331db68e1d84571f977d254fd833341158daa8  4 minutes ago   Running  discovery  75       9462e9a8ca478
> 912ba9db736c3   e77034cf36baff5e625acbba15331db68e1d84571f977d254fd833341158daa8  14 minutes ago  Exited   discovery  74       9462e9a8ca478
>
> # crictl logs 744cb8e654705
> E1107 08:10:04.262330 1 run.go:67] error looking up self for candidate IP 10.34.212.227:
> lookup _etcd-server-ssl._tcp.test.oc4.localdomain on 10.34.212.51:53: no such host
>
> # hostname
> localhost
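A quick way to repeat that lookup by hand - assuming the 10.34.212.51
resolver from the log above is the node-local coredns answering here:

  dig +short SRV _etcd-server-ssl._tcp.test.oc4.localdomain @10.34.212.51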
>
> Conclusion: discovery didn't publish the IPs and hostname to
> CoreDNS because the master didn't get its name
> master-0.test.oc4.localdomain during the provisioning phase.
>
> I changed the master-0 hostname and re-triggered ignition
> to verify:
>
> # hostnamectl set-hostname master-0.test.oc4.localdomain
>
> # touch /boot/ignition.firstboot && rm -rf /etc/machine-id && reboot
>
> After the reboot completed, there is no more exited discovery container:
>
> CONTAINER ID    IMAGE    CREATED    STATE    NAME    ATTEMPT    POD ID
> e701efa8bc583   77ec5e26cc676ef2bf5c42dd40e55394a11fb45a3e2d7e95cbaf233a1eef472f
>                 20 seconds ago   Running   coredns              1   cbabc53322ac8
> 2c7bc6abb5b65   d73eca122bd567a3a1f70fa5021683bc17dd87003d05d88b1cdd0215c55049f6
>                 20 seconds ago   Running   mdns-publisher       1   6f8914ff9db35
> b3f619d5afa2c   7aa184de043265814f9a775968234ac3280a285056da773f1aba0917e9615370
>                 21 seconds ago   Running   haproxy-monitor      1   0e5c209496787
> 07769ce79b032   7aa184de043265814f9a775968234ac3280a285056da773f1aba0917e9615370
>                 21 seconds ago   Running   keepalived-monitor   1   02cf141d01a29
> fb20d66b81254   e77034cf36baff5e625acbba15331db68e1d84571f977d254fd833341158daa8
>                 21 seconds ago   Running   discovery           77   562f32067e0a7
> 476b07599260e   86a34bc5edd3e70073313f97bfd51ed8937658b341dc52334fb98ea6896ebdc2
>                 22 seconds ago   Running   haproxy              1   0e5c209496787
> 26b53050a412b   9f94e500f85a735ec212ffb7305e0b63f7151a5346e41c2d5d293c8456f6fa42
>                 22 seconds ago   Running   keepalived           1   02cf141d01a29
> 30ce48453854b   7aa184de043265814f9a775968234ac3280a285056da773f1aba0917e9615370
>                 22 seconds ago   Exited    render-config        1   cbabc53322ac8
> ad3ab0ae52077   7aa184de043265814f9a775968234ac3280a285056da773f1aba0917e9615370
>                 22 seconds ago   Exited    render-config        1   6f8914ff9db35
> 650d62765e9e1   registry.svc.ci.openshift.org/origin/4.3-2019-10-29-180250@sha256:9a7e829...
>                 13 hours ago     Exited    coredns              0   2ae0512b3b6ac
> 481969ce49bb9   registry.svc.ci.openshift.org/origin/4.3-2019-10-29-180250@sha256:7681941...
>                 13 hours ago     Exited    mdns-publisher       0   d49754042b792
> 3594d9d261ca7   registry.svc.ci.openshift.org/origin/4.3-2019-10-29-180250@sha256:8c3b022...
>                 13 hours ago     Exited    haproxy-monitor      0   3476219058ba8
> 88b13ec02a5c1   7aa184de043265814f9a775968234ac3280a285056da773f1aba0917e9615370
>                 13 hours ago     Exited    keepalived-monitor   0   a3e13cf07c04f
> 1ab721b5599ed   registry.svc.ci.openshift.org/origin/4.3-2019-10-29-180250@sha256:629d73f...
>                 13 hours ago
>
> because DNS registration is OK:
>
> [user1@master-0 ~]$ dig etcd-0.test.oc4.localdomain +short
> 10.34.212.227
>
> CONCLUSION:
>
> * none of the RHCOS VMs is correctly provisioned with its
> target hostname, so they all stay as localhost.
>
>
> What is your engine version? The hostname support for
> ignition is merged into 4.3.7 and master.
4.3.7.1-1.el7
https://gerrit.ovirt.org/c/100397/ was merged 2 days ago, so it will
appear in 4.3.7.2.
Sandro, when is 4.3.7.2 due?
You can also use the nightly 4.3 snapshot - it's not really nightly
anymore - it's updated on every run of the CI Change-Queue, IIUC:
https://www.ovirt.org/develop/dev-process/install-nightly-snapshot.html
I confirm the 4.3 snapshot supports hostname change with ignition; now it
works out of the box... until this issue:
INFO Cluster operator image-registry Available is False with
StorageNotConfigured: storage backend not configured
ERROR Cluster operator image-registry Degraded is True with
StorageNotConfigured: storage backend not configured
INFO Cluster operator insights Disabled is False with :
FATAL failed to initialize the cluster: Cluster operator image-registry
is still updating
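Not tested here, but the usual workaround on platforms without a default
storage backend is to give the internal registry ephemeral storage
(non-production: the registry content is lost when its pod restarts):

  oc patch configs.imageregistry.operator.openshift.io cluster \
      --type merge --patch '{"spec":{"storage":{"emptyDir":{}}}}'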
I only upgraded the engine and not vdsm on the hosts, but I suppose
the hosts don't matter for ignition.
Correct.
>
> * The cloud-init syntax for the hostname is OK, but it is
> not provisioned by ignition:
>
> Why not provision these hostnames with a JSON snippet
> like the following, or something similar?
>
>
> {"ignition":{"version":"2.2.0"},"storage":{"files":[{"filesystem":"root","path":"/etc/hostname","mode":420,"contents":{"source":"data:,master-0.test.oc4.localdomain"}}]}}
>
>
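A minimal sketch of generating one such snippet per master - the file names
and the loop are mine, and how each file would actually be attached to its
VM (oVirt custom script, terraform, ...) is a separate question:

  for i in 0 1 2; do
    # write one ignition snippet per master, setting /etc/hostname via a data: URL
    printf '{"ignition":{"version":"2.2.0"},"storage":{"files":[{"filesystem":"root","path":"/etc/hostname","mode":420,"contents":{"source":"data:,master-%d.test.oc4.localdomain"}}]}}\n' "$i" \
      > "master-${i}-hostname.ign"
  done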
>>
>>
>>>
>>>
>>>>
>>>>
>>>>
>>>>> On 05/11/2019 at 12:24, Roy Golan wrote:
>>>>>>
>>>>>>
>>>>>> On Tue, 5 Nov 2019 at 13:22, Nathanaël Blanchet <blanchet@abes.fr> wrote:
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I'm interested in installing OKD on oVirt with the
>>>>>> official openshift installer
>>>>>> (https://github.com/openshift/installer),
>>>>>> but oVirt is not yet supported.
>>>>>>
>>>>>> If you want to give it a try and
>>>>>> supply feedback, I'd be glad.
>>>>>>
>>>>>> Regarding
>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1578255
>>>>>> and
>>>>>> https://lists.ovirt.org/archives/list/users@ovirt.org/thread/EF7OQUVTY53G...,
>>>>>> how should oVirt 4.3.7 integrate the openshift
>>>>>> installer with terraform?
>>>>>>
>>>>>>
>>>>>> Terraform is part of it, yes. It
>>>>>> is what we use to spin up the first
>>>>>> 3 masters, plus a bootstrap machine.
>>>>>>
--
Didi
--
Nathanaël Blanchet
Supervision réseau
Pôle Infrastructures Informatiques
227 avenue Professeur-Jean-Louis-Viala
34193 MONTPELLIER CEDEX 5
Tél. 33 (0)4 67 54 84 55
Fax 33 (0)4 67 54 84 14
blanchet(a)abes.fr