I actually see the pods running on master0 if I do this:

[root@master0 master]# oc project kube-system
Now using project "kube-system" on server "https://openshift-master.cloud.xxxxxxx.com:8443".
[root@master0 master]# oc get pods
NAME                                                      READY     STATUS    RESTARTS   AGE
master-api-master0.cloud.xxxxxxxx.com           1/1       Running   0          22m
master-controllers-master0.cloud.xxxxxxxx.com   1/1       Running   0          22m
master-etcd-master0.cloud.xxxxxxxxxx          1/1       Running   0          22m

So I wonder why the Ansible "Wait for control plane pods to appear" task keeps looping:

- name: Wait for control plane pods to appear
  oc_obj:
    state: list
    kind: pod
    name: "master-{{ item }}-{{ l_kubelet_node_name | lower }}"
    namespace: kube-system
  register: control_plane_pods
  until:
  - "'results' in control_plane_pods"
  - "'results' in control_plane_pods.results"
  - control_plane_pods.results.results | length > 0
  retries: 60
  delay: 5
  with_items:
  - "{{ 'etcd' if inventory_hostname in groups['oo_etcd_to_config'] else omit }}"
  - api
  - controllers
  ignore_errors: true
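
One thing I'm going to sanity-check (just a rough sketch; I'm assuming l_kubelet_node_name ends up as the node's lowercased FQDN) is whether the exact pod name the task queries actually matches what oc reports:

# oc get pods -n kube-system -o name | grep master-api
# oc get pod -n kube-system "master-api-$(hostname -f | tr '[:upper:]' '[:lower:]')"

If those two disagree (short hostname vs FQDN, or a case mismatch), the task would keep retrying even though the pods are clearly up.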

On Tue, May 28, 2019 at 4:23 PM Jayme <jaymef@gmail.com> wrote:
I just tried again from scratch, this time making sure a proper wildcard DNS entry existed and without using the set /etc/hosts option, and am still running into the pods issue.  Can anyone confirm whether this requires a public external IP to work?  I am working with an internal DNS zone here and NATed IPs.
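
For what it's worth, this is roughly how I'm checking resolution now (the "anything.apps" name below is just a placeholder for whatever the wildcard record is supposed to cover):

# dig +short openshift-master.cloud.xxxxxxx.com
# dig +short anything.apps.cloud.xxxxxxx.com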

On Tue, May 28, 2019 at 3:28 PM Edward Berger <edwberger@gmail.com> wrote:
In my case it was a single bare-metal host, so that would be equivalent to disabling iptables on the master0 VM you're installing to in your oVirt scenario.

On Tue, May 28, 2019 at 1:25 PM Jayme <jaymef@gmail.com> wrote:
Do you mean the iptables firewall on the server being installed to (i.e. master0), or the actual oVirt host that the master0 VM is running on?  I did try flushing the iptables rules on the master0 VM and then ran the plays again from the installer VM, but it fails at the same point.

Does this log message have anything to do with the issue? The /etc/cni directory does not even exist on the master0 VM.

May 28 17:23:35 master0 origin-node: W0528 17:23:35.012902   10434 cni.go:172] Unable to update cni config: No networks found in /etc/cni/net.d
May 28 17:23:35 master0 origin-node: E0528 17:23:35.013398   10434 kubelet.go:2101] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
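
For reference, this is roughly how I've been poking at it on the master0 VM (nothing clever, just checking whether the CNI config directory exists and what origin-node is logging):

# ls -l /etc/cni/net.d/
# systemctl status origin-node
# journalctl -u origin-node --since "10 minutes ago" | tail -n 50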



On Tue, May 28, 2019 at 1:19 PM Edward Berger <edwberger@gmail.com> wrote:
> TASK [openshift_control_plane : Wait for control plane pods to appear] *********
> Monday 27 May 2019  13:31:54 +0000 (0:00:00.180)       0:14:33.857 ************
> FAILED - RETRYING: Wait for control plane pods to appear (60 retries left).
> FAILED - RETRYING: Wait for control plane pods to appear (59 retries left).
>    It eventually counts all the way down to zero and fails.  

This looks a lot like the issues I saw when the host firewall (iptables) was blocking another OKD all-in-one-host install script [1].
Disabling iptables allowed the installation to continue for my proof of concept "cluster".
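
What I did was essentially this (assuming iptables-services is managing the firewall; if firewalld is active on your VM, the equivalent stop/disable applies there instead):

# systemctl stop iptables
# systemctl disable iptables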


The other error I had with [1] was that it was trying to install a couple of packages (zile and python2-pip) from EPEL while the repo was disabled.



On Tue, May 28, 2019 at 10:41 AM Jayme <jaymef@gmail.com> wrote:
Shirly,

Oh, and I should mention that I did verify that NetworkManager was installed on the master0 VM and enabled/started on the second go-around.  So that service is there and running.

# systemctl list-unit-files | grep Network
dbus-org.freedesktop.NetworkManager.service                             enabled
NetworkManager-dispatcher.service                                       enabled
NetworkManager-wait-online.service                                      enabled
NetworkManager.service                                                  enabled

On Tue, May 28, 2019 at 11:13 AM Jayme <jaymef@gmail.com> wrote:
Shirly,

I appreciate the help with this.  Unfortunately I am still running into the same problem.  So far I've tried to install/enable/start NetworkManager on the existing "master0" server and re-ran the playbooks from the installer VM.  I ran into the same problem waiting for the control plane pods, with the same errors in syslog.

So I wiped everything out and killed the template along with the installer and master VMs.  On the oVirt engine (I am running 4.3.3.7-1 stable) I had the ovirt-engine-metrics-1.3.0x rpm installed, with no yum updates available on an update check.  So I installed http://resources.ovirt.org/pub/yum-repo/ovirt-release43-pre.rpm and then installed the latest version of ovirt-engine-metrics, which gave me ovirt-engine-metrics-1.3.1-1.el7.noarch on the hosted engine.
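
From memory, the commands on the hosted engine were roughly (treat the exact invocation as approximate):

# yum install http://resources.ovirt.org/pub/yum-repo/ovirt-release43-pre.rpm
# yum update ovirt-engine-metrics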

After that package was installed, I followed the steps from the beginning as outlined at https://ovirt.org/documentation/metrics-install-guide/Installing_Metrics_Store.html -- I ran into the docker check issue again (same as my initial email), so I disabled that check and again got as far as starting the control plane pods before the failure.

I'm not sure where to go from here at this point.  The only thing I can think of that I did differently from the instructions outlined above is that I have not created the wildcard DNS record; however, I did set the configs to create /etc/hosts entries, and the /etc/hosts files on the machines have the proper IPs assigned for all hostnames (automatically added by the Ansible plays).

Any ideas how I can get past the control plane pods issue?

Thanks!

On Tue, May 28, 2019 at 4:23 AM Shirly Radco <sradco@redhat.com> wrote:
Hi,

The latest release of 4.3.z should already include a fix for this issue, the ovirt-engine-metrics-1.3.1 rpm.

The issue is that it requires NetworkManager to be installed, running, and enabled in order to work.

You can install it manually on the master0 VM, then start and enable it, or you can install the updated rpm from the nightly builds if your environment is oVirt 4.2.z:
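
For the manual install route, something along these lines on the master0 VM should be enough (a minimal sketch):

# yum install -y NetworkManager
# systemctl enable --now NetworkManager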


Best regards,
--

Shirly Radco

BI Senior Software Engineer

Red Hat



On Mon, May 27, 2019 at 4:41 PM Jayme <jaymef@gmail.com> wrote:
I managed to get past that, but I am running into another problem later in the process, on the "Wait for control plane pods to appear" task.  I thought it was perhaps a glitch left over from the previously failed docker step, so after a few more runs I tried killing everything and restarting the metrics process again from the very beginning, and I end up hitting the same issue with the control plane pods even though all other steps/tasks seem to be working.

I'm just getting this:

TASK [openshift_control_plane : Wait for control plane pods to appear] *********
Monday 27 May 2019  13:31:54 +0000 (0:00:00.180)       0:14:33.857 ************
FAILED - RETRYING: Wait for control plane pods to appear (60 retries left).
FAILED - RETRYING: Wait for control plane pods to appear (59 retries left).
FAILED - RETRYING: Wait for control plane pods to appear (58 retries left).
FAILED - RETRYING: Wait for control plane pods to appear (57 retries left).
FAILED - RETRYING: Wait for control plane pods to appear (56 retries left).

It eventually counts all the way down to zero and fails.  

In the syslog of the master0 server I'm seeing some errors related to the CNI config:

May 27 13:39:07 master0 ansible-oc_obj: Invoked with files=None kind=pod force=False all_namespaces=None field_selector=None namespace=kube-system delete_after=False kubeconfig=/etc/origin/master/admin.kubeconfig content=None state=list debug=False selector=None name=master-api-master0.xxxxxx.com
May 27 13:39:09 master0 origin-node: W0527 13:39:09.064230   20150 cni.go:172] Unable to update cni config: No networks found in /etc/cni/net.d
May 27 13:39:09 master0 origin-node: E0527 13:39:09.064670   20150 kubelet.go:2101] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
May 27 13:39:13 master0 ansible-oc_obj: Invoked with files=None kind=pod force=False all_namespaces=None field_selector=None namespace=kube-system delete_after=False kubeconfig=/etc/origin/master/admin.kubeconfig content=None state=list debug=False selector=None name=master-api-master0.xxxxxx.com
May 27 13:39:14 master0 origin-node: W0527 13:39:14.066911   20150 cni.go:172] Unable to update cni config: No networks found in /etc/cni/net.d
May 27 13:39:14 master0 origin-node: E0527 13:39:14.067321   20150 kubelet.go:2101] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
May 27 13:39:14 master0 origin-node: E0527 13:39:14.814705   20150 summary.go:102] Failed to get system container stats for "/system.slice/origin-node.service": failed to get cgroup stats for "/system.slice/origin-node.service": failed to get container info for "/system.slice/origin-node.service": unknown container "/system.slice/origin-node.service"
May 27 13:39:19 master0 origin-node: W0527 13:39:19.069450   20150 cni.go:172] Unable to update cni config: No networks found in /etc/cni/net.d
May 27 13:39:19 master0 origin-node: E0527 13:39:19.069850   20150 kubelet.go:2101] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized

On Mon, May 27, 2019 at 9:35 AM Shirly Radco <sradco@redhat.com> wrote:
Hi Jayme,

Thank you for reaching out.
Please try rerunning the ansible playbook.
If this doesn't work, try adding the following to the integ.ini on the metrics VM:
openshift_disable_check=docker_storage
and then rerun the ansible playbook.
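
For example (assuming integ.ini is in the directory you run the installer from; adjust the path if it lives elsewhere):

# echo "openshift_disable_check=docker_storage" >> integ.ini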

Please update how it goes.

Best regards,
--

Shirly Radco

BI Senior Software Engineer

Red Hat



On Sun, May 26, 2019 at 9:34 PM Jayme <jaymef@gmail.com> wrote:
I'm running into this Ansible error during the oVirt metrics installation (following the procedure at https://ovirt.org/documentation/metrics-install-guide/Installing_Metrics_Store.html)

This is happening late in the process, after successfully deploying the installation VM and then running the second step from the metrics VM.

CHECK [memory_availability : master0.xxxxxx.com] *************************************************************************************************
fatal: [master0.xxxxxxx.com]: FAILED! => {"changed": true, "checks": {"disk_availability": {}, "docker_image_availability": {"changed": true}, "docker_storage": {"failed": true, "failures": [["OpenShiftCheckException", "Could not find imported module support code for docker_info.  Looked for either AnsibleDockerClient.py or docker_common.py\nTraceback (most recent call last):\n  File \"/usr/share/ansible/openshift-ansible/roles/openshift_health_checker/action_plugins/openshift_health_check.py\", line 225, in run_check\n    result = check.run()\n  File \"/usr/share/ansible/openshift-ansible/roles/openshift_health_checker/openshift_checks/docker_storage.py\", line 53, in run\n    docker_info = self.execute_module(\"docker_info\", {})\n  File \"/usr/share/ansible/openshift-ansible/roles/openshift_health_checker/openshift_checks/__init__.py\", line 211, in execute_module\n    result = self._execute_module(module_name, module_args, self.tmp, self.task_vars)\n  File \"/usr/lib/python2.7/site-packages/ansible/plugins/action/__init__.py\", line 809, in _execute_module\n    (module_style, shebang, module_data, module_path) = self._configure_module(module_name=module_name, module_args=module_args, task_vars=task_vars)\n  File \"/usr/lib/python2.7/site-packages/ansible/plugins/action/__init__.py\", line 203, in _configure_module\n    environment=final_environment)\n  File \"/usr/lib/python2.7/site-packages/ansible/executor/module_common.py\", line 1023, in modify_module\n    environment=environment)\n  File \"/usr/lib/python2.7/site-packages/ansible/executor/module_common.py\", line 859, in _find_module_utils\n    recursive_finder(module_name, b_module_data, py_module_names, py_module_cache, zf)\n  File \"/usr/lib/python2.7/site-packages/ansible/executor/module_common.py\", line 621, in recursive_finder\n    raise AnsibleError(' '.join(msg))\nAnsibleError: Could not find imported module support code for docker_info.  Looked for either AnsibleDockerClient.py or docker_common.py\n"]], "msg": "Could not find imported module support code for docker_info.  
Looked for either AnsibleDockerClient.py or docker_common.py\nTraceback (most recent call last):\n  File \"/usr/share/ansible/openshift-ansible/roles/openshift_health_checker/action_plugins/openshift_health_check.py\", line 225, in run_check\n    result = check.run()\n  File \"/usr/share/ansible/openshift-ansible/roles/openshift_health_checker/openshift_checks/docker_storage.py\", line 53, in run\n    docker_info = self.execute_module(\"docker_info\", {})\n  File \"/usr/share/ansible/openshift-ansible/roles/openshift_health_checker/openshift_checks/__init__.py\", line 211, in execute_module\n    result = self._execute_module(module_name, module_args, self.tmp, self.task_vars)\n  File \"/usr/lib/python2.7/site-packages/ansible/plugins/action/__init__.py\", line 809, in _execute_module\n    (module_style, shebang, module_data, module_path) = self._configure_module(module_name=module_name, module_args=module_args, task_vars=task_vars)\n  File \"/usr/lib/python2.7/site-packages/ansible/plugins/action/__init__.py\", line 203, in _configure_module\n    environment=final_environment)\n  File \"/usr/lib/python2.7/site-packages/ansible/executor/module_common.py\", line 1023, in modify_module\n    environment=environment)\n  File \"/usr/lib/python2.7/site-packages/ansible/executor/module_common.py\", line 859, in _find_module_utils\n    recursive_finder(module_name, b_module_data, py_module_names, py_module_cache, zf)\n  File \"/usr/lib/python2.7/site-packages/ansible/executor/module_common.py\", line 621, in recursive_finder\n    raise AnsibleError(' '.join(msg))\nAnsibleError: Could not find imported module support code for docker_info.  Looked for either AnsibleDockerClient.py or docker_common.py\n"}, "memory_availability": {}, "package_availability": {"changed": false, "invocation": {"module_args": {"packages": ["PyYAML", "bash-completion", "bind", "ceph-common", "dnsmasq", "docker", "firewalld", "flannel", "glusterfs-fuse", "httpd-tools", "iptables", "iptables-services", "iscsi-initiator-utils", "libselinux-python", "nfs-utils", "ntp", "openssl", "origin", "origin-clients", "origin-hyperkube", "origin-node", "pyparted", "python-httplib2", "yum-utils"]}}}, "package_version": {"changed": false, "invocation": {"module_args": {"package_list": [{"check_multi": false, "name": "origin", "version": ""}, {"check_multi": false, "name": "origin-master", "version": ""}, {"check_multi": false, "name": "origin-node", "version": ""}], "package_mgr": "yum"}}}}, "msg": "One or more checks failed", "playbook_context": "install"}

NO MORE HOSTS LEFT *******************************************************************************************************************************************

PLAY RECAP ***************************************************************************************************************************************************
localhost                  : ok=35   changed=1    unreachable=0    failed=0    skipped=16   rescued=0    ignored=0
master0.xxxxxxx.com : ok=96   changed=6    unreachable=0    failed=1    skipped=165  rescued=0    ignored=0


INSTALLER STATUS *********************************************************************************************************************************************
Initialization  : Complete (0:00:16)
Health Check    : In Progress (0:00:36)
This phase can be restarted by running: playbooks/openshift-checks/pre-install.yml
Sunday 26 May 2019  16:36:25 +0000 (0:00:36.151)       0:01:56.339 ************
===============================================================================
Run health checks (install) - EL --------------------------------------------------------------------------------------------------------------------- 36.15s
os_firewall : Ensure iptables services are not enabled ------------------------------------------------------------------------------------------------ 2.74s
openshift_repos : Ensure libselinux-python is installed ----------------------------------------------------------------------------------------------- 1.77s
openshift_repos : refresh cache ----------------------------------------------------------------------------------------------------------------------- 1.60s
Gather Cluster facts ---------------------------------------------------------------------------------------------------------------------------------- 1.51s
container_runtime : Fixup SELinux permissions for docker ---------------------------------------------------------------------------------------------- 1.33s
container_runtime : Place additional/blocked/insecure registries in /etc/containers/registries.conf --------------------------------------------------- 1.30s
Ensure openshift-ansible installer package deps are installed ----------------------------------------------------------------------------------------- 1.29s
container_runtime : Install Docker -------------------------------------------------------------------------------------------------------------------- 1.17s
Initialize openshift.node.sdn_mtu --------------------------------------------------------------------------------------------------------------------- 1.13s
os_firewall : Install firewalld packages -------------------------------------------------------------------------------------------------------------- 1.13s
container_runtime : Set various Docker options -------------------------------------------------------------------------------------------------------- 1.11s
install NetworkManager -------------------------------------------------------------------------------------------------------------------------------- 1.10s
openshift_repos : Configure correct origin release repository ----------------------------------------------------------------------------------------- 1.05s
container_runtime : Get current installed Docker version ---------------------------------------------------------------------------------------------- 1.04s
openshift_repos : Configure origin gpg keys ----------------------------------------------------------------------------------------------------------- 1.04s
openshift_repos : Remove openshift_additional.repo file ----------------------------------------------------------------------------------------------- 0.99s
container_runtime : Setup the docker-storage for overlay ---------------------------------------------------------------------------------------------- 0.96s
Detecting Operating System from ostree_booted --------------------------------------------------------------------------------------------------------- 0.95s
Gather Cluster facts ---------------------------------------------------------------------------------------------------------------------------------- 0.92s


Failure summary:


  1. Hosts:    master0.xxxxxxx.com
     Play:     OpenShift Health Checks
     Task:     Run health checks (install) - EL
     Message:  One or more checks failed
     Details:  check "docker_storage":
               Could not find imported module support code for docker_info.  Looked for either AnsibleDockerClient.py or docker_common.py
               Traceback (most recent call last):
                 File "/usr/share/ansible/openshift-ansible/roles/openshift_health_checker/action_plugins/openshift_health_check.py", line 225, in run_check
                   result = check.run()
                 File "/usr/share/ansible/openshift-ansible/roles/openshift_health_checker/openshift_checks/docker_storage.py", line 53, in run
                   docker_info = self.execute_module("docker_info", {})
                 File "/usr/share/ansible/openshift-ansible/roles/openshift_health_checker/openshift_checks/__init__.py", line 211, in execute_module
                   result = self._execute_module(module_name, module_args, self.tmp, self.task_vars)
                 File "/usr/lib/python2.7/site-packages/ansible/plugins/action/__init__.py", line 809, in _execute_module
                   (module_style, shebang, module_data, module_path) = self._configure_module(module_name=module_name, module_args=module_args, task_vars=task_vars)
                 File "/usr/lib/python2.7/site-packages/ansible/plugins/action/__init__.py", line 203, in _configure_module
                   environment=final_environment)
                 File "/usr/lib/python2.7/site-packages/ansible/executor/module_common.py", line 1023, in modify_module
                   environment=environment)
                 File "/usr/lib/python2.7/site-packages/ansible/executor/module_common.py", line 859, in _find_module_utils
                   recursive_finder(module_name, b_module_data, py_module_names, py_module_cache, zf)
                 File "/usr/lib/python2.7/site-packages/ansible/executor/module_common.py", line 621, in recursive_finder
                   raise AnsibleError(' '.join(msg))
               AnsibleError: Could not find imported module support code for docker_info.  Looked for either AnsibleDockerClient.py or docker_common.py


The execution of "install_okd.yaml" includes checks designed to fail early if the requirements of the playbook are not met. One or more of these checks failed. To disregard these results, explicitly disable checks by setting an Ansible variable:
   openshift_disable_check=docker_storage
Failing check names are shown in the failure details above. Some checks may be configurable by variables if your requirements are different from the defaults; consult check documentation.
Variables can be set in the inventory or passed on the command line using the -e flag to ansible-playbook.
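
If I'm reading that right, the workaround would be to pass something like this to ansible-playbook (the exact playbook path depends on how the metrics installer invokes install_okd.yaml):

# ansible-playbook -e openshift_disable_check=docker_storage install_okd.yaml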
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-leave@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/SEFPOF36T7G4GIIGHERUBKTNOPEMVFSM/