Did a change in Ansible 2.9 in the ovirt_vm_facts module break the hosted-engine-setup?

I am having problems installing a 3-node HCI cluster on machines that used to work fine... and on a fresh set of servers, too. After a series of setbacks on a set of machines with failed installations and potentially failed clean-ups, I am now using a fresh set of servers that had never run oVirt before, patched to just before today's bigger changes (new kernel, etc.).

Installation first failed during the setup of the local hosted engine. When I switched from the GUI to the script setup ('hosted-engine --deploy') *without* doing a cleanup this time, it progressed further, to the point where the local VM had actually been teleported onto the (Gluster-based) cluster and is running there. In what seems to be the absolutely final action before adding the other two hosts, Ansible does a final inventory of the virtual machine and collects facts, or rather information (that is perhaps the breaking point), about that first VM before I would continue. Only, the data structure got renamed between Ansible 2.8 and 2.9 according to this: https://fossies.org/diffs/ansible/2.8.5_vs_2.9.0rc1/lib/ansible/modules/clou...

The resulting error message from the /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-*.log file is:

2019-12-04 13:15:19,232+0000 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:107 fatal: [localhost]: FAILED! => {"ansible_facts": {"ovirt_vms": [{"affinity_labels": [], "applications": [], "bios": {"boot_menu": {"enabled": false}, "type": "i440fx_sea_bios"}, "cdroms": [], "cluster": {"href": "/ovirt-engine/api/clusters/6616551e-1695-11ea-a86b-00163e34e004", "id": "6616551e-1695-11ea-a86b-00163e34e004"}, "comment": "", "cpu": {"architecture": "x86_64", "topology": {"cores": 1, "sockets": 4, "threads": 1}}, "cpu_profile": {"href": "/ovirt-engine/api/cpuprofiles/58ca604e-01a7-003f-01de-000000000250", "id": "58ca604e-01a7-003f-01de-000000000250"}, "cpu_shares": 0, "creation_time": "2019-12-04 13:01:12.780000+00:00", "delete_protected": false, "description": "", "disk_attachments": [], "display": {"address": "127.0.0.1", "allow_override": false, "certificate": {"content": "-----BEGIN CERTIFICATE-----(redacted)-----END CERTIFICATE-----\n", "organization": "***", "subject": "**"}, "copy_paste_enabled": true, "disconnect_action": "LOCK_SCREEN", "file_transfer_enabled": true, "monitors": 1, "port": 5900, "single_qxl_pci": false, "smartcard_enabled": false, "type": "vnc"}, "fqdn": "xdrd1001s.priv.atos.fr", "graphics_consoles": [], "guest_operating_system": {"architecture": "x86_64", "codename": "", "distribution": "CentOS Linux", "family": "Linux", "kernel": {"version": {"build": 0, "full_version": "3.10.0-1062.4.3.el7.x86_64", "major": 3, "minor": 10, "revision": 1062}}, "version": {"full_version": "7", "major": 7}}, "guest_time_zone": {"name": "GMT", "utc_offset": "+00:00"}, "high_availability": {"enabled": false, "priority": 0}, "host": {"href": "/ovirt-engine/api/hosts/75d096fd-4a2f-4ba4-b9fb-941f86daf624", "id": "75d096fd-4a2f-4ba4-b9fb-941f86daf624"}, "host_devices": [], "href": "/ovirt-engine/api/vms/dee6ec3b-5b4a-4063-ade9-12dece0f5fab", "id": "dee6ec3b-5b4a-4063-ade9-12dece0f5fab", "io": {"threads": 1}, "katello_errata": [], "large_icon": {"href": "/ovirt-engine/api/icons/9588ebfc-865a-4969-9829-d170d3654900", "id": "9588ebfc-865a-4969-9829-d170d3654900"}, "memory": 17179869184, "memory_policy": {"guaranteed": 17179869184, "max": 17179869184}, "migration": {"auto_converge": "inherit", "compressed": "inherit"}, "migration_downtime": -1, "multi_queues_enabled": true, "name": "external-HostedEngineLocal", "next_run_configuration_exists": false, "nics": [], "numa_nodes": [], "numa_tune_mode": "interleave", "origin": "external", "original_template": {"href": "/ovirt-engine/api/templates/00000000-0000-0000-0000-000000000000", "id": "00000000-0000-0000-0000-000000000000"}, "os": {"boot": {"devices": ["hd"]}, "type": "other"}, "permissions": [], "placement_policy": {"affinity": "migratable"}, "quota": {"id": "7af18f3a-1695-11ea-ab7e-00163e34e004"}, "reported_devices": [], "run_once": false, "sessions": [], "small_icon": {"href": "/ovirt-engine/api/icons/dec3572e-7465-4527-884b-f7c2eb2ed811", "id": "dec3572e-7465-4527-884b-f7c2eb2ed811"}, "snapshots": [], "sso": {"methods": [{"id": "guest_agent"}]}, "start_paused": false, "stateless": false, "statistics": [], "status": "unknown", "storage_error_resume_behaviour": "auto_resume", "tags": [], "template": {"href": "/ovirt-engine/api/templates/00000000-0000-0000-0000-000000000000", "id": "00000000-0000-0000-0000-000000000000"}, "time_zone": {"name": "Etc/GMT"}, "type": "desktop", "usb": {"enabled": false}, "watchdogs": []}]}, "attempts": 24, "changed": false, "deprecations": [{"msg": "The 'ovirt_vm_facts' module has been renamed to 'ovirt_vm_info', and the renamed one no longer returns ansible_facts", "version": "2.13"}]}

If that is the case, wouldn't that imply that there is no test case that covers the most typical HCI deployment when Ansible is updated? That would be truly frightening...

This seems to be a much bigger generic issue with Ansible 2.9. Here is an excerpt from the release notes:

"Renaming from _facts to _info

Ansible 2.9 renamed a lot of modules from <something>_facts to <something>_info, because the modules do not return Ansible facts. Ansible facts relate to a specific host. For example, the configuration of a network interface, the operating system on a unix server, and the list of packages installed on a Windows box are all Ansible facts. The renamed modules return values that are not unique to the host. For example, account information or region data for a cloud provider. Renaming these modules should provide more clarity about the types of return values each set of modules offers."

I guess that means all the oVirt playbooks need to be adapted for Ansible 2.9, and that evidently didn't happen, or at least not completely. It would also seem to suggest that there is no automated integration testing before an oVirt release... which contradicts the opening phrase of the ovirt.org download page: "oVirt 4.3.7 is intended for production use and is available for the following platforms..."

On Thu, Dec 12, 2019 at 9:40 AM <thomas@hoberg.net> wrote:
This seems to be a much bigger generic issue with Ansible 2.9. Here is an excerpt from the release notes:
"Renaming from _facts to _info
Ansible 2.9 renamed a lot of modules from <something>_facts to <something>_info, because the modules do not return Ansible facts. Ansible facts relate to a specific host. For example, the configuration of a network interface, the operating system on a unix server, and the list of packages installed on a Windows box are all Ansible facts. The renamed modules return values that are not unique to the host. For example, account information or region data for a cloud provider. Renaming these modules should provide more clarity about the types of return values each set of modules offers."
I guess that means all the oVirt playbooks need to be adapted for Ansible 2.9, and that evidently didn't happen, or at least not completely.
We are going to adapt, but this is not a breaking change. Until Ansible 2.11 there is automatic linking between *_facts and *_info; only in 2.12 will *_facts be removed. There is just a deprecation warning about this issue, but no breakage. Also, please be aware that we will require Ansible 2.9 as the minimum version for oVirt 4.4.
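For anyone pinning versions in their own wrapper playbooks, a pre-flight check along these lines (purely illustrative, not part of the oVirt roles) makes that minimum explicit:

- name: Make the minimum Ansible version explicit before running the oVirt roles
  assert:
    that:
      - ansible_version.full is version('2.9', '>=')
    fail_msg: "oVirt 4.4 roles will expect Ansible >= 2.9"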
It would also seem to suggest that there is no automated integration testing before an oVirt release... which contradicts the opening clause of the opening phrase of the ovirt.org download page: "oVirt 4.3.7 is intended for production use and is available for the following platforms..."
--
Martin Perina
Manager, Software Engineering
Red Hat Czech s.r.o.

Thanks Martin, that actually helps a lot, because I was afraid of the implications. So the error must really be elsewhere. I am having another look at the logs from the HostedEngineLocal VM, and it seems to complain that no Gluster members are up, not even the initial one. I also saw no entries in the Postgres gluster_servers table, so I killed the HostedEngineLocal VM and am doing another setup run to see if I can find out what's going wrong.
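In case it is useful to anyone following along, a quick sanity check of the Gluster peer state between attempts can look like this (only a sketch, expressed as an Ansible play for consistency with the rest of the thread; run it on the first host):

- hosts: localhost
  connection: local
  become: true
  tasks:
    - name: See which Gluster peers this host currently knows about
      command: gluster peer status
      register: peers
      changed_when: false

    - name: Show the peer list
      debug:
        var: peers.stdout_lines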

What got me derailed was the "ERROR" tag and the fact that it was the last thing to happen on the outside ("waiting for engine to be up"), while the HostedEngineLocal on the inside was looking for Gluster members it couldn't find...