First ovirt 4.4 installation failing

I'm having some trouble setting up my first oVirt system. I have CentOS 8 installed on the bare metal (ovirt1.ldas.ligo-la.caltech.edu) and the oVirt 4.4 packages installed, and then run 'hosted-engine --deploy' to set up my engine (ovirt-engine1.ldas.ligo-la.caltech.edu). For this initial deployment, I accept almost all of the defaults (other than local network-specific settings). However, the hosted-engine deployment fails with:

    [ INFO ] TASK [ovirt.hosted_engine_setup : Obtain SSO token using username/password credentials]
    [ INFO ] ok: [localhost]
    [ INFO ] TASK [ovirt.hosted_engine_setup : Wait for the host to be up]
    [ ERROR ] fatal: [localhost]: FAILED! => {"attempts": 120, "changed": false, "ovirt_hosts": []}
    [...cleanup...]
    [ INFO ] TASK [ovirt.hosted_engine_setup : Notify the user about a failure]
    [ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "The system may not be provisioned according to the playbook results: please check the logs for the issue, fix accordingly or re-deploy from scratch.\n"}

However, when I run 'virsh list', I can still see a HostedEngine1 VM running.

In ovirt-hosted-engine-setup-20200522153439-e7iw3k.log I see the error:

    2020-05-25 11:57:03,897-0500 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:103 {'changed': False, 'ovirt_hosts': [], 'invocation': {'module_args': {'pattern': 'name=ovirt1.ldas.ligo-la.caltech.edu', 'fetch_nested': False, 'nested_attributes': [], 'all_content': False, 'cluster_version': None}}, '_ansible_no_log': False, 'attempts': 120}
    2020-05-25 11:57:03,998-0500 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:107 fatal: [localhost]: FAILED! => {"attempts": 120, "changed": false, "ovirt_hosts": []}

In ovirt-hosted-engine-setup-ansible-bootstrap_local_vm-20200525112504-y2mmzu.log I see the following ansible errors:

    2020-05-25 11:36:22,300-0500 DEBUG ansible on_any args localhostTASK: ovirt.hosted_engine_setup : Always revoke the SSO token kwargs
    2020-05-25 11:36:23,766-0500 ERROR ansible failed {
        "ansible_host": "localhost",
        "ansible_playbook": "/usr/share/ovirt-hosted-engine-setup/ansible/trigger_role.yml",
        "ansible_result": {
            "_ansible_no_log": false,
            "changed": false,
            "invocation": {
                "module_args": {
                    "ca_file": null,
                    "compress": true,
                    "headers": null,
                    "hostname": null,
                    "insecure": null,
                    "kerberos": false,
                    "ovirt_auth": {
                        "ansible_facts": {
                            "ovirt_auth": {
                                "ca_file": null,
                                "compress": true,
                                "headers": null,
                                "insecure": true,
                                "kerberos": false,
                                "timeout": 0,
                                "token": "tF4ZMU0Q23zS13W2vzyhkswGMB4XAXZCFiPg9IVvbJXkPq9MFmne40wvCKaQOJO_TkYOpfxe78r9HHJcSrUWCQ",
                                "url": "https://ovirt-engine1.ldas.ligo-la.caltech.edu/ovirt-engine/api"
                            }
                        },
                        "attempts": 1,
                        "changed": false,
                        "failed": false
                    },
                    "password": null,
                    "state": "absent",
                    "timeout": 0,
                    "token": null,
                    "url": null,
                    "username": null
                }
            },
            "msg": "You must specify either 'url' or 'hostname'."
        },
        "ansible_task": "Always revoke the SSO token",
        "ansible_type": "task",
        "status": "FAILED",
        "task_duration": 2
    }
    2020-05-25 11:36:23,767-0500 DEBUG ansible on_any args <ansible.executor.task_result.TaskResult object at 0x7f15adaffa58> kwargs ignore_errors:True

Then further down:

    2020-05-25 11:57:05,063-0500 DEBUG var changed: host "localhost" var "ansible_failed_result" type "<class 'dict'>" value: "{
        "_ansible_no_log": false,
        "_ansible_parsed": true,
        "attempts": 120,
        "changed": false,
        "failed": true,
        "invocation": {
            "module_args": {
                "all_content": false,
                "cluster_version": null,
                "fetch_nested": false,
                "nested_attributes": [],
                "pattern": "name=ovirt1.ldas.ligo-la.caltech.edu"
            }
        },
        "ovirt_hosts": []
    }"
    2020-05-25 11:57:05,063-0500 ERROR ansible failed {
        "ansible_host": "localhost",
        "ansible_playbook": "/usr/share/ovirt-hosted-engine-setup/ansible/trigger_role.yml",
        "ansible_result": {
            "_ansible_no_log": false,
            "attempts": 120,
            "changed": false,
            "invocation": {
                "module_args": {
                    "all_content": false,
                    "cluster_version": null,
                    "fetch_nested": false,
                    "nested_attributes": [],
                    "pattern": "name=ovirt1.ldas.ligo-la.caltech.edu"
                }
            },
            "ovirt_hosts": []
        },
        "ansible_task": "Wait for the host to be up",
        "ansible_type": "task",
        "status": "FAILED",
        "task_duration": 1235
    }
    2020-05-25 11:57:05,063-0500 DEBUG ansible on_any args <ansible.executor.task_result.TaskResult object at 0x7f15ad92dcc0> kwargs ignore_errors:None

Not being very familiar with ansible, I'm not sure where to look next for the root cause of the problem.

--Michael Thomas

After a week of iterations, I finally found the problem. I was setting 'PermitRootLogin no' in the global section of the bare metal OS's sshd_config, as we do on all of our servers. Instead, PermitRootLogin is now set to 'without-password' in a Match block that allows root logins only from a well-known set of hosts.

Can someone explain why setting 'PermitRootLogin no' in the sshd_config on the hypervisor OS would affect the hosted engine deployment?

--Mike

On Sat, Jun 6, 2020 at 8:42 PM Michael Thomas <wart@caltech.edu> wrote:
After a week of iterations, I finally found the problem. I was setting 'PermitRootLogin no' in the global section of the bare metal OS sshd_config, as we do on all of our servers. Instead, PermitRootLogin is set to 'without-password' in a match block to allow root logins only from a well-known set of hosts.
Thanks for the report!
Can someone explain why setting 'PermitRootLogin no' in the sshd_config on the hypervisor OS would affect the hosted engine deployment?
Because the engine (running inside a VM) uses ssh as root to connect to the host (in which the engine VM is running).

Best regards, -- Didi
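A quick way to exercise that path by hand, from inside the engine VM (hostname as in Michael's setup; the options force key-based, non-interactive auth so a password prompt can't mask the failure; untested here, just a sketch):

    # From the engine VM: confirm root can reach the host with key auth.
    # With 'PermitRootLogin no' on the host, this fails immediately.
    ssh -o PreferredAuthentications=publickey -o BatchMode=yes \
        root@ovirt1.ldas.ligo-la.caltech.edu true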

On top of that, Ansible is also using ssh, so you need to 'override' the settings for the engine (a verification sketch follows the quoted message below). Best Regards, Strahil Nikolov

On 7 June 2020 13:01:08 GMT+03:00, Yedidyah Bar David <didi@redhat.com> wrote:
On Sat, Jun 6, 2020 at 8:42 PM Michael Thomas <wart@caltech.edu> wrote:
After a week of iterations, I finally found the problem. I was
setting 'PermitRootLogin no' in the global section of the bare metal OS sshd_config, as we do on all of our servers. Instead, PermitRootLogin is set to 'without-password' in a match block to allow root logins only from a well-known set of hosts.
Thanks for the report!
Can someone explain why setting 'PermitRootLogin no' in the
sshd_config on the hypervisor OS would affect the hosted engine deployment?
Because the engine (running inside a VM) uses ssh as root to connect to the host (in which the engine vm is running).
Best regards, -- Didi
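Regarding that ssh path: during deploy the engine's public key ends up in the host's root authorized_keys (standard oVirt host-deploy behavior, as I understand it). A quick hedged check that it landed; the 'ovirt' grep pattern is an assumption about the key's comment, so just read the file if it doesn't match:

    # On the host: look for the engine's public key authorized for root.
    grep -i ovirt /root/.ssh/authorized_keys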

On 6/7/20 5:01 AM, Yedidyah Bar David wrote:
On Sat, Jun 6, 2020 at 8:42 PM Michael Thomas <wart@caltech.edu> wrote:
After a week of iterations, I finally found the problem. I was setting 'PermitRootLogin no' in the global section of the bare metal OS sshd_config, as we do on all of our servers. Instead, PermitRootLogin is set to 'without-password' in a match block to allow root logins only from a well-known set of hosts.
Thanks for the report!
Can someone explain why setting 'PermitRootLogin no' in the sshd_config on the hypervisor OS would affect the hosted engine deployment?
Because the engine (running inside a VM) uses ssh as root to connect to the host (in which the engine vm is running).
Would it be sufficient to set, on the host, 'PermitRootLogin without-password' in a Match block that matches the ovirt management network?

    Match Address 10.10.10.0/24
        PermitRootLogin without-password

?

--Mike

On Sun, Jun 7, 2020 at 4:07 PM Michael Thomas <wart@caltech.edu> wrote:
On 6/7/20 5:01 AM, Yedidyah Bar David wrote:
On Sat, Jun 6, 2020 at 8:42 PM Michael Thomas <wart@caltech.edu> wrote:
After a week of iterations, I finally found the problem. I was setting 'PermitRootLogin no' in the global section of the bare metal OS sshd_config, as we do on all of our servers. Instead, PermitRootLogin is set to 'without-password' in a match block to allow root logins only from a well-known set of hosts.
I understand that you meant to say that this is already working for you, right? That you set it to allow without-password from some addresses and that that was enough. If so:
Thanks for the report!
Can someone explain why setting 'PermitRootLogin no' in the sshd_config on the hypervisor OS would affect the hosted engine deployment?
Because the engine (running inside a VM) uses ssh as root to connect to the host (in which the engine vm is running).
Would it be sufficient to set, on the host, 'PermitRootLogin without-password' in a Match block that matches the ovirt management network?
Match Address 10.10.10.0/24
    PermitRootLogin without-password
?
Do you mean here to ask if 10.10.10.0/24 is enough?

The engine VM's IP address should be enough. What this address is, after deploy finishes, is of course up to you. During deploy it's by default in libvirt's default network, 192.168.222.0/24, but it can be different if that's already in use by something else (e.g. a physical NIC).

BTW, I didn't test this myself. I do see in the code that it's supposed to work. If you find a bug, please report one. Thanks.

Best regards, -- Didi
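One hedged way to see what the host's sshd would actually enforce for the engine VM is OpenSSH's extended test mode; the hostname and address below are placeholders (the address is taken from the default deploy-time network mentioned above):

    # On the host: show the effective PermitRootLogin for a hypothetical
    # root connection from the deploy-time engine VM address.
    sshd -T -C user=root,host=engine.example.org,addr=192.168.222.65 | grep -i permitrootlogin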

On 6/7/20 8:42 AM, Yedidyah Bar David wrote:
On Sun, Jun 7, 2020 at 4:07 PM Michael Thomas <wart@caltech.edu> wrote:
On 6/7/20 5:01 AM, Yedidyah Bar David wrote:
On Sat, Jun 6, 2020 at 8:42 PM Michael Thomas <wart@caltech.edu> wrote:
After a week of iterations, I finally found the problem. I was setting 'PermitRootLogin no' in the global section of the bare metal OS sshd_config, as we do on all of our servers. Instead, PermitRootLogin is set to 'without-password' in a match block to allow root logins only from a well-known set of hosts.
I understand that you meant to say that this is already working for you, right? That you set it to allow without-password from some addresses and that that was enough. If so:
Correct. Once I added the engine's IP to the Match block allowing root logins, it worked again.
Thanks for the report!
Can someone explain why setting 'PermitRootLogin no' in the sshd_config on the hypervisor OS would affect the hosted engine deployment?
Because the engine (running inside a VM) uses ssh as root to connect to the host (in which the engine vm is running).
Would it be sufficient to set, on the host, 'PermitRootLogin without-password' in a Match block that matches the ovirt management network?
Match Address 10.10.10.0/24
    PermitRootLogin without-password
?
Do you mean here to ask if 10.10.10.0/24 is enough?
The engine VM's IP address should be enough. What this address is, after deploy finishes, is of course up to you. During deploy it's by default in libvirt's default network, 192.168.222.0/24, but can be different if that's already in use by something else (e.g. a physical NIC).
BTW, I didn't test this myself. I do see in the code that it's supposed to work. If you find a bug, please report one. Thanks.
I think the two problems that I ran into were:

* Lack of documentation about the requirement that the engine (whether self-hosted or standalone) be able to ssh into the bare metal hypervisor host over the ovirt management network using ssh keys.

* No clear error message in the logs describing why this was failing. The only errors I got were a timeout waiting for the host to be up, and a generic "The system may not be provisioned according to the playbook results: please check the logs for the issue, fix accordingly or re-deploy from scratch.\n"

I'll file this as a documentation bug.

--Mike
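For reference while that documentation gap exists, a minimal sshd_config sketch for the host, assuming an example 10.10.10.0/24 management network and the deploy-time libvirt default from Didi's note (verify both against your own setup):

    # Global default stays locked down, as on Mike's servers:
    PermitRootLogin no

    # Allow key-only root login solely from the networks the engine
    # connects over: the deploy-time libvirt network and the final
    # oVirt management network (addresses are examples).
    Match Address 192.168.222.0/24,10.10.10.0/24
        PermitRootLogin without-password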

On Sun, Jun 7, 2020 at 6:37 PM Michael Thomas <wart@caltech.edu> wrote:
On 6/7/20 8:42 AM, Yedidyah Bar David wrote:
On Sun, Jun 7, 2020 at 4:07 PM Michael Thomas <wart@caltech.edu> wrote:
On 6/7/20 5:01 AM, Yedidyah Bar David wrote:
On Sat, Jun 6, 2020 at 8:42 PM Michael Thomas <wart@caltech.edu> wrote:
After a week of iterations, I finally found the problem. I was setting 'PermitRootLogin no' in the global section of the bare metal OS sshd_config, as we do on all of our servers. Instead, PermitRootLogin is set to 'without-password' in a match block to allow root logins only from a well-known set of hosts.
I understand that you meant to say that this is already working for you, right? That you set it to allow without-password from some addresses and that that was enough. If so:
Correct. Once I added the engine's IP to the Match block allowing root logins, it worked again.
Thanks for the report!
Can someone explain why setting 'PermitRootLogin no' in the sshd_config on the hypervisor OS would affect the hosted engine deployment?
Because the engine (running inside a VM) uses ssh as root to connect to the host (in which the engine vm is running).
Would it be sufficient to set, on the host, 'PermitRootLogin without-password' in a Match block that matches the ovirt management network?
Match Address 10.10.10.0/24
    PermitRootLogin without-password
?
Do you mean here to ask if 10.10.10.0/24 is enough?
The engine VM's IP address should be enough. What this address is, after deploy finishes, is of course up to you. During deploy it's by default in libvirt's default network, 192.168.222.0/24, but can be different if that's already in use by something else (e.g. a physical NIC).
BTW, I didn't test this myself. I do see in the code that it's supposed to work. If you find a bug, please report one. Thanks.
I think the two problems that I ran into were:
* Lack of documentation about the requirement that the engine (whether self-hosted or standalone) be able to ssh into the bare metal hypervisor host over the ovirt management network using ssh keys.
I agree it's not detailed enough. We have it briefly mentioned e.g. here:

https://www.ovirt.org/documentation/installing_ovirt_as_a_self-hosted_engine...

For some reason it's marked "Optional", not sure why.
* No clear error message in the logs describing why this was failing. The only errors I got were a timeout waiting for the host to be up, and a generic "The system may not be provisioned according to the playbook results: please check the logs for the issue, fix accordingly or re-deploy from scratch.\n"
I'll file this as a documentation bug.
Very well. Thanks and best regards, -- Didi

On 6/8/20 12:58 AM, Yedidyah Bar David wrote:
On Sun, Jun 7, 2020 at 6:37 PM Michael Thomas <wart@caltech.edu> wrote:
On 6/7/20 8:42 AM, Yedidyah Bar David wrote:
On Sun, Jun 7, 2020 at 4:07 PM Michael Thomas <wart@caltech.edu> wrote:
On 6/7/20 5:01 AM, Yedidyah Bar David wrote:
On Sat, Jun 6, 2020 at 8:42 PM Michael Thomas <wart@caltech.edu> wrote:
After a week of iterations, I finally found the problem. I was setting 'PermitRootLogin no' in the global section of the bare metal OS sshd_config, as we do on all of our servers. Instead, PermitRootLogin is set to 'without-password' in a match block to allow root logins only from a well-known set of hosts.
I understand that you meant to say that this is already working for you, right? That you set it to allow without-password from some addresses and that that was enough. If so:
Correct. Once I added the engine's IP to the Match block allowing root logins, it worked again.
Thanks for the report!
Can someone explain why setting 'PermitRootLogin no' in the sshd_config on the hypervisor OS would affect the hosted engine deployment?
Because the engine (running inside a VM) uses ssh as root to connect to the host (in which the engine vm is running).
Would it be sufficient to set, on the host, 'PermitRootLogin without-password' in a Match block that matches the ovirt management network?
Match Address 10.10.10.0/24
    PermitRootLogin without-password
?
Do you mean here to ask if 10.10.10.0/24 is enough?
The engine VM's IP address should be enough. What this address is, after deploy finishes, is of course up to you. During deploy it's by default in libvirt's default network, 192.168.222.0/24, but can be different if that's already in use by something else (e.g. a physical NIC).
BTW, I didn't test this myself. I do see in the code that it's supposed to work. If you find a bug, please report one. Thanks.
I think the two problems that I ran into were:
* Lack of documentation about the requirement that the engine (whether self-hosted or standalone) be able to ssh into the bare metal hypervisor host over the ovirt management network using ssh keys.
I agree it's not detailed enough.
We have it briefly mentioned e.g. here:
https://www.ovirt.org/documentation/installing_ovirt_as_a_self-hosted_engine...
For some reason it's marked "Optional", not sure why.
* No clear error message in the logs describing why this was failing. The only errors I got were a timeout waiting for the host to be up, and a generic "The system may not be provisioned according to the playbook results: please check the logs for the issue, fix accordingly or re-deploy from scratch.\n"
I'll file this as a documentation bug.
Very well.
Filed: https://bugzilla.redhat.com/show_bug.cgi?id=1845271

--Mike

On 2020-06-08 08:58, Yedidyah Bar David wrote:
I agree it's not detailed enough. We have it briefly mentioned e.g. here: https://www.ovirt.org/documentation/installing_ovirt_as_a_self-hosted_engine... For some reason it's marked "Optional", not sure why.
I think it should also be pointed out that only certain keys are supported. You can't, e.g., have an ed25519-only setup, as the installation tries to use RSA.

Poltsi
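If a hardening profile disables RSA entirely, a hedged workaround sketch (assuming OpenSSH 8.x as shipped with CentOS 8, where the option is still spelled PubkeyAcceptedKeyTypes; newer releases rename it PubkeyAcceptedAlgorithms) is to re-allow RSA public keys just for the management network:

    # Hypothetical sshd_config fragment: accept RSA user keys only from
    # the oVirt management network, since the deployment uses an RSA key.
    Match Address 10.10.10.0/24
        PermitRootLogin without-password
        PubkeyAcceptedKeyTypes +rsa-sha2-512,rsa-sha2-256,ssh-rsa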

On Tue, Jun 9, 2020 at 10:23 AM Paul-Erik Törrönen <poltsi@poltsi.fi> wrote:
On 2020-06-08 08:58, Yedidyah Bar David wrote:
I agree it's not detailed enough. We have it briefly mentioned e.g. here: https://www.ovirt.org/documentation/installing_ovirt_as_a_self-hosted_engine... For some reason it's marked "Optional", not sure why.
I think it should also be pointed out that only certain keys are supported.
You can't, e.g., have an ed25519-only setup, as the installation tries to use RSA.
Thanks for this comment. Added a note for you on Wart's bug 1845271.

Do you think this is a significant limitation? In theory, it should not be too hard to make the engine's PKI code more flexible, allowing it to be configured to use whatever algorithms both openssl/m2crypto and Java support, but in reality this was never requested. The only relevant change I recall was the request to change the hash algorithm from SHA1 to SHA256, several years ago (which we did, then, unconditionally, still hardcoding sha256 in several places).

Thanks and best regards, -- Didi

On 2020-06-09 11:26, Yedidyah Bar David wrote:
On Tue, Jun 9, 2020 at 10:23 AM Paul-Erik Törrönen <poltsi@poltsi.fi> wrote:
You can't eg. have a ed25519-only setup as the installation tries to use RSA.
Thanks for this comment. Added a note for you on Wart's bug 1845271.
Thank you.
Do you think this is a significant limitation?
No, unless you get others requesting this particular support. I only stumbled across this as I am setting up my home network from scratch with a minimal ansible script collection, which includes hardening ssh.

Nonetheless, it would be good to mention it in the documentation.

Poltsi