
I have an oVirt installation with a hosted engine and three hosts, using Gluster as the storage for the VMs.

oVirt: 4.4.6.7
Hosts: CentOS Stream release 8 (updated to latest)

So far so good. I am trying to add a new host to the cluster, with the same OS and hardware as the others, and I cannot get it to install. It gives me all kinds of errors. I reinstalled the OS and I am getting the same results. DNS is configured properly and working for all hosts.

I can see this error in ansible-runner-service.log:

2021-05-30 15:46:38,319 - runner_service.services.hosts - ERROR - SSH - NOAUTH:SSH auth error - passwordless ssh not configured for 'ovirt4'

(sshd is configured exactly the same as on all the other hosts, and I can log in to this host without a password from the oVirt hosted engine.)

I see these errors in engine.log:

2021-05-30 16:22:35,166Z ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) [] Unable to process messages PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target

2021-05-30 16:22:35,175Z ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-32) [] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM ovirt4.net.miami.edu command Get Host Capabilities failed: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target

2021-05-30 16:22:35,175Z ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-32) [] Unable to RefreshCapabilities: VDSNetworkException: VDSGenericException: VDSNetworkException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target

2021-05-30 16:22:35,597Z ERROR [org.ovirt.engine.core.bll.gluster.GlusterSyncJob] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-23) [] Error while refreshing server data for cluster 'Default' from database: null

I tried reinstalling, rebooting, putting the host into maintenance, enrolling the certificate, checking for upgrades, and rebooting both the hosts and the oVirt engine multiple times: nothing works.

What am I doing wrong? Thank you in advance for your help.

In case it is useful to anyone: I found the problem. On the oVirt hosted engine, /etc/pki/ovirt-engine was owned by root.root. I changed the ownership to ovirt.ovirt and I was able to add the host without issue. I have no idea why root ended up being the owner of that folder. There was no hint anywhere in the logs that I could see indicating a problem with these permissions. All is good now.
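For anyone who wants to check for the same condition before changing anything, here is a minimal Python sketch that lists entries under a directory whose owner or group differs from what is expected. The function names `find_wrong_owner` and `report` are made up for this illustration, and the expected owner (`ovirt.ovirt`) is taken only from the description above. Some files in that tree may legitimately have a different owner, so comparing against a known-good engine is safer than assuming every mismatch is a fault.

```python
import grp
import os
import pwd
from pathlib import Path


def find_wrong_owner(root, expected_uid, expected_gid):
    """Return (path, uid, gid) for each entry under root, including root
    itself, whose numeric owner or group differs from the expected values."""
    root = Path(root)
    mismatches = []
    for path in [root, *sorted(root.rglob("*"))]:
        st = path.lstat()  # lstat: inspect symlinks themselves, do not follow
        if st.st_uid != expected_uid or st.st_gid != expected_gid:
            mismatches.append((str(path), st.st_uid, st.st_gid))
    return mismatches


def report(root, user, group):
    """Resolve user/group names to ids and print every mismatching entry."""
    uid = pwd.getpwnam(user).pw_uid   # raises KeyError if the user is unknown
    gid = grp.getgrnam(group).gr_gid  # raises KeyError if the group is unknown
    for path, found_uid, found_gid in find_wrong_owner(root, uid, gid):
        print(f"{path}: uid={found_uid} gid={found_gid}")


# Example (run as root on the engine; names assumed from the post above):
# report("/etc/pki/ovirt-engine", "ovirt", "ovirt")
```

The sketch only reports; it does not change ownership, so it can be run safely before deciding what to fix.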

On Tue, Jun 1, 2021 at 8:09 PM <pablo@miami.edu> wrote:
> In case it is useful to anyone. I found the problem. The permissions on the ovirt hosted-engine for the /etc/pki/ovirt-engine were root.root I changed them to ovirt.ovirt and I was able to add the host without issue.
> I have no idea why root ended up being the owner of that folder.

Thanks for the report! Is there any chance it was due to an oVirt bug, and not something else (a human mistake, some other software, etc.)? If so, perhaps you can still try to find out a bit more about this, by checking the directory's timestamp (in case you recorded it somewhere before fixing) and trying to find out what was running at that time. If you do find anything, please report. Thanks!

> There was no hint anywhere in the logs that I could see indicating that there was a problem with these permissions.

I didn't try to reproduce, so this is only a guess: one might consider this a bug in Ansible. That is, Ansible was told to use some private key and failed to read it, but the log does not indicate this. You might want to report it (I think as an issue on their GitHub repo).

> All is good now.

:-)

Best regards,
--
Didi
participants (2): pablo@miami.edu, Yedidyah Bar David