OK.

Managed to get the engine up and running. But now it fails to communicate with the nodes :/
... But at least I have an engine running...

*** DISCLAIMER ***
The following may eat your data, burn your house and possibly start WW3. Use it only if: A. This is the last ditch attempt to save your cluster. B. You feel brave.
As this problem literally plagues every single oVirt user, I'm posting this in an effort to create a what-to-do-when-your-certs-expire handbook.

Managed to get the engine and nodes up using a combination of data from 4 different sources.
A. Create a new local CA following the instructions here:
https://myhomelab.gr/linux/2019/12/13/local-ca-setup.html
NOTE: You need to add "keyUsage = keyEncipherment, dataEncipherment, digitalSignature" to opensslsan.cnf.
B. Use the newly created CA to generate (and deploy) apache.p12 cert(s), following the instructions here:
https://myhomelab.gr/linux/2020/01/20/replacing_ovirt_ssl.html
... and here:
https://rhv.bradmin.org/ovirt-engine/docs/Administration_Guide/appe-Red_Hat_Enterprise_Virtualization_and_SSL.html
C. Rebuild the host certs using the instructions below:
https://lists.ovirt.org/archives/list/users@ovirt.org/message/56QU2AD7YUX2VZUP4NZMRFXK32MJM7QE/
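
After step C, it may be worth confirming that the rebuilt certs really carry a future expiry date before restarting anything. Below is a minimal Python sketch, assuming you feed it the "notAfter=" line printed by `openssl x509 -noout -enddate -in <cert>`. The function name and the sample date are made up for illustration and are not part of any oVirt tooling.

```python
import ssl
import time


def days_until_expiry(enddate_line, now=None):
    """Return days left before the certificate expires (negative if expired).

    enddate_line: output of `openssl x509 -noout -enddate`, e.g.
                  "notAfter=Dec 25 10:00:00 2023 GMT".
    now:          reference time in seconds since the epoch (defaults to now).
    """
    # Drop the "notAfter=" prefix if present; keep only the timestamp.
    stamp = enddate_line.strip().split("=", 1)[-1]
    # ssl.cert_time_to_seconds() parses the "Mon DD HH:MM:SS YYYY GMT" format.
    expires = ssl.cert_time_to_seconds(stamp)
    if now is None:
        now = time.time()
    return (expires - now) / 86400.0


if __name__ == "__main__":
    # Hypothetical sample value, not taken from a real cluster.
    sample = "notAfter=Dec 25 10:00:00 2023 GMT"
    print(f"{days_until_expiry(sample):.1f} days left")
```

A negative result means the cert has already expired, which is exactly the state that took vdsm down in the first place.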

Once you restart the engine and host services, hosted-engine --vm-status on the hosts looks OK (all nodes score 3400) and I can log in to the engine.
However, the engine still refuses to talk to the hosts, citing:

2022-12-26 08:53:14,727+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesAsyncVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-16) [] Command 'GetCapabilitiesAsyncVDSCommand(HostName = gilboa-home-hv1-dev.localdomain, VdsIdAndVdsVDSCommandParametersBase:{hostId='43ddfcd5-4bd1-4731-bf30-4fedce22f3ab', vds='Host[gilboa-home-hv1-dev.localdomain,43ddfcd5-4bd1-4731-bf30-4fedce22f3ab]'})' execution failed: org.ovirt.vdsm.jsonrpc.client.ClientConnectionException: SSL session is invalid
2022-12-26 08:53:17,744+02 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) [] Unable to process messages PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
2022-12-26 08:53:17,748+02 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-6) [] Unable to RefreshCapabilities: VDSNetworkException: VDSGenericException: VDSNetworkException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
2022-12-26 08:53:18,187+02 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-96) [] Unable to RefreshCapabilities: ClientConnectionException: SSL session is invalid
2022-12-26 08:53:18,188+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesAsyncVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-96) [] Command 'GetCapabilitiesAsyncVDSCommand(HostName = gilboa-home-hv1-dev.localdomain, VdsIdAndVdsVDSCommandParametersBase:{hostId='43ddfcd5-4bd1-4731-bf30-4fedce22f3ab', vds='Host[gilboa-home-hv1-dev.localdomain,43ddfcd5-4bd1-4731-bf30-4fedce22f3ab]'})' execution failed: org.ovirt.vdsm.jsonrpc.client.ClientConnectionException: SSL session is invalid
2022-12-26 08:53:18,348+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-62) [] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM gilboa-home-hv2-srv.localdomain command Get Host Capabilities failed: Message timeout which can be caused by communication issues
2022-12-26 08:53:18,348+02 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-62) [] Unable to RefreshCapabilities: VDSNetworkException: VDSGenericException: VDSNetworkException: Message timeout which can be caused by communication issues
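
For context, "PKIX path building failed" is Java's way of saying the engine could not build a trust chain to the certificate the host presented. So the engine side apparently does not trust whatever CA signed the new host certs. To see which failure dominates without reading engine.log by eye, a rough Python sketch like the one below can help. The marker strings are copied from the log lines above. The function name is hypothetical, and the log path in the usage comment is an assumption about a default install.

```python
from collections import Counter

# Substrings taken verbatim from the engine log lines quoted above.
MARKERS = (
    "PKIX path building failed",
    "SSL session is invalid",
    "Message timeout",
)


def summarize_errors(lines):
    """Count ERROR lines per known failure marker (first match wins)."""
    counts = Counter()
    for line in lines:
        if " ERROR " not in line:
            continue  # skip INFO/WARN/DEBUG lines
        for marker in MARKERS:
            if marker in line:
                counts[marker] += 1
                break  # count each log line only once
    return counts


# Usage (path is an assumption, adjust to your setup):
#   with open("/var/log/ovirt-engine/engine.log") as fh:
#       for marker, n in summarize_errors(fh).most_common():
#           print(n, marker)
```

If the PKIX count keeps growing after the host certs were replaced, that points at the engine's trust store rather than at the hosts.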

- Gilboa

On Sun, Dec 25, 2022 at 5:13 PM Gilboa Davara <gilboad@gmail.com> wrote:


On Sun, Dec 25, 2022 at 12:37 PM Gilboa Davara <gilboad@gmail.com> wrote:
On Sun, Dec 25, 2022 at 12:36 PM Gilboa Davara <gilboad@gmail.com> wrote:
Hello all,

Even though I do my best to keep track of the certificate issue date across my different clusters, I somehow missed the vdsm certificate expiration in one of my clusters.
Now I have an active cluster with multiple nodes (self-hosted engine / Gluster storage) where the vdsm service is down on all nodes due to certificate expiration. Hence, I cannot put the cluster into global maintenance mode (vdsm is down everywhere), and I cannot access my engine (to renew the engine certificates / re-enroll hosts).
How can I manually renew the host certificates?

Thanks,
Gilboa

P.S. CentOS Stream 8 engine and hosts, oVirt v4.5.3 (I think).

- Gilboa

Managed to find an old email in this group (that I saved...)

This got the nodes working... but the engine (GRRR) still cannot connect to the nodes (I assume it has expired certs as well). Hence, it cannot detect that the cluster is in global maintenance mode, and I cannot run engine-setup.


- Gilboa