recovery from expired engine.cer certificate

Hello,

I am stuck in this situation. It looks like the engine certificate (engine.cer) expired a few days ago:

    [root@ovirt ~]# openssl x509 -in /etc/pki/ovirt-engine/certs/engine.cer -noout -dates
    notBefore=Mar 23 21:34:19 2021 GMT
    notAfter=Apr 26 21:34:19 2022 GMT

The CA and the other certificates are still valid.

Yesterday I had an outage on one host and the hosted engine (HE) restarted on another host. But it cannot communicate with the hosts because of the expired certificate:

    lnav /var/log/ovirt-engine/engine.log
    ...
    2022-05-02 11:02:29,127+02 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-43) [] Unable to RefreshCapabilities: VDSNetworkException: VDSGenericException: VDSNetworkException: Received fatal alert: certificate_expired
    ...

There are VMs still running on the hosts.

Is there a way to renew the engine certificate (manually?) and recover from this situation?

I tried to run engine-setup (and select certificate renewal during setup)

    [root@ovirt ~]# engine-setup --offline

but it fails with

    [ ERROR ] It seems that you are running your engine inside of the hosted-engine VM and are not in "Global Maintenance" mode. In that case you should put the system into the "Global Maintenance" mode before running engine-setup, or the hosted-engine HA agent might kill the machine, which might corrupt your data.
    [ ERROR ] Failed to execute stage 'Setup validation': Hosted Engine setup detected, but Global Maintenance is not set.

But global maintenance is enabled on the host:

    [root@ovirt06 ~]# hosted-engine --vm-status

    !! Cluster is in GLOBAL MAINTENANCE mode !!

    --== Host ovirt05.net.slu.cz (id: 1) status ==--

    Host ID                            : 1
    Host timestamp                     : 38627
    Score                              : 3400
    Engine status                      : {"vm": "down_unexpected", "health": "bad", "detail": "Down", "reason": "bad vm status"}
    Hostname                           : ovirt05.net.slu.cz
    Local maintenance                  : False
    stopped                            : False
    crc32                              : b719664d
    conf_on_shared_storage             : True
    local_conf_timestamp               : 38627
    Status up-to-date                  : True
    Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=38627 (Mon May 2 10:55:43 2022)
        host-id=1
        score=3400
        vm_conf_refresh_time=38627 (Mon May 2 10:55:43 2022)
        conf_on_shared_storage=True
        maintenance=False
        state=EngineDown
        stopped=False

    --== Host ovirt06.net.slu.cz (id: 2) status ==--

    Host ID                            : 2
    Host timestamp                     : 8858161
    Score                              : 3400
    Engine status                      : {"vm": "up", "health": "good", "detail": "Up"}
    Hostname                           : ovirt06.net.slu.cz
    Local maintenance                  : False
    stopped                            : False
    crc32                              : 414a980b
    conf_on_shared_storage             : True
    local_conf_timestamp               : 8858161
    Status up-to-date                  : True
    Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=8858161 (Mon May 2 10:55:48 2022)
        host-id=2
        score=3400
        vm_conf_refresh_time=8858161 (Mon May 2 10:55:48 2022)
        conf_on_shared_storage=True
        maintenance=False
        state=GlobalMaintenance
        stopped=False

    !! Cluster is in GLOBAL MAINTENANCE mode !!

The relevant lines from the ovirt-engine-setup log are:

    ...
    2022-05-02 11:08:02,194+0200 DEBUG otopi.ovirt_engine_setup.engine_common.database database.execute:239 Creating own connection
    2022-05-02 11:08:02,233+0200 DEBUG otopi.ovirt_engine_setup.engine_common.database database.execute:284 Result: [{'vm_guid': '96a6b6a7-75a9-472a-9d4f-1502b415470a', 'run_on_vds': 'e24f0dcc-51f3-4d1a-acf5-2833a9dc584a'}]
    2022-05-02 11:08:02,234+0200 DEBUG otopi.ovirt_engine_setup.engine_common.database database.execute:234 Database: 'None', Statement: ' SELECT vds_id, ha_global_maintenance FROM vds_statistics WHERE vds_id = %(VdsId)s; ', args: {'VdsId': 'e24f0dcc-51f3-4d1a-acf5-2833a9dc584a'}
    2022-05-02 11:08:02,234+0200 DEBUG otopi.ovirt_engine_setup.engine_common.database database.execute:239 Creating own connection
    2022-05-02 11:08:02,250+0200 DEBUG otopi.ovirt_engine_setup.engine_common.database database.execute:284 Result: [{'vds_id': 'e24f0dcc-51f3-4d1a-acf5-2833a9dc584a', 'ha_global_maintenance': False}]
    2022-05-02 11:08:02,250+0200 ERROR otopi.plugins.ovirt_engine_common.ovirt_engine.system.he he._validate:114 It seems that you are running your engine inside of the hosted-engine VM and are not in "Global Maintenance" mode. In that case you should put the system into the "Global Maintenance" mode before running engine-setup, or the hosted-engine HA agent might kill the machine, which might corrupt your data.
    ...

Thanks in advance for any advice,

Jiri
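
The expiry check at the top of this message can also be scripted, so that an approaching notAfter date is noticed before the engine loses contact with the hosts. The following is only a minimal sketch in Python: the function names `parse_openssl_dates` and `days_until_expiry` are made up for illustration, the sample text is the openssl output quoted above, and the chosen "now" is simply the moment of the engine.log error converted to UTC. It assumes an English locale for the month abbreviations.

```python
from datetime import datetime, timezone

# Layout of the timestamps printed by `openssl x509 -noout -dates`,
# without the trailing "GMT" marker.
OPENSSL_DATE_FORMAT = "%b %d %H:%M:%S %Y"


def parse_openssl_dates(output):
    """Return {'notBefore': datetime, 'notAfter': datetime} from openssl -dates output."""
    dates = {}
    for line in output.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and key in ("notBefore", "notAfter"):
            stamp = value.strip()
            # The timestamps are given in GMT; drop the suffix and attach UTC explicitly.
            if stamp.endswith("GMT"):
                stamp = stamp[:-3].strip()
            parsed = datetime.strptime(stamp, OPENSSL_DATE_FORMAT)
            dates[key] = parsed.replace(tzinfo=timezone.utc)
    return dates


def days_until_expiry(output, now):
    """Days left until notAfter; negative once the certificate has expired."""
    not_after = parse_openssl_dates(output)["notAfter"]
    return (not_after - now).total_seconds() / 86400


# Sample input: the two lines quoted in the message above.
sample = (
    "notBefore=Mar 23 21:34:19 2021 GMT\n"
    "notAfter=Apr 26 21:34:19 2022 GMT\n"
)
# 2022-05-02 11:02:29 +02:00, the time of the engine.log error, expressed in UTC.
now = datetime(2022, 5, 2, 9, 2, 29, tzinfo=timezone.utc)

remaining = days_until_expiry(sample, now)
if remaining < 0:
    print("certificate expired %.1f days ago" % -remaining)
else:
    print("certificate valid for another %.1f days" % remaining)
```

On a real engine the `output` argument would come from running the openssl command shown above (for example through `subprocess.run(..., capture_output=True, text=True).stdout`) instead of the hard-coded sample.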

Hi,

I will answer myself, but if you have comments or a better solution, please leave a comment.

The ovirt-engine-setup log records the SELECT statements that are used to test the global maintenance state. In my case

    engine=# SELECT vm_guid, run_on_vds FROM vms WHERE vm_name = 'HostedEngine';
                   vm_guid                |              run_on_vds
    --------------------------------------+--------------------------------------
     96a6b6a7-75a9-472a-9d4f-1502b415470a | e24f0dcc-51f3-4d1a-acf5-2833a9dc584a
    (1 row)

and

    engine=# SELECT vds_id, ha_global_maintenance FROM vds_statistics WHERE vds_id = 'e24f0dcc-51f3-4d1a-acf5-2833a9dc584a';
                    vds_id                | ha_global_maintenance
    --------------------------------------+-----------------------
     e24f0dcc-51f3-4d1a-acf5-2833a9dc584a | f
    (1 row)

Because I believe global maintenance really is enabled, I updated the ha_global_maintenance flag by hand:

    engine=# UPDATE vds_statistics SET ha_global_maintenance = true WHERE vds_id = 'e24f0dcc-51f3-4d1a-acf5-2833a9dc584a';
    UPDATE 1

After that I ran engine-setup --offline and chose

    Renew certificates? (Yes, No) [No]: Yes

After that all hosts came up and the VMs were recovered (except the VMs on the host that had failed and been restarted).

Cheers,

Jiri
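
The workaround in this thread rests on the HA agents reporting global maintenance while the ha_global_maintenance column in the engine database still says false. One plausible reading, not confirmed anywhere in the thread, is that the engine could not refresh that column because it could no longer talk to the hosts. Before overriding the database flag by hand, that mismatch can be checked mechanically. The sketch below is in Python and only an illustration: the function names are hypothetical, and the banner string is copied from the hosted-engine --vm-status output quoted in this thread, so other versions may word it differently.

```python
# Banner text as it appears in the `hosted-engine --vm-status` output in this thread.
GLOBAL_MAINTENANCE_BANNER = "!! Cluster is in GLOBAL MAINTENANCE mode !!"


def reports_global_maintenance(vm_status_output):
    """True if the vm-status output contains the global maintenance banner."""
    return any(
        line.strip() == GLOBAL_MAINTENANCE_BANNER
        for line in vm_status_output.splitlines()
    )


def db_flag_is_stale(vm_status_output, db_flag):
    """True when the HA agents report global maintenance but the database flag is false."""
    return reports_global_maintenance(vm_status_output) and not db_flag


# Abbreviated sample taken from the status output quoted in this thread.
sample_status = """
!! Cluster is in GLOBAL MAINTENANCE mode !!

--== Host ovirt06.net.slu.cz (id: 2) status ==--

Engine status                      : {"vm": "up", "health": "good", "detail": "Up"}
Extra metadata (valid at timestamp):
    state=GlobalMaintenance

!! Cluster is in GLOBAL MAINTENANCE mode !!
"""

# db_flag=False corresponds to the 'f' returned by the SELECT on vds_statistics.
if db_flag_is_stale(sample_status, db_flag=False):
    print("HA agents report global maintenance, but the engine database does not")
```

Only when such a check confirms the mismatch does a manual UPDATE like the one above have the intended effect; if the cluster is not actually in global maintenance, forcing the flag removes the protection that the engine-setup error message describes.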
participants (1)
- Jiří Sléžka