Hi,
I will answer my own question, but if you have comments or a better
solution, please leave a reply.
The ovirt-engine-setup log records the SELECT statements that engine-setup
uses to test the global maintenance state. In my case:
engine=# SELECT vm_guid, run_on_vds FROM vms WHERE vm_name = 'HostedEngine';
               vm_guid                |              run_on_vds
--------------------------------------+--------------------------------------
 96a6b6a7-75a9-472a-9d4f-1502b415470a | e24f0dcc-51f3-4d1a-acf5-2833a9dc584a
(1 row)
and
engine=# SELECT vds_id, ha_global_maintenance FROM vds_statistics WHERE vds_id = 'e24f0dcc-51f3-4d1a-acf5-2833a9dc584a';
                vds_id                | ha_global_maintenance
--------------------------------------+-----------------------
 e24f0dcc-51f3-4d1a-acf5-2833a9dc584a | f
(1 row)
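The two lookups above can be sketched as one small function. This is only an illustration of the logic, not the engine-setup code: it runs against an in-memory SQLite stand-in with just the columns shown in this thread, so the helper names (global_maintenance_flag, demo_connection) and the table definitions are assumptions. On the real engine database you would connect to PostgreSQL instead.

```python
import sqlite3


def global_maintenance_flag(conn, vm_name="HostedEngine"):
    """Return ha_global_maintenance for the host running vm_name, or None."""
    cur = conn.cursor()
    # Step 1: find the host (run_on_vds) the engine VM is running on.
    cur.execute("SELECT run_on_vds FROM vms WHERE vm_name = ?", (vm_name,))
    row = cur.fetchone()
    if row is None or row[0] is None:
        return None
    # Step 2: read the maintenance flag stored for that host.
    cur.execute(
        "SELECT ha_global_maintenance FROM vds_statistics WHERE vds_id = ?",
        (row[0],),
    )
    stat = cur.fetchone()
    return None if stat is None else bool(stat[0])


def demo_connection():
    """Build a throwaway database holding the values from this thread."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE vms (vm_guid TEXT, vm_name TEXT, run_on_vds TEXT);
        CREATE TABLE vds_statistics (vds_id TEXT, ha_global_maintenance INTEGER);
        INSERT INTO vms VALUES ('96a6b6a7-75a9-472a-9d4f-1502b415470a',
                                'HostedEngine',
                                'e24f0dcc-51f3-4d1a-acf5-2833a9dc584a');
        INSERT INTO vds_statistics VALUES ('e24f0dcc-51f3-4d1a-acf5-2833a9dc584a', 0);
        """
    )
    return conn


if __name__ == "__main__":
    # The stand-in mirrors the stale state seen here: the flag is false.
    print(global_maintenance_flag(demo_connection()))
```

The point of the sketch is that the check depends only on the flag stored for the host currently running the HostedEngine VM, which is why a stale value there blocks engine-setup.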
Because I believe global maintenance really is enabled, I updated the
ha_global_maintenance flag directly:
engine=# UPDATE vds_statistics SET ha_global_maintenance = true WHERE vds_id = 'e24f0dcc-51f3-4d1a-acf5-2833a9dc584a';
UPDATE 1
After that I ran
engine-setup --offline
and answered Yes to the prompt: Renew certificates? (Yes, No) [No]: Yes
After that all hosts came up and the VMs were recovered (except the VMs
that had been on the failed and restarted host).
Cheers,
Jiri
On 5/2/22 11:16, Jiří Sléžka wrote:
Hello,
I am stuck in this situation...
It looks like the engine certificate (engine.cer) expired a few days ago:
[root@ovirt ~]# openssl x509 -in /etc/pki/ovirt-engine/certs/engine.cer -noout -dates
notBefore=Mar 23 21:34:19 2021 GMT
notAfter=Apr 26 21:34:19 2022 GMT
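As a small aside, the notAfter line printed above can be checked programmatically. The following is a minimal sketch that assumes the date format shown in this output and an English locale for the month abbreviation. The function names are made up for illustration.

```python
from datetime import datetime, timezone


def parse_openssl_date(line):
    """Parse a line such as 'notAfter=Apr 26 21:34:19 2022 GMT' into UTC."""
    _, _, value = line.strip().partition("=")
    # %b is locale dependent; this assumes English month abbreviations.
    parsed = datetime.strptime(value, "%b %d %H:%M:%S %Y GMT")
    return parsed.replace(tzinfo=timezone.utc)


def is_expired(not_after_line, now=None):
    """Return True if the notAfter timestamp lies before 'now' (UTC)."""
    now = now or datetime.now(timezone.utc)
    return parse_openssl_date(not_after_line) < now


if __name__ == "__main__":
    line = "notAfter=Apr 26 21:34:19 2022 GMT"
    # Compare against the date of this message, 2 May 2022.
    print(is_expired(line, datetime(2022, 5, 2, tzinfo=timezone.utc)))
```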
The CA and the other certs are still valid.
Yesterday I had an outage on one host and the HE restarted on another host. But it
cannot communicate with the hosts because of the certificate expiration:
lnav /var/log/ovirt-engine/engine.log
...
2022-05-02 11:02:29,127+02 ERROR
[org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-43)
[] Unable to RefreshCapabilities: VDSNetworkException:
VDSGenericException: VDSNetworkException: Received fatal alert:
certificate_expired
...
There are VMs still running on the hosts.
Is there a way to renew the engine cert (manually?) and recover from this
situation?
I have tried running engine-setup (selecting certificate renewal during setup):
[root@ovirt ~]# engine-setup --offline
but it fails with
[ ERROR ] It seems that you are running your engine inside of the
hosted-engine VM and are not in "Global Maintenance" mode.
In that case you should put the system into the "Global
Maintenance" mode before running engine-setup, or the hosted-engine HA
agent might kill the machine, which might corrupt your data.
[ ERROR ] Failed to execute stage 'Setup validation': Hosted Engine
setup detected, but Global Maintenance is not set.
But global maintenance is enabled on the host:
[root@ovirt06 ~]# hosted-engine --vm-status
!! Cluster is in GLOBAL MAINTENANCE mode !!
--== Host ovirt05.net.slu.cz (id: 1) status ==--
Host ID : 1
Host timestamp : 38627
Score : 3400
Engine status : {"vm": "down_unexpected", "health": "bad", "detail": "Down", "reason": "bad vm status"}
Hostname : ovirt05.net.slu.cz
Local maintenance : False
stopped : False
crc32 : b719664d
conf_on_shared_storage : True
local_conf_timestamp : 38627
Status up-to-date : True
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=38627 (Mon May 2 10:55:43 2022)
host-id=1
score=3400
vm_conf_refresh_time=38627 (Mon May 2 10:55:43 2022)
conf_on_shared_storage=True
maintenance=False
state=EngineDown
stopped=False
--== Host ovirt06.net.slu.cz (id: 2) status ==--
Host ID : 2
Host timestamp : 8858161
Score : 3400
Engine status : {"vm": "up", "health": "good", "detail": "Up"}
Hostname : ovirt06.net.slu.cz
Local maintenance : False
stopped : False
crc32 : 414a980b
conf_on_shared_storage : True
local_conf_timestamp : 8858161
Status up-to-date : True
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=8858161 (Mon May 2 10:55:48 2022)
host-id=2
score=3400
vm_conf_refresh_time=8858161 (Mon May 2 10:55:48 2022)
conf_on_shared_storage=True
maintenance=False
state=GlobalMaintenance
stopped=False
!! Cluster is in GLOBAL MAINTENANCE mode !!
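For comparing what the hosts report with what the engine database says, the status text above can be reduced to the banner plus each host's state= value. This is a rough sketch that assumes the layout seen in this particular output. The function name summarize_vm_status and the shortened sample are illustrative only.

```python
import re

BANNER = "!! Cluster is in GLOBAL MAINTENANCE mode !!"
HOST_HEADER = re.compile(r"--== Host (\S+) \(id: (\d+)\) status ==--")


def summarize_vm_status(text):
    """Return the global-maintenance banner flag and each host's state= value."""
    hosts = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = HOST_HEADER.match(line)
        if header:
            current = header.group(1)
            hosts[current] = None
        elif current and line.startswith("state="):
            hosts[current] = line.split("=", 1)[1]
    return {"global_maintenance": BANNER in text, "host_states": hosts}


# Shortened copy of the output quoted in this message.
SAMPLE = """\
!! Cluster is in GLOBAL MAINTENANCE mode !!
--== Host ovirt05.net.slu.cz (id: 1) status ==--
Host ID : 1
state=EngineDown
--== Host ovirt06.net.slu.cz (id: 2) status ==--
Host ID : 2
state=GlobalMaintenance
!! Cluster is in GLOBAL MAINTENANCE mode !!
"""

if __name__ == "__main__":
    print(summarize_vm_status(SAMPLE))
```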
The relevant lines from the ovirt-engine-setup log are:
...
2022-05-02 11:08:02,194+0200 DEBUG
otopi.ovirt_engine_setup.engine_common.database database.execute:239
Creating own connection
2022-05-02 11:08:02,233+0200 DEBUG
otopi.ovirt_engine_setup.engine_common.database database.execute:284
Result: [{'vm_guid': '96a6b6a7-75a9-472a-9d4f-1502b415470a',
'run_on_vds': 'e24f0dcc-51f3-4d1a-acf5-2833a9dc584a'}]
2022-05-02 11:08:02,234+0200 DEBUG
otopi.ovirt_engine_setup.engine_common.database database.execute:234
Database: 'None', Statement: '
SELECT vds_id, ha_global_maintenance
FROM vds_statistics
WHERE vds_id = %(VdsId)s;
', args: {'VdsId':
'e24f0dcc-51f3-4d1a-acf5-2833a9dc584a'}
2022-05-02 11:08:02,234+0200 DEBUG
otopi.ovirt_engine_setup.engine_common.database database.execute:239
Creating own connection
2022-05-02 11:08:02,250+0200 DEBUG
otopi.ovirt_engine_setup.engine_common.database database.execute:284
Result: [{'vds_id': 'e24f0dcc-51f3-4d1a-acf5-2833a9dc584a',
'ha_global_maintenance': False}]
2022-05-02 11:08:02,250+0200 ERROR
otopi.plugins.ovirt_engine_common.ovirt_engine.system.he
he._validate:114 It seems that you are running your engine inside of the
hosted-engine VM and are not in "Global Maintenance" mode.
In that case you should put the system into the "Global Maintenance"
mode before running engine-setup, or the hosted-engine HA agent might
kill the machine, which might corrupt your data.
...
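To pull the query results out of a setup log like the excerpt above, something along these lines might help. It is a sketch that assumes each "Result:" entry is a Python-style list of dicts without nested lists, as in the lines quoted here. The name extract_results is hypothetical.

```python
import ast
import re


def extract_results(log_text):
    """Collect the row dicts from every 'Result: [...]' entry in the log text."""
    rows = []
    # re.S lets a result that is wrapped over several lines match as one entry.
    for match in re.finditer(r"Result: (\[.*?\])", log_text, re.S):
        rows.extend(ast.literal_eval(match.group(1)))
    return rows


# Two entries copied from the log excerpt in this message.
SAMPLE = """\
2022-05-02 11:08:02,233+0200 DEBUG
otopi.ovirt_engine_setup.engine_common.database database.execute:284
Result: [{'vm_guid': '96a6b6a7-75a9-472a-9d4f-1502b415470a',
'run_on_vds': 'e24f0dcc-51f3-4d1a-acf5-2833a9dc584a'}]
2022-05-02 11:08:02,250+0200 DEBUG
otopi.ovirt_engine_setup.engine_common.database database.execute:284
Result: [{'vds_id': 'e24f0dcc-51f3-4d1a-acf5-2833a9dc584a',
'ha_global_maintenance': False}]
"""

if __name__ == "__main__":
    for row in extract_results(SAMPLE):
        if "ha_global_maintenance" in row:
            print(row["vds_id"], row["ha_global_maintenance"])
```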
Thanks in advance for any advice,
Jiri