Right, I don't have those options, because the hosts are listed as unassigned. I
can't migrate the engine. I can't put anything into maintenance so the
installation menu becomes available.
On Feb 20, 2022, at 07:52, Strahil Nikolov
<hunter86_bg(a)yahoo.com> wrote:
Do you have the option to use 'Install' -> enroll certificate (or whatever is
the entry in UI ) ?
Best Regards,
Strahil Nikolov
On Sun, Feb 20, 2022 at 8:05, Joseph Gelinas
<joseph(a)gelinas.cc> wrote:
Both, I guess. The host certificates expired on the 15th; the console certificate expires on the 23rd.
Right now, since the engine sees the hosts as unassigned, I don't get the option to set
hosts to maintenance mode, and if I try to set Enable Global Maintenance I get the message:
"Cannot edit VM Cluster. Operation can be performed only when Host status is Up."
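
For anyone following along, here is a minimal sketch of how the expiry dates above can be turned into "days left" per certificate. It assumes each certificate's end date has been printed with `openssl x509 -noout -enddate -in <cert>`; the function name `days_until_expiry` is made up for illustration, and the certificate paths are whatever you pass on the command line (I am not asserting where oVirt keeps them).

```python
import ssl
import subprocess
import sys
import time


def days_until_expiry(enddate_line, now=None):
    """Return days until expiry (negative if the certificate has expired).

    enddate_line is one line as printed by `openssl x509 -noout -enddate`,
    for example "notAfter=Feb 15 00:00:00 2022 GMT".
    """
    # Drop the "notAfter=" label and keep only the timestamp.
    not_after = enddate_line.strip().split("=", 1)[1]
    # The stdlib parses OpenSSL's GMT timestamp format into epoch seconds.
    expires = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return (expires - now) / 86400.0


if __name__ == "__main__":
    # Usage (paths are placeholders): python3 certdays.py /path/to/cert.pem ...
    for path in sys.argv[1:]:
        out = subprocess.run(
            ["openssl", "x509", "-noout", "-enddate", "-in", path],
            capture_output=True, text=True, check=True,
        ).stdout
        print(f"{path}: {days_until_expiry(out):.1f} days left")
```

A negative number means the certificate has already expired, which would match the SSL handshake failures further down the thread.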
> On Feb 19, 2022, at 14:55, Strahil Nikolov <hunter86_bg(a)yahoo.com> wrote:
>
> Is your issue with the host certificates or the engine ?
>
> You can try to set a node in maintenance (or at least try that) and then try to
reenroll the certificate from the UI.
>
> Best Regards,
> Strahil Nikolov
>
> On Sat, Feb 19, 2022 at 9:48, Joseph Gelinas
> <joseph(a)gelinas.cc> wrote:
> I believe I ran `hosted-engine --deploy` on ovirt-1 to see if there was an option to reenroll that way. When it prompted and asked whether that was really what I wanted to do, I either pressed Ctrl-D or said no, but it ran something anyway, so I pressed Ctrl-C to get out of it. Maybe that is what messed up vdsm on that node. Not sure about ovirt-3. Is there a way to fix that?
>
> > On Feb 18, 2022, at 17:21, Joseph Gelinas <joseph(a)gelinas.cc> wrote:
> >
> > Unfortunately ovirt-ha-broker & ovirt-ha-agent are just in continual
restart loops on ovirt-1 & ovirt-3 (ovirt-engine is currently on ovirt-3).
> >
> > The output for broker.log:
> >
> > MainThread::ERROR::2022-02-18 22:08:58,101::broker::72::ovirt_hosted_engine_ha.broker.broker.Broker::(run) Trying to restart the broker
> > MainThread::INFO::2022-02-18 22:08:58,453::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.4.5 started
> > MainThread::INFO::2022-02-18 22:09:00,456::monitor::45::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Searching for submonitors in /usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/submonitors
> > MainThread::INFO::2022-02-18 22:09:00,456::monitor::62::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
> > MainThread::INFO::2022-02-18 22:09:00,457::monitor::62::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
> > MainThread::INFO::2022-02-18 22:09:00,459::monitor::62::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
> > MainThread::INFO::2022-02-18 22:09:00,459::monitor::62::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
> > MainThread::INFO::2022-02-18 22:09:00,459::monitor::62::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
> > MainThread::INFO::2022-02-18 22:09:00,460::monitor::62::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
> > MainThread::INFO::2022-02-18 22:09:00,460::monitor::62::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
> > MainThread::INFO::2022-02-18 22:09:00,460::monitor::63::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Finished loading submonitors
> > MainThread::WARNING::2022-02-18 22:10:00,788::storage_broker::100::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: Couldn't connect to VDSM within 60 seconds
> > MainThread::ERROR::2022-02-18 22:10:00,788::broker::69::ovirt_hosted_engine_ha.broker.broker.Broker::(run) Failed initializing the broker: Couldn't connect to VDSM within 60 seconds
> > MainThread::ERROR::2022-02-18 22:10:00,789::broker::71::ovirt_hosted_engine_ha.broker.broker.Broker::(run) Traceback (most recent call last):
> >   File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/broker.py", line 64, in run
> >     self._storage_broker_instance = self._get_storage_broker()
> >   File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/broker.py", line 143, in _get_storage_broker
> >     return storage_broker.StorageBroker()
> >   File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 97, in __init__
> >     self._backend.connect()
> >   File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 370, in connect
> >     connection = util.connect_vdsm_json_rpc(logger=self._logger)
> >   File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/util.py", line 472, in connect_vdsm_json_rpc
> >     __vdsm_json_rpc_connect(logger, timeout)
> >   File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/util.py", line 415, in __vdsm_json_rpc_connect
> >     timeout=VDSM_MAX_RETRY * VDSM_DELAY
> > RuntimeError: Couldn't connect to VDSM within 60 seconds
> >
> >
> > vdsm.log:
> >
> > 2022-02-18 22:14:43,939+0000 INFO (vmrecovery) [vds] recovery: waiting for storage pool to go up (clientIF:726)
> > 2022-02-18 22:14:44,071+0000 INFO (Reactor thread) [ProtocolDetector.AcceptorImpl] Accepted connection from ::1:48832 (protocoldetector:61)
> > 2022-02-18 22:14:44,074+0000 ERROR (Reactor thread) [ProtocolDetector.SSLHandshakeDispatcher] ssl handshake: SSLError, address: ::1 (sslutils:269)
> > 2022-02-18 22:14:44,442+0000 INFO (Reactor thread) [ProtocolDetector.AcceptorImpl] Accepted connection from ::1:48836 (protocoldetector:61)
> > 2022-02-18 22:14:44,445+0000 ERROR (Reactor thread) [ProtocolDetector.SSLHandshakeDispatcher] ssl handshake: SSLError, address: ::1 (sslutils:269)
> > 2022-02-18 22:14:45,077+0000 INFO (Reactor thread) [ProtocolDetector.AcceptorImpl] Accepted connection from ::1:48838 (protocoldetector:61)
> > 2022-02-18 22:14:45,435+0000 INFO (periodic/2) [vdsm.api] START repoStats(domains=()) from=internal, task_id=2dd417e7-0f4f-4a09-a1af-725f267af135 (api:48)
> > 2022-02-18 22:14:45,435+0000 INFO (periodic/2) [vdsm.api] FINISH repoStats return={} from=internal, task_id=2dd417e7-0f4f-4a09-a1af-725f267af135 (api:54)
> > 2022-02-18 22:14:45,438+0000 WARN (periodic/2) [root] Failed to retrieve Hosted Engine HA info, is Hosted Engine setup finished? (api:194)
> > 2022-02-18 22:14:45,447+0000 INFO (Reactor thread) [ProtocolDetector.AcceptorImpl] Accepted connection from ::1:48840 (protocoldetector:61)
> > 2022-02-18 22:14:45,449+0000 ERROR (Reactor thread) [ProtocolDetector.SSLHandshakeDispatcher] ssl handshake: SSLError, address: ::1 (sslutils:269)
> > 2022-02-18 22:14:46,082+0000 INFO (Reactor thread) [ProtocolDetector.AcceptorImpl] Accepted connection from ::1:48842 (protocoldetector:61)
> > 2022-02-18 22:14:46,084+0000 ERROR (Reactor thread) [ProtocolDetector.SSLHandshakeDispatcher] ssl handshake: SSLError, address: ::1 (sslutils:269)
> > 2022-02-18 22:14:46,452+0000 INFO (Reactor thread) [ProtocolDetector.AcceptorImpl] Accepted connection from ::1:48844 (protocoldetector:61)
> > 2022-02-18 22:14:46,455+0000 ERROR (Reactor thread) [ProtocolDetector.SSLHandshakeDispatcher] ssl handshake: SSLError, address: ::1 (sslutils:269)
> > 2022-02-18 22:14:47,087+0000 INFO (Reactor thread) [ProtocolDetector.AcceptorImpl] Accepted connection from ::1:48846 (protocoldetector:61)
> > 2022-02-18 22:14:47,089+0000 ERROR (Reactor thread) [ProtocolDetector.SSLHandshakeDispatcher] ssl handshake: SSLError, address: ::1 (sslutils:269)
> > 2022-02-18 22:14:47,457+0000 INFO (Reactor thread) [ProtocolDetector.AcceptorImpl] Accepted connection from ::1:48848 (protocoldetector:61)
> > 2022-02-18 22:14:47,459+0000 ERROR (Reactor thread) [ProtocolDetector.SSLHandshakeDispatcher] ssl handshake: SSLError, address: ::1 (sslutils:269)
> > 2022-02-18 22:14:48,092+0000 INFO (Reactor thread) [ProtocolDetector.AcceptorImpl] Accepted connection from ::1:48850 (protocoldetector:61)
> > 2022-02-18 22:14:48,094+0000 ERROR (Reactor thread) [ProtocolDetector.SSLHandshakeDispatcher] ssl handshake: SSLError, address: ::1 (sslutils:269)
> > 2022-02-18 22:14:48,461+0000 INFO (Reactor thread) [ProtocolDetector.AcceptorImpl] Accepted connection from ::1:48852 (protocoldetector:61)
> > 2022-02-18 22:14:48,464+0000 ERROR (Reactor thread) [ProtocolDetector.SSLHandshakeDispatcher] ssl handshake: SSLError, address: ::1 (sslutils:269)
> > 2022-02-18 22:14:48,941+0000 INFO (vmrecovery) [vdsm.api] START getConnectedStoragePoolsList(options=None) from=internal, task_id=75ef5d5f-c56b-4595-95c8-3dc64caa3a83 (api:48)
> > 2022-02-18 22:14:48,942+0000 INFO (vmrecovery) [vdsm.api] FINISH getConnectedStoragePoolsList return={'poollist': []} from=internal, task_id=75ef5d5f-c56b-4595-95c8-3dc64caa3a83 (api:54)
> >
> >
> >
> >> On Feb 18, 2022, at 16:35, Strahil Nikolov via Users
<users(a)ovirt.org> wrote:
> >>
> >> ovirt-2 is 'state=GlobalMaintenance', but the other 2 nodes are unknown.
> >> Try to start ovirt-ha-broker & ovirt-ha-agent
> >>
> >> Also, you may try to move the hosted-engine to ovirt-2 and try again
> >>
> >>
> >> Best Regards,
> >> Strahil Nikolov
> >>
> >> On Fri, Feb 18, 2022 at 21:48, Joseph Gelinas
> >> <joseph(a)gelinas.cc> wrote:
> >> I may be in maintenance mode; I did try to set it at the beginning of this, but engine-setup doesn't see it. At this point my nodes say they can't connect to the HA daemon, or have stale data.
> >>
> >> [root@ovirt-1 ~]# hosted-engine --set-maintenance --mode=global
> >> Cannot connect to the HA daemon, please check the logs.
> >>
> >> [root@ovirt-3 ~]# hosted-engine --set-maintenance --mode=global
> >> Cannot connect to the HA daemon, please check the logs.
> >>
> >> [root@ovirt-2 ~]# hosted-engine --set-maintenance --mode=global
> >> [root@ovirt-2 ~]# hosted-engine --vm-status
> >>
> >>
> >> !! Cluster is in GLOBAL MAINTENANCE mode !!
> >>
> >>
> >>
> >> --== Host ovirt-1.xxxxxx.com (id: 1) status ==--
> >>
> >> Host ID : 1
> >> Host timestamp : 6750990
> >> Score : 0
> >> Engine status : unknown stale-data
> >> Hostname : ovirt-1.xxxxxx.com
> >> Local maintenance : False
> >> stopped : True
> >> crc32 : 5290657b
> >> conf_on_shared_storage : True
> >> local_conf_timestamp : 6750950
> >> Status up-to-date : False
> >> Extra metadata (valid at timestamp):
> >> metadata_parse_version=1
> >> metadata_feature_version=1
> >> timestamp=6750990 (Thu Feb 17 22:17:53 2022)
> >> host-id=1
> >> score=0
> >> vm_conf_refresh_time=6750950 (Thu Feb 17 22:17:12 2022)
> >> conf_on_shared_storage=True
> >> maintenance=False
> >> state=AgentStopped
> >> stopped=True
> >>
> >>
> >> --== Host ovirt-3.xxxxxx.com (id: 2) status ==--
> >>
> >> Host ID : 2
> >> Host timestamp : 6731526
> >> Score : 0
> >> Engine status : unknown stale-data
> >> Hostname : ovirt-3.xxxxxx.com
> >> Local maintenance : False
> >> stopped : True
> >> crc32 : 12c6b5c9
> >> conf_on_shared_storage : True
> >> local_conf_timestamp : 6731486
> >> Status up-to-date : False
> >> Extra metadata (valid at timestamp):
> >> metadata_parse_version=1
> >> metadata_feature_version=1
> >> timestamp=6731526 (Thu Feb 17 15:29:37 2022)
> >> host-id=2
> >> score=0
> >> vm_conf_refresh_time=6731486 (Thu Feb 17 15:28:57 2022)
> >> conf_on_shared_storage=True
> >> maintenance=False
> >> state=AgentStopped
> >> stopped=True
> >>
> >>
> >> --== Host ovirt-2.xxxxxx.com (id: 3) status ==--
> >>
> >> Host ID : 3
> >> Host timestamp : 6829853
> >> Score : 3400
> >> Engine status : {"vm": "down", "health": "bad", "detail": "unknown", "reason": "vm not running on this host"}
> >> Hostname : ovirt-2.xxxxxx.com
> >> Local maintenance : False
> >> stopped : False
> >> crc32 : 0779c0b8
> >> conf_on_shared_storage : True
> >> local_conf_timestamp : 6829853
> >> Status up-to-date : True
> >> Extra metadata (valid at timestamp):
> >> metadata_parse_version=1
> >> metadata_feature_version=1
> >> timestamp=6829853 (Fri Feb 18 19:25:17 2022)
> >> host-id=3
> >> score=3400
> >> vm_conf_refresh_time=6829853 (Fri Feb 18 19:25:17 2022)
> >> conf_on_shared_storage=True
> >> maintenance=False
> >> state=GlobalMaintenance
> >> stopped=False
> >>
> >>
> >> !! Cluster is in GLOBAL MAINTENANCE mode !!
> >>
> >>
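
A rough sketch of pulling the stale hosts out of output like the above, assuming the plain-text layout of `hosted-engine --vm-status` exactly as pasted in this thread (a `--== Host <name> (id: N) status ==--` header followed by a `Status up-to-date` field). The function name `stale_hosts` is hypothetical and not part of any oVirt tooling.

```python
import re

# Header and field patterns, modelled on the status output pasted above.
HOST_HEADER = re.compile(r"--== Host (\S+) \(id: \d+\) status ==--")
UP_TO_DATE = re.compile(r"Status up-to-date\s*:\s*(\w+)")


def stale_hosts(status_text):
    """Return hostnames whose 'Status up-to-date' field reads False."""
    hosts = []
    current = None
    for raw in status_text.splitlines():
        line = raw.strip()
        header = HOST_HEADER.search(line)
        if header:
            # Remember which host block the following fields belong to.
            current = header.group(1)
            continue
        field = UP_TO_DATE.search(line)
        if field and current is not None and field.group(1) == "False":
            hosts.append(current)
    return hosts
```

Fed the status text above, this should pick out ovirt-1 and ovirt-3, the two hosts whose agents are stopped, and leave out ovirt-2.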
> >> Ovirt-ha-agent on 1&3 just keeps trying to restart:
> >>
> >> MainThread::ERROR::2022-02-18 19:34:36,910::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent
> >> MainThread::INFO::2022-02-18 19:34:36,910::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
> >> MainThread::INFO::2022-02-18 19:34:47,268::agent::67::ovirt_hosted_engine_ha.agent.agent.Agent::(run) ovirt-hosted-engine-ha agent 2.4.5 started
> >> MainThread::INFO::2022-02-18 19:34:47,280::hosted_engine::242::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Certificate common name not found, using hostname to identify host
> >> MainThread::ERROR::2022-02-18 19:35:47,629::agent::143::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Traceback (most recent call last):
> >>   File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent
> >>     return action(he)
> >>   File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper
> >>     return he.start_monitoring()
> >>   File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 436, in start_monitoring
> >>     self._initialize_vdsm()
> >>   File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 595, in _initialize_vdsm
> >>     logger=self._log
> >>   File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/util.py", line 472, in connect_vdsm_json_rpc
> >>     __vdsm_json_rpc_connect(logger, timeout)
> >>   File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/util.py", line 415, in __vdsm_json_rpc_connect
> >>     timeout=VDSM_MAX_RETRY * VDSM_DELAY
> >> RuntimeError: Couldn't connect to VDSM within 60 seconds
> >>
> >>
> >> Ovirt-2's ovirt-hosted-engine-ha/agent.log has entries detecting global
maintenance though `systemctl status ovirt-ha-agent` has python exception errors from
yesterday.
> >>
> >> MainThread::INFO::2022-02-18 19:39:10,452::state_decorators::51::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check) Global maintenance detected
> >> MainThread::INFO::2022-02-18 19:39:10,524::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state GlobalMaintenance (score: 3400)
> >>
> >>
> >> Feb 17 18:49:12 ovirt-2.us1.vricon.com python3[1324125]: detected unhandled Python exception in '/usr/lib/python3.6/site-packages/ovirt_hosted_engine_setup/vdsm_helper.py'
> >>
> >>
> >>
> >>> On Feb 18, 2022, at 14:20, Strahil Nikolov
<hunter86_bg(a)yahoo.com> wrote:
> >>>
> >>> To set the engine into maintenance mode you can ssh to any Hypervisor
and run:
> >>> 'hosted-engine --set-maintenance --mode=global'
> >>> wait 1 minute and run 'hosted-engine --vm-status' to validate.
> >>>
> >>> Best Regards,
> >>> Strahil Nikolov
> >>>
> >>> On Fri, Feb 18, 2022 at 19:03, Joseph Gelinas
> >>> <joseph(a)gelinas.cc> wrote:
> >>> Hi,
> >>>
> >>> The certificates on our oVirt stack recently expired. While all the VMs are still up, I can't put the cluster into global maintenance via ovirt-engine, or do anything via ovirt-engine for that matter. I just get event logs about cert validity.
> >>>
> >>> VDSM ovirt-1.xxxxx.com command Get Host Capabilities failed: PKIX path validation failed: java.security.cert.CertPathValidatorException: validity check failed
> >>> VDSM ovirt-2.xxxxx.com command Get Host Capabilities failed: PKIX path validation failed: java.security.cert.CertPathValidatorException: validity check failed
> >>> VDSM ovirt-3.xxxxx.com command Get Host Capabilities failed: PKIX path validation failed: java.security.cert.CertPathValidatorException: validity check failed
> >>>
> >>> Under Compute -> Hosts, all are status Unassigned. Default data
center is status Non Responsive.
> >>>
> >>> I have tried a couple of solutions to regenerate the certificates
without much luck and have copied the originals back in place.
> >>>
> >>>
https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.3/...
> >>>
> >>>
https://access.redhat.com/solutions/2409751
> >>>
> >>>
> >>> I have seen things saying that running engine-setup will generate new certs; however, the engine doesn't think the cluster is in global maintenance, so it won't run. I believe I can get around the check with `engine-setup --otopi-environment=OVESETUP_CONFIG/continueSetupOnHEVM=bool:True`, but is that the right thing to do? Will it deploy the certs onto the hosts as well so things communicate properly? It looks like one is supposed to put a node into maintenance and reenroll it after doing the engine-setup, but will it even be able to put the nodes into maintenance, given I can't do anything with them now?
> >>>
> >>> Appreciate any ideas.
> >>>
> >>>
> >>> _______________________________________________
> >>> Users mailing list -- users(a)ovirt.org
> >>> To unsubscribe send an email to users-leave(a)ovirt.org
> >>> Privacy Statement:
https://www.ovirt.org/privacy-policy.html
> >>> oVirt Code of Conduct:
https://www.ovirt.org/community/about/community-guidelines/
> >>> List Archives:
https://lists.ovirt.org/archives/list/users@ovirt.org/message/QCFPKQ3OKPO...