Hello,
It appears that my Manager / hosted-engine isn't working, and I'm unable to get it
to start.
I have a 3-node HCI cluster, but right now, Gluster is only running on 1 host (so no
replication).
I was hoping to upgrade / replace the storage on my 2nd host today, but aborted that
maintenance when I found that I couldn't even get into the Manager.
The storage is mounted, but here's what I see:
[root@cha2-storage dwhite]# hosted-engine --vm-status
The hosted engine configuration has not been retrieved from shared storage. Please ensure that ovirt-ha-agent is running and the storage server is reachable.
[root@cha2-storage dwhite]# systemctl status ovirt-ha-agent
● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2021-08-13 11:10:51 EDT; 2h 44min ago
 Main PID: 3591872 (ovirt-ha-agent)
    Tasks: 1 (limit: 409676)
   Memory: 21.5M
   CGroup: /system.slice/ovirt-ha-agent.service
           └─3591872 /usr/libexec/platform-python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent

Aug 13 11:10:51 cha2-storage.mgt.barredowlweb.com systemd[1]: Started oVirt Hosted Engine High Availability Monitoring Agent.
Any time I try to do anything, such as connecting the engine storage, disconnecting the engine storage, or connecting to the console, the command just sits there without doing anything, and I eventually have to Ctrl-C out of it.
Maybe I just have to be patient? When I press Ctrl-C, I get a traceback:
[root@cha2-storage dwhite]# hosted-engine --console
^CTraceback (most recent call last):
  File "/usr/lib64/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib64/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_setup/vdsm_helper.py", line 214, in <module>
    args.command(args)
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_setup/vdsm_helper.py", line 42, in func
    f(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_setup/vdsm_helper.py", line 91, in checkVmStatus
    cli = ohautil.connect_vdsm_json_rpc()
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/util.py", line 472, in connect_vdsm_json_rpc
    __vdsm_json_rpc_connect(logger, timeout)
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/util.py", line 395, in __vdsm_json_rpc_connect
    timeout=timeout)
  File "/usr/lib/python3.6/site-packages/vdsm/client.py", line 154, in connect
    outgoing_heartbeat=outgoing_heartbeat, nr_retries=nr_retries)
  File "/usr/lib/python3.6/site-packages/yajsonrpc/stompclient.py", line 426, in SimpleClient
    nr_retries, reconnect_interval)
  File "/usr/lib/python3.6/site-packages/yajsonrpc/stompclient.py", line 448, in StandAloneRpcClient
    client = StompClient(utils.create_connected_socket(host, port, sslctx),
  File "/usr/lib/python3.6/site-packages/vdsm/utils.py", line 379, in create_connected_socket
    sock.connect((host, port))
  File "/usr/lib64/python3.6/ssl.py", line 1068, in connect
    self._real_connect(addr, False)
  File "/usr/lib64/python3.6/ssl.py", line 1059, in _real_connect
    self.do_handshake()
  File "/usr/lib64/python3.6/ssl.py", line 1036, in do_handshake
    self._sslobj.do_handshake()
  File "/usr/lib64/python3.6/ssl.py", line 648, in do_handshake
    self._sslobj.do_handshake()
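The traceback shows the client stuck inside the TLS handshake (ssl.py do_handshake) after the TCP connect to vdsm had already returned. The kernel can complete a TCP connection even when the application never calls accept(), so a hang at exactly this step is what an unresponsive listener looks like. A bounded probe can tell a hung listener apart from one that is down or one that answers. This is a minimal diagnostic sketch under stated assumptions: `probe_tls` is a hypothetical helper, certificate verification is deliberately disabled (it tests responsiveness, not trust), and the port 54321 mentioned below is assumed to be vdsm's default and should be checked on the host.

```python
import socket
import ssl


def probe_tls(host: str, port: int, timeout_s: float = 5.0) -> str:
    """Classify a TLS endpoint as 'ok', 'unreachable', 'timeout' or 'tls-error'.

    Diagnostic only: certificate checks are disabled, so 'ok' means the
    server completed a handshake, not that it is trusted.
    """
    try:
        # The timeout set here also bounds the TLS handshake below.
        sock = socket.create_connection((host, port), timeout=timeout_s)
    except socket.timeout:
        return "timeout"
    except OSError:
        # Connection refused, no route, name resolution failure, etc.
        return "unreachable"

    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.check_hostname = False  # must be turned off before CERT_NONE is allowed
    ctx.verify_mode = ssl.CERT_NONE
    try:
        # wrap_socket() on a connected socket performs the handshake immediately.
        with ctx.wrap_socket(sock):
            return "ok"
    except socket.timeout:
        # TCP was accepted, but the server never answered the ClientHello.
        return "timeout"
    except OSError:
        # ssl.SSLError subclasses OSError: the server answered but the handshake
        # failed (for example, because it demanded a client certificate).
        return "tls-error"
    finally:
        sock.close()
```

For example, `probe_tls("localhost", 54321)` returning "timeout" would match the hang above, while "tls-error" would suggest vdsm is at least answering connections.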
This is what I see in /var/log/ovirt-hosted-engine-ha/broker.log:
MainThread::WARNING::2021-08-11 10:24:41,596::storage_broker::100::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: Connection to storage server failed
MainThread::ERROR::2021-08-11 10:24:41,596::broker::69::ovirt_hosted_engine_ha.broker.broker.Broker::(run) Failed initializing the broker: Connection to storage server failed
MainThread::ERROR::2021-08-11 10:24:41,598::broker::71::ovirt_hosted_engine_ha.broker.broker.Broker::(run) Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/broker.py", line 64, in run
    self._storage_broker_instance = self._get_storage_broker()
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/broker.py", line 143, in _get_storage_broker
    return storage_broker.StorageBroker()
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 97, in __init__
    self._backend.connect()
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 375, in connect
    sserver.connect_storage_server()
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/storage_server.py", line 451, in connect_storage_server
    'Connection to storage server failed'
RuntimeError: Connection to storage server failed
MainThread::ERROR::2021-08-11 10:24:41,599::broker::72::ovirt_hosted_engine_ha.broker.broker.Broker::(run) Trying to restart the broker
MainThread::INFO::2021-08-11 10:24:42,439::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.4.7 started
MainThread::INFO::2021-08-11 10:24:44,442::monitor::45::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Searching for submonitors in /usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/submonitors
MainThread::INFO::2021-08-11 10:24:44,443::monitor::62::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
MainThread::INFO::2021-08-11 10:24:44,449::monitor::62::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
MainThread::INFO::2021-08-11 10:24:44,450::monitor::62::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
MainThread::INFO::2021-08-11 10:24:44,451::monitor::62::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
MainThread::INFO::2021-08-11 10:24:44,451::monitor::62::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
MainThread::INFO::2021-08-11 10:24:44,452::monitor::62::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
MainThread::INFO::2021-08-11 10:24:44,452::monitor::62::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
MainThread::INFO::2021-08-11 10:24:44,452::monitor::63::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Finished loading submonitors
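Each broker.log entry appears to follow the layout Thread::LEVEL::timestamp::module::line::logger::(function) message, with traceback frames continuing on the following lines. Assuming that layout (inferred from the excerpt above, not from any oVirt documentation), a short parser can keep only the WARNING and ERROR entries, which makes a repeating "Connection to storage server failed" / "Trying to restart the broker" cycle easier to spot across a long log. The names `Entry` and `problems` are hypothetical.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional, Sequence


class Entry(NamedTuple):
    level: str
    timestamp: str
    logger: str
    message: str


# Layout inferred from the broker.log excerpt (an assumption, not a documented format):
# Thread::LEVEL::YYYY-MM-DD HH:MM:SS,mmm::module::lineno::logger::(function) message
ENTRY_RE = re.compile(
    r"^(?P<thread>[^:]+)::(?P<level>[A-Z]+)::"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})::"
    r"(?P<module>[^:]+)::(?P<lineno>\d+)::(?P<logger>[^:]+)::"
    r"\((?P<func>[^)]*)\) ?(?P<msg>.*)$"
)


def problems(lines: Iterable[str],
             levels: Sequence[str] = ("WARNING", "ERROR")) -> Iterator[Entry]:
    """Yield entries whose level is in `levels`, folding continuation lines in."""
    current: Optional[Entry] = None
    for raw in lines:
        line = raw.rstrip("\n")
        m = ENTRY_RE.match(line)
        if m:
            # A new entry starts: emit the previous one if its level is of interest.
            if current is not None and current.level in levels:
                yield current
            current = Entry(m.group("level"), m.group("ts"),
                            m.group("logger"), m.group("msg"))
        elif current is not None:
            # Continuation line (e.g. a traceback frame): append to the open entry.
            current = current._replace(message=current.message + "\n" + line)
    if current is not None and current.level in levels:
        yield current
```

Usage might look like `for e in problems(open("/var/log/ovirt-hosted-engine-ha/broker.log")): print(e.timestamp, e.level, e.message.partition("\n")[0])`, printing one summary line per warning or error.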
And I see this in /var/log/vdsm/vdsm.log:
2021-08-13 14:08:10,844-0400 ERROR (Reactor thread) [ProtocolDetector.AcceptorImpl] Unhandled exception in acceptor (protocoldetector:76)
Traceback (most recent call last):
  File "/usr/lib64/python3.6/asyncore.py", line 108, in readwrite
  File "/usr/lib64/python3.6/asyncore.py", line 417, in handle_read_event
  File "/usr/lib/python3.6/site-packages/yajsonrpc/betterAsyncore.py", line 57, in handle_accept
  File "/usr/lib/python3.6/site-packages/yajsonrpc/betterAsyncore.py", line 173, in _delegate_call
  File "/usr/lib/python3.6/site-packages/vdsm/protocoldetector.py", line 53, in handle_accept
  File "/usr/lib64/python3.6/asyncore.py", line 348, in accept
  File "/usr/lib64/python3.6/socket.py", line 205, in accept
OSError: [Errno 24] Too many open files
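Errno 24 (EMFILE) means the process has hit its per-process limit on open file descriptors (the soft RLIMIT_NOFILE), so vdsm's acceptor could not take the new connection. On Linux, a process's current descriptor count and that limit can both be read from /proc. A minimal sketch, assuming a Linux /proc filesystem and root privileges (listing another user's /proc/<pid>/fd is normally denied); `fd_usage` is a hypothetical helper name.

```python
import os
from typing import Optional, Tuple


def fd_usage(pid: int) -> Tuple[int, Optional[int]]:
    """Return (open descriptor count, soft 'Max open files' limit) for `pid`.

    Linux-only: reads /proc. A soft limit of None means 'unlimited'
    (or that the limits line was not found).
    """
    # Each entry in /proc/<pid>/fd is one open descriptor.
    count = len(os.listdir("/proc/%d/fd" % pid))
    soft = None
    with open("/proc/%d/limits" % pid) as f:
        for line in f:
            # Line shape: "Max open files   <soft>   <hard>   files"
            if line.startswith("Max open files"):
                value = line.split()[3]  # index 3 is the soft limit column
                soft = None if value == "unlimited" else int(value)
                break
    return count, soft
```

The PID to pass in could come from `systemctl show -p MainPID vdsmd` (assuming the service is named vdsmd on this host); a count at or near the soft limit would be consistent with descriptor exhaustion, and `os.readlink` on the entries in /proc/<pid>/fd shows what the descriptors point to.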
Can anyone help?