Hello,
It appears that my Manager / hosted-engine isn't working, and I'm unable to get it to start.

I have a 3-node HCI cluster, but right now, Gluster is only running on 1 host (so no replication).
I was hoping to upgrade / replace the storage on my 2nd host today, but aborted that maintenance when I found that I couldn't even get into the Manager.

The storage is mounted, but here's what I see:

[root@cha2-storage dwhite]# hosted-engine --vm-status
The hosted engine configuration has not been retrieved from shared storage. Please ensure that ovirt-ha-agent is running and the storage server is reachable.

[root@cha2-storage dwhite]# systemctl status ovirt-ha-agent
● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2021-08-13 11:10:51 EDT; 2h 44min ago
Main PID: 3591872 (ovirt-ha-agent)
    Tasks: 1 (limit: 409676)
   Memory: 21.5M
   CGroup: /system.slice/ovirt-ha-agent.service
           └─3591872 /usr/libexec/platform-python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent

Aug 13 11:10:51 cha2-storage.mgt.barredowlweb.com systemd[1]: Started oVirt Hosted Engine High Availability Monitoring Agent.

Any time I try to do anything like connect the engine storage, disconnect the engine storage, or connect to the console, it just sits there, and doesn't do anything, and I eventually have to ctl-c out of it.
Maybe I have to be patient? When I ctl-c, I get a trackback error:

[root@cha2-storage dwhite]# hosted-engine --console
^CTraceback (most recent call last):
  File "/usr/lib64/python3.6/runpy.py", line 193, in _run_module_as_main

    "__main__", mod_spec)
  File "/usr/lib64/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_setup/vdsm_helper.py", line 214, in <module>
[root@cha2-storage dwhite]#     args.command(args)
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_setup/vdsm_helper.py", line 42, in func
    f(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_setup/vdsm_helper.py", line 91, in checkVmStatus
    cli = ohautil.connect_vdsm_json_rpc()
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/util.py", line 472, in connect_vdsm_json_rpc
    __vdsm_json_rpc_connect(logger, timeout)
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/util.py", line 395, in __vdsm_json_rpc_connect
    timeout=timeout)
  File "/usr/lib/python3.6/site-packages/vdsm/client.py", line 154, in connect
    outgoing_heartbeat=outgoing_heartbeat, nr_retries=nr_retries)
  File "/usr/lib/python3.6/site-packages/yajsonrpc/stompclient.py", line 426, in SimpleClient
    nr_retries, reconnect_interval)
  File "/usr/lib/python3.6/site-packages/yajsonrpc/stompclient.py", line 448, in StandAloneRpcClient
    client = StompClient(utils.create_connected_socket(host, port, sslctx),
  File "/usr/lib/python3.6/site-packages/vdsm/utils.py", line 379, in create_connected_socket
    sock.connect((host, port))
  File "/usr/lib64/python3.6/ssl.py", line 1068, in connect
    self._real_connect(addr, False)
  File "/usr/lib64/python3.6/ssl.py", line 1059, in _real_connect
    self.do_handshake()
  File "/usr/lib64/python3.6/ssl.py", line 1036, in do_handshake
    self._sslobj.do_handshake()
  File "/usr/lib64/python3.6/ssl.py", line 648, in do_handshake
    self._sslobj.do_handshake()


This is what I see in /var/log/ovirt-hosted-engine-ha/broker.log:

MainThread::WARNING::2021-08-11 10:24:41,596::storage_broker::100::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: Connection to storage server failed
MainThread::ERROR::2021-08-11 10:24:41,596::broker::69::ovirt_hosted_engine_ha.broker.broker.Broker::(run) Failed initializing the broker: Connection to storage server failed
MainThread::ERROR::2021-08-11 10:24:41,598::broker::71::ovirt_hosted_engine_ha.broker.broker.Broker::(run) Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/broker.py", line 64, in run
    self._storage_broker_instance = self._get_storage_broker()
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/broker.py", line 143, in _get_storage_broker
    return storage_broker.StorageBroker()
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 97, in __init__
    self._backend.connect()
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 375, in connect
    sserver.connect_storage_server()
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/storage_server.py", line 451, in connect_storage_server
    'Connection to storage server failed'
RuntimeError: Connection to storage server failed

MainThread::ERROR::2021-08-11 10:24:41,599::broker::72::ovirt_hosted_engine_ha.broker.broker.Broker::(run) Trying to restart the broker
MainThread::INFO::2021-08-11 10:24:42,439::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.4.7 started
MainThread::INFO::2021-08-11 10:24:44,442::monitor::45::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Searching for submonitors in /usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/submonitors
MainThread::INFO::2021-08-11 10:24:44,443::monitor::62::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
MainThread::INFO::2021-08-11 10:24:44,449::monitor::62::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
MainThread::INFO::2021-08-11 10:24:44,450::monitor::62::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
MainThread::INFO::2021-08-11 10:24:44,451::monitor::62::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
MainThread::INFO::2021-08-11 10:24:44,451::monitor::62::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
MainThread::INFO::2021-08-11 10:24:44,452::monitor::62::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
MainThread::INFO::2021-08-11 10:24:44,452::monitor::62::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
MainThread::INFO::2021-08-11 10:24:44,452::monitor::63::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Finished loading submonitors


And I see this in /var/log/vdsm/vdsm.log:

2021-08-13 14:08:10,844-0400 ERROR (Reactor thread) [ProtocolDetector.AcceptorImpl] Unhandled exception in acceptor (protocoldetector:76)
Traceback (most recent call last):
  File "/usr/lib64/python3.6/asyncore.py", line 108, in readwrite
  File "/usr/lib64/python3.6/asyncore.py", line 417, in handle_read_event
  File "/usr/lib/python3.6/site-packages/yajsonrpc/betterAsyncore.py", line 57, in handle_accept
  File "/usr/lib/python3.6/site-packages/yajsonrpc/betterAsyncore.py", line 173, in _delegate_call
  File "/usr/lib/python3.6/site-packages/vdsm/protocoldetector.py", line 53, in handle_accept
  File "/usr/lib64/python3.6/asyncore.py", line 348, in accept
  File "/usr/lib64/python3.6/socket.py", line 205, in accept
OSError: [Errno 24] Too many open files

Can anyone help?

Sent with ProtonMail Secure Email.