
Hello,

It appears that my Manager / hosted-engine isn't working, and I'm unable to get it to start. I have a 3-node HCI cluster, but right now Gluster is only running on 1 host (so no replication). I was hoping to upgrade / replace the storage on my 2nd host today, but I aborted that maintenance when I found that I couldn't even get into the Manager.

The storage is mounted, but here's what I see:
[root@cha2-storage dwhite]# hosted-engine --vm-status
The hosted engine configuration has not been retrieved from shared storage. Please ensure that ovirt-ha-agent is running and the storage server is reachable.
[root@cha2-storage dwhite]# systemctl status ovirt-ha-agent
● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2021-08-13 11:10:51 EDT; 2h 44min ago
 Main PID: 3591872 (ovirt-ha-agent)
    Tasks: 1 (limit: 409676)
   Memory: 21.5M
   CGroup: /system.slice/ovirt-ha-agent.service
           └─3591872 /usr/libexec/platform-python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent

Aug 13 11:10:51 cha2-storage.mgt.barredowlweb.com systemd[1]: Started oVirt Hosted Engine High Availability Monitoring Agent.
Any time I try to do anything, such as connecting the engine storage, disconnecting the engine storage, or connecting to the console, the command just hangs, and I eventually have to Ctrl-C out of it. Maybe I have to be patient? When I press Ctrl-C, I get a traceback:
[root@cha2-storage dwhite]# hosted-engine --console
^CTraceback (most recent call last):
  File "/usr/lib64/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib64/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_setup/vdsm_helper.py", line 214, in <module>
[root@cha2-storage dwhite]#     args.command(args)
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_setup/vdsm_helper.py", line 42, in func
    f(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_setup/vdsm_helper.py", line 91, in checkVmStatus
    cli = ohautil.connect_vdsm_json_rpc()
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/util.py", line 472, in connect_vdsm_json_rpc
    __vdsm_json_rpc_connect(logger, timeout)
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/util.py", line 395, in __vdsm_json_rpc_connect
    timeout=timeout)
  File "/usr/lib/python3.6/site-packages/vdsm/client.py", line 154, in connect
    outgoing_heartbeat=outgoing_heartbeat, nr_retries=nr_retries)
  File "/usr/lib/python3.6/site-packages/yajsonrpc/stompclient.py", line 426, in SimpleClient
    nr_retries, reconnect_interval)
  File "/usr/lib/python3.6/site-packages/yajsonrpc/stompclient.py", line 448, in StandAloneRpcClient
    client = StompClient(utils.create_connected_socket(host, port, sslctx),
  File "/usr/lib/python3.6/site-packages/vdsm/utils.py", line 379, in create_connected_socket
    sock.connect((host, port))
  File "/usr/lib64/python3.6/ssl.py", line 1068, in connect
    self._real_connect(addr, False)
  File "/usr/lib64/python3.6/ssl.py", line 1059, in _real_connect
    self.do_handshake()
  File "/usr/lib64/python3.6/ssl.py", line 1036, in do_handshake
    self._sslobj.do_handshake()
  File "/usr/lib64/python3.6/ssl.py", line 648, in do_handshake
    self._sslobj.do_handshake()
This is what I see in /var/log/ovirt-hosted-engine-ha/broker.log:
MainThread::WARNING::2021-08-11 10:24:41,596::storage_broker::100::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: Connection to storage server failed
MainThread::ERROR::2021-08-11 10:24:41,596::broker::69::ovirt_hosted_engine_ha.broker.broker.Broker::(run) Failed initializing the broker: Connection to storage server failed
MainThread::ERROR::2021-08-11 10:24:41,598::broker::71::ovirt_hosted_engine_ha.broker.broker.Broker::(run) Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/broker.py", line 64, in run
    self._storage_broker_instance = self._get_storage_broker()
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/broker.py", line 143, in _get_storage_broker
    return storage_broker.StorageBroker()
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 97, in __init__
    self._backend.connect()
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 375, in connect
    sserver.connect_storage_server()
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/storage_server.py", line 451, in connect_storage_server
    'Connection to storage server failed'
RuntimeError: Connection to storage server failed

MainThread::ERROR::2021-08-11 10:24:41,599::broker::72::ovirt_hosted_engine_ha.broker.broker.Broker::(run) Trying to restart the broker
MainThread::INFO::2021-08-11 10:24:42,439::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.4.7 started
MainThread::INFO::2021-08-11 10:24:44,442::monitor::45::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Searching for submonitors in /usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/submonitors
MainThread::INFO::2021-08-11 10:24:44,443::monitor::62::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
MainThread::INFO::2021-08-11 10:24:44,449::monitor::62::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
MainThread::INFO::2021-08-11 10:24:44,450::monitor::62::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
MainThread::INFO::2021-08-11 10:24:44,451::monitor::62::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
MainThread::INFO::2021-08-11 10:24:44,451::monitor::62::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
MainThread::INFO::2021-08-11 10:24:44,452::monitor::62::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
MainThread::INFO::2021-08-11 10:24:44,452::monitor::62::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
MainThread::INFO::2021-08-11 10:24:44,452::monitor::63::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Finished loading submonitors
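Since the `hosted-engine --console` traceback stops inside the TLS handshake with vdsm, one thing I could check is whether vdsm accepts connections at all, using a timeout instead of waiting indefinitely. This is only a minimal sketch: `probe_tls` is a hypothetical helper name of my own, and the host and port to probe are assumptions (I believe vdsm listens on port 54321 by default, but nothing in my output confirms that).

```python
import socket
import ssl


def probe_tls(host, port, timeout=5.0):
    """Try a TCP connect plus TLS handshake and report how far it got."""
    ctx = ssl.create_default_context()
    # Diagnostic only: skip certificate checks, we only care about liveness.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except socket.timeout:
        return "connect timed out"
    except OSError as exc:
        return "connect failed: %s" % exc
    try:
        # The handshake runs inside wrap_socket and inherits the timeout.
        with ctx.wrap_socket(sock, server_hostname=host):
            return "handshake ok"
    except socket.timeout:
        return "handshake timed out"
    except (ssl.SSLError, OSError) as exc:
        return "handshake failed: %s" % exc
    finally:
        sock.close()
```

Calling `probe_tls("localhost", 54321)` on the host would then distinguish a hang ("handshake timed out") from an outright refusal or a certificate-level failure. A timeout here would suggest the TCP connection is queued but the server never answers it.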
And I see this in /var/log/vdsm/vdsm.log:
2021-08-13 14:08:10,844-0400 ERROR (Reactor thread) [ProtocolDetector.AcceptorImpl] Unhandled exception in acceptor (protocoldetector:76)
Traceback (most recent call last):
  File "/usr/lib64/python3.6/asyncore.py", line 108, in readwrite
  File "/usr/lib64/python3.6/asyncore.py", line 417, in handle_read_event
  File "/usr/lib/python3.6/site-packages/yajsonrpc/betterAsyncore.py", line 57, in handle_accept
  File "/usr/lib/python3.6/site-packages/yajsonrpc/betterAsyncore.py", line 173, in _delegate_call
  File "/usr/lib/python3.6/site-packages/vdsm/protocoldetector.py", line 53, in handle_accept
  File "/usr/lib64/python3.6/asyncore.py", line 348, in accept
  File "/usr/lib64/python3.6/socket.py", line 205, in accept
OSError: [Errno 24] Too many open files
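Errno 24 means the process has reached its limit on open file descriptors, so a useful next step might be to compare how many descriptors the vdsm process holds against its soft limit. Below is a rough sketch, assuming a Linux `/proc` filesystem and root access; `fd_usage` is a hypothetical helper name, and I have not confirmed which PID vdsm is running under.

```python
import os


def fd_usage(pid):
    """Return (open_fd_count, soft_limit) for a process, read from /proc."""
    # Each entry in /proc/<pid>/fd is one open file descriptor.
    open_fds = len(os.listdir("/proc/%d/fd" % pid))
    soft_limit = None  # stays None if the limit is "unlimited" or not found
    with open("/proc/%d/limits" % pid) as limits:
        for line in limits:
            if line.startswith("Max open files"):
                # Fields: "Max", "open", "files", soft, hard, units
                value = line.split()[3]
                soft_limit = int(value) if value.isdigit() else None
                break
    return open_fds, soft_limit
```

The PID to pass would be that of the vdsm daemon, which I would expect `systemctl show -p MainPID vdsmd` to report. If the two numbers come back equal or nearly so, that would be consistent with the `accept` failure in the log.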
Can anyone help?