HE deployment on FC (Fibre Channel) disk fails at 99% complete, at the final "hosted-engine --reinitialize-lockspace --force"

This is identical to what many others have encountered, yet nothing definitive has been suggested. The entire HE deployment nearly finishes, but shortly after the HE VM is copied to the shared disk the install reaches the "initialize lockspace" step and fails. The setup log (otopi.ovirt_hosted_engine_setup.ansible_utils, 2024-06-19 13:24:01,777-0400) shows "hosted-engine --reinitialize-lockspace --force" exiting with a non-zero return code (rc 1, after 5 attempts, in about 0.18 s) and this traceback on stderr:

    Traceback (most recent call last):
      File "/usr/lib64/python3.9/runpy.py", line 197, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "/usr/lib64/python3.9/runpy.py", line 87, in _run_code
        exec(code, run_globals)
      File "/usr/lib/python3.9/site-packages/ovirt_hosted_engine_setup/reinitialize_lockspace.py", line 30, in <module>
        ha_cli.reset_lockspace(force)
      File "/usr/lib/python3.9/site-packages/ovirt_hosted_engine_ha/client/client.py", line 286, in reset_lockspace
        stats = broker.get_stats_from_storage()
      File "/usr/lib/python3.9/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 148, in get_stats_from_storage
        result = self._proxy.get_stats()
      File "/usr/lib64/python3.9/xmlrpc/client.py", line 1122, in __call__
        return self.__send(self.__name, args)
      File "/usr/lib64/python3.9/xmlrpc/client.py", line 1464, in __request
        response = self.__transport.request(
      File "/usr/lib64/python3.9/xmlrpc/client.py", line 1166, in request
        return self.single_request(host, handler, request_body, verbose)
      File "/usr/lib64/python3.9/xmlrpc/client.py", line 1178, in single_request
        http_conn = self.send_request(host, handler, request_body, verbose)
      File "/usr/lib64/python3.9/xmlrpc/client.py", line 1291, in send_request
        self.send_content(connection, request_body)
      File "/usr/lib64/python3.9/xmlrpc/client.py", line 1321, in send_content
        connection.endheaders(request_body)
      File "/usr/lib64/python3.9/http/client.py", line 1280, in endheaders
        self._send_output(message_body, encode_chunked=encode_chunked)
      File "/usr/lib64/python3.9/http/client.py", line 1040, in _send_output
        self.send(msg)
      File "/usr/lib64/python3.9/http/client.py", line 980, in send
        self.connect()
      File "/usr/lib/python3.9/site-packages/ovirt_hosted_engine_ha/lib/unixrpc.py", line 76, in connect
        self.sock.connect(base64.b16decode(self.host))
    FileNotFoundError: [Errno 2] No such file or directory

There are no errors in /var/log/messages or /var/log/sanlock.log. I have this working with iSCSI on another storage system, but can't seem to get it to work on FC. I have read that sector sizes could possibly cause this: on my iSCSI system [PHY-SEC:LOG-SEC] is [512:512], but on my FC system it is [4096:512]. Hoping that someone can confirm whether this is the issue or not. Interestingly, all previous "initialize lockspace" phases of the install are fine; it appears to be only this final one that fails.

Hello, I got a similar issue deploying OLVM 4.5.4 over an MSA via Fibre Channel. On your KVM host you probably have the ha-agent or ha-broker stopped; check the logs on the KVM host at /var/log/ovirt-hosted-engine-ha/agent.log and broker.log.

In my case it was metadata corruption, and the log complained about /rhev/data-center/mnt/blockSD/blablablabla/ha_agent/hosted-engine.metadata. That can be fixed as Yedidyah Bar David has explained in the past: I stopped the broker and agent services, archived the existing hosted-engine metadata files, created an empty 1 GB metadata file with dd (dd if=/dev/zero of=/run/vdsm/storage/your-sd-uuid/your-img-uuid/your-vol-uuid bs=1M count=1024), checked permissions and ownership (vdsm:kvm, 0644), and then, after playing around with "hosted-engine --connect-storage", hosted-engine maintenance mode and so on, it worked. Good luck
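Spelled out, that recovery sequence looks roughly like the sketch below. It assumes the block-storage layout under /run/vdsm/storage; <sd-uuid>, <img-uuid> and <vol-uuid> are placeholders for the IDs on your system (resolve them from the /rhev/.../ha_agent/hosted-engine.metadata symlink before touching anything), and "--set-maintenance --mode=none" is one way of doing the maintenance-mode step mentioned above:

    # Placeholder path: substitute the real UUIDs before running anything.
    META=/run/vdsm/storage/<sd-uuid>/<img-uuid>/<vol-uuid>

    systemctl stop ovirt-ha-agent ovirt-ha-broker

    # Archive the current (corrupt) metadata before overwriting it.
    dd if=$META of=/root/hosted-engine.metadata.bak bs=1M

    # Recreate it as an empty 1 GB file, owned vdsm:kvm, mode 0644.
    dd if=/dev/zero of=$META bs=1M count=1024
    chown vdsm:kvm $META
    chmod 0644 $META

    # Reconnect storage and leave maintenance mode.
    hosted-engine --connect-storage
    hosted-engine --set-maintenance --mode=none

    systemctl start ovirt-ha-broker ovirt-ha-agent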
participants (2)
- Jeffrey Slapp
- raulmevi2@hotmail.com