Just a note about QNAP iSCSI: I hope to have the chance to throw that device in the wastebin as soon as possible :) , but for completeness I have to say that I was able to add a new storage domain using QNAP iSCSI LUNs after the HE was set up on a different target.
So the problem was only within the HE deployment.


On 23/10/23 18:27, Giuliano David wrote:
Thanks for your suggestion.
I checked the content of /etc/iscsi/initiatorname.iscsi on all my nodes, and the initiator names are unique.
The iSCSI target was a QNAP system with a 10Gb/s NIC. I set up a new target on a Debian server and the deployment ended successfully (there were many other iSCSI errors while deploying the HE, but I managed to solve them).
It is now clear to me that:
- The QNAP iSCSI target introduces something nasty that the oVirt HE deployment cannot manage.
- ovirt-hosted-engine-cleanup is not enough to clean the HE setup environment on the node: a manual cleanup of some directories and a reboot are necessary too.
- The Ansible playbook used to deploy oVirt HE is soooooooo fragile ... One thing should be implemented in it: the ability to suspend the playbook on failure, let the administrator fix the issue, and then resume from the failing step (instead of aborting the deployment and performing the cleanup, which messes up the logfile too).
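
To make the suspend-and-resume idea concrete, here is a minimal Python sketch. It is not how the oVirt playbook works today; the step names, the `run_steps` helper and the `state` dict are all hypothetical and only illustrate the behaviour I would like to have.

```python
from typing import Callable, List, Tuple

# A step is a (name, action) pair; the action raises an exception on failure.
Step = Tuple[str, Callable[[], None]]


def run_steps(steps: List[Step], start_at: int = 0) -> int:
    """Run steps in order from start_at.

    Returns the index of the first failing step, or len(steps) if every
    step succeeded. Nothing is cleaned up on failure, so the caller can
    fix the problem and call run_steps again with the returned index.
    """
    for index in range(start_at, len(steps)):
        name, action = steps[index]
        try:
            action()
        except Exception as exc:
            print(f"step {index} ({name}) failed: {exc}")
            return index
    return len(steps)


# Hypothetical deployment state and steps, for illustration only.
state = {"lun_ready": False}
executed: List[str] = []


def prepare() -> None:
    executed.append("prepare")


def add_storage() -> None:
    if not state["lun_ready"]:
        raise RuntimeError("LUN is not usable")
    executed.append("add_storage")


def finish() -> None:
    executed.append("finish")


steps: List[Step] = [
    ("prepare", prepare),
    ("add_storage", add_storage),
    ("finish", finish),
]

first_stop = run_steps(steps)               # stops at the failing step
state["lun_ready"] = True                   # the administrator fixes the issue
second_stop = run_steps(steps, first_stop)  # resumes from the failing step
```

The point of the design is that the first step is not run a second time: the run restarts exactly where it stopped, with the environment left as it was at the moment of the failure.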

Thanks

giuliano


On 21/10/23 21:08, Strahil Nikolov wrote:
The simplest thing to check is whether your host can discover and write to the LUN. Is it possible that more than one node has the same client IQN?
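
A minimal sketch of the duplicate-IQN check in Python, assuming the content of /etc/iscsi/initiatorname.iscsi has already been collected from each node into a dict. The host names and IQN values are made up, and both helper functions are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def parse_initiator_name(text: str) -> Optional[str]:
    """Return the value of the InitiatorName= line, or None if absent."""
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == "InitiatorName":
            return value.strip()
    return None


def find_duplicate_iqns(files_by_host: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each IQN used by more than one host to the sorted list of hosts."""
    hosts_by_iqn: Dict[str, List[str]] = defaultdict(list)
    for host, text in files_by_host.items():
        iqn = parse_initiator_name(text)
        if iqn is not None:
            hosts_by_iqn[iqn].append(host)
    return {
        iqn: sorted(hosts)
        for iqn, hosts in hosts_by_iqn.items()
        if len(hosts) > 1
    }


# Made-up file contents: node1 and node3 share the same initiator name.
files_by_host = {
    "node1": "InitiatorName=iqn.1994-05.com.redhat:aaaa\n",
    "node2": "# comment\nInitiatorName=iqn.1994-05.com.redhat:bbbb\n",
    "node3": "InitiatorName=iqn.1994-05.com.redhat:aaaa\n",
}
duplicates = find_duplicate_iqns(files_by_host)
print(duplicates)
```

An empty result means every node has its own IQN.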

Best Regards,
Strahil Nikolov 




On Friday, October 20, 2023, 12:38 PM, Giuliano David <giuliano.david@nvgroup.it> wrote:

Hi everyone.
I need help understanding a failure deploying the hosted engine on a
freshly installed oVirt 4.5.4 el8 node.
After the setup via the official ISO, I log in to the node via ssh and I
issue the command:

# hosted-engine --deploy --4 --ansible-extra-vars=he_offline_deployment=true
-- Note --
The extra Ansible variable is the only way I found to prevent the
deployed hosted engine from downloading the latest OS updates, which
break Python compatibility between the Ansible playbook on the deploying
node and the Ansible host in the deployed engine.
Without that extra variable the deployment fails for fancy reasons.
-- End note --

The deployment proceeds until I specify an iSCSI target and a (free) LUN.
The playbook adds the storage domain, creates the HE disk and transfers
the HE VM to the domain. Then an error occurs:

[ INFO  ] TASK [ovirt.ovirt.hosted_engine_setup : Initialize lockspace
volume]
[ INFO  ] TASK [ovirt.ovirt.hosted_engine_setup : Workaround for
ovirt-ha-broker start failures]
[ INFO  ] changed: [localhost]
[ INFO  ] TASK [ovirt.ovirt.hosted_engine_setup : Initialize lockspace
volume]
[ ERROR ] fatal: [localhost]: FAILED! => {"attempts": 5, "changed":
true, "cmd": ["hosted-engine", "--reinitialize-lockspace", "--force"],
"delta": "0:00:00.170053", "end": "2023-10-20 11:21:18.111299", "msg":
"non-zero return code", "rc": 1, "start": "2023-10-20 11:21:17.941246",
"stderr": "Traceback (most recent call last):\n  File
\"/usr/lib64/python3.6/runpy.py\", line 193, in _run_module_as_main\n   
\"__main__\", mod_spec)\n File \"/usr/lib64/python3.6/runpy.py\", line
85, in _run_code\n exec(code, run_globals)\n  File
\"/usr/lib/python3.6/site-packages/ovirt_hosted_engine_setup/reinitialize_lockspace.py\",
line 30, in <module>\n    ha_cli.reset_lockspace(force)\n File
\"/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/client/client.py\",
line 286, in reset_lockspace\n    stats =
broker.get_stats_from_storage()\n  File
\"/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py\",
line 148, in get_stats_from_storage\n    result =
self._proxy.get_stats()\n  File
\"/usr/lib64/python3.6/xmlrpc/client.py\", line 1112, in __call__\n   
return self.__send(self.__name, args)\n  File
\"/usr/lib64/python3.6/xmlrpc/client.py\", line 1452, in __request\n   
verbose=self.__verbose\n  File
\"/usr/lib64/python3.6/xmlrpc/client.py\", line 1154, in request\n   
return self.single_request(host, handler, request_body, verbose)\n  File
\"/usr/lib64/python3.6/xmlrpc/client.py\", line 1166, in
single_request\n    http_conn = self.send_request(host, handler,
request_body, verbose)\n  File
\"/usr/lib64/python3.6/xmlrpc/client.py\", line 1279, in
send_request\n    self.send_content(connection, request_body)\n File
\"/usr/lib64/python3.6/xmlrpc/client.py\", line 1309, in
send_content\n    connection.endheaders(request_body)\n  File
\"/usr/lib64/python3.6/http/client.py\", line 1268, in endheaders\n   
self._send_output(message_body, encode_chunked=encode_chunked)\n  File
\"/usr/lib64/python3.6/http/client.py\", line 1044, in _send_output\n   
self.send(msg)\n  File \"/usr/lib64/python3.6/http/client.py\", line
982, in send\n self.connect()\n  File
\"/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/unixrpc.py\",
line 76, in connect\n
self.sock.connect(base64.b16decode(self.host))\nFileNotFoundError:
[Errno 2] No such file or directory", "stderr_lines": ["Traceback (most
recent call last):", "  File \"/usr/lib64/python3.6/runpy.py\", line
193, in _run_module_as_main", "    \"__main__\", mod_spec)", "  File
\"/usr/lib64/python3.6/runpy.py\", line 85, in _run_code", " exec(code,
run_globals)", "  File
\"/usr/lib/python3.6/site-packages/ovirt_hosted_engine_setup/reinitialize_lockspace.py\",
line 30, in <module>", "    ha_cli.reset_lockspace(force)", " File
\"/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/client/client.py\",
line 286, in reset_lockspace", "    stats =
broker.get_stats_from_storage()", "  File
\"/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py\",
line 148, in get_stats_from_storage", "    result =
self._proxy.get_stats()", "  File
\"/usr/lib64/python3.6/xmlrpc/client.py\", line 1112, in __call__", "   
return self.__send(self.__name, args)", "  File
\"/usr/lib64/python3.6/xmlrpc/client.py\", line 1452, in __request",
"    verbose=self.__verbose", "  File
\"/usr/lib64/python3.6/xmlrpc/client.py\", line 1154, in request", "   
return self.single_request(host, handler, request_body, verbose)", " 
File \"/usr/lib64/python3.6/xmlrpc/client.py\", line 1166, in
single_request", "    http_conn = self.send_request(host, handler,
request_body, verbose)", "  File
\"/usr/lib64/python3.6/xmlrpc/client.py\", line 1279, in send_request",
"    self.send_content(connection, request_body)", " File
\"/usr/lib64/python3.6/xmlrpc/client.py\", line 1309, in send_content",
"    connection.endheaders(request_body)", "  File
\"/usr/lib64/python3.6/http/client.py\", line 1268, in endheaders", "   
self._send_output(message_body, encode_chunked=encode_chunked)", "  File
\"/usr/lib64/python3.6/http/client.py\", line 1044, in _send_output",
"    self.send(msg)", "  File \"/usr/lib64/python3.6/http/client.py\",
line 982, in send", " self.connect()", "  File
\"/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/unixrpc.py\",
line 76, in connect", " self.sock.connect(base64.b16decode(self.host))",
"FileNotFoundError: [Errno 2] No such file or directory"], "stdout": "",
"stdout_lines": []}
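
Reading the traceback from the bottom: the last frame is unixrpc.py calling self.sock.connect() on a base16-decoded path, and the exception is FileNotFoundError. That suggests the Unix socket file the client expects from ovirt-ha-broker did not exist at that moment, for example because the broker was not running or not yet listening. The minimal Python sketch below reproduces the same exception class on Linux. The socket path and the `connect_like_unixrpc` helper are hypothetical and are not taken from the oVirt code.

```python
import base64
import os
import socket
import tempfile


def connect_like_unixrpc(encoded_host: bytes) -> str:
    """Decode a base16-encoded socket path and try to connect to it.

    Returns "connected" on success, otherwise the name of the OSError
    subclass that was raised.
    """
    path = base64.b16decode(encoded_host)
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.connect(path)
        return "connected"
    except OSError as exc:
        return type(exc).__name__
    finally:
        sock.close()


# Hypothetical path: a socket file that no process has created.
missing_path = os.path.join(tempfile.mkdtemp(), "broker.socket")
result = connect_like_unixrpc(base64.b16encode(missing_path.encode()))
print(result)
```

If this reading is right, the state of the ovirt-ha-broker service at the time of the failure is the first thing to look at.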

Then the playbook cleans all the installation and exits.

I really can't figure out what's going on ...
This is the furthest point I have reached in deploying HE after two weeks
of failures and errors of every kind.
Please, can someone point me in the right direction to solve this new issue?

Thanks.

giuliano



_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-leave@ovirt.org

--
Giuliano David
Systems and Networks Administrator
NV Group |
            20 Annivarsario | Noviservice | Novi Solutions | idPost |
            Funnel
NV Group
Noviservice S.r.l. | Novisolution S.r.l. | idPost S.r.l. | Funnel S.r.l.
Cagliari   Roma   Milano
www.nvgroup.it
