It seems that after the last attempt I managed to move forward. I started both services:

systemctl start ovirt-ha-agent ovirt-ha-broker

Then I stopped ovirt-ha-agent and ran "hosted-engine --reinitialize-lockspace".

Now the situation has changed a little:
# sanlock client status
daemon 5f37f400-b865-11dc-a4f5-2c4d54502372
p -1 helper
p -1 listener
p 89795 HostedEngine
p -1 status
s hosted-engine:1:/run/vdsm/storage/ca3807b9-5afc-4bcd-a557-aacbcc53c340/39ee18b2-3d7b-4d48-8a0e-3ed7947b5038/d95ae3ee-b6d3-46c4-b6a2-75f96134c7f1:0
s ca3807b9-5afc-4bcd-a557-aacbcc53c340:1:/rhev/data-center/mnt/glusterSD/ovirt2\:_engine44/ca3807b9-5afc-4bcd-a557-aacbcc53c340/dom_md/ids:0
r ca3807b9-5afc-4bcd-a557-aacbcc53c340:292c2cac-8dad-4229-a9a3-e64811f4b34e:/rhev/data-center/mnt/glusterSD/ovirt2\:_engine44/ca3807b9-5afc-4bcd-a557-aacbcc53c340/images/1deecc6a-0584-4758-8fbb-6386662a8075/292c2cac-8dad-4229-a9a3-e64811f4b34e.lease:0:1 p 89795
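
For anyone comparing this output with the earlier one, here is a rough Python sketch that pulls the lockspace ("s") lines out of "sanlock client status" text. It assumes the NAME:HOST_ID:PATH:OFFSET layout with colons inside the path escaped as "\:", which is what the output above shows; the function and class names are made up for illustration.

```python
from dataclasses import dataclass

# Shortened copy of the status output above (raw string keeps the "\:" intact).
SAMPLE = r"""daemon 5f37f400-b865-11dc-a4f5-2c4d54502372
p -1 helper
p 89795 HostedEngine
s hosted-engine:1:/run/vdsm/storage/ca3807b9-5afc-4bcd-a557-aacbcc53c340/39ee18b2-3d7b-4d48-8a0e-3ed7947b5038/d95ae3ee-b6d3-46c4-b6a2-75f96134c7f1:0
s ca3807b9-5afc-4bcd-a557-aacbcc53c340:1:/rhev/data-center/mnt/glusterSD/ovirt2\:_engine44/ca3807b9-5afc-4bcd-a557-aacbcc53c340/dom_md/ids:0
"""


@dataclass
class Lockspace:
    name: str
    host_id: int
    path: str
    offset: int


def split_unescaped(text, sep=":"):
    """Split on sep, turning a backslash-escaped sep into a literal one."""
    parts, current, i = [], [], 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text) and text[i + 1] == sep:
            current.append(sep)
            i += 2
        elif text[i] == sep:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(text[i])
            i += 1
    parts.append("".join(current))
    return parts


def parse_lockspaces(status_text):
    """Return one Lockspace per line that starts with 's '."""
    lockspaces = []
    for line in status_text.splitlines():
        if not line.startswith("s "):
            continue
        name, host_id, path, offset = split_unescaped(line[2:].strip())
        lockspaces.append(Lockspace(name, int(host_id), path, int(offset)))
    return lockspaces


if __name__ == "__main__":
    for ls in parse_lockspaces(SAMPLE):
        print(ls.name, ls.host_id, ls.path)
```

With the output above this lists two lockspaces, "hosted-engine" and the storage domain one, whereas the earlier status further down only has the storage domain lockspace.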

And the engine is running:
--== Host ovirt2.localdomain (id: 1) status ==--

Host ID : 1
Host timestamp : 31136
Score : 3400
Engine status : {"vm": "up", "health": "bad", "detail": "Up", "reason": "failed liveliness check"}
Hostname : ovirt2.localdomain
Local maintenance : False
stopped : False
crc32 : 5f5bbd94
conf_on_shared_storage : True
local_conf_timestamp : 31136
Status up-to-date : True
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=31136 (Thu Jan 6 11:46:23 2022)
host-id=1
score=3400
vm_conf_refresh_time=31136 (Thu Jan 6 11:46:23 2022)
conf_on_shared_storage=True
maintenance=False
state=EngineStarting
stopped=False
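
If someone wants to poll this instead of reading it by eye, here is a minimal Python sketch that extracts the "Engine status" JSON from the status text above and checks it. The helper names are hypothetical, and treating "health": "good" as the healthy value is my assumption; only "bad" appears in the output here.

```python
import json

# The relevant lines copied from the status output above.
SAMPLE = """Host ID : 1
Score : 3400
Engine status : {"vm": "up", "health": "bad", "detail": "Up", "reason": "failed liveliness check"}
Hostname : ovirt2.localdomain
"""


def parse_engine_status(status_output):
    """Return the 'Engine status' value as a dict, or None if it is absent."""
    for line in status_output.splitlines():
        # Split on the first colon only; the JSON value contains more colons.
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Engine status":
            return json.loads(value.strip())
    return None


def engine_is_healthy(engine_status):
    """Assumption: healthy means the VM is up and health is reported as 'good'."""
    return (
        engine_status is not None
        and engine_status.get("vm") == "up"
        and engine_status.get("health") == "good"
    )


if __name__ == "__main__":
    status = parse_engine_status(SAMPLE)
    print(status["reason"], engine_is_healthy(status))
```

Since the state above is EngineStarting, the failed liveliness check may just mean the engine has not finished starting yet.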


I will leave it for a while before trying to troubleshoot.

Best Regards,
Strahil Nikolov
On Thursday, 6 January 2022, 09:23:11 GMT+2, Strahil Nikolov via Users <users@ovirt.org> wrote:


Hello All,

I was trying to upgrade my single-node setup (it used to be 2+1 arbiter, but one of the data nodes died) from 4.3.10 to 4.4.?

The deployment failed on 'hosted-engine --reinitialize-lockspace --force' and it seems that sanlock fails to obtain a lock:

# hosted-engine --reinitialize-lockspace --force
Traceback (most recent call last):
  File "/usr/lib64/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib64/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_setup/reinitialize_lockspace.py", line 30, in <module>
    ha_cli.reset_lockspace(force)
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/client/client.py", line 286, in reset_lockspace
    stats = broker.get_stats_from_storage()
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 148, in get_stats_from_storage
    result = self._proxy.get_stats()
  File "/usr/lib64/python3.6/xmlrpc/client.py", line 1112, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib64/python3.6/xmlrpc/client.py", line 1452, in __request
    verbose=self.__verbose
  File "/usr/lib64/python3.6/xmlrpc/client.py", line 1154, in request
    return self.single_request(host, handler, request_body, verbose)
  File "/usr/lib64/python3.6/xmlrpc/client.py", line 1166, in single_request
    http_conn = self.send_request(host, handler, request_body, verbose)
  File "/usr/lib64/python3.6/xmlrpc/client.py", line 1279, in send_request
    self.send_content(connection, request_body)
  File "/usr/lib64/python3.6/xmlrpc/client.py", line 1309, in send_content
    connection.endheaders(request_body)
  File "/usr/lib64/python3.6/http/client.py", line 1268, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib64/python3.6/http/client.py", line 1044, in _send_output
    self.send(msg)
  File "/usr/lib64/python3.6/http/client.py", line 982, in send
    self.connect()
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/unixrpc.py", line 74, in connect
    self.sock.connect(base64.b16decode(self.host))
FileNotFoundError: [Errno 2] No such file or directory
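
The last frame is a plain socket connect, and FileNotFoundError (ENOENT) is what a Unix domain socket connect raises when the socket file does not exist. That would fit the broker not listening at that moment, though the traceback alone does not prove it. A small Python sketch that reproduces the same error against a path that is guaranteed to be missing (the file name "broker.socket" is only a placeholder, not necessarily the real socket path):

```python
import errno
import os
import socket
import tempfile


def try_connect(socket_path):
    """Return None on success, or the errno name if the connect fails."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.connect(socket_path)
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        sock.close()
    return None


def demo_missing_socket():
    """Connect to a socket path inside a fresh, empty temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        # Placeholder name; nothing is listening here, the file does not exist.
        return try_connect(os.path.join(tmp, "broker.socket"))


if __name__ == "__main__":
    print(demo_missing_socket())
```

A socket file that exists but has no listener gives a different error (connection refused), so the errno helps tell "service never started" apart from "service died".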

# grep sanlock /var/log/messages | tail
Jan  6 08:29:48 ovirt2 sanlock[1269]: 2022-01-06 08:29:48 19341 [77108]: s1777 failed to read device to find sector size error -223 /run/vdsm/storage/ca3807b9-5afc-4bcd-a557-aacbcc53c340/39ee18b2-3d7b-4d48-8a0e-3ed7947b5038/d95ae3ee-b6d3-46c4-b6a2-75f96134c7f1
Jan  6 08:29:49 ovirt2 sanlock[1269]: 2022-01-06 08:29:49 19342 [1310]: s1777 add_lockspace fail result -223
Jan  6 08:29:54 ovirt2 sanlock[1269]: 2022-01-06 08:29:54 19347 [77113]: s1778 failed to read device to find sector size error -223 /run/vdsm/storage/ca3807b9-5afc-4bcd-a557-aacbcc53c340/39ee18b2-3d7b-4d48-8a0e-3ed7947b5038/d95ae3ee-b6d3-46c4-b6a2-75f96134c7f1
Jan  6 08:29:55 ovirt2 sanlock[1269]: 2022-01-06 08:29:55 19348 [1310]: s1778 add_lockspace fail result -223
Jan  6 08:30:00 ovirt2 sanlock[1269]: 2022-01-06 08:30:00 19353 [77138]: s1779 failed to read device to find sector size error -223 /run/vdsm/storage/ca3807b9-5afc-4bcd-a557-aacbcc53c340/39ee18b2-3d7b-4d48-8a0e-3ed7947b5038/d95ae3ee-b6d3-46c4-b6a2-75f96134c7f1
Jan  6 08:30:01 ovirt2 sanlock[1269]: 2022-01-06 08:30:01 19354 [1311]: s1779 add_lockspace fail result -223
Jan  6 08:30:06 ovirt2 sanlock[1269]: 2022-01-06 08:30:06 19359 [77144]: s1780 failed to read device to find sector size error -223 /run/vdsm/storage/ca3807b9-5afc-4bcd-a557-aacbcc53c340/39ee18b2-3d7b-4d48-8a0e-3ed7947b5038/d95ae3ee-b6d3-46c4-b6a2-75f96134c7f1
Jan  6 08:30:07 ovirt2 sanlock[1269]: 2022-01-06 08:30:07 19360 [1310]: s1780 add_lockspace fail result -223
Jan  6 08:30:12 ovirt2 sanlock[1269]: 2022-01-06 08:30:12 19365 [77151]: s1781 failed to read device to find sector size error -223 /run/vdsm/storage/ca3807b9-5afc-4bcd-a557-aacbcc53c340/39ee18b2-3d7b-4d48-8a0e-3ed7947b5038/d95ae3ee-b6d3-46c4-b6a2-75f96134c7f1
Jan  6 08:30:13 ovirt2 sanlock[1269]: 2022-01-06 08:30:13 19366 [1310]: s1781 add_lockspace fail result -223
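
To see at a glance whether every retry fails the same way, here is a rough Python sketch that summarizes log lines like the ones above: it counts the add_lockspace result codes and collects the device paths from the "failed to read device" lines. The regular expressions are written against the lines shown here only, and the sketch does not try to interpret what -223 means.

```python
import re
from collections import Counter

# Two of the log lines above, one of each kind.
SAMPLE = """Jan  6 08:29:48 ovirt2 sanlock[1269]: 2022-01-06 08:29:48 19341 [77108]: s1777 failed to read device to find sector size error -223 /run/vdsm/storage/ca3807b9-5afc-4bcd-a557-aacbcc53c340/39ee18b2-3d7b-4d48-8a0e-3ed7947b5038/d95ae3ee-b6d3-46c4-b6a2-75f96134c7f1
Jan  6 08:29:49 ovirt2 sanlock[1269]: 2022-01-06 08:29:49 19342 [1310]: s1777 add_lockspace fail result -223
"""

ADD_FAIL = re.compile(r"\b(s\d+) add_lockspace fail result (-?\d+)")
READ_FAIL = re.compile(
    r"\b(s\d+) failed to read device to find sector size error (-?\d+) (\S+)"
)


def summarize(log_text):
    """Return (Counter of add_lockspace result codes, set of device paths)."""
    results = Counter()
    paths = set()
    for line in log_text.splitlines():
        match = ADD_FAIL.search(line)
        if match:
            results[int(match.group(2))] += 1
            continue
        match = READ_FAIL.search(line)
        if match:
            paths.add(match.group(3))
    return results, paths


if __name__ == "__main__":
    codes, devices = summarize(SAMPLE)
    print(dict(codes))
    for device in sorted(devices):
        print(device)
```

On the full excerpt above, all five attempts end with the same result code and the same path, which is the hosted-engine lockspace path, not the dom_md/ids one.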


# sanlock client status
daemon 5f37f400-b865-11dc-a4f5-2c4d54502372
p -1 helper
p -1 listener
p -1 status
s ca3807b9-5afc-4bcd-a557-aacbcc53c340:1:/rhev/data-center/mnt/glusterSD/ovirt2\:_engine44/ca3807b9-5afc-4bcd-a557-aacbcc53c340/dom_md/ids:0


Could it be related to the sector size of the Gluster brick?

# smartctl -a /dev/sdb | grep  'Sector Sizes'
Sector Sizes:    512 bytes logical, 4096 bytes physical
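
For checking several brick disks, a small Python sketch that parses a "Sector Sizes" line in the form shown above and flags the 512-logical/4096-physical case (512-byte emulation). It only handles this exact wording; other smartctl output forms return None. It says nothing about what sanlock sees through the Gluster mount, so treat it as a data point, not a diagnosis.

```python
import re

# The smartctl line from above.
SAMPLE = "Sector Sizes:    512 bytes logical, 4096 bytes physical"

SECTOR_RE = re.compile(
    r"Sector Sizes:\s+(\d+) bytes logical, (\d+) bytes physical"
)


def parse_sector_sizes(smartctl_line):
    """Return (logical, physical) in bytes, or None if the line does not match."""
    match = SECTOR_RE.search(smartctl_line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def is_512e(sizes):
    """True for a disk that exposes 512-byte logical sectors on 4096-byte physical ones."""
    return sizes == (512, 4096)


if __name__ == "__main__":
    sizes = parse_sector_sizes(SAMPLE)
    print(sizes, is_512e(sizes))
```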


Any hint would be helpful.


Best Regards,
Strahil Nikolov
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/MB2POLUPBLAZ7ORZ45IGWPF5QFMKLY3Y/