I'm not sure how it happened, but a few hours ago the lockspace for the hosted engine
became corrupted; sanlock reports a -223 error.
Environment: CentOS 8 Stream, with a Ceph iSCSI backend (Reef, 18.2.4) using tcmu-runner.
I managed to format the lockspace after shutting down the HA agents and brokers on all
hosted-engine nodes, but as soon as I start the HA agent on any node, the lockspace is
corrupted again and sanlock goes back to returning the -223 error. So far I have found
no other relevant errors.
Any suggestions on where to investigate next? I have an entire cluster that cannot
start, stop, or migrate VMs, because the entire data center is marked Non Operational.