On Thu, Oct 29, 2015 at 2:52 PM, Robert Story <rstory(a)tislabs.com> wrote:
On Thu, 29 Oct 2015 14:08:22 +0100 Simone wrote:
ST> it seems that two hosts are fighting for the same host ID:
ST>
ST> MainThread::INFO::2015-10-27 09:14:56,764::hosted_engine::562::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock) Ensuring lease for lockspace hosted-engine, host id 1 is acquired (file: /var/run/vdsm/storage/2daba0ab-2b3d-4026-bcfc-1cd071c30038/04b08c8e-657f-4bac-9ddf-c9c57373409c/2d7f5020-42c1-442d-8237-fba9d6787080)
ST> MainThread::ERROR::2015-10-27 09:14:56,766::hosted_engine::578::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock) cannot get lock on host id 1: host already holds lock on a different host id
ST> MainThread::ERROR::2015-10-27 09:14:56,767::agent::177::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Error: '(22, 'Sanlock lockspace add failure', 'Invalid argument')' - trying to restart agent
ST>
ST> can you please share the output of: hosted-engine --vm-status
Hi Simone, thanks for taking the time to look at this. Here is the output:
# hosted-engine --vm-status
!! Cluster is in GLOBAL MAINTENANCE mode !!
--== Host 1 status ==--
Status up-to-date : False
Hostname : ares.netsec
Host ID : 1
Engine status : unknown stale-data
Score : 2334
Local maintenance : False
Host timestamp : 2496391
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=2496391 (Tue Oct 27 07:41:00 2015)
host-id=1
score=2334
maintenance=False
state=EngineUp
--== Host 2 status ==--
Status up-to-date : False
Hostname : hera.netsec
Host ID : 2
Engine status : unknown stale-data
Score : 1689
Local maintenance : False
Host timestamp : 2038037
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=2038037 (Mon Oct 26 08:50:13 2015)
host-id=2
score=1689
maintenance=False
state=EngineDown
--== Host 3 status ==--
Status up-to-date : False
Hostname : eclipse.netsec
Host ID : 3
Engine status : unknown stale-data
Score : 2000
Local maintenance : False
Host timestamp : 2298393
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=2298393 (Thu Oct 29 09:46:21 2015)
host-id=3
score=2000
maintenance=False
state=GlobalMaintenance
--== Host 4 status ==--
Status up-to-date : False
Hostname : poseidon.netsec
Host ID : 4
Engine status : unknown stale-data
Score : 2000
Local maintenance : False
Host timestamp : 123241
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=123241 (Thu Oct 29 09:46:30 2015)
host-id=4
score=2000
maintenance=False
state=GlobalMaintenance
--== Host 5 status ==--
Status up-to-date : False
Hostname : apollo.netsec
Host ID : 5
Engine status : unknown stale-data
Score : 2000
Local maintenance : False
Host timestamp : 2028116
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=2028116 (Mon Oct 26 04:14:46 2015)
host-id=5
score=2000
maintenance=False
state=EngineDown
Here the host IDs seem coherent.
Can you please specify the names of the hosts the logs in your first
log archive were taken from (the complaining host and the engine host)?
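
For anyone who wants to repeat the coherence check on their own output, here is a minimal sketch in Python. It assumes the text layout shown above ("--== Host N status ==--" banners, a "Host ID" line and a "host-id=" metadata line per host); the function name check_host_ids is made up for this example and is not part of ovirt-hosted-engine-ha. It only inspects the shared metadata as printed by hosted-engine --vm-status, so a clean result does not by itself rule out a host ID conflict in the local configuration of one of the hosts.

```python
import re
from collections import Counter


def check_host_ids(status_text):
    """Return a list of problems found in `hosted-engine --vm-status` text.

    Hypothetical helper: it relies only on the output layout quoted in
    this thread, not on any oVirt API.
    """
    problems = []
    hosts = []  # (hostname, reported Host ID, host-id from the metadata)

    # Split on the per-host banners; the first chunk is the preamble.
    blocks = re.split(r"--== Host \d+ status ==--", status_text)[1:]
    for block in blocks:
        name = re.search(r"^\s*Hostname\s*:\s*(\S+)", block, re.M)
        hid = re.search(r"^\s*Host ID\s*:\s*(\d+)", block, re.M)
        meta = re.search(r"^\s*host-id=(\d+)", block, re.M)
        if not (name and hid and meta):
            problems.append("incomplete host block")
            continue
        hosts.append((name.group(1), int(hid.group(1)), int(meta.group(1))))

    # Each host's reported ID should match the ID stored in its metadata.
    for name, hid, meta in hosts:
        if hid != meta:
            problems.append(f"{name}: Host ID {hid} != metadata host-id {meta}")

    # No two hosts should report the same ID.
    for hid, count in Counter(h[1] for h in hosts).items():
        if count > 1:
            problems.append(f"host id {hid} claimed by {count} hosts")

    return problems
```

Feeding it the saved output, e.g. check_host_ids(open("vm-status.txt").read()) where vm-status.txt is a hypothetical file name, should return an empty list when every ID is unique and matches its metadata.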
Robert
--
Senior Software Engineer @ Parsons