_initialize_sanlock cannot get lock, host already holds lock on a different host id

I'm running oVirt 3.5.x with a hosted engine. On 3 of my 5 nodes, ovirt-ha-agent won't start, complaining that "(_initialize_sanlock) cannot get lock on host id 5: host already holds lock on a different host id."

Running 'grep host_id /etc/ovirt-hosted-engine/hosted-engine.conf' on each host shows that they all have different unique ids, 1-5.

How can I debug this? Is there a command I can run to see which host holds a sanlock for each id?

Robert

--
Senior Software Engineer @ Parsons

Adding Martin

On Thu, Nov 12, 2015 at 2:59 PM, Robert Story <rstory@tislabs.com> wrote:

> I'm running oVirt 3.5.x with a hosted engine. On 3 of my 5 nodes, ovirt-ha-agent won't start, complaining that "(_initialize_sanlock) cannot get lock on host id 5: host already holds lock on a different host id."

It should correctly refuse to start the VM since the lock is already taken. I'm not sure whether the log message is just confusing or whether this is a real issue.

> Running 'grep host_id /etc/ovirt-hosted-engine/hosted-engine.conf' on each host shows that they all have different unique ids, 1-5.
>
> How can I debug this? Is there a command I can run to see which host holds a sanlock for each id?
>
> Robert
>
> --
> Senior Software Engineer @ Parsons
--
Sandro Bonazzola
Better technology. Faster innovation. Powered by community collaboration.
See how it works at redhat.com

On Thu, 12 Nov 2015 15:22:18 +0100 Sandro wrote:

SB> > I'm running oVirt 3.5.x with a hosted engine. On 3 of my 5 nodes,
SB> > ovirt-ha-agent won't start, complaining that "(_initialize_sanlock)
SB> > cannot get lock on host id 5: host already holds lock on a different
SB> > host id."
SB> >
SB> >
SB> It should correctly refuse to start the vm since the lock is already
SB> taken, not sure if the message log is just confusing or a real issue.

Just to clarify, this isn't about a vm. The engine VM is up and I'm not having issues with any other vms. The problem is with the ovirt-ha-agent.

Robert

--
Senior Software Engineer @ Parsons

On Thu, 12 Nov 2015 12:54:49 -0500 Robert wrote:

RS> On Thu, 12 Nov 2015 15:22:18 +0100 Sandro wrote:
RS> SB> > I'm running oVirt 3.5.x with a hosted engine. On 3 of my 5 nodes,
RS> SB> > ovirt-ha-agent won't start, complaining that
RS> SB> > "(_initialize_sanlock) cannot get lock on host id 5: host already
RS> SB> > holds lock on a different host id."
RS> SB> >
RS> SB> >
RS> SB> It should correctly refuse to start the vm since the lock is already
RS> SB> taken, not sure if the message log is just confusing or a real
RS> SB> issue.
RS>
RS> Just to clarify, this isn't about a vm. The engine VM is up and I'm not
RS> having issues with any other vms. The problem is with the
RS> ovirt-ha-agent.

Some additional info. I ran 'sanlock client status -D' on 1 working and 1 non-working host.

host 3 (working):

s hosted-engine:3:/var/run/vdsm/storage/2daba0ab-2b3d-4026-bcfc-1cd071c30038/04b08c8e-657f-4bac-9ddf-c9c57373409c/2d7f5020-42c1-442d-8237-fba9d6787080:0
    list=spaces
    space_id=4
    io_timeout=10
    host_generation=5
    renew_fail=0
    space_dead=0
    killing_pids=0
    used_retries=0
    external_used=0
    used_by_orphans=0
    corrupt_result=0
    acquire_last_result=1
    renewal_last_result=1
    acquire_last_attempt=2178388
    acquire_last_success=2178528
    renewal_last_attempt=3523708
    renewal_last_success=3523708

host 5 (not working):

s hosted-engine:5:/rhev/data-center/mnt/ovirt-nfs.netsec\:_ovirt_hosted-engine/2daba0ab-2b3d-4026-bcfc-1cd071c30038/images/04b08c8e-657f-4bac-9ddf-c9c57373409c/2d7f5020-42c1-442d-8237-fba9d6787080:0
    list=spaces
    space_id=2
    io_timeout=10
    host_generation=17
    renew_fail=0
    space_dead=0
    killing_pids=0
    used_retries=0
    external_used=0
    used_by_orphans=0
    corrupt_result=0
    acquire_last_result=1
    renewal_last_result=1
    acquire_last_attempt=101
    acquire_last_success=241
    renewal_last_attempt=3532404
    renewal_last_success=3532404

And running 'sanlock client host_status -s hosted-engine -D' (on either 3 or 5), info for hosts 3 and 5 is:

3 timestamp 3523933
    last_check=3523954
    last_live=3523954
    last_req=0
    owner_id=3
    owner_generation=5
    timestamp=3523933
    io_timeout=10
    owner_name=53d2cee3-fdd8-4c4c-8265-83328bf729af.eclipse.ne
5 timestamp 3532732
    last_check=3523954
    last_live=3523954
    last_req=0
    owner_id=5
    owner_generation=17
    timestamp=3532732
    io_timeout=10
    owner_name=2c1ec955-4802-4f89-a824-d8a7470c2c9f.apollo.net

Robert

--
Senior Software Engineer @ Parsons
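Editorial aside, not part of the original thread: to compare the lockspace lines from the two hosts field by field, a minimal Python sketch along these lines may help. It assumes the `s NAME:HOST_ID:PATH:OFFSET` layout seen in the 'sanlock client status -D' output quoted in this message; the `Lockspace` type and `parse_lockspace` helper are hypothetical names introduced here for illustration, not part of sanlock or oVirt.

```python
from typing import NamedTuple


class Lockspace(NamedTuple):
    name: str
    host_id: int
    path: str
    offset: int


def parse_lockspace(line: str) -> Lockspace:
    """Parse one 's NAME:HOST_ID:PATH:OFFSET' line (layout assumed from the thread)."""
    text = line.strip()
    if not text.startswith("s "):
        raise ValueError(f"not a lockspace line: {line!r}")
    # The path may itself contain (escaped) colons, so take the first two
    # fields from the left and the offset from the right.
    name, host_id, rest = text[2:].split(":", 2)
    path, offset = rest.rsplit(":", 1)
    return Lockspace(name, int(host_id), path, int(offset))


# The two lockspace lines reported in the thread (host 3 works, host 5 does not).
HOST3 = (
    "s hosted-engine:3:/var/run/vdsm/storage/"
    "2daba0ab-2b3d-4026-bcfc-1cd071c30038/"
    "04b08c8e-657f-4bac-9ddf-c9c57373409c/"
    "2d7f5020-42c1-442d-8237-fba9d6787080:0"
)
HOST5 = (
    r"s hosted-engine:5:/rhev/data-center/mnt/ovirt-nfs.netsec\:_ovirt_hosted-engine/"
    "2daba0ab-2b3d-4026-bcfc-1cd071c30038/images/"
    "04b08c8e-657f-4bac-9ddf-c9c57373409c/"
    "2d7f5020-42c1-442d-8237-fba9d6787080:0"
)

if __name__ == "__main__":
    a, b = parse_lockspace(HOST3), parse_lockspace(HOST5)
    print("same lockspace name:", a.name == b.name)
    print("same lockspace path:", a.path == b.path)
    for ls in (a, b):
        print(ls.host_id, ls.path)
```

Laid out this way, the two hosts report the same lockspace name but reach it through different paths (one under /var/run/vdsm/storage, the other under /rhev/data-center/mnt). Whether that difference is related to the ovirt-ha-agent error is not established anywhere in this thread.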
participants (2)
- Robert Story
- Sandro Bonazzola