repeating EngineUnexpectedlyDown/EngineDown/EngineStart/EngineStarting

Hi,

I have oVirt 3.5.4 on CentOS 7.1 hosts, and every once in a while one of my hosts starts sending me the 4 engine status messages above about every 10-15 minutes. It looks like the HA broker on the host currently running the engine is having issues (I already tried restarting it once). I've attached a tarball (ovirt-logs.tar.gz) with log snippets for the engine, the host with the active engine VM running, and the complaining host.

(I'll be hanging out in #ovirt all day too, and will respond to questions or suggestions here or there.)

Robert

--
Senior Software Engineer @ Parsons

On Tue, 27 Oct 2015 09:45:28 -0400 Robert wrote:
RS> I have oVirt 3.5.4 on CentOS 7.1 hosts, and every once in a while
RS> one of my hosts starts sending me the 4 engine status messages above
RS> about every 10-15 minutes.

I upgraded the engine and all hosts to 3.5.5, and then 2 hosts started sending me 4 emails every 10-15 minutes. For now I'm running with the engine in global maintenance to keep my inbox from overflowing with these messages.

Any suggestions on how to get this under control would be appreciated.

Robert

Hi Robert,
it seems that two hosts are fighting for the same host ID:

MainThread::INFO::2015-10-27 09:14:56,764::hosted_engine::562::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock) Ensuring lease for lockspace hosted-engine, host id 1 is acquired (file: /var/run/vdsm/storage/2daba0ab-2b3d-4026-bcfc-1cd071c30038/04b08c8e-657f-4bac-9ddf-c9c57373409c/2d7f5020-42c1-442d-8237-fba9d6787080)
MainThread::ERROR::2015-10-27 09:14:56,766::hosted_engine::578::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock) cannot get lock on host id 1: host already holds lock on a different host id
MainThread::ERROR::2015-10-27 09:14:56,767::agent::177::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Error: '(22, 'Sanlock lockspace add failure', 'Invalid argument')' - trying to restart agent
MainThread::WARNING::2015-10-27 09:15:01,772::agent::180::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Restarting agent, attempt '9'
MainThread::ERROR::2015-10-27 09:15:01,772::agent::182::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Too many errors occurred, giving up. Please review the log and consider filing a bug.
MainThread::INFO::2015-10-27 09:15:01,773::agent::121::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down

Can you please share the output of:

    hosted-engine --vm-status
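For anyone following along, here is a minimal sketch for pulling only the ERROR entries out of an agent log like the excerpt quoted in this thread. It assumes the `::`-separated layout seen in that excerpt (thread, level, timestamp, module, line, logger, message). The function name `agent_errors` is made up for illustration, and the log path in the usage comment is an assumption about a default install, not something stated in the thread.

```shell
#!/bin/sh
# Hypothetical helper: read an ovirt-ha-agent log on stdin and print
# "<timestamp> <message>" for every entry whose level field is ERROR.
# Fields are separated by "::"; the message is field 7 onward.
agent_errors() {
    awk -F'::' '
        $2 == "ERROR" {
            msg = $7
            # Re-join the message in case it contains "::" itself.
            for (i = 8; i <= NF; i++) msg = msg "::" $i
            print $3 " " msg
        }
    '
}

# Usage (log path is an assumption, adjust to your installation):
#   agent_errors < /var/log/ovirt-hosted-engine-ha/agent.log
```

Filtering on the level field rather than grepping for the word "ERROR" avoids matching INFO lines that merely mention an error in their message text.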

On Thu, 29 Oct 2015 14:08:22 +0100 Simone wrote:
ST> it seems that two hosts are fighting for the same host ID:
ST> [...]
ST> can you please share the output of: hosted-engine --vm-status

Hi Simone,

thanks for taking the time to look at this. Here is the output:

# hosted-engine --vm-status

!! Cluster is in GLOBAL MAINTENANCE mode !!

--== Host 1 status ==--

Status up-to-date                  : False
Hostname                           : ares.netsec
Host ID                            : 1
Engine status                      : unknown stale-data
Score                              : 2334
Local maintenance                  : False
Host timestamp                     : 2496391
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=2496391 (Tue Oct 27 07:41:00 2015)
    host-id=1
    score=2334
    maintenance=False
    state=EngineUp

--== Host 2 status ==--

Status up-to-date                  : False
Hostname                           : hera.netsec
Host ID                            : 2
Engine status                      : unknown stale-data
Score                              : 1689
Local maintenance                  : False
Host timestamp                     : 2038037
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=2038037 (Mon Oct 26 08:50:13 2015)
    host-id=2
    score=1689
    maintenance=False
    state=EngineDown

--== Host 3 status ==--

Status up-to-date                  : False
Hostname                           : eclipse.netsec
Host ID                            : 3
Engine status                      : unknown stale-data
Score                              : 2000
Local maintenance                  : False
Host timestamp                     : 2298393
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=2298393 (Thu Oct 29 09:46:21 2015)
    host-id=3
    score=2000
    maintenance=False
    state=GlobalMaintenance

--== Host 4 status ==--

Status up-to-date                  : False
Hostname                           : poseidon.netsec
Host ID                            : 4
Engine status                      : unknown stale-data
Score                              : 2000
Local maintenance                  : False
Host timestamp                     : 123241
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=123241 (Thu Oct 29 09:46:30 2015)
    host-id=4
    score=2000
    maintenance=False
    state=GlobalMaintenance

--== Host 5 status ==--

Status up-to-date                  : False
Hostname                           : apollo.netsec
Host ID                            : 5
Engine status                      : unknown stale-data
Score                              : 2000
Local maintenance                  : False
Host timestamp                     : 2028116
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=2028116 (Mon Oct 26 04:14:46 2015)
    host-id=5
    score=2000
    maintenance=False
    state=EngineDown

Robert
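With five hosts the status dump is long, so a rough filter can help spot which hosts report stale data. The sketch below assumes the output has one `Name : value` field per line and that "Status up-to-date" comes before "Hostname" within each host section, as in the output in this thread. `stale_hosts` is a hypothetical helper name, not part of the hosted-engine tooling.

```shell
#!/bin/sh
# Hypothetical helper: read `hosted-engine --vm-status` output on stdin
# and print the hostname of every host whose "Status up-to-date" field
# is False. Assumes one "Name : value" pair per line.
stale_hosts() {
    awk -F' *: *' '
        # Trim leading/trailing blanks; editing $0 re-splits the fields.
        { sub(/^[ \t]+/, ""); sub(/[ \t]+$/, "") }
        $1 == "Status up-to-date" { fresh = $2 }
        $1 == "Hostname" && fresh == "False" { print $2 }
    '
}

# Usage:
#   hosted-engine --vm-status | stale_hosts
```

In the output above every host would be listed, which is consistent with no agent refreshing the shared metadata at that time.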

On Thu, Oct 29, 2015 at 2:52 PM, Robert Story <rstory@tislabs.com> wrote:
> Hi Simone, thanks for taking the time to look at this. Here is the output:
> # hosted-engine --vm-status
> [...]

Here the host IDs seem consistent.
Can you please specify the names of the hosts where you took the logs in your first log archive (complaining host and engine host)?

On Thu, 29 Oct 2015 15:40:23 +0100 Simone wrote:
ST> Here the host IDs seem consistent.
ST> Can you please specify the names of the hosts where you took the logs in
ST> your first log archive (complaining host and engine host)?

Hmm.. I know the complaining host was poseidon, and I'm pretty sure the engine was running on ares.

Robert

On Thu, Oct 29, 2015 at 3:47 PM, Robert Story <rstory@tislabs.com> wrote:
> Hmm.. I know the complaining host was poseidon, and I'm pretty sure the engine was running on ares.

And indeed ares was host 1, so when it failed it was correctly trying to get the lock for host 1, but it seems that it had previously acquired a lock as a different host.
Could you please check

    grep host_id /etc/ovirt-hosted-engine/hosted-engine.conf

on ares and share the vdsm and sanlock logs from that host?

On Thu, 29 Oct 2015 16:00:27 +0100 Simone wrote:
ST> And indeed ares was host 1 so when it failed it was correctly trying to
ST> get lock for host 1 but it seems that previously it acquired a lock as
ST> different host.
ST> Could you please check
ST>   grep host_id /etc/ovirt-hosted-engine/hosted-engine.conf
ST> on ares and share vdsm and sanlock logs from that host?

$ for x in ares hera eclipse poseidon apollo; do echo "* $x"; ssh root@$x grep host_id /etc/ovirt-hosted-engine/hosted-engine.conf 2>/dev/null; done
* ares
host_id=1
* hera
host_id=2
* eclipse
host_id=3
* poseidon
host_id=4
* apollo
host_id=5

Since I've upgraded, I figured I'd reproduce the problem and send new logs. In the process, I noticed that the ha-agent was down on 3 hosts, and the 2 other hosts were the ones generating the messages. So I restarted ha-agent on all 5, disabled global maintenance for 2 minutes, re-enabled it, then ran a grep on all the logs on all 5 hosts for those 2 minutes. I'll send that to you directly, as it's rather large to be sending to the list.

The ha-agent was down again on all 3 hosts where it had been down before, so I'm guessing that's the issue.

Robert
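The per-host check above can be turned into an automatic duplicate test. This is only a sketch: `collect_host_ids` wraps the same ssh/grep loop as in the message above, and `duplicate_host_ids` is a hypothetical helper that expects one `<hostname> host_id=<n>` pair per line on stdin. Both names are invented here, and root ssh access to each host is assumed as in the original command.

```shell
#!/bin/sh
# Hypothetical helper: print "<host> host_id=<n>" for each host given
# as an argument, using the same ssh + grep as the loop in the thread.
collect_host_ids() {
    for h in "$@"; do
        printf '%s %s\n' "$h" \
            "$(ssh "root@$h" grep '^host_id' /etc/ovirt-hosted-engine/hosted-engine.conf 2>/dev/null)"
    done
}

# Hypothetical helper: read "<host> host_id=<n>" lines on stdin and
# report every host_id value that is configured on more than one host.
# Prints nothing when all IDs are unique.
duplicate_host_ids() {
    awk '
        {
            split($2, kv, "=")
            hosts[kv[2]] = hosts[kv[2]] " " $1
            count[kv[2]]++
        }
        END {
            for (id in count)
                if (count[id] > 1) print "host_id " id ":" hosts[id]
        }
    '
}

# Usage:
#   collect_host_ids ares hera eclipse poseidon apollo | duplicate_host_ids
```

For the five hosts in this thread it would print nothing, since the configured IDs 1 through 5 are all distinct. That matches Simone's observation that the configuration looks consistent and the conflict must come from elsewhere.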

On Thu, 29 Oct 2015 12:37:44 -0400 Robert wrote:
RS> All 3 hosts that had down ha-agents were down again, so I'm guessing
RS> that's the issue..

As an experiment, I migrated the engine VM to one of the hosts with a working ha-agent process, and I'm no longer getting these emails. So the next question is: how can I fix the ha-agent on my other three nodes? It seems to be an issue with host_id and sanlock...

Robert
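The thread does not reach a fix, but one possible diagnostic step is to compare the host ID that sanlock currently holds for the `hosted-engine` lockspace with the `host_id` configured on that host. The sketch below is an assumption-laden illustration. It presumes that `sanlock client status` prints lockspace lines of the form `s <lockspace>:<host_id>:<path>:<offset>`, which should be verified against the local sanlock version. `sanlock_host_id` is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: read `sanlock client status` output on stdin and
# print the host id held for the lockspace named in $1.
# ASSUMPTION: lockspace lines look like
#   s <lockspace>:<host_id>:<path>:<offset>
sanlock_host_id() {
    awk -v ls="$1" '
        $1 == "s" {
            split($2, f, ":")
            if (f[1] == ls) print f[2]
        }
    '
}

# Usage on one host (run as root), to compare against
# `grep host_id /etc/ovirt-hosted-engine/hosted-engine.conf`:
#   sanlock client status | sanlock_host_id hosted-engine
```

If the two numbers differ on a host, that would fit the "host already holds lock on a different host id" error quoted earlier in the thread. It would not by itself explain how the host came to hold the other ID.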
participants (2)
- Robert Story
- Simone Tiraboschi