
Hi John, thanks for the logs. It seems the engine is running on host2, which decides that it doesn't have the best score and shuts the engine down; after that, neither host wants to start the VM until you restart host2. Unfortunately the logs don't contain the part from host1 from 2014-07-24 09:XX, which I'd like to investigate because it might explain why host1 refused to start the VM when host2 killed it.
Regards, Jirka
On 07/28/2014 02:57 AM, John Gardeniers wrote:
Hi Jiri,
Version: ovirt-hosted-engine-ha-1.1.5-1.el6.noarch
Attached are the logs. Thanks for looking.
Regards, John
On 25/07/14 17:47, Jiri Moskovcak wrote:
On 07/24/2014 11:37 PM, John Gardeniers wrote:
Hi Jiri,
Perhaps you can tell me how to determine the exact version of ovirt-hosted-engine-ha.
CentOS/RHEL/Fedora: rpm -q ovirt-hosted-engine-ha
As for the logs, I am not going to attach 60MB of logs to an email,
- there are other ways to share the logs
nor can I see any imaginable reason for you wanting to see them all, as the bulk is historical. I have already included the *relevant* sections. However, if you think there may be some other section that may help you, feel free to be more explicit about what you are looking for. Right now I fail to understand what you might hope to see in logs from several weeks ago that you can't get from the last day or so.
It's standard practice. People tend to think that they know which part of a log is relevant, but in many cases they are wrong. Asking for the whole logs has proven to be faster than trying to find the relevant part through the user. And you're right, I don't need the logs from last week, just the logs since the last start of the services when you observed the problem.
Regards, Jirka
regards, John
On 24/07/14 19:10, Jiri Moskovcak wrote:
Hi, please provide the exact version of ovirt-hosted-engine-ha and all logs from /var/log/ovirt-hosted-engine-ha/
Thank you, Jirka
On 07/24/2014 01:29 AM, John Gardeniers wrote:
Hi All,
I have created a lab with 2 hypervisors and a self-hosted engine. Today I followed the upgrade instructions as described in http://www.ovirt.org/Hosted_Engine_Howto and rebooted the engine. I didn't really do an upgrade but simply wanted to test what would happen when the engine was rebooted.
When the engine didn't restart I re-ran hosted-engine --set-maintenance=none and restarted the vdsm, ovirt-ha-agent and ovirt-ha-broker services on both nodes. 15 minutes later it still hadn't restarted, so I then tried rebooting both hypervisors. After an hour there was still no sign of the engine starting. The agent logs don't help me much. The following bits are repeated over and over.
ovirt1 (192.168.19.20):
MainThread::INFO::2014-07-24 09:18:40,272::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1406157520.27 type=state_transition detail=EngineDown-EngineDown hostname='ovirt1.om.net'
MainThread::INFO::2014-07-24 09:18:40,272::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (EngineDown-EngineDown) sent? ignored
MainThread::INFO::2014-07-24 09:18:40,594::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineDown (score: 2400)
MainThread::INFO::2014-07-24 09:18:40,594::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Best remote host 192.168.19.21 (id: 2, score: 2400)
ovirt2 (192.168.19.21):
MainThread::INFO::2014-07-24 09:18:04,005::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1406157484.01 type=state_transition detail=EngineDown-EngineDown hostname='ovirt2.om.net'
MainThread::INFO::2014-07-24 09:18:04,006::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (EngineDown-EngineDown) sent? ignored
MainThread::INFO::2014-07-24 09:18:04,324::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineDown (score: 2400)
MainThread::INFO::2014-07-24 09:18:04,324::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Best remote host 192.168.19.20 (id: 1, score: 2400)
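For anyone comparing agent logs from several hosts, a small script can pull out the state and scores so that a tie is easy to spot. The following is only a sketch: the two regular expressions are built from the message wording in the excerpts above, and the function names summarize and is_tied are made up for illustration. It is not part of ovirt-hosted-engine-ha.

```python
import re

# Patterns derived from the two agent messages quoted in the excerpts above.
# The message wording comes from those logs; everything else is illustrative.
LOCAL_RE = re.compile(r"Current state (\w+) \(score: (\d+)\)")
REMOTE_RE = re.compile(r"Best remote host (\S+) \(id: (\d+), score: (\d+)\)")


def summarize(log_text):
    """Return the last reported local state/score and best remote host/score."""
    summary = {}
    for m in LOCAL_RE.finditer(log_text):
        summary["state"] = m.group(1)
        summary["local_score"] = int(m.group(2))
    for m in REMOTE_RE.finditer(log_text):
        summary["remote_host"] = m.group(1)
        summary["remote_score"] = int(m.group(3))
    return summary


def is_tied(summary):
    """True when both scores were found and they are equal."""
    return (
        "local_score" in summary
        and "remote_score" in summary
        and summary["local_score"] == summary["remote_score"]
    )
```

Feeding it the text of agent.log from each host and calling is_tied on the result would flag the situation shown here, where both hosts report EngineDown with a score of 2400.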
From the above information I decided to simply shut down one hypervisor and see what happens. The engine did start back up again a few minutes later.
The interesting part is that each hypervisor seems to think the other is a better host. The two machines are identical, so there's no reason I can see for this odd behaviour. In a lab environment this is little more than an annoying inconvenience. In a production environment it would be completely unacceptable.
May I suggest that this issue be looked into and some means found to eliminate this kind of mutual exclusion? e.g. After a few minutes of such an issue one hypervisor could be randomly given a slightly higher weighting, which should result in it being chosen to start the engine.
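To make the suggestion concrete, here is a minimal sketch of such a tie-break. It is not how ovirt-hosted-engine-ha computes or compares scores; the names TIE_TIMEOUT, effective_score and pick_host, the 180 second timeout and the bonus range are all assumptions chosen for illustration.

```python
import random

# Hypothetical value: how long a tie must last before a bonus is applied.
TIE_TIMEOUT = 180  # seconds, standing in for "a few minutes"


def effective_score(base_score, tied_for, rng=random):
    """Return base_score, plus a small random bonus once a tie has lasted
    at least TIE_TIMEOUT seconds. The bonus range 1..100 is arbitrary."""
    if tied_for < TIE_TIMEOUT:
        return base_score
    return base_score + rng.randint(1, 100)


def pick_host(scores):
    """Return the host with the strictly highest score, or None on a tie."""
    best = max(scores.values())
    winners = [host for host, score in scores.items() if score == best]
    return winners[0] if len(winners) == 1 else None
```

With equal scores pick_host returns None, which corresponds to the stalemate described above. Once each host applies effective_score to its own score, the two bonuses will usually differ and one host wins. Two hosts can still draw the same bonus, so a real implementation would need to retry or fall back to something deterministic such as the host id.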
regards, John
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users