Hi Jirka,
Thanks for the update. It sounds like the same bug, but with a few extra
issues thrown in. For example, Comment 9 seems to me to be a completely
separate bug, although it may affect the issue I reported.
I can't see any mention of how the problem is being resolved, which is
what I'm interested in, but I'll keep an eye on it.
I'll try the patched version when I get the time and enthusiasm to give
it another crack.
regards,
John
On 14/08/14 22:57, Jiri Moskovcak wrote:
Hi John,
after a deeper look I realized that you're probably facing [1]. The
patch is ready and I will also backport it to the 3.4 branch.
--Jirka
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1093638
On 07/29/2014 11:41 PM, John Gardeniers wrote:
> Hi Jiri,
>
> Sorry, I can't supply the log because the hosts have been recycled but
> I'm sure it would have contained exactly the same information that you
> already have from host2. It's a classic deadlock situation that should
> never be allowed to happen. A simple and time-proven solution was in my
> original post.
>
> The reason for recycling the hosts is that I discovered yesterday that
> although the engine was still running it could not be accessed in any
> way. When I then found that there was no way to get it restarted, I
> decided to abandon the whole idea of self-hosting until such time as I
> see an indication that it's production ready.
>
> regards,
> John
>
>
> On 29/07/14 22:52, Jiri Moskovcak wrote:
>> Hi John,
>> thanks for the logs. It seems like the engine is running on host2,
>> which decides that it doesn't have the best score and shuts the engine
>> down, and then neither of them wants to start the vm until you restart
>> host2. Unfortunately the logs don't contain the part from host1 from
>> 2014-07-24 09:XX which I'd like to investigate, because it might
>> contain information about why host1 refused to start the vm when host2
>> killed it.
>>
>> Regards,
>> Jirka
>>
>> On 07/28/2014 02:57 AM, John Gardeniers wrote:
>>> Hi Jirka,
>>>
>>> Version: ovirt-hosted-engine-ha-1.1.5-1.el6.noarch
>>>
>>> Attached are the logs. Thanks for looking.
>>>
>>> Regards,
>>> John
>>>
>>>
>>> On 25/07/14 17:47, Jiri Moskovcak wrote:
>>>> On 07/24/2014 11:37 PM, John Gardeniers wrote:
>>>>> Hi Jiri,
>>>>>
>>>>> Perhaps you can tell me how to determine the exact version of
>>>>> ovirt-hosted-engine-ha.
>>>>
>>>> CentOS/RHEL/Fedora: rpm -q ovirt-hosted-engine-ha
>>>>
>>>>> As for the logs, I am not going to attach 60MB
>>>>> of logs to an email,
>>>>
>>>> - there are other ways to share the logs
>>>>
>>>>> nor can I see any imaginable reason for you wanting
>>>>> to see them all, as the bulk is historical. I have already included
>>>>> the *relevant* sections. However, if you think there may be some
>>>>> other section that may help, feel free to be more explicit about
>>>>> what you are looking for. Right now I fail to understand what you
>>>>> might hope to see in logs from several weeks ago that you can't get
>>>>> from the last day or so.
>>>>>
>>>>
>>>> It's standard practice: people tend to think they know which part of
>>>> a log is relevant, but in many cases they're wrong. Asking for the
>>>> whole logs has proven to be faster than trying to find the relevant
>>>> part through the user. And you're right, I don't need the logs from
>>>> last week, just the logs since the last start of the services when
>>>> you observed the problem.
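Since Jirka mentioned earlier that "there are other ways to share the logs", one practical option is to bundle the whole log directory into a single compressed archive and upload it to a file-sharing service rather than attaching 60MB to a mail. The helper below is a hypothetical sketch, not part of any oVirt tooling; the directory path and file patterns are assumptions:

```python
# Hypothetical sketch: collect the ovirt-hosted-engine-ha logs into one
# gzip-compressed tarball suitable for uploading somewhere, instead of
# mailing the raw files. Reading /var/log/ovirt-hosted-engine-ha/
# typically requires root.
import tarfile
from pathlib import Path

def bundle_logs(log_dir, archive="ovirt-ha-logs.tar.gz"):
    """Add agent.log, broker.log and their rotations to one archive."""
    with tarfile.open(archive, "w:gz") as tar:
        for log in sorted(Path(log_dir).glob("*.log*")):
            tar.add(log, arcname=log.name)
    return archive

# Usage (path is an assumption):
# bundle_logs("/var/log/ovirt-hosted-engine-ha")
```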
>>>>
>>>> Regards,
>>>> Jirka
>>>>
>>>>> regards,
>>>>> John
>>>>>
>>>>>
>>>>> On 24/07/14 19:10, Jiri Moskovcak wrote:
>>>>>> Hi, please provide the exact versions of ovirt-hosted-engine-ha
>>>>>> and all logs from /var/log/ovirt-hosted-engine-ha/
>>>>>>
>>>>>> Thank you,
>>>>>> Jirka
>>>>>>
>>>>>> On 07/24/2014 01:29 AM, John Gardeniers wrote:
>>>>>>> Hi All,
>>>>>>>
>>>>>>> I have created a lab with 2 hypervisors and a self-hosted engine.
>>>>>>> Today I followed the upgrade instructions as described in
>>>>>>> http://www.ovirt.org/Hosted_Engine_Howto and rebooted the engine.
>>>>>>> I didn't really do an upgrade but simply wanted to test what would
>>>>>>> happen when the engine was rebooted.
>>>>>>>
>>>>>>> When the engine didn't restart I re-ran hosted-engine
>>>>>>> --set-maintenance=none and restarted the vdsm, ovirt-ha-agent and
>>>>>>> ovirt-ha-broker services on both nodes. 15 minutes later it still
>>>>>>> hadn't restarted, so I then tried rebooting both hypervisors.
>>>>>>> After an hour there was still no sign of the engine starting. The
>>>>>>> agent logs don't help me much. The following bits are repeated
>>>>>>> over and over.
>>>>>>>
>>>>>>> ovirt1 (192.168.19.20):
>>>>>>>
>>>>>>> MainThread::INFO::2014-07-24 09:18:40,272::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
>>>>>>> Trying: notify time=1406157520.27 type=state_transition
>>>>>>> detail=EngineDown-EngineDown hostname='ovirt1.om.net'
>>>>>>> MainThread::INFO::2014-07-24 09:18:40,272::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
>>>>>>> Success, was notification of state_transition (EngineDown-EngineDown)
>>>>>>> sent? ignored
>>>>>>> MainThread::INFO::2014-07-24 09:18:40,594::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>>>>>>> Current state EngineDown (score: 2400)
>>>>>>> MainThread::INFO::2014-07-24 09:18:40,594::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>>>>>>> Best remote host 192.168.19.21 (id: 2, score: 2400)
>>>>>>>
>>>>>>> ovirt2 (192.168.19.21):
>>>>>>>
>>>>>>> MainThread::INFO::2014-07-24 09:18:04,005::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
>>>>>>> Trying: notify time=1406157484.01 type=state_transition
>>>>>>> detail=EngineDown-EngineDown hostname='ovirt2.om.net'
>>>>>>> MainThread::INFO::2014-07-24 09:18:04,006::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
>>>>>>> Success, was notification of state_transition (EngineDown-EngineDown)
>>>>>>> sent? ignored
>>>>>>> MainThread::INFO::2014-07-24 09:18:04,324::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>>>>>>> Current state EngineDown (score: 2400)
>>>>>>> MainThread::INFO::2014-07-24 09:18:04,324::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>>>>>>> Best remote host 192.168.19.20 (id: 1, score: 2400)
>>>>>>>
>>>>>>> From the above information I decided to simply shut down one
>>>>>>> hypervisor and see what happens. The engine did start back up
>>>>>>> again a few minutes later.
>>>>>>>
>>>>>>> The interesting part is that each hypervisor seems to think the
>>>>>>> other is a better host. The two machines are identical, so there's
>>>>>>> no reason I can see for this odd behaviour. In a lab environment
>>>>>>> this is little more than an annoying inconvenience. In a
>>>>>>> production environment it would be completely unacceptable.
>>>>>>>
>>>>>>> May I suggest that this issue be looked into and some means found
>>>>>>> to eliminate this kind of mutual exclusion? e.g. after a few
>>>>>>> minutes of such an issue one hypervisor could be randomly given a
>>>>>>> slightly higher weighting, which should result in it being chosen
>>>>>>> to start the engine.
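The tie-break John suggests could look roughly like the sketch below. This is illustrative only, not the actual ovirt-hosted-engine-ha scoring code; the host names, jitter value, and function name are all made up for the example:

```python
import random

def pick_engine_host(scores, jitter=50, rng=random):
    """Tie-break sketch: if every host reports the same score, randomly
    bump one host's score slightly so a single winner emerges, instead
    of each host forever deferring to the 'best remote host'."""
    scores = dict(scores)  # don't mutate the caller's mapping
    if len(scores) > 1 and len(set(scores.values())) == 1:
        lucky = rng.choice(sorted(scores))
        scores[lucky] += jitter
    # Highest score wins; sorted order makes any remaining tie deterministic.
    return max(sorted(scores), key=lambda h: scores[h])

# With the scores from the logs above (both hosts at 2400), one host
# is now nudged ahead and gets to start the engine:
# pick_engine_host({"ovirt1": 2400, "ovirt2": 2400})
```

In practice the jitter would have to be small enough not to override a genuine health difference, which is why the bump only applies when the scores are exactly equal.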
>>>>>>>
>>>>>>> regards,
>>>>>>> John
>>>>>>> _______________________________________________
>>>>>>> Users mailing list
>>>>>>> Users(a)ovirt.org
>>>>>>> http://lists.ovirt.org/mailman/listinfo/users
>>>>>>
>>>>>>
>>>>>>
>>>>>> ______________________________________________________________________
>>>>>> This email has been scanned by the Symantec Email Security.cloud
>>>>>> service. For more information please visit http://www.symanteccloud.com
>>>>>> ______________________________________________________________________
>>>>>
>>>>
>>>>
>>>
>>
>>
>