[ovirt-users] Self-hosted engine won't start
Jiri Moskovcak
jmoskovc at redhat.com
Thu Aug 14 08:57:23 EDT 2014
Hi John,
after a deeper look I realized that you're probably facing [1]. The
patch is ready and I will also backport it to 3.4 branch.
--Jirka
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1093638
On 07/29/2014 11:41 PM, John Gardeniers wrote:
> Hi Jiri,
>
> Sorry, I can't supply the log because the hosts have been recycled but
> I'm sure it would have contained exactly the same information that you
> already have from host2. It's a classic deadlock situation that should
> never be allowed to happen. A simple and time proven solution was in my
> original post.
>
> The reason for recycling the hosts is that I discovered yesterday that
> although the engine was still running it could not be accessed in any
> way. Upon further finding that there was no way to get it restarted I
> decided to abandon the whole idea of self-hosting until such time as I
> see an indication that it's production ready.
>
> regards,
> John
>
>
> On 29/07/14 22:52, Jiri Moskovcak wrote:
>> Hi John,
>> thanks for the logs. Seems like the engine is running on host2 and it
>> decides that it doesn't have the best score and shuts the engine down
>> and then neither of them wants to start the vm until you restart
>> host2. Unfortunately the logs don't contain the part from host1 from
>> 2014-07-24 09:XX which I'd like to investigate because it might
>> contain the information why host1 refused to start the vm when host2
>> killed it.
>>
>> Regards,
>> Jirka
>>
>> On 07/28/2014 02:57 AM, John Gardeniers wrote:
>>> Hi Jiri,
>>>
>>> Version: ovirt-hosted-engine-ha-1.1.5-1.el6.noarch
>>>
>>> Attached are the logs. Thanks for looking.
>>>
>>> Regards,
>>> John
>>>
>>>
>>> On 25/07/14 17:47, Jiri Moskovcak wrote:
>>>> On 07/24/2014 11:37 PM, John Gardeniers wrote:
>>>>> Hi Jiri,
>>>>>
>>>>> Perhaps you can tell me how to determine the exact version of
>>>>> ovirt-hosted-engine-ha.
>>>>
>>>> Centos/RHEL/Fedora: rpm -q ovirt-hosted-engine-ha
>>>>
>>>>> As for the logs, I am not going to attach 60MB
>>>>> of logs to an email,
>>>>
>>>> - there are other ways to share the logs
>>>>
>>>>> nor can I see any imaginable reason for you wanting
>>>>> to see them all, as the bulk is historical. I have already included
>>>>> the
>>>>> *relevant* sections. However, if you think there may be some other
>>>>> section that may help you feel free to be more explicit about what you
>>>>> are looking for. Right now I fail to understand what you might hope to
>>>>> see in logs from several weeks ago that you can't get from the last
>>>>> day
>>>>> or so.
>>>>>
>>>>
>>>> It's standard practice: people tend to think that they know what is a
>>>> relevant part of a log, but in many cases they fail. Asking for the
>>>> whole logs has proven to be faster than trying to find the relevant
>>>> part through the user. And you're right, I don't need the logs from
>>>> last week, just logs since the last start of the services when you
>>>> observed the problem.
>>>>
>>>> Regards,
>>>> Jirka
>>>>
>>>>> regards,
>>>>> John
>>>>>
>>>>>
>>>>> On 24/07/14 19:10, Jiri Moskovcak wrote:
>>>>>> Hi, please provide the exact versions of ovirt-hosted-engine-ha
>>>>>> and all logs from /var/log/ovirt-hosted-engine-ha/
>>>>>>
>>>>>> Thank you,
>>>>>> Jirka
>>>>>>
>>>>>> On 07/24/2014 01:29 AM, John Gardeniers wrote:
>>>>>>> Hi All,
>>>>>>>
>>>>>>> I have created a lab with 2 hypervisors and a self-hosted engine.
>>>>>>> Today
>>>>>>> I followed the upgrade instructions as described in
>>>>>>> http://www.ovirt.org/Hosted_Engine_Howto and rebooted the engine. I
>>>>>>> didn't really do an upgrade but simply wanted to test what would
>>>>>>> happen
>>>>>>> when the engine was rebooted.
>>>>>>>
>>>>>>> When the engine didn't restart I re-ran hosted-engine
>>>>>>> --set-maintenance=none and restarted the vdsm, ovirt-ha-agent and
>>>>>>> ovirt-ha-broker services on both nodes. 15 minutes later it still
>>>>>>> hadn't
>>>>>>> restarted, so I then tried rebooting both hypervisors. After an hour
>>>>>>> there was still no sign of the engine starting. The agent logs don't
>>>>>>> help me much. The following bits are repeated over and over.
>>>>>>>
>>>>>>> ovirt1 (192.168.19.20):
>>>>>>>
>>>>>>> MainThread::INFO::2014-07-24 09:18:40,272::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1406157520.27 type=state_transition detail=EngineDown-EngineDown hostname='ovirt1.om.net'
>>>>>>> MainThread::INFO::2014-07-24 09:18:40,272::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (EngineDown-EngineDown) sent? ignored
>>>>>>> MainThread::INFO::2014-07-24 09:18:40,594::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineDown (score: 2400)
>>>>>>> MainThread::INFO::2014-07-24 09:18:40,594::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Best remote host 192.168.19.21 (id: 2, score: 2400)
>>>>>>>
>>>>>>> ovirt2 (192.168.19.21):
>>>>>>>
>>>>>>> MainThread::INFO::2014-07-24 09:18:04,005::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1406157484.01 type=state_transition detail=EngineDown-EngineDown hostname='ovirt2.om.net'
>>>>>>> MainThread::INFO::2014-07-24 09:18:04,006::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (EngineDown-EngineDown) sent? ignored
>>>>>>> MainThread::INFO::2014-07-24 09:18:04,324::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineDown (score: 2400)
>>>>>>> MainThread::INFO::2014-07-24 09:18:04,324::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Best remote host 192.168.19.20 (id: 1, score: 2400)
>>>>>>>
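
The repeated agent messages quoted above can be condensed per host with a short script. This is a minimal sketch, assuming each log record sits on a single line in the form shown in the excerpt. The names `summarise` and `is_tied` and both regular expressions are illustrative only and are not part of ovirt-hosted-engine-ha.

```python
import re

# Patterns for the two start_monitoring messages quoted in the excerpt.
LOCAL_RE = re.compile(r"Current state (\w+) \(score: (\d+)\)")
REMOTE_RE = re.compile(r"Best remote host (\S+) \(id: (\d+), score: (\d+)\)")


def summarise(lines):
    """Return the last local state/score and best remote host seen in lines."""
    summary = {}
    for line in lines:
        local = LOCAL_RE.search(line)
        if local:
            summary["state"] = local.group(1)
            summary["score"] = int(local.group(2))
        remote = REMOTE_RE.search(line)
        if remote:
            summary["remote_host"] = remote.group(1)
            summary["remote_id"] = int(remote.group(2))
            summary["remote_score"] = int(remote.group(3))
    return summary


def is_tied(summary):
    """True when a local score was seen and it equals the best remote score."""
    return "score" in summary and summary.get("score") == summary.get("remote_score")


if __name__ == "__main__":
    # Two records taken from the ovirt1 excerpt, joined onto single lines.
    sample = [
        "MainThread::INFO::2014-07-24 09:18:40,594::hosted_engine::327::"
        "ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::"
        "(start_monitoring) Current state EngineDown (score: 2400)",
        "MainThread::INFO::2014-07-24 09:18:40,594::hosted_engine::332::"
        "ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::"
        "(start_monitoring) Best remote host 192.168.19.21 (id: 2, score: 2400)",
    ]
    result = summarise(sample)
    print(result, is_tied(result))
```

Running it against the agent log of each host shows at a glance whether both hosts report EngineDown with equal scores, which is the situation described in this thread.
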
>>>>>>> From the above information I decided to simply shut down one
>>>>>>> hypervisor
>>>>>>> and see what happens. The engine did start back up again a few
>>>>>>> minutes
>>>>>>> later.
>>>>>>>
>>>>>>> The interesting part is that each hypervisor seems to think the
>>>>>>> other is
>>>>>>> a better host. The two machines are identical, so there's no
>>>>>>> reason I
>>>>>>> can see for this odd behaviour. In a lab environment this is little
>>>>>>> more
>>>>>>> than an annoying inconvenience. In a production environment it
>>>>>>> would be
>>>>>>> completely unacceptable.
>>>>>>>
>>>>>>> May I suggest that this issue be looked into and some means found to
>>>>>>> eliminate this kind of mutual exclusion? e.g. After a few minutes of
>>>>>>> such an issue one hypervisor could be randomly given a slightly
>>>>>>> higher
>>>>>>> weighting, which should result in it being chosen to start the
>>>>>>> engine.
>>>>>>>
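
The tie-breaking idea suggested above can be sketched as follows. This only illustrates the proposal. It is not the logic used by ovirt-hosted-engine-ha, nor the fix tracked in the bug referenced at the top of this thread. `Host`, `should_start_engine`, `jittered_score`, and the jitter range are hypothetical names and values. Alongside the random weighting, the sketch shows a deterministic variant that falls back on the lower host id when scores are equal.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    host_id: int  # hypothetical: the numeric id shown in the agent log
    score: int    # hypothetical: the score shown in the agent log


def should_start_engine(local, remote):
    """Deterministic tie-break: on equal scores the lower host id wins."""
    if local.score != remote.score:
        return local.score > remote.score
    return local.host_id < remote.host_id


def jittered_score(host, rng, max_jitter=10):
    """Randomised variant: add a small random bonus so ties rarely persist."""
    return host.score + rng.randint(0, max_jitter)


if __name__ == "__main__":
    ovirt1 = Host(host_id=1, score=2400)
    ovirt2 = Host(host_id=2, score=2400)
    # With equal scores, exactly one of the two hosts decides to start.
    print(should_start_engine(ovirt1, ovirt2))
    print(should_start_engine(ovirt2, ovirt1))
    print(jittered_score(ovirt1, random.Random()))
```

The deterministic rule guarantees that exactly one of two tied hosts starts the engine, whereas the random bonus can still produce an occasional tie that only resolves on a later monitoring cycle.
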
>>>>>>> regards,
>>>>>>> John
>>>>>>> _______________________________________________
>>>>>>> Users mailing list
>>>>>>> Users at ovirt.org
>>>>>>> http://lists.ovirt.org/mailman/listinfo/users
>>>>>>>
>>>>>>
>>>>>>
>>>>>> ______________________________________________________________________
>>>>>>
>>>>>> This email has been scanned by the Symantec Email Security.cloud
>>>>>> service.
>>>>>> For more information please visit http://www.symanteccloud.com
>>>>>> ______________________________________________________________________
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
>