[ovirt-users] Self-hosted engine won't start

Jiri Moskovcak jmoskovc at redhat.com
Fri Jul 25 07:47:17 UTC 2014


On 07/24/2014 11:37 PM, John Gardeniers wrote:
> Hi Jiri,
>
> Perhaps you can tell me how to determine the exact version of
> ovirt-hosted-engine-ha.

On CentOS/RHEL/Fedora: rpm -q ovirt-hosted-engine-ha

> As for the logs, I am not going to attach 60MB
> of logs to an email,

There are other ways to share the logs than attaching them to an email.

> nor can I see any imaginable reason for you wanting
> to see them all, as the bulk is historical. I have already included the
> *relevant* sections. However, if you think there may be some other
> section that may help you feel free to be more explicit about what you
> are looking for. Right now I fail to understand what you might hope to
> see in logs from several weeks ago that you can't get from the last day
> or so.
>

It's standard practice. People tend to think they know which part of a
log is relevant, but in many cases they are wrong. Asking for the
complete logs has proven faster than trying to locate the relevant part
through the user. And you're right, I don't need the logs from last
week, just the logs since the last start of the services when you
observed the problem.
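
Trimming a log down to "everything since the last service start" could be
sketched in Python roughly as below. This is only an illustration, not part
of ovirt-hosted-engine-ha: the function name lines_since is made up, and the
timestamp pattern is inferred from the agent log excerpts quoted in this
thread. It assumes the date and time sit on one line, as they would in the
log file itself.

```python
import re
from datetime import datetime

# Timestamp layout inferred from the agent log excerpts in this thread, e.g.
# "MainThread::INFO::2014-07-24 09:18:40,272::brokerlink::108::..."
_STAMP = re.compile(r"::(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}),\d+::")


def lines_since(lines, since):
    """Yield log lines belonging to entries stamped at or after `since`.

    `lines_since` is a hypothetical helper name, not an oVirt tool.
    """
    keep = False
    for line in lines:
        match = _STAMP.search(line)
        if match:
            stamp = datetime.strptime(" ".join(match.groups()),
                                      "%Y-%m-%d %H:%M:%S")
            keep = stamp >= since
        # Lines without a timestamp (continuations, tracebacks) follow the
        # decision made for the most recent stamped line.
        if keep:
            yield line
```

Called with an open log file and the time the services were last started,
for example lines_since(open(path), datetime(2014, 7, 24, 9, 0)), it yields
only the recent entries, which is usually a small fraction of the full logs.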

Regards,
Jirka

> regards,
> John
>
>
> On 24/07/14 19:10, Jiri Moskovcak wrote:
>> Hi, please provide the exact versions of ovirt-hosted-engine-ha
>> and all logs from /var/log/ovirt-hosted-engine-ha/
>>
>> Thank you,
>> Jirka
>>
>> On 07/24/2014 01:29 AM, John Gardeniers wrote:
>>> Hi All,
>>>
>>> I have created a lab with 2 hypervisors and a self-hosted engine. Today
>>> I followed the upgrade instructions as described in
>>> http://www.ovirt.org/Hosted_Engine_Howto and rebooted the engine. I
>>> didn't really do an upgrade but simply wanted to test what would happen
>>> when the engine was rebooted.
>>>
>>> When the engine didn't restart I re-ran hosted-engine
>>> --set-maintenance=none and restarted the vdsm, ovirt-ha-agent and
>>> ovirt-ha-broker services on both nodes. 15 minutes later it still hadn't
>>> restarted, so I then tried rebooting both hypervisors. After an hour
>>> there was still no sign of the engine starting. The agent logs don't
>>> help me much. The following bits are repeated over and over.
>>>
>>> ovirt1 (192.168.19.20):
>>>
>>> MainThread::INFO::2014-07-24 09:18:40,272::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1406157520.27 type=state_transition detail=EngineDown-EngineDown hostname='ovirt1.om.net'
>>> MainThread::INFO::2014-07-24 09:18:40,272::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (EngineDown-EngineDown) sent? ignored
>>> MainThread::INFO::2014-07-24 09:18:40,594::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineDown (score: 2400)
>>> MainThread::INFO::2014-07-24 09:18:40,594::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Best remote host 192.168.19.21 (id: 2, score: 2400)
>>>
>>> ovirt2 (192.168.19.21):
>>>
>>> MainThread::INFO::2014-07-24 09:18:04,005::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1406157484.01 type=state_transition detail=EngineDown-EngineDown hostname='ovirt2.om.net'
>>> MainThread::INFO::2014-07-24 09:18:04,006::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (EngineDown-EngineDown) sent? ignored
>>> MainThread::INFO::2014-07-24 09:18:04,324::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineDown (score: 2400)
>>> MainThread::INFO::2014-07-24 09:18:04,324::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Best remote host 192.168.19.20 (id: 1, score: 2400)
>>>
>>> From the above information I decided to simply shut down one hypervisor
>>> and see what happens. The engine did start back up again a few minutes
>>> later.
>>>
>>> The interesting part is that each hypervisor seems to think the other is
>>> a better host. The two machines are identical, so there's no reason I
>>> can see for this odd behaviour. In a lab environment this is little more
>>> than an annoying inconvenience. In a production environment it would be
>>> completely unacceptable.
>>>
>>> May I suggest that this issue be looked into and some means found to
>>> eliminate this kind of mutual exclusion? e.g. After a few minutes of
>>> such an issue one hypervisor could be randomly given a slightly higher
>>> weighting, which should result in it being chosen to start the engine.
>>>
>>> regards,
>>> John
>
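
The tie John describes in the quoted logs (each host in EngineDown with a
local score of 2400 and a best remote score of 2400) and his suggested
random weighting can be illustrated with a small Python sketch. It is
purely hypothetical. should_start is an assumed "start only when strictly
better" rule that matches John's reading of the logs, not the agent's real
decision logic, and break_tie, max_jitter and the host names are invented
for the example. Only the two log message formats come from the excerpts
above.

```python
import random
import re

# Message formats copied from the agent log excerpts quoted in this thread.
LOCAL = re.compile(r"Current state (\w+) \(score: (\d+)\)")
REMOTE = re.compile(r"Best remote host (\S+) \(id: (\d+), score: (\d+)\)")


def parse_scores(text):
    """Return (local_score, best_remote_score) from a log excerpt, or None."""
    local = LOCAL.search(text)
    remote = REMOTE.search(text)
    if not (local and remote):
        return None
    return int(local.group(2)), int(remote.group(3))


def should_start(local_score, remote_score):
    """Assumed rule (not the real agent): start only when strictly better."""
    return local_score > remote_score


def break_tie(scores, rng, max_jitter=10):
    """Give tied leaders a random bonus until exactly one host leads.

    Hypothetical illustration of the "slightly higher weighting" idea.
    """
    adjusted = dict(scores)
    while True:
        best = max(adjusted.values())
        leaders = [host for host, s in adjusted.items() if s == best]
        if len(leaders) == 1:
            return leaders[0], adjusted
        for host in leaders:
            # Redraw from the original score so bonuses do not accumulate.
            adjusted[host] = scores[host] + rng.randint(1, max_jitter)
```

With both hosts at 2400, should_start(2400, 2400) is False on each side, so
under the assumed rule neither host would start the engine.
break_tie({"ovirt1": 2400, "ovirt2": 2400}, random.Random()) then returns
one host whose adjusted score is strictly higher than the other's.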