Hi John,
this is the patch fixing your problem [1]. It can be found at the top of
that bz page. It's really a simple change, so if you want you can just
change it manually on your system without waiting for a patched version.
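
For example, roughly like this (the path and -p level below are just
placeholders, they depend on how the patch was generated):

  cd /usr/lib/python2.6/site-packages
  patch -p1 --dry-run < the-fix.patch   # check it applies cleanly first
  patch -p1 < the-fix.patch
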
--Jirka
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1093638
Hi Jirka,
Thanks for the update. It sounds like the same bug but with a few extra
issues thrown in; e.g. Comment 9 seems to me to be a completely separate
bug, although it may affect the issue I reported.
I can't see any mention of how the problem is being resolved, which I am
interested in, but I'll keep an eye on it.
I'll try the patched version when I get the time and enthusiasm to give
it another crack.
regards,
John
On 14/08/14 22:57, Jiri Moskovcak wrote:
> Hi John,
> after a deeper look I realized that you're probably facing [1]. The
> patch is ready and I will also backport it to 3.4 branch.
>
> --Jirka
>
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1093638
>
> On 07/29/2014 11:41 PM, John Gardeniers wrote:
>> Hi Jiri,
>>
>> Sorry, I can't supply the log because the hosts have been recycled,
>> but I'm sure it would have contained exactly the same information that
>> you already have from host2. It's a classic deadlock situation that
>> should never be allowed to happen. A simple and time-proven solution
>> was in my original post.
>>
>> The reason for recycling the hosts is that I discovered yesterday
>> that, although the engine was still running, it could not be accessed
>> in any way. When I then found there was no way to get it restarted, I
>> decided to abandon the whole idea of self-hosting until such time as I
>> see an indication that it's production ready.
>>
>> regards,
>> John
>>
>>
>> On 29/07/14 22:52, Jiri Moskovcak wrote:
>>> Hi John,
>>> thanks for the logs. It seems the engine is running on host2, which
>>> decides that it doesn't have the best score and shuts the engine
>>> down, and then neither of them wants to start the vm until you
>>> restart host2. Unfortunately the logs don't contain the part from
>>> host1 from 2014-07-24 09:XX which I'd like to investigate, because it
>>> might contain the information why host1 refused to start the vm when
>>> host2 killed it.
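>>>
>>> If it happens again, you can watch how each host scores itself and
>>> its peer by running this on both hosts:
>>>
>>>   hosted-engine --vm-status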
>>>
>>> Regards,
>>> Jirka
>>>
>>> On 07/28/2014 02:57 AM, John Gardeniers wrote:
>>>> Hi Jirka,
>>>>
>>>> Version: ovirt-hosted-engine-ha-1.1.5-1.el6.noarch
>>>>
>>>> Attached are the logs. Thanks for looking.
>>>>
>>>> Regards,
>>>> John
>>>>
>>>>
>>>> On 25/07/14 17:47, Jiri Moskovcak wrote:
>>>>> On 07/24/2014 11:37 PM, John Gardeniers wrote:
>>>>>> Hi Jiri,
>>>>>>
>>>>>> Perhaps you can tell me how to determine the exact version of
>>>>>> ovirt-hosted-engine-ha.
>>>>>
>>>>> CentOS/RHEL/Fedora: rpm -q ovirt-hosted-engine-ha
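>>>>> (this prints the full package name-version-release, e.g.
>>>>> ovirt-hosted-engine-ha-1.1.5-1.el6.noarch)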
>>>>>
>>>>>> As for the logs, I am not going to attach 60MB
>>>>>> of logs to an email,
>>>>>
>>>>> - there are other ways to share the logs
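>>>>>
>>>>> e.g. compress the whole directory (the tarball name is just an
>>>>> example) and upload it anywhere you can link to:
>>>>>
>>>>>   tar czf ovirt-ha-logs.tar.gz /var/log/ovirt-hosted-engine-ha/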
>>>>>
>>>>>> nor can I see any imaginable reason for you wanting to see them
>>>>>> all, as the bulk is historical. I have already included the
>>>>>> *relevant* sections. However, if you think there may be some other
>>>>>> section that may help you, feel free to be more explicit about
>>>>>> what you are looking for. Right now I fail to understand what you
>>>>>> might hope to see in logs from several weeks ago that you can't
>>>>>> get from the last day or so.
>>>>>>
>>>>>
>>>>> It's standard practice: people tend to think that they know which
>>>>> part of a log is relevant, but in many cases they're wrong. Asking
>>>>> for the whole logs has proven to be faster than trying to find the
>>>>> relevant part through the user. And you're right, I don't need the
>>>>> logs from last week, just the logs since the last start of the
>>>>> services when you observed the problem.
>>>>>
>>>>> Regards,
>>>>> Jirka
>>>>>
>>>>>> regards,
>>>>>> John
>>>>>>
>>>>>>
>>>>>> On 24/07/14 19:10, Jiri Moskovcak wrote:
>>>>>>> Hi, please provide the exact version of ovirt-hosted-engine-ha
>>>>>>> and all logs from /var/log/ovirt-hosted-engine-ha/
>>>>>>>
>>>>>>> Thank you,
>>>>>>> Jirka
>>>>>>>
>>>>>>> On 07/24/2014 01:29 AM, John Gardeniers wrote:
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> I have created a lab with 2 hypervisors and a self-hosted
>>>>>>>> engine. Today I followed the upgrade instructions as described in
>>>>>>>> http://www.ovirt.org/Hosted_Engine_Howto and rebooted the engine.
>>>>>>>> I didn't really do an upgrade but simply wanted to test what
>>>>>>>> would happen when the engine was rebooted.
>>>>>>>>
>>>>>>>> When the engine didn't restart I re-ran hosted-engine
>>>>>>>> --set-maintenance=none and restarted the vdsm, ovirt-ha-agent and
>>>>>>>> ovirt-ha-broker services on both nodes. 15 minutes later it still
>>>>>>>> hadn't restarted, so I then tried rebooting both hypervisors.
>>>>>>>> After an hour there was still no sign of the engine starting. The
>>>>>>>> agent logs don't help me much. The following bits are repeated
>>>>>>>> over and over.
>>>>>>>> ovirt1 (192.168.19.20):
>>>>>>>>
>>>>>>>> MainThread::INFO::2014-07-24 09:18:40,272::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
>>>>>>>> Trying: notify time=1406157520.27 type=state_transition detail=EngineDown-EngineDown hostname='ovirt1.om.net'
>>>>>>>> MainThread::INFO::2014-07-24 09:18:40,272::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
>>>>>>>> Success, was notification of state_transition (EngineDown-EngineDown) sent? ignored
>>>>>>>> MainThread::INFO::2014-07-24 09:18:40,594::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>>>>>>>> Current state EngineDown (score: 2400)
>>>>>>>> MainThread::INFO::2014-07-24 09:18:40,594::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>>>>>>>> Best remote host 192.168.19.21 (id: 2, score: 2400)
>>>>>>>>
>>>>>>>> ovirt2 (192.168.19.21):
>>>>>>>>
>>>>>>>> MainThread::INFO::2014-07-24 09:18:04,005::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
>>>>>>>> Trying: notify time=1406157484.01 type=state_transition detail=EngineDown-EngineDown hostname='ovirt2.om.net'
>>>>>>>> MainThread::INFO::2014-07-24 09:18:04,006::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
>>>>>>>> Success, was notification of state_transition (EngineDown-EngineDown) sent? ignored
>>>>>>>> MainThread::INFO::2014-07-24 09:18:04,324::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>>>>>>>> Current state EngineDown (score: 2400)
>>>>>>>> MainThread::INFO::2014-07-24 09:18:04,324::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>>>>>>>> Best remote host 192.168.19.20 (id: 1, score: 2400)
>>>>>>>>
>>>>>>>> From the above information I decided to simply shut down one
>>>>>>>> hypervisor and see what happens. The engine did start back up
>>>>>>>> again a few minutes later.
>>>>>>>>
>>>>>>>> The interesting part is that each hypervisor seems to think the
>>>>>>>> other is a better host. The two machines are identical, so
>>>>>>>> there's no reason I can see for this odd behaviour. In a lab
>>>>>>>> environment this is little more than an annoying inconvenience.
>>>>>>>> In a production environment it would be completely unacceptable.
>>>>>>>>
>>>>>>>> May I suggest that this issue be looked into and some means
>>>>>>>> found to eliminate this kind of deadlock? e.g. after a few
>>>>>>>> minutes of such a standoff one hypervisor could be randomly given
>>>>>>>> a slightly higher weighting, which should result in it being
>>>>>>>> chosen to start the engine.
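>>>>>>>>
>>>>>>>> Purely as an illustration of the idea (a sketch, not the agent's
>>>>>>>> actual code):
>>>>>>>>
>>>>>>>>   # if the scores have been tied for a few minutes, add a small
>>>>>>>>   # random bump so one host is likely to win the next election
>>>>>>>>   if [ "$my_score" -eq "$remote_score" ] && [ "$standoff_min" -ge 5 ]; then
>>>>>>>>       my_score=$((my_score + RANDOM % 10 + 1))
>>>>>>>>   fi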
>>>>>>>>
>>>>>>>> regards,
>>>>>>>> John
>>>>>>>> _______________________________________________
>>>>>>>> Users mailing list
>>>>>>>> Users@ovirt.org
>>>>>>>> http://lists.ovirt.org/mailman/listinfo/users
>>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>>
>>
>