[ovirt-users] [hosted-engine-ha] restart-loop

Daniel Helgenberger daniel.helgenberger at m-box.de
Thu Oct 2 11:43:46 UTC 2014


On 02.10.2014 09:51, Jiri Moskovcak wrote:
> On 10/01/2014 02:39 PM, Daniel Helgenberger wrote:
>> On 01.10.2014 13:33, Jiri Moskovcak wrote:
>>> On 10/01/2014 01:17 PM, Daniel Helgenberger wrote:
>>>> Hello Jirka,
>>>> On 01.10.2014 09:10, Jiri Moskovcak wrote:
>>>>> Hi Daniel,
>>>>> from the logs it seems like you ran into [1]. It should be fixed in
>>>>> ovirt-hosted-engine-ha-1.1.5 (part of oVirt 3.4.2).
>>>> I am running 3.4.4, and according to hosted-engine --vm-status both hosts had a
>>>> score of 2400...
>>> It doesn't seem like it from the logs. I can see the transition from
>>> EngineStart to EngineUp and directly to EngineUpBadHealth; if you have
>>> the latest version, it should go to EngineStarting before
>>> EngineUp. Are you sure you restarted the services (broker and agent)
>>> after the update? Please provide the output of rpm -q ovirt-hosted-engine-ha.
>> here you go:
>> rpm -q ovirt-hosted-engine-ha
>> ovirt-hosted-engine-ha-1.1.5-1.el6.noarch
>>
>>
>> Also, I upgraded to 3.4.3 prior to 3.4.4. I cannot recall whether I
>> restarted ovirt-ha-agent, but it is highly likely. Here are the system
>> reboots after kernel updates:
>> reboot   system boot  2.6.32-431.29.2. Tue Sep 30 21:46 - 14:36  (16:50)
>> reboot   system boot  2.6.32-431.29.2. Mon Sep 29 12:19 - 21:44 (1+09:24)
>> reboot   system boot  2.6.32-431.29.2. Fri Sep 12 08:47 - 12:17 (17+03:30)
>> reboot   system boot  2.6.32-431.20.3. Mon Sep  1 17:48 - 08:44 (10+14:56)
> OK, so just to be 100% sure, please check the version on both hosts (it
> should be >= 1.1.5), restart the broker and agent, and then try to
> reproduce the problem. I went through the code in 1.1.5 and I don't see any
> code path which could take the agent from EngineStart to EngineUp
> without going through the EngineStarting state; that was the behavior
> prior to 1.1.5.
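A minimal Python sketch of the two checks discussed above: whether the installed package is at least 1.1.5, and whether a sequence of agent states shows the pre-1.1.5 jump from EngineStart straight to EngineUp. It assumes the `rpm -q` output format shown earlier in the thread and assumes the agent states have already been pulled out of agent.log as a plain list of names (that extraction is not shown). `parse_rpm_version`, `version_ok`, and `skips_engine_starting` are hypothetical helpers, not part of ovirt-hosted-engine-ha.

```python
import re

# Minimum package version that, according to this thread, contains the fix.
MIN_VERSION = (1, 1, 5)

def parse_rpm_version(rpm_q_output: str) -> tuple:
    """Extract (major, minor, patch) from e.g. 'ovirt-hosted-engine-ha-1.1.5-1.el6.noarch'."""
    match = re.match(r"ovirt-hosted-engine-ha-(\d+)\.(\d+)\.(\d+)-", rpm_q_output.strip())
    if match is None:
        raise ValueError(f"unexpected rpm -q output: {rpm_q_output!r}")
    return tuple(int(part) for part in match.groups())

def version_ok(rpm_q_output: str) -> bool:
    """True if the installed package is at least MIN_VERSION (element-wise tuple comparison)."""
    return parse_rpm_version(rpm_q_output) >= MIN_VERSION

def skips_engine_starting(states: list) -> bool:
    """True if any consecutive pair is EngineStart -> EngineUp, i.e. EngineStarting was skipped."""
    return any(prev == "EngineStart" and cur == "EngineUp"
               for prev, cur in zip(states, states[1:]))

if __name__ == "__main__":
    print(version_ok("ovirt-hosted-engine-ha-1.1.5-1.el6.noarch"))  # True
    print(skips_engine_starting(["EngineStart", "EngineUp", "EngineUpBadHealth"]))  # True
    print(skips_engine_starting(["EngineStart", "EngineStarting", "EngineUp"]))  # False
```

Comparing integer tuples rather than raw strings keeps versions such as 1.1.10 ordered correctly.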
Hi Jirka,
Sadly, I cannot reproduce this at the moment because yesterday I upgraded to
ovirt-hosted-engine-ha-1.1.6-1.el6.noarch (but at least I did restart
everything). The upgrade left HA inoperable: one of my HA
hosts quits with: Exception: Failed to start monitoring domain
(sd_uuid=bcfa7ec4-5278-44d8-9f31-682f2d9de91d, host_id=1): timeout
during domain acquisition

Many of my issues might be caused by changes I made while resolving
BZ1147148 (which should be reverted by now). I will try to downgrade and, if I
get HA working again, try to reproduce this.

Cheers
>
> Regards,
> Jirka
>
>>> Thanks,
>>> Jirka
>>>
>>>>> --Jirka
>>>>>
>>>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1093366
>>>>>
>>>>> On 09/27/2014 12:40 PM, Daniel Helgenberger wrote:
>>>>>> Hello,
>>>>>>
>>>>>> before filing a BZ against the 3.4 branch, I wanted to get some input on the
>>>>>> following issue:
>>>>>>
>>>>>> Steps (root shell on one of the engine-ha hosts, using the hosted-engine command):
>>>>>> 1. set global maintenance
>>>>>> 2. shut down the hosted-engine VM
>>>>>> (do some work)
>>>>>> 3. disable global maintenance
>>>>>>
>>>>>> Result: my engine was started and immediately powered down again, in a loop.
>>>>>> I could only break this manually with:
>>>>>> 1. enable global maintenance again
>>>>>> 2. start the engine
>>>>>> 3. disable global maintenance
>>>>>>
>>>>>> I attached the hosts' engine-ha broker and agent logs from
>>>>>> today 12:00 to 12:27, right after I 'fixed' this.
>>>>>> Note: the engine was started on nodehv02 automatically after I disabled
>>>>>> global maintenance at about 12:05.
>>>>>>
>>>>>> Thanks
>
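For reference, the manual break-out sequence from the original report (enable global maintenance, start the engine VM, disable global maintenance) could be scripted roughly as below. This is a hedged sketch, not a fix for the underlying bug: it assumes the `hosted-engine` CLI accepts `--set-maintenance --mode=global`, `--set-maintenance --mode=none` and `--vm-start` as described in the oVirt 3.4 hosted-engine documentation (verify with `hosted-engine --help` on your hosts). `break_restart_loop` and its `pause_s` parameter are hypothetical.

```python
import subprocess
import time

# Assumed hosted-engine CLI invocations (oVirt 3.4); check `hosted-engine --help`.
WORKAROUND_STEPS = [
    ["hosted-engine", "--set-maintenance", "--mode=global"],  # 1. enable global maintenance again
    ["hosted-engine", "--vm-start"],                          # 2. start the engine VM manually
    ["hosted-engine", "--set-maintenance", "--mode=none"],    # 3. disable global maintenance
]

def break_restart_loop(dry_run: bool = True, pause_s: float = 0.0) -> list:
    """Run the workaround commands in order, or with dry_run=True only return them.

    pause_s is a hypothetical delay after each step (e.g. to let the engine VM
    come up before maintenance is lifted); the original report specifies none.
    """
    executed = []
    for cmd in WORKAROUND_STEPS:
        executed.append(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError on failure, so later steps are skipped.
            subprocess.run(cmd, check=True)
            time.sleep(pause_s)
    return executed

if __name__ == "__main__":
    for line in break_restart_loop(dry_run=True):
        print(line)
```

Defaulting to a dry run lets you review the exact commands before anything touches the engine VM.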

-- 
Daniel Helgenberger
m box bewegtbild GmbH

P: +49/30/2408781-22
F: +49/30/2408781-10

ACKERSTR. 19
D-10115 BERLIN


www.m-box.de  www.monkeymen.tv

Managing Directors: Martin Retschitzegger / Michaela Göllner
Commercial Register: Amtsgericht Charlottenburg / HRB 112767