[ovirt-users] hosted engine health check issues

René Koch rkoch at linuxland.at
Thu Apr 17 08:01:47 UTC 2014


On 04/17/2014 09:40 AM, Jiri Moskovcak wrote:
> On 04/17/2014 09:34 AM, René Koch wrote:
>> On 04/15/2014 04:53 PM, Jiri Moskovcak wrote:
>>> On 04/14/2014 10:50 AM, René Koch wrote:
>>>> Hi,
>>>>
>>>> I have some issues with the hosted engine status.
>>>>
>>>> oVirt hosts think that the hosted engine is down because it seems
>>>> that the hosts can't write to hosted-engine.lockspace due to
>>>> glusterfs issues (or at least I think so).
>>>>
>>>> Here's the output of vm-status:
>>>>
>>>> # hosted-engine --vm-status
>>>>
>>>>
>>>> --== Host 1 status ==--
>>>>
>>>> Status up-to-date                  : False
>>>> Hostname                           : 10.0.200.102
>>>> Host ID                            : 1
>>>> Engine status                      : unknown stale-data
>>>> Score                              : 2400
>>>> Local maintenance                  : False
>>>> Host timestamp                     : 1397035677
>>>> Extra metadata (valid at timestamp):
>>>>      metadata_parse_version=1
>>>>      metadata_feature_version=1
>>>>      timestamp=1397035677 (Wed Apr  9 11:27:57 2014)
>>>>      host-id=1
>>>>      score=2400
>>>>      maintenance=False
>>>>      state=EngineUp
>>>>
>>>>
>>>> --== Host 2 status ==--
>>>>
>>>> Status up-to-date                  : True
>>>> Hostname                           : 10.0.200.101
>>>> Host ID                            : 2
>>>> Engine status                      : {'reason': 'vm not running on this host', 'health': 'bad', 'vm': 'down', 'detail': 'unknown'}
>>>> Score                              : 0
>>>> Local maintenance                  : False
>>>> Host timestamp                     : 1397464031
>>>> Extra metadata (valid at timestamp):
>>>>      metadata_parse_version=1
>>>>      metadata_feature_version=1
>>>>      timestamp=1397464031 (Mon Apr 14 10:27:11 2014)
>>>>      host-id=2
>>>>      score=0
>>>>      maintenance=False
>>>>      state=EngineUnexpectedlyDown
>>>>      timeout=Mon Apr 14 10:35:05 2014
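Comparing the two host timestamps shows how old Host 1's "stale-data" really is. This is a small Python sketch of the arithmetic, using only the epoch values from the output above; it makes no assumption about the threshold hosted-engine uses to mark data as stale.

```python
from datetime import datetime, timezone

# Host timestamps copied from the --vm-status output (Unix epoch seconds).
HOST1_TS = 1397035677  # Host 1: "Status up-to-date : False"
HOST2_TS = 1397464031  # Host 2: "Status up-to-date : True"


def split_seconds(seconds):
    """Split a duration in seconds into (days, hours, minutes, seconds)."""
    days, rest = divmod(seconds, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, secs = divmod(rest, 60)
    return days, hours, minutes, secs


gap = HOST2_TS - HOST1_TS
print(gap, split_seconds(gap))  # 428354 (4, 22, 59, 14)
print(datetime.fromtimestamp(HOST1_TS, tz=timezone.utc).isoformat())
# 2014-04-09T09:27:57+00:00
```

So Host 1's metadata was last written almost five days before Host 2's. The UTC time 09:27:57 matches the 11:27:57 printed by --vm-status if the hosts run at UTC+2 (presumably CEST).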
>>>>
>>>> oVirt engine is sending me 2 emails every 10 minutes with the
>>>> following subjects:
>>>> - ovirt-hosted-engine state transition EngineDown-EngineStart
>>>> - ovirt-hosted-engine state transition EngineStart-EngineUp
>>>>
>>>> In oVirt webadmin I can see the following message:
>>>> VM HostedEngine is down. Exit message: internal error Failed to acquire lock: error -243.
>>>>
>>>> These messages are really annoying, as oVirt isn't actually doing
>>>> anything with the hosted engine: my engine VM has an uptime of 9 days.
>>>>
>>>> So my questions are now:
>>>> Is it intended that these messages are sent out and that the oVirt
>>>> engine is detected as down (which is false anyway), but that the VM is
>>>> not restarted?
>>>>
>>>> How can I disable the notifications? I'm planning to write a Nagios
>>>> plugin which parses the output of hosted-engine --vm-status, and only
>>>> Nagios should notify me, not the hosted-engine script.
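A minimal sketch of such a Nagios plugin in Python 3.7 or later. It assumes the --vm-status output keeps the "Key : value" layout shown above and that "Engine status" is either a Python-style dict or a plain string such as "unknown stale-data". The function names (parse_vm_status, engine_health, evaluate, main) and the OK/WARNING/CRITICAL policy are illustrative choices, not part of oVirt; only the exit codes 0 to 3 follow the usual Nagios plugin convention.

```python
import ast
import re
import subprocess

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = ("OK", "WARNING", "CRITICAL", "UNKNOWN")

HOST_HEADER = re.compile(r"--== Host (\d+) status ==--")
# "Key   : value" lines; keys are letters, spaces and hyphens only, so the
# "timestamp=..." lines under "Extra metadata" are ignored.
FIELD = re.compile(r"^([A-Za-z][A-Za-z -]*?)\s*:\s*(.*)$")


def parse_vm_status(text):
    """Return {host_id: {field: value}} from hosted-engine --vm-status text."""
    hosts, current = {}, None
    for line in text.splitlines():
        header = HOST_HEADER.search(line)
        if header:
            current = hosts.setdefault(int(header.group(1)), {})
            continue
        match = FIELD.match(line.strip())
        if current is not None and match:
            current[match.group(1)] = match.group(2).strip()
    return hosts


def engine_health(value):
    """Parse 'Engine status'; return the dict, or None for e.g. 'unknown stale-data'."""
    try:
        status = ast.literal_eval(value)
    except (ValueError, SyntaxError, TypeError):
        return None
    return status if isinstance(status, dict) else None


def evaluate(hosts):
    """Map parsed hosts to (exit code, message). The policy is an example only."""
    if not hosts:
        return UNKNOWN, "no host status found"
    fresh = [h for h, f in hosts.items() if f.get("Status up-to-date") == "True"]
    stale = sorted(set(hosts) - set(fresh))
    good = sorted(
        h for h in fresh
        if (engine_health(hosts[h].get("Engine status", "")) or {}).get("health") == "good"
    )
    if not good:
        return CRITICAL, "no up-to-date host reports engine health 'good'"
    if stale:
        return WARNING, "stale metadata from host(s) " + ", ".join(map(str, stale))
    return OK, "engine healthy on host(s) " + ", ".join(map(str, good))


def main():
    """Run hosted-engine --vm-status and print a one-line Nagios result."""
    try:
        out = subprocess.run(["hosted-engine", "--vm-status"], capture_output=True,
                             text=True, timeout=60, check=True).stdout
    except (OSError, subprocess.SubprocessError) as exc:
        print("UNKNOWN - %s" % exc)
        return UNKNOWN
    code, message = evaluate(parse_vm_status(out))
    print("%s - %s" % (LABELS[code], message))
    return code
```

To use it as a plugin, end the script with raise SystemExit(main()). With the output quoted above it would report CRITICAL, because no up-to-date host reports 'health': 'good'. The plugin mirrors what the HA agent believes, not whether the engine VM is actually reachable, so a separate check against the engine itself may still be worthwhile.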
>>>>
>>>> Is it possible, or planned, to make the whole HA feature optional? I
>>>> really, really, really hate cluster software, as it causes more
>>>> trouble than standalone machines, and in my case the hosted-engine HA
>>>> feature really does cause trouble (I haven't had a hardware or network
>>>> outage yet, only issues with the hosted-engine HA agent). I don't need
>>>> any HA feature for the hosted engine. I just want to run the engine
>>>> virtualized on oVirt, and if the engine VM fails (e.g. because of
>>>> issues with a host), I'll restart it on another node.
>>>
>>> Hi, you can:
>>> 1. edit /etc/ovirt-hosted-engine-ha/{agent,broker}-log.conf and tweak
>>>    the logger as you like, or
>>> 2. kill the ovirt-ha-broker and ovirt-ha-agent services.
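Jiri's first option implies that the notification mails are produced through the logging configuration. The actual contents of agent-log.conf and broker-log.conf are not shown in this thread, so this is only a Python sketch of the general mechanism of the standard logging.config.fileConfig format. Assumptions: the handler name "notify" is hypothetical and stands in for whatever handler sends the mails (an in-memory StreamHandler replaces a real mail handler here), and the state transitions are assumed to be logged at INFO level. Raising that handler's level above the messages' level silences it without removing it.

```python
import io
import logging
import logging.config

# HYPOTHETICAL config in the stdlib fileConfig format. The real *-log.conf
# files may name their loggers and handlers differently; "notify" only
# stands in for whatever handler sends the notification e-mails.
CONFIG_TEMPLATE = """
[loggers]
keys=root

[handlers]
keys=notify

[formatters]
keys=

[logger_root]
level=INFO
handlers=notify

[handler_notify]
class=StreamHandler
level={notify_level}
args=(io.StringIO(),)
"""


def notifications_for(notify_level):
    """Load the config with the given handler level and return what it emits."""
    config = CONFIG_TEMPLATE.format(notify_level=notify_level)
    logging.config.fileConfig(io.StringIO(config))
    root = logging.getLogger()
    # Assumption: state transitions are logged at INFO level.
    root.info("state transition EngineDown-EngineStart")
    root.info("state transition EngineStart-EngineUp")
    # The handler writes into the StringIO created by its "args" line.
    return root.handlers[0].stream.getvalue().splitlines()
```

With notifications_for("INFO") both transition lines come through; with notifications_for("CRITICAL") the handler drops them and the list is empty.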
>>
>> Thanks for the information.
>> So the engine is able to run when ovirt-ha-broker and ovirt-ha-agent
>> aren't running?
>>
>
> Yes. It might cause some problems if you set up another host for the
> hosted engine and run the agent on that host, but as long as you don't
> have the agent running anywhere, or you don't need to migrate the engine
> VM, you should be fine.

Thanks!

At the moment I have an issue with ovirt-ha-broker: it is running wild
(224% CPU) and doesn't react to kill -9:

# ps aux | egrep -e '%CPU|\[ovirt-ha-broker\]' | grep -v grep
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
vdsm      3059  224  0.0      0     0 ?        Zl   Mar03 145536:45 [ovirt-ha-broker] <defunct>
# kill -9 3059
# ps aux | egrep -e '%CPU|\[ovirt-ha-broker\]' | grep -v grep
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
vdsm      3059  224  0.0      0     0 ?        Zl   Mar03 145545:17 [ovirt-ha-broker] <defunct>
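In the ps output, STAT "Zl" means the process is a zombie (Z) and multi-threaded (l). A zombie has already exited, so a signal has nothing left to act on; the entry only goes away when its parent collects the exit status with wait()/waitpid(), or, if the parent dies, when init adopts and reaps it. The TIME column still growing between the two ps runs suggests that other threads of PID 3059 may still be alive; one possible explanation (not verified here) is threads blocked in uninterruptible sleep (D state), which do not act on SIGKILL until they leave the kernel. The Python sketch below (Linux only; the helper names are hypothetical) reads the parent PID and per-thread states from /proc, and zombie_demo() reproduces the "kill -9 has no effect" behaviour on a throwaway child process.

```python
import os
import signal


def parse_stat(line):
    """Split a /proc/<pid>/stat line into (pid, comm, state, ppid).

    The command name sits in parentheses and may itself contain spaces or ')',
    so split on the last ')' instead of on whitespace.
    """
    head, _, tail = line.rpartition(")")
    pid, _, comm = head.partition(" (")
    fields = tail.split()
    return int(pid), comm, fields[0], int(fields[1])


def read_stat(path):
    with open(path) as f:
        return parse_stat(f.read())


def thread_states(pid):
    """Return {tid: state letter} for every thread listed under /proc/<pid>/task."""
    task_dir = "/proc/%d/task" % pid
    states = {}
    for tid in os.listdir(task_dir):
        try:
            states[int(tid)] = read_stat(os.path.join(task_dir, tid, "stat"))[2]
        except FileNotFoundError:
            pass  # the thread exited between listdir() and open()
    return states


def zombie_demo():
    """Linux only: SIGKILL does not remove a zombie, but the parent's waitpid() does."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # the child exits immediately
    # Block until the child has exited, but leave it unreaped (WNOWAIT): a zombie.
    os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)
    before = read_stat("/proc/%d/stat" % pid)[2]
    os.kill(pid, signal.SIGKILL)  # accepted, but there is nothing left to kill
    after_kill = read_stat("/proc/%d/stat" % pid)[2]
    os.waitpid(pid, 0)  # reaping by the parent removes the entry
    return before, after_kill, not os.path.exists("/proc/%d" % pid)
```

On the affected host, read_stat("/proc/3059/stat") shows which parent is responsible for reaping the broker, and thread_states(3059) lists the threads that are still present and their states.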


>
> --Jirka
>
>>
>> Regards,
>> René
>>
>>>
>>> --Jirka
>>>>
>>>> Thanks,
>>>> René
>>>>
>>>>
>>>
>


