[ovirt-users] hosted engine health check issues
René Koch
rkoch at linuxland.at
Thu Apr 17 08:01:47 UTC 2014
On 04/17/2014 09:40 AM, Jiri Moskovcak wrote:
> On 04/17/2014 09:34 AM, René Koch wrote:
>> On 04/15/2014 04:53 PM, Jiri Moskovcak wrote:
>>> On 04/14/2014 10:50 AM, René Koch wrote:
>>>> Hi,
>>>>
>>>> I have some issues with hosted engine status.
>>>>
>>>> oVirt hosts think that hosted engine is down because it seems that
>>>> hosts can't write to hosted-engine.lockspace due to glusterfs issues
>>>> (or at least I think so).
>>>>
>>>> Here's the output of vm-status:
>>>>
>>>> # hosted-engine --vm-status
>>>>
>>>>
>>>> --== Host 1 status ==--
>>>>
>>>> Status up-to-date : False
>>>> Hostname : 10.0.200.102
>>>> Host ID : 1
>>>> Engine status : unknown stale-data
>>>> Score : 2400
>>>> Local maintenance : False
>>>> Host timestamp : 1397035677
>>>> Extra metadata (valid at timestamp):
>>>> metadata_parse_version=1
>>>> metadata_feature_version=1
>>>> timestamp=1397035677 (Wed Apr 9 11:27:57 2014)
>>>> host-id=1
>>>> score=2400
>>>> maintenance=False
>>>> state=EngineUp
>>>>
>>>>
>>>> --== Host 2 status ==--
>>>>
>>>> Status up-to-date : True
>>>> Hostname : 10.0.200.101
>>>> Host ID : 2
>>>> Engine status : {'reason': 'vm not running on this
>>>> host', 'health': 'bad', 'vm': 'down', 'detail': 'unknown'}
>>>> Score : 0
>>>> Local maintenance : False
>>>> Host timestamp : 1397464031
>>>> Extra metadata (valid at timestamp):
>>>> metadata_parse_version=1
>>>> metadata_feature_version=1
>>>> timestamp=1397464031 (Mon Apr 14 10:27:11 2014)
>>>> host-id=2
>>>> score=0
>>>> maintenance=False
>>>> state=EngineUnexpectedlyDown
>>>> timeout=Mon Apr 14 10:35:05 2014
>>>>
>>>> oVirt engine is sending me 2 emails every 10 minutes with the following
>>>> subjects:
>>>> - ovirt-hosted-engine state transition EngineDown-EngineStart
>>>> - ovirt-hosted-engine state transition EngineStart-EngineUp
>>>>
>>>> In oVirt webadmin I can see the following message:
>>>> VM HostedEngine is down. Exit message: internal error Failed to acquire
>>>> lock: error -243.
>>>>
>>>> These messages are really annoying as oVirt isn't doing anything with
>>>> hosted engine - I have an uptime of 9 days in my engine vm.
>>>>
>>>> So my questions are now:
>>>> Is it intended to send out these messages and detect that ovirt engine
>>>> is down (which is false anyway), but not to restart the vm?
>>>>
>>>> How can I disable notifications? I'm planning to write a Nagios plugin
>>>> which parses the output of hosted-engine --vm-status and only Nagios
>>>> should notify me, not hosted-engine script.
>>>>
>>>> Is it possible or planned to make the whole ha feature optional? I
>>>> really really really hate cluster software as it causes more trouble
>>>> than standalone machines and in my case the hosted-engine ha feature
>>>> really causes trouble (and I haven't had a hardware or network outage
>>>> yet, only issues with hosted-engine ha agent). I don't need any ha
>>>> feature for hosted engine. I just want to run engine virtualized on
>>>> oVirt and if engine vm fails (e.g. because of issues with a host) I'll
>>>> restart it on another node.
>>>
>>> Hi, you can:
>>> 1. edit /etc/ovirt-hosted-engine-ha/{agent,broker}-log.conf and tweak
>>> the logger as you like
>>> 2. or kill ovirt-ha-broker & ovirt-ha-agent services
>>
>> Thanks for the information.
>> So engine is able to run when ovirt-ha-broker and ovirt-ha-agent aren't
>> running?
>>
>
> - yes, it might cause some problems if you set up another host for
> hosted engine and run the agent on the other host, but as long as you
> don't have the agent running anywhere or you don't need to migrate the
> engine vm, you should be fine.
Thanks!
At the moment I have an issue with ovirt-ha-broker going crazy and
not reacting to kill -9:
# ps aux | egrep -e '%CPU|\[ovirt-ha-broker\]' | grep -v grep
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
vdsm 3059 224 0.0 0 0 ? Zl Mar03 145536:45 [ovirt-ha-broker] <defunct>
# kill -9 3059
# ps aux | egrep -e '%CPU|\[ovirt-ha-broker\]' | grep -v grep
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
vdsm 3059 224 0.0 0 0 ? Zl Mar03 145545:17 [ovirt-ha-broker] <defunct>
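
A note on the ps output above: STAT "Z" marks a process that has already
exited and is only waiting for its parent to reap it, so a signal sent to
it (kill -9 included) has no visible effect. The "l" flag marks a
multi-threaded process, and the growing TIME column suggests that other
threads of PID 3059 may still be alive. A minimal sketch for looking at the
parent and the individual threads, assuming a Linux /proc filesystem
(proc_summary is a hypothetical helper name, not part of oVirt):

```shell
# Print name, state, parent PID and thread count of a process, followed by
# one "TID<tab>state" line per thread. Reads /proc directly (Linux only).
proc_summary() {
    pid=$1
    grep -E '^(Name|State|PPid|Threads):' "/proc/$pid/status" || return 1
    for t in /proc/"$pid"/task/*; do
        # Every thread has its own status file. A thread in state D
        # (uninterruptible sleep) cannot act on SIGKILL until its I/O returns.
        printf '%s\t%s\n' "${t##*/}" \
            "$(sed -n 's/^State:[[:space:]]*//p' "$t/status")"
    done
}
```

Calling `proc_summary 3059` on the affected host would show which process
is the parent (PPid) and whether the remaining threads are running (R) or
blocked (D). What to do next depends on that result; this only gathers the
facts.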
>
> --Jirka
>
>>
>> Regards,
>> René
>>
>>>
>>> --Jirka
>>>>
>>>> Thanks,
>>>> René
>>>>
>>>>
>>>
>
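
The Nagios plugin idea from the original message (parsing the output of
hosted-engine --vm-status instead of relying on the notification emails)
could be sketched roughly as below. This is only an illustration: the
function name check_vm_status and the OK/WARNING/CRITICAL policy are
assumptions, and so is the 'health': 'good' value, which does not appear in
the output quoted in this thread. Exit codes follow the usual Nagios
convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```shell
# Read `hosted-engine --vm-status` text on stdin, print a one-line summary
# and return a Nagios-style exit code.
# Assumed policy: CRITICAL if no host reports a healthy engine, WARNING if
# the engine is healthy but at least one host has stale status data.
check_vm_status() {
    awk '
        BEGIN { hosts = 0; stale = 0; good = 0 }
        # One "Status up-to-date" line per host; anything but True is stale.
        /^Status up-to-date *:/ { hosts++; if ($NF != "True") stale++ }
        # Assumed healthy marker; the dots match either quote character.
        /^Engine status *:/ && /health.: *.good/ { good++ }
        END {
            if (hosts == 0) {
                print "UNKNOWN: no host status found"; exit 3
            }
            if (good == 0) {
                print "CRITICAL: no host reports a healthy engine (" stale " of " hosts " hosts stale)"; exit 2
            }
            if (stale > 0) {
                print "WARNING: engine healthy, " stale " of " hosts " hosts stale"; exit 1
            }
            print "OK: engine healthy, " hosts " hosts up to date"
        }
    '
}
```

It would be used as `hosted-engine --vm-status | check_vm_status`. With the
status output quoted in this thread (host 1 stale, host 2 reporting the VM
as down) the sketch returns CRITICAL, which reflects the broken status data
rather than the real state of the engine VM.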