[ovirt-users] hosted engine health check issues

Itamar Heim iheim at redhat.com
Tue Apr 22 14:04:15 UTC 2014


On 04/14/2014 11:50 AM, René Koch wrote:
> Hi,
>
> I have some issues with hosted engine status.
>
> oVirt hosts think that hosted engine is down because it seems that hosts
> can't write to hosted-engine.lockspace due to glusterfs issues (or at
> least I think so).
>
> Here's the output of vm-status:
>
> # hosted-engine --vm-status
>
>
> --== Host 1 status ==--
>
> Status up-to-date                  : False
> Hostname                           : 10.0.200.102
> Host ID                            : 1
> Engine status                      : unknown stale-data
> Score                              : 2400
> Local maintenance                  : False
> Host timestamp                     : 1397035677
> Extra metadata (valid at timestamp):
>      metadata_parse_version=1
>      metadata_feature_version=1
>      timestamp=1397035677 (Wed Apr  9 11:27:57 2014)
>      host-id=1
>      score=2400
>      maintenance=False
>      state=EngineUp
>
>
> --== Host 2 status ==--
>
> Status up-to-date                  : True
> Hostname                           : 10.0.200.101
> Host ID                            : 2
> Engine status                      : {'reason': 'vm not running on this
> host', 'health': 'bad', 'vm': 'down', 'detail': 'unknown'}
> Score                              : 0
> Local maintenance                  : False
> Host timestamp                     : 1397464031
> Extra metadata (valid at timestamp):
>      metadata_parse_version=1
>      metadata_feature_version=1
>      timestamp=1397464031 (Mon Apr 14 10:27:11 2014)
>      host-id=2
>      score=0
>      maintenance=False
>      state=EngineUnexpectedlyDown
>      timeout=Mon Apr 14 10:35:05 2014
>
> oVirt engine is sending me 2 emails every 10 minutes with the following
> subjects:
> - ovirt-hosted-engine state transition EngineDown-EngineStart
> - ovirt-hosted-engine state transition EngineStart-EngineUp
>
> In oVirt webadmin I can see the following message:
> VM HostedEngine is down. Exit message: internal error Failed to acquire
> lock: error -243.
>
> These messages are really annoying as oVirt isn't doing anything with
> hosted engine - I have an uptime of 9 days in my engine vm.
>
> So my questions are now:
> Is it intended to send out these messages and detect that ovirt engine
> is down (which is false anyway), but not to restart the vm?
>
> How can I disable notifications? I'm planning to write a Nagios plugin
> which parses the output of hosted-engine --vm-status and only Nagios
> should notify me, not hosted-engine script.
>
> Is it possible or planned to make the whole ha feature optional? I
> really really really hate cluster software as it causes more trouble
> than standalone machines and in my case the hosted-engine ha feature
> really causes trouble (and I haven't had a hardware or network outage
> yet, only issues with hosted-engine ha agent). I don't need any ha
> feature for hosted engine. I just want to run engine virtualized on
> oVirt and if engine vm fails (e.g. because of issues with a host) I'll
> restart it on another node.
>
> Thanks,
> René
>
>

I'm pretty sure we removed support for hosted-engine on gluster due to
concerns around the locking issues.
Is the gluster volume configured with quorum to avoid split-brain?
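
On the Nagios plugin idea from the quoted message, here is a minimal
sketch of how the vm-status output shown above could be parsed. It is not
part of hosted-engine: the function names parse_vm_status and nagios_check
are hypothetical, and the alerting policy (CRITICAL when no up-to-date host
reports state=EngineUp) is only an assumption about what such a check might
want. The sample text is copied from the output above, with the wrapped
"Engine status" line joined back into one line.

```python
import re

# Sample taken from the output quoted in this thread (metadata trimmed).
SAMPLE = """\
--== Host 1 status ==--

Status up-to-date                  : False
Hostname                           : 10.0.200.102
Host ID                            : 1
Engine status                      : unknown stale-data
Score                              : 2400
Extra metadata (valid at timestamp):
     timestamp=1397035677 (Wed Apr  9 11:27:57 2014)
     state=EngineUp

--== Host 2 status ==--

Status up-to-date                  : True
Hostname                           : 10.0.200.101
Host ID                            : 2
Engine status                      : {'reason': 'vm not running on this host', 'health': 'bad', 'vm': 'down', 'detail': 'unknown'}
Score                              : 0
Extra metadata (valid at timestamp):
     timestamp=1397464031 (Mon Apr 14 10:27:11 2014)
     state=EngineUnexpectedlyDown
     timeout=Mon Apr 14 10:35:05 2014
"""

HOST_HEADER = re.compile(r"^--== Host (\d+) status ==--$")


def parse_vm_status(text):
    """Split vm-status text into one dict per host block."""
    hosts = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if HOST_HEADER.match(line):
            current = {"metadata": {}}
            hosts.append(current)
        elif current is None or not line:
            continue
        elif " : " in line:
            # "Field name      : value" lines
            key, _, value = line.partition(" : ")
            current[key.strip()] = value.strip()
        elif "=" in line:
            # indented "key=value" lines under "Extra metadata"
            key, _, value = line.partition("=")
            current["metadata"][key.strip()] = value.strip()
    return hosts


def nagios_check(hosts):
    """Return (exit_code, message); 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN.

    Assumed policy, not taken from hosted-engine itself.
    """
    if not hosts:
        return 3, "UNKNOWN: no host blocks found"
    stale = [h.get("Hostname", "?") for h in hosts
             if h.get("Status up-to-date") != "True"]
    engine_up = [h for h in hosts
                 if h.get("Status up-to-date") == "True"
                 and h["metadata"].get("state") == "EngineUp"]
    suffix = " (stale: %s)" % ", ".join(stale) if stale else ""
    if not engine_up:
        return 2, "CRITICAL: no up-to-date host reports state=EngineUp" + suffix
    if stale:
        return 1, "WARNING: engine up, but some hosts have stale data" + suffix
    return 0, "OK: engine up on %s" % engine_up[0].get("Hostname", "?")
```

In a real plugin the text would come from running hosted-engine --vm-status
on a host (for example via subprocess) instead of SAMPLE, and the exit code
would be passed to sys.exit().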



