[ovirt-users] hosted engine health check issues

Doron Fediuck dfediuck at redhat.com
Tue Apr 22 22:28:00 UTC 2014


Hi Rene,
Any idea what took down your ovirtmgmt bridge?
As long as it is down, vdsm may have trouble starting up properly,
which is why you see the complaints about the rpc server.

Can you try fixing the network part manually first and then
restarting vdsm?
Once vdsm is happy, the hosted engine VM will start.
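
To confirm the bridge state before restarting vdsm, something like the
following may help. It is only a minimal sketch: it assumes the bridge is
named ovirtmgmt (as in your kernel log) and reads the kernel's operstate
file from sysfs. The function name and the sysfs_root parameter are made
up for illustration and are not part of vdsm.

```python
import os


def bridge_state(name="ovirtmgmt", sysfs_root="/sys/class/net"):
    """Return the kernel operstate of an interface, or None if it is missing.

    Typical values are "up", "down" and "unknown".
    """
    path = os.path.join(sysfs_root, name, "operstate")
    try:
        with open(path) as handle:
            return handle.read().strip()
    except IOError:
        # The interface directory does not exist, so the bridge is not defined.
        return None


if __name__ == "__main__":
    print("ovirtmgmt operstate: %s" % bridge_state())
```

If this prints "down" or "None", the network needs fixing first; once the
bridge is back, restarting the vdsmd service should be the next step.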

----- Original Message -----
> From: "René Koch" <rkoch at linuxland.at>
> To: "Martin Sivak" <msivak at redhat.com>
> Cc: users at ovirt.org
> Sent: Tuesday, April 22, 2014 1:46:38 PM
> Subject: Re: [ovirt-users] hosted engine health check issues
> 
> Hi,
> 
> I rebooted one of my ovirt hosts today and the result is now that I
> can't start hosted-engine anymore.
> 
> ovirt-ha-agent isn't running because the lockspace file is missing
> (sanlock complains about it).
> So I tried to start hosted-engine with --vm-start and I get the
> following errors:
> 
> ==> /var/log/sanlock.log <==
> 2014-04-22 12:38:17+0200 654 [3093]: r2 cmd_acquire 2,9,5733 invalid
> lockspace found -1 failed 0 name 2851af27-8744-445d-9fb1-a0d083c8dc82
> 
> ==> /var/log/messages <==
> Apr 22 12:38:17 ovirt-host02 sanlock[3079]: 2014-04-22 12:38:17+0200 654
> [3093]: r2 cmd_acquire 2,9,5733 invalid lockspace found -1 failed 0 name
> 2851af27-8744-445d-9fb1-a0d083c8dc82
> Apr 22 12:38:17 ovirt-host02 kernel: ovirtmgmt: port 2(vnet0) entering
> disabled state
> Apr 22 12:38:17 ovirt-host02 kernel: device vnet0 left promiscuous mode
> Apr 22 12:38:17 ovirt-host02 kernel: ovirtmgmt: port 2(vnet0) entering
> disabled state
> 
> ==> /var/log/vdsm/vdsm.log <==
> Thread-21::DEBUG::2014-04-22
> 12:38:17,563::libvirtconnection::124::root::(wrapper) Unknown
> libvirterror: ecode: 38 edom: 42 level: 2 message: Failed to acquire
> lock: No space left on device
> Thread-21::DEBUG::2014-04-22
> 12:38:17,563::vm::2263::vm.Vm::(_startUnderlyingVm)
> vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::_ongoingCreations released
> Thread-21::ERROR::2014-04-22
> 12:38:17,564::vm::2289::vm.Vm::(_startUnderlyingVm)
> vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::The vm start process failed
> Traceback (most recent call last):
>    File "/usr/share/vdsm/vm.py", line 2249, in _startUnderlyingVm
>      self._run()
>    File "/usr/share/vdsm/vm.py", line 3170, in _run
>      self._connection.createXML(domxml, flags),
>    File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py",
> line 92, in wrapper
>      ret = f(*args, **kwargs)
>    File "/usr/lib64/python2.6/site-packages/libvirt.py", line 2665, in
> createXML
>      if ret is None:raise libvirtError('virDomainCreateXML() failed',
> conn=self)
> libvirtError: Failed to acquire lock: No space left on device
> 
> ==> /var/log/messages <==
> Apr 22 12:38:17 ovirt-host02 vdsm vm.Vm ERROR
> vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::The vm start process
> failed#012Traceback (most recent call last):#012  File
> "/usr/share/vdsm/vm.py", line 2249, in _startUnderlyingVm#012
> self._run()#012  File "/usr/share/vdsm/vm.py", line 3170, in _run#012
>   self._connection.createXML(domxml, flags),#012  File
> "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 92,
> in wrapper#012    ret = f(*args, **kwargs)#012  File
> "/usr/lib64/python2.6/site-packages/libvirt.py", line 2665, in
> createXML#012    if ret is None:raise libvirtError('virDomainCreateXML()
> failed', conn=self)#012libvirtError: Failed to acquire lock: No space
> left on device
> 
> ==> /var/log/vdsm/vdsm.log <==
> Thread-21::DEBUG::2014-04-22
> 12:38:17,569::vm::2731::vm.Vm::(setDownStatus)
> vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::Changed state to Down:
> Failed to acquire lock: No space left on device
> 
> 
> No space left on device is nonsense as there is enough space (I had this
> issue last time as well, when I had to patch machine.py, but this file
> is now Python 2.6.6 compatible).
> 
> Any idea what prevents hosted-engine from starting?
> ovirt-ha-broker, vdsmd and sanlock are running btw.
> 
> Btw, I can see in the log that the json rpc server module is missing - which
> package is required for CentOS 6.5?
> Apr 22 12:37:14 ovirt-host02 vdsm vds WARNING Unable to load the json
> rpc server module. Please make sure it is installed.
> 
> 
> Thanks,
> René
> 
> 
> 
> On 04/17/2014 10:02 AM, Martin Sivak wrote:
> > Hi,
> >
> >>>> How can I disable notifications?
> >
> > The notification is configured in /etc/ovirt-hosted-engine-ha/broker.conf
> > section notification.
> > The email is sent when the key state_transition exists and the string
> > OldState-NewState contains the (case insensitive) regexp from the value.
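
The matching rule Martin describes above can be sketched roughly as
follows. This is an illustration of the described behaviour only, not the
broker's actual code: the dict stands in for the notification section of
broker.conf, and the function name and the pattern used in the example
are hypothetical.

```python
import re


def should_notify(notification_section, old_state, new_state):
    """Return True if a state change would trigger an email.

    notification_section is a plain dict standing in for the
    [notification] section of broker.conf (an assumption of this sketch).
    """
    pattern = notification_section.get("state_transition")
    if pattern is None:
        # Without the state_transition key no email is sent.
        return False
    transition = "%s-%s" % (old_state, new_state)
    # Case-insensitive search: the regexp only has to match somewhere
    # inside the "OldState-NewState" string.
    return re.search(pattern, transition, re.IGNORECASE) is not None


if __name__ == "__main__":
    section = {"state_transition": "maintenance|start|stop"}  # made-up value
    print(should_notify(section, "EngineDown", "EngineStart"))
```

With a rule like this, narrowing the regexp in state_transition so that it
no longer matches the EngineDown-EngineStart and EngineStart-EngineUp
transitions would silence those two emails.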
> >
> >>>> Is it intended to send out these messages and detect that ovirt engine
> >>>> is down (which is false anyway), but not to restart the vm?
> >
> > Forget about emails for now and check the
> > /var/log/ovirt-hosted-engine-ha/agent.log and broker.log (and attach them
> > as well btw).
> >
> >>>> oVirt hosts think that hosted engine is down because it seems that hosts
> >>>> can't write to hosted-engine.lockspace due to glusterfs issues (or at
> >>>> least I think so).
> >
> > The hosts think so or can't really write there? The lockspace is managed by
> > sanlock and our HA daemons do not touch it at all. We only ask sanlock to
> > make sure we have a unique server id.
> >
> >>>> Is it possible or planned to make the whole ha feature optional?
> >
> > Well the system won't perform any automatic actions if you put the hosted
> > engine to global maintenance and only start/stop/migrate the VM manually.
> > I would discourage you from stopping agent/broker, because the engine
> > itself has some logic based on the reporting.
> >
> > Regards
> >
> > --
> > Martin Sivák
> > msivak at redhat.com
> > Red Hat Czech
> > RHEV-M SLA / Brno, CZ
> >
> > ----- Original Message -----
> >> On 04/15/2014 04:53 PM, Jiri Moskovcak wrote:
> >>> On 04/14/2014 10:50 AM, René Koch wrote:
> >>>> Hi,
> >>>>
> >>>> I have some issues with hosted engine status.
> >>>>
> >>>> oVirt hosts think that hosted engine is down because it seems that hosts
> >>>> can't write to hosted-engine.lockspace due to glusterfs issues (or at
> >>>> least I think so).
> >>>>
> >>>> Here's the output of vm-status:
> >>>>
> >>>> # hosted-engine --vm-status
> >>>>
> >>>>
> >>>> --== Host 1 status ==--
> >>>>
> >>>> Status up-to-date                  : False
> >>>> Hostname                           : 10.0.200.102
> >>>> Host ID                            : 1
> >>>> Engine status                      : unknown stale-data
> >>>> Score                              : 2400
> >>>> Local maintenance                  : False
> >>>> Host timestamp                     : 1397035677
> >>>> Extra metadata (valid at timestamp):
> >>>>       metadata_parse_version=1
> >>>>       metadata_feature_version=1
> >>>>       timestamp=1397035677 (Wed Apr  9 11:27:57 2014)
> >>>>       host-id=1
> >>>>       score=2400
> >>>>       maintenance=False
> >>>>       state=EngineUp
> >>>>
> >>>>
> >>>> --== Host 2 status ==--
> >>>>
> >>>> Status up-to-date                  : True
> >>>> Hostname                           : 10.0.200.101
> >>>> Host ID                            : 2
> >>>> Engine status                      : {'reason': 'vm not running on this
> >>>> host', 'health': 'bad', 'vm': 'down', 'detail': 'unknown'}
> >>>> Score                              : 0
> >>>> Local maintenance                  : False
> >>>> Host timestamp                     : 1397464031
> >>>> Extra metadata (valid at timestamp):
> >>>>       metadata_parse_version=1
> >>>>       metadata_feature_version=1
> >>>>       timestamp=1397464031 (Mon Apr 14 10:27:11 2014)
> >>>>       host-id=2
> >>>>       score=0
> >>>>       maintenance=False
> >>>>       state=EngineUnexpectedlyDown
> >>>>       timeout=Mon Apr 14 10:35:05 2014
> >>>>
> >>>> oVirt engine is sending me 2 emails every 10 minutes with the following
> >>>> subjects:
> >>>> - ovirt-hosted-engine state transition EngineDown-EngineStart
> >>>> - ovirt-hosted-engine state transition EngineStart-EngineUp
> >>>>
> >>>> In oVirt webadmin I can see the following message:
> >>>> VM HostedEngine is down. Exit message: internal error Failed to acquire
> >>>> lock: error -243.
> >>>>
> >>>> These messages are really annoying as oVirt isn't doing anything with
> >>>> hosted engine - I have an uptime of 9 days in my engine vm.
> >>>>
> >>>> So my questions are now:
> >>>> Is it intended to send out these messages and detect that ovirt engine
> >>>> is down (which is false anyway), but not to restart the vm?
> >>>>
> >>>> How can I disable notifications? I'm planning to write a Nagios plugin
> >>>> which parses the output of hosted-engine --vm-status and only Nagios
> >>>> should notify me, not hosted-engine script.
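
For the Nagios plugin idea, a minimal parsing sketch could look like the
one below. It assumes the "--== Host N status ==--" headers and
"Key : value" lines shown in the output quoted above, and that each field
sits on a single line when the command is run directly (the Engine status
line is only wrapped by the mail client here). The function names are
hypothetical.

```python
import re

HOST_HEADER = re.compile(r"^--== Host (\d+) status ==--$")


def parse_vm_status(text):
    """Parse hosted-engine --vm-status output into {host_id: {field: value}}."""
    hosts = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = HOST_HEADER.match(line)
        if match:
            current = hosts.setdefault(int(match.group(1)), {})
            continue
        # Only "Key : value" lines are kept; the indented key=value
        # metadata lines contain no " : " separator and are skipped.
        if current is None or " : " not in line:
            continue
        key, value = line.split(" : ", 1)
        current[key.strip()] = value.strip()
    return hosts


def stale_hosts(hosts):
    """Return the sorted ids of hosts whose status is not up to date."""
    return sorted(
        host_id
        for host_id, fields in hosts.items()
        if fields.get("Status up-to-date") != "True"
    )
```

A plugin built on this could map the result to the standard Nagios exit
codes (0 OK, 1 WARNING, 2 CRITICAL), for example WARNING when
stale_hosts() is not empty.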
> >>>>
> >>>> Is it possible or planned to make the whole ha feature optional? I
> >>>> really really really hate cluster software as it causes more trouble
> >>>> than standalone machines and in my case the hosted-engine ha feature
> >>>> really causes trouble (and I didn't have a hardware or network outage
> >>>> yet, only issues with hosted-engine ha agent). I don't need any ha
> >>>> feature for hosted engine. I just want to run engine virtualized on
> >>>> oVirt and if engine vm fails (e.g. because of issues with a host) I'll
> >>>> restart it on another node.
> >>>
> >>> Hi, you can:
> >>> 1. edit /etc/ovirt-hosted-engine-ha/{agent,broker}-log.conf and tweak
> >>> the logger as you like
> >>> 2. or kill ovirt-ha-broker & ovirt-ha-agent services
> >>
> >> Thanks for the information.
> >> So engine is able to run when ovirt-ha-broker and ovirt-ha-agent aren't
> >> running?
> >>
> >>
> >> Regards,
> >> René
> >>
> >>>
> >>> --Jirka
> >>>>
> >>>> Thanks,
> >>>> René
> >>>>
> >>>>
> >>>
> >> _______________________________________________
> >> Users mailing list
> >> Users at ovirt.org
> >> http://lists.ovirt.org/mailman/listinfo/users
> >>