[ovirt-users] hosted engine health check issues
Martin Sivak
msivak at redhat.com
Wed Apr 23 09:08:05 UTC 2014
Hi René,
> >> libvirtError: Failed to acquire lock: No space left on device
> >> 2014-04-22 12:38:17+0200 654 [3093]: r2 cmd_acquire 2,9,5733 invalid
> >> lockspace found -1 failed 0 name 2851af27-8744-445d-9fb1-a0d083c8dc82
Can you please check the contents of /rhev/data-center/<your nfs mount>/<nfs domain uuid>/ha_agent/?
This is what it should look like:
[root@dev-03 ~]# ls -al /rhev/data-center/mnt/euryale\:_home_ovirt_he/e16de6a2-53f5-4ab3-95a3-255d08398824/ha_agent/
total 2036
drwxr-x---. 2 vdsm kvm 4096 Mar 19 18:46 .
drwxr-xr-x. 6 vdsm kvm 4096 Mar 19 18:46 ..
-rw-rw----. 1 vdsm kvm 1048576 Apr 23 11:05 hosted-engine.lockspace
-rw-rw----. 1 vdsm kvm 1028096 Mar 19 18:46 hosted-engine.metadata
The errors seem to indicate that you somehow lost the lockspace file.
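
The same check can be scripted. Below is a minimal Python sketch, assuming only the two file names from the listing above; the function name missing_ha_files is hypothetical, and treating a zero-length file as "lost" is an assumption of this sketch rather than anything sanlock defines.

```python
import os

# File names taken from the directory listing above.
EXPECTED_FILES = ("hosted-engine.lockspace", "hosted-engine.metadata")


def missing_ha_files(ha_agent_dir):
    """Return the expected files that are absent or empty in ha_agent_dir."""
    missing = []
    for name in EXPECTED_FILES:
        path = os.path.join(ha_agent_dir, name)
        # Assumption: a zero-length file is as unusable as a missing one.
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            missing.append(name)
    return missing
```

Call it with your own ha_agent directory, i.e. /rhev/data-center/<your nfs mount>/<nfs domain uuid>/ha_agent/, as a user that can read it (the listing shows vdsm:kvm ownership). An empty list means both files are present and non-empty.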
--
Martin Sivák
msivak at redhat.com
Red Hat Czech
RHEV-M SLA / Brno, CZ
----- Original Message -----
> On 04/23/2014 12:28 AM, Doron Fediuck wrote:
> > Hi Rene,
> > any idea what closed your ovirtmgmt bridge?
> > as long as it is down vdsm may have issues starting up properly
> > and this is why you see the complaints on the rpc server.
> >
> > Can you try manually fixing the network part first and then
> > restart vdsm?
> > Once vdsm is happy hosted engine VM will start.
>
> Thanks for your feedback, Doron.
>
> My ovirtmgmt bridge seems to be on or isn't it:
> # brctl show ovirtmgmt
> bridge name bridge id STP enabled interfaces
> ovirtmgmt 8000.0025907587c2 no eth0.200
>
> # ip a s ovirtmgmt
> 7: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue
> state UNKNOWN
> link/ether 00:25:90:75:87:c2 brd ff:ff:ff:ff:ff:ff
> inet 10.0.200.102/24 brd 10.0.200.255 scope global ovirtmgmt
> inet6 fe80::225:90ff:fe75:87c2/64 scope link
> valid_lft forever preferred_lft forever
>
> # ip a s eth0.200
> 6: eth0.200@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc
> noqueue state UP
> link/ether 00:25:90:75:87:c2 brd ff:ff:ff:ff:ff:ff
> inet6 fe80::225:90ff:fe75:87c2/64 scope link
> valid_lft forever preferred_lft forever
>
> I tried the following yesterday:
> Copy virtual disk from GlusterFS storage to local disk of host and
> create a new vm with virt-manager which loads ovirtmgmt disk. I could
> reach my engine over the ovirtmgmt bridge (so bridge must be working).
>
> I also started libvirtd with Option -v and I saw the following in
> libvirtd.log when trying to start ovirt engine:
> 2014-04-22 14:18:25.432+0000: 8901: debug : virCommandRunAsync:2250 :
> Command result 0, with PID 11491
> 2014-04-22 14:18:25.478+0000: 8901: debug : virCommandRun:2045 : Result
> exit status 255, stdout: '' stderr: 'iptables v1.4.7: goto 'FO-vnet0' is
> not a chain
>
> So it could be that something is broken in my hosted-engine network. Do
> you have any clue how I can troubleshoot this?
>
>
> Thanks,
> René
>
>
> >
> > ----- Original Message -----
> >> From: "René Koch" <rkoch at linuxland.at>
> >> To: "Martin Sivak" <msivak at redhat.com>
> >> Cc: users at ovirt.org
> >> Sent: Tuesday, April 22, 2014 1:46:38 PM
> >> Subject: Re: [ovirt-users] hosted engine health check issues
> >>
> >> Hi,
> >>
> >> I rebooted one of my ovirt hosts today and the result is now that I
> >> can't start hosted-engine anymore.
> >>
> >> ovirt-ha-agent isn't running because the lockspace file is missing
> >> (sanlock complains about it).
> >> So I tried to start hosted-engine with --vm-start and I get the
> >> following errors:
> >>
> >> ==> /var/log/sanlock.log <==
> >> 2014-04-22 12:38:17+0200 654 [3093]: r2 cmd_acquire 2,9,5733 invalid
> >> lockspace found -1 failed 0 name 2851af27-8744-445d-9fb1-a0d083c8dc82
> >>
> >> ==> /var/log/messages <==
> >> Apr 22 12:38:17 ovirt-host02 sanlock[3079]: 2014-04-22 12:38:17+0200 654
> >> [3093]: r2 cmd_acquire 2,9,5733 invalid lockspace found -1 failed 0 name
> >> 2851af27-8744-445d-9fb1-a0d083c8dc82
> >> Apr 22 12:38:17 ovirt-host02 kernel: ovirtmgmt: port 2(vnet0) entering
> >> disabled state
> >> Apr 22 12:38:17 ovirt-host02 kernel: device vnet0 left promiscuous mode
> >> Apr 22 12:38:17 ovirt-host02 kernel: ovirtmgmt: port 2(vnet0) entering
> >> disabled state
> >>
> >> ==> /var/log/vdsm/vdsm.log <==
> >> Thread-21::DEBUG::2014-04-22
> >> 12:38:17,563::libvirtconnection::124::root::(wrapper) Unknown
> >> libvirterror: ecode: 38 edom: 42 level: 2 message: Failed to acquire
> >> lock: No space left on device
> >> Thread-21::DEBUG::2014-04-22
> >> 12:38:17,563::vm::2263::vm.Vm::(_startUnderlyingVm)
> >> vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::_ongoingCreations released
> >> Thread-21::ERROR::2014-04-22
> >> 12:38:17,564::vm::2289::vm.Vm::(_startUnderlyingVm)
> >> vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::The vm start process failed
> >> Traceback (most recent call last):
> >> File "/usr/share/vdsm/vm.py", line 2249, in _startUnderlyingVm
> >> self._run()
> >> File "/usr/share/vdsm/vm.py", line 3170, in _run
> >> self._connection.createXML(domxml, flags),
> >> File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py",
> >> line 92, in wrapper
> >> ret = f(*args, **kwargs)
> >> File "/usr/lib64/python2.6/site-packages/libvirt.py", line 2665, in
> >> createXML
> >> if ret is None:raise libvirtError('virDomainCreateXML() failed',
> >> conn=self)
> >> libvirtError: Failed to acquire lock: No space left on device
> >>
> >> ==> /var/log/messages <==
> >> Apr 22 12:38:17 ovirt-host02 vdsm vm.Vm ERROR
> >> vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::The vm start process
> >> failed#012Traceback (most recent call last):#012 File
> >> "/usr/share/vdsm/vm.py", line 2249, in _startUnderlyingVm#012
> >> self._run()#012 File "/usr/share/vdsm/vm.py", line 3170, in _run#012
> >> self._connection.createXML(domxml, flags),#012 File
> >> "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 92,
> >> in wrapper#012 ret = f(*args, **kwargs)#012 File
> >> "/usr/lib64/python2.6/site-packages/libvirt.py", line 2665, in
> >> createXML#012 if ret is None:raise libvirtError('virDomainCreateXML()
> >> failed', conn=self)#012libvirtError: Failed to acquire lock: No space
> >> left on device
> >>
> >> ==> /var/log/vdsm/vdsm.log <==
> >> Thread-21::DEBUG::2014-04-22
> >> 12:38:17,569::vm::2731::vm.Vm::(setDownStatus)
> >> vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::Changed state to Down:
> >> Failed to acquire lock: No space left on device
> >>
> >>
> >> "No space left on device" is nonsense as there is enough space (I had this
> >> issue last time as well, where I had to patch machine.py, but this file
> >> is now Python 2.6.6 compatible).
> >>
> >> Any idea what prevents hosted-engine from starting?
> >> ovirt-ha-broker, vdsmd and sanlock are running btw.
> >>
> >> Btw, I can see in log that json rpc server module is missing - which
> >> package is required for CentOS 6.5?
> >> Apr 22 12:37:14 ovirt-host02 vdsm vds WARNING Unable to load the json
> >> rpc server module. Please make sure it is installed.
> >>
> >>
> >> Thanks,
> >> René
> >>
> >>
> >>
> >> On 04/17/2014 10:02 AM, Martin Sivak wrote:
> >>> Hi,
> >>>
> >>>>>> How can I disable notifications?
> >>>
> >>> The notification is configured in /etc/ovirt-hosted-engine-ha/broker.conf
> >>> section notification.
> >>> The email is sent when the key state_transition exists and the string
> >>> OldState-NewState contains the (case insensitive) regexp from the value.
> >>>
> >>>>>> Is it intended to send out these messages and detect that ovirt engine
> >>>>>> is down (which is false anyway), but not to restart the vm?
> >>>
> >>> Forget about emails for now and check the
> >>> /var/log/ovirt-hosted-engine-ha/agent.log and broker.log (and attach them
> >>> as well btw).
> >>>
> >>>>>> oVirt hosts think that hosted engine is down because it seems that
> >>>>>> hosts
> >>>>>> can't write to hosted-engine.lockspace due to glusterfs issues (or at
> >>>>>> least I think so).
> >>>
> >>> The hosts think so or can't really write there? The lockspace is managed
> >>> by sanlock and our HA daemons do not touch it at all. We only ask sanlock
> >>> to make sure we have a unique server id.
> >>>
> >>>>>> Is it possible or planned to make the whole ha feature optional?
> >>>
> >>> Well the system won't perform any automatic actions if you put the hosted
> >>> engine to global maintenance and only start/stop/migrate the VM manually.
> >>> I would discourage you from stopping agent/broker, because the engine
> >>> itself has some logic based on the reporting.
> >>>
> >>> Regards
> >>>
> >>> --
> >>> Martin Sivák
> >>> msivak at redhat.com
> >>> Red Hat Czech
> >>> RHEV-M SLA / Brno, CZ
> >>>
> >>> ----- Original Message -----
> >>>> On 04/15/2014 04:53 PM, Jiri Moskovcak wrote:
> >>>>> On 04/14/2014 10:50 AM, René Koch wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> I have some issues with hosted engine status.
> >>>>>>
> >>>>>> oVirt hosts think that hosted engine is down because it seems that
> >>>>>> hosts
> >>>>>> can't write to hosted-engine.lockspace due to glusterfs issues (or at
> >>>>>> least I think so).
> >>>>>>
> >>>>>> Here's the output of vm-status:
> >>>>>>
> >>>>>> # hosted-engine --vm-status
> >>>>>>
> >>>>>>
> >>>>>> --== Host 1 status ==--
> >>>>>>
> >>>>>> Status up-to-date : False
> >>>>>> Hostname : 10.0.200.102
> >>>>>> Host ID : 1
> >>>>>> Engine status : unknown stale-data
> >>>>>> Score : 2400
> >>>>>> Local maintenance : False
> >>>>>> Host timestamp : 1397035677
> >>>>>> Extra metadata (valid at timestamp):
> >>>>>> metadata_parse_version=1
> >>>>>> metadata_feature_version=1
> >>>>>> timestamp=1397035677 (Wed Apr 9 11:27:57 2014)
> >>>>>> host-id=1
> >>>>>> score=2400
> >>>>>> maintenance=False
> >>>>>> state=EngineUp
> >>>>>>
> >>>>>>
> >>>>>> --== Host 2 status ==--
> >>>>>>
> >>>>>> Status up-to-date : True
> >>>>>> Hostname : 10.0.200.101
> >>>>>> Host ID : 2
> >>>>>> Engine status : {'reason': 'vm not running on this host',
> >>>>>> 'health': 'bad', 'vm': 'down', 'detail': 'unknown'}
> >>>>>> Score : 0
> >>>>>> Local maintenance : False
> >>>>>> Host timestamp : 1397464031
> >>>>>> Extra metadata (valid at timestamp):
> >>>>>> metadata_parse_version=1
> >>>>>> metadata_feature_version=1
> >>>>>> timestamp=1397464031 (Mon Apr 14 10:27:11 2014)
> >>>>>> host-id=2
> >>>>>> score=0
> >>>>>> maintenance=False
> >>>>>> state=EngineUnexpectedlyDown
> >>>>>> timeout=Mon Apr 14 10:35:05 2014
> >>>>>>
> >>>>>> oVirt engine is sending me 2 emails every 10 minutes with the
> >>>>>> following
> >>>>>> subjects:
> >>>>>> - ovirt-hosted-engine state transition EngineDown-EngineStart
> >>>>>> - ovirt-hosted-engine state transition EngineStart-EngineUp
> >>>>>>
> >>>>>> In oVirt webadmin I can see the following message:
> >>>>>> VM HostedEngine is down. Exit message: internal error Failed to
> >>>>>> acquire
> >>>>>> lock: error -243.
> >>>>>>
> >>>>>> These messages are really annoying as oVirt isn't doing anything with
> >>>>>> hosted engine - I have an uptime of 9 days in my engine vm.
> >>>>>>
> >>>>>> So my questions are now:
> >>>>>> Is it intended to send out these messages and detect that ovirt engine
> >>>>>> is down (which is false anyway), but not to restart the vm?
> >>>>>>
> >>>>>> How can I disable notifications? I'm planning to write a Nagios plugin
> >>>>>> which parses the output of hosted-engine --vm-status and only Nagios
> >>>>>> should notify me, not hosted-engine script.
> >>>>>>
> >>>>>> Is it possible or planned to make the whole ha feature optional? I
> >>>>>> really really really hate cluster software as it causes more trouble
> >>>>>> than standalone machines and in my case the hosted-engine ha feature
> >>>>>> really causes trouble (and I haven't had a hardware or network outage
> >>>>>> yet, only issues with hosted-engine ha agent). I don't need any ha
> >>>>>> feature for hosted engine. I just want to run engine virtualized on
> >>>>>> oVirt and if engine vm fails (e.g. because of issues with a host) I'll
> >>>>>> restart it on another node.
> >>>>>
> >>>>> Hi, you can:
> >>>>> 1. edit /etc/ovirt-hosted-engine-ha/{agent,broker}-log.conf and tweak
> >>>>> the logger as you like
> >>>>> 2. or kill ovirt-ha-broker & ovirt-ha-agent services
> >>>>
> >>>> Thanks for the information.
> >>>> So engine is able to run when ovirt-ha-broker and ovirt-ha-agent aren't
> >>>> running?
> >>>>
> >>>>
> >>>> Regards,
> >>>> René
> >>>>
> >>>>>
> >>>>> --Jirka
> >>>>>>
> >>>>>> Thanks,
> >>>>>> René
> >>>>>>
> >>>>>>
> >>>>>
> >>>> _______________________________________________
> >>>> Users mailing list
> >>>> Users at ovirt.org
> >>>> http://lists.ovirt.org/mailman/listinfo/users
> >>>>
>