Same problem here: ovirt-ha-broker is at 400% CPU and is defunct, and I can't kill it with -9.
2014-04-23 13:55 GMT+02:00 Martin Sivak <msivak(a)redhat.com>:
Hi,
> Isn't this file created when hosted engine is started?
The file is created by the setup script. If it got lost, then something bad
probably happened in your NFS or Gluster storage.
> Or how can I create this file manually?
I can give you an experimental treatment for this. We do not have any
official way, as this is something that should never happen :)
!! But before you do that, make sure that none of your nodes are currently
running properly. This will destroy and reinitialize the lockspace database for
the whole hosted-engine environment (which you apparently lack, but..). !!
You have to create the ha_agent/hosted-engine.lockspace file with the
expected size (1MB) and then tell sanlock to initialize it as a lockspace
using:
# python
>>> import sanlock
>>> sanlock.write_lockspace(lockspace="hosted-engine",
...     path="/rhev/data-center/mnt/<nfs>/<hosted engine storage domain>/ha_agent/hosted-engine.lockspace",
...     offset=0)
>>>
Then try starting the services (both broker and agent) again.
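For reference, the file-creation step might look roughly like this (a sketch
only; the 1 MB size, vdsm:kvm ownership and 0660 mode are taken from the
directory listing quoted further down, and the mount path is the one of your
hosted-engine storage domain):

# cd /rhev/data-center/mnt/<nfs>/<hosted engine storage domain>/ha_agent/
# dd if=/dev/zero of=hosted-engine.lockspace bs=1M count=1
# chown vdsm:kvm hosted-engine.lockspace
# chmod 0660 hosted-engine.lockspace

and after the sanlock initialization above:

# service ovirt-ha-broker restart
# service ovirt-ha-agent restart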
--
Martin Sivák
msivak(a)redhat.com
Red Hat Czech
RHEV-M SLA / Brno, CZ
----- Original Message -----
> On 04/23/2014 11:08 AM, Martin Sivak wrote:
> > Hi René,
> >
> >>>> libvirtError: Failed to acquire lock: No space left on device
> >
> >>>> 2014-04-22 12:38:17+0200 654 [3093]: r2 cmd_acquire 2,9,5733 invalid
> >>>> lockspace found -1 failed 0 name 2851af27-8744-445d-9fb1-a0d083c8dc82
> >
> > Can you please check the contents of /rhev/data-center/<your nfs
> > mount>/<nfs domain uuid>/ha_agent/?
> >
> > This is how it should look:
> >
> > [root@dev-03 ~]# ls -al /rhev/data-center/mnt/euryale\:_home_ovirt_he/e16de6a2-53f5-4ab3-95a3-255d08398824/ha_agent/
> > total 2036
> > drwxr-x---. 2 vdsm kvm 4096 Mar 19 18:46 .
> > drwxr-xr-x. 6 vdsm kvm 4096 Mar 19 18:46 ..
> > -rw-rw----. 1 vdsm kvm 1048576 Apr 23 11:05 hosted-engine.lockspace
> > -rw-rw----. 1 vdsm kvm 1028096 Mar 19 18:46 hosted-engine.metadata
> >
> > The errors seem to indicate that you somehow lost the lockspace file.
>
> True :)
> Isn't this file created when hosted engine is started? Or how can I
> create this file manually?
>
> >
> > --
> > Martin Sivák
> > msivak(a)redhat.com
> > Red Hat Czech
> > RHEV-M SLA / Brno, CZ
> >
> > ----- Original Message -----
> >> On 04/23/2014 12:28 AM, Doron Fediuck wrote:
> >>> Hi Rene,
> >>> Any idea what closed your ovirtmgmt bridge?
> >>> As long as it is down, vdsm may have issues starting up properly,
> >>> and this is why you see the complaints about the rpc server.
> >>>
> >>> Can you try manually fixing the network part first and then
> >>> restarting vdsm?
> >>> Once vdsm is happy, the hosted engine VM will start.
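> >>> For example, something along these lines (a rough sketch, assuming the
> >>> stock EL6 init scripts):
> >>>
> >>> # ifup ovirtmgmt
> >>> # brctl show ovirtmgmt
> >>> # service vdsmd restart
> >>> # service vdsmd status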
> >>
> >> Thanks for your feedback, Doron.
> >>
> >> My ovirtmgmt bridge seems to be up, or isn't it:
> >> # brctl show ovirtmgmt
> >> bridge name bridge id STP enabled interfaces
> >> ovirtmgmt 8000.0025907587c2 no eth0.200
> >>
> >> # ip a s ovirtmgmt
> >> 7: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue
> >> state UNKNOWN
> >> link/ether 00:25:90:75:87:c2 brd ff:ff:ff:ff:ff:ff
> >> inet 10.0.200.102/24 brd 10.0.200.255 scope global ovirtmgmt
> >> inet6 fe80::225:90ff:fe75:87c2/64 scope link
> >> valid_lft forever preferred_lft forever
> >>
> >> # ip a s eth0.200
> >> 6: eth0.200@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc
> >> noqueue state UP
> >> link/ether 00:25:90:75:87:c2 brd ff:ff:ff:ff:ff:ff
> >> inet6 fe80::225:90ff:fe75:87c2/64 scope link
> >> valid_lft forever preferred_lft forever
> >>
> >> I tried the following yesterday:
> >> I copied the virtual disk from the GlusterFS storage to the local disk of
> >> the host and created a new vm with virt-manager that uses this disk and
> >> the ovirtmgmt bridge. I could reach my engine over the ovirtmgmt bridge
> >> (so the bridge must be working).
> >>
> >> I also started libvirtd with option -v and I saw the following in
> >> libvirtd.log when trying to start the ovirt engine:
> >> 2014-04-22 14:18:25.432+0000: 8901: debug : virCommandRunAsync:2250 :
> >> Command result 0, with PID 11491
> >> 2014-04-22 14:18:25.478+0000: 8901: debug : virCommandRun:2045 : Result
> >> exit status 255, stdout: '' stderr: 'iptables v1.4.7: goto 'FO-vnet0' is
> >> not a chain
> >>
> >> So it could be that something is broken in my hosted-engine network. Do
> >> you have any clue how I can troubleshoot this?
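> >> (I suppose I could at least check whether the libvirt-generated chains
> >> exist at all and restart libvirtd so it recreates them, roughly:
> >>
> >> # iptables-save | grep -i vnet0
> >> # service libvirtd restart
> >>
> >> but I'm not sure that's the right approach.)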
> >>
> >>
> >> Thanks,
> >> René
> >>
> >>
> >>>
> >>> ----- Original Message -----
> >>>> From: "René Koch" <rkoch(a)linuxland.at>
> >>>> To: "Martin Sivak" <msivak(a)redhat.com>
> >>>> Cc: users(a)ovirt.org
> >>>> Sent: Tuesday, April 22, 2014 1:46:38 PM
> >>>> Subject: Re: [ovirt-users] hosted engine health check issues
> >>>>
> >>>> Hi,
> >>>>
> >>>> I rebooted one of my ovirt hosts today and the result is now that I
> >>>> can't start hosted-engine anymore.
> >>>>
> >>>> ovirt-ha-agent isn't running because the lockspace file is missing
> >>>> (sanlock complains about it).
> >>>> So I tried to start hosted-engine with --vm-start and I get the
> >>>> following errors:
> >>>>
> >>>> ==> /var/log/sanlock.log <==
> >>>> 2014-04-22 12:38:17+0200 654 [3093]: r2 cmd_acquire 2,9,5733 invalid
> >>>> lockspace found -1 failed 0 name 2851af27-8744-445d-9fb1-a0d083c8dc82
> >>>>
> >>>> ==> /var/log/messages <==
> >>>> Apr 22 12:38:17 ovirt-host02 sanlock[3079]: 2014-04-22 12:38:17+0200 654
> >>>> [3093]: r2 cmd_acquire 2,9,5733 invalid lockspace found -1 failed 0 name
> >>>> 2851af27-8744-445d-9fb1-a0d083c8dc82
> >>>> Apr 22 12:38:17 ovirt-host02 kernel: ovirtmgmt: port 2(vnet0) entering
> >>>> disabled state
> >>>> Apr 22 12:38:17 ovirt-host02 kernel: device vnet0 left promiscuous mode
> >>>> Apr 22 12:38:17 ovirt-host02 kernel: ovirtmgmt: port 2(vnet0) entering
> >>>> disabled state
> >>>>
> >>>> ==> /var/log/vdsm/vdsm.log <==
> >>>> Thread-21::DEBUG::2014-04-22
> >>>> 12:38:17,563::libvirtconnection::124::root::(wrapper) Unknown
> >>>> libvirterror: ecode: 38 edom: 42 level: 2 message: Failed to acquire
> >>>> lock: No space left on device
> >>>> Thread-21::DEBUG::2014-04-22
> >>>> 12:38:17,563::vm::2263::vm.Vm::(_startUnderlyingVm)
> >>>> vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::_ongoingCreations released
> >>>> Thread-21::ERROR::2014-04-22
> >>>> 12:38:17,564::vm::2289::vm.Vm::(_startUnderlyingVm)
> >>>> vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::The vm start process failed
> >>>> Traceback (most recent call last):
> >>>>   File "/usr/share/vdsm/vm.py", line 2249, in _startUnderlyingVm
> >>>>     self._run()
> >>>>   File "/usr/share/vdsm/vm.py", line 3170, in _run
> >>>>     self._connection.createXML(domxml, flags),
> >>>>   File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py",
> >>>> line 92, in wrapper
> >>>>     ret = f(*args, **kwargs)
> >>>>   File "/usr/lib64/python2.6/site-packages/libvirt.py", line 2665, in
> >>>> createXML
> >>>>     if ret is None:raise libvirtError('virDomainCreateXML() failed',
> >>>> conn=self)
> >>>> libvirtError: Failed to acquire lock: No space left on device
> >>>>
> >>>> ==> /var/log/messages <==
> >>>> Apr 22 12:38:17 ovirt-host02 vdsm vm.Vm ERROR
> >>>> vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::The vm start process
> >>>> failed#012Traceback (most recent call last):#012  File
> >>>> "/usr/share/vdsm/vm.py", line 2249, in _startUnderlyingVm#012
> >>>> self._run()#012  File "/usr/share/vdsm/vm.py", line 3170, in _run#012
> >>>> self._connection.createXML(domxml, flags),#012  File
> >>>> "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 92,
> >>>> in wrapper#012    ret = f(*args, **kwargs)#012  File
> >>>> "/usr/lib64/python2.6/site-packages/libvirt.py", line 2665, in
> >>>> createXML#012    if ret is None:raise libvirtError('virDomainCreateXML()
> >>>> failed', conn=self)#012libvirtError: Failed to acquire lock: No space
> >>>> left on device
> >>>>
> >>>> ==> /var/log/vdsm/vdsm.log <==
> >>>> Thread-21::DEBUG::2014-04-22
> >>>> 12:38:17,569::vm::2731::vm.Vm::(setDownStatus)
> >>>> vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::Changed state to Down:
> >>>> Failed to acquire lock: No space left on device
> >>>>
> >>>>
> >>>> "No space left on device" is nonsense, as there is enough space (I had
> >>>> this issue last time as well, where I had to patch machine.py, but this
> >>>> file is now Python 2.6.6 compatible).
> >>>>
> >>>> Any idea what prevents hosted-engine from starting?
> >>>> ovirt-ha-broker, vdsmd and sanlock are running btw.
> >>>>
> >>>> Btw, I can see in the log that the json rpc server module is missing -
> >>>> which package is required for CentOS 6.5?
> >>>> Apr 22 12:37:14 ovirt-host02 vdsm vds WARNING Unable to load the json
> >>>> rpc server module. Please make sure it is installed.
> >>>>
> >>>>
> >>>> Thanks,
> >>>> René
> >>>>
> >>>>
> >>>>
> >>>> On 04/17/2014 10:02 AM, Martin Sivak wrote:
> >>>>> Hi,
> >>>>>
> >>>>>>>> How can I disable notifications?
> >>>>>
> >>>>> The notification is configured in /etc/ovirt-hosted-engine-ha/broker.conf
> >>>>> section notification.
> >>>>> The email is sent when the key state_transition exists and the string
> >>>>> OldState-NewState contains the (case insensitive) regexp from the value.
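> >>>>> As an illustration only (a sketch; double-check the exact section and
> >>>>> key names in the broker.conf shipped on your hosts):
> >>>>>
> >>>>> [notification]
> >>>>> # a mail is sent when "OldState-NewState" (e.g. EngineDown-EngineStart)
> >>>>> # matches this case insensitive regexp
> >>>>> state_transition=.*
> >>>>>
> >>>>> Pointing the value at a regexp that matches no transition (or removing
> >>>>> the key completely) should therefore stop the mails.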
> >>>>>
> >>>>>>>> Is it intended to send out these messages and detect that ovirt
> >>>>>>>> engine is down (which is false anyway), but not to restart the vm?
> >>>>>
> >>>>> Forget about emails for now and check the
> >>>>> /var/log/ovirt-hosted-engine-ha/agent.log and broker.log (and attach
> >>>>> them as well btw).
> >>>>>
> >>>>>>>> oVirt hosts think that hosted engine is down because it seems that
> >>>>>>>> hosts can't write to hosted-engine.lockspace due to glusterfs issues
> >>>>>>>> (or at least I think so).
> >>>>>
> >>>>> The hosts think so or can't really write there? The lockspace is
> >>>>> managed by sanlock and our HA daemons do not touch it at all. We only
> >>>>> ask sanlock to make sure we have a unique server id.
> >>>>>
> >>>>>>>> Is it possible or planned to make the whole ha feature optional?
> >>>>>
> >>>>> Well, the system won't perform any automatic actions if you put the
> >>>>> hosted engine into global maintenance and only start/stop/migrate the
> >>>>> VM manually.
> >>>>> I would discourage you from stopping agent/broker, because the engine
> >>>>> itself has some logic based on the reporting.
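> >>>>> Global maintenance can be toggled like this (a quick sketch, check
> >>>>> hosted-engine --help for the exact syntax of your version):
> >>>>>
> >>>>> # hosted-engine --set-maintenance --mode=global
> >>>>> # hosted-engine --set-maintenance --mode=none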
> >>>>>
> >>>>> Regards
> >>>>>
> >>>>> --
> >>>>> Martin Sivák
> >>>>> msivak(a)redhat.com
> >>>>> Red Hat Czech
> >>>>> RHEV-M SLA / Brno, CZ
> >>>>>
> >>>>> ----- Original Message -----
> >>>>>> On 04/15/2014 04:53 PM, Jiri Moskovcak wrote:
> >>>>>>> On 04/14/2014 10:50 AM, René Koch wrote:
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> I have some issues with hosted engine status.
> >>>>>>>>
> >>>>>>>> oVirt hosts think that hosted engine is down because it seems that
> >>>>>>>> hosts can't write to hosted-engine.lockspace due to glusterfs issues
> >>>>>>>> (or at least I think so).
> >>>>>>>>
> >>>>>>>> Here's the output of vm-status:
> >>>>>>>>
> >>>>>>>> # hosted-engine --vm-status
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> --== Host 1 status ==--
> >>>>>>>>
> >>>>>>>> Status up-to-date : False
> >>>>>>>> Hostname : 10.0.200.102
> >>>>>>>> Host ID : 1
> >>>>>>>> Engine status : unknown stale-data
> >>>>>>>> Score : 2400
> >>>>>>>> Local maintenance : False
> >>>>>>>> Host timestamp : 1397035677
> >>>>>>>> Extra metadata (valid at timestamp):
> >>>>>>>> metadata_parse_version=1
> >>>>>>>> metadata_feature_version=1
> >>>>>>>> timestamp=1397035677 (Wed Apr 9 11:27:57 2014)
> >>>>>>>> host-id=1
> >>>>>>>> score=2400
> >>>>>>>> maintenance=False
> >>>>>>>> state=EngineUp
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> --== Host 2 status ==--
> >>>>>>>>
> >>>>>>>> Status up-to-date : True
> >>>>>>>> Hostname : 10.0.200.101
> >>>>>>>> Host ID : 2
> >>>>>>>> Engine status : {'reason': 'vm not running on this host',
> >>>>>>>> 'health': 'bad', 'vm': 'down', 'detail': 'unknown'}
> >>>>>>>> Score : 0
> >>>>>>>> Local maintenance : False
> >>>>>>>> Host timestamp : 1397464031
> >>>>>>>> Extra metadata (valid at timestamp):
> >>>>>>>> metadata_parse_version=1
> >>>>>>>> metadata_feature_version=1
> >>>>>>>> timestamp=1397464031 (Mon Apr 14 10:27:11 2014)
> >>>>>>>> host-id=2
> >>>>>>>> score=0
> >>>>>>>> maintenance=False
> >>>>>>>> state=EngineUnexpectedlyDown
> >>>>>>>> timeout=Mon Apr 14 10:35:05 2014
> >>>>>>>>
> >>>>>>>> oVirt engine is sending me 2 emails every 10 minutes with the
> >>>>>>>> following subjects:
> >>>>>>>> - ovirt-hosted-engine state transition EngineDown-EngineStart
> >>>>>>>> - ovirt-hosted-engine state transition EngineStart-EngineUp
> >>>>>>>>
> >>>>>>>> In oVirt webadmin I can see the following message:
> >>>>>>>> VM HostedEngine is down. Exit message: internal error Failed to
> >>>>>>>> acquire lock: error -243.
> >>>>>>>>
> >>>>>>>> These messages are really annoying as oVirt isn't doing anything
> >>>>>>>> with hosted engine - I have an uptime of 9 days in my engine vm.
> >>>>>>>>
> >>>>>>>> So my questions are now:
> >>>>>>>> Is it intended to send out these messages and detect that ovirt
> >>>>>>>> engine is down (which is false anyway), but not to restart the vm?
> >>>>>>>>
> >>>>>>>> How can I disable notifications? I'm planning to write a Nagios
> >>>>>>>> plugin which parses the output of hosted-engine --vm-status and
> >>>>>>>> only Nagios should notify me, not hosted-engine script.
> >>>>>>>>
> >>>>>>>> Is it possible or planned to make the whole ha feature optional? I
> >>>>>>>> really really really hate cluster software as it causes more
> >>>>>>>> troubles than standalone machines, and in my case the hosted-engine
> >>>>>>>> ha feature really causes troubles (and I haven't had a hardware or
> >>>>>>>> network outage yet, only issues with the hosted-engine ha agent). I
> >>>>>>>> don't need any ha feature for hosted engine. I just want to run the
> >>>>>>>> engine virtualized on oVirt, and if the engine vm fails (e.g.
> >>>>>>>> because of issues with a host) I'll restart it on another node.
> >>>>>>>
> >>>>>>> Hi, you can:
> >>>>>>> 1. edit /etc/ovirt-hosted-engine-ha/{agent,broker}-log.conf and
> >>>>>>> tweak the logger as you like
> >>>>>>> 2. or kill ovirt-ha-broker & ovirt-ha-agent services
> >>>>>>
> >>>>>> Thanks for the information.
> >>>>>> So the engine is able to run when ovirt-ha-broker and ovirt-ha-agent
> >>>>>> aren't running?
> >>>>>>
> >>>>>>
> >>>>>> Regards,
> >>>>>> René
> >>>>>>
> >>>>>>>
> >>>>>>> --Jirka
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> René
> >>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>
>
_______________________________________________
Users mailing list
Users(a)ovirt.org
http://lists.ovirt.org/mailman/listinfo/users