Hi René,
>>> libvirtError: Failed to acquire lock: No space left on device
>>> 2014-04-22 12:38:17+0200 654 [3093]: r2 cmd_acquire 2,9,5733 invalid
>>> lockspace found -1 failed 0 name 2851af27-8744-445d-9fb1-a0d083c8dc82
Can you please check the contents of /rhev/data-center/<your nfs mount>/<nfs
domain uuid>/ha_agent/?
This is how it should look:
[root@dev-03 ~]# ls -al
/rhev/data-center/mnt/euryale\:_home_ovirt_he/e16de6a2-53f5-4ab3-95a3-255d08398824/ha_agent/
total 2036
drwxr-x---. 2 vdsm kvm 4096 Mar 19 18:46 .
drwxr-xr-x. 6 vdsm kvm 4096 Mar 19 18:46 ..
-rw-rw----. 1 vdsm kvm 1048576 Apr 23 11:05 hosted-engine.lockspace
-rw-rw----. 1 vdsm kvm 1028096 Mar 19 18:46 hosted-engine.metadata
The errors seem to indicate that you somehow lost the lockspace file.
True :)
Isn't this file created when hosted engine is started? Or how can I
create this file manually?
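A minimal way to recreate the file by hand (just a sketch, assuming the same
path, size and ownership as in the listing above) would be something like:

# cd /rhev/data-center/<your nfs mount>/<nfs domain uuid>/ha_agent/
# dd if=/dev/zero of=hosted-engine.lockspace bs=1M count=1
# chown vdsm:kvm hosted-engine.lockspace
# chmod 0660 hosted-engine.lockspace

and then restart ovirt-ha-agent. If sanlock still complains, something like
"sanlock direct init -s <lockspace name>:0:<path to hosted-engine.lockspace>:0"
can initialise the lockspace explicitly (the lockspace name and paths above
are placeholders).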
--
Martin Sivák
msivak(a)redhat.com
Red Hat Czech
RHEV-M SLA / Brno, CZ
----- Original Message -----
> On 04/23/2014 12:28 AM, Doron Fediuck wrote:
>> Hi Rene,
>> any idea what took your ovirtmgmt bridge down?
>> As long as it is down, vdsm may have issues starting up properly,
>> and this is why you see the complaints about the rpc server.
>>
>> Can you try manually fixing the network part first and then
>> restart vdsm?
>> Once vdsm is happy hosted engine VM will start.
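>>
>> For example, roughly (just a sketch, adjust to your setup):
>> # service vdsmd restart
>> # hosted-engine --vm-start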
>
> Thanks for your feedback, Doron.
>
> My ovirtmgmt bridge seems to be up, or isn't it:
> # brctl show ovirtmgmt
> bridge name bridge id STP enabled interfaces
> ovirtmgmt 8000.0025907587c2 no eth0.200
>
> # ip a s ovirtmgmt
> 7: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue
> state UNKNOWN
> link/ether 00:25:90:75:87:c2 brd ff:ff:ff:ff:ff:ff
> inet 10.0.200.102/24 brd 10.0.200.255 scope global ovirtmgmt
> inet6 fe80::225:90ff:fe75:87c2/64 scope link
> valid_lft forever preferred_lft forever
>
> # ip a s eth0.200
> 6: eth0.200@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc
> noqueue state UP
> link/ether 00:25:90:75:87:c2 brd ff:ff:ff:ff:ff:ff
> inet6 fe80::225:90ff:fe75:87c2/64 scope link
> valid_lft forever preferred_lft forever
>
> I tried the following yesterday:
> I copied the virtual disk from GlusterFS storage to the local disk of the
> host and created a new vm with virt-manager that uses this disk and the
> ovirtmgmt bridge. I could reach my engine over the ovirtmgmt bridge (so the
> bridge must be working).
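> (Roughly the command-line equivalent of what I did in virt-manager; the
> name and paths here are only placeholders:
> # virt-install --import --name engine-test --ram 4096 \
>     --disk path=/var/tmp/engine-disk-copy.img --network bridge=ovirtmgmt
> )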
>
> I also started libvirtd with option -v and saw the following in
> libvirtd.log when trying to start the ovirt engine:
> 2014-04-22 14:18:25.432+0000: 8901: debug : virCommandRunAsync:2250 :
> Command result 0, with PID 11491
> 2014-04-22 14:18:25.478+0000: 8901: debug : virCommandRun:2045 : Result
> exit status 255, stdout: '' stderr: 'iptables v1.4.7: goto 'FO-vnet0' is
> not a chain
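>
> (If it helps: I guess the first thing to check is whether that chain exists
> at all, e.g. with
> # iptables -nL FO-vnet0
> but I don't know what is supposed to create it.)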
>
> So it could be that something is broken in my hosted-engine network. Do
> you have any clue how I can troubleshoot this?
>
>
> Thanks,
> René
>
>
>>
>> ----- Original Message -----
>>> From: "René Koch" <rkoch(a)linuxland.at>
>>> To: "Martin Sivak" <msivak(a)redhat.com>
>>> Cc: users(a)ovirt.org
>>> Sent: Tuesday, April 22, 2014 1:46:38 PM
>>> Subject: Re: [ovirt-users] hosted engine health check issues
>>>
>>> Hi,
>>>
>>> I rebooted one of my ovirt hosts today and now I can't start
>>> hosted-engine anymore.
>>>
>>> ovirt-ha-agent isn't running because the lockspace file is missing
>>> (sanlock complains about it).
>>> So I tried to start hosted-engine with --vm-start and I get the
>>> following errors:
>>>
>>> ==> /var/log/sanlock.log <==
>>> 2014-04-22 12:38:17+0200 654 [3093]: r2 cmd_acquire 2,9,5733 invalid
>>> lockspace found -1 failed 0 name 2851af27-8744-445d-9fb1-a0d083c8dc82
>>>
>>> ==> /var/log/messages <==
>>> Apr 22 12:38:17 ovirt-host02 sanlock[3079]: 2014-04-22 12:38:17+0200 654
>>> [3093]: r2 cmd_acquire 2,9,5733 invalid lockspace found -1 failed 0 name
>>> 2851af27-8744-445d-9fb1-a0d083c8dc82
>>> Apr 22 12:38:17 ovirt-host02 kernel: ovirtmgmt: port 2(vnet0) entering
>>> disabled state
>>> Apr 22 12:38:17 ovirt-host02 kernel: device vnet0 left promiscuous mode
>>> Apr 22 12:38:17 ovirt-host02 kernel: ovirtmgmt: port 2(vnet0) entering
>>> disabled state
>>>
>>> ==> /var/log/vdsm/vdsm.log <==
>>> Thread-21::DEBUG::2014-04-22
>>> 12:38:17,563::libvirtconnection::124::root::(wrapper) Unknown
>>> libvirterror: ecode: 38 edom: 42 level: 2 message: Failed to acquire
>>> lock: No space left on device
>>> Thread-21::DEBUG::2014-04-22
>>> 12:38:17,563::vm::2263::vm.Vm::(_startUnderlyingVm)
>>> vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::_ongoingCreations released
>>> Thread-21::ERROR::2014-04-22
>>> 12:38:17,564::vm::2289::vm.Vm::(_startUnderlyingVm)
>>> vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::The vm start process failed
>>> Traceback (most recent call last):
>>> File "/usr/share/vdsm/vm.py", line 2249, in
_startUnderlyingVm
>>> self._run()
>>> File "/usr/share/vdsm/vm.py", line 3170, in _run
>>> self._connection.createXML(domxml, flags),
>>> File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 92, in wrapper
>>> ret = f(*args, **kwargs)
>>> File "/usr/lib64/python2.6/site-packages/libvirt.py", line
2665, in
>>> createXML
>>> if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self)
>>> libvirtError: Failed to acquire lock: No space left on device
>>>
>>> ==> /var/log/messages <==
>>> Apr 22 12:38:17 ovirt-host02 vdsm vm.Vm ERROR
>>> vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::The vm start process
>>> failed#012Traceback (most recent call last):#012 File
>>> "/usr/share/vdsm/vm.py", line 2249, in _startUnderlyingVm#012
>>> self._run()#012 File "/usr/share/vdsm/vm.py", line 3170, in _run#012
>>> self._connection.createXML(domxml, flags),#012 File
>>> "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 92,
>>> in wrapper#012 ret = f(*args, **kwargs)#012 File
>>> "/usr/lib64/python2.6/site-packages/libvirt.py", line 2665, in
>>> createXML#012 if ret is None:raise libvirtError('virDomainCreateXML()
>>> failed', conn=self)#012libvirtError: Failed to acquire lock: No space
>>> left on device
>>>
>>> ==> /var/log/vdsm/vdsm.log <==
>>> Thread-21::DEBUG::2014-04-22
>>> 12:38:17,569::vm::2731::vm.Vm::(setDownStatus)
>>> vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::Changed state to Down:
>>> Failed to acquire lock: No space left on device
>>>
>>>
>>> "No space left on device" is nonsense, as there is enough space (I had
>>> this issue last time as well, where I had to patch machine.py, but this
>>> file is now Python 2.6.6 compatible).
>>>
>>> Any idea what prevents hosted-engine from starting?
>>> ovirt-ha-broker, vdsmd and sanlock are running btw.
>>>
>>> Btw, I can see in the log that the json rpc server module is missing -
>>> which package is required for CentOS 6.5?
>>> Apr 22 12:37:14 ovirt-host02 vdsm vds WARNING Unable to load the json
>>> rpc server module. Please make sure it is installed.
>>>
>>>
>>> Thanks,
>>> René
>>>
>>>
>>>
>>> On 04/17/2014 10:02 AM, Martin Sivak wrote:
>>>> Hi,
>>>>
>>>>>>> How can I disable notifications?
>>>>
>>>> The notification is configured in /etc/ovirt-hosted-engine-ha/broker.conf,
>>>> section notification.
>>>> The email is sent when the key state_transition exists and the string
>>>> OldState-NewState matches the (case insensitive) regexp given as its value.
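>>>>
>>>> For example (just a sketch, exact key names may differ slightly in your
>>>> version of ovirt-hosted-engine-ha):
>>>>
>>>> [notification]
>>>> state_transition=maintenance|start|stop|migrate|up|down
>>>>
>>>> Setting the regexp to a value that never matches (or removing the key)
>>>> should effectively disable the emails.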
>>>>
>>>>>>> Is it intended to send out these messages and detect that ovirt
>>>>>>> engine is down (which is false anyway), but not to restart the vm?
>>>>
>>>> Forget about emails for now and check the
>>>> /var/log/ovirt-hosted-engine-ha/agent.log and broker.log (and attach
>>>> them as well btw).
>>>>
>>>>>>> oVirt hosts think that hosted engine is down because it seems that
>>>>>>> hosts can't write to hosted-engine.lockspace due to glusterfs issues
>>>>>>> (or at least I think so).
>>>>
>>>> Do the hosts just think so, or can they really not write there? The
>>>> lockspace is managed by sanlock and our HA daemons do not touch it at
>>>> all. We only ask sanlock to make sure we have a unique server id.
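>>>> (You can check what sanlock currently holds on the host with e.g.
>>>> "sanlock client status".)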
>>>>
>>>>>>> Is it possible or planned to make the whole ha feature optional?
>>>>
>>>> Well, the system won't perform any automatic actions if you put the
>>>> hosted engine into global maintenance and only start/stop/migrate the
>>>> VM manually.
>>>> I would discourage you from stopping agent/broker, because the engine
>>>> itself has some logic based on the reporting.
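>>>>
>>>> To enter or leave global maintenance you can use e.g.
>>>> hosted-engine --set-maintenance --mode=global (and --mode=none to
>>>> leave it again).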
>>>>
>>>> Regards
>>>>
>>>> --
>>>> Martin Sivák
>>>> msivak(a)redhat.com
>>>> Red Hat Czech
>>>> RHEV-M SLA / Brno, CZ
>>>>
>>>> ----- Original Message -----
>>>>> On 04/15/2014 04:53 PM, Jiri Moskovcak wrote:
>>>>>> On 04/14/2014 10:50 AM, René Koch wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I have some issues with hosted engine status.
>>>>>>>
>>>>>>> oVirt hosts think that hosted engine is down because it seems that
>>>>>>> hosts can't write to hosted-engine.lockspace due to glusterfs issues
>>>>>>> (or at least I think so).
>>>>>>>
>>>>>>> Here's the output of vm-status:
>>>>>>>
>>>>>>> # hosted-engine --vm-status
>>>>>>>
>>>>>>>
>>>>>>> --== Host 1 status ==--
>>>>>>>
>>>>>>> Status up-to-date : False
>>>>>>> Hostname : 10.0.200.102
>>>>>>> Host ID : 1
>>>>>>> Engine status : unknown stale-data
>>>>>>> Score : 2400
>>>>>>> Local maintenance : False
>>>>>>> Host timestamp : 1397035677
>>>>>>> Extra metadata (valid at timestamp):
>>>>>>> metadata_parse_version=1
>>>>>>> metadata_feature_version=1
>>>>>>> timestamp=1397035677 (Wed Apr 9 11:27:57 2014)
>>>>>>> host-id=1
>>>>>>> score=2400
>>>>>>> maintenance=False
>>>>>>> state=EngineUp
>>>>>>>
>>>>>>>
>>>>>>> --== Host 2 status ==--
>>>>>>>
>>>>>>> Status up-to-date : True
>>>>>>> Hostname : 10.0.200.101
>>>>>>> Host ID : 2
>>>>>>> Engine status : {'reason': 'vm not running on this host',
>>>>>>> 'health': 'bad', 'vm': 'down', 'detail': 'unknown'}
>>>>>>> Score : 0
>>>>>>> Local maintenance : False
>>>>>>> Host timestamp : 1397464031
>>>>>>> Extra metadata (valid at timestamp):
>>>>>>> metadata_parse_version=1
>>>>>>> metadata_feature_version=1
>>>>>>> timestamp=1397464031 (Mon Apr 14 10:27:11 2014)
>>>>>>> host-id=2
>>>>>>> score=0
>>>>>>> maintenance=False
>>>>>>> state=EngineUnexpectedlyDown
>>>>>>> timeout=Mon Apr 14 10:35:05 2014
>>>>>>>
>>>>>>> oVirt engine is sending me 2 emails every 10 minutes with the
>>>>>>> following subjects:
>>>>>>> - ovirt-hosted-engine state transition EngineDown-EngineStart
>>>>>>> - ovirt-hosted-engine state transition EngineStart-EngineUp
>>>>>>>
>>>>>>> In oVirt webadmin I can see the following message:
>>>>>>> VM HostedEngine is down. Exit message: internal error Failed to
>>>>>>> acquire lock: error -243.
>>>>>>>
>>>>>>> These messages are really annoying as oVirt isn't doing anything with
>>>>>>> hosted engine - I have an uptime of 9 days in my engine vm.
>>>>>>>
>>>>>>> So my questions are now:
>>>>>>> Is it intended to send out these messages and detect that ovirt
>>>>>>> engine is down (which is false anyway), but not to restart the vm?
>>>>>>>
>>>>>>> How can I disable notifications? I'm planning to write a Nagios
>>>>>>> plugin which parses the output of hosted-engine --vm-status, and only
>>>>>>> Nagios should notify me, not the hosted-engine script.
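>>>>>>> (Probably just grepping the "Engine status" lines, e.g.
>>>>>>> hosted-engine --vm-status | grep 'Engine status', as a starting
>>>>>>> point.)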
>>>>>>>
>>>>>>> Is it possible or planned to make the whole ha feature optional? I
>>>>>>> really really really hate cluster software, as it causes more
>>>>>>> trouble than standalone machines, and in my case the hosted-engine ha
>>>>>>> feature really causes trouble (I haven't had a hardware or network
>>>>>>> outage yet, only issues with the hosted-engine ha agent). I don't
>>>>>>> need any ha feature for hosted engine. I just want to run the engine
>>>>>>> virtualized on oVirt, and if the engine vm fails (e.g. because of
>>>>>>> issues with a host) I'll restart it on another node.
>>>>>>
>>>>>> Hi, you can:
>>>>>> 1. edit /etc/ovirt-hosted-engine-ha/{agent,broker}-log.conf and tweak
>>>>>> the logger as you like
>>>>>> 2. or kill ovirt-ha-broker & ovirt-ha-agent services
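>>>>>> (on EL6 that would be: service ovirt-ha-agent stop; service
>>>>>> ovirt-ha-broker stop)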
>>>>>
>>>>> Thanks for the information.
>>>>> So the engine is able to run when ovirt-ha-broker and ovirt-ha-agent
>>>>> aren't running?
>>>>>
>>>>>
>>>>> Regards,
>>>>> René
>>>>>
>>>>>>
>>>>>> --Jirka
>>>>>>>
>>>>>>> Thanks,
>>>>>>> René
>>>>>>>
>>>>>>>
>>>>>>
>