[ovirt-users] hosted engine health check issues

René Koch rkoch at linuxland.at
Wed Apr 23 06:56:44 UTC 2014


On 04/23/2014 12:28 AM, Doron Fediuck wrote:
> Hi Rene,
> any idea what closed your ovirtmgmt bridge?
> as long as it is down vdsm may have issues starting up properly
> and this is why you see the complaints on the rpc server.
>
> Can you try manually fixing the network part first and then
> restart vdsm?
> Once vdsm is happy hosted engine VM will start.

Thanks for your feedback, Doron.

My ovirtmgmt bridge seems to be up, or isn't it?
# brctl show ovirtmgmt
bridge name	bridge id		STP enabled	interfaces
ovirtmgmt		8000.0025907587c2	no		eth0.200

# ip a s ovirtmgmt
7: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
     link/ether 00:25:90:75:87:c2 brd ff:ff:ff:ff:ff:ff
     inet 10.0.200.102/24 brd 10.0.200.255 scope global ovirtmgmt
     inet6 fe80::225:90ff:fe75:87c2/64 scope link
        valid_lft forever preferred_lft forever

# ip a s eth0.200
6: eth0.200@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
     link/ether 00:25:90:75:87:c2 brd ff:ff:ff:ff:ff:ff
     inet6 fe80::225:90ff:fe75:87c2/64 scope link
        valid_lft forever preferred_lft forever
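
As a side note, the flags in the output above can be checked mechanically: UP means the interface is administratively enabled and LOWER_UP means the underlying link has carrier. The following is only a minimal Python sketch, assuming the first line of `ip a s <dev>` output has the format shown above; the function names are hypothetical and not part of any oVirt or iproute2 tool.

```python
import re


def interface_flags(ip_output):
    """Return the set of flags found between '<' and '>' in `ip a s` output."""
    match = re.search(r"<([^>]*)>", ip_output)
    if match is None:
        return set()
    return set(match.group(1).split(","))


def link_is_up(ip_output):
    """True if the interface is enabled (UP) and has carrier (LOWER_UP)."""
    return {"UP", "LOWER_UP"} <= interface_flags(ip_output)


# First line of the ovirtmgmt output from this message.
sample = ("7: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> "
          "mtu 1500 qdisc noqueue state UNKNOWN")
print(link_is_up(sample))
```

This only looks at the flags, so it says nothing about whether traffic actually passes through the bridge.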

I tried the following yesterday:
I copied the virtual disk from GlusterFS storage to the local disk of the
host and created a new VM with virt-manager which loads the ovirtmgmt disk.
I could reach my engine over the ovirtmgmt bridge (so the bridge must be
working).

I also started libvirtd with option -v and saw the following in
libvirtd.log when trying to start the oVirt engine:
2014-04-22 14:18:25.432+0000: 8901: debug : virCommandRunAsync:2250 : Command result 0, with PID 11491
2014-04-22 14:18:25.478+0000: 8901: debug : virCommandRun:2045 : Result exit status 255, stdout: '' stderr: 'iptables v1.4.7: goto 'FO-vnet0' is not a chain

So it could be that something is broken in my hosted-engine network. Do 
you have any clue how I can troubleshoot this?


Thanks,
René


>
> ----- Original Message -----
>> From: "René Koch" <rkoch at linuxland.at>
>> To: "Martin Sivak" <msivak at redhat.com>
>> Cc: users at ovirt.org
>> Sent: Tuesday, April 22, 2014 1:46:38 PM
>> Subject: Re: [ovirt-users] hosted engine health check issues
>>
>> Hi,
>>
>> I rebooted one of my ovirt hosts today and the result is now that I
>> can't start hosted-engine anymore.
>>
>> ovirt-ha-agent isn't running because the lockspace file is missing
>> (sanlock complains about it).
>> So I tried to start hosted-engine with --vm-start and I get the
>> following errors:
>>
>> ==> /var/log/sanlock.log <==
>> 2014-04-22 12:38:17+0200 654 [3093]: r2 cmd_acquire 2,9,5733 invalid lockspace found -1 failed 0 name 2851af27-8744-445d-9fb1-a0d083c8dc82
>>
>> ==> /var/log/messages <==
>> Apr 22 12:38:17 ovirt-host02 sanlock[3079]: 2014-04-22 12:38:17+0200 654 [3093]: r2 cmd_acquire 2,9,5733 invalid lockspace found -1 failed 0 name 2851af27-8744-445d-9fb1-a0d083c8dc82
>> Apr 22 12:38:17 ovirt-host02 kernel: ovirtmgmt: port 2(vnet0) entering disabled state
>> Apr 22 12:38:17 ovirt-host02 kernel: device vnet0 left promiscuous mode
>> Apr 22 12:38:17 ovirt-host02 kernel: ovirtmgmt: port 2(vnet0) entering disabled state
>>
>> ==> /var/log/vdsm/vdsm.log <==
>> Thread-21::DEBUG::2014-04-22 12:38:17,563::libvirtconnection::124::root::(wrapper) Unknown libvirterror: ecode: 38 edom: 42 level: 2 message: Failed to acquire lock: No space left on device
>> Thread-21::DEBUG::2014-04-22 12:38:17,563::vm::2263::vm.Vm::(_startUnderlyingVm) vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::_ongoingCreations released
>> Thread-21::ERROR::2014-04-22 12:38:17,564::vm::2289::vm.Vm::(_startUnderlyingVm) vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::The vm start process failed
>> Traceback (most recent call last):
>>     File "/usr/share/vdsm/vm.py", line 2249, in _startUnderlyingVm
>>       self._run()
>>     File "/usr/share/vdsm/vm.py", line 3170, in _run
>>       self._connection.createXML(domxml, flags),
>>     File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 92, in wrapper
>>       ret = f(*args, **kwargs)
>>     File "/usr/lib64/python2.6/site-packages/libvirt.py", line 2665, in createXML
>>       if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self)
>> libvirtError: Failed to acquire lock: No space left on device
>>
>> ==> /var/log/messages <==
>> Apr 22 12:38:17 ovirt-host02 vdsm vm.Vm ERROR
>> vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::The vm start process
>> failed#012Traceback (most recent call last):#012  File
>> "/usr/share/vdsm/vm.py", line 2249, in _startUnderlyingVm#012
>> self._run()#012  File "/usr/share/vdsm/vm.py", line 3170, in _run#012
>>    self._connection.createXML(domxml, flags),#012  File
>> "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 92,
>> in wrapper#012    ret = f(*args, **kwargs)#012  File
>> "/usr/lib64/python2.6/site-packages/libvirt.py", line 2665, in
>> createXML#012    if ret is None:raise libvirtError('virDomainCreateXML()
>> failed', conn=self)#012libvirtError: Failed to acquire lock: No space
>> left on device
>>
>> ==> /var/log/vdsm/vdsm.log <==
>> Thread-21::DEBUG::2014-04-22 12:38:17,569::vm::2731::vm.Vm::(setDownStatus) vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::Changed state to Down: Failed to acquire lock: No space left on device
>>
>>
>> "No space left on device" is nonsense as there is enough space (I had this
>> issue last time as well where I had to patch machine.py, but this file
>> is now Python 2.6.6 compatible).
>>
>> Any idea what prevents hosted-engine from starting?
>> ovirt-ha-broker, vdsmd and sanlock are running btw.
>>
>> Btw, I can see in log that json rpc server module is missing - which
>> package is required for CentOS 6.5?
>> Apr 22 12:37:14 ovirt-host02 vdsm vds WARNING Unable to load the json rpc server module. Please make sure it is installed.
>>
>>
>> Thanks,
>> René
>>
>>
>>
>> On 04/17/2014 10:02 AM, Martin Sivak wrote:
>>> Hi,
>>>
>>>>>> How can I disable notifications?
>>>
>>> The notification is configured in /etc/ovirt-hosted-engine-ha/broker.conf
>>> section notification.
>>> The email is sent when the key state_transition exists and the string
>>> OldState-NewState contains the (case insensitive) regexp from the value.
>>>
>>>>>> Is it intended to send out these messages and detect that ovirt engine
>>>>>> is down (which is false anyway), but not to restart the vm?
>>>
>>> Forget about emails for now and check the
>>> /var/log/ovirt-hosted-engine-ha/agent.log and broker.log (and attach them
>>> as well btw).
>>>
>>>>>> oVirt hosts think that hosted engine is down because it seems that hosts
>>>>>> can't write to hosted-engine.lockspace due to glusterfs issues (or at
>>>>>> least I think so).
>>>
>>> The hosts think so or can't really write there? The lockspace is managed by
>>> sanlock and our HA daemons do not touch it at all. We only ask sanlock to
>>> make sure we have a unique server id.
>>>
>>>>>> Is it possible or planned to make the whole ha feature optional?
>>>
>>> Well the system won't perform any automatic actions if you put the hosted
>>> engine to global maintenance and only start/stop/migrate the VM manually.
>>> I would discourage you from stopping agent/broker, because the engine
>>> itself has some logic based on the reporting.
>>>
>>> Regards
>>>
>>> --
>>> Martin Sivák
>>> msivak at redhat.com
>>> Red Hat Czech
>>> RHEV-M SLA / Brno, CZ
>>>
>>> ----- Original Message -----
>>>> On 04/15/2014 04:53 PM, Jiri Moskovcak wrote:
>>>>> On 04/14/2014 10:50 AM, René Koch wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I have some issues with hosted engine status.
>>>>>>
>>>>>> oVirt hosts think that hosted engine is down because it seems that hosts
>>>>>> can't write to hosted-engine.lockspace due to glusterfs issues (or at
>>>>>> least I think so).
>>>>>>
>>>>>> Here's the output of vm-status:
>>>>>>
>>>>>> # hosted-engine --vm-status
>>>>>>
>>>>>>
>>>>>> --== Host 1 status ==--
>>>>>>
>>>>>> Status up-to-date                  : False
>>>>>> Hostname                           : 10.0.200.102
>>>>>> Host ID                            : 1
>>>>>> Engine status                      : unknown stale-data
>>>>>> Score                              : 2400
>>>>>> Local maintenance                  : False
>>>>>> Host timestamp                     : 1397035677
>>>>>> Extra metadata (valid at timestamp):
>>>>>>        metadata_parse_version=1
>>>>>>        metadata_feature_version=1
>>>>>>        timestamp=1397035677 (Wed Apr  9 11:27:57 2014)
>>>>>>        host-id=1
>>>>>>        score=2400
>>>>>>        maintenance=False
>>>>>>        state=EngineUp
>>>>>>
>>>>>>
>>>>>> --== Host 2 status ==--
>>>>>>
>>>>>> Status up-to-date                  : True
>>>>>> Hostname                           : 10.0.200.101
>>>>>> Host ID                            : 2
>>>>>> Engine status                      : {'reason': 'vm not running on this host', 'health': 'bad', 'vm': 'down', 'detail': 'unknown'}
>>>>>> Score                              : 0
>>>>>> Local maintenance                  : False
>>>>>> Host timestamp                     : 1397464031
>>>>>> Extra metadata (valid at timestamp):
>>>>>>        metadata_parse_version=1
>>>>>>        metadata_feature_version=1
>>>>>>        timestamp=1397464031 (Mon Apr 14 10:27:11 2014)
>>>>>>        host-id=2
>>>>>>        score=0
>>>>>>        maintenance=False
>>>>>>        state=EngineUnexpectedlyDown
>>>>>>        timeout=Mon Apr 14 10:35:05 2014
>>>>>>
>>>>>> oVirt engine is sending me 2 emails every 10 minutes with the following
>>>>>> subjects:
>>>>>> - ovirt-hosted-engine state transition EngineDown-EngineStart
>>>>>> - ovirt-hosted-engine state transition EngineStart-EngineUp
>>>>>>
>>>>>> In oVirt webadmin I can see the following message:
>>>>>> VM HostedEngine is down. Exit message: internal error Failed to acquire
>>>>>> lock: error -243.
>>>>>>
>>>>>> These messages are really annoying as oVirt isn't doing anything with
>>>>>> hosted engine - I have an uptime of 9 days in my engine vm.
>>>>>>
>>>>>> So my questions are now:
>>>>>> Is it intended to send out these messages and detect that ovirt engine
>>>>>> is down (which is false anyway), but not to restart the vm?
>>>>>>
>>>>>> How can I disable notifications? I'm planning to write a Nagios plugin
>>>>>> which parses the output of hosted-engine --vm-status and only Nagios
>>>>>> should notify me, not hosted-engine script.
>>>>>>
>>>>>> Is it possible or planned to make the whole ha feature optional? I
>>>>>> really really really hate cluster software as it causes more troubles
>>>>>> than standalone machines and in my case the hosted-engine ha feature
>>>>>> really causes troubles (and I didn't have a hardware or network outage
>>>>>> yet, only issues with hosted-engine ha agent). I don't need any ha
>>>>>> feature for hosted engine. I just want to run engine virtualized on
>>>>>> oVirt and if engine vm fails (e.g. because of issues with a host) I'll
>>>>>> restart it on another node.
>>>>>
>>>>> Hi, you can:
>>>>> 1. edit /etc/ovirt-hosted-engine-ha/{agent,broker}-log.conf and tweak
>>>>> the logger as you like
>>>>> 2. or kill ovirt-ha-broker & ovirt-ha-agent services
>>>>
>>>> Thanks for the information.
>>>> So engine is able to run when ovirt-ha-broker and ovirt-ha-agent aren't
>>>> running?
>>>>
>>>>
>>>> Regards,
>>>> René
>>>>
>>>>>
>>>>> --Jirka
>>>>>>
>>>>>> Thanks,
>>>>>> René
>>>>>>
>>>>>>
>>>>>
>>>> _______________________________________________
>>>> Users mailing list
>>>> Users at ovirt.org
>>>> http://lists.ovirt.org/mailman/listinfo/users
>>>>
>>
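
The Nagios plugin idea from the first message quoted above (parsing the output of hosted-engine --vm-status) could start from something like the following. This is only a rough Python sketch, assuming the output format matches the sample in this thread. The function names and the decision policy (OK only if a host with up-to-date data reports state=EngineUp) are hypothetical choices for illustration, not anything defined by oVirt. The exit codes 0/1/2/3 are the usual Nagios plugin conventions.

```python
import re

# Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

HOST_HEADER = re.compile(r"--== Host (\d+) status ==--")
FIELD = re.compile(r"^(.+?)\s+:\s+(.*)$")


def parse_vm_status(text):
    """Parse vm-status text into {host_number: {field: value, "extra": {...}}}."""
    hosts = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = HOST_HEADER.search(line)
        if header:
            current = {"extra": {}}
            hosts[int(header.group(1))] = current
            continue
        if current is None or not line:
            continue
        field = FIELD.match(line)
        if field:
            # "Name   : value" lines, e.g. "Score   : 2400"
            current[field.group(1)] = field.group(2)
        elif "=" in line:
            # "key=value" lines from the extra metadata block
            key, _, value = line.partition("=")
            current["extra"][key] = value
    return hosts


def nagios_state(hosts):
    """Hypothetical policy: trust only hosts whose status is up-to-date."""
    if not hosts:
        return UNKNOWN, "no host status found"
    fresh = [h for h in hosts.values() if h.get("Status up-to-date") == "True"]
    if not fresh:
        return UNKNOWN, "all host data is stale"
    if any(h["extra"].get("state") == "EngineUp" for h in fresh):
        return OK, "engine is up"
    return CRITICAL, "no up-to-date host reports EngineUp"


# Abbreviated sample taken from the output quoted in this thread.
SAMPLE = """
--== Host 1 status ==--

Status up-to-date                  : False
Hostname                           : 10.0.200.102
Engine status                      : unknown stale-data
Score                              : 2400
Extra metadata (valid at timestamp):
       host-id=1
       state=EngineUp

--== Host 2 status ==--

Status up-to-date                  : True
Hostname                           : 10.0.200.101
Engine status                      : {'reason': 'vm not running on this host', 'health': 'bad', 'vm': 'down', 'detail': 'unknown'}
Score                              : 0
Extra metadata (valid at timestamp):
       host-id=2
       state=EngineUnexpectedlyDown
"""

code, message = nagios_state(parse_vm_status(SAMPLE))
print(code, message)
```

In a real plugin the text would come from running hosted-engine --vm-status on the host, and the returned code would be passed to sys.exit so that Nagios picks it up.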

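The notification rule described in the quoted reply (a mail is sent when the state_transition key exists and the string OldState-NewState contains the case-insensitive regexp from its value) can be illustrated as follows. This is a sketch of the described matching only, not the broker's actual code; the function name and parameters are hypothetical.

```python
import re


def should_notify(old_state, new_state, pattern):
    """True if the pattern is found (case-insensitively) in 'Old-New'."""
    transition = "{0}-{1}".format(old_state, new_state)
    return re.search(pattern, transition, re.IGNORECASE) is not None


# The two transitions from the mail subjects quoted in this thread.
print(should_notify("EngineDown", "EngineStart", "enginestart"))
print(should_notify("EngineStart", "EngineUp", "maintenance"))
```
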

