[ovirt-users] Hosted-Engine HA problem

Jiri Moskovcak jmoskovc at redhat.com
Fri Oct 31 08:32:02 UTC 2014


On 10/31/2014 03:53 AM, Jaicel R. Sabonsolin wrote:
> Hi guys,
>
> These logs appear on both hosts, just like the output of --vm-status. I ran tcpdump on the oVirt hosts and the gluster nodes, but only packets exchanged with my monitoring VM (zabbix) appeared.
>
> agent.log
>      new_data = self.refresh(self._state.data)
>    File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/state_machine.py", line 77, in refresh
>      stats.update(self.hosted_engine.collect_stats())
>    File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 662, in collect_stats
>      constants.SERVICE_TYPE)
>    File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 171, in get_stats_from_storage
>      result = self._checked_communicate(request)
>    File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 199, in _checked_communicate
>      .format(message or response))
> RequestError: Request failed: <type 'exceptions.OSError'>
>
> broker.log
>    File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 165, in handle
>      response = "success " + self._dispatch(data)
>    File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 261, in _dispatch
>      .get_all_stats_for_service_type(**options)
>    File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 41, in get_all_stats_for_service_type
>      d = self.get_raw_stats_for_service_type(storage_dir, service_type)
>    File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 74, in get_raw_stats_for_service_type
>      f = os.open(path, direct_flag | os.O_RDONLY)
> OSError: [Errno 24] Too many open files: '/rhev/data-center/mnt/gluster1:_engine/6eb220be-daff-4785-8f78-111cc24139c4/ha_agent/hosted-engine.metadata'

- ah, there we go ^^^^^^ you might need to raise the limit on open 
files as described here [1], or find out which application keeps so many 
files open
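
To see which of the two applies, it helps to compare the number of open descriptors with the limit. The following is only a rough sketch, assuming a Linux host with /proc mounted; the helper names (count_open_fds, open_fd_targets, nofile_limits) are made up for illustration and are not part of ovirt-hosted-engine-ha:

```python
import os
import resource
from collections import Counter


def count_open_fds(pid="self"):
    """Return the number of open file descriptors of a process (Linux /proc)."""
    return len(os.listdir("/proc/%s/fd" % pid))


def open_fd_targets(pid="self"):
    """Count open descriptors per target path, most frequent first."""
    fd_dir = "/proc/%s/fd" % pid
    targets = Counter()
    for name in os.listdir(fd_dir):
        try:
            targets[os.readlink(os.path.join(fd_dir, name))] += 1
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
    return targets.most_common()


def nofile_limits():
    """Return the (soft, hard) open-files limit of the *current* process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print("open fds: %d, soft limit: %d, hard limit: %d"
          % (count_open_fds(), soft, hard))
    # Show the five targets that hold the most descriptors.
    for target, count in open_fd_targets()[:5]:
        print("%5d  %s" % (count, target))
```

To inspect the broker instead of the script itself, pass the broker's PID to count_open_fds() and open_fd_targets() (run as root or as the user owning the process). If a single path such as hosted-engine.metadata dominates the list and the count keeps growing, that points to a descriptor leak rather than a limit that is simply too low. The limits of another process can be read from /proc/<pid>/limits.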


--Jirka

[1] 
http://www.cyberciti.biz/faq/linux-increase-the-maximum-number-of-open-files/

> Thread-38160::INFO::2014-10-31 10:28:37,989::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
> Thread-38161::INFO::2014-10-31 10:28:53,656::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
> Thread-38161::ERROR::2014-10-31 10:28:53,657::listener::190::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Error handling request, data: 'get-stats storage_dir=/rhev/data-center/mnt/gluster1:_engine/6eb220be-daff-4785-8f78-111cc24139c4/ha_agent service_type=hosted-engine'
> Traceback (most recent call last):
>    File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 165, in handle
>      response = "success " + self._dispatch(data)
>    File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 261, in _dispatch
>      .get_all_stats_for_service_type(**options)
>    File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 41, in get_all_stats_for_service_type
>      d = self.get_raw_stats_for_service_type(storage_dir, service_type)
>    File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 74, in get_raw_stats_for_service_type
>      f = os.open(path, direct_flag | os.O_RDONLY)
> OSError: [Errno 24] Too many open files: '/rhev/data-center/mnt/gluster1:_engine/6eb220be-daff-4785-8f78-111cc24139c4/ha_agent/hosted-engine.metadata'
> Thread-38161::INFO::2014-10-31 10:28:53,658::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
>
> Thanks,
> Jaicel
>
> ----- Original Message -----
> From: "Niels de Vos" <ndevos at redhat.com>
> To: "Vijay Bellur" <vbellur at redhat.com>
> Cc: "Jiri Moskovcak" <jmoskovc at redhat.com>, "Jaicel R. Sabonsolin" <jaicel at asti.dost.gov.ph>, users at ovirt.org, "Gluster Devel" <gluster-devel at gluster.org>
> Sent: Friday, October 31, 2014 4:11:25 AM
> Subject: Re: [ovirt-users] Hosted-Engine HA problem
>
> On Thu, Oct 30, 2014 at 09:07:24PM +0530, Vijay Bellur wrote:
>> On 10/30/2014 06:45 PM, Jiri Moskovcak wrote:
>>> On 10/30/2014 09:22 AM, Jaicel R. Sabonsolin wrote:
>>>> Hi Guys,
>>>>
>>>> I need help with my oVirt Hosted-Engine HA setup. I am running 2
>>>> oVirt hosts and 2 gluster nodes with replicated volumes. I already have
>>>> VMs running on my hosts, and they migrate normally when I, for example,
>>>> power off the host they are running on. The problem is that the
>>>> engine can't migrate when I switch off the host that hosts the engine.
>>>>
>>>>     oVirt        3.4.3-1.el6
>>>>     KVM         0.12.1.2 - 2.415.el6_5.10
>>>>     LIBVIRT   libvirt-0.10.2-29.el6_5.9
>>>>     VDSM      vdsm-4.14.17-0.el6
>>>>
>>>>
>>>> Right now, this is the output of hosted-engine --vm-status:
>>>>
>>>>        File "/usr/lib64/python2.6/runpy.py", line 122, in _run_module_as_main
>>>>          "__main__", fname, loader, pkg_name)
>>>>        File "/usr/lib64/python2.6/runpy.py", line 34, in _run_code
>>>>          exec code in run_globals
>>>>        File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 111, in <module>
>>>>          if not status_checker.print_status():
>>>>        File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 58, in print_status
>>>>          all_host_stats = ha_cli.get_all_host_stats()
>>>>        File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py", line 137, in get_all_host_stats
>>>>          return self.get_all_stats(self.StatModes.HOST)
>>>>        File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py", line 86, in get_all_stats
>>>>          constants.SERVICE_TYPE)
>>>>        File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 171, in get_stats_from_storage
>>>>          result = self._checked_communicate(request)
>>>>        File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 199, in _checked_communicate
>>>>          .format(message or response))
>>>>     ovirt_hosted_engine_ha.lib.exceptions.RequestError: Request failed: <type 'exceptions.OSError'>
>>>>
>>>>
>>>> Restarting ha-broker and ha-agent normalizes the status, but eventually
>>>> it becomes "false" and then returns to the result above. I hope you
>>>> can help me with this.
>>>>
>>>
>>> Hi Jaicel,
>>> please attach agent.log and broker.log from the host where you are trying
>>> to run hosted-engine --vm-status. I have a feeling that you ran into a
>>> known problem on gluster: a stalled file descriptor. In that case the
>>> only known solution at this time is to restart the broker & agent, as you
>>> have already found out.
>>>
>>
>> Adding Niels and gluster-devel to troubleshoot from Gluster NFS perspective.
>
> I'd welcome any details on this "stalled file descriptor" problem. Is
> there a bug filed with some details like logs, sysrq-t and maybe even
> tcpdumps? If there is an easy way to reproduce this behaviour, I can
> surely look into it and hopefully come up with some advice or a fix.
>
> Thanks,
> Niels
>
