[ovirt-users] HA Broker fails after 4.2 upgrade

Simone Tiraboschi stirabos at redhat.com
Thu Dec 21 08:02:04 UTC 2017


On Thu, Dec 21, 2017 at 5:13 AM, Andy <farkey_2000 at yahoo.com> wrote:

> Hello all,
>
> I just upgraded my oVirt instance to 4.2. The engine upgrade completed
> successfully, but after I upgraded the hosts the HA broker will not
> start. The two hosts run CentOS 7.4 with Gluster and CTDB. The
> VIPs are up and reachable from both hosts, and I can mount the
> Gluster storage.
>
> The error from the agent.log:
>
> MainThread::INFO::2017-12-20 21:02:19,219::agent::67::
> ovirt_hosted_engine_ha.agent.agent.Agent::(run) ovirt-hosted-engine-ha
> agent 2.2.2 started
> MainThread::INFO::2017-12-20 21:02:19,346::hosted_engine::
> 243::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname)
> Found certificate common name: hm3svr01.hm3.loc
> MainThread::INFO::2017-12-20 21:02:20,478::hosted_engine::
> 525::ovirt_hosted_engine_ha.agent.hosted_engine.
> HostedEngine::(_initialize_broker) Initializing ha-broker connection
> MainThread::INFO::2017-12-20 21:02:20,482::brokerlink::77::
> ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
> Starting monitor ping, options {'addr': '192.168.3.1'}
> MainThread::ERROR::2017-12-20 21:02:20,483::hosted_engine::
> 538::ovirt_hosted_engine_ha.agent.hosted_engine.
> HostedEngine::(_initialize_broker) Failed to start necessary monitors
> MainThread::ERROR::2017-12-20 21:02:20,485::agent::144::
> ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Traceback (most
> recent call last):
>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py",
> line 131, in _run_agent
>     return action(he)
>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py",
> line 55, in action_proper
>     return he.start_monitoring()
>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
> line 416, in start_monitoring
>     self._initialize_broker()
>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
> line 535, in _initialize_broker
>     m.get('options', {}))
>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py",
> line 83, in start_monitor
>     .format(type, options, e))
> RequestError: Failed to start monitor ping, options {'addr':
> '192.168.x.x'}: [Errno 2] No such file or directory
>

This simply means that the broker is not ready.
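
For context, here is a minimal sketch of why a broker that is not up yet can surface as "[Errno 2] No such file or directory". It assumes the agent reaches the broker over a local Unix socket; probe_unix_socket is a hypothetical helper for illustration, not part of ovirt-hosted-engine-ha, and the socket path mentioned in the comment is an assumption as well.

```python
import errno
import socket


def probe_unix_socket(path):
    """Try to connect to a Unix socket.

    Returns None if the connection succeeds, otherwise the errno of the
    failure (errno.ENOENT, i.e. 2, when the socket file does not exist).
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.connect(path)
    except socket.error as exc:
        return exc.errno
    finally:
        sock.close()
    return None


# Hypothetical usage; the path below is an assumption, check your host:
#   probe_unix_socket("/var/run/ovirt-hosted-engine-ha/broker.socket")
# A result equal to errno.ENOENT would mean nothing has created the
# socket file yet, which matches a broker that has not finished starting.
```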


>
>
> The broker.log:
>
> MainThread::INFO::2017-12-20 23:06:19,405::monitor::50::
> ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> Finished loading submonitors
> MainThread::INFO::2017-12-20 23:06:20,324::storage_
> backends::346::ovirt_hosted_engine_ha.lib.storage_backends::(connect)
> Connecting the storage
> MainThread::INFO::2017-12-20 23:06:20,325::storage_server::
> 252::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
> Connecting storage server
> MainThread::INFO::2017-12-20 23:06:20,849::storage_server::
> 259::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
> Connecting storage server
> MainThread::WARNING::2017-12-20 23:06:20,913::storage_broker::
> 96::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__)
> Can't connect vdsm storage: Connection to storage server failed
> MainThread::INFO::2017-12-20 23:06:22,087::broker::45::
> ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha
> broker 2.2.2 started
> MainThread::INFO::2017-12-20 23:06:22,088::monitor::40::
> ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> Searching for submonitors in
> /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors
> MainThread::INFO::2017-12-20 23:06:22,089::monitor::49::
> ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> Loaded submonitor cpu-load
> MainThread::INFO::2017-12-20 23:06:22,093::monitor::49::
> ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> Loaded submonitor cpu-load-no-engine
> MainThread::INFO::2017-12-20 23:06:22,146::monitor::49::
> ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> Loaded submonitor engine-health
> MainThread::INFO::2017-12-20 23:06:22,147::monitor::49::
> ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> Loaded submonitor mem-free
> MainThread::INFO::2017-12-20 23:06:22,147::monitor::49::
> ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> Loaded submonitor mem-load
> MainThread::INFO::2017-12-20 23:06:22,148::monitor::49::
> ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> Loaded submonitor mgmt-bridge
> MainThread::INFO::2017-12-20 23:06:22,149::monitor::49::
> ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> Loaded submonitor ping
> MainThread::INFO::2017-12-20 23:06:22,149::monitor::49::
> ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> Loaded submonitor storage-domain
> MainThread::INFO::2017-12-20 23:06:22,150::monitor::49::
> ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> Loaded submonitor cpu-load
> MainThread::INFO::2017-12-20 23:06:22,151::monitor::49::
> ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> Loaded submonitor cpu-load-no-engine
> MainThread::INFO::2017-12-20 23:06:22,152::monitor::49::
> ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> Loaded submonitor engine-health
> MainThread::INFO::2017-12-20 23:06:22,153::monitor::49::
> ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> Loaded submonitor mem-free
> MainThread::INFO::2017-12-20 23:06:22,153::monitor::49::
> ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> Loaded submonitor mem-load
> MainThread::INFO::2017-12-20 23:06:22,154::monitor::49::
> ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> Loaded submonitor mgmt-bridge
> MainThread::INFO::2017-12-20 23:06:22,154::monitor::49::
> ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> Loaded submonitor ping
> MainThread::INFO::2017-12-20 23:06:22,155::monitor::49::
> ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> Loaded submonitor storage-domain
>
>

Could you please edit /etc/ovirt-hosted-engine-ha/broker-log.conf and change
[logger_root]
level=INFO
to
[logger_root]
level=DEBUG

then restart the broker service, wait a few minutes, and share its debug log?
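
If you prefer to script the change, a minimal sketch could look like the following. set_root_log_level is a hypothetical helper, not something shipped with oVirt; it only rewrites the level= line inside the [logger_root] section and leaves every other section and comment as it is.

```python
import re


def set_root_log_level(text, level="DEBUG"):
    """Return the config text with level= in [logger_root] set to `level`."""
    out = []
    in_root = False
    for line in text.splitlines(True):  # keep line endings
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # A new section header: remember whether it is [logger_root].
            in_root = stripped == "[logger_root]"
        elif in_root and re.match(r"level\s*=", stripped):
            ending = "\n" if line.endswith("\n") else ""
            line = "level={}{}".format(level, ending)
        out.append(line)
    return "".join(out)


# Hypothetical usage (run as root, and keep a backup of the file first):
#   path = "/etc/ovirt-hosted-engine-ha/broker-log.conf"
#   with open(path) as f:
#       new_text = set_root_log_level(f.read())
#   with open(path, "w") as f:
#       f.write(new_text)
```

After that, restart the broker (the systemd unit is normally ovirt-ha-broker, so "systemctl restart ovirt-ha-broker") and look at /var/log/ovirt-hosted-engine-ha/broker.log.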


>
> The VDSM log has a lot of JSON errors with the storage failing:
>
> 2017-12-20 23:13:00,311-0500 INFO  (jsonrpc/6) [vdsm.api] FINISH getStorageDomainInfo
> error=Storage domain does not exist: (u'1cc6cc89-571e-4b6a-9d41-c742d763e1cc',)
> from=::1,54630, task_id=ff009157-48f3-480c-b8fe-b8d0a791c922 (api:50)
> 2017-12-20 23:13:00,312-0500 ERROR (jsonrpc/6) [storage.TaskManager.Task]
> (Task='ff009157-48f3-480c-b8fe-b8d0a791c922') Unexpected error (task:875)
> 2017-12-20 23:13:00,314-0500 ERROR (jsonrpc/6) [storage.Dispatcher] FINISH
> getStorageDomainInfo error=Storage domain does not exist:
> (u'1cc6cc89-571e-4b6a-9d41-c742d763e1cc',) (dispatcher:82)
> 2017-12-20 23:13:00,314-0500 INFO  (jsonrpc/6) [jsonrpc.JsonRpcServer] RPC
> call StorageDomain.getInfo failed (error 358) in 0.48 seconds (__init__:573)
>     raise convert_to_error(kind, result)
> 2017-12-20 23:13:03,092-0500 INFO  (jsonrpc/3) [vdsm.api] FINISH
> getStorageDomainInfo error=Storage domain does not exist:
> (u'1cc6cc89-571e-4b6a-9d41-c742d763e1cc',) from=::1,54632,
> task_id=39e022e5-db99-4bc4-88e1-9a218104b3c7 (api:50)
> 2017-12-20 23:13:03,093-0500 ERROR (jsonrpc/3) [storage.TaskManager.Task]
> (Task='39e022e5-db99-4bc4-88e1-9a218104b3c7') Unexpected error (task:875)
> 2017-12-20 23:13:03,095-0500 ERROR (jsonrpc/3) [storage.Dispatcher] FINISH
> getStorageDomainInfo error=Storage domain does not exist:
> (u'1cc6cc89-571e-4b6a-9d41-c742d763e1cc',) (dispatcher:82)
> 2017-12-20 23:13:03,095-0500 INFO  (jsonrpc/3) [jsonrpc.JsonRpcServer] RPC
> call StorageDomain.getInfo failed (error 358) in 0.49 seconds (__init__:573)
>     raise convert_to_error(kind, result)
> 2017-12-20 23:13:07,568-0500 INFO  (jsonrpc/4) [vdsm.api] FINISH
> getStorageDomainInfo error=Storage domain does not exist:
> (u'1cc6cc89-571e-4b6a-9d41-c742d763e1cc',) from=::1,54640,
> task_id=c1b1b1a1-a7e6-494a-bda6-19c617820dec (api:50)
> 2017-12-20 23:13:07,569-0500 ERROR (jsonrpc/4) [storage.TaskManager.Task]
> (Task='c1b1b1a1-a7e6-494a-bda6-19c617820dec') Unexpected error (task:875)
> 2017-12-20 23:13:07,571-0500 ERROR (jsonrpc/4) [storage.Dispatcher] FINISH
> getStorageDomainInfo error=Storage domain does not exist:
> (u'1cc6cc89-571e-4b6a-9d41-c742d763e1cc',) (dispatcher:82)
> 2017-12-20 23:13:07,571-0500 INFO  (jsonrpc/4) [jsonrpc.JsonRpcServer] RPC
> call StorageDomain.getInfo failed (error 358) in 0.48 seconds (__init__:573)
>     raise convert_to_error(kind, result)
> 2017-12-20 23:13:10,323-0500 INFO  (jsonrpc/0) [vdsm.api] FINISH
> getStorageDomainInfo error=Storage domain does not exist:
> (u'1cc6cc89-571e-4b6a-9d41-c742d763e1cc',) from=::1,54642,
> task_id=6354fa3d-933c-4fd0-9301-00f8abd29ec7 (api:50)
> 2017-12-20 23:13:10,323-0500 ERROR (jsonrpc/0) [storage.TaskManager.Task]
> (Task='6354fa3d-933c-4fd0-9301-00f8abd29ec7') Unexpected error (task:875)
> 2017-12-20 23:13:10,325-0500 ERROR (jsonrpc/0) [storage.Dispatcher] FINISH
> getStorageDomainInfo error=Storage domain does not exist:
> (u'1cc6cc89-571e-4b6a-9d41-c742d763e1cc',) (dispatcher:82)
> 2017-12-20 23:13:10,326-0500 INFO  (jsonrpc/0) [jsonrpc.JsonRpcServer] RPC
> call StorageDomain.getInfo failed (error 358) in 0.48 seconds (__init__:573)
>
>
> Any help is appreciated.
>
> thanks Andy