[ovirt-users] HA Broker fails after 4.2 upgrade

Martin Sivak msivak at redhat.com
Thu Dec 21 09:53:09 UTC 2017


Btw, lacking the VDSM logs here, this seems to be the same issue Jason
Brooks just reported on this list too. Hosted engine is trying to get
storage info from VDSM and gets an error instead.
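
To confirm which storage domain VDSM is refusing, the quoted VDSM errors can be tallied per UUID. This is only a rough sketch: it assumes the raw vdsm.log (one entry per line, not the wrapped mail paste), and the function name is made up for illustration.

```python
import re
from collections import Counter

# Matches the "FINISH getStorageDomainInfo error=..." entries quoted in this
# thread and captures the storage domain UUID. Assumes the unwrapped log,
# where the UUID follows the message on the same line.
MISSING_DOMAIN = re.compile(
    r"getStorageDomainInfo\s+error=Storage domain does not exist:\s*"
    r"\(u?'([0-9a-f-]{36})',\)"
)


def missing_storage_domains(log_text):
    """Count 'Storage domain does not exist' log entries per domain UUID.

    Note: this counts log lines, not RPC calls. In the excerpt below each
    failed call is logged by both vdsm.api and storage.Dispatcher.
    """
    return Counter(MISSING_DOMAIN.findall(log_text))
```

If every hit carries the same UUID, as in the excerpt below, that UUID is the one to compare against the hosted-engine storage domain.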

--
Martin Sivak
SLA / oVirt

On Thu, Dec 21, 2017 at 9:02 AM, Simone Tiraboschi <stirabos at redhat.com> wrote:
>
>
> On Thu, Dec 21, 2017 at 5:13 AM, Andy <farkey_2000 at yahoo.com> wrote:
>>
>> Hello all,
>>
>> I just upgraded my oVirt instance to 4.2. The engine upgrade completed
>> successfully, but after I upgraded the hosts the HA Broker will not
>> start.  The 2 hosts are running CentOS 7.4 with Gluster and CTDB.  The
>> VIPs are up and can be reached from both hosts, and I can mount the
>> gluster storage.
>>
>> The error from the agent.log:
>>
>> MainThread::INFO::2017-12-20
>> 21:02:19,219::agent::67::ovirt_hosted_engine_ha.agent.agent.Agent::(run)
>> ovirt-hosted-engine-ha agent 2.2.2 started
>> MainThread::INFO::2017-12-20
>> 21:02:19,346::hosted_engine::243::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname)
>> Found certificate common name: hm3svr01.hm3.loc
>> MainThread::INFO::2017-12-20
>> 21:02:20,478::hosted_engine::525::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker)
>> Initializing ha-broker connection
>> MainThread::INFO::2017-12-20
>> 21:02:20,482::brokerlink::77::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
>> Starting monitor ping, options {'addr': '192.168.3.1'}
>> MainThread::ERROR::2017-12-20
>> 21:02:20,483::hosted_engine::538::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker)
>> Failed to start necessary monitors
>> MainThread::ERROR::2017-12-20
>> 21:02:20,485::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
>> Traceback (most recent call last):
>>   File
>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py",
>> line 131, in _run_agent
>>     return action(he)
>>   File
>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py",
>> line 55, in action_proper
>>     return he.start_monitoring()
>>   File
>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
>> line 416, in start_monitoring
>>     self._initialize_broker()
>>   File
>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
>> line 535, in _initialize_broker
>>     m.get('options', {}))
>>   File
>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py",
>> line 83, in start_monitor
>>     .format(type, options, e))
>> RequestError: Failed to start monitor ping, options {'addr':
>> '192.168.x.x'}: [Errno 2] No such file or directory
>
>
> This simply means that the broker is not ready.
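
One plausible reading of the "[Errno 2] No such file or directory" above is that the agent tried to reach the broker over a Unix socket whose file did not exist yet. The sketch below shows that failure mode in isolation. It is Python 3, not the agent's own code, and the socket path is left as a parameter because the real location (possibly something like /var/run/ovirt-hosted-engine-ha/broker.socket) is an assumption here.

```python
import errno
import socket


def broker_socket_ready(path):
    """Return True if something accepts connections on the Unix socket at path.

    `path` is supplied by the caller; the actual broker socket location is
    an assumption and should be checked on the host.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.connect(path)
        return True
    except OSError as exc:
        # ENOENT: the socket file has not been created (errno 2, as in the
        # agent traceback). ECONNREFUSED: the file exists, nobody listens.
        if exc.errno in (errno.ENOENT, errno.ECONNREFUSED):
            return False
        raise
    finally:
        sock.close()
```

A False result right after a restart would be consistent with the broker still being stuck in its storage connection phase, which is what the broker.log below suggests.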
>
>>
>>
>>
>> The broker.log:
>>
>> MainThread::INFO::2017-12-20
>> 23:06:19,405::monitor::50::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>> Finished loading submonitors
>> MainThread::INFO::2017-12-20
>> 23:06:20,324::storage_backends::346::ovirt_hosted_engine_ha.lib.storage_backends::(connect)
>> Connecting the storage
>> MainThread::INFO::2017-12-20
>> 23:06:20,325::storage_server::252::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
>> Connecting storage server
>> MainThread::INFO::2017-12-20
>> 23:06:20,849::storage_server::259::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
>> Connecting storage server
>> MainThread::WARNING::2017-12-20
>> 23:06:20,913::storage_broker::96::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__)
>> Can't connect vdsm storage: Connection to storage server failed
>> MainThread::INFO::2017-12-20
>> 23:06:22,087::broker::45::ovirt_hosted_engine_ha.broker.broker.Broker::(run)
>> ovirt-hosted-engine-ha broker 2.2.2 started
>> MainThread::INFO::2017-12-20
>> 23:06:22,088::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>> Searching for submonitors in
>> /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors
>> MainThread::INFO::2017-12-20
>> 23:06:22,089::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>> Loaded submonitor cpu-load
>> MainThread::INFO::2017-12-20
>> 23:06:22,093::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>> Loaded submonitor cpu-load-no-engine
>> MainThread::INFO::2017-12-20
>> 23:06:22,146::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>> Loaded submonitor engine-health
>> MainThread::INFO::2017-12-20
>> 23:06:22,147::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>> Loaded submonitor mem-free
>> MainThread::INFO::2017-12-20
>> 23:06:22,147::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>> Loaded submonitor mem-load
>> MainThread::INFO::2017-12-20
>> 23:06:22,148::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>> Loaded submonitor mgmt-bridge
>> MainThread::INFO::2017-12-20
>> 23:06:22,149::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>> Loaded submonitor ping
>> MainThread::INFO::2017-12-20
>> 23:06:22,149::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>> Loaded submonitor storage-domain
>> MainThread::INFO::2017-12-20
>> 23:06:22,150::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>> Loaded submonitor cpu-load
>> MainThread::INFO::2017-12-20
>> 23:06:22,151::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>> Loaded submonitor cpu-load-no-engine
>> MainThread::INFO::2017-12-20
>> 23:06:22,152::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>> Loaded submonitor engine-health
>> MainThread::INFO::2017-12-20
>> 23:06:22,153::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>> Loaded submonitor mem-free
>> MainThread::INFO::2017-12-20
>> 23:06:22,153::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>> Loaded submonitor mem-load
>> MainThread::INFO::2017-12-20
>> 23:06:22,154::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>> Loaded submonitor mgmt-bridge
>> MainThread::INFO::2017-12-20
>> 23:06:22,154::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>> Loaded submonitor ping
>> MainThread::INFO::2017-12-20
>> 23:06:22,155::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>> Loaded submonitor storage-domain
>>
>
>
> Could you please change in /etc/ovirt-hosted-engine-ha/broker-log.conf
> from
> [logger_root]
> level=INFO
> to
> [logger_root]
> level=DEBUG
>
> restart the broker service, wait a few minutes and then share its debug log?
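
The log-level change requested above can also be scripted. This is a minimal sketch, assuming broker-log.conf uses the usual INI-style layout with a `level=` key under `[logger_root]`; the helper name is hypothetical. It edits the text line by line so that comments and the other sections stay untouched.

```python
import re


def set_root_log_level(conf_text, level="DEBUG"):
    """Return conf_text with the level= line under [logger_root] set to level."""
    out = []
    in_root = False
    for line in conf_text.splitlines(keepends=True):
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # Track whether we are inside the [logger_root] section.
            in_root = stripped == "[logger_root]"
        elif in_root and re.match(r"level\s*=", stripped):
            # Preserve the original line ending while replacing the value.
            newline = line[len(line.rstrip("\r\n")):]
            line = "level={}{}".format(level, newline)
        out.append(line)
    return "".join(out)
```

The result would be written back to /etc/ovirt-hosted-engine-ha/broker-log.conf and the broker restarted afterwards (for example with `systemctl restart ovirt-ha-broker`; the unit name is an assumption to verify on the host).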
>
>>
>>
>> The VDSM log has a lot of JSON errors with the storage failing:
>>
>> 2017-12-20 23:13:00,311-0500 INFO  (jsonrpc/6) [vdsm.api] FINISH
>> getStorageDomainInfo error=Storage domain does not exist:
>> (u'1cc6cc89-571e-4b6a-9d41-c742d763e1cc',) from=::1,54630,
>> task_id=ff009157-48f3-480c-b8fe-b8d0a791c922 (api:50)
>> 2017-12-20 23:13:00,312-0500 ERROR (jsonrpc/6) [storage.TaskManager.Task]
>> (Task='ff009157-48f3-480c-b8fe-b8d0a791c922') Unexpected error (task:875)
>> 2017-12-20 23:13:00,314-0500 ERROR (jsonrpc/6) [storage.Dispatcher] FINISH
>> getStorageDomainInfo error=Storage domain does not exist:
>> (u'1cc6cc89-571e-4b6a-9d41-c742d763e1cc',) (dispatcher:82)
>> 2017-12-20 23:13:00,314-0500 INFO  (jsonrpc/6) [jsonrpc.JsonRpcServer] RPC
>> call StorageDomain.getInfo failed (error 358) in 0.48 seconds (__init__:573)
>>     raise convert_to_error(kind, result)
>> 2017-12-20 23:13:03,092-0500 INFO  (jsonrpc/3) [vdsm.api] FINISH
>> getStorageDomainInfo error=Storage domain does not exist:
>> (u'1cc6cc89-571e-4b6a-9d41-c742d763e1cc',) from=::1,54632,
>> task_id=39e022e5-db99-4bc4-88e1-9a218104b3c7 (api:50)
>> 2017-12-20 23:13:03,093-0500 ERROR (jsonrpc/3) [storage.TaskManager.Task]
>> (Task='39e022e5-db99-4bc4-88e1-9a218104b3c7') Unexpected error (task:875)
>> 2017-12-20 23:13:03,095-0500 ERROR (jsonrpc/3) [storage.Dispatcher] FINISH
>> getStorageDomainInfo error=Storage domain does not exist:
>> (u'1cc6cc89-571e-4b6a-9d41-c742d763e1cc',) (dispatcher:82)
>> 2017-12-20 23:13:03,095-0500 INFO  (jsonrpc/3) [jsonrpc.JsonRpcServer] RPC
>> call StorageDomain.getInfo failed (error 358) in 0.49 seconds (__init__:573)
>>     raise convert_to_error(kind, result)
>> 2017-12-20 23:13:07,568-0500 INFO  (jsonrpc/4) [vdsm.api] FINISH
>> getStorageDomainInfo error=Storage domain does not exist:
>> (u'1cc6cc89-571e-4b6a-9d41-c742d763e1cc',) from=::1,54640,
>> task_id=c1b1b1a1-a7e6-494a-bda6-19c617820dec (api:50)
>> 2017-12-20 23:13:07,569-0500 ERROR (jsonrpc/4) [storage.TaskManager.Task]
>> (Task='c1b1b1a1-a7e6-494a-bda6-19c617820dec') Unexpected error (task:875)
>> 2017-12-20 23:13:07,571-0500 ERROR (jsonrpc/4) [storage.Dispatcher] FINISH
>> getStorageDomainInfo error=Storage domain does not exist:
>> (u'1cc6cc89-571e-4b6a-9d41-c742d763e1cc',) (dispatcher:82)
>> 2017-12-20 23:13:07,571-0500 INFO  (jsonrpc/4) [jsonrpc.JsonRpcServer] RPC
>> call StorageDomain.getInfo failed (error 358) in 0.48 seconds (__init__:573)
>>     raise convert_to_error(kind, result)
>> 2017-12-20 23:13:10,323-0500 INFO  (jsonrpc/0) [vdsm.api] FINISH
>> getStorageDomainInfo error=Storage domain does not exist:
>> (u'1cc6cc89-571e-4b6a-9d41-c742d763e1cc',) from=::1,54642,
>> task_id=6354fa3d-933c-4fd0-9301-00f8abd29ec7 (api:50)
>> 2017-12-20 23:13:10,323-0500 ERROR (jsonrpc/0) [storage.TaskManager.Task]
>> (Task='6354fa3d-933c-4fd0-9301-00f8abd29ec7') Unexpected error (task:875)
>> 2017-12-20 23:13:10,325-0500 ERROR (jsonrpc/0) [storage.Dispatcher] FINISH
>> getStorageDomainInfo error=Storage domain does not exist:
>> (u'1cc6cc89-571e-4b6a-9d41-c742d763e1cc',) (dispatcher:82)
>> 2017-12-20 23:13:10,326-0500 INFO  (jsonrpc/0) [jsonrpc.JsonRpcServer] RPC
>> call StorageDomain.getInfo failed (error 358) in 0.48 seconds (__init__:573)
>>
>>
>> Any help is appreciated.
>>
>> thanks Andy
>>
>> _______________________________________________
>> Users mailing list
>> Users at ovirt.org
>> http://lists.ovirt.org/mailman/listinfo/users
>>
>

