So, tracing this further, I’m pretty sure this is something about sanlock.
As best I can tell this[1] seems to be the failure that is blocking importing the pool, creating storage domains, importing the HE, etc. Contrary to the log, sanlock is running; I verified it starts on system-boot and restarts just fine.
I found one reference to someone having a similar problem in 3.6, but that appeared to have been a permission issue I’m not afflicted with.
How can I move past this?
TIA,
-j
[1] agent.log:
MainThread::WARNING::2017-04-19 17:07:13,537::agent::209:: ovirt_hosted_engine_ha.agent. agent.Agent::(_run_agent) Restarting agent, attempt '6'
MainThread::INFO::2017-04-19 17:07:13,567::hosted_engine::242::ovirt_hosted_engine_ha. agent.hosted_engine. HostedEngine::(_get_hostname) Found certificate common name: sc5-ovirt-2.squaretrade.com
MainThread::INFO::2017-04-19 17:07:13,569::hosted_engine::604::ovirt_hosted_engine_ha. agent.hosted_engine. HostedEngine::(_initialize_ vdsm) Initializing VDSM
MainThread::INFO::2017-04-19 17:07:16,044::hosted_engine::630::ovirt_hosted_engine_ha. agent.hosted_engine. HostedEngine::(_initialize_ storage_images) Connecting the storage
MainThread::INFO::2017-04-19 17:07:16,045::storage_server::219::ovirt_hosted_engine_ha. lib.storage_server. StorageServer::(connect_ storage_server) Connecting storage server
MainThread::INFO::2017-04-19 17:07:20,876::storage_server::226::ovirt_hosted_engine_ha. lib.storage_server. StorageServer::(connect_ storage_server) Connecting storage server
MainThread::INFO::2017-04-19 17:07:20,893::storage_server::233::ovirt_hosted_engine_ha. lib.storage_server. StorageServer::(connect_ storage_server) Refreshing the storage domain
MainThread::INFO::2017-04-19 17:07:21,160::hosted_engine::657::ovirt_hosted_engine_ha. agent.hosted_engine. HostedEngine::(_initialize_ storage_images) Preparing images
MainThread::INFO::2017-04-19 17:07:21,160::image::126::ovirt_hosted_engine_ha.lib. image.Image::(prepare_images) Preparing images
MainThread::INFO::2017-04-19 17:07:23,954::hosted_engine::660::ovirt_hosted_engine_ha. agent.hosted_engine. HostedEngine::(_initialize_ storage_images) Refreshing vm.conf
MainThread::INFO::2017-04-19 17:07:23,955::config::485::ovirt_hosted_engine_ha.agent. hosted_engine.HostedEngine. config::(refresh_vm_conf) Reloading vm.conf from the shared storage domain
MainThread::INFO::2017-04-19 17:07:23,955::config::412::ovirt_hosted_engine_ha.agent. hosted_engine.HostedEngine. config::(_get_vm_conf_content_ from_ovf_store) Trying to get a fresher copy of vm configuration from the OVF_STORE
MainThread::WARNING::2017-04-19 17:07:26,741::ovf_store::107:: ovirt_hosted_engine_ha.lib. ovf.ovf_store.OVFStore::(scan) Unable to find OVF_STORE
MainThread::ERROR::2017-04-19 17:07:26,744::config::450::ovirt_hosted_engine_ha.agent. hosted_engine.HostedEngine. config::(_get_vm_conf_content_ from_ovf_store) Unable to identify the OVF_STORE volume, falling back to initial vm.conf. Please ensure you already added your first data domain for regular VMs
MainThread::INFO::2017-04-19 17:07:26,770::hosted_engine::509::ovirt_hosted_engine_ha. agent.hosted_engine. HostedEngine::(_initialize_ broker) Initializing ha-broker connection
MainThread::INFO::2017-04-19 17:07:26,771::brokerlink::130::ovirt_hosted_engine_ha.lib. brokerlink.BrokerLink::(start_ monitor) Starting monitor ping, options {'addr': '10.181.26.1'}
MainThread::INFO::2017-04-19 17:07:26,774::brokerlink::141::ovirt_hosted_engine_ha.lib. brokerlink.BrokerLink::(start_ monitor) Success, id 140621269798096
MainThread::INFO::2017-04-19 17:07:26,774::brokerlink::130::ovirt_hosted_engine_ha.lib. brokerlink.BrokerLink::(start_ monitor) Starting monitor mgmt-bridge, options {'use_ssl': 'true', 'bridge_name': 'ovirtmgmt', 'address': '0'}
MainThread::INFO::2017-04-19 17:07:26,791::brokerlink::141::ovirt_hosted_engine_ha.lib. brokerlink.BrokerLink::(start_ monitor) Success, id 140621269798544
MainThread::INFO::2017-04-19 17:07:26,792::brokerlink::130::ovirt_hosted_engine_ha.lib. brokerlink.BrokerLink::(start_ monitor) Starting monitor mem-free, options {'use_ssl': 'true', 'address': '0'}
MainThread::INFO::2017-04-19 17:07:26,793::brokerlink::141::ovirt_hosted_engine_ha.lib. brokerlink.BrokerLink::(start_ monitor) Success, id 140621269798224
MainThread::INFO::2017-04-19 17:07:26,794::brokerlink::130::ovirt_hosted_engine_ha.lib. brokerlink.BrokerLink::(start_ monitor) Starting monitor cpu-load-no-engine, options {'use_ssl': 'true', 'vm_uuid': '04ff4cf1-135a-4918-9a1f- 8023322f89a3', 'address': '0'}
MainThread::INFO::2017-04-19 17:07:26,796::brokerlink::141::ovirt_hosted_engine_ha.lib. brokerlink.BrokerLink::(start_ monitor) Success, id 140621269796816
MainThread::INFO::2017-04-19 17:07:26,796::brokerlink::130::ovirt_hosted_engine_ha.lib. brokerlink.BrokerLink::(start_ monitor) Starting monitor engine-health, options {'use_ssl': 'true', 'vm_uuid': '04ff4cf1-135a-4918-9a1f- 8023322f89a3', 'address': '0'}
MainThread::INFO::2017-04-19 17:07:26,798::brokerlink::141::ovirt_hosted_engine_ha.lib. brokerlink.BrokerLink::(start_ monitor) Success, id 140621269797840
MainThread::INFO::2017-04-19 17:07:31,513::brokerlink::179::ovirt_hosted_engine_ha.lib. brokerlink.BrokerLink::(set_ storage_domain) Success, id 140621269798480
MainThread::INFO::2017-04-19 17:07:31,513::hosted_engine::601::ovirt_hosted_engine_ha. agent.hosted_engine. HostedEngine::(_initialize_ broker) Broker initialized, all submonitors started
MainThread::ERROR::2017-04-19 17:07:31,556::agent::196::ovirt_hosted_engine_ha.agent. agent.Agent::(_run_agent) Service sanlock is not running and the admin is responsible for starting it. Waiting...
MainThread::WARNING::2017-04-19 17:07:36,562::agent::209:: ovirt_hosted_engine_ha.agent. agent.Agent::(_run_agent) Restarting agent, attempt '7'
MainThread::INFO::2017-04-19 17:07:36,592::hosted_engine::242::ovirt_hosted_engine_ha. agent.hosted_engine. HostedEngine::(_get_hostname) Found certificate common name: sc5-ovirt-2.squaretrade.com
- - - prior message below - - -
I trialed installing the hosted engine, following the instructions at http://www.ovirt.org/documentation/self-hosted/ . This is using Gluster as the backend storage subsystem.chap-Deploying_Self-Hosted_ Engine/
Answer file at the end.
Per the docs,
"When the hosted-engine deployment script completes successfully, the oVirt Engine is configured and running on your host. The Engine has already configured the data center, cluster, host, the Engine virtual machine, and a shared storage domain dedicated to the Engine virtual machine.”
In my case, this is false. The installation claims success, but the hosted engine VM stays stopped, unless I start it manually. If I start it manually, the default DC is down, the default cluster has the installation host in the cluster, there is no storage, and the VM doesn’t show up in the GUI. In this install run, I have not yet started the engine manually.
I assume this is related to the errors in ovirt-hosted-engine-setup.log, below. (The timestamps are confusing; it looks like the Python errors are logged some time after they’re captured or something.) The HA broker and agent logs just show them looping in the sequence below.
Is there a decent way to pick this up and continue? If not, how do I make this work?
Thanks,
-j
- - - - ovirt-hosted-engine-setup.log snippet: - - - -
2017-04-19 12:29:55 DEBUG otopi.context context._executeMethod:128 Stage late_setup METHOD otopi.plugins.gr_he_setup.system.vdsmenv.Plugin._late_ setup
2017-04-19 12:29:55 DEBUG otopi.plugins.otopi.services.systemd systemd.status:90 check service vdsmd status
2017-04-19 12:29:55 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:813 execute: ('/bin/systemctl', 'status', 'vdsmd.service'), executable='None', cwd='None', env=None
2017-04-19 12:29:55 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:863 execute-result: ('/bin/systemctl', 'status', 'vdsmd.service'), rc=0
2017-04-19 12:29:55 DEBUG otopi.plugins.otopi.services.systemd plugin.execute:921 execute-output: ('/bin/systemctl', 'status', 'vdsmd.service') stdout:
● vdsmd.service - Virtual Desktop Server Manager
Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2017-04-19 12:26:59 PDT; 2min 55s ago
Process: 67370 ExecStopPost=/usr/libexec/vdsm/vdsmd_init_common.sh --post-stop (code=exited, status=0/SUCCESS)
Process: 69995 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=0/SUCCESS)
Main PID: 70062 (vdsm)
CGroup: /system.slice/vdsmd.service
└─70062 /usr/bin/python2 /usr/share/vdsm/vdsm
Apr 19 12:29:00 sc5-ovirt-2.squaretrade.com vdsm[70062]: vdsm ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR Failed to connect to broker, the number of errors has exceeded the limit (1)
Apr 19 12:29:00 sc5-ovirt-2.squaretrade.com vdsm[70062]: vdsm root ERROR failed to retrieve Hosted Engine HA info
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 231, in _getHaInfo
stats = instance.get_all_stats()
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ ha/client/client.py", line 102, in get_all_stats
with broker.connection(self._retries, self._wait):
File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
return self.gen.next()
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ ha/lib/brokerlink.py", line 99, in connection
self.connect(retries, wait)
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ ha/lib/brokerlink.py", line 78, in connect
raise BrokerConnectionError(error_msg)
BrokerConnectionError: Failed to connect to broker, the number of errors has exceeded the limit (1)
Apr 19 12:29:15 sc5-ovirt-2.squaretrade.com vdsm[70062]: vdsm ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR Failed to connect to broker, the number of errors has exceeded the limit (1)
Apr 19 12:29:15 sc5-ovirt-2.squaretrade.com vdsm[70062]: vdsm root ERROR failed to retrieve Hosted Engine HA info
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 231, in _getHaInfo
stats = instance.get_all_stats()
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ ha/client/client.py", line 102, in get_all_stats
with broker.connection(self._retries, self._wait):
File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
return self.gen.next()
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ ha/lib/brokerlink.py", line 99, in connection
self.connect(retries, wait)
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ ha/lib/brokerlink.py", line 78, in connect
raise BrokerConnectionError(error_msg)
BrokerConnectionError: Failed to connect to broker, the number of errors has exceeded the limit (1)
Apr 19 12:29:20 sc5-ovirt-2.squaretrade.com vdsm[70062]: vdsm root WARN File: /var/lib/libvirt/qemu/channels/04ff4cf1-135a-4918- 9a1f-8023322f89a3.com.redhat. rhevm.vdsm already removed
Apr 19 12:29:20 sc5-ovirt-2.squaretrade.com vdsm[70062]: vdsm root WARN File: /var/lib/libvirt/qemu/channels/04ff4cf1-135a-4918- 9a1f-8023322f89a3.org.qemu. guest_agent.0 already removed
Apr 19 12:29:30 sc5-ovirt-2.squaretrade.com vdsm[70062]: vdsm ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR Failed to connect to broker, the number of errors has exceeded the limit (1)
Apr 19 12:29:30 sc5-ovirt-2.squaretrade.com vdsm[70062]: vdsm root ERROR failed to retrieve Hosted Engine HA info
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 231, in _getHaInfo
stats = instance.get_all_stats()
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ ha/client/client.py", line 102, in get_all_stats
with broker.connection(self._retries, self._wait):
File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
return self.gen.next()
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ ha/lib/brokerlink.py", line 99, in connection
self.connect(retries, wait)
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ ha/lib/brokerlink.py", line 78, in connect
raise BrokerConnectionError(error_msg)
BrokerConnectionError: Failed to connect to broker, the number of errors has exceeded the limit (1)
Apr 19 12:29:45 sc5-ovirt-2.squaretrade.com vdsm[70062]: vdsm ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR Failed to connect to broker, the number of errors has exceeded the limit (1)
Apr 19 12:29:45 sc5-ovirt-2.squaretrade.com vdsm[70062]: vdsm root ERROR failed to retrieve Hosted Engine HA info
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 231, in _getHaInfo
stats = instance.get_all_stats()
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ ha/client/client.py", line 102, in get_all_stats
with broker.connection(self._retries, self._wait):
File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
return self.gen.next()
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ ha/lib/brokerlink.py", line 99, in connection
self.connect(retries, wait)
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ ha/lib/brokerlink.py", line 78, in connect
raise BrokerConnectionError(error_msg)
BrokerConnectionError: Failed to connect to broker, the number of errors has exceeded the limit (1)
2017-04-19 12:29:55 DEBUG otopi.plugins.otopi.services.systemd plugin.execute:926 execute-output: ('/bin/systemctl', 'status', 'vdsmd.service') stderr:
2017-04-19 12:29:57 DEBUG otopi.plugins.gr_he_setup.system.vdsmenv util.connect_vdsm_json_rpc:209 {'status': {'message': 'Done', 'code': 0}, u'systemProductName': u'PowerEdge R630', u'systemSerialNumber': u'GTH5942', u'systemFamily': u'PowerEdge', u'systemUUID': u'4C4C4544-0054-4810-8035- C7C04F393432', u'systemManufacturer': u'Dell Inc.'}
2017-04-19 12:29:57 DEBUG otopi.context context.dumpEnvironment:760 ENVIRONMENT DUMP - BEGIN
2017-04-19 12:29:57 DEBUG otopi.context context.dumpEnvironment:770 ENV OVEHOSTED_VDSM/vdscli=_Server:'<vdsm.jsonrpcvdscli._Server object at 0x34a8290>'
2017-04-19 12:29:57 DEBUG otopi.context context.dumpEnvironment:774 ENVIRONMENT DUMP - END
2017-04-19 12:29:57 DEBUG otopi.context context._executeMethod:128 Stage late_setup METHOD otopi.plugins.gr_he_common.vm.misc.Plugin._late_setup
2017-04-19 12:29:57 DEBUG otopi.context context._executeMethod:134 condition False
2017-04-19 12:29:57 DEBUG otopi.context context._executeMethod:128 Stage late_setup METHOD otopi.plugins.gr_he_setup.pki.vdsmpki.Plugin._late_setup
2017-04-19 12:29:57 DEBUG otopi.plugins.gr_he_setup.pki.vdsmpki plugin.executeRaw:813 execute: ('/bin/openssl', 'x509', '-noout', '-text', '-in', '/etc/pki/vdsm/libvirt-spice/ server-cert.pem'), executable='None', cwd='None', env=None
2017-04-19 12:29:57 DEBUG otopi.plugins.gr_he_setup.pki.vdsmpki plugin.executeRaw:863 execute-result: ('/bin/openssl', 'x509', '-noout', '-text', '-in', '/etc/pki/vdsm/libvirt-spice/ server-cert.pem'), rc=0
2017-04-19 12:29:57 DEBUG otopi.plugins.gr_he_setup.pki.vdsmpki plugin.execute:921 execute-output: ('/bin/openssl', 'x509', '-noout', '-text', '-in', '/etc/pki/vdsm/libvirt-spice/ server-cert.pem') stdout:
- - - - broker.log: - - - -
Thread-2970::INFO::2017-04-19 14:29:51,591::cpu_load_no_engine::113::cpu_load_no_ engine.EngineHealth::( calculate_load) System load total=0.0117, engine=0.0000, non-engine=0.0117
Thread-2976::INFO::2017-04-19 14:29:56,347::listener::134::ovirt_hosted_engine_ha.broker. listener.ConnectionHandler::( setup) Connection established
Thread-2976::INFO::2017-04-19 14:29:56,361::listener::186::ovirt_hosted_engine_ha.broker. listener.ConnectionHandler::( handle) Connection closed
Thread-2977::INFO::2017-04-19 14:29:57,628::listener::134::ovirt_hosted_engine_ha.broker. listener.ConnectionHandler::( setup) Connection established
Thread-2977::INFO::2017-04-19 14:29:57,629::monitor::66::ovirt_hosted_engine_ha.broker. monitor.Monitor::(start_ submonitor) Starting submonitor ping
Thread-2977::INFO::2017-04-19 14:29:57,631::monitor::79::ovirt_hosted_engine_ha.broker. monitor.Monitor::(start_ submonitor) Started submonitor ping, id 140603614328464
Thread-2977::INFO::2017-04-19 14:29:57,662::monitor::66::ovirt_hosted_engine_ha.broker. monitor.Monitor::(start_ submonitor) Starting submonitor mgmt-bridge
Thread-2977::INFO::2017-04-19 14:29:57,667::monitor::79::ovirt_hosted_engine_ha.broker. monitor.Monitor::(start_ submonitor) Started submonitor mgmt-bridge, id 140603614326992
Thread-2977::INFO::2017-04-19 14:29:57,688::monitor::66::ovirt_hosted_engine_ha.broker. monitor.Monitor::(start_ submonitor) Starting submonitor mem-free
Thread-2977::INFO::2017-04-19 14:29:57,735::monitor::79::ovirt_hosted_engine_ha.broker. monitor.Monitor::(start_ submonitor) Started submonitor mem-free, id 140604548742096
Thread-2977::INFO::2017-04-19 14:29:57,756::monitor::66::ovirt_hosted_engine_ha.broker. monitor.Monitor::(start_ submonitor) Starting submonitor cpu-load-no-engine
Thread-2977::INFO::2017-04-19 14:29:57,798::monitor::79::ovirt_hosted_engine_ha.broker. monitor.Monitor::(start_ submonitor) Started submonitor cpu-load-no-engine, id 43463568
Thread-2977::INFO::2017-04-19 14:29:57,841::monitor::66::ovirt_hosted_engine_ha.broker. monitor.Monitor::(start_ submonitor) Starting submonitor engine-health
Thread-2977::INFO::2017-04-19 14:29:57,863::monitor::79::ovirt_hosted_engine_ha.broker. monitor.Monitor::(start_ submonitor) Started submonitor engine-health, id 140604881915728
Thread-2983::INFO::2017-04-19 14:29:58,153::listener::134::ovirt_hosted_engine_ha.broker. listener.ConnectionHandler::( setup) Connection established
Thread-2984::INFO::2017-04-19 14:30:02,179::listener::134::ovirt_hosted_engine_ha.broker. listener.ConnectionHandler::( setup) Connection established
Thread-2977::INFO::2017-04-19 14:30:02,249::storage_backends::119::ovirt_hosted_ engine_ha.lib.storage_ backends::(_check_symlinks) Cleaning up stale LV link '/rhev/data-center/mnt/ glusterSD/sc5-gluster-1:_ ovirt__engine/a81b229c-2422- 4110-af45-1bc4d9a01bab/ha_ agent/hosted-engine.lockspace'
Thread-2977::INFO::2017-04-19 14:30:02,321::storage_backends::119::ovirt_hosted_ engine_ha.lib.storage_ backends::(_check_symlinks) Cleaning up stale LV link '/rhev/data-center/mnt/ glusterSD/sc5-gluster-1:_ ovirt__engine/a81b229c-2422- 4110-af45-1bc4d9a01bab/ha_ agent/hosted-engine.metadata'
Thread-2983::INFO::2017-04-19 14:30:02,334::listener::186::ovirt_hosted_engine_ha.broker. listener.ConnectionHandler::( handle) Connection closed
Thread-2984::INFO::2017-04-19 14:30:02,340::listener::186::ovirt_hosted_engine_ha.broker. listener.ConnectionHandler::( handle) Connection closed
Thread-2969::INFO::2017-04-19 14:30:02,343::mem_free::50::mem_free.MemFree::(action) memFree: 254533
Thread-2980::INFO::2017-04-19 14:30:02,382::mem_free::50::mem_free.MemFree::(action) memFree: 254530
Thread-2985::INFO::2017-04-19 14:30:07,991::listener::134::ovirt_hosted_engine_ha.broker. listener.ConnectionHandler::( setup) Connection established
Thread-2985::INFO::2017-04-19 14:30:08,082::listener::186::ovirt_hosted_engine_ha.broker. listener.ConnectionHandler::( handle) Connection closed
Thread-2969::INFO::2017-04-19 14:30:08,129::mem_free::50::mem_free.MemFree::(action) memFree: 254534
Thread-2966::INFO::2017-04-19 14:30:09,719::listener::186::ovirt_hosted_engine_ha.broker. listener.ConnectionHandler::(