
Did you try virsh list --inactive?

Eric Evans
Digital Data Services LLC.
304.660.9080

From: Shareef Jalloq <shareef@jalloq.co.uk>
Sent: Wednesday, April 8, 2020 5:58 PM
To: Strahil Nikolov <hunter86_bg@yahoo.com>
Cc: Ovirt Users <users@ovirt.org>
Subject: [ovirt-users] Re: ovirt-engine unresponsive - how to rescue?

I've now shut down the VMs on one host and rebooted it but the agent service doesn't start. If I run 'hosted-engine --vm-status' I get:

The hosted engine configuration has not been retrieved from shared storage. Please ensure that ovirt-ha-agent is running and the storage server is reachable.

and indeed if I list the mounts under /rhev/data-center/mnt, only one of the directories is mounted. I have 3 NFS mounts, one ISO Domain and two Data Domains. Only one Data Domain has mounted and this has lots of .prob files in. So why haven't the other NFS exports been mounted? Manually mounting them doesn't seem to have helped much either. I can start the broker service but the agent service says no. Same error as the one in my last email.

Shareef.

On Wed, Apr 8, 2020 at 9:57 PM Shareef Jalloq <shareef@jalloq.co.uk> wrote:

Right, still down. I've run virsh and it doesn't know anything about the engine vm. I've restarted the broker and agent services and I still get nothing in virsh->list.

In the logs under /var/log/ovirt-hosted-engine-ha I see lots of errors:

broker.log:

MainThread::INFO::2020-04-08 20:56:20,138::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.3.6 started
MainThread::INFO::2020-04-08 20:56:20,138::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Searching for submonitors in /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors
MainThread::INFO::2020-04-08 20:56:20,138::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
MainThread::INFO::2020-04-08 20:56:20,140::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
MainThread::INFO::2020-04-08 20:56:20,140::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
MainThread::INFO::2020-04-08 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
MainThread::INFO::2020-04-08 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
MainThread::INFO::2020-04-08 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
MainThread::INFO::2020-04-08 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
MainThread::INFO::2020-04-08 20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
MainThread::INFO::2020-04-08 20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
MainThread::INFO::2020-04-08 20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
MainThread::INFO::2020-04-08 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
MainThread::INFO::2020-04-08 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
MainThread::INFO::2020-04-08 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
MainThread::INFO::2020-04-08 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
MainThread::INFO::2020-04-08 20:56:20,143::monitor::50::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Finished loading submonitors
MainThread::INFO::2020-04-08 20:56:20,197::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect) Connecting the storage
MainThread::INFO::2020-04-08 20:56:20,197::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2020-04-08 20:56:20,414::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2020-04-08 20:56:20,628::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Refreshing the storage domain
MainThread::WARNING::2020-04-08 20:56:21,057::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed: (code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
MainThread::INFO::2020-04-08 20:56:21,901::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.3.6 started
MainThread::INFO::2020-04-08 20:56:21,901::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Searching for submonitors in /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors

agent.log:

MainThread::ERROR::2020-04-08 20:57:00,799::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent
MainThread::INFO::2020-04-08 20:57:00,799::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
MainThread::INFO::2020-04-08 20:57:11,144::agent::67::ovirt_hosted_engine_ha.agent.agent.Agent::(run) ovirt-hosted-engine-ha agent 2.3.6 started
MainThread::INFO::2020-04-08 20:57:11,182::hosted_engine::234::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Found certificate common name: ovirt-node-01.phoelex.com
MainThread::INFO::2020-04-08 20:57:11,294::hosted_engine::543::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Initializing ha-broker connection
MainThread::INFO::2020-04-08 20:57:11,296::brokerlink::80::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor network, options {'tcp_t_address': '', 'network_test': 'dns', 'tcp_t_port': '', 'addr': '192.168.1.99'}
MainThread::ERROR::2020-04-08 20:57:11,296::hosted_engine::559::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Failed to start necessary monitors
MainThread::ERROR::2020-04-08 20:57:11,297::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent
    return action(he)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper
    return he.start_monitoring()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 432, in start_monitoring
    self._initialize_broker()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 556, in _initialize_broker
    m.get('options', {}))
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 89, in start_monitor
    ).format(t=type, o=options, e=e)
RequestError: brokerlink - failed to start monitor via ovirt-ha-broker: [Errno 2] No such file or directory, [monitor: 'network', options: {'tcp_t_address': '', 'network_test': 'dns', 'tcp_t_port': '', 'addr': '192.168.1.99'}]
MainThread::ERROR::2020-04-08 20:57:11,297::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent
MainThread::INFO::2020-04-08 20:57:11,297::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down

On Wed, Apr 8, 2020 at 6:10 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:

On April 8, 2020 7:47:20 PM GMT+03:00, "Maton, Brett" <matonb@ltresources.co.uk> wrote:
On the host you tried to restart the engine on:
Add an alias to virsh (authenticates with virsh_auth.conf)
alias virsh='virsh -c qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf'
Then run virsh:
virsh
virsh # list
 Id    Name                           State
----------------------------------------------------
 xx    HostedEngine                   Paused
 xx    **********                     running
 ...
 xx    **********                     running
HostedEngine should be in the list, try and resume the engine:
virsh # resume HostedEngine
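
For anyone who wants to script the check above rather than eyeball the list, here is a minimal Python sketch. It only parses text in the `virsh list` layout shown in this message; the sample input and the domain name example-vm are made up for illustration, and it assumes domain names contain no spaces.

```python
def parse_virsh_list(output):
    """Return (id, name, state) tuples parsed from `virsh list` text output."""
    domains = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines, the "Id Name State" header and the dashed separator.
        if not line or line.startswith("Id") or set(line) == {"-"}:
            continue
        # Assumption: names have no whitespace; the state may be several words.
        parts = line.split(None, 2)
        if len(parts) == 3:
            domains.append(tuple(parts))
    return domains


def is_paused(output, name="HostedEngine"):
    """True if the named domain is listed in the paused state."""
    return any(
        dom_name == name and state.lower() == "paused"
        for _, dom_name, state in parse_virsh_list(output)
    )


# Hypothetical sample in the same layout as the listing above.
SAMPLE = """\
 Id    Name                           State
----------------------------------------------------
 1     HostedEngine                   Paused
 2     example-vm                     running
"""

if __name__ == "__main__":
    if is_paused(SAMPLE):
        print("HostedEngine is paused; try: virsh resume HostedEngine")
```

In practice the text would come from running virsh with the authfile alias above; if HostedEngine is missing from the list entirely, resuming will not help and the storage and agent need checking first.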
On Wed, 8 Apr 2020 at 17:28, Shareef Jalloq <shareef@jalloq.co.uk> wrote:
Thanks!
The status hangs due to, I guess, the VM being down....
[root@ovirt-node-01 ~]# hosted-engine --vm-start
VM exists and is down, cleaning up and restarting
VM in WaitForLaunch
but this doesn't seem to do anything. OK, after a while I get a status of it being barfed...
--== Host ovirt-node-00.phoelex.com (id: 1) status ==--

conf_on_shared_storage             : True
Status up-to-date                  : False
Hostname                           : ovirt-node-00.phoelex.com
Host ID                            : 1
Engine status                      : unknown stale-data
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 9c4a034b
local_conf_timestamp               : 523362
Host timestamp                     : 523608
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=523608 (Wed Apr 8 16:17:11 2020)
        host-id=1
        score=3400
        vm_conf_refresh_time=523362 (Wed Apr 8 16:13:06 2020)
        conf_on_shared_storage=True
        maintenance=False
        state=EngineDown
        stopped=False

--== Host ovirt-node-01.phoelex.com (id: 2) status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : ovirt-node-01.phoelex.com
Host ID                            : 2
Engine status                      : {"reason": "bad vm status", "health": "bad", "vm": "down_unexpected", "detail": "Down"}
Score                              : 0
stopped                            : False
Local maintenance                  : False
crc32                              : 5045f2eb
local_conf_timestamp               : 1737037
Host timestamp                     : 1737283
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=1737283 (Wed Apr 8 16:16:17 2020)
        host-id=2
        score=0
        vm_conf_refresh_time=1737037 (Wed Apr 8 16:12:11 2020)
        conf_on_shared_storage=True
        maintenance=False
        state=EngineUnexpectedlyDown
        stopped=False
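
To pick out the stale host from status output like the above without reading every field, something along these lines might help. It is a rough Python sketch, not part of the hosted-engine tooling: it assumes the one-field-per-line text layout that 'hosted-engine --vm-status' prints, and the function names are mine. The sample is a shortened copy of the two host blocks from this message.

```python
import json
import re

# Header line that opens each per-host block in the status output.
HEADER = re.compile(r"--== Host (\S+) \(id: (\d+)\) status ==--")


def parse_vm_status(output):
    """Map host id -> dict of the 'key : value' fields in that host's block."""
    hosts = {}
    current = None
    in_metadata = False
    for raw in output.splitlines():
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            current = {"Hostname": match.group(1)}
            hosts[int(match.group(2))] = current
            in_metadata = False
            continue
        if current is None or not line or in_metadata:
            continue
        # The "Extra metadata" lines are key=value and contain clock times
        # with colons, so they are skipped rather than split on ':'.
        if line.startswith("Extra metadata"):
            in_metadata = True
            continue
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()
    return hosts


def stale_hosts(hosts):
    """Ids of hosts whose status is not up to date or reports stale data."""
    return sorted(
        host_id
        for host_id, fields in hosts.items()
        if fields.get("Status up-to-date") == "False"
        or "stale-data" in fields.get("Engine status", "")
    )


SAMPLE = """\
--== Host ovirt-node-00.phoelex.com (id: 1) status ==--

Status up-to-date                  : False
Hostname                           : ovirt-node-00.phoelex.com
Engine status                      : unknown stale-data
Score                              : 3400
Extra metadata (valid at timestamp):
        timestamp=523608 (Wed Apr 8 16:17:11 2020)

--== Host ovirt-node-01.phoelex.com (id: 2) status ==--

Status up-to-date                  : True
Hostname                           : ovirt-node-01.phoelex.com
Engine status                      : {"reason": "bad vm status", "health": "bad", "vm": "down_unexpected", "detail": "Down"}
Score                              : 0
"""

if __name__ == "__main__":
    parsed = parse_vm_status(SAMPLE)
    print("stale hosts:", stale_hosts(parsed))
    print("host 2 vm state:", json.loads(parsed[2]["Engine status"])["vm"])
```

Here host 1 is the one reporting stale data, while host 2 is up to date but has a score of 0 because the engine VM went down unexpectedly on it.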
On Wed, Apr 8, 2020 at 5:09 PM Maton, Brett <matonb@ltresources.co.uk> wrote:
First steps, on one of your hosts as root:
To get information: hosted-engine --vm-status
To start the engine: hosted-engine --vm-start
On Wed, 8 Apr 2020 at 17:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:
So my engine has gone down and I can't ssh into it either. If I try to log into the web-ui of the node it is running on, I get redirected because the node can't reach the engine.
What are my next steps?
Shareef.
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-leave@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/W7BP57OCIRSW5C...
This has to be resolved:

Engine status : unknown stale-data

Run 'hosted-engine --vm-status' again. If it remains the same, restart ovirt-ha-broker.service & ovirt-ha-agent.service.
Verify that the engine's storage is available. Then monitor the broker & agent logs in /var/log/ovirt-hosted-engine-ha.

Best Regards,
Strahil Nikolov
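
When monitoring the broker and agent logs, most of the volume is INFO lines, so a small filter can help. The following is a minimal Python sketch, assuming the '::'-separated line layout seen in the broker.log and agent.log excerpts in this thread; the field names in LogRecord are my own labels, not names from ovirt-hosted-engine-ha.

```python
from collections import namedtuple

# Field labels are assumptions based on the visible line layout:
# thread::LEVEL::timestamp::module::lineno::logger::(function) message
LogRecord = namedtuple(
    "LogRecord", "thread level timestamp module lineno logger message"
)


def parse_line(line):
    """Parse one log line; return None for continuation lines (tracebacks)."""
    parts = line.rstrip("\n").split("::", 6)
    if len(parts) != 7:
        return None
    return LogRecord(*parts)


def problems(lines, levels=("WARNING", "ERROR")):
    """Return only the records whose level is in `levels`."""
    records = (parse_line(line) for line in lines)
    return [rec for rec in records if rec is not None and rec.level in levels]


# Three lines copied from the logs quoted in this thread.
SAMPLE = """\
MainThread::INFO::2020-04-08 20:56:20,138::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.3.6 started
MainThread::WARNING::2020-04-08 20:56:21,057::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed: (code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
MainThread::ERROR::2020-04-08 20:57:11,296::hosted_engine::559::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Failed to start necessary monitors
"""

if __name__ == "__main__":
    for rec in problems(SAMPLE.splitlines()):
        print(rec.timestamp, rec.level, rec.message)
```

To use it on a host, pass the lines of an open log file instead of SAMPLE, for example a file under /var/log/ovirt-hosted-engine-ha. Traceback lines that follow an ERROR record are dropped by this filter, so go back to the raw log for the full stack.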