ovirt-engine unresponsive - how to rescue?

So my engine has gone down and I can't SSH into it either. If I try to log into the web UI of the node it is running on, I get redirected because the node can't reach the engine.

What are my next steps?

Shareef.

On Wed, Apr 8, 2020 at 5:09 PM Maton, Brett <matonb@ltresources.co.uk> wrote:

First steps, on one of your hosts as root:

To get information:  hosted-engine --vm-status
To start the engine: hosted-engine --vm-start
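For reference, a minimal sketch of that sequence (the watch loop is just an optional convenience, not part of the standard procedure):

    hosted-engine --vm-status      # current view of every HA host and the engine VM
    hosted-engine --vm-start       # ask the HA agent to start the engine VM
    # optionally, re-check until the engine reports up:
    watch -n10 'hosted-engine --vm-status | grep -i "engine status"'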

On Wed, 8 Apr 2020 at 17:28, Shareef Jalloq <shareef@jalloq.co.uk> wrote:

Thanks!

The status hangs due to, I guess, the VM being down...

[root@ovirt-node-01 ~]# hosted-engine --vm-start
VM exists and is down, cleaning up and restarting
VM in WaitForLaunch

but this doesn't seem to do anything. OK, after a while I get a status of it being barfed...

--== Host ovirt-node-00.phoelex.com (id: 1) status ==--

conf_on_shared_storage             : True
Status up-to-date                  : False
Hostname                           : ovirt-node-00.phoelex.com
Host ID                            : 1
Engine status                      : unknown stale-data
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 9c4a034b
local_conf_timestamp               : 523362
Host timestamp                     : 523608
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=523608 (Wed Apr 8 16:17:11 2020)
    host-id=1
    score=3400
    vm_conf_refresh_time=523362 (Wed Apr 8 16:13:06 2020)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineDown
    stopped=False

--== Host ovirt-node-01.phoelex.com (id: 2) status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : ovirt-node-01.phoelex.com
Host ID                            : 2
Engine status                      : {"reason": "bad vm status", "health": "bad", "vm": "down_unexpected", "detail": "Down"}
Score                              : 0
stopped                            : False
Local maintenance                  : False
crc32                              : 5045f2eb
local_conf_timestamp               : 1737037
Host timestamp                     : 1737283
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=1737283 (Wed Apr 8 16:16:17 2020)
    host-id=2
    score=0
    vm_conf_refresh_time=1737037 (Wed Apr 8 16:12:11 2020)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineUnexpectedlyDown
    stopped=False

On April 8, 2020 7:47:20 PM GMT+03:00, "Maton, Brett" <matonb@ltresources.co.uk> wrote:

On the host you tried to restart the engine on:

Add an alias to virsh (authenticates with virsh_auth.conf):

    alias virsh='virsh -c qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf'

Then run virsh:

    virsh
    virsh # list
     Id    Name                           State
    ----------------------------------------------------
     xx    HostedEngine                   Paused
     xx    **********                     running
     ...
     xx    **********                     running

HostedEngine should be in the list; try and resume the engine:

    virsh # resume HostedEngine
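If you'd rather not set the alias, the same checks work as one-liners (a sketch using the same authfile path; 'list --all' additionally shows shut-off domains):

    virsh -c 'qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf' list --all
    virsh -c 'qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf' resume HostedEngine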

On Wed, Apr 8, 2020 at 6:10 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:

This has to be resolved:

    Engine status                      : unknown stale-data

Run 'hosted-engine --vm-status' again. If it remains the same, restart ovirt-ha-broker.service & ovirt-ha-agent.service.

Verify that the engine's storage is available. Then monitor the broker & agent logs in /var/log/ovirt-hosted-engine-ha.

Best Regards,
Strahil Nikolov
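Concretely, that would look something like this (a minimal sketch, assuming the standard systemd unit names and the log locations mentioned above):

    systemctl restart ovirt-ha-broker.service ovirt-ha-agent.service
    systemctl status ovirt-ha-broker.service ovirt-ha-agent.service    # confirm both stay up
    tail -f /var/log/ovirt-hosted-engine-ha/broker.log /var/log/ovirt-hosted-engine-ha/agent.log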

On Wed, Apr 8, 2020 at 9:57 PM Shareef Jalloq <shareef@jalloq.co.uk> wrote:
Right, still down. I've run virsh and it doesn't know anything about the engine vm.
I've restarted the broker and agent services and I still get nothing in virsh->list.
In the logs under /var/log/ovirt-hosted-engine-ha I see lots of errors:
broker.log:
MainThread::INFO::2020-04-08 20:56:20,138::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.3.6 started
MainThread::INFO::2020-04-08 20:56:20,138::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Searching for submonitors in /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors
MainThread::INFO::2020-04-08 20:56:20,138::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
MainThread::INFO::2020-04-08 20:56:20,140::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
MainThread::INFO::2020-04-08 20:56:20,140::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
MainThread::INFO::2020-04-08 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
MainThread::INFO::2020-04-08 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
MainThread::INFO::2020-04-08 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
MainThread::INFO::2020-04-08 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
MainThread::INFO::2020-04-08 20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
MainThread::INFO::2020-04-08 20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
MainThread::INFO::2020-04-08 20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
MainThread::INFO::2020-04-08 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
MainThread::INFO::2020-04-08 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
MainThread::INFO::2020-04-08 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
MainThread::INFO::2020-04-08 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
MainThread::INFO::2020-04-08 20:56:20,143::monitor::50::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Finished loading submonitors
MainThread::INFO::2020-04-08 20:56:20,197::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect) Connecting the storage
MainThread::INFO::2020-04-08 20:56:20,197::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2020-04-08 20:56:20,414::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2020-04-08 20:56:20,628::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Refreshing the storage domain
MainThread::WARNING::2020-04-08 20:56:21,057::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed:
(code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
MainThread::INFO::2020-04-08 20:56:21,901::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.3.6 started
MainThread::INFO::2020-04-08 20:56:21,901::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Searching for submonitors in /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors
agent.log:
MainThread::ERROR::2020-04-08 20:57:00,799::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent
MainThread::INFO::2020-04-08 20:57:00,799::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
MainThread::INFO::2020-04-08 20:57:11,144::agent::67::ovirt_hosted_engine_ha.agent.agent.Agent::(run) ovirt-hosted-engine-ha agent 2.3.6 started
MainThread::INFO::2020-04-08 20:57:11,182::hosted_engine::234::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Found certificate common name: ovirt-node-01.phoelex.com
MainThread::INFO::2020-04-08 20:57:11,294::hosted_engine::543::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Initializing ha-broker connection
MainThread::INFO::2020-04-08 20:57:11,296::brokerlink::80::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor network, options {'tcp_t_address': '', 'network_test': 'dns', 'tcp_t_port': '', 'addr': '192.168.1.99'}
MainThread::ERROR::2020-04-08 20:57:11,296::hosted_engine::559::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Failed to start necessary monitors
MainThread::ERROR::2020-04-08 20:57:11,297::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent
return action(he)
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper
return he.start_monitoring()
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 432, in start_monitoring
self._initialize_broker()
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 556, in _initialize_broker
m.get('options', {}))
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 89, in start_monitor
).format(t=type, o=options, e=e)
RequestError: brokerlink - failed to start monitor via ovirt-ha-broker: [Errno 2] No such file or directory, [monitor: 'network', options: {'tcp_t_address': '', 'network_test': 'dns', 'tcp_t_port': '', 'addr': '192.168.1.99'}]
MainThread::ERROR::2020-04-08 20:57:11,297::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent
MainThread::INFO::2020-04-08 20:57:11,297::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
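(That "[Errno 2] No such file or directory" from brokerlink usually means the agent couldn't reach the broker's listening socket, which the broker only creates once it has started cleanly. A quick sanity check, assuming the default socket directory for ovirt-hosted-engine-ha:)

    systemctl status ovirt-ha-broker            # the broker must be up before the agent can connect
    ls -l /var/run/ovirt-hosted-engine-ha/      # assumed default location; look for broker.socket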
On Wednesday, April 8, 2020 at 5:58 PM, Shareef Jalloq <shareef@jalloq.co.uk> wrote:

I've now shut down the VMs on one host and rebooted it, but the agent service doesn't start. If I run 'hosted-engine --vm-status' I get:

    The hosted engine configuration has not been retrieved from shared storage. Please ensure that ovirt-ha-agent is running and the storage server is reachable.

and indeed, if I list the mounts under /rhev/data-center/mnt, only one of the directories is mounted. I have 3 NFS mounts: one ISO Domain and two Data Domains. Only one Data Domain has mounted, and this has lots of .prob files in it. So why haven't the other NFS exports been mounted?

Manually mounting them doesn't seem to have helped much either. I can start the broker service, but the agent service says no. Same error as the one in my last email.

Shareef.
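A sketch of how one might compare what is actually mounted against what should be (server:/export and /mnt/test are placeholders for your own NFS details):

    ls /rhev/data-center/mnt/      # directories vdsm has (or should have) mounted
    mount -t nfs,nfs4              # list currently mounted NFS filesystems
    # hypothetical manual test-mount of a missing export:
    mkdir -p /mnt/test && mount -t nfs server:/export /mnt/test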

On Wed, Apr 8, 2020 at 11:12 PM <eevans@digitaldatatechs.com> wrote:

Did you try:

    virsh list --inactive

Eric Evans
Digital Data Services LLC.
304.660.9080

On Wed, Apr 8, 2020 at 11:31 PM Shareef Jalloq <shareef@jalloq.co.uk> wrote:

Ah hah! OK, so I've managed to start it using virsh on the second host, but my first host is still dead.

First of all, what are these 56,317 .prob- files that get dumped to the NFS mounts?

Secondly, why doesn't the node mount the NFS directories at boot? Is that the issue with this particular node?

On Wed, Apr 8, 2020, Shareef Jalloq <shareef@jalloq.co.uk> wrote:

Hmmm, virsh tells me the HE is running, but it hasn't come up and the agent.log is full of the same errors.
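(Two quick ways to see what a "running" engine VM is actually doing; the FQDN below is a placeholder for your own engine address:)

    hosted-engine --console        # attach to the engine VM's console from the host
    ping -c3 engine.example.com    # hypothetical engine FQDN; checks basic reachability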
Ah hah! Ok, so I've managed to start it using virsh on the second host but my first host is still dead.
First of all, what are these 56,317 .prob- files that get dumped to the NFS mounts?
Secondly, why doesn't the node mount the NFS directories at boot? Is that the issue with this particular node?
On Wed, Apr 8, 2020 at 11:12 PM <eevans@digitaldatatechs.com> wrote:
Did you try virsh list --inactive
Eric Evans
Digital Data Services LLC.
304.660.9080
*From:* Shareef Jalloq <shareef@jalloq.co.uk> *Sent:* Wednesday, April 8, 2020 5:58 PM *To:* Strahil Nikolov <hunter86_bg@yahoo.com> *Cc:* Ovirt Users <users@ovirt.org> *Subject:* [ovirt-users] Re: ovirt-engine unresponsive - how to rescue?
I've now shut down the VMs on one host and rebooted it but the agent service doesn't start. If I run 'hosted-engine --vm-status' I get:
The hosted engine configuration has not been retrieved from shared storage. Please ensure that ovirt-ha-agent is running and the storage server is reachable.
and indeed if I list the mounts under /rhev/data-center/mnt, only one of the directories is mounted. I have 3 NFS mounts, one ISO Domain and two Data Domains. Only one Data Domain has mounted and this has lots of .prob files in. So why haven't the other NFS exports been mounted?
Manually mounting them doesn't seem to have helped much either. I can start the broker service but the agent service says no. Same error as the one in my last email.
Shareef.
On Wed, Apr 8, 2020 at 9:57 PM Shareef Jalloq <shareef@jalloq.co.uk> wrote:
Right, still down. I've run virsh and it doesn't know anything about the engine vm.
I've restarted the broker and agent services, and I still get nothing in virsh 'list'.
In the logs under /var/log/ovirt-hosted-engine-ha I see lots of errors:
broker.log:
MainThread::INFO::2020-04-08 20:56:20,138::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.3.6 started
MainThread::INFO::2020-04-08 20:56:20,138::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Searching for submonitors in /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors
MainThread::INFO::2020-04-08 20:56:20,138::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
MainThread::INFO::2020-04-08 20:56:20,140::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
MainThread::INFO::2020-04-08 20:56:20,140::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
MainThread::INFO::2020-04-08 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
MainThread::INFO::2020-04-08 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
MainThread::INFO::2020-04-08 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
MainThread::INFO::2020-04-08 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
MainThread::INFO::2020-04-08 20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
MainThread::INFO::2020-04-08 20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
MainThread::INFO::2020-04-08 20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
MainThread::INFO::2020-04-08 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
MainThread::INFO::2020-04-08 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
MainThread::INFO::2020-04-08 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
MainThread::INFO::2020-04-08 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
MainThread::INFO::2020-04-08 20:56:20,143::monitor::50::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Finished loading submonitors
MainThread::INFO::2020-04-08 20:56:20,197::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect) Connecting the storage
MainThread::INFO::2020-04-08 20:56:20,197::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2020-04-08 20:56:20,414::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2020-04-08 20:56:20,628::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Refreshing the storage domain
MainThread::WARNING::2020-04-08 20:56:21,057::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed:
(code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
MainThread::INFO::2020-04-08 20:56:21,901::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.3.6 started
MainThread::INFO::2020-04-08 20:56:21,901::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Searching for submonitors in /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors
agent.log:
MainThread::ERROR::2020-04-08 20:57:00,799::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent
MainThread::INFO::2020-04-08 20:57:00,799::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
MainThread::INFO::2020-04-08 20:57:11,144::agent::67::ovirt_hosted_engine_ha.agent.agent.Agent::(run) ovirt-hosted-engine-ha agent 2.3.6 started
MainThread::INFO::2020-04-08 20:57:11,182::hosted_engine::234::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Found certificate common name: ovirt-node-01.phoelex.com
MainThread::INFO::2020-04-08 20:57:11,294::hosted_engine::543::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Initializing ha-broker connection
MainThread::INFO::2020-04-08 20:57:11,296::brokerlink::80::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor network, options {'tcp_t_address': '', 'network_test': 'dns', 'tcp_t_port': '', 'addr': '192.168.1.99'}
MainThread::ERROR::2020-04-08 20:57:11,296::hosted_engine::559::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Failed to start necessary monitors
MainThread::ERROR::2020-04-08 20:57:11,297::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent
return action(he)
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper
return he.start_monitoring()
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 432, in start_monitoring
self._initialize_broker()
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 556, in _initialize_broker
m.get('options', {}))
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 89, in start_monitor
).format(t=type, o=options, e=e)
RequestError: brokerlink - failed to start monitor via ovirt-ha-broker: [Errno 2] No such file or directory, [monitor: 'network', options: {'tcp_t_address': '', 'network_test': 'dns', 'tcp_t_port': '', 'addr': '192.168.1.99'}]
MainThread::ERROR::2020-04-08 20:57:11,297::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent
MainThread::INFO::2020-04-08 20:57:11,297::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
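(That [Errno 2] looks like the agent failing to reach the broker rather than an actual network problem — the broker listens on a local socket, /var/run/ovirt-hosted-engine-ha/broker.socket if I remember correctly — so the broker has to be healthy first. A quick check:)

systemctl status ovirt-ha-broker
ls -l /var/run/ovirt-hosted-engine-ha/broker.socket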

Don't know if this is useful or not, but I just tried to shut down and start another VM on one of the hosts and I get the following error:

virsh # start scratch
error: Failed to start domain scratch
error: Network not found: no network with matching name 'vdsm-ovirtmgmt'

Is this not referring to the interface name, as the network is called 'ovirtmgmt'?
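(A quick sanity check, assuming standard virsh tooling — list the networks libvirt actually has defined:)

virsh net-list --all

If vdsm-ovirtmgmt doesn't show up there, the VM definition references a network that was never defined, or never started, on this host.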

Hi Shareef,

The activation flow of oVirt is more complex than plain KVM: mounting of the domains happens during activation of the node (the HostedEngine is activating everything needed).

Focus on the HostedEngine VM. Is it running properly?

If not, try:
1. Verify that the storage domain exists.
2. Check that it has an 'ha_agent' directory.
3. Check that the links are OK; if not, you can safely remove the links.
4. Next, check that the services are running:
A) sanlock
B) supervdsmd
C) vdsmd
D) libvirtd
5. Increase the log level for the broker and agent services (a concrete sketch of this step follows below this message):
cd /etc/ovirt-hosted-engine-ha
vim *-log.conf
systemctl restart ovirt-ha-broker ovirt-ha-agent
6. Check what they are complaining about. Keep in mind that the agent will keep throwing errors until the broker stops doing so (the agent depends on the broker), so the broker must be OK before proceeding to the agent log.

About the manual VM start, you need 2 things:

1. Define the VM network. Note this is a network definition, so it takes 'virsh net-define' rather than 'virsh define' (and 'virsh net-start vdsm-ovirtmgmt' afterwards to make it active):
# cat vdsm-ovirtmgmt.xml
<network>
  <name>vdsm-ovirtmgmt</name>
  <uuid>8ded486e-e681-4754-af4b-5737c2b05405</uuid>
  <forward mode='bridge'/>
  <bridge name='ovirtmgmt'/>
</network>
[root@ovirt1 HostedEngine-RECOVERY]# virsh net-define vdsm-ovirtmgmt.xml

2. Get an XML definition of the VM, which can be found in the vdsm log. Every VM has its configuration printed in the vdsm log of the host it starts on. Save it to a file and then:
A) virsh define myvm.xml
B) virsh start myvm

It seems there is/was a problem with your NFS shares.

Best Regards, Strahil Nikolov
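A minimal sketch of step 5 above, assuming the stock broker-log.conf and agent-log.conf file names and an INFO default in those configs (adjust to your layout):

sed -i 's/^level=INFO/level=DEBUG/' /etc/ovirt-hosted-engine-ha/broker-log.conf /etc/ovirt-hosted-engine-ha/agent-log.conf
systemctl restart ovirt-ha-broker ovirt-ha-agent
tail -f /var/log/ovirt-hosted-engine-ha/broker.log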

OK, let's go through this. I'm looking at the node that at least still has some VMs running. virsh also tells me that the HostedEngine VM is running, but it's unresponsive and I can't shut it down.

1. All storage domains exist and are mounted.

2. The ha_agent directory exists:

[root@ovirt-node-01 ovirt-hosted-engine-ha]# ls /rhev/data-center/mnt/nas-01.phoelex.com\:_volume2_vmstore/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/
dom_md  ha_agent  images  master

3. There are two links:

[root@ovirt-node-01 ovirt-hosted-engine-ha]# ll /rhev/data-center/mnt/nas-01.phoelex.com\:_volume2_vmstore/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/ha_agent/
total 8
lrwxrwxrwx. 1 vdsm kvm 132 Apr 2 14:50 hosted-engine.lockspace -> /var/run/vdsm/storage/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/ffb90b82-42fe-4253-85d5-aaec8c280aaf/90e68791-0c6f-406a-89ac-e0d86c631604
lrwxrwxrwx. 1 vdsm kvm 132 Apr 2 14:50 hosted-engine.metadata -> /var/run/vdsm/storage/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/2161aed0-7250-4c1d-b667-ac94f60af17e/6b818e33-f80a-48cc-a59c-bba641e027d4

4. The services exist but all seem to have some sort of warning:

a) sanlock:
Apr 08 18:10:55 ovirt-node-01.phoelex.com sanlock[1728]: 2020-04-08 18:10:55 1744152 [36796]: s16 delta_renew long write time 10 sec

b) supervdsmd:
Mar 23 18:02:59 ovirt-node-01.phoelex.com supervdsmd[29409]: failed to load module nvdimm: libbd_nvdimm.so.2: cannot open shared object file: No such file or directory

c) vdsmd:
Apr 09 08:05:13 ovirt-node-01.phoelex.com vdsm[4801]: ERROR failed to retrieve Hosted Engine HA score '[Errno 2] No such file or directory' Is the Hosted Engine setup finished?

d) libvirtd:
Apr 08 22:48:27 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-08 22:48:27.134+0000: 29309: warning : qemuGetProcessInfo:1404 : cannot parse process status data
Apr 08 22:48:27 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-08 22:48:27.134+0000: 29309: error : virNetDevTapInterfaceStats:764 : internal error: /proc/net/dev: Interface not found
Apr 08 23:09:39 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-08 23:09:39.844+0000: 29307: error : virNetSocketReadWire:1806 : End of file while reading data: Input/output error
Apr 09 01:05:26 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-09 01:05:26.660+0000: 29307: error : virNetSocketReadWire:1806 : End of file while reading data: Input/output error

5 & 6. The broker log is continually printing this error:

MainThread::INFO::2020-04-09 08:07:31,438::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.3.6 started
MainThread::DEBUG::2020-04-09 08:07:31,438::broker::55::ovirt_hosted_engine_ha.broker.broker.Broker::(run) Running broker
MainThread::DEBUG::2020-04-09 08:07:31,438::broker::120::ovirt_hosted_engine_ha.broker.broker.Broker::(_get_monitor) Starting monitor
MainThread::INFO::2020-04-09 08:07:31,438::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Searching for submonitors in /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors
MainThread::INFO::2020-04-09 08:07:31,439::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
MainThread::INFO::2020-04-09 08:07:31,440::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
MainThread::INFO::2020-04-09 08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
MainThread::INFO::2020-04-09 08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
MainThread::INFO::2020-04-09 08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
MainThread::INFO::2020-04-09 08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
MainThread::INFO::2020-04-09 08:07:31,442::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
MainThread::INFO::2020-04-09 08:07:31,442::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
MainThread::INFO::2020-04-09 08:07:31,444::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
MainThread::INFO::2020-04-09 08:07:31,444::monitor::50::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Finished loading submonitors
MainThread::DEBUG::2020-04-09 08:07:31,444::broker::128::ovirt_hosted_engine_ha.broker.broker.Broker::(_get_storage_broker) Starting storage broker
MainThread::DEBUG::2020-04-09 08:07:31,444::storage_backends::369::ovirt_hosted_engine_ha.lib.storage_backends::(connect) Connecting to VDSM
MainThread::DEBUG::2020-04-09 08:07:31,444::util::384::ovirt_hosted_engine_ha.lib.storage_backends::(__log_debug) Creating a new json-rpc connection to VDSM
Client localhost:54321::DEBUG::2020-04-09 08:07:31,453::concurrent::258::root::(run) START thread <Thread(Client localhost:54321, started daemon 139992488138496)> (func=<bound method Reactor.process_requests of <yajsonrpc.betterAsyncore.Reactor object at 0x7f528acabc90>>, args=(), kwargs={})
Client localhost:54321::DEBUG::2020-04-09 08:07:31,459::stompclient::138::yajsonrpc.protocols.stomp.AsyncClient::(_process_connected) Stomp connection established
MainThread::DEBUG::2020-04-09 08:07:31,467::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::INFO::2020-04-09 08:07:31,530::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect) Connecting the storage
MainThread::INFO::2020-04-09 08:07:31,531::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::DEBUG::2020-04-09 08:07:31,531::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2020-04-09 08:07:31,534::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2020-04-09 08:07:32,199::storage_server::158::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(_validate_pre_connected_path) Storage domain a6cea67d-dbfb-45cf-a775-b4d0d47b26f2 is not available
MainThread::INFO::2020-04-09 08:07:32,199::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::DEBUG::2020-04-09 08:07:32,199::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2020-04-09 08:07:32,814::storage_server::363::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) [{u'status': 0, u'id': u'e29cf818-5ee5-46e1-85c1-8aeefa33e95d'}]
MainThread::INFO::2020-04-09 08:07:32,814::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Refreshing the storage domain
MainThread::DEBUG::2020-04-09 08:07:32,815::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2020-04-09 08:07:33,129::storage_server::420::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Error refreshing storage domain: Command StorageDomain.getStats with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed: (code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
MainThread::DEBUG::2020-04-09 08:07:33,130::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2020-04-09 08:07:33,795::storage_backends::208::ovirt_hosted_engine_ha.lib.storage_backends::(_get_sector_size) Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed: (code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
MainThread::WARNING::2020-04-09 08:07:33,795::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed: (code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))

The UUID it is moaning about is indeed the one that the HA domain sits on, and is the one whose contents I listed in step 2 above. So why can't it see this domain?

Thanks, Shareef.
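(One way to reproduce exactly what the broker is asking vdsm, assuming vdsm-client is installed — the UUID is the one from the warning above:)

vdsm-client StorageDomain getInfo storagedomainID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2

If that fails the same way outside the HA stack, the problem is between vdsm and the storage domain rather than in the broker itself.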
On April 9, 2020 1:51:05 AM GMT+03:00, Shareef Jalloq < shareef@jalloq.co.uk> wrote:
Don't know if this is useful or not, but I just tried to shutdown and start another VM on one of the hosts and get the following error:
virsh # start scratch
error: Failed to start domain scratch
error: Network not found: no network with matching name 'vdsm-ovirtmgmt'
Is this not referring to the interface name as the network is called 'ovirtmgnt'.
On Wed, Apr 8, 2020 at 11:35 PM Shareef Jalloq <shareef@jalloq.co.uk> wrote:
Hmmm, virsh tells me the HE is running but it hasn't come up and the agent.log is full of the same errors.
On Wed, Apr 8, 2020 at 11:31 PM Shareef Jalloq <shareef@jalloq.co.uk> wrote:
Ah hah! Ok, so I've managed to start it using virsh on the second host but my first host is still dead.
First of all, what are these 56,317 .prob- files that get dumped to the NFS mounts?
Secondly, why doesn't the node mount the NFS directories at boot? Is that the issue with this particular node?
On Wed, Apr 8, 2020 at 11:12 PM <eevans@digitaldatatechs.com> wrote:
Did you try virsh list --inactive
Eric Evans
Digital Data Services LLC.
304.660.9080
*From:* Shareef Jalloq <shareef@jalloq.co.uk> *Sent:* Wednesday, April 8, 2020 5:58 PM *To:* Strahil Nikolov <hunter86_bg@yahoo.com> *Cc:* Ovirt Users <users@ovirt.org> *Subject:* [ovirt-users] Re: ovirt-engine unresponsive - how to rescue?
I've now shut down the VMs on one host and rebooted it but the agent service doesn't start. If I run 'hosted-engine --vm-status' I get:
The hosted engine configuration has not been retrieved from shared storage. Please ensure that ovirt-ha-agent is running and the storage server is reachable.
and indeed if I list the mounts under /rhev/data-center/mnt, only one of the directories is mounted. I have 3 NFS mounts, one ISO Domain and two Data Domains. Only one Data Domain has mounted and this has lots of .prob files in. So why haven't the other NFS exports been mounted?
Manually mounting them doesn't seem to have helped much either. I can start the broker service but the agent service says no. Same error as the one in my last email.
Shareef.
On Wed, Apr 8, 2020 at 9:57 PM Shareef Jalloq <shareef@jalloq.co.uk> wrote:
Right, still down. I've run virsh and it doesn't know anything about the engine vm.
I've restarted the broker and agent services and I still get nothing in virsh->list.
In the logs under /var/log/ovirt-hosted-engine-ha I see lots of errors:
broker.log:
MainThread::INFO::2020-04-08
20:56:20,138::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run)
ovirt-hosted-engine-ha broker 2.3.6 started
MainThread::INFO::2020-04-08
20:56:20,138::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
Searching for submonitors in
/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors
MainThread::INFO::2020-04-08
20:56:20,138::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
Loaded submonitor network
MainThread::INFO::2020-04-08
20:56:20,140::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
Loaded submonitor cpu-load-no-engine
MainThread::INFO::2020-04-08
20:56:20,140::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
Loaded submonitor mgmt-bridge
MainThread::INFO::2020-04-08
20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
Loaded submonitor network
MainThread::INFO::2020-04-08
20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
Loaded submonitor cpu-load
MainThread::INFO::2020-04-08
20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
Loaded submonitor engine-health
MainThread::INFO::2020-04-08
20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
Loaded submonitor mgmt-bridge
MainThread::INFO::2020-04-08
20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
Loaded submonitor cpu-load-no-engine
MainThread::INFO::2020-04-08
20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
Loaded submonitor cpu-load
MainThread::INFO::2020-04-08
20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
Loaded submonitor mem-free
MainThread::INFO::2020-04-08
20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
Loaded submonitor storage-domain
MainThread::INFO::2020-04-08
20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
Loaded submonitor storage-domain
MainThread::INFO::2020-04-08
20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
Loaded submonitor mem-free
MainThread::INFO::2020-04-08
20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
Loaded submonitor engine-health
MainThread::INFO::2020-04-08
20:56:20,143::monitor::50::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
Finished loading submonitors
MainThread::INFO::2020-04-08
20:56:20,197::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect)
Connecting the storage
MainThread::INFO::2020-04-08
20:56:20,197::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
Connecting storage server
MainThread::INFO::2020-04-08
20:56:20,414::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
Connecting storage server
MainThread::INFO::2020-04-08
20:56:20,628::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
Refreshing the storage domain
MainThread::WARNING::2020-04-08
20:56:21,057::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__)
Can't connect vdsm storage: Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed:
(code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
MainThread::INFO::2020-04-08
20:56:21,901::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run)
ovirt-hosted-engine-ha broker 2.3.6 started
MainThread::INFO::2020-04-08
20:56:21,901::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
Searching for submonitors in
/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors
agent.log:
MainThread::ERROR::2020-04-08
20:57:00,799::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
Trying to restart agent
MainThread::INFO::2020-04-08
20:57:00,799::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run)
Agent shutting down
MainThread::INFO::2020-04-08
20:57:11,144::agent::67::ovirt_hosted_engine_ha.agent.agent.Agent::(run)
ovirt-hosted-engine-ha agent 2.3.6 started
MainThread::INFO::2020-04-08
20:57:11,182::hosted_engine::234::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname)
Found certificate common name: ovirt-node-01.phoelex.com
MainThread::INFO::2020-04-08
20:57:11,294::hosted_engine::543::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker)
Initializing ha-broker connection
MainThread::INFO::2020-04-08
20:57:11,296::brokerlink::80::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
Starting monitor network, options {'tcp_t_address': '', 'network_test': 'dns', 'tcp_t_port': '', 'addr': '192.168.1.99'}
MainThread::ERROR::2020-04-08
20:57:11,296::hosted_engine::559::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker)
Failed to start necessary monitors
MainThread::ERROR::2020-04-08
20:57:11,297::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
Traceback (most recent call last):
File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py",
line 131, in _run_agent
return action(he)
File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py",
line 55, in action_proper
return he.start_monitoring()
File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
line 432, in start_monitoring
self._initialize_broker()
File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
line 556, in _initialize_broker
m.get('options', {}))
File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py",
line 89, in start_monitor
).format(t=type, o=options, e=e)
RequestError: brokerlink - failed to start monitor via ovirt-ha-broker: [Errno 2] No such file or directory, [monitor: 'network', options: {'tcp_t_address': '', 'network_test': 'dns', 'tcp_t_port': '', 'addr': '192.168.1.99'}]
MainThread::ERROR::2020-04-08
20:57:11,297::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
Trying to restart agent
MainThread::INFO::2020-04-08
20:57:11,297::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run)
Agent shutting down
On Wed, Apr 8, 2020 at 6:10 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
This has to be resolved:
Engine status : unknown stale-data
Run again 'hosted-engine --vm-status'. If it remains the same, restart ovirt-ha-broker.service & ovirt-ha-agent.service
Verify that the engine's storage is available. Then monitor the broker & agent logs in /var/log/ovirt-hosted-engine-ha
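For example — a minimal sketch, using the unit names and log paths that appear elsewhere in this thread:

# restart the HA services, then watch both logs for the first error after startup
systemctl restart ovirt-ha-broker.service ovirt-ha-agent.service
tail -f /var/log/ovirt-hosted-engine-ha/broker.log /var/log/ovirt-hosted-engine-ha/agent.log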
Best Regards, Strahil Nikolov
Hi Shareef,
The activation flow of oVirt is more complex than plain KVM. Mounting of the domains happens during activation of the node (the HostedEngine is activating everything needed).
Focus on the HostedEngine VM. Is it running properly?
If not, try:
1. Verify that the storage domain exists
2. Check if it has an 'ha_agent' directory
3. Check if the links are OK; if not, you can safely remove the links
4. Next, check that these services are running: A) sanlock B) supervdsmd C) vdsmd D) libvirtd
5. Increase the log level for broker and agent services:
cd /etc/ovirt-hosted-engine-ha
vim *-log.conf
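A non-interactive equivalent — a sketch that assumes the handler blocks still carry the stock 'level=INFO' lines, in the same format as the [handler_logfile] sample later in this thread:

# bump the broker and agent file handlers to DEBUG
sed -i 's/^level=INFO/level=DEBUG/' /etc/ovirt-hosted-engine-ha/*-log.conf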
systemctl restart ovirt-ha-broker ovirt-ha-agent
6. Check what they are complaining about. Keep in mind that the agent will keep throwing errors until the broker stops doing it (the agent depends on the broker), so the broker must be OK before proceeding with the agent log. (A short sketch of checks 1-4 follows below.)
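A minimal sketch of checks 1-4, using the mount path and domain UUID that appear later in this thread — adjust to your own environment:

# 1+2: does the storage domain have an ha_agent directory?
ls /rhev/data-center/mnt/nas-01.phoelex.com\:_volume2_vmstore/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/ha_agent
# 3: do the lockspace/metadata symlinks resolve to real files?
for l in /rhev/data-center/mnt/nas-01.phoelex.com\:_volume2_vmstore/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/ha_agent/hosted-engine.*; do
    readlink -e "$l" >/dev/null || echo "broken link: $l"
done
# 4: service state
systemctl status sanlock supervdsmd vdsmd libvirtd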
About the manual VM start, you need 2 things:
1. Define the VM network
# cat vdsm-ovirtmgmt.xml
<network>
  <name>vdsm-ovirtmgmt</name>
  <uuid>8ded486e-e681-4754-af4b-5737c2b05405</uuid>
  <forward mode='bridge'/>
  <bridge name='ovirtmgmt'/>
</network>
[root@ovirt1 HostedEngine-RECOVERY]# virsh define vdsm-ovirtmgmt.xml
2. Get an XML definition, which can be found in the vdsm log. Every VM at startup has its configuration printed out in the vdsm log on the host it starts on. Save it to a file and then:
A) virsh define myvm.xml
B) virsh start myvm
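A rough way to locate a recent dump — the exact log pattern may vary between vdsm versions, so treat this as a starting point only:

# find recent domain XML dumps in vdsm.log, then copy the block into myvm.xml
grep -n '<?xml version' /var/log/vdsm/vdsm.log | tail -5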
It seems there is/was a problem with your NFS shares.
Best Regards, Strahil Nikolov

On April 9, 2020 11:12:30 AM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:
OK, let's go through this. I'm looking at the node that at least still has some VMs running. virsh also tells me that the HostedEngine VM is running but it's unresponsive and I can't shut it down.
1. All storage domains exist and are mounted. 2. The ha_agent exists:
[root@ovirt-node-01 ovirt-hosted-engine-ha]# ls /rhev/data-center/mnt/nas-01.phoelex.com\:_volume2_vmstore/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/
dom_md ha_agent images master
3. There are two links
[root@ovirt-node-01 ovirt-hosted-engine-ha]# ll /rhev/data-center/mnt/nas-01.phoelex.com\:_volume2_vmstore/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/ha_agent/
total 8
lrwxrwxrwx. 1 vdsm kvm 132 Apr 2 14:50 hosted-engine.lockspace -> /var/run/vdsm/storage/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/ffb90b82-42fe-4253-85d5-aaec8c280aaf/90e68791-0c6f-406a-89ac-e0d86c631604
lrwxrwxrwx. 1 vdsm kvm 132 Apr 2 14:50 hosted-engine.metadata -> /var/run/vdsm/storage/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/2161aed0-7250-4c1d-b667-ac94f60af17e/6b818e33-f80a-48cc-a59c-bba641e027d4
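Whether the targets behind those links actually exist can be checked directly under /var/run/vdsm/storage, using the same domain UUID:

ls -l /var/run/vdsm/storage/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/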
4. The services exist but all seem to have some sort of warning:
a) Apr 08 18:10:55 ovirt-node-01.phoelex.com sanlock[1728]: 2020-04-08 18:10:55 1744152 [36796]: s16 delta_renew long write time 10 sec
b) Mar 23 18:02:59 ovirt-node-01.phoelex.com supervdsmd[29409]: failed to load module nvdimm: libbd_nvdimm.so.2: cannot open shared object file: No such file or directory
c) Apr 09 08:05:13 ovirt-node-01.phoelex.com vdsm[4801]: ERROR failed to retrieve Hosted Engine HA score '[Errno 2] No such file or directory' Is the Hosted Engine setup finished?
d) Apr 08 22:48:27 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-08 22:48:27.134+0000: 29309: warning : qemuGetProcessInfo:1404 : cannot parse process status data
Apr 08 22:48:27 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-08 22:48:27.134+0000: 29309: error : virNetDevTapInterfaceStats:764 : internal error: /proc/net/dev: Interface not found
Apr 08 23:09:39 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-08 23:09:39.844+0000: 29307: error : virNetSocketReadWire:1806 : End of file while reading data: Input/output error
Apr 09 01:05:26 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-09 01:05:26.660+0000: 29307: error : virNetSocketReadWire:1806 : End of file while reading data: Input/output error
5 & 6. The broker log is continually printing this error:
MainThread::INFO::2020-04-09 08:07:31,438::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.3.6 started
MainThread::DEBUG::2020-04-09 08:07:31,438::broker::55::ovirt_hosted_engine_ha.broker.broker.Broker::(run) Running broker
MainThread::DEBUG::2020-04-09 08:07:31,438::broker::120::ovirt_hosted_engine_ha.broker.broker.Broker::(_get_monitor) Starting monitor
MainThread::INFO::2020-04-09 08:07:31,438::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Searching for submonitors in /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors
MainThread::INFO::2020-04-09 08:07:31,439::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
MainThread::INFO::2020-04-09 08:07:31,440::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
MainThread::INFO::2020-04-09 08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
MainThread::INFO::2020-04-09 08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
MainThread::INFO::2020-04-09 08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
MainThread::INFO::2020-04-09 08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
MainThread::INFO::2020-04-09 08:07:31,442::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
MainThread::INFO::2020-04-09 08:07:31,442::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
MainThread::INFO::2020-04-09 08:07:31,444::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
MainThread::INFO::2020-04-09 08:07:31,444::monitor::50::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Finished loading submonitors
MainThread::DEBUG::2020-04-09 08:07:31,444::broker::128::ovirt_hosted_engine_ha.broker.broker.Broker::(_get_storage_broker) Starting storage broker
MainThread::DEBUG::2020-04-09 08:07:31,444::storage_backends::369::ovirt_hosted_engine_ha.lib.storage_backends::(connect) Connecting to VDSM
MainThread::DEBUG::2020-04-09 08:07:31,444::util::384::ovirt_hosted_engine_ha.lib.storage_backends::(__log_debug) Creating a new json-rpc connection to VDSM
Client localhost:54321::DEBUG::2020-04-09 08:07:31,453::concurrent::258::root::(run) START thread <Thread(Client localhost:54321, started daemon 139992488138496)> (func=<bound method Reactor.process_requests of <yajsonrpc.betterAsyncore.Reactor object at 0x7f528acabc90>>, args=(), kwargs={})
Client localhost:54321::DEBUG::2020-04-09 08:07:31,459::stompclient::138::yajsonrpc.protocols.stomp.AsyncClient::(_process_connected) Stomp connection established
MainThread::DEBUG::2020-04-09 08:07:31,467::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::INFO::2020-04-09 08:07:31,530::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect) Connecting the storage
MainThread::INFO::2020-04-09 08:07:31,531::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::DEBUG::2020-04-09 08:07:31,531::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2020-04-09 08:07:31,534::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2020-04-09 08:07:32,199::storage_server::158::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(_validate_pre_connected_path) Storage domain a6cea67d-dbfb-45cf-a775-b4d0d47b26f2 is not available
MainThread::INFO::2020-04-09 08:07:32,199::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::DEBUG::2020-04-09 08:07:32,199::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2020-04-09 08:07:32,814::storage_server::363::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) [{u'status': 0, u'id': u'e29cf818-5ee5-46e1-85c1-8aeefa33e95d'}]
MainThread::INFO::2020-04-09 08:07:32,814::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Refreshing the storage domain
MainThread::DEBUG::2020-04-09 08:07:32,815::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2020-04-09 08:07:33,129::storage_server::420::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Error refreshing storage domain: Command StorageDomain.getStats with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed:
(code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
MainThread::DEBUG::2020-04-09 08:07:33,130::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2020-04-09 08:07:33,795::storage_backends::208::ovirt_hosted_engine_ha.lib.storage_backends::(_get_sector_size) Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed:
(code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
MainThread::WARNING::2020-04-09 08:07:33,795::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed:
(code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
The UUID it is moaning about is indeed the one that the HA sits on and is the one I listed the contents of in step 2 above.
So why can't it see this domain?
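One way to get VDSM's full error for that failing call is to reproduce it from the shell — a sketch, assuming vdsm-client is installed, as it ships with vdsm on these hosts:

# ask VDSM directly about the domain the broker cannot reach
vdsm-client StorageDomain getInfo storagedomainID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2
# the matching traceback should then appear in /var/log/vdsm/vdsm.log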
Thanks, Shareef.
On Thu, Apr 9, 2020 at 6:12 AM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
Don't know if this is useful or not, but I just tried to shutdown and start another VM on one of the hosts and get the following error:
virsh # start scratch
error: Failed to start domain scratch
error: Network not found: no network with matching name 'vdsm-ovirtmgmt'
Is this not referring to the interface name, as the network is called 'ovirtmgmt'?
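For reference, the networks libvirt actually has defined can be listed with the virsh alias from earlier in the thread:

virsh net-list --all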
On Wed, Apr 8, 2020 at 11:35 PM Shareef Jalloq <shareef@jalloq.co.uk> wrote:
Hmmm, virsh tells me the HE is running but it hasn't come up and the agent.log is full of the same errors.
On April 9, 2020 1:51:05 AM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:
On Wed, Apr 8, 2020 at 11:31 PM Shareef Jalloq <shareef@jalloq.co.uk> wrote:
Ah hah! Ok, so I've managed to start it using virsh on the second host but my first host is still dead.
First of all, what are these 56,317 .prob- files that get dumped to the NFS mounts?
Secondly, why doesn't the node mount the NFS directories at boot? Is that the issue with this particular node?
On Wed, Apr 8, 2020 at 11:12 PM <eevans@digitaldatatechs.com> wrote:
Did you try virsh list --inactive
Eric Evans
Digital Data Services LLC.
304.660.9080
From: Shareef Jalloq <shareef@jalloq.co.uk>
Sent: Wednesday, April 8, 2020 5:58 PM
To: Strahil Nikolov <hunter86_bg@yahoo.com>
Cc: Ovirt Users <users@ovirt.org>
Subject: [ovirt-users] Re: ovirt-engine unresponsive - how to rescue?
I've now shut down the VMs on one host and rebooted it but the agent service doesn't start. If I run 'hosted-engine --vm-status' I get:
The hosted engine configuration has not been retrieved from shared storage. Please ensure that ovirt-ha-agent is running and the storage server is reachable.
and indeed if I list the mounts under /rhev/data-center/mnt, only one of the directories is mounted. I have 3 NFS mounts, one ISO Domain and two Data Domains. Only one Data Domain has mounted and this has lots of .prob files in. So why haven't the other NFS exports been mounted?
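A quick sanity check that the exports are visible from the host at all — a sketch using the NAS hostname from this thread:

showmount -e nas-01.phoelex.com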
Manually mounting them doesn't seem to have helped much either. I can start the broker service but the agent service says no. Same error as the one in my last email.
Shareef.
On Wed, Apr 8, 2020 at 9:57 PM Shareef Jalloq <shareef@jalloq.co.uk> wrote:
Right, still down. I've run virsh and it doesn't know anything about the engine vm.
I've restarted the broker and agent services and I still get nothing in virsh->list.
In the logs under /var/log/ovirt-hosted-engine-ha I see lots of errors:
broker.log:
MainThread::INFO::2020-04-08 20:56:20,138::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.3.6 started
MainThread::INFO::2020-04-08 20:56:20,138::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Searching for submonitors in /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors
MainThread::INFO::2020-04-08 20:56:20,138::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
MainThread::INFO::2020-04-08 20:56:20,140::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
MainThread::INFO::2020-04-08 20:56:20,140::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
MainThread::INFO::2020-04-08 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
MainThread::INFO::2020-04-08 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
MainThread::INFO::2020-04-08 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
MainThread::INFO::2020-04-08 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
MainThread::INFO::2020-04-08 20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
MainThread::INFO::2020-04-08 20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
MainThread::INFO::2020-04-08 20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
MainThread::INFO::2020-04-08 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
MainThread::INFO::2020-04-08 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
MainThread::INFO::2020-04-08 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
MainThread::INFO::2020-04-08 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
MainThread::INFO::2020-04-08 20:56:20,143::monitor::50::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Finished loading submonitors
MainThread::INFO::2020-04-08 20:56:20,197::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect) Connecting the storage
MainThread::INFO::2020-04-08 20:56:20,197::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2020-04-08 20:56:20,414::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2020-04-08 20:56:20,628::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Refreshing the storage domain
MainThread::WARNING::2020-04-08 20:56:21,057::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed:
(code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
MainThread::INFO::2020-04-08 20:56:21,901::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.3.6 started
MainThread::INFO::2020-04-08 20:56:21,901::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Searching for submonitors in /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors
agent.log:
MainThread::ERROR::2020-04-08 20:57:00,799::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent
MainThread::INFO::2020-04-08 20:57:00,799::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
MainThread::INFO::2020-04-08 20:57:11,144::agent::67::ovirt_hosted_engine_ha.agent.agent.Agent::(run) ovirt-hosted-engine-ha agent 2.3.6 started
MainThread::INFO::2020-04-08 20:57:11,182::hosted_engine::234::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Found certificate common name: ovirt-node-01.phoelex.com
MainThread::INFO::2020-04-08 20:57:11,294::hosted_engine::543::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Initializing ha-broker connection
MainThread::INFO::2020-04-08 20:57:11,296::brokerlink::80::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor network, options {'tcp_t_address': '', 'network_test': 'dns', 'tcp_t_port': '', 'addr': '192.168.1.99'}
MainThread::ERROR::2020-04-08 20:57:11,296::hosted_engine::559::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Failed to start necessary monitors
MainThread::ERROR::2020-04-08 20:57:11,297::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent
    return action(he)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper
    return he.start_monitoring()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 432, in start_monitoring
    self._initialize_broker()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 556, in _initialize_broker
    m.get('options', {}))
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 89, in start_monitor
    ).format(t=type, o=options, e=e)
RequestError: brokerlink - failed to start monitor via ovirt-ha-broker: [Errno 2] No such file or directory, [monitor: 'network', options: {'tcp_t_address': '', 'network_test': 'dns', 'tcp_t_port': '', 'addr': '192.168.1.99'}]
MainThread::ERROR::2020-04-08 20:57:11,297::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent
MainThread::INFO::2020-04-08 20:57:11,297::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
Hey Shareef,
Check if there are any files or folders not owned by vdsm:kvm. Something like this:
find . -not -user 36 -not -group 36 -print
Also check if vdsm can access the images in the '<vol-mount-point>/images' directories.
Best Regards, Strahil Nikolov
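A minimal way to test that access as the vdsm user — the path below is a placeholder to fill in with a real image from the storage domain's images directory:

# try reading the first MiB of a disk image as the vdsm user (uid 36)
sudo -u vdsm dd if=/rhev/data-center/mnt/<mount>/<sd_uuid>/images/<img_uuid>/<vol_uuid> of=/dev/null bs=1M count=1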

Do these files exist on the hosted engine? I am an oVirt newbie, but it sounds like file or disk corruption. How much actual storage space is left on the volume with the .prob files? Or the hosted engine disk? Can you ssh into the hosted engine, put it in global maintenance and rerun engine-setup? What happened to trigger these errors? I'm coming in a bit late to the conversation.
Eric Evans
Digital Data Services LLC.
304.660.9080
-----Original Message-----
From: Strahil Nikolov <hunter86_bg@yahoo.com>
Sent: Thursday, April 9, 2020 12:57 PM
To: Shareef Jalloq <shareef@jalloq.co.uk>
Cc: eevans@digitaldatatechs.com; Ovirt Users <users@ovirt.org>
Subject: [ovirt-users] Re: ovirt-engine unresponsive - how to rescue?

Where should I be checking if there are any files/folders not owned by vdsm:kvm? I checked on the mount the HA sits on and it's fine.

How would I go about checking that vdsm can access those images? If I run virsh, it lists them, and they were running yesterday even though the HA was down.

I've since restarted both hosts but the broker is still spitting out the same error (copied below). How do I find the reason the broker can't connect to the storage? The conf file is already at DEBUG verbosity:

[handler_logfile]
class=logging.handlers.TimedRotatingFileHandler
args=('/var/log/ovirt-hosted-engine-ha/broker.log', 'd', 1, 7)
level=DEBUG
formatter=long

And what are all these .prob-<num> files that are being created? There are over 250K of them now on the mount I'm using for the Data domain (a quick way to count them is sketched at the end of this message). They're all of 0 size and of the form:

/rhev/data-center/mnt/nas-01.phoelex.com:_volume2_vmstore/.prob-ffa867da-93db-4211-82df-b1b04a625ab9

@eevans: The volume I have the Data Domain on has TBs free. The HA is dead so I can't ssh in. No idea what started these errors, and the other VMs were still running happily, although they're on a different Data Domain.

Shareef.

MainThread::INFO::2020-04-10 07:45:00,408::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect) Connecting the storage
MainThread::INFO::2020-04-10 07:45:00,408::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2020-04-10 07:45:01,577::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2020-04-10 07:45:02,692::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Refreshing the storage domain
MainThread::WARNING::2020-04-10 07:45:05,175::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed: (code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
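A read-only way to count those probe files, using the vmstore mount path from above:

find /rhev/data-center/mnt/nas-01.phoelex.com\:_volume2_vmstore -maxdepth 1 -name '.prob-*' | wc -l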
On April 9, 2020 11:12:30 AM GMT+03:00, Shareef Jalloq < shareef@jalloq.co.uk> wrote:
OK, let's go through this. I'm looking at the node that at least still has some VMs running. virsh also tells me that the HostedEngine VM is running but it's unresponsive and I can't shut it down.
1. All storage domains exist and are mounted. 2. The ha_agent exists:
[root@ovirt-node-01 ovirt-hosted-engine-ha]# ls /rhev/data-center/mnt/ nas-01.phoelex.com \:_volume2_vmstore/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/
dom_md ha_agent images master
3. There are two links
[root@ovirt-node-01 ovirt-hosted-engine-ha]# ll /rhev/data-center/mnt/ nas-01.phoelex.com \:_volume2_vmstore/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/ha_agent/
total 8
lrwxrwxrwx. 1 vdsm kvm 132 Apr 2 14:50 hosted-engine.lockspace ->
/var/run/vdsm/storage/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/ffb90b82-42fe-4253-85d5-aaec8c280aaf/90e68791-0c6f-406a-89ac-e0d86c631604
lrwxrwxrwx. 1 vdsm kvm 132 Apr 2 14:50 hosted-engine.metadata ->
/var/run/vdsm/storage/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/2161aed0-7250-4c1d-b667-ac94f60af17e/6b818e33-f80a-48cc-a59c-bba641e027d4
4. The services exist but all seem to have some sort of warning:
a) Apr 08 18:10:55 ovirt-node-01.phoelex.com sanlock[1728]: *2020-04-08 18:10:55 1744152 [36796]: s16 delta_renew long write time 10 sec*
b) Mar 23 18:02:59 ovirt-node-01.phoelex.com supervdsmd[29409]: *failed to load module nvdimm: libbd_nvdimm.so.2: cannot open shared object file: No such file or directory*
c) Apr 09 08:05:13 ovirt-node-01.phoelex.com vdsm[4801]: *ERROR failed to retrieve Hosted Engine HA score '[Errno 2] No such file or directory'Is the Hosted Engine setup finished?*
d)Apr 08 22:48:27 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-08 22:48:27.134+0000: 29309: warning : qemuGetProcessInfo:1404 : cannot parse process status data
Apr 08 22:48:27 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-08 22:48:27.134+0000: 29309: error : virNetDevTapInterfaceStats:764 : internal error: /proc/net/dev: Interface not found
Apr 08 23:09:39 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-08 23:09:39.844+0000: 29307: error : virNetSocketReadWire:1806 : End of file while reading data: Input/output error
Apr 09 01:05:26 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-09 01:05:26.660+0000: 29307: error : virNetSocketReadWire:1806 : End of file while reading data: Input/output error
5 & 6. The broker log is continually printing this error:
MainThread::INFO::2020-04-09 08:07:31,438::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.3.6 started
MainThread::DEBUG::2020-04-09 08:07:31,438::broker::55::ovirt_hosted_engine_ha.broker.broker.Broker::(run) Running broker
MainThread::DEBUG::2020-04-09 08:07:31,438::broker::120::ovirt_hosted_engine_ha.broker.broker.Broker::(_get_monitor) Starting monitor
MainThread::INFO::2020-04-09 08:07:31,438::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Searching for submonitors in /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors
MainThread::INFO::2020-04-09 08:07:31,439::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
MainThread::INFO::2020-04-09 08:07:31,440::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
MainThread::INFO::2020-04-09 08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
MainThread::INFO::2020-04-09 08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
MainThread::INFO::2020-04-09 08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
MainThread::INFO::2020-04-09 08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
MainThread::INFO::2020-04-09 08:07:31,442::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
MainThread::INFO::2020-04-09 08:07:31,442::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
MainThread::INFO::2020-04-09 08:07:31,444::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
MainThread::INFO::2020-04-09 08:07:31,444::monitor::50::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Finished loading submonitors
MainThread::DEBUG::2020-04-09 08:07:31,444::broker::128::ovirt_hosted_engine_ha.broker.broker.Broker::(_get_storage_broker) Starting storage broker
MainThread::DEBUG::2020-04-09 08:07:31,444::storage_backends::369::ovirt_hosted_engine_ha.lib.storage_backends::(connect) Connecting to VDSM
MainThread::DEBUG::2020-04-09 08:07:31,444::util::384::ovirt_hosted_engine_ha.lib.storage_backends::(__log_debug) Creating a new json-rpc connection to VDSM
Client localhost:54321::DEBUG::2020-04-09 08:07:31,453::concurrent::258::root::(run) START thread <Thread(Client localhost:54321, started daemon 139992488138496)> (func=<bound method Reactor.process_requests of <yajsonrpc.betterAsyncore.Reactor object at 0x7f528acabc90>>, args=(), kwargs={})
Client localhost:54321::DEBUG::2020-04-09 08:07:31,459::stompclient::138::yajsonrpc.protocols.stomp.AsyncClient::(_process_connected) Stomp connection established
MainThread::DEBUG::2020-04-09 08:07:31,467::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::INFO::2020-04-09 08:07:31,530::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect) Connecting the storage
MainThread::INFO::2020-04-09 08:07:31,531::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::DEBUG::2020-04-09 08:07:31,531::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2020-04-09 08:07:31,534::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2020-04-09 08:07:32,199::storage_server::158::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(_validate_pre_connected_path) Storage domain a6cea67d-dbfb-45cf-a775-b4d0d47b26f2 is not available
MainThread::INFO::2020-04-09 08:07:32,199::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::DEBUG::2020-04-09 08:07:32,199::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2020-04-09 08:07:32,814::storage_server::363::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) [{u'status': 0, u'id': u'e29cf818-5ee5-46e1-85c1-8aeefa33e95d'}]
MainThread::INFO::2020-04-09 08:07:32,814::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Refreshing the storage domain
MainThread::DEBUG::2020-04-09 08:07:32,815::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2020-04-09 08:07:33,129::storage_server::420::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Error refreshing storage domain: Command StorageDomain.getStats with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed: (code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
MainThread::DEBUG::2020-04-09 08:07:33,130::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2020-04-09 08:07:33,795::storage_backends::208::ovirt_hosted_engine_ha.lib.storage_backends::(_get_sector_size) Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed: (code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
MainThread::WARNING::2020-04-09 08:07:33,795::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed: (code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
The UUID it is moaning about is indeed the one that the HA sits on and is the one I listed the contents of in step 2 above.
So why can't it see this domain?
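One way to take the HA broker out of the equation would be to query vdsm directly; a sketch assuming the vdsm-client tool shipped with oVirt 4.2+ is on the host (the parameter name is taken from the error above):

# ask vdsm for the hosted-engine storage domain directly
vdsm-client StorageDomain getInfo storagedomainID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2
# the same code=350 answer here would mean the fault is inside vdsm, not the broker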
Thanks, Shareef.
On Thu, Apr 9, 2020 at 6:12 AM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
On April 9, 2020 1:51:05 AM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:

Don't know if this is useful or not, but I just tried to shut down and start another VM on one of the hosts and get the following error:

virsh # start scratch
error: Failed to start domain scratch
error: Network not found: no network with matching name 'vdsm-ovirtmgmt'

Is this not referring to the interface name, as the network is called 'ovirtmgmt'?

On Wed, Apr 8, 2020 at 11:35 PM Shareef Jalloq <shareef@jalloq.co.uk> wrote:

Hmmm, virsh tells me the HE is running but it hasn't come up and the agent.log is full of the same errors.

On Wed, Apr 8, 2020 at 11:31 PM Shareef Jalloq <shareef@jalloq.co.uk> wrote:

Ah hah! OK, so I've managed to start it using virsh on the second host but my first host is still dead.

First of all, what are these 56,317 .prob- files that get dumped to the NFS mounts?

Secondly, why doesn't the node mount the NFS directories at boot? Is that the issue with this particular node?
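For scale, counting the probe files is cheap; a sketch assuming GNU find and the vmstore mount path from earlier in the thread:

# count the zero-length .prob-* files at the root of the vmstore mount
find /rhev/data-center/mnt/nas-01.phoelex.com:_volume2_vmstore -maxdepth 1 -name '.prob-*' -size 0 | wc -l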
Hi Shareef,

The activation flow of oVirt is more complex than plain KVM: mounting of the domains happens during the activation of the node (the HostedEngine activates everything needed).

Focus on the HostedEngine VM. Is it running properly?

If not, try:
1. Verify that the storage domain exists
2. Check if it has the 'ha_agent' directory
3. Check if the links are OK; if not, you can safely remove the links
4. Next check the services are running: A) sanlock B) supervdsmd C) vdsmd D) libvirtd
5. Increase the log level for the broker and agent services:
cd /etc/ovirt-hosted-engine-ha
vim *-log.conf
systemctl restart ovirt-ha-broker ovirt-ha-agent
6. Check what they are complaining about. Keep in mind that the agent will keep throwing errors until the broker stops doing it (the agent depends on the broker), so the broker must be OK before proceeding with the agent log.
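For step 6, following both logs at once makes the broker-before-agent ordering easy to see; a minimal sketch:

# broker errors come first; the agent errors are a consequence
tail -f /var/log/ovirt-hosted-engine-ha/broker.log /var/log/ovirt-hosted-engine-ha/agent.log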
About the manual VM start, you need 2 things:
1. Define the VM network:

# cat vdsm-ovirtmgmt.xml
<network>
  <name>vdsm-ovirtmgmt</name>
  <uuid>8ded486e-e681-4754-af4b-5737c2b05405</uuid>
  <forward mode='bridge'/>
  <bridge name='ovirtmgmt'/>
</network>
[root@ovirt1 HostedEngine-RECOVERY]# virsh define vdsm-ovirtmgmt.xml
2. Get an XML definition, which can be found in the vdsm log. Every VM has its configuration printed out at startup in the vdsm log on the host it starts on. Save it to a file and then:
A) virsh define myvm.xml
B) virsh start myvm
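As a sketch of step 2, assuming the domain XML was logged intact to /var/log/vdsm/vdsm.log, an awk range can extract it (this prints every definition ever logged, so keep only the last one):

# pull logged libvirt domain definitions out of vdsm.log, then define and start the VM
awk '/<domain/,/<\/domain>/' /var/log/vdsm/vdsm.log > myvm.xml
virsh define myvm.xml
virsh start myvm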
It seems there is/was a problem with your NFS shares.
Best Regards, Strahil Nikolov
Hey Shareef,
Check if there are any files or folders not owned by vdsm:kvm. Something like this:

find . \( -not -user 36 -o -not -group 36 \) -print
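(uid 36 is vdsm and gid 36 is kvm on oVirt hosts, so this matches anything whose ownership differs from vdsm:kvm.) If files do turn up, a sketch of the repair, after eyeballing the list first:

# reset ownership to vdsm:kvm; review the find output before running this
find . \( -not -user 36 -o -not -group 36 \) -exec chown 36:36 {} +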
Also check if vdsm can access the images in the '<vol-mount-point>/images' directories.
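A quick spot-check sketch for that, with a hypothetical image path (substitute a real <image-uuid>/<volume-uuid> pair from the listing; iflag=direct matters on NFS):

# run from the storage domain directory; confirms the vdsm user can read the volume
sudo -u vdsm dd if=images/<image-uuid>/<volume-uuid> of=/dev/null bs=1M count=1 iflag=direct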
Best Regards, Strahil Nikolov

Right, I've given up on recovering the HE, so I want to try to redeploy it. There doesn't seem to be enough information to debug why the broker/agent won't start cleanly.

In running 'hosted-engine --deploy', I'm seeing the following error in the setup validation phase:

2020-04-14 09:46:08,922+0000 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND Please provide the hostname of this host on the management network [ovirt-node-00.phoelex.com]:
2020-04-14 09:46:12,831+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname.getResolvedAddresses:432 getResolvedAddresses: set(['64:ff9b::c0a8:13d', '192.168.1.61'])
2020-04-14 09:46:12,832+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname._validateFQDNresolvability:289 ovirt-node-00.phoelex.com resolves to: set(['64:ff9b::c0a8:13d', '192.168.1.61'])
2020-04-14 09:46:12,832+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:813 execute: ['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com', 'ANY'], executable='None', cwd='None', env=None
2020-04-14 09:46:12,871+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:863 execute-result: ['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com', 'ANY'], rc=0
2020-04-14 09:46:12,872+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:921 execute-output: ['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com', 'ANY'] stdout:
ovirt-node-00.phoelex.com. 86400 IN A 192.168.1.61
2020-04-14 09:46:12,872+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:926 execute-output: ['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com', 'ANY'] stderr:
2020-04-14 09:46:12,872+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:813 execute: ('/usr/sbin/ip', 'addr'), executable='None', cwd='None', env=None
2020-04-14 09:46:12,876+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:863 execute-result: ('/usr/sbin/ip', 'addr'), rc=0
2020-04-14 09:46:12,876+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:921 execute-output: ('/usr/sbin/ip', 'addr') stdout:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovirtmgmt state UP group default qlen 1000
    link/ether ac:1f:6b:bc:32:6a brd ff:ff:ff:ff:ff:ff
3: eno2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether ac:1f:6b:bc:32:6b brd ff:ff:ff:ff:ff:ff
4: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 02:e6:e2:80:93:8d brd ff:ff:ff:ff:ff:ff
5: br-int: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 8a:26:44:50:ee:4a brd ff:ff:ff:ff:ff:ff
21: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether ac:1f:6b:bc:32:6a brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.61/24 brd 192.168.1.255 scope global ovirtmgmt
       valid_lft forever preferred_lft forever
    inet6 fe80::ae1f:6bff:febc:326a/64 scope link
       valid_lft forever preferred_lft forever
22: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 3a:02:7b:7d:b3:2a brd ff:ff:ff:ff:ff:ff
2020-04-14 09:46:12,876+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:926 execute-output: ('/usr/sbin/ip', 'addr') stderr:
2020-04-14 09:46:12,877+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname.getLocalAddresses:251 addresses: [u'192.168.1.61', u'fe80::ae1f:6bff:febc:326a']
2020-04-14 09:46:12,877+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname.test_hostname:464 test_hostname exception
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_setup_lib/hostname.py", line 460, in test_hostname
    not_local_text,
  File "/usr/lib/python2.7/site-packages/ovirt_setup_lib/hostname.py", line 342, in _validateFQDNresolvability
    addresses=resolvedAddressesAsString
RuntimeError: ovirt-node-00.phoelex.com resolves to 64:ff9b::c0a8:13d 192.168.1.61 and not all of them can be mapped to non loopback devices on this host
2020-04-14 09:46:12,884+0000 ERROR otopi.plugins.gr_he_common.network.bridge dialog.queryEnvKey:120 Host name is not valid: ovirt-node-00.phoelex.com resolves to 64:ff9b::c0a8:13d 192.168.1.61 and not all of them can be mapped to non loopback devices on this host

The node I'm running on has an IP address of .61 and resolves correctly.
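Worth noting: 64:ff9b::/96 is the NAT64 well-known prefix, and ::c0a8:13d is 192.168.1.61 in hex, so that AAAA answer is very likely synthesised by a DNS64 resolver rather than a record anyone created. A quick check:

# compare the A and AAAA answers the host's resolver returns
dig +short ovirt-node-00.phoelex.com A
dig +short ovirt-node-00.phoelex.com AAAA
# if the AAAA only shows up via the configured resolver, point the host at a non-DNS64
# resolver (or pin the name in /etc/hosts) so setup only sees 192.168.1.61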

On April 14, 2020 1:27:24 PM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:
Right, I've given up on recovering the HE so want to try and redeploy it. There doesn't seem to be enough information to debug why the broker/agent won't start cleanly.
In running 'hosted-engine --deploy', I'm seeing the following error in the setup validation phase:
2020-04-14 09:46:08,922+0000 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND Please provide the hostname of this host on the management network [ovirt-node-00.phoelex.com]:
2020-04-14 09:46:12,831+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname.getResolvedAddresses:432 getResolvedAddresses: set(['64:ff9b::c0a8:13d', '192.168.1.61'])
2020-04-14 09:46:12,832+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname._validateFQDNresolvability:289 ovirt-node-00.phoelex.com resolves to: set(['64:ff9b::c0a8:13d', '192.168.1.61'])
2020-04-14 09:46:12,832+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:813 execute: ['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com', 'ANY'], executable='None', cwd='None', env=None
2020-04-14 09:46:12,871+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:863 execute-result: ['/usr/bin/dig', '+noall', '+answer', ' ovirt-node-00.phoelex.com', 'ANY'], rc=0
2020-04-14 09:46:12,872+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:921 execute-output: ['/usr/bin/dig', '+noall', '+answer', ' ovirt-node-00.phoelex.com', 'ANY'] stdout:
ovirt-node-00.phoelex.com. 86400 IN A 192.168.1.61
2020-04-14 09:46:12,872+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:926 execute-output: ['/usr/bin/dig', '+noall', '+answer', ' ovirt-node-00.phoelex.com', 'ANY'] stderr:
2020-04-14 09:46:12,872+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:813 execute: ('/usr/sbin/ip', 'addr'), executable='None', cwd='None', env=None
2020-04-14 09:46:12,876+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:863 execute-result: ('/usr/sbin/ip', 'addr'), rc=0
2020-04-14 09:46:12,876+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:921 execute-output: ('/usr/sbin/ip', 'addr') stdout:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovirtmgmt state UP group default qlen 1000
link/ether ac:1f:6b:bc:32:6a brd ff:ff:ff:ff:ff:ff
3: eno2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
link/ether ac:1f:6b:bc:32:6b brd ff:ff:ff:ff:ff:ff
4: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 02:e6:e2:80:93:8d brd ff:ff:ff:ff:ff:ff
5: br-int: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 8a:26:44:50:ee:4a brd ff:ff:ff:ff:ff:ff
21: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether ac:1f:6b:bc:32:6a brd ff:ff:ff:ff:ff:ff
inet 192.168.1.61/24 brd 192.168.1.255 scope global ovirtmgmt
valid_lft forever preferred_lft forever
inet6 fe80::ae1f:6bff:febc:326a/64 scope link
valid_lft forever preferred_lft forever
22: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 3a:02:7b:7d:b3:2a brd ff:ff:ff:ff:ff:ff
2020-04-14 09:46:12,876+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:926 execute-output: ('/usr/sbin/ip', 'addr') stderr:
2020-04-14 09:46:12,877+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname.getLocalAddresses:251 addresses: [u'192.168.1.61', u'fe80::ae1f:6bff:febc:326a']
2020-04-14 09:46:12,877+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname.test_hostname:464 test_hostname exception
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/ovirt_setup_lib/hostname.py", line 460, in test_hostname
not_local_text,
File "/usr/lib/python2.7/site-packages/ovirt_setup_lib/hostname.py", line 342, in _validateFQDNresolvability
addresses=resolvedAddressesAsString
RuntimeError: ovirt-node-00.phoelex.com resolves to 64:ff9b::c0a8:13d 192.168.1.61 and not all of them can be mapped to non loopback devices on this host
2020-04-14 09:46:12,884+0000 ERROR otopi.plugins.gr_he_common.network.bridge dialog.queryEnvKey:120 Host name is not valid: ovirt-node-00.phoelex.com resolves to 64:ff9b::c0a8:13d 192.168.1.61 and not all of them can be mapped to non loopback devices on this host
The node I'm running on has an IP address of .61 and resolves correctly.
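For what it's worth, 64:ff9b::/96 is the well-known NAT64/DNS64 prefix, and 64:ff9b::c0a8:13d is just 192.168.1.61 embedded in it, so something in the resolver path looks to be synthesising AAAA records. A sketch of how to confirm, plus a workaround rather than a fix (pinning the name in /etc/hosts so getaddrinfo stops seeing the synthetic address):

dig +short AAAA ovirt-node-00.phoelex.com
# if that returns 64:ff9b::..., check which nameserver is answering
cat /etc/resolv.conf
echo '192.168.1.61 ovirt-node-00.phoelex.com' >> /etc/hosts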
On Fri, Apr 10, 2020 at 12:55 PM Shareef Jalloq <shareef@jalloq.co.uk> wrote:
Where should I be checking if there are any files/folders not owned by vdsm:kvm? I checked on the mount the HA sits on and it's fine.
How would I go about checking vdsm can access those images? If I run virsh, it lists them and they were running yesterday even though the HA was down. I've since restarted both hosts but the broker is still spitting out the same error (copied below). How do I find the reason the broker can't connect to the storage? The conf file is already at DEBUG verbosity:
[handler_logfile]
class=logging.handlers.TimedRotatingFileHandler
args=('/var/log/ovirt-hosted-engine-ha/broker.log', 'd', 1, 7)
level=DEBUG
formatter=long
And what are all these .prob-<num> files that are being created? There are over 250K of them now on the mount I'm using for the Data domain. They're all of 0 size and of the form /rhev/data-center/mnt/nas-01.phoelex.com:_volume2_vmstore/.prob-ffa867da-93db-4211-82df-b1b04a625ab9
@eevans: The volume I have the Data Domain on has TBs free. The HA is dead so I can't ssh in. No idea what started these errors, and the other VMs were still running happily, although they're on a different Data Domain.
Shareef.
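On the .prob- files above: I can't say definitively what creates them, but since they are zero-length and sit at the top of the mount, a cautious cleanup sketch would be to count them first and delete only empty top-level ones, ideally with the broker and agent stopped so nothing is mid-probe:

cd /rhev/data-center/mnt/nas-01.phoelex.com:_volume2_vmstore
find . -maxdepth 1 -name '.prob-*' -type f -size 0 | wc -l
find . -maxdepth 1 -name '.prob-*' -type f -size 0 -delete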
MainThread::INFO::2020-04-10
07:45:00,408::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect)
Connecting the storage
MainThread::INFO::2020-04-10
07:45:00,408::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
Connecting storage server
MainThread::INFO::2020-04-10
07:45:01,577::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
Connecting storage server
MainThread::INFO::2020-04-10
07:45:02,692::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
Refreshing the storage domain
MainThread::WARNING::2020-04-10
07:45:05,175::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__)
Can't connect vdsm storage: Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed:
(code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
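It may be worth driving the same vdsm call by hand; a sketch with vdsm-client (double-check the exact syntax against vdsm-client --help on your version):

vdsm-client StorageDomain getInfo storagedomainID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2
# and confirm the underlying mount is actually there
ls /rhev/data-center/mnt/nas-01.phoelex.com:_volume2_vmstore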
On Thu, Apr 9, 2020 at 5:58 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
On April 9, 2020 11:12:30 AM GMT+03:00, Shareef Jalloq < shareef@jalloq.co.uk> wrote:
OK, let's go through this. I'm looking at the node that at least still has some VMs running. virsh also tells me that the HostedEngine VM is running but it's unresponsive and I can't shut it down.
1. All storage domains exist and are mounted. 2. The ha_agent exists:
[root@ovirt-node-01 ovirt-hosted-engine-ha]# ls /rhev/data-center/mnt/nas-01.phoelex.com:_volume2_vmstore/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/
dom_md ha_agent images master
3. There are two links
[root@ovirt-node-01 ovirt-hosted-engine-ha]# ll /rhev/data-center/mnt/nas-01.phoelex.com:_volume2_vmstore/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/ha_agent/
total 8
lrwxrwxrwx. 1 vdsm kvm 132 Apr 2 14:50 hosted-engine.lockspace ->
/var/run/vdsm/storage/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/ffb90b82-42fe-4253-85d5-aaec8c280aaf/90e68791-0c6f-406a-89ac-e0d86c631604
lrwxrwxrwx. 1 vdsm kvm 132 Apr 2 14:50 hosted-engine.metadata ->
/var/run/vdsm/storage/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/2161aed0-7250-4c1d-b667-ac94f60af17e/6b818e33-f80a-48cc-a59c-bba641e027d4
4. The services exist but all seem to have some sort of warning:
a) Apr 08 18:10:55 ovirt-node-01.phoelex.com sanlock[1728]: 2020-04-08 18:10:55 1744152 [36796]: s16 delta_renew long write time 10 sec
b) Mar 23 18:02:59 ovirt-node-01.phoelex.com supervdsmd[29409]: failed to load module nvdimm: libbd_nvdimm.so.2: cannot open shared object file: No such file or directory
c) Apr 09 08:05:13 ovirt-node-01.phoelex.com vdsm[4801]: ERROR failed to retrieve Hosted Engine HA score '[Errno 2] No such file or directory' Is the Hosted Engine setup finished?
d) Apr 08 22:48:27 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-08 22:48:27.134+0000: 29309: warning : qemuGetProcessInfo:1404 : cannot parse process status data
Apr 08 22:48:27 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-08 22:48:27.134+0000: 29309: error : virNetDevTapInterfaceStats:764 : internal error: /proc/net/dev: Interface not found
Apr 08 23:09:39 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-08 23:09:39.844+0000: 29307: error : virNetSocketReadWire:1806 : End of file while reading data: Input/output error
Apr 09 01:05:26 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-09 01:05:26.660+0000: 29307: error : virNetSocketReadWire:1806 : End of file while reading data: Input/output error
5 & 6. The broker log is continually printing this error:
MainThread::INFO::2020-04-09
08:07:31,438::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run)
ovirt-hosted-engine-ha broker 2.3.6 started
MainThread::DEBUG::2020-04-09
08:07:31,438::broker::55::ovirt_hosted_engine_ha.broker.broker.Broker::(run)
Running broker
MainThread::DEBUG::2020-04-09
08:07:31,438::broker::120::ovirt_hosted_engine_ha.broker.broker.Broker::(_get_monitor)
Starting monitor
MainThread::INFO::2020-04-09
08:07:31,438::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
Searching for submonitors in /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker
/submonitors
MainThread::INFO::2020-04-09
08:07:31,439::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
Loaded submonitor network
MainThread::INFO::2020-04-09
08:07:31,440::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
Loaded submonitor cpu-load-no-engine
MainThread::INFO::2020-04-09
08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
Loaded submonitor mgmt-bridge
MainThread::INFO::2020-04-09
08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
Loaded submonitor network
MainThread::INFO::2020-04-09
08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
Loaded submonitor cpu-load
MainThread::INFO::2020-04-09
08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
Loaded submonitor engine-health
MainThread::INFO::2020-04-09
08:07:31,442::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
Loaded submonitor mgmt-bridge
MainThread::INFO::2020-04-09
08:07:31,442::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
Loaded submonitor cpu-load-no-engine
MainThread::INFO::2020-04-09
08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
Loaded submonitor cpu-load
MainThread::INFO::2020-04-09
08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
Loaded submonitor mem-free
MainThread::INFO::2020-04-09
08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
Loaded submonitor storage-domain
MainThread::INFO::2020-04-09
08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
Loaded submonitor storage-domain
MainThread::INFO::2020-04-09
08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
Loaded submonitor mem-free
MainThread::INFO::2020-04-09
08:07:31,444::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
Loaded submonitor engine-health
MainThread::INFO::2020-04-09
08:07:31,444::monitor::50::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
Finished loading submonitors
MainThread::DEBUG::2020-04-09
08:07:31,444::broker::128::ovirt_hosted_engine_ha.broker.broker.Broker::(_get_storage_broker)
Starting storage broker
MainThread::DEBUG::2020-04-09
08:07:31,444::storage_backends::369::ovirt_hosted_engine_ha.lib.storage_backends::(connect)
Connecting to VDSM
MainThread::DEBUG::2020-04-09
08:07:31,444::util::384::ovirt_hosted_engine_ha.lib.storage_backends::(__log_debug)
Creating a new json-rpc connection to VDSM
Client localhost:54321::DEBUG::2020-04-09 08:07:31,453::concurrent::258::root::(run) START thread <Thread(Client localhost:54321, started daemon 139992488138496)> (func=<bound method Reactor.process_requests of <yajsonrpc.betterAsyncore.Reactor object at 0x7f528acabc90>>, args=(), kwargs={})
Client localhost:54321::DEBUG::2020-04-09
08:07:31,459::stompclient::138::yajsonrpc.protocols.stomp.AsyncClient::(_process_connected)
Stomp connection established
MainThread::DEBUG::2020-04-09 08:07:31,467::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::INFO::2020-04-09
08:07:31,530::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect)
Connecting the storage
MainThread::INFO::2020-04-09
08:07:31,531::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
Connecting storage server
MainThread::DEBUG::2020-04-09 08:07:31,531::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2020-04-09 08:07:31,534::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2020-04-09
08:07:32,199::storage_server::158::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(_validate_pre_connected_path)
Storage domain a6cea67d-dbfb-45cf-a775-b4d0d47b26f2 is not available
MainThread::INFO::2020-04-09
08:07:32,199::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
Connecting storage server
MainThread::DEBUG::2020-04-09 08:07:32,199::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2020-04-09
08:07:32,814::storage_server::363::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
[{u'status': 0, u'id': u'e29cf818-5ee5-46e1-85c1-8aeefa33e95d'}]
MainThread::INFO::2020-04-09
08:07:32,814::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
Refreshing the storage domain
MainThread::DEBUG::2020-04-09 08:07:32,815::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2020-04-09
08:07:33,129::storage_server::420::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
Error refreshing storage domain: Command StorageDomain.getStats with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed:
(code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
MainThread::DEBUG::2020-04-09 08:07:33,130::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2020-04-09
08:07:33,795::storage_backends::208::ovirt_hosted_engine_ha.lib.storage_backends::(_get_sector_size)
Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed:
(code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
MainThread::WARNING::2020-04-09
08:07:33,795::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__)
Can't connect vdsm storage: Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed:
(code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
The UUID it is moaning about is indeed the one that the HA sits on and is the one I listed the contents of in step 2 above.
So why can't it see this domain?
Thanks, Shareef.
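Since the domain is mounted yet vdsm still reports code=350 on it, testing the export outside vdsm might rule out an NFS-level problem; a sketch, assuming the export path is /volume2/vmstore as the mount-point name suggests:

showmount -e nas-01.phoelex.com
mkdir -p /mnt/vmstore-test
mount -t nfs nas-01.phoelex.com:/volume2/vmstore /mnt/vmstore-test
ls -l /mnt/vmstore-test/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2
umount /mnt/vmstore-test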
On Thu, Apr 9, 2020 at 6:12 AM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
On April 9, 2020 1:51:05 AM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:
Don't know if this is useful or not, but I just tried to shut down and start another VM on one of the hosts and get the following error:
virsh # start scratch
error: Failed to start domain scratch
error: Network not found: no network with matching name 'vdsm-ovirtmgmt'
Is this not referring to the interface name, as the network is called 'ovirtmgmt'?
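The 'Network not found' error refers to the libvirt network definition rather than the bridge device; a sketch of checking and, if it's missing, restoring it from the vdsm-ovirtmgmt.xml shown earlier in the thread:

virsh net-list --all
virsh net-define vdsm-ovirtmgmt.xml
virsh net-start vdsm-ovirtmgmt
virsh start scratch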
On Wed, Apr 8, 2020 at 11:35 PM Shareef Jalloq <shareef@jalloq.co.uk> wrote:
> Hmmm, virsh tells me the HE is running but it hasn't come up and the
> agent.log is full of the same errors.
>
> On Wed, Apr 8, 2020 at 11:31 PM Shareef Jalloq <shareef@jalloq.co.uk> wrote:
>> Ah hah! Ok, so I've managed to start it using virsh on the second host
>> but my first host is still dead.
>>
>> First of all, what are these 56,317 .prob- files that get dumped to the
>> NFS mounts?
>>
>> Secondly, why doesn't the node mount the NFS directories at boot? Is
>> that the issue with this particular node?
>>
>> On Wed, Apr 8, 2020 at 11:12 PM <eevans@digitaldatatechs.com> wrote:
>>> Did you try virsh list --inactive
>>>
>>> Eric Evans
>>> Digital Data Services LLC.
>>> 304.660.9080
>>>
>>> From: Shareef Jalloq <shareef@jalloq.co.uk>
>>> Sent: Wednesday, April 8, 2020 5:58 PM
>>> To: Strahil Nikolov <hunter86_bg@yahoo.com>
>>> Cc: Ovirt Users <users@ovirt.org>
>>> Subject: [ovirt-users] Re: ovirt-engine unresponsive - how to rescue?
>>>
>>> I've now shut down the VMs on one host and rebooted it but the agent
>>> service doesn't start. If I run 'hosted-engine --vm-status' I get:
>>>
>>> The hosted engine configuration has not been retrieved from shared
>>> storage. Please ensure that ovirt-ha-agent is running and the storage
>>> server is reachable.
>>>
>>> and indeed if I list the mounts under /rhev/data-center/mnt, only one of
>>> the directories is mounted. I have 3 NFS mounts, one ISO Domain and two
>>> Data Domains. Only one Data Domain has mounted and this has lots of .prob
>>> files in. So why haven't the other NFS exports been mounted?
>>>
>>> Manually mounting them doesn't seem to have helped much either. I can
>>> start the broker service but the agent service says no. Same error as the
>>> one in my last email.
>>>
>>> Shareef.
>>>
>>> On Wed, Apr 8, 2020 at 9:57 PM Shareef Jalloq <shareef@jalloq.co.uk> wrote:
>>> Right, still down. I've run virsh and it doesn't know anything about
>>> the engine vm.
>>>
>>> I've restarted the broker and agent services and I still get nothing in
>>> virsh->list.
>>>
>>> In the logs under /var/log/ovirt-hosted-engine-ha I see lots of errors:
>>>
>>> broker.log:
>>>
>>> MainThread::INFO::2020-04-08
20:56:20,138::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run)
>>> ovirt-hosted-engine-ha broker 2.3.6 started >>> >>> MainThread::INFO::2020-04-08 >>>
20:56:20,138::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>>> Searching for submonitors in >>>
/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors
>>> >>> MainThread::INFO::2020-04-08 >>>
20:56:20,138::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>>> Loaded submonitor network >>> >>> MainThread::INFO::2020-04-08 >>>
20:56:20,140::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>>> Loaded submonitor cpu-load-no-engine >>> >>> MainThread::INFO::2020-04-08 >>>
20:56:20,140::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>>> Loaded submonitor mgmt-bridge >>> >>> MainThread::INFO::2020-04-08 >>>
20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>>> Loaded submonitor network >>> >>> MainThread::INFO::2020-04-08 >>>
20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>>> Loaded submonitor cpu-load >>> >>> MainThread::INFO::2020-04-08 >>>
20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>>> Loaded submonitor engine-health >>> >>> MainThread::INFO::2020-04-08 >>>
20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>>> Loaded submonitor mgmt-bridge >>> >>> MainThread::INFO::2020-04-08 >>>
20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>>> Loaded submonitor cpu-load-no-engine >>> >>> MainThread::INFO::2020-04-08 >>>
20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>>> Loaded submonitor cpu-load >>> >>> MainThread::INFO::2020-04-08 >>>
20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>>> Loaded submonitor mem-free >>> >>> MainThread::INFO::2020-04-08 >>>
20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>>> Loaded submonitor storage-domain >>> >>> MainThread::INFO::2020-04-08 >>>
20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>>> Loaded submonitor storage-domain >>> >>> MainThread::INFO::2020-04-08 >>>
20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>>> Loaded submonitor mem-free >>> >>> MainThread::INFO::2020-04-08 >>>
20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>>> Loaded submonitor engine-health >>> >>> MainThread::INFO::2020-04-08 >>>
20:56:20,143::monitor::50::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>>> Finished loading submonitors >>> >>> MainThread::INFO::2020-04-08 >>>
20:56:20,197::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect)
>>> Connecting the storage >>> >>> MainThread::INFO::2020-04-08 >>>
20:56:20,197::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
>>> Connecting storage server >>> >>> MainThread::INFO::2020-04-08 >>>
20:56:20,414::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
>>> Connecting storage server >>> >>> MainThread::INFO::2020-04-08 >>>
20:56:20,628::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
>>> Refreshing the storage domain >>> >>> MainThread::WARNING::2020-04-08 >>>
20:56:21,057::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__)
>>> Can't connect vdsm storage: Command StorageDomain.getInfo with args >>> {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed: >>> >>> (code=350, message=Error in storage domain action: >>> (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',)) >>> >>> MainThread::INFO::2020-04-08 >>>
20:56:21,901::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run)
>>> ovirt-hosted-engine-ha broker 2.3.6 started >>> >>> MainThread::INFO::2020-04-08 >>>
20:56:21,901::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>>> Searching for submonitors in >>>
/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors
>>> >>> >>> >>> agent.log: >>> >>> >>> >>> MainThread::ERROR::2020-04-08 >>>
20:57:00,799::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
>>> Trying to restart agent >>> >>> MainThread::INFO::2020-04-08 >>>
20:57:00,799::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run)
>>> Agent shutting down >>> >>> MainThread::INFO::2020-04-08 >>>
20:57:11,144::agent::67::ovirt_hosted_engine_ha.agent.agent.Agent::(run)
>>> ovirt-hosted-engine-ha agent 2.3.6 started >>> >>> MainThread::INFO::2020-04-08 >>>
20:57:11,182::hosted_engine::234::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname)
>>> Found certificate common name: ovirt-node-01.phoelex.com >>> >>> MainThread::INFO::2020-04-08 >>>
20:57:11,294::hosted_engine::543::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker)
>>> Initializing ha-broker connection >>> >>> MainThread::INFO::2020-04-08 >>>
20:57:11,296::brokerlink::80::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
>>> Starting monitor network, options {'tcp_t_address': '', 'network_test': >>> 'dns', 'tcp_t_port': '', 'addr': '192.168.1.99'} >>> >>> MainThread::ERROR::2020-04-08 >>>
20:57:11,296::hosted_engine::559::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker)
>>> Failed to start necessary monitors >>> >>> MainThread::ERROR::2020-04-08 >>>
20:57:11,297::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
>>> Traceback (most recent call last): >>> >>> File >>>
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py",
>>> line 131, in _run_agent >>> >>> return action(he) >>> >>> File >>>
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py",
>>> line 55, in action_proper >>> >>> return he.start_monitoring() >>> >>> File >>>
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
>>> line 432, in start_monitoring >>> >>> self._initialize_broker() >>> >>> File >>>
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
>>> line 556, in _initialize_broker >>> >>> m.get('options', {})) >>> >>> File >>>
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py",
>>> line 89, in start_monitor >>> >>> ).format(t=type, o=options, e=e) >>> >>> RequestError: brokerlink - failed to start monitor via ovirt-ha-broker: >>> [Errno 2] No such file or directory, [monitor: 'network', options: >>> {'tcp_t_address': '', 'network_test': 'dns', 'tcp_t_port': '', 'addr': >>> '192.168.1.99'}] >>> >>> >>> >>> MainThread::ERROR::2020-04-08 >>>
20:57:11,297::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
>>> Trying to restart agent >>> >>> MainThread::INFO::2020-04-08 >>>
20:57:11,297::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run)
>>> Agent shutting down

On Wed, Apr 8, 2020 at 6:10 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
This has to be resolved:
Engine status : unknown stale-data
Run again 'hosted-engine --vm-status'. If it remains the same, restart ovirt-ha-broker.service & ovirt-ha-agent.service.
Verify that the engine's storage is available. Then monitor the broker & agent logs in /var/log/ovirt-hosted-engine-ha.
Best Regards,
Strahil Nikolov
Hi Shareef,
The activation flow in oVirt is more complex than on plain KVM. Mounting of the domains happens during activation of the node (the HostedEngine is activating everything needed).
Focus on the HostedEngine VM. Is it running properly?
If not, try:
1. Verify that the storage domain exists
2. Check if it has an 'ha_agent' directory
3. Check if the links are OK; if not, you can safely remove the links
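A sketch of that link check and reset, on the reading of the advice above that the agent recreates the links at startup (paths per the listing earlier in the thread):

cd /rhev/data-center/mnt/nas-01.phoelex.com:_volume2_vmstore/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/ha_agent
ls -l
rm -f hosted-engine.lockspace hosted-engine.metadata
systemctl restart ovirt-ha-agent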
On Tue, Apr 14, 2020 at 3:56 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
And the IPv6 address '64:ff9b::c0a8:13d'? I don't see it in the log output.
Best Regards, Strahil Nikolov

Hmmm, we're not using IPv6. Is that the issue?
>>>> Agent shutting down >>>> >>>> >>>> >>>> On Wed, Apr 8, 2020 at 6:10 PM Strahil Nikolov ><hunter86_bg@yahoo.com> >>>> wrote: >>>> >>>> On April 8, 2020 7:47:20 PM GMT+03:00, "Maton, Brett" < >>>> matonb@ltresources.co.uk> wrote: >>>> >On the host you tried to restart the engine on: >>>> > >>>> >Add an alias to virsh (authenticates with virsh_auth.conf) >>>> > >>>> >alias virsh='virsh -c >>>> qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf' >>>> > >>>> >Then run virsh: >>>> > >>>> >virsh >>>> > >>>> >virsh # list >>>> > Id Name State >>>> >---------------------------------------------------- >>>> > xx HostedEngine Paused >>>> > xx ********** running >>>> > ... >>>> > xx ********** running >>>> > >>>> >HostedEngine should be in the list, try and resume the engine: >>>> > >>>> >virsh # resume HostedEngine >>>> > >>>> >On Wed, 8 Apr 2020 at 17:28, Shareef Jalloq <shareef@jalloq.co.uk> >>>> >wrote: >>>> > >>>> >> Thanks! >>>> >> >>>> >> The status hangs due to, I guess, the VM being down.... >>>> >> >>>> >> [root@ovirt-node-01 ~]# hosted-engine --vm-start >>>> >> VM exists and is down, cleaning up and restarting >>>> >> VM in WaitForLaunch >>>> >> >>>> >> but this doesn't seem to do anything. OK, after a while I get a >>>> >status of >>>> >> it being barfed... >>>> >> >>>> >> --== Host ovirt-node-00.phoelex.com (id: 1) status ==-- >>>> >> >>>> >> conf_on_shared_storage : True >>>> >> Status up-to-date : False >>>> >> Hostname : ovirt-node-00.phoelex.com >>>> >> Host ID : 1 >>>> >> Engine status : unknown stale-data >>>> >> Score : 3400 >>>> >> stopped : False >>>> >> Local maintenance : False >>>> >> crc32 : 9c4a034b >>>> >> local_conf_timestamp : 523362 >>>> >> Host timestamp : 523608 >>>> >> Extra metadata (valid at timestamp): >>>> >> metadata_parse_version=1 >>>> >> metadata_feature_version=1 >>>> >> timestamp=523608 (Wed Apr 8 16:17:11 2020) >>>> >> host-id=1 >>>> >> score=3400 >>>> >> vm_conf_refresh_time=523362 (Wed Apr 8 16:13:06 2020) >>>> >> conf_on_shared_storage=True >>>> >> maintenance=False >>>> >> state=EngineDown >>>> >> stopped=False >>>> >> >>>> >> >>>> >> --== Host ovirt-node-01.phoelex.com (id: 2) status ==-- >>>> >> >>>> >> conf_on_shared_storage : True >>>> >> Status up-to-date : True >>>> >> Hostname : ovirt-node-01.phoelex.com >>>> >> Host ID : 2 >>>> >> Engine status : {"reason": "bad vm status", >>>> >"health": >>>> >> "bad", "vm": "down_unexpected", "detail": "Down"} >>>> >> Score : 0 >>>> >> stopped : False >>>> >> Local maintenance : False >>>> >> crc32 : 5045f2eb >>>> >> local_conf_timestamp : 1737037 >>>> >> Host timestamp : 1737283 >>>> >> Extra metadata (valid at timestamp): >>>> >> metadata_parse_version=1 >>>> >> metadata_feature_version=1 >>>> >> timestamp=1737283 (Wed Apr 8 16:16:17 2020) >>>> >> host-id=2 >>>> >> score=0 >>>> >> vm_conf_refresh_time=1737037 (Wed Apr 8 16:12:11 2020) >>>> >> conf_on_shared_storage=True >>>> >> maintenance=False >>>> >> state=EngineUnexpectedlyDown >>>> >> stopped=False >>>> >> >>>> >> On Wed, Apr 8, 2020 at 5:09 PM Maton, Brett >>>> ><matonb@ltresources.co.uk> >>>> >> wrote: >>>> >> >>>> >>> First steps, on one of your hosts as root: >>>> >>> >>>> >>> To get information: >>>> >>> hosted-engine --vm-status >>>> >>> >>>> >>> To start the engine: >>>> >>> hosted-engine --vm-start >>>> >>> >>>> >>> >>>> >>> On Wed, 8 Apr 2020 at 17:00, Shareef Jalloq ><shareef@jalloq.co.uk> >>>> >wrote: >>>> >>> >>>> >>>> So my engine has gone down and I can't ssh into it either. 
If >I >>>> >try to >>>> >>>> log into the web-ui of the node it is running on, I get >redirected >>>> >because >>>> >>>> the node can't reach the engine. >>>> >>>> >>>> >>>> What are my next steps? >>>> >>>> >>>> >>>> Shareef. >>>> >>>> _______________________________________________ >>>> >>>> Users mailing list -- users@ovirt.org >>>> >>>> To unsubscribe send an email to users-leave@ovirt.org >>>> >>>> Privacy Statement: https://www.ovirt.org/privacy-policy.html >>>> >>>> oVirt Code of Conduct: >>>> >>>> https://www.ovirt.org/community/about/community-guidelines/ >>>> >>>> List Archives: >>>> >>>> >>>> > >>>> >
https://lists.ovirt.org/archives/list/users@ovirt.org/message/W7BP57OCIRSW5C...
>>>> >>>> >>>> >>> >>>> >>>> This has to be resolved: >>>> >>>> Engine status : unknown stale-data >>>> >>>> Run again 'hosted-engine --vm-status'. If it remains the same, >restart >>>> ovirt-ha-broker.service & ovirt-ha-agent.service >>>> >>>> Verify that the engine's storage is available. Then monitor the >broker >>>> & agent logs in /var/log/ovirt-hosted-engine-ha >>>> >>>> Best Regards, >>>> Strahil Nikolov >>>> >>>> >>>> >>>>
Hi Shareef,
The activation flow of oVirt is more complex than plain KVM: the storage domains are mounted during node activation (the HostedEngine activates everything it needs).
Focus on the HostedEngine VM. Is it running properly?
If not, try the following (a command sketch covering steps 1-4 follows this list):
1. Verify that the storage domain exists.
2. Check that it has an 'ha_agent' directory.
3. Check that the links are OK; if not, you can safely remove the links.
4. Next, check that the services are running: A) sanlock B) supervdsmd C) vdsmd D) libvirtd
5. Increase the log level for the broker and agent services:
cd /etc/ovirt-hosted-engine-ha
vim *-log.conf
systemctl restart ovirt-ha-broker ovirt-ha-agent
6. Check what they are complaining about. Keep in mind that the agent will keep throwing errors until the broker stops doing so (the agent depends on the broker), so the broker must be OK before proceeding with the agent log.
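A minimal shell sketch of steps 1-4, assuming an NFS hosted-engine storage domain; the mount path and storage-domain UUID are placeholders borrowed from later in this thread, so substitute your own:

SD=/rhev/data-center/mnt/nas-01.phoelex.com:_volume2_vmstore/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2

# steps 1 & 2: the domain is mounted and has an ha_agent directory
ls "$SD/ha_agent"

# step 3: both symlinks resolve (readlink -e fails on a broken link)
for l in hosted-engine.lockspace hosted-engine.metadata; do
    readlink -e "$SD/ha_agent/$l" || echo "$l is broken"
done

# step 4: the required services are active
for s in sanlock supervdsmd vdsmd libvirtd; do
    printf '%-12s ' "$s"; systemctl is-active "$s"
done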
About the manual VM start, you need 2 things:
1. Define the VM network:
# cat vdsm-ovirtmgmt.xml
<network>
  <name>vdsm-ovirtmgmt</name>
  <uuid>8ded486e-e681-4754-af4b-5737c2b05405</uuid>
  <forward mode='bridge'/>
  <bridge name='ovirtmgmt'/>
</network>
[root@ovirt1 HostedEngine-RECOVERY]# virsh define vdsm-ovirtmgmt.xml
2. Get an XML definition, which can be found in the vdsm log. Every VM has its configuration printed to the vdsm log on the host where it starts. Save it to a file and then:
A) virsh define myvm.xml
B) virsh start myvm
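As a hedged sketch of step 2: the log path /var/log/vdsm/vdsm.log and the awk range are assumptions, vdsm log formats vary between versions, and this grabs the first <domain> block it finds, so inspect the extracted file before defining it:

# pull the first libvirt domain XML out of the vdsm log
awk '/<domain /{grab=1} grab{print} /<\/domain>/{if (grab) exit}' /var/log/vdsm/vdsm.log > myvm.xml
virsh define myvm.xml
virsh start HostedEngine   # or whatever <name> the extracted XML contains

The virsh calls assume the authenticated alias shown earlier in this thread.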
It seems there is/was a problem with your NFS shares.
Best Regards, Strahil Nikolov
Hey Shareef,
Check if there are any files or folders not owned by vdsm:kvm. Something like this:
find . -not -user 36 -not -group 36 -print
Also check if vdsm can access the images in the '<vol-mount-point>/images' directories.
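A sketch of that access check; the path components in angle brackets are placeholders, and vdsm is uid 36, matching the find above:

MNT=/rhev/data-center/mnt/<server>:_<export>/<sd-uuid>

# list the images as the vdsm user, then try reading one volume
sudo -u vdsm ls -lR "$MNT/images"
sudo -u vdsm dd if="$MNT/images/<image-uuid>/<volume-uuid>" of=/dev/null bs=1M count=1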
Best Regards, Strahil Nikolov
And what about the IPv6 address '64:ff9b::c0a8:13d'?
I don't see it in the log output.
Best Regards, Strahil Nikolov

On April 14, 2020 6:17:17 PM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:
Hmmm, we're not using IPv6. Is that the issue?
On Tue, Apr 14, 2020 at 3:56 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
Right, I've given up on recovering the HE, so I want to try and redeploy it. There doesn't seem to be enough information to debug why the broker/agent won't start cleanly.
While running 'hosted-engine --deploy', I'm seeing the following error in the setup validation phase:
2020-04-14 09:46:08,922+0000 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND Please provide
hostname of this host on the management network [ovirt-node-00.phoelex.com]:
2020-04-14 09:46:12,831+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname.getResolvedAddresses:432 getResolvedAddresses: set(['64:ff9b::c0a8:13d', '192.168.1.61'])
2020-04-14 09:46:12,832+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname._validateFQDNresolvability:289 ovirt-node-00.phoelex.com resolves to: set(['64:ff9b::c0a8:13d', '192.168.1.61'])
2020-04-14 09:46:12,832+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:813 execute: ['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com', 'ANY'], executable='None', cwd='None', env=None
2020-04-14 09:46:12,871+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:863 execute-result: ['/usr/bin/dig', '+noall', '+answer', ' ovirt-node-00.phoelex.com', 'ANY'], rc=0
2020-04-14 09:46:12,872+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:921 execute-output: ['/usr/bin/dig', '+noall', '+answer', ' ovirt-node-00.phoelex.com', 'ANY'] stdout:
ovirt-node-00.phoelex.com. 86400 IN A 192.168.1.61
2020-04-14 09:46:12,872+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:926 execute-output: ['/usr/bin/dig', '+noall', '+answer', ' ovirt-node-00.phoelex.com', 'ANY'] stderr:
2020-04-14 09:46:12,872+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:813 execute: ('/usr/sbin/ip', 'addr'), executable='None', cwd='None', env=None
2020-04-14 09:46:12,876+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:863 execute-result: ('/usr/sbin/ip', 'addr'), rc=0
2020-04-14 09:46:12,876+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:921 execute-output: ('/usr/sbin/ip', 'addr') stdout:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovirtmgmt state UP group default qlen 1000
link/ether ac:1f:6b:bc:32:6a brd ff:ff:ff:ff:ff:ff
3: eno2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
link/ether ac:1f:6b:bc:32:6b brd ff:ff:ff:ff:ff:ff
4: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 02:e6:e2:80:93:8d brd ff:ff:ff:ff:ff:ff
5: br-int: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 8a:26:44:50:ee:4a brd ff:ff:ff:ff:ff:ff
21: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether ac:1f:6b:bc:32:6a brd ff:ff:ff:ff:ff:ff
inet 192.168.1.61/24 brd 192.168.1.255 scope global ovirtmgmt
valid_lft forever preferred_lft forever
inet6 fe80::ae1f:6bff:febc:326a/64 scope link
valid_lft forever preferred_lft forever
22: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 3a:02:7b:7d:b3:2a brd ff:ff:ff:ff:ff:ff
2020-04-14 09:46:12,876+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:926 execute-output: ('/usr/sbin/ip', 'addr') stderr:
2020-04-14 09:46:12,877+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname.getLocalAddresses:251 addresses: [u'192.168.1.61', u'fe80::ae1f:6bff:febc:326a']
2020-04-14 09:46:12,877+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname.test_hostname:464 test_hostname exception
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/ovirt_setup_lib/hostname.py", line 460, in test_hostname
not_local_text,
File "/usr/lib/python2.7/site-packages/ovirt_setup_lib/hostname.py", line 342, in _validateFQDNresolvability
addresses=resolvedAddressesAsString
RuntimeError: ovirt-node-00.phoelex.com resolves to 64:ff9b::c0a8:13d 192.168.1.61 and not all of them can be mapped to non loopback devices on this host
2020-04-14 09:46:12,884+0000 ERROR otopi.plugins.gr_he_common.network.bridge dialog.queryEnvKey:120 Host name is not valid: ovirt-node-00.phoelex.com resolves to 64:ff9b::c0a8:13d 192.168.1.61 and not all of them can be mapped to non loopback devices on this host
The node I'm running on has an IP address of .61 and resolves correctly.
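Three quick checks of what the installer saw, as a sketch (dig is the same tool the setup log invokes):

dig +short A ovirt-node-00.phoelex.com
dig +short AAAA ovirt-node-00.phoelex.com   # where 64:ff9b::c0a8:13d comes back from
getent ahosts ovirt-node-00.phoelex.com     # what the local resolver library returns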
On Fri, Apr 10, 2020 at 12:55 PM Shareef Jalloq <shareef@jalloq.co.uk> wrote:
Where should I be checking if there are any files/folders not owned by vdsm:kvm? I checked on the mount the HA sits on and it's fine.
How would I go about checking vdsm can access those images? If I run virsh, it lists them and they were running yesterday even though the HA was down. I've since restarted both hosts but the broker is still spitting out the same error (copied below). How do I find the reason the broker can't connect to the storage? The conf file is already at DEBUG verbosity:
[handler_logfile]
class=logging.handlers.TimedRotatingFileHandler
args=('/var/log/ovirt-hosted-engine-ha/broker.log', 'd', 1, 7)
level=DEBUG
formatter=long
And what are all these .prob-<num> files that are being created? There are over 250K of them now on the mount I'm using for the Data Domain. They're all zero-size and of the form /rhev/data-center/mnt/nas-01.phoelex.com:_volume2_vmstore/.prob-ffa867da-93db-4211-82df-b1b04a625ab9
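A sketch for sizing up that backlog before touching anything; the .prob-* naming suggests leftover broker storage write-probes, but that is an inference from this thread, not a confirmed mechanism:

find /rhev/data-center/mnt/nas-01.phoelex.com:_volume2_vmstore -maxdepth 1 -name '.prob-*' -size 0 | wc -l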
@eevans: The volume I have the Data Domain on has TBs free. The HA is dead so I can't ssh in. No idea what started these errors; the other VMs were still running happily, although they're on a different Data Domain.
Shareef.
MainThread::INFO::2020-04-10
07:45:00,408::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect)
Connecting the storage
MainThread::INFO::2020-04-10
07:45:00,408::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
Connecting storage server
MainThread::INFO::2020-04-10
07:45:01,577::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
Connecting storage server
MainThread::INFO::2020-04-10
07:45:02,692::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
Refreshing the storage domain
MainThread::WARNING::2020-04-10
07:45:05,175::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__)
Can't connect vdsm storage: Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed:
(code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
On Thu, Apr 9, 2020 at 5:58 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
On April 9, 2020 11:12:30 AM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:
OK, let's go through this. I'm looking at the node that at least still has some VMs running. virsh also tells me that the HostedEngine VM is running but it's unresponsive and I can't shut it down.
1. All storage domains exist and are mounted.
2. The ha_agent directory exists:

[root@ovirt-node-01 ovirt-hosted-engine-ha]# ls /rhev/data-center/mnt/nas-01.phoelex.com\:_volume2_vmstore/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/
dom_md  ha_agent  images  master
3. There are two links:

[root@ovirt-node-01 ovirt-hosted-engine-ha]# ll /rhev/data-center/mnt/nas-01.phoelex.com\:_volume2_vmstore/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/ha_agent/
total 8
lrwxrwxrwx. 1 vdsm kvm 132 Apr  2 14:50 hosted-engine.lockspace -> /var/run/vdsm/storage/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/ffb90b82-42fe-4253-85d5-aaec8c280aaf/90e68791-0c6f-406a-89ac-e0d86c631604
lrwxrwxrwx. 1 vdsm kvm 132 Apr  2 14:50 hosted-engine.metadata -> /var/run/vdsm/storage/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/2161aed0-7250-4c1d-b667-ac94f60af17e/6b818e33-f80a-48cc-a59c-bba641e027d4
4. The services exist but all seem to have some sort of warning:

a) Apr 08 18:10:55 ovirt-node-01.phoelex.com sanlock[1728]: 2020-04-08 18:10:55 1744152 [36796]: s16 delta_renew long write time 10 sec

b) Mar 23 18:02:59 ovirt-node-01.phoelex.com supervdsmd[29409]: failed to load module nvdimm: libbd_nvdimm.so.2: cannot open shared object file: No such file or directory

c) Apr 09 08:05:13 ovirt-node-01.phoelex.com vdsm[4801]: ERROR failed to retrieve Hosted Engine HA score '[Errno 2] No such file or directory' Is the Hosted Engine setup finished?

d) Apr 08 22:48:27 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-08 22:48:27.134+0000: 29309: warning : qemuGetProcessInfo:1404 : cannot parse process status data
Apr 08 22:48:27 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-08 22:48:27.134+0000: 29309: error : virNetDevTapInterfaceStats:764 : internal error: /proc/net/dev: Interface not found
Apr 08 23:09:39 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-08 23:09:39.844+0000: 29307: error : virNetSocketReadWire:1806 : End of file while reading data: Input/output error
Apr 09 01:05:26 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-09 01:05:26.660+0000: 29307: error : virNetSocketReadWire:1806 : End of file while reading data: Input/output error
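A one-pass way to gather this kind of per-service output, as a sketch (unit names follow Strahil's step 4, plus the two HA services):

for s in sanlock supervdsmd vdsmd libvirtd ovirt-ha-broker ovirt-ha-agent; do
    echo "== $s =="
    systemctl status --no-pager "$s" | head -n 5
    journalctl -u "$s" -p warning --since "2 days ago" --no-pager | tail -n 5
done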
5 & 6. The broker log is continually printing this error:
MainThread::INFO::2020-04-09
08:07:31,438::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run)
ovirt-hosted-engine-ha broker 2.3.6 started
MainThread::DEBUG::2020-04-09
08:07:31,438::broker::55::ovirt_hosted_engine_ha.broker.broker.Broker::(run)
Running broker
MainThread::DEBUG::2020-04-09
08:07:31,438::broker::120::ovirt_hosted_engine_ha.broker.broker.Broker::(_get_monitor)
Starting monitor
MainThread::INFO::2020-04-09
08:07:31,438::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
Searching for submonitors in /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors
MainThread::INFO::2020-04-09
08:07:31,439::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
Loaded submonitor network
MainThread::INFO::2020-04-09
08:07:31,440::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
Loaded submonitor cpu-load-no-engine
MainThread::INFO::2020-04-09
08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
Loaded submonitor mgmt-bridge
MainThread::INFO::2020-04-09
08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
Loaded submonitor network
MainThread::INFO::2020-04-09
08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
Loaded submonitor cpu-load
MainThread::INFO::2020-04-09
08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
Loaded submonitor engine-health
MainThread::INFO::2020-04-09
08:07:31,442::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
Loaded submonitor mgmt-bridge
MainThread::INFO::2020-04-09
08:07:31,442::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
Loaded submonitor cpu-load-no-engine
MainThread::INFO::2020-04-09
08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
Loaded submonitor cpu-load
MainThread::INFO::2020-04-09
08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
Loaded submonitor mem-free
MainThread::INFO::2020-04-09
08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
Loaded submonitor storage-domain
MainThread::INFO::2020-04-09
08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
Loaded submonitor storage-domain
MainThread::INFO::2020-04-09
08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
Loaded submonitor mem-free
MainThread::INFO::2020-04-09
08:07:31,444::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
Loaded submonitor engine-health
MainThread::INFO::2020-04-09
08:07:31,444::monitor::50::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
Finished loading submonitors
MainThread::DEBUG::2020-04-09
08:07:31,444::broker::128::ovirt_hosted_engine_ha.broker.broker.Broker::(_get_storage_broker)
Starting storage broker
MainThread::DEBUG::2020-04-09
08:07:31,444::storage_backends::369::ovirt_hosted_engine_ha.lib.storage_backends::(connect)
Connecting to VDSM
MainThread::DEBUG::2020-04-09
08:07:31,444::util::384::ovirt_hosted_engine_ha.lib.storage_backends::(__log_debug)
Creating a new json-rpc connection to VDSM
Client localhost:54321::DEBUG::2020-04-09 08:07:31,453::concurrent::258::root::(run) START thread <Thread(Client localhost:54321, started daemon 139992488138496)> (func=<bound method Reactor.process_requests of <yajsonrpc.betterAsyncore.Reactor object at 0x7f528acabc90>>, args=(), kwargs={})
Client localhost:54321::DEBUG::2020-04-09
08:07:31,459::stompclient::138::yajsonrpc.protocols.stomp.AsyncClient::(_process_connected)
Stomp connection established
MainThread::DEBUG::2020-04-09 08:07:31,467::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::INFO::2020-04-09
08:07:31,530::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect)
Connecting the storage
MainThread::INFO::2020-04-09
08:07:31,531::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
Connecting storage server
MainThread::DEBUG::2020-04-09 08:07:31,531::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2020-04-09 08:07:31,534::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2020-04-09
08:07:32,199::storage_server::158::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(_validate_pre_connected_path)
Storage domain a6cea67d-dbfb-45cf-a775-b4d0d47b26f2 is not available
MainThread::INFO::2020-04-09
08:07:32,199::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
Connecting storage server
MainThread::DEBUG::2020-04-09 08:07:32,199::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2020-04-09
08:07:32,814::storage_server::363::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
[{u'status': 0, u'id': u'e29cf818-5ee5-46e1-85c1-8aeefa33e95d'}]
MainThread::INFO::2020-04-09
08:07:32,814::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
Refreshing the storage domain
MainThread::DEBUG::2020-04-09 08:07:32,815::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2020-04-09
08:07:33,129::storage_server::420::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
Error refreshing storage domain: Command StorageDomain.getStats with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed:
(code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
MainThread::DEBUG::2020-04-09 08:07:33,130::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2020-04-09
08:07:33,795::storage_backends::208::ovirt_hosted_engine_ha.lib.storage_backends::(_get_sector_size)
Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed:
(code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
MainThread::WARNING::2020-04-09
08:07:33,795::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__)
Can't connect vdsm storage: Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed:
(code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
The UUID it is moaning about is indeed the one that the HA sits on, and it is the one whose contents I listed in step 2 above.
So why can't it see this domain?
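One way to put that question to vdsm directly, as a sketch (vdsm-client ships with recent vdsm releases; older hosts have vdsClient instead):

vdsm-client StorageDomain getInfo storagedomainID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2
# vdsm's side of the code=350 error should show up here:
grep -i 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2' /var/log/vdsm/vdsm.log | tail -n 20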
Thanks, Shareef.
Based on your output, you got a PTR record for IPv4 & IPv6... most probably that's the reason. Set the IPv6 on the interface and try again.
Best Regards, Strahil Nikolov
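A hedged aside on that address, not from the thread itself: 64:ff9b::/96 is the well-known NAT64/DNS64 prefix, and its last 32 bits here are 0xc0a8013d = 192.168.1.61, so the AAAA record looks like a DNS64-synthesized copy of the host's own IPv4 address. Two ways to check:

dig +short AAAA ovirt-node-00.phoelex.com   # shows which resolver hands out the synthesized record
python2 -c "print '.'.join(str((0xc0a8013d >> s) & 0xff) for s in (24, 16, 8, 0))"   # -> 192.168.1.61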

OK, that seems to have fixed it, thanks. Is this a side effect of redeploying the HE over a first-time install? Nothing has changed in our setup and I didn't need to do this when I initially set up our nodes.
>second >> >host >> >>> but my first host is still dead. >> >>> >> >>> First of all, what are these 56,317 .prob- files that get dumped >to >> >the >> >>> NFS mounts? >> >>> >> >>> Secondly, why doesn't the node mount the NFS directories at boot? >> >Is >> >>> that the issue with this particular node? >> >>> >> >>> On Wed, Apr 8, 2020 at 11:12 PM <eevans@digitaldatatechs.com> >wrote: >> >>> >> >>>> Did you try virsh list --inactive >> >>>> >> >>>> >> >>>> >> >>>> Eric Evans >> >>>> >> >>>> Digital Data Services LLC. >> >>>> >> >>>> 304.660.9080 >> >>>> >> >>>> >> >>>> >> >>>> *From:* Shareef Jalloq <shareef@jalloq.co.uk> >> >>>> *Sent:* Wednesday, April 8, 2020 5:58 PM >> >>>> *To:* Strahil Nikolov <hunter86_bg@yahoo.com> >> >>>> *Cc:* Ovirt Users <users@ovirt.org> >> >>>> *Subject:* [ovirt-users] Re: ovirt-engine unresponsive - how to >> >rescue? >> >>>> >> >>>> >> >>>> >> >>>> I've now shut down the VMs on one host and rebooted it but
>> >agent >> >>>> service doesn't start. If I run 'hosted-engine --vm-status' I >get: >> >>>> >> >>>> >> >>>> >> >>>> The hosted engine configuration has not been retrieved from >shared >> >>>> storage. Please ensure that ovirt-ha-agent is running and
08:07:33,795::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) the the the
>> >storage >> >>>> server is reachable. >> >>>> >> >>>> >> >>>> >> >>>> and indeed if I list the mounts under /rhev/data-center/mnt, >only >> >one of >> >>>> the directories is mounted. I have 3 NFS mounts, one ISO Domain >> >and two >> >>>> Data Domains. Only one Data Domain has mounted and this has >lots >> >of .prob >> >>>> files in. So why haven't the other NFS exports been mounted? >> >>>> >> >>>> >> >>>> >> >>>> Manually mounting them doesn't seem to have helped much either. >I >> >can >> >>>> start the broker service but the agent service says no. Same >error >> >as the >> >>>> one in my last email. >> >>>> >> >>>> >> >>>> >> >>>> Shareef. >> >>>> >> >>>> >> >>>> >> >>>> On Wed, Apr 8, 2020 at 9:57 PM Shareef Jalloq >> ><shareef@jalloq.co.uk> >> >>>> wrote: >> >>>> >> >>>> Right, still down. I've run virsh and it doesn't know anything >> >about >> >>>> the engine vm. >> >>>> >> >>>> >> >>>> >> >>>> I've restarted the broker and agent services and I still get >> >nothing in >> >>>> virsh->list. >> >>>> >> >>>> >> >>>> >> >>>> In the logs under /var/log/ovirt-hosted-engine-ha I see lots of >> >errors: >> >>>> >> >>>> >> >>>> >> >>>> broker.log: >> >>>> >> >>>> >> >>>> >> >>>> MainThread::INFO::2020-04-08 >> >>>> >> >>
Hey Shareef,
Check if there are any files or folders not owned by vdsm:kvm. Something like this:
find . -not -user 36 -not -group 36 -print
Also check if vdsm can access the images in the '<vol-mount-point>/images' directories.
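For example (illustrative only, using the mount path from your earlier listing; uid/gid 36 is vdsm:kvm):

# list the images directory as the vdsm user; permission errors will show up immediately
sudo -u vdsm ls -lR '/rhev/data-center/mnt/nas-01.phoelex.com:_volume2_vmstore/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/images' | head -n 20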
Best Regards, Strahil Nikolov
And the IPv6 address '64:ff9b::c0a8:13d' ?
I don't see it in the log output.
Best Regards, Strahil Nikolov
Based on your output, you got a record for both IPv4 & IPv6... most probably that's the reason.
Set the IPv6 address on the interface and try again.
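To double-check what the installer will see, something like this (just a sanity check, adjust the FQDN):

getent ahosts ovirt-node-00.phoelex.com
dig +short A ovirt-node-00.phoelex.com
dig +short AAAA ovirt-node-00.phoelex.com

If an AAAA answer comes back but no host interface carries that address, the validation will keep failing.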
Best Regards, Strahil Nikolov

Ha, spoke too soon. It's now stuck in a loop, and a google points me at https://bugzilla.redhat.com/show_bug.cgi?id=1746585. However, forcing IPv4 doesn't seem to have fixed the loop. On Wed, Apr 15, 2020 at 9:59 AM Shareef Jalloq <shareef@jalloq.co.uk> wrote:
OK, that seems to have fixed it, thanks. Is this a side effect of redeploying the HE over a first-time install? Nothing has changed in our setup and I didn't need to do this when I initially set up our nodes.
On Tue, Apr 14, 2020 at 6:55 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
On April 14, 2020 6:17:17 PM GMT+03:00, Shareef Jalloq < shareef@jalloq.co.uk> wrote:
Hmmm, we're not using ipv6. Is that the issue?
On Tue, Apr 14, 2020 at 3:56 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
Right, I've given up on recovering the HE so I want to try and redeploy it. There doesn't seem to be enough information to debug why the broker/agent won't start cleanly.
When running 'hosted-engine --deploy', I'm seeing the following error in the setup validation phase:
2020-04-14 09:46:08,922+0000 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND Please provide hostname of this host on the management network [ovirt-node-00.phoelex.com]:
2020-04-14 09:46:12,831+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname.getResolvedAddresses:432 getResolvedAddresses: set(['64:ff9b::c0a8:13d', '192.168.1.61'])
2020-04-14 09:46:12,832+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname._validateFQDNresolvability:289 ovirt-node-00.phoelex.com resolves to: set(['64:ff9b::c0a8:13d', '192.168.1.61'])
2020-04-14 09:46:12,832+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:813 execute: ['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com', 'ANY'], executable='None', cwd='None', env=None
2020-04-14 09:46:12,871+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:863 execute-result: ['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com', 'ANY'], rc=0
2020-04-14 09:46:12,872+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:921 execute-output: ['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com', 'ANY'] stdout:
ovirt-node-00.phoelex.com. 86400 IN A 192.168.1.61
2020-04-14 09:46:12,872+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:926 execute-output: ['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com', 'ANY'] stderr:
2020-04-14 09:46:12,872+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:813 execute: ('/usr/sbin/ip', 'addr'), executable='None', cwd='None', env=None
2020-04-14 09:46:12,876+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:863 execute-result: ('/usr/sbin/ip', 'addr'), rc=0
2020-04-14 09:46:12,876+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:921 execute-output: ('/usr/sbin/ip', 'addr') stdout:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovirtmgmt state UP group default qlen 1000
link/ether ac:1f:6b:bc:32:6a brd ff:ff:ff:ff:ff:ff
3: eno2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
link/ether ac:1f:6b:bc:32:6b brd ff:ff:ff:ff:ff:ff
4: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 02:e6:e2:80:93:8d brd ff:ff:ff:ff:ff:ff
5: br-int: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 8a:26:44:50:ee:4a brd ff:ff:ff:ff:ff:ff
21: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether ac:1f:6b:bc:32:6a brd ff:ff:ff:ff:ff:ff
inet 192.168.1.61/24 brd 192.168.1.255 scope global ovirtmgmt
valid_lft forever preferred_lft forever
inet6 fe80::ae1f:6bff:febc:326a/64 scope link
valid_lft forever preferred_lft forever
22: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 3a:02:7b:7d:b3:2a brd ff:ff:ff:ff:ff:ff
2020-04-14 09:46:12,876+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:926 execute-output: ('/usr/sbin/ip', 'addr') stderr:
2020-04-14 09:46:12,877+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname.getLocalAddresses:251 addresses: [u'192.168.1.61', u'fe80::ae1f:6bff:febc:326a']
2020-04-14 09:46:12,877+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname.test_hostname:464 test_hostname exception
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/ovirt_setup_lib/hostname.py", line 460, in test_hostname
not_local_text,
File "/usr/lib/python2.7/site-packages/ovirt_setup_lib/hostname.py", line 342, in _validateFQDNresolvability
addresses=resolvedAddressesAsString
RuntimeError: ovirt-node-00.phoelex.com resolves to 64:ff9b::c0a8:13d 192.168.1.61 and not all of them can be mapped to non loopback devices on this host
2020-04-14 09:46:12,884+0000 ERROR otopi.plugins.gr_he_common.network.bridge dialog.queryEnvKey:120 Host name is not valid: ovirt-node-00.phoelex.com resolves to 64:ff9b::c0a8:13d 192.168.1.61 and not all of them can be mapped to non loopback devices on this host
The node I'm running on has an IP address of .61 and resolves correctly.
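For what it's worth, if the AAAA answer is being synthesised by DNS64 on the resolver (an assumption on my part, suggested by the well-known 64:ff9b::/96 NAT64 prefix in that address, which embeds 192.168.1.61), one illustrative workaround is to pin the FQDN to its IPv4 address in /etc/hosts on the host for the duration of the deploy:

# illustrative only: make the FQDN resolve to IPv4 locally before running the deploy
echo '192.168.1.61 ovirt-node-00.phoelex.com' >> /etc/hosts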
On Fri, Apr 10, 2020 at 12:55 PM Shareef Jalloq <shareef@jalloq.co.uk> wrote:
Where should I be checking if there are any files/folders not owned by vdsm:kvm? I checked on the mount the HA sits on and it's fine.
How would I go about checking vdsm can access those images? If I run virsh, it lists them and they were running yesterday even though the HA was down. I've since restarted both hosts but the broker is still spitting out the same error (copied below). How do I find the reason the broker can't connect to the storage? The conf file is already at DEBUG verbosity:
[handler_logfile]
class=logging.handlers.TimedRotatingFileHandler
args=('/var/log/ovirt-hosted-engine-ha/broker.log', 'd', 1, 7)
level=DEBUG
formatter=long
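One way to get more detail is to repeat the call the broker is making directly against vdsm (a sketch, assuming vdsm-client is installed on the host, as it normally is on 4.3 nodes):

# ask vdsm directly for the domain info the broker keeps failing on
vdsm-client StorageDomain getInfo storagedomainID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2

and then look for the matching code=350 failure in /var/log/vdsm/vdsm.log.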
And what are all these .prob-<num> files that are being created? There are over 250K of them now on the mount I'm using for the Data domain. They're all of 0 size and of the form /rhev/data-center/mnt/nas-01.phoelex.com:_volume2_vmstore/.prob-ffa867da-93db-4211-82df-b1b04a625ab9
@eevans: The volume I have the Data Domain on has TBs free. The HA is dead so I can't ssh in. No idea what started these errors; the other VMs were still running happily, although they're on a different Data Domain.
Shareef.
MainThread::INFO::2020-04-10 07:45:00,408::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect) Connecting the storage
MainThread::INFO::2020-04-10 07:45:00,408::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2020-04-10 07:45:01,577::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2020-04-10 07:45:02,692::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Refreshing the storage domain
MainThread::WARNING::2020-04-10 07:45:05,175::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed: (code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
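A quick sanity check that the domain is actually mounted and readable by vdsm might look like this (paths taken from the earlier listing; adjust as needed):

# confirm the export is mounted, then read the domain metadata as the vdsm user
mount | grep _volume2_vmstore
sudo -u vdsm dd if='/rhev/data-center/mnt/nas-01.phoelex.com:_volume2_vmstore/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/dom_md/metadata' bs=4096 count=1 of=/dev/null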
On Thu, Apr 9, 2020 at 5:58 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
> On April 9, 2020 11:12:30 AM GMT+03:00, Shareef Jalloq < > shareef@jalloq.co.uk> wrote: > >OK, let's go through this. I'm looking at the node that at least still > >has > >some VMs running. virsh also tells me that the HostedEngine VM is > >running > >but it's unresponsive and I can't shut it down. > > > >1. All storage domains exist and are mounted. > >2. The ha_agent exists: > > > >[root@ovirt-node-01 ovirt-hosted-engine-ha]# ls /rhev/data-center/mnt/ > >nas-01.phoelex.com > \:_volume2_vmstore/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/ > > > >dom_md ha_agent images master > > > >3. There are two links > > > >[root@ovirt-node-01 ovirt-hosted-engine-ha]# ll /rhev/data-center/mnt/ > >nas-01.phoelex.com > \:_volume2_vmstore/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/ha_agent/ > > > >total 8 > > > >lrwxrwxrwx. 1 vdsm kvm 132 Apr 2 14:50 hosted-engine.lockspace -> > >
/var/run/vdsm/storage/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/ffb90b82-42fe-4253-85d5-aaec8c280aaf/90e68791-0c6f-406a-89ac-e0d86c631604
> > > >lrwxrwxrwx. 1 vdsm kvm 132 Apr 2 14:50 hosted-engine.metadata -> > >
> > > >4. The services exist but all seem to have some sort of warning: > > > >a) Apr 08 18:10:55 ovirt-node-01.phoelex.com sanlock[1728]: *2020-04-08 > >18:10:55 1744152 [36796]: s16 delta_renew long write time 10 sec* > > > >b) Mar 23 18:02:59 ovirt-node-01.phoelex.com supervdsmd[29409]: *failed > >to > >load module nvdimm: libbd_nvdimm.so.2: cannot open shared object file: > >No > >such file or directory* > > > >c) Apr 09 08:05:13 ovirt-node-01.phoelex.com vdsm[4801]: *ERROR failed > >to > >retrieve Hosted Engine HA score '[Errno 2] No such file or
/var/run/vdsm/storage/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/2161aed0-7250-4c1d-b667-ac94f60af17e/6b818e33-f80a-48cc-a59c-bba641e027d4 directory'Is
> >the > >Hosted Engine setup finished?* > > > >d)Apr 08 22:48:27 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-08 > >22:48:27.134+0000: 29309: warning : qemuGetProcessInfo:1404 : cannot > >parse > >process status data > > > >Apr 08 22:48:27 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-08 > >22:48:27.134+0000: 29309: error : virNetDevTapInterfaceStats:764 : > >internal > >error: /proc/net/dev: Interface not found > > > >Apr 08 23:09:39 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-08 > >23:09:39.844+0000: 29307: error : virNetSocketReadWire:1806 : End of > >file > >while reading data: Input/output error > > > >Apr 09 01:05:26 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-09 > >01:05:26.660+0000: 29307: error : virNetSocketReadWire:1806 : End of > >file > >while reading data: Input/output error > > > >5 & 6. The broker log is continually printing this error: > > > >MainThread::INFO::2020-04-09 > >
08:07:31,438::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run)
> >ovirt-hosted-engine-ha broker 2.3.6 started > > > >MainThread::DEBUG::2020-04-09 > >
08:07:31,438::broker::55::ovirt_hosted_engine_ha.broker.broker.Broker::(run)
> >Running broker > > > >MainThread::DEBUG::2020-04-09 > >
08:07:31,438::broker::120::ovirt_hosted_engine_ha.broker.broker.Broker::(_get_monitor)
> >Starting monitor > > > >MainThread::INFO::2020-04-09 > >
08:07:31,438::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >Searching for submonitors in > >/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker > > > >/submonitors > > > >MainThread::INFO::2020-04-09 > >
08:07:31,439::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >Loaded submonitor network > > > >MainThread::INFO::2020-04-09 > >
08:07:31,440::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >Loaded submonitor cpu-load-no-engine > > > >MainThread::INFO::2020-04-09 > >
08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >Loaded submonitor mgmt-bridge > > > >MainThread::INFO::2020-04-09 > >
08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >Loaded submonitor network > > > >MainThread::INFO::2020-04-09 > >
08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >Loaded submonitor cpu-load > > > >MainThread::INFO::2020-04-09 > >
08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >Loaded submonitor engine-health > > > >MainThread::INFO::2020-04-09 > >
08:07:31,442::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >Loaded submonitor mgmt-bridge > > > >MainThread::INFO::2020-04-09 > >
08:07:31,442::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >Loaded submonitor cpu-load-no-engine > > > >MainThread::INFO::2020-04-09 > >
08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >Loaded submonitor cpu-load > > > >MainThread::INFO::2020-04-09 > >
08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >Loaded submonitor mem-free > > > >MainThread::INFO::2020-04-09 > >
08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >Loaded submonitor storage-domain > > > >MainThread::INFO::2020-04-09 > >
08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >Loaded submonitor storage-domain > > > >MainThread::INFO::2020-04-09 > >
08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >Loaded submonitor mem-free > > > >MainThread::INFO::2020-04-09 > >
08:07:31,444::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >Loaded submonitor engine-health > > > >MainThread::INFO::2020-04-09 > >
08:07:31,444::monitor::50::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >Finished loading submonitors > > > >MainThread::DEBUG::2020-04-09 > >
08:07:31,444::broker::128::ovirt_hosted_engine_ha.broker.broker.Broker::(_get_storage_broker)
> >Starting storage broker > > > >MainThread::DEBUG::2020-04-09 > >
08:07:31,444::storage_backends::369::ovirt_hosted_engine_ha.lib.storage_backends::(connect)
> >Connecting to VDSM > > > >MainThread::DEBUG::2020-04-09 > >
08:07:31,444::util::384::ovirt_hosted_engine_ha.lib.storage_backends::(__log_debug)
> >Creating a new json-rpc connection to VDSM > > > >Client localhost:54321::DEBUG::2020-04-09 > >08:07:31,453::concurrent::258::root::(run) START thread <Thread(Client > >localhost:54321, started daemon 139992488138496)> (func=<bound method > >Reactor.process_requests of <yajsonrpc.betterAsyncore.Reactor object at > >0x7f528acabc90>>, args=(), kwargs={}) > > > >Client localhost:54321::DEBUG::2020-04-09 > >
08:07:31,459::stompclient::138::yajsonrpc.protocols.stomp.AsyncClient::(_process_connected)
> >Stomp connection established > > > >MainThread::DEBUG::2020-04-09 > >08:07:31,467::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending > >response > > > >MainThread::INFO::2020-04-09 > >
08:07:31,530::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect)
> >Connecting the storage > > > >MainThread::INFO::2020-04-09 > >
08:07:31,531::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
> >Connecting storage server > > > >MainThread::DEBUG::2020-04-09 > >08:07:31,531::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending > >response > > > >MainThread::DEBUG::2020-04-09 > >08:07:31,534::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending > >response > > > >MainThread::DEBUG::2020-04-09 > >
08:07:32,199::storage_server::158::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(_validate_pre_connected_path)
> >Storage domain a6cea67d-dbfb-45cf-a775-b4d0d47b26f2 is not available > > > >MainThread::INFO::2020-04-09 > >
08:07:32,199::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
> >Connecting storage server > > > >MainThread::DEBUG::2020-04-09 > >08:07:32,199::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending > >response > > > >MainThread::DEBUG::2020-04-09 > >
08:07:32,814::storage_server::363::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
> >[{u'status': 0, u'id': u'e29cf818-5ee5-46e1-85c1-8aeefa33e95d'}] > > > >MainThread::INFO::2020-04-09 > >
08:07:32,814::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
> >Refreshing the storage domain > > > >MainThread::DEBUG::2020-04-09 > >08:07:32,815::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending > >response > > > >MainThread::DEBUG::2020-04-09 > >
08:07:33,129::storage_server::420::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
> >Error refreshing storage domain: Command StorageDomain.getStats with > >args > >{'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed: > > > >(code=350, message=Error in storage domain action: > >(u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',)) > > > >MainThread::DEBUG::2020-04-09 > >08:07:33,130::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending > >response > > > >MainThread::DEBUG::2020-04-09 > >
08:07:33,795::storage_backends::208::ovirt_hosted_engine_ha.lib.storage_backends::(_get_sector_size)
> >Command StorageDomain.getInfo with args {'storagedomainID': > >'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed: > > > >(code=350, message=Error in storage domain action: > >(u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',)) > > > >MainThread::WARNING::2020-04-09 > >
> >Can't connect vdsm storage: Command StorageDomain.getInfo with args > >{'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed: > > > >(code=350, message=Error in storage domain action: > >(u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',)) > > > > > >The UUID it is moaning about is indeed the one that the HA sits on and > >is > >the one I listed the contents of in step 2 above. > > > > > >So why can't it see this domain? > > > > > >Thanks, Shareef. > > > >On Thu, Apr 9, 2020 at 6:12 AM Strahil Nikolov <hunter86_bg@yahoo.com> > >wrote: > > > >> On April 9, 2020 1:51:05 AM GMT+03:00, Shareef Jalloq < > >> shareef@jalloq.co.uk> wrote: > >> >Don't know if this is useful or not, but I just tried to shutdown > >and > >> >start > >> >another VM on one of the hosts and get the following error: > >> > > >> >virsh # start scratch > >> > > >> >error: Failed to start domain scratch > >> > > >> >error: Network not found: no network with matching name > >> >'vdsm-ovirtmgmt' > >> > > >> >Is this not referring to the interface name as the network is called > >> >'ovirtmgnt'. > >> > > >> >On Wed, Apr 8, 2020 at 11:35 PM Shareef Jalloq > ><shareef@jalloq.co.uk> > >> >wrote: > >> > > >> >> Hmmm, virsh tells me the HE is running but it hasn't come up and > >the > >> >> agent.log is full of the same errors. > >> >> > >> >> On Wed, Apr 8, 2020 at 11:31 PM Shareef Jalloq > ><shareef@jalloq.co.uk> > >> >> wrote: > >> >> > >> >>> Ah hah! Ok, so I've managed to start it using virsh on
> >second > >> >host > >> >>> but my first host is still dead. > >> >>> > >> >>> First of all, what are these 56,317 .prob- files that get dumped > >to > >> >the > >> >>> NFS mounts? > >> >>> > >> >>> Secondly, why doesn't the node mount the NFS directories at boot? > >> >Is > >> >>> that the issue with this particular node? > >> >>> > >> >>> On Wed, Apr 8, 2020 at 11:12 PM <eevans@digitaldatatechs.com> > >wrote: > >> >>> > >> >>>> Did you try virsh list --inactive > >> >>>> > >> >>>> > >> >>>> > >> >>>> Eric Evans > >> >>>> > >> >>>> Digital Data Services LLC. > >> >>>> > >> >>>> 304.660.9080 > >> >>>> > >> >>>> > >> >>>> > >> >>>> *From:* Shareef Jalloq <shareef@jalloq.co.uk> > >> >>>> *Sent:* Wednesday, April 8, 2020 5:58 PM > >> >>>> *To:* Strahil Nikolov <hunter86_bg@yahoo.com> > >> >>>> *Cc:* Ovirt Users <users@ovirt.org> > >> >>>> *Subject:* [ovirt-users] Re: ovirt-engine unresponsive - how to > >> >rescue? > >> >>>> > >> >>>> > >> >>>> > >> >>>> I've now shut down the VMs on one host and rebooted it but
> >> >agent > >> >>>> service doesn't start. If I run 'hosted-engine --vm-status' I > >get: > >> >>>> > >> >>>> > >> >>>> > >> >>>> The hosted engine configuration has not been retrieved from > >shared > >> >>>> storage. Please ensure that ovirt-ha-agent is running and
08:07:33,795::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) the the the
> >> >storage > >> >>>> server is reachable. > >> >>>> > >> >>>> > >> >>>> > >> >>>> and indeed if I list the mounts under /rhev/data-center/mnt, > >only > >> >one of > >> >>>> the directories is mounted. I have 3 NFS mounts, one ISO Domain > >> >and two > >> >>>> Data Domains. Only one Data Domain has mounted and this has > >lots > >> >of .prob > >> >>>> files in. So why haven't the other NFS exports been mounted? > >> >>>> > >> >>>> > >> >>>> > >> >>>> Manually mounting them doesn't seem to have helped much either. > >I > >> >can > >> >>>> start the broker service but the agent service says no. Same > >error > >> >as the > >> >>>> one in my last email. > >> >>>> > >> >>>> > >> >>>> > >> >>>> Shareef. > >> >>>> > >> >>>> > >> >>>> > >> >>>> On Wed, Apr 8, 2020 at 9:57 PM Shareef Jalloq > >> ><shareef@jalloq.co.uk> > >> >>>> wrote: > >> >>>> > >> >>>> Right, still down. I've run virsh and it doesn't know anything > >> >about > >> >>>> the engine vm. > >> >>>> > >> >>>> > >> >>>> > >> >>>> I've restarted the broker and agent services and I still get > >> >nothing in > >> >>>> virsh->list. > >> >>>> > >> >>>> > >> >>>> > >> >>>> In the logs under /var/log/ovirt-hosted-engine-ha I see lots of > >> >errors: > >> >>>> > >> >>>> > >> >>>> > >> >>>> broker.log: > >> >>>> > >> >>>> > >> >>>> > >> >>>> MainThread::INFO::2020-04-08 > >> >>>> > >> > >> > >
20:56:20,138::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) > >> >>>> ovirt-hosted-engine-ha broker 2.3.6 started > >> >>>> > >> >>>> MainThread::INFO::2020-04-08 > >> >>>> > >> > >> > >
20:56:20,138::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) > >> >>>> Searching for submonitors in > >> >>>> > >> > >
/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors > >> >>>> > >> >>>> MainThread::INFO::2020-04-08 > >> >>>> > >> > >> > >
20:56:20,138::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) > >> >>>> Loaded submonitor network > >> >>>> > >> >>>> MainThread::INFO::2020-04-08 > >> >>>> > >> > >> > >
20:56:20,140::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) > >> >>>> Loaded submonitor cpu-load-no-engine > >> >>>> > >> >>>> MainThread::INFO::2020-04-08 > >> >>>> > >> > >> > >
20:56:20,140::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) > >> >>>> Loaded submonitor mgmt-bridge > >> >>>> > >> >>>> MainThread::INFO::2020-04-08 > >> >>>> > >> > >> > >
20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) > >> >>>> Loaded submonitor network > >> >>>> > >> >>>> MainThread::INFO::2020-04-08 > >> >>>> > >> > >> > >
20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) > >> >>>> Loaded submonitor cpu-load > >> >>>> > >> >>>> MainThread::INFO::2020-04-08 > >> >>>> > >> > >> > >
20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) > >> >>>> Loaded submonitor engine-health > >> >>>> > >> >>>> MainThread::INFO::2020-04-08 > >> >>>> > >> > >> > >
20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) > >> >>>> Loaded submonitor mgmt-bridge > >> >>>> > >> >>>> MainThread::INFO::2020-04-08 > >> >>>> > >> > >> > >
20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) > >> >>>> Loaded submonitor cpu-load-no-engine > >> >>>> > >> >>>> MainThread::INFO::2020-04-08 > >> >>>> > >> > >> > >
20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) > >> >>>> Loaded submonitor cpu-load > >> >>>> > >> >>>> MainThread::INFO::2020-04-08 > >> >>>> > >> > >> > >
20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) > >> >>>> Loaded submonitor mem-free > >> >>>> > >> >>>> MainThread::INFO::2020-04-08 > >> >>>> > >> > >> > >
20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) > >> >>>> Loaded submonitor storage-domain > >> >>>> > >> >>>> MainThread::INFO::2020-04-08 > >> >>>> > >> > >> > >
20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) > >> >>>> Loaded submonitor storage-domain > >> >>>> > >> >>>> MainThread::INFO::2020-04-08 > >> >>>> > >> > >> > >
20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) > >> >>>> Loaded submonitor mem-free > >> >>>> > >> >>>> MainThread::INFO::2020-04-08 > >> >>>> > >> > >> > >
20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) > >> >>>> Loaded submonitor engine-health > >> >>>> > >> >>>> MainThread::INFO::2020-04-08 > >> >>>> > >> > >> > >
20:56:20,143::monitor::50::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) > >> >>>> Finished loading submonitors > >> >>>> > >> >>>> MainThread::INFO::2020-04-08 > >> >>>> > >> > >> > >
20:56:20,197::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect) > >> >>>> Connecting the storage > >> >>>> > >> >>>> MainThread::INFO::2020-04-08 > >> >>>> > >> > >> > >
20:56:20,197::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) > >> >>>> Connecting storage server > >> >>>> > >> >>>> MainThread::INFO::2020-04-08 > >> >>>> > >> > >> > >
20:56:20,414::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) > >> >>>> Connecting storage server > >> >>>> > >> >>>> MainThread::INFO::2020-04-08 > >> >>>> > >> > >> > >
20:56:20,628::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) > >> >>>> Refreshing the storage domain > >> >>>> > >> >>>> MainThread::WARNING::2020-04-08 > >> >>>> > >> > >> > >
20:56:21,057::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) > >> >>>> Can't connect vdsm storage: Command StorageDomain.getInfo with > >args > >> >>>> {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} > >failed: > >> >>>> > >> >>>> (code=350, message=Error in storage domain action: > >> >>>> (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',)) > >> >>>> > >> >>>> MainThread::INFO::2020-04-08 > >> >>>> > >> > >> > >
20:56:21,901::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) > >> >>>> ovirt-hosted-engine-ha broker 2.3.6 started > >> >>>> > >> >>>> MainThread::INFO::2020-04-08 > >> >>>> > >> > >> > >
20:56:21,901::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) > >> >>>> Searching for submonitors in > >> >>>> > >> > >
/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors > >> >>>> > >> >>>> > >> >>>> > >> >>>> agent.log: > >> >>>> > >> >>>> > >> >>>> > >> >>>> MainThread::ERROR::2020-04-08 > >> >>>> > >> > >> > >
20:57:00,799::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) > >> >>>> Trying to restart agent > >> >>>> > >> >>>> MainThread::INFO::2020-04-08 > >> >>>> > >> >
20:57:00,799::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) > >> >>>> Agent shutting down > >> >>>> > >> >>>> MainThread::INFO::2020-04-08 > >> >>>> > >> >
20:57:11,144::agent::67::ovirt_hosted_engine_ha.agent.agent.Agent::(run) > >> >>>> ovirt-hosted-engine-ha agent 2.3.6 started > >> >>>> > >> >>>> MainThread::INFO::2020-04-08 > >> >>>> > >> > >> > >
20:57:11,182::hosted_engine::234::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) > >> >>>> Found certificate common name: ovirt-node-01.phoelex.com > >> >>>> > >> >>>> MainThread::INFO::2020-04-08 > >> >>>> > >> > >> > >
20:57:11,294::hosted_engine::543::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) > >> >>>> Initializing ha-broker connection > >> >>>> > >> >>>> MainThread::INFO::2020-04-08 > >> >>>> > >> > >> > >
20:57:11,296::brokerlink::80::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) > >> >>>> Starting monitor network, options {'tcp_t_address': '', > >> >'network_test': > >> >>>> 'dns', 'tcp_t_port': '', 'addr': '192.168.1.99'} > >> >>>> > >> >>>> MainThread::ERROR::2020-04-08 > >> >>>> > >> > >> > >
20:57:11,296::hosted_engine::559::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) > >> >>>> Failed to start necessary monitors > >> >>>> > >> >>>> MainThread::ERROR::2020-04-08 > >> >>>> > >> > >> > >
20:57:11,297::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) > >> >>>> Traceback (most recent call last): > >> >>>> > >> >>>> File > >> >>>> > >> > >
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", > >> >>>> line 131, in _run_agent > >> >>>> > >> >>>> return action(he) > >> >>>> > >> >>>> File > >> >>>> > >> > >
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", > >> >>>> line 55, in action_proper > >> >>>> > >> >>>> return he.start_monitoring() > >> >>>> > >> >>>> File > >> >>>> > >> > >> > >
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", > >> >>>> line 432, in start_monitoring > >> >>>> > >> >>>> self._initialize_broker() > >> >>>> > >> >>>> File > >> >>>> > >> > >> > >
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", > >> >>>> line 556, in _initialize_broker > >> >>>> > >> >>>> m.get('options', {})) > >> >>>> > >> >>>> File > >> >>>> > >> > >> > >
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", > >> >>>> line 89, in start_monitor > >> >>>> > >> >>>> ).format(t=type, o=options, e=e) > >> >>>> > >> >>>> RequestError: brokerlink - failed to start monitor via > >> >ovirt-ha-broker: > >> >>>> [Errno 2] No such file or directory, [monitor: 'network', > >options: > >> >>>> {'tcp_t_address': '', 'network_test': 'dns', 'tcp_t_port': '', > >> >'addr': > >> >>>> '192.168.1.99'}] > >> >>>> > >> >>>> > >> >>>> > >> >>>> MainThread::ERROR::2020-04-08 > >> >>>> > >> > >> > >
20:57:11,297::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) > >> >>>> Trying to restart agent > >> >>>> > >> >>>> MainThread::INFO::2020-04-08 > >> >>>> > >> >
20:57:11,297::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down

On Wed, Apr 8, 2020 at 6:10 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
This has to be resolved:

Engine status : unknown stale-data

Run 'hosted-engine --vm-status' again. If it remains the same, restart ovirt-ha-broker.service & ovirt-ha-agent.service.

Verify that the engine's storage is available. Then monitor the broker & agent logs in /var/log/ovirt-hosted-engine-ha.

Best Regards,
Strahil Nikolov

Hi Shareef,

The activation flow of oVirt is more complex than plain KVM. Mounting of the domains happens during activation of the node (the HostedEngine is activating everything needed).

Focus on the HostedEngine VM. Is it running properly?

If not, try:

1. Verify that the storage domain exists
2. Check if it has an 'ha_agent' directory
3. Check if the links are OK; if not, you can safely remove the links
4. Next, check that these services are running:
A) sanlock
B) supervdsmd
C) vdsmd
D) libvirtd
5. Increase the log level for the broker and agent services:

cd /etc/ovirt-hosted-engine-ha
vim *-log.conf

systemctl restart ovirt-ha-broker ovirt-ha-agent

6. Check what they are complaining about. Keep in mind that the agent will keep throwing errors until the broker stops doing so (the agent depends on the broker), so the broker must be OK before proceeding with the agent log.

About the manual VM start, you need 2 things:

1. Define the VM network:

# cat vdsm-ovirtmgmt.xml
<network>
  <name>vdsm-ovirtmgmt</name>
  <uuid>8ded486e-e681-4754-af4b-5737c2b05405</uuid>
  <forward mode='bridge'/>
  <bridge name='ovirtmgmt'/>
</network>

[root@ovirt1 HostedEngine-RECOVERY]# virsh define vdsm-ovirtmgmt.xml

2. Get an XML definition, which can be found in the vdsm log. Every VM has its configuration printed to the vdsm log on the host it starts on. Save it to a file and then:

A) virsh define myvm.xml
B) virsh start myvm

It seems there is/was a problem with your NFS shares.

Best Regards,
Strahil Nikolov

Hey Shareef,

Check if there are any files or folders not owned by vdsm:kvm. Something like this:

find . -not -user 36 -not -group 36 -print

Also check if vdsm can access the images in the '<vol-mount-point>/images' directories.

Best Regards,
Strahil Nikolov
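Condensed into commands, the checklist above comes down to something like this -- a sketch only; the mount point and storage domain UUID are the ones reported in this thread, and uid/gid 36 is vdsm:kvm on a stock install:

# Hosted-engine storage domain as reported elsewhere in this thread
SD="/rhev/data-center/mnt/nas-01.phoelex.com:_volume2_vmstore/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2"

ls "$SD"                    # expect: dom_md  ha_agent  images  master
ls -l "$SD/ha_agent"        # the lockspace/metadata symlinks must not dangle

# vdsm runs as uid/gid 36 (vdsm:kvm); anything owned differently is suspect
find "$SD" -not -user 36 -not -group 36 -print

# the services the HA stack depends on
for svc in sanlock supervdsmd vdsmd libvirtd; do
    systemctl is-active "$svc"
done

# broker first, then agent; watch the broker log while it starts
systemctl restart ovirt-ha-broker ovirt-ha-agent
tail -f /var/log/ovirt-hosted-engine-ha/broker.log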
And the IPv6 address '64:ff9b::c0a8:13d'? I don't see it in the log output.
Best Regards, Strahil Nikolov
Based on your output, you got a PTR record for IPv4 & IPv6 ... most probably that's the reason.
Set the IPv6 on the interface and try again.
Best Regards, Strahil Nikolov
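For context: 64:ff9b::/96 is the well-known NAT64 prefix, and c0a8:13d is just 192.168.1.61 in hex, so that AAAA answer looks like it is being synthesized by a DNS64 resolver rather than configured anywhere. A quick way to confirm where each answer comes from (a sketch; dig is already on the host, per the deploy log further down):

dig +short A    ovirt-node-00.phoelex.com
dig +short AAAA ovirt-node-00.phoelex.com   # the synthesized 64:ff9b::c0a8:13d would show up here
dig +short -x 192.168.1.61                  # the PTR records mentioned above
dig +short -x 64:ff9b::c0a8:13d
getent ahosts ovirt-node-00.phoelex.com     # what the installer's resolver actually returns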

Oh this is painful. It seems to progress if you have both he_force_ipv4 set and run the deployment with the '--4' switch.

But then I get a failure when the ansible script checks for firewalld zones and doesn't get anything back. Should the deployment flow not be setting any zones it needs?

2020-04-15 10:57:25,439+0000 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 TASK [ovirt.hosted_engine_setup : Get active list of active firewalld zones]

2020-04-15 10:57:26,641+0000 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:103 {u'stderr_lines': [], u'changed': True, u'end': u'2020-04-15 10:57:26.481202', u'_ansible_no_log': False, u'stdout': u'', u'cmd': u'set -euo pipefail && firewall-cmd --get-active-zones | grep -v "^\\s*interfaces"', u'start': u'2020-04-15 10:57:26.050203', u'delta': u'0:00:00.430999', u'stderr': u'', u'rc': 1, u'invocation': {u'module_args': {u'creates': None, u'executable': None, u'_uses_shell': True, u'strip_empty_ends': True, u'_raw_params': u'set -euo pipefail && firewall-cmd --get-active-zones | grep -v "^\\s*interfaces"', u'removes': None, u'argv': None, u'warn': True, u'chdir': None, u'stdin_add_newline': True, u'stdin': None}}, u'stdout_lines': [], u'msg': u'non-zero return code'}

2020-04-15 10:57:26,741+0000 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:107 fatal: [localhost]: FAILED! => {"changed": true, "cmd": "set -euo pipefail && firewall-cmd --get-active-zones | grep -v \"^\\s*interfaces\"", "delta": "0:00:00.430999", "end": "2020-04-15 10:57:26.481202", "msg": "non-zero return code", "rc": 1, "start": "2020-04-15 10:57:26.050203", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
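That task just greps the output of firewall-cmd and treats grep's empty result (rc=1) as fatal, so it fails whenever no firewalld zone is active on the host. Something like this reproduces and works around it by hand -- a sketch; the zone name is an assumption (stock installs default to 'public'):

firewall-cmd --get-active-zones        # empty output here is exactly what makes the task fail
systemctl status firewalld             # the check presumes firewalld is up

# one way out: bind the management bridge to a zone explicitly, then re-run the deploy
firewall-cmd --permanent --zone=public --change-interface=ovirtmgmt
firewall-cmd --reload
firewall-cmd --get-active-zones        # should now list the zone with ovirtmgmt under it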
On Wed, Apr 15, 2020 at 10:23 AM Shareef Jalloq <shareef@jalloq.co.uk> wrote:

Ha, spoke too soon. It's now stuck in a loop and a Google search points me at https://bugzilla.redhat.com/show_bug.cgi?id=1746585
However, forcing ipv4 doesn't seem to have fixed the loop.
On Wed, Apr 15, 2020 at 9:59 AM Shareef Jalloq <shareef@jalloq.co.uk> wrote:
OK, that seems to have fixed it, thanks. Is this a side effect of redeploying the HE over a first time install? Nothing has changed in our setup and I didn't need to do this when I initially set up our nodes.
On Tue, Apr 14, 2020 at 6:55 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
On April 14, 2020 6:17:17 PM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:
Hmmm, we're not using ipv6. Is that the issue?
On Tue, Apr 14, 2020 at 3:56 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
Right, I've given up on recovering the HE so want to try and redeploy it. There doesn't seem to be enough information to debug why the broker/agent won't start cleanly.
In running 'hosted-engine --deploy', I'm seeing the following error in the setup validation phase:
2020-04-14 09:46:08,922+0000 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND Please provide the hostname of this host on the management network [ovirt-node-00.phoelex.com]:
2020-04-14 09:46:12,831+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname.getResolvedAddresses:432 getResolvedAddresses: set(['64:ff9b::c0a8:13d', '192.168.1.61'])
2020-04-14 09:46:12,832+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname._validateFQDNresolvability:289 ovirt-node-00.phoelex.com resolves to: set(['64:ff9b::c0a8:13d', '192.168.1.61'])
2020-04-14 09:46:12,832+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:813 execute: ['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com', 'ANY'], executable='None', cwd='None', env=None
2020-04-14 09:46:12,871+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:863 execute-result: ['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com', 'ANY'], rc=0
2020-04-14 09:46:12,872+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:921 execute-output: ['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com', 'ANY'] stdout:
ovirt-node-00.phoelex.com. 86400 IN A 192.168.1.61
2020-04-14 09:46:12,872+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:926 execute-output: ['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com', 'ANY'] stderr:
2020-04-14 09:46:12,872+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:813 execute: ('/usr/sbin/ip', 'addr'), executable='None', cwd='None', env=None
2020-04-14 09:46:12,876+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:863 execute-result: ('/usr/sbin/ip', 'addr'), rc=0
2020-04-14 09:46:12,876+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:921 execute-output: ('/usr/sbin/ip', 'addr') stdout:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovirtmgmt state UP group default qlen 1000
link/ether ac:1f:6b:bc:32:6a brd ff:ff:ff:ff:ff:ff
3: eno2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
link/ether ac:1f:6b:bc:32:6b brd ff:ff:ff:ff:ff:ff
4: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 02:e6:e2:80:93:8d brd ff:ff:ff:ff:ff:ff
5: br-int: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 8a:26:44:50:ee:4a brd ff:ff:ff:ff:ff:ff
21: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether ac:1f:6b:bc:32:6a brd ff:ff:ff:ff:ff:ff
inet 192.168.1.61/24 brd 192.168.1.255 scope global ovirtmgmt
valid_lft forever preferred_lft forever
inet6 fe80::ae1f:6bff:febc:326a/64 scope link
valid_lft forever preferred_lft forever
22: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 3a:02:7b:7d:b3:2a brd ff:ff:ff:ff:ff:ff
2020-04-14 09:46:12,876+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:926 execute-output: ('/usr/sbin/ip', 'addr') stderr:
2020-04-14 09:46:12,877+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname.getLocalAddresses:251 addresses: [u'192.168.1.61', u'fe80::ae1f:6bff:febc:326a']
2020-04-14 09:46:12,877+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname.test_hostname:464 test_hostname exception
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/ovirt_setup_lib/hostname.py", line 460, in test_hostname
not_local_text,
File "/usr/lib/python2.7/site-packages/ovirt_setup_lib/hostname.py", line 342, in _validateFQDNresolvability
addresses=resolvedAddressesAsString
RuntimeError: ovirt-node-00.phoelex.com resolves to 64:ff9b::c0a8:13d 192.168.1.61 and not all of them can be mapped to non loopback devices on this host
2020-04-14 09:46:12,884+0000 ERROR otopi.plugins.gr_he_common.network.bridge dialog.queryEnvKey:120 Host name is not valid: ovirt-node-00.phoelex.com resolves to 64:ff9b::c0a8:13d 192.168.1.61 and not all of them can be mapped to non loopback devices on this host
The node I'm running on has an IP address of .61 and resolves correctly.
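Since the validator builds that address set from the system resolver, one blunt way to test whether the stray AAAA record is the only blocker is to pin the name in /etc/hosts for the duration of the deploy -- a sketch; this only wins over DNS with the default nsswitch 'hosts: files dns' ordering, and it should be reverted afterwards:

# Pin the FQDN to its IPv4 address so the resolver stops returning the
# DNS64-synthesized AAAA record during validation.
echo '192.168.1.61 ovirt-node-00.phoelex.com' >> /etc/hosts
getent ahosts ovirt-node-00.phoelex.com   # should now show only 192.168.1.61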
On Fri, Apr 10, 2020 at 12:55 PM Shareef Jalloq <shareef@jalloq.co.uk> wrote:
Where should I be checking if there are any files/folders not owned by vdsm:kvm? I checked on the mount the HA sits on and it's fine.

How would I go about checking vdsm can access those images? If I run virsh, it lists them and they were running yesterday even though the HA was down. I've since restarted both hosts but the broker is still spitting out the same error (copied below). How do I find the reason the broker can't connect to the storage? The conf file is already at DEBUG verbosity:

[handler_logfile]
class=logging.handlers.TimedRotatingFileHandler
args=('/var/log/ovirt-hosted-engine-ha/broker.log', 'd', 1, 7)
level=DEBUG
formatter=long

And what are all these .prob-<num> files that are being created? There are over 250K of them now on the mount I'm using for the Data domain. They're all of 0 size and of the form /rhev/data-center/mnt/nas-01.phoelex.com:_volume2_vmstore/.prob-ffa867da-93db-4211-82df-b1b04a625ab9

@eevans: The volume I have the Data Domain on has TBs free. The HA is dead so I can't ssh in. No idea what started these errors, and the other VMs were still running happily, although they're on a different Data Domain.

Shareef.

MainThread::INFO::2020-04-10 07:45:00,408::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect) Connecting the storage
MainThread::INFO::2020-04-10 07:45:00,408::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2020-04-10 07:45:01,577::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2020-04-10 07:45:02,692::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Refreshing the storage domain
MainThread::WARNING::2020-04-10 07:45:05,175::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed: (code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))

On Thu, Apr 9, 2020 at 5:58 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:

On April 9, 2020 11:12:30 AM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:

OK, let's go through this. I'm looking at the node that at least still has some VMs running. virsh also tells me that the HostedEngine VM is running, but it's unresponsive and I can't shut it down.

1. All storage domains exist and are mounted.

2. The ha_agent exists:

[root@ovirt-node-01 ovirt-hosted-engine-ha]# ls /rhev/data-center/mnt/nas-01.phoelex.com\:_volume2_vmstore/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/
dom_md  ha_agent  images  master

3. There are two links:

[root@ovirt-node-01 ovirt-hosted-engine-ha]# ll /rhev/data-center/mnt/nas-01.phoelex.com\:_volume2_vmstore/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/ha_agent/
total 8
lrwxrwxrwx. 1 vdsm kvm 132 Apr 2 14:50 hosted-engine.lockspace -> /var/run/vdsm/storage/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/ffb90b82-42fe-4253-85d5-aaec8c280aaf/90e68791-0c6f-406a-89ac-e0d86c631604
lrwxrwxrwx. 1 vdsm kvm 132 Apr 2 14:50 hosted-engine.metadata -> /var/run/vdsm/storage/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/2161aed0-7250-4c1d-b667-ac94f60af17e/6b818e33-f80a-48cc-a59c-bba641e027d4

4. The services exist, but all seem to have some sort of warning:

a) Apr 08 18:10:55 ovirt-node-01.phoelex.com sanlock[1728]: 2020-04-08 18:10:55 1744152 [36796]: s16 delta_renew long write time 10 sec

b) Mar 23 18:02:59 ovirt-node-01.phoelex.com supervdsmd[29409]: failed to load module nvdimm: libbd_nvdimm.so.2: cannot open shared object file: No such file or directory

c) Apr 09 08:05:13 ovirt-node-01.phoelex.com vdsm[4801]: ERROR failed to retrieve Hosted Engine HA score '[Errno 2] No such file or directory' Is the Hosted Engine setup finished?

d) Apr 08 22:48:27 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-08 22:48:27.134+0000: 29309: warning : qemuGetProcessInfo:1404 : cannot parse process status data

Apr 08 22:48:27 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-08 22:48:27.134+0000: 29309: error : virNetDevTapInterfaceStats:764 : internal error: /proc/net/dev: Interface not found

Apr 08 23:09:39 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-08 23:09:39.844+0000: 29307: error : virNetSocketReadWire:1806 : End of file while reading data: Input/output error

Apr 09 01:05:26 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-09 01:05:26.660+0000: 29307: error : virNetSocketReadWire:1806 : End of file while reading data: Input/output error

5 & 6. The broker log is continually printing this error:
MainThread::INFO::2020-04-09 08:07:31,438::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.3.6 started
MainThread::DEBUG::2020-04-09 08:07:31,438::broker::55::ovirt_hosted_engine_ha.broker.broker.Broker::(run) Running broker
MainThread::DEBUG::2020-04-09 08:07:31,438::broker::120::ovirt_hosted_engine_ha.broker.broker.Broker::(_get_monitor) Starting monitor
MainThread::INFO::2020-04-09 08:07:31,438::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Searching for submonitors in /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors
MainThread::INFO::2020-04-09 08:07:31,439::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
MainThread::INFO::2020-04-09 08:07:31,440::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
MainThread::INFO::2020-04-09 08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
MainThread::INFO::2020-04-09 08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
MainThread::INFO::2020-04-09 08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
MainThread::INFO::2020-04-09 08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
MainThread::INFO::2020-04-09 08:07:31,442::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
MainThread::INFO::2020-04-09 08:07:31,442::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
MainThread::INFO::2020-04-09 08:07:31,444::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
MainThread::INFO::2020-04-09 08:07:31,444::monitor::50::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Finished loading submonitors
MainThread::DEBUG::2020-04-09 08:07:31,444::broker::128::ovirt_hosted_engine_ha.broker.broker.Broker::(_get_storage_broker) Starting storage broker
MainThread::DEBUG::2020-04-09 08:07:31,444::storage_backends::369::ovirt_hosted_engine_ha.lib.storage_backends::(connect) Connecting to VDSM
MainThread::DEBUG::2020-04-09 08:07:31,444::util::384::ovirt_hosted_engine_ha.lib.storage_backends::(__log_debug) Creating a new json-rpc connection to VDSM
Client localhost:54321::DEBUG::2020-04-09 08:07:31,453::concurrent::258::root::(run) START thread <Thread(Client localhost:54321, started daemon 139992488138496)> (func=<bound method Reactor.process_requests of <yajsonrpc.betterAsyncore.Reactor object at 0x7f528acabc90>>, args=(), kwargs={})
Client localhost:54321::DEBUG::2020-04-09 08:07:31,459::stompclient::138::yajsonrpc.protocols.stomp.AsyncClient::(_process_connected) Stomp connection established
MainThread::DEBUG::2020-04-09 08:07:31,467::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::INFO::2020-04-09 08:07:31,530::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect) Connecting the storage
MainThread::INFO::2020-04-09 08:07:31,531::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::DEBUG::2020-04-09 08:07:31,531::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2020-04-09 08:07:31,534::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2020-04-09 08:07:32,199::storage_server::158::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(_validate_pre_connected_path) Storage domain a6cea67d-dbfb-45cf-a775-b4d0d47b26f2 is not available
MainThread::INFO::2020-04-09 08:07:32,199::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::DEBUG::2020-04-09 08:07:32,199::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2020-04-09 08:07:32,814::storage_server::363::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) [{u'status': 0, u'id': u'e29cf818-5ee5-46e1-85c1-8aeefa33e95d'}]
MainThread::INFO::2020-04-09 08:07:32,814::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Refreshing the storage domain
MainThread::DEBUG::2020-04-09 08:07:32,815::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2020-04-09 08:07:33,129::storage_server::420::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Error refreshing storage domain: Command StorageDomain.getStats with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed: (code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
MainThread::DEBUG::2020-04-09 08:07:33,130::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2020-04-09 08:07:33,795::storage_backends::208::ovirt_hosted_engine_ha.lib.storage_backends::(_get_sector_size) Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed: (code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
MainThread::WARNING::2020-04-09 08:07:33,795::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed: (code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))

The UUID it is moaning about is indeed the one that the HA sits on, and it is the one I listed the contents of in step 2 above.

So why can't it see this domain?

Thanks, Shareef.
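A direct way to test that question is to re-issue the exact call that broker.log shows failing, straight against vdsm -- a sketch using vdsm-client (shipped with 4.2+-era vdsm; verify it exists on your host), with the UUID from this thread:

# Re-run the call the broker makes; the same code 350 here confirms the
# fault is in vdsm/storage, not in the HA broker itself.
vdsm-client StorageDomain getInfo storagedomainID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2

# Cross-check what is actually NFS-mounted against what vdsm expects:
mount -t nfs,nfs4
ls /rhev/data-center/mnt/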
On Thu, Apr 9, 2020 at 6:12 AM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:

On April 9, 2020 1:51:05 AM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:

Don't know if this is useful or not, but I just tried to shut down and start another VM on one of the hosts and get the following error:

virsh # start scratch
error: Failed to start domain scratch
error: Network not found: no network with matching name 'vdsm-ovirtmgmt'

Is this not referring to the interface name, as the network is called 'ovirtmgmt'?

On Wed, Apr 8, 2020 at 11:35 PM Shareef Jalloq <shareef@jalloq.co.uk> wrote:

Hmmm, virsh tells me the HE is running but it hasn't come up, and the agent.log is full of the same errors.

On Wed, Apr 8, 2020 at 11:31 PM Shareef Jalloq <shareef@jalloq.co.uk> wrote:

Ah hah! OK, so I've managed to start it using virsh on the second host, but my first host is still dead.

First of all, what are these 56,317 .prob- files that get dumped to the NFS mounts?

Secondly, why doesn't the node mount the NFS directories at boot? Is that the issue with this particular node?

On Wed, Apr 8, 2020 at 11:12 PM <eevans@digitaldatatechs.com> wrote:

Did you try virsh list --inactive

Eric Evans
Digital Data Services LLC.
304.660.9080

From: Shareef Jalloq <shareef@jalloq.co.uk>
Sent: Wednesday, April 8, 2020 5:58 PM
To: Strahil Nikolov <hunter86_bg@yahoo.com>
Cc: Ovirt Users <users@ovirt.org>
Subject: [ovirt-users] Re: ovirt-engine unresponsive - how to rescue?

I've now shut down the VMs on one host and rebooted it, but the agent service doesn't start. If I run 'hosted-engine --vm-status' I get:

The hosted engine configuration has not been retrieved from shared storage. Please ensure that ovirt-ha-agent is running and the storage server is reachable.

and indeed if I list the mounts under /rhev/data-center/mnt, only one of the directories is mounted. I have 3 NFS mounts: one ISO Domain and two Data Domains. Only one Data Domain has mounted, and this one has lots of .prob files in it. So why haven't the other NFS exports been mounted?

Manually mounting them doesn't seem to have helped much either. I can start the broker service, but the agent service says no. Same error as the one in my last email.

Shareef.

On Wed, Apr 8, 2020 at 9:57 PM Shareef Jalloq <shareef@jalloq.co.uk> wrote:

Right, still down. I've run virsh and it doesn't know anything about the engine vm.

I've restarted the broker and agent services and I still get nothing in virsh->list.

In the logs under /var/log/ovirt-hosted-engine-ha I see lots of errors:

broker.log:
MainThread::INFO::2020-04-08 20:56:20,138::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.3.6 started
MainThread::INFO::2020-04-08 20:56:20,138::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Searching for submonitors in /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors
MainThread::INFO::2020-04-08 20:56:20,138::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
MainThread::INFO::2020-04-08 20:56:20,140::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
MainThread::INFO::2020-04-08 20:56:20,140::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
MainThread::INFO::2020-04-08 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
MainThread::INFO::2020-04-08 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
MainThread::INFO::2020-04-08 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
MainThread::INFO::2020-04-08 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
MainThread::INFO::2020-04-08 20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
MainThread::INFO::2020-04-08 20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
MainThread::INFO::2020-04-08 20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
MainThread::INFO::2020-04-08 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
MainThread::INFO::2020-04-08 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
MainThread::INFO::2020-04-08 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
MainThread::INFO::2020-04-08 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
MainThread::INFO::2020-04-08 20:56:20,143::monitor::50::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Finished loading submonitors
MainThread::INFO::2020-04-08 20:56:20,197::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect) Connecting the storage
MainThread::INFO::2020-04-08 20:56:20,197::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2020-04-08 20:56:20,414::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2020-04-08 20:56:20,628::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Refreshing the storage domain
MainThread::WARNING::2020-04-08 20:56:21,057::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed: (code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
MainThread::INFO::2020-04-08 20:56:21,901::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.3.6 started
MainThread::INFO::2020-04-08 20:56:21,901::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Searching for submonitors in /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors

agent.log:
MainThread::ERROR::2020-04-08 20:57:00,799::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent
MainThread::INFO::2020-04-08 20:57:00,799::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
MainThread::INFO::2020-04-08 20:57:11,144::agent::67::ovirt_hosted_engine_ha.agent.agent.Agent::(run) ovirt-hosted-engine-ha agent 2.3.6 started
MainThread::INFO::2020-04-08 20:57:11,182::hosted_engine::234::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Found certificate common name: ovirt-node-01.phoelex.com
MainThread::INFO::2020-04-08 20:57:11,294::hosted_engine::543::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Initializing ha-broker connection
MainThread::INFO::2020-04-08 20:57:11,296::brokerlink::80::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor network, options {'tcp_t_address': '', 'network_test': 'dns', 'tcp_t_port': '', 'addr': '192.168.1.99'}
MainThread::ERROR::2020-04-08 20:57:11,296::hosted_engine::559::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Failed to start necessary monitors
MainThread::ERROR::2020-04-08 20:57:11,297::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent
    return action(he)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper
    return he.start_monitoring()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 432, in start_monitoring
    self._initialize_broker()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 556, in _initialize_broker
    m.get('options', {}))
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 89, in start_monitor
    ).format(t=type, o=options, e=e)
RequestError: brokerlink - failed to start monitor via ovirt-ha-broker: [Errno 2] No such file or directory, [monitor: 'network', options: {'tcp_t_address': '', 'network_test': 'dns', 'tcp_t_port': '', 'addr': '192.168.1.99'}]
MainThread::ERROR::2020-04-08 20:57:11,297::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent
MainThread::INFO::2020-04-08 20:57:11,297::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
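The '[Errno 2] No such file or directory' in that RequestError is the agent failing to reach the broker's local listener, which fits the broker dying on the storage errors above before it could serve anything. A sketch of what to check -- the socket directory is an assumption from 4.3-era packaging, so confirm the path on your host:

systemctl status ovirt-ha-broker                 # must be up and staying up before the agent can start
journalctl -u ovirt-ha-broker -n 50 --no-pager   # look for the storage errors seen in broker.log
ls -l /var/run/ovirt-hosted-engine-ha/           # assumed location of the broker's unix socket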
>>>>08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >Loaded submonitor mem-free >>>> >> >>> > >>>> >> >>> >MainThread::INFO::2020-04-09 >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >Loaded submonitor storage-domain >>>> >> >>> > >>>> >> >>> >MainThread::INFO::2020-04-09 >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >Loaded submonitor storage-domain >>>> >> >>> > >>>> >> >>> >MainThread::INFO::2020-04-09 >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >Loaded submonitor mem-free >>>> >> >>> > >>>> >> >>> >MainThread::INFO::2020-04-09 >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>08:07:31,444::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >Loaded submonitor engine-health >>>> >> >>> > >>>> >> >>> >MainThread::INFO::2020-04-09 >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>08:07:31,444::monitor::50::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >Finished loading submonitors >>>> >> >>> > >>>> >> >>> >MainThread::DEBUG::2020-04-09 >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>08:07:31,444::broker::128::ovirt_hosted_engine_ha.broker.broker.Broker::(_get_storage_broker) >>>> >> >>> >Starting storage broker >>>> >> >>> > >>>> >> >>> >MainThread::DEBUG::2020-04-09 >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>08:07:31,444::storage_backends::369::ovirt_hosted_engine_ha.lib.storage_backends::(connect) >>>> >> >>> >Connecting to VDSM >>>> >> >>> > >>>> >> >>> >MainThread::DEBUG::2020-04-09 >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>08:07:31,444::util::384::ovirt_hosted_engine_ha.lib.storage_backends::(__log_debug) >>>> >> >>> >Creating a new json-rpc connection to VDSM >>>> >> >>> > >>>> >> >>> >Client localhost:54321::DEBUG::2020-04-09 >>>> >> >>> >08:07:31,453::concurrent::258::root::(run) START thread >>>> >> ><Thread(Client >>>> >> >>> >localhost:54321, started daemon 139992488138496)> >(func=<bound >>>> >> >method >>>> >> >>> >Reactor.process_requests of ><yajsonrpc.betterAsyncore.Reactor >>>> >> >object at >>>> >> >>> >0x7f528acabc90>>, args=(), kwargs={}) >>>> >> >>> > >>>> >> >>> >Client localhost:54321::DEBUG::2020-04-09 >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>08:07:31,459::stompclient::138::yajsonrpc.protocols.stomp.AsyncClient::(_process_connected) >>>> >> >>> >Stomp connection established >>>> >> >>> > >>>> >> >>> >MainThread::DEBUG::2020-04-09 >>>> >> >>> >>08:07:31,467::stompclient::294::jsonrpc.AsyncoreClient::(send) >>>> >> >Sending >>>> >> >>> >response >>>> >> >>> > >>>> >> >>> >MainThread::INFO::2020-04-09 >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>08:07:31,530::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect) >>>> >> >>> >Connecting the storage >>>> >> >>> > >>>> >> >>> >MainThread::INFO::2020-04-09 >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>08:07:31,531::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) >>>> >> >>> >Connecting storage server >>>> >> >>> > >>>> >> >>> >MainThread::DEBUG::2020-04-09 >>>> >> >>> 
>>08:07:31,531::stompclient::294::jsonrpc.AsyncoreClient::(send) >>>> >> >Sending >>>> >> >>> >response >>>> >> >>> > >>>> >> >>> >MainThread::DEBUG::2020-04-09 >>>> >> >>> >>08:07:31,534::stompclient::294::jsonrpc.AsyncoreClient::(send) >>>> >> >Sending >>>> >> >>> >response >>>> >> >>> > >>>> >> >>> >MainThread::DEBUG::2020-04-09 >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>08:07:32,199::storage_server::158::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(_validate_pre_connected_path) >>>> >> >>> >Storage domain a6cea67d-dbfb-45cf-a775-b4d0d47b26f2 is not >>>> >> >available >>>> >> >>> > >>>> >> >>> >MainThread::INFO::2020-04-09 >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>08:07:32,199::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) >>>> >> >>> >Connecting storage server >>>> >> >>> > >>>> >> >>> >MainThread::DEBUG::2020-04-09 >>>> >> >>> >>08:07:32,199::stompclient::294::jsonrpc.AsyncoreClient::(send) >>>> >> >Sending >>>> >> >>> >response >>>> >> >>> > >>>> >> >>> >MainThread::DEBUG::2020-04-09 >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>08:07:32,814::storage_server::363::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) >>>> >> >>> >[{u'status': 0, u'id': >u'e29cf818-5ee5-46e1-85c1-8aeefa33e95d'}] >>>> >> >>> > >>>> >> >>> >MainThread::INFO::2020-04-09 >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>08:07:32,814::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) >>>> >> >>> >Refreshing the storage domain >>>> >> >>> > >>>> >> >>> >MainThread::DEBUG::2020-04-09 >>>> >> >>> >>08:07:32,815::stompclient::294::jsonrpc.AsyncoreClient::(send) >>>> >> >Sending >>>> >> >>> >response >>>> >> >>> > >>>> >> >>> >MainThread::DEBUG::2020-04-09 >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>08:07:33,129::storage_server::420::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) >>>> >> >>> >Error refreshing storage domain: Command >StorageDomain.getStats >>>> >> >with >>>> >> >>> >args >>>> >> >>> >{'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} >>>> >failed: >>>> >> >>> > >>>> >> >>> >(code=350, message=Error in storage domain action: >>>> >> >>> >(u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',)) >>>> >> >>> > >>>> >> >>> >MainThread::DEBUG::2020-04-09 >>>> >> >>> >>08:07:33,130::stompclient::294::jsonrpc.AsyncoreClient::(send) >>>> >> >Sending >>>> >> >>> >response >>>> >> >>> > >>>> >> >>> >MainThread::DEBUG::2020-04-09 >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>08:07:33,795::storage_backends::208::ovirt_hosted_engine_ha.lib.storage_backends::(_get_sector_size) >>>> >> >>> >Command StorageDomain.getInfo with args {'storagedomainID': >>>> >> >>> >'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed: >>>> >> >>> > >>>> >> >>> >(code=350, message=Error in storage domain action: >>>> >> >>> >(u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',)) >>>> >> >>> > >>>> >> >>> >MainThread::WARNING::2020-04-09 >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>08:07:33,795::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) >>>> >> >>> >Can't connect vdsm storage: Command StorageDomain.getInfo >with >>>> >args >>>> >> >>> >{'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} >>>> >failed: >>>> >> >>> > >>>> >> >>> >(code=350, message=Error in storage domain action: >>>> >> >>> >(u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',)) 
>>>> >> >>> > >>>> >> >>> > >>>> >> >>> >The UUID it is moaning about is indeed the one that the HA >sits >>>> >on >>>> >> >and >>>> >> >>> >is >>>> >> >>> >the one I listed the contents of in step 2 above. >>>> >> >>> > >>>> >> >>> > >>>> >> >>> >So why can't it see this domain? >>>> >> >>> > >>>> >> >>> > >>>> >> >>> >Thanks, Shareef. >>>> >> >>> > >>>> >> >>> >On Thu, Apr 9, 2020 at 6:12 AM Strahil Nikolov >>>> >> ><hunter86_bg@yahoo.com> >>>> >> >>> >wrote: >>>> >> >>> > >>>> >> >>> >> On April 9, 2020 1:51:05 AM GMT+03:00, Shareef Jalloq < >>>> >> >>> >> shareef@jalloq.co.uk> wrote: >>>> >> >>> >> >Don't know if this is useful or not, but I just tried to >>>> >> >shutdown >>>> >> >>> >and >>>> >> >>> >> >start >>>> >> >>> >> >another VM on one of the hosts and get the following >error: >>>> >> >>> >> > >>>> >> >>> >> >virsh # start scratch >>>> >> >>> >> > >>>> >> >>> >> >error: Failed to start domain scratch >>>> >> >>> >> > >>>> >> >>> >> >error: Network not found: no network with matching name >>>> >> >>> >> >'vdsm-ovirtmgmt' >>>> >> >>> >> > >>>> >> >>> >> >Is this not referring to the interface name as the >network is >>>> >> >called >>>> >> >>> >> >'ovirtmgnt'. >>>> >> >>> >> > >>>> >> >>> >> >On Wed, Apr 8, 2020 at 11:35 PM Shareef Jalloq >>>> >> >>> ><shareef@jalloq.co.uk> >>>> >> >>> >> >wrote: >>>> >> >>> >> > >>>> >> >>> >> >> Hmmm, virsh tells me the HE is running but it hasn't >come >>>> >up >>>> >> >and >>>> >> >>> >the >>>> >> >>> >> >> agent.log is full of the same errors. >>>> >> >>> >> >> >>>> >> >>> >> >> On Wed, Apr 8, 2020 at 11:31 PM Shareef Jalloq >>>> >> >>> ><shareef@jalloq.co.uk> >>>> >> >>> >> >> wrote: >>>> >> >>> >> >> >>>> >> >>> >> >>> Ah hah! Ok, so I've managed to start it using virsh >on >>>> >the >>>> >> >>> >second >>>> >> >>> >> >host >>>> >> >>> >> >>> but my first host is still dead. >>>> >> >>> >> >>> >>>> >> >>> >> >>> First of all, what are these 56,317 .prob- files that >get >>>> >> >dumped >>>> >> >>> >to >>>> >> >>> >> >the >>>> >> >>> >> >>> NFS mounts? >>>> >> >>> >> >>> >>>> >> >>> >> >>> Secondly, why doesn't the node mount the NFS >directories >>>> >at >>>> >> >boot? >>>> >> >>> >> >Is >>>> >> >>> >> >>> that the issue with this particular node? >>>> >> >>> >> >>> >>>> >> >>> >> >>> On Wed, Apr 8, 2020 at 11:12 PM >>>> ><eevans@digitaldatatechs.com> >>>> >> >>> >wrote: >>>> >> >>> >> >>> >>>> >> >>> >> >>>> Did you try virsh list --inactive >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> Eric Evans >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> Digital Data Services LLC. >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> 304.660.9080 >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> *From:* Shareef Jalloq <shareef@jalloq.co.uk> >>>> >> >>> >> >>>> *Sent:* Wednesday, April 8, 2020 5:58 PM >>>> >> >>> >> >>>> *To:* Strahil Nikolov <hunter86_bg@yahoo.com> >>>> >> >>> >> >>>> *Cc:* Ovirt Users <users@ovirt.org> >>>> >> >>> >> >>>> *Subject:* [ovirt-users] Re: ovirt-engine >unresponsive - >>>> >how >>>> >> >to >>>> >> >>> >> >rescue? >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> I've now shut down the VMs on one host and rebooted >it >>>> >but >>>> >> >the >>>> >> >>> >> >agent >>>> >> >>> >> >>>> service doesn't start. 
If I run 'hosted-engine >>>> >--vm-status' >>>> >> >I >>>> >> >>> >get: >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> The hosted engine configuration has not been >retrieved >>>> >from >>>> >> >>> >shared >>>> >> >>> >> >>>> storage. Please ensure that ovirt-ha-agent is >running and >>>> >> >the >>>> >> >>> >> >storage >>>> >> >>> >> >>>> server is reachable. >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> and indeed if I list the mounts under >>>> >/rhev/data-center/mnt, >>>> >> >>> >only >>>> >> >>> >> >one of >>>> >> >>> >> >>>> the directories is mounted. I have 3 NFS mounts, >one ISO >>>> >> >Domain >>>> >> >>> >> >and two >>>> >> >>> >> >>>> Data Domains. Only one Data Domain has mounted and >this >>>> >has >>>> >> >>> >lots >>>> >> >>> >> >of .prob >>>> >> >>> >> >>>> files in. So why haven't the other NFS exports been >>>> >> >mounted? >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> Manually mounting them doesn't seem to have helped >much >>>> >> >either. >>>> >> >>> >I >>>> >> >>> >> >can >>>> >> >>> >> >>>> start the broker service but the agent service says >no. >>>> >> >Same >>>> >> >>> >error >>>> >> >>> >> >as the >>>> >> >>> >> >>>> one in my last email. >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> Shareef. >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> On Wed, Apr 8, 2020 at 9:57 PM Shareef Jalloq >>>> >> >>> >> ><shareef@jalloq.co.uk> >>>> >> >>> >> >>>> wrote: >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> Right, still down. I've run virsh and it doesn't >know >>>> >> >anything >>>> >> >>> >> >about >>>> >> >>> >> >>>> the engine vm. >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> I've restarted the broker and agent services and I >still >>>> >get >>>> >> >>> >> >nothing in >>>> >> >>> >> >>>> virsh->list. 
>>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> In the logs under /var/log/ovirt-hosted-engine-ha I >see >>>> >lots >>>> >> >of >>>> >> >>> >> >errors: >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> broker.log: >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>20:56:20,138::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) >>>> >> >>> >> >>>> ovirt-hosted-engine-ha broker 2.3.6 started >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>20:56:20,138::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >> >>>> Searching for submonitors in >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>20:56:20,138::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >> >>>> Loaded submonitor network >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>20:56:20,140::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >> >>>> Loaded submonitor cpu-load-no-engine >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>20:56:20,140::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >> >>>> Loaded submonitor mgmt-bridge >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >> >>>> Loaded submonitor network >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >> >>>> Loaded submonitor cpu-load >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >> >>>> Loaded submonitor engine-health >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >> >>>> Loaded submonitor mgmt-bridge >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> 
>>>>>20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >> >>>> Loaded submonitor cpu-load-no-engine >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >> >>>> Loaded submonitor cpu-load >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >> >>>> Loaded submonitor mem-free >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >> >>>> Loaded submonitor storage-domain >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >> >>>> Loaded submonitor storage-domain >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >> >>>> Loaded submonitor mem-free >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >> >>>> Loaded submonitor engine-health >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>20:56:20,143::monitor::50::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >> >>>> Finished loading submonitors >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>20:56:20,197::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect) >>>> >> >>> >> >>>> Connecting the storage >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>20:56:20,197::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) >>>> >> >>> >> >>>> Connecting storage server >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>20:56:20,414::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) >>>> >> >>> >> >>>> Connecting storage server >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> 
>>>> >>>> >>>>>20:56:20,628::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) >>>> >> >>> >> >>>> Refreshing the storage domain >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> MainThread::WARNING::2020-04-08 >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>20:56:21,057::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) >>>> >> >>> >> >>>> Can't connect vdsm storage: Command >StorageDomain.getInfo >>>> >> >with >>>> >> >>> >args >>>> >> >>> >> >>>> {'storagedomainID': >>>> >'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} >>>> >> >>> >failed: >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> (code=350, message=Error in storage domain action: >>>> >> >>> >> >>>> (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',)) >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>20:56:21,901::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) >>>> >> >>> >> >>>> ovirt-hosted-engine-ha broker 2.3.6 started >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>20:56:21,901::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >> >>>> Searching for submonitors in >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> agent.log: >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> MainThread::ERROR::2020-04-08 >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>20:57:00,799::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) >>>> >> >>> >> >>>> Trying to restart agent >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >>>> >> >>>> >>>> >>>>>20:57:00,799::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) >>>> >> >>> >> >>>> Agent shutting down >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >>>> >> >>>> >>>> >>>>>20:57:11,144::agent::67::ovirt_hosted_engine_ha.agent.agent.Agent::(run) >>>> >> >>> >> >>>> ovirt-hosted-engine-ha agent 2.3.6 started >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>20:57:11,182::hosted_engine::234::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) >>>> >> >>> >> >>>> Found certificate common name: >ovirt-node-01.phoelex.com >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>20:57:11,294::hosted_engine::543::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) >>>> >> >>> >> >>>> Initializing ha-broker connection >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> 
>>>>>20:57:11,296::brokerlink::80::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) >>>> >> >>> >> >>>> Starting monitor network, options {'tcp_t_address': >'', >>>> >> >>> >> >'network_test': >>>> >> >>> >> >>>> 'dns', 'tcp_t_port': '', 'addr': '192.168.1.99'} >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> MainThread::ERROR::2020-04-08 >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>20:57:11,296::hosted_engine::559::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) >>>> >> >>> >> >>>> Failed to start necessary monitors >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> MainThread::ERROR::2020-04-08 >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>20:57:11,297::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) >>>> >> >>> >> >>>> Traceback (most recent call last): >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> File >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", >>>> >> >>> >> >>>> line 131, in _run_agent >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> return action(he) >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> File >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", >>>> >> >>> >> >>>> line 55, in action_proper >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> return he.start_monitoring() >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> File >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", >>>> >> >>> >> >>>> line 432, in start_monitoring >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> self._initialize_broker() >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> File >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", >>>> >> >>> >> >>>> line 556, in _initialize_broker >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> m.get('options', {})) >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> File >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", >>>> >> >>> >> >>>> line 89, in start_monitor >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> ).format(t=type, o=options, e=e) >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> RequestError: brokerlink - failed to start monitor >via >>>> >> >>> >> >ovirt-ha-broker: >>>> >> >>> >> >>>> [Errno 2] No such file or directory, [monitor: >'network', >>>> >> >>> >options: >>>> >> >>> >> >>>> {'tcp_t_address': '', 'network_test': 'dns', >>>> >'tcp_t_port': >>>> >> >'', >>>> >> >>> >> >'addr': >>>> >> >>> >> >>>> '192.168.1.99'}] >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> MainThread::ERROR::2020-04-08 >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>> >>>>>20:57:11,297::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) >>>> >> >>> >> >>>> Trying to restart agent >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >>>> >> >>>> >>>> >>>>>20:57:11,297::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) >>>> 
>> >>> >> >>>> Agent shutting down >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> On Wed, Apr 8, 2020 at 6:10 PM Strahil Nikolov >>>> >> >>> >> ><hunter86_bg@yahoo.com> >>>> >> >>> >> >>>> wrote: >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> On April 8, 2020 7:47:20 PM GMT+03:00, "Maton, >Brett" < >>>> >> >>> >> >>>> matonb@ltresources.co.uk> wrote: >>>> >> >>> >> >>>> >On the host you tried to restart the engine on: >>>> >> >>> >> >>>> > >>>> >> >>> >> >>>> >Add an alias to virsh (authenticates with >>>> >virsh_auth.conf) >>>> >> >>> >> >>>> > >>>> >> >>> >> >>>> >alias virsh='virsh -c >>>> >> >>> >> >>>> >>>> >> >>> >>>> >>>>qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf' >>>> >> >>> >> >>>> > >>>> >> >>> >> >>>> >Then run virsh: >>>> >> >>> >> >>>> > >>>> >> >>> >> >>>> >virsh >>>> >> >>> >> >>>> > >>>> >> >>> >> >>>> >virsh # list >>>> >> >>> >> >>>> > Id Name State >>>> >> >>> >> >>>> >>---------------------------------------------------- >>>> >> >>> >> >>>> > xx HostedEngine Paused >>>> >> >>> >> >>>> > xx ********** running >>>> >> >>> >> >>>> > ... >>>> >> >>> >> >>>> > xx ********** running >>>> >> >>> >> >>>> > >>>> >> >>> >> >>>> >HostedEngine should be in the list, try and resume >the >>>> >> >engine: >>>> >> >>> >> >>>> > >>>> >> >>> >> >>>> >virsh # resume HostedEngine >>>> >> >>> >> >>>> > >>>> >> >>> >> >>>> >On Wed, 8 Apr 2020 at 17:28, Shareef Jalloq >>>> >> >>> ><shareef@jalloq.co.uk> >>>> >> >>> >> >>>> >wrote: >>>> >> >>> >> >>>> > >>>> >> >>> >> >>>> >> Thanks! >>>> >> >>> >> >>>> >> >>>> >> >>> >> >>>> >> The status hangs due to, I guess, the VM being >>>> >down.... >>>> >> >>> >> >>>> >> >>>> >> >>> >> >>>> >> [root@ovirt-node-01 ~]# hosted-engine --vm-start >>>> >> >>> >> >>>> >> VM exists and is down, cleaning up and restarting >>>> >> >>> >> >>>> >> VM in WaitForLaunch >>>> >> >>> >> >>>> >> >>>> >> >>> >> >>>> >> but this doesn't seem to do anything. OK, after >a >>>> >while >>>> >> >I >>>> >> >>> >get a >>>> >> >>> >> >>>> >status of >>>> >> >>> >> >>>> >> it being barfed... 
>>>> >> >>> >> >>>> >> >>>> >> >>> >> >>>> >> --== Host ovirt-node-00.phoelex.com (id: 1) >status >>>> >==-- >>>> >> >>> >> >>>> >> >>>> >> >>> >> >>>> >> conf_on_shared_storage : True >>>> >> >>> >> >>>> >> Status up-to-date : False >>>> >> >>> >> >>>> >> Hostname : >>>> >> >>> >ovirt-node-00.phoelex.com >>>> >> >>> >> >>>> >> Host ID : 1 >>>> >> >>> >> >>>> >> Engine status : unknown >>>> >stale-data >>>> >> >>> >> >>>> >> Score : 3400 >>>> >> >>> >> >>>> >> stopped : False >>>> >> >>> >> >>>> >> Local maintenance : False >>>> >> >>> >> >>>> >> crc32 : 9c4a034b >>>> >> >>> >> >>>> >> local_conf_timestamp : 523362 >>>> >> >>> >> >>>> >> Host timestamp : 523608 >>>> >> >>> >> >>>> >> Extra metadata (valid at timestamp): >>>> >> >>> >> >>>> >> metadata_parse_version=1 >>>> >> >>> >> >>>> >> metadata_feature_version=1 >>>> >> >>> >> >>>> >> timestamp=523608 (Wed Apr 8 16:17:11 2020) >>>> >> >>> >> >>>> >> host-id=1 >>>> >> >>> >> >>>> >> score=3400 >>>> >> >>> >> >>>> >> vm_conf_refresh_time=523362 (Wed Apr 8 16:13:06 >2020) >>>> >> >>> >> >>>> >> conf_on_shared_storage=True >>>> >> >>> >> >>>> >> maintenance=False >>>> >> >>> >> >>>> >> state=EngineDown >>>> >> >>> >> >>>> >> stopped=False >>>> >> >>> >> >>>> >> >>>> >> >>> >> >>>> >> >>>> >> >>> >> >>>> >> --== Host ovirt-node-01.phoelex.com (id: 2) >status >>>> >==-- >>>> >> >>> >> >>>> >> >>>> >> >>> >> >>>> >> conf_on_shared_storage : True >>>> >> >>> >> >>>> >> Status up-to-date : True >>>> >> >>> >> >>>> >> Hostname : >>>> >> >>> >ovirt-node-01.phoelex.com >>>> >> >>> >> >>>> >> Host ID : 2 >>>> >> >>> >> >>>> >> Engine status : {"reason": >"bad >>>> >vm >>>> >> >>> >status", >>>> >> >>> >> >>>> >"health": >>>> >> >>> >> >>>> >> "bad", "vm": "down_unexpected", "detail": "Down"} >>>> >> >>> >> >>>> >> Score : 0 >>>> >> >>> >> >>>> >> stopped : False >>>> >> >>> >> >>>> >> Local maintenance : False >>>> >> >>> >> >>>> >> crc32 : 5045f2eb >>>> >> >>> >> >>>> >> local_conf_timestamp : 1737037 >>>> >> >>> >> >>>> >> Host timestamp : 1737283 >>>> >> >>> >> >>>> >> Extra metadata (valid at timestamp): >>>> >> >>> >> >>>> >> metadata_parse_version=1 >>>> >> >>> >> >>>> >> metadata_feature_version=1 >>>> >> >>> >> >>>> >> timestamp=1737283 (Wed Apr 8 16:16:17 2020) >>>> >> >>> >> >>>> >> host-id=2 >>>> >> >>> >> >>>> >> score=0 >>>> >> >>> >> >>>> >> vm_conf_refresh_time=1737037 (Wed Apr 8 16:12:11 >>>> >2020) >>>> >> >>> >> >>>> >> conf_on_shared_storage=True >>>> >> >>> >> >>>> >> maintenance=False >>>> >> >>> >> >>>> >> state=EngineUnexpectedlyDown >>>> >> >>> >> >>>> >> stopped=False >>>> >> >>> >> >>>> >> >>>> >> >>> >> >>>> >> On Wed, Apr 8, 2020 at 5:09 PM Maton, Brett >>>> >> >>> >> >>>> ><matonb@ltresources.co.uk> >>>> >> >>> >> >>>> >> wrote: >>>> >> >>> >> >>>> >> >>>> >> >>> >> >>>> >>> First steps, on one of your hosts as root: >>>> >> >>> >> >>>> >>> >>>> >> >>> >> >>>> >>> To get information: >>>> >> >>> >> >>>> >>> hosted-engine --vm-status >>>> >> >>> >> >>>> >>> >>>> >> >>> >> >>>> >>> To start the engine: >>>> >> >>> >> >>>> >>> hosted-engine --vm-start >>>> >> >>> >> >>>> >>> >>>> >> >>> >> >>>> >>> >>>> >> >>> >> >>>> >>> On Wed, 8 Apr 2020 at 17:00, Shareef Jalloq >>>> >> >>> >> ><shareef@jalloq.co.uk> >>>> >> >>> >> >>>> >wrote: >>>> >> >>> >> >>>> >>> >>>> >> >>> >> >>>> >>>> So my engine has gone down and I can't ssh into >it >>>> >> >either. 
>>>> >> >>> >If >>>> >> >>> >> >I >>>> >> >>> >> >>>> >try to >>>> >> >>> >> >>>> >>>> log into the web-ui of the node it is running >on, I >>>> >get >>>> >> >>> >> >redirected >>>> >> >>> >> >>>> >because >>>> >> >>> >> >>>> >>>> the node can't reach the engine. >>>> >> >>> >> >>>> >>>> >>>> >> >>> >> >>>> >>>> What are my next steps? >>>> >> >>> >> >>>> >>>> >>>> >> >>> >> >>>> >>>> Shareef. >>>> >> >>> >> >>>> >>>> _______________________________________________ >>>> >> >>> >> >>>> >>>> Users mailing list -- users@ovirt.org >>>> >> >>> >> >>>> >>>> To unsubscribe send an email to >>>> >users-leave@ovirt.org >>>> >> >>> >> >>>> >>>> Privacy Statement: >>>> >> >>> >https://www.ovirt.org/privacy-policy.html >>>> >> >>> >> >>>> >>>> oVirt Code of Conduct: >>>> >> >>> >> >>>> >>>> >>>> >> >https://www.ovirt.org/community/about/community-guidelines/ >>>> >> >>> >> >>>> >>>> List Archives: >>>> >> >>> >> >>>> >>>> >>>> >> >>> >> >>>> > >>>> >> >>> >> >>>> >>>> >> >>> >> > >>>> >> >>> >> >>>> >> >>> > >>>> >> >>> >>>> >> > >>>> >> >>>> > >>>> >https://lists.ovirt.org/archives/list/users@ovirt.org/message/W7BP57OCIRSW5CDRQWR5MIKJUH3ISLCQ/ >>>> >> >>> >> >>>> >>>> >>>> >> >>> >> >>>> >>> >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> This has to be resolved: >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> Engine status : unknown >stale-data >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> Run again 'hosted-engine --vm-status'. If it remains >the >>>> >> >same, >>>> >> >>> >> >restart >>>> >> >>> >> >>>> ovirt-ha-broker.service & ovirt-ha-agent.service >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> Verify that the engine's storage is available. Then >>>> >monitor >>>> >> >the >>>> >> >>> >> >broker >>>> >> >>> >> >>>> & agent logs in /var/log/ovirt-hosted-engine-ha >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> Best Regards, >>>> >> >>> >> >>>> Strahil Nikolov >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >>>> >> >>> >> >>>> >> >>> >> Hi Shareef, >>>> >> >>> >> >>>> >> >>> >> The flow of activation oVirt is more complex than a plain >KVM. >>>> >> >>> >> Mounting of the domains happen during the activation of >the >>>> >node >>>> >> >( >>>> >> >>> >the >>>> >> >>> >> HostedEngine is activating everything needed). >>>> >> >>> >> >>>> >> >>> >> Focus on the HostedEngine VM. >>>> >> >>> >> Is it running properly ? >>>> >> >>> >> >>>> >> >>> >> If not,try: >>>> >> >>> >> 1. Verify that the storage domain exists >>>> >> >>> >> 2. Check if it has 'ha_agents' directory >>>> >> >>> >> 3. Check if the links are OK, if not you can safely >remove >>>> >the >>>> >> >links >>>> >> >>> >> >>>> >> >>> >> 4. Next check the services are running: >>>> >> >>> >> A) sanlock >>>> >> >>> >> B) supervdsmd >>>> >> >>> >> C) vdsmd >>>> >> >>> >> D) libvirtd >>>> >> >>> >> >>>> >> >>> >> 5. Increase the log level for broker and agent services: >>>> >> >>> >> >>>> >> >>> >> cd /etc/ovirt-hosted-engine-ha >>>> >> >>> >> vim *-log.conf >>>> >> >>> >> >>>> >> >>> >> systemctl restart ovirt-ha-broker ovirt-ha-agent >>>> >> >>> >> >>>> >> >>> >> 6. Check what they are complaining about >>>> >> >>> >> Keep in mind that agent will keep throwing errors untill >the >>>> >> >broker >>>> >> >>> >stops >>>> >> >>> >> doing it (agent depends on broker), so broker must be >OK >>>> >before >>>> >> >>> >> peoceeding with the agent log. >>>> >> >>> >> >>>> >> >>> >> About the manual VM start, you need 2 things: >>>> >> >>> >> >>>> >> >>> >> 1. 
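To make that second step concrete, here is a rough sketch of pulling the HostedEngine XML out of the vdsm log and starting it by hand. The log path, the sed range, and the output file are assumptions for this setup, and vdsm dumps a domain XML every time any VM starts, so check the extracted file by eye before defining it:

# find the last place vdsm dumped a domain definition
grep -n '<domain ' /var/log/vdsm/vdsm.log | tail -1
# print every dumped <domain>...</domain> block; keep only the HostedEngine one
sed -n '/<domain /,/<\/domain>/p' /var/log/vdsm/vdsm.log > /root/he.xml
# define and start it with the hosted-engine virsh credentials
virsh -c 'qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf' define /root/he.xml
virsh -c 'qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf' start HostedEngine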
Hey Shareef,

Check if there are any files or folders not owned by vdsm:kvm. Something like this:

find . -not -user 36 -not -group 36 -print

Also check if vdsm can access the images in the '<vol-mount-point>/images' directories.

Best Regards,
Strahil Nikolov

And the IPv6 address '64:ff9b::c0a8:13d'? I don't see it in the log output.

Best Regards,
Strahil Nikolov

Based on your output, you got a PTR record for IPv4 & IPv6 ... most probably it's the reason.

Set the IPv6 on the interface and try again.

Best Regards,
Strahil Nikolov

Do you have firewalld up and running on the host?

Best Regards,
Strahil Nikolov

Yes, but there are no zones set up, just ports 22, 6801 and 6900.

On Wed, Apr 15, 2020 at 12:37 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
On April 15, 2020 2:28:05 PM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:
Oh this is painful. It seems to progress if you both set he_force_ipv4 and run the deployment with the '--4' switch.
But then I get a failure when the ansible script checks for active firewalld zones and doesn't get anything back. Should the deployment flow not be setting up any zones it needs?
2020-04-15 10:57:25,439+0000 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 TASK [ovirt.hosted_engine_setup : Get active list of active firewalld zones]
2020-04-15 10:57:26,641+0000 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:103 {u'stderr_lines': [], u'changed': True, u'end': u'2020-04-15 10:57:26.481202', u'_ansible_no_log': False, u'stdout': u'', u'cmd': u'set -euo pipefail && firewall-cmd --get-active-zones | grep -v "^\\s*interfaces"', u'start': u'2020-04-15 10:57:26.050203', u'delta': u'0:00:00.430999', u'stderr': u'', u'rc': 1, u'invocation': {u'module_args': {u'creates': None, u'executable': None, u'_uses_shell': True, u'strip_empty_ends': True, u'_raw_params': u'set -euo pipefail && firewall-cmd --get-active-zones | grep -v "^\\s*interfaces"', u'removes': None, u'argv': None, u'warn': True, u'chdir': None, u'stdin_add_newline': True, u'stdin': None}}, u'stdout_lines': [], u'msg': u'non-zero return code'}
2020-04-15 10:57:26,741+0000 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:107 fatal: [localhost]: FAILED! => {"changed": true, "cmd": "set -euo pipefail && firewall-cmd --get-active-zones | grep -v \"^\\s*interfaces\"", "delta": "0:00:00.430999", "end": "2020-04-15 10:57:26.481202", "msg": "non-zero return code", "rc": 1, "start": "2020-04-15 10:57:26.050203", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
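For anyone else who hits this: 'firewall-cmd --get-active-zones' prints nothing when no interface or source is bound to any zone, so the 'grep -v' pipeline in that task exits non-zero. A minimal sketch of one way to give the host an active zone again; the zone and interface names here are assumptions for this setup:

firewall-cmd --state                  # confirm firewalld itself is running
firewall-cmd --get-active-zones      # empty output reproduces the failure
firewall-cmd --permanent --zone=public --add-interface=ovirtmgmt
firewall-cmd --reload
firewall-cmd --get-active-zones      # should now print 'public' with ovirtmgmt under it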
On Wed, Apr 15, 2020 at 10:23 AM Shareef Jalloq <shareef@jalloq.co.uk> wrote:
Ha, spoke too soon. It's now stuck in a loop, and a Google search points me at https://bugzilla.redhat.com/show_bug.cgi?id=1746585
However, forcing ipv4 doesn't seem to have fixed the loop.
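A quick way to see what the installer's resolver sees (getResolvedAddresses is effectively a getaddrinfo() lookup), and whether a DNS64 resolver is still synthesising the AAAA answer; the exact commands are only a suggestion:

getent ahosts ovirt-node-00.phoelex.com    # every address the libc resolver returns
dig +short A ovirt-node-00.phoelex.com
dig +short AAAA ovirt-node-00.phoelex.com  # an answer inside 64:ff9b::/96 points at DNS64
# blunt workaround, assuming static addressing is acceptable here:
# pin the name to the IPv4 address in /etc/hosts
echo '192.168.1.61 ovirt-node-00.phoelex.com' >> /etc/hosts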
On Wed, Apr 15, 2020 at 9:59 AM Shareef Jalloq <shareef@jalloq.co.uk> wrote:
OK, that seems to have fixed it, thanks. Is this a side effect of redeploying the HE over a first-time install? Nothing has changed in our setup and I didn't need to do this when I initially set up our nodes.
On Tue, Apr 14, 2020 at 6:55 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:

On April 14, 2020 6:17:17 PM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:

Hmmm, we're not using ipv6. Is that the issue?
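For the record, 64:ff9b::/96 is the well-known NAT64/DNS64 prefix, and the last 32 bits of 64:ff9b::c0a8:13d are just the IPv4 address in hex: c0.a8.01.3d = 192.168.1.61. So the AAAA answer is almost certainly being synthesised by a DNS64 resolver rather than coming from any IPv6 configuration on the host. Decoding it, assuming python3 is available on some machine:

python3 -c "print('.'.join(str(b) for b in bytes.fromhex('c0a8013d')))"
# 192.168.1.61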
On Tue, Apr 14, 2020 at 3:56 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
> On April 14, 2020 1:27:24 PM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:
> >Right, I've given up on recovering the HE so want to try and redeploy it.
> >There doesn't seem to be enough information to debug why the broker/agent won't start cleanly.
> >
> >In running 'hosted-engine --deploy', I'm seeing the following error in the setup validation phase:
> >
> >2020-04-14 09:46:08,922+0000 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND Please provide the hostname of this host on the management network [ovirt-node-00.phoelex.com]:
> >2020-04-14 09:46:12,831+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname.getResolvedAddresses:432 getResolvedAddresses: set(['64:ff9b::c0a8:13d', '192.168.1.61'])
> >2020-04-14 09:46:12,832+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname._validateFQDNresolvability:289 ovirt-node-00.phoelex.com resolves to: set(['64:ff9b::c0a8:13d', '192.168.1.61'])
> >2020-04-14 09:46:12,832+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:813 execute: ['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com', 'ANY'], executable='None', cwd='None', env=None
> >2020-04-14 09:46:12,871+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:863 execute-result: ['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com', 'ANY'], rc=0
> >2020-04-14 09:46:12,872+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:921 execute-output: ['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com', 'ANY'] stdout:
> >ovirt-node-00.phoelex.com. 86400 IN A 192.168.1.61
> >2020-04-14 09:46:12,872+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:926 execute-output: ['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com', 'ANY'] stderr:
> >2020-04-14 09:46:12,872+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:813 execute: ('/usr/sbin/ip', 'addr'), executable='None', cwd='None', env=None
> >2020-04-14 09:46:12,876+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:863 execute-result: ('/usr/sbin/ip', 'addr'), rc=0
> >2020-04-14 09:46:12,876+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:921 execute-output: ('/usr/sbin/ip', 'addr') stdout:
> >1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
> >    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> >    inet 127.0.0.1/8 scope host lo
> >       valid_lft forever preferred_lft forever
> >    inet6 ::1/128 scope host
> >       valid_lft forever preferred_lft forever
> >2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovirtmgmt state UP group default qlen 1000
> >    link/ether ac:1f:6b:bc:32:6a brd ff:ff:ff:ff:ff:ff
> >3: eno2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
> >    link/ether ac:1f:6b:bc:32:6b brd ff:ff:ff:ff:ff:ff
> >4: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
> >    link/ether 02:e6:e2:80:93:8d brd ff:ff:ff:ff:ff:ff
> >5: br-int: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
> >    link/ether 8a:26:44:50:ee:4a brd ff:ff:ff:ff:ff:ff
> >21: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
> >    link/ether ac:1f:6b:bc:32:6a brd ff:ff:ff:ff:ff:ff
> >    inet 192.168.1.61/24 brd 192.168.1.255 scope global ovirtmgmt
> >       valid_lft forever preferred_lft forever
> >    inet6 fe80::ae1f:6bff:febc:326a/64 scope link
> >       valid_lft forever preferred_lft forever
> >22: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
> >    link/ether 3a:02:7b:7d:b3:2a brd ff:ff:ff:ff:ff:ff
> >2020-04-14 09:46:12,876+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:926 execute-output: ('/usr/sbin/ip', 'addr') stderr:
> >2020-04-14 09:46:12,877+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname.getLocalAddresses:251 addresses: [u'192.168.1.61', u'fe80::ae1f:6bff:febc:326a']
> >2020-04-14 09:46:12,877+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname.test_hostname:464 test_hostname exception
> >Traceback (most recent call last):
> >  File "/usr/lib/python2.7/site-packages/ovirt_setup_lib/hostname.py", line 460, in test_hostname
> >    not_local_text,
> >  File "/usr/lib/python2.7/site-packages/ovirt_setup_lib/hostname.py", line 342, in _validateFQDNresolvability
> >    addresses=resolvedAddressesAsString
> >RuntimeError: ovirt-node-00.phoelex.com resolves to 64:ff9b::c0a8:13d 192.168.1.61 and not all of them can be mapped to non loopback devices on this host
> >2020-04-14 09:46:12,884+0000 ERROR otopi.plugins.gr_he_common.network.bridge dialog.queryEnvKey:120 Host name is not valid: ovirt-node-00.phoelex.com resolves to 64:ff9b::c0a8:13d 192.168.1.61 and not all of them can be mapped to non loopback devices on this host
> >
> >The node I'm running on has an IP address of .61 and resolves correctly.
> >
> >On Fri, Apr 10, 2020 at 12:55 PM Shareef Jalloq <shareef@jalloq.co.uk> wrote:
> >
> >> Where should I be checking if there are any files/folders not owned by vdsm:kvm? I checked on the mount the HA sits on and it's fine.
> >>
> >> How would I go about checking vdsm can access those images? If I run virsh, it lists them and they were running yesterday even though the HA was down. I've since restarted both hosts but the broker is still spitting out the same error (copied below). How do I find the reason the broker can't connect to the storage? The conf file is already at DEBUG verbosity:
> >>
> >> [handler_logfile]
> >> class=logging.handlers.TimedRotatingFileHandler
> >> args=('/var/log/ovirt-hosted-engine-ha/broker.log', 'd', 1, 7)
> >> level=DEBUG
> >> formatter=long
> >>
> >> And what are all these .prob-<num> files that are being created? There are over 250K of them now on the mount I'm using for the Data domain. They're all of 0 size and of the form,
> >> /rhev/data-center/mnt/nas-01.phoelex.com:_volume2_vmstore/.prob-ffa867da-93db-4211-82df-b1b04a625ab9
> >>
> >> @eevans: The volume I have the Data Domain on has TB's free. The HA is dead so I can't ssh in. No idea what started these errors, and the other VMs were still running happily although they're on a different Data Domain.
> >>
> >> Shareef.
> >>
> >> MainThread::INFO::2020-04-10 07:45:00,408::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect) Connecting the storage
> >> MainThread::INFO::2020-04-10 07:45:00,408::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
> >> MainThread::INFO::2020-04-10 07:45:01,577::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
> >> MainThread::INFO::2020-04-10 07:45:02,692::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Refreshing the storage domain
> >> MainThread::WARNING::2020-04-10 07:45:05,175::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed:
> >> (code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
> >>
> >> On Thu, Apr 9, 2020 at 5:58 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
> >>
> >>> On April 9, 2020 11:12:30 AM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:
> >>> >OK, let's go through this. I'm looking at the node that at least still has some VMs running. virsh also tells me that the HostedEngine VM is running but it's unresponsive and I can't shut it down.
> >>> >
> >>> >1. All storage domains exist and are mounted.
> >>> >2. The ha_agent exists:
> >>> >
> >>> >[root@ovirt-node-01 ovirt-hosted-engine-ha]# ls /rhev/data-center/mnt/nas-01.phoelex.com\:_volume2_vmstore/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/
> >>> >dom_md  ha_agent  images  master
> >>> >
> >>> >3. There are two links
> >>> >
> >>> >[root@ovirt-node-01 ovirt-hosted-engine-ha]# ll /rhev/data-center/mnt/nas-01.phoelex.com\:_volume2_vmstore/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/ha_agent/
> >>> >total 8
> >>> >lrwxrwxrwx. 1 vdsm kvm 132 Apr  2 14:50 hosted-engine.lockspace -> /var/run/vdsm/storage/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/ffb90b82-42fe-4253-85d5-aaec8c280aaf/90e68791-0c6f-406a-89ac-e0d86c631604
> >>> >lrwxrwxrwx. 1 vdsm kvm 132 Apr  2 14:50 hosted-engine.metadata -> /var/run/vdsm/storage/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/2161aed0-7250-4c1d-b667-ac94f60af17e/6b818e33-f80a-48cc-a59c-bba641e027d4
> >>> >
> >>> >4. The services exist but all seem to have some sort of warning:
> >>> >
> >>> >a) Apr 08 18:10:55 ovirt-node-01.phoelex.com sanlock[1728]: 2020-04-08 18:10:55 1744152 [36796]: s16 delta_renew long write time 10 sec
> >>> >b) Mar 23 18:02:59 ovirt-node-01.phoelex.com supervdsmd[29409]: failed to load module nvdimm: libbd_nvdimm.so.2: cannot open shared object file: No such file or directory
> >>> >c) Apr 09 08:05:13 ovirt-node-01.phoelex.com vdsm[4801]: ERROR failed to retrieve Hosted Engine HA score '[Errno 2] No such file or directory' Is the Hosted Engine setup finished?
> >>> >d) Apr 08 22:48:27 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-08 22:48:27.134+0000: 29309: warning : qemuGetProcessInfo:1404 : cannot parse process status data
> >>> >Apr 08 22:48:27 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-08 22:48:27.134+0000: 29309: error : virNetDevTapInterfaceStats:764 : internal error: /proc/net/dev: Interface not found
> >>> >Apr 08 23:09:39 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-08 23:09:39.844+0000: 29307: error : virNetSocketReadWire:1806 : End of file while reading data: Input/output error
> >>> >Apr 09 01:05:26 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-09 01:05:26.660+0000: 29307: error : virNetSocketReadWire:1806 : End of file while reading data: Input/output error
> >>> >
> >>> >5 & 6. The broker log is continually printing this error:
> >>> >
> >>> >MainThread::INFO::2020-04-09 08:07:31,438::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.3.6 started
> >>> >MainThread::DEBUG::2020-04-09 08:07:31,438::broker::55::ovirt_hosted_engine_ha.broker.broker.Broker::(run) Running broker
> >>> >MainThread::DEBUG::2020-04-09 08:07:31,438::broker::120::ovirt_hosted_engine_ha.broker.broker.Broker::(_get_monitor) Starting monitor
> >>> >MainThread::INFO::2020-04-09 08:07:31,438::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Searching for submonitors in /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors
> >>> >MainThread::INFO::2020-04-09 08:07:31,439::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
> >>> >MainThread::INFO::2020-04-09 08:07:31,440::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
> >>> >MainThread::INFO::2020-04-09 08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
> >>> >MainThread::INFO::2020-04-09 08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
> >>> >MainThread::INFO::2020-04-09 08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
> >>> >MainThread::INFO::2020-04-09 08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
> >>> >MainThread::INFO::2020-04-09 08:07:31,442::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
> >>> >MainThread::INFO::2020-04-09 08:07:31,442::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
> >>> >MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
> >>> >MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
> >>> >MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
> >>> >MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
> >>> >MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
> >>> >MainThread::INFO::2020-04-09 08:07:31,444::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
> >>> >MainThread::INFO::2020-04-09 08:07:31,444::monitor::50::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >>> >Finished loading submonitors > >>> > > >>> >MainThread::DEBUG::2020-04-09 > >>> > >>> > >
08:07:31,444::broker::128::ovirt_hosted_engine_ha.broker.broker.Broker::(_get_storage_broker)
> >>> >Starting storage broker > >>> > > >>> >MainThread::DEBUG::2020-04-09 > >>> > >>> > >
08:07:31,444::storage_backends::369::ovirt_hosted_engine_ha.lib.storage_backends::(connect)
> >>> >Connecting to VDSM > >>> > > >>> >MainThread::DEBUG::2020-04-09 > >>> > >>> > >
08:07:31,444::util::384::ovirt_hosted_engine_ha.lib.storage_backends::(__log_debug)
> >>> >Creating a new json-rpc connection to VDSM > >>> > > >>> >Client localhost:54321::DEBUG::2020-04-09 > >>> >08:07:31,453::concurrent::258::root::(run) START thread > ><Thread(Client > >>> >localhost:54321, started daemon 139992488138496)> (func=<bound > >method > >>> >Reactor.process_requests of <yajsonrpc.betterAsyncore.Reactor > >object at > >>> >0x7f528acabc90>>, args=(), kwargs={}) > >>> > > >>> >Client localhost:54321::DEBUG::2020-04-09 > >>> > >>> > >
08:07:31,459::stompclient::138::yajsonrpc.protocols.stomp.AsyncClient::(_process_connected)
> >>> >Stomp connection established > >>> > > >>> >MainThread::DEBUG::2020-04-09 > >>> 08:07:31,467::stompclient::294::jsonrpc.AsyncoreClient::(send) > >Sending > >>> >response > >>> > > >>> >MainThread::INFO::2020-04-09 > >>> > >>> > >
08:07:31,530::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect)
> >>> >Connecting the storage > >>> > > >>> >MainThread::INFO::2020-04-09 > >>> > >>> > >
08:07:31,531::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
> >>> >Connecting storage server > >>> > > >>> >MainThread::DEBUG::2020-04-09 > >>> 08:07:31,531::stompclient::294::jsonrpc.AsyncoreClient::(send) > >Sending > >>> >response > >>> > > >>> >MainThread::DEBUG::2020-04-09 > >>> 08:07:31,534::stompclient::294::jsonrpc.AsyncoreClient::(send) > >Sending > >>> >response > >>> > > >>> >MainThread::DEBUG::2020-04-09 > >>> > >>> > >
08:07:32,199::storage_server::158::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(_validate_pre_connected_path)
> >>> >Storage domain a6cea67d-dbfb-45cf-a775-b4d0d47b26f2 is not > >available > >>> > > >>> >MainThread::INFO::2020-04-09 > >>> > >>> > >
08:07:32,199::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
> >>> >Connecting storage server > >>> > > >>> >MainThread::DEBUG::2020-04-09 > >>> 08:07:32,199::stompclient::294::jsonrpc.AsyncoreClient::(send) > >Sending > >>> >response > >>> > > >>> >MainThread::DEBUG::2020-04-09 > >>> > >>> > >
08:07:32,814::storage_server::363::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
> >>> >[{u'status': 0, u'id': u'e29cf818-5ee5-46e1-85c1-8aeefa33e95d'}] > >>> > > >>> >MainThread::INFO::2020-04-09 > >>> > >>> > >
08:07:32,814::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
> >>> >Refreshing the storage domain > >>> > > >>> >MainThread::DEBUG::2020-04-09 > >>> 08:07:32,815::stompclient::294::jsonrpc.AsyncoreClient::(send) > >Sending > >>> >response > >>> > > >>> >MainThread::DEBUG::2020-04-09 > >>> > >>> > >
08:07:33,129::storage_server::420::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
> >>> >Error refreshing storage domain: Command StorageDomain.getStats > >with > >>> >args > >>> >{'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed: > >>> > > >>> >(code=350, message=Error in storage domain action: > >>> >(u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',)) > >>> > > >>> >MainThread::DEBUG::2020-04-09 > >>> 08:07:33,130::stompclient::294::jsonrpc.AsyncoreClient::(send) > >Sending > >>> >response > >>> > > >>> >MainThread::DEBUG::2020-04-09 > >>> > >>> > >
08:07:33,795::storage_backends::208::ovirt_hosted_engine_ha.lib.storage_backends::(_get_sector_size)
> >>> >Command StorageDomain.getInfo with args {'storagedomainID': > >>> >'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed: > >>> > > >>> >(code=350, message=Error in storage domain action: > >>> >(u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',)) > >>> > > >>> >MainThread::WARNING::2020-04-09 > >>> > >>> > >
> >>> >Can't connect vdsm storage: Command StorageDomain.getInfo with args > >>> >{'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed: > >>> > > >>> >(code=350, message=Error in storage domain action: > >>> >(u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',)) > >>> > > >>> > > >>> >The UUID it is moaning about is indeed the one that the HA sits on > >and > >>> >is > >>> >the one I listed the contents of in step 2 above. > >>> > > >>> > > >>> >So why can't it see this domain? > >>> > > >>> > > >>> >Thanks, Shareef. > >>> > > >>> >On Thu, Apr 9, 2020 at 6:12 AM Strahil Nikolov > ><hunter86_bg@yahoo.com> > >>> >wrote: > >>> > > >>> >> On April 9, 2020 1:51:05 AM GMT+03:00, Shareef Jalloq < > >>> >> shareef@jalloq.co.uk> wrote: > >>> >> >Don't know if this is useful or not, but I just tried to > >shutdown > >>> >and > >>> >> >start > >>> >> >another VM on one of the hosts and get the following error: > >>> >> > > >>> >> >virsh # start scratch > >>> >> > > >>> >> >error: Failed to start domain scratch > >>> >> > > >>> >> >error: Network not found: no network with matching name > >>> >> >'vdsm-ovirtmgmt' > >>> >> > > >>> >> >Is this not referring to the interface name as the network is > >called > >>> >> >'ovirtmgnt'. > >>> >> > > >>> >> >On Wed, Apr 8, 2020 at 11:35 PM Shareef Jalloq > >>> ><shareef@jalloq.co.uk> > >>> >> >wrote: > >>> >> > > >>> >> >> Hmmm, virsh tells me the HE is running but it hasn't come up > >and > >>> >the > >>> >> >> agent.log is full of the same errors. > >>> >> >> > >>> >> >> On Wed, Apr 8, 2020 at 11:31 PM Shareef Jalloq > >>> ><shareef@jalloq.co.uk> > >>> >> >> wrote: > >>> >> >> > >>> >> >>> Ah hah! Ok, so I've managed to start it using virsh on the > >>> >second > >>> >> >host > >>> >> >>> but my first host is still dead. > >>> >> >>> > >>> >> >>> First of all, what are these 56,317 .prob- files that get > >dumped > >>> >to > >>> >> >the > >>> >> >>> NFS mounts? > >>> >> >>> > >>> >> >>> Secondly, why doesn't the node mount the NFS
at > >boot? > >>> >> >Is > >>> >> >>> that the issue with this particular node? > >>> >> >>> > >>> >> >>> On Wed, Apr 8, 2020 at 11:12 PM <eevans@digitaldatatechs.com> > >>> >wrote: > >>> >> >>> > >>> >> >>>> Did you try virsh list --inactive > >>> >> >>>> > >>> >> >>>> > >>> >> >>>> > >>> >> >>>> Eric Evans > >>> >> >>>> > >>> >> >>>> Digital Data Services LLC. > >>> >> >>>> > >>> >> >>>> 304.660.9080 > >>> >> >>>> > >>> >> >>>> > >>> >> >>>> > >>> >> >>>> *From:* Shareef Jalloq <shareef@jalloq.co.uk> > >>> >> >>>> *Sent:* Wednesday, April 8, 2020 5:58 PM > >>> >> >>>> *To:* Strahil Nikolov <hunter86_bg@yahoo.com> > >>> >> >>>> *Cc:* Ovirt Users <users@ovirt.org> > >>> >> >>>> *Subject:* [ovirt-users] Re: ovirt-engine unresponsive - how > >to > >>> >> >rescue? > >>> >> >>>> > >>> >> >>>> > >>> >> >>>> > >>> >> >>>> I've now shut down the VMs on one host and rebooted it but > >the > >>> >> >agent > >>> >> >>>> service doesn't start. If I run 'hosted-engine --vm-status' > >I > >>> >get: > >>> >> >>>> > >>> >> >>>> > >>> >> >>>> > >>> >> >>>> The hosted engine configuration has not been retrieved from > >>> >shared > >>> >> >>>> storage. Please ensure that ovirt-ha-agent is running and > >the > >>> >> >storage > >>> >> >>>> server is reachable. > >>> >> >>>> > >>> >> >>>> > >>> >> >>>> > >>> >> >>>> and indeed if I list the mounts under /rhev/data-center/mnt, > >>> >only > >>> >> >one of > >>> >> >>>> the directories is mounted. I have 3 NFS mounts, one ISO > >Domain > >>> >> >and two > >>> >> >>>> Data Domains. Only one Data Domain has mounted and
08:07:33,795::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) directories this
has > >>> >lots > >>> >> >of .prob > >>> >> >>>> files in. So why haven't the other NFS exports been > >mounted? > >>> >> >>>> > >>> >> >>>> > >>> >> >>>> > >>> >> >>>> Manually mounting them doesn't seem to have helped much > >either. > >>> >I > >>> >> >can > >>> >> >>>> start the broker service but the agent service says no. > >Same > >>> >error > >>> >> >as the > >>> >> >>>> one in my last email. > >>> >> >>>> > >>> >> >>>> > >>> >> >>>> > >>> >> >>>> Shareef. > >>> >> >>>> > >>> >> >>>> > >>> >> >>>> > >>> >> >>>> On Wed, Apr 8, 2020 at 9:57 PM Shareef Jalloq > >>> >> ><shareef@jalloq.co.uk> > >>> >> >>>> wrote: > >>> >> >>>> > >>> >> >>>> Right, still down. I've run virsh and it doesn't know > >anything > >>> >> >about > >>> >> >>>> the engine vm. > >>> >> >>>> > >>> >> >>>> > >>> >> >>>> > >>> >> >>>> I've restarted the broker and agent services and I still get > >>> >> >nothing in > >>> >> >>>> virsh->list. > >>> >> >>>> > >>> >> >>>> > >>> >> >>>> > >>> >> >>>> In the logs under /var/log/ovirt-hosted-engine-ha I see lots > >of > >>> >> >errors: > >>> >> >>>> > >>> >> >>>> > >>> >> >>>> > >>> >> >>>> broker.log: > >>> >> >>>> > >>> >> >>>> > >>> >> >>>> > >>> >> >>>> MainThread::INFO::2020-04-08 > >>> >> >>>> > >>> >> > >>> >> > >>> > >>> > >
20:56:20,138::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) > >>> >> >>>> ovirt-hosted-engine-ha broker 2.3.6 started > >>> >> >>>> > >>> >> >>>> MainThread::INFO::2020-04-08 > >>> >> >>>> > >>> >> > >>> >> > >>> > >>> > >
20:56:20,138::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) > >>> >> >>>> Searching for submonitors in > >>> >> >>>> > >>> >> > >>> > >>> > >
/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors > >>> >> >>>> > >>> >> >>>> MainThread::INFO::2020-04-08 > >>> >> >>>> > >>> >> > >>> >> > >>> > >>> > >
20:56:20,138::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) > >>> >> >>>> Loaded submonitor network > >>> >> >>>> > >>> >> >>>> MainThread::INFO::2020-04-08 > >>> >> >>>> > >>> >> > >>> >> > >>> > >>> > >
20:56:20,140::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) > >>> >> >>>> Loaded submonitor cpu-load-no-engine > >>> >> >>>> > >>> >> >>>> MainThread::INFO::2020-04-08 > >>> >> >>>> > >>> >> > >>> >> > >>> > >>> > >
20:56:20,140::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) > >>> >> >>>> Loaded submonitor mgmt-bridge > >>> >> >>>> > >>> >> >>>> MainThread::INFO::2020-04-08 > >>> >> >>>> > >>> >> > >>> >> > >>> > >>> > >
20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) > >>> >> >>>> Loaded submonitor network > >>> >> >>>> > >>> >> >>>> MainThread::INFO::2020-04-08 > >>> >> >>>> > >>> >> > >>> >> > >>> > >>> > >
20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) > >>> >> >>>> Loaded submonitor cpu-load > >>> >> >>>> > >>> >> >>>> MainThread::INFO::2020-04-08 > >>> >> >>>> > >>> >> > >>> >> > >>> > >>> > >
20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) > >>> >> >>>> Loaded submonitor engine-health > >>> >> >>>> > >>> >> >>>> MainThread::INFO::2020-04-08 > >>> >> >>>> > >>> >> > >>> >> > >>> > >>> > >
20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) > >>> >> >>>> Loaded submonitor mgmt-bridge > >>> >> >>>> > >>> >> >>>> MainThread::INFO::2020-04-08 > >>> >> >>>> > >>> >> > >>> >> > >>> > >>> > >
20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) > >>> >> >>>> Loaded submonitor cpu-load-no-engine > >>> >> >>>> > >>> >> >>>> MainThread::INFO::2020-04-08 > >>> >> >>>> > >>> >> > >>> >> > >>> > >>> > >
20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) > >>> >> >>>> Loaded submonitor cpu-load > >>> >> >>>> > >>> >> >>>> MainThread::INFO::2020-04-08 > >>> >> >>>> > >>> >> > >>> >> > >>> > >>> > >
20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) > >>> >> >>>> Loaded submonitor mem-free > >>> >> >>>> > >>> >> >>>> MainThread::INFO::2020-04-08 > >>> >> >>>> > >>> >> > >>> >> > >>> > >>> > >
20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) > >>> >> >>>> Loaded submonitor storage-domain > >>> >> >>>> > >>> >> >>>> MainThread::INFO::2020-04-08 > >>> >> >>>> > >>> >> > >>> >> > >>> > >>> > >
20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) > >>> >> >>>> Loaded submonitor storage-domain > >>> >> >>>> > >>> >> >>>> MainThread::INFO::2020-04-08 > >>> >> >>>> > >>> >> > >>> >> > >>> > >>> > >
20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) > >>> >> >>>> Loaded submonitor mem-free > >>> >> >>>> > >>> >> >>>> MainThread::INFO::2020-04-08 > >>> >> >>>> > >>> >> > >>> >> > >>> > >>> > >
20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) > >>> >> >>>> Loaded submonitor engine-health > >>> >> >>>> > >>> >> >>>> MainThread::INFO::2020-04-08 > >>> >> >>>> > >>> >> > >>> >> > >>> > >>> > >
20:56:20,143::monitor::50::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) > >>> >> >>>> Finished loading submonitors > >>> >> >>>> > >>> >> >>>> MainThread::INFO::2020-04-08 > >>> >> >>>> > >>> >> > >>> >> > >>> > >>> > >
20:56:20,197::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect) > >>> >> >>>> Connecting the storage > >>> >> >>>> > >>> >> >>>> MainThread::INFO::2020-04-08 > >>> >> >>>> > >>> >> > >>> >> > >>> > >>> > >
20:56:20,197::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) > >>> >> >>>> Connecting storage server > >>> >> >>>> > >>> >> >>>> MainThread::INFO::2020-04-08 > >>> >> >>>> > >>> >> > >>> >> > >>> > >>> > >
20:56:20,414::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) > >>> >> >>>> Connecting storage server > >>> >> >>>> > >>> >> >>>> MainThread::INFO::2020-04-08 > >>> >> >>>> > >>> >> > >>> >> > >>> > >>> > >
20:56:20,628::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) > >>> >> >>>> Refreshing the storage domain > >>> >> >>>> > >>> >> >>>> MainThread::WARNING::2020-04-08 > >>> >> >>>> > >>> >> > >>> >> > >>> > >>> > >
20:56:21,057::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) > >>> >> >>>> Can't connect vdsm storage: Command StorageDomain.getInfo > >with > >>> >args > >>> >> >>>> {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} > >>> >failed: > >>> >> >>>> > >>> >> >>>> (code=350, message=Error in storage domain action: > >>> >> >>>> (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',)) > >>> >> >>>> > >>> >> >>>> MainThread::INFO::2020-04-08 > >>> >> >>>> > >>> >> > >>> >> > >>> > >>> > >
20:56:21,901::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) > >>> >> >>>> ovirt-hosted-engine-ha broker 2.3.6 started > >>> >> >>>> > >>> >> >>>> MainThread::INFO::2020-04-08 > >>> >> >>>> > >>> >> > >>> >> > >>> > >>> > >
20:56:21,901::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) > >>> >> >>>> Searching for submonitors in > >>> >> >>>> > >>> >> > >>> > >>> > >
/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors > >>> >> >>>> > >>> >> >>>> > >>> >> >>>> > >>> >> >>>> agent.log: > >>> >> >>>> > >>> >> >>>> > >>> >> >>>> > >>> >> >>>> MainThread::ERROR::2020-04-08 > >>> >> >>>> > >>> >> > >>> >> > >>> > >>> > >
20:57:00,799::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) > >>> >> >>>> Trying to restart agent > >>> >> >>>> > >>> >> >>>> MainThread::INFO::2020-04-08 > >>> >> >>>> > >>> >> > >>> >
20:57:00,799::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) > >>> >> >>>> Agent shutting down > >>> >> >>>> > >>> >> >>>> MainThread::INFO::2020-04-08 > >>> >> >>>> > >>> >> > >>> >
20:57:11,144::agent::67::ovirt_hosted_engine_ha.agent.agent.Agent::(run) > >>> >> >>>> ovirt-hosted-engine-ha agent 2.3.6 started > >>> >> >>>> > >>> >> >>>> MainThread::INFO::2020-04-08 > >>> >> >>>> > >>> >> > >>> >> > >>> > >>> > >
20:57:11,182::hosted_engine::234::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) > >>> >> >>>> Found certificate common name: ovirt-node-01.phoelex.com > >>> >> >>>> > >>> >> >>>> MainThread::INFO::2020-04-08 > >>> >> >>>> > >>> >> > >>> >> > >>> > >>> > >
20:57:11,294::hosted_engine::543::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) > >>> >> >>>> Initializing ha-broker connection > >>> >> >>>> > >>> >> >>>> MainThread::INFO::2020-04-08 > >>> >> >>>> > >>> >> > >>> >> > >>> > >>> > >
20:57:11,296::brokerlink::80::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) > >>> >> >>>> Starting monitor network, options {'tcp_t_address': '', > >>> >> >'network_test': > >>> >> >>>> 'dns', 'tcp_t_port': '', 'addr': '192.168.1.99'} > >>> >> >>>> > >>> >> >>>> MainThread::ERROR::2020-04-08 > >>> >> >>>> > >>> >> > >>> >> > >>> > >>> > >
20:57:11,296::hosted_engine::559::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) > >>> >> >>>> Failed to start necessary monitors > >>> >> >>>> > >>> >> >>>> MainThread::ERROR::2020-04-08 > >>> >> >>>> > >>> >> > >>> >> > >>> > >>> > >
20:57:11,297::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) > >>> >> >>>> Traceback (most recent call last): > >>> >> >>>> > >>> >> >>>> File > >>> >> >>>> > >>> >> > >>> > >>> > >
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", > >>> >> >>>> line 131, in _run_agent > >>> >> >>>> > >>> >> >>>> return action(he) > >>> >> >>>> > >>> >> >>>> File > >>> >> >>>> > >>> >> > >>> > >>> > >
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", > >>> >> >>>> line 55, in action_proper > >>> >> >>>> > >>> >> >>>> return he.start_monitoring() > >>> >> >>>> > >>> >> >>>> File > >>> >> >>>> > >>> >> > >>> >> > >>> > >>> > >
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", > >>> >> >>>> line 432, in start_monitoring > >>> >> >>>> > >>> >> >>>> self._initialize_broker() > >>> >> >>>> > >>> >> >>>> File > >>> >> >>>> > >>> >> > >>> >> > >>> > >>> > >
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", > >>> >> >>>> line 556, in _initialize_broker > >>> >> >>>> > >>> >> >>>> m.get('options', {})) > >>> >> >>>> > >>> >> >>>> File > >>> >> >>>> > >>> >> > >>> >> > >>> > >>> > >
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", > >>> >> >>>> line 89, in start_monitor > >>> >> >>>> > >>> >> >>>> ).format(t=type, o=options, e=e) > >>> >> >>>> > >>> >> >>>> RequestError: brokerlink - failed to start monitor via > >>> >> >ovirt-ha-broker: > >>> >> >>>> [Errno 2] No such file or directory, [monitor: 'network', > >>> >options: > >>> >> >>>> {'tcp_t_address': '', 'network_test': 'dns', 'tcp_t_port': > >'', > >>> >> >'addr': > >>> >> >>>> '192.168.1.99'}] > >>> >> >>>> > >>> >> >>>> > >>> >> >>>> > >>> >> >>>> MainThread::ERROR::2020-04-08 > >>> >> >>>> > >>> >> > >>> >> > >>> > >>> > >
20:57:11,297::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) > >>> >> >>>> Trying to restart agent > >>> >> >>>> > >>> >> >>>> MainThread::INFO::2020-04-08 > >>> >> >>>> > >>> >> > >>> >
> >>> >> >>>> >> conf_on_shared_storage=True > >>> >> >>>> >> maintenance=False > >>> >> >>>> >> state=EngineDown > >>> >> >>>> >> stopped=False > >>> >> >>>> >> > >>> >> >>>> >> > >>> >> >>>> >> --== Host ovirt-node-01.phoelex.com (id: 2) status ==-- > >>> >> >>>> >> > >>> >> >>>> >> conf_on_shared_storage : True > >>> >> >>>> >> Status up-to-date : True > >>> >> >>>> >> Hostname : > >>> >ovirt-node-01.phoelex.com > >>> >> >>>> >> Host ID : 2 > >>> >> >>>> >> Engine status : {"reason": "bad vm > >>> >status", > >>> >> >>>> >"health": > >>> >> >>>> >> "bad", "vm": "down_unexpected", "detail": "Down"} > >>> >> >>>> >> Score : 0 > >>> >> >>>> >> stopped : False > >>> >> >>>> >> Local maintenance : False > >>> >> >>>> >> crc32 : 5045f2eb > >>> >> >>>> >> local_conf_timestamp : 1737037 > >>> >> >>>> >> Host timestamp : 1737283 > >>> >> >>>> >> Extra metadata (valid at timestamp): > >>> >> >>>> >> metadata_parse_version=1 > >>> >> >>>> >> metadata_feature_version=1 > >>> >> >>>> >> timestamp=1737283 (Wed Apr 8 16:16:17 2020) > >>> >> >>>> >> host-id=2 > >>> >> >>>> >> score=0 > >>> >> >>>> >> vm_conf_refresh_time=1737037 (Wed Apr 8 16:12:11 2020) > >>> >> >>>> >> conf_on_shared_storage=True > >>> >> >>>> >> maintenance=False > >>> >> >>>> >> state=EngineUnexpectedlyDown > >>> >> >>>> >> stopped=False > >>> >> >>>> >> > >>> >> >>>> >> On Wed, Apr 8, 2020 at 5:09 PM Maton, Brett > >>> >> >>>> ><matonb@ltresources.co.uk> > >>> >> >>>> >> wrote: > >>> >> >>>> >> > >>> >> >>>> >>> First steps, on one of your hosts as root: > >>> >> >>>> >>> > >>> >> >>>> >>> To get information: > >>> >> >>>> >>> hosted-engine --vm-status > >>> >> >>>> >>> > >>> >> >>>> >>> To start the engine: > >>> >> >>>> >>> hosted-engine --vm-start > >>> >> >>>> >>> > >>> >> >>>> >>> > >>> >> >>>> >>> On Wed, 8 Apr 2020 at 17:00, Shareef Jalloq > >>> >> ><shareef@jalloq.co.uk> > >>> >> >>>> >wrote: > >>> >> >>>> >>> > >>> >> >>>> >>>> So my engine has gone down and I can't ssh into it > >either. > >>> >If > >>> >> >I > >>> >> >>>> >try to > >>> >> >>>> >>>> log into the web-ui of the node it is running on, I get > >>> >> >redirected > >>> >> >>>> >because > >>> >> >>>> >>>> the node can't reach the engine. > >>> >> >>>> >>>> > >>> >> >>>> >>>> What are my next steps? > >>> >> >>>> >>>> > >>> >> >>>> >>>> Shareef. > >>> >> >>>> >>>> _______________________________________________ > >>> >> >>>> >>>> Users mailing list -- users@ovirt.org > >>> >> >>>> >>>> To unsubscribe send an email to users-leave@ovirt.org > >>> >> >>>> >>>> Privacy Statement: > >>> >https://www.ovirt.org/privacy-policy.html > >>> >> >>>> >>>> oVirt Code of Conduct: > >>> >> >>>> >>>> > >https://www.ovirt.org/community/about/community-guidelines/ > >>> >> >>>> >>>> List Archives: > >>> >> >>>> >>>> > >>> >> >>>> > > >>> >> >>>> > >>> >> > > >>> >> > >>> > > >>> > > >
20:57:11,297::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) > >>> >> >>>> Agent shutting down > >>> >> >>>> > >>> >> >>>> > >>> >> >>>> > >>> >> >>>> On Wed, Apr 8, 2020 at 6:10 PM Strahil Nikolov > >>> >> ><hunter86_bg@yahoo.com> > >>> >> >>>> wrote: > >>> >> >>>> > >>> >> >>>> On April 8, 2020 7:47:20 PM GMT+03:00, "Maton, Brett" < > >>> >> >>>> matonb@ltresources.co.uk> wrote: > >>> >> >>>> >On the host you tried to restart the engine on: > >>> >> >>>> > > >>> >> >>>> >Add an alias to virsh (authenticates with virsh_auth.conf) > >>> >> >>>> > > >>> >> >>>> >alias virsh='virsh -c > >>> >> >>>> > >>>
qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf'
> >>> >> >>>> > > >>> >> >>>> >Then run virsh: > >>> >> >>>> > > >>> >> >>>> >virsh > >>> >> >>>> > > >>> >> >>>> >virsh # list > >>> >> >>>> > Id Name State > >>> >> >>>>
> >>> >> >>>> > xx HostedEngine Paused > >>> >> >>>> > xx ********** running > >>> >> >>>> > ... > >>> >> >>>> > xx ********** running > >>> >> >>>> > > >>> >> >>>> >HostedEngine should be in the list, try and resume the > >engine: > >>> >> >>>> > > >>> >> >>>> >virsh # resume HostedEngine > >>> >> >>>> > > >>> >> >>>> >On Wed, 8 Apr 2020 at 17:28, Shareef Jalloq > >>> ><shareef@jalloq.co.uk> > >>> >> >>>> >wrote: > >>> >> >>>> > > >>> >> >>>> >> Thanks! > >>> >> >>>> >> > >>> >> >>>> >> The status hangs due to, I guess, the VM being down.... > >>> >> >>>> >> > >>> >> >>>> >> [root@ovirt-node-01 ~]# hosted-engine --vm-start > >>> >> >>>> >> VM exists and is down, cleaning up and restarting > >>> >> >>>> >> VM in WaitForLaunch > >>> >> >>>> >> > >>> >> >>>> >> but this doesn't seem to do anything. OK, after a while > >I > >>> >get a > >>> >> >>>> >status of > >>> >> >>>> >> it being barfed... > >>> >> >>>> >> > >>> >> >>>> >> --== Host ovirt-node-00.phoelex.com (id: 1) status ==-- > >>> >> >>>> >> > >>> >> >>>> >> conf_on_shared_storage : True > >>> >> >>>> >> Status up-to-date : False > >>> >> >>>> >> Hostname : > >>> >ovirt-node-00.phoelex.com > >>> >> >>>> >> Host ID : 1 > >>> >> >>>> >> Engine status : unknown stale-data > >>> >> >>>> >> Score : 3400 > >>> >> >>>> >> stopped : False > >>> >> >>>> >> Local maintenance : False > >>> >> >>>> >> crc32 : 9c4a034b > >>> >> >>>> >> local_conf_timestamp : 523362 > >>> >> >>>> >> Host timestamp : 523608 > >>> >> >>>> >> Extra metadata (valid at timestamp): > >>> >> >>>> >> metadata_parse_version=1 > >>> >> >>>> >> metadata_feature_version=1 > >>> >> >>>> >> timestamp=523608 (Wed Apr 8 16:17:11 2020) > >>> >> >>>> >> host-id=1 > >>> >> >>>> >> score=3400 > >>> >> >>>> >> vm_conf_refresh_time=523362 (Wed Apr 8 16:13:06
https://lists.ovirt.org/archives/list/users@ovirt.org/message/W7BP57OCIRSW5C...
> >>> >> >>>> >>>> > >>> >> >>>> >>> > >>> >> >>>> > >>> >> >>>> This has to be resolved: > >>> >> >>>> > >>> >> >>>> Engine status : unknown stale-data > >>> >> >>>> > >>> >> >>>> Run again 'hosted-engine --vm-status'. If it remains the > >same, > >>> >> >restart > >>> >> >>>> ovirt-ha-broker.service & ovirt-ha-agent.service > >>> >> >>>> > >>> >> >>>> Verify that the engine's storage is available. Then monitor > >the > >>> >> >broker > >>> >> >>>> & agent logs in /var/log/ovirt-hosted-engine-ha > >>> >> >>>> > >>> >> >>>> Best Regards, > >>> >> >>>> Strahil Nikolov > >>> >> >>>> > >>> >> >>>> > >>> >> >>>> > >>> >> >>>> > >>> >> > >>> >> Hi Shareef, > >>> >> > >>> >> The flow of activation oVirt is more complex than a plain KVM. > >>> >> Mounting of the domains happen during the activation of the node > >( > >>> >the > >>> >> HostedEngine is activating everything needed). > >>> >> > >>> >> Focus on the HostedEngine VM. > >>> >> Is it running properly ? > >>> >> > >>> >> If not,try: > >>> >> 1. Verify that the storage domain exists > >>> >> 2. Check if it has 'ha_agents' directory > >>> >> 3. Check if the links are OK, if not you can safely remove the > >links > >>> >> > >>> >> 4. Next check the services are running: > >>> >> A) sanlock > >>> >> B) supervdsmd > >>> >> C) vdsmd > >>> >> D) libvirtd > >>> >> > >>> >> 5. Increase the log level for broker and agent services: > >>> >> > >>> >> cd /etc/ovirt-hosted-engine-ha > >>> >> vim *-log.conf > >>> >> > >>> >> systemctl restart ovirt-ha-broker ovirt-ha-agent > >>> >> > >>> >> 6. Check what they are complaining about > >>> >> Keep in mind that agent will keep throwing errors untill the > >broker > >>> >stops > >>> >> doing it (agent depends on broker), so broker must be OK before > >>> >> peoceeding with the agent log. > >>> >> > >>> >> About the manual VM start, you need 2 things: > >>> >> > >>> >> 1. Define the VM network > >>> >> # cat vdsm-ovirtmgmt.xml <network> > >>> >> <name>vdsm-ovirtmgmt</name> > >>> >> <uuid>8ded486e-e681-4754-af4b-5737c2b05405</uuid> > >>> >> <forward mode='bridge'/> > >>> >> <bridge name='ovirtmgmt'/> > >>> >> </network> > >>> >> > >>> >> [root@ovirt1 HostedEngine-RECOVERY]# virsh define > >vdsm-ovirtmgmt.xml > >>> >> > >>> >> 2. Get an xml definition which can be found in the vdsm log. > >Every VM > >>> >at > >>> >> start up has it's configuration printed out in vdsm log on the > >host > >>> >it > >>> >> starts. > >>> >> Save to file and then: > >>> >> A) virsh define myvm.xml > >>> >> B) virsh start myvm > >>> >> > >>> >> It seems there is/was a problem with your NFS shares. > >>> >> > >>> >> > >>> >> Best Regards, > >>> >> Strahil Nikolov > >>> >> > >>> > >>> Hey Shareef, > >>> > >>> Check if there are any files or folders not owned by vdsm:kvm . > >Something > >>> like this: > >>> > >>> find . -not -user 36 -not -group 36 -print > >>> > >>> Also check if vdsm can access the images in the > >>> '<vol-mount-point>/images' directories. > >>> > >>> Best Regards, > >>> Strahil Nikolov > >>> > >> > > And the IPv6 address '64:ff9b::c0a8:13d' ? > > I don't see in the log output. > > Best Regards, > Strahil Nikolov >
Based on your output, you got a DNS record for both IPv4 & IPv6 ... most probably that's the reason.
Set the IPv6 address on the interface and try again.
Best Regards, Strahil Nikolov
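For context on that address: 64:ff9b::/96 is the well-known DNS64/NAT64 prefix, and the low 32 bits here (c0a8:13d) are just 192.168.1.61 in hex, so the AAAA answer is being synthesized by the resolver rather than configured in any zone file. A quick way to confirm from the host (hostname and bridge name taken from earlier in this thread):

  dig +short ovirt-node-00.phoelex.com A      # expect 192.168.1.61
  dig +short ovirt-node-00.phoelex.com AAAA   # 64:ff9b::c0a8:13d -> DNS64-synthesized
  ip -6 addr show dev ovirtmgmt               # only a link-local fe80:: address, so the AAAA has no local match

The deploy validation requires every address the hostname resolves to to map onto a non-loopback local interface, which is exactly the RuntimeError seen in the setup log: either give ovirtmgmt a matching global IPv6 address, as suggested above, or point the host at a resolver that doesn't synthesize AAAA records.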
Do you have firewalld up and running on the host?
Best Regards, Strahil Nikolov

On April 15, 2020 2:40:52 PM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:
Yes, but there are no zones set up, just ports 22, 6801 and 6900.
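Open ports with no active zones suggests those rules came from somewhere other than firewalld's zone configuration. A few commands to see what firewalld itself believes (a sketch; 'public' is only the usual default zone name):

  firewall-cmd --state                 # 'running' or 'not running'
  firewall-cmd --get-default-zone
  firewall-cmd --get-active-zones      # empty output here is exactly what the deploy check trips over
  firewall-cmd --list-all              # interfaces, ports and services in the default zone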
On Wed, Apr 15, 2020 at 12:37 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
Oh, this is painful. It seems to progress if you both set he_force_ipv4 and run the deployment with the '--4' switch.
But then I get a failure when the ansible script checks for active firewalld zones and doesn't get anything back. Should the deployment flow not be setting up any zones it needs?
2020-04-15 10:57:25,439+0000 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 TASK [ovirt.hosted_engine_setup : Get active list of active firewalld zones]
2020-04-15 10:57:26,641+0000 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:103 {u'stderr_lines': [], u'changed': True, u'end': u'2020-04-15 10:57:26.481202', u'_ansible_no_log': False, u'stdout': u'', u'cmd': u'set -euo pipefail && firewall-cmd --get-active-zones | grep -v "^\\s*interfaces"', u'start': u'2020-04-15 10:57:26.050203', u'delta': u'0:00:00.430999', u'stderr': u'', u'rc': 1, u'invocation': {u'module_args': {u'creates': None, u'executable': None, u'_uses_shell': True, u'strip_empty_ends': True, u'_raw_params': u'set -euo pipefail && firewall-cmd --get-active-zones | grep -v "^\\s*interfaces"', u'removes': None, u'argv': None, u'warn': True, u'chdir': None, u'stdin_add_newline': True, u'stdin': None}}, u'stdout_lines': [], u'msg': u'non-zero return code'}
2020-04-15 10:57:26,741+0000 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:107 fatal: [localhost]: FAILED! => {"changed": true, "cmd": "set -euo pipefail && firewall-cmd --get-active-zones | grep -v \"^\\s*interfaces\"", "delta": "0:00:00.430999", "end": "2020-04-15 10:57:26.481202", "msg": "non-zero return code", "rc": 1, "start": "2020-04-15 10:57:26.050203", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
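Note what the task actually runs: `set -euo pipefail && firewall-cmd --get-active-zones | grep -v "^\s*interfaces"`. grep exits 1 when it receives no input lines, so an empty active-zone list surfaces as this generic "non-zero return code" rather than as a firewalld error. The check expects at least one active zone to already exist; it does not create one. A minimal sketch to bind the management bridge to a zone, assuming ovirtmgmt and the stock 'public' zone (on a NetworkManager-managed interface the zone is better set on the connection itself):

  firewall-cmd --permanent --zone=public --change-interface=ovirtmgmt
  firewall-cmd --reload
  firewall-cmd --get-active-zones      # should now list 'public' with ovirtmgmt under it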
On Wed, Apr 15, 2020 at 10:23 AM Shareef Jalloq <shareef@jalloq.co.uk> wrote:
Ha, spoke too soon. It's now stuck in a loop and a Google search points me at https://bugzilla.redhat.com/show_bug.cgi?id=1746585
However, forcing ipv4 doesn't seem to have fixed the loop.
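For reference, this is how the two IPv4-forcing knobs described above combine (he_force_ipv4 is the name used in this thread; exactly where it is set depends on the setup version, so treat this as a sketch):

  # force the whole deploy flow onto IPv4, per the messages above
  hosted-engine --deploy --4

Per the report above, the switch on its own wasn't enough; the he_force_ipv4 setting had to be in place as well.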
On Wed, Apr 15, 2020 at 9:59 AM Shareef Jalloq <shareef@jalloq.co.uk> wrote:
OK, that seems to have fixed it, thanks. Is this a side effect of redeploying the HE over a first-time install? Nothing has changed in our setup and I didn't need to do this when I initially set up our nodes.
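In case it helps anyone replaying this thread: when a deploy runs on a host that already carried a hosted engine, leftover local configuration from the first install can make extra steps like this necessary. ovirt-hosted-engine-setup ships a cleanup helper for that case; a cautious sketch for the host being redeployed (it strips the host's local HE configuration, so read its prompts carefully on a host with surviving VMs):

  ovirt-hosted-engine-cleanup
  hosted-engine --deploy --4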
On Tue, Apr 14, 2020 at 6:55 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
On April 14, 2020 6:17:17 PM GMT+03:00, Shareef Jalloq < shareef@jalloq.co.uk> wrote: >Hmmm, we're not using ipv6. Is that the issue? > >On Tue, Apr 14, 2020 at 3:56 PM Strahil Nikolov <hunter86_bg@yahoo.com> >wrote: > >> On April 14, 2020 1:27:24 PM GMT+03:00, Shareef Jalloq < >> shareef@jalloq.co.uk> wrote: >> >Right, I've given up on recovering the HE so want to try and >redeploy >> >it. >> >There doesn't seem to be enough information to debug why the >> >broker/agent >> >won't start cleanly. >> > >> >In running 'hosted-engine --deploy', I'm seeing the following error >in >> >the >> >setup validation phase: >> > >> >2020-04-14 09:46:08,922+0000 DEBUG otopi.plugins.otopi.dialog.human >> >dialog.__logString:204 DIALOG:SEND Please provide >the >> >hostname of this host on the management network >> >[ovirt-node-00.phoelex.com]: >> > >> > >> >2020-04-14 09:46:12,831+0000 DEBUG >> >otopi.plugins.gr_he_common.network.bridge >> >hostname.getResolvedAddresses:432 >> >getResolvedAddresses: set(['64:ff9b::c0a8:13d', '192.168.1.61']) >> > >> >2020-04-14 09:46:12,832+0000 DEBUG >> >otopi.plugins.gr_he_common.network.bridge >> >hostname._validateFQDNresolvability:289 ovirt-node-00.phoelex.com >> >resolves >> >to: set(['64:ff9b::c0a8:13d', '192.168.1.61']) >> > >> >2020-04-14 09:46:12,832+0000 DEBUG >> >otopi.plugins.gr_he_common.network.bridge
>> >plugin.executeRaw:813 execute: >> >['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com', >> >'ANY'], >> >executable='None', cwd='None', env=None >> > >> >2020-04-14 09:46:12,871+0000 DEBUG >> >otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:863
>> >execute-result: ['/usr/bin/dig', '+noall', '+answer', ' >> >ovirt-node-00.phoelex.com', 'ANY'], rc=0 >> > >> >2020-04-14 09:46:12,872+0000 DEBUG >> >otopi.plugins.gr_he_common.network.bridge plugin.execute:921 >> >execute-output: ['/usr/bin/dig', '+noall', '+answer', ' >> >ovirt-node-00.phoelex.com', 'ANY'] stdout: >> > >> >ovirt-node-00.phoelex.com. 86400 IN A 192.168.1.61 >> > >> > >> >2020-04-14 09:46:12,872+0000 DEBUG >> >otopi.plugins.gr_he_common.network.bridge plugin.execute:926 >> >execute-output: ['/usr/bin/dig', '+noall', '+answer', ' >> >ovirt-node-00.phoelex.com', 'ANY'] stderr: >> > >> > >> > >> >2020-04-14 09:46:12,872+0000 DEBUG >> >otopi.plugins.gr_he_common.network.bridge
>> >plugin.executeRaw:813 execute: >> >('/usr/sbin/ip', 'addr'), executable='None', cwd='None', env=None >> > >> >2020-04-14 09:46:12,876+0000 DEBUG >> >otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:863
>> >execute-result: ('/usr/sbin/ip', 'addr'), rc=0 >> > >> >2020-04-14 09:46:12,876+0000 DEBUG >> >otopi.plugins.gr_he_common.network.bridge plugin.execute:921 >> >execute-output: ('/usr/sbin/ip', 'addr') stdout: >> > >> >1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN >> >group >> >default qlen 1000 >> > >> > link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 >> > >> > inet 127.0.0.1/8 scope host lo >> > >> > valid_lft forever preferred_lft forever >> > >> > inet6 ::1/128 scope host >> > >> > valid_lft forever preferred_lft forever >> > >> >2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master >> >ovirtmgmt state UP group default qlen 1000 >> > >> > link/ether ac:1f:6b:bc:32:6a brd ff:ff:ff:ff:ff:ff >> > >> >3: eno2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state >> >DOWN >> >group default qlen 1000 >> > >> > link/ether ac:1f:6b:bc:32:6b brd ff:ff:ff:ff:ff:ff >> > >> >4: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN >> >group >> >default qlen 1000 >> > >> > link/ether 02:e6:e2:80:93:8d brd ff:ff:ff:ff:ff:ff >> > >> >5: br-int: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN >group >> >default qlen 1000 >> > >> > link/ether 8a:26:44:50:ee:4a brd ff:ff:ff:ff:ff:ff >> > >> >21: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc >noqueue >> >state UP group default qlen 1000 >> > >> > link/ether ac:1f:6b:bc:32:6a brd ff:ff:ff:ff:ff:ff >> > >> > inet 192.168.1.61/24 brd 192.168.1.255 scope global ovirtmgmt >> > >> > valid_lft forever preferred_lft forever >> > >> > inet6 fe80::ae1f:6bff:febc:326a/64 scope link >> > >> > valid_lft forever preferred_lft forever >> > >> >22: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state >DOWN >> >group >> >default qlen 1000 >> > >> > link/ether 3a:02:7b:7d:b3:2a brd ff:ff:ff:ff:ff:ff >> > >> > >> >2020-04-14 09:46:12,876+0000 DEBUG >> >otopi.plugins.gr_he_common.network.bridge plugin.execute:926 >> >execute-output: ('/usr/sbin/ip', 'addr') stderr: >> > >> > >> > >> >2020-04-14 09:46:12,877+0000 DEBUG >> >otopi.plugins.gr_he_common.network.bridge >> >hostname.getLocalAddresses:251 >> >addresses: [u'192.168.1.61', u'fe80::ae1f:6bff:febc:326a'] >> > >> >2020-04-14 09:46:12,877+0000 DEBUG >> >otopi.plugins.gr_he_common.network.bridge hostname.test_hostname:464 >> >test_hostname exception >> > >> >Traceback (most recent call last): >> > >> >File "/usr/lib/python2.7/site-packages/ovirt_setup_lib/hostname.py", >> >line >> >460, in test_hostname >> > >> > not_local_text, >> > >> >File "/usr/lib/python2.7/site-packages/ovirt_setup_lib/hostname.py", >> >line >> >342, in _validateFQDNresolvability >> > >> > addresses=resolvedAddressesAsString >> > >> >RuntimeError: ovirt-node-00.phoelex.com resolves to >64:ff9b::c0a8:13d >> >192.168.1.61 and not all of them can be mapped to non loopback >devices >> >on >> >this host >> > >> >2020-04-14 09:46:12,884+0000 ERROR >> >otopi.plugins.gr_he_common.network.bridge dialog.queryEnvKey:120 >Host >> >name >> >is not valid: ovirt-node-00.phoelex.com resolves to >64:ff9b::c0a8:13d >> >192.168.1.61 and not all of them can be mapped to non loopback >devices >> >on >> >this host >> > >> >The node I'm running on has an IP address of .61 and resolves >> >correctly. >> > >> >On Fri, Apr 10, 2020 at 12:55 PM Shareef Jalloq ><shareef@jalloq.co.uk> >> >wrote: >> > >> >> Where should I be checking if there are any files/folder not owned >by >> >> vdsm:kvm? I checked on the mount the HA sits on and it's fine. 
>> >> >> >> How would I go about checking vdsm can access those images? If I >run >> >> virsh, it lists them and they were running yesterday even though >the >> >HA was >> >> down. I've since restarted both hosts but the broker is still >> >spitting out >> >> the same error (copied below). How do I find the reason
the
>broker >> >can't >> >> connect to the storage? The conf file is already at DEBUG >verbosity: >> >> >> >> [handler_logfile] >> >> >> >> class=logging.handlers.TimedRotatingFileHandler >> >> >> >> args=('/var/log/ovirt-hosted-engine-ha/broker.log', 'd', 1,
>> >> >> >> level=DEBUG >> >> >> >> formatter=long >> >> >> >> And what are all these .prob-<num> files that are being created? >> >There >> >> are over 250K of them now on the mount I'm using for the Data >domain. >> >> They're all of 0 size and of the form, >> >> /rhev/data-center/mnt/nas-01.phoelex.com: >> >> _volume2_vmstore/.prob-ffa867da-93db-4211-82df-b1b04a625ab9 >> >> >> >> @eevans: The volume I have the Data Domain on has TB's free. The >HA >> >is >> >> dead so I can't ssh in. No idea what started these errors and the >> >other >> >> VMs were still running happily although they're on a different >Data >> >Domain. >> >> >> >> Shareef. >> >> >> >> MainThread::INFO::2020-04-10 >> >> >> >>
07:45:00,408::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect)
>> >> Connecting the storage >> >> >> >> MainThread::INFO::2020-04-10 >> >> >> >>
07:45:00,408::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
>> >> Connecting storage server >> >> >> >> MainThread::INFO::2020-04-10 >> >> >> >>
07:45:01,577::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
>> >> Connecting storage server >> >> >> >> MainThread::INFO::2020-04-10 >> >> >> >>
07:45:02,692::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
>> >> Refreshing the storage domain >> >> >> >> MainThread::WARNING::2020-04-10 >> >> >> >>
07:45:05,175::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__)
>> >> Can't connect vdsm storage: Command StorageDomain.getInfo with >args >> >> {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} >failed: >> >> >> >> (code=350, message=Error in storage domain action: >> >> (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',)) >> >> >> >> On Thu, Apr 9, 2020 at 5:58 PM Strahil Nikolov >> ><hunter86_bg@yahoo.com> >> >> wrote: >> >> >> >>> On April 9, 2020 11:12:30 AM GMT+03:00, Shareef Jalloq < >> >>> shareef@jalloq.co.uk> wrote: >> >>> >OK, let's go through this. I'm looking at the node that at >least >> >still >> >>> >has >> >>> >some VMs running. virsh also tells me that the HostedEngine VM >is >> >>> >running >> >>> >but it's unresponsive and I can't shut it down. >> >>> > >> >>> >1. All storage domains exist and are mounted. >> >>> >2. The ha_agent exists: >> >>> > >> >>> >[root@ovirt-node-01 ovirt-hosted-engine-ha]# ls >> >/rhev/data-center/mnt/ >> >>> >nas-01.phoelex.com >> >>> \:_volume2_vmstore/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/ >> >>> > >> >>> >dom_md ha_agent images master >> >>> > >> >>> >3. There are two links >> >>> > >> >>> >[root@ovirt-node-01 ovirt-hosted-engine-ha]# ll >> >/rhev/data-center/mnt/ >> >>> >nas-01.phoelex.com >> >>>
\:_volume2_vmstore/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/ha_agent/
>> >>> > >> >>> >total 8 >> >>> > >> >>> >lrwxrwxrwx. 1 vdsm kvm 132 Apr 2 14:50 hosted-engine.lockspace >-> >> >>> >> >>> >> >>
/var/run/vdsm/storage/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/ffb90b82-42fe-4253-85d5-aaec8c280aaf/90e68791-0c6f-406a-89ac-e0d86c631604
>> >>> > >> >>> >lrwxrwxrwx. 1 vdsm kvm 132 Apr 2 14:50 hosted-engine.metadata >-> >> >>> >> >>> >> >>
/var/run/vdsm/storage/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/2161aed0-7250-4c1d-b667-ac94f60af17e/6b818e33-f80a-48cc-a59c-bba641e027d4
>> >>> > >> >>> >4. The services exist but all seem to have some sort of warning: >> >>> > >> >>> >a) Apr 08 18:10:55 ovirt-node-01.phoelex.com sanlock[1728]: >> >*2020-04-08 >> >>> >18:10:55 1744152 [36796]: s16 delta_renew long write time 10 >sec* >> >>> > >> >>> >b) Mar 23 18:02:59 ovirt-node-01.phoelex.com supervdsmd[29409]: >> >*failed >> >>> >to >> >>> >load module nvdimm: libbd_nvdimm.so.2: cannot open shared object >> >file: >> >>> >No >> >>> >such file or directory* >> >>> > >> >>> >c) Apr 09 08:05:13 ovirt-node-01.phoelex.com vdsm[4801]: *ERROR >> >failed >> >>> >to >> >>> >retrieve Hosted Engine HA score '[Errno 2] No such file or >> >directory'Is >> >>> >the >> >>> >Hosted Engine setup finished?* >> >>> > >> >>> >d)Apr 08 22:48:27 ovirt-node-01.phoelex.com
>> >2020-04-08 >> >>> >22:48:27.134+0000: 29309: warning : qemuGetProcessInfo:1404 : >> >cannot >> >>> >parse >> >>> >process status data >> >>> > >> >>> >Apr 08 22:48:27 ovirt-node-01.phoelex.com
>> >2020-04-08 >> >>> >22:48:27.134+0000: 29309: error : virNetDevTapInterfaceStats:764 >: >> >>> >internal >> >>> >error: /proc/net/dev: Interface not found >> >>> > >> >>> >Apr 08 23:09:39 ovirt-node-01.phoelex.com
>> >2020-04-08 >> >>> >23:09:39.844+0000: 29307: error : virNetSocketReadWire:1806 : >End >> >of >> >>> >file >> >>> >while reading data: Input/output error >> >>> > >> >>> >Apr 09 01:05:26 ovirt-node-01.phoelex.com
libvirtd[29307]:
>> >2020-04-09 >> >>> >01:05:26.660+0000: 29307: error : virNetSocketReadWire:1806 : >End >> >of >> >>> >file >> >>> >while reading data: Input/output error >> >>> > >> >>> >5 & 6. The broker log is continually printing this error: >> >>> > >> >>> >MainThread::INFO::2020-04-09 >> >>> >> >>> >> >>
08:07:31,438::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run)
>> >>> >ovirt-hosted-engine-ha broker 2.3.6 started >> >>> > >> >>> >MainThread::DEBUG::2020-04-09 >> >>> >> >>> >> >>
08:07:31,438::broker::55::ovirt_hosted_engine_ha.broker.broker.Broker::(run)
>> >>> >Running broker >> >>> > >> >>> >MainThread::DEBUG::2020-04-09 >> >>> >> >>> >> >>
08:07:31,438::broker::120::ovirt_hosted_engine_ha.broker.broker.Broker::(_get_monitor)
>> >>> >Starting monitor >> >>> > >> >>> >MainThread::INFO::2020-04-09 >> >>> >> >>> >> >>
08:07:31,438::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>> >>> >Searching for submonitors in >> >>> /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker >> >>> > >> >>> >/submonitors >> >>> > >> >>> >MainThread::INFO::2020-04-09 >> >>> >> >>> >> >>
08:07:31,439::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>> >>> >Loaded submonitor network >> >>> > >> >>> >MainThread::INFO::2020-04-09 >> >>> >> >>> >> >>
08:07:31,440::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>> >>> >Loaded submonitor cpu-load-no-engine >> >>> > >> >>> >MainThread::INFO::2020-04-09 >> >>> >> >>> >> >>
08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>> >>> >Loaded submonitor mgmt-bridge >> >>> > >> >>> >MainThread::INFO::2020-04-09 >> >>> >> >>> >> >>
08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>> >>> >Loaded submonitor network >> >>> > >> >>> >MainThread::INFO::2020-04-09 >> >>> >> >>> >> >>
08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>> >>> >Loaded submonitor cpu-load >> >>> > >> >>> >MainThread::INFO::2020-04-09 >> >>> >> >>> >> >>
08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>> >>> >Loaded submonitor engine-health >> >>> > >> >>> >MainThread::INFO::2020-04-09 >> >>> >> >>> >> >>
08:07:31,442::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>> >>> >Loaded submonitor mgmt-bridge >> >>> > >> >>> >MainThread::INFO::2020-04-09 >> >>> >> >>> >> >>
08:07:31,442::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>> >>> >Loaded submonitor cpu-load-no-engine >> >>> > >> >>> >MainThread::INFO::2020-04-09 >> >>> >> >>> >> >>
08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>> >>> >Loaded submonitor cpu-load >> >>> > >> >>> >MainThread::INFO::2020-04-09 >> >>> >> >>> >> >>
08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>> >>> >Loaded submonitor mem-free >> >>> > >> >>> >MainThread::INFO::2020-04-09 >> >>> >> >>> >> >>
08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>> >>> >Loaded submonitor storage-domain >> >>> > >> >>> >MainThread::INFO::2020-04-09 >> >>> >> >>> >> >>
08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>> >>> >Loaded submonitor storage-domain >> >>> > >> >>> >MainThread::INFO::2020-04-09 >> >>> >> >>> >> >>
08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>> >>> >Loaded submonitor mem-free >> >>> > >> >>> >MainThread::INFO::2020-04-09 >> >>> >> >>> >> >>
08:07:31,444::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>> >>> >Loaded submonitor engine-health >> >>> > >> >>> >MainThread::INFO::2020-04-09 >> >>> >> >>> >> >>
08:07:31,444::monitor::50::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>> >>> >Finished loading submonitors >> >>> > >> >>> >MainThread::DEBUG::2020-04-09 >> >>> >> >>> >> >>
08:07:31,444::broker::128::ovirt_hosted_engine_ha.broker.broker.Broker::(_get_storage_broker)
>> >>> >Starting storage broker >> >>> > >> >>> >MainThread::DEBUG::2020-04-09 >> >>> >> >>> >> >>
08:07:31,444::storage_backends::369::ovirt_hosted_engine_ha.lib.storage_backends::(connect)
>> >>> >Connecting to VDSM >> >>> > >> >>> >MainThread::DEBUG::2020-04-09 >> >>> >> >>> >> >>
08:07:31,444::util::384::ovirt_hosted_engine_ha.lib.storage_backends::(__log_debug)
>> >>> >Creating a new json-rpc connection to VDSM >> >>> > >> >>> >Client localhost:54321::DEBUG::2020-04-09 >> >>> >08:07:31,453::concurrent::258::root::(run) START thread >> ><Thread(Client >> >>> >localhost:54321, started daemon 139992488138496)> (func=<bound >> >method >> >>> >Reactor.process_requests of <yajsonrpc.betterAsyncore.Reactor >> >object at >> >>> >0x7f528acabc90>>, args=(), kwargs={}) >> >>> > >> >>> >Client localhost:54321::DEBUG::2020-04-09 >> >>> >> >>> >> >>
08:07:31,459::stompclient::138::yajsonrpc.protocols.stomp.AsyncClient::(_process_connected)
>> >>> >Stomp connection established >> >>> > >> >>> >MainThread::DEBUG::2020-04-09 >> >>> 08:07:31,467::stompclient::294::jsonrpc.AsyncoreClient::(send) >> >Sending >> >>> >response >> >>> > >> >>> >MainThread::INFO::2020-04-09 >> >>> >> >>> >> >>
08:07:31,530::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect)
>> >>> >Connecting the storage >> >>> > >> >>> >MainThread::INFO::2020-04-09 >> >>> >> >>> >> >>
08:07:31,531::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
>> >>> >Connecting storage server >> >>> > >> >>> >MainThread::DEBUG::2020-04-09 >> >>> 08:07:31,531::stompclient::294::jsonrpc.AsyncoreClient::(send) >> >Sending >> >>> >response >> >>> > >> >>> >MainThread::DEBUG::2020-04-09 >> >>> 08:07:31,534::stompclient::294::jsonrpc.AsyncoreClient::(send) >> >Sending >> >>> >response >> >>> > >> >>> >MainThread::DEBUG::2020-04-09 >> >>> >> >>> >> >>
08:07:32,199::storage_server::158::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(_validate_pre_connected_path)
>> >>> >Storage domain a6cea67d-dbfb-45cf-a775-b4d0d47b26f2 is not >> >available >> >>> > >> >>> >MainThread::INFO::2020-04-09 >> >>> >> >>> >> >>
08:07:32,199::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
>> >>> >Connecting storage server >> >>> > >> >>> >MainThread::DEBUG::2020-04-09 >> >>> 08:07:32,199::stompclient::294::jsonrpc.AsyncoreClient::(send) >> >Sending >> >>> >response >> >>> > >> >>> >MainThread::DEBUG::2020-04-09 >> >>> >> >>> >> >>
08:07:32,814::storage_server::363::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
>> >>> >[{u'status': 0, u'id': u'e29cf818-5ee5-46e1-85c1-8aeefa33e95d'}] >> >>> > >> >>> >MainThread::INFO::2020-04-09 >> >>> >> >>> >> >>
08:07:32,814::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
>> >>> >Refreshing the storage domain >> >>> > >> >>> >MainThread::DEBUG::2020-04-09 >> >>> 08:07:32,815::stompclient::294::jsonrpc.AsyncoreClient::(send) >> >Sending >> >>> >response >> >>> > >> >>> >MainThread::DEBUG::2020-04-09 >> >>> >> >>> >> >>
08:07:33,129::storage_server::420::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
>> >>> >Error refreshing storage domain: Command StorageDomain.getStats >> >with >> >>> >args >> >>> >{'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} >failed: >> >>> > >> >>> >(code=350, message=Error in storage domain action: >> >>> >(u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',)) >> >>> > >> >>> >MainThread::DEBUG::2020-04-09 >> >>> 08:07:33,130::stompclient::294::jsonrpc.AsyncoreClient::(send) >> >Sending >> >>> >response >> >>> > >> >>> >MainThread::DEBUG::2020-04-09 >> >>> >> >>> >> >>
08:07:33,795::storage_backends::208::ovirt_hosted_engine_ha.lib.storage_backends::(_get_sector_size)
>> >>> >Command StorageDomain.getInfo with args {'storagedomainID': >> >>> >'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed: >> >>> > >> >>> >(code=350, message=Error in storage domain action: >> >>> >(u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',)) >> >>> > >> >>> >MainThread::WARNING::2020-04-09 >> >>> >> >>> >> >>
08:07:33,795::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__)
>> >>> >Can't connect vdsm storage: Command StorageDomain.getInfo with >args >> >>> >{'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} >failed: >> >>> > >> >>> >(code=350, message=Error in storage domain action: >> >>> >(u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',)) >> >>> > >> >>> > >> >>> >The UUID it is moaning about is indeed the one that the HA sits >on >> >and >> >>> >is >> >>> >the one I listed the contents of in step 2 above. >> >>> > >> >>> > >> >>> >So why can't it see this domain? >> >>> > >> >>> > >> >>> >Thanks, Shareef. >> >>> > >> >>> >On Thu, Apr 9, 2020 at 6:12 AM Strahil Nikolov >> ><hunter86_bg@yahoo.com> >> >>> >wrote: >> >>> > >> >>> >> On April 9, 2020 1:51:05 AM GMT+03:00, Shareef Jalloq < >> >>> >> shareef@jalloq.co.uk> wrote: >> >>> >> >Don't know if this is useful or not, but I just tried to >> >shutdown >> >>> >and >> >>> >> >start >> >>> >> >another VM on one of the hosts and get the following error: >> >>> >> > >> >>> >> >virsh # start scratch >> >>> >> > >> >>> >> >error: Failed to start domain scratch >> >>> >> > >> >>> >> >error: Network not found: no network with matching name >> >>> >> >'vdsm-ovirtmgmt' >> >>> >> > >> >>> >> >Is this not referring to the interface name as the network is >> >called >> >>> >> >'ovirtmgnt'. >> >>> >> > >> >>> >> >On Wed, Apr 8, 2020 at 11:35 PM Shareef Jalloq >> >>> ><shareef@jalloq.co.uk> >> >>> >> >wrote: >> >>> >> > >> >>> >> >> Hmmm, virsh tells me the HE is running but it hasn't come >up >> >and >> >>> >the >> >>> >> >> agent.log is full of the same errors. >> >>> >> >> >> >>> >> >> On Wed, Apr 8, 2020 at 11:31 PM Shareef Jalloq >> >>> ><shareef@jalloq.co.uk> >> >>> >> >> wrote: >> >>> >> >> >> >>> >> >>> Ah hah! Ok, so I've managed to start it using virsh on >the >> >>> >second >> >>> >> >host >> >>> >> >>> but my first host is still dead. >> >>> >> >>> >> >>> >> >>> First of all, what are these 56,317 .prob- files
>> >dumped >> >>> >to >> >>> >> >the >> >>> >> >>> NFS mounts? >> >>> >> >>> >> >>> >> >>> Secondly, why doesn't the node mount the NFS
>at >> >boot? >> >>> >> >Is >> >>> >> >>> that the issue with this particular node? >> >>> >> >>> >> >>> >> >>> On Wed, Apr 8, 2020 at 11:12 PM ><eevans@digitaldatatechs.com> >> >>> >wrote: >> >>> >> >>> >> >>> >> >>>> Did you try virsh list --inactive >> >>> >> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >> >>>> Eric Evans >> >>> >> >>>> >> >>> >> >>>> Digital Data Services LLC. >> >>> >> >>>> >> >>> >> >>>> 304.660.9080 >> >>> >> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >> >>>> *From:* Shareef Jalloq <shareef@jalloq.co.uk> >> >>> >> >>>> *Sent:* Wednesday, April 8, 2020 5:58 PM >> >>> >> >>>> *To:* Strahil Nikolov <hunter86_bg@yahoo.com> >> >>> >> >>>> *Cc:* Ovirt Users <users@ovirt.org> >> >>> >> >>>> *Subject:* [ovirt-users] Re: ovirt-engine unresponsive - >how >> >to >> >>> >> >rescue? >> >>> >> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >> >>>> I've now shut down the VMs on one host and rebooted it >but >> >the >> >>> >> >agent >> >>> >> >>>> service doesn't start. If I run 'hosted-engine >--vm-status' >> >I >> >>> >get: >> >>> >> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >> >>>> The hosted engine configuration has not been retrieved >from >> >>> >shared >> >>> >> >>>> storage. Please ensure that ovirt-ha-agent is running and >> >the >> >>> >> >storage >> >>> >> >>>> server is reachable. >> >>> >> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >> >>>> and indeed if I list the mounts under >/rhev/data-center/mnt, >> >>> >only >> >>> >> >one of >> >>> >> >>>> the directories is mounted. I have 3 NFS mounts, one ISO >> >Domain >> >>> >> >and two >> >>> >> >>>> Data Domains. Only one Data Domain has mounted and
08:07:33,795::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) that get directories this
>has >> >>> >lots >> >>> >> >of .prob >> >>> >> >>>> files in. So why haven't the other NFS exports been >> >mounted? >> >>> >> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >> >>>> Manually mounting them doesn't seem to have helped much >> >either. >> >>> >I >> >>> >> >can >> >>> >> >>>> start the broker service but the agent service says no. >> >Same >> >>> >error >> >>> >> >as the >> >>> >> >>>> one in my last email. >> >>> >> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >> >>>> Shareef. >> >>> >> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >> >>>> On Wed, Apr 8, 2020 at 9:57 PM Shareef Jalloq >> >>> >> ><shareef@jalloq.co.uk> >> >>> >> >>>> wrote: >> >>> >> >>>> >> >>> >> >>>> Right, still down. I've run virsh and it doesn't know >> >anything >> >>> >> >about >> >>> >> >>>> the engine vm. >> >>> >> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >> >>>> I've restarted the broker and agent services and I still >get >> >>> >> >nothing in >> >>> >> >>>> virsh->list. >> >>> >> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >> >>>> In the logs under /var/log/ovirt-hosted-engine-ha I see >lots >> >of >> >>> >> >errors: >> >>> >> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >> >>>> broker.log: >> >>> >> >>>> >> >>> >> >>>> >> >>> >> >>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 >> >>> >> >>>> >> >>> >> >> >>> >> >> >>> >> >>> >> >>
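(A note for anyone retrying this: when mounting a data domain by hand, it is worth mounting it the way vdsm would. A sketch only — the export and mount point are inferred from the path shown later in this thread, and the options are the ones vdsm typically uses for NFS domains; adjust both to your setup:)

    # vdsm mounts NFS domains under /rhev/data-center/mnt/<server>:<_path>
    mount -t nfs -o soft,nosharecache,timeo=600,retrans=6 \
      nas-01.phoelex.com:/volume2/vmstore \
      "/rhev/data-center/mnt/nas-01.phoelex.com:_volume2_vmstore"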
On Wed, Apr 8, 2020 at 9:57 PM Shareef Jalloq <shareef@jalloq.co.uk> wrote:

Right, still down. I've run virsh and it doesn't know anything about the engine vm.

I've restarted the broker and agent services and I still get nothing in virsh->list.

In the logs under /var/log/ovirt-hosted-engine-ha I see lots of errors:

broker.log:

MainThread::INFO::2020-04-08 20:56:20,138::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.3.6 started
MainThread::INFO::2020-04-08 20:56:20,138::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Searching for submonitors in /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors
MainThread::INFO::2020-04-08 20:56:20,138::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
MainThread::INFO::2020-04-08 20:56:20,140::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
MainThread::INFO::2020-04-08 20:56:20,140::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
MainThread::INFO::2020-04-08 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
MainThread::INFO::2020-04-08 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
MainThread::INFO::2020-04-08 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
MainThread::INFO::2020-04-08 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
MainThread::INFO::2020-04-08 20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
MainThread::INFO::2020-04-08 20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
MainThread::INFO::2020-04-08 20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
MainThread::INFO::2020-04-08 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
MainThread::INFO::2020-04-08 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
MainThread::INFO::2020-04-08 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
MainThread::INFO::2020-04-08 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
MainThread::INFO::2020-04-08 20:56:20,143::monitor::50::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Finished loading submonitors
MainThread::INFO::2020-04-08 20:56:20,197::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect) Connecting the storage
MainThread::INFO::2020-04-08 20:56:20,197::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2020-04-08 20:56:20,414::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2020-04-08 20:56:20,628::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Refreshing the storage domain
MainThread::WARNING::2020-04-08 20:56:21,057::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed: (code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
MainThread::INFO::2020-04-08 20:56:21,901::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.3.6 started
MainThread::INFO::2020-04-08 20:56:21,901::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Searching for submonitors in /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors

agent.log:
MainThread::ERROR::2020-04-08 20:57:00,799::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent
MainThread::INFO::2020-04-08 20:57:00,799::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
MainThread::INFO::2020-04-08 20:57:11,144::agent::67::ovirt_hosted_engine_ha.agent.agent.Agent::(run) ovirt-hosted-engine-ha agent 2.3.6 started
MainThread::INFO::2020-04-08 20:57:11,182::hosted_engine::234::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Found certificate common name: ovirt-node-01.phoelex.com
MainThread::INFO::2020-04-08 20:57:11,294::hosted_engine::543::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Initializing ha-broker connection
MainThread::INFO::2020-04-08 20:57:11,296::brokerlink::80::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor network, options {'tcp_t_address': '', 'network_test': 'dns', 'tcp_t_port': '', 'addr': '192.168.1.99'}
MainThread::ERROR::2020-04-08 20:57:11,296::hosted_engine::559::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Failed to start necessary monitors
MainThread::ERROR::2020-04-08 20:57:11,297::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent
    return action(he)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper
    return he.start_monitoring()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 432, in start_monitoring
    self._initialize_broker()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 556, in _initialize_broker
    m.get('options', {}))
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 89, in start_monitor
    ).format(t=type, o=options, e=e)
RequestError: brokerlink - failed to start monitor via ovirt-ha-broker: [Errno 2] No such file or directory, [monitor: 'network', options: {'tcp_t_address': '', 'network_test': 'dns', 'tcp_t_port': '', 'addr': '192.168.1.99'}]
MainThread::ERROR::2020-04-08 20:57:11,297::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent
MainThread::INFO::2020-04-08 20:57:11,297::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
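(That RequestError is the agent failing to reach the broker, not a network problem as such: the agent talks to ovirt-ha-broker over a local socket, and '[Errno 2] No such file or directory' here usually just means the broker died on its own storage error before creating that socket. A quick check, as a sketch — the exact socket path varies by version:)

    systemctl status ovirt-ha-broker
    ss -xl | grep -i ovirt   # look for the broker's listening unix socket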
On Wed, Apr 8, 2020 at 6:10 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:

This has to be resolved:

Engine status : unknown stale-data

Run again 'hosted-engine --vm-status'. If it remains the same, restart ovirt-ha-broker.service & ovirt-ha-agent.service.

Verify that the engine's storage is available. Then monitor the broker & agent logs in /var/log/ovirt-hosted-engine-ha.

Best Regards,
Strahil Nikolov
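(That round trip, as one block — these are exactly the service and log names used throughout this thread:)

    systemctl restart ovirt-ha-broker ovirt-ha-agent
    hosted-engine --vm-status
    # then watch both HA logs for the next complaint
    tail -f /var/log/ovirt-hosted-engine-ha/{broker,agent}.log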
Hi Shareef,

The flow of oVirt activation is more complex than plain KVM. Mounting of the domains happens during the activation of the node (the HostedEngine is activating everything needed).

Focus on the HostedEngine VM. Is it running properly?

If not, try:
1. Verify that the storage domain exists
2. Check if it has the 'ha_agent' directory
3. Check if the links are OK; if not, you can safely remove the links
4. Next check the services are running:
A) sanlock
B) supervdsmd
C) vdsmd
D) libvirtd
5. Increase the log level for the broker and agent services:

cd /etc/ovirt-hosted-engine-ha
vim *-log.conf
systemctl restart ovirt-ha-broker ovirt-ha-agent

6. Check what they are complaining about. Keep in mind that the agent will keep throwing errors until the broker stops doing it (the agent depends on the broker), so the broker must be OK before proceeding with the agent log.

About the manual VM start, you need 2 things:

1. Define the VM network:

# cat vdsm-ovirtmgmt.xml
<network>
  <name>vdsm-ovirtmgmt</name>
  <uuid>8ded486e-e681-4754-af4b-5737c2b05405</uuid>
  <forward mode='bridge'/>
  <bridge name='ovirtmgmt'/>
</network>

[root@ovirt1 HostedEngine-RECOVERY]# virsh define vdsm-ovirtmgmt.xml

2. Get an XML definition, which can be found in the vdsm log. Every VM at start-up has its configuration printed out in the vdsm log on the host it starts on. Save it to a file and then:

A) virsh define myvm.xml
B) virsh start myvm

It seems there is/was a problem with your NFS shares.

Best Regards,
Strahil Nikolov

Hey Shareef,

Check if there are any files or folders not owned by vdsm:kvm. Something like this:

find . -not -user 36 -not -group 36 -print

Also check if vdsm can access the images in the '<vol-mount-point>/images' directories.

Best Regards,
Strahil Nikolov

And the IPv6 address '64:ff9b::c0a8:13d'? I don't see it in the log output.

Best Regards,
Strahil Nikolov
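(Putting Strahil's manual-start recipe above together end to end — a sketch only; 'myvm.xml' is a placeholder, and note that a libvirt *network* XML is loaded with net-define/net-start rather than plain define:)

    # the network, using the XML shown above
    virsh net-define vdsm-ovirtmgmt.xml
    virsh net-start vdsm-ovirtmgmt

    # the VM: copy its <domain> XML out of /var/log/vdsm/vdsm.log on the
    # host it last started on, save it as myvm.xml, then
    virsh define myvm.xml
    virsh start myvm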
Based on your output, you got a PTR record for IPv4 & IPv6 ... most probably it's the reason.
Set the IPv6 on the interface and try again.
Best Regards, Strahil Nikolov
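(Worth decoding that address: 64:ff9b::/96 is the well-known NAT64 prefix, and c0a8:13d is hex for 192.168.1.61, so a DNS64 resolver is very likely synthesizing AAAA/PTR records for the host. A quick way to see what the deploy script will resolve, as a sketch with the hostname from this thread:)

    dig +short A    ovirt-node-00.phoelex.com
    dig +short AAAA ovirt-node-00.phoelex.com
    dig +short -x 192.168.1.61
    dig +short -x 64:ff9b::c0a8:13d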
Do you have firewalld up and running on the host?
Best Regards, Strahil Nikolov
I am guessing, but your interface is not assigned to any zone, right? Just add the interface to the default zone (usually 'public'). Best Regards, Strahil Nikolov
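(A sketch of that check and fix, assuming the management bridge is named 'ovirtmgmt' as earlier in the thread and the default zone is 'public':)

    firewall-cmd --get-active-zones
    # if nothing comes back, bind the interface to a zone
    firewall-cmd --permanent --zone=public --add-interface=ovirtmgmt
    firewall-cmd --reload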

Thanks for your help, but I've decided to try and reinstall from scratch. This is taking too long.

On Wed, Apr 15, 2020 at 3:25 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
On April 15, 2020 2:40:52 PM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:

Yes, but there are no zones set up, just ports 22, 6801 and 6900.
On Wed, Apr 15, 2020 at 12:37 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
On April 15, 2020 2:28:05 PM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:

Oh, this is painful. It seems to progress if you both set he_force_ipv4 and run the deployment with the '--4' switch.

But then I get a failure when the ansible script checks for firewalld zones and doesn't get anything back. Should the deployment flow not be setting up any zones it needs?
2020-04-15 10:57:25,439+0000 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 TASK [ovirt.hosted_engine_setup : Get active list of active firewalld zones]
2020-04-15 10:57:26,641+0000 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:103 {u'stderr_lines': [], u'changed': True, u'end': u'2020-04-15 10:57:26.481202', u'_ansible_no_log': False, u'stdout': u'', u'cmd': u'set -euo pipefail && firewall-cmd --get-active-zones | grep -v "^\\s*interfaces"', u'start': u'2020-04-15 10:57:26.050203', u'delta': u'0:00:00.430999', u'stderr': u'', u'rc': 1, u'invocation': {u'module_args': {u'creates': None, u'executable': None, u'_uses_shell': True, u'strip_empty_ends': True, u'_raw_params': u'set -euo pipefail && firewall-cmd --get-active-zones | grep -v "^\\s*interfaces"', u'removes': None, u'argv': None, u'warn': True, u'chdir': None, u'stdin_add_newline': True, u'stdin': None}}, u'stdout_lines': [], u'msg': u'non-zero return code'}
2020-04-15 10:57:26,741+0000 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:107 fatal: [localhost]: FAILED! => {"changed": true, "cmd": "set -euo pipefail && firewall-cmd --get-active-zones | grep -v \"^\\s*interfaces\"", "delta": "0:00:00.430999", "end": "2020-04-15 10:57:26.481202", "msg": "non-zero return code", "rc": 1, "start": "2020-04-15 10:57:26.050203", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
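(That fatal is grep failing, not firewalld: with no active zones, firewall-cmd prints nothing, 'grep -v' then matches nothing and exits 1, and 'set -euo pipefail' turns that rc=1 into a task failure. Reproducible by hand as a sketch — which also shows why getting the interface into a zone, as suggested above, is the real fix:)

    set -euo pipefail
    firewall-cmd --get-active-zones | grep -v "^\s*interfaces"
    # with an empty zone list the pipeline exits 1 and the shell aborts here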
On Wed, Apr 15, 2020 at 10:23 AM Shareef Jalloq <shareef@jalloq.co.uk> wrote:
Ha, spoke too soon. It's now stuck in a loop and a google points me at https://bugzilla.redhat.com/show_bug.cgi?id=1746585
However, forcing ipv4 doesn't seem to have fixed the loop.
On Wed, Apr 15, 2020 at 9:59 AM Shareef Jalloq <shareef@jalloq.co.uk> wrote:
OK, that seems to have fixed it, thanks. Is this a side effect of redeploying the HE over a first-time install? Nothing has changed in our setup and I didn't need to do this when I initially set up our nodes.
On Tue, Apr 14, 2020 at 6:55 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
On April 14, 2020 6:17:17 PM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:

Hmmm, we're not using ipv6. Is that the issue?

On Tue, Apr 14, 2020 at 3:56 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:

On April 14, 2020 1:27:24 PM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:

Right, I've given up on recovering the HE, so I want to try and redeploy it. There doesn't seem to be enough information to debug why the broker/agent won't start cleanly.

In running 'hosted-engine --deploy', I'm seeing the following error in the setup validation phase:

2020-04-14 09:46:08,922+0000 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND Please provide the hostname of this host on the management network [ovirt-node-00.phoelex.com]:

2020-04-14 09:46:12,831+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname.getResolvedAddresses:432 getResolvedAddresses: set(['64:ff9b::c0a8:13d', '192.168.1.61'])

2020-04-14 09:46:12,832+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname._validateFQDNresolvability:289 ovirt-node-00.phoelex.com resolves to: set(['64:ff9b::c0a8:13d', '192.168.1.61'])

2020-04-14 09:46:12,832+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:813 execute: ['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com', 'ANY'], executable='None', cwd='None', env=None

2020-04-14 09:46:12,871+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:863 execute-result: ['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com', 'ANY'], rc=0

2020-04-14 09:46:12,872+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:921 execute-output: ['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com', 'ANY'] stdout:
ovirt-node-00.phoelex.com. 86400 IN A 192.168.1.61

2020-04-14 09:46:12,872+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:926 execute-output: ['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com', 'ANY'] stderr:

2020-04-14 09:46:12,872+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:813 execute: ('/usr/sbin/ip', 'addr'), executable='None', cwd='None', env=None

2020-04-14 09:46:12,876+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:863 execute-result: ('/usr/sbin/ip', 'addr'), rc=0

2020-04-14 09:46:12,876+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:921 execute-output: ('/usr/sbin/ip', 'addr') stdout:

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovirtmgmt state UP group default qlen 1000
    link/ether ac:1f:6b:bc:32:6a brd ff:ff:ff:ff:ff:ff
3: eno2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether ac:1f:6b:bc:32:6b brd ff:ff:ff:ff:ff:ff
4: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 02:e6:e2:80:93:8d brd ff:ff:ff:ff:ff:ff
5: br-int: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 8a:26:44:50:ee:4a brd ff:ff:ff:ff:ff:ff
21: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether ac:1f:6b:bc:32:6a brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.61/24 brd 192.168.1.255 scope global ovirtmgmt
       valid_lft forever preferred_lft forever
    inet6 fe80::ae1f:6bff:febc:326a/64 scope link
       valid_lft forever preferred_lft forever
22: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 3a:02:7b:7d:b3:2a brd ff:ff:ff:ff:ff:ff

2020-04-14 09:46:12,876+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:926 execute-output: ('/usr/sbin/ip', 'addr') stderr:

2020-04-14 09:46:12,877+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname.getLocalAddresses:251 addresses: [u'192.168.1.61', u'fe80::ae1f:6bff:febc:326a']

2020-04-14 09:46:12,877+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname.test_hostname:464 test_hostname exception
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_setup_lib/hostname.py", line 460, in test_hostname
    not_local_text,
  File "/usr/lib/python2.7/site-packages/ovirt_setup_lib/hostname.py", line 342, in _validateFQDNresolvability
    addresses=resolvedAddressesAsString
RuntimeError: ovirt-node-00.phoelex.com resolves to 64:ff9b::c0a8:13d 192.168.1.61 and not all of them can be mapped to non loopback devices on this host

2020-04-14 09:46:12,884+0000 ERROR otopi.plugins.gr_he_common.network.bridge dialog.queryEnvKey:120 Host name is not valid: ovirt-node-00.phoelex.com resolves to 64:ff9b::c0a8:13d 192.168.1.61 and not all of them can be mapped to non loopback devices on this host

The node I'm running on has an IP address of .61 and resolves correctly.

On Fri, Apr 10, 2020 at 12:55 PM Shareef Jalloq <shareef@jalloq.co.uk> wrote:

Where should I be checking if there are any files/folders not owned by vdsm:kvm? I checked on the mount the HA sits on and it's fine.

How would I go about checking vdsm can access those images? If I run virsh, it lists them, and they were running yesterday even though the HA was down. I've since restarted both hosts but the broker is still spitting out the same error (copied below). How do I find the reason the broker can't connect to the storage? The conf file is already at DEBUG verbosity:

[handler_logfile]
class=logging.handlers.TimedRotatingFileHandler
args=('/var/log/ovirt-hosted-engine-ha/broker.log', 'd', 1, 7)
level=DEBUG
formatter=long

And what are all these .prob-<num> files that are being created? There are over 250K of them now on the mount I'm using for the Data domain. They're all of 0 size and of the form /rhev/data-center/mnt/nas-01.phoelex.com:_volume2_vmstore/.prob-ffa867da-93db-4211-82df-b1b04a625ab9

@eevans: The volume I have the Data Domain on has TBs free. The HA is dead so I can't ssh in. No idea what started these errors, and the other VMs were still running happily, although they're on a different Data Domain.

Shareef.
MainThread::INFO::2020-04-10 07:45:00,408::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect) Connecting the storage
MainThread::INFO::2020-04-10 07:45:00,408::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2020-04-10 07:45:01,577::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2020-04-10 07:45:02,692::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Refreshing the storage domain
MainThread::WARNING::2020-04-10 07:45:05,175::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed: (code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))

On Thu, Apr 9, 2020 at 5:58 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:

On April 9, 2020 11:12:30 AM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:

OK, let's go through this. I'm looking at the node that at least still has some VMs running. virsh also tells me that the HostedEngine VM is running, but it's unresponsive and I can't shut it down.

1. All storage domains exist and are mounted.

2. The ha_agent exists:

[root@ovirt-node-01 ovirt-hosted-engine-ha]# ls /rhev/data-center/mnt/nas-01.phoelex.com\:_volume2_vmstore/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/
dom_md  ha_agent  images  master

3. There are two links:

[root@ovirt-node-01 ovirt-hosted-engine-ha]# ll /rhev/data-center/mnt/nas-01.phoelex.com\:_volume2_vmstore/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/ha_agent/
total 8
lrwxrwxrwx. 1 vdsm kvm 132 Apr  2 14:50 hosted-engine.lockspace -> /var/run/vdsm/storage/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/ffb90b82-42fe-4253-85d5-aaec8c280aaf/90e68791-0c6f-406a-89ac-e0d86c631604
lrwxrwxrwx. 1 vdsm kvm 132 Apr  2 14:50 hosted-engine.metadata -> /var/run/vdsm/storage/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/2161aed0-7250-4c1d-b667-ac94f60af17e/6b818e33-f80a-48cc-a59c-bba641e027d4

4. The services exist but all seem to have some sort of warning:

a) Apr 08 18:10:55 ovirt-node-01.phoelex.com sanlock[1728]: 2020-04-08 18:10:55 1744152 [36796]: s16 delta_renew long write time 10 sec

b) Mar 23 18:02:59 ovirt-node-01.phoelex.com supervdsmd[29409]: failed to load module nvdimm: libbd_nvdimm.so.2: cannot open shared object file: No such file or directory

c) Apr 09 08:05:13 ovirt-node-01.phoelex.com vdsm[4801]: ERROR failed to retrieve Hosted Engine HA score '[Errno 2] No such file or directory' Is the Hosted Engine setup finished?

d) Apr 08 22:48:27 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-08 22:48:27.134+0000: 29309: warning : qemuGetProcessInfo:1404 : cannot parse process status data
Apr 08 22:48:27 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-08 22:48:27.134+0000: 29309: error : virNetDevTapInterfaceStats:764 : internal error: /proc/net/dev: Interface not found
Apr 08 23:09:39 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-08 23:09:39.844+0000: 29307: error : virNetSocketReadWire:1806 : End of file while reading data: Input/output error
Apr 09 01:05:26 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-09 01:05:26.660+0000: 29307: error : virNetSocketReadWire:1806 : End of file while reading data: Input/output error

5 & 6. The broker log is continually printing this error:
MainThread::INFO::2020-04-09 08:07:31,438::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.3.6 started
MainThread::DEBUG::2020-04-09 08:07:32,199::storage_server::158::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(_validate_pre_connected_path) Storage domain a6cea67d-dbfb-45cf-a775-b4d0d47b26f2 is not available
MainThread::DEBUG::2020-04-09 08:07:33,129::storage_server::420::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Error refreshing storage domain: Command StorageDomain.getStats with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed: (code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
MainThread::WARNING::2020-04-09 08:07:33,795::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed: (code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))

The UUID it is moaning about is indeed the one that the HA sits on and is the one I listed the contents of in step 2 above.

So why can't it see this domain?

Thanks, Shareef.
>20:57:00,799::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) > >> >>> >> >>>> Trying to restart agent > >> >>> >> >>>> > >> >>> >> >>>> MainThread::INFO::2020-04-08 > >> >>> >> >>>> > >> >>> >> > >> >>> > >> > >
>20:57:00,799::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) > >> >>> >> >>>> Agent shutting down > >> >>> >> >>>> > >> >>> >> >>>> MainThread::INFO::2020-04-08 > >> >>> >> >>>> > >> >>> >> > >> >>> > >> > >
>20:57:11,144::agent::67::ovirt_hosted_engine_ha.agent.agent.Agent::(run) > >> >>> >> >>>> ovirt-hosted-engine-ha agent 2.3.6 started > >> >>> >> >>>> > >> >>> >> >>>> MainThread::INFO::2020-04-08 > >> >>> >> >>>> > >> >>> >> > >> >>> >> > >> >>> > >> >>> > >> > >> > >
>20:57:11,182::hosted_engine::234::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) > >> >>> >> >>>> Found certificate common name: ovirt-node-01.phoelex.com > >> >>> >> >>>> > >> >>> >> >>>> MainThread::INFO::2020-04-08 > >> >>> >> >>>> > >> >>> >> > >> >>> >> > >> >>> > >> >>> > >> > >> > >
>20:57:11,294::hosted_engine::543::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) > >> >>> >> >>>> Initializing ha-broker connection > >> >>> >> >>>> > >> >>> >> >>>> MainThread::INFO::2020-04-08 > >> >>> >> >>>> > >> >>> >> > >> >>> >> > >> >>> > >> >>> > >> > >> > >
>20:57:11,296::brokerlink::80::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) > >> >>> >> >>>> Starting monitor network, options {'tcp_t_address': '', > >> >>> >> >'network_test': > >> >>> >> >>>> 'dns', 'tcp_t_port': '', 'addr': '192.168.1.99'} > >> >>> >> >>>> > >> >>> >> >>>> MainThread::ERROR::2020-04-08 > >> >>> >> >>>> > >> >>> >> > >> >>> >> > >> >>> > >> >>> > >> > >> > >
>20:57:11,296::hosted_engine::559::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) > >> >>> >> >>>> Failed to start necessary monitors > >> >>> >> >>>> > >> >>> >> >>>> MainThread::ERROR::2020-04-08 > >> >>> >> >>>> > >> >>> >> > >> >>> >> > >> >>> > >> >>> > >> > >> > >
>20:57:11,297::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) > >> >>> >> >>>> Traceback (most recent call last): > >> >>> >> >>>> > >> >>> >> >>>> File > >> >>> >> >>>> > >> >>> >> > >> >>> > >> >>> > >> > >> > >
>"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", > >> >>> >> >>>> line 131, in _run_agent > >> >>> >> >>>> > >> >>> >> >>>> return action(he) > >> >>> >> >>>> > >> >>> >> >>>> File > >> >>> >> >>>> > >> >>> >> > >> >>> > >> >>> > >> > >> > >
>"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", > >> >>> >> >>>> line 55, in action_proper > >> >>> >> >>>> > >> >>> >> >>>> return he.start_monitoring() > >> >>> >> >>>> > >> >>> >> >>>> File > >> >>> >> >>>> > >> >>> >> > >> >>> >> > >> >>> > >> >>> > >> > >> > >
>"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", > >> >>> >> >>>> line 432, in start_monitoring > >> >>> >> >>>> > >> >>> >> >>>> self._initialize_broker() > >> >>> >> >>>> > >> >>> >> >>>> File > >> >>> >> >>>> > >> >>> >> > >> >>> >> > >> >>> > >> >>> > >> > >> > >
>"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", > >> >>> >> >>>> line 556, in _initialize_broker > >> >>> >> >>>> > >> >>> >> >>>> m.get('options', {})) > >> >>> >> >>>> > >> >>> >> >>>> File > >> >>> >> >>>> > >> >>> >> > >> >>> >> > >> >>> > >> >>> > >> > >> > >
>"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", > >> >>> >> >>>> line 89, in start_monitor > >> >>> >> >>>> > >> >>> >> >>>> ).format(t=type, o=options, e=e) > >> >>> >> >>>> > >> >>> >> >>>> RequestError: brokerlink - failed to start monitor via > >> >>> >> >ovirt-ha-broker: > >> >>> >> >>>> [Errno 2] No such file or directory, [monitor: 'network', > >> >>> >options: > >> >>> >> >>>> {'tcp_t_address': '', 'network_test': 'dns', > >'tcp_t_port': > >> >'', > >> >>> >> >'addr': > >> >>> >> >>>> '192.168.1.99'}] > >> >>> >> >>>> > >> >>> >> >>>> > >> >>> >> >>>> > >> >>> >> >>>> MainThread::ERROR::2020-04-08 > >> >>> >> >>>> > >> >>> >> > >> >>> >> > >> >>> > >> >>> > >> > >> > >
>20:57:11,297::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) > >> >>> >> >>>> Trying to restart agent > >> >>> >> >>>> > >> >>> >> >>>> MainThread::INFO::2020-04-08 > >> >>> >> >>>> > >> >>> >> > >> >>> > >> > >
>20:57:11,297::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) > >> >>> >> >>>> Agent shutting down > >> >>> >> >>>> > >> >>> >> >>>> > >> >>> >> >>>> > >> >>> >> >>>> On Wed, Apr 8, 2020 at 6:10 PM Strahil Nikolov > >> >>> >> ><hunter86_bg@yahoo.com> > >> >>> >> >>>> wrote: > >> >>> >> >>>> > >> >>> >> >>>> On April 8, 2020 7:47:20 PM GMT+03:00, "Maton, Brett" < > >> >>> >> >>>> matonb@ltresources.co.uk> wrote: > >> >>> >> >>>> >On the host you tried to restart the engine on: > >> >>> >> >>>> > > >> >>> >> >>>> >Add an alias to virsh (authenticates with > >virsh_auth.conf) > >> >>> >> >>>> > > >> >>> >> >>>> >alias virsh='virsh -c > >> >>> >> >>>> > >> >>> > >qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf' > >> >>> >> >>>> > > >> >>> >> >>>> >Then run virsh: > >> >>> >> >>>> > > >> >>> >> >>>> >virsh > >> >>> >> >>>> > > >> >>> >> >>>> >virsh # list > >> >>> >> >>>> > Id Name State > >> >>> >> >>>>
This has to be resolved:

Engine status : unknown stale-data

Run again 'hosted-engine --vm-status'. If it remains the same, restart ovirt-ha-broker.service & ovirt-ha-agent.service.

Verify that the engine's storage is available. Then monitor the broker & agent logs in /var/log/ovirt-hosted-engine-ha.

Best Regards,
Strahil Nikolov

Hi Shareef,

The activation flow in oVirt is more complex than plain KVM. Mounting of the storage domains happens during activation of the node (the HostedEngine activates everything it needs).

Focus on the HostedEngine VM. Is it running properly?

If not, try:

1. Verify that the storage domain exists
2. Check if it has an 'ha_agent' directory
3. Check if the links are OK; if not, you can safely remove the links
4. Next check that the services are running:
A) sanlock
B) supervdsmd
C) vdsmd
D) libvirtd
5. Increase the log level for the broker and agent services:

cd /etc/ovirt-hosted-engine-ha
vim *-log.conf

systemctl restart ovirt-ha-broker ovirt-ha-agent

6. Check what they are complaining about. Keep in mind that the agent will keep throwing errors until the broker stops doing so (the agent depends on the broker), so the broker must be OK before proceeding with the agent log.

About the manual VM start, you need 2 things:

1. Define the VM network:

# cat vdsm-ovirtmgmt.xml
<network>
  <name>vdsm-ovirtmgmt</name>
  <uuid>8ded486e-e681-4754-af4b-5737c2b05405</uuid>
  <forward mode='bridge'/>
  <bridge name='ovirtmgmt'/>
</network>

[root@ovirt1 HostedEngine-RECOVERY]# virsh net-define vdsm-ovirtmgmt.xml

2. Get an XML definition, which can be found in the vdsm log. Every VM has its configuration printed to the vdsm log at startup, on the host it starts on. Save it to a file and then:
A) virsh define myvm.xml
B) virsh start myvm

It seems there is/was a problem with your NFS shares.

Best Regards,
Strahil Nikolov

Hey Shareef,

Check if there are any files or folders not owned by vdsm:kvm. Something like this:

find . -not -user 36 -not -group 36 -print

Also check if vdsm can access the images in the '<vol-mount-point>/images' directories.

Best Regards,
Strahil Nikolov

And the IPv6 address '64:ff9b::c0a8:13d'? I don't see it in the log output.

Best Regards,
Strahil Nikolov
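To make the manual-start recipe above concrete, a hypothetical session might look like this (editor's sketch; the file names are placeholders, and the domain XML must first be copied out of /var/log/vdsm/vdsm.log as Strahil describes):

virsh net-define vdsm-ovirtmgmt.xml    # the <network> definition shown above
virsh net-start vdsm-ovirtmgmt
virsh define HostedEngine.xml          # domain XML saved from the vdsm log
virsh start HostedEngine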
Based on your output, you got a PTR record for both IPv4 & IPv6 ... most probably that's the reason.

Set the IPv6 address on the interface and try again.

Best Regards,
Strahil Nikolov
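To confirm the PTR situation described above (a sketch; the addresses are the ones from this thread):

dig +short -x 192.168.1.61         # IPv4 reverse lookup
dig +short -x 64:ff9b::c0a8:13d    # IPv6 reverse lookup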
Do you have firewalld up and running on the host?

Best Regards,
Strahil Nikolov
I am guessing, but your interface is not assigned to any zone, right? Just add the interface to the default zone (usually 'public').

Best Regards,
Strahil Nikolov
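Zone assignment along those lines would look something like this (a sketch assuming the interface in question is the 'ovirtmgmt' management bridge; substitute yours):

firewall-cmd --get-active-zones    # empty output means no interface is in any active zone
firewall-cmd --permanent --zone=public --add-interface=ovirtmgmt
firewall-cmd --reload
firewall-cmd --get-active-zones    # 'public' should now list ovirtmgmt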

On April 15, 2020 5:59:46 PM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:

Thanks for your help but I've decided to try and reinstall from scratch. This is taking too long.

On April 15, 2020 2:40:52 PM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:

Yes, but there are no zones set up, just ports 22, 6801 and 6900.

On April 15, 2020 2:28:05 PM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:

Oh this is painful. It seems to progress if you have both he_force_ipv4 set and run the deployment with the '--4' switch.

But then I get a failure when the ansible script checks for firewalld zones and doesn't get anything back. Should the deployment flow not be setting up any zones it needs?

2020-04-15 10:57:25,439+0000 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 TASK [ovirt.hosted_engine_setup : Get active list of active firewalld zones]
2020-04-15 10:57:26,641+0000 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:103 {u'stderr_lines': [], u'changed': True, u'end': u'2020-04-15 10:57:26.481202', u'_ansible_no_log': False, u'stdout': u'', u'cmd': u'set -euo pipefail && firewall-cmd --get-active-zones | grep -v "^\\s*interfaces"', u'start': u'2020-04-15 10:57:26.050203', u'delta': u'0:00:00.430999', u'stderr': u'', u'rc': 1, u'invocation': {u'module_args': {u'creates': None, u'executable': None, u'_uses_shell': True, u'strip_empty_ends': True, u'_raw_params': u'set -euo pipefail && firewall-cmd --get-active-zones | grep -v "^\\s*interfaces"', u'removes': None, u'argv': None, u'warn': True, u'chdir': None, u'stdin_add_newline': True, u'stdin': None}}, u'stdout_lines': [], u'msg': u'non-zero return code'}
2020-04-15 10:57:26,741+0000 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:107 fatal: [localhost]: FAILED! => {"changed": true, "cmd": "set -euo pipefail && firewall-cmd --get-active-zones | grep -v \"^\\s*interfaces\"", "delta": "0:00:00.430999", "end": "2020-04-15 10:57:26.481202", "msg": "non-zero return code", "rc": 1, "start": "2020-04-15 10:57:26.050203", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}

On Wed, Apr 15, 2020 at 10:23 AM Shareef Jalloq <shareef@jalloq.co.uk> wrote:

Ha, spoke too soon. It's now stuck in a loop and a Google search points me at https://bugzilla.redhat.com/show_bug.cgi?id=1746585

However, forcing ipv4 doesn't seem to have fixed the loop.

On Wed, Apr 15, 2020 at 9:59 AM Shareef Jalloq <shareef@jalloq.co.uk> wrote:

OK, that seems to have fixed it, thanks. Is this a side effect of redeploying the HE over a first-time install? Nothing has changed in our setup and I didn't need to do this when I initially set up our nodes.
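For reference, the failing task can be reproduced by hand (the pipeline is lifted from the ansible log above; with no active zones, firewall-cmd prints nothing, the grep matches nothing, and the task fails with rc=1):

firewall-cmd --get-active-zones | grep -v "^\s*interfaces"; echo "rc=$?"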
On April 14, 2020 6:17:17 PM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:

Hmmm, we're not using ipv6. Is that the issue?

On April 14, 2020 1:27:24 PM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:

Right, I've given up on recovering the HE so want to try and redeploy it. There doesn't seem to be enough information to debug why the broker/agent won't start cleanly.

In running 'hosted-engine --deploy', I'm seeing the following error in the setup validation phase:

2020-04-14 09:46:08,922+0000 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND Please provide the hostname of this host on the management network [ovirt-node-00.phoelex.com]:
2020-04-14 09:46:12,831+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname.getResolvedAddresses:432 getResolvedAddresses: set(['64:ff9b::c0a8:13d', '192.168.1.61'])
2020-04-14 09:46:12,832+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname._validateFQDNresolvability:289 ovirt-node-00.phoelex.com resolves to: set(['64:ff9b::c0a8:13d', '192.168.1.61'])
2020-04-14 09:46:12,832+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:813 execute: ['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com', 'ANY'], executable='None', cwd='None', env=None
2020-04-14 09:46:12,871+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:863 execute-result: ['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com', 'ANY'], rc=0
2020-04-14 09:46:12,872+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:921 execute-output: ['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com', 'ANY'] stdout:
ovirt-node-00.phoelex.com. 86400 IN A 192.168.1.61

2020-04-14 09:46:12,872+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:926 execute-output: ['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com', 'ANY'] stderr:

2020-04-14 09:46:12,872+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:813 execute: ('/usr/sbin/ip', 'addr'), executable='None', cwd='None', env=None
2020-04-14 09:46:12,876+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:863 execute-result: ('/usr/sbin/ip', 'addr'), rc=0
2020-04-14 09:46:12,876+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:921 execute-output: ('/usr/sbin/ip', 'addr') stdout:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovirtmgmt state UP group default qlen 1000
    link/ether ac:1f:6b:bc:32:6a brd ff:ff:ff:ff:ff:ff
3: eno2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether ac:1f:6b:bc:32:6b brd ff:ff:ff:ff:ff:ff
4: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 02:e6:e2:80:93:8d brd ff:ff:ff:ff:ff:ff
5: br-int: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 8a:26:44:50:ee:4a brd ff:ff:ff:ff:ff:ff
21: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether ac:1f:6b:bc:32:6a brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.61/24 brd 192.168.1.255 scope global ovirtmgmt
       valid_lft forever preferred_lft forever
    inet6 fe80::ae1f:6bff:febc:326a/64 scope link
       valid_lft forever preferred_lft forever
22: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 3a:02:7b:7d:b3:2a brd ff:ff:ff:ff:ff:ff

2020-04-14 09:46:12,876+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:926 execute-output: ('/usr/sbin/ip', 'addr') stderr:

2020-04-14 09:46:12,877+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname.getLocalAddresses:251 addresses: [u'192.168.1.61', u'fe80::ae1f:6bff:febc:326a']
2020-04-14 09:46:12,877+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname.test_hostname:464 test_hostname exception
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_setup_lib/hostname.py", line 460, in test_hostname
    not_local_text,
  File "/usr/lib/python2.7/site-packages/ovirt_setup_lib/hostname.py", line 342, in _validateFQDNresolvability
    addresses=resolvedAddressesAsString
RuntimeError: ovirt-node-00.phoelex.com resolves to 64:ff9b::c0a8:13d 192.168.1.61 and not all of them can be mapped to non loopback devices on this host
2020-04-14 09:46:12,884+0000 ERROR otopi.plugins.gr_he_common.network.bridge dialog.queryEnvKey:120 Host name is not valid: ovirt-node-00.phoelex.com resolves to 64:ff9b::c0a8:13d 192.168.1.61 and not all of them can be mapped to non loopback devices on this host

The node I'm running on has an IP address of .61 and resolves correctly.
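A note on that failure: 64:ff9b::/96 is the well-known NAT64 prefix, and 64:ff9b::c0a8:13d embeds 192.168.1.61, so the AAAA answer most likely comes from a DNS64-enabled resolver rather than a record anyone created. To see exactly what the installer resolves (a sketch; the hostname is the one from the log above):

dig +short ovirt-node-00.phoelex.com A
dig +short ovirt-node-00.phoelex.com AAAA    # any answer here must be present on a local interface
ip -6 addr show dev ovirtmgmt                # compare against the AAAA answer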
On Fri, Apr 10, 2020 at 12:55 PM Shareef Jalloq <shareef@jalloq.co.uk> wrote:

Where should I be checking if there are any files/folders not owned by vdsm:kvm? I checked on the mount the HA sits on and it's fine.

How would I go about checking vdsm can access those images? If I run virsh, it lists them and they were running yesterday even though the HA was down. I've since restarted both hosts but the broker is still spitting out the same error (copied below). How do I find the reason the broker can't connect to the storage? The conf file is already at DEBUG verbosity:

[handler_logfile]
class=logging.handlers.TimedRotatingFileHandler
args=('/var/log/ovirt-hosted-engine-ha/broker.log', 'd', 1, 7)
level=DEBUG
formatter=long

And what are all these .prob-<num> files that are being created? There are over 250K of them now on the mount I'm using for the Data Domain. They're all of 0 size and of the form:

/rhev/data-center/mnt/nas-01.phoelex.com:_volume2_vmstore/.prob-ffa867da-93db-4211-82df-b1b04a625ab9

@eevans: The volume I have the Data Domain on has TBs free. The HA is dead so I can't ssh in. No idea what started these errors, and the other VMs were still running happily although they're on a different Data Domain.

Shareef.

MainThread::INFO::2020-04-10 07:45:00,408::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect) Connecting the storage
MainThread::INFO::2020-04-10 07:45:00,408::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2020-04-10 07:45:01,577::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2020-04-10 07:45:02,692::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Refreshing the storage domain
MainThread::WARNING::2020-04-10 07:45:05,175::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed: (code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
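For scale, the probe files can be counted and aged directly on the export (a sketch; the path is the one from Shareef's message):

find /rhev/data-center/mnt/nas-01.phoelex.com:_volume2_vmstore -maxdepth 1 -name '.prob-*' | wc -l
find /rhev/data-center/mnt/nas-01.phoelex.com:_volume2_vmstore -maxdepth 1 -name '.prob-*' -mmin -10 | wc -l    # still being created?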
On April 9, 2020 11:12:30 AM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:

OK, let's go through this. I'm looking at the node that at least still has some VMs running. virsh also tells me that the HostedEngine VM is running, but it's unresponsive and I can't shut it down.

1. All storage domains exist and are mounted.

2. The ha_agent exists:

[root@ovirt-node-01 ovirt-hosted-engine-ha]# ls /rhev/data-center/mnt/nas-01.phoelex.com\:_volume2_vmstore/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/
dom_md  ha_agent  images  master

3. There are two links:

[root@ovirt-node-01 ovirt-hosted-engine-ha]# ll /rhev/data-center/mnt/nas-01.phoelex.com\:_volume2_vmstore/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/ha_agent/
total 8
lrwxrwxrwx. 1 vdsm kvm 132 Apr  2 14:50 hosted-engine.lockspace -> /var/run/vdsm/storage/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/ffb90b82-42fe-4253-85d5-aaec8c280aaf/90e68791-0c6f-406a-89ac-e0d86c631604
lrwxrwxrwx. 1 vdsm kvm 132 Apr  2 14:50 hosted-engine.metadata -> /var/run/vdsm/storage/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/2161aed0-7250-4c1d-b667-ac94f60af17e/6b818e33-f80a-48cc-a59c-bba641e027d4
4. The services exist but all seem to have some sort of warning:

a) Apr 08 18:10:55 ovirt-node-01.phoelex.com sanlock[1728]: 2020-04-08 18:10:55 1744152 [36796]: s16 delta_renew long write time 10 sec

b) Mar 23 18:02:59 ovirt-node-01.phoelex.com supervdsmd[29409]: failed to load module nvdimm: libbd_nvdimm.so.2: cannot open shared object file: No such file or directory

c) Apr 09 08:05:13 ovirt-node-01.phoelex.com vdsm[4801]: ERROR failed to retrieve Hosted Engine HA score '[Errno 2] No such file or directory' Is the Hosted Engine setup finished?

d) Apr 08 22:48:27 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-08 22:48:27.134+0000: 29309: warning : qemuGetProcessInfo:1404 : cannot parse process status data
Apr 08 22:48:27 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-08 22:48:27.134+0000: 29309: error : virNetDevTapInterfaceStats:764 : internal error: /proc/net/dev: Interface not found
Apr 08 23:09:39 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-08 23:09:39.844+0000: 29307: error : virNetSocketReadWire:1806 : End of file while reading data: Input/output error
Apr 09 01:05:26 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-09 01:05:26.660+0000: 29307: error : virNetSocketReadWire:1806 : End of file while reading data: Input/output error

5 & 6. The broker log is continually printing this error:

MainThread::INFO::2020-04-09 08:07:31,438::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.3.6 started
MainThread::DEBUG::2020-04-09 08:07:31,438::broker::55::ovirt_hosted_engine_ha.broker.broker.Broker::(run) Running broker
MainThread::DEBUG::2020-04-09 08:07:31,438::broker::120::ovirt_hosted_engine_ha.broker.broker.Broker::(_get_monitor) Starting monitor
MainThread::INFO::2020-04-09 08:07:31,438::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Searching for submonitors in /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors
MainThread::INFO::2020-04-09 08:07:31,439::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
MainThread::INFO::2020-04-09 08:07:31,440::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
MainThread::INFO::2020-04-09 08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
MainThread::INFO::2020-04-09 08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
MainThread::INFO::2020-04-09 08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
MainThread::INFO::2020-04-09 08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
MainThread::INFO::2020-04-09 08:07:31,442::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
MainThread::INFO::2020-04-09 08:07:31,442::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
MainThread::INFO::2020-04-09 08:07:31,444::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
MainThread::INFO::2020-04-09 08:07:31,444::monitor::50::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Finished loading submonitors
MainThread::DEBUG::2020-04-09 08:07:31,444::broker::128::ovirt_hosted_engine_ha.broker.broker.Broker::(_get_storage_broker) Starting storage broker
MainThread::DEBUG::2020-04-09 08:07:31,444::storage_backends::369::ovirt_hosted_engine_ha.lib.storage_backends::(connect) Connecting to VDSM
MainThread::DEBUG::2020-04-09 08:07:31,444::util::384::ovirt_hosted_engine_ha.lib.storage_backends::(__log_debug) Creating a new json-rpc connection to VDSM
Client localhost:54321::DEBUG::2020-04-09 08:07:31,453::concurrent::258::root::(run) START thread <Thread(Client localhost:54321, started daemon 139992488138496)> (func=<bound method Reactor.process_requests of <yajsonrpc.betterAsyncore.Reactor object at 0x7f528acabc90>>, args=(), kwargs={})
Client localhost:54321::DEBUG::2020-04-09 08:07:31,459::stompclient::138::yajsonrpc.protocols.stomp.AsyncClient::(_process_connected) Stomp connection established
MainThread::DEBUG::2020-04-09 08:07:31,467::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::INFO::2020-04-09 08:07:31,530::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect) Connecting the storage
MainThread::INFO::2020-04-09 08:07:31,531::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::DEBUG::2020-04-09 08:07:31,531::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2020-04-09 08:07:31,534::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2020-04-09 08:07:32,199::storage_server::158::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(_validate_pre_connected_path) Storage domain a6cea67d-dbfb-45cf-a775-b4d0d47b26f2 is not available
MainThread::INFO::2020-04-09 08:07:32,199::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::DEBUG::2020-04-09 08:07:32,199::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2020-04-09 08:07:32,814::storage_server::363::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) [{u'status': 0, u'id': u'e29cf818-5ee5-46e1-85c1-8aeefa33e95d'}]
MainThread::INFO::2020-04-09 08:07:32,814::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Refreshing the storage domain
MainThread::DEBUG::2020-04-09 08:07:32,815::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2020-04-09 08:07:33,129::storage_server::420::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Error refreshing storage domain: Command StorageDomain.getStats with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed: (code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
MainThread::DEBUG::2020-04-09 08:07:33,130::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2020-04-09 08:07:33,795::storage_backends::208::ovirt_hosted_engine_ha.lib.storage_backends::(_get_sector_size) Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed: (code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
MainThread::WARNING::2020-04-09 08:07:33,795::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed: (code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))

The UUID it is moaning about is indeed the one that the HA sits on and is the one I listed the contents of in step 2 above.

So why can't it see this domain?

Thanks, Shareef.
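Two checks that speak to "why can't it see this domain" (editor's sketch; the UUID and path are taken from the messages above, and the vdsm-client invocation assumes a vdsm recent enough to ship that tool):

vdsm-client StorageDomain getInfo storagedomainID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2    # ask vdsm directly; expect the same code=350 failure
ls -L /rhev/data-center/mnt/nas-01.phoelex.com:_volume2_vmstore/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/ha_agent/    # dereferences the step-3 links; a dangling target errors here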
On April 9, 2020 1:51:05 AM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:

Don't know if this is useful or not, but I just tried to shut down and start another VM on one of the hosts and get the following error:

virsh # start scratch
error: Failed to start domain scratch
error: Network not found: no network with matching name 'vdsm-ovirtmgmt'

Is this not referring to the interface name, as the network is called 'ovirtmgmt'?

On Wed, Apr 8, 2020 at 11:35 PM Shareef Jalloq <shareef@jalloq.co.uk> wrote:

Hmmm, virsh tells me the HE is running but it hasn't come up and the agent.log is full of the same errors.

On Wed, Apr 8, 2020 at 11:31 PM Shareef Jalloq <shareef@jalloq.co.uk> wrote:

Ah hah! OK, so I've managed to start it using virsh on the second host but my first host is still dead.

First of all, what are these 56,317 .prob- files that get dumped to the NFS mounts?

Secondly, why doesn't the node mount the NFS directories at boot? Is that the issue with this particular node?

On Wed, Apr 8, 2020 at 11:12 PM <eevans@digitaldatatechs.com> wrote:

Did you try virsh list --inactive

Eric Evans
Digital Data Services LLC.
304.660.9080

From: Shareef Jalloq <shareef@jalloq.co.uk>
Sent: Wednesday, April 8, 2020 5:58 PM
To: Strahil Nikolov <hunter86_bg@yahoo.com>
Cc: Ovirt Users <users@ovirt.org>
Subject: [ovirt-users] Re: ovirt-engine unresponsive - how to rescue?
I've now shut down the VMs on one host and rebooted it, but the agent service doesn't start. If I run 'hosted-engine --vm-status' I get:

The hosted engine configuration has not been retrieved from shared storage. Please ensure that ovirt-ha-agent is running and the storage server is reachable.

and indeed if I list the mounts under /rhev/data-center/mnt, only one of the directories is mounted. I have 3 NFS mounts: one ISO Domain and two Data Domains. Only one Data Domain has mounted, and this has lots of .prob files in it. So why haven't the other NFS exports been mounted?

Manually mounting them doesn't seem to have helped much either. I can start the broker service but the agent service says no. Same error as the one in my last email.

Shareef.
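One way to cross-check which mounts hosted-engine expects versus what is actually mounted (a sketch; the conf path and keys are the usual defaults for an NFS hosted-engine setup, so verify them on your host):

grep -E '^(storage|mnt_options)=' /etc/ovirt-hosted-engine/hosted-engine.conf
findmnt -t nfs,nfs4    # compare against the exports expected under /rhev/data-center/mnt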
In the logs under /var/log/ovirt-hosted-engine-ha I see lots of errors:

broker.log:

MainThread::INFO::2020-04-08 20:56:20,138::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.3.6 started
MainThread::INFO::2020-04-08 20:56:20,138::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Searching for submonitors in /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors
MainThread::INFO::2020-04-08 20:56:20,138::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
MainThread::INFO::2020-04-08 20:56:20,140::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
MainThread::INFO::2020-04-08 20:56:20,140::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
MainThread::INFO::2020-04-08 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
MainThread::INFO::2020-04-08 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
MainThread::INFO::2020-04-08 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
MainThread::INFO::2020-04-08 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
MainThread::INFO::2020-04-08 20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
MainThread::INFO::2020-04-08 20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
MainThread::INFO::2020-04-08 20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
MainThread::INFO::2020-04-08 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
MainThread::INFO::2020-04-08 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
MainThread::INFO::2020-04-08 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
MainThread::INFO::2020-04-08 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
MainThread::INFO::2020-04-08 20:56:20,143::monitor::50::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Finished loading submonitors
MainThread::INFO::2020-04-08 20:56:20,197::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect) Connecting the storage
MainThread::INFO::2020-04-08 20:56:20,197::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2020-04-08 20:56:20,414::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2020-04-08 20:56:20,628::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Refreshing the storage domain
MainThread::WARNING::2020-04-08 20:56:21,057::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed: (code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
MainThread::INFO::2020-04-08 20:56:21,901::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.3.6 started
MainThread::INFO::2020-04-08 20:56:21,901::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Searching for submonitors in /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors

agent.log:

MainThread::ERROR::2020-04-08 20:57:00,799::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent
MainThread::INFO::2020-04-08 20:57:00,799::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
MainThread::INFO::2020-04-08 20:57:11,144::agent::67::ovirt_hosted_engine_ha.agent.agent.Agent::(run) ovirt-hosted-engine-ha agent 2.3.6 started
MainThread::INFO::2020-04-08 20:57:11,182::hosted_engine::234::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Found certificate common name: ovirt-node-01.phoelex.com
MainThread::INFO::2020-04-08 20:57:11,294::hosted_engine::543::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Initializing ha-broker connection
MainThread::INFO::2020-04-08 20:57:11,296::brokerlink::80::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor network, options {'tcp_t_address': '', 'network_test': 'dns', 'tcp_t_port': '', 'addr': '192.168.1.99'}
MainThread::ERROR::2020-04-08 20:57:11,296::hosted_engine::559::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Failed to start necessary monitors
MainThread::ERROR::2020-04-08 20:57:11,297::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent
    return action(he)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper
    return he.start_monitoring()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 432, in start_monitoring
    self._initialize_broker()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 556, in _initialize_broker
    m.get('options', {}))
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 89, in start_monitor
    ).format(t=type, o=options, e=e)
RequestError: brokerlink - failed to start monitor via ovirt-ha-broker: [Errno 2] No such file or directory, [monitor: 'network', options: {'tcp_t_address': '', 'network_test': 'dns', 'tcp_t_port': '', 'addr': '192.168.1.99'}]
MainThread::ERROR::2020-04-08 20:57:11,297::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent
MainThread::INFO::2020-04-08 20:57:11,297::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
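A note on that RequestError: the '[Errno 2] No such file or directory' is the agent failing to reach the broker locally, so the broker side has to be healthy before the agent can start its monitors. A minimal check, with the socket directory being an assumption that may vary by version:

systemctl status ovirt-ha-broker ovirt-ha-agent
# the agent talks to the broker over a local UNIX socket; if the broker
# died at startup, nothing will be listening here
ls -l /var/run/ovirt-hosted-engine-ha/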
On Wed, Apr 8, 2020 at 6:10 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:

This has to be resolved:

Engine status : unknown stale-data

Run 'hosted-engine --vm-status' again. If it remains the same, restart ovirt-ha-broker.service & ovirt-ha-agent.service.

Verify that the engine's storage is available, then monitor the broker & agent logs in /var/log/ovirt-hosted-engine-ha.

Best Regards,
Strahil Nikolov

Hi Shareef,

The activation flow of oVirt is more complex than plain KVM. Mounting of the domains happens during activation of the node (the HostedEngine is activating everything needed).

Focus on the HostedEngine VM. Is it running properly?

If not, try:

1. Verify that the storage domain exists.
2. Check that it has an 'ha_agent' directory.
3. Check that the links are OK; if not, you can safely remove the links.
4. Next check that these services are running:
   A) sanlock
   B) supervdsmd
   C) vdsmd
   D) libvirtd

5. Increase the log level for the broker and agent services:

cd /etc/ovirt-hosted-engine-ha
vim *-log.conf
systemctl restart ovirt-ha-broker ovirt-ha-agent

6. Check what they are complaining about. Keep in mind that the agent will keep throwing errors until the broker stops doing so (the agent depends on the broker), so the broker must be OK before proceeding with the agent log.

About the manual VM start, you need 2 things:

1. Define the VM network:

# cat vdsm-ovirtmgmt.xml
<network>
  <name>vdsm-ovirtmgmt</name>
  <uuid>8ded486e-e681-4754-af4b-5737c2b05405</uuid>
  <forward mode='bridge'/>
  <bridge name='ovirtmgmt'/>
</network>

[root@ovirt1 HostedEngine-RECOVERY]# virsh define vdsm-ovirtmgmt.xml

2. Get an XML definition of the VM, which can be found in the vdsm log: every VM has its configuration printed to the vdsm log on the host it starts on. Save it to a file and then:

A) virsh define myvm.xml
B) virsh start myvm

It seems there is/was a problem with your NFS shares.

Best Regards,
Strahil Nikolov

Hey Shareef,

Check if there are any files or folders not owned by vdsm:kvm. Something like this:

find . -not -user 36 -not -group 36 -print

Also check if vdsm can access the images in the '<vol-mount-point>/images' directories.

Best Regards,
Strahil Nikolov

And the IPv6 address '64:ff9b::c0a8:13d'? I don't see it in the log output.

Best Regards,
Strahil Nikolov

Based on your output, you got a PTR record for IPv4 & IPv6... most probably that's the reason. Set the IPv6 on the interface and try again.

Best Regards,
Strahil Nikolov

Do you have firewalld up and running on the host?

Best Regards,
Strahil Nikolov

I am guessing, but your interface is not assigned to any zone, right? Just add the interface to the default zone (usually 'public').

Best Regards,
Strahil Nikolov
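For reference, binding the interface to a zone is a one-liner; this sketch assumes the management bridge is ovirtmgmt, as in the 'ip addr' output later in the thread:

firewall-cmd --permanent --zone=public --change-interface=ovirtmgmt
firewall-cmd --reload
# verify that a zone is now active
firewall-cmd --get-active-zones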
Keep in mind that there are a lot of playbooks that can be used to deploy a HostedEngine environment via Ansible.

Keep in mind that if you plan to use oVirt in production, you need to know how to debug it (at least at a basic level).

Best Regards,
Strahil Nikolov
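A sketch of the Ansible route Strahil mentions, assuming the ovirt.hosted_engine_setup role (packaged as ovirt-ansible-hosted-engine-setup) is installed; the playbook and variable file names here are illustrative:

# the role ships example playbooks and variable files alongside its README
ansible-playbook -i localhost, he_deploy.yml -e "@he_vars.yml"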

Is this actually production ready? It seems to break at every step. On Wed, Apr 15, 2020 at 5:45 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
On April 15, 2020 5:59:46 PM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:
Thanks for your help but I've decided to try and reinstall from scratch. This is taking too long.
On Wed, Apr 15, 2020 at 3:25 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
Yes, but there are no zones set up, just ports 22, 6801 and 6900.
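A sketch of how to confirm that from the host:

firewall-cmd --state              # is firewalld running at all
firewall-cmd --get-active-zones   # empty output here is exactly what trips the deploy check
firewall-cmd --list-all           # the default zone and its open ports/services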
On Wed, Apr 15, 2020 at 12:37 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
On April 15, 2020 2:28:05 PM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:
Oh, this is painful. It seems to progress only if you both set he_force_ipv4 and run the deployment with the '--4' switch.
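In other words, something along these lines ('--4' is the CLI switch mentioned above; he_force_ipv4 is set separately as a variable on the Ansible side):

hosted-engine --deploy --4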
But then I get a failure when the ansible script checks for firewalld zones and doesn't get anything back. Should the deployment flow not be setting up any zones it needs?
2020-04-15 10:57:25,439+0000 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 TASK [ovirt.hosted_engine_setup : Get active list of active firewalld zones]
2020-04-15 10:57:26,641+0000 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:103 {u'stderr_lines': [], u'changed': True, u'end': u'2020-04-15 10:57:26.481202', u'_ansible_no_log': False, u'stdout': u'', u'cmd': u'set -euo pipefail && firewall-cmd --get-active-zones | grep -v "^\\s*interfaces"', u'start': u'2020-04-15 10:57:26.050203', u'delta': u'0:00:00.430999', u'stderr': u'', u'rc': 1, u'invocation': {u'module_args': {u'creates': None, u'executable': None, u'_uses_shell': True, u'strip_empty_ends': True, u'_raw_params': u'set -euo pipefail && firewall-cmd --get-active-zones | grep -v "^\\s*interfaces"', u'removes': None, u'argv': None, u'warn': True, u'chdir': None, u'stdin_add_newline': True, u'stdin': None}}, u'stdout_lines': [], u'msg': u'non-zero return code'}
2020-04-15 10:57:26,741+0000 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:107 fatal: [localhost]: FAILED! => {"changed": true, "cmd": "set -euo pipefail && firewall-cmd --get-active-zones | grep -v \"^\\s*interfaces\"", "delta": "0:00:00.430999", "end": "2020-04-15 10:57:26.481202", "msg": "non-zero return code", "rc": 1, "start": "2020-04-15 10:57:26.050203", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
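Worth noting what the failing task actually runs -- the rc=1 comes from grep, which exits non-zero when 'firewall-cmd --get-active-zones' prints nothing, i.e. when no interface is bound to any zone:

set -euo pipefail
firewall-cmd --get-active-zones | grep -v "^\s*interfaces"
# with no active zones the pipeline produces no output at all, grep
# returns 1, and pipefail turns that into the task failure seen above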
On Wed, Apr 15, 2020 at 10:23 AM Shareef Jalloq <shareef@jalloq.co.uk> wrote:
Ha, spoke too soon. It's now stuck in a loop and a google points me at
https://bugzilla.redhat.com/show_bug.cgi?id=1746585

However, forcing ipv4 doesn't seem to have fixed the loop.

On Wed, Apr 15, 2020 at 9:59 AM Shareef Jalloq <shareef@jalloq.co.uk> wrote:

OK, that seems to have fixed it, thanks. Is this a side effect of redeploying the HE over a first time install? Nothing has changed in our setup and I didn't need to do this when I initially set up our nodes.

On Tue, Apr 14, 2020 at 6:55 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:

On April 14, 2020 6:17:17 PM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:

Hmmm, we're not using ipv6. Is that the issue?

On Tue, Apr 14, 2020 at 3:56 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:

On April 14, 2020 1:27:24 PM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:

Right, I've given up on recovering the HE so want to try and redeploy it. There doesn't seem to be enough information to debug why the broker/agent won't start cleanly.
In running 'hosted-engine --deploy', I'm seeing the following error in the setup validation phase:

2020-04-14 09:46:08,922+0000 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND Please provide the hostname of this host on the management network [ovirt-node-00.phoelex.com]:

2020-04-14 09:46:12,831+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname.getResolvedAddresses:432 getResolvedAddresses: set(['64:ff9b::c0a8:13d', '192.168.1.61'])

2020-04-14 09:46:12,832+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname._validateFQDNresolvability:289 ovirt-node-00.phoelex.com resolves to: set(['64:ff9b::c0a8:13d', '192.168.1.61'])

2020-04-14 09:46:12,832+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:813 execute: ['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com', 'ANY'], executable='None', cwd='None', env=None

2020-04-14 09:46:12,871+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:863 execute-result: ['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com', 'ANY'], rc=0

2020-04-14 09:46:12,872+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:921 execute-output: ['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com', 'ANY'] stdout:
ovirt-node-00.phoelex.com. 86400 IN A 192.168.1.61

2020-04-14 09:46:12,872+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:926 execute-output: ['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com', 'ANY'] stderr:

2020-04-14 09:46:12,872+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:813 execute: ('/usr/sbin/ip', 'addr'), executable='None', cwd='None', env=None

2020-04-14 09:46:12,876+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:863 execute-result: ('/usr/sbin/ip', 'addr'), rc=0

2020-04-14 09:46:12,876+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:921 execute-output: ('/usr/sbin/ip', 'addr') stdout:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovirtmgmt state UP group default qlen 1000
    link/ether ac:1f:6b:bc:32:6a brd ff:ff:ff:ff:ff:ff
3: eno2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether ac:1f:6b:bc:32:6b brd ff:ff:ff:ff:ff:ff
4: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 02:e6:e2:80:93:8d brd ff:ff:ff:ff:ff:ff
5: br-int: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 8a:26:44:50:ee:4a brd ff:ff:ff:ff:ff:ff
21: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether ac:1f:6b:bc:32:6a brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.61/24 brd 192.168.1.255 scope global ovirtmgmt
       valid_lft forever preferred_lft forever
    inet6 fe80::ae1f:6bff:febc:326a/64 scope link
       valid_lft forever preferred_lft forever
22: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 3a:02:7b:7d:b3:2a brd ff:ff:ff:ff:ff:ff

2020-04-14 09:46:12,876+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:926 execute-output: ('/usr/sbin/ip', 'addr') stderr:

2020-04-14 09:46:12,877+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname.getLocalAddresses:251 addresses: [u'192.168.1.61', u'fe80::ae1f:6bff:febc:326a']

2020-04-14 09:46:12,877+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname.test_hostname:464 test_hostname exception
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_setup_lib/hostname.py", line 460, in test_hostname
    not_local_text,
  File "/usr/lib/python2.7/site-packages/ovirt_setup_lib/hostname.py", line 342, in _validateFQDNresolvability
    addresses=resolvedAddressesAsString
RuntimeError: ovirt-node-00.phoelex.com resolves to 64:ff9b::c0a8:13d 192.168.1.61 and not all of them can be mapped to non loopback devices on this host

2020-04-14 09:46:12,884+0000 ERROR otopi.plugins.gr_he_common.network.bridge dialog.queryEnvKey:120 Host name is not valid: ovirt-node-00.phoelex.com resolves to 64:ff9b::c0a8:13d 192.168.1.61 and not all of them can be mapped to non loopback devices on this host

The node I'm running on has an IP address of .61 and resolves correctly.
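A side note on that address: 64:ff9b::/96 is the well-known NAT64 prefix, and 64:ff9b::c0a8:13d simply embeds 192.168.1.61 (0xc0a8013d), so the AAAA record is almost certainly synthesized by a DNS64-enabled resolver rather than configured anywhere. A quick way to see exactly what the validator sees:

getent ahosts ovirt-node-00.phoelex.com   # every address returned must map to a local, non-loopback device
ip -6 addr show dev ovirtmgmt             # only a link-local address here, so the synthesized AAAA can never match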
On Fri, Apr 10, 2020 at 12:55 PM Shareef Jalloq <shareef@jalloq.co.uk> wrote:

Where should I be checking if there are any files/folders not owned by vdsm:kvm? I checked on the mount the HA sits on and it's fine.

How would I go about checking vdsm can access those images? If I run virsh, it lists them and they were running yesterday even though the HA was down. I've since restarted both hosts but the broker is still spitting out the same error (copied below). How do I find the reason the broker can't connect to the storage? The conf file is already at DEBUG verbosity:

[handler_logfile]
class=logging.handlers.TimedRotatingFileHandler
args=('/var/log/ovirt-hosted-engine-ha/broker.log', 'd', 1, 7)
level=DEBUG
formatter=long

And what are all these .prob-<num> files that are being created? There are over 250K of them now on the mount I'm using for the Data domain. They're all of 0 size and of the form:
/rhev/data-center/mnt/nas-01.phoelex.com:_volume2_vmstore/.prob-ffa867da-93db-4211-82df-b1b04a625ab9

@eevans: The volume I have the Data Domain on has TBs free. The HA is dead so I can't ssh in. No idea what started these errors, and the other VMs were still running happily although they're on a different Data Domain.

Shareef.
MainThread::INFO::2020-04-10 07:45:00,408::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect) Connecting the storage
MainThread::INFO::2020-04-10 07:45:00,408::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2020-04-10 07:45:01,577::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2020-04-10 07:45:02,692::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Refreshing the storage domain
MainThread::WARNING::2020-04-10 07:45:05,175::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed: (code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))

On Thu, Apr 9, 2020 at 5:58 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:

On April 9, 2020 11:12:30 AM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:

OK, let's go through this. I'm looking at the node that at least still has some VMs running. virsh also tells me that the HostedEngine VM is running, but it's unresponsive and I can't shut it down.

1. All storage domains exist and are mounted.

2. The ha_agent exists:

[root@ovirt-node-01 ovirt-hosted-engine-ha]# ls /rhev/data-center/mnt/nas-01.phoelex.com\:_volume2_vmstore/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/
dom_md  ha_agent  images  master

3. There are two links:

[root@ovirt-node-01 ovirt-hosted-engine-ha]# ll /rhev/data-center/mnt/nas-01.phoelex.com\:_volume2_vmstore/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/ha_agent/
total 8
lrwxrwxrwx. 1 vdsm kvm 132 Apr  2 14:50 hosted-engine.lockspace -> /var/run/vdsm/storage/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/ffb90b82-42fe-4253-85d5-aaec8c280aaf/90e68791-0c6f-406a-89ac-e0d86c631604
lrwxrwxrwx. 1 vdsm kvm 132 Apr  2 14:50 hosted-engine.metadata -> /var/run/vdsm/storage/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/2161aed0-7250-4c1d-b667-ac94f60af17e/6b818e33-f80a-48cc-a59c-bba641e027d4

4. The services exist but all seem to have some sort of warning:

a) Apr 08 18:10:55 ovirt-node-01.phoelex.com sanlock[1728]: 2020-04-08 18:10:55 1744152 [36796]: s16 delta_renew long write time 10 sec

b) Mar 23 18:02:59 ovirt-node-01.phoelex.com supervdsmd[29409]: failed to load module nvdimm: libbd_nvdimm.so.2: cannot open shared object file: No such file or directory

c) Apr 09 08:05:13 ovirt-node-01.phoelex.com vdsm[4801]: ERROR failed to retrieve Hosted Engine HA score '[Errno 2] No such file or directory' Is the Hosted Engine setup finished?

d) Apr 08 22:48:27 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-08 22:48:27.134+0000: 29309: warning : qemuGetProcessInfo:1404 : cannot parse process status data
Apr 08 22:48:27 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-08 22:48:27.134+0000: 29309: error : virNetDevTapInterfaceStats:764 : internal error: /proc/net/dev: Interface not found
Apr 08 23:09:39 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-08 23:09:39.844+0000: 29307: error : virNetSocketReadWire:1806 : End of file while reading data: Input/output error
Apr 09 01:05:26 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-09 01:05:26.660+0000: 29307: error : virNetSocketReadWire:1806 : End of file while reading data: Input/output error

5 & 6. The broker log is continually printing this error:
MainThread::INFO::2020-04-09 08:07:31,438::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.3.6 started
MainThread::DEBUG::2020-04-09 08:07:31,438::broker::55::ovirt_hosted_engine_ha.broker.broker.Broker::(run) Running broker
MainThread::DEBUG::2020-04-09 08:07:31,438::broker::120::ovirt_hosted_engine_ha.broker.broker.Broker::(_get_monitor) Starting monitor
MainThread::INFO::2020-04-09 08:07:31,438::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Searching for submonitors in /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors
MainThread::INFO::2020-04-09 08:07:31,439::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
MainThread::INFO::2020-04-09 08:07:31,440::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
MainThread::INFO::2020-04-09 08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
MainThread::INFO::2020-04-09 08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
MainThread::INFO::2020-04-09 08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
MainThread::INFO::2020-04-09 08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
MainThread::INFO::2020-04-09 08:07:31,442::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
MainThread::INFO::2020-04-09 08:07:31,442::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
MainThread::INFO::2020-04-09 08:07:31,444::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
MainThread::INFO::2020-04-09 08:07:31,444::monitor::50::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Finished loading submonitors
MainThread::DEBUG::2020-04-09 08:07:31,444::broker::128::ovirt_hosted_engine_ha.broker.broker.Broker::(_get_storage_broker) Starting storage broker
MainThread::DEBUG::2020-04-09 08:07:31,444::storage_backends::369::ovirt_hosted_engine_ha.lib.storage_backends::(connect) Connecting to VDSM
MainThread::DEBUG::2020-04-09 08:07:31,444::util::384::ovirt_hosted_engine_ha.lib.storage_backends::(__log_debug) Creating a new json-rpc connection to VDSM
Client localhost:54321::DEBUG::2020-04-09 08:07:31,453::concurrent::258::root::(run) START thread <Thread(Client localhost:54321, started daemon 139992488138496)> (func=<bound method Reactor.process_requests of <yajsonrpc.betterAsyncore.Reactor object at 0x7f528acabc90>>, args=(), kwargs={})
Client localhost:54321::DEBUG::2020-04-09 08:07:31,459::stompclient::138::yajsonrpc.protocols.stomp.AsyncClient::(_process_connected) Stomp connection established
MainThread::DEBUG::2020-04-09 08:07:31,467::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::INFO::2020-04-09 08:07:31,530::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect) Connecting the storage
MainThread::INFO::2020-04-09 08:07:31,531::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::DEBUG::2020-04-09 08:07:31,531::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2020-04-09 08:07:31,534::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2020-04-09 08:07:32,199::storage_server::158::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(_validate_pre_connected_path) Storage domain a6cea67d-dbfb-45cf-a775-b4d0d47b26f2 is not available
MainThread::INFO::2020-04-09 08:07:32,199::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::DEBUG::2020-04-09 08:07:32,199::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2020-04-09 08:07:32,814::storage_server::363::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) [{u'status': 0, u'id': u'e29cf818-5ee5-46e1-85c1-8aeefa33e95d'}]
MainThread::INFO::2020-04-09 08:07:32,814::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Refreshing the storage domain
MainThread::DEBUG::2020-04-09 08:07:32,815::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2020-04-09 08:07:33,129::storage_server::420::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Error refreshing storage domain: Command StorageDomain.getStats with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed: (code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
MainThread::DEBUG::2020-04-09 08:07:33,130::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2020-04-09 08:07:33,795::storage_backends::208::ovirt_hosted_engine_ha.lib.storage_backends::(_get_sector_size) Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed: (code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
MainThread::WARNING::2020-04-09 08:07:33,795::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed: (code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))

The UUID it is moaning about is indeed the one that the HA sits on, and is the one I listed the contents of in step 2 above.

So why can't it see this domain?

Thanks, Shareef.
I've now shut down the VMs on one host and rebooted it but the agent service doesn't start. If I run 'hosted-engine --vm-status' I get:

The hosted engine configuration has not been retrieved from shared storage. Please ensure that ovirt-ha-agent is running and the storage server is reachable.

and indeed if I list the mounts under /rhev/data-center/mnt, only one of the directories is mounted. I have 3 NFS mounts, one ISO Domain and two Data Domains. Only one Data Domain has mounted and this has lots of .prob files in. So why haven't the other NFS exports been mounted?

Manually mounting them doesn't seem to have helped much either. I can start the broker service but the agent service says no. Same error as the one in my last email.

Shareef.

On Wed, Apr 8, 2020 at 9:57 PM Shareef Jalloq <shareef@jalloq.co.uk> wrote:

Right, still down. I've run virsh and it doesn't know anything about the engine vm.

I've restarted the broker and agent services and I still get nothing in virsh->list.

In the logs under /var/log/ovirt-hosted-engine-ha I see lots of errors:

broker.log:

MainThread::INFO::2020-04-08
20:56:20,138::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.3.6 started
MainThread::INFO::2020-04-08 20:56:20,138::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Searching for submonitors in /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors
MainThread::INFO::2020-04-08 20:56:20,138::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
MainThread::INFO::2020-04-08 20:56:20,140::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
MainThread::INFO::2020-04-08 20:56:20,140::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
MainThread::INFO::2020-04-08 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
MainThread::INFO::2020-04-08 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
MainThread::INFO::2020-04-08 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
MainThread::INFO::2020-04-08 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
MainThread::INFO::2020-04-08 20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
MainThread::INFO::2020-04-08 20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
MainThread::INFO::2020-04-08 20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
MainThread::INFO::2020-04-08 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
MainThread::INFO::2020-04-08 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
MainThread::INFO::2020-04-08 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
MainThread::INFO::2020-04-08 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
MainThread::INFO::2020-04-08 20:56:20,143::monitor::50::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Finished loading submonitors
MainThread::INFO::2020-04-08 20:56:20,197::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect) Connecting the storage
MainThread::INFO::2020-04-08 20:56:20,197::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2020-04-08 20:56:20,414::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2020-04-08 20:56:20,628::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Refreshing the storage domain
MainThread::WARNING::2020-04-08 20:56:21,057::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed: (code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
MainThread::INFO::2020-04-08 20:56:21,901::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.3.6 started
MainThread::INFO::2020-04-08 20:56:21,901::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Searching for submonitors in /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors

agent.log:

MainThread::ERROR::2020-04-08 20:57:00,799::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent
MainThread::INFO::2020-04-08 20:57:00,799::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
MainThread::INFO::2020-04-08 20:57:11,144::agent::67::ovirt_hosted_engine_ha.agent.agent.Agent::(run) ovirt-hosted-engine-ha agent 2.3.6 started
MainThread::INFO::2020-04-08 20:57:11,182::hosted_engine::234::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Found certificate common name: ovirt-node-01.phoelex.com
MainThread::INFO::2020-04-08 20:57:11,294::hosted_engine::543::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Initializing ha-broker connection
MainThread::INFO::2020-04-08 20:57:11,296::brokerlink::80::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor network, options {'tcp_t_address': '', 'network_test': 'dns', 'tcp_t_port': '', 'addr': '192.168.1.99'}
MainThread::ERROR::2020-04-08 20:57:11,296::hosted_engine::559::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Failed to start necessary monitors
MainThread::ERROR::2020-04-08 20:57:11,297::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent
    return action(he)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper
    return he.start_monitoring()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 432, in start_monitoring
    self._initialize_broker()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 556, in _initialize_broker
    m.get('options', {}))
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 89, in start_monitor
    ).format(t=type, o=options, e=e)
RequestError: brokerlink - failed to start monitor via ovirt-ha-broker: [Errno 2] No such file or directory, [monitor: 'network', options: {'tcp_t_address': '', 'network_test': 'dns', 'tcp_t_port': '', 'addr': '192.168.1.99'}]
MainThread::ERROR::2020-04-08 20:57:11,297::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent
MainThread::INFO::2020-04-08 20:57:11,297::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
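A note on that RequestError: '[Errno 2] No such file or directory' from brokerlink generally means the agent could not reach the broker's listening socket, so the broker itself is worth checking before chasing the storage error. A minimal sanity check, assuming a systemd host (the socket path differs between releases, so this only confirms the service and its sockets):

systemctl status ovirt-ha-broker                    # should be active (running)
journalctl -u ovirt-ha-broker --since "1 hour ago"  # startup errors, if any
ss -xlp | grep -i ovirt                             # UNIX sockets the broker is listening on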
On Wed, Apr 8, 2020 at 6:10 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:

This has to be resolved:

Engine status : unknown stale-data

Run 'hosted-engine --vm-status' again. If it remains the same, restart ovirt-ha-broker.service & ovirt-ha-agent.service.

Verify that the engine's storage is available, then monitor the broker & agent logs in /var/log/ovirt-hosted-engine-ha.

Best Regards,
Strahil Nikolov

On Thu, Apr 9, 2020 at 6:12 AM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:

Hi Shareef,

The activation flow of oVirt is more complex than plain KVM's. Mounting of the domains happens during the activation of the node (the HostedEngine is activating everything needed).

Focus on the HostedEngine VM. Is it running properly?

If not, try:
1. Verify that the storage domain exists
2. Check if it has an 'ha_agent' directory
3. Check if the links are OK; if not, you can safely remove the links
4. Next check the services are running:
   A) sanlock
   B) supervdsmd
   C) vdsmd
   D) libvirtd
5. Increase the log level for the broker and agent services:

cd /etc/ovirt-hosted-engine-ha
vim *-log.conf
systemctl restart ovirt-ha-broker ovirt-ha-agent

6. Check what they are complaining about

Keep in mind that the agent will keep throwing errors until the broker stops doing it (the agent depends on the broker), so the broker must be OK before proceeding with the agent log.

About the manual VM start, you need 2 things:

1. Define the VM network:

# cat vdsm-ovirtmgmt.xml
<network>
  <name>vdsm-ovirtmgmt</name>
  <uuid>8ded486e-e681-4754-af4b-5737c2b05405</uuid>
  <forward mode='bridge'/>
  <bridge name='ovirtmgmt'/>
</network>

[root@ovirt1 HostedEngine-RECOVERY]# virsh net-define vdsm-ovirtmgmt.xml
[root@ovirt1 HostedEngine-RECOVERY]# virsh net-start vdsm-ovirtmgmt

(Note: libvirt networks are defined with 'virsh net-define', not 'virsh define', and must be started after being defined.)

2. Get an XML definition, which can be found in the vdsm log. Every VM at startup has its configuration printed out in the vdsm log on the host it starts on. Save it to a file and then:
A) virsh define myvm.xml
B) virsh start myvm

It seems there is/was a problem with your NFS shares.

Best Regards,
Strahil Nikolov

On Thu, Apr 9, 2020 at 5:58 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:

Hey Shareef,

Check if there are any files or folders not owned by vdsm:kvm. Something like this:

find . -not -user 36 -not -group 36 -print

Also check if vdsm can access the images in the '<vol-mount-point>/images' directories.

Best Regards,
Strahil Nikolov
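For the ownership check above: uid/gid 36 is vdsm:kvm on oVirt hosts, so the find is matching anything not owned by that pair. A sketch of the check plus fix, assuming you run it from the top of the domain's NFS mount (the <...> parts are placeholders for your paths):

cd /rhev/data-center/mnt/<your-nfs-mount>
find . -not -user 36 -not -group 36 -print   # anything printed has the wrong ownership
chown -R 36:36 <offending-path>              # 36:36 == vdsm:kvm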
And the IPv6 address '64:ff9b::c0a8:13d'? I don't see it in the log output.

Best Regards,
Strahil Nikolov

Based on your output, you got a PTR record for IPv4 & IPv6 ... most probably that's the reason.

Set the IPv6 on the interface and try again.

Best Regards,
Strahil Nikolov
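A side note on that address: 64:ff9b::/96 is the well-known NAT64 prefix, and the low bits c0a8:013d are just 192.168.1.61 in hex, so the AAAA record looks like DNS64 synthesis by the resolver rather than an address actually configured on the host. A quick way to see what the resolver hands back (hostnames as in the logs above):

dig +short ovirt-node-00.phoelex.com A
dig +short ovirt-node-00.phoelex.com AAAA   # a 64:ff9b:: answer here suggests DNS64
dig +short -x 64:ff9b::c0a8:13d             # reverse lookup of the synthesized address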
Do you have firewalld up and running on the host?
Best Regards, Strahil Nikolov
I am guessing, but your interface is not assigned to any zone, right? Just add the interface to the default zone (usually 'public').
Best Regards, Strahil Nikolov
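A sketch of that zone fix, assuming the management bridge is named ovirtmgmt and the default zone is 'public' (adjust both to your setup):

firewall-cmd --get-active-zones                               # empty in this situation
firewall-cmd --permanent --zone=public --change-interface=ovirtmgmt
firewall-cmd --reload
firewall-cmd --get-active-zones                               # should now list public with ovirtmgmt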
Keep in mind that there are a lot of playbooks that can be used to deploy a HostedEngine environment via Ansible.
Keep in mind that if you plan to use oVirt in production, you need to know how to debug it (at least at a basic level).
Best Regards, Strahil Nikolov

On April 16, 2020 11:25:20 AM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:
Is this actually production ready? It seems to break at every step.
On Wed, Apr 15, 2020 at 5:45 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
Thanks for your help but I've decided to try and reinstall from scratch. This is taking too long.
On Wed, Apr 15, 2020 at 3:25 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
Yes, but there are no zones set up, just ports 22, 6801 and 6900.
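For reference, firewalld can report its own view of this directly; the two commands below show the default zone and everything attached to it (interfaces, sources, ports, services):

firewall-cmd --get-default-zone
firewall-cmd --list-all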
On Wed, Apr 15, 2020 at 12:37 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
On April 15, 2020 2:28:05 PM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:

Oh this is painful. It seems to progress if you have both he_force_ipv4 set and run the deployment with the '--4' switch.

But then I get a failure when the ansible script checks for firewalld zones and doesn't get anything back. Should the deployment flow not be setting any zones it needs?

2020-04-15 10:57:25,439+0000 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 TASK [ovirt.hosted_engine_setup : Get active list of active firewalld zones]
2020-04-15 10:57:26,641+0000 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:103 {u'stderr_lines': [], u'changed': True, u'end': u'2020-04-15 10:57:26.481202', u'_ansible_no_log': False, u'stdout': u'', u'cmd': u'set -euo pipefail && firewall-cmd --get-active-zones | grep -v "^\\s*interfaces"', u'start': u'2020-04-15 10:57:26.050203', u'delta': u'0:00:00.430999', u'stderr': u'', u'rc': 1, u'invocation': {u'module_args': {u'creates': None, u'executable': None, u'_uses_shell': True, u'strip_empty_ends': True, u'_raw_params': u'set -euo pipefail && firewall-cmd --get-active-zones | grep -v "^\\s*interfaces"', u'removes': None, u'argv': None, u'warn': True, u'chdir': None, u'stdin_add_newline': True, u'stdin': None}}, u'stdout_lines': [], u'msg': u'non-zero return code'}
2020-04-15 10:57:26,741+0000 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:107 fatal: [localhost]: FAILED! => {"changed": true, "cmd": "set -euo pipefail && firewall-cmd --get-active-zones | grep -v \"^\\s*interfaces\"", "delta": "0:00:00.430999", "end": "2020-04-15 10:57:26.481202", "msg": "non-zero return code", "rc": 1, "start": "2020-04-15 10:57:26.050203", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
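The failing check is easier to read once unpacked: with no active zones, firewall-cmd prints nothing, grep then selects no lines and exits 1, and 'set -euo pipefail' turns that into the "non-zero return code" failure ansible reports. Reproducing it by hand (a sketch, using the same pipeline as the task above):

firewall-cmd --get-active-zones                               # prints nothing when no zone is active
firewall-cmd --get-active-zones | grep -v "^\s*interfaces"
echo $?                                                       # 1 -> exactly the failure above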
On Wed, Apr 15, 2020 at 10:23 AM Shareef Jalloq <shareef@jalloq.co.uk> wrote:

Ha, spoke too soon. It's now stuck in a loop and a google search points me at https://bugzilla.redhat.com/show_bug.cgi?id=1746585

However, forcing ipv4 doesn't seem to have fixed the loop.

On Wed, Apr 15, 2020 at 9:59 AM Shareef Jalloq <shareef@jalloq.co.uk> wrote:

OK, that seems to have fixed it, thanks. Is this a side effect of redeploying the HE over a first-time install? Nothing has changed in our setup and I didn't need to do this when I initially set up our nodes.

On Tue, Apr 14, 2020 at 6:55 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:

On April 14, 2020 6:17:17 PM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:

Hmmm, we're not using ipv6. Is that the issue?

On Tue, Apr 14, 2020 at 3:56 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:

On April 14, 2020 1:27:24 PM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:
Right, I've given up on recovering the HE so want to try and redeploy it. There doesn't seem to be enough information to debug why the broker/agent won't start cleanly.

In running 'hosted-engine --deploy', I'm seeing the following error in the setup validation phase:

2020-04-14 09:46:08,922+0000 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND Please provide the hostname of this host on the management network [ovirt-node-00.phoelex.com]:
2020-04-14 09:46:12,831+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname.getResolvedAddresses:432 getResolvedAddresses: set(['64:ff9b::c0a8:13d', '192.168.1.61'])
2020-04-14 09:46:12,832+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname._validateFQDNresolvability:289 ovirt-node-00.phoelex.com resolves to: set(['64:ff9b::c0a8:13d', '192.168.1.61'])
2020-04-14 09:46:12,832+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:813 execute: ['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com', 'ANY'], executable='None', cwd='None', env=None
2020-04-14 09:46:12,871+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:863 execute-result: ['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com', 'ANY'], rc=0
2020-04-14 09:46:12,872+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:921 execute-output: ['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com', 'ANY'] stdout:
ovirt-node-00.phoelex.com. 86400 IN A 192.168.1.61
2020-04-14 09:46:12,872+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:926 execute-output: ['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com', 'ANY'] stderr:
2020-04-14 09:46:12,872+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:813 execute: ('/usr/sbin/ip', 'addr'), executable='None', cwd='None', env=None
2020-04-14 09:46:12,876+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:863 execute-result: ('/usr/sbin/ip', 'addr'), rc=0
2020-04-14 09:46:12,876+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:921 execute-output: ('/usr/sbin/ip', 'addr') stdout:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovirtmgmt state UP group default qlen 1000
    link/ether ac:1f:6b:bc:32:6a brd ff:ff:ff:ff:ff:ff
3: eno2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether ac:1f:6b:bc:32:6b brd ff:ff:ff:ff:ff:ff
4: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 02:e6:e2:80:93:8d brd ff:ff:ff:ff:ff:ff
5: br-int: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 8a:26:44:50:ee:4a brd ff:ff:ff:ff:ff:ff
21: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether ac:1f:6b:bc:32:6a brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.61/24 brd 192.168.1.255 scope global ovirtmgmt
       valid_lft forever preferred_lft forever
    inet6 fe80::ae1f:6bff:febc:326a/64 scope link
       valid_lft forever preferred_lft forever
22: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 3a:02:7b:7d:b3:2a brd ff:ff:ff:ff:ff:ff
2020-04-14 09:46:12,876+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:926 execute-output: ('/usr/sbin/ip', 'addr') stderr:
2020-04-14 09:46:12,877+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname.getLocalAddresses:251 addresses: [u'192.168.1.61', u'fe80::ae1f:6bff:febc:326a']
2020-04-14 09:46:12,877+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname.test_hostname:464 test_hostname exception
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_setup_lib/hostname.py", line 460, in test_hostname
    not_local_text,
  File "/usr/lib/python2.7/site-packages/ovirt_setup_lib/hostname.py", line 342, in _validateFQDNresolvability
    addresses=resolvedAddressesAsString
RuntimeError: ovirt-node-00.phoelex.com resolves to 64:ff9b::c0a8:13d 192.168.1.61 and not all of them can be mapped to non loopback devices on this host
2020-04-14 09:46:12,884+0000 ERROR otopi.plugins.gr_he_common.network.bridge dialog.queryEnvKey:120 Host name is not valid: ovirt-node-00.phoelex.com resolves to 64:ff9b::c0a8:13d 192.168.1.61 and not all of them can be mapped to non loopback devices on this host

The node I'm running on has an IP address of .61 and resolves correctly.

On Fri, Apr 10, 2020 at 12:55 PM Shareef Jalloq <shareef@jalloq.co.uk> wrote:

Where should I be checking if there are any files/folders not owned by vdsm:kvm? I checked on the mount the HA sits on and it's fine.

How would I go about checking vdsm can access those images? If I run virsh, it lists them and they were running yesterday even though the HA was down. I've since restarted both hosts but the broker is still spitting out the same error (copied below). How do I find the reason the broker can't connect to the storage? The conf file is already at DEBUG verbosity:

[handler_logfile]
class=logging.handlers.TimedRotatingFileHandler
args=('/var/log/ovirt-hosted-engine-ha/broker.log', 'd', 1, 7)
level=DEBUG
formatter=long

And what are all these .prob-<num> files that are being created? There are over 250K of them now on the mount I'm using for the Data domain. They're all of 0 size and of the form /rhev/data-center/mnt/nas-01.phoelex.com:_volume2_vmstore/.prob-ffa867da-93db-4211-82df-b1b04a625ab9

@eevans: The volume I have the Data Domain on has TBs free. The HA is dead so I can't ssh in. No idea what started these errors, and the other VMs were still running happily although they're on a different Data Domain.

Shareef.

MainThread::INFO::2020-04-10 07:45:00,408::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect) Connecting the storage
MainThread::INFO::2020-04-10 07:45:00,408::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2020-04-10 07:45:01,577::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2020-04-10 07:45:02,692::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Refreshing the storage domain
MainThread::WARNING::2020-04-10 07:45:05,175::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed: (code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
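To chase the (code=350) error from the host side, it can help to ask vdsm directly and to prove the export is mounted and writable as the vdsm user. A sketch, assuming vdsm-client is available (older releases shipped vdsClient instead) and reusing the domain UUID and mount from the logs above:

vdsm-client StorageDomain getInfo storagedomainID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2
grep nas-01 /proc/mounts                    # is the export actually mounted?
sudo -u vdsm touch '/rhev/data-center/mnt/nas-01.phoelex.com:_volume2_vmstore/write.test'
sudo -u vdsm rm '/rhev/data-center/mnt/nas-01.phoelex.com:_volume2_vmstore/write.test'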
\:_volume2_vmstore/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/ha_agent/
>>>> >> >>> > >>>> >> >>> >total 8 >>>> >> >>> > >>>> >> >>> >lrwxrwxrwx. 1 vdsm kvm 132 Apr 2 14:50 >hosted-engine.lockspace >>>> >-> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>>
>/var/run/vdsm/storage/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/ffb90b82-42fe-4253-85d5-aaec8c280aaf/90e68791-0c6f-406a-89ac-e0d86c631604 >>>> >> >>> > >>>> >> >>> >lrwxrwxrwx. 1 vdsm kvm 132 Apr 2 14:50 >hosted-engine.metadata >>>> >-> >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>>
>/var/run/vdsm/storage/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/2161aed0-7250-4c1d-b667-ac94f60af17e/6b818e33-f80a-48cc-a59c-bba641e027d4 >>>> >> >>> > >>>> >> >>> >4. The services exist but all seem to have some sort of >warning: >>>> >> >>> > >>>> >> >>> >a) Apr 08 18:10:55 ovirt-node-01.phoelex.com sanlock[1728]: >>>> >> >*2020-04-08 >>>> >> >>> >18:10:55 1744152 [36796]: s16 delta_renew long write time >10 >>>> >sec* >>>> >> >>> > >>>> >> >>> >b) Mar 23 18:02:59 ovirt-node-01.phoelex.com >supervdsmd[29409]: >>>> >> >*failed >>>> >> >>> >to >>>> >> >>> >load module nvdimm: libbd_nvdimm.so.2: cannot open shared >object >>>> >> >file: >>>> >> >>> >No >>>> >> >>> >such file or directory* >>>> >> >>> > >>>> >> >>> >c) Apr 09 08:05:13 ovirt-node-01.phoelex.com vdsm[4801]: >*ERROR >>>> >> >failed >>>> >> >>> >to >>>> >> >>> >retrieve Hosted Engine HA score '[Errno 2] No such file or >>>> >> >directory'Is >>>> >> >>> >the >>>> >> >>> >Hosted Engine setup finished?* >>>> >> >>> > >>>> >> >>> >d)Apr 08 22:48:27 ovirt-node-01.phoelex.com >libvirtd[29307]: >>>> >> >2020-04-08 >>>> >> >>> >22:48:27.134+0000: 29309: warning : qemuGetProcessInfo:1404 >: >>>> >> >cannot >>>> >> >>> >parse >>>> >> >>> >process status data >>>> >> >>> > >>>> >> >>> >Apr 08 22:48:27 ovirt-node-01.phoelex.com libvirtd[29307]: >>>> >> >2020-04-08 >>>> >> >>> >22:48:27.134+0000: 29309: error : >virNetDevTapInterfaceStats:764 >>>> >: >>>> >> >>> >internal >>>> >> >>> >error: /proc/net/dev: Interface not found >>>> >> >>> > >>>> >> >>> >Apr 08 23:09:39 ovirt-node-01.phoelex.com libvirtd[29307]: >>>> >> >2020-04-08 >>>> >> >>> >23:09:39.844+0000: 29307: error : virNetSocketReadWire:1806 >: >>>> >End >>>> >> >of >>>> >> >>> >file >>>> >> >>> >while reading data: Input/output error >>>> >> >>> > >>>> >> >>> >Apr 09 01:05:26 ovirt-node-01.phoelex.com libvirtd[29307]: >>>> >> >2020-04-09 >>>> >> >>> >01:05:26.660+0000: 29307: error : virNetSocketReadWire:1806 >: >>>> >End >>>> >> >of >>>> >> >>> >file >>>> >> >>> >while reading data: Input/output error >>>> >> >>> > >>>> >> >>> >5 & 6. The broker log is continually printing this error: >>>> >> >>> > >>>> >> >>> >MainThread::INFO::2020-04-09 >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>>
>08:07:31,438::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) >>>> >> >>> >ovirt-hosted-engine-ha broker 2.3.6 started >>>> >> >>> > >>>> >> >>> >MainThread::DEBUG::2020-04-09 >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>>
>08:07:31,438::broker::55::ovirt_hosted_engine_ha.broker.broker.Broker::(run) >>>> >> >>> >Running broker >>>> >> >>> > >>>> >> >>> >MainThread::DEBUG::2020-04-09 >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>>
>08:07:31,438::broker::120::ovirt_hosted_engine_ha.broker.broker.Broker::(_get_monitor) >>>> >> >>> >Starting monitor >>>> >> >>> > >>>> >> >>> >MainThread::INFO::2020-04-09 >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>>
>08:07:31,438::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >Searching for submonitors in >>>> >> >>>
/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker
>>>> >> >>> > >>>> >> >>> >/submonitors >>>> >> >>> > >>>> >> >>> >MainThread::INFO::2020-04-09 >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>>
>08:07:31,439::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >Loaded submonitor network >>>> >> >>> > >>>> >> >>> >MainThread::INFO::2020-04-09 >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>>
>08:07:31,440::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >Loaded submonitor cpu-load-no-engine >>>> >> >>> > >>>> >> >>> >MainThread::INFO::2020-04-09 >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>>
>08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >Loaded submonitor mgmt-bridge >>>> >> >>> > >>>> >> >>> >MainThread::INFO::2020-04-09 >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>>
>08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >Loaded submonitor network >>>> >> >>> > >>>> >> >>> >MainThread::INFO::2020-04-09 >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>>
>08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >Loaded submonitor cpu-load >>>> >> >>> > >>>> >> >>> >MainThread::INFO::2020-04-09 >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>>
>08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >Loaded submonitor engine-health >>>> >> >>> > >>>> >> >>> >MainThread::INFO::2020-04-09 >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>>
>08:07:31,442::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >Loaded submonitor mgmt-bridge >>>> >> >>> > >>>> >> >>> >MainThread::INFO::2020-04-09 >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>>
>08:07:31,442::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >Loaded submonitor cpu-load-no-engine >>>> >> >>> > >>>> >> >>> >MainThread::INFO::2020-04-09 >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>>
>08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >Loaded submonitor cpu-load >>>> >> >>> > >>>> >> >>> >MainThread::INFO::2020-04-09 >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>>
>08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >Loaded submonitor mem-free >>>> >> >>> > >>>> >> >>> >MainThread::INFO::2020-04-09 >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>>
>08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >Loaded submonitor storage-domain >>>> >> >>> > >>>> >> >>> >MainThread::INFO::2020-04-09 >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>>
>08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >Loaded submonitor storage-domain >>>> >> >>> > >>>> >> >>> >MainThread::INFO::2020-04-09 >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>>
>08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >Loaded submonitor mem-free >>>> >> >>> > >>>> >> >>> >MainThread::INFO::2020-04-09 >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>>
>08:07:31,444::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >Loaded submonitor engine-health >>>> >> >>> > >>>> >> >>> >MainThread::INFO::2020-04-09 >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>>
>08:07:31,444::monitor::50::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) >>>> >> >>> >Finished loading submonitors >>>> >> >>> > >>>> >> >>> >MainThread::DEBUG::2020-04-09 >>>> >> >>> >>>> >> >>> >>>> >> >>>> >> >>>> >>>>
MainThread::DEBUG::2020-04-09 08:07:31,444::broker::128::ovirt_hosted_engine_ha.broker.broker.Broker::(_get_storage_broker) Starting storage broker
MainThread::DEBUG::2020-04-09 08:07:31,444::storage_backends::369::ovirt_hosted_engine_ha.lib.storage_backends::(connect) Connecting to VDSM
MainThread::DEBUG::2020-04-09 08:07:31,444::util::384::ovirt_hosted_engine_ha.lib.storage_backends::(__log_debug) Creating a new json-rpc connection to VDSM
Client localhost:54321::DEBUG::2020-04-09 08:07:31,453::concurrent::258::root::(run) START thread <Thread(Client localhost:54321, started daemon 139992488138496)> (func=<bound method Reactor.process_requests of <yajsonrpc.betterAsyncore.Reactor object at 0x7f528acabc90>>, args=(), kwargs={})
Client localhost:54321::DEBUG::2020-04-09 08:07:31,459::stompclient::138::yajsonrpc.protocols.stomp.AsyncClient::(_process_connected) Stomp connection established
MainThread::DEBUG::2020-04-09 08:07:31,467::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::INFO::2020-04-09 08:07:31,530::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect) Connecting the storage
MainThread::INFO::2020-04-09 08:07:31,531::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::DEBUG::2020-04-09 08:07:31,531::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2020-04-09 08:07:31,534::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2020-04-09 08:07:32,199::storage_server::158::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(_validate_pre_connected_path) Storage domain a6cea67d-dbfb-45cf-a775-b4d0d47b26f2 is not available
MainThread::INFO::2020-04-09 08:07:32,199::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::DEBUG::2020-04-09 08:07:32,199::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2020-04-09 08:07:32,814::storage_server::363::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) [{u'status': 0, u'id': u'e29cf818-5ee5-46e1-85c1-8aeefa33e95d'}]
MainThread::INFO::2020-04-09 08:07:32,814::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Refreshing the storage domain
MainThread::DEBUG::2020-04-09 08:07:32,815::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2020-04-09 08:07:33,129::storage_server::420::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Error refreshing storage domain: Command StorageDomain.getStats with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed: (code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
MainThread::DEBUG::2020-04-09 08:07:33,130::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2020-04-09 08:07:33,795::storage_backends::208::ovirt_hosted_engine_ha.lib.storage_backends::(_get_sector_size) Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed: (code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
MainThread::WARNING::2020-04-09 08:07:33,795::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed: (code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))

The UUID it is moaning about is indeed the one that the HA sits on, and is the one I listed the contents of in step 2 above.

So why can't it see this domain?

Thanks, Shareef.

On Thu, Apr 9, 2020 at 6:12 AM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:

On April 9, 2020 1:51:05 AM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:

Don't know if this is useful or not, but I just tried to shut down and start another VM on one of the hosts and get the following error:

virsh # start scratch
error: Failed to start domain scratch
error: Network not found: no network with matching name 'vdsm-ovirtmgmt'

Is this not referring to the interface name, as the network is called 'ovirtmgmt'?
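For anyone hitting the same 'Network not found' error: libvirt's view of the networks can be checked directly. A minimal sketch, assuming the virsh alias (authenticating with virsh_auth.conf) set up earlier in the thread:

# list every libvirt network the host knows about, active or not
virsh net-list --all
# if vdsm-ovirtmgmt is missing, it can be recreated from an XML definition
# (Strahil posts a vdsm-ovirtmgmt.xml later in this thread)
virsh net-define vdsm-ovirtmgmt.xml
virsh net-start vdsm-ovirtmgmt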
On Wed, Apr 8, 2020 at 11:35 PM Shareef Jalloq <shareef@jalloq.co.uk> wrote:

Hmmm, virsh tells me the HE is running but it hasn't come up and the agent.log is full of the same errors.

On Wed, Apr 8, 2020 at 11:31 PM Shareef Jalloq <shareef@jalloq.co.uk> wrote:

Ah hah! OK, so I've managed to start it using virsh on the second host, but my first host is still dead.

First of all, what are these 56,317 .prob- files that get dumped to the NFS mounts?

Secondly, why doesn't the node mount the NFS directories at boot? Is that the issue with this particular node?

On Wed, Apr 8, 2020 at 11:12 PM <eevans@digitaldatatechs.com> wrote:

Did you try virsh list --inactive

Eric Evans
Digital Data Services LLC.
304.660.9080

From: Shareef Jalloq <shareef@jalloq.co.uk>
Sent: Wednesday, April 8, 2020 5:58 PM
To: Strahil Nikolov <hunter86_bg@yahoo.com>
Cc: Ovirt Users <users@ovirt.org>
Subject: [ovirt-users] Re: ovirt-engine unresponsive - how to rescue?

I've now shut down the VMs on one host and rebooted it, but the agent service doesn't start. If I run 'hosted-engine --vm-status' I get:

The hosted engine configuration has not been retrieved from shared storage. Please ensure that ovirt-ha-agent is running and the storage server is reachable.

and indeed, if I list the mounts under /rhev/data-center/mnt, only one of the directories is mounted. I have 3 NFS mounts: one ISO Domain and two Data Domains. Only one Data Domain has mounted, and this has lots of .prob files in it. So why haven't the other NFS exports been mounted?

Manually mounting them doesn't seem to have helped much either. I can start the broker service but the agent service says no. Same error as the one in my last email.

Shareef.
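A quick way to compare what should be mounted against what actually is, using stock tools. The server name and export path below are inferred from the mount-point names quoted in this thread (oVirt turns '/' in the export path into '_'), so substitute your own:

# what the NFS server is exporting
showmount -e nas-01.phoelex.com
# what the host currently has mounted under the oVirt mount root
mount | grep /rhev/data-center/mnt
# mounting one export by hand, mirroring oVirt's path convention
mount -t nfs nas-01.phoelex.com:/volume2/vmstore \
  '/rhev/data-center/mnt/nas-01.phoelex.com:_volume2_vmstore'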
On Wed, Apr 8, 2020 at 9:57 PM Shareef Jalloq <shareef@jalloq.co.uk> wrote:

Right, still down. I've run virsh and it doesn't know anything about the engine vm.

I've restarted the broker and agent services and I still get nothing in virsh->list.

In the logs under /var/log/ovirt-hosted-engine-ha I see lots of errors:

broker.log:

MainThread::INFO::2020-04-08 20:56:20,138::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.3.6 started
MainThread::INFO::2020-04-08 20:56:20,138::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Searching for submonitors in /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors
MainThread::INFO::2020-04-08 20:56:20,138::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
MainThread::INFO::2020-04-08 20:56:20,140::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
MainThread::INFO::2020-04-08 20:56:20,140::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
MainThread::INFO::2020-04-08 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
MainThread::INFO::2020-04-08 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
MainThread::INFO::2020-04-08 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
MainThread::INFO::2020-04-08 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
MainThread::INFO::2020-04-08 20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
MainThread::INFO::2020-04-08 20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
MainThread::INFO::2020-04-08 20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
MainThread::INFO::2020-04-08 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
MainThread::INFO::2020-04-08 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
MainThread::INFO::2020-04-08 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
MainThread::INFO::2020-04-08 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
MainThread::INFO::2020-04-08 20:56:20,143::monitor::50::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Finished loading submonitors
MainThread::INFO::2020-04-08 20:56:20,197::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect) Connecting the storage
MainThread::INFO::2020-04-08 20:56:20,197::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2020-04-08 20:56:20,414::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2020-04-08 20:56:20,628::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Refreshing the storage domain
MainThread::WARNING::2020-04-08 20:56:21,057::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed: (code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
MainThread::INFO::2020-04-08 20:56:21,901::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.3.6 started
MainThread::INFO::2020-04-08 20:56:21,901::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Searching for submonitors in /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors
agent.log:

MainThread::ERROR::2020-04-08 20:57:00,799::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent
MainThread::INFO::2020-04-08 20:57:00,799::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
MainThread::INFO::2020-04-08 20:57:11,144::agent::67::ovirt_hosted_engine_ha.agent.agent.Agent::(run) ovirt-hosted-engine-ha agent 2.3.6 started
MainThread::INFO::2020-04-08 20:57:11,182::hosted_engine::234::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Found certificate common name: ovirt-node-01.phoelex.com
MainThread::INFO::2020-04-08 20:57:11,294::hosted_engine::543::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Initializing ha-broker connection
MainThread::INFO::2020-04-08 20:57:11,296::brokerlink::80::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor network, options {'tcp_t_address': '', 'network_test': 'dns', 'tcp_t_port': '', 'addr': '192.168.1.99'}
MainThread::ERROR::2020-04-08 20:57:11,296::hosted_engine::559::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Failed to start necessary monitors
MainThread::ERROR::2020-04-08 20:57:11,297::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent
    return action(he)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper
    return he.start_monitoring()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 432, in start_monitoring
    self._initialize_broker()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 556, in _initialize_broker
    m.get('options', {}))
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 89, in start_monitor
    ).format(t=type, o=options, e=e)
RequestError: brokerlink - failed to start monitor via ovirt-ha-broker: [Errno 2] No such file or directory, [monitor: 'network', options: {'tcp_t_address': '', 'network_test': 'dns', 'tcp_t_port': '', 'addr': '192.168.1.99'}]
MainThread::ERROR::2020-04-08 20:57:11,297::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent
MainThread::INFO::2020-04-08 20:57:11,297::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
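Worth noting for anyone debugging the same loop: the '[Errno 2] No such file or directory' in that RequestError is the agent failing to reach the broker, not a missing monitor. A minimal check, assuming the default unix-socket location for this 2.3.x release (verify the path on your own install):

# the agent talks to the broker over a unix socket; if the broker dies
# during startup the socket never appears and the agent loops like this
systemctl status ovirt-ha-broker ovirt-ha-agent
ls -l /var/run/ovirt-hosted-engine-ha/
# restart order matters: broker first, then agent
systemctl restart ovirt-ha-broker && systemctl restart ovirt-ha-agent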
On Wed, Apr 8, 2020 at 6:10 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:

This has to be resolved:

Engine status : unknown stale-data

Run 'hosted-engine --vm-status' again. If it remains the same, restart ovirt-ha-broker.service & ovirt-ha-agent.service.

Verify that the engine's storage is available, then monitor the broker & agent logs in /var/log/ovirt-hosted-engine-ha.

Best Regards,
Strahil Nikolov

Hi Shareef,

The flow of activation in oVirt is more complex than plain KVM. Mounting of the domains happens during the activation of the node (the HostedEngine is activating everything needed).

Focus on the HostedEngine VM. Is it running properly?

If not, try the following (a consolidated runnable sketch of checks 1-4 follows below):
1. Verify that the storage domain exists
2. Check if it has an 'ha_agent' directory
3. Check if the links are OK; if not, you can safely remove the links
4. Next check the services are running:
A) sanlock
B) supervdsmd
C) vdsmd
D) libvirtd
5. Increase the log level for the broker and agent services:
cd /etc/ovirt-hosted-engine-ha
vim *-log.conf
systemctl restart ovirt-ha-broker ovirt-ha-agent
6. Check what they are complaining about. Keep in mind that the agent will keep throwing errors until the broker stops doing it (the agent depends on the broker), so the broker must be OK before proceeding with the agent log.

About the manual VM start, you need 2 things:

1. Define the VM network:

# cat vdsm-ovirtmgmt.xml
<network>
<name>vdsm-ovirtmgmt</name>
<uuid>8ded486e-e681-4754-af4b-5737c2b05405</uuid>
<forward mode='bridge'/>
<bridge name='ovirtmgmt'/>
</network>

[root@ovirt1 HostedEngine-RECOVERY]# virsh net-define vdsm-ovirtmgmt.xml

2. Get an XML definition, which can be found in the vdsm log. Every VM, at start-up, has its configuration printed out in the vdsm log on the host it starts on. Save it to a file and then:
A) virsh define myvm.xml
B) virsh start myvm

It seems there is/was a problem with your NFS shares.

Best Regards,
Strahil Nikolov

Hey Shareef,

Check if there are any files or folders not owned by vdsm:kvm. Something like this:

find . -not -user 36 -not -group 36 -print

Also check if vdsm can access the images in the '<vol-mount-point>/images' directories.

Best Regards,
Strahil Nikolov

And the IPv6 address '64:ff9b::c0a8:13d'? I don't see it in the log output.

Best Regards,
Strahil Nikolov

Based on your output, you got a PTR record for both IPv4 & IPv6 ... most probably that's the reason.

Set the IPv6 on the interface and try again.

Best Regards,
Strahil Nikolov
Do you have firewalld up and running on the host?
Best Regards, Strahil Nikolov
I am guessing, but your interface is not assigned to any zone, right? Just add the interface to the default zone (usually 'public').
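Following on from that, a sketch of checking and fixing the zone assignment with stock firewall-cmd; the interface name ovirtmgmt is assumed from elsewhere in this thread:

# empty output here is exactly what the failing ansible task sees
firewall-cmd --get-active-zones
# bind the management bridge to the default zone and make it persistent
firewall-cmd --zone=public --change-interface=ovirtmgmt --permanent
firewall-cmd --reload
firewall-cmd --get-active-zones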
Best Regards, Strahil Nikolov
Keep in mind that there are a lot of playbooks that can be used to deploy a HostedEngine environment via Ansible.
Keep in mind that if you plan to use oVirt in Prod, you need to know how to debug it (at least at a basic level).
Best Regards, Strahil Nikolov
It's really interesting that you mention that topic. The only ways I managed to break my engine were: A) a bad SELinux rpm, which was solved via a reinstall of the package and a relabel; B) an interrupted patch, as I forgot to use screen. I think it is Prod ready, but it requires knowledge, as it is not as dummy-proof as VMware. Yet, oVirt is way more flexible, allowing you to run your own scripts before/during/after a certain event (vdsm hooks). Sadly, Ansible (this is what is used for the setup of gluster -> gdeploy, and for the engine) is quite dynamic, and sometimes something might break. If you feel that oVirt breaks too often - just set your engine on a separate physical or virtual (non-hosted) machine, but do not complain that a free open-source product is not Production ready just because you don't know how to debug it. You can trial the downstream solutions from Red Hat & Oracle and you will notice the difference. For me, oVirt is like Fedora compared to RHEL/OEL/CentOS, but this is just a personal opinion. Best Regards, Strahil Nikolov

Actually, you've just raised a point I hadn't thought about. We have an old Xeon server that is being used to host some ESXi VMs that were needed while we transitioned to ovirt. Once I have moved those VMs I could repurpose that as the engine. On Thu, Apr 16, 2020 at 11:42 AM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
On April 16, 2020 11:25:20 AM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:
Is this actually production ready? It seems to break at every step.
On Wed, Apr 15, 2020 at 5:45 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
Thanks for your help but I've decided to try and reinstall from scratch. This is taking too long.
On Wed, Apr 15, 2020 at 3:25 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
Yes, but there are no zones set up, just ports 22, 6801 and 6900.
On Wed, Apr 15, 2020 at 12:37 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
On April 15, 2020 2:28:05 PM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:

Oh this is painful. It seems to progress if you have both he_force_ipv4 set and run the deployment with the '--4' switch.

But then I get a failure when the ansible script checks for firewalld zones and doesn't get anything back. Should the deployment flow not be setting any zones it needs?

2020-04-15 10:57:25,439+0000 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 TASK [ovirt.hosted_engine_setup : Get active list of active firewalld zones]

2020-04-15 10:57:26,641+0000 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:103 {u'stderr_lines': [], u'changed': True, u'end': u'2020-04-15 10:57:26.481202', u'_ansible_no_log': False, u'stdout': u'', u'cmd': u'set -euo pipefail && firewall-cmd --get-active-zones | grep -v "^\\s*interfaces"', u'start': u'2020-04-15 10:57:26.050203', u'delta': u'0:00:00.430999', u'stderr': u'', u'rc': 1, u'invocation': {u'module_args': {u'creates': None, u'executable': None, u'_uses_shell': True, u'strip_empty_ends': True, u'_raw_params': u'set -euo pipefail && firewall-cmd --get-active-zones | grep -v "^\\s*interfaces"', u'removes': None, u'argv': None, u'warn': True, u'chdir': None, u'stdin_add_newline': True, u'stdin': None}}, u'stdout_lines': [], u'msg': u'non-zero return code'}

2020-04-15 10:57:26,741+0000 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:107 fatal: [localhost]: FAILED! => {"changed": true, "cmd": "set -euo pipefail && firewall-cmd --get-active-zones | grep -v \"^\\s*interfaces\"", "delta": "0:00:00.430999", "end": "2020-04-15 10:57:26.481202", "msg": "non-zero return code", "rc": 1, "start": "2020-04-15 10:57:26.050203", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
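The rc=1 in that task is worth unpacking: the role pipes firewall-cmd into grep, and grep exits 1 when it matches nothing, so with no active zones the whole pipeline fails. Reproducing it by hand (without the role's 'set -e' so the return code can be read):

# the exact pipeline from the log above
firewall-cmd --get-active-zones | grep -v "^\s*interfaces"
echo "rc=$?"   # 1 when --get-active-zones prints nothing, i.e. no active zones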
On Wed, Apr 15, 2020 at 10:23 AM Shareef Jalloq <shareef@jalloq.co.uk> wrote:

Ha, spoke too soon. It's now stuck in a loop and a google points me at https://bugzilla.redhat.com/show_bug.cgi?id=1746585
On April 15, 2020 2:40:52 PM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:

However, forcing ipv4 doesn't seem to have fixed the loop.

On Wed, Apr 15, 2020 at 9:59 AM Shareef Jalloq <shareef@jalloq.co.uk> wrote:

OK, that seems to have fixed it, thanks. Is this a side effect of redeploying the HE over a first-time install? Nothing has changed in our setup and I didn't need to do this when I initially set up our nodes.

On Tue, Apr 14, 2020 at 6:55 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:

On April 14, 2020 6:17:17 PM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:

Hmmm, we're not using ipv6. Is that the issue?

On Tue, Apr 14, 2020 at 3:56 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:

On April 14, 2020 1:27:24 PM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:

Right, I've given up on recovering the HE so want to try to redeploy it.
On April 15, 2020 5:59:46 PM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:
There doesn't seem to be enough information to debug why the broker/agent won't start cleanly.

In running 'hosted-engine --deploy', I'm seeing the following error in the setup validation phase:

2020-04-14 09:46:08,922+0000 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND Please provide the hostname of this host on the management network [ovirt-node-00.phoelex.com]:

2020-04-14 09:46:12,831+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname.getResolvedAddresses:432 getResolvedAddresses: set(['64:ff9b::c0a8:13d', '192.168.1.61'])

2020-04-14 09:46:12,832+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname._validateFQDNresolvability:289 ovirt-node-00.phoelex.com resolves to: set(['64:ff9b::c0a8:13d', '192.168.1.61'])

2020-04-14 09:46:12,832+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:813 execute: ['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com', 'ANY'], executable='None', cwd='None', env=None

2020-04-14 09:46:12,871+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:863 execute-result: ['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com', 'ANY'], rc=0

2020-04-14 09:46:12,872+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:921 execute-output: ['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com', 'ANY'] stdout:
ovirt-node-00.phoelex.com. 86400 IN A 192.168.1.61

2020-04-14 09:46:12,872+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:926 execute-output: ['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com', 'ANY'] stderr:

2020-04-14 09:46:12,872+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:813 execute: ('/usr/sbin/ip', 'addr'), executable='None', cwd='None', env=None

2020-04-14 09:46:12,876+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:863 execute-result: ('/usr/sbin/ip', 'addr'), rc=0

2020-04-14 09:46:12,876+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:921 execute-output: ('/usr/sbin/ip', 'addr') stdout:

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovirtmgmt state UP group default qlen 1000
    link/ether ac:1f:6b:bc:32:6a brd ff:ff:ff:ff:ff:ff
3: eno2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether ac:1f:6b:bc:32:6b brd ff:ff:ff:ff:ff:ff
4: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 02:e6:e2:80:93:8d brd ff:ff:ff:ff:ff:ff
5: br-int: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 8a:26:44:50:ee:4a brd ff:ff:ff:ff:ff:ff
21: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether ac:1f:6b:bc:32:6a brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.61/24 brd 192.168.1.255 scope global ovirtmgmt
       valid_lft forever preferred_lft forever
    inet6 fe80::ae1f:6bff:febc:326a/64 scope link
       valid_lft forever preferred_lft forever
22: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 3a:02:7b:7d:b3:2a brd ff:ff:ff:ff:ff:ff

2020-04-14 09:46:12,876+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:926 execute-output: ('/usr/sbin/ip', 'addr') stderr:

2020-04-14 09:46:12,877+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname.getLocalAddresses:251 addresses: [u'192.168.1.61', u'fe80::ae1f:6bff:febc:326a']

2020-04-14 09:46:12,877+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname.test_hostname:464 test_hostname exception

Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_setup_lib/hostname.py", line 460, in test_hostname
    not_local_text,
  File "/usr/lib/python2.7/site-packages/ovirt_setup_lib/hostname.py", line 342, in _validateFQDNresolvability
    addresses=resolvedAddressesAsString
RuntimeError: ovirt-node-00.phoelex.com resolves to 64:ff9b::c0a8:13d 192.168.1.61 and not all of them can be mapped to non loopback devices on this host

2020-04-14 09:46:12,884+0000 ERROR otopi.plugins.gr_he_common.network.bridge dialog.queryEnvKey:120 Host name is not valid: ovirt-node-00.phoelex.com resolves to 64:ff9b::c0a8:13d 192.168.1.61 and not all of them can be mapped to non loopback devices on this host

The node I'm running on has an IP address of .61 and resolves correctly.
On Fri, Apr 10, 2020 at 12:55 PM Shareef Jalloq <shareef@jalloq.co.uk> wrote:

Where should I be checking if there are any files/folders not owned by vdsm:kvm? I checked on the mount the HA sits on and it's fine.

How would I go about checking vdsm can access those images? If I run virsh, it lists them, and they were running yesterday even though the HA was down. I've since restarted both hosts but the broker is still spitting out the same error (copied below). How do I find the reason the broker can't connect to the storage? The conf file is already at DEBUG verbosity:

[handler_logfile]
class=logging.handlers.TimedRotatingFileHandler
args=('/var/log/ovirt-hosted-engine-ha/broker.log', 'd', 1, 7)
level=DEBUG
formatter=long

And what are all these .prob-<num> files that are being created? There are over 250K of them now on the mount I'm using for the Data domain. They're all of 0 size and of the form:
/rhev/data-center/mnt/nas-01.phoelex.com:_volume2_vmstore/.prob-ffa867da-93db-4211-82df-b1b04a625ab9

@eevans: The volume I have the Data Domain on has TB's free. The HA is dead so I can't ssh in. No idea what started these errors, and the other VMs were still running happily, although they're on a different Data Domain.

Shareef.
MainThread::INFO::2020-04-10 07:45:00,408::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect) Connecting the storage
MainThread::INFO::2020-04-10 07:45:00,408::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2020-04-10 07:45:01,577::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2020-04-10 07:45:02,692::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Refreshing the storage domain
MainThread::WARNING::2020-04-10 07:45:05,175::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed: (code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
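On the .prob-* question above: a sketch for sizing up the problem before deciding anything. Whether those files are safe to delete isn't established in this thread, so this only counts and samples them; the mount path is the one quoted above:

M='/rhev/data-center/mnt/nas-01.phoelex.com:_volume2_vmstore'
# how many there are, and confirm they really are all empty
find "$M" -maxdepth 1 -name '.prob-*' | wc -l
find "$M" -maxdepth 1 -name '.prob-*' -size +0 | wc -l   # should print 0 if all are empty
ls -l "$M"/.prob-* 2>/dev/null | head -n 5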
On Thu, Apr 9, 2020 at 5:58 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:

On April 9, 2020 11:12:30 AM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:

OK, let's go through this. I'm looking at the node that at least still has some VMs running. virsh also tells me that the HostedEngine VM is running, but it's unresponsive and I can't shut it down.
1. All storage domains exist and are mounted.

2. The ha_agent exists:

[root@ovirt-node-01 ovirt-hosted-engine-ha]# ls /rhev/data-center/mnt/nas-01.phoelex.com\:_volume2_vmstore/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/
dom_md  ha_agent  images  master

3. There are two links:

[root@ovirt-node-01 ovirt-hosted-engine-ha]# ll /rhev/data-center/mnt/nas-01.phoelex.com\:_volume2_vmstore/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/ha_agent/
total 8
lrwxrwxrwx. 1 vdsm kvm 132 Apr  2 14:50 hosted-engine.lockspace -> /var/run/vdsm/storage/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/ffb90b82-42fe-4253-85d5-aaec8c280aaf/90e68791-0c6f-406a-89ac-e0d86c631604
lrwxrwxrwx. 1 vdsm kvm 132 Apr  2 14:50 hosted-engine.metadata -> /var/run/vdsm/storage/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/2161aed0-7250-4c1d-b667-ac94f60af17e/6b818e33-f80a-48cc-a59c-bba641e027d4

4. The services exist, but all seem to have some sort of warning:

a) Apr 08 18:10:55 ovirt-node-01.phoelex.com sanlock[1728]: 2020-04-08 18:10:55 1744152 [36796]: s16 delta_renew long write time 10 sec

b) Mar 23 18:02:59 ovirt-node-01.phoelex.com supervdsmd[29409]: failed to load module nvdimm: libbd_nvdimm.so.2: cannot open shared object file: No such file or directory

c) Apr 09 08:05:13 ovirt-node-01.phoelex.com vdsm[4801]: ERROR failed to retrieve Hosted Engine HA score '[Errno 2] No such file or directory' Is the Hosted Engine setup finished?

d) Apr 08 22:48:27 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-08 22:48:27.134+0000: 29309: warning : qemuGetProcessInfo:1404 : cannot parse process status data
Apr 08 22:48:27 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-08 22:48:27.134+0000: 29309: error : virNetDevTapInterfaceStats:764 : internal error: /proc/net/dev: Interface not found
Apr 08 23:09:39 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-08 23:09:39.844+0000: 29307: error : virNetSocketReadWire:1806 : End of file while reading data: Input/output error
Apr 09 01:05:26 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-09 01:05:26.660+0000: 29307: error : virNetSocketReadWire:1806 : End of file while reading data: Input/output error
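Those four warnings come out of separate unit logs; one way to line them up on a single timeline with stock journalctl (the --since value is just the rough window from this thread):

journalctl -u sanlock -u supervdsmd -u vdsmd -u libvirtd --since "2020-04-08 16:00"
# sanlock's 'delta_renew long write time' in particular points at slow I/O to
# the lockspace on the storage domain, which fits the broker's storage errors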
5 & 6. The broker log is continually printing this error:

MainThread::INFO::2020-04-09 08:07:31,438::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.3.6 started
MainThread::DEBUG::2020-04-09 08:07:31,438::broker::55::ovirt_hosted_engine_ha.broker.broker.Broker::(run) Running broker
MainThread::DEBUG::2020-04-09 08:07:31,438::broker::120::ovirt_hosted_engine_ha.broker.broker.Broker::(_get_monitor) Starting monitor
MainThread::INFO::2020-04-09 08:07:31,438::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Searching for submonitors in /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors
MainThread::INFO::2020-04-09 08:07:31,439::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
MainThread::INFO::2020-04-09 08:07:31,440::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
MainThread::INFO::2020-04-09 08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
MainThread::INFO::2020-04-09 08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
MainThread::INFO::2020-04-09 08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
MainThread::INFO::2020-04-09 08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
MainThread::INFO::2020-04-09 08:07:31,442::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
MainThread::INFO::2020-04-09 08:07:31,442::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
MainThread::INFO::2020-04-09 08:07:31,444::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
MainThread::INFO::2020-04-09 08:07:31,444::monitor::50::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Finished loading submonitors
MainThread::DEBUG::2020-04-09 08:07:31,444::broker::128::ovirt_hosted_engine_ha.broker.broker.Broker::(_get_storage_broker) Starting storage broker
MainThread::DEBUG::2020-04-09 08:07:31,444::storage_backends::369::ovirt_hosted_engine_ha.lib.storage_backends::(connect) Connecting to VDSM
MainThread::DEBUG::2020-04-09 08:07:31,444::util::384::ovirt_hosted_engine_ha.lib.storage_backends::(__log_debug) Creating a new json-rpc connection to VDSM
Client localhost:54321::DEBUG::2020-04-09 08:07:31,453::concurrent::258::root::(run) START thread <Thread(Client localhost:54321, started daemon 139992488138496)> (func=<bound method Reactor.process_requests of <yajsonrpc.betterAsyncore.Reactor object at 0x7f528acabc90>>, args=(), kwargs={})
Client localhost:54321::DEBUG::2020-04-09 08:07:31,459::stompclient::138::yajsonrpc.protocols.stomp.AsyncClient::(_process_connected) Stomp connection established
MainThread::DEBUG::2020-04-09 08:07:31,467::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::INFO::2020-04-09 08:07:31,530::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect) Connecting the storage
MainThread::INFO::2020-04-09 08:07:31,531::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::DEBUG::2020-04-09 08:07:31,531::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2020-04-09 08:07:31,534::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2020-04-09 08:07:32,199::storage_server::158::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(_validate_pre_connected_path) Storage domain a6cea67d-dbfb-45cf-a775-b4d0d47b26f2 is not available
MainThread::INFO::2020-04-09 08:07:32,199::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::DEBUG::2020-04-09 08:07:32,199::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2020-04-09 08:07:32,814::storage_server::363::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) [{u'status': 0, u'id': u'e29cf818-5ee5-46e1-85c1-8aeefa33e95d'}]
MainThread::INFO::2020-04-09 08:07:32,814::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Refreshing the storage domain
MainThread::DEBUG::2020-04-09 08:07:32,815::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2020-04-09 08:07:33,129::storage_server::420::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Error refreshing storage domain: Command StorageDomain.getStats with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed: (code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
>>08:07:33,795::storage_backends::208::ovirt_hosted_engine_ha.lib.storage_backends::(_get_sector_size) > >>>> >> >>> >Command StorageDomain.getInfo with args {'storagedomainID': > >>>> >> >>> >'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed: > >>>> >> >>> > > >>>> >> >>> >(code=350, message=Error in storage domain action: > >>>> >> >>> >(u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',)) > >>>> >> >>> > > >>>> >> >>> >MainThread::WARNING::2020-04-09 > >>>> >> >>> > >>>> >> >>> > >>>> >> > >>>> >> > >>>> > >>>> > >
>>08:07:33,795::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) > >>>> >> >>> >Can't connect vdsm storage: Command StorageDomain.getInfo > >with > >>>> >args > >>>> >> >>> >{'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} > >>>> >failed: > >>>> >> >>> > > >>>> >> >>> >(code=350, message=Error in storage domain action: > >>>> >> >>> >(u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',)) > >>>> >> >>> > > >>>> >> >>> > > >>>> >> >>> >The UUID it is moaning about is indeed the one that the HA > >sits > >>>> >on > >>>> >> >and > >>>> >> >>> >is > >>>> >> >>> >the one I listed the contents of in step 2 above. > >>>> >> >>> > > >>>> >> >>> > > >>>> >> >>> >So why can't it see this domain? > >>>> >> >>> > > >>>> >> >>> > > >>>> >> >>> >Thanks, Shareef. > >>>> >> >>> > > >>>> >> >>> >On Thu, Apr 9, 2020 at 6:12 AM Strahil Nikolov > >>>> >> ><hunter86_bg@yahoo.com> > >>>> >> >>> >wrote: > >>>> >> >>> > > >>>> >> >>> >> On April 9, 2020 1:51:05 AM GMT+03:00, Shareef Jalloq < > >>>> >> >>> >> shareef@jalloq.co.uk> wrote: > >>>> >> >>> >> >Don't know if this is useful or not, but I just tried to > >>>> >> >shutdown > >>>> >> >>> >and > >>>> >> >>> >> >start > >>>> >> >>> >> >another VM on one of the hosts and get the following > >error: > >>>> >> >>> >> > > >>>> >> >>> >> >virsh # start scratch > >>>> >> >>> >> > > >>>> >> >>> >> >error: Failed to start domain scratch > >>>> >> >>> >> > > >>>> >> >>> >> >error: Network not found: no network with matching name > >>>> >> >>> >> >'vdsm-ovirtmgmt' > >>>> >> >>> >> > > >>>> >> >>> >> >Is this not referring to the interface name as the > >network is > >>>> >> >called > >>>> >> >>> >> >'ovirtmgnt'. > >>>> >> >>> >> > > >>>> >> >>> >> >On Wed, Apr 8, 2020 at 11:35 PM Shareef Jalloq > >>>> >> >>> ><shareef@jalloq.co.uk> > >>>> >> >>> >> >wrote: > >>>> >> >>> >> > > >>>> >> >>> >> >> Hmmm, virsh tells me the HE is running but it hasn't > >come > >>>> >up > >>>> >> >and > >>>> >> >>> >the > >>>> >> >>> >> >> agent.log is full of the same errors. > >>>> >> >>> >> >> > >>>> >> >>> >> >> On Wed, Apr 8, 2020 at 11:31 PM Shareef Jalloq > >>>> >> >>> ><shareef@jalloq.co.uk> > >>>> >> >>> >> >> wrote: > >>>> >> >>> >> >> > >>>> >> >>> >> >>> Ah hah! Ok, so I've managed to start it using virsh > >on > >>>> >the > >>>> >> >>> >second > >>>> >> >>> >> >host > >>>> >> >>> >> >>> but my first host is still dead. > >>>> >> >>> >> >>> > >>>> >> >>> >> >>> First of all, what are these 56,317 .prob- files that > >get > >>>> >> >dumped > >>>> >> >>> >to > >>>> >> >>> >> >the > >>>> >> >>> >> >>> NFS mounts? > >>>> >> >>> >> >>> > >>>> >> >>> >> >>> Secondly, why doesn't the node mount the NFS > >directories > >>>> >at > >>>> >> >boot? > >>>> >> >>> >> >Is > >>>> >> >>> >> >>> that the issue with this particular node? > >>>> >> >>> >> >>> > >>>> >> >>> >> >>> On Wed, Apr 8, 2020 at 11:12 PM > >>>> ><eevans@digitaldatatechs.com> > >>>> >> >>> >wrote: > >>>> >> >>> >> >>> > >>>> >> >>> >> >>>> Did you try virsh list --inactive > >>>> >> >>> >> >>>> > >>>> >> >>> >> >>>> > >>>> >> >>> >> >>>> > >>>> >> >>> >> >>>> Eric Evans > >>>> >> >>> >> >>>> > >>>> >> >>> >> >>>> Digital Data Services LLC. 
From: Shareef Jalloq <shareef@jalloq.co.uk>
Sent: Wednesday, April 8, 2020 5:58 PM
To: Strahil Nikolov <hunter86_bg@yahoo.com>
Cc: Ovirt Users <users@ovirt.org>
Subject: [ovirt-users] Re: ovirt-engine unresponsive - how to rescue?

I've now shut down the VMs on one host and rebooted it but the agent service doesn't start. If I run 'hosted-engine --vm-status' I get:

The hosted engine configuration has not been retrieved from shared storage. Please ensure that ovirt-ha-agent is running and the storage server is reachable.

and indeed if I list the mounts under /rhev/data-center/mnt, only one of the directories is mounted. I have 3 NFS mounts, one ISO Domain and two Data Domains. Only one Data Domain has mounted and this has lots of .prob files in. So why haven't the other NFS exports been mounted?

Manually mounting them doesn't seem to have helped much either. I can start the broker service but the agent service says no. Same error as the one in my last email.

Shareef.
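One way to cross-check this from the host, rather than mounting by hand, is to make the same VDSM calls the broker is failing on and then let the hosted-engine tooling reconnect its own storage. A minimal sketch, assuming vdsm-client is installed (as on 4.3-era nodes) and using the domain UUID from the logs below:

# Repeat the exact calls that fail in broker.log and inspect the error
vdsm-client StorageDomain getInfo storagedomainID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2
vdsm-client StorageDomain getStats storagedomainID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2

# Ask hosted-engine to (re)connect the HE storage itself, then restart the HA services
hosted-engine --connect-storage
systemctl restart ovirt-ha-broker ovirt-ha-agent
hosted-engine --vm-status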
On Wed, Apr 8, 2020 at 9:57 PM Shareef Jalloq <shareef@jalloq.co.uk> wrote:

Right, still down. I've run virsh and it doesn't know anything about the engine vm.

I've restarted the broker and agent services and I still get nothing in virsh->list.

In the logs under /var/log/ovirt-hosted-engine-ha I see lots of errors:

broker.log:

MainThread::INFO::2020-04-08 20:56:20,138::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.3.6 started
MainThread::INFO::2020-04-08 20:56:20,138::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Searching for submonitors in /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors
MainThread::INFO::2020-04-08 20:56:20,138::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
MainThread::INFO::2020-04-08 20:56:20,140::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
MainThread::INFO::2020-04-08 20:56:20,140::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
MainThread::INFO::2020-04-08 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
MainThread::INFO::2020-04-08 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
MainThread::INFO::2020-04-08 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
MainThread::INFO::2020-04-08 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
MainThread::INFO::2020-04-08 20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
MainThread::INFO::2020-04-08 20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
MainThread::INFO::2020-04-08 20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
MainThread::INFO::2020-04-08 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
MainThread::INFO::2020-04-08 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
MainThread::INFO::2020-04-08 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
MainThread::INFO::2020-04-08 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
MainThread::INFO::2020-04-08 20:56:20,143::monitor::50::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Finished loading submonitors
MainThread::INFO::2020-04-08 20:56:20,197::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect) Connecting the storage
MainThread::INFO::2020-04-08 20:56:20,197::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2020-04-08 20:56:20,414::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2020-04-08 20:56:20,628::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Refreshing the storage domain
MainThread::WARNING::2020-04-08 20:56:21,057::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed: (code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
MainThread::INFO::2020-04-08 20:56:21,901::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.3.6 started
MainThread::INFO::2020-04-08 20:56:21,901::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Searching for submonitors in /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors

agent.log:

MainThread::ERROR::2020-04-08 20:57:00,799::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent
MainThread::INFO::2020-04-08 20:57:00,799::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
MainThread::INFO::2020-04-08 20:57:11,144::agent::67::ovirt_hosted_engine_ha.agent.agent.Agent::(run) ovirt-hosted-engine-ha agent 2.3.6 started
MainThread::INFO::2020-04-08 20:57:11,182::hosted_engine::234::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Found certificate common name: ovirt-node-01.phoelex.com
MainThread::INFO::2020-04-08 20:57:11,294::hosted_engine::543::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Initializing ha-broker connection
MainThread::INFO::2020-04-08 20:57:11,296::brokerlink::80::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor network, options {'tcp_t_address': '', 'network_test': 'dns', 'tcp_t_port': '', 'addr': '192.168.1.99'}
MainThread::ERROR::2020-04-08 20:57:11,296::hosted_engine::559::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Failed to start necessary monitors
MainThread::ERROR::2020-04-08 20:57:11,297::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent
    return action(he)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper
    return he.start_monitoring()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 432, in start_monitoring
    self._initialize_broker()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 556, in _initialize_broker
    m.get('options', {}))
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 89, in start_monitor
    ).format(t=type, o=options, e=e)
RequestError: brokerlink - failed to start monitor via ovirt-ha-broker: [Errno 2] No such file or directory, [monitor: 'network', options: {'tcp_t_address': '', 'network_test': 'dns', 'tcp_t_port': '', 'addr': '192.168.1.99'}]

MainThread::ERROR::2020-04-08 20:57:11,297::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent
MainThread::INFO::2020-04-08 20:57:11,297::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
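The [Errno 2] in the traceback above is the agent failing to find the broker's listening socket, which never appears because the broker itself keeps dying on the storage error shown in broker.log. A quick check along these lines (the socket path is an assumption based on the default broker configuration; verify it against your version):

# Is the broker actually staying up, and does its socket exist?
systemctl status ovirt-ha-broker --no-pager -l
ls -l /var/run/ovirt-hosted-engine-ha/broker.socket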
On Wed, Apr 8, 2020 at 6:10 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:

This has to be resolved:

Engine status : unknown stale-data

Run 'hosted-engine --vm-status' again. If it remains the same, restart ovirt-ha-broker.service & ovirt-ha-agent.service.

Verify that the engine's storage is available. Then monitor the broker & agent logs in /var/log/ovirt-hosted-engine-ha.

Best Regards,
Strahil Nikolov

Hi Shareef,

The activation flow of oVirt is more complex than plain KVM. Mounting of the domains happens during the activation of the node (the HostedEngine is activating everything needed).

Focus on the HostedEngine VM. Is it running properly?

If not, try:
1. Verify that the storage domain exists
2. Check if it has an 'ha_agents' directory
3. Check if the links are OK; if not, you can safely remove the links
4. Next check that the services are running (see the sketch after this message):
   A) sanlock
   B) supervdsmd
   C) vdsmd
   D) libvirtd
5. Increase the log level for the broker and agent services:

cd /etc/ovirt-hosted-engine-ha
vim *-log.conf

systemctl restart ovirt-ha-broker ovirt-ha-agent

6. Check what they are complaining about. Keep in mind that the agent will keep throwing errors until the broker stops doing so (the agent depends on the broker), so the broker must be OK before proceeding with the agent log.

About the manual VM start, you need 2 things:

1. Define the VM network:

# cat vdsm-ovirtmgmt.xml
<network>
  <name>vdsm-ovirtmgmt</name>
  <uuid>8ded486e-e681-4754-af4b-5737c2b05405</uuid>
  <forward mode='bridge'/>
  <bridge name='ovirtmgmt'/>
</network>

[root@ovirt1 HostedEngine-RECOVERY]# virsh define vdsm-ovirtmgmt.xml

2. Get an XML definition, which can be found in the vdsm log. Every VM at startup has its configuration printed out in the vdsm log on the host it starts on. Save it to a file and then:

A) virsh define myvm.xml
B) virsh start myvm

It seems there is/was a problem with your NFS shares.

Best Regards,
Strahil Nikolov
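A minimal sketch for step 4 above (plain systemd; service names as they appear on an EL7 oVirt node):

# Check the four services in one pass; print details for any that is not active
for s in sanlock supervdsmd vdsmd libvirtd; do
    systemctl is-active --quiet "$s" && echo "$s: active" || systemctl status "$s" --no-pager -l
done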
Hey Shareef,

Check if there are any files or folders not owned by vdsm:kvm. Something like this:

find . -not -user 36 -not -group 36 -print

Also check if vdsm can access the images in the '<vol-mount-point>/images' directories.

Best Regards,
Strahil Nikolov

And the IPv6 address '64:ff9b::c0a8:13d'? I don't see it in the log output.

Best Regards,
Strahil Nikolov

Based on your output, you got a PTR record for both IPv4 & IPv6 ... most probably that's the reason. Set the IPv6 address on the interface and try again.

Best Regards,
Strahil Nikolov

Do you have firewalld up and running on the host?

Best Regards,
Strahil Nikolov
I am guessing, but your interface is not assigned to any zone, right? Just add the interface to the default zone (usually 'public').
Best Regards, Strahil Nikolov
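A minimal sketch of that zone fix, assuming the management NIC is eno1 (as elsewhere in this thread) and the default zone is 'public':

# Bind the interface to the public zone persistently, then confirm
firewall-cmd --permanent --zone=public --add-interface=eno1
firewall-cmd --reload
firewall-cmd --get-active-zones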
Keep in mind that there are a lot of playbooks that can be used to deploy a HostedEngine environment via Ansible.
Keep in mind that if you plan to use oVirt in Prod, you need to know how to debug it (at least at a basic level).
Best Regards, Strahil Nikolov
It's really interesting that you mention that topic. The only ways I managed to break my engine were: A) a bad SELinux rpm, which was solved via a reinstall of the package and a relabel B) an interrupted patch, as I forgot to use screen
I think it is Prod ready, but it requires knowledge, as it is not as dummy-proof as VMware. Yet oVirt is far more flexible, allowing you to run your own scripts before/during/after a certain event (vdsm hooks).
Sadly, Ansible (which is used both for the Gluster setup, via gdeploy, and for the engine) is quite dynamic, and sometimes something might break.
If you feel that oVirt breaks too often, just put your engine on a separate physical or virtual (non-hosted) machine, but do not complain that a free open-source product is not production ready just because you don't know how to debug it.
You can trial the downstream solutions from Red Hat & Oracle and you will notice the difference. For me oVirt is like Fedora compared to RHEL/OEL/CentOS, but this is just a personal opinion.
Best Regards, Strahil Nikolov

OK, to wrap up this thread and to provide some detail as to how it concluded... I now have the engine back up and running.

The final issue was that I needed to re-create the Synology share that the HE was stored on. I got past all my issues and up to the domain setup stage. Obviously I couldn't install to the same share with the original domain still there, so I deleted that (luckily other VMs are in a different share), but the install still failed with a storage domain creation error. So I created a new share and the install could now progress.

So I have no idea what happened, but I seem to have suffered some sort of failure of a shared folder on my Synology that caused issues with oVirt. I could still mount the folder manually and create/edit/delete files, and the engine was generating 100K's worth of .prob-* files, but it was somehow corrupt? I've copied the ansible error at the end of this mail.

The other issue was the IPv6 one, which was strange. I didn't see this when I first installed and set up oVirt, but perhaps something changed somewhere in our setup. As Strahil pointed out, I needed the IPV6ADDR entry in my ifcfg-eno1 for the interface being used for the node. The pain is that this is overwritten by the deployment, so if that fails you have to re-add it. So my interface config looks like this now:

# Generated by VDSM version 4.30.40.1
DEVICE=eno1
BRIDGE=ovirtmgmt
ONBOOT=yes
MTU=1500
DEFROUTE=no
NM_CONTROLLED=no
IPV6INIT=no
IPV6ADDR=64:ff9b::c0a8:13d

Then I had the strange firewalld issue that I can't explain. I re-installed the node from scratch to resolve that, as I'd lost patience.

So thanks for all the help, and I hope I never have to do that again. :-)

Shareef.

Ansible storage domain error:

2020-04-16 11:11:55,872+0000 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 TASK [ovirt.hosted_engine_setup : Add NFS storage domain]
2020-04-16 11:11:58,777+0000 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:103 {u'invocation': {u'module_args': {u'comment': None, u'warning_low_space': None, u'glusterfs': None, u'localfs': None, u'managed_block_storage': None, u'data_center': u'Default', u'id': None, u'iscsi': None, u'state': u'unattached', u'wipe_after_delete': None, u'destroy': None, u'fcp': None, u'description': None, u'format': None, u'nested_attributes': [], u'host': u'ovirt-node-00.phoelex.com', u'discard_after_delete': None, u'wait': True, u'domain_function': u'data', u'name': u'hosted_storage', u'critical_space_action_blocker': None, u'posixfs': None, u'poll_interval': 3, u'fetch_nested': False, u'nfs': {u'path': u'/volume1/ovirt', u'version': u'auto', u'mount_options': u'', u'address': u'nas-01.phoelex.com'}, u'timeout': 180, u'backup': None}}, u'msg': u'Fault reason is "Operation Failed". Fault detail is "[Error in creating a Storage Domain. The selected storage path is not empty (probably contains another Storage Domain). Either remove the existing Storage Domain from this path, or change the Storage path).]". HTTP response code is 400.', u'exception': u'Traceback (most recent call last):\n  File "/tmp/ansible_ovirt_storage_domain_payload_6uM8mE/ansible_ovirt_storage_domain_payload.zip/ansible/modules/cloud/ovirt/ovirt_storage_domain.py", line 792, in main\n  File "/tmp/ansible_ovirt_storage_domain_payload_6uM8mE/ansible_ovirt_storage_domain_payload.zip/ansible/module_utils/ovirt.py", line 621, in create\n    **kwargs\n  File "/usr/lib64/python2.7/site-packages/ovirtsdk4/services.py", line 25168, in add\n    return self._internal_add(storage_domain, headers, query, wait)\n  File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 232, in _internal_add\n    return future.wait() if wait else future\n  File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 55, in wait\n    return self._code(response)\n  File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 229, in callback\n    self._check_fault(response)\n  File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 132, in _check_fault\n    self._raise_error(response, body)\n  File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 118, in _raise_error\n    raise error\nError: Fault reason is "Operation Failed". Fault detail is "[Error in creating a Storage Domain. The selected storage path is not empty (probably contains another Storage Domain). Either remove the existing Storage Domain from this path, or change the Storage path).]". HTTP response code is 400.\n', u'changed': False, u'_ansible_no_log': False}
2020-04-16 11:11:58,877+0000 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:107 Error: Fault reason is "Operation Failed". Fault detail is "[Error in creating a Storage Domain. The selected storage path is not empty (probably contains another Storage Domain). Either remove the existing Storage Domain from this path, or change the Storage path).]". HTTP response code is 400.
2020-04-16 11:11:58,978+0000 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:107 fatal: [localhost]: FAILED! => {"changed": false, "msg": "Fault reason is \"Operation Failed\". Fault detail is \"[Error in creating a Storage Domain. The selected storage path is not empty (probably contains another Storage Domain). Either remove the existing Storage Domain from this path, or change the Storage path).]\". HTTP response code is 400."}

On Thu, Apr 16, 2020 at 12:14 PM Shareef Jalloq <shareef@jalloq.co.uk> wrote:

Actually, you've just raised a point I hadn't thought about. We have an old Xeon server that is being used to host some ESXi VMs that were needed while we transitioned to ovirt. Once I have moved those VMs I could repurpose that as the engine.

On Thu, Apr 16, 2020 at 11:42 AM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:

On April 16, 2020 11:25:20 AM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:

Is this actually production ready? It seems to break at every step.

On Wed, Apr 15, 2020 at 5:45 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:

On April 15, 2020 5:59:46 PM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:

Thanks for your help but I've decided to try and reinstall from scratch. This is taking too long.
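For reference, the 'storage path is not empty' failure can usually be cleared without recreating the share, by removing the leftover domain from the export first. A rough sketch with a hypothetical mount point and an example UUID (be certain nothing else lives on the export before the rm):

# Mount the export somewhere temporary and inspect it
mkdir -p /mnt/tmp
mount -t nfs nas-01.phoelex.com:/volume1/ovirt /mnt/tmp
ls -la /mnt/tmp    # expect a leftover storage-domain UUID directory plus .prob-* files

# Remove the old domain directory (UUID here is an example) and the probe files
rm -rf /mnt/tmp/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2 /mnt/tmp/.prob-*
umount /mnt/tmp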
>> >> > >> >> >On Wed, Apr 15, 2020 at 3:25 PM Strahil Nikolov >> ><hunter86_bg@yahoo.com> >> >> >wrote: >> >> > >> >> >> On April 15, 2020 2:40:52 PM GMT+03:00, Shareef Jalloq < >> >> >> shareef@jalloq.co.uk> wrote: >> >> >> >Yes, but there are no zones set up, just ports 22, 6801 adn 6900. >> >> >> > >> >> >> >On Wed, Apr 15, 2020 at 12:37 PM Strahil Nikolov >> >> >> ><hunter86_bg@yahoo.com> >> >> >> >wrote: >> >> >> > >> >> >> >> On April 15, 2020 2:28:05 PM GMT+03:00, Shareef Jalloq < >> >> >> >> shareef@jalloq.co.uk> wrote: >> >> >> >> >Oh this is painful. It seems to progress if you have both >> >> >> >> >he_force_ipv4 >> >> >> >> >set and run the deployment with the '--4' switch. >> >> >> >> > >> >> >> >> >But then I get a failure when the ansible script checks for >> >> >> >> >firewalld-zones >> >> >> >> >and doesn't get anything back. Should the deployment flow not >> >be >> >> >> >> >setting >> >> >> >> >any zones it needs? >> >> >> >> > >> >> >> >> >2020-04-15 10:57:25,439+0000 INFO >> >> >> >> >otopi.ovirt_hosted_engine_setup.ansible_utils >> >> >> >> >ansible_utils._process_output:109 TASK >> >[ovirt.hosted_engine_setup >> >> >: >> >> >> >Get >> >> >> >> >active list of active firewalld zones] >> >> >> >> > >> >> >> >> >2020-04-15 10:57:26,641+0000 DEBUG >> >> >> >> >otopi.ovirt_hosted_engine_setup.ansible_utils >> >> >> >> >ansible_utils._process_output:103 {u'stderr_lines': [], >> >> >u'changed': >> >> >> >> >True, >> >> >> >> >u'end': u'2020-04-15 10:57:26.481202', u'_ansible_no_log': >> >False, >> >> >> >> >u'stdout': u'', u'cmd': u'set -euo pipefail && firewall-cmd >> >> >> >> >--get-active-zones | grep -v "^\\s*interfaces"', u'start': >> >> >> >u'2020-04-15 >> >> >> >> >10:57:26.050203', u'delta': u'0:00:00.430999', u'stderr': u'', >> >> >> >u'rc': >> >> >> >> >1, >> >> >> >> >u'invocation': {u'module_args': {u'creates': None, >> >u'executable': >> >> >> >None, >> >> >> >> >u'_uses_shell': True, u'strip_empty_ends': True, >> >u'_raw_params': >> >> >> >u'set >> >> >> >> >-euo >> >> >> >> >pipefail && firewall-cmd --get-active-zones | grep -v >> >> >> >> >"^\\s*interfaces"', >> >> >> >> >u'removes': None, u'argv': None, u'warn': True, u'chdir': >> >None, >> >> >> >> >u'stdin_add_newline': True, u'stdin': None}}, u'stdout_lines': >> >> >[], >> >> >> >> >u'msg': >> >> >> >> >u'non-zero return code'} >> >> >> >> > >> >> >> >> >2020-04-15 10:57:26,741+0000 ERROR >> >> >> >> >otopi.ovirt_hosted_engine_setup.ansible_utils >> >> >> >> >ansible_utils._process_output:107 fatal: [localhost]: FAILED! >> >=> >> >> >> >> >{"changed": true, "cmd": "set -euo pipefail && firewall-cmd >> >> >> >> >--get-active-zones | grep -v \"^\\s*interfaces\"", "delta": >> >> >> >> >"0:00:00.430999", "end": "2020-04-15 10:57:26.481202", "msg": >> >> >> >"non-zero >> >> >> >> >return code", "rc": 1, "start": "2020-04-15 10:57:26.050203", >> >> >> >"stderr": >> >> >> >> >"", >> >> >> >> >"stderr_lines": [], "stdout": "", "stdout_lines": []} >> >> >> >> > >> >> >> >> >On Wed, Apr 15, 2020 at 10:23 AM Shareef Jalloq >> >> >> ><shareef@jalloq.co.uk> >> >> >> >> >wrote: >> >> >> >> > >> >> >> >> >> Ha, spoke too soon. It's now stuck in a loop and a google >> >> >points >> >> >> >me >> >> >> >> >at >> >> >> >> >> https://bugzilla.redhat.com/show_bug.cgi?id=1746585 >> >> >> >> >> >> >> >> >> >> However, forcing ipv4 doesn't seem to have fixed the loop. 
>> >> >> >> >> >> >> >> >> >> On Wed, Apr 15, 2020 at 9:59 AM Shareef Jalloq >> >> >> ><shareef@jalloq.co.uk> >> >> >> >> >> wrote: >> >> >> >> >> >> >> >> >> >>> OK, that seems to have fixed it, thanks. Is this a side >> >> >effect >> >> >> >of >> >> >> >> >>> redeploying the HE over a first time install? Nothing has >> >> >changed >> >> >> >in >> >> >> >> >our >> >> >> >> >>> setup and I didn't need to do this when I initially set up >> >our >> >> >> >> >nodes. >> >> >> >> >>> >> >> >> >> >>> >> >> >> >> >>> >> >> >> >> >>> On Tue, Apr 14, 2020 at 6:55 PM Strahil Nikolov >> >> >> >> ><hunter86_bg@yahoo.com> >> >> >> >> >>> wrote: >> >> >> >> >>> >> >> >> >> >>>> On April 14, 2020 6:17:17 PM GMT+03:00, Shareef Jalloq < >> >> >> >> >>>> shareef@jalloq.co.uk> wrote: >> >> >> >> >>>> >Hmmm, we're not using ipv6. Is that the issue? >> >> >> >> >>>> > >> >> >> >> >>>> >On Tue, Apr 14, 2020 at 3:56 PM Strahil Nikolov >> >> >> >> ><hunter86_bg@yahoo.com> >> >> >> >> >>>> >wrote: >> >> >> >> >>>> > >> >> >> >> >>>> >> On April 14, 2020 1:27:24 PM GMT+03:00, Shareef Jalloq >> >< >> >> >> >> >>>> >> shareef@jalloq.co.uk> wrote: >> >> >> >> >>>> >> >Right, I've given up on recovering the HE so want to >> >try >> >> >and >> >> >> >> >>>> >redeploy >> >> >> >> >>>> >> >it. >> >> >> >> >>>> >> >There doesn't seem to be enough information to debug >> >why >> >> >the >> >> >> >> >>>> >> >broker/agent >> >> >> >> >>>> >> >won't start cleanly. >> >> >> >> >>>> >> > >> >> >> >> >>>> >> >In running 'hosted-engine --deploy', I'm seeing the >> >> >> >following >> >> >> >> >error >> >> >> >> >>>> >in >> >> >> >> >>>> >> >the >> >> >> >> >>>> >> >setup validation phase: >> >> >> >> >>>> >> > >> >> >> >> >>>> >> >2020-04-14 09:46:08,922+0000 DEBUG >> >> >> >> >otopi.plugins.otopi.dialog.human >> >> >> >> >>>> >> >dialog.__logString:204 DIALOG:SEND >> >Please >> >> >> >> >provide >> >> >> >> >>>> >the >> >> >> >> >>>> >> >hostname of this host on the management network >> >> >> >> >>>> >> >[ovirt-node-00.phoelex.com]: >> >> >> >> >>>> >> > >> >> >> >> >>>> >> > >> >> >> >> >>>> >> >2020-04-14 09:46:12,831+0000 DEBUG >> >> >> >> >>>> >> >otopi.plugins.gr_he_common.network.bridge >> >> >> >> >>>> >> >hostname.getResolvedAddresses:432 >> >> >> >> >>>> >> >getResolvedAddresses: set(['64:ff9b::c0a8:13d', >> >> >> >> >'192.168.1.61']) >> >> >> >> >>>> >> > >> >> >> >> >>>> >> >2020-04-14 09:46:12,832+0000 DEBUG >> >> >> >> >>>> >> >otopi.plugins.gr_he_common.network.bridge >> >> >> >> >>>> >> >hostname._validateFQDNresolvability:289 >> >> >> >> >ovirt-node-00.phoelex.com >> >> >> >> >>>> >> >resolves >> >> >> >> >>>> >> >to: set(['64:ff9b::c0a8:13d', '192.168.1.61']) >> >> >> >> >>>> >> > >> >> >> >> >>>> >> >2020-04-14 09:46:12,832+0000 DEBUG >> >> >> >> >>>> >> >otopi.plugins.gr_he_common.network.bridge >> >> >> >plugin.executeRaw:813 >> >> >> >> >>>> >> >execute: >> >> >> >> >>>> >> >['/usr/bin/dig', '+noall', '+answer', >> >> >> >> >'ovirt-node-00.phoelex.com', >> >> >> >> >>>> >> >'ANY'], >> >> >> >> >>>> >> >executable='None', cwd='None', env=None >> >> >> >> >>>> >> > >> >> >> >> >>>> >> >2020-04-14 09:46:12,871+0000 DEBUG >> >> >> >> >>>> >> >otopi.plugins.gr_he_common.network.bridge >> >> >> >plugin.executeRaw:863 >> >> >> >> >>>> >> >execute-result: ['/usr/bin/dig', '+noall', '+answer', >> >' >> >> >> >> >>>> >> >ovirt-node-00.phoelex.com', 'ANY'], rc=0 >> >> >> >> >>>> >> > >> >> >> >> >>>> >> >2020-04-14 09:46:12,872+0000 DEBUG >> >> >> >> >>>> >> >otopi.plugins.gr_he_common.network.bridge >> >> 
>plugin.execute:921 >> >> >> >> >>>> >> >execute-output: ['/usr/bin/dig', '+noall', '+answer', >> >' >> >> >> >> >>>> >> >ovirt-node-00.phoelex.com', 'ANY'] stdout: >> >> >> >> >>>> >> > >> >> >> >> >>>> >> >ovirt-node-00.phoelex.com. 86400 IN A >> >> >192.168.1.61 >> >> >> >> >>>> >> > >> >> >> >> >>>> >> > >> >> >> >> >>>> >> >2020-04-14 09:46:12,872+0000 DEBUG >> >> >> >> >>>> >> >otopi.plugins.gr_he_common.network.bridge >> >> >plugin.execute:926 >> >> >> >> >>>> >> >execute-output: ['/usr/bin/dig', '+noall', '+answer', >> >' >> >> >> >> >>>> >> >ovirt-node-00.phoelex.com', 'ANY'] stderr: >> >> >> >> >>>> >> > >> >> >> >> >>>> >> > >> >> >> >> >>>> >> > >> >> >> >> >>>> >> >2020-04-14 09:46:12,872+0000 DEBUG >> >> >> >> >>>> >> >otopi.plugins.gr_he_common.network.bridge >> >> >> >plugin.executeRaw:813 >> >> >> >> >>>> >> >execute: >> >> >> >> >>>> >> >('/usr/sbin/ip', 'addr'), executable='None', >> >cwd='None', >> >> >> >> >env=None >> >> >> >> >>>> >> > >> >> >> >> >>>> >> >2020-04-14 09:46:12,876+0000 DEBUG >> >> >> >> >>>> >> >otopi.plugins.gr_he_common.network.bridge >> >> >> >plugin.executeRaw:863 >> >> >> >> >>>> >> >execute-result: ('/usr/sbin/ip', 'addr'), rc=0 >> >> >> >> >>>> >> > >> >> >> >> >>>> >> >2020-04-14 09:46:12,876+0000 DEBUG >> >> >> >> >>>> >> >otopi.plugins.gr_he_common.network.bridge >> >> >plugin.execute:921 >> >> >> >> >>>> >> >execute-output: ('/usr/sbin/ip', 'addr') stdout: >> >> >> >> >>>> >> > >> >> >> >> >>>> >> >1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue >> >> >state >> >> >> >> >UNKNOWN >> >> >> >> >>>> >> >group >> >> >> >> >>>> >> >default qlen 1000 >> >> >> >> >>>> >> > >> >> >> >> >>>> >> > link/loopback 00:00:00:00:00:00 brd >> >00:00:00:00:00:00 >> >> >> >> >>>> >> > >> >> >> >> >>>> >> > inet 127.0.0.1/8 scope host lo >> >> >> >> >>>> >> > >> >> >> >> >>>> >> > valid_lft forever preferred_lft forever >> >> >> >> >>>> >> > >> >> >> >> >>>> >> > inet6 ::1/128 scope host >> >> >> >> >>>> >> > >> >> >> >> >>>> >> > valid_lft forever preferred_lft forever >> >> >> >> >>>> >> > >> >> >> >> >>>> >> >2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 >> >qdisc >> >> >mq >> >> >> >> >master >> >> >> >> >>>> >> >ovirtmgmt state UP group default qlen 1000 >> >> >> >> >>>> >> > >> >> >> >> >>>> >> > link/ether ac:1f:6b:bc:32:6a brd ff:ff:ff:ff:ff:ff >> >> >> >> >>>> >> > >> >> >> >> >>>> >> >3: eno2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 >> >> >qdisc >> >> >> >mq >> >> >> >> >state >> >> >> >> >>>> >> >DOWN >> >> >> >> >>>> >> >group default qlen 1000 >> >> >> >> >>>> >> > >> >> >> >> >>>> >> > link/ether ac:1f:6b:bc:32:6b brd ff:ff:ff:ff:ff:ff >> >> >> >> >>>> >> > >> >> >> >> >>>> >> >4: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc >> >noop >> >> >> >state >> >> >> >> >DOWN >> >> >> >> >>>> >> >group >> >> >> >> >>>> >> >default qlen 1000 >> >> >> >> >>>> >> > >> >> >> >> >>>> >> > link/ether 02:e6:e2:80:93:8d brd ff:ff:ff:ff:ff:ff >> >> >> >> >>>> >> > >> >> >> >> >>>> >> >5: br-int: <BROADCAST,MULTICAST> mtu 1500 qdisc noop >> >> >state >> >> >> >DOWN >> >> >> >> >>>> >group >> >> >> >> >>>> >> >default qlen 1000 >> >> >> >> >>>> >> > >> >> >> >> >>>> >> > link/ether 8a:26:44:50:ee:4a brd ff:ff:ff:ff:ff:ff >> >> >> >> >>>> >> > >> >> >> >> >>>> >> >21: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu >> >1500 >> >> >> >qdisc >> >> >> >> >>>> >noqueue >> >> >> >> >>>> >> >state UP group default qlen 1000 >> >> >> >> >>>> >> > >> >> >> >> >>>> >> > link/ether ac:1f:6b:bc:32:6a brd ff:ff:ff:ff:ff:ff >> >> >> >> >>>> >> > >> >> >> >> 
>>>> >> > inet 192.168.1.61/24 brd 192.168.1.255 scope >> >global >> >> >> >> >ovirtmgmt >> >> >> >> >>>> >> > >> >> >> >> >>>> >> > valid_lft forever preferred_lft forever >> >> >> >> >>>> >> > >> >> >> >> >>>> >> > inet6 fe80::ae1f:6bff:febc:326a/64 scope link >> >> >> >> >>>> >> > >> >> >> >> >>>> >> > valid_lft forever preferred_lft forever >> >> >> >> >>>> >> > >> >> >> >> >>>> >> >22: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc >> >> >noop >> >> >> >> >state >> >> >> >> >>>> >DOWN >> >> >> >> >>>> >> >group >> >> >> >> >>>> >> >default qlen 1000 >> >> >> >> >>>> >> > >> >> >> >> >>>> >> > link/ether 3a:02:7b:7d:b3:2a brd ff:ff:ff:ff:ff:ff >> >> >> >> >>>> >> > >> >> >> >> >>>> >> > >> >> >> >> >>>> >> >2020-04-14 09:46:12,876+0000 DEBUG >> >> >> >> >>>> >> >otopi.plugins.gr_he_common.network.bridge >> >> >plugin.execute:926 >> >> >> >> >>>> >> >execute-output: ('/usr/sbin/ip', 'addr') stderr: >> >> >> >> >>>> >> > >> >> >> >> >>>> >> > >> >> >> >> >>>> >> > >> >> >> >> >>>> >> >2020-04-14 09:46:12,877+0000 DEBUG >> >> >> >> >>>> >> >otopi.plugins.gr_he_common.network.bridge >> >> >> >> >>>> >> >hostname.getLocalAddresses:251 >> >> >> >> >>>> >> >addresses: [u'192.168.1.61', >> >> >u'fe80::ae1f:6bff:febc:326a'] >> >> >> >> >>>> >> > >> >> >> >> >>>> >> >2020-04-14 09:46:12,877+0000 DEBUG >> >> >> >> >>>> >> >otopi.plugins.gr_he_common.network.bridge >> >> >> >> >hostname.test_hostname:464 >> >> >> >> >>>> >> >test_hostname exception >> >> >> >> >>>> >> > >> >> >> >> >>>> >> >Traceback (most recent call last): >> >> >> >> >>>> >> > >> >> >> >> >>>> >> >File >> >> >> >> >> >>"/usr/lib/python2.7/site-packages/ovirt_setup_lib/hostname.py", >> >> >> >> >>>> >> >line >> >> >> >> >>>> >> >460, in test_hostname >> >> >> >> >>>> >> > >> >> >> >> >>>> >> > not_local_text, >> >> >> >> >>>> >> > >> >> >> >> >>>> >> >File >> >> >> >> >> >>"/usr/lib/python2.7/site-packages/ovirt_setup_lib/hostname.py", >> >> >> >> >>>> >> >line >> >> >> >> >>>> >> >342, in _validateFQDNresolvability >> >> >> >> >>>> >> > >> >> >> >> >>>> >> > addresses=resolvedAddressesAsString >> >> >> >> >>>> >> > >> >> >> >> >>>> >> >RuntimeError: ovirt-node-00.phoelex.com resolves to >> >> >> >> >>>> >64:ff9b::c0a8:13d >> >> >> >> >>>> >> >192.168.1.61 and not all of them can be mapped to non >> >> >> >loopback >> >> >> >> >>>> >devices >> >> >> >> >>>> >> >on >> >> >> >> >>>> >> >this host >> >> >> >> >>>> >> > >> >> >> >> >>>> >> >2020-04-14 09:46:12,884+0000 ERROR >> >> >> >> >>>> >> >otopi.plugins.gr_he_common.network.bridge >> >> >> >> >dialog.queryEnvKey:120 >> >> >> >> >>>> >Host >> >> >> >> >>>> >> >name >> >> >> >> >>>> >> >is not valid: ovirt-node-00.phoelex.com resolves to >> >> >> >> >>>> >64:ff9b::c0a8:13d >> >> >> >> >>>> >> >192.168.1.61 and not all of them can be mapped to non >> >> >> >loopback >> >> >> >> >>>> >devices >> >> >> >> >>>> >> >on >> >> >> >> >>>> >> >this host >> >> >> >> >>>> >> > >> >> >> >> >>>> >> >The node I'm running on has an IP address of .61 and >> >> >> >resolves >> >> >> >> >>>> >> >correctly. >> >> >> >> >>>> >> > >> >> >> >> >>>> >> >On Fri, Apr 10, 2020 at 12:55 PM Shareef Jalloq >> >> >> >> >>>> ><shareef@jalloq.co.uk> >> >> >> >> >>>> >> >wrote: >> >> >> >> >>>> >> > >> >> >> >> >>>> >> >> Where should I be checking if there are any >> >> >files/folder >> >> >> >not >> >> >> >> >owned >> >> >> >> >>>> >by >> >> >> >> >>>> >> >> vdsm:kvm? I checked on the mount the HA sits on and >> >> >it's >> >> >> >> >fine. 
On Fri, Apr 10, 2020 at 12:55 PM Shareef Jalloq <shareef@jalloq.co.uk> wrote:

Where should I be checking if there are any files/folders not owned by vdsm:kvm? I checked on the mount the HA sits on and it's fine.

How would I go about checking that vdsm can access those images? If I run virsh, it lists them, and they were running yesterday even though the HA was down. I've since restarted both hosts but the broker is still spitting out the same error (copied below). How do I find the reason the broker can't connect to the storage? The conf file is already at DEBUG verbosity:

[handler_logfile]
class=logging.handlers.TimedRotatingFileHandler
args=('/var/log/ovirt-hosted-engine-ha/broker.log', 'd', 1, 7)
level=DEBUG
formatter=long

And what are all these .prob-<num> files that are being created? There are over 250K of them now on the mount I'm using for the Data Domain. They're all of 0 size and of the form
/rhev/data-center/mnt/nas-01.phoelex.com:_volume2_vmstore/.prob-ffa867da-93db-4211-82df-b1b04a625ab9

@eevans: The volume I have the Data Domain on has TBs free. The HA is dead so I can't ssh in. No idea what started these errors, and the other VMs were still running happily, although they're on a different Data Domain.

Shareef.

MainThread::INFO::2020-04-10 07:45:00,408::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect) Connecting the storage
MainThread::INFO::2020-04-10 07:45:00,408::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2020-04-10 07:45:01,577::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2020-04-10 07:45:02,692::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Refreshing the storage domain
MainThread::WARNING::2020-04-10 07:45:05,175::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed:
(code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
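The failing call can be reproduced by hand to take the broker out of the picture — a sketch, assuming the vdsm-client package is installed on the host:

vdsm-client StorageDomain getInfo storagedomainID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2
vdsm-client StorageDomain getStats storagedomainID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2

If these return the same code=350 error, the problem sits between vdsm and the storage domain itself, not in the HA broker.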
On Thu, Apr 9, 2020 at 5:58 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
On April 9, 2020 11:12:30 AM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:

OK, let's go through this. I'm looking at the node that at least still has some VMs running. virsh also tells me that the HostedEngine VM is running, but it's unresponsive and I can't shut it down.

1. All storage domains exist and are mounted.

2. The ha_agent exists:

[root@ovirt-node-01 ovirt-hosted-engine-ha]# ls /rhev/data-center/mnt/nas-01.phoelex.com\:_volume2_vmstore/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/
dom_md  ha_agent  images  master

3. There are two links:

[root@ovirt-node-01 ovirt-hosted-engine-ha]# ll /rhev/data-center/mnt/nas-01.phoelex.com\:_volume2_vmstore/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/ha_agent/
total 8
lrwxrwxrwx. 1 vdsm kvm 132 Apr  2 14:50 hosted-engine.lockspace -> /var/run/vdsm/storage/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/ffb90b82-42fe-4253-85d5-aaec8c280aaf/90e68791-0c6f-406a-89ac-e0d86c631604
lrwxrwxrwx. 1 vdsm kvm 132 Apr  2 14:50 hosted-engine.metadata -> /var/run/vdsm/storage/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/2161aed0-7250-4c1d-b667-ac94f60af17e/6b818e33-f80a-48cc-a59c-bba641e027d4

4. The services exist but all seem to have some sort of warning:

a) Apr 08 18:10:55 ovirt-node-01.phoelex.com sanlock[1728]: 2020-04-08 18:10:55 1744152 [36796]: s16 delta_renew long write time 10 sec

b) Mar 23 18:02:59 ovirt-node-01.phoelex.com supervdsmd[29409]: failed to load module nvdimm: libbd_nvdimm.so.2: cannot open shared object file: No such file or directory

c) Apr 09 08:05:13 ovirt-node-01.phoelex.com vdsm[4801]: ERROR failed to retrieve Hosted Engine HA score '[Errno 2] No such file or directory' Is the Hosted Engine setup finished?

d) Apr 08 22:48:27 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-08 22:48:27.134+0000: 29309: warning : qemuGetProcessInfo:1404 : cannot parse process status data
Apr 08 22:48:27 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-08 22:48:27.134+0000: 29309: error : virNetDevTapInterfaceStats:764 : internal error: /proc/net/dev: Interface not found
Apr 08 23:09:39 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-08 23:09:39.844+0000: 29307: error : virNetSocketReadWire:1806 : End of file while reading data: Input/output error
Apr 09 01:05:26 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-09 01:05:26.660+0000: 29307: error : virNetSocketReadWire:1806 : End of file while reading data: Input/output error

5 & 6. The broker log is continually printing this error:

MainThread::INFO::2020-04-09 08:07:31,438::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.3.6 started
MainThread::DEBUG::2020-04-09 08:07:31,438::broker::55::ovirt_hosted_engine_ha.broker.broker.Broker::(run) Running broker
MainThread::DEBUG::2020-04-09 08:07:31,438::broker::120::ovirt_hosted_engine_ha.broker.broker.Broker::(_get_monitor) Starting monitor
MainThread::INFO::2020-04-09 08:07:31,438::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Searching for submonitors in /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors
MainThread::INFO::2020-04-09 08:07:31,439::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
MainThread::INFO::2020-04-09 08:07:31,440::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
MainThread::INFO::2020-04-09 08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
MainThread::INFO::2020-04-09 08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
MainThread::INFO::2020-04-09 08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
MainThread::INFO::2020-04-09 08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
MainThread::INFO::2020-04-09 08:07:31,442::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
MainThread::INFO::2020-04-09 08:07:31,442::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
MainThread::INFO::2020-04-09 08:07:31,444::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
MainThread::INFO::2020-04-09 08:07:31,444::monitor::50::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Finished loading submonitors
MainThread::DEBUG::2020-04-09 08:07:31,444::broker::128::ovirt_hosted_engine_ha.broker.broker.Broker::(_get_storage_broker) Starting storage broker
MainThread::DEBUG::2020-04-09 08:07:31,444::storage_backends::369::ovirt_hosted_engine_ha.lib.storage_backends::(connect) Connecting to VDSM
MainThread::DEBUG::2020-04-09 08:07:31,444::util::384::ovirt_hosted_engine_ha.lib.storage_backends::(__log_debug) Creating a new json-rpc connection to VDSM
Client localhost:54321::DEBUG::2020-04-09 08:07:31,453::concurrent::258::root::(run) START thread <Thread(Client localhost:54321, started daemon 139992488138496)> (func=<bound method Reactor.process_requests of <yajsonrpc.betterAsyncore.Reactor object at 0x7f528acabc90>>, args=(), kwargs={})
Client localhost:54321::DEBUG::2020-04-09 08:07:31,459::stompclient::138::yajsonrpc.protocols.stomp.AsyncClient::(_process_connected) Stomp connection established
MainThread::DEBUG::2020-04-09 08:07:31,467::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::INFO::2020-04-09 08:07:31,530::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect) Connecting the storage
MainThread::INFO::2020-04-09 08:07:31,531::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::DEBUG::2020-04-09 08:07:31,531::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2020-04-09 08:07:31,534::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2020-04-09 08:07:32,199::storage_server::158::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(_validate_pre_connected_path) Storage domain a6cea67d-dbfb-45cf-a775-b4d0d47b26f2 is not available
MainThread::INFO::2020-04-09 08:07:32,199::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::DEBUG::2020-04-09 08:07:32,199::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2020-04-09 08:07:32,814::storage_server::363::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) [{u'status': 0, u'id': u'e29cf818-5ee5-46e1-85c1-8aeefa33e95d'}]
MainThread::INFO::2020-04-09 08:07:32,814::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Refreshing the storage domain
MainThread::DEBUG::2020-04-09 08:07:32,815::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2020-04-09 08:07:33,129::storage_server::420::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Error refreshing storage domain: Command StorageDomain.getStats with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed:
(code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
MainThread::DEBUG::2020-04-09 08:07:33,130::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2020-04-09 08:07:33,795::storage_backends::208::ovirt_hosted_engine_ha.lib.storage_backends::(_get_sector_size) Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed:
(code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
MainThread::WARNING::2020-04-09 08:07:33,795::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed:
(code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))

The UUID it is moaning about is indeed the one that the HA sits on, and is the one I listed the contents of in step 2 above.

So why can't it see this domain?

Thanks, Shareef.
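Since code=350 ("Error in storage domain action") is raised by vdsm rather than by the HA services themselves, the next layer down to check is vdsm's own log on the same host — a sketch, assuming the default log location:

grep -B2 -A10 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2' /var/log/vdsm/vdsm.log | tail -n 60

The traceback around the getInfo/getStats failures there should name the actual filesystem or NFS error hiding behind the generic storage-domain message.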
On Thu, Apr 9, 2020 at 6:12 AM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
On April 9, 2020 1:51:05 AM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:

Don't know if this is useful or not, but I just tried to shut down and start another VM on one of the hosts and get the following error:

virsh # start scratch
error: Failed to start domain scratch
error: Network not found: no network with matching name 'vdsm-ovirtmgmt'

Is this not referring to the interface name, as the network is called 'ovirtmgmt'?

On Wed, Apr 8, 2020 at 11:35 PM Shareef Jalloq <shareef@jalloq.co.uk> wrote:

Hmmm, virsh tells me the HE is running, but it hasn't come up and the agent.log is full of the same errors.

On Wed, Apr 8, 2020 at 11:31 PM Shareef Jalloq <shareef@jalloq.co.uk> wrote:

Ah hah! OK, so I've managed to start it using virsh on the second host, but my first host is still dead.

First of all, what are these 56,317 .prob- files that get dumped to the NFS mounts?

Secondly, why doesn't the node mount the NFS directories at boot? Is that the issue with this particular node?

On Wed, Apr 8, 2020 at 11:12 PM <eevans@digitaldatatechs.com> wrote:

Did you try virsh list --inactive?

Eric Evans
Digital Data Services LLC.
304.660.9080
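On the "Network not found: 'vdsm-ovirtmgmt'" error above: vdsm defines its own libvirt networks, one per oVirt network and prefixed with vdsm-, when vdsmd starts. So virsh looking for vdsm-ovirtmgmt is expected, and the network being missing usually points at vdsmd not having come up cleanly on that host. A quick check — a sketch, using the virsh alias from earlier in the thread:

virsh net-list --all      # vdsm-ovirtmgmt should be listed and active
systemctl status vdsmd    # if the network is missing, vdsmd is the service to look at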
From: Shareef Jalloq <shareef@jalloq.co.uk>
Sent: Wednesday, April 8, 2020 5:58 PM
To: Strahil Nikolov <hunter86_bg@yahoo.com>
Cc: Ovirt Users <users@ovirt.org>
Subject: [ovirt-users] Re: ovirt-engine unresponsive - how to rescue?

I've now shut down the VMs on one host and rebooted it, but the agent service doesn't start. If I run 'hosted-engine --vm-status' I get:

The hosted engine configuration has not been retrieved from shared storage. Please ensure that ovirt-ha-agent is running and the storage server is reachable.

and indeed, if I list the mounts under /rhev/data-center/mnt, only one of the directories is mounted. I have 3 NFS mounts: one ISO Domain and two Data Domains. Only one Data Domain has mounted, and this has lots of .prob files in it. So why haven't the other NFS exports been mounted?

Manually mounting them doesn't seem to have helped much either. I can start the broker service but the agent service says no. Same error as the one in my last email.

Shareef.
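When --vm-status complains that the configuration has not been retrieved from shared storage, it can be worth asking vdsm to connect the hosted-engine storage explicitly before restarting the HA services — a sketch, assuming a standard hosted-engine deployment:

hosted-engine --connect-storage
ls /rhev/data-center/mnt/                          # the hosted-engine storage domain should now be mounted
systemctl restart ovirt-ha-broker ovirt-ha-agent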
On Wed, Apr 8, 2020 at 9:57 PM Shareef Jalloq <shareef@jalloq.co.uk> wrote:

Right, still down. I've run virsh and it doesn't know anything about the engine vm.

I've restarted the broker and agent services and I still get nothing in virsh->list.

In the logs under /var/log/ovirt-hosted-engine-ha I see lots of errors:

broker.log:

MainThread::INFO::2020-04-08 20:56:20,138::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.3.6 started
MainThread::INFO::2020-04-08 20:56:20,138::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Searching for submonitors in /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors
MainThread::INFO::2020-04-08 20:56:20,138::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
MainThread::INFO::2020-04-08 20:56:20,140::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
MainThread::INFO::2020-04-08 20:56:20,140::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
MainThread::INFO::2020-04-08 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
MainThread::INFO::2020-04-08 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
MainThread::INFO::2020-04-08 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
MainThread::INFO::2020-04-08 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
MainThread::INFO::2020-04-08 20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
MainThread::INFO::2020-04-08 20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
MainThread::INFO::2020-04-08 20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
MainThread::INFO::2020-04-08 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
MainThread::INFO::2020-04-08 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
MainThread::INFO::2020-04-08 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
MainThread::INFO::2020-04-08 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
MainThread::INFO::2020-04-08 20:56:20,143::monitor::50::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Finished loading submonitors
MainThread::INFO::2020-04-08 20:56:20,197::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect) Connecting the storage
MainThread::INFO::2020-04-08 20:56:20,197::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2020-04-08 20:56:20,414::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2020-04-08 20:56:20,628::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Refreshing the storage domain
MainThread::WARNING::2020-04-08 20:56:21,057::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed:
(code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
MainThread::INFO::2020-04-08 20:56:21,901::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.3.6 started
MainThread::INFO::2020-04-08 20:56:21,901::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Searching for submonitors in /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors

agent.log:

MainThread::ERROR::2020-04-08 20:57:00,799::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent
MainThread::INFO::2020-04-08 20:57:00,799::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
MainThread::INFO::2020-04-08 20:57:11,144::agent::67::ovirt_hosted_engine_ha.agent.agent.Agent::(run) ovirt-hosted-engine-ha agent 2.3.6 started
MainThread::INFO::2020-04-08 20:57:11,182::hosted_engine::234::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Found certificate common name: ovirt-node-01.phoelex.com
MainThread::INFO::2020-04-08 20:57:11,294::hosted_engine::543::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Initializing ha-broker connection
MainThread::INFO::2020-04-08 20:57:11,296::brokerlink::80::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor network, options {'tcp_t_address': '', 'network_test': 'dns', 'tcp_t_port': '', 'addr': '192.168.1.99'}
MainThread::ERROR::2020-04-08 20:57:11,296::hosted_engine::559::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Failed to start necessary monitors
MainThread::ERROR::2020-04-08 20:57:11,297::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent
    return action(he)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper
    return he.start_monitoring()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 432, in start_monitoring
    self._initialize_broker()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 556, in _initialize_broker
    m.get('options', {}))
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 89, in start_monitor
    ).format(t=type, o=options, e=e)
RequestError: brokerlink - failed to start monitor via ovirt-ha-broker: [Errno 2] No such file or directory, [monitor: 'network', options: {'tcp_t_address': '', 'network_test': 'dns', 'tcp_t_port': '', 'addr': '192.168.1.99'}]
MainThread::ERROR::2020-04-08 20:57:11,297::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent
MainThread::INFO::2020-04-08 20:57:11,297::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
# list >> >> >> >> >>>> >> >>> >> >>>> > Id Name >> >State >> >> >> >> >>>> >> >>> >> >>>> >> >> >> >> >>---------------------------------------------------- >> >> >> >> >>>> >> >>> >> >>>> > xx HostedEngine >> >Paused >> >> >> >> >>>> >> >>> >> >>>> > xx ********** >> >running >> >> >> >> >>>> >> >>> >> >>>> > ... >> >> >> >> >>>> >> >>> >> >>>> > xx ********** >> >> >running >> >> >> >> >>>> >> >>> >> >>>> > >> >> >> >> >>>> >> >>> >> >>>> >HostedEngine should be in the list, try >> >and >> >> >> >resume >> >> >> >> >the >> >> >> >> >>>> >> >engine: >> >> >> >> >>>> >> >>> >> >>>> > >> >> >> >> >>>> >> >>> >> >>>> >virsh # resume HostedEngine >> >> >> >> >>>> >> >>> >> >>>> > >> >> >> >> >>>> >> >>> >> >>>> >On Wed, 8 Apr 2020 at 17:28, Shareef >> >Jalloq >> >> >> >> >>>> >> >>> ><shareef@jalloq.co.uk> >> >> >> >> >>>> >> >>> >> >>>> >wrote: >> >> >> >> >>>> >> >>> >> >>>> > >> >> >> >> >>>> >> >>> >> >>>> >> Thanks! >> >> >> >> >>>> >> >>> >> >>>> >> >> >> >> >> >>>> >> >>> >> >>>> >> The status hangs due to, I guess, the VM >> >> >being >> >> >> >> >>>> >down.... >> >> >> >> >>>> >> >>> >> >>>> >> >> >> >> >> >>>> >> >>> >> >>>> >> [root@ovirt-node-01 ~]# hosted-engine >> >> >> >--vm-start >> >> >> >> >>>> >> >>> >> >>>> >> VM exists and is down, cleaning up and >> >> >> >restarting >> >> >> >> >>>> >> >>> >> >>>> >> VM in WaitForLaunch >> >> >> >> >>>> >> >>> >> >>>> >> >> >> >> >> >>>> >> >>> >> >>>> >> but this doesn't seem to do anything. >> >OK, >> >> >> >after >> >> >> >> >a >> >> >> >> >>>> >while >> >> >> >> >>>> >> >I >> >> >> >> >>>> >> >>> >get a >> >> >> >> >>>> >> >>> >> >>>> >status of >> >> >> >> >>>> >> >>> >> >>>> >> it being barfed... >> >> >> >> >>>> >> >>> >> >>>> >> >> >> >> >> >>>> >> >>> >> >>>> >> --== Host ovirt-node-00.phoelex.com (id: >> >1) >> >> >> >> >status >> >> >> >> >>>> >==-- >> >> >> >> >>>> >> >>> >> >>>> >> >> >> >> >> >>>> >> >>> >> >>>> >> conf_on_shared_storage : >> >True >> >> >> >> >>>> >> >>> >> >>>> >> Status up-to-date : >> >False >> >> >> >> >>>> >> >>> >> >>>> >> Hostname : >> >> >> >> >>>> >> >>> >ovirt-node-00.phoelex.com >> >> >> >> >>>> >> >>> >> >>>> >> Host ID : 1 >> >> >> >> >>>> >> >>> >> >>>> >> Engine status : >> >> >unknown >> >> >> >> >>>> >stale-data >> >> >> >> >>>> >> >>> >> >>>> >> Score : >> >3400 >> >> >> >> >>>> >> >>> >> >>>> >> stopped : >> >False >> >> >> >> >>>> >> >>> >> >>>> >> Local maintenance : >> >False >> >> >> >> >>>> >> >>> >> >>>> >> crc32 : >> >> >9c4a034b >> >> >> >> >>>> >> >>> >> >>>> >> local_conf_timestamp : >> >523362 >> >> >> >> >>>> >> >>> >> >>>> >> Host timestamp : >> >523608 >> >> >> >> >>>> >> >>> >> >>>> >> Extra metadata (valid at timestamp): >> >> >> >> >>>> >> >>> >> >>>> >> metadata_parse_version=1 >> >> >> >> >>>> >> >>> >> >>>> >> metadata_feature_version=1 >> >> >> >> >>>> >> >>> >> >>>> >> timestamp=523608 (Wed Apr 8 16:17:11 >> >2020) >> >> >> >> >>>> >> >>> >> >>>> >> host-id=1 >> >> >> >> >>>> >> >>> >> >>>> >> score=3400 >> >> >> >> >>>> >> >>> >> >>>> >> vm_conf_refresh_time=523362 (Wed Apr 8 >> >> >> >16:13:06 >> >> >> >> >2020) >> >> >> >> >>>> >> >>> >> >>>> >> conf_on_shared_storage=True >> >> >> >> >>>> >> >>> >> >>>> >> maintenance=False >> >> >> >> >>>> >> >>> >> >>>> >> state=EngineDown >> >> >> >> >>>> >> >>> >> >>>> >> stopped=False >> >> >> >> >>>> >> >>> >> >>>> >> >> >> >> >> >>>> >> >>> >> >>>> >> >> >> >> >> >>>> >> >>> >> >>>> >> --== Host ovirt-node-01.phoelex.com (id: >> >2) >> >> >> >> >status >> >> >> >> >>>> >==-- >> >> >> >> >>>> >> >>> >> >>>> >> >> >> >> 
>> >>>> >> >>> >> >>>> >> conf_on_shared_storage : >> >True >> >> >> >> >>>> >> >>> >> >>>> >> Status up-to-date : >> >True >> >> >> >> >>>> >> >>> >> >>>> >> Hostname : >> >> >> >> >>>> >> >>> >ovirt-node-01.phoelex.com >> >> >> >> >>>> >> >>> >> >>>> >> Host ID : 2 >> >> >> >> >>>> >> >>> >> >>>> >> Engine status : >> >> >> >{"reason": >> >> >> >> >"bad >> >> >> >> >>>> >vm >> >> >> >> >>>> >> >>> >status", >> >> >> >> >>>> >> >>> >> >>>> >"health": >> >> >> >> >>>> >> >>> >> >>>> >> "bad", "vm": "down_unexpected", >> >"detail": >> >> >> >"Down"} >> >> >> >> >>>> >> >>> >> >>>> >> Score : 0 >> >> >> >> >>>> >> >>> >> >>>> >> stopped : >> >False >> >> >> >> >>>> >> >>> >> >>>> >> Local maintenance : >> >False >> >> >> >> >>>> >> >>> >> >>>> >> crc32 : >> >> >5045f2eb >> >> >> >> >>>> >> >>> >> >>>> >> local_conf_timestamp : >> >> >1737037 >> >> >> >> >>>> >> >>> >> >>>> >> Host timestamp : >> >> >1737283 >> >> >> >> >>>> >> >>> >> >>>> >> Extra metadata (valid at timestamp): >> >> >> >> >>>> >> >>> >> >>>> >> metadata_parse_version=1 >> >> >> >> >>>> >> >>> >> >>>> >> metadata_feature_version=1 >> >> >> >> >>>> >> >>> >> >>>> >> timestamp=1737283 (Wed Apr 8 16:16:17 >> >> >2020) >> >> >> >> >>>> >> >>> >> >>>> >> host-id=2 >> >> >> >> >>>> >> >>> >> >>>> >> score=0 >> >> >> >> >>>> >> >>> >> >>>> >> vm_conf_refresh_time=1737037 (Wed Apr 8 >> >> >> >16:12:11 >> >> >> >> >>>> >2020) >> >> >> >> >>>> >> >>> >> >>>> >> conf_on_shared_storage=True >> >> >> >> >>>> >> >>> >> >>>> >> maintenance=False >> >> >> >> >>>> >> >>> >> >>>> >> state=EngineUnexpectedlyDown >> >> >> >> >>>> >> >>> >> >>>> >> stopped=False >> >> >> >> >>>> >> >>> >> >>>> >> >> >> >> >> >>>> >> >>> >> >>>> >> On Wed, Apr 8, 2020 at 5:09 PM Maton, >> >Brett >> >> >> >> >>>> >> >>> >> >>>> ><matonb@ltresources.co.uk> >> >> >> >> >>>> >> >>> >> >>>> >> wrote: >> >> >> >> >>>> >> >>> >> >>>> >> >> >> >> >> >>>> >> >>> >> >>>> >>> First steps, on one of your hosts as >> >root: >> >> >> >> >>>> >> >>> >> >>>> >>> >> >> >> >> >>>> >> >>> >> >>>> >>> To get information: >> >> >> >> >>>> >> >>> >> >>>> >>> hosted-engine --vm-status >> >> >> >> >>>> >> >>> >> >>>> >>> >> >> >> >> >>>> >> >>> >> >>>> >>> To start the engine: >> >> >> >> >>>> >> >>> >> >>>> >>> hosted-engine --vm-start >> >> >> >> >>>> >> >>> >> >>>> >>> >> >> >> >> >>>> >> >>> >> >>>> >>> >> >> >> >> >>>> >> >>> >> >>>> >>> On Wed, 8 Apr 2020 at 17:00, Shareef >> >> >Jalloq >> >> >> >> >>>> >> >>> >> ><shareef@jalloq.co.uk> >> >> >> >> >>>> >> >>> >> >>>> >wrote: >> >> >> >> >>>> >> >>> >> >>>> >>> >> >> >> >> >>>> >> >>> >> >>>> >>>> So my engine has gone down and I can't >> >> >ssh >> >> >> >into >> >> >> >> >it >> >> >> >> >>>> >> >either. >> >> >> >> >>>> >> >>> >If >> >> >> >> >>>> >> >>> >> >I >> >> >> >> >>>> >> >>> >> >>>> >try to >> >> >> >> >>>> >> >>> >> >>>> >>>> log into the web-ui of the node it is >> >> >> >running >> >> >> >> >on, I >> >> >> >> >>>> >get >> >> >> >> >>>> >> >>> >> >redirected >> >> >> >> >>>> >> >>> >> >>>> >because >> >> >> >> >>>> >> >>> >> >>>> >>>> the node can't reach the engine. >> >> >> >> >>>> >> >>> >> >>>> >>>> >> >> >> >> >>>> >> >>> >> >>>> >>>> What are my next steps? >> >> >> >> >>>> >> >>> >> >>>> >>>> >> >> >> >> >>>> >> >>> >> >>>> >>>> Shareef. 
>> >> >> >> >>>> >> >>> >> >>>> >>>> >> >> >> >_______________________________________________ >> >> >> >> >>>> >> >>> >> >>>> >>>> Users mailing list -- users@ovirt.org >> >> >> >> >>>> >> >>> >> >>>> >>>> To unsubscribe send an email to >> >> >> >> >>>> >users-leave@ovirt.org >> >> >> >> >>>> >> >>> >> >>>> >>>> Privacy Statement: >> >> >> >> >>>> >> >>> >https://www.ovirt.org/privacy-policy.html >> >> >> >> >>>> >> >>> >> >>>> >>>> oVirt Code of Conduct: >> >> >> >> >>>> >> >>> >> >>>> >>>> >> >> >> >> >>>> >> >> >> >>https://www.ovirt.org/community/about/community-guidelines/ >> >> >> >> >>>> >> >>> >> >>>> >>>> List Archives: >> >> >> >> >>>> >> >>> >> >>>> >>>> >> >> >> >> >>>> >> >>> >> >>>> > >> >> >> >> >>>> >> >>> >> >>>> >> >> >> >> >>>> >> >>> >> > >> >> >> >> >>>> >> >>> >> >> >> >> >> >>>> >> >>> > >> >> >> >> >>>> >> >>> >> >> >> >> >>>> >> > >> >> >> >> >>>> >> >> >> >> >> >>>> > >> >> >> >> >>>> >> >> >> >> > >> >> >> >> >> >> >> > >> >> >> >> >> > >> >> >> > >> https://lists.ovirt.org/archives/list/users@ovirt.org/message/W7BP57OCIRSW5CDRQWR5MIKJUH3ISLCQ/ >> >> >> >> >>>> >> >>> >> >>>> >>>> >> >> >> >> >>>> >> >>> >> >>>> >>> >> >> >> >> >>>> >> >>> >> >>>> >> >> >> >> >>>> >> >>> >> >>>> This has to be resolved: >> >> >> >> >>>> >> >>> >> >>>> >> >> >> >> >>>> >> >>> >> >>>> Engine status : >> >unknown >> >> >> >> >stale-data >> >> >> >> >>>> >> >>> >> >>>> >> >> >> >> >>>> >> >>> >> >>>> Run again 'hosted-engine --vm-status'. If >> >it >> >> >> >remains >> >> >> >> >the >> >> >> >> >>>> >> >same, >> >> >> >> >>>> >> >>> >> >restart >> >> >> >> >>>> >> >>> >> >>>> ovirt-ha-broker.service & >> >> >ovirt-ha-agent.service >> >> >> >> >>>> >> >>> >> >>>> >> >> >> >> >>>> >> >>> >> >>>> Verify that the engine's storage is >> >available. >> >> >> >Then >> >> >> >> >>>> >monitor >> >> >> >> >>>> >> >the >> >> >> >> >>>> >> >>> >> >broker >> >> >> >> >>>> >> >>> >> >>>> & agent logs in >> >> >/var/log/ovirt-hosted-engine-ha >> >> >> >> >>>> >> >>> >> >>>> >> >> >> >> >>>> >> >>> >> >>>> Best Regards, >> >> >> >> >>>> >> >>> >> >>>> Strahil Nikolov >> >> >> >> >>>> >> >>> >> >>>> >> >> >> >> >>>> >> >>> >> >>>> >> >> >> >> >>>> >> >>> >> >>>> >> >> >> >> >>>> >> >>> >> >>>> >> >> >> >> >>>> >> >>> >> >> >> >> >> >>>> >> >>> >> Hi Shareef, >> >> >> >> >>>> >> >>> >> >> >> >> >> >>>> >> >>> >> The flow of activation oVirt is more complex >> >than a >> >> >> >plain >> >> >> >> >KVM. >> >> >> >> >>>> >> >>> >> Mounting of the domains happen during the >> >> >activation >> >> >> >of >> >> >> >> >the >> >> >> >> >>>> >node >> >> >> >> >>>> >> >( >> >> >> >> >>>> >> >>> >the >> >> >> >> >>>> >> >>> >> HostedEngine is activating everything needed). >> >> >> >> >>>> >> >>> >> >> >> >> >> >>>> >> >>> >> Focus on the HostedEngine VM. >> >> >> >> >>>> >> >>> >> Is it running properly ? >> >> >> >> >>>> >> >>> >> >> >> >> >> >>>> >> >>> >> If not,try: >> >> >> >> >>>> >> >>> >> 1. Verify that the storage domain exists >> >> >> >> >>>> >> >>> >> 2. Check if it has 'ha_agents' directory >> >> >> >> >>>> >> >>> >> 3. Check if the links are OK, if not you can >> >> >safely >> >> >> >> >remove >> >> >> >> >>>> >the >> >> >> >> >>>> >> >links >> >> >> >> >>>> >> >>> >> >> >> >> >> >>>> >> >>> >> 4. Next check the services are running: >> >> >> >> >>>> >> >>> >> A) sanlock >> >> >> >> >>>> >> >>> >> B) supervdsmd >> >> >> >> >>>> >> >>> >> C) vdsmd >> >> >> >> >>>> >> >>> >> D) libvirtd >> >> >> >> >>>> >> >>> >> >> >> >> >> >>>> >> >>> >> 5. 
Increase the log level for broker and agent >> >> >> >services: >> >> >> >> >>>> >> >>> >> >> >> >> >> >>>> >> >>> >> cd /etc/ovirt-hosted-engine-ha >> >> >> >> >>>> >> >>> >> vim *-log.conf >> >> >> >> >>>> >> >>> >> >> >> >> >> >>>> >> >>> >> systemctl restart ovirt-ha-broker ovirt-ha-agent >> >> >> >> >>>> >> >>> >> >> >> >> >> >>>> >> >>> >> 6. Check what they are complaining about >> >> >> >> >>>> >> >>> >> Keep in mind that agent will keep throwing >> >errors >> >> >> >untill >> >> >> >> >the >> >> >> >> >>>> >> >broker >> >> >> >> >>>> >> >>> >stops >> >> >> >> >>>> >> >>> >> doing it (agent depends on broker), so broker >> >> >must >> >> >> >be >> >> >> >> >OK >> >> >> >> >>>> >before >> >> >> >> >>>> >> >>> >> peoceeding with the agent log. >> >> >> >> >>>> >> >>> >> >> >> >> >> >>>> >> >>> >> About the manual VM start, you need 2 things: >> >> >> >> >>>> >> >>> >> >> >> >> >> >>>> >> >>> >> 1. Define the VM network >> >> >> >> >>>> >> >>> >> # cat vdsm-ovirtmgmt.xml <network> >> >> >> >> >>>> >> >>> >> <name>vdsm-ovirtmgmt</name> >> >> >> >> >>>> >> >>> >> >> ><uuid>8ded486e-e681-4754-af4b-5737c2b05405</uuid> >> >> >> >> >>>> >> >>> >> <forward mode='bridge'/> >> >> >> >> >>>> >> >>> >> <bridge name='ovirtmgmt'/> >> >> >> >> >>>> >> >>> >> </network> >> >> >> >> >>>> >> >>> >> >> >> >> >> >>>> >> >>> >> [root@ovirt1 HostedEngine-RECOVERY]# virsh >> >define >> >> >> >> >>>> >> >vdsm-ovirtmgmt.xml >> >> >> >> >>>> >> >>> >> >> >> >> >> >>>> >> >>> >> 2. Get an xml definition which can be found in >> >the >> >> >> >vdsm >> >> >> >> >log. >> >> >> >> >>>> >> >Every VM >> >> >> >> >>>> >> >>> >at >> >> >> >> >>>> >> >>> >> start up has it's configuration printed out in >> >> >vdsm >> >> >> >log >> >> >> >> >on >> >> >> >> >>>> >the >> >> >> >> >>>> >> >host >> >> >> >> >>>> >> >>> >it >> >> >> >> >>>> >> >>> >> starts. >> >> >> >> >>>> >> >>> >> Save to file and then: >> >> >> >> >>>> >> >>> >> A) virsh define myvm.xml >> >> >> >> >>>> >> >>> >> B) virsh start myvm >> >> >> >> >>>> >> >>> >> >> >> >> >> >>>> >> >>> >> It seems there is/was a problem with your NFS >> >> >shares. >> >> >> >> >>>> >> >>> >> >> >> >> >> >>>> >> >>> >> >> >> >> >> >>>> >> >>> >> Best Regards, >> >> >> >> >>>> >> >>> >> Strahil Nikolov >> >> >> >> >>>> >> >>> >> >> >> >> >> >>>> >> >>> >> >> >> >> >>>> >> >>> Hey Shareef, >> >> >> >> >>>> >> >>> >> >> >> >> >>>> >> >>> Check if there are any files or folders not owned >> >by >> >> >> >> >vdsm:kvm . >> >> >> >> >>>> >> >Something >> >> >> >> >>>> >> >>> like this: >> >> >> >> >>>> >> >>> >> >> >> >> >>>> >> >>> find . -not -user 36 -not -group 36 -print >> >> >> >> >>>> >> >>> >> >> >> >> >>>> >> >>> Also check if vdsm can access the images in the >> >> >> >> >>>> >> >>> '<vol-mount-point>/images' directories. >> >> >> >> >>>> >> >>> >> >> >> >> >>>> >> >>> Best Regards, >> >> >> >> >>>> >> >>> Strahil Nikolov >> >> >> >> >>>> >> >>> >> >> >> >> >>>> >> >> >> >> >> >> >>>> >> >> >> >> >> >>>> >> And the IPv6 address '64:ff9b::c0a8:13d' ? >> >> >> >> >>>> >> >> >> >> >> >>>> >> I don't see in the log output. >> >> >> >> >>>> >> >> >> >> >> >>>> >> Best Regards, >> >> >> >> >>>> >> Strahil Nikolov >> >> >> >> >>>> >> >> >> >> >> >>>> >> >> >> >> >>>> Based on your output , you got a PTR record for IPv4 & >> >> >IPv6 >> >> >> >... >> >> >> >> >most >> >> >> >> >>>> probably it's the reason. >> >> >> >> >>>> >> >> >> >> >>>> Set the IPv6 on the interface and try again. 
Best Regards,
Strahil Nikolov

Do you have firewalld up and running on the host?

Best Regards,
Strahil Nikolov

I am guessing, but your interface is not assigned to any zone, right? Just add the interface to the default zone (usually 'public').

Best Regards,
Strahil Nikolov

Keep in mind that there are a lot of playbooks that can be used to deploy a HostedEngine environment via ansible.

Keep in mind that if you plan to use oVirt in Prod, you need to know how to debug it (at least on a basic level).

Best Regards,
Strahil Nikolov

It's really interesting that you mention that topic. The only way I managed to break my engine was:
A) a bad SELinux rpm, which was solved via reinstall of the package and a relabel
B) an interrupted patch, as I forgot to use screen

I think it is Prod ready, but it requires knowledge, as it is not as dummy-proof as VMware. Yet, oVirt is way more flexible, allowing you to run your own scripts before/during/after a certain event (vdsm hooks).

Sadly, Ansible (this is what is used for the setup of gluster -> gdeploy, and for the engine) is quite dynamic, and sometimes something might break.

If you feel that oVirt breaks too often - just set your engine on a separate physical or virtual (non-hosted) machine, but do not complain that a free open-source product is not production ready just because you don't know how to debug it.

You can trial the downstream solutions from Red Hat & Oracle and you will notice the difference. For me oVirt is like Fedora compared to RHEL/OEL/CentOS, but this is just a personal opinion.

Best Regards,
Strahil Nikolov
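One footnote on the manual VM start steps quoted above: 'virsh define' expects a domain (VM) XML, while a network XML such as vdsm-ovirtmgmt.xml is normally loaded with libvirt's net-* commands. A hedged sketch, assuming the authenticated virsh alias from earlier in the thread:

virsh net-define vdsm-ovirtmgmt.xml
virsh net-start vdsm-ovirtmgmt
virsh net-list --all    # vdsm-ovirtmgmt should now show as active

If plain 'virsh define vdsm-ovirtmgmt.xml' is rejected, the net-define form is the first thing to try.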

Sorry, I realised I copied the wrong NFS domain error. This is the final one, after I had deleted the old domain:

2020-04-16 12:00:49,423+0000 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 TASK [ovirt.hosted_engine_setup : Add NFS storage domain]

2020-04-16 12:00:51,827+0000 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:103 {u'invocation': {u'module_args': {u'comment': None, u'warning_low_space': None, u'glusterfs': None, u'localfs': None, u'managed_block_storage': None, u'data_center': u'Default', u'id': None, u'iscsi': None, u'state': u'unattached', u'wipe_after_delete': None, u'destroy': None, u'fcp': None, u'description': None, u'format': None, u'nested_attributes': [], u'host': u'ovirt-node-00.phoelex.com', u'discard_after_delete': None, u'wait': True, u'domain_function': u'data', u'name': u'hosted_storage', u'critical_space_action_blocker': None, u'posixfs': None, u'poll_interval': 3, u'fetch_nested': False, u'nfs': {u'path': u'/volume2/vmstore', u'version': u'auto', u'mount_options': u'', u'address': u'nas-01.phoelex.com'}, u'timeout': 180, u'backup': None}}, u'msg': u'Fault reason is "Operation Failed". Fault detail is "[Error creating a storage domain]". HTTP response code is 400.', u'exception': u'Traceback (most recent call last):\n  File "/tmp/ansible_ovirt_storage_domain_payload_BWmbrq/ansible_ovirt_storage_domain_payload.zip/ansible/modules/cloud/ovirt/ovirt_storage_domain.py", line 792, in main\n  File "/tmp/ansible_ovirt_storage_domain_payload_BWmbrq/ansible_ovirt_storage_domain_payload.zip/ansible/module_utils/ovirt.py", line 621, in create\n    **kwargs\n  File "/usr/lib64/python2.7/site-packages/ovirtsdk4/services.py", line 25168, in add\n    return self._internal_add(storage_domain, headers, query, wait)\n  File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 232, in _internal_add\n    return future.wait() if wait else future\n  File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 55, in wait\n    return self._code(response)\n  File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 229, in callback\n    self._check_fault(response)\n  File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 132, in _check_fault\n    self._raise_error(response, body)\n  File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 118, in _raise_error\n    raise error\nError: Fault reason is "Operation Failed". Fault detail is "[Error creating a storage domain]". HTTP response code is 400.\n', u'changed': False, u'_ansible_no_log': False}

2020-04-16 12:00:51,927+0000 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:107 Error: Fault reason is "Operation Failed". Fault detail is "[Error creating a storage domain]". HTTP response code is 400.

2020-04-16 12:00:52,028+0000 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:107 fatal: [localhost]: FAILED! => {"changed": false, "msg": "Fault reason is \"Operation Failed\". Fault detail is \"[Error creating a storage domain]\". HTTP response code is 400."}

On Thu, Apr 16, 2020 at 2:28 PM Shareef Jalloq <shareef@jalloq.co.uk> wrote:
OK, to wrap up this thread and to provide some detail as to how it concluded....
I now have the engine back up and running. The final issue was that I needed to re-create the Synology share that the HE was stored on. I got past all my other issues and up to the domain setup stage. Obviously I couldn't install to the same share with the original domain still there, so I deleted that (luckily the other VMs are in a different share), but the install still failed with a storage domain creation error. Once I created a brand-new share, the install could progress. I have no idea what happened, but I seem to have suffered some sort of failure of a shared folder on my Synology that caused issues with oVirt: I could still mount the folder manually and create/edit/delete files, and the engine was generating 100K's worth of .prob-* files, but it was somehow corrupt? I've copied the ansible error at the end of this mail.
The other issue was the IPv6 one, which was strange. I didn't see this when I first installed and set up oVirt, but perhaps something changed somewhere in our setup. So, as Strahil pointed out, I needed the IPV6ADDR entry in my ifcfg-eno1 for the interface being used for the node. The pain is that this file is overwritten by the deployment, so if that fails, you have to re-add the entry. My interface config looks like this now:
# Generated by VDSM version 4.30.40.1
DEVICE=eno1
BRIDGE=ovirtmgmt
ONBOOT=yes
MTU=1500
DEFROUTE=no
NM_CONTROLLED=no
IPV6INIT=no
IPV6ADDR=64:ff9b::c0a8:13d
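As a sketch of how to verify the fix before re-running the deploy: the validator compares what the FQDN resolves to against the addresses configured on local non-loopback interfaces, so both views need to agree. Hypothetical commands, using the hostname and bridge from this thread (dig comes from bind-utils):

dig +short ovirt-node-00.phoelex.com A
dig +short ovirt-node-00.phoelex.com AAAA

ip -4 addr show ovirtmgmt
ip -6 addr show ovirtmgmt

Every address returned by the two dig queries should appear on a local interface. Incidentally, 64:ff9b::/96 is the well-known DNS64/NAT64 prefix and c0a8:13d is hex for 192.168.1.61, so the AAAA answer here was most likely synthesised by a DNS64 resolver upstream rather than configured anywhere by hand.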
Then I had the strange firewalld issue that I can't explain. I re-installed the node from scratch to resolve that as I'd lost patience.
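For anyone who hits the same zone failure without wanting to re-install, Strahil's suggestion from earlier in the thread translates to something like this sketch (standard firewall-cmd usage; 'public' and the ovirtmgmt bridge are assumptions for this setup):

firewall-cmd --get-active-zones
firewall-cmd --permanent --zone=public --change-interface=ovirtmgmt
firewall-cmd --reload
firewall-cmd --get-active-zones

The deployment's 'Get active list of active firewalld zones' task fails exactly when --get-active-zones prints nothing, which is what a host with no interface assigned to any zone looks like.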
So thanks for all the help and I hope I never have to do that again. :-)
Shareef.
Ansible storage domain error:
2020-04-16 11:11:55,872+0000 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 TASK [ovirt.hosted_engine_setup : Add NFS storage domain]

2020-04-16 11:11:58,777+0000 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:103 {u'invocation': {u'module_args': {u'comment': None, u'warning_low_space': None, u'glusterfs': None, u'localfs': None, u'managed_block_storage': None, u'data_center': u'Default', u'id': None, u'iscsi': None, u'state': u'unattached', u'wipe_after_delete': None, u'destroy': None, u'fcp': None, u'description': None, u'format': None, u'nested_attributes': [], u'host': u'ovirt-node-00.phoelex.com', u'discard_after_delete': None, u'wait': True, u'domain_function': u'data', u'name': u'hosted_storage', u'critical_space_action_blocker': None, u'posixfs': None, u'poll_interval': 3, u'fetch_nested': False, u'nfs': {u'path': u'/volume1/ovirt', u'version': u'auto', u'mount_options': u'', u'address': u'nas-01.phoelex.com'}, u'timeout': 180, u'backup': None}}, u'msg': u'Fault reason is "Operation Failed". Fault detail is "[Error in creating a Storage Domain. The selected storage path is not empty (probably contains another Storage Domain). Either remove the existing Storage Domain from this path, or change the Storage path.]". HTTP response code is 400.', u'exception': u'Traceback (most recent call last):\n  File "/tmp/ansible_ovirt_storage_domain_payload_6uM8mE/ansible_ovirt_storage_domain_payload.zip/ansible/modules/cloud/ovirt/ovirt_storage_domain.py", line 792, in main\n  File "/tmp/ansible_ovirt_storage_domain_payload_6uM8mE/ansible_ovirt_storage_domain_payload.zip/ansible/module_utils/ovirt.py", line 621, in create\n    **kwargs\n  File "/usr/lib64/python2.7/site-packages/ovirtsdk4/services.py", line 25168, in add\n    return self._internal_add(storage_domain, headers, query, wait)\n  File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 232, in _internal_add\n    return future.wait() if wait else future\n  File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 55, in wait\n    return self._code(response)\n  File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 229, in callback\n    self._check_fault(response)\n  File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 132, in _check_fault\n    self._raise_error(response, body)\n  File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 118, in _raise_error\n    raise error\nError: Fault reason is "Operation Failed". Fault detail is "[Error in creating a Storage Domain. The selected storage path is not empty (probably contains another Storage Domain). Either remove the existing Storage Domain from this path, or change the Storage path.]". HTTP response code is 400.\n', u'changed': False, u'_ansible_no_log': False}

2020-04-16 11:11:58,877+0000 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:107 Error: Fault reason is "Operation Failed". Fault detail is "[Error in creating a Storage Domain. The selected storage path is not empty (probably contains another Storage Domain). Either remove the existing Storage Domain from this path, or change the Storage path.]". HTTP response code is 400.

2020-04-16 11:11:58,978+0000 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:107 fatal: [localhost]: FAILED! => {"changed": false, "msg": "Fault reason is \"Operation Failed\". Fault detail is \"[Error in creating a Storage Domain. The selected storage path is not empty (probably contains another Storage Domain). Either remove the existing Storage Domain from this path, or change the Storage path.]\". HTTP response code is 400."}
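That fault is the engine refusing to create a domain on an export that still contains an old domain tree. A hedged sketch of inspecting and clearing the export by hand before retrying, combining the vdsm:kvm ownership advice from earlier in the thread (UID/GID 36 is vdsm:kvm; the path is the one from this deploy):

mount -t nfs nas-01.phoelex.com:/volume1/ovirt /mnt
ls -la /mnt                                # any leftover <uuid> directory triggers this fault
find /mnt -not -user 36 -not -group 36     # should print nothing
# only after confirming nothing on the export is still needed:
rm -rf /mnt/<old-domain-uuid>
umount /mnt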
On Thu, Apr 16, 2020 at 12:14 PM Shareef Jalloq <shareef@jalloq.co.uk> wrote:
Actually, you've just raised a point I hadn't thought about. We have an old Xeon server that is being used to host some ESXi VMs that were needed while we transitioned to ovirt. Once I have moved those VMs I could repurpose that as the engine.
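For reference, a standalone (non-hosted) engine of this era is a much simpler install than a hosted one. A rough sketch, assuming a clean CentOS 7 host and the oVirt 4.3 repos:

yum install https://resources.ovirt.org/pub/yum-repo/ovirt-release43.rpm
yum install ovirt-engine
engine-setup

That removes the chicken-and-egg problem of the engine living on the storage it manages, at the cost of one more box to maintain.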
On Thu, Apr 16, 2020 at 11:42 AM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
On April 16, 2020 11:25:20 AM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:
Is this actually production ready? It seems to break at every step.
On Wed, Apr 15, 2020 at 5:45 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
Thanks for your help but I've decided to try and reinstall from scratch. This is taking too long.
On Wed, Apr 15, 2020 at 3:25 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
On April 15, 2020 2:40:52 PM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:

Yes, but there are no zones set up, just ports 22, 6801 and 6900.

On Wed, Apr 15, 2020 at 12:37 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:

On April 15, 2020 2:28:05 PM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:

Oh this is painful. It seems to progress if you have both he_force_ipv4 set and run the deployment with the '--4' switch.

But then I get a failure when the ansible script checks for firewalld zones and doesn't get anything back. Should the deployment flow not be setting any zones it needs?

2020-04-15 10:57:25,439+0000 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 TASK [ovirt.hosted_engine_setup : Get active list of active firewalld zones]

2020-04-15 10:57:26,641+0000 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:103 {u'stderr_lines': [], u'changed': True, u'end': u'2020-04-15 10:57:26.481202', u'_ansible_no_log': False, u'stdout': u'', u'cmd': u'set -euo pipefail && firewall-cmd --get-active-zones | grep -v "^\\s*interfaces"', u'start': u'2020-04-15 10:57:26.050203', u'delta': u'0:00:00.430999', u'stderr': u'', u'rc': 1, u'invocation': {u'module_args': {u'creates': None, u'executable': None, u'_uses_shell': True, u'strip_empty_ends': True, u'_raw_params': u'set -euo pipefail && firewall-cmd --get-active-zones | grep -v "^\\s*interfaces"', u'removes': None, u'argv': None, u'warn': True, u'chdir': None, u'stdin_add_newline': True, u'stdin': None}}, u'stdout_lines': [], u'msg': u'non-zero return code'}

2020-04-15 10:57:26,741+0000 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:107 fatal: [localhost]: FAILED! => {"changed": true, "cmd": "set -euo pipefail && firewall-cmd --get-active-zones | grep -v \"^\\s*interfaces\"", "delta": "0:00:00.430999", "end": "2020-04-15 10:57:26.481202", "msg": "non-zero return code", "rc": 1, "start": "2020-04-15 10:57:26.050203", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}

On Wed, Apr 15, 2020 at 10:23 AM Shareef Jalloq <shareef@jalloq.co.uk> wrote:

Ha, spoke too soon. It's now stuck in a loop and a google points me at https://bugzilla.redhat.com/show_bug.cgi?id=1746585

However, forcing ipv4 doesn't seem to have fixed the loop.

On Wed, Apr 15, 2020 at 9:59 AM Shareef Jalloq <shareef@jalloq.co.uk> wrote:

OK, that seems to have fixed it, thanks. Is this a side effect of redeploying the HE over a first-time install? Nothing has changed in our setup and I didn't need to do this when I initially set up our nodes.

On Tue, Apr 14, 2020 at 6:55 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:

On April 14, 2020 6:17:17 PM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:

Hmmm, we're not using ipv6. Is that the issue?

On Tue, Apr 14, 2020 at 3:56 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:

On April 14, 2020 1:27:24 PM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:

Right, I've given up on recovering the HE so want to try and redeploy it. There doesn't seem to be enough information to debug why the broker/agent won't start cleanly.

In running 'hosted-engine --deploy', I'm seeing the following error in the setup validation phase:

2020-04-14 09:46:08,922+0000 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND Please provide the hostname of this host on the management network [ovirt-node-00.phoelex.com]:

2020-04-14 09:46:12,831+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname.getResolvedAddresses:432 getResolvedAddresses: set(['64:ff9b::c0a8:13d', '192.168.1.61'])

2020-04-14 09:46:12,832+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname._validateFQDNresolvability:289 ovirt-node-00.phoelex.com resolves to: set(['64:ff9b::c0a8:13d', '192.168.1.61'])

2020-04-14 09:46:12,832+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:813 execute: ['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com', 'ANY'], executable='None', cwd='None', env=None

2020-04-14 09:46:12,871+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:863 execute-result: ['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com', 'ANY'], rc=0

2020-04-14 09:46:12,872+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:921 execute-output: ['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com', 'ANY'] stdout:
ovirt-node-00.phoelex.com. 86400 IN A 192.168.1.61

2020-04-14 09:46:12,872+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:926 execute-output: ['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com', 'ANY'] stderr:

2020-04-14 09:46:12,872+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:813 execute: ('/usr/sbin/ip', 'addr'), executable='None', cwd='None', env=None

2020-04-14 09:46:12,876+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:863 execute-result: ('/usr/sbin/ip', 'addr'), rc=0

2020-04-14 09:46:12,876+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:921 execute-output: ('/usr/sbin/ip', 'addr') stdout:

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovirtmgmt state UP group default qlen 1000
    link/ether ac:1f:6b:bc:32:6a brd ff:ff:ff:ff:ff:ff
3: eno2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether ac:1f:6b:bc:32:6b brd ff:ff:ff:ff:ff:ff
4: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 02:e6:e2:80:93:8d brd ff:ff:ff:ff:ff:ff
5: br-int: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 8a:26:44:50:ee:4a brd ff:ff:ff:ff:ff:ff
21: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether ac:1f:6b:bc:32:6a brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.61/24 brd 192.168.1.255 scope global ovirtmgmt
       valid_lft forever preferred_lft forever
    inet6 fe80::ae1f:6bff:febc:326a/64 scope link
       valid_lft forever preferred_lft forever
22: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 3a:02:7b:7d:b3:2a brd ff:ff:ff:ff:ff:ff

2020-04-14 09:46:12,876+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:926 execute-output: ('/usr/sbin/ip', 'addr') stderr:

2020-04-14 09:46:12,877+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname.getLocalAddresses:251 addresses: [u'192.168.1.61', u'fe80::ae1f:6bff:febc:326a']

2020-04-14 09:46:12,877+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname.test_hostname:464 test_hostname exception

Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_setup_lib/hostname.py", line 460, in test_hostname
    not_local_text,
  File "/usr/lib/python2.7/site-packages/ovirt_setup_lib/hostname.py", line 342, in _validateFQDNresolvability
    addresses=resolvedAddressesAsString
RuntimeError: ovirt-node-00.phoelex.com resolves to 64:ff9b::c0a8:13d 192.168.1.61 and not all of them can be mapped to non loopback devices on this host

2020-04-14 09:46:12,884+0000 ERROR otopi.plugins.gr_he_common.network.bridge dialog.queryEnvKey:120 Host name is not valid: ovirt-node-00.phoelex.com resolves to 64:ff9b::c0a8:13d 192.168.1.61 and not all of them can be mapped to non loopback devices on this host

The node I'm running on has an IP address of .61 and resolves correctly.

On Fri, Apr 10, 2020 at 12:55 PM Shareef Jalloq <shareef@jalloq.co.uk> wrote:

Where should I be checking if there are any files/folders not owned by vdsm:kvm? I checked on the mount the HA sits on and it's fine.

How would I go about checking that vdsm can access those images? If I run virsh, it lists them, and they were running yesterday even though the HA was down. I've since restarted both hosts but the broker is still spitting out the same error (copied below). How do I find the reason the broker can't connect to the storage? The conf file is already at DEBUG verbosity:

[handler_logfile]
class=logging.handlers.TimedRotatingFileHandler
args=('/var/log/ovirt-hosted-engine-ha/broker.log', 'd', 1, 7)
level=DEBUG
formatter=long

And what are all these .prob-<num> files that are being created? There are over 250K of them now on the mount I'm using for the Data domain. They're all of 0 size and of the form /rhev/data-center/mnt/nas-01.phoelex.com:_volume2_vmstore/.prob-ffa867da-93db-4211-82df-b1b04a625ab9

@eevans: The volume I have the Data Domain on has TBs free. The HA is dead so I can't ssh in. No idea what started these errors, and the other VMs were still running happily, although they're on a different Data Domain.

Shareef.
MainThread::INFO::2020-04-10 07:45:00,408::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect) Connecting the storage

MainThread::INFO::2020-04-10 07:45:00,408::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server

MainThread::INFO::2020-04-10 07:45:01,577::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server

MainThread::INFO::2020-04-10 07:45:02,692::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Refreshing the storage domain

MainThread::WARNING::2020-04-10 07:45:05,175::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed:
(code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))

On Thu, Apr 9, 2020 at 5:58 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:

On April 9, 2020 11:12:30 AM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:

OK, let's go through this. I'm looking at the node that at least still has some VMs running. virsh also tells me that the HostedEngine VM is running, but it's unresponsive and I can't shut it down.

1. All storage domains exist and are mounted.

2. The ha_agent directory exists:

[root@ovirt-node-01 ovirt-hosted-engine-ha]# ls /rhev/data-center/mnt/nas-01.phoelex.com\:_volume2_vmstore/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/
dom_md  ha_agent  images  master

3. There are two links:

[root@ovirt-node-01 ovirt-hosted-engine-ha]# ll /rhev/data-center/mnt/nas-01.phoelex.com\:_volume2_vmstore/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/ha_agent/
total 8
lrwxrwxrwx. 1 vdsm kvm 132 Apr 2 14:50 hosted-engine.lockspace -> /var/run/vdsm/storage/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/ffb90b82-42fe-4253-85d5-aaec8c280aaf/90e68791-0c6f-406a-89ac-e0d86c631604
lrwxrwxrwx. 1 vdsm kvm 132 Apr 2 14:50 hosted-engine.metadata -> /var/run/vdsm/storage/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/2161aed0-7250-4c1d-b667-ac94f60af17e/6b818e33-f80a-48cc-a59c-bba641e027d4

4. The services exist, but all seem to have some sort of warning:

a) Apr 08 18:10:55 ovirt-node-01.phoelex.com sanlock[1728]: 2020-04-08 18:10:55 1744152 [36796]: s16 delta_renew long write time 10 sec

b) Mar 23 18:02:59 ovirt-node-01.phoelex.com supervdsmd[29409]: failed to load module nvdimm: libbd_nvdimm.so.2: cannot open shared object file: No such file or directory

c) Apr 09 08:05:13 ovirt-node-01.phoelex.com vdsm[4801]: ERROR failed to retrieve Hosted Engine HA score '[Errno 2] No such file or directory' Is the Hosted Engine setup finished?

d) Apr 08 22:48:27 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-08 22:48:27.134+0000: 29309: warning : qemuGetProcessInfo:1404 : cannot parse process status data
Apr 08 22:48:27 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-08 22:48:27.134+0000: 29309: error : virNetDevTapInterfaceStats:764 : internal error: /proc/net/dev: Interface not found
Apr 08 23:09:39 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-08 23:09:39.844+0000: 29307: error : virNetSocketReadWire:1806 : End of file while reading data: Input/output error
Apr 09 01:05:26 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-09 01:05:26.660+0000: 29307: error : virNetSocketReadWire:1806 : End of file while reading data: Input/output error

5 & 6. The broker log is continually printing this error:
MainThread::INFO::2020-04-09 08:07:31,438::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.3.6 started
MainThread::DEBUG::2020-04-09 08:07:31,438::broker::55::ovirt_hosted_engine_ha.broker.broker.Broker::(run) Running broker
MainThread::DEBUG::2020-04-09 08:07:31,438::broker::120::ovirt_hosted_engine_ha.broker.broker.Broker::(_get_monitor) Starting monitor
MainThread::INFO::2020-04-09 08:07:31,438::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Searching for submonitors in /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors
MainThread::INFO::2020-04-09 08:07:31,439::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
MainThread::INFO::2020-04-09 08:07:31,440::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
MainThread::INFO::2020-04-09 08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
MainThread::INFO::2020-04-09 08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
MainThread::INFO::2020-04-09 08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
MainThread::INFO::2020-04-09 08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
MainThread::INFO::2020-04-09 08:07:31,442::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
MainThread::INFO::2020-04-09 08:07:31,442::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
MainThread::INFO::2020-04-09 08:07:31,444::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
MainThread::INFO::2020-04-09 08:07:31,444::monitor::50::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Finished loading submonitors
MainThread::DEBUG::2020-04-09 08:07:31,444::broker::128::ovirt_hosted_engine_ha.broker.broker.Broker::(_get_storage_broker) Starting storage broker
MainThread::DEBUG::2020-04-09 08:07:31,444::storage_backends::369::ovirt_hosted_engine_ha.lib.storage_backends::(connect) Connecting to VDSM
MainThread::DEBUG::2020-04-09 08:07:31,444::util::384::ovirt_hosted_engine_ha.lib.storage_backends::(__log_debug) Creating a new json-rpc connection to VDSM
Client localhost:54321::DEBUG::2020-04-09 08:07:31,453::concurrent::258::root::(run) START thread <Thread(Client localhost:54321, started daemon 139992488138496)> (func=<bound method Reactor.process_requests of <yajsonrpc.betterAsyncore.Reactor object at 0x7f528acabc90>>, args=(), kwargs={})
Client localhost:54321::DEBUG::2020-04-09 08:07:31,459::stompclient::138::yajsonrpc.protocols.stomp.AsyncClient::(_process_connected) Stomp connection established
MainThread::DEBUG::2020-04-09 08:07:31,467::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::INFO::2020-04-09 08:07:31,530::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect) Connecting the storage
MainThread::INFO::2020-04-09 08:07:31,531::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::DEBUG::2020-04-09 08:07:31,531::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2020-04-09 08:07:31,534::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2020-04-09 08:07:32,199::storage_server::158::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(_validate_pre_connected_path) Storage domain a6cea67d-dbfb-45cf-a775-b4d0d47b26f2 is not available
MainThread::INFO::2020-04-09 08:07:32,199::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::DEBUG::2020-04-09 08:07:32,199::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2020-04-09 08:07:32,814::storage_server::363::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) [{u'status': 0, u'id': u'e29cf818-5ee5-46e1-85c1-8aeefa33e95d'}]
MainThread::INFO::2020-04-09 08:07:32,814::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Refreshing the storage domain
MainThread::DEBUG::2020-04-09 08:07:32,815::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2020-04-09 08:07:33,129::storage_server::420::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Error refreshing storage domain: Command StorageDomain.getStats with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed: (code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
MainThread::DEBUG::2020-04-09 08:07:33,130::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2020-04-09 08:07:33,795::storage_backends::208::ovirt_hosted_engine_ha.lib.storage_backends::(_get_sector_size) Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed: (code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
>>>>08:07:33,795::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) > >> >>>> >> >>> >Can't connect vdsm storage: Command > >StorageDomain.getInfo > >> >with > >> >>>> >args > >> >>>> >> >>> >{'storagedomainID': > >'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} > >> >>>> >failed: > >> >>>> >> >>> > > >> >>>> >> >>> >(code=350, message=Error in storage domain action: > >> >>>> >> >>> >(u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',)) > >> >>>> >> >>> > > >> >>>> >> >>> > > >> >>>> >> >>> >The UUID it is moaning about is indeed the one that the > >HA > >> >sits > >> >>>> >on > >> >>>> >> >and > >> >>>> >> >>> >is > >> >>>> >> >>> >the one I listed the contents of in step 2 above. > >> >>>> >> >>> > > >> >>>> >> >>> > > >> >>>> >> >>> >So why can't it see this domain? > >> >>>> >> >>> > > >> >>>> >> >>> > > >> >>>> >> >>> >Thanks, Shareef. > >> >>>> >> >>> > > >> >>>> >> >>> >On Thu, Apr 9, 2020 at 6:12 AM Strahil Nikolov > >> >>>> >> ><hunter86_bg@yahoo.com> > >> >>>> >> >>> >wrote: > >> >>>> >> >>> > > >> >>>> >> >>> >> On April 9, 2020 1:51:05 AM GMT+03:00, Shareef Jalloq > >< > >> >>>> >> >>> >> shareef@jalloq.co.uk> wrote: > >> >>>> >> >>> >> >Don't know if this is useful or not, but I just tried > >to > >> >>>> >> >shutdown > >> >>>> >> >>> >and > >> >>>> >> >>> >> >start > >> >>>> >> >>> >> >another VM on one of the hosts and get the following > >> >error: > >> >>>> >> >>> >> > > >> >>>> >> >>> >> >virsh # start scratch > >> >>>> >> >>> >> > > >> >>>> >> >>> >> >error: Failed to start domain scratch > >> >>>> >> >>> >> > > >> >>>> >> >>> >> >error: Network not found: no network with matching > >name > >> >>>> >> >>> >> >'vdsm-ovirtmgmt' > >> >>>> >> >>> >> > > >> >>>> >> >>> >> >Is this not referring to the interface name as the > >> >network is > >> >>>> >> >called > >> >>>> >> >>> >> >'ovirtmgnt'. > >> >>>> >> >>> >> > > >> >>>> >> >>> >> >On Wed, Apr 8, 2020 at 11:35 PM Shareef Jalloq > >> >>>> >> >>> ><shareef@jalloq.co.uk> > >> >>>> >> >>> >> >wrote: > >> >>>> >> >>> >> > > >> >>>> >> >>> >> >> Hmmm, virsh tells me the HE is running but it > >hasn't > >> >come > >> >>>> >up > >> >>>> >> >and > >> >>>> >> >>> >the > >> >>>> >> >>> >> >> agent.log is full of the same errors. > >> >>>> >> >>> >> >> > >> >>>> >> >>> >> >> On Wed, Apr 8, 2020 at 11:31 PM Shareef Jalloq > >> >>>> >> >>> ><shareef@jalloq.co.uk> > >> >>>> >> >>> >> >> wrote: > >> >>>> >> >>> >> >> > >> >>>> >> >>> >> >>> Ah hah! Ok, so I've managed to start it using > >virsh > >> >on > >> >>>> >the > >> >>>> >> >>> >second > >> >>>> >> >>> >> >host > >> >>>> >> >>> >> >>> but my first host is still dead. > >> >>>> >> >>> >> >>> > >> >>>> >> >>> >> >>> First of all, what are these 56,317 .prob- files > >that > >> >get > >> >>>> >> >dumped > >> >>>> >> >>> >to > >> >>>> >> >>> >> >the > >> >>>> >> >>> >> >>> NFS mounts? > >> >>>> >> >>> >> >>> > >> >>>> >> >>> >> >>> Secondly, why doesn't the node mount the NFS > >> >directories > >> >>>> >at > >> >>>> >> >boot? > >> >>>> >> >>> >> >Is > >> >>>> >> >>> >> >>> that the issue with this particular node? > >> >>>> >> >>> >> >>> > >> >>>> >> >>> >> >>> On Wed, Apr 8, 2020 at 11:12 PM > >> >>>> ><eevans@digitaldatatechs.com> > >> >>>> >> >>> >wrote: > >> >>>> >> >>> >> >>> > >> >>>> >> >>> >> >>>> Did you try virsh list --inactive > >> >>>> >> >>> >> >>>> > >> >>>> >> >>> >> >>>> > >> >>>> >> >>> >> >>>> > >> >>>> >> >>> >> >>>> Eric Evans > >> >>>> >> >>> >> >>>> > >> >>>> >> >>> >> >>>> Digital Data Services LLC. 
> >> >>>> >> >>> >> >>>> > >> >>>> >> >>> >> >>>> 304.660.9080 > >> >>>> >> >>> >> >>>> > >> >>>> >> >>> >> >>>> > >> >>>> >> >>> >> >>>> > >> >>>> >> >>> >> >>>> *From:* Shareef Jalloq <shareef@jalloq.co.uk> > >> >>>> >> >>> >> >>>> *Sent:* Wednesday, April 8, 2020 5:58 PM > >> >>>> >> >>> >> >>>> *To:* Strahil Nikolov <hunter86_bg@yahoo.com> > >> >>>> >> >>> >> >>>> *Cc:* Ovirt Users <users@ovirt.org> > >> >>>> >> >>> >> >>>> *Subject:* [ovirt-users] Re: ovirt-engine > >> >unresponsive - > >> >>>> >how > >> >>>> >> >to > >> >>>> >> >>> >> >rescue? > >> >>>> >> >>> >> >>>> > >> >>>> >> >>> >> >>>> > >> >>>> >> >>> >> >>>> > >> >>>> >> >>> >> >>>> I've now shut down the VMs on one host and > >rebooted > >> >it > >> >>>> >but > >> >>>> >> >the > >> >>>> >> >>> >> >agent > >> >>>> >> >>> >> >>>> service doesn't start. If I run 'hosted-engine > >> >>>> >--vm-status' > >> >>>> >> >I > >> >>>> >> >>> >get: > >> >>>> >> >>> >> >>>> > >> >>>> >> >>> >> >>>> > >> >>>> >> >>> >> >>>> > >> >>>> >> >>> >> >>>> The hosted engine configuration has not been > >> >retrieved > >> >>>> >from > >> >>>> >> >>> >shared > >> >>>> >> >>> >> >>>> storage. Please ensure that ovirt-ha-agent is > >> >running and > >> >>>> >> >the > >> >>>> >> >>> >> >storage > >> >>>> >> >>> >> >>>> server is reachable. > >> >>>> >> >>> >> >>>> > >> >>>> >> >>> >> >>>> > >> >>>> >> >>> >> >>>> > >> >>>> >> >>> >> >>>> and indeed if I list the mounts under > >> >>>> >/rhev/data-center/mnt, > >> >>>> >> >>> >only > >> >>>> >> >>> >> >one of > >> >>>> >> >>> >> >>>> the directories is mounted. I have 3 NFS mounts, > >> >one ISO > >> >>>> >> >Domain > >> >>>> >> >>> >> >and two > >> >>>> >> >>> >> >>>> Data Domains. Only one Data Domain has mounted > >and > >> >this > >> >>>> >has > >> >>>> >> >>> >lots > >> >>>> >> >>> >> >of .prob > >> >>>> >> >>> >> >>>> files in. So why haven't the other NFS exports > >been > >> >>>> >> >mounted? > >> >>>> >> >>> >> >>>> > >> >>>> >> >>> >> >>>> > >> >>>> >> >>> >> >>>> > >> >>>> >> >>> >> >>>> Manually mounting them doesn't seem to have > >helped > >> >much > >> >>>> >> >either. > >> >>>> >> >>> >I > >> >>>> >> >>> >> >can > >> >>>> >> >>> >> >>>> start the broker service but the agent service > >says > >> >no. > >> >>>> >> >Same > >> >>>> >> >>> >error > >> >>>> >> >>> >> >as the > >> >>>> >> >>> >> >>>> one in my last email. > >> >>>> >> >>> >> >>>> > >> >>>> >> >>> >> >>>> > >> >>>> >> >>> >> >>>> > >> >>>> >> >>> >> >>>> Shareef. > >> >>>> >> >>> >> >>>> > >> >>>> >> >>> >> >>>> > >> >>>> >> >>> >> >>>> > >> >>>> >> >>> >> >>>> On Wed, Apr 8, 2020 at 9:57 PM Shareef Jalloq > >> >>>> >> >>> >> ><shareef@jalloq.co.uk> > >> >>>> >> >>> >> >>>> wrote: > >> >>>> >> >>> >> >>>> > >> >>>> >> >>> >> >>>> Right, still down. I've run virsh and it doesn't > >> >know > >> >>>> >> >anything > >> >>>> >> >>> >> >about > >> >>>> >> >>> >> >>>> the engine vm. > >> >>>> >> >>> >> >>>> > >> >>>> >> >>> >> >>>> > >> >>>> >> >>> >> >>>> > >> >>>> >> >>> >> >>>> I've restarted the broker and agent services and > >I > >> >still > >> >>>> >get > >> >>>> >> >>> >> >nothing in > >> >>>> >> >>> >> >>>> virsh->list. 
In the logs under /var/log/ovirt-hosted-engine-ha I see lots of errors:

broker.log:

MainThread::INFO::2020-04-08 20:56:20,138::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.3.6 started
MainThread::INFO::2020-04-08 20:56:20,138::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Searching for submonitors in /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors
MainThread::INFO::2020-04-08 20:56:20,138::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
MainThread::INFO::2020-04-08 20:56:20,140::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
MainThread::INFO::2020-04-08 20:56:20,140::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
MainThread::INFO::2020-04-08 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
MainThread::INFO::2020-04-08 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
MainThread::INFO::2020-04-08 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
MainThread::INFO::2020-04-08 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
MainThread::INFO::2020-04-08 20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
MainThread::INFO::2020-04-08 20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
MainThread::INFO::2020-04-08 20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
MainThread::INFO::2020-04-08 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
MainThread::INFO::2020-04-08 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
MainThread::INFO::2020-04-08 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
MainThread::INFO::2020-04-08 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
MainThread::INFO::2020-04-08 20:56:20,143::monitor::50::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Finished loading submonitors
MainThread::INFO::2020-04-08 20:56:20,197::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect) Connecting the storage
MainThread::INFO::2020-04-08 20:56:20,197::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2020-04-08 20:56:20,414::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2020-04-08 20:56:20,628::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Refreshing the storage domain
MainThread::WARNING::2020-04-08 20:56:21,057::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed: (code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
MainThread::INFO::2020-04-08 20:56:21,901::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.3.6 started
MainThread::INFO::2020-04-08 20:56:21,901::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Searching for submonitors in /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors

agent.log:

MainThread::ERROR::2020-04-08 20:57:00,799::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent
MainThread::INFO::2020-04-08 20:57:00,799::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
MainThread::INFO::2020-04-08 20:57:11,144::agent::67::ovirt_hosted_engine_ha.agent.agent.Agent::(run) ovirt-hosted-engine-ha agent 2.3.6 started
MainThread::INFO::2020-04-08 20:57:11,182::hosted_engine::234::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Found certificate common name: ovirt-node-01.phoelex.com
MainThread::INFO::2020-04-08 20:57:11,294::hosted_engine::543::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Initializing ha-broker connection
MainThread::INFO::2020-04-08 20:57:11,296::brokerlink::80::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor network, options {'tcp_t_address': '', 'network_test': 'dns', 'tcp_t_port': '', 'addr': '192.168.1.99'}
MainThread::ERROR::2020-04-08 20:57:11,296::hosted_engine::559::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Failed to start necessary monitors
MainThread::ERROR::2020-04-08 20:57:11,297::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent
    return action(he)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper
    return he.start_monitoring()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 432, in start_monitoring
    self._initialize_broker()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 556, in _initialize_broker
    m.get('options', {}))
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 89, in start_monitor
    ).format(t=type, o=options, e=e)
RequestError: brokerlink - failed to start monitor via ovirt-ha-broker: [Errno 2] No such file or directory, [monitor: 'network', options: {'tcp_t_address': '', 'network_test': 'dns', 'tcp_t_port': '', 'addr': '192.168.1.99'}]
MainThread::ERROR::2020-04-08 20:57:11,297::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent
MainThread::INFO::2020-04-08 20:57:11,297::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
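That '[Errno 2] No such file or directory' in the RequestError is the agent failing to reach the broker, not a missing log file. A quick sanity check on the host, as a sketch (the socket path is an assumption and may differ between releases):

# Is the broker actually up, and is its listening socket present?
systemctl status ovirt-ha-broker --no-pager
# Assumption: the broker listens on this unix socket path.
ls -l /var/run/ovirt-hosted-engine-ha/broker.socket
# Recent broker messages often explain why the socket never appeared.
journalctl -u ovirt-ha-broker --since today --no-pager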
On Wed, Apr 8, 2020 at 6:10 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:

This has to be resolved:

Engine status : unknown stale-data

Run 'hosted-engine --vm-status' again. If it remains the same, restart ovirt-ha-broker.service & ovirt-ha-agent.service.

Verify that the engine's storage is available. Then monitor the broker & agent logs in /var/log/ovirt-hosted-engine-ha.

Best Regards,
Strahil Nikolov

Hi Shareef,

The flow of activation in oVirt is more complex than on a plain KVM host. Mounting of the domains happens during the activation of the node (the HostedEngine is activating everything needed).

Focus on the HostedEngine VM. Is it running properly?

If not, try:
1. Verify that the storage domain exists
2. Check if it has an 'ha_agent' directory
3. Check if the links are OK; if not, you can safely remove the links

4. Next, check that these services are running:
A) sanlock
B) supervdsmd
C) vdsmd
D) libvirtd

5. Increase the log level for the broker and agent services:

cd /etc/ovirt-hosted-engine-ha
vim *-log.conf

systemctl restart ovirt-ha-broker ovirt-ha-agent

6. Check what they are complaining about. Keep in mind that the agent will keep throwing errors until the broker stops doing so (the agent depends on the broker), so the broker must be OK before proceeding with the agent log.

About the manual VM start, you need 2 things:

1. Define the VM network:

# cat vdsm-ovirtmgmt.xml
<network>
  <name>vdsm-ovirtmgmt</name>
  <uuid>8ded486e-e681-4754-af4b-5737c2b05405</uuid>
  <forward mode='bridge'/>
  <bridge name='ovirtmgmt'/>
</network>

[root@ovirt1 HostedEngine-RECOVERY]# virsh define vdsm-ovirtmgmt.xml

2. Get an XML definition of the VM, which can be found in the vdsm log. Every VM has its configuration printed into the vdsm log at startup, on the host where it starts.
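One rough way to pull that definition out of the log, as a sketch (assumption: the domain XML appears verbatim in /var/log/vdsm/vdsm.log, which can vary between vdsm versions):

# Collect every <domain> ... </domain> block vdsm has logged; the output may
# contain several blocks (one per VM start), so keep only the last one for
# the VM you want before defining it.
awk '/<domain /,/<\/domain>/' /var/log/vdsm/vdsm.log > myvm.xml

The define/start steps below then apply to the saved file.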
Save it to a file and then:
A) virsh define myvm.xml
B) virsh start myvm

It seems there is/was a problem with your NFS shares.

Best Regards,
Strahil Nikolov

Hey Shareef,

Check if there are any files or folders not owned by vdsm:kvm. Something like this:

find . -not -user 36 -not -group 36 -print

Also check if vdsm can access the images in the '<vol-mount-point>/images' directories.

Best Regards,
Strahil Nikolov

And the IPv6 address '64:ff9b::c0a8:13d'? I don't see it in the log output.

Best Regards,
Strahil Nikolov

Based on your output, you got a PTR record for both IPv4 & IPv6 ... most probably that's the reason. Set the IPv6 address on the interface and try again.

Best Regards,
Strahil Nikolov

Do you have firewalld up and running on the host?

Best Regards,
Strahil Nikolov

I am guessing, but your interface is not assigned to any zone, right? Just add the interface to the default zone (usually 'public').

Best Regards,
Strahil Nikolov
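A minimal sketch of that zone fix, assuming firewalld is in use and ovirtmgmt is the interface that ended up without a zone:

# See which zones currently have interfaces assigned.
firewall-cmd --get-active-zones

# Put the interface into the default zone and make it persistent.
firewall-cmd --zone=public --add-interface=ovirtmgmt --permanent
firewall-cmd --reload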
Keep in mind that there are a lot of playbooks that can be used to deploy a HostedEngine environment via Ansible.
Keep in mind that if you plan to use oVirt in production, you need to know how to debug it (at least at a basic level).
Best Regards, Strahil Nikolov
It's really interesting that you mention that topic. The only ways I've managed to break my engine were: A) a bad SELinux RPM, which was solved via a reinstall of the package and a relabel; B) an interrupted patch, because I forgot to use screen.
I think it is production ready, but it requires knowledge, as it is not as dummy-proof as VMware. Yet oVirt is far more flexible, allowing you to run your own scripts before/during/after certain events (vdsm hooks).
Sadly, Ansible (which is used for the Gluster setup via gdeploy, and for the engine) is quite dynamic, and sometimes something might break.
If you feel that oVirt breaks too often, just set your engine up on a separate physical or virtual (non-hosted) machine, but do not complain that a free open-source product is not production ready just because you don't know how to debug it.
You can trial the downstream solutions from Red Hat & Oracle and you will notice the difference. For me, oVirt is like Fedora compared to RHEL/OEL/CentOS, but this is just a personal opinion.
Best Regards, Strahil Nikolov

On April 15, 2020 11:59:00 AM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:
OK, that seems to have fixed it, thanks. Is this a side effect of redeploying the HE over a first-time install? Nothing has changed in our setup, and I didn't need to do this when I initially set up our nodes.
On Tue, Apr 14, 2020 at 6:55 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
> I am guessing, but your interface is not assigned to any zone, right?
> Just add the interface to the default zone (usually 'public').

On April 14, 2020 6:17:17 PM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:
Hmmm, we're not using IPv6. Is that the issue?
On Tue, Apr 14, 2020 at 3:56 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:

On April 14, 2020 1:27:24 PM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:

Right, I've given up on recovering the HE, so I want to try to redeploy it. There doesn't seem to be enough information to debug why the broker/agent won't start cleanly.

In running 'hosted-engine --deploy', I'm seeing the following error in the setup validation phase:

2020-04-14 09:46:08,922+0000 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND Please provide the hostname of this host on the management network [ovirt-node-00.phoelex.com]:
2020-04-14 09:46:12,831+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname.getResolvedAddresses:432 getResolvedAddresses: set(['64:ff9b::c0a8:13d', '192.168.1.61'])
2020-04-14 09:46:12,832+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname._validateFQDNresolvability:289 ovirt-node-00.phoelex.com resolves to: set(['64:ff9b::c0a8:13d', '192.168.1.61'])
2020-04-14 09:46:12,832+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:813 execute: ['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com', 'ANY'], executable='None', cwd='None', env=None
2020-04-14 09:46:12,871+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:863 execute-result: ['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com', 'ANY'], rc=0
2020-04-14 09:46:12,872+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:921 execute-output: ['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com', 'ANY'] stdout:
ovirt-node-00.phoelex.com. 86400 IN A 192.168.1.61
2020-04-14 09:46:12,872+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:926 execute-output: ['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com', 'ANY'] stderr:
2020-04-14 09:46:12,872+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:813 execute: ('/usr/sbin/ip', 'addr'), executable='None', cwd='None', env=None
2020-04-14 09:46:12,876+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:863 execute-result: ('/usr/sbin/ip', 'addr'), rc=0
2020-04-14 09:46:12,876+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:921 execute-output: ('/usr/sbin/ip', 'addr') stdout:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovirtmgmt state UP group default qlen 1000
link/ether ac:1f:6b:bc:32:6a brd ff:ff:ff:ff:ff:ff
3: eno2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
link/ether ac:1f:6b:bc:32:6b brd ff:ff:ff:ff:ff:ff
4: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 02:e6:e2:80:93:8d brd ff:ff:ff:ff:ff:ff
5: br-int: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 8a:26:44:50:ee:4a brd ff:ff:ff:ff:ff:ff
21: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether ac:1f:6b:bc:32:6a brd ff:ff:ff:ff:ff:ff
inet 192.168.1.61/24 brd 192.168.1.255 scope global ovirtmgmt
valid_lft forever preferred_lft forever
inet6 fe80::ae1f:6bff:febc:326a/64 scope link
valid_lft forever preferred_lft forever
22: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 3a:02:7b:7d:b3:2a brd ff:ff:ff:ff:ff:ff
2020-04-14 09:46:12,876+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:926 execute-output: ('/usr/sbin/ip', 'addr') stderr:
2020-04-14 09:46:12,877+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname.getLocalAddresses:251 addresses: [u'192.168.1.61', u'fe80::ae1f:6bff:febc:326a']
2020-04-14 09:46:12,877+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname.test_hostname:464 test_hostname exception
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/ovirt_setup_lib/hostname.py", line 460, in test_hostname
not_local_text,
File "/usr/lib/python2.7/site-packages/ovirt_setup_lib/hostname.py", line 342, in _validateFQDNresolvability
addresses=resolvedAddressesAsString
RuntimeError: ovirt-node-00.phoelex.com resolves to 64:ff9b::c0a8:13d 192.168.1.61 and not all of them can be mapped to non loopback devices on this host
2020-04-14 09:46:12,884+0000 ERROR otopi.plugins.gr_he_common.network.bridge dialog.queryEnvKey:120 Host name is not valid: ovirt-node-00.phoelex.com resolves to 64:ff9b::c0a8:13d 192.168.1.61 and not all of them can be mapped to non loopback devices on this host
The node I'm running on has an IP address of .61 and resolves correctly.
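For what it's worth, 64:ff9b::/96 is the well-known NAT64/DNS64 prefix (RFC 6052), and c0a8:013d is just 192.168.1.61 embedded in it, so it looks like a DNS64-enabled resolver is synthesizing an AAAA record for the host. A quick check, as a sketch (assumes dig is available; 8.8.8.8 is only an example of a non-DNS64 resolver):

# Ask the configured resolver: a DNS64 setup will return the 64:ff9b:: address.
dig +short AAAA ovirt-node-00.phoelex.com

# Compare against a resolver that does not do DNS64.
dig +short AAAA ovirt-node-00.phoelex.com @8.8.8.8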
On Fri, Apr 10, 2020 at 12:55 PM Shareef Jalloq <shareef@jalloq.co.uk> wrote:
Where should I be checking if there are any files/folders not owned by vdsm:kvm? I checked on the mount the HA sits on and it's fine.
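One way to run that check across every mounted domain, as a sketch (assumes the standard vdsm uid/gid of 36:36):

# Look for anything under the storage-domain mounts not owned by vdsm:kvm.
find /rhev/data-center/mnt -not -user 36 -not -group 36 -print

# If stray files turn up, reset ownership on that domain's tree.
# (Hypothetical path: substitute the real mount point.)
chown -R 36:36 /rhev/data-center/mnt/<domain-mount>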
How would I go about checking vdsm can access those images? If I run virsh, it lists them, and they were running yesterday even though the HA was down. I've since restarted both hosts, but the broker is still spitting out the same error (copied below). How do I find the reason the broker can't connect to the storage? The conf file is already at DEBUG verbosity:
[handler_logfile]
class=logging.handlers.TimedRotatingFileHandler
args=('/var/log/ovirt-hosted-engine-ha/broker.log', 'd', 1, 7)
level=DEBUG
formatter=long
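Both the broker and the agent read their *-log.conf files from /etc/ovirt-hosted-engine-ha, so after any change the pair needs restarting before the new level takes effect; a short sketch:

    systemctl restart ovirt-ha-broker ovirt-ha-agent
    tail -f /var/log/ovirt-hosted-engine-ha/broker.log

The agent depends on the broker, so broker errors should be cleared first (as Strahil notes further down).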
And what are all these .prob-<num> files that are being created? There are over 250K of them now on the mount I'm using for the Data Domain. They're all zero-size and of the form /rhev/data-center/mnt/nas-01.phoelex.com:_volume2_vmstore/.prob-ffa867da-93db-4211-82df-b1b04a625ab9
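Before touching them, it is worth confirming they really are stale zero-size files and whether they are still appearing; a hedged sketch using the path given above:

    find '/rhev/data-center/mnt/nas-01.phoelex.com:_volume2_vmstore' -maxdepth 1 -name '.prob-*' -size 0 | wc -l    # how many
    find '/rhev/data-center/mnt/nas-01.phoelex.com:_volume2_vmstore' -maxdepth 1 -name '.prob-*' -mmin -60 | head   # created in the last hour?

If the count keeps growing, something is repeatedly probing the mount and failing to clean up after itself, which points at the same storage problem the broker is logging.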
@eevans: The volume I have the Data Domain on has TBs free. The HA is dead so I can't ssh in. No idea what started these errors, and the other VMs were still running happily, although they're on a different Data Domain.
Shareef.
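One way to answer the "can vdsm access those images" question above is to repeat the access as the vdsm user (UID 36) rather than root, since ownership or root-squash problems are invisible to root; a sketch, substituting your own domain UUID:

    sudo -u vdsm ls -l '/rhev/data-center/mnt/nas-01.phoelex.com:_volume2_vmstore/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/images'
    sudo -u vdsm dd if='/rhev/data-center/mnt/nas-01.phoelex.com:_volume2_vmstore/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/dom_md/metadata' of=/dev/null bs=4096 count=1

If either command fails where root succeeds, the NFS export options (root_squash, anonuid/anongid) are the first thing to check.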
MainThread::INFO::2020-04-10
07:45:00,408::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect)
Connecting the storage
MainThread::INFO::2020-04-10
07:45:00,408::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
Connecting storage server
MainThread::INFO::2020-04-10
07:45:01,577::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
Connecting storage server
MainThread::INFO::2020-04-10
07:45:02,692::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
Refreshing the storage domain
MainThread::WARNING::2020-04-10
07:45:05,175::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__)
Can't connect vdsm storage: Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed:
(code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
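Since code 350 ("Error in storage domain action") is raised by vdsm itself, the broker can be taken out of the picture by issuing the same call directly; a sketch, assuming the vdsm-client tool shipped with oVirt 4.x is installed on the host:

    vdsm-client StorageDomain getInfo storagedomainID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2
    grep a6cea67d /var/log/vdsm/vdsm.log | tail -20    # vdsm's side of the failure

The vdsm.log traceback around that call usually names the real cause (mount missing, permission denied, unreadable metadata) rather than the generic code 350.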
On Thu, Apr 9, 2020 at 5:58 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
> On April 9, 2020 11:12:30 AM GMT+03:00, Shareef Jalloq < > shareef@jalloq.co.uk> wrote: > >OK, let's go through this. I'm looking at the node that at least still > >has > >some VMs running. virsh also tells me that the HostedEngine VM is > >running > >but it's unresponsive and I can't shut it down. > > > >1. All storage domains exist and are mounted. > >2. The ha_agent exists: > > > >[root@ovirt-node-01 ovirt-hosted-engine-ha]# ls /rhev/data-center/mnt/ > >nas-01.phoelex.com > \:_volume2_vmstore/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/ > > > >dom_md ha_agent images master > > > >3. There are two links > > > >[root@ovirt-node-01 ovirt-hosted-engine-ha]# ll /rhev/data-center/mnt/ > >nas-01.phoelex.com > \:_volume2_vmstore/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/ha_agent/ > > > >total 8 > > > >lrwxrwxrwx. 1 vdsm kvm 132 Apr 2 14:50 hosted-engine.lockspace -> > >
/var/run/vdsm/storage/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/ffb90b82-42fe-4253-85d5-aaec8c280aaf/90e68791-0c6f-406a-89ac-e0d86c631604
> > > >lrwxrwxrwx. 1 vdsm kvm 132 Apr 2 14:50 hosted-engine.metadata -> > >
/var/run/vdsm/storage/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/2161aed0-7250-4c1d-b667-ac94f60af17e/6b818e33-f80a-48cc-a59c-bba641e027d4
> > > >4. The services exist but all seem to have some sort of warning: > > > >a) Apr 08 18:10:55 ovirt-node-01.phoelex.com sanlock[1728]: *2020-04-08 > >18:10:55 1744152 [36796]: s16 delta_renew long write time 10 sec* > > > >b) Mar 23 18:02:59 ovirt-node-01.phoelex.com supervdsmd[29409]: *failed > >to > >load module nvdimm: libbd_nvdimm.so.2: cannot open shared object file: > >No > >such file or directory* > > > >c) Apr 09 08:05:13 ovirt-node-01.phoelex.com vdsm[4801]: *ERROR failed > >to > >retrieve Hosted Engine HA score '[Errno 2] No such file or directory'Is > >the > >Hosted Engine setup finished?* > > > >d)Apr 08 22:48:27 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-08 > >22:48:27.134+0000: 29309: warning : qemuGetProcessInfo:1404 : cannot > >parse > >process status data > > > >Apr 08 22:48:27 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-08 > >22:48:27.134+0000: 29309: error : virNetDevTapInterfaceStats:764 : > >internal > >error: /proc/net/dev: Interface not found > > > >Apr 08 23:09:39 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-08 > >23:09:39.844+0000: 29307: error : virNetSocketReadWire:1806 : End of > >file > >while reading data: Input/output error > > > >Apr 09 01:05:26 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-09 > >01:05:26.660+0000: 29307: error : virNetSocketReadWire:1806 : End of > >file > >while reading data: Input/output error > > > >5 & 6. The broker log is continually printing this error: > > > >MainThread::INFO::2020-04-09 > >
08:07:31,438::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run)
> >ovirt-hosted-engine-ha broker 2.3.6 started > > > >MainThread::DEBUG::2020-04-09 > >
08:07:31,438::broker::55::ovirt_hosted_engine_ha.broker.broker.Broker::(run)
> >Running broker > > > >MainThread::DEBUG::2020-04-09 > >
08:07:31,438::broker::120::ovirt_hosted_engine_ha.broker.broker.Broker::(_get_monitor)
> >Starting monitor > > > >MainThread::INFO::2020-04-09 > >
08:07:31,438::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >Searching for submonitors in > /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker > > > >/submonitors > > > >MainThread::INFO::2020-04-09 > >
08:07:31,439::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >Loaded submonitor network > > > >MainThread::INFO::2020-04-09 > >
08:07:31,440::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >Loaded submonitor cpu-load-no-engine > > > >MainThread::INFO::2020-04-09 > >
08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >Loaded submonitor mgmt-bridge > > > >MainThread::INFO::2020-04-09 > >
08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >Loaded submonitor network > > > >MainThread::INFO::2020-04-09 > >
08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >Loaded submonitor cpu-load > > > >MainThread::INFO::2020-04-09 > >
08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >Loaded submonitor engine-health > > > >MainThread::INFO::2020-04-09 > >
08:07:31,442::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >Loaded submonitor mgmt-bridge > > > >MainThread::INFO::2020-04-09 > >
08:07:31,442::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >Loaded submonitor cpu-load-no-engine > > > >MainThread::INFO::2020-04-09 > >
08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >Loaded submonitor cpu-load > > > >MainThread::INFO::2020-04-09 > >
08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >Loaded submonitor mem-free > > > >MainThread::INFO::2020-04-09 > >
08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >Loaded submonitor storage-domain > > > >MainThread::INFO::2020-04-09 > >
08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >Loaded submonitor storage-domain > > > >MainThread::INFO::2020-04-09 > >
08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >Loaded submonitor mem-free > > > >MainThread::INFO::2020-04-09 > >
08:07:31,444::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >Loaded submonitor engine-health > > > >MainThread::INFO::2020-04-09 > >
08:07:31,444::monitor::50::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >Finished loading submonitors > > > >MainThread::DEBUG::2020-04-09 > >
08:07:31,444::broker::128::ovirt_hosted_engine_ha.broker.broker.Broker::(_get_storage_broker)
> >Starting storage broker > > > >MainThread::DEBUG::2020-04-09 > >
08:07:31,444::storage_backends::369::ovirt_hosted_engine_ha.lib.storage_backends::(connect)
> >Connecting to VDSM > > > >MainThread::DEBUG::2020-04-09 > >
08:07:31,444::util::384::ovirt_hosted_engine_ha.lib.storage_backends::(__log_debug)
> >Creating a new json-rpc connection to VDSM > > > >Client localhost:54321::DEBUG::2020-04-09 > >08:07:31,453::concurrent::258::root::(run) START thread <Thread(Client > >localhost:54321, started daemon 139992488138496)> (func=<bound method > >Reactor.process_requests of <yajsonrpc.betterAsyncore.Reactor object at > >0x7f528acabc90>>, args=(), kwargs={}) > > > >Client localhost:54321::DEBUG::2020-04-09 > >
08:07:31,459::stompclient::138::yajsonrpc.protocols.stomp.AsyncClient::(_process_connected)
> >Stomp connection established > > > >MainThread::DEBUG::2020-04-09 > 08:07:31,467::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending > >response > > > >MainThread::INFO::2020-04-09 > >
08:07:31,530::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect)
> >Connecting the storage > > > >MainThread::INFO::2020-04-09 > >
08:07:31,531::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
> >Connecting storage server > > > >MainThread::DEBUG::2020-04-09 > 08:07:31,531::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending > >response > > > >MainThread::DEBUG::2020-04-09 > 08:07:31,534::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending > >response > > > >MainThread::DEBUG::2020-04-09 > >
08:07:32,199::storage_server::158::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(_validate_pre_connected_path)
> >Storage domain a6cea67d-dbfb-45cf-a775-b4d0d47b26f2 is not available > > > >MainThread::INFO::2020-04-09 > >
08:07:32,199::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
> >Connecting storage server > > > >MainThread::DEBUG::2020-04-09 > 08:07:32,199::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending > >response > > > >MainThread::DEBUG::2020-04-09 > >
08:07:32,814::storage_server::363::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
> >[{u'status': 0, u'id': u'e29cf818-5ee5-46e1-85c1-8aeefa33e95d'}] > > > >MainThread::INFO::2020-04-09 > >
08:07:32,814::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
> >Refreshing the storage domain > > > >MainThread::DEBUG::2020-04-09 > 08:07:32,815::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending > >response > > > >MainThread::DEBUG::2020-04-09 > >
08:07:33,129::storage_server::420::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
> >Error refreshing storage domain: Command StorageDomain.getStats with > >args > >{'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed: > > > >(code=350, message=Error in storage domain action: > >(u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',)) > > > >MainThread::DEBUG::2020-04-09 > 08:07:33,130::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending > >response > > > >MainThread::DEBUG::2020-04-09 > >
08:07:33,795::storage_backends::208::ovirt_hosted_engine_ha.lib.storage_backends::(_get_sector_size)
> >Command StorageDomain.getInfo with args {'storagedomainID': > >'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed: > > > >(code=350, message=Error in storage domain action: > >(u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',)) > > > >MainThread::WARNING::2020-04-09
> >08:07:33,795::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__)
> >Can't connect vdsm storage: Command StorageDomain.getInfo with args > >{'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed: > > > >(code=350, message=Error in storage domain action: > >(u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',)) > > > > > >The UUID it is moaning about is indeed the one that the HA sits on and > >is > >the one I listed the contents of in step 2 above. > > > > > >So why can't it see this domain? > > > > > >Thanks, Shareef. > > > >On Thu, Apr 9, 2020 at 6:12 AM Strahil Nikolov <hunter86_bg@yahoo.com> > >wrote: > > > >> On April 9, 2020 1:51:05 AM GMT+03:00, Shareef Jalloq < > >> shareef@jalloq.co.uk> wrote: > >> >Don't know if this is useful or not, but I just tried to shutdown > >and > >> >start > >> >another VM on one of the hosts and get the following error: > >> > > >> >virsh # start scratch > >> > > >> >error: Failed to start domain scratch > >> > > >> >error: Network not found: no network with matching name > >> >'vdsm-ovirtmgmt' > >> > > >> >Is this not referring to the interface name as the network is called > >> >'ovirtmgnt'. > >> > > >> >On Wed, Apr 8, 2020 at 11:35 PM Shareef Jalloq > ><shareef@jalloq.co.uk> > >> >wrote: > >> > > >> >> Hmmm, virsh tells me the HE is running but it hasn't come up and > >the > >> >> agent.log is full of the same errors. > >> >> > >> >> On Wed, Apr 8, 2020 at 11:31 PM Shareef Jalloq > ><shareef@jalloq.co.uk> > >> >> wrote: > >> >> > >> >>> Ah hah! Ok, so I've managed to start it using virsh on
> >the second > >> >host > >> >>> but my first host is still dead. > >> >>> > >> >>> First of all, what are these 56,317 .prob- files that get dumped > >to > >> >the > >> >>> NFS mounts? > >> >>> > >> >>> Secondly, why doesn't the node mount the NFS
> >> >>> directories at boot?
> >> >Is > >> >>> that the issue with this particular node? > >> >>> > >> >>> On Wed, Apr 8, 2020 at 11:12 PM <eevans@digitaldatatechs.com> > >wrote: > >> >>> > >> >>>> Did you try virsh list --inactive > >> >>>> > >> >>>> > >> >>>> > >> >>>> Eric Evans > >> >>>> > >> >>>> Digital Data Services LLC. > >> >>>> > >> >>>> 304.660.9080 > >> >>>> > >> >>>> > >> >>>> > >> >>>> *From:* Shareef Jalloq <shareef@jalloq.co.uk> > >> >>>> *Sent:* Wednesday, April 8, 2020 5:58 PM > >> >>>> *To:* Strahil Nikolov <hunter86_bg@yahoo.com> > >> >>>> *Cc:* Ovirt Users <users@ovirt.org> > >> >>>> *Subject:* [ovirt-users] Re: ovirt-engine unresponsive
> >> >>>> >> conf_on_shared_storage=True > >> >>>> >> maintenance=False > >> >>>> >> state=EngineDown > >> >>>> >> stopped=False > >> >>>> >> > >> >>>> >> > >> >>>> >> --== Host ovirt-node-01.phoelex.com (id: 2) status ==-- > >> >>>> >> > >> >>>> >> conf_on_shared_storage : True > >> >>>> >> Status up-to-date : True > >> >>>> >> Hostname : > >ovirt-node-01.phoelex.com > >> >>>> >> Host ID : 2 > >> >>>> >> Engine status : {"reason": "bad vm > >status", > >> >>>> >"health": > >> >>>> >> "bad", "vm": "down_unexpected", "detail": "Down"} > >> >>>> >> Score : 0 > >> >>>> >> stopped : False > >> >>>> >> Local maintenance : False > >> >>>> >> crc32 : 5045f2eb > >> >>>> >> local_conf_timestamp : 1737037 > >> >>>> >> Host timestamp : 1737283 > >> >>>> >> Extra metadata (valid at timestamp): > >> >>>> >> metadata_parse_version=1 > >> >>>> >> metadata_feature_version=1 > >> >>>> >> timestamp=1737283 (Wed Apr 8 16:16:17 2020) > >> >>>> >> host-id=2 > >> >>>> >> score=0 > >> >>>> >> vm_conf_refresh_time=1737037 (Wed Apr 8 16:12:11
> >> >>>> >> conf_on_shared_storage=True > >> >>>> >> maintenance=False > >> >>>> >> state=EngineUnexpectedlyDown > >> >>>> >> stopped=False > >> >>>> >> > >> >>>> >> On Wed, Apr 8, 2020 at 5:09 PM Maton, Brett > >> >>>> ><matonb@ltresources.co.uk> > >> >>>> >> wrote: > >> >>>> >> > >> >>>> >>> First steps, on one of your hosts as root: > >> >>>> >>> > >> >>>> >>> To get information: > >> >>>> >>> hosted-engine --vm-status > >> >>>> >>> > >> >>>> >>> To start the engine: > >> >>>> >>> hosted-engine --vm-start > >> >>>> >>> > >> >>>> >>> > >> >>>> >>> On Wed, 8 Apr 2020 at 17:00, Shareef Jalloq > >> ><shareef@jalloq.co.uk> > >> >>>> >wrote: > >> >>>> >>> > >> >>>> >>>> So my engine has gone down and I can't ssh into it either. > >If > >> >I > >> >>>> >try to > >> >>>> >>>> log into the web-ui of the node it is running on, I get > >> >redirected > >> >>>> >because > >> >>>> >>>> the node can't reach the engine. > >> >>>> >>>> > >> >>>> >>>> What are my next steps? > >> >>>> >>>> > >> >>>> >>>> Shareef. > >> >>>> >>>> _______________________________________________ > >> >>>> >>>> Users mailing list -- users@ovirt.org > >> >>>> >>>> To unsubscribe send an email to users-leave@ovirt.org > >> >>>> >>>> Privacy Statement: > >https://www.ovirt.org/privacy-policy.html > >> >>>> >>>> oVirt Code of Conduct: > >> >>>> >>>> https://www.ovirt.org/community/about/community-guidelines/ > >> >>>> >>>> List Archives: > >> >>>> >>>> > >> >>>> > > >> >>>> > >> > > >> > > >
> >> >>>> >>>> > >> >>>> >>> > >> >>>> > >> >>>> This has to be resolved: > >> >>>> > >> >>>> Engine status : unknown stale-data > >> >>>> > >> >>>> Run again 'hosted-engine --vm-status'. If it remains
> >> >the same, restart > >> >>>> ovirt-ha-broker.service & ovirt-ha-agent.service > >> >>>> Verify that the engine's storage is available. Then monitor the
> >> >broker > >> >>>> & agent logs in /var/log/ovirt-hosted-engine-ha > >> >>>> > >> >>>> Best Regards, > >> >>>> Strahil Nikolov > >> >>>> > >> >>>> > >> >>>> > >> >>>> > >> > >> Hi Shareef, > >> > >> The flow of activation oVirt is more complex than a plain KVM. > >> Mounting of the domains happen during the activation of the node ( > >the > >> HostedEngine is activating everything needed). > >> > >> Focus on the HostedEngine VM. > >> Is it running properly ? > >> > >> If not,try: > >> 1. Verify that the storage domain exists > >> 2. Check if it has 'ha_agents' directory > >> 3. Check if the links are OK, if not you can safely remove
> >> the links
> >> > >> 4. Next check the services are running: > >> A) sanlock > >> B) supervdsmd > >> C) vdsmd > >> D) libvirtd > >> > >> 5. Increase the log level for broker and agent services: > >> > >> cd /etc/ovirt-hosted-engine-ha > >> vim *-log.conf > >> > >> systemctl restart ovirt-ha-broker ovirt-ha-agent > >> > >> 6. Check what they are complaining about > >> Keep in mind that agent will keep throwing errors untill
> >> the broker
> >stops > >> doing it (agent depends on broker), so broker must be OK before > >> proceeding with the agent log. > >> > >> About the manual VM start, you need 2 things: > >> > >> 1. Define the VM network > >> # cat vdsm-ovirtmgmt.xml <network> > >> <name>vdsm-ovirtmgmt</name> > >> <uuid>8ded486e-e681-4754-af4b-5737c2b05405</uuid> > >> <forward mode='bridge'/> > >> <bridge name='ovirtmgmt'/> > >> </network> > >> > >> [root@ovirt1 HostedEngine-RECOVERY]# virsh net-define vdsm-ovirtmgmt.xml > >> > >> 2. Get an xml definition which can be found in the vdsm log. Every VM at > >> start up has its configuration printed out in vdsm log on
> >> the host
> >it > >> starts. > >> Save to file and then: > >> A) virsh define myvm.xml > >> B) virsh start myvm > >> > >> It seems there is/was a problem with your NFS shares. > >> > >> > >> Best Regards, > >> Strahil Nikolov > >> > > Hey Shareef, > > Check if there are any files or folders not owned by vdsm:kvm . Something > like this: > > find . -not -user 36 -not -group 36 -print > > Also check if vdsm can access the images in the > '<vol-mount-point>/images' directories. > > Best Regards, > Strahil Nikolov >
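Pulling Strahil's two steps together into one hedged sequence (note that a <network> XML is loaded with virsh net-define and started with virsh net-start, not plain define; HostedEngine.xml is just a filename chosen here for the domain XML you copy out of vdsm.log by hand):

    # 1. define and start the missing libvirt network from the XML quoted above
    virsh net-define vdsm-ovirtmgmt.xml
    virsh net-start vdsm-ovirtmgmt
    # 2. vdsm prints each VM's <domain> XML into /var/log/vdsm/vdsm.log at start;
    #    copy the HostedEngine block into HostedEngine.xml, then:
    virsh define HostedEngine.xml
    virsh start HostedEngine

On an oVirt host virsh will prompt for SASL credentials unless an auth file is supplied.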
And the IPv6 address '64:ff9b::c0a8:13d'?
I don't see it in the log output.
Best Regards, Strahil Nikolov
Based on your output, you got a DNS record for both IPv4 & IPv6 (an A and an AAAA)... most probably that's the reason.
Set the IPv6 on the interface and try again.
Best Regards, Strahil Nikolov
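For what it's worth, 64:ff9b::/96 is the well-known NAT64 prefix (RFC 6052), and the last 32 bits of 64:ff9b::c0a8:13d are 0xc0a8013d = 192.168.1.61, so the AAAA looks like a DNS64-synthesized copy of the A record rather than a real IPv6 assignment. A sketch for checking which resolver is synthesizing it:

    dig +short AAAA ovirt-node-00.phoelex.com    # 64:ff9b::... here suggests DNS64
    cat /etc/resolv.conf                         # identify the resolver doing it

Pointing the host at a resolver without DNS64, or adding a matching IPv6 address to ovirtmgmt as Strahil suggests, should both satisfy the check.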
Maybe the deployment process now also considers the IPv6... I don't know. Best Regards, Strahil Nikolov

If you haven't got this resolved, log into the host and use 'saslpasswd' <local account> without the quotes. Then virsh start <vm name> and use the password you set on the local account. I'm not sure it will work, but it has worked for regular VMs.
Eric Evans
Digital Data Services LLC.
304.660.9080
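A hedged note on that: oVirt configures libvirt to authenticate via SASL, and the usual command is saslpasswd2 against the libvirt service (the bare 'saslpasswd' name may be a typo); a sketch, where 'admin' is just an example account name:

    saslpasswd2 -a libvirt admin   # set a password for user 'admin' in libvirt's SASL database
    virsh list --all               # virsh will now accept 'admin' plus the password set above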
