Hmmm, we're not using IPv6.  Is that the issue?
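
For reference, 64:ff9b::/96 is the well-known NAT64 prefix, and c0a8:013d is hex for 192.168.1.61, so that address looks like an AAAA record synthesized by a DNS64 resolver rather than anything configured on the host.  A quick way to check (a sketch):

getent ahosts ovirt-node-00.phoelex.com       # what getaddrinfo(), and hence the validator, sees
dig +short AAAA ovirt-node-00.phoelex.com     # is an AAAA answer coming back at all?
dig +short AAAA ovirt-node-00.phoelex.com @8.8.8.8   # compare against an outside resolver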

On Tue, Apr 14, 2020 at 3:56 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
On April 14, 2020 1:27:24 PM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:
>Right, I've given up on recovering the HE so want to try and redeploy
>it.
>There doesn't seem to be enough information to debug why the
>broker/agent
>won't start cleanly.
>
>In running 'hosted-engine --deploy', I'm seeing the following error in
>the
>setup validation phase:
>
>2020-04-14 09:46:08,922+0000 DEBUG otopi.plugins.otopi.dialog.human
>dialog.__logString:204 DIALOG:SEND                 Please provide the
>hostname of this host on the management network
>[ovirt-node-00.phoelex.com]:
>
>
>2020-04-14 09:46:12,831+0000 DEBUG
>otopi.plugins.gr_he_common.network.bridge
>hostname.getResolvedAddresses:432
>getResolvedAddresses: set(['64:ff9b::c0a8:13d', '192.168.1.61'])
>
>2020-04-14 09:46:12,832+0000 DEBUG
>otopi.plugins.gr_he_common.network.bridge
>hostname._validateFQDNresolvability:289 ovirt-node-00.phoelex.com
>resolves
>to: set(['64:ff9b::c0a8:13d', '192.168.1.61'])
>
>2020-04-14 09:46:12,832+0000 DEBUG
>otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:813
>execute:
>['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com',
>'ANY'],
>executable='None', cwd='None', env=None
>
>2020-04-14 09:46:12,871+0000 DEBUG
>otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:863
>execute-result: ['/usr/bin/dig', '+noall', '+answer', '
>ovirt-node-00.phoelex.com', 'ANY'], rc=0
>
>2020-04-14 09:46:12,872+0000 DEBUG
>otopi.plugins.gr_he_common.network.bridge plugin.execute:921
>execute-output: ['/usr/bin/dig', '+noall', '+answer', '
>ovirt-node-00.phoelex.com', 'ANY'] stdout:
>
>ovirt-node-00.phoelex.com. 86400 IN     A       192.168.1.61
>
>
>2020-04-14 09:46:12,872+0000 DEBUG
>otopi.plugins.gr_he_common.network.bridge plugin.execute:926
>execute-output: ['/usr/bin/dig', '+noall', '+answer', '
>ovirt-node-00.phoelex.com', 'ANY'] stderr:
>
>
>
>2020-04-14 09:46:12,872+0000 DEBUG
>otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:813
>execute:
>('/usr/sbin/ip', 'addr'), executable='None', cwd='None', env=None
>
>2020-04-14 09:46:12,876+0000 DEBUG
>otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:863
>execute-result: ('/usr/sbin/ip', 'addr'), rc=0
>
>2020-04-14 09:46:12,876+0000 DEBUG
>otopi.plugins.gr_he_common.network.bridge plugin.execute:921
>execute-output: ('/usr/sbin/ip', 'addr') stdout:
>
>1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
>group
>default qlen 1000
>
>    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>
>    inet 127.0.0.1/8 scope host lo
>
>       valid_lft forever preferred_lft forever
>
>    inet6 ::1/128 scope host
>
>       valid_lft forever preferred_lft forever
>
>2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master
>ovirtmgmt state UP group default qlen 1000
>
>    link/ether ac:1f:6b:bc:32:6a brd ff:ff:ff:ff:ff:ff
>
>3: eno2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state
>DOWN
>group default qlen 1000
>
>    link/ether ac:1f:6b:bc:32:6b brd ff:ff:ff:ff:ff:ff
>
>4: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
>group
>default qlen 1000
>
>    link/ether 02:e6:e2:80:93:8d brd ff:ff:ff:ff:ff:ff
>
>5: br-int: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group
>default qlen 1000
>
>    link/ether 8a:26:44:50:ee:4a brd ff:ff:ff:ff:ff:ff
>
>21: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue
>state UP group default qlen 1000
>
>    link/ether ac:1f:6b:bc:32:6a brd ff:ff:ff:ff:ff:ff
>
>    inet 192.168.1.61/24 brd 192.168.1.255 scope global ovirtmgmt
>
>       valid_lft forever preferred_lft forever
>
>    inet6 fe80::ae1f:6bff:febc:326a/64 scope link
>
>       valid_lft forever preferred_lft forever
>
>22: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
>group
>default qlen 1000
>
>    link/ether 3a:02:7b:7d:b3:2a brd ff:ff:ff:ff:ff:ff
>
>
>2020-04-14 09:46:12,876+0000 DEBUG
>otopi.plugins.gr_he_common.network.bridge plugin.execute:926
>execute-output: ('/usr/sbin/ip', 'addr') stderr:
>
>
>
>2020-04-14 09:46:12,877+0000 DEBUG
>otopi.plugins.gr_he_common.network.bridge
>hostname.getLocalAddresses:251
>addresses: [u'192.168.1.61', u'fe80::ae1f:6bff:febc:326a']
>
>2020-04-14 09:46:12,877+0000 DEBUG
>otopi.plugins.gr_he_common.network.bridge hostname.test_hostname:464
>test_hostname exception
>
>Traceback (most recent call last):
>
>File "/usr/lib/python2.7/site-packages/ovirt_setup_lib/hostname.py",
>line
>460, in test_hostname
>
>    not_local_text,
>
>File "/usr/lib/python2.7/site-packages/ovirt_setup_lib/hostname.py",
>line
>342, in _validateFQDNresolvability
>
>    addresses=resolvedAddressesAsString
>
>RuntimeError: ovirt-node-00.phoelex.com resolves to 64:ff9b::c0a8:13d
>192.168.1.61 and not all of them can be mapped to non loopback devices
>on
>this host
>
>2020-04-14 09:46:12,884+0000 ERROR
>otopi.plugins.gr_he_common.network.bridge dialog.queryEnvKey:120 Host
>name
>is not valid: ovirt-node-00.phoelex.com resolves to 64:ff9b::c0a8:13d
>192.168.1.61 and not all of them can be mapped to non loopback devices
>on
>this host
>
>The node I'm running on has an IP address of .61 and resolves
>correctly.
>
>On Fri, Apr 10, 2020 at 12:55 PM Shareef Jalloq <shareef@jalloq.co.uk>
>wrote:
>
>> Where should I be checking if there are any files/folders not owned by
>> vdsm:kvm?  I checked on the mount the HA sits on and it's fine.
>>
>> How would I go about checking vdsm can access those images?  If I run
>> virsh, it lists them and they were running yesterday even though the
>HA was
>> down.  I've since restarted both hosts but the broker is still
>spitting out
>> the same error (copied below).  How do I find the reason the broker
>can't
>> connect to the storage?  The conf file is already at DEBUG verbosity:
>>
>> [handler_logfile]
>>
>> class=logging.handlers.TimedRotatingFileHandler
>>
>> args=('/var/log/ovirt-hosted-engine-ha/broker.log', 'd', 1, 7)
>>
>> level=DEBUG
>>
>> formatter=long
>>
>> And what are all these .prob-<num> files that are being created?
>There
>> are over 250K of them now on the mount I'm using for the Data domain.
>> They're all of 0 size and of the form,
>> /rhev/data-center/mnt/nas-01.phoelex.com:
>> _volume2_vmstore/.prob-ffa867da-93db-4211-82df-b1b04a625ab9
>>
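>> (If these are the zero-byte writability probes that the hosted-engine
>> storage checks create and normally clean up, which is my assumption
>> here, a cautious cleanup sketch would be:)
>>
>> # list first; only zero-size probe files in the domain root
>> find /rhev/data-center/mnt/nas-01.phoelex.com:_volume2_vmstore/ \
>>     -maxdepth 1 -name '.prob-*' -size 0 -print
>> # re-run with -delete appended once the listing looks right
>>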
>> @eevans:  The volume I have the Data Domain on has TBs free.  The HA
>is
>> dead so I can't ssh in.  No idea what started these errors and the
>other
>> VMs were still running happily although they're on a different Data
>Domain.
>>
>> Shareef.
>>
>> MainThread::INFO::2020-04-10
>>
>07:45:00,408::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect)
>> Connecting the storage
>>
>> MainThread::INFO::2020-04-10
>>
>07:45:00,408::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
>> Connecting storage server
>>
>> MainThread::INFO::2020-04-10
>>
>07:45:01,577::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
>> Connecting storage server
>>
>> MainThread::INFO::2020-04-10
>>
>07:45:02,692::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
>> Refreshing the storage domain
>>
>> MainThread::WARNING::2020-04-10
>>
>07:45:05,175::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__)
>> Can't connect vdsm storage: Command StorageDomain.getInfo with args
>> {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed:
>>
>> (code=350, message=Error in storage domain action:
>> (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
>>
>> On Thu, Apr 9, 2020 at 5:58 PM Strahil Nikolov
><hunter86_bg@yahoo.com>
>> wrote:
>>
>>> On April 9, 2020 11:12:30 AM GMT+03:00, Shareef Jalloq <
>>> shareef@jalloq.co.uk> wrote:
>>> >OK, let's go through this.  I'm looking at the node that at least
>still
>>> >has
>>> >some VMs running.  virsh also tells me that the HostedEngine VM is
>>> >running
>>> >but it's unresponsive and I can't shut it down.
>>> >
>>> >1. All storage domains exist and are mounted.
>>> >2. The ha_agent exists:
>>> >
>>> >[root@ovirt-node-01 ovirt-hosted-engine-ha]# ls
>/rhev/data-center/mnt/
>>> >nas-01.phoelex.com
>>> \:_volume2_vmstore/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/
>>> >
>>> >dom_md  ha_agent  images  master
>>> >
>>> >3.  There are two links
>>> >
>>> >[root@ovirt-node-01 ovirt-hosted-engine-ha]# ll
>/rhev/data-center/mnt/
>>> >nas-01.phoelex.com
>>> >\:_volume2_vmstore/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/ha_agent/
>>> >
>>> >total 8
>>> >
>>> >lrwxrwxrwx. 1 vdsm kvm 132 Apr  2 14:50 hosted-engine.lockspace ->
>>>
>>>
>>/var/run/vdsm/storage/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/ffb90b82-42fe-4253-85d5-aaec8c280aaf/90e68791-0c6f-406a-89ac-e0d86c631604
>>> >
>>> >lrwxrwxrwx. 1 vdsm kvm 132 Apr  2 14:50 hosted-engine.metadata ->
>>>
>>>
>>/var/run/vdsm/storage/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/2161aed0-7250-4c1d-b667-ac94f60af17e/6b818e33-f80a-48cc-a59c-bba641e027d4
>>> >
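>>> >(Worth noting: those link targets live under /var/run, which is
>>> >tmpfs, so they can legitimately be missing after a reboot until vdsm
>>> >prepares the images again. A sketch to verify the links resolve:)
>>> >
>>> >readlink -e hosted-engine.lockspace hosted-engine.metadata
>>> ># prints the resolved paths, or nothing if a target is missing
>>> >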
>>> >4. The services exist but all seem to have some sort of warning:
>>> >
>>> >a) Apr 08 18:10:55 ovirt-node-01.phoelex.com sanlock[1728]:
>*2020-04-08
>>> >18:10:55 1744152 [36796]: s16 delta_renew long write time 10 sec*
>>> >
>>> >b) Mar 23 18:02:59 ovirt-node-01.phoelex.com supervdsmd[29409]:
>*failed
>>> >to
>>> >load module nvdimm: libbd_nvdimm.so.2: cannot open shared object
>file:
>>> >No
>>> >such file or directory*
>>> >
>>> >c) Apr 09 08:05:13 ovirt-node-01.phoelex.com vdsm[4801]: *ERROR
>failed
>>> >to
>>> >retrieve Hosted Engine HA score '[Errno 2] No such file or
>directory'Is
>>> >the
>>> >Hosted Engine setup finished?*
>>> >
>>> >d)Apr 08 22:48:27 ovirt-node-01.phoelex.com libvirtd[29307]:
>2020-04-08
>>> >22:48:27.134+0000: 29309: warning : qemuGetProcessInfo:1404 :
>cannot
>>> >parse
>>> >process status data
>>> >
>>> >Apr 08 22:48:27 ovirt-node-01.phoelex.com libvirtd[29307]:
>2020-04-08
>>> >22:48:27.134+0000: 29309: error : virNetDevTapInterfaceStats:764 :
>>> >internal
>>> >error: /proc/net/dev: Interface not found
>>> >
>>> >Apr 08 23:09:39 ovirt-node-01.phoelex.com libvirtd[29307]:
>2020-04-08
>>> >23:09:39.844+0000: 29307: error : virNetSocketReadWire:1806 : End
>of
>>> >file
>>> >while reading data: Input/output error
>>> >
>>> >Apr 09 01:05:26 ovirt-node-01.phoelex.com libvirtd[29307]:
>2020-04-09
>>> >01:05:26.660+0000: 29307: error : virNetSocketReadWire:1806 : End
>of
>>> >file
>>> >while reading data: Input/output error
>>> >
>>> >5 & 6.  The broker log is continually printing this error:
>>> >
>>> >MainThread::INFO::2020-04-09
>>>
>>>
>>08:07:31,438::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run)
>>> >ovirt-hosted-engine-ha broker 2.3.6 started
>>> >
>>> >MainThread::DEBUG::2020-04-09
>>>
>>>
>>08:07:31,438::broker::55::ovirt_hosted_engine_ha.broker.broker.Broker::(run)
>>> >Running broker
>>> >
>>> >MainThread::DEBUG::2020-04-09
>>>
>>>
>>08:07:31,438::broker::120::ovirt_hosted_engine_ha.broker.broker.Broker::(_get_monitor)
>>> >Starting monitor
>>> >
>>> >MainThread::INFO::2020-04-09
>>>
>>>
>>08:07:31,438::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>>> >Searching for submonitors in
>>> >/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker
>>> >
>>> >/submonitors
>>> >
>>> >MainThread::INFO::2020-04-09
>>>
>>>
>>08:07:31,439::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>>> >Loaded submonitor network
>>> >
>>> >MainThread::INFO::2020-04-09
>>>
>>>
>>08:07:31,440::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>>> >Loaded submonitor cpu-load-no-engine
>>> >
>>> >MainThread::INFO::2020-04-09
>>>
>>>
>>08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>>> >Loaded submonitor mgmt-bridge
>>> >
>>> >MainThread::INFO::2020-04-09
>>>
>>>
>>08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>>> >Loaded submonitor network
>>> >
>>> >MainThread::INFO::2020-04-09
>>>
>>>
>>08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>>> >Loaded submonitor cpu-load
>>> >
>>> >MainThread::INFO::2020-04-09
>>>
>>>
>>08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>>> >Loaded submonitor engine-health
>>> >
>>> >MainThread::INFO::2020-04-09
>>>
>>>
>>08:07:31,442::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>>> >Loaded submonitor mgmt-bridge
>>> >
>>> >MainThread::INFO::2020-04-09
>>>
>>>
>>08:07:31,442::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>>> >Loaded submonitor cpu-load-no-engine
>>> >
>>> >MainThread::INFO::2020-04-09
>>>
>>>
>>08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>>> >Loaded submonitor cpu-load
>>> >
>>> >MainThread::INFO::2020-04-09
>>>
>>>
>>08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>>> >Loaded submonitor mem-free
>>> >
>>> >MainThread::INFO::2020-04-09
>>>
>>>
>>08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>>> >Loaded submonitor storage-domain
>>> >
>>> >MainThread::INFO::2020-04-09
>>>
>>>
>>08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>>> >Loaded submonitor storage-domain
>>> >
>>> >MainThread::INFO::2020-04-09
>>>
>>>
>>08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>>> >Loaded submonitor mem-free
>>> >
>>> >MainThread::INFO::2020-04-09
>>>
>>>
>>08:07:31,444::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>>> >Loaded submonitor engine-health
>>> >
>>> >MainThread::INFO::2020-04-09
>>>
>>>
>>08:07:31,444::monitor::50::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>>> >Finished loading submonitors
>>> >
>>> >MainThread::DEBUG::2020-04-09
>>>
>>>
>>08:07:31,444::broker::128::ovirt_hosted_engine_ha.broker.broker.Broker::(_get_storage_broker)
>>> >Starting storage broker
>>> >
>>> >MainThread::DEBUG::2020-04-09
>>>
>>>
>>08:07:31,444::storage_backends::369::ovirt_hosted_engine_ha.lib.storage_backends::(connect)
>>> >Connecting to VDSM
>>> >
>>> >MainThread::DEBUG::2020-04-09
>>>
>>>
>>08:07:31,444::util::384::ovirt_hosted_engine_ha.lib.storage_backends::(__log_debug)
>>> >Creating a new json-rpc connection to VDSM
>>> >
>>> >Client localhost:54321::DEBUG::2020-04-09
>>> >08:07:31,453::concurrent::258::root::(run) START thread
><Thread(Client
>>> >localhost:54321, started daemon 139992488138496)> (func=<bound
>method
>>> >Reactor.process_requests of <yajsonrpc.betterAsyncore.Reactor
>object at
>>> >0x7f528acabc90>>, args=(), kwargs={})
>>> >
>>> >Client localhost:54321::DEBUG::2020-04-09
>>>
>>>
>>08:07:31,459::stompclient::138::yajsonrpc.protocols.stomp.AsyncClient::(_process_connected)
>>> >Stomp connection established
>>> >
>>> >MainThread::DEBUG::2020-04-09
>>> >08:07:31,467::stompclient::294::jsonrpc.AsyncoreClient::(send)
>Sending
>>> >response
>>> >
>>> >MainThread::INFO::2020-04-09
>>>
>>>
>>08:07:31,530::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect)
>>> >Connecting the storage
>>> >
>>> >MainThread::INFO::2020-04-09
>>>
>>>
>>08:07:31,531::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
>>> >Connecting storage server
>>> >
>>> >MainThread::DEBUG::2020-04-09
>>> >08:07:31,531::stompclient::294::jsonrpc.AsyncoreClient::(send)
>Sending
>>> >response
>>> >
>>> >MainThread::DEBUG::2020-04-09
>>> >08:07:31,534::stompclient::294::jsonrpc.AsyncoreClient::(send)
>Sending
>>> >response
>>> >
>>> >MainThread::DEBUG::2020-04-09
>>>
>>>
>>08:07:32,199::storage_server::158::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(_validate_pre_connected_path)
>>> >Storage domain a6cea67d-dbfb-45cf-a775-b4d0d47b26f2 is not
>available
>>> >
>>> >MainThread::INFO::2020-04-09
>>>
>>>
>>08:07:32,199::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
>>> >Connecting storage server
>>> >
>>> >MainThread::DEBUG::2020-04-09
>>> >08:07:32,199::stompclient::294::jsonrpc.AsyncoreClient::(send)
>Sending
>>> >response
>>> >
>>> >MainThread::DEBUG::2020-04-09
>>>
>>>
>>08:07:32,814::storage_server::363::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
>>> >[{u'status': 0, u'id': u'e29cf818-5ee5-46e1-85c1-8aeefa33e95d'}]
>>> >
>>> >MainThread::INFO::2020-04-09
>>>
>>>
>>08:07:32,814::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
>>> >Refreshing the storage domain
>>> >
>>> >MainThread::DEBUG::2020-04-09
>>> >08:07:32,815::stompclient::294::jsonrpc.AsyncoreClient::(send)
>Sending
>>> >response
>>> >
>>> >MainThread::DEBUG::2020-04-09
>>>
>>>
>>08:07:33,129::storage_server::420::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
>>> >Error refreshing storage domain: Command StorageDomain.getStats
>with
>>> >args
>>> >{'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed:
>>> >
>>> >(code=350, message=Error in storage domain action:
>>> >(u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
>>> >
>>> >MainThread::DEBUG::2020-04-09
>>> >08:07:33,130::stompclient::294::jsonrpc.AsyncoreClient::(send)
>Sending
>>> >response
>>> >
>>> >MainThread::DEBUG::2020-04-09
>>>
>>>
>>08:07:33,795::storage_backends::208::ovirt_hosted_engine_ha.lib.storage_backends::(_get_sector_size)
>>> >Command StorageDomain.getInfo with args {'storagedomainID':
>>> >'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed:
>>> >
>>> >(code=350, message=Error in storage domain action:
>>> >(u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
>>> >
>>> >MainThread::WARNING::2020-04-09
>>>
>>>
>>08:07:33,795::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__)
>>> >Can't connect vdsm storage: Command StorageDomain.getInfo with args
>>> >{'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed:
>>> >
>>> >(code=350, message=Error in storage domain action:
>>> >(u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
>>> >
>>> >
>>> >The UUID it is moaning about is indeed the one that the HA sits on
>and
>>> >is
>>> >the one I listed the contents of in step 2 above.
>>> >
>>> >
>>> >So why can't it see this domain?
>>> >
>>> >
>>> >Thanks, Shareef.
>>> >
>>> >On Thu, Apr 9, 2020 at 6:12 AM Strahil Nikolov
><hunter86_bg@yahoo.com>
>>> >wrote:
>>> >
>>> >> On April 9, 2020 1:51:05 AM GMT+03:00, Shareef Jalloq <
>>> >> shareef@jalloq.co.uk> wrote:
>>> >> >Don't know if this is useful or not, but I just tried to
>shutdown
>>> >and
>>> >> >start
>>> >> >another VM on one of the hosts and get the following error:
>>> >> >
>>> >> >virsh # start scratch
>>> >> >
>>> >> >error: Failed to start domain scratch
>>> >> >
>>> >> >error: Network not found: no network with matching name
>>> >> >'vdsm-ovirtmgmt'
>>> >> >
>>> >> >Is this not referring to the interface name, as the network is
>>> >> >called
>>> >> >'ovirtmgmt'?
>>> >> >
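>>> >> >(A sketch to see what libvirt actually has defined:)
>>> >> >
>>> >> >virsh net-list --all              # lists persistent and transient networks
>>> >> >virsh net-dumpxml vdsm-ovirtmgmt  # errors out if it was never defined
>>> >> >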
>>> >> >On Wed, Apr 8, 2020 at 11:35 PM Shareef Jalloq
>>> ><shareef@jalloq.co.uk>
>>> >> >wrote:
>>> >> >
>>> >> >> Hmmm, virsh tells me the HE is running but it hasn't come up
>and
>>> >the
>>> >> >> agent.log is full of the same errors.
>>> >> >>
>>> >> >> On Wed, Apr 8, 2020 at 11:31 PM Shareef Jalloq
>>> ><shareef@jalloq.co.uk>
>>> >> >> wrote:
>>> >> >>
>>> >> >>> Ah hah!  Ok, so I've managed to start it using virsh on the
>>> >second
>>> >> >host
>>> >> >>> but my first host is still dead.
>>> >> >>>
>>> >> >>> First of all, what are these 56,317 .prob- files that get
>dumped
>>> >to
>>> >> >the
>>> >> >>> NFS mounts?
>>> >> >>>
>>> >> >>> Secondly, why doesn't the node mount the NFS directories at
>boot?
>>> >> >Is
>>> >> >>> that the issue with this particular node?
>>> >> >>>
>>> >> >>> On Wed, Apr 8, 2020 at 11:12 PM <eevans@digitaldatatechs.com>
>>> >wrote:
>>> >> >>>
>>> >> >>>> Did you try virsh list --inactive
>>> >> >>>>
>>> >> >>>>
>>> >> >>>>
>>> >> >>>> Eric Evans
>>> >> >>>>
>>> >> >>>> Digital Data Services LLC.
>>> >> >>>>
>>> >> >>>> 304.660.9080
>>> >> >>>>
>>> >> >>>>
>>> >> >>>>
>>> >> >>>> *From:* Shareef Jalloq <shareef@jalloq.co.uk>
>>> >> >>>> *Sent:* Wednesday, April 8, 2020 5:58 PM
>>> >> >>>> *To:* Strahil Nikolov <hunter86_bg@yahoo.com>
>>> >> >>>> *Cc:* Ovirt Users <users@ovirt.org>
>>> >> >>>> *Subject:* [ovirt-users] Re: ovirt-engine unresponsive - how
>to
>>> >> >rescue?
>>> >> >>>>
>>> >> >>>>
>>> >> >>>>
>>> >> >>>> I've now shut down the VMs on one host and rebooted it but
>the
>>> >> >agent
>>> >> >>>> service doesn't start.  If I run 'hosted-engine --vm-status'
>I
>>> >get:
>>> >> >>>>
>>> >> >>>>
>>> >> >>>>
>>> >> >>>> The hosted engine configuration has not been retrieved from
>>> >shared
>>> >> >>>> storage. Please ensure that ovirt-ha-agent is running and
>the
>>> >> >storage
>>> >> >>>> server is reachable.
>>> >> >>>>
>>> >> >>>>
>>> >> >>>>
>>> >> >>>> and indeed if I list the mounts under /rhev/data-center/mnt,
>>> >only
>>> >> >one of
>>> >> >>>> the directories is mounted.  I have 3 NFS mounts, one ISO
>Domain
>>> >> >and two
>>> >> >>>> Data Domains.  Only one Data Domain has mounted and this has
>>> >lots
>>> >> >of .prob
>>> >> >>>> files in.  So why haven't the other NFS exports been
>mounted?
>>> >> >>>>
>>> >> >>>>
>>> >> >>>>
>>> >> >>>> Manually mounting them doesn't seem to have helped much
>either.
>>> >I
>>> >> >can
>>> >> >>>> start the broker service but the agent service says no.
>Same
>>> >error
>>> >> >as the
>>> >> >>>> one in my last email.
>>> >> >>>>
>>> >> >>>>
>>> >> >>>>
>>> >> >>>> Shareef.
>>> >> >>>>
>>> >> >>>>
>>> >> >>>>
>>> >> >>>> On Wed, Apr 8, 2020 at 9:57 PM Shareef Jalloq
>>> >> ><shareef@jalloq.co.uk>
>>> >> >>>> wrote:
>>> >> >>>>
>>> >> >>>> Right, still down.  I've run virsh and it doesn't know
>anything
>>> >> >about
>>> >> >>>> the engine vm.
>>> >> >>>>
>>> >> >>>>
>>> >> >>>>
>>> >> >>>> I've restarted the broker and agent services and I still get
>>> >> >nothing in
>>> >> >>>> virsh->list.
>>> >> >>>>
>>> >> >>>>
>>> >> >>>>
>>> >> >>>> In the logs under /var/log/ovirt-hosted-engine-ha I see lots
>of
>>> >> >errors:
>>> >> >>>>
>>> >> >>>>
>>> >> >>>>
>>> >> >>>> broker.log:
>>> >> >>>>
>>> >> >>>>
>>> >> >>>>
>>> >> >>>> MainThread::INFO::2020-04-08
>>> >> >>>>
>>> >>
>>> >>
>>>
>>>
>>>20:56:20,138::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run)
>>> >> >>>> ovirt-hosted-engine-ha broker 2.3.6 started
>>> >> >>>>
>>> >> >>>> MainThread::INFO::2020-04-08
>>> >> >>>>
>>> >>
>>> >>
>>>
>>>
>>>20:56:20,138::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>>> >> >>>> Searching for submonitors in
>>> >> >>>>
>>> >>
>>>
>>>
>>>/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors
>>> >> >>>>
>>> >> >>>> MainThread::INFO::2020-04-08
>>> >> >>>>
>>> >>
>>> >>
>>>
>>>
>>>20:56:20,138::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>>> >> >>>> Loaded submonitor network
>>> >> >>>>
>>> >> >>>> MainThread::INFO::2020-04-08
>>> >> >>>>
>>> >>
>>> >>
>>>
>>>
>>>20:56:20,140::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>>> >> >>>> Loaded submonitor cpu-load-no-engine
>>> >> >>>>
>>> >> >>>> MainThread::INFO::2020-04-08
>>> >> >>>>
>>> >>
>>> >>
>>>
>>>
>>>20:56:20,140::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>>> >> >>>> Loaded submonitor mgmt-bridge
>>> >> >>>>
>>> >> >>>> MainThread::INFO::2020-04-08
>>> >> >>>>
>>> >>
>>> >>
>>>
>>>
>>>20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>>> >> >>>> Loaded submonitor network
>>> >> >>>>
>>> >> >>>> MainThread::INFO::2020-04-08
>>> >> >>>>
>>> >>
>>> >>
>>>
>>>
>>>20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>>> >> >>>> Loaded submonitor cpu-load
>>> >> >>>>
>>> >> >>>> MainThread::INFO::2020-04-08
>>> >> >>>>
>>> >>
>>> >>
>>>
>>>
>>>20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>>> >> >>>> Loaded submonitor engine-health
>>> >> >>>>
>>> >> >>>> MainThread::INFO::2020-04-08
>>> >> >>>>
>>> >>
>>> >>
>>>
>>>
>>>20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>>> >> >>>> Loaded submonitor mgmt-bridge
>>> >> >>>>
>>> >> >>>> MainThread::INFO::2020-04-08
>>> >> >>>>
>>> >>
>>> >>
>>>
>>>
>>>20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>>> >> >>>> Loaded submonitor cpu-load-no-engine
>>> >> >>>>
>>> >> >>>> MainThread::INFO::2020-04-08
>>> >> >>>>
>>> >>
>>> >>
>>>
>>>
>>>20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>>> >> >>>> Loaded submonitor cpu-load
>>> >> >>>>
>>> >> >>>> MainThread::INFO::2020-04-08
>>> >> >>>>
>>> >>
>>> >>
>>>
>>>
>>>20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>>> >> >>>> Loaded submonitor mem-free
>>> >> >>>>
>>> >> >>>> MainThread::INFO::2020-04-08
>>> >> >>>>
>>> >>
>>> >>
>>>
>>>
>>>20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>>> >> >>>> Loaded submonitor storage-domain
>>> >> >>>>
>>> >> >>>> MainThread::INFO::2020-04-08
>>> >> >>>>
>>> >>
>>> >>
>>>
>>>
>>>20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>>> >> >>>> Loaded submonitor storage-domain
>>> >> >>>>
>>> >> >>>> MainThread::INFO::2020-04-08
>>> >> >>>>
>>> >>
>>> >>
>>>
>>>
>>>20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>>> >> >>>> Loaded submonitor mem-free
>>> >> >>>>
>>> >> >>>> MainThread::INFO::2020-04-08
>>> >> >>>>
>>> >>
>>> >>
>>>
>>>
>>>20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>>> >> >>>> Loaded submonitor engine-health
>>> >> >>>>
>>> >> >>>> MainThread::INFO::2020-04-08
>>> >> >>>>
>>> >>
>>> >>
>>>
>>>
>>>20:56:20,143::monitor::50::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>>> >> >>>> Finished loading submonitors
>>> >> >>>>
>>> >> >>>> MainThread::INFO::2020-04-08
>>> >> >>>>
>>> >>
>>> >>
>>>
>>>
>>>20:56:20,197::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect)
>>> >> >>>> Connecting the storage
>>> >> >>>>
>>> >> >>>> MainThread::INFO::2020-04-08
>>> >> >>>>
>>> >>
>>> >>
>>>
>>>
>>>20:56:20,197::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
>>> >> >>>> Connecting storage server
>>> >> >>>>
>>> >> >>>> MainThread::INFO::2020-04-08
>>> >> >>>>
>>> >>
>>> >>
>>>
>>>
>>>20:56:20,414::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
>>> >> >>>> Connecting storage server
>>> >> >>>>
>>> >> >>>> MainThread::INFO::2020-04-08
>>> >> >>>>
>>> >>
>>> >>
>>>
>>>
>>>20:56:20,628::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
>>> >> >>>> Refreshing the storage domain
>>> >> >>>>
>>> >> >>>> MainThread::WARNING::2020-04-08
>>> >> >>>>
>>> >>
>>> >>
>>>
>>>
>>>20:56:21,057::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__)
>>> >> >>>> Can't connect vdsm storage: Command StorageDomain.getInfo
>with
>>> >args
>>> >> >>>> {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'}
>>> >failed:
>>> >> >>>>
>>> >> >>>> (code=350, message=Error in storage domain action:
>>> >> >>>> (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
>>> >> >>>>
>>> >> >>>> MainThread::INFO::2020-04-08
>>> >> >>>>
>>> >>
>>> >>
>>>
>>>
>>>20:56:21,901::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run)
>>> >> >>>> ovirt-hosted-engine-ha broker 2.3.6 started
>>> >> >>>>
>>> >> >>>> MainThread::INFO::2020-04-08
>>> >> >>>>
>>> >>
>>> >>
>>>
>>>
>>>20:56:21,901::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>>> >> >>>> Searching for submonitors in
>>> >> >>>>
>>> >>
>>>
>>>
>>>/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors
>>> >> >>>>
>>> >> >>>>
>>> >> >>>>
>>> >> >>>> agent.log:
>>> >> >>>>
>>> >> >>>>
>>> >> >>>>
>>> >> >>>> MainThread::ERROR::2020-04-08
>>> >> >>>>
>>> >>
>>> >>
>>>
>>>
>>>20:57:00,799::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
>>> >> >>>> Trying to restart agent
>>> >> >>>>
>>> >> >>>> MainThread::INFO::2020-04-08
>>> >> >>>>
>>> >>
>>>
>>>20:57:00,799::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run)
>>> >> >>>> Agent shutting down
>>> >> >>>>
>>> >> >>>> MainThread::INFO::2020-04-08
>>> >> >>>>
>>> >>
>>>
>>>20:57:11,144::agent::67::ovirt_hosted_engine_ha.agent.agent.Agent::(run)
>>> >> >>>> ovirt-hosted-engine-ha agent 2.3.6 started
>>> >> >>>>
>>> >> >>>> MainThread::INFO::2020-04-08
>>> >> >>>>
>>> >>
>>> >>
>>>
>>>
>>>20:57:11,182::hosted_engine::234::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname)
>>> >> >>>> Found certificate common name: ovirt-node-01.phoelex.com
>>> >> >>>>
>>> >> >>>> MainThread::INFO::2020-04-08
>>> >> >>>>
>>> >>
>>> >>
>>>
>>>
>>>20:57:11,294::hosted_engine::543::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker)
>>> >> >>>> Initializing ha-broker connection
>>> >> >>>>
>>> >> >>>> MainThread::INFO::2020-04-08
>>> >> >>>>
>>> >>
>>> >>
>>>
>>>
>>>20:57:11,296::brokerlink::80::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
>>> >> >>>> Starting monitor network, options {'tcp_t_address': '',
>>> >> >'network_test':
>>> >> >>>> 'dns', 'tcp_t_port': '', 'addr': '192.168.1.99'}
>>> >> >>>>
>>> >> >>>> MainThread::ERROR::2020-04-08
>>> >> >>>>
>>> >>
>>> >>
>>>
>>>
>>>20:57:11,296::hosted_engine::559::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker)
>>> >> >>>> Failed to start necessary monitors
>>> >> >>>>
>>> >> >>>> MainThread::ERROR::2020-04-08
>>> >> >>>>
>>> >>
>>> >>
>>>
>>>
>>>20:57:11,297::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
>>> >> >>>> Traceback (most recent call last):
>>> >> >>>>
>>> >> >>>>   File
>>> >> >>>>
>>> >>
>>>
>>>
>>>"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py",
>>> >> >>>> line 131, in _run_agent
>>> >> >>>>
>>> >> >>>>     return action(he)
>>> >> >>>>
>>> >> >>>>   File
>>> >> >>>>
>>> >>
>>>
>>>
>>>"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py",
>>> >> >>>> line 55, in action_proper
>>> >> >>>>
>>> >> >>>>     return he.start_monitoring()
>>> >> >>>>
>>> >> >>>>   File
>>> >> >>>>
>>> >>
>>> >>
>>>
>>>
>>>"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
>>> >> >>>> line 432, in start_monitoring
>>> >> >>>>
>>> >> >>>>     self._initialize_broker()
>>> >> >>>>
>>> >> >>>>   File
>>> >> >>>>
>>> >>
>>> >>
>>>
>>>
>>>"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
>>> >> >>>> line 556, in _initialize_broker
>>> >> >>>>
>>> >> >>>>     m.get('options', {}))
>>> >> >>>>
>>> >> >>>>   File
>>> >> >>>>
>>> >>
>>> >>
>>>
>>>
>>>"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py",
>>> >> >>>> line 89, in start_monitor
>>> >> >>>>
>>> >> >>>>     ).format(t=type, o=options, e=e)
>>> >> >>>>
>>> >> >>>> RequestError: brokerlink - failed to start monitor via
>>> >> >ovirt-ha-broker:
>>> >> >>>> [Errno 2] No such file or directory, [monitor: 'network',
>>> >options:
>>> >> >>>> {'tcp_t_address': '', 'network_test': 'dns', 'tcp_t_port':
>'',
>>> >> >'addr':
>>> >> >>>> '192.168.1.99'}]
>>> >> >>>>
>>> >> >>>>
>>> >> >>>>
>>> >> >>>> MainThread::ERROR::2020-04-08
>>> >> >>>>
>>> >>
>>> >>
>>>
>>>
>>>20:57:11,297::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
>>> >> >>>> Trying to restart agent
>>> >> >>>>
>>> >> >>>> MainThread::INFO::2020-04-08
>>> >> >>>>
>>> >>
>>>
>>>20:57:11,297::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run)
>>> >> >>>> Agent shutting down
>>> >> >>>>
>>> >> >>>>
>>> >> >>>>
>>> >> >>>> On Wed, Apr 8, 2020 at 6:10 PM Strahil Nikolov
>>> >> ><hunter86_bg@yahoo.com>
>>> >> >>>> wrote:
>>> >> >>>>
>>> >> >>>> On April 8, 2020 7:47:20 PM GMT+03:00, "Maton, Brett" <
>>> >> >>>> matonb@ltresources.co.uk> wrote:
>>> >> >>>> >On the host you tried to restart the engine on:
>>> >> >>>> >
>>> >> >>>> >Add an alias to virsh (authenticates with virsh_auth.conf)
>>> >> >>>> >
>>> >> >>>> >alias virsh='virsh -c qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf'
>>> >> >>>> >
>>> >> >>>> >Then run virsh:
>>> >> >>>> >
>>> >> >>>> >virsh
>>> >> >>>> >
>>> >> >>>> >virsh # list
>>> >> >>>> > Id    Name                           State
>>> >> >>>> >----------------------------------------------------
>>> >> >>>> > xx    HostedEngine                   Paused
>>> >> >>>> > xx    **********                     running
>>> >> >>>> > ...
>>> >> >>>> > xx     **********                     running
>>> >> >>>> >
>>> >> >>>> >HostedEngine should be in the list, try and resume the
>engine:
>>> >> >>>> >
>>> >> >>>> >virsh # resume HostedEngine
>>> >> >>>> >
>>> >> >>>> >On Wed, 8 Apr 2020 at 17:28, Shareef Jalloq
>>> ><shareef@jalloq.co.uk>
>>> >> >>>> >wrote:
>>> >> >>>> >
>>> >> >>>> >> Thanks!
>>> >> >>>> >>
>>> >> >>>> >> The status hangs due to, I guess, the VM being down....
>>> >> >>>> >>
>>> >> >>>> >> [root@ovirt-node-01 ~]# hosted-engine --vm-start
>>> >> >>>> >> VM exists and is down, cleaning up and restarting
>>> >> >>>> >> VM in WaitForLaunch
>>> >> >>>> >>
>>> >> >>>> >> but this doesn't seem to do anything.  OK, after a while
>I
>>> >get a
>>> >> >>>> >status of
>>> >> >>>> >> it being barfed...
>>> >> >>>> >>
>>> >> >>>> >> --== Host ovirt-node-00.phoelex.com (id: 1) status ==--
>>> >> >>>> >>
>>> >> >>>> >> conf_on_shared_storage             : True
>>> >> >>>> >> Status up-to-date                  : False
>>> >> >>>> >> Hostname                           :
>>> >ovirt-node-00.phoelex.com
>>> >> >>>> >> Host ID                            : 1
>>> >> >>>> >> Engine status                      : unknown stale-data
>>> >> >>>> >> Score                              : 3400
>>> >> >>>> >> stopped                            : False
>>> >> >>>> >> Local maintenance                  : False
>>> >> >>>> >> crc32                              : 9c4a034b
>>> >> >>>> >> local_conf_timestamp               : 523362
>>> >> >>>> >> Host timestamp                     : 523608
>>> >> >>>> >> Extra metadata (valid at timestamp):
>>> >> >>>> >> metadata_parse_version=1
>>> >> >>>> >> metadata_feature_version=1
>>> >> >>>> >> timestamp=523608 (Wed Apr  8 16:17:11 2020)
>>> >> >>>> >> host-id=1
>>> >> >>>> >> score=3400
>>> >> >>>> >> vm_conf_refresh_time=523362 (Wed Apr  8 16:13:06 2020)
>>> >> >>>> >> conf_on_shared_storage=True
>>> >> >>>> >> maintenance=False
>>> >> >>>> >> state=EngineDown
>>> >> >>>> >> stopped=False
>>> >> >>>> >>
>>> >> >>>> >>
>>> >> >>>> >> --== Host ovirt-node-01.phoelex.com (id: 2) status ==--
>>> >> >>>> >>
>>> >> >>>> >> conf_on_shared_storage             : True
>>> >> >>>> >> Status up-to-date                  : True
>>> >> >>>> >> Hostname                           :
>>> >ovirt-node-01.phoelex.com
>>> >> >>>> >> Host ID                            : 2
>>> >> >>>> >> Engine status                      : {"reason": "bad vm
>>> >status",
>>> >> >>>> >"health":
>>> >> >>>> >> "bad", "vm": "down_unexpected", "detail": "Down"}
>>> >> >>>> >> Score                              : 0
>>> >> >>>> >> stopped                            : False
>>> >> >>>> >> Local maintenance                  : False
>>> >> >>>> >> crc32                              : 5045f2eb
>>> >> >>>> >> local_conf_timestamp               : 1737037
>>> >> >>>> >> Host timestamp                     : 1737283
>>> >> >>>> >> Extra metadata (valid at timestamp):
>>> >> >>>> >> metadata_parse_version=1
>>> >> >>>> >> metadata_feature_version=1
>>> >> >>>> >> timestamp=1737283 (Wed Apr  8 16:16:17 2020)
>>> >> >>>> >> host-id=2
>>> >> >>>> >> score=0
>>> >> >>>> >> vm_conf_refresh_time=1737037 (Wed Apr  8 16:12:11 2020)
>>> >> >>>> >> conf_on_shared_storage=True
>>> >> >>>> >> maintenance=False
>>> >> >>>> >> state=EngineUnexpectedlyDown
>>> >> >>>> >> stopped=False
>>> >> >>>> >>
>>> >> >>>> >> On Wed, Apr 8, 2020 at 5:09 PM Maton, Brett
>>> >> >>>> ><matonb@ltresources.co.uk>
>>> >> >>>> >> wrote:
>>> >> >>>> >>
>>> >> >>>> >>> First steps, on one of your hosts as root:
>>> >> >>>> >>>
>>> >> >>>> >>> To get information:
>>> >> >>>> >>> hosted-engine --vm-status
>>> >> >>>> >>>
>>> >> >>>> >>> To start the engine:
>>> >> >>>> >>> hosted-engine --vm-start
>>> >> >>>> >>>
>>> >> >>>> >>>
>>> >> >>>> >>> On Wed, 8 Apr 2020 at 17:00, Shareef Jalloq
>>> >> ><shareef@jalloq.co.uk>
>>> >> >>>> >wrote:
>>> >> >>>> >>>
>>> >> >>>> >>>> So my engine has gone down and I can't ssh into it
>either.
>>> >If
>>> >> >I
>>> >> >>>> >try to
>>> >> >>>> >>>> log into the web-ui of the node it is running on, I get
>>> >> >redirected
>>> >> >>>> >because
>>> >> >>>> >>>> the node can't reach the engine.
>>> >> >>>> >>>>
>>> >> >>>> >>>> What are my next steps?
>>> >> >>>> >>>>
>>> >> >>>> >>>> Shareef.
>>> >> >>>> >>>> _______________________________________________
>>> >> >>>> >>>> Users mailing list -- users@ovirt.org
>>> >> >>>> >>>> To unsubscribe send an email to users-leave@ovirt.org
>>> >> >>>> >>>> Privacy Statement:
>>> >https://www.ovirt.org/privacy-policy.html
>>> >> >>>> >>>> oVirt Code of Conduct:
>>> >> >>>> >>>>
>https://www.ovirt.org/community/about/community-guidelines/
>>> >> >>>> >>>> List Archives:
>>> >> >>>> >>>>
>>> >> >>>> >
>>> >> >>>>
>>> >> >
>>> >>
>>> >
>>>
>https://lists.ovirt.org/archives/list/users@ovirt.org/message/W7BP57OCIRSW5CDRQWR5MIKJUH3ISLCQ/
>>> >> >>>> >>>>
>>> >> >>>> >>>
>>> >> >>>>
>>> >> >>>> This has  to be resolved:
>>> >> >>>>
>>> >> >>>> Engine status                      : unknown stale-data
>>> >> >>>>
>>> >> >>>> Run again 'hosted-engine --vm-status'. If it remains the
>same,
>>> >> >restart
>>> >> >>>> ovirt-ha-broker.service & ovirt-ha-agent.service
>>> >> >>>>
>>> >> >>>> Verify that the engine's storage is available. Then monitor
>the
>>> >> >broker
>>> >> >>>> & agent logs in /var/log/ovirt-hosted-engine-ha
>>> >> >>>>
>>> >> >>>> Best Regards,
>>> >> >>>> Strahil Nikolov
>>> >> >>>>
>>> >> >>>>
>>> >> >>>>
>>> >> >>>>
>>> >>
>>> >> Hi Shareef,
>>> >>
>>> >> The activation flow in oVirt is more complex than plain KVM.
>>> >> Mounting of the domains happens during activation of the node (the
>>> >> HostedEngine activates everything needed).
>>> >> HostedEngine is activating everything needed).
>>> >>
>>> >> Focus on the HostedEngine VM.
>>> >> Is it running properly ?
>>> >>
>>> >> If not, try:
>>> >> 1. Verify that the storage domain exists
>>> >> 2. Check if it has an 'ha_agent' directory
>>> >> 3. Check if the links are OK; if not, you can safely remove the
>>> >> links
>>> >>
>>> >> 4. Next check that the following services are running (a quick loop
>>> >> is sketched after the list):
>>> >> A) sanlock
>>> >> B) supervdsmd
>>> >> C) vdsmd
>>> >> D) libvirtd
>>> >>
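>>> >> (A one-liner sketch for checking all four:)
>>> >>
>>> >> for s in sanlock supervdsmd vdsmd libvirtd; do
>>> >>     echo -n "$s: "; systemctl is-active "$s"
>>> >> done
>>> >>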
>>> >> 5. Increase the log level for the broker and agent services:
>>> >>
>>> >> cd /etc/ovirt-hosted-engine-ha
>>> >> vim *-log.conf
>>> >>
>>> >> systemctl restart ovirt-ha-broker ovirt-ha-agent
>>> >>
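>>> >> (A non-interactive equivalent of the vim edit, as a sketch; it
>>> >> assumes the handler sections use a 'level=' key like the broker
>>> >> config quoted earlier in the thread:)
>>> >>
>>> >> sed -i 's/^level=.*/level=DEBUG/' /etc/ovirt-hosted-engine-ha/*-log.conf
>>> >>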
>>> >> 6. Check what they are complaining about
>>> >> Keep in mind that the agent will keep throwing errors until the
>>> >> broker stops doing it (the agent depends on the broker), so the
>>> >> broker must be OK before proceeding with the agent log.
>>> >>
>>> >> About the manual VM start, you need 2 things:
>>> >>
>>> >> 1. Define the VM network
>>> >> # cat vdsm-ovirtmgmt.xml
>>> >> <network>
>>> >>   <name>vdsm-ovirtmgmt</name>
>>> >>   <uuid>8ded486e-e681-4754-af4b-5737c2b05405</uuid>
>>> >>   <forward mode='bridge'/>
>>> >>   <bridge name='ovirtmgmt'/>
>>> >> </network>
>>> >>
>>> >> [root@ovirt1 HostedEngine-RECOVERY]# virsh define vdsm-ovirtmgmt.xml
>>> >>
>>> >> 2. Get an XML definition, which can be found in the vdsm log. Every
>>> >> VM has its configuration printed in the vdsm log on the host it
>>> >> starts on.
>>> >> Save to file and then:
>>> >> A) virsh define myvm.xml
>>> >> B) virsh start myvm
>>> >>
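>>> >> (To pull the XML out of vdsm.log, a sketch like this prints every
>>> >> <domain> block; keep only the newest one for the VM you want:)
>>> >>
>>> >> sed -n '/<domain /,/<\/domain>/p' /var/log/vdsm/vdsm.log > myvm.xml
>>> >>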
>>> >> It seems there is/was a problem with your NFS shares.
>>> >>
>>> >>
>>> >> Best Regards,
>>> >> Strahil Nikolov
>>> >>
>>>
>>> Hey Shareef,
>>>
>>> Check if there are any files or folders not owned by vdsm:kvm .
>Something
>>> like this:
>>>
>>> find . -not -user 36 -not -group 36 -print
>>>
>>> Also check if vdsm can access the images in the
>>> '<vol-mount-point>/images' directories.
>>>
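>>> (A simple access test, as a sketch; the path is a placeholder, use one
>>> of your real image files:)
>>>
>>> sudo -u vdsm dd if='<vol-mount-point>/images/<img-uuid>/<vol-uuid>' \
>>>     of=/dev/null bs=4k count=1 iflag=direct
>>>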
>>> Best Regards,
>>> Strahil Nikolov
>>>
>>

And the IPv6 address '64:ff9b::c0a8:13d'?

I don't see it in the log output.

Best Regards,
Strahil Nikolov