On April 14, 2020 1:27:24 PM GMT+03:00, Shareef Jalloq <shareef(a)jalloq.co.uk> wrote:
Right, I've given up on recovering the HE, so I want to try and redeploy it.
There doesn't seem to be enough information to debug why the broker/agent
won't start cleanly.

Running 'hosted-engine --deploy', I'm seeing the following error in the
setup validation phase:
2020-04-14 09:46:08,922+0000 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND    Please provide the hostname of this host on the management network [ovirt-node-00.phoelex.com]:
2020-04-14 09:46:12,831+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname.getResolvedAddresses:432 getResolvedAddresses: set(['64:ff9b::c0a8:13d', '192.168.1.61'])
2020-04-14 09:46:12,832+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname._validateFQDNresolvability:289 ovirt-node-00.phoelex.com resolves to: set(['64:ff9b::c0a8:13d', '192.168.1.61'])
2020-04-14 09:46:12,832+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:813 execute: ['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com', 'ANY'], executable='None', cwd='None', env=None
2020-04-14 09:46:12,871+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:863 execute-result: ['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com', 'ANY'], rc=0
2020-04-14 09:46:12,872+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:921 execute-output: ['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com', 'ANY'] stdout:
ovirt-node-00.phoelex.com. 86400 IN A 192.168.1.61
2020-04-14 09:46:12,872+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:926 execute-output: ['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com', 'ANY'] stderr:
2020-04-14 09:46:12,872+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:813 execute: ('/usr/sbin/ip', 'addr'), executable='None', cwd='None', env=None
2020-04-14 09:46:12,876+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:863 execute-result: ('/usr/sbin/ip', 'addr'), rc=0
2020-04-14 09:46:12,876+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:921 execute-output: ('/usr/sbin/ip', 'addr') stdout:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovirtmgmt state UP group default qlen 1000
    link/ether ac:1f:6b:bc:32:6a brd ff:ff:ff:ff:ff:ff
3: eno2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether ac:1f:6b:bc:32:6b brd ff:ff:ff:ff:ff:ff
4: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 02:e6:e2:80:93:8d brd ff:ff:ff:ff:ff:ff
5: br-int: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 8a:26:44:50:ee:4a brd ff:ff:ff:ff:ff:ff
21: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether ac:1f:6b:bc:32:6a brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.61/24 brd 192.168.1.255 scope global ovirtmgmt
       valid_lft forever preferred_lft forever
    inet6 fe80::ae1f:6bff:febc:326a/64 scope link
       valid_lft forever preferred_lft forever
22: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 3a:02:7b:7d:b3:2a brd ff:ff:ff:ff:ff:ff
2020-04-14 09:46:12,876+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:926 execute-output: ('/usr/sbin/ip', 'addr') stderr:
2020-04-14 09:46:12,877+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname.getLocalAddresses:251 addresses: [u'192.168.1.61', u'fe80::ae1f:6bff:febc:326a']
2020-04-14 09:46:12,877+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname.test_hostname:464 test_hostname exception
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_setup_lib/hostname.py", line 460, in test_hostname
    not_local_text,
  File "/usr/lib/python2.7/site-packages/ovirt_setup_lib/hostname.py", line 342, in _validateFQDNresolvability
    addresses=resolvedAddressesAsString
RuntimeError: ovirt-node-00.phoelex.com resolves to 64:ff9b::c0a8:13d 192.168.1.61 and not all of them can be mapped to non loopback devices on this host
2020-04-14 09:46:12,884+0000 ERROR otopi.plugins.gr_he_common.network.bridge dialog.queryEnvKey:120 Host name is not valid: ovirt-node-00.phoelex.com resolves to 64:ff9b::c0a8:13d 192.168.1.61 and not all of them can be mapped to non loopback devices on this host
The node I'm running on has an IP address of .61 and resolves correctly.
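
For reference: 64:ff9b::/96 is the well-known NAT64 prefix (RFC 6052), and c0a8:13d is hex for 192.168.1.61, so that AAAA record looks like it is being synthesized by a DNS64 resolver from the A record. A quick check of what each record type actually returns:

/usr/bin/dig +short ovirt-node-00.phoelex.com A
/usr/bin/dig +short ovirt-node-00.phoelex.com AAAA

If the AAAA really is DNS64-synthesized, pointing the host at a resolver without DNS64, or (assuming your environment tolerates it) pinning the name in /etc/hosts so the validator only sees the IPv4 address, should get past this check:

echo '192.168.1.61 ovirt-node-00.phoelex.com' >> /etc/hosts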
On Fri, Apr 10, 2020 at 12:55 PM Shareef Jalloq <shareef(a)jalloq.co.uk> wrote:
> Where should I be checking if there are any files/folders not owned by
> vdsm:kvm? I checked on the mount the HE sits on and it's fine.
>
> How would I go about checking that vdsm can access those images? If I run
> virsh, it lists them, and they were running yesterday even though the HE
> was down. I've since restarted both hosts, but the broker is still
> spitting out the same error (copied below). How do I find the reason the
> broker can't connect to the storage? The conf file is already at DEBUG
> verbosity:
>
> [handler_logfile]
>
> class=logging.handlers.TimedRotatingFileHandler
>
> args=('/var/log/ovirt-hosted-engine-ha/broker.log', 'd', 1, 7)
>
> level=DEBUG
>
> formatter=long
>
> And what are all these .prob-<num> files that are being created? There
> are over 250K of them now on the mount I'm using for the Data domain.
> They're all zero-size and of the form
> /rhev/data-center/mnt/nas-01.phoelex.com:_volume2_vmstore/.prob-ffa867da-93db-4211-82df-b1b04a625ab9
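>
> (If these turn out to be nothing but leftover zero-size probe files, a
> cautious cleanup sketch would be the line below; the name pattern and
> mount point are taken from the example above, so verify both first:
>
> find /rhev/data-center/mnt/nas-01.phoelex.com:_volume2_vmstore/ -maxdepth 1 -name '.prob-*' -size 0 -delete)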
>
> @eevans: The volume I have the Data Domain on has TBs free. The HE is
> dead, so I can't ssh in. No idea what started these errors, and the other
> VMs were still running happily, although they're on a different Data
> Domain.
>
> Shareef.
>
> MainThread::INFO::2020-04-10 07:45:00,408::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect) Connecting the storage
> MainThread::INFO::2020-04-10 07:45:00,408::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
> MainThread::INFO::2020-04-10 07:45:01,577::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
> MainThread::INFO::2020-04-10 07:45:02,692::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Refreshing the storage domain
> MainThread::WARNING::2020-04-10 07:45:05,175::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed: (code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
>
> On Thu, Apr 9, 2020 at 5:58 PM Strahil Nikolov <hunter86_bg(a)yahoo.com> wrote:
>
>> On April 9, 2020 11:12:30 AM GMT+03:00, Shareef Jalloq <shareef(a)jalloq.co.uk> wrote:
>> >OK, let's go through this. I'm looking at the node that at least still
>> >has some VMs running. virsh also tells me that the HostedEngine VM is
>> >running, but it's unresponsive and I can't shut it down.
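>> >
>> >(For the record, the forced-stop from virsh would be 'virsh destroy
>> >HostedEngine'; it only kills the running QEMU process and does not touch
>> >the disk image, though whether that's advisable here is another question.)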
>> >
>> >1. All storage domains exist and are mounted.
>> >2. The ha_agent exists:
>> >
>> >[root@ovirt-node-01 ovirt-hosted-engine-ha]# ls /rhev/data-center/mnt/nas-01.phoelex.com\:_volume2_vmstore/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/
>> >dom_md  ha_agent  images  master
>> >
>> >3. There are two links:
>> >
>> >[root@ovirt-node-01 ovirt-hosted-engine-ha]# ll /rhev/data-center/mnt/nas-01.phoelex.com\:_volume2_vmstore/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/ha_agent/
>> >total 8
>> >lrwxrwxrwx. 1 vdsm kvm 132 Apr  2 14:50 hosted-engine.lockspace -> /var/run/vdsm/storage/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/ffb90b82-42fe-4253-85d5-aaec8c280aaf/90e68791-0c6f-406a-89ac-e0d86c631604
>> >lrwxrwxrwx. 1 vdsm kvm 132 Apr  2 14:50 hosted-engine.metadata -> /var/run/vdsm/storage/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/2161aed0-7250-4c1d-b667-ac94f60af17e/6b818e33-f80a-48cc-a59c-bba641e027d4
>> >
>> >4. The services exist but all seem to have some sort of warning:
>> >
>> >a) Apr 08 18:10:55 ovirt-node-01.phoelex.com sanlock[1728]: 2020-04-08 18:10:55 1744152 [36796]: s16 delta_renew long write time 10 sec
>> >
>> >b) Mar 23 18:02:59 ovirt-node-01.phoelex.com supervdsmd[29409]: failed to load module nvdimm: libbd_nvdimm.so.2: cannot open shared object file: No such file or directory
>> >
>> >c) Apr 09 08:05:13 ovirt-node-01.phoelex.com vdsm[4801]: ERROR failed to retrieve Hosted Engine HA score '[Errno 2] No such file or directory' Is the Hosted Engine setup finished?
>> >
>> >d) Apr 08 22:48:27 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-08 22:48:27.134+0000: 29309: warning : qemuGetProcessInfo:1404 : cannot parse process status data
>> >
>> >Apr 08 22:48:27 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-08 22:48:27.134+0000: 29309: error : virNetDevTapInterfaceStats:764 : internal error: /proc/net/dev: Interface not found
>> >
>> >Apr 08 23:09:39 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-08 23:09:39.844+0000: 29307: error : virNetSocketReadWire:1806 : End of file while reading data: Input/output error
>> >
>> >Apr 09 01:05:26 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-09 01:05:26.660+0000: 29307: error : virNetSocketReadWire:1806 : End of file while reading data: Input/output error
>> >
>> >5 & 6. The broker log is continually printing this error:
>> >
>> >MainThread::INFO::2020-04-09 08:07:31,438::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.3.6 started
>> >MainThread::DEBUG::2020-04-09 08:07:31,438::broker::55::ovirt_hosted_engine_ha.broker.broker.Broker::(run) Running broker
>> >MainThread::DEBUG::2020-04-09 08:07:31,438::broker::120::ovirt_hosted_engine_ha.broker.broker.Broker::(_get_monitor) Starting monitor
>> >MainThread::INFO::2020-04-09 08:07:31,438::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Searching for submonitors in /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors
>> >MainThread::INFO::2020-04-09 08:07:31,439::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
>> >MainThread::INFO::2020-04-09 08:07:31,440::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
>> >MainThread::INFO::2020-04-09 08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
>> >MainThread::INFO::2020-04-09 08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
>> >MainThread::INFO::2020-04-09 08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
>> >MainThread::INFO::2020-04-09 08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
>> >MainThread::INFO::2020-04-09 08:07:31,442::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
>> >MainThread::INFO::2020-04-09 08:07:31,442::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
>> >MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
>> >MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
>> >MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
>> >MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
>> >MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
>> >MainThread::INFO::2020-04-09 08:07:31,444::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
>> >MainThread::INFO::2020-04-09 08:07:31,444::monitor::50::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Finished loading submonitors
>> >MainThread::DEBUG::2020-04-09 08:07:31,444::broker::128::ovirt_hosted_engine_ha.broker.broker.Broker::(_get_storage_broker) Starting storage broker
>> >MainThread::DEBUG::2020-04-09 08:07:31,444::storage_backends::369::ovirt_hosted_engine_ha.lib.storage_backends::(connect) Connecting to VDSM
>> >MainThread::DEBUG::2020-04-09 08:07:31,444::util::384::ovirt_hosted_engine_ha.lib.storage_backends::(__log_debug) Creating a new json-rpc connection to VDSM
>> >Client localhost:54321::DEBUG::2020-04-09 08:07:31,453::concurrent::258::root::(run) START thread <Thread(Client localhost:54321, started daemon 139992488138496)> (func=<bound method Reactor.process_requests of <yajsonrpc.betterAsyncore.Reactor object at 0x7f528acabc90>>, args=(), kwargs={})
>> >Client localhost:54321::DEBUG::2020-04-09 08:07:31,459::stompclient::138::yajsonrpc.protocols.stomp.AsyncClient::(_process_connected) Stomp connection established
>> >MainThread::DEBUG::2020-04-09 08:07:31,467::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
>> >MainThread::INFO::2020-04-09 08:07:31,530::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect) Connecting the storage
>> >MainThread::INFO::2020-04-09 08:07:31,531::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
>> >MainThread::DEBUG::2020-04-09 08:07:31,531::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
>> >MainThread::DEBUG::2020-04-09 08:07:31,534::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
>> >MainThread::DEBUG::2020-04-09 08:07:32,199::storage_server::158::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(_validate_pre_connected_path) Storage domain a6cea67d-dbfb-45cf-a775-b4d0d47b26f2 is not available
>> >MainThread::INFO::2020-04-09 08:07:32,199::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
>> >MainThread::DEBUG::2020-04-09 08:07:32,199::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
>> >MainThread::DEBUG::2020-04-09 08:07:32,814::storage_server::363::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) [{u'status': 0, u'id': u'e29cf818-5ee5-46e1-85c1-8aeefa33e95d'}]
>> >MainThread::INFO::2020-04-09 08:07:32,814::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Refreshing the storage domain
>> >MainThread::DEBUG::2020-04-09 08:07:32,815::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
>> >MainThread::DEBUG::2020-04-09 08:07:33,129::storage_server::420::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Error refreshing storage domain: Command StorageDomain.getStats with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed: (code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
>> >MainThread::DEBUG::2020-04-09 08:07:33,130::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
>> >MainThread::DEBUG::2020-04-09 08:07:33,795::storage_backends::208::ovirt_hosted_engine_ha.lib.storage_backends::(_get_sector_size) Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed: (code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
>> >MainThread::WARNING::2020-04-09 08:07:33,795::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed: (code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
>> >
>> >
>> >The UUID it is moaning about is indeed the one that the HA sits on
and
>> >is
>> >the one I listed the contents of in step 2 above.
>> >
>> >
>> >So why can't it see this domain?
>> >
>> >
>> >Thanks, Shareef.
>> >
>> >On Thu, Apr 9, 2020 at 6:12 AM Strahil Nikolov <hunter86_bg(a)yahoo.com> wrote:
>> >
>> >> On April 9, 2020 1:51:05 AM GMT+03:00, Shareef Jalloq <shareef(a)jalloq.co.uk> wrote:
>> >> >Don't know if this is useful or not, but I just tried to shut down
>> >> >and start another VM on one of the hosts and got the following error:
>> >> >
>> >> >virsh # start scratch
>> >> >
>> >> >error: Failed to start domain scratch
>> >> >
>> >> >error: Network not found: no network with matching name 'vdsm-ovirtmgmt'
>> >> >
>> >> >Is this not referring to the interface name, as the network is called
>> >> >'ovirtmgmt'?
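>> >> >
>> >> >(The bridge and the libvirt network are separate objects; vdsm
>> >> >normally defines a libvirt network named after the bridge, so 'virsh
>> >> >net-list --all' should show whether 'vdsm-ovirtmgmt' is defined at
>> >> >all. The net-define recipe further down the thread recreates it.)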
>> >> >
>> >> >On Wed, Apr 8, 2020 at 11:35 PM Shareef Jalloq <shareef(a)jalloq.co.uk> wrote:
>> >> >
>> >> >> Hmmm, virsh tells me the HE is running, but it hasn't come up and
>> >> >> the agent.log is full of the same errors.
>> >> >>
>> >> >> On Wed, Apr 8, 2020 at 11:31 PM Shareef Jalloq <shareef(a)jalloq.co.uk> wrote:
>> >> >>
>> >> >>> Ah hah! OK, so I've managed to start it using virsh on the second
>> >> >>> host, but my first host is still dead.
>> >> >>>
>> >> >>> First of all, what are these 56,317 .prob- files that get dumped
>> >> >>> to the NFS mounts?
>> >> >>>
>> >> >>> Secondly, why doesn't the node mount the NFS directories at boot?
>> >> >>> Is that the issue with this particular node?
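>> >> >>>
>> >> >>> (As a stopgap, a manual NFS mount following the same naming
>> >> >>> convention would look like the line below; the export path is an
>> >> >>> assumption inferred from the mount-point name, so check the NAS first:
>> >> >>>
>> >> >>> mount -t nfs nas-01.phoelex.com:/volume2/vmstore /rhev/data-center/mnt/nas-01.phoelex.com:_volume2_vmstore)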
>> >> >>>
>> >> >>> On Wed, Apr 8, 2020 at 11:12 PM <eevans(a)digitaldatatechs.com> wrote:
>> >> >>>
>> >> >>>> Did you try 'virsh list --inactive'?
>> >> >>>>
>> >> >>>>
>> >> >>>>
>> >> >>>> Eric Evans
>> >> >>>>
>> >> >>>> Digital Data Services LLC.
>> >> >>>>
>> >> >>>> 304.660.9080
>> >> >>>>
>> >> >>>>
>> >> >>>>
>> >> >>>> From: Shareef Jalloq <shareef(a)jalloq.co.uk>
>> >> >>>> Sent: Wednesday, April 8, 2020 5:58 PM
>> >> >>>> To: Strahil Nikolov <hunter86_bg(a)yahoo.com>
>> >> >>>> Cc: Ovirt Users <users(a)ovirt.org>
>> >> >>>> Subject: [ovirt-users] Re: ovirt-engine unresponsive - how to rescue?
>> >> >>>>
>> >> >>>>
>> >> >>>>
>> >> >>>> I've now shut down the VMs on one host and rebooted it, but the
>> >> >>>> agent service doesn't start. If I run 'hosted-engine --vm-status' I get:
>> >> >>>>
>> >> >>>> The hosted engine configuration has not been retrieved from shared
>> >> >>>> storage. Please ensure that ovirt-ha-agent is running and the
>> >> >>>> storage server is reachable.
>> >> >>>>
>> >> >>>> and indeed, if I list the mounts under /rhev/data-center/mnt, only
>> >> >>>> one of the directories is mounted. I have 3 NFS mounts: one ISO
>> >> >>>> Domain and two Data Domains. Only one Data Domain has mounted, and
>> >> >>>> this one has lots of .prob files in it. So why haven't the other
>> >> >>>> NFS exports been mounted?
>> >> >>>>
>> >> >>>> Manually mounting them doesn't seem to have helped much either. I
>> >> >>>> can start the broker service, but the agent service says no. Same
>> >> >>>> error as the one in my last email.
>> >> >>>>
>> >> >>>> Shareef.
>> >> >>>>
>> >> >>>>
>> >> >>>>
>> >> >>>> On Wed, Apr 8, 2020 at 9:57 PM Shareef Jalloq <shareef(a)jalloq.co.uk> wrote:
>> >> >>>>
>> >> >>>> Right, still down. I've run virsh and it doesn't know anything
>> >> >>>> about the engine VM.
>> >> >>>>
>> >> >>>> I've restarted the broker and agent services and I still get
>> >> >>>> nothing in virsh->list.
>> >> >>>>
>> >> >>>> In the logs under /var/log/ovirt-hosted-engine-ha I see lots of
>> >> >>>> errors:
>> >> >>>>
>> >> >>>> broker.log:
>> >> >>>>
>> >> >>>>
>> >> >>>>
>> >> >>>> MainThread::INFO::2020-04-08 20:56:20,138::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.3.6 started
>> >> >>>> MainThread::INFO::2020-04-08 20:56:20,138::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Searching for submonitors in /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors
>> >> >>>> MainThread::INFO::2020-04-08 20:56:20,138::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
>> >> >>>> MainThread::INFO::2020-04-08 20:56:20,140::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
>> >> >>>> MainThread::INFO::2020-04-08 20:56:20,140::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
>> >> >>>> MainThread::INFO::2020-04-08 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
>> >> >>>> MainThread::INFO::2020-04-08 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
>> >> >>>> MainThread::INFO::2020-04-08 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
>> >> >>>> MainThread::INFO::2020-04-08 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
>> >> >>>> MainThread::INFO::2020-04-08 20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
>> >> >>>> MainThread::INFO::2020-04-08 20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
>> >> >>>> MainThread::INFO::2020-04-08 20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
>> >> >>>> MainThread::INFO::2020-04-08 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
>> >> >>>> MainThread::INFO::2020-04-08 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
>> >> >>>> MainThread::INFO::2020-04-08 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
>> >> >>>> MainThread::INFO::2020-04-08 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
>> >> >>>> MainThread::INFO::2020-04-08 20:56:20,143::monitor::50::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Finished loading submonitors
>> >> >>>> MainThread::INFO::2020-04-08 20:56:20,197::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect) Connecting the storage
>> >> >>>> MainThread::INFO::2020-04-08 20:56:20,197::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
>> >> >>>> MainThread::INFO::2020-04-08 20:56:20,414::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
>> >> >>>> MainThread::INFO::2020-04-08 20:56:20,628::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Refreshing the storage domain
>> >> >>>> MainThread::WARNING::2020-04-08 20:56:21,057::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed: (code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
>> >> >>>> MainThread::INFO::2020-04-08 20:56:21,901::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.3.6 started
>> >> >>>> MainThread::INFO::2020-04-08 20:56:21,901::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Searching for submonitors in /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors
>> >> >>>>
>> >> >>>>
>> >> >>>>
>> >> >>>> agent.log:
>> >> >>>>
>> >> >>>>
>> >> >>>>
>> >> >>>> MainThread::ERROR::2020-04-08 20:57:00,799::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent
>> >> >>>> MainThread::INFO::2020-04-08 20:57:00,799::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
>> >> >>>> MainThread::INFO::2020-04-08 20:57:11,144::agent::67::ovirt_hosted_engine_ha.agent.agent.Agent::(run) ovirt-hosted-engine-ha agent 2.3.6 started
>> >> >>>> MainThread::INFO::2020-04-08 20:57:11,182::hosted_engine::234::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Found certificate common name: ovirt-node-01.phoelex.com
>> >> >>>> MainThread::INFO::2020-04-08 20:57:11,294::hosted_engine::543::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Initializing ha-broker connection
>> >> >>>> MainThread::INFO::2020-04-08 20:57:11,296::brokerlink::80::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor network, options {'tcp_t_address': '', 'network_test': 'dns', 'tcp_t_port': '', 'addr': '192.168.1.99'}
>> >> >>>> MainThread::ERROR::2020-04-08 20:57:11,296::hosted_engine::559::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Failed to start necessary monitors
>> >> >>>> MainThread::ERROR::2020-04-08 20:57:11,297::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Traceback (most recent call last):
>> >> >>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent
>> >> >>>>     return action(he)
>> >> >>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper
>> >> >>>>     return he.start_monitoring()
>> >> >>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 432, in start_monitoring
>> >> >>>>     self._initialize_broker()
>> >> >>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 556, in _initialize_broker
>> >> >>>>     m.get('options', {}))
>> >> >>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 89, in start_monitor
>> >> >>>>     ).format(t=type, o=options, e=e)
>> >> >>>> RequestError: brokerlink - failed to start monitor via ovirt-ha-broker: [Errno 2] No such file or directory, [monitor: 'network', options: {'tcp_t_address': '', 'network_test': 'dns', 'tcp_t_port': '', 'addr': '192.168.1.99'}]
>> >> >>>> MainThread::ERROR::2020-04-08 20:57:11,297::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent
>> >> >>>> MainThread::INFO::2020-04-08 20:57:11,297::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
>> >> >>>>
>> >> >>>>
>> >> >>>>
>> >> >>>> On Wed, Apr 8, 2020 at 6:10 PM Strahil Nikolov <hunter86_bg(a)yahoo.com> wrote:
>> >> >>>>
>> >> >>>> On April 8, 2020 7:47:20 PM GMT+03:00, "Maton, Brett" <matonb(a)ltresources.co.uk> wrote:
>> >> >>>> >On the host you tried to restart the engine on:
>> >> >>>> >
>> >> >>>> >Add an alias to virsh (it authenticates with virsh_auth.conf):
>> >> >>>> >
>> >> >>>> >alias virsh='virsh -c qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf'
>> >> >>>> >
>> >> >>>> >Then run virsh:
>> >> >>>> >
>> >> >>>> >virsh
>> >> >>>> >
>> >> >>>> >virsh # list
>> >> >>>> > Id   Name           State
>> >> >>>> >----------------------------------------------------
>> >> >>>> > xx   HostedEngine   Paused
>> >> >>>> > xx   **********     running
>> >> >>>> > ...
>> >> >>>> > xx   **********     running
>> >> >>>> >
>> >> >>>> >HostedEngine should be in the list; try to resume the engine:
>> >> >>>> >
>> >> >>>> >virsh # resume HostedEngine
>> >> >>>> >
>> >> >>>> >On Wed, 8 Apr 2020 at 17:28, Shareef Jalloq <shareef(a)jalloq.co.uk> wrote:
>> >> >>>> >
>> >> >>>> >> Thanks!
>> >> >>>> >>
>> >> >>>> >> The status hangs due to, I guess, the VM being down...
>> >> >>>> >>
>> >> >>>> >> [root@ovirt-node-01 ~]# hosted-engine --vm-start
>> >> >>>> >> VM exists and is down, cleaning up and restarting
>> >> >>>> >> VM in WaitForLaunch
>> >> >>>> >>
>> >> >>>> >> but this doesn't seem to do anything. OK, after a while I get a
>> >> >>>> >> status of it being barfed...
>> >> >>>> >>
>> >> >>>> >> --== Host ovirt-node-00.phoelex.com (id: 1) status ==--
>> >> >>>> >>
>> >> >>>> >> conf_on_shared_storage   : True
>> >> >>>> >> Status up-to-date        : False
>> >> >>>> >> Hostname                 : ovirt-node-00.phoelex.com
>> >> >>>> >> Host ID                  : 1
>> >> >>>> >> Engine status            : unknown stale-data
>> >> >>>> >> Score                    : 3400
>> >> >>>> >> stopped                  : False
>> >> >>>> >> Local maintenance        : False
>> >> >>>> >> crc32                    : 9c4a034b
>> >> >>>> >> local_conf_timestamp     : 523362
>> >> >>>> >> Host timestamp           : 523608
>> >> >>>> >> Extra metadata (valid at timestamp):
>> >> >>>> >> metadata_parse_version=1
>> >> >>>> >> metadata_feature_version=1
>> >> >>>> >> timestamp=523608 (Wed Apr  8 16:17:11 2020)
>> >> >>>> >> host-id=1
>> >> >>>> >> score=3400
>> >> >>>> >> vm_conf_refresh_time=523362 (Wed Apr  8 16:13:06 2020)
>> >> >>>> >> conf_on_shared_storage=True
>> >> >>>> >> maintenance=False
>> >> >>>> >> state=EngineDown
>> >> >>>> >> stopped=False
>> >> >>>> >>
>> >> >>>> >> --== Host ovirt-node-01.phoelex.com (id: 2) status ==--
>> >> >>>> >>
>> >> >>>> >> conf_on_shared_storage   : True
>> >> >>>> >> Status up-to-date        : True
>> >> >>>> >> Hostname                 : ovirt-node-01.phoelex.com
>> >> >>>> >> Host ID                  : 2
>> >> >>>> >> Engine status            : {"reason": "bad vm status", "health": "bad", "vm": "down_unexpected", "detail": "Down"}
>> >> >>>> >> Score                    : 0
>> >> >>>> >> stopped                  : False
>> >> >>>> >> Local maintenance        : False
>> >> >>>> >> crc32                    : 5045f2eb
>> >> >>>> >> local_conf_timestamp     : 1737037
>> >> >>>> >> Host timestamp           : 1737283
>> >> >>>> >> Extra metadata (valid at timestamp):
>> >> >>>> >> metadata_parse_version=1
>> >> >>>> >> metadata_feature_version=1
>> >> >>>> >> timestamp=1737283 (Wed Apr  8 16:16:17 2020)
>> >> >>>> >> host-id=2
>> >> >>>> >> score=0
>> >> >>>> >> vm_conf_refresh_time=1737037 (Wed Apr  8 16:12:11 2020)
>> >> >>>> >> conf_on_shared_storage=True
>> >> >>>> >> maintenance=False
>> >> >>>> >> state=EngineUnexpectedlyDown
>> >> >>>> >> stopped=False
>> >> >>>> >>
>> >> >>>> >> On Wed, Apr 8, 2020 at 5:09 PM Maton, Brett <matonb(a)ltresources.co.uk> wrote:
>> >> >>>> >>
>> >> >>>> >>> First steps, on one of your hosts as root:
>> >> >>>> >>>
>> >> >>>> >>> To get information:
>> >> >>>> >>> hosted-engine --vm-status
>> >> >>>> >>>
>> >> >>>> >>> To start the engine:
>> >> >>>> >>> hosted-engine --vm-start
>> >> >>>> >>>
>> >> >>>> >>> On Wed, 8 Apr 2020 at 17:00, Shareef Jalloq <shareef(a)jalloq.co.uk> wrote:
>> >> >>>> >>>
>> >> >>>> >>>> So my engine has gone down and I can't ssh into it either.
>> >> >>>> >>>> If I try to log into the web UI of the node it is running
>> >> >>>> >>>> on, I get redirected because the node can't reach the engine.
>> >> >>>> >>>>
>> >> >>>> >>>> What are my next steps?
>> >> >>>> >>>>
>> >> >>>> >>>> Shareef.
>> >> >>>> >>>>
>> >> >>>>
>> >> >>>> This has to be resolved:
>> >> >>>>
>> >> >>>> Engine status : unknown stale-data
>> >> >>>>
>> >> >>>> Run 'hosted-engine --vm-status' again. If it remains the same,
>> >> >>>> restart ovirt-ha-broker.service & ovirt-ha-agent.service.
>> >> >>>>
>> >> >>>> Verify that the engine's storage is available. Then monitor the
>> >> >>>> broker & agent logs in /var/log/ovirt-hosted-engine-ha.
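>> >> >>>>
>> >> >>>> (Concretely, something like this; the log paths are the standard
>> >> >>>> ones already quoted earlier in the thread:
>> >> >>>>
>> >> >>>> systemctl restart ovirt-ha-broker ovirt-ha-agent
>> >> >>>> tail -f /var/log/ovirt-hosted-engine-ha/broker.log /var/log/ovirt-hosted-engine-ha/agent.log)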
>> >> >>>>
>> >> >>>> Best Regards,
>> >> >>>> Strahil Nikolov
>> >> >>>>
>> >> >>>>
>> >> >>>>
>> >> >>>>
>> >>
>> >> Hi Shareef,
>> >>
>> >> The flow of activation in oVirt is more complex than plain KVM.
>> >> Mounting of the domains happens during the activation of the node
>> >> (the HostedEngine is activating everything needed).
>> >>
>> >> Focus on the HostedEngine VM.
>> >> Is it running properly?
>> >>
>> >> If not, try the following (a quick service check is sketched right
>> >> after this list):
>> >> 1. Verify that the storage domain exists.
>> >> 2. Check if it has an 'ha_agent' directory.
>> >> 3. Check if the links are OK; if not, you can safely remove the links.
>> >>
>> >> 4. Next, check the services are running:
>> >> A) sanlock
>> >> B) supervdsmd
>> >> C) vdsmd
>> >> D) libvirtd
>> >>
>> >> 5. Increase the log level for the broker and agent services:
>> >>
>> >> cd /etc/ovirt-hosted-engine-ha
>> >> vim *-log.conf
>> >>
>> >> systemctl restart ovirt-ha-broker ovirt-ha-agent
>> >>
>> >> 6. Check what they are complaining about.
>> >> Keep in mind that the agent will keep throwing errors until the
>> >> broker stops doing it (the agent depends on the broker), so the
>> >> broker must be OK before proceeding with the agent log.
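>> >>
>> >> (For step 4, a one-liner to check all four services at once, assuming
>> >> the standard systemd unit names:
>> >>
>> >> for s in sanlock supervdsmd vdsmd libvirtd; do echo -n "$s: "; systemctl is-active $s; done)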
>> >>
>> >> About the manual VM start, you need 2 things:
>> >>
>> >> 1. Define the VM network:
>> >>
>> >> # cat vdsm-ovirtmgmt.xml
>> >> <network>
>> >>   <name>vdsm-ovirtmgmt</name>
>> >>   <uuid>8ded486e-e681-4754-af4b-5737c2b05405</uuid>
>> >>   <forward mode='bridge'/>
>> >>   <bridge name='ovirtmgmt'/>
>> >> </network>
>> >>
>> >> [root@ovirt1 HostedEngine-RECOVERY]# virsh net-define vdsm-ovirtmgmt.xml
>> >> [root@ovirt1 HostedEngine-RECOVERY]# virsh net-start vdsm-ovirtmgmt
>> >>
>> >> 2. Get an XML definition, which can be found in the vdsm log. Every VM
>> >> at startup has its configuration printed out in the vdsm log on the
>> >> host it starts on. Save it to a file and then:
>> >> A) virsh define myvm.xml
>> >> B) virsh start myvm
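>> >>
>> >> (To locate that XML, something like the grep below should work,
>> >> assuming your vdsm version logs the full domain XML inline, which may
>> >> vary by release:
>> >>
>> >> grep -n "<domain" /var/log/vdsm/vdsm.log*)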
>> >>
>> >> It seems there is/was a problem with your NFS shares.
>> >>
>> >>
>> >> Best Regards,
>> >> Strahil Nikolov
>> >>
>>
>> Hey Shareef,
>>
>> Check if there are any files or folders not owned by vdsm:kvm.
>> Something like this:
>>
>> find . -not -user 36 -not -group 36 -print
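>>
>> (If that turns anything up, the usual fix, assuming everything under the
>> domain really should belong to vdsm:kvm, is: chown -R 36:36 <offending-path>)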
>>
>> Also check if vdsm can access the images in the '<vol-mount-point>/images' directories.
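>>
>> (One way to test that directly, '<vol-mount-point>' being the same
>> placeholder as above:
>>
>> sudo -u vdsm ls -lR '<vol-mount-point>/images')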
>>
>> Best Regards,
>> Strahil Nikolov
>>
>
And what about the IPv6 address '64:ff9b::c0a8:13d'? I don't see it in the
'ip addr' output in your log.
Best Regards,
Strahil Nikolov