Ha, spoke too soon. It's now stuck in a loop and a Google search points me at
However, forcing IPv4 doesn't seem to have fixed the loop.
On Wed, Apr 15, 2020 at 9:59 AM Shareef Jalloq <shareef(a)jalloq.co.uk> wrote:
OK, that seems to have fixed it, thanks. Is this a side effect of
redeploying the HE over a first-time install? Nothing has changed in our
setup and I didn't need to do this when I initially set up our nodes.
On Tue, Apr 14, 2020 at 6:55 PM Strahil Nikolov <hunter86_bg(a)yahoo.com>
wrote:
> On April 14, 2020 6:17:17 PM GMT+03:00, Shareef Jalloq <
> shareef(a)jalloq.co.uk> wrote:
> >Hmmm, we're not using ipv6. Is that the issue?
> >
> >On Tue, Apr 14, 2020 at 3:56 PM Strahil Nikolov <hunter86_bg(a)yahoo.com>
> >wrote:
> >
> >> On April 14, 2020 1:27:24 PM GMT+03:00, Shareef Jalloq <
> >> shareef(a)jalloq.co.uk> wrote:
> >> >Right, I've given up on recovering the HE so want to try and
> >> >redeploy it.
> >> >There doesn't seem to be enough information to debug why the
> >> >broker/agent
> >> >won't start cleanly.
> >> >
> >> >In running 'hosted-engine --deploy', I'm seeing the following error
> >> >in the setup validation phase:
> >> >
> >> >2020-04-14 09:46:08,922+0000 DEBUG otopi.plugins.otopi.dialog.human
> >> >dialog.__logString:204 DIALOG:SEND Please provide the
> >> >hostname of this host on the management network
> >> >[ovirt-node-00.phoelex.com]:
> >> >
> >> >
> >> >2020-04-14 09:46:12,831+0000 DEBUG
> >> >otopi.plugins.gr_he_common.network.bridge
> >> >hostname.getResolvedAddresses:432
> >> >getResolvedAddresses: set(['64:ff9b::c0a8:13d', '192.168.1.61'])
> >> >
> >> >2020-04-14 09:46:12,832+0000 DEBUG
> >> >otopi.plugins.gr_he_common.network.bridge
> >> >hostname._validateFQDNresolvability:289 ovirt-node-00.phoelex.com
> >> >resolves to: set(['64:ff9b::c0a8:13d', '192.168.1.61'])
> >> >
> >> >2020-04-14 09:46:12,832+0000 DEBUG
> >> >otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:813
> >> >execute:
> >> >['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com',
> >> >'ANY'],
> >> >executable='None', cwd='None', env=None
> >> >
> >> >2020-04-14 09:46:12,871+0000 DEBUG
> >> >otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:863
> >> >execute-result: ['/usr/bin/dig', '+noall', '+answer', '
> >> >ovirt-node-00.phoelex.com', 'ANY'], rc=0
> >> >
> >> >2020-04-14 09:46:12,872+0000 DEBUG
> >> >otopi.plugins.gr_he_common.network.bridge plugin.execute:921
> >> >execute-output: ['/usr/bin/dig', '+noall', '+answer', '
> >> >ovirt-node-00.phoelex.com', 'ANY'] stdout:
> >> >
> >> >ovirt-node-00.phoelex.com. 86400 IN A 192.168.1.61
> >> >
> >> >
> >> >2020-04-14 09:46:12,872+0000 DEBUG
> >> >otopi.plugins.gr_he_common.network.bridge plugin.execute:926
> >> >execute-output: ['/usr/bin/dig', '+noall', '+answer', '
> >> >ovirt-node-00.phoelex.com', 'ANY'] stderr:
> >> >
> >> >
> >> >
> >> >2020-04-14 09:46:12,872+0000 DEBUG
> >> >otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:813
> >> >execute:
> >> >('/usr/sbin/ip', 'addr'), executable='None', cwd='None', env=None
> >> >
> >> >2020-04-14 09:46:12,876+0000 DEBUG
> >> >otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:863
> >> >execute-result: ('/usr/sbin/ip', 'addr'), rc=0
> >> >
> >> >2020-04-14 09:46:12,876+0000 DEBUG
> >> >otopi.plugins.gr_he_common.network.bridge plugin.execute:921
> >> >execute-output: ('/usr/sbin/ip', 'addr') stdout:
> >> >
> >> >1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
> >> >group default qlen 1000
> >> >
> >> > link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> >> >
> >> > inet 127.0.0.1/8 scope host lo
> >> >
> >> > valid_lft forever preferred_lft forever
> >> >
> >> > inet6 ::1/128 scope host
> >> >
> >> > valid_lft forever preferred_lft forever
> >> >
> >> >2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master
> >> >ovirtmgmt state UP group default qlen 1000
> >> >
> >> > link/ether ac:1f:6b:bc:32:6a brd ff:ff:ff:ff:ff:ff
> >> >
> >> >3: eno2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state
> >> >DOWN group default qlen 1000
> >> >
> >> > link/ether ac:1f:6b:bc:32:6b brd ff:ff:ff:ff:ff:ff
> >> >
> >> >4: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
> >> >group default qlen 1000
> >> >
> >> > link/ether 02:e6:e2:80:93:8d brd ff:ff:ff:ff:ff:ff
> >> >
> >> >5: br-int: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group
> >> >default qlen 1000
> >> >
> >> > link/ether 8a:26:44:50:ee:4a brd ff:ff:ff:ff:ff:ff
> >> >
> >> >21: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc
> >> >noqueue state UP group default qlen 1000
> >> >
> >> > link/ether ac:1f:6b:bc:32:6a brd ff:ff:ff:ff:ff:ff
> >> >
> >> > inet 192.168.1.61/24 brd 192.168.1.255 scope global ovirtmgmt
> >> >
> >> > valid_lft forever preferred_lft forever
> >> >
> >> > inet6 fe80::ae1f:6bff:febc:326a/64 scope link
> >> >
> >> > valid_lft forever preferred_lft forever
> >> >
> >> >22: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
> >> >group default qlen 1000
> >> >
> >> > link/ether 3a:02:7b:7d:b3:2a brd ff:ff:ff:ff:ff:ff
> >> >
> >> >
> >> >2020-04-14 09:46:12,876+0000 DEBUG
> >> >otopi.plugins.gr_he_common.network.bridge plugin.execute:926
> >> >execute-output: ('/usr/sbin/ip', 'addr') stderr:
> >> >
> >> >
> >> >
> >> >2020-04-14 09:46:12,877+0000 DEBUG
> >> >otopi.plugins.gr_he_common.network.bridge
> >> >hostname.getLocalAddresses:251
> >> >addresses: [u'192.168.1.61', u'fe80::ae1f:6bff:febc:326a']
> >> >
> >> >2020-04-14 09:46:12,877+0000 DEBUG
> >> >otopi.plugins.gr_he_common.network.bridge hostname.test_hostname:464
> >> >test_hostname exception
> >> >
> >> >Traceback (most recent call last):
> >> >
> >> >File "/usr/lib/python2.7/site-packages/ovirt_setup_lib/hostname.py",
> >> >line 460, in test_hostname
> >> >
> >> > not_local_text,
> >> >
> >> >File "/usr/lib/python2.7/site-packages/ovirt_setup_lib/hostname.py",
> >> >line 342, in _validateFQDNresolvability
> >> >
> >> > addresses=resolvedAddressesAsString
> >> >
> >> >RuntimeError: ovirt-node-00.phoelex.com resolves to 64:ff9b::c0a8:13d
> >> >192.168.1.61 and not all of them can be mapped to non loopback devices
> >> >on this host
> >> >
> >> >2020-04-14 09:46:12,884+0000 ERROR
> >> >otopi.plugins.gr_he_common.network.bridge dialog.queryEnvKey:120 Host
> >> >name is not valid: ovirt-node-00.phoelex.com resolves to
> >> >64:ff9b::c0a8:13d 192.168.1.61 and not all of them can be mapped to
> >> >non loopback devices on this host
> >> >
> >> >The node I'm running on has an IP address of .61 and resolves
> >> >correctly.
> >> >
> >> >On Fri, Apr 10, 2020 at 12:55 PM Shareef Jalloq
> >> ><shareef(a)jalloq.co.uk> wrote:
> >> >
> >> >> Where should I be checking if there are any files/folders not owned
> >> >> by vdsm:kvm? I checked on the mount the HA sits on and it's fine.
> >> >>
> >> >> How would I go about checking vdsm can access those images? If I run
> >> >> virsh, it lists them and they were running yesterday even though the
> >> >> HA was down. I've since restarted both hosts but the broker is still
> >> >> spitting out the same error (copied below). How do I find the reason
> >> >> the broker can't connect to the storage? The conf file is already at
> >> >> DEBUG verbosity:
> >> >>
> >> >> [handler_logfile]
> >> >>
> >> >> class=logging.handlers.TimedRotatingFileHandler
> >> >>
> >> >> args=('/var/log/ovirt-hosted-engine-ha/broker.log', 'd', 1, 7)
> >> >>
> >> >> level=DEBUG
> >> >>
> >> >> formatter=long
> >> >>
> >> >> And what are all these .prob-<num> files that are being created?
> >> >> There are over 250K of them now on the mount I'm using for the Data
> >> >> domain.
> >> >> They're all of 0 size and of the form,
> >> >> /rhev/data-center/mnt/nas-01.phoelex.com:
> >> >> _volume2_vmstore/.prob-ffa867da-93db-4211-82df-b1b04a625ab9
> >> >>
> >> >> @eevans: The volume I have the Data Domain on has TBs free. The HA
> >> >> is dead so I can't ssh in. No idea what started these errors and the
> >> >> other VMs were still running happily although they're on a different
> >> >> Data Domain.
> >> >>
> >> >> Shareef.
> >> >>
> >> >> MainThread::INFO::2020-04-10
> >> >> 07:45:00,408::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect)
> >> >> Connecting the storage
> >> >>
> >> >> MainThread::INFO::2020-04-10
> >> >> 07:45:00,408::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
> >> >> Connecting storage server
> >> >>
> >> >> MainThread::INFO::2020-04-10
> >> >> 07:45:01,577::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
> >> >> Connecting storage server
> >> >>
> >> >> MainThread::INFO::2020-04-10
> >> >> 07:45:02,692::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
> >> >> Refreshing the storage domain
> >> >>
> >> >> MainThread::WARNING::2020-04-10
> >> >> 07:45:05,175::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__)
> >> >> Can't connect vdsm storage: Command StorageDomain.getInfo with args
> >> >> {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed:
> >> >>
> >> >> (code=350, message=Error in storage domain action:
> >> >> (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
> >> >>
> >> >> On Thu, Apr 9, 2020 at 5:58 PM Strahil Nikolov
> >> >> <hunter86_bg(a)yahoo.com> wrote:
> >> >>
> >> >>> On April 9, 2020 11:12:30 AM GMT+03:00, Shareef Jalloq <
> >> >>> shareef(a)jalloq.co.uk> wrote:
> >> >>> >OK, let's go through this. I'm looking at the node that at least
> >> >>> >still has some VMs running. virsh also tells me that the
> >> >>> >HostedEngine VM is running but it's unresponsive and I can't shut
> >> >>> >it down.
> >> >>> >
> >> >>> >1. All storage domains exist and are mounted.
> >> >>> >2. The ha_agent exists:
> >> >>> >
> >> >>> >[root@ovirt-node-01 ovirt-hosted-engine-ha]# ls /rhev/data-center/mnt/
> >> >>> >nas-01.phoelex.com\:_volume2_vmstore/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/
> >> >>> >
> >> >>> >dom_md ha_agent images master
> >> >>> >
> >> >>> >3. There are two links
> >> >>> >
> >> >>> >[root@ovirt-node-01 ovirt-hosted-engine-ha]# ll /rhev/data-center/mnt/
> >> >>> >nas-01.phoelex.com\:_volume2_vmstore/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/ha_agent/
> >> >>> >
> >> >>> >total 8
> >> >>> >
> >> >>> >lrwxrwxrwx. 1 vdsm kvm 132 Apr 2 14:50 hosted-engine.lockspace ->
> >> >>> >/var/run/vdsm/storage/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/ffb90b82-42fe-4253-85d5-aaec8c280aaf/90e68791-0c6f-406a-89ac-e0d86c631604
> >> >>> >
> >> >>> >lrwxrwxrwx. 1 vdsm kvm 132 Apr 2 14:50 hosted-engine.metadata ->
> >> >>> >/var/run/vdsm/storage/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/2161aed0-7250-4c1d-b667-ac94f60af17e/6b818e33-f80a-48cc-a59c-bba641e027d4
> >> >>> >
> >> >>> >4. The services exist but all seem to have some sort of warning:
> >> >>> >
> >> >>> >a) Apr 08 18:10:55 ovirt-node-01.phoelex.com sanlock[1728]:
> >> >>> >*2020-04-08 18:10:55 1744152 [36796]: s16 delta_renew long write
> >> >>> >time 10 sec*
> >> >>> >
> >> >>> >b) Mar 23 18:02:59 ovirt-node-01.phoelex.com supervdsmd[29409]:
> >> >>> >*failed to load module nvdimm: libbd_nvdimm.so.2: cannot open
> >> >>> >shared object file: No such file or directory*
> >> >>> >
> >> >>> >c) Apr 09 08:05:13 ovirt-node-01.phoelex.com vdsm[4801]: *ERROR
> >> >>> >failed to retrieve Hosted Engine HA score '[Errno 2] No such file
> >> >>> >or directory' Is the Hosted Engine setup finished?*
> >> >>> >
> >> >>> >d) Apr 08 22:48:27 ovirt-node-01.phoelex.com libvirtd[29307]:
> >> >>> >2020-04-08 22:48:27.134+0000: 29309: warning :
> >> >>> >qemuGetProcessInfo:1404 : cannot parse process status data
> >> >>> >
> >> >>> >Apr 08 22:48:27 ovirt-node-01.phoelex.com libvirtd[29307]:
> >> >>> >2020-04-08 22:48:27.134+0000: 29309: error :
> >> >>> >virNetDevTapInterfaceStats:764 : internal error: /proc/net/dev:
> >> >>> >Interface not found
> >> >>> >
> >> >>> >Apr 08 23:09:39 ovirt-node-01.phoelex.com libvirtd[29307]:
> >> >>> >2020-04-08 23:09:39.844+0000: 29307: error :
> >> >>> >virNetSocketReadWire:1806 : End of file while reading data:
> >> >>> >Input/output error
> >> >>> >
> >> >>> >Apr 09 01:05:26 ovirt-node-01.phoelex.com libvirtd[29307]:
> >> >>> >2020-04-09 01:05:26.660+0000: 29307: error :
> >> >>> >virNetSocketReadWire:1806 : End of file while reading data:
> >> >>> >Input/output error
> >> >>> >
> >> >>> >5 & 6. The broker log is continually printing this error:
> >> >>> >
> >> >>> >MainThread::INFO::2020-04-09
> >> >>> >08:07:31,438::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run)
> >> >>> >ovirt-hosted-engine-ha broker 2.3.6 started
> >> >>> >
> >> >>> >MainThread::DEBUG::2020-04-09
> >> >>> >08:07:31,438::broker::55::ovirt_hosted_engine_ha.broker.broker.Broker::(run)
> >> >>> >Running broker
> >> >>> >
> >> >>> >MainThread::DEBUG::2020-04-09
> >> >>> >08:07:31,438::broker::120::ovirt_hosted_engine_ha.broker.broker.Broker::(_get_monitor)
> >> >>> >Starting monitor
> >> >>> >
> >> >>> >MainThread::INFO::2020-04-09
> >> >>> >08:07:31,438::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >> >>> >Searching for submonitors in
> >> >>> >/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors
> >> >>> >
> >> >>> >MainThread::INFO::2020-04-09
> >> >>> >08:07:31,439::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >> >>> >Loaded submonitor network
> >> >>> >
> >> >>> >MainThread::INFO::2020-04-09
> >> >>> >08:07:31,440::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >> >>> >Loaded submonitor cpu-load-no-engine
> >> >>> >
> >> >>> >MainThread::INFO::2020-04-09
> >> >>> >08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >> >>> >Loaded submonitor mgmt-bridge
> >> >>> >
> >> >>> >MainThread::INFO::2020-04-09
> >> >>> >08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >> >>> >Loaded submonitor network
> >> >>> >
> >> >>> >MainThread::INFO::2020-04-09
> >> >>> >08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >> >>> >Loaded submonitor cpu-load
> >> >>> >
> >> >>> >MainThread::INFO::2020-04-09
> >> >>> >08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >> >>> >Loaded submonitor engine-health
> >> >>> >
> >> >>> >MainThread::INFO::2020-04-09
> >> >>> >08:07:31,442::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >> >>> >Loaded submonitor mgmt-bridge
> >> >>> >
> >> >>> >MainThread::INFO::2020-04-09
> >> >>> >08:07:31,442::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >> >>> >Loaded submonitor cpu-load-no-engine
> >> >>> >
> >> >>> >MainThread::INFO::2020-04-09
> >> >>> >08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >> >>> >Loaded submonitor cpu-load
> >> >>> >
> >> >>> >MainThread::INFO::2020-04-09
> >> >>> >08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >> >>> >Loaded submonitor mem-free
> >> >>> >
> >> >>> >MainThread::INFO::2020-04-09
> >> >>> >08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >> >>> >Loaded submonitor storage-domain
> >> >>> >
> >> >>> >MainThread::INFO::2020-04-09
> >> >>> >08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >> >>> >Loaded submonitor storage-domain
> >> >>> >
> >> >>> >MainThread::INFO::2020-04-09
> >> >>> >08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >> >>> >Loaded submonitor mem-free
> >> >>> >
> >> >>> >MainThread::INFO::2020-04-09
> >> >>> >08:07:31,444::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >> >>> >Loaded submonitor engine-health
> >> >>> >
> >> >>> >MainThread::INFO::2020-04-09
> >> >>> >08:07:31,444::monitor::50::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >> >>> >Finished loading submonitors
> >> >>> >
> >> >>> >MainThread::DEBUG::2020-04-09
> >> >>> >08:07:31,444::broker::128::ovirt_hosted_engine_ha.broker.broker.Broker::(_get_storage_broker)
> >> >>> >Starting storage broker
> >> >>> >
> >> >>> >MainThread::DEBUG::2020-04-09
> >> >>> >08:07:31,444::storage_backends::369::ovirt_hosted_engine_ha.lib.storage_backends::(connect)
> >> >>> >Connecting to VDSM
> >> >>> >
> >> >>> >MainThread::DEBUG::2020-04-09
> >> >>> >08:07:31,444::util::384::ovirt_hosted_engine_ha.lib.storage_backends::(__log_debug)
> >> >>> >Creating a new json-rpc connection to VDSM
> >> >>> >
> >> >>> >Client localhost:54321::DEBUG::2020-04-09
> >> >>> >08:07:31,453::concurrent::258::root::(run) START thread <Thread(Client
> >> >>> >localhost:54321, started daemon 139992488138496)> (func=<bound method
> >> >>> >Reactor.process_requests of <yajsonrpc.betterAsyncore.Reactor object at
> >> >>> >0x7f528acabc90>>, args=(), kwargs={})
> >> >>> >
> >> >>> >Client localhost:54321::DEBUG::2020-04-09
> >> >>> >08:07:31,459::stompclient::138::yajsonrpc.protocols.stomp.AsyncClient::(_process_connected)
> >> >>> >Stomp connection established
> >> >>> >
> >> >>> >MainThread::DEBUG::2020-04-09
> >> >>> >08:07:31,467::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending
> >> >>> >response
> >> >>> >
> >> >>> >MainThread::INFO::2020-04-09
> >> >>> >08:07:31,530::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect)
> >> >>> >Connecting the storage
> >> >>> >
> >> >>> >MainThread::INFO::2020-04-09
> >> >>> >08:07:31,531::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
> >> >>> >Connecting storage server
> >> >>> >
> >> >>> >MainThread::DEBUG::2020-04-09
> >> >>> >08:07:31,531::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending
> >> >>> >response
> >> >>> >
> >> >>> >MainThread::DEBUG::2020-04-09
> >> >>> >08:07:31,534::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending
> >> >>> >response
> >> >>> >
> >> >>> >MainThread::DEBUG::2020-04-09
> >> >>> >08:07:32,199::storage_server::158::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(_validate_pre_connected_path)
> >> >>> >Storage domain a6cea67d-dbfb-45cf-a775-b4d0d47b26f2 is not available
> >> >>> >
> >> >>> >MainThread::INFO::2020-04-09
> >> >>> >08:07:32,199::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
> >> >>> >Connecting storage server
> >> >>> >
> >> >>> >MainThread::DEBUG::2020-04-09
> >> >>> >08:07:32,199::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending
> >> >>> >response
> >> >>> >
> >> >>> >MainThread::DEBUG::2020-04-09
> >> >>> >08:07:32,814::storage_server::363::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
> >> >>> >[{u'status': 0, u'id': u'e29cf818-5ee5-46e1-85c1-8aeefa33e95d'}]
> >> >>> >
> >> >>> >MainThread::INFO::2020-04-09
> >> >>> >08:07:32,814::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
> >> >>> >Refreshing the storage domain
> >> >>> >
> >> >>> >MainThread::DEBUG::2020-04-09
> >> >>> >08:07:32,815::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending
> >> >>> >response
> >> >>> >
> >> >>> >MainThread::DEBUG::2020-04-09
> >> >>> >08:07:33,129::storage_server::420::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
> >> >>> >Error refreshing storage domain: Command StorageDomain.getStats with
> >> >>> >args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed:
> >> >>> >
> >> >>> >(code=350, message=Error in storage domain action:
> >> >>> >(u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
> >> >>> >
> >> >>> >MainThread::DEBUG::2020-04-09
> >> >>> >08:07:33,130::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending
> >> >>> >response
> >> >>> >
> >> >>> >MainThread::DEBUG::2020-04-09
> >> >>> >08:07:33,795::storage_backends::208::ovirt_hosted_engine_ha.lib.storage_backends::(_get_sector_size)
> >> >>> >Command StorageDomain.getInfo with args {'storagedomainID':
> >> >>> >'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed:
> >> >>> >
> >> >>> >(code=350, message=Error in storage domain action:
> >> >>> >(u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
> >> >>> >
> >> >>> >MainThread::WARNING::2020-04-09
> >> >>> >08:07:33,795::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__)
> >> >>> >Can't connect vdsm storage: Command StorageDomain.getInfo with args
> >> >>> >{'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed:
> >> >>> >
> >> >>> >(code=350, message=Error in storage domain action:
> >> >>> >(u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
> >> >>> >
> >> >>> >
> >> >>> >The UUID it is moaning about is indeed the one that the HA sits on
> >> >>> >and is the one I listed the contents of in step 2 above.
> >> >>> >
> >> >>> >
> >> >>> >So why can't it see this domain?
> >> >>> >
> >> >>> >
> >> >>> >Thanks, Shareef.
> >> >>> >
> >> >>> >On Thu, Apr 9, 2020 at 6:12 AM Strahil Nikolov
> >> >>> ><hunter86_bg(a)yahoo.com> wrote:
> >> >>> >
> >> >>> >> On April 9, 2020 1:51:05 AM GMT+03:00, Shareef Jalloq <
> >> >>> >> shareef(a)jalloq.co.uk> wrote:
> >> >>> >> >Don't know if this is useful or not, but I just tried to
> >> >>> >> >shutdown and start another VM on one of the hosts and get the
> >> >>> >> >following error:
> >> >>> >> >
> >> >>> >> >virsh # start scratch
> >> >>> >> >
> >> >>> >> >error: Failed to start domain scratch
> >> >>> >> >
> >> >>> >> >error: Network not found: no network with matching name
> >> >>> >> >'vdsm-ovirtmgmt'
> >> >>> >> >
> >> >>> >> >Is this not referring to the interface name, as the network is
> >> >>> >> >called 'ovirtmgmt'?
> >> >>> >> >
> >> >>> >> >On Wed, Apr 8, 2020 at 11:35 PM Shareef Jalloq
> >> >>> >> ><shareef(a)jalloq.co.uk> wrote:
> >> >>> >> >
> >> >>> >> >> Hmmm, virsh tells me the HE is running but it hasn't come up
> >> >>> >> >> and the agent.log is full of the same errors.
> >> >>> >> >>
> >> >>> >> >> On Wed, Apr 8, 2020 at 11:31 PM Shareef Jalloq
> >> >>> >> >> <shareef(a)jalloq.co.uk> wrote:
> >> >>> >> >>
> >> >>> >> >>> Ah hah! Ok, so I've managed to start it using virsh on the
> >> >>> >> >>> second host, but my first host is still dead.
> >> >>> >> >>>
> >> >>> >> >>> First of all, what are these 56,317 .prob- files that get
> >> >>> >> >>> dumped to the NFS mounts?
> >> >>> >> >>>
> >> >>> >> >>> Secondly, why doesn't the node mount the NFS directories at
> >> >>> >> >>> boot? Is that the issue with this particular node?
> >> >>> >> >>>
> >> >>> >> >>> On Wed, Apr 8, 2020 at 11:12 PM <eevans(a)digitaldatatechs.com>
> >> >>> >> >>> wrote:
> >> >>> >> >>>
> >> >>> >> >>>> Did you try virsh list --inactive
> >> >>> >> >>>>
> >> >>> >> >>>>
> >> >>> >> >>>>
> >> >>> >> >>>> Eric Evans
> >> >>> >> >>>>
> >> >>> >> >>>> Digital Data Services LLC.
> >> >>> >> >>>>
> >> >>> >> >>>> 304.660.9080
> >> >>> >> >>>>
> >> >>> >> >>>>
> >> >>> >> >>>>
> >> >>> >> >>>> *From:* Shareef Jalloq <shareef(a)jalloq.co.uk>
> >> >>> >> >>>> *Sent:* Wednesday, April 8, 2020 5:58 PM
> >> >>> >> >>>> *To:* Strahil Nikolov <hunter86_bg(a)yahoo.com>
> >> >>> >> >>>> *Cc:* Ovirt Users <users(a)ovirt.org>
> >> >>> >> >>>> *Subject:* [ovirt-users] Re: ovirt-engine unresponsive - how
> >> >>> >> >>>> to rescue?
> >> >>> >> >>>>
> >> >>> >> >>>>
> >> >>> >> >>>>
> >> >>> >> >>>> I've now shut down the VMs on one host and rebooted it but
> >> >>> >> >>>> the agent service doesn't start. If I run 'hosted-engine
> >> >>> >> >>>> --vm-status' I get:
> >> >>> >> >>>>
> >> >>> >> >>>>
> >> >>> >> >>>>
> >> >>> >> >>>> The hosted engine configuration has not been retrieved from
> >> >>> >> >>>> shared storage. Please ensure that ovirt-ha-agent is running
> >> >>> >> >>>> and the storage server is reachable.
> >> >>> >> >>>>
> >> >>> >> >>>>
> >> >>> >> >>>>
> >> >>> >> >>>> and indeed if I list the mounts under /rhev/data-center/mnt,
> >> >>> >> >>>> only one of the directories is mounted. I have 3 NFS mounts,
> >> >>> >> >>>> one ISO Domain and two Data Domains. Only one Data Domain has
> >> >>> >> >>>> mounted and this has lots of .prob files in. So why haven't
> >> >>> >> >>>> the other NFS exports been mounted?
> >> >>> >> >>>>
> >> >>> >> >>>>
> >> >>> >> >>>>
> >> >>> >> >>>> Manually mounting them doesn't seem to have helped much
> >> >>> >> >>>> either. I can start the broker service but the agent service
> >> >>> >> >>>> says no. Same error as the one in my last email.
> >> >>> >> >>>>
> >> >>> >> >>>>
> >> >>> >> >>>>
> >> >>> >> >>>> Shareef.
> >> >>> >> >>>>
> >> >>> >> >>>>
> >> >>> >> >>>>
> >> >>> >> >>>> On Wed, Apr 8, 2020 at 9:57 PM Shareef Jalloq
> >> >>> >> >>>> <shareef(a)jalloq.co.uk> wrote:
> >> >>> >> >>>>
> >> >>> >> >>>> Right, still down. I've run virsh and it doesn't know
> >> >>> >> >>>> anything about the engine vm.
> >> >>> >> >>>>
> >> >>> >> >>>>
> >> >>> >> >>>>
> >> >>> >> >>>> I've restarted the broker and agent services and I still get
> >> >>> >> >>>> nothing in virsh->list.
> >> >>> >> >>>>
> >> >>> >> >>>>
> >> >>> >> >>>>
> >> >>> >> >>>> In the logs under /var/log/ovirt-hosted-engine-ha I see lots
> >> >>> >> >>>> of errors:
> >> >>> >> >>>>
> >> >>> >> >>>>
> >> >>> >> >>>>
> >> >>> >> >>>> broker.log:
> >> >>> >> >>>>
> >> >>> >> >>>>
> >> >>> >> >>>>
> >> >>> >> >>>> MainThread::INFO::2020-04-08
> >> >>> >> >>>> 20:56:20,138::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run)
> >> >>> >> >>>> ovirt-hosted-engine-ha broker 2.3.6 started
> >> >>> >> >>>>
> >> >>> >> >>>> MainThread::INFO::2020-04-08
> >> >>> >> >>>> 20:56:20,138::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >> >>> >> >>>> Searching for submonitors in
> >> >>> >> >>>> /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors
> >> >>> >> >>>>
> >> >>> >> >>>> MainThread::INFO::2020-04-08
> >> >>> >> >>>> 20:56:20,138::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >> >>> >> >>>> Loaded submonitor network
> >> >>> >> >>>>
> >> >>> >> >>>> MainThread::INFO::2020-04-08
> >> >>> >> >>>> 20:56:20,140::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >> >>> >> >>>> Loaded submonitor cpu-load-no-engine
> >> >>> >> >>>>
> >> >>> >> >>>> MainThread::INFO::2020-04-08
> >> >>> >> >>>> 20:56:20,140::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >> >>> >> >>>> Loaded submonitor mgmt-bridge
> >> >>> >> >>>>
> >> >>> >> >>>> MainThread::INFO::2020-04-08
> >> >>> >> >>>> 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >> >>> >> >>>> Loaded submonitor network
> >> >>> >> >>>>
> >> >>> >> >>>> MainThread::INFO::2020-04-08
> >> >>> >> >>>> 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >> >>> >> >>>> Loaded submonitor cpu-load
> >> >>> >> >>>>
> >> >>> >> >>>> MainThread::INFO::2020-04-08
> >> >>> >> >>>> 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >> >>> >> >>>> Loaded submonitor engine-health
> >> >>> >> >>>>
> >> >>> >> >>>> MainThread::INFO::2020-04-08
> >> >>> >> >>>> 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >> >>> >> >>>> Loaded submonitor mgmt-bridge
> >> >>> >> >>>>
> >> >>> >> >>>> MainThread::INFO::2020-04-08
> >> >>> >> >>>> 20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >> >>> >> >>>> Loaded submonitor cpu-load-no-engine
> >> >>> >> >>>>
> >> >>> >> >>>> MainThread::INFO::2020-04-08
> >> >>> >> >>>> 20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >> >>> >> >>>> Loaded submonitor cpu-load
> >> >>> >> >>>>
> >> >>> >> >>>> MainThread::INFO::2020-04-08
> >> >>> >> >>>> 20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >> >>> >> >>>> Loaded submonitor mem-free
> >> >>> >> >>>>
> >> >>> >> >>>> MainThread::INFO::2020-04-08
> >> >>> >> >>>> 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >> >>> >> >>>> Loaded submonitor storage-domain
> >> >>> >> >>>>
> >> >>> >> >>>> MainThread::INFO::2020-04-08
> >> >>> >> >>>> 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >> >>> >> >>>> Loaded submonitor storage-domain
> >> >>> >> >>>>
> >> >>> >> >>>> MainThread::INFO::2020-04-08
> >> >>> >> >>>> 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >> >>> >> >>>> Loaded submonitor mem-free
> >> >>> >> >>>>
> >> >>> >> >>>> MainThread::INFO::2020-04-08
> >> >>> >> >>>> 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >> >>> >> >>>> Loaded submonitor engine-health
> >> >>> >> >>>>
> >> >>> >> >>>> MainThread::INFO::2020-04-08
> >> >>> >> >>>> 20:56:20,143::monitor::50::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >> >>> >> >>>> Finished loading submonitors
> >> >>> >> >>>>
> >> >>> >> >>>> MainThread::INFO::2020-04-08
> >> >>> >> >>>> 20:56:20,197::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect)
> >> >>> >> >>>> Connecting the storage
> >> >>> >> >>>>
> >> >>> >> >>>> MainThread::INFO::2020-04-08
> >> >>> >> >>>> 20:56:20,197::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
> >> >>> >> >>>> Connecting storage server
> >> >>> >> >>>>
> >> >>> >> >>>> MainThread::INFO::2020-04-08
> >> >>> >> >>>> 20:56:20,414::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
> >> >>> >> >>>> Connecting storage server
> >> >>> >> >>>>
> >> >>> >> >>>> MainThread::INFO::2020-04-08
> >> >>> >> >>>> 20:56:20,628::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
> >> >>> >> >>>> Refreshing the storage domain
> >> >>> >> >>>>
> >> >>> >> >>>> MainThread::WARNING::2020-04-08
> >> >>> >> >>>> 20:56:21,057::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__)
> >> >>> >> >>>> Can't connect vdsm storage: Command StorageDomain.getInfo with
> >> >>> >> >>>> args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'}
> >> >>> >> >>>> failed:
> >> >>> >> >>>>
> >> >>> >> >>>> (code=350, message=Error in storage domain action:
> >> >>> >> >>>> (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
> >> >>> >> >>>>
> >> >>> >> >>>> MainThread::INFO::2020-04-08
> >> >>> >> >>>> 20:56:21,901::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run)
> >> >>> >> >>>> ovirt-hosted-engine-ha broker 2.3.6 started
> >> >>> >> >>>>
> >> >>> >> >>>> MainThread::INFO::2020-04-08
> >> >>> >> >>>> 20:56:21,901::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
> >> >>> >> >>>> Searching for submonitors in
> >> >>> >> >>>> /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors
> >> >>> >> >>>>
> >> >>> >> >>>>
> >> >>> >> >>>>
> >> >>> >> >>>> agent.log:
> >> >>> >> >>>>
> >> >>> >> >>>>
> >> >>> >> >>>>
> >> >>> >> >>>> MainThread::ERROR::2020-04-08
> >> >>> >> >>>> 20:57:00,799::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
> >> >>> >> >>>> Trying to restart agent
> >> >>> >> >>>>
> >> >>> >> >>>> MainThread::INFO::2020-04-08
> >> >>> >> >>>> 20:57:00,799::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run)
> >> >>> >> >>>> Agent shutting down
> >> >>> >> >>>>
> >> >>> >> >>>> MainThread::INFO::2020-04-08
> >> >>> >> >>>> 20:57:11,144::agent::67::ovirt_hosted_engine_ha.agent.agent.Agent::(run)
> >> >>> >> >>>> ovirt-hosted-engine-ha agent 2.3.6 started
> >> >>> >> >>>>
> >> >>> >> >>>> MainThread::INFO::2020-04-08
> >> >>> >> >>>> 20:57:11,182::hosted_engine::234::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname)
> >> >>> >> >>>> Found certificate common name: ovirt-node-01.phoelex.com
> >> >>> >> >>>>
> >> >>> >> >>>> MainThread::INFO::2020-04-08
> >> >>> >> >>>> 20:57:11,294::hosted_engine::543::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker)
> >> >>> >> >>>> Initializing ha-broker connection
> >> >>> >> >>>>
> >> >>> >> >>>> MainThread::INFO::2020-04-08
> >> >>> >> >>>> 20:57:11,296::brokerlink::80::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
> >> >>> >> >>>> Starting monitor network, options {'tcp_t_address': '',
> >> >>> >> >>>> 'network_test': 'dns', 'tcp_t_port': '', 'addr': '192.168.1.99'}
> >> >>> >> >>>>
> >> >>> >> >>>> MainThread::ERROR::2020-04-08
> >> >>> >> >>>> 20:57:11,296::hosted_engine::559::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker)
> >> >>> >> >>>> Failed to start necessary monitors
> >> >>> >> >>>>
> >> >>> >> >>>> MainThread::ERROR::2020-04-08
> >> >>> >> >>>> 20:57:11,297::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
> >> >>> >> >>>> Traceback (most recent call last):
> >> >>> >> >>>>
> >> >>> >> >>>> File
> >> >>> >> >>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py",
> >> >>> >> >>>> line 131, in _run_agent
> >> >>> >> >>>>
> >> >>> >> >>>> return action(he)
> >> >>> >> >>>>
> >> >>> >> >>>> File
> >> >>> >> >>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py",
> >> >>> >> >>>> line 55, in action_proper
> >> >>> >> >>>>
> >> >>> >> >>>> return he.start_monitoring()
> >> >>> >> >>>>
> >> >>> >> >>>> File
> >> >>> >> >>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
> >> >>> >> >>>> line 432, in start_monitoring
> >> >>> >> >>>>
> >> >>> >> >>>> self._initialize_broker()
> >> >>> >> >>>>
> >> >>> >> >>>> File
> >> >>> >> >>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
> >> >>> >> >>>> line 556, in _initialize_broker
> >> >>> >> >>>>
> >> >>> >> >>>> m.get('options', {}))
> >> >>> >> >>>>
> >> >>> >> >>>> File
> >> >>> >> >>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py",
> >> >>> >> >>>> line 89, in start_monitor
> >> >>> >> >>>>
> >> >>> >> >>>> ).format(t=type, o=options, e=e)
> >> >>> >> >>>>
> >> >>> >> >>>> RequestError: brokerlink - failed to start monitor via
> >> >>> >> >>>> ovirt-ha-broker: [Errno 2] No such file or directory, [monitor:
> >> >>> >> >>>> 'network', options: {'tcp_t_address': '', 'network_test': 'dns',
> >> >>> >> >>>> 'tcp_t_port': '', 'addr': '192.168.1.99'}]
> >> >>> >> >>>>
> >> >>> >> >>>> MainThread::ERROR::2020-04-08
> >> >>> >> >>>> 20:57:11,297::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
> >> >>> >> >>>> Trying to restart agent
> >> >>> >> >>>>
> >> >>> >> >>>> MainThread::INFO::2020-04-08
> >> >>> >> >>>> 20:57:11,297::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run)
> >> >>> >> >>>> Agent shutting down
> >> >>> >> >>>>
> >> >>> >> >>>>
> >> >>> >> >>>>
> >> >>> >> >>>> On Wed, Apr 8, 2020 at 6:10 PM Strahil Nikolov
> >> >>> >> >>>> <hunter86_bg(a)yahoo.com> wrote:
> >> >>> >> >>>>
> >> >>> >> >>>> On April 8, 2020 7:47:20 PM GMT+03:00, "Maton, Brett" <
> >> >>> >> >>>> matonb(a)ltresources.co.uk> wrote:
> >> >>> >> >>>> >On the host you tried to restart the engine on:
> >> >>> >> >>>> >
> >> >>> >> >>>> >Add an alias to virsh (authenticates with virsh_auth.conf)
> >> >>> >> >>>> >
> >> >>> >> >>>> >alias virsh='virsh -c
> >> >>> >> >>>> >qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf'
> >> >>> >> >>>> >
> >> >>> >> >>>> >Then run virsh:
> >> >>> >> >>>> >
> >> >>> >> >>>> >virsh
> >> >>> >> >>>> >
> >> >>> >> >>>> >virsh # list
> >> >>> >> >>>> > Id    Name            State
> >> >>> >> >>>> >----------------------------------------------------
> >> >>> >> >>>> > xx    HostedEngine    Paused
> >> >>> >> >>>> > xx    **********      running
> >> >>> >> >>>> > ...
> >> >>> >> >>>> > xx    **********      running
> >> >>> >> >>>> >
> >> >>> >> >>>> >HostedEngine should be in the list, try and resume the
> >> >>> >> >>>> >engine:
> >> >>> >> >>>> >
> >> >>> >> >>>> >virsh # resume HostedEngine
> >> >>> >> >>>> >
> >> >>> >> >>>> >On Wed, 8 Apr 2020 at 17:28, Shareef Jalloq
> >> >>> >> >>>> ><shareef(a)jalloq.co.uk> wrote:
> >> >>> >> >>>> >
> >> >>> >> >>>> >> Thanks!
> >> >>> >> >>>> >>
> >> >>> >> >>>> >> The status hangs due to, I guess, the VM being down....
> >> >>> >> >>>> >>
> >> >>> >> >>>> >> [root@ovirt-node-01 ~]# hosted-engine --vm-start
> >> >>> >> >>>> >> VM exists and is down, cleaning up and restarting
> >> >>> >> >>>> >> VM in WaitForLaunch
> >> >>> >> >>>> >>
> >> >>> >> >>>> >> but this doesn't seem to do anything. OK, after a while
> >> >>> >> >>>> >> I get a status of it being barfed...
> >> >>> >> >>>> >>
> >> >>> >> >>>> >> --== Host ovirt-node-00.phoelex.com (id: 1) status ==--
> >> >>> >> >>>> >>
> >> >>> >> >>>> >> conf_on_shared_storage             : True
> >> >>> >> >>>> >> Status up-to-date                  : False
> >> >>> >> >>>> >> Hostname                           : ovirt-node-00.phoelex.com
> >> >>> >> >>>> >> Host ID                            : 1
> >> >>> >> >>>> >> Engine status                      : unknown stale-data
> >> >>> >> >>>> >> Score                              : 3400
> >> >>> >> >>>> >> stopped                            : False
> >> >>> >> >>>> >> Local maintenance                  : False
> >> >>> >> >>>> >> crc32                              : 9c4a034b
> >> >>> >> >>>> >> local_conf_timestamp               : 523362
> >> >>> >> >>>> >> Host timestamp                     : 523608
> >> >>> >> >>>> >> Extra metadata (valid at timestamp):
> >> >>> >> >>>> >> metadata_parse_version=1
> >> >>> >> >>>> >> metadata_feature_version=1
> >> >>> >> >>>> >> timestamp=523608 (Wed Apr 8 16:17:11 2020)
> >> >>> >> >>>> >> host-id=1
> >> >>> >> >>>> >> score=3400
> >> >>> >> >>>> >> vm_conf_refresh_time=523362 (Wed Apr 8 16:13:06 2020)
> >> >>> >> >>>> >> conf_on_shared_storage=True
> >> >>> >> >>>> >> maintenance=False
> >> >>> >> >>>> >> state=EngineDown
> >> >>> >> >>>> >> stopped=False
> >> >>> >> >>>> >>
> >> >>> >> >>>> >>
> >> >>> >> >>>> >> --== Host ovirt-node-01.phoelex.com (id: 2) status ==--
> >> >>> >> >>>> >>
> >> >>> >> >>>> >> conf_on_shared_storage             : True
> >> >>> >> >>>> >> Status up-to-date                  : True
> >> >>> >> >>>> >> Hostname                           : ovirt-node-01.phoelex.com
> >> >>> >> >>>> >> Host ID                            : 2
> >> >>> >> >>>> >> Engine status                      : {"reason": "bad vm status",
> >> >>> >> >>>> >> "health": "bad", "vm": "down_unexpected", "detail": "Down"}
> >> >>> >> >>>> >> Score                              : 0
> >> >>> >> >>>> >> stopped                            : False
> >> >>> >> >>>> >> Local maintenance                  : False
> >> >>> >> >>>> >> crc32                              : 5045f2eb
> >> >>> >> >>>> >> local_conf_timestamp               : 1737037
> >> >>> >> >>>> >> Host timestamp                     : 1737283
> >> >>> >> >>>> >> Extra metadata (valid at timestamp):
> >> >>> >> >>>> >> metadata_parse_version=1
> >> >>> >> >>>> >> metadata_feature_version=1
> >> >>> >> >>>> >> timestamp=1737283 (Wed Apr 8 16:16:17 2020)
> >> >>> >> >>>> >> host-id=2
> >> >>> >> >>>> >> score=0
> >> >>> >> >>>> >> vm_conf_refresh_time=1737037 (Wed Apr 8 16:12:11 2020)
> >> >>> >> >>>> >> conf_on_shared_storage=True
> >> >>> >> >>>> >> maintenance=False
> >> >>> >> >>>> >> state=EngineUnexpectedlyDown
> >> >>> >> >>>> >> stopped=False
> >> >>> >> >>>> >>
> >> >>> >> >>>> >> On Wed, Apr 8, 2020 at 5:09 PM Maton, Brett
> >> >>> >> >>>> >> <matonb(a)ltresources.co.uk> wrote:
> >> >>> >> >>>> >>
> >> >>> >> >>>> >>> First steps, on one of your hosts as root:
> >> >>> >> >>>> >>>
> >> >>> >> >>>> >>> To get information:
> >> >>> >> >>>> >>> hosted-engine --vm-status
> >> >>> >> >>>> >>>
> >> >>> >> >>>> >>> To start the engine:
> >> >>> >> >>>> >>> hosted-engine --vm-start
> >> >>> >> >>>> >>>
> >> >>> >> >>>> >>>
> >> >>> >> >>>> >>> On Wed, 8 Apr 2020 at 17:00, Shareef Jalloq
> >> >>> >> >>>> >>> <shareef(a)jalloq.co.uk> wrote:
> >> >>> >> >>>> >>>
> >> >>> >> >>>> >>>> So my engine has gone down and I can't ssh into it
> >> >>> >> >>>> >>>> either. If I try to log into the web-ui of the node it
> >> >>> >> >>>> >>>> is running on, I get redirected because the node can't
> >> >>> >> >>>> >>>> reach the engine.
> >> >>> >> >>>> >>>>
> >> >>> >> >>>> >>>> What are my next steps?
> >> >>> >> >>>> >>>>
> >> >>> >> >>>> >>>> Shareef.
> >> >>> >> >>>> >>>>
> >> >>> >> >>>> >>>> _______________________________________________
> >> >>> >> >>>> >>>> Users mailing list -- users(a)ovirt.org
> >> >>> >> >>>> >>>> To unsubscribe send an email to users-leave(a)ovirt.org
> >> >>> >> >>>> >>>> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> >> >>> >> >>>> >>>> oVirt Code of Conduct:
> >> >>> >> >>>> >>>> https://www.ovirt.org/community/about/community-guidelines/
> >> >>> >> >>>> >>>> List Archives:
> >> >>> >> >>>> >>>> https://lists.ovirt.org/archives/list/users@ovirt.org/message/W7BP57OCIRS...
> >> >>> >> >>>> >>>
> >> >>> >> >>>>
> >> >>> >> >>>> This has to be resolved:
> >> >>> >> >>>>
> >> >>> >> >>>> Engine status : unknown stale-data
> >> >>> >> >>>>
> >> >>> >> >>>> Run again 'hosted-engine --vm-status'. If it remains the
> >> >>> >> >>>> same, restart ovirt-ha-broker.service & ovirt-ha-agent.service
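> >> >>> >> >>>> For example:
> >> >>> >> >>>>
> >> >>> >> >>>> systemctl restart ovirt-ha-broker.service ovirt-ha-agent.service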
> >> >>> >> >>>>
> >> >>> >> >>>> Verify that the engine's storage is available. Then monitor
> >> >>> >> >>>> the broker & agent logs in /var/log/ovirt-hosted-engine-ha
> >> >>> >> >>>>
> >> >>> >> >>>> Best Regards,
> >> >>> >> >>>> Strahil Nikolov
> >> >>> >> >>>>
> >> >>> >> >>>>
> >> >>> >> >>>>
> >> >>> >> >>>>
> >> >>> >>
> >> >>> >> Hi Shareef,
> >> >>> >>
> >> >>> >> The flow of activation in oVirt is more complex than plain KVM.
> >> >>> >> Mounting of the domains happens during the activation of the
> >> >>> >> node (the HostedEngine is activating everything needed).
> >> >>> >>
> >> >>> >> Focus on the HostedEngine VM.
> >> >>> >> Is it running properly ?
> >> >>> >>
> >> >>> >> If not,try:
> >> >>> >> 1. Verify that the storage domain exists
> >> >>> >> 2. Check if it has an 'ha_agent' directory
> >> >>> >> 3. Check if the links are OK; if not, you can safely remove the
> >> >>> >> links
> >> >>> >>
> >> >>> >> 4. Next check the services are running:
> >> >>> >> A) sanlock
> >> >>> >> B) supervdsmd
> >> >>> >> C) vdsmd
> >> >>> >> D) libvirtd
> >> >>> >>
> >> >>> >> 5. Increase the log level for the broker and agent services:
> >> >>> >>
> >> >>> >> cd /etc/ovirt-hosted-engine-ha
> >> >>> >> vim *-log.conf
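> >> >>> >> (i.e. set level=DEBUG in the handler sections, as in the
> >> >>> >> [handler_logfile] snippet quoted earlier in this thread)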
> >> >>> >>
> >> >>> >> systemctl restart ovirt-ha-broker ovirt-ha-agent
> >> >>> >>
> >> >>> >> 6. Check what they are complaining about
> >> >>> >> Keep in mind that the agent will keep throwing errors until the
> >> >>> >> broker stops doing it (the agent depends on the broker), so the
> >> >>> >> broker must be OK before proceeding with the agent log.
> >> >>> >>
> >> >>> >> About the manual VM start, you need 2 things:
> >> >>> >>
> >> >>> >> 1. Define the VM network
> >> >>> >> # cat vdsm-ovirtmgmt.xml
> >> >>> >> <network>
> >> >>> >>   <name>vdsm-ovirtmgmt</name>
> >> >>> >>   <uuid>8ded486e-e681-4754-af4b-5737c2b05405</uuid>
> >> >>> >>   <forward mode='bridge'/>
> >> >>> >>   <bridge name='ovirtmgmt'/>
> >> >>> >> </network>
> >> >>> >>
> >> >>> >> [root@ovirt1 HostedEngine-RECOVERY]# virsh net-define
> >> >>> >> vdsm-ovirtmgmt.xml
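> >> >>> >> (and start it if it doesn't come up automatically:
> >> >>> >> virsh net-start vdsm-ovirtmgmt)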
> >> >>> >>
> >> >>> >> 2. Get an XML definition, which can be found in the vdsm log.
> >> >>> >> Every VM at startup has its configuration printed out in the
> >> >>> >> vdsm log on the host it starts on.
> >> >>> >> Save to file and then:
> >> >>> >> A) virsh define myvm.xml
> >> >>> >> B) virsh start myvm
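> >> >>> >>
> >> >>> >> For example, assuming the default vdsm log location, you can find
> >> >>> >> the definition with something like:
> >> >>> >>
> >> >>> >> grep -n '<domain' /var/log/vdsm/vdsm.log
> >> >>> >>
> >> >>> >> and copy everything from that '<domain' line down to the matching
> >> >>> >> '</domain>' into myvm.xml.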
> >> >>> >>
> >> >>> >> It seems there is/was a problem with your NFS shares.
> >> >>> >>
> >> >>> >>
> >> >>> >> Best Regards,
> >> >>> >> Strahil Nikolov
> >> >>> >>
> >> >>>
> >> >>> Hey Shareef,
> >> >>>
> >> >>> Check if there are any files or folders not owned by vdsm:kvm.
> >> >>> Something like this:
> >> >>>
> >> >>> find . -not -user 36 -not -group 36 -print
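> >> >>>
> >> >>> If anything shows up, ownership can be fixed with something like
> >> >>> (36:36 is vdsm:kvm):
> >> >>>
> >> >>> chown 36:36 <path>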
> >> >>>
> >> >>> Also check if vdsm can access the images in the
> >> >>> '<vol-mount-point>/images' directories.
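> >> >>>
> >> >>> For example, something like:
> >> >>>
> >> >>> sudo -u vdsm ls -l '<vol-mount-point>/images'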
> >> >>>
> >> >>> Best Regards,
> >> >>> Strahil Nikolov
> >> >>>
> >> >>
> >>
> >> And the IPv6 address '64:ff9b::c0a8:13d'?
> >>
> >> I don't see it in the log output.
> >>
> >> Best Regards,
> >> Strahil Nikolov
> >>
>
> Based on your output, you got a DNS record for both IPv4 & IPv6 ... most
> probably that's the reason.
>
> Set the IPv6 on the interface and try again.
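>
> For example (the /64 prefix length is a guess - adjust to your network):
>
> ip addr add 64:ff9b::c0a8:13d/64 dev ovirtmgmt
>
> Alternatively, if you don't use IPv6 at all, drop the AAAA/DNS64 record
> so that the FQDN resolves only to 192.168.1.61.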
>
> Best Regards,
> Strahil Nikolov
>