On April 15, 2020 2:28:05 PM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:

Oh this is painful. It seems to progress if you both set he_force_ipv4 and run
the deployment with the '--4' switch.

But then I get a failure when the Ansible script checks for firewalld zones
and doesn't get anything back. Shouldn't the deployment flow be setting up any
zones it needs?
2020-04-15 10:57:25,439+0000 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 TASK [ovirt.hosted_engine_setup : Get active list of active firewalld zones]
2020-04-15 10:57:26,641+0000 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:103 {u'stderr_lines': [], u'changed': True, u'end': u'2020-04-15 10:57:26.481202', u'_ansible_no_log': False, u'stdout': u'', u'cmd': u'set -euo pipefail && firewall-cmd --get-active-zones | grep -v "^\\s*interfaces"', u'start': u'2020-04-15 10:57:26.050203', u'delta': u'0:00:00.430999', u'stderr': u'', u'rc': 1, u'invocation': {u'module_args': {u'creates': None, u'executable': None, u'_uses_shell': True, u'strip_empty_ends': True, u'_raw_params': u'set -euo pipefail && firewall-cmd --get-active-zones | grep -v "^\\s*interfaces"', u'removes': None, u'argv': None, u'warn': True, u'chdir': None, u'stdin_add_newline': True, u'stdin': None}}, u'stdout_lines': [], u'msg': u'non-zero return code'}
2020-04-15 10:57:26,741+0000 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:107 fatal: [localhost]: FAILED! => {"changed": true, "cmd": "set -euo pipefail && firewall-cmd --get-active-zones | grep -v \"^\\s*interfaces\"", "delta": "0:00:00.430999", "end": "2020-04-15 10:57:26.481202", "msg": "non-zero return code", "rc": 1, "start": "2020-04-15 10:57:26.050203", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
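For reference, that task fails whenever 'firewall-cmd --get-active-zones' prints no zones: grep then sees empty input and exits 1 under pipefail, which usually means firewalld is stopped or no interface is bound to a zone. A minimal manual check before re-running the deployment, assuming firewalld is installed and the management bridge is named ovirtmgmt:

    # Verify firewalld is running; start it if not.
    systemctl is-active firewalld || systemctl start firewalld

    # Same pipeline the Ansible task runs; rc=1 means "no active zones".
    firewall-cmd --get-active-zones | grep -v "^\s*interfaces"

    # If nothing is active, bind the management bridge to a zone and reload.
    # ('public' and 'ovirtmgmt' are assumptions; use your zone/interface.)
    firewall-cmd --permanent --zone=public --change-interface=ovirtmgmt
    firewall-cmd --reload
    firewall-cmd --get-active-zones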
On Wed, Apr 15, 2020 at 10:23 AM Shareef Jalloq <shareef@jalloq.co.uk>
wrote:

> Ha, spoke too soon. It's now stuck in a loop, and a Google search points
> me at
> https://bugzilla.redhat.com/show_bug.cgi?id=1746585
>
> However, forcing IPv4 doesn't seem to have fixed the loop.
>
> On Wed, Apr 15, 2020 at 9:59 AM Shareef Jalloq <shareef@jalloq.co.uk>
> wrote:
>
>> OK, that seems to have fixed it, thanks. Is this a side effect of
>> redeploying the HE over a first-time install? Nothing has changed in our
>> setup, and I didn't need to do this when I initially set up our nodes.
>>
>>
>>
>> On Tue, Apr 14, 2020 at 6:55 PM Strahil Nikolov <hunter86_bg@yahoo.com>
>> wrote:
>>
>>> On April 14, 2020 6:17:17 PM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:
>>> >Hmmm, we're not using IPv6. Is that the issue?
>>> >
>>> >On Tue, Apr 14, 2020 at 3:56 PM Strahil Nikolov <hunter86_bg@yahoo.com>
>>> >wrote:
>>> >
>>> >> On April 14, 2020 1:27:24 PM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:
>>> >> >Right, I've given up on recovering the HE, so I want to try and
>>> >> >redeploy it. There doesn't seem to be enough information to debug
>>> >> >why the broker/agent won't start cleanly.
>>> >> >
>>> >> >Running 'hosted-engine --deploy', I'm seeing the following error in
>>> >> >the setup validation phase:
>>> >> >
>>> >> >2020-04-14 09:46:08,922+0000 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND    Please provide the hostname of this host on the management network [ovirt-node-00.phoelex.com]:
>>> >> >
>>> >> >2020-04-14 09:46:12,831+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname.getResolvedAddresses:432 getResolvedAddresses: set(['64:ff9b::c0a8:13d', '192.168.1.61'])
>>> >> >
>>> >> >2020-04-14 09:46:12,832+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname._validateFQDNresolvability:289 ovirt-node-00.phoelex.com resolves to: set(['64:ff9b::c0a8:13d', '192.168.1.61'])
>>> >> >
>>> >> >2020-04-14 09:46:12,832+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:813 execute: ['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com', 'ANY'], executable='None', cwd='None', env=None
>>> >> >
>>> >> >2020-04-14 09:46:12,871+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:863 execute-result: ['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com', 'ANY'], rc=0
>>> >> >
>>> >> >2020-04-14 09:46:12,872+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:921 execute-output: ['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com', 'ANY'] stdout:
>>> >> >
>>> >> >ovirt-node-00.phoelex.com. 86400 IN A 192.168.1.61
>>> >> >
>>> >> >2020-04-14 09:46:12,872+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:926 execute-output: ['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com', 'ANY'] stderr:
>>> >> >
>>> >> >2020-04-14 09:46:12,872+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:813 execute: ('/usr/sbin/ip', 'addr'), executable='None', cwd='None', env=None
>>> >> >
>>> >> >2020-04-14 09:46:12,876+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:863 execute-result: ('/usr/sbin/ip', 'addr'), rc=0
>>> >> >
>>> >> >2020-04-14 09:46:12,876+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:921 execute-output: ('/usr/sbin/ip', 'addr') stdout:
>>> >> >
>>> >> >1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
>>> >> >    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>>> >> >    inet 127.0.0.1/8 scope host lo
>>> >> >       valid_lft forever preferred_lft forever
>>> >> >    inet6 ::1/128 scope host
>>> >> >       valid_lft forever preferred_lft forever
>>> >> >2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovirtmgmt state UP group default qlen 1000
>>> >> >    link/ether ac:1f:6b:bc:32:6a brd ff:ff:ff:ff:ff:ff
>>> >> >3: eno2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
>>> >> >    link/ether ac:1f:6b:bc:32:6b brd ff:ff:ff:ff:ff:ff
>>> >> >4: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
>>> >> >    link/ether 02:e6:e2:80:93:8d brd ff:ff:ff:ff:ff:ff
>>> >> >5: br-int: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
>>> >> >    link/ether 8a:26:44:50:ee:4a brd ff:ff:ff:ff:ff:ff
>>> >> >21: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
>>> >> >    link/ether ac:1f:6b:bc:32:6a brd ff:ff:ff:ff:ff:ff
>>> >> >    inet 192.168.1.61/24 brd 192.168.1.255 scope global ovirtmgmt
>>> >> >       valid_lft forever preferred_lft forever
>>> >> >    inet6 fe80::ae1f:6bff:febc:326a/64 scope link
>>> >> >       valid_lft forever preferred_lft forever
>>> >> >22: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
>>> >> >    link/ether 3a:02:7b:7d:b3:2a brd ff:ff:ff:ff:ff:ff
>>> >> >
>>> >> >
>>> >> >2020-04-14 09:46:12,876+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:926 execute-output: ('/usr/sbin/ip', 'addr') stderr:
>>> >> >
>>> >> >2020-04-14 09:46:12,877+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname.getLocalAddresses:251 addresses: [u'192.168.1.61', u'fe80::ae1f:6bff:febc:326a']
>>> >> >
>>> >> >2020-04-14 09:46:12,877+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname.test_hostname:464 test_hostname exception
>>> >> >
>>> >> >Traceback (most recent call last):
>>> >> >  File "/usr/lib/python2.7/site-packages/ovirt_setup_lib/hostname.py", line 460, in test_hostname
>>> >> >    not_local_text,
>>> >> >  File "/usr/lib/python2.7/site-packages/ovirt_setup_lib/hostname.py", line 342, in _validateFQDNresolvability
>>> >> >    addresses=resolvedAddressesAsString
>>> >> >RuntimeError: ovirt-node-00.phoelex.com resolves to 64:ff9b::c0a8:13d 192.168.1.61 and not all of them can be mapped to non loopback devices on this host
>>> >> >
>>> >> >2020-04-14 09:46:12,884+0000 ERROR otopi.plugins.gr_he_common.network.bridge dialog.queryEnvKey:120 Host name is not valid: ovirt-node-00.phoelex.com resolves to 64:ff9b::c0a8:13d 192.168.1.61 and not all of them can be mapped to non loopback devices on this host
>>> >> >
>>> >> >The node I'm running on has an IP address of .61 and resolves correctly.
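For what it's worth, 64:ff9b::/96 in that traceback is the well-known NAT64 prefix, so the AAAA record was almost certainly synthesized by a DNS64 resolver rather than configured anywhere. A quick way to compare what DNS returns with what the host actually has — a sketch using standard tools:

    # What DNS returns (the AAAA here is likely DNS64-synthesized):
    dig +short A    ovirt-node-00.phoelex.com
    dig +short AAAA ovirt-node-00.phoelex.com

    # What the libc resolver (the installer's code path) sees:
    getent ahosts ovirt-node-00.phoelex.com

    # Addresses actually present on non-loopback interfaces:
    ip -o addr show up scope global

    # One workaround is pinning the IPv4 address in /etc/hosts so the
    # synthesized AAAA never reaches the installer.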
>>> >> >
>>> >> >On Fri, Apr 10, 2020 at 12:55 PM Shareef Jalloq <shareef@jalloq.co.uk>
>>> >> >wrote:
>>> >> >
>>> >> >> Where should I be checking if there are any files or folders not
>>> >> >> owned by vdsm:kvm? I checked on the mount the HA sits on and it's
>>> >> >> fine.
>>> >> >>
>>> >> >> How would I go about checking that vdsm can access those images? If
>>> >> >> I run virsh, it lists them, and they were running yesterday even
>>> >> >> though the HA was down. I've since restarted both hosts, but the
>>> >> >> broker is still spitting out the same error (copied below). How do I
>>> >> >> find the reason the broker can't connect to the storage? The conf
>>> >> >> file is already at DEBUG verbosity:
>>> >> >>
>>> >> >> [handler_logfile]
>>> >> >> class=logging.handlers.TimedRotatingFileHandler
>>> >> >> args=('/var/log/ovirt-hosted-engine-ha/broker.log', 'd', 1, 7)
>>> >> >> level=DEBUG
>>> >> >> formatter=long
>>> >> >>
>>> >> >> And what are all these .prob-<num> files that are being created?
>>> >> >> There are over 250K of them now on the mount I'm using for the Data
>>> >> >> domain. They're all of size 0 and of the form
>>> >> >> /rhev/data-center/mnt/nas-01.phoelex.com:_volume2_vmstore/.prob-ffa867da-93db-4211-82df-b1b04a625ab9
>>> >> >>
>>> >> >> @eevans: The volume I have the Data Domain on has TBs free. The HA
>>> >> >> is dead, so I can't ssh in. No idea what started these errors, and
>>> >> >> the other VMs were still running happily, although they're on a
>>> >> >> different Data Domain.
>>> >> >>
>>> >> >> Shareef.
>>> >> >>
>>> >> >> MainThread::INFO::2020-04-10 07:45:00,408::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect) Connecting the storage
>>> >> >> MainThread::INFO::2020-04-10 07:45:00,408::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
>>> >> >> MainThread::INFO::2020-04-10 07:45:01,577::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
>>> >> >> MainThread::INFO::2020-04-10 07:45:02,692::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Refreshing the storage domain
>>> >> >> MainThread::WARNING::2020-04-10 07:45:05,175::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed:
>>> >> >> (code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
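Since the broker is only relaying a vdsm failure here, it can help to issue the same call directly and then read vdsm's own log. A sketch — the vdsm-client syntax is an assumption based on the 4.3-era tooling:

    # Re-run the failing call outside the broker:
    vdsm-client StorageDomain getInfo storagedomainID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2

    # Then look for the matching sdUUID in vdsm's own log for the root cause:
    grep 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2' /var/log/vdsm/vdsm.log | tail -20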
>>> >> >>
>>> >> >> On Thu, Apr 9, 2020 at 5:58 PM Strahil Nikolov <hunter86_bg@yahoo.com>
>>> >> >> wrote:
>>> >> >>
>>> >> >>> On April 9, 2020 11:12:30 AM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:
>>> >> >>> >OK, let's go through this. I'm looking at the node that at least
>>> >> >>> >still has some VMs running. virsh also tells me that the
>>> >> >>> >HostedEngine VM is running, but it's unresponsive and I can't shut
>>> >> >>> >it down.
>>> >> >>> >
>>> >> >>> >1. All storage domains exist and are mounted.
>>> >> >>> >2. The ha_agent exists:
>>> >> >>> >
>>> >> >>> >[root@ovirt-node-01 ovirt-hosted-engine-ha]# ls /rhev/data-center/mnt/nas-01.phoelex.com\:_volume2_vmstore/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/
>>> >> >>> >
>>> >> >>> >dom_md  ha_agent  images  master
>>> >> >>> >
>>> >> >>> >3. There are two links:
>>> >> >>> >
>>> >> >>> >[root@ovirt-node-01 ovirt-hosted-engine-ha]# ll /rhev/data-center/mnt/nas-01.phoelex.com\:_volume2_vmstore/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/ha_agent/
>>> >> >>> >
>>> >> >>> >total 8
>>> >> >>> >lrwxrwxrwx. 1 vdsm kvm 132 Apr  2 14:50 hosted-engine.lockspace -> /var/run/vdsm/storage/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/ffb90b82-42fe-4253-85d5-aaec8c280aaf/90e68791-0c6f-406a-89ac-e0d86c631604
>>> >> >>> >lrwxrwxrwx. 1 vdsm kvm 132 Apr  2 14:50 hosted-engine.metadata -> /var/run/vdsm/storage/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/2161aed0-7250-4c1d-b667-ac94f60af17e/6b818e33-f80a-48cc-a59c-bba641e027d4
>>> >> >>> >
>>> >> >>> >4. The services exist but all seem to have some sort of warning:
>>> >> >>> >
>>> >> >>> >a) Apr 08 18:10:55 ovirt-node-01.phoelex.com sanlock[1728]: 2020-04-08 18:10:55 1744152 [36796]: s16 delta_renew long write time 10 sec
>>> >> >>> >
>>> >> >>> >b) Mar 23 18:02:59 ovirt-node-01.phoelex.com supervdsmd[29409]: failed to load module nvdimm: libbd_nvdimm.so.2: cannot open shared object file: No such file or directory
>>> >> >>> >
>>> >> >>> >c) Apr 09 08:05:13 ovirt-node-01.phoelex.com vdsm[4801]: ERROR failed to retrieve Hosted Engine HA score '[Errno 2] No such file or directory' Is the Hosted Engine setup finished?
>>> >> >>> >
>>> >> >>> >d) Apr 08 22:48:27 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-08 22:48:27.134+0000: 29309: warning : qemuGetProcessInfo:1404 : cannot parse process status data
>>> >> >>> >
>>> >> >>> >Apr 08 22:48:27 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-08 22:48:27.134+0000: 29309: error : virNetDevTapInterfaceStats:764 : internal error: /proc/net/dev: Interface not found
>>> >> >>> >
>>> >> >>> >Apr 08 23:09:39 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-08 23:09:39.844+0000: 29307: error : virNetSocketReadWire:1806 : End of file while reading data: Input/output error
>>> >> >>> >
>>> >> >>> >Apr 09 01:05:26 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-09 01:05:26.660+0000: 29307: error : virNetSocketReadWire:1806 : End of file while reading data: Input/output error
>>> >> >>> >
>>> >> >>> >5 & 6. The broker log is continually printing this error:
>>> >> >>> >
>>> >> >>> >MainThread::INFO::2020-04-09 08:07:31,438::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.3.6 started
>>> >> >>> >MainThread::DEBUG::2020-04-09 08:07:31,438::broker::55::ovirt_hosted_engine_ha.broker.broker.Broker::(run) Running broker
>>> >> >>> >MainThread::DEBUG::2020-04-09 08:07:31,438::broker::120::ovirt_hosted_engine_ha.broker.broker.Broker::(_get_monitor) Starting monitor
>>> >> >>> >MainThread::INFO::2020-04-09 08:07:31,438::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Searching for submonitors in /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors
>>> >> >>> >MainThread::INFO::2020-04-09 08:07:31,439::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
>>> >> >>> >MainThread::INFO::2020-04-09 08:07:31,440::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
>>> >> >>> >MainThread::INFO::2020-04-09 08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
>>> >> >>> >MainThread::INFO::2020-04-09 08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
>>> >> >>> >MainThread::INFO::2020-04-09 08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
>>> >> >>> >MainThread::INFO::2020-04-09 08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
>>> >> >>> >MainThread::INFO::2020-04-09 08:07:31,442::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
>>> >> >>> >MainThread::INFO::2020-04-09 08:07:31,442::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
>>> >> >>> >MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
>>> >> >>> >MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
>>> >> >>> >MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
>>> >> >>> >MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
>>> >> >>> >MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
>>> >> >>> >MainThread::INFO::2020-04-09 08:07:31,444::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
>>> >> >>> >MainThread::INFO::2020-04-09 08:07:31,444::monitor::50::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Finished loading submonitors
>>> >> >>> >MainThread::DEBUG::2020-04-09 08:07:31,444::broker::128::ovirt_hosted_engine_ha.broker.broker.Broker::(_get_storage_broker) Starting storage broker
>>> >> >>> >MainThread::DEBUG::2020-04-09 08:07:31,444::storage_backends::369::ovirt_hosted_engine_ha.lib.storage_backends::(connect) Connecting to VDSM
>>> >> >>> >MainThread::DEBUG::2020-04-09 08:07:31,444::util::384::ovirt_hosted_engine_ha.lib.storage_backends::(__log_debug) Creating a new json-rpc connection to VDSM
>>> >> >>> >Client localhost:54321::DEBUG::2020-04-09 08:07:31,453::concurrent::258::root::(run) START thread <Thread(Client localhost:54321, started daemon 139992488138496)> (func=<bound method Reactor.process_requests of <yajsonrpc.betterAsyncore.Reactor object at 0x7f528acabc90>>, args=(), kwargs={})
>>> >> >>> >Client localhost:54321::DEBUG::2020-04-09 08:07:31,459::stompclient::138::yajsonrpc.protocols.stomp.AsyncClient::(_process_connected) Stomp connection established
>>> >> >>> >MainThread::DEBUG::2020-04-09 08:07:31,467::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
>>> >> >>> >MainThread::INFO::2020-04-09 08:07:31,530::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect) Connecting the storage
>>> >> >>> >MainThread::INFO::2020-04-09 08:07:31,531::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
>>> >> >>> >MainThread::DEBUG::2020-04-09 08:07:31,531::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
>>> >> >>> >MainThread::DEBUG::2020-04-09 08:07:31,534::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
>>> >> >>> >MainThread::DEBUG::2020-04-09 08:07:32,199::storage_server::158::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(_validate_pre_connected_path) Storage domain a6cea67d-dbfb-45cf-a775-b4d0d47b26f2 is not available
>>> >> >>> >MainThread::INFO::2020-04-09 08:07:32,199::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
>>> >> >>> >MainThread::DEBUG::2020-04-09 08:07:32,199::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
>>> >> >>> >MainThread::DEBUG::2020-04-09 08:07:32,814::storage_server::363::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) [{u'status': 0, u'id': u'e29cf818-5ee5-46e1-85c1-8aeefa33e95d'}]
>>> >> >>> >MainThread::INFO::2020-04-09 08:07:32,814::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Refreshing the storage domain
>>> >> >>> >MainThread::DEBUG::2020-04-09 08:07:32,815::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
>>> >> >>> >MainThread::DEBUG::2020-04-09 08:07:33,129::storage_server::420::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Error refreshing storage domain: Command StorageDomain.getStats with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed: (code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
>>> >> >>> >MainThread::DEBUG::2020-04-09 08:07:33,130::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
>>> >> >>> >MainThread::DEBUG::2020-04-09 08:07:33,795::storage_backends::208::ovirt_hosted_engine_ha.lib.storage_backends::(_get_sector_size) Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed: (code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
>>> >> >>> >MainThread::WARNING::2020-04-09 08:07:33,795::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed: (code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
>>> >> >>> >
>>> >> >>> >
>>> >> >>> >The UUID it is moaning about is indeed the one the HA sits on,
>>> >> >>> >and it is the one whose contents I listed in step 2 above.
>>> >> >>> >
>>> >> >>> >So why can't it see this domain?
>>> >> >>> >
>>> >> >>> >Thanks, Shareef.
>>> >> >>> >
>>> >> >>> >On Thu, Apr 9, 2020 at 6:12 AM Strahil Nikolov <hunter86_bg@yahoo.com>
>>> >> >>> >wrote:
>>> >> >>> >
>>> >> >>> >> On April 9, 2020 1:51:05 AM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:
>>> >> >>> >> >Don't know if this is useful or not, but I just tried to shut
>>> >> >>> >> >down and start another VM on one of the hosts and got the
>>> >> >>> >> >following error:
>>> >> >>> >> >
>>> >> >>> >> >virsh # start scratch
>>> >> >>> >> >
>>> >> >>> >> >error: Failed to start domain scratch
>>> >> >>> >> >
>>> >> >>> >> >error: Network not found: no network with matching name 'vdsm-ovirtmgmt'
>>> >> >>> >> >
>>> >> >>> >> >Is this not referring to the interface name, since the network
>>> >> >>> >> >is called 'ovirtmgmt'?
>>> >> >>> >> >
>>> >> >>> >> >On Wed, Apr 8, 2020 at 11:35 PM Shareef Jalloq <shareef@jalloq.co.uk>
>>> >> >>> >> >wrote:
>>> >> >>> >> >
>>> >> >>> >> >> Hmmm, virsh tells me the HE is running, but it hasn't come up
>>> >> >>> >> >> and the agent.log is full of the same errors.
>>> >> >>> >> >>
>>> >> >>> >> >> On Wed, Apr 8, 2020 at 11:31 PM Shareef Jalloq <shareef@jalloq.co.uk>
>>> >> >>> >> >> wrote:
>>> >> >>> >> >>
>>> >> >>> >> >>> Ah hah! OK, so I've managed to start it using virsh on the
>>> >> >>> >> >>> second host, but my first host is still dead.
>>> >> >>> >> >>>
>>> >> >>> >> >>> First of all, what are these 56,317 .prob- files that get
>>> >> >>> >> >>> dumped to the NFS mounts?
>>> >> >>> >> >>>
>>> >> >>> >> >>> Secondly, why doesn't the node mount the NFS directories at
>>> >> >>> >> >>> boot? Is that the issue with this particular node?
>>> >> >>> >> >>>
>>> >> >>> >> >>> On Wed, Apr 8, 2020 at 11:12 PM <eevans@digitaldatatechs.com>
>>> >> >>> >> >>> wrote:
>>> >> >>> >> >>>
>>> >> >>> >> >>>> Did you try virsh list --inactive
>>> >> >>> >> >>>>
>>> >> >>> >> >>>>
>>> >> >>> >> >>>>
>>> >> >>> >> >>>> Eric Evans
>>> >> >>> >> >>>>
>>> >> >>> >> >>>> Digital Data Services LLC.
>>> >> >>> >> >>>>
>>> >> >>> >> >>>> 304.660.9080
>>> >> >>> >> >>>>
>>> >> >>> >> >>>>
>>> >> >>> >> >>>>
>>> >> >>> >> >>>> *From:* Shareef Jalloq <shareef@jalloq.co.uk>
>>> >> >>> >> >>>> *Sent:* Wednesday, April 8, 2020 5:58 PM
>>> >> >>> >> >>>> *To:* Strahil Nikolov <hunter86_bg@yahoo.com>
>>> >> >>> >> >>>> *Cc:* Ovirt Users <users@ovirt.org>
>>> >> >>> >> >>>> *Subject:* [ovirt-users] Re: ovirt-engine unresponsive - how to rescue?
>>> >> >>> >> >>>>
>>> >> >>> >> >>>>
>>> >> >>> >> >>>>
>>> >> >>> >> >>>> I've now shut down the VMs on one host and rebooted it, but
>>> >> >>> >> >>>> the agent service doesn't start. If I run 'hosted-engine
>>> >> >>> >> >>>> --vm-status' I get:
>>> >> >>> >> >>>>
>>> >> >>> >> >>>> The hosted engine configuration has not been retrieved from
>>> >> >>> >> >>>> shared storage. Please ensure that ovirt-ha-agent is running
>>> >> >>> >> >>>> and the storage server is reachable.
>>> >> >>> >> >>>>
>>> >> >>> >> >>>> and indeed, if I list the mounts under /rhev/data-center/mnt,
>>> >> >>> >> >>>> only one of the directories is mounted. I have 3 NFS mounts:
>>> >> >>> >> >>>> one ISO Domain and two Data Domains. Only one Data Domain has
>>> >> >>> >> >>>> mounted, and this has lots of .prob files in it. So why
>>> >> >>> >> >>>> haven't the other NFS exports been mounted?
>>> >> >>> >> >>>>
>>> >> >>> >> >>>>
>>> >> >>> >> >>>>
>>> >> >>> >> >>>> Manually mounting them doesn't seem to have helped much
>>> >> >>> >> >>>> either. I can start the broker service, but the agent service
>>> >> >>> >> >>>> says no. Same error as the one in my last email.
>>> >> >>> >> >>>>
>>> >> >>> >> >>>> Shareef.
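Since the agent is what asks vdsm to connect the hosted-engine storage, a stopped agent and missing mounts reinforce each other. One way to break the cycle by hand — a sketch assuming the standard hosted-engine tooling:

    # Ask vdsm to connect the hosted-engine storage without the agent loop:
    hosted-engine --connect-storage

    # Confirm the NFS exports are now visible:
    ls /rhev/data-center/mnt/

    # Then restart the HA services and watch their logs:
    systemctl restart ovirt-ha-broker ovirt-ha-agent
    tail -f /var/log/ovirt-hosted-engine-ha/broker.log /var/log/ovirt-hosted-engine-ha/agent.log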
>>> >> >>> >> >>>>
>>> >> >>> >> >>>>
>>> >> >>> >> >>>>
>>> >> >>> >> >>>> On Wed, Apr 8, 2020 at 9:57 PM Shareef Jalloq <shareef@jalloq.co.uk>
>>> >> >>> >> >>>> wrote:
>>> >> >>> >> >>>>
>>> >> >>> >> >>>> Right, still down. I've run virsh and it doesn't know
>>> >> >>> >> >>>> anything about the engine VM.
>>> >> >>> >> >>>>
>>> >> >>> >> >>>> I've restarted the broker and agent services and I still get
>>> >> >>> >> >>>> nothing in virsh->list.
>>> >> >>> >> >>>>
>>> >> >>> >> >>>> In the logs under /var/log/ovirt-hosted-engine-ha I see lots
>>> >> >>> >> >>>> of errors:
>>> >> >>> >> >>>>
>>> >> >>> >> >>>> broker.log:
>>> >> >>> >> >>>>
>>> >> >>> >> >>>>
>>> >> >>> >> >>>>
>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 20:56:20,138::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.3.6 started
>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 20:56:20,138::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Searching for submonitors in /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors
>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 20:56:20,138::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 20:56:20,140::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 20:56:20,140::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 20:56:20,143::monitor::50::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Finished loading submonitors
>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 20:56:20,197::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect) Connecting the storage
>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 20:56:20,197::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 20:56:20,414::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 20:56:20,628::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Refreshing the storage domain
>>> >> >>> >> >>>> MainThread::WARNING::2020-04-08 20:56:21,057::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed: (code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 20:56:21,901::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.3.6 started
>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 20:56:21,901::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Searching for submonitors in /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors
>>> >> >>> >> >>>>
>>> >> >>> >> >>>>
>>> >> >>> >> >>>>
>>> >> >>> >> >>>> agent.log:
>>> >> >>> >> >>>>
>>> >> >>> >> >>>>
>>> >> >>> >> >>>>
>>> >> >>> >> >>>>
>>> >> >>> >> >>>> MainThread::ERROR::2020-04-08 20:57:00,799::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent
>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 20:57:00,799::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 20:57:11,144::agent::67::ovirt_hosted_engine_ha.agent.agent.Agent::(run) ovirt-hosted-engine-ha agent 2.3.6 started
>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 20:57:11,182::hosted_engine::234::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Found certificate common name: ovirt-node-01.phoelex.com
>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 20:57:11,294::hosted_engine::543::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Initializing ha-broker connection
>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 20:57:11,296::brokerlink::80::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor network, options {'tcp_t_address': '', 'network_test': 'dns', 'tcp_t_port': '', 'addr': '192.168.1.99'}
>>> >> >>> >> >>>> MainThread::ERROR::2020-04-08 20:57:11,296::hosted_engine::559::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Failed to start necessary monitors
>>> >> >>> >> >>>> MainThread::ERROR::2020-04-08 20:57:11,297::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Traceback (most recent call last):
>>> >> >>> >> >>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent
>>> >> >>> >> >>>>     return action(he)
>>> >> >>> >> >>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper
>>> >> >>> >> >>>>     return he.start_monitoring()
>>> >> >>> >> >>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 432, in start_monitoring
>>> >> >>> >> >>>>     self._initialize_broker()
>>> >> >>> >> >>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 556, in _initialize_broker
>>> >> >>> >> >>>>     m.get('options', {}))
>>> >> >>> >> >>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 89, in start_monitor
>>> >> >>> >> >>>>     ).format(t=type, o=options, e=e)
>>> >> >>> >> >>>> RequestError: brokerlink - failed to start monitor via ovirt-ha-broker: [Errno 2] No such file or directory, [monitor: 'network', options: {'tcp_t_address': '', 'network_test': 'dns', 'tcp_t_port': '', 'addr': '192.168.1.99'}]
>>> >> >>> >> >>>> MainThread::ERROR::2020-04-08 20:57:11,297::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent
>>> >> >>> >> >>>> MainThread::INFO::2020-04-08 20:57:11,297::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
>>> >> >>> >> >>>>
>>> >> >>> >> >>>>
>>> >> >>> >> >>>>
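The agent's [Errno 2] above usually means it cannot reach the broker's listening socket, so the broker's state is the thing to check first. A sketch — the socket directory is an assumption based on the 2.3.x defaults:

    # If the broker died early (e.g. on the storage errors above), the
    # agent sees ENOENT when it tries to connect.
    systemctl status ovirt-ha-broker ovirt-ha-agent

    # Socket location is an assumption for ovirt-hosted-engine-ha 2.3.x:
    ls -l /var/run/ovirt-hosted-engine-ha/

    # The broker must be healthy before the agent can start its monitors:
    journalctl -u ovirt-ha-broker --since "10 min ago"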
>>> >> >>> >> >>>> On Wed, Apr 8, 2020 at 6:10 PM Strahil Nikolov <hunter86_bg@yahoo.com>
>>> >> >>> >> >>>> wrote:
>>> >> >>> >> >>>>
>>> >> >>> >> >>>> On April 8, 2020 7:47:20 PM GMT+03:00, "Maton, Brett" <matonb@ltresources.co.uk> wrote:
>>> >> >>> >> >>>> >On the host you tried to restart the engine on:
>>> >> >>> >> >>>> >
>>> >> >>> >> >>>> >Add an alias to virsh (authenticates with virsh_auth.conf):
>>> >> >>> >> >>>> >
>>> >> >>> >> >>>> >alias virsh='virsh -c qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf'
>>> >> >>> >> >>>> >
>>> >> >>> >> >>>> >Then run virsh:
>>> >> >>> >> >>>> >
>>> >> >>> >> >>>> >virsh
>>> >> >>> >> >>>> >
>>> >> >>> >> >>>> >virsh # list
>>> >> >>> >> >>>> > Id   Name           State
>>> >> >>> >> >>>> >----------------------------------------------------
>>> >> >>> >> >>>> > xx   HostedEngine   Paused
>>> >> >>> >> >>>> > xx   **********     running
>>> >> >>> >> >>>> > ...
>>> >> >>> >> >>>> > xx   **********     running
>>> >> >>> >> >>>> >
>>> >> >>> >> >>>> >HostedEngine should be in the list; try to resume the engine:
>>> >> >>> >> >>>> >
>>> >> >>> >> >>>> >virsh # resume HostedEngine
>>> >> >>> >> >>>> >
>>> >> >>> >> >>>> >On Wed, 8 Apr 2020 at 17:28, Shareef Jalloq <shareef@jalloq.co.uk>
>>> >> >>> >> >>>> >wrote:
>>> >> >>> >> >>>> >
>>> >> >>> >> >>>> >> Thanks!
>>> >> >>> >> >>>> >>
>>> >> >>> >> >>>> >> The status hangs due to, I guess, the VM being down....
>>> >> >>> >> >>>> >>
>>> >> >>> >> >>>> >> [root@ovirt-node-01 ~]# hosted-engine --vm-start
>>> >> >>> >> >>>> >> VM exists and is down, cleaning up and restarting
>>> >> >>> >> >>>> >> VM in WaitForLaunch
>>> >> >>> >> >>>> >>
>>> >> >>> >> >>>> >> but this doesn't seem to do anything. OK, after a while I
>>> >> >>> >> >>>> >> get a status of it being barfed...
>>> >> >>> >> >>>> >>
>>> >> >>> >> >>>> >> --== Host ovirt-node-00.phoelex.com (id: 1) status ==--
>>> >> >>> >> >>>> >>
>>> >> >>> >> >>>> >> conf_on_shared_storage             : True
>>> >> >>> >> >>>> >> Status up-to-date                  : False
>>> >> >>> >> >>>> >> Hostname                           : ovirt-node-00.phoelex.com
>>> >> >>> >> >>>> >> Host ID                            : 1
>>> >> >>> >> >>>> >> Engine status                      : unknown stale-data
>>> >> >>> >> >>>> >> Score                              : 3400
>>> >> >>> >> >>>> >> stopped                            : False
>>> >> >>> >> >>>> >> Local maintenance                  : False
>>> >> >>> >> >>>> >> crc32                              : 9c4a034b
>>> >> >>> >> >>>> >> local_conf_timestamp               : 523362
>>> >> >>> >> >>>> >> Host timestamp                     : 523608
>>> >> >>> >> >>>> >> Extra metadata (valid at timestamp):
>>> >> >>> >> >>>> >>     metadata_parse_version=1
>>> >> >>> >> >>>> >>     metadata_feature_version=1
>>> >> >>> >> >>>> >>     timestamp=523608 (Wed Apr  8 16:17:11 2020)
>>> >> >>> >> >>>> >>     host-id=1
>>> >> >>> >> >>>> >>     score=3400
>>> >> >>> >> >>>> >>     vm_conf_refresh_time=523362 (Wed Apr  8 16:13:06 2020)
>>> >> >>> >> >>>> >>     conf_on_shared_storage=True
>>> >> >>> >> >>>> >>     maintenance=False
>>> >> >>> >> >>>> >>     state=EngineDown
>>> >> >>> >> >>>> >>     stopped=False
>>> >> >>> >> >>>> >>
>>> >> >>> >> >>>> >> --== Host ovirt-node-01.phoelex.com (id: 2) status ==--
>>> >> >>> >> >>>> >>
>>> >> >>> >> >>>> >> conf_on_shared_storage             : True
>>> >> >>> >> >>>> >> Status up-to-date                  : True
>>> >> >>> >> >>>> >> Hostname                           : ovirt-node-01.phoelex.com
>>> >> >>> >> >>>> >> Host ID                            : 2
>>> >> >>> >> >>>> >> Engine status                      : {"reason": "bad vm status", "health": "bad", "vm": "down_unexpected", "detail": "Down"}
>>> >> >>> >> >>>> >> Score                              : 0
>>> >> >>> >> >>>> >> stopped                            : False
>>> >> >>> >> >>>> >> Local maintenance                  : False
>>> >> >>> >> >>>> >> crc32                              : 5045f2eb
>>> >> >>> >> >>>> >> local_conf_timestamp               : 1737037
>>> >> >>> >> >>>> >> Host timestamp                     : 1737283
>>> >> >>> >> >>>> >> Extra metadata (valid at timestamp):
>>> >> >>> >> >>>> >>     metadata_parse_version=1
>>> >> >>> >> >>>> >>     metadata_feature_version=1
>>> >> >>> >> >>>> >>     timestamp=1737283 (Wed Apr  8 16:16:17 2020)
>>> >> >>> >> >>>> >>     host-id=2
>>> >> >>> >> >>>> >>     score=0
>>> >> >>> >> >>>> >>     vm_conf_refresh_time=1737037 (Wed Apr  8 16:12:11 2020)
>>> >> >>> >> >>>> >>     conf_on_shared_storage=True
>>> >> >>> >> >>>> >>     maintenance=False
>>> >> >>> >> >>>> >>     state=EngineUnexpectedlyDown
>>> >> >>> >> >>>> >>     stopped=False
>>> >> >>> >> >>>> >>
>>> >> >>> >> >>>> >> On Wed, Apr 8, 2020 at 5:09 PM Maton, Brett <matonb@ltresources.co.uk>
>>> >> >>> >> >>>> >> wrote:
>>> >> >>> >> >>>> >>
>>> >> >>> >> >>>> >>> First steps, on one of your hosts as root:
>>> >> >>> >> >>>> >>>
>>> >> >>> >> >>>> >>> To get information:
>>> >> >>> >> >>>> >>> hosted-engine --vm-status
>>> >> >>> >> >>>> >>>
>>> >> >>> >> >>>> >>> To start the engine:
>>> >> >>> >> >>>> >>> hosted-engine --vm-start
>>> >> >>> >> >>>> >>>
>>> >> >>> >> >>>> >>> On Wed, 8 Apr 2020 at 17:00, Shareef Jalloq <shareef@jalloq.co.uk>
>>> >> >>> >> >>>> >>> wrote:
>>> >> >>> >> >>>> >>>
>>> >> >>> >> >>>> >>>> So my engine has gone down and I can't ssh into it
>>> >> >>> >> >>>> >>>> either. If I try to log into the web UI of the node it
>>> >> >>> >> >>>> >>>> is running on, I get redirected because the node can't
>>> >> >>> >> >>>> >>>> reach the engine.
>>> >> >>> >> >>>> >>>>
>>> >> >>> >> >>>> >>>> What are my next steps?
>>> >> >>> >> >>>> >>>>
>>> >> >>> >> >>>> >>>> Shareef.
>>> >> >>> >> >>>> >>>> _______________________________________________
>>> >> >>> >> >>>> >>>> Users mailing list -- users@ovirt.org
>>> >> >>> >> >>>> >>>> To unsubscribe send an email to users-leave@ovirt.org
>>> >> >>> >> >>>> >>>> Privacy Statement: https://www.ovirt.org/privacy-policy.html
>>> >> >>> >> >>>> >>>> oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
>>> >> >>> >> >>>> >>>> List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/W7BP57OCIRS...
>>> >> >>> >> >>>> >>>>
>>> >> >>> >> >>>> >>>
>>> >> >>> >> >>>>
>>> >> >>> >> >>>> This has to be resolved:
>>> >> >>> >> >>>>
>>> >> >>> >> >>>> Engine status : unknown stale-data
>>> >> >>> >> >>>>
>>> >> >>> >> >>>> Run 'hosted-engine --vm-status' again. If it remains the
>>> >> >>> >> >>>> same, restart ovirt-ha-broker.service & ovirt-ha-agent.service.
>>> >> >>> >> >>>>
>>> >> >>> >> >>>> Verify that the engine's storage is available. Then monitor
>>> >> >>> >> >>>> the broker & agent logs in /var/log/ovirt-hosted-engine-ha.
>>> >> >>> >> >>>>
>>> >> >>> >> >>>> Best Regards,
>>> >> >>> >> >>>> Strahil Nikolov
>>> >> >>> >> >>>>
>>> >> >>> >> >>>>
>>> >> >>> >> >>>>
>>> >> >>> >> >>>>
>>> >> >>> >>
>>> >> >>> >> Hi Shareef,
>>> >> >>> >>
>>> >> >>> >> The activation flow in oVirt is more complex than plain KVM.
>>> >> >>> >> Mounting of the domains happens during the activation of the node
>>> >> >>> >> (the HostedEngine is activating everything needed).
>>> >> >>> >>
>>> >> >>> >> Focus on the HostedEngine VM.
>>> >> >>> >> Is it running properly?
>>> >> >>> >>
>>> >> >>> >> If not, try:
>>> >> >>> >> 1. Verify that the storage domain exists
>>> >> >>> >> 2. Check if it has an 'ha_agent' directory
>>> >> >>> >> 3. Check if the links are OK; if not, you can safely remove the links
>>> >> >>> >>
>>> >> >>> >> 4. Next, check the services are running (see the sketch after this list):
>>> >> >>> >> A) sanlock
>>> >> >>> >> B) supervdsmd
>>> >> >>> >> C) vdsmd
>>> >> >>> >> D) libvirtd
>>> >> >>> >>
>>> >> >>> >> 5. Increase the log level for the broker and agent services:
>>> >> >>> >>
>>> >> >>> >> cd /etc/ovirt-hosted-engine-ha
>>> >> >>> >> vim *-log.conf
>>> >> >>> >>
>>> >> >>> >> systemctl restart ovirt-ha-broker ovirt-ha-agent
>>> >> >>> >>
>>> >> >>> >> 6. Check what they are complaining about.
>>> >> >>> >> Keep in mind that the agent will keep throwing errors until the
>>> >> >>> >> broker stops doing it (the agent depends on the broker), so the
>>> >> >>> >> broker must be OK before proceeding with the agent log.
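A compact way to run the step-4 service checks in one pass — a small sketch using standard systemd tooling:

    for svc in sanlock supervdsmd vdsmd libvirtd ovirt-ha-broker ovirt-ha-agent; do
        printf '%-16s %s\n' "$svc" "$(systemctl is-active "$svc")"
    done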
>>> >> >>> >>
>>> >> >>> >> About the manual VM start, you need 2 things:
>>> >> >>> >>
>>> >> >>> >> 1. Define the VM network:
>>> >> >>> >>
>>> >> >>> >> # cat vdsm-ovirtmgmt.xml
>>> >> >>> >> <network>
>>> >> >>> >>   <name>vdsm-ovirtmgmt</name>
>>> >> >>> >>   <uuid>8ded486e-e681-4754-af4b-5737c2b05405</uuid>
>>> >> >>> >>   <forward mode='bridge'/>
>>> >> >>> >>   <bridge name='ovirtmgmt'/>
>>> >> >>> >> </network>
>>> >> >>> >>
>>> >> >>> >> [root@ovirt1 HostedEngine-RECOVERY]# virsh net-define vdsm-ovirtmgmt.xml
>>> >> >>> >> [root@ovirt1 HostedEngine-RECOVERY]# virsh net-start vdsm-ovirtmgmt
>>> >> >>> >>
>>> >> >>> >> 2. Get an XML definition, which can be found in the vdsm log.
>>> >> >>> >> Every VM prints out its configuration in the vdsm log on the host
>>> >> >>> >> it starts on.
>>> >> >>> >> Save it to a file and then:
>>> >> >>> >> A) virsh define myvm.xml
>>> >> >>> >> B) virsh start myvm
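To go with step 2, a sketch of fishing the engine VM's domain XML out of the vdsm log — the grep anchor is a guess and may need adjusting to your log format; the authfile path comes from earlier in this thread:

    # vdsm prints the full libvirt domain XML when it starts a VM:
    grep -n '<domain' /var/log/vdsm/vdsm.log | tail -5

    # Once extracted and saved (e.g. as he.xml):
    virsh -c 'qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf' define he.xml
    virsh -c 'qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf' start HostedEngine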
>>> >> >>> >>
>>> >> >>> >> It seems there is/was a problem with your NFS shares.
>>> >> >>> >>
>>> >> >>> >>
>>> >> >>> >> Best Regards,
>>> >> >>> >> Strahil Nikolov
>>> >> >>> >>
>>> >> >>>
>>> >> >>> Hey Shareef,
>>> >> >>>
>>> >> >>> Check if there are any files or folders not owned by vdsm:kvm.
>>> >> >>> Something like this:
>>> >> >>>
>>> >> >>> find . -not -user 36 -not -group 36 -print
>>> >> >>>
>>> >> >>> Also check if vdsm can access the images in the
>>> >> >>> '<vol-mount-point>/images' directories.
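If that find does turn anything up, the usual remediation — a sketch assuming UID/GID 36 maps to vdsm:kvm, as on stock oVirt nodes:

    # Restore the expected ownership on the storage domain mount:
    find /rhev/data-center/mnt -not -user 36 -not -group 36 -print0 | xargs -0 -r chown 36:36

    # Spot-check that the vdsm user can actually read an image volume
    # (path placeholders to be filled in):
    sudo -u vdsm head -c 512 '/rhev/data-center/mnt/<server>:<export>/<sd-uuid>/images/<img-uuid>/<vol-uuid>' >/dev/null && echo OK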
>>> >> >>>
>>> >> >>> Best Regards,
>>> >> >>> Strahil Nikolov
>>> >> >>>
>>> >> >>
>>> >>
>>> >> And the IPv6 address '64:ff9b::c0a8:13d'?
>>> >>
>>> >> I don't see it in the log output.
>>> >>
>>> >> Best Regards,
>>> >> Strahil Nikolov
>>> >>
>>>
>>> Based on your output, you got a PTR record for IPv4 & IPv6... most
>>> probably that's the reason.
>>>
>>> Set the IPv6 on the interface and try again.
>>>
>>> Best Regards,
>>> Strahil Nikolov
>>>
>>
Do you have firewalld up and running on the host?
Best Regards,
Strahil Nikolov