On 22-03-2019 12:04, Arif Ali wrote:
On 21-03-2019 17:47, Simone Tiraboschi wrote:
On Thu, Mar 21, 2019 at 3:47 PM Arif Ali <mail(a)arif-ali.co.uk> wrote: Hi all,
Recently deployed oVirt version 4.3.1
It's in a self-hosted engine environment
Used the steps via cockpit to install the engine, and was able to add
the rest of the oVirt nodes without any specific problems
We tested the HA of the hosted-engine without a problem, and then at one
point turned off the machine that was hosting the engine, to mimic a
failure and see how it goes; the VM was able to move over successfully,
but some of the oVirt hosts started to go into Unassigned. From a total
of 6 oVirt hosts, I have 4 of them in this state.
Clicking on the host, I see the following message in the events. I can
get to the hosts via the engine, and ping the machines, so I'm not sure
why they are no longer working:
VDSM <snip> command Get Host Capabilities failed: Message timeout which
can be caused by communication issues
Mind you, I have been trying to resolve this issue since Monday, and
have tried various things, like rebooting and re-installing the oVirt
hosts, without having much luck
So I'd be grateful for any assistance on this; maybe I've missed
something really simple and I am overlooking it.
Can you please check that VDSM is correctly running on those nodes?
Are you able to correctly reach those nodes from the engine VM?
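For example, a quick check along these lines run from the engine VM would
confirm that the VDSM port on each node is reachable (a rough sketch, not
an oVirt tool; 54321 is the default VDSM jsonrpc/TLS port, and the IPs
below are placeholders):

    # Quick reachability check from the engine VM towards the nodes (sketch only)
    import socket

    hosts = ["192.168.203.202", "192.168.203.203"]  # placeholder node IPs
    for host in hosts:
        try:
            # VDSM listens on TCP 54321 by default; just test that the TCP handshake completes
            sock = socket.create_connection((host, 54321), timeout=5)
            sock.close()
            print("%s: vdsm port reachable" % host)
        except socket.error as err:
            print("%s: NOT reachable (%s)" % (host, err))

Checking "systemctl status vdsmd" directly on the nodes is also worth doing.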
So, I have gone back and re-installed the whole solution again, with
4.3.2 now, and I have the same issue again.
Checking the vdsm logs, I get the issue below. The host is either
Unassigned or Connecting, and I don't have the option to Activate or
put the host into Maintenance mode. I have tried rebooting the node,
with no luck.
Mar 22 10:53:27 scvirt02 vdsm[32481]: WARN Worker blocked: <Worker name=periodic/2 running <Task <Operation action=<vdsm.virt.sampling.HostMonitor object at 0x7efed4180610> at 0x7efed4180650> timeout=15, duration=30.00 at 0x7efed4180810> task#=2 at 0x7efef41987d0>, traceback:
File: "/usr/lib64/python2.7/threading.py", line 785, in __bootstrap
  self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 812, in __bootstrap_inner
  self.run()
File: "/usr/lib64/python2.7/threading.py", line 765, in run
  self.__target(*self.__args, **self.__kwargs)
File: "/usr/lib/python2.7/site-packages/vdsm/common/concurrent.py", line 195, in run
  ret = func(*args, **kwargs)
File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 301, in _run
  self._execute_task()
File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 315, in _execute_task
  task()
File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 391, in __call__
  self._callable()
File: "/usr/lib/python2.7/site-packages/vdsm/virt/periodic.py", line 186, in __call__
  self._func()
File: "/usr/lib/python2.7/site-packages/vdsm/virt/sampling.py", line 481, in __call__
  stats = hostapi.get_stats(self._cif, self._samples.stats())
File: "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 79, in get_stats
  ret['haStats'] = _getHaInfo()
File: "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 177, in _getHaInfo
  stats = instance.get_all_stats()
File: "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 94, in get_all_stats
  stats = broker.get_stats_from_storage()
File: "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 143, in get_stats_from_storage
  result = self._proxy.get_stats()
File: "/usr/lib64/python2.7/xmlrpclib.py", line 1233, in __call__
  return self.__send(self.__name, args)
File: "/usr/lib64/python2.7/xmlrpclib.py", line 1591, in __request
  verbose=self.__verbose
File: "/usr/lib64/python2.7/xmlrpclib.py", line 1273, in request
  return self.single_request(host, handler, request_body, verbose)
File: "/usr/lib64/python2.7/xmlrpclib.py", line 1303, in single_request
  response = h.getresponse(buffering=True)
File: "/usr/lib64/python2.7/httplib.py", line 1113, in getresponse
  response.begin()
File: "/usr/lib64/python2.7/httplib.py", line 444, in begin
  version, status, reason = self._read_status()
File: "/usr/lib64/python2.7/httplib.py", line 400, in _read_status
  line = self.fp.readline(_MAXLINE + 1)
File: "/usr/lib64/python2.7/socket.py", line 476, in readline
  data = self._sock.recv(self._rbufsize)
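For what it's worth, that traceback shows the periodic HostMonitor worker
blowing past its 15 second timeout because broker.get_stats_from_storage(),
an XML-RPC call from VDSM to the hosted-engine HA broker, never returns and
is stuck waiting in a socket read. A rough way to see whether the broker
answers at all is a timed call to "hosted-engine --vm-status", which as far
as I understand goes through the same broker (a Python 3 style sketch; the
30 second limit is an arbitrary choice):

    # Rough check of whether the hosted-engine HA broker responds (sketch only)
    import subprocess

    try:
        out = subprocess.check_output(["hosted-engine", "--vm-status"], timeout=30)
        print(out.decode())
    except subprocess.TimeoutExpired:
        # same symptom as the blocked HostMonitor worker above
        print("hosted-engine --vm-status did not return within 30s")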
On the engine host, I continuously get the following messages too
Mar 22 11:02:32 <snip> ovsdb-server[4724]:
ovs|01900|jsonrpc|WARN|Dropped 3 log messages in last 14 seconds (most
recently, 7 seconds ago) due to excessive rate
Mar 22 11:02:32 <snip> ovsdb-server[4724]:
ovs|01901|jsonrpc|WARN|ssl:[::ffff:192.168.203.205]:55658: send error:
Protocol error
Mar 22 11:02:32 <snip> ovsdb-server[4724]:
ovs|01902|reconnect|WARN|ssl:[::ffff:192.168.203.205]:55658: connection
dropped (Protocol error)
Mar 22 11:02:34 <snip> ovsdb-server[4724]:
ovs|01903|stream_ssl|WARN|SSL_accept: unexpected SSL connection close
Mar 22 11:02:34 <snip> ovsdb-server[4724]:
ovs|01904|reconnect|WARN|ssl:[::ffff:192.168.203.202]:49504: connection
dropped (Protocol error)
Mar 22 11:02:40 <snip> ovsdb-server[4724]:
ovs|01905|stream_ssl|WARN|SSL_accept: unexpected SSL connection close
Mar 22 11:02:40 <snip> ovsdb-server[4724]:
ovs|01906|jsonrpc|WARN|Dropped 1 log messages in last 5 seconds (most
recently, 5 seconds ago) due to excessive rate
Mar 22 11:02:40 <snip> ovsdb-server[4724]:
ovs|01907|jsonrpc|WARN|ssl:[::ffff:192.168.203.203]:34114: send error:
Protocol error
Mar 22 11:02:40 <snip> ovsdb-server[4724]:
ovs|01908|reconnect|WARN|ssl:[::ffff:192.168.203.203]:34114: connection
dropped (Protocol error)
Mar 22 11:02:41 <snip> ovsdb-server[4724]:
ovs|01909|reconnect|WARN|ssl:[::ffff:192.168.203.204]:52034: connection
dropped (Protocol error)
Mar 22 11:02:48 <snip> ovsdb-server[4724]:
ovs|01910|stream_ssl|WARN|Dropped 1 log messages in last 7 seconds (most
recently, 7 seconds ago) due to excessive rate
Mar 22 11:02:48 <snip> ovsdb-server[4724]:
ovs|01911|stream_ssl|WARN|SSL_accept: unexpected SSL connection close
Mar 22 11:02:48 <snip> ovsdb-server[4724]:
ovs|01912|reconnect|WARN|ssl:[::ffff:192.168.203.205]:55660: connection
dropped (Protocol error)
Mar 22 11:02:50 <snip> ovsdb-server[4724]:
ovs|01913|stream_ssl|WARN|SSL_accept: unexpected SSL connection close
Mar 22 11:02:50 <snip> ovsdb-server[4724]:
ovs|01914|jsonrpc|WARN|Dropped 2 log messages in last 9 seconds (most
recently, 2 seconds ago) due to excessive rate
Mar 22 11:02:50 <snip> ovsdb-server[4724]:
ovs|01915|jsonrpc|WARN|ssl:[::ffff:192.168.203.202]:49506: send error:
Protocol error
Mar 22 11:02:50 <snip> ovsdb-server[4724]:
ovs|01916|reconnect|WARN|ssl:[::ffff:192.168.203.202]:49506: connection
dropped (Protocol error)
Mar 22 11:02:56 <snip> ovsdb-server[4724]:
ovs|01917|stream_ssl|WARN|SSL_accept: unexpected SSL connection close
Mar 22 11:02:56 <snip> ovsdb-server[4724]:
ovs|01918|reconnect|WARN|ssl:[::ffff:192.168.203.203]:34116: connection
dropped (Protocol error)
Mar 22 11:02:57 <snip> ovsdb-server[4724]:
ovs|01919|reconnect|WARN|ssl:[::ffff:192.168.203.204]:52036: connection
dropped (Protocol error)
Mar 22 11:03:04 <snip> ovsdb-server[4724]:
ovs|01920|stream_ssl|WARN|Dropped 1 log messages in last 7 seconds (most
recently, 7 seconds ago) due to excessive rate
Mar 22 11:03:04 <snip> ovsdb-server[4724]:
ovs|01921|stream_ssl|WARN|SSL_accept: unexpected SSL connection close
Mar 22 11:03:04 <snip> ovsdb-server[4724]:
ovs|01922|jsonrpc|WARN|Dropped 2 log messages in last 9 seconds (most
recently, 7 seconds ago) due to excessive rate
Mar 22 11:03:04 <snip> ovsdb-server[4724]:
ovs|01923|jsonrpc|WARN|ssl:[::ffff:192.168.203.205]:55662: send error:
Protocol error
Mar 22 11:03:04 <snip> ovsdb-server[4724]:
ovs|01924|reconnect|WARN|ssl:[::ffff:192.168.203.205]:55662: connection
dropped (Protocol error)
Mar 22 11:03:06 <snip> ovsdb-server[4724]:
ovs|01925|reconnect|WARN|ssl:[::ffff:192.168.203.202]:49508: connection
dropped (Protocol error)
Mar 22 11:03:12 <snip> ovsdb-server[4724]:
ovs|01926|stream_ssl|WARN|Dropped 1 log messages in last 5 seconds (most
recently, 5 seconds ago) due to excessive rate
Mar 22 11:03:12 <snip> ovsdb-server[4724]:
ovs|01927|stream_ssl|WARN|SSL_accept: unexpected SSL connection close
Mar 22 11:03:12 <snip> ovsdb-server[4724]:
ovs|01928|reconnect|WARN|ssl:[::ffff:192.168.203.203]:34118: connection
dropped (Protocol error)
Mar 22 11:03:13 <snip> ovsdb-server[4724]:
ovs|01929|reconnect|WARN|ssl:[::ffff:192.168.203.204]:52038: connection
dropped (Protocol error)
I found my issue, and managed to resolve it; nothing wrong with oVirt.
The ovirtmgmt network is 10G, and by default I set the MTU to 9000, as I
normally would for this type of network, but I found out later that the
network team at this site were not supporting an MTU of 9000, so I went
back to 1500 and everything worked without a problem.
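In case it helps anyone else, a quick way to confirm whether jumbo frames
actually make it across the ovirtmgmt path is to ping another host with
fragmentation disabled at a jumbo-sized payload. A rough sketch (8972 =
9000 MTU minus 28 bytes of IP/ICMP headers; the target IP is a placeholder):

    # Rough jumbo-frame path check (sketch only)
    import subprocess

    target = "192.168.203.202"  # placeholder: another host on the ovirtmgmt network
    payload = 9000 - 28         # 9000 MTU minus IP (20) + ICMP (8) header bytes
    cmd = ["ping", "-c", "3", "-M", "do", "-s", str(payload), target]
    # exit status 0 means 9000-byte frames survive the path end to end;
    # anything else means something along the way is not passing jumbo frames
    ok = subprocess.call(cmd) == 0
    print("jumbo frames OK" if ok else "jumbo frames NOT passing, drop the MTU back to 1500")

If that fails while a normal ping works, the switches in between are not
set up for an MTU of 9000.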
Thanks to everyone for their assistance.
--
regards,
Arif Ali