On 21-03-2019 17:47, Simone Tiraboschi wrote:
On Thu, Mar 21, 2019 at 3:47 PM Arif Ali <mail(a)arif-ali.co.uk> wrote:
> Hi all,
>
> Recently deployed oVirt version 4.3.1
>
> It's in a self-hosted engine environment
>
> I used the Cockpit-based steps to install the engine, and was able to add
> the rest of the oVirt nodes without any specific problems
>
> We tested the HA of the hosted engine without a problem, and then at one
> point turned off the machine that was hosting the engine, to mimic a
> failure and see how it behaves; the VM was able to move over successfully,
> but some of the oVirt hosts started to go into Unassigned. Out of a total
> of 6 oVirt hosts, I have 4 of them in this state.
>
> Clicking on the host, I see the following message in the events. I can
> reach the hosts via the engine, and ping the machines, so I'm not sure
> why they are no longer working:
>
> VDSM <snip> command Get Host Capabilities failed: Message timeout which
> can be caused by communication issues
>
> Mind you, I have been trying to resolve this issue since Monday, and
> have tried various things, like rebooting and re-installing the oVirt
> hosts, without much luck.
>
> So any assistance on this would be appreciated; maybe I've missed
> something really simple and I am overlooking it.
Can you please check that VDSM is correctly running on those nodes?
Are you able to correctly reach those nodes from the engine VM?
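For example, a minimal sketch like the one below (assuming VDSM's default TCP
port 54321; the node addresses are placeholders) can be run from the engine VM
to check whether the VDSM port on each node accepts connections:

# Minimal reachability check, assuming VDSM listens on its default TCP port
# 54321; the node addresses below are placeholders.
import socket

NODES = ["192.168.203.202", "192.168.203.203"]  # replace with your node IPs
VDSM_PORT = 54321

for node in NODES:
    try:
        sock = socket.create_connection((node, VDSM_PORT), timeout=5)
        sock.close()
        print("{}: VDSM port reachable".format(node))
    except socket.error as err:
        print("{}: cannot reach VDSM port ({})".format(node, err))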
So, I have gone back and re-installed the whole solution again, now with
4.3.2, and I have the same issue again.
Checking the vdsm logs, I get the issue below. The host is either
Unassigned or Connecting, and I don't have the option to Activate it or
put it into Maintenance mode. I have tried rebooting the node with no
luck.
Mar 22 10:53:27 scvirt02 vdsm[32481]: WARN Worker blocked: <Worker name=periodic/2 running <Task <Operation action=<vdsm.virt.sampling.HostMonitor object at 0x7efed4180610> at 0x7efed4180650> timeout=15, duration=30.00 at 0x7efed4180810> task#=2 at 0x7efef41987d0>, traceback:
File: "/usr/lib64/python2.7/threading.py", line 785, in __bootstrap
  self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 812, in __bootstrap_inner
  self.run()
File: "/usr/lib64/python2.7/threading.py", line 765, in run
  self.__target(*self.__args, **self.__kwargs)
File: "/usr/lib/python2.7/site-packages/vdsm/common/concurrent.py", line 195, in run
  ret = func(*args, **kwargs)
File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 301, in _run
  self._execute_task()
File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 315, in _execute_task
  task()
File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 391, in __call__
  self._callable()
File: "/usr/lib/python2.7/site-packages/vdsm/virt/periodic.py", line 186, in __call__
  self._func()
File: "/usr/lib/python2.7/site-packages/vdsm/virt/sampling.py", line 481, in __call__
  stats = hostapi.get_stats(self._cif, self._samples.stats())
File: "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 79, in get_stats
  ret['haStats'] = _getHaInfo()
File: "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 177, in _getHaInfo
  stats = instance.get_all_stats()
File: "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 94, in get_all_stats
  stats = broker.get_stats_from_storage()
File: "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 143, in get_stats_from_storage
  result = self._proxy.get_stats()
File: "/usr/lib64/python2.7/xmlrpclib.py", line 1233, in __call__
  return self.__send(self.__name, args)
File: "/usr/lib64/python2.7/xmlrpclib.py", line 1591, in __request
  verbose=self.__verbose
File: "/usr/lib64/python2.7/xmlrpclib.py", line 1273, in request
  return self.single_request(host, handler, request_body, verbose)
File: "/usr/lib64/python2.7/xmlrpclib.py", line 1303, in single_request
  response = h.getresponse(buffering=True)
File: "/usr/lib64/python2.7/httplib.py", line 1113, in getresponse
  response.begin()
File: "/usr/lib64/python2.7/httplib.py", line 444, in begin
  version, status, reason = self._read_status()
File: "/usr/lib64/python2.7/httplib.py", line 400, in _read_status
  line = self.fp.readline(_MAXLINE + 1)
File: "/usr/lib64/python2.7/socket.py", line 476, in readline
  data = self._sock.recv(self._rbufsize)
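The traceback shows the periodic HostMonitor worker blocked inside the
hosted-engine HA broker call (broker.get_stats_from_storage() over XML-RPC).
As a quick check, something like the minimal sketch below (assuming the
HAClient class from the client.py shown in the traceback) can be run directly
on an affected node to see whether that call hangs on its own:

# Minimal sketch, assuming the HAClient class from the ovirt_hosted_engine_ha
# client module seen in the traceback; run it on an affected node. If it
# hangs, the broker/storage path itself is stuck.
from ovirt_hosted_engine_ha.client import client

ha_cli = client.HAClient()
stats = ha_cli.get_all_stats()  # the same call the blocked worker is waiting on
for host_id, host_stats in stats.items():
    print("host {}: {}".format(host_id, host_stats))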
On the engine host, I continuously get the following messages too:
Mar 22 11:02:32 <snip> ovsdb-server[4724]:
ovs|01900|jsonrpc|WARN|Dropped 3 log messages in last 14 seconds (most
recently, 7 seconds ago) due to excessive rate
Mar 22 11:02:32 <snip> ovsdb-server[4724]:
ovs|01901|jsonrpc|WARN|ssl:[::ffff:192.168.203.205]:55658: send error:
Protocol error
Mar 22 11:02:32 <snip> ovsdb-server[4724]:
ovs|01902|reconnect|WARN|ssl:[::ffff:192.168.203.205]:55658: connection
dropped (Protocol error)
Mar 22 11:02:34 <snip> ovsdb-server[4724]:
ovs|01903|stream_ssl|WARN|SSL_accept: unexpected SSL connection close
Mar 22 11:02:34 <snip> ovsdb-server[4724]:
ovs|01904|reconnect|WARN|ssl:[::ffff:192.168.203.202]:49504: connection
dropped (Protocol error)
Mar 22 11:02:40 <snip> ovsdb-server[4724]:
ovs|01905|stream_ssl|WARN|SSL_accept: unexpected SSL connection close
Mar 22 11:02:40 <snip> ovsdb-server[4724]:
ovs|01906|jsonrpc|WARN|Dropped 1 log messages in last 5 seconds (most
recently, 5 seconds ago) due to excessive rate
Mar 22 11:02:40 <snip> ovsdb-server[4724]:
ovs|01907|jsonrpc|WARN|ssl:[::ffff:192.168.203.203]:34114: send error:
Protocol error
Mar 22 11:02:40 <snip> ovsdb-server[4724]:
ovs|01908|reconnect|WARN|ssl:[::ffff:192.168.203.203]:34114: connection
dropped (Protocol error)
Mar 22 11:02:41 <snip> ovsdb-server[4724]:
ovs|01909|reconnect|WARN|ssl:[::ffff:192.168.203.204]:52034: connection
dropped (Protocol error)
Mar 22 11:02:48 <snip> ovsdb-server[4724]:
ovs|01910|stream_ssl|WARN|Dropped 1 log messages in last 7 seconds (most
recently, 7 seconds ago) due to excessive rate
Mar 22 11:02:48 <snip> ovsdb-server[4724]:
ovs|01911|stream_ssl|WARN|SSL_accept: unexpected SSL connection close
Mar 22 11:02:48 <snip> ovsdb-server[4724]:
ovs|01912|reconnect|WARN|ssl:[::ffff:192.168.203.205]:55660: connection
dropped (Protocol error)
Mar 22 11:02:50 <snip> ovsdb-server[4724]:
ovs|01913|stream_ssl|WARN|SSL_accept: unexpected SSL connection close
Mar 22 11:02:50 <snip> ovsdb-server[4724]:
ovs|01914|jsonrpc|WARN|Dropped 2 log messages in last 9 seconds (most
recently, 2 seconds ago) due to excessive rate
Mar 22 11:02:50 <snip> ovsdb-server[4724]:
ovs|01915|jsonrpc|WARN|ssl:[::ffff:192.168.203.202]:49506: send error:
Protocol error
Mar 22 11:02:50 <snip> ovsdb-server[4724]:
ovs|01916|reconnect|WARN|ssl:[::ffff:192.168.203.202]:49506: connection
dropped (Protocol error)
Mar 22 11:02:56 <snip> ovsdb-server[4724]:
ovs|01917|stream_ssl|WARN|SSL_accept: unexpected SSL connection close
Mar 22 11:02:56 <snip> ovsdb-server[4724]:
ovs|01918|reconnect|WARN|ssl:[::ffff:192.168.203.203]:34116: connection
dropped (Protocol error)
Mar 22 11:02:57 <snip> ovsdb-server[4724]:
ovs|01919|reconnect|WARN|ssl:[::ffff:192.168.203.204]:52036: connection
dropped (Protocol error)
Mar 22 11:03:04 <snip> ovsdb-server[4724]:
ovs|01920|stream_ssl|WARN|Dropped 1 log messages in last 7 seconds (most
recently, 7 seconds ago) due to excessive rate
Mar 22 11:03:04 <snip> ovsdb-server[4724]:
ovs|01921|stream_ssl|WARN|SSL_accept: unexpected SSL connection close
Mar 22 11:03:04 <snip> ovsdb-server[4724]:
ovs|01922|jsonrpc|WARN|Dropped 2 log messages in last 9 seconds (most
recently, 7 seconds ago) due to excessive rate
Mar 22 11:03:04 <snip> ovsdb-server[4724]:
ovs|01923|jsonrpc|WARN|ssl:[::ffff:192.168.203.205]:55662: send error:
Protocol error
Mar 22 11:03:04 <snip> ovsdb-server[4724]:
ovs|01924|reconnect|WARN|ssl:[::ffff:192.168.203.205]:55662: connection
dropped (Protocol error)
Mar 22 11:03:06 <snip> ovsdb-server[4724]:
ovs|01925|reconnect|WARN|ssl:[::ffff:192.168.203.202]:49508: connection
dropped (Protocol error)
Mar 22 11:03:12 <snip> ovsdb-server[4724]:
ovs|01926|stream_ssl|WARN|Dropped 1 log messages in last 5 seconds (most
recently, 5 seconds ago) due to excessive rate
Mar 22 11:03:12 <snip> ovsdb-server[4724]:
ovs|01927|stream_ssl|WARN|SSL_accept: unexpected SSL connection close
Mar 22 11:03:12 <snip> ovsdb-server[4724]:
ovs|01928|reconnect|WARN|ssl:[::ffff:192.168.203.203]:34118: connection
dropped (Protocol error)
Mar 22 11:03:13 <snip> ovsdb-server[4724]:
ovs|01929|reconnect|WARN|ssl:[::ffff:192.168.203.204]:52038: connection
dropped (Protocol error)
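To see at a glance which host addresses these SSL warnings refer to, here is a
small sketch that tallies them from a saved copy of the journal output (the
log-file path is a placeholder; it only counts lines in the format shown
above):

# Small sketch that tallies ovsdb-server SSL warnings per source address.
# It assumes the warning lines shown above were saved to a plain-text file;
# the path below is a placeholder.
import re
from collections import Counter

LOG_FILE = "/tmp/ovsdb-server-warnings.log"  # placeholder path

pattern = re.compile(r"ssl:\[([0-9a-fA-F:.]+)\]:\d+: (send error|connection dropped)")
counts = Counter()

with open(LOG_FILE) as fh:
    for line in fh:
        match = pattern.search(line)
        if match:
            counts[match.group(1)] += 1

for addr, total in counts.most_common():
    print("{}: {} SSL-related warnings".format(addr, total))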
--
regards,
Arif Ali