On 21-03-2019 17:47, Simone Tiraboschi wrote:
On Thu, Mar 21, 2019 at 3:47 PM Arif Ali <mail@arif-ali.co.uk> wrote:
Hi all,
Recently deployed oVirt version 4.3.1
It's in a self-hosted engine environment
Used the steps via cockpit to install the engine, and was able to add
the rest of the oVirt nodes without any specific problems
We tested the HA of the hosted-engine without a problem, and then at one
point turned off the machine that was hosting the engine, to mimic a
failure and see how it would cope; the VM was able to move over successfully,
but some of the oVirt hosts started to go into Unassigned. Out of a total of 6
oVirt hosts, I have 4 of them in this state.
Clicking on the host, I see the following message in the events. I can
reach the hosts from the engine, and ping the machines, so I'm not sure
why they are no longer working:
VDSM <snip> command Get Host Capabilities failed: Message timeout which
can be caused by communication issues
Mind you, I have been trying to resolve this issue since Monday, and
have tried various things, like rebooting and re-installing the oVirt
hosts, without much luck.
So any assistance on this would be appreciated; maybe I've missed something
really simple and am overlooking it.

Can you please check that VDSM is correctly running on those nodes?
Are you able to reach those nodes from the engine VM?
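Not from the thread, but a minimal sketch of how I'd answer those two questions (54321/tcp is VDSM's default port; the address below is a placeholder, substitute each node's real IP):

```shell
# Run on a stuck node: is vdsmd actually up?
vdsm_active() {
  systemctl is-active vdsmd 2>/dev/null || echo "unknown (systemctl unavailable or vdsmd down)"
}

# Run from the engine VM: can we open a TCP connection to a node's VDSM port?
vdsm_reachable() {
  if timeout 3 bash -c "exec 3<>/dev/tcp/$1/54321" 2>/dev/null; then
    echo "reachable"
  else
    echo "unreachable"
  fi
}

vdsm_reachable 192.0.2.1   # placeholder address
```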
So, I have gone back and re-installed the whole solution with 4.3.2, and I have the same issue again.
Checking the vdsm logs, I see the error below. The host is either Unassigned or Connecting, and I don't have the option to Activate it or put it into Maintenance mode. I have tried rebooting the node, with no luck.
Mar 22 10:53:27 scvirt02 vdsm[32481]: WARN Worker blocked: <Worker name=periodic/2 running <Task <Operation action=<vdsm.virt.sampling.HostMonitor object at 0x7efed4180610> at 0x7efed4180650> timeout=15, duration=30.00 at 0x7efed4180810> task#=2 at 0x7efef41987d0>, traceback:
File: "/usr/lib64/python2.7/threading.py", line 785, in __bootstrap
self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 812, in __bootstrap_inner
self.run()
File: "/usr/lib64/python2.7/threading.py", line 765, in run
self.__target(*self.__args, **self.__kwargs)
File: "/usr/lib/python2.7/site-packages/vdsm/common/concurrent.py", line 195, in run
ret = func(*args, **kwargs)
File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 301, in _run
self._execute_task()
File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 315, in _execute_task
task()
File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 391, in __call__
self._callable()
File: "/usr/lib/python2.7/site-packages/vdsm/virt/periodic.py", line 186, in __call__
self._func()
File: "/usr/lib/python2.7/site-packages/vdsm/virt/sampling.py", line 481, in __call__
stats = hostapi.get_stats(self._cif, self._samples.stats())
File: "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 79, in get_stats
ret['haStats'] = _getHaInfo()
File: "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 177, in _getHaInfo
stats = instance.get_all_stats()
File: "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 94, in get_all_stats
stats = broker.get_stats_from_storage()
File: "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 143, in get_stats_from_storage
result = self._proxy.get_stats()
File: "/usr/lib64/python2.7/xmlrpclib.py", line 1233, in __call__
return self.__send(self.__name, args)
File: "/usr/lib64/python2.7/xmlrpclib.py", line 1591, in __request
verbose=self.__verbose
File: "/usr/lib64/python2.7/xmlrpclib.py", line 1273, in request
return self.single_request(host, handler, request_body, verbose)
File: "/usr/lib64/python2.7/xmlrpclib.py", line 1303, in single_request
response = h.getresponse(buffering=True)
File: "/usr/lib64/python2.7/httplib.py", line 1113, in getresponse
response.begin()
File: "/usr/lib64/python2.7/httplib.py", line 444, in begin
version, status, reason = self._read_status()
File: "/usr/lib64/python2.7/httplib.py", line 400, in _read_status
line = self.fp.readline(_MAXLINE + 1)
File: "/usr/lib64/python2.7/socket.py", line 476, in readline
data = self._sock.recv(self._rbufsize)
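For what it's worth, my reading of the traceback above (an assumption on my part, not something confirmed in the thread): the periodic HostMonitor is blocked inside broker.get_stats_from_storage(), i.e. VDSM is waiting on ovirt-ha-broker, which in turn reads the HA metadata from the hosted-engine storage domain. A sketch of the checks I would run on an affected node (service and command names from the ovirt-hosted-engine-ha package):

```shell
# Sketch only: bundled in a function so nothing runs until invoked on an
# actual oVirt node.
ha_checks() {
  # Is the hosted-engine broker/agent pair healthy?
  systemctl status ovirt-ha-broker ovirt-ha-agent --no-pager
  # Recent broker log lines often show storage/metadata read failures
  tail -n 50 /var/log/ovirt-hosted-engine-ha/broker.log
  # Overall HA view, including whether the metadata is readable
  hosted-engine --vm-status
}
```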
Mar 22 11:02:32 <snip> ovsdb-server[4724]: ovs|01900|jsonrpc|WARN|Dropped 3 log messages in last 14 seconds (most recently, 7 seconds ago) due to excessive rate
Mar 22 11:02:32 <snip> ovsdb-server[4724]: ovs|01901|jsonrpc|WARN|ssl:[::ffff:192.168.203.205]:55658: send error: Protocol error
Mar 22 11:02:32 <snip> ovsdb-server[4724]: ovs|01902|reconnect|WARN|ssl:[::ffff:192.168.203.205]:55658: connection dropped (Protocol error)
Mar 22 11:02:34 <snip> ovsdb-server[4724]: ovs|01903|stream_ssl|WARN|SSL_accept: unexpected SSL connection close
Mar 22 11:02:34 <snip> ovsdb-server[4724]: ovs|01904|reconnect|WARN|ssl:[::ffff:192.168.203.202]:49504: connection dropped (Protocol error)
Mar 22 11:02:40 <snip> ovsdb-server[4724]: ovs|01905|stream_ssl|WARN|SSL_accept: unexpected SSL connection close
Mar 22 11:02:40 <snip> ovsdb-server[4724]: ovs|01906|jsonrpc|WARN|Dropped 1 log messages in last 5 seconds (most recently, 5 seconds ago) due to excessive rate
Mar 22 11:02:40 <snip> ovsdb-server[4724]: ovs|01907|jsonrpc|WARN|ssl:[::ffff:192.168.203.203]:34114: send error: Protocol error
Mar 22 11:02:40 <snip> ovsdb-server[4724]: ovs|01908|reconnect|WARN|ssl:[::ffff:192.168.203.203]:34114: connection dropped (Protocol error)
Mar 22 11:02:41 <snip> ovsdb-server[4724]: ovs|01909|reconnect|WARN|ssl:[::ffff:192.168.203.204]:52034: connection dropped (Protocol error)
Mar 22 11:02:48 <snip> ovsdb-server[4724]: ovs|01910|stream_ssl|WARN|Dropped 1 log messages in last 7 seconds (most recently, 7 seconds ago) due to excessive rate
Mar 22 11:02:48 <snip> ovsdb-server[4724]: ovs|01911|stream_ssl|WARN|SSL_accept: unexpected SSL connection close
Mar 22 11:02:48 <snip> ovsdb-server[4724]: ovs|01912|reconnect|WARN|ssl:[::ffff:192.168.203.205]:55660: connection dropped (Protocol error)
Mar 22 11:02:50 <snip> ovsdb-server[4724]: ovs|01913|stream_ssl|WARN|SSL_accept: unexpected SSL connection close
Mar 22 11:02:50 <snip> ovsdb-server[4724]: ovs|01914|jsonrpc|WARN|Dropped 2 log messages in last 9 seconds (most recently, 2 seconds ago) due to excessive rate
Mar 22 11:02:50 <snip> ovsdb-server[4724]: ovs|01915|jsonrpc|WARN|ssl:[::ffff:192.168.203.202]:49506: send error: Protocol error
Mar 22 11:02:50 <snip> ovsdb-server[4724]: ovs|01916|reconnect|WARN|ssl:[::ffff:192.168.203.202]:49506: connection dropped (Protocol error)
Mar 22 11:02:56 <snip> ovsdb-server[4724]: ovs|01917|stream_ssl|WARN|SSL_accept: unexpected SSL connection close
Mar 22 11:02:56 <snip> ovsdb-server[4724]: ovs|01918|reconnect|WARN|ssl:[::ffff:192.168.203.203]:34116: connection dropped (Protocol error)
Mar 22 11:02:57 <snip> ovsdb-server[4724]: ovs|01919|reconnect|WARN|ssl:[::ffff:192.168.203.204]:52036: connection dropped (Protocol error)
Mar 22 11:03:04 <snip> ovsdb-server[4724]: ovs|01920|stream_ssl|WARN|Dropped 1 log messages in last 7 seconds (most recently, 7 seconds ago) due to excessive rate
Mar 22 11:03:04 <snip> ovsdb-server[4724]: ovs|01921|stream_ssl|WARN|SSL_accept: unexpected SSL connection close
Mar 22 11:03:04 <snip> ovsdb-server[4724]: ovs|01922|jsonrpc|WARN|Dropped 2 log messages in last 9 seconds (most recently, 7 seconds ago) due to excessive rate
Mar 22 11:03:04 <snip> ovsdb-server[4724]: ovs|01923|jsonrpc|WARN|ssl:[::ffff:192.168.203.205]:55662: send error: Protocol error
Mar 22 11:03:04 <snip> ovsdb-server[4724]: ovs|01924|reconnect|WARN|ssl:[::ffff:192.168.203.205]:55662: connection dropped (Protocol error)
Mar 22 11:03:06 <snip> ovsdb-server[4724]: ovs|01925|reconnect|WARN|ssl:[::ffff:192.168.203.202]:49508: connection dropped (Protocol error)
Mar 22 11:03:12 <snip> ovsdb-server[4724]: ovs|01926|stream_ssl|WARN|Dropped 1 log messages in last 5 seconds (most recently, 5 seconds ago) due to excessive rate
Mar 22 11:03:12 <snip> ovsdb-server[4724]: ovs|01927|stream_ssl|WARN|SSL_accept: unexpected SSL connection close
Mar 22 11:03:12 <snip> ovsdb-server[4724]: ovs|01928|reconnect|WARN|ssl:[::ffff:192.168.203.203]:34118: connection dropped (Protocol error)
Mar 22 11:03:13 <snip> ovsdb-server[4724]: ovs|01929|reconnect|WARN|ssl:[::ffff:192.168.203.204]:52038: connection dropped (Protocol error)
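Purely a guess on my part: the repeated "SSL_accept: unexpected SSL connection close" / "Protocol error" lines look like the other nodes failing the TLS handshake against ovsdb-server, which after a redeploy is often a certificate mismatch. A hedged way to inspect which certificate the server actually presents (6642 is the usual OVN southbound DB port, an assumption for this setup; the address is a placeholder):

```shell
# Fetch and summarize the certificate presented on the ovsdb SSL port.
probe_ovsdb_cert() {
  echo | timeout 5 openssl s_client -connect "$1:6642" 2>/dev/null \
    | openssl x509 -noout -subject -dates 2>/dev/null \
    || echo "no certificate retrieved"
}

probe_ovsdb_cert 192.0.2.1   # placeholder address
```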