Failing job is:
https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-iscsi-suite-mas...
My findings:
2018-07-06 03:28:05,461::008_restart_he_vm.py::_shutdown_he_vm::123::root::INFO::
* VM is down.
2018-07-06 03:28:05,461::008_restart_he_vm.py::_restart_services::127::root::INFO::
* Restarting services...
2018-07-06 03:28:05,729::log_utils.py::__exit__::611::lago.ssh::DEBUG::end
task:f7db7960-b541-4d18-9a73-45b3d0677f03:Get ssh client for
lago-he-basic-iscsi-suite-master-host-0:
2018-07-06 03:28:06,031::ssh.py::ssh::58::lago.ssh::DEBUG::Running
98d0de9e on lago-he-basic-iscsi-suite-master-host-0: systemctl restart
vdsmd ovirt-ha-broker ovirt-ha-agent
2018-07-06 03:28:22,887::ssh.py::ssh::81::lago.ssh::DEBUG::Command
98d0de9e on lago-he-basic-iscsi-suite-master-host-0 returned with 0
it then waits for engine to be up again till it gives up 10 minutes later:
2018-07-06 03:38:20,979::log_utils.py::__enter__::600::lago.ssh::DEBUG::start
task:f44bb656-dfe3-4f57-a6c8-e7e208863054:Get ssh client for
lago-he-basic-iscsi-suite-master-host-0:
2018-07-06 03:38:21,290::log_utils.py::__exit__::611::lago.ssh::DEBUG::end
task:f44bb656-dfe3-4f57-a6c8-e7e208863054:Get ssh client for
lago-he-basic-iscsi-suite-master-host-0:
2018-07-06 03:38:21,591::ssh.py::ssh::58::lago.ssh::DEBUG::Running
07b7f08a on lago-he-basic-iscsi-suite-master-host-0: hosted-engine
--vm-status
2018-07-06 03:38:21,637::ssh.py::ssh::81::lago.ssh::DEBUG::Command
07b7f08a on lago-he-basic-iscsi-suite-master-host-0 returned with 1
2018-07-06 03:38:21,637::ssh.py::ssh::89::lago.ssh::DEBUG::Command
07b7f08a on lago-he-basic-iscsi-suite-master-host-0 output:
The hosted engine configuration has not been retrieved from shared
storage. Please ensure that ovirt-ha-agent is running and the storage
server is reachable.
On host 0, vdsm.log shows it's restarting:
2018-07-05 23:28:19,646-0400 INFO (MainThread) [vds] Exiting (vdsmd:171)
2018-07-05 23:28:23,514-0400 INFO (MainThread) [vds] (PID: 19942) I
am the actual vdsm 4.30.0-465.git1ad18aa.el7
lago-he-basic-iscsi-suite-master-host-0 (3.10.0-862.2.3.el7.x86_64)
(vdsmd:149)
vdsm then stays for the 10 minutes above mentioned waiting for the storage
pool to go up:
2018-07-05 23:38:20,739-0400 INFO (vmrecovery) [vds] recovery:
waiting for storage pool to go up (clientIF:704)
while the ha agent try to get the hosted engine stats
2018-07-05 23:38:30,363-0400 WARN (vdsm.Scheduler) [Executor] Worker
blocked: <Worker name=jsonrpc/7 running <Task <JsonRpcTask {'params':
{}, 'jsonrpc': '2.0', 'method': u'Host.getStats',
'id':
u'cddde340-37a8-4f72-a471-b5bc40c06a16'} at 0x7f262814ae90>
timeout=60, duration=600.00 at 0x7f262815c050> task#=0 at
0x7f262834b810>, traceback:
File: "/usr/lib64/python2.7/threading.py", line 785, in __bootstrap
self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 812, in __bootstrap_inner
self.run()
File: "/usr/lib64/python2.7/threading.py", line 765, in run
self.__target(*self.__args, **self.__kwargs)
File: "/usr/lib/python2.7/site-packages/vdsm/common/concurrent.py",
line 195, in run
ret = func(*args, **kwargs)
File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 301, in _run
self._execute_task()
File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 315,
in _execute_task
task()
File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 391, in
__call__
self._callable()
File: "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line
261, in __call__
self._handler(self._ctx, self._req)
File: "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line
304, in _serveRequest
response = self._handle_request(req, ctx)
File: "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line
344, in _handle_request
res = method(**params)
File: "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line 202,
in _dynamicMethod
result = fn(*methodArgs)
File: "<string>", line 2, in getStats
File: "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 49, in method
ret = func(*args, **kwargs)
File: "/usr/lib/python2.7/site-packages/vdsm/API.py", line 1409, in getStats
multipath=True)}
File: "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 78, in
get_stats
ret['haStats'] = _getHaInfo()
File: "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 176,
in _getHaInfo
stats = instance.get_all_stats()
File:
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py",
line 94, in get_all_stats
stats = broker.get_stats_from_storage()
File:
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py",
line 135, in get_stats_from_storage
result = self._proxy.get_stats()
File: "/usr/lib64/python2.7/xmlrpclib.py", line 1233, in __call__
return self.__send(self.__name, args)
File: "/usr/lib64/python2.7/xmlrpclib.py", line 1591, in __request
verbose=self.__verbose
File: "/usr/lib64/python2.7/xmlrpclib.py", line 1273, in request
return self.single_request(host, handler, request_body, verbose)
File: "/usr/lib64/python2.7/xmlrpclib.py", line 1303, in single_request
response = h.getresponse(buffering=True)
File: "/usr/lib64/python2.7/httplib.py", line 1113, in getresponse
response.begin()
File: "/usr/lib64/python2.7/httplib.py", line 444, in begin
version, status, reason = self._read_status()
File: "/usr/lib64/python2.7/httplib.py", line 400, in _read_status
line = self.fp.readline(_MAXLINE + 1)
File: "/usr/lib64/python2.7/socket.py", line 476, in readline
data = self._sock.recv(self._rbufsize) (executor:363)
Andrej, I see the test has been added by you in
commit e8d32f7375f2033b73544f47c1e1ca67abe8d35a
I'm not sure about the purpose of this test but I don't understand why we
are restarting the services on the host.
Nir, Tal, any idea on why the storage pool is not getting up?
I see vdsm is in recovery mode, I'm not sure if this was what the test
was supposed to do or if the intention was to cleanly shutdown vdsm
and cleanly restart it.
--
SANDRO BONAZZOLA
MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV
Red Hat EMEA <
https://www.redhat.com/>
sbonazzo(a)redhat.com
<
https://red.ht/sig>