
Failing job is: https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-iscsi-suite-master... My findings: 2018-07-06 03:28:05,461::008_restart_he_vm.py::_shutdown_he_vm::123::root::INFO:: * VM is down. 2018-07-06 03:28:05,461::008_restart_he_vm.py::_restart_services::127::root::INFO:: * Restarting services... 2018-07-06 03:28:05,729::log_utils.py::__exit__::611::lago.ssh::DEBUG::end task:f7db7960-b541-4d18-9a73-45b3d0677f03:Get ssh client for lago-he-basic-iscsi-suite-master-host-0: 2018-07-06 03:28:06,031::ssh.py::ssh::58::lago.ssh::DEBUG::Running 98d0de9e on lago-he-basic-iscsi-suite-master-host-0: systemctl restart vdsmd ovirt-ha-broker ovirt-ha-agent 2018-07-06 03:28:22,887::ssh.py::ssh::81::lago.ssh::DEBUG::Command 98d0de9e on lago-he-basic-iscsi-suite-master-host-0 returned with 0 it then waits for engine to be up again till it gives up 10 minutes later: 2018-07-06 03:38:20,979::log_utils.py::__enter__::600::lago.ssh::DEBUG::start task:f44bb656-dfe3-4f57-a6c8-e7e208863054:Get ssh client for lago-he-basic-iscsi-suite-master-host-0: 2018-07-06 03:38:21,290::log_utils.py::__exit__::611::lago.ssh::DEBUG::end task:f44bb656-dfe3-4f57-a6c8-e7e208863054:Get ssh client for lago-he-basic-iscsi-suite-master-host-0: 2018-07-06 03:38:21,591::ssh.py::ssh::58::lago.ssh::DEBUG::Running 07b7f08a on lago-he-basic-iscsi-suite-master-host-0: hosted-engine --vm-status 2018-07-06 03:38:21,637::ssh.py::ssh::81::lago.ssh::DEBUG::Command 07b7f08a on lago-he-basic-iscsi-suite-master-host-0 returned with 1 2018-07-06 03:38:21,637::ssh.py::ssh::89::lago.ssh::DEBUG::Command 07b7f08a on lago-he-basic-iscsi-suite-master-host-0 output: The hosted engine configuration has not been retrieved from shared storage. Please ensure that ovirt-ha-agent is running and the storage server is reachable. On host 0, vdsm.log shows it's restarting: 2018-07-05 23:28:19,646-0400 INFO (MainThread) [vds] Exiting (vdsmd:171) 2018-07-05 23:28:23,514-0400 INFO (MainThread) [vds] (PID: 19942) I am the actual vdsm 4.30.0-465.git1ad18aa.el7 lago-he-basic-iscsi-suite-master-host-0 (3.10.0-862.2.3.el7.x86_64) (vdsmd:149) vdsm then stays for the 10 minutes above mentioned waiting for the storage pool to go up: 2018-07-05 23:38:20,739-0400 INFO (vmrecovery) [vds] recovery: waiting for storage pool to go up (clientIF:704) while the ha agent try to get the hosted engine stats 2018-07-05 23:38:30,363-0400 WARN (vdsm.Scheduler) [Executor] Worker blocked: <Worker name=jsonrpc/7 running <Task <JsonRpcTask {'params': {}, 'jsonrpc': '2.0', 'method': u'Host.getStats', 'id': u'cddde340-37a8-4f72-a471-b5bc40c06a16'} at 0x7f262814ae90> timeout=60, duration=600.00 at 0x7f262815c050> task#=0 at 0x7f262834b810>, traceback: File: "/usr/lib64/python2.7/threading.py", line 785, in __bootstrap self.__bootstrap_inner() File: "/usr/lib64/python2.7/threading.py", line 812, in __bootstrap_inner self.run() File: "/usr/lib64/python2.7/threading.py", line 765, in run self.__target(*self.__args, **self.__kwargs) File: "/usr/lib/python2.7/site-packages/vdsm/common/concurrent.py", line 195, in run ret = func(*args, **kwargs) File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 301, in _run self._execute_task() File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 315, in _execute_task task() File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 391, in __call__ self._callable() File: "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 261, in __call__ self._handler(self._ctx, self._req) File: "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 304, in _serveRequest response = self._handle_request(req, ctx) File: "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 344, in _handle_request res = method(**params) File: "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line 202, in _dynamicMethod result = fn(*methodArgs) File: "<string>", line 2, in getStats File: "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 49, in method ret = func(*args, **kwargs) File: "/usr/lib/python2.7/site-packages/vdsm/API.py", line 1409, in getStats multipath=True)} File: "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 78, in get_stats ret['haStats'] = _getHaInfo() File: "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 176, in _getHaInfo stats = instance.get_all_stats() File: "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 94, in get_all_stats stats = broker.get_stats_from_storage() File: "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 135, in get_stats_from_storage result = self._proxy.get_stats() File: "/usr/lib64/python2.7/xmlrpclib.py", line 1233, in __call__ return self.__send(self.__name, args) File: "/usr/lib64/python2.7/xmlrpclib.py", line 1591, in __request verbose=self.__verbose File: "/usr/lib64/python2.7/xmlrpclib.py", line 1273, in request return self.single_request(host, handler, request_body, verbose) File: "/usr/lib64/python2.7/xmlrpclib.py", line 1303, in single_request response = h.getresponse(buffering=True) File: "/usr/lib64/python2.7/httplib.py", line 1113, in getresponse response.begin() File: "/usr/lib64/python2.7/httplib.py", line 444, in begin version, status, reason = self._read_status() File: "/usr/lib64/python2.7/httplib.py", line 400, in _read_status line = self.fp.readline(_MAXLINE + 1) File: "/usr/lib64/python2.7/socket.py", line 476, in readline data = self._sock.recv(self._rbufsize) (executor:363) Andrej, I see the test has been added by you in commit e8d32f7375f2033b73544f47c1e1ca67abe8d35a I'm not sure about the purpose of this test but I don't understand why we are restarting the services on the host. Nir, Tal, any idea on why the storage pool is not getting up? I see vdsm is in recovery mode, I'm not sure if this was what the test was supposed to do or if the intention was to cleanly shutdown vdsm and cleanly restart it. -- SANDRO BONAZZOLA MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV Red Hat EMEA <https://www.redhat.com/> sbonazzo@redhat.com <https://red.ht/sig>