On Sun, Mar 7, 2021 at 5:12 AM <jenkins(a)jenkins.phx.ovirt.org> wrote:
Failed during -basic-suite-master.test-scenarios.test_004_basic_sanity.test_ha_recovery
due to timeout. Can't tell exactly why. Now rebuilt, to see if it
happens again. Basic suite did pass this point with (mostly) the same
code, and there shouldn't be anything HE-specific in this test. FYI.
Some more details:
The function is:
@order_by(_TEST_LIST)
def test_ha_recovery(engine_api, get_ansible_host_for_vm):
engine = engine_api.system_service()
with engine_utils.wait_for_event(engine, [119, 9602, 506]):
# VM_DOWN_ERROR event(119)
# HA_VM_FAILED event event(9602)
# VDS_INITIATED_RUN_VM event(506)
ansible_host = get_ansible_host_for_vm(VM2_NAME)
pid = ansible_host.shell('pgrep -f
qemu.*guest=vm2')['stdout'].strip()
ansible_host.shell('kill -KILL {}'.format(pid))
vm_service = test_utils.get_vm_service(engine, VM2_NAME)
assertions.assert_true_within_long(
lambda:
vm_service.get().status == types.VmStatus.UP
)
with engine_utils.wait_for_event(engine, 33): # USER_STOP_VM event
vm_service.stop()
It failed inside the 'with' statement. Can't find any of the expected
events in engine.log. In last successful I see all three of them.
We do not actively log anything about running the pgrep command, but
it's logged by ansible on the host, to /var/log/messages. I see it in
last successful, but not in above failed run, in any of the two hosts.
So most likely it didn't even get to run pgrep.
IMO we probably want some more information logged in such a case, to
allow further investigation.
Best regards,
Build Number: 1943
Build Status: Still Failing
Triggered By: Started by timer
-------------------------------------
Changes Since Last Success:
-------------------------------------
Changes for Build #1939
[hbraha] ost: Check if user custom repos are used on VM
Changes for Build #1940
[hbraha] network: add check_repos fixture
Changes for Build #1941
[Andrej Cernek] network, pylint: Simplify pylintrc file
Changes for Build #1942
[Andrej Cernek] network, pylint: Simplify pylintrc file
Changes for Build #1943
[Andrej Cernek] network, pylint: Simplify pylintrc file
-----------------
Failed Tests:
-----------------
1 tests failed.
FAILED: he-basic-suite-master.test-scenarios.test_004_basic_sanity.test_ha_recovery
Error Message:
AssertionError: False != True after 600 seconds
Stack Trace:
engine_api = <ovirtsdk4.Connection object at 0x7f408114c1d0>
get_ansible_host_for_vm = <function
get_ansible_host_for_vm.<locals>.ansible_host_for at 0x7f4081171950>
@order_by(_TEST_LIST)
def test_ha_recovery(engine_api, get_ansible_host_for_vm):
engine = engine_api.system_service()
with engine_utils.wait_for_event(engine, [119, 9602, 506]):
# VM_DOWN_ERROR event(119)
# HA_VM_FAILED event event(9602)
# VDS_INITIATED_RUN_VM event(506)
ansible_host = get_ansible_host_for_vm(VM2_NAME)
pid = ansible_host.shell('pgrep -f
qemu.*guest=vm2')['stdout'].strip()
> ansible_host.shell('kill -KILL {}'.format(pid))
../he-basic-suite-master/test-scenarios/test_004_basic_sanity.py:1070:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/lib64/python3.6/contextlib.py:88: in __exit__
next(self.gen)
../ost_utils/ost_utils/engine_utils.py:42: in wait_for_event
lambda:
../ost_utils/ost_utils/assertions.py:96: in assert_true_within_long
assert_equals_within_long(func, True, allowed_exceptions)
../ost_utils/ost_utils/assertions.py:81: in assert_equals_within_long
func, value, LONG_TIMEOUT, allowed_exceptions=allowed_exceptions
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
func = <function wait_for_event.<locals>.<lambda> at 0x7f40810d5bf8>
value = True, timeout = 600, allowed_exceptions = [], initial_wait = 0
error_message = 'False != True after 600 seconds'
def assert_equals_within(
func, value, timeout, allowed_exceptions=None, initial_wait=10,
error_message=None
):
allowed_exceptions = allowed_exceptions or []
res = '<no-result-obtained>'
with _EggTimer(timeout) as timer:
while not timer.elapsed():
try:
res = func()
if res == value:
return
except Exception as exc:
if _instance_of_any(exc, allowed_exceptions):
time.sleep(3)
continue
LOGGER.exception("Unhandled exception in %s", func)
raise
if initial_wait == 0:
time.sleep(3)
else:
time.sleep(initial_wait)
initial_wait = 0
try:
if error_message is None:
error_message = '%s != %s after %s seconds' % (res, value,
timeout)
> raise AssertionError(error_message)
E AssertionError: False != True after 600 seconds
../ost_utils/ost_utils/assertions.py:59: AssertionError
--
Didi