basic suite 4.3 failed in live_storage_migration

+Benny Zlotnik <bzlotnik@redhat.com> Can you please have a look?

On Tue, 21 Jan 2020 at 11:14, Yedidyah Bar David <didi@redhat.com> wrote:
Hi all,
Please see [1][2].
lago.log [3]:
2020-01-20 11:55:16,647::utils.py::_ret_via_queue::63::lago.utils::DEBUG::Error while running thread Thread-72
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/lago/utils.py", line 58, in _ret_via_queue
    queue.put({'return': func()})
  File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 60, in wrapper
    return func(get_test_prefix(), *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 79, in wrapper
    prefix.virt_env.engine_vm().get_api(api_ver=4), *args, **kwargs
  File "/home/jenkins/agent/workspace/ovirt-system-tests_standard-check-patch/ovirt-system-tests/basic-suite-4.3/test-scenarios/004_basic_sanity.py", line 514, in live_storage_migration
    lambda: api.follow_link(disk_service.get().storage_domains[0]).name == SD_ISCSI_NAME
  File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 286, in assert_true_within_long
    assert_equals_within_long(func, True, allowed_exceptions)
  File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 273, in assert_equals_within_long
    func, value, LONG_TIMEOUT, allowed_exceptions=allowed_exceptions
  File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 252, in assert_equals_within
    '%s != %s after %s seconds' % (res, value, timeout)
AssertionError: False != True after 600 seconds
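Side note on mechanics: assert_true_within_long is essentially a polling loop
with a 600-second budget, so the failure above means the disk never reported
SD_ISCSI_NAME as its storage domain within that time. A minimal sketch of the
pattern, inferred from the traceback rather than copied from the actual
ovirtlago code:

    import time

    def assert_equals_within(func, value, timeout, allowed_exceptions=None):
        allowed_exceptions = tuple(allowed_exceptions or ())
        res = None
        deadline = time.time() + timeout
        while time.time() < deadline:
            try:
                res = func()
                if res == value:
                    return
            except Exception as exc:
                # Only exception types the caller whitelists are tolerated.
                if not isinstance(exc, allowed_exceptions):
                    raise
            time.sleep(3)  # the poll interval here is an assumption
        raise AssertionError('%s != %s after %s seconds' % (res, value, timeout))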
Not sure, but this might be related:
engine.log [4]:
2020-01-20 06:45:13,991-05 DEBUG [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-92) [ae1529f8-3e05-4bc9-b3bf-d058f45dfb2b] Exception: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to VmReplicateDiskFinishVDS, error = General Exception: ('Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainGetBlockInfo)',), code = 100
vdsm.log [5]:
2020-01-20 06:45:13,940-0500 ERROR (jsonrpc/1) [api] FINISH diskReplicateFinish error=Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainGetBlockInfo) (api:134)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 124, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/API.py", line 580, in diskReplicateFinish
    return self.vm.diskReplicateFinish(srcDisk, dstDisk)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 4650, in diskReplicateFinish
    blkJobInfo = self._dom.blockJobInfo(drive.name, 0)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 108, in f
    raise toe
TimeoutError: Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainGetBlockInfo)
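The TimeoutError itself is vdsm's doing: virdomain.py wraps libvirt calls, and
when libvirt fails with VIR_ERR_OPERATION_TIMEOUT (the per-domain state-change
lock is held by another API call, here the monitor serving
remoteDispatchDomainGetBlockInfo), it re-raises the error as a vdsm-level
TimeoutError. Roughly like this sketch; apart from TimeoutError and the
"raise toe" seen in the traceback, the names are assumptions:

    import libvirt

    class TimeoutError(Exception):
        pass

    def wrap(libvirt_call):
        # Decorate a libvirt API call so lock-wait timeouts surface as a
        # vdsm-level TimeoutError instead of a bare libvirtError.
        def f(*args, **kwargs):
            try:
                return libvirt_call(*args, **kwargs)
            except libvirt.libvirtError as e:
                if e.get_error_code() == libvirt.VIR_ERR_OPERATION_TIMEOUT:
                    toe = TimeoutError(e.get_error_message())
                    raise toe
                raise
        return f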
I looked a bit, but couldn't find the root cause.
Thanks and best regards,
[1] https://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/7684/
[2] https://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/7684/artifact/check-patch.basic_suite_4.3.el7.x86_64/test_logs/basic-suite-4.3/post-004_basic_sanity.py/
[3] https://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/7684/artifact/check-patch.basic_suite_4.3.el7.x86_64/test_logs/basic-suite-4.3/post-004_basic_sanity.py/lago_logs/lago.log/*view*/
[4] https://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/7684/artifact/check-patch.basic_suite_4.3.el7.x86_64/test_logs/basic-suite-4.3/post-004_basic_sanity.py/lago-basic-suite-4-3-engine/_var_log/ovirt-engine/engine.log/*view*/
[5] https://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/7684/artifact/check-patch.basic_suite_4.3.el7.x86_64/test_logs/basic-suite-4.3/post-004_basic_sanity.py/lago-basic-suite-4-3-host-1/_var_log/vdsm/vdsm.log/*view*/

--
Didi
--
Regards,
Eyal Shenitzky

On 21 Jan 2020, at 12:09, Eyal Shenitzky <eshenitz@redhat.com> wrote:
+Benny Zlotnik <bzlotnik@redhat.com> Can you please have a look?
Likely a storage/network issue between the engine and the hosts; check /var/log/messages on the hosts for anything suspicious.
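For example, a quick scan along these lines on each host (the patterns are
only the usual suspects, not an exhaustive list):

    import re

    # Common storage/network failure signatures; extend as needed.
    suspect = re.compile(r'iscsi|multipath|sanlock|hung task|i/o error|link down', re.I)

    with open('/var/log/messages') as log:
        for line in log:
            if suspect.search(line):
                print(line.rstrip())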
participants (3)
- Eyal Shenitzky
- Michal Skrivanek
- Yedidyah Bar David