<div dir="ltr"><div class="gmail_quote"><div dir="ltr">On Mon, Mar 19, 2018 at 9:22 AM Yedidyah Bar David <<a href="mailto:didi@redhat.com">didi@redhat.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> On Mon, Mar 19, 2018 at 9:01 AM, Nir Soffer <<a href="mailto:nsoffer@redhat.com" target="_blank">nsoffer@redhat.com</a>> wrote:<br>
><br>
><br>
> On Sun, Mar 18, 2018 at 4:21 PM Yedidyah Bar David <<a href="mailto:didi@redhat.com" target="_blank">didi@redhat.com</a>> wrote:<br>
>><br>
>> On Sun, Mar 18, 2018 at 2:48 PM, Yedidyah Bar David <<a href="mailto:didi@redhat.com" target="_blank">didi@redhat.com</a>><br>
>> wrote:<br>
>> > On Sun, Mar 18, 2018 at 1:45 PM, Yedidyah Bar David <<a href="mailto:didi@redhat.com" target="_blank">didi@redhat.com</a>><br>
>> > wrote:<br>
>> >> On Sun, Mar 18, 2018 at 11:20 AM, <<a href="mailto:jenkins@jenkins.phx.ovirt.org" target="_blank">jenkins@jenkins.phx.ovirt.org</a>><br>
>> >> wrote:<br>
>> >>> Project:<br>
>> >>> <a href="http://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-4.1/" rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-4.1/</a><br>
>> >>> Build:<br>
>> >>> <a href="http://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-4.1/223/" rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-4.1/223/</a><br>
>> >>> Build Number: 223<br>
>> >>> Build Status: Still Failing<br>
>> >>> Triggered By: Started by timer<br>
>> >><br>
>> >> It was broken by:<br>
>> >><br>
>> >> [1] <a href="https://gerrit.ovirt.org/88483" rel="noreferrer" target="_blank">https://gerrit.ovirt.org/88483</a><br>
>> >><br>
>> >> It should be fixed by:<br>
>> >><br>
>> >> [2] <a href="https://bugzilla.redhat.com/show_bug.cgi?id=1554283" rel="noreferrer" target="_blank">https://bugzilla.redhat.com/show_bug.cgi?id=1554283</a><br>
>> >><br>
>> >> [2] is on modified, no idea about its status other than that.<br>
>> >><br>
>> >> I didn't intend to merge [1] before [2] is fixed, not sure if<br>
>> >> Sandro didn't notice my comment there, or thought that [2] is fixed.<br>
>> >><br>
>> >> If it's annoying we can revert [1] and re-merge when [2] is fixed.<br>
>> ><br>
>> > Talked with Gal, and pushed this, which should hopefully fix:<br>
>> ><br>
>> > <a href="https://gerrit.ovirt.org/89136" rel="noreferrer" target="_blank">https://gerrit.ovirt.org/89136</a><br>
>><br>
>> It indeed seems to fix [1][2]:<br>
>><br>
>> 13:13:24 # he_get_shared_config:<br>
>> 13:13:26 # he_get_shared_config: Success (in 0:00:01)<br>
>> 13:13:26 # sleep:<br>
>> 13:15:26 # sleep: Success (in 0:02:00)<br>
>> 13:15:26 # add_he_hosts:<br>
>> 13:16:18 # add_he_hosts: Success (in 0:00:52)<br>
>> 13:16:18 # he_check_ha_agent:<br>
>> 13:16:19 # he_check_ha_agent: Success (in 0:00:00)<br>
>><br>
>> But later fails:<br>
>><br>
>> 13:16:20 # add_secondary_storage_domains:<br>
>> 13:19:30 Error while running thread<br>
>> 13:19:30 Traceback (most recent call last):<br>
>> 13:19:30 File "/usr/lib/python2.7/site-packages/lago/utils.py", line<br>
>> 58, in _ret_via_queue<br>
>> 13:19:30 queue.put({'return': func()})<br>
>> 13:19:30 File<br>
>><br>
>> "/home/jenkins/workspace/ovirt-system-tests_master_check-patch-el7-x86_64/ovirt-system-tests/he-basic-suite-4.1/test-scenarios/002_bootstrap.py",<br>
>> line 491, in add_nfs_storage_domain<br>
>> 13:19:30 add_generic_nfs_storage_domain(prefix, SD_NFS_NAME,<br>
>> SD_NFS_HOST_NAME, SD_NFS_PATH)<br>
>> 13:19:30 File<br>
>><br>
>> "/home/jenkins/workspace/ovirt-system-tests_master_check-patch-el7-x86_64/ovirt-system-tests/he-basic-suite-4.1/test-scenarios/002_bootstrap.py",<br>
>> line 496, in add_generic_nfs_storage_domain<br>
>> 13:19:30 add_generic_nfs_storage_domain_4(prefix, sd_nfs_name,<br>
>> nfs_host_name, mount_path, sd_format, sd_type, nfs_version)<br>
>> 13:19:30 File<br>
>><br>
>> "/home/jenkins/workspace/ovirt-system-tests_master_check-patch-el7-x86_64/ovirt-system-tests/he-basic-suite-4.1/test-scenarios/002_bootstrap.py",<br>
>> line 552, in add_generic_nfs_storage_domain_4<br>
>> 13:19:30 _add_storage_domain_4(api, p)<br>
>> 13:19:30 File<br>
>><br>
>> "/home/jenkins/workspace/ovirt-system-tests_master_check-patch-el7-x86_64/ovirt-system-tests/he-basic-suite-4.1/test-scenarios/002_bootstrap.py",<br>
>> line 466, in _add_storage_domain_4<br>
>> 13:19:30 id=<a href="http://sd.id" rel="noreferrer" target="_blank">sd.id</a>,<br>
>> 13:19:30 File<br>
>> "/usr/lib64/python2.7/site-packages/ovirtsdk4/services.py", line 2219,<br>
>> in add<br>
>> 13:19:30 return self._internal_add(storage_domain, headers, query,<br>
>> wait)<br>
>> 13:19:30 File<br>
>> "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 223,<br>
>> in _internal_add<br>
>> 13:19:30 return future.wait() if wait else future<br>
>> 13:19:30 File<br>
>> "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 53, in<br>
>> wait<br>
>> 13:19:30 return self._code(response)<br>
>> 13:19:30 File<br>
>> "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 220,<br>
>> in callback<br>
>> 13:19:30 self._check_fault(response)<br>
>> 13:19:30 File<br>
>> "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 123,<br>
>> in _check_fault<br>
>> 13:19:30 self._raise_error(response, body)<br>
>> 13:19:30 File<br>
>> "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 109,<br>
>> in _raise_error<br>
>> 13:19:30 raise error<br>
>> 13:19:30 Error: Fault reason is "Operation Failed". Fault detail is<br>
>> "[Storage domain cannot be reached. Please ensure it is accessible<br>
>> from the host(s).]". HTTP response code is 400.<br>
>> 13:19:30 Error while running thread<br>
>><br>
>> I'll retrigger now, but perhaps someone from storage wants to check.<br>
>> vdsm log has[3]:<br>
>><br>
>> 2018-03-18 08:59:52,918-0400 INFO (jsonrpc/3) [jsonrpc.JsonRpcServer]<br>
>> RPC call Host.getHardwareInfo succeeded in 0.00 seconds (__init__:539)<br>
>> 2018-03-18 08:59:52,924-0400 INFO (jsonrpc/4) [vdsm.api] START<br>
>> prepareImage(sdUUID=u'424e809d-b7ad-4ed5-b6a1-5426d373a5d2',<br>
>> spUUID=u'8509c64f-cdd7-4713-9e09-a79d90ba26ed',<br>
>> imgUUID=u'718090ba-b36e-45cc-bcd6-597c16a766b9',<br>
>> leafUUID=u'23ee843b-20e0-4afe-98dc-e165334ac710', allowIllegal=False)<br>
>> from=::1,36020, task_id=92aeabf2-084a-4c98-a6e5-03efae38e7b3 (api:46)<br>
>> 2018-03-18 08:59:52,928-0400 INFO (jsonrpc/4) [vdsm.api] FINISH<br>
>> prepareImage error=Volume does not exist:<br>
>> (u'23ee843b-20e0-4afe-98dc-e165334ac710',) from=::1,36020,<br>
>> task_id=92aeabf2-084a-4c98-a6e5-03efae38e7b3 (api:50)<br>
>> 2018-03-18 08:59:52,928-0400 ERROR (jsonrpc/4)<br>
>> [storage.TaskManager.Task]<br>
>> (Task='92aeabf2-084a-4c98-a6e5-03efae38e7b3') Unexpected error<br>
>> (task:872)<br>
>> Traceback (most recent call last):<br>
>> File "/usr/share/vdsm/storage/task.py", line 879, in _run<br>
>> return fn(*args, **kargs)<br>
>> File "<string>", line 2, in prepareImage<br>
>> File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48, in<br>
>> method<br>
>> ret = func(*args, **kwargs)<br>
>> File "/usr/share/vdsm/storage/hsm.py", line 3137, in prepareImage<br>
>> raise se.VolumeDoesNotExist(leafUUID)<br>
>> VolumeDoesNotExist: Volume does not exist:<br>
>> (u'23ee843b-20e0-4afe-98dc-e165334ac710',)<br>
><br>
><br>
> Immediately after this failed prepare, we see:<br>
><br>
> 2018-03-18 08:59:52,941-0400 INFO (jsonrpc/7) [vdsm.api] START<br>
> createVolume(sdUUID=u'424e809d-b7ad-4ed5-b6a1-5426d373a5d2',<br>
> spUUID=u'8509c64f-cdd7-4713-9e09-a79d90ba26ed',<br>
> imgUUID=u'718090ba-b36e-45cc-bcd6-597c16a766b9', size=u'1048576',<br>
> volFormat=5, preallocate=1, diskType=2,<br>
> volUUID=u'23ee843b-20e0-4afe-98dc-e165334ac710',<br>
> desc=u'hosted-engine.lockspace',<br>
> srcImgUUID=u'00000000-0000-0000-0000-000000000000',<br>
> srcVolUUID=u'00000000-0000-0000-0000-000000000000', initialSize=None)<br>
> from=::1,36020, task_id=c4c73314-ee10-419a-8265-b8b87763e99f (api:46)<br>
><br>
> So that volume really did not exit when it was prepared, which means it is a<br>
> client<br>
> error, not vdsm error.<br>
><br>
> The caller (hosted engine agent?)<br>
<br>
The caller is ovirt-system-tests. The code for<br>
add_secondary_storage_domains is duplicated among most of the suites,<br>
didn't check variations. See e.g.:<br>
<br>
<a href="https://github.com/oVirt/ovirt-system-tests/blob/master/basic-suite-master/test-scenarios/002_bootstrap.py#L626" rel="noreferrer" target="_blank">https://github.com/oVirt/ovirt-system-tests/blob/master/basic-suite-master/test-scenarios/002_bootstrap.py#L626</a><br>
<br>
It's not calling vdsm directly, but uses the engine api (through the<br>
python sdk). Not sure if it's, as you say, "a bug in the client" (the<br>
engine), or we use the sdk improperly, or something else. Can someone<br>
please advice? Patches to ovirt-system-tests are welcome :-),<br>
especially patches to stress-test this, as it seems it's a sensitive<br>
timing issue - usually does not fail.<br></blockquote><div><br></div><div>Smells like bug in the code calling the sdk. Engine talks with vdsm for</div><div>years and it is unlikely that it will have such timing bugs.</div><div><br></div><div>The owner of this test should check that the test is valid.</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
The specific suite that failed above is he-basic-suite-4.1 . I think<br>
that the code there is identical with basic-suite-4.1. Both are<br>
different from basic-suite-master. No idea if the diff is due to diff<br>
between the versions, or it's simply a bug fixed in the master suite<br>
and not fixed in 4.1 suites.<br>
<br>
> should use Volume.getInfo to check if a<br>
> volume<br>
> exits. Non existing volume is considered expected error in this call and<br>
> will be<br>
> logged as with INFO log level.<br>
><br>
> See<br>
> <a href="https://github.com/oVirt/vdsm/blob/f286cf7825578d8997ca6abc0801a823dee247ef/lib/vdsm/storage/hsm.py#L3075" rel="noreferrer" target="_blank">https://github.com/oVirt/vdsm/blob/f286cf7825578d8997ca6abc0801a823dee247ef/lib/vdsm/storage/hsm.py#L3075</a><br>
><br>
>><br>
>> [1]<br>
>> <a href="http://jenkins.ovirt.org/job/ovirt-system-tests_master_check-patch-el7-x86_64/4537/" rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/ovirt-system-tests_master_check-patch-el7-x86_64/4537/</a><br>
>> [2]<br>
>> <a href="http://jenkins.ovirt.org/job/ovirt-system-tests_master_check-patch-el7-x86_64/4537/consoleFull" rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/ovirt-system-tests_master_check-patch-el7-x86_64/4537/consoleFull</a><br>
>> [3]<br>
>> <a href="http://jenkins.ovirt.org/job/ovirt-system-tests_master_check-patch-el7-x86_64/4537/artifact/exported-artifacts/he-basic-suite-4.1__logs/test_logs/he-basic-suite-4.1/post-002_bootstrap.py/lago-he-basic-suite-4-1-host0/_var_log/vdsm/vdsm.log" rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/ovirt-system-tests_master_check-patch-el7-x86_64/4537/artifact/exported-artifacts/he-basic-suite-4.1__logs/test_logs/he-basic-suite-4.1/post-002_bootstrap.py/lago-he-basic-suite-4-1-host0/_var_log/vdsm/vdsm.log</a><br>
>><br>
>> ><br>
>> >><br>
>> >>><br>
>> >>> -------------------------------------<br>
>> >>> Changes Since Last Success:<br>
>> >>> -------------------------------------<br>
>> >>> Changes for Build #218<br>
>> >>> [Martin Sivak] Cleanup of test code - next() replaced with any()<br>
>> >>><br>
>> >>><br>
>> >>> Changes for Build #219<br>
>> >>> [Martin Sivak] Cleanup of test code - next() replaced with any()<br>
>> >>><br>
>> >>><br>
>> >>> Changes for Build #220<br>
>> >>> [Martin Sivak] Cleanup of test code - next() replaced with any()<br>
>> >>><br>
>> >>><br>
>> >>> Changes for Build #221<br>
>> >>> [Martin Sivak] Cleanup of test code - next() replaced with any()<br>
>> >>><br>
>> >>><br>
>> >>> Changes for Build #222<br>
>> >>> [Martin Sivak] Cleanup of test code - next() replaced with any()<br>
>> >>><br>
>> >>><br>
>> >>> Changes for Build #223<br>
>> >>> [Martin Sivak] Cleanup of test code - next() replaced with any()<br>
>> >>><br>
>> >>><br>
>> >>><br>
>> >>><br>
>> >>> -----------------<br>
>> >>> Failed Tests:<br>
>> >>> -----------------<br>
>> >>> 1 tests failed.<br>
>> >>> FAILED: 002_bootstrap.he_check_ha_agent<br>
>> >>><br>
>> >>> Error Message:<br>
>> >>> False is not true<br>
>> >>> -------------------- >> begin captured logging << --------------------<br>
>> >>> lago.ssh: DEBUG: start task:cb75d7f3-e333-47f1-b3e9-76ff9c0b4b21:Get<br>
>> >>> ssh client for lago-he-basic-suite-4-1-host0:<br>
>> >>> lago.ssh: DEBUG: end task:cb75d7f3-e333-47f1-b3e9-76ff9c0b4b21:Get ssh<br>
>> >>> client for lago-he-basic-suite-4-1-host0:<br>
>> >>> lago.ssh: DEBUG: Running 7e67a956 on lago-he-basic-suite-4-1-host0:<br>
>> >>> grep Permission denied /var/log/ovirt-hosted-engine-ha/agent.log<br>
>> >>> lago.ssh: DEBUG: Command 7e67a956 on lago-he-basic-suite-4-1-host0<br>
>> >>> returned with 2<br>
>> >>> lago.ssh: DEBUG: Command 7e67a956 on lago-he-basic-suite-4-1-host0<br>
>> >>> output:<br>
>> >>> /var/log/ovirt-hosted-engine-ha/agent.log:IOError: [Errno 13]<br>
>> >>> Permission denied: '/var/run/ovirt-hosted-engine-ha/vm.conf'<br>
>> >>> /var/log/ovirt-hosted-engine-ha/agent.log:IOError: [Errno 13]<br>
>> >>> Permission denied: '/var/run/ovirt-hosted-engine-ha/vm.conf'<br>
>> >>> /var/log/ovirt-hosted-engine-ha/agent.log:IOError: [Errno 13]<br>
>> >>> Permission denied: '/var/run/ovirt-hosted-engine-ha/vm.conf'<br>
>> >>> /var/log/ovirt-hosted-engine-ha/agent.log:IOError: [Errno 13]<br>
>> >>> Permission denied: '/var/run/ovirt-hosted-engine-ha/vm.conf'<br>
>> >>> /var/log/ovirt-hosted-engine-ha/agent.log:IOError: [Errno 13]<br>
>> >>> Permission denied: '/var/run/ovirt-hosted-engine-ha/vm.conf'<br>
>> >>> /var/log/ovirt-hosted-engine-ha/agent.log:IOError: [Errno 13]<br>
>> >>> Permission denied: '/var/run/ovirt-hosted-engine-ha/vm.conf'<br>
>> >>> /var/log/ovirt-hosted-engine-ha/agent.log:IOError: [Errno 13]<br>
>> >>> Permission denied: '/var/run/ovirt-hosted-engine-ha/vm.conf'<br>
>> >>> /var/log/ovirt-hosted-engine-ha/agent.log:IOError: [Errno 13]<br>
>> >>> Permission denied: '/var/run/ovirt-hosted-engine-ha/vm.conf'<br>
>> >>> /var/log/ovirt-hosted-engine-ha/agent.log:IOError: [Errno 13]<br>
>> >>> Permission denied: '/var/run/ovirt-hosted-engine-ha/vm.conf'<br>
>> >>><br>
>> >>> lago.ssh: DEBUG: Command 7e67a956 on lago-he-basic-suite-4-1-host0<br>
>> >>> errors:<br>
>> >>> grep: denied: No such file or directory<br>
>> >>><br>
>> >>> --------------------- >> end captured logging << ---------------------<br>
>> >>><br>
>> >>> Stack Trace:<br>
>> >>> File "/usr/lib64/python2.7/unittest/case.py", line 369, in run<br>
>> >>> testMethod()<br>
>> >>> File "/usr/lib/python2.7/site-packages/nose/case.py", line 197, in<br>
>> >>> runTest<br>
>> >>> self.test(*self.arg)<br>
>> >>> File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line<br>
>> >>> 129, in wrapped_test<br>
>> >>> test()<br>
>> >>> File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line<br>
>> >>> 59, in wrapper<br>
>> >>> return func(get_test_prefix(), *args, **kwargs)<br>
>> >>> File<br>
>> >>> "/home/jenkins/workspace/ovirt-system-tests_he-basic-suite-4.1/ovirt-system-tests/he-basic-suite-4.1/test-scenarios/002_bootstrap.py",<br>
>> >>> line 1036, in he_check_ha_agent<br>
>> >>> 'Permission denied' not in result.out<br>
>> >>> File "/usr/lib64/python2.7/unittest/case.py", line 462, in<br>
>> >>> assertTrue<br>
>> >>> raise self.failureException(msg)<br>
>> >>> "False is not true\n-------------------- >> begin captured logging <<<br>
>> >>> --------------------\nlago.ssh: DEBUG: start<br>
>> >>> task:cb75d7f3-e333-47f1-b3e9-76ff9c0b4b21:Get ssh client for<br>
>> >>> lago-he-basic-suite-4-1-host0:\nlago.ssh: DEBUG: end<br>
>> >>> task:cb75d7f3-e333-47f1-b3e9-76ff9c0b4b21:Get ssh client for<br>
>> >>> lago-he-basic-suite-4-1-host0:\nlago.ssh: DEBUG: Running 7e67a956 on<br>
>> >>> lago-he-basic-suite-4-1-host0: grep Permission denied<br>
>> >>> /var/log/ovirt-hosted-engine-ha/agent.log\nlago.ssh: DEBUG: Command 7e67a956<br>
>> >>> on lago-he-basic-suite-4-1-host0 returned with 2\nlago.ssh: DEBUG: Command<br>
>> >>> 7e67a956 on lago-he-basic-suite-4-1-host0 output:\n<br>
>> >>> /var/log/ovirt-hosted-engine-ha/agent.log:IOError: [Errno 13] Permission<br>
>> >>> denied:<br>
>> >>> '/var/run/ovirt-hosted-engine-ha/vm.conf'\n/var/log/ovirt-hosted-engine-ha/agent.log:IOError:<br>
>> >>> [Errno 13] Permission denied:<br>
>> >>> '/var/run/ovirt-hosted-engine-ha/vm.conf'\n/var/log/ovirt-hosted-engine-ha/agent.log:IOError:<br>
>> >>> [Errno 13] Permission denied: '/var/run/ovirt-hosted-engine-ha/vm.conf<br>
>> '\n/var/log/ovirt-hosted-engine-ha/agent.log:IOError: [Errno 13]<br>
>> Permission denied:<br>
>> '/var/run/ovirt-hosted-engine-ha/vm.conf'\n/var/log/ovirt-hosted-engine-ha/agent.log:IOError:<br>
>> [Errno 13] Permission denied:<br>
>> '/var/run/ovirt-hosted-engine-ha/vm.conf'\n/var/log/ovirt-hosted-engine-ha/agent.log:IOError:<br>
>> [Errno 13] Permission denied:<br>
>> '/var/run/ovirt-hosted-engine-ha/vm.conf'\n/var/log/ovirt-hosted-engine-ha/agent.log:IOError:<br>
>> [Errno 13] Permission denied:<br>
>> '/var/run/ovirt-hosted-engine-ha/vm.conf'\n/var/log/ovirt-hosted-engine-ha/agent.log:IOError:<br>
>> [Errno 13] Permission denied:<br>
>> '/var/run/ovirt-hosted-engine-ha/vm.conf'\n/var/log/ovirt-hosted-engine-ha/agent.log:IOError:<br>
>> [Errno 13] Permission denied:<br>
>> '/var/run/ovirt-hosted-engine-ha/vm.conf'\n\nlago.ssh: DEBUG: Command<br>
>> 7e67a956 on lago-he-basic-suite-4-1-host0 errors:\n grep: denied: No such<br>
>> file or directory\n\n--------------------- >> end captured logging <<<br>
>> ---------------------"<br>
>> >><br>
>> >><br>
>> >><br>
>> >> --<br>
>> >> Didi<br>
>> ><br>
>> ><br>
>> ><br>
>> > --<br>
>> > Didi<br>
>><br>
>><br>
>><br>
>> --<br>
>> Didi<br>
>> _______________________________________________<br>
>> Infra mailing list<br>
>> <a href="mailto:Infra@ovirt.org" target="_blank">Infra@ovirt.org</a><br>
>> <a href="http://lists.ovirt.org/mailman/listinfo/infra" rel="noreferrer" target="_blank">http://lists.ovirt.org/mailman/listinfo/infra</a><br>
<br>
<br>
<br>
--<br>
Didi<br>
</blockquote></div></div>