[oVirt Jenkins] ovirt-system-tests_he-basic-suite-4.1 - Build # 223 - Still Failing!

Nir Soffer nsoffer at redhat.com
Mon Mar 19 07:01:16 UTC 2018


On Sun, Mar 18, 2018 at 4:21 PM Yedidyah Bar David <didi at redhat.com> wrote:

> On Sun, Mar 18, 2018 at 2:48 PM, Yedidyah Bar David <didi at redhat.com>
> wrote:
> > On Sun, Mar 18, 2018 at 1:45 PM, Yedidyah Bar David <didi at redhat.com>
> wrote:
> >> On Sun, Mar 18, 2018 at 11:20 AM,  <jenkins at jenkins.phx.ovirt.org>
> wrote:
> >>> Project:
> http://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-4.1/
> >>> Build:
> http://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-4.1/223/
> >>> Build Number: 223
> >>> Build Status:  Still Failing
> >>> Triggered By: Started by timer
> >>
> >> It was broken by:
> >>
> >> [1] https://gerrit.ovirt.org/88483
> >>
> >> It should be fixed by:
> >>
> >> [2] https://bugzilla.redhat.com/show_bug.cgi?id=1554283
> >>
> >> [2] is on modified, no idea about its status other than that.
> >>
> >> I didn't intend to merge [1] before [2] is fixed, not sure if
> >> Sandro didn't notice my comment there, or thought that [2] is fixed.
> >>
> >> If it's annoying we can revert [1] and re-merge when [2] is fixed.
> >
> > Talked with Gal, and pushed this, which should hopefully fix:
> >
> > https://gerrit.ovirt.org/89136
>
> It indeed seems to fix [1][2]:
>
> 13:13:24   # he_get_shared_config:
> 13:13:26   # he_get_shared_config: Success (in 0:00:01)
> 13:13:26   # sleep:
> 13:15:26   # sleep: Success (in 0:02:00)
> 13:15:26   # add_he_hosts:
> 13:16:18   # add_he_hosts: Success (in 0:00:52)
> 13:16:18   # he_check_ha_agent:
> 13:16:19   # he_check_ha_agent: Success (in 0:00:00)
>
> But later fails:
>
> 13:16:20   # add_secondary_storage_domains:
> 13:19:30 Error while running thread
> 13:19:30 Traceback (most recent call last):
> 13:19:30   File "/usr/lib/python2.7/site-packages/lago/utils.py", line
> 58, in _ret_via_queue
> 13:19:30     queue.put({'return': func()})
> 13:19:30   File
>
> "/home/jenkins/workspace/ovirt-system-tests_master_check-patch-el7-x86_64/ovirt-system-tests/he-basic-suite-4.1/test-scenarios/002_bootstrap.py",
> line 491, in add_nfs_storage_domain
> 13:19:30     add_generic_nfs_storage_domain(prefix, SD_NFS_NAME,
> SD_NFS_HOST_NAME, SD_NFS_PATH)
> 13:19:30   File
>
> "/home/jenkins/workspace/ovirt-system-tests_master_check-patch-el7-x86_64/ovirt-system-tests/he-basic-suite-4.1/test-scenarios/002_bootstrap.py",
> line 496, in add_generic_nfs_storage_domain
> 13:19:30     add_generic_nfs_storage_domain_4(prefix, sd_nfs_name,
> nfs_host_name, mount_path, sd_format, sd_type, nfs_version)
> 13:19:30   File
>
> "/home/jenkins/workspace/ovirt-system-tests_master_check-patch-el7-x86_64/ovirt-system-tests/he-basic-suite-4.1/test-scenarios/002_bootstrap.py",
> line 552, in add_generic_nfs_storage_domain_4
> 13:19:30     _add_storage_domain_4(api, p)
> 13:19:30   File
>
> "/home/jenkins/workspace/ovirt-system-tests_master_check-patch-el7-x86_64/ovirt-system-tests/he-basic-suite-4.1/test-scenarios/002_bootstrap.py",
> line 466, in _add_storage_domain_4
> 13:19:30     id=sd.id,
> 13:19:30   File
> "/usr/lib64/python2.7/site-packages/ovirtsdk4/services.py", line 2219,
> in add
> 13:19:30     return self._internal_add(storage_domain, headers, query,
> wait)
> 13:19:30   File
> "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 223,
> in _internal_add
> 13:19:30     return future.wait() if wait else future
> 13:19:30   File
> "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 53, in
> wait
> 13:19:30     return self._code(response)
> 13:19:30   File
> "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 220,
> in callback
> 13:19:30     self._check_fault(response)
> 13:19:30   File
> "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 123,
> in _check_fault
> 13:19:30     self._raise_error(response, body)
> 13:19:30   File
> "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 109,
> in _raise_error
> 13:19:30     raise error
> 13:19:30 Error: Fault reason is "Operation Failed". Fault detail is
> "[Storage domain cannot be reached. Please ensure it is accessible
> from the host(s).]". HTTP response code is 400.
> 13:19:30 Error while running thread
>
> I'll retrigger now, but perhaps someone from storage wants to check.
> vdsm log has[3]:
>
> 2018-03-18 08:59:52,918-0400 INFO  (jsonrpc/3) [jsonrpc.JsonRpcServer]
> RPC call Host.getHardwareInfo succeeded in 0.00 seconds (__init__:539)
> 2018-03-18 08:59:52,924-0400 INFO  (jsonrpc/4) [vdsm.api] START
> prepareImage(sdUUID=u'424e809d-b7ad-4ed5-b6a1-5426d373a5d2',
> spUUID=u'8509c64f-cdd7-4713-9e09-a79d90ba26ed',
> imgUUID=u'718090ba-b36e-45cc-bcd6-597c16a766b9',
> leafUUID=u'23ee843b-20e0-4afe-98dc-e165334ac710', allowIllegal=False)
> from=::1,36020, task_id=92aeabf2-084a-4c98-a6e5-03efae38e7b3 (api:46)
> 2018-03-18 08:59:52,928-0400 INFO  (jsonrpc/4) [vdsm.api] FINISH
> prepareImage error=Volume does not exist:
> (u'23ee843b-20e0-4afe-98dc-e165334ac710',) from=::1,36020,
> task_id=92aeabf2-084a-4c98-a6e5-03efae38e7b3 (api:50)
> 2018-03-18 08:59:52,928-0400 ERROR (jsonrpc/4)
> [storage.TaskManager.Task]
> (Task='92aeabf2-084a-4c98-a6e5-03efae38e7b3') Unexpected error
> (task:872)
> Traceback (most recent call last):
>   File "/usr/share/vdsm/storage/task.py", line 879, in _run
>     return fn(*args, **kargs)
>   File "<string>", line 2, in prepareImage
>   File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48, in
> method
>     ret = func(*args, **kwargs)
>   File "/usr/share/vdsm/storage/hsm.py", line 3137, in prepareImage
>     raise se.VolumeDoesNotExist(leafUUID)
> VolumeDoesNotExist: Volume does not exist:
> (u'23ee843b-20e0-4afe-98dc-e165334ac710',)
>

Immediately after this failed prepare, we see:

2018-03-18 08:59:52,941-0400 INFO (jsonrpc/7) [vdsm.api] START
createVolume(sdUUID=u'424e809d-b7ad-4ed5-b6a1-5426d373a5d2',
spUUID=u'8509c64f-cdd7-4713-9e09-a79d90ba26ed',
imgUUID=u'718090ba-b36e-45cc-bcd6-597c16a766b9', size=u'1048576',
volFormat=5, preallocate=1, diskType=2,
volUUID=u'23ee843b-20e0-4afe-98dc-e165334ac710',
desc=u'hosted-engine.lockspace',
srcImgUUID=u'00000000-0000-0000-0000-000000000000',
srcVolUUID=u'00000000-0000-0000-0000-000000000000', initialSize=None)
from=::1,36020, task_id=c4c73314-ee10-419a-8265-b8b87763e99f (api:46)

So that volume really did not exit when it was prepared, which means it is
a client
error, not vdsm error.

The caller (hosted engine agent?) should use Volume.getInfo to check if a
volume
exits. Non existing volume is considered expected error in this call and
will be
logged as with INFO log level.

See
https://github.com/oVirt/vdsm/blob/f286cf7825578d8997ca6abc0801a823dee247ef/lib/vdsm/storage/hsm.py#L3075


> [1]
> http://jenkins.ovirt.org/job/ovirt-system-tests_master_check-patch-el7-x86_64/4537/
> [2]
> http://jenkins.ovirt.org/job/ovirt-system-tests_master_check-patch-el7-x86_64/4537/consoleFull
> [3]
> http://jenkins.ovirt.org/job/ovirt-system-tests_master_check-patch-el7-x86_64/4537/artifact/exported-artifacts/he-basic-suite-4.1__logs/test_logs/he-basic-suite-4.1/post-002_bootstrap.py/lago-he-basic-suite-4-1-host0/_var_log/vdsm/vdsm.log
>
> >
> >>
> >>>
> >>> -------------------------------------
> >>> Changes Since Last Success:
> >>> -------------------------------------
> >>> Changes for Build #218
> >>> [Martin Sivak] Cleanup of test code - next() replaced with any()
> >>>
> >>>
> >>> Changes for Build #219
> >>> [Martin Sivak] Cleanup of test code - next() replaced with any()
> >>>
> >>>
> >>> Changes for Build #220
> >>> [Martin Sivak] Cleanup of test code - next() replaced with any()
> >>>
> >>>
> >>> Changes for Build #221
> >>> [Martin Sivak] Cleanup of test code - next() replaced with any()
> >>>
> >>>
> >>> Changes for Build #222
> >>> [Martin Sivak] Cleanup of test code - next() replaced with any()
> >>>
> >>>
> >>> Changes for Build #223
> >>> [Martin Sivak] Cleanup of test code - next() replaced with any()
> >>>
> >>>
> >>>
> >>>
> >>> -----------------
> >>> Failed Tests:
> >>> -----------------
> >>> 1 tests failed.
> >>> FAILED:  002_bootstrap.he_check_ha_agent
> >>>
> >>> Error Message:
> >>> False is not true
> >>> -------------------- >> begin captured logging << --------------------
> >>> lago.ssh: DEBUG: start task:cb75d7f3-e333-47f1-b3e9-76ff9c0b4b21:Get
> ssh client for lago-he-basic-suite-4-1-host0:
> >>> lago.ssh: DEBUG: end task:cb75d7f3-e333-47f1-b3e9-76ff9c0b4b21:Get ssh
> client for lago-he-basic-suite-4-1-host0:
> >>> lago.ssh: DEBUG: Running 7e67a956 on lago-he-basic-suite-4-1-host0:
> grep Permission denied /var/log/ovirt-hosted-engine-ha/agent.log
> >>> lago.ssh: DEBUG: Command 7e67a956 on lago-he-basic-suite-4-1-host0
> returned with 2
> >>> lago.ssh: DEBUG: Command 7e67a956 on lago-he-basic-suite-4-1-host0
> output:
> >>>  /var/log/ovirt-hosted-engine-ha/agent.log:IOError: [Errno 13]
> Permission denied: '/var/run/ovirt-hosted-engine-ha/vm.conf'
> >>> /var/log/ovirt-hosted-engine-ha/agent.log:IOError: [Errno 13]
> Permission denied: '/var/run/ovirt-hosted-engine-ha/vm.conf'
> >>> /var/log/ovirt-hosted-engine-ha/agent.log:IOError: [Errno 13]
> Permission denied: '/var/run/ovirt-hosted-engine-ha/vm.conf'
> >>> /var/log/ovirt-hosted-engine-ha/agent.log:IOError: [Errno 13]
> Permission denied: '/var/run/ovirt-hosted-engine-ha/vm.conf'
> >>> /var/log/ovirt-hosted-engine-ha/agent.log:IOError: [Errno 13]
> Permission denied: '/var/run/ovirt-hosted-engine-ha/vm.conf'
> >>> /var/log/ovirt-hosted-engine-ha/agent.log:IOError: [Errno 13]
> Permission denied: '/var/run/ovirt-hosted-engine-ha/vm.conf'
> >>> /var/log/ovirt-hosted-engine-ha/agent.log:IOError: [Errno 13]
> Permission denied: '/var/run/ovirt-hosted-engine-ha/vm.conf'
> >>> /var/log/ovirt-hosted-engine-ha/agent.log:IOError: [Errno 13]
> Permission denied: '/var/run/ovirt-hosted-engine-ha/vm.conf'
> >>> /var/log/ovirt-hosted-engine-ha/agent.log:IOError: [Errno 13]
> Permission denied: '/var/run/ovirt-hosted-engine-ha/vm.conf'
> >>>
> >>> lago.ssh: DEBUG: Command 7e67a956 on lago-he-basic-suite-4-1-host0
> errors:
> >>>  grep: denied: No such file or directory
> >>>
> >>> --------------------- >> end captured logging << ---------------------
> >>>
> >>> Stack Trace:
> >>>   File "/usr/lib64/python2.7/unittest/case.py", line 369, in run
> >>>     testMethod()
> >>>   File "/usr/lib/python2.7/site-packages/nose/case.py", line 197, in
> runTest
> >>>     self.test(*self.arg)
> >>>   File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line
> 129, in wrapped_test
> >>>     test()
> >>>   File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line
> 59, in wrapper
> >>>     return func(get_test_prefix(), *args, **kwargs)
> >>>   File
> "/home/jenkins/workspace/ovirt-system-tests_he-basic-suite-4.1/ovirt-system-tests/he-basic-suite-4.1/test-scenarios/002_bootstrap.py",
> line 1036, in he_check_ha_agent
> >>>     'Permission denied' not in result.out
> >>>   File "/usr/lib64/python2.7/unittest/case.py", line 462, in assertTrue
> >>>     raise self.failureException(msg)
> >>> "False is not true\n-------------------- >> begin captured logging <<
> --------------------\nlago.ssh: DEBUG: start
> task:cb75d7f3-e333-47f1-b3e9-76ff9c0b4b21:Get ssh client for
> lago-he-basic-suite-4-1-host0:\nlago.ssh: DEBUG: end
> task:cb75d7f3-e333-47f1-b3e9-76ff9c0b4b21:Get ssh client for
> lago-he-basic-suite-4-1-host0:\nlago.ssh: DEBUG: Running 7e67a956 on
> lago-he-basic-suite-4-1-host0: grep Permission denied
> /var/log/ovirt-hosted-engine-ha/agent.log\nlago.ssh: DEBUG: Command
> 7e67a956 on lago-he-basic-suite-4-1-host0 returned with 2\nlago.ssh: DEBUG:
> Command 7e67a956 on lago-he-basic-suite-4-1-host0 output:\n
> /var/log/ovirt-hosted-engine-ha/agent.log:IOError: [Errno 13] Permission
> denied:
> '/var/run/ovirt-hosted-engine-ha/vm.conf'\n/var/log/ovirt-hosted-engine-ha/agent.log:IOError:
> [Errno 13] Permission denied:
> '/var/run/ovirt-hosted-engine-ha/vm.conf'\n/var/log/ovirt-hosted-engine-ha/agent.log:IOError:
> [Errno 13] Permission denied: '/var/run/ovirt-hosted-engine-ha/vm.conf
>  '\n/var/log/ovirt-hosted-engine-ha/agent.log:IOError: [Errno 13]
> Permission denied:
> '/var/run/ovirt-hosted-engine-ha/vm.conf'\n/var/log/ovirt-hosted-engine-ha/agent.log:IOError:
> [Errno 13] Permission denied:
> '/var/run/ovirt-hosted-engine-ha/vm.conf'\n/var/log/ovirt-hosted-engine-ha/agent.log:IOError:
> [Errno 13] Permission denied:
> '/var/run/ovirt-hosted-engine-ha/vm.conf'\n/var/log/ovirt-hosted-engine-ha/agent.log:IOError:
> [Errno 13] Permission denied:
> '/var/run/ovirt-hosted-engine-ha/vm.conf'\n/var/log/ovirt-hosted-engine-ha/agent.log:IOError:
> [Errno 13] Permission denied:
> '/var/run/ovirt-hosted-engine-ha/vm.conf'\n/var/log/ovirt-hosted-engine-ha/agent.log:IOError:
> [Errno 13] Permission denied:
> '/var/run/ovirt-hosted-engine-ha/vm.conf'\n\nlago.ssh: DEBUG: Command
> 7e67a956 on lago-he-basic-suite-4-1-host0  errors:\n grep: denied: No such
> file or directory\n\n--------------------- >> end captured logging <<
> ---------------------"
> >>
> >>
> >>
> >> --
> >> Didi
> >
> >
> >
> > --
> > Didi
>
>
>
> --
> Didi
> _______________________________________________
> Infra mailing list
> Infra at ovirt.org
> http://lists.ovirt.org/mailman/listinfo/infra
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.ovirt.org/pipermail/infra/attachments/20180319/f301f2c5/attachment-0001.html>


More information about the Infra mailing list