Re: [ovirt-devel] [ OST Failure Report ] [ oVirt master ] [ 22/02/2017 ] [add_secondary_storage_domains]

23 Feb 2017


      I see this error there now:

00:02:59.460 [upgrade-from-release_suit_el7] + yum install
--nogpgcheck -y --downloaddir=/dev/shm ntp ovirt-engine
ovirt-log-collector 'ovirt-engine-extension-aaa-ldap*'
00:02:59.460 [upgrade-from-release_suit_el7]
00:02:59.461 [upgrade-from-release_suit_el7]
00:02:59.461 [upgrade-from-release_suit_el7]  One of the configured
repositories failed (Unknown),
00:02:59.461 [upgrade-from-release_suit_el7]  and yum doesn't have
enough cached data to continue. At this point the only
00:02:59.461 [upgrade-from-release_suit_el7]  safe thing yum can do is
fail. There are a few ways to work "fix" this:
00:02:59.461 [upgrade-from-release_suit_el7]
00:02:59.461 [upgrade-from-release_suit_el7]      1. Contact the
upstream for the repository and get them to fix the problem.
00:02:59.461 [upgrade-from-release_suit_el7]
00:02:59.462 [upgrade-from-release_suit_el7]      2. Reconfigure the
baseurl/etc. for the repository, to point to a working
00:02:59.462 [upgrade-from-release_suit_el7]         upstream. This is
most often useful if you are using a newer
00:02:59.462 [upgrade-from-release_suit_el7]         distribution
release than is supported by the repository (and the
00:02:59.462 [upgrade-from-release_suit_el7]         packages for the
previous distribution release still work).
00:02:59.462 [upgrade-from-release_suit_el7]
00:02:59.462 [upgrade-from-release_suit_el7]      3. Disable the
repository, so yum won't use it by default. Yum will then
00:02:59.462 [upgrade-from-release_suit_el7]         just ignore the
repository until you permanently enable it again or use
00:02:59.463 [upgrade-from-release_suit_el7]         --enablerepo for
temporary usage:
00:02:59.463 [upgrade-from-release_suit_el7]
00:02:59.463 [upgrade-from-release_suit_el7]
yum-config-manager --disable <repoid>
00:02:59.463 [upgrade-from-release_suit_el7]
00:02:59.463 [upgrade-from-release_suit_el7]      4. Configure the
failing repository to be skipped, if it is unavailable.
00:02:59.463 [upgrade-from-release_suit_el7]         Note that yum
will try to contact the repo. when it runs most commands,
00:02:59.464 [upgrade-from-release_suit_el7]         so will have to
try and fail each time (and thus. yum will be be much
00:02:59.464 [upgrade-from-release_suit_el7]         slower). If it is
a very temporary problem though, this is often a nice
00:02:59.464 [upgrade-from-release_suit_el7]         compromise:
00:02:59.464 [upgrade-from-release_suit_el7]
00:02:59.464 [upgrade-from-release_suit_el7]
yum-config-manager --save --setopt=<repoid>.skip_if_unavailable=true
00:02:59.464 [upgrade-from-release_suit_el7]
00:02:59.464 [upgrade-from-release_suit_el7] Cannot find a valid
baseurl for repo: base/7/x86_64


On Thu, Feb 23, 2017 at 6:02 PM, Barak Korren <bkorren@redhat.com> wrote:
...
Great!
This OST experimental run is verifying that:
http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/5503/
Hope it doesn't fail on something else...
On 23 February 2017 at 17:38, Nir Soffer <nsoffer@redhat.com> wrote:
...
Fixed in
commit 726e946257174926ea2591a1c4a3be2dae4297ea
Author: Nir Soffer <nsoffer@redhat.com>
Date:   Thu Feb 23 16:45:45 2017 +0200
sp: Mark helper method as @unsecured
In commit 7cf19dafd7cd (storage_mailbox: make inbox/outbox mailbox
    args), we added a helper that is used before the spm is started, but the
    helper was not marked as @unsecured. This caused the call to fail with:
File "/usr/share/vdsm/storage/sp.py", line 485, in
__createMailboxMonitor
            outbox = self._master_volume_path("inbox")
          File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py",
        line 77, in wrapper
            raise SecureError("Secured object is not in safe state")
        SecureError: Secured object is not in safe state
As this helper doesn't change the state of the storage pool, there is no
    reason to treat it as a secured method, which is the default for this
    class.
Change-Id: Icf92b9474c9000840a5c15e3b91f2ced4d02aca2
    Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Verified with http://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_...
Thanks for reporting this.
Nir
On Thu, Feb 23, 2017 at 5:03 PM, Nir Soffer <nsoffer@redhat.com> wrote:
...
On Thu, Feb 23, 2017 at 4:51 PM, Yaniv Kaul <ykaul@redhat.com> wrote:
...
On Thu, Feb 23, 2017 at 4:43 PM Nir Soffer <nsoffer@redhat.com> wrote:
...
On Thu, Feb 23, 2017 at 4:38 PM, Barak Korren <bkorren@redhat.com> wrote:
...
On 23 February 2017 at 16:35, Nir Soffer <nsoffer@redhat.com> wrote:
> On Thu, Feb 23, 2017 at 9:37 AM, Barak Korren <bkorren@redhat.com>
> wrote:
>> Test failed: [ add_secondary_storage_domains ]
>>
>> Note:
>> - This may or may not be related to
>>   https://bugzilla.redhat.com/show_bug.cgi?id=1421945
>>   The BZ talks about sporadic failures, while this seems to be
>>   happening consistently (for 6 runs so far)
>>
>> Link to suspected patches:
>> - https://gerrit.ovirt.org/70415
>> - https://gerrit.ovirt.org/69157
>
> Why do you suspect these patches?
Because the test right before them passed.
These are all the changes that caused the failing OST job to run.
> Did you try to run the tests with the latest patches before these
> patches?
Yes, the test before them passed.
You are correct, these patches are broken:
2017-02-22 16:13:00,745-0500 ERROR (jsonrpc/1)
[storage.TaskManager.Task]
(Task='4f670db2-70c2-4c21-96ff-114f57de70c0') Unexpected error
(task:871)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line
878, in _run
    return fn(*args, **kargs)
  File "/usr/lib/python2.7/site-packages/vdsm/logUtils.py", line 52, in
wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 989, in connectStoragePool
    spUUID, hostID, msdUUID, masterVersion, domainsMap)
  File "/usr/share/vdsm/storage/hsm.py", line 1051, in _connectStoragePool
    res = pool.connect(hostID, msdUUID, masterVersion)
  File "/usr/share/vdsm/storage/sp.py", line 672, in connect
    self.__createMailboxMonitor()
  File "/usr/share/vdsm/storage/sp.py", line 485, in
__createMailboxMonitor
    outbox = self._master_volume_path("inbox")
  File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py",
line 77, in wrapper
    raise SecureError("Secured object is not in safe state")
SecureError: Secured object is not in safe state
It's very confusing that this error is sometimes harmless and sometimes
isn't - how did you identify it as problematic?
It depends on the context.
Here we called __createMailboxMonitor, which is something we call
when creating an instance, and is marked as @unsecured.
This call now invokes a new helper introduced in 7cf19dafd7cd,
but the helper was not marked as @unsecured. Calling it raises
SecureError, which fails the current flow.
We have another instance of this during upgrade domain flow - I think
we have the same issue there, but this needs investigation.
Other errors mean that a real secured method is called when a host
is not the spm. This may be bad client code, or unavoidable, since
there is no race-free way to check that a host is the spm before
calling a method on the spm.
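
To make the failure mode above concrete, here is a minimal sketch of the
"secured by default" pattern. It is not the code in vdsm's securable.py:
the secured and unsecured decorators, the _is_safe check, the pool
classes and the path returned are all hypothetical, and only the
SecureError name and its message are taken from the traceback. It shows
an unsecured entry point failing because it calls a helper that was left
secured, and passing once the helper is marked unsecured.

```python
import functools
import inspect


class SecureError(RuntimeError):
    """Raised when a secured method is called on an object that is not safe."""


def unsecured(func):
    # Hypothetical marker: the method may be called in any state.
    func._unsecured = True
    return func


def _guard(func):
    # Wrap a method so it only runs while the object reports a safe state.
    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        if not self._is_safe():
            raise SecureError("Secured object is not in safe state")
        return func(self, *args, **kwargs)
    return wrapper


def secured(cls):
    # Hypothetical class decorator: every method defined on the class is
    # secured unless it is a dunder method or carries the @unsecured mark.
    for name, member in list(vars(cls).items()):
        if not inspect.isfunction(member):
            continue
        if name.startswith("__") and name.endswith("__"):
            continue
        if getattr(member, "_unsecured", False):
            continue
        setattr(cls, name, _guard(member))
    return cls


@secured
class BrokenPool:
    def __init__(self):
        self.is_spm = False  # assumption: "safe" means this host is the spm

    @unsecured
    def _is_safe(self):
        return self.is_spm

    @unsecured
    def connect(self):
        # Unsecured entry point, used before the spm is started.
        return self._master_volume_path("inbox")

    def _master_volume_path(self, name):
        # Not marked, so it is secured by default: this is the bug.
        return "/hypothetical/master/" + name


@secured
class FixedPool(BrokenPool):
    @unsecured
    def _master_volume_path(self, name):
        # Read-only helper, so there is no reason to secure it.
        return "/hypothetical/master/" + name
```

With this sketch, BrokenPool().connect() raises SecureError while the
host is not the spm, and FixedPool().connect() returns the path. A
helper that only computes a value needs the explicit mark, because the
default for the class is to secure everything.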
...
Y.
...
I'm sending a fix.
Nir
...
--
Barak Korren
bkorren@redhat.com
RHCE, RHCi, RHV-DevOps Team
https://ifireball.wordpress.com/