[ OST Failure Report ] [ oVirt master ] [ 22/02/2017 ] [add_secondary_storage_domains]

Barak Korren

23 Feb 2017

8:37 a.m.

Test failed: [ add_secondary_storage_domains ]

Note:
- This may or may not be related to https://bugzilla.redhat.com/show_bug.cgi?id=1421945
  The BZ talks about sporadic failures, while this seems to be happening consistently (for 6 runs so far)

Link to suspected patches:
- https://gerrit.ovirt.org/70415
- https://gerrit.ovirt.org/69157

Link to Job:
http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/5487/

Link to all logs:
http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/5487/artifa...

Error snippet from log:

<error>

RequestError: status: 400 reason: Bad Request detail: Cannot import Virtual Disk: Storage Domain cannot be accessed. -Please check that at least one Host is operational and Data Center state is up.

</error>

http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/5487/testRe...

--
Barak Korren
bkorren@redhat.com
RHCE, RHCi, RHV-DevOps Team
https://ifireball.wordpress.com/


Nir Soffer

23 Feb 2017

3:35 p.m.

On Thu, Feb 23, 2017 at 9:37 AM, Barak Korren <bkorren@redhat.com> wrote:

...

Test failed: [ add_secondary_storage_domains ]

Note: - This may or may not be related to https://bugzilla.redhat.com/show_bug.cgi?id=1421945 The BZ talks about sporadic failures, while this seems to be happening consistently (for 6 runs so far)

Link to suspected patches: - https://gerrit.ovirt.org/70415 - https://gerrit.ovirt.org/69157

Why do you suspect these patches? Did you try to run the tests with the latest patches before these patches?


Barak Korren

3:38 p.m.

On 23 February 2017 at 16:35, Nir Soffer <nsoffer@redhat.com> wrote:


Why do you suspect these patches?

Because the test right before them passed. These are all the changes that caused the failing OST job to run.

...

Did you try to run the tests with the latest patches before these patches?

Yes, the test run before them passed.

Nir Soffer

3:43 p.m.

On Thu, Feb 23, 2017 at 4:38 PM, Barak Korren <bkorren@redhat.com> wrote:


Why do you suspect these patches?

Because the test right before them passed. These are all the changes that caused the failing OST job to run.

...
Did you try to run the tests with the latest patches before these patches?

Yes, the test before them pass.

You are correct, these patches are broken:

2017-02-22 16:13:00,745-0500 ERROR (jsonrpc/1) [storage.TaskManager.Task] (Task='4f670db2-70c2-4c21-96ff-114f57de70c0') Unexpected error (task:871)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 878, in _run
    return fn(*args, **kargs)
  File "/usr/lib/python2.7/site-packages/vdsm/logUtils.py", line 52, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 989, in connectStoragePool
    spUUID, hostID, msdUUID, masterVersion, domainsMap)
  File "/usr/share/vdsm/storage/hsm.py", line 1051, in _connectStoragePool
    res = pool.connect(hostID, msdUUID, masterVersion)
  File "/usr/share/vdsm/storage/sp.py", line 672, in connect
    self.__createMailboxMonitor()
  File "/usr/share/vdsm/storage/sp.py", line 485, in __createMailboxMonitor
    outbox = self._master_volume_path("inbox")
  File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 77, in wrapper
    raise SecureError("Secured object is not in safe state")
SecureError: Secured object is not in safe state

I'm sending a fix.

Nir

Allon Mureinik

3:44 p.m.

Thanks for shortlooping this.

Yaniv Kaul

3:51 p.m.

On Thu, Feb 23, 2017 at 4:43 PM Nir Soffer <nsoffer@redhat.com> wrote:

...

SecureError: Secured object is not in safe state

It's very confusing that this error is sometimes harmless and sometimes isn't - how did you identify it as problematic?

Y.

Barak Korren

3:58 p.m.

On 23 February 2017 at 16:51, Yaniv Kaul <ykaul@redhat.com> wrote:

...

It's very confusing that this error is sometimes harmless and sometimes isn't - how did you identify it as problematic? Y.

If you're asking me, it was because this failed on the same issue for 12 OST runs straight.

Nir Soffer

4:03 p.m.

On Thu, Feb 23, 2017 at 4:51 PM, Yaniv Kaul <ykaul@redhat.com> wrote:


It's very confusing that this error is sometimes harmless and sometimes isn't - how did you identify it as problematic?

It depends on the context.

Here we called __createMailboxMonitor, which is something we call when creating an instance, and which is marked as @unsecured.

This call now calls a new helper introduced in 7cf19dafd7cd, but the helper was not marked as @unsecured. This raises SecureError, which fails the current flow.

We have another instance of this during the upgrade domain flow. I think we have the same issue there, but this needs investigation.

Other errors mean that a real secured method was called when the host is not the spm. This may be bad client code, or it may be unavoidable, since there is no race-free way to check that a host is the spm before calling a method on the spm.
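
The secured/unsecured distinction described above can be illustrated with a minimal, self-contained sketch. This is not vdsm's securable.py; it is a simplified stand-in, and the names ToyPool, FixedToyPool, _is_safe and _mailbox_path are hypothetical. It only shows why an @unsecured method that calls a helper left at the secured default fails before the spm is started, and why marking the helper as @unsecured avoids that.

```python
import functools


class SecureError(RuntimeError):
    """Raised when a secured method runs while the object is not safe."""


def unsecured(func):
    """Mark a method as callable even when the object is not in a safe state."""
    func._unsecured = True
    return func


def _guard(method):
    """Wrap a method so it refuses to run unless the object reports it is safe."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        if not self._is_safe():
            raise SecureError("Secured object is not in safe state")
        return method(self, *args, **kwargs)
    return wrapper


def secured(cls):
    """Class decorator: methods are secured by default unless marked @unsecured."""
    for name, attr in list(vars(cls).items()):
        if not callable(attr) or name in ("__init__", "_is_safe"):
            continue
        if getattr(attr, "_unsecured", False):
            continue
        setattr(cls, name, _guard(attr))
    return cls


@secured
class ToyPool(object):
    """Hypothetical pool that reproduces the failure mode (assumption, not vdsm code)."""

    def __init__(self):
        self._spm_started = False

    def _is_safe(self):
        return self._spm_started

    @unsecured
    def connect(self):
        # Runs before the spm is started, so it is marked @unsecured.
        return self._mailbox_path("inbox")

    def _mailbox_path(self, name):
        # Helper without @unsecured: secured by default, so connect() fails.
        return "/master/" + name


@secured
class FixedToyPool(ToyPool):
    """Same pool with the helper marked @unsecured, mirroring the kind of fix described."""

    @unsecured
    def _mailbox_path(self, name):
        return "/master/" + name
```

In this sketch, ToyPool().connect() raises SecureError while FixedToyPool().connect() returns the path, because only the second helper is exempt from the safe-state check.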


Nir Soffer

4:38 p.m.

Fixed in

commit 726e946257174926ea2591a1c4a3be2dae4297ea
Author: Nir Soffer <nsoffer@redhat.com>
Date:   Thu Feb 23 16:45:45 2017 +0200

    sp: Mark helper method as @unsecured

    In commit 7cf19dafd7cd (storage_mailbox: make inbox/outbox mailbox
    args), we added a helper that is used before the spm is started, but
    the helper was not marked as @unsecured. This causes the call to fail
    with:

      File "/usr/share/vdsm/storage/sp.py", line 485, in __createMailboxMonitor
        outbox = self._master_volume_path("inbox")
      File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 77, in wrapper
        raise SecureError("Secured object is not in safe state")
      SecureError: Secured object is not in safe state

    As this helper doesn't change the state of the storage pool, there is
    no reason to treat it as a secured method, which is the default for
    this class.

    Change-Id: Icf92b9474c9000840a5c15e3b91f2ced4d02aca2
    Signed-off-by: Nir Soffer <nsoffer@redhat.com>

Verified with http://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_...

Thanks for reporting this.

Nir

Barak Korren

5:02 p.m.

Great!

This OST experimental run is verifying that: http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/5503/

Hope it doesn't fail on something else...

Nir Soffer

5:11 p.m.

I see this error there now:

00:02:59.460 [upgrade-from-release_suit_el7] + yum install --nogpgcheck -y --downloaddir=/dev/shm ntp ovirt-engine ovirt-log-collector 'ovirt-engine-extension-aaa-ldap*'
00:02:59.460 [upgrade-from-release_suit_el7]
00:02:59.461 [upgrade-from-release_suit_el7]
00:02:59.461 [upgrade-from-release_suit_el7] One of the configured repositories failed (Unknown),
00:02:59.461 [upgrade-from-release_suit_el7] and yum doesn't have enough cached data to continue. At this point the only
00:02:59.461 [upgrade-from-release_suit_el7] safe thing yum can do is fail. There are a few ways to work "fix" this:
00:02:59.461 [upgrade-from-release_suit_el7]
00:02:59.461 [upgrade-from-release_suit_el7] 1. Contact the upstream for the repository and get them to fix the problem.
00:02:59.461 [upgrade-from-release_suit_el7]
00:02:59.462 [upgrade-from-release_suit_el7] 2. Reconfigure the baseurl/etc. for the repository, to point to a working
00:02:59.462 [upgrade-from-release_suit_el7] upstream. This is most often useful if you are using a newer
00:02:59.462 [upgrade-from-release_suit_el7] distribution release than is supported by the repository (and the
00:02:59.462 [upgrade-from-release_suit_el7] packages for the previous distribution release still work).
00:02:59.462 [upgrade-from-release_suit_el7]
00:02:59.462 [upgrade-from-release_suit_el7] 3. Disable the repository, so yum won't use it by default. Yum will then
00:02:59.462 [upgrade-from-release_suit_el7] just ignore the repository until you permanently enable it again or use
00:02:59.463 [upgrade-from-release_suit_el7] --enablerepo for temporary usage:
00:02:59.463 [upgrade-from-release_suit_el7]
00:02:59.463 [upgrade-from-release_suit_el7] yum-config-manager --disable <repoid>
00:02:59.463 [upgrade-from-release_suit_el7]
00:02:59.463 [upgrade-from-release_suit_el7] 4. Configure the failing repository to be skipped, if it is unavailable.
00:02:59.463 [upgrade-from-release_suit_el7] Note that yum will try to contact the repo. when it runs most commands,
00:02:59.464 [upgrade-from-release_suit_el7] so will have to try and fail each time (and thus. yum will be be much
00:02:59.464 [upgrade-from-release_suit_el7] slower). If it is a very temporary problem though, this is often a nice
00:02:59.464 [upgrade-from-release_suit_el7] compromise:
00:02:59.464 [upgrade-from-release_suit_el7]
00:02:59.464 [upgrade-from-release_suit_el7] yum-config-manager --save --setopt=<repoid>.skip_if_unavailable=true
00:02:59.464 [upgrade-from-release_suit_el7]
00:02:59.464 [upgrade-from-release_suit_el7] Cannot find a valid baseurl for repo: base/7/x86_64

Barak Korren

6:44 p.m.

On 23 February 2017 at 18:11, Nir Soffer <nsoffer@redhat.com> wrote:

...

I see this error there now:

00:02:59.460 [upgrade-from-release_suit_el7] + yum install --nogpgcheck -y --downloaddir=/dev/shm ntp ovirt-engine ovirt-log-collector 'ovirt-engine-extension-aaa-ldap*'
00:02:59.460 [upgrade-from-release_suit_el7]
00:02:59.461 [upgrade-from-release_suit_el7]
00:02:59.461 [upgrade-from-release_suit_el7] One of the configured

Yeah, looks like a temp issue with CentOS repos. We should probably switch back to blocking outside repo access from Lago VMs.

Next run passed finally: http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/5504/

Nir Soffer

6:48 p.m.

On Thu, Feb 23, 2017 at 7:44 PM, Barak Korren <bkorren@redhat.com> wrote:


Yeah looks like a temp issue with CentOS repos. We should probably switch back to blocking outside repo access from Lago VMs.

Next run passed finally: http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/5504/

I like green :-)

Nir Soffer

8:24 p.m.

On Thu, Feb 23, 2017 at 5:03 PM, Nir Soffer <nsoffer@redhat.com> wrote:


We have another instance of this during upgrade domain flow - I think we have the same issue there, but this needs investigation.

Fixed in https://gerrit.ovirt.org/73003

...

Other errors mean that a real secured method was called when the host is not the spm. This may be bad client code, or it may be unavoidable, since there is no race-free way to check that a host is the spm before calling a method on the spm.

These issues should be fixed in vdsm by logging a warning, instead of a traceback, in spm-only verbs. We already discussed this before; we need to improve the error handling infrastructure for this.

A similar issue (calling with incorrect arguments) is tracked in this bug: https://bugzilla.redhat.com/1402359

Every SecureError from a vdsm public verb is a caller error (engine, hosted engine, mom).

Nir
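
The idea of logging a warning instead of a traceback for caller errors can be sketched as follows. This is only an illustration under stated assumptions, not vdsm's task or error handling code: run_verb and spm_only_verb are hypothetical names, and SecureError is redefined locally so the example is self-contained.

```python
import logging

log = logging.getLogger("storage.TaskManager.Task")


class SecureError(RuntimeError):
    """Local stand-in for the error raised by secured methods (assumption)."""


def run_verb(fn, *args, **kwargs):
    """Run a verb and return (ok, result_or_message).

    A SecureError is treated as a caller error: it is logged as a one-line
    warning and reported back. Any other exception is logged with a full
    traceback and re-raised.
    """
    try:
        return True, fn(*args, **kwargs)
    except SecureError as e:
        log.warning("Verb %s called while host is not the spm: %s",
                    fn.__name__, e)
        return False, str(e)
    except Exception:
        log.exception("Unexpected error")
        raise


def spm_only_verb():
    """Hypothetical spm-only verb invoked on a host that is not the spm."""
    raise SecureError("Secured object is not in safe state")
```

With this shape, run_verb(spm_only_verb) produces a single warning line in the log, while a genuine bug in a verb still produces a traceback.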

