[ovirt-devel] [ OST Failure Report ] [ oVirt Master (cockpit-ovirt) ] [ 15-03-2018 ] [ 098_ovirt_provider_ovn.use_ovn_provider ]

Dan Kenigsberg danken at redhat.com
Sat Mar 17 14:16:49 UTC 2018


Thanks for filing those, but let us not keep them secret:

Bug 1557419 - Recommend that a call should be tried again
sounds most reasonable. Providing more information to the client makes sense

Bug 1557424 - Automatically retry call failed because he failed to
acquire a lock
is more tricky. when would REST perform the retry? how often? We must
make sure we never cause a livelock.

Can we alternatively take a normal blocking lock when we create an
external network (instead of a trylock)?

On Fri, Mar 16, 2018 at 4:52 PM, Dominik Holler <dholler at redhat.com> wrote:
> I have created
> https://bugzilla.redhat.com/show_bug.cgi?id=1557419 and
> https://bugzilla.redhat.com/show_bug.cgi?id=1557424
> to discuss how failing to acquire locks in engine should be handled in
> engine's REST-API, in ovirt-sdk or in application (OST in this case).
>
>
> On Fri, 16 Mar 2018 09:32:47 +0000
> Dafna Ron <dron at redhat.com> wrote:
>
>> Thank you for the fast reply and help.
>>
>> On Thu, Mar 15, 2018 at 8:21 PM, Dominik Holler <dholler at redhat.com>
>> wrote:
>>
>> > On Thu, 15 Mar 2018 16:24:10 +0000
>> > Dafna Ron <dron at redhat.com> wrote:
>> >
>> > > Hi,
>> > >
>> > > We have a failure on master for test
>> > > 098_ovirt_provider_ovn.use_ovn_provider in project cockpit-ovirt.
>> > > This seems to be a race because object is locked. also, the actual
>> > > failure is logged as WARN and not ERROR.
>> > >
>> > > I don't think the patch is actually related to the failure but I
>> > > think the test should be fixed.
>> > > can you please review to make sure we do not have an actual
>> > > regression and let me know if we need to open a bz to fix the
>> > > test?
>> > >
>> > >
>> > > *Link and headline of suspected patches: *
>> > > *https://gerrit.ovirt.org/#/c/89020/2
>> > > <https://gerrit.ovirt.org/#/c/89020/2> - *
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > *wizard: Enable scroll on start page for low-res screensLink to
>> > > Job:http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/6374
>> > > <http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/6374>Link
>> > > to all
>> > > logs:http://jenkins.ovirt.org/job/ovirt-master_change-queue-
>> > tester/6374/artifacts
>> > > <http://jenkins.ovirt.org/job/ovirt-master_change-queue-
>> > tester/6374/artifacts>(Relevant)
>> > > error snippet from the log: <error>2018-03-15 10:05:00,160-04
>> > > DEBUG [org.ovirt.engine.core.sso.servlets.OAuthTokenInfoServlet]
>> > > (default task-10) [] Sending json response2018-03-15
>> > > 10:05:00,160-04 DEBUG
>> > > [org.ovirt.engine.core.sso.utils.TokenCleanupUtility] (default
>> > > task-10) [] Not cleaning up expired tokens2018-03-15
>> > > 10:05:00,169-04 INFO
>> > > [org.ovirt.engine.core.bll.provider.network.SyncNetworkProviderCommand]
>> > > (EE-ManagedThreadFactory-engineScheduled-Thread-90) [789edb23]
>> > > Lock Acquired to object
>> > > 'EngineLock:{exclusiveLocks='[c38a67ec-0b48-4e6f-be85-
>> > 70c700df5483=PROVIDER]',
>> > > sharedLocks=''}'2018-03-15 10:05:00,184-04 INFO
>> > > [org.ovirt.engine.core.bll.provider.network.SyncNetworkProviderCommand]
>> > > (EE-ManagedThreadFactory-engineScheduled-Thread-90) [789edb23]
>> > > Running command: SyncNetworkProviderCommand internal:
>> > > true.2018-03-15 10:05:00,228-04 DEBUG
>> > > [org.ovirt.engine.core.dal.dbbroker.PostgresDbEngineDialect$
>> > PostgresSimpleJdbcCall]
>> > > (default task-13) [e1328379-17b7-49f8-beb2-cf8331784828] Compiled
>> > > stored procedure. Call string is [{call
>> > > getdcidbyexternalnetworkid(?)}]2018-03-15 10:05:00,228-04 DEBUG
>> > > [org.ovirt.engine.core.dal.dbbroker.PostgresDbEngineDialect$
>> > PostgresSimpleJdbcCall]
>> > > (default task-13) [e1328379-17b7-49f8-beb2-cf8331784828] SqlCall
>> > > for procedure [GetDcIdByExternalNetworkId] compiled2018-03-15
>> > > 10:05:00,229-04 DEBUG
>> > > [org.ovirt.engine.core.common.di.interceptor.DebugLoggingInterceptor]
>> > > (default task-13) [e1328379-17b7-49f8-beb2-cf8331784828] method:
>> > > runQuery, params: [GetAllExternalNetworksOnProvider,
>> > > IdQueryParameters:{refresh='false', filtered='false'}],
>> > > timeElapsed: 353ms2018-03-15 10:05:00,239-04 INFO
>> > > [org.ovirt.engine.core.bll.network.dc.AddNetworkCommand] (default
>> > > task-13) [e1328379-17b7-49f8-beb2-cf8331784828] Failed to Acquire
>> > > Lock to object 'EngineLock:{exclusiveLocks='[network_1=NETWORK,
>> > > c38a67ec-0b48-4e6f-be85-70c700df5483=PROVIDER]',
>> > > sharedLocks=''}'2018-03-15 10:05:00,239-04 WARN
>> > > [org.ovirt.engine.core.bll.network.dc.AddNetworkCommand] (default
>> > > task-13) [e1328379-17b7-49f8-beb2-cf8331784828] Validation of
>> > > action 'AddNetwork' failed for user admin at internal-authz. Reasons:
>> > > VAR__TYPE__NETWORK,VAR__ACTION__ADD,ACTION_TYPE_FAILED_PROVIDER_LOCKED,$
>> > providerId
>> > > c38a67ec-0b48-4e6f-be85-70c700df54832018-03-15 10:05:00,240-04
>> > > DEBUG
>> > > [org.ovirt.engine.core.common.di.interceptor.DebugLoggingInterceptor]
>> > > (default task-13) [e1328379-17b7-49f8-beb2-cf8331784828] method:
>> > > runAction, params: [AddNetwork,
>> > > AddNetworkStoragePoolParameters:{commandId='61b365ec-27c1-
>> > 49af-ad72-f907df8befcd',
>> > > user='null', commandType='Unknown'}], timeElapsed: 10ms2018-03-15
>> > > 10:05:00,250-04 ERROR
>> > > [org.ovirt.engine.api.restapi.resource.AbstractBackendResource]
>> > > (default task-13) [] Operation Failed: [Cannot add Network.
>> > > Related operation on provider with the id
>> > > c38a67ec-0b48-4e6f-be85-70c700df5483 is currently in progress.
>> > > Please try again later.]2018-03-15 10:05:00,254-04 DEBUG
>> > > [org.ovirt.engine.core.utils.servlet.LocaleFilter] (default
>> > > task-14) [] Incoming locale 'en-US'. Filter determined locale to
>> > > be 'en-US'2018-03-15 10:05:00,254-04 DEBUG
>> > > [org.ovirt.engine.core.sso.servlets.OAuthTokenServlet] (default
>> > > task-14) [] Entered OAuthTokenServlet Query String: null,
>> > > Parameters : password = ***, grant_type = password, scope =
>> > > ovirt-app-api ovirt-ext=token-info:validate, username =
>> > > admin at internal, </error>*
>> >
>> > I will care about this.
>> > The problem is that SyncNetworkProviderCommand is running in the
>> > background and locking the provider, which blocks the lock for the
>> > tested AddNetworkCommand.
>> > The related changes are
>> > core: Add locking for Add and RemoveNetworkCommand
>> > https://gerrit.ovirt.org/#/c/85480/
>> > and
>> > core: Add SyncNetworkProviderCommand
>> > https://gerrit.ovirt.org/#/c/85134/
>> >
>> >
>> >
>
> _______________________________________________
> Devel mailing list
> Devel at ovirt.org
> http://lists.ovirt.org/mailman/listinfo/devel
>
>


More information about the Devel mailing list