Dominik,
Here is the exception which caused "Not able to update response"
2017-02-09 06:30:41,828-05 ERROR
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(org.ovirt.thread.pool-7-thread-13) [1abe9b01] EVENT_ID:
VDS_BROKER_COMMAND_FAILURE(10,802), Correlation ID: null, Call Stack:
null, Custom Event ID: -1, Message: VDSM lago-basic-suite-master-host1
command HostSetupNetworksVDS failed: 880 > 879
2017-02-09 06:30:41,829-05 ERROR
[org.ovirt.engine.core.vdsbroker.vdsbroker.HostSetupNetworksVDSCommand]
(org.ovirt.thread.pool-7-thread-13) [1abe9b01] Command
'HostSetupNetworksVDSCommand(HostName = lago-basic-suite-master-host1,
HostSetupNetworksVdsCommandParameters:{runAsync='true',
hostId='42fa76b6-7bf9-463e-80fa-fcca9b0d3e81',
vds='Host[lago-basic-suite-master-host1,42fa76b6-7bf9-463e-80fa-fcca9b0d3e81]',
rollbackOnFailure='true', connectivityTimeout='120',
networks='[HostNetwork:{defaultRoute='false', bonding='false',
networkName='Labeled_Network', nicName='eth0', vlan='600',
mtu='0',
vmNetwork='false', stp='false', properties='null',
ipv4BootProtocol='NONE', ipv4Address='null', ipv4Netmask='null',
ipv4Gateway='null', ipv6BootProtocol='NONE', ipv6Address='null',
ipv6Prefix='null', ipv6Gateway='null'}]',
removedNetworks='[]',
bonds='[]', removedBonds='[]', clusterSwitchType='LEGACY'})'
execution
failed: VDSGenericException: VDSNetworkException: 880 > 879
2017-02-09 06:30:41,829-05 DEBUG
[org.ovirt.engine.core.vdsbroker.vdsbroker.HostSetupNetworksVDSCommand]
(org.ovirt.thread.pool-7-thread-13) [1abe9b01] Exception:
org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException:
VDSGenericException: VDSNetworkException: 880 > 879
at
org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase.proceedProxyReturnValue(BrokerCommandBase.java:188)
[vdsbroker.jar:]
at
org.ovirt.engine.core.vdsbroker.vdsbroker.FutureVDSCommand.get(FutureVDSCommand.java:70)
[vdsbroker.jar:]
at
org.ovirt.engine.core.bll.network.host.HostSetupNetworksCommand.executeCommand(HostSetupNetworksCommand.java:290)
[bll.jar:]
at
org.ovirt.engine.core.bll.CommandBase.executeWithoutTransaction(CommandBase.java:1251)
[bll.jar:]
at
org.ovirt.engine.core.bll.CommandBase.executeActionInTransactionScope(CommandBase.java:1391)
[bll.jar:]
at org.ovirt.engine.core.bll.CommandBase.runInTransaction(CommandBase.java:2055)
[bll.jar:]
at
org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInSuppressed(TransactionSupport.java:164)
[utils.jar:]
at
org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInScope(TransactionSupport.java:103)
[utils.jar:]
at org.ovirt.engine.core.bll.CommandBase.execute(CommandBase.java:1451)
[bll.jar:]
at org.ovirt.engine.core.bll.CommandBase.executeAction(CommandBase.java:397)
[bll.jar:]
This could be connected with the specific version of the jackson that we use.
On Tue, Feb 21, 2017 at 4:42 PM, Dominik Holler <dholler(a)redhat.com> wrote:
A deep analysis of the logfiles gives details about the
unexpected behavior, but I regret to not provide the fault causing the
unexpected behavior.
To get this fault, the help of someone familiar with
org.ovirt.vdsm.jsonrpc.client.JsonRpcClient is needed.
In the failing test "assign_labeled_network" a (labeld) network is
assigned to the cluster. For this reason the network has to be added to
the hosts. After that, the test "assign_labeled_network" checks, if the
engine acknowledges that hosts are in the labeld network. This
execution of the test failed, because this acknowledge of the engine
is missing after 180 seconds [3].
There are two hosts lago-basic-suite-master-host0 and
lago-basic-suite-master-host1 in the scenario.
lago-basic-suite-master-host1 fails and
lago-basic-suite-master-host0 succeeds, so only
lago-basic-suite-master-host1 is analyzed below.
Please find here the most relevant steps causing this error:
1. The engine sends Host.setupNetworks to the hosts in
line 40279 - 40295 in [1] with
"id":"02298344-165f-47e4-9ea4-7c17a55d37f8".
2. The host executes the Host.setupNetworks RPC call successfully in
line 1286 in [2].
3. The engine receives the acknowledgment of the successful execution
in line 40716 and 40717 in [1].
4. The error occurs in line 40718:
'[org.ovirt.vdsm.jsonrpc.client.JsonRpcClient] (ResponseWorker) []
Not able to update response for
"02298344-165f-47e4-9ea4-7c17a55d37f8"'. This means the engine can
not process the acknowledgment of the successful execution.
5. The command HostSetupNetworksVDS is aborted.
So Host.getCapabilities is skipped and the engine database is not
updated with the new network configuration of the host.
6. Since the test script relays on the information from database
about host network configuration, it does not see that
Host.setupNetworks is successfully executed and stops with the
error "False != True after 180 seconds" [3]
So the fault happens before or in step 4 and is around the jsonrpc
communication.
It is an open action item to precise the location of the fault.
[1]
http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_
master/5217/artifact/exported-artifacts/basic-suit-master-
el7/test_logs/basic-suite-master/post-005_network_by_
label.py/lago-basic-suite-master-engine/_var_log/ovirt-engine/engine.log
[2]
http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_
master/5217/artifact/exported-artifacts/basic-suit-master-
el7/test_logs/basic-suite-master/post-005_network_by_
label.py/lago-basic-suite-master-host1/_var_log/vdsm/vdsm.log
[3]
http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_
master/5217/testReport/junit/(root)/005_network_by_label/
assign_labeled_network/
On Thu, 9 Feb 2017 14:52:52 +0200
Shlomo Ben David <sbendavi(a)redhat.com> wrote:
> Hi,
>
>
> *Test failed:* [test-repo_ovirt_experimental_master]
>
> *Link to suspected patches:* n/a
>
> *Link to Job:*
>
http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/5217
>
> *Link to all logs:*
>
http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_
master/5217/artifact/exported-artifacts/basic-suit-master-
el7/test_logs/basic-suite-master/post-005_network_by_label.py/
>
> *Error snippet from the log: *
>
> <error>
>
> ifup/VLAN100_Network::ERROR::2017-02-09
> 06:21:15,236::concurrent::189::root::(run) FINISH thread
> <Thread(ifup/VLAN100_Network, started daemon 140189099861760)> failed
> Traceback (most recent call last):
> File "/usr/lib/python2.7/site-packages/vdsm/concurrent.py", line
> 185, in run ret = func(*args, **kwargs)
> File
> "/usr/lib/python2.7/site-packages/vdsm/network/configurators/ifcfg.py",
> line 949, in _exec_ifup _exec_ifup_by_name(iface.name, cgroup)
> File
> "/usr/lib/python2.7/site-packages/vdsm/network/configurators/ifcfg.py",
> line 935, in _exec_ifup_by_name raise
> ConfigNetworkError(ERR_FAILED_IFUP, out[-1] if out else '')
> ConfigNetworkError: (29, 'Determining IPv6 information for
> VLAN100_Network... failed.')
>
> </error>
>
> Best Regards,
>
> Shlomi Ben-David | Software Engineer | Red Hat ISRAEL
> RHCSA | RHCVA | RHCE
> IRC: shlomibendavid (on #rhev-integ, #rhev-dev, #rhev-ci)
>
> OPEN SOURCE - 1 4 011 && 011 4 1