<div dir="ltr"><div>Dominik,<br><br></div>Here is the exception which caused "Not able to update response"<br><br><pre>2017-02-09 06:30:41,828-05 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-7-thread-13) [1abe9b01] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VDSM lago-basic-suite-master-host1 command HostSetupNetworksVDS failed: 880 > 879
2017-02-09 06:30:41,829-05 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HostSetupNetworksVDSCommand] (org.ovirt.thread.pool-7-thread-13) [1abe9b01] Command 'HostSetupNetworksVDSCommand(HostName = lago-basic-suite-master-host1, HostSetupNetworksVdsCommandParameters:{runAsync='true', hostId='42fa76b6-7bf9-463e-80fa-fcca9b0d3e81', vds='Host[lago-basic-suite-master-host1,42fa76b6-7bf9-463e-80fa-fcca9b0d3e81]', rollbackOnFailure='true', connectivityTimeout='120', networks='[HostNetwork:{defaultRoute='false', bonding='false', networkName='Labeled_Network', nicName='eth0', vlan='600', mtu='0', vmNetwork='false', stp='false', properties='null', ipv4BootProtocol='NONE', ipv4Address='null', ipv4Netmask='null', ipv4Gateway='null', ipv6BootProtocol='NONE', ipv6Address='null', ipv6Prefix='null', ipv6Gateway='null'}]', removedNetworks='[]', bonds='[]', removedBonds='[]', clusterSwitchType='LEGACY'})' execution failed: VDSGenericException: VDSNetworkException: 880 > 879
2017-02-09 06:30:41,829-05 DEBUG [org.ovirt.engine.core.vdsbroker.vdsbroker.HostSetupNetworksVDSCommand] (org.ovirt.thread.pool-7-thread-13) [1abe9b01] Exception: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: VDSGenericException: VDSNetworkException: 880 > 879
        at org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase.proceedProxyReturnValue(BrokerCommandBase.java:188) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.vdsbroker.FutureVDSCommand.get(FutureVDSCommand.java:70) [vdsbroker.jar:]
        at org.ovirt.engine.core.bll.network.host.HostSetupNetworksCommand.executeCommand(HostSetupNetworksCommand.java:290) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.executeWithoutTransaction(CommandBase.java:1251) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.executeActionInTransactionScope(CommandBase.java:1391) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.runInTransaction(CommandBase.java:2055) [bll.jar:]
        at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInSuppressed(TransactionSupport.java:164) [utils.jar:]
        at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInScope(TransactionSupport.java:103) [utils.jar:]
        at org.ovirt.engine.core.bll.CommandBase.execute(CommandBase.java:1451) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.executeAction(CommandBase.java:397) [bll.jar:]<br><br></pre><pre>This could be connected with the specific version of the jackson that we use.<br></pre></div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Feb 21, 2017 at 4:42 PM, Dominik Holler <span dir="ltr"><<a href="mailto:dholler@redhat.com" target="_blank">dholler@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">A deep analysis of the logfiles gives details about the<br>
unexpected behavior, but I regret to not provide the fault causing the<br>
unexpected behavior.<br>
<br>
To get this fault, the help of someone familiar with<br>
org.ovirt.vdsm.jsonrpc.client.<wbr>JsonRpcClient is needed.<br>
<br>
In the failing test "assign_labeled_network" a (labeld) network is<br>
assigned to the cluster. For this reason the network has to be added to<br>
the hosts. After that, the test "assign_labeled_network" checks, if the<br>
engine acknowledges that hosts are in the labeld network. This<br>
execution of the test failed, because this acknowledge of the engine<br>
is missing after 180 seconds [3].<br>
<br>
There are two hosts lago-basic-suite-master-host0 and<br>
lago-basic-suite-master-host1 in the scenario.<br>
lago-basic-suite-master-host1 fails and<br>
lago-basic-suite-master-host0 succeeds, so only<br>
lago-basic-suite-master-host1 is analyzed below.<br>
<br>
Please find here the most relevant steps causing this error:<br>
1. The engine sends Host.setupNetworks to the hosts in<br>
line <a href="tel:40279%20-%2040295" value="+14027940295">40279 - 40295</a> in [1] with<br>
"id":"02298344-165f-47e4-9ea4-<wbr>7c17a55d37f8".<br>
2. The host executes the Host.setupNetworks RPC call successfully in<br>
line 1286 in [2].<br>
3. The engine receives the acknowledgment of the successful execution<br>
in line 40716 and 40717 in [1].<br>
4. The error occurs in line 40718:<br>
'[org.ovirt.vdsm.jsonrpc.<wbr>client.JsonRpcClient] (ResponseWorker) []<br>
Not able to update response for<br>
"02298344-165f-47e4-9ea4-<wbr>7c17a55d37f8"'. This means the engine can<br>
not process the acknowledgment of the successful execution.<br>
5. The command HostSetupNetworksVDS is aborted.<br>
So Host.getCapabilities is skipped and the engine database is not<br>
updated with the new network configuration of the host.<br>
6. Since the test script relays on the information from database<br>
about host network configuration, it does not see that<br>
Host.setupNetworks is successfully executed and stops with the<br>
error "False != True after 180 seconds" [3]<br>
<br>
So the fault happens before or in step 4 and is around the jsonrpc<br>
communication.<br>
<br>
It is an open action item to precise the location of the fault.<br>
<br>
<br>
<br>
[1]<br>
<a href="http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/5217/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-005_network_by_label.py/lago-basic-suite-master-engine/_var_log/ovirt-engine/engine.log" rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/<wbr>test-repo_ovirt_experimental_<wbr>master/5217/artifact/exported-<wbr>artifacts/basic-suit-master-<wbr>el7/test_logs/basic-suite-<wbr>master/post-005_network_by_<wbr>label.py/lago-basic-suite-<wbr>master-engine/_var_log/ovirt-<wbr>engine/engine.log</a><br>
<br>
[2]<br>
<a href="http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/5217/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-005_network_by_label.py/lago-basic-suite-master-host1/_var_log/vdsm/vdsm.log" rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/<wbr>test-repo_ovirt_experimental_<wbr>master/5217/artifact/exported-<wbr>artifacts/basic-suit-master-<wbr>el7/test_logs/basic-suite-<wbr>master/post-005_network_by_<wbr>label.py/lago-basic-suite-<wbr>master-host1/_var_log/vdsm/<wbr>vdsm.log</a><br>
<br>
[3]<br>
<a href="http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/5217/testReport/junit/(root)/005_network_by_label/assign_labeled_network/" rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/<wbr>test-repo_ovirt_experimental_<wbr>master/5217/testReport/junit/(<wbr>root)/005_network_by_label/<wbr>assign_labeled_network/</a><br>
<br>
<br>
<br>
On Thu, 9 Feb 2017 14:52:52 +0200<br>
Shlomo Ben David <<a href="mailto:sbendavi@redhat.com">sbendavi@redhat.com</a>> wrote:<br>
<br>
> Hi,<br>
><br>
><br>
> *Test failed:* [test-repo_ovirt_experimental_<wbr>master]<br>
><br>
> *Link to suspected patches:* n/a<br>
><br>
> *Link to Job:*<br>
> <a href="http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/5217" rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/<wbr>test-repo_ovirt_experimental_<wbr>master/5217</a><br>
><br>
> *Link to all logs:*<br>
> <a href="http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/5217/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-005_network_by_label.py/" rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/<wbr>test-repo_ovirt_experimental_<wbr>master/5217/artifact/exported-<wbr>artifacts/basic-suit-master-<wbr>el7/test_logs/basic-suite-<wbr>master/post-005_network_by_<wbr>label.py/</a><br>
><br>
> *Error snippet from the log: *<br>
><br>
> <error><br>
><br>
> ifup/VLAN100_Network::ERROR::<wbr>2017-02-09<br>
> 06:21:15,236::concurrent::189:<wbr>:root::(run) FINISH thread<br>
> <Thread(ifup/VLAN100_Network, started daemon 140189099861760)> failed<br>
> Traceback (most recent call last):<br>
> File "/usr/lib/python2.7/site-<wbr>packages/vdsm/concurrent.py", line<br>
> 185, in run ret = func(*args, **kwargs)<br>
> File<br>
> "/usr/lib/python2.7/site-<wbr>packages/vdsm/network/<wbr>configurators/ifcfg.py",<br>
> line 949, in _exec_ifup _exec_ifup_by_name(<a href="http://iface.name" rel="noreferrer" target="_blank">iface.name</a>, cgroup)<br>
> File<br>
> "/usr/lib/python2.7/site-<wbr>packages/vdsm/network/<wbr>configurators/ifcfg.py",<br>
> line 935, in _exec_ifup_by_name raise<br>
> ConfigNetworkError(ERR_FAILED_<wbr>IFUP, out[-1] if out else '')<br>
> ConfigNetworkError: (29, 'Determining IPv6 information for<br>
> VLAN100_Network... failed.')<br>
><br>
> </error><br>
><br>
> Best Regards,<br>
><br>
> Shlomi Ben-David | Software Engineer | Red Hat ISRAEL<br>
> RHCSA | RHCVA | RHCE<br>
> IRC: shlomibendavid (on #rhev-integ, #rhev-dev, #rhev-ci)<br>
><br>
> OPEN SOURCE - 1 4 011 && 011 4 1<br>
<br>
</blockquote></div><br></div>