<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Sun, Mar 5, 2017 at 8:29 AM, Dan Kenigsberg <span dir="ltr">&lt;<a href="mailto:danken@redhat.com" target="_blank">danken@redhat.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Piotr, could you provide more information?<br>
<br>
Which setupNetworks action triggers this problem? Any idea which lock<br>
we used to take and when we dropped it?<br></blockquote><div><br></div><div>I thought that this [1] would make sure that setupNetworks is an exclusive operation on a host, which seems not to be the case.<br></div><div>In the logs I saw the following message sent:<br><br>{&quot;jsonrpc&quot;:&quot;2.0&quot;,&quot;method&quot;:&quot;Host.setupNetworks&quot;,&quot;params&quot;:{&quot;networks&quot;:{&quot;VLAN200_Network&quot;:{&quot;vlan&quot;:&quot;200&quot;,&quot;netmask&quot;:&quot;255.255.255.0&quot;,&quot;ipv6autoconf&quot;:false,&quot;nic&quot;:&quot;eth0&quot;,&quot;bridged&quot;:&quot;false&quot;,&quot;ipaddr&quot;:&quot;192.0.3.1&quot;,&quot;dhcpv6&quot;:false,&quot;mtu&quot;:1500,&quot;switch&quot;:&quot;legacy&quot;}},&quot;bondings&quot;:{},&quot;options&quot;:{&quot;connectivityTimeout&quot;:120,&quot;connectivityCheck&quot;:&quot;true&quot;}},&quot;id&quot;:&quot;3f7f74ea-fc39-4815-831b-5e3b1c22131d&quot;}<br><br></div><div>A few seconds later there was:<br><br>{&quot;jsonrpc&quot;:&quot;2.0&quot;,&quot;method&quot;:&quot;Host.getAllVmStats&quot;,&quot;params&quot;:{},&quot;id&quot;:&quot;67d510eb-6dfc-4f67-97b6-a4e63c670ff2&quot;}<br><br></div><div>and, still while we were calling pings, there was:<br><br>{&quot;jsonrpc&quot;:&quot;2.0&quot;,&quot;method&quot;:&quot;StoragePool.getSpmStatus&quot;,&quot;params&quot;:{&quot;storagepoolID&quot;:&quot;8cc227da-70e7-4557-aa01-6d8ddee6f847&quot;},&quot;id&quot;:&quot;d4d04c7c-47b8-44db-867b-770e1e19361c&quot;}<br><br></div><div>My assumption was that those calls should not happen, and that the calls themselves or their responses could be corrupted.<br></div><div>What do you think?</div><div><br>[1] <a 
href="https://github.com/oVirt/ovirt-engine/blob/master/backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/network/host/HostSetupNetworksCommand.java#L285">https://github.com/oVirt/ovirt-engine/blob/master/backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/network/host/HostSetupNetworksCommand.java#L285</a> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
Is it related to your<br>
<a href="https://gerrit.ovirt.org/#/q/Idaa54767bb7e54bf13e89887ca34fa8e01ade420" rel="noreferrer" target="_blank">https://gerrit.ovirt.org/#/q/<wbr>Idaa54767bb7e54bf13e89887ca34f<wbr>a8e01ade420</a><br>
(jsonrpc incomplete message)?<br>
<div><div class="gmail-h5"><br></div></div></blockquote><div><br></div><div>Seems not to be related.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div class="gmail-h5">
<br>
On Fri, Mar 3, 2017 at 4:10 PM, Piotr Kliczewski &lt;<a href="mailto:pkliczew@redhat.com">pkliczew@redhat.com</a>&gt; wrote:<br>
&gt; This one we saw already. The cause is that during network setup the engine<br>
&gt; sends messages which may fail or arrive only partially.<br>
&gt; We used to have a host-level lock to protect against this kind of situation, but it<br>
&gt; seems like we do not have it anymore.<br>
&gt;<br>
&gt; The previous failure was triggered by host monitoring; this time it was SpmStatus.<br>
&gt;<br>
&gt; On Fri, Mar 3, 2017 at 2:54 PM, Pavel Zhukov &lt;<a href="mailto:pzhukov@redhat.com">pzhukov@redhat.com</a>&gt; wrote:<br>
&gt;&gt;<br>
&gt;&gt;<br>
&gt;&gt; Hi,<br>
&gt;&gt;<br>
&gt;&gt; Migration failed because the host is in Connecting state.<br>
&gt;&gt; Seems like another jsonrpc-related issue (Unrecognized message received).<br>
&gt;&gt;<br>
&gt;&gt; Job:<br>
&gt;&gt; <a href="http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/5654/" rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/<wbr>test-repo_ovirt_experimental_<wbr>master/5654/</a><br>
&gt;&gt; Logs:<br>
&gt;&gt; <a href="http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/5654/artifact/exported-artifacts/*zip*/exported-artifacts.zip" rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/<wbr>test-repo_ovirt_experimental_<wbr>master/5654/artifact/exported-<wbr>artifacts/*zip*/exported-<wbr>artifacts.zip</a><br>
&gt;&gt;<br>
&gt;&gt; --<br>
&gt;&gt; Pavel<br>
&gt;&gt;<br>
&gt;&gt; [LOGS SNIPPET]<br>
&gt;&gt; 2017-03-03 06:00:40,882-05 DEBUG<br>
&gt;&gt; [org.ovirt.vdsm.jsonrpc.<wbr>client.reactors.Reactor] (SSL Stomp Reactor)<br>
&gt;&gt; [3155acab] Unable to process messages Unrecognized message received :<br>
&gt;&gt; org.ovirt.vdsm.jsonrpc.client.<wbr>ClientConnectionException: Unrecognized<br>
&gt;&gt; message received<br>
&gt;&gt;   42965:2017-03-03 06:00:40,889-05 ERROR<br>
&gt;&gt; [org.ovirt.engine.core.dal.<wbr>dbbroker.auditloghandling.<wbr>AuditLogDirector]<br>
&gt;&gt; (DefaultQuartzScheduler4) [] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,<wbr>802),<br>
&gt;&gt; Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VDSM<br>
&gt;&gt; lago-basic-suite-master-host1 command SpmStatusVDS failed: Unrecognized<br>
&gt;&gt; message received<br>
&gt;&gt;   42966:2017-03-03 06:00:40,889-05 ERROR<br>
&gt;&gt; [org.ovirt.engine.core.<wbr>vdsbroker.vdsbroker.<wbr>SpmStatusVDSCommand]<br>
&gt;&gt; (DefaultQuartzScheduler4) [] Command &#39;SpmStatusVDSCommand(HostName =<br>
&gt;&gt; lago-basic-suite-master-host1,<br>
&gt;&gt; SpmStatusVDSCommandParameters:<wbr>{runAsync=&#39;true&#39;,<br>
&gt;&gt; hostId=&#39;bba0ec26-4856-4389-<wbr>982d-2ad68cb3f682&#39;,<br>
&gt;&gt; storagePoolId=&#39;8cc227da-70e7-<wbr>4557-aa01-6d8ddee6f847&#39;})&#39; execution failed:<br>
&gt;&gt; VDSGenericException: VDSNetworkException: Unrecognized message received<br>
&gt;&gt;   42999:2017-03-03 06:00:40,925-05 WARN<br>
&gt;&gt; [org.ovirt.engine.core.dal.<wbr>dbbroker.auditloghandling.<wbr>AuditLogDirector]<br>
&gt;&gt; (DefaultQuartzScheduler4) [6c7fb340] EVENT_ID:<br>
&gt;&gt; SYSTEM_CHANGE_STORAGE_POOL_<wbr>STATUS_PROBLEMATIC_WITH_ERROR(<wbr>987), Correlation<br>
&gt;&gt; ID: 6c7fb340, Call Stack: null, Custom Event ID: -1, Message: Invalid status<br>
&gt;&gt; on Data Center test-dc. Setting Data Center status to Non Responsive (On<br>
&gt;&gt; host lago-basic-suite-master-host1, Error: Network error during<br>
&gt;&gt; communication with the Host.).<br>
&gt;&gt;   43457:2017-03-03 06:00:44,466-05 ERROR<br>
&gt;&gt; [org.ovirt.engine.core.bll.<wbr>network.host.HostValidator] (default task-17)<br>
&gt;&gt; [14be0e43-97b8-4882-bbcc-<wbr>27392543fae6] Unable to setup network: operation<br>
&gt;&gt; can only be done when Host status is one of: Maintenance, Up,<br>
&gt;&gt; NonOperational; current status is Connecting<br>
&gt;&gt;   43460:2017-03-03 06:00:44,476-05 ERROR<br>
&gt;&gt; [org.ovirt.engine.api.restapi.<wbr>resource.<wbr>AbstractBackendResource] (default<br>
&gt;&gt; task-17) [] Operation Failed: [Cannot setup Networks. Operation can be<br>
&gt;&gt; performed only when Host status is  Maintenance, Up, NonOperational.]<br>
&gt;<br>
&gt;<br>
&gt;<br>
</div></div>
</blockquote></div><br></div></div>