<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Mar 6, 2017 at 10:10 AM, Piotr Kliczewski <span dir="ltr">&lt;<a href="mailto:pkliczew@redhat.com" target="_blank">pkliczew@redhat.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div class="h5"><br><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Mar 6, 2017 at 9:46 AM, Dan Kenigsberg <span dir="ltr">&lt;<a href="mailto:danken@redhat.com" target="_blank">danken@redhat.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="m_-3871231027632991197HOEnZb"><div class="m_-3871231027632991197h5">On Mon, Mar 6, 2017 at 10:11 AM, Piotr Kliczewski &lt;<a href="mailto:pkliczew@redhat.com" target="_blank">pkliczew@redhat.com</a>&gt; wrote:<br>
&gt;<br>
&gt;<br>
&gt; On Mon, Mar 6, 2017 at 8:23 AM, Dan Kenigsberg &lt;<a href="mailto:danken@redhat.com" target="_blank">danken@redhat.com</a>&gt; wrote:<br>
&gt;&gt;<br>
&gt;&gt; On Sun, Mar 5, 2017 at 9:50 PM, Piotr Kliczewski &lt;<a href="mailto:pkliczew@redhat.com" target="_blank">pkliczew@redhat.com</a>&gt;<br>
&gt;&gt; wrote:<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt; On Sun, Mar 5, 2017 at 8:29 AM, Dan Kenigsberg &lt;<a href="mailto:danken@redhat.com" target="_blank">danken@redhat.com</a>&gt;<br>
&gt;&gt; &gt; wrote:<br>
&gt;&gt; &gt;&gt;<br>
&gt;&gt; &gt;&gt; Piotr, could you provide more information?<br>
&gt;&gt; &gt;&gt;<br>
&gt;&gt; &gt;&gt; Which setupNetworks action triggers this problem? Any idea which lock<br>
&gt;&gt; &gt;&gt; we used to take, and when we dropped it?<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt; I thought that this [1] would make sure that setupNetworks is an exclusive<br>
&gt;&gt; &gt; operation on a host, which seems not to be the case.<br>
&gt;&gt; &gt; In the logs I saw the following message sent:<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt; {&quot;jsonrpc&quot;:&quot;2.0&quot;,&quot;method&quot;:&quot;Hos<wbr>t.setupNetworks&quot;,&quot;params&quot;:{&quot;<wbr>networks&quot;:{&quot;VLAN200_Network&quot;:{<wbr>&quot;vlan&quot;:&quot;200&quot;,&quot;netmask&quot;:&quot;255.25<wbr>5.255.0&quot;,&quot;ipv6autoconf&quot;:false,<wbr>&quot;nic&quot;:&quot;eth0&quot;,&quot;bridged&quot;:&quot;false&quot;<wbr>,&quot;ipaddr&quot;:&quot;192.0.3.1&quot;,&quot;dhcpv6&quot;<wbr>:false,&quot;mtu&quot;:1500,&quot;switch&quot;:&quot;<wbr>legacy&quot;}},&quot;bondings&quot;:{},&quot;<wbr>options&quot;:{&quot;connectivityTimeout<wbr>&quot;:120,&quot;connectivityCheck&quot;:&quot;<wbr>true&quot;}},&quot;id&quot;:&quot;3f7f74ea-fc39-<wbr>4815-831b-5e3b1c22131d&quot;}<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt; A few seconds later there was:<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt; {&quot;jsonrpc&quot;:&quot;2.0&quot;,&quot;method&quot;:&quot;Hos<wbr>t.getAllVmStats&quot;,&quot;params&quot;:{},&quot;<wbr>id&quot;:&quot;67d510eb-6dfc-4f67-97b6-<wbr>a4e63c670ff2&quot;}<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt; and still while we were calling pings there was:<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt; {&quot;jsonrpc&quot;:&quot;2.0&quot;,&quot;method&quot;:&quot;Sto<wbr>ragePool.getSpmStatus&quot;,&quot;params<wbr>&quot;:{&quot;storagepoolID&quot;:&quot;8cc227da-<wbr>70e7-4557-aa01-6d8ddee6f847&quot;},<wbr>&quot;id&quot;:&quot;d4d04c7c-47b8-44db-867b-<wbr>770e1e19361c&quot;}<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt; My assumption was that those calls should not happen, and that the calls<br>
&gt;&gt; &gt; themselves<br>
&gt;&gt; &gt; or their responses could be corrupted.<br>
&gt;&gt; &gt; What do you think?<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt; [1]<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt; <a href="https://github.com/oVirt/ovirt-engine/blob/master/backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/network/host/HostSetupNetworksCommand.java#L285" rel="noreferrer" target="_blank">https://github.com/oVirt/ovirt<wbr>-engine/blob/master/backend/<wbr>manager/modules/bll/src/main/<wbr>java/org/ovirt/engine/core/<wbr>bll/network/host/HostSetupNetw<wbr>orksCommand.java#L285</a><br>
&gt;&gt;<br>
&gt;&gt; I suspect that getAllVmStats and getSpmStatus simply do not take the<br>
&gt;&gt; hostmonitoring lock, and I don&#39;t see anything wrong in that.<br>
&gt;&gt;<br>
&gt;&gt; Note that during 006_migration, we set only a mere migration network,<br>
&gt;&gt; not the management network. This operation should not interfere with<br>
&gt;&gt; Engine-Vdsm communication in any way; I don&#39;t yet understand why you<br>
&gt;&gt; suspect that it does.<br>
&gt;<br>
&gt;<br>
&gt; My assumption comes from seeing this failure twice, both times during<br>
&gt; setupNetworks.<br>
&gt; The pattern is that the call that fails is always one which &quot;should not&quot; occur during such an<br>
&gt; operation.<br>
&gt;<br>
<br>
</div></div>It is fair to suspect an interaction with setupNetworks, but let us<br>
put some substance into it.<br>
What is the mode of failure of the other command?<br>
</blockquote></div><br></div></div></div><div class="gmail_extra">I am not sure what you mean. Can you please explain?<br></div><div class="gmail_extra"><br></div></div>
</blockquote></div><br></div><div class="gmail_extra">Now I understand the question (it was explained offline). The parse method fails because a heartbeat frame is glued together<br></div><div class="gmail_extra">with a partial response that we should not be getting. <br><br></div><div class="gmail_extra">During setupNetworks we ignore heartbeats, but that does not mean that we do not receive them.<br></div><div class="gmail_extra">It seems that vdsm sends a heartbeat on the assumption that there was no interaction, when actually there was.<br></div><div class="gmail_extra">We may want to fix this on the vdsm side.<br><br></div><div class="gmail_extra">On the other hand, we need to fix setupNetworks locking on the engine side. We should either not lock at all or<br></div><div class="gmail_extra">make sure the lock is taken for all possible interactions.<br></div><div class="gmail_extra"><br></div></div>
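<div class="gmail_extra">To illustrate the failure mode, here is a minimal sketch of a reader that tolerates a heartbeat glued to a partial response. It is not the actual engine or vdsm parser. It assumes STOMP-style framing, where a heartbeat is a bare newline and a frame ends with a NUL byte, and it ignores content-length handling. The name split_buffer and the sample payload are hypothetical.<br></div>

```python
# Hypothetical sketch, not the real engine/vdsm code.
# Assumption: STOMP-style framing (heartbeat = bare b"\n", frame ends with b"\x00").

HEARTBEAT = b"\n"
FRAME_END = b"\x00"


def split_buffer(buf: bytes):
    """Return (heartbeat_count, complete_frames, remainder) for received bytes."""
    heartbeats = 0
    frames = []
    while buf:
        if buf.startswith(HEARTBEAT):
            # A heartbeat may sit directly in front of another frame,
            # so consume it and keep scanning instead of failing.
            heartbeats += 1
            buf = buf[len(HEARTBEAT):]
            continue
        end = buf.find(FRAME_END)
        if end == -1:
            # Partial frame: keep the bytes until the rest arrives.
            break
        frames.append(buf[:end])
        buf = buf[end + 1:]
    return heartbeats, frames, buf


# One read delivers a heartbeat followed by the start of a response.
count, done, pending = split_buffer(b'\nMESSAGE\n\n{"id"')
# A later read delivers the rest of that response.
count2, done2, pending2 = split_buffer(pending + b': 1}\x00')
```

<div class="gmail_extra">The point of the sketch is that the heartbeat is counted and dropped, while the partial response is held back until it is complete rather than being handed to the parser together with the heartbeat.<br></div>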