<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Apr 20, 2017 at 2:42 PM, Nir Soffer <span dir="ltr">&lt;<a href="mailto:nsoffer@redhat.com" target="_blank">nsoffer@redhat.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br><br><div class="gmail_quote"><div dir="ltr">On Thu, Apr 20, 2017 at 1:51 PM, Yaniv Kaul &lt;<a href="mailto:ykaul@redhat.com" target="_blank">ykaul@redhat.com</a>&gt; wrote:<br></div><div><div class="gmail-h5"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex" id="gmail-m_-8546862986421770527gmail_block_quote0"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Thu, Apr 20, 2017 at 1:32 PM, Nir Soffer <span dir="ltr">&lt;<a href="mailto:nsoffer@redhat.com" target="_blank">nsoffer@redhat.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br><br><div class="gmail_quote"><div dir="ltr">On Thu, Apr 20, 2017 at 1:05 PM, Piotr Kliczewski &lt;<a href="mailto:piotr.kliczewski@gmail.com" target="_blank">piotr.kliczewski@gmail.com</a>&gt; wrote:<br></div><div><div class="gmail-m_-8546862986421770527m_8882979470196019703gmail-h5"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Thu, Apr 20, 2017 at 11:49 AM, Yaniv Kaul &lt;<a href="mailto:ykaul@redhat.com" target="_blank">ykaul@redhat.com</a>&gt; wrote:<br>
&gt;<br>
&gt;<br>
&gt; On Thu, Apr 20, 2017 at 11:55 AM, Piotr Kliczewski<br>
&gt; &lt;<a href="mailto:piotr.kliczewski@gmail.com" target="_blank">piotr.kliczewski@gmail.com</a>&gt; wrote:<br>
&gt;&gt;<br>
&gt;&gt; On Thu, Apr 20, 2017 at 10:32 AM, Yaniv Kaul &lt;<a href="mailto:ykaul@redhat.com" target="_blank">ykaul@redhat.com</a>&gt; wrote:<br>
&gt;&gt; &gt; No, that&#39;s not the issue.<br>
&gt;&gt; &gt; I&#39;ve seen it happen a few times.<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt; 1. It always happens with the ISO domain (which we don&#39;t use anyway in o-s-t)<br>
&gt;&gt; &gt; 2. Apparently, only one host is asking for a mount:<br>
&gt;&gt; &gt;  authenticated mount request from <a href="http://192.168.201.4:713" rel="noreferrer" target="_blank">192.168.201.4:713</a> for /exports/nfs/iso<br>
&gt;&gt; &gt; (/exports/nfs/iso)<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt; (/var/log/messages of the NFS server)<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt; And indeed, you can see in [1] that host1 made the request and all is<br>
&gt;&gt; &gt; well on<br>
&gt;&gt; &gt; it.<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt; However, there are connection issues with host0 which cause<br>
&gt;&gt; &gt; connectStorageServer() to time out.<br>
&gt;&gt; &gt; From [2]:<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt; 2017-04-19 18:58:58,465-04 DEBUG<br>
&gt;&gt; &gt; [org.ovirt.vdsm.jsonrpc.<wbr>client.internal.<wbr>ResponseWorker] (ResponseWorker)<br>
&gt;&gt; &gt; []<br>
&gt;&gt; &gt; Message received:<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt; {&quot;jsonrpc&quot;:&quot;2.0&quot;,&quot;error&quot;:{&quot;<wbr>code&quot;:&quot;lago-basic-suite-<wbr>master-host0:192912448&quot;,&quot;<wbr>message&quot;:&quot;Vds<br>
&gt;&gt; &gt; timeout occured&quot;},&quot;id&quot;:null}<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt; 2017-04-19 18:58:58,475-04 ERROR<br>
&gt;&gt; &gt; [org.ovirt.engine.core.dal.<wbr>dbbroker.auditloghandling.<wbr>AuditLogDirector]<br>
&gt;&gt; &gt; (org.ovirt.thread.pool-7-<wbr>thread-37) [755b908a] EVENT_ID:<br>
&gt;&gt; &gt; VDS_BROKER_COMMAND_FAILURE(10,<wbr>802), Correlation ID: null, Call Stack:<br>
&gt;&gt; &gt; null,<br>
&gt;&gt; &gt; Custom Event ID: -1, Message: VDSM lago-basic-suite-master-host0 command<br>
&gt;&gt; &gt; ConnectStorageServerVDS failed: Message timeout which can be caused by<br>
&gt;&gt; &gt; communication issues<br>
&gt;&gt; &gt; 2017-04-19 18:58:58,475-04 INFO<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt; [org.ovirt.engine.core.<wbr>vdsbroker.vdsbroker.<wbr>ConnectStorageServerVDSCommand<wbr>]<br>
&gt;&gt; &gt; (org.ovirt.thread.pool-7-<wbr>thread-37) [755b908a] Command<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt; &#39;org.ovirt.engine.core.<wbr>vdsbroker.vdsbroker.<wbr>ConnectStorageServerVDSCommand<wbr>&#39;<br>
&gt;&gt; &gt; return value &#39;<br>
&gt;&gt; &gt; ServerConnectionStatusReturn:{<wbr>status=&#39;Status [code=5022, message=Message<br>
&gt;&gt; &gt; timeout which can be caused by communication issues]&#39;}<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt; I wonder why, but in /var/log/messages [3] I&#39;m seeing:<br>
&gt;&gt; &gt; Apr 19 18:56:58 lago-basic-suite-master-host0 journal: vdsm Executor<br>
&gt;&gt; &gt; WARN<br>
&gt;&gt; &gt; Worker blocked: &lt;Worker name=jsonrpc/3 running &lt;Task &lt;JsonRpcTask<br>
&gt;&gt; &gt; {&#39;params&#39;:<br>
&gt;&gt; &gt; {u&#39;connectionParams&#39;: [{u&#39;id&#39;: u&#39;4ca8fc84-d872-4a7f-907f-<wbr>9445bda7b6d1&#39;,<br>
&gt;&gt; &gt; u&#39;connection&#39;: u&#39;192.168.201.3:/exports/nfs/<wbr>share1&#39;, u&#39;iqn&#39;: u&#39;&#39;,<br>
&gt;&gt; &gt; u&#39;user&#39;:<br>
&gt;&gt; &gt; u&#39;&#39;, u&#39;tpgt&#39;: u&#39;1&#39;, u&#39;protocol_version&#39;: u&#39;4.2&#39;, u&#39;password&#39;:<br>
&gt;&gt; &gt; &#39;********&#39;,<br>
&gt;&gt; &gt; u&#39;port&#39;: u&#39;&#39;}], u&#39;storagepoolID&#39;:<br>
&gt;&gt; &gt; u&#39;00000000-0000-0000-0000-<wbr>000000000000&#39;,<br>
&gt;&gt; &gt; u&#39;domainType&#39;: 1}, &#39;jsonrpc&#39;: &#39;2.0&#39;, &#39;method&#39;:<br>
&gt;&gt; &gt; u&#39;StoragePool.<wbr>connectStorageServer&#39;, &#39;id&#39;:<br>
&gt;&gt; &gt; u&#39;057da9c2-1e67-4c2f-9511-<wbr>7d9de250386b&#39;} at 0x2f44110&gt; timeout=60,<br>
&gt;&gt; &gt; duration=60 at 0x2f44310&gt; task#=9 at 0x2ac11d0&gt;<br>
&gt;&gt; &gt; ...<br>
&gt;&gt; &gt;<br>
&gt;&gt;<br>
&gt;&gt; I see the following sequence:<br>
&gt;&gt;<br>
&gt;&gt; The message is sent:<br>
&gt;&gt;<br>
&gt;&gt; 2017-04-19 18:55:58,020-04 DEBUG<br>
&gt;&gt; [org.ovirt.vdsm.jsonrpc.<wbr>client.reactors.stomp.<wbr>StompCommonClient]<br>
&gt;&gt; (org.ovirt.thread.pool-7-<wbr>thread-37) [755b908a] Message sent: SEND<br>
&gt;&gt; destination:jms.topic.vdsm_<wbr>requests<br>
&gt;&gt; content-length:381<br>
&gt;&gt; ovirtCorrelationId:755b908a<br>
&gt;&gt; reply-to:jms.topic.vdsm_<wbr>responses<br>
&gt;&gt;<br>
&gt;&gt; &lt;JsonRpcRequest id: &quot;057da9c2-1e67-4c2f-9511-<wbr>7d9de250386b&quot;, method:<br>
&gt;&gt; StoragePool.<wbr>connectStorageServer, params:<br>
&gt;&gt; {storagepoolID=00000000-0000-<wbr>0000-0000-000000000000, domainType=1,<br>
&gt;&gt; connectionParams=[{password=, protocol_version=4.2, port=, iqn=,<br>
&gt;&gt; connection=192.168.201.3:/<wbr>exports/nfs/share1,<br>
&gt;&gt; id=4ca8fc84-d872-4a7f-907f-<wbr>9445bda7b6d1, user=, tpgt=1}]}&gt;<br>
&gt;&gt;<br>
&gt;&gt; There is no response within the specified amount of time, and we time out:<br>
&gt;&gt;<br>
&gt;&gt; 2017-04-19 18:58:58,465-04 DEBUG<br>
&gt;&gt; [org.ovirt.vdsm.jsonrpc.<wbr>client.internal.<wbr>ResponseWorker]<br>
&gt;&gt; (ResponseWorker) [] Message received:<br>
&gt;&gt;<br>
&gt;&gt; {&quot;jsonrpc&quot;:&quot;2.0&quot;,&quot;error&quot;:{&quot;<wbr>code&quot;:&quot;lago-basic-suite-<wbr>master-host0:192912448&quot;,&quot;<wbr>message&quot;:&quot;Vds<br>
&gt;&gt; timeout occured&quot;},&quot;id&quot;:null}<br>
&gt;&gt;<br>
&gt;&gt; As Yaniv pointed out, here is why we never get the response:<br>
&gt;&gt;<br>
&gt;&gt; Apr 19 18:58:58 lago-basic-suite-master-host0 journal: vdsm Executor<br>
&gt;&gt; WARN Worker blocked: &lt;Worker name=jsonrpc/3 running &lt;Task &lt;JsonRpcTask<br>
&gt;&gt; {&#39;params&#39;: {u&#39;connectionParams&#39;: [{u&#39;id&#39;:<br>
&gt;&gt; u&#39;4ca8fc84-d872-4a7f-907f-<wbr>9445bda7b6d1&#39;, u&#39;connection&#39;:<br>
&gt;&gt; u&#39;192.168.201.3:/exports/nfs/<wbr>share1&#39;, u&#39;iqn&#39;: u&#39;&#39;, u&#39;user&#39;: u&#39;&#39;,<br>
&gt;&gt; u&#39;tpgt&#39;: u&#39;1&#39;, u&#39;protocol_version&#39;: u&#39;4.2&#39;, u&#39;password&#39;: &#39;********&#39;,<br>
&gt;&gt; u&#39;port&#39;: u&#39;&#39;}], u&#39;storagepoolID&#39;:<br>
&gt;&gt; u&#39;00000000-0000-0000-0000-<wbr>000000000000&#39;, u&#39;domainType&#39;: 1}, &#39;jsonrpc&#39;:<br>
&gt;&gt; &#39;2.0&#39;, &#39;method&#39;: u&#39;StoragePool.<wbr>connectStorageServer&#39;, &#39;id&#39;:<br>
&gt;&gt; u&#39;057da9c2-1e67-4c2f-9511-<wbr>7d9de250386b&#39;} at 0x2f44110&gt; timeout=60,<br>
&gt;&gt; duration=180 at 0x2f44310&gt; task#=9 at 0x2ac11d0&gt;<br></blockquote></div></div></div><div><br></div><div>This means the connection attempt was stuck for 180 seconds. Need to check if the mount was stuck, or maybe there is some issue in supervdsm running this.</div><div><br></div><div>This is a new warning introduced lately, before a stuck worker was hidden.</div><div><br></div><div>Do we check that the nfs server is up before we start the tests?</div></blockquote><div><br></div></div></div></div><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>We do not check, but it is up - it&#39;s installed before anything else, and happens to be on the Engine. It has several minutes to be up and ready.</div><div>Moreover, a command to the same NFS server, same params, only a different mount point, succeeded, a second earlier:</div><div><pre class="gmail-m_-8546862986421770527m_8882979470196019703gmail-bz_comment_text gmail-m_-8546862986421770527m_8882979470196019703gmail-bz_wrap_comment_text" id="gmail-m_-8546862986421770527m_8882979470196019703gmail-comment_text_0" style="white-space:pre-wrap;word-wrap:break-word;width:50em;color:rgb(0,0,0)">18:55:56,300::supervdsm_<wbr>server::92::SuperVdsm.<wbr>ServerCallback::(wrapper) call mount with (u&#39;192.168.201.3:/exports/nfs/<wbr>share2&#39;, u&#39;/rhev/data-center/mnt/192.<wbr>168.201.3:_exports_nfs_share2&#39;<wbr>) {&#39;vfstype&#39;: &#39;nfs&#39;, &#39;mntOpts&#39;: &#39;soft,nosharecache,timeo=600,<wbr>retrans=6,nfsvers=4,<wbr>minorversion=1&#39;, &#39;timeout&#39;: None, &#39;cgroup&#39;: None}</pre><pre class="gmail-m_-8546862986421770527m_8882979470196019703gmail-bz_comment_text gmail-m_-8546862986421770527m_8882979470196019703gmail-bz_wrap_comment_text" id="gmail-m_-8546862986421770527m_8882979470196019703gmail-comment_text_0" style="white-space:pre-wrap;word-wrap:break-word;width:50em;color:rgb(0,0,0)">...</pre><pre 
class="gmail-m_-8546862986421770527m_8882979470196019703gmail-bz_comment_text gmail-m_-8546862986421770527m_8882979470196019703gmail-bz_wrap_comment_text" id="gmail-m_-8546862986421770527m_8882979470196019703gmail-comment_text_0" style="white-space:pre-wrap;word-wrap:break-word;width:50em;color:rgb(0,0,0)"><pre class="gmail-m_-8546862986421770527m_8882979470196019703gmail-bz_comment_text gmail-m_-8546862986421770527m_8882979470196019703gmail-bz_wrap_comment_text" id="gmail-m_-8546862986421770527m_8882979470196019703gmail-comment_text_0" style="white-space:pre-wrap;word-wrap:break-word;width:50em">18:55:56,338::supervdsm_<wbr>server::99::SuperVdsm.<wbr>ServerCallback::(wrapper) return mount with None</pre><pre class="gmail-m_-8546862986421770527m_8882979470196019703gmail-bz_comment_text gmail-m_-8546862986421770527m_8882979470196019703gmail-bz_wrap_comment_text" id="gmail-m_-8546862986421770527m_8882979470196019703gmail-comment_text_0" style="white-space:pre-wrap;word-wrap:break-word;width:50em"><br></pre><pre class="gmail-m_-8546862986421770527m_8882979470196019703gmail-bz_comment_text gmail-m_-8546862986421770527m_8882979470196019703gmail-bz_wrap_comment_text" id="gmail-m_-8546862986421770527m_8882979470196019703gmail-comment_text_0" style="white-space:pre-wrap;word-wrap:break-word;width:50em">But:</pre><pre class="gmail-m_-8546862986421770527m_8882979470196019703gmail-bz_comment_text gmail-m_-8546862986421770527m_8882979470196019703gmail-bz_wrap_comment_text" id="gmail-m_-8546862986421770527m_8882979470196019703gmail-comment_text_0" style="white-space:pre-wrap;word-wrap:break-word;width:50em"><pre class="gmail-m_-8546862986421770527m_8882979470196019703gmail-bz_comment_text gmail-m_-8546862986421770527m_8882979470196019703gmail-bz_wrap_comment_text" id="gmail-m_-8546862986421770527m_8882979470196019703gmail-comment_text_0" style="white-space:pre-wrap;word-wrap:break-word;width:50em">18:55:58,076::supervdsm_<wbr>server::92::SuperVdsm.<wbr>ServerCallback::(wrapper) 
call mount with (u&#39;192.168.201.3:/exports/nfs/<wbr>share1&#39;, u&#39;/rhev/data-center/mnt/192.<wbr>168.201.3:_exports_nfs_share1&#39;<wbr>) {&#39;vfstype&#39;: &#39;nfs&#39;, &#39;mntOpts&#39;: &#39;soft,nosharecache,timeo=600,<wbr>retrans=6,nfsvers=4,<wbr>minorversion=2&#39;, &#39;timeout&#39;: None, &#39;cgroup&#39;: None}</pre><pre class="gmail-m_-8546862986421770527m_8882979470196019703gmail-bz_comment_text gmail-m_-8546862986421770527m_8882979470196019703gmail-bz_wrap_comment_text" id="gmail-m_-8546862986421770527m_8882979470196019703gmail-comment_text_0" style="white-space:pre-wrap;word-wrap:break-word;width:50em"><br></pre><pre class="gmail-m_-8546862986421770527m_8882979470196019703gmail-bz_comment_text gmail-m_-8546862986421770527m_8882979470196019703gmail-bz_wrap_comment_text" id="gmail-m_-8546862986421770527m_8882979470196019703gmail-comment_text_0" style="white-space:pre-wrap;word-wrap:break-word;width:50em">Got stuck.</pre></pre></pre></div></div></div></div></blockquote></div></div></div><div><br></div><div>The stuck mount is using nfs 4.2, the successful one is 4.1.</div></blockquote><div><br></div><div>On the other host it succeeded[1], but the order was different (it runs in parallel threads) .</div><div>Perhaps we should add some random sleep between them?</div><div><br></div><div>Y.</div><div><br></div><div>[1] <a href="http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/6403/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-host1/_var_log/vdsm/supervdsm.log">http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/6403/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-host1/_var_log/vdsm/supervdsm.log</a></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="gmail-HOEnZb"><div 
class="gmail-h5"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex" id="gmail-m_-8546862986421770527gmail_block_quote1"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><pre class="gmail-m_-8546862986421770527m_8882979470196019703gmail-bz_comment_text gmail-m_-8546862986421770527m_8882979470196019703gmail-bz_wrap_comment_text" id="gmail-m_-8546862986421770527m_8882979470196019703gmail-comment_text_0" style="white-space:pre-wrap;word-wrap:break-word;width:50em;color:rgb(0,0,0)"><pre class="gmail-m_-8546862986421770527m_8882979470196019703gmail-bz_comment_text gmail-m_-8546862986421770527m_8882979470196019703gmail-bz_wrap_comment_text" id="gmail-m_-8546862986421770527m_8882979470196019703gmail-comment_text_0" style="white-space:pre-wrap;word-wrap:break-word;width:50em"><pre class="gmail-m_-8546862986421770527m_8882979470196019703gmail-bz_comment_text gmail-m_-8546862986421770527m_8882979470196019703gmail-bz_wrap_comment_text" id="gmail-m_-8546862986421770527m_8882979470196019703gmail-comment_text_0" style="white-space:pre-wrap;word-wrap:break-word;width:50em"><br></pre><pre class="gmail-m_-8546862986421770527m_8882979470196019703gmail-bz_comment_text gmail-m_-8546862986421770527m_8882979470196019703gmail-bz_wrap_comment_text" id="gmail-m_-8546862986421770527m_8882979470196019703gmail-comment_text_0" style="white-space:pre-wrap;word-wrap:break-word;width:50em">See <a href="https://bugzilla.redhat.com/show_bug.cgi?id=1443913" target="_blank">https://bugzilla.redhat.com/<wbr>show_bug.cgi?id=1443913</a></pre></pre></pre></div></div></div></div><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><pre class="gmail-m_-8546862986421770527m_8882979470196019703gmail-bz_comment_text gmail-m_-8546862986421770527m_8882979470196019703gmail-bz_wrap_comment_text" id="gmail-m_-8546862986421770527m_8882979470196019703gmail-comment_text_0" 
style="white-space:pre-wrap;word-wrap:break-word;width:50em;color:rgb(0,0,0)"><pre class="gmail-m_-8546862986421770527m_8882979470196019703gmail-bz_comment_text gmail-m_-8546862986421770527m_8882979470196019703gmail-bz_wrap_comment_text" id="gmail-m_-8546862986421770527m_8882979470196019703gmail-comment_text_0" style="white-space:pre-wrap;word-wrap:break-word;width:50em"><pre class="gmail-m_-8546862986421770527m_8882979470196019703gmail-bz_comment_text gmail-m_-8546862986421770527m_8882979470196019703gmail-bz_wrap_comment_text" id="gmail-m_-8546862986421770527m_8882979470196019703gmail-comment_text_0" style="white-space:pre-wrap;word-wrap:break-word;width:50em">Y.</pre><pre class="gmail-m_-8546862986421770527m_8882979470196019703gmail-bz_comment_text gmail-m_-8546862986421770527m_8882979470196019703gmail-bz_wrap_comment_text" id="gmail-m_-8546862986421770527m_8882979470196019703gmail-comment_text_0" style="white-space:pre-wrap;word-wrap:break-word;width:50em"><br></pre><pre class="gmail-m_-8546862986421770527m_8882979470196019703gmail-bz_comment_text gmail-m_-8546862986421770527m_8882979470196019703gmail-bz_wrap_comment_text" id="gmail-m_-8546862986421770527m_8882979470196019703gmail-comment_text_0" style="white-space:pre-wrap;word-wrap:break-word;width:50em"><br></pre></pre></pre></div></div></div></div><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="gmail-m_-8546862986421770527m_8882979470196019703gmail-HOEnZb"><div class="gmail-m_-8546862986421770527m_8882979470196019703gmail-h5"><div><br></div><div><br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
&gt;&gt;<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt; 3. Also, there is still the infamous &quot;unable to update response&quot; issue.<br>
&gt;&gt; &gt;<br>
&gt;&gt;<br>
&gt;&gt; When we see a timeout on a call, our default behavior is to reconnect,<br>
&gt;&gt; and on reconnect we clean the pending messages.<br>
&gt;&gt; As a result, when we reconnect and then receive a response to a message<br>
&gt;&gt; sent before the disconnect,<br>
&gt;&gt; we say it is unknown to us.<br>
&gt;<br>
&gt;<br>
&gt; But the example I&#39;ve given was earlier than the storage issue?<br>
<br>
The specific message that you refer to was a ping command, but it<br>
timed out (3 secs)<br>
before the response arrived, and it was removed from tracking. When the<br>
response finally arrived, it was not tracked anymore.<br>
<br>
We may want to increase the timeout to give the response more time to arrive.<br>
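<br>
A minimal sketch of that tracking behavior, written in Python rather than<br>
the Java of the engine client. The names PendingRequests and replay and the<br>
id ping-1 are hypothetical and are not the real jsonrpc client API. It only<br>
illustrates why a response that takes about 4.3 seconds (sent 18:54:27,843,<br>
received 18:54:32,132 in the log quoted below) is rejected with a 3 second<br>
timeout and would be accepted with a longer one:<br>
<pre>
```python
import time


class PendingRequests:
    """Track in-flight request ids and drop them once their timeout passes."""

    def __init__(self, timeout, clock=time.monotonic):
        self._timeout = timeout
        self._clock = clock
        self._deadlines = {}  # request id mapped to its absolute deadline

    def sent(self, request_id):
        """Start tracking a request that was just sent."""
        self._deadlines[request_id] = self._clock() + self._timeout

    def expire(self):
        """Remove and return the ids whose deadline has passed."""
        now = self._clock()
        expired = [rid for rid, deadline in self._deadlines.items()
                   if now >= deadline]
        for rid in expired:
            del self._deadlines[rid]
        return expired

    def clear(self):
        """Forget every pending request, as a reconnect would."""
        self._deadlines.clear()

    def received(self, request_id):
        """Return True if the response matches a request still tracked."""
        return self._deadlines.pop(request_id, None) is not None


def replay(timeout, response_delay):
    """Simulate one ping with a fake clock.

    Returns (expired_ids, response_accepted).
    """
    now = [0.0]  # fake clock, advanced by hand
    pending = PendingRequests(timeout, clock=lambda: now[0])
    pending.sent("ping-1")
    now[0] += response_delay
    expired = pending.expire()
    return expired, pending.received("ping-1")
```
</pre>
In this sketch replay(3, 4.3) returns ([&#39;ping-1&#39;], False): the id is dropped<br>
before the response shows up, so the response is unknown. replay(10, 4.3)<br>
returns ([], True). Calling clear() before the response arrives has the same<br>
effect as the timeout.<br>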
<br>
&gt; Y.<br>
&gt;<br>
&gt;&gt;<br>
&gt;&gt;<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt; {&quot;jsonrpc&quot;:&quot;2.0&quot;,&quot;method&quot;:&quot;<wbr>Host.ping&quot;,&quot;params&quot;:{},&quot;id&quot;:&quot;<wbr>7cb6052f-c732-4f7c-bd2d-<wbr>e48c2ae1f5e0&quot;}<br>
&gt;&gt; &gt; 2017-04-19 18:54:27,843-04 DEBUG<br>
&gt;&gt; &gt; [org.ovirt.vdsm.jsonrpc.<wbr>client.reactors.stomp.<wbr>StompCommonClient]<br>
&gt;&gt; &gt; (org.ovirt.thread.pool-7-<wbr>thread-15) [62d198cc] Message sent: SEND<br>
&gt;&gt; &gt; destination:jms.topic.vdsm_<wbr>requests<br>
&gt;&gt; &gt; content-length:94<br>
&gt;&gt; &gt; ovirtCorrelationId:62d198cc<br>
&gt;&gt; &gt; reply-to:jms.topic.vdsm_<wbr>responses<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt; &lt;JsonRpcRequest id: &quot;7cb6052f-c732-4f7c-bd2d-<wbr>e48c2ae1f5e0&quot;, method:<br>
&gt;&gt; &gt; Host.ping, params: {}&gt;<br>
&gt;&gt; &gt; 2017-04-19 18:54:27,885-04 DEBUG<br>
&gt;&gt; &gt; [org.ovirt.vdsm.jsonrpc.<wbr>client.reactors.stomp.impl.<wbr>Message]<br>
&gt;&gt; &gt; (org.ovirt.thread.pool-7-<wbr>thread-16) [1f9aac13] SEND<br>
&gt;&gt; &gt; ovirtCorrelationId:1f9aac13<br>
&gt;&gt; &gt; destination:jms.topic.vdsm_<wbr>requests<br>
&gt;&gt; &gt; reply-to:jms.topic.vdsm_<wbr>responses<br>
&gt;&gt; &gt; content-length:94<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt; ...<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt; {&quot;jsonrpc&quot;: &quot;2.0&quot;, &quot;id&quot;: &quot;7cb6052f-c732-4f7c-bd2d-<wbr>e48c2ae1f5e0&quot;,<br>
&gt;&gt; &gt; &quot;result&quot;:<br>
&gt;&gt; &gt; true}<br>
&gt;&gt; &gt; 2017-04-19 18:54:32,132-04 DEBUG<br>
&gt;&gt; &gt; [org.ovirt.vdsm.jsonrpc.<wbr>client.internal.<wbr>ResponseWorker] (ResponseWorker)<br>
&gt;&gt; &gt; []<br>
&gt;&gt; &gt; Message received: {&quot;jsonrpc&quot;: &quot;2.0&quot;, &quot;id&quot;:<br>
&gt;&gt; &gt; &quot;7cb6052f-c732-4f7c-bd2d-<wbr>e48c2ae1f5e0&quot;, &quot;result&quot;: true}<br>
&gt;&gt; &gt; 2017-04-19 18:54:32,133-04 ERROR<br>
&gt;&gt; &gt; [org.ovirt.vdsm.jsonrpc.<wbr>client.JsonRpcClient] (ResponseWorker) [] Not<br>
&gt;&gt; &gt; able<br>
&gt;&gt; &gt; to update response for &quot;7cb6052f-c732-4f7c-bd2d-<wbr>e48c2ae1f5e0&quot;<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt; Would be nice to understand why.<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt; 4. Lastly, MOM is not running. Why?<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt; Please open a bug with the details from item #2 above.<br>
&gt;&gt; &gt; Y.<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt; [1]<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt; <a href="http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/6403/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-host1/_var_log/vdsm/supervdsm.log" rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/<wbr>test-repo_ovirt_experimental_<wbr>master/6403/artifact/exported-<wbr>artifacts/basic-suit-master-<wbr>el7/test_logs/basic-suite-<wbr>master/post-002_bootstrap.py/<wbr>lago-basic-suite-master-host1/<wbr>_var_log/vdsm/supervdsm.log</a><br>
&gt;&gt; &gt; [2]<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt; <a href="http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/6403/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-engine/_var_log/ovirt-engine/engine.log" rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/<wbr>test-repo_ovirt_experimental_<wbr>master/6403/artifact/exported-<wbr>artifacts/basic-suit-master-<wbr>el7/test_logs/basic-suite-<wbr>master/post-002_bootstrap.py/<wbr>lago-basic-suite-master-<wbr>engine/_var_log/ovirt-engine/<wbr>engine.log</a><br>
&gt;&gt; &gt; [3]<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt; <a href="http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/6403/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-host0/_var_log/messages" rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/<wbr>test-repo_ovirt_experimental_<wbr>master/6403/artifact/exported-<wbr>artifacts/basic-suit-master-<wbr>el7/test_logs/basic-suite-<wbr>master/post-002_bootstrap.py/<wbr>lago-basic-suite-master-host0/<wbr>_var_log/messages</a><br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt; On Thu, Apr 20, 2017 at 9:27 AM, Gil Shinar &lt;<a href="mailto:gshinar@redhat.com" target="_blank">gshinar@redhat.com</a>&gt; wrote:<br>
&gt;&gt; &gt;&gt;<br>
&gt;&gt; &gt;&gt; Test failed: add_secondary_storage_domains<br>
&gt;&gt; &gt;&gt; Link to suspected patches:<br>
&gt;&gt; &gt;&gt; Link to Job:<br>
&gt;&gt; &gt;&gt; <a href="http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/6403" rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/<wbr>test-repo_ovirt_experimental_<wbr>master/6403</a><br>
&gt;&gt; &gt;&gt; Link to all logs:<br>
&gt;&gt; &gt;&gt;<br>
&gt;&gt; &gt;&gt; <a href="http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/6403/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py" rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/<wbr>test-repo_ovirt_experimental_<wbr>master/6403/artifact/exported-<wbr>artifacts/basic-suit-master-<wbr>el7/test_logs/basic-suite-<wbr>master/post-002_bootstrap.py</a><br>
&gt;&gt; &gt;&gt;<br>
&gt;&gt; &gt;&gt;<br>
&gt;&gt; &gt;&gt; Error seems to be:<br>
&gt;&gt; &gt;&gt; 2017-04-19 18:58:58,774-0400 ERROR (jsonrpc/2)<br>
&gt;&gt; &gt;&gt; [storage.TaskManager.Task]<br>
&gt;&gt; &gt;&gt; (Task=&#39;8f9699ed-cc2f-434b-<wbr>aa1e-b3c8ff30324a&#39;) Unexpected error<br>
&gt;&gt; &gt;&gt; (task:871)<br>
&gt;&gt; &gt;&gt; Traceback (most recent call last):<br>
&gt;&gt; &gt;&gt;   File &quot;/usr/lib/python2.7/site-<wbr>packages/vdsm/storage/task.py&quot;<wbr>, line<br>
&gt;&gt; &gt;&gt; 878,<br>
&gt;&gt; &gt;&gt; in _run<br>
&gt;&gt; &gt;&gt;     return fn(*args, **kargs)<br>
&gt;&gt; &gt;&gt;   File &quot;/usr/lib/python2.7/site-<wbr>packages/vdsm/logUtils.py&quot;, line 52, in<br>
&gt;&gt; &gt;&gt; wrapper<br>
&gt;&gt; &gt;&gt;     res = f(*args, **kwargs)<br>
&gt;&gt; &gt;&gt;   File &quot;/usr/share/vdsm/storage/hsm.<wbr>py&quot;, line 2709, in<br>
&gt;&gt; &gt;&gt; getStorageDomainInfo<br>
&gt;&gt; &gt;&gt;     dom = self.validateSdUUID(sdUUID)<br>
&gt;&gt; &gt;&gt;   File &quot;/usr/share/vdsm/storage/hsm.<wbr>py&quot;, line 298, in validateSdUUID<br>
&gt;&gt; &gt;&gt;     sdDom = sdCache.produce(sdUUID=sdUUID)<br>
&gt;&gt; &gt;&gt;   File &quot;/usr/share/vdsm/storage/sdc.<wbr>py&quot;, line 112, in produce<br>
&gt;&gt; &gt;&gt;     domain.getRealDomain()<br>
&gt;&gt; &gt;&gt;   File &quot;/usr/share/vdsm/storage/sdc.<wbr>py&quot;, line 53, in getRealDomain<br>
&gt;&gt; &gt;&gt;     return self._cache._realProduce(self.<wbr>_sdUUID)<br>
&gt;&gt; &gt;&gt;   File &quot;/usr/share/vdsm/storage/sdc.<wbr>py&quot;, line 136, in _realProduce<br>
&gt;&gt; &gt;&gt;     domain = self._findDomain(sdUUID)<br>
&gt;&gt; &gt;&gt;   File &quot;/usr/share/vdsm/storage/sdc.<wbr>py&quot;, line 153, in _findDomain<br>
&gt;&gt; &gt;&gt;     return findMethod(sdUUID)<br>
&gt;&gt; &gt;&gt;   File &quot;/usr/share/vdsm/storage/sdc.<wbr>py&quot;, line 178, in<br>
&gt;&gt; &gt;&gt; _findUnfetchedDomain<br>
&gt;&gt; &gt;&gt;     raise se.StorageDomainDoesNotExist(<wbr>sdUUID)<br>
&gt;&gt; &gt;&gt; StorageDomainDoesNotExist: Storage domain does not exist:<br>
&gt;&gt; &gt;&gt; (u&#39;ac3bbc93-26ba-4ea8-8e76-<wbr>c5b761f01931&#39;,)<br>
&gt;&gt; &gt;&gt; 2017-04-19 18:58:58,777-0400 INFO  (jsonrpc/2)<br>
&gt;&gt; &gt;&gt; [storage.TaskManager.Task]<br>
&gt;&gt; &gt;&gt; (Task=&#39;8f9699ed-cc2f-434b-<wbr>aa1e-b3c8ff30324a&#39;) aborting: Task is<br>
&gt;&gt; &gt;&gt; aborted:<br>
&gt;&gt; &gt;&gt; &#39;Storage domain does not exist&#39; - code 358 (task:1176)<br>
&gt;&gt; &gt;&gt; 2017-04-19 18:58:58,777-0400 ERROR (jsonrpc/2) [storage.Dispatcher]<br>
&gt;&gt; &gt;&gt; {&#39;status&#39;: {&#39;message&#39;: &quot;Storage domain does not exist:<br>
&gt;&gt; &gt;&gt; (u&#39;ac3bbc93-26ba-4ea8-8e76-<wbr>c5b761f01931&#39;,)&quot;, &#39;code&#39;: 358}}<br>
&gt;&gt; &gt;&gt; (dispatcher:78)<br>
&gt;&gt; &gt;&gt;<br>
&gt;&gt; &gt;&gt; ______________________________<wbr>_________________<br>
&gt;&gt; &gt;&gt; Devel mailing list<br>
&gt;&gt; &gt;&gt; <a href="mailto:Devel@ovirt.org" target="_blank">Devel@ovirt.org</a><br>
&gt;&gt; &gt;&gt; <a href="http://lists.ovirt.org/mailman/listinfo/devel" rel="noreferrer" target="_blank">http://lists.ovirt.org/<wbr>mailman/listinfo/devel</a><br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt;<br>
&gt;<br>
&gt;<br>
</blockquote></div>
</div></div></blockquote></div></div></div></blockquote></div>
</div></div></blockquote></div><br></div></div>