<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Jan 12, 2017 at 2:39 PM, Yaniv Kaul <span dir="ltr">&lt;<a href="mailto:ykaul@redhat.com" target="_blank">ykaul@redhat.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">It still has the heartbeat exceeded issue - please make sure you test with a fixed version:</div></blockquote><div><br></div><div>It uses the latest rpm with the fix:</div><div><br></div><div>---&gt; Package vdsm-jsonrpc-java.noarch 0:1.4.1-1.20170112092258.git1861532.el7.centos will be installed<br></div><div><br></div><div><div>* 1861532 - (HEAD -&gt; master, origin/master, origin/HEAD) version bump (4 hours ago) Piotr Kliczewski &lt;<a href="mailto:piotr.kliczewski@gmail.com">piotr.kliczewski@gmail.com</a>&gt;</div><div>* 9347d06 - (tag: v1.4.1) handle ssl closed status (19 hours ago) Piotr Kliczewski &lt;<a href="mailto:piotr.kliczewski@gmail.com">piotr.kliczewski@gmail.com</a>&gt;</div><div><br></div></div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div><pre style="color:rgb(0,0,0)">2017-01-12 05:50:27,021-05 DEBUG [org.ovirt.vdsm.jsonrpc.<wbr>client.reactors.ReactorClient] (SSL Stomp Reactor) [103d0f0a] Heartbeat exceeded. Closing channel
2017-01-12 05:50:27,022-05 DEBUG [org.ovirt.vdsm.jsonrpc.<wbr>client.internal.<wbr>ResponseWorker] (ResponseWorker) [] Message received: {&quot;jsonrpc&quot;:&quot;2.0&quot;,&quot;error&quot;:{&quot;<wbr>code&quot;:&quot;192.168.201.4:389513927<wbr>&quot;,&quot;message&quot;:&quot;Heartbeat exceeded&quot;},&quot;id&quot;:null}</pre><pre style="color:rgb(0,0,0)"><br></pre><pre style="color:rgb(0,0,0)">Then we can start to understand the failures:</pre><pre style="color:rgb(0,0,0)"><pre>2017-01-12 05:50:27,055-05 ERROR [org.ovirt.engine.core.bll.<wbr>network.host.<wbr>HostSetupNetworksCommand] (org.ovirt.thread.pool-7-<wbr>thread-2) [76b0383f] Command &#39;org.ovirt.engine.core.bll.<wbr>network.host.<wbr>HostSetupNetworksCommand&#39; failed: EngineException: org.ovirt.engine.core.<wbr>vdsbroker.vdsbroker.<wbr>VDSNetworkException: VDSGenericException: VDSNetworkException: Heartbeat exceeded (Failed with error VDS_NETWORK_ERROR and code 5022)
2017-01-12 05:50:27,058-05 INFO  [org.ovirt.engine.core.bll.<wbr>network.host.<wbr>HostSetupNetworksCommand] (org.ovirt.thread.pool-7-<wbr>thread-2) [76b0383f] Lock freed to object &#39;EngineLock:{exclusiveLocks=&#39;[<wbr>HOST_NETWORK40eb11ba-e6ac-<wbr>478a-b8b1-73b7892ace65=&lt;HOST_<wbr>NETWORK, ACTION_TYPE_FAILED_SETUP_<wbr>NETWORKS_OR_REFRESH_IN_<wbr>PROGRESS&gt;]&#39;, sharedLocks=&#39;null&#39;}&#39;
2017-01-12 05:50:27,061-05 WARN  [org.ovirt.engine.core.<wbr>vdsbroker.VdsManager] (org.ovirt.thread.pool-7-<wbr>thread-19) [76b0383f] Host &#39;lago-basic-suite-master-<wbr>host1&#39; is not responding.
2017-01-12 05:50:27,074-05 WARN  [org.ovirt.engine.core.dal.<wbr>dbbroker.auditloghandling.<wbr>AuditLogDirector] (org.ovirt.thread.pool-7-<wbr>thread-19) [76b0383f] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Host lago-basic-suite-master-host1 is not responding. Host cannot be fenced automatically because power management for the host is disabled.
2017-01-12 05:50:27,079-05 ERROR [org.ovirt.engine.core.dal.<wbr>dbbroker.auditloghandling.<wbr>AuditLogDirector] (org.ovirt.thread.pool-7-<wbr>thread-2) [76b0383f] Failed to configure management network: Failed to configure management network on host lago-basic-suite-master-host1 due to setup networks failure.
2017-01-12 05:50:27,079-05 ERROR [org.ovirt.engine.core.bll.<wbr>hostdeploy.<wbr>InstallVdsInternalCommand] (org.ovirt.thread.pool-7-<wbr>thread-2) [76b0383f] Exception: org.ovirt.engine.core.bll.<wbr>network.NetworkConfigurator$<wbr>NetworkConfiguratorException: Failed to configure management network
        at org.ovirt.engine.core.bll.<wbr>network.NetworkConfigurator.<wbr>configureManagementNetwork(<wbr>NetworkConfigurator.java:247) [bll.jar:]</pre></pre></div></div><div class="gmail_extra"><br><div class="gmail_quote"><div><div class="gmail-h5">On Thu, Jan 12, 2017 at 2:12 PM, Daniel Belenky <span dir="ltr">&lt;<a href="mailto:dbelenky@redhat.com" target="_blank">dbelenky@redhat.com</a>&gt;</span> wrote:<br></div></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div class="gmail-h5"><div dir="ltr">Hi all,<div><br></div><div>The test-repo_ovirt_experimental_master job fails, and the issue appears to be in the &#39;add_host&#39; phase of the &#39;<i>bootstrap</i>&#39; suite.</div><div>From the logs, it seems that the suite was unable to bring up the host, or that something is wrong with the host:</div><div><br></div><div><font face="monospace, monospace" style="background-color:rgb(255,255,255)" color="#444444">&lt;error<span class="gmail-m_829647681265246765m_-8437613926293775782gmail-html-attribute"> <span class="gmail-m_829647681265246765m_-8437613926293775782gmail-html-attribute-name">type</span>=&quot;<span class="gmail-m_829647681265246765m_-8437613926293775782gmail-html-attribute-value">exceptions.Runtim<wbr>eError</span>&quot;</span><span class="gmail-m_829647681265246765m_-8437613926293775782gmail-html-attribute"> <span class="gmail-m_829647681265246765m_-8437613926293775782gmail-html-attribute-name">message</span>=&quot;<span class="gmail-m_829647681265246765m_-8437613926293775782gmail-html-attribute-value">Host lago-basic-suite-master-host1 is in non operational state -------------------- &gt;&gt; begin captured logging &lt;&lt; -------------------- lago.ssh: DEBUG: start task Get ssh client for lago-basic-suite-master-host0 lago.ssh: DEBUG: Still got 100 tries for lago-basic-suite-master-host0 lago.ssh: DEBUG: end task Get ssh client for lago-basic-suite-master-host0 
lago.ssh: DEBUG: Running aab0eff8 on lago-basic-suite-master-host0: yum install -y iptables lago.ssh: DEBUG: Command aab0eff8 on lago-basic-suite-master-host0 returned with 0 lago.ssh: DEBUG: Command aab0eff8 on lago-basic-suite-master-host0 output: Loaded plugins: fastestmirror Loading mirror speeds from cached hostfile * base: <a href="http://centos.host-engine.com" target="_blank">centos.host-engine.com</a> * extras: <a href="http://linux.mirrors.es.net" target="_blank">linux.mirrors.es.net</a> * updates: <a href="http://mirror.n5tech.com" target="_blank">mirror.n5tech.com</a> Package iptables-1.4.21-17.el7.x86_64 already installed and latest version Nothing to do lago.ssh: DEBUG: start task Get ssh client for lago-basic-suite-master-host1 lago.ssh: DEBUG: Still got 100 tries for lago-basic-suite-master-host1 lago.ssh: DEBUG: end task Get ssh client for lago-basic-suite-master-host1 lago.ssh: DEBUG: Running ab5c94f2 on lago-basic-suite-master-host1: yum install -y iptables lago.ssh: DEBUG: Command ab5c94f2 on lago-basic-suite-master-host1 returned with 0 lago.ssh: DEBUG: Command ab5c94f2 on lago-basic-suite-master-host1 output: Loaded plugins: fastestmirror Loading mirror speeds from cached hostfile * base: <a href="http://mirror.n5tech.com" target="_blank">mirror.n5tech.com</a> * extras: <a href="http://ftp.osuosl.org" target="_blank">ftp.osuosl.org</a> * updates: <a href="http://mirrors.usc.edu" target="_blank">mirrors.usc.edu</a> Package iptables-1.4.21-17.el7.x86_64 already installed and latest version Nothing to do ovirtlago.testlib: ERROR: * Unhandled exception in &lt;function _host_is_up at 0x322e938&gt; Traceback (most recent call last): File &quot;/usr/lib/python2.7/site-packa<wbr>ges/ovirtlago/testlib.py&quot;, line 217, in assert_equals_within res = func() File &quot;/home/jenkins/workspace/test-<wbr>repo_ovirt_experimental_master<wbr>/ovirt-system-tests/basic-<wbr>suite-master/test-scenarios/<wbr>002_bootstrap.py&quot;, line 162, in _host_is_up raise 
RuntimeError(&#39;Host %s is in non operational state&#39; % host.name()) RuntimeError: Host lago-basic-suite-master-host1 is in non operational state --------------------- &gt;&gt; end captured logging &lt;&lt; ---------------------</span>&quot;</span>&gt;</font></div><div><font color="#666666" face="monospace, monospace"><br></font></div><div><br></div><div><font color="#000000" face="arial, helvetica, sans-serif">In the engine.log, I found a timeout in the rpc call (but this error also appears in jobs that succeed, so it might not be relevant):</font></div><div><pre><font color="#444444">2017-01-12 05:49:53,383-05 ERROR [org.ovirt.engine.core.vdsbrok<wbr>er.vdsbroker.PollVDSCommand] (org.ovirt.thread.pool-7-threa<wbr>d-2) [76b0383f] Command &#39;PollVDSCommand(HostName = lago-basic-suite-master-host1, VdsIdVDSCommandParametersBase:<wbr>{runAsync=&#39;true&#39;, hostId=&#39;40eb11ba-e6ac-478a-b8b<wbr>1-73b7892ace65&#39;})&#39; execution failed: VDSGenericException: VDSNetworkException: Timeout during rpc call
2017-01-12 05:49:53,383-05 DEBUG [org.ovirt.engine.core.vdsbrok<wbr>er.vdsbroker.PollVDSCommand] (org.ovirt.thread.pool-7-threa<wbr>d-2) [76b0383f] Exception: org.ovirt.engine.core.vdsbroke<wbr>r.vdsbroker.VDSNetworkExceptio<wbr>n: VDSGenericException: VDSNetworkException: Timeout during rpc call</font></pre></div><div>... (the full error is very long, so I won&#39;t paste it here; it&#39;s in the<b> engine.log</b>)</div><div><pre><font color="#444444">2017-01-12 05:49:58,291-05 ERROR [org.ovirt.engine.core.vdsbrok<wbr>er.vdsbroker.PollVDSCommand] (org.ovirt.thread.pool-7-threa<wbr>d-1) [30b2ca77] Timeout waiting for VDSM response: Internal timeout occured
</font></pre><div><br></div><div><br></div><div>In the host&#39;s vdsm.log, there are some errors too:</div><div><pre><font color="#444444">2017-01-12 05:51:48,336 ERROR (jsonrpc/0) [storage.StorageDomainCache] looking for unfetched domain 380623d8-1e85-4831-9048-3d0593<wbr>2f3d3a (sdc:151)
2017-01-12 05:51:48,336 ERROR (jsonrpc/0) [storage.StorageDomainCache] looking for domain 380623d8-1e85-4831-9048-3d0593<wbr>2f3d3a (sdc:168)
2017-01-12 05:51:48,395 WARN  (jsonrpc/0) [storage.LVM] lvm vgs failed: 5 [] [&#39;  WARNING: Not using lvmetad because config setting use_lvmetad=0.&#39;, &#39;  WARNING: To avoid corruption, rescan devices to make changes visible (pvscan --cache).&#39;, &#39;  Volume group &quot;380623d8-1e85-4831-9048-3d059<wbr>32f3d3a&quot; not found&#39;, &#39;  Cannot process volume group 380623d8-1e85-4831-9048-3d0593<wbr>2f3d3a&#39;] (lvm:377)
2017-01-12 05:51:48,398 ERROR (jsonrpc/0) [storage.StorageDomainCache] domain 380623d8-1e85-4831-9048-3d0593<wbr>2f3d3a not found (sdc:157)
Traceback (most recent call last):
  File &quot;/usr/share/vdsm/storage/sdc.p<wbr>y&quot;, line 155, in _findDomain
    dom = findMethod(sdUUID)
  File &quot;/usr/share/vdsm/storage/sdc.p<wbr>y&quot;, line 185, in _findUnfetchedDomain
    raise se.StorageDomainDoesNotExist(s<wbr>dUUID)
StorageDomainDoesNotExist: Storage domain does not exist: (u&#39;380623d8-1e85-4831-9048-3d0<wbr>5932f3d3a&#39;,)</font></pre><pre><br></pre><pre>and</pre><pre><pre style="color:rgb(0,0,0)">2017-01-12 05:53:45,375 ERROR (JsonRpc (StompReactor)) [vds.dispatcher] SSL error receiving from &lt;yajsonrpc.betterAsyncore.Disp<wbr>atcher connected (&#39;::1&#39;, 43814, 0, 0) at 0x235a2d8&gt;: unexpected eof (betterAsyncore:119)
</pre><div><br></div></pre><pre><font color="#444444"><a href="http://jenkins.ovirt.org/view/experimental%20jobs/job/test-repo_ovirt_experimental_master/4693/artifact/exported-artifacts/basic_suite_master.sh-el7/exported-artifacts/" target="_blank">Link to Jenkins</a><br></font></pre><pre><font face="arial, helvetica, sans-serif" color="#000000">Can someone please take a look?</font></pre><pre><font face="arial, helvetica, sans-serif" color="#000000">Thanks, </font></pre></div><div class="gmail-m_829647681265246765m_-8437613926293775782gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div><div><span style="background-color:rgb(255,255,255)"><span style="color:rgb(0,0,0)"><i>Daniel Belenky<br></i></span></span></div><span style="background-color:rgb(255,255,255)"><span style="color:rgb(0,0,0)"><i>RHV DevOps<br></i></span></span></div><span style="background-color:rgb(255,255,255)"><span style="color:rgb(0,0,0)"><i>Red Hat Israel<br></i></span></span></div></div></div></div></div></div></div>
</div></div>
<br></div></div>______________________________<wbr>_________________<br>
Devel mailing list<br>
<a href="mailto:Devel@ovirt.org" target="_blank">Devel@ovirt.org</a><br>
<a href="http://lists.ovirt.org/mailman/listinfo/devel" rel="noreferrer" target="_blank">http://lists.ovirt.org/mailman<wbr>/listinfo/devel</a><br></blockquote></div><br></div>
</blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div>Eyal Edri<br>Associate Manager</div><div>RHV DevOps<br>EMEA ENG Virtualization R&amp;D<br>Red Hat Israel<br><br>phone: +972-9-7692018<br>irc: eedri (on #tlv #rhev-dev #rhev-integ)</div></div></div></div></div></div></div>
</div></div>