<div dir="ltr"><div>I was able to reproduce the error [1] on a manual run that used only the new VDSM from [2],</div><div>and also verified that without this change, using the latest tested run [3], it works.</div><div><br></div><div>So I think this shows quite clearly that the problem is in one of the latest VDSM patches.</div><div><br></div><div>I&#39;m rerunning the test with the suspected bad VDSM and will hopefully be able to extract the env to a tar.gz file</div><div>which anyone can import using the lago demo tool.</div><div><br></div><div><br></div><div><br></div><div>[1] <a href="http://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_manual/748/" target="_blank">http://jenkins.ovirt.org/<wbr>view/oVirt%20system%20tests/<wbr>job/ovirt-system-tests_manual/<wbr>748/</a></div><div>[2] <a href="http://jenkins.ovirt.org/job/vdsm_master_build-artifacts-el7-x86_64/2694/" target="_blank">http://jenkins.ovirt.org/<wbr>job/vdsm_master_build-<wbr>artifacts-el7-x86_64/2694/</a></div><div>[3] <a href="http://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_manual/747/" target="_blank">http://jenkins.ovirt.org/<wbr>view/oVirt%20system%20tests/<wbr>job/ovirt-system-tests_manual/<wbr>747/</a></div><div><br></div><br><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Jul 4, 2017 at 1:30 PM, Nadav Goldin <span dir="ltr">&lt;<a href="mailto:ngoldin@redhat.com" target="_blank">ngoldin@redhat.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi, sorry for posting late, I had a brief look at this yesterday:<br>
1. I couldn&#39;t replicate it locally - which means it is most likely a<br>
recent change.<br>
2. I looked at the libvirt XMLs Lago generated for the hosts, as a new<br>
version is used this week (0.40) - and they seem OK - specifically<br>
memory and vcpus (which was my initial suspect).<br>
3. I saw two Engine patches, merged a bit before it started to<br>
fail, which *might* be related as far as I can tell, but it is out of my<br>
scope to say (CC&#39;ed patch owners):<br>
<br>
core: Make VmAnalyzer to treat a migrated Paused VM as success -<br>
<a href="https://gerrit.ovirt.org/78305" rel="noreferrer" target="_blank">https://gerrit.ovirt.org/78305</a><br>
<br>
fix custom fencing default config setting<br>
<a href="https://gerrit.ovirt.org/78720" rel="noreferrer" target="_blank">https://gerrit.ovirt.org/78720</a><br>
<br>
Shot in the dark - could it be that the &#39;CPUOverloaded&#39; filter was not<br>
active before for some reason?<br>
<br>
Also, there are some exceptions in host0 vdsm log[1], failing to get<br>
VM stats, though I can&#39;t tell if they are specific to this failure.<br>
<br>
Of course this is not a complete analysis, I hope it helps.<br>
<br>
<br>
[1] <a href="http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/7431/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-006_migrations.py/lago-basic-suite-master-host0/_var_log/vdsm/vdsm.log" rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/t<wbr>est-repo_ovirt_experimental_ma<wbr>ster/7431/artifact/exported-ar<wbr>tifacts/basic-suit-master-el7/<wbr>test_logs/basic-suite-master/<wbr>post-006_migrations.py/lago-<wbr>basic-suite-master-host0/_var_<wbr>log/vdsm/vdsm.log</a><br>
<span class="m_2989331196243842324gmail-HOEnZb"><font color="#888888"><br>
<br>
Nadav.<br>
</font></span><div class="m_2989331196243842324gmail-HOEnZb"><div class="m_2989331196243842324gmail-h5"><br>
<br>
<br>
<br>
<br>
On Tue, Jul 4, 2017 at 12:46 PM, Eyal Edri &lt;<a href="mailto:eedri@redhat.com" target="_blank">eedri@redhat.com</a>&gt; wrote:<br>
&gt;<br>
&gt;<br>
&gt; On Tue, Jul 4, 2017 at 12:18 PM, Michal Skrivanek<br>
&gt; &lt;<a href="mailto:michal.skrivanek@redhat.com" target="_blank">michal.skrivanek@redhat.com</a>&gt; wrote:<br>
&gt;&gt;<br>
&gt;&gt;<br>
&gt;&gt; On 3 Jul 2017, at 15:35, Shlomo Ben David &lt;<a href="mailto:sbendavi@redhat.com" target="_blank">sbendavi@redhat.com</a>&gt; wrote:<br>
&gt;&gt;<br>
&gt;&gt; Hi,<br>
&gt;&gt;<br>
&gt;&gt; Test failed: [ 006_migrations.migrate_vm ]<br>
&gt;&gt; Link to suspected patches: N/A<br>
&gt;&gt; Link to Job:<br>
&gt;&gt; <a href="http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/7431/" rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/t<wbr>est-repo_ovirt_experimental_ma<wbr>ster/7431/</a><br>
&gt;&gt; Link to all logs:<br>
&gt;&gt; <a href="http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/7431/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-006_migrations.py/" rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/t<wbr>est-repo_ovirt_experimental_ma<wbr>ster/7431/artifact/exported-ar<wbr>tifacts/basic-suit-master-el7/<wbr>test_logs/basic-suite-master/<wbr>post-006_migrations.py/</a><br>
&gt;&gt; Error snippet from the log:<br>
&gt;&gt;<br>
&gt;&gt; &lt;error&gt;<br>
&gt;&gt;<br>
&gt;&gt;  &quot;Fault reason is &quot;Operation Failed&quot;. Fault detail is &quot;[Cannot migrate VM.<br>
&gt;&gt; There is no host that satisfies current scheduling constraints. See below<br>
&gt;&gt; for details:, The host lago-basic-suite-master-host0 did not satisfy<br>
&gt;&gt; internal filter CPUOverloaded because its CPU is too loaded.]&quot;<br>
&gt;&gt;<br>
&gt;&gt; &lt;/error&gt;<br>
&gt;&gt;<br>
&gt;&gt; &lt;engine log&gt;<br>
&gt;&gt;<br>
&gt;&gt; 2017-07-02 16:43:22,829-04 INFO<br>
&gt;&gt; [org.ovirt.engine.core.bll.Mig<wbr>rateVmToServerCommand] (default task-27)<br>
&gt;&gt; [87508047-fdc5-4a2f-9692-c83f7<wbr>b55bbc2] Lock Acquired to object<br>
&gt;&gt; &#39;EngineLock:{exclusiveLocks=&#39;[<wbr>2b34910d-cef2-44d6-a274-30e847<wbr>3eb5d9=VM]&#39;,<br>
&gt;&gt; sharedLocks=&#39;&#39;}&#39;<br>
&gt;&gt; 2017-07-02 16:43:22,833-04 DEBUG<br>
&gt;&gt; [org.ovirt.engine.core.dal.dbb<wbr>roker.PostgresDbEngineDialect$<wbr>PostgresSimpleJdbcCall]<br>
&gt;&gt; (default task-27) [87508047-fdc5-4a2f-9692-c83f7<wbr>b55bbc2] Compiled stored<br>
&gt;&gt; procedure. Call string is [{call getdiskvmelementspluggedtovm(?<wbr>)}]<br>
&gt;&gt; 2017-07-02 16:43:22,833-04 DEBUG<br>
&gt;&gt; [org.ovirt.engine.core.dal.dbb<wbr>roker.PostgresDbEngineDialect$<wbr>PostgresSimpleJdbcCall]<br>
&gt;&gt; (default task-27) [87508047-fdc5-4a2f-9692-c83f7<wbr>b55bbc2] SqlCall for<br>
&gt;&gt; procedure [GetDiskVmElementsPluggedToVm] compiled<br>
&gt;&gt; 2017-07-02 16:43:22,843-04 DEBUG<br>
&gt;&gt; [org.ovirt.engine.core.dal.dbb<wbr>roker.PostgresDbEngineDialect$<wbr>PostgresSimpleJdbcCall]<br>
&gt;&gt; (default task-27) [87508047-fdc5-4a2f-9692-c83f7<wbr>b55bbc2] Compiled stored<br>
&gt;&gt; procedure. Call string is [{call getattacheddisksnapshotstovm(?<wbr>, ?)}]<br>
&gt;&gt; 2017-07-02 16:43:22,843-04 DEBUG<br>
&gt;&gt; [org.ovirt.engine.core.dal.dbb<wbr>roker.PostgresDbEngineDialect$<wbr>PostgresSimpleJdbcCall]<br>
&gt;&gt; (default task-27) [87508047-fdc5-4a2f-9692-c83f7<wbr>b55bbc2] SqlCall for<br>
&gt;&gt; procedure [GetAttachedDiskSnapshotsToVm] compiled<br>
&gt;&gt; 2017-07-02 16:43:22,919-04 INFO<br>
&gt;&gt; [org.ovirt.engine.core.bll.sch<wbr>eduling.SchedulingManager] (default task-27)<br>
&gt;&gt; [87508047-fdc5-4a2f-9692-c83f7<wbr>b55bbc2] Candidate host<br>
&gt;&gt; &#39;lago-basic-suite-master-host0<wbr>&#39; (&#39;46bdc63d-98f5-4eee-81aa-2fb8<wbr>8b8f7cbe&#39;) was<br>
&gt;&gt; filtered out by &#39;VAR__FILTERTYPE__INTERNAL&#39; filter &#39;CPUOverloaded&#39;<br>
&gt;&gt; (correlation id: null)<br>
&gt;&gt; 2017-07-02 16:43:22,920-04 WARN<br>
&gt;&gt; [org.ovirt.engine.core.bll.Mig<wbr>rateVmToServerCommand] (default task-27)<br>
&gt;&gt; [87508047-fdc5-4a2f-9692-c83f7<wbr>b55bbc2] Validation of action<br>
&gt;&gt; &#39;MigrateVmToServer&#39; failed for user admin@internal-authz. Reasons:<br>
&gt;&gt; VAR__ACTION__MIGRATE,VAR__TYPE<wbr>__VM,SCHEDULING_ALL_HOSTS_FILT<wbr>ERED_OUT,VAR__FILTERTYPE__INTE<wbr>RNAL,$hostName<br>
&gt;&gt; lago-basic-suite-master-host0,<wbr>$filterName<br>
&gt;&gt; CPUOverloaded,VAR__DETAIL__CPU<wbr>_OVERLOADED,SCHEDULING_HOST_<wbr>FILTERED_REASON_WITH_DETAIL<br>
&gt;&gt;<br>
&gt;&gt;<br>
&gt;&gt;<br>
&gt;&gt; This has nothing to do with migration.<br>
&gt;&gt; CPUOverloaded is a scheduling policy; unless there was any change in<br>
&gt;&gt; that area, the obvious explanation would be that the host has a CPU overload<br>
&gt;&gt; condition.<br>
&gt;&gt; I briefly looked at the logs and see &quot;cpuUser&quot;: &quot;83.40&quot;, &quot;cpuSys&quot;: &quot;16.59&quot;,<br>
&gt;&gt; &quot;cpuIdle&quot;: &quot;0.08&quot;, which indeed suggests an overload; from the same sample I<br>
&gt;&gt; can see it’s vdsm (&quot;cpuUserVdsmd&quot;: &quot;77.38&quot;, &quot;cpuSysVdsmd&quot;: &quot;18.44&quot;).<br>
&gt;&gt;<br>
&gt;&gt; Since similar values are consistently being reported for some time, and<br>
&gt;&gt; there is a setupNetworks and storage rescan prior to the failure, and<br>
&gt;&gt; there is no other indication of anything wrong, I’d just say the environment<br>
&gt;&gt; or the order of tests or timing has changed, but there is nothing wrong with the<br>
&gt;&gt; oVirt code.<br>
&gt;&gt; Did any of that change recently? Does it reproduce locally?<br>
&gt;<br>
&gt;<br>
&gt; AFAIK, no significant environment or test changes were made.<br>
&gt; We will try to reproduce it locally and also on the manual job, but from<br>
&gt; what it looks like, it is very consistent (unlike other race failures we&#39;ve seen<br>
&gt; lately) and continues to fail on the same tests, so it&#39;s either a change in<br>
&gt; oVirt or something else that we&#39;re not thinking of.<br>
&gt;<br>
&gt;&gt;<br>
&gt;&gt;<br>
&gt;&gt; Thanks,<br>
&gt;&gt; michal<br>
&gt;&gt;<br>
&gt;&gt; 2017-07-02 16:43:22,920-04 INFO<br>
&gt;&gt; [org.ovirt.engine.core.bll.Mig<wbr>rateVmToServerCommand] (default task-27)<br>
&gt;&gt; [87508047-fdc5-4a2f-9692-c83f7<wbr>b55bbc2] Lock freed to object<br>
&gt;&gt; &#39;EngineLock:{exclusiveLocks=&#39;[<wbr>2b34910d-cef2-44d6-a274-30e847<wbr>3eb5d9=VM]&#39;,<br>
&gt;&gt; sharedLocks=&#39;&#39;}&#39;<br>
&gt;&gt; 2017-07-02 16:43:22,929-04 DEBUG<br>
&gt;&gt; [org.ovirt.engine.core.utils.t<wbr>imer.FixedDelayJobListener]<br>
&gt;&gt; (DefaultQuartzScheduler7) [] Rescheduling<br>
&gt;&gt; DEFAULT.org.ovirt.engine.core.<wbr>bll.ColdRebootAutoStartVmsRunn<wbr>er.startFailedAutoStartVms#-92<wbr>23372036854775733<br>
&gt;&gt; as there is no unfired trigger.<br>
&gt;&gt; 2017-07-02 16:43:22,932-04 ERROR<br>
&gt;&gt; [org.ovirt.engine.api.restapi.<wbr>resource.AbstractBackendResour<wbr>ce] (default<br>
&gt;&gt; task-27) [] Operation Failed: [Cannot migrate VM. There is no host that<br>
&gt;&gt; satisfies current scheduling constraints. See below for details:, The host<br>
&gt;&gt; lago-basic-suite-master-host0 did not satisfy internal filter CPUOverloaded<br>
&gt;&gt; because its CPU is too loaded.]<br>
&gt;&gt; 2017-07-02 16:43:23,331-04 DEBUG<br>
&gt;&gt; [org.ovirt.engine.core.utils.t<wbr>imer.FixedDelayJobListener]<br>
&gt;&gt; (DefaultQuartzScheduler2) [] Rescheduling<br>
&gt;&gt; DEFAULT.org.ovirt.engine.core.<wbr>bll.HaAutoStartVmsRunner.start<wbr>FailedAutoStartVms#-9223372036<wbr>854775793<br>
&gt;&gt; as there is no unfired trigger.<br>
&gt;&gt; 2017-07-02 16:43:23,332-04 DEBUG<br>
&gt;&gt; [org.ovirt.engine.core.utils.t<wbr>imer.FixedDelayJobListener]<br>
&gt;&gt; (DefaultQuartzScheduler2) [] Rescheduling<br>
&gt;&gt; DEFAULT.org.ovirt.engine.core.<wbr>bll.tasks.CommandCallbacksPoll<wbr>er.invokeCallbackMethods#-9223<wbr>372036854775783<br>
&gt;&gt; as there is no unfired trigger.<br>
&gt;&gt;<br>
&gt;&gt; &lt;/engine log&gt;<br>
&gt;&gt;<br>
&gt;&gt;<br>
&gt;&gt;<br>
&gt;&gt; Best Regards,<br>
&gt;&gt;<br>
&gt;&gt; Shlomi Ben-David | Software Engineer | Red Hat ISRAEL<br>
&gt;&gt; RHCSA | RHCVA | RHCE<br>
&gt;&gt; IRC: shlomibendavid (on #rhev-integ, #rhev-dev, #rhev-ci)<br>
&gt;&gt;<br>
&gt;&gt; OPEN SOURCE - 1 4 011 &amp;&amp; 011 4 1<br>
&gt;&gt;<br>
&gt;&gt; ______________________________<wbr>_________________<br>
&gt;&gt; Devel mailing list<br>
&gt;&gt; <a href="mailto:Devel@ovirt.org" target="_blank">Devel@ovirt.org</a><br>
&gt;&gt; <a href="http://lists.ovirt.org/mailman/listinfo/devel" rel="noreferrer" target="_blank">http://lists.ovirt.org/mailman<wbr>/listinfo/devel</a><br>
&gt;&gt;<br>
&gt;<br>
&gt;<br>
&gt;<br>
&gt;<br>
&gt; --<br>
&gt;<br>
&gt; Eyal edri<br>
&gt;<br>
&gt;<br>
&gt; ASSOCIATE MANAGER<br>
&gt;<br>
&gt; RHV DevOps<br>
&gt;<br>
&gt; EMEA VIRTUALIZATION R&amp;D<br>
&gt;<br>
&gt;<br>
&gt; Red Hat EMEA<br>
&gt;<br>
</div></div><span class="m_2989331196243842324gmail-im m_2989331196243842324gmail-HOEnZb">&gt; TRIED. TESTED. TRUSTED.<br>
&gt; phone: <a href="tel:%2B972-9-7692018" value="+97297692018" target="_blank">+972-9-7692018</a><br>
&gt; irc: eedri (on #tlv #rhev-dev #rhev-integ)<br>
&gt;<br>
</span><div class="m_2989331196243842324gmail-HOEnZb"><div class="m_2989331196243842324gmail-h5">
</div></div></blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="m_2989331196243842324gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><p style="font-family:overpass,sans-serif;margin:0px;padding:0px;font-size:14px;text-transform:uppercase;font-weight:bold"><font color="#cc0000">Eyal edri</font></p><p style="color:rgb(0,0,0);font-family:overpass,sans-serif;font-weight:bold;margin:0px;padding:0px;font-size:14px;text-transform:uppercase"><br></p><p style="color:rgb(0,0,0);font-family:overpass,sans-serif;font-size:10px;margin:0px 0px 4px;text-transform:uppercase">ASSOCIATE MANAGER</p><p style="color:rgb(0,0,0);font-family:overpass,sans-serif;font-size:10px;margin:0px 0px 4px;text-transform:uppercase">RHV DevOps</p><p style="color:rgb(0,0,0);font-family:overpass,sans-serif;font-size:10px;margin:0px 0px 4px;text-transform:uppercase">EMEA VIRTUALIZATION R&amp;D</p><p style="color:rgb(0,0,0);font-family:overpass,sans-serif;font-size:10px;margin:0px 0px 4px;text-transform:uppercase"><br></p><p style="font-family:overpass,sans-serif;margin:0px;font-size:10px;color:rgb(153,153,153)"><a href="https://www.redhat.com/" style="color:rgb(0,136,206);margin:0px" target="_blank">Red Hat EMEA</a></p><table border="0" style="color:rgb(0,0,0);font-family:overpass,sans-serif;font-size:medium"><tbody><tr><td width="100px"><a href="https://red.ht/sig" style="color:rgb(17,85,204)" target="_blank"><img src="https://www.redhat.com/profiles/rh/themes/redhatdotcom/img/logo-red-hat-black.png" width="90" height="auto"></a></td><td style="font-size:10px"><a href="https://redhat.com/trusted" style="color:rgb(204,0,0);font-weight:bold" target="_blank">TRIED. TESTED. TRUSTED.</a></td></tr></tbody></table></div><div>phone: <a href="tel:+972%209-769-2018" value="+97297692018" target="_blank">+972-9-7692018</a><br>irc: eedri (on #tlv #rhev-dev #rhev-integ)</div></div></div></div></div></div></div></div></div>
</div></div>