<html><head><meta http-equiv="Content-Type" content="text/html charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><br class=""><div><blockquote type="cite" class=""><div class="">On 4 Jul 2017, at 13:00, Eyal Edri <<a href="mailto:eedri@redhat.com" class="">eedri@redhat.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" class=""><div class="">I was able to reproduce the error [1] on a manual run with only new vdsm from [2],</div><div class="">and also to verify that w/o this change, while using latest tested run [3] it works.</div><div class=""><br class=""></div><div class="">So I think this proves quite clearly the problem is one of the latest VDSM patches.</div></div></div></blockquote><div><br class=""></div>There is only a single patch between vdsms [1] and [3]</div><div><a href="https://gerrit.ovirt.org/#/c/78536" class="">https://gerrit.ovirt.org/#/c/78536</a></div><div><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class=""><br class=""></div><div class="">I'm running again the test with the suspected bad VDSM and hopefully will be able to extract the env to tar.gz file</div><div class="">which anyone can import using the lago demo tool.</div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><br class=""></div><div class="">[1] <a href="http://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_manual/748/" target="_blank" class="">http://jenkins.ovirt.org/<wbr class="">view/oVirt%20system%20tests/<wbr class="">job/ovirt-system-tests_manual/<wbr class="">748/</a></div><div class="">[2] <a href="http://jenkins.ovirt.org/job/vdsm_master_build-artifacts-el7-x86_64/2694/" target="_blank" class="">http://jenkins.ovirt.org/<wbr class="">job/vdsm_master_build-<wbr class="">artifacts-el7-x86_64/2694/</a></div><div class="">[3] <a 
href="http://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_manual/747/" target="_blank" class="">http://jenkins.ovirt.org/<wbr class="">view/oVirt%20system%20tests/<wbr class="">job/ovirt-system-tests_manual/<wbr class="">747/</a></div><div class=""><br class=""></div><br class=""><div class="gmail_extra"><br class=""><div class="gmail_quote">On Tue, Jul 4, 2017 at 1:30 PM, Nadav Goldin <span dir="ltr" class=""><<a href="mailto:ngoldin@redhat.com" target="_blank" class="">ngoldin@redhat.com</a>></span> wrote:<br class=""><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi, sorry for posting late, I had a brief look at this yesterday:<br class="">
1. I couldn't replicate it locally - which means it is most likely a<br class="">
recent change.<br class="">
2. I looked at the libvirt XMLs Lago generated for the hosts, as a new<br class="">
version is used this week (0.40) - and they seem OK - specifically<br class="">
memory and vcpus (which were my initial suspects).<br class="">
3. I saw two Engine patches, merged shortly before the failures<br class="">
started, which *might* be related, but that is beyond what I can<br class="">
judge (CC'ed patch owners):<br class="">
<br class="">
core: Make VmAnalyzer to treat a migrated Paused VM as success -<br class="">
<a href="https://gerrit.ovirt.org/78305" rel="noreferrer" target="_blank" class="">https://gerrit.ovirt.org/78305</a><br class="">
<br class="">
fix custom fencing default config setting<br class="">
<a href="https://gerrit.ovirt.org/78720" rel="noreferrer" target="_blank" class="">https://gerrit.ovirt.org/78720</a><br class="">
<br class="">
A wild guess - could it be that the 'CPUOverloaded' filter was not<br class="">
active before for some reason?<br class="">
<br class="">
Also, there are some exceptions in host0 vdsm log[1], failing to get<br class="">
VM stats, though I can't tell if they are specific to this failure.<br class="">
<br class="">
Of course this is not a complete analysis, I hope it helps.<br class="">
<br class="">
<br class="">
[1] <a href="http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/7431/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-006_migrations.py/lago-basic-suite-master-host0/_var_log/vdsm/vdsm.log" rel="noreferrer" target="_blank" class="">http://jenkins.ovirt.org/job/t<wbr class="">est-repo_ovirt_experimental_ma<wbr class="">ster/7431/artifact/exported-ar<wbr class="">tifacts/basic-suit-master-el7/<wbr class="">test_logs/basic-suite-master/<wbr class="">post-006_migrations.py/lago-<wbr class="">basic-suite-master-host0/_var_<wbr class="">log/vdsm/vdsm.log</a><br class="">
<span class="m_2989331196243842324gmail-HOEnZb"><font color="#888888" class=""><br class="">
<br class="">
Nadav.<br class="">
</font></span><div class="m_2989331196243842324gmail-HOEnZb"><div class="m_2989331196243842324gmail-h5"><br class="">
<br class="">
<br class="">
<br class="">
<br class="">
On Tue, Jul 4, 2017 at 12:46 PM, Eyal Edri <<a href="mailto:eedri@redhat.com" target="_blank" class="">eedri@redhat.com</a>> wrote:<br class="">
><br class="">
><br class="">
> On Tue, Jul 4, 2017 at 12:18 PM, Michal Skrivanek<br class="">
> <<a href="mailto:michal.skrivanek@redhat.com" target="_blank" class="">michal.skrivanek@redhat.com</a>> wrote:<br class="">
>><br class="">
>><br class="">
>> On 3 Jul 2017, at 15:35, Shlomo Ben David <<a href="mailto:sbendavi@redhat.com" target="_blank" class="">sbendavi@redhat.com</a>> wrote:<br class="">
>><br class="">
>> Hi,<br class="">
>><br class="">
>> Test failed: [ 006_migrations.migrate_vm ]<br class="">
>> Link to suspected patches: N/A<br class="">
>> Link to Job:<br class="">
>> <a href="http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/7431/" rel="noreferrer" target="_blank" class="">http://jenkins.ovirt.org/job/t<wbr class="">est-repo_ovirt_experimental_ma<wbr class="">ster/7431/</a><br class="">
>> Link to all logs:<br class="">
>> <a href="http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/7431/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-006_migrations.py/" rel="noreferrer" target="_blank" class="">http://jenkins.ovirt.org/job/t<wbr class="">est-repo_ovirt_experimental_ma<wbr class="">ster/7431/artifact/exported-ar<wbr class="">tifacts/basic-suit-master-el7/<wbr class="">test_logs/basic-suite-master/<wbr class="">post-006_migrations.py/</a><br class="">
>> Error snippet from the log:<br class="">
>><br class="">
>> &lt;error&gt;<br class="">
>><br class="">
>> "Fault reason is "Operation Failed". Fault detail is "[Cannot migrate VM.<br class="">
>> There is no host that satisfies current scheduling constraints. See below<br class="">
>> for details:, The host lago-basic-suite-master-host0 did not satisfy<br class="">
>> internal filter CPUOverloaded because its CPU is too loaded.]"<br class="">
>><br class="">
>> &lt;/error&gt;<br class="">
>><br class="">
>> &lt;engine log&gt;<br class="">
>><br class="">
>> 2017-07-02 16:43:22,829-04 INFO<br class="">
>> [org.ovirt.engine.core.bll.Mig<wbr class="">rateVmToServerCommand] (default task-27)<br class="">
>> [87508047-fdc5-4a2f-9692-c83f7<wbr class="">b55bbc2] Lock Acquired to object<br class="">
>> 'EngineLock:{exclusiveLocks='[<wbr class="">2b34910d-cef2-44d6-a274-30e847<wbr class="">3eb5d9=VM]',<br class="">
>> sharedLocks=''}'<br class="">
>> 2017-07-02 16:43:22,833-04 DEBUG<br class="">
>> [org.ovirt.engine.core.dal.dbb<wbr class="">roker.PostgresDbEngineDialect$<wbr class="">PostgresSimpleJdbcCall]<br class="">
>> (default task-27) [87508047-fdc5-4a2f-9692-c83f7<wbr class="">b55bbc2] Compiled stored<br class="">
>> procedure. Call string is [{call getdiskvmelementspluggedtovm(?<wbr class="">)}]<br class="">
>> 2017-07-02 16:43:22,833-04 DEBUG<br class="">
>> [org.ovirt.engine.core.dal.dbb<wbr class="">roker.PostgresDbEngineDialect$<wbr class="">PostgresSimpleJdbcCall]<br class="">
>> (default task-27) [87508047-fdc5-4a2f-9692-c83f7<wbr class="">b55bbc2] SqlCall for<br class="">
>> procedure [GetDiskVmElementsPluggedToVm] compiled<br class="">
>> 2017-07-02 16:43:22,843-04 DEBUG<br class="">
>> [org.ovirt.engine.core.dal.dbb<wbr class="">roker.PostgresDbEngineDialect$<wbr class="">PostgresSimpleJdbcCall]<br class="">
>> (default task-27) [87508047-fdc5-4a2f-9692-c83f7<wbr class="">b55bbc2] Compiled stored<br class="">
>> procedure. Call string is [{call getattacheddisksnapshotstovm(?<wbr class="">, ?)}]<br class="">
>> 2017-07-02 16:43:22,843-04 DEBUG<br class="">
>> [org.ovirt.engine.core.dal.dbb<wbr class="">roker.PostgresDbEngineDialect$<wbr class="">PostgresSimpleJdbcCall]<br class="">
>> (default task-27) [87508047-fdc5-4a2f-9692-c83f7<wbr class="">b55bbc2] SqlCall for<br class="">
>> procedure [GetAttachedDiskSnapshotsToVm] compiled<br class="">
>> 2017-07-02 16:43:22,919-04 INFO<br class="">
>> [org.ovirt.engine.core.bll.sch<wbr class="">eduling.SchedulingManager] (default task-27)<br class="">
>> [87508047-fdc5-4a2f-9692-c83f7<wbr class="">b55bbc2] Candidate host<br class="">
>> 'lago-basic-suite-master-host0<wbr class="">' ('46bdc63d-98f5-4eee-81aa-2fb8<wbr class="">8b8f7cbe') was<br class="">
>> filtered out by 'VAR__FILTERTYPE__INTERNAL' filter 'CPUOverloaded'<br class="">
>> (correlation id: null)<br class="">
>> 2017-07-02 16:43:22,920-04 WARN<br class="">
>> [org.ovirt.engine.core.bll.Mig<wbr class="">rateVmToServerCommand] (default task-27)<br class="">
>> [87508047-fdc5-4a2f-9692-c83f7<wbr class="">b55bbc2] Validation of action<br class="">
>> 'MigrateVmToServer' failed for user admin@internal-authz. Reasons:<br class="">
>> VAR__ACTION__MIGRATE,VAR__TYPE<wbr class="">__VM,SCHEDULING_ALL_HOSTS_FILT<wbr class="">ERED_OUT,VAR__FILTERTYPE__INTE<wbr class="">RNAL,$hostName<br class="">
>> lago-basic-suite-master-host0,<wbr class="">$filterName<br class="">
>> CPUOverloaded,VAR__DETAIL__CPU<wbr class="">_OVERLOADED,SCHEDULING_HOST_<wbr class="">FILTERED_REASON_WITH_DETAIL<br class="">
>><br class="">
>><br class="">
>><br class="">
>> This has nothing to do with migration.<br class="">
>> CPUOverloaded is a scheduling policy filter; unless there was a change in<br class="">
>> that area, the obvious explanation is that the host has a CPU overload<br class="">
>> condition.<br class="">
>> I briefly looked at the logs and see "cpuUser": "83.40", "cpuSys": "16.59",<br class="">
>> "cpuIdle": "0.08", which indeed suggests an overload; from the same sample I<br class="">
>> can see it's vdsm ("cpuUserVdsmd": "77.38", "cpuSysVdsmd": "18.44").<br class="">
>><br class="">
>> Since similar values have been reported consistently for some time,<br class="">
>> there is a setupNetworks and a storage rescan prior to the failure, and<br class="">
>> there is no other indication of anything wrong, I'd just say the environment,<br class="">
>> the order of tests, or the timing has changed, but there is nothing wrong<br class="">
>> with the oVirt code.<br class="">
>> Did any of that change recently? Does it reproduce locally?<br class="">
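As a rough illustration of the reading above, here is a minimal Python sketch. It assumes a stats sample is a dict of string percentages, as in the quoted vdsm values; the function names and the 80% threshold are hypothetical and are not the engine's actual CPUOverloaded logic.<br class="">
<pre class="">
```python
def cpu_busy_percent(sample):
    """Return user + system CPU usage (percent) from a vdsm-style sample."""
    return float(sample["cpuUser"]) + float(sample["cpuSys"])


def looks_overloaded(sample, threshold=80.0):
    """Flag a sample as overloaded; the threshold is an assumed cut-off."""
    return cpu_busy_percent(sample) >= threshold


# Values taken from the sample quoted in this thread.
sample = {"cpuUser": "83.40", "cpuSys": "16.59", "cpuIdle": "0.08"}
print(round(cpu_busy_percent(sample), 2), looks_overloaded(sample))
```
</pre>
With the quoted values, user plus system time leaves almost no idle CPU, which matches the overload reading.<br class="">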
><br class="">
><br class="">
> AFAIK, no significant changes were made to the environment or the tests.<br class="">
> We will try to reproduce it locally and also on the manual job, but from<br class="">
> what it looks like, it is very consistent (unlike other race failures we've seen<br class="">
> lately) and continues to fail on the same tests, so it's either a change in<br class="">
> oVirt or something else that we're not thinking of.<br class="">
><br class="">
>><br class="">
>><br class="">
>> Thanks,<br class="">
>> michal<br class="">
>><br class="">
>> 2017-07-02 16:43:22,920-04 INFO<br class="">
>> [org.ovirt.engine.core.bll.Mig<wbr class="">rateVmToServerCommand] (default task-27)<br class="">
>> [87508047-fdc5-4a2f-9692-c83f7<wbr class="">b55bbc2] Lock freed to object<br class="">
>> 'EngineLock:{exclusiveLocks='[<wbr class="">2b34910d-cef2-44d6-a274-30e847<wbr class="">3eb5d9=VM]',<br class="">
>> sharedLocks=''}'<br class="">
>> 2017-07-02 16:43:22,929-04 DEBUG<br class="">
>> [org.ovirt.engine.core.utils.t<wbr class="">imer.FixedDelayJobListener]<br class="">
>> (DefaultQuartzScheduler7) [] Rescheduling<br class="">
>> DEFAULT.org.ovirt.engine.core.<wbr class="">bll.ColdRebootAutoStartVmsRunn<wbr class="">er.startFailedAutoStartVms#-92<wbr class="">23372036854775733<br class="">
>> as there is no unfired trigger.<br class="">
>> 2017-07-02 16:43:22,932-04 ERROR<br class="">
>> [org.ovirt.engine.api.restapi.<wbr class="">resource.AbstractBackendResour<wbr class="">ce] (default<br class="">
>> task-27) [] Operation Failed: [Cannot migrate VM. There is no host that<br class="">
>> satisfies current scheduling constraints. See below for details:, The host<br class="">
>> lago-basic-suite-master-host0 did not satisfy internal filter CPUOverloaded<br class="">
>> because its CPU is too loaded.]<br class="">
>> 2017-07-02 16:43:23,331-04 DEBUG<br class="">
>> [org.ovirt.engine.core.utils.t<wbr class="">imer.FixedDelayJobListener]<br class="">
>> (DefaultQuartzScheduler2) [] Rescheduling<br class="">
>> DEFAULT.org.ovirt.engine.core.<wbr class="">bll.HaAutoStartVmsRunner.start<wbr class="">FailedAutoStartVms#-9223372036<wbr class="">854775793<br class="">
>> as there is no unfired trigger.<br class="">
>> 2017-07-02 16:43:23,332-04 DEBUG<br class="">
>> [org.ovirt.engine.core.utils.t<wbr class="">imer.FixedDelayJobListener]<br class="">
>> (DefaultQuartzScheduler2) [] Rescheduling<br class="">
>> DEFAULT.org.ovirt.engine.core.<wbr class="">bll.tasks.CommandCallbacksPoll<wbr class="">er.invokeCallbackMethods#-9223<wbr class="">372036854775783<br class="">
>> as there is no unfired trigger.<br class="">
>><br class="">
>> &lt;/engine log&gt;<br class="">
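To pull the scheduling rejections out of an engine log like the one above, a minimal Python sketch could look like this. It assumes the message wording shown in the quoted SchedulingManager line; the regex and the filtered_hosts helper are illustrative and are not part of oVirt.<br class="">
<pre class="">
```python
import re

# Pattern modelled on the "Candidate host ... was filtered out" line above.
FILTERED_RE = re.compile(
    r"Candidate host '([^']+)' \('([^']+)'\) was filtered out by "
    r"'([^']+)' filter '([^']+)'"
)


def filtered_hosts(log_text):
    """Return (host name, filter name) pairs found in the log text."""
    # Log lines pasted into mail are wrapped, so collapse whitespace first.
    flat = " ".join(log_text.split())
    return [(m.group(1), m.group(4)) for m in FILTERED_RE.finditer(flat)]
```
</pre>
Running it over the full engine.log would show whether only host0 is rejected, and whether CPUOverloaded is the only filter involved.<br class="">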
>><br class="">
>><br class="">
>><br class="">
>> Best Regards,<br class="">
>><br class="">
>> Shlomi Ben-David | Software Engineer | Red Hat ISRAEL<br class="">
>> RHCSA | RHCVA | RHCE<br class="">
>> IRC: shlomibendavid (on #rhev-integ, #rhev-dev, #rhev-ci)<br class="">
>><br class="">
>> OPEN SOURCE - 1 4 011 && 011 4 1<br class="">
>><br class="">
>> ______________________________<wbr class="">_________________<br class="">
>> Devel mailing list<br class="">
>> <a href="mailto:Devel@ovirt.org" target="_blank" class="">Devel@ovirt.org</a><br class="">
>> <a href="http://lists.ovirt.org/mailman/listinfo/devel" rel="noreferrer" target="_blank" class="">http://lists.ovirt.org/mailman<wbr class="">/listinfo/devel</a><br class="">
><br class="">
><br class="">
><br class="">
><br class="">
> --<br class="">
><br class="">
> Eyal edri<br class="">
><br class="">
><br class="">
> ASSOCIATE MANAGER<br class="">
><br class="">
> RHV DevOps<br class="">
><br class="">
> EMEA VIRTUALIZATION R&D<br class="">
><br class="">
><br class="">
> Red Hat EMEA<br class="">
><br class="">
</div></div><span class="m_2989331196243842324gmail-im m_2989331196243842324gmail-HOEnZb">> TRIED. TESTED. TRUSTED.<br class="">
> phone: <a href="tel:%2B972-9-7692018" value="+97297692018" target="_blank" class="">+972-9-7692018</a><br class="">
> irc: eedri (on #tlv #rhev-dev #rhev-integ)<br class="">
><br class="">
</span><div class="m_2989331196243842324gmail-HOEnZb"><div class="m_2989331196243842324gmail-h5">> ______________________________<wbr class="">_________________<br class="">
> Devel mailing list<br class="">
> <a href="mailto:Devel@ovirt.org" target="_blank" class="">Devel@ovirt.org</a><br class="">
> <a href="http://lists.ovirt.org/mailman/listinfo/devel" rel="noreferrer" target="_blank" class="">http://lists.ovirt.org/mailman<wbr class="">/listinfo/devel</a><br class="">
</div></div></blockquote></div><br class=""><br clear="all" class=""><div class=""><br class=""></div>-- <br class=""><div class="m_2989331196243842324gmail_signature"><div dir="ltr" class=""><div class=""><div dir="ltr" class=""><div class=""><div dir="ltr" class=""><div class=""><div dir="ltr" class=""><div class=""><div style="font-family: overpass, sans-serif; margin: 0px; padding: 0px; font-size: 14px; text-transform: uppercase; font-weight: bold;" class=""><font color="#cc0000" class="">Eyal edri</font></div><div style="font-family: overpass, sans-serif; font-weight: bold; margin: 0px; padding: 0px; font-size: 14px; text-transform: uppercase;" class=""><br class=""></div><p style="font-family: overpass, sans-serif; font-size: 10px; margin: 0px 0px 4px; text-transform: uppercase;" class="">ASSOCIATE MANAGER</p><p style="font-family: overpass, sans-serif; font-size: 10px; margin: 0px 0px 4px; text-transform: uppercase;" class="">RHV DevOps</p><p style="font-family: overpass, sans-serif; font-size: 10px; margin: 0px 0px 4px; text-transform: uppercase;" class="">EMEA VIRTUALIZATION R&D</p><p style="font-family: overpass, sans-serif; font-size: 10px; margin: 0px 0px 4px; text-transform: uppercase;" class=""><br class=""></p><div style="font-family: overpass, sans-serif; margin: 0px; font-size: 10px; color: rgb(153, 153, 153);" class=""><a href="https://www.redhat.com/" style="color:rgb(0,136,206);margin:0px" target="_blank" class="">Red Hat EMEA</a></div><table border="0" style="font-family: overpass, sans-serif; font-size: inherit;" class=""><tbody class=""><tr class=""><td width="100px" class=""><a href="https://red.ht/sig" style="color:rgb(17,85,204)" target="_blank" class=""><img src="https://www.redhat.com/profiles/rh/themes/redhatdotcom/img/logo-red-hat-black.png" width="90" height="auto" class=""></a></td><td style="font-size:10px" class=""><a href="https://redhat.com/trusted" style="color:rgb(204,0,0);font-weight:bold" target="_blank" class="">TRIED. 
TESTED. TRUSTED.</a></td></tr></tbody></table></div><div class="">phone: <a href="tel:+972%209-769-2018" value="+97297692018" target="_blank" class="">+972-9-7692018</a><br class="">irc: eedri (on #tlv #rhev-dev #rhev-integ)</div></div></div></div></div></div></div></div></div>
</div></div>
</div></blockquote></div><br class=""></body></html>