
--Apple-Mail=_E7900078-8414-4EE0-946C-00E929769756
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset=utf-8

On 4 Jul 2017, at 13:00, Eyal Edri <eedri@redhat.com> wrote:

I was able to reproduce the error [1] on a manual run with only the new vdsm from [2],
and also to verify that without this change, using the latest tested run [3], it works.

So I think this proves quite clearly that the problem is one of the latest VDSM patches.

I'm running the test again with the suspected bad VDSM and hopefully will be able to extract the env to a tar.gz file
which anyone can import using the lago demo tool.

[1] http://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_manual/748/
[2] http://jenkins.ovirt.org/job/vdsm_master_build-artifacts-el7-x86_64/2694/
[3] http://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_manual/747/

On Tue, Jul 4, 2017 at 1:30 PM, Nadav Goldin <ngoldin@redhat.com> wrote:

Hi, sorry for posting late, I had a brief look at this yesterday:
1. I couldn't replicate it locally - which means it is most likely a recent change.
2. I looked at the libvirt XMLs Lago generated for the hosts, as a new version (0.40) is used this week - and they seem OK - specifically memory and vcpus (which was my initial suspect).
3. I saw two Engine patches, a bit prior to the time it started to fail, which *might* in my common sense be related, but it is out of my scope to tell (CC'ed patch owners):

core: Make VmAnalyzer to treat a migrated Paused VM as success - https://gerrit.ovirt.org/78305

fix custom fencing default config setting - https://gerrit.ovirt.org/78720

Shot in the dark - could it be that the 'CPUOverload' filter was not active before for some reason?

Also, there are some exceptions in the host0 vdsm log [1], failing to get VM stats, though I can't tell if they are specific to this failure.

Of course this is not a complete analysis, I hope it helps.

[1] http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/7431/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-006_migrations.py/lago-basic-suite-master-host0/_var_log/vdsm/vdsm.log

Nadav.

On Tue, Jul 4, 2017 at 12:46 PM, Eyal Edri <eedri@redhat.com> wrote:

On Tue, Jul 4, 2017 at 12:18 PM, Michal Skrivanek <michal.skrivanek@redhat.com> wrote:

On 3 Jul 2017, at 15:35, Shlomo Ben David <sbendavi@redhat.com> wrote:

Hi,

Test failed: [ 006_migrations.migrate_vm ]
Link to suspected patches: N/A
Link to Job: http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/7431/
Link to all logs: http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/7431/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-006_migrations.py/
Error snippet from the log:

<error>

"Fault reason is "Operation Failed". Fault detail is "[Cannot migrate VM. There is no host that satisfies current scheduling constraints. See below for details:, The host lago-basic-suite-master-host0 did not satisfy internal filter CPUOverloaded because its CPU is too loaded.]"

</error>

<engine log>

2017-07-02 16:43:22,829-04 INFO [org.ovirt.engine.core.bll.MigrateVmToServerCommand] (default task-27) [87508047-fdc5-4a2f-9692-c83f7b55bbc2] Lock Acquired to object 'EngineLock:{exclusiveLocks='[2b34910d-cef2-44d6-a274-30e8473eb5d9=VM]', sharedLocks=''}'
2017-07-02 16:43:22,833-04 DEBUG [org.ovirt.engine.core.dal.dbbroker.PostgresDbEngineDialect$PostgresSimpleJdbcCall] (default task-27) [87508047-fdc5-4a2f-9692-c83f7b55bbc2] Compiled stored procedure. Call string is [{call getdiskvmelementspluggedtovm(?)}]
2017-07-02 16:43:22,833-04 DEBUG [org.ovirt.engine.core.dal.dbbroker.PostgresDbEngineDialect$PostgresSimpleJdbcCall] (default task-27) [87508047-fdc5-4a2f-9692-c83f7b55bbc2] SqlCall for procedure [GetDiskVmElementsPluggedToVm] compiled
2017-07-02 16:43:22,843-04 DEBUG [org.ovirt.engine.core.dal.dbbroker.PostgresDbEngineDialect$PostgresSimpleJdbcCall] (default task-27) [87508047-fdc5-4a2f-9692-c83f7b55bbc2] Compiled stored procedure. Call string is [{call getattacheddisksnapshotstovm(?, ?)}]
2017-07-02 16:43:22,843-04 DEBUG [org.ovirt.engine.core.dal.dbbroker.PostgresDbEngineDialect$PostgresSimpleJdbcCall] (default task-27) [87508047-fdc5-4a2f-9692-c83f7b55bbc2] SqlCall for procedure [GetAttachedDiskSnapshotsToVm] compiled
2017-07-02 16:43:22,919-04 INFO [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (default task-27) [87508047-fdc5-4a2f-9692-c83f7b55bbc2] Candidate host 'lago-basic-suite-master-host0' ('46bdc63d-98f5-4eee-81aa-2fb88b8f7cbe') was filtered out by 'VAR__FILTERTYPE__INTERNAL' filter 'CPUOverloaded' (correlation id: null)
2017-07-02 16:43:22,920-04 WARN [org.ovirt.engine.core.bll.MigrateVmToServerCommand] (default task-27) [87508047-fdc5-4a2f-9692-c83f7b55bbc2] Validation of action 'MigrateVmToServer' failed for user admin@internal-authz. Reasons: VAR__ACTION__MIGRATE,VAR__TYPE__VM,SCHEDULING_ALL_HOSTS_FILTERED_OUT,VAR__FILTERTYPE__INTERNAL,$hostName lago-basic-suite-master-host0,$filterName CPUOverloaded,VAR__DETAIL__CPU_OVERLOADED,SCHEDULING_HOST_FILTERED_REASON_WITH_DETAIL

This has nothing to do with migration.
The CPUOverload is a scheduling policy; unless there was any change in that area, the obvious explanation would be that the host has a CPU overload condition.
I briefly looked at the logs and see "cpuUser": "83.40", "cpuSys": "16.59", "cpuIdle": "0.08", which indeed suggests an overload; from the same sample I can see it's vdsm ("cpuUserVdsmd": "77.38", "cpuSysVdsmd": "18.44").

Since similar values have been reported consistently for some time, and there is a setupNetworks and a storage rescan prior to the failure, and there is no other indication of anything wrong, I'd just say the environment or the order of tests or timing has changed, but nothing is wrong with the oVirt code.
Did any of that change recently? Does it reproduce locally?

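To make the reading of that sample concrete, here is a minimal sketch (not vdsm or Engine code) that sums the user and system percentages quoted above. The function names and the 80% threshold are hypothetical and chosen only for illustration; the actual CPUOverloaded policy settings are not shown in this thread.

```python
# Illustrative only: values copied from the stats sample quoted above.
sample = {
    "cpuUser": "83.40",
    "cpuSys": "16.59",
    "cpuIdle": "0.08",
    "cpuUserVdsmd": "77.38",
    "cpuSysVdsmd": "18.44",
}


def cpu_load_percent(stats):
    """Return user + system CPU percentage from a stats sample (strings in, float out)."""
    return float(stats["cpuUser"]) + float(stats["cpuSys"])


def looks_overloaded(stats, threshold=80.0):
    """Hypothetical check: the threshold is an assumption, not the Engine's real setting."""
    return cpu_load_percent(stats) > threshold


# Share reported for the vdsm daemon itself in the same sample.
vdsm_load = float(sample["cpuUserVdsmd"]) + float(sample["cpuSysVdsmd"])

print("host user+sys: %.2f" % cpu_load_percent(sample))
print("vdsmd user+sys: %.2f" % vdsm_load)
print("overloaded under the assumed threshold:", looks_overloaded(sample))
```

With almost no idle time left and most of the load attributed to vdsmd, the sample is consistent with the host being filtered out for CPU load rather than with a migration bug.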

AFAIK, no significant environment changes or tests were done.
We will try to reproduce it locally and also on the manual job, but from what it looks like, it is very consistent (unlike other race failures we've seen lately) and continues to fail on the same tests, so it's either a change in oVirt or something else that we're not thinking of.

Thanks, michal

There is only a single patch between vdsms [1] and [3]: https://gerrit.ovirt.org/#/c/78536

2017-07-02 16:43:22,920-04 INFO [org.ovirt.engine.core.bll.MigrateVmToServerCommand] (default task-27) [87508047-fdc5-4a2f-9692-c83f7b55bbc2] Lock freed to object 'EngineLock:{exclusiveLocks='[2b34910d-cef2-44d6-a274-30e8473eb5d9=VM]', sharedLocks=''}'
2017-07-02 16:43:22,929-04 DEBUG [org.ovirt.engine.core.utils.timer.FixedDelayJobListener] (DefaultQuartzScheduler7) [] Rescheduling DEFAULT.org.ovirt.engine.core.bll.ColdRebootAutoStartVmsRunner.startFailedAutoStartVms#-9223372036854775733 as there is no unfired trigger.
2017-07-02 16:43:22,932-04 ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default task-27) [] Operation Failed: [Cannot migrate VM. There is no host that satisfies current scheduling constraints. See below for details:, The host lago-basic-suite-master-host0 did not satisfy internal filter CPUOverloaded because its CPU is too loaded.]
2017-07-02 16:43:23,331-04 DEBUG [org.ovirt.engine.core.utils.timer.FixedDelayJobListener] (DefaultQuartzScheduler2) [] Rescheduling DEFAULT.org.ovirt.engine.core.bll.HaAutoStartVmsRunner.startFailedAutoStartVms#-9223372036854775793 as there is no unfired trigger.
2017-07-02 16:43:23,332-04 DEBUG [org.ovirt.engine.core.utils.timer.FixedDelayJobListener] (DefaultQuartzScheduler2) [] Rescheduling DEFAULT.org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.invokeCallbackMethods#-9223372036854775783 as there is no unfired trigger.

</engine log>
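
A minimal sketch for pulling the filtered-out hosts from engine.log lines like those quoted above. The regular expression is an assumption derived only from the single SchedulingManager line in this thread, and `filtered_hosts` is a hypothetical helper name, not an oVirt API.

```python
import re

# Pattern inferred from the one SchedulingManager line quoted in this thread;
# other engine versions may word the message differently.
FILTERED = re.compile(
    r"Candidate host '(?P<host>[^']+)' \('(?P<host_id>[^']+)'\) "
    r"was filtered out by '(?P<kind>[^']+)' filter '(?P<filter>[^']+)'"
)


def filtered_hosts(lines):
    """Yield (host name, filter name) for each scheduler rejection found in lines."""
    for line in lines:
        match = FILTERED.search(line)
        if match:
            yield match.group("host"), match.group("filter")


# The log entry below is copied from the engine log snippet in this thread.
LINE = (
    "2017-07-02 16:43:22,919-04 INFO "
    "[org.ovirt.engine.core.bll.scheduling.SchedulingManager] (default task-27) "
    "[87508047-fdc5-4a2f-9692-c83f7b55bbc2] Candidate host "
    "'lago-basic-suite-master-host0' ('46bdc63d-98f5-4eee-81aa-2fb88b8f7cbe') "
    "was filtered out by 'VAR__FILTERTYPE__INTERNAL' filter 'CPUOverloaded' "
    "(correlation id: null)"
)

for host, filter_name in filtered_hosts([LINE]):
    print(host, filter_name)
```

Running the same helper over the engine logs of a passing and a failing run would show whether the CPUOverloaded rejection appears only with the newer vdsm build.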
Best Regards,
Shlomi Ben-David | Software Engineer | Red Hat ISRAEL
RHCSA | RHCVA | RHCE
IRC: shlomibendavid (on #rhev-integ, #rhev-dev, #rhev-ci)
OPEN SOURCE - 1 4 011 && 011 4 1
_______________________________________________
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel
--
Eyal edri
ASSOCIATE MANAGER
RHV DevOps
EMEA VIRTUALIZATION R&D
Red Hat EMEA
TRIED. TESTED. TRUSTED.
phone: +972-9-7692018
irc: eedri (on #tlv #rhev-dev #rhev-integ)
--Apple-Mail=_E7900078-8414-4EE0-946C-00E929769756--