[ OST Failure Report ] [ oVirt master ] [ 03-07-2017 ] [ 006_migrations.migrate_vm ]

Hi,
Test failed: [ 006_migrations.migrate_vm ]
Link to suspected patches: N/A
Link to Job: http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/7431/
Link to all logs:
Error snippet from the log: http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/7431/artifa...
<error>
"Fault reason is "Operation Failed". Fault detail is "[Cannot migrate VM. There is no host that satisfies current scheduling constraints. See below for details:, The host lago-basic-suite-master-host0 did not satisfy internal filter CPUOverloaded because its CPU is too loaded.]"
</error>
<engine log>
2017-07-02 16:43:22,829-04 INFO [org.ovirt.engine.core.bll.MigrateVmToServerCommand] (default task-27) [87508047-fdc5-4a2f-9692-c83f7b55bbc2] Lock Acquired to object 'EngineLock:{exclusiveLocks='[2b34910d-cef2-44d6-a274-30e8473eb5d9=VM]', sharedLocks=''}'
2017-07-02 16:43:22,833-04 DEBUG [org.ovirt.engine.core.dal.dbbroker.PostgresDbEngineDialect$PostgresSimpleJdbcCall] (default task-27) [87508047-fdc5-4a2f-9692-c83f7b55bbc2] Compiled stored procedure. Call string is [{call getdiskvmelementspluggedtovm(?)}]
2017-07-02 16:43:22,833-04 DEBUG [org.ovirt.engine.core.dal.dbbroker.PostgresDbEngineDialect$PostgresSimpleJdbcCall] (default task-27) [87508047-fdc5-4a2f-9692-c83f7b55bbc2] SqlCall for procedure [GetDiskVmElementsPluggedToVm] compiled
2017-07-02 16:43:22,843-04 DEBUG [org.ovirt.engine.core.dal.dbbroker.PostgresDbEngineDialect$PostgresSimpleJdbcCall] (default task-27) [87508047-fdc5-4a2f-9692-c83f7b55bbc2] Compiled stored procedure. Call string is [{call getattacheddisksnapshotstovm(?, ?)}]
2017-07-02 16:43:22,843-04 DEBUG [org.ovirt.engine.core.dal.dbbroker.PostgresDbEngineDialect$PostgresSimpleJdbcCall] (default task-27) [87508047-fdc5-4a2f-9692-c83f7b55bbc2] SqlCall for procedure [GetAttachedDiskSnapshotsToVm] compiled
2017-07-02 16:43:22,919-04 INFO [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (default task-27) [87508047-fdc5-4a2f-9692-c83f7b55bbc2] Candidate host 'lago-basic-suite-master-host0' ('46bdc63d-98f5-4eee-81aa-2fb88b8f7cbe') was filtered out by 'VAR__FILTERTYPE__INTERNAL' filter 'CPUOverloaded' (correlation id: null)
2017-07-02 16:43:22,920-04 WARN [org.ovirt.engine.core.bll.MigrateVmToServerCommand] (default task-27) [87508047-fdc5-4a2f-9692-c83f7b55bbc2] Validation of action 'MigrateVmToServer' failed for user admin@internal-authz. Reasons: VAR__ACTION__MIGRATE,VAR__TYPE__VM,SCHEDULING_ALL_HOSTS_FILTERED_OUT,VAR__FILTERTYPE__INTERNAL,$hostName lago-basic-suite-master-host0,$filterName CPUOverloaded,VAR__DETAIL__CPU_OVERLOADED,SCHEDULING_HOST_FILTERED_REASON_WITH_DETAIL
2017-07-02 16:43:22,920-04 INFO [org.ovirt.engine.core.bll.MigrateVmToServerCommand] (default task-27) [87508047-fdc5-4a2f-9692-c83f7b55bbc2] Lock freed to object 'EngineLock:{exclusiveLocks='[2b34910d-cef2-44d6-a274-30e8473eb5d9=VM]', sharedLocks=''}'
2017-07-02 16:43:22,929-04 DEBUG [org.ovirt.engine.core.utils.timer.FixedDelayJobListener] (DefaultQuartzScheduler7) [] Rescheduling DEFAULT.org.ovirt.engine.core.bll.ColdRebootAutoStartVmsRunner.startFailedAutoStartVms#-9223372036854775733 as there is no unfired trigger.
2017-07-02 16:43:22,932-04 ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default task-27) [] Operation Failed: [Cannot migrate VM. There is no host that satisfies current scheduling constraints. See below for details:, The host lago-basic-suite-master-host0 did not satisfy internal filter CPUOverloaded because its CPU is too loaded.]
2017-07-02 16:43:23,331-04 DEBUG [org.ovirt.engine.core.utils.timer.FixedDelayJobListener] (DefaultQuartzScheduler2) [] Rescheduling DEFAULT.org.ovirt.engine.core.bll.HaAutoStartVmsRunner.startFailedAutoStartVms#-9223372036854775793 as there is no unfired trigger.
2017-07-02 16:43:23,332-04 DEBUG [org.ovirt.engine.core.utils.timer.FixedDelayJobListener] (DefaultQuartzScheduler2) [] Rescheduling DEFAULT.org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.invokeCallbackMethods#-9223372036854775783 as there is no unfired trigger.
</engine log>
Best Regards,
Shlomi Ben-David | Software Engineer | Red Hat ISRAEL
RHCSA | RHCVA | RHCE
IRC: shlomibendavid (on #rhev-integ, #rhev-dev, #rhev-ci)
OPEN SOURCE - 1 4 011 && 011 4 1
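For triage, the scheduling decision can be pulled out of the engine log mechanically. The following is only a minimal sketch in Python, assuming the SchedulingManager message keeps the wording shown in the log above. The names `FILTERED_RE` and `filtered_hosts` are hypothetical helpers for illustration and are not part of any oVirt or OST tool.

```python
import re

# Pattern for the SchedulingManager line as it appears in the engine log above.
# Assumption: the wording "Candidate host '<name>' ('<id>') was filtered out by
# '<type>' filter '<filter>'" is stable; adjust the pattern if it differs.
FILTERED_RE = re.compile(
    r"Candidate host '(?P<host>[^']+)' \('(?P<host_id>[^']+)'\) "
    r"was filtered out by '(?P<filter_type>[^']+)' filter '(?P<filter>[^']+)'"
)


def filtered_hosts(log_text):
    """Return (host, filter) pairs for every host the scheduler rejected."""
    return [
        (m.group("host"), m.group("filter"))
        for m in FILTERED_RE.finditer(log_text)
    ]


# The INFO line from the report, copied verbatim.
sample = (
    "2017-07-02 16:43:22,919-04 INFO "
    "[org.ovirt.engine.core.bll.scheduling.SchedulingManager] (default task-27) "
    "[87508047-fdc5-4a2f-9692-c83f7b55bbc2] Candidate host "
    "'lago-basic-suite-master-host0' ('46bdc63d-98f5-4eee-81aa-2fb88b8f7cbe') "
    "was filtered out by 'VAR__FILTERTYPE__INTERNAL' filter 'CPUOverloaded' "
    "(correlation id: null)"
)
print(filtered_hosts(sample))
```

Run against a whole engine.log, this would list every rejected host together with the filter that rejected it, which makes it easier to see whether CPUOverloaded is the only filter involved.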

This has nothing to do with migration. The CPUOverload is a scheduling policy; unless there was any change in that area, the obvious explanation would be that the host has a CPU overload condition.
I briefly looked at the logs and see "cpuUser": "83.40", "cpuSys": "16.59", "cpuIdle": "0.08", which indeed suggests an overload. From the same sample I can see it's vdsm ("cpuUserVdsmd": "77.38", "cpuSysVdsmd": "18.44").
Since similar values have been reported consistently for some time, there is a setupNetworks and a storage rescan prior to the failure, and there is no other indication of anything wrong, I'd just say that the environment, the order of tests, or the timing has changed, and that nothing is wrong with the oVirt code.
Did any of that change recently? Does it reproduce locally?
Thanks,
michal
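The arithmetic behind that reading of the host stats can be sketched in a few lines of Python. This is an illustration only: the field names are the ones quoted from the stats sample, `summarize_cpu` is a hypothetical helper, and it assumes the host-wide and vdsmd percentages are reported on the same scale, which the thread does not confirm.

```python
def summarize_cpu(sample):
    """Summarise one host stats sample; values are percentages given as strings."""
    busy = float(sample["cpuUser"]) + float(sample["cpuSys"])
    vdsm = float(sample["cpuUserVdsmd"]) + float(sample["cpuSysVdsmd"])
    return {
        "busy": round(busy, 2),             # user + system time, host-wide
        "idle": float(sample["cpuIdle"]),   # what is left for new VMs
        "vdsm": round(vdsm, 2),             # user + system time spent in vdsmd
    }


# Values quoted in the reply above.
sample = {
    "cpuUser": "83.40",
    "cpuSys": "16.59",
    "cpuIdle": "0.08",
    "cpuUserVdsmd": "77.38",
    "cpuSysVdsmd": "18.44",
}
summary = summarize_cpu(sample)
print(summary)
```

Under that same-scale assumption the host is about 99.99% busy with 0.08% idle, and vdsmd alone accounts for about 95.82%, which is consistent with the filter rejecting the host rather than with a migration bug.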

On Tue, Jul 4, 2017 at 12:18 PM, Michal Skrivanek < michal.skrivanek@redhat.com> wrote:
On 3 Jul 2017, at 15:35, Shlomo Ben David <sbendavi@redhat.com> wrote:
Hi,
Test failed: [ 006_migrations.migrate_vm ]
Link to suspected patches: N/A
Link to Job: http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/7431/
Link to all logs:
Error snippet from the log: http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/7431/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-006_migrations.py/
This has nothing to do with migration. The CPUOverload is a scheduling policy; unless there was any change in that area, the obvious explanation would be that the host has a CPU overload condition. I briefly looked at the logs and see "cpuUser": "83.40", "cpuSys": "16.59", "cpuIdle": "0.08", which indeed suggests an overload. From the same sample I can see it's vdsm ("cpuUserVdsmd": "77.38", "cpuSysVdsmd": "18.44").
Since similar values have been reported consistently for some time, there is a setupNetworks and a storage rescan prior to the failure, and there is no other indication of anything wrong, I'd just say that the environment, the order of tests, or the timing has changed, and that nothing is wrong with the oVirt code. Did any of that change recently? Does it reproduce locally?
AFAIK, no significant changes were made to the environment or the tests. We will try to reproduce it locally and also on the manual job, but from what we see it is very consistent (unlike other race failures we've seen lately) and continues to fail on the same tests, so it's either a change in oVirt or something else that we haven't thought of.
Thanks, michal
--
Eyal edri
ASSOCIATE MANAGER
RHV DevOps
EMEA VIRTUALIZATION R&D
Red Hat EMEA <https://www.redhat.com/>
<https://red.ht/sig>
TRIED. TESTED. TRUSTED. <https://redhat.com/trusted>
phone: +972-9-7692018
irc: eedri (on #tlv #rhev-dev #rhev-integ)

Hi, sorry for posting late, I had a brief look at this yesterday: 1. I couldn't replicate it locally - which means it is most likely a recent change. 2. I looked at the libvirt XMLs Lago generatd for the hosts, as a new version is used this week(0.40) - and they seem OK - specifically memroy and vcpus(which was my initial suspect). 3. I saw two Engine patches, a bit prior to the time it started to fail, which *might* in my common sense be related, but it is out of my scope to tell(CC'ed patch owners): core: Make VmAnalyzer to treat a migrated Paused VM as success - https://gerrit.ovirt.org/78305 fix custom fencing default config setting https://gerrit.ovirt.org/78720 Shot in the wild - Could it be that the 'CPUOverload' filter was not active before for some reason? Also, there are some exceptions in host0 vdsm log[1], failing to get VM stats, though I can't tell if they are specific to this failure. Of course this is not a complete analysis, I hope it helps. [1] http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/7431/artifa... Nadav. On Tue, Jul 4, 2017 at 12:46 PM, Eyal Edri <eedri@redhat.com> wrote:
On Tue, Jul 4, 2017 at 12:18 PM, Michal Skrivanek <michal.skrivanek@redhat.com> wrote:
On 3 Jul 2017, at 15:35, Shlomo Ben David <sbendavi@redhat.com> wrote:
Hi,
Test failed: [ 006_migrations.migrate_vm ] Link to suspected patches: N/A Link to Job: http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/7431/ Link to all logs: Error snippet from the log: http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/7431/artifa...
<error>
"Fault reason is "Operation Failed". Fault detail is "[Cannot migrate VM. There is no host that satisfies current scheduling constraints. See below for details:, The host lago-basic-suite-master-host0 did not satisfy internal filter CPUOverloaded because its CPU is too loaded.]"
</error>
<engine log>
2017-07-02 16:43:22,829-04 INFO [org.ovirt.engine.core.bll.MigrateVmToServerCommand] (default task-27) [87508047-fdc5-4a2f-9692-c83f7b55bbc2] Lock Acquired to object 'EngineLock:{exclusiveLocks='[2b34910d-cef2-44d6-a274-30e8473eb5d9=VM]', sharedLocks=''}' 2017-07-02 16:43:22,833-04 DEBUG [org.ovirt.engine.core.dal.dbbroker.PostgresDbEngineDialect$PostgresSimpleJdbcCall] (default task-27) [87508047-fdc5-4a2f-9692-c83f7b55bbc2] Compiled stored procedure. Call string is [{call getdiskvmelementspluggedtovm(?)}] 2017-07-02 16:43:22,833-04 DEBUG [org.ovirt.engine.core.dal.dbbroker.PostgresDbEngineDialect$PostgresSimpleJdbcCall] (default task-27) [87508047-fdc5-4a2f-9692-c83f7b55bbc2] SqlCall for procedure [GetDiskVmElementsPluggedToVm] compiled 2017-07-02 16:43:22,843-04 DEBUG [org.ovirt.engine.core.dal.dbbroker.PostgresDbEngineDialect$PostgresSimpleJdbcCall] (default task-27) [87508047-fdc5-4a2f-9692-c83f7b55bbc2] Compiled stored procedure. Call string is [{call getattacheddisksnapshotstovm(?, ?)}] 2017-07-02 16:43:22,843-04 DEBUG [org.ovirt.engine.core.dal.dbbroker.PostgresDbEngineDialect$PostgresSimpleJdbcCall] (default task-27) [87508047-fdc5-4a2f-9692-c83f7b55bbc2] SqlCall for procedure [GetAttachedDiskSnapshotsToVm] compiled 2017-07-02 16:43:22,919-04 INFO [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (default task-27) [87508047-fdc5-4a2f-9692-c83f7b55bbc2] Candidate host 'lago-basic-suite-master-host0' ('46bdc63d-98f5-4eee-81aa-2fb88b8f7cbe') was filtered out by 'VAR__FILTERTYPE__INTERNAL' filter 'CPUOverloaded' (correlation id: null) 2017-07-02 16:43:22,920-04 WARN [org.ovirt.engine.core.bll.MigrateVmToServerCommand] (default task-27) [87508047-fdc5-4a2f-9692-c83f7b55bbc2] Validation of action 'MigrateVmToServer' failed for user admin@internal-authz. 
Reasons: VAR__ACTION__MIGRATE,VAR__TYPE__VM,SCHEDULING_ALL_HOSTS_FILTERED_OUT,VAR__FILTERTYPE__INTERNAL,$hostName lago-basic-suite-master-host0,$filterName CPUOverloaded,VAR__DETAIL__CPU_OVERLOADED,SCHEDULING_HOST_FILTERED_REASON_WITH_DETAIL
This has nothing to do with migration The CPUOverload is a scheduling policy, unless there was any change in that area the obvious explanation would be that the host has a CPU overload condition. I briefly looked at logs and see ""cpuUser": "83.40", "cpuSys": "16.59", "cpuIdle": “0.08”” which indeed suggests an overload, from the same sample I can see it’s vdsm ("cpuUserVdsmd": “77.38”, cpuSysVdsmd": “18.44"
Since similar values are consistently being reported for some time, and there is a setupNetworks and storage rescan prior to the the failure, and there is no other indication of anything wrong, I’d just say the environment or the order of tests or timing has changed, but nothing wrong with the oVirt code Did any of that changed recently? Does it reproduce locally?
AFAIK, no significant environment changes or tests were done. We will try to reproduce it locally and also on the manual job, but from what it looks it is very consistent (unlike other race failures we've seen lately ) and continues to fails on the same tests, so its either a change in oVirt or something else that we're not thinking on.
Thanks, michal
2017-07-02 16:43:22,920-04 INFO [org.ovirt.engine.core.bll.MigrateVmToServerCommand] (default task-27) [87508047-fdc5-4a2f-9692-c83f7b55bbc2] Lock freed to object 'EngineLock:{exclusiveLocks='[2b34910d-cef2-44d6-a274-30e8473eb5d9=VM]', sharedLocks=''}' 2017-07-02 16:43:22,929-04 DEBUG [org.ovirt.engine.core.utils.timer.FixedDelayJobListener] (DefaultQuartzScheduler7) [] Rescheduling DEFAULT.org.ovirt.engine.core.bll.ColdRebootAutoStartVmsRunner.startFailedAutoStartVms#-9223372036854775733 as there is no unfired trigger. 2017-07-02 16:43:22,932-04 ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default task-27) [] Operation Failed: [Cannot migrate VM. There is no host that satisfies current scheduling constraints. See below for details:, The host lago-basic-suite-master-host0 did not satisfy internal filter CPUOverloaded because its CPU is too loaded.] 2017-07-02 16:43:23,331-04 DEBUG [org.ovirt.engine.core.utils.timer.FixedDelayJobListener] (DefaultQuartzScheduler2) [] Rescheduling DEFAULT.org.ovirt.engine.core.bll.HaAutoStartVmsRunner.startFailedAutoStartVms#-9223372036854775793 as there is no unfired trigger. 2017-07-02 16:43:23,332-04 DEBUG [org.ovirt.engine.core.utils.timer.FixedDelayJobListener] (DefaultQuartzScheduler2) [] Rescheduling DEFAULT.org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.invokeCallbackMethods#-9223372036854775783 as there is no unfired trigger.
<engine log>
Best Regards,
Shlomi Ben-David | Software Engineer | Red Hat ISRAEL RHCSA | RHCVA | RHCE IRC: shlomibendavid (on #rhev-integ, #rhev-dev, #rhev-ci)
OPEN SOURCE - 1 4 011 && 011 4 1
_______________________________________________ Devel mailing list Devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/devel
--
Eyal edri
ASSOCIATE MANAGER
RHV DevOps
EMEA VIRTUALIZATION R&D
Red Hat EMEA
TRIED. TESTED. TRUSTED. phone: +972-9-7692018 irc: eedri (on #tlv #rhev-dev #rhev-integ)

On Tue, Jul 4, 2017 at 1:30 PM, Nadav Goldin <ngoldin@redhat.com> wrote:
1. I couldn't replicate it locally - which means it is most likely a recent change.
To clarify, I meant that I couldn't replicate it with the current tested repository. I didn't try mimicking the 'under_testing' repository, which includes the recent patches (and on which the failure happens).

I was able to reproduce the error [1] on a manual run with only the new vdsm from [2], and also verified that without this change, using the latest tested run [3], it works.
So I think this proves quite clearly that the problem is one of the latest VDSM patches.
I'm running the test again with the suspected bad VDSM and hopefully will be able to extract the env to a tar.gz file which anyone can import using the lago demo tool.
[1] http://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_manual/748/
[2] http://jenkins.ovirt.org/job/vdsm_master_build-artifacts-el7-x86_64/2694/
[3] http://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_manual/747/
On Tue, Jul 4, 2017 at 1:30 PM, Nadav Goldin <ngoldin@redhat.com> wrote:
Hi, sorry for posting late. I had a brief look at this yesterday:
1. I couldn't replicate it locally, which means it is most likely a recent change.
2. I looked at the libvirt XMLs Lago generated for the hosts, as a new version (0.40) is used this week, and they seem OK, specifically memory and vcpus (which were my initial suspects).
3. I saw two Engine patches from shortly before it started to fail which *might*, as far as I can tell, be related, but it is out of my scope to judge (CC'ed patch owners):
core: Make VmAnalyzer to treat a migrated Paused VM as success - https://gerrit.ovirt.org/78305
fix custom fencing default config setting https://gerrit.ovirt.org/78720
A shot in the dark: could it be that the 'CPUOverloaded' filter was not active before for some reason?
Also, there are some exceptions in host0 vdsm log[1], failing to get VM stats, though I can't tell if they are specific to this failure.
Of course this is not a complete analysis, I hope it helps.
[1] http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/7431/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-006_migrations.py/lago-basic-suite-master-host0/_var_log/vdsm/vdsm.log
Nadav.
On Tue, Jul 4, 2017 at 12:46 PM, Eyal Edri <eedri@redhat.com> wrote:
On Tue, Jul 4, 2017 at 12:18 PM, Michal Skrivanek <michal.skrivanek@redhat.com> wrote:
On 3 Jul 2017, at 15:35, Shlomo Ben David <sbendavi@redhat.com> wrote:
Hi,
Test failed: [ 006_migrations.migrate_vm ]
Link to suspected patches: N/A
Link to Job: http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/7431/
Link to all logs:
Error snippet from the log:
http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/7431/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-006_migrations.py/
<error>
"Fault reason is "Operation Failed". Fault detail is "[Cannot migrate VM. There is no host that satisfies current scheduling constraints. See below for details:, The host lago-basic-suite-master-host0 did not satisfy internal filter CPUOverloaded because its CPU is too loaded.]"
</error>
<engine log>
2017-07-02 16:43:22,829-04 INFO [org.ovirt.engine.core.bll.MigrateVmToServerCommand] (default task-27) [87508047-fdc5-4a2f-9692-c83f7b55bbc2] Lock Acquired to object 'EngineLock:{exclusiveLocks='[2b34910d-cef2-44d6-a274-30e8473eb5d9=VM]', sharedLocks=''}'
2017-07-02 16:43:22,833-04 DEBUG [org.ovirt.engine.core.dal.dbbroker.PostgresDbEngineDialect$PostgresSimpleJdbcCall] (default task-27) [87508047-fdc5-4a2f-9692-c83f7b55bbc2] Compiled stored procedure. Call string is [{call getdiskvmelementspluggedtovm(?)}]
2017-07-02 16:43:22,833-04 DEBUG [org.ovirt.engine.core.dal.dbbroker.PostgresDbEngineDialect$PostgresSimpleJdbcCall] (default task-27) [87508047-fdc5-4a2f-9692-c83f7b55bbc2] SqlCall for procedure [GetDiskVmElementsPluggedToVm] compiled
2017-07-02 16:43:22,843-04 DEBUG [org.ovirt.engine.core.dal.dbbroker.PostgresDbEngineDialect$PostgresSimpleJdbcCall] (default task-27) [87508047-fdc5-4a2f-9692-c83f7b55bbc2] Compiled stored procedure. Call string is [{call getattacheddisksnapshotstovm(?, ?)}]
2017-07-02 16:43:22,843-04 DEBUG [org.ovirt.engine.core.dal.dbbroker.PostgresDbEngineDialect$PostgresSimpleJdbcCall] (default task-27) [87508047-fdc5-4a2f-9692-c83f7b55bbc2] SqlCall for procedure [GetAttachedDiskSnapshotsToVm] compiled
2017-07-02 16:43:22,919-04 INFO [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (default task-27) [87508047-fdc5-4a2f-9692-c83f7b55bbc2] Candidate host 'lago-basic-suite-master-host0' ('46bdc63d-98f5-4eee-81aa-2fb88b8f7cbe') was filtered out by 'VAR__FILTERTYPE__INTERNAL' filter 'CPUOverloaded' (correlation id: null)
2017-07-02 16:43:22,920-04 WARN [org.ovirt.engine.core.bll.MigrateVmToServerCommand] (default task-27) [87508047-fdc5-4a2f-9692-c83f7b55bbc2] Validation of action 'MigrateVmToServer' failed for user admin@internal-authz. Reasons: VAR__ACTION__MIGRATE,VAR__TYPE__VM,SCHEDULING_ALL_HOSTS_FILTERED_OUT,VAR__FILTERTYPE__INTERNAL,$hostName lago-basic-suite-master-host0,$filterName CPUOverloaded,VAR__DETAIL__CPU_OVERLOADED,SCHEDULING_HOST_FILTERED_REASON_WITH_DETAIL
This has nothing to do with migration. CPUOverloaded is a scheduling filter; unless there was a change in that area, the obvious explanation is that the host has a CPU overload condition. I briefly looked at the logs and see "cpuUser": "83.40", "cpuSys": "16.59", "cpuIdle": "0.08", which indeed suggests an overload. From the same sample I can see it's vdsm ("cpuUserVdsmd": "77.38", "cpuSysVdsmd": "18.44").
Since similar values have been reported consistently for some time, there is a setupNetworks and a storage rescan prior to the failure, and there is no other indication of anything wrong, I'd just say the environment, the order of tests, or the timing has changed, but nothing is wrong with the oVirt code. Did any of that change recently? Does it reproduce locally?
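For anyone who wants to redo the arithmetic on the sample quoted above, here is a minimal Python sketch. The key names and values are copied from the quoted stats; the JSON wrapper, the function name `summarize` and the 80% threshold are assumptions made for illustration only, and the threshold is not taken from the engine's CPUOverloaded filter. The Vdsmd figures may also not be normalized the same way as the host-wide ones, so treat the comparison as a rough indication.

```python
import json

# Values copied from the vdsm stats sample quoted in the thread.
# Wrapping them in a JSON object is an assumption made for this sketch.
SAMPLE = """{"cpuUser": "83.40", "cpuSys": "16.59", "cpuIdle": "0.08",
             "cpuUserVdsmd": "77.38", "cpuSysVdsmd": "18.44"}"""

# Hypothetical threshold, NOT the value used by the CPUOverloaded filter.
ASSUMED_OVERLOAD_PERCENT = 80.0


def summarize(sample_json, threshold=ASSUMED_OVERLOAD_PERCENT):
    """Return host busy %, vdsmd busy % and a naive overload flag."""
    stats = {key: float(value) for key, value in json.loads(sample_json).items()}
    busy = stats["cpuUser"] + stats["cpuSys"]            # host-wide user + system
    vdsm = stats["cpuUserVdsmd"] + stats["cpuSysVdsmd"]  # vdsmd user + system
    return {
        "busy": round(busy, 2),
        "vdsm": round(vdsm, 2),
        "overloaded": busy > threshold,
    }


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

With the quoted sample, the host is busy for essentially the whole interval and the vdsmd figures account for most of it, which matches the reading given above.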
AFAIK, no significant environment changes or tests were done. We will try to reproduce it locally and also on the manual job, but from the looks of it, it is very consistent (unlike other race failures we've seen lately) and continues to fail on the same tests, so it's either a change in oVirt or something else that we're not thinking of.
Thanks, michal
2017-07-02 16:43:22,920-04 INFO [org.ovirt.engine.core.bll.MigrateVmToServerCommand] (default task-27) [87508047-fdc5-4a2f-9692-c83f7b55bbc2] Lock freed to object 'EngineLock:{exclusiveLocks='[2b34910d-cef2-44d6-a274-30e8473eb5d9=VM]', sharedLocks=''}'
2017-07-02 16:43:22,929-04 DEBUG [org.ovirt.engine.core.utils.timer.FixedDelayJobListener] (DefaultQuartzScheduler7) [] Rescheduling DEFAULT.org.ovirt.engine.core.bll.ColdRebootAutoStartVmsRunner.startFailedAutoStartVms#-9223372036854775733 as there is no unfired trigger.
2017-07-02 16:43:22,932-04 ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default task-27) [] Operation Failed: [Cannot migrate VM. There is no host that satisfies current scheduling constraints. See below for details:, The host lago-basic-suite-master-host0 did not satisfy internal filter CPUOverloaded because its CPU is too loaded.]
2017-07-02 16:43:23,331-04 DEBUG [org.ovirt.engine.core.utils.timer.FixedDelayJobListener] (DefaultQuartzScheduler2) [] Rescheduling DEFAULT.org.ovirt.engine.core.bll.HaAutoStartVmsRunner.startFailedAutoStartVms#-9223372036854775793 as there is no unfired trigger.
2017-07-02 16:43:23,332-04 DEBUG [org.ovirt.engine.core.utils.timer.FixedDelayJobListener] (DefaultQuartzScheduler2) [] Rescheduling DEFAULT.org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.invokeCallbackMethods#-9223372036854775783 as there is no unfired trigger.
<engine log>
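To find every host the scheduler rejected in a full engine.log, rather than reading the excerpt by eye, a small sketch like the following could help. It assumes the raw log keeps each entry on one line and uses the "Candidate host ... was filtered out by ... filter ..." wording shown in the excerpt above; the function name `filtered_hosts` is made up for this example.

```python
import re

# Pattern modelled on the SchedulingManager line in the excerpt above.
# It is an assumption that other engine versions use the same wording.
FILTERED = re.compile(
    r"Candidate host '(?P<host>[^']+)' \('(?P<host_id>[^']+)'\) "
    r"was filtered out by '(?P<filter_type>[^']+)' filter '(?P<filter_name>[^']+)'"
)


def filtered_hosts(log_text):
    """Return (host name, filter name) pairs for every rejected candidate."""
    return [
        (match.group("host"), match.group("filter_name"))
        for match in FILTERED.finditer(log_text)
    ]


if __name__ == "__main__":
    import sys

    # Usage (hypothetical path): python filtered_hosts.py engine.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for host, filter_name in filtered_hosts(handle.read()):
            print(host, filter_name)
```

Counting the pairs per host over a longer stretch of the log would show whether the CPUOverloaded rejection is a one-off or happens on every scheduling attempt.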
Best Regards,
Shlomi Ben-David | Software Engineer | Red Hat ISRAEL RHCSA | RHCVA | RHCE IRC: shlomibendavid (on #rhev-integ, #rhev-dev, #rhev-ci)
OPEN SOURCE - 1 4 011 && 011 4 1
--
Eyal edri
ASSOCIATE MANAGER
RHV DevOps
EMEA VIRTUALIZATION R&D
Red Hat EMEA
TRIED. TESTED. TRUSTED. phone: +972-9-7692018 irc: eedri (on #tlv #rhev-dev #rhev-integ)
-- Eyal edri ASSOCIATE MANAGER RHV DevOps EMEA VIRTUALIZATION R&D Red Hat EMEA TRIED. TESTED. TRUSTED. phone: +972-9-7692018 irc: eedri (on #tlv #rhev-dev #rhev-integ)

On 4 Jul 2017, at 13:00, Eyal Edri <eedri@redhat.com> wrote:
I was able to reproduce the error [1] on a manual run with only the new vdsm from [2], and also verified that without this change, using the latest tested run [3], it works.
So I think this proves quite clearly that the problem is one of the latest VDSM patches.
I'm running the test again with the suspected bad VDSM and hopefully will be able to extract the env to a tar.gz file which anyone can import using the lago demo tool.
[1] http://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_manual/748/
[2] http://jenkins.ovirt.org/job/vdsm_master_build-artifacts-el7-x86_64/2694/
[3] http://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_manual/747/
On Tue, Jul 4, 2017 at 1:30 PM, Nadav Goldin <ngoldin@redhat.com> wrote:
Hi, sorry for posting late. I had a brief look at this yesterday:
1. I couldn't replicate it locally, which means it is most likely a recent change.
2. I looked at the libvirt XMLs Lago generated for the hosts, as a new version (0.40) is used this week, and they seem OK, specifically memory and vcpus (which were my initial suspects).
3. I saw two Engine patches from shortly before it started to fail which *might*, as far as I can tell, be related, but it is out of my scope to judge (CC'ed patch owners):
core: Make VmAnalyzer to treat a migrated Paused VM as success - https://gerrit.ovirt.org/78305
fix custom fencing default config setting https://gerrit.ovirt.org/78720
A shot in the dark: could it be that the 'CPUOverloaded' filter was not active before for some reason?
Also, there are some exceptions in host0 vdsm log[1], failing to get VM stats, though I can't tell if they are specific to this failure.
Of course this is not a complete analysis, I hope it helps.
[1] http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/7431/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-006_migrations.py/lago-basic-suite-master-host0/_var_log/vdsm/vdsm.log
Nadav.
On Tue, Jul 4, 2017 at 12:46 PM, Eyal Edri <eedri@redhat.com> wrote:
On Tue, Jul 4, 2017 at 12:18 PM, Michal Skrivanek <michal.skrivanek@redhat.com> wrote:
On 3 Jul 2017, at 15:35, Shlomo Ben David <sbendavi@redhat.com> wrote:
Hi,
Test failed: [ 006_migrations.migrate_vm ]
Link to suspected patches: N/A
Link to Job: http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/7431/
Link to all logs:
Error snippet from the log:
http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/7431/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-006_migrations.py/
<error>
"Fault reason is "Operation Failed". Fault detail is "[Cannot migrate VM. There is no host that satisfies current scheduling constraints. See below for details:, The host lago-basic-suite-master-host0 did not satisfy internal filter CPUOverloaded because its CPU is too loaded.]"
</error>
<engine log>
2017-07-02 16:43:22,829-04 INFO [org.ovirt.engine.core.bll.MigrateVmToServerCommand] (default task-27) [87508047-fdc5-4a2f-9692-c83f7b55bbc2] Lock Acquired to object 'EngineLock:{exclusiveLocks='[2b34910d-cef2-44d6-a274-30e8473eb5d9=VM]', sharedLocks=''}'
2017-07-02 16:43:22,833-04 DEBUG [org.ovirt.engine.core.dal.dbbroker.PostgresDbEngineDialect$PostgresSimpleJdbcCall] (default task-27) [87508047-fdc5-4a2f-9692-c83f7b55bbc2] Compiled stored procedure. Call string is [{call getdiskvmelementspluggedtovm(?)}]
2017-07-02 16:43:22,833-04 DEBUG [org.ovirt.engine.core.dal.dbbroker.PostgresDbEngineDialect$PostgresSimpleJdbcCall] (default task-27) [87508047-fdc5-4a2f-9692-c83f7b55bbc2] SqlCall for procedure [GetDiskVmElementsPluggedToVm] compiled
2017-07-02 16:43:22,843-04 DEBUG [org.ovirt.engine.core.dal.dbbroker.PostgresDbEngineDialect$PostgresSimpleJdbcCall] (default task-27) [87508047-fdc5-4a2f-9692-c83f7b55bbc2] Compiled stored procedure. Call string is [{call getattacheddisksnapshotstovm(?, ?)}]
2017-07-02 16:43:22,843-04 DEBUG [org.ovirt.engine.core.dal.dbbroker.PostgresDbEngineDialect$PostgresSimpleJdbcCall] (default task-27) [87508047-fdc5-4a2f-9692-c83f7b55bbc2] SqlCall for procedure [GetAttachedDiskSnapshotsToVm] compiled
2017-07-02 16:43:22,919-04 INFO [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (default task-27) [87508047-fdc5-4a2f-9692-c83f7b55bbc2] Candidate host 'lago-basic-suite-master-host0' ('46bdc63d-98f5-4eee-81aa-2fb88b8f7cbe') was filtered out by 'VAR__FILTERTYPE__INTERNAL' filter 'CPUOverloaded' (correlation id: null)
2017-07-02 16:43:22,920-04 WARN [org.ovirt.engine.core.bll.MigrateVmToServerCommand] (default task-27) [87508047-fdc5-4a2f-9692-c83f7b55bbc2] Validation of action 'MigrateVmToServer' failed for user admin@internal-authz. Reasons: VAR__ACTION__MIGRATE,VAR__TYPE__VM,SCHEDULING_ALL_HOSTS_FILTERED_OUT,VAR__FILTERTYPE__INTERNAL,$hostName lago-basic-suite-master-host0,$filterName CPUOverloaded,VAR__DETAIL__CPU_OVERLOADED,SCHEDULING_HOST_FILTERED_REASON_WITH_DETAIL
This has nothing to do with migration. CPUOverloaded is a scheduling filter; unless there was a change in that area, the obvious explanation is that the host has a CPU overload condition. I briefly looked at the logs and see "cpuUser": "83.40", "cpuSys": "16.59", "cpuIdle": "0.08", which indeed suggests an overload. From the same sample I can see it's vdsm ("cpuUserVdsmd": "77.38", "cpuSysVdsmd": "18.44").
Since similar values have been reported consistently for some time, there is a setupNetworks and a storage rescan prior to the failure, and there is no other indication of anything wrong, I'd just say the environment, the order of tests, or the timing has changed, but nothing is wrong with the oVirt code. Did any of that change recently? Does it reproduce locally?
AFAIK, no significant environment changes or tests were done. We will try to reproduce it locally and also on the manual job, but from the looks of it, it is very consistent (unlike other race failures we've seen lately) and continues to fail on the same tests, so it's either a change in oVirt or something else that we're not thinking of.
Thanks, michal
There is only a single patch between vdsms [1] and [3]: https://gerrit.ovirt.org/#/c/78536
2017-07-02 16:43:22,920-04 INFO [org.ovirt.engine.core.bll.MigrateVmToServerCommand] (default task-27) [87508047-fdc5-4a2f-9692-c83f7b55bbc2] Lock freed to object 'EngineLock:{exclusiveLocks='[2b34910d-cef2-44d6-a274-30e8473eb5d9=VM]', sharedLocks=''}'
2017-07-02 16:43:22,929-04 DEBUG [org.ovirt.engine.core.utils.timer.FixedDelayJobListener] (DefaultQuartzScheduler7) [] Rescheduling DEFAULT.org.ovirt.engine.core.bll.ColdRebootAutoStartVmsRunner.startFailedAutoStartVms#-9223372036854775733 as there is no unfired trigger.
2017-07-02 16:43:22,932-04 ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default task-27) [] Operation Failed: [Cannot migrate VM. There is no host that satisfies current scheduling constraints. See below for details:, The host lago-basic-suite-master-host0 did not satisfy internal filter CPUOverloaded because its CPU is too loaded.]
2017-07-02 16:43:23,331-04 DEBUG [org.ovirt.engine.core.utils.timer.FixedDelayJobListener] (DefaultQuartzScheduler2) [] Rescheduling DEFAULT.org.ovirt.engine.core.bll.HaAutoStartVmsRunner.startFailedAutoStartVms#-9223372036854775793 as there is no unfired trigger.
2017-07-02 16:43:23,332-04 DEBUG [org.ovirt.engine.core.utils.timer.FixedDelayJobListener] (DefaultQuartzScheduler2) [] Rescheduling DEFAULT.org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.invokeCallbackMethods#-9223372036854775783 as there is no unfired trigger.
<engine log>
Best Regards,
Shlomi Ben-David | Software Engineer | Red Hat ISRAEL RHCSA | RHCVA | RHCE IRC: shlomibendavid (on #rhev-integ, #rhev-dev, #rhev-ci)
OPEN SOURCE - 1 4 011 && 011 4 1
--
Eyal edri
ASSOCIATE MANAGER
RHV DevOps
EMEA VIRTUALIZATION R&D
Red Hat EMEA
TRIED. TESTED. TRUSTED. phone: +972-9-7692018 irc: eedri (on #tlv #rhev-dev #rhev-integ)
-- EYAL EDRI ASSOCIATE MANAGER RHV DEVOPS EMEA VIRTUALIZATION R&D Red Hat EMEA TRIED. TESTED. TRUSTED. phone: +972-9-7692018 irc: eedri (on #tlv #rhev-dev #rhev-integ)
--Apple-Mail=_E7900078-8414-4EE0-946C-00E929769756 Content-Transfer-Encoding: quoted-printable Content-Type: text/html; charset=utf-8 <html><head><meta http-equiv=3D"Content-Type" content=3D"text/html = charset=3Dutf-8"></head><body style=3D"word-wrap: break-word; = -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" = class=3D""><br class=3D""><div><blockquote type=3D"cite" class=3D""><div = class=3D"">On 4 Jul 2017, at 13:00, Eyal Edri <<a = href=3D"mailto:eedri@redhat.com" class=3D"">eedri@redhat.com</a>> = wrote:</div><br class=3D"Apple-interchange-newline"><div class=3D""><div = dir=3D"ltr" class=3D""><div class=3D"">I was able to reproduce the error = [1] on a manual run with only new vdsm from [2],</div><div class=3D"">and = also to verify that w/o this change, while using latest tested run [3] = it works.</div><div class=3D""><br class=3D""></div><div class=3D"">So I = think this proves quite clearly the problem is one of the latest VDSM = patches.</div></div></div></blockquote><div><br class=3D""></div>There = is only a single patch between vdsms [1] and [3]</div><div><a = href=3D"https://gerrit.ovirt.org/#/c/78536" = class=3D"">https://gerrit.ovirt.org/#/c/78536</a></div><div><br = class=3D""><blockquote type=3D"cite" class=3D""><div class=3D""><div = dir=3D"ltr" class=3D""><div class=3D""><br class=3D""></div><div = class=3D"">I'm running again the test with the suspected bad VDSM and = hopefully will be able to extract the env to tar.gz file</div><div = class=3D"">which anyone can import using the lago demo tool.</div><div = class=3D""><br class=3D""></div><div class=3D""><br class=3D""></div><div = class=3D""><br class=3D""></div><div class=3D"">[1] <a = href=3D"http://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-sys= tem-tests_manual/748/" target=3D"_blank" = class=3D"">http://jenkins.ovirt.org/<wbr = class=3D"">view/oVirt%20system%20tests/<wbr = class=3D"">job/ovirt-system-tests_manual/<wbr = class=3D"">748/</a></div><div 
class=3D"">[2] <a = href=3D"http://jenkins.ovirt.org/job/vdsm_master_build-artifacts-el7-x86_6= 4/2694/" target=3D"_blank" class=3D"">http://jenkins.ovirt.org/<wbr = class=3D"">job/vdsm_master_build-<wbr = class=3D"">artifacts-el7-x86_64/2694/</a></div><div class=3D"">[3] <a= = href=3D"http://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-sys= tem-tests_manual/747/" target=3D"_blank" = class=3D"">http://jenkins.ovirt.org/<wbr = class=3D"">view/oVirt%20system%20tests/<wbr = class=3D"">job/ovirt-system-tests_manual/<wbr = class=3D"">747/</a></div><div class=3D""><br class=3D""></div><br = class=3D""><div class=3D"gmail_extra"><br class=3D""><div = class=3D"gmail_quote">On Tue, Jul 4, 2017 at 1:30 PM, Nadav Goldin <span = dir=3D"ltr" class=3D""><<a href=3D"mailto:ngoldin@redhat.com" = target=3D"_blank" class=3D"">ngoldin@redhat.com</a>></span> wrote:<br = class=3D""><blockquote class=3D"gmail_quote" style=3D"margin:0px 0px 0px = 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi, sorry = for posting late, I had a brief look at this yesterday:<br class=3D""> 1. I couldn't replicate it locally - which means it is most likely a<br = class=3D""> recent change.<br class=3D""> 2. I looked at the libvirt XMLs Lago generatd for the hosts, as a new<br = class=3D""> version is used this week(0.40) - and they seem OK - specifically<br = class=3D""> memroy and vcpus(which was my initial suspect).<br class=3D""> 3. 
I saw two Engine patches, a bit prior to the time it started to<br = class=3D""> fail, which *might* in my common sense be related, but it is out of = my<br class=3D""> scope to tell(CC'ed patch owners):<br class=3D""> <br class=3D""> core: Make VmAnalyzer to treat a migrated Paused VM as success -<br = class=3D""> <a href=3D"https://gerrit.ovirt.org/78305" rel=3D"noreferrer" = target=3D"_blank" class=3D"">https://gerrit.ovirt.org/78305</a><br = class=3D""> <br class=3D""> fix custom fencing default config setting<br class=3D""> <a href=3D"https://gerrit.ovirt.org/78720" rel=3D"noreferrer" = target=3D"_blank" class=3D"">https://gerrit.ovirt.org/78720</a><br = class=3D""> <br class=3D""> Shot in the wild - Could it be that the 'CPUOverload' filter was not<br = class=3D""> active before for some reason?<br class=3D""> <br class=3D""> Also, there are some exceptions in host0 vdsm log[1], failing to get<br = class=3D""> VM stats, though I can't tell if they are specific to this failure.<br = class=3D""> <br class=3D""> Of course this is not a complete analysis, I hope it helps.<br class=3D"">= <br class=3D""> <br class=3D""> [1] <a = href=3D"http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/7= 431/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suit= e-master/post-006_migrations.py/lago-basic-suite-master-host0/_var_log/vds= m/vdsm.log" rel=3D"noreferrer" target=3D"_blank" = class=3D"">http://jenkins.ovirt.org/job/t<wbr = class=3D"">est-repo_ovirt_experimental_ma<wbr = class=3D"">ster/7431/artifact/exported-ar<wbr = class=3D"">tifacts/basic-suit-master-el7/<wbr = class=3D"">test_logs/basic-suite-master/<wbr = class=3D"">post-006_migrations.py/lago-<wbr = class=3D"">basic-suite-master-host0/_var_<wbr = class=3D"">log/vdsm/vdsm.log</a><br class=3D""> <span class=3D"m_2989331196243842324gmail-HOEnZb"><font color=3D"#888888" = class=3D""><br class=3D""> <br class=3D""> Nadav.<br class=3D""> </font></span><div 
class=3D"m_2989331196243842324gmail-HOEnZb"><div = class=3D"m_2989331196243842324gmail-h5"><br class=3D""> <br class=3D""> <br class=3D""> <br class=3D""> <br class=3D""> On Tue, Jul 4, 2017 at 12:46 PM, Eyal Edri <<a = href=3D"mailto:eedri@redhat.com" target=3D"_blank" = class=3D"">eedri@redhat.com</a>> wrote:<br class=3D""> ><br class=3D""> ><br class=3D""> > On Tue, Jul 4, 2017 at 12:18 PM, Michal Skrivanek<br class=3D""> > <<a href=3D"mailto:michal.skrivanek@redhat.com" target=3D"_blank" = class=3D"">michal.skrivanek@redhat.com</a>> wrote:<br class=3D""> >><br class=3D""> >><br class=3D""> >> On 3 Jul 2017, at 15:35, Shlomo Ben David <<a = href=3D"mailto:sbendavi@redhat.com" target=3D"_blank" = class=3D"">sbendavi@redhat.com</a>> wrote:<br class=3D""> >><br class=3D""> >> Hi,<br class=3D""> >><br class=3D""> >> Test failed: [ 006_migrations.migrate_vm ]<br class=3D""> >> Link to suspected patches: N/A<br class=3D""> >> Link to Job:<br class=3D""> >> <a = href=3D"http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/7= 431/" rel=3D"noreferrer" target=3D"_blank" = class=3D"">http://jenkins.ovirt.org/job/t<wbr = class=3D"">est-repo_ovirt_experimental_ma<wbr class=3D"">ster/7431/</a><br= class=3D""> >> Link to all logs:<br class=3D""> >> Error snippet from the log:<br class=3D""> >> <a = href=3D"http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/7= 431/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suit= e-master/post-006_migrations.py/" rel=3D"noreferrer" target=3D"_blank" = class=3D"">http://jenkins.ovirt.org/job/t<wbr = class=3D"">est-repo_ovirt_experimental_ma<wbr = class=3D"">ster/7431/artifact/exported-ar<wbr = class=3D"">tifacts/basic-suit-master-el7/<wbr = class=3D"">test_logs/basic-suite-master/<wbr = class=3D"">post-006_migrations.py/</a><br class=3D""> >><br class=3D""> >> <error><br class=3D""> >><br class=3D""> >> "Fault reason is "Operation Failed". 
Fault detail is = "[Cannot migrate VM.<br class=3D""> >> There is no host that satisfies current scheduling constraints. = See below<br class=3D""> >> for details:, The host lago-basic-suite-master-host0 did not = satisfy<br class=3D""> >> internal filter CPUOverloaded because its CPU is too = loaded.]"<br class=3D""> >><br class=3D""> >> </error><br class=3D""> >><br class=3D""> >> <engine log><br class=3D""> >><br class=3D""> >> 2017-07-02 16:43:22,829-04 INFO<br class=3D""> >> [org.ovirt.engine.core.bll.Mig<wbr = class=3D"">rateVmToServerCommand] (default task-27)<br class=3D""> >> [87508047-fdc5-4a2f-9692-c83f7<wbr class=3D"">b55bbc2] Lock = Acquired to object<br class=3D""> >> 'EngineLock:{exclusiveLocks=3D'[<wbr = class=3D"">2b34910d-cef2-44d6-a274-30e847<wbr class=3D"">3eb5d9=3DVM]',<br= class=3D""> >> sharedLocks=3D''}'<br class=3D""> >> 2017-07-02 16:43:22,833-04 DEBUG<br class=3D""> >> [org.ovirt.engine.core.dal.dbb<wbr = class=3D"">roker.PostgresDbEngineDialect$<wbr = class=3D"">PostgresSimpleJdbcCall]<br class=3D""> >> (default task-27) [87508047-fdc5-4a2f-9692-c83f7<wbr = class=3D"">b55bbc2] Compiled stored<br class=3D""> >> procedure. Call string is [{call = getdiskvmelementspluggedtovm(?<wbr class=3D"">)}]<br class=3D""> >> 2017-07-02 16:43:22,833-04 DEBUG<br class=3D""> >> [org.ovirt.engine.core.dal.dbb<wbr = class=3D"">roker.PostgresDbEngineDialect$<wbr = class=3D"">PostgresSimpleJdbcCall]<br class=3D""> >> (default task-27) [87508047-fdc5-4a2f-9692-c83f7<wbr = class=3D"">b55bbc2] SqlCall for<br class=3D""> >> procedure [GetDiskVmElementsPluggedToVm] compiled<br class=3D""> >> 2017-07-02 16:43:22,843-04 DEBUG<br class=3D""> >> [org.ovirt.engine.core.dal.dbb<wbr = class=3D"">roker.PostgresDbEngineDialect$<wbr = class=3D"">PostgresSimpleJdbcCall]<br class=3D""> >> (default task-27) [87508047-fdc5-4a2f-9692-c83f7<wbr = class=3D"">b55bbc2] Compiled stored<br class=3D""> >> procedure. 
Call string is [{call = getattacheddisksnapshotstovm(?<wbr class=3D"">, ?)}]<br class=3D""> >> 2017-07-02 16:43:22,843-04 DEBUG<br class=3D""> >> [org.ovirt.engine.core.dal.dbb<wbr = class=3D"">roker.PostgresDbEngineDialect$<wbr = class=3D"">PostgresSimpleJdbcCall]<br class=3D""> >> (default task-27) [87508047-fdc5-4a2f-9692-c83f7<wbr = class=3D"">b55bbc2] SqlCall for<br class=3D""> >> procedure [GetAttachedDiskSnapshotsToVm] compiled<br class=3D""> >> 2017-07-02 16:43:22,919-04 INFO<br class=3D""> >> [org.ovirt.engine.core.bll.sch<wbr = class=3D"">eduling.SchedulingManager] (default task-27)<br class=3D""> >> [87508047-fdc5-4a2f-9692-c83f7<wbr class=3D"">b55bbc2] = Candidate host<br class=3D""> >> 'lago-basic-suite-master-host0<wbr class=3D"">' = ('46bdc63d-98f5-4eee-81aa-2fb8<wbr class=3D"">8b8f7cbe') was<br = class=3D""> >> filtered out by 'VAR__FILTERTYPE__INTERNAL' filter = 'CPUOverloaded'<br class=3D""> >> (correlation id: null)<br class=3D""> >> 2017-07-02 16:43:22,920-04 WARN<br class=3D""> >> [org.ovirt.engine.core.bll.Mig<wbr = class=3D"">rateVmToServerCommand] (default task-27)<br class=3D""> >> [87508047-fdc5-4a2f-9692-c83f7<wbr class=3D"">b55bbc2] = Validation of action<br class=3D""> >> 'MigrateVmToServer' failed for user admin@internal-authz. 
= Reasons:<br class=3D""> >> VAR__ACTION__MIGRATE,VAR__TYPE<wbr = class=3D"">__VM,SCHEDULING_ALL_HOSTS_FILT<wbr = class=3D"">ERED_OUT,VAR__FILTERTYPE__INTE<wbr class=3D"">RNAL,$hostName<br= class=3D""> >> lago-basic-suite-master-host0,<wbr class=3D"">$filterName<br = class=3D""> >> CPUOverloaded,VAR__DETAIL__CPU<wbr = class=3D"">_OVERLOADED,SCHEDULING_HOST_<wbr = class=3D"">FILTERED_REASON_WITH_DETAIL<br class=3D""> >><br class=3D""> >><br class=3D""> >><br class=3D""> >> This has nothing to do with migration<br class=3D""> >> The CPUOverload is a scheduling policy, unless there was any = change in<br class=3D""> >> that area the obvious explanation would be that the host has a = CPU overload<br class=3D""> >> condition.<br class=3D""> >> I briefly looked at logs and see ""cpuUser": "83.40", "cpuSys": = "16.59",<br class=3D""> >> "cpuIdle": =E2=80=9C0.08=E2=80=9D=E2=80=9D which indeed = suggests an overload, from the same sample I<br class=3D""> >> can see it=E2=80=99s vdsm ("cpuUserVdsmd": =E2=80=9C77.38=E2=80=9D= , cpuSysVdsmd": =E2=80=9C18.44"<br class=3D""> >><br class=3D""> >> Since similar values are consistently being reported for some = time, and<br class=3D""> >> there is a setupNetworks and storage rescan prior to the the = failure, and<br class=3D""> >> there is no other indication of anything wrong, I=E2=80=99d = just say the environment<br class=3D""> >> or the order of tests or timing has changed, but nothing wrong = with the<br class=3D""> >> oVirt code<br class=3D""> >> Did any of that changed recently? 
Does it reproduce locally?<br = class=3D""> ><br class=3D""> ><br class=3D""> > AFAIK, no significant environment changes or tests were done.<br = class=3D""> > We will try to reproduce it locally and also on the manual = job, but from<br class=3D""> > what it looks it is very consistent (unlike other race failures = we've seen<br class=3D""> > lately ) and continues to fails on the same tests, so its either a = change in<br class=3D""> > oVirt or something else that we're not thinking on.<br class=3D""> ><br class=3D""> >><br class=3D""> >><br class=3D""> >> Thanks,<br class=3D""> >> michal<br class=3D""> >><br class=3D""> >> 2017-07-02 16:43:22,920-04 INFO<br class=3D""> >> [org.ovirt.engine.core.bll.Mig<wbr = class=3D"">rateVmToServerCommand] (default task-27)<br class=3D""> >> [87508047-fdc5-4a2f-9692-c83f7<wbr class=3D"">b55bbc2] Lock = freed to object<br class=3D""> >> 'EngineLock:{exclusiveLocks=3D'[<wbr = class=3D"">2b34910d-cef2-44d6-a274-30e847<wbr class=3D"">3eb5d9=3DVM]',<br= class=3D""> >> sharedLocks=3D''}'<br class=3D""> >> 2017-07-02 16:43:22,929-04 DEBUG<br class=3D""> >> [org.ovirt.engine.core.utils.t<wbr = class=3D"">imer.FixedDelayJobListener]<br class=3D""> >> (DefaultQuartzScheduler7) [] Rescheduling<br class=3D""> >> <a href=3D"http://DEFAULT.org" = class=3D"">DEFAULT.org</a>.ovirt.engine.core.<wbr = class=3D"">bll.ColdRebootAutoStartVmsRunn<wbr = class=3D"">er.startFailedAutoStartVms#-92<wbr = class=3D"">23372036854775733<br class=3D""> >> as there is no unfired trigger.<br class=3D""> >> 2017-07-02 16:43:22,932-04 ERROR<br class=3D""> >> [org.ovirt.engine.api.restapi.<wbr = class=3D"">resource.AbstractBackendResour<wbr class=3D"">ce] (default<br = class=3D""> >> task-27) [] Operation Failed: [Cannot migrate VM. There is no = host that<br class=3D""> >> satisfies current scheduling constraints. 
See below for = details:, The host<br class=3D""> >> lago-basic-suite-master-host0 did not satisfy internal filter = CPUOverloaded<br class=3D""> >> because its CPU is too loaded.]<br class=3D""> >> 2017-07-02 16:43:23,331-04 DEBUG<br class=3D""> >> [org.ovirt.engine.core.utils.t<wbr = class=3D"">imer.FixedDelayJobListener]<br class=3D""> >> (DefaultQuartzScheduler2) [] Rescheduling<br class=3D""> >> <a href=3D"http://DEFAULT.org" = class=3D"">DEFAULT.org</a>.ovirt.engine.core.<wbr = class=3D"">bll.HaAutoStartVmsRunner.start<wbr = class=3D"">FailedAutoStartVms#-9223372036<wbr class=3D"">854775793<br = class=3D""> >> as there is no unfired trigger.<br class=3D""> >> 2017-07-02 16:43:23,332-04 DEBUG<br class=3D""> >> [org.ovirt.engine.core.utils.t<wbr = class=3D"">imer.FixedDelayJobListener]<br class=3D""> >> (DefaultQuartzScheduler2) [] Rescheduling<br class=3D""> >> <a href=3D"http://DEFAULT.org" = class=3D"">DEFAULT.org</a>.ovirt.engine.core.<wbr = class=3D"">bll.tasks.CommandCallbacksPoll<wbr = class=3D"">er.invokeCallbackMethods#-9223<wbr = class=3D"">372036854775783<br class=3D""> >> as there is no unfired trigger.<br class=3D""> >><br class=3D""> >> <engine log><br class=3D""> >><br class=3D""> >><br class=3D""> >><br class=3D""> >> Best Regards,<br class=3D""> >><br class=3D""> >> Shlomi Ben-David | Software Engineer | Red Hat ISRAEL<br = class=3D""> >> RHCSA | RHCVA | RHCE<br class=3D""> >> IRC: shlomibendavid (on #rhev-integ, #rhev-dev, #rhev-ci)<br = class=3D""> >><br class=3D""> >> OPEN SOURCE - 1 4 011 && 011 4 1<br class=3D""> >><br class=3D""> >> ______________________________<wbr = class=3D"">_________________<br class=3D""> >> Devel mailing list<br class=3D""> >> <a href=3D"mailto:Devel@ovirt.org" target=3D"_blank" = class=3D"">Devel@ovirt.org</a><br class=3D""> >> <a href=3D"http://lists.ovirt.org/mailman/listinfo/devel" = rel=3D"noreferrer" target=3D"_blank" = class=3D"">http://lists.ovirt.org/mailman<wbr = class=3D"">/listinfo/devel</a><br 
class=3D""> >><br class=3D""> >><br class=3D""> >><br class=3D""> >> ______________________________<wbr = class=3D"">_________________<br class=3D""> >> Devel mailing list<br class=3D""> >> <a href=3D"mailto:Devel@ovirt.org" target=3D"_blank" = class=3D"">Devel@ovirt.org</a><br class=3D""> >> <a href=3D"http://lists.ovirt.org/mailman/listinfo/devel" = rel=3D"noreferrer" target=3D"_blank" = class=3D"">http://lists.ovirt.org/mailman<wbr = class=3D"">/listinfo/devel</a><br class=3D""> ><br class=3D""> ><br class=3D""> ><br class=3D""> ><br class=3D""> > --<br class=3D""> ><br class=3D""> > Eyal edri<br class=3D""> ><br class=3D""> ><br class=3D""> > ASSOCIATE MANAGER<br class=3D""> ><br class=3D""> > RHV DevOps<br class=3D""> ><br class=3D""> > EMEA VIRTUALIZATION R&D<br class=3D""> ><br class=3D""> ><br class=3D""> > Red Hat EMEA<br class=3D""> ><br class=3D""> </div></div><span class=3D"m_2989331196243842324gmail-im = m_2989331196243842324gmail-HOEnZb">> TRIED. TESTED. TRUSTED.<br = class=3D""> > phone: <a href=3D"tel:%2B972-9-7692018" value=3D"+97297692018" = target=3D"_blank" class=3D"">+972-9-7692018</a><br class=3D""> > irc: eedri (on #tlv #rhev-dev #rhev-integ)<br class=3D""> ><br class=3D""> </span><div class=3D"m_2989331196243842324gmail-HOEnZb"><div = class=3D"m_2989331196243842324gmail-h5">> = ______________________________<wbr class=3D"">_________________<br = class=3D""> > Devel mailing list<br class=3D""> > <a href=3D"mailto:Devel@ovirt.org" target=3D"_blank" = class=3D"">Devel@ovirt.org</a><br class=3D""> > <a href=3D"http://lists.ovirt.org/mailman/listinfo/devel" = rel=3D"noreferrer" target=3D"_blank" = class=3D"">http://lists.ovirt.org/mailman<wbr = class=3D"">/listinfo/devel</a><br class=3D""> </div></div></blockquote></div><br class=3D""><br clear=3D"all" = class=3D""><div class=3D""><br class=3D""></div>-- <br class=3D""><div = class=3D"m_2989331196243842324gmail_signature"><div dir=3D"ltr" = class=3D""><div class=3D""><div dir=3D"ltr" 
class=3D""><div = class=3D""><div dir=3D"ltr" class=3D""><div class=3D""><div dir=3D"ltr" = class=3D""><div class=3D""><div style=3D"font-family: overpass, = sans-serif; margin: 0px; padding: 0px; font-size: 14px; text-transform: = uppercase; font-weight: bold;" class=3D""><font color=3D"#cc0000" = class=3D"">Eyal edri</font></div><div style=3D"font-family: overpass, = sans-serif; font-weight: bold; margin: 0px; padding: 0px; font-size: = 14px; text-transform: uppercase;" class=3D""><br class=3D""></div><p = style=3D"font-family: overpass, sans-serif; font-size: 10px; margin: 0px = 0px 4px; text-transform: uppercase;" class=3D"">ASSOCIATE MANAGER</p><p = style=3D"font-family: overpass, sans-serif; font-size: 10px; margin: 0px = 0px 4px; text-transform: uppercase;" class=3D"">RHV DevOps</p><p = style=3D"font-family: overpass, sans-serif; font-size: 10px; margin: 0px = 0px 4px; text-transform: uppercase;" class=3D"">EMEA VIRTUALIZATION = R&D</p><p style=3D"font-family: overpass, sans-serif; font-size: = 10px; margin: 0px 0px 4px; text-transform: uppercase;" class=3D""><br = class=3D""></p><div style=3D"font-family: overpass, sans-serif; margin: = 0px; font-size: 10px; color: rgb(153, 153, 153);" class=3D""><a = href=3D"https://www.redhat.com/" style=3D"color:rgb(0,136,206);margin:0px"= target=3D"_blank" class=3D"">Red Hat EMEA</a></div><table = border=3D"0" style=3D"font-family: overpass, sans-serif; font-size: = inherit;" class=3D""><tbody class=3D""><tr class=3D""><td width=3D"100px" = class=3D""><a href=3D"https://red.ht/sig" style=3D"color:rgb(17,85,204)" = target=3D"_blank" class=3D""><img = src=3D"https://www.redhat.com/profiles/rh/themes/redhatdotcom/img/logo-red= -hat-black.png" width=3D"90" height=3D"auto" class=3D""></a></td><td = style=3D"font-size:10px" class=3D""><a href=3D"https://redhat.com/trusted"= style=3D"color:rgb(204,0,0);font-weight:bold" target=3D"_blank" = class=3D"">TRIED. TESTED. 
= TRUSTED.</a></td></tr></tbody></table></div><div class=3D"">phone: <a = href=3D"tel:+972%209-769-2018" value=3D"+97297692018" target=3D"_blank" = class=3D"">+972-9-7692018</a><br class=3D"">irc: eedri (on #tlv = #rhev-dev = #rhev-integ)</div></div></div></div></div></div></div></div></div> </div></div> </div></blockquote></div><br class=3D""></body></html>= --Apple-Mail=_E7900078-8414-4EE0-946C-00E929769756--

https://gerrit.ovirt.org/#/c/78536 broke network functional tests but a fix was merged today: https://gerrit.ovirt.org/#/c/78925/
I tried to run OST with my fix yesterday and still encountered the same failures.

On Tue, Jul 4, 2017 at 2:25 PM, Michal Skrivanek <michal.skrivanek@redhat.com> wrote:
On 4 Jul 2017, at 13:00, Eyal Edri <eedri@redhat.com> wrote:
I was able to reproduce the error [1] on a manual run with only the new vdsm from [2], and also verified that without this change, using the latest tested run [3], it works.
So I think this proves quite clearly that the problem is one of the latest VDSM patches.
There is only a single patch between the vdsms in [1] and [3]: https://gerrit.ovirt.org/#/c/78536
I'm running the test again with the suspected bad VDSM and will hopefully be able to extract the env to a tar.gz file which anyone can import using the lago demo tool.
[1] http://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_manual/748/
[2] http://jenkins.ovirt.org/job/vdsm_master_build-artifacts-el7-x86_64/2694/
[3] http://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_manual/747/
On Tue, Jul 4, 2017 at 1:30 PM, Nadav Goldin <ngoldin@redhat.com> wrote:
Hi, sorry for posting late, I had a brief look at this yesterday:
1. I couldn't replicate it locally - which means it is most likely a recent change.
2. I looked at the libvirt XMLs Lago generated for the hosts, as a new version is used this week (0.40) - and they seem OK - specifically memory and vcpus (which was my initial suspect).
3. I saw two Engine patches, a bit prior to the time it started to fail, which *might* in my common sense be related, but it is out of my scope to tell (CC'ed patch owners):
core: Make VmAnalyzer to treat a migrated Paused VM as success - https://gerrit.ovirt.org/78305
fix custom fencing default config setting https://gerrit.ovirt.org/78720
Wild guess - could it be that the 'CPUOverloaded' filter was not active before for some reason?
Also, there are some exceptions in host0 vdsm log[1], failing to get VM stats, though I can't tell if they are specific to this failure.
Of course this is not a complete analysis, I hope it helps.
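The memory and vCPU check from point 2 could be sketched roughly as follows. This is only an illustration, assuming a libvirt domain XML with top-level `<memory>` and `<vcpu>` elements; the sample XML, its values and the helper name `summarize_domain` are made up for the example and are not taken from the Lago-generated files.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the real XMLs generated by Lago were not posted in this thread.
SAMPLE_DOMAIN_XML = """
<domain type='kvm'>
  <name>example-host0</name>
  <memory unit='KiB'>2097152</memory>
  <vcpu>2</vcpu>
</domain>
"""


def summarize_domain(xml_text):
    """Return (name, memory in MiB, vcpu count) from a libvirt domain XML string."""
    root = ET.fromstring(xml_text.strip())
    memory = root.find("memory")
    # libvirt treats the value as KiB when no unit attribute is given.
    unit = memory.get("unit", "KiB")
    if unit != "KiB":
        raise ValueError("this sketch only handles KiB, got %s" % unit)
    memory_mib = int(memory.text) // 1024
    vcpus = int(root.find("vcpu").text)
    return root.findtext("name"), memory_mib, vcpus


print(summarize_domain(SAMPLE_DOMAIN_XML))
```

Running the same helper over the XMLs from a passing and a failing run would show quickly whether the host sizing differs between them.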
[1] http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/7431/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-006_migrations.py/lago-basic-suite-master-host0/_var_log/vdsm/vdsm.log
Nadav.
On Tue, Jul 4, 2017 at 12:46 PM, Eyal Edri <eedri@redhat.com> wrote:
On Tue, Jul 4, 2017 at 12:18 PM, Michal Skrivanek <michal.skrivanek@redhat.com> wrote:
On 3 Jul 2017, at 15:35, Shlomo Ben David <sbendavi@redhat.com> wrote:
Hi,
Test failed: [ 006_migrations.migrate_vm ]
Link to suspected patches: N/A
Link to Job: http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/7431/
Link to all logs:
Error snippet from the log:
http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/7431/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-006_migrations.py/
<error>
"Fault reason is "Operation Failed". Fault detail is "[Cannot migrate VM. There is no host that satisfies current scheduling constraints. See below for details:, The host lago-basic-suite-master-host0 did not satisfy internal filter CPUOverloaded because its CPU is too loaded.]"
</error>
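When many job logs need to be scanned for the same failure, the host and filter names can be pulled out of the fault detail mechanically. A minimal sketch, assuming the message keeps the wording quoted above; the pattern and the helper name `find_filtered_hosts` are illustrative and not part of any oVirt tooling.

```python
import re

# Pattern derived from the error text quoted above; treat it as an assumption
# about the message wording, not as a stable engine interface.
FILTER_RE = re.compile(
    r"The host (?P<host>\S+) did not satisfy internal filter "
    r"(?P<filter>\w+) because (?P<reason>[^.\]]+)"
)


def find_filtered_hosts(text):
    """Return a list of (host, filter, reason) tuples found in the text."""
    return [
        (m.group("host"), m.group("filter"), m.group("reason").strip())
        for m in FILTER_RE.finditer(text)
    ]


fault = (
    "[Cannot migrate VM. There is no host that satisfies current scheduling "
    "constraints. See below for details:, The host "
    "lago-basic-suite-master-host0 did not satisfy internal filter "
    "CPUOverloaded because its CPU is too loaded.]"
)

for host, filter_name, reason in find_filtered_hosts(fault):
    print(host, filter_name, reason)
```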
<engine log>
2017-07-02 16:43:22,829-04 INFO [org.ovirt.engine.core.bll.MigrateVmToServerCommand] (default task-27) [87508047-fdc5-4a2f-9692-c83f7b55bbc2] Lock Acquired to object 'EngineLock:{exclusiveLocks='[2b34910d-cef2-44d6-a274-30e8473eb5d9=VM]', sharedLocks=''}'
2017-07-02 16:43:22,833-04 DEBUG [org.ovirt.engine.core.dal.dbbroker.PostgresDbEngineDialect$PostgresSimpleJdbcCall] (default task-27) [87508047-fdc5-4a2f-9692-c83f7b55bbc2] Compiled stored procedure. Call string is [{call getdiskvmelementspluggedtovm(?)}]
2017-07-02 16:43:22,833-04 DEBUG [org.ovirt.engine.core.dal.dbbroker.PostgresDbEngineDialect$PostgresSimpleJdbcCall] (default task-27) [87508047-fdc5-4a2f-9692-c83f7b55bbc2] SqlCall for procedure [GetDiskVmElementsPluggedToVm] compiled
2017-07-02 16:43:22,843-04 DEBUG [org.ovirt.engine.core.dal.dbbroker.PostgresDbEngineDialect$PostgresSimpleJdbcCall] (default task-27) [87508047-fdc5-4a2f-9692-c83f7b55bbc2] Compiled stored procedure. Call string is [{call getattacheddisksnapshotstovm(?, ?)}]
2017-07-02 16:43:22,843-04 DEBUG [org.ovirt.engine.core.dal.dbbroker.PostgresDbEngineDialect$PostgresSimpleJdbcCall] (default task-27) [87508047-fdc5-4a2f-9692-c83f7b55bbc2] SqlCall for procedure [GetAttachedDiskSnapshotsToVm] compiled
2017-07-02 16:43:22,919-04 INFO [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (default task-27) [87508047-fdc5-4a2f-9692-c83f7b55bbc2] Candidate host 'lago-basic-suite-master-host0' ('46bdc63d-98f5-4eee-81aa-2fb88b8f7cbe') was filtered out by 'VAR__FILTERTYPE__INTERNAL' filter 'CPUOverloaded' (correlation id: null)
2017-07-02 16:43:22,920-04 WARN [org.ovirt.engine.core.bll.MigrateVmToServerCommand] (default task-27) [87508047-fdc5-4a2f-9692-c83f7b55bbc2] Validation of action 'MigrateVmToServer' failed for user admin@internal-authz. Reasons: VAR__ACTION__MIGRATE,VAR__TYPE__VM,SCHEDULING_ALL_HOSTS_FILTERED_OUT,VAR__FILTERTYPE__INTERNAL,$hostName lago-basic-suite-master-host0,$filterName CPUOverloaded,VAR__DETAIL__CPU_OVERLOADED,SCHEDULING_HOST_FILTERED_REASON_WITH_DETAIL
This has nothing to do with migration.
CPUOverloaded is a scheduling policy filter; unless there was any change in that area, the obvious explanation would be that the host has a CPU overload condition.
I briefly looked at the logs and see "cpuUser": "83.40", "cpuSys": "16.59", "cpuIdle": "0.08", which indeed suggests an overload. From the same sample I can see it's vdsm ("cpuUserVdsmd": "77.38", "cpuSysVdsmd": "18.44").
Since similar values have been reported consistently for some time, there is a setupNetworks and a storage rescan prior to the failure, and there is no other indication of anything wrong, I'd just say the environment, the order of tests or the timing has changed, but there is nothing wrong with the oVirt code.
Did any of that change recently? Does it reproduce locally?
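To make the arithmetic behind that reading explicit, here is a small sketch using the values from the quoted sample. It assumes all fields are percentages on the same scale, and the 80% threshold is an assumption for illustration only; the value the engine actually applies depends on the cluster scheduling policy and is not stated in this thread.

```python
# Values copied from the host stats sample quoted above.
sample = {
    "cpuUser": "83.40",
    "cpuSys": "16.59",
    "cpuIdle": "0.08",
    "cpuUserVdsmd": "77.38",
    "cpuSysVdsmd": "18.44",
}

# Assumed threshold, for illustration only (not taken from the engine config).
ASSUMED_THRESHOLD_PERCENT = 80.0


def cpu_summary(stats, threshold=ASSUMED_THRESHOLD_PERCENT):
    """Sum user and system time for the host and for vdsmd."""
    host_busy = float(stats["cpuUser"]) + float(stats["cpuSys"])
    vdsm_busy = float(stats["cpuUserVdsmd"]) + float(stats["cpuSysVdsmd"])
    return {
        "host_busy": round(host_busy, 2),
        "vdsm_busy": round(vdsm_busy, 2),
        "over_threshold": host_busy > threshold,
    }


print(cpu_summary(sample))
```

With these numbers the host is busy about 99.99% of the time and vdsmd alone accounts for about 95.82%, which matches the observation that vdsm is the process consuming the CPU.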
AFAIK, no significant changes to the environment or the tests were made. We will try to reproduce it locally and also on the manual job, but from what it looks like it is very consistent (unlike other race failures we've seen lately) and continues to fail on the same tests, so it's either a change in oVirt or something else that we're not thinking of.
Thanks, michal
2017-07-02 16:43:22,920-04 INFO [org.ovirt.engine.core.bll.MigrateVmToServerCommand] (default task-27) [87508047-fdc5-4a2f-9692-c83f7b55bbc2] Lock freed to object 'EngineLock:{exclusiveLocks='[2b34910d-cef2-44d6-a274-30e8473eb5d9=VM]', sharedLocks=''}'
2017-07-02 16:43:22,929-04 DEBUG [org.ovirt.engine.core.utils.timer.FixedDelayJobListener] (DefaultQuartzScheduler7) [] Rescheduling DEFAULT.org.ovirt.engine.core.bll.ColdRebootAutoStartVmsRunner.startFailedAutoStartVms#-9223372036854775733 as there is no unfired trigger.
2017-07-02 16:43:22,932-04 ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default task-27) [] Operation Failed: [Cannot migrate VM. There is no host that satisfies current scheduling constraints. See below for details:, The host lago-basic-suite-master-host0 did not satisfy internal filter CPUOverloaded because its CPU is too loaded.]
2017-07-02 16:43:23,331-04 DEBUG [org.ovirt.engine.core.utils.timer.FixedDelayJobListener] (DefaultQuartzScheduler2) [] Rescheduling DEFAULT.org.ovirt.engine.core.bll.HaAutoStartVmsRunner.startFailedAutoStartVms#-9223372036854775793 as there is no unfired trigger.
2017-07-02 16:43:23,332-04 DEBUG [org.ovirt.engine.core.utils.timer.FixedDelayJobListener] (DefaultQuartzScheduler2) [] Rescheduling DEFAULT.org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.invokeCallbackMethods#-9223372036854775783 as there is no unfired trigger.
<engine log>
Best Regards,
Shlomi Ben-David | Software Engineer | Red Hat ISRAEL RHCSA | RHCVA | RHCE IRC: shlomibendavid (on #rhev-integ, #rhev-dev, #rhev-ci)
OPEN SOURCE - 1 4 011 && 011 4 1
_______________________________________________ Devel mailing list Devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/devel
--
Eyal edri
ASSOCIATE MANAGER
RHV DevOps
EMEA VIRTUALIZATION R&D
Red Hat EMEA
TRIED. TESTED. TRUSTED. phone: +972-9-7692018 irc: eedri (on #tlv #rhev-dev #rhev-integ)
-- IRIT GOIHMAN SOFTWARE ENGINEER EMEA VIRTUALIZATION R&D Red Hat EMEA <https://www.redhat.com/> <https://red.ht/sig> TRIED. TESTED. TRUSTED. <https://redhat.com/trusted> @redhatnews <https://twitter.com/redhatnews> Red Hat <https://www.linkedin.com/company/red-hat> Red Hat <https://www.facebook.com/RedHatInc>

On 4 July 2017 at 14:32, Irit Goihman <igoihman@redhat.com> wrote:
https://gerrit.ovirt.org/#/c/78536 broke network functional tests but a fix was merged today: https://gerrit.ovirt.org/#/c/78925/
I tried to run OST with my fix yesterday and still encountered the same failures.
Here is a reproducer of the failure with the fix patch: http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/1061/
So that was not it probably...

--
Barak Korren
RHV DevOps team , RHCE, RHCi
Red Hat EMEA
redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted

This issue is reproduced locally as well.

You can run the following to reproduce it locally:

./run_suite.sh -s http://jenkins.ovirt.org/job/vdsm_master_build-artifacts-el7-x86_64/2694/ basic-suite-master

The environment will still be running afterwards, which allows you to inspect it live.
If you have any issues please ping me and I will help any way I can.

Thanks,
Dafna

On 07/04/2017 01:35 PM, Barak Korren wrote:
Here is a reproducer of the failure with the fix patch: http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/1061/
So that was not it probably...
--
Barak Korren
RHV DevOps team , RHCE, RHCi
Red Hat EMEA
redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted

On Tue, Jul 4, 2017 at 4:29 PM, Dafna Ron <dron@redhat.com> wrote:
This issue is reproduced locally as well.
you can run the following to reproduce locally
./run_suite.sh -s http://jenkins.ovirt.org/job/vdsm_master_build-artifacts-el7-x86_64/2694/ basic-suite-master
you will have the environment still running which would allow to view the live environment. if you have any issues please ping me and I will help any way I can.
Thanks, Dafna
Here is the list of changes done from the vdsm that is verified (in tested now) to HEAD:

* 74b2276 - (HEAD -> master, origin/master, origin/HEAD) stomp: add integration tests for client reconnect (6 hours ago) Irit Goihman <igoihman@redhat.com>
* 2a2f6cd - stomp: set default heartbeat values and add grace period (6 hours ago) Irit Goihman <igoihman@redhat.com>
* 56c306a - tests: Make random uuid test repeatable (17 hours ago) Nir Soffer <nsoffer@redhat.com>
* 864d4e3 - python3: Fix UUID packing/unpacking on python 3 (17 hours ago) Nir Soffer <nsoffer@redhat.com>
* 4ac4221 - python3: Improve uuid packing tests (17 hours ago) Nir Soffer <nsoffer@redhat.com>
* d264c8d - python3: Run misc_test in python 3 (17 hours ago) Nir Soffer <nsoffer@redhat.com>
* f923b0b - storage: Added disk type change logging (18 hours ago) Denis Chaplygin <dchaplyg@redhat.com>
* f1d54a1 - net: Unneeded newline is added when updating only the mtu (25 hours ago) Edward Haas <edwardh@redhat.com>
* 9056d61 - virt: metadata: remove dead code (26 hours ago) Francesco Romani <fromani@redhat.com>
* 08982b4 - virt: network: use core.find_device_guest_address (31 hours ago) Francesco Romani <fromani@redhat.com>
* 62e2bc5 - python3: Run qcow2_test on python 3 (2 days ago) Nir Soffer <nsoffer@redhat.com>
* 42f5efb - stomp: implement client reconnect (2 days ago) Irit Goihman <igoihman@redhat.com>
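With a known good build and a list of candidate commits, one way to narrow this down is a bisection over the list, which needs at most four suite runs for twelve commits instead of one per commit. The sketch below only illustrates the search; `fake_is_bad` and its default culprit are hypothetical stand-ins (in practice the check would build vdsm at the given commit and run the suite), and the culprit it uses is arbitrary, not a finding.

```python
# Commit ids from the list above, ordered oldest to newest.
COMMITS = [
    "42f5efb", "62e2bc5", "08982b4", "9056d61", "f1d54a1", "f923b0b",
    "d264c8d", "4ac4221", "864d4e3", "56c306a", "2a2f6cd", "74b2276",
]


def first_bad_commit(commits, is_bad):
    """Binary search for the first commit for which is_bad() returns True.

    Assumes the newest commit is bad, the state before the oldest commit is
    good, and every commit after the first bad one is bad as well.
    """
    lo, hi = 0, len(commits) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid + 1
    return commits[lo]


def fake_is_bad(commit, culprit="f1d54a1"):
    # Hypothetical check used only to exercise the search. The default
    # culprit is arbitrary and does NOT claim that this commit is at fault.
    return COMMITS.index(commit) >= COMMITS.index(culprit)


print(first_bad_commit(COMMITS, fake_is_bad))
```

`git bisect` automates the same search directly on the repository when a script can decide good versus bad.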

I've checked vdsm logs and couldn't find anything related to my change.
I'll run OST without my changes and see if it runs successfully.

On Tue, Jul 4, 2017 at 4:49 PM, Eyal Edri <eedri@redhat.com> wrote:
Here is the list of changes done from the vdsm that is verified (in tested now) to HEAD:

* 74b2276 - (HEAD -> master, origin/master, origin/HEAD) stomp: add integration tests for client reconnect (6 hours ago) Irit Goihman <igoihman@redhat.com>
* 2a2f6cd - stomp: set default heartbeat values and add grace period (6 hours ago) Irit Goihman <igoihman@redhat.com>
* 56c306a - tests: Make random uuid test repeatable (17 hours ago) Nir Soffer <nsoffer@redhat.com>
* 864d4e3 - python3: Fix UUID packing/unpacking on python 3 (17 hours ago) Nir Soffer <nsoffer@redhat.com>
* 4ac4221 - python3: Improve uuid packing tests (17 hours ago) Nir Soffer <nsoffer@redhat.com>
* d264c8d - python3: Run misc_test in python 3 (17 hours ago) Nir Soffer <nsoffer@redhat.com>
* f923b0b - storage: Added disk type change logging (18 hours ago) Denis Chaplygin <dchaplyg@redhat.com>
* f1d54a1 - net: Unneeded newline is added when updating only the mtu (25 hours ago) Edward Haas <edwardh@redhat.com>
* 9056d61 - virt: metadata: remove dead code (26 hours ago) Francesco Romani <fromani@redhat.com>
* 08982b4 - virt: network: use core.find_device_guest_address (31 hours ago) Francesco Romani <fromani@redhat.com>
* 62e2bc5 - python3: Run qcow2_test on python 3 (2 days ago) Nir Soffer <nsoffer@redhat.com>
* 42f5efb - stomp: implement client reconnect (2 days ago) Irit Goihman <igoihman@redhat.com>
-- IRIT GOIHMAN SOFTWARE ENGINEER EMEA VIRTUALIZATION R&D Red Hat EMEA <https://www.redhat.com/> <https://red.ht/sig> TRIED. TESTED. TRUSTED. <https://redhat.com/trusted> @redhatnews <https://twitter.com/redhatnews> Red Hat <https://www.linkedin.com/company/red-hat> Red Hat <https://www.facebook.com/RedHatInc>

Looking at the last experimental job, the reason for the failure is:

2017-07-04 09:39:10,491-04 ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default task-18) [] Operation Failed: [Cannot run VM. There is no host that satisfies current scheduling constraints. See below for details:, The host lago-basic-suite-master-host0 did not satisfy internal filter CPUOverloaded because its CPU is too loaded.]

Do we think that vdsm increased its cpu consumption recently?

On Tue, Jul 4, 2017 at 3:54 PM, Irit Goihman <igoihman@redhat.com> wrote:
I've checked vdsm logs and couldn't find anything related to my change. I'll run OST without my changes and see if it runs successfully.

Guys, I think we proved which vdsm works (git sha1: 28558d7) and what the changelog is from that point until it fails, so you have the list of changes and the steps to reproduce locally.
Again, this is reproducible both on CI and locally, so please go over the changes or reproduce the problem locally and look at the issue on a live system.

On Tue, Jul 4, 2017 at 5:07 PM, Piotr Kliczewski <piotr.kliczewski@gmail.com> wrote:
Looking at the last experimental job the reason of the failure is:
2017-07-04 09:39:10,491-04 ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default task-18) [] Operation Failed: [Cannot run VM. There is no host that satisfies current scheduling constraints. See below for details:, The host lago-basic-suite-master-host0 did not satisfy internal filter CPUOverloaded because its CPU is too loaded.]
Do we think that vdsm increased its cpu consumption recently?
On Tue, Jul 4, 2017 at 3:54 PM, Irit Goihman <igoihman@redhat.com> wrote:
I've checked vdsm logs and couldn't find anything related to my change. I'll run OST without my changes and see if it runs successfully.
On Tue, Jul 4, 2017 at 4:49 PM, Eyal Edri <eedri@redhat.com> wrote:
On Tue, Jul 4, 2017 at 4:29 PM, Dafna Ron <dron@redhat.com> wrote:
This issue is reproduced locally as well.
You can run the following to reproduce it locally:
./run_suite.sh -s http://jenkins.ovirt.org/job/vdsm_master_build-artifacts-el7-x86_64/2694/ basic-suite-master
The environment will still be running afterwards, which lets you inspect the live system. If you have any issues, please ping me and I will help any way I can.
Thanks, Dafna
Here is the list of changes done from the vdsm that is verified (in tested now) to HEAD:
* 74b2276 - (HEAD -> master, origin/master, origin/HEAD) stomp: add integration tests for client reconnect (6 hours ago) Irit Goihman <igoihman@redhat.com>
* 2a2f6cd - stomp: set default heartbeat values and add grace period (6 hours ago) Irit Goihman <igoihman@redhat.com>
* 56c306a - tests: Make random uuid test repeatable (17 hours ago) Nir Soffer <nsoffer@redhat.com>
* 864d4e3 - python3: Fix UUID packing/unpacking on python 3 (17 hours ago) Nir Soffer <nsoffer@redhat.com>
* 4ac4221 - python3: Improve uuid packing tests (17 hours ago) Nir Soffer <nsoffer@redhat.com>
* d264c8d - python3: Run misc_test in python 3 (17 hours ago) Nir Soffer <nsoffer@redhat.com>
* f923b0b - storage: Added disk type change logging (18 hours ago) Denis Chaplygin <dchaplyg@redhat.com>
* f1d54a1 - net: Unneeded newline is added when updating only the mtu (25 hours ago) Edward Haas <edwardh@redhat.com>
* 9056d61 - virt: metadata: remove dead code (26 hours ago) Francesco Romani <fromani@redhat.com>
* 08982b4 - virt: network: use core.find_device_guest_address (31 hours ago) Francesco Romani <fromani@redhat.com>
* 62e2bc5 - python3: Run qcow2_test on python 3 (2 days ago) Nir Soffer <nsoffer@redhat.com>
* 42f5efb - stomp: implement client reconnect (2 days ago) Irit Goihman <igoihman@redhat.com>
On 07/04/2017 01:35 PM, Barak Korren wrote:
On 4 July 2017 at 14:32, Irit Goihman <igoihman@redhat.com> wrote:
https://gerrit.ovirt.org/#/c/78536 broke network functional tests but a fix was merged today: https://gerrit.ovirt.org/#/c/78925/
I tried to run OST with my fix yesterday and still encountered the same failures.
Here is a reproducer of the failure with the fix patch: http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/1061/
So that was not it probably...
-- Eyal edri ASSOCIATE MANAGER RHV DevOps EMEA VIRTUALIZATION R&D Red Hat EMEA <https://www.redhat.com/> <https://red.ht/sig> TRIED. TESTED. TRUSTED. <https://redhat.com/trusted> phone: +972-9-7692018 irc: eedri (on #tlv #rhev-dev #rhev-integ)

https://gerrit.ovirt.org/#/c/78536 was indeed the offending patch, the change was reverted and OST should pass now.
-- IRIT GOIHMAN SOFTWARE ENGINEER EMEA VIRTUALIZATION R&D Red Hat EMEA <https://www.redhat.com/> <https://red.ht/sig> TRIED. TESTED. TRUSTED. <https://redhat.com/trusted> @redhatnews <https://twitter.com/redhatnews> Red Hat <https://www.linkedin.com/company/red-hat> Red Hat <https://www.facebook.com/RedHatInc>

On Wed, Jul 5, 2017 at 9:39 AM, Irit Goihman <igoihman@redhat.com> wrote:
https://gerrit.ovirt.org/#/c/78536 was indeed the offending patch, the change was reverted and OST should pass now.
- Do we know why?
- O-S-T seems to be a great tool for finding JSON-RPC/STOMP issues. I suggest running it on every change related to these.
Y.

On Wed, Jul 5, 2017 at 10:02 AM, Yaniv Kaul <ykaul@redhat.com> wrote:
> - Do we know why?
> - O-S-T seems to be a great tool for finding JSON-RPC/STOMP issues. I suggest running it on every change related to these.
In addition, if we have collectd installed now, can't we add a test that checks whether CPU/memory consumption spikes above normal, and fails before the suite reaches actions like VM run/migration?
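A rough sketch of what such a pre-check could look like, assuming CPU samples are already available as percentages (how they would be pulled from collectd is not covered here). The function names, the 80% threshold and the three-sample window are illustrative assumptions, not existing OST or engine values:

```python
def cpu_overloaded(samples, threshold_percent=80.0, min_consecutive=3):
    """Return True if CPU usage stayed above the threshold for
    min_consecutive samples in a row.

    samples: CPU utilisation percentages (0-100), oldest first. How they
    are collected (collectd, /proc/stat, ...) is outside this sketch.
    """
    streak = 0
    for value in samples:
        # Extend the streak on a high sample, reset it on a normal one.
        streak = streak + 1 if value > threshold_percent else 0
        if streak >= min_consecutive:
            return True
    return False


def assert_host_not_overloaded(host, samples, **kwargs):
    """Fail early with a clear message instead of a scheduling error."""
    if cpu_overloaded(samples, **kwargs):
        raise AssertionError(
            "%s: CPU above threshold before VM run/migration: %r"
            % (host, samples)
        )
```

Requiring several consecutive high samples is a deliberate choice in this sketch, so that a single short spike does not fail the suite.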
-- Eyal edri ASSOCIATE MANAGER RHV DevOps EMEA VIRTUALIZATION R&D Red Hat EMEA <https://www.redhat.com/> <https://red.ht/sig> TRIED. TESTED. TRUSTED. <https://redhat.com/trusted> phone: +972-9-7692018 irc: eedri (on #tlv #rhev-dev #rhev-integ)

On Wed, Jul 5, 2017 at 10:13 AM Eyal Edri <eedri@redhat.com> wrote:
> In addition, if we have collectd installed now, can't we add a test that checks whether CPU/memory consumption spikes above normal, and fails before the suite reaches actions like VM run/migration?
on my list.
Y.

On Wed, Jul 5, 2017 at 10:02 AM, Yaniv Kaul <ykaul@redhat.com> wrote:
> - Do we know why?
High CPU load, caused by the reactor thread, which is triggered after the heartbeat timeout has been exceeded. I'm still testing it and trying to find the root cause.
> - O-S-T seems to be a great tool for finding JSON-RPC/STOMP issues. I suggest running it on every change related to these.
This will be part of verification from now on.
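The root cause was still unknown at this point, so the following is not a description of vdsm's reactor. It is only a generic, hypothetical illustration of one way an event loop can start burning CPU once a heartbeat deadline has passed: if the loop derives its sleep time from the deadline and never re-arms it, every later sleep is zero. All names and numbers are assumptions for the simulation, which runs on a fake clock:

```python
def next_poll_timeout(now, heartbeat_deadline):
    """Seconds the loop may sleep before the heartbeat deadline."""
    return max(0.0, heartbeat_deadline - now)


def count_wakeups(duration, interval, reset_deadline):
    """Simulate an event loop on a fake clock and count its iterations.

    reset_deadline: whether the loop re-arms the deadline after it expires.
    """
    now, deadline, wakeups = 0.0, interval, 0
    tick = 0.001  # assumed cost of one loop iteration on the fake clock
    while now < duration:
        # Sleep until the deadline (or not at all if it already passed).
        now += next_poll_timeout(now, deadline) + tick
        wakeups += 1
        if now >= deadline and reset_deadline:
            deadline = now + interval
    return wakeups


healthy = count_wakeups(duration=10.0, interval=1.0, reset_deadline=True)
spinning = count_wakeups(duration=10.0, interval=1.0, reset_deadline=False)
```

In this toy model the loop that never re-arms its deadline wakes up several hundred times more often than the healthy one over the same simulated period, which is the kind of pattern that would show up as high CPU load on the host.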
-- IRIT GOIHMAN SOFTWARE ENGINEER EMEA VIRTUALIZATION R&D Red Hat EMEA <https://www.redhat.com/> <https://red.ht/sig> TRIED. TESTED. TRUSTED. <https://redhat.com/trusted> @redhatnews <https://twitter.com/redhatnews> Red Hat <https://www.linkedin.com/company/red-hat> Red Hat <https://www.facebook.com/RedHatInc>
participants (10)
- Barak Korren
- Dafna Ron
- Eyal Edri
- Irit Goihman
- Michal Skrivanek
- Nadav Goldin
- Piotr Kliczewski
- Roy Golan
- Shlomo Ben David
- Yaniv Kaul