--Apple-Mail=_E7900078-8414-4EE0-946C-00E929769756
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
charset=utf-8
On 4 Jul 2017, at 13:00, Eyal Edri <eedri(a)redhat.com> wrote:

I was able to reproduce the error [1] on a manual run with only the new vdsm
from [2], and I also verified that without this change, using the latest
tested run [3], it works.

So I think this proves quite clearly that the problem is one of the latest
VDSM patches.
There is only a single patch between vdsms [1] and [3]
https://gerrit.ovirt.org/#/c/78536

I'm running the test again with the suspected bad VDSM, and hopefully I will
be able to extract the env to a tar.gz file, which anyone can import using
the lago demo tool.

[1] http://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_manual/748/
[2] http://jenkins.ovirt.org/job/vdsm_master_build-artifacts-el7-x86_64/2694/
[3] http://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_manual/747/

On Tue, Jul 4, 2017 at 1:30 PM, Nadav Goldin <ngoldin(a)redhat.com> wrote:
Hi, sorry for posting late, I had a brief look at this yesterday:
1. I couldn't replicate it locally - which means it is most likely a
recent change.
2. I looked at the libvirt XMLs Lago generated for the hosts, as a new
version (0.40) is used this week - and they seem OK - specifically
memory and vcpus (which were my initial suspects).
3. I saw two Engine patches, a bit prior to the time it started to
fail, which *might*, as far as I can tell, be related, but it is beyond my
scope to judge (CC'ed patch owners):

core: Make VmAnalyzer to treat a migrated Paused VM as success -
https://gerrit.ovirt.org/78305

fix custom fencing default config setting
https://gerrit.ovirt.org/78720

Shot in the dark - could it be that the 'CPUOverloaded' filter was not
active before for some reason?

Also, there are some exceptions in the host0 vdsm log [1], failing to get
VM stats, though I can't tell if they are specific to this failure.

Of course this is not a complete analysis, I hope it helps.

[1] http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/7431/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-006_migrations.py/lago-basic-suite-master-host0/_var_log/vdsm/vdsm.log

Nadav.

On Tue, Jul 4, 2017 at 12:46 PM, Eyal Edri <eedri(a)redhat.com> wrote:
>
>
> On Tue, Jul 4, 2017 at 12:18 PM, Michal Skrivanek
> <michal.skrivanek(a)redhat.com> wrote:
>>
>>
>> On 3 Jul 2017, at 15:35, Shlomo Ben David <sbendavi(a)redhat.com> wrote:
>>
>> Hi,
>>
>> Test failed: [ 006_migrations.migrate_vm ]
>> Link to suspected patches: N/A
>> Link to Job:
>> http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/7431/
>> Link to all logs:
>> http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/7431/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-006_migrations.py/
>> Error snippet from the log:
>>
>> <error>
>>
>> "Fault reason is "Operation Failed". Fault detail is "[Cannot migrate VM.
>> There is no host that satisfies current scheduling constraints. See below
>> for details:, The host lago-basic-suite-master-host0 did not satisfy
>> internal filter CPUOverloaded because its CPU is too loaded.]"
>>
>> </error>
>>
>> <engine log>
>>
>> 2017-07-02 16:43:22,829-04 INFO
>> [org.ovirt.engine.core.bll.MigrateVmToServerCommand] (default task-27)
>> [87508047-fdc5-4a2f-9692-c83f7b55bbc2] Lock Acquired to object
>> 'EngineLock:{exclusiveLocks='[2b34910d-cef2-44d6-a274-30e8473eb5d9=VM]',
>> sharedLocks=''}'
>> 2017-07-02 16:43:22,833-04 DEBUG
>> [org.ovirt.engine.core.dal.dbbroker.PostgresDbEngineDialect$PostgresSimpleJdbcCall]
>> (default task-27) [87508047-fdc5-4a2f-9692-c83f7b55bbc2] Compiled stored
>> procedure. Call string is [{call getdiskvmelementspluggedtovm(?)}]
>> 2017-07-02 16:43:22,833-04 DEBUG
>> [org.ovirt.engine.core.dal.dbbroker.PostgresDbEngineDialect$PostgresSimpleJdbcCall]
>> (default task-27) [87508047-fdc5-4a2f-9692-c83f7b55bbc2] SqlCall for
>> procedure [GetDiskVmElementsPluggedToVm] compiled
>> 2017-07-02 16:43:22,843-04 DEBUG
>> [org.ovirt.engine.core.dal.dbbroker.PostgresDbEngineDialect$PostgresSimpleJdbcCall]
>> (default task-27) [87508047-fdc5-4a2f-9692-c83f7b55bbc2] Compiled stored
>> procedure. Call string is [{call getattacheddisksnapshotstovm(?, ?)}]
>> 2017-07-02 16:43:22,843-04 DEBUG
>> [org.ovirt.engine.core.dal.dbbroker.PostgresDbEngineDialect$PostgresSimpleJdbcCall]
>> (default task-27) [87508047-fdc5-4a2f-9692-c83f7b55bbc2] SqlCall for
>> procedure [GetAttachedDiskSnapshotsToVm] compiled
>> 2017-07-02 16:43:22,919-04 INFO
>> [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (default task-27)
>> [87508047-fdc5-4a2f-9692-c83f7b55bbc2] Candidate host
>> 'lago-basic-suite-master-host0' ('46bdc63d-98f5-4eee-81aa-2fb88b8f7cbe') was
>> filtered out by 'VAR__FILTERTYPE__INTERNAL' filter 'CPUOverloaded'
>> (correlation id: null)
>> 2017-07-02 16:43:22,920-04 WARN
>> [org.ovirt.engine.core.bll.MigrateVmToServerCommand] (default task-27)
>> [87508047-fdc5-4a2f-9692-c83f7b55bbc2] Validation of action
>> 'MigrateVmToServer' failed for user admin@internal-authz. Reasons:
>> VAR__ACTION__MIGRATE,VAR__TYPE__VM,SCHEDULING_ALL_HOSTS_FILTERED_OUT,VAR__FILTERTYPE__INTERNAL,$hostName
>> lago-basic-suite-master-host0,$filterName
>> CPUOverloaded,VAR__DETAIL__CPU_OVERLOADED,SCHEDULING_HOST_FILTERED_REASON_WITH_DETAIL
>>
>>
>>
>> This has nothing to do with migration.
>> CPUOverloaded is a scheduling policy filter; unless there was any change in
>> that area, the obvious explanation would be that the host has a CPU overload
>> condition.
>> I briefly looked at the logs and see "cpuUser": "83.40", "cpuSys": "16.59",
>> "cpuIdle": "0.08", which indeed suggests an overload. From the same sample I
>> can see it's vdsm ("cpuUserVdsmd": "77.38", "cpuSysVdsmd": "18.44").
>>
>> Since similar values have been reported consistently for some time, there is
>> a setupNetworks and a storage rescan prior to the failure, and there is no
>> other indication of anything wrong, I'd just say the environment, the order
>> of tests, or the timing has changed, but there is nothing wrong with the
>> oVirt code.
>> Did any of that change recently? Does it reproduce locally?
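
For reference, the arithmetic behind reading that sample as an overload can be sketched roughly as below. This is only an illustration: the field names and values are the ones quoted from the vdsm stats sample, while the 80% threshold and the helper names (`total_cpu_usage`, `vdsm_cpu_usage`, `is_overloaded`) are assumptions made for this example and not the engine's actual CPUOverloaded filter logic.

```python
# Values copied from the stats sample quoted in this thread (reported as strings).
sample = {
    "cpuUser": "83.40",
    "cpuSys": "16.59",
    "cpuIdle": "0.08",
    "cpuUserVdsmd": "77.38",
    "cpuSysVdsmd": "18.44",
}

# Hypothetical threshold, for illustration only; the threshold the real
# filter uses is not stated in this thread.
ASSUMED_THRESHOLD_PERCENT = 80.0


def total_cpu_usage(stats):
    """Return user + system CPU usage of the host, in percent."""
    return float(stats["cpuUser"]) + float(stats["cpuSys"])


def vdsm_cpu_usage(stats):
    """Return user + system CPU usage attributed to vdsmd, in percent."""
    return float(stats["cpuUserVdsmd"]) + float(stats["cpuSysVdsmd"])


def is_overloaded(stats, threshold=ASSUMED_THRESHOLD_PERCENT):
    """Hypothetical check: host usage above the assumed threshold."""
    return total_cpu_usage(stats) > threshold


print("host usage:  %.2f%%" % total_cpu_usage(sample))
print("vdsmd usage: %.2f%%" % vdsm_cpu_usage(sample))
print("overloaded: ", is_overloaded(sample))
```

With the quoted numbers, nearly all of the host's CPU time is accounted for by vdsmd, which is what points at vdsm rather than at the migration flow.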
>
>
> AFAIK, no significant environment or test changes were made.
> We will try to reproduce it locally and also on the manual job, but from
> what it looks like, it is very consistent (unlike other race failures we've
> seen lately) and continues to fail on the same tests, so it's either a
> change in oVirt or something else that we're not thinking of.
>
>>
>>
>> Thanks,
>> michal
>>
>> 2017-07-02 16:43:22,920-04 INFO
>> [org.ovirt.engine.core.bll.MigrateVmToServerCommand] (default task-27)
>> [87508047-fdc5-4a2f-9692-c83f7b55bbc2] Lock freed to object
>> 'EngineLock:{exclusiveLocks='[2b34910d-cef2-44d6-a274-30e8473eb5d9=VM]',
>> sharedLocks=''}'
>> 2017-07-02 16:43:22,929-04 DEBUG
>> [org.ovirt.engine.core.utils.timer.FixedDelayJobListener]
>> (DefaultQuartzScheduler7) [] Rescheduling
>> DEFAULT.org.ovirt.engine.core.bll.ColdRebootAutoStartVmsRunner.startFailedAutoStartVms#-9223372036854775733
>> as there is no unfired trigger.
>> 2017-07-02 16:43:22,932-04 ERROR
>> [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default
>> task-27) [] Operation Failed: [Cannot migrate VM. There is no host that
>> satisfies current scheduling constraints. See below for details:, The host
>> lago-basic-suite-master-host0 did not satisfy internal filter CPUOverloaded
>> because its CPU is too loaded.]
>> 2017-07-02 16:43:23,331-04 DEBUG
>> [org.ovirt.engine.core.utils.timer.FixedDelayJobListener]
>> (DefaultQuartzScheduler2) [] Rescheduling
>> DEFAULT.org.ovirt.engine.core.bll.HaAutoStartVmsRunner.startFailedAutoStartVms#-9223372036854775793
>> as there is no unfired trigger.
>> 2017-07-02 16:43:23,332-04 DEBUG
>> [org.ovirt.engine.core.utils.timer.FixedDelayJobListener]
>> (DefaultQuartzScheduler2) [] Rescheduling
>> DEFAULT.org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.invokeCallbackMethods#-9223372036854775783
>> as there is no unfired trigger.
>>
>> </engine log>
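
To pull the relevant scheduling decision out of an engine log like the one quoted here, something along these lines might help. It is a minimal sketch: the regular expression is derived only from the single SchedulingManager message shown in this thread, and `filtered_hosts` is a hypothetical helper name, so other engine versions may word the message differently.

```python
import re

# Pattern based on the SchedulingManager message quoted in this thread.
FILTERED_RE = re.compile(
    r"Candidate host '(?P<host>[^']+)' \('(?P<host_id>[^']+)'\) was "
    r"filtered out by '(?P<filter_type>[^']+)' filter '(?P<filter>[^']+)'"
)


def filtered_hosts(log_text):
    """Return (host, filter) pairs for every filtered-out candidate host."""
    # Log entries pasted into mail are often hard-wrapped, so collapse all
    # whitespace into single spaces before matching.
    flat = " ".join(log_text.split())
    return [(m.group("host"), m.group("filter"))
            for m in FILTERED_RE.finditer(flat)]


example = """
2017-07-02 16:43:22,919-04 INFO
[org.ovirt.engine.core.bll.scheduling.SchedulingManager] (default task-27)
[87508047-fdc5-4a2f-9692-c83f7b55bbc2] Candidate host
'lago-basic-suite-master-host0' ('46bdc63d-98f5-4eee-81aa-2fb88b8f7cbe') was
filtered out by 'VAR__FILTERTYPE__INTERNAL' filter 'CPUOverloaded'
(correlation id: null)
"""

print(filtered_hosts(example))
```

Run against a full engine.log, this would list every host/filter pair the scheduler rejected, which makes it easier to see whether CPUOverloaded is the only filter involved.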
>>
>>
>>
>> Best Regards,
>>
>> Shlomi Ben-David | Software Engineer | Red Hat ISRAEL
>> RHCSA | RHCVA | RHCE
>> IRC: shlomibendavid (on #rhev-integ, #rhev-dev, #rhev-ci)
>>
>> OPEN SOURCE - 1 4 011 && 011 4 1
>>
>> _______________________________________________
>> Devel mailing list
>> Devel(a)ovirt.org
>> http://lists.ovirt.org/mailman/listinfo/devel
>
>
>
>
> --
>
> Eyal edri
>
>
> ASSOCIATE MANAGER
>
> RHV DevOps
>
> EMEA VIRTUALIZATION R&D
>
>
> Red Hat EMEA
>
> TRIED. TESTED. TRUSTED.
> phone: +972-9-7692018
> irc: eedri (on #tlv #rhev-dev #rhev-integ)

--
EYAL EDRI

ASSOCIATE MANAGER
RHV DEVOPS
EMEA VIRTUALIZATION R&D

Red Hat EMEA <https://www.redhat.com/>
TRIED. TESTED. TRUSTED. <https://redhat.com/trusted>
phone: +972-9-7692018
irc: eedri (on #tlv #rhev-dev #rhev-integ)
--Apple-Mail=_E7900078-8414-4EE0-946C-00E929769756--