[ovirt-devel] [ OST Failure Report ] [ oVirt master ] [ 03-07-2017 ] [ 006_migrations.migrate_vm ]

Irit Goihman igoihman at redhat.com
Tue Jul 4 11:32:09 UTC 2017


https://gerrit.ovirt.org/#/c/78536 broke network functional tests but a fix
was merged today: https://gerrit.ovirt.org/#/c/78925/

I tried to run OST with my fix yesterday and still encountered the same
failures.
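
A minimal sketch for checking whether a rerun hits the same failure: it
scans engine log text for the SchedulingManager "filtered out" message
quoted further down in this thread. The regular expression and the name
find_filtered_hosts are assumptions modelled on that single quoted entry,
and the sketch assumes each log entry sits on one line, as in the original
engine.log rather than the re-wrapped copy in this mail.

```python
import re

# Pattern modelled on the one SchedulingManager entry quoted in this thread;
# it is an assumption, not an official log format definition.
FILTERED_RE = re.compile(
    r"Candidate host\s+'(?P<host>[^']+)'.*?"
    r"filtered out by\s+'(?P<filter_type>[^']+)'\s+filter\s+'(?P<filter_name>[^']+)'"
)


def find_filtered_hosts(log_text):
    """Return (host, filter_name) pairs for every filtered-out candidate host."""
    return [
        (m.group("host"), m.group("filter_name"))
        for m in FILTERED_RE.finditer(log_text)
    ]


if __name__ == "__main__":
    # The entry below is copied from the thread, joined back onto one line.
    sample = (
        "2017-07-02 16:43:22,919-04 INFO "
        "[org.ovirt.engine.core.bll.scheduling.SchedulingManager] "
        "(default task-27) [87508047-fdc5-4a2f-9692-c83f7b55bbc2] "
        "Candidate host 'lago-basic-suite-master-host0' "
        "('46bdc63d-98f5-4eee-81aa-2fb88b8f7cbe') was filtered out by "
        "'VAR__FILTERTYPE__INTERNAL' filter 'CPUOverloaded' "
        "(correlation id: null)"
    )
    for host, filter_name in find_filtered_hosts(sample):
        print(host, filter_name)
```

If the list comes back empty for a failing run, the failure is probably a
different one from the CPUOverloaded rejection discussed here.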

On Tue, Jul 4, 2017 at 2:25 PM, Michal Skrivanek <
michal.skrivanek at redhat.com> wrote:

>
> On 4 Jul 2017, at 13:00, Eyal Edri <eedri at redhat.com> wrote:
>
> I was able to reproduce the error [1] on a manual run with only new vdsm
> from [2],
> and also to verify that w/o this change, while using latest tested run [3]
> it works.
>
> So I think this proves quite clearly the problem is one of the latest VDSM
> patches.
>
>
> There is only a single patch between vdsms [1] and [3]
> https://gerrit.ovirt.org/#/c/78536
>
>
> I'm running the test again with the suspected bad VDSM and hopefully will
> be able to extract the env to a tar.gz file
> which anyone can import using the lago demo tool.
>
>
>
> [1] http://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_manual/748/
> [2] http://jenkins.ovirt.org/job/vdsm_master_build-artifacts-el7-x86_64/2694/
> [3] http://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_manual/747/
>
>
>
> On Tue, Jul 4, 2017 at 1:30 PM, Nadav Goldin <ngoldin at redhat.com> wrote:
>
>> Hi, sorry for posting late, I had a brief look at this yesterday:
>> 1. I couldn't replicate it locally - which means it is most likely a
>> recent change.
>> 2. I looked at the libvirt XMLs Lago generated for the hosts, as a new
>> version is used this week (0.40) - and they seem OK - specifically
>> memory and vcpus (which was my initial suspect).
>> 3. I saw two Engine patches, merged a bit before the time it started to
>> fail, which *might* be related as far as I can tell, but it is beyond my
>> scope to judge (CC'ed patch owners):
>>
>> core: Make VmAnalyzer to treat a migrated Paused VM as success -
>> https://gerrit.ovirt.org/78305
>>
>> fix custom fencing default config setting
>> https://gerrit.ovirt.org/78720
>>
>> Shot in the dark - could it be that the 'CPUOverloaded' filter was not
>> active before for some reason?
>>
>> Also, there are some exceptions in host0 vdsm log[1], failing to get
>> VM stats, though I can't tell if they are specific to this failure.
>>
>> Of course this is not a complete analysis, I hope it helps.
>>
>>
>> [1] http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/7431/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-006_migrations.py/lago-basic-suite-master-host0/_var_log/vdsm/vdsm.log
>>
>>
>> Nadav.
>>
>>
>>
>>
>>
>> On Tue, Jul 4, 2017 at 12:46 PM, Eyal Edri <eedri at redhat.com> wrote:
>> >
>> >
>> > On Tue, Jul 4, 2017 at 12:18 PM, Michal Skrivanek
>> > <michal.skrivanek at redhat.com> wrote:
>> >>
>> >>
>> >> On 3 Jul 2017, at 15:35, Shlomo Ben David <sbendavi at redhat.com> wrote:
>> >>
>> >> Hi,
>> >>
>> >> Test failed: [ 006_migrations.migrate_vm ]
>> >> Link to suspected patches: N/A
>> >> Link to Job:
>> >> http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/7431/
>> >> Link to all logs:
>> >> http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/7431/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-006_migrations.py/
>> >> Error snippet from the log:
>> >>
>> >> <error>
>> >>
>> >>  "Fault reason is "Operation Failed". Fault detail is "[Cannot migrate VM.
>> >> There is no host that satisfies current scheduling constraints. See below
>> >> for details:, The host lago-basic-suite-master-host0 did not satisfy
>> >> internal filter CPUOverloaded because its CPU is too loaded.]"
>> >>
>> >> </error>
>> >>
>> >> <engine log>
>> >>
>> >> 2017-07-02 16:43:22,829-04 INFO
>> >> [org.ovirt.engine.core.bll.MigrateVmToServerCommand] (default task-27)
>> >> [87508047-fdc5-4a2f-9692-c83f7b55bbc2] Lock Acquired to object
>> >> 'EngineLock:{exclusiveLocks='[2b34910d-cef2-44d6-a274-30e8473eb5d9=VM]',
>> >> sharedLocks=''}'
>> >> 2017-07-02 16:43:22,833-04 DEBUG
>> >> [org.ovirt.engine.core.dal.dbbroker.PostgresDbEngineDialect$PostgresSimpleJdbcCall]
>> >> (default task-27) [87508047-fdc5-4a2f-9692-c83f7b55bbc2] Compiled stored
>> >> procedure. Call string is [{call getdiskvmelementspluggedtovm(?)}]
>> >> 2017-07-02 16:43:22,833-04 DEBUG
>> >> [org.ovirt.engine.core.dal.dbbroker.PostgresDbEngineDialect$PostgresSimpleJdbcCall]
>> >> (default task-27) [87508047-fdc5-4a2f-9692-c83f7b55bbc2] SqlCall for
>> >> procedure [GetDiskVmElementsPluggedToVm] compiled
>> >> 2017-07-02 16:43:22,843-04 DEBUG
>> >> [org.ovirt.engine.core.dal.dbbroker.PostgresDbEngineDialect$PostgresSimpleJdbcCall]
>> >> (default task-27) [87508047-fdc5-4a2f-9692-c83f7b55bbc2] Compiled stored
>> >> procedure. Call string is [{call getattacheddisksnapshotstovm(?, ?)}]
>> >> 2017-07-02 16:43:22,843-04 DEBUG
>> >> [org.ovirt.engine.core.dal.dbbroker.PostgresDbEngineDialect$PostgresSimpleJdbcCall]
>> >> (default task-27) [87508047-fdc5-4a2f-9692-c83f7b55bbc2] SqlCall for
>> >> procedure [GetAttachedDiskSnapshotsToVm] compiled
>> >> 2017-07-02 16:43:22,919-04 INFO
>> >> [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (default task-27)
>> >> [87508047-fdc5-4a2f-9692-c83f7b55bbc2] Candidate host
>> >> 'lago-basic-suite-master-host0' ('46bdc63d-98f5-4eee-81aa-2fb88b8f7cbe')
>> >> was filtered out by 'VAR__FILTERTYPE__INTERNAL' filter 'CPUOverloaded'
>> >> (correlation id: null)
>> >> 2017-07-02 16:43:22,920-04 WARN
>> >> [org.ovirt.engine.core.bll.MigrateVmToServerCommand] (default task-27)
>> >> [87508047-fdc5-4a2f-9692-c83f7b55bbc2] Validation of action
>> >> 'MigrateVmToServer' failed for user admin@internal-authz. Reasons:
>> >> VAR__ACTION__MIGRATE,VAR__TYPE__VM,SCHEDULING_ALL_HOSTS_FILTERED_OUT,VAR__FILTERTYPE__INTERNAL,$hostName
>> >> lago-basic-suite-master-host0,$filterName
>> >> CPUOverloaded,VAR__DETAIL__CPU_OVERLOADED,SCHEDULING_HOST_FILTERED_REASON_WITH_DETAIL
>> >>
>> >>
>> >>
>> >> This has nothing to do with migration.
>> >> CPUOverloaded is a scheduling policy filter; unless there was a change in
>> >> that area, the obvious explanation is that the host really has a CPU
>> >> overload condition.
>> >> I briefly looked at the logs and see "cpuUser": "83.40", "cpuSys": "16.59",
>> >> "cpuIdle": "0.08", which indeed suggests an overload. From the same sample
>> >> I can see it's vdsm ("cpuUserVdsmd": "77.38", "cpuSysVdsmd": "18.44").
>> >>
>> >> Since similar values have been reported consistently for some time, and
>> >> there is a setupNetworks and a storage rescan prior to the failure, and
>> >> there is no other indication of anything wrong, I'd just say the
>> >> environment or the order of tests or timing has changed, but there is
>> >> nothing wrong with the oVirt code.
>> >> Did any of that change recently? Does it reproduce locally?
>> >
>> >
>> > AFAIK, no significant environment or test changes were made.
>> > We will try to reproduce it locally and also on the manual job, but from
>> > what it looks like it is very consistent (unlike other race failures we've
>> > seen lately) and continues to fail on the same tests, so it's either a
>> > change in oVirt or something else that we're not thinking of.
>> >
>> >>
>> >>
>> >> Thanks,
>> >> michal
>> >>
>> >> 2017-07-02 16:43:22,920-04 INFO
>> >> [org.ovirt.engine.core.bll.MigrateVmToServerCommand] (default task-27)
>> >> [87508047-fdc5-4a2f-9692-c83f7b55bbc2] Lock freed to object
>> >> 'EngineLock:{exclusiveLocks='[2b34910d-cef2-44d6-a274-30e8473eb5d9=VM]',
>> >> sharedLocks=''}'
>> >> 2017-07-02 16:43:22,929-04 DEBUG
>> >> [org.ovirt.engine.core.utils.timer.FixedDelayJobListener]
>> >> (DefaultQuartzScheduler7) [] Rescheduling
>> >> DEFAULT.org.ovirt.engine.core.bll.ColdRebootAutoStartVmsRunner.startFailedAutoStartVms#-9223372036854775733
>> >> as there is no unfired trigger.
>> >> 2017-07-02 16:43:22,932-04 ERROR
>> >> [org.ovirt.engine.api.restapi.resource.AbstractBackendResource]
>> >> (default task-27) [] Operation Failed: [Cannot migrate VM. There is no
>> >> host that satisfies current scheduling constraints. See below for
>> >> details:, The host lago-basic-suite-master-host0 did not satisfy internal
>> >> filter CPUOverloaded because its CPU is too loaded.]
>> >> 2017-07-02 16:43:23,331-04 DEBUG
>> >> [org.ovirt.engine.core.utils.timer.FixedDelayJobListener]
>> >> (DefaultQuartzScheduler2) [] Rescheduling
>> >> DEFAULT.org.ovirt.engine.core.bll.HaAutoStartVmsRunner.startFailedAutoStartVms#-9223372036854775793
>> >> as there is no unfired trigger.
>> >> 2017-07-02 16:43:23,332-04 DEBUG
>> >> [org.ovirt.engine.core.utils.timer.FixedDelayJobListener]
>> >> (DefaultQuartzScheduler2) [] Rescheduling
>> >> DEFAULT.org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.invokeCallbackMethods#-9223372036854775783
>> >> as there is no unfired trigger.
>> >>
>> >> </engine log>
>> >>
>> >>
>> >>
>> >> Best Regards,
>> >>
>> >> Shlomi Ben-David | Software Engineer | Red Hat ISRAEL
>> >> RHCSA | RHCVA | RHCE
>> >> IRC: shlomibendavid (on #rhev-integ, #rhev-dev, #rhev-ci)
>> >>
>> >> OPEN SOURCE - 1 4 011 && 011 4 1
>> >>
>> >> _______________________________________________
>> >> Devel mailing list
>> >> Devel at ovirt.org
>> >> http://lists.ovirt.org/mailman/listinfo/devel
>> >
>> >
>> >
>> >
>> > --
>> >
>> > Eyal edri
>> >
>> >
>> > ASSOCIATE MANAGER
>> >
>> > RHV DevOps
>> >
>> > EMEA VIRTUALIZATION R&D
>> >
>> >
>> > Red Hat EMEA
>> >
>> > TRIED. TESTED. TRUSTED.
>> > phone: +972-9-7692018
>> > irc: eedri (on #tlv #rhev-dev #rhev-integ)
>> >
>>
>
>
>
>
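
To sanity-check the CPU figures quoted above from the vdsm stats, a small
sketch follows. The field names are the ones quoted in the thread; the
function names and the 80% threshold are purely illustrative assumptions
and are not taken from the engine's CPUOverloaded filter, whose actual
threshold and sampling window are not stated anywhere in this thread.

```python
# Values copied from the sample quoted in the thread (vdsm reports strings).
SAMPLE_STATS = {
    "cpuUser": "83.40",
    "cpuSys": "16.59",
    "cpuIdle": "0.08",
    "cpuUserVdsmd": "77.38",
    "cpuSysVdsmd": "18.44",
}


def cpu_busy_percent(stats):
    """Sum of user and system CPU percentages for the host."""
    return float(stats["cpuUser"]) + float(stats["cpuSys"])


def vdsmd_percent(stats):
    """Sum of the user and system percentages reported for vdsmd itself."""
    return float(stats["cpuUserVdsmd"]) + float(stats["cpuSysVdsmd"])


def looks_overloaded(stats, threshold=80.0):
    """True if the busy percentage exceeds a hypothetical threshold.

    The default of 80.0 is an illustrative assumption only.
    """
    return cpu_busy_percent(stats) > threshold


if __name__ == "__main__":
    print("host busy %:", round(cpu_busy_percent(SAMPLE_STATS), 2))
    print("vdsmd %:", round(vdsmd_percent(SAMPLE_STATS), 2))
    print("looks overloaded:", looks_overloaded(SAMPLE_STATS))
```

A single sample only shows that the host was busy at that moment; whether
the scheduler rejects the host depends on how the engine evaluates load
over time.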


-- 

IRIT GOIHMAN

SOFTWARE ENGINEER

EMEA VIRTUALIZATION R&D

Red Hat EMEA <https://www.redhat.com/>
