DAO tests failing

Eyal Edri eedri at redhat.com
Mon Nov 24 20:46:26 UTC 2014


Lior,

just a note: regardless of whether this error is a false positive,
the infra team is badly understaffed (almost a one-man operation), at least when it comes
to Jenkins stability, so it's not uncommon for infra issues to take some time to
resolve.
So, as David proposed (on a different thread, I think): if you think a certain job shouldn't fail, please don't block on it.
Talk to your manager, or request the right permissions to remove it from Gerrit if you're a maintainer/TL,
but please make sure you validate your change locally before merging.

We're planning a full-day infra hackathon soon to address many issues in the oVirt infra,
and everyone who can and wants to is welcome to help. An official email with details will be
sent in the coming days.

Eyal.



----- Original Message -----
> From: "David Caro" <dcaroest at redhat.com>
> To: "Lior Vernia" <lvernia at redhat.com>
> Cc: infra at ovirt.org
> Sent: Monday, November 24, 2014 4:27:00 PM
> Subject: Re: DAO tests failing
> 
> On 11/24, Lior Vernia wrote:
> > 
> > 
> > On 24/11/14 14:25, David Caro wrote:
> > > On 11/24, Lior Vernia wrote:
> > >>
> > >>
> > >> On 24/11/14 14:16, David Caro wrote:
> > >>> On 11/24, Allon Mureinik wrote:
> > >>>> Looks like recreating the database fails:
> > >>>>
> > >>>> 08:54:54 [ovirt-engine_master_dao-unit-tests_created] $ /bin/sh
> > >>>> /tmp/hudson8811709354471707251.sh
> > >>>> 08:54:54
> > >>>> /home/jenkins/workspace/ovirt-engine_master_dao-unit-tests_created
> > >>>> 08:54:54
> > >>>> /home/jenkins/workspace/ovirt-engine_master_dao-unit-tests_created
> > >>>> 08:54:55 could not change directory to
> > >>>> "/home/jenkins/workspace/ovirt-engine_master_dao-unit-tests_created"
> > >>>> 08:54:55 ERROR:  role "engine" already exists
> > >>>> 08:54:55 could not change directory to
> > >>>> "/home/jenkins/workspace/ovirt-engine_master_dao-unit-tests_created"
> > >>>> 08:54:55 ALTER ROLE
> > >>>> 08:54:56 could not change directory to
> > >>>> "/home/jenkins/workspace/ovirt-engine_master_dao-unit-tests_created"
> > >>>> 08:54:56 dropdb: database removal failed: ERROR:  database "engine"
> > >>>> does not exist
> > >>>> 08:54:56 could not change directory to
> > >>>> "/home/jenkins/workspace/ovirt-engine_master_dao-unit-tests_created"
> > >>>> 08:54:58 CREATE DATABASE
> > >>>> 08:54:59 Creating schema
> > >>>> engine at localhost:5432/ovirt_engine_master_dao_unit_tests_created_6082
> > >>>>
> > >>>
> > > >>> Those are not errors, they are just unfiltered messages when it tries to
> > >>> make sure the database is not there and the user has enough rights at
> > >>> the startup of the job.
> > >>>
> > >>> The first issue is:
> > >>>
> > >>> 00:09:49.369 2014-11-24 09:03:47,301 SEVERE
> > >>> [org.ovirt.engine.core.dal.dbbroker.BatchProcedureExecutionConnectionCallback
> > >>> doInConnection] Can't execute batch: Batch entry 0 select * from
> > >>> public.insertnumanode(CAST ('830fd8d0-9332-4d81-bb80-54beee7d5b59' AS
> > >>> uuid),CAST (NULL AS uuid),CAST ('77296e00-0cad-4e5a-9299-008a7b6f4355'
> > >>> AS uuid),CAST ('0' AS int2),CAST ('0' AS int8),CAST ('4' AS int2),CAST
> > >>> (NULL AS int8),CAST (NULL AS int4),CAST (NULL AS numeric),CAST (NULL
> > >>> AS numeric),CAST (NULL AS numeric),CAST (NULL AS int4),CAST (NULL AS
> > >>> text)) as result was aborted.  Call getNextException to see the cause.
> > >>> 00:09:49.441 2014-11-24 09:03:47,303 SEVERE
> > >>> [org.ovirt.engine.core.dal.dbbroker.BatchProcedureExecutionConnectionCallback
> > >>> doInConnection] Can't execute batch. Next exception is: ERROR: insert
> > >>> or update on table "numa_node" violates foreign key constraint
> > >>> "fk_numa_node_vm"
> > >>>
> > > >>> Which to me looks like a real issue. I'll do a couple more checks, but
> > >>> I don't know how the tests work or what the code does, that's your
> > >>> domain.
> > >>>
> > >>
> > >> I don't know if that specifically is a real issue or not (as it isn't
> > >> related to my patch and I haven't researched it), I do see however that
> > >> there are ultimately 433 failures and 371 errors, and that is spread
> > >> across many independent tests.
> > > 
> > > > You should be aware that it's not just your patch that runs, but all
> > > > the patches yours depends on, and I see there are quite a lot. Are you sure
> > > > that none of them introduces those failures?
> > > 
> > 
> > It had failed the same way when it didn't depend on anything unmerged -
> > I recently rebased it on two other patches so I could merge them first
> > (and neither of them touches the dal project).
> > 
> > What do you mean by "quite a lot"? Do you see more than two?
> 
> From the gerrit page I see that the patch depended on 5 other patches,
> 3 of those are already merged. So no, right now I only see two.
> 
> 
> > Could these be environmental issues?
> 
> There's always a possibility, but I think that it's quite improbable
> in this case.
> 
> > It's possible that my patch causes all of this,
> > but I don't see how. Could it be that Allon's got it right, and the DB
> > isn't being constructed properly?
> 
> The DB is being constructed as part of the test. The messages Allon
> pointed out are not errors but just unfiltered messages (it just drops
> the db directly instead of checking if it exists first for example).
> I don't see any issues on the db creation.
> 
> ERROR: insert or update on table "network" violates foreign key
> constraint "fk_network_qos_id"   Detail: Key
> (qos_id)=(de956031-6be2-43d6-bb90-5191c9253318) is not present in
> table "qos".
> 
> 
> This failure is quite a specific one, you can try looking for the
> point where that key should be added. It looks like you are trying to
> add an entry to the network table with an id from the qos that does
> not exist. Most of the other failures also seem related to foreign
> keys not being consistent.
> 
> That does not seem like an environmental issue. The db did not exist
> at the start of the test and it works well when run from branch HEAD,
> so it's an issue that's introduced by your patch or any it depends
> on. Keep in mind that it does a checkout of the change and not a
> rebase.
> 
> 
> I see that the first failure (if I'm not mistaken) seems to be:
> 
> 00:02:38.691 Running
> org.ovirt.engine.core.dao.VmAndTemplatesGenerationsDaoTest
> 00:02:39.007 Tests run: 18, Failures: 13, Errors: 1, Skipped: 0, Time
> elapsed: 0.325 sec <<< FAILURE!
> 
> Which might lead to an entry not being created on the db and the other
> failures.
> 
> I don't know the code or the tests code, so I don't think I can help
> you more than that.
> 
> 
> > 
> > >>
> > >>>>
> > >>>>
> > >>>> ----- Original Message -----
> > >>>>> From: "Lior Vernia" <lvernia at redhat.com>
> > >>>>> To: infra at ovirt.org
> > >>>>> Sent: Monday, November 24, 2014 1:37:25 PM
> > >>>>> Subject: DAO tests failing
> > >>>>>
> > >>>>> Hi,
> > >>>>>
> > >>>>> I've noticed recurrent failures of DAO tests on one of my patches,
> > >>>>> this
> > >>>>> looks like something systematic as there are hundreds of failures in
> > >>>>> unrelated files.
> > >>>>>
> > >>>>> http://gerrit.ovirt.org/#/c/34121/
> > >>>>>
> > >>>>> Talked to dcaro about it on Thursday, but have been rebasing and
> > >>>>> re-running the tests and they keep failing.
> > >>>>>
> > >>>>> Thanks, Lior.
> > >>>>> _______________________________________________
> > >>>>> Infra mailing list
> > >>>>> Infra at ovirt.org
> > >>>>> http://lists.ovirt.org/mailman/listinfo/infra
> > >>>>>
> > >>>
> > > 
> 
> --
> David Caro
> 
> Red Hat S.L.
> Continuous Integration Engineer - EMEA ENG Virtualization R&D
> 
> Tel.: +420 532 294 605
> Email: dcaro at redhat.com
> Web: www.redhat.com
> RHT Global #: 82-62605
> 
> 
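For illustration only: the "unfiltered messages" David describes come from dropping the database without first checking whether it exists. The sketch below reproduces that pattern with Python's sqlite3 module and a table in place of a database. That substitution is an assumption made to keep the example self-contained; the job itself runs against PostgreSQL and its setup script is not shown in this thread. The helper names drop_unconditionally and reset_table are hypothetical.

```python
import sqlite3


def drop_unconditionally(conn, table):
    """Drop a table without checking for it; return the error text, if any."""
    try:
        # Table names cannot be bound as parameters; this sketch trusts its input.
        conn.execute(f"DROP TABLE {table}")
        return None
    except sqlite3.OperationalError as exc:
        # Harmless on a clean machine, but it lands in the log looking like an error.
        return str(exc)


def reset_table(conn, table):
    """Hypothetical quiet variant: drop only if present, then recreate."""
    conn.execute(f"DROP TABLE IF EXISTS {table}")
    conn.execute(f"CREATE TABLE {table} (id TEXT PRIMARY KEY)")


conn = sqlite3.connect(":memory:")
noisy = drop_unconditionally(conn, "engine")  # the table is missing, so this reports an error
reset_table(conn, "engine")                   # quiet whether or not the table exists
reset_table(conn, "engine")                   # safe to repeat
print(noisy)
```

On PostgreSQL the corresponding quiet forms would be "dropdb --if-exists" and "DROP DATABASE IF EXISTS"; whether the job should use them is a separate question from the test failures.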


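To make the foreign key failure quoted above concrete, here is a minimal sketch of the same kind of violation: inserting a row into "network" whose qos_id has no matching row in "qos". It uses Python's sqlite3 module rather than PostgreSQL, and the two-column tables are simplified stand-ins, not the real oVirt schema; the helper names make_db and insert_network are hypothetical.

```python
import sqlite3

# Key taken from the error message quoted in the thread.
QOS_ID = "de956031-6be2-43d6-bb90-5191c9253318"


def make_db():
    """Create an in-memory database with simplified stand-in tables."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE qos (id TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE network (id TEXT PRIMARY KEY, "
        "qos_id TEXT REFERENCES qos(id))"
    )
    return conn


def insert_network(conn, network_id, qos_id):
    """Insert a network row; return None on success or the error text."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO network (id, qos_id) VALUES (?, ?)",
                (network_id, qos_id),
            )
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)


conn = make_db()
before = insert_network(conn, "net1", QOS_ID)  # fails: parent row is missing
with conn:
    conn.execute("INSERT INTO qos (id) VALUES (?)", (QOS_ID,))
after = insert_network(conn, "net1", QOS_ID)   # succeeds once the parent exists
print(before, after)
```

The point of the sketch is the ordering: the child insert only succeeds after the parent row exists, which is why David suggests looking for the place where that qos entry should have been added.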

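David locates the first failure by reading the per-class summaries in the console log. With hundreds of failures this can be scripted. The sketch below is one possible way to do it, assuming the log has the "Running <class>" and "Tests run: ..." lines each on a single line (the mail client wrapped them in the quote above). The function name first_failing_class and the passing class in the sample are hypothetical.

```python
import re

# "Running" followed by a dotted class name at the end of the line.
RUNNING = re.compile(r"Running\s+(?P<cls>(?:\w+\.)+[\w$]+)\s*$")
SUMMARY = re.compile(
    r"Tests run: (?P<run>\d+), Failures: (?P<fail>\d+), Errors: (?P<err>\d+)"
)


def first_failing_class(log_text):
    """Return (class, failures, errors) for the first failing class, or None."""
    current = None
    for line in log_text.splitlines():
        match = RUNNING.search(line)
        if match:
            current = match.group("cls")
            continue
        match = SUMMARY.search(line)
        if match and current:
            failures = int(match.group("fail"))
            errors = int(match.group("err"))
            if failures or errors:
                return current, failures, errors
            # This class passed; forget it so a later aggregate
            # summary line is not attributed to it.
            current = None
    return None


SAMPLE_LOG = """\
00:02:30.000 Running org.example.HypotheticalPassingDaoTest
00:02:30.100 Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.1 sec
00:02:38.691 Running org.ovirt.engine.core.dao.VmAndTemplatesGenerationsDaoTest
00:02:39.007 Tests run: 18, Failures: 13, Errors: 1, Skipped: 0, Time elapsed: 0.325 sec <<< FAILURE!
"""

result = first_failing_class(SAMPLE_LOG)
print(result)
```

Starting from the earliest failing class is useful here because, as noted in the thread, a row that never gets created early on can cascade into foreign key errors in later, unrelated tests.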