[ OST Failure Report ] [ oVirt master ] [ 2017-11-13 ] [add_master_storage_domain]

Test failed: [ add_master_storage_domain ]

Link to suspected patches:
- https://gerrit.ovirt.org/#/c/83849/4

This seems to be a fairly consistent regression, as all patches that follow the patch above exhibit the same issue when tested. The similarly failing patches include:
- https://gerrit.ovirt.org/#/c/83940/2
- https://gerrit.ovirt.org/#/c/83608/7
- https://gerrit.ovirt.org/#/c/83815/3

The following patch is an exception because it simply failed to build:
- https://gerrit.ovirt.org/#/c/83370/9

Link to Job:
http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3770/

Link to all logs:
http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3770/artifact/...

Error snippet from log:

<error>
Fault reason is "Operation Failed". Fault detail is "[Failed to attach Storage due to an error on the Data Center master Storage Domain. -Please activate the master Storage Domain first.]". HTTP response code is 409.
</error>

--
Barak Korren
RHV DevOps team, RHCE, RHCi
Red Hat EMEA
redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted
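For anyone scripting around OST logs, the error snippet in the report has a regular shape: a fault reason, a bracketed fault detail, and an HTTP response code. The following is only a minimal sketch, assuming the message always follows the layout seen in this report; `FAULT_RE` and `parse_fault` are hypothetical names and are not part of OST or the oVirt SDK.

```python
import re

# Hypothetical pattern, derived only from the error text quoted in this report.
# \s+ is used between words so that line-wrapped copies of the message still match.
FAULT_RE = re.compile(
    r'Fault\s+reason\s+is\s+"(?P<reason>.*?)"\.\s+'
    r'Fault\s+detail\s+is\s+"\[(?P<detail>.*?)\]"\.\s+'
    r'HTTP\s+response\s+code\s+is\s+(?P<code>\d+)\.',
    re.DOTALL,
)


def parse_fault(message):
    """Return reason, detail and HTTP code from a fault message, or None."""
    match = FAULT_RE.search(message)
    if match is None:
        return None
    return {
        "reason": match.group("reason"),
        "detail": match.group("detail"),
        "code": int(match.group("code")),
    }


# The message from the report above, copied verbatim.
msg = (
    'Fault reason is "Operation Failed". Fault detail is "[Failed to attach '
    'Storage due to an error on the Data Center master Storage Domain. '
    '-Please activate the master Storage Domain first.]". '
    'HTTP response code is 409.'
)
fault = parse_fault(msg)
```

With the report's message, `fault["code"]` is the integer 409, which a triage script could use to separate conflict-style failures from other HTTP errors.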

On Mon, Nov 13, 2017 at 9:56 AM Barak Korren <bkorren@redhat.com> wrote:
Test failed: [ add_master_storage_domain ]
Link to suspected patches: - https://gerrit.ovirt.org/#/c/83849/4
This fixes grammar in a few error messages. I don't see how this can cause failures unless someone is depending on the error text...
_______________________________________________
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel

On 13 November 2017 at 10:08, Nir Soffer <nsoffer@redhat.com> wrote:
On Mon, Nov 13, 2017 at 9:56 AM Barak Korren <bkorren@redhat.com> wrote:
Test failed: [ add_master_storage_domain ]
Link to suspected patches: - https://gerrit.ovirt.org/#/c/83849/4
This fixes grammar in a few error messages. I don't see how this can cause failures unless someone is depending on the error text...
The previous test, with the engine patches before it, was successful:
http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3769

This one included the following patches, which were the two commits leading up to the failing one:
- https://gerrit.ovirt.org/#/c/83920/4
- https://gerrit.ovirt.org/#/c/83918/2

Subsequent tests with no engine patches were also successful:
http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3780/
so this rules out unrelated infra issues.

I think that is sufficient grounds for suspicion.

It's most definitely not THAT patch. Could be one of the earlier patches in the series, though. Looking.

On Mon, Nov 13, 2017 at 10:54 AM, Barak Korren <bkorren@redhat.com> wrote:
On 13 November 2017 at 10:08, Nir Soffer <nsoffer@redhat.com> wrote:
On Mon, Nov 13, 2017 at 9:56 AM Barak Korren <bkorren@redhat.com> wrote:
Test failed: [ add_master_storage_domain ]
Link to suspected patches: - https://gerrit.ovirt.org/#/c/83849/4
This fixes grammar in a few error messages. I don't see how this can cause failures unless someone is depending on the error text...
The previous test, with the engine patches before it, was successful:
http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3769

This one included the following patches, which were the two commits leading up to the failing one:
- https://gerrit.ovirt.org/#/c/83920/4
- https://gerrit.ovirt.org/#/c/83918/2

Subsequent tests with no engine patches were also successful:
http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3780/
so this rules out unrelated infra issues.
I think that is sufficient grounds for suspicion.

Commit 86bf82e746404145e8b97df46f514086e4f82e69 is probably the offending commit, taking a deeper look now, should have a fix within the hour

On Mon, Nov 13, 2017 at 9:55 AM, Barak Korren <bkorren@redhat.com> wrote:
Test failed: [ add_master_storage_domain ]
Link to suspected patches:
- https://gerrit.ovirt.org/#/c/83849/4

This seems to be a fairly consistent regression as all patches that follow the patch above exhibit the same issue when tested, this list of similarly failing patches includes:
- https://gerrit.ovirt.org/#/c/83940/2
- https://gerrit.ovirt.org/#/c/83608/7
- https://gerrit.ovirt.org/#/c/83815/3

The following patch is an exception because it simply failed to build:
- https://gerrit.ovirt.org/#/c/83370/9
Link to Job: http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3770/
Link to all logs:
http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3770/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/
Error snippet from log:
<error>
Fault reason is "Operation Failed". Fault detail is "[Failed to attach Storage due to an error on the Data Center master Storage Domain. -Please activate the master Storage Domain first.]". HTTP response code is 409.
</error>

On 13 November 2017 at 11:10, Allon Mureinik <amureini@redhat.com> wrote:
Commit 86bf82e746404145e8b97df46f514086e4f82e69 is probably the offending commit, taking a deeper look now, should have a fix within the hour
Hmm, I see what happened there. That commit is https://gerrit.ovirt.org/c/83920, so it was tested at:
http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3769
and passed.

BUT, it was tested along with https://gerrit.ovirt.org/#/c/83918/2, which was merged earlier as far as git ordering goes, but was somehow assumed to be a later patch. When we test multiple patches from the same project we essentially use packages built from the one we deem to be the latest. So in this case only the packages from 83918 were actually tested.

We went to great lengths to preserve correct patch ordering by preserving the order of event notifications reaching us from gerrit. But here it seems we ended up with out-of-order patches. Do you have any idea how we ended up with this? Do you remember the sequence in which you clicked the 'merge' button on the different patches?
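To make the ordering problem described in this message concrete, here is a minimal sketch of the two ways of picking the "latest" patch in a batch. It is not the change-queue code. The `Change` class, both helper functions, the commit ids, and the assumption that 83918 is the direct git parent of 83920 are all hypothetical; so is the assumed arrival order of the merge events, which is only chosen to match the outcome described here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Change:
    number: int            # gerrit change number
    commit: str            # hypothetical commit id, for illustration only
    parent: Optional[str]  # commit id of the git parent, if known


def latest_by_event_order(events: List[Change]) -> Change:
    """Treat the change whose merge event arrived last as the latest."""
    return events[-1]


def latest_by_git_order(changes: List[Change]) -> Change:
    """Pick the change that no other change in the batch has as its parent."""
    parents = {c.parent for c in changes}
    tips = [c for c in changes if c.commit not in parents]
    if len(tips) != 1:
        raise ValueError("changes do not form a single linear chain")
    return tips[0]


# Assumed arrival order of merge events: 83920 first, then 83918,
# even though 83918 comes earlier in git history.
batch = [
    Change(number=83920, commit="c-83920", parent="c-83918"),
    Change(number=83918, commit="c-83918", parent="base"),
]

picked_by_events = latest_by_event_order(batch)  # selects 83918
picked_by_git = latest_by_git_order(batch)       # selects 83920
```

Under these assumptions, selecting by event order builds packages from 83918 and never exercises the code in 83920, which would explain why the regression only surfaced in the next run.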

I merged 83920 with dependencies. Perhaps gerrit doesn't handle such chains properly (or perhaps our CI is misusing the API?).

WRT current issues, posted two patches upstream:
- https://gerrit.ovirt.org/#/c/83986/ - fixes the GWT build that was broken this morning, preventing a proper build altogether
- https://gerrit.ovirt.org/#/c/83987/ - should fix the OST regression you noted.

I'm waiting for the CI to finish running on both and will merge.

On Mon, Nov 13, 2017 at 11:26 AM, Barak Korren <bkorren@redhat.com> wrote:
On 13 November 2017 at 11:10, Allon Mureinik <amureini@redhat.com> wrote:
Commit 86bf82e746404145e8b97df46f514086e4f82e69 is probably the offending commit, taking a deeper look now, should have a fix within the hour
Hmm, I see what happened there, that commit is https://gerrit.ovirt.org/c/83920, so it was tested at: http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3769 and passed.
BUT, it was tested along with https://gerrit.ovirt.org/#/c/83918/2 that was merged earlier as far as git ordering goes, but somehow was assumed to be a later patch. When we test multiple patches from the same project we essentially use packages built from the one we deem to be the latest. So in this case only the packages from 83918 were actually tested.
We went to great lengths to preserve correct patch ordering by preserving the order of event notifications reaching us from gerrit. But here it seems we ended up with out-of-order patches. Do you have any idea how we ended up with this? Do you remember the sequence in which you clicked the 'merge' button on the different patches?

Both are merged, let's see how OST fares

On Mon, Nov 13, 2017 at 12:00 PM, Allon Mureinik <amureini@redhat.com> wrote:
I merged 83920 with dependencies. Perhaps gerrit doesn't handle such chains properly (or perhaps our CI is misusing the API?).
WRT current issues, posted two patches upstream:
- https://gerrit.ovirt.org/#/c/83986/ - fixes the GWT build that was broken this morning, preventing a proper build altogether
- https://gerrit.ovirt.org/#/c/83987/ - should fix the OST regression you noted.
I'm waiting for the CI to finish running on both and will merge.

On 13 November 2017 at 12:44, Allon Mureinik <amureini@redhat.com> wrote:
Both are merged, let's see how OST fares
The 1st one failed, but the 2nd passed:
http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3792

So we're over this hill.
participants (3)
- Allon Mureinik
- Barak Korren
- Nir Soffer