[ OST Failure Report ] [ oVirt Master ] [ Aug. 8th 2017 ] [ 001_upgrade_engine.test_initialize_engine ]

Hi,
We see a sporadic failure in the upgrade test. From what I can see in the log, it is related to the firewalld package.
Test failed: 001_upgrade_engine.test_initialize_engine
Link to suspected patches:
Link to Job: http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/1684/
Link to all logs: http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/1684/artifact/
Error snippet from the log:
<error>
2017-08-08 04:09:11,006-0400 DEBUG otopi.plugins.otopi.network.firewalld plugin.execute:926 execute-output: ('/bin/firewall-cmd', '--zone', u'public', '--permanent', '--add-service', 'ovirt-postgres') stderr:
ESC[91mError: Action org.fedoraproject.FirewallD1.all is not registeredESC[00m
2017-08-08 04:09:11,007-0400 DEBUG otopi.context context._executeMethod:142 method exception
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/otopi/context.py", line 132, in _executeMethod
    method['method']()
  File "/usr/share/otopi/plugins/otopi/network/firewalld.py", line 334, in _closeup
    '--add-service', service,
  File "/usr/lib/python2.7/site-packages/otopi/plugin.py", line 931, in execute
    command=args[0],
RuntimeError: Command '/bin/firewall-cmd' failed to execute
</error>
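For anyone triaging similar runs, the failure signature in the snippet above can be searched for mechanically. The following is only a minimal sketch, not part of otopi or OST: the function name `find_unregistered_actions` is hypothetical, and the code assumes the log may contain either real ANSI colour sequences or the literal "ESC[91m" rendering seen in the snippet.

```python
import re

# Matches real ANSI colour sequences (\x1b[91m) as well as the literal
# "ESC[91m" rendering that appears in the pasted log snippet.
_ANSI = re.compile(r"(?:\x1b|ESC)\[[0-9;]*m")

# The firewall-cmd error line: "Action <action-id> is not registered".
_NOT_REGISTERED = re.compile(r"Action (\S+) is not registered")


def find_unregistered_actions(log_text):
    """Return the action ids reported as 'not registered' (hypothetical helper)."""
    cleaned = _ANSI.sub("", log_text)
    return _NOT_REGISTERED.findall(cleaned)
```

Calling it on the text of a setup log returns a list of action ids; an empty list means this particular signature was not found.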

On Tue, Aug 8, 2017 at 12:21 PM, Dafna Ron <dron@redhat.com> wrote:
Hi,
We see a sporadic failure in the upgrade test. from what I can see from the log it is related to the firewalD package.
Test failed: 001_upgrade_engine.test_initialize_engine
Link to suspected patches:
Link to Job: http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/1684/
Link to all logs: http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/1684/artifact/
Error snippet from the log:
<error>
2017-08-08 04:09:11,006-0400 DEBUG otopi.plugins.otopi.network.firewalld plugin.execute:926 execute-output: ('/bin/firewall-cmd', '--zone', u'public', '--permanent', '--add-service', 'ovirt-postgres') stderr:
ESC[91mError: Action org.fedoraproject.FirewallD1.all is not registeredESC[00m
2017-08-08 04:09:11,007-0400 DEBUG otopi.context context._executeMethod:142 method exception
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/otopi/context.py", line 132, in _executeMethod
    method['method']()
  File "/usr/share/otopi/plugins/otopi/network/firewalld.py", line 334, in _closeup
    '--add-service', service,
  File "/usr/lib/python2.7/site-packages/otopi/plugin.py", line 931, in execute
    command=args[0],
RuntimeError: Command '/bin/firewall-cmd' failed to execute
</error>
Google shows we are not the only ones suffering from it: see [1] and the workarounds suggested.
Y.
[1] https://github.com/openshift/openshift-ansible/issues/3213
_______________________________________________ Devel mailing list Devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/devel

On Tue, Aug 8, 2017 at 12:27 PM, Yaniv Kaul <ykaul@redhat.com> wrote:
On Tue, Aug 8, 2017 at 12:21 PM, Dafna Ron <dron@redhat.com> wrote:
Hi,
We see a sporadic failure in the upgrade test. From what I can see in the log, it is related to the firewalld package.
Test failed: 001_upgrade_engine.test_initialize_engine
Link to suspected patches:
Link to Job: http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/1684/
Link to all logs: http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/1684/artifact/
Downloaded all exported artifacts, unzipped, and:
Error snippet from the log:
<error>
2017-08-08 04:09:11,006-0400 DEBUG otopi.plugins.otopi.network.firewalld plugin.execute:926 execute-output: ('/bin/firewall-cmd', '--zone', u'public', '--permanent', '--add-service', 'ovirt-postgres') stderr:
ESC[91mError: Action org.fedoraproject.FirewallD1.all is not registeredESC[00m
2017-08-08 04:09:11,007-0400 DEBUG otopi.context context._executeMethod:142 method exception
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/otopi/context.py", line 132, in _executeMethod
    method['method']()
  File "/usr/share/otopi/plugins/otopi/network/firewalld.py", line 334, in _closeup
    '--add-service', service,
  File "/usr/lib/python2.7/site-packages/otopi/plugin.py", line 931, in execute
    command=args[0],
RuntimeError: Command '/bin/firewall-cmd' failed to execute
</error>
can't find this error.
In the console log [1] I see:
08:09:14 [upgrade-from-prevrelease-suit] Error occured, aborting
08:09:14 [upgrade-from-prevrelease-suit] Traceback (most recent call last):
08:09:14 [upgrade-from-prevrelease-suit] File "/usr/lib/python2.7/site-packages/ovirtlago/cmd.py", line 360, in do_run
08:09:14 [upgrade-from-prevrelease-suit] self.cli_plugins[args.ovirtverb].do_run(args)
08:09:14 [upgrade-from-prevrelease-suit] File "/usr/lib/python2.7/site-packages/lago/plugins/cli.py", line 184, in do_run
08:09:14 [upgrade-from-prevrelease-suit] self._do_run(**vars(args))
08:09:14 [upgrade-from-prevrelease-suit] File "/usr/lib/python2.7/site-packages/lago/utils.py", line 501, in wrapper
08:09:14 [upgrade-from-prevrelease-suit] return func(*args, **kwargs)
08:09:14 [upgrade-from-prevrelease-suit] File "/usr/lib/python2.7/site-packages/lago/utils.py", line 512, in wrapper
08:09:14 [upgrade-from-prevrelease-suit] return func(*args, prefix=prefix, **kwargs)
08:09:14 [upgrade-from-prevrelease-suit] File "/usr/lib/python2.7/site-packages/ovirtlago/cmd.py", line 99, in do_ovirt_runtest
08:09:14 [upgrade-from-prevrelease-suit] raise RuntimeError('Some tests failed')
08:09:14 [upgrade-from-prevrelease-suit] RuntimeError: Some tests failed
But it's hard to understand what failed.
If I look around in the web interface, I can find this in [2].
So for some reason, downloading "all files in zip" misses many files. Not sure why. IIRC we already had similar cases in the past. When I press "(all files in zip)" in [3], I get a 134016-byte archive.zip, which contains only "basic-suit-master-el7" (and "JenkinsTestedChangeList.dat"), but no "upgrade-from-prevrelease-suit-master-el7" or "upgrade-from-release-suit-master-el7". Please check.
[1] http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/1684/consoleFu...
[2] http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/1684/artifact/...
[3] http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/1684/artifact/
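The incomplete archive.zip described above could be detected automatically before anyone digs through it by hand. What follows is a rough sketch under stated assumptions, not an existing Jenkins or lago feature: the function name `suites_present` is hypothetical, and a suite is treated as present if its directory name appears as any path component inside the archive, since the exact layout of the zip is not shown in this thread.

```python
import zipfile


def suites_present(zip_file, expected):
    """Map each expected suite directory name to True/False (hypothetical helper).

    zip_file may be a path or a file-like object. A suite counts as present
    if its name occurs as a path component of at least one archive entry.
    """
    components = set()
    with zipfile.ZipFile(zip_file) as archive:
        for name in archive.namelist():
            components.update(part for part in name.split("/") if part)
    return {suite: suite in components for suite in expected}
```

For the case reported here, one would pass the three suite directory names mentioned above and expect only "basic-suit-master-el7" to map to True.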
Google shows we are not the only ones suffering from it: see [1] and the workarounds suggested.
The workaround [4] seems to have been to restart polkitd. Do we want to do this? Where? If in otopi, I'd consider this a somewhat ugly hack, and would rather not, unless we have a clear and realistic reproducer. If in CI, not sure where/when exactly.
Best,
[4] https://github.com/openshift/openshift-ansible/pull/3831/commits/457605bc7b8...
Y.
[1] https://github.com/openshift/openshift-ansible/issues/3213
_______________________________________________ Infra mailing list Infra@ovirt.org http://lists.ovirt.org/mailman/listinfo/infra
-- Didi
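To make the "restart polkitd" question above more concrete, here is one possible shape such a workaround could take if it were tried on the CI side. It is a sketch only and not what otopi does: the helper names, the retry count, the delay, and in particular the unit name in `restart_cmd` ("polkit") are assumptions that would need checking on the actual hosts. The firewall-cmd arguments are the ones shown in the log snippet.

```python
import subprocess
import time

# Substring of the firewall-cmd stderr seen in the failing run.
POLKIT_MARKER = "is not registered"


def run(cmd):
    """Run cmd and return (returncode, stderr text)."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    _, err = proc.communicate()
    return proc.returncode, err.decode("utf-8", "replace")


def add_service_with_polkit_retry(service, zone="public", runner=run,
                                  restart_cmd=("systemctl", "restart", "polkit"),
                                  retries=1, delay=2.0, sleep=time.sleep):
    """Add a permanent firewalld service, restarting polkit on the known error.

    Returns the number of polkit restarts that were needed. Raises
    RuntimeError on any other failure or once retries are exhausted.
    The unit name in restart_cmd is an assumption, not taken from the thread.
    """
    cmd = ["firewall-cmd", "--zone", zone, "--permanent", "--add-service", service]
    attempts = 0
    while True:
        rc, err = runner(cmd)
        if rc == 0:
            return attempts
        if POLKIT_MARKER not in err or attempts >= retries:
            raise RuntimeError("%s failed: %s" % (" ".join(cmd), err.strip()))
        runner(list(restart_cmd))  # the workaround: restart polkit
        sleep(delay)               # give polkit time to re-register actions
        attempts += 1
```

The `runner` and `sleep` parameters exist only so the logic can be exercised without touching a real firewall. Whether retrying like this is acceptable at all is exactly the open question in this thread.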

There is something going on with the artifact logs: they are only partially downloaded (I sent a mail to the lago team to have a look).
Here is the link to the setup logs: http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/1684/artifact/...
The error appears in ovirt-engine-setup-20170808040357-4iiirj.log: http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/1684/artifact/...
Hope this helps.
On 08/08/2017 01:30 PM, Yedidyah Bar David wrote:
So for some reason, downloading "all files in zip" misses many files. Not sure why. IIRC we already had similar cases in the past. When I press "(all files in zip)" in [3], I get a 134016 bytes archive.zip, which only has inside it "basic-suit-master-el7" (and "JenkinsTestedChangeList.dat"), no "upgrade-from-prevrelease-suit-master-el7" or "upgrade-from-release-suit-master-el7". Please check.
[3] http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/1684/artifact/
participants (3)
- Dafna Ron
- Yaniv Kaul
- Yedidyah Bar David