[Ovirt] [CQ weekly status] [10-08-2019]
by Dusan Fodor
Hi,
This mail is to provide the current status of CQ and allow people to review
status before and after the weekend.
Please refer to the colour map below for further information on the meaning of
the colours.
*CQ-4.2*: GREEN (#3)
No failure during this week.
*CQ-4.3*: RED (#1)
Last failure was on 10-08 for ovirt-ansible-image-template in
verify_glance_import.
This issue is ongoing; investigation will continue under the thread [OST
Failure Report] [oVirt master&4.3] [09-08-2019] [verify_glance_import]
*CQ-Master:* RED (#1)
Last failure was on 10-08 for ovirt-ansible-image-template in
verify_glance_import.
This issue is ongoing; investigation will continue under the thread [OST
Failure Report] [oVirt master&4.3] [09-08-2019] [verify_glance_import]
Current running jobs for 4.2 [1], 4.3 [2] and master [3] can be found
here:
[1] http://jenkins.ovirt.org/view/Change%20queue%20jobs/job/ovirt-4.2_change-queue-tester/
[2] https://jenkins.ovirt.org/view/Change%20queue%20jobs/job/ovirt-4.3_change-queue-tester/
[3] http://jenkins.ovirt.org/view/Change%20queue%20jobs/job/ovirt-master_change-queue-tester/
Have a nice week!
Dusan
-------------------------------------------------------------------------------------------------------------------
COLOUR MAP
Green = job has been passing successfully
** green for more than 3 days may suggest we need a review of our test
coverage
1. 1-3 days GREEN (#1)
2. 4-7 days GREEN (#2)
3. Over 7 days GREEN (#3)
Yellow = intermittent failures for different projects but no lasting or
current regressions
** intermittent would be a healthy project as we expect a number of
failures during the week
** I will not report any of the solved failures or regressions.
1. Solved job failures YELLOW (#1)
2. Solved regressions YELLOW (#2)
Red = job has been failing
** Active Failures. The colour will change based on the amount of time the
project/s has been broken. Only active regressions would be reported.
1. 1-3 days RED (#1)
2. 4-7 days RED (#2)
3. Over 7 days RED (#3)
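The GREEN and RED thresholds above can be read as a simple lookup. The sketch below is only an illustration of that reading; the function name `cq_status` is hypothetical and not part of any CQ tooling, and YELLOW is left out because its levels are not based on a number of days.

```python
def cq_status(colour, days):
    """Return a label such as 'RED (#2)' for a GREEN or RED streak.

    Hypothetical helper: it only mirrors the day ranges in the colour map.
    """
    colour = colour.upper()
    if colour not in ("GREEN", "RED"):
        raise ValueError("only GREEN and RED are graded by number of days")
    if days < 1:
        raise ValueError("days must be at least 1")
    if days <= 3:
        level = 1
    elif days <= 7:
        level = 2
    else:
        level = 3
    return "{} (#{})".format(colour, level)
```

For example, a job that has been failing for five days would be reported as `cq_status("red", 5)`, that is "RED (#2)".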
_______________________________________________
Devel mailing list -- devel(a)ovirt.org
To unsubscribe send an email to devel-leave(a)ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/devel@ovirt.org/message/YCNCKRK3G4EJXA3OCYAUS4VMKRDA67F4/
Ovirt Manager - FATAL: the database system is in recovery mode
by Sameer Sardar
PACKAGE_NAME="ovirt-engine"
PACKAGE_VERSION="3.6.2.6"
PACKAGE_DISPLAY_VERSION="3.6.2.6-1.el6"
OPERATING SYSTEM="CentOS 6.7"
It has been running error-free for over 3 years now. Over the past few weeks, the oVirt manager application has frequently been losing its connection to its own database, which resides on the same server. It throws errors saying: “Caused by org.postgresql.util.PSQLException: FATAL: the database system is in recovery mode”. The only way we have found to recover from this is to hard reboot the server, after which it works fine for a couple of days before throwing these errors again.
Here are some logs:
Caused by: org.postgresql.util.PSQLException: FATAL: the database system is in recovery mode
at org.postgresql.core.v3.ConnectionFactoryImpl.doAuthentication(ConnectionFactoryImpl.java:293)
at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:108)
at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:66)
at org.postgresql.jdbc2.AbstractJdbc2Connection.<init>(AbstractJdbc2Connection.java:125)
at org.postgresql.jdbc3.AbstractJdbc3Connection.<init>(AbstractJdbc3Connection.java:30)
at org.postgresql.jdbc3g.AbstractJdbc3gConnection.<init>(AbstractJdbc3gConnection.java:22)
at org.postgresql.jdbc4.AbstractJdbc4Connection.<init>(AbstractJdbc4Connection.java:32)
at org.postgresql.jdbc4.Jdbc4Connection.<init>(Jdbc4Connection.java:24)
at org.postgresql.Driver.makeConnection(Driver.java:393)
at org.postgresql.Driver.connect(Driver.java:267)
at org.jboss.jca.adapters.jdbc.local.LocalManagedConnectionFactory.createLocalManagedConnection(LocalManagedConnectionFactory.java:322)
OST's basic suite UI sanity tests optimization
by Marcin Sobczyk
Hi,
_TL;DR_ Let's cut the running time of '008_basic_ui_sanity.py' by more
than 3 minutes by sacrificing Firefox and Chrome screenshots in favor of
Chromium.
During the OST hackathon in Brno this year, I saw an opportunity to
optimize the basic UI sanity tests in the basic suite.
We currently run them by setting up a Selenium grid using 3
docker containers with a dedicated network... that's insanity! (pun
intended).
Let's take a look at the running time of the '008_basic_ui_sanity.py' scenario
(https://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-te...):
01:31:50 @ Run test: 008_basic_ui_sanity.py:
01:31:50 nose.config: INFO: Ignoring files matching ['^\\.', '^_',
'^setup\\.py$']
01:31:50 # init:
01:31:50 # init: Success (in 0:00:00)
01:31:50 # start_grid:
01:34:05 # start_grid: Success (in 0:02:15)
01:34:05 # initialize_chrome:
01:34:18 # initialize_chrome: Success (in 0:00:13)
01:34:18 # login:
01:34:27 # login: Success (in 0:00:08)
01:34:27 # left_nav:
01:34:45 # left_nav: Success (in 0:00:18)
01:34:45 # close_driver:
01:34:46 # close_driver: Success (in 0:00:00)
01:34:46 # initialize_firefox:
01:35:02 # initialize_firefox: Success (in 0:00:16)
01:35:02 # login:
01:35:11 # login: Success (in 0:00:08)
01:35:11 # left_nav:
01:35:29 # left_nav: Success (in 0:00:18)
01:35:29 # cleanup:
01:35:36 # cleanup: Success (in 0:00:06)
01:35:36 # Results located at
/dev/shm/ost/deployment-basic-suite-master/008_basic_ui_sanity.py.junit.xml
01:35:36 @ Run test: 008_basic_ui_sanity.py: Success (in 0:03:45)
Starting the Selenium grid takes 2:15 out of 3:45 of total running time!
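To make that ratio concrete, here is a small sketch that extracts the durations from the 'Success (in H:MM:SS)' lines of the console output above and computes the share taken by 'start_grid'. The helper names ('parse_durations', 'step_share') are made up for this illustration and are not part of OST.

```python
import re
from datetime import timedelta

# Success lines copied from the console output above, timestamps stripped.
LOG = """\
# init: Success (in 0:00:00)
# start_grid: Success (in 0:02:15)
# initialize_chrome: Success (in 0:00:13)
# login: Success (in 0:00:08)
# left_nav: Success (in 0:00:18)
# close_driver: Success (in 0:00:00)
# initialize_firefox: Success (in 0:00:16)
# login: Success (in 0:00:08)
# left_nav: Success (in 0:00:18)
# cleanup: Success (in 0:00:06)
@ Run test: 008_basic_ui_sanity.py: Success (in 0:03:45)
"""

LINE_RE = re.compile(
    r"^(?P<marker>[#@]) (?P<name>.+): Success "
    r"\(in (?P<h>\d+):(?P<m>\d{2}):(?P<s>\d{2})\)$"
)


def parse_durations(log):
    """Return (marker, name, duration) for every 'Success (in H:MM:SS)' line."""
    entries = []
    for line in log.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue
        duration = timedelta(
            hours=int(match.group("h")),
            minutes=int(match.group("m")),
            seconds=int(match.group("s")),
        )
        entries.append((match.group("marker"), match.group("name"), duration))
    return entries


def step_share(log, step):
    """Return (step time, reported total, fraction) for one named step."""
    entries = parse_durations(log)
    step_time = sum(
        (d for marker, name, d in entries if marker == "#" and name == step),
        timedelta(),
    )
    # The '@' line carries the total reported for the whole scenario.
    total = next(d for marker, _, d in entries if marker == "@")
    return step_time, total, step_time / total


if __name__ == "__main__":
    grid, total, share = step_share(LOG, "start_grid")
    # Prints: start_grid: 0:02:15 of 0:03:45 (60%)
    print("start_grid: {} of {} ({:.0%})".format(grid, total, share))
```

So roughly 60% of the scenario is spent just bringing the grid up.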
I've investigated a lot of approaches and came up with something like this:
* install 'chromium-headless' package on engine VM
* download 'chromedriver' and 'selenium hub' jar and deploy them in
'/var/opt/' on engine's VM
* run 'selenium.jar' on engine VM from '008_basic_ui_sanity.py' by
using Lago's ssh
* connect to the Selenium instance running on the engine in
'008_basic_ui_sanity.py'
* make screenshots
This series of patches represents the changes:
https://gerrit.ovirt.org/#/q/topic:selenium-on-engine+(status:open+OR+sta....
This is the new running time
(https://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_manual/4195/):
20:13:26 @ Run test: 008_basic_ui_sanity.py:
20:13:26 nose.config: INFO: Ignoring files matching ['^\\.', '^_',
'^setup\\.py$']
20:13:26 # init:
20:13:26 # init: Success (in 0:00:00)
20:13:26 # make_screenshots:
20:13:27 * Retrying (Retry(total=2, connect=None, read=None,
redirect=None, status=None)) after connection broken by
'NewConnectionError('<urllib3.connection.HTTPConnection object at
0x7fdb6004f8d0>: Failed to establish a new connection: [Errno 111]
Connection refused',)': /wd/hub
20:13:27 * Retrying (Retry(total=1, connect=None, read=None,
redirect=None, status=None)) after connection broken by
'NewConnectionError('<urllib3.connection.HTTPConnection object at
0x7fdb6004fa10>: Failed to establish a new connection: [Errno 111]
Connection refused',)': /wd/hub
20:13:27 * Retrying (Retry(total=0, connect=None, read=None,
redirect=None, status=None)) after connection broken by
'NewConnectionError('<urllib3.connection.HTTPConnection object at
0x7fdb6004fb50>: Failed to establish a new connection: [Errno 111]
Connection refused',)': /wd/hub
20:13:28 * Redirecting http://192.168.201.4:4444/wd/hub ->
http://192.168.201.4:4444/wd/hub/static/resource/hub.html
20:14:02 # make_screenshots: Success (in 0:00:35)
20:14:02 # Results located at
/dev/shm/ost/deployment-basic-suite-master/008_basic_ui_sanity.py.junit.xml
20:14:02 @ Run test: 008_basic_ui_sanity.py: Success (in 0:00:35)
(The 'NewConnectionError' retries are the test waiting for the Selenium hub
to be up and running; I can silence these later.)
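One possible way to silence those retries is to poll the hub URL quietly before creating the remote driver. This is only a sketch using the standard library, not what the patches currently do; 'wait_for_http' and its parameters are hypothetical names.

```python
import time
import urllib.error
import urllib.request


def wait_for_http(url, timeout=60.0, interval=1.0, request_timeout=5.0):
    """Poll url until something answers over HTTP.

    Returns True once the server responds (any status code),
    False if nothing answered before the timeout expired.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=request_timeout):
                return True
        except urllib.error.HTTPError:
            # The server answered with an error status, so it is listening.
            return True
        except (urllib.error.URLError, OSError):
            # Connection refused or similar: the hub is not up yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

In the test this could be called with the hub address from the log above (http://192.168.201.4:4444/wd/hub) before connecting to it.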
And the screenshots are here:
https://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-te...
_The pros:_
* we cut the running time by more than 3 minutes
_The cons:_
* we don't get Firefox or Chrome screenshots - we get Chromium
screenshots (although AFAIK, QE has many more Selenium tests which
cover both Firefox and Chrome)
* we pollute the engine VM with the 'chromium-headless' package and deps
(in total: 'chromium-headless', 'chromium-common', 'flac-libs' and
'minizip'), although we can remove these after the tests
_Some design choices explained:_
Q: Why engine VM?
A: Because the engine VM already has 'X11' libs. We could install
'chromium-headless' (and even other browsers) on our Jenkins executors,
but that would mess them up a lot.
Q: Why Chromium?
A: Because it has a separate 'headless' package.
Q: Why not use the 'chromedriver' RPM instead of the
https://chromedriver.storage.googleapis.com Chromedriver builds?
A: Because the RPM version pulls in a lot of extra dependencies even on the
engine VM ('gtk3', 'cairo' etc.). Builds from the URL are the official
Google Chromedriver builds, they contain a single binary, and they work
for us.
_What still needs to be polished with the patches:_
* Currently the 'setup_engine_selenium.sh' script downloads
'selenium.jar' and 'chromedriver.zip' on every run (even with these
downloads we get much faster set-up times) - we should bake these into
the engine VM image template.
* The 'selenium_hub_running' function in 'selenium_on_engine.py' is
hackish - the ability to run an ssh command with a context manager
(and auto-terminate it on exit) should be part of Lago. Can be
refactored.
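To illustrate the kind of context manager meant in the last point, here is a minimal sketch built on a local 'subprocess.Popen' as a stand-in. It is not Lago's API and not the code in 'selenium_on_engine.py'; 'running_command' and the ssh target in the comment are hypothetical.

```python
import contextlib
import subprocess


@contextlib.contextmanager
def running_command(argv, grace=5.0):
    """Start argv in the background and make sure it is stopped on exit."""
    proc = subprocess.Popen(argv)
    try:
        yield proc
    finally:
        if proc.poll() is None:
            # Ask politely first, then force-kill if it does not exit in time.
            proc.terminate()
            try:
                proc.wait(timeout=grace)
            except subprocess.TimeoutExpired:
                proc.kill()
                proc.wait()


# Hypothetical usage (host name and exact command line are assumptions):
#
# with running_command(["ssh", "root@engine", "java", "-jar", "/var/opt/selenium.jar"]):
#     ...  # connect to the hub and make screenshots
```

Note that this only terminates the local ssh client; whether the remote process goes away with it depends on how ssh is invoked, which is one more reason to have this handled properly inside Lago.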
Questions, comments, reviews are welcome.
Regards, Marcin
Failure during OST run
by Andrej Cernek
Hi,
I am failing to run OST:
* [Thread-2] Deploy VM lago-network-suite-master-engine: ERROR (in
0:02:13)
# Deploy environment: ERROR (in 0:02:14)
@ Deploy oVirt environment: ERROR (in 0:02:14)
The error seems to be caused by an unavailable package:
Error: Package:
ovirt-engine-metrics-1.3.4-0.0.master.20190804083458.git68b317a.el7.noarch
(alocalsync)
Requires: ansible >= 2.8.3
Available: ansible-2.4.2.0-2.el7.noarch (extras)
ansible = 2.4.2.0-2.el7
Installing: ansible-2.8.2-1.el7.noarch (alocalsync)
ansible = 2.8.2-1.el7
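To spell out the mismatch: the package requires ansible >= 2.8.3, but the newest build offered is 2.8.2. The sketch below compares only the dotted upstream version numbers; it is a simplification and not RPM's real comparison (it ignores epoch and release), and the helper names are hypothetical.

```python
def parse_version(version):
    """Turn a dotted numeric version such as '2.8.3' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def satisfies_minimum(available, required):
    """Return True if the available version is at least the required one."""
    return parse_version(available) >= parse_version(required)


if __name__ == "__main__":
    required = "2.8.3"
    # Versions taken from the yum output above (release suffixes dropped).
    for candidate in ("2.4.2.0", "2.8.2"):
        print(candidate, ">=", required, "->", satisfies_minimum(candidate, required))
```

Neither candidate satisfies the requirement, which matches the yum error.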
Did anyone encounter similar issues?
Thanks,
Regards,
Andrej
[Ovirt] [CQ weekly status] [02-08-2019]
by Dusan Fodor
Hi,
This mail is to provide the current status of CQ and allow people to review
status before and after the weekend.
Please refer to the colour map below for further information on the meaning of
the colours.
*CQ-4.2*: RED (#1)
Last failure was on 01-08 for ovirt-ansible-hosted-engine-setup, caused by
a missing dependency; a patch to fix this is pending.
*CQ-4.3*: RED (#1)
Last failure was on 02-08 for vdsm, caused by a missing dependency; a patch
to fix this is pending.
*CQ-Master:* RED (#1)
Last failure was on 02-08 for ovirt-engine due to a failure in build-artifacts,
caused by a Gerrit issue that was reported to Evgheni.
Current running jobs for 4.2 [1], 4.3 [2] and master [3] can be found
here:
[1] http://jenkins.ovirt.org/view/Change%20queue%20jobs/job/ovirt-4.2_change-...
[2] https://jenkins.ovirt.org/view/Change%20queue%20jobs/job/ovirt-4.3_change...
[3] http://jenkins.ovirt.org/view/Change%20queue%20jobs/job/ovirt-master_chan...
Have a nice week!
Dusan
-------------------------------------------------------------------------------------------------------------------
COLOUR MAP
Green = job has been passing successfully
** green for more than 3 days may suggest we need a review of our test
coverage
1. 1-3 days GREEN (#1)
2. 4-7 days GREEN (#2)
3. Over 7 days GREEN (#3)
Yellow = intermittent failures for different projects but no lasting or
current regressions
** intermittent would be a healthy project as we expect a number of
failures during the week
** I will not report any of the solved failures or regressions.
1. Solved job failures YELLOW (#1)
2. Solved regressions YELLOW (#2)
Red = job has been failing
** Active Failures. The colour will change based on the amount of time the
project/s has been broken. Only active regressions would be reported.
1. 1-3 days RED (#1)
2. 4-7 days RED (#2)
3. Over 7 days RED (#3)
Network tests failing randomly again: AssertionError: '192.168.98.1/29' unexpectedly found in ['192.168.99.1/29', '192.168.98.1/29', 'fe80::3c92:48ff:fecd:8366/64']
by Nir Soffer
This used to happen randomly in the past, and started to happen again.
I can ignore this failure and merge, but this may fail the change queue.
Build:
https://jenkins.ovirt.org/job/vdsm_standard-check-patch/9520//artifact/ch...
======================================================================
FAIL: test_add_delete_ipv4 (network.ip_address_test.IPAddressTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jenkins/workspace/vdsm_standard-check-patch/vdsm/tests/network/ip_address_test.py", line 222, in test_add_delete_ipv4
    self._test_add_delete(IPV4_A_WITH_PREFIXLEN, IPV4_B_WITH_PREFIXLEN)
  File "/home/jenkins/workspace/vdsm_standard-check-patch/vdsm/tests/network/ip_address_test.py", line 247, in _test_add_delete
    self._assert_has_no_address(nic, ip_b)
  File "/home/jenkins/workspace/vdsm_standard-check-patch/vdsm/tests/network/ip_address_test.py", line 344, in _assert_has_no_address
    self._assert_address_not_in(address_with_prefixlen, addresses)
  File "/home/jenkins/workspace/vdsm_standard-check-patch/vdsm/tests/network/ip_address_test.py", line 352, in _assert_address_not_in
    self.assertNotIn(address_with_prefixlen, addresses_list)
AssertionError: '192.168.98.1/29' unexpectedly found in ['192.168.99.1/29', '192.168.98.1/29', 'fe80::3c92:48ff:fecd:8366/64']
-------------------- >> begin captured logging << --------------------
2019-08-05 12:27:49,718 DEBUG (MainThread) [root] /sbin/ip link add
name dummy_GmE1I type dummy (cwd None) (cmdutils:130)
2019-08-05 12:27:49,731 DEBUG (MainThread) [root] SUCCESS: <err> = '';
<rc> = 0 (cmdutils:138)
2019-08-05 12:27:49,733 DEBUG (netlink/events) [root] START thread
<Thread(netlink/events, started daemon 140511381804800)> (func=<bound
method Monitor._scan of <vdsm.network.netlink.monitor.Monitor object
at 0x7fcb7bbfd0d0>>, args=(), kwargs={}) (concurrent:193)
2019-08-05 12:27:49,734 DEBUG (MainThread) [root] /sbin/ip link set
dev dummy_GmE1I up (cwd None) (cmdutils:130)
2019-08-05 12:27:49,746 DEBUG (MainThread) [root] SUCCESS: <err> = '';
<rc> = 0 (cmdutils:138)
2019-08-05 12:27:49,749 DEBUG (netlink/events) [root] FINISH thread
<Thread(netlink/events, started daemon 140511381804800)>
(concurrent:196)
2019-08-05 12:27:49,755 DEBUG (MainThread) [root] /sbin/ip -4 addr add
dev dummy_GmE1I 192.168.99.1/29 (cwd None) (cmdutils:130)
2019-08-05 12:27:49,763 DEBUG (MainThread) [root] SUCCESS: <err> = '';
<rc> = 0 (cmdutils:138)
2019-08-05 12:27:49,767 DEBUG (MainThread) [root] /sbin/ip -4 addr add
dev dummy_GmE1I 192.168.98.1/29 (cwd None) (cmdutils:130)
2019-08-05 12:27:49,778 DEBUG (MainThread) [root] SUCCESS: <err> = '';
<rc> = 0 (cmdutils:138)
2019-08-05 12:27:49,785 DEBUG (MainThread) [root] /sbin/ip -4 addr del
dev dummy_GmE1I 192.168.98.1/29 (cwd None) (cmdutils:130)
2019-08-05 12:27:49,796 DEBUG (MainThread) [root] SUCCESS: <err> = '';
<rc> = 0 (cmdutils:138)
2019-08-05 12:27:49,803 DEBUG (MainThread) [root] /sbin/ip link del
dev dummy_GmE1I (cwd None) (cmdutils:130)
2019-08-05 12:27:49,824 DEBUG (MainThread) [root] SUCCESS: <err> = '';
<rc> = 0 (cmdutils:138)