[oVirt Jenkins] ovirt-system-tests_performance-suite-master - Build # 1521 - Still Failing!
by jenkins@jenkins.phx.ovirt.org
Project: https://jenkins.ovirt.org/job/ovirt-system-tests_performance-suite-master/
Build: https://jenkins.ovirt.org/job/ovirt-system-tests_performance-suite-master...
Build Number: 1521
Build Status: Still Failing
Triggered By: Started by timer
-------------------------------------
Changes Since Last Success:
-------------------------------------
Changes for Build #1516
[Marcin Sobczyk] network: Run network suite master on el8
Changes for Build #1517
[Andrej Cernek] network, hostlib: refactor setup_networks method
[Barak Korren] usrc: Support *.yml files
Changes for Build #1518
[Galit] Remove basic-suite-4.2 suite
Changes for Build #1519
[Marcin Sobczyk] basic: Verify glance template creation before tearing down the engine
Changes for Build #1520
[Marcin Sobczyk] ost-images: Use OST images when available
Changes for Build #1521
[Ehud Yonasi] fix el7 container label.
[Roy Golan] Removing rgolan(a)redhat.com from email notifs
-----------------
Failed Tests:
-----------------
1 tests failed.
FAILED: 1000_analyze_postgresql.analyze_postgres
Error Message:
Failed to perform the postgres analysis. Exit code is 1
-------------------- >> begin captured logging << --------------------
lago.plugins.vm: INFO: * Copy /home/jenkins/agent/workspace/ovirt-system-tests_performance-suite-master/ovirt-system-tests/performance-suite-master/../common/test-scenarios-files/analyze_postgresql.sh to lago-performance-suite-master-engine:/root: ?[0m?[0m
lago.ssh: DEBUG: start task:a9b6bf76-d6a4-4f6d-938d-59d14cc805d8:Get ssh client for lago-performance-suite-master-engine:
lago.ssh: DEBUG: end task:a9b6bf76-d6a4-4f6d-938d-59d14cc805d8:Get ssh client for lago-performance-suite-master-engine:
lago.plugins.vm: INFO: * Copy /home/jenkins/agent/workspace/ovirt-system-tests_performance-suite-master/ovirt-system-tests/performance-suite-master/../common/test-scenarios-files/analyze_postgresql.sh to lago-performance-suite-master-engine:/root: ?[32mSuccess?[0m (in 0:00:00)
lago.ssh: DEBUG: start task:3cbcc980-918f-4be9-ac5e-e10f86305521:Get ssh client for lago-performance-suite-master-engine:
lago.ssh: DEBUG: end task:3cbcc980-918f-4be9-ac5e-e10f86305521:Get ssh client for lago-performance-suite-master-engine:
lago.ssh: DEBUG: Running aee507f2 on lago-performance-suite-master-engine: /root/analyze_postgresql.sh
lago.ssh: DEBUG: Command aee507f2 on lago-performance-suite-master-engine returned with 1
lago.ssh: DEBUG: Command aee507f2 on lago-performance-suite-master-engine output:
Last metadata expiration check: 0:11:06 ago on Thu 16 Jul 2020 11:03:48 PM EDT.
[MIRROR] pgdg-redhat-repo-latest.noarch.rpm: Status code: 404 for https://download.postgresql.org/pub/repos/yum/12/redhat/rhel-8-x86_64/pgd... (IP: 72.32.157.246)
[MIRROR] pgdg-redhat-repo-latest.noarch.rpm: Status code: 404 for https://download.postgresql.org/pub/repos/yum/12/redhat/rhel-8-x86_64/pgd... (IP: 72.32.157.246)
[MIRROR] pgdg-redhat-repo-latest.noarch.rpm: Status code: 404 for https://download.postgresql.org/pub/repos/yum/12/redhat/rhel-8-x86_64/pgd... (IP: 72.32.157.246)
[MIRROR] pgdg-redhat-repo-latest.noarch.rpm: Status code: 404 for https://download.postgresql.org/pub/repos/yum/12/redhat/rhel-8-x86_64/pgd... (IP: 72.32.157.246)
[FAILED] pgdg-redhat-repo-latest.noarch.rpm: Status code: 404 for https://download.postgresql.org/pub/repos/yum/12/redhat/rhel-8-x86_64/pgd... (IP: 72.32.157.246)
lago.ssh: DEBUG: Command aee507f2 on lago-performance-suite-master-engine errors:
+ pg_datadir=/var/lib/pgsql/data
+ pg_service=postgresql.service
+ pgdg=
+ scl_enable=
++ get_engine_pg_scl
++ engine_pg_scl_conf=(/etc/ovirt-engine/engine.conf.d/*scl-postgres*.conf)
++ local engine_pg_scl_conf
++ local res=
++ [[ -e /etc/ovirt-engine/engine.conf.d/*scl-postgres*.conf ]]
++ echo ''
+ engine_pg_scl=
+ [[ -n '' ]]
++ rpm -q postgresql
+ pgver=postgresql-12.1-2.module_el8.1.0+273+979c16e6.x86_64
+ case "${pgver}" in
+ pgdg=https://download.postgresql.org/pub/repos/yum/12/redhat/rhel-8-x86_6...
+ [[ -n https://download.postgresql.org/pub/repos/yum/12/redhat/rhel-8-x86_64/pgd... ]]
+ yum install -y https://download.postgresql.org/pub/repos/yum/12/redhat/rhel-8-x86_64/pgd...
Status code: 404 for https://download.postgresql.org/pub/repos/yum/12/redhat/rhel-8-x86_64/pgd... (IP: 72.32.157.246)
--------------------- >> end captured logging << ---------------------
Stack Trace:
File "/usr/lib64/python2.7/unittest/case.py", line 369, in run
testMethod()
File "/usr/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
self.test(*self.arg)
File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 142, in wrapped_test
test()
File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 60, in wrapper
return func(get_test_prefix(), *args, **kwargs)
File "/home/jenkins/agent/workspace/ovirt-system-tests_performance-suite-master/ovirt-system-tests/performance-suite-master/test-scenarios/1000_analyze_postgresql.py", line 52, in analyze_postgres
' Exit code is %s' % result.code
File "/usr/lib/python2.7/site-packages/nose/tools/trivial.py", line 29, in eq_
raise AssertionError(msg or "%r != %r" % (a, b))
'Failed to perform the postgres analysis. Exit code is 1'
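The failure above boils down to yum being handed a PGDG repo URL that now returns 404 and the script treating that as fatal. A minimal hardening sketch (hypothetical; the real script's URL is truncated in the log, and `install_pgdg` is an illustrative name, not a function from analyze_postgresql.sh) probes the URL first and treats a dead mirror as non-fatal:

```shell
# probe_url: return 0 if the URL answers a HEAD request, nonzero otherwise.
probe_url() {
    curl -fsIL --max-time 30 "$1" >/dev/null 2>&1
}

# install_pgdg: install the PGDG repo RPM only if it is reachable; a dead
# mirror is logged and skipped so the rest of the analysis can proceed.
install_pgdg() {
    local url="$1"
    if probe_url "${url}"; then
        yum install -y "${url}"
    else
        echo "pgdg repo RPM not reachable: ${url}" >&2
        return 0
    fi
}
```

Whether skipping PGDG is acceptable depends on whether the analysis really needs packages from that repo; the sketch only shows how to avoid failing the whole suite on a mirror 404.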
[JIRA] (OVIRT-2972) resources.ovirt.org ran out of space for artifact publishing
by Evgheni Dereveanchin (oVirt JIRA)
[ https://ovirt-jira.atlassian.net/browse/OVIRT-2972?page=com.atlassian.jir... ]
Evgheni Dereveanchin commented on OVIRT-2972:
---------------------------------------------
Suspect jobs that were reporting issues this weekend:
[https://jenkins.ovirt.org/job/ovirt_master_publish-rpms_nightly/|https://...]
[https://jenkins.ovirt.org/job/ovirt_4.3_publish-rpms_nightly/|https://jen...]
[https://jenkins.ovirt.org/job/ovirt_4.2_publish-rpms_nightly/|https://jen...]
The errors are as follows:
{noformat}13:15:32 2020-07-19 11:15:32,180::INFO::root::Saving /home/jenkins/ovirt-master-snapshot.tmp/rpm/fc29/x86_64/ovirt-engine-wildfly-18.0.0-1.fc29.x86_64.rpm
13:15:32 Traceback (most recent call last):
13:15:32 File "/bin/repoman", line 10, in <module>
13:15:32 sys.exit(main())
13:15:32 File "/usr/lib/python2.7/site-packages/repoman/cmd.py", line 461, in main
13:15:32 exit_code = do_add(args, config, repo)
13:15:32 File "/usr/lib/python2.7/site-packages/repoman/cmd.py", line 361, in do_add
13:15:32 repo.save()
13:15:32 File "/usr/lib/python2.7/site-packages/repoman/common/repo.py", line 42, in _func
13:15:32 return func(self, *args, **kwargs)
13:15:32 File "/usr/lib/python2.7/site-packages/repoman/common/repo.py", line 201, in save
13:15:32 store.save()
13:15:32 File "/usr/lib/python2.7/site-packages/repoman/common/stores/RPM/__init__.py", line 263, in save
13:15:32 self._save(**args)
13:15:32 File "/usr/lib/python2.7/site-packages/repoman/common/stores/RPM/__init__.py", line 302, in _save
13:15:32 save_file(pkg.path, dst_path)
13:15:32 File "/usr/lib/python2.7/site-packages/repoman/common/utils.py", line 436, in save_file
13:15:32 copy(src_path, dst_path)
13:15:32 File "/usr/lib/python2.7/site-packages/repoman/common/utils.py", line 296, in copy
13:15:32 shutil.copy2(what, where)
13:15:32 File "/usr/lib64/python2.7/shutil.py", line 130, in copy2
13:15:32 copyfile(src, dst)
13:15:32 File "/usr/lib64/python2.7/shutil.py", line 84, in copyfile
13:15:32 copyfileobj(fsrc, fdst)
13:15:32 File "/usr/lib64/python2.7/shutil.py", line 52, in copyfileobj
13:15:32 fdst.write(buf)
13:15:32 IOError: [Errno 28] No space left on device
13:15:32 2020-07-19 11:15:32,917::INFO::repoman.common.repo::Cleaning up temporary dir /tmp/tmpRKo543
13:15:34 Waiting for scan_for_artifacts.sh cron job to finish (attempt 1 of 50)
{noformat}
Looks like the job ran out of tmpfs space - did we publish too many files for some reason? I'm not entirely sure what tmp is used for in these jobs.
> resources.ovirt.org ran out of space for artifact publishing
> ------------------------------------------------------------
>
> Key: OVIRT-2972
> URL: https://ovirt-jira.atlassian.net/browse/OVIRT-2972
> Project: oVirt - virtualization made easy
> Issue Type: Bug
> Reporter: Evgheni Dereveanchin
> Assignee: infra
> Attachments: image-20200720-083159.png, prom.png
>
>
> This weekend Nagios sent disk space alerts for the /home/jenkins on resources.ovirt.org
> AFAIR this is used as intermediate artifact storage for publishing and shouldn't fill up that much. Logging a ticket to investigate the root cause.
> The partition is free again now so this is not blocking anything right now.
--
This message was sent by Atlassian Jira
(v1001.0.0-SNAPSHOT#100133)
[JIRA] (OVIRT-2972) resources.ovirt.org ran out of space for artifact publishing
by Evgheni Dereveanchin (oVirt JIRA)
[ https://ovirt-jira.atlassian.net/browse/OVIRT-2972?page=com.atlassian.jir... ]
Evgheni Dereveanchin commented on OVIRT-2972:
---------------------------------------------
I did get emails from Nagios, so we were alerted. I just want to confirm the cause here, as it's something that doesn't happen often: it looks like a publishing job failed and did not clean up after itself, which caused the next run to bring in even more files and fill up the partition.
[JIRA] (OVIRT-2973) [FIRING:1] TestAlert
by Alertmanager_Bot (oVirt JIRA)
Alertmanager_Bot created OVIRT-2973:
---------------------------------------
Summary: [FIRING:1] TestAlert
Key: OVIRT-2973
URL: https://ovirt-jira.atlassian.net/browse/OVIRT-2973
Project: oVirt - virtualization made easy
Issue Type: Bug
Reporter: Alertmanager_Bot
Assignee: infra
Labels:
- alertname = TestAlert
- key = value
Annotations:
Source:
[JIRA] (OVIRT-2972) resources.ovirt.org ran out of space for artifact publishing
by Shlomi Zidmi (oVirt JIRA)
[ https://ovirt-jira.atlassian.net/browse/OVIRT-2972?page=com.atlassian.jir... ]
Shlomi Zidmi commented on OVIRT-2972:
-------------------------------------
Actually, we are monitoring [resources.ovirt.org|http://resources.ovirt.org] in Prometheus and should have caught this in time.
!prom.png|width=83.33333333333334%!
Looking at the alert rules defined, I see the following:
{noformat}node_filesystem_free_bytes
/ node_filesystem_size_bytes{device!="tmpfs"} < 0.1{noformat}
The blue line (/dev/sdb, mounted on /home/jenkins) in the picture is slightly above 0.1, which is why no alerts were fired. I'll readjust the threshold so that Prometheus can catch similar issues sooner next time.
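A readjusted rule along those lines could look like the following (a hedged sketch only: the 0.15 threshold, the `for` duration, and the rule and group names are illustrative, not the values actually deployed on the oVirt Prometheus):

```yaml
# Hypothetical adjusted alerting rule: fire when any non-tmpfs filesystem
# stays below 15% free space for 10 minutes, giving earlier warning than
# the 0.1 cutoff that missed this incident.
groups:
  - name: disk
    rules:
      - alert: FilesystemAlmostFull
        expr: |
          node_filesystem_free_bytes{device!="tmpfs"}
            / node_filesystem_size_bytes{device!="tmpfs"} < 0.15
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.instance }}: {{ $labels.mountpoint }} below 15% free"
```

Raising the ratio and adding a `for:` duration trades a little alert noise for catching a slowly filling partition before it hits zero.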
[JIRA] (OVIRT-2972) resources.ovirt.org ran out of space for artifact publishing
by Galit Rosenthal (oVirt JIRA)
[ https://ovirt-jira.atlassian.net/browse/OVIRT-2972?page=com.atlassian.jir... ]
Galit Rosenthal edited comment on OVIRT-2972 at 7/20/20 9:13 AM:
-----------------------------------------------------------------
By the time I fixed the issue, Nagios had already sent a message that the disk space was back to normal.
I just removed the *.ready directory that was preventing the publish from working.
was (Author: grosenth(a)redhat.com):
By the time I fixed the issue, Jenkins had already sent a message that the disk space was back to normal.
I just removed the *.ready directory that was preventing the publish from working.
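The manual cleanup described above could be sketched as a small periodic step (the /home/jenkins path comes from this thread; the `*.ready` naming is as mentioned, but the exact staging layout and the one-day age cutoff are assumptions for illustration):

```shell
# clean_ready_dirs: remove leftover *.ready staging directories older than
# one day under the given base path, so a failed publish run cannot starve
# the next one of disk space. Layout and age cutoff are assumptions.
clean_ready_dirs() {
    local base="${1:-/home/jenkins}"
    find "${base}" -maxdepth 1 -type d -name '*.ready' -mtime +1 \
        -exec rm -rf {} +
}
```

Run from cron or as a pre-step of the publisher jobs, this would make the failure self-healing instead of requiring manual removal.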