[ OST Failure Report ] [ oVirt Master (ovirt-engine-metrics) ] [ 22-02-2018 ] [ 003_00_metrics_bootstrap.metrics_and_log_collector ]
by Dafna Ron
hi,
Test 003_00_metrics_bootstrap.metrics_and_log_collector is failing in the
basic suite.
*Link and headline of suspected patches:*
ansible: End playbook based on initial validations -
https://gerrit.ovirt.org/#/c/88062/
*Link to Job:*
http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/5829/
*Link to all logs:*
http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/5829/artifacts
*(Relevant) error snippet from the log: <error>*
/var/tmp:
drwxr-x--x. root abrt system_u:object_r:abrt_var_cache_t:s0 abrt
-rw-------. root root unconfined_u:object_r:user_tmp_t:s0 rpm-tmp.aLitM7
-rw-------. root root unconfined_u:object_r:user_tmp_t:s0 rpm-tmp.G2r7IM
-rw-------. root root unconfined_u:object_r:user_tmp_t:s0 rpm-tmp.kVymZE
-rw-------. root root unconfined_u:object_r:user_tmp_t:s0 rpm-tmp.uPDvvU
drwx------. root root system_u:object_r:tmp_t:s0
systemd-private-cd49c74726d5463f8d6f6502380e5e12-chronyd.service-i1T5IE
drwx------. root root system_u:object_r:tmp_t:s0
systemd-private-cd49c74726d5463f8d6f6502380e5e12-systemd-timedated.service-lhoUsS
/var/tmp/abrt:
-rw-------. root root system_u:object_r:abrt_var_cache_t:s0 last-via-server
/var/tmp/systemd-private-cd49c74726d5463f8d6f6502380e5e12-chronyd.service-i1T5IE:
drwxrwxrwt. root root system_u:object_r:tmp_t:s0 tmp
/var/tmp/systemd-private-cd49c74726d5463f8d6f6502380e5e12-chronyd.service-i1T5IE/tmp:
/var/tmp/systemd-private-cd49c74726d5463f8d6f6502380e5e12-systemd-timedated.service-lhoUsS:
drwxrwxrwt. root root system_u:object_r:tmp_t:s0 tmp
/var/tmp/systemd-private-cd49c74726d5463f8d6f6502380e5e12-systemd-timedated.service-lhoUsS/tmp:
/var/yp:
)
2018-02-22 07:24:05::DEBUG::__main__::251::root:: STDERR(/bin/ls:
cannot open directory
/rhev/data-center/mnt/blockSD/6babba93-09c8-4846-9ccb-07728f72eecb/master/tasks/bd563276-5092-4d28-86c4-63aa6c0b4344.temp:
No such file or directory
)
2018-02-22 07:24:05::ERROR::__main__::832::root:: Failed to collect
logs from: lago-basic-suite-master-host-0; /bin/ls: cannot open
directory /rhev/data-center/mnt/blockSD/6babba93-09c8-4846-9ccb-07728f72eecb/master/tasks/bd563276-5092-4d28-86c4-63aa6c0b4344.temp:
No such file or directory
*</error>*
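The `No such file or directory` error on a `.temp` task directory suggests, though this is only an assumption, that the directory disappeared between being listed and being opened. The sketch below is not how ovirt-log-collector works (the log shows it calling `/bin/ls`); it only illustrates, with a hypothetical `list_tree` helper, how a recursive listing can record vanished directories instead of failing the whole collection.

```python
import errno
import os


def list_tree(root):
    """Walk `root`, returning (files, vanished_dirs).

    Directories that no longer exist when the walk reaches them are
    recorded in `vanished_dirs` instead of aborting the walk. Any other
    error (for example a permission problem) is still raised.
    """
    vanished = []

    def on_error(err):
        # os.walk passes the OSError raised while listing a directory.
        if err.errno == errno.ENOENT:
            vanished.append(err.filename)
        else:
            raise err

    files = []
    for dirpath, _dirnames, filenames in os.walk(root, onerror=on_error):
        for name in filenames:
            files.append(os.path.join(dirpath, name))
    return files, vanished
```

A caller could log the `vanished` list as a warning and continue with the files that were found.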
6 years, 9 months
[ OST Failure Report ] [ oVirt Master (ovirt-host) ] [ 22-02-2018 ] [002_bootstrap.add_hosts ]
by Dafna Ron
hi,
We have a failed test for ovirt-host in the upgrade suite.
*Link and headline of suspected patches:*
Require collectd-virt plugin - https://gerrit.ovirt.org/#/c/87311/
*Link to Job:*
http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/5827/
*Link to all logs:*
http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/5827/artifacts
*(Relevant) error snippet from the log: <error>*
2018-02-22 05:38:47,587-05 INFO
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(VdsDeploy) [546df98d] EVENT_ID: VDS_INSTALL_IN_PROGRESS(509),
Installing Host lago-upgrade-from-release-suite-master-host0. Setting
kernel arguments.
2018-02-22 05:38:47,858-05 ERROR
[org.ovirt.engine.core.uutils.ssh.SSHDialog]
(EE-ManagedThreadFactory-engine-Thread-1) [546df98d] Swallowing
exception as preferring stderr
2018-02-22 05:38:47,859-05 ERROR
[org.ovirt.engine.core.uutils.ssh.SSHDialog]
(EE-ManagedThreadFactory-engine-Thread-1) [546df98d] SSH error running
command root@lago-upgrade-from-release-suite-master-host0:'umask 0077;
MYTMP="$(TMPDIR="${OVIRT_TMPDIR}" mktemp -d -t ovirt-XXXXXXXXXX)";
trap "chmod -R u+rwX \"${MYTMP}\" > /dev/null 2>&1; rm -fr
\"${MYTMP}\" > /dev/null 2>&1" 0; tar --warning=no-timestamp -C
"${MYTMP}" -x && "${MYTMP}"/ovirt-host-deploy
DIALOG/dialect=str:machine DIALOG/customization=bool:True':
RuntimeException: Unexpected error during execution: bash: line 1:
1395 Segmentation fault "${MYTMP}"/ovirt-host-deploy
DIALOG/dialect=str:machine DIALOG/customization=bool:True
2018-02-22 05:38:47,859-05 ERROR
[org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] (VdsDeploy)
[546df98d] Error during deploy dialog
2018-02-22 05:38:47,860-05 ERROR
[org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase]
(EE-ManagedThreadFactory-engine-Thread-1) [546df98d] Error during host
lago-upgrade-from-release-suite-master-host0 install
2018-02-22 05:38:47,864-05 ERROR
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(EE-ManagedThreadFactory-engine-Thread-1) [546df98d] EVENT_ID:
VDS_INSTALL_IN_PROGRESS_ERROR(511), An error has occurred during
installation of Host lago-upgrade-from-release-suite-master-host0:
Unexpected error during execution: bash: line 1: 1395 Segmentation
fault "${MYTMP}"/ovirt-host-deploy DIALOG/dialect=str:machine
DIALOG/customization=bool:True
.
2018-02-22 05:38:47,864-05 ERROR
[org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase]
(EE-ManagedThreadFactory-engine-Thread-1) [546df98d] Error during host
lago-upgrade-from-release-suite-master-host0 install, preferring first
exception: Unexpected connection termination
2018-02-22 05:38:47,864-05 ERROR
[org.ovirt.engine.core.bll.hostdeploy.InstallVdsInternalCommand]
(EE-ManagedThreadFactory-engine-Thread-1) [546df98d] Host installation
failed for host '65c048e5-a97a-422d-88b8-abe1fd925602',
'lago-upgrade-from-release-suite-master-host0': Unexpected connection
termination
2018-02-22 05:38:47,869-05 INFO
[org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand]
(EE-ManagedThreadFactory-engine-Thread-1) [546df98d] START,
SetVdsStatusVDSCommand(HostName =
lago-upgrade-from-release-suite-master-host0,
SetVdsStatusVDSCommandParameters:{hostId='65c048e5-a97a-422d-88b8-abe1fd925602',
status='InstallFailed', nonOperationalReason='NONE',
stopSpmFailureLogged='false', maintenanceReason='null'}), log id:
1973b09c
2018-02-22 05:38:47,898-05 INFO
[org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand]
(EE-ManagedThreadFactory-engine-Thread-1) [546df98d] FINISH,
SetVdsStatusVDSCommand, log id: 1973b09c
2018-02-22 05:38:47,911-05 ERROR
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(EE-ManagedThreadFactory-engine-Thread-1) [546df98d] EVENT_ID:
VDS_INSTALL_FAILED(505), Host
lago-upgrade-from-release-suite-master-host0 installation failed.
Unexpected connection termination.
2018-02-22 05:38:47,920-05 INFO
[org.ovirt.engine.core.bll.hostdeploy.InstallVdsInternalCommand]
(EE-ManagedThreadFactory-engine-Thread-1) [546df98d] Lock freed to
object 'EngineLock:{exclusiveLocks='[65c048e5-a97a-422d-88b8-abe1fd925602=VDS]',
sharedLocks=''}'
*</error>*
6 years, 9 months
[ OST Failure Report ] [ oVirt $VER (ovirt-log-collector) ] [ 21-02-2018 ] [ 098_ovirt_provider_ovn.use_ovn_provider ]
by Dafna Ron
Hi,
The test failed on a duplicate network name. I don't think that the change
is related.
I can see that we are importing network_1 and then trying to create a
network with the same name.
I don't think any change made to ovn before this failure could have caused
it, but I am adding Dan in case he is aware of any changes.
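As a minimal sketch of the collision described above (import `network_1`, then try to create a network with the same name), the check below flags names that already exist before an add is attempted. The function name and the sample data are hypothetical and are not taken from the test or from the engine code.

```python
def find_duplicate_names(existing_names, names_to_create):
    """Return the names in `names_to_create` that already exist.

    Order follows `names_to_create`, so the result can be reported
    back in the order the caller asked for.
    """
    existing = set(existing_names)
    return [name for name in names_to_create if name in existing]


if __name__ == "__main__":
    # Hypothetical data mirroring the failure: network_1 was imported
    # from the provider and is then created again.
    imported = ["network_1"]
    requested = ["network_1", "network_2"]
    print(find_duplicate_names(imported, requested))
```

The engine performs an equivalent validation itself, which is what produces `ACTION_TYPE_FAILED_NETWORK_NAME_IN_USE` in the log below.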
*Link and headline of suspected patches:*
*Failed change:*
inventory: handle to postgresql dumps from sosreport -
https://gerrit.ovirt.org/#/c/88010/
*Change reported as root cause of failure:*
collector: fix list function -
https://gerrit.ovirt.org/#/c/87798/
*Link to Job:*
http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/5805
*Link to all logs:*
http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/5805/artifacts
*(Relevant) error snippet from the log: <error>*
2018-02-20 23:51:03,179-05 DEBUG
[org.ovirt.engine.core.dal.dbbroker.PostgresDbEngineDialect$PostgresSimpleJdbcCall]
(default task-19) [4a0a588a-e606-4151-936a-f66a8c580a78] Compiled
stored procedure. Call string is [{call
getdcidbyexternalnetworkid(?)}]
2018-02-20 23:51:03,179-05 DEBUG
[org.ovirt.engine.core.dal.dbbroker.PostgresDbEngineDialect$PostgresSimpleJdbcCall]
(default task-19) [4a0a588a-e606-4151-936a-f66a8c580a78] SqlCall for
procedure [GetDcIdByExternalNetworkId] compiled
2018-02-20 23:51:03,180-05 DEBUG
[org.ovirt.engine.core.common.di.interceptor.DebugLoggingInterceptor]
(default task-19) [4a0a588a-e606-4151-936a-f66a8c580a78] method:
runQuery, params: [GetAllExternalNetworksOnProvider,
IdQueryParameters:{refresh='false', filtered='false'}], timeElapsed:
266ms
2018-02-20 23:51:03,187-05 INFO
[org.ovirt.engine.core.bll.network.dc.AddNetworkCommand] (default
task-19) [4a0a588a-e606-4151-936a-f66a8c580a78] Lock Acquired to
object 'EngineLock:{exclusiveLocks='[network_1=NETWORK,
94fc81b9-fbf2-4262-a877-32a066a573c2=PROVIDER]', sharedLocks=''}'
2018-02-20 23:51:03,190-05 WARN
[org.ovirt.engine.core.bll.network.dc.AddNetworkCommand] (default
task-19) [4a0a588a-e606-4151-936a-f66a8c580a78] Validation of action
'AddNetwork' failed for user admin@internal-authz. Reasons:
VAR__TYPE__NETWORK,VAR__ACTION__ADD,ACTION_TYPE_FAILED_NETWORK_NAME_IN_USE,$NetworkName
network_1
2018-02-20 23:51:03,190-05 INFO
[org.ovirt.engine.core.bll.network.dc.AddNetworkCommand] (default
task-19) [4a0a588a-e606-4151-936a-f66a8c580a78] Lock freed to object
'EngineLock:{exclusiveLocks='[network_1=NETWORK,
94fc81b9-fbf2-4262-a877-32a066a573c2=PROVIDER]', sharedLocks=''}'
2018-02-20 23:51:03,191-05 DEBUG
[org.ovirt.engine.core.common.di.interceptor.DebugLoggingInterceptor]
(default task-19) [4a0a588a-e606-4151-936a-f66a8c580a78] method:
runAction, params: [AddNetwork,
AddNetworkStoragePoolParameters:{commandId='41c44658-04e5-4aeb-9f65-cc2ad76e0c42',
user='null', commandType='Unknown'}], timeElapsed: 10ms
2018-02-20 23:51:03,196-05 ERROR
[org.ovirt.engine.api.restapi.resource.AbstractBackendResource]
(default task-19) [] Operation Failed: [Cannot add Network. The name
of the logical network 'network_1' is already used by an existing
logical network in the same data-center.
-Please choose a different name.]
2018-02-20 23:51:03,346-05 DEBUG
[org.ovirt.engine.core.common.di.interceptor.DebugLoggingInterceptor]
(EE-ManagedThreadFactory-engineScheduled-Thread-47) [] method: get,
params: [da0bebdb-cb83-415f-bcba-8920126e31a4], timeElapsed: 6ms
2018-02-20 23:51:03,346-05 DEBUG
[org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring]
(EE-ManagedThreadFactory-engineScheduled-Thread-47) []
vdsManager::refreshVdsStats entered,
host='lago-basic-suite-master-host-1'(da0bebdb-cb83-415f-bcba-8920126e31a4)
2018-02-20 23:51:03,365-05 DEBUG
[org.ovirt.engine.core.vdsbroker.vdsbroker.GetStatsAsyncVDSCommand]
(EE-ManagedThreadFactory-engineScheduled-Thread-47) [] START,
GetStatsAsyncVDSCommand(HostName = lago-basic-suite-master-host-1,
VdsIdAndVdsVDSCommandParametersBase:{hostId='da0bebdb-cb83-415f-bcba-8920126e31a4',
vds='Host[lago-basic-suite-master-host-1,da0bebdb-cb83-415f-bcba-8920126e31a4]'}),
log id: 279de334
2018-02-20 23:51:03,365-05 DEBUG
[org.ovirt.vdsm.jsonrpc.client.reactors.stomp.impl.Message]
(EE-ManagedThreadFactory-engineScheduled-Thread-47) [] SEND
destination:jms.topic.vdsm_requests
reply-to:jms.topic.vdsm_responses
content-length:98
*</error>*
6 years, 9 months
Documenting HA VMs on Wiki
by Milan Zamazal
Hi, to document and clarify some facts about HA VMs for users, especially
regarding I/O errors, I am working on a HOWTO document to be added to the
oVirt Wiki.
If you know how HA VMs or related mechanisms (such as VM handling on I/O
errors) work and would like to help review the document, please look at
the pull request: https://github.com/oVirt/ovirt-site/pull/1530
Thanks,
Milan
6 years, 9 months
[vdsm][network] false test failures in CI
by Francesco Romani
Hi there,
Recently, some of my patches failed on CI with the error below, which is
unrelated to the changes and looks bogus.
Could a network developer please have a look and improve the current
state? :)
https://gerrit.ovirt.org/#/c/87881/
19:43:33 ======================================================================
19:43:33 ERROR: test_list_ipv4_ipv6 (network.ip_address_test.IPAddressTest)
19:43:33 ----------------------------------------------------------------------
19:43:33 Traceback (most recent call last):
19:43:33   File "/home/jenkins/workspace/vdsm_4.2_check-patch-fc27-x86_64/vdsm/tests/testValidation.py", line 330, in wrapper
19:43:33     return f(*args, **kwargs)
19:43:33   File "/home/jenkins/workspace/vdsm_4.2_check-patch-fc27-x86_64/vdsm/tests/network/ip_address_test.py", line 289, in test_list_ipv4_ipv6
19:43:33     ipv6_addresses=[IPV6_B_WITH_PREFIXLEN]
19:43:33   File "/home/jenkins/workspace/vdsm_4.2_check-patch-fc27-x86_64/vdsm/tests/network/ip_address_test.py", line 297, in _test_list
19:43:33     address.IPAddressData(addr, device=nic))
19:43:33   File "/home/jenkins/workspace/vdsm_4.2_check-patch-fc27-x86_64/vdsm/lib/vdsm/network/ip/address/iproute2.py", line 40, in add
19:43:33     addr_data.address, addr_data.prefixlen, addr_data.family
19:43:33   File "/usr/lib64/python2.7/contextlib.py", line 35, in __exit__
19:43:33     self.gen.throw(type, value, traceback)
19:43:33   File "/home/jenkins/workspace/vdsm_4.2_check-patch-fc27-x86_64/vdsm/lib/vdsm/network/ip/address/iproute2.py", line 91, in _translate_iproute2_exception
19:43:33     new_exception, new_exception(str(address_data), error_message), tb)
19:43:33   File "/home/jenkins/workspace/vdsm_4.2_check-patch-fc27-x86_64/vdsm/lib/vdsm/network/ip/address/iproute2.py", line 86, in _translate_iproute2_exception
19:43:33     yield
19:43:33   File "/home/jenkins/workspace/vdsm_4.2_check-patch-fc27-x86_64/vdsm/lib/vdsm/network/ip/address/iproute2.py", line 40, in add
19:43:33     addr_data.address, addr_data.prefixlen, addr_data.family
19:43:33   File "/home/jenkins/workspace/vdsm_4.2_check-patch-fc27-x86_64/vdsm/lib/vdsm/network/ipwrapper.py", line 561, in addrAdd
19:43:33     _exec_cmd(command)
19:43:33   File "/home/jenkins/workspace/vdsm_4.2_check-patch-fc27-x86_64/vdsm/lib/vdsm/network/ipwrapper.py", line 482, in _exec_cmd
19:43:33     raise exc(returnCode, error.splitlines())
19:43:33 IPAddressAddError: ("IPAddressData(device='dummy_ZVN3L' address=IPv6Interface(u'2002:99::1/64') scope=None flags=None)", 'RTNETLINK answers: Permission denied')
I had other failures, but they look like known issues (mkimage, no port
on protocol detector).
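One possible explanation for the `Permission denied` above, not confirmed for this job, is that IPv6 is disabled on the CI slave: the kernel typically rejects adding an IPv6 address to a device whose `disable_ipv6` sysctl is set. The sketch below shows how a test helper could detect that state; `ipv6_disabled` is a hypothetical name and is not part of the vdsm test suite.

```python
import os


def ipv6_disabled(conf_root="/proc/sys/net/ipv6/conf", device="all"):
    """Return True if IPv6 looks disabled for `device`.

    Reads <conf_root>/<device>/disable_ipv6, where "1" means disabled.
    If the file cannot be read (for example because the kernel has no
    IPv6 support), IPv6 is treated as disabled.
    """
    path = os.path.join(conf_root, device, "disable_ipv6")
    try:
        with open(path) as f:
            return f.read().strip() == "1"
    except (IOError, OSError):
        return True
```

A test that needs IPv6 could skip itself when this returns True instead of failing with an unrelated error.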
--
Francesco Romani
Senior SW Eng., Virtualization R&D
Red Hat
IRC: fromani github: @fromanirh
6 years, 9 months
[ OST Failure Report ] [ oVirt 4.2 (ovirt-engine-metrics) ] [ 20-02-2018 ] [ 003_00_metrics_bootstrap.metrics_and_log_collector ]
by Dafna Ron
Hi,
We have a failed test for project ovirt-engine-metrics.
It seems that CQ is reporting that the root cause of the failure was a
patch from 6 days ago.
Shirly, can you please check the changes?
*Link and headline of suspected patches:*
*Change reported as failed:*
replace vdsm stats with collectd virt plugin -
https://gerrit.ovirt.org/#/c/87310/
*Change reported as root cause:*
ansible: add role to copy required files -
https://gerrit.ovirt.org/#/c/87430/
TASK [oVirt.ovirt-initial-validations/validate-config-yml : Validate
viaq_metrics_store parameter] ***
fatal: [localhost]: FAILED! => {"msg": "The conditional check
'viaq_metrics_store != true' failed. The error was: error while
evaluating conditional (viaq_metrics_store != true):
'viaq_metrics_store' is undefined\n\nThe error appears to have been in
'/usr/share/ansible/roles/oVirt.metrics/roles/oVirt.ovirt-initial-validations/validate-config-yml/tasks/main.yml':
line 49, column 3, but may\nbe elsewhere in the file depending on the
exact syntax problem.\n\nThe offending line appears to be:\n\n\n-
name: \"Validate viaq_metrics_store parameter\"\n ^ here\n"}
NO MORE HOSTS LEFT *************************************************************
PLAY RECAP *********************************************************************
localhost : ok=4 changed=0 unreachable=0 failed=1
lago.utils: ERROR: Error while running thread
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/lago/utils.py", line 58, in _ret_via_queue
    queue.put({'return': func()})
  File "/home/jenkins/workspace/ovirt-4.2_change-queue-tester/ovirt-system-tests/basic-suite-4.2/test-scenarios/003_00_metrics_bootstrap.py", line 54, in configure_metrics
    ' Exit code is %s' % result.code
  File "/usr/lib/python2.7/site-packages/nose/tools/trivial.py", line 29, in eq_
    raise AssertionError(msg or "%r != %r" % (a, b))
AssertionError: Configuring ovirt machines for metrics failed. Exit code is 2
lago.ssh: DEBUG: Command 59b597f6 on lago-basic-suite-4-2-engine returned with 0
lago.ssh: DEBUG: Command 59b597f6 on lago-basic-suite-4-2-engine output:
This command will collect system configuration and diagnostic
information from this system.
The generated archive may contain data considered sensitive and its
content should be reviewed by the originating organization before
being passed to any third party.
No changes will be made to system configuration.
INFO: /dev/shm/log does not exist. It will be created.
DEBUG: API Vendor(None) API Version(4.2.0)
WARNING: This ovirt-log-collector call will collect logs from all
available hosts. This may take long time, depending on the size of
your deployment
INFO: Gathering oVirt Engine information...
DEBUG: loading config
'/usr/share/ovirt-engine/services/ovirt-engine/ovirt-engine.conf'
DEBUG: calling(['sosreport', '--list-plugins'])
DEBUG: returncode(0)
DEBUG: STDOUT(
sosreport (version 3.4)
*Link to Job:*
http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/864/
*Link to all logs:*
http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/864/artifacts
*(Relevant) error snippet from the log: <error></error>*
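The `PLAY RECAP` line in the output above (`localhost : ok=4 changed=0 unreachable=0 failed=1`) is what marks the playbook run as failed. As a rough sketch, a recap line of that shape can be parsed into counters like this; the helper name and the regular expression are my own and assume the recap format shown in this log.

```python
import re

# Matches lines such as:
#   localhost : ok=4 changed=0 unreachable=0 failed=1
RECAP_RE = re.compile(r"^(?P<host>\S+)\s*:\s*(?P<counters>(?:\w+=\d+\s*)+)$")


def parse_recap_line(line):
    """Parse one PLAY RECAP line into (host, {counter: value}).

    Returns None if the line does not look like a recap line.
    """
    match = RECAP_RE.match(line.strip())
    if match is None:
        return None
    counters = {}
    for pair in match.group("counters").split():
        key, value = pair.split("=")
        counters[key] = int(value)
    return match.group("host"), counters


def recap_failed(line):
    """Return True if the recap line reports failed or unreachable tasks."""
    parsed = parse_recap_line(line)
    if parsed is None:
        return False
    _host, counters = parsed
    return counters.get("failed", 0) > 0 or counters.get("unreachable", 0) > 0
```

For the line in this report, `recap_failed` returns True because `failed=1`.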
6 years, 9 months
CI outage notice
by Evgheni Dereveanchin
Hi everyone,
After the Jenkins update tonight, it became unstable and several restarts
were recorded throughout the night. The root cause has now been found and
mitigated [1], and Jenkins is up and stable again. There may still be
slaves that were running jobs when the master restarted, which left
leftover files in their workspaces.
If any of your CI jobs failed with errors unrelated to code between
midnight and 8:15 GMT today, please re-trigger them. If you still get
failures, please inform me or other members of the infra team so that we
can take such slaves offline and perform a cleanup.
--
Regards,
Evgheni Dereveanchin
[1] https://ovirt-jira.atlassian.net/browse/OVIRT-1904
6 years, 9 months