It still has the "Heartbeat exceeded" issue. Please make sure you test with a fixed version:

2017-01-12 05:50:27,021-05 DEBUG [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [103d0f0a] Heartbeat exceeded. Closing channel
2017-01-12 05:50:27,022-05 DEBUG [org.ovirt.vdsm.jsonrpc.client.internal.ResponseWorker] (ResponseWorker) [] Message received: {"jsonrpc":"2.0","error":{"code":"192.168.201.4:389513927","message":"Heartbeat exceeded"},"id":null}

Then we can start to understand the failures:

2017-01-12 05:50:27,055-05 ERROR [org.ovirt.engine.core.bll.network.host.HostSetupNetworksCommand] (org.ovirt.thread.pool-7-thread-2) [76b0383f] Command 'org.ovirt.engine.core.bll.network.host.HostSetupNetworksCommand' failed: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: VDSGenericException: VDSNetworkException: Heartbeat exceeded (Failed with error VDS_NETWORK_ERROR and code 5022)
2017-01-12 05:50:27,058-05 INFO [org.ovirt.engine.core.bll.network.host.HostSetupNetworksCommand] (org.ovirt.thread.pool-7-thread-2) [76b0383f] Lock freed to object 'EngineLock:{exclusiveLocks='[HOST_NETWORK40eb11ba-e6ac-478a-b8b1-73b7892ace65=<HOST_NETWORK, ACTION_TYPE_FAILED_SETUP_NETWORKS_OR_REFRESH_IN_PROGRESS>]', sharedLocks='null'}'
2017-01-12 05:50:27,061-05 WARN [org.ovirt.engine.core.vdsbroker.VdsManager] (org.ovirt.thread.pool-7-thread-19) [76b0383f] Host 'lago-basic-suite-master-host1' is not responding.
2017-01-12 05:50:27,074-05 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-7-thread-19) [76b0383f] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Host lago-basic-suite-master-host1 is not responding. Host cannot be fenced automatically because power management for the host is disabled.
2017-01-12 05:50:27,079-05 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-7-thread-2) [76b0383f] Failed to configure management network: Failed to configure management network on host lago-basic-suite-master-host1 due to setup networks failure.
2017-01-12 05:50:27,079-05 ERROR [org.ovirt.engine.core.bll.hostdeploy.InstallVdsInternalCommand] (org.ovirt.thread.pool-7-thread-2) [76b0383f] Exception: org.ovirt.engine.core.bll.network.NetworkConfigurator$NetworkConfiguratorException: Failed to configure management network
    at org.ovirt.engine.core.bll.network.NetworkConfigurator.configureManagementNetwork(NetworkConfigurator.java:247) [bll.jar:]

On Thu, Jan 12, 2017 at 2:12 PM, Daniel Belenky <dbelenky@redhat.com> wrote:

Hi all,

The test-repo ovirt experimental master job fails, and there seems to be an issue with the 'add_host' phase under the 'bootstrap' suite.

From the logs, it seems that the suite was unable to bring up the host, or that something is wrong with the host:

<error type="exceptions.RuntimeError" message="Host lago-basic-suite-master-host1 is in non operational state
-------------------- >> begin captured logging << --------------------
lago.ssh: DEBUG: start task Get ssh client for lago-basic-suite-master-host0
lago.ssh: DEBUG: Still got 100 tries for lago-basic-suite-master-host0
lago.ssh: DEBUG: end task Get ssh client for lago-basic-suite-master-host0
lago.ssh: DEBUG: Running aab0eff8 on lago-basic-suite-master-host0: yum install -y iptables
lago.ssh: DEBUG: Command aab0eff8 on lago-basic-suite-master-host0 returned with 0
lago.ssh: DEBUG: Command aab0eff8 on lago-basic-suite-master-host0 output:
 Loaded plugins: fastestmirror
 Loading mirror speeds from cached hostfile
  * base: centos.host-engine.com
  * extras: linux.mirrors.es.net
  * updates: mirror.n5tech.com
 Package iptables-1.4.21-17.el7.x86_64 already installed and latest version
 Nothing to do
lago.ssh: DEBUG: start task Get ssh client for lago-basic-suite-master-host1
lago.ssh: DEBUG: Still got 100 tries for lago-basic-suite-master-host1
lago.ssh: DEBUG: end task Get ssh client for lago-basic-suite-master-host1
lago.ssh: DEBUG: Running ab5c94f2 on lago-basic-suite-master-host1: yum install -y iptables
lago.ssh: DEBUG: Command ab5c94f2 on lago-basic-suite-master-host1 returned with 0
lago.ssh: DEBUG: Command ab5c94f2 on lago-basic-suite-master-host1 output:
 Loaded plugins: fastestmirror
 Loading mirror speeds from cached hostfile
  * base: mirror.n5tech.com
  * extras: ftp.osuosl.org
  * updates: mirrors.usc.edu
 Package iptables-1.4.21-17.el7.x86_64 already installed and latest version
 Nothing to do
ovirtlago.testlib: ERROR: * Unhandled exception in <function _host_is_up at 0x322e938>
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 217, in assert_equals_within
    res = func()
  File "/home/jenkins/workspace/test-repo_ovirt_experimental_master/ovirt-system-tests/basic-suite-master/test-scenarios/002_bootstrap.py", line 162, in _host_is_up
    raise RuntimeError('Host %s is in non operational state' % host.name())
RuntimeError: Host lago-basic-suite-master-host1 is in non operational state
--------------------- >> end captured logging << ---------------------">

In the engine.log, I found a timeout in the rpc call (but this error is also seen on jobs that succeed, so it might not be relevant):

2017-01-12 05:49:53,383-05 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.PollVDSCommand] (org.ovirt.thread.pool-7-thread-2) [76b0383f] Command 'PollVDSCommand(HostName = lago-basic-suite-master-host1, VdsIdVDSCommandParametersBase:{runAsync='true', hostId='40eb11ba-e6ac-478a-b8b1-73b7892ace65'})' execution failed: VDSGenericException: VDSNetworkException: Timeout during rpc call
2017-01-12 05:49:53,383-05 DEBUG [org.ovirt.engine.core.vdsbroker.vdsbroker.PollVDSCommand] (org.ovirt.thread.pool-7-thread-2) [76b0383f] Exception: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: VDSGenericException: VDSNetworkException: Timeout during rpc call
... (the full error is very long, so I won't paste it here; it's in the engine.log)
2017-01-12 05:49:58,291-05 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.PollVDSCommand] (org.ovirt.thread.pool-7-thread-1) [30b2ca77] Timeout waiting for VDSM response: Internal timeout occured

In the host's vdsm.log, there are some errors too:

2017-01-12 05:51:48,336 ERROR (jsonrpc/0) [storage.StorageDomainCache] looking for unfetched domain 380623d8-1e85-4831-9048-3d05932f3d3a (sdc:151)
2017-01-12 05:51:48,336 ERROR (jsonrpc/0) [storage.StorageDomainCache] looking for domain 380623d8-1e85-4831-9048-3d05932f3d3a (sdc:168)
2017-01-12 05:51:48,395 WARN (jsonrpc/0) [storage.LVM] lvm vgs failed: 5 [] [' WARNING: Not using lvmetad because config setting use_lvmetad=0.', ' WARNING: To avoid corruption, rescan devices to make changes visible (pvscan --cache).', ' Volume group "380623d8-1e85-4831-9048-3d05932f3d3a" not found', ' Cannot process volume group 380623d8-1e85-4831-9048-3d05932f3d3a'] (lvm:377)
2017-01-12 05:51:48,398 ERROR (jsonrpc/0) [storage.StorageDomainCache] domain 380623d8-1e85-4831-9048-3d05932f3d3a not found (sdc:157)
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/sdc.py", line 155, in _findDomain
    dom = findMethod(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 185, in _findUnfetchedDomain
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: (u'380623d8-1e85-4831-9048-3d05932f3d3a',)

and

2017-01-12 05:53:45,375 ERROR (JsonRpc (StompReactor)) [vds.dispatcher] SSL error receiving from <yajsonrpc.betterAsyncore.Dispatcher connected ('::1', 43814, 0, 0) at 0x235a2d8>: unexpected eof (betterAsyncore:119)

Link to Jenkins

Can someone please take a look?

Thanks,

Red Hat Israel
Daniel Belenky
RHV DevOps
_______________________________________________
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel
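
For anyone who wants to check their own engine.log for the pattern discussed in this thread (a "Heartbeat exceeded. Closing channel" entry followed shortly by ERROR entries), here is a minimal Python sketch. It is not part of lago or ovirt-system-tests. The line layout in LINE_RE is an assumption based only on the excerpts quoted above, and the names parse_line and errors_after_heartbeat, as well as the 5 second window, are hypothetical choices made for illustration.

```python
import re
from datetime import datetime, timedelta

# Assumed layout of an engine.log line, inferred from the excerpts in this thread:
#   2017-01-12 05:50:27,021-05 DEBUG [logger] (thread) [correlation-id] message
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})(?:[+-]\d{2})?\s+"
    r"(?P<level>DEBUG|INFO|WARN|ERROR)\s+"
    r"\[(?P<logger>[^\]]+)\]\s+(?P<rest>.*)$"
)

# Marker text taken from the ReactorClient entry quoted in the thread.
HEARTBEAT_MARKER = "Heartbeat exceeded. Closing channel"


def parse_line(line):
    """Return (timestamp, level, logger, rest), or None for non-matching lines."""
    match = LINE_RE.match(line)
    if match is None:
        # Stack trace lines and other continuation lines end up here.
        return None
    timestamp = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
    return timestamp, match.group("level"), match.group("logger"), match.group("rest")


def errors_after_heartbeat(lines, window=timedelta(seconds=5)):
    """Collect ERROR entries logged within `window` after a heartbeat timeout.

    The window length is an arbitrary illustrative default, not a value
    taken from the engine or from vdsm-jsonrpc-java.
    """
    last_heartbeat = None
    hits = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        timestamp, level, logger, rest = parsed
        if HEARTBEAT_MARKER in rest:
            last_heartbeat = timestamp
        elif (
            level == "ERROR"
            and last_heartbeat is not None
            and timedelta(0) <= timestamp - last_heartbeat <= window
        ):
            hits.append((timestamp, logger, rest))
    return hits
```

Passing an open engine.log file object to errors_after_heartbeat would list the ERROR entries that closely follow a heartbeat timeout. Errors logged before the first heartbeat timeout, such as the earlier "Timeout during rpc call", are not reported.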