Looking at the logs I see that the response arrived:
2017-01-12 05:50:00,408-05 DEBUG
[org.ovirt.vdsm.jsonrpc.client.internal.ResponseWorker]
(ResponseWorker) [] Message received: {"jsonrpc": "2.0",
"id":
"80681e72-920e-4020-9ea8-76256dc7408e", "result":
{"message": "Done",
"code": 0}}
and we see:
2017-01-12 05:50:00,874-05 INFO
[org.ovirt.engine.core.bll.network.host.CommitNetworkChangesCommand]
(org.ovirt.thread.pool-7-thread-1) [39b888ad] Running command:
CommitNetworkChangesCommand internal: true. Entities affected : ID:
3210366f-d334-44a5-8134-93c73c7b10bb Type: VDSAction group
CONFIGURE_HOST_NETWORK with role type ADMIN
and almost 30 seconds later I see:
2017-01-12 05:50:27,022-05 DEBUG
[org.ovirt.vdsm.jsonrpc.client.internal.ResponseWorker]
(ResponseWorker) [] Message received:
{"jsonrpc":"2.0","error":{"code":"192.168.201.4:389513927","message":"Heartbeat
exceeded"},"id":null}
In my opinion communication is correct to declare "Heartbeat exceeded"
there was no interaction with host for specified amount of time.
Thanks,
Piotr
On Thu, Jan 12, 2017 at 1:54 PM, Eyal Edri <eedri(a)redhat.com> wrote:
On Thu, Jan 12, 2017 at 2:39 PM, Yaniv Kaul <ykaul(a)redhat.com> wrote:
>
> It still has the heatbeat exceeded issue - please make sure you test with
> a fixed version:
It uses the latest rpm with the fix:
---> Package vdsm-jsonrpc-java.noarch
0:1.4.1-1.20170112092258.git1861532.el7.centos will be installed
* 1861532 - (HEAD -> master, origin/master, origin/HEAD) version bump (4
hours ago) Piotr Kliczewski <piotr.kliczewski(a)gmail.com>
* 9347d06 - (tag: v1.4.1) handle ssl closed status (19 hours ago) Piotr
Kliczewski <piotr.kliczewski(a)gmail.com>
>
> 2017-01-12 05:50:27,021-05 DEBUG
> [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor)
> [103d0f0a] Heartbeat exceeded. Closing channel
> 2017-01-12 05:50:27,022-05 DEBUG
> [org.ovirt.vdsm.jsonrpc.client.internal.ResponseWorker] (ResponseWorker) []
> Message received:
>
{"jsonrpc":"2.0","error":{"code":"192.168.201.4:389513927","message":"Heartbeat
> exceeded"},"id":null}
>
>
> Then we can start and understand the failures:
>
> 2017-01-12 05:50:27,055-05 ERROR
> [org.ovirt.engine.core.bll.network.host.HostSetupNetworksCommand]
> (org.ovirt.thread.pool-7-thread-2) [76b0383f] Command
> 'org.ovirt.engine.core.bll.network.host.HostSetupNetworksCommand' failed:
> EngineException:
> org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException:
> VDSGenericException: VDSNetworkException: Heartbeat exceeded (Failed with
> error VDS_NETWORK_ERROR and code 5022)
> 2017-01-12 05:50:27,058-05 INFO
> [org.ovirt.engine.core.bll.network.host.HostSetupNetworksCommand]
> (org.ovirt.thread.pool-7-thread-2) [76b0383f] Lock freed to object
>
'EngineLock:{exclusiveLocks='[HOST_NETWORK40eb11ba-e6ac-478a-b8b1-73b7892ace65=<HOST_NETWORK,
> ACTION_TYPE_FAILED_SETUP_NETWORKS_OR_REFRESH_IN_PROGRESS>]',
> sharedLocks='null'}'
> 2017-01-12 05:50:27,061-05 WARN
> [org.ovirt.engine.core.vdsbroker.VdsManager]
> (org.ovirt.thread.pool-7-thread-19) [76b0383f] Host
> 'lago-basic-suite-master-host1' is not responding.
> 2017-01-12 05:50:27,074-05 WARN
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (org.ovirt.thread.pool-7-thread-19) [76b0383f] Correlation ID: null, Call
> Stack: null, Custom Event ID: -1, Message: Host
> lago-basic-suite-master-host1 is not responding. Host cannot be fenced
> automatically because power management for the host is disabled.
> 2017-01-12 05:50:27,079-05 ERROR
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (org.ovirt.thread.pool-7-thread-2) [76b0383f] Failed to configure management
> network: Failed to configure management network on host
> lago-basic-suite-master-host1 due to setup networks failure.
> 2017-01-12 05:50:27,079-05 ERROR
> [org.ovirt.engine.core.bll.hostdeploy.InstallVdsInternalCommand]
> (org.ovirt.thread.pool-7-thread-2) [76b0383f] Exception:
> org.ovirt.engine.core.bll.network.NetworkConfigurator$NetworkConfiguratorException:
> Failed to configure management network
> at
>
org.ovirt.engine.core.bll.network.NetworkConfigurator.configureManagementNetwork(NetworkConfigurator.java:247)
> [bll.jar:]
>
>
> On Thu, Jan 12, 2017 at 2:12 PM, Daniel Belenky <dbelenky(a)redhat.com>
> wrote:
>>
>> Hi all,
>>
>> test-repo ovirt experimental master job fails, and it seems that there is
>> an issue with 'add_host' phase under the 'bootstrap' suite.
>> From the logs, it seems that the suite was unable to fire up the host /
>> something is wrong with host
>>
>> <error type="exceptions.RuntimeError" message="Host
>> lago-basic-suite-master-host1 is in non operational state
>> -------------------- >> begin captured logging <<
--------------------
>> lago.ssh: DEBUG: start task Get ssh client for lago-basic-suite-master-host0
>> lago.ssh: DEBUG: Still got 100 tries for lago-basic-suite-master-host0
>> lago.ssh: DEBUG: end task Get ssh client for lago-basic-suite-master-host0
>> lago.ssh: DEBUG: Running aab0eff8 on lago-basic-suite-master-host0: yum
>> install -y iptables lago.ssh: DEBUG: Command aab0eff8 on
>> lago-basic-suite-master-host0 returned with 0 lago.ssh: DEBUG: Command
>> aab0eff8 on lago-basic-suite-master-host0 output: Loaded plugins:
>> fastestmirror Loading mirror speeds from cached hostfile * base:
>>
centos.host-engine.com * extras:
linux.mirrors.es.net * updates:
>>
mirror.n5tech.com Package iptables-1.4.21-17.el7.x86_64 already installed
>> and latest version Nothing to do lago.ssh: DEBUG: start task Get ssh client
>> for lago-basic-suite-master-host1 lago.ssh: DEBUG: Still got 100 tries for
>> lago-basic-suite-master-host1 lago.ssh: DEBUG: end task Get ssh client for
>> lago-basic-suite-master-host1 lago.ssh: DEBUG: Running ab5c94f2 on
>> lago-basic-suite-master-host1: yum install -y iptables lago.ssh: DEBUG:
>> Command ab5c94f2 on lago-basic-suite-master-host1 returned with 0 lago.ssh:
>> DEBUG: Command ab5c94f2 on lago-basic-suite-master-host1 output: Loaded
>> plugins: fastestmirror Loading mirror speeds from cached hostfile * base:
>>
mirror.n5tech.com * extras:
ftp.osuosl.org * updates:
mirrors.usc.edu
>> Package iptables-1.4.21-17.el7.x86_64 already installed and latest version
>> Nothing to do ovirtlago.testlib: ERROR: * Unhandled exception in <function
>> _host_is_up at 0x322e938> Traceback (most recent call last): File
>> "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 217, in
>> assert_equals_within res = func() File
>>
"/home/jenkins/workspace/test-repo_ovirt_experimental_master/ovirt-system-tests/basic-suite-master/test-scenarios/002_bootstrap.py",
>> line 162, in _host_is_up raise RuntimeError('Host %s is in non operational
>> state' % host.name()) RuntimeError: Host lago-basic-suite-master-host1 is in
>> non operational state --------------------- >> end captured logging
<<
>> ---------------------">
>>
>>
>> From the engine.log, I found a timeout in the rpc call (but this error is
>> seen on jobs that success too, so might not be relevant(?))
>>
>> 2017-01-12 05:49:53,383-05 ERROR
>> [org.ovirt.engine.core.vdsbroker.vdsbroker.PollVDSCommand]
>> (org.ovirt.thread.pool-7-thread-2) [76b0383f] Command
>> 'PollVDSCommand(HostName = lago-basic-suite-master-host1,
>> VdsIdVDSCommandParametersBase:{runAsync='true',
>> hostId='40eb11ba-e6ac-478a-b8b1-73b7892ace65'})' execution failed:
>> VDSGenericException: VDSNetworkException: Timeout during rpc call
>> 2017-01-12 05:49:53,383-05 DEBUG
>> [org.ovirt.engine.core.vdsbroker.vdsbroker.PollVDSCommand]
>> (org.ovirt.thread.pool-7-thread-2) [76b0383f] Exception:
>> org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException:
>> VDSGenericException: VDSNetworkException: Timeout during rpc call
>>
>> ... (the full error is very long, so I wont paste it here, its in the
>> engine.log)
>>
>> 2017-01-12 05:49:58,291-05 ERROR
>> [org.ovirt.engine.core.vdsbroker.vdsbroker.PollVDSCommand]
>> (org.ovirt.thread.pool-7-thread-1) [30b2ca77] Timeout waiting for VDSM
>> response: Internal timeout occured
>>
>>
>>
>> In the host's vdsm.log, there are some errors too:
>>
>> 2017-01-12 05:51:48,336 ERROR (jsonrpc/0) [storage.StorageDomainCache]
>> looking for unfetched domain 380623d8-1e85-4831-9048-3d05932f3d3a (sdc:151)
>> 2017-01-12 05:51:48,336 ERROR (jsonrpc/0) [storage.StorageDomainCache]
>> looking for domain 380623d8-1e85-4831-9048-3d05932f3d3a (sdc:168)
>> 2017-01-12 05:51:48,395 WARN (jsonrpc/0) [storage.LVM] lvm vgs failed: 5
>> [] [' WARNING: Not using lvmetad because config setting use_lvmetad=0.',
'
>> WARNING: To avoid corruption, rescan devices to make changes visible (pvscan
>> --cache).', ' Volume group
"380623d8-1e85-4831-9048-3d05932f3d3a" not
>> found', ' Cannot process volume group
>> 380623d8-1e85-4831-9048-3d05932f3d3a'] (lvm:377)
>> 2017-01-12 05:51:48,398 ERROR (jsonrpc/0) [storage.StorageDomainCache]
>> domain 380623d8-1e85-4831-9048-3d05932f3d3a not found (sdc:157)
>> Traceback (most recent call last):
>> File "/usr/share/vdsm/storage/sdc.py", line 155, in _findDomain
>> dom = findMethod(sdUUID)
>> File "/usr/share/vdsm/storage/sdc.py", line 185, in
>> _findUnfetchedDomain
>> raise se.StorageDomainDoesNotExist(sdUUID)
>> StorageDomainDoesNotExist: Storage domain does not exist:
>> (u'380623d8-1e85-4831-9048-3d05932f3d3a',)
>>
>>
>> and
>>
>> 2017-01-12 05:53:45,375 ERROR (JsonRpc (StompReactor)) [vds.dispatcher]
>> SSL error receiving from <yajsonrpc.betterAsyncore.Dispatcher connected
>> ('::1', 43814, 0, 0) at 0x235a2d8>: unexpected eof
(betterAsyncore:119)
>>
>>
>> Link to Jenkins
>>
>> Can someone please take a look?
>>
>> Thanks,
>>
>> Daniel Belenky
>> RHV DevOps
>> Red Hat Israel
>>
>> _______________________________________________
>> Devel mailing list
>> Devel(a)ovirt.org
>>
http://lists.ovirt.org/mailman/listinfo/devel
>
>
>
> _______________________________________________
> Devel mailing list
> Devel(a)ovirt.org
>
http://lists.ovirt.org/mailman/listinfo/devel
--
Eyal Edri
Associate Manager
RHV DevOps
EMEA ENG Virtualization R&D
Red Hat Israel
phone: +972-9-7692018
irc: eedri (on #tlv #rhev-dev #rhev-integ)
_______________________________________________
Devel mailing list
Devel(a)ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel