[ovirt-devel] ovirt master system tests fail

Piotr Kliczewski piotr.kliczewski at gmail.com
Thu Jan 12 14:49:56 UTC 2017


Looking at the logs I see that the response arrived:

2017-01-12 05:50:00,408-05 DEBUG
[org.ovirt.vdsm.jsonrpc.client.internal.ResponseWorker]
(ResponseWorker) [] Message received: {"jsonrpc": "2.0", "id":
"80681e72-920e-4020-9ea8-76256dc7408e", "result": {"message": "Done",
"code": 0}}

and we see:

2017-01-12 05:50:00,874-05 INFO
[org.ovirt.engine.core.bll.network.host.CommitNetworkChangesCommand]
(org.ovirt.thread.pool-7-thread-1) [39b888ad] Running command:
CommitNetworkChangesCommand internal: true. Entities affected :  ID:
3210366f-d334-44a5-8134-93c73c7b10bb Type: VDSAction group
CONFIGURE_HOST_NETWORK with role type ADMIN

and almost 30 seconds later I see:

2017-01-12 05:50:27,022-05 DEBUG
[org.ovirt.vdsm.jsonrpc.client.internal.ResponseWorker]
(ResponseWorker) [] Message received:
{"jsonrpc":"2.0","error":{"code":"192.168.201.4:389513927","message":"Heartbeat
exceeded"},"id":null}

In my opinion the communication is correct: "Heartbeat exceeded" is declared when
there has been no interaction with the host for the specified amount of time.
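
The rule behind it is simple; here is a rough Python sketch with an assumed
30-second interval, not the actual ReactorClient logic:

import time

HEARTBEAT_INTERVAL = 30  # assumed value in seconds, for illustration only

class Channel(object):
    def __init__(self):
        self.last_seen = time.time()

    def on_traffic(self):
        # Any message from the host counts as a sign of life.
        self.last_seen = time.time()

    def check_heartbeat(self):
        if time.time() - self.last_seen > HEARTBEAT_INTERVAL:
            # No interaction with the host for the whole interval: report a
            # synthetic "Heartbeat exceeded" error carrying "id": null (the
            # real client also closes the channel at this point).
            return {"jsonrpc": "2.0",
                    "error": {"code": "<host:port>", "message": "Heartbeat exceeded"},
                    "id": None}
        return None

chan = Channel()
chan.on_traffic()               # last response received
print(chan.check_heartbeat())   # None while the host is still considered alive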

Thanks,
Piotr

On Thu, Jan 12, 2017 at 1:54 PM, Eyal Edri <eedri at redhat.com> wrote:
>
>
> On Thu, Jan 12, 2017 at 2:39 PM, Yaniv Kaul <ykaul at redhat.com> wrote:
>>
>> It still has the heartbeat exceeded issue - please make sure you test with
>> a fixed version:
>
>
> It uses the latest rpm with the fix:
>
> ---> Package vdsm-jsonrpc-java.noarch
> 0:1.4.1-1.20170112092258.git1861532.el7.centos will be installed
>
> * 1861532 - (HEAD -> master, origin/master, origin/HEAD) version bump (4
> hours ago) Piotr Kliczewski <piotr.kliczewski at gmail.com>
> * 9347d06 - (tag: v1.4.1) handle ssl closed status (19 hours ago) Piotr
> Kliczewski <piotr.kliczewski at gmail.com>
>
>
>
>>
>> 2017-01-12 05:50:27,021-05 DEBUG
>> [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor)
>> [103d0f0a] Heartbeat exceeded. Closing channel
>> 2017-01-12 05:50:27,022-05 DEBUG
>> [org.ovirt.vdsm.jsonrpc.client.internal.ResponseWorker] (ResponseWorker) []
>> Message received:
>> {"jsonrpc":"2.0","error":{"code":"192.168.201.4:389513927","message":"Heartbeat
>> exceeded"},"id":null}
>>
>>
>> Then we can start to understand the failures:
>>
>> 2017-01-12 05:50:27,055-05 ERROR
>> [org.ovirt.engine.core.bll.network.host.HostSetupNetworksCommand]
>> (org.ovirt.thread.pool-7-thread-2) [76b0383f] Command
>> 'org.ovirt.engine.core.bll.network.host.HostSetupNetworksCommand' failed:
>> EngineException:
>> org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException:
>> VDSGenericException: VDSNetworkException: Heartbeat exceeded (Failed with
>> error VDS_NETWORK_ERROR and code 5022)
>> 2017-01-12 05:50:27,058-05 INFO
>> [org.ovirt.engine.core.bll.network.host.HostSetupNetworksCommand]
>> (org.ovirt.thread.pool-7-thread-2) [76b0383f] Lock freed to object
>> 'EngineLock:{exclusiveLocks='[HOST_NETWORK40eb11ba-e6ac-478a-b8b1-73b7892ace65=<HOST_NETWORK,
>> ACTION_TYPE_FAILED_SETUP_NETWORKS_OR_REFRESH_IN_PROGRESS>]',
>> sharedLocks='null'}'
>> 2017-01-12 05:50:27,061-05 WARN
>> [org.ovirt.engine.core.vdsbroker.VdsManager]
>> (org.ovirt.thread.pool-7-thread-19) [76b0383f] Host
>> 'lago-basic-suite-master-host1' is not responding.
>> 2017-01-12 05:50:27,074-05 WARN
>> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
>> (org.ovirt.thread.pool-7-thread-19) [76b0383f] Correlation ID: null, Call
>> Stack: null, Custom Event ID: -1, Message: Host
>> lago-basic-suite-master-host1 is not responding. Host cannot be fenced
>> automatically because power management for the host is disabled.
>> 2017-01-12 05:50:27,079-05 ERROR
>> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
>> (org.ovirt.thread.pool-7-thread-2) [76b0383f] Failed to configure management
>> network: Failed to configure management network on host
>> lago-basic-suite-master-host1 due to setup networks failure.
>> 2017-01-12 05:50:27,079-05 ERROR
>> [org.ovirt.engine.core.bll.hostdeploy.InstallVdsInternalCommand]
>> (org.ovirt.thread.pool-7-thread-2) [76b0383f] Exception:
>> org.ovirt.engine.core.bll.network.NetworkConfigurator$NetworkConfiguratorException:
>> Failed to configure management network
>> 	at
>> org.ovirt.engine.core.bll.network.NetworkConfigurator.configureManagementNetwork(NetworkConfigurator.java:247)
>> [bll.jar:]
>>
>>
>> On Thu, Jan 12, 2017 at 2:12 PM, Daniel Belenky <dbelenky at redhat.com>
>> wrote:
>>>
>>> Hi all,
>>>
>>> The test-repo ovirt experimental master job fails, and it seems that there is
>>> an issue with the 'add_host' phase under the 'bootstrap' suite.
>>> From the logs, it seems that the suite was unable to fire up the host, or
>>> that something is wrong with the host (a sketch of the failing check follows
>>> the captured error below).
>>>
>>> <error type="exceptions.RuntimeError" message="Host lago-basic-suite-master-host1 is in non operational state
>>> -------------------- >> begin captured logging << --------------------
>>> lago.ssh: DEBUG: start task Get ssh client for lago-basic-suite-master-host0
>>> lago.ssh: DEBUG: Still got 100 tries for lago-basic-suite-master-host0
>>> lago.ssh: DEBUG: end task Get ssh client for lago-basic-suite-master-host0
>>> lago.ssh: DEBUG: Running aab0eff8 on lago-basic-suite-master-host0: yum install -y iptables
>>> lago.ssh: DEBUG: Command aab0eff8 on lago-basic-suite-master-host0 returned with 0
>>> lago.ssh: DEBUG: Command aab0eff8 on lago-basic-suite-master-host0 output: Loaded plugins: fastestmirror Loading mirror speeds from cached hostfile * base: centos.host-engine.com * extras: linux.mirrors.es.net * updates: mirror.n5tech.com Package iptables-1.4.21-17.el7.x86_64 already installed and latest version Nothing to do
>>> lago.ssh: DEBUG: start task Get ssh client for lago-basic-suite-master-host1
>>> lago.ssh: DEBUG: Still got 100 tries for lago-basic-suite-master-host1
>>> lago.ssh: DEBUG: end task Get ssh client for lago-basic-suite-master-host1
>>> lago.ssh: DEBUG: Running ab5c94f2 on lago-basic-suite-master-host1: yum install -y iptables
>>> lago.ssh: DEBUG: Command ab5c94f2 on lago-basic-suite-master-host1 returned with 0
>>> lago.ssh: DEBUG: Command ab5c94f2 on lago-basic-suite-master-host1 output: Loaded plugins: fastestmirror Loading mirror speeds from cached hostfile * base: mirror.n5tech.com * extras: ftp.osuosl.org * updates: mirrors.usc.edu Package iptables-1.4.21-17.el7.x86_64 already installed and latest version Nothing to do
>>> ovirtlago.testlib: ERROR: * Unhandled exception in <function _host_is_up at 0x322e938>
>>> Traceback (most recent call last):
>>>   File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 217, in assert_equals_within
>>>     res = func()
>>>   File "/home/jenkins/workspace/test-repo_ovirt_experimental_master/ovirt-system-tests/basic-suite-master/test-scenarios/002_bootstrap.py", line 162, in _host_is_up
>>>     raise RuntimeError('Host %s is in non operational state' % host.name())
>>> RuntimeError: Host lago-basic-suite-master-host1 is in non operational state
>>> --------------------- >> end captured logging << ---------------------">
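>>>
>>> For reference, the failing check is a poll-until-true helper around a host
>>> status query. A rough Python sketch of that pattern, with hypothetical names
>>> and values rather than the actual ovirtlago.testlib / 002_bootstrap.py code:
>>>
>>> import time
>>>
>>> def get_host_status(name):
>>>     # Stub standing in for the engine API query the real test makes.
>>>     return 'up'
>>>
>>> def host_is_up():
>>>     status = get_host_status('lago-basic-suite-master-host1')
>>>     if status == 'non_operational':
>>>         # This is the branch that fired in the run above.
>>>         raise RuntimeError('Host is in non operational state')
>>>     return status == 'up'
>>>
>>> def assert_true_within(func, timeout):
>>>     # Poll func until it returns True or the timeout expires; an exception
>>>     # raised by func (like the RuntimeError above) aborts the wait at once.
>>>     deadline = time.time() + timeout
>>>     while time.time() < deadline:
>>>         if func():
>>>             return
>>>         time.sleep(3)
>>>     raise AssertionError('timed out waiting for %s' % func.__name__)
>>>
>>> assert_true_within(host_is_up, timeout=600)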
>>>
>>>
>>> From the engine.log, I found a timeout in the rpc call (but this error is
>>> seen on jobs that succeed too, so it might not be relevant?)
>>>
>>> 2017-01-12 05:49:53,383-05 ERROR
>>> [org.ovirt.engine.core.vdsbroker.vdsbroker.PollVDSCommand]
>>> (org.ovirt.thread.pool-7-thread-2) [76b0383f] Command
>>> 'PollVDSCommand(HostName = lago-basic-suite-master-host1,
>>> VdsIdVDSCommandParametersBase:{runAsync='true',
>>> hostId='40eb11ba-e6ac-478a-b8b1-73b7892ace65'})' execution failed:
>>> VDSGenericException: VDSNetworkException: Timeout during rpc call
>>> 2017-01-12 05:49:53,383-05 DEBUG
>>> [org.ovirt.engine.core.vdsbroker.vdsbroker.PollVDSCommand]
>>> (org.ovirt.thread.pool-7-thread-2) [76b0383f] Exception:
>>> org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException:
>>> VDSGenericException: VDSNetworkException: Timeout during rpc call
>>>
>>> ... (the full error is very long, so I won't paste it here; it's in the
>>> engine.log)
>>>
>>> 2017-01-12 05:49:58,291-05 ERROR
>>> [org.ovirt.engine.core.vdsbroker.vdsbroker.PollVDSCommand]
>>> (org.ovirt.thread.pool-7-thread-1) [30b2ca77] Timeout waiting for VDSM
>>> response: Internal timeout occured
>>>
>>>
>>>
>>> In the host's vdsm.log, there are some errors too:
>>>
>>> 2017-01-12 05:51:48,336 ERROR (jsonrpc/0) [storage.StorageDomainCache]
>>> looking for unfetched domain 380623d8-1e85-4831-9048-3d05932f3d3a (sdc:151)
>>> 2017-01-12 05:51:48,336 ERROR (jsonrpc/0) [storage.StorageDomainCache]
>>> looking for domain 380623d8-1e85-4831-9048-3d05932f3d3a (sdc:168)
>>> 2017-01-12 05:51:48,395 WARN  (jsonrpc/0) [storage.LVM] lvm vgs failed: 5
>>> [] ['  WARNING: Not using lvmetad because config setting use_lvmetad=0.', '
>>> WARNING: To avoid corruption, rescan devices to make changes visible (pvscan
>>> --cache).', '  Volume group "380623d8-1e85-4831-9048-3d05932f3d3a" not
>>> found', '  Cannot process volume group
>>> 380623d8-1e85-4831-9048-3d05932f3d3a'] (lvm:377)
>>> 2017-01-12 05:51:48,398 ERROR (jsonrpc/0) [storage.StorageDomainCache]
>>> domain 380623d8-1e85-4831-9048-3d05932f3d3a not found (sdc:157)
>>> Traceback (most recent call last):
>>>   File "/usr/share/vdsm/storage/sdc.py", line 155, in _findDomain
>>>     dom = findMethod(sdUUID)
>>>   File "/usr/share/vdsm/storage/sdc.py", line 185, in
>>> _findUnfetchedDomain
>>>     raise se.StorageDomainDoesNotExist(sdUUID)
>>> StorageDomainDoesNotExist: Storage domain does not exist:
>>> (u'380623d8-1e85-4831-9048-3d05932f3d3a',)
>>>
>>>
>>> and
>>>
>>> 2017-01-12 05:53:45,375 ERROR (JsonRpc (StompReactor)) [vds.dispatcher]
>>> SSL error receiving from <yajsonrpc.betterAsyncore.Dispatcher connected
>>> ('::1', 43814, 0, 0) at 0x235a2d8>: unexpected eof (betterAsyncore:119)
>>>
>>>
>>> Link to Jenkins
>>>
>>> Can someone please take a look?
>>>
>>> Thanks,
>>>
>>> Daniel Belenky
>>> RHV DevOps
>>> Red Hat Israel
>>>
>>
>>
>>
>
>
>
>
> --
> Eyal Edri
> Associate Manager
> RHV DevOps
> EMEA ENG Virtualization R&D
> Red Hat Israel
>
> phone: +972-9-7692018
> irc: eedri (on #tlv #rhev-dev #rhev-integ)
>
> _______________________________________________
> Devel mailing list
> Devel at ovirt.org
> http://lists.ovirt.org/mailman/listinfo/devel

