[ovirt-devel] suspend_resume_vm fail on master experimental

Milan Zamazal mzamazal at redhat.com
Wed Jan 11 10:49:10 UTC 2017


I just ran ovirt-system-tests on two very different machines.  It passed
on one of them, while it failed on the other one, at a different place:

  @ Run test: 005_network_by_label.py: 
  nose.config: INFO: Ignoring files matching ['^\\.', '^_', '^setup\\.py$']
    # assign_hosts_network_label: 
  Error while running thread
  Traceback (most recent call last):
    File "/usr/lib/python2.7/site-packages/lago/utils.py", line 55, in _ret_via_queue
      queue.put({'return': func()})
    File "/var/local/lago/ovirt-system-tests/basic-suite-master/test-scenarios/005_network_by_label.py", line 56, in _assign_host_network_label
      host_nic=nic
    File "/usr/lib/python2.7/site-packages/ovirtsdk/infrastructure/brokers.py", line 16231, in add
      headers={"Correlation-Id":correlation_id, "Expect":expect}
    File "/usr/lib/python2.7/site-packages/ovirtsdk/infrastructure/proxy.py", line 79, in add
      return self.request('POST', url, body, headers, cls=cls)
    File "/usr/lib/python2.7/site-packages/ovirtsdk/infrastructure/proxy.py", line 122, in request
      persistent_auth=self.__persistent_auth
    File "/usr/lib/python2.7/site-packages/ovirtsdk/infrastructure/connectionspool.py", line 79, in do_request
      persistent_auth)
    File "/usr/lib/python2.7/site-packages/ovirtsdk/infrastructure/connectionspool.py", line 162, in __do_request
      raise errors.RequestError(response_code, response_reason, response_body)
  RequestError: 
  status: 409
  reason: Conflict
  detail: Cannot add Label. Operation can be performed only when Host status is  Maintenance, Up, NonOperational.

I can also see occasional errors like the following in vdsm.log:

  ERROR (JsonRpc (StompReactor)) [vds.dispatcher] SSL error receiving from <yajsonrpc.betterAsyncore.Dispatcher connected ('::ffff:192.168.201.3', 47434, 0, 0) at 0x271fd88>: (104, 'Connection reset by peer') (betterAsyncore:119)

So we are probably dealing with an error that occurs "randomly" and is
not related to a particular test.
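
That said, the 409 above only tells us the host was not in one of the
states the engine accepts (Maintenance, Up, NonOperational) when the
label was added.  Purely as a sketch (the wait_for_host_ready helper,
the 'host-0' name and the 60 second timeout are my assumptions, not
something the suite does today), the test could poll the host status
via the same v3 SDK before adding the label:

  import time
  from ovirtsdk.xml import params  # already imported by the test

  def wait_for_host_ready(api, host_name, timeout=60):
      # Poll until the host reaches a state in which labels can be
      # assigned; state strings as exposed by the v3 REST API.
      deadline = time.time() + timeout
      while time.time() < deadline:
          host = api.hosts.get(host_name)
          if host.status.state in ('up', 'maintenance', 'non_operational'):
              return host
          time.sleep(2)
      raise RuntimeError('host %s not ready for labels' % host_name)

  # Hypothetical usage before the call that fails in
  # _assign_host_network_label (NETWORK_LABEL is a placeholder):
  #   host = wait_for_host_ready(api, 'host-0')
  #   nic = host.nics.list()[0]
  #   host.labels.add(params.Label(id=NETWORK_LABEL, host_nic=nic))

If the underlying problem really is the intermittent errors above,
though, such a wait would only paper over it.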

Daniel Belenky <dbelenky at redhat.com> writes:

> Link to Jenkins
> <http://jenkins.ovirt.org/view/experimental%20jobs/job/test-repo_ovirt_experimental_master/4648/artifact/exported-artifacts/basic_suite_master.sh-el7/exported-artifacts/>
>
> On Wed, Jan 11, 2017 at 10:26 AM, Francesco Romani <fromani at redhat.com>
> wrote:
>
>> Hi all
>>
>> On 01/11/2017 08:52 AM, Eyal Edri wrote:
>>
>> Adding Tomas from Virt.
>>
>> On Tue, Jan 10, 2017 at 10:54 AM, Piotr Kliczewski <
>> piotr.kliczewski at gmail.com> wrote:
>>
>>> On Tue, Jan 10, 2017 at 9:29 AM, Daniel Belenky <dbelenky at redhat.com>
>>> wrote:
>>> > Hi all,
>>> >
>>> > test-repo_ovirt_experimental_master (link to Jenkins) job failed on
>>> > basic_sanity scenario.
>>> > The job was triggered by https://gerrit.ovirt.org/#/c/69845/
>>> >
>>> > From looking at the logs, it seems that the reason is VDSM.
>>> >
>>> > In the VDSM log, I see the following error:
>>> >
>>> > 2017-01-09 16:47:41,331 ERROR (JsonRpc (StompReactor)) [vds.dispatcher]
>>> SSL
>>> > error receiving from <yajsonrpc.betterAsyncore.Dispatcher connected
>>> ('::1',
>>> > 34942, 0, 0) at 0x36b95f0>: unexpected eof (betterAsyncore:119)
>>>
>>
>> Daniel, could you please remind me of the Jenkins link? I see something
>> suspicious in the Vdsm log.
>> Most notably, Vdsm received SIGTERM. Is this expected and part of the test?
>>
>> >
>>>
>>> This issue means that the client closed the connection while vdsm was
>>> replying. It can happen at any time when the client does not handle the
>>> connection gracefully. As you can see, the client connected locally ('::1').
>>>
>>> >
>>> > Also, when looking at the MOM logs, I see the following:
>>> >
>>> > 2017-01-09 16:43:39,508 - mom.vdsmInterface - ERROR - Cannot connect to
>>> > VDSM! [Errno 111] Connection refused
>>> >
>>>
>>> Looking at the log, vdsm had no open socket at this time.
>>
>>
>>
>> Correct, but IIRC we have a race on startup - that's why MOM retries the
>> connection. After the retry, MOM seems to behave correctly:
>>
>> 2017-01-09 16:44:05,672 - mom.RPCServer - INFO - ping()
>> 2017-01-09 16:44:05,673 - mom.RPCServer - INFO - getStatistics()
>>
>> --
>> Francesco Romani
>> Red Hat Engineering Virtualization R & D
>> IRC: fromani
>>
>>

