[ovirt-devel] suspend_resume_vm fail on master experimental

Yaniv Kaul ykaul at redhat.com
Wed Jan 11 11:21:01 UTC 2017


On Wed, Jan 11, 2017 at 12:49 PM, Milan Zamazal <mzamazal at redhat.com> wrote:

> I just ran ovirt-system-tests on two very different machines.  It passed
> on one of them, while it failed on the other one, at a different place:
>
>   @ Run test: 005_network_by_label.py:
>   nose.config: INFO: Ignoring files matching ['^\\.', '^_', '^setup\\.py$']
>     # assign_hosts_network_label:
>   Error while running thread
>   Traceback (most recent call last):
>     File "/usr/lib/python2.7/site-packages/lago/utils.py", line 55, in _ret_via_queue
>       queue.put({'return': func()})
>     File "/var/local/lago/ovirt-system-tests/basic-suite-master/test-scenarios/005_network_by_label.py", line 56, in _assign_host_network_label
>       host_nic=nic
>     File "/usr/lib/python2.7/site-packages/ovirtsdk/infrastructure/brokers.py", line 16231, in add
>       headers={"Correlation-Id":correlation_id, "Expect":expect}
>     File "/usr/lib/python2.7/site-packages/ovirtsdk/infrastructure/proxy.py", line 79, in add
>       return self.request('POST', url, body, headers, cls=cls)
>     File "/usr/lib/python2.7/site-packages/ovirtsdk/infrastructure/proxy.py", line 122, in request
>       persistent_auth=self.__persistent_auth
>     File "/usr/lib/python2.7/site-packages/ovirtsdk/infrastructure/connectionspool.py", line 79, in do_request
>       persistent_auth)
>     File "/usr/lib/python2.7/site-packages/ovirtsdk/infrastructure/connectionspool.py", line 162, in __do_request
>       raise errors.RequestError(response_code, response_reason, response_body)
>   RequestError:
>   status: 409
>   reason: Conflict
>   detail: Cannot add Label. Operation can be performed only when Host status is Maintenance, Up, NonOperational.
>

This is an issue we've seen from time to time but have not yet figured out.
Do you have the engine logs for it?
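
If the root cause turns out to be the host simply not having reached a usable state yet, the test could also guard the label assignment by waiting for that precondition. A minimal sketch of the idea, not the actual test code, assuming the v3 Python SDK's api.hosts.get() / host.status.state accessors, with the state strings and host name as placeholders:

import time

ALLOWED_STATES = ('up', 'maintenance', 'non_operational')

def wait_for_host_state(api, host_name, timeout=300, interval=5):
    # Poll the engine until the host reports a state that allows adding
    # a label, or give up after `timeout` seconds.
    deadline = time.time() + timeout
    while time.time() < deadline:
        host = api.hosts.get(name=host_name)
        if host is not None and host.status.state in ALLOWED_STATES:
            return host
        time.sleep(interval)
    raise RuntimeError('host %s did not reach any of %s within %ss'
                       % (host_name, ALLOWED_STATES, timeout))

# e.g. wait_for_host_state(api, 'host-0') right before the label add
# in _assign_host_network_label.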


>
> I can also see occasional errors like the following in vdsm.log:
>
>   ERROR (JsonRpc (StompReactor)) [vds.dispatcher] SSL error receiving from <yajsonrpc.betterAsyncore.Dispatcher connected ('::ffff:192.168.201.3', 47434, 0, 0) at 0x271fd88>: (104, 'Connection reset by peer') (betterAsyncore:119)
>

This is the core issue of today's failure - but it is probably unrelated to
the issue you've just described, which we have seen happening from time to
time in the past (I'd say I last saw it ~2 weeks ago, but it's not easily
reproducible for me).
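
For reference, the (104, 'Connection reset by peer') part is just the TCP-level errno; a tiny plain-TCP sketch (no TLS, no vdsm code involved) of how an abortive close on the peer produces exactly that error on the reading side:

import socket
import struct

# Sketch only: an abortive close (SO_LINGER with zero timeout makes close()
# send RST instead of FIN) shows up on the reader as errno 104.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(('127.0.0.1', 0))
srv.listen(1)

cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
cli.connect(srv.getsockname())
conn, _ = srv.accept()

cli.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack('ii', 1, 0))
cli.close()  # peer resets the connection

try:
    conn.recv(1024)
except socket.error as e:
    print(e)  # [Errno 104] Connection reset by peer
finally:
    conn.close()
    srv.close()
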
Y.


>
> So we are probably dealing with an error that occurs "randomly" and is
> not related to a particular test.
>
> Daniel Belenky <dbelenky at redhat.com> writes:
>
> > Link to Jenkins
> > <http://jenkins.ovirt.org/view/experimental%20jobs/job/test-repo_ovirt_experimental_master/4648/artifact/exported-artifacts/basic_suite_master.sh-el7/exported-artifacts/>
> >
> > On Wed, Jan 11, 2017 at 10:26 AM, Francesco Romani <fromani at redhat.com>
> > wrote:
> >
> >> Hi all
> >>
> >> On 01/11/2017 08:52 AM, Eyal Edri wrote:
> >>
> >> Adding Tomas from Virt.
> >>
> >> On Tue, Jan 10, 2017 at 10:54 AM, Piotr Kliczewski <piotr.kliczewski at gmail.com> wrote:
> >>
> >>> On Tue, Jan 10, 2017 at 9:29 AM, Daniel Belenky <dbelenky at redhat.com>
> >>> wrote:
> >>> > Hi all,
> >>> >
> >>> > The test-repo_ovirt_experimental_master job (link to Jenkins) failed on
> >>> > the basic_sanity scenario.
> >>> > The job was triggered by https://gerrit.ovirt.org/#/c/69845/
> >>> >
> >>> > From looking at the logs, it seems that the reason is VDSM.
> >>> >
> >>> > In the VDSM log, I see the following error:
> >>> >
> >>> > 2017-01-09 16:47:41,331 ERROR (JsonRpc (StompReactor)) [vds.dispatcher] SSL error receiving from <yajsonrpc.betterAsyncore.Dispatcher connected ('::1', 34942, 0, 0) at 0x36b95f0>: unexpected eof (betterAsyncore:119)
> >>>
> >>
> >> Daniel, could you please remind me of the Jenkins link? I see something
> >> suspicious in the Vdsm log.
> >> Most notably, Vdsm received SIGTERM. Is this expected and part of the test?
> >>
> >> >
> >>>
> >>> This issue means that the client closed the connection while vdsm was
> >>> replying. It can happen at any time when the client does not handle the
> >>> connection gracefully. As you can see, the client connected locally ('::1').
> >>>
> >>> >
> >>> > Also, when looking at the MOM logs, I see the following:
> >>> >
> >>> > 2017-01-09 16:43:39,508 - mom.vdsmInterface - ERROR - Cannot connect to VDSM! [Errno 111] Connection refused
> >>> >
> >>>
> >>> Looking at the log, vdsm had no open socket at this time.
> >>
> >>
> >>
> >> Correct, but IIRC we have a race on startup - that's why MOM retries the
> >> connection. After the retry, MOM seems to behave correctly:
> >>
> >> 2017-01-09 16:44:05,672 - mom.RPCServer - INFO - ping()
> >> 2017-01-09 16:44:05,673 - mom.RPCServer - INFO - getStatistics()
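
The pattern there is essentially a bounded reconnect loop; a minimal sketch of the idea (not MOM's actual code, connect_to_vdsm is just a placeholder for the real client constructor):

import errno
import socket
import time

def connect_with_retry(connect_to_vdsm, attempts=10, delay=3):
    # Absorb the startup race: retry only on "Connection refused"
    # (errno 111), re-raise anything else immediately.
    last_error = None
    for _ in range(attempts):
        try:
            return connect_to_vdsm()
        except socket.error as e:
            if e.errno != errno.ECONNREFUSED:
                raise
            last_error = e
            time.sleep(delay)
    raise last_error
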
> >>
> >> --
> >> Francesco Romani
> >> Red Hat Engineering Virtualization R & D
> >> IRC: fromani
> >>
> >>
>