Yaniv Kaul <ykaul@redhat.com> writes:

On Wed, Jan 11, 2017 at 12:49 PM, Milan Zamazal <mzamazal@redhat.com> wrote:
> I just ran ovirt-system-tests on two very different machines. It passed
> on one of them, while it failed on the other one, at a different place:
>
> @ Run test: 005_network_by_label.py:
> nose.config: INFO: Ignoring files matching ['^\\.', '^_', '^setup\\.py$']
> # assign_hosts_network_label:
> Error while running thread
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/lago/utils.py", line 55, in _ret_via_queue
>     queue.put({'return': func()})
>   File "/var/local/lago/ovirt-system-tests/basic-suite-master/test-scenarios/005_network_by_label.py", line 56, in _assign_host_network_label
>     host_nic=nic
>   File "/usr/lib/python2.7/site-packages/ovirtsdk/infrastructure/brokers.py", line 16231, in add
>     headers={"Correlation-Id":correlation_id, "Expect":expect}
>   File "/usr/lib/python2.7/site-packages/ovirtsdk/infrastructure/proxy.py", line 79, in add
>     return self.request('POST', url, body, headers, cls=cls)
>   File "/usr/lib/python2.7/site-packages/ovirtsdk/infrastructure/proxy.py", line 122, in request
>     persistent_auth=self.__persistent_auth
>   File "/usr/lib/python2.7/site-packages/ovirtsdk/infrastructure/connectionspool.py", line 79, in do_request
>     persistent_auth)
>   File "/usr/lib/python2.7/site-packages/ovirtsdk/infrastructure/connectionspool.py", line 162, in __do_request
>     raise errors.RequestError(response_code, response_reason, response_body)
> RequestError:
>   status: 409
>   reason: Conflict
>   detail: Cannot add Label. Operation can be performed only when Host status is Maintenance, Up, NonOperational.
>
This is an issue we've seen from time to time and have not figured it out
yet. Do you have engine logs for it?
Yes, I still have the instance from that test run available. Here's an
excerpt; I'll send you the complete logs off-list (they are large):
> I can also see occasional errors like the following in vdsm.log:
>
> ERROR (JsonRpc (StompReactor)) [vds.dispatcher] SSL error receiving from
> <yajsonrpc.betterAsyncore.Dispatcher connected ('::ffff:192.168.201.3',
> 47434, 0, 0) at 0x271fd88>: (104, 'Connection reset by peer')
> (betterAsyncore:119)
>
This is the core issue of today's failure - but it's probably unrelated to
the issue you've just described, which we have seen happening from time to
time in the past (I'd say I last saw it ~2 weeks ago or so, but it's not
easily reproducible for me).
Y.
>
> So we are probably dealing with an error that occurs "randomly" and is
> not related to a particular test.
>
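
Regarding the 409 Conflict in the first traceback: the engine refuses the
label add until the host is in one of the listed states, so a test that hits
a host which is still installing or activating fails exactly this way. Below
is a minimal, hypothetical sketch (not the actual ovirt-system-tests code) of
guarding such a call by polling the host state first; get_host_state() and
the state strings are placeholders for however the test queries the engine.

# Hypothetical sketch, not the actual ovirt-system-tests code: poll the
# host state and assign the label only once the engine would accept it,
# which avoids the 409 Conflict above. get_host_state() stands in for
# however the test queries the engine (e.g. via the ovirtsdk host broker).
import time

# Placeholder state names; the engine message lists Maintenance, Up and
# NonOperational as the states in which labels may be added.
LABELABLE_STATES = {'maintenance', 'up', 'non_operational'}

def wait_until_labelable(get_host_state, timeout=300, interval=3):
    """Return True once the host reaches a state that allows label changes."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if get_host_state().lower() in LABELABLE_STATES:
            return True
        time.sleep(interval)
    return False

# Usage sketch: call the label add only after wait_until_labelable(...)
# returns True.

A guard like this (or an explicit wait-for-host-up step earlier in the
suite) would make the label assignment independent of how quickly the host
comes up.
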
> Daniel Belenky <dbelenky@redhat.com> writes:
>
> > Link to Jenkins
> > <http://jenkins.ovirt.org/view/experimental%20jobs/job/test-repo_ovirt_experimental_master/4648/artifact/exported-artifacts/basic_suite_master.sh-el7/exported-artifacts/>
> >
> > On Wed, Jan 11, 2017 at 10:26 AM, Francesco Romani <fromani@redhat.com> wrote:
> >
> >> Hi all
> >>
> >> On 01/11/2017 08:52 AM, Eyal Edri wrote:
> >>
> >> Adding Tomas from Virt.
> >>
> >> On Tue, Jan 10, 2017 at 10:54 AM, Piotr Kliczewski <piotr.kliczewski@gmail.com> wrote:
> >>
> >>> On Tue, Jan 10, 2017 at 9:29 AM, Daniel Belenky <dbelenky@redhat.com> wrote:
> >>> > Hi all,
> >>> >
> >>> > test-repo_ovirt_experimental_master (link to Jenkins) job failed on the
> >>> > basic_sanity scenario.
> >>> > The job was triggered by https://gerrit.ovirt.org/#/c/69845/
> >>> >
> >>> > From looking at the logs, it seems that the reason is VDSM.
> >>> >
> >>> > In the VDSM log, I see the following error:
> >>> >
> >>> > 2017-01-09 16:47:41,331 ERROR (JsonRpc (StompReactor)) [vds.dispatcher] SSL
> >>> > error receiving from <yajsonrpc.betterAsyncore.Dispatcher connected ('::1',
> >>> > 34942, 0, 0) at 0x36b95f0>: unexpected eof (betterAsyncore:119)
> >>>
> >>
> >> Daniel, could you please remind me of the Jenkins link? I see something
> >> suspicious in the Vdsm log.
> >> Most notably, Vdsm received SIGTERM. Is this expected and part of the
> >> test?
> >>
> >> >
> >>>
> >>> This issue means that the client closed the connection while vdsm was
> >>> replying. It can happen at any time when the client is not careful with
> >>> the connection. As you can see, the client connected locally ('::1').
> >>>
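
For reference, here is a standalone sketch, unrelated to vdsm's or
yajsonrpc's actual code, of how a peer that aborts its side of the socket
surfaces on the server as errno 104, 'Connection reset by peer', i.e. the
same low-level condition betterAsyncore logs:

# Standalone sketch (illustrative only, not vdsm/yajsonrpc code). A client
# that sends part of a request and then aborts the connection with an RST
# makes the server's recv() fail with errno 104.
import socket
import struct
import threading

def server(sock):
    conn, _ = sock.accept()
    try:
        while True:
            data = conn.recv(4096)
            if not data:  # an orderly shutdown would give a clean EOF instead
                print('clean EOF')
                break
    except socket.error as e:
        print('server saw: %s' % (e,))  # e.g. [Errno 104] Connection reset by peer
    finally:
        conn.close()

listener = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
listener.bind(('::1', 0))
listener.listen(1)
t = threading.Thread(target=server, args=(listener,))
t.start()

client = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
client.connect(('::1', listener.getsockname()[1]))
client.sendall(b'partial request')
# SO_LINGER with a zero timeout makes close() send RST instead of FIN,
# which the peer observes as a connection reset.
client.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack('ii', 1, 0))
client.close()
t.join()
listener.close()
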
> >>> >
> >>> > Also, when looking at the MOM logs, I see the following:
> >>> >
> >>> > 2017-01-09 16:43:39,508 - mom.vdsmInterface - ERROR - Cannot connect to
> >>> > VDSM! [Errno 111] Connection refused
> >>> >
> >>>
> >>> Looking at the log, vdsm had no open socket at this time.
> >>
> >>
> >>
> >> Correct, but IIRC we have a race on startup - that's the reason why MOM
> >> retries to connect. After the retry, MOM seems to behave
> >> correctly:
> >>
> >> 2017-01-09 16:44:05,672 - mom.RPCServer - INFO - ping()
> >> 2017-01-09 16:44:05,673 - mom.RPCServer - INFO - getStatistics()
> >>
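
As an aside, the retry-on-startup pattern described above can be sketched
like this (illustrative only, not MOM's actual code): the client simply
retries while the service still refuses connections.

# Illustrative only, not MOM's actual code: retry the connection while the
# service is still starting up, so a transient [Errno 111] Connection refused
# does not become a hard failure.
import errno
import socket
import time

def connect_with_retry(host, port, attempts=10, delay=3.0):
    """Keep trying to connect until the service is up or attempts run out."""
    for attempt in range(1, attempts + 1):
        try:
            return socket.create_connection((host, port), timeout=5)
        except socket.error as e:
            if e.errno != errno.ECONNREFUSED or attempt == attempts:
                raise
            time.sleep(delay)  # not listening yet (startup race); wait, retry

# Usage sketch (hypothetical port): conn = connect_with_retry('::1', 54321)
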
> >> --
> >> Francesco Romani
> >> Red Hat Engineering Virtualization R & D
> >> IRC: fromani
> >>