[ovirt-devel] suspend_resume_vm fail on master experimental

Milan Zamazal mzamazal at redhat.com
Wed Jan 11 12:16:08 UTC 2017


Yaniv Kaul <ykaul at redhat.com> writes:

> On Wed, Jan 11, 2017 at 12:49 PM, Milan Zamazal <mzamazal at redhat.com> wrote:
>
>> I just ran ovirt-system-tests on two very different machines.  It passed
>> on one of them, while it failed on the other one, at a different place:
>>
>>   @ Run test: 005_network_by_label.py:
>>   nose.config: INFO: Ignoring files matching ['^\\.', '^_', '^setup\\.py$']
>>     # assign_hosts_network_label:
>>   Error while running thread
>>   Traceback (most recent call last):
>>     File "/usr/lib/python2.7/site-packages/lago/utils.py", line 55, in _ret_via_queue
>>       queue.put({'return': func()})
>>     File "/var/local/lago/ovirt-system-tests/basic-suite-master/test-scenarios/005_network_by_label.py", line 56, in _assign_host_network_label
>>       host_nic=nic
>>     File "/usr/lib/python2.7/site-packages/ovirtsdk/infrastructure/brokers.py", line 16231, in add
>>       headers={"Correlation-Id":correlation_id, "Expect":expect}
>>     File "/usr/lib/python2.7/site-packages/ovirtsdk/infrastructure/proxy.py", line 79, in add
>>       return self.request('POST', url, body, headers, cls=cls)
>>     File "/usr/lib/python2.7/site-packages/ovirtsdk/infrastructure/proxy.py", line 122, in request
>>       persistent_auth=self.__persistent_auth
>>     File "/usr/lib/python2.7/site-packages/ovirtsdk/infrastructure/connectionspool.py", line 79, in do_request
>>       persistent_auth)
>>     File "/usr/lib/python2.7/site-packages/ovirtsdk/infrastructure/connectionspool.py", line 162, in __do_request
>>       raise errors.RequestError(response_code, response_reason, response_body)
>>   RequestError:
>>   status: 409
>>   reason: Conflict
>>   detail: Cannot add Label. Operation can be performed only when Host status is Maintenance, Up, NonOperational.
>>
>
> This is an issue we've seen from time to time and have not figured it out
> yet. Do you have engine logs for it?

Yes, I still have that test run instance available.  Here's an excerpt;
I'll send you the complete logs off-list (they are large):

-------------- next part --------------
A non-text attachment was scrubbed...
Name: engine.log-excerpt.xz
Type: application/x-xz
Size: 31504 bytes
Desc: not available
URL: <http://lists.ovirt.org/pipermail/devel/attachments/20170111/2d0ace62/attachment-0001.xz>
-------------- next part --------------
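
By the way, the 409 above just means the label was added before the host had
finished activating.  If this keeps biting us, the test could guard the add()
call with something along the lines of the sketch below (a hypothetical
helper, assuming the v3 SDK exposes the host state as
api.hosts.get(name).status.state in lowercase; not what the suite currently
does):

  import time

  # States in which the engine accepts label operations, per the error
  # detail above (state names assumed lowercase, as the v3 SDK reports them).
  ACCEPTED_STATES = ('maintenance', 'up', 'non_operational')

  def wait_for_labelable_host(api, host_name, timeout=300, poll=5):
      # Poll the host state until it reaches one of the accepted states,
      # or give up after `timeout` seconds.
      deadline = time.time() + timeout
      while time.time() < deadline:
          state = api.hosts.get(host_name).status.state
          if state in ACCEPTED_STATES:
              return state
          time.sleep(poll)
      raise RuntimeError('host %s not ready for label operations' % host_name)
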

>> I can also see occasional errors like the following in vdsm.log:
>>
>>   ERROR (JsonRpc (StompReactor)) [vds.dispatcher] SSL error receiving from <yajsonrpc.betterAsyncore.Dispatcher connected ('::ffff:192.168.201.3', 47434, 0, 0) at 0x271fd88>: (104, 'Connection reset by peer') (betterAsyncore:119)
>>
>
> This is the core issue of today - but it is probably unrelated to the issue
> you've just described, which we have seen happening from time to time in the
> past (I'd say I last saw it about two weeks ago, but it's not easily
> reproducible for me).
> Y.
>
>
>>
>> So we are probably dealing with an error that occurs "randomly" and is
>> not related to a particular test.
>>
>> Daniel Belenky <dbelenky at redhat.com> writes:
>>
>> > Link to Jenkins
>> > <http://jenkins.ovirt.org/view/experimental%20jobs/job/test-repo_ovirt_experimental_master/4648/artifact/exported-artifacts/basic_suite_master.sh-el7/exported-artifacts/>
>> >
>> > On Wed, Jan 11, 2017 at 10:26 AM, Francesco Romani <fromani at redhat.com>
>> > wrote:
>> >
>> >> Hi all
>> >>
>> >> On 01/11/2017 08:52 AM, Eyal Edri wrote:
>> >>
>> >> Adding Tomas from Virt.
>> >>
>> >> On Tue, Jan 10, 2017 at 10:54 AM, Piotr Kliczewski
>> >> <piotr.kliczewski at gmail.com> wrote:
>> >>
>> >>> On Tue, Jan 10, 2017 at 9:29 AM, Daniel Belenky <dbelenky at redhat.com>
>> >>> wrote:
>> >>> > Hi all,
>> >>> >
>> >>> > The test-repo_ovirt_experimental_master job (link to Jenkins) failed in
>> >>> > the basic_sanity scenario.
>> >>> > The job was triggered by https://gerrit.ovirt.org/#/c/69845/
>> >>> >
>> >>> > From looking at the logs, it seems that the cause is VDSM.
>> >>> >
>> >>> > In the VDSM log, I see the following error:
>> >>> >
>> >>> > 2017-01-09 16:47:41,331 ERROR (JsonRpc (StompReactor)) [vds.dispatcher] SSL error receiving from <yajsonrpc.betterAsyncore.Dispatcher connected ('::1', 34942, 0, 0) at 0x36b95f0>: unexpected eof (betterAsyncore:119)
>> >>>
>> >>
>> >> Daniel, could you please remind me of the Jenkins link? I see something
>> >> suspicious in the Vdsm log.
>> >> Most notably, Vdsm received SIGTERM. Is this expected and part of the
>> >> test?
>> >>
>> >> >
>> >>>
>> >>> This issue means that the client closed the connection while vdsm was
>> >>> still replying. It can happen at any time when the client does not shut
>> >>> down the connection cleanly. As you can see, the client connected
>> >>> locally ('::1').
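
[Just to illustrate Piotr's point: any client that drops its socket without a
clean shutdown makes the server's next read fail exactly like that.  A toy
reproduction with plain sockets - nothing vdsm-specific:]

  import socket
  import struct
  import threading

  def server(listener):
      conn, _ = listener.accept()
      try:
          conn.recv(4096)  # the peer is already gone -> reset / unexpected EOF
      except socket.error as e:
          # e.g. [Errno 104] Connection reset by peer
          print('server read failed: %s' % e)
      finally:
          conn.close()

  listener = socket.socket()
  listener.bind(('127.0.0.1', 0))
  listener.listen(1)
  t = threading.Thread(target=server, args=(listener,))
  t.start()

  client = socket.socket()
  client.connect(listener.getsockname())
  # SO_LINGER with a zero timeout turns close() into an abortive RST
  # instead of a graceful FIN - the "not nice" case described above.
  client.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack('ii', 1, 0))
  client.close()

  t.join()
  listener.close()
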
>> >>>
>> >>> >
>> >>> > Also, when looking at the MOM logs, I see the following:
>> >>> >
>> >>> > 2017-01-09 16:43:39,508 - mom.vdsmInterface - ERROR - Cannot connect to VDSM! [Errno 111] Connection refused
>> >>> >
>> >>>
>> >>> Looking at the log, vdsm had no open socket at that time.
>> >>
>> >>
>> >>
>> >> Correct, but IIRC we have a race on startup - that's the reason why MOM
>> >> retries the connection. After the new attempt, MOM seems to behave
>> >> correctly:
>> >>
>> >> 2017-01-09 16:44:05,672 - mom.RPCServer - INFO - ping()
>> >> 2017-01-09 16:44:05,673 - mom.RPCServer - INFO - getStatistics()
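
[For reference, that kind of startup race is normally absorbed by a plain
retry loop on the client side - roughly like this sketch (illustrative only,
not MOM's actual code):]

  import time

  def connect_with_retries(connect, retries=10, delay=3):
      # Call connect() until it succeeds; re-raise the last error once the
      # retry budget is exhausted (e.g. [Errno 111] Connection refused while
      # vdsm has not opened its socket yet).
      for attempt in range(1, retries + 1):
          try:
              return connect()
          except (IOError, OSError):
              if attempt == retries:
                  raise
              time.sleep(delay)
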
>> >>
>> >> --
>> >> Francesco Romani
>> >> Red Hat Engineering Virtualization R & D
>> >> IRC: fromani
>> >>
>> >>
>> _______________________________________________
>> Devel mailing list
>> Devel at ovirt.org
>> http://lists.ovirt.org/mailman/listinfo/devel
>>

