On Mon, Nov 27, 2017 at 10:38 AM, Yedidyah Bar David <didi(a)redhat.com>
wrote:
On Sun, Nov 26, 2017 at 7:24 PM, Nir Soffer
<nsoffer(a)redhat.com> wrote:
> I think we need to check and report which process is listening on a port
> when starting a server on that port fails.
How do you know that a server was "started on that port", and that
it failed specifically because it failed to bind?
There is no standardized (Unix) way to mark that a service wants to
listen on a specific port, or that it failed because a specific port
was bound by some other process.
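
For illustration only: a server can at least make this particular failure
recognizable in its own output. The following is a minimal Python 3 sketch,
not code from ovirt-imageio; the helper name bind_or_explain is hypothetical.
It catches the bind error and checks for EADDRINUSE, which is the errno 98
seen in the traceback quoted further down.

```python
import errno
import socket


def bind_or_explain(host, port):
    """Bind a listening TCP socket, or raise with a clearer message.

    Hypothetical helper for illustration; not part of any oVirt package.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
    except OSError as e:
        sock.close()
        if e.errno == errno.EADDRINUSE:
            # The address is taken; say so explicitly instead of a bare traceback.
            raise RuntimeError(
                "port %d on %s is already bound by another socket" % (port, host))
        raise
    sock.listen(5)
    return sock
```

This only tells you that the port is taken, not by whom; finding the owner
still needs something like 'ss -anp' run with sufficient privileges.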
There are various classical *inetd* daemons, and modern systemd.socket,
that listen *instead* of some service. Then they can manage the port
resources and perhaps do something intelligent about them.
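
A rough sketch of what the service side of systemd socket activation looks
like, for reference. It follows the convention documented in sd_listen_fds(3):
the LISTEN_PID and LISTEN_FDS environment variables, with passed descriptors
starting at fd 3. The function name is made up for this example, and nothing
here is taken from an existing oVirt service.

```python
import os

SD_LISTEN_FDS_START = 3  # first passed descriptor, per sd_listen_fds(3)


def inherited_listen_fds(environ=None, pid=None):
    """Return the file descriptors systemd passed to this process, if any."""
    environ = os.environ if environ is None else environ
    pid = os.getpid() if pid is None else pid
    try:
        if int(environ.get("LISTEN_PID", "")) != pid:
            return []  # descriptors were meant for another process
        count = int(environ.get("LISTEN_FDS", ""))
    except ValueError:
        return []  # variables unset or malformed: not socket-activated
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + count))
```

A service written this way would wrap the descriptor with
socket.socket(fileno=fd) instead of calling bind() itself, so a port conflict
would presumably surface on the .socket unit rather than inside the service.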
>
> Didi, do you think we can integrate this in the deploy code, or should
> this be implemented in each server?
It should be quite easy to patch otopi's services.state to run something
if start fails, e.g. 'ss -anp' or whatever you want.
It should even be not-too-hard to do this in a self-contained plugin,
so it can be part of otopi-debug-plugins.
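
To make the idea concrete, here is a generic Python 3 sketch of "run
something if start fails". It deliberately does not use otopi's services API;
the function name and its return convention are assumptions made for this
example only.

```python
import subprocess


def start_with_diagnostics(start_cmd, diag_cmd):
    """Run start_cmd; if it fails, run diag_cmd and return its output.

    Returns (started, diagnostics); diagnostics is None on success.
    """
    if subprocess.call(start_cmd) == 0:
        return True, None
    # Start failed: capture the diagnostic command's stdout and stderr together.
    diag = subprocess.run(
        diag_cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
        universal_newlines=True)
    return False, diag.stdout
```

It could be called as
start_with_diagnostics(["systemctl", "start", "ovirt-imageio-proxy"], ["ss", "-anp"])
and the returned text written to the setup log.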
If we decide that something needs to be implemented by each server,
perhaps that "something" should be socket activation, i.e. being
controlled by a systemd.socket unit.
I didn't try it, though, to see what this actually buys us.
>
> Maybe when deployment fails, the deploy code can report all the
> listening sockets and the processes bound to these sockets?
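
As a sketch of what such a report could be built from without shelling out:
on Linux, /proc/net/tcp lists IPv4 TCP sockets, with state 0A meaning LISTEN.
The Python function below is hypothetical and only extracts the port and the
socket inode from that text.

```python
def listening_ports(proc_net_tcp_text):
    """Return sorted (port, inode) pairs for sockets in LISTEN state.

    Input is the text of /proc/net/tcp (Linux only).
    """
    TCP_LISTEN = "0A"  # state code the kernel prints for listening sockets
    result = []
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 10 or fields[3] != TCP_LISTEN:
            continue
        # local_address has the form HEXIP:HEXPORT
        port = int(fields[1].rsplit(":", 1)[1], 16)
        result.append((port, int(fields[9])))  # fields[9] is the socket inode
    return sorted(result)
```

Mapping an inode to its owning process means scanning /proc/<pid>/fd for
links of the form socket:[inode], which is what 'ss -p' already does, so
calling ss is probably the simpler choice in practice.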
Pushed now:
https://gerrit.ovirt.org/84699 core: Name TRANSACTION_INIT
https://gerrit.ovirt.org/84700 plugins: debug: Add debug_failure
https://gerrit.ovirt.org/84701 automation: Test failure
Will merge soon, if all goes well.
Dafna - thanks for opening the bug on ovirt-imageio, but I am not
sure anyone can do much about it without more info, such as might
be provided by above patches. When I suggested below to open BZ
I meant on otopi or host-deploy to provide more debugging info,
not for imageio - obviously no harm in opening it, and it's good
to have it even if only for reference.
Feel free to open BZ for other things discussed above, if relevant.
>
> Nir
>
> On Sun, Nov 26, 2017 at 7:11 PM Gal Ben Haim <gbenhaim(a)redhat.com> wrote:
>>
>> The failure is not consistent.
>>
>> On Sun, Nov 26, 2017 at 5:33 PM, Yaniv Kaul <ykaul(a)redhat.com> wrote:
>>>
>>>
>>>
>>> On Sun, Nov 26, 2017 at 4:53 PM, Gal Ben Haim <gbenhaim(a)redhat.com>
>>> wrote:
>>>>
>>>> We still see this issue on the upgrade suite from latest release to
>>>> master [1].
>>>> I don't see any evidence in "/var/log/messages" [2] that
>>>> "ovirt-imageio-proxy" was started twice.
>>>
>>>
>>> Since it's not a registered port and a high port, could it be used by
>>> something else (what are the odds, though)?
>>> Is it consistent?
>>> Y.
>>>
>>>>
>>>>
>>>> [1]
>>>> http://jenkins.ovirt.org/blue/rest/organizations/jenkins/pipelines/ovirt-master_change-queue-tester/runs/4153/nodes/123/steps/241/log/?start=0
>>>>
>>>> [2]
>>>> http://jenkins.ovirt.org/view/Change%20queue%20jobs/job/ovirt-master_change-queue-tester/4153/artifact/exported-artifacts/upgrade-from-release-suit-master-el7/test_logs/upgrade-from-release-suite-master/post-001_initialize_engine.py/lago-upgrade-from-release-suite-master-engine/_var_log/messages/*view*/
>>>>
>>>> On Fri, Nov 24, 2017 at 8:16 PM, Dafna Ron <dron(a)redhat.com> wrote:
>>>>>
>>>>> There were two different patches reported as failing CQ today, with the
>>>>> ovirt-imageio-proxy service failing to start.
>>>>>
>>>>> Here is the latest failure:
>>>>>
>>>>> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/4130/artifact
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 11/23/2017 03:39 PM, Allon Mureinik wrote:
>>>>>
>>>>> Daniel/Nir?
>>>>>
>>>>> On Thu, Nov 23, 2017 at 5:29 PM, Dafna Ron <dron(a)redhat.com> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> We have a failure in test
>>>>>> 001_initialize_engine.test_initialize_engine.
>>>>>>
>>>>>> It fails with the error: Failed to start service
>>>>>> 'ovirt-imageio-proxy'
>>>>>>
>>>>>>
>>>>>> Link and headline of suspected patches:
>>>>>>
>>>>>> build: Make resulting RPMs architecture-specific -
>>>>>> https://gerrit.ovirt.org/#/c/84534/
>>>>>>
>>>>>>
>>>>>> Link to Job:
>>>>>>
>>>>>> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/4055
>>>>>>
>>>>>>
>>>>>> Link to all logs:
>>>>>>
>>>>>> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/4055/artifact/
>>>>>>
>>>>>> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/4055/artifact/exported-artifacts/upgrade-from-release-suit-master-el7/test_logs/upgrade-from-release-suite-master/post-001_initialize_engine.py/lago-upgrade-from-release-suite-master-engine/_var_log/messages/*view*/
>>>>>>
>>>>>>
>>>>>> (Relevant) error snippet from the log:
>>>>>>
>>>>>> <error>
>>>>>>
>>>>>>
>>>>>> from lago log:
>>>>>>
>>>>>> Failed to start service 'ovirt-imageio-proxy
>>>>>>
>>>>>> messages logs:
>>>>>>
>>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine systemd: Starting Session 8 of user root.
>>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: Traceback (most recent call last):
>>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/bin/ovirt-imageio-proxy", line 85, in <module>
>>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: status = image_proxy.main(args, config)
>>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib/python2.7/site-packages/ovirt_imageio_proxy/image_proxy.py", line 21, in main
>>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: image_server.start(config)
>>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib/python2.7/site-packages/ovirt_imageio_proxy/server.py", line 45, in start
>>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: WSGIRequestHandler)
>>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib64/python2.7/SocketServer.py", line 419, in __init__
>>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: self.server_bind()
>>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib64/python2.7/wsgiref/simple_server.py", line 48, in server_bind
>>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: HTTPServer.server_bind(self)
>>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib64/python2.7/BaseHTTPServer.py", line 108, in server_bind
>>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: SocketServer.TCPServer.server_bind(self)
>>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib64/python2.7/SocketServer.py", line 430, in server_bind
>>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: self.socket.bind(self.server_address)
>>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib64/python2.7/socket.py", line 224, in meth
>>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: return getattr(self._sock,name)(*args)
>>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: socket.error: [Errno 98] Address already in use
>>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine systemd: ovirt-imageio-proxy.service: main process exited, code=exited, status=1/FAILURE
>>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine systemd: Failed to start oVirt ImageIO Proxy.
>>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine systemd: Unit ovirt-imageio-proxy.service entered failed state.
>>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine systemd: ovirt-imageio-proxy.service failed.
>>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine systemd: ovirt-imageio-proxy.service holdoff time over, scheduling restart.
>>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine systemd: Starting oVirt ImageIO Proxy...
>>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: Traceback (most recent call last):
>>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/bin/ovirt-imageio-proxy", line 85, in <module>
>>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: status = image_proxy.main(args, config)
>>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib/python2.7/site-packages/ovirt_imageio_proxy/image_proxy.py", line 21, in main
>>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: image_server.start(config)
>>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib/python2.7/site-packages/ovirt_imageio_proxy/server.py", line 45, in start
>>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: WSGIRequestHandler)
>>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib64/python2.7/SocketServer.py", line 419, in __init__
>>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: self.server_bind()
>>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib64/python2.7/wsgiref/simple_server.py", line 48, in server_bind
>>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: HTTPServer.server_bind(self)
>>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib64/python2.7/BaseHTTPServer.py", line 108, in server_bind
>>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: SocketServer.TCPServer.server_bind(self)
>>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib64/python2.7/SocketServer.py", line 430, in server_bind
>>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: self.socket.bind(self.server_address)
>>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib64/python2.7/socket.py", line 224, in meth
>>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: return getattr(self._sock,name)(*args)
>>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: socket.error: [Errno 98] Address already in use
>>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine systemd: ovirt-imageio-proxy.service: main process exited, code=exited, status=1/FAILURE
>>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine systemd: Failed to start oVirt ImageIO Proxy.
>>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine systemd: Unit ovirt-imageio-proxy.service entered failed state.
>>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine systemd: ovirt-imageio-proxy.service failed.
>>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine systemd: ovirt-imageio-proxy.service holdoff time over, scheduling restart.
>>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine systemd: start request repeated too quickly for ovirt-imageio-proxy.service
>>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine systemd: Failed to start oVirt ImageIO Proxy.
>>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine systemd: Unit ovirt-imageio-proxy.service entered failed state.
>>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine systemd: ovirt-imageio-proxy.service failed.
>>>>>>
>>>>>> </error>
>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Infra mailing list
>>>>>> Infra(a)ovirt.org
>>>>>>
http://lists.ovirt.org/mailman/listinfo/infra
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Devel mailing list
>>>>> Devel(a)ovirt.org
>>>>>
http://lists.ovirt.org/mailman/listinfo/devel
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> GAL bEN HAIM
>>>> RHV DEVOPS
>>>>
>>>
>>>
>>
>>
>>
>> --
>> GAL bEN HAIM
>> RHV DEVOPS
--
Didi