Didi - we just saw a report of a similar failure on engine-4.1's OST.
Could you please backport these patches there too?
On Mon, Nov 27, 2017 at 2:57 PM, Yedidyah Bar David <didi(a)redhat.com> wrote:
On Mon, Nov 27, 2017 at 10:38 AM, Yedidyah Bar David <didi(a)redhat.com> wrote:
> On Sun, Nov 26, 2017 at 7:24 PM, Nir Soffer <nsoffer(a)redhat.com> wrote:
> > I think we need to check and report which process is listening on a port
> > when starting a server on that port fails.
>
> How do you know that a server was "started on that port", and that
> it failed specifically because it failed to bind?
>
> There is no standardized (Unix) way to mark that a service wants to
> listen on a specific port, or that it failed because a specific port
> was bound by some other process.
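One thing a server can do on its own is report the bind failure explicitly instead of leaving only a traceback. A minimal sketch, assuming Python 3 (where socket.error is an alias of OSError); try_bind is a hypothetical helper, not part of ovirt-imageio or otopi:

```python
import errno
import socket


def try_bind(host, port):
    """Try to bind a TCP socket.

    Returns (socket, None) on success or (None, message) on failure,
    naming the port explicitly when it is already taken.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
    except OSError as e:
        sock.close()
        if e.errno == errno.EADDRINUSE:
            # Errno 98 on Linux: the case seen in the traceback in this thread.
            return None, "port %d on %s is already in use" % (port, host)
        return None, "bind to %s:%d failed: %s" % (host, port, e)
    return sock, None
```

This only tells you that the port is taken, not by whom; finding the owner still needs something like ss or /proc.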
>
> There are various classical *inetd* daemons, and modern systemd.socket,
> that listen *instead* of some service. Then they can manage the port
> resources and perhaps do something intelligent about them.
>
> >
> > Didi, do you think we can integrate this in the deploy code, or this
> > should be implemented in each server?
>
> It should be quite easy to patch otopi's services.state to run something
> if start fails, e.g. 'ss -anp' or whatever you want.
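As a rough illustration of the "run something if start fails" idea: the sketch below wraps an arbitrary start callable and logs the output of 'ss -anp' when it raises. The names start_with_diagnostics and collect_diagnostics are hypothetical, and this is not otopi's actual services.state interface.

```python
import subprocess


def collect_diagnostics(cmd=("ss", "-anp")):
    """Run a diagnostic command and return its combined output as text."""
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired) as e:
        # Never let the diagnostics hide the original failure.
        return "could not run %s: %s" % (" ".join(cmd), e)
    return result.stdout


def start_with_diagnostics(start, log, cmd=("ss", "-anp")):
    """Call start(); if it raises, log diagnostics and re-raise."""
    try:
        start()
    except Exception:
        log(collect_diagnostics(cmd))
        raise
```

The original exception is re-raised unchanged, so the deployment still fails the same way; the only difference is the extra socket listing in the log.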
>
> It should even be not-too-hard to do this in a self-contained plugin,
> so it can be part of otopi-debug-plugins.
>
> If we decide that something needs to be implemented by each server,
> perhaps "something" should be to be controlled by a systemd.socket unit.
> Didn't try, though, to see what this actually buys us.
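For reference, this is roughly what the service side of systemd socket activation looks like. systemd passes already-listening sockets as file descriptors starting at 3 and sets LISTEN_PID and LISTEN_FDS in the environment. A minimal sketch, assuming Python 3; the function names are hypothetical and nothing here is taken from ovirt-imageio:

```python
import os
import socket

SD_LISTEN_FDS_START = 3  # first fd passed by systemd socket activation


def inherited_fds(environ=None, pid=None):
    """Return the fds passed by systemd, or [] if not socket-activated."""
    environ = os.environ if environ is None else environ
    pid = os.getpid() if pid is None else pid
    try:
        if int(environ.get("LISTEN_PID", "")) != pid:
            return []
        count = int(environ.get("LISTEN_FDS", ""))
    except ValueError:
        # Variables missing or malformed: not socket-activated.
        return []
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + count))


def listening_socket(fallback_address):
    """Use the inherited socket if there is one, else bind ourselves."""
    fds = inherited_fds()
    if fds:
        return socket.socket(fileno=fds[0])
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(fallback_address)
    sock.listen(5)
    return sock
```

With this arrangement the port is held by systemd, so a bind conflict would show up when the socket unit starts rather than inside the service.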
>
> >
> > Maybe when deployment fails, the deploy code can report all the
> > listening sockets and the processes bound to these sockets?
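If running an external tool is not wanted, the listening sockets can also be read from /proc/net/tcp on Linux. A minimal sketch, assuming IPv4 and a little-endian host (the address field is a hex dump in host byte order); parse_listening is a hypothetical helper:

```python
import socket
import struct

TCP_LISTEN = "0A"  # value of the "st" column for listening sockets


def parse_listening(proc_net_tcp_text):
    """Return [(address, port, inode)] for listening sockets.

    Expects the text of /proc/net/tcp, including its header line.
    """
    listeners = []
    for line in proc_net_tcp_text.splitlines()[1:]:
        fields = line.split()
        if len(fields) < 10 or fields[3] != TCP_LISTEN:
            continue
        hex_addr, hex_port = fields[1].split(":")
        # Assumes a little-endian host, e.g. x86_64.
        addr = socket.inet_ntoa(struct.pack("<I", int(hex_addr, 16)))
        listeners.append((addr, int(hex_port, 16), int(fields[9])))
    return listeners
```

The inode can then be matched against the "socket:[inode]" links under /proc/<pid>/fd to find the owning process, which usually requires root for processes of other users.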
>
> Pushed now:
>
>
https://gerrit.ovirt.org/84699 core: Name TRANSACTION_INIT
>
https://gerrit.ovirt.org/84700 plugins: debug: Add debug_failure
>
https://gerrit.ovirt.org/84701 automation: Test failure
>
> Will merge soon, if all goes well.
>
Merged them.
Pushed to OST:
https://gerrit.ovirt.org/84710
Dafna - thanks for opening the bug on ovirt-imageio, but I am not
sure anyone can do much about it without more info, such as might
be provided by above patches. When I suggested below to open BZ
I meant on otopi or host-deploy to provide more debugging info,
not for imageio - obviously no harm in opening it, and it's good
to have it even if only for reference.
>
> Feel free to open BZ for other things discussed above, if relevant.
>
> >
> > Nir
> >
> > On Sun, Nov 26, 2017 at 7:11 PM Gal Ben Haim <gbenhaim(a)redhat.com> wrote:
> >>
> >> The failure is not consistent.
> >>
> >> On Sun, Nov 26, 2017 at 5:33 PM, Yaniv Kaul <ykaul(a)redhat.com> wrote:
> >>>
> >>>
> >>>
> >>> On Sun, Nov 26, 2017 at 4:53 PM, Gal Ben Haim <gbenhaim(a)redhat.com> wrote:
> >>>>
> >>>> We still see this issue on the upgrade suite from latest release to
> >>>> master [1].
> >>>> I don't see any evidence in "/var/log/messages" [2] that
> >>>> "ovirt-imageio-proxy" was started twice.
> >>>
> >>>
> >>> Since it's not a registered port and a high port, could it be used by
> >>> something else (what are the odds, though)?
> >>> Is it consistent?
> >>> Y.
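One way to judge the odds is to check whether the service's port falls inside the kernel's ephemeral port range, from which outgoing connections get their local ports. A minimal sketch, assuming Linux; the port number below is only an example value for illustration, since the thread does not state which port the proxy uses:

```python
PORT_RANGE_FILE = "/proc/sys/net/ipv4/ip_local_port_range"

EXAMPLE_PORT = 54323  # hypothetical value, not taken from this thread


def parse_port_range(text):
    """Parse the 'low<whitespace>high' contents of ip_local_port_range."""
    low, high = (int(x) for x in text.split())
    return low, high


def in_ephemeral_range(port, port_range):
    """True if the kernel may hand this port to an outgoing connection."""
    low, high = port_range
    return low <= port <= high


def read_port_range(path=PORT_RANGE_FILE):
    """Read the current range from the running kernel."""
    with open(path) as f:
        return parse_port_range(f.read())
```

If the port is inside the range, any short-lived outgoing connection on the machine can occasionally occupy it, which would fit a failure that is not consistent.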
> >>>
> >>>>
> >>>>
> >>>> [1]
> >>>>
> >>>> http://jenkins.ovirt.org/blue/rest/organizations/jenkins/pipelines/ovirt-master_change-queue-tester/runs/4153/nodes/123/steps/241/log/?start=0
> >>>>
> >>>> [2]
> >>>>
> >>>> http://jenkins.ovirt.org/view/Change%20queue%20jobs/job/ovirt-master_change-queue-tester/4153/artifact/exported-artifacts/upgrade-from-release-suit-master-el7/test_logs/upgrade-from-release-suite-master/post-001_initialize_engine.py/lago-upgrade-from-release-suite-master-engine/_var_log/messages/*view*/
> >>>>
> >>>> On Fri, Nov 24, 2017 at 8:16 PM, Dafna Ron <dron(a)redhat.com> wrote:
> >>>>>
> >>>>> There were two different patches reported as failing CQ today with the
> >>>>> ovirt-imageio-proxy service failing to start.
> >>>>>
> >>>>> Here is the latest failure:
> >>>>>
> >>>>> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/4130/artifact
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 11/23/2017 03:39 PM, Allon Mureinik wrote:
> >>>>>
> >>>>> Daniel/Nir?
> >>>>>
> >>>>> On Thu, Nov 23, 2017 at 5:29 PM, Dafna Ron <dron(a)redhat.com> wrote:
> >>>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> We have a failure in test
> >>>>>> 001_initialize_engine.test_initialize_engine.
> >>>>>>
> >>>>>> This is failing with the error: Failed to start service
> >>>>>> 'ovirt-imageio-proxy'
> >>>>>>
> >>>>>>
> >>>>>> Link and headline of suspected patches:
> >>>>>>
> >>>>>> build: Make resulting RPMs architecture-specific -
> >>>>>>
https://gerrit.ovirt.org/#/c/84534/
> >>>>>>
> >>>>>>
> >>>>>> Link to Job:
> >>>>>>
> >>>>>>
http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/4055
> >>>>>>
> >>>>>>
> >>>>>> Link to all logs:
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/4055/artifact/
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/4055/artifact/exported-artifacts/upgrade-from-release-suit-master-el7/test_logs/upgrade-from-release-suite-master/post-001_initialize_engine.py/lago-upgrade-from-release-suite-master-engine/_var_log/messages/*view*/
> >>>>>>
> >>>>>>
> >>>>>> (Relevant) error snippet from the log:
> >>>>>>
> >>>>>> <error>
> >>>>>>
> >>>>>>
> >>>>>> from lago log:
> >>>>>>
> >>>>>> Failed to start service 'ovirt-imageio-proxy
> >>>>>>
> >>>>>> messages logs:
> >>>>>>
> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine systemd: Starting Session 8 of user root.
> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: Traceback (most recent call last):
> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/bin/ovirt-imageio-proxy", line 85, in <module>
> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: status = image_proxy.main(args, config)
> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib/python2.7/site-packages/ovirt_imageio_proxy/image_proxy.py", line 21, in main
> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: image_server.start(config)
> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib/python2.7/site-packages/ovirt_imageio_proxy/server.py", line 45, in start
> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: WSGIRequestHandler)
> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib64/python2.7/SocketServer.py", line 419, in __init__
> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: self.server_bind()
> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib64/python2.7/wsgiref/simple_server.py", line 48, in server_bind
> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: HTTPServer.server_bind(self)
> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib64/python2.7/BaseHTTPServer.py", line 108, in server_bind
> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: SocketServer.TCPServer.server_bind(self)
> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib64/python2.7/SocketServer.py", line 430, in server_bind
> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: self.socket.bind(self.server_address)
> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib64/python2.7/socket.py", line 224, in meth
> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: return getattr(self._sock,name)(*args)
> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: socket.error: [Errno 98] Address already in use
> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine systemd: ovirt-imageio-proxy.service: main process exited, code=exited, status=1/FAILURE
> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine systemd: Failed to start oVirt ImageIO Proxy.
> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine systemd: Unit ovirt-imageio-proxy.service entered failed state.
> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine systemd: ovirt-imageio-proxy.service failed.
> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine systemd: ovirt-imageio-proxy.service holdoff time over, scheduling restart.
> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine systemd: Starting oVirt ImageIO Proxy...
> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: Traceback (most recent call last):
> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/bin/ovirt-imageio-proxy", line 85, in <module>
> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: status = image_proxy.main(args, config)
> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib/python2.7/site-packages/ovirt_imageio_proxy/image_proxy.py", line 21, in main
> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: image_server.start(config)
> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib/python2.7/site-packages/ovirt_imageio_proxy/server.py", line 45, in start
> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: WSGIRequestHandler)
> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib64/python2.7/SocketServer.py", line 419, in __init__
> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: self.server_bind()
> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib64/python2.7/wsgiref/simple_server.py", line 48, in server_bind
> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: HTTPServer.server_bind(self)
> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib64/python2.7/BaseHTTPServer.py", line 108, in server_bind
> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: SocketServer.TCPServer.server_bind(self)
> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib64/python2.7/SocketServer.py", line 430, in server_bind
> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: self.socket.bind(self.server_address)
> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib64/python2.7/socket.py", line 224, in meth
> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: return getattr(self._sock,name)(*args)
> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: socket.error: [Errno 98] Address already in use
> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine systemd: ovirt-imageio-proxy.service: main process exited, code=exited, status=1/FAILURE
> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine systemd: Failed to start oVirt ImageIO Proxy.
> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine systemd: Unit ovirt-imageio-proxy.service entered failed state.
> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine systemd: ovirt-imageio-proxy.service failed.
> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine systemd: ovirt-imageio-proxy.service holdoff time over, scheduling restart.
> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine systemd: start request repeated too quickly for ovirt-imageio-proxy.service
> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine systemd: Failed to start oVirt ImageIO Proxy.
> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine systemd: Unit ovirt-imageio-proxy.service entered failed state.
> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine systemd: ovirt-imageio-proxy.service failed.
> >>>>>>
> >>>>>> </error>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> _______________________________________________
> >>>>>> Infra mailing list
> >>>>>> Infra(a)ovirt.org
> >>>>>>
http://lists.ovirt.org/mailman/listinfo/infra
> >>>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> _______________________________________________
> >>>>> Devel mailing list
> >>>>> Devel(a)ovirt.org
> >>>>>
http://lists.ovirt.org/mailman/listinfo/devel
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> GAL bEN HAIM
> >>>> RHV DEVOPS
> >>>>
> >>>
> >>>
> >>
> >>
> >>
> >> --
> >> GAL bEN HAIM
> >> RHV DEVOPS
>
>
>
> --
> Didi
>
--
Didi
_______________________________________________
Devel mailing list
Devel(a)ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel