[ovirt-devel] suspend_resume_vm fail on master experimental

Yaniv Kaul ykaul at redhat.com
Wed Jan 11 08:55:39 UTC 2017


On Wed, Jan 11, 2017 at 10:53 AM, Piotr Kliczewski <
piotr.kliczewski at gmail.com> wrote:

> On Wed, Jan 11, 2017 at 9:29 AM, Yaniv Kaul <ykaul at redhat.com> wrote:
> >
> >
> > On Wed, Jan 11, 2017 at 10:26 AM, Francesco Romani <fromani at redhat.com>
> > wrote:
> >>
> >> Hi all
> >>
> >>
> >> On 01/11/2017 08:52 AM, Eyal Edri wrote:
> >>
> >> Adding Tomas from Virt.
> >>
> >> On Tue, Jan 10, 2017 at 10:54 AM, Piotr Kliczewski
> >> <piotr.kliczewski at gmail.com> wrote:
> >>>
> >>> On Tue, Jan 10, 2017 at 9:29 AM, Daniel Belenky <dbelenky at redhat.com>
> >>> wrote:
> >>> > Hi all,
> >>> >
> >>> > test-repo_ovirt_experimental_master (link to Jenkins) job failed on
> >>> > basic_sanity scenario.
> >>> > The job was triggered by https://gerrit.ovirt.org/#/c/69845/
> >>> >
> >>> > From looking at the logs, it seems that the reason is VDSM.
> >>> >
> >>> > In the VDSM log, I see the following error:
> >>> >
> >>> > 2017-01-09 16:47:41,331 ERROR (JsonRpc (StompReactor)) [vds.dispatcher] SSL error receiving from <yajsonrpc.betterAsyncore.Dispatcher connected ('::1', 34942, 0, 0) at 0x36b95f0>: unexpected eof (betterAsyncore:119)
> >>
> >>
> >> Daniel, could you please remind me of the Jenkins link? I see something
> >> suspicious in the Vdsm log.
> >
> >
> > Please use my live system:
> > ssh mini at ykaul-mini.tlv.redhat.com (qum5net)
> > then run a console to the VM:
> > lagocli --prefix-path /dev/shm/run/current shell engine
> >
> > (or 'host0' for the host)
> >
> >> Most notably, Vdsm received SIGTERM. Is this expected and part of the
> >> test?
> >
> >
> > It's not.
> >
> >
> >>
> >>
> >>
> >>> >
> >>>
> >>> This issue means that the client closed connection while vdsm was
> >>> replying. It can happen at any time
> >>> when the client is not nice with the connection. As you can see the
> >>> client connected locally '::1'.
> >>>
> >>> >
> >>> > Also, when looking at the MOM logs, I see the following:
> >>> >
> >>> > 2017-01-09 16:43:39,508 - mom.vdsmInterface - ERROR - Cannot connect to VDSM! [Errno 111] Connection refused
> >>> >
> >>>
> >>> Looking at the log at this time vdsm had no open socket.
> >>
> >>
> >>
> >> Correct, but IIRC we have a race on startup - that's the reason why MOM
> >> retries to connect. After the new try, MOM seems to behave
> >> correctly:
> >>
> >> 2017-01-09 16:44:05,672 - mom.RPCServer - INFO - ping()
> >> 2017-01-09 16:44:05,673 - mom.RPCServer - INFO - getStatistics()
> >
> >
> > But there are multiple other disconnections, without anything in mom log:
> >
> > [root at lago-basic-suite-master-host0 vdsm]# grep SSL /var/log/vdsm/vdsm.log |grep "::1"
> > 2017-01-11 02:29:46,310 ERROR (JsonRpc (StompReactor)) [vds.dispatcher] SSL error receiving from <yajsonrpc.betterAsyncore.Dispatcher connected ('::1', 49046, 0, 0) at 0x39bf1b8>: unexpected eof (betterAsyncore:119)
> > 2017-01-11 02:29:51,089 ERROR (JsonRpc (StompReactor)) [vds.dispatcher] SSL error receiving from <yajsonrpc.betterAsyncore.Dispatcher connected ('::1', 49048, 0, 0) at 0x39d1ea8>: unexpected eof (betterAsyncore:119)
> > 2017-01-11 02:29:51,392 ERROR (JsonRpc (StompReactor)) [vds.dispatcher] SSL error receiving from <yajsonrpc.betterAsyncore.Dispatcher connected ('::1', 49050, 0, 0) at 0x39d1710>: unexpected eof (betterAsyncore:119)
> > 2017-01-11 02:29:51,700 ERROR (JsonRpc (StompReactor)) [vds.dispatcher] SSL error receiving from <yajsonrpc.betterAsyncore.Dispatcher connected ('::1', 49052, 0, 0) at 0x39d1128>: unexpected eof (betterAsyncore:119)
> > 2017-01-11 02:29:52,008 ERROR (JsonRpc (StompReactor)) [vds.dispatcher] SSL error receiving from <yajsonrpc.betterAsyncore.Dispatcher connected ('::1', 49054, 0, 0) at 0x39d1128>: unexpected eof (betterAsyncore:119)
> > 2017-01-11 02:37:34,019 ERROR (JsonRpc (StompReactor)) [vds.dispatcher] SSL error receiving from <yajsonrpc.betterAsyncore.Dispatcher connected ('::1', 49032, 0, 0) at 0x380cef0>: unexpected eof (betterAsyncore:119)
> >
>
> If there is no hosted engine, this could only be MOM. This is an
> indication of connection closure.
>

There is no hosted-engine, although the services are installed (since we've
installed the Cockpit stuff) - but not working.
Y.
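
For anyone who wants to tally these disconnects rather than eyeball the grep
output quoted above, here is a minimal sketch. It assumes each entry sits on a
single line in vdsm.log (the wrapping in this thread comes from the mail
client) and that the format matches the entries shown here. The names
ssl_eof_events and per_minute are made up for this example and are not part of
Vdsm.

```python
import re
from collections import Counter

# Pattern derived from the log entries quoted in this thread; it is an
# assumption that other Vdsm versions format the message the same way.
SSL_EOF = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+ ERROR .*"
    r"SSL error receiving from <\S+ connected "
    r"\('(?P<addr>[^']+)', (?P<port>\d+),.*unexpected eof"
)


def ssl_eof_events(lines, addr="::1"):
    """Return (timestamp, port) pairs for SSL 'unexpected eof' errors from addr."""
    events = []
    for line in lines:
        m = SSL_EOF.match(line)
        if m and m.group("addr") == addr:
            events.append((m.group("ts"), int(m.group("port"))))
    return events


def per_minute(events):
    """Count events per minute (timestamp truncated to 'YYYY-MM-DD HH:MM')."""
    return Counter(ts[:16] for ts, _ in events)
```

Passing an open file works, for example
per_minute(ssl_eof_events(open("/var/log/vdsm/vdsm.log"))). A burst within one
minute, like the five entries around 02:29 above, would then show up as a
single high count, which may make it easier to line up against the MOM log.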


>
> >
> >>
> >>
> >>
> >> --
> >> Francesco Romani
> >> Red Hat Engineering Virtualization R & D
> >> IRC: fromani
> >>
> >>
> >> _______________________________________________
> >> Devel mailing list
> >> Devel at ovirt.org
> >> http://lists.ovirt.org/mailman/listinfo/devel
> >
> >
> >
>