[JIRA] (OVIRT-1324) Re: Jenkins check-merged failure on Vdsm 4.1

Nadav Goldin (oVirt JIRA) jira at ovirt-jira.atlassian.net
Fri Apr 21 08:25:12 UTC 2017


    [ https://ovirt-jira.atlassian.net/browse/OVIRT-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=30000#comment-30000 ] 

Nadav Goldin commented on OVIRT-1324:
-------------------------------------

Adding more information from the email thread (probably missed here):
{quote}Hi Milan, sorry for missing this.

In short, it looks like a libvirt/qemu error; I guess it lies
somewhere in the nested environment the Jenkins slave runs in. I was
able to extract the libvirt log from this specific run, but there is
nothing useful there, except that there was no proper termination.
From reading here[1], it might be related to load on the hypervisor
and the timeout configured for libvirt to wait for qemu. Unfortunately,
looking at this[2] thread, it seems that a patch to make the timeout
configurable never got into libvirt, which leaves us with a default of
30 seconds, and that might not be enough in our nested environment. I
presume that if the hypervisor on which the Jenkins slave runs is
highly loaded, then when we try to start the vdsm_functional_tests_lago
VM, it might take more than 30 seconds for qemu to respond.

Another indication supporting this "hypothesis" is that I have never
seen this error on OST, which uses bare-metal slaves.

Evgheni, do we have load monitoring on the hypervisor that runs
vm0065.workers-phx.ovirt.org? Not sure if we ever added that.


[1] https://bugzilla.redhat.com/show_bug.cgi?id=987088
[2] https://www.redhat.com/archives/libvir-list/2014-January/msg00410.html
{quote}
[~ederevea]
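
The timing in the quoted job output can be checked against the
30-second default mentioned above. The following is a minimal Python
sketch, not part of lago or libvirt: the helper name seconds_between
and the constant ASSUMED_MONITOR_TIMEOUT are made up for illustration,
and the two log lines are copied from the check-merged output quoted
below, with the e-mail line wrap rejoined.

```python
from datetime import datetime

# Two lines copied from the check-merged job output quoted in this thread.
LOG = """\
07:01:27     * Starting VM vdsm_functional_tests_host-el7:
07:02:07 libvirt: QEMU Driver error : monitor socket did not show up: No such file or directory
"""

# Default wait (in seconds) mentioned in the comment; an assumption here,
# not a value read from libvirt.
ASSUMED_MONITOR_TIMEOUT = 30


def seconds_between(log, start_marker, end_marker):
    """Return the seconds between the first line containing start_marker
    and the first later line containing end_marker, or None if not found.

    Assumes every relevant line starts with an HH:MM:SS stamp and that
    both lines fall on the same day.
    """
    start = None
    for line in log.splitlines():
        stamp, _, rest = line.partition(" ")
        try:
            when = datetime.strptime(stamp, "%H:%M:%S")
        except ValueError:
            continue  # skip lines without a leading HH:MM:SS stamp
        if start is None and start_marker in rest:
            start = when
        elif start is not None and end_marker in rest:
            return int((when - start).total_seconds())
    return None


elapsed = seconds_between(LOG, "Starting VM", "monitor socket did not show up")
print(elapsed, elapsed >= ASSUMED_MONITOR_TIMEOUT)
```

For this run the script prints "40 True", matching the "ERROR (in
0:00:40)" reported by lago. That is consistent with a 30-second wait
expiring on a loaded host, but it does not prove it: the stamps have
one-second resolution and the interval also includes lago's own
overhead.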

> Re: Jenkins check-merged failure on Vdsm 4.1
> --------------------------------------------
>
>                 Key: OVIRT-1324
>                 URL: https://ovirt-jira.atlassian.net/browse/OVIRT-1324
>             Project: oVirt - virtualization made easy
>          Issue Type: By-EMAIL
>            Reporter: Gil Shinar
>            Assignee: infra
>
> Adding infra-support so a ticket will be opened
> Milan, is it still relevant?
> Thanks
> Gil
> On Mon, Apr 10, 2017 at 10:56 AM, Milan Zamazal <mzamazal at redhat.com> wrote:
> > Hi,
> >
> > after my Vdsm patch https://gerrit.ovirt.org/75329 in ovirt-4.1 branch
> > had been merged, Jenkins check-merged job
> > http://jenkins.ovirt.org/job/vdsm_4.1_check-merged-el7-x86_64/173/
> > failed with the following error:
> >
> >   07:01:21 @ Start specified VMs:
> >   07:01:21   # Start nets:
> >   07:01:21     * Create network vdsm_functional_tests_lago:
> >   07:01:27     * Create network vdsm_functional_tests_lago: Success (in 0:00:05)
> >   07:01:27   # Start nets: Success (in 0:00:05)
> >   07:01:27   # Start vms:
> >   07:01:27     * Starting VM vdsm_functional_tests_host-el7:
> >   07:02:07 libvirt: QEMU Driver error : monitor socket did not show up: No such file or directory
> >   07:02:07     * Starting VM vdsm_functional_tests_host-el7: ERROR (in 0:00:40)
> >   07:02:07   # Start vms: ERROR (in 0:00:40)
> >   07:02:07   # Destroy network vdsm_functional_tests_lago:
> >   07:02:07   # Destroy network vdsm_functional_tests_lago: ERROR (in 0:00:00)
> >   07:02:07 @ Start specified VMs: ERROR (in 0:00:46)
> >   07:02:07 Error occured, aborting
> >   07:02:07 Traceback (most recent call last):
> >   07:02:07   File "/usr/lib/python2.7/site-packages/lago/cmd.py", line 936, in main
> >   07:02:07     cli_plugins[args.verb].do_run(args)
> >   07:02:07   File "/usr/lib/python2.7/site-packages/lago/plugins/cli.py", line 184, in do_run
> >   07:02:07     self._do_run(**vars(args))
> >   07:02:07   File "/usr/lib/python2.7/site-packages/lago/utils.py", line 495, in wrapper
> >   07:02:07     return func(*args, **kwargs)
> >   07:02:07   File "/usr/lib/python2.7/site-packages/lago/utils.py", line 506, in wrapper
> >   07:02:07     return func(*args, prefix=prefix, **kwargs)
> >   07:02:07   File "/usr/lib/python2.7/site-packages/lago/cmd.py", line 264, in do_start
> >   07:02:07     prefix.start(vm_names=vm_names)
> >   07:02:07   File "/usr/lib/python2.7/site-packages/lago/prefix.py", line 1033, in start
> >   07:02:07     self.virt_env.start(vm_names=vm_names)
> >   07:02:07   File "/usr/lib/python2.7/site-packages/lago/virt.py", line 331, in start
> >   07:02:07     vm.start()
> >   07:02:07   File "/usr/lib/python2.7/site-packages/lago/plugins/vm.py", line 299, in start
> >   07:02:07     return self.provider.start(*args, **kwargs)
> >   07:02:07   File "/usr/lib/python2.7/site-packages/lago/vm.py", line 106, in start
> >   07:02:07     dom = self.libvirt_con.createXML(self._libvirt_xml())
> >   07:02:07   File "/usr/lib64/python2.7/site-packages/libvirt.py", line 3782, in createXML
> >   07:02:07     if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self)
> >   07:02:07 libvirtError: monitor socket did not show up: No such file or directory
> >   07:02:07 Took 210 seconds
> >
> > The error is apparently unrelated to my patch since: 1. my patch should
> > have nothing to do with VM start; 2. Jenkins has run successfully on the
> > following patch (https://gerrit.ovirt.org/75321).  FWIW, the preceding
> > patch (https://gerrit.ovirt.org/75038) has run successfully too.
> >
> > Do you know what's wrong?
> >
> > Thanks,
> > Milan
> > _______________________________________________
> > Infra mailing list
> > Infra at ovirt.org
> > http://lists.ovirt.org/mailman/listinfo/infra
> >


