On Mon, Mar 25, 2019 at 12:20 PM Greg Sheremeta <gshereme(a)redhat.com> wrote:
> Not related to ui-extensions (which is a very frontend-only JavaScript
> project)
I know it's not, but this is just to point out that these failures need to be
investigated in depth by the relevant development team to find the root cause.
IMHO that isn't being done enough today, which leads to OST being
broken for longer periods.
<testcase classname="002_bootstrap"
name="add_master_storage_domain" time="0.514">
<error type="exceptions.RuntimeError" *message="Could not find
hosts that are up in DC test-dc*
-------------------- >> begin captured logging <<
-------------------- lago.ssh: DEBUG: start
task:ffc0094a-1134-4072-b8d7-3f2ea75eea7f:Get ssh client for
lago-basic-suite-4-3-engine: lago.ssh: DEBUG: end
task:ffc0094a-1134-4072-b8d7-3f2ea75eea7f:Get ssh client for
lago-basic-suite-4-3-engine: lago.ssh: DEBUG: Ru
On Mon, Mar 25, 2019, 5:15 AM Eyal Edri <eedri(a)redhat.com> wrote:
> Still fails, now on a different component. ( ovirt-web-ui-extentions )
>
>
> https://jenkins.ovirt.org/job/ovirt-4.3_change-queue-tester/339/
>
> On Fri, Mar 22, 2019 at 3:59 PM Dan Kenigsberg <danken(a)redhat.com> wrote:
>
>>
>>
>> On Fri, Mar 22, 2019 at 3:21 PM Marcin Sobczyk <msobczyk(a)redhat.com>
>> wrote:
>>
>>> Dafna,
>>>
>>> in 'verify_add_hosts' we specifically wait for a single host to be up,
>>> with a timeout:
>>>
>>> 144 up_hosts = hosts_service.list(search='datacenter={} AND status=up'.format(DC_NAME))
>>> 145 if len(up_hosts):
>>> 146     return True
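
The wait-with-timeout pattern described above can be sketched roughly as follows. This is only an illustration, not the code in 002_bootstrap.py: the name wait_for_up_host and its parameters are hypothetical, and list_up_hosts stands for any callable that wraps the hosts_service.list(...) search quoted above.

```python
import time


def wait_for_up_host(list_up_hosts, timeout=600, interval=3,
                     clock=time.time, sleep=time.sleep):
    """Poll list_up_hosts() until it returns a non-empty list or time runs out.

    list_up_hosts is a zero-argument callable (hypothetical); clock and sleep
    are injectable so the loop can be exercised without real waiting.
    """
    deadline = clock() + timeout
    while True:
        up_hosts = list_up_hosts()
        if up_hosts:
            return up_hosts
        if clock() >= deadline:
            # Report the timeout explicitly instead of returning silently.
            raise RuntimeError(
                'No host came up within {} seconds'.format(timeout))
        sleep(interval)
```

Note that a check like this only proves a host was up at the moment of the last poll; a later call can still see a different state.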
>>>
>>> The log files say that it took ~50 secs for one of the hosts to come up
>>> (which seems reasonable), and no timeout is reported.
>>> Right after 'verify_add_hosts' we run 'add_master_storage_domain',
>>> which calls the '_hosts_in_dc' function.
>>> That function does the exact same check, but it fails:
>>>
>>> 113 hosts = hosts_service.list(search='datacenter={} AND status=up'.format(dc_name))
>>> 114 if hosts:
>>> 115     if random_host:
>>> 116         return random.choice(hosts)
>>>
>> I don't think it is relevant to our current failure, but I consider
>> random_host=True bad practice. As if we did not have enough moving
>> parts, we are adding intentional randomness. Reproducibility is far more
>> important than coverage, particularly for a shared system test like OST.
>>
>>> 117     else:
>>> 118         return sorted(hosts, key=lambda host: host.name)
>>> 119 raise RuntimeError('Could not find hosts that are up in DC %s' % dc_name)
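
On the random_host point raised above, one possible way to keep the pick reproducible is sketched below. It is an assumption-laden illustration, not the OST code: pick_host and its seed parameter are hypothetical, and Host is a stand-in for the SDK host objects (only the name attribute is used).

```python
import random
from collections import namedtuple

# Stand-in for the SDK host objects; only the 'name' attribute is used here.
Host = namedtuple('Host', ['name'])


def pick_host(hosts, seed=None):
    """Return one host from 'hosts' in a reproducible way.

    With seed=None the host with the smallest name is returned. With a seed,
    a private random.Random(seed) makes the "random" pick repeatable.
    """
    if not hosts:
        raise RuntimeError('No hosts to choose from')
    # Sort first so the result does not depend on the order the API returns.
    ordered = sorted(hosts, key=lambda host: host.name)
    if seed is None:
        return ordered[0]
    return random.Random(seed).choice(ordered)


hosts = [Host('host-1'), Host('host-0')]
first = pick_host(hosts)              # always the lowest name
seeded = pick_host(hosts, seed=1234)  # same host on every run with this seed
```

If some randomness is still wanted for coverage, logging the seed at the start of the run would let a failing run be replayed with the same choice.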
>>>
>>>
>>> I'm also not able to reproduce this issue locally on my server. The
>>> investigation continues...
>>>
>>
>> I think that it would be fair to take the filtering by host state out of
>> Engine and into the test, where we can easily log the current status of
>> each host. Then we'd have better understanding on the next failure.
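
A minimal sketch of that suggestion, under stated assumptions: hosts_up_in_dc is a hypothetical helper, hosts_service is any object with a list(search=...) method as in the snippets quoted in this thread, and up_status is whatever value the SDK uses for an "up" host (presumably ovirtsdk4.types.HostStatus.UP, which is an assumption here).

```python
import logging

LOGGER = logging.getLogger(__name__)


def hosts_up_in_dc(hosts_service, dc_name, up_status):
    """List every host in the DC, log each status, and return the up ones."""
    # Ask Engine only for DC membership; the status filter happens locally.
    all_hosts = hosts_service.list(search='datacenter={}'.format(dc_name))
    for host in all_hosts:
        LOGGER.debug('Host %s in DC %s has status %s',
                     host.name, dc_name, host.status)
    up_hosts = [host for host in all_hosts if host.status == up_status]
    if not up_hosts:
        # Put the observed statuses into the error so the failure explains itself.
        statuses = ', '.join(
            '{}={}'.format(host.name, host.status) for host in all_hosts)
        raise RuntimeError(
            'Could not find hosts that are up in DC {} (statuses: {})'.format(
                dc_name, statuses or 'no hosts listed'))
    return sorted(up_hosts, key=lambda host: host.name)
```

With this shape, the next failure would show whether the hosts were installing, non-responsive, or missing from the DC altogether.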
>>
>> On 3/22/19 1:17 PM, Marcin Sobczyk wrote:
>>>
>>> Hi,
>>>
>>> sure, I'm on it. It's weird though: I ran the 4.3 basic suite for this
>>> patch manually and everything was OK.
>>> On 3/22/19 1:05 PM, Dafna Ron wrote:
>>>
>>> Hi,
>>>
>>> We are failing branch 4.3 for test:
>>> 002_bootstrap.add_master_storage_domain
>>>
>>> It seems that on one of the hosts vdsm is not starting;
>>> there is nothing in vdsm.log or in supervdsm.log.
>>>
>>> CQ identified this patch as the suspected root cause:
>>>
>>>
>>> https://gerrit.ovirt.org/#/c/98748/ - vdsm: client: Add support for flow id
>>>
>>> Milan, Marcin, can you please have a look?
>>>
>>> full logs:
>>>
>>> http://jenkins.ovirt.org/job/ovirt-4.3_change-queue-tester/326/artifact/b...
>>>
>>> the only error I can see is about host not being up (makes sense as
>>> vdsm is not running)
>>>
>>> Stacktrace
>>>
>>>   File "/usr/lib64/python2.7/unittest/case.py", line 369, in run
>>>     testMethod()
>>>   File "/usr/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
>>>     self.test(*self.arg)
>>>   File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 142, in wrapped_test
>>>     test()
>>>   File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 60, in wrapper
>>>     return func(get_test_prefix(), *args, **kwargs)
>>>   File "/home/jenkins/workspace/ovirt-4.3_change-queue-tester/ovirt-system-tests/basic-suite-4.3/test-scenarios/002_bootstrap.py", line 417, in add_master_storage_domain
>>>     add_iscsi_storage_domain(prefix)
>>>   File "/home/jenkins/workspace/ovirt-4.3_change-queue-tester/ovirt-system-tests/basic-suite-4.3/test-scenarios/002_bootstrap.py", line 561, in add_iscsi_storage_domain
>>>     host=_random_host_from_dc(api, DC_NAME),
>>>   File "/home/jenkins/workspace/ovirt-4.3_change-queue-tester/ovirt-system-tests/basic-suite-4.3/test-scenarios/002_bootstrap.py", line 122, in _random_host_from_dc
>>>     return _hosts_in_dc(api, dc_name, True)
>>>   File "/home/jenkins/workspace/ovirt-4.3_change-queue-tester/ovirt-system-tests/basic-suite-4.3/test-scenarios/002_bootstrap.py", line 119, in _hosts_in_dc
>>>     raise RuntimeError('Could not find hosts that are up in DC %s' % dc_name)
>>> 'Could not find hosts that are up in DC test-dc\n--------------------
>> begin captured logging << --------------------\nlago.ssh: DEBUG: start
task:937bdea7-a2a3-47ad-9383-36647ea37ddf:Get ssh client for
lago-basic-suite-4-3-engine:\nlago.ssh: DEBUG: end
task:937bdea7-a2a3-47ad-9383-36647ea37ddf:Get ssh client for
lago-basic-suite-4-3-engine:\nlago.ssh: DEBUG: Running c07b5ee2 on
lago-basic-suite-4-3-engine: cat /root/multipath.txt\nlago.ssh: DEBUG: Command c07b5ee2 on
lago-basic-suite-4-3-engine returned with 0\nlago.ssh: DEBUG: Command c07b5ee2 on
lago-basic-suite-4-3-engine output:\n
3600140516f88cafa71243648ea218995\n360014053e28f60001764fed9978ec4b3\n360014059edc777770114a6484891dcf1\n36001405d93d8585a50d43a4ad0bd8d19\n36001405e31361631de14bcf87d43e55a\n\n-----------
>>>
>>> _______________________________________________
>>> Devel mailing list -- devel(a)ovirt.org
>>> To unsubscribe send an email to devel-leave(a)ovirt.org
>>> Privacy Statement:
https://www.ovirt.org/site/privacy-policy/
>>> oVirt Code of Conduct:
>>>
https://www.ovirt.org/community/about/community-guidelines/
>>> List Archives:
>>>
https://lists.ovirt.org/archives/list/devel@ovirt.org/message/J4NCHXTK5ZY...
>>>
>>
>
>
> --
>
> Eyal edri
>
>
> MANAGER
>
> RHV/CNV DevOps
>
> EMEA VIRTUALIZATION R&D
>
>
> Red Hat EMEA <
https://www.redhat.com/>
> <
https://red.ht/sig> TRIED. TESTED. TRUSTED.
<
https://redhat.com/trusted>
> phone: +972-9-7692018
> irc: eedri (on #tlv #rhev-dev #rhev-integ)
>
--
Eyal edri
MANAGER
RHV/CNV DevOps
EMEA VIRTUALIZATION R&D