On 11/22/19 4:54 PM, Martin Perina wrote:
On Fri, Nov 22, 2019 at 4:43 PM Dominik Holler <dholler(a)redhat.com> wrote:
>
> On Fri, Nov 22, 2019 at 12:17 PM Dominik Holler <dholler(a)redhat.com> wrote:
>
>>
>> On Fri, Nov 22, 2019 at 12:00 PM Miguel Duarte de Mora Barroso <mdbarroso(a)redhat.com> wrote:
>>
>>> On Fri, Nov 22, 2019 at 11:54 AM Vojtech Juranek <vjuranek(a)redhat.com> wrote:
>>> >
>>> > On Friday, 22 November 2019, 9:56:56 CET Miguel Duarte de Mora Barroso wrote:
>>> > > On Fri, Nov 22, 2019 at 9:49 AM Vojtech Juranek <vjuranek(a)redhat.com> wrote:
>>> > > >
>>> > > > On Friday, 22 November 2019, 9:41:26 CET Dominik Holler wrote:
>>> > > > > On Fri, Nov 22, 2019 at 8:40 AM Dominik Holler <dholler(a)redhat.com> wrote:
>>> > > > >
>>> > > > > > On Thu, Nov 21, 2019 at 10:54 PM Nir Soffer <nsoffer(a)redhat.com> wrote:
>>> > > > > >
>>> > > > > >> On Thu, Nov 21, 2019 at 11:24 PM Vojtech Juranek <vjuranek(a)redhat.com> wrote:
>>> > > > > >>
>>> > > > > >> > Hi,
>>> > > > > >> > OST fails (see e.g. [1]) in 002_bootstrap.check_update_host. It fails with
>>> > > > > >> >
>>> > > > > >> > FAILED! => {"changed": false, "failures": [], "msg": "Depsolve Error occured:
>>> > > > > >> > \n Problem 1: cannot install the best update candidate for package
>>> > > > > >> > vdsm-network-4.40.0-1236.git63ea8cb8b.el8.x86_64\n - nothing provides nmstate
>>> > > > > >> > needed by vdsm-network-4.40.0-1271.git524e08c8a.el8.x86_64\n Problem 2:
>>> > > > > >> > package vdsm-python-4.40.0-1271.git524e08c8a.el8.noarch requires vdsm-network
>>> > > > > >> > = 4.40.0-1271.git524e08c8a.el8, but none of the providers can be installed\n
>>> > > > > >> > - cannot install the best update candidate for package
>>> > > > > >> > vdsm-python-4.40.0-1236.git63ea8cb8b.el8.noarch\n - nothing provides nmstate
>>> > > > > >> > needed by vdsm-network-4.40.0-1271.git524e08c8a.el8.x86_64\n
>>> > > > > >>
>>> > > > > >> nmstate should be provided by the copr repo enabled by ovirt-release-master.
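A quick way to check whether any enabled repo actually provides nmstate on the
affected host (a minimal sketch, assuming dnf is available there):

    # Ask dnf which enabled repo, if any, provides "nmstate".
    import subprocess

    result = subprocess.run(
        ["dnf", "repoquery", "--whatprovides", "nmstate"],
        stdout=subprocess.PIPE, stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if result.stdout.strip():
        print("nmstate is provided by:\n" + result.stdout)
    else:
        # Matches the depsolve failure above: no enabled repo provides
        # nmstate, suggesting the copr repo is missing or disabled.
        print("no enabled repo provides nmstate")

If the second branch prints, the copr repo from ovirt-release-master is not
enabled on the host.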
>>> > > > > >
>>> > > > > > I re-triggered as
>>> > > > > > https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6131
>>> > > > > > maybe https://gerrit.ovirt.org/#/c/104825/ was missing
>>> > > > >
>>> > > > > Looks like https://gerrit.ovirt.org/#/c/104825/ is ignored by OST.
>>> > > >
>>> > > > maybe not. You re-triggered with [1], which indeed did not include this patch.
>>> > > > I did a rebase and am now running with this patch in build #6132 [2]. Let's
>>> > > > wait for it to see if gerrit #104825 helps.
>>> > > >
>>> > > > [1] https://jenkins.ovirt.org/job/standard-manual-runner/909/
>>> > > > [2] https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6132/
>>> > > > >
>>> > > > > Miguel, do you think merging
>>> > > > > https://gerrit.ovirt.org/#/c/104495/15/common/yum-repos/ovirt-master-host-cq.repo.in
>>> > > > > would solve this?
>>> > >
>>> > > I've split the patch Dominik mentions above in two, one of them adding
>>> > > the nmstate / networkmanager copr repos - [3].
>>> > >
>>> > > Let's see if it fixes it.
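For reference, a copr repo entry in such a yum-repos file generally has the
shape below. This is only an illustrative sketch: the section name,
<owner>/<project> and the gpgcheck setting are placeholders, not the actual
values from the patch.

    # Illustrative only: write a copr-style .repo entry.
    import textwrap

    repo_entry = textwrap.dedent("""\
        [copr-nmstate]
        name=Copr repo providing nmstate (placeholder)
        baseurl=https://download.copr.fedorainfracloud.org/results/<owner>/<project>/epel-8-$basearch/
        enabled=1
        gpgcheck=0
        """)

    # Writing it under /etc/yum.repos.d/ (as root) would enable the repo:
    with open("/etc/yum.repos.d/copr-nmstate.repo", "w") as f:
        f.write(repo_entry)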
>>> >
>>> > it fixes the original issue, but OST still fails in
>>> > 098_ovirt_provider_ovn.use_ovn_provider:
>>> > https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6134
>>>
>>> I think Dominik was looking into this issue; +Dominik Holler please confirm.
>>>
>>> Let me know if you need any help, Dominik.
>>>
>>
>> Thanks.
>> The problem is that the hosts lost connection to storage:
>> https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6134/artifact/exp... :
>>
>> 2019-11-22 05:39:12,326-0500 DEBUG (jsonrpc/5) [common.commands] /usr/bin/taskset --cpu-list 0-1 /usr/bin/sudo -n /sbin/lvm vgs --config 'devices { preferred_names=["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter=["a|^/dev/mapper/36001405107ea8b4e3ac4ddeb3e19890f$|^/dev/mapper/360014054924c91df75e41178e4b8a80c$|^/dev/mapper/3600140561c0d02829924b77ab7323f17$|^/dev/mapper/3600140582feebc04ca5409a99660dbbc$|^/dev/mapper/36001405c3c53755c13c474dada6be354$|", "r|.*|"] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min=50 retain_days=0 }' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name (cwd None) (commands:153)
>> 2019-11-22 05:39:12,415-0500 ERROR (check/loop) [storage.Monitor] Error checking path /rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/d10879c6-8de1-40ba-87fa-f447844eed2a/dom_md/metadata (monitor:501)
>> Traceback (most recent call last):
>>   File "/usr/lib/python3.6/site-packages/vdsm/storage/monitor.py", line 499, in _pathChecked
>>     delay = result.delay()
>>   File "/usr/lib/python3.6/site-packages/vdsm/storage/check.py", line 391, in delay
>>     raise exception.MiscFileReadException(self.path, self.rc, self.err)
>> vdsm.storage.exception.MiscFileReadException: Internal file read failure: ('/rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/d10879c6-8de1-40ba-87fa-f447844eed2a/dom_md/metadata', 1, 'Read timeout')
>> 2019-11-22 05:39:12,416-0500 INFO (check/loop) [storage.Monitor] Domain d10879c6-8de1-40ba-87fa-f447844eed2a became INVALID (monitor:472)
>>
>> I failed to reproduce this locally to analyze it; I will try again. Any hints are welcome.
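In case it helps with reproducing: the probe below mimics the kind of timed
read the monitor performs on the metadata file. A minimal sketch only; the
dd-based probe and the 10s timeout are my assumptions, only the path comes
from the log above.

    # Timed read of the domain metadata file; iflag=direct bypasses the
    # page cache, so a hung NFS mount shows up as a timeout rather than a
    # cached read succeeding.
    import subprocess

    PATH = ("/rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/"
            "d10879c6-8de1-40ba-87fa-f447844eed2a/dom_md/metadata")

    try:
        subprocess.run(
            ["dd", "if=" + PATH, "of=/dev/null", "bs=4096", "count=1",
             "iflag=direct"],
            check=True, timeout=10,
            stdout=subprocess.PIPE, stderr=subprocess.PIPE,
        )
        print("read ok")
    except subprocess.TimeoutExpired:
        print("read timeout - storage path is unresponsive")
    except subprocess.CalledProcessError as e:
        print("read failed:", e.stderr.decode())

Running this in a loop on the host while 008_basic_ui_sanity.py executes
could show whether the mount really stalls at that point.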
>>
>
> https://gerrit.ovirt.org/#/c/104925/1/ shows that 008_basic_ui_sanity.py
> triggers the problem.
> Is there someone around with knowledge of the basic_ui_sanity suite?
>
How do you think it's related? By commenting out the UI sanity tests and
seeing OST finish successfully?
Looking at the 6134 run you were discussing:
- timing of the UI sanity set-up [1]:
11:40:20 @ Run test: 008_basic_ui_sanity.py:
- timing of the first encountered storage error [2]:
2019-11-22 05:39:12,415-0500 ERROR (check/loop) [storage.Monitor] Error checking path /rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/d10879c6-8de1-40ba-87fa-f447844eed2a/dom_md/metadata (monitor:501)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/monitor.py", line 499, in _pathChecked
    delay = result.delay()
  File "/usr/lib/python3.6/site-packages/vdsm/storage/check.py", line 391, in delay
    raise exception.MiscFileReadException(self.path, self.rc, self.err)
vdsm.storage.exception.MiscFileReadException: Internal file read failure: ('/rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/d10879c6-8de1-40ba-87fa-f447844eed2a/dom_md/metadata', 1, 'Read timeout')
Timezone difference aside, it seems to me that these storage errors
occurred before anything UI-related was done.
I remember talking with Steven Rosenberg on IRC a couple of days ago about
some storage metadata issues, and he said he got a response from Nir that
"it's a known issue".
Nir, Amit, can you comment on this?
The error mentioned here is not a vdsm error but a warning about storage
accessibility. We should convert the tracebacks to warnings.
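Something along these lines (a minimal sketch of the suggested change, not
the actual vdsm code; the function and logger names are assumed):

    import logging

    log = logging.getLogger("storage.Monitor")

    def on_path_checked(result, path):
        # Log a failed probe as a warning without a traceback, since a
        # single failed read does not yet mean the domain is broken.
        try:
            return result.delay()
        except Exception as e:
            log.warning("Error checking path %s: %s", path, e)
            return None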
The reason for such an issue can be a misconfigured network (maybe the
network team is testing negative flows?) or some issue in the NFS server.
One read timeout is not an issue. We have a real issue only if there are
consistent read timeouts or errors for a couple of minutes; after that, the
engine can deactivate the storage domain, or some hosts if only those hosts
are having trouble accessing storage.
In OST we never expect such conditions, since we don't test negative flows
and we should have good connectivity, with the VMs running on the same host.
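To illustrate the point about consistent failures (a sketch only; the real
intervals and thresholds in vdsm and engine are configurable, and the
numbers here are assumptions):

    # Escalate only after several consecutive failed probes.
    FAILURE_THRESHOLD = 12  # e.g. 12 probes 10s apart ~ two minutes

    class DomainState:
        def __init__(self):
            self.consecutive_failures = 0
            self.valid = True

        def on_probe(self, ok):
            if ok:
                # One success resets the counter; the timeout was transient.
                self.consecutive_failures = 0
                self.valid = True
            else:
                self.consecutive_failures += 1
                if self.consecutive_failures >= FAILURE_THRESHOLD:
                    # Only now may the engine deactivate the domain (or just
                    # the hosts that lost access to storage).
                    self.valid = False
            return self.valid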
Nir
[1]
[2]
https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6134/artifact/exp...
>
Marcin, could you please take a look?
>
>> >
>>> > > [3] - https://gerrit.ovirt.org/#/c/104897/
>>> > >
>>> > > > >
>>> > > > > >> Who installs this rpm in OST?
>>> > > > > >
>>> > > > > > I do not understand the question.
>>> > > > > >
>>> > > > > >> > [...]
>>> > > > > >> >
>>> > > > > >> > See [2] for full error.
>>> > > > > >> >
>>> > > > > >> > Can someone please take a look?
>>> > > > > >> > Thanks
>>> > > > > >> > Vojta
>>> > > > > >> >
>>> > > > > >> > [1] https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6128/
>>> > > > > >> > [2] https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6128/artifact/exported-artifacts/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-engine/_var_log/ovirt-engine/engine.log
--
Martin Perina
Manager, Software Engineering
Red Hat Czech s.r.o.