On 11/22/19 4:54 PM, Martin Perina wrote:
On Fri, Nov 22, 2019 at 11:54 AM Vojtech Juranek <vjuranek@redhat.com> wrote:
>
> On Friday, 22 November 2019, 9:56:56 CET Miguel Duarte de Mora Barroso wrote:
> > On Fri, Nov 22, 2019 at 9:49 AM Vojtech Juranek <vjuranek@redhat.com> wrote:
> > >
> > > On Friday, 22 November 2019, 9:41:26 CET Dominik Holler wrote:
> > > >
> > > > On Fri, Nov 22, 2019 at 8:40 AM Dominik Holler <dholler@redhat.com> wrote:
> > > > >
> > > > > On Thu, Nov 21, 2019 at 10:54 PM Nir Soffer <nsoffer@redhat.com> wrote:
> > > > >
> > > > >> On Thu, Nov 21, 2019 at 11:24 PM Vojtech Juranek <vjuranek@redhat.com> wrote:
> > > > >>
> > > > >> > Hi,
> > > > >> > OST fails (see e.g. [1]) in 002_bootstrap.check_update_host. It fails with
> > > > >> >
> > > > >> > FAILED! => {"changed": false, "failures": [], "msg": "Depsolve Error occured:
> > > > >> > \n Problem 1: cannot install the best update candidate for package
> > > > >> > vdsm-network-4.40.0-1236.git63ea8cb8b.el8.x86_64\n - nothing provides nmstate
> > > > >> > needed by vdsm-network-4.40.0-1271.git524e08c8a.el8.x86_64\n Problem 2:
> > > > >> > package vdsm-python-4.40.0-1271.git524e08c8a.el8.noarch requires
> > > > >> > vdsm-network = 4.40.0-1271.git524e08c8a.el8, but none of the providers can be
> > > > >> > installed\n - cannot install the best update candidate for package
> > > > >> > vdsm-python-4.40.0-1236.git63ea8cb8b.el8.noarch\n - nothing provides nmstate
> > > > >> > needed by vdsm-network-4.40.0-1271.git524e08c8a.el8.x86_64\n
> > > > >>
> > > > >> nmstate should be provided by the copr repo enabled by ovirt-release-master.
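
Side note: a quick way to confirm on a host whether nmstate is resolvable from
the enabled repos (a minimal sketch, assuming plain dnf on an el8 host):

  dnf repoquery --whatprovides nmstate   # empty output means no enabled repo provides it
  dnf install --assumeno vdsm-network    # dry-run resolve; hits the same depsolve error if nmstate is missing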
> > > > >
> > > > > I re-triggered as
> > > > > https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6131
> > > > > maybe https://gerrit.ovirt.org/#/c/104825/ was missing
> > > >
> > > > Looks like https://gerrit.ovirt.org/#/c/104825/ is ignored by OST.
> > >
> > > maybe not. You re-triggered with [1], which really missed this patch.
> > > I did a rebase and am now running with this patch in build #6132 [2].
> > > Let's wait for it to see if gerrit #104825 helps.
> > >
> > > [1] https://jenkins.ovirt.org/job/standard-manual-runner/909/
> > > [2] https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6132/
> > > >
> > > > Miguel, do you think merging
> > > > https://gerrit.ovirt.org/#/c/104495/15/common/yum-repos/ovirt-master-host-cq.repo.in
> > > > would solve this?
> >
> > I've split the patch Dominik mentions above in two, one of them adding
> > the nmstate / networkmanager copr repos - [3].
> >
> > Let's see if it fixes it.
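
For context, what that patch adds is ordinary copr .repo sections in the OST
yum-repos files, roughly of this shape (a sketch only; OWNER/PROJECT are
placeholders, the real entries are in the patch):

  [copr-nmstate]
  name=Copr repo for nmstate
  baseurl=https://copr-be.cloud.fedoraproject.org/results/OWNER/PROJECT/epel-8-$basearch/
  enabled=1
  gpgcheck=0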
>
> it fixes the original issue, but OST still fails in
> 098_ovirt_provider_ovn.use_ovn_provider:
> https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6134
I think Dominik was looking into this issue; +Dominik Holler please confirm.
Let me know if you need any help, Dominik.
Thanks.
The problem is that the hosts lost connection
to storage:
2019-11-22 05:39:12,326-0500 DEBUG (jsonrpc/5) [common.commands] /usr/bin/taskset --cpu-list 0-1 /usr/bin/sudo -n /sbin/lvm vgs --config 'devices { preferred_names=["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter=["a|^/dev/mapper/36001405107ea8b4e3ac4ddeb3e19890f$|^/dev/mapper/360014054924c91df75e41178e4b8a80c$|^/dev/mapper/3600140561c0d02829924b77ab7323f17$|^/dev/mapper/3600140582feebc04ca5409a99660dbbc$|^/dev/mapper/36001405c3c53755c13c474dada6be354$|", "r|.*|"] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min=50 retain_days=0 }' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name (cwd None) (commands:153)
2019-11-22 05:39:12,415-0500 ERROR (check/loop) [storage.Monitor] Error checking path /rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/d10879c6-8de1-40ba-87fa-f447844eed2a/dom_md/metadata (monitor:501)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/monitor.py", line 499, in _pathChecked
    delay = result.delay()
  File "/usr/lib/python3.6/site-packages/vdsm/storage/check.py", line 391, in delay
    raise exception.MiscFileReadException(self.path, self.rc, self.err)
vdsm.storage.exception.MiscFileReadException: Internal file read failure: ('/rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/d10879c6-8de1-40ba-87fa-f447844eed2a/dom_md/metadata', 1, 'Read timeout')
2019-11-22 05:39:12,416-0500 INFO (check/loop) [storage.Monitor] Domain d10879c6-8de1-40ba-87fa-f447844eed2a became INVALID (monitor:472)
I failed to reproduce this locally to analyze it; I will try again, any hints welcome.
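
If it helps while reproducing: the monitor failure above is a timed-out read of
the domain metadata file, so reading it directly from a host should show whether
the NFS mount itself stalls. A minimal manual sketch (the 10s timeout is an
assumption, vdsm uses its own value):

  timeout 10 dd if=/rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/d10879c6-8de1-40ba-87fa-f447844eed2a/dom_md/metadata of=/dev/null bs=4096 count=1 iflag=direct
  echo $?   # 124 means the read timed out, matching the 'Read timeout' in the exception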
Is anyone with knowledge of the basic_ui_sanity tests around?
How do you think it's related? By commenting out the ui sanity tests
and seeing OST finish successfully?
Looking at the 6134 run you were discussing:
- timing of the ui sanity set-up [1]:
  11:40:20 @ Run test: 008_basic_ui_sanity.py:
- timing of the first encountered storage error [2]:
  2019-11-22 05:39:12,415-0500 ERROR (check/loop) [storage.Monitor] Error checking path /rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/d10879c6-8de1-40ba-87fa-f447844eed2a/dom_md/metadata (monitor:501)
  Traceback (most recent call last):
    File "/usr/lib/python3.6/site-packages/vdsm/storage/monitor.py", line 499, in _pathChecked
      delay = result.delay()
    File "/usr/lib/python3.6/site-packages/vdsm/storage/check.py", line 391, in delay
      raise exception.MiscFileReadException(self.path, self.rc, self.err)
  vdsm.storage.exception.MiscFileReadException: Internal file read failure: ('/rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/d10879c6-8de1-40ba-87fa-f447844eed2a/dom_md/metadata', 1, 'Read timeout')
Timezone difference aside, it seems to me that these storage errors
occurred before anything ui-related was done: assuming the Jenkins
timestamp is CET, 11:40:20 is 10:40:20 UTC, while 05:39:12-0500 is
10:39:12 UTC, so the first storage error precedes the ui sanity set-up
by about a minute.
I remember talking with Steven Rosenberg on IRC a couple of days ago
about some storage metadata issues, and he said he got a response from
Nir that "it's a known issue".
Nir, Amit, can you comment on this?