On Mon, Nov 25, 2019 at 5:16 PM Nir Soffer <nsoffer(a)redhat.com> wrote:
On Mon, Nov 25, 2019 at 6:05 PM Dominik Holler
<dholler(a)redhat.com> wrote:
>
>
>
> On Mon, Nov 25, 2019 at 4:50 PM Nir Soffer <nsoffer(a)redhat.com> wrote:
>>
>> On Mon, Nov 25, 2019 at 11:00 AM Dominik Holler <dholler(a)redhat.com>
wrote:
>> >
>> >
>> >
>> > On Fri, Nov 22, 2019 at 8:57 PM Dominik Holler <dholler(a)redhat.com>
wrote:
>> >>
>> >>
>> >>
>> >> On Fri, Nov 22, 2019 at 5:54 PM Dominik Holler
<dholler(a)redhat.com>
wrote:
>> >>>
>> >>>
>> >>>
>> >>> On Fri, Nov 22, 2019 at 5:48 PM Nir Soffer
<nsoffer(a)redhat.com>
wrote:
>> >>>>
>> >>>>
>> >>>>
>> >>>> On Fri, Nov 22, 2019, 18:18 Marcin Sobczyk
<msobczyk(a)redhat.com>
wrote:
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> On 11/22/19 4:54 PM, Martin Perina wrote:
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> On Fri, Nov 22, 2019 at 4:43 PM Dominik Holler <
dholler(a)redhat.com> wrote:
>> >>>>>>
>> >>>>>>
>> >>>>>> On Fri, Nov 22, 2019 at 12:17 PM Dominik Holler <
dholler(a)redhat.com> wrote:
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> On Fri, Nov 22, 2019 at 12:00 PM Miguel Duarte de
Mora Barroso <
mdbarroso(a)redhat.com> wrote:
>> >>>>>>>>
>> >>>>>>>> On Fri, Nov 22, 2019 at 11:54 AM Vojtech
Juranek <
vjuranek(a)redhat.com> wrote:
>> >>>>>>>> >
>> >>>>>>>> > On Friday, 22 November 2019 9:56:56 CET
Miguel Duarte de
Mora Barroso wrote:
>> >>>>>>>> > > On Fri, Nov 22, 2019 at 9:49 AM
Vojtech Juranek <
vjuranek(a)redhat.com>
>> >>>>>>>> > > wrote:
>> >>>>>>>> > > >
>> >>>>>>>> > > >
>> >>>>>>>> > > > On Friday, 22 November 2019
9:41:26 CET Dominik Holler
wrote:
>> >>>>>>>> > > >
>> >>>>>>>> > > > > On Fri, Nov 22, 2019 at
8:40 AM Dominik Holler <
dholler(a)redhat.com>
>> >>>>>>>> > > > > wrote:
>> >>>>>>>> > > > >
>> >>>>>>>> > > > > > On Thu, Nov 21, 2019
at 10:54 PM Nir Soffer <
nsoffer(a)redhat.com>
>> >>>>>>>> > > > > > wrote:
>> >>>>>>>> > > > > >
>> >>>>>>>> > > > > >> On Thu, Nov 21,
2019 at 11:24 PM Vojtech Juranek
>> >>>>>>>> > > > > >>
<vjuranek(a)redhat.com>
>> >>>>>>>> > > > > >>
>> >>>>>>>> > > > > >>
>> >>>>>>>> > > > > >>
>> >>>>>>>> > > > > >> wrote:
>> >>>>>>>> > > > > >>
>> >>>>>>>> > > > > >> > Hi,
>> >>>>>>>> > > > > >> > OST fails (see e.g. [1]) in 002_bootstrap.check_update_host. It fails with
>> >>>>>>>> > > > > >> >
>> >>>>>>>> > > > > >> > FAILED! => {"changed": false, "failures": [], "msg": "Depsolve Error occured:
>> >>>>>>>> > > > > >> > \n Problem 1: cannot install the best update candidate for package
>> >>>>>>>> > > > > >> > vdsm-network-4.40.0-1236.git63ea8cb8b.el8.x86_64\n - nothing provides nmstate
>> >>>>>>>> > > > > >> > needed by vdsm-network-4.40.0-1271.git524e08c8a.el8.x86_64\n
>> >>>>>>>> > > > > >> > Problem 2: package vdsm-python-4.40.0-1271.git524e08c8a.el8.noarch requires
>> >>>>>>>> > > > > >> > vdsm-network = 4.40.0-1271.git524e08c8a.el8, but none of the providers can be installed\n
>> >>>>>>>> > > > > >> > - cannot install the best update candidate for package
>> >>>>>>>> > > > > >> > vdsm-python-4.40.0-1236.git63ea8cb8b.el8.noarch\n - nothing provides nmstate
>> >>>>>>>> > > > > >> > needed by vdsm-network-4.40.0-1271.git524e08c8a.el8.x86_64\n
>> >>>>>>>> > > > > >>
>> >>>>>>>> > > > > >>
>> >>>>>>>> > > > > >>
>> >>>>>>>> > > > > >> nmstate should be
provided by copr repo enabled by
>> >>>>>>>> > > > > >>
ovirt-release-master.
>> >>>>>>>> > > > > >
>> >>>>>>>> > > > > >
>> >>>>>>>> > > > > >
>> >>>>>>>> > > > > > I re-triggered as
>> >>>>>>>> > > > > >
https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6131
>> >>>>>>>> > > > > > maybe
>> >>>>>>>> > > > > >
https://gerrit.ovirt.org/#/c/104825/
>> >>>>>>>> > > > > > was missing
>> >>>>>>>> > > > >
>> >>>>>>>> > > > >
>> >>>>>>>> > > > >
>> >>>>>>>> > > > > Looks like
>> >>>>>>>> > > > >
https://gerrit.ovirt.org/#/c/104825/ is ignored by
OST.
>> >>>>>>>> > > >
>> >>>>>>>> > > >
>> >>>>>>>> > > >
>> >>>>>>>> > > > Maybe not. You re-triggered with [1], which really missed this patch.
>> >>>>>>>> > > > I did a rebase and am now running with this patch in build #6132 [2].
>> >>>>>>>> > > > Let's wait for it to see if gerrit #104825 helps.
>> >>>>>>>> > > >
>> >>>>>>>> > > >
>> >>>>>>>> > > >
>> >>>>>>>> > > > [1]
https://jenkins.ovirt.org/job/standard-manual-runner/909/
>> >>>>>>>> > > > [2]
https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6132/
>> >>>>>>>> > > >
>> >>>>>>>> > > >
>> >>>>>>>> > > >
>> >>>>>>>> > > > > Miguel, do you think
merging
>> >>>>>>>> > > > >
>> >>>>>>>> > > > >
>> >>>>>>>> > > > >
>> >>>>>>>> > > > >
https://gerrit.ovirt.org/#/c/104495/15/common/yum-repos/ovirt-master-host-cq.repo.in
>> >>>>>>>> > > > >
>> >>>>>>>> > > > >
>> >>>>>>>> > > > >
>> >>>>>>>> > > > > would solve this?
>> >>>>>>>> > >
>> >>>>>>>> > >
>> >>>>>>>> > > I've split the patch Dominik
mentions above in two, one of
them adding
>> >>>>>>>> > > the nmstate / networkmanager copr
repos - [3].
>> >>>>>>>> > >
>> >>>>>>>> > > Let's see if it fixes it.
>> >>>>>>>> >
>> >>>>>>>> > It fixes the original issue, but OST still
fails in
>> >>>>>>>> > 098_ovirt_provider_ovn.use_ovn_provider:
>> >>>>>>>> >
>> >>>>>>>> >
https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6134
>> >>>>>>>>
>> >>>>>>>> I think Dominik was looking into this issue;
+Dominik Holler
please confirm.
>> >>>>>>>>
>> >>>>>>>> Let me know if you need any help Dominik.
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> Thanks.
>> >>>>>>> The problem is that the hosts lost connection to
storage:
>> >>>>>>>
https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6134/artifact/exp...
:
>> >>>>>>>
>> >>>>>>> 2019-11-22 05:39:12,326-0500 DEBUG (jsonrpc/5)
[common.commands] /usr/bin/taskset --cpu-list 0-1 /usr/bin/sudo -n
/sbin/lvm vgs --config 'devices { preferred_names=["^/dev/mapper/"]
ignore_suspended_devices=1 write_cache_state=0
disable_after_error_count=3
filter=["a|^/dev/mapper/36001405107ea8b4e3ac4ddeb3e19890f$|^/dev/mapper/360014054924c91df75e41178e4b8a80c$|^/dev/mapper/3600140561c0d02829924b77ab7323f17$|^/dev/mapper/3600140582feebc04ca5409a99660dbbc$|^/dev/mapper/36001405c3c53755c13c474dada6be354$|",
"r|.*|"] } global { locking_type=1 prioritise_write_locks=1
wait_for_locks=1 use_lvmetad=0 } backup { retain_min=50 retain_days=0 }'
--noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o
uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name
(cwd None) (commands:153)
>> >>>>>>> 2019-11-22 05:39:12,415-0500 ERROR (check/loop)
[storage.Monitor] Error checking path
/rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/d10879c6-8de1-40ba-87fa-f447844eed2a/dom_md/metadata
(monitor:501)
>> >>>>>>> Traceback (most recent call last):
>> >>>>>>> File
"/usr/lib/python3.6/site-packages/vdsm/storage/monitor.py", line 499, in
_pathChecked
>> >>>>>>> delay = result.delay()
>> >>>>>>> File
"/usr/lib/python3.6/site-packages/vdsm/storage/check.py", line 391, in delay
>> >>>>>>> raise
exception.MiscFileReadException(self.path, self.rc,
self.err)
>> >>>>>>> vdsm.storage.exception.MiscFileReadException:
Internal file
read failure:
('/rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/d10879c6-8de1-40ba-87fa-f447844eed2a/dom_md/metadata',
1, 'Read timeout')
>> >>>>>>> 2019-11-22 05:39:12,416-0500 INFO (check/loop)
[storage.Monitor] Domain d10879c6-8de1-40ba-87fa-f447844eed2a became
INVALID (monitor:472)
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> I failed to reproduce this locally to analyze it. I will
try again;
any hints are welcome.
>> >>>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>
https://gerrit.ovirt.org/#/c/104925/1/ shows that
008_basic_ui_sanity.py triggers the problem.
>> >>>>>> Is there someone with knowledge about the
basic_ui_sanity around?
>> >>>>>
>> >>>>> How do you think it's related? By commenting out the ui
sanity
tests and seeing OST with successful finish?
>> >>>>>
>> >>>>> Looking at 6134 run you were discussing:
>> >>>>>
>> >>>>> - timing of the ui sanity set-up [1]:
>> >>>>>
>> >>>>> 11:40:20 @ Run test: 008_basic_ui_sanity.py:
>> >>>>>
>> >>>>> - timing of first encountered storage error [2]:
>> >>>>>
>> >>>>> 2019-11-22 05:39:12,415-0500 ERROR (check/loop)
[storage.Monitor]
Error checking path
/rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/d10879c6-8de1-40ba-87fa-f447844eed2a/dom_md/metadata
(monitor:501)
>> >>>>> Traceback (most recent call last):
>> >>>>> File
"/usr/lib/python3.6/site-packages/vdsm/storage/monitor.py", line 499, in
_pathChecked
>> >>>>> delay = result.delay()
>> >>>>> File
"/usr/lib/python3.6/site-packages/vdsm/storage/check.py",
line 391, in delay
>> >>>>> raise exception.MiscFileReadException(self.path,
self.rc,
self.err)
>> >>>>> vdsm.storage.exception.MiscFileReadException: Internal file
read
failure:
('/rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/d10879c6-8de1-40ba-87fa-f447844eed2a/dom_md/metadata',
1, 'Read timeout')
>> >>>>>
>> >>>>> Timezone difference aside, it seems to me that these
storage
errors occurred before doing anything ui-related.
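
To take the timezone difference out of the comparison above, both timestamps
can be normalized to UTC. The following is only a minimal sketch and not part
of OST: the helper names are hypothetical, and both the date and the UTC+1
offset used for the Jenkins console line are assumptions, because the console
output quoted above carries neither.

```python
from datetime import datetime, timedelta, timezone


def parse_vdsm_timestamp(text):
    """Parse a vdsm log timestamp, e.g. '2019-11-22 05:39:12,415-0500'."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S,%f%z")


def parse_console_time(date_text, time_text, utc_offset_hours):
    """Attach an explicit UTC offset to a console time that has none."""
    naive = datetime.strptime(f"{date_text} {time_text}", "%Y-%m-%d %H:%M:%S")
    return naive.replace(tzinfo=timezone(timedelta(hours=utc_offset_hours)))


# Timestamp of the first storage error, copied from the vdsm log quoted above.
storage_error = parse_vdsm_timestamp("2019-11-22 05:39:12,415-0500")

# ASSUMPTION: the console line "11:40:20 @ Run test: 008_basic_ui_sanity.py"
# was printed on 2019-11-22 in a UTC+1 timezone. Neither is stated in the
# console output, so verify both before relying on the result.
ui_test_start = parse_console_time("2019-11-22", "11:40:20", utc_offset_hours=1)

gap = ui_test_start - storage_error
print("storage error (UTC):", storage_error.astimezone(timezone.utc))
print("ui test start (UTC):", ui_test_start.astimezone(timezone.utc))
print("error precedes ui test:", gap > timedelta(0))
```

Under the assumed offset the storage error precedes the start of the UI test
by roughly a minute. A different console offset shifts the gap by whole hours
and can reverse the order, so the offset is the part worth checking first.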
>> >>
>> >>
>> >>
>> >> You are right, a time.sleep(8*60) in
>> >>
https://gerrit.ovirt.org/#/c/104925/2
>> >> triggers the issue the same way.
>>
>> So this is a test issue, assuming that the UI tests can complete in
>> less than 8 minutes?
>>
>
> To my eyes this looks like storage just stops working after some time.
>
>>
>> >>
>> >
>> > Nir or Steve, can you please confirm that this is a storage problem?
>>
>> Why do you think we have a storage problem?
>>
>
> I understand from the posted log snippets that they say that the storage
is not accessible anymore,
No, so far only one read timeout was reported; this does not mean storage
is not available anymore.
It can be a temporary issue that does not harm anything.
> while the host is still responsive.
> This might be triggered by something outside storage, e.g. the network
providing the storage stopped working,
> But I think a possible next step in analysing this issue would be to
find the reason why storage is not happy.
Sounds like there was a miscommunication in this thread.
I will try to address all of your points; please let me know if something is
missing or not clearly expressed.
First step is to understand which test fails,
098_ovirt_provider_ovn.use_ovn_provider
and why. This can be done by the owner of the test,
The test was added by the network team.
understanding what the test does
The test tries to add a vNIC.
and what is the expected system behavior.
It is expected that adding a vNIC works, because the VM should be up.
If the owner of the test thinks that the test failed because of a
storage
issue
I am not sure who is the owner, but I do.
someone from storage can look at this.
Thanks, I would appreciate this.
But the fact that adding a long sleep reproduces the issue means it is
not
related
in any way to storage.
Nir
>
>>
>> >
>> >>
>> >>
>> >>>>>
>> >>>>> I remember talking with Steven Rosenberg on IRC a couple of
days
ago about some storage metadata issues and he said he got a response from
Nir, that "it's a known issue".
>> >>>>>
>> >>>>> Nir, Amit, can you comment on this?
>> >>>>
>> >>>>
>> >>>> The error mentioned here is not a vdsm error but a warning about
storage accessibility. We should convert the tracebacks to warnings.
>> >>>>
>> >>>> The reason for such an issue can be a misconfigured network (maybe
the network team is testing negative flows?),
>> >>>
>> >>>
>> >>> No.
>> >>>
>> >>>>
>> >>>> or some issue in the NFS server.
>> >>>>
>> >>>
>> >>> Only hint I found is
>> >>> "Exiting Time2Retain handler because
session_reinstatement=1"
>> >>> but I have no idea what this means or if this is relevant at all.
>> >>>
>> >>>>
>> >>>> One read timeout is not an issue. We have a real issue only if we
have consistent read timeouts or errors for a couple of minutes; after that
the engine can deactivate the storage domain, or some hosts if only these hosts
have trouble accessing storage.
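
The rule of thumb above (a single read timeout is harmless, errors that persist
for a couple of minutes are not) can be checked against a vdsm log with a small
script. This is a rough sketch under stated assumptions: the function name and
the two-minute threshold are illustrative, the regular expression is modelled
only on the log lines quoted in this thread, and it measures the span from the
first to the last error per path without verifying that every check in between
failed as well.

```python
import re
from datetime import datetime, timedelta

# Matches lines such as:
# 2019-11-22 05:39:12,415-0500 ERROR (check/loop) [storage.Monitor]
#   Error checking path /rhev/... (monitor:501)
# (a single line in the real log; wrapped here for readability).
ERROR_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}[+-]\d{4}) ERROR "
    r".*\[storage\.Monitor\] Error checking path (?P<path>\S+)"
)


def sustained_failures(lines, min_duration=timedelta(minutes=2)):
    """Return {path: (first_error, last_error)} for long-lasting failures.

    ASSUMPTION: min_duration stands in for "a couple of minutes"; it is an
    illustrative default, not a value taken from vdsm or engine.
    """
    spans = {}
    for line in lines:
        match = ERROR_RE.match(line)
        if match is None:
            continue
        stamp = datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S,%f%z")
        first, _ = spans.get(match["path"], (stamp, stamp))
        spans[match["path"]] = (first, stamp)
    return {
        path: span
        for path, span in spans.items()
        if span[1] - span[0] >= min_duration
    }


if __name__ == "__main__":
    import sys

    # Usage: python check_timeouts.py < vdsm.log
    for path, (first, last) in sustained_failures(sys.stdin).items():
        print(f"{path}: errors from {first} to {last}")
```

A log containing only the single error quoted in this thread yields an empty
result, which matches the reading that one timeout alone is not a real issue.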
>> >>>>
>> >>>> In OST we never expect such conditions since we don't test
negative flows, and we should have good connectivity with the vms running
on the same host.
>> >>>>
>> >>>
>> >>> Ack, this seems to be the problem.
>> >>>
>> >>>>
>> >>>> Nir
>> >>>>
>> >>>>
>> >>>>> [1]
https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6134/console
>> >>>>> [2]
https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6134/artifact/exp...
>> >>>>>>
>> >>>>>>
>> >>>>>
>> >>>>> Marcin, could you please take a look?
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>>>
>> >>>>>>>> >
>> >>>>>>>> > > [3] -
https://gerrit.ovirt.org/#/c/104897/
>> >>>>>>>> > >
>> >>>>>>>> > >
>> >>>>>>>> > > > >
>> >>>>>>>> > > > >
>> >>>>>>>> > > > > >> Who installs this
rpm in OST?
>> >>>>>>>> > > > > >
>> >>>>>>>> > > > > >
>> >>>>>>>> > > > > >
>> >>>>>>>> > > > > > I do not understand
the question.
>> >>>>>>>> > > > > >
>> >>>>>>>> > > > > >
>> >>>>>>>> > > > > >
>> >>>>>>>> > > > > >> > [...]
>> >>>>>>>> > > > > >> >
>> >>>>>>>> > > > > >> >
>> >>>>>>>> > > > > >> >
>> >>>>>>>> > > > > >> > See [2] for
full error.
>> >>>>>>>> > > > > >> >
>> >>>>>>>> > > > > >> >
>> >>>>>>>> > > > > >> >
>> >>>>>>>> > > > > >> > Can someone
please take a look?
>> >>>>>>>> > > > > >> > Thanks
>> >>>>>>>> > > > > >> > Vojta
>> >>>>>>>> > > > > >> >
>> >>>>>>>> > > > > >> >
>> >>>>>>>> > > > > >> >
>> >>>>>>>> > > > > >> > [1]
https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6128/
>> >>>>>>>> > > > > >> > [2]
https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6128/artifact/exported-artifacts/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-engine/_var_log/ovirt-engine/engine.log
>> >>>>>>>> > > > > >> > _______________________________________________
>> >>>>>>>> > > > > >> > Devel mailing list -- devel(a)ovirt.org
>> >>>>>>>> > > > > >> > To unsubscribe send an email to devel-leave(a)ovirt.org
>> >>>>>>>> > > > > >> > Privacy Statement: https://www.ovirt.org/site/privacy-policy/
>> >>>>>>>> > > > > >> > oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
>> >>>>>>>> > > > > >> > List Archives: https://lists.ovirt.org/archives/list/devel@ovirt.org/message/4K5N3VQN26BL73K7D45A2IR7R3UMMM23/
>> >>>>>>>> > > > > >>
>> >>>>>>>> > > >
>> >>>>>>>> > > >
>> >>>>>>>> > >
>> >>>>>>>> > >
>> >>>>>>>> >
>> >>>>>>>>
>> >>>>>
>> >>>>>
>> >>>>> --
>> >>>>> Martin Perina
>> >>>>> Manager, Software Engineering
>> >>>>> Red Hat Czech s.r.o.
>> >>>>>
>> >>>>>
>>