On Fri, Nov 22, 2019 at 8:57 PM Dominik Holler <dholler(a)redhat.com> wrote:
On Fri, Nov 22, 2019 at 5:54 PM Dominik Holler <dholler(a)redhat.com> wrote:
>
>
> On Fri, Nov 22, 2019 at 5:48 PM Nir Soffer <nsoffer(a)redhat.com> wrote:
>
>>
>>
>> On Fri, Nov 22, 2019, 18:18 Marcin Sobczyk <msobczyk(a)redhat.com> wrote:
>>
>>>
>>>
>>> On 11/22/19 4:54 PM, Martin Perina wrote:
>>>
>>>
>>>
>>> On Fri, Nov 22, 2019 at 4:43 PM Dominik Holler <dholler(a)redhat.com>
>>> wrote:
>>>
>>>>
>>>> On Fri, Nov 22, 2019 at 12:17 PM Dominik Holler <dholler(a)redhat.com> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Fri, Nov 22, 2019 at 12:00 PM Miguel Duarte de Mora Barroso <
>>>>> mdbarroso(a)redhat.com> wrote:
>>>>>
>>>>>> On Fri, Nov 22, 2019 at 11:54 AM Vojtech Juranek <
>>>>>> vjuranek(a)redhat.com> wrote:
>>>>>> >
>>>>>> > On Friday, November 22, 2019 9:56:56 CET Miguel Duarte de Mora Barroso wrote:
>>>>>> > > On Fri, Nov 22, 2019 at 9:49 AM Vojtech Juranek <vjuranek(a)redhat.com> wrote:
>>>>>> > > >
>>>>>> > > >
>>>>>> > > > On Friday, November 22, 2019 9:41:26 CET Dominik Holler wrote:
>>>>>> > > >
>>>>>> > > > > On Fri, Nov 22, 2019 at 8:40 AM Dominik Holler <dholler(a)redhat.com> wrote:
>>>>>> > > > >
>>>>>> > > > > > On Thu, Nov 21, 2019 at 10:54 PM Nir Soffer <nsoffer(a)redhat.com> wrote:
>>>>>> > > > > >
>>>>>> > > > > >> On Thu, Nov 21, 2019 at 11:24 PM Vojtech Juranek <vjuranek(a)redhat.com> wrote:
>>>>>> > > > > >>
>>>>>> > > > > >> > Hi,
>>>>>> > > > > >> > OST fails (see e.g. [1]) in 002_bootstrap.check_update_host. It fails with
>>>>>> > > > > >> >
>>>>>> > > > > >> > FAILED! => {"changed": false, "failures": [], "msg": "Depsolve Error occured:
>>>>>> > > > > >> > \n Problem 1: cannot install the best update candidate for package
>>>>>> > > > > >> > vdsm-network-4.40.0-1236.git63ea8cb8b.el8.x86_64\n - nothing provides nmstate
>>>>>> > > > > >> > needed by vdsm-network-4.40.0-1271.git524e08c8a.el8.x86_64\n Problem 2: package
>>>>>> > > > > >> > vdsm-python-4.40.0-1271.git524e08c8a.el8.noarch requires vdsm-network =
>>>>>> > > > > >> > 4.40.0-1271.git524e08c8a.el8, but none of the providers can be installed\n
>>>>>> > > > > >> > - cannot install the best update candidate for package
>>>>>> > > > > >> > vdsm-python-4.40.0-1236.git63ea8cb8b.el8.noarch\n - nothing provides nmstate
>>>>>> > > > > >> > needed by vdsm-network-4.40.0-1271.git524e08c8a.el8.x86_64\n
>>>>>> > > > > >>
>>>>>> > > > > >> nmstate should be provided by copr repo enabled by ovirt-release-master.
>>>>>> > > > > >
>>>>>> > > > > > I re-triggered as https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6131
>>>>>> > > > > > maybe https://gerrit.ovirt.org/#/c/104825/ was missing
>>>>>> > > > >
>>>>>> > > > > Looks like https://gerrit.ovirt.org/#/c/104825/ is ignored by OST.
>>>>>> > > >
>>>>>> > > >
>>>>>> > > >
>>>>>> > > > maybe not. You re-triggered with [1], which really missed this
>>>>>> > > > patch. I did a rebase and am now running with this patch in
>>>>>> > > > build #6132 [2]. Let's wait for it to see if gerrit #104825 helps.
>>>>>> > > >
>>>>>> > > > [1] https://jenkins.ovirt.org/job/standard-manual-runner/909/
>>>>>> > > > [2] https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6132/
>>>>>> > > >
>>>>>> > > >
>>>>>> > > >
>>>>>> > > > > Miguel, do you think merging
>>>>>> > > > > https://gerrit.ovirt.org/#/c/104495/15/common/yum-repos/ovirt-master-host-cq.repo.in
>>>>>> > > > > would solve this?
>>>>>> > >
>>>>>> > >
>>>>>> > > I've split the patch Dominik mentions above in two, one of them
>>>>>> > > adding the nmstate / networkmanager copr repos - [3].
>>>>>> > >
>>>>>> > > Let's see if it fixes it.
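For context, "adding the copr repos" boils down to an extra dnf .repo stanza in
the host repo config. A rough sketch of what such a stanza looks like; the
section name, copr project and baseurl below are placeholders, not the exact
values from the patch:

[copr:copr.fedorainfracloud.org:nmstate:nmstate-stable]
name=Copr repo for nmstate (placeholder project)
baseurl=https://download.copr.fedorainfracloud.org/results/nmstate/nmstate-stable/epel-8-$basearch/
enabled=1
gpgcheck=0

The NetworkManager copr repo would follow the same pattern.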
>>>>>> >
>>>>>> > it fixes the original issue, but OST still fails in
>>>>>> > 098_ovirt_provider_ovn.use_ovn_provider:
>>>>>> >
>>>>>> > https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6134
>>>>>>
>>>>>> I think Dominik was looking into this issue; +Dominik Holler, please
>>>>>> confirm.
>>>>>>
>>>>>> Let me know if you need any help, Dominik.
>>>>>>
>>>>>
>>>>>
>>>>> Thanks.
>>>>> The problem is that the hosts lost connection to storage:
>>>>>
>>>>> https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6134/artifact/exp... :
>>>>>
>>>>> 2019-11-22 05:39:12,326-0500 DEBUG (jsonrpc/5) [common.commands] /usr/bin/taskset --cpu-list 0-1 /usr/bin/sudo -n /sbin/lvm vgs --config 'devices { preferred_names=["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter=["a|^/dev/mapper/36001405107ea8b4e3ac4ddeb3e19890f$|^/dev/mapper/360014054924c91df75e41178e4b8a80c$|^/dev/mapper/3600140561c0d02829924b77ab7323f17$|^/dev/mapper/3600140582feebc04ca5409a99660dbbc$|^/dev/mapper/36001405c3c53755c13c474dada6be354$|", "r|.*|"] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min=50 retain_days=0 }' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name (cwd None) (commands:153)
>>>>> 2019-11-22 05:39:12,415-0500 ERROR (check/loop) [storage.Monitor] Error checking path /rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/d10879c6-8de1-40ba-87fa-f447844eed2a/dom_md/metadata (monitor:501)
>>>>> Traceback (most recent call last):
>>>>>   File "/usr/lib/python3.6/site-packages/vdsm/storage/monitor.py", line 499, in _pathChecked
>>>>>     delay = result.delay()
>>>>>   File "/usr/lib/python3.6/site-packages/vdsm/storage/check.py", line 391, in delay
>>>>>     raise exception.MiscFileReadException(self.path, self.rc, self.err)
>>>>> vdsm.storage.exception.MiscFileReadException: Internal file read failure: ('/rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/d10879c6-8de1-40ba-87fa-f447844eed2a/dom_md/metadata', 1, 'Read timeout')
>>>>> 2019-11-22 05:39:12,416-0500 INFO (check/loop) [storage.Monitor] Domain d10879c6-8de1-40ba-87fa-f447844eed2a became INVALID (monitor:472)
>>>>>
>>>>>
>>>>> I failed to reproduce this locally to analyze it; I will try again, any
>>>>> hints welcome.
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> https://gerrit.ovirt.org/#/c/104925/1/ shows that
>>>> 008_basic_ui_sanity.py triggers the problem.
>>>> Is there someone with knowledge about the basic_ui_sanity around?
>>>>
>>> How do you think it's related? By commenting out the UI sanity tests
>>> and seeing OST finish successfully?
>>>
>>> Looking at the 6134 run you were discussing:
>>>
>>> - timing of the ui sanity set-up [1]:
>>>
>>> 11:40:20 @ Run test: 008_basic_ui_sanity.py:
>>>
>>> - timing of first encountered storage error [2]:
>>>
>>> 2019-11-22 05:39:12,415-0500 ERROR (check/loop) [storage.Monitor] Error checking path /rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/d10879c6-8de1-40ba-87fa-f447844eed2a/dom_md/metadata (monitor:501)
>>> Traceback (most recent call last):
>>>   File "/usr/lib/python3.6/site-packages/vdsm/storage/monitor.py", line 499, in _pathChecked
>>>     delay = result.delay()
>>>   File "/usr/lib/python3.6/site-packages/vdsm/storage/check.py", line 391, in delay
>>>     raise exception.MiscFileReadException(self.path, self.rc, self.err)
>>> vdsm.storage.exception.MiscFileReadException: Internal file read failure: ('/rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/d10879c6-8de1-40ba-87fa-f447844eed2a/dom_md/metadata', 1, 'Read timeout')
>>>
>>> Timezone difference aside, it seems to me that these storage errors
>>> occurred before doing anything UI-related.
>>>
>>
You are right, a time.sleep(8*60) in https://gerrit.ovirt.org/#/c/104925/2
triggers the issue in the same way.
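To spell out what that debug patch does: it effectively replaces the UI sanity
run with an idle wait of the same length, so the storage errors cannot be
blamed on anything the UI tests do. A minimal sketch of the idea (the test
name is made up):

import time

def test_idle_instead_of_ui_sanity():
    # Sleep for roughly the time the UI sanity suite would take; the NFS
    # read timeouts still show up during this window.
    time.sleep(8 * 60)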
>>> I remember talking with Steven Rosenberg on IRC a couple of days ago
>>> about some storage metadata issues, and he said he got a response from Nir
>>> that "it's a known issue".
>>>
>>> Nir, Amit, can you comment on this?
>>>
>>
>> The error mentioned here is not a vdsm error but a warning about storage
>> accessibility. We should convert the tracebacks to warnings.
>>
>> The reason for such an issue can be a misconfigured network (maybe the
>> network team is testing negative flows?),
>>
>
> No.
>
>
>> or some issue in the NFS server.
>>
>>
> The only hint I found is
> "Exiting Time2Retain handler because session_reinstatement=1"
> but I have no idea what this means or if this is relevant at all.
>
>
>> One read timeout is not an issue. We have a real issue only if we have
>> consistent read timeouts or errors for a couple of minutes; after that, the
>> engine can deactivate the storage domain, or some hosts if only those hosts
>> are having trouble accessing storage.
>>
>> In OST we never expect such conditions, since we don't test negative
>> flows, and we should have good connectivity with the VMs running on the
>> same host.
>>
>>
> Ack, this seems to be the problem.
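For the next round of debugging, the monitor's path check can be imitated by
hand with a single direct read of the metadata file. A sketch, assuming dd
with iflag=direct is close enough to what vdsm's checker does and that a 10
second timeout is acceptable (run it as root on the host):

import subprocess
import time

METADATA = ("/rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/"
            "d10879c6-8de1-40ba-87fa-f447844eed2a/dom_md/metadata")

start = time.monotonic()
# One 4 KiB O_DIRECT read, similar in spirit to vdsm's path checker.
subprocess.run(
    ["dd", "if=" + METADATA, "of=/dev/null", "bs=4096", "count=1", "iflag=direct"],
    check=True, timeout=10)
print("read delay: %.2f s" % (time.monotonic() - start))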
>
>
>> Nir
>>
>>
>>> [1] https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6134/console
>>> [2] https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6134/artifact/exp...
>>>
>>>
>>>>
>>> Marcin, could you please take a look?
>>>
>>>>
>>>>
>>>>
>>>>> >
>>>>>> > > [3] - https://gerrit.ovirt.org/#/c/104897/
>>>>>> > >
>>>>>> > >
>>>>>> > > > >
>>>>>> > > > >
>>>>>> > > > > >> Who installs this rpm in OST?
>>>>>> > > > > >
>>>>>> > > > > >
>>>>>> > > > > >
>>>>>> > > > > > I do not understand the question.
>>>>>> > > > > >
>>>>>> > > > > >
>>>>>> > > > > >
>>>>>> > > > > >> > [...]
>>>>>> > > > > >> >
>>>>>> > > > > >> >
>>>>>> > > > > >> >
>>>>>> > > > > >> > See [2] for full error.
>>>>>> > > > > >> >
>>>>>> > > > > >> >
>>>>>> > > > > >> >
>>>>>> > > > > >> > Can someone please take a look?
>>>>>> > > > > >> > Thanks
>>>>>> > > > > >> > Vojta
>>>>>> > > > > >> >
>>>>>> > > > > >> >
>>>>>> > > > > >> >
>>>>>> > > > > >> > [1] https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6128/
>>>>>> > > > > >> > [2] https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6128/artifact/exported-artifacts/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-engine/_var_log/ovirt-engine/engine.log
>>>>>> > > > > >>
>>>>>> > > >
>>>>>> > > >
>>>>>> > >
>>>>>> >
>>>>>>
>>>>>>
>>>
>>> --
>>> Martin Perina
>>> Manager, Software Engineering
>>> Red Hat Czech s.r.o.
>>>
>>>
>>>