On Mon, Nov 25, 2019 at 7:12 PM Nir Soffer <nsoffer(a)redhat.com> wrote:
> On Mon, Nov 25, 2019 at 7:15 PM Dominik Holler <dholler(a)redhat.com> wrote:
> >
> > On Mon, Nov 25, 2019 at 6:03 PM Nir Soffer <nsoffer(a)redhat.com> wrote:
> >>
> >> On Mon, Nov 25, 2019 at 6:48 PM Dominik Holler <dholler(a)redhat.com> wrote:
> >> >
> >> > On Mon, Nov 25, 2019 at 5:16 PM Nir Soffer <nsoffer(a)redhat.com> wrote:
> >> >>
> >> >> On Mon, Nov 25, 2019 at 6:05 PM Dominik Holler <dholler(a)redhat.com> wrote:
> >> >> >
> >> >> > On Mon, Nov 25, 2019 at 4:50 PM Nir Soffer <nsoffer(a)redhat.com> wrote:
> >> >> >>
> >> >> >> On Mon, Nov 25, 2019 at 11:00 AM Dominik Holler <dholler(a)redhat.com> wrote:
> >> >> >> >
> >> >> >> > On Fri, Nov 22, 2019 at 8:57 PM Dominik Holler <dholler(a)redhat.com> wrote:
> >> >> >> >>
> >> >> >> >> On Fri, Nov 22, 2019 at 5:54 PM Dominik Holler <dholler(a)redhat.com> wrote:
> >> >> >> >>>
> >> >> >> >>> On Fri, Nov 22, 2019 at 5:48 PM Nir Soffer <nsoffer(a)redhat.com> wrote:
> >> >> >> >>>>
> >> >> >> >>>> On Fri, Nov 22, 2019, 18:18 Marcin Sobczyk <msobczyk(a)redhat.com> wrote:
> >> >> >> >>>>>
> >> >> >> >>>>> On 11/22/19 4:54 PM, Martin Perina wrote:
> >> >> >> >>>>>
> >> >> >> >>>>> On Fri, Nov 22, 2019 at 4:43 PM Dominik Holler <dholler(a)redhat.com> wrote:
> >> >> >> >>>>>>
> >> >> >> >>>>>> On Fri, Nov 22, 2019 at 12:17 PM Dominik Holler <dholler(a)redhat.com> wrote:
> >> >> >> >>>>>>>
> >> >> >> >>>>>>> On Fri, Nov 22, 2019 at 12:00 PM Miguel Duarte de Mora Barroso <mdbarroso(a)redhat.com> wrote:
> >> >> >> >>>>>>>>
> >> >> >> >>>>>>>> On Fri, Nov 22, 2019 at 11:54 AM Vojtech Juranek <vjuranek(a)redhat.com> wrote:
> >> >> >> >>>>>>>> >
> >> >> >> >>>>>>>> > On Friday, November 22, 2019 9:56:56 CET Miguel Duarte de Mora Barroso wrote:
> >> >> >> >>>>>>>> > > On Fri, Nov 22, 2019 at 9:49 AM Vojtech Juranek <vjuranek(a)redhat.com> wrote:
> >> >> >> >>>>>>>> > > >
> >> >> >> >>>>>>>> > > > On Friday, November 22, 2019 9:41:26 CET Dominik Holler wrote:
> >> >> >> >>>>>>>> > > > >
> >> >> >> >>>>>>>> > > > > On Fri, Nov 22, 2019 at 8:40 AM Dominik Holler <dholler(a)redhat.com> wrote:
> >> >> >> >>>>>>>> > > > > >
> >> >> >> >>>>>>>> > > > > > On Thu, Nov 21, 2019 at 10:54 PM Nir Soffer <nsoffer(a)redhat.com> wrote:
> >> >> >> >>>>>>>> > > > > >
> >> >> >> >>>>>>>> > > > > >> On Thu, Nov 21, 2019 at 11:24 PM Vojtech Juranek <vjuranek(a)redhat.com> wrote:
> >> >> >> >>>>>>>> > > > > >>
> >> >> >> >>>>>>>> > > > > >> > Hi,
> >> >> >> >>>>>>>> > > > > >> > OST fails (see e.g. [1]) in 002_bootstrap.check_update_host. It fails with
> >> >> >> >>>>>>>> > > > > >> >
> >> >> >> >>>>>>>> > > > > >> > FAILED! => {"changed": false, "failures": [], "msg": "Depsolve Error occured:
> >> >> >> >>>>>>>> > > > > >> > \n Problem 1: cannot install the best update candidate for package
> >> >> >> >>>>>>>> > > > > >> > vdsm-network-4.40.0-1236.git63ea8cb8b.el8.x86_64\n - nothing provides nmstate
> >> >> >> >>>>>>>> > > > > >> > needed by vdsm-network-4.40.0-1271.git524e08c8a.el8.x86_64\n Problem 2:
> >> >> >> >>>>>>>> > > > > >> > package vdsm-python-4.40.0-1271.git524e08c8a.el8.noarch requires vdsm-network
> >> >> >> >>>>>>>> > > > > >> > = 4.40.0-1271.git524e08c8a.el8, but none of the providers can be installed\n
> >> >> >> >>>>>>>> > > > > >> > - cannot install the best update candidate for package
> >> >> >> >>>>>>>> > > > > >> > vdsm-python-4.40.0-1236.git63ea8cb8b.el8.noarch\n - nothing provides nmstate
> >> >> >> >>>>>>>> > > > > >> > needed by vdsm-network-4.40.0-1271.git524e08c8a.el8.x86_64
> >> >> >> >>>>>>>> > > > > >>
> >> >> >> >>>>>>>> > > > > >> nmstate should be provided by the copr repo enabled by ovirt-release-master.
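
For context: a copr repo here is just an extra dnf repo file dropped on the
host. A minimal sketch of what such an entry looks like; the OWNER/PROJECT
path and URL below are placeholders, not the actual nmstate copr repo:

    [copr-nmstate]
    name=Copr repo providing nmstate (placeholder OWNER/PROJECT)
    baseurl=https://copr-be.cloud.fedoraproject.org/results/OWNER/PROJECT/epel-8-$basearch/
    enabled=1
    gpgcheck=0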
> >> >> >> >>>>>>>> > > > > >
> >> >> >> >>>>>>>> > > > > > I re-triggered as
> >> >> >> >>>>>>>> > > > > > https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6131
> >> >> >> >>>>>>>> > > > > > maybe https://gerrit.ovirt.org/#/c/104825/ was missing
> >> >> >> >>>>>>>> > > > >
> >> >> >> >>>>>>>> > > > > Looks like https://gerrit.ovirt.org/#/c/104825/ is ignored by OST.
> >> >> >> >>>>>>>> > > >
> >> >> >> >>>>>>>> > > > maybe not. You re-triggered with [1], which really missed this patch.
> >> >> >> >>>>>>>> > > > I did a rebase and now running with this patch in build #6132 [2]. Let's
> >> >> >> >>>>>>>> > > > wait for it to see if gerrit #104825 helps.
> >> >> >> >>>>>>>> > > >
> >> >> >> >>>>>>>> > > > [1] https://jenkins.ovirt.org/job/standard-manual-runner/909/
> >> >> >> >>>>>>>> > > > [2] https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6132/
> >> >> >> >>>>>>>> > > > >
> >> >> >> >>>>>>>> > > > > Miguel, do you think merging
> >> >> >> >>>>>>>> > > > > https://gerrit.ovirt.org/#/c/104495/15/common/yum-repos/ovirt-master-host-cq.repo.in
> >> >> >> >>>>>>>> > > > > would solve this?
> >> >> >> >>>>>>>> > >
> >> >> >> >>>>>>>> > > I've split the patch Dominik mentions above in two, one of them adding
> >> >> >> >>>>>>>> > > the nmstate / networkmanager copr repos - [3].
> >> >> >> >>>>>>>> > >
> >> >> >> >>>>>>>> > > Let's see if it fixes it.
> >> >> >> >>>>>>>> >
> >> >> >> >>>>>>>> > it fixes the original issue, but OST still fails in
> >> >> >> >>>>>>>> > 098_ovirt_provider_ovn.use_ovn_provider:
> >> >> >> >>>>>>>> >
> >> >> >> >>>>>>>> > https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6134
> >> >> >> >>>>>>>>
> >> >> >> >>>>>>>> I think Dominik was looking into this issue; +Dominik Holler please confirm.
> >> >> >> >>>>>>>>
> >> >> >> >>>>>>>> Let me know if you need any help, Dominik.
> >> >> >> >>>>>>>
> >> >> >> >>>>>>> Thanks.
> >> >> >> >>>>>>> The problem is that the hosts lost connection to storage:
> >> >> >> >>>>>>> https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6134/artifact/exp... :
> >> >> >> >>>>>>>
> >> >> >> >>>>>>> 2019-11-22 05:39:12,326-0500 DEBUG (jsonrpc/5) [common.commands] /usr/bin/taskset --cpu-list 0-1 /usr/bin/sudo -n /sbin/lvm vgs --config 'devices { preferred_names=["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter=["a|^/dev/mapper/36001405107ea8b4e3ac4ddeb3e19890f$|^/dev/mapper/360014054924c91df75e41178e4b8a80c$|^/dev/mapper/3600140561c0d02829924b77ab7323f17$|^/dev/mapper/3600140582feebc04ca5409a99660dbbc$|^/dev/mapper/36001405c3c53755c13c474dada6be354$|", "r|.*|"] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min=50 retain_days=0 }' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name (cwd None) (commands:153)
> >> >> >> >>>>>>> 2019-11-22 05:39:12,415-0500 ERROR (check/loop) [storage.Monitor] Error checking path /rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/d10879c6-8de1-40ba-87fa-f447844eed2a/dom_md/metadata (monitor:501)
> >> >> >> >>>>>>> Traceback (most recent call last):
> >> >> >> >>>>>>>   File "/usr/lib/python3.6/site-packages/vdsm/storage/monitor.py", line 499, in _pathChecked
> >> >> >> >>>>>>>     delay = result.delay()
> >> >> >> >>>>>>>   File "/usr/lib/python3.6/site-packages/vdsm/storage/check.py", line 391, in delay
> >> >> >> >>>>>>>     raise exception.MiscFileReadException(self.path, self.rc, self.err)
> >> >> >> >>>>>>> vdsm.storage.exception.MiscFileReadException: Internal file read failure: ('/rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/d10879c6-8de1-40ba-87fa-f447844eed2a/dom_md/metadata', 1, 'Read timeout')
> >> >> >> >>>>>>> 2019-11-22 05:39:12,416-0500 INFO (check/loop) [storage.Monitor] Domain d10879c6-8de1-40ba-87fa-f447844eed2a became INVALID (monitor:472)
> >> >> >> >>>>>>>
> >> >> >> >>>>>>> I failed to reproduce this locally to analyze it; I will try again, any hints welcome.
> >> >> >> >>>>>>
> >> >> >> >>>>>> https://gerrit.ovirt.org/#/c/104925/1/ shows that 008_basic_ui_sanity.py triggers the problem.
> >> >> >> >>>>>> Is there someone with knowledge about the basic_ui_sanity around?
> >> >> >> >>>>>
> >> >> >> >>>>> How do you think it's related? By commenting out the UI sanity tests
> >> >> >> >>>>> and seeing OST finish successfully?
> >> >> >> >>>>>
> >> >> >> >>>>> Looking at the 6134 run you were discussing:
> >> >> >> >>>>>
> >> >> >> >>>>> - timing of the UI sanity set-up [1]:
> >> >> >> >>>>>
> >> >> >> >>>>> 11:40:20 @ Run test: 008_basic_ui_sanity.py:
> >> >> >> >>>>>
> >> >> >> >>>>> - timing of the first encountered storage error [2]:
> >> >> >> >>>>>
> >> >> >> >>>>> 2019-11-22 05:39:12,415-0500 ERROR (check/loop) [storage.Monitor] Error checking path /rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/d10879c6-8de1-40ba-87fa-f447844eed2a/dom_md/metadata (monitor:501)
> >> >> >> >>>>> Traceback (most recent call last):
> >> >> >> >>>>>   File "/usr/lib/python3.6/site-packages/vdsm/storage/monitor.py", line 499, in _pathChecked
> >> >> >> >>>>>     delay = result.delay()
> >> >> >> >>>>>   File "/usr/lib/python3.6/site-packages/vdsm/storage/check.py", line 391, in delay
> >> >> >> >>>>>     raise exception.MiscFileReadException(self.path, self.rc, self.err)
> >> >> >> >>>>> vdsm.storage.exception.MiscFileReadException: Internal file read failure: ('/rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/d10879c6-8de1-40ba-87fa-f447844eed2a/dom_md/metadata', 1, 'Read timeout')
> >> >> >> >>>>>
> >> >> >> >>>>> Timezone difference aside, it seems to me that these storage errors
> >> >> >> >>>>> occurred before doing anything UI-related.
> >> >> >> >>
> >> >> >> >> You are right, a time.sleep(8*60) in https://gerrit.ovirt.org/#/c/104925/2
> >> >> >> >> triggers the issue the same way.
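
A minimal sketch of what that reproducer patch does, replacing the UI sanity
checks with an idle wait of the same length (the test name here is
illustrative, not the actual OST code):

    import time

    def test_sleep_instead_of_ui_sanity():
        # Idle for roughly the duration of the UI sanity tests. If the
        # storage errors still show up during this window, the trigger is
        # elapsed time, not anything the UI tests do.
        time.sleep(8 * 60)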
> >> >> >>
> >> >> >> So this is a test issue, assuming that the UI tests can complete in
> >> >> >> less than 8 minutes?
> >> >> >
> >> >> > To my eyes this looks like storage just stops working after some time.
> >> >> >
> >> >> >>
> >> >> >> >
> >> >> >> > Nir or Steve, can you please confirm that this is a storage problem?
> >> >> >>
> >> >> >> Why do you think we have a storage problem?
> >> >> >>
> >> >> >
> >> >> > I understand from the posted log snippets that they say that the
> >> >> > storage is not accessible anymore,
> >> >>
> >> >> No, so far only one read timeout was reported; this does not mean
> >> >> storage is not available anymore.
> >> >> It can be a temporary issue that does not harm anything.
> >> >>
> >> >> > while the host is still responsive.
> >> >> > This might be triggered by something outside storage, e.g. the
> >> >> > network providing the storage stopped working.
> >> >> > But I think a possible next step in analysing this issue would be
> >> >> > to find the reason why storage is not happy.
> >> >>
> >> >
> >> > Sounds like there was a miscommunication in this thread.
> >> > I'll try to address all of your points; please let me know if something
> >> > is missing or not clearly expressed.
> >> >
> >> >> The first step is to understand which test fails,
> >> >
> >> > 098_ovirt_provider_ovn.use_ovn_provider
> >> >
> >> >> and why. This can be done by the owner of the test,
> >> >
> >> > The test was added by the network team.
> >> >
> >> >> understanding what the test does
> >> >
> >> > The test tries to add a vNIC.
> >> >
> >> >> and what is the expected system behavior.
> >> >
> >> > It is expected that adding a vNIC works, because the VM should be up.
> >>
> >> What was the actual behavior?
> >>
> >> >> If the owner of the test thinks that the test failed because of a
> >> >> storage issue
> >> >
> >> > I am not sure who is the owner, but I do.
> >>
> >> Can you explain how adding a vNIC failed because of a storage issue?
> >>
> >
> > Test fails with:
> >
> > Cannot add a Network Interface when VM is not Down, Up or Image-Locked.
> >
> > engine.log says:
> > {"jsonrpc": "2.0", "method": "|virt|VM_status|308bd254-9af9-4570-98ea-822609550acf", "params": {"308bd254-9af9-4570-98ea-822609550acf": {"status": "Paused", "pauseCode": "EOTHER", "ioerror": {"alias": "ua-953dd722-5e8b-4b24-bccd-a2a5d5befeb6", "name": "vda", "path": "/rhev/data-center/38c691d4-8556-4882-8f04-a88dff5d0973/bcd1622c-876b-460c-95a7-d09536c42ffe/images/953dd722-5e8b-4b24-bccd-a2a5d5befeb6/dcb5fec4-f219-4d3f-986c-628b0d00b349"}}, "notify_time": 4298388570}}
>
> So you think adding the vNIC failed because the VM was paused?
>
Yes, because of the error message "Cannot add a Network Interface when VM is
not Down, Up or Image-Locked."
> > vdsm.log says:
> >
> > 2019-11-20 10:51:06,026-0500 ERROR (check/loop) [storage.Monitor] Error checking path /rhev/data-center/mnt/192.168.200.4:_exports_nfs_share1/bcd1622c-876b-460c-95a7-d09536c42ffe/dom_md/metadata (monitor:501)
> > Traceback (most recent call last):
> >   File "/usr/lib/python3.6/site-packages/vdsm/storage/monitor.py", line 499, in _pathChecked
> >     delay = result.delay()
> >   File "/usr/lib/python3.6/site-packages/vdsm/storage/check.py", line 391, in delay
> >     raise exception.MiscFileReadException(self.path, self.rc, self.err)
> > vdsm.storage.exception.MiscFileReadException: Internal file read failure: ('/rhev/data-center/mnt/192.168.200.4:_exports_nfs_share1/bcd1622c-876b-460c-95a7-d09536c42ffe/dom_md/metadata', 1, 'Read timeout')
>
> Is this related to the paused VM?
>
>
The log entry '{"status": "Paused", "pauseCode": "EOTHER", "ioerror"' makes
me think so.
> You did not provide a timestamp for the engine event above.
>
I can't find last week's logs; maybe they have already aged out.
Please find more recent logs in
https://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/6492
> > ...
> >
> > 2019-11-20 10:51:56,249-0500 WARN (check/loop) [storage.check] Checker '/rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/64daa060-1d83-46b9-b7e8-72a902e1134b/dom_md/metadata' is blocked for 60.00 seconds (check:282)
> > 2019-11-20 10:51:56,885-0500 ERROR (monitor/775b710) [storage.Monitor] Error checking domain 775b7102-7f2c-4eee-a4d0-a41b55451f7e (monitor:427)
> > Traceback (most recent call last):
> >   File "/usr/lib/python3.6/site-packages/vdsm/storage/monitor.py", line 408, in _checkDomainStatus
> >     self.domain.selftest()
> >   File "/usr/lib/python3.6/site-packages/vdsm/storage/fileSD.py", line 710, in selftest
> >     self.oop.os.statvfs(self.domaindir)
> >   File "/usr/lib/python3.6/site-packages/vdsm/storage/outOfProcess.py", line 242, in statvfs
> >     return self._iop.statvfs(path)
> >   File "/usr/lib/python3.6/site-packages/ioprocess/__init__.py", line 479, in statvfs
> >     resdict = self._sendCommand("statvfs", {"path": path}, self.timeout)
> >   File "/usr/lib/python3.6/site-packages/ioprocess/__init__.py", line 442, in _sendCommand
> >     raise Timeout(os.strerror(errno.ETIMEDOUT))
> > ioprocess.Timeout: Connection timed out
>
> This shows that storage was not accessible for 60 seconds (ioprocess
> uses a 60 second timeout).
>
> A 60 second timeout is bad. If we have leases on this storage domain
> (e.g. the SPM lease), they will expire 20 seconds after this event and
> vdsm on the SPM host will be killed.
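
Rough arithmetic behind those 20 seconds, as a sketch; the sanlock constants
here are assumed defaults, not values taken from these logs:

    SANLOCK_IO_TIMEOUT = 10                 # seconds, assumed sanlock default
    LEASE_EXPIRY = 8 * SANLOCK_IO_TIMEOUT   # 80s without renewal loses the lease
    IOPROCESS_TIMEOUT = 60                  # seconds, vdsm ioprocess timeout seen above

    # Storage is already blocked for 60s when ioprocess gives up; if the
    # blockage continues, the SPM lease expires this much later:
    print(LEASE_EXPIRY - IOPROCESS_TIMEOUT)  # 20 seconds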
>
> Do we have network tests changing the network used by the NFS storage
> domain before this event?
>
No.

> What were the changes in the network tests or code since OST was successful?
>
I am not aware of a change that might be relevant.
Maybe the fact that the hosts are on CentOS 8, while the Engine (storage)
is on CentOS 7, is relevant.
Also, the occurrence of this issue seems not to be 100% deterministic; I
guess because it is timing related.
The error is reproducible locally by running OST and just keeping the
environment alive after basic-suite-master succeeded.
After some time, the storage will become inaccessible.
When this happens, does the storage domain change its state and go south,
or is it a temporary glitch that only halts VMs?
Do the host or storage server logs show anything suspicious at that time
(kernel messages, NFS logs)?
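
A small watcher along these lines could pin down the moment the NFS domain
goes away while keeping the environment alive; a sketch only, the mount path
is an example and must be replaced with the real storage domain path:

    import time

    PATH = "/rhev/data-center/mnt/192.168.200.4:_exports_nfs_share1/<sd-uuid>/dom_md/metadata"

    while True:
        start = time.monotonic()
        try:
            # On a hard NFS mount this read may stall instead of failing,
            # so the elapsed time is as interesting as the exception.
            with open(PATH, "rb") as f:
                f.read()
            print("ok, read took %.2fs" % (time.monotonic() - start))
        except OSError as e:
            print("read failed after %.2fs: %s" % (time.monotonic() - start, e))
        time.sleep(10)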
> >> Can you explain how adding an 8 minute sleep instead of the UI tests
> >> reproduced the issue?
> >>
> >
> > This shows that the issue is not triggered by the UI tests, but maybe by
> > elapsed time.
>
> Do we run the OVN tests after the UI tests?
>
> >> >> someone from storage can look at this.
> >> >
> >> > Thanks, I would appreciate this.
> >> >
> >> >> But the fact that adding a long sleep reproduces the issue means it is
> >> >> not related in any way to storage.
> >> >>
> >> >> Nir
> >> >>
> >> >>
> >> >> >
> >> >> >>
> >> >> >> >
> >> >> >> >>
> >> >> >> >>
> >> >> >> >>>>>
> >> >> >> >>>>> I remember talking with Steven Rosenberg on IRC a couple of days ago
> >> >> >> >>>>> about some storage metadata issues, and he said he got a response from
> >> >> >> >>>>> Nir that "it's a known issue".
> >> >> >> >>>>>
> >> >> >> >>>>> Nir, Amit, can you comment on this?
> >> >> >> >>>>
> >> >> >> >>>> The error mentioned here is not a vdsm error but a warning about
> >> >> >> >>>> storage accessibility. We should convert the tracebacks to warnings.
> >> >> >> >>>>
> >> >> >> >>>> The reason for such an issue can be a misconfigured network
> >> >> >> >>>> (maybe the network team is testing negative flows?),
> >> >> >> >>>
> >> >> >> >>> No.
> >> >> >> >>>
> >> >> >> >>>> or some issue in the NFS server.
> >> >> >> >>>>
> >> >> >> >>>
> >> >> >> >>> The only hint I found is
> >> >> >> >>> "Exiting Time2Retain handler because session_reinstatement=1",
> >> >> >> >>> but I have no idea what this means or whether it is relevant at all.
> >> >> >> >>>
> >> >> >> >>>>
> >> >> >> >>>> One read timeout is not an issue. We have a real issue only if we have
> >> >> >> >>>> consistent read timeouts or errors for a couple of minutes; after that,
> >> >> >> >>>> the engine can deactivate the storage domain, or some hosts if only
> >> >> >> >>>> those hosts are having trouble accessing storage.
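
To illustrate the policy described here (a sketch, not vdsm's actual monitor
code): a single timeout is tolerated, and only an uninterrupted run of
failures lasting a few minutes should flag the domain:

    import time

    CHECK_INTERVAL = 10       # seconds between checks (illustrative value)
    FAILURE_WINDOW = 5 * 60   # only consistent errors for ~5 minutes count

    def monitor(check_path):
        first_failure = None
        while True:
            try:
                check_path()              # e.g. read dom_md/metadata
                first_failure = None      # any success resets the state
            except OSError:
                now = time.monotonic()
                if first_failure is None:
                    first_failure = now   # first timeout: tolerated
                elif now - first_failure >= FAILURE_WINDOW:
                    print("domain consistently inaccessible")  # deactivate here
            time.sleep(CHECK_INTERVAL)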
> >> >> >> >>>>
> >> >> >> >>>> In OST we never expect such conditions since we don't test negative
> >> >> >> >>>> flows, and we should have good connectivity, with the VMs running on
> >> >> >> >>>> the same host.
> >> >> >> >>>
> >> >> >> >>> Ack, this seems to be the problem.
> >> >> >> >>>
> >> >> >> >>>> Nir
> >> >> >> >>>>
> >> >> >> >>>>> [1] https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6134/console
> >> >> >> >>>>> [2] https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6134/artifact/exp...
> >> >> >> >>>>>
> >> >> >> >>>>> Marcin, could you please take a look?
> >> >> >> >>>>>
> >> >> >> >>>>>>>>
> >> >> >> >>>>>>>> > > [3] - https://gerrit.ovirt.org/#/c/104897/
> >> >> >> >>>>>>>> > >
> >> >> >> >>>>>>>> > > > > >> Who installs this rpm in OST?
> >> >> >> >>>>>>>> > > > > >
> >> >> >> >>>>>>>> > > > > > I do not understand the question.
> >> >> >> >>>>>>>> > > > >
> >> >> >> >>>>>>>> > > > > >> > [...]
> >> >> >> >>>>>>>> > > > > >> > See [2] for full error.
> >> >> >> >>>>>>>> > > > > >> >
> >> >> >> >>>>>>>> > > > > >> > Can someone please take a look?
> >> >> >> >>>>>>>> > > > > >> > Thanks
> >> >> >> >>>>>>>> > > > > >> > Vojta
> >> >> >> >>>>>>>> > > > > >> >
> >> >> >> >>>>>>>> > > > > >> > [1] https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6128/
> >> >> >> >>>>>>>> > > > > >> > [2] https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6128/artifact/exported-artifacts/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-engine/_var_log/ovirt-engine/engine.log
> >> >> >> >>>>>>>> >
> >> >> >> >>>>>>>>
> >> >> >> >>>>>
> >> >> >> >>>>>
> >> >> >> >>>>> --
> >> >> >> >>>>> Martin Perina
> >> >> >> >>>>> Manager, Software Engineering
> >> >> >> >>>>> Red Hat Czech s.r.o.
> >> >> >> >>>>>
> >> >> >> >>>>>
> >> >> >>
> >> >>
> >>
>
>
_______________________________________________
Devel mailing list -- devel(a)ovirt.org
To unsubscribe send an email to devel-leave(a)ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/devel@ovirt.org/message/KMRWVJNQ6GA...