On Mon, Nov 25, 2019 at 7:12 PM Nir Soffer <nsoffer(a)redhat.com> wrote:
On Mon, Nov 25, 2019 at 7:15 PM Dominik Holler <dholler(a)redhat.com> wrote:
>
> On Mon, Nov 25, 2019 at 6:03 PM Nir Soffer <nsoffer(a)redhat.com> wrote:
>>
>> On Mon, Nov 25, 2019 at 6:48 PM Dominik Holler <dholler(a)redhat.com> wrote:
>> >
>> > On Mon, Nov 25, 2019 at 5:16 PM Nir Soffer <nsoffer(a)redhat.com> wrote:
>> >>
>> >> On Mon, Nov 25, 2019 at 6:05 PM Dominik Holler <dholler(a)redhat.com> wrote:
>> >> >
>> >> > On Mon, Nov 25, 2019 at 4:50 PM Nir Soffer <nsoffer(a)redhat.com> wrote:
>> >> >>
>> >> >> On Mon, Nov 25, 2019 at 11:00 AM Dominik Holler <dholler(a)redhat.com> wrote:
>> >> >> >
>> >> >> > On Fri, Nov 22, 2019 at 8:57 PM Dominik Holler <dholler(a)redhat.com> wrote:
>> >> >> >>
>> >> >> >> On Fri, Nov 22, 2019 at 5:54 PM Dominik Holler <dholler(a)redhat.com> wrote:
>> >> >> >>>
>> >> >> >>> On Fri, Nov 22, 2019 at 5:48 PM Nir Soffer <nsoffer(a)redhat.com> wrote:
>> >> >> >>>>
>> >> >> >>>> On Fri, Nov 22, 2019, 18:18 Marcin Sobczyk <msobczyk(a)redhat.com> wrote:
>> >> >> >>>>>
>> >> >> >>>>> On 11/22/19 4:54 PM, Martin Perina wrote:
>> >> >> >>>>>
>> >> >> >>>>> On Fri, Nov 22, 2019 at 4:43 PM Dominik Holler <dholler(a)redhat.com> wrote:
>> >> >> >>>>>>
>> >> >> >>>>>> On Fri, Nov 22, 2019 at 12:17 PM Dominik Holler <dholler(a)redhat.com> wrote:
>> >> >> >>>>>>>
>> >> >> >>>>>>> On Fri, Nov 22, 2019 at 12:00 PM Miguel Duarte de Mora Barroso <mdbarroso(a)redhat.com> wrote:
>> >> >> >>>>>>>>
>> >> >> >>>>>>>> On Fri, Nov 22, 2019 at 11:54 AM Vojtech Juranek <vjuranek(a)redhat.com> wrote:
>> >> >> >>>>>>>> >
>> >> >> >>>>>>>> > On Friday, 22 November 2019 9:56:56 CET Miguel Duarte de Mora Barroso wrote:
>> >> >> >>>>>>>> > > On Fri, Nov 22, 2019 at 9:49 AM Vojtech Juranek <vjuranek(a)redhat.com> wrote:
>> >> >> >>>>>>>> > > >
>> >> >> >>>>>>>> > > > On Friday, 22 November 2019 9:41:26 CET Dominik Holler wrote:
>> >> >> >>>>>>>> > > >
>> >> >> >>>>>>>> > > > > On Fri, Nov 22, 2019 at 8:40 AM Dominik Holler <dholler(a)redhat.com> wrote:
>> >> >> >>>>>>>> > > > >
>> >> >> >>>>>>>> > > > > > On Thu, Nov 21, 2019 at 10:54 PM Nir Soffer <nsoffer(a)redhat.com> wrote:
>> >> >> >>>>>>>> > > > > >
>> >> >> >>>>>>>> > > > > >> On Thu, Nov 21, 2019 at 11:24 PM Vojtech Juranek <vjuranek(a)redhat.com> wrote:
>> >> >> >>>>>>>> > > > > >>
>> >> >> >>>>>>>> > > > > >> > Hi,
>> >> >> >>>>>>>> > > > > >> > OST fails (see e.g. [1]) in 002_bootstrap.check_update_host. It fails with
>> >> >> >>>>>>>> > > > > >> >
>> >> >> >>>>>>>> > > > > >> > FAILED! => {"changed": false, "failures": [], "msg": "Depsolve Error occured: \n Problem 1: cannot install the best update candidate for package vdsm-network-4.40.0-1236.git63ea8cb8b.el8.x86_64\n - nothing provides nmstate needed by vdsm-network-4.40.0-1271.git524e08c8a.el8.x86_64\n Problem 2: package vdsm-python-4.40.0-1271.git524e08c8a.el8.noarch requires vdsm-network = 4.40.0-1271.git524e08c8a.el8, but none of the providers can be installed\n - cannot install the best update candidate for package vdsm-python-4.40.0-1236.git63ea8cb8b.el8.noarch\n - nothing provides nmstate needed by vdsm-network-4.40.0-1271.git524e08c8a.el8.x86_64\n
>> >> >> >>>>>>>> > > > > >>
>> >> >> >>>>>>>> > > > > >> nmstate should be provided by copr repo enabled by ovirt-release-master.
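
(A quick way to check this on a host is a sketch like the following,
assuming dnf is available on the CentOS 8 host; an empty answer
reproduces the depsolve failure above.)

    import subprocess

    # Ask dnf which available packages provide "nmstate".
    result = subprocess.run(
        ["dnf", "repoquery", "--whatprovides", "nmstate"],
        stdout=subprocess.PIPE, stderr=subprocess.PIPE,
        universal_newlines=True, check=False,
    )
    if result.stdout.strip():
        print("nmstate is provided by:", result.stdout.strip())
    else:
        print("no enabled repo provides nmstate")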
>> >> >> >>>>>>>> > > > > >
>> >> >> >>>>>>>> > > > > > I re-triggered as
>> >> >> >>>>>>>> > > > > > https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6131
>> >> >> >>>>>>>> > > > > > maybe https://gerrit.ovirt.org/#/c/104825/ was missing
>> >> >> >>>>>>>> > > > >
>> >> >> >>>>>>>> > > > > Looks like https://gerrit.ovirt.org/#/c/104825/ is ignored by OST.
>> >> >> >>>>>>>> > > >
>> >> >> >>>>>>>> > > > maybe not. You re-triggered with [1], which really
>> >> >> >>>>>>>> > > > missed this patch. I did a rebase and it is now running
>> >> >> >>>>>>>> > > > with this patch in build #6132 [2]. Let's wait for it
>> >> >> >>>>>>>> > > > to see if gerrit #104825 helps.
>> >> >> >>>>>>>> > > >
>> >> >> >>>>>>>> > > > [1] https://jenkins.ovirt.org/job/standard-manual-runner/909/
>> >> >> >>>>>>>> > > > [2] https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6132/
>> >> >> >>>>>>>> > > >
>> >> >> >>>>>>>> > > > > Miguel, do you think merging
>> >> >> >>>>>>>> > > > > https://gerrit.ovirt.org/#/c/104495/15/common/yum-repos/ovirt-master-host-cq.repo.in
>> >> >> >>>>>>>> > > > > would solve this?
>> >> >> >>>>>>>> > >
>> >> >> >>>>>>>> > > I've split the patch Dominik mentions above in two, one
>> >> >> >>>>>>>> > > of them adding the nmstate / networkmanager copr repos - [3].
>> >> >> >>>>>>>> > >
>> >> >> >>>>>>>> > > Let's see if it fixes it.
>> >> >> >>>>>>>> >
>> >> >> >>>>>>>> > it fixes the original issue, but OST still fails in
>> >> >> >>>>>>>> > 098_ovirt_provider_ovn.use_ovn_provider:
>> >> >> >>>>>>>> > https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6134
>> >> >> >>>>>>>>
>> >> >> >>>>>>>> I think Dominik was looking into this issue; +Dominik Holler
>> >> >> >>>>>>>> please confirm.
>> >> >> >>>>>>>>
>> >> >> >>>>>>>> Let me know if you need any help Dominik.
>> >> >> >>>>>>>
>> >> >> >>>>>>> Thanks.
>> >> >> >>>>>>> The problem is that the hosts lost connection to storage:
>> >> >> >>>>>>> https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6134/artifact/exp... :
>> >> >> >>>>>>>
>> >> >> >>>>>>> 2019-11-22 05:39:12,326-0500 DEBUG (jsonrpc/5) [common.commands] /usr/bin/taskset --cpu-list 0-1 /usr/bin/sudo -n /sbin/lvm vgs --config 'devices { preferred_names=["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter=["a|^/dev/mapper/36001405107ea8b4e3ac4ddeb3e19890f$|^/dev/mapper/360014054924c91df75e41178e4b8a80c$|^/dev/mapper/3600140561c0d02829924b77ab7323f17$|^/dev/mapper/3600140582feebc04ca5409a99660dbbc$|^/dev/mapper/36001405c3c53755c13c474dada6be354$|", "r|.*|"] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min=50 retain_days=0 }' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name (cwd None) (commands:153)
>> >> >> >>>>>>> 2019-11-22 05:39:12,415-0500 ERROR (check/loop) [storage.Monitor] Error checking path /rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/d10879c6-8de1-40ba-87fa-f447844eed2a/dom_md/metadata (monitor:501)
>> >> >> >>>>>>> Traceback (most recent call last):
>> >> >> >>>>>>>   File "/usr/lib/python3.6/site-packages/vdsm/storage/monitor.py", line 499, in _pathChecked
>> >> >> >>>>>>>     delay = result.delay()
>> >> >> >>>>>>>   File "/usr/lib/python3.6/site-packages/vdsm/storage/check.py", line 391, in delay
>> >> >> >>>>>>>     raise exception.MiscFileReadException(self.path, self.rc, self.err)
>> >> >> >>>>>>> vdsm.storage.exception.MiscFileReadException: Internal file read failure: ('/rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/d10879c6-8de1-40ba-87fa-f447844eed2a/dom_md/metadata', 1, 'Read timeout')
>> >> >> >>>>>>> 2019-11-22 05:39:12,416-0500 INFO (check/loop) [storage.Monitor] Domain d10879c6-8de1-40ba-87fa-f447844eed2a became INVALID (monitor:472)
>> >> >> >>>>>>>
>> >> >> >>>>>>> I failed to reproduce this locally to analyze it; I will try
>> >> >> >>>>>>> again, any hints welcome.
>> >> >> >>>>>>>
>> >> >> >>>>>>
>> >> >> >>>>>> https://gerrit.ovirt.org/#/c/104925/1/ shows that 008_basic_ui_sanity.py triggers the problem.
>> >> >> >>>>>> Is there someone with knowledge about the basic_ui_sanity around?
>> >> >> >>>>>
>> >> >> >>>>> How do you think it's related? By commenting out the ui sanity
>> >> >> >>>>> tests and seeing OST finish successfully?
>> >> >> >>>>>
>> >> >> >>>>> Looking at the 6134 run you were discussing:
>> >> >> >>>>>
>> >> >> >>>>> - timing of the ui sanity set-up [1]:
>> >> >> >>>>>
>> >> >> >>>>> 11:40:20 @ Run test: 008_basic_ui_sanity.py:
>> >> >> >>>>>
>> >> >> >>>>> - timing of the first encountered storage error [2]:
>> >> >> >>>>>
>> >> >> >>>>> 2019-11-22 05:39:12,415-0500 ERROR (check/loop) [storage.Monitor] Error checking path /rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/d10879c6-8de1-40ba-87fa-f447844eed2a/dom_md/metadata (monitor:501)
>> >> >> >>>>> Traceback (most recent call last):
>> >> >> >>>>>   File "/usr/lib/python3.6/site-packages/vdsm/storage/monitor.py", line 499, in _pathChecked
>> >> >> >>>>>     delay = result.delay()
>> >> >> >>>>>   File "/usr/lib/python3.6/site-packages/vdsm/storage/check.py", line 391, in delay
>> >> >> >>>>>     raise exception.MiscFileReadException(self.path, self.rc, self.err)
>> >> >> >>>>> vdsm.storage.exception.MiscFileReadException: Internal file read failure: ('/rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/d10879c6-8de1-40ba-87fa-f447844eed2a/dom_md/metadata', 1, 'Read timeout')
>> >> >> >>>>>
>> >> >> >>>>> Timezone difference aside, it seems to me that these storage
>> >> >> >>>>> errors occurred before doing anything ui-related.
>> >> >> >>
>> >> >> >>
>> >> >> >> You are right, a time.sleep(8*60) in
>> >> >> >> https://gerrit.ovirt.org/#/c/104925/2
>> >> >> >> triggers the issue the same way.
>> >> >>
>> >> >> So this is a test issue, assuming that the UI tests can complete
>> >> >> in less than 8 minutes?
>> >> >>
>> >> >
>> >> > To my eyes this looks like the storage just stops working after
>> >> > some time.
>> >> >
>> >> >>
>> >> >> >>
>> >> >> >
>> >> >> > Nir or Steve, can you please confirm that this is a storage problem?
>> >> >>
>> >> >> Why do you think we have a storage problem?
>> >> >>
>> >> >
>> >> > I understand from the posted log snippets that the storage is not
>> >> > accessible anymore,
>> >>
>> >> No, so far one read timeout was reported; this does not mean the storage
>> >> is not available anymore.
>> >> It can be a temporary issue that does not harm anything.
>> >>
>> >> > while the host is still responsive.
>> >> > This might be triggered by something outside storage, e.g. the
>> >> > network providing the storage stopped working.
>> >> > But I think a possible next step in analysing this issue would be
>> >> > to find the reason why the storage is not happy.
>> >>
>> >
>> > Sounds like there was a miscommunication in this thread.
>> > I will try to address all of your points; please let me know if
>> > something is missing or not clearly expressed.
>> >
>> >>
>> >> First step is to understand which test fails,
>> >
>> >
>> > 098_ovirt_provider_ovn.use_ovn_provider
>> >
>> >>
>> >> and why. This can be done by the owner of the test,
>> >
>> >
>> > The test was added by the network team.
>> >
>> >>
>> >> understanding what the test does
>> >
>> >
>> > The test tries to add a vNIC.
>> >
>> >>
>> >> and what is the expected system behavior.
>> >>
>> >
>> > It is expected that adding a vNIC works, because the VM should be up.
>>
>> What was the actual behavior?
>>
>> >> If the owner of the test thinks that the test failed because of a
>> >> storage issue,
>> >
>> >
>> > I am not sure who is the owner, but I do.
>>
>> Can you explain how adding a vNIC failed because of a storage issue?
>>
>
>
> The test fails with:
>
> Cannot add a Network Interface when VM is not Down, Up or Image-Locked.
>
> engine.log says:
> {"jsonrpc": "2.0", "method": "|virt|VM_status|308bd254-9af9-4570-98ea-822609550acf", "params": {"308bd254-9af9-4570-98ea-822609550acf": {"status": "Paused", "pauseCode": "EOTHER", "ioerror": {"alias": "ua-953dd722-5e8b-4b24-bccd-a2a5d5befeb6", "name": "vda", "path": "/rhev/data-center/38c691d4-8556-4882-8f04-a88dff5d0973/bcd1622c-876b-460c-95a7-d09536c42ffe/images/953dd722-5e8b-4b24-bccd-a2a5d5befeb6/dcb5fec4-f219-4d3f-986c-628b0d00b349"}}, "notify_time": 4298388570}}
So you think adding the vNIC failed because the VM was paused?
Yes, because of the error message "Cannot add a Network Interface when VM
is not Down, Up or Image-Locked."
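
(As an aside, a minimal sketch of this engine-side constraint via
ovirt-engine-sdk4; the URL, credentials and VM name are placeholders,
this is not the OST test code.)

    import ovirtsdk4 as sdk
    import ovirtsdk4.types as types

    connection = sdk.Connection(
        url="https://engine/ovirt-engine/api",  # placeholder
        username="admin@internal",
        password="secret",                       # placeholder
        insecure=True,
    )
    vms_service = connection.system_service().vms_service()
    vm = vms_service.list(search="name=vm0")[0]  # placeholder VM name

    # A VM paused on an I/O error reports status PAUSED here, so the
    # add() below fails exactly like in the test.
    if vm.status in (types.VmStatus.DOWN, types.VmStatus.UP,
                     types.VmStatus.IMAGE_LOCKED):
        vms_service.vm_service(vm.id).nics_service().add(
            types.Nic(name="nic1"))
    else:
        print("cannot add NIC, VM is %s" % vm.status)
    connection.close()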
> vdsm.log says:
>
> 2019-11-20 10:51:06,026-0500 ERROR (check/loop) [storage.Monitor] Error checking path /rhev/data-center/mnt/192.168.200.4:_exports_nfs_share1/bcd1622c-876b-460c-95a7-d09536c42ffe/dom_md/metadata (monitor:501)
> Traceback (most recent call last):
>   File "/usr/lib/python3.6/site-packages/vdsm/storage/monitor.py", line 499, in _pathChecked
>     delay = result.delay()
>   File "/usr/lib/python3.6/site-packages/vdsm/storage/check.py", line 391, in delay
>     raise exception.MiscFileReadException(self.path, self.rc, self.err)
> vdsm.storage.exception.MiscFileReadException: Internal file read failure: ('/rhev/data-center/mnt/192.168.200.4:_exports_nfs_share1/bcd1622c-876b-460c-95a7-d09536c42ffe/dom_md/metadata', 1, 'Read timeout')
Is this related to the paused VM?
The log entry '{"status": "Paused", "pauseCode": "EOTHER", "ioerror"'
makes me think so.
You did not provide a timestamp for the engine event above.
I can't find last week's logs; maybe they have been rotated out already.
Please find more recent logs in
https://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/6492
> ...
>
> 2019-11-20 10:51:56,249-0500 WARN (check/loop) [storage.check] Checker '/rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/64daa060-1d83-46b9-b7e8-72a902e1134b/dom_md/metadata' is blocked for 60.00 seconds (check:282)
> 2019-11-20 10:51:56,885-0500 ERROR (monitor/775b710) [storage.Monitor] Error checking domain 775b7102-7f2c-4eee-a4d0-a41b55451f7e (monitor:427)
> Traceback (most recent call last):
>   File "/usr/lib/python3.6/site-packages/vdsm/storage/monitor.py", line 408, in _checkDomainStatus
>     self.domain.selftest()
>   File "/usr/lib/python3.6/site-packages/vdsm/storage/fileSD.py", line 710, in selftest
>     self.oop.os.statvfs(self.domaindir)
>   File "/usr/lib/python3.6/site-packages/vdsm/storage/outOfProcess.py", line 242, in statvfs
>     return self._iop.statvfs(path)
>   File "/usr/lib/python3.6/site-packages/ioprocess/__init__.py", line 479, in statvfs
>     resdict = self._sendCommand("statvfs", {"path": path}, self.timeout)
>   File "/usr/lib/python3.6/site-packages/ioprocess/__init__.py", line 442, in _sendCommand
>     raise Timeout(os.strerror(errno.ETIMEDOUT))
> ioprocess.Timeout: Connection timed out
This shows that the storage was not accessible for 60 seconds (ioprocess
uses a 60-second timeout).
A 60-second timeout is bad. If we have leases on this storage domain
(e.g. the SPM lease), they will expire 20 seconds after this event and
vdsm on the SPM host will be killed.
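
(For reference, the arithmetic behind the 20 seconds, assuming sanlock
defaults; the numbers below are assumptions, not measured values.)

    # sanlock default I/O timeout; a lease expires when renewal fails
    # for 8 * io_timeout seconds.
    SANLOCK_IO_TIMEOUT = 10                # seconds (sanlock default)
    LEASE_EXPIRY = 8 * SANLOCK_IO_TIMEOUT  # 80 seconds without renewal
    CHECKER_BLOCKED = 60                   # seconds, from the log above

    # Margin left before leases expire and vdsm on the SPM is killed.
    print(LEASE_EXPIRY - CHECKER_BLOCKED)  # -> 20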
Do we have network tests changing the network used by the NFS storage
domain before this event?
No.
What changes were made to the network tests or code since OST was last
successful?
I am not aware of a relevant change.
Maybe the fact that the hosts are on CentOS 8, while the engine (which
provides the storage) is on CentOS 7, is relevant.
Also, the occurrence of this issue seems not to be 100% deterministic; I
guess because it is timing-related.
The error is reproducible locally by running OST and just keeping the
environment alive after basic-suite-master has succeeded.
After some time, the storage becomes inaccessible.
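
(To catch the moment it happens, a minimal watcher sketch; the mount path
is an example taken from the logs above, adjust it to the local
environment. A read that hangs here corresponds to the "Checker ... is
blocked" warning in vdsm.log.)

    import time

    PATH = ("/rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/"
            "d10879c6-8de1-40ba-87fa-f447844eed2a/dom_md/metadata")

    while True:
        start = time.monotonic()
        try:
            with open(PATH, "rb") as f:
                f.read(4096)
            delay = time.monotonic() - start
            if delay > 1.0:
                print("slow read: %.2fs" % delay)
        except OSError as e:
            print("read failed after %.2fs: %s"
                  % (time.monotonic() - start, e))
        time.sleep(10)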
>> Can you explain how adding an 8-minute sleep instead of the UI tests
>> reproduced the issue?
>>
>
> This shows that the issue is not triggered by the UI test, but maybe just
> by the passage of time.
Do we run the ovn tests after the UI tests?
>> >> someone from storage can look at this.
>> >>
>> >
>> > Thanks, I would appreciate this.
>> >
>> >>
>> >> But the fact that adding a long sleep reproduces the issue means it is
>> >> not related in any way to storage.
>> >>
>> >> Nir
>> >>
>> >> >
>> >> >>
>> >> >> >
>> >> >> >>
>> >> >> >>
>> >> >> >>>>>
>> >> >> >>>>> I remember talking with Steven Rosenberg on IRC a couple of
>> >> >> >>>>> days ago about some storage metadata issues, and he said he got
>> >> >> >>>>> a response from Nir that "it's a known issue".
>> >> >> >>>>>
>> >> >> >>>>> Nir, Amit, can you comment on this?
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>> The error mentioned here is not a vdsm error but a warning about
>> >> >> >>>> storage accessibility. We should convert the tracebacks to warnings.
>> >> >> >>>>
>> >> >> >>>> The reason for such an issue can be a misconfigured network
>> >> >> >>>> (maybe the network team is testing negative flows?),
>> >> >> >>>
>> >> >> >>>
>> >> >> >>> No.
>> >> >> >>>
>> >> >> >>>>
>> >> >> >>>> or some issue in the NFS server.
>> >> >> >>>>
>> >> >> >>>
>> >> >> >>> The only hint I found is
>> >> >> >>> "Exiting Time2Retain handler because session_reinstatement=1"
>> >> >> >>> but I have no idea what this means or if it is relevant at all.
>> >> >> >>>
>> >> >> >>>>
>> >> >> >>>> One read timeout is not an issue. We have a real issue only if
>> >> >> >>>> we have consistent read timeouts or errors for a couple of
>> >> >> >>>> minutes; after that the engine can deactivate the storage domain,
>> >> >> >>>> or some hosts, if only these hosts are having trouble accessing
>> >> >> >>>> storage.
>> >> >> >>>>
>> >> >> >>>> In OST we never expect such conditions, since we don't test
>> >> >> >>>> negative flows and we should have good connectivity with the VMs
>> >> >> >>>> running on the same host.
>> >> >> >>>>
>> >> >> >>>
>> >> >> >>> Ack, this seems to be the problem.
>> >> >> >>>
>> >> >> >>>>
>> >> >> >>>> Nir
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>>> [1] https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6134/console
>> >> >> >>>>> [2] https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6134/artifact/exp...
>> >> >> >>>>>>
>> >> >> >>>>>>
>> >> >> >>>>>
>> >> >> >>>>> Marcin, could you please take a look?
>> >> >> >>>>>>
>> >> >> >>>>>>
>> >> >> >>>>>>
>> >> >> >>>>>>>>
>> >> >> >>>>>>>> >
>> >> >> >>>>>>>> > > [3] - https://gerrit.ovirt.org/#/c/104897/
>> >> >> >>>>>>>> > >
>> >> >> >>>>>>>> > > > >
>> >> >> >>>>>>>> > > > > >> Who installs this rpm in OST?
>> >> >> >>>>>>>> > > > > >
>> >> >> >>>>>>>> > > > > > I do not understand the question.
>> >> >> >>>>>>>> > > > > >
>> >> >> >>>>>>>> > > > > >> > [...]
>> >> >> >>>>>>>> > > > > >> >
>> >> >> >>>>>>>> > > > > >> > See [2] for full error.
>> >> >> >>>>>>>> > > > > >> >
>> >> >> >>>>>>>> > > > > >> > Can someone please take a look?
>> >> >> >>>>>>>> > > > > >> > Thanks
>> >> >> >>>>>>>> > > > > >> > Vojta
>> >> >> >>>>>>>> > > > > >> >
>> >> >> >>>>>>>> > > > > >> > [1] https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6128/
>> >> >> >>>>>>>> > > > > >> > [2] https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6128/artifact/exported-artifacts/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-engine/_var_log/ovirt-engine/engine.log
>> >> >> >>>>>>>> >
>> >> >> >>>>>>>>
>> >> >> >>>>>
>> >> >> >>>>>
>> >> >> >>>>> --
>> >> >> >>>>> Martin Perina
>> >> >> >>>>> Manager, Software Engineering
>> >> >> >>>>> Red Hat Czech s.r.o.
>> >> >> >>>>>
>> >> >> >>>>>
>> >> >>
>> >>
>>