On Mon, Nov 25, 2019 at 7:15 PM Dominik Holler <dholler@redhat.com> wrote:
>
> On Mon, Nov 25, 2019 at 6:03 PM Nir Soffer <nsoffer@redhat.com> wrote:
>>
>> On Mon, Nov 25, 2019 at 6:48 PM Dominik Holler <dholler@redhat.com> wrote:
>> >
>> > On Mon, Nov 25, 2019 at 5:16 PM Nir Soffer <nsoffer@redhat.com> wrote:
>> >>
>> >> On Mon, Nov 25, 2019 at 6:05 PM Dominik Holler <dholler@redhat.com> wrote:
>> >> >
>> >> > On Mon, Nov 25, 2019 at 4:50 PM Nir Soffer <nsoffer@redhat.com> wrote:
>> >> >>
>> >> >> On Mon, Nov 25, 2019 at 11:00 AM Dominik Holler <dholler@redhat.com> wrote:
>> >> >> >
>> >> >> > On Fri, Nov 22, 2019 at 8:57 PM Dominik Holler <dholler@redhat.com> wrote:
>> >> >> >>
>> >> >> >> On Fri, Nov 22, 2019 at 5:54 PM Dominik Holler <dholler@redhat.com> wrote:
>> >> >> >>>
>> >> >> >>> On Fri, Nov 22, 2019 at 5:48 PM Nir Soffer <nsoffer@redhat.com> wrote:
>> >> >> >>>>
>> >> >> >>>> On Fri, Nov 22, 2019, 18:18 Marcin Sobczyk <msobczyk@redhat.com> wrote:
>> >> >> >>>>>
>> >> >> >>>>> On 11/22/19 4:54 PM, Martin Perina wrote:
>> >> >> >>>>>
>> >> >> >>>>> On Fri, Nov 22, 2019 at 4:43 PM Dominik Holler <dholler@redhat.com> wrote:
>> >> >> >>>>>>
>> >> >> >>>>>> On Fri, Nov 22, 2019 at 12:17 PM Dominik Holler <dholler@redhat.com> wrote:
>> >> >> >>>>>>>
>> >> >> >>>>>>> On Fri, Nov 22, 2019 at 12:00 PM Miguel Duarte de Mora Barroso <mdbarroso@redhat.com> wrote:
>> >> >> >>>>>>>>
>> >> >> >>>>>>>> On Fri, Nov 22, 2019 at 11:54 AM Vojtech Juranek <vjuranek@redhat.com> wrote:
>> >> >> >>>>>>>> >
>> >> >> >>>>>>>> > On Friday, November 22, 2019 9:56:56 CET Miguel Duarte de Mora Barroso wrote:
>> >> >> >>>>>>>> > > On Fri, Nov 22, 2019 at 9:49 AM Vojtech Juranek <vjuranek@redhat.com> wrote:
>> >> >> >>>>>>>> > > >
>> >> >> >>>>>>>> > > > On Friday, November 22, 2019 9:41:26 CET Dominik Holler wrote:
>> >> >> >>>>>>>> > > > > On Fri, Nov 22, 2019 at 8:40 AM Dominik Holler <dholler@redhat.com> wrote:
>> >> >> >>>>>>>> > > > > >
>> >> >> >>>>>>>> > > > > > On Thu, Nov 21, 2019 at 10:54 PM Nir Soffer <nsoffer@redhat.com> wrote:
>> >> >> >>>>>>>> > > > > >>
>> >> >> >>>>>>>> > > > > >> On Thu, Nov 21, 2019 at 11:24 PM Vojtech Juranek <vjuranek@redhat.com> wrote:
>> >> >> >>>>>>>> > > > > >>
>> >> >> >>>>>>>> > > > > >> > Hi,
>> >> >> >>>>>>>> > > > > >> > OST fails (see e.g. [1]) in 002_bootstrap.check_update_host. It fails with
>> >> >> >>>>>>>> > > > > >> > FAILED! => {"changed": false, "failures": [], "msg": "Depsolve Error occured: \n Problem 1: cannot install the best update candidate for package vdsm-network-4.40.0-1236.git63ea8cb8b.el8.x86_64\n - nothing provides nmstate needed by vdsm-network-4.40.0-1271.git524e08c8a.el8.x86_64\n Problem 2: package vdsm-python-4.40.0-1271.git524e08c8a.el8.noarch requires vdsm-network = 4.40.0-1271.git524e08c8a.el8, but none of the providers can be installed\n - cannot install the best update candidate for package vdsm-python-4.40.0-1236.git63ea8cb8b.el8.noarch\n - nothing provides nmstate needed by vdsm-network-4.40.0-1271.git524e08c8a.el8.x86_64\n
>> >> >> >>>>>>>> > > > > >>
>> >> >> >>>>>>>> > > > > >> nmstate should be provided by copr repo enabled by ovirt-release-master.
>> >> >> >>>>>>>> > > > > >
>> >> >> >>>>>>>> > > > > > I re-triggered as https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6131
>> >> >> >>>>>>>> > > > > > maybe https://gerrit.ovirt.org/#/c/104825/ was missing
>> >> >> >>>>>>>> > > > >
>> >> >> >>>>>>>> > > > > Looks like https://gerrit.ovirt.org/#/c/104825/ is ignored by OST.
>> >> >> >>>>>>>> > > >
>> >> >> >>>>>>>> > > > maybe not. You re-triggered with [1], which really missed this patch.
>> >> >> >>>>>>>> > > > I did a rebase and now running with this patch in build #6132 [2]. Let's wait for it to see if gerrit #104825 helps.
>> >> >> >>>>>>>> > > >
>> >> >> >>>>>>>> > > > [1] https://jenkins.ovirt.org/job/standard-manual-runner/909/
>> >> >> >>>>>>>> > > > [2] https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6132/
>> >> >> >>>>>>>> > > >
>> >> >> >>>>>>>> > > > > Miguel, do you think merging
>> >> >> >>>>>>>> > > > > https://gerrit.ovirt.org/#/c/104495/15/common/yum-repos/ovirt-master-host-cq.repo.in
>> >> >> >>>>>>>> > > > > would solve this?
>> >> >> >>>>>>>> > >
>> >> >> >>>>>>>> > > I've split the patch Dominik mentions above in two, one of them adding
>> >> >> >>>>>>>> > > the nmstate / networkmanager copr repos - [3].
>> >> >> >>>>>>>> > >
>> >> >> >>>>>>>> > > Let's see if it fixes it.
>> >> >> >>>>>>>> >
>> >> >> >>>>>>>> > it fixes original issue, but OST still fails in 098_ovirt_provider_ovn.use_ovn_provider:
>> >> >> >>>>>>>> >
>> >> >> >>>>>>>> > https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6134
>> >> >> >>>>>>>>
>> >> >> >>>>>>>> I think Dominik was looking into this issue; +Dominik Holler please confirm.
>> >> >> >>>>>>>>
>> >> >> >>>>>>>> Let me know if you need any help Dominik.
>> >> >> >>>>>>>
>> >> >> >>>>>>> Thanks.
>> >> >> >>>>>>> The problem is that the hosts lost connection to storage:
>> >> >> >>>>>>> https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6134/artifact/exported-artifacts/test_logs/basic-suite-master/post-098_ovirt_provider_ovn.py/lago-basic-suite-master-host-0/_var_log/vdsm/vdsm.log :
>> >> >> >>>>>>>
>> >> >> >>>>>>> 2019-11-22 05:39:12,326-0500 DEBUG (jsonrpc/5) [common.commands] /usr/bin/taskset --cpu-list 0-1 /usr/bin/sudo -n /sbin/lvm vgs --config 'devices { preferred_names=["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter=["a|^/dev/mapper/36001405107ea8b4e3ac4ddeb3e19890f$|^/dev/mapper/360014054924c91df75e41178e4b8a80c$|^/dev/mapper/3600140561c0d02829924b77ab7323f17$|^/dev/mapper/3600140582feebc04ca5409a99660dbbc$|^/dev/mapper/36001405c3c53755c13c474dada6be354$|", "r|.*|"] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min=50 retain_days=0 }' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name (cwd None) (commands:153)
>> >> >> >>>>>>> 2019-11-22 05:39:12,415-0500 ERROR (check/loop) [storage.Monitor] Error checking path /rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/d10879c6-8de1-40ba-87fa-f447844eed2a/dom_md/metadata (monitor:501)
>> >> >> >>>>>>> Traceback (most recent call last):
>> >> >> >>>>>>>   File "/usr/lib/python3.6/site-packages/vdsm/storage/monitor.py", line 499, in _pathChecked
>> >> >> >>>>>>>     delay = result.delay()
>> >> >> >>>>>>>   File "/usr/lib/python3.6/site-packages/vdsm/storage/check.py", line 391, in delay
>> >> >> >>>>>>>     raise exception.MiscFileReadException(self.path, self.rc, self.err)
>> >> >> >>>>>>> vdsm.storage.exception.MiscFileReadException: Internal file read failure: ('/rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/d10879c6-8de1-40ba-87fa-f447844eed2a/dom_md/metadata', 1, 'Read timeout')
>> >> >> >>>>>>> 2019-11-22 05:39:12,416-0500 INFO (check/loop) [storage.Monitor] Domain d10879c6-8de1-40ba-87fa-f447844eed2a became INVALID (monitor:472)
>> >> >> >>>>>>>
>> >> >> >>>>>>> I failed to reproduce this locally to analyze it; I will try again, any hints welcome.
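For anyone trying to reproduce this outside OST: the "Read timeout" above comes from vdsm's periodic path checker, which does a tiny direct read of the domain metadata file and gives up after a deadline. Below is a rough standalone sketch of the same idea, not vdsm's actual code; the path is taken from the log above, and the 10-second timeout and interval are assumptions. Left running on a host, it shows when the NFS export stops answering:

import subprocess
import time

# Path taken from the vdsm.log snippet above; adjust to the domain you watch.
METADATA = ("/rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/"
            "d10879c6-8de1-40ba-87fa-f447844eed2a/dom_md/metadata")
TIMEOUT = 10   # seconds; assumed read deadline
INTERVAL = 10  # seconds between checks

while True:
    start = time.monotonic()
    try:
        # One small read bypassing the page cache, so a stuck NFS mount
        # cannot be hidden by cached data.
        subprocess.run(
            ["dd", "if=" + METADATA, "of=/dev/null",
             "bs=4096", "count=1", "iflag=direct"],
            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
            check=True, timeout=TIMEOUT)
        print("ok, delay=%.3fs" % (time.monotonic() - start))
    except subprocess.TimeoutExpired:
        print("READ TIMEOUT after %ss" % TIMEOUT)
    except subprocess.CalledProcessError as e:
        print("read failed, rc=%s" % e.returncode)
    time.sleep(INTERVAL)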
>> >> >> >>>>>>
>> >> >> >>>>>> https://gerrit.ovirt.org/#/c/104925/1/ shows that 008_basic_ui_sanity.py triggers the problem.
>> >> >> >>>>>> Is there someone with knowledge about the basic_ui_sanity around?
>> >> >> >>>>>
>> >> >> >>>>> How do you think it's related? By commenting out the ui sanity tests and seeing OST finish successfully?
>> >> >> >>>>>
>> >> >> >>>>> Looking at the 6134 run you were discussing:
>> >> >> >>>>>
>> >> >> >>>>> - timing of the ui sanity set-up [1]:
>> >> >> >>>>>
>> >> >> >>>>> 11:40:20 @ Run test: 008_basic_ui_sanity.py:
>> >> >> >>>>>
>> >> >> >>>>> - timing of first encountered storage error [2]:
>> >> >> >>>>>
>> >> >> >>>>> 2019-11-22 05:39:12,415-0500 ERROR (check/loop) [storage.Monitor] Error checking path /rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/d10879c6-8de1-40ba-87fa-f447844eed2a/dom_md/metadata (monitor:501)
>> >> >> >>>>> Traceback (most recent call last):
>> >> >> >>>>>   File "/usr/lib/python3.6/site-packages/vdsm/storage/monitor.py", line 499, in _pathChecked
>> >> >> >>>>>     delay = result.delay()
>> >> >> >>>>>   File "/usr/lib/python3.6/site-packages/vdsm/storage/check.py", line 391, in delay
>> >> >> >>>>>     raise exception.MiscFileReadException(self.path, self.rc, self.err)
>> >> >> >>>>> vdsm.storage.exception.MiscFileReadException: Internal file read failure: ('/rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/d10879c6-8de1-40ba-87fa-f447844eed2a/dom_md/metadata', 1, 'Read timeout')
>> >> >> >>>>>
>> >> >> >>>>> Timezone difference aside, it seems to me that these storage errors occurred before doing anything ui-related.
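Putting the two timestamps on one clock supports that; a quick sketch, assuming the Jenkins console time above is UTC while vdsm logs host-local time with an explicit offset:

from datetime import datetime, timezone

# vdsm log timestamp (local time with offset) vs. Jenkins console time (assumed UTC).
storage_error = datetime.strptime("2019-11-22 05:39:12 -0500",
                                  "%Y-%m-%d %H:%M:%S %z")
ui_setup = datetime(2019, 11, 22, 11, 40, 20, tzinfo=timezone.utc)

print(storage_error.astimezone(timezone.utc))  # 2019-11-22 10:39:12+00:00
print(ui_setup - storage_error)                # 1:01:08, the error precedes the UI set-up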
>> >> >> >>
>> >> >> >> You are right, a time.sleep(8*60) in https://gerrit.ovirt.org/#/c/104925/2 triggers the issue the same way.
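For context, as far as I understand the reproduction patch, it essentially replaces the UI sanity scenario with nothing but a pause, along the lines of the sketch below (the test name is illustrative, not the exact OST code):

import time

def test_do_nothing_instead_of_ui_sanity():
    # No UI interaction at all; just let the same wall-clock time pass
    # that the UI sanity tests would normally consume.
    time.sleep(8 * 60)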
>> >> >>
>> >> >> So this is a test issue, assuming that the UI tests can complete in less than 8 minutes?
>> >> >>
>> >> >
>> >> > To my eyes this looks like storage just stops working after some time.
>> >> >
>> >> >> >
>> >> >> > Nir or Steve, can you please confirm that this is a storage problem?
>> >> >>
>> >> >> Why do you think we have a storage problem?
>> >> >>
>> >> >
>> >> > I understand from the posted log snippets that they say that the storage is not accessible anymore,
>> >>
>> >> No, so far one read timeout was reported, this does not mean storage is not available anymore.
>> >> It can be a temporary issue that does not harm anything.
>> >>
>> >> > while the host is still responsive.
>> >> > This might be triggered by something outside storage, e.g. the network providing the storage stopped working,
>> >> > but I think a possible next step in analysing this issue would be to find the reason why storage is not happy.
>> >>
>> >
>> > Sounds like there was a miscommunication in this thread.
>> > I try to address all of your points, please let me know if something is missing or not clearly expressed.
>> >
>> >>
>> >> First step is to understand which test fails,
>> >
>> > 098_ovirt_provider_ovn.use_ovn_provider
>> >
>> >>
>> >> and why. This can be done by the owner of the test,
>> >
>> > The test was added by the network team.
>> >
>> >>
>> >> understanding what the test does
>> >
>> > The test tries to add a vNIC.
>> >
>> >>
>> >> and what is the expected system behavior.
>> >
>> > It is expected that adding a vNIC works, because the VM should be up.
>>
>> What was the actual behavior?
>>
>> >> If the owner of the test thinks that the test failed because of a storage issue
>> >
>> > I am not sure who is the owner, but I do.
>>
>> Can you explain how adding a vNIC failed because of a storage issue?
>>
>
> Test fails with:
>
> Cannot add a Network Interface when VM is not Down, Up or Image-Locked.
>
> engine.log says:
> {"jsonrpc": "2.0", "method": "|virt|VM_status|308bd254-9af9-4570-98ea-822609550acf", "params": {"308bd254-9af9-4570-98ea-822609550acf": {"status": "Paused", "pauseCode": "EOTHER", "ioerror": {"alias": "ua-953dd722-5e8b-4b24-bccd-a2a5d5befeb6", "name": "vda", "path": "/rhev/data-center/38c691d4-8556-4882-8f04-a88dff5d0973/bcd1622c-876b-460c-95a7-d09536c42ffe/images/953dd722-5e8b-4b24-bccd-a2a5d5befeb6/dcb5fec4-f219-4d3f-986c-628b0d00b349"}}, "notify_time": 4298388570}}

So you think adding a vNIC failed because the VM was paused?

Yes, because of the error message "Cannot add a Network Interface when VM is not Down, Up or Image-Locked."
> vdsm.log says:
>
> 2019-11-20 10:51:06,026-0500 ERROR (check/loop) [storage.Monitor] Error checking path /rhev/data-center/mnt/192.168.200.4:_exports_nfs_share1/bcd1622c-876b-460c-95a7-d09536c42ffe/dom_md/metadata (monitor:501)
> Traceback (most recent call last):
>   File "/usr/lib/python3.6/site-packages/vdsm/storage/monitor.py", line 499, in _pathChecked
>     delay = result.delay()
>   File "/usr/lib/python3.6/site-packages/vdsm/storage/check.py", line 391, in delay
>     raise exception.MiscFileReadException(self.path, self.rc, self.err)
> vdsm.storage.exception.MiscFileReadException: Internal file read failure: ('/rhev/data-center/mnt/192.168.200.4:_exports_nfs_share1/bcd1622c-876b-460c-95a7-d09536c42ffe/dom_md/metadata', 1, 'Read timeout')

Is this related to the paused vm?

The log entry '{"status": "Paused", "pauseCode": "EOTHER", "ioerror"' makes me think so.
You did not provide a timestamp for the engine event above.

I can't find last week's logs, maybe they have faded out already.
Please find more recent logs in
> ...
>
> 2019-11-20 10:51:56,249-0500 WARN (check/loop) [storage.check] Checker '/rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/64daa060-1d83-46b9-b7e8-72a902e1134b/dom_md/metadata' is blocked for 60.00 seconds (check:282)
> 2019-11-20 10:51:56,885-0500 ERROR (monitor/775b710) [storage.Monitor] Error checking domain 775b7102-7f2c-4eee-a4d0-a41b55451f7e (monitor:427)
> Traceback (most recent call last):
>   File "/usr/lib/python3.6/site-packages/vdsm/storage/monitor.py", line 408, in _checkDomainStatus
>     self.domain.selftest()
>   File "/usr/lib/python3.6/site-packages/vdsm/storage/fileSD.py", line 710, in selftest
>     self.oop.os.statvfs(self.domaindir)
>   File "/usr/lib/python3.6/site-packages/vdsm/storage/outOfProcess.py", line 242, in statvfs
>     return self._iop.statvfs(path)
>   File "/usr/lib/python3.6/site-packages/ioprocess/__init__.py", line 479, in statvfs
>     resdict = self._sendCommand("statvfs", {"path": path}, self.timeout)
>   File "/usr/lib/python3.6/site-packages/ioprocess/__init__.py", line 442, in _sendCommand
>     raise Timeout(os.strerror(errno.ETIMEDOUT))
> ioprocess.Timeout: Connection timed out

This shows that storage was not accessible for 60 seconds (ioprocess uses a 60-second timeout).
A 60-second timeout is bad. If we have leases on this storage domain (e.g. the SPM lease), they will expire 20 seconds after this event and vdsm on the SPM host will be killed.
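To spell out where the 20 seconds come from (my arithmetic, assuming sanlock's default 10-second I/O timeout and a lease expiry of 8 x io_timeout; not something taken from these logs):

# Assumed sanlock defaults, for illustration only.
SANLOCK_IO_TIMEOUT = 10                 # seconds
LEASE_EXPIRY = 8 * SANLOCK_IO_TIMEOUT   # 80 seconds without a successful renewal
checker_blocked = 60                    # seconds, from the WARN line above

print("margin left before the SPM lease expires:",
      LEASE_EXPIRY - checker_blocked, "seconds")   # 20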
Do we have network tests changing the network used by the NFS storage domain before this event?

No.

What were the changes in the network tests or code since OST was successful?

I am not aware of a change which might be relevant.
Maybe the fact that the hosts are on CentOS 8, while the Engine (storage) is on CentOS 7, is relevant.
Also the occurrence of this issue seems not to be 100% deterministic, I guess because it is timing related.
The error is reproducible locally by running OST and just keeping the environment alive after basic-suite-master succeeded.
After some time, the storage becomes inaccessible.
>> Can you explain how adding an 8-minute sleep instead of the UI tests reproduced the issue?
>>
>
> This shows that the issue is not triggered by the UI test, but maybe just by time passing.

Do we run the ovn tests after the UI tests?

>> >> someone from storage can look at this.
>> >
>> > Thanks, I would appreciate this.
>> >
>> >> But the fact that adding a long sleep reproduces the issue means it is not related in any way to storage.
>> >>
>> >> Nir
>> >>
>> >> >> >>>>>
>> >> >> >>>>> I remember talking with Steven Rosenberg on IRC a couple of days ago about some storage metadata issues and he said he got a response from Nir, that "it's a known issue".
>> >> >> >>>>>
>> >> >> >>>>> Nir, Amit, can you comment on this?
>> >> >> >>>>
>> >> >> >>>> The error mentioned here is not a vdsm error but a warning about storage accessibility. We should convert the tracebacks to warnings.
>> >> >> >>>>
>> >> >> >>>> The reason for such an issue can be a misconfigured network (maybe the network team is testing negative flows?),
>> >> >> >>>
>> >> >> >>> No.
>> >> >> >>>
>> >> >> >>>> or some issue in the NFS server.
>> >> >> >>>
>> >> >> >>> The only hint I found is
>> >> >> >>> "Exiting Time2Retain handler because session_reinstatement=1"
>> >> >> >>> but I have no idea what this means or if this is relevant at all.
>> >> >> >>>
>> >> >> >>>> One read timeout is not an issue. We have a real issue only if we have consistent read timeouts or errors for a couple of minutes; after that the engine can deactivate the storage domain, or some hosts if only those hosts are having trouble accessing storage.
>> >> >> >>>>
>> >> >> >>>> In OST we never expect such conditions since we don't test negative flows, and we should have good connectivity with the vms running on the same host.
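If it helps to tell a one-off blip from a sustained outage, here is a rough sketch that pulls the checker complaints out of a vdsm.log and reports how long the bad period lasted (the pattern targets only the messages quoted in this thread; the log path is an assumption):

import re
from datetime import datetime

# Matches the timestamped checker/monitor complaints quoted above, e.g.
#   ... ERROR (check/loop) [storage.Monitor] Error checking path ...
#   ... WARN (check/loop) [storage.check] Checker '...' is blocked for 60.00 seconds ...
PATTERN = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+\S*\s.*"
    r"(Error checking path|Error checking domain|is blocked for)")

events = []
with open("vdsm.log") as log:   # e.g. the file under test_logs/.../_var_log/vdsm/
    for line in log:
        match = PATTERN.match(line)
        if match:
            events.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S"))

if events:
    print("first complaint:", events[0])
    print("last complaint: ", events[-1])
    print("window:", events[-1] - events[0], "across", len(events), "lines")
else:
    print("no storage checker complaints found")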
>> >> >> >>>
>> >> >> >>> Ack, this seems to be the problem.
>> >> >> >>>
>> >> >> >>>> Nir
>> >> >> >>>>
>> >> >> >>>>> [1] https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6134/console
>> >> >> >>>>> [2] https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6134/artifact/exported-artifacts/test_logs/basic-suite-master/post-098_ovirt_provider_ovn.py/lago-basic-suite-master-host-0/_var_log/vdsm/vdsm.log
>> >> >> >>>>>
>> >> >> >>>>> Marcin, could you please take a look?
>> >> >> >>>>>
>> >> >> >>>>>>>>
>> >> >> >>>>>>>> >
>> >> >> >>>>>>>> > > [3] - https://gerrit.ovirt.org/#/c/104897/
>> >> >> >>>>>>>> > >
>> >> >> >>>>>>>> > > > > >> Who installs this rpm in OST?
>> >> >> >>>>>>>> > > > > >
>> >> >> >>>>>>>> > > > > > I do not understand the question.
>> >> >> >>>>>>>> > > > > >
>> >> >> >>>>>>>> > > > > >> > [...]
>> >> >> >>>>>>>> > > > > >> >
>> >> >> >>>>>>>> > > > > >> > See [2] for full error.
>> >> >> >>>>>>>> > > > > >> >
>> >> >> >>>>>>>> > > > > >> > Can someone please take a look?
>> >> >> >>>>>>>> > > > > >> > Thanks
>> >> >> >>>>>>>> > > > > >> > Vojta
>> >> >> >>>>>>>> > > > > >> >
>> >> >> >>>>>>>> > > > > >> > [1] https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6128/
>> >> >> >>>>>>>> > > > > >> > [2] https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6128/artifact/exported-artifacts/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-engine/_var_log/ovirt-engine/engine.log
>> >> >> >>>>>
>> >> >> >>>>> --
>> >> >> >>>>> Martin Perina
>> >> >> >>>>> Manager, Software Engineering
>> >> >> >>>>> Red Hat Czech s.r.o.