On Tue, Nov 27, 2018 at 9:34 AM Sahina Bose <sabose@redhat.com> wrote:
On Tue, Nov 13, 2018 at 4:46 PM fsoyer <fsoyer@systea.fr> wrote:
1, 'Read timeout') - indicates that there was no response from storage
within 32s (I think this is the sanlock read timeout? Denis? Nir?)

This:

> 2018-11-11 14:33:49,450+0100 ERROR (check/loop) [storage.Monitor] Error checking path /rhev/data-center/mnt/glusterSD/victor.local.systea.fr:_DATA02/ffc53fd8-c5d1-4070-ae51-2e91835cd937/dom_md/metadata (monitor:498)
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line 496, in _pathChecked
>     delay = result.delay()
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/check.py", line 391, in delay
>     raise exception.MiscFileReadException(self.path, self.rc, self.err)
> MiscFileReadException: Internal file read failure: (u'/rhev/data-center/mnt/glusterSD/victor.local.systea.fr:_DATA02/ffc53fd8-c5d1-4070-ae51-2e91835cd937/dom_md/metadata', 1, 'Read timeout')

Means that reading from storage timed out after 10 seconds.

See

We immediately change the storage domain to INVALID:

> 2018-11-11 14:33:49,450+0100 INFO  (check/loop) [storage.Monitor] Domain ffc53fd8-c5d1-4070-ae51-2e91835cd937 became INVALID (monitor:469)

When the next check succeeds, we move the status back to VALID and resume
paused VMs using this storage domain.
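
To make the status flow concrete, here is a rough sketch of the transitions described above. This is not the actual vdsm code: the DomainStatus class, its path_checked method and the events list are hypothetical names used only for illustration. A failed check moves the domain to INVALID, and the next successful check moves it back to VALID.

```python
VALID, INVALID = "VALID", "INVALID"


class DomainStatus:
    """Hypothetical stand-in for per-domain status tracking (not vdsm's class)."""

    def __init__(self):
        self.status = None  # unknown until the first check completes
        self.events = []    # recorded (old, new) status transitions

    def path_checked(self, error):
        # Any check error (e.g. 'Read timeout') makes the domain INVALID;
        # a successful check makes it VALID again.
        new = INVALID if error else VALID
        if new != self.status:
            self.events.append((self.status, new))
            self.status = new


mon = DomainStatus()
for error in [None, "Read timeout", "Read timeout", None]:
    mon.path_checked(error)
```

In this sketch a repeated failure does not produce a second transition, so only the changes to INVALID and back to VALID are recorded.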

Once we get a timeout, you will see this warning every 10 seconds until the read completes:

> 2018-11-11 14:33:59,451+0100 WARN  (check/loop) [storage.check] Checker u'/rhev/data-center/mnt/glusterSD/victor.local.systea.fr:_DATA02/ffc53fd8-c5d1-4070-ae51-2e91835cd937/dom_md/metadata' is blocked for 20.00 seconds (check:282)

See
https://github.com/oVirt/vdsm/blob/9e80801f05a3e4033f51eb8f629f62fe715d0cb9/lib/vdsm/storage/check.py#L280
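
The timing of these warnings can be sketched as follows. This is only an illustration inferred from the log timestamps above (timeout reported 10 seconds after the read started, first "blocked" warning at 20.00 seconds), not the logic in check.py; the function name and the CHECK_INTERVAL constant are assumptions.

```python
CHECK_INTERVAL = 10.0  # seconds; assumed from the log timestamps above


def blocked_warnings(read_duration, interval=CHECK_INTERVAL):
    """Return the elapsed times at which a pending read would be reported.

    Assumption: the first tick after the read starts reports the timeout
    itself, and every later tick that still finds the read pending reports
    "is blocked for N seconds".
    """
    warnings = []
    elapsed = 2 * interval
    while elapsed < read_duration:
        warnings.append(elapsed)
        elapsed += interval
    return warnings


# A read that takes 35 seconds to complete is reported as blocked twice.
stuck_read = blocked_warnings(35.0)
```

Under these assumptions, a read that completes within 20 seconds produces the timeout error but no "blocked" warning.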

These timeouts are not related to sanlock, but you will probably see similar
timeouts in sanlock.log, because both vdsm and sanlock use a read timeout of
10 seconds.

Nir