
On Tue, Nov 27, 2018 at 9:34 AM Sahina Bose <sabose@redhat.com> wrote:
On Tue, Nov 13, 2018 at 4:46 PM fsoyer <fsoyer@systea.fr> wrote:
1, 'Read timeout')
- indicates that there was no response from storage within 32s (I think this is the sanlock read timeout? Denis? Nir?)
This:
2018-11-11 14:33:49,450+0100 ERROR (check/loop) [storage.Monitor] Error checking path /rhev/data-center/mnt/glusterSD/victor.local.systea.fr:_DATA02/ffc53fd8-c5d1-4070-ae51-2e91835cd937/dom_md/metadata (monitor:498)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line 496, in _pathChecked
    delay = result.delay()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/check.py", line 391, in delay
    raise exception.MiscFileReadException(self.path, self.rc, self.err)
MiscFileReadException: Internal file read failure: (u'/rhev/data-center/mnt/glusterSD/victor.local.systea.fr:_DATA02/ffc53fd8-c5d1-4070-ae51-2e91835cd937/dom_md/metadata', 1, 'Read timeout')
Means that reading from storage timed out after 10 seconds. See
https://github.com/oVirt/vdsm/blob/9e80801f05a3e4033f51eb8f629f62fe715d0cb9/...

We immediately change the storage domain to INVALID:
2018-11-11 14:33:49,450+0100 INFO (check/loop) [storage.Monitor] Domain ffc53fd8-c5d1-4070-ae51-2e91835cd937 became INVALID (monitor:469)
When the next check succeeds, we move the status back to VALID and resume paused VMs using this storage domain.

Once we get a timeout, until the blocked read completes, you will see this warning every 10 seconds:
2018-11-11 14:33:59,451+0100 WARN (check/loop) [storage.check] Checker u'/rhev/data-center/mnt/glusterSD/victor.local.systea.fr:_DATA02/ffc53fd8-c5d1-4070-ae51-2e91835cd937/dom_md/metadata' is blocked for 20.00 seconds (check:282)
See
https://github.com/oVirt/vdsm/blob/9e80801f05a3e4033f51eb8f629f62fe715d0cb9/...

These timeouts are not related to sanlock, but you will probably see similar timeouts in sanlock.log, because both vdsm and sanlock use a read timeout of 10 seconds.

Nir
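
To find these events in a vdsm log, something like the following may help. It is only a sketch: the regular expressions are derived from the two log lines quoted above (the "became INVALID" message and the "is blocked for" warning), so other vdsm versions may format these messages differently. The function name scan_monitor_events is made up for this example and is not part of vdsm.

```python
import re

# Patterns are assumptions based only on the log lines quoted in this
# thread; they are not taken from the vdsm source.
TIMESTAMP = r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}[+-]\d{4})"

INVALID_RE = re.compile(
    TIMESTAMP
    + r" INFO .*\[storage\.Monitor\] Domain (?P<domain>\S+) became INVALID"
)

BLOCKED_RE = re.compile(
    TIMESTAMP
    + r" WARN .*\[storage\.check\] Checker u?'(?P<path>[^']+)'"
    + r" is blocked for (?P<secs>\d+\.\d+) seconds"
)


def scan_monitor_events(lines):
    """Return a list of tuples describing storage monitor events.

    ("invalid", timestamp, domain_uuid) for a domain that became INVALID,
    ("blocked", timestamp, path, seconds) for a blocked checker warning.
    Lines that match neither pattern are ignored.
    """
    events = []
    for line in lines:
        m = INVALID_RE.search(line)
        if m:
            events.append(("invalid", m.group("ts"), m.group("domain")))
            continue
        m = BLOCKED_RE.search(line)
        if m:
            events.append(
                ("blocked", m.group("ts"), m.group("path"),
                 float(m.group("secs")))
            )
    return events
```

Since the function accepts any iterable of lines, you can pass it an open file object for your vdsm log (typically /var/log/vdsm/vdsm.log, adjust for your host). Comparing the timestamps of these events with the entries in sanlock.log should show whether both saw the storage stall at the same time.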