On Mon, Feb 1, 2021 at 8:37 PM Gianluca Cecchi
<gianluca.cecchi(a)gmail.com> wrote:
> On Mon, Feb 1, 2021 at 6:51 PM David Teigland <teigland(a)redhat.com> wrote:
> >
> > On Mon, Feb 01, 2021 at 07:18:24PM +0200, Nir Soffer wrote:
> > > Assuming we could use:
> > >
> > > io_timeout = 10
> > > renewal_retries = 8
> > >
> > > The worst case would be:
> > >
> > > 00 sanlock renewal succeeds
> > > 19 storage fails
> > > 20 sanlock try to renew lease 1/7 (timeout=10)
> > > 30 sanlock renewal timeout
> > > 40 sanlock try to renew lease 2/7 (timeout=10)
> > > 50 sanlock renewal timeout
> > > 60 sanlock try to renew lease 3/7 (timeout=10)
> > > 70 sanlock renewal timeout
> > > 80 sanlock try to renew lease 4/7 (timeout=10)
> > > 90 sanlock renewal timeout
> > > 100 sanlock try to renew lease 5/7 (timeout=10)
> > > 110 sanlock renewal timeout
> > > 120 sanlock try to renew lease 6/7 (timeout=10)
> > > 130 sanlock renewal timeout
> > > 139 storage is back
> > > 140 sanlock try to renew lease 7/7 (timeout=10)
> > > 140 sanlock renewal succeeds
> > >
> > > David, what do you think?
> >
> > I wish I could say, it would require some careful study to know how
> > feasible it is. The timings are intricate and fundamental to correctness
> > of the algorithm.
> > Dave
> >
> I was taking values also reading this:
> https://access.redhat.com/solutions/5152311
> Perhaps it needs some review?

Yes, I think we need to update the effective timeout field. The value
describes how sanlock and multipath configuration are related, but it
does not represent the maximum outage time.
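
To make the difference concrete, here is a rough Python sketch that
reproduces the worst-case timeline quoted above and reports the outage it
implies. It is only an illustration, not sanlock code: the function name is
made up, and the model assumes a renewal attempt every 2 * io_timeout
seconds, a timeout after io_timeout seconds, storage failing one second
before the first failed attempt and returning one second before the last
attempt. The 7 attempts follow the "1/7 ... 7/7" numbering in the timeline.

```python
def worst_case_timeline(io_timeout=10, attempts=7):
    """Return (events, outage_seconds) for the worst case quoted above.

    Assumptions (inferred from the quoted timeline, not from sanlock itself):
    - a renewal is attempted every 2 * io_timeout seconds
    - a failed attempt times out after io_timeout seconds
    - storage fails 1 second before the first failed attempt
    - storage returns 1 second before the last attempt
    """
    interval = 2 * io_timeout
    fail_time = interval - 1
    last_attempt = attempts * interval
    back_time = last_attempt - 1

    events = [(0, "sanlock renewal succeeds"), (fail_time, "storage fails")]

    # All attempts except the last one hit the outage and time out.
    for n in range(1, attempts):
        start = n * interval
        events.append((start, f"renew attempt {n}/{attempts} (timeout={io_timeout})"))
        events.append((start + io_timeout, "sanlock renewal timeout"))

    events.append((back_time, "storage is back"))
    events.append((last_attempt, f"renew attempt {attempts}/{attempts} (timeout={io_timeout})"))
    events.append((last_attempt, "sanlock renewal succeeds"))

    return events, back_time - fail_time


if __name__ == "__main__":
    events, outage = worst_case_timeline()
    for when, what in events:
        print(f"{when:3d} {what}")
    print(f"storage outage survived: {outage} seconds")
```

Under these assumptions the storage is unavailable from second 19 to
second 139, so the outage survived is 120 seconds.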