
On Mon, Feb 1, 2021 at 5:23 PM Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Mon, Feb 1, 2021 at 4:09 PM Nir Soffer <nsoffer@redhat.com> wrote:
...
For 120 seconds, you likely need
sanlock:io_timeout=20 no_path_retry=32
Shouldn't the above values be for a 160-second timeout? I need 120.
120 seconds for sanlock means that sanlock will expire the lease exactly 120 seconds after the last successful lease renewal. Sanlock cannot exceed this deadline, since other hosts assume this timeout when acquiring a lease from a "dead" host.

When using a 15-second io timeout, sanlock renews the lease every 30 seconds. The best-case flow is:

 00  sanlock renewal succeeds
 01  storage fails
 30  sanlock tries to renew lease 1/3 (timeout=15)
 45  sanlock renewal times out
 60  sanlock tries to renew lease 2/3 (timeout=15)
 75  sanlock renewal times out
 90  sanlock tries to renew lease 3/3 (timeout=15)
105  sanlock renewal times out
120  sanlock expires the lease, kills the VM/vdsm
121  storage is back

If you use a 20-second io timeout, sanlock renews every 40 seconds. The best-case flow is:

 00  sanlock renewal succeeds
 01  storage fails
 40  sanlock tries to renew lease 1/3 (timeout=20)
 60  sanlock renewal times out
 80  sanlock tries to renew lease 2/3 (timeout=20)
100  sanlock renewal times out
120  sanlock tries to renew lease 3/3 (timeout=20)
121  storage is back
122  sanlock renewal succeeds

But we also need to consider the worst-case flow:

 00  sanlock renewal succeeds
 39  storage fails
 40  sanlock tries to renew lease 1/3 (timeout=20)
 60  sanlock renewal times out
 80  sanlock tries to renew lease 2/3 (timeout=20)
100  sanlock renewal times out
120  sanlock tries to renew lease 3/3 (timeout=20)
140  sanlock renewal times out
159  storage is back
160  sanlock expires the lease, kills the VM/vdsm etc.

So even with a 20-second io timeout, a 120-second outage may not be survived. In practice we can assume that the outage starts sometime in the middle between sanlock renewals, so the flow would be:

 00  sanlock renewal succeeds
 20  storage fails
 40  sanlock tries to renew lease 1/3 (timeout=20)
 60  sanlock renewal times out
 80  sanlock tries to renew lease 2/3 (timeout=20)
100  sanlock renewal times out
120  sanlock tries to renew lease 3/3 (timeout=20)
140  storage is back
140  sanlock renewal succeeds
160  sanlock would have expired the lease, killing the VM/vdsm etc.
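The flows above follow from a simple schedule, which can be sketched as code. This is a model for reasoning about the timelines in this thread, not sanlock's actual implementation: it assumes a renewal succeeded at t=0, attempts every 2*io_timeout seconds, a per-attempt timeout of io_timeout, lease expiry 8*io_timeout seconds after the last success, and an outage that begins before the first attempt completes.

```python
def sanlock_timeline(io_timeout, outage_start, outage_end):
    """Sketch of the sanlock renewal flows above (not real sanlock code).

    outage_start is kept for readability; it is assumed to fall
    before the first renewal attempt's completion.
    """
    expiry = 8 * io_timeout          # hard deadline after last success
    attempt = 2 * io_timeout         # first renewal attempt
    while attempt < expiry:
        if outage_end <= attempt + io_timeout:
            # storage came back within this attempt's I/O window
            return ("renewed", max(attempt, outage_end))
        attempt += 2 * io_timeout    # this attempt timed out; try again
    return ("expired", expiry)

# Best case from the thread, io_timeout=20: outage from t=1 to t=121
print(sanlock_timeline(20, 1, 121))    # ('renewed', 121)
# Worst case, same io_timeout: outage from t=39 to t=159
print(sanlock_timeline(20, 39, 159))   # ('expired', 160)
# io_timeout=15 cannot survive a 120-second outage
print(sanlock_timeline(15, 1, 121))    # ('expired', 120)
```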
So I would start with a 20-second io timeout, and increase it if needed.

These flows assume that the multipath timeout is configured properly. If multipath uses too short a timeout, it will fail sanlock's renewal I/O immediately instead of queuing it. I also did not add the time needed to detect that storage is available again: multipath checks paths every 5 seconds (polling_interval), so this may add up to 5 seconds from the time the storage is up until multipath detects it and sends the queued I/O.

I think the current way sanlock works is not helpful for dealing with long outages on the storage side. If we could keep io_timeout constant (e.g. 10 seconds) and change the number of retries instead, this would work better and be easier to predict. Assuming we could use:

io_timeout = 10
renewal_retries = 8

the worst case would be:

 00  sanlock renewal succeeds
 19  storage fails
 20  sanlock tries to renew lease 1/7 (timeout=10)
 30  sanlock renewal times out
 40  sanlock tries to renew lease 2/7 (timeout=10)
 50  sanlock renewal times out
 60  sanlock tries to renew lease 3/7 (timeout=10)
 70  sanlock renewal times out
 80  sanlock tries to renew lease 4/7 (timeout=10)
 90  sanlock renewal times out
100  sanlock tries to renew lease 5/7 (timeout=10)
110  sanlock renewal times out
120  sanlock tries to renew lease 6/7 (timeout=10)
130  sanlock renewal times out
139  storage is back
140  sanlock tries to renew lease 7/7 (timeout=10)
140  sanlock renewal succeeds

David, what do you think?
...
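The proposed scheme can be sketched the same way, with the retry count as a parameter. This is a model of the hypothetical renewal_retries proposal above (sanlock does not actually expose such a knob); note that retries=4 reduces to the current behaviour, with expiry at 8*io_timeout.

```python
def lease_timeline(io_timeout, retries, outage_start, outage_end):
    """Hypothetical fixed-io_timeout scheme from the proposal above.

    The lease expires 2*io_timeout*retries seconds after the last
    successful renewal; attempts run every 2*io_timeout seconds.
    A sketch for reasoning, not real sanlock code. outage_start is
    assumed to precede the first attempt's completion.
    """
    expiry = 2 * io_timeout * retries
    for i in range(1, retries):
        attempt = 2 * io_timeout * i
        if outage_end <= attempt + io_timeout:
            # storage came back within this attempt's I/O window
            return ("renewed", max(attempt, outage_end))
    return ("expired", expiry)

# Proposed worst case: io_timeout=10, 8 retries, outage from t=19 to t=139
print(lease_timeline(10, 8, 19, 139))    # ('renewed', 140)
# Current sanlock behaviour is the retries=4 special case:
print(lease_timeline(20, 4, 39, 159))    # ('expired', 160)
```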
On another host with same config (other luns on the same storage), if I run:
multipath reconfigure -v4 > /tmp/multipath_reconfigure_v4.txt 2>&1
I get this: https://drive.google.com/file/d/1VkezFkT9IwsrYD8LoIp4-Q-j2X1dN_qR/view?usp=s...
Is there anything important inside, as far as the path retry settings are concerned?
I don't see anything about no_path_retry there; maybe logging was changed, or these are not the right flags to see all the info during reconfiguration. I think "multipathd show config" is the canonical way to look at the current configuration. It shows the actual values multipath will use at runtime, after the local configuration has been applied on top of the built-in configuration. Nir
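For example, on a host with multipathd running (the grep filter is just an illustration; the option names are the standard multipath-tools ones):

```shell
# Show the merged runtime configuration: built-in defaults plus
# /etc/multipath.conf and any conf.d drop-ins.
multipathd show config

# Narrow the output down to the retry/queueing settings:
multipathd show config | grep -E 'no_path_retry|polling_interval'
```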