After upgrading from 4.3.9 to 4.3.10, we started to have VMs freeze/pause during our
nightly after-hours backup runs. We suspect the increased load from the backups exposes
the issue.
We reverted the hosts to 4.3.9 and the problem went away. After some testing with a
single host on 4.3.10, I am seeing the following errors in sanlock.log:
2020-10-14 16:06:12 2939 [5724]: 95bd5893 aio timeout RD 0x7f7f9c0008c0:0x7f7f9c0008d0:0x7f7fa9efb000 ioto 10 to_count 1
2020-10-14 16:06:12 2939 [5724]: s2 delta_renew read timeout 10 sec offset 0 /dev/95bd5893-83d4-42f2-b333-1c65226f1d09/ids
2020-10-14 16:06:12 2939 [5724]: s2 renewal error -202 delta_length 10 last_success 2908
2020-10-14 16:06:14 2941 [5724]: 95bd5893 aio collect RD 0x7f7f9c0008c0:0x7f7f9c0008d0:0x7f7fa9efb000 result 1048576:0 match reap
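To see how often these renewals fail over a whole backup window, something like the
following Python sketch could pull the "renewal error" lines out of sanlock.log. It
assumes each entry is on a single line, and that the number after the wall-clock time
(2939 above) is a timestamp in seconds on the same clock as last_success. The names
find_renewal_errors and RenewalError are made up for this example and are not part of
sanlock.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# Matches lines such as:
# 2020-10-14 16:06:12 2939 [5724]: s2 renewal error -202 delta_length 10 last_success 2908
RENEWAL_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<stamp>\d+) \[(?P<tid>\d+)\]: (?P<space>\S+) renewal error "
    r"(?P<code>-?\d+) delta_length (?P<delta>\d+) last_success (?P<last>\d+)"
)


class RenewalError(NamedTuple):
    when: str                # wall-clock date and time from the log line
    lockspace: str           # short lockspace label, e.g. "s2"
    code: int                # error code as logged, e.g. -202
    delta_length: int        # delta_length value as logged
    since_last_success: int  # log timestamp minus last_success (assumed seconds)


def find_renewal_errors(lines: Iterable[str]) -> Iterator[RenewalError]:
    """Yield one RenewalError per 'renewal error' line; skip everything else."""
    for line in lines:
        m = RENEWAL_RE.match(line.strip())
        if m is None:
            continue
        yield RenewalError(
            when=m["date"] + " " + m["time"],
            lockspace=m["space"],
            code=int(m["code"]),
            delta_length=int(m["delta"]),
            since_last_success=int(m["stamp"]) - int(m["last"]),
        )
```

To run it against a real log, pass an open file object, for example
`for e in find_renewal_errors(open(path)): print(e)`, where `path` points at the host's
sanlock.log. A growing since_last_success value across consecutive errors would suggest
that storage reads stay slow for the whole backup rather than timing out once.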
Note that the engine is still at 4.3.10. We also see the message below in messages:
Oct 14 16:09:20 HOSTNAME kernel: perf: interrupt took too long (2509 > 2500), lowering kernel.perf_event_max_sample_rate to 79000
I guess my question is twofold. First, how do I go about troubleshooting this further?
Second, would it be better, or even possible, to move to 4.4.2 (or 4.4.3 when released)
instead? For that migration, do all hosts have to be on 4.3.10, or can the hosts stay on
4.3.9 while the engine is on 4.3.10?
Thank you!