
Hello,

I have a 3-node HCI cluster with GlusterFS, running oVirt 4.4.9.5-1. In the last 2 weeks I have experienced 2 outages in which the HE and all or some VMs were restarted. Digging through the logs, I can see that sanlock cannot renew its leases, which leads to the VMs being killed, as is described very well in [1].

It looks to me like a hardware issue on one of the hosts, but I cannot find which one. For example, today's outage restarted VMs on hosts 1 and 2 but not on host 3. There are these sanlock lines in /var/log/messages on host 2 (ovirt-hci02):

Jan 13 08:27:25 ovirt-hci02 sanlock[1263]: 2022-01-13 08:27:25 1416706 [341378]: s7 delta_renew read timeout 10 sec offset 0 /rhev/data-center/mnt/glusterSD/10.0.4.11:_vms/6de5ae6d-c7cc-4292-bdbf-10495a38837b/dom_md/ids
Jan 13 08:28:59 ovirt-hci02 sanlock[1263]: 2022-01-13 08:28:59 1416800 [341257]: write_sectors delta_leader offset 1024 rv -202 /rhev/data-center/mnt/glusterSD/10.0.4.11:_engine/816a3d0b-2e10-4900-b3cb-4a9b5cd0dd5d/dom_md/ids
Jan 13 08:29:27 ovirt-hci02 sanlock[1263]: 2022-01-13 08:29:27 1416828 [4189968]: write_sectors delta_leader offset 1024 rv -202 /rhev/data-center/mnt/glusterSD/10.0.4.11:_engine/816a3d0b-2e10-4900-b3cb-4a9b5cd0dd5d/dom_md/ids

but not on hosts 1 and 3. Could this indicate a storage-related problem on host 1? Could you please suggest a further or better debugging approach?

Thanks a lot,
Jiri

[1] https://www.ovirt.org/develop/developer-guide/vdsm/sanlock.html
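
P.S. To try to narrow down which host's mount goes slow, I am thinking of running a small probe like the sketch below on all three hosts in parallel and comparing its timestamps against the sanlock messages at the next outage. It is not sanlock's exact I/O pattern, just a rough timing of a similar read (an O_DIRECT read of the first block of the lease file); the path is copied from the logs above, and the 5-second interval and 1-second warning threshold are arbitrary values I picked myself.

#!/usr/bin/env python3
"""Rough read-latency probe for the sanlock 'ids' lease file.

Run the same script on all hosts in parallel; whichever host starts
reporting slow or failed reads first is the one to look at.
"""
import mmap
import os
import time

# Path copied from the sanlock log lines above; adjust per storage domain.
IDS_PATH = ("/rhev/data-center/mnt/glusterSD/10.0.4.11:_vms/"
            "6de5ae6d-c7cc-4292-bdbf-10495a38837b/dom_md/ids")
BLOCK = 4096        # O_DIRECT needs an aligned, block-sized buffer
INTERVAL = 5        # seconds between probes (my arbitrary choice)
WARN_AT = 1.0       # flag reads slower than 1 s (sanlock's timeout is 10 s)

def probe_once(path, buf):
    """Time a single O_DIRECT read of the first block of the lease file."""
    fd = os.open(path, os.O_RDONLY | os.O_DIRECT)
    try:
        start = time.monotonic()
        os.preadv(fd, [buf], 0)    # read BLOCK bytes at offset 0
        return time.monotonic() - start
    finally:
        os.close(fd)

def main():
    # An anonymous mmap gives a page-aligned buffer, which O_DIRECT requires.
    buf = mmap.mmap(-1, BLOCK)
    while True:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        try:
            elapsed = probe_once(IDS_PATH, buf)
            mark = "  <-- SLOW" if elapsed > WARN_AT else ""
            print("%s read %.0f ms%s" % (stamp, elapsed * 1000, mark),
                  flush=True)
        except OSError as err:
            # O_DIRECT may also fail outright on some FUSE/gluster configs.
            print("%s read FAILED: %s" % (stamp, err), flush=True)
        time.sleep(INTERVAL)

if __name__ == "__main__":
    main()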