
Hi,

for some time now we have been getting error messages from sanlock, and so far I was not able to figure out what exactly they are trying to tell us and, more importantly, whether this is something which can be ignored or needs to be fixed (and how).

Here are the versions we are currently using:

Engine
ovirt-engine-3.5.6.2-1.el6.noarch

Nodes
vdsm-4.16.34-0.el7.centos.x86_64
sanlock-3.2.4-1.el7.x86_64
libvirt-lock-sanlock-1.2.17-13.el7_2.3.x86_64
libvirt-daemon-1.2.17-13.el7_2.3.x86_64
libvirt-1.2.17-13.el7_2.3.x86_64

-- snip --
May 30 09:55:27 vm2 sanlock[1094]: 2016-05-30 09:55:27+0200 294109 [60137]: verify_leader 2 wrong space name 4643f652-8014-4951-8a1a-02af41e67d08 f757b127-a951-4fa9-bf90-81180c0702e6 /dev/f757b127-a951-4fa9-bf90-81180c0702e6/ids
May 30 09:55:27 vm2 sanlock[1094]: 2016-05-30 09:55:27+0200 294109 [60137]: leader1 delta_acquire_begin error -226 lockspace f757b127-a951-4fa9-bf90-81180c0702e6 host_id 2
May 30 09:55:27 vm2 sanlock[1094]: 2016-05-30 09:55:27+0200 294109 [60137]: leader2 path /dev/f757b127-a951-4fa9-bf90-81180c0702e6/ids offset 0
May 30 09:55:27 vm2 sanlock[1094]: 2016-05-30 09:55:27+0200 294109 [60137]: leader3 m 12212010 v 30003 ss 512 nh 0 mh 1 oi 2 og 8 lv 0
May 30 09:55:27 vm2 sanlock[1094]: 2016-05-30 09:55:27+0200 294109 [60137]: leader4 sn 4643f652-8014-4951-8a1a-02af41e67d08 rn 1eed8aa9-8fb5-4d27-8d1c-03ebce2c36d4.vm2.intern ts 3786679 cs 1474f033
May 30 09:55:28 vm2 sanlock[1094]: 2016-05-30 09:55:28+0200 294110 [1099]: s9703 add_lockspace fail result -226
May 30 09:55:58 vm2 sanlock[1094]: 2016-05-30 09:55:58+0200 294140 [60331]: verify_leader 2 wrong space name 4643f652-8014-4951-8a1a-02af41e67d08 f757b127-a951-4fa9-bf90-81180c0702e6 /dev/f757b127-a951-4fa9-bf90-81180c0702e6/ids
May 30 09:55:58 vm2 sanlock[1094]: 2016-05-30 09:55:58+0200 294140 [60331]: leader1 delta_acquire_begin error -226 lockspace f757b127-a951-4fa9-bf90-81180c0702e6 host_id 2
May 30 09:55:58 vm2 sanlock[1094]: 2016-05-30 09:55:58+0200 294140 [60331]: leader2 path /dev/f757b127-a951-4fa9-bf90-81180c0702e6/ids offset 0
May 30 09:55:58 vm2 sanlock[1094]: 2016-05-30 09:55:58+0200 294140 [60331]: leader3 m 12212010 v 30003 ss 512 nh 0 mh 1 oi 2 og 8 lv 0
May 30 09:55:58 vm2 sanlock[1094]: 2016-05-30 09:55:58+0200 294140 [60331]: leader4 sn 4643f652-8014-4951-8a1a-02af41e67d08 rn 1eed8aa9-8fb5-4d27-8d1c-03ebce2c36d4.vm2.intern ts 3786679 cs 1474f033
May 30 09:55:59 vm2 sanlock[1094]: 2016-05-30 09:55:59+0200 294141 [1098]: s9704 add_lockspace fail result -226
May 30 09:56:05 vm2 sanlock[1094]: 2016-05-30 09:56:05+0200 294148 [1094]: s1527 check_other_lease invalid for host 0 0 ts 7566376 name in 4643f652-8014-4951-8a1a-02af41e67d08
May 30 09:56:05 vm2 sanlock[1094]: 2016-05-30 09:56:05+0200 294148 [1094]: s1527 check_other_lease leader 12212010 owner 1 11 ts 7566376 sn f757b127-a951-4fa9-bf90-81180c0702e6 rn f888524b-27aa-4724-8bae-051f9e950a21.vm1.intern
May 30 09:56:28 vm2 sanlock[1094]: 2016-05-30 09:56:28+0200 294170 [60496]: verify_leader 2 wrong space name 4643f652-8014-4951-8a1a-02af41e67d08 f757b127-a951-4fa9-bf90-81180c0702e6 /dev/f757b127-a951-4fa9-bf90-81180c0702e6/ids
May 30 09:56:28 vm2 sanlock[1094]: 2016-05-30 09:56:28+0200 294170 [60496]: leader1 delta_acquire_begin error -226 lockspace f757b127-a951-4fa9-bf90-81180c0702e6 host_id 2
May 30 09:56:28 vm2 sanlock[1094]: 2016-05-30 09:56:28+0200 294170 [60496]: leader2 path /dev/f757b127-a951-4fa9-bf90-81180c0702e6/ids offset 0
May 30 09:56:28 vm2 sanlock[1094]: 2016-05-30 09:56:28+0200 294170 [60496]: leader3 m 12212010 v 30003 ss 512 nh 0 mh 1 oi 2 og 8 lv 0
May 30 09:56:28 vm2 sanlock[1094]: 2016-05-30 09:56:28+0200 294170 [60496]: leader4 sn 4643f652-8014-4951-8a1a-02af41e67d08 rn 1eed8aa9-8fb5-4d27-8d1c-03ebce2c36d4.vm2.intern ts 3786679 cs 1474f033
May 30 09:56:29 vm2 sanlock[1094]: 2016-05-30 09:56:29+0200 294171 [6415]: s9705 add_lockspace fail result -226
May 30 09:56:58 vm2 sanlock[1094]: 2016-05-30 09:56:58+0200 294200 [60645]: verify_leader 2 wrong space name 4643f652-8014-4951-8a1a-02af41e67d08 f757b127-a951-4fa9-bf90-81180c0702e6 /dev/f757b127-a951-4fa9-bf90-81180c0702e6/ids
May 30 09:56:58 vm2 sanlock[1094]: 2016-05-30 09:56:58+0200 294200 [60645]: leader1 delta_acquire_begin error -226 lockspace f757b127-a951-4fa9-bf90-81180c0702e6 host_id 2
May 30 09:56:58 vm2 sanlock[1094]: 2016-05-30 09:56:58+0200 294200 [60645]: leader2 path /dev/f757b127-a951-4fa9-bf90-81180c0702e6/ids offset 0
May 30 09:56:58 vm2 sanlock[1094]: 2016-05-30 09:56:58+0200 294200 [60645]: leader3 m 12212010 v 30003 ss 512 nh 0 mh 1 oi 2 og 8 lv 0
May 30 09:56:58 vm2 sanlock[1094]: 2016-05-30 09:56:58+0200 294200 [60645]: leader4 sn 4643f652-8014-4951-8a1a-02af41e67d08 rn 1eed8aa9-8fb5-4d27-8d1c-03ebce2c36d4.vm2.intern ts 3786679 cs 1474f033
May 30 09:56:59 vm2 sanlock[1094]: 2016-05-30 09:56:59+0200 294201 [6373]: s9706 add_lockspace fail result -226
May 30 09:57:28 vm2 sanlock[1094]: 2016-05-30 09:57:28+0200 294230 [60806]: verify_leader 2 wrong space name 4643f652-8014-4951-8a1a-02af41e67d08 f757b127-a951-4fa9-bf90-81180c0702e6 /dev/f757b127-a951-4fa9-bf90-81180c0702e6/ids
May 30 09:57:28 vm2 sanlock[1094]: 2016-05-30 09:57:28+0200 294230 [60806]: leader1 delta_acquire_begin error -226 lockspace f757b127-a951-4fa9-bf90-81180c0702e6 host_id 2
May 30 09:57:28 vm2 sanlock[1094]: 2016-05-30 09:57:28+0200 294230 [60806]: leader2 path /dev/f757b127-a951-4fa9-bf90-81180c0702e6/ids offset 0
May 30 09:57:28 vm2 sanlock[1094]: 2016-05-30 09:57:28+0200 294230 [60806]: leader3 m 12212010 v 30003 ss 512 nh 0 mh 1 oi 2 og 8 lv 0
May 30 09:57:28 vm2 sanlock[1094]: 2016-05-30 09:57:28+0200 294230 [60806]: leader4 sn 4643f652-8014-4951-8a1a-02af41e67d08 rn 1eed8aa9-8fb5-4d27-8d1c-03ebce2c36d4.vm2.intern ts 3786679 cs 1474f033
May 30 09:57:29 vm2 sanlock[1094]: 2016-05-30 09:57:29+0200 294231 [6399]: s9707 add_lockspace fail result -226
May 30 09:57:58 vm2 sanlock[1094]: 2016-05-30 09:57:58+0200 294260 [60946]: verify_leader 2 wrong space name 4643f652-8014-4951-8a1a-02af41e67d08 f757b127-a951-4fa9-bf90-81180c0702e6 /dev/f757b127-a951-4fa9-bf90-81180c0702e6/ids
May 30 09:57:58 vm2 sanlock[1094]: 2016-05-30 09:57:58+0200 294260 [60946]: leader1 delta_acquire_begin error -226 lockspace f757b127-a951-4fa9-bf90-81180c0702e6 host_id 2
May 30 09:57:58 vm2 sanlock[1094]: 2016-05-30 09:57:58+0200 294260 [60946]: leader2 path /dev/f757b127-a951-4fa9-bf90-81180c0702e6/ids offset 0
May 30 09:57:58 vm2 sanlock[1094]: 2016-05-30 09:57:58+0200 294260 [60946]: leader3 m 12212010 v 30003 ss 512 nh 0 mh 1 oi 2 og 8 lv 0
May 30 09:57:58 vm2 sanlock[1094]: 2016-05-30 09:57:58+0200 294260 [60946]: leader4 sn 4643f652-8014-4951-8a1a-02af41e67d08 rn 1eed8aa9-8fb5-4d27-8d1c-03ebce2c36d4.vm2.intern ts 3786679 cs 1474f033
May 30 09:57:59 vm2 sanlock[1094]: 2016-05-30 09:57:59+0200 294261 [6352]: s9708 add_lockspace fail result -226
-- snip --

sanlock log_dump also shows errors:

-- snip --
2016-05-30 09:53:23+0200 7526415 [1017]: s567 check_other_lease invalid for host 0 0 ts 7566376 name in 4643f652-8014-4951-8a1a-02af41e67d08
2016-05-30 09:53:23+0200 7526415 [1017]: s567 check_other_lease leader 12212010 owner 1 11 ts 7566376 sn f757b127-a951-4fa9-bf90-81180c0702e6 rn f888524b-27aa-4724-8bae-051f9e950a21.vm1.intern
2016-05-30 09:53:33+0200 7526425 [1017]: s568 check_other_lease invalid for host 0 0 ts 3786679 name in f757b127-a951-4fa9-bf90-81180c0702e6
2016-05-30 09:53:33+0200 7526425 [1017]: s568 check_other_lease leader 12212010 owner 2 8 ts 3786679 sn 4643f652-8014-4951-8a1a-02af41e67d08 rn 1eed8aa9-8fb5-4d27-8d1c-03ebce2c36d4.vm2.intern
2016-05-30 09:53:33+0200 7526425 [1017]: s568 check_other_lease invalid for host 0 0 ts 6622415 name in f757b127-a951-4fa9-bf90-81180c0702e6
2016-05-30 09:53:33+0200 7526425 [1017]: s568 check_other_lease leader 12212010 owner 3 14 ts 6622415 sn 4643f652-8014-4951-8a1a-02af41e67d08 rn 51c8f8e2-f9d8-462c-866c-e4052213ea81.vm3.intern
2016-05-30 09:53:33+0200 7526425 [1017]: s568 check_other_lease invalid for host 0 0 ts 6697413 name in f757b127-a951-4fa9-bf90-81180c0702e6
2016-05-30 09:53:33+0200 7526425 [1017]: s568 check_other_lease leader 12212010 owner 4 4 ts 6697413 sn 4643f652-8014-4951-8a1a-02af41e67d08 rn 8d4f32dc-f595-4254-bfcb-d96e5057e110.vm4.intern
2016-05-30 09:53:33+0200 7526425 [1017]: s568 check_other_lease invalid for host 0 0 ts 7563413 name in f757b127-a951-4fa9-bf90-81180c0702e6
2016-05-30 09:53:33+0200 7526425 [1017]: s568 check_other_lease leader 12212010 owner 5 8 ts 7563413 sn 4643f652-8014-4951-8a1a-02af41e67d08 rn 9fe564e6-cf40-4403-9fc0-eb118e1ee2cf.vm5.intern
2016-05-30 09:53:33+0200 7526425 [1017]: s568 check_other_lease invalid for host 0 0 ts 6129706 name in f757b127-a951-4fa9-bf90-81180c0702e6
2016-05-30 09:53:33+0200 7526425 [1017]: s568 check_other_lease leader 12212010 owner 6 169 ts 6129706 sn 4643f652-8014-4951-8a1a-02af41e67d08 rn b99ff588-f1ad-43f2-bd6c-6869f54d424d.vm1-tiny.i
2016-05-30 09:53:47+0200 7526439 [14576]: cmd_read_resource 23,58 /dev/c2212f15-35b7-4fa0-b13c-b50befda0af9/leases:1048576
-- snip --

Any hint would be highly appreciated.

Juergen

On Mon, May 30, 2016 at 11:06 AM, InterNetX - Juergen Gotteswinter <jg@internetx.com> wrote:
Hi,
for some time now we have been getting error messages from sanlock, and so far I was not able to figure out what exactly they are trying to tell us and, more importantly, whether this is something which can be ignored or needs to be fixed (and how).
Sanlock error messages are somewhat cryptic; hopefully David can explain them.
Here are the versions we are currently using:
Engine
ovirt-engine-3.5.6.2-1.el6.noarch
Nodes
vdsm-4.16.34-0.el7.centos.x86_64
sanlock-3.2.4-1.el7.x86_64
libvirt-lock-sanlock-1.2.17-13.el7_2.3.x86_64
libvirt-daemon-1.2.17-13.el7_2.3.x86_64
libvirt-1.2.17-13.el7_2.3.x86_64
[... sanlock log and log_dump excerpts snipped; quoted in full in the original message above ...]
Any hint would be highly appreciated.
Juergen

verify_leader 2 wrong space name 4643f652-8014-4951-8a1a-02af41e67d08 f757b127-a951-4fa9-bf90-81180c0702e6 /dev/f757b127-a951-4fa9-bf90-81180c0702e6/ids
leader1 delta_acquire_begin error -226 lockspace f757b127-a951-4fa9-bf90-81180c0702e6 host_id 2
VDSM has tried to join VG/lockspace/storage-domain "f757b127" on LV /dev/f757b127-a951-4fa9-bf90-81180c0702e6/ids. But sanlock finds that lockspace "4643f652" is initialized on that storage, i.e. there is an inconsistency between the leases formatted on disk and what the leases are being used for. That should never happen unless sanlock and/or the storage are used/moved/copied wrongly. The error is a sanlock sanity check to catch misuse.
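[Editor's note: to see this mismatch on disk yourself, the on-disk leader records can be inspected with sanlock's dump subcommand. A minimal sketch, assuming the ids LV from the log above is active on the host:

    # Print the delta-lease leader records stored on the ids LV.
    # The lockspace name (sn) written on disk should match the domain UUID
    # in the device path; in this thread's case it would show 4643f652-...
    # instead of the expected f757b127-...
    sanlock direct dump /dev/f757b127-a951-4fa9-bf90-81180c0702e6/ids
]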
s1527 check_other_lease invalid for host 0 0 ts 7566376 name in 4643f652-8014-4951-8a1a-02af41e67d08
s1527 check_other_lease leader 12212010 owner 1 11 ts 7566376 sn f757b127-a951-4fa9-bf90-81180c0702e6 rn f888524b-27aa-4724-8bae-051f9e950a21.vm1.intern
Apparently sanlock is already managing a lockspace called "4643f652" when it finds another lease in that lockspace has the inconsistent/corrupt name "f757b127". I can't say what steps might have been done to lead to this. This is a mess that's been caused by improper use of storage, and various sanity checks in sanlock have all reported errors for "impossible" conditions indicating that something catastrophic has been done to the storage it's using. Some fundamental rules are not being followed.
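[Editor's note: to see which lockspaces the sanlock daemon on a host is actually managing, and to capture its internal log, the standard sanlock client commands can be used (a sketch; run on the affected host):

    # List the lockspaces (s ...) and resources (r ...) the running daemon knows about
    sanlock client status
    # Dump the daemon's in-memory debug log (the "log_dump" quoted earlier in this thread)
    sanlock client log_dump
]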

On Thu, Jun 2, 2016 at 6:35 PM, David Teigland <teigland@redhat.com> wrote:
[...]
Thanks David.

Do you need more output from sanlock to understand this issue?

Juergen, can you open an oVirt bug and include sanlock and vdsm logs from the time this error started?

Thanks,
Nir

On Thu, Jun 02, 2016 at 06:47:37PM +0300, Nir Soffer wrote:
Thanks David.
Do you need more output from sanlock to understand this issue?
I can think of nothing more to learn from sanlock. I'd suggest tighter, higher level checking or control of storage. Low level sanity checks detecting lease corruption are not a convenient place to work from.

Hello David,

thanks for your explanation of those messages. Is there any possibility to get rid of this? I already figured out that it might be a corruption of the ids file, but I didn't find anything about re-creating it or other solutions to fix this.

IMHO this occurred after an outage where several hosts and the iSCSI SAN were fenced and/or rebooted.

Thanks,
Juergen

On 6/2/2016 at 6:03 PM, David Teigland wrote:
I can think of nothing more to learn from sanlock. I'd suggest tighter, higher level checking or control of storage. Low level sanity checks detecting lease corruption are not a convenient place to work from.

What if we move all VMs off the LUN which causes this error, drop the LUN, and recreate it? Will we "migrate" the error with the VMs to a different LUN, or could this be a fix?

On 6/3/2016 at 10:08 AM, InterNetX - Juergen Gotteswinter wrote:
Hello David,
thanks for your explanation of those messages. Is there any possibility to get rid of this? I already figured out that it might be a corruption of the ids file, but I didn't find anything about re-creating it or other solutions to fix this.
IMHO this occurred after an outage where several hosts and the iSCSI SAN were fenced and/or rebooted.
Thanks,
Juergen

On Fri, Jun 3, 2016 at 11:27 AM, InterNetX - Juergen Gotteswinter <juergen.gotteswinter@internetx.com> wrote:
What if we move all VMs off the LUN which causes this error, drop the LUN, and recreate it? Will we "migrate" the error with the VMs to a different LUN, or could this be a fix?
This should fix the ids file, but since we don't know why this corruption happened, it may happen again. Please open a bug with the log I requested so we can investigate this issue.

To fix the ids file you don't have to recreate the LUN, just initialize the ids lv.

1. Put the domain into maintenance (via engine)

   No host should access it while you reconstruct the ids file.

2. Activate the ids lv

   You may need to connect to this iSCSI target first, unless you have other vgs connected on the same target.

   lvchange -ay <sd_uuid>/ids

3. Initialize the lockspace

   sanlock direct init -s <sd_uuid>:0:/dev/<sd_uuid>/ids:0

4. Deactivate the ids lv

   lvchange -an <sd_uuid>/ids

5. Activate the domain (via engine)

   The domain should become active after a while.

Nir
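[Editor's note: for reference, the same steps as a single hedged shell sketch. The UUID below is the one from this thread's logs and is only an example; substitute your own storage domain UUID, and run this only while the domain is in maintenance:

    # Example only: storage domain UUID taken from this thread's logs
    SD_UUID=f757b127-a951-4fa9-bf90-81180c0702e6

    # 2. Activate the ids lv (the VG is named after the storage domain UUID)
    lvchange -ay "$SD_UUID/ids"

    # 3. Re-initialize the delta-lease area (host_id 0 formats all host slots)
    sanlock direct init -s "$SD_UUID:0:/dev/$SD_UUID/ids:0"

    # Optional sanity check: the dumped leader records should now carry
    # the correct lockspace name ($SD_UUID)
    sanlock direct dump "/dev/$SD_UUID/ids"

    # 4. Deactivate the ids lv again
    lvchange -an "$SD_UUID/ids"
]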

On 6/3/2016 at 6:37 PM, Nir Soffer wrote:
This should fix the ids file, but since we don't know why this corruption happened, it may happen again.
I am pretty sure I know when/why this happened: after a major outage, with the engine going crazy fencing hosts plus a crash/hard reset of the SAN, these messages occurred for the first time. But I can provide a log package, no problem.
[... quoted recovery procedure snipped; see Nir's message above ...]
Oh, this is great, I am going to announce a maintenance window. Thanks a lot, this had already started to drive me crazy. Will report after we have done it!