sanlock ids file broken after server crash

Hello,
The ids file for sanlock is broken on one setup. The first host id in the file is wrong.
From the logfile I have:
verify_leader 1 wrong space name 0924ff77-ef51-435b-b90d-50bfbf2e�ke7 0924ff77-ef51-435b-b90d-50bfbf2e8de7 /rhev/data-center/mnt/glusterSD/
Note the broken char in the space name.
This also appears, and it seems that the host id is broken in the ids file too:
leader4 sn 0924ff77-ef51-435b-b90d-50bfbf2e�ke7 rn ��7afa5-3a91-415b-a04c-221d3e060163.vbgkvm01.a ts 4351980 cs eefa4dd7
Note the broken chars there as well.
If I check the ids file with less or strings, the first row, where my vbgkvm01 host is, has broken chars.
Can this be repaired in some way without taking down all the virtual machines on that storage?
/Johan
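To pin down exactly which characters differ, the two names in the verify_leader line can be compared position by position. The following is only a sketch: it assumes the first name in the message is what was read from disk and the second is the name sanlock expected, and the helper name diff_names is made up for illustration. The damaged byte is written as U+FFFD, the way it shows up in the mail.

```python
import re

# Matches the sanlock.log line quoted above. Assumption: the first name is
# what was read from disk and the second is the name sanlock expected.
VERIFY_RE = re.compile(
    r"verify_leader (?P<host_id>\d+) wrong space name "
    r"(?P<found>\S+) (?P<expected>\S+)"
)


def diff_names(found, expected):
    """Return (index, found_char, expected_char) for every mismatch."""
    # zip() stops at the shorter string, which is enough for a quick look.
    return [(i, f, e) for i, (f, e) in enumerate(zip(found, expected)) if f != e]


line = (
    "verify_leader 1 wrong space name "
    "0924ff77-ef51-435b-b90d-50bfbf2e\ufffdke7 "
    "0924ff77-ef51-435b-b90d-50bfbf2e8de7 /rhev/data-center/mnt/glusterSD/"
)
m = VERIFY_RE.search(line)
mismatches = diff_names(m["found"], m["expected"])
for index, found_char, expected_char in mismatches:
    print(f"char {index}: found {found_char!r}, expected {expected_char!r}")
```

For the line above this reports two damaged positions next to each other near the end of the UUID, which suggests a small localized corruption rather than a completely overwritten record.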

Hi Johan,
Can you please share the vdsm and engine logs?
Also, it wouldn't hurt to get the sanlock logs, in case sanlock was configured to save all debugging output in a log file (see http://people.redhat.com/teigland/sanlock-messages.txt). Please also share the sanlock output from running 'sanlock client status' and 'sanlock client log_dump'.
Regards, Maor
On Thu, Jul 27, 2017 at 6:18 PM, Johan Bernhardsson <johan@kafit.se> wrote:
The ids file for sanlock is broken on one setup. The first host id in the file is wrong. [...] Can this be repaired in some way without taking down all the virtual machines on that storage?
/Johan
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
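The two commands requested above can be gathered in one go. This is a minimal sketch, assuming the sanlock binary is on PATH of the affected host; the function name collect_sanlock_output is hypothetical, and only the two commands named in the message are run.

```python
import shutil
import subprocess

# The two commands asked for above; both only query the running daemon.
COMMANDS = (
    ("sanlock", "client", "status"),
    ("sanlock", "client", "log_dump"),
)


def collect_sanlock_output(commands=COMMANDS):
    """Run each command and return {command string: output or error note}."""
    results = {}
    for cmd in commands:
        key = " ".join(cmd)
        if shutil.which(cmd[0]) is None:
            # Keep going so the caller still gets one entry per command.
            results[key] = f"({cmd[0]} binary not found on this host)"
            continue
        proc = subprocess.run(
            list(cmd), capture_output=True, text=True, errors="replace"
        )
        results[key] = proc.stdout if proc.returncode == 0 else proc.stderr
    return results


if __name__ == "__main__":
    for name, text in collect_sanlock_output().items():
        print(f"==== {name} ====")
        print(text)
```

Decoding with errors="replace" is a deliberate choice here: the log dump may contain the same damaged bytes as the ids file, and they should not abort the collection.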

(First reply did not get to the list)
From sanlock.log:
2017-07-30 10:49:31+0200 1766275 [1171]: s310751 lockspace 0924ff77-ef51-435b-b90d-50bfbf2e8de7:1:/rhev/data-center/mnt/glusterSD/vbgsan02:_fs02/0924ff77-ef51-435b-b90d-50bfbf2e8de7/dom_md/ids:0
2017-07-30 10:49:31+0200 1766275 [10496]: verify_leader 1 wrong space name 0924ff77-ef51-435b-b90d-50bfbf2e<D5>ke7 0924ff77-ef51-435b-b90d-50bfbf2e8de7 /rhev/data-center/mnt/glusterSD/vbgsan02:_fs02/0924ff77-ef51-435b-b90d-50bfbf2e8de7/dom_md/ids
2017-07-30 10:49:31+0200 1766275 [10496]: leader1 delta_acquire_begin error -226 lockspace 0924ff77-ef51-435b-b90d-50bfbf2e8de7 host_id 1
2017-07-30 10:49:31+0200 1766275 [10496]: leader2 path /rhev/data-center/mnt/glusterSD/vbgsan02:_fs02/0924ff77-ef51-435b-b90d-50bfbf2e8de7/dom_md/ids offset 0
2017-07-30 10:49:31+0200 1766275 [10496]: leader3 m 12212010 v 30003 ss 512 nh 0 mh 4076 oi 1 og 2031079063 lv 0
2017-07-30 10:49:31+0200 1766275 [10496]: leader4 sn 0924ff77-ef51-435b-b90d-50bfbf2e<D5>ke7 rn <93><F6>7^\afa5-3a91-415b-a04c-221d3e060163.vbgkvm01.a ts 4351980 cs eefa4dd7
2017-07-30 10:49:32+0200 1766276 [1171]: s310751 add_lockspace fail result -226
The vdsm logs don't have any errors, and neither does engine.log.
And if I check the ids file manually, I can see that everything in it is correct except for the first host in the cluster, where the space name and host id are broken.
/Johan
On Sun, 2017-07-30 at 11:18 +0300, Maor Lipchuk wrote:
Can you please share the vdsm and engine logs? [...]
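The leader3 and leader4 lines in the log above are easier to read once the short keys are expanded. The sketch below assumes the abbreviations correspond to the fields of sanlock's on-disk leader record (magic, version, sector size and so on); that mapping is an assumption and is not confirmed anywhere in this thread. The function name parse_leader_line is made up for illustration.

```python
import re

# Assumed meanings of the abbreviations; they mirror the field names of
# sanlock's leader record but are not confirmed in this thread.
FIELD_NAMES = {
    "m": "magic", "v": "version", "ss": "sector_size", "nh": "num_hosts",
    "mh": "max_hosts", "oi": "owner_id", "og": "owner_generation",
    "lv": "lver", "sn": "space_name", "rn": "resource_name",
    "ts": "timestamp", "cs": "checksum",
}


def parse_leader_line(line):
    """Turn a 'leader3'/'leader4' log line into a dict with readable keys."""
    match = re.search(r"leader[34] (.*)", line)
    if match is None:
        return {}
    # Naive split: assumes no value contains whitespace.
    tokens = match.group(1).split()
    pairs = zip(tokens[0::2], tokens[1::2])
    return {FIELD_NAMES.get(key, key): value for key, value in pairs}


leader3 = parse_leader_line(
    "2017-07-30 10:49:31+0200 1766275 [10496]: leader3 m 12212010 v 30003 "
    "ss 512 nh 0 mh 4076 oi 1 og 2031079063 lv 0"
)
leader4 = parse_leader_line(
    "2017-07-30 10:49:31+0200 1766275 [10496]: leader4 sn "
    "0924ff77-ef51-435b-b90d-50bfbf2e<D5>ke7 rn "
    "<93><F6>7^\\afa5-3a91-415b-a04c-221d3e060163.vbgkvm01.a "
    "ts 4351980 cs eefa4dd7"
)

expected = "0924ff77-ef51-435b-b90d-50bfbf2e8de7"
print("sector size:", leader3["sector_size"])
print("space name matches:", leader4["space_name"] == expected)
print("resource name:", leader4["resource_name"])
```

The values are kept as strings on purpose, since some of them are printed in hex and others in decimal and the log does not say which is which.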

Hi David,
I'm not sure how that character got there in the first place.
Nir, is there a safe way to fix that while there are running VMs?
Regards, Maor
On Sun, Jul 30, 2017 at 11:58 AM, Johan Bernhardsson <johan@kafit.se> wrote:
[...] And if I check the ids file manually, I can see that everything in it is correct except for the first host in the cluster, where the space name and host id are broken.

On Sun, Jul 30, 2017 at 4:24 PM, Maor Lipchuk <mlipchuk@redhat.com> wrote:
Hi David,
Sorry, I meant Johan.

On Sun, Jul 30, 2017 at 4:24 PM Maor Lipchuk <mlipchuk@redhat.com> wrote:
I'm not sure how it got to that character in the first place. Nir, is there a safe way to fix that while there are running VMs?
Repairing the sanlock ids file is explained here: http://lists.ovirt.org/pipermail/users/2016-February/038051.html
If you cannot put the domain into maintenance, you can try to repair the ids file while the domain is online. This may work for you, but we don't support this.
Nir
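Before attempting any repair, it may help to confirm which host slots in the ids file are actually damaged. The following read-only sketch rests on two assumptions that are not stated in this thread: that the record for host id N sits in sector N-1 (the log shows host_id 1 at offset 0 with a 512-byte sector size), and that every record in use contains the lockspace name as plain ASCII. The function name suspicious_sectors is hypothetical, and the demo runs on a small synthetic file rather than a real ids file.

```python
import os
import tempfile

SECTOR_SIZE = 512  # matches "ss 512" in the log; other sizes are possible


def suspicious_sectors(path, lockspace_name, sector_size=SECTOR_SIZE):
    """Return 1-based host ids whose sector holds data but not the name."""
    needle = lockspace_name.encode("ascii")
    bad = []
    with open(path, "rb") as f:  # read-only: nothing is written to the file
        host_id = 0
        while True:
            sector = f.read(sector_size)
            if len(sector) < sector_size:
                break  # end of file (or a trailing partial sector)
            host_id += 1
            # All-zero sectors are treated as unused and skipped.
            if sector.strip(b"\0") and needle not in sector:
                bad.append(host_id)
    return bad


# Synthetic demo: one damaged record, one good record, one empty sector.
name = "0924ff77-ef51-435b-b90d-50bfbf2e8de7"
good = name.encode("ascii").ljust(SECTOR_SIZE, b"\0")
damaged = good.replace(b"8de7", b"\xd5ke7")
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(damaged + good + bytes(SECTOR_SIZE))
result = suspicious_sectors(tmp.name, name)
os.unlink(tmp.name)
print("host ids to look at more closely:", result)
```

A check like this only narrows down where to look; it does not validate checksums or any other field of the record, and it is no substitute for the repair procedure linked above.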

I see two possible causes. One: it got there when we moved the servers from our lab desk to the hosting site; we had some problems getting it running. Two: a couple of weeks ago two servers rebooted after high load, which might have damaged the file.
I did manage to move all servers off that storage, and then removed it, cleaned it and added it again as new storage. Not what I wanted, but it solved the problem.
/Johan
On Sun, 2017-07-30 at 16:24 +0300, Maor Lipchuk wrote:
I'm not sure how it got to that character in the first place. [...]
participants (3)
- Johan Bernhardsson
- Maor Lipchuk
- Nir Soffer