Hello Raz,

I have been digging more into the issue today, and I found one likely reason why I am getting the sanlock error: the /path/to/storagedomain/dom_md/leases file is apparently missing.

/var/log/sanlock.log
Jul 24 13:37:52 virt0 sanlock[3140]: 2015-07-24 13:37:52+0000 3012847 [9110]: open error -2 /rhev/data-center/mnt/glusterSD/virt-data.syseng.contoso.com:store1/30b39180-c50d-4464-a944-18c1bfbe4b22/dom_md/leases
Jul 24 13:37:53 virt0 sanlock[3140]: 2015-07-24 13:37:53+0000 3012848 [3140]: ci 2 fd 22 pid -1 recv errno 104
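
For reference, the negative number in the "open error" line appears to be a negated errno value, so -2 would correspond to ENOENT (no such file or directory). A minimal sketch for decoding such lines, assuming the line layout shown in the excerpt above; the function name and regex are illustrative and not part of sanlock:

```python
import errno
import os
import re

# Matches the "open error -N /path" portion of a sanlock log line.
# The layout is an assumption taken from the log excerpt above.
OPEN_ERROR_RE = re.compile(r"open error (-?\d+) (\S+)")


def decode_open_error(line):
    """Return (errno_name, message, path) for an 'open error' line, else None."""
    match = OPEN_ERROR_RE.search(line)
    if match is None:
        return None
    code = abs(int(match.group(1)))
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code), match.group(2)


line = ("Jul 24 13:37:52 virt0 sanlock[3140]: 2015-07-24 13:37:52+0000 3012847 "
        "[9110]: open error -2 /rhev/data-center/mnt/glusterSD/"
        "virt-data.syseng.contoso.com:store1/"
        "30b39180-c50d-4464-a944-18c1bfbe4b22/dom_md/leases")
print(decode_open_error(line))
```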

[root@virt2 30b39180-c50d-4464-a944-18c1bfbe4b22]# find dom_md/
dom_md/
dom_md/ids
dom_md/inbox
dom_md/outbox
dom_md/metadata
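
The same check can be scripted. This is only a sketch: the expected file set is an assumption built from the four files in the listing above plus the leases file named in sanlock.log, and the function name is hypothetical.

```python
import os

# Assumed set of metadata files for a file-based storage domain: the four
# files in the listing above plus the 'leases' file named in sanlock.log.
EXPECTED_DOM_MD_FILES = ("ids", "inbox", "leases", "metadata", "outbox")


def missing_dom_md_files(domain_dir):
    """Return the expected dom_md files that are absent under domain_dir."""
    dom_md = os.path.join(domain_dir, "dom_md")
    return [name for name in EXPECTED_DOM_MD_FILES
            if not os.path.isfile(os.path.join(dom_md, name))]
```

Calling missing_dom_md_files() with the storage domain directory (the one containing dom_md/) returns the names that are absent; an empty list means all five assumed files exist.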

This is obviously a problem, but I do not know how to proceed. Is there a way to regenerate or repair the file in order to reattach the domain?

Thanks

Stephen

On Thu, Jul 23, 2015 at 5:35 PM, Raz Tamir <ratamir@redhat.com> wrote:
Thanks for the detailed answer.
I will take a further look and update you when I have news.




Thanks in advance,
Raz Tamir
ratamir@redhat.com
RedHat Israel
RHEV-M QE Storage team


From: "Stephen Repetski" <srepetsk@srepetsk.net>
To: "Raz Tamir" <ratamir@redhat.com>
Cc: "users" <users@ovirt.org>
Sent: Friday, July 24, 2015 12:23:07 AM

Subject: Re: [ovirt-users] oVirt not starting primary storage domain

That is correct. The volume was 9 servers w/ 3x replication, and I wanted to move all data off of one of the sets of 3 servers; those are the servers I removed w/ remove-brick start and commit. Per the RH documentation (https://access.redhat.com/documentation/en-US/Red_Hat_Storage/2.0/html/Administration_Guide/sect-User_Guide-Managing_Volumes-Shrinking.html), this should not be an issue, assuming the remove-brick process completes before it is committed.

Stephen Repetski

On Thu, Jul 23, 2015 at 5:17 PM, Raz Tamir <ratamir@redhat.com> wrote:
As far as I can see from the logs you removed 3 bricks. Can you confirm?




Thanks in advance,
Raz Tamir
ratamir@redhat.com
RedHat Israel
RHEV-M QE Storage team


From: "Stephen Repetski" <srepetsk@srepetsk.net>
To: "Raz Tamir" <ratamir@redhat.com>
Cc: "users" <users@ovirt.org>
Sent: Friday, July 24, 2015 12:01:16 AM
Subject: Re: [ovirt-users] oVirt not starting primary storage domain


Hi Raz:

I'm using vdsm-4.16.14-0.el6.x86_64 with glusterfs-3.6.2-1.el6.x86_64 on oVirt 3.5.2.

I removed the brick with: gluster remove-brick store1 replica 3 $1 $2 $3 start; gluster remove-brick store1 replica 3 $1 $2 $3 commit. Between the two commands I used the 'status' option to verify that all nodes were marked as 'completed' before running the 'commit' one.
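
The start / status / commit sequence described above can be sketched as follows. This assumes the full `gluster volume remove-brick` form of the command (the shorthand above omits `volume`); the brick names are hypothetical placeholders for $1 $2 $3, and the helper functions are illustrative only.

```python
import subprocess


def remove_brick_commands(volume, replica, bricks):
    """Build argv lists for the start, status, and commit phases."""
    base = ["gluster", "volume", "remove-brick", volume,
            "replica", str(replica)] + list(bricks)
    return {phase: base + [phase] for phase in ("start", "status", "commit")}


def run_phase(commands, phase, runner=subprocess.check_output):
    """Run one phase and return its output; 'runner' can be swapped for a dry run."""
    return runner(commands[phase])


# Hypothetical brick names standing in for $1 $2 $3.
cmds = remove_brick_commands(
    "store1", 3, ["host7:/brick", "host8:/brick", "host9:/brick"])
print(" ".join(cmds["status"]))
```

The 'commit' phase should only be run once the 'status' output shows every node as completed, which is the check described above.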

Also, the two log files you requested are available here:
The gluster log file is from one of the servers from a different brick in the primary (aka "store1") datacenter/gluster volume, so it was and still is in the volume.


Thanks,
Stephen


Stephen Repetski
Rochester Institute of Technology '13 | http://srepetsk.net

On Thu, Jul 23, 2015 at 4:28 PM, Raz Tamir <ratamir@redhat.com> wrote:
Hi Stephen,
1) Can you please provide the vdsm and gluster versions?
2) How did you remove the brick?
3) Can you please attach the glusterfs log located under /var/log ?

* Just for info - there is no support for gluster if the volume is not a 3-way replica




Thanks in advance,
Raz Tamir
ratamir@redhat.com
RedHat Israel
RHEV-M QE Storage team


From: "Stephen Repetski" <srepetsk@srepetsk.net>
To: "users" <users@ovirt.org>
Sent: Thursday, July 23, 2015 11:08:57 PM
Subject: [ovirt-users] oVirt not starting primary storage domain


Hi all,

I recently made a change to the gluster volume backing my primary storage domain (removed 1 of the 3 bricks, each w/ 3x replication), and now oVirt fails to activate the primary storage domain. After attempting to start the domain, the engine goes through its various communications with VDSM, but then fails with a "Sanlock resource read failure" - https://gist.githubusercontent.com/srepetsk/83ef13ddcf1e690a398e/raw/ada362ac43ae71984a90979a676f2738648ac4ac/gistfile1.txt

Is there a way to find out more about what this SpmStatusVDS error is and what might be causing it?

Thanks,
Stephen

Stephen Repetski
