[ovirt-users] oVirt not starting primary storage domain

Stephen Repetski srepetsk at srepetsk.net
Fri Jul 24 13:53:02 UTC 2015


Hello Raz,

I have been digging more into the issue today, and I found one likely
reason why I am getting the sanlock error: the
/path/to/storagedomain/dom_md/leases file is apparently missing.

/var/log/sanlock.log
Jul 24 13:37:52 virt0 sanlock[3140]: 2015-07-24 13:37:52+0000 3012847 [9110]: open error -2 /rhev/data-center/mnt/glusterSD/virt-data.syseng.contoso.com:store1/30b39180-c50d-4464-a944-18c1bfbe4b22/dom_md/leases
Jul 24 13:37:53 virt0 sanlock[3140]: 2015-07-24 13:37:53+0000 3012848 [3140]: ci 2 fd 22 pid -1 recv errno 104

[root@virt2 30b39180-c50d-4464-a944-18c1bfbe4b22]# find dom_md/
dom_md/
dom_md/ids
dom_md/inbox
dom_md/outbox
dom_md/metadata

This is obviously a problem, but I do not know how to proceed. Is there a
way to regenerate or repair the file in order to reattach the domain?
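
In case someone else hits this later: before trying to regenerate anything, it
is probably worth checking whether a copy of the leases file still exists on
any of the remaining bricks, since the FUSE mount can report a file as missing
while one or more bricks still hold it. A rough sketch (the host names and the
brick path below are placeholders, not my real layout):

# placeholder hosts and brick path -- substitute the real ones
for h in gluster1 gluster2 gluster3; do
    echo "== $h =="
    ssh root@$h "ls -l /bricks/store1/30b39180-c50d-4464-a944-18c1bfbe4b22/dom_md/"
done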

Thanks

Stephen

On Thu, Jul 23, 2015 at 5:35 PM, Raz Tamir <ratamir at redhat.com> wrote:

> Thanks for the detailed answer.
> I will take a further look and update you when I have news.
>
>
>
>
> Thanks in advance,
> Raz Tamir
> ratamir at redhat.com
> RedHat Israel
> RHEV-M QE Storage team
>
> ------------------------------
> *From: *"Stephen Repetski" <srepetsk at srepetsk.net>
> *To: *"Raz Tamir" <ratamir at redhat.com>
> *Cc: *"users" <users at ovirt.org>
> *Sent: *Friday, July 24, 2015 12:23:07 AM
>
> *Subject: *Re: [ovirt-users] oVirt not starting primary storage domain
>
> That is correct. The volume was 9 servers w/ 3x replication, and I wanted
> to move all data off of one of the sets of 3 servers; those are the bricks
> I removed w/ remove-brick start and commit. Per the RH documentation (
> https://access.redhat.com/documentation/en-US/Red_Hat_Storage/2.0/html/Administration_Guide/sect-User_Guide-Managing_Volumes-Shrinking.html),
> this should not be an issue as long as the remove-brick process completes
> before it is committed.
>
> Stephen Repetski
>
> On Thu, Jul 23, 2015 at 5:17 PM, Raz Tamir <ratamir at redhat.com> wrote:
>
>> As far as I can see from the logs, you removed 3 bricks. Can you confirm?
>>
>>
>>
>>
>> Thanks in advance,
>> Raz Tamir
>> ratamir at redhat.com
>> RedHat Israel
>> RHEV-M QE Storage team
>>
>> ------------------------------
>> *From: *"Stephen Repetski" <srepetsk at srepetsk.net>
>> *To: *"Raz Tamir" <ratamir at redhat.com>
>> *Cc: *"users" <users at ovirt.org>
>> *Sent: *Friday, July 24, 2015 12:01:16 AM
>> *Subject: *Re: [ovirt-users] oVirt not starting primary storage domain
>>
>>
>> Hi Raz:
>>
>> I'm using vdsm-4.16.14-0.el6.x86_64 with glusterfs-3.6.2-1.el6.x86_64 on
>> oVirt 3.5.2.
>>
>> I removed the bricks with: gluster remove-brick store1 replica 3 $1 $2 $3
>> start; gluster remove-brick store1 replica 3 $1 $2 $3 commit. Between the
>> two commands I used the 'status' option to verify that all nodes were
>> marked as 'completed' before running 'commit'.
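>>
>> For anyone reading along, the sequence was essentially the following, with
>> placeholder brick paths here instead of the real $1 $2 $3:
>>
>> gluster volume remove-brick store1 replica 3 \
>>     hostA:/bricks/store1 hostB:/bricks/store1 hostC:/bricks/store1 start
>>
>> # same command with 'status' instead of 'start', repeated until every
>> # node reports 'completed', then:
>>
>> gluster volume remove-brick store1 replica 3 \
>>     hostA:/bricks/store1 hostB:/bricks/store1 hostC:/bricks/store1 commit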
>>
>> Also, the two log files you requested are available here:
>> http://srepetsk.net/files/engine.log.20150723 &&
>> http://srepetsk.net/files/etc-glusterfs-glusterd.vol.log.20150723
>> The gluster log file is from one of the servers backing a different brick in
>> the primary (aka "store1") data center/gluster volume, so that server was,
>> and still is, in the volume.
>>
>>
>> Thanks,
>> Stephen
>>
>>
>> Stephen Repetski
>> Rochester Institute of Technology '13 | http://srepetsk.net
>>
>> On Thu, Jul 23, 2015 at 4:28 PM, Raz Tamir <ratamir at redhat.com> wrote:
>>
>>> Hi Stephen,
>>> 1) Can you please provide the vdsm and gluster versions?
>>> 2) How did you remove the brick?
>>> 3) Can you please attach the glusterfs log located under /var/log?
>>>
>>> * Just for info: there is no support for gluster if the volume is not a
>>> 3-way replica.
>>>
>>>
>>>
>>>
>>> Thanks in advance,
>>> Raz Tamir
>>> ratamir at redhat.com
>>> RedHat Israel
>>> RHEV-M QE Storage team
>>>
>>> ------------------------------
>>> *From: *"Stephen Repetski" <srepetsk at srepetsk.net>
>>> *To: *"users" <users at ovirt.org>
>>> *Sent: *Thursday, July 23, 2015 11:08:57 PM
>>> *Subject: *[ovirt-users] oVirt not starting primary storage domain
>>>
>>>
>>> Hi all,
>>>
>>> I recently made a change to the gluster volume backing my primary
>>> storage domain (removed 1 of the 3 bricks, each w/ 3x replication), and now
>>> oVirt fails to activate the primary storage domain. After attempting to
>>> start the domain, the engine goes through its various communications
>>> with VDSM, but then fails out with a "Sanlock resource read failure" -
>>> https://gist.githubusercontent.com/srepetsk/83ef13ddcf1e690a398e/raw/ada362ac43ae71984a90979a676f2738648ac4ac/gistfile1.txt
>>>
>>> Is there a way to find out more about what this SpmStatusVDS error is and
>>> what might be causing it?
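>>>
>>> The only leads I have so far are to look at sanlock's own state and at the
>>> domain's metadata files on the host, roughly along these lines (path
>>> wildcarded rather than hard-coding a domain UUID):
>>>
>>> sanlock client status          # lockspaces/resources sanlock currently holds
>>> tail -n 50 /var/log/sanlock.log
>>> ls -l /rhev/data-center/mnt/glusterSD/*/*/dom_md/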
>>>
>>> Thanks,
>>> Stephen
>>>
>>> Stephen Repetski
>>>
>>> _______________________________________________
>>> Users mailing list
>>> Users at ovirt.org
>>> http://lists.ovirt.org/mailman/listinfo/users
>>>
>>>
>>
>>
>
>