----- Original Message -----
> From: "Itamar Heim" <iheim(a)redhat.com>
> To: "Ted Miller" <tmiller(a)hcjb.org>, users(a)ovirt.org, "Federico Simoncelli" <fsimonce(a)redhat.com>
> Cc: "Allon Mureinik" <amureini(a)redhat.com>
> Sent: Sunday, January 26, 2014 11:17:04 PM
> Subject: Re: [Users] Data Center stuck between "Non Responsive" and "Contending"
>
> On 01/27/2014 12:00 AM, Ted Miller wrote:
>> On 1/26/2014 4:00 PM, Itamar Heim wrote:
>>> On 01/26/2014 10:51 PM, Ted Miller wrote:
>>>> On 1/26/2014 3:10 PM, Itamar Heim wrote:
>>>>> On 01/26/2014 10:08 PM, Ted Miller wrote:
>>>>> is this gluster storage (guessing since you mentioned a 'volume')
>>>> yes (mentioned under "setup" above)
>>>>> does it have a quorum?
>>>> Volume Name: VM2
>>>> Type: Replicate
>>>> Volume ID: 7bea8d3b-ec2a-4939-8da8-a82e6bda841e
>>>> Status: Started
>>>> Number of Bricks: 1 x 3 = 3
>>>> Transport-type: tcp
>>>> Bricks:
>>>> Brick1: 10.41.65.2:/bricks/01/VM2
>>>> Brick2: 10.41.65.4:/bricks/01/VM2
>>>> Brick3: 10.41.65.4:/bricks/101/VM2
>>>> Options Reconfigured:
>>>> cluster.server-quorum-type: server
>>>> storage.owner-gid: 36
>>>> storage.owner-uid: 36
>>>> auth.allow: *
>>>> user.cifs: off
>>>> nfs.disa
>>>>> (there were reports of split brain on the domain metadata before, when
>>>>> no quorum existed for gluster)
>>>> after full heal:
>>>>
>>>> [root@office4a ~]$ gluster volume heal VM2 info
>>>> Gathering Heal info on volume VM2 has been successful
>>>>
>>>> Brick 10.41.65.2:/bricks/01/VM2
>>>> Number of entries: 0
>>>>
>>>> Brick 10.41.65.4:/bricks/01/VM2
>>>> Number of entries: 0
>>>>
>>>> Brick 10.41.65.4:/bricks/101/VM2
>>>> Number of entries: 0
>>>> [root@office4a ~]$ gluster volume heal VM2 info split-brain
>>>> Gathering Heal info on volume VM2 has been successful
>>>>
>>>> Brick 10.41.65.2:/bricks/01/VM2
>>>> Number of entries: 0
>>>>
>>>> Brick 10.41.65.4:/bricks/01/VM2
>>>> Number of entries: 0
>>>>
>>>> Brick 10.41.65.4:/bricks/101/VM2
>>>> Number of entries: 0
>>>>
>>>> Noticed this in host /var/log/messages (while looking for something else).
>>>> The loop seems to repeat over and over.
>>>>
>>>> Jan 26 15:35:52 office4a sanlock[3763]: 2014-01-26 15:35:52-0500 14678 [30419]: read_sectors delta_leader offset 512 rv -90 /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids
>>>>
>>>>
>>>> Jan 26 15:35:53 office4a sanlock[3763]: 2014-01-26 15:35:53-0500 14679 [3771]: s1997 add_lockspace fail result -90
>>>> Jan 26 15:35:58 office4a vdsm TaskManager.Task ERROR Task=`89885661-88eb-4ea3-8793-00438735e4ab`::Unexpected error#012Traceback (most recent call last):#012 File "/usr/share/vdsm/storage/task.py", line 857, in _run#012 return fn(*args, **kargs)#012 File "/usr/share/vdsm/logUtils.py", line 45, in wrapper#012 res = f(*args, **kwargs)#012 File "/usr/share/vdsm/storage/hsm.py", line 2111, in getAllTasksStatuses#012 allTasksStatus = sp.getAllTasksStatuses()#012 File "/usr/share/vdsm/storage/securable.py", line 66, in wrapper#012 raise SecureError()#012SecureError
>>>> Jan 26 15:35:59 office4a sanlock[3763]: 2014-01-26 15:35:59-0500 14686 [30495]: read_sectors delta_leader offset 512 rv -90 /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids
>>>>
>>>>
>>>> Jan 26 15:36:00 office4a sanlock[3763]: 2014-01-26 15:36:00-0500 14687 [3772]: s1998 add_lockspace fail result -90
>>>> Jan 26 15:36:00 office4a vdsm TaskManager.Task ERROR Task=`8db9ff1a-2894-407a-915a-279f6a7eb205`::Unexpected error#012Traceback (most recent call last):#012 File "/usr/share/vdsm/storage/task.py", line 857, in _run#012 return fn(*args, **kargs)#012 File "/usr/share/vdsm/storage/task.py", line 318, in run#012 return self.cmd(*self.argslist, **self.argsdict)#012 File "/usr/share/vdsm/storage/sp.py", line 273, in startSpm#012 self.masterDomain.acquireHostId(self.id)#012 File "/usr/share/vdsm/storage/sd.py", line 458, in acquireHostId#012 self._clusterLock.acquireHostId(hostId, async)#012 File "/usr/share/vdsm/storage/clusterlock.py", line 189, in acquireHostId#012 raise se.AcquireHostIdFailure(self._sdUUID, e)#012AcquireHostIdFailure: Cannot acquire host id: ('0322a407-2b16-40dc-ac67-13d387c6eb4c', SanlockException(90, 'Sanlock lockspace add failure', 'Message too long'))
> fede - thoughts on above?
> (vojtech reported something similar, but it sorted out for him after
> some retries)
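A side note on the log above: the "-90" that sanlock reports looks like a negated Linux errno. On Linux, 90 is EMSGSIZE, whose standard message is the same "Message too long" that shows up in the SanlockException. Below is a minimal Python sketch for decoding such values. The helper name describe_errno is made up for illustration, and errno numbers are platform-specific, so this is only meaningful when run on the same kind of system that produced the log.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, message) for a possibly negated errno value."""
    number = abs(code)
    # errno.errorcode maps errno numbers to names such as "EMSGSIZE".
    name = errno.errorcode.get(number, "UNKNOWN")
    return name, os.strerror(number)


if __name__ == "__main__":
    # On Linux, 90 is EMSGSIZE; other platforms number errnos differently.
    print(describe_errno(-90))
```

This only translates the number; it says nothing about why sanlock got a short read from the ids file.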
Something truncated the ids file, as also reported by:
> [root@office4a ~]$ ls /rhev/data-center/mnt/glusterSD/10.41.65.2\:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ -l
> total 1029
> -rw-rw---- 1 vdsm kvm 0 Jan 22 00:44 ids
> -rw-rw---- 1 vdsm kvm 0 Jan 16 18:50 inbox
> -rw-rw---- 1 vdsm kvm 2097152 Jan 21 18:20 leases
> -rw-r--r-- 1 vdsm kvm 491 Jan 21 18:20 metadata
> -rw-rw---- 1 vdsm kvm 0 Jan 16 18:50 outbox
In the past I saw that happening because of a glusterfs bug:
https://bugzilla.redhat.com/show_bug.cgi?id=862975
Anyway, in general it seems that glusterfs is not always able to reconcile
the ids file (as it's written by all the hosts at the same time).
Maybe someone from gluster can easily identify what happened. Meanwhile, if
you just want to repair your data center, you could try:
$ cd /rhev/data-center/mnt/glusterSD/10.41.65.2\:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/
$ touch ids
$ sanlock direct init -s 0322a407-2b16-40dc-ac67-13d387c6eb4c:0:ids:1048576
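Before running the repair, it may be worth confirming that the ids file really is zero-length, as the ls output suggests. A minimal Python sketch follows, assuming the dom_md directory layout shown above. The function name ids_is_truncated is hypothetical and is not part of vdsm or sanlock.

```python
import os


def ids_is_truncated(dom_md_dir):
    """Return True if the 'ids' file in dom_md_dir is missing or empty."""
    ids_path = os.path.join(dom_md_dir, "ids")
    try:
        return os.stat(ids_path).st_size == 0
    except FileNotFoundError:
        # A missing file is treated the same as an empty one.
        return True


if __name__ == "__main__":
    # Path taken from the listing in this thread; adjust for your mount.
    dom_md = ("/rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/"
              "0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md")
    print("ids truncated:", ids_is_truncated(dom_md))
```

The check is read-only, so it should be safe to run on any host that has the domain mounted.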
Federico,
I won't be able to do anything to the ovirt setup for another 5 hours or so
(it is a trial system I am working on at home, I am at work), but I will try
your repair script and report back.
In bugzilla 862975 they suggested turning off write-behind caching and "eager
locking" on the gluster volume to avoid/reduce the problems that come from
many different computers all writing to the same file(s) on a very frequent
basis. If I interpret the comment in the bug correctly, it did seem to help
in that situation. My situation is a little different. My gluster setup is
replicate only, replica 3 (though there are only two hosts). I was not
stress-testing it; I was just using it, trying to figure out how I can import
some old VMware VMs without an ESXi server to run them on.
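For reference, the two settings discussed in bugzilla 862975 would normally be changed with "gluster volume set". The Python sketch below only builds the command lines and does not run them. The option names performance.write-behind and cluster.eager-lock are an assumption about what the bug comment refers to, and the helper name is made up for illustration.

```python
def build_volume_set_commands(volume, options):
    """Build 'gluster volume set' argument lists without executing them."""
    return [
        ["gluster", "volume", "set", volume, key, value]
        for key, value in options.items()
    ]


if __name__ == "__main__":
    # Assumed option names; check 'gluster volume set help' on your version.
    options = {
        "performance.write-behind": "off",
        "cluster.eager-lock": "off",
    }
    for cmd in build_volume_set_commands("VM2", options):
        print(" ".join(cmd))
```

Printing the commands first makes it easy to review them before applying anything to a live volume.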
I am guessing that what makes cluster storage have the (Master) designation
is that this is the one that actually contains the sanlocks? If so, would it
make sense to set up a gluster volume to be (Master), but not use it for VM
storage, just for storing the sanlock info? Separate gluster volume(s) could
then have the VMs on it(them), and would not need the optimizations turned off.
If it is any help, I put some excerpts from my sanlock.log files in a
separate thread with the subject: "sanlock can't read empty 'ids' file"
Thank you for your prompt response.
Ted Miller
Elkhart, IN