
----- Original Message -----
From: "Ted Miller" <tmiller@hcjb.org> To: "Federico Simoncelli" <fsimonce@redhat.com>, "Itamar Heim" <iheim@redhat.com> Cc: users@ovirt.org Sent: Monday, January 27, 2014 7:16:14 PM Subject: Re: [Users] Data Center stuck between "Non Responsive" and "Contending"
On 1/27/2014 3:47 AM, Federico Simoncelli wrote:
Maybe someone from gluster can easily identify what happened. Meanwhile, if you just want to repair your data-center, you could try:
$ cd /rhev/data-center/mnt/glusterSD/10.41.65.2\:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/
$ touch ids
$ sanlock direct init -s 0322a407-2b16-40dc-ac67-13d387c6eb4c:0:ids:1048576
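To confirm the lockspace was actually written, the new ids file can be inspected afterwards (a minimal check, assuming the same working directory as above and a sanlock build that provides the direct dump subcommand; it should print the delta-lease records for the lockspace):

$ sanlock direct dump ids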
Federico,

I won't be able to do anything to the oVirt setup for another 5 hours or so (it is a trial system I am working on at home, and I am at work now), but I will try your repair script and report back.
In bugzilla 862975 they suggested turning off write-behind caching and "eager locking" on the gluster volume to avoid or reduce the problems that come from many different hosts writing to the same file(s) very frequently. If I interpret the comment in the bug correctly, it did seem to help in that situation. My situation is a little different: my gluster setup is replicate only, replica 3 (though there are only two hosts). I was not stress-testing it; I was just using it, trying to figure out how I can import some old VMware VMs without an ESXi server to run them on.
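For reference, the bug's suggestion would translate to something like the following on this setup (a sketch, assuming the volume name is VM2, taken from the mount path above, and that performance.write-behind and cluster.eager-lock are the options the bug comment refers to):

$ gluster volume set VM2 performance.write-behind off
$ gluster volume set VM2 cluster.eager-lock off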
Have you done anything similar to what is described here in comment 21? https://bugzilla.redhat.com/show_bug.cgi?id=859589#c21

When did you realize that you weren't able to use the data-center anymore? Can you describe exactly what you did and what happened, for example:

1. I created the data center (up and running)
2. I tried to import some VMs from VMware
3. During the import (or after it) the data-center went into the contending state
...

Did something special happen? I don't know, power loss, split-brain? For example, an excessive load on one of the servers could also have triggered a timeout somewhere (forcing the data-center back into the contending state). Could you check if any host was fenced (forcibly rebooted)?
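A quick way to rule out split-brain on the gluster side (a sketch, again assuming the VM2 volume; the first command lists files pending heal per brick, the second only those in split-brain):

$ gluster volume heal VM2 info
$ gluster volume heal VM2 info split-brain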
I am guessing that what gives a storage domain the (Master) designation is that it is the one that actually contains the sanlock data? If so, would it make sense to set up a gluster volume to be (Master), but not use it for VM storage, just for storing the sanlock info? Separate gluster volume(s) could then hold the VMs and would not need the optimizations turned off.
Any domain must be able to become the master at any time. Without a master the data center is unusable (at the present time); that's why we migrate (or reconstruct) it on another domain when necessary.

-- 
Federico
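For what it's worth, which domain currently holds the master role can be read from the domain metadata (a sketch, assuming the same dom_md layout as in the repair commands above; the master domain reports ROLE=Master, the others ROLE=Regular):

$ grep -E 'ROLE|MASTER_VERSION' /rhev/data-center/mnt/glusterSD/10.41.65.2\:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/metadata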