[Users] Data Center stuck between "Non Responsive" and "Contending"
Federico Simoncelli
fsimonce at redhat.com
Mon Jan 27 16:46:10 EST 2014
----- Original Message -----
> From: "Ted Miller" <tmiller at hcjb.org>
> To: "Federico Simoncelli" <fsimonce at redhat.com>, "Itamar Heim" <iheim at redhat.com>
> Cc: users at ovirt.org
> Sent: Monday, January 27, 2014 7:16:14 PM
> Subject: Re: [Users] Data Center stuck between "Non Responsive" and "Contending"
>
>
> On 1/27/2014 3:47 AM, Federico Simoncelli wrote:
> > Maybe someone from gluster can identify easily what happened. Meanwhile if
> > you just want to repair your data-center you could try with:
> >
> > $ cd /rhev/data-center/mnt/glusterSD/10.41.65.2\:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/
> > $ touch ids
> > $ sanlock direct init -s 0322a407-2b16-40dc-ac67-13d387c6eb4c:0:ids:1048576
> Federico,
>
> I won't be able to do anything to the ovirt setup for another 5 hours or so
> (it is a trial system I am working on at home, I am at work), but I will try
> your repair script and report back.
>
> In bugzilla 862975 they suggested turning off write-behind caching and "eager
> locking" on the gluster volume to avoid/reduce the problems that come from
> many different computers all writing to the same file(s) on a very frequent
> basis. If I interpret the comment in the bug correctly, it did seem to help
> in that situation. My situation is a little different. My gluster setup is
> replicate only, replica 3 (though there are only two hosts). I was not
> stress-testing it, I was just using it, trying to figure out how I can import
> some old VMWare VMs without an ESXi server to run them on.
Have you done anything similar to what is described here in comment 21?
https://bugzilla.redhat.com/show_bug.cgi?id=859589#c21
When did you realize that you weren't able to use the data-center anymore?
Can you describe exactly what you did and what happened, for example:
1. I created the data center (up and running)
2. I tried to import some VMs from VMWare
3. During the import (or after it) the data-center went into the contending state
...
Did something special happen? I don't know, a power loss, a split-brain?
For example, an excessive load on one of the servers could also have triggered
a timeout somewhere (forcing the data-center to go back into the contending
state).
Could you check if any host was fenced? (Forcibly rebooted)
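
A minimal sketch of read-only checks that could help answer the questions
above, assuming the gluster volume is named VM2 (taken from the mount path
quoted earlier) and that sanlock, the gluster CLI and last are installed on
the host. The helper name "checks" and the RUN switch are made up for this
example. By default the script only prints the commands; nothing is executed
unless RUN=1 is set.

```shell
#!/bin/sh
# Sketch only: list (or, with RUN=1, execute) a few read-only checks.
# VOLUME defaults to VM2, the volume name seen in the mount path in this
# thread; override it if your volume is named differently.
VOLUME="${VOLUME:-VM2}"

# Hypothetical helper: prints one diagnostic command per line.
checks() {
    # Lockspaces and resources sanlock currently holds on this host.
    echo "sanlock client status"
    # Files that gluster reports as being in split-brain on the volume.
    echo "gluster volume heal $VOLUME info split-brain"
    # Reboots recorded in wtmp; an unexpected one may point to fencing.
    echo "last -x reboot"
}

checks | while IFS= read -r cmd; do
    if [ "${RUN:-0}" = "1" ]; then
        echo "== $cmd"
        sh -c "$cmd"
    else
        # Dry run: only show what would be executed.
        echo "$cmd"
    fi
done
```

Running it with RUN=1 on each host and comparing the output is one way to
spot a host that rebooted unexpectedly or a volume with split-brain entries;
it does not replace checking the engine and vdsm logs.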
> I am guessing that what makes cluster storage have the (Master) designation
> is that this is the one that actually contains the sanlocks? If so, would it
> make sense to set up a gluster volume to be (Master), but not use it for VM
> storage, just for storing the sanlock info? Separate gluster volume(s) could
> then have the VMs on it(them), and would not need the optimizations turned
> off.
Any domain must be able to become the master at any time. Without a master
the data center is unusable (at the present time); that's why we migrate (or
reconstruct) the master on another domain when necessary.
--
Federico