----- Original Message -----
From: "Ted Miller" <tmiller@hcjb.org>
To: "Federico Simoncelli" <fsimonce@redhat.com>, "Itamar Heim"
<iheim@redhat.com>
Cc: users@ovirt.org
Sent: Monday, January 27, 2014 7:16:14 PM
Subject: Re: [Users] Data Center stuck between "Non Responsive" and
"Contending"
> On 1/27/2014 3:47 AM, Federico Simoncelli wrote:
>> Maybe someone from gluster can identify easily what happened. Meanwhile if
>> you just want to repair your data-center you could try with:
>>
>> $ cd /rhev/data-center/mnt/glusterSD/10.41.65.2\:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/
>> $ touch ids
>> $ sanlock direct init -s 0322a407-2b16-40dc-ac67-13d387c6eb4c:0:ids:1048576
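For reference, the repair quoted above can be sketched end-to-end as a small script. The mount path and domain UUID are the ones from this thread and will differ on other setups; the final `sanlock direct dump` inspection step is my addition, not part of the original suggestion:

```shell
# Sketch of the quoted ids-file repair. Paths/UUIDs are from this thread;
# substitute your own storage-domain values.
cd /rhev/data-center/mnt/glusterSD/10.41.65.2\:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/

# Recreate the (empty) ids file, then re-initialize the sanlock lockspace
# on it: <lockspace_name>:<host_id>:<path>:<offset>.
touch ids
sanlock direct init -s 0322a407-2b16-40dc-ac67-13d387c6eb4c:0:ids:1048576

# Optional sanity check (my addition): dump the freshly initialized lockspace.
sanlock direct dump ids
```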
> Federico,
> I won't be able to do anything to the oVirt setup for another 5 hours or so
> (it is a trial system I am working on at home; I am at work), but I will try
> your repair script and report back.
>
> In bugzilla 862975 they suggested turning off write-behind caching and "eager
> locking" on the gluster volume to avoid or reduce the problems that come from
> many different computers all writing to the same file(s) on a very frequent
> basis. If I interpret the comment in the bug correctly, it did seem to help
> in that situation. My situation is a little different: my gluster setup is
> replicate only, replica 3 (though there are only two hosts). I was not
> stress-testing it, I was just using it, trying to figure out how I can import
> some old VMware VMs without an ESXi server to run them on.
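The tuning from bug 862975 is applied per volume with `gluster volume set`. A minimal sketch, assuming the volume is named VM2 (taken from the mount path earlier in this thread; substitute your own volume name):

```shell
# Disable write-behind caching and eager locking on the volume "VM2"
# (volume name is an assumption based on the mount path in this thread).
gluster volume set VM2 performance.write-behind off
gluster volume set VM2 cluster.eager-lock off

# Confirm the reconfigured options are in effect.
gluster volume info VM2
```

Both options can be turned back on the same way (`... on`) if the workaround turns out not to be needed.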
Have you done anything similar to what is described here in comment 21?
https://bugzilla.redhat.com/show_bug.cgi?id=859589#c21
When did you realize that you weren't able to use the data-center anymore?
Can you describe exactly what you did and what happened, for example:
1. I created the data center (up and running)
2. I tried to import some VMs from VMWare
3. During the import (or after it) the data-center went into the contending state
...
Did something special happen? I don't know, a power loss, a split-brain?
For example, an excessive load on one of the servers could also have
triggered a timeout somewhere (forcing the data-center to go back into the
contending state).
Could you check if any host was fenced (forcibly rebooted)?
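One way to check for fencing is to search the engine log and the reboot history; a hedged sketch, assuming a default oVirt engine install (log path may differ on your setup):

```shell
# On the engine host: look for fencing-related events in the engine log
# (default log location for an oVirt engine install).
grep -i 'fence' /var/log/ovirt-engine/engine.log

# On each hypervisor host: an unexpected forced reboot also shows up in
# the login/reboot history.
last reboot | head
```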
> I am guessing that what makes cluster storage have the (Master) designation
> is that this is the one that actually contains the sanlocks? If so, would it
> make sense to set up a gluster volume to be (Master), but not use it for VM
> storage, just for storing the sanlock info? Separate gluster volume(s) could
> then hold the VMs, and would not need the optimizations turned off.
Any domain must be able to become the master at any time. Without a master
the data center is unusable (at the present time); that's why we migrate (or
reconstruct) it on another domain when necessary.
--
Federico