sanlock + gluster recovery -- RFE


On 05/21/2014 12:31 AM, Ted Miller wrote:
Itamar, I am addressing this to you because one of your assignments seems to be to coordinate other oVirt contributors when dealing with issues that are raised on the ovirt-users email list.
As you are aware, there is an ongoing split-brain problem with running sanlock on replicated gluster storage. Personally, I believe that this is the 5th time that I have been bitten by this sanlock+gluster problem.
I believe that the following are true (if not, my entire request is probably off base).
* ovirt uses sanlock in such a way that when the sanlock storage is on a replicated gluster file system, very small storage disruptions can result in a gluster split-brain on the sanlock space
  o gluster is aware of the problem, and is working on a different way of replicating data, which will reduce these problems.
* most (maybe all) of the sanlock locks have a short duration, measured in seconds
* there are only a couple of things that a user can safely do from the command line when a file is in split-brain
  o delete the file
  o rename (mv) the file
_How did I get into this mess?_
had 3 hosts running ovirt 3.3
each hosted VMs
gluster replica 3 storage
engine was external to cluster
upgraded 3 hosts from ovirt 3.3 to 3.4
hosted-engine deploy
used new gluster volume (accessed via nfs) for storage
storage was accessed using localhost:engVM1 link (localhost was probably a poor choice)
created new engine on VM (did not transfer any data from old engine)
added 3 hosts to new engine via web-gui
ran above setup for 3 days
shut entire system down before I left on vacation (holiday)
came back from vacation
powered on hosts
found that iptables did not have rules for gluster access (a continuing problem if host installation is allowed to set up firewall)
added rules for gluster
glusterfs now up and running
added storage manually
tried "hosted-engine --vm-start"
vm did not start
logs show sanlock errors
ran "gluster volume heal engVM1 full"
"gluster volume heal engVM1 info split-brain" showed 6 files in split-brain
all 5 prefixed by /rhev/data-center/mnt/localhost\:_engVM1
UUID/dom_md/ids
UUID/images/UUID/UUID (VM hard disk)
UUID/images/UUID/UUID.lease
UUID/ha_agent/hosted-engine.lockspace
UUID/ha_agent/hosted-engine.metadata
I copied each of the above files off of each of the three bricks to a safe place (15 files copied)
I renamed the 5 files on /rhev/....
I copied the 5 files from one of the bricks to /rhev/
files can now be read OK (e.g. cat ids)
sanlock.log shows error sets like these:
2014-05-20 03:23:39-0400 36199 [2843]: s3358 lockspace 5ebb3b40-a394-405b-bbac-4c0e21ccd659:1:/rhev/data-center/mnt/localhost:_engVM1/5ebb3b40-a394-405b-bbac-4c0e21ccd659/dom_md/ids:0
2014-05-20 03:23:39-0400 36199 [18873]: open error -5 /rhev/data-center/mnt/localhost:_engVM1/5ebb3b40-a394-405b-bbac-4c0e21ccd659/dom_md/ids
2014-05-20 03:23:39-0400 36199 [18873]: s3358 open_disk /rhev/data-center/mnt/localhost:_engVM1/5ebb3b40-a394-405b-bbac-4c0e21ccd659/dom_md/ids error -5
2014-05-20 03:23:40-0400 36200 [2843]: s3358 add_lockspace fail result -19
I am now stuck
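The manual salvage just described (save every brick's copy of each file, then push one copy back through the mount) can be sketched with throw-away mock bricks. All paths below are made up for illustration; real bricks live wherever your gluster volume places them:

```shell
#!/bin/sh
# Sketch of the salvage procedure from the thread, run against mock bricks.
set -e
work=$(mktemp -d)
# simulate three bricks, each holding its own (diverged) copy of dom_md/ids
for b in brick1 brick2 brick3; do
    mkdir -p "$work/$b/dom_md"
    printf 'ids data from %s\n' "$b" > "$work/$b/dom_md/ids"
done
# step 1: copy every brick's copy to a safe place before touching anything
mkdir -p "$work/safe"
for b in brick1 brick2 brick3; do
    cp "$work/$b/dom_md/ids" "$work/safe/ids.$b"
done
# step 2: pick one brick's copy as the survivor and restore it at the mount point
mkdir -p "$work/mnt/dom_md"
cp "$work/safe/ids.brick1" "$work/mnt/dom_md/ids"
# the restored file is readable again (the "cat ids" check from the thread)
cat "$work/mnt/dom_md/ids"
```

Note this only makes the file readable again; as the log excerpt shows, sanlock can still refuse the lockspace afterwards, which is the gap the RFE below is about.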
What I would like to see in ovirt to help me (and others like me). Alternates listed in order from most desirable (automatic) to least desirable (set of commands to type, with lots of variables to figure out).
1. automagic recovery
* When a host is not able to access sanlock, it writes a small "problem" text file into the shared storage
  o the host-ID as part of the name (so only one host ever accesses that file)
  o a status number for the error causing problems
  o time stamp
  o time stamp when last sanlock lease will expire
  o if sanlock is able to access the file, the "problem" file is deleted
* when time passes for its last sanlock lease to be expired, highest number host does a survey
  o did all other hosts create "problem" files?
  o do all "problem" files show same (or compatible) error codes related to file access problems?
  o are all hosts communicating by network?
  o if yes to all above:
* delete all sanlock storage space
* initialize sanlock from scratch
* restart whatever may have given up because of sanlock
* restart VM if necessary
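The "problem file" convention proposed above could be prototyped in a few lines of shell. Every file name, field name, and number here is invented for illustration, not an existing oVirt mechanism:

```shell
#!/bin/sh
# Hedged sketch of the proposed per-host "problem" file: the name carries
# the host ID (so no two hosts ever write the same file), the body carries
# an errno-style status code plus the two timestamps from the proposal.
set -e
shared=$(mktemp -d)        # stands in for the shared-storage mount
host_id=2                  # this host's sanlock host ID
err=-5                     # error code seen when accessing the lockspace
now=$(date -u +%s)
lease_expiry=$((now + 80)) # when this host's last-held lease will have expired
probfile="$shared/problem.host$host_id"
{
    echo "status=$err"
    echo "written=$now"
    echo "lease_expiry=$lease_expiry"
} > "$probfile"
# if sanlock access comes back before the survey, the host deletes its file:
#   rm -f "$probfile"
cat "$probfile"
```

The surveying host would then just list `problem.host*` files, compare the `status` fields, and wait for the latest `lease_expiry` before wiping and re-initializing the lockspace.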
2. recovery subcommand
* add "hosted-engine --lock-initialize" command that would delete sanlock, start over from scratch
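A hedged sketch of what such a subcommand might do under the hood. The `--lock-initialize` flag does not exist; the service names are the standard oVirt HA daemons, the lockspace name and path are assumptions, and `sanlock direct init -s` takes a lockspace spec of the form name:host_id:path:offset:

```shell
# Hypothetical recovery sequence -- not an existing hosted-engine command.
# $MOUNT stands for the hosted-engine storage mount; UUID as in the listing above.
systemctl stop ovirt-ha-agent ovirt-ha-broker     # stop the lockspace users
# re-create the damaged lockspace with sanlock's own initializer
sanlock direct init -s hosted-engine:0:$MOUNT/UUID/ha_agent/hosted-engine.lockspace:0
systemctl start ovirt-ha-broker ovirt-ha-agent
```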
3. script
* publish a script (in ovirt packages or available on web) which, when run, does all (or most) of the recovery process needed.
4. commands
* publish on the web a "recipe" for dealing with files that commonly go split-brain
  o ids
  o *.lease
  o *.lockspace
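Such a recipe might look like the following hedged sketch, mirroring the manual procedure for gluster releases of that era (which had no built-in split-brain resolution command). Brick paths, volume name, and the gfid are placeholders; back up every copy first:

```shell
gluster volume heal engVM1 info split-brain      # list the affected files
# inspect the AFR changelog xattrs on each brick's copy to judge which is good
getfattr -d -m . -e hex /export/brick/UUID/dom_md/ids
# on every brick holding a bad copy: remove the file...
rm /export/brick/UUID/dom_md/ids
# ...and its gfid hard link under that brick's .glusterfs tree
# (the first two hex pairs of the gfid form the two directory levels)
rm /export/brick/.glusterfs/aa/bb/aabb-placeholder-gfid
gluster volume heal engVM1 full                  # heal from the surviving copy
```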
Any chance of any help on any of the above levels?
Ted Miller Elkhart, IN, USA
vijay/allon/federico --^ ?

----- Original Message -----
From: "Ted Miller" <tmiller@hcjb.org>
To: "users" <users@ovirt.org>
Sent: Tuesday, May 20, 2014 11:31:42 PM
Subject: [ovirt-users] sanlock + gluster recovery -- RFE
As you are aware, there is an ongoing split-brain problem with running sanlock on replicated gluster storage. Personally, I believe that this is the 5th time that I have been bitten by this sanlock+gluster problem.
I believe that the following are true (if not, my entire request is probably off base).
* ovirt uses sanlock in such a way that when the sanlock storage is on a replicated gluster file system, very small storage disruptions can result in a gluster split-brain on the sanlock space
Although this is possible (at the moment), we are working hard to avoid it. The hardest part here is to ensure that the gluster volume is properly configured. The suggested configuration for a volume to be used with ovirt is:

Volume Name: (...)
Type: Replicate
Volume ID: (...)
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks: (...three bricks...)
Options Reconfigured:
network.ping-timeout: 10
cluster.quorum-type: auto

The two options ping-timeout and quorum-type are really important. You would also need a build where this bug is fixed in order to avoid any chance of a split-brain: https://bugzilla.redhat.com/show_bug.cgi?id=1066996
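For reference, the two options highlighted above can be applied to an existing volume with `gluster volume set`, run on any node of the trusted pool (volume name taken from this thread):

```shell
gluster volume set engVM1 network.ping-timeout 10
gluster volume set engVM1 cluster.quorum-type auto
gluster volume info engVM1    # confirm both appear under "Options Reconfigured"
```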
How did I get into this mess?
...
What I would like to see in ovirt to help me (and others like me). Alternates listed in order from most desirable (automatic) to least desirable (set of commands to type, with lots of variables to figure out).
The real solution is to avoid the split-brain altogether. At the moment it seems that using the suggested configurations and the bug fix we shouldn't hit a split-brain.
1. automagic recovery
2. recovery subcommand
3. script
4. commands
I think that the commands to resolve a split-brain should be documented. I just started a page here: http://www.ovirt.org/Gluster_Storage_Domain_Reference

Could you add your documentation there? Thanks!

--
Federico