
----- Original Message -----
From: "David Teigland" <teigland@redhat.com> To: "Nir Soffer" <nsoffer@redhat.com> Cc: "Dan Kenigsberg" <danken@redhat.com>, "Saggi Mizrahi" <smizrahi@redhat.com>, devel@ovirt.org, "Federico Simoncelli" <fsimonce@redhat.com>, "Allon Mureinik" <amureini@redhat.com> Sent: Wednesday, April 30, 2014 8:59:02 PM Subject: Re: Sanlock fencing reservations
On Wed, Apr 30, 2014 at 01:27:47PM -0400, Nir Soffer wrote:
----- Original Message -----
From: "Dan Kenigsberg" <danken@redhat.com> To: "Saggi Mizrahi" <smizrahi@redhat.com>, nsoffer@redhat.com Cc: devel@ovirt.org, "David Teigland" <teigland@redhat.com> Sent: Wednesday, April 16, 2014 10:33:39 AM Subject: Re: Sanlock fencing reservations
On Wed, Feb 26, 2014 at 11:14:33AM -0500, Saggi Mizrahi wrote:
I've recently been introduced to this feature, and I was wondering whether this is really the correct way to go for solving this particular problem.
My main issue is with making two unrelated flows dependent on each other. By pushing this into the existing sanlock data structures, you limit your ability to change either of them in the future, whether to optimize or to solve problems for a single use case.
Having an independent daemon perform this task would give more room for deciding how to implement the feature.
Saggi, are you thinking about something similar to fence_sanlockd http://linux.die.net/man/8/fence_sanlockd and its client fence_sanlock?
Using them, instead of reimplementing parts of them within Vdsm, seems reasonable at first glance.
However, http://www.ovirt.org/Features/Sanlock_Fencing#Why_not_use_fence_sanlockd.3F claims that it wastes 2G of storage and requires an explicit master domain upgrade.

I don't care about wasting 2G of storage; 2G is peanuts, especially if we zero it out, so that unused space for hosts is not actually allocated on most modern storage servers.
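A quick illustration of the point about zeroed/untouched space not actually being allocated, assuming a file-based (e.g. NFS) domain; the path is made up for this sketch, and block domains would instead rely on the array's thin provisioning and zero detection:

import os

# Create a 2 GiB lease area as a sparse file: the logical size is 2 GiB,
# but no blocks are allocated until data is actually written, so the
# "wasted" space costs nothing until a host actually uses its slot.
SIZE = 2 * 1024 ** 3
path = "/tmp/fence_leases"        # illustrative path only

with open(path, "wb") as f:
    f.truncate(SIZE)

st = os.stat(path)
print("logical size:   %d bytes" % st.st_size)             # 2147483648
print("allocated size: %d bytes" % (st.st_blocks * 512))   # ~0 while the file is all holes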
Nir, could you explain why so much space is required by sanlockd? Can it be configured to use a smaller area?
According to the fence_sanlock manual, each host gets a resource on the shared storage. Each sanlock resource is 1MB (sector size * max hosts), so to serve 2000 hosts we need 2G. Again, 1MB per host is not a lot to pay.
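Spelling out that arithmetic (the 512-byte sector size and the 2000-host maximum are my assumptions for the sketch, not figures taken from the manual):

# Space needed for fence_sanlock-style leases, following the
# "sector size * max hosts" formula above. The 512-byte sector size
# and the 2000-host limit are assumed for illustration.
SECTOR_SIZE = 512                              # bytes
MAX_HOSTS = 2000

resource_size = SECTOR_SIZE * MAX_HOSTS        # 1,024,000 bytes, ~1 MB per host
total_size = MAX_HOSTS * resource_size         # one resource per host

print("per host: %.2f MB" % (resource_size / 1e6))   # ~1.02 MB
print("total:    %.2f GB" % (total_size / 1e9))      # ~2.05 GB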
We can use less space if we want to support a smaller number of hosts, but then we need a new domain format that includes a new volume for the fencing.
Initially, I also suggested a design similar to fence_sanlock, which I described in the last two paragraphs of this email:
https://lists.fedorahosted.org/pipermail/sanlock-devel/2014-February/000433....
One of the fundamental features of that design is that loss of storage causes a host to be reset/fenced, which is precisely what we want and require for fence_sanlock.
However, this is *not* the behavior that vdsm/rhev want, as described by Nir in the last two paragraphs of this email:
https://lists.fedorahosted.org/pipermail/sanlock-devel/2014-March/000436.htm...
This is a fundamental difference in goals and makes a fence_sanlock-like approach unsuitable for this feature.

If the user cares about 2G, they don't have enough space for VMs. Domain upgrade is not a valid excuse, IMO. We could also use magic numbers and hashes to detect that the space has not yet been initialized and initialize it on demand.

I think things get a bit scrambled here (which is why I insist on keeping those things separate). If you don't have access to storage, sanlock should try to kill all processes using resources before fencing the host. The fencing agent, IMO, shouldn't do anything if it can't read the storage, similar to how you wouldn't fence a host if the network is down.

In general, if you don't have access to storage, sanlock should make sure that when the host does get the storage back, the system keeps on chugging along. That means that the host was either fenced or all relevant processes were killed.

To summarize: sanlock starts working when storage is down; the fencing agent needs to do things when the storage is up. They do fundamentally different things, and they are two very critical pieces of our validation architecture. I don't want a bug in one affecting the other.
Dave
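A minimal sketch of the two ideas above: the magic-number/hash check for initializing the fencing area on demand, and a fencing agent that refuses to act when it cannot read the storage. Everything here (the OVFENCE1 magic, the header layout, the function names, the path handling) is made up for illustration; it is not sanlock's or vdsm's on-disk format, and a real implementation would also have to serialize concurrent initializers (e.g. under a sanlock resource):

import hashlib
import os

SECTOR = 512
MAGIC = b"OVFENCE1"               # made-up magic for this sketch


def _header():
    # Magic plus a SHA-256 of it, padded to one sector: a bare magic string
    # could appear by accident, the hash makes a false positive unlikely.
    return (MAGIC + hashlib.sha256(MAGIC).digest()).ljust(SECTOR, b"\0")


def ensure_initialized(path, size):
    """Detect an uninitialized fencing area via the header and set it up on demand.

    Assumes the volume or file at `path` already exists and is at least
    `size` bytes (a multiple of 1 MiB here, to keep the zeroing loop simple).
    """
    with open(path, "r+b") as f:
        if f.read(SECTOR) == _header():
            return                             # already initialized
        f.seek(0)
        chunk = b"\0" * (1024 * 1024)
        for _ in range(size // len(chunk)):    # zero the whole area first
            f.write(chunk)
        f.seek(0)
        f.write(_header())                     # stamp the header last
        f.flush()
        os.fsync(f.fileno())


def can_fence(path):
    """Mirror the rule above: if we cannot even read the storage, do nothing."""
    try:
        with open(path, "rb") as f:
            return f.read(SECTOR) == _header()
    except OSError:
        return False

In this sketch, ensure_initialized would run the first time a host needs the fencing area, so no explicit domain upgrade step is required, and can_fence would gate any fencing action, matching the argument that the fencing agent shouldn't do anything if it can't read the storage.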