Re: [ovirt-devel] Sanlock fencing reservations

On Wed, Feb 26, 2014 at 11:14:33AM -0500, Saggi Mizrahi wrote:
I've recently been introduced to this feature, and I was wondering whether this is really the correct way to go for solving this particular problem.
My main issue is with making two unrelated flows dependent on each other. By pushing this into the existing sanlock data structures, you limit yourself in the future from changing either one, whether to optimize or to solve problems for a single use case.
Having an independent daemon perform this task would give us more room in how to implement the feature.
Saggi, are you thinking about something similar to fence_sanlockd (http://linux.die.net/man/8/fence_sanlockd) and its client fence_sanlock? Using them, instead of reimplementing parts of them within Vdsm, seems reasonable at first glance. However, http://www.ovirt.org/Features/Sanlock_Fencing#Why_not_use_fence_sanlockd.3F claims that it wastes 2G of storage and requires an explicit master domain upgrade. Nir, could you explain why so much space is required by fence_sanlockd? Can it be configured to use a smaller area?
I don't want to reach a situation where we need to change a sanlock struct and can't do it because it would cause problems with the fencing flows.
I believe in the mantra that things should do one thing and do it well. This feels like an ad-hoc solution to a very niche problem.
Furthermore, it kind of seems like a mailbox issue. Leaving a fencing request is just a message. In the future I can see it being a suspend-to-disk request instead, so that you don't even have to fence the host in such cases.
The only reason I see people putting it in sanlock is that it's a daemon that reads from disk and does fencing.
I agree that in VDSM's current state, putting this in the mailbox is unreliable, to say the least, but that doesn't mean we can't have a small independent daemon do the task until we get messaging in VDSM to a stable state.
IMHO it's better than having it as an ad-hoc feature in sanlock: a feature which we can't remove later, as someone might depend on it; a feature that might limit us, or that we might even abandon once we have more reliable disk-based messaging in VDSM.
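To make the idea concrete, here is a minimal sketch (in Python) of what such a small independent daemon could look like, polling a per-host "fence request" record on shared storage. The record layout, path, host id, and action codes below are all invented for the illustration; this is a sketch of the shape of the idea, not a proposed implementation.

    import struct
    import subprocess
    import time

    # Hypothetical on-disk layout: one 512-byte record per host id.
    RECORD_SIZE = 512
    RECORD_FMT = "<4sII"                 # magic, requesting host id, action code
    MAGIC = b"FNCE"
    ACTION_NONE, ACTION_FENCE, ACTION_SUSPEND = 0, 1, 2

    REQUESTS_PATH = "/path/to/shared/fence-requests"   # hypothetical shared volume
    MY_HOST_ID = 3                                     # this host's id in the cluster
    POLL_INTERVAL = 10                                 # seconds between polls

    def read_request(path, host_id):
        """Read this host's record and return (sender, action), or None."""
        with open(path, "rb") as f:
            f.seek(host_id * RECORD_SIZE)
            data = f.read(RECORD_SIZE)
        if len(data) < struct.calcsize(RECORD_FMT):
            return None
        magic, sender, action = struct.unpack_from(RECORD_FMT, data)
        if magic != MAGIC or action == ACTION_NONE:
            return None                  # no pending request for us
        return sender, action

    def main():
        while True:
            request = read_request(REQUESTS_PATH, MY_HOST_ID)
            if request:
                sender, action = request
                if action == ACTION_FENCE:
                    # A peer asked us to fence ourselves: reboot immediately.
                    subprocess.call(["systemctl", "reboot", "--force"])
                elif action == ACTION_SUSPEND:
                    # The future "suspend-to-disk request" mentioned above.
                    subprocess.call(["systemctl", "hibernate"])
            time.sleep(POLL_INTERVAL)

    if __name__ == "__main__":
        main()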

----- Original Message -----
From: "Dan Kenigsberg" <danken@redhat.com> To: "Saggi Mizrahi" <smizrahi@redhat.com>, nsoffer@redhat.com Cc: devel@ovirt.org, "David Teigland" <teigland@redhat.com> Sent: Wednesday, April 16, 2014 10:33:39 AM Subject: Re: Sanlock fencing reservations
On Wed, Feb 26, 2014 at 11:14:33AM -0500, Saggi Mizrahi wrote:
I've recently been introduced to this feature, and I was wondering whether this is really the correct way to go for solving this particular problem.
My main issue is with making two unrelated flows dependent on each other. By pushing this into the existing sanlock data structures, you limit yourself in the future from changing either one, whether to optimize or to solve problems for a single use case.
Having an independent daemon perform this task would give us more room in how to implement the feature.
Saggi, are you thinking about something similar to fence_sanlockd (http://linux.die.net/man/8/fence_sanlockd) and its client fence_sanlock?
Using them, instead of reimplementing parts of them within Vdsm, seems reasonable at first glance.
However, http://www.ovirt.org/Features/Sanlock_Fencing#Why_not_use_fence_sanlockd.3F claims that it wastes 2G of storage and requires an explicit master domain upgrade.
Nir, could you explain why so much space is required by fence_sanlockd? Can it be configured to use a smaller area?
According to the fence_sanlock manual, each host gets a resource on the shared storage. Each sanlock resource is 1MB (sector size * max hosts). So to serve 2000 hosts we need 2G. We can use less space if we want to support a smaller number of hosts, but we would need a new domain format that includes a new volume for the fencing.

Nir
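A quick back-of-the-envelope check of these figures, assuming 512-byte sectors and the 2000-host maximum mentioned above:

    # One sanlock resource per host, sized sector_size * max_hosts.
    SECTOR_SIZE = 512        # bytes, assuming 512-byte sectors
    MAX_HOSTS = 2000         # maximum hosts supported

    resource_size = SECTOR_SIZE * MAX_HOSTS      # ~1MB per host resource
    total_size = resource_size * MAX_HOSTS       # one resource for each host

    print("per-host resource: %.2f MiB" % (resource_size / 1024.0 ** 2))  # ~0.98 MiB
    print("total fencing area: %.2f GiB" % (total_size / 1024.0 ** 3))    # ~1.91 GiB, i.e. the ~2G figure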

On Wed, Apr 30, 2014 at 01:27:47PM -0400, Nir Soffer wrote:
----- Original Message -----
From: "Dan Kenigsberg" <danken@redhat.com> To: "Saggi Mizrahi" <smizrahi@redhat.com>, nsoffer@redhat.com Cc: devel@ovirt.org, "David Teigland" <teigland@redhat.com> Sent: Wednesday, April 16, 2014 10:33:39 AM Subject: Re: Sanlock fencing reservations
On Wed, Feb 26, 2014 at 11:14:33AM -0500, Saggi Mizrahi wrote:
I've recently been introduced to this feature, and I was wondering whether this is really the correct way to go for solving this particular problem.
My main issue is with making two unrelated flows dependent on each other. By pushing this into the existing sanlock data structures, you limit yourself in the future from changing either one, whether to optimize or to solve problems for a single use case.
Having an independent daemon perform this task would give us more room in how to implement the feature.
Saggi, are you thinking about something similar to fence_sanlockd (http://linux.die.net/man/8/fence_sanlockd) and its client fence_sanlock?
Using them, instead of reimplementing parts of them within Vdsm, seems reasonable at first glance.
However, http://www.ovirt.org/Features/Sanlock_Fencing#Why_not_use_fence_sanlockd.3F claims that it wastes 2G of storage and requires an explicit master domain upgrade.
Nir, could you explain why so much space is required by fence_sanlockd? Can it be configured to use a smaller area?
According to the fence_sanlock manual, each host gets a resource on the shared storage. Each sanlock resource is 1MB (sector size * max hosts). So to serve 2000 hosts we need 2G.
We can use less space if we want to support a smaller number of hosts, but we would need a new domain format that includes a new volume for the fencing.
Initially, I also suggested a design similar to fence_sanlock, which I described in the last two paragraphs of this email:
https://lists.fedorahosted.org/pipermail/sanlock-devel/2014-February/000433....

One of the fundamental features of that design is that loss of storage causes a host to be reset/fenced, which is precisely what we want and require for fence_sanlock.

However, this is *not* the behavior that vdsm/rhev want, as described by Nir in the last two paragraphs in this email:
https://lists.fedorahosted.org/pipermail/sanlock-devel/2014-March/000436.htm...

This is a fundamental difference in goals and makes a fence_sanlock-like approach unsuitable for this feature.

Dave

----- Original Message -----
From: "David Teigland" <teigland@redhat.com> To: "Nir Soffer" <nsoffer@redhat.com> Cc: "Dan Kenigsberg" <danken@redhat.com>, "Saggi Mizrahi" <smizrahi@redhat.com>, devel@ovirt.org, "Federico Simoncelli" <fsimonce@redhat.com>, "Allon Mureinik" <amureini@redhat.com> Sent: Wednesday, April 30, 2014 8:59:02 PM Subject: Re: Sanlock fencing reservations
On Wed, Apr 30, 2014 at 01:27:47PM -0400, Nir Soffer wrote:
----- Original Message -----
From: "Dan Kenigsberg" <danken@redhat.com> To: "Saggi Mizrahi" <smizrahi@redhat.com>, nsoffer@redhat.com Cc: devel@ovirt.org, "David Teigland" <teigland@redhat.com> Sent: Wednesday, April 16, 2014 10:33:39 AM Subject: Re: Sanlock fencing reservations
On Wed, Feb 26, 2014 at 11:14:33AM -0500, Saggi Mizrahi wrote:
I've recently been introduced to this feature, and I was wondering whether this is really the correct way to go for solving this particular problem.
My main issue is with making two unrelated flows dependent on each other. By pushing this into the existing sanlock data structures, you limit yourself in the future from changing either one, whether to optimize or to solve problems for a single use case.
Having an independent daemon perform this task would give us more room in how to implement the feature.
Saggi, are you thinking about something similar to fence_sanlockd (http://linux.die.net/man/8/fence_sanlockd) and its client fence_sanlock?
Using them, instead of reimplementing parts of them within Vdsm, seems reasonable at first glance.
However, http://www.ovirt.org/Features/Sanlock_Fencing#Why_not_use_fence_sanlockd.3F claims that it wastes 2G of storage and requires an explicit master domain upgrade.

I don't care about wasting 2G of storage; 2G is peanuts, especially if we zero it out so that unused space for hosts is not actually allocated on most modern storage servers.
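To illustrate the allocation point, a small sketch assuming a file-based fencing area (the path is hypothetical); a sparse file of the full logical size consumes essentially no real space until the per-host slots are actually written:

    import os

    AREA_SIZE = 2 * 1024 ** 3                     # the 2G discussed above
    AREA_PATH = "/path/to/shared/fence-area"      # hypothetical file on a file-based domain

    with open(AREA_PATH, "wb") as f:
        f.truncate(AREA_SIZE)                     # logical size 2G, but the file is sparse

    # st_blocks * 512 shows the space actually allocated: ~0 for a sparse file.
    print(os.stat(AREA_PATH).st_blocks * 512)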
Nir, could you explain why so much space is required by fence_sanlockd? Can it be configured to use a smaller area?
According to the fence_sanlock manual, each host gets a resource on the shared storage. Each sanlock resource is 1MB (sector size * max hosts). So to serve 2000 hosts we need 2G.

Again, 1MB per host is not a lot to pay.

We can use less space if we want to support a smaller number of hosts, but we would need a new domain format that includes a new volume for the fencing.
Initially, I also suggested a design similar to fence_sanlock, which I described in the last two paragraphs of this email:
https://lists.fedorahosted.org/pipermail/sanlock-devel/2014-February/000433....
One of the fundamental features of that design is that loss of storage causes a host to be reset/fenced, which is precisely what we want and require for fence_sanlock.
However, this is *not* the behavior that vdsm/rhev want, as described by Nir in the last two paragraphs in this email:
https://lists.fedorahosted.org/pipermail/sanlock-devel/2014-March/000436.htm...
This is a fundamental difference in goals and makes a fence_sanlock-like approach unsuitable for this feature.

Dave

I think things get a bit scrambled here (which is why I insist on keeping those things separate).

If the user cares about 2G, they don't have enough space for VMs. Domain upgrade is not a valid excuse IMO. We could also use magic numbers and hashes to detect that the space has not yet been initialized and initialize it on demand.

If you don't have access to storage, sanlock should try to kill all processes using resources before fencing the host. The fencing agent, IMO, shouldn't do anything if it can't read the storage, similar to how you wouldn't fence a host if the network is down. In general, if you don't have access to storage, sanlock should make sure that when the host does get the storage back, the system keeps on chugging along. That means that the host was either fenced or all relevant processes were killed.

To summarize: sanlock starts working when storage is down; the fencing agent needs to do things when the storage is up. They do fundamentally different things, and they are two very critical pieces of our validation arch. I don't want a bug in one affecting the other.
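A minimal sketch of the magic-numbers-and-hashes idea mentioned above, assuming a hypothetical header at the start of a preallocated fencing area; the magic value, layout, and checksum choice are invented for the illustration:

    import hashlib
    import struct

    MAGIC = b"OVFENCE1"                           # hypothetical 8-byte magic
    HEADER_FMT = "<8s32s"                         # magic + sha256(magic), 40 bytes
    HEADER_SIZE = struct.calcsize(HEADER_FMT)
    SECTOR = 512

    def is_initialized(path):
        """Return True if the fencing area already carries a valid header."""
        with open(path, "rb") as f:
            data = f.read(HEADER_SIZE)
        if len(data) < HEADER_SIZE:
            return False
        magic, digest = struct.unpack(HEADER_FMT, data)
        return magic == MAGIC and digest == hashlib.sha256(magic).digest()

    def initialize(path):
        """Write the header sector; per-host slots can be zeroed lazily."""
        header = struct.pack(HEADER_FMT, MAGIC, hashlib.sha256(MAGIC).digest())
        # Assumes the area (file or volume) already exists, e.g. preallocated
        # by the domain, and we only stamp it on first use.
        with open(path, "r+b") as f:
            f.write(header.ljust(SECTOR, b"\0"))

    def ensure_initialized(path):
        if not is_initialized(path):
            initialize(path)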