----- Original Message -----
From: "David Teigland" <teigland(a)redhat.com>
To: "Nir Soffer" <nsoffer(a)redhat.com>
Cc: "Dan Kenigsberg" <danken(a)redhat.com>, "Saggi Mizrahi"
<smizrahi(a)redhat.com>, devel(a)ovirt.org, "Federico
Simoncelli" <fsimonce(a)redhat.com>, "Allon Mureinik"
<amureini(a)redhat.com>
Sent: Wednesday, April 30, 2014 8:59:02 PM
Subject: Re: Sanlock fencing reservations
On Wed, Apr 30, 2014 at 01:27:47PM -0400, Nir Soffer wrote:
> ----- Original Message -----
> > From: "Dan Kenigsberg" <danken(a)redhat.com>
> > To: "Saggi Mizrahi" <smizrahi(a)redhat.com>, nsoffer(a)redhat.com
> > Cc: devel(a)ovirt.org, "David Teigland" <teigland(a)redhat.com>
> > Sent: Wednesday, April 16, 2014 10:33:39 AM
> > Subject: Re: Sanlock fencing reservations
> >
> > On Wed, Feb 26, 2014 at 11:14:33AM -0500, Saggi Mizrahi wrote:
> > > I've recently been introduced to this feature, and I was wondering
> > > whether this is really the correct way to go for solving this
> > > particular problem.
> > >
> > > My main issue is with making two unrelated flows dependent on each other.
> > > By pushing this into the existing sanlock data structures you limit
> > > your ability to change either of them in the future, whether to
> > > optimize or to solve problems for a single use case.
> > >
> > > Having an independent daemon perform this task would give more room
> > > in how to implement the feature.
> >
> > Saggi, are you thinking about something similar to fence_sanlockd
> > http://linux.die.net/man/8/fence_sanlockd and its client fence_sanlock?
> >
> > Using them, instead of reimplementing parts of them within Vdsm, seems
> > reasonable at first glance.
> >
> > However,
> > http://www.ovirt.org/Features/Sanlock_Fencing#Why_not_use_fence_sanlockd.3F
> > claims that it wastes 2G of storage and requires an explicit master
> > domain upgrade.
I don't care about wasting 2G of storage; 2G is peanuts, especially if we
zero it out so that unused space for hosts is not actually allocated on
most modern storage servers. If the user cares about 2G, they don't have
enough space for VMs anyway.

A domain upgrade is not a valid excuse IMO. We could also use magic
numbers and hashes to detect that the space has not yet been initialized
and initialize it on demand, roughly along the lines sketched below.
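
A minimal sketch of that idea, assuming we reserve one sector at the start
of the fencing area for a header; the magic value, layout and helper names
are made up for illustration, not actual vdsm or sanlock code:

    import hashlib
    import struct

    MAGIC = b"OVRTFENC"        # hypothetical magic for the fencing area
    HEADER_SIZE = 512          # one sector reserved for the header

    def pack_header(version):
        # Magic + version, protected by a SHA-1 digest.
        body = MAGIC + struct.pack("<I", version)
        digest = hashlib.sha1(body).digest()
        return (body + digest).ljust(HEADER_SIZE, b"\0")

    def is_initialized(header):
        # Valid only if the magic matches and the digest checks out.
        body, digest = header[:12], header[12:32]
        return body.startswith(MAGIC) and hashlib.sha1(body).digest() == digest

    def ensure_initialized(read_header, write_header, version=1):
        # Initialize on demand instead of requiring a domain format upgrade.
        if not is_initialized(read_header()):
            write_header(pack_header(version))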
> >
> > Nir, could you explain why so much space is required by sanlockd? Can it
> > be configured to use a smaller area?
>
> According to the fence_sanlock manual, each host gets a resource on the
> shared storage. Each sanlock resource is 1MB (sector size * max hosts).
> So to serve 2000 hosts we need 2G.
Again, 1MB per host is not a lot to pay.
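
Spelling out the arithmetic (assuming 512-byte sectors and sanlock's
default limit of 2000 hosts):

    SECTOR_SIZE = 512                        # bytes
    MAX_HOSTS = 2000                         # sanlock's default host limit

    resource_size = SECTOR_SIZE * MAX_HOSTS  # 1,024,000 bytes, ~1MB per host
    total_size = resource_size * MAX_HOSTS   # ~2,048,000,000 bytes, ~2G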
>
> We can use less space if we want to support a smaller number of hosts, but
> we would need a new domain format that includes a new volume for fencing.
Initially, I also suggested a design similar to fence_sanlock, which I
described in the last two paragraphs of this email:
https://lists.fedorahosted.org/pipermail/sanlock-devel/2014-February/0004...
One of the fundamental features of that design is that loss of storage
causes a host to be reset/fenced, which is precisely what we want and
require for fence_sanlock.
However, this is *not* the behavior that vdsm/rhev want, as described by
Nir in the last two paragraphs in this email:
https://lists.fedorahosted.org/pipermail/sanlock-devel/2014-March/000436....
This is a fundamental difference in goals and makes a fence_sanlock-like
approach unsuitable for this feature.
I think things get a bit scrambled here (which is why I insist on keeping
those things separate).
If you don't have access to storage, sanlock should try to kill all
processes using resources before fencing the host. The fencing agent, IMO,
shouldn't do anything if it can't read the storage, similar to how you
wouldn't fence a host if the network is down.
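
Concretely, the agent-side guard could look something like the sketch
below; the storage and fencing helpers here are hypothetical placeholders,
not an existing vdsm API:

    class StorageUnreadable(Exception):
        """Raised when the fencing area cannot be read."""

    def maybe_fence(host_id, storage, fence_backend):
        # Refuse to fence when shared storage is unreadable, the same way
        # we would not fence a host while the network is down.
        try:
            request = storage.read_fence_request(host_id)  # hypothetical
        except StorageUnreadable:
            return False
        if request is None:
            return False          # nothing asked us to fence this host
        fence_backend.reset(host_id)                       # hypothetical
        return True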
In general, if you don't have access to storage, sanlock should make sure
that when the host does get the storage back, the system keeps on chugging
along. That means that the host was either fenced or all relevant
processes were killed.
To summarize: sanlock starts working when storage is down; the fencing
agent needs to act when the storage is up. They do fundamentally different
things, and they are two very critical pieces of our validation
architecture. I don't want a bug in one affecting the other.
Dave