[Engine-devel] [vdsm] RFC: New Storage API

Mon Jan 14 21:56:05 UTC 2013


----- Original Message -----
> 
> 
> ----- Original Message -----
> > From: "Deepak C Shetty" <deepakcs at linux.vnet.ibm.com>
> > To: "Saggi Mizrahi" <smizrahi at redhat.com>
> > Cc: "Shu Ming" <shuming at linux.vnet.ibm.com>, "engine-devel"
> > <engine-devel at ovirt.org>, "VDSM Project Development"
> > <vdsm-devel at lists.fedorahosted.org>, "Deepak C Shetty"
> > <deepakcs at linux.vnet.ibm.com>
> > Sent: Sunday, December 16, 2012 11:40:01 PM
> > Subject: Re: [vdsm] RFC: New Storage API
> > 
> > On 12/08/2012 01:23 AM, Saggi Mizrahi wrote:
> > >
> > > ----- Original Message -----
> > >> From: "Deepak C Shetty" <deepakcs at linux.vnet.ibm.com>
> > >> To: "Saggi Mizrahi" <smizrahi at redhat.com>
> > >> Cc: "Shu Ming" <shuming at linux.vnet.ibm.com>, "engine-devel"
> > >> <engine-devel at ovirt.org>, "VDSM Project Development"
> > >> <vdsm-devel at lists.fedorahosted.org>, "Deepak C Shetty"
> > >> <deepakcs at linux.vnet.ibm.com>
> > >> Sent: Friday, December 7, 2012 12:23:15 AM
> > >> Subject: Re: [vdsm] RFC: New Storage API
> > >>
> > >> On 12/06/2012 10:22 PM, Saggi Mizrahi wrote:
> > >>> ----- Original Message -----
> > >>>> From: "Shu Ming" <shuming at linux.vnet.ibm.com>
> > >>>> To: "Saggi Mizrahi" <smizrahi at redhat.com>
> > >>>> Cc: "VDSM Project Development"
> > >>>> <vdsm-devel at lists.fedorahosted.org>, "engine-devel"
> > >>>> <engine-devel at ovirt.org>
> > >>>> Sent: Thursday, December 6, 2012 11:02:02 AM
> > >>>> Subject: Re: [vdsm] RFC: New Storage API
> > >>>>
> > >>>> Saggi,
> > >>>>
> > >>>> Thanks for sharing your thoughts; I have some comments below.
> > >>>>
> > >>>>
> > >>>> Saggi Mizrahi:
> > >>>>> I've been throwing a lot of bits out about the new storage
> > >>>>> API
> > >>>>> and
> > >>>>> I think it's time to talk a bit.
> > >>>>> I will purposefully try and keep implementation details away
> > >>>>> and
> > >>>>> concentrate on how the API looks and how you use it.
> > >>>>>
> > >>>>> The first major change is in terminology: there is no longer a
> > >>>>> storage
> > >>>>> domain but a storage repository.
> > >>>>> This change is done because so many things are already called
> > >>>>> domain in the system and this will make things less confusing
> > >>>>> for
> > >>>>> newcomers with a libvirt background.
> > >>>>>
> > >>>>> One other change is that repositories no longer have a UUID.
> > >>>>> The UUID was only used in the pool members manifest and is no
> > >>>>> longer needed.
> > >>>>>
> > >>>>>
> > >>>>> connectStorageRepository(repoId, repoFormat,
> > >>>>> connectionParameters={}):
> > >>>>> repoId - is a transient name that will be used to refer to
> > >>>>> the
> > >>>>> connected domain, it is not persisted and doesn't have to be
> > >>>>> the
> > >>>>> same across the cluster.
> > >>>>> repoFormat - Similar to what used to be type (eg.
> > >>>>> localfs-1.0,
> > >>>>> nfs-3.4, clvm-1.2).
> > >>>>> connectionParameters - This is format specific and will be used
> > >>>>> to
> > >>>>> tell VDSM how to connect to the repo.
> > >>>> Where does repoID come from? I think repoID doesn't exist
> > >>>> before
> > >>>> connectStorageRepository() returns.  Isn't repoID a return
> > >>>> value
> > >>>> of
> > >>>> connectStorageRepository()?
> > >>> No, repoIDs are no longer part of the domain, they are just a
> > >>> transient handle.
> > >>> The user can put whatever it wants there as long as it isn't
> > >>> already taken by another currently connected domain.
> > >> So what happens when the user mistakenly gives a repoID that is
> > >> already in use? There should be something in the return value
> > >> that specifies the error and/or the reason for the error, so
> > >> that the user can try again with a new/different repoID.
> > > As I said, connect fails if the repoId is in use ATM.
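The transient-handle behavior described above (the caller picks the repoId, and connect fails while that repoId is held by a currently connected repository) can be sketched roughly as follows. This is only an illustration, not VDSM code: the RepoRegistry class, the RepoIdInUse exception, and the "export" connection parameter are hypothetical names, and the thread does not say how the failure is actually reported to the caller.

```python
class RepoIdInUse(Exception):
    """Hypothetical error: the repoId is held by a connected repository."""


class RepoRegistry:
    """Hypothetical in-memory table of currently connected repositories."""

    def __init__(self):
        # repoId -> (repoFormat, connectionParameters); nothing is persisted.
        self._connected = {}

    def connectStorageRepository(self, repoId, repoFormat,
                                 connectionParameters=None):
        # The caller chooses repoId; it only has to be unique among the
        # repositories that are connected right now.
        if repoId in self._connected:
            raise RepoIdInUse(repoId)
        self._connected[repoId] = (repoFormat, dict(connectionParameters or {}))

    def disconnectStorageRepository(self, repoId):
        # Dropping the entry frees the handle so it can be reused later.
        del self._connected[repoId]


registry = RepoRegistry()
registry.connectStorageRepository("repo-a", "nfs-3.4", {"export": "host:/path"})
try:
    registry.connectStorageRepository("repo-a", "localfs-1.0")
except RepoIdInUse as err:
    print("connect failed, repoId in use:", err)
```

Under this sketch the caller simply retries with a different repoId, or disconnects the existing repository first.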
> > >>>>> disconnectStorageRepository(self, repoId)
> > >>>>>
> > >>>>>
> > >>>>> In the new API there are only images, some images are mutable
> > >>>>> and
> > >>>>> some are not.
> > >>>>> mutable images are also called VirtualDisks
> > >>>>> immutable images are also called Snapshots
> > >>>>>
> > >>>>> There are no explicit templates, you can create as many
> > >>>>> images
> > >>>>> as
> > >>>>> you want from any snapshot.
> > >>>>>
> > >>>>> There are 4 major image operations:
> > >>>>>
> > >>>>>
> > >>>>> createVirtualDisk(targetRepoId, size, baseSnapshotId=None,
> > >>>>>                      userData={}, options={}):
> > >>>>>
> > >>>>> targetRepoId - ID of a connected repo where the disk will be
> > >>>>> created
> > >>>>> size - The size of the image you wish to create
> > >>>>> baseSnapshotId - the ID of the snapshot you want to base the
> > >>>>> new
> > >>>>> virtual disk on
> > >>>>> userData - optional data that will be attached to the new VD,
> > >>>>> could
> > >>>>> be anything that the user desires.
> > >>>>> options - options to modify VDSM's default behavior
> > >> IIUC, I can use options to do storage offloads? E.g., I can
> > >> create
> > >> a
> > >> LUN that represents this VD on my storage array based on the
> > >> 'options'
> > >> parameter ? Is this the intended way to use 'options' ?
> > > No, this has nothing to do with offloads.
> > > If by "offloads" you mean having other VDSM hosts do the heavy
> > > lifting then this is what the option autoFix=False and the fix
> > > mechanism are for.
> > > If you are talking about advanced SCSI features (i.e. write same)
> > > they will be used automatically whenever possible.
> > > In any case, how we manage LUNs (if they are even used) is an
> > > implementation detail.
> > 
> > I am a bit more interested in how storage array offloads ( by that
> > I
> > mean, offload VD creation, snapshot, clone etc to the storage array
> > when
> > available/possible) can be done from VDSM ?
> > In the past there were talks of using libSM to do that. How does
> > that strategy play in this new Storage API scenario? I agree it's an
> > implementation detail, but how and where that implementation sits
> > and how it would be triggered is not very clear to me. Looking at
> > the createVD args, it sounded like 'options' was the trigger point
> > for deciding whether to use storage offloads or not, but you said
> > otherwise :) Can you provide your vision of how VDSM can understand
> > the storage array capabilities and exploit storage array offloads in
> > this New Storage API context? --
> > Thanks deepak
> Some will be used automatically whenever possible (storage
> offloading).
> Features that favor a specific strategy will be activated when the
> proper strategy (space, performance) option is selected.
> In cases when only the user can know whether to use a feature or not
> we will have options to turn that on.
> In any case every domain exports a capability list through
> GetRepositoryCapabilities() that returns a list of repository
> capabilities.
> Some capabilities are VDSM specific like CLUSTERED or REQUIRES_SRM.
> Some are storage capabilities like NATIVE_SNAPSHOTS,
> NATIVE_THIN_PROVISIONING, SPARSE_VOLUMES, etc...
> 
> We are also considering an override mechanism where you can disable
> features in storage that supports them by setting this in the domain
> options. This will be done with NO_XXXXX (eg. NO_NATIVE_SNAPSHOTS).
> This will make the domain not use or expose the capability through
> the API. I assume it will only be used for testing or in cases where
> the storage array is known to have problems with a certain feature.
> Not everything can be disabled; as an example, there is no real way to
> disable NATIVE_THIN_PROVISIONING or SPARSE_VOLUMES.
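To make the override idea above concrete, here is a minimal sketch of how NO_XXXXX entries in the domain options could filter the list a repository exposes. The capability names come from the thread; the effective_capabilities() helper, the shape of the options (a plain list of strings), and the choice to raise an error for capabilities that cannot be disabled are all assumptions made for illustration.

```python
# Capabilities the thread says cannot really be switched off.
NOT_DISABLEABLE = {"NATIVE_THIN_PROVISIONING", "SPARSE_VOLUMES"}


def effective_capabilities(native_caps, domain_options):
    """Return native_caps minus anything disabled through a NO_<CAP> option.

    Hypothetical helper: native_caps is the list the storage supports,
    domain_options is an iterable of option strings.
    """
    disabled = set()
    for option in domain_options:
        if not option.startswith("NO_"):
            continue  # not an override, ignore it in this sketch
        cap = option[len("NO_"):]
        if cap in NOT_DISABLEABLE:
            # Assumption: reject the request rather than silently ignore it.
            raise ValueError("capability cannot be disabled: " + cap)
        disabled.add(cap)
    # Disabled capabilities are neither used nor exposed through the API.
    return [cap for cap in native_caps if cap not in disabled]


caps = effective_capabilities(
    ["CLUSTERED", "NATIVE_SNAPSHOTS", "SPARSE_VOLUMES"],
    ["NO_NATIVE_SNAPSHOTS"],
)
print(caps)
```

Whether a real implementation would reject or ignore an impossible override is not stated in the thread.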

Saggi, there are several different discussions going on here which I think require some clearing up (and perhaps splitting).
What I think is missing here:
1. A distinction between what we believe should be at the repo level and what at the disk level,
    e.g. pre/postZero at the repo level; native snapshots as described above would also be at the repo level (not defined per disk), etc.
2. How and where would storage offload work? Is there a single implementation for repos that automatically detects storage capabilities, or a repo class for each storage type?
3. The biggest topic is probably a mapping of the image operations and details about the flows (did you send something about that?), i.e. create vdisk flow, copy, etc.

