[Engine-devel] [vdsm] RFC: New Storage API

Mon Dec 17 16:13:37 UTC 2012


----- Original Message -----
> From: "Deepak C Shetty" <deepakcs at linux.vnet.ibm.com>
> To: "Saggi Mizrahi" <smizrahi at redhat.com>
> Cc: "Shu Ming" <shuming at linux.vnet.ibm.com>, "engine-devel" <engine-devel at ovirt.org>, "VDSM Project Development"
> <vdsm-devel at lists.fedorahosted.org>, "Deepak C Shetty" <deepakcs at linux.vnet.ibm.com>
> Sent: Sunday, December 16, 2012 11:40:01 PM
> Subject: Re: [vdsm] RFC: New Storage API
> 
> On 12/08/2012 01:23 AM, Saggi Mizrahi wrote:
> >
> > ----- Original Message -----
> >> From: "Deepak C Shetty" <deepakcs at linux.vnet.ibm.com>
> >> To: "Saggi Mizrahi" <smizrahi at redhat.com>
> >> Cc: "Shu Ming" <shuming at linux.vnet.ibm.com>, "engine-devel"
> >> <engine-devel at ovirt.org>, "VDSM Project Development"
> >> <vdsm-devel at lists.fedorahosted.org>, "Deepak C Shetty"
> >> <deepakcs at linux.vnet.ibm.com>
> >> Sent: Friday, December 7, 2012 12:23:15 AM
> >> Subject: Re: [vdsm] RFC: New Storage API
> >>
> >> On 12/06/2012 10:22 PM, Saggi Mizrahi wrote:
> >>> ----- Original Message -----
> >>>> From: "Shu Ming" <shuming at linux.vnet.ibm.com>
> >>>> To: "Saggi Mizrahi" <smizrahi at redhat.com>
> >>>> Cc: "VDSM Project Development"
> >>>> <vdsm-devel at lists.fedorahosted.org>, "engine-devel"
> >>>> <engine-devel at ovirt.org>
> >>>> Sent: Thursday, December 6, 2012 11:02:02 AM
> >>>> Subject: Re: [vdsm] RFC: New Storage API
> >>>>
> >>>> Saggi,
> >>>>
> >>>> Thanks for sharing your thoughts; I have some comments below.
> >>>>
> >>>>
> >>>> Saggi Mizrahi:
> >>>>> I've been throwing a lot of bits out about the new storage API
> >>>>> and
> >>>>> I think it's time to talk a bit.
> >>>>> I will purposefully try and keep implementation details away
> >>>>> and
> >>>>> concentrate about how the API looks and how you use it.
> >>>>>
> >>>>> First major change is in terminology, there is no longer a
> >>>>> storage
> >>>>> domain but a storage repository.
> >>>>> This change is done because so many things are already called
> >>>>> domain in the system and this will make things less confusing
> >>>>> for
> >>>>> newcomers with a libvirt background.
> >>>>>
> >>>>> One other change is that repositories no longer have a UUID.
> >>>>> The UUID was only used in the pool members manifest and is no
> >>>>> longer needed.
> >>>>>
> >>>>>
> >>>>> connectStorageRepository(repoId, repoFormat,
> >>>>> connectionParameters={}):
> >>>>> repoId - is a transient name that will be used to refer to the
> >>>>> connected domain, it is not persisted and doesn't have to be
> >>>>> the
> >>>>> same across the cluster.
> >>>>> repoFormat - Similar to what used to be type (eg. localfs-1.0,
> >>>>> nfs-3.4, clvm-1.2).
> >>>>> connectionParameters - This is format specific and will be used to
> >>>>> tell VDSM how to connect to the repo.
> >>>> Where does repoID come from? I think repoID doesn't exist before
> >>>> connectStorageRepository() return.  Isn't repoID a return value
> >>>> of
> >>>> connectStorageRepository()?
> >>> No, repoIDs are no longer part of the domain, they are just a
> >>> transient handle.
> >>> The user can put whatever it wants there as long as it isn't
> >>> already taken by another currently connected domain.
> >> So what happens when user mistakenly gives a repoID that is in use
> >> before.. there should be something in the return value that
> >> specifies
> >> the error and/or reason for error so that user can try with a
> >> new/diff
> >> repoID ?
> > As I said, connect fails if the repoId is in use ATM.
> >>>>> disconnectStorageRepository(self, repoId)
> >>>>>
> >>>>>
> >>>>> In the new API there are only images, some images are mutable
> >>>>> and
> >>>>> some are not.
> >>>>> mutable images are also called VirtualDisks
> >>>>> immutable images are also called Snapshots
> >>>>>
> >>>>> There are no explicit templates, you can create as many images
> >>>>> as
> >>>>> you want from any snapshot.
> >>>>>
> >>>>> There are 4 major image operations:
> >>>>>
> >>>>>
> >>>>> createVirtualDisk(targetRepoId, size, baseSnapshotId=None,
> >>>>>                      userData={}, options={}):
> >>>>>
> >>>>> targetRepoId - ID of a connected repo where the disk will be
> >>>>> created
> >>>>> size - The size of the image you wish to create
> >>>>> baseSnapshotId - the ID of the snapshot you want to base the
> >>>>> new
> >>>>> virtual disk on
> >>>>> userData - optional data that will be attached to the new VD,
> >>>>> could
> >>>>> be anything that the user desires.
> >>>>> options - options to modify VDSMs default behavior
> >> IIUC, i can use options to do storage offloads ? For eg. I can
> >> create
> >> a
> >> LUN that represents this VD on my storage array based on the
> >> 'options'
> >> parameter ? Is this the intended way to use 'options' ?
> > No, this has nothing to do with offloads.
> > If by "offloads" you mean having other VDSM hosts do the heavy
> > lifting then this is what the option autoFix=False and the fix
> > mechanism is for.
> > If you are talking about advanced scsi features (ie. write same)
> > they will be used automatically whenever possible.
> > In any case, how we manage LUNs (if they are even used) is an
> > implementation detail.
> 
> I am a bit more interested in how storage array offloads ( by that I
> mean, offload VD creation, snapshot, clone etc to the storage array
> when
> available/possible) can be done from VDSM ?
> In the past there were talks of using libSM to do that. How does that
> strategy play in this new Storage API scenario ? I agree it's an
> implementation detail, but how & where does that implementation sit and
> how it would be
> triggered is not very clear to me. Looking at createVD args, it
> sounded
> like 'options' seems to be a trigger point for deciding whether to
> use
> storage offloads or not, but you spoke otherwise :) Can you provide
> your
> vision on how VDSM can understand the storage array capabilities &
> exploit storage array offloads in this New Storage API context ? --
> Thanks deepak
Some will be used automatically whenever possible (storage offloading).
Features that favor a specific strategy will be activated when the proper strategy (space, performance) option is selected.
In cases where only the user can know whether a feature should be used, we will have options to turn it on.
In any case, every domain exports its capabilities through GetRepositoryCapabilities(), which returns a list of repository capabilities.
Some capabilities are VDSM-specific, like CLUSTERED or REQUIRES_SRM. Some are storage capabilities, like NATIVE_SNAPSHOTS, NATIVE_THIN_PROVISIONING, SPARSE_VOLUMES, etc.
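
To make the capability list a bit more concrete, here is a rough Python sketch of how a caller might consult it. This is only an illustration, not VDSM code: get_repository_capabilities() is a hypothetical stand-in for GetRepositoryCapabilities(), the repo IDs and the capabilities assigned to them are made up, and the "native"/"emulated" labels are assumptions used only to show the branching.

```python
# Hypothetical stand-in data: which capabilities each connected repo reports.
# The repo IDs and their capability sets are invented for this sketch.
_FAKE_REPOS = {
    "repo-a": ["CLUSTERED", "NATIVE_SNAPSHOTS", "SPARSE_VOLUMES"],
    "repo-b": ["REQUIRES_SRM"],
}


def get_repository_capabilities(repo_id):
    """Hypothetical stand-in for the proposed GetRepositoryCapabilities()."""
    return list(_FAKE_REPOS[repo_id])


def snapshot_strategy(repo_id):
    """Pick a snapshot approach based on what the repository reports.

    The "native"/"emulated" names are assumptions for illustration only.
    """
    caps = get_repository_capabilities(repo_id)
    return "native" if "NATIVE_SNAPSHOTS" in caps else "emulated"
```

The point is only that the caller branches on the reported capability names rather than on the repository format.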

We are also considering an override mechanism where you can disable features on storage that supports them by setting an option in the domain options. This will be done with NO_XXXXX (eg. NO_NATIVE_SNAPSHOTS). This will make the domain not use or expose the capability through the API. I assume it will only be used for testing or in cases where the storage array is known to have problems with a certain feature. Not everything can be disabled; for example, there is no real way to disable NATIVE_THIN_PROVISIONING or SPARSE_VOLUMES.
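
A minimal Python sketch of how such an override could filter the exposed capability list. The mechanism is still only under consideration, so everything here is an assumption: the function name effective_capabilities(), passing the domain options as a plain list of strings, and raising an error for capabilities that cannot be disabled are all hypothetical choices.

```python
# Capabilities that, per the paragraph above, cannot really be disabled.
NOT_DISABLEABLE = frozenset({"NATIVE_THIN_PROVISIONING", "SPARSE_VOLUMES"})


def effective_capabilities(native_caps, domain_options):
    """Return the capabilities a domain would expose after NO_* overrides.

    native_caps: capability names the storage supports natively.
    domain_options: option strings; only those starting with "NO_" are
    treated as overrides here (an assumption of this sketch).
    """
    disabled = set()
    for opt in domain_options:
        if not opt.startswith("NO_"):
            continue  # not an override, ignore in this sketch
        cap = opt[len("NO_"):]
        if cap in NOT_DISABLEABLE:
            # Rejecting the option is an assumption; it could also be ignored.
            raise ValueError("capability cannot be disabled: %s" % cap)
        disabled.add(cap)
    # Preserve the original order of the native capability list.
    return [c for c in native_caps if c not in disabled]
```

For example, calling it with ["CLUSTERED", "NATIVE_SNAPSHOTS", "SPARSE_VOLUMES"] and the option NO_NATIVE_SNAPSHOTS leaves CLUSTERED and SPARSE_VOLUMES exposed.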
> 
>