From: "Itamar Heim" <iheim(a)redhat.com>
To: "Saggi Mizrahi" <smizrahi(a)redhat.com>
Cc: "VDSM Project Development" <vdsm-devel(a)lists.fedorahosted.org>,
"engine-devel" <engine-devel(a)ovirt.org>
Sent: Monday, January 14, 2013 6:18:13 AM
Subject: Re: [vdsm] RFC: New Storage API
On 12/04/2012 11:52 PM, Saggi Mizrahi wrote:
> I've been throwing a lot of bits out about the new storage API and
> I think it's time to talk a bit.
> I will purposefully try and keep implementation details away and
> concentrate on how the API looks and how you use it.
>
> First major change is in terminology: there is no longer a storage
> domain but a storage repository.
> This change is done because so many things are already called
> domain in the system and this will make things less confusing for
> newcomers with a libvirt background.
>
> Another change is that repositories no longer have a UUID.
> The UUID was only used in the pool members manifest and is no
> longer needed.
>
>
> connectStorageRepository(repoId, repoFormat,
> connectionParameters={}):
> repoId - is a transient name that will be used to refer to the
> connected repo; it is not persisted and doesn't have to be the
> same across the cluster.
> repoFormat - Similar to what used to be type (eg. localfs-1.0,
> nfs-3.4, clvm-1.2).
> connectionParameters - This is format specific and will be used to
> tell VDSM how to connect to the repo.
>
> disconnectStorageRepository(self, repoId):
>
>
> In the new API there are only images, some images are mutable and
> some are not.
> mutable images are also called VirtualDisks
> immutable images are also called Snapshots
>
> There are no explicit templates, you can create as many images as
> you want from any snapshot.
>
> There are 4 major image operations:
>
>
> createVirtualDisk(targetRepoId, size, baseSnapshotId=None,
> userData={}, options={}):
>
> targetRepoId - ID of a connected repo where the disk will be
> created
> size - The size of the image you wish to create
> baseSnapshotId - the ID of the snapshot you want to base the new
> virtual disk on
> userData - optional data that will be attached to the new VD, could
> be anything that the user desires.
> options - options to modify VDSM's default behavior
>
> returns the id of the new VD
>
> createSnapshot(targetRepoId, baseVirtualDiskId,
> userData={}, options={}):
> targetRepoId - The ID of a connected repo where the new snapshot
> will be created and where the original image exists as well.
> baseVirtualDiskId - the ID of a mutable image (Virtual Disk) you want
> to snapshot
> userData - optional data that will be attached to the new Snapshot,
> could be anything that the user desires.
> options - options to modify VDSM's default behavior
>
> returns the id of the new Snapshot
>
> copyImage(targetRepoId, imageId, baseImageId=None, userData={},
> options={})
> targetRepoId - The ID of a connected repo where the new image will
> be created
> imageId - The image you wish to copy
> baseImageId - if specified, the new image will contain only the
> diff between imageId and baseImageId.
> If None the new image will contain all the bits of
> imageId. This can be used to copy partial parts of
> images for export.
> userData - optional data that will be attached to the new image,
> could be anything that the user desires.
> options - options to modify VDSM's default behavior
>
> returns the ID of the new image. In case of copying an immutable
> image the ID will be identical to the original image as they
> contain the same data. However the user should not assume that and
> always use the value returned from the method.
>
> removeImage(repositoryId, imageId, options={}):
> repositoryId - The ID of a connected repo where the image to delete
> resides
> imageId - The id of the image you wish to delete.
>
>
> ----
> getImageStatus(repositoryId, imageId)
> repositoryId - The ID of a connected repo where the image to check
> resides
> imageId - The id of the image you wish to check.
>
> All operations return once the operation has been committed to
> disk NOT when the operation actually completes.
> This is done so that:
> - operations come to a stable state as quickly as possible.
> - In cases where there is an SDM, only a small portion of the
> operation actually needs to be performed on the SDM host.
> - No matter how many times the operation fails and on how many
> hosts, you can always resume the operation and choose when to do
> it.
> - You can stop an operation at any time and remove the resulting
> object, making a distinction between "stop because the host is
> overloaded" and "I don't want that image".
>
> This means that after calling any operation that creates a new
> image the user must then call getImageStatus() to check what is
> the status of the image.
> The status of the image can be either optimized, degraded, or
> broken.
> "Optimized" means that the image is available and you can run VMs
> off it.
> "Degraded" means that the image is available and will run VMs but
> there might be a better way for VDSM to represent the underlying data.
> "Broken" means that the image can't be used at the moment, probably
> because not all the data has been set up on the volume.
>
> Apart from that VDSM will also return the last persisted status
> information, which will contain:
> hostID - the last host to try and optimize or fix the image
> stage - X/Y (eg. 1/10) the last persisted stage of the fix.
> percent_complete - -1 or 0-100, the last persisted completion
> percentage of the aforementioned stage. -1 means that no progress
> is available for that operation.
> last_error - This will only be filled if the operation failed
> because of something other than IO or a VDSM crash, for obvious
> reasons.
> It will usually be set if the task was manually
> stopped
>
> The user can either be satisfied with that information or ask the
> host specified in hostID whether it is still working on that image by
> checking its running tasks.
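
A rough sketch of how a client might poll after a create/copy call, going only by the description above. The proposal does not say what getImageStatus() actually returns, so the dict shape (a "status" key next to hostID, stage, percent_complete and last_error) is an assumption, and FakeVdsm and wait_for_image are made-up names used just to make the example runnable.

```python
import time

# Statuses named in the proposal under which an image can run VMs.
USABLE_STATUSES = {"optimized", "degraded"}


class FakeVdsm:
    """Hypothetical in-memory stand-in for a VDSM connection (illustration only)."""

    def __init__(self, replies):
        self._replies = iter(replies)
        self._last = None

    def getImageStatus(self, repositoryId, imageId):
        # Replay the canned replies in order, then keep repeating the last one.
        self._last = next(self._replies, self._last)
        return self._last


def wait_for_image(vdsm, repo_id, image_id, timeout=60.0, interval=1.0,
                   sleep=time.sleep):
    """Poll getImageStatus() until the image is usable or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        info = vdsm.getImageStatus(repo_id, image_id)
        # Assumed reply shape: a dict with a lowercase "status" string.
        if info["status"] in USABLE_STATUSES:
            return info
        if time.monotonic() >= deadline:
            raise TimeoutError(
                "image %s is still %s: %r" % (image_id, info["status"], info))
        sleep(interval)
```

A caller that gets a TimeoutError could then look at hostID and last_error in the last reply to decide whether to resume the operation elsewhere or remove the image.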
>
> checkStorageRepository(self, repositoryId, options={}):
> A method to go over a storage repository and scan for any existing
> problems. This includes degraded/broken images and deleted images
> that have not yet been physically deleted/merged.
> It returns a list of Fix objects.
> Fix objects come in 4 types:
> clean - cleans data, run them to get more space.
> optimize - run them to optimize a degraded image
> merge - Merges two images together. Doing this sometimes
> makes more images ready for optimizing or cleaning.
> The reason it is different from optimize is that
> unmerged images are considered optimized.
> mend - mends a broken image
>
> The user can read these types and prioritize fixes. Fixes also
> contain opaque FIX data and they should be sent as received to
> fixStorageRepository(self, repositoryId, fix, options={}):
>
> That will start a fix operation.
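
To illustrate the "read these types and prioritize" part, here is a minimal sketch. It assumes each Fix is a dict with a "type" key next to its opaque data, which the proposal does not specify, and the default order (mend first, clean last) is only one possible policy. prioritize_fixes is a made-up helper name.

```python
# One possible policy: repair broken images first, reclaim space last.
DEFAULT_ORDER = ("mend", "merge", "optimize", "clean")


def prioritize_fixes(fixes, order=DEFAULT_ORDER):
    """Return the fixes sorted by type priority, untouched otherwise.

    Each fix is assumed to be a dict carrying a "type" key; everything
    else in it is treated as opaque and passed through as received.
    Unknown types sort last. The sort is stable, so fixes of the same
    type keep the order in which they were reported.
    """
    rank = {fix_type: i for i, fix_type in enumerate(order)}
    return sorted(fixes, key=lambda fix: rank.get(fix["type"], len(order)))
```

Each element of the result would then be handed, unchanged, to fixStorageRepository(repositoryId, fix). A host that is short on space could pass order=("clean", "mend", "merge", "optimize") instead.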
>
>
> All major operations automatically start the appropriate "Fix" to
> bring the created object to an optimized/degraded state (the one
> that is quicker) unless one of the options is
> AutoFix=False. This is only useful for repos that might not be able
> to create volumes on all hosts (SDM) but would like to have the
> actual IO distributed in the cluster.
>
> Another common option is the strategy option.
> It currently has 2 possible values:
> space and performance - In case VDSM has 2 ways of completing the
> same operation, this tells it to value one over the other. For
> example, whether to copy all the data or just create a qcow based
> off a snapshot.
> The default is space.
>
> You might have also noticed that it is never explicitly specified
> where to look for existing images. This is done purposefully: VDSM
> will always look in all connected repositories for existing
> objects.
> For very large setups this might be problematic. To mitigate the
> problem you have these options:
> participatingRepositories=[repoId, ...] which tells VDSM to narrow
> the search to just these repositories
> and
> imageHints={imgId: repoId} which will force VDSM to look for those
> image IDs just in those repositories and fail if it doesn't find
> them there.
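
The lookup rules described above could be modelled roughly like this. It is a toy model of the semantics, not VDSM code: repo_contents (repoId -> set of image IDs) stands in for the connected repositories, both function names are made up, and letting imageHints take precedence when both options are given is an assumption the proposal does not settle.

```python
def repos_to_search(image_id, connected_repos, options=None):
    """Return the repo IDs that would be searched for image_id."""
    options = options or {}
    hints = options.get("imageHints", {})
    if image_id in hints:
        # A hint pins the image to exactly one repo (assumed to win over
        # participatingRepositories when both are present).
        return [hints[image_id]]
    narrowed = options.get("participatingRepositories")
    if narrowed is not None:
        return [repo for repo in connected_repos if repo in narrowed]
    # Default described in the proposal: look in all connected repos.
    return list(connected_repos)


def find_image(image_id, repo_contents, options=None):
    """Return the repo ID holding image_id, or raise LookupError."""
    for repo_id in repos_to_search(image_id, repo_contents, options):
        if image_id in repo_contents.get(repo_id, ()):
            return repo_id
    raise LookupError("image %s not found in the searched repos" % image_id)
```

With a hint pointing at the wrong repo the lookup fails even if another connected repo holds the image, which matches the "fail if it doesn't find them there" wording.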
you are using VirtualDisk and Snapshot for mutable and immutable.
the terminology of "image" for immutable and "volume" for mutable seems
to be the one used by ec2/openstack/etc. - any thoughts on using similar
terminology?
(though i think they also differ in forcing all images to be in an image
repo, and all volumes in a volume repo, while we allow mixing them in
the same repo).
We have volumes internally though, I wanted to call them slabs but Ayal and Edu
feel strongly about calling them volumes and the current naming system in general.