On Wed, Dec 7, 2016 at 8:10 PM, Oved Ourfali <oourfali(a)redhat.com> wrote:
On Dec 7, 2016 16:00, "Nir Soffer"
<nsoffer(a)redhat.com> wrote:
>
> On Wed, Dec 7, 2016 at 10:17 AM, Oved Ourfali <oourfali(a)redhat.com> wrote:
> >
> >
> > On Tue, Dec 6, 2016 at 11:12 PM, Adam Litke <alitke(a)redhat.com> wrote:
> >>
> >> On 06/12/16 22:06 +0200, Arik Hadas wrote:
> >>>
> >>> Adam,
> >>
> >>
> >> :) You seem upset. Sorry if I touched on a nerve...
> >>
> >>> Just out of curiosity: when you write "v2v has promised" -
what
> >>> exactly
> >>> do you
> >>> mean? the tool? Richard Jones (the maintainer of virt-v2v)? Shahar and
> >>> I
> >>> that
> >>> implemented the integration with virt-v2v? I'm not aware of such a
> >>> promise by
> >>> any of these options :)
> >>
> >>
> >> Some history...
> >>
> >> Earlier this year Nir, Francesco (added), Shahar, and I began
> >> discussing the similarities between what storage needed to do with
> >> external commands and what was designed specifically for v2v. I am
> >> not sure if you were involved in the project at that time. The plan
> >> was to create common infrastructure that could be extended to fit the
> >> unique needs of the verticals. The v2v code was going to be moved
> >> over to the new infrastructure (see [1]) and the only thing that
> >> stopped the initial patch was lack of a VMWare testing environment for
> >> verification.
> >>
> >> At that time storage refocused on developing verbs that used the new
> >> infrastructure and have been maintaining its suitability for general
> >> use. Conversion of v2v -> Host Jobs is obviously a lower priority
> >> item and much more difficult now due to the early missed opportunity.
> >>
> >>> Anyway, let's say that you were given such a promise by someone and
> >>> thus
> >>> consider that mechanism to be deprecated - it doesn't really
matter.
> >>
> >>
> >> I may be biased but I think my opinion does matter.
> >>
> >>> The current implementation doesn't well fit to this flow (it
requires
> >>> per-volume job, it creates leases that are not needed for
template's
> >>> disks,
> >>> ...) and with the "next-gen API" with proper support for virt
flows
> >>> not
> >>> even
> >>> being discussed with us (and iiuc also not with the infra team) yet, I
> >>> don't
> >>> understand what do you suggest except for some strong, though
> >>> irrelevant,
> >>> statements.
> >>
> >>
> >> If you are willing to engage in a good-faith technical discussion I am
> >> sure I can help you to understand. These operations to storage demand
> >> some form of locking protection. If volume leases aren't appropriate
> >> then
> >> perhaps we should use the VM Leases / xleases that Nir is finishing
> >> off for 4.1 now.
> >>
> >>> I suggest loud and clear to reuse (not to add dependencies, not to
> >>> enhance, ..)
> >>> an existing mechanism for a very similar flow of virt-v2v that works
> >>> well
> >>> and
> >>> simple.
> >>
> >>
> >> I clearly remember discussions involving infra (hello Oved), virt
> >> (hola Michal), and storage where we decided that new APIs performing
> >> async operations involving external commands should use the HostJobs
> >> infrastructure instead of adding more information to Host Stats.
> >> These were the "famous" entity polling meetings.
>
> We discussed these issues behind close doors, not in the public mailing
> list,
> so it is not surprising that people do not know about the agreements we
> had.
>
The core team was there. So it is surprising.
> >>
> >> Of course plans can change but I have never been looped into any such
> >> discussions.
> >>
> >
> > Well, I think that when someone builds a good infrastructure he first
> > needs
> > to talk to all consumers and make sure it fits.
> > In this case it seems like most work was done to fit the storage
> > use-case,
> > and now you check whether it can fit others as well....
>
> The jobs framework is generic and can be used for any subsystem,
> there is nothing related to storage about it. But modifying disks *is*
> a storage operation, even if someone from the virt team worked on it.
>
> V2v is also storage operation - if we compare it with copying disks:
>
> - we create a new volume that nobody is using yet
> - if the operation fails, the disk must be in illegal state
> - if the operation fails we delete the disks
> - if the operation succeeds the volume must be legal
> - we need to limit the number of operations on a host
> - we need to detect the job state if the host becomes non-responsive
> - we may want to fence the job if the host becomes non-responsive
> in volume jobs, we can increment the volume generation and run
> the same job on another host.
> - we want to take a lease on storage to ensure that other hosts cannot
> access the same entity, or that the job will fail if someone else is
> using
> this entity
> - we want to take a lease on storage, ensuring that a job cannot get
> stuck for long time - sanlock kill the owner of a lease when storage
> becomes inaccessible.
> - we want to report progress
>
> sysprep is less risky because the operation is faster, but on storage even
> fast operation can get stuck for minutes.
>
> We need to agree on a standard way to do such operations that is safe
> enough
> and can be managed on the engine side.
>
> > IMO it makes much more sense to use events where possible (and you've
> > promised to use those as well, but I don't see you doing that...). v2v
> > should use events for sure, and they have promised to do that in the
> > past,
> > instead of using the v2v jobs. The reason events weren't used originally
> > with the v2v feature, was that it was too risky and the events
> > infrastructure was added too late in the game.
>
> Events are not replacing the need for managing jobs in the vdsm side.
> Engine must have a way to query the current jobs before subscribing
> to events from these jobs, otherwise you will loose events and engine
> will never notice a completed job after network errors.
>
> The jobs framework supports events, see
>
https://gerrit.ovirt.org/67118
>
> We are waiting for review from the infra team, maybe you can
> get someone to review this?
It would have been great to review the design for this before it reaches to
gerrit.
Anyway, I get permissions error when opening. Any clue why?
It is a recent bug in gerrit, or configuration issue, drafts are
private sometimes.
I added you as reviewer, can you see this now?
Nir
>
> Nir
>
> >
> >
> >>>
> >>> Do you "promise" to implement your "next gen API"
for 4.1 as an
> >>> alternative?
> >>
> >>
> >> I guess we need the design first.
> >>
> >>
> >>> On Tue, Dec 6, 2016 at 5:04 PM, Adam Litke <alitke(a)redhat.com>
wrote:
> >>>
> >>> On 05/12/16 11:17 +0200, Arik Hadas wrote:
> >>>
> >>>
> >>>
> >>> On Mon, Dec 5, 2016 at 10:05 AM, Nir Soffer
> >>> <nsoffer(a)redhat.com>
> >>> wrote:
> >>>
> >>> On Sun, Dec 4, 2016 at 8:50 PM, Shmuel Melamud
> >>> <smelamud(a)redhat.com>
> >>> wrote:
> >>> >
> >>> > Hi!
> >>> >
> >>> > I'm currently working on integration of virt-sysprep
into
> >>> oVirt.
> >>> >
> >>> > Usually, if user creates a template from a regular VM,
and
> >>> then
> >>> creates
> >>> new VMs from this template, these new VMs inherit all
> >>> configuration
> >>> of the
> >>> original VM, including SSH keys, UDEV rules, MAC addresses,
> >>> system
> >>> ID,
> >>> hostname etc. It is unfortunate, because you cannot have two
> >>> network
> >>> devices with the same MAC address in the same network, for
> >>> example.
> >>> >
> >>> > To avoid this, user must clean all machine-specific
> >>> configuration
> >>> from
> >>> the original VM before creating a template from it. You can
> >>> do
> >>> this
> >>> manually, but there is virt-sysprep utility that does this
> >>> automatically.
> >>> >
> >>> > Ideally, virt-sysprep should be seamlessly integrated
into
> >>> template
> >>> creation process. But the first step is to create a simple
> >>> button:
> >>> user
> >>> selects a VM, clicks the button and oVirt executes
> >>> virt-sysprep
> >>> on
> >>> the VM.
> >>> >
> >>> > virt-sysprep works directly on VM's filesystem. It
accepts
> >>> list of
> >>> all
> >>> disks of the VM as parameters:
> >>> >
> >>> > virt-sysprep -a disk1.img -a disk2.img -a disk3.img
> >>> >
> >>> > The architecture is as follows: command on the Engine
side
> >>> runs a
> >>> job on
> >>> VDSM side and tracks its success/failure. The job on VDSM
> >>> side
> >>> runs
> >>> virt-sysprep.
> >>> >
> >>> > The question is how to implement the job correctly?
> >>> >
> >>> > I thought about using storage jobs, but they are
designed
> >>> to
> >>> work
> >>> only
> >>> with a single volume, correct?
> >>>
> >>> New storage verbs are volume based. This make it easy to
> >>> manage
> >>> them on the engine side, and will allow parallelizing volume
> >>> operations
> >>> on single or multiple hosts.
> >>>
> >>> A storage volume job is using sanlock lease on the modified
> >>> volume
> >>> and volume generation number. If a host running pending jobs
> >>> becomes
> >>> non-responsive and cannot be fenced, we can detect the state
> >>> of
> >>> the job, fence the job, and start the job on another host.
> >>>
> >>> In the SPM task, if a host becomes non-responsive and cannot
> >>> be
> >>> fenced, the whole setup is stuck, there is no way to perform
> >>> any
> >>> storage operation.
> >>> > Is is possible to use them with operation that is
> >>> performed
> >>> on
> >>> multiple
> >>> volumes?
> >>> > Or, alternatively, is it possible to use some kind of
'VM
> >>> jobs' -
> >>> that
> >>> work on VM at whole?
> >>>
> >>> We can do:
> >>>
> >>> 1. Add jobs with multiple volumes leases - can make error
> >>> handling
> >>> very
> >>> complex. How do tell a job state if you have multiple
> >>> leases?
> >>> which
> >>> volume generation you use?
> >>>
> >>> 2. Use volume job using one of the volumes (the boot
> >>> volume?).
> >>> This
> >>> does
> >>> not protect the other volumes from modification but
> >>> engine
> >>> is
> >>> responsible
> >>> for this.
> >>>
> >>> 3. Use new "vm jobs", using a vm lease (should be
available
> >>> this
> >>> week
> >>> on master).
> >>> This protects a vm during sysprep from starting the vm.
> >>> We still need a generation to detect the job state, I
> >>> think
> >>> we
> >>> can
> >>> use the sanlock
> >>> lease generation for this.
> >>>
> >>> I like the last option since sysprep is much like running a
> >>> vm.
> >>> > How v2v solves this problem?
> >>>
> >>> It does not.
> >>>
> >>> v2v predates storage volume jobs. It does not use volume
> >>> leases
> >>> and
> >>> generation
> >>> and does have any way to recover if a host running v2v
> >>> becomes
> >>> non-responsive
> >>> and cannot be fenced.
> >>>
> >>> It also does not use the jobs framework and does not use a
> >>> thread
> >>> pool for
> >>> v2v jobs, so it has no limit on the number of storage
> >>> operations on
> >>> a host.
> >>>
> >>>
> >>> Right, but let's be fair and present the benefits of
v2v-jobs
> >>> as
> >>> well:
> >>> 1. it is the simplest "infrastructure" in terms of LOC
> >>>
> >>>
> >>> It is also deprecated. V2V has promised to adopt the richer Host
> >>> Jobs
> >>> API in the future.
> >>>
> >>>
> >>> 2. it is the most efficient mechanism in terms of interactions
> >>> between
> >>> the
> >>> engine and VDSM (it doesn't require new verbs/call, the data
is
> >>> attached to
> >>> VdsStats; probably the easiest mechanism to convert to events)
> >>>
> >>>
> >>> Engine is already polling the host jobs API so I am not sure I
> >>> agree
> >>> with you here.
> >>>
> >>>
> >>> 3. it is the most efficient implementation in terms of
> >>> interaction
> >>> with
> >>> the
> >>> database (no date is persisted into the database, no polling is
> >>> done)
> >>>
> >>>
> >>> Again, we're already using the Host Jobs API. We'll gain
> >>> efficiency
> >>> by migrating away from the old v2v API and having a single, unified
> >>> approach (Host Jobs).
> >>>
> >>>
> >>> Currently we have 3 mechanisms to report jobs:
> >>> 1. VM jobs - that is currently used for live-merge. This
> >>> requires
> >>> the
> >>> VM entity
> >>> to exist in VDSM, thus not suitable for virt-sysprep.
> >>>
> >>>
> >>> Correct, not appropriate for this application.
> >>>
> >>>
> >>> 2. storage jobs - complicated infrastructure, targeted for
> >>> recovering
> >>> from
> >>> failures to maintain storage consistency. Many of the things
> >>> this
> >>> infrastructure knows to handle is irrelevant for virt-sysprep
> >>> flow, and
> >>> the
> >>> fact that virt-sysprep is invoked on VM rather than particular
> >>> disk
> >>> makes it
> >>> less suitable.
> >>>
> >>>
> >>> These are more appropriately called HostJobs and the have the
> >>> following semantics:
> >>> - They represent an external process running on a single host
> >>> - They are not persisted. If the host or vdsm restarts, the job is
> >>> aborted
> >>> - They operate on entities. Currently storage is the first adopter
> >>> of the infrastructure but virt was going to adopt these for the
> >>> next-gen API. Entities can be volumes, storage domains, vms,
> >>> network interfaces, etc.
> >>> - Job status and progress is reported by the Host Jobs API. If a
> >>> job
> >>> is not present, then the underlying entitie(s) must be polled by
> >>> engine to determine the actual state.
> >>>
> >>>
> >>> 3. V2V jobs - no mechanism is provided to resume failed jobs,
> >>> no
> >>> leases, etc
> >>>
> >>>
> >>> This is the old infra upon which Host Jobs are built. v2v has
> >>> promised to move to Host Jobs in the future so we should not add
> >>> new
> >>> dependencies to this code.
> >>>
> >>>
> >>> I have some arguments for using V2V-like jobs [1]:
> >>> 1. creating template from vm is rarely done - if host goes
> >>> unresponsive
> >>> or any
> >>> other failure is detected we can just remove the template and
> >>> report
> >>> the error
> >>>
> >>>
> >>> We can chose this error handling with Host Jobs as well.
> >>>
> >>>
> >>> 2. the phase of virt-sysprep is, unlike typical storage
> >>> operation,
> >>> short -
> >>> reducing the risk of failures during the process
> >>>
> >>>
> >>> Reduced risk of failures is never an excuse to have lax error
> >>> handling. The storage flavored host jobs provide tons of utilities
> >>> for making error handling standardized, easy to implement, and
> >>> correct.
> >>>
> >>>
> >>> 3. during the operation the VM is down - by locking the
> >>> VM/template and
> >>> its
> >>> disks on the engine side, we render leases-like mechanism
> >>> redundant
> >>>
> >>>
> >>> Eventually we want to protect all operations on storage with
> >>> sanlock
> >>> leases. This is safer and allows for a more distributed approach
> >>> to
> >>> management. Again, the use of leases correctly in host jobs
> >>> requires
> >>> about 5 lines of code. The benefits of standardization far
> >>> outweigh
> >>> any perceived simplification resulting from omitting it.
> >>>
> >>>
> >>> 4. in the worst case - the disk will not be corrupted (only
> >>> some
> >>> of the
> >>> data
> >>> might be removed).
> >>>
> >>>
> >>> Again, the way engine chooses to handle job failures is independent
> >>> of
> >>> the mechanism. Let's separate that from this discussion.
> >>>
> >>>
> >>> So I think that the mechanism for storage jobs is an over-kill
> >>> for
> >>> this
> >>> case.
> >>> We can keep it simple by generalise the V2V-job for other
> >>> virt-tools
> >>> jobs, like
> >>> virt-sysprep.
> >>>
> >>>
> >>> I think we ought to standardize on the Host Jobs framework where we
> >>> can collaborate on unit tests, standardized locking and error
> >>> handling, abort logic, etc. When v2v moves to host jobs then we
> >>> will
> >>> have a unified method of handling ephemeral jobs that are tied to
> >>> entities.
> >>>
> >>> --
> >>> Adam Litke
> >>>
> >>>
> >>
> >> --
> >> Adam Litke
> >
> >