[ovirt-devel] [VDSM] Correct implementation of virt-sysprep job
Oved Ourfali
oourfali at redhat.com
Wed Dec 7 18:21:48 UTC 2016
On Dec 7, 2016 20:16, "Nir Soffer" <nsoffer at redhat.com> wrote:
>
> On Wed, Dec 7, 2016 at 8:10 PM, Oved Ourfali <oourfali at redhat.com> wrote:
> > On Dec 7, 2016 16:00, "Nir Soffer" <nsoffer at redhat.com> wrote:
> >>
> >> On Wed, Dec 7, 2016 at 10:17 AM, Oved Ourfali <oourfali at redhat.com> wrote:
> >> >
> >> >
> >> > On Tue, Dec 6, 2016 at 11:12 PM, Adam Litke <alitke at redhat.com> wrote:
> >> >>
> >> >> On 06/12/16 22:06 +0200, Arik Hadas wrote:
> >> >>>
> >> >>> Adam,
> >> >>
> >> >>
> >> >> :) You seem upset. Sorry if I touched a nerve...
> >> >>
> >> >>> Just out of curiosity: when you write "v2v has promised" - what
> >> >>> exactly do you mean? The tool? Richard Jones (the maintainer of
> >> >>> virt-v2v)? Shahar and I, who implemented the integration with
> >> >>> virt-v2v? I'm not aware of such a promise by any of these
> >> >>> options :)
> >> >>
> >> >>
> >> >> Some history...
> >> >>
> >> >> Earlier this year Nir, Francesco (added), Shahar, and I began
> >> >> discussing the similarities between what storage needed to do with
> >> >> external commands and what was designed specifically for v2v. I am
> >> >> not sure if you were involved in the project at that time. The plan
> >> >> was to create common infrastructure that could be extended to fit the
> >> >> unique needs of the verticals. The v2v code was going to be moved
> >> >> over to the new infrastructure (see [1]), and the only thing that
> >> >> stopped the initial patch was the lack of a VMware testing environment
> >> >> for verification.
> >> >>
> >> >> At that time storage refocused on developing verbs that used the new
> >> >> infrastructure and has been maintaining its suitability for general
> >> >> use. Conversion of v2v -> Host Jobs is obviously a lower priority
> >> >> item and much more difficult now due to the early missed opportunity.
> >> >>
> >> >>> Anyway, let's say that you were given such a promise by someone and
> >> >>> thus consider that mechanism to be deprecated - it doesn't really
> >> >>> matter.
> >> >>
> >> >>
> >> >> I may be biased but I think my opinion does matter.
> >> >>
> >> >>> The current implementation doesn't fit this flow well (it requires
> >> >>> a per-volume job, it creates leases that are not needed for a
> >> >>> template's disks, ...), and with the "next-gen API" with proper
> >> >>> support for virt flows not even being discussed with us (and IIUC
> >> >>> also not with the infra team) yet, I don't understand what you are
> >> >>> suggesting except for some strong, though irrelevant, statements.
> >> >>
> >> >>
> >> >> If you are willing to engage in a good-faith technical discussion I
> >> >> am sure I can help you understand. These operations on storage
> >> >> demand some form of locking protection. If volume leases aren't
> >> >> appropriate, then perhaps we should use the VM Leases / xleases that
> >> >> Nir is finishing off for 4.1 now.
> >> >>
> >> >>> I suggest, loud and clear, to reuse (not to add dependencies to,
> >> >>> not to enhance, ...) an existing mechanism from the very similar
> >> >>> virt-v2v flow that works well and is simple.
> >> >>
> >> >>
> >> >> I clearly remember discussions involving infra (hello Oved), virt
> >> >> (hola Michal), and storage where we decided that new APIs performing
> >> >> async operations involving external commands should use the HostJobs
> >> >> infrastructure instead of adding more information to Host Stats.
> >> >> These were the "famous" entity polling meetings.
> >>
> >> We discussed these issues behind closed doors, not on the public
> >> mailing list, so it is not surprising that people do not know about
> >> the agreements we had.
> >>
> >
> > The core team was there. So it is surprising.
> >
> >> >>
> >> >> Of course plans can change, but I have never been looped into any
> >> >> such discussions.
> >> >>
> >> >
> >> > Well, I think that when someone builds a good infrastructure he
> >> > first needs to talk to all consumers and make sure it fits.
> >> > In this case it seems like most of the work was done to fit the
> >> > storage use-case, and now you are checking whether it can fit others
> >> > as well...
> >>
> >> The jobs framework is generic and can be used by any subsystem;
> >> there is nothing related to storage about it. But modifying disks *is*
> >> a storage operation, even if someone from the virt team worked on it.
> >>
> >> V2v is also a storage operation - if we compare it with copying disks:
> >>
> >> - we create a new volume that nobody is using yet
> >> - if the operation fails, the disk must be in an illegal state
> >> - if the operation fails, we delete the disks
> >> - if the operation succeeds, the volume must be legal
> >> - we need to limit the number of operations on a host
> >> - we need to detect the job state if the host becomes non-responsive
> >> - we may want to fence the job if the host becomes non-responsive;
> >>   in volume jobs, we can increment the volume generation and run
> >>   the same job on another host
> >> - we want to take a lease on storage to ensure that other hosts cannot
> >>   access the same entity, or that the job will fail if someone else is
> >>   using this entity
> >> - we want to take a lease on storage, ensuring that a job cannot get
> >>   stuck for a long time - sanlock kills the owner of a lease when
> >>   storage becomes inaccessible
> >> - we want to report progress
> >>
> >> sysprep is less risky because the operation is faster, but on storage
> >> even a fast operation can get stuck for minutes.
> >>
> >> We need to agree on a standard way to do such operations that is safe
> >> enough and can be managed on the engine side.
> >>
> >> > IMO it makes much more sense to use events where possible (and
> >> > you've promised to use those as well, but I don't see you doing
> >> > that...). v2v should use events for sure, and they have promised to
> >> > do that in the past, instead of using the v2v jobs. The reason
> >> > events weren't used originally with the v2v feature was that it was
> >> > too risky and the events infrastructure was added too late in the
> >> > game.
> >>
> >> Events do not replace the need for managing jobs on the vdsm side.
> >> Engine must have a way to query the current jobs before subscribing
> >> to events from these jobs; otherwise you will lose events, and engine
> >> will never notice a completed job after network errors.
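> >>
> >> As a minimal sketch of combining the initial query with events (all
> >> the names here - sync_jobs, subscribe, get_jobs, reconcile - are made
> >> up for illustration, not the actual API):
> >>
> >>     def sync_jobs(client):
> >>         # Subscribe before querying, so events fired during the
> >>         # initial query are not lost.
> >>         events = client.subscribe("jobs")
> >>         # Query current jobs to learn about jobs that started or
> >>         # completed while we were disconnected.
> >>         for job in client.get_jobs():
> >>             reconcile(job)
> >>         # From here on, handle incremental updates from events.
> >>         for event in events:
> >>             reconcile(event["job"])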
> >>
> >> The jobs framework supports events, see
> >> https://gerrit.ovirt.org/67118
> >>
> >> We are waiting for a review from the infra team; maybe you can
> >> get someone to review this?
> >
> > It would have been great to review the design for this before it
> > reached gerrit.
> > Anyway, I get a permissions error when opening it. Any clue why?
>
> It is a recent bug in gerrit, or a configuration issue; drafts are
> private sometimes.
>
> I added you as a reviewer, can you see it now?
>
Yes. I see Piotr is already on it.
I'll also be happy to hear how you are going to use events in your current
design.
Also, is there a design page for this work?
Thanks,
Oved
> Nir
>
> >
> >>
> >> Nir
> >>
> >> >
> >> >
> >> >>>
> >> >>> Do you "promise" to implement your "next gen API" for 4.1 as an
> >> >>> alternative?
> >> >>
> >> >>
> >> >> I guess we need the design first.
> >> >>
> >> >>
> >> >>> On Tue, Dec 6, 2016 at 5:04 PM, Adam Litke <alitke at redhat.com> wrote:
> >> >>>
> >> >>> On 05/12/16 11:17 +0200, Arik Hadas wrote:
> >> >>>
> >> >>>
> >> >>>
> >> >>> On Mon, Dec 5, 2016 at 10:05 AM, Nir Soffer <nsoffer at redhat.com> wrote:
> >> >>>
> >> >>> On Sun, Dec 4, 2016 at 8:50 PM, Shmuel Melamud <smelamud at redhat.com> wrote:
> >> >>> >
> >> >>> > Hi!
> >> >>> >
> >> >>> > I'm currently working on the integration of virt-sysprep into
> >> >>> > oVirt.
> >> >>> >
> >> >>> > Usually, if a user creates a template from a regular VM, and
> >> >>> > then creates new VMs from this template, these new VMs inherit
> >> >>> > all the configuration of the original VM, including SSH keys,
> >> >>> > UDEV rules, MAC addresses, system ID, hostname, etc. This is
> >> >>> > unfortunate, because you cannot have two network devices with
> >> >>> > the same MAC address in the same network, for example.
> >> >>> >
> >> >>> > To avoid this, the user must clean all machine-specific
> >> >>> > configuration from the original VM before creating a template
> >> >>> > from it. You can do this manually, but there is the virt-sysprep
> >> >>> > utility that does this automatically.
> >> >>> >
> >> >>> > Ideally, virt-sysprep should be seamlessly integrated into the
> >> >>> > template creation process. But the first step is to create a
> >> >>> > simple button: the user selects a VM, clicks the button, and
> >> >>> > oVirt executes virt-sysprep on the VM.
> >> >>> >
> >> >>> > virt-sysprep works directly on the VM's filesystem. It accepts
> >> >>> > the list of all disks of the VM as parameters:
> >> >>> >
> >> >>> > virt-sysprep -a disk1.img -a disk2.img -a disk3.img
> >> >>> >
> >> >>> > The architecture is as follows: a command on the Engine side
> >> >>> > runs a job on the VDSM side and tracks its success/failure. The
> >> >>> > job on the VDSM side runs virt-sysprep.
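> >> >>> >
> >> >>> > A minimal sketch of the VDSM side, assuming a hypothetical Job
> >> >>> > base class that reports status via the jobs API (class and
> >> >>> > method names are illustrative, not the actual VDSM API):
> >> >>> >
> >> >>> >     import subprocess
> >> >>> >
> >> >>> >     class SysprepJob(Job):
> >> >>> >         def __init__(self, job_id, disk_paths):
> >> >>> >             super(SysprepJob, self).__init__(job_id)
> >> >>> >             self.disk_paths = disk_paths
> >> >>> >
> >> >>> >         def _run(self):
> >> >>> >             # One -a argument per VM disk.
> >> >>> >             cmd = ['virt-sysprep']
> >> >>> >             for path in self.disk_paths:
> >> >>> >                 cmd.extend(('-a', path))
> >> >>> >             # Blocks until virt-sysprep exits; a non-zero exit
> >> >>> >             # code raises, marking the job as failed.
> >> >>> >             subprocess.check_call(cmd)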
> >> >>> >
> >> >>> > The question is how to implement the job correctly?
> >> >>> >
> >> >>> > I thought about using storage jobs, but they are designed to
> >> >>> > work only with a single volume, correct?
> >> >>>
> >> >>> New storage verbs are volume based. This makes it easy to manage
> >> >>> them on the engine side, and will allow parallelizing volume
> >> >>> operations on a single host or on multiple hosts.
> >> >>>
> >> >>> A storage volume job uses a sanlock lease on the modified volume
> >> >>> and the volume generation number. If a host running pending jobs
> >> >>> becomes non-responsive and cannot be fenced, we can detect the
> >> >>> state of the job, fence the job, and start the job on another
> >> >>> host.
> >> >>>
> >> >>> With the SPM task, if a host becomes non-responsive and cannot be
> >> >>> fenced, the whole setup is stuck; there is no way to perform any
> >> >>> storage operation.
> >> >>> > Is it possible to use them with an operation that is performed
> >> >>> > on multiple volumes?
> >> >>> > Or, alternatively, is it possible to use some kind of 'VM jobs'
> >> >>> > that work on the VM as a whole?
> >> >>>
> >> >>> We can do:
> >> >>>
> >> >>> 1. Add jobs with multiple volume leases - this can make error
> >> >>>    handling very complex. How do you tell the job state if you
> >> >>>    have multiple leases? Which volume generation do you use?
> >> >>>
> >> >>> 2. Use a volume job using one of the volumes (the boot volume?).
> >> >>>    This does not protect the other volumes from modification, but
> >> >>>    engine is responsible for this.
> >> >>>
> >> >>> 3. Use the new "vm jobs", using a vm lease (should be available
> >> >>>    this week on master). This protects the vm from being started
> >> >>>    during sysprep. We still need a generation to detect the job
> >> >>>    state; I think we can use the sanlock lease generation for
> >> >>>    this.
> >> >>>
> >> >>> I like the last option since sysprep is much like running a vm.
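> >> >>>
> >> >>> To make option 3 concrete, a rough sketch (the lease helper and
> >> >>> the run_virt_sysprep call are placeholders; the actual vm leases
> >> >>> API may look different):
> >> >>>
> >> >>>     class SysprepVMJob(Job):
> >> >>>         def _run(self):
> >> >>>             # Holding the vm lease prevents starting the vm and
> >> >>>             # running the same job on another host. The sanlock
> >> >>>             # lease generation can be used to detect the job
> >> >>>             # state if this host becomes non-responsive.
> >> >>>             with acquire_vm_lease(self.sd_id, self.vm_id):
> >> >>>                 run_virt_sysprep(self.disk_paths)
> >> >>>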
> >> >>> > How does v2v solve this problem?
> >> >>>
> >> >>> It does not.
> >> >>>
> >> >>> v2v predates storage volume jobs. It does not use volume leases
> >> >>> and generations, and does not have any way to recover if a host
> >> >>> running v2v becomes non-responsive and cannot be fenced.
> >> >>>
> >> >>> It also does not use the jobs framework and does not use a thread
> >> >>> pool for v2v jobs, so it has no limit on the number of storage
> >> >>> operations on a host.
> >> >>>
> >> >>>
> >> >>> Right, but let's be fair and present the benefits of v2v-jobs as
> >> >>> well:
> >> >>> 1. it is the simplest "infrastructure" in terms of LOC
> >> >>>
> >> >>>
> >> >>> It is also deprecated. V2V has promised to adopt the richer Host
> >> >>> Jobs API in the future.
> >> >>>
> >> >>>
> >> >>> 2. it is the most efficient mechanism in terms of interactions
> >> >>> between the engine and VDSM (it doesn't require new verbs/calls,
> >> >>> the data is attached to VdsStats; probably the easiest mechanism
> >> >>> to convert to events)
> >> >>>
> >> >>>
> >> >>> Engine is already polling the host jobs API, so I am not sure I
> >> >>> agree with you here.
> >> >>>
> >> >>>
> >> >>> 3. it is the most efficient implementation in terms of interaction
> >> >>> with the database (no data is persisted into the database, no
> >> >>> polling is done)
> >> >>>
> >> >>>
> >> >>> Again, we're already using the Host Jobs API. We'll gain
> >> >>> efficiency by migrating away from the old v2v API and having a
> >> >>> single, unified approach (Host Jobs).
> >> >>>
> >> >>>
> >> >>> Currently we have 3 mechanisms to report jobs:
> >> >>> 1. VM jobs - currently used for live-merge. This requires the VM
> >> >>> entity to exist in VDSM, and is thus not suitable for
> >> >>> virt-sysprep.
> >> >>>
> >> >>>
> >> >>> Correct, not appropriate for this application.
> >> >>>
> >> >>>
> >> >>> 2. storage jobs - a complicated infrastructure, targeted at
> >> >>> recovering from failures to maintain storage consistency. Many of
> >> >>> the things this infrastructure knows how to handle are irrelevant
> >> >>> for the virt-sysprep flow, and the fact that virt-sysprep is
> >> >>> invoked on a VM rather than on a particular disk makes it less
> >> >>> suitable.
> >> >>>
> >> >>>
> >> >>> These are more appropriately called Host Jobs and they have the
> >> >>> following semantics:
> >> >>> - They represent an external process running on a single host
> >> >>> - They are not persisted. If the host or vdsm restarts, the job is
> >> >>>   aborted
> >> >>> - They operate on entities. Currently storage is the first adopter
> >> >>>   of the infrastructure, but virt was going to adopt these for the
> >> >>>   next-gen API. Entities can be volumes, storage domains, vms,
> >> >>>   network interfaces, etc.
> >> >>> - Job status and progress are reported by the Host Jobs API. If a
> >> >>>   job is not present, then the underlying entity (or entities)
> >> >>>   must be polled by engine to determine the actual state.
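> >> >>>
> >> >>> In pseudo-Python, the fallback implied by the last point could
> >> >>> look like this (all names here are illustrative only):
> >> >>>
> >> >>>     def job_state(client, job_id, entity):
> >> >>>         jobs = client.get_host_jobs()
> >> >>>         if job_id in jobs:
> >> >>>             return jobs[job_id]["status"]
> >> >>>         # The job is gone (e.g. vdsm restarted), so infer the
> >> >>>         # state from the entity itself, e.g. volume legality
> >> >>>         # and generation.
> >> >>>         return state_from_entity(client, entity)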
> >> >>>
> >> >>>
> >> >>> 3. V2V jobs - no mechanism is provided to resume failed jobs, no
> >> >>> leases, etc.
> >> >>>
> >> >>>
> >> >>> This is the old infra upon which Host Jobs are built. v2v has
> >> >>> promised to move to Host Jobs in the future, so we should not add
> >> >>> new dependencies to this code.
> >> >>>
> >> >>>
> >> >>> I have some arguments for using V2V-like jobs [1]:
> >> >>> 1. creating a template from a vm is rarely done - if the host goes
> >> >>> unresponsive or any other failure is detected, we can just remove
> >> >>> the template and report the error
> >> >>>
> >> >>>
> >> >>> We can choose this error handling with Host Jobs as well.
> >> >>>
> >> >>>
> >> >>> 2. the virt-sysprep phase is, unlike a typical storage operation,
> >> >>> short - reducing the risk of failures during the process
> >> >>>
> >> >>>
> >> >>> Reduced risk of failures is never an excuse for lax error
> >> >>> handling. The storage-flavored host jobs provide tons of utilities
> >> >>> for making error handling standardized, easy to implement, and
> >> >>> correct.
> >> >>>
> >> >>>
> >> >>> 3. during the operation the VM is down - by locking the
> >> >>> VM/template and its disks on the engine side, we render a
> >> >>> lease-like mechanism redundant
> >> >>>
> >> >>>
> >> >>> Eventually we want to protect all operations on storage with
> >> >>> sanlock leases. This is safer and allows for a more distributed
> >> >>> approach to management. Again, using leases correctly in host jobs
> >> >>> requires about 5 lines of code. The benefits of standardization
> >> >>> far outweigh any perceived simplification resulting from omitting
> >> >>> them.
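> >> >>>
> >> >>> Something in this spirit is all a job author would need to write
> >> >>> (a sketch following the guarded-lock pattern of the storage host
> >> >>> jobs; exact helper names may differ):
> >> >>>
> >> >>>     def _run(self):
> >> >>>         # The sanlock lease protects the entity from other hosts;
> >> >>>         # if storage becomes inaccessible, sanlock kills the
> >> >>>         # lease owner, so the job cannot stay stuck forever.
> >> >>>         with guarded.context([self._lease]):
> >> >>>             self._do_work()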
> >> >>>
> >> >>>
> >> >>> 4. in the worst case the disk will not be corrupted (only some of
> >> >>> the data might be removed).
> >> >>>
> >> >>>
> >> >>> Again, the way engine chooses to handle job failures is
> >> >>> independent of the mechanism. Let's separate that from this
> >> >>> discussion.
> >> >>>
> >> >>>
> >> >>> So I think that the mechanism for storage jobs is overkill for
> >> >>> this case.
> >> >>> We can keep it simple by generalising the V2V job for other
> >> >>> virt-tools jobs, like virt-sysprep.
> >> >>>
> >> >>>
> >> >>> I think we ought to standardize on the Host Jobs framework, where
> >> >>> we can collaborate on unit tests, standardized locking and error
> >> >>> handling, abort logic, etc. When v2v moves to host jobs we will
> >> >>> have a unified method of handling ephemeral jobs that are tied to
> >> >>> entities.
> >> >>>
> >> >>> --
> >> >>> Adam Litke
> >> >>>
> >> >>>
> >> >>
> >> >> --
> >> >> Adam Litke
> >> >
> >> >