[ovirt-devel] [VDSM] Correct implementation of virt-sysprep job

Wed Dec 7 18:16:21 UTC 2016

On Wed, Dec 7, 2016 at 8:10 PM, Oved Ourfali <oourfali at redhat.com> wrote:
> On Dec 7, 2016 16:00, "Nir Soffer" <nsoffer at redhat.com> wrote:
>>
>> On Wed, Dec 7, 2016 at 10:17 AM, Oved Ourfali <oourfali at redhat.com> wrote:
>> >
>> >
>> > On Tue, Dec 6, 2016 at 11:12 PM, Adam Litke <alitke at redhat.com> wrote:
>> >>
>> >> On 06/12/16 22:06 +0200, Arik Hadas wrote:
>> >>>
>> >>> Adam,
>> >>
>> >>
>> >> :)  You seem upset.  Sorry if I touched on a nerve...
>> >>
>> >>> Just out of curiosity: when you write "v2v has promised" - what exactly
>> >>> do you mean? The tool? Richard Jones (the maintainer of virt-v2v)?
>> >>> Shahar and I, who implemented the integration with virt-v2v? I'm not
>> >>> aware of such a promise by any of these options :)
>> >>
>> >>
>> >> Some history...
>> >>
>> >> Earlier this year Nir, Francesco (added), Shahar, and I began
>> >> discussing the similarities between what storage needed to do with
>> >> external commands and what was designed specifically for v2v.  I am
>> >> not sure if you were involved in the project at that time.  The plan
>> >> was to create common infrastructure that could be extended to fit the
>> >> unique needs of the verticals.  The v2v code was going to be moved
>> >> over to the new infrastructure (see [1]) and the only thing that
>> >> stopped the initial patch was lack of a VMWare testing environment for
>> >> verification.
>> >>
>> >> At that time storage refocused on developing verbs that used the new
>> >> infrastructure and have been maintaining its suitability for general
>> >> use.  Conversion of v2v -> Host Jobs is obviously a lower priority
>> >> item and much more difficult now due to the early missed opportunity.
>> >>
>> >>> Anyway, let's say that you were given such a promise by someone and thus
>> >>> consider that mechanism to be deprecated - it doesn't really matter.
>> >>
>> >>
>> >> I may be biased but I think my opinion does matter.
>> >>
>> >>> The current implementation doesn't fit this flow well (it requires a
>> >>> per-volume job, it creates leases that are not needed for template's
>> >>> disks, ...) and with the "next-gen API" with proper support for virt
>> >>> flows not even being discussed with us (and iiuc also not with the
>> >>> infra team) yet, I don't understand what you suggest except for some
>> >>> strong, though irrelevant, statements.
>> >>
>> >>
>> >> If you are willing to engage in a good-faith technical discussion I am
>> >> sure I can help you to understand.  These operations to storage demand
>> >> some form of locking protection.  If volume leases aren't appropriate
>> >> then perhaps we should use the VM Leases / xleases that Nir is
>> >> finishing off for 4.1 now.
>> >>
>> >>> I suggest loud and clear to reuse (not to add dependencies, not to
>> >>> enhance, ..) an existing mechanism for a very similar flow of virt-v2v
>> >>> that works well and is simple.
>> >>
>> >>
>> >> I clearly remember discussions involving infra (hello Oved), virt
>> >> (hola Michal), and storage where we decided that new APIs performing
>> >> async operations involving external commands should use the HostJobs
>> >> infrastructure instead of adding more information to Host Stats.
>> >> These were the "famous" entity polling meetings.
>>
>> We discussed these issues behind closed doors, not on the public mailing
>> list, so it is not surprising that people do not know about the
>> agreements we had.
>>
>
> The core team was there. So it is surprising.
>
>> >>
>> >> Of course plans can change but I have never been looped into any such
>> >> discussions.
>> >>
>> >
>> > Well, I think that when someone builds a good infrastructure he first
>> > needs to talk to all consumers and make sure it fits.
>> > In this case it seems like most work was done to fit the storage
>> > use-case, and now you check whether it can fit others as well....
>>
>> The jobs framework is generic and can be used for any subsystem,
>> there is nothing related to storage about it. But modifying disks *is*
>> a storage operation, even if someone from the virt team worked on it.
>>
>> V2v is also a storage operation - if we compare it with copying disks:
>>
>> - we create a new volume that nobody is using yet
>> - if the operation fails, the disk must be in an illegal state
>> - if the operation fails we delete the disks
>> - if the operation succeeds the volume must be legal
>> - we need to limit the number of operations on a host
>> - we need to detect the job state if the host becomes non-responsive
>> - we may want to fence the job if the host becomes non-responsive;
>>   in volume jobs, we can increment the volume generation and run
>>   the same job on another host.
>> - we want to take a lease on storage to ensure that other hosts cannot
>>   access the same entity, or that the job will fail if someone else is
>>   using this entity
>> - we want to take a lease on storage, ensuring that a job cannot get
>>   stuck for a long time - sanlock kills the owner of a lease when storage
>>   becomes inaccessible.
>> - we want to report progress
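
A minimal sketch of the legality and generation rules in the list above,
using a toy in-memory model. All names here (Volume, run_volume_job,
job_succeeded, StaleGeneration) are hypothetical and are not vdsm APIs;
leases and real storage metadata are left out.

```python
class StaleGeneration(Exception):
    """Raised when a job's expected generation no longer matches the volume."""


class Volume:
    """Toy in-memory volume; real volumes keep this metadata on storage."""

    def __init__(self):
        self.generation = 0
        self.legal = True


def run_volume_job(volume, expected_generation, operation):
    """Run operation on volume only if the generation still matches."""
    if volume.generation != expected_generation:
        raise StaleGeneration(
            "expected %d, found %d" % (expected_generation, volume.generation))
    volume.legal = False      # the volume is illegal while it is modified
    operation(volume)         # may raise; the volume then stays illegal
    volume.legal = True
    volume.generation += 1    # success is visible as a generation bump


def job_succeeded(volume, expected_generation):
    """Check the outcome from volume state alone, without asking the host."""
    return volume.legal and volume.generation == expected_generation + 1
```

In this model a job started with an outdated generation refuses to run,
which is one way to keep a job from running twice after it was moved to
another host.
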
>>
>> sysprep is less risky because the operation is faster, but on storage even
>> a fast operation can get stuck for minutes.
>>
>> We need to agree on a standard way to do such operations that is safe
>> enough and can be managed on the engine side.
>>
>> > IMO it makes much more sense to use events where possible (and you've
>> > promised to use those as well, but I don't see you doing that...). v2v
>> > should use events for sure, and they have promised to do that in the
>> > past, instead of using the v2v jobs. The reason events weren't used
>> > originally with the v2v feature was that it was too risky and the
>> > events infrastructure was added too late in the game.
>>
>> Events do not replace the need for managing jobs on the vdsm side.
>> Engine must have a way to query the current jobs before subscribing
>> to events from these jobs, otherwise you will lose events and engine
>> will never notice a completed job after network errors.
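
A rough sketch of the "query first, then rely on events" idea described
above. JobTracker, its methods and the status names are hypothetical and
only illustrate the ordering; this is not engine or vdsm code.

```python
TERMINAL = {"done", "failed", "aborted"}  # hypothetical status names


class JobTracker:
    """Engine-side view of the jobs on one host (illustrative only)."""

    def __init__(self, list_jobs):
        self._list_jobs = list_jobs   # callable returning {job_id: status}
        self.jobs = {}                # last known status per job id
        self.needs_check = set()      # jobs whose outcome must be verified

    def track(self, job_id):
        self.jobs[job_id] = "pending"

    def resync(self):
        """Call after (re)connecting, before relying on events again."""
        reported = self._list_jobs()
        for job_id, status in list(self.jobs.items()):
            if status in TERMINAL:
                continue
            if job_id in reported:
                self.jobs[job_id] = reported[job_id]
            else:
                # The host no longer reports the job; an event may have
                # been lost, so the outcome must be checked another way.
                self.needs_check.add(job_id)

    def on_event(self, job_id, status):
        if job_id in self.jobs:
            self.jobs[job_id] = status
```

Without the resync step, a job that finished while the connection was
down would stay "pending" forever on the engine side.
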
>>
>> The jobs framework supports events, see
>> https://gerrit.ovirt.org/67118
>>
>> We are waiting for review from the infra team, maybe you can
>> get someone to review this?
>
> It would have been great to review the design for this before it reached
> gerrit.
> Anyway, I get a permissions error when opening it. Any clue why?

It is a recent bug in gerrit, or a configuration issue; drafts are
sometimes private.

I added you as reviewer, can you see this now?

Nir

>
>>
>> Nir
>>
>> >
>> >
>> >>>
>> >>> Do you "promise" to implement your "next gen API" for 4.1 as an
>> >>> alternative?
>> >>
>> >>
>> >> I guess we need the design first.
>> >>
>> >>
>> >>> On Tue, Dec 6, 2016 at 5:04 PM, Adam Litke <alitke at redhat.com> wrote:
>> >>>
>> >>>    On 05/12/16 11:17 +0200, Arik Hadas wrote:
>> >>>
>> >>>
>> >>>
>> >>>        On Mon, Dec 5, 2016 at 10:05 AM, Nir Soffer <nsoffer at redhat.com> wrote:
>> >>>
>> >>>           On Sun, Dec 4, 2016 at 8:50 PM, Shmuel Melamud <smelamud at redhat.com> wrote:
>> >>>           >
>> >>>           > Hi!
>> >>>           >
>> >>>           > I'm currently working on integration of virt-sysprep into oVirt.
>> >>>           >
>> >>>           > Usually, if a user creates a template from a regular VM and then
>> >>>           > creates new VMs from this template, these new VMs inherit all
>> >>>           > configuration of the original VM, including SSH keys, UDEV rules,
>> >>>           > MAC addresses, system ID, hostname etc. This is unfortunate,
>> >>>           > because you cannot have two network devices with the same MAC
>> >>>           > address in the same network, for example.
>> >>>           >
>> >>>           > To avoid this, the user must clean all machine-specific
>> >>>           > configuration from the original VM before creating a template
>> >>>           > from it. You can do this manually, but the virt-sysprep utility
>> >>>           > does this automatically.
>> >>>           >
>> >>>           > Ideally, virt-sysprep should be seamlessly integrated into the
>> >>>           > template creation process. But the first step is to create a
>> >>>           > simple button: the user selects a VM, clicks the button and oVirt
>> >>>           > executes virt-sysprep on the VM.
>> >>>           >
>> >>>           > virt-sysprep works directly on the VM's filesystem. It accepts a
>> >>>           > list of all disks of the VM as parameters:
>> >>>           >
>> >>>           > virt-sysprep -a disk1.img -a disk2.img -a disk3.img
>> >>>           >
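
A small sketch of how a job might build that command line from the list
of a VM's disks. The helper name sysprep_command is hypothetical; only
the "-a" option shown in the command above is used.

```python
def sysprep_command(disk_paths):
    """Return the argv list for running virt-sysprep on all disks of a VM."""
    if not disk_paths:
        raise ValueError("at least one disk is required")
    cmd = ["virt-sysprep"]
    for path in disk_paths:
        # One "-a <path>" pair per disk, in the order given.
        cmd.extend(["-a", path])
    return cmd
```

The resulting list can be handed to a process runner as is, without
going through a shell.
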
>> >>>           > The architecture is as follows: a command on the Engine side runs
>> >>>           > a job on the VDSM side and tracks its success/failure. The job on
>> >>>           > the VDSM side runs virt-sysprep.
>> >>>           >
>> >>>           > The question is how to implement the job correctly?
>> >>>           >
>> >>>           > I thought about using storage jobs, but they are designed to work
>> >>>           > only with a single volume, correct?
>> >>>
>> >>>           New storage verbs are volume based. This makes it easy to manage
>> >>>           them on the engine side, and will allow parallelizing volume
>> >>>           operations on single or multiple hosts.
>> >>>
>> >>>           A storage volume job uses a sanlock lease on the modified volume
>> >>>           and a volume generation number. If a host running pending jobs
>> >>>           becomes non-responsive and cannot be fenced, we can detect the
>> >>>           state of the job, fence the job, and start the job on another host.
>> >>>
>> >>>           In the SPM task, if a host becomes non-responsive and cannot be
>> >>>           fenced, the whole setup is stuck, there is no way to perform any
>> >>>           storage operation.
>> >>>
>> >>>           > Is it possible to use them with an operation that is performed on
>> >>>           > multiple volumes?
>> >>>           > Or, alternatively, is it possible to use some kind of 'VM jobs' -
>> >>>           > that work on the VM as a whole?
>> >>>
>> >>>           We can do:
>> >>>
>> >>>           1. Add jobs with multiple volume leases - can make error handling
>> >>>               very complex. How do you tell a job state if you have multiple
>> >>>               leases? Which volume generation do you use?
>> >>>
>> >>>           2. Use a volume job using one of the volumes (the boot volume?).
>> >>>               This does not protect the other volumes from modification but
>> >>>               engine is responsible for this.
>> >>>
>> >>>           3. Use new "vm jobs", using a vm lease (should be available this
>> >>>               week on master).
>> >>>               This protects a vm during sysprep from starting the vm.
>> >>>               We still need a generation to detect the job state, I think we
>> >>>               can use the sanlock lease generation for this.
>> >>>
>> >>>           I like the last option since sysprep is much like running a vm.
>> >>>
>> >>>           > How does v2v solve this problem?
>> >>>
>> >>>           It does not.
>> >>>
>> >>>           v2v predates storage volume jobs. It does not use volume leases
>> >>>           and generation and does not have any way to recover if a host
>> >>>           running v2v becomes non-responsive and cannot be fenced.
>> >>>
>> >>>           It also does not use the jobs framework and does not use a thread
>> >>>           pool for v2v jobs, so it has no limit on the number of storage
>> >>>           operations on a host.
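
To illustrate why a thread pool limits the number of concurrent
operations, here is a sketch using Python's standard
concurrent.futures.ThreadPoolExecutor as a stand-in. It is an assumption
for illustration and not the executor used by the jobs framework;
MAX_JOBS and run_jobs are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_JOBS = 2  # hypothetical per-host limit


def run_jobs(jobs, max_workers=MAX_JOBS):
    """Run callables so that at most max_workers execute at the same time."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Every job is accepted, but extra jobs wait in the pool's queue
        # until a worker thread becomes free.
        futures = [pool.submit(job) for job in jobs]
        return [f.result() for f in futures]
```

Starting one thread or process per request, by contrast, puts no upper
bound on how many operations hit storage at once.
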
>> >>>
>> >>>
>> >>>        Right, but let's be fair and present the benefits of v2v-jobs as well:
>> >>>        1. it is the simplest "infrastructure" in terms of LOC
>> >>>
>> >>>
>> >>>    It is also deprecated.  V2V has promised to adopt the richer Host Jobs
>> >>>    API in the future.
>> >>>
>> >>>
>> >>>        2. it is the most efficient mechanism in terms of interactions between
>> >>>        the engine and VDSM (it doesn't require new verbs/calls, the data is
>> >>>        attached to VdsStats; probably the easiest mechanism to convert to
>> >>>        events)
>> >>>
>> >>>
>> >>>    Engine is already polling the host jobs API so I am not sure I agree
>> >>>    with you here.
>> >>>
>> >>>
>> >>>        3. it is the most efficient implementation in terms of interaction
>> >>>        with the database (no data is persisted into the database, no polling
>> >>>        is done)
>> >>>
>> >>>
>> >>>    Again, we're already using the Host Jobs API.  We'll gain efficiency
>> >>>    by migrating away from the old v2v API and having a single, unified
>> >>>    approach (Host Jobs).
>> >>>
>> >>>
>> >>>        Currently we have 3 mechanisms to report jobs:
>> >>>        1. VM jobs - currently used for live-merge. This requires the VM
>> >>>        entity to exist in VDSM, thus not suitable for virt-sysprep.
>> >>>
>> >>>
>> >>>    Correct, not appropriate for this application.
>> >>>
>> >>>
>> >>>        2. storage jobs - complicated infrastructure, targeted at recovering
>> >>>        from failures to maintain storage consistency. Many of the things this
>> >>>        infrastructure knows how to handle are irrelevant for the virt-sysprep
>> >>>        flow, and the fact that virt-sysprep is invoked on a VM rather than a
>> >>>        particular disk makes it less suitable.
>> >>>
>> >>>
>> >>>    These are more appropriately called HostJobs and they have the
>> >>>    following semantics:
>> >>>    - They represent an external process running on a single host
>> >>>    - They are not persisted.  If the host or vdsm restarts, the job is
>> >>>      aborted
>> >>>    - They operate on entities.  Currently storage is the first adopter
>> >>>      of the infrastructure but virt was going to adopt these for the
>> >>>      next-gen API.  Entities can be volumes, storage domains, vms,
>> >>>      network interfaces, etc.
>> >>>    - Job status and progress are reported by the Host Jobs API.  If a job
>> >>>      is not present, then the underlying entity(ies) must be polled by
>> >>>      engine to determine the actual state.
>> >>>
>> >>>
>> >>>        3. V2V jobs - no mechanism is provided to resume failed jobs, no
>> >>>        leases, etc
>> >>>
>> >>>
>> >>>    This is the old infra upon which Host Jobs are built.  v2v has
>> >>>    promised to move to Host Jobs in the future so we should not add new
>> >>>    dependencies to this code.
>> >>>
>> >>>
>> >>>        I have some arguments for using V2V-like jobs [1]:
>> >>>        1. creating a template from a vm is rarely done - if the host goes
>> >>>        unresponsive or any other failure is detected we can just remove the
>> >>>        template and report the error
>> >>>
>> >>>
>> >>>    We can choose this error handling with Host Jobs as well.
>> >>>
>> >>>
>> >>>        2. the phase of virt-sysprep is, unlike a typical storage operation,
>> >>>        short - reducing the risk of failures during the process
>> >>>
>> >>>
>> >>>    Reduced risk of failures is never an excuse to have lax error
>> >>>    handling.  The storage flavored host jobs provide tons of utilities
>> >>>    for making error handling standardized, easy to implement, and
>> >>>    correct.
>> >>>
>> >>>
>> >>>        3. during the operation the VM is down - by locking the VM/template
>> >>>        and its disks on the engine side, we render a leases-like mechanism
>> >>>        redundant
>> >>>
>> >>>
>> >>>    Eventually we want to protect all operations on storage with sanlock
>> >>>    leases.  This is safer and allows for a more distributed approach to
>> >>>    management.  Again, the use of leases correctly in host jobs requires
>> >>>    about 5 lines of code.  The benefits of standardization far outweigh
>> >>>    any perceived simplification resulting from omitting it.
>> >>>
>> >>>
>> >>>        4. in the worst case - the disk will not be corrupted (only some of
>> >>>        the data might be removed).
>> >>>
>> >>>
>> >>>    Again, the way engine chooses to handle job failures is independent of
>> >>>    the mechanism.  Let's separate that from this discussion.
>> >>>
>> >>>
>> >>>        So I think that the mechanism for storage jobs is overkill for this
>> >>>        case.
>> >>>        We can keep it simple by generalising the V2V-job for other
>> >>>        virt-tools jobs, like virt-sysprep.
>> >>>
>> >>>
>> >>>    I think we ought to standardize on the Host Jobs framework where we
>> >>>    can collaborate on unit tests, standardized locking and error
>> >>>    handling, abort logic, etc.  When v2v moves to host jobs then we will
>> >>>    have a unified method of handling ephemeral jobs that are tied to
>> >>>    entities.
>> >>>
>> >>>    --
>> >>>    Adam Litke
>> >>>
>> >>>
>> >>
>> >> --
>> >> Adam Litke
>> >
>> >