Do you "promise" to implement your "next gen API" for 4.1 as an alternative?

I guess we need the design first.

On Tue, Dec 6, 2016 at 5:04 PM, Adam Litke <alitke@redhat.com> wrote:

On 05/12/16 11:17 +0200, Arik Hadas wrote:

On Mon, Dec 5, 2016 at 10:05 AM, Nir Soffer <nsoffer@redhat.com> wrote:

   On Sun, Dec 4, 2016 at 8:50 PM, Shmuel Melamud <smelamud@redhat.com>
wrote:
   >
   > Hi!
   >
   > I'm currently working on integration of virt-sysprep into oVirt.
   >
   > Usually, if user creates a template from a regular VM, and then
creates
   new VMs from this template, these new VMs inherit all configuration
of the
   original VM, including SSH keys, UDEV rules, MAC addresses, system
ID,
   hostname etc. It is unfortunate, because you cannot have two network
   devices with the same MAC address in the same network, for example.
   >
   > To avoid this, user must clean all machine-specific configuration
from
   the original VM before creating a template from it. You can do this
   manually, but there is virt-sysprep utility that does this
automatically.
   >
   > Ideally, virt-sysprep should be seamlessly integrated into
template
   creation process. But the first step is to create a simple button:
user
   selects a VM, clicks the button and oVirt executes virt-sysprep on
the VM.
   >
   > virt-sysprep works directly on VM's filesystem. It accepts list of
all
   disks of the VM as parameters:
   >
   > virt-sysprep -a disk1.img -a disk2.img -a disk3.img
   >
   > The architecture is as follows: command on the Engine side runs a
job on
   VDSM side and tracks its success/failure. The job on VDSM side runs
   virt-sysprep.
   >
   > The question is how to implement the job correctly?
   >
   > I thought about using storage jobs, but they are designed to work
only
   with a single volume, correct?

   New storage verbs are volume based. This make it easy to manage
   them on the engine side, and will allow parallelizing volume
operations
   on single or multiple hosts.

   A storage volume job is using sanlock lease on the modified volume
   and volume generation number. If a host running pending jobs becomes
   non-responsive and cannot be fenced, we can detect the state of
   the job, fence the job, and start the job on another host.

   In the SPM task, if a host becomes non-responsive and cannot be
   fenced, the whole setup is stuck, there is no way to perform any
   storage operation.
   > Is is possible to use them with operation that is performed on
multiple
   volumes?
   > Or, alternatively, is it possible to use some kind of 'VM jobs' -
that
   work on VM at whole?

   We can do:

   1. Add jobs with multiple volumes leases - can make error handling
very
      complex. How do tell a job state if you have multiple leases?
which
      volume generation you use?

   2. Use volume job using one of the volumes (the boot volume?). This
does
      not protect the other volumes from modification but engine is
   responsible
      for this.

   3. Use new "vm jobs", using a vm lease (should be available this
week
   on master).
      This protects a vm during sysprep from starting the vm.
      We still need a generation to detect the job state, I think we
can
   use the sanlock
      lease generation for this.

   I like the last option since sysprep is much like running a vm.
   > How v2v solves this problem?

   It does not.

   v2v predates storage volume jobs. It does not use volume leases and
   generation
   and does have any way to recover if a host running v2v becomes
   non-responsive
   and cannot be fenced.

   It also does not use the jobs framework and does not use a thread
pool for
   v2v jobs, so it has no limit on the number of storage operations on
a host.


Right, but let's be fair and present the benefits of v2v-jobs as well:
1. it is the simplest "infrastructure" in terms of LOC

It is also deprecated. V2V has promised to adopt the richer Host Jobs
API in the future.

2. it is the most efficient mechanism in terms of interactions between
the
engine and VDSM (it doesn't require new verbs/call, the data is
attached to
VdsStats; probably the easiest mechanism to convert to events)

Engine is already polling the host jobs API so I am not sure I agree
with you here.

3. it is the most efficient implementation in terms of interaction with
the
database (no date is persisted into the database, no polling is done)

Again, we're already using the Host Jobs API. We'll gain efficiency
by migrating away from the old v2v API and having a single, unified
approach (Host Jobs).

Currently we have 3 mechanisms to report jobs:
1. VM jobs - that is currently used for live-merge. This requires the
VM entity
to exist in VDSM, thus not suitable for virt-sysprep.

Correct, not appropriate for this application.

2. storage jobs - complicated infrastructure, targeted for recovering
from
failures to maintain storage consistency. Many of the things this
infrastructure knows to handle is irrelevant for virt-sysprep flow, and
the
fact that virt-sysprep is invoked on VM rather than particular disk
makes it
less suitable.

These are more appropriately called HostJobs and the have the
following semantics:
- They represent an external process running on a single host
- They are not persisted. If the host or vdsm restarts, the job is
   aborted
- They operate on entities. Currently storage is the first adopter
   of the infrastructure but virt was going to adopt these for the
   next-gen API. Entities can be volumes, storage domains, vms,
   network interfaces, etc.
- Job status and progress is reported by the Host Jobs API. If a job
   is not present, then the underlying entitie(s) must be polled by
   engine to determine the actual state.

3. V2V jobs - no mechanism is provided to resume failed jobs, no
leases, etc

This is the old infra upon which Host Jobs are built. v2v has
promised to move to Host Jobs in the future so we should not add new
dependencies to this code.

I have some arguments for using V2V-like jobs [1]:
1. creating template from vm is rarely done - if host goes unresponsive
or any
other failure is detected we can just remove the template and report
the error

We can chose this error handling with Host Jobs as well.

2. the phase of virt-sysprep is, unlike typical storage operation,
short -
reducing the risk of failures during the process

Reduced risk of failures is never an excuse to have lax error
handling. The storage flavored host jobs provide tons of utilities
for making error handling standardized, easy to implement, and
correct.

3. during the operation the VM is down - by locking the VM/template and
its
disks on the engine side, we render leases-like mechanism redundant

Eventually we want to protect all operations on storage with sanlock
leases. This is safer and allows for a more distributed approach to
management. Again, the use of leases correctly in host jobs requires
about 5 lines of code. The benefits of standardization far outweigh
any perceived simplification resulting from omitting it.

4. in the worst case - the disk will not be corrupted (only some of the
data
might be removed).

Again, the way engine chooses to handle job failures is independent of
the mechanism. Let's separate that from this discussion.

So I think that the mechanism for storage jobs is an over-kill for this
case.
We can keep it simple by generalise the V2V-job for other virt-tools
jobs, like
virt-sysprep.

I think we ought to standardize on the Host Jobs framework where we
can collaborate on unit tests, standardized locking and error
handling, abort logic, etc. When v2v moves to host jobs then we will
have a unified method of handling ephemeral jobs that are tied to
entities.

--
Adam Litke

--
Adam Litke