On Dec 7, 2016 16:00, "Nir Soffer" <nsoffer(a)redhat.com> wrote:
On Wed, Dec 7, 2016 at 10:17 AM, Oved Ourfali <oourfali(a)redhat.com> wrote:
> On Tue, Dec 6, 2016 at 11:12 PM, Adam Litke <alitke(a)redhat.com> wrote:
>
>> On 06/12/16 22:06 +0200, Arik Hadas wrote:
>>
>>> Adam,
>
>
>> :) You seem upset. Sorry if I touched on a nerve...
>
>>> Just out of curiosity: when you write "v2v has promised" - what exactly
>>> do you mean? the tool? Richard Jones (the maintainer of virt-v2v)?
>>> Shahar and I, who implemented the integration with virt-v2v? I'm not
>>> aware of such a promise by any of these options :)
>
>
>> Some history...
>
>> Earlier this year Nir, Francesco (added), Shahar, and I began
>> discussing the similarities between what storage needed to do with
>> external commands and what was designed specifically for v2v. I am
>> not sure if you were involved in the project at that time. The plan
>> was to create common infrastructure that could be extended to fit the
>> unique needs of the verticals. The v2v code was going to be moved
>> over to the new infrastructure (see [1]) and the only thing that
>> stopped the initial patch was lack of a VMWare testing environment for
>> verification.
>
>> At that time storage refocused on developing verbs that used the new
>> infrastructure and have been maintaining its suitability for general
>> use. Conversion of v2v -> Host Jobs is obviously a lower priority
>> item and much more difficult now due to the early missed opportunity.
>
>>> Anyway, let's say that you were given such a promise by someone and thus
>>> consider that mechanism to be deprecated - it doesn't really matter.
>
>
>> I may be biased but I think my opinion does matter.
>
>>> The current implementation doesn't fit this flow well (it requires a
>>> per-volume job, it creates leases that are not needed for template's
>>> disks, ...) and with the "next-gen API" with proper support for virt flows
>>> not even being discussed with us (and iiuc also not with the infra team)
>>> yet, I don't understand what you suggest except for some strong, though
>>> irrelevant, statements.
>
>
>> If you are willing to engage in a good-faith technical discussion I am
>> sure I can help you to understand. These operations on storage demand
>> some form of locking protection. If volume leases aren't appropriate then
>> perhaps we should use the VM Leases / xleases that Nir is finishing off
>> for 4.1 now.
>
>>> I suggest loud and clear to reuse (not to add dependencies, not to
>>> enhance, ...) an existing mechanism for a very similar flow of virt-v2v
>>> that works well and is simple.
>
>
>> I clearly remember discussions involving infra (hello Oved), virt
>> (hola Michal), and storage where we decided that new APIs performing
>> async operations involving external commands should use the HostJobs
>> infrastructure instead of adding more information to Host Stats.
>> These were the "famous" entity polling meetings.
We discussed these issues behind closed doors, not in the public mailing
list, so it is not surprising that people do not know about the agreements
we had.
The core team was there. So it is surprising.
> >
> >> Of course plans can change but I have never been looped into any such
> >> discussions.
> >
>
> >
> Well, I think that when someone builds a good infrastructure he first needs
> to talk to all consumers and make sure it fits.
> In this case it seems like most work was done to fit the storage use-case,
> and now you check whether it can fit others as well...
> The jobs framework is generic and can be used for any subsystem,
> there is nothing related to storage about it. But modifying disks *is*
> a storage operation, even if someone from the virt team worked on it.
>
> V2v is also a storage operation - if we compare it with copying disks:
> - we create a new volume that nobody is using yet
> - if the operation fails, the disk must be in illegal state
> - if the operation fails we delete the disks
> - if the operation succeeds the volume must be legal
> - we need to limit the number of operations on a host
> - we need to detect the job state if the host becomes non-responsive
> - we may want to fence the job if the host becomes non-responsive;
>   in volume jobs, we can increment the volume generation and run
>   the same job on another host.
> - we want to take a lease on storage to ensure that other hosts cannot
>   access the same entity, or that the job will fail if someone else is
>   using this entity
> - we want to take a lease on storage, ensuring that a job cannot get
>   stuck for a long time - sanlock kills the owner of a lease when storage
>   becomes inaccessible.
> - we want to report progress
>
> sysprep is less risky because the operation is faster, but on storage even
> a fast operation can get stuck for minutes.
>
> We need to agree on a standard way to do such operations that is safe
> enough and can be managed on the engine side.
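
For concreteness, here is a rough sketch (Python, with hypothetical names -
this is not the existing vdsm jobs API) of what such a standard host job
could look like, combining a per-host concurrency limit, an exclusive lease
and a generation bump on success:

    # Hypothetical sketch only: HostJob, entity.lease() and bump_generation()
    # are illustrative names, not the actual vdsm API.
    import threading

    _MAX_JOBS = threading.BoundedSemaphore(10)  # limit concurrent jobs per host


    class HostJob(object):
        """A host-side job protected by a storage lease and a generation."""

        def __init__(self, job_id, entity):
            self.id = job_id
            self.entity = entity          # e.g. a volume or a vm lease object
            self.progress = 0
            self.status = "pending"

        def run(self):
            with _MAX_JOBS:                     # limit operations on a host
                with self.entity.lease():       # exclusive, sanlock-style lease
                    self.status = "running"
                    try:
                        self._execute()         # subclass does the real work
                    except Exception:
                        self.status = "failed"  # engine may then delete disks
                        raise
                    else:
                        self.entity.bump_generation()  # mark success
                        self.status = "done"

        def _execute(self):
            raise NotImplementedError
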
> > IMO it makes much more sense to use events where possible (and you've
> > promised to use those as well, but I don't see you doing that...). v2v
> > should use events for sure, and they have promised to do that in the past,
> > instead of using the v2v jobs. The reason events weren't used originally
> > with the v2v feature was that it was too risky and the events
> > infrastructure was added too late in the game.
> Events are not replacing the need for managing jobs on the vdsm side.
> Engine must have a way to query the current jobs before subscribing
> to events from these jobs, otherwise you will lose events and engine
> will never notice a completed job after network errors.
>
> The jobs framework supports events, see
> https://gerrit.ovirt.org/67118
>
> We are waiting for review from the infra team, maybe you can
> get someone to review this?
It would have been great to review the design for this before it reaches
gerrit.
Anyway, I get a permissions error when opening it. Any clue why?
> Nir
>
>
>
>>
> >>> Do you "promise" to implement your "next gen API" for 4.1 as an
> >>> alternative?
> >
> >
> >> I guess we need the design first.
> >
> >
> >>> On Tue, Dec 6, 2016 at 5:04 PM, Adam Litke <alitke(a)redhat.com> wrote:
> >>
> >>> On 05/12/16 11:17 +0200, Arik Hadas wrote:
> >>
> >>
> >>
>
> >>> On Mon, Dec 5, 2016 at 10:05 AM, Nir Soffer <nsoffer(a)redhat.com> wrote:
> >>
> >>> On Sun, Dec 4, 2016 at 8:50 PM, Shmuel Melamud <smelamud(a)redhat.com>
> >>> wrote:
> >>>
> >>> > Hi!
> >>>
> >>> > I'm currently working on integration of virt-sysprep into oVirt.
> >>>
> >>> > Usually, if a user creates a template from a regular VM, and then
> >>> > creates new VMs from this template, these new VMs inherit all the
> >>> > configuration of the original VM, including SSH keys, UDEV rules,
> >>> > MAC addresses, system ID, hostname etc. It is unfortunate, because
> >>> > you cannot have two network devices with the same MAC address in the
> >>> > same network, for example.
> >>>
> >>> > To avoid this, the user must clean all machine-specific configuration
> >>> > from the original VM before creating a template from it. You can do
> >>> > this manually, but there is the virt-sysprep utility that does this
> >>> > automatically.
> >>>
> >>> > Ideally, virt-sysprep should be seamlessly integrated into the
> >>> > template creation process. But the first step is to create a simple
> >>> > button: the user selects a VM, clicks the button and oVirt executes
> >>> > virt-sysprep on the VM.
> >>>
> >>> > virt-sysprep works directly on the VM's filesystem. It accepts the
> >>> > list of all disks of the VM as parameters:
> >>> >
> >>> > virt-sysprep -a disk1.img -a disk2.img -a disk3.img
> >>>
> >>> > The architecture is as follows: a command on the Engine side runs a
> >>> > job on the VDSM side and tracks its success/failure. The job on the
> >>> > VDSM side runs virt-sysprep.
> >>>
> >>> > The question is how to implement the job correctly?
> >>>
> >>> > I thought about using storage jobs, but they are designed to work
> >>> > only with a single volume, correct?
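
For illustration, the host-side body of such a job could be as small as the
following sketch (how the disk paths are obtained and how errors are handled
is left out; nothing here is the real vdsm API):

    # Sketch only: run virt-sysprep over all disks of a VM from a host-side job.
    import subprocess


    def sysprep_vm(disk_paths):
        """Run virt-sysprep once, covering every disk of the VM."""
        cmd = ["virt-sysprep"]
        for path in disk_paths:
            cmd.extend(["-a", path])    # one -a option per disk image
        # virt-sysprep must run while the VM is down; the caller is
        # responsible for guaranteeing that (engine-side locking or a vm lease).
        subprocess.check_call(cmd)
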
> >>
> >>> New storage verbs are volume based. This makes it easy to manage
> >>> them on the engine side, and will allow parallelizing volume
> >>> operations on single or multiple hosts.
> >>
> >>> A storage volume job is using a sanlock lease on the modified volume
> >>> and the volume generation number. If a host running pending jobs
> >>> becomes non-responsive and cannot be fenced, we can detect the state
> >>> of the job, fence the job, and start the job on another host.
> >>
> >>> In the SPM task, if a host becomes non-responsive and cannot be
> >>> fenced, the whole setup is stuck; there is no way to perform any
> >>> storage operation.
> >>> > Is it possible to use them with an operation that is performed on
> >>> > multiple volumes?
> >>> > Or, alternatively, is it possible to use some kind of 'VM jobs' that
> >>> > work on the VM as a whole?
> >>
> >>> We can do:
> >>
> >>> 1. Add jobs with multiple volume leases - this can make error handling
> >>>    very complex. How do you tell the job state if you have multiple
> >>>    leases? Which volume generation do you use?
> >>
> >>> 2. Use a volume job using one of the volumes (the boot volume?). This
> >>>    does not protect the other volumes from modification but engine is
> >>>    responsible for this.
> >>
> >>> 3. Use new "vm jobs", using a vm lease (should be available this week
> >>>    on master). This protects the vm from being started during sysprep.
> >>>    We still need a generation to detect the job state; I think we can
> >>>    use the sanlock lease generation for this.
> >>
> >>> I like the last option since sysprep is much like running a vm.
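
If I follow option 3 correctly, the sysprep job would be shaped roughly like
this (a sketch only - Lease and acquire_vm_lease() are hypothetical stand-ins,
since the vm lease support is only now landing on master):

    # Sketch of option 3: a "vm job" that holds the vm lease while sysprep runs.
    # Lease and acquire_vm_lease() are placeholders, not real vdsm code.
    from contextlib import contextmanager


    class Lease(object):
        def __init__(self, vm_id, generation):
            self.vm_id = vm_id
            self.generation = generation  # used to detect stale/failed jobs


    @contextmanager
    def acquire_vm_lease(vm_id):
        # Placeholder: acquire the sanlock-backed vm lease here. Holding it
        # prevents the VM from being started anywhere while sysprep runs.
        lease = Lease(vm_id, generation=1)
        try:
            yield lease
        finally:
            pass  # Placeholder: release the lease here.


    def run_sysprep_job(vm_id, disk_paths):
        with acquire_vm_lease(vm_id) as lease:
            sysprep_vm(disk_paths)        # see the virt-sysprep sketch above
            return lease.generation
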
> >>> > How does v2v solve this problem?
> >>
> >>> It does not.
> >>
> >>> v2v predates storage volume jobs. It does not use volume leases and
> >>> generation and does not have any way to recover if a host running v2v
> >>> becomes non-responsive and cannot be fenced.
> >>
> >>> It also does not use the jobs framework and does not use a thread
> >>> pool for v2v jobs, so it has no limit on the number of storage
> >>> operations on a host.
> >>
> >>
> >>> Right, but let's be fair and present the benefits of v2v-jobs as well:
> >>> 1. it is the simplest "infrastructure" in terms of LOC
> >>
> >>
> >>> It is also deprecated. V2V has promised to adopt the richer Host Jobs
> >>> API in the future.
> >>
> >>
> >>> 2. it is the most efficient mechanism in terms of interactions between
> >>>    the engine and VDSM (it doesn't require new verbs/calls, the data is
> >>>    attached to VdsStats; probably the easiest mechanism to convert to
> >>>    events)
> >>
> >>
> >>> Engine is already polling the host jobs API so I am not sure I agree
> >>> with you here.
> >>
> >>
> >>> 3. it is the most efficient implementation in terms of interaction with
> >>>    the database (no data is persisted into the database, no polling is
> >>>    done)
> >>
> >>
> >>> Again, we're already using the Host Jobs API. We'll gain efficiency
> >>> by migrating away from the old v2v API and having a single, unified
> >>> approach (Host Jobs).
> >>
> >>
> >>> Currently we have 3 mechanisms to report jobs:
> >>> 1. VM jobs - currently used for live-merge. This requires the VM entity
> >>>    to exist in VDSM, thus not suitable for virt-sysprep.
> >>
> >>
> >>> Correct, not appropriate for this application.
> >>
> >>
> >>> 2. storage jobs - complicated infrastructure, targeted at recovering
> >>>    from failures to maintain storage consistency. Many of the things
> >>>    this infrastructure knows how to handle are irrelevant to the
> >>>    virt-sysprep flow, and the fact that virt-sysprep is invoked on a VM
> >>>    rather than on a particular disk makes it less suitable.
> >>
> >>
> >>> These are more appropriately called HostJobs and they have the
> >>> following semantics:
> >>> - They represent an external process running on a single host
> >>> - They are not persisted. If the host or vdsm restarts, the job is
> >>>   aborted
> >>> - They operate on entities. Currently storage is the first adopter
> >>>   of the infrastructure but virt was going to adopt these for the
> >>>   next-gen API. Entities can be volumes, storage domains, vms,
> >>>   network interfaces, etc.
> >>> - Job status and progress is reported by the Host Jobs API. If a job
> >>>   is not present, then the underlying entity (or entities) must be
> >>>   polled by engine to determine the actual state.
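
To make that last point concrete, the engine-side rule is roughly the
following (helper names are hypothetical and engine itself is Java; this only
shows the shape of the algorithm in Python):

    # Sketch of the engine-side rule for Host Jobs (hypothetical helper names).
    def resolve_job_state(job_id, get_host_jobs, poll_entity):
        """Decide the outcome of a host job after polling the host.

        get_host_jobs() returns the jobs reported by the Host Jobs API as a
        dict of {job_id: {"status": ..., "progress": ...}}; poll_entity()
        inspects the underlying entity (e.g. volume metadata and generation)
        when the job is gone.
        """
        jobs = get_host_jobs()
        if job_id in jobs:
            # The job is still known to the host: trust its reported status.
            return jobs[job_id]
        # Job not present (vdsm restarted, job finished and was cleaned up,
        # ...): fall back to polling the entity itself to learn what happened.
        return poll_entity()
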
> >>
> >>
> >>> 3. V2V jobs - no mechanism is provided to resume failed jobs, no
> >>>    leases, etc.
> >>
> >>
> >>> This is the old infra upon which Host Jobs are built. v2v has promised
> >>> to move to Host Jobs in the future so we should not add new
> >>> dependencies to this code.
> >>
> >>
> >>> I have some arguments for using V2V-like jobs [1]:
> >>> 1. creating a template from a vm is rarely done - if the host goes
> >>>    unresponsive or any other failure is detected we can just remove the
> >>>    template and report the error
> >>
> >>
> >>> We can choose this error handling with Host Jobs as well.
> >>
> >>
> >>> 2. the phase of virt-sysprep is, unlike a typical storage operation,
> >>>    short - reducing the risk of failures during the process
> >>
> >>
> >>> Reduced risk of failures is never an excuse to have lax error
> >>> handling. The storage flavored host jobs provide tons of utilities
> >>> for making error handling standardized, easy to implement, and
> >>> correct.
> >>
> >>
> >>> 3. during the operation the VM is down - by locking the VM/template
> >>>    and its disks on the engine side, we render a leases-like mechanism
> >>>    redundant
> >>
> >>
> >>> Eventually we want to protect all operations on storage with sanlock
> >>> leases. This is safer and allows for a more distributed approach to
> >>> management. Again, the correct use of leases in host jobs requires
> >>> about 5 lines of code. The benefits of standardization far outweigh
> >>> any perceived simplification resulting from omitting it.
> >>
> >>
> >>> 4. in the worst case - the disk will not be corrupted (only some of
> >>>    the data might be removed).
> >>
> >>
> >>> Again, the way engine chooses to handle job failures is independent of
> >>> the mechanism. Let's separate that from this discussion.
> >>
> >>
> >>> So I think that the mechanism for storage jobs is an overkill for this
> >>> case. We can keep it simple by generalising the V2V-job for other
> >>> virt-tools jobs, like virt-sysprep.
> >>
> >>
> >>> I think we ought to standardize on the Host Jobs framework where we
> >>> can collaborate on unit tests, standardized locking and error
> >>> handling, abort logic, etc. When v2v moves to host jobs then we will
> >>> have a unified method of handling ephemeral jobs that are tied to
> >>> entities.
> >>
> >>> --
> >>> Adam Litke
> >>
> >>
> >
>
>> --
>> Adam Litke
>
>