[VDSM] Correct implementation of virt-sysprep job

Hi!

I'm currently working on integration of virt-sysprep into oVirt.

Usually, if a user creates a template from a regular VM and then creates new VMs from this template, these new VMs inherit all the configuration of the original VM, including SSH keys, UDEV rules, MAC addresses, system ID, hostname etc. This is unfortunate, because you cannot have two network devices with the same MAC address on the same network, for example.

To avoid this, the user must clean all machine-specific configuration from the original VM before creating a template from it. You can do this manually, but the virt-sysprep utility does it automatically.

Ideally, virt-sysprep should be seamlessly integrated into the template creation process. But the first step is to create a simple button: the user selects a VM, clicks the button and oVirt executes virt-sysprep on the VM.

virt-sysprep works directly on the VM's filesystem. It accepts the list of all disks of the VM as parameters:

virt-sysprep -a disk1.img -a disk2.img -a disk3.img

The architecture is as follows: a command on the Engine side runs a job on the VDSM side and tracks its success/failure. The job on the VDSM side runs virt-sysprep.

The question is how to implement the job correctly?

I thought about using storage jobs, but they are designed to work only with a single volume, correct? Is it possible to use them with an operation that is performed on multiple volumes? Or, alternatively, is it possible to use some kind of 'VM jobs' that work on the VM as a whole?

How does v2v solve this problem?

Any ideas?

Shmuel
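For illustration only, the VDSM-side job body could boil down to something like the following sketch; the helper name and its argument are assumptions for the example, not existing VDSM code:

    import subprocess

    def sysprep_vm(disk_paths):
        """Run virt-sysprep on every disk of a (stopped) VM."""
        cmd = ["virt-sysprep"]
        for path in disk_paths:
            cmd.extend(["-a", path])
        # Raises CalledProcessError on failure; the surrounding job would
        # translate that into a failed job status for Engine to report.
        subprocess.check_call(cmd)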

On Dec 4, 2016 8:50 PM, "Shmuel Melamud" <smelamud@redhat.com> wrote:

Hi!

I'm currently working on integration of virt-sysprep into oVirt.

Usually, if a user creates a template from a regular VM and then creates new VMs from this template, these new VMs inherit all the configuration of the original VM, including SSH keys, UDEV rules, MAC addresses, system ID, hostname etc. This is unfortunate, because you cannot have two network devices with the same MAC address on the same network, for example.

To avoid this, the user must clean all machine-specific configuration from the original VM before creating a template from it. You can do this manually, but the virt-sysprep utility does it automatically.

Ideally, virt-sysprep should be seamlessly integrated into the template creation process. But the first step is to create a simple button: the user selects a VM, clicks the button and oVirt executes virt-sysprep on the VM.

User selects a VM or a template disk?

virt-sysprep works directly on the VM's filesystem. It accepts the list of all disks of the VM as parameters:

virt-sysprep -a disk1.img -a disk2.img -a disk3.img

I would suggest implementing it on the boot device only for the first phase. Yes, theoretically it may have to do some work on other disks (swap may be on a separate disk, etc.). Practically, I think we can live with that limitation. Y.

The architecture is as follows: a command on the Engine side runs a job on the VDSM side and tracks its success/failure. The job on the VDSM side runs virt-sysprep.

The question is how to implement the job correctly?

I thought about using storage jobs, but they are designed to work only with a single volume, correct? Is it possible to use them with an operation that is performed on multiple volumes? Or, alternatively, is it possible to use some kind of 'VM jobs' that work on the VM as a whole?

How does v2v solve this problem?

Any ideas?

Shmuel

On Sun, Dec 4, 2016 at 10:39 PM, Yaniv Kaul <ykaul@redhat.com> wrote:
On Dec 4, 2016 8:50 PM, "Shmuel Melamud" <smelamud@redhat.com> wrote:
Hi!
I'm currently working on integration of virt-sysprep into oVirt.
Usually, if user creates a template from a regular VM, and then creates new VMs from this template, these new VMs inherit all configuration of the original VM, including SSH keys, UDEV rules, MAC addresses, system ID, hostname etc. It is unfortunate, because you cannot have two network devices with the same MAC address in the same network, for example.
To avoid this, user must clean all machine-specific configuration from the original VM before creating a template from it. You can do this manually, but there is virt-sysprep utility that does this automatically.
Ideally, virt-sysprep should be seamlessly integrated into template creation process. But the first step is to create a simple button: user selects a VM, clicks the button and oVirt executes virt-sysprep on the VM.
User selects a VM or a template disk?
A VM. It is not safe to modify template disks. We cannot guarantee that there are no VMs based on this template, because some of them may reside on a detached storage domain. Shmuel

On Mon, Dec 5, 2016 at 9:18 AM, Shmuel Melamud <smelamud@redhat.com> wrote:
On Sun, Dec 4, 2016 at 10:39 PM, Yaniv Kaul <ykaul@redhat.com> wrote:
On Dec 4, 2016 8:50 PM, "Shmuel Melamud" <smelamud@redhat.com> wrote:
Hi!
I'm currently working on integration of virt-sysprep into oVirt.
Usually, if user creates a template from a regular VM, and then creates new VMs from this template, these new VMs inherit all configuration of the original VM, including SSH keys, UDEV rules, MAC addresses, system ID, hostname etc. It is unfortunate, because you cannot have two network devices with the same MAC address in the same network, for example.
To avoid this, user must clean all machine-specific configuration from the original VM before creating a template from it. You can do this manually, but there is virt-sysprep utility that does this automatically.
Ideally, virt-sysprep should be seamlessly integrated into template creation process. But the first step is to create a simple button: user selects a VM, clicks the button and oVirt executes virt-sysprep on the VM.
User selects a VM or a template disk?
A VM. It is not safe to modify template disks. We cannot guarantee that there are no VMs based on this template, because some of them may reside on a detached storage.
Any template disk that VMs were already derived from is not safe for this operation. On a pristine template disk it is OK, and that is exactly where I expect this process to take place. The user flow should be a checkbox in the create-template flow. Y.
Shmuel

On Mon, Dec 5, 2016 at 9:45 AM, Yaniv Kaul <ykaul@redhat.com> wrote:
On Mon, Dec 5, 2016 at 9:18 AM, Shmuel Melamud <smelamud@redhat.com> wrote:
On Sun, Dec 4, 2016 at 10:39 PM, Yaniv Kaul <ykaul@redhat.com> wrote:
On Dec 4, 2016 8:50 PM, "Shmuel Melamud" <smelamud@redhat.com> wrote:
Hi!
I'm currently working on integration of virt-sysprep into oVirt.
Usually, if user creates a template from a regular VM, and then creates new VMs from this template, these new VMs inherit all configuration of the original VM, including SSH keys, UDEV rules, MAC addresses, system ID, hostname etc. It is unfortunate, because you cannot have two network devices with the same MAC address in the same network, for example.
To avoid this, user must clean all machine-specific configuration from the original VM before creating a template from it. You can do this manually, but there is virt-sysprep utility that does this automatically.
Ideally, virt-sysprep should be seamlessly integrated into template creation process. But the first step is to create a simple button: user selects a VM, clicks the button and oVirt executes virt-sysprep on the VM.
User selects a VM or a template disk?
A VM. It is not safe to modify template disks. We cannot guarantee that there are no VMs based on this template, because some of them may reside on a detached storage.
Any template disk that VMs were already derived from is not safe for this operation. On a pristine template disk it is OK, and that is exactly where I expect this process to take place. The user flow should be a checkbox in the create-template flow. Y.
Big +1 here. A general button doesn't indicate to the user when to perform the action, and takes it away from the desired flow. Since the problem you are presenting here is specifically around template creation, I also think that a checkbox here, with the ability to "seal" the template, is the desired solution to aim for.
Shmuel

On Mon, Dec 5, 2016 at 10:57 AM, Moran Goldboim <mgoldboi@redhat.com> wrote:
On Mon, Dec 5, 2016 at 9:45 AM, Yaniv Kaul <ykaul@redhat.com> wrote:
On Mon, Dec 5, 2016 at 9:18 AM, Shmuel Melamud <smelamud@redhat.com> wrote:
On Sun, Dec 4, 2016 at 10:39 PM, Yaniv Kaul <ykaul@redhat.com> wrote:
On Dec 4, 2016 8:50 PM, "Shmuel Melamud" <smelamud@redhat.com> wrote:
Hi!
I'm currently working on integration of virt-sysprep into oVirt.
Usually, if user creates a template from a regular VM, and then creates new VMs from this template, these new VMs inherit all configuration of the original VM, including SSH keys, UDEV rules, MAC addresses, system ID, hostname etc. It is unfortunate, because you cannot have two network devices with the same MAC address in the same network, for example.
To avoid this, user must clean all machine-specific configuration from the original VM before creating a template from it. You can do this manually, but there is virt-sysprep utility that does this automatically.
Ideally, virt-sysprep should be seamlessly integrated into template creation process. But the first step is to create a simple button: user selects a VM, clicks the button and oVirt executes virt-sysprep on the VM.
User selects a VM or a template disk?
A VM. It is not safe to modify template disks. We cannot guarantee that there are no VMs based on this template, because some of them may reside on a detached storage.
Any template disk that VMs were already derived from is not safe for this operation. On a pristine template disk it is OK, and that is exactly where I expect this process to take place. The user flow should be a checkbox in the create-template flow. Y.
Big +1 here. A general button doesn't indicate to the user when to perform the action, and takes it away from the desired flow. Since the problem you are presenting here is specifically around template creation, I also think that a checkbox here, with the ability to "seal" the template, is the desired solution to aim for.
Another aspect of this specific user story is the ability to transform this "sealed" VM directly into a template - many times the VM is just deleted afterwards and the user is just "paying" for the storage actions. Please look at Bug 1013675 <https://bugzilla.redhat.com/show_bug.cgi?id=1013675> - [RFE] In-place transformation of a VM to a Template - for details.
Shmuel

On Sun, Dec 4, 2016 at 8:50 PM, Shmuel Melamud <smelamud@redhat.com> wrote:
Hi!
I'm currently working on integration of virt-sysprep into oVirt.
Usually, if user creates a template from a regular VM, and then creates new VMs from this template, these new VMs inherit all configuration of the original VM, including SSH keys, UDEV rules, MAC addresses, system ID, hostname etc. It is unfortunate, because you cannot have two network devices with the same MAC address in the same network, for example.
To avoid this, user must clean all machine-specific configuration from the original VM before creating a template from it. You can do this manually, but there is virt-sysprep utility that does this automatically.
Ideally, virt-sysprep should be seamlessly integrated into template creation process. But the first step is to create a simple button: user selects a VM, clicks the button and oVirt executes virt-sysprep on the VM.
virt-sysprep works directly on VM's filesystem. It accepts list of all disks of the VM as parameters:
virt-sysprep -a disk1.img -a disk2.img -a disk3.img
The architecture is as follows: command on the Engine side runs a job on VDSM side and tracks its success/failure. The job on VDSM side runs virt-sysprep.
The question is how to implement the job correctly?
I thought about using storage jobs, but they are designed to work only with a single volume, correct?
New storage verbs are volume based. This makes it easy to manage them on the Engine side, and will allow parallelizing volume operations on single or multiple hosts.

A storage volume job uses a sanlock lease on the modified volume and the volume generation number. If a host running pending jobs becomes non-responsive and cannot be fenced, we can detect the state of the job, fence the job, and start the job on another host.

With the SPM task, if a host becomes non-responsive and cannot be fenced, the whole setup is stuck; there is no way to perform any storage operation.
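A hedged sketch of how the volume generation mentioned above could be used to decide a job's fate when its host disappears; the names and the exact rule are assumptions, not the real VDSM logic:

    def job_outcome(current_generation, job_generation):
        """Decide what happened to a job whose host became non-responsive,
        by comparing the generation recorded when the job started with the
        generation currently stored in the volume metadata."""
        if current_generation == job_generation:
            # The job never committed its change; it is safe to fence the
            # job and retry the operation on another host.
            return "not done"
        # The generation was bumped, so the modification completed before
        # the host went away.
        return "done"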
Is it possible to use them with an operation that is performed on multiple volumes? Or, alternatively, is it possible to use some kind of 'VM jobs' that work on the VM as a whole?
We can do:

1. Add jobs with multiple volume leases - this can make error handling very complex. How do you tell the job state if you have multiple leases? Which volume generation do you use?

2. Use a volume job on one of the volumes (the boot volume?). This does not protect the other volumes from modification, but Engine is responsible for that.

3. Use the new "VM jobs", using a VM lease (should be available this week on master). This protects the VM from being started during sysprep. We still need a generation to detect the job state; I think we can use the sanlock lease generation for this.

I like the last option, since sysprep is much like running a VM.
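As a strawman for option 3, a hedged sketch of a VM-level job protected by a lease; the lease object and its acquire()/release() methods are placeholders, not the real sanlock or jobs-framework API:

    import subprocess

    class SysprepVMJob:
        """Illustrative 'VM job': run virt-sysprep on a stopped VM's disks
        while holding the VM lease, so the VM cannot be started meanwhile."""

        def __init__(self, vm_id, disk_paths, vm_lease):
            self.vm_id = vm_id
            self.disk_paths = disk_paths
            self.vm_lease = vm_lease      # placeholder for the sanlock VM lease
            self.status = "pending"

        def run(self):
            self.vm_lease.acquire()       # placeholder call
            try:
                cmd = ["virt-sysprep"]
                for path in self.disk_paths:
                    cmd.extend(["-a", path])
                subprocess.check_call(cmd)
                self.status = "done"
            except subprocess.CalledProcessError:
                self.status = "failed"
            finally:
                self.vm_lease.release()   # placeholder call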
How does v2v solve this problem?
It does not.

v2v predates storage volume jobs. It does not use volume leases and generations, and it does not have any way to recover if a host running v2v becomes non-responsive and cannot be fenced.

It also does not use the jobs framework and does not use a thread pool for v2v jobs, so there is no limit on the number of storage operations on a host.

Nir
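On the thread-pool point, bounding concurrent jobs is cheap with the standard library; a generic sketch (not VDSM's actual executor, and the limit below is made up):

    from concurrent.futures import ThreadPoolExecutor

    MAX_HOST_JOBS = 4                       # illustrative limit only
    _executor = ThreadPoolExecutor(max_workers=MAX_HOST_JOBS)

    def submit_host_job(job):
        # job.run() does the actual work (e.g. virt-sysprep); jobs beyond the
        # limit wait in the queue instead of all hitting storage at once.
        return _executor.submit(job.run)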

On Mon, Dec 5, 2016 at 10:05 AM, Nir Soffer <nsoffer@redhat.com> wrote:
On Sun, Dec 4, 2016 at 8:50 PM, Shmuel Melamud <smelamud@redhat.com> wrote:
Hi!
I'm currently working on integration of virt-sysprep into oVirt.
Usually, if user creates a template from a regular VM, and then creates
new VMs from this template, these new VMs inherit all configuration of the original VM, including SSH keys, UDEV rules, MAC addresses, system ID, hostname etc. It is unfortunate, because you cannot have two network devices with the same MAC address in the same network, for example.
To avoid this, user must clean all machine-specific configuration from
the original VM before creating a template from it. You can do this manually, but there is virt-sysprep utility that does this automatically.
Ideally, virt-sysprep should be seamlessly integrated into template
creation process. But the first step is to create a simple button: user selects a VM, clicks the button and oVirt executes virt-sysprep on the VM.
virt-sysprep works directly on VM's filesystem. It accepts list of all
disks of the VM as parameters:
virt-sysprep -a disk1.img -a disk2.img -a disk3.img
The architecture is as follows: command on the Engine side runs a job on
VDSM side and tracks its success/failure. The job on VDSM side runs virt-sysprep.
The question is how to implement the job correctly?
I thought about using storage jobs, but they are designed to work only
with a single volume, correct?
New storage verbs are volume based. This makes it easy to manage them on the engine side, and will allow parallelizing volume operations on single or multiple hosts.
A storage volume job is using sanlock lease on the modified volume and volume generation number. If a host running pending jobs becomes non-responsive and cannot be fenced, we can detect the state of the job, fence the job, and start the job on another host.
In the SPM task, if a host becomes non-responsive and cannot be fenced, the whole setup is stuck, there is no way to perform any storage operation.
Is it possible to use them with an operation that is performed on multiple volumes? Or, alternatively, is it possible to use some kind of 'VM jobs' that work on the VM as a whole?
We can do:
1. Add jobs with multiple volume leases - this can make error handling very complex. How do you tell the job state if you have multiple leases? Which volume generation do you use?
2. Use volume job using one of the volumes (the boot volume?). This does not protect the other volumes from modification but engine is responsible for this.
3. Use new "vm jobs", using a vm lease (should be available this week on master). This protects a vm during sysprep from starting the vm. We still need a generation to detect the job state, I think we can use the sanlock lease generation for this.
I like the last option since sysprep is much like running a vm.
How v2v solves this problem?
It does not.
v2v predates storage volume jobs. It does not use volume leases and generations and does not have any way to recover if a host running v2v becomes non-responsive and cannot be fenced.
It also does not use the jobs framework and does not use a thread pool for v2v jobs, so it has no limit on the number of storage operations on a host.
Right, but let's be fair and present the benefits of v2v-jobs as well:
1. it is the simplest "infrastructure" in terms of LOC
2. it is the most efficient mechanism in terms of interactions between the engine and VDSM (it doesn't require new verbs/calls, the data is attached to VdsStats; probably the easiest mechanism to convert to events)
3. it is the most efficient implementation in terms of interaction with the database (no data is persisted into the database, no polling is done)

Currently we have 3 mechanisms to report jobs:
1. VM jobs - currently used for live merge. This requires the VM entity to exist in VDSM, thus not suitable for virt-sysprep.
2. storage jobs - complicated infrastructure, targeted at recovering from failures to maintain storage consistency. Many of the things this infrastructure knows how to handle are irrelevant for the virt-sysprep flow, and the fact that virt-sysprep is invoked on a VM rather than a particular disk makes it less suitable.
3. V2V jobs - no mechanism is provided to resume failed jobs, no leases, etc.

I have some arguments for using V2V-like jobs [1]:
1. creating a template from a VM is rarely done - if the host goes unresponsive or any other failure is detected we can just remove the template and report the error
2. the virt-sysprep phase, unlike a typical storage operation, is short - reducing the risk of failures during the process
3. during the operation the VM is down - by locking the VM/template and its disks on the engine side, we render a leases-like mechanism redundant
4. in the worst case the disk will not be corrupted (only some of the data might be removed).

So I think that the mechanism for storage jobs is overkill for this case. We can keep it simple by generalising the V2V job for other virt-tools jobs, like virt-sysprep.

[1] I believe that, as Moran and Yaniv noted, we can just do it in the create template flow without the intermediate (POC) stage of having an operation for doing that on an existing VM or template - it only complicates stuff.

Nir
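To make the comparison concrete, a v2v-style mechanism is essentially an in-memory job table reported alongside the host stats; a hedged sketch with invented names (not the actual VDSM v2v module):

    import threading
    import uuid

    _jobs = {}
    _jobs_lock = threading.Lock()

    def start_virt_tools_job(run_func):
        """Start a virt-tools job in a background thread, tracked in memory only."""
        job_id = str(uuid.uuid4())
        with _jobs_lock:
            _jobs[job_id] = {"status": "running"}

        def _run():
            try:
                run_func()
                status = "done"
            except Exception:
                status = "failed"
            with _jobs_lock:
                _jobs[job_id]["status"] = status

        threading.Thread(target=_run, daemon=True).start()
        return job_id

    def jobs_status():
        """Snapshot of all jobs, as it would be attached to the host stats reply."""
        with _jobs_lock:
            return {jid: dict(info) for jid, info in _jobs.items()}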

Arik Hadas <ahadas@redhat.com> writes:
Right, but let's be fair and present the benefits of v2v-jobs as well:
1. it is the simplest "infrastructure" in terms of LOC
2. it is the most efficient mechanism in terms of interactions between the engine and VDSM (it doesn't require new verbs/calls, the data is attached to VdsStats; probably the easiest mechanism to convert to events)
3. it is the most efficient implementation in terms of interaction with the database (no data is persisted into the database, no polling is done)

Currently we have 3 mechanisms to report jobs:
1. VM jobs - currently used for live merge. This requires the VM entity to exist in VDSM, thus not suitable for virt-sysprep.
2. storage jobs - complicated infrastructure, targeted at recovering from failures to maintain storage consistency. Many of the things this infrastructure knows how to handle are irrelevant for the virt-sysprep flow, and the fact that virt-sysprep is invoked on a VM rather than a particular disk makes it less suitable.
3. V2V jobs - no mechanism is provided to resume failed jobs, no leases, etc.

I have some arguments for using V2V-like jobs [1]:
1. creating a template from a VM is rarely done - if the host goes unresponsive or any other failure is detected we can just remove the template and report the error
2. the virt-sysprep phase, unlike a typical storage operation, is short - reducing the risk of failures during the process
3. during the operation the VM is down - by locking the VM/template and its disks on the engine side, we render a leases-like mechanism redundant
4. in the worst case the disk will not be corrupted (only some of the data might be removed).

So I think that the mechanism for storage jobs is overkill for this case. We can keep it simple by generalising the V2V job for other virt-tools jobs, like virt-sysprep.

[1] I believe that, as Moran and Yaniv noted, we can just do it in the create template flow without the intermediate (POC) stage of having an operation for doing that on an existing VM or template - it only complicates stuff.
Nice summary. Based on the arguments you provide, the v2v-like way looks like a good solution. Provided we can attach the operation to the create template flow, it should be safe and simple.
On Mon, Dec 5, 2016 at 10:05 AM, Nir Soffer <nsoffer@redhat.com> wrote:
We can do:
1. Add jobs with multiple volumes leases - can make error handling very complex. How do tell a job state if you have multiple leases? which volume generation you use?
2. Use volume job using one of the volumes (the boot volume?). This does not protect the other volumes from modification but engine is responsible for this.
These two don't look like options worth bothering with.
3. Use new "vm jobs", using a vm lease (should be available this week on master). This protects a vm during sysprep from starting the vm. We still need a generation to detect the job state, I think we can use the sanlock lease generation for this.
If we can perform virt-sysprep within the create template flow, does this option provide any real benefit over what Arik suggests? Would it simplify things or add complexity to the proposed solution?

On 05/12/16 11:17 +0200, Arik Hadas wrote:
On Mon, Dec 5, 2016 at 10:05 AM, Nir Soffer <nsoffer@redhat.com> wrote:
On Sun, Dec 4, 2016 at 8:50 PM, Shmuel Melamud <smelamud@redhat.com> wrote:
Hi!
I'm currently working on integration of virt-sysprep into oVirt.
Usually, if user creates a template from a regular VM, and then creates
new VMs from this template, these new VMs inherit all configuration of the original VM, including SSH keys, UDEV rules, MAC addresses, system ID, hostname etc. It is unfortunate, because you cannot have two network devices with the same MAC address in the same network, for example.
To avoid this, user must clean all machine-specific configuration from
the original VM before creating a template from it. You can do this manually, but there is virt-sysprep utility that does this automatically.
Ideally, virt-sysprep should be seamlessly integrated into template
creation process. But the first step is to create a simple button: user selects a VM, clicks the button and oVirt executes virt-sysprep on the VM.
virt-sysprep works directly on VM's filesystem. It accepts list of all
disks of the VM as parameters:
virt-sysprep -a disk1.img -a disk2.img -a disk3.img
The architecture is as follows: command on the Engine side runs a job on
VDSM side and tracks its success/failure. The job on VDSM side runs virt-sysprep.
The question is how to implement the job correctly?
I thought about using storage jobs, but they are designed to work only
with a single volume, correct?
New storage verbs are volume based. This makes it easy to manage them on the engine side, and will allow parallelizing volume operations on single or multiple hosts.
A storage volume job is using sanlock lease on the modified volume and volume generation number. If a host running pending jobs becomes non-responsive and cannot be fenced, we can detect the state of the job, fence the job, and start the job on another host.
In the SPM task, if a host becomes non-responsive and cannot be fenced, the whole setup is stuck, there is no way to perform any storage operation.
Is it possible to use them with an operation that is performed on multiple volumes? Or, alternatively, is it possible to use some kind of 'VM jobs' that work on the VM as a whole?
We can do:
1. Add jobs with multiple volume leases - this can make error handling very complex. How do you tell the job state if you have multiple leases? Which volume generation do you use?
2. Use volume job using one of the volumes (the boot volume?). This does not protect the other volumes from modification but engine is responsible for this.
3. Use new "vm jobs", using a vm lease (should be available this week on master). This protects a vm during sysprep from starting the vm. We still need a generation to detect the job state, I think we can use the sanlock lease generation for this.
I like the last option since sysprep is much like running a vm.
How v2v solves this problem?
It does not.
v2v predates storage volume jobs. It does not use volume leases and generations and does not have any way to recover if a host running v2v becomes non-responsive and cannot be fenced.
It also does not use the jobs framework and does not use a thread pool for v2v jobs, so it has no limit on the number of storage operations on a host.
Right, but let's be fair and present the benefits of v2v-jobs as well: 1. it is the simplest "infrastructure" in terms of LOC
It is also deprecated. V2V has promised to adopt the richer Host Jobs API in the future.
2. it is the most efficient mechanism in terms of interactions between the engine and VDSM (it doesn't require new verbs/call, the data is attached to VdsStats; probably the easiest mechanism to convert to events)
Engine is already polling the host jobs API so I am not sure I agree with you here.
3. it is the most efficient implementation in terms of interaction with the database (no data is persisted into the database, no polling is done)
Again, we're already using the Host Jobs API. We'll gain efficiency by migrating away from the old v2v API and having a single, unified approach (Host Jobs).
Currently we have 3 mechanisms to report jobs: 1. VM jobs - that is currently used for live-merge. This requires the VM entity to exist in VDSM, thus not suitable for virt-sysprep.
Correct, not appropriate for this application.
2. storage jobs - complicated infrastructure, targeted for recovering from failures to maintain storage consistency. Many of the things this infrastructure knows to handle is irrelevant for virt-sysprep flow, and the fact that virt-sysprep is invoked on VM rather than particular disk makes it less suitable.
These are more appropriately called Host Jobs and they have the following semantics:
- They represent an external process running on a single host.
- They are not persisted. If the host or vdsm restarts, the job is aborted.
- They operate on entities. Currently storage is the first adopter of the infrastructure, but virt was going to adopt these for the next-gen API. Entities can be volumes, storage domains, VMs, network interfaces, etc.
- Job status and progress are reported by the Host Jobs API. If a job is not present, then the underlying entity (or entities) must be polled by Engine to determine the actual state.
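To make that last rule concrete, a hedged sketch of the status-resolution logic an Engine-side poller would follow; the names are illustrative only, not the actual API:

    def resolve_job_state(job_id, host_jobs, poll_entity):
        """host_jobs: dict of job_id -> {"status": ...}; poll_entity: callable
        that examines the underlying entity (volume, VM, ...) when the job
        record is gone, e.g. after a vdsm restart."""
        job = host_jobs.get(job_id)
        if job is not None:
            return job["status"]
        # Job not present: it was aborted or finished and cleaned up; only
        # the entity itself can tell us what actually happened.
        return poll_entity()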
3. V2V jobs - no mechanism is provided to resume failed jobs, no leases, etc
This is the old infra upon which Host Jobs are built. v2v has promised to move to Host Jobs in the future so we should not add new dependencies to this code.
I have some arguments for using V2V-like jobs [1]: 1. creating template from vm is rarely done - if host goes unresponsive or any other failure is detected we can just remove the template and report the error
We can choose this error handling with Host Jobs as well.
2. the phase of virt-sysprep is, unlike typical storage operation, short - reducing the risk of failures during the process
Reduced risk of failures is never an excuse to have lax error handling. The storage flavored host jobs provide tons of utilities for making error handling standardized, easy to implement, and correct.
3. during the operation the VM is down - by locking the VM/template and its disks on the engine side, we render leases-like mechanism redundant
Eventually we want to protect all operations on storage with sanlock leases. This is safer and allows for a more distributed approach to management. Again, the use of leases correctly in host jobs requires about 5 lines of code. The benefits of standardization far outweigh any perceived simplification resulting from omitting it.
4. in the worst case - the disk will not be corrupted (only some of the data might be removed).
Again, the way engine chooses to handle job failures is independent of the mechanism. Let's separate that from this discussion.
So I think that the mechanism for storage jobs is an over-kill for this case. We can keep it simple by generalise the V2V-job for other virt-tools jobs, like virt-sysprep.
I think we ought to standardize on the Host Jobs framework where we can collaborate on unit tests, standardized locking and error handling, abort logic, etc. When v2v moves to host jobs then we will have a unified method of handling ephemeral jobs that are tied to entities. -- Adam Litke

Adam,

Just out of curiosity: when you write "v2v has promised" - what exactly do you mean? The tool? Richard Jones (the maintainer of virt-v2v)? Shahar and I, who implemented the integration with virt-v2v? I'm not aware of such a promise by any of these options :)

Anyway, let's say that you were given such a promise by someone and thus consider that mechanism to be deprecated - it doesn't really matter. The current implementation doesn't fit this flow well (it requires a per-volume job, it creates leases that are not needed for a template's disks, ...), and with the "next-gen API" with proper support for virt flows not even being discussed with us (and IIUC also not with the infra team) yet, I don't understand what you are suggesting except for some strong, though irrelevant, statements.

I suggest, loud and clear, to reuse (not to add dependencies, not to enhance, ...) an existing mechanism for a very similar flow of virt-v2v that works well and is simple. Do you "promise" to implement your "next gen API" for 4.1 as an alternative?

On Tue, Dec 6, 2016 at 5:04 PM, Adam Litke <alitke@redhat.com> wrote:
On 05/12/16 11:17 +0200, Arik Hadas wrote:
On Mon, Dec 5, 2016 at 10:05 AM, Nir Soffer <nsoffer@redhat.com> wrote:
On Sun, Dec 4, 2016 at 8:50 PM, Shmuel Melamud <smelamud@redhat.com> wrote:
Hi!
I'm currently working on integration of virt-sysprep into oVirt.
Usually, if user creates a template from a regular VM, and then
creates new VMs from this template, these new VMs inherit all configuration of the original VM, including SSH keys, UDEV rules, MAC addresses, system ID, hostname etc. It is unfortunate, because you cannot have two network devices with the same MAC address in the same network, for example.
To avoid this, user must clean all machine-specific configuration
from the original VM before creating a template from it. You can do this manually, but there is virt-sysprep utility that does this automatically.
Ideally, virt-sysprep should be seamlessly integrated into template
creation process. But the first step is to create a simple button: user selects a VM, clicks the button and oVirt executes virt-sysprep on the VM.
virt-sysprep works directly on VM's filesystem. It accepts list of
all disks of the VM as parameters:
virt-sysprep -a disk1.img -a disk2.img -a disk3.img
The architecture is as follows: command on the Engine side runs a
job on VDSM side and tracks its success/failure. The job on VDSM side runs virt-sysprep.
The question is how to implement the job correctly?
I thought about using storage jobs, but they are designed to work
only with a single volume, correct?
New storage verbs are volume based. This make it easy to manage them on the engine side, and will allow parallelizing volume operations on single or multiple hosts.
A storage volume job is using sanlock lease on the modified volume and volume generation number. If a host running pending jobs becomes non-responsive and cannot be fenced, we can detect the state of the job, fence the job, and start the job on another host.
In the SPM task, if a host becomes non-responsive and cannot be fenced, the whole setup is stuck; there is no way to perform any storage operation.
Is it possible to use them with an operation that is performed on multiple volumes? Or, alternatively, is it possible to use some kind of 'VM jobs' that work on the VM as a whole?
We can do:
1. Add jobs with multiple volumes leases - can make error handling very complex. How do tell a job state if you have multiple leases? which volume generation you use?
2. Use volume job using one of the volumes (the boot volume?). This does not protect the other volumes from modification but engine is responsible for this.
3. Use new "vm jobs", using a vm lease (should be available this week on master). This protects a vm during sysprep from starting the vm. We still need a generation to detect the job state, I think we can use the sanlock lease generation for this.
I like the last option, since sysprep is much like running a VM.
How does v2v solve this problem?
It does not.
v2v predates storage volume jobs. It does not use volume leases and generations and does not have any way to recover if a host running v2v becomes non-responsive and cannot be fenced.
It also does not use the jobs framework and does not use a thread pool for v2v jobs, so it has no limit on the number of storage operations on a host.
Right, but let's be fair and present the benefits of v2v-jobs as well: 1. it is the simplest "infrastructure" in terms of LOC
It is also deprecated. V2V has promised to adopt the richer Host Jobs API in the future.
2. it is the most efficient mechanism in terms of interactions between the
engine and VDSM (it doesn't require new verbs/call, the data is attached to VdsStats; probably the easiest mechanism to convert to events)
Engine is already polling the host jobs API so I am not sure I agree with you here.
3. it is the most efficient implementation in terms of interaction with the
database (no date is persisted into the database, no polling is done)
Again, we're already using the Host Jobs API. We'll gain efficiency by migrating away from the old v2v API and having a single, unified approach (Host Jobs).
Currently we have 3 mechanisms to report jobs:
1. VM jobs - that is currently used for live-merge. This requires the VM entity to exist in VDSM, thus not suitable for virt-sysprep.
Correct, not appropriate for this application.
2. storage jobs - complicated infrastructure, targeted for recovering from
failures to maintain storage consistency. Many of the things this infrastructure knows to handle is irrelevant for virt-sysprep flow, and the fact that virt-sysprep is invoked on VM rather than particular disk makes it less suitable.
These are more appropriately called HostJobs and the have the following semantics: - They represent an external process running on a single host - They are not persisted. If the host or vdsm restarts, the job is aborted - They operate on entities. Currently storage is the first adopter of the infrastructure but virt was going to adopt these for the next-gen API. Entities can be volumes, storage domains, vms, network interfaces, etc. - Job status and progress is reported by the Host Jobs API. If a job is not present, then the underlying entitie(s) must be polled by engine to determine the actual state.
3. V2V jobs - no mechanism is provided to resume failed jobs, no leases,
etc
This is the old infra upon which Host Jobs are built. v2v has promised to move to Host Jobs in the future so we should not add new dependencies to this code.
I have some arguments for using V2V-like jobs [1]:
1. creating template from vm is rarely done - if host goes unresponsive or any other failure is detected we can just remove the template and report the error
We can choose this error handling with Host Jobs as well.
2. the phase of virt-sysprep is, unlike typical storage operation, short -
reducing the risk of failures during the process
Reduced risk of failures is never an excuse to have lax error handling. The storage flavored host jobs provide tons of utilities for making error handling standardized, easy to implement, and correct.
3. during the operation the VM is down - by locking the VM/template and its
disks on the engine side, we render leases-like mechanism redundant
Eventually we want to protect all operations on storage with sanlock leases. This is safer and allows for a more distributed approach to management. Again, the use of leases correctly in host jobs requires about 5 lines of code. The benefits of standardization far outweigh any perceived simplification resulting from omitting it.
4. in the worst case - the disk will not be corrupted (only some of the
data might be removed).
Again, the way engine chooses to handle job failures is independent of the mechanism. Let's separate that from this discussion.
So I think that the mechanism for storage jobs is an over-kill for this
case. We can keep it simple by generalise the V2V-job for other virt-tools jobs, like virt-sysprep.
I think we ought to standardize on the Host Jobs framework where we can collaborate on unit tests, standardized locking and error handling, abort logic, etc. When v2v moves to host jobs then we will have a unified method of handling ephemeral jobs that are tied to entities.
-- Adam Litke

On 06/12/16 22:06 +0200, Arik Hadas wrote:
Adam,
:) You seem upset. Sorry if I touched on a nerve...
Just out of curiosity: when you write "v2v has promised" - what exactly do you mean? the tool? Richard Jones (the maintainer of virt-v2v)? Shahar and I that implemented the integration with virt-v2v? I'm not aware of such a promise by any of these options :)
Some history... Earlier this year Nir, Francesco (added), Shahar, and I began discussing the similarities between what storage needed to do with external commands and what was designed specifically for v2v. I am not sure if you were involved in the project at that time. The plan was to create common infrastructure that could be extended to fit the unique needs of the verticals. The v2v code was going to be moved over to the new infrastructure (see [1]) and the only thing that stopped the initial patch was lack of a VMWare testing environment for verification. At that time storage refocused on developing verbs that used the new infrastructure and have been maintaining its suitability for general use. Conversion of v2v -> Host Jobs is obviously a lower priority item and much more difficult now due to the early missed opportunity.
Anyway, let's say that you were given such a promise by someone and thus consider that mechanism to be deprecated - it doesn't really matter.
I may be biased but I think my opinion does matter.
The current implementation doesn't well fit to this flow (it requires per-volume job, it creates leases that are not needed for template's disks, ...) and with the "next-gen API" with proper support for virt flows not even being discussed with us (and iiuc also not with the infra team) yet, I don't understand what do you suggest except for some strong, though irrelevant, statements.
If you are willing to engage in a good-faith technical discussion I am sure I can help you to understand. These operations to storage demand some form of locking protection. If volume leases aren't appropriate then perhaps we should use the VM Leases / xleases that Nir is finishing off for 4.1 now.
I suggest loud and clear to reuse (not to add dependencies, not to enhance, ..) an existing mechanism for a very similar flow of virt-v2v that works well and simple.
I clearly remember discussions involving infra (hello Oved), virt (hola Michal), and storage where we decided that new APIs performing async operations involving external commands should use the HostJobs infrastructure instead of adding more information to Host Stats. These were the "famous" entity polling meetings. Of course plans can change but I have never been looped into any such discussions.
Do you "promise" to implement your "next gen API" for 4.1 as an alternative?
I guess we need the design first.
On Tue, Dec 6, 2016 at 5:04 PM, Adam Litke <alitke@redhat.com> wrote:
On 05/12/16 11:17 +0200, Arik Hadas wrote:
On Mon, Dec 5, 2016 at 10:05 AM, Nir Soffer <nsoffer@redhat.com> wrote:
On Sun, Dec 4, 2016 at 8:50 PM, Shmuel Melamud <smelamud@redhat.com> wrote: > > Hi! > > I'm currently working on integration of virt-sysprep into oVirt. > > Usually, if user creates a template from a regular VM, and then creates new VMs from this template, these new VMs inherit all configuration of the original VM, including SSH keys, UDEV rules, MAC addresses, system ID, hostname etc. It is unfortunate, because you cannot have two network devices with the same MAC address in the same network, for example. > > To avoid this, user must clean all machine-specific configuration from the original VM before creating a template from it. You can do this manually, but there is virt-sysprep utility that does this automatically. > > Ideally, virt-sysprep should be seamlessly integrated into template creation process. But the first step is to create a simple button: user selects a VM, clicks the button and oVirt executes virt-sysprep on the VM. > > virt-sysprep works directly on VM's filesystem. It accepts list of all disks of the VM as parameters: > > virt-sysprep -a disk1.img -a disk2.img -a disk3.img > > The architecture is as follows: command on the Engine side runs a job on VDSM side and tracks its success/failure. The job on VDSM side runs virt-sysprep. > > The question is how to implement the job correctly? > > I thought about using storage jobs, but they are designed to work only with a single volume, correct?
New storage verbs are volume based. This make it easy to manage them on the engine side, and will allow parallelizing volume operations on single or multiple hosts.
A storage volume job is using sanlock lease on the modified volume and volume generation number. If a host running pending jobs becomes non-responsive and cannot be fenced, we can detect the state of the job, fence the job, and start the job on another host.
In the SPM task, if a host becomes non-responsive and cannot be fenced, the whole setup is stuck, there is no way to perform any storage operation. > Is is possible to use them with operation that is performed on multiple volumes? > Or, alternatively, is it possible to use some kind of 'VM jobs' - that work on VM at whole?
We can do:
1. Add jobs with multiple volumes leases - can make error handling very complex. How do tell a job state if you have multiple leases? which volume generation you use?
2. Use volume job using one of the volumes (the boot volume?). This does not protect the other volumes from modification but engine is responsible for this.
3. Use new "vm jobs", using a vm lease (should be available this week on master). This protects a vm during sysprep from starting the vm. We still need a generation to detect the job state, I think we can use the sanlock lease generation for this.
I like the last option since sysprep is much like running a vm. > How v2v solves this problem?
It does not.
v2v predates storage volume jobs. It does not use volume leases and generation and does have any way to recover if a host running v2v becomes non-responsive and cannot be fenced.
It also does not use the jobs framework and does not use a thread pool for v2v jobs, so it has no limit on the number of storage operations on a host.
Right, but let's be fair and present the benefits of v2v-jobs as well: 1. it is the simplest "infrastructure" in terms of LOC
It is also deprecated. V2V has promised to adopt the richer Host Jobs API in the future.
2. it is the most efficient mechanism in terms of interactions between the engine and VDSM (it doesn't require new verbs/call, the data is attached to VdsStats; probably the easiest mechanism to convert to events)
Engine is already polling the host jobs API so I am not sure I agree with you here.
3. it is the most efficient implementation in terms of interaction with the database (no date is persisted into the database, no polling is done)
Again, we're already using the Host Jobs API. We'll gain efficiency by migrating away from the old v2v API and having a single, unified approach (Host Jobs).
Currently we have 3 mechanisms to report jobs: 1. VM jobs - that is currently used for live-merge. This requires the VM entity to exist in VDSM, thus not suitable for virt-sysprep.
Correct, not appropriate for this application.
2. storage jobs - complicated infrastructure, targeted for recovering from failures to maintain storage consistency. Many of the things this infrastructure knows to handle is irrelevant for virt-sysprep flow, and the fact that virt-sysprep is invoked on VM rather than particular disk makes it less suitable.
These are more appropriately called HostJobs and the have the following semantics: - They represent an external process running on a single host - They are not persisted. If the host or vdsm restarts, the job is aborted - They operate on entities. Currently storage is the first adopter of the infrastructure but virt was going to adopt these for the next-gen API. Entities can be volumes, storage domains, vms, network interfaces, etc. - Job status and progress is reported by the Host Jobs API. If a job is not present, then the underlying entitie(s) must be polled by engine to determine the actual state.
3. V2V jobs - no mechanism is provided to resume failed jobs, no leases, etc
This is the old infra upon which Host Jobs are built. v2v has promised to move to Host Jobs in the future so we should not add new dependencies to this code.
I have some arguments for using V2V-like jobs [1]: 1. creating template from vm is rarely done - if host goes unresponsive or any other failure is detected we can just remove the template and report the error
We can choose this error handling with Host Jobs as well.
2. the phase of virt-sysprep is, unlike typical storage operation, short - reducing the risk of failures during the process
Reduced risk of failures is never an excuse to have lax error handling. The storage flavored host jobs provide tons of utilities for making error handling standardized, easy to implement, and correct.
3. during the operation the VM is down - by locking the VM/template and its disks on the engine side, we render leases-like mechanism redundant
Eventually we want to protect all operations on storage with sanlock leases. This is safer and allows for a more distributed approach to management. Again, the use of leases correctly in host jobs requires about 5 lines of code. The benefits of standardization far outweigh any perceived simplification resulting from omitting it.
4. in the worst case - the disk will not be corrupted (only some of the data might be removed).
Again, the way engine chooses to handle job failures is independent of the mechanism. Let's separate that from this discussion.
So I think that the mechanism for storage jobs is an over-kill for this case. We can keep it simple by generalise the V2V-job for other virt-tools jobs, like virt-sysprep.
I think we ought to standardize on the Host Jobs framework where we can collaborate on unit tests, standardized locking and error handling, abort logic, etc. When v2v moves to host jobs then we will have a unified method of handling ephemeral jobs that are tied to entities.
-- Adam Litke
-- Adam Litke

On Tue, Dec 6, 2016 at 11:12 PM, Adam Litke <alitke@redhat.com> wrote:
On 06/12/16 22:06 +0200, Arik Hadas wrote:
Adam,
:) You seem upset. Sorry if I touched on a nerve...
Just out of curiosity: when you write "v2v has promised" - what exactly do
you mean? the tool? Richard Jones (the maintainer of virt-v2v)? Shahar and I that implemented the integration with virt-v2v? I'm not aware of such a promise by any of these options :)
Some history...
Earlier this year Nir, Francesco (added), Shahar, and I began discussing the similarities between what storage needed to do with external commands and what was designed specifically for v2v. I am not sure if you were involved in the project at that time. The plan was to create common infrastructure that could be extended to fit the unique needs of the verticals. The v2v code was going to be moved over to the new infrastructure (see [1]) and the only thing that stopped the initial patch was lack of a VMWare testing environment for verification.
At that time storage refocused on developing verbs that used the new infrastructure and have been maintaining its suitability for general use. Conversion of v2v -> Host Jobs is obviously a lower priority item and much more difficult now due to the early missed opportunity.
Anyway, let's say that you were given such a promise by someone and thus
consider that mechanism to be deprecated - it doesn't really matter.
I may be biased but I think my opinion does matter.
The current implementation doesn't well fit to this flow (it requires
per-volume job, it creates leases that are not needed for template's disks, ...) and with the "next-gen API" with proper support for virt flows not even being discussed with us (and iiuc also not with the infra team) yet, I don't understand what do you suggest except for some strong, though irrelevant, statements.
If you are willing to engage in a good-faith technical discussion I am sure I can help you to understand. These operations to storage demand some form of locking protection. If volume leases aren't appropriate then perhaps we should use the VM Leases / xleases that Nir is finishing off for 4.1 now.
I suggest loud and clear to reuse (not to add dependencies, not to
enhance, ..) an existing mechanism for a very similar flow of virt-v2v that works well and simple.
I clearly remember discussions involving infra (hello Oved), virt (hola Michal), and storage where we decided that new APIs performing async operations involving external commands should use the HostJobs infrastructure instead of adding more information to Host Stats. These were the "famous" entity polling meetings.
Of course plans can change but I have never been looped into any such discussions.
Well, I think that when someone builds a good infrastructure he first needs to talk to all the consumers and make sure it fits. In this case it seems like most of the work was done to fit the storage use case, and only now are you checking whether it can fit others as well... IMO it makes much more sense to use events where possible (and you've promised to use those as well, but I don't see you doing that...). v2v should use events for sure, and they have promised to do that in the past, instead of using the v2v jobs. The reason events weren't used originally with the v2v feature was that it was too risky and the events infrastructure was added too late in the game.
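For what it's worth, a minimal sketch of pushing a job-status change as an event instead of having Engine poll for it; the event name format and the notify() callable are invented for the example, not an existing VDSM interface:

    def report_job_event(notify, job_id, status, progress=100):
        # notify() stands in for whatever event/notification hook is
        # available; the event name below is made up for illustration.
        notify("|virt|job_status|" + job_id,
               {"job_id": job_id, "status": status, "progress": progress})

    # Usage (hypothetical): report_job_event(vdsm_notify, job.id, "done")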
Do you "promise" to implement your "next gen API" for 4.1 as an
alternative?
I guess we need the design first.
On Tue, Dec 6, 2016 at 5:04 PM, Adam Litke <alitke@redhat.com> wrote:
On 05/12/16 11:17 +0200, Arik Hadas wrote:
On Mon, Dec 5, 2016 at 10:05 AM, Nir Soffer <nsoffer@redhat.com> wrote:
On Sun, Dec 4, 2016 at 8:50 PM, Shmuel Melamud < smelamud@redhat.com> wrote: > > Hi! > > I'm currently working on integration of virt-sysprep into oVirt. > > Usually, if user creates a template from a regular VM, and then creates new VMs from this template, these new VMs inherit all configuration of the original VM, including SSH keys, UDEV rules, MAC addresses, system ID, hostname etc. It is unfortunate, because you cannot have two network devices with the same MAC address in the same network, for example. > > To avoid this, user must clean all machine-specific configuration from the original VM before creating a template from it. You can do this manually, but there is virt-sysprep utility that does this automatically. > > Ideally, virt-sysprep should be seamlessly integrated into template creation process. But the first step is to create a simple button: user selects a VM, clicks the button and oVirt executes virt-sysprep on the VM. > > virt-sysprep works directly on VM's filesystem. It accepts list of all disks of the VM as parameters: > > virt-sysprep -a disk1.img -a disk2.img -a disk3.img > > The architecture is as follows: command on the Engine side runs a job on VDSM side and tracks its success/failure. The job on VDSM side runs virt-sysprep. > > The question is how to implement the job correctly? > > I thought about using storage jobs, but they are designed to work only with a single volume, correct?
New storage verbs are volume based. This make it easy to manage them on the engine side, and will allow parallelizing volume operations on single or multiple hosts.
A storage volume job is using sanlock lease on the modified volume and volume generation number. If a host running pending jobs becomes non-responsive and cannot be fenced, we can detect the state of the job, fence the job, and start the job on another host.
In the SPM task, if a host becomes non-responsive and cannot be fenced, the whole setup is stuck, there is no way to perform any storage operation. > Is is possible to use them with operation that is performed on multiple volumes? > Or, alternatively, is it possible to use some kind of 'VM jobs' - that work on VM at whole?
We can do:
1. Add jobs with multiple volumes leases - can make error handling very complex. How do tell a job state if you have multiple leases? which volume generation you use?
2. Use volume job using one of the volumes (the boot volume?). This does not protect the other volumes from modification but engine is responsible for this.
3. Use new "vm jobs", using a vm lease (should be available this week on master). This protects a vm during sysprep from starting the vm. We still need a generation to detect the job state, I think we can use the sanlock lease generation for this.
I like the last option since sysprep is much like running a vm. > How v2v solves this problem?
It does not.
v2v predates storage volume jobs. It does not use volume leases and generation and does have any way to recover if a host running v2v becomes non-responsive and cannot be fenced.
It also does not use the jobs framework and does not use a thread pool for v2v jobs, so it has no limit on the number of storage operations on a host.
Right, but let's be fair and present the benefits of v2v-jobs as well: 1. it is the simplest "infrastructure" in terms of LOC
It is also deprecated. V2V has promised to adopt the richer Host Jobs API in the future.
2. it is the most efficient mechanism in terms of interactions between the engine and VDSM (it doesn't require new verbs/call, the data is attached to VdsStats; probably the easiest mechanism to convert to events)
Engine is already polling the host jobs API so I am not sure I agree with you here.
3. it is the most efficient implementation in terms of interaction with the database (no date is persisted into the database, no polling is done)
Again, we're already using the Host Jobs API. We'll gain efficiency by migrating away from the old v2v API and having a single, unified approach (Host Jobs).
Currently we have 3 mechanisms to report jobs: 1. VM jobs - that is currently used for live-merge. This requires the VM entity to exist in VDSM, thus not suitable for virt-sysprep.
Correct, not appropriate for this application.
2. storage jobs - complicated infrastructure, targeted for recovering from failures to maintain storage consistency. Many of the things this infrastructure knows to handle is irrelevant for virt-sysprep flow, and the fact that virt-sysprep is invoked on VM rather than particular disk makes it less suitable.
These are more appropriately called HostJobs and the have the following semantics: - They represent an external process running on a single host - They are not persisted. If the host or vdsm restarts, the job is aborted - They operate on entities. Currently storage is the first adopter of the infrastructure but virt was going to adopt these for the next-gen API. Entities can be volumes, storage domains, vms, network interfaces, etc. - Job status and progress is reported by the Host Jobs API. If a job is not present, then the underlying entitie(s) must be polled by engine to determine the actual state.
3. V2V jobs - no mechanism is provided to resume failed jobs, no leases, etc
This is the old infra upon which Host Jobs are built. v2v has promised to move to Host Jobs in the future so we should not add new dependencies to this code.
I have some arguments for using V2V-like jobs [1]:
1. creating a template from a vm is rarely done - if the host goes unresponsive or any other failure is detected, we can just remove the template and report the error
We can choose this error handling with Host Jobs as well.
2. the phase of virt-sysprep is, unlike a typical storage operation, short - reducing the risk of failures during the process
Reduced risk of failures is never an excuse to have lax error handling. The storage flavored host jobs provide tons of utilities for making error handling standardized, easy to implement, and correct.
3. during the operation the VM is down - by locking the VM/template and its disks on the engine side, we render a leases-like mechanism redundant
Eventually we want to protect all operations on storage with sanlock leases. This is safer and allows for a more distributed approach to management. Again, the use of leases correctly in host jobs requires about 5 lines of code. The benefits of standardization far outweigh any perceived simplification resulting from omitting it.
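Something like the following is roughly what that boils down to in practice; the lease object and its acquire/release calls are assumed for illustration, not the exact storage helper:

    # Hypothetical sketch of the few lines needed to protect a job with a lease.
    from contextlib import contextmanager


    @contextmanager
    def held(lease):
        lease.acquire()   # if storage becomes inaccessible, sanlock kills the owner
        try:
            yield
        finally:
            lease.release()

    # inside the job:
    #     with held(self.vm_lease):
    #         self._run()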
4. in the worst case - the disk will not be corrupted (only some of the data might be removed).
Again, the way engine chooses to handle job failures is independent of the mechanism. Let's separate that from this discussion.
So I think that the mechanism for storage jobs is an overkill for this case. We can keep it simple by generalising the V2V job for other virt-tools jobs, like virt-sysprep.
I think we ought to standardize on the Host Jobs framework where we can collaborate on unit tests, standardized locking and error handling, abort logic, etc. When v2v moves to host jobs then we will have a unified method of handling ephemeral jobs that are tied to entities.
-- Adam Litke
-- Adam Litke

On Wed, Dec 7, 2016 at 10:17 AM, Oved Ourfali <oourfali@redhat.com> wrote:
On Tue, Dec 6, 2016 at 11:12 PM, Adam Litke <alitke@redhat.com> wrote:
On 06/12/16 22:06 +0200, Arik Hadas wrote:
Adam,
:) You seem upset. Sorry if I touched on a nerve...
Just out of curiosity: when you write "v2v has promised" - what exactly do you mean? the tool? Richard Jones (the maintainer of virt-v2v)? Shahar and I that implemented the integration with virt-v2v? I'm not aware of such a promise by any of these options :)
Some history...
Earlier this year Nir, Francesco (added), Shahar, and I began discussing the similarities between what storage needed to do with external commands and what was designed specifically for v2v. I am not sure if you were involved in the project at that time. The plan was to create common infrastructure that could be extended to fit the unique needs of the verticals. The v2v code was going to be moved over to the new infrastructure (see [1]) and the only thing that stopped the initial patch was lack of a VMWare testing environment for verification.
At that time storage refocused on developing verbs that used the new infrastructure and have been maintaining its suitability for general use. Conversion of v2v -> Host Jobs is obviously a lower priority item and much more difficult now due to the early missed opportunity.
Anyway, let's say that you were given such a promise by someone and thus consider that mechanism to be deprecated - it doesn't really matter.
I may be biased but I think my opinion does matter.
The current implementation doesn't fit this flow well (it requires a per-volume job, it creates leases that are not needed for the template's disks, ...) and with the "next-gen API" with proper support for virt flows not even being discussed with us (and iiuc also not with the infra team) yet, I don't understand what you suggest except for some strong, though irrelevant, statements.
If you are willing to engage in a good-faith technical discussion I am sure I can help you to understand. These operations to storage demand some form of locking protection. If volume leases aren't appropriate then perhaps we should use the VM Leases / xleases that Nir is finishing off for 4.1 now.
I suggest loud and clear to reuse (not to add dependencies, not to enhance, ..) an existing mechanism for a very similar flow of virt-v2v that works well and is simple.
I clearly remember discussions involving infra (hello Oved), virt (hola Michal), and storage where we decided that new APIs performing async operations involving external commands should use the HostJobs infrastructure instead of adding more information to Host Stats. These were the "famous" entity polling meetings.
We discussed these issues behind closed doors, not in the public mailing list, so it is not surprising that people do not know about the agreements we had.
Of course plans can change but I have never been looped into any such discussions.
Well, I think that when someone builds a good infrastructure he first needs to talk to all consumers and make sure it fits. In this case it seems like most work was done to fit the storage use-case, and now you check whether it can fit others as well....
The jobs framework is generic and can be used for any subsystem, there is nothing related to storage about it. But modifying disks *is* a storage operation, even if someone from the virt team worked on it. V2v is also a storage operation - if we compare it with copying disks:
- we create a new volume that nobody is using yet
- if the operation fails, the disk must be in illegal state
- if the operation fails we delete the disks
- if the operation succeeds the volume must be legal
- we need to limit the number of operations on a host
- we need to detect the job state if the host becomes non-responsive
- we may want to fence the job if the host becomes non-responsive - in volume jobs, we can increment the volume generation and run the same job on another host
- we want to take a lease on storage to ensure that other hosts cannot access the same entity, or that the job will fail if someone else is using this entity
- we want to take a lease on storage, ensuring that a job cannot get stuck for a long time - sanlock kills the owner of a lease when storage becomes inaccessible
- we want to report progress
sysprep is less risky because the operation is faster, but on storage even a fast operation can get stuck for minutes. We need to agree on a standard way to do such operations that is safe enough and can be managed on the engine side.
IMO it makes much more sense to use events where possible (and you've promised to use those as well, but I don't see you doing that...). v2v should use events for sure, and they have promised to do that in the past, instead of using the v2v jobs. The reason events weren't used originally with the v2v feature was that it was too risky and the events infrastructure was added too late in the game.
Events are not replacing the need for managing jobs on the vdsm side. Engine must have a way to query the current jobs before subscribing to events from these jobs, otherwise you will lose events and engine will never notice a completed job after network errors.
The jobs framework supports events, see https://gerrit.ovirt.org/67118
We are waiting for review from the infra team, maybe you can get someone to review this?
Nir
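A rough sketch of that ordering on the engine side, with an assumed jsonrpc-style client; the verb name, subscription key and callback are illustrative, not the exact API:

    # Illustrative only: verb name, subscription key and client API are assumptions.
    def sync_host_jobs(client, on_job_update):
        # After (re)connecting, first query the current jobs so that a job
        # that completed while we were disconnected (e.g. after a network
        # error) is not missed...
        for job_id, job in client.call("Host.getJobs").items():
            on_job_update(job_id, job)
        # ...then rely on events for further updates.
        client.subscribe("jobs", on_job_update)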
Do you "promise" to implement your "next gen API" for 4.1 as an alternative?
I guess we need the design first.

On Dec 7, 2016 16:00, "Nir Soffer" <nsoffer@redhat.com> wrote:
On Wed, Dec 7, 2016 at 10:17 AM, Oved Ourfali <oourfali@redhat.com> wrote:
On Tue, Dec 6, 2016 at 11:12 PM, Adam Litke <alitke@redhat.com> wrote:
On 06/12/16 22:06 +0200, Arik Hadas wrote:
Adam,
:) You seem upset. Sorry if I touched on a nerve...
Just out of curiosity: when you write "v2v has promised" - what
do you mean? the tool? Richard Jones (the maintainer of virt-v2v)? Shahar and I that implemented the integration with virt-v2v? I'm not aware of such a promise by any of these options :)
Some history...
Earlier this year Nir, Francesco (added), Shahar, and I began discussing the similarities between what storage needed to do with external commands and what was designed specifically for v2v. I am not sure if you were involved in the project at that time. The plan was to create common infrastructure that could be extended to fit the unique needs of the verticals. The v2v code was going to be moved over to the new infrastructure (see [1]) and the only thing that stopped the initial patch was lack of a VMWare testing environment for verification.
At that time storage refocused on developing verbs that used the new infrastructure and have been maintaining its suitability for general use. Conversion of v2v -> Host Jobs is obviously a lower priority item and much more difficult now due to the early missed opportunity.
Anyway, let's say that you were given such a promise by someone and
consider that mechanism to be deprecated - it doesn't really matter.
I may be biased but I think my opinion does matter.
The current implementation doesn't well fit to this flow (it requires per-volume job, it creates leases that are not needed for template's disks, ...) and with the "next-gen API" with proper support for virt flows not even being discussed with us (and iiuc also not with the infra team) yet, I don't understand what do you suggest except for some strong, though irrelevant, statements.
If you are willing to engage in a good-faith technical discussion I am sure I can help you to understand. These operations to storage demand some form of locking protection. If volume leases aren't appropriate
perhaps we should use the VM Leases / xleases that Nir is finishing off for 4.1 now.
I suggest loud and clear to reuse (not to add dependencies, not to enhance, ..) an existing mechanism for a very similar flow of virt-v2v that works well and simple.
I clearly remember discussions involving infra (hello Oved), virt (hola Michal), and storage where we decided that new APIs performing async operations involving external commands should use the HostJobs infrastructure instead of adding more information to Host Stats. These were the "famous" entity polling meetings.
We discussed these issues behind closed doors, not in the public mailing list, so it is not surprising that people do not know about the agreements we had.
The core team was there. So it is surprising.
Events are not replacing the need for managing jobs on the vdsm side. Engine must have a way to query the current jobs before subscribing to events from these jobs, otherwise you will lose events and engine will never notice a completed job after network errors. The jobs framework supports events, see https://gerrit.ovirt.org/67118 We are waiting for review from the infra team, maybe you can get someone to review this?
It would have been great to review the design for this before it reaches gerrit. Anyway, I get a permissions error when opening it. Any clue why?

On Wed, Dec 7, 2016 at 8:10 PM, Oved Ourfali <oourfali@redhat.com> wrote:
It would have been great to review the design for this before it reaches gerrit. Anyway, I get a permissions error when opening it. Any clue why?
It is a recent bug in gerrit, or a configuration issue - drafts are sometimes private. I added you as a reviewer, can you see it now?
Nir

On Dec 7, 2016 20:16, "Nir Soffer" <nsoffer@redhat.com> wrote:
It is a recent bug in gerrit, or a configuration issue - drafts are sometimes private. I added you as a reviewer, can you see it now?
Nir
Do you "promise" to implement your "next gen API" for 4.1 as an alternative?
I guess we need the design first.
On Tue, Dec 6, 2016 at 5:04 PM, Adam Litke <alitke@redhat.com>
wrote:
On 05/12/16 11:17 +0200, Arik Hadas wrote:
On Mon, Dec 5, 2016 at 10:05 AM, Nir Soffer <nsoffer@redhat.com> wrote:
On Sun, Dec 4, 2016 at 8:50 PM, Shmuel Melamud <smelamud@redhat.com> wrote: > > Hi! > > I'm currently working on integration of virt-sysprep
into
oVirt. > > Usually, if user creates a template from a regular VM, and then creates new VMs from this template, these new VMs inherit all configuration of the original VM, including SSH keys, UDEV rules, MAC addresses, system ID, hostname etc. It is unfortunate, because you cannot have two network devices with the same MAC address in the same network, for example. > > To avoid this, user must clean all machine-specific configuration from the original VM before creating a template from it. You can do this manually, but there is virt-sysprep utility that does
automatically. > > Ideally, virt-sysprep should be seamlessly integrated
into
template creation process. But the first step is to create a
simple
button: user selects a VM, clicks the button and oVirt executes virt-sysprep on the VM. > > virt-sysprep works directly on VM's filesystem. It accepts list of all disks of the VM as parameters: > > virt-sysprep -a disk1.img -a disk2.img -a disk3.img > > The architecture is as follows: command on the Engine side runs a job on VDSM side and tracks its success/failure. The job on VDSM side runs virt-sysprep. > > The question is how to implement the job correctly? > > I thought about using storage jobs, but they are designed to work only with a single volume, correct?
New storage verbs are volume based. This make it easy to manage them on the engine side, and will allow parallelizing volume operations on single or multiple hosts.
A storage volume job is using sanlock lease on the modified volume and volume generation number. If a host running pending jobs becomes non-responsive and cannot be fenced, we can detect the state of the job, fence the job, and start the job on another host.
In the SPM task, if a host becomes non-responsive and cannot be fenced, the whole setup is stuck, there is no way to
any storage operation. > Is is possible to use them with operation that is performed on multiple volumes? > Or, alternatively, is it possible to use some kind of 'VM jobs' - that work on VM at whole?
We can do:
1. Add jobs with multiple volumes leases - can make error handling very complex. How do tell a job state if you have multiple leases? which volume generation you use?
2. Use volume job using one of the volumes (the boot volume?). This does not protect the other volumes from modification but engine is responsible for this.
3. Use new "vm jobs", using a vm lease (should be available this week on master). This protects a vm during sysprep from starting the vm. We still need a generation to detect the job state, I think we can use the sanlock lease generation for this.
I like the last option since sysprep is much like running a vm. > How v2v solves this problem?
It does not.
v2v predates storage volume jobs. It does not use volume leases and generation and does have any way to recover if a host running v2v becomes non-responsive and cannot be fenced.
It also does not use the jobs framework and does not use a thread pool for v2v jobs, so it has no limit on the number of storage operations on a host.
Right, but let's be fair and present the benefits of v2v-jobs as well: 1. it is the simplest "infrastructure" in terms of LOC
It is also deprecated. V2V has promised to adopt the richer Host Jobs API in the future.
2. it is the most efficient mechanism in terms of interactions between the engine and VDSM (it doesn't require new verbs/call, the data is attached to VdsStats; probably the easiest mechanism to convert to events)
Engine is already polling the host jobs API so I am not sure I agree with you here.
3. it is the most efficient implementation in terms of interaction with the database (no date is persisted into the database, no
done)
Again, we're already using the Host Jobs API. We'll gain efficiency by migrating away from the old v2v API and having a single, unified approach (Host Jobs).
Currently we have 3 mechanisms to report jobs: 1. VM jobs - that is currently used for live-merge. This requires the VM entity to exist in VDSM, thus not suitable for virt-sysprep.
Correct, not appropriate for this application.
2. storage jobs - complicated infrastructure, targeted for recovering from failures to maintain storage consistency. Many of the things this infrastructure knows to handle is irrelevant for virt-sysprep flow, and the fact that virt-sysprep is invoked on VM rather than
disk makes it less suitable.
These are more appropriately called HostJobs and the have the following semantics: - They represent an external process running on a single host - They are not persisted. If the host or vdsm restarts, the job is aborted - They operate on entities. Currently storage is the first adopter of the infrastructure but virt was going to adopt these for
Yes, I see Piotr is already on it. I'll also be happy to hear how you are going to use events in your current design. Also, is there a design page for this work?
Thanks,
Oved

On 07 Dec 2016, at 09:17, Oved Ourfali <oourfali@redhat.com> wrote:
Well, I think that when someone builds a good infrastructure he first needs to talk to all consumers and make sure it fits. In this case it seems like most work was done to fit the storage use-case, and now you check whether it can fit others as well.... IMO it makes much more sense to use events where possible (and you've promised to use those as well, but I don't see you doing that...). v2v should use events for sure, and they have promised to do that in the past, instead of using the v2v jobs. The reason events weren't used originally with the v2v feature, was that it was too risky and the events infrastructure was added too late in the game. Revisiting and refactoring code which is already in use is always a bit of luxury we can rarely prioritize. So indeed v2v is not using events. The generalization work has been done to some extent, but there is no incentive to rewrite it completely. On the other hand we are now trying to add events to migration progress reporting and hand over since that area is being touched due to post-copy enhancements. So, when there is a practical chance to improve functionality by utilizing events it indeed should be the first choice
Do you "promise" to implement your "next gen API" for 4.1 as an
alternative?
I guess we need the design first.
On Tue, Dec 6, 2016 at 5:04 PM, Adam Litke <alitke@redhat.com> wrote:
On 05/12/16 11:17 +0200, Arik Hadas wrote:
On Mon, Dec 5, 2016 at 10:05 AM, Nir Soffer <nsoffer@redhat.com> wrote:
On Sun, Dec 4, 2016 at 8:50 PM, Shmuel Melamud < smelamud@redhat.com> wrote: > > Hi! > > I'm currently working on integration of virt-sysprep into oVirt. > > Usually, if user creates a template from a regular VM, and then creates new VMs from this template, these new VMs inherit all configuration of the original VM, including SSH keys, UDEV rules, MAC addresses, system ID, hostname etc. It is unfortunate, because you cannot have two network devices with the same MAC address in the same network, for example. > > To avoid this, user must clean all machine-specific configuration from the original VM before creating a template from it. You can do this manually, but there is virt-sysprep utility that does this automatically. > > Ideally, virt-sysprep should be seamlessly integrated into template creation process. But the first step is to create a simple button: user selects a VM, clicks the button and oVirt executes virt-sysprep on the VM. > > virt-sysprep works directly on VM's filesystem. It accepts list of all disks of the VM as parameters: > > virt-sysprep -a disk1.img -a disk2.img -a disk3.img > > The architecture is as follows: command on the Engine side runs a job on VDSM side and tracks its success/failure. The job on VDSM side runs virt-sysprep. > > The question is how to implement the job correctly? > > I thought about using storage jobs, but they are designed to work only with a single volume, correct?
New storage verbs are volume based. This make it easy to manage them on the engine side, and will allow parallelizing volume operations on single or multiple hosts.
A storage volume job is using sanlock lease on the modified volume and volume generation number. If a host running pending jobs becomes non-responsive and cannot be fenced, we can detect the state of the job, fence the job, and start the job on another host.
In the SPM task, if a host becomes non-responsive and cannot be fenced, the whole setup is stuck, there is no way to perform any storage operation. > Is is possible to use them with operation that is performed on multiple volumes? > Or, alternatively, is it possible to use some kind of 'VM jobs' - that work on VM at whole?
We can do:
1. Add jobs with multiple volumes leases - can make error handling very complex. How do tell a job state if you have multiple leases? which volume generation you use?
2. Use volume job using one of the volumes (the boot volume?). This does not protect the other volumes from modification but engine is responsible for this.
3. Use new "vm jobs", using a vm lease (should be available this week on master). This protects a vm during sysprep from starting the vm. We still need a generation to detect the job state, I think we can use the sanlock lease generation for this.
I like the last option since sysprep is much like running a vm. > How v2v solves this problem?
It does not.
v2v predates storage volume jobs. It does not use volume leases and generation and does have any way to recover if a host running v2v becomes non-responsive and cannot be fenced.
It also does not use the jobs framework and does not use a thread pool for v2v jobs, so it has no limit on the number of storage operations on a host.
Right, but let's be fair and present the benefits of v2v-jobs as well: 1. it is the simplest "infrastructure" in terms of LOC
It is also deprecated. V2V has promised to adopt the richer Host Jobs API in the future.
2. it is the most efficient mechanism in terms of interactions between the engine and VDSM (it doesn't require new verbs/call, the data is attached to VdsStats; probably the easiest mechanism to convert to events)
Engine is already polling the host jobs API so I am not sure I agree with you here.
3. it is the most efficient implementation in terms of interaction with the database (no date is persisted into the database, no polling is done)
Again, we're already using the Host Jobs API. We'll gain efficiency by migrating away from the old v2v API and having a single, unified approach (Host Jobs).
Currently we have 3 mechanisms to report jobs: 1. VM jobs - that is currently used for live-merge. This requires the VM entity to exist in VDSM, thus not suitable for virt-sysprep.
Correct, not appropriate for this application.
2. storage jobs - complicated infrastructure, targeted for recovering from failures to maintain storage consistency. Many of the things this infrastructure knows to handle is irrelevant for virt-sysprep flow, and the fact that virt-sysprep is invoked on VM rather than particular disk makes it less suitable.
These are more appropriately called HostJobs and the have the following semantics: - They represent an external process running on a single host - They are not persisted. If the host or vdsm restarts, the job is aborted - They operate on entities. Currently storage is the first adopter of the infrastructure but virt was going to adopt these for the next-gen API. Entities can be volumes, storage domains, vms, network interfaces, etc. - Job status and progress is reported by the Host Jobs API. If a job is not present, then the underlying entitie(s) must be polled by engine to determine the actual state.
3. V2V jobs - no mechanism is provided to resume failed jobs, no leases, etc
This is the old infra upon which Host Jobs are built. v2v has promised to move to Host Jobs in the future so we should not add new dependencies to this code.
I have some arguments for using V2V-like jobs [1]: 1. creating template from vm is rarely done - if host goes unresponsive or any other failure is detected we can just remove the template and report the error
We can chose this error handling with Host Jobs as well.
2. the phase of virt-sysprep is, unlike typical storage operation, short - reducing the risk of failures during the process
Reduced risk of failures is never an excuse to have lax error handling. The storage flavored host jobs provide tons of utilities for making error handling standardized, easy to implement, and correct.
3. during the operation the VM is down - by locking the VM/template and its disks on the engine side, we render leases-like mechanism redundant
Eventually we want to protect all operations on storage with sanlock leases. This is safer and allows for a more distributed approach to management. Again, the use of leases correctly in host jobs requires about 5 lines of code. The benefits of standardization far outweigh any perceived simplification resulting from omitting it.
4. in the worst case - the disk will not be corrupted (only some of the data might be removed).
Again, the way engine chooses to handle job failures is independent of the mechanism. Let's separate that from this discussion.
So I think that the mechanism for storage jobs is an over-kill for this case. We can keep it simple by generalise the V2V-job for other virt-tools jobs, like virt-sysprep.
I think we ought to standardize on the Host Jobs framework where we can collaborate on unit tests, standardized locking and error handling, abort logic, etc. When v2v moves to host jobs then we will have a unified method of handling ephemeral jobs that are tied to entities.
-- Adam Litke
-- Adam Litke

On Wed, Dec 7, 2016 at 8:24 PM, Michal Skrivanek <mskrivan@redhat.com> wrote:
On 07 Dec 2016, at 09:17, Oved Ourfali <oourfali@redhat.com> wrote:
On Tue, Dec 6, 2016 at 11:12 PM, Adam Litke <alitke@redhat.com> wrote:
On 06/12/16 22:06 +0200, Arik Hadas wrote:
Adam,
:) You seem upset. Sorry if I touched on a nerve...
Just out of curiosity: when you write "v2v has promised" - what exactly do you mean? The tool? Richard Jones (the maintainer of virt-v2v)? Shahar and me, who implemented the integration with virt-v2v? I'm not aware of such a promise from any of them :)
Some history...
Earlier this year Nir, Francesco (added), Shahar, and I began discussing the similarities between what storage needed to do with external commands and what was designed specifically for v2v. I am not sure if you were involved in the project at that time. The plan was to create common infrastructure that could be extended to fit the unique needs of the verticals. The v2v code was going to be moved over to the new infrastructure (see [1]) and the only thing that stopped the initial patch was lack of a VMWare testing environment for verification.
At that time storage refocused on developing verbs that used the new infrastructure and have been maintaining its suitability for general use. Conversion of v2v -> Host Jobs is obviously a lower priority item and much more difficult now due to the early missed opportunity.
Anyway, let's say that you were given such a promise by someone and thus consider that mechanism to be deprecated - it doesn't really matter.
I may be biased but I think my opinion does matter.
The current implementation doesn't fit this flow well (it requires a per-volume job, it creates leases that are not needed for a template's disks, ...) and with the "next-gen API" with proper support for virt flows not even being discussed with us (and iiuc also not with the infra team) yet, I don't understand what you suggest except for some strong, though irrelevant, statements.
If you are willing to engage in a good-faith technical discussion I am sure I can help you to understand. These operations to storage demand some form of locking protection. If volume leases aren't appropriate then perhaps we should use the VM Leases / xleases that Nir is finishing off for 4.1 now.
I suggest, loud and clear, to reuse (not to add dependencies, not to enhance, ...) an existing mechanism for a very similar flow of virt-v2v - one that works well and is simple.
I clearly remember discussions involving infra (hello Oved), virt (hola Michal), and storage where we decided that new APIs performing async operations involving external commands should use the HostJobs infrastructure instead of adding more information to Host Stats. These were the "famous" entity polling meetings.
Of course plans can change but I have never been looped into any such discussions.
Well, I think that when someone builds a good infrastructure he first needs to talk to all consumers and make sure it fits. In this case it seems like most work was done to fit the storage use-case, and now you check whether it can fit others as well....
IMO it makes much more sense to use events where possible (and you've promised to use those as well, but I don't see you doing that...). v2v should use events for sure, and they have promised to do that in the past, instead of using the v2v jobs. The reason events weren't used originally with the v2v feature was that it was too risky and the events infrastructure was added too late in the game.
Revisiting and refactoring code which is already in use is always a bit of luxury we can rarely prioritize. So indeed v2v is not using events. The generalization work has been done to some extent, but there is no incentive to rewrite it completely.
On the vdsm side we don't need to rewrite the interesting parts, just run the code using the current infrastructure. The first step is to use the jobs framework and delete the stale copy of the jobs framework in v2v.py. Then we can easily schedule the jobs in the task manager instead of starting a thread for each job - one line to schedule a job in hsm - see the sparsify patch. I think Shahar started to work on it, we need to rebase his patch and finish it.
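(Again a sketch of the idea only - the jobs registry, the scheduler attribute and the verb signature below are placeholders, not the code from the sparsify patch.)

    def sysprep(self, job_id, vm_id, disk_paths, lease):
        """Hypothetical HSM verb: start virt-sysprep as a host job."""
        job = SysprepJob(job_id, vm_id, disk_paths, lease)   # the job sketched earlier in the thread
        jobs.add(job)                       # register it so the Host Jobs API can report it
        self.scheduler.schedule(job.run)    # queue it on the shared pool - no per-job thread
        return {'status': 'pending', 'job_id': job_id}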
On the other hand, we are now trying to add events to migration progress reporting and hand-over, since that area is being touched due to post-copy enhancements. So, when there is a practical chance to improve functionality by utilizing events, it indeed should be the first choice.
Do you "promise" to implement your "next gen API" for 4.1 as an alternative?
I guess we need the design first.
On Tue, Dec 6, 2016 at 5:04 PM, Adam Litke <alitke@redhat.com> wrote:
On 05/12/16 11:17 +0200, Arik Hadas wrote:
On Mon, Dec 5, 2016 at 10:05 AM, Nir Soffer <nsoffer@redhat.com> wrote:
On Sun, Dec 4, 2016 at 8:50 PM, Shmuel Melamud <smelamud@redhat.com> wrote: > > Hi! > > I'm currently working on integration of virt-sysprep into oVirt. > > Usually, if user creates a template from a regular VM, and then creates new VMs from this template, these new VMs inherit all configuration of the original VM, including SSH keys, UDEV rules, MAC addresses, system ID, hostname etc. It is unfortunate, because you cannot have two network devices with the same MAC address in the same network, for example. > > To avoid this, user must clean all machine-specific configuration from the original VM before creating a template from it. You can do this manually, but there is virt-sysprep utility that does this automatically. > > Ideally, virt-sysprep should be seamlessly integrated into template creation process. But the first step is to create a simple button: user selects a VM, clicks the button and oVirt executes virt-sysprep on the VM. > > virt-sysprep works directly on VM's filesystem. It accepts list of all disks of the VM as parameters: > > virt-sysprep -a disk1.img -a disk2.img -a disk3.img > > The architecture is as follows: command on the Engine side runs a job on VDSM side and tracks its success/failure. The job on VDSM side runs virt-sysprep. > > The question is how to implement the job correctly? > > I thought about using storage jobs, but they are designed to work only with a single volume, correct?
New storage verbs are volume based. This makes it easy to manage them on the engine side, and will allow parallelizing volume operations on single or multiple hosts.
A storage volume job uses a sanlock lease on the modified volume and a volume generation number. If a host running pending jobs becomes non-responsive and cannot be fenced, we can detect the state of the job, fence the job, and start the job on another host.
In the SPM task, if a host becomes non-responsive and cannot be fenced, the whole setup is stuck; there is no way to perform any storage operation. > Is it possible to use them with an operation that is performed on multiple volumes? > Or, alternatively, is it possible to use some kind of 'VM jobs' that work on the VM as a whole?
We can do:
1. Add jobs with multiple volume leases - this can make error handling very complex. How do you tell the job state if you have multiple leases? Which volume generation do you use?
2. Use a volume job on one of the volumes (the boot volume?). This does not protect the other volumes from modification, but engine is responsible for this.
3. Use the new "vm jobs", using a vm lease (should be available this week on master). This protects the vm from being started during sysprep. We still need a generation to detect the job state; I think we can use the sanlock lease generation for this.
I like the last option since sysprep is much like running a vm. > How v2v solves this problem?
It does not.
v2v predates storage volume jobs. It does not use volume leases and generations and does not have any way to recover if a host running v2v becomes non-responsive and cannot be fenced.
It also does not use the jobs framework and does not use a thread pool for v2v jobs, so it has no limit on the number of storage operations on a host.
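(A generic illustration of the thread pool point, not vdsm code: a bounded pool caps how many such operations can hit storage at once, whereas one unbounded thread per job does not.)

    from concurrent.futures import ThreadPoolExecutor

    executor = ThreadPoolExecutor(max_workers=4)   # at most 4 jobs run concurrently

    def submit_job(job):
        # Jobs beyond the limit wait in the queue instead of piling up as extra threads.
        return executor.submit(job.run)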
Right, but let's be fair and present the benefits of v2v-jobs as well: 1. it is the simplest "infrastructure" in terms of LOC
It is also deprecated. V2V has promised to adopt the richer Host Jobs API in the future.
2. it is the most efficient mechanism in terms of interactions between the engine and VDSM (it doesn't require new verbs/calls, the data is attached to VdsStats; probably the easiest mechanism to convert to events)
Engine is already polling the host jobs API so I am not sure I agree with you here.
3. it is the most efficient implementation in terms of interaction with the database (no data is persisted into the database, no polling is done)
Again, we're already using the Host Jobs API. We'll gain efficiency by migrating away from the old v2v API and having a single, unified approach (Host Jobs).
Currently we have 3 mechanisms to report jobs: 1. VM jobs - currently used for live merge. This requires the VM entity to exist in VDSM, and is thus not suitable for virt-sysprep.
Correct, not appropriate for this application.
2. storage jobs - complicated infrastructure, targeted at recovering from failures to maintain storage consistency. Many of the things this infrastructure knows how to handle are irrelevant to the virt-sysprep flow, and the fact that virt-sysprep is invoked on a VM rather than on a particular disk makes it less suitable.
These are more appropriately called HostJobs and they have the following semantics:
- They represent an external process running on a single host.
- They are not persisted. If the host or vdsm restarts, the job is aborted.
- They operate on entities. Currently storage is the first adopter of the infrastructure, but virt was going to adopt these for the next-gen API. Entities can be volumes, storage domains, VMs, network interfaces, etc.
- Job status and progress is reported by the Host Jobs API. If a job is not present, then the underlying entity (or entities) must be polled by engine to determine the actual state (a short engine-side sketch of this fallback follows below, after the third mechanism).
3. V2V jobs - no mechanism is provided to resume failed jobs, no leases, etc
This is the old infra upon which Host Jobs are built. v2v has promised to move to Host Jobs in the future so we should not add new dependencies to this code.
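(Engine-side pseudo-logic for the "job not present" fallback described above; the verb name and the fields are assumptions made for illustration, not the actual engine or vdsm API.)

    def check_job(vdsm, job_id, entity):
        """Report the state of a host job, falling back to the entity when the job is gone."""
        info = vdsm.get_jobs(job_ids=[job_id])      # assumed polling verb
        if job_id in info:
            return info[job_id]['status']           # e.g. 'running', 'done', 'failed'
        # The job is not persisted: after a vdsm restart or host reboot it simply
        # disappears, so the entity itself must be inspected to learn what happened.
        return entity.poll_actual_state()           # hypothetical entity-specific check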
I have some arguments for using V2V-like jobs [1]: 1. creating a template from a VM is rarely done - if the host goes unresponsive or any other failure is detected, we can just remove the template and report the error
We can choose this error handling with Host Jobs as well.
2. the phase of virt-sysprep is, unlike a typical storage operation, short - reducing the risk of failures during the process
Reduced risk of failures is never an excuse to have lax error handling. The storage flavored host jobs provide tons of utilities for making error handling standardized, easy to implement, and correct.
3. during the operation the VM is down - by locking the VM/template and its disks on the engine side, we render a lease-like mechanism redundant
Eventually we want to protect all operations on storage with sanlock leases. This is safer and allows for a more distributed approach to management. Again, the use of leases correctly in host jobs requires about 5 lines of code. The benefits of standardization far outweigh any perceived simplification resulting from omitting it.
4. in the worst case - the disk will not be corrupted (only some of the data might be removed).
Again, the way engine chooses to handle job failures is independent of the mechanism. Let's separate that from this discussion.
So I think that the mechanism for storage jobs is overkill for this case. We can keep it simple by generalising the V2V job for other virt-tools jobs, like virt-sysprep.
I think we ought to standardize on the Host Jobs framework where we can collaborate on unit tests, standardized locking and error handling, abort logic, etc. When v2v moves to host jobs then we will have a unified method of handling ephemeral jobs that are tied to entities.
-- Adam Litke
participants (9)
- Adam Litke
- Arik Hadas
- Michal Skrivanek
- Milan Zamazal
- Moran Goldboim
- Nir Soffer
- Oved Ourfali
- Shmuel Melamud
- Yaniv Kaul