<p dir="ltr">On Dec 7, 2016 20:16, "Nir Soffer" <<a href="mailto:nsoffer@redhat.com">nsoffer@redhat.com</a>> wrote:<br>
><br>
> On Wed, Dec 7, 2016 at 8:10 PM, Oved Ourfali <<a href="mailto:oourfali@redhat.com">oourfali@redhat.com</a>> wrote:<br>
> > On Dec 7, 2016 16:00, "Nir Soffer" <<a href="mailto:nsoffer@redhat.com">nsoffer@redhat.com</a>> wrote:<br>
> >><br>
> >> On Wed, Dec 7, 2016 at 10:17 AM, Oved Ourfali <<a href="mailto:oourfali@redhat.com">oourfali@redhat.com</a>> wrote:<br>
> >> ><br>
> >> ><br>
> >> > On Tue, Dec 6, 2016 at 11:12 PM, Adam Litke <<a href="mailto:alitke@redhat.com">alitke@redhat.com</a>> wrote:<br>
> >> >><br>
> >> >> On 06/12/16 22:06 +0200, Arik Hadas wrote:<br>
> >> >>><br>
> >> >>> Adam,<br>
> >> >><br>
> >> >><br>
> >> >> :) You seem upset. Sorry if I touched a nerve...<br>
> >> >><br>
> >> >>> Just out of curiosity: when you write "v2v has promised" - what<br>
> >> >>> exactly do you mean? The tool? Richard Jones (the maintainer of<br>
> >> >>> virt-v2v)? Shahar and I, who implemented the integration with<br>
> >> >>> virt-v2v? I'm not aware of such a promise by any of these options :)<br>
> >> >><br>
> >> >><br>
> >> >> Some history...<br>
> >> >><br>
> >> >> Earlier this year Nir, Francesco (added), Shahar, and I began<br>
> >> >> discussing the similarities between what storage needed to do with<br>
> >> >> external commands and what was designed specifically for v2v. I am<br>
> >> >> not sure if you were involved in the project at that time. The plan<br>
> >> >> was to create common infrastructure that could be extended to fit the<br>
> >> >> unique needs of the verticals. The v2v code was going to be moved<br>
> >> >> over to the new infrastructure (see [1]) and the only thing that<br>
> >> >> stopped the initial patch was lack of a VMware testing environment for<br>
> >> >> verification.<br>
> >> >><br>
> >> >> At that time storage refocused on developing verbs that used the new<br>
> >> >> infrastructure and has been maintaining its suitability for general<br>
> >> >> use. Conversion of v2v -> Host Jobs is obviously a lower-priority<br>
> >> >> item and much more difficult now due to the early missed opportunity.<br>
> >> >><br>
> >> >>> Anyway, let's say that you were given such a promise by someone and<br>
> >> >>> thus<br>
> >> >>> consider that mechanism to be deprecated - it doesn't really matter.<br>
> >> >><br>
> >> >><br>
> >> >> I may be biased but I think my opinion does matter.<br>
> >> >><br>
> >> >>> The current implementation doesn't fit this flow well (it requires a<br>
> >> >>> per-volume job, it creates leases that are not needed for a<br>
> >> >>> template's disks, ...) and with the "next-gen API" with proper<br>
> >> >>> support for virt flows not even being discussed with us (and iiuc<br>
> >> >>> also not with the infra team) yet, I don't understand what you<br>
> >> >>> suggest except for some strong, though irrelevant, statements.<br>
> >> >><br>
> >> >><br>
> >> >> If you are willing to engage in a good-faith technical discussion I am<br>
> >> >> sure I can help you to understand. These operations on storage demand<br>
> >> >> some form of locking protection. If volume leases aren't appropriate,<br>
> >> >> then perhaps we should use the VM Leases / xleases that Nir is<br>
> >> >> finishing off for 4.1 now.<br>
> >> >><br>
> >> >>> I suggest, loud and clear, to reuse (not to add dependencies to,<br>
> >> >>> not to enhance, ...) an existing mechanism for a very similar flow<br>
> >> >>> of virt-v2v that works well and is simple.<br>
> >> >><br>
> >> >><br>
> >> >> I clearly remember discussions involving infra (hello Oved), virt<br>
> >> >> (hola Michal), and storage where we decided that new APIs performing<br>
> >> >> async operations involving external commands should use the HostJobs<br>
> >> >> infrastructure instead of adding more information to Host Stats.<br>
> >> >> These were the "famous" entity polling meetings.<br>
> >><br>
> >> We discussed these issues behind closed doors, not on the public mailing<br>
> >> list, so it is not surprising that people do not know about the<br>
> >> agreements we had.<br>
> >><br>
> ><br>
> > The core team was there. So it is surprising.<br>
> ><br>
> >> >><br>
> >> >> Of course plans can change but I have never been looped into any such<br>
> >> >> discussions.<br>
> >> >><br>
> >> ><br>
> >> > Well, I think that when someone builds a good infrastructure he first<br>
> >> > needs to talk to all consumers and make sure it fits.<br>
> >> > In this case it seems like most work was done to fit the storage<br>
> >> > use-case, and now you check whether it can fit others as well....<br>
> >><br>
> >> The jobs framework is generic and can be used for any subsystem,<br>
> >> there is nothing related to storage about it. But modifying disks *is*<br>
> >> a storage operation, even if someone from the virt team worked on it.<br>
> >><br>
> >> V2v is also storage operation - if we compare it with copying disks:<br>
> >><br>
> >> - we create a new volume that nobody is using yet<br>
> >> - if the operation fails, the disk must be in an illegal state<br>
> >> - if the operation fails, we delete the disks<br>
> >> - if the operation succeeds, the volume must be legal<br>
> >> - we need to limit the number of operations on a host<br>
> >> - we need to detect the job state if the host becomes non-responsive<br>
> >> - we may want to fence the job if the host becomes non-responsive;<br>
> >>   in volume jobs, we can increment the volume generation and run<br>
> >>   the same job on another host<br>
> >> - we want to take a lease on storage to ensure that other hosts cannot<br>
> >>   access the same entity, or that the job will fail if someone else is<br>
> >>   using this entity<br>
> >> - we want to take a lease on storage, ensuring that a job cannot get<br>
> >>   stuck for a long time - sanlock kills the owner of a lease when<br>
> >>   storage becomes inaccessible<br>
> >> - we want to report progress<br>
> >><br>
> >> sysprep is less risky because the operation is faster, but on storage<br>
> >> even a fast operation can get stuck for minutes.<br>
> >><br>
> >> We need to agree on a standard way to do such operations that is safe<br>
> >> enough and can be managed on the engine side.<br>
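[Editor's illustration] The job semantics listed above can be condensed into a small sketch: the volume is illegal while being modified, concurrent jobs per host are capped, and fencing bumps the volume generation so the job can be retried on another host. All names here are hypothetical stand-ins, not the real vdsm API.

```python
# Sketch of the volume-job lifecycle described in the thread.
# Hypothetical names -- this is not vdsm code.
import threading

MAX_JOBS = 10                          # limit concurrent storage operations per host
_slots = threading.BoundedSemaphore(MAX_JOBS)

class Volume:
    def __init__(self):
        self.legal = True
        self.generation = 0

class Job:
    def __init__(self, volume):
        self.volume = volume
        self.status = "pending"

    def run(self, operation):
        with _slots:                   # cap the number of jobs on this host
            self.volume.legal = False  # disk is illegal while it is modified
            self.status = "running"
            try:
                operation(self.volume)
            except Exception:
                self.status = "failed" # engine can now delete the illegal disk
                raise
            else:
                self.volume.legal = True   # only a successful job leaves it legal
                self.status = "done"

    def fence(self):
        # Host became non-responsive: bump the generation so the same job
        # can be started safely on another host.
        self.volume.generation += 1
        self.status = "fenced"
```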
> >><br>
> >> > IMO it makes much more sense to use events where possible (and you've<br>
> >> > promised to use those as well, but I don't see you doing that...). v2v<br>
> >> > should use events for sure, and they have promised to do that in the<br>
> >> > past, instead of using the v2v jobs. The reason events weren't used<br>
> >> > originally with the v2v feature was that it was too risky and the<br>
> >> > events infrastructure was added too late in the game.<br>
> >><br>
> >> Events do not replace the need for managing jobs on the vdsm side.<br>
> >> Engine must have a way to query the current jobs before subscribing<br>
> >> to events from these jobs, otherwise you will lose events and engine<br>
> >> will never notice a completed job after network errors.<br>
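[Editor's illustration] The poll-then-subscribe reconciliation described here can be sketched as follows: the engine takes an authoritative snapshot of the host's jobs on every (re)connect, and any previously known job missing from the snapshot is treated as finished. Class and method names are hypothetical, not the real engine/vdsm interfaces.

```python
# Sketch: reconcile polled job state with events so a job that completed
# during a network outage is never missed. Hypothetical names throughout.

class EngineJobTracker:
    def __init__(self, host):
        self.host = host
        self.known = {}                    # job_id -> last seen status

    def resync(self):
        """Called on connect/reconnect, before processing any events."""
        current = self.host.get_jobs()     # authoritative snapshot from the host
        for job_id in list(self.known):
            if job_id not in current:
                self.on_finished(job_id)   # it finished while we were away
        self.known = dict(current)

    def on_event(self, job_id, status):
        self.known[job_id] = status
        if status in ("done", "failed"):
            self.on_finished(job_id)
            self.known.pop(job_id, None)

    def on_finished(self, job_id):
        pass                               # engine-side completion handling

class FakeHost:
    """Toy host for demonstration; vdsm would answer a real jobs query."""
    def __init__(self):
        self.jobs = {}
    def get_jobs(self):
        return dict(self.jobs)
```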
> >><br>
> >> The jobs framework supports events, see<br>
> >> <a href="https://gerrit.ovirt.org/67118">https://gerrit.ovirt.org/67118</a><br>
> >><br>
> >> We are waiting for review from the infra team, maybe you can<br>
> >> get someone to review this?<br>
> ><br>
> > It would have been great to review the design for this before it<br>
> > reaches gerrit.<br>
> > Anyway, I get a permissions error when opening. Any clue why?<br>
><br>
> It is a recent bug in gerrit, or a configuration issue; drafts are<br>
> private sometimes.<br>
><br>
> I added you as reviewer, can you see this now?<br>
></p>
<p dir="ltr">Yes. I see Piotr is already on it. <br>
I'll also be happy to hear how you are going to use events in your current design. </p>
<p dir="ltr">Also, is there a design page for this work? </p>
<p dir="ltr">Thanks,<br>
Oved </p>
<p dir="ltr">> Nir<br>
><br>
> ><br>
> >><br>
> >> Nir<br>
> >><br>
> >> ><br>
> >> ><br>
> >> >>><br>
> >> >>> Do you "promise" to implement your "next gen API" for 4.1 as an<br>
> >> >>> alternative?<br>
> >> >><br>
> >> >><br>
> >> >> I guess we need the design first.<br>
> >> >><br>
> >> >><br>
> >> >>> On Tue, Dec 6, 2016 at 5:04 PM, Adam Litke <<a href="mailto:alitke@redhat.com">alitke@redhat.com</a>> wrote:<br>
> >> >>><br>
> >> >>> On 05/12/16 11:17 +0200, Arik Hadas wrote:<br>
> >> >>><br>
> >> >>><br>
> >> >>><br>
> >> >>> On Mon, Dec 5, 2016 at 10:05 AM, Nir Soffer<br>
> >> >>> <<a href="mailto:nsoffer@redhat.com">nsoffer@redhat.com</a>><br>
> >> >>> wrote:<br>
> >> >>><br>
> >> >>> On Sun, Dec 4, 2016 at 8:50 PM, Shmuel Melamud<br>
> >> >>> <<a href="mailto:smelamud@redhat.com">smelamud@redhat.com</a>><br>
> >> >>> wrote:<br>
> >> >>> ><br>
> >> >>> > Hi!<br>
> >> >>> ><br>
> >> >>> > I'm currently working on integration of virt-sysprep into<br>
> >> >>> oVirt.<br>
> >> >>> ><br>
> >> >>> > Usually, if a user creates a template from a regular VM, and then<br>
> >> >>> > creates new VMs from this template, these new VMs inherit all the<br>
> >> >>> > configuration of the original VM, including SSH keys, UDEV rules,<br>
> >> >>> > MAC addresses, system ID, hostname etc. This is unfortunate,<br>
> >> >>> > because you cannot have two network devices with the same MAC<br>
> >> >>> > address in the same network, for example.<br>
> >> >>> ><br>
> >> >>> > To avoid this, the user must clean all machine-specific<br>
> >> >>> > configuration from the original VM before creating a template<br>
> >> >>> > from it. You can do this manually, but there is the virt-sysprep<br>
> >> >>> > utility that does this automatically.<br>
> >> >>> ><br>
> >> >>> > Ideally, virt-sysprep should be seamlessly integrated into the<br>
> >> >>> > template creation process. But the first step is to create a<br>
> >> >>> > simple button: the user selects a VM, clicks the button, and<br>
> >> >>> > oVirt executes virt-sysprep on the VM.<br>
> >> >>> ><br>
> >> >>> > virt-sysprep works directly on the VM's filesystem. It accepts a<br>
> >> >>> > list of all the disks of the VM as parameters:<br>
> >> >>> ><br>
> >> >>> > virt-sysprep -a disk1.img -a disk2.img -a disk3.img<br>
> >> >>> ><br>
> >> >>> > The architecture is as follows: a command on the Engine side<br>
> >> >>> > runs a job on the VDSM side and tracks its success/failure. The<br>
> >> >>> > job on the VDSM side runs virt-sysprep.<br>
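[Editor's illustration] The vdsm-side job described above might build and run the command roughly like this. The virt-sysprep command line matches the example given earlier in the mail; the wrapper function and its `runner` hook are hypothetical, added only so the sketch is testable.

```python
# Sketch: run virt-sysprep over all of a VM's disks in one invocation.
# The command line (one -a per disk) is real; the wrapper is hypothetical.
import subprocess

def sysprep_command(disk_paths):
    cmd = ["virt-sysprep"]
    for path in disk_paths:
        cmd += ["-a", path]            # one -a option per disk image
    return cmd

def run_sysprep(disk_paths, runner=subprocess.run):
    # runner is injectable for testing; by default it executes the command.
    result = runner(sysprep_command(disk_paths),
                    capture_output=True, text=True)
    return result.returncode == 0      # job success == process success
```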
> >> >>> ><br>
> >> >>> > The question is how to implement the job correctly?<br>
> >> >>> ><br>
> >> >>> > I thought about using storage jobs, but they are designed to<br>
> >> >>> > work only with a single volume, correct?<br>
> >> >>><br>
> >> >>> New storage verbs are volume-based. This makes it easy to manage<br>
> >> >>> them on the engine side, and will allow parallelizing volume<br>
> >> >>> operations on single or multiple hosts.<br>
> >> >>><br>
> >> >>> A storage volume job uses a sanlock lease on the modified volume<br>
> >> >>> and a volume generation number. If a host running pending jobs<br>
> >> >>> becomes non-responsive and cannot be fenced, we can detect the<br>
> >> >>> state of the job, fence the job, and start the job on another<br>
> >> >>> host.<br>
> >> >>><br>
> >> >>> With the SPM task, if a host becomes non-responsive and cannot be<br>
> >> >>> fenced, the whole setup is stuck; there is no way to perform any<br>
> >> >>> storage operation.<br>
> >> >>> > Is it possible to use them with an operation that is performed<br>
> >> >>> > on multiple volumes?<br>
> >> >>> > Or, alternatively, is it possible to use some kind of 'VM jobs'<br>
> >> >>> > - that work on the VM as a whole?<br>
> >> >>><br>
> >> >>> We can do:<br>
> >> >>><br>
> >> >>> 1. Add jobs with multiple volume leases - can make error handling<br>
> >> >>> very complex. How do you tell a job's state if you have multiple<br>
> >> >>> leases? Which volume generation do you use?<br>
> >> >>><br>
> >> >>> 2. Use a volume job using one of the volumes (the boot volume?).<br>
> >> >>> This does not protect the other volumes from modification, but<br>
> >> >>> engine is responsible for this.<br>
> >> >>><br>
> >> >>> 3. Use new "vm jobs", using a vm lease (should be available this<br>
> >> >>> week on master). This protects the vm from being started during<br>
> >> >>> sysprep. We still need a generation to detect the job state; I<br>
> >> >>> think we can use the sanlock lease generation for this.<br>
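[Editor's illustration] Option 3 can be sketched like this: the job acquires the vm lease before touching the disks, so a concurrent attempt to start the vm (or a second job) fails fast, and the lease generation increments on each acquisition. The lease object is a toy stand-in for a sanlock lease; all names are hypothetical.

```python
# Sketch of a "vm job" guarded by a vm lease, as in option 3 above.
# Toy in-process lease -- a real deployment would use sanlock on storage.

class LeaseHeld(Exception):
    """Raised when the lease is already owned by someone else."""

class VmLease:
    def __init__(self):
        self.owner = None
        self.generation = 0

    def acquire(self, owner):
        if self.owner is not None:
            raise LeaseHeld(self.owner)
        self.owner = owner
        self.generation += 1       # generation lets us detect stale/fenced jobs

    def release(self, owner):
        if self.owner == owner:
            self.owner = None

def run_vm_job(lease, owner, operation):
    lease.acquire(owner)           # fails if the vm is running or another job owns it
    try:
        operation()                # e.g. run virt-sysprep on the vm's disks
    finally:
        lease.release(owner)
```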
> >> >>><br>
> >> >>> I like the last option since sysprep is much like running a vm.<br>
> >> >>> > How does v2v solve this problem?<br>
> >> >>><br>
> >> >>> It does not.<br>
> >> >>><br>
> >> >>> v2v predates storage volume jobs. It does not use volume leases<br>
> >> >>> and generation, and does not have any way to recover if a host<br>
> >> >>> running v2v becomes non-responsive and cannot be fenced.<br>
> >> >>><br>
> >> >>> It also does not use the jobs framework and does not use a thread<br>
> >> >>> pool for v2v jobs, so it has no limit on the number of storage<br>
> >> >>> operations on a host.<br>
> >> >>><br>
> >> >>><br>
> >> >>> Right, but let's be fair and present the benefits of v2v jobs as<br>
> >> >>> well:<br>
> >> >>> 1. it is the simplest "infrastructure" in terms of LOC<br>
> >> >>><br>
> >> >>><br>
> >> >>> It is also deprecated. V2V has promised to adopt the richer Host<br>
> >> >>> Jobs<br>
> >> >>> API in the future.<br>
> >> >>><br>
> >> >>><br>
> >> >>> 2. it is the most efficient mechanism in terms of interactions<br>
> >> >>> between the engine and VDSM (it doesn't require new verbs/calls,<br>
> >> >>> the data is attached to VdsStats; probably the easiest mechanism<br>
> >> >>> to convert to events)<br>
> >> >>><br>
> >> >>><br>
> >> >>> Engine is already polling the host jobs API so I am not sure I<br>
> >> >>> agree<br>
> >> >>> with you here.<br>
> >> >>><br>
> >> >>><br>
> >> >>> 3. it is the most efficient implementation in terms of<br>
> >> >>> interaction with the database (no data is persisted into the<br>
> >> >>> database, no polling is done)<br>
> >> >>><br>
> >> >>><br>
> >> >>> Again, we're already using the Host Jobs API. We'll gain<br>
> >> >>> efficiency<br>
> >> >>> by migrating away from the old v2v API and having a single, unified<br>
> >> >>> approach (Host Jobs).<br>
> >> >>><br>
> >> >>><br>
> >> >>> Currently we have 3 mechanisms to report jobs:<br>
> >> >>> 1. VM jobs - currently used for live-merge. This requires the VM<br>
> >> >>> entity to exist in VDSM, thus it is not suitable for virt-sysprep.<br>
> >> >>><br>
> >> >>><br>
> >> >>> Correct, not appropriate for this application.<br>
> >> >>><br>
> >> >>><br>
> >> >>> 2. storage jobs - complicated infrastructure, targeted at<br>
> >> >>> recovering from failures to maintain storage consistency. Many of<br>
> >> >>> the things this infrastructure knows how to handle are irrelevant<br>
> >> >>> for the virt-sysprep flow, and the fact that virt-sysprep is<br>
> >> >>> invoked on a VM rather than a particular disk makes it less<br>
> >> >>> suitable.<br>
> >> >>><br>
> >> >>><br>
> >> >>> These are more appropriately called HostJobs and they have the<br>
> >> >>> following semantics:<br>
> >> >>> - They represent an external process running on a single host<br>
> >> >>> - They are not persisted. If the host or vdsm restarts, the job is<br>
> >> >>>   aborted<br>
> >> >>> - They operate on entities. Currently storage is the first adopter<br>
> >> >>>   of the infrastructure but virt was going to adopt these for the<br>
> >> >>>   next-gen API. Entities can be volumes, storage domains, vms,<br>
> >> >>>   network interfaces, etc.<br>
> >> >>> - Job status and progress is reported by the Host Jobs API. If a<br>
> >> >>>   job is not present, then the underlying entity (or entities)<br>
> >> >>>   must be polled by engine to determine the actual state.<br>
> >> >>><br>
> >> >>><br>
> >> >>> 3. V2V jobs - no mechanism is provided to resume failed jobs, no<br>
> >> >>> leases, etc.<br>
> >> >>><br>
> >> >>><br>
> >> >>> This is the old infra upon which Host Jobs are built. v2v has<br>
> >> >>> promised to move to Host Jobs in the future so we should not add<br>
> >> >>> new<br>
> >> >>> dependencies to this code.<br>
> >> >>><br>
> >> >>><br>
> >> >>> I have some arguments for using V2V-like jobs [1]:<br>
> >> >>> 1. creating a template from a vm is rarely done - if the host goes<br>
> >> >>> unresponsive or any other failure is detected, we can just remove<br>
> >> >>> the template and report the error<br>
> >> >>><br>
> >> >>><br>
> >> >>> We can choose this error handling with Host Jobs as well.<br>
> >> >>><br>
> >> >>><br>
> >> >>> 2. the phase of virt-sysprep is, unlike a typical storage<br>
> >> >>> operation, short - reducing the risk of failures during the<br>
> >> >>> process<br>
> >> >>><br>
> >> >>><br>
> >> >>> Reduced risk of failures is never an excuse to have lax error<br>
> >> >>> handling. The storage flavored host jobs provide tons of utilities<br>
> >> >>> for making error handling standardized, easy to implement, and<br>
> >> >>> correct.<br>
> >> >>><br>
> >> >>><br>
> >> >>> 3. during the operation the VM is down - by locking the<br>
> >> >>> VM/template and its disks on the engine side, we render a<br>
> >> >>> lease-like mechanism redundant<br>
> >> >>><br>
> >> >>><br>
> >> >>> Eventually we want to protect all operations on storage with<br>
> >> >>> sanlock leases. This is safer and allows for a more distributed<br>
> >> >>> approach to management. Again, using leases correctly in host<br>
> >> >>> jobs requires about 5 lines of code. The benefits of<br>
> >> >>> standardization far outweigh any perceived simplification<br>
> >> >>> resulting from omitting it.<br>
> >> >>><br>
> >> >>><br>
> >> >>> 4. in the worst case - the disk will not be corrupted (only some<br>
> >> >>> of the data might be removed).<br>
> >> >>><br>
> >> >>><br>
> >> >>> Again, the way engine chooses to handle job failures is independent<br>
> >> >>> of<br>
> >> >>> the mechanism. Let's separate that from this discussion.<br>
> >> >>><br>
> >> >>><br>
> >> >>> So I think that the mechanism for storage jobs is overkill for<br>
> >> >>> this case. We can keep it simple by generalising the V2V job for<br>
> >> >>> other virt-tools jobs, like virt-sysprep.<br>
> >> >>><br>
> >> >>><br>
> >> >>> I think we ought to standardize on the Host Jobs framework where we<br>
> >> >>> can collaborate on unit tests, standardized locking and error<br>
> >> >>> handling, abort logic, etc. When v2v moves to host jobs then we<br>
> >> >>> will<br>
> >> >>> have a unified method of handling ephemeral jobs that are tied to<br>
> >> >>> entities.<br>
> >> >>><br>
> >> >>> --<br>
> >> >>> Adam Litke<br>
> >> >>><br>
> >> >>><br>
> >> >><br>
> >> >> --<br>
> >> >> Adam Litke<br>
> >> ><br>
> >> ><br></p>