<p dir="ltr">On Dec 7, 2016 20:16, "Nir Soffer" <<a href="mailto:nsoffer@redhat.com">nsoffer@redhat.com</a>> wrote:<br>
><br>
> On Wed, Dec 7, 2016 at 8:10 PM, Oved Ourfali <<a href="mailto:oourfali@redhat.com">oourfali@redhat.com</a>> wrote:<br>
> > On Dec 7, 2016 16:00, "Nir Soffer" <<a href="mailto:nsoffer@redhat.com">nsoffer@redhat.com</a>> wrote:<br>
> >><br>
> >> On Wed, Dec 7, 2016 at 10:17 AM, Oved Ourfali <<a href="mailto:oourfali@redhat.com">oourfali@redhat.com</a>> wrote:<br>
> >> ><br>
> >> ><br>
> >> > On Tue, Dec 6, 2016 at 11:12 PM, Adam Litke <<a href="mailto:alitke@redhat.com">alitke@redhat.com</a>> wrote:<br>
> >> >><br>
> >> >> On 06/12/16 22:06 +0200, Arik Hadas wrote:<br>
> >> >>><br>
> >> >>> Adam,<br>
> >> >><br>
> >> >><br>
> >> >> :) You seem upset. Sorry if I touched a nerve...<br>
> >> >><br>
> >> >>> Just out of curiosity: when you write "v2v has promised" - what<br>
> >> >>> exactly do you mean? The tool? Richard Jones (the maintainer of<br>
> >> >>> virt-v2v)? Shahar and I, who implemented the integration with<br>
> >> >>> virt-v2v? I'm not aware of such a promise by any of these options :)<br>
> >> >><br>
> >> >><br>
> >> >> Some history...<br>
> >> >><br>
> >> >> Earlier this year Nir, Francesco (added), Shahar, and I began<br>
> >> >> discussing the similarities between what storage needed to do with<br>
> >> >> external commands and what was designed specifically for v2v. I am<br>
> >> >> not sure if you were involved in the project at that time. The plan<br>
> >> >> was to create common infrastructure that could be extended to fit the<br>
> >> >> unique needs of the verticals. The v2v code was going to be moved<br>
> >> >> over to the new infrastructure (see [1]) and the only thing that<br>
> >> >> stopped the initial patch was lack of a VMware testing environment for<br>
> >> >> verification.<br>
> >> >><br>
> >> >> At that time storage refocused on developing verbs that used the new<br>
> >> >> infrastructure and has been maintaining its suitability for general<br>
> >> >> use. Conversion of v2v -> Host Jobs is obviously a lower-priority<br>
> >> >> item and much more difficult now due to the early missed opportunity.<br>
> >> >><br>
> >> >>> Anyway, let's say that you were given such a promise by someone and<br>
> >> >>> thus<br>
> >> >>> consider that mechanism to be deprecated - it doesn't really matter.<br>
> >> >><br>
> >> >><br>
> >> >> I may be biased but I think my opinion does matter.<br>
> >> >><br>
> >> >>> The current implementation doesn't fit this flow well (it requires a<br>
> >> >>> per-volume job, it creates leases that are not needed for a<br>
> >> >>> template's disks, ...) and with the "next-gen API" with proper<br>
> >> >>> support for virt flows not even being discussed with us (and iiuc<br>
> >> >>> also not with the infra team) yet, I don't understand what you<br>
> >> >>> suggest except for some strong, though irrelevant, statements.<br>
> >> >><br>
> >> >><br>
> >> >> If you are willing to engage in a good-faith technical discussion I am<br>
> >> >> sure I can help you to understand. These operations on storage demand<br>
> >> >> some form of locking protection. If volume leases aren't appropriate,<br>
> >> >> then perhaps we should use the VM Leases / xleases that Nir is<br>
> >> >> finishing off for 4.1 now.<br>
> >> >><br>
> >> >>> I suggest, loud and clear, to reuse (not to add dependencies to,<br>
> >> >>> not to enhance, ...) an existing mechanism for a very similar flow<br>
> >> >>> of virt-v2v that works well and is simple.<br>
> >> >><br>
> >> >><br>
> >> >> I clearly remember discussions involving infra (hello Oved), virt<br>
> >> >> (hola Michal), and storage where we decided that new APIs performing<br>
> >> >> async operations involving external commands should use the HostJobs<br>
> >> >> infrastructure instead of adding more information to Host Stats.<br>
> >> >> These were the "famous" entity polling meetings.<br>
> >><br>
> >> We discussed these issues behind closed doors, not on the public mailing<br>
> >> list, so it is not surprising that people do not know about the<br>
> >> agreements we had.<br>
> >><br>
> ><br>
> > The core team was there. So it is surprising.<br>
> ><br>
> >> >><br>
> >> >> Of course plans can change but I have never been looped into any such<br>
> >> >> discussions.<br>
> >> >><br>
> >> ><br>
> >> > Well, I think that when someone builds a good infrastructure he first<br>
> >> > needs to talk to all consumers and make sure it fits.<br>
> >> > In this case it seems like most work was done to fit the storage<br>
> >> > use-case, and now you check whether it can fit others as well....<br>
> >><br>
> >> The jobs framework is generic and can be used for any subsystem,<br>
> >> there is nothing related to storage about it. But modifying disks *is*<br>
> >> a storage operation, even if someone from the virt team worked on it.<br>
> >><br>
> >> V2v is also storage operation - if we compare it with copying disks:<br>
> >><br>
> >> - we create a new volume that nobody is using yet<br>
> >> - if the operation fails, the disk must be in an illegal state<br>
> >> - if the operation fails, we delete the disks<br>
> >> - if the operation succeeds, the volume must be legal<br>
> >> - we need to limit the number of operations on a host<br>
> >> - we need to detect the job state if the host becomes non-responsive<br>
> >> - we may want to fence the job if the host becomes non-responsive;<br>
> >>   in volume jobs, we can increment the volume generation and run<br>
> >>   the same job on another host<br>
> >> - we want to take a lease on storage to ensure that other hosts cannot<br>
> >>   access the same entity, or that the job will fail if someone else is<br>
> >>   using this entity<br>
> >> - we want to take a lease on storage, ensuring that a job cannot get<br>
> >>   stuck for a long time - sanlock kills the owner of a lease when<br>
> >>   storage becomes inaccessible<br>
> >> - we want to report progress<br>
> >><br>
> >> sysprep is less risky because the operation is faster, but on storage<br>
> >> even a fast operation can get stuck for minutes.<br>
> >><br>
> >> We need to agree on a standard way to do such operations that is safe<br>
> >> enough and can be managed on the engine side.<br>
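[Editor's illustration] The job semantics listed above can be condensed into a small sketch: the volume is illegal while being modified, concurrent jobs per host are capped, and fencing bumps the volume generation so the job can be retried on another host. All names here are hypothetical stand-ins, not the real vdsm API.

```python
# Sketch of the volume-job lifecycle described in the thread.
# Hypothetical names -- this is not vdsm code.
import threading

MAX_JOBS = 10                          # limit concurrent storage operations per host
_slots = threading.BoundedSemaphore(MAX_JOBS)

class Volume:
    def __init__(self):
        self.legal = True
        self.generation = 0

class Job:
    def __init__(self, volume):
        self.volume = volume
        self.status = "pending"

    def run(self, operation):
        with _slots:                   # cap the number of jobs on this host
            self.volume.legal = False  # disk is illegal while it is modified
            self.status = "running"
            try:
                operation(self.volume)
            except Exception:
                self.status = "failed" # engine can now delete the illegal disk
                raise
            else:
                self.volume.legal = True   # only a successful job leaves it legal
                self.status = "done"

    def fence(self):
        # Host became non-responsive: bump the generation so the same job
        # can be started safely on another host.
        self.volume.generation += 1
        self.status = "fenced"
```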
> >><br>
> >> > IMO it makes much more sense to use events where possible (and you've<br>
> >> > promised to use those as well, but I don't see you doing that...). v2v<br>
> >> > should use events for sure, and they have promised to do that in the<br>
> >> > past, instead of using the v2v jobs. The reason events weren't used<br>
> >> > originally with the v2v feature was that it was too risky and the<br>
> >> > events infrastructure was added too late in the game.<br>
> >><br>
> >> Events do not replace the need for managing jobs on the vdsm side.<br>
> >> Engine must have a way to query the current jobs before subscribing<br>
> >> to events from these jobs, otherwise you will lose events and engine<br>
> >> will never notice a completed job after network errors.<br>
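[Editor's illustration] The poll-then-subscribe reconciliation described here can be sketched as follows: the engine takes an authoritative snapshot of the host's jobs on every (re)connect, and any previously known job missing from the snapshot is treated as finished. Class and method names are hypothetical, not the real engine/vdsm interfaces.

```python
# Sketch: reconcile polled job state with events so a job that completed
# during a network outage is never missed. Hypothetical names throughout.

class EngineJobTracker:
    def __init__(self, host):
        self.host = host
        self.known = {}                    # job_id -> last seen status

    def resync(self):
        """Called on connect/reconnect, before processing any events."""
        current = self.host.get_jobs()     # authoritative snapshot from the host
        for job_id in list(self.known):
            if job_id not in current:
                self.on_finished(job_id)   # it finished while we were away
        self.known = dict(current)

    def on_event(self, job_id, status):
        self.known[job_id] = status
        if status in ("done", "failed"):
            self.on_finished(job_id)
            self.known.pop(job_id, None)

    def on_finished(self, job_id):
        pass                               # engine-side completion handling

class FakeHost:
    """Toy host for demonstration; vdsm would answer a real jobs query."""
    def __init__(self):
        self.jobs = {}
    def get_jobs(self):
        return dict(self.jobs)
```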
> >><br>
> >> The jobs framework supports events, see<br>
> >> <a href="https://gerrit.ovirt.org/67118">https://gerrit.ovirt.org/67118</a><br>
> >><br>
> >> We are waiting for review from the infra team, maybe you can<br>
> >> get someone to review this?<br>
> ><br>
> > It would have been great to review the design for this before it<br>
> > reaches gerrit.<br>
> > Anyway, I get a permissions error when opening. Any clue why?<br>
><br>
> It is a recent bug in gerrit, or a configuration issue; drafts are<br>
> private sometimes.<br>
><br>
> I added you as reviewer, can you see this now?<br>
></p>
<p dir="ltr">Yes. I see Piotr is already on it. <br>
I'll also be happy to hear how you are going to use events in your current design. </p>
<p dir="ltr">Also, is there a design page for this work? </p>
<p dir="ltr">Thanks,<br>
Oved </p>
<p dir="ltr">> Nir<br>
><br>
> ><br>
> >><br>
> >> Nir<br>
> >><br>
> >> ><br>
> >> ><br>
> >> >>><br>
> >> >>> Do you "promise" to implement your "next gen API" for 4.1 as an<br>
> >> >>> alternative?<br>
> >> >><br>
> >> >><br>
> >> >> I guess we need the design first.<br>
> >> >><br>
> >> >><br>
> >> >>> On Tue, Dec 6, 2016 at 5:04 PM, Adam Litke <<a href="mailto:alitke@redhat.com">alitke@redhat.com</a>> wrote:<br>
> >> >>><br>
> >> >>> On 05/12/16 11:17 +0200, Arik Hadas wrote:<br>
> >> >>><br>
> >> >>><br>
> >> >>><br>
> >> >>> On Mon, Dec 5, 2016 at 10:05 AM, Nir Soffer<br>
> >> >>> <<a href="mailto:nsoffer@redhat.com">nsoffer@redhat.com</a>><br>
> >> >>> wrote:<br>
> >> >>><br>
> >> >>> On Sun, Dec 4, 2016 at 8:50 PM, Shmuel Melamud<br>
> >> >>> <<a href="mailto:smelamud@redhat.com">smelamud@redhat.com</a>><br>
> >> >>> wrote:<br>
> >> >>> ><br>
> >> >>> > Hi!<br>
> >> >>> ><br>
> >> >>> > I'm currently working on integration of virt-sysprep into<br>
> >> >>> oVirt.<br>
> >> >>> ><br>
> >> >>> > Usually, if a user creates a template from a regular VM, and then<br>
> >> >>> > creates new VMs from this template, these new VMs inherit all the<br>
> >> >>> > configuration of the original VM, including SSH keys, UDEV rules,<br>
> >> >>> > MAC addresses, system ID, hostname etc. This is unfortunate,<br>
> >> >>> > because you cannot have two network devices with the same MAC<br>
> >> >>> > address in the same network, for example.<br>
> >> >>> ><br>
> >> >>> > To avoid this, the user must clean all machine-specific<br>
> >> >>> > configuration from the original VM before creating a template<br>
> >> >>> > from it. You can do this manually, but there is the virt-sysprep<br>
> >> >>> > utility that does this automatically.<br>
> >> >>> ><br>
> >> >>> > Ideally, virt-sysprep should be seamlessly integrated into the<br>
> >> >>> > template creation process. But the first step is to create a<br>
> >> >>> > simple button: the user selects a VM, clicks the button, and<br>
> >> >>> > oVirt executes virt-sysprep on the VM.<br>
> >> >>> ><br>
> >> >>> > virt-sysprep works directly on the VM's filesystem. It accepts a<br>
> >> >>> > list of all the disks of the VM as parameters:<br>
> >> >>> ><br>
> >> >>> > virt-sysprep -a disk1.img -a disk2.img -a disk3.img<br>
> >> >>> ><br>
> >> >>> > The architecture is as follows: a command on the Engine side<br>
> >> >>> > runs a job on the VDSM side and tracks its success/failure. The<br>
> >> >>> > job on the VDSM side runs virt-sysprep.<br>
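[Editor's illustration] The vdsm-side job described above might build and run the command roughly like this. The virt-sysprep command line matches the example given earlier in the mail; the wrapper function and its `runner` hook are hypothetical, added only so the sketch is testable.

```python
# Sketch: run virt-sysprep over all of a VM's disks in one invocation.
# The command line (one -a per disk) is real; the wrapper is hypothetical.
import subprocess

def sysprep_command(disk_paths):
    cmd = ["virt-sysprep"]
    for path in disk_paths:
        cmd += ["-a", path]            # one -a option per disk image
    return cmd

def run_sysprep(disk_paths, runner=subprocess.run):
    # runner is injectable for testing; by default it executes the command.
    result = runner(sysprep_command(disk_paths),
                    capture_output=True, text=True)
    return result.returncode == 0      # job success == process success
```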
> >> >>> ><br>
> >> >>> > The question is how to implement the job correctly?<br>
> >> >>> ><br>
> >> >>> > I thought about using storage jobs, but they are designed to<br>
> >> >>> > work only with a single volume, correct?<br>
> >> >>><br>
> >> >>> New storage verbs are volume-based. This makes it easy to manage<br>
> >> >>> them on the engine side, and will allow parallelizing volume<br>
> >> >>> operations on single or multiple hosts.<br>
> >> >>><br>
> >> >>> A storage volume job uses a sanlock lease on the modified volume<br>
> >> >>> and a volume generation number. If a host running pending jobs<br>
> >> >>> becomes non-responsive and cannot be fenced, we can detect the<br>
> >> >>> state of the job, fence the job, and start the job on another<br>
> >> >>> host.<br>
> >> >>><br>
> >> >>> With the SPM task, if a host becomes non-responsive and cannot be<br>
> >> >>> fenced, the whole setup is stuck; there is no way to perform any<br>
> >> >>> storage operation.<br>
> >> >>> > Is it possible to use them with an operation that is performed<br>
> >> >>> > on multiple volumes?<br>
> >> >>> > Or, alternatively, is it possible to use some kind of 'VM jobs'<br>
> >> >>> > - that work on the VM as a whole?<br>
> >> >>><br>
> >> >>> We can do:<br>
> >> >>><br>
> >> >>> 1. Add jobs with multiple volume leases - can make error handling<br>
> >> >>> very complex. How do you tell a job's state if you have multiple<br>
> >> >>> leases? Which volume generation do you use?<br>
> >> >>><br>
> >> >>> 2. Use a volume job using one of the volumes (the boot volume?).<br>
> >> >>> This does not protect the other volumes from modification, but<br>
> >> >>> engine is responsible for this.<br>
> >> >>><br>
> >> >>> 3. Use new "vm jobs", using a vm lease (should be available this<br>
> >> >>> week on master). This protects the vm from being started during<br>
> >> >>> sysprep. We still need a generation to detect the job state; I<br>
> >> >>> think we can use the sanlock lease generation for this.<br>
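[Editor's illustration] Option 3 can be sketched like this: the job acquires the vm lease before touching the disks, so a concurrent attempt to start the vm (or a second job) fails fast, and the lease generation increments on each acquisition. The lease object is a toy stand-in for a sanlock lease; all names are hypothetical.

```python
# Sketch of a "vm job" guarded by a vm lease, as in option 3 above.
# Toy in-process lease -- a real deployment would use sanlock on storage.

class LeaseHeld(Exception):
    """Raised when the lease is already owned by someone else."""

class VmLease:
    def __init__(self):
        self.owner = None
        self.generation = 0

    def acquire(self, owner):
        if self.owner is not None:
            raise LeaseHeld(self.owner)
        self.owner = owner
        self.generation += 1       # generation lets us detect stale/fenced jobs

    def release(self, owner):
        if self.owner == owner:
            self.owner = None

def run_vm_job(lease, owner, operation):
    lease.acquire(owner)           # fails if the vm is running or another job owns it
    try:
        operation()                # e.g. run virt-sysprep on the vm's disks
    finally:
        lease.release(owner)
```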
> >> >>><br>
> >> >>> I like the last option since sysprep is much like running a vm.<br>
> >> >>> > How does v2v solve this problem?<br>
> >> >>><br>
> >> >>> It does not.<br>
> >> >>><br>
> >> >>> v2v predates storage volume jobs. It does not use volume leases<br>
> >> >>> and generation, and does not have any way to recover if a host<br>
> >> >>> running v2v becomes non-responsive and cannot be fenced.<br>
> >> >>><br>
> >> >>> It also does not use the jobs framework and does not use a thread<br>
> >> >>> pool for v2v jobs, so it has no limit on the number of storage<br>
> >> >>> operations on a host.<br>
> >> >>><br>
> >> >>><br>
> >> >>> Right, but let's be fair and present the benefits of v2v jobs as<br>
> >> >>> well:<br>
> >> >>> 1. it is the simplest "infrastructure" in terms of LOC<br>
> >> >>><br>
> >> >>><br>
> >> >>> It is also deprecated. V2V has promised to adopt the richer Host<br>
> >> >>> Jobs<br>
> >> >>> API in the future.<br>
> >> >>><br>
> >> >>><br>
> >> >>> 2. it is the most efficient mechanism in terms of interactions<br>
> >> >>> between the engine and VDSM (it doesn't require new verbs/calls,<br>
> >> >>> the data is attached to VdsStats; probably the easiest mechanism<br>
> >> >>> to convert to events)<br>
> >> >>><br>
> >> >>><br>
> >> >>> Engine is already polling the host jobs API so I am not sure I<br>
> >> >>> agree<br>
> >> >>> with you here.<br>
> >> >>><br>
> >> >>><br>
> >> >>> 3. it is the most efficient implementation in terms of<br>
> >> >>> interaction with the database (no data is persisted into the<br>
> >> >>> database, no polling is done)<br>
> >> >>><br>
> >> >>><br>
> >> >>> Again, we're already using the Host Jobs API. We'll gain<br>
> >> >>> efficiency<br>
> >> >>> by migrating away from the old v2v API and having a single, unified<br>
> >> >>> approach (Host Jobs).<br>
> >> >>><br>
> >> >>><br>
> >> >>> Currently we have 3 mechanisms to report jobs:<br>
> >> >>> 1. VM jobs - currently used for live-merge. This requires the VM<br>
> >> >>> entity to exist in VDSM, thus it is not suitable for virt-sysprep.<br>
> >> >>><br>
> >> >>><br>
> >> >>> Correct, not appropriate for this application.<br>
> >> >>><br>
> >> >>><br>
> >> >>> 2. storage jobs - complicated infrastructure, targeted at<br>
> >> >>> recovering from failures to maintain storage consistency. Many of<br>
> >> >>> the things this infrastructure knows how to handle are irrelevant<br>
> >> >>> for the virt-sysprep flow, and the fact that virt-sysprep is<br>
> >> >>> invoked on a VM rather than a particular disk makes it less<br>
> >> >>> suitable.<br>
> >> >>><br>
> >> >>><br>
> >> >>> These are more appropriately called HostJobs and they have the<br>
> >> >>> following semantics:<br>
> >> >>> - They represent an external process running on a single host<br>
> >> >>> - They are not persisted. If the host or vdsm restarts, the job is<br>
> >> >>>   aborted<br>
> >> >>> - They operate on entities. Currently storage is the first adopter<br>
> >> >>>   of the infrastructure but virt was going to adopt these for the<br>
> >> >>>   next-gen API. Entities can be volumes, storage domains, vms,<br>
> >> >>>   network interfaces, etc.<br>
> >> >>> - Job status and progress is reported by the Host Jobs API. If a<br>
> >> >>>   job is not present, then the underlying entity (or entities)<br>
> >> >>>   must be polled by engine to determine the actual state.<br>
> >> >>><br>
> >> >>><br>
> >> >>> 3. V2V jobs - no mechanism is provided to resume failed jobs, no<br>
> >> >>> leases, etc.<br>
> >> >>><br>
> >> >>><br>
> >> >>> This is the old infra upon which Host Jobs are built. v2v has<br>
> >> >>> promised to move to Host Jobs in the future so we should not add<br>
> >> >>> new<br>
> >> >>> dependencies to this code.<br>
> >> >>><br>
> >> >>><br>
> >> >>> I have some arguments for using V2V-like jobs [1]:<br>
> >> >>> 1. creating a template from a vm is rarely done - if the host goes<br>
> >> >>> unresponsive or any other failure is detected, we can just remove<br>
> >> >>> the template and report the error<br>
> >> >>><br>
> >> >>><br>
> >> >>> We can choose this error handling with Host Jobs as well.<br>
> >> >>><br>
> >> >>><br>
> >> >>> 2. the phase of virt-sysprep is, unlike a typical storage<br>
> >> >>> operation, short - reducing the risk of failures during the<br>
> >> >>> process<br>
> >> >>><br>
> >> >>><br>
> >> >>> Reduced risk of failures is never an excuse to have lax error<br>
> >> >>> handling. The storage flavored host jobs provide tons of utilities<br>
> >> >>> for making error handling standardized, easy to implement, and<br>
> >> >>> correct.<br>
> >> >>><br>
> >> >>><br>
> >> >>> 3. during the operation the VM is down - by locking the<br>
> >> >>> VM/template and its disks on the engine side, we render a<br>
> >> >>> lease-like mechanism redundant<br>
> >> >>><br>
> >> >>><br>
> >> >>> Eventually we want to protect all operations on storage with<br>
> >> >>> sanlock leases. This is safer and allows for a more distributed<br>
> >> >>> approach to management. Again, using leases correctly in host<br>
> >> >>> jobs requires about 5 lines of code. The benefits of<br>
> >> >>> standardization far outweigh any perceived simplification<br>
> >> >>> resulting from omitting it.<br>
> >> >>><br>
> >> >>><br>
> >> >>> 4. in the worst case - the disk will not be corrupted (only some<br>
> >> >>> of the data might be removed).<br>
> >> >>><br>
> >> >>><br>
> >> >>> Again, the way engine chooses to handle job failures is independent<br>
> >> >>> of<br>
> >> >>> the mechanism. Let's separate that from this discussion.<br>
> >> >>><br>
> >> >>><br>
> >> >>> So I think that the mechanism for storage jobs is overkill for<br>
> >> >>> this case. We can keep it simple by generalising the V2V job for<br>
> >> >>> other virt-tools jobs, like virt-sysprep.<br>
> >> >>><br>
> >> >>><br>
> >> >>> I think we ought to standardize on the Host Jobs framework where we<br>
> >> >>> can collaborate on unit tests, standardized locking and error<br>
> >> >>> handling, abort logic, etc. When v2v moves to host jobs then we<br>
> >> >>> will<br>
> >> >>> have a unified method of handling ephemeral jobs that are tied to<br>
> >> >>> entities.<br>
> >> >>><br>
> >> >>> --<br>
> >> >>> Adam Litke<br>
> >> >>><br>
> >> >>><br>
> >> >><br>
> >> >> --<br>
> >> >> Adam Litke<br>
> >> ><br>
> >> ><br></p>