On Wed, Mar 15, 2017 at 2:28 PM Francesco Romani <fromani@redhat.com> wrote:
Hi everyone,

This is both a report of the current state of my Vdsm patches for Engine
XML support, and a proposal on how to move forward and solve
the current open issues.

TL;DR:
1. we can and IMO should reuse the current JSON schema to describe the
structure (layout) and the types of the metadata section.
2. we don't need a priori validation of stuff in the metadata section.
We will just raise in the creation flow if data is missing, or wrong,
according to our schema.
3. we will add *few* items to the metadata section, only things we can't
express clearly, or at all, in the libvirt XML. Redundancy and verbosity
will thus be kept at bay.
4. I believe [3] is the best tool to (de)serialize data to the
metadata section. Existing tools fit poorly in our very specific use case.

Examples below

+++

Long(er) discussion:


I have working code[1][2] to encode any custom, picklable, python
object in the metadata section.

We should decide which module will do the actual python<=>XML
transformation.
Please note that this also influences how the data in the
metadata section looks, so the two things are a bit coupled.

I'm not eager to reinvent another wheel, but after an
initial evaluation I honestly think that my pyxmlpickle[3] is the best
tool for the job over the current alternatives: plistlib[4] and
xmltodict[5].

I added the initial rationale here:
https://gerrit.ovirt.org/#/c/73790/4//COMMIT_MSG

I have completed the initial draft of patches to make it possible to
initialize devices from their XML representation [6]. This is the bare
minimum we need to support the Engine XML, and we *need* this anyway to
unlock the cleanup we planned and I outlined in my google doc.

So we are progressing, but I'd like to speed things up. Those [6]
patches are not yet complete, and many flows are not covered or tested; but
they are good enough to demonstrate that there *are* pieces of
information we need to properly initialize the devices, but can't
easily extract from the XML.

The first examples that come to mind are the storage.Drive UUIDs; there
could also be some ambiguity, which I'm investigating right now, for
displayIp/displayNetwork in Graphics devices. In [6] there are various
TODOs marking more of those cases. Most likely, a few more cases will pop
up as I cover all the flows we support.

Long story short: it is hard to correctly rebuild the device conf from
the XML alone. This is why in [6] I added the 'meta' argument to the
from_xml_tree classmethod [7].

'meta' is supposed to be the device metadata: extra data related to a
device which doesn't (yet) fit in the libvirt XML representation.
For example, we can store 'displayIp' and 'displayNetwork' here and be
done with that: using both per-device metadata and the XML
representation of one graphic device, we will have everything we need to
properly build one graphics.Graphics device.
This example may (hopefully) be bogus, but I'm keeping it because it is
one case easy to follow.
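
To make the 'meta' idea a bit more concrete, here is a minimal sketch. It
is not the code in [7]: the Graphics class below, its fields, and the exact
from_xml_tree signature are assumptions made up for illustration. It only
shows the split of responsibilities: what libvirt knows is read from the
device XML, the rest is read from the per-device metadata.

```python
import xml.etree.ElementTree as ET


class Graphics(object):
    """Hypothetical stand-in for graphics.Graphics, illustration only."""

    def __init__(self, device, port, display_network, display_ip):
        self.device = device
        self.port = port
        self.display_network = display_network
        self.display_ip = display_ip

    @classmethod
    def from_xml_tree(cls, dev, meta):
        # Data libvirt knows about is taken from the device XML.
        device = dev.attrib['type']
        port = dev.attrib.get('port')
        # Data that does not fit in the libvirt XML is taken from 'meta'.
        spec = meta.get('specParams', {})
        return cls(device, port,
                   spec.get('displayNetwork'), spec.get('displayIp'))


dev = ET.fromstring("<graphics type='vnc' port='5900'/>")
meta = {
    'specParams': {
        'displayNetwork': 'ovirtmgmt',
        'displayIp': '192.168.1.53',
    },
}
gfx = Graphics.from_xml_tree(dev, meta)
```

The meta dict here mirrors the specParams layout used in the XML example
further down in this mail.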

The device metadata is going to be stored in the vm metadata for the
short/mid term future. Even if the per-device metadata idea/RFE is
accepted (no answer yet, but we are working on it), we will not have it in
7.4, and it is unlikely in 7.5.

As it stands today, I believe there are two open questions:

1. do we need a schema for the metadata section?
2. how do we bind the metadata to the devices? How do we know which
metadata belongs to which device, if we have neither aliases nor
addresses to match? (e.g. the very first time the VM is created!)

My current stance is the following:
1. In general, a schema gives us two benefits: 1.a. we document what
the layout of the data should be, including types; 1.b. we can validate
the data we receive.
So yes, we need a schema, but we don't need a *new* schema. I think we
are in good enough shape with the current Vdsm schema: we can just
translate the python object layout to an XML layout.

An example probably explains this better. Using my pyxmlpickle module,
some actual data may look like this:

<domain type='kvm' id='5'>
  <name>a0</name>
  <uuid>ccd945c8-8069-4f31-8471-bbb58e9dd6ea</uuid>
  <metadata xmlns:ovirt-tune="http://ovirt.org/vm/tune/1.0"

The url is broken - do we have this?
 
xmlns:ovirt-vm="http://ovirt.org/vm/1.0"
xmlns:ovirt-instance="http://ovirt.org/vm/instance/1.0">
    <ovirt-tune:qos/> 

What do we keep here? why does it need its own namespace?
 
    <ovirt-vm:vm/>

What do we keep here? why does it need its own namespace?

Can we merge all namespaces into one generic namespace?
 
    <ovirt-instance:instance>

What is instance?
 
      <ovirt-instance:value type="dict">  
        <ovirt-instance:item key="devices" type="list"> 
          <ovirt-instance:item index="0" type="dict">

Isn't index redundant?
 
            <ovirt-instance:item key="device"
type="str">vnc</ovirt-instance:item>
            <ovirt-instance:item key="specParams" type="dict">
              <ovirt-instance:item key="displayNetwork"
type="str">ovirtmgmt</ovirt-instance:item>
              <ovirt-instance:item key="displayIp"
type="str">192.168.1.53</ovirt-instance:item>
            </ovirt-instance:item>
            <ovirt-instance:item key="type"
type="str">graphics</ovirt-instance:item>
          </ovirt-instance:item>
          <ovirt-instance:item index="1" type="dict">
            <ovirt-instance:item key="device"
type="str">spice</ovirt-instance:item>
            <ovirt-instance:item key="specParams" type="dict">
              <ovirt-instance:item key="displayNetwork"
type="str">ovirtmgmt</ovirt-instance:item>
              <ovirt-instance:item key="displayIp"
type="str">192.168.1.53</ovirt-instance:item>
            </ovirt-instance:item>
            <ovirt-instance:item key="type"
type="str">graphics</ovirt-instance:item>
          </ovirt-instance:item>
          <ovirt-instance:item index="2" type="dict">
            <ovirt-instance:item key="poolID"
type="str">5890a292-0390-01d2-01ed-00000000029a</ovirt-instance:item>
            <ovirt-instance:item key="imageID"
type="str">66441539-f7ac-4946-8a25-75e422f939d4</ovirt-instance:item>
            <ovirt-instance:item key="domainID"
type="str">c578566d-bc61-420c-8f1e-8dfa0a18efd5</ovirt-instance:item>
            <ovirt-instance:item key="device"
type="str">disk</ovirt-instance:item>
            <ovirt-instance:item key="path"
type="str">/rhev/data-center/5890a292-0390-01d2-01ed-00000000029a/c578566d-bc61-420c-8f1e-8dfa0a18efd5/images/66441539-f7ac-4946-8a25-75e422f939d4/5c4eeed4-f2a7-490a-ab57-a0d6f3a711cc</ovirt-instance:item>
            <ovirt-instance:item key="volumeID"
type="str">5c4eeed4-f2a7-490a-ab57-a0d6f3a711cc</ovirt-instance:item>
          </ovirt-instance:item>
        </ovirt-instance:item>
      </ovirt-instance:value>
    </ovirt-instance:instance>
  </metadata>
 <!-- omitted for brevity -->
</domain>

How about a generic json namespace?

json:

{
    "devices": [
       {
           "foo": "bar"
       }
   ]
}

xml:

<json:object>
    <json:key>devices</json:key>
    <json:list>
       <json:object>
           <json:key>foo</json:key>
           <json:string>bar</json:string>
       </json:object>
    </json:list>
</json:object>

We can query and modify this xml using xpath, like this:

>>> r.findall("./metadata/json:object/[json:key='devices']/json:list/json:object/", namespaces={"json": "http://ovirt.org/json/1.0"})
[<Element '{http://ovirt.org/json/1.0}key' at 0x7fd3386a69d0>, <Element '{http://ovirt.org/json/1.0}string' at 0x7fd3386a6a50>]

Not sure this format will be easy to modify; I need to tinker more
with this.

It will probably be easier to parse the entire metadata, change it,
and serialize it back.

Maybe using key and type attributes, as you suggest, makes it simpler to
use, and since we convert from xml to python or python to xml, we can
have a "py" namespace.

<py:item>
    <py:item key="devices" type="list">
        <py:item type="dict">
            <py:item key="foo" type="str">bar</py:item>
        </py:item>
    </py:item>
</py:item>

With this we can have:

>>> r.findall("./metadata/py:item/py:item[@key='devices']/py:item[1]/py:item[@key='foo']", namespaces={"py": "http://ovirt.org/py/1.0"})[0].text
'bar'

Something like this can be useful for others as well.
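
For what it's worth, a converter for this "py" layout can be quite small.
The sketch below is an assumption about how it could look, not pyxmlpickle
code: dump and load are hypothetical helpers, and only dict, list, str and
int are handled. Unlike the hand-written example above, the top-level item
also gets a type attribute, so that load knows what to build.

```python
import xml.etree.ElementTree as ET

PY_NS = "http://ovirt.org/py/1.0"
ET.register_namespace("py", PY_NS)
ITEM = "{%s}item" % PY_NS


def dump(value, key=None):
    """Turn a dict/list/str/int tree into nested py:item elements."""
    elem = ET.Element(ITEM)
    if key is not None:
        elem.set("key", key)
    if isinstance(value, dict):
        elem.set("type", "dict")
        for k in sorted(value):
            elem.append(dump(value[k], key=k))
    elif isinstance(value, list):
        elem.set("type", "list")
        for v in value:
            elem.append(dump(v))
    else:
        # Scalars: record the python type name, store the value as text.
        elem.set("type", type(value).__name__)
        elem.text = str(value)
    return elem


def load(elem):
    """Rebuild the python object from nested py:item elements."""
    kind = elem.get("type")
    if kind == "dict":
        return {child.get("key"): load(child) for child in elem}
    if kind == "list":
        return [load(child) for child in elem]
    if kind == "int":
        return int(elem.text)
    return elem.text or ""


data = {"devices": [{"foo": "bar", "index": 0}]}
root = dump(data)
restored = load(ET.fromstring(ET.tostring(root)))
```

The same kind of query as above still works on the generated tree, since
dict entries keep their key attribute and list entries keep their order.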

Please note that yes, this is still verbose, but we don't want to add
much data here; for most of the information the most reliable source will
be the domain XML. We will add here only the extra info we can't really
fetch from that.

2. I don't think we need explicit validation: we could just raise along
the way in the creation flow if we don't find some extra metadata we
need. This will also solve the issue that, if we reuse the current schema
and omit most of the data, we will lack quite a lot of elements
marked mandatory.
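
A minimal sketch of what "raise along the way" could mean in practice.
Everything here is hypothetical: MissingMetadata, require and
drive_conf_from_meta are names invented for this example, not existing
Vdsm code. The key names are the ones from the disk item in the XML
example above.

```python
class MissingMetadata(Exception):
    """Hypothetical error raised when needed metadata is absent."""


def require(meta, *keys):
    """Return meta[k1][k2]... or raise, naming the missing path."""
    value = meta
    for depth, key in enumerate(keys):
        try:
            value = value[key]
        except (KeyError, TypeError):
            path = "/".join(keys[:depth + 1])
            raise MissingMetadata("missing metadata: %s" % path)
    return value


def drive_conf_from_meta(meta):
    # Only the fields this flow actually needs are checked, so items
    # marked mandatory in the schema but unused here can be omitted.
    names = ("poolID", "domainID", "imageID", "volumeID")
    return {name: require(meta, name) for name in names}


meta = {
    "poolID": "5890a292-0390-01d2-01ed-00000000029a",
    "domainID": "c578566d-bc61-420c-8f1e-8dfa0a18efd5",
    "imageID": "66441539-f7ac-4946-8a25-75e422f939d4",
    "volumeID": "5c4eeed4-f2a7-490a-ab57-a0d6f3a711cc",
}
conf = drive_conf_from_meta(meta)
```

Each flow checks only what it consumes, at the point where it consumes
it, and the error names the missing path.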

Once we reach agreement, I will update my
https://docs.google.com/document/d/1eD8KSLwwyo2Sk64MytbmE0wBxxMlpIyEI1GRcHDkp7Y/edit#heading=h.hqdqzmmm9i77
accordingly.

Final note: while devices take the lion's share, we will likely need help
from the metadata section also to store VM extra info, but all the above
discussion also applies there.

+++

[1]
https://gerrit.ovirt.org/#/q/status:open+project:vdsm+branch:master+topic:virt-metadata3
- uses xmltodict
[2]
https://gerrit.ovirt.org/#/q/status:open+project:vdsm+branch:master+topic:virt-metadata-pyxmlpickle
- ported the 'virt-metadata3' topic to pyxmlpickle
[3] https://github.com/fromanirh/pyxmlpickle
[4] https://docs.python.org/2/library/plistlib.html
[5] https://github.com/martinblech/xmltodict
[6]
https://gerrit.ovirt.org/#/q/status:open+project:vdsm+branch:master+topic:vm-devs-xml
[7] https://gerrit.ovirt.org/#/c/72880/15/lib/vdsm/virt/vmdevices/core.py

--
Francesco Romani
Red Hat Engineering Virtualization R & D
IRC: fromani
