Re: [ovirt-devel] [vdsm] Engine XML: metadata and devices from XML

18 Mar 2017

      On Wed, Mar 15, 2017 at 2:28 PM Francesco Romani <fromani@redhat.com> wrote:
...
Hi everyone,
This is both a report on the current state of my Vdsm patches for Engine
XML support, and a proposal for how to move forward and solve
the current open issues.
TL;DR:
1. we can, and IMO should, reuse the current JSON schema to describe the
structure (layout) and the types of the metadata section.
2. we don't need a priori validation of stuff in the metadata section.
We will just raise in the creation flow if data is missing or wrong,
according to our schema.
3. we will add *few* items to the metadata section, only things we can't
express clearly, or at all, in the libvirt XML. Redundancy and verbosity
will thus be kept at bay.
4. I believe [3] is the best tool to (de)serialize data to the
metadata section. Existing tools fit poorly in our very specific use case.
Examples below.
+++
Long(er) discussion:
I have working code[1][2] to encode any custom, picklable, Python
object in the metadata section.
We should decide which module will do the actual Python<=>XML
transformation.
Please note that this also influences what the data in the
metadata section looks like, so the two things are a bit coupled.
I'm not eager to reinvent another wheel, but after an
initial evaluation I honestly think that my pyxmlpickle[3] is the best
tool for the job compared to the current alternatives: plistlib[4] and
xmltodict[5].
I added the initial rationale here:
https://gerrit.ovirt.org/#/c/73790/4//COMMIT_MSG
I have completed the initial draft of patches to make it possible to
initialize devices from their XML representation [6]. This is the bare
minimum we need to support the Engine XML, and we *need* this anyway to
unlock the cleanup we planned and that I outlined in my Google doc.
So we are progressing, but I'd like to speed things up. Those [6]
patches are not yet complete, many flows are not covered or tested, but
they are good enough to demonstrate that there *are* pieces of
information we need to properly initialize the devices, but that we can't
easily extract from the XML.
The first examples that come to my mind are the storage.Drive UUIDs; there
could also be some ambiguity, which I'm investigating right now, for
displayIp/displayNetwork in Graphics devices. In [6] there are various
TODOs marking more of those cases. Most likely, a few more cases will pop
up as I cover all the flows we support.
Long story short: it is hard to correctly rebuild the device conf from
the XML. This is why in [6] I added the 'meta' argument to the from_xml_tree
classmethod in [7].
'meta' is supposed to be the device metadata: extra data related to a
device which doesn't (yet) fit in the libvirt XML representation.
For example, we can store 'displayIp' and 'displayNetwork' here and be
done with that: using both the per-device metadata and the XML
representation of a graphics device, we will have everything we need to
properly build a graphics.Graphics device.
This example may (hopefully) be bogus, but I'm keeping it because it is
an easy case to follow.
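To make the combination concrete, here is a minimal Python sketch of a from_xml_tree-style classmethod that reads what libvirt records from the <graphics> element and takes displayIp/displayNetwork from the per-device 'meta' dict. The Graphics class, its conf dict, and the (cls, dev, meta) signature are hypothetical simplifications for illustration, not the actual Vdsm code in [7].

```python
import xml.etree.ElementTree as ET


class Graphics:
    """Hypothetical stand-in for graphics.Graphics; only collects a conf dict."""

    def __init__(self, **conf):
        self.conf = conf

    @classmethod
    def from_xml_tree(cls, dev, meta):
        # What libvirt already records: read it from the <graphics> element.
        conf = {
            'type': 'graphics',
            'device': dev.attrib['type'],  # 'vnc' or 'spice'
            'port': dev.attrib.get('port'),
        }
        # What the XML cannot express: take it from the per-device metadata.
        conf['specParams'] = {
            'displayIp': meta['displayIp'],
            'displayNetwork': meta['displayNetwork'],
        }
        return cls(**conf)


dev = ET.fromstring("<graphics type='spice' port='5900'/>")
meta = {'displayIp': '192.168.1.53', 'displayNetwork': 'ovirtmgmt'}
gfx = Graphics.from_xml_tree(dev, meta)
```

Neither source alone is enough here: the XML gives the device kind and port, the metadata gives the display network and address.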
The device metadata is going to be stored in the VM metadata for the
short/mid-term future. Even if the per-device metadata idea/RFE is
accepted (no answer yet, but we are working on it), we will not have it in
7.4, and are unlikely to have it in 7.5.
As it stands today, I believe there are two open questions:
1. do we need a schema for the metadata section?
2. how do we bind the metadata to the devices? How do we know which
metadata belongs to which device, if we don't have aliases nor
addresses to match? (e.g. the very first time the VM is created!)
My current stance is the following:
1. In general, a schema gives us two benefits: 1.a. we document what
the layout of the data should be, including types; 1.b. we can validate
the data we receive.
So yes, we need a schema, but we don't need a *new* schema. I think we
are in good enough shape with the current Vdsm schema: we can just
translate the Python object layout to an XML layout.
An example is probably clearer. Using my pyxmlpickle module, some actual
data may look like this:
<domain type='kvm' id='5'>
  <name>a0</name>
  <uuid>ccd945c8-8069-4f31-8471-bbb58e9dd6ea</uuid>
  <metadata xmlns:ovirt-tune="http://ovirt.org/vm/tune/1.0"
            xmlns:ovirt-vm="http://ovirt.org/vm/1.0"
            xmlns:ovirt-instance="http://ovirt.org/vm/instance/1.0">
    <ovirt-tune:qos/>
    <ovirt-vm:vm/>
    <ovirt-instance:instance>
      <ovirt-instance:value type="dict">
        <ovirt-instance:item key="devices" type="list">
          <ovirt-instance:item index="0" type="dict">
            <ovirt-instance:item key="device" type="str">vnc</ovirt-instance:item>
            <ovirt-instance:item key="specParams" type="dict">
              <ovirt-instance:item key="displayNetwork" type="str">ovirtmgmt</ovirt-instance:item>
              <ovirt-instance:item key="displayIp" type="str">192.168.1.53</ovirt-instance:item>
            </ovirt-instance:item>
            <ovirt-instance:item key="type" type="str">graphics</ovirt-instance:item>
          </ovirt-instance:item>
          <ovirt-instance:item index="1" type="dict">
            <ovirt-instance:item key="device" type="str">spice</ovirt-instance:item>
            <ovirt-instance:item key="specParams" type="dict">
              <ovirt-instance:item key="displayNetwork" type="str">ovirtmgmt</ovirt-instance:item>
              <ovirt-instance:item key="displayIp" type="str">192.168.1.53</ovirt-instance:item>
            </ovirt-instance:item>
            <ovirt-instance:item key="type" type="str">graphics</ovirt-instance:item>
          </ovirt-instance:item>
          <ovirt-instance:item index="2" type="dict">
            <ovirt-instance:item key="poolID" type="str">5890a292-0390-01d2-01ed-00000000029a</ovirt-instance:item>
            <ovirt-instance:item key="imageID" type="str">66441539-f7ac-4946-8a25-75e422f939d4</ovirt-instance:item>
            <ovirt-instance:item key="domainID" type="str">c578566d-bc61-420c-8f1e-8dfa0a18efd5</ovirt-instance:item>
            <ovirt-instance:item key="device" type="str">disk</ovirt-instance:item>
            <ovirt-instance:item key="path" type="str">/rhev/data-center/5890a292-0390-01d2-01ed-00000000029a/c578566d-bc61-420c-8f1e-8dfa0a18efd5/images/66441539-f7ac-4946-8a25-75e422f939d4/5c4eeed4-f2a7-490a-ab57-a0d6f3a711cc</ovirt-instance:item>
            <ovirt-instance:item key="volumeID" type="str">5c4eeed4-f2a7-490a-ab57-a0d6f3a711cc</ovirt-instance:item>
          </ovirt-instance:item>
        </ovirt-instance:item>
      </ovirt-instance:value>
    </ovirt-instance:instance>
  </metadata>
  <!-- omitted for brevity -->
</domain>
Yes, this is still verbose, but we don't want to add much data here:
for most information, the most reliable source will be the domain XML.
We will add here only the extra info we can't really fetch from there.
2. I don't think we need explicit validation: we could just raise along
the way in the creation flow if we don't find some extra metadata we
need. This also solves a problem: if we reuse the current schema
and omit most of the data, many elements marked mandatory will be
missing.
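A minimal sketch of the "raise along the way" approach: instead of validating the metadata up front, the creation flow asks for each extra item only where it needs it, and fails with a clear error if the item is missing or has the wrong type. The require() helper and MissingMetadataError are hypothetical names, not existing Vdsm APIs.

```python
class MissingMetadataError(Exception):
    """Hypothetical error for required device metadata that is absent or wrong."""


def require(meta, key, expected_type=str):
    """Return meta[key]; fail only when the creation flow actually needs it."""
    try:
        value = meta[key]
    except KeyError:
        raise MissingMetadataError('missing metadata: %s' % key) from None
    if not isinstance(value, expected_type):
        raise MissingMetadataError(
            'metadata %s: expected %s, got %s'
            % (key, expected_type.__name__, type(value).__name__))
    return value


# Example: a graphics device asks for displayIp at the point it needs it.
meta = {'displayIp': '192.168.1.53'}
display_ip = require(meta, 'displayIp')
```

Calling require(meta, 'displayNetwork') on the dict above would raise MissingMetadataError at that point of the flow, so mandatory-but-omitted schema fields that nobody asks for never cause an error.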
...
Once we reach agreement, I will update my
https://docs.google.com/document/d/1eD8KSLwwyo2Sk64MytbmE0wBxxMlpIyEI1GRcHDk...
accordingly.
Final note: while devices take the lion's share, we will likely also need
the metadata section to store extra VM info; all the above
discussion applies there too.
+++
[1]
https://gerrit.ovirt.org/#/q/status:open+project:vdsm+branch:master+topic:vi...
- uses xmltodict
[2]
...
https://gerrit.ovirt.org/#/q/status:open+project:vdsm+branch:master+topic:vi...
ported the 'virt-metadata3' topic to pyxmlpickle
[3] https://github.com/fromanirh/pyxmlpickle

Looks good, I like the simple loads() and dumps().

Issues:
- the index attribute seems unneeded
- not sure why we need the value element; it seems that everything can be an
  item
- not sure a special root element is needed: why not parse the contents of
  any element, and raise ValueError if there is no type info?
- strange name; here are some alternative names:
  - xmlon (xml object notation)
  - pxon (python xml object notation)
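To illustrate the suggestions above, here is a minimal round-trip sketch of a type-attributed format that drops the index attribute and the <value> wrapper, serializes into any existing element, and raises ValueError when an element has no type info. This is a hypothetical illustration written for this discussion, not pyxmlpickle's actual implementation; the function names and the set of supported scalar types are assumptions.

```python
import xml.etree.ElementTree as ET

# Python type name -> parser for the element text (assumed set of scalars).
_PARSERS = {'str': str, 'int': int, 'float': float,
            'bool': lambda text: text == 'True'}


def _fill(elem, obj):
    """Write obj into elem, recording its Python type in a 'type' attribute."""
    kind = type(obj).__name__
    elem.set('type', kind)
    if isinstance(obj, dict):
        for key, value in obj.items():  # keys are assumed to be str
            _fill(ET.SubElement(elem, 'item', key=key), value)
    elif isinstance(obj, list):
        # No index attribute: document order already preserves list order.
        for value in obj:
            _fill(ET.SubElement(elem, 'item'), value)
    elif kind in _PARSERS:
        elem.text = str(obj)
    else:
        raise TypeError('unsupported type: %s' % kind)


def dump_into(elem, obj):
    """Serialize obj directly into an existing element; no <value> wrapper."""
    _fill(elem, obj)
    return elem


def load_from(elem):
    """Decode the contents of any element that carries type info."""
    kind = elem.get('type')
    if kind is None:
        raise ValueError('no type info in <%s>' % elem.tag)
    if kind == 'dict':
        return {child.get('key'): load_from(child) for child in elem}
    if kind == 'list':
        return [load_from(child) for child in elem]
    if kind in _PARSERS:
        return _PARSERS[kind](elem.text or '')
    raise ValueError('unknown type %r in <%s>' % (kind, elem.tag))


root = dump_into(ET.Element('instance'), {'foo': 'bar', 'list': [42, 3.14]})
```

Here ET.tostring(root, encoding='unicode') gives `<instance type="dict"><item key="foo" type="str">bar</item><item key="list" type="list"><item type="int">42</item><item type="float">3.14</item></item></instance>`, and load_from(root) returns the original dict.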

If there is no other library that we can use, this seems to be
the best direction.

But I think the plist XML format is much nicer; it should be easy to
support this format instead of the type attributes:

pyxmlpickle:

<pyxmlpickle>
    <value type="dict">
        <item key="foo" type="str">bar</item>
        <item key="list" type="list">
           <item index="0" type="int">42</item>
           <item index="1" type="float">3.14</item>
        </item>
    </value>
</pyxmlpickle>

plist:

<plist>
    <dict>
       <key>foo</key>
       <string>bar</string>
       <key>list</key>
       <array>
          <integer>42</integer>
          <real>3.14</real>
       </array>
    </dict>
</plist>

Less noise, more readable, easier to parse?

Since we cannot use plistlib as is, we can make
the format nicer and more pythonic:

<py>
    <dict>
       <key>foo</key>
       <str>bar</str>
       <key>list</key>
       <list>
          <int>42</int>
          <float>3.14</float>
       </list>
    </dict>
</py>

Adding a namespace will ruin this, but if it is required by libvirt
we have no choice.
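A minimal sketch of how such a format could be produced and parsed with plain etree, assuming str dict keys and an empty <none/> element for None (the proposal above does not say how None is written, so that element name is an assumption). to_element() and from_element() provide the element-level etree integration; dumps() and loads() wrap them with the <py> root. All names here are hypothetical.

```python
import xml.etree.ElementTree as ET


def to_element(obj):
    """Encode obj as an element named after its type (<dict>, <list>, <str>...)."""
    if obj is None:
        return ET.Element('none')  # assumption: None becomes an empty <none/>
    if isinstance(obj, dict):
        elem = ET.Element('dict')
        for key, value in obj.items():  # keys are assumed to be str
            ET.SubElement(elem, 'key').text = key
            elem.append(to_element(value))
        return elem
    if isinstance(obj, list):
        elem = ET.Element('list')
        for value in obj:
            elem.append(to_element(value))
        return elem
    if isinstance(obj, (bool, int, float, str)):
        elem = ET.Element(type(obj).__name__)
        elem.text = str(obj)
        return elem
    raise TypeError('unsupported type: %r' % type(obj))


_PARSERS = {'str': str, 'int': int, 'float': float,
            'bool': lambda text: text == 'True'}


def from_element(elem):
    """Inverse of to_element()."""
    if elem.tag == 'none':
        return None
    if elem.tag == 'dict':
        children = list(elem)
        # Children alternate: <key>, value, <key>, value, ...
        return {k.text: from_element(v)
                for k, v in zip(children[::2], children[1::2])}
    if elem.tag == 'list':
        return [from_element(child) for child in elem]
    if elem.tag in _PARSERS:
        return _PARSERS[elem.tag](elem.text or '')
    raise ValueError('unknown element <%s>' % elem.tag)


def dumps(obj):
    root = ET.Element('py')
    root.append(to_element(obj))
    return ET.tostring(root, encoding='unicode')


def loads(text):
    return from_element(ET.fromstring(text)[0])
```

With these helpers, dumps({'foo': 'bar', 'list': [42, 3.14]}) produces the <py> document shown above, without the indentation.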
...
[4] https://docs.python.org/2/library/plistlib.html
We cannot use it as is, since it does not support reading and writing
elements, only complete documents. We need integration with etree:
convert dict to etree element and etree element to dict. Finally, it does
not support None.
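Both limitations are easy to see with Python 3's plistlib.dumps() (the link above points to the Python 2 docs, whose API differs): the output is a complete document with an XML declaration, DOCTYPE and <plist> root, and None raises TypeError. Parsing the whole document with etree and keeping the inner element is one possible workaround, shown here only as a sketch.

```python
import plistlib
import xml.etree.ElementTree as ET

data = {'foo': 'bar', 'list': [42, 3.14]}

# plistlib.dumps() returns bytes for a complete document (XML declaration,
# DOCTYPE and <plist> root), not an element we could embed in <metadata>.
doc = plistlib.dumps(data)

# Possible workaround: parse the whole document and keep the inner <dict>.
inner = ET.fromstring(doc)[0]
print(ET.tostring(inner, encoding='unicode'))

# None is not a plist type, so plistlib refuses to serialize it.
try:
    plistlib.dumps({'none': None})
except TypeError as exc:
    print('TypeError:', exc)
```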

We can borrow code from this module; it is well tested, and the author
is the same as the author of simplejson and other nice stuff.
...
[5] https://github.com/martinblech/xmltodict
xmltodict converts everything to strings and does not parse the
types from the XML; this is very far from the json module.

I found also dicttoxml, which does keep the types, but does
not support parsing xml.

I don't see any value in including this dependency and hacking
around it to make it do what we want.

There is also:
https://pypi.python.org/pypi/xmljson

Using the parker module seems nice:
...
...
...
from xml.etree.ElementTree import tostring, fromstring
d = {'devices': [{'int': 42, 'none': None, 'float': 3.14, 'string': 'value'}]}
from xmljson import parker
root = parker.etree(d)[0]
tostring(root)
# '<devices><int>42</int><none>None</none><float>3.14</float><string>value</string></devices>'
parker.data(root)
# OrderedDict([('int', 42), ('none', 'None'), ('float', 3.14), ('string', 'value')])
But the list was converted to dict :-)

It seems that this project focuses on converting arbitrary XML to JSON,
while we care about converting JSON to XML and back; we
don't need to handle arbitrary XML.

The integration with etree is nice, we should have this.

[6]
...
https://gerrit.ovirt.org/#/q/status:open+project:vdsm+branch:master+topic:vm...
[7] https://gerrit.ovirt.org/#/c/72880/15/lib/vdsm/virt/vmdevices/core.py
--
Francesco Romani
Red Hat Engineering Virtualization R & D
IRC: fromani
_______________________________________________
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel

Nir Soffer