Lately I have been considering the best way to stub out XML. Oftentimes I want to start with some arbitrary XML and add extra elements or change defaults, so the question is how best to handle that.
The first option is to create a stub. For example, a potential Atom Entry stub might be something like this:
atom_stub = '''<?xml version="1.0"?>
<entry xmlns="http://www.w3.org/2005/Atom">
  <title> </title>
  <id> </id>
  <updated> </updated>
  <content type="xhtml">
    <div xmlns="http://www.w3.org/1999/xhtml"></div>
  </content>
</entry>
'''
The problem with this is that as you begin trying to organize these kinds of snippets, you end up wanting to initialize values as well as change the stub slightly. In the above example, if you wanted to change the atom:content type to “html”, you would need to change the attribute and remove the “div” element.
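As a rough sketch of what that one small change involves, here is the edit done by parsing the string stub and modifying it. I am using the standard library's xml.etree.ElementTree here only so the example is self-contained, and the helper name to_html_content is made up for illustration.

```python
import xml.etree.ElementTree as ET

ATOM_NS = 'http://www.w3.org/2005/Atom'
XHTML_NS = 'http://www.w3.org/1999/xhtml'

atom_stub = '''<?xml version="1.0"?>
<entry xmlns="http://www.w3.org/2005/Atom">
  <title> </title>
  <id> </id>
  <updated> </updated>
  <content type="xhtml">
    <div xmlns="http://www.w3.org/1999/xhtml"></div>
  </content>
</entry>
'''


def to_html_content(stub, html_text=''):
    """Hypothetical helper: parse the stub and switch atom:content to type="html"."""
    entry = ET.fromstring(stub)
    content = entry.find('{%s}content' % ATOM_NS)
    content.set('type', 'html')
    # The XHTML div wrapper only makes sense for type="xhtml", so drop it.
    for div in content.findall('{%s}div' % XHTML_NS):
        content.remove(div)
    # For type="html" the markup is carried as escaped text.
    content.text = html_text
    return entry


entry = to_html_content(atom_stub, '<p>Hello</p>')
print(ET.tostring(entry, encoding='unicode'))
```

Even for a single attribute change, the caller has to know the namespaces and the shape of the stub, which is the organizational cost described above.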
Even assuming you don’t need to change stubs very often, you still might end up with a rather large library of them. The obvious way to handle this would be to start giving stubs (or sets of stubs) their own files. This presents a potential problem: depending on your needs, you’ll end up with either a file for each stub or a set of modules that contain nothing but strings. The file case requires you to consider some sort of resolver, or to use pkg_resources, to grab the values reliably. The string case could be problematic due to potential escaping issues, as well as generally being unfriendly to edit.
The other option is to create classes, or more generally, Python code for your stubs. This addresses many of the issues mentioned above but also presents its own problems. For example, here is a potential stub of the above example using Amara:
import amara

ATOM_NS = u'http://www.w3.org/2005/Atom'
XHTML_NS = u'http://www.w3.org/1999/xhtml'


class AtomEntry(object):
    def __init__(self, details=None):
        # Use None instead of a mutable {} default, which would be shared between calls.
        self.details = details or {}
        self.entry = self._initialize()

    def _initialize(self):
        # Build the empty entry element by element.
        doc = amara.create_document(u'entry', ATOM_NS)
        doc.entry.xml_append(doc.xml_create_element(u'id', ATOM_NS))
        doc.entry.xml_append(doc.xml_create_element(u'title', ATOM_NS))
        doc.entry.xml_append(doc.xml_create_element(u'updated', ATOM_NS))
        doc.entry.xml_append(
            doc.xml_create_element(
                u'content', ATOM_NS,
                attributes={u'type': u'xhtml'}
            ))
        # The XHTML content is wrapped in a namespaced div.
        doc.entry.content.xml_append(doc.xml_create_element(u'div', XHTML_NS))
        return doc
This is relatively clean, but as you start trying to initialize data, such as adding default text or including text from parameters (i.e. the details dict passed to the constructor), the code will become much larger. It is also less clear what the code is actually doing. It doesn’t take long to see that it is working with XML, but knowing what the stub will look like in the end requires an understanding of the library, which creates a learning curve.
Another option to consider would be using something like XSLT plus a valid base document to create stubs. You still end up with the same file-resolution versus module issues, but you can keep your base stub the same and simply modify the XSLT to alter the stub. This would probably not keep the code small, but it would provide a physical distinction between the Python and the XML stub code, as well as make it somewhat clearer what things will look like. Yet another option is to try to improve the syntax by finding patterns in your stubs and creating a wrapper around some internal XML object like Amara’s. You could then potentially use a set of core data types to create the XML. It might make stubs look something like:
atom_stub = {
    'entry': [
        {'title': ''},
        {'id': ''},
        {'updated': ''},
        {('content', ('type', 'xhtml')):
            {'div': ''}
        },
    ],
}
This starts to look like a reasonable option, but I honestly would avoid it. The syntax seems error-prone, and you have to create your own parser, all because you didn’t want to write things out via a DOM-ish syntax. While I am not a fan of DOM proper, Amara’s interface for writing XML is actually pretty reasonable. There is no silver bullet for these issues, but it is something to think about, as creating stubs that are helpful and maintainable is always just grunt work.
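To give a sense of the parser such a syntax would need, here is a minimal sketch. It assumes a slightly different, tuple-based stub format of my own invention, (namespace, tag, attributes, body), and builds the tree with the standard library's xml.etree.ElementTree rather than Amara. The build function is hypothetical, not part of any library.

```python
import xml.etree.ElementTree as ET

ATOM_NS = 'http://www.w3.org/2005/Atom'
XHTML_NS = 'http://www.w3.org/1999/xhtml'

# Hypothetical stub format: (namespace, tag, attributes, children-or-text).
atom_stub = (ATOM_NS, 'entry', {}, [
    (ATOM_NS, 'title', {}, ''),
    (ATOM_NS, 'id', {}, ''),
    (ATOM_NS, 'updated', {}, ''),
    (ATOM_NS, 'content', {'type': 'xhtml'}, [
        (XHTML_NS, 'div', {}, ''),
    ]),
])


def build(node):
    """Recursively turn one (namespace, tag, attributes, body) tuple into an Element."""
    namespace, tag, attributes, body = node
    element = ET.Element('{%s}%s' % (namespace, tag), attributes)
    if isinstance(body, str):
        # A string body becomes the element's text.
        element.text = body
    else:
        # Anything else is treated as a list of child nodes.
        for child in body:
            element.append(build(child))
    return element


entry = build(atom_stub)
print(ET.tostring(entry, encoding='unicode'))
```

Even this small version has to pick conventions for namespaces, attributes and mixed content, which is exactly the kind of homegrown format I would rather not maintain.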
If anyone does have any other good ideas, please share!