Ionrock Dot Org

by Eric Larson

My Weblog

XML Stubs

Lately I have been considering the best way to stub out XML. Oftentimes I want to start with some arbitrary XML and add extra elements or change defaults, so the question is how best to handle it.

The first option is to create a stub. For example, a potential Atom Entry stub might be something like this:


atom_stub = '''<?xml version="1.0"?>
<entry xmlns="http://www.w3.org/2005/Atom">
  <title>  </title>
  <id>  </id>
  <updated>  </updated>
  <content type="xhtml">
    <div xmlns="http://www.w3.org/1999/xhtml"></div>
  </content>
</entry>'''

The problem with this is that as you begin trying to organize these kinds of snippets, you end up wanting to go ahead and initialize values as well as changing the stub slightly. In the above example, if you wanted to change the atom:content type to “html”, you would need to change the attribute and remove the “div” element.
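As a rough sketch of that change, here is the stub edited with the standard library's ElementTree (swapped in for illustration; it is not the library used elsewhere in this post). The helper name as_html_entry is made up for the example.

```python
import xml.etree.ElementTree as ET

ATOM_NS = 'http://www.w3.org/2005/Atom'
XHTML_NS = 'http://www.w3.org/1999/xhtml'

ATOM_STUB = '''<?xml version="1.0"?>
<entry xmlns="http://www.w3.org/2005/Atom">
  <title>  </title>
  <id>  </id>
  <updated>  </updated>
  <content type="xhtml">
    <div xmlns="http://www.w3.org/1999/xhtml"></div>
  </content>
</entry>'''


def as_html_entry(stub_text):
    """Parse the stub and switch atom:content from "xhtml" to "html".

    Hypothetical helper, not part of any library.
    """
    entry = ET.fromstring(stub_text)
    content = entry.find('{%s}content' % ATOM_NS)
    # Step 1: change the attribute.
    content.set('type', 'html')
    # Step 2: remove the XHTML wrapper div, which html content does not use.
    for div in content.findall('{%s}div' % XHTML_NS):
        content.remove(div)
    content.text = ''
    return entry


entry = as_html_entry(ATOM_STUB)
content = entry.find('{%s}content' % ATOM_NS)
print(content.get('type'))
```

Even for this one variation, the stub needs two coordinated edits, which is the kind of drift that makes a plain string awkward to maintain.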

Even assuming you don’t need to change stubs very often, you still might create a rather large library of stubs. The obvious way to handle this would be to start giving stubs (or sets of stubs) their own files. This presents a potential problem: depending on your needs, you’ll end up with either a file for each stub or a set of modules that have nothing but strings in them. The file case requires you to consider some sort of resolver or to use pkg_resources to grab the values reliably. The string case could be problematic due to potential escaping issues, as well as generally being unfriendly to edit.
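For the file case, a resolver can be very small. The sketch below assumes a made-up layout of one .xml file per stub in a single directory; load_stub and the atom_entry file name are hypothetical, and a packaged application would more likely look the directory up through its packaging tools.

```python
import io
import os
import tempfile


def load_stub(name, stub_dir):
    """Return the text of the stub called `name` found in `stub_dir`.

    Assumes (hypothetically) that each stub lives in its own `<name>.xml` file.
    """
    path = os.path.join(stub_dir, name + '.xml')
    with io.open(path, encoding='utf-8') as handle:
        return handle.read()


# A throwaway directory stands in for a real stub folder.
stub_dir = tempfile.mkdtemp()
with io.open(os.path.join(stub_dir, 'atom_entry.xml'), 'w', encoding='utf-8') as handle:
    handle.write(u'<entry xmlns="http://www.w3.org/2005/Atom"/>')

stub = load_stub('atom_entry', stub_dir)
print(stub)
```

The lookup itself is trivial; the real work is deciding where stub_dir comes from once the code is installed somewhere else.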

The other option is to create classes, or more generally, Python code for your stubs. This addresses many of the issues mentioned above but also presents its own problems. For example, here would be a potential stub of the above example using Amara:


import amara

ATOM_NS = u'http://www.w3.org/2005/Atom'
XHTML_NS = u'http://www.w3.org/1999/xhtml'

class AtomEntry(object):
    def __init__(self, details=None):
        # Use None rather than a shared mutable {} default.
        self.details = details or {}
        self.entry = self._initialize()

    def _initialize(self):
        # Build the empty entry one element at a time.
        doc = amara.create_document(u'entry', ATOM_NS)
        doc.entry.xml_append(doc.xml_create_element(u'id', ATOM_NS))
        doc.entry.xml_append(doc.xml_create_element(u'title', ATOM_NS))
        doc.entry.xml_append(doc.xml_create_element(u'updated', ATOM_NS))
        doc.entry.xml_append(
            doc.xml_create_element(
                u'content', ATOM_NS,
                attributes={u'type': u'xhtml'}
            )
        )
        doc.entry.content.xml_append(doc.xml_create_element(u'div', XHTML_NS))
        return doc

This is relatively clean, but as you start trying to initialize data, such as adding default text or including text from parameters (i.e., the details dict passed to the constructor), the code will become much larger. Also, it is not as clear what the code is actually doing. It doesn’t take long to see it is working with XML, but actually knowing what that stub will look like in the end without an understanding of the library could create a learning curve.
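To give a feel for how the code grows once values come from parameters, here is a minimal sketch using the standard library's ElementTree instead of Amara (a substitution made only so the example is self-contained). The build_entry name and the 'content' key in details are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

ATOM_NS = 'http://www.w3.org/2005/Atom'
XHTML_NS = 'http://www.w3.org/1999/xhtml'


def build_entry(details=None):
    """Build an Atom entry, filling element text from `details` where given."""
    details = details or {}
    entry = ET.Element('{%s}entry' % ATOM_NS)
    # Simple text elements default to an empty string when no value is passed.
    for name in ('id', 'title', 'updated'):
        child = ET.SubElement(entry, '{%s}%s' % (ATOM_NS, name))
        child.text = details.get(name, '')
    content = ET.SubElement(entry, '{%s}content' % ATOM_NS, {'type': 'xhtml'})
    div = ET.SubElement(content, '{%s}div' % XHTML_NS)
    # 'content' is a hypothetical key holding plain text for the div.
    div.text = details.get('content', '')
    return entry


entry = build_entry({'title': 'XML Stubs', 'id': 'urn:example:1'})
print(ET.tostring(entry).decode('utf-8'))
```

Every new default or parameter adds another line or branch, and the shape of the final document is still something you have to reconstruct in your head.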

Some other options to consider would be using something like XSLT plus a valid base document to create stubs. You still end up with similar file resolution vs. module issues, but you can keep your base stub the same and simply modify the XSLT to alter the stub. This would probably not keep the code small, but it would provide a physical distinction between the Python and XML stub code as well as make it somewhat clearer what things will look like. Yet another option is to try to improve the syntax by finding patterns in your stub and creating a wrapper around some internal XML object like Amara. You could then potentially use a set of core data types to create the XML. It might make stubs look something like:


atom_stub = {
    'entry': [
        {'title': ''},
        {'id': ''},
        {'updated': ''},
        {('content', ('type', 'xhtml')):
            {'div': ''}
        },
    ],
}

This starts to look like a reasonable option, but I honestly would avoid it. The syntax seems error prone and you have to create your own parser, all because you didn’t want to write things out via a DOM-ish syntax. While I am not a fan of DOM proper, Amara’s interface for writing XML is actually pretty reasonable. There is no silver bullet to these issues, but it is something to think about, as it is always just grunt work creating stubs that are helpful and maintainable.
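To make the "create your own parser" cost concrete, here is one possible sketch of such a parser, built on the standard library's ElementTree. It assumes the children of an element are held in a list so the literal is valid Python, that a tuple key means a name followed by attribute pairs, and it ignores namespaces entirely. The build function is hypothetical.

```python
import xml.etree.ElementTree as ET

atom_stub = {
    'entry': [
        {'title': ''},
        {'id': ''},
        {'updated': ''},
        {('content', ('type', 'xhtml')):
            {'div': ''}
        },
    ],
}


def build(spec):
    """Turn a one-key dict {name_or_tuple: text_or_children} into an Element."""
    (key, value), = spec.items()
    if isinstance(key, tuple):
        # ('content', ('type', 'xhtml')) -> element name plus attribute pairs
        name, attrs = key[0], dict(key[1:])
    else:
        name, attrs = key, {}
    element = ET.Element(name, attrs)
    if isinstance(value, list):
        # A list holds several child specs, in document order.
        for child_spec in value:
            element.append(build(child_spec))
    elif isinstance(value, dict):
        # A dict holds exactly one child spec.
        element.append(build(value))
    else:
        element.text = value
    return element


entry = build(atom_stub)
print(ET.tostring(entry).decode('utf-8'))
```

Even this small version has to invent rules for attributes, child ordering, and text, and it still says nothing about namespaces, which is a fair illustration of why the syntax feels error prone.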

If anyone does have any other good ideas, please share!

Posted Tue Sep 18 15:42:45 2007 by Eric Larson

accountpackage - Basic Functionality Working!


This is something of a proof of concept with lots of different ways to do things. It is essentially just a very simple AtomPub store for credits and debits. That's it. It has a restful-ish interface via a POST to {base_url}/credit/ or {base_url}/debit/ which adds a value to the credits or debits respectively. The total is found by adding up all the credits and subtracting the debits. I am sure this will hit a wall at some point, but I realize that the essence of the total can be cached and stored in a collection (!) and client side tools can add a few hundred values easily enough.
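The arithmetic behind the total is simple enough to sketch. The class below is an in-memory stand-in written for illustration, not accountpackage's actual code; the Account name and its methods are assumptions, and the AtomPub storage and HTTP layer are left out.

```python
class Account(object):
    """In-memory stand-in for the credit and debit collections."""

    def __init__(self):
        self.credits = []
        self.debits = []

    def credit(self, amount):
        # Plays the role of a POST to {base_url}/credit/
        self.credits.append(amount)

    def debit(self, amount):
        # Plays the role of a POST to {base_url}/debit/
        self.debits.append(amount)

    def total(self):
        # Add up all the credits and subtract the debits.
        return sum(self.credits) - sum(self.debits)


account = Account()
account.credit(100)
account.credit(25)
account.debit(40)
print(account.total())  # prints 85
```

Recomputing the sums on every request is what would eventually hit a wall, which is where caching the total comes in.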

The todo will be to add things like tags and descriptions so you can say things like "Made money selling fake drugs to stupid kids... after all drugs don't pay...". I think this points to a larger issue that amplee is actually covering very well, which is rendering feeds via attributes/elements. All the feeds should be GETable and then easily joined via indexes of elements and/or attributes. Bright Content does this to an extent, but a more generic means of indexing via XPath and filtering via similar methods seems positive.

That is probably the next step, but I'm walking slowly in order to not create something like Yahoo! Pipes. It is a lot of cool, but real usage is always going to be more customized. That means a couple libraries and a bunch of implementations specific to a domain... at least IMHO.

Posted Thu Sep 13 23:34:24 2007 by Eric Larson

Don’t make it a design problem

Some recent discussions made me think about what kinds of problems you want to provide users. That seems like heresy, in that you really never want to give users problems, but in the real world the facts suggest that users will have issues with software. As programmers, it is our responsibility to consider what will be an issue in terms of finding potential bugs. We should also go beyond bugs and consider what situational issues could arise from using our software, along with how users can (and will) work around known issues. This is different from something like test coverage, where a user hits a corner case that wasn’t coded for by the developer. The “issues” in this case are situational and would be different for every single user depending on how they consider the application within the scope of the virtual world.

In the world of HCI, this is pretty much analyzing how well the application design matches the perceived viewpoint of the user. In other words, how well does the application meet users’ expectations? This is where you see metaphors come into play and dictate a real world situation where people map experiences on top of some application they are using. The problem goes beyond this though, because even within a metaphor or design, there are questions the user must answer in order to use the application. This is where a great application can buckle under the greatest design.

In an application like MS Word, there are many types of users. There are those people who struggle to get hanging indents and work with random changes affecting the entire document, and users who manage to reflect daily on proper use of obscure dialogs and settings. In both of these cases, the user is forced to make a design decision in how they create their document. This is very similar to deciding which objects a developer will use to create an application, which is always a very subjective decision depending on the constraints and requirements. It is this kind of issue that developers, if possible, should avoid at all costs.

Take Subversion, for example. Its developers made a decision to push the terminology of branches, trunk, and tags to be a documentation issue. They rely on conventions to push users towards a workflow instead of hard coding those items within the application. From the developer standpoint this is great because there is no end to the wealth of configurations and workflows you can use. But from the standpoint of someone trying to get something done, it doesn’t make a bit of difference. I am not saying it is not handy or valuable, but rather that there was some cost for the added flexibility.

When you are creating an application for a more traditional user, it is important that they do not get hung up on details that do not reflect getting actual work done. In the case of Subversion, a poorly laid out repository can be a real pain. In the case of a traditional user application, it can be a crippling bit of flexibility that has catastrophic effects on both the productivity with the application and its continued success. This does not mean all applications should lock users into their internal model, but rather that sacrificing flexibility in order to reduce the design issues can prevent issues from ever cropping up. In addition to this, it can create the market for other orthogonal applications that can complement each other by completely different means. Applications centered around deployment are one good example of a tool that can be complementary to a confined application.

It is not that flexibility is inherently bad, but forcing the user to make a design decision will always leave the door open for poor experiences, which, in the long run, make the application difficult to use.

Posted Mon Sep 10 22:09:28 2007 by Eric Larson

Mapping is Bad

Lately there has been a good deal of discussion regarding Object Relational Mappers for Python. The discussion (from what I can tell) stemmed from using things like Elixir with SQLAlchemy and introducing more problems. I think this issue is almost identical to the different marshalling libraries for XML. The real issue is a perceived deficiency in some core data model compared to the programming model. There is a concern that having to think in two different conceptual paradigms will greatly slow development. Another aspect of this is the maintenance of a system with different languages and models sprinkled around. While there are some tough issues you can run into when constantly switching between models, normalizing each model to the programming paradigm is extremely costly.

When thinking about this issue, I always come back to casting. Before I understood generics in C#, I was constantly using generic collections where I always had to be considering the type of the object. Even though I eventually realized my issues were solvable, it became clear that constantly having to reshape information in order to use it is time consuming. One thing someone can do is to create a translator or interface to more easily make transitions between types. This is exactly the path an ORM takes, as does an XML marshalling library. It aims to solve the problem of translating some database or XML to the model of the programming language, which is, more often than not, an object-oriented model.

Another path is to look at the problem from a build management standpoint. Instead of thinking of how to constantly translate data to one paradigm, try programming according to the data. This is the natural pattern for XML when you use tools such as XPath and XSLT. For example, in the XML as data case, you can just query the document via XPath for some value or run an XSLT on the file to change it to what you need. At this point the problem is not how to deal with the data, but rather how to deal with your build system including potentially many different models. The question comes closer to whether I should name my folder “xslt” or “transforms” and how I resolve those file locations.
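As a small illustration of querying XML as data, the sketch below uses the limited XPath support in the standard library's ElementTree; a fuller XPath engine would allow richer expressions. The entry text and its values are sample data made up for the example.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI map used when evaluating the path expressions.
NAMESPACES = {'atom': 'http://www.w3.org/2005/Atom'}

# Sample document; the values are invented for illustration.
ENTRY = '''<entry xmlns="http://www.w3.org/2005/Atom">
  <title>Sample title</title>
  <id>urn:example:entry-1</id>
  <updated>2007-01-01T00:00:00Z</updated>
  <content type="text">Sample content</content>
</entry>'''

entry = ET.fromstring(ENTRY)

# Ask the document for the value directly instead of mapping it to objects first.
title = entry.findtext('atom:title', namespaces=NAMESPACES)
content_type = entry.find('atom:content[@type]', NAMESPACES).get('type')

print(title)
print(content_type)
```

No intermediate classes are involved: the path expression is the only description of the data the code needs.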

This second case is not a trivial issue of course, but it is one that has been solved. In Python, we have package resource tools such as eggs, easy_install, and setuptools. C# also has compiled transforms and the ability to save an XSLT as a dll, which means you could work with it from the GAC or use it via traditional Visual Studio project inclusion. None of this is perfect, but by changing the problem space from an issue of translation between data types, the issue becomes one of working with files. This problem has been around for quite a long time, so there are plenty of ways to find solutions that can fit within any application.

That said, tools such as SQLAlchemy and Amara provide a great way to get the simple stuff done quickly while staying out of the way when things get complicated. The cost is a slightly less optimized API, but overall the benefits are huge because the maintenance question is already answered with the fuller featured API. It can be tough at times to switch models all the time, but with the web being what it is and developers already feeling pressure to understand more languages and paradigms, accepting the challenge only seems like a good first step to eventually finding better solutions to constant transitions.

Posted Wed Sep 5 16:15:35 2007 by Eric Larson
Created using Python, jQuery and Emacs