Wednesday, 10 June 2009

XML, Objects and Python

I briefly mentioned in my previous post that I had created two object-related schemas for XML, and then elaborated on XOMS, the mapping schema. This time I'll elaborate on XOS - XML Object Schema. This is a schema I created to allow the specification of classes in XML (although I've called them objects for the XML syntax). I consider this less interesting than the mapping, although the implementation of the XOS processor in Python led to some more interesting problems to overcome.

First though, a quick overview of the XML schema. This is simpler than the XOMS schema. Here is a schema for the same data model as seen previously:

<xos:object name="Person">
  <xos:attribute name="name" type="xos:string" />
  <xos:attribute name="address" type="Address" />
</xos:object>
<xos:object name="Address">
  <xos:attribute name="address" />
</xos:object>

Again, this is fairly self-explanatory. Given this schema, the XOS processor would produce two classes, one called Person, one called Address, with the attributes specified. In most languages, the XOS processor would really be more of a pre-processor - taking an object model and outputting a set of files that define the classes required. In Python though, I could go a step further and process a model at run-time, creating the classes dynamically and registering them for use just like any other class.
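To give a feel for the first half of that job, here is a minimal sketch of reading a model into plain Python data with ElementTree. It is not the actual XOS processor: the xos:model wrapper element, the urn:example:xos namespace URI and the read_model name are all made up for illustration (the parser needs some declaration for the xos: prefix), and the closing tags are filled in so that the sample parses.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; the post does not say what xos: is bound to.
XOS_NS = "urn:example:xos"

# The sample model, wrapped in a made-up root element that declares the prefix.
MODEL = """
<xos:model xmlns:xos="urn:example:xos">
  <xos:object name="Person">
    <xos:attribute name="name" type="xos:string" />
    <xos:attribute name="address" type="Address" />
  </xos:object>
  <xos:object name="Address">
    <xos:attribute name="address" />
  </xos:object>
</xos:model>
"""


def read_model(xml_text):
    """Return {class name: [(attribute name, declared type or None), ...]}."""
    root = ET.fromstring(xml_text.strip())
    model = {}
    for obj in root.iter("{%s}object" % XOS_NS):
        attributes = [
            (attr.get("name"), attr.get("type"))
            for attr in obj.findall("{%s}attribute" % XOS_NS)
        ]
        model[obj.get("name")] = attributes
    return model


model = read_model(MODEL)
```

The result is just a dictionary keyed by class name, which a later step could walk to create the classes themselves.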

This is where the more interesting stuff came in - dynamically creating classes in Python. It turns out that doing this is remarkably easy, with the following code:

import sys

def create_object(object_name):
    try:
        # Nothing to do if __main__ already has something by this name.
        getattr(sys.modules['__main__'], object_name)
    except AttributeError:
        # Define a blank 'template' class, then rename and register it.
        class BaseObject(object):
            pass
        BaseObject.__name__ = object_name
        BaseObject.__module__ = '__main__'
        setattr(sys.modules['__main__'], object_name, BaseObject)

This surprisingly simple bit of code creates a 'template' class derived from object, assigns it a name and a module and then registers it in the '__main__' module under that name. It doesn't create attributes on the class, but with Python this isn't required as setattr() can add arbitrary attributes to an object or class. I have plans to add some ability for this in the future though, as I could then provide chunks of Python code to do things such as create SQLAlchemy database objects on the fly.
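As a small usage sketch of the idea, assuming a variant of the function that also returns the class (the return values are my addition for convenience, not part of the code above), creating a class and hanging attributes off an instance could look like this:

```python
import sys


def create_object(object_name):
    """Return the class called object_name in __main__, creating it if needed.

    Returning the class is an assumption made for this example.
    """
    main = sys.modules['__main__']
    try:
        return getattr(main, object_name)
    except AttributeError:
        class BaseObject(object):
            pass
        BaseObject.__name__ = object_name
        BaseObject.__module__ = '__main__'
        setattr(main, object_name, BaseObject)
        return BaseObject


Person = create_object('Person')

# No attributes were declared on the class; setattr() adds them as needed.
joe = Person()
setattr(joe, 'name', 'Joe Bloggs')

# Asking again hands back the class that was registered the first time.
same_class = create_object('Person')
```

Because the class is registered on '__main__', a second call with the same name finds the existing class rather than building a new one.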

The next step is then allowing these to be defined in arbitrary modules. For this, I needed another function:

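import sys

def create_module(module_name):
    # Nothing to do if the module has already been loaded.
    if module_name in sys.modules:
        return
    previous_module = sys.modules['__main__']
    full_mod_name = ""
    for mod_name in module_name.split('.'):
        # Build up the dotted name, avoiding a leading '.' on the first part.
        if full_mod_name:
            full_mod_name = ".".join([full_mod_name, mod_name])
        else:
            full_mod_name = mod_name
        try:
            previous_module = getattr(previous_module, mod_name)
        except AttributeError:
            # type(previous_module) is the module type, so this makes a new module.
            mod = type(previous_module)(name=full_mod_name)
            sys.modules[full_mod_name] = mod
            setattr(previous_module, mod_name, mod)
            previous_module = mod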

This is a fairly interesting function too. It first checks whether the module being asked for has already been loaded (the first line of the function) and, if so, does nothing. Otherwise the function loops through the module name split on '.' and builds up the fully qualified name as it goes. For each iteration, it first checks whether the previous module already has a module with the expected name. If it doesn't, it creates a module with the fully qualified name, registers it in the sys.modules dictionary under that name, then uses setattr() on the previous module to attach it under the individual name. Either way, the module for this part of the name becomes the previous module for the next iteration.
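To show what registering in sys.modules buys you, here is a hedged sketch of a close cousin of that function. The names ensure_module and xos_demo.people are hypothetical, it uses types.ModuleType directly instead of type(previous_module), and it consults sys.modules rather than getattr() at each step, so treat it as a variation rather than the code above.

```python
import importlib
import sys
import types


def ensure_module(module_name):
    """Create any missing modules along a dotted path and return the last one."""
    parent = sys.modules['__main__']
    full_name = ''
    for part in module_name.split('.'):
        full_name = part if not full_name else full_name + '.' + part
        module = sys.modules.get(full_name)
        if module is None:
            module = types.ModuleType(full_name)
            sys.modules[full_name] = module
        # Attach under the short name so that parent.part resolves.
        setattr(parent, part, module)
        parent = module
    return module


people = ensure_module('xos_demo.people')

# Put a dynamically built class into the dynamically built module.
people.Address = type('Address', (object,), {'__module__': 'xos_demo.people'})

# The import machinery finds the module in sys.modules, so imports just work.
imported = importlib.import_module('xos_demo.people')
from xos_demo.people import Address
```

Nothing exists on disk for xos_demo, yet both the import_module() call and the from-import succeed because the import system checks sys.modules first.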

The next step for XOS would be to have an inheritance mechanism modelled in the XML. Some preliminary experiments in Python have shown that I'll need to use a metaclass to correctly set the base class for the objects created, but I haven't finished this yet.
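As a sketch of one direction this could take: the built-in type is itself the default metaclass, and calling it with three arguments builds a class with whatever bases are passed in. The (name, base) pairs below stand in for whatever the XML would eventually say; they and build_classes are hypothetical, and this is not the finished mechanism.

```python
def build_classes(specs):
    """Build classes from (class name, base class name or None) pairs.

    Assumes every base is listed before the classes that derive from it.
    """
    classes = {}
    for name, base_name in specs:
        base = classes[base_name] if base_name else object
        # type(name, bases, namespace) creates the class with the given base.
        classes[name] = type(name, (base,), {})
    return classes


classes = build_classes([('Person', None), ('Employee', 'Person')])
```

Here Employee ends up as a genuine subclass of Person, so isinstance() and method lookup behave as they would for hand-written classes.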

In the end, what this experiment has shown is that it's possible to re-implement Python's object and module creation using Python and your own syntax. More surprisingly, it has shown that the actual creation is easy! Fewer than 20 lines of code to create objects and modules, something that wouldn't even be possible in something like C++ and would require huge amounts of reflection code in C# or Java.

Tuesday, 9 June 2009

XML and objects - my approach

So recently, I've been working on a project where I have the two extremes... a very complex tree structure in XML format and a very complex object structure in Python (which is incidentally mapped to an equally complex 56-table database using Elixir). The object structure would make no sense as a tree and XML doesn't work well as objects, so I was left to consider alternatives.

My solution was to create a way of specifying the relationship between the XML nodes and the objects, so I created an XML schema language of sorts to do this (well, I created two, but the second one was more for completeness. I may mention it in a later blog post). I've unimaginatively called this schema the 'XML Object Mapping Schema' (XOMS for short) and a quick sample is as follows:

<person xoms_to="Person" name="">
  <address xoms_to="Address" xoms_link="Person.address" xoms_content="Address.address"></address>
</person>

It should be fairly obvious what the above should do. It says that a 'person' element should be mapped to a 'Person' object and its 'name' attribute should be shoved into the '' property. The 'address' node should similarly be mapped to an 'Address' object and the content of an 'address' element should be shoved into the 'Address.address' property. The xoms_link says that this Address object should then be linked to the Person.address property. So this XML:

<person name="Joe Bloggs">
  <address>1 Somewhere drive, Someplace, Earth</address>
</person>

would produce the following objects:

[<Person name="Joe Bloggs" address="<Address address="...">" >, <Address address="...">]

with a Python implementation.
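To make the walk-through concrete, here is a toy interpreter for just the three xoms_ attributes shown above. It is a sketch and not the library itself, and it leans on several assumptions of mine: closing tags and the address element are filled in so both samples parse, a plain mapping attribute with an empty value is copied to a same-named property, only the part after the '.' in xoms_link and xoms_content is used, and the link target is taken to be the enclosing object. The Person and Address classes, CLASSES and build are all hypothetical names.

```python
import xml.etree.ElementTree as ET

MAPPING = """
<person xoms_to="Person" name="">
  <address xoms_to="Address" xoms_link="Person.address"
           xoms_content="Address.address"></address>
</person>
"""

DATA = """
<person name="Joe Bloggs">
  <address>1 Somewhere drive, Someplace, Earth</address>
</person>
"""


class Person(object):
    pass


class Address(object):
    pass


CLASSES = {'Person': Person, 'Address': Address}


def build(data_node, map_node, parent=None, out=None):
    """Create objects for data_node as described by map_node; return them all."""
    out = [] if out is None else out
    obj = CLASSES[map_node.get('xoms_to')]()
    out.append(obj)
    # Plain (non-xoms) attributes: copy the value from the data node.
    for key, target in map_node.attrib.items():
        if key.startswith('xoms_'):
            continue
        # Assumption: an empty target means "use the attribute's own name".
        prop = target.split('.')[-1] if target else key
        setattr(obj, prop, data_node.get(key))
    # xoms_content: element text goes into the named property.
    content = map_node.get('xoms_content')
    if content:
        setattr(obj, content.split('.')[-1], (data_node.text or '').strip())
    # xoms_link: attach this object to the enclosing object.
    link = map_node.get('xoms_link')
    if link and parent is not None:
        setattr(parent, link.split('.')[-1], obj)
    # Recurse into child mappings, matching data children by tag name.
    for child_map in map_node:
        for child_data in data_node.findall(child_map.tag):
            build(child_data, child_map, obj, out)
    return out


objects = build(ET.fromstring(DATA.strip()), ET.fromstring(MAPPING.strip()))
person, address = objects
```

The returned list holds the Person first and the Address second, with the Address also reachable as person.address.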

I've got a fair few more features than this now implemented in a Python library that allows automated object construction from XML to a fairly arbitrary object structure, including the ability to call functions on the objects with parameters taken from subnodes (for when I can't manage what I need just with XOMS). It's not perfect but it's doing well enough to populate the above-mentioned data models and it's a lot leaner than my original approach. With about 500 lines of code and about 100 lines of XML I've replaced the same amount of XML-processing code, except that my new approach is complete where my original code was only mapping about 10 objects and not all their attributes. If I'd gone down that road I'd have gotten a couple of thousand lines of code that would have been messy, horrible to maintain and impossible to follow. This way I have 500 lines of code that perform some magic and all of the interesting stuff is kept in the XOMS file where I can easily see what's happening and change it.

So I've now become guilty of that heinous crime - solving problems with XML by throwing more XML at it... this time it seems to have worked though!