Monday 7 December 2009

NWRUG Code Kwoon, run by Ashley Moran of PatchSpace

I recently made my first foray into the deep, geeky depths of local user groups and went to a session being run by NWRUG (the North West Ruby User Group). The session was a 'Code Kwoon' designed to introduce people to the wonders of RSpec and Behaviour Driven Development and was run by Ashley Moran of the company PatchSpace.

Now, if you've been involved in Ruby at all in recent years, you probably will have at least heard of RSpec and BDD, but if (like me) you were living like a hermit crab and only sticking your head out from under a rock occasionally then you probably won't have gone any further than that. I personally had abandoned my rock a month or so previously and had started delving into RSpec and related BDD tools in a fairly serious manner but was on the lookout for anything that would help improve my knowledge of this area. This Code Kwoon seemed like a good opportunity.

Unfortunately, the Kwoon was pitched at an even more introductory level than mine, being aimed at people who had just arrived on the BDD planet and were blinking, stepping into the RSpec sun. But before I go into more detail on that, I should step back and give a brief (and probably wrong) explanation of RSpec, BDD and a 'Code Kwoon'.

So, RSpec. Where to start? Well, it's difficult to describe RSpec without mentioning BDD, so I should probably introduce that first... so BDD. Where to start? Well, BDD is a new philosophy rising out of the more mechanical process of Test Driven Development (TDD), and it is currently gaining a lot of ground in the Ruby and Rails community. The driving principle, at least to me, is that while TDD tells you what to do (e.g. write your tests first), it says little about what to test or how to test it, leaving you with a mechanical process that is still a bit lacking. That's not to say it doesn't work (just look at all the TDD frameworks, books, and TDD 'best practices' that have arisen), but one of the key things about these is that they are all their own distinctive version of TDD. There is some overlap, but mainly at the mechanical level of writing your tests before writing the rest of your code.

What BDD brings is along the same lines, in that it too provides a version of TDD. However, BDD recognises this, and so names itself differently in order to make the distinction clear. TDD is the process you are following, but BDD is the set of principles guiding what you test and how you test it. Having gone through all that detail, the 'meat' of BDD is deceptively simple... you test 'behaviour'. This is done in a variety of ways, and it encompasses everything from traditional unit-testing granularity through integration and functional testing, all the way up to acceptance-level testing. Now, I could probably go on and on about this and end up going around in circles (or possibly circling a drain) so I'll just leave it there, but with a mention that BDD tends to focus heavily on mocking software objects (I suggest you google this, as many others have gone over what mocks are in a much more eloquent way than I could manage).

So, back to RSpec, which is now also deceptively simple to explain. Basically, RSpec is a framework (written in Ruby and designed for testing Ruby code) that implements a lot of the philosophy of BDD. It changes the language of tests from simple assertions to statements about what the code should be doing. While these are almost exactly the same in terms of physical implementation, the change it makes in terms of understanding tests is remarkable. No longer are you performing the mechanical process of calling a function and ensuring it has an expected result. Instead you are saying that this function should do this, or that. In Rails testing it really comes into its own, as you can easily write RSpec tests (called specs) that say a particular action should be a success and do this and that, as opposed to more traditional testing where you have a function call followed by a series of fairly dry assertions about the result.
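To make that concrete, here's the sort of spec I mean - a trivial sketch from memory using the RSpec 1.x syntax current at the time, with a toy Account class invented purely for illustration:

require 'rubygems'
require 'spec'

# A toy class, invented purely for this example
class Account
  attr_reader :balance
  def initialize
    @balance = 0
  end
end

describe "a new account" do
  it "should start with a zero balance" do
    Account.new.balance.should == 0
  end
end

Run this with the spec command and you get output along the lines of '1 example, 0 failures' - the spec itself reads like documentation of the behaviour.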

Now that I've thoroughly confused everyone regarding RSpec and BDD, it's time to muddy the waters with the 'Code Kwoon' aspect of the evening. Born from the (possibly feverish ;) ) imagination of Ashley as a 'Good Idea', it is similar to the 'coding kata' idea, except the name comes from a different language (Chinese rather than Japanese, I believe Ashley said). It's a way to practise coding skills against a specific problem, and in this case it was being done pair-programming style, with hot-seat pairs.

So, the evening... I mentioned previously that the level was a bit more introductory than I originally anticipated. This was mainly because many of the attendees were making their very first forays into RSpec and the BDD arena. The problem on the table was a 'Poker hand recogniser' that was to take in a series of Poker hands (potentially from different variations of Poker, such as Texas Hold 'em or 5 card stud) and determine a winner. Given that the time allotted to the session was about 80 minutes, this could be seen as a trifle ambitious ;) However, it worked in the sense that it wasn't a trivial problem, so it illustrated the process much better.

To me the evening also showed what I'd describe as a 'clash' of methodologies. As I said, there were a lot of people there making their first foray into RSpec land, and some attendees who were well established in the ways of RSpec and BDD. After an initial chunk of development by the established RSpecers, the hot-seating system took full force and some newcomers to the scene took over development, and the change couldn't have been more obvious. Where the RSpecers are used to getting the tests to pass and then refactoring (so that you know you have a solution that passes the tests, and your refactoring is only neatening things up), the newcomers were what I'd call 'traditional' developers: when faced with a failing test, they tried to build in new abstractions without fixing the failing test. Development at this point basically stopped, as the hot-seat coding meant people were swapping out and then spending their time changing the abstraction from the previous person's to how they themselves thought about the problem. It wasn't until close to the end of the second session that the failing test was finally fixed (by going back to basics) and more tests were added.

Thus, the evening ended up not showing as much about RSpec as I thought it would (and probably not as much as the organisers intended), but it was an educational experience anyway. It showed how much trouble can occur when differing development styles clash (very much exacerbated by the quick hot-seat rotation of 7-minute slots), and it showed how more traditional developers try to solve a problem by redefining it, rather than just getting it fixed and then redefining and improving the code from a more solid foundation.

Thursday 19 November 2009

Enterprise Rails - Review

I've been reading this on the way to and from work this week and I have to say it's definitely a book that exceeded my initial expectations.

My initial thoughts about the book's contents (probably in line with most people's thoughts on a book called 'Enterprise Rails') were that it would be filled with details on XML, SOAP and SOA, and have very little in line with the 'Railisms' that keep the elegance of the Rails framework. I figured there might be some nuggets of information about scalable architecture that would prove useful in the long run, which is why I acquired a copy.

As it turns out, I was wrong on almost every front. The book keeps 'Railisms' very much intact, instead concentrating on the areas that are ignored or under-treated in other Rails books. It starts with chapters on code organisation using plugins and modules (and I immediately adopted the module organisation for one of my projects, before it was too late). It then moves on to several chapters targeted entirely at the database. Now, I don't consider myself bad at DB design and implementation. I can fairly easily produce a data layout that conforms to 3NF, but I normally stop there. The author doesn't. He pushes well beyond this point into Domain Key Normal Form, shows how to easily base ActiveRecord models off views, how to ensure referential integrity at the database layer instead of in the application (where it is surprisingly easy to bypass, even keeping within the ActiveRecord API) and generally pushes the database back up to being a solid, working part of your application.

In direct contrast, most Rails books treat the database as a secondary concern, completely abstracted away by ActiveRecord and migrations. The author acknowledges this viewpoint, but points out that it is driven very much by applications that haven't reached the complexity of a 'simple' enterprise application, and moreover that it is supported in many ways by MySQL, which lacks the features of commercial-quality databases and so leads developers to think those features are unimportant. The author makes the valid point that by the time these features become important (because your application has become hugely popular and is dying under the load), it is often too late to implement them in a complete fashion. So the chosen route is to engineer in all the constraints from the start and to choose PostgreSQL as the database, an open-source offering that DOES offer most of the features of a commercial one.

That encompasses the first half of the book (which is just over 300 pages long), and it is a testament to the author's quality that he has fitted so much material into this space without skimping on the quality of explanations or code samples. After just under a week of reading I have gone through the database chapters and just started on the sections on SOA... and so far they're as good as the database sections! Rather than the hazy descriptions of services you frequently get before someone launches into complicated SOAP envelopes and overwhelms the reader with XML, we get a picture of SOA as an architecture that does to monolithic web applications what OO design did to procedural coding.

Unfortunately, that is as far as I've read so far, but based entirely on the first two-thirds of the book, I would say it is a must for any serious Rails developer. It is well written and sensible, and it keeps all the lovely Rails conventions we developers love while filling in the gaps where Rails doesn't quite cover the ground completely. You may think you don't need the advice and code from this book, and you may be right. But if you plan on creating the Next Big Thing (tm) and building a site that WILL scale, then this book is a definite read.

Thursday 29 October 2009

Ruby XML Builder prefixes

This is a topic I've come across a few times now, and it turned up again recently when someone asked a question in an IRC channel (it was either #ruby or #rails on irc.freenode.net, I can't quite remember which now). The basic problem was that they wanted to output an XML prefix on the tags generated with Builder. Anyone who has used Builder will be familiar with its syntax:
xml.someTag do
  xml.anotherTag "tag content"
end
and can see that this doesn't work for a tag with an XML namespace prefix, which includes a : (a special character in Ruby). So what is the solution? The person asking the question was actually told by several people that it wasn't possible, but that didn't seem right. And it turns out that it is perfectly possible, it just requires a slightly more verbose syntax. Instead of
xml.someTag
you need to do
xml.tag! "somePrefix:someTag"
where the tag! function on an XML Builder object takes a string representing the entire tag and outputs it as is.
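As a complete (if tiny) sketch - assuming the builder gem is installed, and with the prefix and tag names invented for illustration:

require 'rubygems'
require 'builder'

xml = Builder::XmlMarkup.new
xml.tag! "dc:title", "An Example Title"
puts xml.target!   # => <dc:title>An Example Title</dc:title>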

It turns out the consideration of the Builder creators didn't stop there. They have functions to allow the full range of standard XML to be created. Need a CDATA field? Use xml.cdata!. Need to add comments? Use xml.comment!. Need to create a node with mixed text and child nodes? Use xml.text! like so:
xml.myNode do
  xml.myChildNode "awesome"
  xml.text! "More awesome"
end
All of these are provided as functions ending in a !, which keeps them nicely separate from the tags you would typically create.

And to really drive home the point that these things were considered by the Builder creators, there is now an even simpler form for using prefixes. You simply add a space between the prefix and the : to create an expression like:
xml.myPrefix :myElement, "look at this!"

or, in more conventional Ruby terms: if you pass a symbol as the first argument when creating a tag, Builder takes the method name as the namespace prefix and the symbol as the actual tag name.

So yes, it is possible to add prefixes with Builder, and more than possible, it's simple! There's no excuse for saying it can't be done.

Monday 21 September 2009

Search and Indexing

I've had a lot of exposure to full text indexers since I started working at HedTek Ltd. From Lucene to Solr and now even Sphinx, I feel it's time to write up some of my experiences.

Lucene
Probably the best known of the indexers I've encountered, Lucene is an Apache project that aimed at (and succeeded in) implementing an efficient, simple and useful full text indexer in Java. It is a great library for creating your own search indexes and performing quick searches across them with a familiar syntax. It's also very flexible, allowing you to plug in your own functionality to index just about any kind of document.

With all that, you'd wonder why anyone would use anything else. Well, it's not all rose gardens with Lucene. Firstly, this is a low-level API designed to be the heart of an index and search engine; it's not a complete solution ready for use straight out of the box. Second, on the project where I had my initial exposure to it, the Lucene in use was the Zend PHP implementation. While this is an excellent idea (it allows Lucene to be used from PHP directly, with no messing around with Java interfaces) there was one key problem - performance. With the index size in use (24 million records), searches that would take under a second with the Java library would take more than 10 seconds with the Zend library. This is clearly undesirable, so other options were required.

Solr
Solr is one option that removes the need to interface with Java from your language of choice while still retaining the Java Lucene implementation. Solr calls itself an 'enterprise search server' built on Lucene, and it fills in one of the gaps I mentioned earlier - Solr is a working search engine right out of the box. It manages this feat by packaging the Lucene library up as a Java servlet (runnable in any servlet container, e.g. Jetty or Tomcat) and providing an HTTP interface for searching. Results can be returned as XML or JSON straight out of the box, and there is a whole host of other features on top of this that help an ailing developer create a fully-fledged search engine easily. One of the main ones is the ability to define 'schemas' that tell Solr how your records will look, adding a type system to the index and allowing malformed data to be picked up much more easily.
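To give a flavour of that HTTP interface: with the example Jetty setup, a search is just a GET request to something like http://localhost:8983/solr/select?q=ruby&wt=json and the matching documents come back as JSON, ready for whatever language you're working in (the port and path here are the out-of-the-box defaults).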

Of course, for Solr you need a Java server set up. This isn't always the easiest task, and there are subtleties involved that can make it a daunting prospect (I certainly found it so, and still do, as I'm not a Java server expert... I'm barely a novice). The Solr schema is also a requirement, so in order to set up your server you need to create a schema for your data. Not a huge imposition, but the schema is defined in an XML language that is a bit opaque to Solr newbies.

Sphinx
The last of the 3 I have tried, and so far only on much smaller indexes. Sphinx is another alternative in the full text indexing marketplace, and it doesn't rely on Lucene. It functions as a search server (making it more comparable to Solr than to Lucene) and has several advantages over Solr:
  • It is much easier to set up. Where Solr took me over a day to figure out how to install and configure in even a basic setup, Sphinx took me a bit over an hour to install and configure with a connection directly to a MySQL database.
  • It doesn't need a Java server. Sphinx runs as a unix daemon listening on a local port, which makes it much easier to set up and feels less clunky (at least to me).
  • It is very easy to set up multiple indexes. This is possible in Lucene and Solr, but Sphinx makes it trivial: you have one config file and simply define as many indexes as you like. You can even use the same DB connection for them, giving you indexes that are optimisations of a basic one. That may be simple in Solr too, but I haven't come across how yet, making it more effort to find there than with Sphinx at the very least. See the config sketch below for a feel of it.
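To give a feel for that config file, here's a minimal sketch of a sphinx.conf with a single MySQL-backed source and index (all the names and credentials are invented):

source blog_posts
{
    type      = mysql
    sql_host  = localhost
    sql_user  = blog
    sql_pass  = secret
    sql_db    = blog
    # Every row this query returns becomes a document in the index
    sql_query = SELECT id, title, body FROM posts
}

index blog_posts
{
    source = blog_posts
    path   = /var/data/sphinx/blog_posts
}

Adding a second index is just a matter of adding another source/index pair to the same file.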
Sphinx does have disadvantages too, though. The search results it returns are less useful, as they contain just the document ID, unlike Lucene results, which can return stored fields (and those can be all you need in certain circumstances, avoiding a database hit after a search). It also seems geared towards indexing databases, whereas Solr and Lucene are more general purpose. This makes Sphinx great when you are indexing a database, but no good if you are indexing a large collection of XML files on disk, or crawling web pages.


So, I haven't come across an absolute winner in the full text indexing arena, but I have come across several alternatives and all of them are suitable for different purposes. If you need something indexed quickly and in Java, use Lucene. If you need a more robust server for general purpose indexing and searching, definitely check out Solr. And if you are searching databases specifically, then Sphinx should definitely be in your list of options.

Thursday 30 July 2009

TDD: The door analogy

Recently, I was explaining what test driven development was to my wife and used a description involving the creation of a door, and I realised this may be a very good way to explain what TDD is, how it's meant to function and why it produces superior results. I've thought a bit more about the analogy and fleshed it out some, so here goes:

Peter has asked you to create a door so you go away and start writing some tests for what the door should do based on his statement of what he wants. You start initially with:
1) The door should have a handle
2) If you turn the handle and push then the door should open

So you go away and create a door that fulfils these tests. You present this to Peter, he opens the door, and it does this by falling over. So you go back to the tests and add some you missed:
3) When the door is open, you should be able to pull on the handle and it will close
4) The door should stay upright when both open and closed

You then create this door (realising with these extra tests that you needed hinges) and present it again. Peter is happier with this new door, but then notices that if he pushes the door without turning the handle it still opens. This is another missed test so you add it to your tests:
5) If you push on the door without turning the handle the door should stay closed

You then create a new door, adding a latch that retracts when you turn the handle, and test again. When running the tests you notice that, half the time, tests 3 and 5 are failing, and you realise it's because of the construction of the latch: if the door only opens in one direction, the latch won't retract automatically when you pull the door closed. You go back to Peter, say you need to clarify what he wants to happen, and present him with the following alternatives:
1) The door can only open in one direction so you need to push the door from one side and pull the door from the other
or
2) In order to close the door you must turn the handle in order to manually retract the latch and close the door fully

Peter considers this and says he wants option one. This then causes a rewrite of the test cases to the following:
1) The door should have a handle
2) The door should only open in one direction
3) To open the door, twist the handle and either push or pull. Only one of these should work, depending on which side of the door you are on
4) When the door is open, you can close the door by performing the opposite action to the one used to open it
5) The door should stay upright when both open and closed
6) If you attempt to open the door without turning the handle the door should stay closed

You rewrite the tests and then run the previously constructed door through these tests to see where the problems are. This time, tests 2, 3 and 6 fail. Looking at the first of these you see that the door opens in both directions, which is now a violation of the tests so you add a ridge to the door frame that stops it from opening in one direction and re-run the tests. This time no tests fail, you present your door to Peter one last time and he is happy with it and installs the door all over his house.

So with this process, you have several iterations and each one improves the door. More importantly, each iteration adds more tests which show that the door is improved. Now, most people would think 'but a door is obvious, the final version is how I'd have created it initially', but consider... what if Peter had wanted a door that opened in both directions and required you to turn the handle to shut it? You would have created a door that didn't do that, and wouldn't have identified the point where it was required. You would have just given Peter his door and he would have gone away less satisfied.

Also, this is a high-level example. Consider what you'd do if you didn't know what a door was. You'd look at some doors and create something that looked like one, with no way of knowing whether it was correct or not. It may work initially, but after some improvements it suddenly starts falling over. Now you are stuck, with no clue about why it's falling over or what it was meant to do in the first place. So you go 'Right, it shouldn't fall over' and just prop it up so it won't fall over... but then the door doesn't open and you have an annoyed customer. If you had your tests there, you would be able to point to what it was meant to do (e.g. test 5, 'stay upright'), prop it up, and then when retesting spot that other tests are failing. So you go away and look at the problem some more, coming up with the solution that you need a third hinge on the door to reinforce it and stop it falling over after some use. You also add to your test cases:
7) The door should be able to be opened and shut multiple times without falling over
and present the new door to your customer who is now delighted that you've solved the problem properly.

The analogy is probably a bit strained now, but the principle still holds... the tests are there for more than just 'testing' the system. They are there as verification, they are there as a safety net, they are there as your specification, and they are there as your guide in an area you may not know much about. If you have something that isn't working correctly (due to a lack of understanding, for example), you should identify which test(s) are testing things incorrectly and then modify them to test for correct behaviour. You then re-run these tests *without* changing any code (even if you 'know' what's wrong) to verify first that the new tests are failing. If you modify your tests and they still pass, then your tests are still wrong (as the program still has incorrect behaviour); but if you modify your tests and your code together and the new tests pass, you don't know whether that's because your new code works perfectly or because the modified tests are incorrect.

Now, I know this is all standard to avid TDD people, but I'm still getting up to speed on this methodology, and the reasons I'd avoided it are:
1) Difficulty of testing - Big things are hard to test, while the little things are trivial and seem like they don't need testing... test them anyway. You never know if you will find a bug there, and by testing the little things you are then able to break your big things down into smaller tests that *you've already written*, testing only the small bit of new behaviour.
2) Benefits - Until you really think about the process and break down where the tests come in, TDD seems like a silly reversal. Why would it have advantages? Of course, with the above description, one advantage is that you identify problems more quickly (in the third iteration, you immediately spot the test failures when adding the latch, realise they stem from mutually exclusive test cases, and ask the customer what he wants done). If you didn't have tests, or your tests were incidental things written after you finished creating your door how you thought it should work, then you'd lose this benefit. There are other benefits that I'm getting clearer on, but they are left as an exercise for the reader ;)
3) 'But some things can't be tested' - This is a common concern, and it is false. Some people see the UI as untestable, but there are a lot of tools nowadays that allow you to test the UI in its entirety. And before you get to testing the full UI, you have a lot of components building it up. These CAN be tested. You can test that they change state as expected when called with fake input. You can get a component to draw to a bitmap and check that against a pixel-perfect bitmap of how it *should* look. So you can verify every step of the way and build up bigger tests from well-tested components, making this exactly the same as reason 1.

Those are the big reasons for me, and they are very much false reasons. I'm starting to get on board the TDD bandwagon and in the future I intend to have much better tests and try to write them before writing my code :) Of course, if I sometimes fail then it's not the end of the world, but I'll know what to blame when things start going wrong.

Thursday 9 July 2009

Reflections on Python, testing, Continuous Integration and other things

It's been a busy few weeks, both in work and out (for out of work, see Shared Illusions). In work, and of a technical nature, I've been focusing on testing and Continuous Integration and I feel it's time to look back and see what has come out of it.

Firstly, what I've been using for this. The list is:
  • Python
  • Py.test - My first test package. Very nice and easy to use but abandoned for reasons explained later
  • Nose - A nice package available for Python for running tests with a variety of options including profiling and code coverage (using coverage.py)
  • Hudson - A well featured build server package with a plugin architecture and several useful Python related plugins
It should be fairly obvious from my previous posts that I'm using mainly Python for work code at the moment. I've gone on at length about various Python packages and my 'fun' with transforming a complex XML structure into an equally complex and completely disjoint object structure :)

So, onto Py.test. This was my first pick for Python unit testing and I'd written a fair few tests with it. I've now moved away from it, mainly because it was a bit too simple for what I needed for CI. Basically, Py.test is an absolute gem for quickly writing and running unit tests. Simply prefix tests with 'test_', run py.test in the project's root directory, and your tests will be run and the results printed out. The approach is great - it's simple, it's transparent and it's fairly stress free. It just doesn't provide some key features that Nose has, namely the ability to output the test report as an XUnit XML file, and code coverage integration. For a lot of people these aren't necessary, but to get the most out of a CI build server they are very much desirable, as the build server can then graph test failures over time and give much more accurate reports on build status. Therefore Py.test was abandoned in favour of a more complete tool.
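For illustration, a complete Py.test test file can be as small as this (the file and function names are mine):

# test_maths.py -- picked up automatically thanks to the 'test_' prefix
def add(a, b):
    return a + b

def test_add():
    assert add(2, 2) == 4

Run py.test from the project root and it finds the file, runs the function and reports the result - plain asserts and all.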

That tool was Nose. This is a more featured test runner for Python, and it is pretty easy to transfer to it from Py.test (and also from PyUnit, if that's what you prefer). Moving from Py.test to Nose, you don't need to modify your tests at all unless you have function/method level fixtures. If you have these, then you need to decorate the test functions that use them with @with_setup(setup_func, teardown_func) from nose.tools... hardly a difficult change. Nose will then do the same thing as Py.test and discover all your test functions following a configurable naming convention (which defaults to 'test_'). For this minimal change, you are now able to run your unit tests with a wider range of report generators (including the aforementioned XUnit output), profile your code and run code-coverage tools, all with Nose plugins and simple command line switches. Well worth it if Py.test doesn't quite provide the test reports you need.
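For example, a function-level fixture under Nose looks something like this (a minimal sketch, with the names all mine):

from nose.tools import with_setup

state = {}

def setup_func():
    state['connection'] = 'open'   # stand-in for some expensive setup

def teardown_func():
    state.clear()

@with_setup(setup_func, teardown_func)
def test_connection_is_open():
    assert state['connection'] == 'open'

Run it with nosetests, adding flags like --with-xunit or --with-coverage for the CI-friendly reports mentioned above.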

Finally, I needed an automated build server. While Python code isn't built/compiled, the other features of a build server were very much desirable, such as source code repository monitoring, automated running of unit tests, and a nice web interface to view test results and previous builds. I've previously encountered CruiseControl and was less than impressed (not particularly easy to configure, lacking in some key areas, and the interface is a bit clunky). I went looking and found Hudson. This is an 'evil Java app' that runs through Tomcat, but it is pretty damn good at what it does :) It provides a much cleaner web interface than CruiseControl, it comes with the ability to create new projects through the web interface (I previously had to create my own project creation for CruiseControl) and it lets you create build scripts within the web interface using Ant targets, Maven targets or shell commands, with new options added via plugins. For all of that, I can stand the minor contamination of Java testing my Python code, especially as I don't have to write any Java ;)

So, what was the purpose of all these tools? Basically, to get my projects up to speed with automated unit testing. This is an area I've never managed to get fully on board with (it seemed a waste at uni, and I didn't manage to get the build server set up for unit tests at EMCC before they went under) but its benefits are starting to show themselves. While I don't think my current projects will get full unit test suites (not enough time, as they are primarily research rather than production projects), I now have the necessary legwork out of the way to provide much more comprehensive testing in the future, and for the testing to run easily and hassle-free every time I commit a change.

Refs:
Hudson: https://hudson.dev.java.net/
Py.test: http://codespeak.net/py/dist/test/test.html
Nose: http://code.google.com/p/python-nose/
Python: http://www.python.org/

Wednesday 10 June 2009

XML, Objects and Python

I briefly mentioned that I had created two object related schemas for XML in my previous post and then elaborated on XOMS, the mapping schema. This time I'll elaborate on XOS - XML Object Schema. This is a schema I created to allow the specification of classes in XML (although I've called them objects for the XML syntax). I consider this less interesting than the mapping, although the implementation of the XOS processor in python led to some more interesting problems to overcome.

First though, a quick overview of the XML schema. This is simpler than the XOMS schema. Here is a schema for the same data model as seen previously:

<xos>
  <xos:object name="Person">
    <xos:attribute name="name" type="xos:string" />
    <xos:attribute name="address" type="Address" />
  </xos:object>
  <xos:object name="Address">
    <xos:attribute name="address" />
  </xos:object>
</xos>

Again, this is fairly self explanatory. Given this schema, the XOS processor would produce two classes, one called Person, one called Address, with the attributes specified. In most languages, the XOS processor would really be more of a pre-processor - taking an object model and outputting a set of files that define the classes required. In Python though, I could go a step further and process a model at run-time, creating the classes dynamically and registering them for use the same as any other class.

This is where the more interesting stuff came in - dynamically creating classes in python. It turns out that doing this is remarkably easy, with the following code:


import sys

def create_object(object_name):
    # Only create the class if nothing with this name already exists in __main__
    try:
        getattr(sys.modules['__main__'], object_name)
    except AttributeError:
        class BaseObject(object):
            pass
        # Rename the 'template' class and register it as if it had been
        # defined in the __main__ module all along
        BaseObject.__name__ = object_name
        BaseObject.__module__ = '__main__'
        setattr(sys.modules['__main__'], object_name, BaseObject)

This surprisingly simple bit of code creates a 'template' class derived from object, assigns it a name and a module, and then registers it in the '__main__' module under the same name. It doesn't create attributes on the class, but in Python this isn't required, as setattr() can add arbitrary attributes to an object or class. I have plans to add some ability for this in the future though, as I could then provide chunks of Python code to do things such as create SQLAlchemy database objects on the fly.
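Using it is as simple as you'd hope (a quick sketch; the class name is just for illustration):

import sys

create_object('Person')
Person = getattr(sys.modules['__main__'], 'Person')

joe = Person()
setattr(joe, 'name', 'Joe Bloggs')   # arbitrary attributes, no declaration needed
print joe.name                       # prints: Joe Bloggs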

The next step is then allowing these to be defined in arbitrary modules. For this, I needed another function:

def create_module(module_name):
    # Nothing to do if the module has already been created or loaded
    if module_name in sys.modules:
        return
    previous_module = sys.modules['__main__']
    full_mod_name = ""
    for mod_name in module_name.split('.'):
        # Build up the fully qualified name as we walk down the hierarchy
        if full_mod_name:
            full_mod_name = ".".join([full_mod_name, mod_name])
        else:
            full_mod_name = mod_name
        try:
            previous_module = getattr(previous_module, mod_name)
        except AttributeError:
            # type(previous_module) is the module type, so this creates
            # a brand new, empty module with the right name
            mod = type(previous_module)(full_mod_name)
            sys.modules[full_mod_name] = mod
            setattr(previous_module, mod_name, mod)
            previous_module = mod


This is a fairly interesting function too. It first checks that the module being asked for hasn't already been loaded (the first line of the function). If it hasn't, the function loops through the module name split on '.', building up the fully qualified name as it goes. Each iteration first checks whether the previous module already has a submodule with the expected name. If it doesn't, it creates a module with the fully qualified name, registers it in the sys.modules dictionary under that name, then uses setattr() to attach it to the previous module under its individual name. It then sets the new module as the previous module and iterates.
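A quick usage sketch (the module names are invented for illustration):

import sys

create_module('xos.generated')
models = sys.modules['xos.generated']
setattr(models, 'Thing', 42)                # attach anything you like to the new module
print sys.modules['xos'].generated.Thing    # prints: 42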

The next step for XOS would be to have an inheritance mechanism modelled in the XML. Some preliminary experiments in python have shown that I'll need to use a metaclass to correctly set the base class for the objects created but I haven't finished this yet.

In the end, what this experiment has shown is that it's possible to re-implement Python's object and module creation using Python itself and your own syntax. More surprisingly, it has shown that the actual creation is easy! Less than 20 lines of code to create objects and modules - something that wouldn't even be possible in a language like C++, and would require huge amounts of reflection code in C# or Java.

Tuesday 9 June 2009

XML and objects - my approach

So recently, I've been working on a project where I have two extremes... a very complex tree structure in XML format, and a very complex object structure in Python (which is incidentally mapped to an equally complex 56-table database using Elixir). The object structure would make no sense as a tree, and the XML doesn't work well as objects, so I was left to consider alternatives.

My solution was to create a way of specifying the relationship between the XML nodes and the objects and created an XML schema language of sorts to do this (well, I created two, but the second one was more for completeness. I may mention it in a later blog post). I've unimaginatively called this schema the 'XML Object Mapping Schema' (XOMS for short) and a quick sample is as follows:

<xoms>
  <person xoms_to="Person" name="Person.name">
    <address xoms_to="Address" xoms_link="Person.address" xoms_content="Address.address" />
  </person>
</xoms>

It should be fairly obvious what the above does. It says that a 'person' element should be mapped to a 'Person' object and its 'name' attribute shoved into the 'Person.name' property. An 'address' node should similarly be mapped to an 'Address' object, with the content of the 'address' element shoved into the 'Address.address' property. The xoms_link says that this Address object should then be linked to the Person.address property. So, given a Python implementation, this XML:

<person name="Joe Bloggs">
  <address>
    1 Somewhere drive, Someplace, Earth
  </address>
</person>

would produce the following objects:

[<Person name="Joe Bloggs" address=<Address address="...">>, <Address address="...">]

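The core of the approach is simple enough to sketch. Here's a stripped-down illustration of the idea - not the real library, and covering only a fraction of what XOMS expresses:

import xml.etree.ElementTree as ET

class Person(object): pass
class Address(object): pass

# element tag -> (class, XML attribute -> property map, property for text content)
MAPPING = {
    'person':  (Person,  {'name': 'name'}, None),
    'address': (Address, {},               'address'),
}

def build(element):
    cls, attr_map, content_prop = MAPPING[element.tag]
    obj = cls()
    for xml_attr, prop in attr_map.items():
        setattr(obj, prop, element.get(xml_attr))
    if content_prop is not None:
        setattr(obj, content_prop, element.text.strip())
    for child in element:
        setattr(obj, child.tag, build(child))   # a crude stand-in for xoms_link
    return obj

joe = build(ET.fromstring('<person name="Joe Bloggs"><address>1 Somewhere drive</address></person>'))
print joe.name             # Joe Bloggs
print joe.address.address  # 1 Somewhere drive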

I've now got a fair few more features than this implemented in a Python library that allows automated object construction from XML into a fairly arbitrary object structure, including the ability to call functions on the objects with parameters taken from subnodes (for when I can't manage what I need with XOMS alone). It's not perfect, but it's doing well enough to populate the above-mentioned data models, and it's a lot leaner than my original approach. With about 500 lines of code and about 100 lines of XML I've replaced the same amount of hand-written XML processing, except that the new approach is complete, where my original code was only mapping about 10 objects and not even all their attributes. If I'd gone down that road I'd have ended up with a couple of thousand lines of code that would have been messy, horrible to maintain, and impossible to see where everything was going. This way I have 500 lines of code that perform some magic, and all of the interesting stuff is kept in the XOMS file, where I can easily see what's happening and change it.

So I've now become guilty of that heinous crime - solving problems with XML by throwing more XML at it... this time it seems to have worked though!

Thursday 14 May 2009

Elixir and SQLAlchemy

After my initial foray into SQLAlchemy, I started the process of mapping my current object model onto database tables and came across issues with my custom, hand-rolled relationship classes. While I expect I could have shoehorned these into SQLAlchemy mappers, I decided instead that I'd be better served by an object model that was slightly more integrated with the database structure.

Cue Elixir. Elixir is a thin, ActiveRecord-style wrapper on top of SQLAlchemy. It provides some useful features (such as simple-to-use relationship objects and polymorphic associations) and lets you drop down into SQLAlchemy for the features it hasn't wrapped, such as AssociationProxy (for doing a relationship through another relationship). Elixir was the missing link in the stack I was building up, giving me a powerful ORM and database layer with a lovely declarative syntax for building up all the objects I needed.
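For flavour, a declarative Elixir model looks roughly like this (a minimal sketch with invented entities, not taken from my actual project):

from elixir import *

metadata.bind = 'sqlite:///:memory:'

class Person(Entity):
    name = Field(Unicode(100))
    address = ManyToOne('Address')

class Address(Entity):
    address = Field(UnicodeText)

setup_all()    # wire the entities into SQLAlchemy mappers and tables
create_all()   # create the tables

joe = Person(name=u'Joe Bloggs',
             address=Address(address=u'1 Somewhere drive'))
session.commit()
print Person.query.first().address.address   # 1 Somewhere drive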

It took me about a week to go from my old model, through SQLAlchemy and into Elixir, but in doing so I have avoided a whole mess of spaghetti code for writing my objects out to a database, obtained database independence, and forced myself to refactor out a lot of code that was proving less than suitable for my requirements. On top of that, I should be able to integrate my system into a Pylons or Django web application much more easily now than I could have a week ago. Now if only I could get a similar system for XML parsing, so I could concentrate on the important tasks I still have to do :)

Friday 8 May 2009

SQLAlchemy

Continuing on with my sort of 'series' on various libraries and languages, I've recently started to use SQLAlchemy as an ORM layer in my current project.

The reason for using an ORM library came from the immortal problem of suddenly realising that I had a complicated object model defined and an equally complicated database schema, and I needed to get information between the two of them. I had started to hack together a script that would dump the data in using SQL, but I realised after the first few inserts that it would be a long, hard slog resulting in inflexible, complicated spaghetti code.

I didn't want to have to rewrite my object model significantly though, which meant I had to find an ORM library that would allow me to set up the mapping myself. I also needed it to work in Python, which led me to SQLAlchemy. This nifty library comes in several layers and is flexible in the right ways to make it suitable for my needs (at least so far).
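That 'set up the mapping myself' part is what SQLAlchemy calls classical mapping: classes and tables are defined separately and wired together by hand. A minimal sketch (the class and table are invented for illustration):

from sqlalchemy import create_engine, MetaData, Table, Column, Integer, String
from sqlalchemy.orm import mapper, sessionmaker

# A pre-existing, plain Python class -- the point is it needs no changes
class Person(object):
    def __init__(self, name):
        self.name = name

engine = create_engine('sqlite:///:memory:')
metadata = MetaData()
people = Table('people', metadata,
               Column('id', Integer, primary_key=True),
               Column('name', String(100)))
metadata.create_all(engine)

mapper(Person, people)   # I decide how the class and table relate

session = sessionmaker(bind=engine)()
session.add(Person('Joe Bloggs'))
session.commit()
print session.query(Person).first().name   # Joe Bloggs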

I'm still in the process of setting up the internal data structures. I need to redo some parts of my internal object system, though not significantly so, although I'm worried about it becoming a bit of a mess.

I'll comment on a bit more of the technical side of things in a few days when I've gotten more up to speed with the library.

Friday 24 April 2009

SAX Parsing

After some time playing with minidom in Python, I finally looked at alternatives and discovered SAX. For my usual requirements (pulling data out of XML files into various internal data structures) this is so much of an improvement over minidom that I wish I'd come to it first, rather than deal with the mess that is minidom for reading XML. I'm still learning the ins and outs of SAX, but as a brief summary for those who know it even less than me I'll explain the basics.

SAX is an event-based XML parser. What this means is that it reads in XML and generates events based on what it has read - events such as the start of an element, the end of an element, and so on. Your application then receives these events and can respond to them appropriately. This can entail a small amount of extra book-keeping in your application to know where you are in the XML tree (assuming that's important for your app), but it is so much easier to deal with than the multiple levels of looping through child nodes that are required to pull data out of an entire DOM tree.
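As a minimal illustration (the element names and input file are invented), a SAX handler in Python looks like this:

import xml.sax

class PersonHandler(xml.sax.ContentHandler):
    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.path = []     # the small amount of book-keeping mentioned above
        self.names = []

    def startElement(self, name, attrs):
        self.path.append(name)
        if name == 'person':
            self.names.append(attrs.get('name'))

    def endElement(self, name):
        self.path.pop()

handler = PersonHandler()
parser = xml.sax.make_parser()
parser.setContentHandler(handler)
parser.parse('people.xml')   # hypothetical input file
print handler.names          # every person name, in document order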

My main problem with SAX (at least from what I know of it so far) is that with a complicated XML file you can end up with an extremely large and non-cohesive handler. To get around this I created a SAX content handler that maintains a 'processor stack', letting you split your processing into more flexible, cohesive units. It isn't perfect (it currently has some ugliness around transferring data between processors, and adding a new processor is hard-coded into each processor), but it makes for a nice system: I can split up my processing to any level I like, and the processor stack acts as that extra book-keeping I mentioned, transparently. It allows for more flexibility in changing the structure, and for reuse of components.

All in all, I quite like SAX and I'll probably like it even more if I spend some time on my content processing to make it a more generic solution :)

Friday 13 March 2009

Adobe Flex

I finally got around to trying this out over the last few days, and I have to admit I was surprised by how much I enjoyed the experience. To start explaining, I'll first detail a bit about what Flex is.

Flex is the free, open source SDK for creating Flash applications (or rather, swf applications). It does away with that annoying 'timeline' concept that's used in Flash animation, replacing it with a very nice, very functional XML layout format with the ability to embed ActionScript 3 code into the layout. Add to this the Flex Builder IDE (based on Eclipse, and unfortunately not free, though it does have a free 30-day trial) with extremely useful code completion and all the nice organisation features I'm used to in IDE projects, and you have an environment that, speaking plainly, just makes it easy and fun to do things.

So, to start with, my initial thoughts on trying Flex were dubious. I've never really liked Flash, and have always had a mental model of it as an 'animation package' that shouldn't be used for mainstream web applications. I guess Adobe realised this, which is why they hid the animation features in Flex to make it more of a mainstream programming environment. For reasons I don't want to get into here, I had to put this aside and actually try to create something useful with it.

After about an hour of downloading(1) and then finding the 'Flex in a week' tutorials on Adobe(2), I was set up and ready to start producing these hated Flash applications. About 30 minutes later I had a grasp of how to lay out applications in XML (using MXML), and how to link that into ActionScript code to do... whatever... and about 10 minutes after that, even how to request data from a website and push it into my application to populate list controls. It was easy, despite my having never done any AS coding before and having the barest grasp of the syntax. The tutorials didn't even cover the syntax. They assumed you were clever enough to pick it up by osmosis (a welcome relief, I can tell you... I don't mind learning a new syntax, but being told for the 50th time that 'this is a for loop, you use it to iterate over something N times' is not fun) and concentrated on the stuff that would be less familiar to a programmer first coming to Flex - specifying a layout with XML, the conventions used, how to link to services, the Flex event model and how to specify custom events, and so on.

So, I haven't gotten through the complete 'week' course yet, but I rushed through the first two-and-a-half days fairly quickly before deciding to create something 'useful'. I decided that what I would try is a custom Flash reader for my wife's hamster comic strip, HAMIC. I used what I had picked up to request the RSS feeds (which I had created, and modified slightly to throw out some extra stuff I needed) and used my newly found knowledge to create the reader in about a day(3). I then modified it again to add the ability to create and update comics using an interface similar to the web interface on the actual site. I ran into a couple of small issues that helped me understand the framework better (such as when controls get instantiated, how bindable data works, and so forth) but overcame them (apart from one small issue with single comics in a category... but we won't mention that ;)). I may end up doing a lot of web-related stuff as a job soon (fingers crossed, I really need a job right now) and this was the first step towards that goal.

It's certainly a far cry from Symbian, and that's almost certainly a good thing :) It's definitely made me re-evaluate my views on Flash technology.

(1) Flex builder 3 trial - http://www.adobe.com/cfusion/entitlement/index.cfm?e=flex3email

(2) Flex in a week video training course - http://www.adobe.com/devnet/flex/videotraining/

(3) The reader I created, for anyone interested. Please be nice to it, it's only a new app - http://flextestsite.workmad3.com/HAMICFlex/index.html

Saturday 14 February 2009

Fresh Start

I've just created this blog as a branch from my original over at Shared Illusions.

This is going to be my programming blog. I'm going to still post to Shared Illusions, but I'm going to try and keep this blog as a discussion of various coding topics and continue with rambling and general stream-of-consciousness over on Shared Illusions.

As for the name, it's basically drawn from this picture that my wife drew the other day:
[image: coding-monkey]

The caption was initially meant to be a play on the fact that it's a code-monkey reflected, and that there is a programming technique called reflection (this part obviously only clear to coders with a fairly poor sense of humour, i.e. me). After reading this post from Coding Horror, a third meaning became apparent. A reflective coder is a programmer who thinks about programming all the time, and uses this to constantly reflect upon and improve their skills.

I'm not a reflective coder yet, but I have the first part down as I never seem to stop thinking about coding (well, apart from when I'm thinking about physics, maths or the latest PTerry book :)). My discussions here will help me, and possibly anyone reading, along the path to being reflective coders and better programmers in general.