Friday, August 12, 2005

A seminal essay and a difference of opinion

Thanks to Wilfred van der Deijl's post here, I've just read a January 2003 article by Martin Fowler and Pramod Sadalage... and I can't believe that I had previously missed it!

Martin and Pramod very clearly discuss the overarching issues that appear once you start to look at database development in an agile manner, and you can very easily see from their discussion how important a clear and solid patch runner structure is to successful agile database development.

Not long after this article was written we were just starting to think about the same practices and many of the conclusions we came to matched. Not least the need to implement sandbox databases for each workspace and the need for consistent test data across those databases.

However, our conclusions did differ in one fundamental way, and I find it difficult to understand Martin and Pramod's (M&P) approach in this particular area.

My issue relates directly to the propagation of database changes across the development team. In M&P's approach, database changes are automatically applied to the development workspaces as they occur, managed by the team's DBA. They point out that:

'people are usually concerned that automatically updating developers databases underneath them will cause a problem, but we found that it worked just fine'

I am one of those concerned people. I am concerned in the same way that I would be if I was told that the source code I was working with would be automatically updated at regular intervals as changes occurred. Let me rephrase the above statement with that in mind:

'people are usually concerned that automatically updating developers source code underneath them will cause a problem, but we found that it worked just fine'

Suddenly the statement does not seem quite so reasonable!

In order to develop I need to clearly understand what is within my control and what is outside of my control. I need to be able to rely on the known state of my own workspace. For me, this is a founding principle behind the idea of the development workspace as a sandbox.
A given version of an application is developed to run against a given version of the database. Because of this, the database is as much a part of the workspace as the rest of the source code.

There is also the issue of the database change being applied in advance of the related code change being checked into the version control system, and the manual process therefore required. In M&P's process they will notify the DBA of the changes required once they are decided upon. At some point in the near future the DBA will then make the changes to the central databases and the changes will propagate to the development workspaces. As this happens there are three clear risks that I can see:

1.The database schema change is made a significant amount of time before or after the associated code change is committed to version control.
2.The change required by the developer is miscommunicated to the DBA and an incorrect change is applied.
3.The change is mistakenly applied to development databases that are following a different branch to that in which the change is made.

In all the above cases, the effective result is that the change may invalidate a particular component of the application due to the development (or integration) database being out of step with the source code being ran against that database.
I believe all this risks can be mitigated against by making the following changes to the described process:

1.Make the developers responsible for producing the actual patches that will be eventually be ran against all databases, development, through integration to live. These patches should form part of an clear structured patch runner, and should be placed in version control alongside the rest of the application source code.

2.Make each developer responsible for the maintenance of their own database workspace. Make it easy for a developer to upgrade their database, and make them do so using those scripts that will be ran against all other instances of the database. Make it part of the routine that immediately after updating their source code workspace they upgrade their database.

3.Put the database upgrade into the automated build / continuous integration test suite and ensure that the build being performed by the automated build is exactly the same as that which would be ran on the live system.

These practices have the added advantage of minimising the amount of work required by the DBA, who can exist in a more advisory role within the team.

Of course, there are times where data migrations will take a non trivial amount of time to complete. In these cases it is important that developers are warned of the fact and are able to schedule times when these changes take place so as to minimise the impact on their work.

Having said this, agile database development is a new venture for most everybody involved, and work such as this should not be underestimated, and cannot understated. There will likely be many different ideas on this topic, coming from many different people, and it's a pleasure to be involved. It is very easy to forget just how much database development has evolved in the last 10 years!

Technorati Tags: , , , , , , , , ,


Anonymous said...

A seminal essay, Shirley?

Rob Baillie said...

Man, I gotta learn to proof read more carefully!