Saturday, September 10, 2005

Database Patch Runner: Design by Contract

So let's say that you've managed to put together a build script that will install the latest version of your database with the minimum of work and you've got your developers using the build to upgrade their own workspaces.

How can you push the patch runner? Actually test the upgrade and get easy to analyse information back from a nightly or continuous integration build? How can you protect your live databases and ensure that patches are only ever applied to databases when those databases are in the right state to receive those patches?

Bertrand Meyer developed the concept of Design by Contract. He suggested a technique whereby for each method there will be a contract, stated in terms of:
  • Preconditions: These conditions should be true before the function executes.

  • Postconditions: Given the preconditions are upheld, the function guarantees that on completion the postconditions will be true.

In some languages (e.g. Eiffel) this contract is enforced, and if either side of the contract is broken the method throws an exception.

Our patch runner works in a similar way. Whenever a patch is written, pre and postcondition scripts are written. The precondition script embodies the assumptions about / requirements for the state of the
database immediately before the patch runs. The postconditions state the shape the database should be in once the patch has completed.

Immediately before running a given patch the patch runner will run the corresponding precondition script. The patch will not then be run unless the precondition script both exists and runs to completion. Only if and when the precondition script runs and reports no errors will the patch be applied.
Once the patch is complete the postcondition script is executed. If the postcondition script is missing, or reports a failure, then the build stops and the error is clearly reported.
Only when the pre, patch and post scripts have completed is the patch log updated to state that the patch has been applied. If any point of the process reports an error then the patch log is updated to state that the patch failed, and for what reason.

For the majority of patches, the pre and postconditions are fairly trivial, and can appear to be overkill. For example, if you're adding an unpopulated column to a table then the precondition is that the column does not exist, the postcondition is that it does.

The real advantage comes when you write data migrations. For example, a patch is needed that will move column X from table A to table B. A requirement may be that every value that was in B must exist in at least one column in table A, that no data is lost during the migration.
A precondition could be that there is at least one destination row in table A for each value. By checking this condition before the patch starts we can stop the patch from destroying data.

Another example may be the reforming of some tables that contain monetary values. It may be required that whilst data is being moved between rows or columns, that the sum total of all values in a particular set of columns be the same on completion of the patch as it was at the start. The precondition script can store the original sum totals, the postcondition can then check the resulting totals after the patch is complete.

Producing the pre and postcondition scripts focus the developer on two things:
  • What assumptions am I making about the structure of the data I am about to transform, and which of those assumptions are critical to my patch's success?

  • How can I measure the success of my patch, and what requirements do I have of the resulting data?
Both of these things are very different to the usual focus of: What processes do I need to apply to the data? However, the result of focusing on the first two will usually result in a clear path to the third.

We've found that using this structure greatly increases the feedback we get from our builds.

If you can get your DBA to buy into the idea, you can produce an overnight test build using the most recent backup from your live system. Overnight, automatically copy live, upgrade it and have your build produce a report on the success of the current build against the current candidate for upgrade. This report will warn you if the data in the live system is moving away from the assumptions you originally had when producing your patches. It is much easier to deal with such issues when you find out during your development cycle, rather than during or, even worse, days after the upgrade has been applied to the production environment. An upgrade patch becomes a very reliable thing when it's been ran 50 times against 50 different sets of data, each time reporting its success of failure.

In addition, the postcondition scripts act as clear documentation as to the aim of the patch. This is of great use if and when you find a performance problem with a particular patch. Rewriting a patch 2 months after it was first put together is so much easier when you have a postcondition script to act as both documentation and sanity check.

As a possible variation, on finding failures it should be possible for the patch runner to automatically rollback the failed change, taking the database back to the state immediately before the patch started. In some environments this may be critical, however we've not yet found a pressing need and therefore in our version we do not automatically rollback failures.

Of course, the mechanism is only as good as the pre and postcondition scripts that are produced, but then you ensure quality by always pair programming don't you ;-)

Technorati Tags: , , , , , , , , , , , ,

No comments: