Saturday, August 06, 2005

Rant: You want my money?

To all you companies out there that want to take my money, and want me to pay you by credit card, on the internet. When you ask for my credit card number, you've got three choices:

1 - Build the entryboxes so that it doesn't matter if I put spaces or dashes in, just let my format the number 1234-1234-1234-1234, strip the number out and just use that.

2 - Build the entry boxes so that I can't put spaces, dashes or too many characters in and make it clear that I can't. That way I'll type my credit card number with just the numbers.

3 - Let me type a combination of spaces, dashes and numbers in the way that's natural to me, then tell me I'm an idiot. Give me messages like "Your credit card number has too many characters, please try again", or (my personal favourite) "You have entered your card number with spaces, please enter just numbers".

Guess which option will reduce the chances of me using your company next time. Guess which option your company most likely uses.

Friday, August 05, 2005

The Database Patch Runner - Rollbacks

In response to my earlier post here, Andrew Beacock asked a couple of seemingly small questions.

First of all, he asked for more detail. I'm going to address that in a later post.

The second question, I didn't really expect. He basically asked “How do you rollback a patch?”

I'm assuming that he's asked this question in relation to the point I made that went along these lines: Since the patch runner stops running patches whenever one of the fails, then recovery is simply a case of rolling back the changes that the patch had already made and then restarting the patch runner. If the runner ensures it doesn't attempt to re-run any successful patches, but does re-run the failed one then it effectively picks up from where it left off and goes on its merry little way.

The point I was trying to make was that since the patch runner stops, then you don't get into the situation where you're attempting to fix a problem that occurred half way through the upgrade by getting to the end that then picking up the mess it left in its wake. Instead you know where the upgrade failed, the patch log states it clearly; you know what the patch was trying to do, since you have a functional grouping of changes in the patch, and this gives you context for the failed change; you know that you only have to worry about the failure that is reported, not the hidden data problems that the rest of the upgrade generated because the first patch didn't do its job properly.

So how do you rollback the changes in order to get the patch up and running again?

To be honest, there's nothing I can think of other than getting your hands dirty; rolling up your sleeves and getting into the database. Picking apart what happened, piecing together the story and trying to gaffer tape up the problems.

However, you can make that job easier:

The big thing is... Think about transactions in your patch

Every DDL statement issues an implicit COMMIT, so don't pepper your patch with DDL, leaving data in a transient state committed in the database. Think about the order in which you do things in order to reduce the chances that a failure at any given point leaves data in an invalid state.

Ensure that your 'add to schema' DDL comes first. That way, if you're moving two columns between tables, and the second column's create fails, you don't get stuck in a situation where you need to reverse the first change or apply the second one manually.

Ensure that your 'remove from schema' DDL comes last. For the same reason as above. It's much easier to deal with if you know that your data migrations have completed and it's just a case of removing those columns manually and marking the patch as complete.

If you can get away with it, don't issue commits in your patch's data migration until it's completed. Admitted, if you're moving millions of rows of data around, this may not be possible. But think about it... do you really need to issue that commit every 10,000 records, or can you get away with running the whole lot through first? That way, when you get a failure, the patch runner can ROLLBACK and you know what state the data is in... the state it was before the patch started.

If you can't get away without committing regularly, or the patch simply takes to long to risk having to repeat it, think about how you can make the migration restartable. That is, store a watermark, make your patch keep track of where it was so that it can restart after a failure. Make sure that the patch can handle being restarted without it trying to re-do work it has already done. If it really is that risky, why not check that you have enough space in your rollback segments before you generate GBytes of data?

Ideally you want to make sure that your patch doesn't fail, but with a bit of thought before you write the patch, you can make your life easier when it does.

Technorati Tags: , , , , , , , ,

Comments back

Comments should be running normally again now. The offending Haloscan comments javascript has been replaced with the original Blogger javascript.

I had inadvertently switched over to Haloscan comments without switching off the Blogger comments. So adding a comment when you arrived from a permalink and clicking on 'Post a comment' gave you a Blogger comment that couldn't be seen from the main page.

Ah well, you live and learn! Normal commenting has now been resumed.

Thursday, August 04, 2005

Comment problems

Just so you know, I'm having trouble with the comments on here...

I've managed to install two competing sets of comments.
Haloscan is dealing with the comments of the main page, then Blogger is dealing with the comments that you see when you look at an individual post. Look at this post and you'll see what I mean.

Over the weekend I'll try to make sure I fix the problem...
For now, please accept my apologies :-)

The Database Patch Runner

Over here Wilfred van der Deijl has raised the time old question when dealing with databases.

How do you easily upgrade a database from an arbitrary version to a new particular version with the smallest amount of effort.

I responded with a brief description here , which caused Amihay Gohen to ask a question or two over E-mail. I thought I'd put together something a little more coherent and detailed here, rather than just send it on to Amihay directly.

This is the basic solution...

Whenever a developer / pair of developers need to change the database they write a patch. This patch contains any alter tables / data migrations / etc. A single file contains the full set of changes for a single conceptual change and may cover changes to several tables.

The patches are written in PL/SQL procedures. That is, each single set of changes is implemented in a single procedure. Each DDL command is issued in an EXECUTE IMMEDIATE, or similar, as is any DML that references a table or column that does not yet exist. The patch is written in a procedure so that we can use the Oracle exception handler to propagate fatal errors in the code to a controlling procedure, something that is far more difficult to do in a script.

These patch procedures are then installed by a "patch runner". This runner is guided by a .txt list of patches that is created by the developers. If a developer wants to add a patch then they write the procedure and add it to the patch list. The runner then uses this patch list to tell it which patches need to be installed, and in what order. The runner is written in PL/SQL, using UTL_FILE and DBMS_SQL to install the procedures.

The runner logs when a patch starts, in a table, and marks the completion or failure of that patch when it finishes. By looking in the patch log before it runs a patch, the patch runner ensures that it never attempts to run a patch that has previously been marked as complete.

In addition, if any patch fails then the patch runner aborts, stating that it failed. "Crash don’t trash!". If the patch runner is restarted then it will attempt to re-run any patches that previously failed. This makes it easier to restart a failed upgrade if ever required. You reverse out the changes already made by the failed patch and then rerun the patch runner.

The runner itself is installed in a single PL/SQL script, and we make sure that it installs all the components it needs before it starts, and tears itself down once its finished. It does not rely on any other part of the database or application in order to work. Since there are no dependencies outside of the patch runner install script, we can use a single patch runner for all of our database based applications, with no changes.

The installation and upgrade of the patch_log table is dealt with in the more traditional manner of a single CREATE TABLE at the start followed by ALTER TABLE statements as it changes over time. The patch log is not destroyed down with the runner.

So, in summary, you check out a given tagged version of the application to run against an arbitrary database and run the patch runner. It loads the list of patches that are applicable for that tagged version. It runs over each patch in turn and checks if it has previously ran. If it has not, then it runs the patch.
By the time it reaches the end of the list it has ran all the patches, none of them twice. You can even run the patch runner twice in succession and guarantee that the second run will not change the state of the database.

The key to that is then to make sure that you use the build often. Get your developers their own database workspaces and make sure that that rebuild and upgrade them often. By repeating the upgrades often you ensure that they basically work before they get moved into a test environment with the bigger worry of large volumes of data.

The patch runner is, of course, only a small part of the database build, and a smaller part of the whole application build. So the runner needs to put into the context of a larger build script. We use batch files here in order to manage our database build process, but I see no reason why other, for complete scripting languages shouldn't be used, or ANT. The runner is, after all, just a PL/SQL script.

As an aside, we also write tests for the patches, checking the pre and post conditions are valid… but that’s a very different topic!

Technorati Tags: , , , , , , , , , ,