Friday, August 05, 2005

The Database Patch Runner - Rollbacks

In response to my earlier post here, Andrew Beacock asked a couple of seemingly small questions.

First of all, he asked for more detail. I'm going to address that in a later post.

The second question, I didn't really expect. He basically asked “How do you rollback a patch?”

I'm assuming that he's asked this question in relation to the point I made that went along these lines: Since the patch runner stops running patches whenever one of the fails, then recovery is simply a case of rolling back the changes that the patch had already made and then restarting the patch runner. If the runner ensures it doesn't attempt to re-run any successful patches, but does re-run the failed one then it effectively picks up from where it left off and goes on its merry little way.

The point I was trying to make was that since the patch runner stops, then you don't get into the situation where you're attempting to fix a problem that occurred half way through the upgrade by getting to the end that then picking up the mess it left in its wake. Instead you know where the upgrade failed, the patch log states it clearly; you know what the patch was trying to do, since you have a functional grouping of changes in the patch, and this gives you context for the failed change; you know that you only have to worry about the failure that is reported, not the hidden data problems that the rest of the upgrade generated because the first patch didn't do its job properly.

So how do you rollback the changes in order to get the patch up and running again?

To be honest, there's nothing I can think of other than getting your hands dirty; rolling up your sleeves and getting into the database. Picking apart what happened, piecing together the story and trying to gaffer tape up the problems.

However, you can make that job easier:

The big thing is... Think about transactions in your patch

Every DDL statement issues an implicit COMMIT, so don't pepper your patch with DDL, leaving data in a transient state committed in the database. Think about the order in which you do things in order to reduce the chances that a failure at any given point leaves data in an invalid state.

Ensure that your 'add to schema' DDL comes first. That way, if you're moving two columns between tables, and the second column's create fails, you don't get stuck in a situation where you need to reverse the first change or apply the second one manually.

Ensure that your 'remove from schema' DDL comes last. For the same reason as above. It's much easier to deal with if you know that your data migrations have completed and it's just a case of removing those columns manually and marking the patch as complete.

If you can get away with it, don't issue commits in your patch's data migration until it's completed. Admitted, if you're moving millions of rows of data around, this may not be possible. But think about it... do you really need to issue that commit every 10,000 records, or can you get away with running the whole lot through first? That way, when you get a failure, the patch runner can ROLLBACK and you know what state the data is in... the state it was before the patch started.

If you can't get away without committing regularly, or the patch simply takes to long to risk having to repeat it, think about how you can make the migration restartable. That is, store a watermark, make your patch keep track of where it was so that it can restart after a failure. Make sure that the patch can handle being restarted without it trying to re-do work it has already done. If it really is that risky, why not check that you have enough space in your rollback segments before you generate GBytes of data?

Ideally you want to make sure that your patch doesn't fail, but with a bit of thought before you write the patch, you can make your life easier when it does.

Technorati Tags: , , , , , , , ,

5 comments:

Anonymous said...

If you're database is running in ARCHIVELOG mode and you have exclusive access to the database during a patch run (which is good advice) perhaps you can use FLASHBACK DATABASE to rollback the entire database to a previous point in time.

This is 10g feature which I haen't used myself yet but it should be possible. See http://download-west.oracle.com/docs/cd/B14117_01/server.101/b10734/rcmflash.htm#1020720 for more info.

Rob Baillie said...

Looks very interesting, cheers for the tip off!

As people who read this blog might have guessed, I'm much more of a developer than a DBA, so I tend to lean towards development solutions rather than DBA ones.

Once of the situations I like at our place is that the database upgrades are now so solid that the DBA team very very rarely gets involved. We normally have a member of the support team performing the upgrade with a DBA and developer on standby on the off chance that something goes wrong. It very rarely does.

Our DBAs like it!

Anonymous said...

I'm not really sure what our DBA's would think if we no longer need them during upgrades. I'm already taking (automating) so much work away from them ;-)

Anonymous said...

What if you record each step within your PL/SQL program and condition each step (Each step being a DDL activity)?

Your patch runner then not only examines for the version, but the last successfully implemented step within the versioned patch. Another possibility is to break out each DDL statement as a patch...this could get cumbersome as well!

Rob Baillie said...

Yep Shannon, you're on the money.

It can get cumbersome whichever way round you do it. At the moment we're still keeping our patches small and haven't added another layer of recording into the patch runner, though we're now at the point where we have more than a couple of hundred patch files sitting in our release. It's no big deal really I suppose, but we are considering producing a new greenfield build at a version before which we would not move. It's going to be a bit of a task, but it will speed our development builds up a little.

Whatever you do, whichever option you take, there are pros and cons. You've just got to make sure you understand what they are first, and then react to them when they change.