Thursday, August 04, 2005

The Database Patch Runner

Over here, Wilfred van der Deijl has raised the age-old question of dealing with databases:

How do you upgrade a database from an arbitrary version to a particular new version with the smallest amount of effort?

I responded with a brief description here, which caused Amihay Gohen to ask a question or two over e-mail. I thought I'd put together something a little more coherent and detailed here, rather than just send it on to Amihay directly.

This is the basic solution...

Whenever a developer, or pair of developers, needs to change the database they write a patch. This patch contains any alter tables / data migrations / etc. A single file contains the full set of changes for a single conceptual change, and may cover changes to several tables.

The patches are written as PL/SQL procedures. That is, each single set of changes is implemented in a single procedure. Each DDL command is issued in an EXECUTE IMMEDIATE, or similar, as is any DML that references a table or column that does not yet exist. The patch is written in a procedure so that we can use the Oracle exception handler to propagate fatal errors in the code to a controlling procedure, something that is far more difficult to do in a script.
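To make that concrete, here's a minimal sketch of what such a patch procedure might look like (the patch name, table and column are invented for illustration):

    CREATE OR REPLACE PROCEDURE patch_0042_add_customer_email IS
    BEGIN
        -- DDL goes through EXECUTE IMMEDIATE: the procedure would not
        -- compile if the ALTER TABLE were written as static SQL.
        EXECUTE IMMEDIATE 'ALTER TABLE customer ADD ( email VARCHAR2(200) )';

        -- DML that references the new column must be dynamic too, since
        -- the column does not exist when the procedure is compiled.
        EXECUTE IMMEDIATE
            'UPDATE customer SET email = ''unknown'' WHERE email IS NULL';
    EXCEPTION
        WHEN OTHERS THEN
            -- Let the failure propagate to the controlling procedure,
            -- which logs it and aborts the run.
            RAISE;
    END patch_0042_add_customer_email;
    /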

These patch procedures are then installed by a "patch runner". This runner is guided by a .txt list of patches that is created by the developers. If a developer wants to add a patch then they write the procedure and add it to the patch list. The runner then uses this patch list to tell it which patches need to be installed, and in what order. The runner is written in PL/SQL, using UTL_FILE and DBMS_SQL to install the procedures.
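By way of illustration, the patch list is just an ordered text file, one patch per line (these names are invented):

    patch_0001_create_customer
    patch_0002_create_orders
    patch_0042_add_customer_email
    patch_0043_migrate_order_status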

The runner logs when a patch starts, in a table, and marks the completion or failure of that patch when it finishes. By looking in the patch log before it runs a patch, the patch runner ensures that it never attempts to run a patch that has previously been marked as complete.

In addition, if any patch fails then the patch runner aborts, stating that it failed. "Crash, don't trash!" If the patch runner is restarted then it will attempt to re-run any patches that previously failed. This makes it easier to restart a failed upgrade if that's ever required: you reverse out the changes already made by the failed patch and then rerun the patch runner.

The runner itself is installed in a single PL/SQL script, and we make sure that it installs all the components it needs before it starts, and tears itself down once it's finished. It does not rely on any other part of the database or application in order to work. Since there are no dependencies outside of the patch runner install script, we can use a single patch runner for all of our database-based applications, with no changes.

The installation and upgrade of the patch_log table is dealt with in the more traditional manner of a single CREATE TABLE at the start followed by ALTER TABLE statements as it changes over time. The patch log is not torn down with the runner.
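For the sake of illustration, a minimal patch_log might start out something like this (the exact columns are my sketch here, not a definitive layout):

    CREATE TABLE patch_log (
        patch_name  VARCHAR2(100) NOT NULL,
        start_time  DATE          NOT NULL,
        end_time    DATE,
        status      VARCHAR2(10)  NOT NULL -- e.g. RUNNING, COMPLETE, FAILED
    );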

So, in summary, you check out a given tagged version of the application to run against an arbitrary database and run the patch runner. It loads the list of patches that are applicable for that tagged version. It runs over each patch in turn and checks whether it has previously been run. If it has not, then it runs the patch.
By the time it reaches the end of the list it has run all the patches, none of them twice. You can even run the patch runner twice in succession and guarantee that the second run will not change the state of the database.
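The heart of the runner is then a simple loop; something like this sketch (the helper procedures and the g_patches collection are invented stand-ins for the real UTL_FILE / DBMS_SQL plumbing):

    -- g_patches is assumed to hold the patch names read from the list
    -- file, in the order they appear in that file.
    PROCEDURE run_patches IS
    BEGIN
        FOR i IN 1 .. g_patches.COUNT LOOP
            IF NOT patch_already_complete( g_patches(i) ) THEN
                log_patch_start( g_patches(i) );
                BEGIN
                    run_patch( g_patches(i) );
                    log_patch_complete( g_patches(i) );
                EXCEPTION
                    WHEN OTHERS THEN
                        log_patch_failure( g_patches(i) );
                        RAISE; -- abort the run: crash, don't trash!
                END;
            END IF;
        END LOOP;
    END run_patches;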

The key is then to make sure that you run the build often. Get your developers their own database workspaces and make sure that they rebuild and upgrade them often. By repeating the upgrades often, you ensure that they basically work before they move into a test environment with the bigger worry of large volumes of data.

The patch runner is, of course, only a small part of the database build, and a smaller part of the whole application build. So the runner needs to be put into the context of a larger build script. We use batch files here to manage our database build process, but I see no reason why other, more complete scripting languages, or ANT, shouldn't be used. The runner is, after all, just a PL/SQL script.

As an aside, we also write tests for the patches, checking that the pre- and post-conditions are valid… but that's a very different topic!


18 comments:

Andrew Beacock said...

An interesting post Rob, this was something I was thinking about the other day with regards to database upgrades to Postgres.

I'm not sure if you are able to have the same kind of '1 procedure per patch' idea in Postgres, and so the rollback of a failed patch may be tricky.

If you are able to post any more information on how the patch runner works, what it updates in the database, etc. that would be a great help to anyone interested in this area.

Anonymous said...

Hi Rob,

Thanks for the great explanation at your own blog. I think this can really get me going.

The only thing I'm wondering is if you're not missing the version history of a single table. A table might be changed numerous times during its lifetime in different patch scripts. I was hoping to have one script per table, so you can still see some version history in the file. However, this will make the PatchRunner a lot more complicated.

Don't you miss the version history of a single table?

Rob Baillie said...

Wilfred,

I'm glad it's along the right lines for you.

In response to the single script with history for each table: until you'd mentioned it, I hadn't noticed that it wasn't there, so no... I suppose we don't miss it.

It may be to do with our working environment. We rarely have the situation where we need to look back to find out Who added that damn column.

We work with a combination of collaborative design, pair programming and constant refactoring, so we rarely get that 'drop it all' crash bug. That's eXtreme Programming for you.

I'm not saying that's the only reason you ever need that history, but I find it to be the most common!

In terms of actually getting hold of the table's history when you do need it: if you have all your patch files in a single directory (or at least in a single tree), and have nothing but patches in that tree, then it's just a case of doing a 'find in files' for the table name.
Admittedly, it's a little trickier finding the order in which those changes were applied, and suchlike.

One big thing you DO get, along the same lines, is the patch log itself. It's easy for someone to tell what changes have been applied as they're listed in a table. You can also add info onto the log such as the tag that was used when the code checkout was made. This means you can see which releases were issued to the database and when. It would also tell you if someone upgraded the database without using a tagged version.
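For example, seeing what has been applied, and when, is then just a query (using the illustrative columns from the sketch above):

    SELECT patch_name, start_time, status
      FROM patch_log
     ORDER BY start_time;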

Rather than having the group of changes relating to a single table, the script containing the changes relating to a single functional upgrade gives you a good functional history of the application. Just as long as those pieces of functionality aren't too big.

I think there are also advantages relating to branching. Making the patch relate to a single functional change means that the patch itself does not necessarily need to be merged. Most (in our case, all) of the time you'll find that you need to apply the same patch in every version. This is simply a case of putting the patch into each branch. When you get to the situation where a database moves from a branch back onto the trunk the patch runner effectively deals with your merge process for you, refusing to re-run the patches that have already been applied.

The process is complicated by patches that differ in different schemas, but I've not experienced that in practice yet.

Rob Baillie said...

Andy,

Maybe I need to put together some more text on this, with more detail... what sort of detail would you be looking for?

Rob Baillie said...

Gonen
The short answer is: I think that's a question about version control rather than one about the patch runner...
The long answer is: keep your eyes open on the blog, I have a post on exactly that coming up very soon...

Anonymous said...

Hi Rob,

Thanks for a great blog. I have a question regarding the database patch runner.

When you check out a tagged version of the application, the patch runner loads a list of patches relevant to that version. Does that mean that the list only contains the patches needed to move from the previous release of the app?

The reason I ask is that we support multiple sites. Each site may be running a different version of the app. And very often, they may be two or three releases behind the version they are installing.

Therefore, am I right in assuming they would have to migrate release by release?

br
peter

Rob Baillie said...

Not quite...

The patch list contains a list of all the patches that will ever run against the system. That list is held in the same version control system as the rest of the system.

That means that when you check out version 1.2.5 of the system you get the patch list that will completely build version 1.2.5 of the database for that system. If you check out version 2.4.3 then you get the list for completely building 2.4.3 (which almost certainly starts with the same patches as the 1.2.5 version of the list).

However, the patch runner will never run a single patch twice. That is, it will loop over all the patches in the list, and then run any that have not been run previously.

So, if your client is at version 1.2.5 and wants to upgrade to 2.4.3, you just check out version 2.4.3 and then execute the patch runner. It will know that the first x patches have already run and skip them, only running those that take the system from 1.2.5 to 2.4.3.

For a client that wants to go from 2.4.2 to 2.4.3 you do exactly the same.

There is no requirement to know what version of the system you're upgrading from, and you only need to run the patch runner once.

Does that make sense?

Anonymous said...

Hi Rob,

thanks for the reply - yes, it makes sense. But I have a couple more questions, if I may!

Let's assume you've released 2.4.3 and work begins on 2.5. A bug is found in a PL/SQL package in 2.4.3 - so a patch is created to fix this (patch no. 100, say) and deployed to sites running this branch.

Meanwhile, the same package is being modified for the main branch (2.5) and has been assigned to patch 101. Do you simply merge the fix from patch 100 into this? Or would you explicitly put patch 100 into the list of patches for 2.5?

I would be tempted to merge the change into the later patch, but this would mean you would have a 'missing' patch. I guess it doesn't matter - do you see any problems with that?

And finally, when you define the patch order in the patch runner, do you put PL/SQL procedures / packages in dependency order, or do you simply recompile at the end?

br
peter

Rob Baillie said...

Ahhh, I can see where a bit of confusion has come from.

I would never put a PL/SQL package install into a patch.

Patches are for non-repeatable or destructive changes to the database structure. That is - they contain changes to tables / sequences / indexes / constraints, or data migrations / data fixes and the like.

These differ from mere PL/SQL functions / packages / procedure installs in that you can't just re-run them.

When it comes to views / packages and the like, I just install the whole lot, every time. Even if a package doesn't change, it doesn't do any harm to re-install. Especially if it forms part of a bigger build script that installs everything and ensures that everything is compiled at the end.

It makes for a much simpler and quicker build... your patches don't install the same package 17 times in a row. And MOST importantly, you don't need someone to compile the changes to the packages into a build script.

You might notice that I focus on using packages as my main example - that's for a pretty simple reason. I tend to use packages pretty much exclusively as a means of producing PL/SQL. It means you can install all the specifications and then all the bodies and (assuming they're syntactically correct and consistent) you know everything will compile.
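As a sketch, the build just re-runs scripts of this shape for every package, every time (the package here is invented for illustration) - all the specifications first, then all the bodies:

    CREATE OR REPLACE PACKAGE order_manager IS
        PROCEDURE place_order( p_customer_id IN NUMBER );
    END order_manager;
    /

    CREATE OR REPLACE PACKAGE BODY order_manager IS
        PROCEDURE place_order( p_customer_id IN NUMBER ) IS
        BEGIN
            NULL; -- the real implementation goes here
        END place_order;
    END order_manager;
    /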

So - to cut a long story short - the patch runner is just a small part of a bigger database build script. The build script will use the patch runner to manage the creation of tables / indexes / foreign key constraints and other such database objects before it then installs and compiles all the PL/SQL code.

I have a more coherent and thorough discussion of this structure in another post:

http://robertbaillie.blogspot.com/2006/03/uk-oug-and-running.html

Skip to the section 'Implement a build script that an idiot could run'.

Other than that, check out:

http://robertbaillie.blogspot.com/2007/09/database-build-script-greatest-hits.html

It is an index of all my posts on the topic.

I did once submit a proposal for a book on the topic of agile database development, which incidentally included a chapter on database build scripts. If I find the time I'll try to dig it out and post it soon...

Anonymous said...

OK, now I understand!

However, re-compiling PL/SQL package specs and bodies does present problems for us.

We support a J2EE application with an application server and connection pooling in the middle tier.

Recompiling the spec or even the body causes problems in the connection pool.

For example, when you recompile a package body, you invalidate the package state:

ORA-04068: existing state of packages has been discarded
ORA-04061: existing state of package body "SOME.PACKAGE" has been invalidated

Following the recompile, each session in the connection pool that has previously called the package (usually all of them in our case) will get this error on the next call to the package.

There are workarounds:
- you catch the ORA-04068 exception and ignore it (sketched below), but we can't do that for legacy reasons
- I'm also looking at deploying all the packages but using synonyms and doing a CREATE OR REPLACE SYNONYM, but I haven't tested what happens in the middle tier with this yet.
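For reference, the catch-and-ignore workaround amounts to a retry in the calling layer; something like this (the package and procedure names are invented):

    DECLARE
        e_state_discarded EXCEPTION;
        PRAGMA EXCEPTION_INIT( e_state_discarded, -4068 );
    BEGIN
        some_package.some_procedure;
    EXCEPTION
        WHEN e_state_discarded THEN
            -- The package state has been reset by the recompile;
            -- a single retry is then safe.
            some_package.some_procedure;
    END;
    /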

But these are just implementation issues - I just thought I'd share them with you.

thanks for the clarification.

br
peter

Anonymous said...

What I forgot to mention is that our company is trying to achieve downtime-free deployments (or as little downtime as possible). This may be a futile effort - but there is a lot of pressure to achieve it.

peter

Rob Baillie said...

No problem... good luck with the implementation.

Maybe the build script could detect which packages specs / bodies have changed and then only compile those? I'm sure that wouldn't prove too difficult.

Can you get the connection pool to refresh its connections - maybe just drop the lot and recreate them on request? You would get a brief drop in performance as each new connection was made but it should pass fairly quickly.

One method I've often considered for such circumstances, but never actually implemented, is to run two (or more) mirrored databases. Take one offline (dropping all its connections), upgrade it, bring it back up (allowing connections), and then repeat with the other. That way you still get a drop in performance, but everything's available throughout the upgrade and the connections are refreshed.

Still - easy to say - I've never once thought about the details of how you might do it.

Let me know how the synonym idea goes... If I could find enough minutes I'd probably test it myself - but my lunch hour has now gone...

Unknown said...

"The patch is written in a procedure so that we can use the Oracle exception handler to propagate fatal errors in the code to a controlling procedure, something that is far more difficult to do in a script."

Platform independence is another benefit of using PL/SQL, but for a moment let's ignore that.

What does the controlling procedure do? In http://robertbaillie.blogspot.com/2005/08/database-patch-runner-rollbacks.html I think the conclusion was that on an error, you've got to get your hands dirty (or at least investigate whether you got lucky). Maybe it writes an error status into the db, maybe it sends email... anything else?

I ask because I'm considering a controller written in shell that runs sqlplus (with WHENEVER SQLERROR EXIT) for the db work. Haven't yet envisioned a need for UTL_FILE and DBMS_SQL with that approach, but I haven't even started a design doc yet (and certainly not pseudocode).
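The rough shape of the sqlplus end would be something like this - nothing more than a sketch, with the file and procedure names made up:

    -- run_patch.sql, called from the shell controller as something like:
    --     sqlplus -s user/password @run_patch.sql
    WHENEVER SQLERROR EXIT SQL.SQLCODE
    WHENEVER OSERROR EXIT FAILURE

    BEGIN
        -- any unhandled exception here makes sqlplus exit non-zero,
        -- which the shell controller tests for
        patch_0042_add_customer_email;
    END;
    /

    EXIT SUCCESS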

Thanks so much for the wonderful blogs on this topic. :-)

Rob Baillie said...

Laubster,

I reckon you've got it already. It almost doesn't matter what you want to do in an error condition; all that matters is that you want to do something.

You can have the patches installed by SQL scripts and then have the controller exit on error, but the problem is the complete loss of context when that happens.

You can have your Shell / DOS / ANT / PHP / whatever script perform actions based on that exit, but you normally need to dig out your context from some log file, by interrogating the database, or similar in order to put something useful into your database log / e-mail / xml report file.

If you have some way of keeping the control in the PL/SQL layer then you have the native exception handling available to you and the context of the failure remains. It makes it much easier for you to perform actions based on the context.

Obviously, there are ways of getting round it and still running pure SQL scripts, we just found it was easier to stay native.

Plus, if you feel the platform independence of the patch runner is important to you, you can define a return mechanism from the PL/SQL layer that results in your own proprietary exception mechanism being used to cross the boundary between PL/SQL and your patch runner implementation. This would be handy if you needed to implement the database layer in another language / against another platform. Your patch runner might deal with the e-mailing / logging / etc, but you still need the PL/SQL layer to reform the exceptions into something more generic and handleable.

Anonymous said...

Hi Rob,

You say that there is no way to automatically back out a patch that has failed. Instead you have to manually piece back together the destruction you have caused.

I thought of a potential solution to this, though I have not had chance to test it out yet.

Essentially, at the start of each patch you write the current timestamp to an easily accessible place, potentially a specifically created table somewhere in Oracle. If the script fails then you make use of Oracle's flashback database facility (10g onwards) and "flashback" to the last timestamp you wrote.
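Something like this is what I have in mind - purely a sketch (the marker table is invented, and flashback database needs ARCHIVELOG mode with flashback logging enabled):

    -- At the start of the patch run (marker table invented):
    INSERT INTO patch_run_marker ( marked_at ) VALUES ( SYSTIMESTAMP );
    COMMIT;

    -- After a failure: read the marker back while the database is still
    -- open, then SHUTDOWN IMMEDIATE and STARTUP MOUNT before:
    FLASHBACK DATABASE TO TIMESTAMP
        TO_TIMESTAMP( '2005-08-04 09:30:00', 'YYYY-MM-DD HH24:MI:SS' );
    ALTER DATABASE OPEN RESETLOGS;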

Please let me know what you think.

Cheers,

Matt T

Rob Baillie said...

Matt, that sounds spot on. Great idea.

In our implementation, the patch log has a timestamp that states the moment that each individual patch was installed, so we can easily use that. That gives us the opportunity to roll back to any state during the upgrade if we wanted to.

Mind you, the original post was written before we'd played with 10g much, and we toyed with other mechanisms in 9i before we realised that we never needed to back anything out anyway. Because the upgrades are so thoroughly tested we find that nothing ever actually goes wrong when you hit the live site.

Anonymous said...

Hi there,

Excellent stuff. Strange to find that, after years of working with Oracle, I'd independently built the same tool, using PL/SQL as the "patch runner" and MS Access for the screen-based front end. Thought I was being innovative - seems others came to the same conclusion.

The screens allow you to create “Applications” (so you can manage multiple applications or schemas in parallel), register schemas (users – to which changes will be applied), and register the database changes (create table, alter table etc).

I’ve personally used DVM (Database Version Manager), in four projects, and it’s based upon an idea developed to manage over fifty developers. It works a treat, and means we guarantee that what we developed in “DEV” works in “LIVE” with close to zero DBA script management time.

TOAD and Embarcadero seem to rely upon schema differences (a shockingly complex and quite hit-and-miss affair).

Question – is there a commercial version?

I find my own home-grown tool absolutely vital - surprised it's not been incorporated into a commercial product yet. Any ideas? Is there one already I don't know about?

Good description though !!

Rob Baillie said...

Cheers for the feedback.

It's good to hear that other people think along the same lines, especially when they reach the same conclusions independently.

I've got to be honest: since we've developed our in-house tool I haven't found any great need to look into the problem in any more depth (our solution works really, really well for us). I'm sure there must be similar tools out there that people have put together; I just don't know what they are.

As ever, I think the biggest problem is the slow uptake of new ideas in the vast majority of the Oracle community. It's not that the community doesn't have great people doing great things - it's just that there's a huge number of 'Oracle' people making money off the back of doing things badly. I found I didn't really get moving until I'd crossed into the PHP / Java environment and started to get some cross-pollination.

It's probably not an Oracle problem as such - just something that happens when people work in one arena for too long without any real outside influences.

Still, back to the original question...

I haven't used any off-the-shelf tools, but take a look at Ruby's database migration support. I've not tried it, but it sounds very very tasty. May prove an inspiration..?

And if your tool is so good, why not open source it (once you get rid of Access ;-))? I know, I could ask myself the same question...

And get yourself registered onto the agile DB group over on Yahoo groups: agileDatabases@yahoogroups.com. There's a lot of great people posting over there.

Once again, thanks for your input