Wednesday, September 07, 2005

The Database Patch Runner: Dealing with code

Previously I've talked about the database patch runner as a way of easily and consistently getting database changes into version control, onto databases and under automated frameworks. But one topic I think I glossed over was: Which database objects fall under the patch runner, and how do you deal with those that don't.

A big advantage of the patch runner is that it groups functional changes together and will ensure that those changes are applied to a given database once and only once. A disadvantage is that it removes the object centric view of changes from version control and puts it into the database. For tables, I really don't see this as a problem. For packages, procedure and functions this is simply unacceptable; we need to keep source code grouped in the more traditional object centric manner.

We could have developers make their changes to the individual procedures and suchlike and then get them to add the source code to the patches one they've finished. But they'd probably forget a file every now and again, and would moan about the task. We could have a build manager monitor the version control repository and generate the list of source code files that are needed for the jump between any two given tagged versions of the system and build patch files upon request. But that would mean that we would need to employ a build manager to do this job.

Ideally we want a solution that will minimise the overall work in managing the build script, not just the work for the developer.

One of the reasons why we use the patch runner is to ensure that destructive code, or code that changes the underlying structure of a database cannot be run more that once. The implication of this is that statements which can be ran multiple times without destroying structure or content, due to their intrinsic nature, do not need to form part of the patch runner.

That is, if a procedure was to be dropped and re-created, the only impact would be to temporarily invalidate any other procedures calling that procedure. Oracle can deal with this and can be set to re-compile dependencies when a procedure is called. The overall structure and content of the database is not changed by re-creating a procedure. The intrinsic nature of the procedure means that it can be re-installed without damaging the database.

This is not so when dealing with tables. If a table was to be dropped and re-created*, the impact would be to loose the data previously held in that table. The result of re-creating a table that has not changed is that the overall structure will remain the same, but the content will not.

* Re-creating a table here being: dropping the table, cascading the drop of any keys on / to that table, and re-creating both the table and those keys.
This is not a small amount of work in itself, and if this was not done, the structure would also change.


This difference is crucial in understanding which objects need to fall under the control of the patch runner and which objects do not, as well as understanding how we can then deal with the objects that do not.

Of course, there is nothing stopping you from putting any particular set of objects under the control of the patch runner if you feel it is appropriate. For example, indexes can be dropped and re-created without any risk of loss of content and so you may regard these as outside the patch runner's concerns. However, indexes on large tables can take time to create and most of the data migrations that take place in the patch runner scripts would benefit from those indexes. For these reasons it may sometimes be appropriate to put the indexes into patch scripts.

For us, the split between patch runner and non patch runner is as such:

Patch Runner:
  • Tables

  • Indexes

  • Materialized Views


Non Patch Runner:
  • Grants / Synonyms

  • Views

  • Packages / Functions / Procedures


The list isn't exhaustive by any means; it covers the objects we use. As any new object type comes into focus we get together to decide if it should be installed by the patch runner.

So, knowing which objects do not fall into the realm of the patch runner, we then need to actually deal with those objects.

As stated above, the primary factor for deciding when an object should fall under outside of the patch runner's remit is if that object can be freely replaced with a new version of that object with no further work. We use that to our advantage, and state that since all objects outside of the patch runner can be replaced when they have changed, then all objects outside of the patch runner will be replaced irrespective of whether that object has changed or not.

That is, our patch runner sits within a larger build script. In our case this build script is a master batch file and a collection of SQL scripts, though I see no reason why this shouldn't be Unix scripts, ANT tasks, or whatever. The overarching build scripts contain sections that will install objects that are required before the patch runner can execute, and others that require the patch runner to have completed its task before they are installed. In every case, each of these scripts install all objects required by the database, not caring if they previously existed in the correct form.

In some cases a single script will create the full set of a single object type; all grants are created in a single script, as are synonyms. In other cases a single script will exist for each individual object; each package specification has a script to itself, separated from its corresponding package body. In adding a new object to the database schema, the developer needs to add that object to the relevant controlling build script.

This setup allows us to separately manage the version control for each individual object where that object is sufficiently important, whilst managing the number of scripts that exist for the less verbose creation statements.

By ensuring that all objects are created during every build sequence we simplify two processes:
  1. Implementing a change in a given object requires only that the create script is edited, therefore minimising the work a developer must do to implement that change.

  2. Building the release does not require knowledge of the source version, only the target version, therefore minimising the work required in generating a release.


The result is that whilst the objects fall within a highly structured build script, the developer need not think about it except when adding new objects. The PL/SQL source code can be managed like any other source code, each component falling under version control as an individual component.

We acknowledge that the build script could be better, and that the controlling install scripts could be generated (run all files from directory x, or run all files with extension '.xxx'), though in reality the maintenance of this script is minimal (a single number of minutes each week). Also, putting this in place would have an impact only on the build script, not on the individual object create scripts.

Additionally, it can appear to be overkill to install the full set of source code on each build, even small patch releases. However, installing source code onto an Oracle database is generally pretty fast. We find that the build script spends far more time running data migrations than installing code.

We exclusively use packages (no standalone procedures, functions or triggers), and so our build scripts are simple: We can install all the package specifications, then all the package bodies and can be sure that they will compile; dependencies do not affect the order of installation.
Views are installed in an order dictated by dependencies, though this does not prove difficult. If it did I'm sure we could 'force create' them.

Simplicity aside, I'm certain that even if we had difficult interdependencies with a very large number of objects, we would first test the ability to manage their creation in this way.
In short, I acknowledge that for larger projects there may be difficulties with the approach in the simplest form, but I'm more than confident that the basic theory holds. If not, I'd like to hear why, since our small (ish) project will inevitably become a very large one, as is the way with database applications.

Technorati Tags: , , , , , , , ,

8 comments:

Anonymous said...

Thanks for this detailed report. It's along the line I was thinking.

But how do you handle dropping an object. What if you want to drop a pacakge. Do you change the last version of the PackageSpec SQL file to have a DROP statement or do you include this in a patch script?

I would expect the first approach, so you can see clearly in version control that the package has been dropped.

You do end up with a lot of PackageSpec files for packages that are not realy there.

You could also just delete the file from version-control and have your build script notice a package in the database that's not part of the source-files and drop it. This does give you a cleaner version control but do you trust your build-script enough to go ahead and drop things?

Niall said...

Hey Robert - great post, but do we have to see all of it on orablogs/ in our news reader?

Niall

Rob Baillie said...

Niall,

From now on, no you don't.

I've limited the RSS feed to 1000 characters... hope that works for you!

All apologies.

Rob Baillie said...

Wilfred,

I'm glad it’s helped, and very glad it’s in line with your thinking.

We deal with dropping objects in line with your first description. We produce a patch to drop the objects we no longer need and then remove the object scripts from version control. This way it’s a conscious decision to remove objects and you don't have your defunct scripts hanging around confusing matters. Obviously you need version control software that deals with this properly (earlier tagged versions should still contain the removed scripts), but I think this is fairly standard behaviour.

I hadn't really thought in depth about it before, but I suppose there are other options too.
The build script could remove all the views / procedures / packages at the start, thus ensuring that only those created by this run of the build script will exist when the upgrade is complete. This is probably simpler to implement that checking if an object should be there after everything’s installed.

In answer to the question:
"do you trust your build-script enough to go ahead and drop things?"

Yes, yes we do.
Each developer will use the build script several times a week in order to keep their development databases up to date. Therefore we find that an error in the build script is caught pretty quickly: usually in a matter of hours.

Using the live version of the build script to upgrade and rebuild the development databases is the easiest way to improve the simplicity and the integrity of the build script.
It's amazing how quickly developers fix things when they need them to work themselves.

Andrew Beacock said...

Great post Rob, but I have a bit of a query that I think you could clear up...

Is the advantage of having Non Patch Runner items purely so that you can have them living elsewhere in the code repository?

I would have thought that the logging information that the Patch Runner provides would be a great thing, but if you have Non Patch Runner items then I assume that you lose that advantage.

Is that it, or am I grossly missing the point? (probably...)

Rob Baillie said...

Wilfred,

Just a thought...

Whilst I trust the build script to do what it does effectively, I may not trust support departments not to add code to the system directly on production databases. Anything that keeps the database 'clean' of such code will then have an impact on the support department, DBAs and such.

So maybe the question should be:
Do you trust your support department to follow the proper processes and ensure that no required code is installed outside of the proper build script?

Rob Baillie said...

Andy

In answer to your question;
Is the advantage of having Non Patch Runner items purely so that you can have them living elsewhere in the code repository?

Yes, I think it is. But I don't think the benefit of this can be overestimated.
If the source code objects were placed in the patch files, then in order to edit a given procedure you would have to either extract it from your development database or find the patch in which it was last upgraded. Having the source code separated means that it's easy to find.

Imagine if I said that in order for you to find a given Java class you had to extract if from the compiled application and de-compile it?

Whilst table definition scripts differ immensely from most other pieces of code in an application, Oracle packages and procedures do not. They are simple source code that I think should by managed in the same way as any other source code. The build script is then merely the Oracle equivalent of another language's MAKE or ANT file.


In terms of logging.
Yes, logging is a huge advantage in the patch runner, though not just because it is a log. In my opinion it is because of the way this log is then used. Most importantly the patch runner uses this log to ensure that no patch is ever ran twice. In some cases, performing a given data migration twice may destroy data.

Whilst we probably shouldn't install procedures twice in a single upgrade, there's no functional problem in doing so, it's just that we're doing the same job twice.

Does that answer your questions?

Andrew Beacock said...

Rob,

Thanks for your answers, it does clear quite a lot up. I think I'm having problems as I'm not an Oracle boy, just a cheap Postgresql kind of guy. :)