Friday, September 30, 2005

If I had a hammer...

I'm sure that pretty much every experienced developer out there has heard the proverb:

  • If the only tool you have is a hammer, then everything's a nail.



Well, I've heard a few variations on the theme and so I've decided to start collecting them...


  • Sometimes it's just a nail, and all you need is a hammer.

  • Don't ignore your hammer just because you heard someone else hit their thumb with theirs.

  • When you're in the market for a new tool, make sure you're not buying yet another hammer.

  • You can enhance the usefulness of your hammer by adding a chisel.

  • A rotary electro-pneumatic hammer delivering 4,600 hammer blows a minute is still just a hammer.



Anyone got any more?

Wednesday, September 28, 2005

The In Thing - Simplified

I've had a few people visit the site looking for some help on the Oracle 9i TABLE command. With this, you can issue SELECT statements against PL/SQL tables.

Now, previously I'd posted about the self same command, in a series of posts here:

http://robertbaillie.blogspot.com/2005/07/in-thing.html
http://robertbaillie.blogspot.com/2005/07/in-thing-example.html
http://robertbaillie.blogspot.com/2005/07/in-thing-simpler-example.html

Soon, I plan to put a few more notes together, starting from a simpler example. But for now I'll just thank Amis for the post here, which has pointed out a complexity in my own example that I didn't realise...

For some reason, the TABLE command cannot work out the data type of a variable passed to it, and so will tend to throw an exception:
ORA-22905: cannot access rows from a non-nested table item

As I'd only ever done this when working with packages, I previously worked around it by coding a function that returns the content of the relevant package variable. TABLE has no problem working out the datatype returned by a function.

It turns out I could have got away with just a simple CAST.
E.g. SELECT * FROM TABLE( CAST( table_variable AS table_type ) )
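
To make that concrete, here's a minimal, self-contained sketch (the type and variable names are invented for illustration, they're not taken from the example below):


CREATE TYPE num_list_type AS TABLE OF NUMBER
/

DECLARE
  vt_ids   num_list_type := num_list_type( 1, 2, 3 );
  vn_count NUMBER;
BEGIN
  -- SELECT COUNT(*) INTO vn_count FROM TABLE( vt_ids );
  -- ...tends to fail with ORA-22905, whereas the CAST version works:
  SELECT COUNT(*)
  INTO   vn_count
  FROM   TABLE( CAST( vt_ids AS num_list_type ) );
  DBMS_OUTPUT.PUT_LINE( vn_count );
END;
/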

So, with that in mind, the first 'In Thing' example becomes this:

The Example:

In order to have something to access, we need the tables and data:


CREATE TABLE product
( id NUMBER
, descr VARCHAR2( 100 ) )
/

INSERT INTO product ( id, descr ) VALUES ( 1, 'one' );
INSERT INTO product ( id, descr ) VALUES ( 2, 'two' );
INSERT INTO product ( id, descr ) VALUES ( 3, 'three' );
INSERT INTO product ( id, descr ) VALUES ( 4, 'four' );
INSERT INTO product ( id, descr ) VALUES ( 5, 'five' );
INSERT INTO product ( id, descr ) VALUES ( 6, 'six' );

COMMIT;



In order to perform the cast we need to declare a type for the row in the table, and then the table type itself:


CREATE TYPE gt_id_type AS OBJECT ( id NUMBER )
/

CREATE TYPE gt_id_list_type AS TABLE OF gt_id_type
/


Then we define the package that will perform the cast.
We have the following functions:

  • GET_PRODUCTS, which is passed a table of IDs, and returns a ref cursor containing the products requested.

  • GET_PRODS_BY_NAME, which is passed a string, and returns all the products that have a description containing that string. This function uses GET_PRODUCTS to return the product details.




CREATE OR REPLACE PACKAGE product_pkg AS
  TYPE gt_product_cur IS REF CURSOR;
  FUNCTION get_products ( pt_id_list gt_id_list_type )
    RETURN gt_product_cur;
  FUNCTION get_prods_by_name ( pc_product_name VARCHAR2 )
    RETURN gt_product_cur;
END;
/

CREATE OR REPLACE PACKAGE BODY product_pkg AS
  --
  gt_id_list gt_id_list_type;
  --
  FUNCTION get_products
    ( pt_id_list gt_id_list_type )
    RETURN gt_product_cur IS
    --
    vt_product_cur gt_product_cur;
    --
  BEGIN
    --
    gt_id_list := pt_id_list;
    --
    OPEN vt_product_cur FOR
      SELECT *
      FROM   product
      WHERE  id IN ( SELECT id
                     FROM   TABLE( CAST( gt_id_list AS gt_id_list_type ) ) );
    --
    RETURN vt_product_cur;
    --
  END;
  --
  FUNCTION get_prods_by_name
    ( pc_product_name VARCHAR2 )
    RETURN gt_product_cur IS
    --
    vt_product_ids gt_id_list_type;
    --
  BEGIN
    --
    SELECT gt_id_type( id )
    BULK COLLECT
    INTO   vt_product_ids
    FROM   product
    WHERE  descr LIKE '%'|| pc_product_name|| '%';
    --
    RETURN get_products( vt_product_ids );
    --
  END;
  --
END;
/



Finally, we have some code to run the GET_PRODS_BY_NAME function and return its set of values.


SET SERVEROUTPUT ON SIZE 1000000

DECLARE
  --
  vt_cur         product_pkg.gt_product_cur;
  vr_product_rec product%ROWTYPE;
  --
BEGIN
  --
  vt_cur := product_pkg.get_prods_by_name( 't' );
  --
  LOOP
    --
    FETCH vt_cur INTO vr_product_rec;
    EXIT WHEN vt_cur%NOTFOUND;
    DBMS_OUTPUT.PUT_LINE( vr_product_rec.descr );
    --
  END LOOP;
  --
END;
/


And the output:


two
three


Disclaimer: I've not yet managed to test this out with access from a PHP layer...


Friday, September 23, 2005

Friday afternoon...

Friday afternoons always seem to lead to pointless conversations, which sometimes lead to classic moments of comedy...

During an E-mail conversation about being vegetarian:

"In fact humans aren't carnivores at all. We meat eaters are technically eating carrion, which is meat that is dead for several days or weeks. That's why they hang cows and stuff up to enhance their flavour. We eat carrion, I.E. like vultures and hyeenas. This means we shouldn't kill at all, just eat what we find. I just happen to find it in the supermarket. It's not my fault there's a load of people who go around murdering animals and leave them in shops for me!"

Monday, September 19, 2005

Google maps? PAH!

Ever since Google released the API for Google Maps it seems the whole of the internet has gone crazy for it. Cool things have appeared like an index of the current BBC news stories on a map of the UK. Add to that the fact that Google Earth is pretty tasty too and you'd be forgiven for thinking that no other mapping tools existed.

But... they do, and one of them does something cool and useful that Google Maps doesn't.

I run. Not much, but enough to manage a 10km run without passing out. And when I train I like to make sure that I know exactly how far I'm travelling. I like to plan my training schedule so I know that on Monday I'll do 5km, on Wednesday I'll do 2km fast + 1km jog + 2km fast + 1km jog. I know, I know, I can't help it.

Anyway. Google Maps doesn't help me there. But Map 24...

Map 24 has a really handy little ruler tool. Point and click a rubber band round the route you run and BOSH, the distance travelled rounded to the nearest 100th of a mile (or thereabouts). Very nice for working out 1km / 3km / 5km laps round your local streets.
Oh, and it does a really cool wooshy zoom thing when you put an address in!


Friday, September 16, 2005

Feeling Smug

We've just gone live with a 40 user pilot of the latest version of the product we're developing, and once again I've been reminded why we work in an extreme programming kind of way, and why sometimes I just plain love this job!

Feedback, from the coal face...

  • "This is really, really good. You've done a great job."

  • "It's very straightforward, isn't it?"

  • "I'm loving this! You can tell that a lot of thought has gone into this"



And my personal favourite:

  • "I thought I was going to hate it - but I love it!"




Thursday, September 15, 2005

Ch-ch-ch-ch-Chaaaanges

Well, after a couple of weeks of fiddling with HTML and CSS I've finally managed to replace the blog's bog standard Blogger template with a home grown one...
It may not be perfect, but at least it's mine!

Comments are more than welcome.

Sunday, September 11, 2005

Easy RSS syndication with FeedDigest

Thanks to a heads up from Andrew Beacock, I've taken a bit of a look at FeedDigest.

It's a very simple syndication site that allows you to easily put together RSS feeds and then produce HTML versions of them for placing on your site.

I can see how this sort of tool can really start to push the use of RSS. For me, all it's meant is that Bobablog now has the last 5 OraBlogs posts and the last 3 BobaPhotoBlog posts. But there's no reason why it should stop there.

You could put up a feed of your del.icio.us bookmarks, the latest news from the BBC, a central collection of your task lists from BackPack, or (using the search facility) a short list of your own posts on a particular topic.

The reason I think this is cool isn't because it does anything new... it doesn't. The reason is that it does things in a way that means non-developers can do it.



Saturday, September 10, 2005

Database Patch Runner: Design by Contract

So let's say that you've managed to put together a build script that will install the latest version of your database with the minimum of work and you've got your developers using the build to upgrade their own workspaces.

How can you push the patch runner further? How can you actually test the upgrade and get easy-to-analyse information back from a nightly or continuous integration build? How can you protect your live databases and ensure that patches are only ever applied to databases that are in the right state to receive them?

Bertrand Meyer developed the concept of Design by Contract. He suggested a technique whereby for each method there will be a contract, stated in terms of:
  • Preconditions: These conditions should be true before the function executes.

  • Postconditions: Given the preconditions are upheld, the function guarantees that on completion the postconditions will be true.

In some languages (e.g. Eiffel) this contract is enforced, and if either side of the contract is broken the method throws an exception.

Our patch runner works in a similar way. Whenever a patch is written, pre and postcondition scripts are written. The precondition script embodies the assumptions about / requirements for the state of the
database immediately before the patch runs. The postconditions state the shape the database should be in once the patch has completed.

Immediately before running a given patch, the patch runner will run the corresponding precondition script. The patch will only be applied if that precondition script both exists and runs to completion without reporting any errors.
Once the patch is complete the postcondition script is executed. If the postcondition script is missing, or reports a failure, then the build stops and the error is clearly reported.
Only when the pre, patch and post scripts have completed is the patch log updated to state that the patch has been applied. If any point of the process reports an error then the patch log is updated to state that the patch failed, and for what reason.
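
As a rough sketch of that flow (the procedure names, patch name and log handling here are invented purely for illustration, they're not our actual patch runner code):


DECLARE
  -- Stand-ins for the real patch runner's plumbing
  PROCEDURE run_script ( pc_script VARCHAR2 ) IS
  BEGIN
    DBMS_OUTPUT.PUT_LINE( 'Running '|| pc_script );
    -- the real runner executes the script and raises an exception on any error
  END;
  PROCEDURE record_patch ( pc_patch VARCHAR2, pc_status VARCHAR2 ) IS
  BEGIN
    DBMS_OUTPUT.PUT_LINE( pc_patch ||': '|| pc_status );
    -- the real runner writes this to the patch log table
  END;
BEGIN
  run_script( 'patch_0042_pre.sql' );   -- precondition must exist and pass...
  run_script( 'patch_0042.sql' );       -- ...only then is the patch applied...
  run_script( 'patch_0042_post.sql' );  -- ...and the postcondition checked
  record_patch( 'patch_0042', 'APPLIED' );
EXCEPTION
  WHEN OTHERS THEN
    record_patch( 'patch_0042', 'FAILED: '|| SQLERRM );
    RAISE;
END;
/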

For the majority of patches, the pre and postconditions are fairly trivial, and can appear to be overkill. For example, if you're adding an unpopulated column to a table then the precondition is that the column does not exist, the postcondition is that it does.
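
For that trivial case, the precondition script might be little more than this kind of check (a sketch only; the table and column names are invented):


DECLARE
  vn_count NUMBER;
BEGIN
  SELECT COUNT(*)
  INTO   vn_count
  FROM   user_tab_columns
  WHERE  table_name  = 'PRODUCT'
  AND    column_name = 'WEIGHT';
  --
  IF vn_count > 0 THEN
    RAISE_APPLICATION_ERROR( -20001, 'Precondition failed: PRODUCT.WEIGHT already exists' );
  END IF;
END;
/


The postcondition script is simply the same check reversed: fail if the column does not exist.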

The real advantage comes when you write data migrations. For example, say a patch is needed that will move column X from table B to table A. A requirement may be that every value that was in B must exist in at least one column in table A, so that no data is lost during the migration.
A precondition could be that there is at least one destination row in table A for each value. By checking this condition before the patch starts we can stop the patch from destroying data.

Another example may be the reforming of some tables that contain monetary values. It may be required that, whilst data is being moved between rows or columns, the sum total of all values in a particular set of columns be the same on completion of the patch as it was at the start. The precondition script can store the original sum totals, and the postcondition can then check the resulting totals after the patch is complete.
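
A sketch of that kind of check might look like this (purely illustrative; the scratch table, source table and column names are all invented):


-- Precondition: record the total before the patch runs
-- (in practice the scratch table would already exist, or be tidied up afterwards)
CREATE TABLE patch_check_totals ( check_name VARCHAR2( 30 ), total NUMBER );

INSERT INTO patch_check_totals ( check_name, total )
SELECT 'INVOICE_VALUE', SUM( invoice_value ) FROM invoice;

COMMIT;

-- Postcondition: fail if the total has changed
DECLARE
  vn_before NUMBER;
  vn_after  NUMBER;
BEGIN
  SELECT total INTO vn_before FROM patch_check_totals WHERE check_name = 'INVOICE_VALUE';
  SELECT SUM( invoice_value ) INTO vn_after FROM invoice;
  --
  IF NVL( vn_before, 0 ) <> NVL( vn_after, 0 ) THEN
    RAISE_APPLICATION_ERROR( -20002, 'Postcondition failed: total changed from '
                                     || vn_before ||' to '|| vn_after );
  END IF;
END;
/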

Producing the pre and postcondition scripts focuses the developer on two things:
  • What assumptions am I making about the structure of the data I am about to transform, and which of those assumptions are critical to my patch's success?

  • How can I measure the success of my patch, and what requirements do I have of the resulting data?

Both of these things are very different from the usual focus of: what processes do I need to apply to the data? However, focusing on the first two will usually result in a clear path to the third.

We've found that using this structure greatly increases the feedback we get from our builds.

If you can get your DBA to buy into the idea, you can produce an overnight test build using the most recent backup from your live system. Overnight, automatically copy live, upgrade it and have your build produce a report on the success of the current build against the current candidate for upgrade. This report will warn you if the data in the live system is moving away from the assumptions you originally had when producing your patches. It is much easier to deal with such issues when you find out during your development cycle, rather than during or, even worse, days after the upgrade has been applied to the production environment. An upgrade patch becomes a very reliable thing when it's been run 50 times against 50 different sets of data, each time reporting its success or failure.

In addition, the postcondition scripts act as clear documentation as to the aim of the patch. This is of great use if and when you find a performance problem with a particular patch. Rewriting a patch 2 months after it was first put together is so much easier when you have a postcondition script to act as both documentation and sanity check.

As a possible variation, on finding failures it should be possible for the patch runner to automatically rollback the failed change, taking the database back to the state immediately before the patch started. In some environments this may be critical, however we've not yet found a pressing need and therefore in our version we do not automatically rollback failures.

Of course, the mechanism is only as good as the pre and postcondition scripts that are produced, but then you ensure quality by always pair programming, don't you? ;-)


Wednesday, September 07, 2005

The Database Patch Runner: Dealing with code

Previously I've talked about the database patch runner as a way of easily and consistently getting database changes into version control, onto databases and under automated frameworks. But one topic I think I glossed over was: which database objects fall under the patch runner, and how do you deal with those that don't?

A big advantage of the patch runner is that it groups functional changes together and will ensure that those changes are applied to a given database once and only once. A disadvantage is that it removes the object-centric view of changes from version control and puts it into the database. For tables, I really don't see this as a problem. For packages, procedures and functions this is simply unacceptable; we need to keep source code grouped in the more traditional object-centric manner.

We could have developers make their changes to the individual procedures and suchlike and then get them to add the source code to the patches once they've finished. But they'd probably forget a file every now and again, and would moan about the task. We could have a build manager monitor the version control repository and generate the list of source code files that are needed for the jump between any two given tagged versions of the system and build patch files upon request. But that would mean that we would need to employ a build manager to do this job.

Ideally we want a solution that will minimise the overall work in managing the build script, not just the work for the developer.

One of the reasons why we use the patch runner is to ensure that destructive code, or code that changes the underlying structure of a database, cannot be run more than once. The implication of this is that statements which can, by their intrinsic nature, be run multiple times without destroying structure or content do not need to form part of the patch runner.

That is, if a procedure was to be dropped and re-created, the only impact would be to temporarily invalidate any other procedures calling that procedure. Oracle can deal with this and can be set to re-compile dependencies when a procedure is called. The overall structure and content of the database is not changed by re-creating a procedure. The intrinsic nature of the procedure means that it can be re-installed without damaging the database.
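
As a trivial illustration (the procedure itself is invented for the example), this statement can be run any number of times without altering the overall structure or content of the schema:


-- Can be run repeatedly: each run simply replaces the previous definition
CREATE OR REPLACE PROCEDURE log_message ( pc_message VARCHAR2 ) IS
BEGIN
  DBMS_OUTPUT.PUT_LINE( pc_message );
END log_message;
/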

This is not so when dealing with tables. If a table was to be dropped and re-created*, the impact would be to lose the data previously held in that table. The result of re-creating a table that has not changed is that the overall structure will remain the same, but the content will not.

* Re-creating a table here being: dropping the table, cascading the drop of any keys on / to that table, and re-creating both the table and those keys.
This is not a small amount of work in itself, and if this was not done, the structure would also change.


This difference is crucial in understanding which objects need to fall under the control of the patch runner and which objects do not, as well as understanding how we can then deal with the objects that do not.

Of course, there is nothing stopping you from putting any particular set of objects under the control of the patch runner if you feel it is appropriate. For example, indexes can be dropped and re-created without any risk of loss of content and so you may regard these as outside the patch runner's concerns. However, indexes on large tables can take time to create and most of the data migrations that take place in the patch runner scripts would benefit from those indexes. For these reasons it may sometimes be appropriate to put the indexes into patch scripts.

For us, the split between patch runner and non patch runner is as follows:

Patch Runner:
  • Tables

  • Indexes

  • Materialized Views


Non Patch Runner:
  • Grants / Synonyms

  • Views

  • Packages / Functions / Procedures


The list isn't exhaustive by any means; it covers the objects we use. As any new object type comes into focus we get together to decide if it should be installed by the patch runner.

So, knowing which objects do not fall into the realm of the patch runner, we then need to actually deal with those objects.

As stated above, the primary factor for deciding whether an object falls outside of the patch runner's remit is whether that object can be freely replaced with a new version with no further work. We use that to our advantage, and state that since all objects outside of the patch runner can be replaced when they have changed, all objects outside of the patch runner will be replaced irrespective of whether they have changed or not.

That is, our patch runner sits within a larger build script. In our case this build script is a master batch file and a collection of SQL scripts, though I see no reason why this shouldn't be Unix scripts, ANT tasks, or whatever. The overarching build scripts contain sections that will install objects that are required before the patch runner can execute, and others that require the patch runner to have completed its task before they are installed. In every case, each of these scripts install all objects required by the database, not caring if they previously existed in the correct form.

In some cases a single script will create the full set of a single object type; all grants are created in a single script, as are synonyms. In other cases a single script will exist for each individual object; each package specification has a script to itself, separated from its corresponding package body. In adding a new object to the database schema, the developer needs to add that object to the relevant controlling build script.

This setup allows us to separately manage the version control for each individual object where that object is sufficiently important, whilst managing the number of scripts that exist for the less verbose creation statements.
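
To make the shape of that build a little more concrete, the overarching script might look something like this (a sketch only; the file and directory names are invented rather than taken from our actual build):


-- install.sql : master build sketch

-- Section 1: objects the patch runner relies on
@@synonyms.sql
@@grants.sql

-- Section 2: the patch runner applies any outstanding table / index / mview patches
@@patch_runner.sql

-- Section 3: objects that are re-installed on every build, changed or not
@@views.sql
@@packages/product_pkg.pks   -- one script per package specification...
@@packages/product_pkg.pkb   -- ...and one per package body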

By ensuring that all objects are created during every build sequence we simplify two processes:
  1. Implementing a change in a given object requires only that the create script is edited, therefore minimising the work a developer must do to implement that change.

  2. Building the release does not require knowledge of the source version, only the target version, therefore minimising the work required in generating a release.


The result is that whilst the objects fall within a highly structured build script, the developer need not think about it except when adding new objects. The PL/SQL source code can be managed like any other source code, each component falling under version control as an individual component.

We acknowledge that the build script could be better, and that the controlling install scripts could be generated (run all files from directory x, or run all files with extension '.xxx'), though in reality the maintenance of this script is minimal (a matter of minutes each week). Also, putting this in place would have an impact only on the build script, not on the individual object create scripts.

Additionally, it can appear to be overkill to install the full set of source code on each build, even small patch releases. However, installing source code onto an Oracle database is generally pretty fast. We find that the build script spends far more time running data migrations than installing code.

We exclusively use packages (no standalone procedures, functions or triggers), and so our build scripts are simple: We can install all the package specifications, then all the package bodies and can be sure that they will compile; dependencies do not affect the order of installation.
Views are installed in an order dictated by dependencies, though this does not prove difficult. If it did I'm sure we could 'force create' them.
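
For reference, the 'force create' mentioned here is Oracle's FORCE option on view creation, which creates the view even if objects it depends on are missing or invalid. A minimal, invented example:


CREATE OR REPLACE FORCE VIEW product_summary_v AS
SELECT id, descr
FROM   product
/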

Simplicity aside, I'm certain that even if we had difficult interdependencies with a very large number of objects, we would first test the ability to manage their creation in this way.
In short, I acknowledge that for larger projects there may be difficulties with the approach in the simplest form, but I'm more than confident that the basic theory holds. If not, I'd like to hear why, since our small (ish) project will inevitably become a very large one, as is the way with database applications.


Friday, September 02, 2005

A little more conversation

In my usual style, I'm finding it difficult to walk away from a conversation on Wilfred van der Deijl's blog, this time with a guy called Chuck.

His comment can be found here, with my reply following close behind.

I wouldn't normally link to a comment like this, but I feel the message I'm trying to get across with the Patch Runner is well illustrated by my confusion about Chuck's company's approach.

As he clearly points out, the CMM level 3 accreditation states that the process is repeatable, but not necessarily appropriate.


WTF

For a while now I've been reading the consistently amusing Daily WTF. Well now, thanks to William Robertson, James Padfield, Thai Rices and Adrian Billington we have the Oracle WTF. It's started well...
