Sunday, December 31, 2006

P-dd 0.1 finally released

So, with just six and a half hours to go before the year ends, I've finally managed to get P-dd (the PHP Database Documentor) up to version 0.1 standard.

The blog's up and running, a slab of source code sits on Google Code, and at last I feel like I can stand beside the library and say "yeah, well it's good enough for now".

You can find the code here:

And the blog here:

So what does it do?

Put simply, it's a library of PHP classes that makes it easy to generate documentation from a set of database sources.

The idea is that, over time, database sources will be added that allow the collection of metadata from all the major database players (Oracle / MySQL / Postgres / etc), and that the library will produce documentation in most of the popular forms (HTML / XML / RTF / PDF / etc), including ER diagrams.

The aim is to make it simple to use the library to build either applications that output documentation for static publication, or applications that allow navigation through the database structure. Note that it is not the aim of the project to produce either of these applications, merely to allow for their creation.

It is also recognised that, in the future, it would be desirable to take the library into a more analytical role: for example, inferring foreign keys that are not explicitly declared, either by examining the table structures or the data within those tables.
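To make the "examining the table structures" idea concrete, here is a minimal sketch of one possible heuristic, assuming a naming convention where a column called `<table>_id` (or `<singular>_id` against a plural table name) points at that table's primary key. This is not P-dd code; `infer_foreign_keys` and `InferredForeignKey` are hypothetical names, and the sketch is in Python rather than PHP.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class InferredForeignKey:
    table: str
    column: str
    referenced_table: str
    referenced_column: str


def infer_foreign_keys(tables: Dict[str, List[str]],
                       primary_keys: Dict[str, str]) -> List[InferredForeignKey]:
    """Guess undeclared foreign keys from column names alone (hypothetical heuristic).

    tables maps table name -> column names; primary_keys maps table name -> PK column.
    """
    inferred = []
    for table, columns in tables.items():
        for column in columns:
            # Only consider "<something>_id" columns that are not this table's own PK.
            if not column.endswith("_id") or column == primary_keys.get(table):
                continue
            stem = column[: -len("_id")]
            # Try the stem as-is, then a naive plural ("customer" -> "customers").
            for candidate in (stem, stem + "s"):
                if candidate in primary_keys and candidate != table:
                    inferred.append(InferredForeignKey(
                        table, column, candidate, primary_keys[candidate]))
                    break
    return inferred
```

A real implementation would want to confirm each guess against the data (for example, checking that every `orders.customer_id` value exists in `customers.id`), which is the second approach mentioned above.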

The library is very much in its early stages though, and for now we've got the following:

  • A database model that consists of:

    • Tables

    • Columns

    • Primary Keys

    • Foreign Keys

  • The database model can be created from the following sources:

    • Oracle

    • XML File

  • The model can be rendered into the following formats:

    • HTML

    • XML

    • Graphviz Neato Diagram (producing an ER diagram)
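As a rough illustration of how those three pieces (model, source, renderer) fit together, here is a small Python sketch. It is not P-dd's PHP API: the `Table`/`ForeignKey` classes, the XML layout read by `tables_from_xml`, and `render_dot` are all hypothetical stand-ins. The renderer emits an undirected `graph`, which is the kind of input neato lays out.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List


@dataclass
class ForeignKey:
    column: str
    references_table: str
    references_column: str


@dataclass
class Table:
    name: str
    columns: List[str] = field(default_factory=list)
    primary_key: List[str] = field(default_factory=list)
    foreign_keys: List[ForeignKey] = field(default_factory=list)


def tables_from_xml(text: str) -> List[Table]:
    """XML source for a hypothetical format:
    <database><table name=".."><column name=".." primary="yes"/>
    <foreignkey column=".." table=".." references=".."/></table></database>"""
    tables = []
    for t in ET.fromstring(text).iter("table"):
        table = Table(t.get("name"))
        for c in t.iter("column"):
            table.columns.append(c.get("name"))
            if c.get("primary") == "yes":
                table.primary_key.append(c.get("name"))
        for fk in t.iter("foreignkey"):
            table.foreign_keys.append(
                ForeignKey(fk.get("column"), fk.get("table"), fk.get("references")))
        tables.append(table)
    return tables


def render_dot(tables: List[Table]) -> str:
    """Renderer: one box per table, one labelled edge per foreign key."""
    lines = ["graph er {", '  node [shape="box", fontname="arial"];']
    for table in tables:
        lines.append(f'  "{table.name}";')
    for table in tables:
        for fk in table.foreign_keys:
            lines.append(f'  "{table.name}" -- "{fk.references_table}" [label="{fk.column}"];')
    lines.append("}")
    return "\n".join(lines)
```

Writing the output of `render_dot` to a file such as `er.dot` and running `neato -Tpng er.dot -o er.png` would then produce the diagram. Keeping sources and renderers separate around a shared model is what lets any source feed any output format.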

There are also lots of other little goodies in there, such as datasource-independent filters, a datasource caching system that limits the round trips to the database, and a plethora of examples showing how the components can be used, as well as a simple Oracle database viewer application to show off what is possible with just a small amount of work.
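The caching idea is simple enough to sketch: wrap the expensive metadata lookup so each table's details are fetched from the database only once. The sketch below is a Python illustration of that idea, not P-dd's actual caching classes; `CachingDataSource` and the `fetch_columns` callback are hypothetical.

```python
from typing import Callable, Dict, List


class CachingDataSource:
    """Memoises a metadata lookup so each table costs at most one round trip."""

    def __init__(self, fetch_columns: Callable[[str], List[str]]):
        self._fetch_columns = fetch_columns  # the expensive call to the database
        self._cache: Dict[str, List[str]] = {}
        self.round_trips = 0

    def columns(self, table: str) -> List[str]:
        if table not in self._cache:
            self.round_trips += 1
            self._cache[table] = self._fetch_columns(table)
        # Return a copy so callers cannot accidentally modify the cached list.
        return list(self._cache[table])
```

Because the wrapper only depends on a callable, the same cache can sit in front of any datasource (Oracle, an XML file, etc.), which fits the datasource-independent design described above.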

I hope the code is of use, and I'm fully committed to getting more and more functionality into the code as soon as possible in the new year.

Note: The eagle-eyed among you may notice that I've added a new sidebar to this blog, which lists the posts from the P-dd blog...


Wednesday, December 06, 2006

Repeating a point

The other day I mentioned the principle "Don't Repeat Yourself". I think it may have inspired Andy Clarke to write this up, and he's quite right. The principle comes from the Pragmatic Programmers.

APC's spot on in his description as it relates to writing code, but he doesn't go far enough.

DRY relates to every part of software development, not just the bit where you're knocking out code.

If, in any part of the process, you find you have a duplication of knowledge then you have a potential problem.

Anyone ever read that comment at the top of a procedure and found its description doesn't match the code that follows?

Watched that demonstration video and found that it's showing you an utterly different version of the system to that which you've just installed?

Looked at that definitive ER diagram and found it's missing half the tables?

Well, don't put a comment at the top of the procedure; instead, document the behaviour by writing an easy-to-read unit test for it. Whilst the knowledge might still be duplicated (the test duplicates the knowledge inside the procedure), at least those two pieces of knowledge are validated against each other. If you have to repeat, put in some validation.
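Here is a minimal sketch of a test standing in for a header comment, using Python's `unittest`. The `apply_discount` procedure and its 10%-off rule are invented purely for illustration. Each test name states one rule, so if the code changes and the "documentation" goes stale, a test fails instead of silently lying.

```python
import unittest


def apply_discount(total: float, is_member: bool) -> float:
    # Hypothetical procedure: members get 10% off orders of 100 or more.
    if is_member and total >= 100:
        return round(total * 0.9, 2)
    return total


class ApplyDiscountBehaviour(unittest.TestCase):
    """Reads as the specification of apply_discount, and is checked on every run."""

    def test_members_get_ten_percent_off_orders_of_100_or_more(self):
        self.assertEqual(apply_discount(100.0, is_member=True), 90.0)

    def test_members_pay_full_price_below_100(self):
        self.assertEqual(apply_discount(99.99, is_member=True), 99.99)

    def test_non_members_always_pay_full_price(self):
        self.assertEqual(apply_discount(250.0, is_member=False), 250.0)
```

Saved as a module, this runs with `python -m unittest <module>`; the test names double as a readable list of the procedure's rules.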

Don't have one team writing automated functional tests and another producing videos; write your video scripts as automated tests and have the videos generated with every build.

Instead of manually creating a set of ER diagrams and documentation on what the system will be like, write some documentation generation software and have it generated from the current state of the database instead.
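As a small illustration of documentation generated from the current state of the database, the Python sketch below reads the live schema of a SQLite database (chosen only because it ships with Python) and writes plain-text documentation. `document_schema` is a hypothetical name. The catalogue queries (`sqlite_master`, `PRAGMA table_info`, `PRAGMA foreign_key_list`) are SQLite-specific; other RDBMSs expose the same information through their own catalogue views.

```python
import sqlite3


def document_schema(conn: sqlite3.Connection) -> str:
    """Produce plain-text documentation straight from the live schema."""
    lines = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' "
        "AND name NOT LIKE 'sqlite_%' ORDER BY name").fetchall()
    for (table,) in tables:
        lines.append(f"Table {table}")
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        for _cid, name, col_type, _notnull, _default, pk in conn.execute(
                f'PRAGMA table_info("{table}")'):
            marker = " (PK)" if pk else ""
            lines.append(f"  {name} {col_type}{marker}")
        # PRAGMA foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
        for row in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
            lines.append(f"  FK {row[3]} -> {row[2]}.{row[4]}")
    return "\n".join(lines)
```

Run against a database containing `customers` and `orders` tables, this lists each table's columns, marks the primary keys and shows `FK customer_id -> customers.id` for the declared relationship. Re-running it after every schema change means the documentation cannot drift.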

You might notice that there's a running theme here... generation. Well yup. One of the best ways of reducing the chances of discrepancies between sources of knowledge is by ensuring there is only one representation of that knowledge and generating the others.

It's one of the reasons why I've been working on the new open source library 'P-dd' (PHP Database Documentor). It's intended to be a simple library for producing database documentation from a number of different sources. The ultimate aim is to be able to read from any of the major RDBMSs, wikis, XML files and suchlike, and to simply output many different forms: HTML, GIF, PDF, XML, OpenOffice documents. Over the next week I intend to let people know where they can find it, in its early form...

Wednesday, November 29, 2006

Worth repeating...

A mantra for all elements of software development...

Repeat after me:

Don't Repeat Yourself

Don't Repeat Yourself

Don't Repeat Yourself.

Monday, November 06, 2006

Testing doesn't have to be formal to be automated

Something I hear quite a bit from people who don't 'do' automated testing is the set of excuses that goes something like this:

"Look, we know it's a really good idea and everything but we just can't afford the start-up time to bring in all these automated test tools, set up a continuous integration server and write all the regression tests we'd need in order to get it up and running. And even if WE thought we could, there's no way we'd get it past the management team."

Now let's assume for a second that the standard responses haven't worked: "How can you afford not to?", "It'll save you time in the long run", yada yada yada. When people are in that mindset there's not much you can do...

For reasons I'm about to explain, we have a project on the go that isn't covered by automated tests. It's an inherited system that can't have tests retro-fitted in the time we have. In reality, most of the work we're doing is removing functionality, with a few cosmetic changes and a little bit of extra stuff in the middle: no more than a couple of weeks' work.

It turns out our standard automated test tools can't readily be fitted to the system we have.

But that's OK. It certainly doesn't mean we're not going to test, and I'm damn sure that we're going to automate a big chunk of it.

First of all, we've separated out all the functionality that can be delivered in a new module, and that part WILL be fully unit and story tested. That leaves a pretty small amount of work in the legacy system; small enough that we could probably accept the risk associated with not doing any automated testing.

But that would be defeatist.

So instead we've picked up Selenium.

Not the full-blown Selenium server with continuous integration hooks and whatever. Just the Firefox-based IDE.

It's simple, and requires absolutely minimal set-up: it's an XPI that drops straight into Firefox like any other extension. Having installed the IDE, you get action record and playback, plus nice context-sensitive right-click options on any page that allow you to 'assert "this ole text" appears on page' or 'check the value of this item is x'. Basically, it's almost trivial to get a regression test up and running, and you can then use the IDE to run the test.

So, having got that up and running, before we set about deleting huge swathes of functionality we create a regression test that ensures the functionality we want to keep stays there. We've found that a decent-sized test covering a fair few screens and actions can take as little as an hour to put together. To put it into perspective: today we put together a test script that took about 20 minutes to run manually. That test took us about 45 minutes to sort out in the Selenium IDE, and then a minute or two to run each time after that. So by the time we'd run it 3 times we'd saved ourselves 15 minutes!
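The payoff arithmetic is easy to check with a few lines of Python, using the figures from the paragraph above (20 minutes by hand, about 45 minutes to build the script, a minute or two per automated run). The function names are just illustrative. With zero run time the saving after three runs is exactly the 15 minutes quoted; counting the minute or two per run brings it to roughly 9 to 12 minutes, and the set-up cost is paid back during the third run either way.

```python
import math


def minutes_saved(runs: int, manual: float = 20, setup: float = 45,
                  automated: float = 1.5) -> float:
    """Time saved after `runs` executions compared with running the script by hand."""
    return runs * manual - (setup + runs * automated)


def break_even_runs(manual: float = 20, setup: float = 45,
                    automated: float = 1.5) -> int:
    """Smallest whole number of runs at which the set-up time has been paid back."""
    return math.ceil(setup / (manual - automated))
```

For example, `minutes_saved(3, automated=2)` gives 9 and `break_even_runs(automated=2)` gives 3. Every run after that is almost pure gain.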

Running the full check might involve executing a SQL script manually, running a Selenium script, checking that some e-mails arrive, then running another Selenium script. In short, it might involve a few tasks performed one after another. And yes, we could automate the whole lot, but like I say... we just don't have the time right now. But using the tool to add what tests we can right now, to help us with our short (a few hours each) tasks, means that we're building up a functional test suite without ever really thinking about it. We'll keep those scripts, and maybe in a couple of weeks we'll realise that we DO have the time to set up everything else we need for proper functional testing.

Yep, it could be a hell of a lot better (and on our other projects it is), but some informal testing using an automated runner is an order of magnitude better than no automated testing at all.


Wednesday, November 01, 2006

Going Dotty

There's a new big thing in my sphere of interest: Dotty.

For the uninitiated: Dot, Neato, Dotty and Lefty are a family of tools from Graphviz that take pretty simple text files and generate directed or undirected graphs.

For the initiated: No, I can't believe I've never found it before either!

It was name-checked during the Google LTAC by at least one presenter, which reminded me that I'd heard its name quite some time ago and meant to look it up. So when we were looking at a diagramming problem just a few weeks ago, I figured I should track it down. I was by no means disappointed.

Basically we wanted something that would graph our MVC workflow configurations to make them more readable.

That is, our MVC structure allows us to string arbitrary tasks together: perform x; if the result is y, go to task z; if the result is h, go to task i.
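A minimal sketch of that kind of configuration, written in Python rather than as the PHP arrays or XML the post mentions: each named task is paired with a map from result codes to the next task, with a `DEFAULT` fallback. The data shape, `run_workflow` and the task names (borrowed loosely from the DOT example further down) are assumptions for illustration, not the actual framework.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical shapes: a task takes a context dict and returns a result code;
# each task maps result codes to the next task's name (None ends the flow).
Task = Callable[[dict], str]
Workflow = Dict[str, Tuple[Task, Dict[str, Optional[str]]]]


def run_workflow(workflow: Workflow, start: str, context: dict,
                 max_steps: int = 100) -> List[str]:
    """Run tasks from `start`, following each result's transition; return the path taken."""
    path: List[str] = []
    current: Optional[str] = start
    while current is not None:
        if len(path) >= max_steps:
            raise RuntimeError("workflow did not finish; check for a cycle")
        task, transitions = workflow[current]
        path.append(current)
        result = task(context)
        # Use the transition for this specific result, else fall back to DEFAULT.
        current = transitions.get(result, transitions.get("DEFAULT"))
    return path


cheese_workflow: Workflow = {
    "SaveCheese": (lambda ctx: "DEFAULT", {"DEFAULT": "GetCheeseType"}),
    "GetCheeseType": (lambda ctx: ctx["cheese"],
                      {"WENSLEYDALE": "DisplayWensleydale", "CHEDDAR": "DisplayCheddar"}),
    "DisplayWensleydale": (lambda ctx: "DONE", {}),
    "DisplayCheddar": (lambda ctx: "DONE", {}),
}
```

Running `run_workflow(cheese_workflow, "SaveCheese", {"cheese": "CHEDDAR"})` follows `SaveCheese`, then `GetCheeseType`, then `DisplayCheddar`. Even at this size, the branch is easier to follow as a picture than as a nested structure.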

The idea is to keep these configurations as simple as possible (they're only really receiving user input and then prodding objects), but still, there are some complexities. This is especially true when branches split and rejoin. For some reason, XML files and PHP arrays can be difficult to read ;-)

Quite a long time ago we wrote a small application that would graph them in HTML, but we never liked its results. When paths split and rejoin, the HTML representation wouldn’t show the rejoin.

So, as I say, we picked up Dotty.

Simply genius.

For directed graphs the dot output is stunning. We can pass it a file in the (trivially simple) DOT language and it'll produce great-looking diagrams.

For example, the file:

```dot
// Save an edited sheep, save its cheese, then branch on the cheese type
digraph finite_state_machine {
  node [ fontsize="12", fontname="arial" ];
  edge [ fontsize="8", fontname="arial" ];
  EntryPoint [ label="EntryPoint (BuildSheepFromInput)", shape="diamond" ];
  EntryPoint->SaveEditedSheep [ label="DEFAULT" ];
  SaveEditedSheep [ label="SaveEditedSheep (SaveEditedSheepTask)" ];
  SaveEditedSheep->SaveCheese [ label="DEFAULT" ];
  // Box-shaped nodes are entry points into other workflows
  EditSheep__EntryPoint [ shape="box" ];
  SaveEditedSheep->EditSheep__EntryPoint [ label="ERRORS" ];
  SaveCheese [ label="SaveCheese (SaveCheeseTask)" ];
  SaveCheese->GetCheeseType [ label="DEFAULT" ];
  GetCheeseType [ label="GetCheeseType (GetCheeseTypeTask)" ];
  DisplayWensleydale__EntryPoint [ shape="box" ];
  GetCheeseType->DisplayWensleydale__EntryPoint [ label="WENSLEYDALE" ];
  GetCheeseType->DisplayCheddar__ComposeMessage [ label="CHEDDAR" ];
}
```

Would produce:

[Image: SaveCheeseWorkflow, example DOT output]

It's not difficult to write code to generate the DOT files, and the output from neato (the same as dot, but for undirected graphs) is just as high quality.
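To show how little code the generation side needs, here is a Python sketch that turns a hypothetical transition table (task name mapped to result code mapped to next task) into DOT in the same style as the file above. `workflow_to_dot` and its input shape are assumptions, not the code we actually use.

```python
from typing import Dict


def workflow_to_dot(name: str, transitions: Dict[str, Dict[str, str]], entry: str) -> str:
    """Emit a DOT digraph: one labelled edge per (task, result) -> next task."""
    lines = [
        f"digraph {name} {{",
        '  node [ fontsize="12", fontname="arial" ];',
        '  edge [ fontsize="8", fontname="arial" ];',
        # Mark the entry point the same way the hand-written example does.
        f'  "{entry}" [ shape="diamond" ];',
    ]
    for source, results in transitions.items():
        for result, target in results.items():
            lines.append(f'  "{source}" -> "{target}" [ label="{result}" ];')
    lines.append("}")
    return "\n".join(lines)
```

Writing the result to `workflow.dot` and running `dot -Tpng workflow.dot -o workflow.png` renders it; swapping `dot` for `neato` uses the undirected-graph layout instead.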

Of course, as soon as I saw the output the cogs started turning... I'm now on a bit of a brainstorm about what can come next. How about ER diagrams generated from the database schema and published on an internal site? Generated documentation is never out of date, and it's a damn sight easier having it generated on the fly than it is to load up Visio and get THAT monstrosity to do the job for you.

Anyway, the ER diagramming library will be open source, and it IS on its way... I promise.

(Note: if you want more info on dotty, take a look here)
