Sunday, January 15, 2006

We need space to develop

I've talked previously about how important I think it is that individual developers (or pairs) get to work in their own sandbox (not least here). This means that developers can experiment; can try avenues that would otherwise break the workspace for other developers; can refactor the database itself.

Obviously, in order to do this, you need to actually have development databases for those developers to own. This tends to lead to the DBA team announcing that you'd need far too many databases, that you simply can't be serious, that we've no idea how overworked they already are and that we can't expect them to take on another x databases, where x equals the number of developers. Oh, and by the way... who's going to build them? I can fit the first 3 in some time in the middle of the year after next; god knows when the other 7 will get built.

I can understand the reaction. I'd be exactly the same if I thought that someone was suggesting that I double my workload. Luckily for DBAs, that's most definitely not what I'm suggesting.

The misunderstanding comes from the thought that a development database is the same species of beast as the live database. It isn't, not by a long way.

The major difference between a live and development database is how their role defines the use of data. In a live database the single most important thing in that database is the data it holds. It is the sole purpose for the existence of that database. Fine, we reshape that database so that we can access the database in new and interesting ways, but all said and done, if that data wasn't there then the database would be useless. In the development environment this is not the case.

The most important thing in a development database is its structure. Its purpose is to work as a validation tool for the live database, to ensure that the application we're producing works against the live database and to let us develop the structure of that database so that we can provide our application. It's important that we hold data that has a texture similar to that which exists in live, but it doesn't matter which data we hold. The data itself has no intrinsic value. We can generate our own.
In fact, when we run unit tests on our PL/SQL code, this is exactly what we do; we generate our test data so we know exactly what form it is in. That way we can make clear statements about the form we expect it to be in when our PL/SQL has completed.
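
As a rough sketch of that idea (written in Python against an in-memory SQLite database rather than PL/SQL against Oracle, purely so that it is self-contained; the table, the rows and the `cancel_empty_orders` routine are all invented for illustration):

```python
import sqlite3


def build_known_orders(conn):
    """Create a tiny, hypothetical orders table holding exactly the rows the test needs."""
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, status TEXT, total REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (order_id, status, total) VALUES (?, ?, ?)",
        [(1, "NEW", 10.0), (2, "NEW", 0.0), (3, "SHIPPED", 25.0)],
    )


def cancel_empty_orders(conn):
    """Stand-in for the stored procedure under test (hypothetical business rule)."""
    conn.execute(
        "UPDATE orders SET status = 'CANCELLED' WHERE status = 'NEW' AND total = 0"
    )


def run_test():
    """Generate the data, run the code under test, return the resulting state."""
    conn = sqlite3.connect(":memory:")
    build_known_orders(conn)
    cancel_empty_orders(conn)
    rows = conn.execute(
        "SELECT order_id, status FROM orders ORDER BY order_id"
    ).fetchall()
    conn.close()
    return rows
```

Because the test creates its own three rows, it can state exactly what should come back: only order 2 changes status. No copy of the live data is involved at any point.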

Also, we don't need to hold the same volume of data. In live, every single piece of data is of importance and so you can't just remove it*. In development we can limit ourselves to creating only the data that is of importance to us at this point in time. If your development databases contain a very small amount of data, and the means to generate the data they need, then you remove a large amount of the DBA's work.

That is, most of the work a DBA will do in maintaining a live database is purely down to the fact that it is a system that contains a large volume of critical data. It must always be available, it must serve data within certain performance requirements, and if the machine it's running on suddenly blows up then we must be able to recover the data. Quickly.
Take away everything other than the 'it must always be available', and you've taken away most of the DBA's work.

When we started development, we didn't tell our DBA that we needed 'x' new development databases. We told our DBA that we needed 1. We said we didn't need it backed up, just that we needed it to be up during working hours. We said, make it about the same size as live and it'll always be big enough.
We then ran our build scripts multiple times on that database, creating a schema for each of our pairs of developers. This is the same build script we would use to upgrade live; it just has a development mode that allows us to say which schema name we want it to install into.
The DBA has needed to do next to nothing with the database since it was created.
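
To give a flavour of what a build with a development mode might look like, here is a minimal sketch in Python. It is not our actual build script: the `APP` schema name, the DDL statements and the `render_build` function are all assumptions made up for the example.

```python
import re

LIVE_SCHEMA = "APP"  # hypothetical name of the schema used in live

# Hypothetical DDL; a real build script would carry the whole upgrade.
DDL_TEMPLATES = [
    "CREATE TABLE {schema}.customer (customer_id NUMBER PRIMARY KEY, name VARCHAR2(100))",
    "CREATE INDEX {schema}.customer_name_ix ON {schema}.customer (name)",
]


def render_build(dev_schema=None):
    """Return the build statements for live, or for one developer schema.

    With no argument the statements target the live schema name; in
    development mode the caller says which schema to install into.
    """
    schema = LIVE_SCHEMA if dev_schema is None else dev_schema
    # Only accept plain identifiers, since the name is spliced into the DDL.
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9_]*", schema):
        raise ValueError(f"not a plain schema name: {schema!r}")
    return [template.format(schema=schema) for template in DDL_TEMPLATES]
```

Calling `render_build("PAIR_1")`, then `render_build("PAIR_2")` and so on gives each pair the same structure in its own schema, while `render_build()` produces the statements for live. The point of keeping it as one script is that whatever gets tested in development is the same thing that later upgrades live.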

Now I admit, we have only around 10 development databases at the moment. Maybe the problem gets a lot harder to manage when you have 100 development databases. But then again, if you have 100 developers then I'm damn sure you've got more than 1 DBA. You've got a DBA team. And if you've got a team then surely someone in that team's got the time to build you enough databases to support your development team. If not, then you've probably got more serious problems to deal with...

Of course, none of this takes into account the fact that your customer needs to test the application, that you'll need to performance tune and test alongside all the other systems running before you release, and that you need a test environment for duplicating issues once the system goes live. But NONE of these things should be happening in your development environment... a topic for another post. Later.

* Actually, you often find that most of the data in a live database is useless and only a small proportion is ever used. But try telling that to your customer and getting them to let you archive it...


Wilfred said...

Hi Rob,

I'm currently setting up version control and build automation at Eurotransplant.

We also decided to go for each developer having his/her own sandbox. However, I cannot manage this in one central database since our application consists of a number of schemas and relies on public synonyms.

I used Ant to automate the process of setting up a whole new database. I'm doing that in our continuous integration with CruiseControl. A developer can once again use Ant to create a local database on his/her workstation by getting these files from the central build server.

This means the developer is running a database on a local workstation. I was lucky enough that we were replacing the workstations anyhow. We decided to invest an extra $300 per workstation for some extra memory and hard disk space.

Keep up the good work. I got a lot of ideas from your blog in setting up our development process for about 22 Oracle developers.

Rob Baillie said...

Cheers for the encouragement Wilfred, always nice to know when my blathering is appreciated!

We also have an application that has multiple schemas and a few public synonyms. Thankfully we managed to create a dual mode build script that changes the public synonyms to private in the development environment.
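
Very roughly, and only as a sketch of the idea rather than our real script, the switch could look like this in Python. The `synonym_ddl` function and the schema and object names are invented for the example.

```python
def synonym_ddl(name, target, dev_owner=None):
    """Return the statement that creates one synonym.

    In live mode (dev_owner is None) the synonym is public, as the
    application expects. In development mode it becomes a private synonym
    owned by the given developer schema, so sandboxes sharing one
    database do not fight over a single public name.
    """
    if dev_owner is None:
        return f"CREATE OR REPLACE PUBLIC SYNONYM {name} FOR {target}"
    return f"CREATE OR REPLACE SYNONYM {dev_owner}.{name} FOR {target}"
```

For example, `synonym_ddl("customer", "APP_DATA.customer")` gives the public synonym for live, while `synonym_ddl("customer", "PAIR_1_DATA.customer", dev_owner="PAIR_1")` gives a private one that only resolves for that pair's schema.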

I've got a plan to write up the structure soon as I think it may be of use as a generic approach to developing against legacy systems...

Anyway, I'm sure that there are situations where you simply can't avoid creating individual databases on developer workstations. However, it's not ideal when you pair program, and I think you generally end up with the DBA involved more often than if it was a central database:

'Mate, can you just take a look at my machine, I keep getting "failed to start dedicated server process".'

'Yeah, it's because you installed a shoddy copy of ## arcane product x ## and it's messed around with your PATH setting and your services. Give me half an hour...'

Like I say though, I'm sure there are situations where you can't avoid it!