Tuesday, July 25, 2006

Version Control and the Patch Runner

Commenting on my first post on upgrading databases, someone with the moniker 'gonen' asked a question:

"How do you handle the situation where one developer is checking in code that doesn't have to be released in current release?"

It's a good question, but it's a question mainly of version control rather than the patch runner itself. In fact, whilst the question was asked against the database patch runner post, it's interesting that the question doesn't mention the word 'patch', it uses the word 'code'. This is a general question that relates to any code, documentation, or other product of software development.

But, before I answer the question I'm going to turn it round a little and ask:

'Why do you have code that you want in version control that you don't want in the next release?'.

I can see three main answers to this one:

1 - The code is useless and so there's no reason to ever release it.

This is probably the most unlikely answer, but hey, it happens.
If this is the case, why are you keeping it? Commonly such code is kept because "it might be useful in the future and I don't want to throw it away".
Well, is it really likely to be useful? If it is historic code that is already is version control, then with most VC you can remove it and then still get it back at a later date if you need to. Proper version control software will record the fact that the file was deleted at a particular point in time but make sure the version exists in the historic versions. Otherwise you wouldn't be able to re-release only versions of your software that requires that code.
If it's new code that's not used anywhere, isn't yet in version control and isn't currently useful... why on earth would you want that in your system? It'll likely never get used, and merely hang around for years to come by which time people will be confused by existence but too scared to delete it.
The answer: If you don't need it, don't check it in. If it's already checked in, remove it. In either case, delete it, you don't need it.

2 - You're trying to fix a bug in the live system, but since you've made the original live release you've done a load more development. You're not ready to release the new stuff to live yet.

Or to put it another way. There was a live version of the system was released and since then Alex has been working on bug fixes. At the same time Ben's working on the new release and he new funky functionality. A few weeks in and Ben's been checking in his new stuff, but hasn't quite finished it. Alex gets given a critical bug to fix that must ship to the customer as soon as it's fixed. She can't commit and release the head of the trunk because it contains Ben's half finished stuff.

Ok. So when you made the live release you tagged up the release version in your version control didn't you. If not, you can always go back and do it, you just need to know the date you cut the release. And from now on you'll tag up everything that you release right?

The tagged version is the version you want to make your live bug fixes against, not the version at the head of the trunk. When you release a bug fix version you only want the bug fixes to be released, never the new functionality

To do this, you create a branch from the tag and do your bug fixing there. On that branch you now have your live version plus your bug fix. This branch doesn't contain any of the changes that you've made in the trunk since the live version was released. Of course, whenever you make a bug fix in the branch you make sure that the bug fixed is made in the trunk version as well, otherwise when your customer comes off the branch the bug will reappear.

3 - You have several people working on code at the same time, some on short term work, some on longer term stuff. Therefore have incomplete work you don't want to release yet.

For example, Alice has a long running bit of work and has been checking code into VC as she goes along. She hasn't really finished the job yet. Ben has been doing the same, but has managed to complete his. The trunk now contains a mixture of complete and incomplete work.

If you have several streams of concurrent long term development and often get to points where you need to release one stream without releasing another then branches can help again.
Rather than allow Alice and Ben to work directly on the trunk, give them a branch each. Known as 'task branches', each branch exists purely for the length of that task. Once the task is complete the developer merges their branch into the trunk and then discards the branch. When the next long term task arrives another branch is created and the work for that task done there

Going back to the example above:
Alice and Ben work on their own branches. As Ben finishes his work and commits into his branch we have the following situation: Ben's work is complete on his branch. Alice's incomplete work is on her branch. The trunk doesn't contain either Alice or Ben's work.
Now Ben's finished his work on the branch he merges it onto the trunk, effectively promoting it into the general release. Ben's task branch can now be discarded. The trunk can now be released to the customer.

Note that if Alice had finished her work first, then her work hits the trunk and the customer release can contain that bit of functionality instead of Ben's. It is not important which bit of work is finished first, which branch is discarded first. There is flexibility inherent in the system.

However, in this situation there may be another underlying problem. Ask yourself the question, would it be possible and be more efficient to have a single stream of work that gets completed quickly rather than multiple streams that each take time? That could engender a team attitude as well as focus the developers, testers, project managers, and business as a whole on a single task at a time. Yep, I know, that's not always possible...

So what about the patch runner then?

The patch runner can work very nicely in this branching structure, just as long as it's coded to deal with the fact that patches should only ever be installed once.

To quote myself, from that earlier post:
So, in summary, you check out a given tagged version of the application to run against an arbitrary database and run the patch runner. It loads the list of patches that are applicable for that tagged version. It runs over each patch in turn and checks if it has previously ran. If it has not, then it runs the patch.
By the time it reaches the end of the list it has ran all the patches, none of them twice. You can even run the patch runner twice in succession and guarantee that the second run will not change the state of the database.


Also, if the answer is in version control then the patch runner, the patch list and all the patches themselves need to be in version control. If they're not, then the fact that version control can answer your original question is itself a great reason for putting version control in place. Did you know that CVS and Subversion are both free... as is Tortoise (CVS and SVN). Download and install one of them, it'll take you all of 15 minutes to get a local version up and running and experimenting with.

Finally, if you want more information on version control then there are three books out there that you really should read. Well, you should read two out of three of them anyway...

Software Configuration Patterns - Steve Berczuk and Brad Appleton
Pragmatic Version Control using Subversion - Mike Mason
or
Pragmatic Version Control using CVS - Dave Thomas

And if you want more blog posts on version control then your man is Mike Mason... The go to guy on using Subversion in the real world.

Technorati Tags: , , , , , , , , , , , , ,

Tuesday, July 04, 2006

Merge triggers

Had a quick puzzler at work today that I thought I'd share... and the link to Ask Tom that solved it (before I could be bothered to delve into the test case myself).

Q: Which statement level triggers fires on a MERGE statement?

A: Since it is possible for either inserts or updates to occur during the statement, and since statement level triggers always fire even when no rows are processed, both the INSERT and UPDATE triggers fire every time, regardless of whether any inserts or updates occur.

Obvious when you think about it. Trouble is, how many our YOUR statement level triggers would work properly if both sets fired at the same time? There's no reason why they shouldn't, if you code for the possibility....

Tom (and Mikito and Kevin and others) explains here.