Tuesday, March 25, 2008

A reading list for our developers

An idea I'm thinking of trying to get implemented at our place is a required reading list for all our developers: a collection of books that will improve the way developers think about their code and the ways in which they solve problems. The company would buy the books as gifts to the employees, maybe one or two every three months.

Some questions though:

  • Is it fair for a company to expect its employees to read educational material out of hours?

Conversely:
  • Is it fair for an employee to expect to be moved forward in their career without a little bit of personal development outside the office?


If anyone has any books they'd recommend, please let me know. Otherwise, here are my initial ideas; the first three would be in your welcome pack:

Update: Gary Myers came up with a good point: any book should really be readable on public transport. That probably rules out Code Complete (although I read it on the tube, I can see that it's a little tricky), but Design Patterns and Refactoring to Patterns are small enough, I reckon.

Unfortunately, Code Complete is a really good book that gives a lot of great, simple, valuable advice. Does anyone out there have any other suggestions for similar books?

Update 2: Andy Beacock reminded me of Fowler's Refactoring, which really should also make the list.

Update 3: The development team have bought into the idea and the boss has been asked. In fact, I'm pretty pleased with the team's enthusiasm for the idea, and I can't see the boss turning it down. Interestingly, though, someone suggested that Code Complete go onto the list...

In this order:


Ruled out because of their size:

Tuesday, September 04, 2007

Database Build Script "Greatest Hits"

I know it's been a quiet time on this blog for a while now, but I've noticed that I'm still getting visitors looking up old blog posts. That's especially true of the posts that relate to "The Patch Runner". Many of them come through a link from Wilfred van der Deijl, mainly his great post "Version control of Database Objects".

The patch runner is my grand idea for a version-controlled database build script that you can use to give your developers sandbox databases to play with, as well as to ensure that your live database upgrades work first time, every time. It's all still working perfectly here, and people still seem to be interested, so I've decided to collate the posts a little: basically, to provide an index of all the posts I've made over the years that directly relate to database build scripts, sandboxes and version control.

So, Rob's database build script 'Greatest Hits':

All of the posts describe processes and patch runners that are very similar to those I use in my work every day. I started playing with these theories over three years ago, and there is no way I'd go back to implementing database upgrades the way I did before. However, I'd LOVE to hear ideas on how things can be improved. I'd be amazed if my three-year-old thinking was still up to date!

Friday, July 20, 2007

Problems with CVS removes?

Accidentally removed a file in CVS that you want to keep?

It sounds like a stupid question, because once you know the answer it seems blindingly obvious,
but what if you've issued a 'remove' against a file in CVS and, before committing the remove, you decide that you
made a mistake and still want to keep it?

I.e. you issued (for example):

> cvs remove -f sheep.php

But have not yet issued:

> cvs commit -m removed sheep.php

I've heard of workarounds such as:
  • Edit the "Entries" file in the relevant CVS directory in your workspace, removing the reference to the file.
    This makes the file appear unknown to CVS.
  • Perform an update in that directory. This gets the repository version of the file and updates the "Entries"
    file correctly.
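If you do go down the route of editing the Entries file by hand, it helps to know what a pending removal looks like. According to the CVS manual, each file line in CVS/Entries has the form /name/revision/timestamp/options/tagdate, where the revision is "0" for an added file and "-" followed by the old revision number for a file scheduled for removal. The Python sketch below only reads that text and lists the pending removals; the function name scheduled_for_removal and the sample entries are made up for illustration.

```python
from typing import List


def scheduled_for_removal(entries_text: str) -> List[str]:
    """List files that `cvs remove` has scheduled for removal but that
    have not been committed yet, given the contents of a CVS/Entries file.

    File lines have the form /name/revision/timestamp/options/tagdate;
    a pending removal has its revision written as '-' plus the old
    revision number (e.g. '-1.6').
    """
    removed = []
    for line in entries_text.splitlines():
        if not line.startswith("/"):
            continue  # skip directory lines ("D/...", "D") and blanks
        fields = line.split("/")
        # fields[0] is the empty string before the leading '/',
        # fields[1] is the file name and fields[2] the revision.
        if len(fields) > 2 and fields[2].startswith("-"):
            removed.append(fields[1])
    return removed


# Illustrative Entries contents (made up for this example).
sample = (
    "/sheep.php/-1.6/dummy timestamp//\n"
    "/goat.php/1.3/Fri Jul 20 10:00:00 2007//\n"
    "D/farm////\n"
)
print(scheduled_for_removal(sample))  # ['sheep.php']
```

From inside a workspace directory you could call it as scheduled_for_removal(open("CVS/Entries").read()). It deliberately changes nothing, so it's a safe way to check what a stray 'remove' has queued up before you decide how to undo it.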


All you actually need to do is re-add the file:

> cvs add sheep.php

U sheep.php
cvs server: sheep.php, version 1.6, resurrected

When used in this way, the add command will issue an update against the file and retrieve the repository version of the file.

A word of warning, though: if you had uncommitted changes in that file before you issued the remove, CVS isn't going to recover them for you...

What if you've removed a file, but your version of the file is out of date and so you can't commit the removal?

So you've issued the following:

> cvs remove -f sheep.txt

cvs server: scheduling 'sheep.txt' for removal
cvs server: use 'cvs commit' to remove this file permanently

> cvs commit -m removed sheep.txt

cvs server: Up-to-date check failed for 'sheep.txt'
cvs server: correct above errors first!

You can't issue an update because you get the following:

> cvs update sheep.txt

cvs server: conflict: removed sheep.txt was modified by second party
C sheep.txt

Again, add the file.

> cvs add sheep.txt

U sheep.txt
cvs server: sheep.txt, version 1.6, resurrected

This gets you the most up-to-date version from the repository, which you can then check for changes (you wouldn't want to just remove it now that someone's added new content, would you?)

Once you've convinced yourself that it's still a good idea to delete it, just issue the remove and commit.

Simple when you know how!

Thursday, July 12, 2007

Can a change in execution plan change the results?

We've been using Oracle domain indexes for a while now to search documents and get back a ranked list of those that meet certain criteria. The documents are related to people, and we augment the basic text search with other filters and score metrics based on the 'people' side of things to get an overall 'suitability' score for the results of a search. Without giving too much away about the business I work with, I can't really tell you much more about the product than that, but it's probably enough background for this little gem.

We've known for a while that the domain index 'score' returned from a 'contains' clause is based not only on the document to which that score relates, but also on the rest of the set that is searched. An individual document score does not live in isolation; rather, it lives in the context of the whole result set. No problem. As I say, we've known this for a while and so have our customers. Quite a while ago they stopped asking what the numbers mean and learned to trust them.

However, today we realised something. Since the results are affected by the set that is searched, they can also be affected by the order in which the optimizer decides to execute a query.

I can't give you a full end-to-end example, but I can assure you that the following is most definitely the case on one of our production domain indexes (names changed, obviously):

We have a two-column table, 'document_index', which contains 'id' and 'document_contents'. Both columns have an index: 'id' is the primary key and 'document_contents' has a domain index.
The following SQL:

    SELECT id, SCORE( 1 )
    FROM   document_index
    WHERE  CONTAINS( document_contents, :1, 1 ) > 0
    AND    id = :2

gives this execution path:

    SELECT STATEMENT
      TABLE ACCESS BY INDEX ROWID SCOTT.DOCUMENT_INDEX
        DOMAIN INDEX SCOTT.DOCUMENT_INDEX_IDX01

However, the alternative SQL:

    SELECT id, SCORE( 1 )
    FROM   document_index
    WHERE  CONTAINS( document_contents, 'Some text', 1 ) > 0
    AND    id = :2

gives this execution path:

    SELECT STATEMENT
      TABLE ACCESS BY INDEX ROWID SCOTT.DOCUMENT_INDEX
        INDEX UNIQUE SCAN SCOTT.DOCUMENT_INDEX_PK

Normally, this kind of change in execution path wouldn't be a problem. But as stated earlier, the result of a score operation against a domain index is not just dependent on the individual records, but on the context of the whole result set. The first execution gives you a score for the single document in the context of all the documents in the table; the second gives you a score within the context of just that document. The scores are different.

Now obviously, this is an extreme example, but more subtle examples will almost certainly exist if you combine the domain index lookups with any other where clause criteria. This is especially true if you're using literal values instead of bind variables, in which case you may find the execution path changing between calls to the 'same' piece of SQL.

My advice? Well, we're going to split our domain index lookups from all the rest of the filtering criteria. That way we can prepare the set of documents we want the search to be within and know that the scoring algorithm will be applied consistently.
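To make the effect concrete, here is a toy model in Python. It is NOT Oracle Text's scoring algorithm; it assumes a made-up score that is simply each document's count of the search term, scaled so that the best match in the searched set scores 100. That is enough to show why scoring the whole table and then picking one id (like the first plan) can give a different number from picking the id first and then scoring (like the second plan). The helpers relative_scores and search_then_filter, and the sample documents, are hypothetical.

```python
from typing import Callable, Dict


def relative_scores(docs: Dict[int, str], term: str) -> Dict[int, int]:
    """Toy stand-in for SCORE(): NOT Oracle Text's algorithm.

    Counts occurrences of `term` in each document and scales the counts so
    the best match *within the set being searched* scores 100. Documents
    with no match are dropped, mimicking CONTAINS(...) > 0.
    """
    counts = {doc_id: text.lower().split().count(term.lower())
              for doc_id, text in docs.items()}
    best = max(counts.values(), default=0)
    # The `if c > 0` filter runs before the division, so best is never 0 here.
    return {doc_id: round(100 * c / best)
            for doc_id, c in counts.items() if c > 0}


def search_then_filter(docs: Dict[int, str], term: str,
                       keep: Callable[[int], bool]) -> Dict[int, int]:
    """Score against the full document set first, then apply the other
    filters, so they can never shrink the set the score is computed over."""
    scores = relative_scores(docs, term)
    return {doc_id: s for doc_id, s in scores.items() if keep(doc_id)}


documents = {
    1: "oracle text oracle",
    2: "oracle oracle oracle oracle",
    3: "nothing relevant here",
}

# Like plan 1: text search over the whole table, then pick id = 1.
print(relative_scores(documents, "oracle")[1])          # 50
# Like plan 2: primary key lookup first, then score just that one row.
print(relative_scores({1: documents[1]}, "oracle")[1])  # 100
# Searching first and filtering afterwards gives the whole-table score.
print(search_then_filter(documents, "oracle", lambda i: i == 1))  # {1: 50}
```

The real scores will differ from these toy numbers, but the shape of the problem is the same: the same document, the same search text and a different answer, purely because the set being scored changed.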