BOBABLOG: Agile software development and Salesforce

Tuesday, September 12, 2006

Spare a moment for the little people

They're here, and they may need your help.

Also, it's good to see the new world order we were promised has finally arrived, though if might appear that Disney World has taken the political situation a bit far over here

Sunday, September 10, 2006

A language for discussing performance testing

OK, so all I'm doing here is repeating the text that I previously had in a post about the Google London Automation Testing Conference, discussing the talk by Goranka Bjedov on Performance testing.

I figured that it was sitting in the middle of a fairly large post, and I wanted it to be seen and reviewed by more people than would be bothered to plough through the other stuff.

It's a suggested series of terms by which different types of performance tests can be described:

Performance Test: Given load X, how fast will the system perform function Y?
Stress Test: Under what load will the system fail, and in what way will it fail?
Load Test: Given a certain load, how will the system behave?
Benchmark Test: Given this simplified / repeatable / measurable test, if I run it many times during the system development, how does the behaviour of the system change?
Scalability Test: If I change characteristic X, (e.g. Double the server's memory) how does the performance of the system change?
Reliability Test: Under a particular load, how long will the system stay operational?
Availability Test: When the system fails, how long will it take for the system to recover automatically?

In addition to the above, there is then another term, which I would suggest is not a type of test in its own right, rather it is a term denoting the depth of analysis being performed.

Profiling: Performing a deep analysis on the behaviour of the system, such as stack traces, function call counts, etc.

Any thoughts?

Saturday, September 09, 2006

Google Automated Testing Conference (London) - Day 2 (part 2)

Selenium: The in-browser acceptance testing tool (Google video)

Another talk I was looking forward to was Jason Huggins, the creator of Selenium talking about his tool. I was hoping for a little on the basics of the tool, but really the idea is simple enough to be easy to describe.

It is a test framework that allows you to build Fit like (HTML table), or code based UI tests and then drive your web application through multiple browsers. The result is a solution to the web version of the mobile problem... How do you test the against the diversity of target platforms.

The product ships with an IDE that sits in Firefox and allows you to record actions against the application, then edit the resulting test scripts. These scripts can be in any of many languages such as Java, C# and Ruby. I did notice the lack of PHP support though. Shame, but does it really matter?

The bulk of his talk was then one some of the possibilities you can get from this. In particular, his demo was to check a change into subversion, this would kick off cruise control and run the unit tests for his app. The successful build message would then get picked up by a number of listeners running in different OSs running on virtual machines. These listeners would kick off Selenium for different browsers, all run the same set of tests and record the resulting actions as movies.

His ultimate aim was then to add some voice over and these screencasts could be used as marketing videos: in fact, he suggested that the Apple voice synthesis would be good enough for that job (especially the new version coming out soon), and the spoken text could be in the test itself.

Very nice idea. it's a shame the demo didn't actually work ;)

One point made was that the movies are going to get big, and maybe you wouldn't want to keep them for all versions. Maybe, but then maybe you don't need movie files... Maybe just the html files will do. You can then wrap it up in a player that will move you between pages and kick off the right bit of audio at the right time. A selenium driven app won't show you the mouse pointer, so you don't loose that... You could highlight clicks and focus with a bit of nifty css. Alas I didn't get a chance to pass the idea on to Jason as he was (unsurprisingly) surrounded for most the rest of the day.

Main message: When you're innovating, your demos might not be as smooth as you want them to be ;)

Or maybe

Main message: The DRY principle (don't repeat yourself) crosses so many boundaries it's crazy. Be aware that there are many ways you can re-use the same data / process / tools, and automation can be applied to many many things.

Testing Metro WiFi (No Google video yet :( )

Karl Garcia of Google was next up, talking about testing the open wireless network in Mountain View, California.
This new network covers about 12 square miles of reasonably densely populated urban area. The network consists of just under 400 wireless nodes sitting on the top of lampposts spread no more than about 150 yard apart. This mesh is then connected to 3 base stations via a number of gateways spread more sparsely.

The question is, how do you automate the testing of such a beast.

The testing was covered in two main stages. First the coverage test... Simply taking a device that will poll the network and driving it down every street. Hook that up to GPS and get it to record the network strength and you've got stage 1 covered.

The second is the throughput testing. For this the team grabbed cheap routers and PDAs, installed Linux and iPerf (network testing tool), setup their start-up so that they'd report in to the main server and get ready to run test, make sure that if they fail they go into a restart mode and then gave them cheap solar panels. Then it was just a case of picking a spot to test, taking the clients out and leaving them.

Back at the office the test server polls for the clients, gets them to run the suites and reports the results. OK, so the machines have to be taken out and placed, but a small number of inexpensive components makes the kit cheap. If the client machines go down then you just need to wait for them to reboot and they'll get back in touch. No need to go out and reset them.

When you need to change the test area you just go out, pick the clients up and move them. Karl admitted that they could have had more clients and that they considered having more expensive bits, but in the end the kit they had was stupidly cheap and more than fit for purpose.

Main message: Not every part of the process can be automated, but you can minimise those bits that can't and still get great benefits. The most expensive way of doing that isn't always the best.

Distributed Continuous Quality Assurance: (Google video)

And the last full talk of the day was Adam Porter. With an incredible bio (in terms of academia), it seemed as though Adam thought the lightning talks had started early and he powered through 45minutes of information dispensing, at an incredible speed.

His discussion as very much at an academic level, but the system he talked about sounds like it may be one of the next big things in testing.

The starting premise is this: traditional testing strategies are failing large scale development because the variations possible in the configuration of most new systems are enormous. It simply isn't possible to test the full variation of configuration options, deployment operating systems, setups of those operating systems and so on.

His system (Skoll) attempts to address that.

I'm going to go into no depth on this topic whatsoever because the amount of information was overwhelming, but I'll try to give an overview.

By splitting QA tasks into small chunks of work, defining the valid combinations of deployment, providing a means of producing those variations and using a grid of machines to test on it is possible to cover an enormous amount of combinations in your tests.
If you then get smart about which combinations should be tested then you can statistically cover the whole set without having to actually test them all.

You can test disparate configurations until you find a failure, then take that configuration, change it in a single dimension and then test again... Feel out the extent of the failure scape.

You can then datamine those results to try to provide clues on the underlying problem.
He gave compelling arguments that certain strategies for choosing particular configurations work and followed every one with a case study that tested it empirically.

He also showed how the approach can be used for tracking performance problems with a selective method of benchmarking.

It was a very solid, very info laden, well thought out talk n a great approach. When the video comes out, I urge you to watch it. At half speed.

And if you're really interested in reading the papers, there's one here and you can find many more (or copies of the same) by searching on Google for Skoll testing.

Main message: It's not just about the volume of QA you perform, it's about the QUALITY of the QA. By getting smarter in your testing you can cover more of the application in less time.

Lightning Talks: (No Google video yet :( )

Finally it was the turn of the lightning talks. 10 speakers, 10 subjects, 5 minutes per speaker, no exceptions. I'm not going to cover much of the ground here because it was, of course, a lot of info in a small amount of time. It's worth picking up the video when it's available and watching it. The quality's a bit hit and miss, but you can always skip the 5 minutes of the person you don't like.

Highlights were Dan North on 'Getting Lean' – what it means and how automation helps you get there, Steve Freman and Nat Price giving us a glimpse of jMock, particularly of jMock 2 (looking tasty). James Lyndsay reminding us that "automation good" does not equal "manual bad" and Jordan Dea-Mattson reminding us that our QA processes need to have "defence in depth", or "overlapping fields of fire".

All in all, an excellent 2 days!

Google Automated Testing Conference (London) - Day 2 (part 1)

So, day 2 has been and gone, and it seemed slightly quieter than day 1.
It seems like the free bar took its toll. Despite the slight drop in numbers, the conversations seem a little more free flowing. Damn it! Lesson to earn from this conference: 'No matter how bad you feel, go to the free bar'.

Anyway, this time round I'm going to tackle the blog entry in two chunks... Thursday's entry was just too damn big!

Objects - They just work: (Google video)

I may be biased here, but I was really disappointed with this one. It looked like it might be a talk on the decomposition of objects into the simplest form so that they just work.

It wasn't.

The title was a reference to the NeXTSTEP presentation given by Steve Jobs way back in 92 when he said the same. The point being that they didn't just work. There was a lot of pain and hardship to get them to work, and that's a recurring theme in software development.

So the talk was given by Bob Binder, CEO and founder of mVerify. He discussed a few of the difficulties and resulting concepts behind their mobile testing framework.

We've already had a discussion on the difficulties involved (permutations), so that didn't tell us anything we didn't already know.

He moved on to mention TTCN (Testing and Test Control Notation) an international standard for generic test definition. It's probably worth a look.

He also mentioned the fact that their framework added pre and post conditions to their tests - require and ensure. I may be a die hard stick in the mud here, but the simplicity of 'setup – prod – check – throw away' seems like a pretty flawless workflow for testing to me. Though I admit, I could well be missing something: If anyone can enlighten me on their use I'd be reasonably grateful. Though thinking about it, I wasn't interested enough to ask for an example then, so maybe you shouldn't bother ;)

One good thing to come out was the link to that NeXTSTEP demo.

I really want to say more and get some enthusiasm going, but sorry Bob, I just can't.

Main message: Try as they might, CEOs can't do anything without pimping their product and glossing over the details.

Goranka Bjedov - Using Open Source tools for performance testing: (Google video)

After the disappointment of the first talk, this one was definitely a welcome breath of fresh air.

Like the first time I read Pragmatic Programmer, this talk was packed full of 'yes, Yes, YES' moments. If you took out all the bits I had previously thought about and agreed with you'd be left with a lot of things I hadn't thought about, but agreed with.

When the videos hit the web you MUST watch this talk.

Goranka proposed a vocabulary for talking about performance tests, each test type with a clear purpose. Having this kind of clear distinction allows people to more clearly define what they're testing for, decide what tests to run, and ultimately work out what the test results are telling them.

Performance Test – Given load X, how fast will the system perform function Y?
Stress Test – Under what load will the system fail, and in what way will it fail?
Load Test – Given a certain load, how will the system behave?
Benchmark Test– Given this simplified / repeatable / measurable test, if I run it many times during the system development, how does the behaviour of the system change?
Scalability Test – If I change characteristic X, (e.g. Double the server's memory) how does the performance of the system change?
Profiling – Performing a deep analysis on the behaviour of the system, such as stack traces, function call counts, etc.
Reliability Test – Under a particular load, how long will the system stay operational?
Availability Test – When the system fails, how long will it take for the system to recover automatically?

I would probably split the profiling from the list and say that you could profile during any of the above tests, that's really about the depth of information you're collecting. Other than that I'd say the list is perfect and we should adopt this language now.

She then put forward that infrastructure you need in order to do the job.

I don't want to be smug about it, but the description was scarily similar to that which we've put together.

Alas, the smugness didn't last long because she then went on to tell us the reasons why we shouldn't bother trying to write this stuff ourselves... That the open source community is already rich for doing these jobs, directing us to look at Jmeter, OpenSTA and Grinder. A helpful bystander also directed us to opensourcetesting.org - there are a lot of test tools on there.

Fair enough... I admit we didn't look when we put together our test rig, but you live and learn. And I'll definitely be taking a look for some DB test tools.

A big idea I'll be taking away is the thought that we could put together a benchmarking system for our products. This isn't a new thought but rather an old one presented in a new way. Why shouldn't we put together a run that kicks off every night and warns us when we've just killed the performance of the system. It's just about running a smoke test and getting easy to read latency numbers back. Why not? Oh, I'll tell you why not... We need production hardware ;)

She then gave us a simple approach to start performance testing with, a series of steps we can follow to start grabbing some useful numbers quickly:

Set up a realistic environment
Stress test
- Check the overload behaviour
- Find the 80% load point
Build a performance test based on the 80%
- Make it run long enough for a steady state to appear
- Give it time to warm up at the start
- Collect the throughput and latency numbers for the app and the machine performance stats.

If I wasn't already married, I might have fallen in love :)

Main message: You CAN performance test with off the (open source) shelf software, it just takes clarity of purpose, infrastructure, a production like deployment and time.

Oh, and you're always happiest in a conference when someone tells you something you already know ;)

Testing Mobile Frameworks with FitNesse: (Google video)

As the last one before lunch Uffe Koch took the floor with a pretty straightforward talk. By this time I was sick of hearing about mobile testing ;)
The thing is, the manual testing problem is so big with mobiles that it's prime for automation.

It turned out that he gave a pretty good talk on the fundamentals of story (or functional) testing practice. For a different audience, this would have been a fantastic talk, but unfortunately, I think most people here are already doing many of the things.

A lot of the early part of the discussion crossed over between the Fit and Literate Testing talks from the day before, though the ideas weren't presented in the same kind of depth. The suggestion that the test definition language of 'click 1 button' was a domain was pushing it, but the point is reasonably valid. The structure of test definition languages need to be very different to the programming languages we're used to. This is one of the winning points of Fit, since its presentation is very different to Java, C# or whatever it's approached by the developers in a very different way. Kudos to Uffe for realising this explicitly and producing a language for driving the app framework.

His team have put together a UI object set that can be driven through the serial or USB port of a phone and can report the UI state back to the tester at any time, passing it to the tester as an XML document.

It's very similar to our method. We do it so that the test don't need to know about page structure, just the individual components; so we don't need to worry with things like XPath when we want to extract data from the page; so our story tests aren't as brittle as they could be. They're doing this to solve the problem of screen scraping from a phone.

It's an elegant solution to testing these phones and whilst Uffe admits that it means you're not testing the real front end, or the real screen display, it allows them to hook up a phone to the test rig and run the full suite. I'm sure those tests must take and age though... doing a UI test of a web page is bad enough, but some of those phones can take some time to respond! I'd like to see the continuous integration environment. I've got an image of 500 Dell machines hooked up to different phones through masses of cables. That'd be cool!

The common FitNesse question did come up: How did you address the version control of the FitNesse scripts. Like everyone else (it seems), the archiving was switched off, local copies of the wikis were created and they got checked into the same version control as the code when they were changed. I really feel I've got to ask the question: If this is the way everyone does it, why isn't there an extension to the suite to allow this out of the box?

Main message: With a bit of thought and design even the most difficult to test targets can be tested. You just might need a tiny touch of emulation in there.

And that led us on to lunch...

Thursday, September 07, 2006

Google Automated Testing Conference (London) - Day 1

Google Automated Testing Conference (London) - Day 1

(Get ready, it's a long one)

So, finally the Google testing conference has come around, and it's pretty good at making a man feel like a small fish in a big pond. It's pretty clear that the place is populated by developers who are all working in some form of test driven way. A large number contribute to open source software, and many of those are working on test frameworks... 3 of the 4 main contributors to jMock are in attendance. I don't mind admitting that I feel like a bit of an interloper.

But before I start, I've got to point out that (of course) all the words here are my own interpretations of the presenters words... and I could very easily have got it all very very wrong. But then that would be their fault for presenting badly ;)

Also, Google assured us that the talks will be available on Google video, and that many supporting links will be sent out to attendees. Once I get those things I'll update this entry to keep you updated.

Anyway, the first day didn't disappoint:

Distributed Testing with SmartFrog: (Google video)

Steve Lougran and Julio Guijarro talked about their HP Labs research project on testing distributed systems: SmartFrog. They've got a good looking framework together that allows you to define a system deployment as a class hierarchy and then describe their relationships. That means that you can use it to state when and where components need to be deployed, services need to start and so on. A nice tool for rolling out complex distributed systems.

But their interesting points came when they talked about using it for testing. By wrapping a few testing frameworks (JUnit for one) and then describing the test suites as components to install, they're producing a framework that allows you to test each component of the system in a place similar to where it would run when in production. Not only that, but it allows you to install emulated components such as switches, routers and flaky proxy servers allowing you to test on complex and failure prone infrastructure. Not bad.

Their main problems at the moment seem to come from trying to then collect all the test results, logs and suchlike and compiling that into a reasonable test result summary. It's fine when things pass, but as soon as you get a lot of failures it starts to stutter. But that problem's surely surmountable (if dull to solve ;) ). A few people out there must be looking at the same thing under a different guise.

One great thing to come out of it was the call to arms to those producing test frameworks... Where is the common reporting standard? A guy working on SimpleTest (sorry, didn't catch his name) seemed to be up for it...

Main message: Test using real deployments, not just idealised or local versions.

Literate Functional Testing: (Google video)

Robert Chatley and Tom White of Kizoom talked about their functional testing framework that extends the Knuth idea of 'Literate Programming'. It's most succinctly described in Knuth's own words: "Let us change our traditional attitude to the construction of programs. Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do."

The aim is to end up with a language that can be used by both developers and customers; that the resulting test code can be read by non developers and therefore can by used by the customer to validate the developers' interpretation of the system requirements.

They have produced a language that takes few jMock ideas (like constraints) and uses them to produce truly elegant code.

So, the test cases become along the lines of


assertThat( currentPage, has( 3, selectBoxes.named('Region', 'Time', 'Method' ) ) );

The idea is lofty, looks pretty damn good and definitely reverberates with my own ideas on producing story tests. They drive through the user interface and are easy to read. Implementing the same tests in a more traditional Java approach leads to a very difficult to read lump of code.

It's a shame they haven't taken it a little further so that the code is a fully readable English script rather than a halfway house between English and Java, but I love it none the less.

Oh, and like any good tool should, it's going open source...

Main message: It's possible to write functional tests that can be read by your customer, it just takes a different approach to the rest of your code.

Testing using Real Objects: (Google video)

Massimo and Massimo (Arnoldi and Milan) talked about their approach to generating test data for their (sounds) highly successful company Lifeware. By no means particular to their market (life insurance), they find that bugs don't occur in their system with short lived data. Rather it's the contracts that have lived in the system for years, and have a huge amount of complex events in their lifetime are always the ones that fail. Those contracts are basically atypical of the ones that are usually used for testing.

They found that there was always a great deal of difficulty in producing test cases for these large data sets and so have produced a method of exporting data from live systems an importing it into the test suite.

They admitted that it was born of a data migration tool, and you can see how. Having identified an object that needs to be extracted they generate a set of events that will re-create it. Those events are simply any data changes that you would need to create the primary and related objects in the state that they exist now. If a contract has a number of payments against it, then you'll see a number of distinct payment events in the list.
This list of events is then translated into a series of method calls that can be used to recreate the object in another environment. Having created the data set it's merely a case of describing whatever assertions you need in your test.

It sounds like there's some clever Smalltalk code in the background going on, and much of the talk was on that, but it's the idea that's the important component.

As a means of extracting or generating test data it sounds great. The events list is a neat solution. However, as a means of describing a data migration it sounds phenomenal! And that's where I really see the benefits. Being of a DB background that's no real surprise ;-)

If you can always generate data in your system from a set of precise events then when you need to migrate data from an external system you don't need to create a data-mapping, you need to create an event mapping. Customers are notoriously bad at data mapping because the data often not in a form they recognise. But these events sound to me like a domain language just waiting to jump out.

Main message: Test using real data (objects), not just idealised versions.

Doubling the value of Automated tests: (Google video)

Next up was Rick Mugridge, the man behind FitLibrary (an extension to Fit a means of specifying and running tests), and co-author of 'Fit for Developing Software'. He's a big believer in story tests and much of his work seems to be around getting the process of producing them as slick as possible.

His big idea is 'Story-test Driven Development' (SDD). That is, taking the idea of Test Driven Development (TDD) a step further and getting system requirements specified in tests as early as possible in the process. The 'story' to which the TLA relates is the Extreme Programming idea of a story. A single action that a user wishes to perform with a system.

He proposed that writing story tests in a language that describes user interface interaction is like programming in assembler. It has its uses, but is far too low a level for many purposes; that using a low level language can hide the underlying purpose of the test, being to test the business rules of the application with respect to the story.

In producing a story test you should be talking in the domain language not a UI language. By doing so the business rules become apparent and the vocabulary becomes clear enough to be used by business analysts, product managers, the customers. He advocates the use of story tests as a means of the customers defining the requirements of the system. That these can then be further refined by the developers and the test teams, but that the customers ultimately own them.

Also, if the purpose of the story test is to convey information between disparate parties (in both geography and time), then concise, concrete examples are the way forward.

I whole heartedly agree with the basic premise that there is a different approach to testing that can be forgotten about, namely testing the integration of objects in the domain language. I'm just not sure it replaces the UI testing. I'm not 100%, but I don't think that Rick was suggesting this.

Also, I'm really not sure about the Fit method of showing tests in tables (example here). I like the idea of a non text based representation, but I'm just not sure the tables really work for anything other than trivial examples. Still, I was very impressed with his notion that tests could be described in diagrams.

Main message: Test in the domain language, not in a UI language. If you do that you can always generate a UI test, and your tests will express the essential business rules rather than workflow.

Auto-test, Push button testing using contracts: (Google video)

Next it was Andreas Leitner's turn, A PHD student working with ETH Zurich.

In typical PHD style, his talk centred around Design by Contract. For those that aren't familiar with the concept he described the ideas of the pre and post conditions and invariants that are apparent in such languages (Eiffel being his language of choice). The idea is that for a given method there will be defined:
Pre-conditions - The conditions that must be true before that method is called.
Post-conditions – The conditions that the method guarantee will be true once the method is complete.
Invariants – The conditions that can never be broken.

The framework Andreas has put together can be used to test Eiffel classes to ensure that the post and invariant conditions are never broken. The innovative approach that this framework takes is that it does not require the developer to produce any test code. Instead test code is generated based on a 'strategy', for which there are already a number created.

The most basic of those strategies is the purely random: Create a bunch of objects, call methods on them with parameters that pass the pre-conditions, make sure the post conditions are true. He also offers what he calls an 'Adaptive Random' strategy, where each object tested is aimed to be as different in structure from the last as possible, and AI based strategies influenced by the well understood maze solving technique of World /State / Goal definition.

These tests are then intended to run for a large amount of time, unlike the traditional unit test idea of 'run as fast as you can through the interesting cases'. This then becomes a brute force attack on the objects, attempting to break them pretty much by chance.
On finding a failure case, the framework will then try to extract the essential nature of the test and provide you with a script that can be added as a standard unit test to your suite. The obvious (but clever) way of checking the generated test script is valid... it runs the script again and checks it fails.

The big win is the fact that this can be used to test third party libraries without having to define the tests yourself. OK, so the intent of each method isn't really tested, merely the accuracy of the post-conditions and invariants, but that's definitely better than just taking everything on trust.

He also asserted that you don't need explicit pre and post conditions in your language in order to use this technique. Java has extensions that provide the capability, and SPEC# does something similar in the C# world. Also, it was pointed out that there are other tools doing similar things, Agitar being one that takes Java classes and trying to work out how to break them.

Main message: It's possible to produce auto generated test cases for classes as long as you have (or can infer) design by contract components and enough time.

Does my button look big in this? Building Testable AJAX Applications: (Google video)

Finally some Google guys got in on the act and Adam Connors and Joe Walnes gave us a presentation on how to test AJAX apps.

In reality this was a pretty generic talk illustrating the fact the any type of application in any language can be decomposed into testable components. It quite rightly put forward the idea that the industry (or more accurately, the people in it) is still naive when it comes to writing Javascript. Good practice goes out the window as soon as the manipulation of a Document Object Model and an XMLHTTPRequest object comes into play.

But, as they demonstrated, it's not that hard to design Javascript code in a way that can be tested, it just takes a little bit of thought and a lot of discipline.

Still, they did suffer the wrath of the audience when they suggested that the DOM interaction and the View code doesn't need to be unit tested. They proposed that it's OK not to unit test some components just as long as those components are as simple as possible in what they do. Contentious, but it does have merit. I tried to suggest that the story tests can take care of that, but Joe didn't seem to want to bite!

Main message: AJAX apps are like any other type of app. There are easy to test bits, hard to test bits and seemingly impossible to test bits. Separate the bits and you make your life easier. The fact that it's Javascript is no excuse for bad design.

And that was day one. I could have gone to the pub, but in the end it was far too exhausting for me and I took my free t-shirt, fancy LED laden pen and Google notepad and skulked off home to prepare for tomorrow (and write this entry, of course).

If tomorrow's as good as today, I'll be exhausted, happy and a lot richer for the experience!

Update: As per the multiple requests by Google, this post is tagged: Google LATC