Wednesday, September 13, 2006

Google LTAC - A more personal note

Alright, so I've been going on about the Google LTAC quite a bit recently, but I wanted to mention a few more personal observations...

  • Is it coincidence that the comedy presenters were called Adam and Joe?
  • Google might talk about Work / Life balance, but you're always at the conference even when you're on the toilet thanks to the highly informative 'testing on the toilet' hints and tips sheets ;) (see below)
  • Even the user interface 'simplicity innovators' can't help themselves when it comes to conference freebies... I never realised that I needed a coloured light on my pen until now. (Shame the light makes the pen a bit too heavyweight, the ink keeps clogging meaning a slow startup time and the blue LED keeps on cutting out)
  • Also, for a company that's very much 'of the now', a mouse mat is just soooo last millennium.
  • I've never been at a conference where there were so many laptops. Although I'm a little surprised that no-one brought any internet enabled CI lava lamps with them.
  • Google may not be Evil, but they still gave us plastic cutlery and polystyrene plates (boooo)


And a couple of awards

  • Phrase of the conference: "That's a big bucket of suck"
  • Agile Pimp: Dan North, a man with an eye for spotting the delegate that's ripe for a bit of lean process
  • Free snack food of the conference: Innocent Smoothies.
  • Information Download Award: Adam Porter, watch the video (when it's out), you'll understand.
  • Demo of the conference: Jason Huggins, the cutting edge can cut you deeply when you've got an audience.


Testing on the toilet

Tuesday, September 12, 2006

Spare a moment for the little people

They're here, and they may need your help.

Also, it's good to see the new world order we were promised has finally arrived, though it might appear that Disney World has taken the political situation a bit far over here.

Sunday, September 10, 2006

A language for discussing performance testing

OK, so all I'm doing here is repeating the text that I previously had in a post about the Google London Automation Testing Conference, discussing the talk by Goranka Bjedov on Performance testing.

I figured that it was sitting in the middle of a fairly large post, and I wanted it to be seen and reviewed by more people than would be bothered to plough through the other stuff.

It's a suggested series of terms by which different types of performance tests can be described:

  • Performance Test: Given load X, how fast will the system perform function Y?
  • Stress Test: Under what load will the system fail, and in what way will it fail?
  • Load Test: Given a certain load, how will the system behave?
  • Benchmark Test: Given this simplified / repeatable / measurable test, if I run it many times during the system development, how does the behaviour of the system change?
  • Scalability Test: If I change characteristic X, (e.g. Double the server's memory) how does the performance of the system change?
  • Reliability Test: Under a particular load, how long will the system stay operational?
  • Availability Test: When the system fails, how long will it take for the system to recover automatically?

In addition to the above, there is then another term, which I would suggest is not a type of test in its own right, rather it is a term denoting the depth of analysis being performed.

  • Profiling: Performing a deep analysis on the behaviour of the system, such as stack traces, function call counts, etc.


Any thoughts?

Saturday, September 09, 2006

Google Automated Testing Conference (London) - Day 2 (part 2)

Selenium: The in-browser acceptance testing tool (Google video)

Another talk I was looking forward to was Jason Huggins, the creator of Selenium, talking about his tool. I was hoping for a little on the basics of the tool, but really the idea is simple enough to be easy to describe.

It is a test framework that allows you to build Fit-like (HTML table) or code-based UI tests and then drive your web application through multiple browsers. The result is a solution to the web version of the mobile problem... How do you test against the diversity of target platforms?

The product ships with an IDE that sits in Firefox and allows you to record actions against the application, then edit the resulting test scripts. These scripts can be in any of many languages such as Java, C# and Ruby. I did notice the lack of PHP support though. Shame, but does it really matter?

The bulk of his talk was then on some of the possibilities you can get from this. In particular, his demo was to check a change into Subversion; this would kick off CruiseControl and run the unit tests for his app. The successful build message would then get picked up by a number of listeners running in different OSs running on virtual machines. These listeners would kick off Selenium for different browsers, all run the same set of tests and record the resulting actions as movies.

His ultimate aim was then to add some voice over and these screencasts could be used as marketing videos: in fact, he suggested that the Apple voice synthesis would be good enough for that job (especially the new version coming out soon), and the spoken text could be in the test itself.

Very nice idea. It's a shame the demo didn't actually work ;)

One point made was that the movies are going to get big, and maybe you wouldn't want to keep them for all versions. Maybe, but then maybe you don't need movie files... Maybe just the HTML files will do. You can then wrap it up in a player that will move you between pages and kick off the right bit of audio at the right time. A Selenium-driven app won't show you the mouse pointer, so you don't lose that... You could highlight clicks and focus with a bit of nifty CSS. Alas I didn't get a chance to pass the idea on to Jason as he was (unsurprisingly) surrounded for most of the rest of the day.

Main message: When you're innovating, your demos might not be as smooth as you want them to be ;)

Or maybe

Main message: The DRY principle (don't repeat yourself) crosses so many boundaries it's crazy. Be aware that there are many ways you can re-use the same data / process / tools, and automation can be applied to many many things.




Testing Metro WiFi (No Google video yet :( )

Karl Garcia of Google was next up, talking about testing the open wireless network in Mountain View, California.
This new network covers about 12 square miles of reasonably densely populated urban area. The network consists of just under 400 wireless nodes sitting on the top of lampposts spread no more than about 150 yards apart. This mesh is then connected to 3 base stations via a number of gateways spread more sparsely.

The question is, how do you automate the testing of such a beast?

The testing was covered in two main stages. First the coverage test... Simply taking a device that will poll the network and driving it down every street. Hook that up to GPS and get it to record the network strength and you've got stage 1 covered.
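
As a rough illustration of what you can do with the drive-round data once it's logged, here's a minimal Python sketch. The reading format (latitude, longitude, signal strength in dBm), the sample values and the -80 dBm threshold are all my assumptions for the example, not details from the talk.

```python
def weak_spots(readings, threshold_dbm=-80):
    """Return the (lat, lon) points whose signal strength is below the threshold."""
    return [(lat, lon) for lat, lon, dbm in readings if dbm < threshold_dbm]


# Made-up sample readings: (latitude, longitude, signal strength in dBm).
readings = [
    (37.3894, -122.0819, -62),
    (37.3901, -122.0830, -85),
    (37.3910, -122.0842, -91),
]

# Points on the route where coverage looks too weak.
gaps = weak_spots(readings)
```

Plot the weak points on a map and you've got a picture of where the coverage has holes.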

The second is the throughput testing. For this the team grabbed cheap routers and PDAs and installed Linux and iPerf (a network testing tool). They set up the start-up scripts so that the devices report in to the main server and get ready to run tests, made sure that if they fail they go into a restart mode, and then gave them cheap solar panels. Then it was just a case of picking a spot to test, taking the clients out and leaving them.

Back at the office the test server polls for the clients, gets them to run the suites and reports the results. OK, so the machines have to be taken out and placed, but a small number of inexpensive components makes the kit cheap. If the client machines go down then you just need to wait for them to reboot and they'll get back in touch. No need to go out and reset them.
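
The 'poll, run, and don't panic if a client is down' loop might look something like this in Python. It's a sketch of the idea only: `poll_clients`, `fake_run_suite` and the client names are hypothetical, and the real setup (which used iPerf on the clients) isn't shown here.

```python
def poll_clients(clients, run_suite):
    """Run the suite on each client; return (results, clients to retry later)."""
    results = {}
    retry_later = []
    for client in clients:
        try:
            results[client] = run_suite(client)
        except ConnectionError:
            # The client is probably rebooting; it should report back in by itself.
            retry_later.append(client)
    return results, retry_later


# Stand-in for a real network call: pretends one client is mid-reboot.
def fake_run_suite(client):
    if client == "lamppost-17":
        raise ConnectionError("no response")
    return {"throughput_mbps": 1.0}


results, retry_later = poll_clients(["lamppost-03", "lamppost-17"], fake_run_suite)
```

The point is that an unreachable client isn't an error to chase: it just goes on the list to try again on the next pass.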

When you need to change the test area you just go out, pick the clients up and move them. Karl admitted that they could have had more clients and that they considered having more expensive bits, but in the end the kit they had was stupidly cheap and more than fit for purpose.

Main message: Not every part of the process can be automated, but you can minimise those bits that can't and still get great benefits. The most expensive way of doing that isn't always the best.




Distributed Continuous Quality Assurance: (Google video)

And the last full talk of the day was Adam Porter. With an incredible bio (in terms of academia), it seemed as though Adam thought the lightning talks had started early and he powered through 45 minutes of information dispensing at an incredible speed.

His discussion was very much at an academic level, but the system he talked about sounds like it may be one of the next big things in testing.

The starting premise is this: traditional testing strategies are failing large scale development because the variations possible in the configuration of most new systems are enormous. It simply isn't possible to test the full variation of configuration options, deployment operating systems, setups of those operating systems and so on.

His system (Skoll) attempts to address that.

I'm going to go into no depth on this topic whatsoever because the amount of information was overwhelming, but I'll try to give an overview.

By splitting QA tasks into small chunks of work, defining the valid combinations of deployment, providing a means of producing those variations and using a grid of machines to test on, it is possible to cover an enormous number of combinations in your tests.
If you then get smart about which combinations should be tested then you can statistically cover the whole set without having to actually test them all.

You can test disparate configurations until you find a failure, then take that configuration, change it in a single dimension and then test again... Feel out the extent of the failure scape.
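
The 'change it in a single dimension' step is easy to sketch in Python. This is just my illustration of the idea, not Skoll's code, and the option names and values are made up.

```python
def single_dimension_neighbours(failing, options):
    """Yield every configuration that differs from `failing` in exactly one option."""
    for name, values in options.items():
        for value in values:
            if value != failing[name]:
                neighbour = dict(failing)
                neighbour[name] = value
                yield neighbour


# Hypothetical configuration space, for illustration only.
OPTIONS = {
    "os": ["linux", "windows", "macos"],
    "compiler": ["gcc", "msvc"],
    "threads": [1, 4],
}

failing_config = {"os": "linux", "compiler": "gcc", "threads": 4}

# Each neighbour is one more test to run around the known failure.
neighbours = list(single_dimension_neighbours(failing_config, OPTIONS))
```

Run each neighbour, note which ones also fail, and you start to see which options the failure actually depends on.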

You can then datamine those results to try to provide clues on the underlying problem.
He gave compelling arguments that certain strategies for choosing particular configurations work and followed every one with a case study that tested it empirically.

He also showed how the approach can be used for tracking performance problems with a selective method of benchmarking.

It was a very solid, very info-laden, well thought out talk on a great approach. When the video comes out, I urge you to watch it. At half speed.

And if you're really interested in reading the papers, there's one here and you can find many more (or copies of the same) by searching on Google for Skoll testing.

Main message: It's not just about the volume of QA you perform, it's about the QUALITY of the QA. By getting smarter in your testing you can cover more of the application in less time.




Lightning Talks: (No Google video yet :( )

Finally it was the turn of the lightning talks. 10 speakers, 10 subjects, 5 minutes per speaker, no exceptions. I'm not going to cover much of the ground here because it was, of course, a lot of info in a small amount of time. It's worth picking up the video when it's available and watching it. The quality's a bit hit and miss, but you can always skip the 5 minutes of the person you don't like.

Highlights were Dan North on 'Getting Lean' – what it means and how automation helps you get there; Steve Freeman and Nat Pryce giving us a glimpse of jMock, particularly of jMock 2 (looking tasty); James Lyndsay reminding us that "automation good" does not equal "manual bad"; and Jordan Dea-Mattson reminding us that our QA processes need to have "defence in depth", or "overlapping fields of fire".

All in all, an excellent 2 days!

Google Automated Testing Conference (London) - Day 2 (part 1)

So, day 2 has been and gone, and it seemed slightly quieter than day 1.
It seems like the free bar took its toll. Despite the slight drop in numbers, the conversations seem a little more free flowing. Damn it! Lesson to learn from this conference: 'No matter how bad you feel, go to the free bar'.

Anyway, this time round I'm going to tackle the blog entry in two chunks... Thursday's entry was just too damn big!

Objects - They just work: (Google video)

I may be biased here, but I was really disappointed with this one. It looked like it might be a talk on the decomposition of objects into the simplest form so that they just work.

It wasn't.

The title was a reference to the NeXTSTEP presentation given by Steve Jobs way back in 92 when he said the same. The point being that they didn't just work. There was a lot of pain and hardship to get them to work, and that's a recurring theme in software development.

So the talk was given by Bob Binder, CEO and founder of mVerify. He discussed a few of the difficulties and resulting concepts behind their mobile testing framework.

We've already had a discussion on the difficulties involved (permutations), so that didn't tell us anything we didn't already know.

He moved on to mention TTCN (Testing and Test Control Notation), an international standard for generic test definition. It's probably worth a look.

He also mentioned the fact that their framework added pre and post conditions to their tests - require and ensure. I may be a die hard stick in the mud here, but the simplicity of 'setup – prod – check – throw away' seems like a pretty flawless workflow for testing to me. Though I admit, I could well be missing something: If anyone can enlighten me on their use I'd be reasonably grateful. Though thinking about it, I wasn't interested enough to ask for an example then, so maybe you shouldn't bother ;)

One good thing to come out was the link to that NeXTSTEP demo.

I really want to say more and get some enthusiasm going, but sorry Bob, I just can't.

Main message: Try as they might, CEOs can't do anything without pimping their product and glossing over the details.



Goranka Bjedov - Using Open Source tools for performance testing: (Google video)

After the disappointment of the first talk, this one was definitely a welcome breath of fresh air.

Like the first time I read Pragmatic Programmer, this talk was packed full of 'yes, Yes, YES' moments. If you took out all the bits I had previously thought about and agreed with you'd be left with a lot of things I hadn't thought about, but agreed with.

When the videos hit the web you MUST watch this talk.

Goranka proposed a vocabulary for talking about performance tests, each test type with a clear purpose. Having this kind of clear distinction allows people to more clearly define what they're testing for, decide what tests to run, and ultimately work out what the test results are telling them.

  • Performance Test – Given load X, how fast will the system perform function Y?
  • Stress Test – Under what load will the system fail, and in what way will it fail?
  • Load Test – Given a certain load, how will the system behave?
  • Benchmark Test – Given this simplified / repeatable / measurable test, if I run it many times during the system development, how does the behaviour of the system change?
  • Scalability Test – If I change characteristic X, (e.g. Double the server's memory) how does the performance of the system change?
  • Profiling – Performing a deep analysis on the behaviour of the system, such as stack traces, function call counts, etc.
  • Reliability Test – Under a particular load, how long will the system stay operational?
  • Availability Test – When the system fails, how long will it take for the system to recover automatically?


I would probably split the profiling from the list and say that you could profile during any of the above tests, that's really about the depth of information you're collecting. Other than that I'd say the list is perfect and we should adopt this language now.

She then put forward the infrastructure you need in order to do the job.

I don't want to be smug about it, but the description was scarily similar to that which we've put together.

Alas, the smugness didn't last long because she then went on to tell us the reasons why we shouldn't bother trying to write this stuff ourselves... The open source community is already rich in tools for doing these jobs, and she directed us to look at JMeter, OpenSTA and The Grinder. A helpful bystander also directed us to opensourcetesting.org - there are a lot of test tools on there.

Fair enough... I admit we didn't look when we put together our test rig, but you live and learn. And I'll definitely be taking a look for some DB test tools.

A big idea I'll be taking away is the thought that we could put together a benchmarking system for our products. This isn't a new thought but rather an old one presented in a new way. Why shouldn't we put together a run that kicks off every night and warns us when we've just killed the performance of the system. It's just about running a smoke test and getting easy to read latency numbers back. Why not? Oh, I'll tell you why not... We need production hardware ;)
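
The 'warn us when we've killed the performance' check doesn't need to be clever. Here's a minimal Python sketch, assuming you keep a list of latency samples from a known-good run as a baseline; the 20% tolerance and the sample numbers are arbitrary choices of mine, not anything from the talk.

```python
from statistics import median


def regressed(baseline_ms, latest_ms, tolerance=0.20):
    """Return True if the median latency grew by more than `tolerance` over the baseline."""
    return median(latest_ms) > median(baseline_ms) * (1 + tolerance)


# Made-up latency samples in milliseconds.
baseline = [100, 110, 105]
last_night = [150, 160, 155]

if regressed(baseline, last_night):
    print("Nightly benchmark: latency regression detected")
```

Using the median rather than the mean keeps one freak slow request from setting off the alarm.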

She then gave us a simple approach to start performance testing with, a series of steps we can follow to start grabbing some useful numbers quickly:

  • Set up a realistic environment
  • Stress test
    • Check the overload behaviour
    • Find the 80% load point
  • Build a performance test based on the 80%
    • Make it run long enough for a steady state to appear
    • Give it time to warm up at the start
    • Collect the throughput and latency numbers for the app and the machine performance stats.
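
A couple of those steps can be sketched in Python. I'm assuming here that the '80% load point' means 80% of the load at which the stress test showed the system failing, and that samples are recorded as (seconds since start, latency in ms) pairs. Both are my assumptions, and all the numbers are made up.

```python
from statistics import mean


def target_load(breaking_load, fraction=0.8):
    """Load to run the performance test at: a fraction of the breaking load."""
    return breaking_load * fraction


def steady_state(samples, warmup_seconds):
    """Drop samples taken during warm-up; `samples` is a list of (seconds, latency_ms)."""
    return [latency for seconds, latency in samples if seconds >= warmup_seconds]


# Made-up numbers: the stress test fell over at 500 requests per second.
load = target_load(500)

# Made-up samples: the first minute is warm-up and shouldn't count.
samples = [(10, 900), (30, 400), (60, 120), (90, 118), (120, 122)]
average_latency = mean(steady_state(samples, warmup_seconds=60))
```

Leave the warm-up samples in and the average is dominated by start-up noise rather than by how the system behaves once it has settled.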


If I wasn't already married, I might have fallen in love :)

Main message: You CAN performance test with off the (open source) shelf software, it just takes clarity of purpose, infrastructure, a production like deployment and time.

Oh, and you're always happiest in a conference when someone tells you something you already know ;)




Testing Mobile Frameworks with FitNesse: (Google video)

As the last one before lunch Uffe Koch took the floor with a pretty straightforward talk. By this time I was sick of hearing about mobile testing ;)
The thing is, the manual testing problem is so big with mobiles that it's prime for automation.

It turned out that he gave a pretty good talk on the fundamentals of story (or functional) testing practice. For a different audience, this would have been a fantastic talk, but unfortunately, I think most people here are already doing many of these things.

A lot of the early part of the discussion crossed over between the Fit and Literate Testing talks from the day before, though the ideas weren't presented in the same kind of depth. The suggestion that the test definition language of 'click 1 button' was a domain was pushing it, but the point is reasonably valid. The structure of test definition languages needs to be very different to the programming languages we're used to. This is one of the winning points of Fit: since its presentation is very different to Java, C# or whatever, it's approached by the developers in a very different way. Kudos to Uffe for realising this explicitly and producing a language for driving the app framework.

His team have put together a UI object set that can be driven through the serial or USB port of a phone and can report the UI state back to the tester at any time, passing it to the tester as an XML document.

It's very similar to our method. We do it so that the tests don't need to know about page structure, just the individual components; so we don't need to worry about things like XPath when we want to extract data from the page; so our story tests aren't as brittle as they could be. They're doing this to solve the problem of screen scraping from a phone.

It's an elegant solution to testing these phones and whilst Uffe admits that it means you're not testing the real front end, or the real screen display, it allows them to hook up a phone to the test rig and run the full suite. I'm sure those tests must take an age though... doing a UI test of a web page is bad enough, but some of those phones can take some time to respond! I'd like to see the continuous integration environment. I've got an image of 500 Dell machines hooked up to different phones through masses of cables. That'd be cool!

The common FitNesse question did come up: How did you address the version control of the FitNesse scripts? Like everyone else (it seems), the archiving was switched off, local copies of the wikis were created and they got checked into the same version control as the code when they were changed. I really feel I've got to ask the question: If this is the way everyone does it, why isn't there an extension to the suite to allow this out of the box?

Main message: With a bit of thought and design even the most difficult to test targets can be tested. You just might need a tiny touch of emulation in there.




And that led us on to lunch...