Saturday, September 09, 2006

Google Automated Testing Conference (London) - Day 2 (part 1)

So, day 2 has been and gone, and it seemed slightly quieter than day 1.
It seems like the free bar took its toll. Despite the slight drop in numbers, the conversations seemed a little more free flowing. Damn it! Lesson to learn from this conference: 'No matter how bad you feel, go to the free bar'.

Anyway, this time round I'm going to tackle the blog entry in two chunks... Thursday's entry was just too damn big!

Objects - They just work: (Google video)

I may be biased here, but I was really disappointed with this one. It looked like it might be a talk on the decomposition of objects into the simplest form so that they just work.

It wasn't.

The title was a reference to the NeXTSTEP presentation given by Steve Jobs way back in '92, when he said the same. The point being that they didn't just work. There was a lot of pain and hardship to get them to work, and that's a recurring theme in software development.

So the talk was given by Bob Binder, CEO and founder of mVerify. He discussed a few of the difficulties and resulting concepts behind their mobile testing framework.

We've already had a discussion on the difficulties involved (permutations), so that didn't tell us anything we didn't already know.

He moved on to mention TTCN (Testing and Test Control Notation), an international standard for generic test definition. It's probably worth a look.

He also mentioned the fact that their framework added pre and post conditions to their tests - require and ensure. I may be a die hard stick in the mud here, but the simplicity of 'setup – prod – check – throw away' seems like a pretty flawless workflow for testing to me. Though I admit, I could well be missing something: If anyone can enlighten me on their use I'd be reasonably grateful. Though thinking about it, I wasn't interested enough to ask for an example then, so maybe you shouldn't bother ;)

One good thing to come out was the link to that NeXTSTEP demo.

I really want to say more and get some enthusiasm going, but sorry Bob, I just can't.

Main message: Try as they might, CEOs can't do anything without pimping their product and glossing over the details.

Goranka Bjedov - Using Open Source tools for performance testing: (Google video)

After the disappointment of the first talk, this one was definitely a welcome breath of fresh air.

Like the first time I read Pragmatic Programmer, this talk was packed full of 'yes, Yes, YES' moments. If you took out all the bits I had previously thought about and agreed with you'd be left with a lot of things I hadn't thought about, but agreed with.

When the videos hit the web you MUST watch this talk.

Goranka proposed a vocabulary for talking about performance tests, each test type with a clear purpose. Having this kind of clear distinction allows people to more clearly define what they're testing for, decide what tests to run, and ultimately work out what the test results are telling them.

  • Performance Test – Given load X, how fast will the system perform function Y?
  • Stress Test – Under what load will the system fail, and in what way will it fail?
  • Load Test – Given a certain load, how will the system behave?
  • Benchmark Test – Given this simplified / repeatable / measurable test, if I run it many times during the system development, how does the behaviour of the system change?
  • Scalability Test – If I change characteristic X (e.g. double the server's memory), how does the performance of the system change?
  • Profiling – Performing a deep analysis on the behaviour of the system, such as stack traces, function call counts, etc.
  • Reliability Test – Under a particular load, how long will the system stay operational?
  • Availability Test – When the system fails, how long will it take for the system to recover automatically?

I would probably split the profiling from the list and say that you could profile during any of the above tests; it's really about the depth of information you're collecting. Other than that I'd say the list is perfect and we should adopt this language now.

She then put forward the infrastructure you need in order to do the job.

I don't want to be smug about it, but the description was scarily similar to that which we've put together.

Alas, the smugness didn't last long because she then went on to tell us the reasons why we shouldn't bother trying to write this stuff ourselves: the open source community is already rich in tools for these jobs. She directed us to look at JMeter, OpenSTA and The Grinder. A helpful bystander also directed us to - there are a lot of test tools on there.

Fair enough... I admit we didn't look when we put together our test rig, but you live and learn. And I'll definitely be taking a look for some DB test tools.

A big idea I'll be taking away is the thought that we could put together a benchmarking system for our products. This isn't a new thought but rather an old one presented in a new way. Why shouldn't we put together a run that kicks off every night and warns us when we've just killed the performance of the system? It's just about running a smoke test and getting easy to read latency numbers back. Why not? Oh, I'll tell you why not... We need production hardware ;)
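To make the nightly idea a bit more concrete, here's a minimal sketch of the 'warn us' part. It isn't anything Goranka showed: the function name, the 20% tolerance and the latency numbers are all made up for illustration, and it assumes the smoke test has already handed back a list of latencies in milliseconds.

```python
from statistics import median


def check_regression(baseline_ms, tonight_ms, tolerance=0.20):
    """Return a warning string if tonight's median latency is more than
    `tolerance` (a fraction, assumed 20% here) slower than the baseline,
    otherwise return None."""
    base = median(baseline_ms)
    now = median(tonight_ms)
    if now > base * (1 + tolerance):
        slowdown = (now / base - 1) * 100
        return (f"WARNING: median latency {now:.1f} ms "
                f"vs baseline {base:.1f} ms (+{slowdown:.0f}%)")
    return None


# Hypothetical numbers: last known good run vs. tonight's run.
baseline = [102.0, 98.0, 101.0, 99.0, 100.0]
tonight = [131.0, 128.0, 135.0, 129.0, 130.0]

message = check_regression(baseline, tonight)
if message:
    print(message)
```

Using the median rather than the mean is just a choice to stop one slow request from setting off the alarm; a real run would probably want percentiles as well.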

She then gave us a simple approach to start performance testing with, a series of steps we can follow to start grabbing some useful numbers quickly:

  • Set up a realistic environment
  • Stress test
    • Check the overload behaviour
    • Find the 80% load point
  • Build a performance test based on the 80%
    • Make it run long enough for a steady state to appear
    • Give it time to warm up at the start
    • Collect the throughput and latency numbers for the app and the machine performance stats.
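The number crunching at the end of those steps can be sketched in a few lines. This is my own rough take, assuming the load tool gives back a (seconds since start, latency in ms) pair for each completed request; the function names, the 10 second warm-up and the sample data are all hypothetical.

```python
from statistics import mean, quantiles


def target_load(breaking_point_rps, fraction=0.8):
    """Load to run the performance test at: a fraction (80% here)
    of the load at which the stress test broke the system."""
    return breaking_point_rps * fraction


def summarise(samples, warmup_s, run_s):
    """Throw away the warm-up period, then report throughput and latency.

    samples: list of (seconds_since_start, latency_ms) per completed request.
    """
    steady = [lat for t, lat in samples if t >= warmup_s]
    if len(steady) < 2:
        raise ValueError("not enough steady-state samples")
    return {
        "throughput_rps": len(steady) / (run_s - warmup_s),
        "mean_ms": mean(steady),
        # quantiles(n=20) returns 19 cut points; the last is the 95th percentile
        "p95_ms": quantiles(steady, n=20)[18],
    }


# Made-up run: one request a second for a minute, slow for the first 10 s.
samples = [(t, 200.0 if t < 10 else 50.0) for t in range(60)]
print(target_load(500))
print(summarise(samples, warmup_s=10, run_s=60))
```

The machine performance stats (CPU, memory, disk) would be collected alongside this by whatever monitoring you already have; they're left out here.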

If I wasn't already married, I might have fallen in love :)

Main message: You CAN performance test with off the (open source) shelf software, it just takes clarity of purpose, infrastructure, a production like deployment and time.

Oh, and you're always happiest in a conference when someone tells you something you already know ;)

Testing Mobile Frameworks with FitNesse: (Google video)

As the last one before lunch Uffe Koch took the floor with a pretty straightforward talk. By this time I was sick of hearing about mobile testing ;)
The thing is, the manual testing problem is so big with mobiles that it's prime for automation.

It turned out that he gave a pretty good talk on the fundamentals of story (or functional) testing practice. For a different audience, this would have been a fantastic talk, but unfortunately, I think most people here are already doing many of these things.

A lot of the early part of the discussion crossed over between the Fit and Literate Testing talks from the day before, though the ideas weren't presented in the same kind of depth. The suggestion that the test definition language of 'click 1 button' was a domain was pushing it, but the point is reasonably valid. The structure of test definition languages needs to be very different to the programming languages we're used to. This is one of the winning points of Fit: since its presentation is very different to Java, C# or whatever, it's approached by the developers in a very different way. Kudos to Uffe for realising this explicitly and producing a language for driving the app framework.

His team have put together a UI object set that can be driven through the serial or USB port of a phone and can report the UI state back to the tester at any time, passing it to the tester as an XML document.

It's very similar to our method. We do it so that the tests don't need to know about page structure, just the individual components; so we don't need to worry about things like XPath when we want to extract data from the page; so our story tests aren't as brittle as they could be. They're doing this to solve the problem of screen scraping from a phone.
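As a rough illustration of why this makes tests less brittle, here's a sketch of looking components up by name instead of by their position in the document. The XML below is entirely invented (I didn't see the real format that Uffe's framework reports), as are the helper names.

```python
import xml.etree.ElementTree as ET

# Hypothetical UI state, as it might be reported back by the phone.
UI_STATE = """
<screen name="inbox">
  <softkey id="left" label="Options"/>
  <softkey id="right" label="Back"/>
  <list id="messages">
    <item selected="true">Lunch?</item>
    <item>Meeting moved</item>
  </list>
</screen>
"""


def component(state_xml, component_id):
    """Find a component by id, wherever it sits in the document."""
    root = ET.fromstring(state_xml)
    for element in root.iter():
        if element.get("id") == component_id:
            return element
    raise KeyError(component_id)


def selected_item(state_xml, list_id):
    """Text of the selected item in the named list, or None."""
    for item in component(state_xml, list_id).findall("item"):
        if item.get("selected") == "true":
            return item.text
    return None


# The test only names components; it never spells out a path to them.
assert component(UI_STATE, "left").get("label") == "Options"
assert selected_item(UI_STATE, "messages") == "Lunch?"
```

If the screen layout gets rearranged, tests written this way keep passing as long as the component ids stay the same.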

It's an elegant solution to testing these phones and whilst Uffe admits that it means you're not testing the real front end, or the real screen display, it allows them to hook up a phone to the test rig and run the full suite. I'm sure those tests must take an age though... doing a UI test of a web page is bad enough, but some of those phones can take some time to respond! I'd like to see the continuous integration environment. I've got an image of 500 Dell machines hooked up to different phones through masses of cables. That'd be cool!

The common FitNesse question did come up: How did you address the version control of the FitNesse scripts? Like everyone else (it seems), the archiving was switched off, local copies of the wikis were created and they got checked into the same version control as the code when they were changed. I really feel I've got to ask the question: If this is the way everyone does it, why isn't there an extension to the suite to allow this out of the box?

Main message: With a bit of thought and design even the most difficult to test targets can be tested. You just might need a tiny touch of emulation in there.

And that led us on to lunch...


Anonymous said...

Re "Objects -- They Just Work"

The purpose of this talk was to share an important insight into how the architecture of object-oriented automated testing frameworks limits their functionality and what can be done about this.

The primary strength of x-unit frameworks is that they are composable -- any test object can use any other. However, in general, x-unit frameworks do not allow conditional or iterative execution of any or all members of a test suite at any level, nor dynamic determination or generation of test objects. This limitation was unacceptable for the kind of testing we wanted to do.

TTCN provides a complete and robust model of test execution, which also addresses distributed, concurrent test execution.

We therefore decided to develop a test framework with the composability of x-unit test objects and the robust distributed control of the TTCN model.

Because a “distributed system is one where a computer you don’t know exists can crash your program,” we added pre- and post-conditions to TestObject to handle run time errors and exceptions (a feature of neither x-unit nor TTCN.) With this, inevitable runtime resource problems and other dependencies can be managed, including attempting restart, recovery, and retry. Without this, test runs just crash -- making it unclear what happened. This was unacceptable, because a testing system should provide unambiguous information about the system under test – crashes waste time and make work for testers. We support multi-day stability runs. If a run fails after 36 hours out of 48, you do not want to throw out what you’ve learned up to the point of failure. You want to know if the failure was in the system under test or elsewhere. Pre- and post-conditions provide the run time support to mitigate these problems.

In contrast to tools for research or personal use, we have to deliver a commercially viable product with high reliability, usability, and complete documentation.

Meeting all these design goals was a non-trivial project that took about two years to complete. My ironic observation was that our test objects “just work,” in the same sense that objects in NeXTSTEP were claimed to “just work.” Both systems achieved a very high level of innovative functionality, but only after a considerable amount of work.

Your characterization of my presentation as “pimping” is simply unfair and inaccurate. This talk shared unique, valuable, and hard-won insights with an audience working on closely related problems.

Bob Binder

Rob Baillie said...

Thanks for your response Bob (I feel I should probably say Mr Binder, but it just sounds wrong). I sincerely welcome your comments.

You have more than one good point: I neglected to mention the focus on the use of composition in your test framework, and that's probably because I didn't quite get what you were saying as much as anything else.

Also, your clarification on the use of the pre and post conditions in the tests is greatly appreciated. I thought that area was completely glossed over in the talk.
I don't think that any x-unit system should just crash on reaching an exception, be it caused by the test or the units it's testing. If the pre and post conditions are to mitigate against that problem in a complex environment then you have a clear reason for including them. My concern was that without the reasons being clearly defined (or, more accurately, clearly understood by myself) I was unable to make an informed decision on whether I thought their introduction was appropriate or not. Of course, without digging into an example of their usage I'm still in that position.

I'm naturally averse to adding complexity into any system, especially when I can't see the reason.

I'm not sure why you raise the point: "In contrast to tools for research or personal use, we have to deliver a commercially viable product with high reliability, usability, and complete documentation."

Of course there are distinct differences between commercial and non-commercial products. Of course not all non-commercial products are purely for research or personal use. It does sound like a thinly veiled dig, but that's fair enough as I made one or two of my own at you in my summary.

I stand by my 'pimping' comment. Every person at the conference who stood beside a product or open source project that they were working on did a pretty good job of selling (pimping) their tool to the audience. It may have been a little unfair to level the criticism (if it is such) directly at you.

Finally, I think the major problem I had with the talk was the fact that the title of the presentation gave no indication as to its content. This meant that I made an assumption about its content before it started, and from that point on I was destined to be disappointed.

Of course I urge everyone to watch ALL of the videos of the presentations and make up their own minds. Not one of them was an hour wasted. I apologise if I hadn't made that point a little clearer in my posts.