I received an email this morning from Jakob Nielsen’s AlertBox, letting me know that it is 20 years since he first publicly advocated the idea of “discount usability” testing. Nielsen summarises this approach to usability testing as:
Fair enough. It is something I continue to use when working on website designs. But last week, I was reading a paper (from 2007) by Gitte Lingaard and Jarinee Chattratichart (“Usability Testing: What have we overlooked?“) that questions the wisdom of getting hooked on this idea of just testing with 5 users. Shouldn’t we also think carefully about the tasks we ask those test users to carry out, and about how many of those tasks we set?
A valuable alternative to expensive usability labs
There is no doubt that discount usability is a valuable and effective approach to take, and the authors of the paper didn’t suggest that we should all be testing with tens of users in a big, fancy usability laboratory. It is often difficult to recruit suitable test users, especially in numbers, and very expensive to build a custom usability “lab”. The authors of the paper did, however, call into question the idea that testing with 5 users will suffice to reveal 85% of all the usability problems in the user interface being tested. That is to say, simply testing with 5 users is unlikely to tell you everything you need to fix on your website.
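The 85% figure being challenged comes from a simple probabilistic model of problem discovery (the Nielsen & Landauer curve). A minimal sketch of that model, assuming Nielsen’s typical value of L ≈ 0.31 for the chance that a single test user exposes any given problem (the function name here is mine, not from either source):

```python
def proportion_found(n_users, l=0.31):
    """Expected proportion of usability problems found by n_users,
    under the model found(n) = 1 - (1 - L)**n, where L is the
    probability that one user exposes any given problem."""
    return 1 - (1 - l) ** n_users

# With L = 0.31, five users land at roughly 85% -- the headline figure.
for n in (1, 3, 5, 10, 15):
    print(f"{n:2d} users -> {proportion_found(n):.0%}")
```

The model assumes every problem is equally likely to surface for every user, which is exactly the assumption the paper’s findings about task coverage call into question.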
Lingaard & Chattratichart tested two hypotheses:
- That there is a correlation between number of users and the proportion of problems found
- That there is a correlation between number of user tasks and the proportion of problems found
Some of you might notice that, because this was an experimental comparison of different usability evaluation methods, an interface with a presumably known number of problems was used. In real life, of course, developers and designers are unlikely to know the total number of problems, at least in the first iteration of testing.
That point aside, what did they find?
5 is (not?) a magic number
Broadly speaking, having 5 or 6 test users was not a magic bullet, and the usability teams in the study couldn’t replicate a “success rate” of finding 85% of the usability problems. Having lots of users also didn’t help a great deal.
The big thing that seemed to make a difference was the degree to which the tasks set for the test users were planned and well-defined, and the number of those tasks. To quote the paper:
“…with careful participant recruitment, investing in wide task coverage is more fruitful than increasing the number of users. It pays off to give many sets of user tasks to a small number of users, rather than giving many users the same, limited set of tasks in a usability test.”
“Apparently, this is true provided care is also taken in the selection of users and tasks. For an optimum ROI, it would thus seem that usability engineers would be wise to strike a balance between the number of user tasks and the number of users [in the study]…”
But Nielsen’s the usability guru! What do we do?
This is just one paper, sure, but it made a lot of sense to me. Even so, I am not about to throw Jakob Nielsen’s books out of the window, or unsubscribe from AlertBox. I think it is well worth taking something from both perspectives here, and developing a suitable approach accordingly.
Nielsen’s advocacy of testing with 5 users has largely to do with working within a budget and maximising ROI. Although Lingaard & Chattratichart show that we perhaps shouldn’t treat the number 5 as being magic in terms of sample size, they also show that simply adding more users is unlikely to uncover more issues. Something the two perspectives clearly agree on is that just testing with lots of users will not find all the problems.
We still have that budget to think about, though. We need to add usability testing into the mix of a software or application development project, and that means time and financial commitments. As Nielsen states:
“…test with at least 15 users to discover all the usability problems in the design. So why do I recommend testing with a much smaller number of users?
The main reason is that it is better to distribute your budget for user testing across many small tests instead of blowing everything on a single, elaborate study. Let us say that you do have the funding to recruit 15 representative customers and have them test your design. Great. Spend this budget on three tests with 5 users each!”
Exactly. Spread it out: create a pool of test users, and test iteratively, with a range of tasks, throughout the development process. At the same time, plan carefully what we ask the test users to do. It clearly pays to work on user scenarios, goals, typical tasks, and common paths through an application or piece of software. But perhaps it is not safe to assume that simply carrying out a first test with 5 users will find 85% of the usability problems.
Furthermore, as Nielsen states:
“You need to test additional users when a website has several highly distinct groups of users. The formula only holds for comparable users who will be using the site in fairly similar ways.”
And let’s not forget that we are dealing with specialised websites that will probably require complex or detailed tasks to achieve goals, so we need to develop a testing protocol that works for us, and avoid generalising too much.
So what should we do?
- Get to know the users (this is already happening)
- Select a range of suitable users for usability testing
- Create a wide range of representative, detailed tasks for those users to carry out
- Iterate testing throughout the design and development process
Steps towards improving usability testing at the EBI
In terms of approaches to usability testing that I think that we could look at here at the EBI, I would like to:
- Develop a matrix to show common activities, features and linkages across the various EBI tools
- Create a pool of potential test users for each EBI tool (there may be overlap)
- Identify a room and arrangement we can use for carrying out observational usability testing
- Get a camcorder (to record usability sessions, affinity diagramming or card sorting sessions, etc)
- Experiment with simplified, or “guerrilla”, testing at training sessions or conferences, for example
- Look into the possibilities for more formal field testing with users
- Develop knowledge of remote testing possibilities
- Develop personae based on experience with real users
I plan to write about these points, and explain why I think they are important, in future posts.