Usability Testing

The most direct way to find out if your product actually makes sense to the people using it

Usability testing is the practice of observing real users attempt real tasks with your product — unscripted, uncoached, and as close to natural use as possible. It's the most direct method available for understanding whether a design works in practice, not just in theory.

What It Is — Not What People Think

Usability testing is not user acceptance testing. It's not A/B testing. It's not asking users whether they like something or gathering satisfaction ratings. It's observation.

You give a user a realistic task. You watch them attempt it. You don't help, you don't hint, you don't explain what you meant. You take notes on where they succeed, where they hesitate, where they take the wrong path, and where they fail entirely.

That's it. The value is in the observation — specifically in the gap between what designers think users will do and what users actually do. Every usability study surfaces that gap. The teams that run these regularly stop being surprised by it. The ones that don't are consistently blindsided by launch-week support spikes and churn they can't explain.

Moderated vs Unmoderated

These are two distinct methods that answer different questions. Choosing the wrong one wastes time.

Moderated usability testing runs with a facilitator present — in person or via video call. The facilitator can probe, ask follow-up questions, and pivot when something unexpected happens. It produces richer data and is better at diagnosing why a problem exists, not just that it does.

Unmoderated usability testing uses tools like Maze, Lookback, or UserTesting to send tasks to participants asynchronously. Users complete them on their own, usually recorded. It scales faster and costs less per participant, but you lose the ability to follow up in the moment.

Moderated is better for: exploratory research, diagnosing unclear or complex problems, testing with specialist users (clinicians, developers, finance professionals) who need a real conversation, and early-stage designs where the questions are still open.

Unmoderated is better for: validating a specific flow after moderated research has already shaped the design, getting statistical weight on a known issue, and testing with large or geographically distributed participant pools.

The mistake is treating unmoderated as 'cheaper usability testing' when it's a different tool for different questions. Most product cycles need both.

The Five-Users Rule — What It Actually Means

Jakob Nielsen and Tom Landauer published research in 1993 showing that five users are enough to identify roughly 85% of the usability problems in a given design — a finding Nielsen popularised in 2000. It became famous. It also became badly misapplied.

The five-user rule applies when:

  • You're testing a single, coherent user group
  • You're looking for qualitative patterns, not statistical precision
  • You're in a discovery or formative phase, not validating quantitatively

It does not mean five users are always enough. If your product serves two distinct groups — say, admins and end users — you need five from each. If you're doing {{LINK:ux-benchmarking}} and need statistically significant task success rates, you're looking at 25 or more participants per condition.

The spirit of the finding holds, though: in qualitative research, you hit diminishing returns fast. After five or six participants, you're mostly hearing the same problems again. Stop there, fix things, and test again. Iterative rounds of five beat one exhaustive round of twenty.
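The diminishing-returns curve comes from the Nielsen–Landauer model: if each participant independently uncovers any given problem with probability L (around 31% in their data), the expected share of problems found after n participants is 1 − (1 − L)^n. A minimal sketch of that curve:

```python
def problems_found(n, discovery_rate=0.31):
    """Expected share of usability problems uncovered by n participants,
    per the Nielsen-Landauer model (0.31 is their average discovery rate)."""
    return 1 - (1 - discovery_rate) ** n

# The curve flattens quickly: most of the value is in the first few sessions.
for n in (1, 3, 5, 10, 15):
    print(f"{n:2d} participants -> {problems_found(n):.0%} of problems")
```

With the default rate, five participants land near the famous 85% figure, while the tenth through fifteenth participants each add only a percentage point or two — which is the argument for small, iterative rounds.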

Task Design — Where Most Tests Go Wrong

The most common reason usability tests produce shallow findings is bad task design. Tasks that are too abstract or too close to the UI generate noise instead of signal.

Don't write tasks like this: 'Go to the dashboard and find the analytics section.'

Write them like this: 'Your team just wrapped a product launch. You want to check how many new users signed up in the last 30 days. Show me how you'd find that.'

The difference is context. Real-world framing activates the user's actual mental model. Navigational instructions bypass the part of the experience you're trying to test.

A few other principles that hold up:

  • Don't use the product's own terminology in the task prompt — if you ask someone to find the 'Workspace', you've already pointed them to the right word
  • One goal per task. Multi-step tasks make it hard to identify where in the sequence the problem occurred
  • Include at least one task where the right answer is 'this doesn't seem possible here' — to test how users handle edge cases and error states

What to Do With the Output

The deliverable from a usability study isn't a report. It's a prioritised list of problems with enough context to fix them.

A finding is not 'users struggled with navigation.' A finding is 'four of five participants looked for billing settings under Account, not under the Admin panel. Two never found it without prompting. All four expected billing to be account-level, not workspace-level, based on experience with similar tools.'

The second version tells the team what happened, who was affected, and why. That's actionable. The first isn't.

Prioritise by frequency and severity. Frequency: how many participants hit this problem? Severity: when they hit it, could they recover, or did it block the task entirely? High-frequency, task-blocking issues go first. Low-frequency, cosmetic ones go in a backlog or get cut.
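That frequency-and-severity ordering can be sketched as a simple sort. The finding names and numbers below are hypothetical, purely to illustrate the ranking rule (blocking issues first, then by how many participants hit them):

```python
# Hypothetical findings from a five-participant study (illustrative data only).
findings = [
    {"issue": "export button tooltip unclear",          "hit_by": 1, "blocking": False},
    {"issue": "billing settings hidden in Admin panel", "hit_by": 4, "blocking": True},
    {"issue": "date-range picker label ambiguous",      "hit_by": 3, "blocking": False},
]

# Sort key: task-blocking issues first, then descending participant count.
prioritised = sorted(findings, key=lambda f: (not f["blocking"], -f["hit_by"]))

for rank, f in enumerate(prioritised, start=1):
    print(f"{rank}. {f['issue']} (hit by {f['hit_by']}, blocking={f['blocking']})")
```

Anything that falls to the bottom of this list — low frequency, non-blocking — is a candidate for the backlog or the cut line.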

Usability testing feeds directly into {{LINK:heuristic-evaluation}} and {{LINK:cognitive-load}} work — it validates or challenges the assumptions those frameworks surface. Pair the methods and your research gets dramatically more useful.

If your team isn't running this before major releases, our {{INTERNAL:/services/ux-research}} process covers it end-to-end, from recruitment to synthesis.

Related: {{LINK:heuristic-evaluation}}, {{LINK:ux-benchmarking}}, {{LINK:mental-models}}