Evaluation techniques for interactive systems

What is Evaluation?

The role of evaluation is to assess designs and test systems to ensure that they function as expected and fulfill user needs.
Ideally, assessment should take place throughout the design life cycle, with the outcomes of the evaluation flowing back into design improvements.

Evaluation is a process that critically examines a program. It involves collecting and analyzing information about a program’s activities, characteristics, and outcomes. Its purpose is to make judgments about a program, to improve its effectiveness, and/or to inform programming decisions (Patton, 1987).

Goals of evaluation

  1. Assess the extent of system functionality

The system’s functionality is critical: it must meet the users’ requirements. Evaluation at this level assesses the system’s effectiveness in supporting the task, for example by measuring the user’s performance with it.

For example, if a filing clerk is accustomed to retrieving a customer’s file by postal address, the computerized filing system should offer (at the very least) the same capability.

2. Assess the effect of the interface on the user

It is critical to evaluate the user’s experience of the interaction and its effect on the user. This includes factors such as how easy the system is to learn, its usability, and the user’s satisfaction with it, which may extend to emotional response, especially in the case of leisure or entertainment systems.

3. Identify specific problems

These might be design elements that, when used in their intended context, produce unexpected results or cause user confusion.

These systems are used by humans, and it is entirely normal for humans to make mistakes. The system should therefore not frustrate users over small errors, and its behavior should not confuse them. This concern is connected to both the usefulness and the usability of the design.

Evaluation through expert analysis

The evaluation of a system should ideally take place before any implementation work begins. If the design itself can be evaluated, costly mistakes can be avoided since the design may be changed before any large resource commitments are made. A variety of strategies for evaluating interactive systems using expert analysis have been developed. These methods are flexible assessment approaches since they may be utilized at any point of the development process, from design specifications through storyboards and prototypes to full implementations.

There are three main expert-based assessment approaches:

  • Cognitive Walkthrough
  • Heuristic Evaluation
  • Model-based evaluation

Cognitive Walkthrough

Polson and colleagues presented this as an attempt to incorporate psychological theory into the informal and subjective walkthrough approach. The primary goal is to determine how easy a system is to learn through hands-on exploration.

You will need four things to conduct a walkthrough:
1. A specification or prototype of the system.
2. A description of the task the user is to perform on the system.
3. A complete, written list of the actions needed to execute the task with the proposed system.
4. An indication of who the users are and what level of expertise and knowledge the evaluators can reasonably assume about them.

Heuristic Evaluation

Heuristic evaluation, proposed by Nielsen and Molich, is based on heuristics: guidelines, general principles, or rules of thumb that can guide a design decision or be used to critique an existing design. Three to five evaluators are usually sufficient.

Nielsen’s ten heuristics are -

  1. Visibility of system status — Always keep users informed about what is going on, through appropriate feedback within a reasonable time. For example, if a system operation will take some time, give an indication of how long and how much is complete.
  2. Match between system and the real world — System should speak the user’s language, with words, phrases and concepts familiar to the user, rather than system-oriented terms. Follow real-world conventions, making information appear in a natural and logical order.
  3. User control and freedom — Users often choose system functions by mistake and need a clearly marked ‘emergency exit’ to leave the unwanted state without having to go through an extended dialog. Support undo and redo.
  4. Consistency and standards — Users should not have to wonder whether words, situations or actions mean the same thing in different contexts. Follow platform conventions and accepted standards.
  5. Error prevention — Make it difficult to make errors. Even better than good error messages is a careful design that prevents a problem from occurring in the first place
  6. Recognition rather than recall — Make objects, actions and options visible. The user should not have to remember information from one part of the dialog to another. Instructions for use of the system should be visible or easily retrievable whenever appropriate
  7. Flexibility and efficiency of use — Allow users to tailor frequent actions. Accelerators — unseen by the novice user — may often speed up the interaction for the expert user to such an extent that the system can cater to both inexperienced and experienced users
  8. Aesthetic and minimalist design — Dialogs should not contain information that is irrelevant or rarely needed. Every extra unit of information in a dialog competes with the relevant units of information and diminishes their relative visibility.
  9. Help users recognize, diagnose and recover from errors — Error messages should be expressed in plain language (no codes), precisely indicate the problem, and constructively suggest a solution.
  10. Help and documentation — Any such information should be easy to search, focusing on the user’s task, list concrete steps to be carried out, and not be too large.
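In practice, each evaluator rates every problem found against these heuristics, and the independent ratings are then aggregated. Below is a minimal Python sketch using Nielsen’s 0–4 severity scale; the problems and ratings themselves are made up for illustration:

```python
from statistics import mean

# Hypothetical problems found in a heuristic evaluation, with severity
# ratings on Nielsen's 0-4 scale (0 = not a problem, 4 = usability
# catastrophe) from three independent evaluators.
ratings = {
    "No progress feedback during upload (heuristic 1)": [3, 4, 3],
    "Jargon in save dialog (heuristic 2)": [2, 2, 3],
    "No undo after deleting a record (heuristic 3)": [4, 3, 4],
}

# Average the independent ratings and list problems worst-first, so the
# most severe issues are addressed first.
for problem in sorted(ratings, key=lambda p: mean(ratings[p]), reverse=True):
    print(f"{mean(ratings[problem]):.1f}  {problem}")
```

Averaging across three to five independent evaluators smooths out individual bias, which is one reason that group size is recommended.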

Model-based evaluation

Model-based evaluation uses a model of how a human might use a proposed technology to calculate or simulate projected usability metrics. These forecasts can be used in place of or in addition to empirical measures collected from user testing. For the assessment process, model-based evaluation combines cognitive and design models.

Models used for model-based evaluation include:

  • GOMS model
  • Keystroke-level model
  • Design rationale
  • Dialog models
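The Keystroke-Level Model (KLM), the lowest-level member of the GOMS family, predicts an expert’s task time by summing per-operator time estimates. A minimal sketch using the commonly published operator values from Card, Moran and Newell; the task sequence is an invented example:

```python
# Keystroke-Level Model (KLM) sketch. Operator times (seconds) are the
# widely published estimates from Card, Moran and Newell; real studies
# calibrate them for the actual users and devices.
OPERATORS = {
    "K": 0.2,   # press a key (average skilled typist)
    "B": 0.1,   # press or release a mouse button
    "P": 1.1,   # point with the mouse at a target
    "H": 0.4,   # home hands between keyboard and mouse
    "M": 1.35,  # mental preparation
}

def klm_time(sequence: str) -> float:
    """Predict task time for a string of KLM operators, e.g. 'MHPB'."""
    return sum(OPERATORS[op] for op in sequence)

# Hypothetical task: think, move hand to mouse, point at a menu item,
# click, then type a three-letter command.
print(f"{klm_time('MHPB' + 'K' * 3):.2f} s")  # → 3.55 s
```

Such predictions can be compared across candidate designs before anything is built, which is the whole appeal of model-based evaluation.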

Evaluation through user participation

Styles of evaluation

Of the techniques available for evaluation with users, we will distinguish between two distinct evaluation styles: those performed under laboratory conditions and those conducted in the work environment, or ‘in the field’.

Laboratory studies-

Users are taken out of their normal work environment to take part in controlled tests, often in a specialist usability laboratory.

Advantages -

  • Specialist equipment available — Laboratories may contain sophisticated audio/visual recording and analysis facilities, two-way mirrors, instrumented computers and the like, which cannot be replicated in the work environment.
  • Uninterrupted environment- The participant operates in an interruption-free environment.

Disadvantages -

  • Lack of context — The unnatural situation may mean that one accurately records a situation that never arises in the real world.
  • Difficult to observe several users cooperating

Appropriate — if the actual system location is dangerous or impractical to visit, for constrained single-user systems, and where controlled manipulation of use is required.

Field studies -

This type of evaluation takes the designer or evaluator out into the user’s work environment in order to observe the system in action.

Advantages -

  • Natural environment — Observe interactions between systems and between individuals that would have been missed in a laboratory study.
  • Context retained (though observation may alter it)- Seeing the user in his ‘natural environment’.
  • Longitudinal studies are possible.

Disadvantages -

  • Distractions and noise — High levels of ambient noise, greater levels of movement and constant interruptions, such as phone calls, all make field observation difficult.

Appropriate — where context is crucial, and for longitudinal studies.

Empirical methods: experimental evaluation

Experimental evaluation provides empirical evidence to support a particular claim or hypothesis. The evaluator chooses a hypothesis to test and manipulates the conditions; any changes in the behavioral measures are attributed to the different conditions.

There are a number of factors that are important to the overall reliability of the experiment.

  1. Participants

Participants are the people who take part in the experiment. Since the choice of participants is vital to its success, they should be chosen to match the expected user population as closely as possible.

2. Variables

Represent things to modify and measure in the evaluation. There are two types of variables, independent and dependent variables.

  • Independent variables — Characteristics changed to produce different conditions, e.g. interface style, number of menu items.
  • Dependent variables — Characteristics measured in the experiment, e.g. time taken, number of errors.

3. Hypothesis

A hypothesis is a prediction of the outcome of an experiment. It is framed in terms of the variables. The aim of the experiment is to show that this prediction is correct; this is done by disproving the null hypothesis.

4. Experimental design

Experimental design is the way the evaluation is organized. There are two main methods: between-subjects and within-subjects.

  • Between-subjects (or randomized) design — each participant is assigned to a different condition; more users are required, and variation between individuals can bias the results.
  • Within-subjects (or repeated measures) design — each user performs under every condition; this is less costly and less likely to suffer from user variation.

Once you have gathered the data, you need to analyze it. Identify the type of data, discrete or continuous, and then analyze it using appropriate statistical methods.
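As a sketch of that analysis step: for continuous data such as task completion times under two conditions, a two-sample t statistic is a common starting point. The data below are invented, and Welch’s formula is used so the two groups need not have equal variance:

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical completion times (seconds) from a between-subjects
# experiment: one independent group per interface condition.
menu_times = [12.1, 10.4, 13.2, 11.8, 12.6, 10.9]
cli_times = [14.0, 15.3, 13.7, 16.1, 14.8, 15.0]

def welch_t(a, b):
    """Welch's two-sample t statistic for independent groups
    (sample variances, so equal variance is not assumed)."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))

t = welch_t(menu_times, cli_times)
print(f"t = {t:.2f}")  # compare against a t distribution for significance
```

A large negative t here would suggest the menu condition was faster; in a real analysis you would look up (or compute) the p-value before rejecting the null hypothesis.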

Observational techniques

  1. Think Aloud

In this method, a user is asked to describe what he is doing and why, and what he thinks is happening — for example, why he takes an action or what he is trying to achieve. The method requires little expertise, provides useful insight, and shows how the system is actually used, but it cannot be applied in every scenario.


Advantages -

  • Simplicity — requires little expertise
  • Can provide useful insight into problems with an interface
  • Can show how the system is actually used


Disadvantages -

  • Subjective
  • Selective — depending on the tasks provided
  • The act of describing may alter task performance — observation can change the way people perform tasks and so give a biased view

2. Cooperative Evaluation

In this method, the user and the evaluator collaborate, and each can ask the other questions throughout. The approach is less constrained and easier to use than pure think aloud, and the user is encouraged to criticize the system.


Advantages -

  • Less constrained and easier to use
  • User is encouraged to criticize the system
  • Clarification is possible

3. Protocol Analysis

Methods for recording user actions in protocol analysis include:

  • paper and pencil — cheap, limited to writing speed
  • audio — good for a think-aloud, difficult to match with other protocols
  • video — accurate and realistic, needs special equipment, obtrusive
  • computer logging — automatic and unobtrusive, large amounts of data difficult to analyze
  • user notebooks- coarse and subjective, useful insights, good for longitudinal studies

In practice a mixture of these methods is used. Transcribing audio and video records is difficult and requires skill, although some automatic support tools are available.
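Computer logging in particular is cheap to add to a prototype. A minimal sketch (the event names are hypothetical) that timestamps each interface event and derives the latencies between them:

```python
import time

# Minimal computer-logging sketch: timestamp each user-interface event
# so the protocol can be analyzed later. Event names are hypothetical.
log = []

def record(event: str) -> None:
    log.append((time.monotonic(), event))

record("open_menu")
record("select_item:save")
record("confirm_dialog")

# Latency between consecutive events — raw material for spotting where
# users hesitate.
latencies = [t2 - t1 for (t1, _), (t2, _) in zip(log, log[1:])]
print(f"{len(log)} events, {len(latencies)} latencies")
```

As the bullet list notes, this kind of logging is automatic and unobtrusive, but it produces large amounts of low-level data that still need interpretation.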

4. Automated Analysis

Automated analysis uses tools such as EVA (Experimental Video Annotator), a system that runs on a multimedia workstation with a direct interface to a video recorder to support the annotation task. Because the analysis does not interrupt the task as much, the analyst has more time to focus on relevant incidents.


Advantages -

  • The analyst has time to focus on relevant incidents
  • Avoids excessive interruption of the task


Disadvantages -

  • Lack of freshness
  • May rely on post-hoc interpretation of events

5. Post-task walkthrough

In this approach, the user reflects on his actions after the event, typically walking through a transcript or recording of the session with the analyst. This avoids interrupting the task, but the account lacks freshness and may be a post-hoc interpretation of events.

Query techniques

  1. Interviews

In this method, the analyst asks the user questions one-on-one about his experience with the design. Compared with other methods, it is informal and subjective, and relatively inexpensive, but it is also time-consuming.


Advantages -

  • Can be varied to suit the context
  • Issues can be explored more fully
  • Can elicit user views and identify unanticipated problems


Disadvantages -

  • Very subjective
  • Time-consuming

2. Questionnaires

In this method, users are asked a series of predefined questions about what they like and how they feel about the design. Questionnaires can reach a large number of people in a short time, but they are less flexible and less probing than interviews.


Advantages -

  • Quick and reaches a large user group
  • Can be analyzed more rigorously


Disadvantages -

  • Less flexible
  • Less probing
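The “more rigorous analysis” typically means summarizing closed, scaled questions numerically. A minimal sketch for one hypothetical questionnaire item on a 1–5 Likert scale:

```python
from statistics import mean

# Hypothetical responses to one questionnaire item on a 1-5 Likert
# scale ("The system was easy to learn": 1 = strongly disagree,
# 5 = strongly agree), one score per respondent.
responses = [4, 5, 3, 4, 4, 2, 5, 4]

score = mean(responses)                               # central tendency
agree = sum(r >= 4 for r in responses) / len(responses)  # share who agree
print(f"mean {score:.2f}, {agree:.0%} agree")
```

Because every respondent answers the same fixed questions, such summaries can be computed and compared across large user groups, which is exactly what open interview data does not allow.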

Evaluation through monitoring physiological responses

Eye Tracking

In the eye-tracking approach, the position of the eye is tracked using head-mounted or desk-mounted equipment. The equipment yields the following measures, and the assessment is made by interpreting them.

Fixations: eye maintains a stable position.

  • Number of fixations — the more fixations, the less efficient the search strategy
  • Fixation duration — longer fixations may indicate difficulty with the display

Saccades: rapid eye movement from one point of interest to another
Scan paths: moving straight to a target with a short fixation at the target is optimal
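As a sketch of how these measures are derived: given the list of fixation durations from one trial (the numbers below are invented), the fixation count and mean duration fall out directly:

```python
from statistics import mean

# Hypothetical fixation durations (ms) for one search trial. More
# fixations suggests a less efficient search; longer fixations suggest
# difficulty extracting information from the display.
fixation_durations_ms = [220, 180, 450, 300, 210]

n_fixations = len(fixation_durations_ms)
mean_duration_ms = mean(fixation_durations_ms)
print(f"{n_fixations} fixations, mean {mean_duration_ms:.0f} ms")  # → 5 fixations, mean 272 ms
```

Real eye-tracking software first has to segment the raw gaze samples into fixations and saccades (for example with a dispersion or velocity threshold) before metrics like these can be computed.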

Physiological Measurements

In this method, the user’s emotional and physical responses while using the interface are measured, and the evaluation is based on those data.

The following physiological changes are typically measured:

  • Heart activity, including blood pressure, volume and pulse.
  • The activity of sweat glands: Galvanic Skin Response (GSR)
  • Electrical activity in muscle: electromyogram (EMG)
  • Electrical activity in the brain: electroencephalogram (EEG)

Abdullah M.R.M


