Comparison_of_UX_Evaluation_Techniques_CA2_N00147768

Assignment Cover Sheet
MSc in User Experience Design
Student Name: Stephen Norman
Student Number: N00147768
Programme: MSc UX Design
Year of Programme: 2015/2016
Module Name: User Research and Usability
Assignment: Comparison of UX Evaluation Techniques
Assignment Deadline: 14/02/2015
I declare that that this submission is my own work. Where I have read, consulted and
used the work of others I have acknowledged this in the text.
Signature: Stephen Norman Date: 14/02/2016

Table of Contents
1. Introduction ....................................................................................................................................2
2. Evaluation Methods........................................................................................................................2
2.1. Usability Testing......................................................................................................................2
2.1.1. Case Study: “Find it if you can: Usability Case Study of Search Engines for Young
Users” 3
2.1.2. Case Study Review ..........................................................................................................3
2.2. UX Curve..................................................................................................................................4
2.2.1. Case Study: “Comparing the Effectiveness of Electronic Diary and UX Curve Methods
in Multi-Component Product Study” ..............................................................................................5
2.3. Web Surveys ...........................................................................................................................6
2.3.1. Case Study: “Approaches to Cross-Cultural Design: Two Case Studies with UX Web-
Surveys” 6
3. Comparison of Evaluation Methods ...............................................................................................7
4. Conclusion.......................................................................................................................................8
5. References ......................................................................................................................................8
6. Bibiligraphy .................................................................................................................................9

1. Introduction
There are numerous amounts of user experience evaluation methods currently in use.
Some 841 evaluation methods are currently in use, a reduction in methods since a
publication by Vermeeren et al. (2010) where 96 methods were referenced. This
paper will examine three methods; Usability Testing, UX Curve and Web Surveys.
These will be discussed and their effectiveness demonstrated though various case
studies. Furthermore, their performance will be examined and critiqued. Examples of
their multi-functional roles will also be introduced and discussed. Followed up by a
comparison on each for their real world feasibility and any future improvements
demonstrated in the conclusion.
2. Evaluation Methods
In this chapter three evaluation methods will be introduced. The methods discussed
are Usability Testing, UX Curve and Web Surveys. Each method will be described
and analysed through a case study review.
2.1. Usability Testing
Usability testing is a single behavioural study using as many as five users to maximise
outcome (Nielsen, 2012). Participants are set tasks while observers watch, listen and
take notes (Usability.gov, 2016). It is an effective method at gathering both quantitative
and qualitative information. Usability Testing is also ideal for examining attitudinal and
behavioural dimensions (Rohrer, 2014). These tests are cost effective requiring no
formal laboratory (Usability.gov, 2016), any room with portable recording devices will
be sufficient, or the testing can be performed remotely, whilst eliminating such factors
which can alter human behaviour such as location, time of day, season, or temperature
(Trivedi, 2012). Remote testing is conducted in one of two ways; moderated or
unmoderated (Schade, 2013). Moderated testing is conducted where there is two-way
communication between the participant and facilitator allowing for additional
information to be gathered. Unmoderated testing is done solely by the user without a
facilitator where users are set predefined tasks without moderation. Unmoderated
studies lack real time questioning and support (Schade, 2013). Because usability
testing is done mostly in controlled environments Monahan et al., (2008) argues this
is a disadvantage as these studies lack context. However, this also depends on the
type of application being tested.
1
http://www.allaboutux.org/all-methods

2.1.1. Case Study: “Find it if you can: Usability Case Study of Search Engines for
Young Users”
This study set out to assess 7 English search engines, and 5 German on their ability
to successfully match their interface to the abilities and skills of children. Interestingly
the study’s method was conducted without the involvement of any children, which
deviates it from standard practices (Nielsen, 2012). Three main points were
addressed; motor skills, cognitive skills and presentation of results. The motor skills
research included artefacts such as a mouse and keyboard, which assessed the
abilities of these devices from the handling, to their accuracy on the interface. This
included button sizes, clickable regions such as imagery and use of alternate methods
of providing results such as tangible figurines used in applications like TeddIR2
proposed by Jansen et al., 2010. Cognitive abilities were studied in both their
understanding of general search and how they interacted with these interfaces from
previous research. Children from age six to thirteen were in scope, as well as two
types of interfaces; browsing versus keyword orientated. Browsing interfaces allow
users to navigate and explore a set of predefined categories as used in KidsClick3,
whereas keyword orientated interfaces e.g. Google, require the user to type each
query. Final assessment criteria focused on font size, number of results per page, use
of imagery and did the search cater for semantics and spell checking.
2.1.2. Case Study Review
This was an untraditional usability evaluation with regards to using existing research
of children’s web use. It was acknowledged that to verify their research and enrich the
results that further studies should be conducted with children. With sufficient prior
research a good user model was created to allow the researchers to conduct their own
study of these interfaces thus saving time and money.
Furthermore, the chosen method was appropriate for producing desired results.
However further studies such as contextual usability inquiries, or EmoCards4 could be
performed to gather richer qualitative data.
Moreover, credit should be given to the paper’s authors with regards to their
organisational skills. Exemplary efforts were carried out on the categorisation (Figure 1),
which was conducted on all criteria throughout the paper. Without these efforts it
would have been difficult to assess the search engines properly.
2
An interface designer to help children retrieve books by placing tangible figurines on screen to represent search terms in hopes of reducing
errors from spelling and finding the correct query (Gossen et al., 2010).
3
http://www.kidsclick.org/
4
http://www.allaboutux.org/emocards

Figure 1- Categorisation of search results by button size and page length.
2.2. UX Curve
UX Curve is a method in which participants are asked to sketch their retrospective
experiences of a product use over time (Figure 2). UX Curve has been designed to better
understand user emotions and experiences chronologically (Kujala et al., 2011a).
Sketching is done on a template divided in to two planes; x-axis is time with the y-axis
can be any desired evaluating factors e.g. satisfaction or dissatisfaction (Sahar,
Varsaluoma & Kujala, 2014).
Figure 2- (Left) Showing a deteriorating and stable curve. (Right) Improving ease of use curve.
When compared to a questionnaire UX Curve has proven to be more effective at
collecting the hedonic aspects of users such as fun and pleasure (Kujala et al., 2011b).
However, in a later study concluded that long term diary studies were more effective
at collecting detailed information versus UX Curve (Sahar et al., 2014). Due to the
longevity of this study, results favoured the long-term diary study (LTDS) as it recorded
data more accurately, whereas recollection was required during UX Curve evaluation
due its presentation after the diary studies had concluded. Having to recall such

information can lead to biases argues Norman, D.A, (2009); “Retrospective
evaluations of long-term user experiences are based on memories of the user and
they can be vulnerable to biases” (Kujala et al., 2011b). According to Vermeeren et
al. (2010), it is one of the lesser used methods because it is not cost effective
impractical in product development contexts (Kujala et al. 2011b).
2.2.1. Case Study: “Comparing the Effectiveness of Electronic Diary and UX Curve
Methods in Multi-Component Product Study”
This case study assessed the performance of both UX Curve and LTDS, each for
collecting qualitative data as a remote research method (Sahar, Varsaluoma & Kujala,
2014). Twenty-five customers were recruited who had recently purchased a sports
watch and were using it at least five times per week. This multi-product study included
connected accessories such as a heart rate monitor, speed sensor and website, was
conducted remotely over an eight-week period. Participants were asked to completed
the electronic diary online up to twice a week, upon completion of the eight-week
period they were sent four UX Curve templates; one for each of the components. The
templates addressed the “Attractiveness” of the product. “We chose ‘attractiveness’
UX dimension because it represents overall appeal and non-instrumental qualities
(aesthetics, symbolic and motivational aspects), although these were not specific to
the users” (Sahar, et al., 2014), and was also chosen based on a previous study done
by Kujala et al. (2014).
The results were clear that the LTDS proved more effective at its ability to collect
further in depth information about each component when compared to UX Curve.
Although getting good user response rates was a challenge over the study duration
according to (Sahar et al.,2014).
UX Curve took less time overall from implementation to deployment and analysis as it
was conducted in one session with participants. However, this study did not maximise
on the best intended use for UX Curve; “UX Curve is intended to be used in a face to
face setting where the researcher is better able to inquire into the participants’
reasoning and thoughts” (Kujala et al. 2011a). Instead it was mailed to participants
during the Sahar et al. (2014) study eliminating the potential of qualitative data
gathering. Although limiting the full use of UX Curve, it did consider the existing
research of Kujala et al., (2011a) whom tested six UX Curve types (Figure 3) and
identified “attractiveness” as the best performing template.

Figure 3- Different curve types used while testing a product.
2.3. Web Surveys
Web Surveys are a commonly used method in the researcher’s toolkit. These allow
greater access from a larger audience due to accessibility of the internet. Both Walsh
(2012) and Vermeeren et al. (2010) agree that these are desirable due to their
lightweight nature; speed on implementation, and ease of use.
These highly versatile studies can be used at any stage of the design process. In a
recent project (Norman, 2016) a web survey was used during the exploratory phase
to gauge people’s attitudes and use of the An Post website before prototype
conceptualisation. In the opposite scale web surveys are used as a LTDS used in a
study by Sahar, et al. (2014).
Challenges raised by both Walsh (2012) and Sahar et al. (2014) was the ability to
keep participants engaged for the duration as users tended to drop out or not complete
the survey. These should be considered when surveys are being used in LTDS
contexts. An issue identified by Walsh (2012) is that researchers who formulate
questions and hypotheses should consider their own cultural background. This may
affect research questions, its performance and participant interpretation if testing
occurs in different regions and cultural backgrounds.
2.3.1. Case Study: “Approaches to Cross-Cultural Design: Two Case Studies with UX
Web-Surveys”
This study assesses the use of web surveys in two different cases. One covers an
online gaming site and the other an online sports diary. The online gaming site
objectives were to gain insights on how to design a good UX for new markets in the
future (Walsh, 2012). The online sports diary evaluated customer usage of over a
period of three months. The sample size in each greatly differed; 11,238 participants
of the gaming site were sent an invite email, with only 632 responding. 17 were
recruited for the online sports diary with 7 dropping out through the evaluation. A more
effective response was noted for the online sports diary which screened for willing
volunteers prior to the evaluation. Both surveys were sent internationally, however,
translations had to be considered prior to survey deployment (Walsh, 2012).

Therefore, the survey was created in both Swedish and Spanish, requiring researchers
to translate to English for collection. For the sports diary an invitation questionnaire
was first sent, allowing the researchers to screen for English speaking participants,
and collecting internet and device usage information.
It is believed researchers conducted their research using the best practice approaches
to both studies. However, in the diary study, the survey could be used in conjunction
with UX Curve, reflecting more positive results with regards to customer satisfaction
based on the study by Sahar et al. (2014). Again Walsh (2012) experienced the same
difficulties as Sahar et al. (2014) with regards to participation levels dropping on long-
term studies. However, the benefits of the long term diary’s format allowed for the
collection of rich qualitative data in conjunction with context, which is an important
cultural factor according to Gillham (2005).
The gaming site sought the research of Soley & Smith (2008) as their research
appeared to prove that a “sentence completion survey” method proved to be most
effective across cultures. Also this research could be improved by introducing an
invitation questionnaire to recruit willing participants initially opening research to
additional forms of questioning.
3. Comparison of Evaluation Methods
The short term studies such as Usability Testing and Web Survey used in the gaming
website (Section 2.3.1) are more cost effective, requiring less time to implement, run
and analyse. However, short term studies lack visibility on long term user experienced
emotions. Whereas LTDS provides rich qualitative data during the evaluation because
users provides feedback usually within the same day of use, while the information is
fresh.
It seems UX Curve is open to a debate, as some researchers would argue that
evaluating retrospectively during a long term study can be open to biases (Norman,
2009). However, during their study Sahar et al. (2014) found that although UX Curve
requires less implementation effort, it requires additional time analysing and converting
user sketches to digital formats.
Although not in scope, iScale is worth studying further as its application can certainly
be improved as the original publication was issues in 2012. Technology has improved
and there is potential for this product to be every bit as intuitive as sketching on paper.
If both projects were combined perhaps a better UX evaluation could emerge.

4. Conclusion
With all evaluations there are definite challenges, from participation levels, to time
required to implement, coordination, and evaluate the data. Given the facts, it is the
opinion of the author that with current trends and technology Web urveys of any form
are the most effective at acquiring rich data. Although the set up may be longer than
other methods, having the ability to easily access a database of users quickly and
easily makes this a strong candidate to address the majority of business objectives,
especially from a costing perspective.
Interestingly, during the analysis of UX Curve the author had questioned the feasibility
of a digital platform to address the same issue. It would eliminate the time needed to
convert sketches to spreadsheet. Surprisingly enough development of such an
application has been conceived. A more in depth study is required for analysing the
potential of merging UX Curve and iScale with current technology.
5. References
Desmet, P., Overbeeke, K., & Tax, S. (2001). Designing products with added
emotional value: Development and application of an approach for research
through design. The design journal, 4(1), 32-47.
Gossen, T., Hempel, J., & Nürnberger, A. (2013). Find it if you can: usability case
study of search engines for young users. Personal and Ubiquitous
Computing, 17(8), 1593-1603.
Gillham, R. (2005). Diary Studies as a Tool for Efficient Cross-Cultural Design. In
IWIPS (pp. 57-65).
Kujala, S., Roto, V., Väänänen-Vainio-Mattila, K., Karapanos, E., & Sinnelä, A.
(2011). UX Curve: A method for evaluating long-term user experience.
Interacting with Computers, 23(5), 473-483.
Kujala, S., Roto, V., Väänänen-Vainio-Mattila, K., & Sinnelä, A. (2011, June).
Identifying hedonic factors in long-term user experience. In Proceedings of the
2011 Conference on Designing Pleasurable Products and Interfaces (p. 17).
ACM.
Nielsen, J. (2012). How Many Test Users in a Usability Study? Nngroup.com.
Retrieved 10 February 2016, from https://www.nngroup.com/articles/how-
many-test-users/
Norman, S. (2016). Interaction Design Project - Anpost.ie (1st ed., pp. 3-4).
Monahan, K., Lahteenmaki, M., McDonald, S., & Cockton, G. (2008, September). An
investigation into the use of field methods in the design and evaluation of
interactive systems. In Proceedings of the 22nd British HCI Group Annual
Conference on People and Computers: Culture, Creativity, Interaction-Volume

1 (pp. 99-108). British Computer Society.
Reijneveld, K., de Looze, M., Krause, F., & Desmet, P. (2003, June). Measuring the
emotions elicited by office chairs. In Proceedings of the 2003 international
conference on Designing pleasurable products and interfaces (pp. 6-10). ACM.
Rohrer, C. (2014). When to Use Which User-Experience Research Methods.
Nngroup.com. Retrieved 7 January 2016, from
https://www.nngroup.com/articles/which-ux-research-methods/
Sahar, F., Varsaluoma, J., & Kujala, S. (2014, November). Comparing the
effectiveness of electronic diary and UX curve methods in multi-component
product study. In Proceedings of the 18th International Academic MindTrek
Conference: Media Business, Management, Content & Services (pp. 93-100).
ACM.
Schade, A. (2013). Remote Usability Tests: Moderated and Unmoderated.
Nngroup.com. Retrieved 10 February 2016, from
https://www.nngroup.com/articles/remote-usability-tests/
Soley, L., & Smith, A. (2008). Projective techniques for social science and business
research.
Usability.gov. (2016). Usability Testing. Retrieved 11 February 2016, from
http://www.usability.gov/how-to-and-tools/methods/usability-testing.html
Vermeeren, A. P., Law, E. L. C., Roto, V., Obrist, M., Hoonhout, J., & Väänänen-
Vainio-Mattila, K. 2010, October). User experience evaluation methods:
current state and development needs. In Proceedings of the 6th Nordic
Conference on Human-Computer Interaction: Extending Boundaries (pp. 521-
530). ACM.
Walsh, T., & Nurkka, P. (2012, November). Approaches to cross-cultural design: two
case studies with UX web-surveys. In Proceedings of the 24th Australian
Computer-Human Interaction Conference (pp. 633-642). ACM.
6. Bibiligraphy
Allaboutux.org,. (2016). All UX evaluation methods Â« All About UX. Retrieved 11
February 2016, from http://www.allaboutux.org/all-methods
Karapanos, E., Martens, J. B., & Hassenzahl, M. (2012). Reconstructing experiences
with iScale. International Journal of Human-Computer Studies,70(11), 849-865.
Jansen, M., Bos, W., van der Vet, P., Huibers, T., & Hiemstra, D. (2010, June).
TeddIR: tangible information retrieval for children. In Proceedings of the 9th
international conference on interaction design and children (pp. 282-285). ACM.
Norman, D. A. (2009). THE WAY I SEE IT Memory is more important than
actuality. Interactions, 16(2), 24-26.

Comparison_of_UX_Evaluation_Techniques_CA2_N00147768

Comparison_of_UX_Evaluation_Techniques_CA2_N00147768

Recommended

Recommended

More Related Content

Viewers also liked

Viewers also liked (11)

Comparison_of_UX_Evaluation_Techniques_CA2_N00147768