This lecture is about qualitative data collection methods and qualitative data analysis in software engineering. Topics covered are:
1. Sampling
2. Interviews
3. Observation and Participant Observation
4. Archival Data Collection
5. Grounded theory, Coding, Thematic Analysis
6. Threats to validity in qualitative studies
Find the videos at: https://www.youtube.com/playlist?list=PLSKM4VZcJjV-P3fFJYMu2OhlTjEr9Bjl0
Qualitative Studies in Software Engineering - Interviews, Observation, Grounded Theory
1. Qualitative Studies in Software Engineering
Alessio Ferrari, ISTI-CNR, Pisa, Italy
alessio.ferrari@isti.cnr.it
cf. Alan Bryman, Social Research Methods, 5th Ed. Oxford University Press, 2016
April, 2020
2. Qualitative Studies
• A qualitative study is the application of any empirical investigation strategy (field study, field experiment, survey, etc.) in which qualitative data collection and analysis methods are used
• While quantitative methods deal with numbers, qualitative methods mainly deal with concepts and words
• They are typically used in the social sciences, which have a greater focus on human aspects
• Therefore, they are appropriate when you want to focus on the human and social aspects of software engineering, which, we recall, is a socio-technical field
• They also include ethnography
3. Qualitative Studies are Inductive Approaches to Build Theories
(Cycle figure) Qualitative Data Collection (Interviews, Observations) → Qualitative Data Analysis (Grounded Theory*) → Theory → Operationalisation → Sample Definition, Data Collection → back to Qualitative Data Collection
*Actually, grounded theory also takes data collection into account
4. The ABC of Software Engineering Research
(Fig. 1 of the cited paper: the ABC framework, eight research strategies as categories of research methods, illustrated with metaphors such as Jungle, Natural Reserve, Flight Simulator, Courtroom, Referendum, and Mathematical Model)
Qualitative methods are mostly applied in Field Studies, Field Experiments, and Judgment Studies; they are not applied in Lab Experiments and Computer Simulations
5. Qualitative Studies
• Qualitative studies are based on the analysis of interviews with people (developers, users), observation of people at work (testing activities, group meetings), and analysis of archival data (software documentation, emails, logs)
6. Examples
• Interviews: I want to understand how developers see testers and vice versa, to possibly develop better collaboration strategies. I interview groups of developers and testers in a company.
• Observations: I observe the meetings in a company to understand the typical patterns of communication.
• Analysis of archival data: I analyse the documentation to understand the common elements between different types of systems of the same company (e.g., to create a common generic platform)
7. Examples: by Strategy
• Field Study: I interview people in a company to understand the main pain points with requirements from the developers’ and managers’ perspective (focus on context)
• Field Experiment: I interview and observe people at work to understand whether the novel tool developed for automated requirements analysis addresses the previous problems of their analysis activity (focus on context and improvements)
• Sample Study (Survey): I ask people from different companies to summarise the major issues with requirements in their company (generalise over contexts)
• Formal Theory (Literature Review): I search the literature for evidence of problems related to requirements and proposed solutions in existing works (focus on historical problems, generalise over contexts)
• Judgment Study: I ask a series of expert requirements analysts from different companies to analyse a sample requirements document, explain the main defects of the document, and say in which way they differ from the typical defects that they encounter (focus on today’s problems, generalise over contexts)
The topic is similar (requirements defects) but the strategies are completely different!
8. Quantitative vs Qualitative
• The overall process is similar between qualitative studies and
quantitative studies—ask a research question, collect data, analyse
data, answer the question— however some RELEVANT differences exist
• The main difference resides in the degree of objectivity of the data
analysis process: while quantitative studies aim to be objective (and
repeatable), qualitative studies accept the subjectivity inherent to the
interpretation of (qualitative) data
• The important thing is providing evidence that the interpretation is
derived from the data (e.g., interview transcripts) in a sound and
reasonable way
• In this module, we will see Data Collection and Data Analysis methods
that are appropriate for qualitative studies
9. Quantitative Studies (e.g., lab experiments)
(Process figure spanning PREPARATION, EXECUTION, and REPORTING)
PREPARATION: Theory → Research Question → Hypothesis and Variable Definition → Research Design → Define Measures for Variables
EXECUTION: Recruit Participants / Select Artifacts → Collect Data → Analyse Data
REPORTING: Report Answers → Discuss
Validity concerns addressed along the process: Construct Validity, Internal Validity, Construct & Conclusion Validity, External Validity
The process normally starts from a Theory and discusses/modifies it in relation to the results
11. Qualitative Studies
(Process figure spanning PREPARATION, EXECUTION, and REPORTING)
PREPARATION: Research Question → Research Design
EXECUTION: Recruit Participants / Select Artifacts → Collect Data → Analyse Data → Theory
REPORTING: Report Answers
Validity concerns addressed along the process: Internal Validity, External Validity, Reliability
The process normally starts from a Research Question and derives a Theory from the data
13. Qualitative Studies
(Same process figure as the previous slide)
The process normally starts from a Research Question and derives a Theory from the data
There is (more) iteration
Full control of the process is limited
14. Quantitative vs Qualitative
Quantitative | Qualitative
Numbers | Words (and images)
Researcher-driven | Participant-driven
Researcher is distant | Researcher is close
Theory is tested against data | Theory emerges from data
Linear | Iterative
Structured | Unstructured
Generalisation-oriented | Context-oriented
Hard, reliable data | Rich, deep data
Behaviour | Meaning
Artificial settings | Natural settings
15. Quantitative vs Qualitative
(Same table as the previous slide)
This differentiation is not so strict!
16. Quantitative vs Qualitative
• Behaviour vs Meaning: quantitative studies also aim to find some meaning in the data, and qualitative studies also search for patterns and behaviours
• Testing Theory vs Eliciting Theory from Data: quantitative studies are sometimes not based on well-established theories and some iteration is needed; qualitative studies cannot assume that no pre-existing theory exists in the mind of the researcher
• Numbers vs Words: some words may occur more frequently than others (e.g., in interviews), and this may give relevance to certain concepts in qualitative research; I can use qualitative data but extract quantitative information (e.g., occurrences of terms in Tweets)
• Natural vs Artificial: how natural is it to perform an interview? Do I really get the actual information that I need? (People tend to say things that may be different from reality and also from what they think!)
As usual, classification is just a convention; reality is always shaded…
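The point about extracting quantitative information from qualitative data can be sketched in a few lines. A minimal sketch, assuming invented example posts and a simple lowercase-word tokenisation rule:

```python
from collections import Counter
import re

# Invented example data: short qualitative texts (e.g., posts or interview answers).
posts = [
    "The build is slow and the tests are flaky",
    "Flaky tests again, the build failed",
    "Documentation is outdated",
]

def term_frequencies(texts):
    """Count lowercase word occurrences across all texts (quantitative view of qualitative data)."""
    words = []
    for text in texts:
        words.extend(re.findall(r"[a-z]+", text.lower()))
    return Counter(words)

freq = term_frequencies(posts)
print(freq.most_common(3))  # the most recurrent terms may point at relevant concepts
```

Frequent terms (here, for example, "flaky" and "tests") can then guide the qualitative interpretation of what concepts matter to participants.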
17. Qualitative Data Collection and Qualitative Data Analysis
(Overview figure)
Sources of Data (within the software process): People, Documents, Systems
Data Collection: Inquisitive Techniques (e.g., interviews); Observational Techniques (e.g., observation); Archival Data Collection (e.g., mining logs, software documentation, e-mails)
Data Analysis: Grounded Theory; Coding; Thematic Analysis
18. Qualitative Data Collection and Qualitative Data Analysis
• Sources of qualitative data:
• Are mostly people and documents
• Systems can also be a source of qualitative data (e.g., code comments), but we do not consider them in this lecture, as systems are mostly used for quantitative data—treated in an aggregate form and automatically processed (e.g., data logs and code information)
• Data Collection techniques: we will see interviews, observations (surveys/questionnaires are considered in a separate lecture, and are mostly for quantitative data) and archival data collection
• Data Analysis techniques: many techniques with different names exist, BUT we focus on grounded theory and thematic analysis, and on the main tool used for qualitative data analysis, namely coding (i.e., associating conceptual labels to textual fragments)
Grounded theory and Thematic analysis are, in principle, DIFFERENT
Here you will learn Grounded theory, which somewhat includes Thematic Analysis (but there are many opinions on this)
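As a preview of coding, the idea of associating conceptual labels to textual fragments can be sketched as a simple data structure; the fragments and the code labels below are invented for illustration:

```python
# Invented example: interview fragments, each tagged with one or more conceptual codes.
fragments = {
    "We never have time to update the test plan": ["time_pressure", "documentation"],
    "Testers find out about changes too late": ["communication_gap"],
    "I'd rather write code than documents": ["documentation", "attitude"],
}

def group_by_code(coded_fragments):
    """Invert fragment->codes into code->fragments, the basic move of qualitative coding."""
    by_code = {}
    for fragment, codes in coded_fragments.items():
        for code in codes:
            by_code.setdefault(code, []).append(fragment)
    return by_code

by_code = group_by_code(fragments)
print(sorted(by_code))  # the set of codes that emerged from the data
```

Grouping fragments under each code is what later lets the analyst merge codes into higher-level themes and, in grounded theory, categories.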
19. More on Data Collection Techniques (for Case Studies, aka Field Studies and Experiments)
Table 1. Data collection techniques suitable for field studies of software engineering.
First Degree (direct involvement of software engineers):
• Inquisitive techniques: Brainstorming and Focus Groups; Interviews; Questionnaires; Conceptual Modeling
• Observational techniques: Work Diaries; Think-aloud Protocols; Shadowing and Observation, Synchronized Shadowing; Participant Observation (Joining the Team)
Second Degree (indirect involvement of software engineers):
• Instrumenting Systems; Fly on the Wall (Participants Taping Their Work)
Third Degree (study of work artifacts only):
• Analysis of Electronic Databases of Work Performed; Analysis of Tool Use Logs; Documentation Analysis; Static and Dynamic Analysis of a System
cf. Lethbridge et al., 2005. https://link.springer.com/content/pdf/10.1007/s10664-005-1290-x.pdf
This will be useful when we discuss case studies, but it is good to have it here, as in SE qualitative studies are often performed in the context of case studies
20. Data Collection Techniques vs Research Goal (for Case Studies)
Table 2. Questions asked by software engineering researchers that can be answered by field study techniques.
Technique | Used by researchers when their goal is to understand | Volume of data
First Degree Techniques
Brainstorming and Focus Groups | Ideas and general background about the process and product, general opinions (also useful to enhance participant rapport) | Small
Surveys | General information (including opinions) about process, product, personal knowledge, etc. | Small to large
Conceptual modeling | Mental models of product or process | Small
Work Diaries | Time spent or frequency of certain tasks (rough approximation, over days or weeks) | Medium
Think-aloud sessions | Mental models, goals, rationale and patterns of activities | Medium to large
Shadowing and Observation | Time spent or frequency of tasks (intermittent over relatively short periods), patterns of activities, some goals and rationale | Small
Participant observation (joining the team) | Deep understanding, goals and rationale for actions, time spent or frequency over a long period | Medium
Second Degree Techniques
Instrumenting systems | Software usage over a long period, for many participants | Large
Fly on the wall | Time spent intermittently in one location, patterns of activities (particularly collaboration) | Medium
Third Degree Techniques
Analysis of work databases | Long-term patterns relating to software evolution, faults, etc. | Large
Analysis of tool use logs | Details of tool usage | Large
Documentation analysis | Design and documentation practices, general understanding | Medium
Static and dynamic analysis | Design and programming practices, general understanding | Large
cf. Lethbridge et al., 2005. https://link.springer.com/content/pdf/10.1007/s10664-005-1290-x.pdf
Techniques highlighted in this lecture: interviews/questionnaires, participant observation, archival data collection
22. Probability vs Purposive Sampling
• Sampling means identifying the units that need to be
involved as sources of data in order to properly answer the
RQs
• Units can be people, organisations, documents, departments, etc. and
can have embedded units (more on this when we will discuss case
studies)
• Probability sampling: given a population of interest, I select a number
of units that are representative of my population according to some
probabilistic scheme (normally random sampling, or stratified random
sampling) — not frequent in qualitative research, more appropriate for
Surveys/Questionnaires (we will see this in another lecture)
• Purposive sampling: given the research question, I sample
strategically, by selecting the units that, in the given context, are the
most appropriate to give different internal perspectives to come to a
(locally) complete view—appropriate for Qualitative Studies
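The two sampling families above can be contrasted with a minimal sketch, assuming an invented toy population of developers and testers (the field names and sizes are illustrative only):

```python
import random

# Invented toy population: 30 units with a role attribute (10 testers, 20 developers).
population = [{"id": i, "role": "developer" if i % 3 else "tester"} for i in range(30)]

def simple_random_sample(units, n, seed=42):
    """Probability sampling: every unit has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(units, n)

def stratified_sample(units, key, n_per_stratum, seed=42):
    """Stratified random sampling: random sampling within each stratum (e.g., role)."""
    rng = random.Random(seed)
    strata = {}
    for u in units:
        strata.setdefault(u[key], []).append(u)
    sample = []
    for group in strata.values():
        sample.extend(rng.sample(group, min(n_per_stratum, len(group))))
    return sample

sample = stratified_sample(population, "role", 3)  # 3 testers + 3 developers
```

Purposive sampling, by contrast, would replace the random draw with a deliberate choice of the units most likely to inform the RQ.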
23. Purposive Sampling in SE
• Sample of context: select based on heterogeneity (contrasting contexts, to
increase relevance of possibly different findings), or homogeneity (common
contexts, to better define the scope of the possible findings)
• Example: Testers in company A and B, developers in company A and B
(heterogeneity) all with the same degree of experience (homogeneity)
• Sample of subjects: given the selected contexts, select the subjects that are
representative for that context
• Example: 14 developers/testers from company A and 13 from company B
• Sample of documents (or artefacts in general): given the selected contexts,
select the documents that are representative for that context (e.g., produced by
testers vs produced by developers, if I want to compare what they write)
• Example: 10 documents produced by different people in A, and 10 in B
Example RQ: What are the differences between the writing styles of developers and testers?
The choice depends on the focus of your RQs!
24. Types of Purposive Sampling
1. Criterion sampling. Sampling all units (cases or individuals) that meet a particular criterion (e.g., > 5 years of experience)
2. Typical case sampling. Sampling a case because it exemplifies a dimension of interest (e.g., one expert for each team)
3. Extreme or deviant case sampling. Sampling cases that are unusual or that are unusually at the far end(s) of a particular dimension of interest (e.g., experts with many years of experience and close to retirement)
4. Critical case sampling. Sampling a crucial case that permits a logical inference about the phenomenon of interest—for example, a case might be chosen precisely because it is anticipated that it might allow a theory to be tested (e.g., a subject who is not an expert, for a theory that applies only to experts; another expert in the same team)
5. Maximum variation sampling. Sampling to ensure as wide a variation as possible in terms of the dimension of interest (e.g., people with different degrees of experience)
6. Theoretical sampling. Typical of Grounded Theory: units are selected if they are expected to confirm or reject a certain theory/hypothesis, or can extend a certain category identified during the Grounded Theory process. Basically, based on your current data, you identify what is missing or what you want to investigate more.
7. Snowball sampling. Ask participants for additional contacts to be interviewed
8. Opportunistic sampling. Capitalising on opportunities to collect data from certain individuals, contact with whom is largely unforeseen but who may provide data relevant to the RQ (e.g., non-developers, testers, managers)
9. Stratified purposive sampling. Sampling of usually typical cases or individuals within subgroups of interest (e.g., experts and novices, small projects vs large projects)
Remember that the process is ITERATIVE
Example RQ: What are the bug correction strategies of expert developers?
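Two of the types above, criterion sampling and extreme/deviant case sampling, can be illustrated over an invented candidate pool (names and experience values are made up for the example):

```python
# Invented candidate pool with one dimension of interest: years of experience.
candidates = [
    {"name": "A", "experience_years": 2},
    {"name": "B", "experience_years": 7},
    {"name": "C", "experience_years": 12},
    {"name": "D", "experience_years": 6},
]

def criterion_sample(units, min_experience=5):
    """Criterion sampling: all units meeting a criterion (> 5 years here)."""
    return [u for u in units if u["experience_years"] > min_experience]

def extreme_case_sample(units):
    """Extreme/deviant case sampling: the far ends of the dimension of interest."""
    ordered = sorted(units, key=lambda u: u["experience_years"])
    return [ordered[0], ordered[-1]]

experts = criterion_sample(candidates)
```

The other purposive types differ only in the selection rule, not in the mechanics; what matters is that the rule is driven by the RQ, not by chance.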
25-27. Yes, but How Many People Should I Interview in a SE Study?
Between 20 and 30 subjects. Why? MAGIC
How Many Documents/Artefacts Should I Select in a SE Qualitative Study?
10 to 20 if they are long documents (30-100 pp); more than 40 if they are short ones (e.g., 2-5 pp)
WARNING: Many studies focus on qualitative data (e.g., comments, app reviews) but do quantitative analysis: these numbers are not applicable to them (you need more samples)
29. Interview Types
• Structured Interviews: similar to questionnaires, but
delivered by a person—mostly quantitative data, can help
to clarify questions
• Semi-structured Interviews: you have a set of (open-
ended) questions to ask and concepts to cover, but you
can add new questions
• Unstructured Interviews: no predefined questions,
conversational approach
We will focus on semi-structured interviews
30. Interview Process
(Process figure; important yet underestimated steps, discussed in the next slides, are marked with *)
Preparation of Questions: Identify Scope of Research Questions → Formulate Interview Questions → *Pilot Questions → Revise Interview Questions → Identify Novel Issues → Finalise Questions
Interview Preparation (parallel activities): *Identify Interview Subject(s) → Recruit Interview Subject(s); *Study Domain Jargon of Subject(s); Agree on a Quiet Place and Suitable Time for Interview; Prepare Disclosure Agreement and Data Management Policy; *Set-up and Try Recording and Storage Equipment
Interview Conduct: *Create Rapport → *Check Recording and Storage Equipment → Ask Questions → *Summary and Wrap-up
Interview Data Creation: Store Interview Recording and Write Notes → Transcribe Interview Recording
31. Interview Process: Highlights
• Study Domain Jargon of Subject(s): each domain, role, and even each company uses a specific terminology, and you need to have an idea of the words that your interviewees will use; be prepared for heavy use of jargon on their side, but do not use jargon yourself (keep your questions simple)
• Pilot Questions: you need to be sure that your questions can be clearly understood, and that they are sufficient to gather the information you want. Therefore you need to do preliminary interviews with your questions; the best option would be to pilot the questions with part of your sample of subjects; in reality, you may need to pilot questions with colleagues.
• Identify Interview Subject(s): you may want to interview people in one company or in more than one, but you need to know who the right people are to answer your questions. If you do not select the right people, you will not get the “right” answers. You may also identify relevant people while interviewing someone (snowball sampling)
32. Interview Process: Highlights
• Create Rapport: in most cases you do not personally know the person you interview, so you need to be kind and create a relationship in a short time; suggestion: act like a bartender (self-confident yet accommodating)
• Set-up/Try and Check Recording and Storage Equipment: if the recording/storage equipment does not work properly, you have no data; try the equipment in advance, and also right before the interview; check that you have enough storage space; check that the voice can be clearly heard (noise-cancelling microphones)
• Summary and Wrap-up: you should summarise what you have understood from the interviewee, as this normally triggers clarifications and further information; do not stop the recording as soon as the interview is finished (people tend to share relevant information in the informal atmosphere created at the end of the interview)
33. Characteristics of a Successful Interviewer
• Knowledgeable: is thoroughly familiar with the focus of the interview (pilot interviews to become knowledgeable!)
• Structuring: gives purpose for interview; asks whether interviewee has questions.
• Clear: asks simple, easy, short questions; no jargon.
• Gentle: lets people finish; gives them time to think; tolerates pauses.
• Sensitive: listens attentively to what is said and how it is said; is empathetic in dealing with the interviewee.
• Open: responds to what is important to interviewee and is flexible.
• Steering: knows what he or she wants to find out.
• Critical: is prepared to challenge what is said—for example, dealing with inconsistencies in interviewees’ replies.
• Remembering: relates what is said to what has previously been said.
• Interpreting: clarifies and extends meanings of interviewees’ statements, but without imposing meaning on them.
• Balanced: does not talk too much, which may make the interviewee passive, and does not talk too little, which may
result in the interviewee feeling they are not talking along the right lines.
• Ethically sensitive: is sensitive to the ethical dimension of interviewing, ensuring the interviewee appreciates what the
research is about, its purposes, and that his or her answers will be treated confidentially.
• ADAPTABLE: each person is different, and you have to adapt your behaviour…
cf. Alan Bryman, Social Research Methods, 5th Ed. Oxford University Press, 2016
34. Your First Interview: Challenges
• Unexpected interviewee behaviour or environmental problems: expect the unexpected, in terms of what you hear and of noise in the environment (people can become too honest, the place could be loud, people may interrupt your interview)
• Intrusion of your own biases and expectations: be careful not to ask leading questions; do not influence the interviewee
• Maintaining focus: move to the next question only when you are satisfied with the answer to the current one; otherwise ask probing/clarification questions; do not hurry (this is your only chance to get that information)
• Dealing with sensitive issues: sometimes interviewees may get uncomfortable with some questions; be receptive and change topic
• Transcription: be prepared to spend a lot of time (5-6 hours for every 1 hour of interview)
cf. Roulston et al., 2003 https://doi.org/10.1177/1077800403252736
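The transcription estimate above is easy to turn into a planning rule of thumb; the helper below is a hypothetical sketch that simply applies the 5-6x factor from the slide:

```python
# Back-of-the-envelope planning for transcription effort, using the
# 5-6 hours of transcription per 1 hour of interview rule of thumb.
def transcription_hours(interview_hours, factor_low=5, factor_high=6):
    """Return the (low, high) estimate of transcription hours needed."""
    return (interview_hours * factor_low, interview_hours * factor_high)

low, high = transcription_hours(10)  # e.g., ten one-hour interviews
print(f"Expect {low}-{high} hours of transcription work")
```

For a typical study of 20-30 one-hour interviews, this means 100-180 hours of transcription, which is why the effort is so often underestimated.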
35. Formulating Questions
• Order of the questions is crucial, so start with general questions,
separate questions by topic, create a natural flow in the conversation
• Focus your questions on 1. Process, 2. People, 3. Artefacts, and
follow this order, as first you need to understand the process, then
the involved subjects and then what is produced
• Use a language that is understandable to the interviewee (again)
• Do not ask leading questions (again)
• Do not forget to ask and record information of a general kind (name,
age, gender, etc.) and a specific kind (position in company, number of
years employed, number of years involved in a group, etc.)
More information on how to formulate questions when we will discuss questionnaires!
36. Types of Questions
• Introducing questions: Can you tell me how you started working for the company? Can you explain to me what your duties are in the company? Can you tell me when you typically use tool X? (e.g., if the subject is a user)
• Follow-up questions: You mentioned project Y in your last answer. Can you give me more details?
• Probing/Interpreting questions: I have understood that you do not like documenting your code, am I
right?
• Specifying questions: What did you do at that point?
• Direct questions: Are you happy with the current process?
• Indirect questions: What do most people around here think of the ways that management treats the
developers? perhaps followed up by: Is that the way you feel too?
• Structuring questions: I would like to move now to a different topic (not really a question, but well…)
• Silence: When there are silent gaps in an interview, there is a tendency for the interviewer to keep
talking. However, the interviewer should try not to fill the silent gap and let the interviewee talk.
8 Types of Questions, cf. https://www.bbc.co.uk/bitesize/guides/zctwqty/revision/8
37. Topics of Questions in Software Engineering
(Matrix of example questions: Fact vs Opinion, crossed with Process and Tasks, People and Roles, Products and Artefacts)
Fact / Process and Tasks. Structural/Recurring: What are your main duties? How much time does it normally take to finalise the testing? Episodic: Could you tell me about the experience with project X?
Fact / People and Roles. S/R: Who are the people involved in task X? What are their roles? E: Who was involved in project X? In which roles?
Fact / Products and Artefacts. S/R: Which documents are produced in this task? How many, normally? E: How many tests were carried out during project X?
Opinion / Process and Tasks. S/R: What do you like about task T? E: What did you learn during the project X experience?
Opinion / People and Roles. S/R: How do you like your role and position in the organisation? E: Was that treatment fair for the developer in project X?
Opinion / Products and Artefacts. S/R: Is the quality of code normally high in the company? E: Which were the most buggy modules in project X?
Structural/Recurring questions relate to how things normally are; Episodic questions refer to specific experiences
38. Issues with Interviews
• You must be a naturally good interviewer, curious, and a likeable person. You do not learn that…
• Identifying the right people is hard, and confidentiality may prevent them from disclosing useful information
• People do not have time (but if they find the time, they like to talk with someone else…and be understood)
• Technical people like to speak technical language
• A lot of data is produced, and it takes a lot of time to transcribe and to analyse
Yet, interviews are the best tool to get to know people and gather knowledge!
40. Qualitative Data Collection and Qualitative Data Analysis
(Recap of the overview figure: sources of data in the software process are People, Documents, and Systems; Data Collection comprises Inquisitive Techniques (e.g., interviews), Observational Techniques (e.g., observation), and Archival Data Collection (e.g., mining logs, software documentation, e-mails); Data Analysis comprises Grounded Theory, Coding, and Thematic Analysis)
41. Observation Types
• Structured and Systematic Observation.
• Observe participants according to some rules, e.g., related to
time or actions, so that different participants can be compared
in terms of behaviour (e.g., developers vs testers)
• They use an observation schedule (similar to a questionnaire),
e.g., annotate frequency and quality of meetings, annotate
final tasks performed during each day
• Unstructured observation.
• Does not entail the use of an observation schedule for the
recording of behaviour.
• The aim is to record in as much detail as possible the
behaviour of participants with the aim of developing a
narrative account of that behaviour.
Observation is not so common in current SE research
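An observation schedule, as used in structured observation, can be sketched as a simple event log from which participants can later be compared on event frequencies; the categories and participants below are invented for illustration:

```python
from collections import Counter

# Hypothetical observation schedule: each observed event is logged with a
# time offset, participant, category, and optional free-text note.
log = []

def record(minute, participant, category, note=""):
    """Append one scheduled observation to the log."""
    log.append({"minute": minute, "participant": participant,
                "category": category, "note": note})

# Example entries from an observed meeting (invented).
record(5, "dev1", "asks_question")
record(9, "tester1", "reports_bug")
record(12, "dev1", "asks_question", "about the release date")

def frequency_by_participant(entries, category):
    """Compare participants on how often a category of behaviour occurred."""
    return Counter(e["participant"] for e in entries if e["category"] == category)

freq = frequency_by_participant(log, "asks_question")
```

The fixed categories are what make the observation structured: every observer annotates the same kinds of events, so behaviour counts are comparable across participants and sessions.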
42. Observation Types
• Participant observation.
• Prolonged immersion of the observer in a social setting, in which they observe the behaviour of members of that setting (group, organization, community, etc.) and elicit the meanings those members attribute to their environment and behaviour.
• Participant observers vary considerably in how much they participate in the social settings in which they locate themselves (e.g., I am part of the team and participate in code review once in a while, OR I am contributing with a tool and stay there the whole time)
• Non-participant observation. This term describes a situation in which the observer observes but does not participate in what is going on in the social setting
• Simple observation vs contrived observation. With simple observation, the observer has no influence over the situation being observed; in the case of contrived observation, the observer actively alters the situation to observe the effects of an intervention (e.g., introduces a new tool)
43. Participant Observation
The observer participates in the environment
• Covert Full Member: the others do not know that you are a researcher, and you are hired
• Overt Full Member: the others know you do research, but you are also hired by the company (e.g., you are a Ph.D. student but you also work for the company)
• Participating Observer: they know you do research, and you cooperate but not as a full member (e.g., you are a research assistant from a university, temporarily placed in a company)
• Partially Participating: the observer partially participates in the activity; observation is not the main source of data, and you also use interviews and document analysis
• Non-participating (with interactions): minimal observation, contact through interviews and document analysis
44. Field Notes
• Mental Notes: when it is not appropriate to be seen taking notes
(e.g., relaxed environments such as coffee-breaks)
• Jotted Notes/Scratch Notes: when you can write unseen, but
don’t have much time. Use a paper notebook.
• Audio Notes: to be recorded when you want to reflect on
something, may be useful to share them with another individual
(e.g., through Whatsapp voice messages).
• Full-field Notes: detailed notes, made as soon as possible, which
will be your main data source. They should be written at the end
of the day or sooner if possible. They are similar to a diary.
Doing ethnography means relying on field notes
What should they contain? Any impression or fact, they should be DENSE DIARIES
45. 5 Dimensions of Observational Studies
cf. Sharp et al. 2016, http://dx.doi.org/doi:10.1109/TSE.2016.2519887
1. Degree of Participation (participant vs non-participant)
2. Duration of the Study (and frequency of presence)
3. Space and Location (distributed or local SE)
4. Focus (people, relationships, activities, artefacts, information)
5. Goal of the Researcher (improve, understand, solve)
46. Observation Process
(Process figure)
Design: Research Questions → Focus Definition → Observer Role Definition → Company Identification → Solve Bureaucratic Issues → Timeline Definition
Execution: Observe / Participate → Take Notes → Analyse/Rework Notes → Observation Data
Get to know what you can and cannot publish as early as possible (and check whether the company name can be disclosed)
Analyse/rework your notes every day
47. Checklist for Observational Studies
cf. Zhang et al. 2019, https://doi.org/10.1145/3338906.3338976
Design Phase
1.What organizations or teams will you study? What environment do they have? Why do you study
them?
Describe the research object (e.g., what kind of culture the organization claims it has, the ongoing
software projects in the organization, who is involved in the organization and what links do they have
outside the organization)
2.What things and who will you focus on during your study?
State the key roles (e.g., Project Leader, Consultant) you are studying in the organization
3. How much do you know about the organization before your study? How much effort will you spend on learning about the organization?
State the way you get knowledge of the organization (e.g., by official documents, your network, others’ introductions)
4. How long will your study last? Is it enough?
State the duration of your study. 8 months is the average duration in SE. If your duration is shorter, study more examples (e.g., different projects of the same team)
valid for any Ethnographic study
48. Checklist for Observational Studies in SE
cf. Zhang et al. 2019, https://doi.org/10.1145/3338906.3338976
Execution Phase
1. How will you enter the organization (e.g., introduced by a member, on your own)? Will your
entering disturb others’ normal work? If so, how big is the impact?
State the way you enter the organization, and analyze the effect. If you become a member of the project
you study, describe your contribution to the project.
2. Who will collect and analyze the data, one researcher or more? If the latter, who will do what,
and how can their work be coordinated?
Detailed instructions are needed in ethnographic research. The participation of researchers in the
project will affect data collection.
3. What data will be collected (e.g., the recording of interviews, the application log, daily
documents, videos of meetings)? How and when will the data be collected?
Describe the data collection methods (e.g., interview, participant-observation, questionnaires).
If interviews were used, the recordings should be transcribed, and the speech rate, tone, emotion, and
background of the interviewee should be recorded.
If participant-observation was used, every detail of the participant’s daily life should be recorded (e.g.,
when and where an observation began and ended).
49. Checklist for Observational Studies in SE
cf. Zhang et al. 2019, https://doi.org/10.1145/3338906.3338976
Execution Phase
4. How many aspects of the organization can your data show? What are they and
what is their meaning?
Describe how your data reflect the organization and explain the meaning of the data.
Triangulation is an important strategy in traditional ethnography. You need to find as
many aspects as possible to understand more completely the part a member plays in
software projects.
5. Will you put your own experience into the analysis? Will you be biased when
analyzing the data?
If you are a software engineer at the same time as an ethnographer, you may be biased
with respect to some data (e.g., missing some important details).
State what may influence your analysis and give an explanation.
50. Issues with Observations
• Observational studies take A LOT of time
• Most of the time, you have to do the work for the company, and report
on your experience (so two jobs, basically)
• Observational studies are unavoidably biased and subjective
• Observational studies tend to produce THICK amounts of data, and are
hard to report in a paper —more suitable for books
• There is no accepted standard for reporting observational studies in
software engineering
• Many things may be confidential and you may not publish them!
Tip: always combine observations with interviews
Tip: possibly include analysis of archival data (if they allow you access)
52. Qualitative Data Collection and Qualitative Data Analysis
Sources of Data: People, Documents, Systems, Software Process
Data Collection:
• Inquisitive Techniques (e.g., interviews)
• Observational Techniques (e.g., observation)
• Archival Data Collection (e.g., mining logs, software documentation, e-mails)
Data Analysis:
• Grounded Theory
• Coding
• Thematic Analysis
53. Archival Data Types
• Archival data can be qualitative (documentation, code
comments, social media information, app reviews) and
quantitative (e.g., number of commits in repository, time spent
in a task, etc.)
• Archival data may also include diagrams (e.g., models)
54. Archival Data Types
• Official documents produced by the SE process: Business Requirements Specification, System Requirements Specification, System Design, Requirements Reviews, Test Reports
• Internal data supporting the SE process: E-mails, Issues and Bug Reports, User Manuals, Code comments
• Data related to the SE process: Tweets, App Reviews, StackOverflow
The first two kinds are typically subject to in-depth qualitative analysis; the last kind is typically subject to simple classification.
55. Examples: Requirements
• User Story (one sentence, high-level): As a user, I want to share pictures, so that my friends will see them
• One sentence, low-level: If track data at least to the location where the relevant MA ends are not available on-board, the MA shall be rejected; When MA_received = FALSE and T_speed > 0 and MA_time > 15, then T_brake = 1
• Unstructured: The voucher numbers are system generated and created with unique identification numbers with security protocols in-built. The created unique numbers are then printed out in the form of bar-codes, which will complement (or stuck on the voucher) the voucher. […]
• Structured (Use Case):
Actor: Student
Success Scenario:
1. Student selects “List”
2. System displays available courses
3. Student selects one of the courses
56. Examples: Bug reports and
Feedback
It would be nice to have a way to search
my previous messages by keyword
User’s Feedback
Application does not create a new item when clicking the
SAVE button while creating a new item. Steps to
reproduce:
1) Login into the application
2) Pressed button New Item
3) Filled the information for the new item
4) Clicked on Save button
5) Seen an error page “ADA121 Exception: value error”
Bug Report
Types of archival data to consider depend on the context
(process, domain, task, company size, etc.)!
58. Archival Data Collection Process
Design: Research Questions → Definition of Data Types → Companies or Tools Identification → Solve Bureaucratic Issues → Preliminary Analysis → Selection of a Representative Sample
Execution → Selected Data
Share Data (if possible) on Github or Zenodo, but ANONYMISE them*!
NOTE: the fact that data are PUBLIC or CONFIDENTIAL is crucial!
In field studies these data are often connected with other data (observation, interviews)
*cf. Peters and Menzies, 2012 http://menzies.us/pdf/12privacy.pdf
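The anonymisation step can be sketched in code. This is a minimal illustration, not from the slides: it assumes e-mail addresses are the identifying field to hide, and replaces each one with a stable pseudonym derived from a salted hash, so the same author keeps the same pseudonym across the shared dataset. The salt, the regex, and the example message are all hypothetical.

```python
import hashlib
import re

SALT = b"project-specific-secret"  # hypothetical salt; keep it out of the shared data
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def pseudonym(email):
    # Stable pseudonym: the same address always maps to the same token,
    # so authorship structure is preserved without exposing identities.
    digest = hashlib.sha256(SALT + email.lower().encode()).hexdigest()
    return "user_" + digest[:8]

def anonymise(text):
    # Replace every e-mail address in the text with its pseudonym.
    return EMAIL_RE.sub(lambda m: pseudonym(m.group()), text)

msg = "Mail from jane.doe@example.com about the failing build"
print(anonymise(msg))
```

Note that hashing alone does not guarantee privacy for quasi-identifiers (role, team size, dates); cf. Peters and Menzies, 2012 for stronger approaches.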
59. Issues with Archival Data
• Official Documentation and Internal Data:
• Documentation and internal data may not be updated with the software (e.g., comments not
updated, test cases not updated)
• Documentation may not be understandable without other documents, without the authors, or
without a clear picture of the overall process (e.g., system requirements not understandable
without user-level requirements)
• Documentation may not exist, and you may have to elicit information from the code itself or from
the system itself (by trying it out!)
• In paper-rich projects, it may be hard to make sense of how certain documents are used and
what their role in the process is
• If you compare documents from different nations, they may be written in different languages
• Data related to the software process:
• A lot of data and potentially noisy data (e.g., typos, slang, errors)
• Relevant data may be limited (e.g., most of the app reviews are not informative)
• These data are normally classified, and used to train machine learning algorithms to automate the classification
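The classification of process-related data can be illustrated with a deliberately naive keyword baseline. The keyword lists below are invented for illustration; an actual study would label a sample manually and train a supervised classifier instead.

```python
# Hypothetical keyword lists for a baseline app-review classifier.
BUG_WORDS = {"crash", "error", "bug", "freeze"}
FEATURE_WORDS = {"wish", "should", "would be nice"}

def classify_review(text):
    """Classify an app review as bug report, feature request, or non-informative."""
    t = text.lower()
    if any(w in t for w in BUG_WORDS):
        return "bug report"
    if any(w in t for w in FEATURE_WORDS):
        return "feature request"
    return "non-informative"

print(classify_review("App crashes when I press Save"))          # bug report
print(classify_review("It would be nice to search by keyword"))  # feature request
print(classify_review("Great app, five stars!"))                 # non-informative
```

Such a baseline also shows why "relevant data may be limited": most reviews fall in the non-informative bucket.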
61. Qualitative Data (Observations, Interviews, etc.)
→ Coding (i.e., assign labels/tags to relevant text), e.g., [happy with boss], [unhappy with colleagues]
→ Thematic Analysis (identify patterns and abstractions in codes), e.g., Causes of Happiness, Causes of Unhappiness
This is all Grounded Theory
64. Qualitative Data (Observations, Interviews, etc.)
→ Coding (i.e., assign labels/tags to relevant text), e.g., [happy with boss], [unhappy with colleagues], [unhappy with type of work]
→ Thematic Analysis (identify patterns and abstractions in codes), e.g., Causes of Happiness, Causes of Unhappiness
+ Constant Comparison (compare abstractions and patterns with data)
+ Theoretical Sampling (search for additional data sources based on theory)
→ Theoretical Saturation → Theory
This is all Grounded Theory
65. • Grounded Theory is a systematic technique to support induction of a theory from
qualitative data
• Normally Grounded Theory and Thematic Analysis are treated as separate
techniques, mostly for historical reasons
• For our purposes, Grounded Theory is a framework that includes Thematic Analysis,
which makes use of Coding, i.e., labelling chunks of qualitative data
• The Grounded Theory process starts with the initial data collected, which can be ANY
type of qualitative data, the researcher critically reads the data, adds labels/memos
(with NVivo or Excel), creates higher level categories, and searches for patterns within
the data (i.e., recurrent themes and relations among themes)
• The Grounded Theory process requires:
• Constant Comparison with the data (to check that the theory is in line with the data)
• Theoretical Sampling (sampling of new subjects based on the current theory, to get
additional data)
• and is based on Theoretical Saturation (you stop when you feel nothing new can
be discovered)
Grounded Theory, Thematic Analysis and Coding
Grounded theory is glorified abstraction from data…
67. Coding
• A code in qualitative inquiry is a word or short phrase that symbolically
assigns a summative, salient, essence capturing, and/or evocative
attribute for a portion of data
• Open Coding: codes are not pre-defined, and are “invented” by the
researcher. Starting point for Grounded Theory.
• Closed Coding: codes are pre-defined, e.g., based on existing literature
or based on some consolidated codes derived from a previous activity of
open coding (similar to classification)
• Coding is an iterative activity, in which codes change names and are
linked to one another
• Coding is not just labelling text: it is abstracting and understanding
68. Coding Phases (in Grounded Theory
Terminology): Open, Axial, Selective
• Open Coding: from Data to Concepts and Categories (Hierarchical Categorisation)
• Axial Coding (aka Thematic Analysis): Relationships between Categories
• Selective Coding (aka Theory Generation): Central Category → Theory
Constant Comparison with the data, and Memos, accompany every phase
69. Memos
• Memos (or Analytic Memos) are just annotations associated
with reflections that you make while you read and code
• They are useful to pass from one coding stage to the
next, since, as you code, the memos can lead you to
more structured and abstract reasoning
• Just write down what you think, and link it to the chunks
of text that triggered a specific reasoning
70. Open Coding
• Open coding is oriented to identify initial concepts, and to group concepts into categories
• Types of Open Coding:
• Descriptive Coding: identify the topic of the data, normally nouns. “I lose sense of time
when programming”. Possible code: [sense of time]
• Analytic Coding: refer to higher abstractions, deriving from the researcher’s reflections,
normally nouns. “I lose sense of time when programming”. Possible code: [engagement]
• Process Coding: refer to the process and actions performed, normally ending in -ing.
“I lose sense of time when programming”. Possible code: [programming]
• Techniques to Start:
• In Vivo Coding: use the terms that appear in the data (e.g., [sense of time] above)
• Line-by-line Coding: assign a code to each line, regardless of relevance
• Sentence Coding: highlight the sentences that appear relevant and assign a code only to
them.
Tip: do sentence coding, start with descriptive and process coding, then analytic
71. Open Coding in SE: What Should I Look at?
• practices (daily routines, occupational tasks, etc.)
• roles (tester, developer, manager, etc.) and social types
(bully, geek, kind, etc.)
• artefacts (code, tests, documentation, etc.)
• tools (software, hardware, etc.)
• social and personal relationships (friend, relative, boss,
etc.)
• groups and cliques (young developers, testers, etc.)
• organizations (suppliers, customers, etc.)
• spaces (offices, virtual spaces, distributed, etc.)
• episodes (unanticipated or irregular activities such as
delays, bugs, unexpected failures of the system, etc.)
• encounters (a temporary interaction between two or more
individuals such as a specific customer or manager, etc.)
Units of Social Organisation in SE can be looked at through different Perspectives:
• Factual: units as they are or happen
• Cognitive: units as they are interpreted
• Emotional: units as they are perceived
• Learning: units as they are learned
• Relational: units as they are related (according to some aspects)
A unit looked at through a Perspective generates a possible concept (basic code)
72. Open Coding: from Concepts to Categories
Quotes from the data:
• I lose sense of time when programming
• When the program seems to work and it is late afternoon, I force myself to stop and go out
• When I get stuck with a bug and cannot solve it during the day, I often wake up at night with some idea
• When I have to solve a bug I forget to eat
CONCEPTS: sense of time, engagement, programming, debugging, torment, go out, sleeping, eating
CATEGORIES:
• Activity: programming, debugging, sleeping, eating, go out
• Feeling: engagement, torment
73. Axial Coding (aka Thematic
Analysis)
• Axial coding is oriented to identify a graph of categories and concepts
• In this context, axial coding is Thematic Analysis
• Multiple links can be created between categories and concepts
• Link by similarity
• Link by hierarchy
• Link by causal relationship
• Introduce new codes and categories, if needed
• Always compare the graph with your data
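The graph of categories and concepts can be sketched as a list of (source, relation, target) triples, one relation name per link type. The data are hypothetical, taken from the running happiness example.

```python
# Axial-coding graph as triples; relation names follow the three link types:
# hierarchy (concept belongs to category), similarity, causal.
edges = [
    ("programming", "hierarchy", "Activity"),
    ("debugging", "hierarchy", "Activity"),
    ("engagement", "hierarchy", "Feeling"),
    ("torment", "hierarchy", "Feeling"),
    ("Activity", "causal", "Feeling"),   # activities cause feelings
]

def linked(graph, target, relation):
    """Sources linked to `target` by the given relation type."""
    return [s for s, r, t in graph if r == relation and t == target]

print(linked(edges, "Activity", "hierarchy"))  # ['programming', 'debugging']
print(linked(edges, "Feeling", "causal"))      # ['Activity']
```

Keeping the graph explicit makes the "always compare with your data" step concrete: every edge should be traceable back to coded quotes.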
74. Axial Coding (aka Thematic Analysis):
From Categories to Relational Graph
• Activity: programming, debugging (new sub-category: Work); sleeping, eating, go out (new sub-category: Life), linked by hierarchical relationships
• Feeling: engagement, torment
• Causal relationship: Activity causes Feeling
Build the graph and then search for these categories and relationships in the text,
and find additional concepts, categories and relationships:
“During the meetings, I feel the urge to say my viewpoint, but in the end I don’t.
Leaving things as they are makes me frustrated” → new concepts: [meeting], [frustration]
75. Selective Coding
(aka Theory Generation)
• Selective coding simply consists in finding the message that groups all the other categories: it can be a
single word or concept that characterises all the data, a title for your essay (e.g., The
Cycle of Developer's Emotions)
• The resulting hypothesis is expressed as a sentence:
Developers’ working activities have an emotional impact that has
consequences on daily-life activities
• Then I collect other data, and verify that the data are in agreement with the hypothesis
• My hypothesis becomes my substantive theory — or gets adjusted based on
additional evidence found in the data
• If I apply my theory to other settings (e.g., considering managers instead of
developers), and see that it applies (the data are in agreement with the theory), it
becomes a formal theory
aka Theoretical Coding
76. Overall Process, as Grounded Theory “should” Be
Research Question → Theoretical Sampling → Collect Data → (Open) Coding → Concepts → Categories (with Constant Comparison, until you Saturate Categories)
→ Explore Category Relations & Hierarchy → Theoretical Sampling → Collect Data → Saturate Category Relations & Hierarchy → Category Relations & Hierarchy
→ Hypothesis → Substantive Theory → Collect and Analyse Data in other Settings (aka Hypothesis Test) → Formal Theory
77. Did I finish? Saturation and Practical
Techniques for Theory Elicitation
• Each coding step ends when you feel that you have reached some form of saturation with
respect to 1. your codes; 2. what you can find in the data; 3. what you can gather from the
data sources (people, documents).
• Practical Techniques to elicit a theory that does not want to emerge:
1. Top 10 list: print 10 relevant quotes from your data, and combine them in different ways:
chronologically, hierarchically, episodically, narratively, from the expository to the climactic,
from the mundane to the insightful, from the smallest detail to the bigger picture, etc.
2. Trinity test: take the three most relevant categories, and find the dominant one, or
relationships among them
3. Touch Test: if you can touch it, it is not abstract enough (you can touch a programmer but
you cannot touch “software development”) so you have to go on. If all your categories are
abstract enough, then do 1 or 2.
4. Code weaving: take your codes (categories and concepts) and combine them in full
sentences. Pick the sentences that make the most sense to you.
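One possible operationalisation of the saturation criterion above: stop when the last k analysed items (interviews, documents) have yielded no code that was not already known. Both the threshold k and the interview data are hypothetical; in practice saturation remains a researcher's judgement, not a formula.

```python
def saturated(codes_per_item, k=3):
    """True if none of the last k analysed items introduced a new code."""
    if len(codes_per_item) < k:
        return False
    seen, new_flags = set(), []
    for codes in codes_per_item:
        new_flags.append(bool(set(codes) - seen))  # did this item add anything?
        seen |= set(codes)
    return not any(new_flags[-k:])

history = [
    {"engagement", "torment"},   # interview 1: both codes are new
    {"engagement", "go out"},    # interview 2: "go out" is new
    {"torment", "go out"},       # interview 3: nothing new
    {"engagement"},              # interview 4: nothing new
    {"torment", "engagement"},   # interview 5: nothing new
]
print(saturated(history))        # True: the last 3 interviews added no new code
```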
78. Tips: Codebook
• When you code, you are advised to have a codebook, in which your codes are defined and
detailed. Each item includes the following:
1. short description – the name of the code itself
2. detailed description – a 1–3 sentence description of the coded datum’s qualities or properties
3. inclusion criteria – conditions of the datum or phenomenon that merit the code
4. exclusion criteria – exceptions or particular instances of the datum or phenomenon that
do not merit the code
5. typical exemplars – a few examples of data that best represent the code
6. atypical exemplars – extreme or special examples of data that still represent the code
7. “close, but no” – data examples that could mistakenly be assigned this particular code
cf. Saldana. The coding manual for qualitative researchers. Sage, 2015.
This can be useful for reporting!
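The seven codebook fields map naturally onto a small data structure. The example entry is hypothetical, built on the running engagement example from the coding slides.

```python
from dataclasses import dataclass, field

@dataclass
class CodebookEntry:
    short_description: str                 # 1. the name of the code itself
    detailed_description: str              # 2. qualities/properties of the coded datum
    inclusion_criteria: str                # 3. conditions that merit the code
    exclusion_criteria: str                # 4. instances that do not merit it
    typical_exemplars: list = field(default_factory=list)   # 5.
    atypical_exemplars: list = field(default_factory=list)  # 6.
    close_but_no: list = field(default_factory=list)        # 7. easily confused data

engagement = CodebookEntry(
    short_description="engagement",
    detailed_description="Deep absorption in a development activity, "
                         "losing track of time or bodily needs.",
    inclusion_criteria="Statements about absorption during work tasks.",
    exclusion_criteria="Mere satisfaction with the job, without absorption.",
    typical_exemplars=["I lose sense of time when programming"],
    atypical_exemplars=["I often wake up at night with some idea"],
    close_but_no=["I like my job"],  # satisfaction, not engagement
)

codebook = {engagement.short_description: engagement}
print(list(codebook))  # ['engagement']
```

Serialising such entries (e.g., to JSON or a table) gives you the codebook appendix directly.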
79. Tips: Qualities of the Coder
• Organisation: following all the process and the large amount of data
is impossible if you are not organised
• Perseverance: it can take a huge amount of time, and you may feel
lost, especially if you are organised and a control freak
• Deal with Ambiguity: you have to accept that reality is shaded
• Flexibility: your codes will change, your theory will change
• Creativity: your theory must say something that is not obvious, you
must be able to abstract concepts from reality
• Honesty: with the people and their data
cf. Saldana. The coding manual for qualitative researchers. Sage, 2015.
80. Threats to Validity in
Qualitative Research
Valid for Interviews and Observations
81. Ensuring Quality of
Qualitative Research
• Before going into the details of the (multiple) criteria
to assess validity for qualitative research, let us see
a few techniques to ensure general quality:
• Respondent Validation/Member Checking: check
with your subjects that they agree with your findings
(they may have defensive reactions/censorship,
they may agree out of respect, they may not
understand…)
• Triangulation: look at different data (interviews AND
observations), involve other subjects to cross-check
That’s it…you cannot do much more
83. Main Quality Criteria 1:
Trustworthiness
• Credibility: did I really understand the context? —
mitigation: triangulation and respondent validation
• Transferability: to what extent can the findings be
extended to other contexts? —mitigation: thick
characterisation of the context’s features
• Dependability: can my research be assessed? —
mitigation: external peer-audit
• Confirmability: is it evident that I acted without bias? —
mitigation: external peer-audit
84. Main Quality Criteria 2:
Authenticity
• Fairness: Does the research fairly represent different viewpoints among
members of the context? (e.g., did I consider developers and testers?)
• Ontological authenticity: Does the research help members to arrive at a
better understanding of their context? (e.g., are they surprised by the findings?)
• Educative authenticity: Does the research help members to better appreciate
the perspectives of other members of their context?
• Catalytic authenticity: Has the research acted as an impetus to members to
engage in action to change their circumstances? (e.g., thinking about
improving relationships)
• Tactical authenticity: Has the research empowered members to take the
steps necessary for engaging in action? (e.g., improve relationships)
Focus on members/participants
85. Checklist for Evaluating Qualitative Research
1. How credible are the findings?
2. Has knowledge/understanding been extended by the research?
3. How well does the evaluation address its original aims and purposes?
4. Scope for drawing wider inference—how well is this explained?
5. How clear is the basis of the evaluative appraisal?
6. How defensible is the research design?
7. How well defended is the sample design/target selection of cases/documents?
8. Sample composition/case inclusion—how well is the eventual coverage described?
9. How well was the data collection carried out?
10. How well has the approach to, and formulation of, the analysis been conveyed?
11. Contexts of data sources—how well are they retained and portrayed?
12. How well has diversity of perspective and content been explored?
13. How well has detail, depth and complexity (richness?) of the data been conveyed?
14. How clear are the links between data, interpretation and conclusions?
15. How clear and coherent is the reporting?
16. How clear are the assumptions/theoretical perspectives/values that have shaped the evaluation?
17. What evidence is there of attention to ethical issues?
18. How adequately has the research process been documented?
cf. Alan Bryman, Social Research Methods, 5th Ed. Oxford University Press, 2016
Use it to judge your report!
86. Threats to Validity
• Reliability: how consistent and verifiable are the findings? Is
the link between the data and the theory clear?
• Validity: how appropriate is your overall research design in terms
of tools, process and data? Have you spent enough time in the
company? Have you interviewed the right roles and the right
people? How did you guarantee that? Have you performed
triangulation and member checking?
• Generalisability (aka external validity): to what extent are the
findings applicable to other settings? One normally needs to
explain the salient characteristics of the context (e.g.,
nationality, number of employees, domain) and infer which
contexts may be similar.
…what you should write in a paper
cf. Leung, 2015. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4535087/
87. Threats to Validity
The research process spans PREPARATION (Theory, Research Question, Research Design), EXECUTION (Recruit Participants / Select Artifacts, Collect Data, Analyse Data) and REPORTING (Report Answers); Validity, Reliability and Generalisability are threats across these phases
88. Example of a Theory
- RQ1: Which aspects of the current way of working with requirements impact
development speed?
- RQ2: Which new aspects should be considered when defining a new way
of working with requirements to increase development speed?
- RQ3: To what extent will either aspect be addressed through the ongoing
agile transformation?
Goal: Study on the impact of requirements practices on development speed
Research Questions
They performed 30 interviews with managers and technical experts
cf. Ågren et al., 2019 https://doi.org/10.1007/s00766-019-00319-8
89. Example of a Theory (Requirements Engineering (2019) 24:315–340)
Fig. 1 Causal relations between concepts. Dashed line indicates which aspects will likely be addressed through the agile transformation (RQ3),
gray box lists additional concepts from the second round of interviews
cf. Ågren et al., 2019 https://doi.org/10.1007/s00766-019-00319-8
91. Showing Evidence: Example
5.1 RE style dominated by safety and legal concerns
Automotive systems are inherently safety-critical, not least because of how
they are perceived by customers and users:
“That’s something that can be perceived as very frightening for the customers and
also be dangerous if you just out of the blue suddenly brake the car.” – R6
“We have product liability, legal requirements, documentation obligations. If
something happens—if someone crashes and the airbag doesn’t deploy—in
accordance with which requirements have we developed, in accordance with
which requirements have we tested and verified and so on for our product
liability.” – R3
In the results, report those quotes from your data
that are linked to certain contexts (“R3” stands for Respondent 3)
92. Summary
• Qualitative studies in software engineering are useful to identify
human-related and social aspects, as well as opinions
• Useful when your research is at an exploratory stage
• They can be used in different research strategies
• Data collection strategies are Interviews, Observations and
Archival Data collection
• Data Analysis is based on coding, thematic analysis and grounded
theory
• Often performed in the context of case studies (field studies, field
experiments)