This document discusses the need to study data science as a discipline by examining its processes, techniques, and outputs. It presents data science as consisting of iterative steps like forming hypotheses, collecting and analyzing data, and extracting results. Ontologies and platforms are proposed as tools to systematically describe datasets, licenses, models, and tasks. Case studies examine modeling data flows and understanding patterns in large data science systems. The document argues for an interdisciplinary approach, and for techniques such as science-fiction scenario writing, to ensure that data science is developed and applied responsibly, with its social and ethical implications considered.
Data-View of Data Science Process & Need for Multidisciplinary Study
1. A data-view of
the data science
process
Mathieu d’Aquin - @mdaquin
Data Science Institute
Insight Centre for Data Analytics
NUI Galway
Why am I talking to you about
?
As in Biology? Simplifying, the observation of naturally occurring phenomena and principles in relation to data?
As in Physics? Again simplifying, the theorisation and experimental verification of fundamental laws of data?
As in Social Sciences? Really simplifying, the investigation of the social, economic or cultural implications of data on individuals, groups and society?
19. Example: Describing a data process with ontologies (The Datanode ontology - E. Daga)
A vocabulary to describe the relationships between input data set, intermediary data assets and the outputs of a data process.
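As a rough illustration of what such a description makes possible, the sketch below records a small data process as (source, relation, target) triples and derives which assets are inputs, intermediaries and outputs. The file names and relation names (`cleanedInto`, `aggregatedInto`, `visualisedAs`) are hypothetical and chosen only for this example; they are not taken from the Datanode ontology.

```python
from collections import defaultdict

# Hypothetical data process, written as (source, relation, target) triples.
# The relation names are illustrative assumptions, NOT Datanode terms.
triples = [
    ("raw_survey.csv", "cleanedInto", "clean_survey.csv"),
    ("clean_survey.csv", "aggregatedInto", "region_stats.csv"),
    ("postcode_lookup.csv", "aggregatedInto", "region_stats.csv"),
    ("region_stats.csv", "visualisedAs", "report_chart.png"),
]


def classify_nodes(triples):
    """Split data assets into inputs, intermediaries and outputs."""
    sources = {s for s, _, _ in triples}
    targets = {t for _, _, t in triples}
    return {
        "inputs": sorted(sources - targets),          # never produced by a step
        "intermediaries": sorted(sources & targets),  # produced, then reused
        "outputs": sorted(targets - sources),         # never consumed again
    }


def upstream(node, triples):
    """Return every asset that the given node was derived from."""
    parents = defaultdict(set)
    for source, _, target in triples:
        parents[target].add(source)
    seen, stack = set(), [node]
    while stack:
        current = stack.pop()
        for parent in parents[current]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return sorted(seen)


roles = classify_nodes(triples)
lineage = upstream("report_chart.png", triples)
```

With a description of this kind, a question such as "which inputs does this output depend on?" becomes a graph traversal, which is the sort of reasoning a shared vocabulary is meant to support.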
27. Thousands of datasets used in thousands of data science processes.
Allows us to better understand the tasks of data science, how they occur, in what contexts…
As well as what characteristics of datasets lead to what use in data science processes.
28. Data Ethics
Hypo. / Question → Plan → Collect data → Analyse data → Extract results → Exploit results
Where ethical implications are (might be) considered
Where they are important
29. Towards a methodology for Ethics by Design in Data Science (with P. Troullinou)
‘Ethics by Design’ for Data Science
Dialectic
The process is based on a conversational approach between data scientists and critical social scientists throughout the project’s life-cycle.
Reflective
Ethical concerns are not pre-fixed; they may emanate from any stage of the project; thus, constant reflexivity on activities and researchers is needed.
Creative, not disruptive
The objective of this process is to achieve a positive impact on the research and increase its value by addressing ethics throughout the project’s life-cycle.
All-encompassing
Ethical concerns appear as much in the research activities as in their outcomes, their use and exploitation; the process needs to extend to all stages.
30. Using science fiction to guide ethical thinking
Axes: used/controlled by a small number of individuals vs. used/controlled by all; used accurately according to intended purpose vs. hacked, biased, inaccurate
Episodes positioned along these axes: S1E3: The Entire History of You; S2E1: Be Right Back; S3E1: Nosedive; S3E2: Playtest; S3E5: Men Against Fire; S3E6: Hated in the Nation; S4E2: Arkangel; S4E3: Crocodile; S4E5: Metalhead
31. Using science fiction to guide ethical thinking
Write scenarios, short stories, based on the following four premises: In a near future, what I am developing/the results I will obtain will be...
Used as intended by millions/most people/many people
Used as intended by a small group with control/power
Abused, hacked, inaccurate or biased, while used by millions/most people/many people
Abused, hacked, inaccurate or biased, while used by a small group with control/power
What could possibly go wrong?
(see Re-coding Black Mirror workshops)
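The four premises above are the cross product of two questions: how the result is used, and by whom. A minimal sketch of turning them into writing prompts for a given project follows; the function name `scenario_prompts` and the example project description are hypothetical, introduced here only for illustration.

```python
from itertools import product

# The two axes behind the four premises, worded as on the slide.
USAGE = ["used as intended", "abused, hacked, inaccurate or biased, while used"]
USERS = ["millions/most people/many people", "a small group with control/power"]


def scenario_prompts(project):
    """Build one writing prompt per (usage, users) combination."""
    prompts = []
    for usage, users in product(USAGE, USERS):
        prompts.append(
            f"In a near future, {project} will be {usage} by {users}. "
            "What could possibly go wrong?"
        )
    return prompts


# Hypothetical project description, used only as an example.
prompts = scenario_prompts("a model predicting student drop-out")
for prompt in prompts:
    print(prompt)
```

Each prompt can then serve as the starting point for one short story or scenario.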
32. Conclusion
Data Science has grown very quickly as a discipline, reaching huge economic and societal impact. And it is not stopping.
This is leading to the creation of a very large number of datasets, techniques, tools, models, approaches and methods that are driven by practices and applications in various domains.
The study of those artefacts is becoming critical, to extract the fundamental principles that guide data science as a discipline and a process. Understanding those principles is essential to drive the impact of data science in an informed way.
Data science practice can support data science theory, but this is not a job for the data/computer scientist alone. It needs to be a conversation with social scientists, business experts, legal experts...