The document discusses big data and analytics in three concise summaries:
1. It defines big data using the "five Vs" - volume, variety, velocity, veracity, and value. It explains that volume is less important than variety, velocity, and veracity (data quality). The goal of big data is to enable learning from data through analytics.
2. It distinguishes data science from statistics, noting that data science typically analyzes secondary data using inductive reasoning to generate hypotheses, while statistics analyzes primary data using deductive reasoning to test hypotheses. Both approaches are needed for proper data-driven decision making.
3. It explains that analytics aims to describe what happened (descriptive),
Big Data and Analytics Fuel Digital Transformation
1. Big Data as the Fuel and
Analytics as the Engine of the
Digital Transformation
Prof. Dr. Diego Kuonen, CStat PStat CSci
Statoo Consulting, Berne, Switzerland
@DiegoKuonen + kuonen@statoo.com + www.statoo.info
āInformation Builders Think Tank Lunchā, Zurich, Switzerland ā June 13, 2017
2. About myself (about.me/DiegoKuonen)
PhD in Statistics, Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland.
MSc in Mathematics, EPFL, Lausanne, Switzerland.
ā¢ CStat (āChartered Statisticianā), Royal Statistical Society, UK.
ā¢ PStat (āAccredited Professional Statisticianā), American Statistical Association, USA.
ā¢ CSci (āChartered Scientistā), Science Council, UK.
ā¢ Elected Member, International Statistical Institute, NL.
ā¢ Senior Member, American Society for Quality, USA.
ā¢ President of the Swiss Statistical Society (2009-2015).
Founder, CEO & CAO, Statoo Consulting, Switzerland (since 2001).
Professor of Data Science, Research Center for Statistics (RCS), Geneva School of Economics
and Management (GSEM), University of Geneva, Switzerland (since 2016).
Founding Director of GSEMās new MSc in Business Analytics program (starting fall 2017).
Principal Scientiļ¬c and Strategic Big Data Analytics Advisor for the Directorate and Board of
Management, Swiss Federal Statistical Oļ¬ce (FSO), NeuchĖatel, Switzerland (since 2016).
Copyright c 2001ā2017, Statoo Consulting, Switzerland. All rights reserved.
2
3.
4. About Statoo Consulting (www.statoo.info)
ā¢ Founded Statoo Consulting in 2001.
2017 ā 2001 = 16 + .
ā¢ Statoo Consulting is a software-vendor independent Swiss consulting ļ¬rm
specialised in statistical consulting and training, data analysis, data mining
(data science) and big data analytics services.
ā¢ Statoo Consulting oļ¬ers consulting and training in statistical thinking, statistics,
data mining and big data analytics in English, French and German.
Are you drowning in uncertainty and starving for knowledge?
Have you ever been Statooed?
5.
6. Contents
Contents 6
1. Demystifying the ābig dataā hype 8
2. Demystifying the two approaches of ālearning from dataā 16
3. Questions āanalyticsā tries to answer 23
4. Demystifying the āmachine intelligence and learningā hype 26
5. Conclusion and key principles for success 31
Copyright c 2001ā2017, Statoo Consulting, Switzerland. All rights reserved.
6
7. āData is arguably the most important natural
resource of this century. ... Big data is big news just
about everywhere you go these days. Here in Texas,
everything is big, so we just call it data.ā
Michael Dell, 2014
Copyright c 2001ā2017, Statoo Consulting, Switzerland. All rights reserved.
7
8. 1. Demystifying the ābig dataā hype
ā¢ āBig dataā have hit the business, government and scientific sectors.
The term ābig dataā ā coined in 1997 by two researchers at the NASA ā has
acquired the trappings of religion.
ā¢ But, what exactly are ābig dataā?
The term ābig dataā applies to an accumulation of data that can not be
processed or handled using traditional data management processes or tools.
Big data are a data management infrastructure which should ensure that the
underlying hardware, software and architecture have the ability to enable ālearning
from dataā, i.e. āanalyticsā.
Copyright c 2001ā2017, Statoo Consulting, Switzerland. All rights reserved.
8
9. ā¢ The following characteristics ā āthe four Vsā ā provide a deļ¬nition:
ā āVolumeā : ādata at restā, i.e. the amount of data ( ādata explosion problemā),
with respect to the number of observations ( āsizeā of the data), but also with
respect to the number of variables ( ādimensionalityā of the data);
ā āVarietyā : ādata in many formsā, āmixed dataā or ābroad dataā, i.e. diļ¬erent
types of data (e.g. structured, semi-structured and unstructured, e.g. log ļ¬les,
text, web or multimedia data such as images, videos, audio), data sources (e.g.
internal, external, open, public), data resolutions (e.g. measurement scales and
aggregation levels) and data granularities;
ā āVelocityā : ādata in motionā or āfast dataā, i.e. the speed by which data are
generated and need to be handled (e.g. streaming data from devices, machines,
sensors and social data);
Copyright c 2001ā2017, Statoo Consulting, Switzerland. All rights reserved.
9
10. ā āVeracityā : ādata in doubtā or ātrust in dataā, i.e. the varying levels of noise and
processing errors, including the reliability (āquality over timeā), capability and
validity of the data.
ā¢ āVolumeā is often the least important issue: it is deļ¬nitely not a requirement to
have a minimum of a petabyte of data, say.
Bigger challenges are āvarietyā (e.g. combining diļ¬erent data sources such as
internal data with social networking data and public data) and āvelocityā, but most
important is āveracityā and the related quality of the data .
Indeed, big data come with the data quality and data governance challenges of
āsmallā data along with new challenges of its own!
Existing āsmallā data quality frameworks need to be extended, i.e. augmented!
Copyright c 2001ā2017, Statoo Consulting, Switzerland. All rights reserved.
10
12. ā¢ The above deļ¬nition of big data is vulnerable to the criticism of sceptics that these
four Vs have always been there.
Nevertheless, the definition provides a clear and concise framework to communicate
about how to solve diļ¬erent data processing challenges.
But, what is new?
Copyright c 2001ā2017, Statoo Consulting, Switzerland. All rights reserved.
12
13. āData is part of Switzerlandās infrastructure, such as
road, railways and power networks, and is of great
value. The government and the economy are obliged
to generate added value from these data.ā
digitalswitzerland, November 22, 2016
Source: digitalswitzerlandās āDigital Manifesto for Switzerlandā (digitalswitzerland.com).
The 5th V of big data: āValueā , i.e. the āusefulness of dataā.
Copyright c 2001ā2017, Statoo Consulting, Switzerland. All rights reserved.
13
14. Intermediate summary: the āļ¬ve Vsā of (big) data
āVolumeā, āVarietyā and āVelocityā are the āessentialā characteristics of (big) data;
āVeracityā and āValueā are the āqualiļ¬cation for useā characteristics of (big) data.
Copyright c 2001ā2017, Statoo Consulting, Switzerland. All rights reserved.
14
15. āData are not taken for museum purposes; they are
taken as a basis for doing something. If nothing is to
be done with the data, then there is no use in
collecting any. The ultimate purpose of taking data
is to provide a basis for action or a recommendation
for action.ā
W. Edwards Deming, 1942
Copyright c 2001ā2017, Statoo Consulting, Switzerland. All rights reserved.
15
16. 2. Demystifying the two approaches of ālearning from dataā
Data science, statistics and their connection
ā¢ The demand for ādata scientistsā ā the āmagicians of the big data eraā ā is
unprecedented in sectors where value, competitiveness and eļ¬ciency are data-driven.
The term ādata scienceā was originally coined in 1997 by a statistician.
Data science ā a rebranding of ādata miningā ā is the non-trivial
process of identifying valid (that is, the patterns hold in general, i.e. being
valid on new data in the face of uncertainty), novel, potentially useful
and ultimately understandable patterns or structures or models or trends
or relationships in data to enable data-driven decision making.
Copyright c 2001ā2017, Statoo Consulting, Switzerland. All rights reserved.
16
17. ā¢ Is data science āstatistical dĀ“ej`a vuā?
But, what is āstatisticsā?
Statistics is the science of ālearning from dataā (or of making sense out
of data), and of measuring, controlling and communicating uncertainty.
It is a process that includes everything from planning for the collection of data and
subsequent data management to end-of-the-line activities such as drawing conclusions
of data and presentation of results.
Uncertainty is measured in units of probability, which is the currency (or grammar)
of statistics.
Statistics is concerned with the study of data-driven decision making in the face
of uncertainty.
Copyright c 2001ā2017, Statoo Consulting, Switzerland. All rights reserved.
17
18. What distinguishes data science from statistics?
ā¢ Statistics traditionally is concerned with analysing primary (e.g. experimental or
āmadeā or ādesignedā) data that have been collected to explain and check the validity
of specific existing ideas (hypotheses), i.e. through operationalisation of theoretical
concepts.
Primary data analysis or top-down (explanatory and confirmatory) analysis.
āIdea (hypothesis) evaluation or testingā .
āDeductive reasoningā as āidea (theory) ļ¬rstā.
Copyright c 2001ā2017, Statoo Consulting, Switzerland. All rights reserved.
18
19. ā¢ Data science, on the other hand, typically is concerned with analysing secondary
(e.g. observational or āfoundā or āorganicā) data that have been collected (and
designed) for other reasons (and not āunder controlā of the investigator) to create
new ideas (hypotheses).
Secondary data analysis or bottom-up (exploratory and predictive) analysis.
āIdea (hypothesis) generationā .
āInductive reasoningā as ādata ļ¬rstā.
Copyright c 2001ā2017, Statoo Consulting, Switzerland. All rights reserved.
19
20. Do not forget the term āscienceā in ādata scienceā!
Copyright c 2001ā2017, Statoo Consulting, Switzerland. All rights reserved.
20
21. ā¢ The two approaches of ālearning from dataā are complementary and should proceed
side by side ā in order to enable proper data-driven decision making and to enable
continuous improvement.
Source: Box, G. E. P. (1976). Science and statistics. Journal of the American Statistical Association, 71, 791ā799.
Copyright c 2001ā2017, Statoo Consulting, Switzerland. All rights reserved.
21
22. āNeither exploratory nor confirmatory is sufficient
alone. To try to replace either by the other is
madness. We need them both.ā
John W. Tukey, 1980
Copyright c 2001ā2017, Statoo Consulting, Switzerland. All rights reserved.
22
23. 3. Questions āanalyticsā tries to answer
Source: Jean-Francois Puget, Chief Architect, IBM Analytics Solutions, September 21, 2015 (goo.gl/Vl4l2d).
Copyright c 2001ā2017, Statoo Consulting, Switzerland. All rights reserved.
23
24. Data-driven decision making : refers to the practice of basing decisions on
āanalyticsā (i.e. ālearning from dataā), rather than purely on gut feeling and intuition:
Copyright c 2001ā2017, Statoo Consulting, Switzerland. All rights reserved.
24
26. 4. Demystifying the āmachine intelligence and learningā hype
John McCarthy, one of the founders of āArtiļ¬cial Intelligenceā (AI) (now
sometimes referred to as āmachine intelligenceā) research, deļ¬ned in 1956 the ļ¬eld
of AI as āgetting a computer to do things which, when done by people, are said to
involve intelligenceā, e.g. visual perception, speech recognition, language translation
and visual translation.
AI is about (smart) machines capable of performing tasks normally performed by
humans ( ālearning machinesā), i.e. āmaking machines smartā.
In 1959, Arthur Samuel deļ¬ned āMachine Learningā (ML) as one part of a larger
AI framework āthat gives computers the ability to learnā.
ML explores the study and construction of algorithms that can learn from and
make predictions on data, i.e. āprediction makingā through the use of computers.
Copyright c 2001ā2017, Statoo Consulting, Switzerland. All rights reserved.
26
27. āIn short, the biggest difference between AI then and
now is that the necessary computational capacity,
raw volumes of data, and processing speed are readily
available so the technology can really shine.ā
Kris Hammond, September 14, 2015
Source: Kris Hammond, āWhy artiļ¬cial intelligence is succeeding: then and nowā,
Computerworld, September 14, 2015 (goo.gl/Q3giSn).
Copyright c 2001ā2017, Statoo Consulting, Switzerland. All rights reserved.
27
28. ā¢ However, without humans as a guide, current AI is no more capable than a computer
without software!
āBusiness is not chess; smart machines alone can not
win the game for you. The best that they can do for
you is to augment the strengths of your people.ā
Thomas H. Davenport, August 12, 2015
Copyright c 2001ā2017, Statoo Consulting, Switzerland. All rights reserved.
28
30. āIn the anticipated symbiotic [manācomputer]
partnership, men will set the goals, formulate the
hypotheses, determine the criteria, and perform the
evaluations. Computing machines will do the
routinizable work that must be done to prepare the
way for insights and decisions in technical and
scientiļ¬c thinking. ... In one sense of course, any
man-made system is intended to help man, to help a
man or men outside the system.ā
Joseph C. R. Licklider, 1960
Source: Licklider, J. C. R. (1960). Manācomputer symbiosis.
IRE Transactions on Human Factors in Electronics, 1, 4ā11.
Copyright c 2001ā2017, Statoo Consulting, Switzerland. All rights reserved.
30
31. 5. Conclusion and key principles for success
ā¢ Decision making that was once based on hunches and intuition should be driven by
data ( data-driven decision making, i.e. muting the HIPPOs).
ā¢ Despite an awful lot of marketing hype, big data are here to stay ā as well as the
āInternet of Thingsā (IoT; a term coined in 1999!) ā and will impact every single
domain!
ā¢ The key elements for a successful (big) data analytics and data science future are
statistical principles and rigour of humans!
ā¢ Statistics, (big) data analytics and data science are aids to thinking and not
replacements for it!
Copyright c 2001ā2017, Statoo Consulting, Switzerland. All rights reserved.
31
32. Technology is not the real challenge of the digital transformation!
Digital is not about the technologies (which change too quickly)!
Note: edge computing is also referred to as fog computing, mesh computing, dew computing and remote cloud.
Copyright c 2001ā2017, Statoo Consulting, Switzerland. All rights reserved.
32
33. āDigital strategies ... go beyond the technologies
themselves. ... They target improvements in
innovation, decision making and, ultimately,
transforming how the business works.ā
Gerald C. Kane, Doug Palmer, Anh N. Phillips, David Kiron and Natasha Buckley, 2015
Source: Kane, G. C., Palmer, D., Phillips, A. N., Kiron, D. & Buckley, N. (2015). Strategy, not technology,
drives digital transformation. MIT Sloan Management Review (goo.gl/Dkb96o).
Copyright c 2001ā2017, Statoo Consulting, Switzerland. All rights reserved.
33
34. My key principles for analyticsā success
ā¢ Do not neglect the following four principles that ensure successful outcomes:
ā use of sequential approaches to problem solving and improvement, as studies
are rarely completed with a single data set but typically require the sequential
analysis of several data sets over time;
ā having a strategy for the project and for the conduct of the data analysis;
including thought about the ābusinessā objectives ( āstrategic thinkingā );
ā carefully considering data quality and assessing the ādata pedigreeā before,
during and after the data analysis; and
ā applying sound subject matter knowledge (ādomain knowledgeā or ābusiness
knowledgeā, i.e. knowing the ābusinessā context, process and problem to which
analytics will be applied), which should be used to help deļ¬ne the problem, to
assess the data pedigree, to guide data analysis and to interpret the results.
Copyright c 2001ā2017, Statoo Consulting, Switzerland. All rights reserved.
34
35. āAll improvement takes place project by project and
in no other way.ā
Joseph M. Juran, 1989
Copyright c 2001ā2017, Statoo Consulting, Switzerland. All rights reserved.
35
36. āBy āaugmenting human intellectā we mean increasing
the capability of a man to approach a complex
problem situation, to gain comprehension to suit his
particular needs, and to derive solutions to problems.ā
Douglas C. Engelbart, 1962
Source: Engelbart, D. C. (1962). āAugmenting human intellect: a conceptual frameworkā (1962paper.org).
Copyright c 2001ā2017, Statoo Consulting, Switzerland. All rights reserved.
36
37. āWe can not solve problems by using the same kind
of thinking we used when we created them.ā
Albert Einstein
Copyright c 2001ā2017, Statoo Consulting, Switzerland. All rights reserved.
37
38. āThe only person who likes change is a wet baby.ā
Mark Twain
Copyright c 2001ā2017, Statoo Consulting, Switzerland. All rights reserved.
38
39. āIt is not necessary to change. Survival is not
mandatory.ā
W. Edwards Deming
Another version by W. Edwards Deming: āSurvival is not compulsory.
Improvement is not compulsory. But improvement is necessary for survival.ā
Copyright c 2001ā2017, Statoo Consulting, Switzerland. All rights reserved.
39
40. Have you been Statooed?
Prof. Dr. Diego Kuonen, CStat PStat CSci
Statoo Consulting
Morgenstrasse 129
3018 Berne
Switzerland
email kuonen@statoo.com
@DiegoKuonen
web www.statoo.info
41. Copyright c 2001ā2017 by Statoo Consulting, Switzerland. All rights reserved.
No part of this presentation may be reprinted, reproduced, stored in, or introduced
into a retrieval system or transmitted, in any form or by any means (electronic,
mechanical, photocopying, recording, scanning or otherwise), without the prior
written permission of Statoo Consulting, Switzerland.
Warranty: none.
Trademarks: Statoo is a registered trademark of Statoo Consulting, Switzerland.
Other product names, company names, marks, logos and symbols referenced herein
may be trademarks or registered trademarks of their respective owners.
Presentation code: āIBI.BDA.June13.2017ā.
Typesetting: LATEX, version 2 . PDF producer: pdfTEX, version 3.141592-1.40.3-2.2 (Web2C 7.5.6).
Compilation date: 09.06.2017.