1 Data Scientists: Who are they? What do they do? How do they work?
Data Scientists:
Who are they?
What do they do?
How do they work?
“The sexiest job in the next ten
years will be that of the statistician.
People think I’m joking, but who
would’ve guessed that computer
engineers would’ve had the sexiest
job of the 1990s?”
Hal Varian, October 2008.
Introduction: Data Scientist, the sexiest job of the decade
- Data, data and more data
- A little bit of history
1. Where do Data Scientists come from?
- Understanding the role of each specialist
2. Data Scientists: seeking their place in the organizational chart
- The data was already in-house
- Are companies ready to listen to the Data Scientist?
3. Who needs a Data Scientist?
4. The Data Scientist skill set
- Technical skills
- Above and beyond technical skills
- How to choose your data scientist
- Struggling to find a data scientist? Train them in-house
- Supermen and superwomen? No, super teams!
5. The Data Scientist’s tools
- Data processing system construction, databases, visualization, and data wrangling tools
- Open source or proprietary software?
6. Getting down to it: the work process
- Three obstacles to overcome before accessing data
- From data to decision... if nothing goes wrong
7. Evaluating the Data Scientist’s work
8. Trust: an essential component in the process of data science
- Ethics: science’s essential accessory
9. Data scientists in Spain today
- Who’s making the most out of data science in Spain?
10. Conclusions: still a great deal to be done
- What does the adulthood of big data look like?
The data scientist is a sort of
mix between a programmer,
an analyst, a communicator
and an adviser. A very difficult
combination to come across.
Data scientist,
the sexiest job of the decade
The figure of the data scientist first emerged in the early twenty-first century. A decade
after the widespread business adoption of the Internet, Hal Varian, chief economist at
Google, predicted in an interview in October 2008: “The sexiest job in the next ten years will be
statisticians. People think I’m joking, but who would’ve guessed that computer engineers would’ve
had the sexiest job of the 1990s?”
Varian, also a professor at the University of California, Berkeley, was one of the first to
recognize the strategic importance of extracting information from data, and not just
at a corporate level. “The ability to take data - to be able to understand it, to process it, to
extract value from it, to visualize it, to communicate it - that’s going to be a hugely important
skill in the next decades. And not only at the professional level, but even at the educational
level for elementary school kids, for high school kids, for college kids. Because now we really do
have essentially free and ubiquitous data. So, the complementary scarce factor is the ability to
understand that data and extract value from it”.
The truth is that in 2008 a few companies had already incorporated the position in order
to manage a volume of information hitherto unknown, due to its variety and sheer scope,
in a quest for findings relevant to the business. Until then nobody had called them “Data
Scientists”. The first to do so were DJ Patil and Jeff Hammerbacher, then heads of Data
Analytics at LinkedIn and Facebook respectively.
Eight years later, in 2016, with an increasing volume of data generated on a daily basis,
Varian’s predictions are more pertinent than ever. According to the McKinsey Global
Institute report “Game changers: Five opportunities for US growth and renewal”, the big
data industry in the United States could increase annual GDP by 325 billion dollars by
2020. According to the same report, the United States alone will face a shortage of up to
190,000 data scientists and 1.5 million professionals with enough proficiency to use big
data effectively. Between 2010 and 2020, the number of companies seeking to incorporate
the figure of a data scientist will grow by 18.7%, according to the EMC study “The
Digital Universe in 2020”. An estimated 40,000 exabytes of data will be created by 2020,
underscoring the need for organizations to incorporate talent to conduct in-depth analysis of
information.
In reality, many companies (the biggest or the most pioneering ones) have already
incorporated the data scientist in one of its variants. The sudden appearance of these
professionals in the business world and the high demand for them expected over the coming
years confirm that there is a growing need to process large volumes of information and
transform it into a valuable asset, given that data “in its raw state” is not useful for
companies. Only in-depth analysis can reveal patterns and trends, which in turn streamline
business processes and optimize decision-making. This is where data science emerges as the
process that enables the collection, preparation, analysis, visualization, management and
preservation of large volumes of data. Extracting valuable information from all types of
sources provides solutions to a company’s vital strategic issues, such as time and cost
savings, new product development, the optimization of offers and faster, more accurate
decision-making.
But what does a data scientist actually do? Here at Good Rebels we wanted to outline
a profile of this new profession, with the help of various industry leaders from
academia, business and institutions. In short, we concluded that the main tasks of
a data scientist are to identify data, transform it when incomplete, categorize it,
prepare it for analysis, perform the analysis, visualize the results and communicate
them. To do this, the data scientist must have technical training in programming,
data management, statistics and data mining. And let’s not forget, aside from the
analytical part, the ability to focus on creating value for the company. This is why, in
a competitive scenario where challenges are constantly renewed and data doesn’t stop
flowing, the data scientist’s work enables managers to move from an ad hoc analysis to
an ongoing conversation with the data.
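Several of the tasks listed above (transforming incomplete data, categorizing it, analyzing it, visualizing the results and communicating them) can be illustrated with a toy pipeline. This is a minimal sketch under stated assumptions, not the method of any participant in the study: the records, the field names (`region`, `spend`), the median-imputation rule and the helper function names are all hypothetical, and a plain-text bar chart stands in for a real visualization tool.

```python
from statistics import mean, median

# Hypothetical raw records; some spend values are missing (None).
RAW = [
    {"region": "north", "spend": 120.0},
    {"region": "south", "spend": None},
    {"region": "north", "spend": 80.0},
    {"region": "south", "spend": 200.0},
    {"region": "east", "spend": 50.0},
]


def transform(records):
    """Complete the data: replace missing spend with the median of the known values."""
    known = [r["spend"] for r in records if r["spend"] is not None]
    fill = median(known)
    return [{**r, "spend": r["spend"] if r["spend"] is not None else fill} for r in records]


def categorize(records):
    """Group spend values by region."""
    groups = {}
    for r in records:
        groups.setdefault(r["region"], []).append(r["spend"])
    return groups


def analyze(groups):
    """Compute the average spend per region."""
    return {region: mean(values) for region, values in groups.items()}


def visualize(summary, scale=20.0):
    """Render one text bar per region (one '#' per `scale` units of spend)."""
    return [f"{region:<6} {'#' * int(avg // scale)} {avg:.1f}" for region, avg in sorted(summary.items())]


def communicate(summary):
    """Turn the analysis into a one-line finding for decision-makers."""
    top = max(summary, key=summary.get)
    return f"Highest average spend: {top} ({summary[top]:.1f})"


summary = analyze(categorize(transform(RAW)))
print("\n".join(visualize(summary)))
print(communicate(summary))
```

Each step is a separate function so that, in a real project, any one of them (for example the imputation rule) could be swapped out without touching the others.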
What kind of person is able to perform this task? The data scientist is a mix between a
programmer, an analyst, a communicator and an adviser, with proficiency in statistics,
technology, math and data architecture, all without forgetting human qualities. Is this
skill set very difficult to find in one person? Probably so, simply because there are
not many people who can do all that.
So basically, we’re talking about a well-rounded jack-of-all-trades proficient in
mathematics, IT and data architecture, knowledgeable about business, with strong
communication skills as well as empathetic virtues... Given the practical impossibility of
finding such a person on the market, professionals refer to this ideal profile with labels such
as “El Dorado”, “Unicorn”, “The Data Science Superhero”, “The Dark Beast” or “The
New Renaissance Man”. An extremely powerful combination... and very hard to find,
because demand is growing and such professionals are in short supply. The solution:
training, retraining and building teams that when combined are able to integrate a
profile like the one described.
Read more:
Hal Varian interview at McKinsey.com
DJ Patil Biography
Building Data Science Teams, at Amazon.com
Data, data and more data
With countless services and connected devices, it is estimated that 90% of the world’s data has been
generated in the last two years, more than in all the rest of human history. And this is also very good news for anyone who specializes in
data management and processing: they’ll probably never be short of work for the rest of
their lives.
Numerous indicators illustrate this spectacular explosion of data. For example:
- In 2020, 1.7 MB of information will be created every second for every human
being, according to EMC forecasts.
- Information is generated constantly, and someone needs to monitor it.
For example, on Google alone there are 40,000 searches every second.
- Facebook is another behemoth when it comes to data generation. Every minute,
its users send an average of 31.2 million messages and watch 2.77 million videos.
- In May 2016, Facebook and Microsoft began laying a 6,600-km underwater
cable between Europe and the US, capable of transmitting 160 TB of data per
second.
- 80% of photos will be taken with smartphones in 2017. A high percentage of
them will be shared via the Internet.
- It is estimated that in 2020 more smartphones will be in use than landlines, with
a total of 6.1 billion users worldwide.
- Also in 2020, there will be 50 billion smart devices in use worldwide, all
collecting, analyzing, and sharing data. A third of data will travel through the
cloud.
- 80% of data generated today is unstructured. This includes data found in emails,
spreadsheets, social media, the Internet, etc.
- The market for Hadoop (an open-source software framework for storing and processing
large data sets across clusters of networked computers) will grow at an annual rate of 58%,
exceeding a value of 1 billion dollars by 2020.
- For an average company on the Fortune 1000, an improvement of just 10% in
data accessibility will result in over $60 million of additional net income.
- Businesses that make full use of the potential of data could boost their operating
margins by up to 60%.
- Perhaps the most mind-boggling fact, and one that highlights the enormous
potential that lies ahead for the big data industry: according to MIT, less than
0.5% of all data generated right now is analyzed.
Read more:
Big Data: 20 Mind-Boggling Facts Everyone Must Read
Internet Live Stats
Big data: The next frontier for innovation, competition, and productivity
A little bit of history
The Cyclopædia of Commercial and Business Anecdotes, published in 1865 by Richard Miller
Devens, contains the first recorded reference to the term “business intelligence”. The
author described how a banker, Sir Henry Furnese, succeeded by having an understanding
of market conditions before his competitors: “Throughout Holland, Flanders, France, and
Germany, he maintained a complete and perfect train of business intelligence. The news…was thus
received first by him”, Devens writes. Furnese ultimately used this advance knowledge to
duplicitous ends and became renowned as a corrupt financier. However, he can be credited
for sowing the seeds of business intelligence.
Technology did not advance to the point where it could be considered an agent of business
intelligence until well into the 20th century. The first commercial computers arrived in the
United States in the 1950s. In 1958, Hans Peter Luhn, a pioneering researcher at IBM, published
the article “A Business Intelligence System”, in which he defined business intelligence as
“the ability to apprehend the interrelationships of presented facts in such a way as to guide
action towards a desired goal”.
Luhn contemplated the development of an automatic and intelligent system, built on
document processing equipment, capable of designing target-specific action guidelines
for the various sections of any organization. With this article, Luhn, considered the father
of business intelligence, laid the foundations for information analysis and distribution to
serve the needs of a company.
It wasn’t until three decades later, in 1989 to be exact, that the analyst Howard Dresner
brought the modern definition of business intelligence into the common vernacular.
Encompassing somewhat cumbersome-sounding concepts related to data storage and data
processing, Dresner summed up the idea of business intelligence as “concepts and methods
to improve business decision-making by using fact-based support systems”.
From the 2000s onward, the intersection between different technologies and business needs
prompted new concepts and terminologies: data engineering, business analytics, data
mining, etc. There is currently no clear consensus on exactly where the skills of each of
these disciplines begin and end, nor to what extent some overlap with others. But what’s
clear is that they all coexist under the umbrella of big data.
Read more:
Richard Miller Devens - Cyclopædia of Commercial and Business Anecdotes
Hans Peter Luhn – A Business Intelligence System
Howard Dresner’s blog
The following people have
participated in this study:
Bosco Aranguren
Chief Marketing Officer, Microsoft Iberia
CMO at Microsoft Iberia since March 2017. Previously, he was
responsible for Programmatic Media Buying at Google. He joined
Google in 2010 as Industry Head Automotive, and in 2012 he
became Industry Head CPG & Entertainment
Álvaro Barbero
Chief Data Scientist at Instituto de Ingeniería del Conocimiento (IIC)
Expert in the fields of machine learning, optimization and
algorithm engineering. His work is to transform advances in
these areas into practical Big Data systems, from predictive and
recommender systems to automated text analysis and resource
optimization.
Richard Benjamins
Director of External Positioning & Big Data, LUCA: Data-Driven Decisions
Director of External Positioning & Big Data for Social Good at
Telefonica in Telefonica’s Chief Data Office. In his previous position
of Group Director BI & Big Data he was responsible for internal
exploitation of Big Data across Telefonica. He was also Director of
Business Intelligence at Telefonica Digital, and before that he was
Director of User Modelling where he led Global BI programs.
Fuencisla Clemares
Country Manager at Google Spain & Portugal
Joined Google in 2009 as Manager of Retail and Consumer
goods; after that, she led the Telecommunications, Banking and
Insurance sectors, along with the mobile strategy for Spain. Prior
to Google, she worked for seven years as a strategic consultant at
McKinsey & Company, and later became Director of Purchasing in
the Carrefour home division.
Manuel Marín
Data Analytics Manager, PwC
Data Analytics Manager at PwC. Before that, he was Chief
Technical Officer at APARA, and applied predictive analytics
in telco, banking, insurance, energy, health, sports and retail
companies in the areas of fraud detection and customer
intelligence.
Esteban Moro
Associate Professor at Universidad Carlos III de Madrid
Esteban is a professor at Universidad Carlos III de Madrid, a
member of the Joint Institute UC3M-Santander on Big Data and
academic director of the Master in Data Science and Big Data
in Finance at AFI. He serves as a consultant for many public and
private institutions. His areas of interest are applied mathematics,
financial mathematics, viral marketing and social networks.
Felipe Ortega
Director of the Master in Data Science at Universidad Rey Juan Carlos
Assistant Professor in the Department of Signal Theory and
Communications and Telematic Systems and Computing, School of
Telecommunications Engineering at Universidad Rey Juan Carlos (Madrid).
He is co-founder of the Data Science Lab at the Center for Intelligent
Information Systems (CETINIA) and Academic Director of the Master
in Data Science at URJC. His main areas of research are data engineering,
computational statistics, machine learning, quantitative methods, open
source software, large-scale data management and data visualization.
Pep Porrà
Business Performance Director, King.com
Business Performance Director at King.com, where he leads a team
of Data Scientists and Business Performance managers focused on
evaluating, anticipating and understanding the monetization impact of
game features. Before moving to the corporate world, he was a Statistics and
Mathematics Professor at the University of Barcelona.
Alejandro Rodríguez
Professor at Universidad Politécnica de Madrid
Professor at the Department of Computer Languages and Systems
and Software Engineering at UPM, and a researcher specializing in the fields
of medical informatics, knowledge representation, expert systems
and the semantic web.
Marcelo Soria
Partner at Tramontana.co
Partner at Tramontana.co since mid-2016. Between May 2014 and
May 2016, he was VP of Data Services at BBVA Data & Analytics, and
before that he was Big Data // Smart Cities initiative co-leader at
BBVA.
1.
Where do data
scientists come
from?
Where are data scientists?
More than half of these professionals are
concentrated in the United States. Spain
is ranked as the eighth country in the
world with the highest number of data
scientists in employment.
“The State of Data Science”, Stitchdata.com
DJ Patil, currently Chief Data Scientist for the US government, coined the term
“data scientist” together with Jeff Hammerbacher, during his tenure at LinkedIn. But nearly a
decade later, there is still some controversy about its exact meaning, and whether or not this
role differs from that performed by data analysts in companies for many years now.
For some, the origin of data science lies in machine learning, the branch from which all
prediction and classification models have been developed. Professionals trained in this
discipline were mainly mathematicians who also had the programming skills needed to
implement and test predictive models, since machine learning is an applied rather than a
purely theoretical branch of mathematics.
The huge change in the amount of data being handled by organizations is the main
driving force behind the new profile. If elements such as big data and machine learning
are added to traditional data analytics, we may well be talking about a new theoretical
discipline (and also a new job category) whose terms are being defined virtually at the same
time as the market creates demand for it. What distinguishes a data scientist is a different,
more scientific type of training, which allows them to use the very latest techniques
to access mass data, not only to explore it but also to process it at speed. It is a profile
that combines academic training with professional experience.
Due to the current lack of consensus on their characteristics and skills, there is a wide
spectrum of professionals included in the category of data scientist. It is important, though,
that they meet a set of characteristics: they should be able to use their knowledge to
extract non-obvious information from data and empirical evidence, and also present it
in an understandable way.
Each specialist has their place and time
Data science, big data, data analytics... Terms that we’ve been hearing for years now,
but are still somewhat enshrouded in confusion when it comes to their definition and
competencies. What’s involved in each of these disciplines?
First and foremost, it’s important to stress that the role of data scientist is different from
that of an analyst who designs models or forecasts. The data scientist is not only expected
to explain the effect that the data will have on the company’s future, but also to provide
solutions that help the company to grow, both in the present and in the future.
“You cannot communicate a
relevant decision in your business
if you are not able to explain how
you got it, what data you have
used, and what processes you have
followed to break it down.”
Esteban Moro.
Data science
- Data science is a field that encompasses everything related to the cleaning (curation),
preparation and analysis of data, whether structured or unstructured.
- Data science consists of a medley of statistics, mathematics and programming,
peppered with problem-solving, data extraction using as much ingenuity as required
and the ability to scrutinize a problem from different perspectives.
- The data scientist shifts business cases to an analytical plane, develops hypotheses
and patterns, and evaluates their impact on the business. This deep analysis has
the ultimate goal of solving complex business issues efficiently and anticipating
future needs.
Big Data
- Big data refers to huge volumes of data, proprietary or third-party and usually non-
aggregated, the size of which prevents it from being processed effectively using
traditional applications.
- Big data is a term that is gaining more and more ground in firms and industries.
The analysis of data trends using sophisticated algorithms and other cutting-edge
information processing methods ultimately improves strategic decisions that are a
driving force behind business.
Data analytics
- Data analytics uses data to examine market and business trends, and to develop or
improve methods linked to productivity and cost reduction.
- The essence of data analytics is inference, which is the process of drawing
conclusions based solely on what the researcher already knows.
- Data analytics is used in many industries to help companies improve decision-
making, as well as to verify or refute existing theories and models.
“The next big challenge in the
gaming industry is to create smart
systems. To convert data into new
value for the company”.
Pep Porrà.
A hypothetical case will let us see the different processes involved in a data science project.
Let’s imagine that every day millions of images are uploaded to a restaurant review site and
they need to be catalogued: are they pictures of food? What kind of food? Or are they of a
restaurant? Of the outside or the inside?
Machine learning automatically classifies each image into its respective category.
Properly “trained”, a computer can figure out, for example, if the photo of a restaurant is
of the inside or the outside. The data scientist oversees the entire project, from selecting
the right algorithm to engineering design.
- The data scientist creates the model which allows the computer to make this
distinction, using different sources of information ranging from manually classified
images to keywords in screenshots.
- Using data engineering techniques, a data feed and storage system is created, to
which algorithms are applied on a large scale.
- Finally, the business implications of the innovation are analyzed: is it useful for
the business? Will it help the website generate more traffic? And so on. The findings are
then presented using visualization tools.
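To make the classification step more concrete, here is a minimal sketch of one possible technique: a multinomial naive Bayes classifier trained on keywords attached to manually classified images. The labels (`food`, `inside`, `outside`), the keyword lists and the helper names `train` and `classify` are hypothetical; a real system for this case would more likely learn from features extracted from the images themselves, which this sketch does not attempt.

```python
import math
from collections import Counter, defaultdict

# Hypothetical training set: keywords found alongside manually classified images.
TRAINING = [
    (["pizza", "cheese", "plate"], "food"),
    (["pasta", "plate", "sauce"], "food"),
    (["tables", "chairs", "lamp"], "inside"),
    (["bar", "chairs", "menu"], "inside"),
    (["facade", "street", "sign"], "outside"),
    (["terrace", "street", "trees"], "outside"),
]


def train(examples):
    """Count how often each label occurs and how often each keyword appears per label."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for words, label in examples:
        label_counts[label] += 1
        word_counts[label].update(words)
        vocabulary.update(words)
    return label_counts, word_counts, vocabulary


def classify(words, model):
    """Return the label with the highest log-posterior, using add-one (Laplace) smoothing."""
    label_counts, word_counts, vocabulary = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)  # log prior of the label
        denom = sum(word_counts[label].values()) + len(vocabulary)
        for w in words:
            # Unseen keywords get a count of 0, so smoothing keeps the log finite.
            score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAINING)
print(classify(["cheese", "plate"], model))
print(classify(["street", "sign"], model))
```

Naive Bayes is used here only because it fits in a few dozen lines; in practice the data scientist would compare several algorithms before choosing one.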
2.
Data scientists:
seeking their
place within the
organizational chart
“The problem we often find is
that data has been managed in
isolation. And then the time comes
to enable that data and there’s no
communication going on”.
Bosco Aranguren.
The data scientist isn’t a radically new profile that’s being defined from scratch.
Companies have long been resorting to in-depth data analysis as a valuable tool that
helps meet or exceed their goals. What’s changed now is the dimension of this analysis,
since a greater volume of data calls for a different approach, with regard both to
procedures and the purpose of the analysis.
	
Many experts stress the idea of rediscovering data, or rather, discovering its value
contribution to the company. The person who used to manage data, target customers or
detect products with the greatest turnover quite clearly added value to the company. But
the data scientist’s role goes much further.
The data was already in-house
It’s true that the figure of data manager has existed in companies for some time now.
Data Analytics has been used in the telecommunications industry for at least 20 years.
Banking also has been using Business Intelligence for several years, as have - somewhat
more mutedly - all major companies at the helm of their respective industries. However,
far from being a cross-disciplinary practice, data analysis has often only been applied
in specific departments, mainly in Marketing, Communication and Customer Insights.
This pigeonholing has, to a certain extent, undermined its importance within
the hierarchy of company priorities.
The main problem in companies without a data-focused corporate culture is that they were
often run in a decentralized and disorganized way. As a result of this siloed management,
each corporate department has been taking technology-related decisions it deemed the
most appropriate at any given time.
Now that the time has come to deal with data, experts are encountering barriers and
incompatibilities that hugely complicate their work. In institutions with enormous
historical repositories, grouping together and processing data files is a colossal effort, but
once this path of self-learning has been completed, the work translates into improvements
in internal processes, people management and/or customer service.
“Technically you can do just
about everything, but the
organization must then be
prepared to use it”.
Richard Benjamins.
The difference when compared to the situation in recent years is that data analytics
specialists now have much more powerful and effective technological resources, allowing
them to extract greater value from the information. Computing costs are lower, data
availability is higher and connectivity between both is greater, so this raises the chances of
finding patterns or potential case-based reasoning, helping to update the practice of using
data to improve management.
In this process of recognizing the status of data scientists, it’s vital to mention a
fundamental advance in their professional acknowledgement: they have taken on direct
responsibility for improving company results. Their mission is
no longer limited to guiding or advising the actions of other departments, nor to crunching
data to later present it to managers responsible for decision-making. The data scientist’s
work culminates with the delivery of new business opportunities founded on the
comprehensive inspection of data.
Is the company ready to listen to the data scientist?
The data scientist in many cases faces another crucial battle to make sure that their new
status within the company is acknowledged: overcoming resistance to change. Digital
inertia is pushing many companies towards the culture of data, but in more traditional or
larger organizations, where digital natives are rarely part of the management, this can end
up being a costly journey if it is long, or traumatic if it is short.
The first leg of the company’s journey towards big data must receive firm support
from the management. There are so many departments involved (IT, Business
Intelligence, e-Commerce, Marketing, etc.), and so much coordination among them is
needed for data to flow, be shared and properly used, that only by providing resources
from the top will it be possible for change to take place. Without agility and cooperation,
there can be no results.
In companies where there’s a tendency towards complacency or resistance to change, the
data scientist might even be seen as a gatecrasher who has turned up to lecture experts on
how to run the business. Executives who have long established the rules of the game are
wary of the mathematician, who even seems to be speaking a language that is foreign to
the company.
The first step in a company’s
journey towards Big Data needs
support from top management.
This is a cultural issue: the scientific endorsement behind the data scientist’s
recommendations must tap into traditional decision-making processes, based on
experience or other types of indicators, sometimes as simple as a spreadsheet. There may
even be people who ignore the contributions of the data scientist, as they may fear being
held to commitments to improve results: meeting KPIs can be a painful goal.
This phenomenon is repeated in all kinds of organizations, including startups, because
ultimately each person tends to protect their own teams and projects. That’s why, as we
shall see later on, empathy and communication are two of the essential non-technical
qualities required to work as a data scientist.
3.
Who needs a
data scientist?
In the United States, data scientist
was listed in 2016 as the job with the
best prospects, based on three factors:
job openings, salary and potential for
career development.
Source: 25 Best Jobs in America, Glassdoor.com
Companies and organizations in countless industries today are embarking upon
projects related to data analysis: banking, communications, entertainment,
healthcare, education, natural resources, insurance, retail, transport, energy, etc.
Many institutions publish their big data repositories, and moreover technologies
to visualize and analyze data are generally available. This scenario facilitates
investigation, as anyone with basic training can raise a company-related issue and
collect the data required to solve it.
Why would a company venture into a big data related project? The main objective is
usually to improve customer experience, but other goals include reducing costs,
refocusing marketing strategies, streamlining internal processes or improving
security. We know that we have unprecedented access to information and data.
What’s more, complex systems appear in any field of knowledge. Unpredictability
can manifest itself in all kinds of disciplines: mathematics, physics, chemistry,
engineering, programming, economics, sociology, psychology, etc. There is a
continual challenge to find order or a behavior pattern among the seemingly chaotic
nature of any system.
As a result, there is no shortage of data or, obviously, problems to solve. And there is so
much knowledge out there that it is difficult to create new knowledge, in this instance
understood as any algorithm or model to help improve business performance. Taking on
all these challenges, in addition to a solid technical background, requires huge doses of
passion and motivation. That’s why defining the criticality of the problem to be solved is
crucial for the data scientist.
But how do you define a good problem? How is it recognized, and how are resources
allocated to solve this particular issue and not another? The answer may be subjective,
depending on who you ask. But basically, a good problem should meet three
conditions:
• Demonstrate a clear and direct impact on the business.
• Prove solvable with the data at hand.
• Provide sufficient motivation to the data scientist and his/her team.
“It’s impossible to have someone
who is knowledgeable in all the
businesses in the world. The
company may have a generalist
data scientist and specialists in
the areas where business can be
developed”.
Álvaro Barbero.
The last question is who can take charge of solving such problems. In his book Building
Data Science Teams, DJ Patil sums up the essence of a guide for employing or hiring a data
scientist:
“The inventor of LinkedIn’s ‘People You May Know’ was an experimental physicist. A
computational chemist on my decision sciences team had solved a 100-year-old problem on
energy states of water. An oceanographer made major impacts on the way we identify fraud.
Perhaps most surprising was the neurosurgeon who turned out to be a wizard at identifying rich
underlying trends in the data”.
Ultimately, all scientists, whatever their training, are able to meet the challenge of
extracting information from data, as long as they bring enough passion to problem-solving.
And it is always beneficial to test the robustness of a model against the variety of
perspectives provided by different scientific disciplines.
4. Skills of a data scientist
“MOOCs are very useful for
training, because they are very
specific and oriented towards a
specific objective”.
Alejandro Rodríguez.
The data scientist is not necessarily a professional with a “numbers” background. It’s
not essential to come from disciplines such as mathematics, statistics, physics or the exact
sciences, although these educational backgrounds provide a useful foundation. Some data
scientists come from fields such as telecommunications, engineering or computer science,
and even from seemingly unrelated areas such as communication, economics, finance or
biomedicine.
Why? Because the most important part of their job is ultimately to analyze data: play with
it, work with it, question it, and love it. The data scientist should be a curious, creative,
innovative and even defiant person, capable of questioning the status quo. That’s why
their training is not as decisive as their attitude.
Technical skills
What is clear is that the data scientist’s work revolves around the combination of
technology, creativity and data. There are likely common core requirements when it comes
to their qualifications and performance, but as time goes by, the profile will gradually
diversify into multiple branches and specializations.
In short, the data scientist should be fully at ease with the following four disciplines:
• Statistics / Mathematics: they should be able to analyze databases, build models,
make statistical forecasts and distinguish what is representative from what is not.
Therefore, they should have a strong mathematical background that allows them to
master supervised models with predictive techniques (data mining, machine learning)
and unsupervised segmentation models. Before modelling, they should be able to
apply the mathematical techniques of data pre-processing and, once the model is
built, those of model evaluation. In short, they should be familiar with the set of
techniques needed to construct and evaluate a predictive model, as well as to apply
statistical logic in programming languages.
• Technology: as a requirement for transforming data into knowledge, the data scientist
must understand the business’s technological systems and know how to implement them.
Algorithm design is key to data transformation, and calls for fluency in multiple programming
languages, as well as full knowledge of database management. It’s very important to be
proficient in automation, since many processes are run repeatedly on a computer while the
data scientist is refining or calibrating the model.
“In Spain, we lack the mindset to help
people grow, take risks, even train
them to grow in their job positions”.
Fuencisla Clemares.
• Business analytics: the data scientist should speak the corporate language,
understand the company’s goals, the industry in which it operates and the
processes that drive profit and growth. Only in this way will they be able to
discern which problems can feasibly be solved through data processing, and
only by understanding the inner workings of the company will they be able
to convert data analysis into insights and valuable recommendations for the
company. Without some knowledge of the business environment, technical
qualifications alone can lead to the “techie” being rejected or misunderstood,
or even to awkward situations where all they can offer are obvious answers.
• Communication: the data scientist will at some point have to present meticulous
and accurate results of their work (based not on experience but on their
analysis) to professionals, often managers with decision-making powers and
extensive business experience but no technical training. That’s why they
should be able to communicate with ease and tailor the conversation
to the level of their audience. It’s paramount that the result of an analytical
process can be understood by any manager within the company, whether
that be an engineer or a social media specialist.
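To make the statistics bullet concrete, here is a minimal sketch, using only the Python standard library, of the cycle it describes: pre-process the data, build a simple predictive model, and evaluate it on data the model has not seen. The one-predictor linear model, the z-score scaling, the 75/25 split and all function names are illustrative assumptions, not a prescribed method.

```python
import math
import random


def standardize(values, mean, std):
    """Pre-processing: z-score scaling with statistics taken from the training set."""
    return [(v - mean) / std for v in values]


def fit_linear(xs, ys):
    """Ordinary least squares for a one-predictor model y = a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


def rmse(y_true, y_pred):
    """Evaluation: root mean squared error on the given observations."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def train_and_evaluate(xs, ys, test_fraction=0.25, seed=0):
    """Shuffle, split, pre-process, fit on the training part, score on the held-out part."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * (1 - test_fraction))
    train, test = idx[:cut], idx[cut:]
    x_train = [xs[i] for i in train]
    mean = sum(x_train) / len(x_train)
    std = math.sqrt(sum((x - mean) ** 2 for x in x_train) / len(x_train))
    intercept, slope = fit_linear(standardize(x_train, mean, std), [ys[i] for i in train])
    predictions = [intercept + slope * z for z in standardize([xs[i] for i in test], mean, std)]
    return rmse([ys[i] for i in test], predictions)


# Invented data: y = 3x + 2 plus Gaussian noise
rng = random.Random(1)
xs = [float(x) for x in range(200)]
ys = [3 * x + 2 + rng.gauss(0, 5) for x in xs]
print(round(train_and_evaluate(xs, ys), 2))
```

For a single predictor, scaling does not change the least-squares predictions; it is included to show the habit that matters with richer models: the scaling statistics come from the training data only, so nothing about the held-out data leaks into the model.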
Skills above and beyond technical ones
The data scientist doesn’t subsist on technical know-how alone. Ideally, the above
capabilities are complemented by a series of personal characteristics, forming
a skill set (sometimes merely utopian) that merges specialization with human
qualities.
• Creativity: the ability to use new methods to collect, interpret and analyze data, which
brings a different perspective to the analysis. Technology itself stops being a
differentiating factor the moment a program is made available to any
organization. That’s why know-how is vital: the tools may be the
same for everyone, but the minds handling them are not. Technological uniformity
melts away when intelligence is added, turning the results offered by a software
solution (one which may even be used by the competition) into unique ones.
• Intuition: the ability to choose one path or another toward a solution is extremely
important. Experts underline the importance of bringing an artistic component to a
technical working process that usually follows a fixed sequence (data processing,
curation, modelling, etc.) but that requires an intuitive spark to discern which steps
are suited to critical analysis.
“To stay on top of everything and
constantly refresh one’s knowledge,
curiosity is essential”.
Marcelo Soria.
• Flexibility: trial and error allows us to evaluate and choose one
option or another for the work already underway, complementing (or even
rectifying) decisions made before starting the project. Mathematical models are
not unique, but are grouped into toolboxes that encompass different techniques.
Therefore, agility is required to choose one technique or analytical tool over
another, depending on the structure of the data, the information available, etc.
For professionals trained in theory but with little practical experience,
this may represent a point of weakness.
• Curiosity: understood as the ability to ask questions, to comprehend what is asked
and to envisage the right path to take. Curiosity is essential for keeping abreast of
techniques and practices, as well as for constantly refreshing one’s knowledge base.
Ultimately, this will lead the data scientist to draw meaningful inferences from the data.
• Empathy: although their work is the result of hours and hours spent in front of a
computer, the data scientist is not a lone wolf. The human factor must be present
in their daily work, in the sense that it depends on collaboration with other
departments and is impossible to pull off without cooperation. Since data scientists
are used to moving between projects and areas, the challenge lies in creating
free-flowing dialogue with other parts of the organization. What’s more, they may
sometimes have to present undesirable results to clients or superiors, further
reinforcing the importance of the personal touch.
• Pragmatism: Finally, there’s no point in all this theoretical analysis if it isn’t
accompanied by a practical impact. Technical skills are of little use if the data
scientist isn’t able to integrate into a team or convert all their analytical potential
into results that benefit the company or other working groups. Therefore, they must
be able to transfer data analysis into insights or actions with a direct impact on the
business.
“At Google, we try to work extensively
in the ecosystem, which is a word we’re
very fond of. We aren’t the ones who
are going to train people, but we can
influence other experts to encourage
such initiatives”.
Fuencisla Clemares.
How to choose your data scientist
For a profession that is still evolving, traditional recruitment processes are of no use.
Companies like Facebook, Amazon, Google or Microsoft are at the forefront of corporate use
of data science, serving as a benchmark for companies from all industries to understand the
professional profile of recruits and the type of work they perform.
It goes without saying that their technological background is critical: without the relevant
technical training, it is impossible to address the mission of data processing. That’s why
above all it is important to evaluate training and experience in mathematics and computer
science.
But we must also assess the ability to refresh knowledge, grow and learn in an ever-
changing environment, because we’re likely to recruit someone who doesn’t know which
challenges they are going to face in three years’ time.
Therefore, in the selection process it is important to test reasoning skills through problems
where it is not as important to find the right solution as it is to follow a logical process.
Nor is it uncommon to consult references seldom used in other selection processes, for
example, work developed on platforms such as GitHub.
Struggling to find a data scientist? Train them in-house
When recruiting a data processing specialist becomes a complex or financially costly chore,
some companies opt for internal promotion. Professionals already working in an area
related to data analytics are trained or re-trained in disciplines adapted to the new needs of
the company. This is a widespread and perfectly valid approach.
This re-training is favored by the trend towards standardization brought on by technology:
there are countless tools that make the prior task of data analysis and cleansing easier, and
which allow professionals already in the workforce - especially in business intelligence - to
be re-trained in data science.
The pull effect of what some describe as today’s coolest profession, along with
technological standardization, has somewhat lowered the bar of technical knowledge
required to perform the role of data scientist, which poses a real risk to the
quality of the decision-making process. Tools that automate part of the work and require
less specialized knowledge spread and streamline the practice of extracting value from
data, without the need to have a data scientist, or at least a data analyst,
on the payroll.
Where have data scientists studied?
When looking at data scientists’
academic backgrounds, it’s surprising
that Business Administration is the
second-most common course of study.
Source: “The State of Data Science”, Stitchdata.com
Another advantage of in-house training stems from the unique nature of the data
scientist’s work. Their concerns and personal motivations do not always coincide with
those of other professionals. Their passion for research (let’s not forget that we’re
talking about scientists) and their motivation to learn may actually outweigh the priority
they give to factors such as their rank in the company, advancement, salary
or responsibilities. In this regard, the profile lies halfway between professional and
academic, although we must remember that performance metrics in a company are not the
same as those at a university.
Supermen and superwomen? No, super teams!
Statistics, Technology, Analytics, Communication... Without forgetting human qualities. Is
this skill set very difficult to find in a single person? Probably so, simply because
few people can do all of that. The alternative is simple: working in
multidisciplinary teams. This involves creating groups that, as a whole, cover all these
qualities: a collaborative effort that goes beyond the work of a single person, where the
most important thing is to create a climate where curiosity, motivation, knowledge sharing
and cooperation are encouraged.
Each team member has a clearly defined role, and does not need to know everything: the
modelling expert will work alongside the analytics expert; and the business specialist with
the head of communication. But what is important is that the generalist data scientist has
a global vision of the entire work process, which will avoid situations where, for example,
they invent a mathematical model that cannot be run with the available hardware.
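The hardware example can often be caught with back-of-the-envelope arithmetic before a model is chosen. The sketch below is an assumption-laden estimate, not a sizing tool: it computes the memory needed to hold a dense matrix of 64-bit floats (8 bytes per value) and compares it with the available RAM, using a hypothetical headroom factor of 2 because many algorithms make working copies of the data. The customer and feature counts are invented.

```python
def dense_matrix_gib(rows, cols, bytes_per_value=8):
    """Memory for a dense rows x cols matrix of 64-bit floats, in GiB."""
    return rows * cols * bytes_per_value / 2**30


def fits_in_memory(rows, cols, ram_gib, safety_factor=2.0):
    """Rough feasibility check; `safety_factor` is an assumed allowance for working copies."""
    return dense_matrix_gib(rows, cols) * safety_factor <= ram_gib


# Hypothetical case: 50 million customers x 200 features on a 64 GiB machine
print(round(dense_matrix_gib(50_000_000, 200), 1))  # 74.5
print(fits_in_memory(50_000_000, 200, ram_gib=64))  # False
```

A generalist who runs this kind of check early can steer the team toward sampling, sparse representations or distributed processing before the model is built.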
The group should operate smoothly, within a dynamic rather than a rigid structure, because
once the general problem has been identified, specialists focused on a particular area can
be brought in. Such smooth operation, besides oiling the wheels of the team, will allow
each group member to focus on the areas that most appeal to them.
“Right now, there is demand
from our Data Science
students even before they
complete their training”.
Esteban Moro.
The ideal CV
Looking to work as a data scientist? In that case, you should make sure
that your CV features as many of the following skills and
qualifications as possible:
• Programming
- R
- Python
- Spreadsheets
- JavaScript and HTML
- C/C++, Java or Julia
• Statistics
- Descriptive and inferential statistics
- Experimental design
• Mathematics
- Functions and graphs
- Multivariable calculus
- Linear algebra
• Data management
- Database systems
- SQL
• Data communication and visualization
- Visual coding
- Data presentation
- Knowledge of audiences
• Bonus: Intuition
- Project management
- Industry knowledge
• Machine learning
- Supervised learning
- Unsupervised learning
- Reinforcement learning
And an essential complement: a good command of English, the language in which an
enormous amount of new knowledge is generated.
How much does each specialist earn?
Salaries (in the US)
Data Scientist: $113,000 / year
Big Data Specialist: $62,000 / year
Data Analyst: $60,000 / year
Source: Glassdoor.com
5. The Data Scientist’s Tools
“Expectations are the issue.
Companies don’t understand that
in research, there are times when
things just don’t work out”.
Alejandro Rodríguez.
Construction of data processing systems, databases, visualization tools, and data wrangling tools
Within engineering related to the construction of data processing systems, there are three
basic tools to embark upon the analysis of huge volumes of information: the programming
languages Python and R, and the Hadoop framework. While these tools are relatively new and
not as widespread, they are easier to grasp for professionals already proficient in
programming languages like Java or C.
R Project. Considered the standard among statistical programming languages, some
know it as “the golden boy” of data science. R is a free software environment dedicated to
statistical computing and graphics, compatible with UNIX, Windows, and MacOS platforms.
It is a must in data science, and being proficient in it practically guarantees a job offer,
given the increasing number of commercial applications and its advantageous versatility.
- R is free: anyone can install, use, upgrade, clone, modify, redistribute, and even
resell R. Not only does it save money on technology projects, but it also provides
constant updates, which are always useful for any statistical programming language.
- R is a high-performance language that helps users handle large datasets,
making it a great tool for managing big data. It’s also well suited to resource-intensive
simulations.
- Given all its advantages, it is increasingly popular. It has about 2 million users,
who make up an active and supportive community. There are more than 2,000
free libraries with statistical resources devoted to finance, cluster analysis, and
much more.
Any cultural change is costly or
takes a long time; and if it’s short,
it’s traumatic.
Python. Another flexible and straightforward open-source programming language. A
programmer working with Python ends up writing less code thanks to its “friendly”
features for beginners, such as code readability, simplified syntax and ease of
implementation.
- As with R, programming in Python is suited to a great deal of industries and
applications. Python powers Google’s search engine, as well as YouTube, Dropbox,
or Reddit. Institutions such as NASA, IBM, and Mozilla also depend heavily on
Python.
- Python is also free, which benefits startups and small businesses. Since the language
favors simplicity, it can be handled by small teams. And a good knowledge of the
basics of this object-oriented language lets you migrate to another similar language
just by learning the syntax of the new language.
- As a high-performance language, Python is the option often chosen to construct
fast-access applications. Plus, its huge library of resources provides the necessary
help to ensure that productivity is just a few clicks away.
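As a small illustration of the readability the text attributes to Python, the sketch below summarizes a handful of invented sales records using only standard-library modules (`statistics`, `collections`). The records and their (region, value) layout are assumptions made for the example.

```python
from collections import Counter, defaultdict
from statistics import mean, median

# Invented sample records: (region, order value in euros)
orders = [("north", 120.0), ("south", 80.0), ("north", 95.5), ("east", 60.0), ("south", 140.0)]

values = [value for _, value in orders]
print("orders per region:", Counter(region for region, _ in orders))
print("mean order value:", mean(values))
print("median order value:", median(values))

# Total revenue per region in a few readable lines
revenue = defaultdict(float)
for region, value in orders:
    revenue[region] += value
print(dict(revenue))
```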
Hadoop. Another staple for anyone who wants to venture into the analysis of big data.
Available as an open-source framework, Hadoop facilitates the storage and processing
of huge amounts of data. It is considered the cornerstone of any flexible and forward-
thinking data platform.
- Hadoop is one of the technologies with the greatest potential for growth within the
data industry. Companies like Dell, Amazon Web Services, IBM, Yahoo, Microsoft,
Google, eBay, and Oracle are firmly committed to Hadoop’s implementation.
- One of its major benefits is helping companies with their marketing needs:
identifying customer behavior patterns on the website, providing recommendations
and custom targeting, etc.
- Hadoop opens up great career opportunities in a wide variety of positions. Given
its relevance in many industries, Hadoop specialists can find work as architects,
developers, administrators or data scientists.
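Hadoop’s processing model, MapReduce, splits a job into a map phase that emits key/value pairs, a shuffle that groups the pairs by key, and a reduce phase that aggregates each group. The sketch below imitates those three phases in plain Python for the classic word-count example. It runs on a single machine and does not use Hadoop’s API; the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit (word, 1) for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: aggregate the values of each key (here, a sum)."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return reduce_phase(shuffle_phase(mapped))


print(word_count(["big data big ideas", "Data beats ideas"]))
# {'big': 2, 'data': 2, 'ideas': 2, 'beats': 1}
```

In a real Hadoop cluster, many mappers and reducers run in parallel on different nodes and the framework performs the shuffle between them; the value of the model is that each phase only ever sees a slice of the data.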
“The reality of Data Scientist’s work is
that you do not know what you’re going
to find behind the data. If you want to
work agilely, you have to be flexible and,
above all, be very practical”.
Álvaro Barbero.
Another frequent interaction in the data scientist’s work is with databases. Here it’s
common to work with NoSQL databases and with processing tools such as Apache Spark and
Apache Storm.
Visualization tools are not as important for creating value as they are for convincing. In
this sense, they’re associated with the results communication phase and the actual work
of rediscovering the value of the data: it’s not the same to trawl through numbers as it is
to present them. Programs such as QlikView, Tableau, and Spotfire are used for this.
Finally, there’s a pretty unglamorous part of the data scientist’s work, which is a process
known as data wrangling. Raw data is often presented in a confused or imperfect way,
so the data first needs to be manually collected and cleaned up before it can be converted
into a structured format to be explored and analyzed. And this is a task that can take
up more than 50% of the data scientist’s working time, using tools like OpenRefine or
Fusion Tables.
Open source or proprietary software?
As in any area where specific software is required, data science professionals can choose
between programs marketed by private companies and open-source software.
Before embarking on a data science project, it’s very important to know exactly which
technology will be required, so that resources and budgets can be adapted accordingly.
This is one of the reasons why more and more companies are opting for the flexibility of
open-source alternatives. The variety of options arising from the open-source environment
has also helped to expand the use of new technologies and knowledge. Fee-charging
commercial tools that dominated the market until recently are increasingly seeing their
prominence diminished in favor of free alternatives.
Some experts have warned about vendors who try to impose their commercial
solutions on businesses, which end up investing heavily in proprietary applications that
always have an open-source alternative. This lock-in can be avoided with open-source
projects, which are scalable and can offer performance comparable to that of proprietary
software.
6. Getting down to it: the work process
“Some people get scared because
they think you want to impose an
army of mathematicians on them”.
Manuel Marín.
The coexistence between analysts and specialists in a company within mixed teams
involves starting out on a journey that will ideally culminate in the opening of new lines
of business. Results don’t sprout up from one day to the next, but data science makes once
seemingly unattainable milestones feasible.
Three obstacles before accessing data
Before buckling down to work, the data scientist first must overcome three obstacles:
1. Access to data
Many companies may amass huge amounts of customer data, but the nature of their
services imposes restrictions related to security and privacy. This presents a ‘chicken
and egg’ type of dilemma: as a condition for access to data, management will want to
know the potential value it can bring to the company. No matter how much the analyst
may insist, the real benefits for the company cannot be demonstrated if the
necessary data cannot be accessed.
How can we get out of this quandary? One way of doing so is by pressing on through
scaled models which progressively show the management team the benefits analytics
can bring. Access to a sample of data will help create a model that solves a specific
problem. A small-scale study of specific customers, which can trigger a decision with
immediate impact on the company, is a good starting point. Once the management team
can verify the model’s suitability, by applying it to immediate decisions, the first step
will have been taken.
In this scenario, choosing a suitable problem that has a visible impact on the business
is crucial. Therefore, the analyst needs to show their skills, intuition, and knowledge of
the business. It goes without saying that a model built from a limited sample will have
limited significance, but it is a necessary step toward opening the doors to the data.
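The small-scale pilot described above usually starts from a random sample. A minimal sketch, assuming customer records can be referred to by a list of IDs: drawing a simple random sample with a fixed seed makes the pilot reproducible, so the same subset can be re-examined when management reviews the result. The ID format, sample size and seed are hypothetical placeholders.

```python
import random


def pilot_sample(customer_ids, size, seed=42):
    """Simple random sample without replacement, reproducible via the seed."""
    if size > len(customer_ids):
        raise ValueError("sample larger than population")
    return random.Random(seed).sample(customer_ids, size)


customers = [f"C{n:05d}" for n in range(100_000)]  # hypothetical customer IDs
sample = pilot_sample(customers, 500)
print(len(sample), sample[:3])
```

Recording the seed alongside the pilot’s documentation makes it auditable; as the text warns, conclusions drawn from a few hundred customers still carry limited statistical weight.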
“There will be a lot of demand
from companies that we could
consider more traditional”.
Bosco Aranguren.
2. Technological means
Having overcome the first obstacle, the next one appears: having the necessary
technological infrastructure to support access to data, analysis, and the exploration
of results.
It’s not about looking for a culprit if such means are not available: there might not be
anybody in the organization cognizant of the impact that data analysis can have on the
business. But, this path offers no shortcuts: if this work isn’t done, someone will have
to deal with it.
A further problem that often comes up is the decentralization of data. With disaggregated
departments and dispersed databases, each with its own access and security protocols, the
data scientist, sometimes with the help of an engineer, will have to focus on grouping the
data in one place, before they can even get to work.
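Grouping data held by different departments usually means joining records on a shared key. The sketch below assumes, hypothetically, that billing and support keep separate tables keyed by the same customer ID. It merges them into one record per customer and keeps customers that appear in only one source (an outer join), so gaps stay visible instead of being silently dropped.

```python
def merge_sources(billing, support):
    """Outer-join two {customer_id: {field: value}} tables into one record per customer.
    If both sources share a field name, the later source wins (an assumption a real
    project would need to resolve explicitly)."""
    merged = {}
    for source_name, table in (("billing", billing), ("support", support)):
        for customer_id, fields in table.items():
            record = merged.setdefault(customer_id, {"sources": []})
            record.update(fields)
            record["sources"].append(source_name)  # keep track of where data came from
    return merged


billing = {"C001": {"monthly_fee": 30.0}, "C002": {"monthly_fee": 45.0}}
support = {"C002": {"open_tickets": 3}, "C003": {"open_tickets": 1}}
for customer_id, record in sorted(merge_sources(billing, support).items()):
    print(customer_id, record)
```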
3. Human resource management
Part of data science, like any other science, is exploration. And exploration calls for a great
deal of inspiration and as few creativity-stifling strict orders as possible.
Passion, perseverance, and curiosity are qualities required in this type of work, and
are often not compatible with rigid organizational structures. Therefore, managers
must be patient and understanding and, always within the constraints dictated
by financial results, should grant the data scientist the necessary time and freedom to
move forward with his or her investigation. Once a balance has been struck between
what motivates employees and the business’s priorities, the results should start to
appear.
From data to decision... if nothing goes wrong
Once the data is available, the data scientist generally follows a step-by-step process. He or
she will have to devote a large share of the time to cleaning the data, and then set off on a route
that begins with small samples and will end, if all goes well, with the extraction of useful
conclusions based on a predictive model.
“Oftentimes the reason they end
up hiring you astonishes you”.
Manuel Marín.
If all goes well... Because data science is not a foolproof process. As in any research project,
there are no absolute certainties. Therefore, we must be prepared for possible failure,
however hard that may be to accept for companies that have high expectations and often do
not consider the possibility of getting no results.
In projects involving vast databases, it’s not always necessary to use all the data.
Therefore, it is important to scale: starting with a manageable database, going back
and forth, and setting up a permanent dialogue with the person or department most
interested in the project. Then, once a small insight into the potential scope has been
gained, scaling can begin.
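The “start small, then scale” approach can be sketched as estimating a quantity on growing samples and stopping once the estimate stops changing much. The example below is a minimal sketch with invented data, a hypothetical 1% tolerance and sample sizes that double each round; it estimates a mean purchase value, whereas a real project would more likely track a model’s validation metric.

```python
import random
import statistics


def estimate_until_stable(population, start=100, tolerance=0.01, seed=7):
    """Estimate the mean on doubling samples until the relative change between
    consecutive estimates is within `tolerance`, or the whole population is used."""
    shuffled = population[:]
    random.Random(seed).shuffle(shuffled)  # one shuffle: each sample contains the previous one
    size, previous = start, None
    while True:
        size = min(size, len(shuffled))
        estimate = statistics.fmean(shuffled[:size])
        if previous is not None and abs(estimate - previous) <= tolerance * abs(previous):
            return size, estimate
        if size == len(shuffled):
            return size, estimate
        previous, size = estimate, size * 2


rng = random.Random(1)
purchases = [rng.gauss(50, 15) for _ in range(200_000)]  # invented purchase values
size_used, mean_estimate = estimate_until_stable(purchases)
print(size_used, round(mean_estimate, 2))
```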
The road to this point is sometimes littered with issues related to decision-making:
the focus of the investigation, the data to be used, the analytics to be applied… Technical
knowledge alone does not prepare anyone for the particulars of each project, which are
always subject to unforeseen circumstances that are not covered in training centers.
The ratio between available information and decisions is very unbalanced towards
the former. The process of transforming data into decisions may lead to swathes of
information being lost, and the way the process is transmitted plays a role in this journey.
An important decision for the company cannot be conveyed if it is not backed up with
solid arguments about the source of this conclusion, which data has been used and which
processes have been followed to analyze this information and turn it into the nugget that
is the decision.
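One lightweight way to keep the arguments behind a decision (which data was used and which processes were followed) is to record them alongside the result. The sketch below is a hypothetical structure, not a standard; the class name, fields, file names and steps are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DecisionRecord:
    """Ties a recommendation to the data sources and processing steps behind it."""
    recommendation: str
    data_sources: List[str]
    steps: List[str] = field(default_factory=list)

    def add_step(self, description: str) -> "DecisionRecord":
        """Log one processing step, in the order it was applied."""
        self.steps.append(description)
        return self  # returning self allows chained calls

    def summary(self) -> str:
        """Plain-text account to accompany the recommendation."""
        lines = [f"Recommendation: {self.recommendation}",
                 "Data used: " + ", ".join(self.data_sources),
                 "Steps followed:"]
        lines += [f"  {number}. {step}" for number, step in enumerate(self.steps, start=1)]
        return "\n".join(lines)


# Hypothetical example; file names and steps are invented
record = DecisionRecord("Offer a retention discount to high-risk customers",
                        ["billing_2023.csv", "support_tickets.csv"])
record.add_step("Removed duplicate customer records")
record.add_step("Trained a churn model on 70% of customers, validated on the rest")
print(record.summary())
```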
7. Evaluating the data scientist’s work
In what industries can we find data scientists?
Technology-heavy industries
account for the largest
accumulation of data scientists.
Source: “The State of Data Science”, Stitchdata.com
George E. P. Box, considered one of the most important statisticians of
the twentieth century, famously said: “All models are wrong, but some are useful”.
Wrong in the sense that they cannot capture all the details of a system, because if they did,
the model would be so complex that it would defeat the very purpose of modeling.
Yet that does not render models useless; it does force them to be constantly reinterpreted
and validated using empirical data and knowledge of the system itself, regardless of the
techniques or algorithms used in the analysis.
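In the spirit of Box’s remark, a model’s usefulness can be judged empirically by whether it predicts new observations better than a naive baseline. A minimal sketch, assuming mean absolute error as the yardstick and a baseline that always predicts the historical average; both choices, and all the numbers, are illustrative.

```python
def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def is_useful(actual, model_predictions, history):
    """'Wrong but useful': the model beats a naive baseline that always predicts
    the historical mean. Returns (useful?, model MAE, baseline MAE)."""
    baseline_value = sum(history) / len(history)
    model_mae = mean_absolute_error(actual, model_predictions)
    baseline_mae = mean_absolute_error(actual, [baseline_value] * len(actual))
    return model_mae < baseline_mae, model_mae, baseline_mae


history = [100, 110, 90, 105, 95]   # past observations (invented)
actual = [120, 80, 130]             # new observations the model never saw
model_predictions = [115, 85, 125]  # the model's forecasts for them
print(is_useful(actual, model_predictions, history))
```

Here the model is still “wrong” (its errors are not zero), but it is useful because it beats the baseline on data it was not built from.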
How can we measure the results of the data scientist’s work? First, we must take the
time horizon into account: benefits are never seen in the short term. The data scientist
develops a predictive model, whose execution depends on whether it is accepted by
management. Machine learning techniques are then applied to the model to
improve its accuracy.
For team leaders, it is important to emphasize the work’s practical application. It is
fundamental, especially in large companies, to ensure that algorithms do not end up simply
as beautiful theories. The responsibility of the data scientist can officially be wrapped up
once they have finished constructing their model, but personal responsibility presses on,
even at the risk of sounding gloomy, until the model is run.
Then comes the wait for results. Models are not foolproof: a key parameter may have been
left out, either because a wrong variable altering the outcome has been entered or because
the subtleties of the business have not been grasped. Execution may also fail: the insight
might be good, but it is not put into practice in the right way.
The quality of the algorithm is not the only yardstick for measuring the data scientist’s
performance. Their responsibilities include some sales-related work: dealing with
customers, explaining to them what they have found and guiding them on what to do with
their data, always using the communication skills that the data scientist (or any member
of their team) should have. This work provides another basis for evaluation.
Finally, let’s once again remind ourselves of the importance of the human factor. Data
science is not a black box enshrouded in mystery. Data scientists are not oracles, nor are
their words prophecies: the algorithm may make a specific prediction, but the option to
translate that insight to the business or not, with all the consequences that it may incur,
ultimately depends on the person who makes the decision. Hence the importance of the
human factor in the whole process.
72 Data Scientists: Who are they? What do they do? How do they work?
8. Trust: an essential component in the data science process
“In terms of training, I don’t think there is a gap between Spain and the United States or the United Kingdom”.
Pep Porrà.
Data is highly sensitive, especially when working with outside information. In such cases, the customer relationship should be respectful and diplomatic: it’s their business, it’s their data, and it’s often their most valuable asset.
In some industries, there is a certain interest in earning a return on data, but a lack of experience with big data makes companies wary before they even dare to venture into data analytics. Younger companies are more cautious, perhaps waiting for others in their industry to take the first step.
It’s also common for companies to take the big data route but later be reluctant to hand over their data, either because they hold back from sharing any conclusions with the market or because they don’t even want analysts to see it. In this context, the most common formula is: the company acquires the tool, its team is trained to use it, and support is provided afterwards.
Another delicate situation arises from the dangers of do-it-yourself data science. Some people blindly apply tools after learning about them only superficially, with unpredictable results. This creates noise that is detrimental to the entire data science industry: companies don’t obtain the advertised benefits of big data and don’t truly understand why they haven’t reaped the full rewards.
Many companies are disoriented: they have heard the fanfare about big data and spend large sums without knowing what they are spending them on, or have yet to see any results. They need to be treated sensitively, with sound judgement and common sense, by clarifying and simplifying the guidelines for action. In an industry where the raw material is so sensitive, trust is essential.
Ethics: the essential complement to science
The data scientist takes on a strong ethical commitment: they must ensure that the information entrusted to them is used responsibly. In an increasingly digitalized society where everyone leaves trails, often unwittingly and involuntarily, it would be possible to encroach on anybody’s freedom simply with the appropriate knowledge and powerful servers. But nobody wants that to happen.
Ethical commitment is not just a sign of sound judgement; it is also imperative in an information society that may face dangers that are not yet fully understood: mass surveillance, lack of privacy, large-scale data loss, etc. It is therefore the data scientist’s duty to work transparently, explaining in a simple and accessible way what their job is and how they do it, to dispel the threat to privacy that people often associate with big data. Few people are interested in the intricacies of an algorithm, but they do want an outline of the route that the data follows.
“Clients sometimes come across things that they weren’t expecting, and communicating them requires specialists who are very good with people”.
Felipe Ortega.
One way to ensure that data is used ethically is to work on open data projects, where anyone can access the data and which contribute social awareness and social utility. For example, Spanish bank BBVA has launched several of these projects, designed to improve citizens’ quality of life or to make cities more efficient through the intelligent use of information.
Open the data, give something back to society, become an aggregated data platform for
others to use for the creation of value in cutting-edge projects where altruism replaces
the quest for profit. That is the ethical commitment that many data scientists have taken
to safeguard the good name of their specialty.
9. Data scientists in Spain today
To stay on top of everything and constantly refresh one’s knowledge, curiosity is essential.
Are Spanish data scientists more qualified or less qualified than other nationalities? Is there
a shortage of professionals? Will academic programs keep up with the expected demand in
the years to come?
Overall, experts agree that Spain is on a par with the leading countries in data science.
There is no shortage of highly qualified professionals or startups specializing in big data
processing which stand out among the most advanced in Europe, if not the world. The
professional level is so high that it’s not unreasonable to think of Spain as a global
powerhouse in data science.
This opportunity must be managed well so that it is not wasted. As in other scientific disciplines, excellent professionals are leaving for other countries to pursue their careers. It’s true that money draws professionals to places like California, but a high concentration of talent does not necessarily imply a higher level. For Spanish data scientists to prove their worth, they should start by valuing themselves, acting with professionalism and discretion to ensure a promising future.
The range of academic programs is also increasingly extensive in both public and private institutions, which offer countless Master’s programs and specialized courses. This mix is indispensable in a discipline that is constantly intertwined with innovation and research.
So, if anything were to jeopardize the advancement of data science in Spain, it wouldn’t be the academic level of specialists, but rather some of the endemic problems stemming from how work is organized in Spanish corporations. For example, projects are implemented with far less agility than in the United States, where there are fewer bureaucratic obstacles. Similarly, there is still a gap between academia and the business world, and a lack of dynamism when integrating the work of a data scientist into business operations.
Some claim that the Spanish labor market offers less flexibility when it comes to retraining. Once professionals have settled on a career path, taking the risk to change it is harder than in other countries, due to a tendency toward staying in one’s comfort zone or being pigeonholed. It is therefore important for organizations to support their employees.
That said, Spanish professionals, as well as those from Latin America, have a bonus that can give them a competitive advantage over their peers in the rest of the world: creativity, understood as the ability to seek out alternative problem-solving processes that nobody else has imagined. This fits in with and complements the empathy side. In other words, creativity lets Spanish data scientists bring art, the counterpart of science, to problem-solving.
“Everyone must realize that our daily life is going to be very dependent on and influenced by data analysis”.
Felipe Ortega.
Who’s making the most out of data science in Spain?
Three industries are at the forefront of the implementation of data science in Spain:
banking, telecommunications, and tourism. Overall, large companies are investing more
resources into data science. These include entities such as Santander, BBVA, Telefónica,
Bankinter, Sabadell, La Caixa, Amadeus, Kayak, etc.
But this investment isn’t limited to large companies. Smaller companies are also using data science in very creative and innovative ways, earning worldwide recognition for their work. Two examples:
Carto
http://www.cartodb.com
Founded in Madrid in 2012, originally as CartoDB. Its most popular tool is Carto
Builder, which allows visualization enthusiasts to build interactive maps from
geodata with no programming skills required. With more than 1,400 customers,
200,000 registered users and an office in New York, its goals focus on offering
large corporations an optimization tool for decision-making and predicting
consumer trends.
Stratio
http://www.stratio.com
Also founded in 2012, as an offshoot of its predecessor Paradigma. Stratio develops
platforms and products from big data technologies such as Cassandra, Apache
Spark, and proprietary developments. Customers using its real-time processing
solution come from banking, insurance, tourism, and retail. More than 25
specialists in big data architecture work out of Stratio’s Madrid headquarters.
Stratio also has an office in Palo Alto, California, the heart of Silicon Valley.
10. Conclusions: still a great deal to be done
“People ask us: are you opening up data so that everyone can do business? Well, yes: we let others have a better knowledge of reality from our data”.
Marcelo Soria.
The analysis of big data has already moved past the emerging-technology phase of the hype cycle and is taking hold in many companies. Or, at the very least, certain “core” technologies are, such as distributed databases, real-time processing, and large analytical layers.
With the initial implementation phase wrapping up, data science professionals are moving toward specialization. As the field continues to grow, it is natural for it to split into specialties and form an ecosystem. To some extent, companies are promoting this trend because they cannot afford to properly compensate large teams of data scientists.
The same is happening in training. It’s no longer enough to offer a single set of core courses, so academic content is beginning to diversify. As companies define their needs, they will increasingly demand these sought-after professionals, who are often awarded grants by the companies that recruit them or guaranteed employment upon completing their education.
Many companies invest huge sums in market research. Some will realize that data science represents another data source, a new form of R&D that converts data into new value for the company.
But big data is still in its teenage years. Many challenges lie ahead, derived from handling
large volumes of information and its conversion into useful tools.
What does the adulthood of big data look like?
Attention should be shifted from the “bigness” of data to its application. The famous
“Four Vs” of big data (Volume, Velocity, Variety and Veracity) must be expanded to
bring in a new concept: Value. This involves reducing the noise of data and increasing its
contribution.
Data science will mature, strengthen its position, gain recognition as a career and surprise us with future discoveries. It should be conceived as a tool not only to bring transparency to the present but also to anticipate the future in a way conducive to business growth.
“It is our duty to give something back to society. With all the information companies have about people, they can greatly improve their lives”.
Richard Benjamins.
This will be possible by converting data into knowledge, and that knowledge into practical
actions, whether to provide better customer service, boost efficiency through automation,
or create new business opportunities by identifying cross-sells or opening new markets.
At present, most projects related to data analysis focus on cost optimization and process
integration. In the future, predictive analysis will place emphasis on data monetization
and the delivery of new applications and business opportunities. Predictive models in
cloud environments, parallel data processing or sophisticated machine learning algorithms
will optimize or guide the decision-making process.
Ultimately, companies will have to reinvent or reinterpret themselves as their business becomes more digital and their customer propositions increasingly depend on lessons learned from data. Companies like Siemens, which its CEO describes as “a software company”, have already fully embarked on this process. A key element of this evolution will be an environment of experimentation, tolerance, and short development cycles that drive innovation.
The companies leading this evolution will be those that place the data scientist at the core of their strategy. This way, they will be able to create the conditions (talent acquisition, employee commitment and priority-setting) needed to lead the race to turn data into a long-lasting and tangible competitive advantage.
In our daily lives, we are already using applications and products that come from processing a huge amount of data: spam filters in email inboxes, recommendations on social networks, search engine results, medical tests and prescriptions, investment funds, etc. And with the future promised by the Internet of Things, the need to process more and more information will only keep growing. Our lives may end up highly conditioned, or at the very least heavily influenced, by the analysis of all the data surrounding us.
In any case, in this future, everyone involved in the analysis of big data should be very careful with everything related to data privacy and consumer confidence. Whether our data is used to manage our time or money better, customize the advertising we see, or improve our health, if we believe that it will improve our lives, we won’t object to its use.
Annex
Business case 1
Commerce360
What are my customers most interested in? On what day does my competition outsell me?
Are their items more expensive or cheaper than mine? When do I sell the most? Where
do my buyers live? What is their gender, their age, and how much do they spend on every
purchase?
Any business would like to know the answers to these and similar questions. Large and medium-sized companies can do this by allocating resources to business intelligence, but it’s more difficult for independent traders or local stores.
That’s why Spanish bank BBVA has developed Commerce360, a tool that aims to make
business intelligence accessible to any company. Based on aggregated and anonymous data
from BBVA card payments, the application extracts indicators related to the industry and
profile of customers who buy items in a specific area.
“Commerce360 is a tool for retailers, where by using our information on card payments we can
provide a store with its economic activity, purchasing dynamics, socio-demographic information
on what its customers are like, age, gender, where and when they shop, etc., comparing all this
with aggregated businesses that are their competition or other businesses in the area that perform
the same type of activity,” as Marcelo Soria explains.
As a result, retailers once guided by intuition or other traditional methods have access to an
analytical tool that lets them discover the origin of their customers, measure their loyalty,
study their demographic characteristics and identify high-value customers. “For us it is a
very interesting line for democratizing access to data and data-based intelligence. This is
the future of retail,” adds Soria.
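The text does not describe how Commerce360 computes its indicators. As a minimal sketch of the general idea, assuming a hypothetical record format of (merchant, age band, amount) with no customer identifiers and an assumed minimum group size below which figures are suppressed, the Python below aggregates anonymized payments and benchmarks one store against the other businesses in its area. The names `store_indicators` and `MIN_GROUP_SIZE` are illustrative, not part of BBVA’s tool.

```python
from collections import defaultdict
from statistics import mean

# Assumed suppression threshold: groups with fewer payments are not reported,
# so that no indicator can be traced back to an individual cardholder.
MIN_GROUP_SIZE = 3


def store_indicators(payments, merchant_id):
    """Compare one store with the rest of its area, per customer age band.

    payments: iterable of (merchant_id, age_band, amount) tuples taken from
    anonymized card payments (hypothetical schema, no customer identifiers).
    """
    own = defaultdict(list)   # amounts spent at the store being analyzed
    area = defaultdict(list)  # amounts spent at other stores in the same area
    for merchant, age_band, amount in payments:
        (own if merchant == merchant_id else area)[age_band].append(amount)

    report = {}
    for band in sorted(set(own) | set(area)):
        n_own, n_area = len(own[band]), len(area[band])
        if n_own < MIN_GROUP_SIZE or n_area < MIN_GROUP_SIZE:
            continue  # too few payments to publish safely
        report[band] = {
            "store_avg_ticket": round(mean(own[band]), 2),
            "area_avg_ticket": round(mean(area[band]), 2),
            "store_share": n_own / (n_own + n_area),
        }
    return report


payments = [
    ("shop_a", "25-34", 20.0), ("shop_a", "25-34", 30.0), ("shop_a", "25-34", 25.0),
    ("shop_b", "25-34", 40.0), ("shop_c", "25-34", 50.0), ("shop_b", "25-34", 45.0),
    ("shop_a", "65+", 15.0),
]
print(store_indicators(payments, "shop_a"))
# {'25-34': {'store_avg_ticket': 25.0, 'area_avg_ticket': 45.0, 'store_share': 0.5}}
```

The single “65+” payment is dropped rather than reported, which is the kind of safeguard that makes aggregated data safe to share with a retailer.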
Business case 2
Smart Steps
Smart Steps is a geo-marketing program developed by Telefónica using data from its mobile phone network. Data is anonymized, aggregated, and extrapolated to extract information on user trends or behavior patterns in a specific area.
The project captures billions of data points from Telefónica’s mobile network, 365 days a
year, 24 hours a day. This data is matched with different sociodemographic and mobility
indicators (residence, means of transport, age) that can offer companies precise targeting
based on the movements of their potential customers.
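One way to picture anonymous aggregation and extrapolation is the following Python sketch. It assumes, beyond what the text states, that each device is counted once per zone and hour, that small counts are suppressed, and that counts are scaled up by the operator’s market share in each zone to estimate the whole population. The schema, the `MIN_DEVICES` threshold, and the market-share figures are hypothetical; Telefónica’s actual method is not described here.

```python
from collections import Counter

# Assumed publication threshold: (zone, hour) cells seen by fewer devices are dropped.
MIN_DEVICES = 10


def estimate_footfall(observations, market_share):
    """Estimate people per (zone, hour) from one operator's anonymized network events.

    observations: iterable of (device_hash, zone, hour) tuples (hypothetical schema)
    market_share: dict zone -> fraction of people in that zone who use this operator
    """
    # Count each pseudonymous device at most once per zone and hour.
    unique = {(device, zone, hour) for device, zone, hour in observations}
    counts = Counter((zone, hour) for _device, zone, hour in unique)

    estimates = {}
    for (zone, hour), n_devices in counts.items():
        if n_devices < MIN_DEVICES:
            continue  # too small to publish as an anonymous aggregate
        # Extrapolate from this operator's customers to the whole population.
        estimates[(zone, hour)] = round(n_devices / market_share[zone])
    return estimates


events = [(f"dev{i}", "old_town", 18) for i in range(12)]
events.append(("dev0", "old_town", 18))                          # same device seen twice
events += [(f"dev{i}", "station", 18) for i in range(100, 103)]  # only 3 devices
print(estimate_footfall(events, {"old_town": 0.4, "station": 0.5}))
# {('old_town', 18): 30}
```

Dividing by market share assumes the operator’s customers move like everyone else; a production system would likely also correct for demographic differences between customers and the general population.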
Smart Steps can be applied to any industry in which the movement and knowledge of the
user profile are important, such as travel and transport, tourism, or outdoor advertising.
For example, local retailers could find out whether participants in an event such as San
Fermín are regular or sporadic, where they come from, where they are staying, the length
of their visit, etc., and with this information they can tailor their sales approach.
It is also useful in the public sector, as knowing people’s movement patterns helps improve
traffic management in the city, adapt public transport, or analyze the need to build new
infrastructure. In 2014, the program was used to map out the most crime-prone areas
in London: the generated algorithm obtained an accuracy of 70% when predicting crime
hotspots.
Business case 3
Home Fire Risk Map
Every year, 25,000 people are killed or injured in house fires in the United States. The American Red Cross aims to reduce the number of victims through an initiative based on big data.
The Home Fire Risk Map program identifies the locations across the country most prone to house fires, and will be used by Red Cross volunteers to install smoke alarms and provide fire safety courses where they’re needed most. Data suggests that 60% of fires can be prevented simply by having a working smoke alarm and knowing what to do in the event of a fire.
Using different open data repositories, 50 volunteers worked for over a year to create a map
that identifies high-risk areas throughout the country. First, they built a model to identify
those communities with the least amount of smoke alarm coverage. After that, another
algorithm predicted the places most prone to fires. Lastly, a third program calculated the
likelihood of injury or death when a home fire does occur. The three models and their
results come together on the map presented here.
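The text says the three models’ results come together on one map but not how. A minimal Python sketch, assuming one plausible combination rule (multiply the share of homes without alarms by the expected fire rate and by the probability of casualties) and invented tract data, shows how such scores could rank areas for volunteer visits. The `Area` fields and the rule itself are assumptions, not the Red Cross model.

```python
from dataclasses import dataclass


@dataclass
class Area:
    name: str
    alarm_coverage: float  # model 1 output: share of homes with a working alarm (0-1)
    fire_rate: float       # model 2 output: expected home fires per 1,000 households/year
    casualty_prob: float   # model 3 output: chance a home fire injures or kills someone


def risk_score(area: Area) -> float:
    """Hypothetical combination rule: expected casualties per 1,000 households
    coming from homes without a working smoke alarm."""
    return (1 - area.alarm_coverage) * area.fire_rate * area.casualty_prob


def prioritize(areas, top_n):
    """Names of the top_n areas where volunteers would install alarms first."""
    ranked = sorted(areas, key=risk_score, reverse=True)
    return [area.name for area in ranked[:top_n]]


areas = [
    Area("tract_a", alarm_coverage=0.9, fire_rate=5.0, casualty_prob=0.1),
    Area("tract_b", alarm_coverage=0.5, fire_rate=4.0, casualty_prob=0.2),
    Area("tract_c", alarm_coverage=0.2, fire_rate=2.0, casualty_prob=0.1),
]
print(prioritize(areas, top_n=2))  # ['tract_b', 'tract_c']
```

Note that tract_a has the highest fire rate but ranks last, because most of its homes are already protected; combining the three models is what surfaces that trade-off.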
Thanks to this initiative, launched in June 2016, 400,000 smoke alarms were installed in households across the United States in the first month, with the goal of reaching 2.5 million. Smoke alarms have an average lifespan of 10 years, so a year’s work is expected to yield medium-term benefits.
Business case 4
The Huffington Post
The Huffington Post is one of the most widely read digital media outlets in the world. It is also an environment where data analysts enjoy almost as much prominence as editors, since much of its success is due to big data, which optimizes content, authenticates comments, boosts advertising clout, and improves user experience.
Real-time statistics and analytical platforms define the editorial process. For HuffPost
it is essential to provide the right content to each reader straight away and in the
right format. For example, data analytics for the Parents section showed that this
demographic mainly uses mobile devices to connect, especially when children are in
bed, and is more active on weekend mornings. Content and advertising are tailored to
these habits.
The huge number of comments received on the website (more than 300 million in 2013) also encouraged HuffPost executives to filter this data to improve user experience. This was achieved by means of conjoint analysis, a statistical technique used to evaluate the different characteristics of a product or service. The analysis found that comment quality was higher among geographically closer users and among identified users, which led HuffPost to ban anonymous comments.
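Conjoint analysis estimates how much each attribute level contributes to an overall rating, its “part-worth”. The Python sketch below assumes the simplest case, a balanced full-factorial design in which every combination of levels is rated once; there, the ordinary-least-squares main effects reduce to each level’s mean rating minus the grand mean (the function does not check that the design is balanced). The attributes and ratings are invented for illustration and are not HuffPost data.

```python
from collections import defaultdict
from statistics import mean


def part_worths(profiles, ratings):
    """Main-effect part-worths for a balanced full-factorial conjoint design.

    profiles: list of dicts mapping attribute -> level
    ratings:  list of numeric ratings, one per profile
    Returns {attribute: {level: part-worth}}; each attribute's part-worths sum to zero.
    """
    grand = mean(ratings)
    by_level = defaultdict(lambda: defaultdict(list))
    for profile, rating in zip(profiles, ratings):
        for attribute, level in profile.items():
            by_level[attribute][level].append(rating)
    return {
        attribute: {level: mean(values) - grand for level, values in levels.items()}
        for attribute, levels in by_level.items()
    }


def importance(worths):
    """Relative importance: each attribute's part-worth range as a share of all ranges."""
    ranges = {a: max(w.values()) - min(w.values()) for a, w in worths.items()}
    total = sum(ranges.values())
    return {a: r / total for a, r in ranges.items()}


profiles = [
    {"identity": "identified", "distance": "local"},
    {"identity": "identified", "distance": "remote"},
    {"identity": "anonymous", "distance": "local"},
    {"identity": "anonymous", "distance": "remote"},
]
ratings = [8, 6, 4, 2]  # invented quality ratings for comments with each profile
worths = part_worths(profiles, ratings)
print(worths)              # identity: +2 / -2, distance: +1 / -1
print(importance(worths))  # identity about 0.67, distance about 0.33
```

Measuring importance as the range of an attribute’s part-worths is a common convention in conjoint studies; real designs are usually fractional and would be fitted with a regression model instead.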
Big data was also used to improve user loyalty. In collaboration with technology
company Gravity, HuffPost identified topics of interest for its readers, matching
the most compelling content to each type of reader through what it calls “passive
personalization”. The technology also provides information on where each reader
accesses content, and helps optimize navigation around the website. With an average of
10 to 12 articles read in each session, the goal is to reach 15.
Business case 5
Hillary Clinton’s 2016 campaign
Few Americans will have heard the name Elan Kriegel. Yet millions of them were in his sights during the 2016 presidential campaign. Kriegel led a team of 60 mathematicians and analysts responsible for guiding each of the Democratic candidate’s promotional activities, from the party primaries up to the final vote, with great precision.
For example, Kriegel’s team developed an algorithm that decided where to spend each cent
of the $60 million TV advertising budget during the primaries. With hundreds of local and
state TV networks scattered throughout the country, the victory over Bernie Sanders was
molded by carefully choosing the states, networks, programs, and schedules where Clinton
would convey her message to voters.
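The text does not say how Kriegel’s algorithm worked. One common way to frame this kind of problem is greedy allocation under diminishing returns: spend the budget in fixed increments, each time in the market where the next increment is expected to reach the most voters. The Python sketch below assumes a hypothetical square-root response curve and invented markets; it illustrates the technique, not the campaign’s model.

```python
import heapq
import math


def voters_reached(spend, effectiveness):
    """Hypothetical response curve with diminishing returns."""
    return effectiveness * math.sqrt(spend)


def allocate(budget, step, markets):
    """Greedy allocation of `budget` in increments of `step`.

    markets: dict name -> effectiveness parameter of the assumed response curve.
    Because each curve is concave, always funding the largest marginal gain
    is optimal for this discretized, separable problem.
    """
    spend = {name: 0 for name in markets}

    def gain(name):
        e = markets[name]
        return voters_reached(spend[name] + step, e) - voters_reached(spend[name], e)

    # Max-heap (negated gains) of the marginal gain of each market's next increment.
    heap = [(-gain(name), name) for name in markets]
    heapq.heapify(heap)
    for _ in range(int(budget // step)):
        _neg_gain, name = heapq.heappop(heap)
        spend[name] += step
        heapq.heappush(heap, (-gain(name), name))  # only this market's gain changed
    return spend


print(allocate(budget=5, step=1, markets={"market_a": 2.0, "market_b": 1.0}))
# {'market_a': 4, 'market_b': 1}
```

A concave curve encodes the idea that each additional dollar in an already saturated market buys less attention; in practice such curves would have to be estimated from data.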
Unlike in other countries, election campaigns in the United States are fully customized. Key decisions were made based on the work of analysts, such as when and how to send email messages to voters, which doors canvassers would knock on, which numbers phone bankers would dial, which voters to target via a Facebook ad, and which to address through regular mail.
This meticulous work turned Clinton’s campaign into more of a mathematical than an inspirational exercise: a ground-breaking and efficient campaign organized around models defined by data analysis, which paves the way for a new era of political campaigns based on data culture. Meanwhile, Kriegel’s team is already incubating the next generation of talent within the Democratic Party, names unknown for now but which will play a key role in 2020.
#REBELTHINKING
REBEL THINKERS
Iñaki Bagazgoitia
Mar Castaño
Carlos Corredor
Laura Dinneen
Carlota García-Abril
Amelia Hernández
Natasha Morrison
Ellen Thomas
HAVE COLLABORATED
Fuencisla Clemares
Bosco Aranguren
Richard Benjamins
Marcelo Soria
Álvaro Barbero
Alejandro Rodríguez
Manuel Marín
Esteban Moro
Felipe Ortega
Pep Porrà
ACKNOWLEDGMENT

 
The Future of Social: Rebel Cocktail 3ª edición
The Future of Social: Rebel Cocktail 3ª ediciónThe Future of Social: Rebel Cocktail 3ª edición
The Future of Social: Rebel Cocktail 3ª edición
 
Diputados en Twitter: Influencia y Conversación
Diputados en Twitter: Influencia y ConversaciónDiputados en Twitter: Influencia y Conversación
Diputados en Twitter: Influencia y Conversación
 
Webinar: Futuro of Social Media by Fernando Polo
Webinar: Futuro of Social Media by Fernando PoloWebinar: Futuro of Social Media by Fernando Polo
Webinar: Futuro of Social Media by Fernando Polo
 
Webinar: "Rebuilding your Brand for a Human-Centred World"
Webinar: "Rebuilding your Brand for a Human-Centred World"Webinar: "Rebuilding your Brand for a Human-Centred World"
Webinar: "Rebuilding your Brand for a Human-Centred World"
 
Marketing in the AI area
Marketing in the AI areaMarketing in the AI area
Marketing in the AI area
 
Webinar: "Data Driven Marketing Research Techniques"
Webinar:  "Data Driven Marketing Research Techniques"Webinar:  "Data Driven Marketing Research Techniques"
Webinar: "Data Driven Marketing Research Techniques"
 
Good Rebels Webinar: Crowdsourcing and Co-creation
Good Rebels Webinar: Crowdsourcing and Co-creation Good Rebels Webinar: Crowdsourcing and Co-creation
Good Rebels Webinar: Crowdsourcing and Co-creation
 
Machine Learning Aplicado al Marketing: Mejorando tu Negocio.
Machine Learning Aplicado al Marketing: Mejorando tu Negocio.Machine Learning Aplicado al Marketing: Mejorando tu Negocio.
Machine Learning Aplicado al Marketing: Mejorando tu Negocio.
 
Cómo construir un proyecto de machine learning desde la Dirección de Márketing
Cómo construir un proyecto de machine learning desde la Dirección de MárketingCómo construir un proyecto de machine learning desde la Dirección de Márketing
Cómo construir un proyecto de machine learning desde la Dirección de Márketing
 
181009 Webinar Data_Driven_Marketing
181009 Webinar Data_Driven_Marketing181009 Webinar Data_Driven_Marketing
181009 Webinar Data_Driven_Marketing
 
Why Social Media is the Heart of Digital Marketing
Why Social Media is the Heart of Digital MarketingWhy Social Media is the Heart of Digital Marketing
Why Social Media is the Heart of Digital Marketing
 
ROI B2B Maximization. Data Driven. Digital Transformation Roadmap
ROI B2B Maximization. Data Driven. Digital Transformation RoadmapROI B2B Maximization. Data Driven. Digital Transformation Roadmap
ROI B2B Maximization. Data Driven. Digital Transformation Roadmap
 
Good rebels smart social webinar - 21 june 2018
Good rebels   smart social webinar - 21 june 2018Good rebels   smart social webinar - 21 june 2018
Good rebels smart social webinar - 21 june 2018
 

Recently uploaded

Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
JoseMangaJr1
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
amitlee9823
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
amitlee9823
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
amitlee9823
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
amitlee9823
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
amitlee9823
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
amitlee9823
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 

Recently uploaded (20)

Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 

Data Scientist - Good Rebels

Data scientist, the sexiest job of the decade
The figure of the data scientist first emerged in the early twenty-first century. A decade after the widespread business adoption of the Internet, Hal Varian, chief economist at Google, predicted in an interview in October 2008: “The sexiest job in the next ten years will be statisticians. People think I’m joking, but who would’ve guessed that computer engineers would’ve had the sexiest job of the 1990s?”
Varian, also a professor at the University of California, Berkeley, was one of the first to recognize the strategic importance of extracting information from data, and not just at a corporate level: “The ability to take data - to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it - that’s going to be a hugely important skill in the next decades. And not only at the professional level, but even at the educational level for elementary school kids, for high school kids, for college kids. Because now we really do have essentially free and ubiquitous data. So, the complementary scarce factor is the ability to understand that data and extract value from it.”
In fact, by 2008 a few companies had already created the position in order to manage a volume of information hitherto unknown in its variety and sheer scope, in search of findings relevant to the business. Until then nobody had called these people “data scientists”. The first to do so were DJ Patil and Jeff Hammerbacher, then heads of data analytics at LinkedIn and Facebook respectively.
Eight years later, in 2016, with the volume of data generated every day still growing, Varian’s prediction is more pertinent than ever. According to the McKinsey Global Institute report “Game changers: Five opportunities for US growth and renewal”, the big data industry in the United States could increase annual GDP by 325 billion dollars by 2020.
According to the same report, the United States alone will face a shortage of up to 190,000 data scientists and 1.5 million professionals with enough proficiency to use big data effectively. Between 2010 and 2020, the number of companies seeking to add a data scientist to their staff will grow by 18.7%, according to the EMC study “The Digital Universe in 2020”. An estimated 40,000 exabytes of data will be created by 2020, underlining the need for organizations to bring in talent capable of in-depth analysis of information.
In reality, many companies (the biggest or the most pioneering) have already brought the data scientist on board in one form or another. The sudden appearance of these professionals in the business world, and the high demand for them expected over the coming years, confirm a growing need to process large volumes of information and turn it into a valuable asset, since data “in its raw state” is of no use to companies. Only in-depth analysis can reveal the patterns and trends that streamline business processes and optimize decision-making.
This is where data science comes in: the process that enables the collection, preparation, analysis, visualization, management and preservation of large volumes of data. Extracting valuable information from all types of sources provides solutions to a company’s vital strategic issues, such as time and cost savings, new product development, the optimization of offers, and faster, more accurate decision-making.
But what does a data scientist actually do? Here at Good Rebels we set out to outline a profile of this new profession, with the help of various industry leaders from academia, business and institutions. In short, we concluded that the main tasks of a data scientist are to identify data, transform it when incomplete, categorize it, prepare it for analysis, perform the analysis, visualize the results and communicate them. To do this, the data scientist needs technical training in programming, data management, statistics and data mining. And, beyond the analytical side, the ability to focus on creating value for the company.
This is why, in a competitive scenario where challenges are constantly renewed and data never stops flowing, the data scientist’s work enables managers to move from ad hoc analysis to an ongoing conversation with the data. What kind of person can perform this task? The data scientist is a mix of programmer, analyst, communicator and adviser, with proficiency in statistics, technology, mathematics and data architecture, and without neglecting human qualities. A very difficult skill set to find in one person? Probably so, simply because few people can do all of that.
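To make the task sequence listed above more concrete, here is a minimal sketch, under stated assumptions, of four of those steps (transforming incomplete data, categorizing it, analyzing it and communicating the result) on a tiny table. The records, field names (`region`, `monthly_spend`), segment cut-offs and helper functions are all hypothetical and chosen for illustration; the identification and visualization steps are left out to keep the example to the Python standard library. It is not a method described by the study's participants.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical raw records (illustrative only); None marks a missing value.
RAW = [
    {"customer": "A", "region": "north", "monthly_spend": 120.0},
    {"customer": "B", "region": "south", "monthly_spend": None},
    {"customer": "C", "region": "north", "monthly_spend": 80.0},
    {"customer": "D", "region": None, "monthly_spend": 310.0},
    {"customer": "E", "region": "south", "monthly_spend": 45.0},
]


def clean(records):
    """Transform incomplete records: fill missing spend with the median, missing region with 'unknown'."""
    known = [r["monthly_spend"] for r in records if r["monthly_spend"] is not None]
    fill = median(known)
    return [
        {
            **r,
            "monthly_spend": fill if r["monthly_spend"] is None else r["monthly_spend"],
            "region": r["region"] if r["region"] is not None else "unknown",
        }
        for r in records
    ]


def categorize(spend, low=50.0, high=200.0):
    """Assign a spend segment; the cut-offs are arbitrary illustrative values."""
    if spend < low:
        return "low"
    if spend < high:
        return "medium"
    return "high"


def analyze(records):
    """Compute the average monthly spend per region."""
    by_region = defaultdict(list)
    for r in records:
        by_region[r["region"]].append(r["monthly_spend"])
    return {region: mean(values) for region, values in sorted(by_region.items())}


def report(summary):
    """Communicate the result as plain-language lines."""
    return [f"{region}: average monthly spend {value:.2f}" for region, value in summary.items()]


cleaned = clean(RAW)
segments = {r["customer"]: categorize(r["monthly_spend"]) for r in cleaned}
summary = analyze(cleaned)
for line in report(summary):
    print(line)
```

Run as is, the script prints one line per region (north 100.00, south 72.50, unknown 310.00). The median is used to fill the missing spend value because it is less sensitive than the mean to the single large value (310.0); it is only one of several reasonable imputation choices.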
So basically, we’re talking about a well-rounded jack-of-all-trades: proficient in mathematics, IT and data architecture, knowledgeable about business, with strong communication skills and empathy. Given the practical impossibility of finding such a person on the market, professionals refer to this ideal with labels such as “El Dorado”, “Unicorn”, “The Data Science Superhero”, “The Dark Beast” or “The New Renaissance Man”. An extremely powerful combination... and very hard to find, because demand is growing and such professionals are in short supply. The solution: training, retraining, and building teams whose members, taken together, cover the profile described.
Read more: Hal Varian interview at McKinsey.com; DJ Patil biography; Building Data Science Teams, at Amazon.com
Data, data and more data
With countless services and connected devices, it is estimated that 90% of the world’s data has been generated in the last two years, more than in all the rest of human history combined. This is very good news for anyone who specializes in data management and processing: they will probably never be short of work. Numerous indicators illustrate this spectacular explosion of data. For example:
- By 2020, about 1.7 MB of new information will be created every second for every human being, according to EMC forecasts.
- Information is constantly being generated, and someone needs to monitor it. On Google alone there are 40,000 searches every second.
- Facebook is another behemoth when it comes to data generation. Every minute, its users send an average of 31.2 million messages and watch 2.77 million videos.
- In May 2016, Facebook and Microsoft began laying a 6,600-km undersea cable between Europe and the US, capable of transmitting 160 terabits of data per second.
- 80% of photos will be taken with smartphones in 2017, and a high percentage of them will be shared via the Internet.
- It is estimated that by 2020 there will be 6.1 billion smartphone users worldwide, overtaking the number of landline subscriptions.
- Also by 2020, there will be 50 billion smart devices in use worldwide, all collecting, analyzing and sharing data, and a third of all data will travel through the cloud.
- 80% of the data generated today is unstructured. This includes data found in emails, spreadsheets, social media, the Internet, etc.
- The market for Hadoop (an open-source software framework for storing and processing large data sets across clusters of networked computers) will grow at an annual rate of 58%, exceeding 1 billion dollars in 2020.
- For an average Fortune 1000 company, an improvement of just 10% in data accessibility will result in over $60 million of additional net income.
- Businesses that make full use of the potential of data could boost their operating margins by up to 60%.
- Perhaps the most mind-boggling fact, and one that highlights the enormous potential ahead for the big data industry: according to MIT, less than 0.5% of all data generated right now is analyzed.
Read more: Big Data: 20 Mind-Boggling Facts Everyone Must Read; Internet Live Stats; Big data: The next frontier for innovation, competition, and productivity
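Several of the rates in the list above are quoted per second or per minute, which makes them hard to compare. The short calculation below rescales them to a full day. It only multiplies the figures already cited; it assumes each rate stays constant over 24 hours and uses decimal units (1 GB = 1,000 MB), both simplifying assumptions rather than claims from the original sources.

```python
# Rates quoted in the list above (sources as cited there).
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400
MINUTES_PER_DAY = 24 * 60        # 1,440

google_searches_per_second = 40_000
facebook_messages_per_minute = 31_200_000
facebook_videos_per_minute = 2_770_000
mb_per_person_per_second = 1.7

# Assumes each rate stays constant over a full day.
daily = {
    "Google searches": google_searches_per_second * SECONDS_PER_DAY,
    "Facebook messages": facebook_messages_per_minute * MINUTES_PER_DAY,
    "Facebook video views": facebook_videos_per_minute * MINUTES_PER_DAY,
}
# Decimal units: 1 GB = 1,000 MB.
gb_per_person_per_day = mb_per_person_per_second * SECONDS_PER_DAY / 1000

for label, count in daily.items():
    print(f"{label} per day: {count:,}")
print(f"New data per person per day: {gb_per_person_per_day:.1f} GB")
```

On these assumptions, 40,000 searches per second amounts to about 3.46 billion searches a day, and 1.7 MB per second amounts to roughly 147 GB of new data per person per day.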
A little bit of history
The Cyclopædia of Commercial and Business Anecdotes, published in 1865 by Richard Millar Devens, contains the first recorded use of the term “business intelligence”. The author described how a banker, Sir Henry Furnese, succeeded by understanding market conditions before his competitors: “Throughout Holland, Flanders, France, and Germany, he maintained a complete and perfect train of business intelligence. The news…was thus received first by him”, Devens writes. Furnese ultimately used this advance knowledge for duplicitous ends and became known as a corrupt financier. Nevertheless, he can be credited with sowing the seeds of business intelligence.
Technology did not advance to the point where it could be considered an agent of business intelligence until well into the 20th century. The first commercial computers arrived in the United States in the 1950s. In 1958 Hans Peter Luhn, a pioneering researcher at IBM, published the article “A Business Intelligence System”, in which he defined business intelligence as “the ability to apprehend the interrelationships of presented facts in such a way as to guide action towards a desired goal”. Luhn envisioned an automatic, intelligent system, built on document processing equipment, capable of producing target-specific action guidelines for the various sections of any organization. With this article Luhn, considered the father of business intelligence, laid the foundations for analyzing and distributing information to serve the needs of a company.
It wasn’t until three decades later, in 1989 to be exact, that the analyst Howard Dresner brought the modern definition of business intelligence into the common vernacular.
Encompassing somewhat cumbersome-sounding concepts related to data storage and data processing, Dresner summed up business intelligence as “concepts and methods to improve business decision-making by using fact-based support systems”. From the 2000s onward, the intersection of different technologies and business needs prompted new concepts and terms: data engineering, business analytics, data mining, etc. There is currently no clear consensus on exactly where each of these disciplines begins and ends, nor on how far they overlap. What is clear is that they all coexist under the umbrella of big data.
Read more: Richard Millar Devens, Cyclopædia of Commercial and Business Anecdotes; Hans Peter Luhn, A Business Intelligence System; Howard Dresner’s blog
The following people have participated in this study:

Bosco Aranguren, Chief Marketing Officer, Microsoft Iberia
CMO at Microsoft Iberia since March 2017. Previously, he was responsible for Programmatic Media Buying at Google. He joined Google in 2010 as Industry Head Automotive, and in 2012 he became Industry Head CPG & Entertainment.

Álvaro Barbero, Chief Data Scientist at Instituto de Ingeniería del Conocimiento (IIC)
Expert in machine learning, optimization and algorithm engineering. His work consists of turning advances in these areas into practical big data systems, from predictive and recommender systems to automated text analysis and resource optimization.

Richard Benjamins, Director of External Positioning Big Data, LUCA: Data-Driven Decisions
Director of External Positioning Big Data for Social Good in Telefonica’s Chief Data Office. In his previous position as Group Director BI & Big Data he was responsible for the internal exploitation of big data across Telefonica. He was also Director of Business Intelligence at Telefonica Digital, and before that Director of User Modelling, where he led global BI programs.
Fuencisla Clemares, Country Manager at Google Spain & Portugal
Joined Google in 2009 as Manager of Retail and Consumer Goods; she then led the Telecommunications, Banking and Insurance sectors, along with the mobile strategy for Spain. Before Google, she worked for seven years as a strategy consultant at McKinsey & Company, and later became Director of Purchasing in Carrefour’s home division.

Manuel Marín, Data Analytics Manager, PwC
Data Analytics Manager at PwC. Before that, he was Chief Technical Officer at APARA, where he applied predictive analytics for telco, banking, insurance, energy, health, sports and retail companies in the areas of fraud detection and customer intelligence.

Esteban Moro, Associate Professor at Universidad Carlos III de Madrid
Esteban is a professor at Universidad Carlos III de Madrid, a member of the Joint Institute UC3M-Santander on Big Data, and academic director of AFI’s Master in Data Science and Big Data in Finance. He serves as a consultant for many public and private institutions. His areas of interest are applied mathematics, financial mathematics, viral marketing and social networks.
Felipe Ortega, Director of the Master in Data Science at Universidad Rey Juan Carlos
Assistant Professor in the Department of Signal Theory and Communications and Telematic Systems and Computing, School of Telecommunications Engineering, Universidad Rey Juan Carlos (Madrid). He is co-founder of the Data Science Lab at the Center for Intelligent Information Systems (CETINIA) and Academic Director of the Master in Data Science at URJC. His main research areas are data engineering, computational statistics, machine learning, quantitative methods, open source software, large-scale data management and data visualization.

Pep Porrà, Business Performance Director, King.com
Business Performance Director at King.com, where he leads a team of data scientists and business performance managers focused on evaluating, anticipating and understanding the monetization impact of game features. Before moving to the corporate world, he was a Statistics and Mathematics Professor at the University of Barcelona.

Alejandro Rodríguez, Professor at Universidad Politécnica de Madrid
Professor at the Department of Computer Languages and Systems and Software Engineering at UPM. Researcher specializing in medical informatics, knowledge representation, expert systems and the semantic web.

Marcelo Soria, Partner at Tramontana.co
Partner at Tramontana.co since mid-2016. Between May 2014 and May 2016 he was VP of Data Services at BBVA Data Analytics, and before that he was co-leader of the Big Data // Smart Cities initiative at BBVA.
1. Where do data scientists come from?
Where are data scientists?
More than half of these professionals are concentrated in the United States. Spain ranks eighth in the world by number of data scientists in employment. (Source: “The State of Data Science”, Stitchdata.com)
DJ Patil, later Chief Data Scientist for the US government, coined the term “data scientist” during his tenure at LinkedIn. But nearly a decade later there is still some controversy about its exact meaning, and about whether the role differs from the one data analysts have performed in companies for many years.
For some, data science originates in machine learning, the branch from which prediction and classification models have been developed. Professionals trained in this discipline were mainly mathematicians with programming skills that enabled them to implement and test predictive models, since machine learning is an applied rather than purely theoretical branch of mathematics.
The huge change in the amount of data organizations handle is the main force behind the new profile. If elements such as big data and machine learning are added to traditional data analytics, we may well be talking about a new theoretical discipline, and also a new job category, whose terms are being defined at virtually the same time as the market creates demand for it. What distinguishes a data scientist is a different, more scientific type of training, which allows them to apply the latest techniques to massive data sets, both in the depth of exploration and in speed. It is a profile that combines academic training with professional experience.
Due to the current lack of consensus on their characteristics and skills, a wide spectrum of professionals is included in the category of data scientist. It is important, though, that they share a set of characteristics: they should be able to use their knowledge to extract non-obvious information from data and empirical evidence, and present it in an understandable way.
Each specialist has their place and time
Data science, big data, data analytics...
Terms that we’ve been hearing for years now, but are still somewhat enshrouded in confusion when it comes to their definition and competencies. What’s involved in each of these disciplines? First and foremost, it’s important to stress that the role of data scientist is different from that of an analyst who designs models or forecasts. The data scientist is not only expected to explain the effect that the data will have on the company’s future, but also to provide solutions that help the company to grow, both in the present and in the future.
  • 20. “You cannot communicate a relevant decision in your business if you are not able to explain how you got it, what data you have used, and what processes you have followed to break it down.” Esteban Moro.
  • 21. Data science
- Whether the data is structured or unstructured, data science is a field that encompasses everything related to the cleaning (curation), preparation and analysis of data.
- Data science consists of a medley of statistics, mathematics and programming, peppered with problem-solving, data extraction using as much ingenuity as required, and the ability to scrutinize a problem from different perspectives.
- The data scientist shifts business cases to an analytical plane, develops hypotheses and patterns, and evaluates their impact on the business. This deep analysis has the ultimate goal of solving complex business issues efficiently and anticipating future needs.
Big Data
- Big data refers to huge volumes of data, proprietary or third-party and usually non-aggregated, whose size prevents them from being processed effectively with traditional applications.
- Big data is a term that is gaining more and more ground in firms and industries. Analyzing data trends with sophisticated algorithms and other cutting-edge information processing methods ultimately improves the strategic decisions that drive business.
Data analytics
- Data analytics uses data to examine market and business trends, and to develop or improve methods linked to productivity and cost reduction.
- The essence of data analytics is inference, which is the process of drawing conclusions based solely on what the researcher already knows.
- Data analytics is used in many industries to help companies improve decision-making, as well as to verify or refute existing theories and models.
  • 22. “The next big challenge in the gaming industry is to create smart systems. To convert data into new value for the company”. Pep Porrà.
  • 23. A hypothetical case will let us see the different processes involved in a data science project. Let’s imagine that every day millions of images are uploaded to a restaurant review site and need to be catalogued: are they pictures of food? What kind of food? Or are they of a restaurant? Of the outside or the inside? Machine learning automatically classifies each image into its respective category. Properly “trained”, a computer can figure out, for example, whether the photo of a restaurant shows the inside or the outside. The data scientist oversees the entire project, from selecting the right algorithm to engineering design.
- The data scientist creates the model that allows the computer to make this distinction, using different sources of information ranging from manually classified images to keywords in screenshots.
- Using data engineering techniques, a data feed and storage system is created, to which the algorithms are applied on a large scale.
- Finally, the business implications of the innovation are analyzed: is it useful for the business? Will it help the website generate more traffic? And so on. The findings are then presented using visualization tools.
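The classification step of this hypothetical case can be illustrated with a toy nearest-centroid classifier. This is only a sketch under stated assumptions: the two numeric features per photo (average brightness and fraction of sky-coloured pixels) and the hand-labelled examples are invented, and a real image pipeline would extract far richer features, or use a neural network, rather than two hand-picked numbers.

```python
from math import dist  # Euclidean distance, Python 3.8+

# Hypothetical training set: each photo is reduced to a small feature vector
# ([average brightness, fraction of sky-coloured pixels]) and labelled by hand.
TRAINING = [
    ([0.82, 0.40], "outside"),
    ([0.75, 0.35], "outside"),
    ([0.30, 0.02], "inside"),
    ([0.25, 0.05], "inside"),
]

def train_centroids(examples):
    """Average the feature vectors of each label to get one centroid per class."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def classify(centroids, features):
    """Assign the label whose centroid is closest to the new photo's features."""
    return min(centroids, key=lambda label: dist(centroids[label], features))

centroids = train_centroids(TRAINING)
print(classify(centroids, [0.78, 0.30]))  # -> outside
print(classify(centroids, [0.20, 0.03]))  # -> inside
```

The "training" here is simply averaging the manually classified examples; adding more labelled photos moves the centroids, which is the intuition behind a model improving as it is fed more manually classified images.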
  • 24. 2. Data scientists: seeking their place within the organizational chart
  • 26. “The problem we often find is that data has been managed in isolation. And then the time comes to enable that data and there’s no communication going on”. Bosco Aranguren.
  • 27. The data scientist isn’t a radically new profile being defined from scratch. Companies have long relied on in-depth data analysis as a valuable tool to help meet or exceed their goals. What has changed is the scale of this analysis: a greater volume of data calls for a different approach, with regard both to procedures and to the purpose of the analysis. Many experts stress the idea of rediscovering data, or rather, discovering the value it contributes to the company. The person who used to manage data, target customers or detect the products with the greatest turnover clearly added value to the company. But the data scientist’s role goes much further.
The data was already in-house
It’s true that the figure of the data manager has existed in companies for some time now. Data analytics has been used in the telecommunications industry for at least 20 years. Banking has also been using business intelligence for several years, as have, somewhat more quietly, all the major companies leading their respective industries. However, far from being a cross-disciplinary practice, data analysis has often been applied only in specific departments, mainly Marketing, Communication and Customer Insights. This pigeonholing has to some extent undermined its standing in the hierarchy of company priorities. The main problem in companies without a data-focused corporate culture is that they have often been run in a decentralized and disorganized way. As a result of this siloed management, each department has made whatever technology-related decisions it deemed most appropriate at any given time. Now that the time has come to deal with data, experts are encountering barriers and incompatibilities that hugely complicate their work.
In institutions with enormous historical repositories, grouping together and processing data files is a colossal effort, but once this path of self-learning has been completed, the work translates into improvements in internal processes, people management and/or customer service.
  • 28. “Technically you can do just about everything, but the organization must then be prepared to use it”. Richard Benjamins.
  • 29. The difference compared with recent years is that data analytics specialists now have much more powerful and effective technological resources, which allow them to extract greater value from information. Computing costs are lower, data availability is higher and the connectivity between the two is greater, which raises the chances of finding patterns or potential case-based reasoning and helps update the practice of using data to improve management. In this process of recognizing the status of data scientists, one fundamental advance in their professional standing must be mentioned: they have taken on the crucial responsibility of committing to improving company results. Their mission is no longer limited to guiding or advising the actions of other departments, or to crunching data to present it later to the managers responsible for decision-making. The data scientist’s work culminates in the delivery of new business opportunities founded on a comprehensive inspection of the data.
Is the company ready to listen to the data scientist?
In many cases, the data scientist faces another crucial battle to have their new status within the company acknowledged: overcoming resistance to change. Digital inertia is pushing many companies towards a data culture, but in more traditional or larger organizations, where digital natives are rarely part of management, this can end up being a costly journey if it is long, or a traumatic one if it is short. The first leg of the company’s journey towards big data must receive firm support from management. So many departments are involved (IT, Business Intelligence, e-Commerce, Marketing, etc.), and so much coordination among them is needed for data to flow, be shared and be used properly, that change will only be possible if resources are provided from the top.
Without agility and cooperation, there can be no results. In companies with a tendency towards complacency or resistance to change, the data scientist may even be seen as a gatecrasher who has turned up to lecture experts on how to run the business. Executives who have long set the rules of the game are wary of the mathematician, who even seems to speak a language foreign to the company.
  • 30. The first step in a company’s journey towards Big Data needs support from top management.
  • 31. This is a cultural issue: the scientific backing behind the data scientist’s recommendations must find its way into traditional decision-making processes, which are based on experience or other types of indicators, sometimes as simple as a spreadsheet. Some people may even ignore the data scientist’s contributions because they fear being held accountable for improving results: meeting KPIs can be a painful goal. This phenomenon is repeated in all kinds of organizations, including startups, because ultimately everyone tends to protect their own teams and projects. That’s why, as we shall see later on, empathy and communication are two of the essential non-technical qualities required to work as a data scientist.
  • 32. 3. Who needs a data scientist?
  • 34. In the United States, data scientist was listed in 2016 as the job with the best prospects, based on three factors: number of job openings, salary and potential for career development. Source: “25 Best Jobs in America”, Glassdoor.com
  • 35. Companies and organizations in countless industries are today embarking on projects related to data analysis: banking, communications, entertainment, healthcare, education, natural resources, insurance, retail, transport, energy, etc. Many institutions publish their big data repositories, and technologies to visualize and analyze data are widely available. This scenario facilitates research, as anyone with basic training can raise a company-related issue and collect the data required to address it. Why would a company venture into a big data project? The main objective is usually to improve customer experience, but other goals include reducing costs, refocusing marketing strategies, streamlining internal processes or improving security. We know that we have unprecedented access to information and data. What’s more, complex systems appear in every field of knowledge. Unpredictability can manifest itself in all kinds of disciplines: mathematics, physics, chemistry, engineering, programming, economics, sociology, psychology, etc. There is a constant challenge to find order or a behavior pattern within the seemingly chaotic nature of any system. As a result, there is no shortage of data or, obviously, of problems to solve. And there is so much knowledge out there that it is difficult to create new knowledge, understood in this instance as any algorithm or model that helps improve business performance. Taking on all these challenges requires, in addition to a solid technical background, huge doses of passion and motivation. That’s why it is crucial for the data scientist to gauge how critical the problem at hand is. But how do you define a good problem? How is it recognized, and how are resources allocated to solving this particular issue rather than another? The answer may be subjective, depending on who you ask.
But basically, a good problem should meet three conditions: • Demonstrate a clear and direct impact on the business. • Prove solvable with the data at hand. • Provide sufficient motivation to the data scientist and his/her team.
  • 36. “It’s impossible to have someone who is knowledgeable in all the businesses in the world. The company may have a generalist data scientist and specialists in the areas where business can be developed”. Álvaro Barbero.
  • 37. The last question is who can take charge of solving such problems. In his book Building Data Science Teams, DJ Patil sums up the essence of what to look for when employing or hiring a data scientist: “The inventor of LinkedIn’s ‘People You May Know’ was an experimental physicist. A computational chemist on my decision sciences team had solved a 100-year-old problem on energy states of water. An oceanographer made major impacts on the way we identify fraud. Perhaps most surprising was the neurosurgeon who turned out to be a wizard at identifying rich underlying trends in the data”. Ultimately, all scientists, whatever their training, can meet the challenge of extracting information from data, as long as they bring enough passion for problem-solving. And it is always beneficial to test the robustness of a model against the variety of perspectives provided by different scientific disciplines.
  • 38. 4. Skills of a data scientist
  • 40. “MOOCs are very useful for training, because they are very specific and oriented towards a specific objective.” Alejandro Rodríguez.
  • 41. The data scientist does not necessarily have a “numbers” background. It’s not essential to come from disciplines such as mathematics, statistics, physics or the exact sciences, although these backgrounds provide a useful foundation. Some data scientists come from fields such as telecommunications, engineering or computer science, and even from seemingly unrelated areas such as communication, economics, finance or biomedicine. Why? Because the most important part of their job is ultimately to analyze data: play with it, work with it, question it, and love it. The data scientist should be a curious, creative, innovative and even defiant person, capable of questioning the status quo. That’s why their training is not as decisive as their attitude.
Technical skills
What is clear is that the data scientist’s work revolves around the combination of technology, creativity and data. There are likely to be common core requirements for their qualifications and performance, but as time goes by, the profile will gradually diversify into multiple branches and specializations. In short, the data scientist should be fully at ease with the following four disciplines:
• Statistics / Mathematics: they should be able to analyze databases, build models, make statistical forecasts and distinguish what is representative from what is not. They therefore need a strong mathematical background that allows them to handle supervised models with predictive techniques (data mining, machine learning) and unsupervised segmentation models. Before modelling, they should be able to apply all the mathematical techniques of data pre-processing, and once the model is built, those of model evaluation. In short, they should master a set of techniques that enables them to construct and evaluate a predictive model, as well as to apply statistical logic in programming languages.
• Technology: as a requirement for transforming data into knowledge, the data scientist must understand the company’s technological resources and know how to implement them. Algorithm design is key to data transformation, and calls for fluency in multiple computer languages as well as thorough knowledge of database management. It’s very important to be proficient in automation, since many processes are repeated on the computer while the data scientist is refining or calibrating the model.
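The Statistics/Mathematics and Technology requirements above (pre-processing the data, building a model, evaluating it, and automating the repeated calibration) can be illustrated with a minimal sketch in plain Python. It rests on assumptions that do not come from the original text: a single invented numeric feature, one-dimensional ridge regression as the model, and a simple training/validation split used to choose the regularisation strength.

```python
from statistics import mean, pstdev

# Hypothetical data (invented for illustration): one input feature x and a
# target y, split into a training set and a validation set.
train_x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
train_y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]
val_x = [7.0, 8.0]
val_y = [14.1, 15.9]

# Pre-processing: standardise x and centre y using training statistics only.
mu_x, sd_x = mean(train_x), pstdev(train_x)
mu_y = mean(train_y)

def scale(xs):
    return [(x - mu_x) / sd_x for x in xs]

def fit_ridge(xs, ys, lam):
    """Closed-form 1-D ridge slope on standardised x and centred y."""
    zs = scale(xs)
    num = sum(z * (y - mu_y) for z, y in zip(zs, ys))
    den = sum(z * z for z in zs) + lam
    return num / den

def predict(slope, xs):
    return [mu_y + slope * z for z in scale(xs)]

def mse(pred, actual):
    """Evaluation: mean squared error between predictions and observed values."""
    return mean((p - a) ** 2 for p, a in zip(pred, actual))

# Automation: repeat fit + evaluation over a grid of regularisation strengths
# and keep the value with the lowest validation error.
results = {}
for lam in [0.0, 0.1, 1.0, 10.0]:
    slope = fit_ridge(train_x, train_y, lam)
    results[lam] = mse(predict(slope, val_x), val_y)

best_lam = min(results, key=results.get)
print(best_lam, round(results[best_lam], 3))
```

The loop is the part worth automating: in practice the grid, the model and the evaluation metric all change while the model is being calibrated, and scripting the cycle keeps each run reproducible.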
  • 42. “In Spain, we lack the mindset to help people grow, take risks, even train them to grow in their job positions”. Fuencisla Clemares.
  • 43.
• Business analytics: the data scientist should speak the corporate language and understand the company’s goals, the industry in which it operates and the processes that drive profit and growth. Only in this way will they be able to discern which problems can feasibly be solved through data processing, and only by understanding the inner workings of the company will they be able to turn data analysis into insights and valuable recommendations for the company. Without some knowledge of the business environment, technical qualifications alone can lead to the “techie” being rejected or misunderstood, or even to awkward situations where all they offer are obvious answers.
• Communication: at some point the data scientist will have to present meticulous and accurate results, based not on experience but on their analysis, to professionals who are often managers with decision-making powers and extensive business experience but no technical training. That’s why they should be able to communicate with ease and tailor the conversation to the level of their audience. It’s paramount that the result of an analytical process can be understood by any manager within the company, whether an engineer or a social media specialist.
Skills above and beyond technical ones
The data scientist doesn’t live on technical know-how alone. Ideally, the above capabilities are complemented by a series of personal characteristics, forming a skill set (sometimes merely utopian) that merges specialization with human qualities.
• Creativity: the ability to use new methods to collect, interpret and analyze data gives the analysis a different perspective. Technology itself stops being a differentiating factor the moment a program becomes available to any organization.
That’s why know-how is so significant: the tools may be the same for everyone, but the minds handling them are not. Technological uniformity dissolves when intelligence is added, turning the results offered by a software solution, one which may even be used by the competition, into unique ones.
• Intuition: the ability to choose between one way or another of reaching a solution is extremely important. Experts underline the importance of applying an artistic component to a technical working process that usually follows a fixed sequence
  • 44. “To stay on top of everything and constantly refresh one’s knowledge, curiosity is essential”. Marcelo Soria.
  • 45. (data processing, curation, modelling, etc.), but which requires an intuitive spark to discern which steps call for critical analysis.
• Flexibility: trial-and-error mechanisms let us evaluate and choose between options for work already underway, complementing, or even correcting, decisions made before the project started. Mathematical models are not unique, but are grouped into toolboxes that encompass different techniques. Agility is therefore required to opt for one technique or analytical tool over another, depending on the structure of the data, the information available, etc. For professionals trained in theory but with little practical experience, this may be a weak point.
• Curiosity: understood as the ability to ask questions, to comprehend what is being asked and to envisage the right path to take. Curiosity is essential for keeping abreast of techniques and tools, as well as for constantly refreshing one’s knowledge base. Ultimately, this is what will lead the data scientist to draw meaningful inferences from the data.
• Empathy: although their work is the result of hours and hours in front of a computer, the data scientist is not a lone wolf. The human factor must be present in their daily work, in the sense that it depends on collaboration with other departments and cannot succeed without cooperation. Since they are used to moving between projects and areas, the challenge lies in creating free-flowing dialogue with other parts of the organization. What’s more, they may sometimes have to present unwelcome results to clients or superiors, which further reinforces the importance of the personal touch.
• Pragmatism: finally, there’s no point in all this theoretical analysis if it isn’t accompanied by a practical impact.
Technical skills are of little use if the data scientist isn’t able to integrate into a team or convert all their analytical potential into results that benefit the company or other working groups. Therefore, they must be able to transfer data analysis into insights or actions with a direct impact on the business.
  • 46. “At Google, we try to work extensively in the ecosystem, which is a word we’re very fond of. We aren’t the ones who are going to train people, but we can influence other experts to encourage such initiatives”. Fuencisla Clemares.
  • 47. How to choose your data scientist
For a profession that is still evolving, traditional recruitment processes are of little use. Companies like Facebook, Amazon, Google and Microsoft are at the forefront of the corporate use of data science, serving as a benchmark that helps companies in all industries understand the professional profile of recruits and the type of work they perform. It goes without saying that technological background is critical: without the relevant technical training, it is impossible to take on data processing. That’s why it is above all important to evaluate training and experience in mathematics and computer science. But we must also assess the ability to refresh knowledge, grow and learn in an ever-changing environment, because we are likely recruiting someone who doesn’t yet know which challenges they will face in three years’ time. In the selection process it is therefore important to test reasoning skills through problems where following a logical process matters more than finding the right solution. It is also not uncommon to consult references seldom used in other selection processes, for example, work published on platforms such as GitHub.
Struggling to find a data scientist? Train them in-house
When recruiting a data processing specialist becomes a complex or costly chore, some companies opt for internal promotion. Professionals already working in an area related to data analytics are trained or re-trained in disciplines adapted to the company’s new needs. This is a widespread and perfectly valid practice.
This re-training is favored by the trend towards standardization brought on by technology: there are countless tools that make the preliminary tasks of data analysis and cleansing easier, and which allow professionals already in the workforce, especially in business intelligence, to be re-trained in data science. The pull effect of what some describe as today’s coolest profession, along with technological standardization, has somewhat lowered the bar of technical knowledge required to perform the role of data scientist, which poses a risk to the quality of the decision-making process. The tools that automate some of the work with
  • 48. Where have data scientists studied? Looking at data scientists’ academic backgrounds, it’s surprising that Business Administration is the second most common field of study. Source: “The State of Data Science”, Stitchdata.com
  • 49. less specific knowledge globalize and streamline the practice of extracting value from data, without the need to have a data scientist, or at least a data analyst, on the payroll. Another advantage of in-house training stems from the unique nature of the data scientist’s work. Their concerns and personal motivations do not always coincide with those of other professionals. Their passion for research (let’s not forget that we’re talking about scientists) and their motivation to learn may actually outweigh the priority they give to variables such as their rank in the company, advancement, salary or responsibilities. In this regard, the profile lies halfway between the professional and the academic, although we must remember that performance metrics in a company are not the same as those at a university.
Supermen and superwomen? No, super teams!
Statistics, technology, analytics, communication... without forgetting human qualities. Is this skill set very difficult to find in one person? Probably so, simply because there aren’t many people who can do all that. The alternative is simple: working in multidisciplinary teams. This involves creating groups that, as a whole, satisfy all these qualities: a collaborative effort that goes beyond the work of a single person, where the most important thing is to create a climate that encourages curiosity, motivation, knowledge sharing and cooperation. Each team member has a clearly defined role and does not need to know everything: the modelling expert will work alongside the analytics expert, and the business specialist with the head of communication. What is important, though, is that the generalist data scientist has a global vision of the entire work process, which will avoid situations where, for example, they devise a mathematical model that cannot be run on the available hardware.
The group should operate smoothly, within a dynamic rather than a rigid structure, because once the general problem has been identified, specialists centered in a particular area can be incorporated. Such a smooth operation, besides oiling the wheels of the team, will allow each group member to focus on areas that most appeal to them.
  • 50. “Right now, there is demand from our Data Science students even before they complete their training”. Esteban Moro.
  • 51. The ideal CV
Looking to work as a data scientist? In that case, you should make sure that your CV features as many of the following skills and qualifications as possible:
• Programming: R, Python, spreadsheets, JavaScript and HTML, C/C++ or Java, Julia
• Statistics: descriptive and inferential statistics, experimental design
• Mathematics: functions and graphs, multivariable calculus, linear algebra
• Data management: database systems, SQL
• Data communication and visualization: visual coding, data presentation, knowledge of audiences
• Machine learning: supervised learning, unsupervised learning, reinforcement learning
• Bonus: intuition, project management, industry knowledge
And an essential complement: a good command of English, the language in which an enormous amount of new knowledge is generated.
How much does each specialist earn?
Salaries (in the US): Data Scientist $113,000/year; Big Data Specialist $62,000/year; Data Analyst $60,000/year. Source: Glassdoor.com
  • 52. 5. The Data Scientist’s Tools
  • 54. “Expectations are the issue. Companies don’t understand that in research, there are times when things just don’t work out”. Alejandro Rodríguez.
  • 55. Construction of data processing systems, databases, visualization tools, and data wrangling tools
Within the engineering involved in building data processing systems, there are three basic tools for embarking on the analysis of huge volumes of information: Python, R and Hadoop. While these tools are relatively new and not as widespread, they are easy to grasp for professionals already proficient in programming languages like Java or C.
R Project. Considered the standard among statistical programming languages, R is known by some as “the golden boy” of data science. R is a free software environment for statistical computing and graphics that runs on UNIX, Windows, and macOS. It is a must in data science, and proficiency in it practically guarantees a job offer, given the growing number of commercial applications and its versatility.
- R is free: anyone can install, use, upgrade, clone, modify, redistribute, and even resell R. Not only does it save money on technology projects, but it also receives constant updates, which are always useful in a statistical programming language.
- R is a high-performance language that helps users handle large datasets, making it a great tool for managing big data. It’s also well suited to intensive, resource-heavy simulations.
- Given all its advantages, it is increasingly popular. It has about 2 million users, who make up an active and supportive community, and more than 2,000 free libraries with statistical resources devoted to finance, cluster analysis, and much more.
  • 56. Any cultural change is costly or takes a long time; and if it’s short, it’s traumatic.
  • 57. Python. Another flexible and straightforward open-source programming language. A programmer working with Python ends up writing less code thanks to its beginner-friendly features, such as code readability, simplified syntax and ease of implementation.
- As with R, Python is suited to a great many industries and applications. Python powers Google’s search engine, as well as YouTube, Dropbox, and Reddit. Institutions such as NASA, IBM, and Mozilla also depend heavily on Python.
- Python is also free, which benefits startups and small businesses. Since the language favors simplicity, it can be handled by small teams. And a good knowledge of the basics of this object-oriented language lets you migrate to another similar language just by learning the new language’s syntax.
- As a high-performance language, Python is often the option chosen to build fast-access applications. Plus, its huge library of resources provides the help needed to ensure that productivity is just a few clicks away.
Hadoop. Another staple for anyone who wants to venture into big data analysis. Available as an open-source framework, Hadoop facilitates the storage and processing of huge amounts of data. It is considered the cornerstone of any flexible, forward-thinking data platform.
- Hadoop is one of the technologies with the greatest growth potential within the data industry. Companies like Dell, Amazon Web Services, IBM, Yahoo, Microsoft, Google, eBay, and Oracle are firmly committed to implementing Hadoop.
- One of its major benefits is helping companies with their marketing needs: identifying customer behavior patterns on the website, providing recommendations and custom targeting, etc.
- Hadoop opens up great career opportunities in a wide variety of positions.
Given its relevance in many industries, Hadoop specialists can find work as an architect, developer, administrator or data scientist.
  • 58. “The reality of a data scientist’s work is that you do not know what you’re going to find behind the data. If you want to work in an agile way, you have to be flexible and, above all, very practical”. Álvaro Barbero.
  • 59. Another frequent interaction in the data scientist’s work is with databases. Here it’s common to work with NoSQL databases and with processing tools such as Apache Spark and Apache Storm.
    Visualization tools are not as important for creating value as they are for convincing. In this sense, they’re associated with the phase of communicating results and with the actual work of revealing the value of the data: trawling through numbers is not the same as presenting them. Programs such as QlikView, Tableau, and Spotfire are used for this.
    Finally, there’s a pretty unglamorous part of the data scientist’s work, a process known as data wrangling. Raw data is often confused or imperfect, so it first needs to be collected and cleaned up manually before it can be converted into a structured format to be explored and analyzed. This task can take up more than 50% of the data scientist’s working time, using tools like OpenRefine or Fusion Tables.
    Open source or proprietary software?
    As in any area where specific software is required, data science professionals can choose between programs marketed by private companies and open-source software. Before embarking on a data science project, it’s very important to know exactly what technology it will require, so that resources and budgets can be adapted accordingly. This is one of the reasons why more and more companies are opting for the flexibility of open-source alternatives.
    The variety of options arising from the open-source environment has also helped to expand the use of new technologies and knowledge. Paid commercial tools that dominated the market until recently are increasingly losing ground to free alternatives.
    Some experts have warned about vendors who try to impose their commercial solutions on businesses, which end up investing heavily in proprietary applications for which an open-source alternative always exists. This lock-in can be avoided with open-source projects, which are scalable and can offer performance comparable to that of proprietary software.
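The data wrangling step described above can be illustrated with a minimal Python sketch. It is not how OpenRefine or any other tool works internally: the raw rows, the semicolon-separated layout, the field names (`customer`, `city`, `amount`) and the cleaning rules (trim spaces, normalize capitalization, accept a decimal comma, drop invalid rows and duplicates) are all hypothetical assumptions chosen for illustration.

```python
# Minimal data-wrangling sketch: turn messy raw rows into clean,
# structured records. The rows, field names and rules are hypothetical.
from typing import Optional

RAW_ROWS = [
    "  Ana Pérez ; MADRID ; 12,50 ",
    "ana pérez;Madrid;12.50",        # same record, different formatting
    "Luis Gómez;  barcelona ;",      # missing amount
    "Marta Ruiz;Sevilla;n/a",        # unusable amount
    "Jon Ortiz;Bilbao;7",
    "broken row without separators",
]


def parse_amount(text: str) -> Optional[float]:
    """Accept '12,50' or '12.50'; return None when the value is unusable."""
    try:
        return float(text.strip().replace(",", "."))
    except ValueError:
        return None


def clean(rows: list) -> list:
    """Split, normalize, validate and de-duplicate raw rows."""
    seen = set()
    records = []
    for row in rows:
        parts = [p.strip() for p in row.split(";")]
        if len(parts) != 3:
            continue  # malformed row: wrong number of fields
        name, city, amount_text = parts
        amount = parse_amount(amount_text)
        if amount is None:
            continue  # drop rows whose amount is missing or invalid
        record = {"customer": name.title(), "city": city.title(), "amount": amount}
        key = tuple(record.values())
        if key in seen:
            continue  # duplicate once formatting differences are removed
        seen.add(key)
        records.append(record)
    return records


if __name__ == "__main__":
    for record in clean(RAW_ROWS):
        print(record)
```

Every rule here is a judgment call (for example, silently dropping rows with a missing amount); in a real project such decisions should be recorded and discussed with whoever owns the data.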
  • 60. 6. Getting down to it: the work process
  • 62. “Some people get scared because they think you want to impose an army of mathematicians on them”. Manuel Marín.
  • 63. Bringing analysts and specialists together in mixed teams within a company means setting out on a journey that will ideally culminate in new lines of business. Results don’t sprout up overnight, but data science makes once seemingly unattainable milestones feasible.
    Three obstacles before accessing data
    Before buckling down to work, the data scientist must first overcome three obstacles:
    1. Access to data
    Many companies amass huge amounts of customer data, but the nature of their services imposes restrictions related to security and privacy. This presents a chicken-and-egg dilemma: as a condition for granting access to the data, management will want to know the potential value it can bring to the company. No matter how strongly the analyst argues the case, the real benefits for the company cannot be demonstrated if the necessary data cannot be accessed.
    How can we get out of this quandary? One way is to proceed through scaled models that progressively show the management team the benefits analytics can bring. Access to a sample of data will help create a model that solves a specific problem. A small-scale study of specific customers, which can trigger a decision with immediate impact on the company, is a good starting point. Once the management team can verify the model’s suitability by applying it to immediate decisions, the first step will have been taken.
    In this scenario, choosing a suitable problem that has a visible impact on the business is crucial. The analyst therefore needs to show their skills, intuition, and knowledge of the business. It goes without saying that a model built from a limited sample will have limited significance, but it is a necessary step to open the doors to the data.
  • 64. “There will be a lot of demand from companies that we could consider more traditional”. Bosco Aranguren.
  • 65. 2. Technological means
    Having overcome the first obstacle, the next one appears: having the technological infrastructure needed to support access to data, analysis, and the exploration of results. The point is not to look for a culprit if such means are not available: there might not be anybody in the organization aware of the impact that data analysis can have on the business. But there are no shortcuts: if this groundwork isn’t done, someone will eventually have to deal with it.
    A further problem that often comes up is the decentralization of data. With disconnected departments and dispersed databases, each with its own access and security protocols, the data scientist, sometimes with the help of an engineer, will have to focus on gathering the data in one place before they can even get to work.
    3. Human resource management
    Part of data science, like any other science, is exploration. And exploration calls for a great deal of inspiration and as few rigid orders as possible, since they stifle creativity. Passion, perseverance, and curiosity are qualities required in this type of work, and they are often not compatible with rigid organizational structures. Managers must therefore be patient and understanding and, within the pressure dictated by financial results, grant the data scientist the time and freedom needed to move forward with his or her investigation. Once a balance has been struck between what motivates employees and the business’s priorities, the results should start to appear.
    From data to decision... if nothing goes wrong
    Once the data is available, the data scientist generally follows a gradual, scaled process. He or she will have to devote much of their time to cleaning the data, and then set off on a route that begins with small samples and ends, if all goes well, with useful conclusions drawn from a predictive model.
  • 66. “Oftentimes the reason they end up hiring you astonishes you”. Manuel Marín.
  • 67. If all goes well... Because data science is not a foolproof process. As in any research project, there are no absolute certainties. We must therefore be prepared for possible failure, however hard that may be to accept for companies that have high expectations and often do not consider the possibility of getting no results.
    In projects involving vast databases, it’s not always necessary to use all the data. It is therefore important to scale: start with a manageable database, go back and forth, and set up a permanent dialogue with the person or department most interested in the project. Then, once a small insight into the potential scope has been gained, scaling can begin.
    The road to this point is sometimes littered with decisions to make: the focus of the investigation, the data to be used, the analytics to be applied… Technical knowledge alone does not guarantee that methods can be adapted to a specific project, since projects are always subject to unforeseen circumstances that training centers do not cover.
    There is far more information available than there are decisions to make. Much of that information may be lost in the process of turning data into decisions, and the way the process is communicated plays a role in this journey. An important decision cannot be conveyed to the company unless it is backed up by solid arguments about where the conclusion comes from: which data has been used and which processes have been followed to analyze that information and turn it into the nugget that is the decision.
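The “start with a manageable sample, then scale” advice above resembles a technique usually called progressive sampling: fit a model on a growing sample and stop enlarging it once the error on held-out data stops improving noticeably. The sketch below is an illustration under stated assumptions, not a method prescribed by the report: the data is synthetic (a straight line plus Gaussian noise), the model is a simple least-squares line, and the 1% improvement threshold and the sample sizes are arbitrary choices.

```python
# Progressive-sampling sketch: fit on growing samples and stop enlarging
# the sample once held-out error stops improving. Synthetic data only.
import random


def make_data(n, seed=0):
    """Synthetic (x, y) pairs: y = 3 + 2x + Gaussian noise (an assumption)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        data.append((x, 3 + 2 * x + rng.gauss(0, 1)))
    return data


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def mse(model, xs, ys):
    a, b = model
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def progressive_sampling(train, holdout, sizes, tol=0.01):
    """Return [(sample_size, holdout_mse), ...], stopping when the relative
    improvement over the previous size is at most `tol`."""
    hx, hy = zip(*holdout)
    history = []
    previous = None
    for n in sizes:
        xs, ys = zip(*train[:n])  # train is assumed to be in random order
        error = mse(fit_line(xs, ys), hx, hy)
        history.append((n, error))
        if previous is not None and previous - error <= tol * previous:
            break  # a bigger sample no longer pays off
        previous = error
    return history


if __name__ == "__main__":
    data = make_data(6000)
    train, holdout = data[:5000], data[5000:]
    sizes = [20, 40, 80, 160, 320, 640, 1280, 2560, 5000]
    for size, error in progressive_sampling(train, holdout, sizes):
        print(f"{size:>5} rows -> held-out MSE {error:.3f}")
```

With real data, the sample should be drawn at random and the stopping threshold agreed with the people who will use the model, since “good enough” is a business decision as much as a statistical one.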
  • 68. 7. Evaluating the data scientist’s work
  • 70. In what industries can we find data scientists? Technology-heavy industries account for the largest concentration of data scientists. Source: “The State of Data Science”, Stitchdata.com
  • 71. George E. P. Box, considered one of the most important statisticians of the twentieth century, famously said: “All models are wrong, but some are useful”. Wrong in the sense that they cannot capture every detail of a system, because a model that did would be so complex that it would defeat the very purpose of modeling. That does not render models useless, but it does force them to be constantly reinterpreted and validated against empirical data and knowledge of the system itself, regardless of the techniques or algorithms used in the analysis.
    How can we measure the results of the data scientist’s work? First, we must take the time horizon into account: benefits are never seen in the short term. The data scientist develops a predictive model, whose execution depends on whether management accepts it. Machine learning techniques are then applied to the model to improve its accuracy. For team leaders, it is important to emphasize the work’s practical application. It is fundamental, especially in large companies, to ensure that algorithms do not end up simply as beautiful theories.
    The data scientist’s official responsibility can be wrapped up once they have finished building their model, but their personal responsibility carries on, even at the risk of sounding gloomy, until the model is run. Then comes the wait for results. Models are not foolproof: a key parameter may have been left out, a wrong variable that alters the outcome may have been included, or the subtleties of the business may not have been grasped. Execution may also fail: the insight might be good, but it is not put into practice in the right way.
    The quality of the algorithm is not the only yardstick for measuring the data scientist’s performance. Their responsibilities include some sales-related work: dealing with customers, explaining to them what they have found, and guiding them on what to do with their data, always using the communication skills that the data scientist, or some member of their team, should have. This work offers another basis for evaluation.
    Finally, let’s once again remind ourselves of the importance of the human factor. Data science is not a black box enshrouded in mystery. Data scientists are not oracles, nor are their words prophecies: the algorithm may make a specific prediction, but whether or not that insight is applied to the business, with all the consequences that may follow, ultimately depends on the person who makes the decision. Hence the importance of the human factor in the whole process.
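One way to make the “wait for results” concrete is to compare the model’s predictions with real outcomes as they arrive and check whether it still beats a naive baseline, in the spirit of validating models against empirical data. The sketch assumes a numeric forecast (hypothetical weekly sales), a baseline that simply repeats the previous period’s value, mean absolute error as the yardstick and a four-period window; all of these are illustrative assumptions, not the report’s evaluation method.

```python
# Sketch: once a model is in production, compare its predictions with real
# outcomes and check whether it still beats a naive baseline.
from dataclasses import dataclass


@dataclass
class Outcome:
    predicted: float  # what the model forecast for the period
    actual: float     # what really happened
    baseline: float   # naive forecast: the previous period's actual value


def mean_abs_error(pairs):
    return sum(abs(forecast - actual) for forecast, actual in pairs) / len(pairs)


def review(outcomes, window=4):
    """Return (model_mae, baseline_mae, model_still_useful) over the last `window` periods."""
    recent = outcomes[-window:]
    model_mae = mean_abs_error([(o.predicted, o.actual) for o in recent])
    baseline_mae = mean_abs_error([(o.baseline, o.actual) for o in recent])
    return model_mae, baseline_mae, model_mae < baseline_mae


# Hypothetical weekly sales: the model works at first, then drifts.
EARLY = [Outcome(100, 102, 95), Outcome(110, 108, 102),
         Outcome(105, 111, 108), Outcome(120, 118, 111)]
LATER = [Outcome(120, 140, 118), Outcome(125, 150, 140),
         Outcome(130, 145, 150), Outcome(135, 160, 145)]

if __name__ == "__main__":
    print(review(EARLY))          # (3.0, 5.75, True): model beats the baseline
    print(review(EARLY + LATER))  # (21.25, 13.0, False): it no longer does
```

A model that stops beating such a trivial baseline is a signal to revisit its variables or the business assumptions behind it, not proof that data science has failed.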
  • 72. 8. Trust: an essential component in the data science process.
  • 74. “In terms of training, I don’t think there is a gap between Spain and the United States or the United Kingdom”. Pep Porrà.
  • 75. Data is highly sensitive, especially when working with outside information. In such cases, the customer relationship should be respectful and diplomatic: it’s their business, it’s their data, and it’s often their most valuable asset.
    In some industries there is a certain interest in getting a return on data, but lack of experience with big data leads to reservations before companies even dare to venture into data analytics. Younger companies are more cautious, perhaps waiting for others in their industry to take the first step. It’s also common for companies to take the big data route but later be reluctant to hand over their data, either because they hold back from sharing any conclusions with the market or because they don’t even want analysts to know them. In this context, the most common formula is: acquire the tool, train the team in the tool, and then provide support.
    Another delicate situation arises from the dangers of do-it-yourself data science. Some people blindly apply tools after learning about them only superficially, with unpredictable results. This creates noise that is detrimental to the entire data science industry, because companies don’t receive the advertised benefits of big data and don’t truly understand why they haven’t reaped the full rewards. Many disoriented companies have heard the fanfare about big data and either spend lots of money without knowing what they’re spending it on or have yet to see results. They need to be treated sensitively, with sound judgement and common sense, by clarifying and simplifying the guidelines for action. In an industry where the raw material is so sensitive, trust is essential.
    Ethics: the essential complement to science
    The data scientist takes on a strong ethical commitment, in the sense that they must ensure the responsible use of the information given to them. In an increasingly digitalized society where everyone unwittingly and involuntarily leaves trails, it would be possible to invade anybody’s freedom simply by using the appropriate knowledge and powerful servers. But nobody wants that to happen. Ethical commitment is not just a sign of sound judgement; it is also imperative in an information society that may face dangers that are not fully understood: mass surveillance, lack of privacy, large-scale data loss, etc. It is therefore the data scientist’s duty to work transparently, explaining in a simple and accessible way what their job is and how
  • 76. “Clients sometimes come across things that they weren’t expecting, and communicating them requires specialists who are very good with people”. Felipe Ortega.
  • 77. they do it, to dispel the threat to privacy that people often associate with big data. Few people are interested in the intricacies of an algorithm, but they do want an outline of the route that the data follows.
    One way to ensure that data gets used ethically is to work on open data projects, where anyone can access the data, thereby contributing social awareness and utility. For example, Spanish bank BBVA has launched several of these projects, designed to improve citizens’ quality of life or to make cities more efficient through the intelligent use of information.
    Open the data, give something back to society, become an aggregated data platform that others can use to create value in cutting-edge projects where altruism replaces the quest for profit. That is the ethical commitment that many data scientists have taken on to safeguard the good name of their specialty.
  • 78. 9. Data scientists in Spain today
  • 80. To stay on top of everything and constantly refresh one’s knowledge, curiosity is essential.
  • 81. Are Spanish data scientists more or less qualified than those of other nationalities? Is there a shortage of professionals? Will academic programs keep up with the expected demand in the years to come?
    Overall, experts agree that Spain is on a par with the leading countries in data science. There is no shortage of highly qualified professionals or of startups specializing in big data processing that stand out among the most advanced in Europe, if not the world. The professional level is so high that it’s not unreasonable to think of Spain as a global powerhouse in data science.
    This opportunity must be managed well to make sure it is not wasted. As in other scientific disciplines, excellent professionals are leaving for other countries to pursue their careers. It’s true that money draws professionals to places like California, but a high concentration does not necessarily imply a higher level. For Spanish data scientists to prove their worth, they should start by valuing themselves, acting with professionalism and discretion to ensure a promising future.
    The range of academic programs is also increasingly extensive in both public and private institutions, which offer countless Master’s programs and specialized courses. This mix is indispensable in a discipline that is permanently intertwined with innovation and research.
    So, if something were to jeopardize the advancement of data science in Spain, it wouldn’t be the academic level of specialists, but rather some endemic problems caused by how work is organized in Spanish corporations. For example, agility when implementing projects is not comparable to that of the United States, where there are far fewer bureaucratic obstacles. Similarly, there is still a gap between academia and the business world: integrating the work of a data scientist into the business lacks dynamism.
    Some claim that the Spanish labor market is less flexible when it comes to retraining. Once professionals have settled on a career path, taking the risk to change it is harder than in other countries, due to a tendency towards convenience or pigeonholing. It is therefore important for organizations to support their employees.
    That said, Spanish professionals, as well as those from Latin America, have a bonus that can give them a competitive advantage over their peers in the rest of the world: creativity, understood as the ability to seek out alternative problem-solving processes that nobody else has imagined. And that fits in with and complements the empathy side. In other words, creativity lets Spanish data scientists bring art, alongside science, to problem-solving.
  • 82. “Everyone must realize that our daily life is going to be very dependent on and influenced by data analysis”. Felipe Ortega.
  • 83. Who’s making the most out of data science in Spain?
    Three industries are at the forefront of the implementation of data science in Spain: banking, telecommunications, and tourism. Overall, large companies are investing more resources into data science. These include entities such as Santander, BBVA, Telefónica, Bankinter, Sabadell, La Caixa, Amadeus, Kayak, etc. But this investment isn’t exclusive to large companies. More modestly sized companies are using data science in a very creative and innovative way, with worldwide recognition of their work. Two examples:
    Carto http://www.cartodb.com
    Founded in Madrid in 2012, originally as CartoDB. Its most popular tool is Carto Builder, which allows visualization enthusiasts to build interactive maps from geodata with no programming skills required. With more than 1,400 customers, 200,000 registered users, and an office in New York, it focuses on offering large corporations an optimization tool for decision-making and predicting consumer trends.
    Stratio http://www.stratio.com
    Also founded in 2012, as an offshoot of its predecessor Paradigma. Stratio develops platforms and products from big data technologies such as Cassandra and Apache Spark, as well as proprietary developments. Customers using its real-time processing solution come from banking, insurance, tourism, and retail. More than 25 specialists in big data architecture work out of Stratio’s Madrid headquarters. Stratio also has an office in Palo Alto, California, the heart of Silicon Valley.
  • 84. 10. Conclusions: still a great deal to be done
  • 86. “People ask us: are you opening up data so that everyone can do business? Well, yes: we let others have a better knowledge of reality from our data”. Marcelo Soria.
  • 87. The analysis of big data has already left behind the emerging-technology phase of the hype cycle and is taking hold in many companies. Or, at the very least, certain “core” technologies are, such as distributed databases, real-time processing, large analytical layers, etc.
    With the initial implementation being wrapped up, data science professionals are moving towards specialization. As the field continues to grow, it is normal for it to split into specialties and form an ecosystem. Companies, to some extent, are promoting this trend because they cannot afford to properly compensate large teams of data scientists.
    The same is happening in training. It’s no longer possible to offer a single set of core courses, so the range of academic content is beginning to diversify. As they define their needs, companies will increasingly demand highly sought-after professionals, who are often awarded grants by the companies that recruit them or guaranteed immediate employment upon completing their education.
    Lots of companies invest huge sums into market research. Some will realize that data science represents another data source, a new form of R&D that converts data into new value for the company. But big data is still in its teenage years. Many challenges lie ahead, arising from handling large volumes of information and converting it into useful tools.
    What does the adulthood of big data look like?
    Attention should shift from the “bigness” of data to its application. The famous “Four Vs” of big data (Volume, Velocity, Variety, and Veracity) must be expanded to bring in a new concept: Value. This involves reducing the noise in data and increasing its contribution. Data science will mature, strengthen its position, gain recognition as a career, and surprise us with future discoveries. It should be designed as a tool not only to bring transparency to the present but also to anticipate the future in a way that is conducive to business growth.
  • 88. “It is our duty to give something back to society. With all the information companies have about people, they can greatly improve their lives”. Richard Benjamins.
  • 89. This will be possible by converting data into knowledge, and that knowledge into practical actions, whether to provide better customer service, boost efficiency through automation, or create new business opportunities by spotting cross-selling potential or opening new markets.
    At present, most projects related to data analysis focus on cost optimization and process integration. In the future, predictive analysis will place emphasis on data monetization and the delivery of new applications and business opportunities. Predictive models in cloud environments, parallel data processing, and sophisticated machine learning algorithms will optimize or guide the decision-making process.
    Ultimately, companies will have to reinvent or reinterpret themselves as their business becomes more digital and their customer offerings depend increasingly on lessons learned from data. Companies like Siemens, which its CEO describes as “a software company”, have already fully embarked on this process. A key element of this evolution will be an environment of experimentation, tolerance, and short development cycles that drive innovation.
    The companies leading this evolution will be those that place the data scientist at the core of their strategy. This way, they will be able to develop the conditions (talent acquisition, employee commitment, and priority-setting) needed to put them at the head of the race to turn data into a long-lasting and tangible competitive advantage.
    In our daily lives, we already use applications and products that come from processing huge amounts of data: spam filters in email inboxes, recommendations on social networks, search engine results, medical tests and prescriptions, investment funds, etc. And with the future promised by the Internet of Things, the need to process more and more information will only grow. Our lives may end up highly conditioned, or at the very least heavily influenced, by the analysis of all the data surrounding us.
    In any case, it is a future in which everyone involved in big data analysis should be very careful with everything related to data privacy and consumer confidence. It doesn’t matter whether our data is used to better manage our time or our money, customize the advertising we see, or improve our health: if we believe that it will improve our lives, we won’t object to anybody using it.
  • 90. Annex.
  • 91. Business case 1: Commerce360
    What are my customers most interested in? On what day does my competition outsell me? Are their items more expensive or cheaper than mine? When do I sell the most? Where do my buyers live? What is their gender, their age, and how much do they spend on every purchase? Any business would like to know the answers to these and similar questions. Large and medium-sized companies can find out by allocating resources to business intelligence, but it’s more difficult for independent traders or local stores. That’s why Spanish bank BBVA has developed Commerce360, a tool that aims to make business intelligence accessible to any company.
    Based on aggregated and anonymous data from BBVA card payments, the application extracts indicators related to the industry and the profile of customers who buy in a specific area. “Commerce360 is a tool for retailers, where by using our information on card payments we can provide a store with its economic activity, purchasing dynamics, socio-demographic information on what its customers are like, age, gender, where and when they shop, etc., comparing all this with aggregated businesses that are their competition or other businesses in the area that perform the same type of activity,” explains Marcelo Soria.
    As a result, retailers once guided by intuition or other traditional methods have access to an analytical tool that lets them discover where their customers come from, measure their loyalty, study their demographic characteristics, and identify high-value customers. “For us it is a very interesting line for democratizing access to data and data-based intelligence. This is the future of retail,” adds Soria.
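The report does not describe how BBVA aggregates and anonymizes its card data. As a hedged illustration of the general idea, the sketch below groups individual payments by postcode and age band, publishes only group-level indicators, and suppresses any group with fewer than a minimum number of distinct customers (a simple threshold rule in the spirit of k-anonymity). The records, the grouping keys and the threshold of three customers are hypothetical.

```python
# Sketch: aggregate individual card payments into per-area, per-age-band
# indicators and suppress groups too small to be shared safely.
from collections import defaultdict

# Hypothetical records: (pseudonymous customer id, postcode, age band, amount in EUR)
PAYMENTS = [
    ("c1", "28001", "25-34", 20.0),
    ("c2", "28001", "25-34", 35.0),
    ("c3", "28001", "25-34", 15.0),
    ("c1", "28001", "25-34", 10.0),
    ("c4", "28002", "55-64", 80.0),
    ("c5", "28002", "55-64", 60.0),
]


def aggregate(payments, min_customers=3):
    groups = defaultdict(lambda: {"customers": set(), "total": 0.0, "count": 0})
    for customer, postcode, age_band, amount in payments:
        group = groups[(postcode, age_band)]
        group["customers"].add(customer)
        group["total"] += amount
        group["count"] += 1
    report = {}
    for key, group in groups.items():
        if len(group["customers"]) < min_customers:
            continue  # too few people: publishing this group could reveal individuals
        report[key] = {
            "customers": len(group["customers"]),
            "payments": group["count"],
            "avg_ticket": round(group["total"] / group["count"], 2),
        }
    return report  # contains no customer ids, only group-level figures


if __name__ == "__main__":
    for key, indicators in aggregate(PAYMENTS).items():
        print(key, indicators)
```

A single size threshold is only a minimal safeguard: published groups can sometimes be combined or subtracted to reveal individuals, so real systems usually need additional protections.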
  • 93. Business case 2: Smart Steps
    Smart Steps is a geo-marketing program developed by Telefónica using data from its mobile phone network. Data is aggregated and extrapolated anonymously to extract information on user trends or behavior patterns in a specific area. The project captures billions of data points from Telefónica’s mobile network, 365 days a year, 24 hours a day. This data is matched with different sociodemographic and mobility indicators (residence, means of transport, age) that can offer companies precise targeting based on the movements of their potential customers.
    Smart Steps can be applied to any industry in which movement and knowledge of the user profile are important, such as travel and transport, tourism, or outdoor advertising. For example, local retailers could find out whether participants in an event such as San Fermín are regular or occasional visitors, where they come from, where they are staying, how long they stay, etc., and use this information to tailor their sales approach.
    It is also useful in the public sector, as knowing people’s movement patterns helps improve traffic management in the city, adapt public transport, or analyze the need to build new infrastructure. In 2014, the program was used to map out the most crime-prone areas in London: the resulting algorithm achieved 70% accuracy when predicting crime hotspots.
  • 95. Business case 3: Home Fire Risk Map
    25,000 people are killed or injured in house fires every year in the United States. The American Red Cross aims to reduce the number of victims through an initiative based on big data. The Home Fire Risk Map program identifies the locations across the country most prone to house fires, and will be used by Red Cross volunteers to install smoke alarms and provide fire safety courses where they’re needed most. Data suggests that 60% of fires can be prevented simply by having a working smoke alarm and knowing what to do in the event of a fire.
    Using different open data repositories, 50 volunteers worked for over a year to create a map that identifies high-risk areas throughout the country. First, they built a model to identify the communities with the lowest smoke alarm coverage. After that, another algorithm predicted the places most prone to fires. Lastly, a third model calculated the likelihood of injury or death when a home fire does occur. The three models and their results come together on the map presented here.
    Thanks to this initiative, launched in June 2016, 400,000 smoke alarms were installed in households across the United States in the first month, with the goal of reaching 2.5 million alarms. Smoke alarms have an average lifespan of 10 years, which means that a year’s work is expected to deliver medium-term benefits.
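The report does not say how the three models were combined into the map. Purely as an illustration, the sketch below multiplies the three per-area outputs (share of homes without a working alarm, probability of a fire, probability of injury or death given a fire) into one score and ranks areas by it. The areas and figures are invented, and multiplying the outputs assumes they are independent, which real data may not support.

```python
# Sketch: combine three per-area model outputs into one priority score
# and rank areas for smoke-alarm installation. All figures are invented.
from typing import NamedTuple


class Area(NamedTuple):
    name: str
    alarm_coverage: float  # model 1: share of homes with a working smoke alarm
    p_fire: float          # model 2: yearly probability of a fire per home
    p_harm: float          # model 3: probability of injury or death given a fire


def priority_score(area: Area) -> float:
    """Rough expected number of harmful fires per home in the area, counting
    only homes without a working alarm. Assumes the three outputs are independent."""
    return (1 - area.alarm_coverage) * area.p_fire * area.p_harm


def rank(areas):
    return sorted(areas, key=priority_score, reverse=True)


AREAS = [
    Area("Tract A", alarm_coverage=0.9, p_fire=0.004, p_harm=0.05),
    Area("Tract B", alarm_coverage=0.6, p_fire=0.003, p_harm=0.04),
    Area("Tract C", alarm_coverage=0.7, p_fire=0.006, p_harm=0.03),
]

if __name__ == "__main__":
    for area in rank(AREAS):
        print(f"{area.name}: {priority_score(area):.2e}")
```

Multiplying keeps the score interpretable as an expected count; a weighted sum would be easier to tune but harder to interpret.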
  • 97. Business case 4: The Huffington Post
    The Huffington Post is one of the most widely read digital media outlets in the world, and an environment where data analysts enjoy almost as much prominence as editors, since much of its success is due to big data, which optimizes content, authenticates comments, boosts advertising clout, and improves user experience.
    Real-time statistics and analytical platforms define the editorial process. For HuffPost it is essential to provide the right content to each reader straight away and in the right format. For example, data analytics for the Parents section showed that this audience mainly connects from mobile devices, especially when children are in bed, and is more active on weekend mornings. Content and advertising are tailored to these habits.
    The huge number of comments received on the website (more than 300 million in 2013) also encouraged HuffPost executives to clean up comment data to improve user experience. This was achieved by means of conjoint analysis, a statistical technique used to evaluate the different characteristics of a product or service. The analysis found that comment quality increased with geographic proximity and among identified users, which led HuffPost to ban anonymous comments.
    Big data was also used to improve user loyalty. In collaboration with technology company Gravity, HuffPost identified topics of interest for its readers, matching each type of reader with the most compelling content through what it calls “passive personalization”. The technology also provides information on where each reader accesses content, and helps optimize navigation around the website. With an average of 10 to 12 articles read in each session, the goal is to reach 15.
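The report does not detail HuffPost’s conjoint study. The sketch below only shows the core idea for the simplest case: each attribute has two levels (coded 0 and 1) and every combination is rated equally often, so the estimated effect (“part-worth”) of each attribute is the mean rating at level 1 minus the mean rating at level 0. The attributes `identified` and `local` and all ratings are invented; real conjoint studies usually fit a regression or choice model to survey responses.

```python
# Sketch of the core idea of conjoint analysis for two-level attributes:
# estimate how much each attribute shifts the rating ("part-worth").
from statistics import mean

# Invented ratings of comment quality for every combination of two
# attributes (a balanced full-factorial design, three ratings per profile).
RATINGS = [
    ({"identified": 1, "local": 1}, [8, 9, 7]),
    ({"identified": 1, "local": 0}, [6, 7, 5]),
    ({"identified": 0, "local": 1}, [5, 4, 6]),
    ({"identified": 0, "local": 0}, [3, 2, 4]),
]


def part_worths(ratings):
    """In a balanced 0/1 design, each attribute's main-effect coefficient
    equals the mean rating at level 1 minus the mean rating at level 0."""
    attributes = ratings[0][0].keys()
    worths = {}
    for attribute in attributes:
        high = [r for levels, scores in ratings if levels[attribute] == 1 for r in scores]
        low = [r for levels, scores in ratings if levels[attribute] == 0 for r in scores]
        worths[attribute] = mean(high) - mean(low)
    return worths


if __name__ == "__main__":
    print(part_worths(RATINGS))
```

With these invented ratings, being identified raises the average quality score by 3 points and being local by 2; an unbalanced design would require a proper regression instead of simple differences of means.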
Business case 5
Hillary Clinton's 2016 campaign
Few Americans will have heard the name Elan Kriegel. Yet millions of them were in his sights during the 2016 presidential campaign. Kriegel led a team of 60 mathematicians and analysts responsible for guiding, with absolute precision, each of the Democratic candidate's promotional activities, from the party primaries through to the final vote. For example, Kriegel's team developed an algorithm that decided where to spend each cent of the $60 million TV advertising budget during the primaries. With hundreds of local and state TV networks scattered throughout the country, the victory over Bernie Sanders was shaped by carefully choosing the states, networks, programs, and time slots in which Clinton would deliver her message to voters.
Unlike in other countries, election campaigns in the United States are highly customized. Key decisions were based on the analysts' work, such as when and how to send emails to voters, which doors canvassers would knock on, which numbers phone bankers would dial, which voters to target with a Facebook ad, and which to reach through regular mail. This meticulous work turned Clinton's campaign into more of a mathematical exercise than an inspirational one: a ground-breaking, efficient campaign organized around models built from data analysis, which paves the way for a new era of political campaigns grounded in data culture. Meanwhile, Kriegel's team is already incubating the next generation of talent within the Democratic Party: names unknown for now, but ones that will play a key role in 2020.
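The case does not describe the campaign's budget algorithm. One standard way to frame "where to spend each cent" is greedy marginal allocation: split the budget into equal increments and give each increment to the market whose estimated reach would grow the most. When every market's response curve is concave (diminishing returns), as the one assumed here is, this greedy rule gives an optimal allocation for the discretized problem. The saturating response curve, the market names, and all figures are hypothetical; this is an illustration of the technique, not the Clinton campaign's model.

```python
import heapq
import math


def reach(scale: float, saturation: float, spend: float) -> float:
    # Hypothetical saturating response curve: each extra dollar reaches fewer new voters.
    return scale * (1.0 - math.exp(-spend / saturation))


def allocate(budget: int, step: int, markets: dict[str, tuple[float, float]]) -> dict[str, int]:
    """Greedily assign the budget, one `step` at a time, to the market with the
    largest marginal gain in reach. `markets` maps name -> (scale, saturation)."""
    spend = {name: 0 for name in markets}
    if not markets:
        return spend

    def marginal_gain(name: str) -> float:
        scale, saturation = markets[name]
        current = spend[name]
        return reach(scale, saturation, current + step) - reach(scale, saturation, current)

    # heapq is a min-heap, so store negated gains to pop the largest gain first.
    heap = [(-marginal_gain(name), name) for name in markets]
    heapq.heapify(heap)
    for _ in range(budget // step):
        _, name = heapq.heappop(heap)
        spend[name] += step
        # Re-insert the market with its (smaller) next marginal gain.
        heapq.heappush(heap, (-marginal_gain(name), name))
    return spend


if __name__ == "__main__":
    # Illustrative markets: (voters reachable at saturation, spend scale in dollars).
    markets = {"Market A": (1000.0, 50_000.0), "Market B": (500.0, 50_000.0)}
    print(allocate(budget=100_000, step=10_000, markets=markets))
```

Using a heap keeps each step at logarithmic cost in the number of markets, which matters when the candidates are hundreds of stations, programs, and time slots rather than two markets.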
ACKNOWLEDGMENT
#REBELTHINKING
REBEL THINKERS: Iñaki Bagazgoitia, Mar Castaño, Carlos Corredor, Laura Dinneen, Carlota García-Abril, Amelia Hernández, Natasha Morrison, Ellen Thomas
HAVE COLLABORATED: Fuencisla Clemares, Bosco Aranguren, Richard Benjamins, Marcelo Soria, Álvaro Barbero, Alejandro Rodríguez, Manuel Marín, Esteban Moro, Felipe Ortega, Pep Porrà