50 AVENUE DES CHAMPS-ÉLYSÉES 75008 PARIS > FRANCE > WWW.OCTO.COM
HADOOP SUMMIT 2016 - DUBLIN
PRACTICAL ADVICE TO BUILD A DATA DRIVEN COMPANY
Simon MABY
@simonmaby
OCTO TECHNOLOGY > THERE IS A BETTER WAY
Story : Data Driven E-Commerce
Continuous improvement of all business
processes, through smart use of data, all the
time, everywhere and for all purposes
BEING DATA DRIVEN IS BEING LEAN
[Diagram: Build-Measure-Learn loop] IDEA → (BUILD) → CODE → (MEASURE) → DATA → (LEARN) → IDEA
REQUIREMENTS
IDEA: Business must be aware of opportunities to use algorithms
CODE: Data science projects should have the lowest possible time to market
DATA: Data must be easily accessible
DATA
Data must be easily accessible
Your Datalake is a service to your company.
It should be managed like a startup
Your employees are your first clients. The more
they use it, the more Data Driven you are
FOCUS ON USABILITY OVER ARCHITECTURE
[Diagram] Datalake Team (OPS - DEVs - DESIGNERS) ↔ Datalake and Services ↔ End Users and projects
- Design services for usability and grant support
- Gather requirements and usage metrics
FOCUS ON USABILITY OVER ARCHITECTURE : EXAMPLES
- How simple is it to share data with other projects?
- How simple is it to subscribe to a data feed?
- Is it possible to run a full-text search on available datasets?
- Is it possible to ask other projects for details about their data through a social network?
- Is there auto-completion based on SQL queries from other projects?
- Bookmarking, sharing, upvoting datasets, tagging metadata…
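The usability questions above can be made concrete with a small sketch. The following is a toy in-memory dataset catalog, not something shown in the talk: the `Dataset` and `Catalog` names, the fields and the sample datasets are all hypothetical, and a real data lake would back this with a proper metadata store.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    name: str
    owner: str
    description: str
    tags: set[str] = field(default_factory=set)
    upvotes: int = 0


class Catalog:
    """Toy in-memory dataset catalog (hypothetical, not a real product API)."""

    def __init__(self) -> None:
        self._datasets: dict[str, Dataset] = {}

    def register(self, dataset: Dataset) -> None:
        self._datasets[dataset.name] = dataset

    def tag(self, name: str, tag: str) -> None:
        self._datasets[name].tags.add(tag.lower())

    def upvote(self, name: str) -> None:
        self._datasets[name].upvotes += 1

    def search(self, query: str) -> list[Dataset]:
        """Return datasets whose name, description or tags contain every
        query word, most upvoted first."""
        words = query.lower().split()
        hits = []
        for ds in self._datasets.values():
            haystack = " ".join([ds.name, ds.description, *sorted(ds.tags)]).lower()
            if all(word in haystack for word in words):
                hits.append(ds)
        return sorted(hits, key=lambda ds: (-ds.upvotes, ds.name))


catalog = Catalog()
catalog.register(Dataset("web_orders", "shop-team", "Orders placed on the web shop"))
catalog.register(Dataset("store_orders", "retail-team", "Orders placed in physical stores"))
catalog.tag("web_orders", "e-commerce")
catalog.upvote("store_orders")

print([ds.name for ds in catalog.search("orders")])
```

Ranking by upvotes is one possible design choice here: it lets usage by colleagues, rather than the platform team, decide which datasets surface first.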
CODE
Datascience projects should have the lowest time
to market possible
EXPLORATION VERSUS PREDICTION
Explore as quickly as possible
Deliver frequently in production
(Not so) Big Data Infrastructure (for exploration)
WHAT IF WE GIVE LESS DATA TO OUR ALGORITHMS?
Cf. Zoltan Prekopcsak, Hadoop Summit EU, 2015
FEATURE TEAMS TO DELIVER CODE READY FOR PRODUCTION
Business rep. + Developer + Data Scientist
MESSAGE BROKER TO REUSE DATA FLOWS
Without a message broker (App A, App B, DW, DB X; App C: ? ? ?):
- Custom dev
- Data formats?
- SLA?
- Scheduling?
- …
With a message broker (App A, App B, DW, DB X and App C all go through Kafka):
- Standard format
- Prod ready
- Exploration and prod will share the same formats
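To illustrate why a broker makes data flows reusable, here is a minimal sketch of the idea, assuming a toy in-memory log. The `Broker` class, topic name and event fields are hypothetical and this is not the Kafka client API; it only mimics the property that matters on this slide, namely that each consumer reads the same standard-format flow at its own pace.

```python
from collections import defaultdict


class Broker:
    """Toy append-only log with one read offset per consumer (illustrative only)."""

    def __init__(self) -> None:
        self._topics: dict[str, list[dict]] = defaultdict(list)
        self._offsets: dict[tuple[str, str], int] = defaultdict(int)

    def publish(self, topic: str, event: dict) -> None:
        self._topics[topic].append(event)

    def poll(self, topic: str, consumer: str) -> list[dict]:
        """Return the events this consumer has not read yet and advance its offset."""
        log = self._topics[topic]
        start = self._offsets[(topic, consumer)]
        self._offsets[(topic, consumer)] = len(log)
        return log[start:]


broker = Broker()

# App A and App B publish in one shared format.
broker.publish("orders", {"source": "app_a", "order_id": 1, "amount": 30.0})
broker.publish("orders", {"source": "app_b", "order_id": 2, "amount": 12.5})

# The warehouse and a later-added App C read the same flow independently,
# without any custom development on the producer side.
dw_rows = broker.poll("orders", consumer="dw")
app_c_rows = broker.poll("orders", consumer="app_c")

# New events only reach a consumer once, from where it left off.
broker.publish("orders", {"source": "app_a", "order_id": 3, "amount": 7.0})
dw_new = broker.poll("orders", consumer="dw")
```

Adding App C required no change to App A or App B, which is the reuse the slide argues for.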
KAPPA ARCHITECTURE : EVERYTHING IS A STREAM
[Diagram] Stream Data (Kafka topic) → Stream Processing (streaming app v1, streaming app v2) → Serving DB (result data v1, result data v2)
- Batch jobs are just historical data you send into a streaming app
- Application code is decoupled from technical requirements
- One-shot exploration code that respects the stream abstraction can go into production easily
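The points above can be sketched in a few lines. This is an illustration under stated assumptions, not the speaker's implementation: the event fields, the `revenue_v1` / `revenue_v2` processors and the `run` helper are hypothetical, and a plain Python list stands in for the retained Kafka topic.

```python
from typing import Callable, Iterable

Event = dict
Processor = Callable[[dict, Event], None]


def revenue_v1(state: dict, event: Event) -> None:
    """v1: total revenue per country."""
    state[event["country"]] = state.get(event["country"], 0.0) + event["amount"]


def revenue_v2(state: dict, event: Event) -> None:
    """v2: same metric, but refunds (negative amounts) are ignored."""
    if event["amount"] > 0:
        revenue_v1(state, event)


def run(stream: Iterable[Event], process: Processor) -> dict:
    """Feed a stream into a processor and return the resulting table."""
    state: dict = {}
    for event in stream:
        process(state, event)
    return state


# The retained log plays the role of the topic.
topic = [
    {"country": "FR", "amount": 20.0},
    {"country": "IE", "amount": 15.0},
    {"country": "FR", "amount": -5.0},
]

# "Batch" is just a replay of the historical log through streaming code.
result_v1 = run(topic, revenue_v1)
# Deploying v2 means replaying the same log into a new result table.
result_v2 = run(topic, revenue_v2)
```

Because both versions only see an iterable of events, the same function can be tried on a historical extract during exploration and then fed live events in production.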
IDEAS
Business must be aware of the opportunities to
use algorithms
MIX THESE PEOPLE
Business: knows what is valuable
Data Scientist: knows what is feasible
Culture & Collaboration
FEATURE TEAMS ONCE AGAIN
Business rep. + Developer + Data Scientist
EXPLAIN TO THEM THAT MACHINE LEARNING IS EASY (IT’S METHODOLOGY)
EXPLAIN TO THEM THAT MACHINE LEARNING IS EASY (IT’S MAGIC)
SPEND TIME TOGETHER
- Show them the data
- Pair programming
- Swap roles for one day
SOFTWARE IS EATING THE WORLD : MAKE THEM CODE
Story : Octo Datascience Competition Platform
HOW WIDELY DATA DRIVEN IS YOUR COMPANY?
- Everybody is willing to create value from the available data
- Data serves not only the core business but every single function
- Data is used in day-to-day activity in real time
HOW DEEPLY DATA DRIVEN IS YOUR COMPANY?
- You are using cutting-edge algorithms to automate processes
- You are used to running data-based A/B tests every week
- You cross multiple data sources to build insights and models
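As a concrete reading of the A/B testing point above, here is a minimal sketch of how a weekly test could be evaluated with a two-proportion z-test, using only the standard library. The talk does not prescribe a method; the function name and the visitor and conversion counts are made up for illustration.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical week of traffic: variant A is the current page, B the new one.
z, p_value = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=250, n_b=5000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the difference in conversion is unlikely to be noise; the significance threshold and the sample size needed are decisions for each team.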

Advanced Computer Architecture – An IntroductionDilum Bandara
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 

Recently uploaded (20)

Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 

Practical advice to build a data driven company

  • 1. 50 AVENUE DES CHAMPS-ÉLYSÉES 75008 PARIS > FRANCE > WWW.OCTO.COM HADOOP SUMMIT 2016 - DUBLIN PRACTICAL ADVICE TO BUILD A DATA DRIVEN COMPANY Simon MABY @simonmaby
  • 2. OCTO TECHNOLOGY > THERE IS A BETTER WAY Story: Data Driven E-Commerce
  • 3. A continuous improvement of all business processes, through a smart use of the data, all the time, everywhere and for all purposes
  • 4. BEING DATA DRIVEN IS BEING LEAN: IDEA, CODE, DATA; BUILD, MEASURE, LEARN
  • 5. REQUIREMENTS. DATA: Data must be easily accessible. IDEA: Business must be aware of opportunities to use algorithms. CODE: Datascience projects should have the lowest time to market possible.
  • 7. DATA: Data must be easily accessible
  • 8. Your Datalake is a service to your company. It should be managed like a startup. Your employees are your first clients. The more they use it, the more you are Data Driven.
  • 9. FOCUS ON USABILITY OVER ARCHITECTURE. Diagram: the Datalake Team (OPS - DEVs - DESIGNERS) runs the Datalake and its services for end users and projects; it designs services for usability and grants support, and gathers requirements and usage metrics.
  • 10. FOCUS ON USABILITY OVER ARCHITECTURE: EXAMPLES
      - How simple is it to share data with other projects?
      - How simple is it to subscribe to a data feed?
      - Is it possible to run a full search on available datasets?
      - Is it possible to ask other projects for details about their data through a social network?
      - Auto-completion over SQL requests from other projects?
      - Bookmarking, sharing, upvoting datasets, tagging metadata…
  • 12. CODE: Datascience projects should have the lowest time to market possible
  • 13. EXPLORATION VERSUS PREDICTION: Explore as quickly as possible. Deliver frequently in production.
  • 14. (Not so) Big Data Infrastructure (for exploration)
  • 15. WHAT IF WE GIVE LESS DATA TO OUR ALGORITHMS? Cf. Zoltan Prekopcsak, Hadoop Summit EU 2015
  • 16. FEATURE TEAMS TO DELIVER CODE READY FOR PRODUCTION: Business rep., Developer, Data Sc.
  • 17. MESSAGE BROKER TO REUSE DATA FLOWS
      - Without a broker (App A, App B, DW, DB X; App C: ? ? ?): custom dev, data formats?, SLA?, scheduling? …
      - With Kafka (App A, App B, DW, DB X, App C): standard format, prod ready, exploration and prod will share the same formats
  • 18. KAPPA ARCHITECTURE: EVERYTHING IS A STREAM. Diagram: Stream Data (Kafka topic) > Stream Processing (Streaming app v1, Streaming app v2) > Serving DB (Result data v1, Result data v2)
      - Batch jobs are just historical data you send into a streaming app
      - Application code is decoupled from technical requirements
      - One-shot exploration code respecting the stream abstraction can go into production easily
  • 20. IDEAS: Business must be aware of the opportunities to use algorithms
  • 21. MIX THESE PEOPLE: Business (knows what is valuable) and Data Scientist (knows what is feasible). Culture & Collaboration.
  • 22. FEATURE TEAMS ONCE AGAIN: Business rep., Developer, Data Sc.
  • 23. EXPLAIN TO THEM THAT MACHINE LEARNING IS EASY (IT’S METHODOLOGY)
  • 24. EXPLAIN TO THEM THAT MACHINE LEARNING IS EASY (IT’S MAGIC)
  • 25. SPEND TIME TOGETHER
      - Show them the data
      - Pair programming
      - Swap roles for one day
  • 26. SOFTWARE IS EATING THE WORLD: MAKE THEM CODE
  • 27. Story: Octo Datascience Competition Platform
  • 28. HOW WIDELY DATA DRIVEN IS YOUR COMPANY?
      - Everybody is willing to make value out of the available data
      - Data serves not only the core business but every single function
      - Data is used in day-to-day activity in real time
  • 29. HOW DEEPLY DATA DRIVEN IS YOUR COMPANY?
      - You are using cutting-edge algorithms to automate processes
      - You are used to A/B testing based on data every week
      - You cross multiple data sources to build insights and models
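
The Kappa idea from slide 18 (a batch job is just historical data sent into a streaming app) can be illustrated with a minimal Python sketch. This is not the speaker's implementation: the event schema, the `running_totals` and `serve` functions, and the in-memory lists standing in for a Kafka topic and a serving database are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical event schema: (customer_id, order_amount).
Event = Tuple[str, float]


def running_totals(events: Iterable[Event]) -> Iterator[Tuple[str, float]]:
    """Streaming app: emit the updated total per customer after every event."""
    totals: Dict[str, float] = defaultdict(float)
    for customer, amount in events:
        totals[customer] += amount
        yield customer, totals[customer]


def serve(results: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Stand-in for the serving DB: keep the latest value seen for each key."""
    table: Dict[str, float] = {}
    for key, value in results:
        table[key] = value
    return table


# "Batch": replay the historical topic from its beginning.
history = [("alice", 10.0), ("bob", 5.0), ("alice", 2.5)]
batch_view = serve(running_totals(history))


def live_feed() -> Iterator[Event]:
    """Stand-in for a consumer loop: replays history, then yields a new event."""
    yield from history
    yield ("bob", 1.0)


# "Streaming": the same application code, fed by an iterator instead of a list.
live_view = serve(running_totals(live_feed()))
```

Because `running_totals` only relies on iterating over events, the exploration code and the production code are the same function; only the source of events changes.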

Editor's Notes

  1. When you are able to evaluate the returns on your data analysis investments. The true revolution is analytics and data science, and how democratized and easy to use they have become.
  2. How many views on which dashboards? Trends? Useless ones? Dashboard recommendation? Data item search: I want to search for any dataset based on its content and semantics. Code and query transparency: what do people query the most on the database? How do they do it? Easy pub/sub data brokering: "Hey, as a Data Scientist I subscribe to this data stream." Free search. 360° customer view. Interactive web interfaces. Personalized dashboards. Subscription to Data News. Enterprise social network around data-oriented content. THIS IS NOT MAINLY ABOUT SCALABILITY.
  3. Example: instead of offering many complex services, just provide Hive/HDFS/Spark access with good guarantees and secured access. Offer software factories.
  4. Rely on Python and R: the maturity of Big Data frameworks is not that good for data exploration. Most of the time you don’t even have Big Data.
  5. Under Sampling services over Big Data
  6. Data scientists' tools are not built for production. The most commonly used tools do not scale and produce poor-quality code (IPython, RStudio…), and they rely on libraries of variable stability. The market tools presented as must-haves are slow to use and often lead to elephantine architectures. Moreover, their data science features are limited (Spark MLlib, for example). From one project to the next, the exploration libraries can vary enormously or become very specific (Vowpal Wabbit, deep learning, custom code…). Going to production raises many questions: Will the architecture choices limit the model from a scientific point of view? Should models be simplified for the sake of operability? What technical performance is expected? Is it consistent with the libraries or scientific choices made during the studies? What are the differences between the offline study and a real model in production? Study hidden feedback loop effects and set up A/B testing.
  7. One topic per entity. Everything is Avro. The integration effort is provided by the datalake team. Once it's done, the format is known and it's easy to get data. Left: in dev I develop some flows, in prod I develop others because of other constraints (or because I had done a one-shot import). Right: as a developer I have a standardized interface; when I go to production I have the same read interface as during exploration.
  8. Why it's good: you can test on both historical and new data. Running in production is transparent whatever the requirements are, batch or streaming. You do data exploration with this in mind, and not on static data that will need development.
  9. Rituals of a team – The 10 minutes rule, morning standups, Brown Bag Lunches, monthly conferences
  10. Machine learning is 90% about methodology: it is understandable by a non-technical person. How do you define the problem? What is your target? What is the question you're trying to answer? What is an example in your dataset? How do you choose and generate features that are relevant to your target? How do you cross-validate the results, and what does the validation metric mean to the business? How do you make a profit out of the model in production? Is there any particular issue such as hidden feedback loops or presentation bias? Mention TRAINING.
  11. Data scientist and head of marketing
  12. Coding CDO
  13. Gamification / Engagement / Aligning the culture between different departments and profiles
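
Note 10 asks how results are cross-validated. As a rough illustration of that methodology step, here is a minimal k-fold sketch in plain Python. The helper names (`k_fold_indices`, `cross_validate`), the majority-class baseline and the toy labels are hypothetical and not taken from the talk; the sketch also assumes the data has already been shuffled, since the folds are contiguous.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple


def k_fold_indices(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split range(n) into k contiguous folds; return (train, validation) indices."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and n")
    folds = []
    start = 0
    for i in range(k):
        # The first n % k folds get one extra item so sizes differ by at most 1.
        size = n // k + (1 if i < n % k else 0)
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        folds.append((train, validation))
        start += size
    return folds


def cross_validate(xs: Sequence, ys: Sequence, fit: Callable, score: Callable, k: int = 5) -> float:
    """Fit on k-1 folds, score on the held-out fold, and average the k scores."""
    scores = []
    for train, validation in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(score(model, [xs[i] for i in validation], [ys[i] for i in validation]))
    return sum(scores) / len(scores)


def fit_majority(xs: Sequence, ys: Sequence):
    """Hypothetical baseline model: always predict the most frequent training label."""
    return Counter(ys).most_common(1)[0][0]


def accuracy(model, xs: Sequence, ys: Sequence) -> float:
    """Share of validation labels equal to the baseline's constant prediction."""
    return sum(1 for y in ys if y == model) / len(ys)


# Toy data, invented for the example.
features = list(range(10))
labels = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
mean_accuracy = cross_validate(features, labels, fit_majority, accuracy, k=5)
```

The averaged score is the number to translate into business terms; a trivial baseline like this one is a useful reference point when explaining what a real model adds.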