Anne-Sophie Roessler, International Business Developer at Dataiku, presented "3 ways to Fail your Data Lab Implementation" as part of the Big Data, Berlin v 8.0 meetup organised on the 14th of July 2016 at the WeWork headquarters.
4. Why a Data Lab?
• One single workflow: from a segmented workflow to a transversal one
• Several use cases: the ability to address many different data-centric topics within a single unit
• Multiple competences: a business-focused approach mixing many different competences
• End-to-end projects: combining data from different sources to handle several aspects of a single topic
5. Deployment of the predictions
Dataiku DSS for fraud prediction
Data sources: client service, sensor data, garage data, administration
• 1 Project Owner (IT)
• 1 Project Manager (Business)
• 1 Data scientist in house
• 3 data scientists from 3 different firms
• 3 consultants from 3 different firms
• 1 architect (external)
Accepted file / INVESTIGATE!
Transactions are blocked depending on how far they deviate from the business rules and behavioral patterns.
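The routing on this slide can be sketched as a small function. Each transaction is either accepted or blocked for investigation, depending on how far it deviates from the business rules and from behavioral patterns. This is a minimal illustration under stated assumptions. The `Transaction` fields, the two rules, the use of a z-score as the "behavioral gap" and the threshold of 3 are all hypothetical. They are not Dataiku DSS features and not the project's actual model.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Transaction:
    # Hypothetical fields, chosen for illustration only.
    client_id: str
    amount: float
    garage_verified: bool


# Hypothetical business rules: each returns True when the rule is violated.
BUSINESS_RULES = {
    "amount_over_limit": lambda t: t.amount > 10_000,
    "garage_not_verified": lambda t: not t.garage_verified,
}


def behavioral_gap(t: Transaction, history: list[float]) -> float:
    """Absolute z-score of the amount against the client's past amounts."""
    if len(history) < 2:
        return 0.0  # not enough history to judge behavior
    sigma = pstdev(history)
    if sigma == 0:
        # All past amounts are identical: any different amount is a full break.
        return 0.0 if t.amount == history[0] else float("inf")
    return abs(t.amount - mean(history)) / sigma


def triage(t: Transaction, history: list[float], z_threshold: float = 3.0):
    """Return (decision, violated rule names, behavioral gap)."""
    violations = [name for name, rule in BUSINESS_RULES.items() if rule(t)]
    gap = behavioral_gap(t, history)
    if violations or gap > z_threshold:
        return "INVESTIGATE", violations, gap  # blocked pending investigation
    return "ACCEPTED", violations, gap
```

With a client history of `[120.0, 95.0, 130.0, 110.0]`, a 115.0 transaction from a verified garage is accepted. A 900.0 transaction, or any transaction from an unverified garage, is routed to investigation. Keeping rule violations and the behavioral score separate lets investigators see *why* a file was blocked.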
7. Focus on the framework, not on the input
Adapted from the CRISP-DM methodology: the same workflow is run iteratively (Iteration 1, Iteration 2, … Iteration n) on successive inputs (Dataset 1, Dataset 2, … Dataset n).
Business Understanding
Data Acquisition & Understanding
✓ Read and import raw data
✓ Detect schemas and structure
✓ Analyze distributions
✓ Assess quality: outliers, missing values...
Data Preparation
✓ Create derived and aggregated variables
→ Analytical dataset
Model Creation
✓ Feature selection
✓ Compare algorithms
Evaluation
✓ Performance metrics
✓ Robustness & generalization (cross-validation)
✓ Insights (e.g. variable importance)
→ Report
Deployment
✓ Scoring engine
✓ Publish predictions
✓ Monitor performance
✓ API
→ Scored dataset
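A short standard-library sketch can illustrate two checklist items from this workflow. The first is assessing data quality (missing values, outliers). The second is checking robustness with k-fold cross-validation. The sketch makes several assumptions:
• a single numeric column held in a Python list
• the common 1.5 × IQR rule as the outlier criterion
• contiguous, unshuffled folds
These are illustrative choices, not the procedure prescribed by the talk or by CRISP-DM.

```python
from statistics import quantiles


def quality_report(values):
    """Count missing values (None) and flag 1.5 x IQR outliers in one column."""
    present = [v for v in values if v is not None]
    missing = len(values) - len(present)
    q1, _, q3 = quantiles(present, n=4)  # quartiles, default 'exclusive' method
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in present if v < low or v > high]
    return {"missing": missing, "outliers": outliers}


def k_fold_indices(n_rows, k):
    """Yield (train, test) index lists for k contiguous, non-overlapping folds.

    Rows are not shuffled here; real data is usually shuffled first.
    """
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_rows))
        yield train, test
        start += size


def cross_validate(fit, predict, xs, ys, k=5):
    """Return the mean absolute error of the model on each held-out fold."""
    scores = []
    for train, test in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors = [abs(predict(model, xs[i]) - ys[i]) for i in test]
        scores.append(sum(errors) / len(errors))
    return scores
```

For example, `quality_report([1, 2, 3, 4, 5, None, 100])` reports one missing value and flags `100` as an outlier. Passing two different `fit`/`predict` pairs to `cross_validate` gives per-fold scores that can be compared, which is one simple way to "compare algorithms" within an iteration.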
8. People and Governance
Polyglot vs. dictator?
Problems:
• Collaboration between technical and non-technical profiles inside a single project
• Necessary collaboration between business and tech teams to address transversal projects accurately
Focus:
• Promote diversity
• …within a workflow-centric environment
9. End to end, from prototyping into production
Do it your way…
11. Data Lab Organisation
Data Lab
Lab Environment
Multidisciplinary team:
• Direction / Project Management
• Business Analysts
• Data Miners / Data Scientists
Production Environment
Business needs
Internal data sources
External data sources
Missions:
• Prioritisation of the business needs
• Prototyping / agile solution engineering
• Support for app deployment
Business Applications
Marketing Campaign Automation
Reporting / web analytics
Data as a Service Platform
Design of “DATA PRODUCTS”
Integration of Data Products
Optimisation Engine
Real-Time Scoring
Data Flow
Insights & Services
Processing chain
API Deployment
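On the production side, "Real Time Scoring" and "API Deployment" usually mean exposing a trained model behind a request/response interface. The sketch below shows only the shape of such an endpoint handler: JSON features in, a score and a decision out. The logistic model, its coefficients, the feature names (`amount_zscore`, `rule_violations`) and the 0.5 threshold are hypothetical placeholders. In practice the handler would sit behind an HTTP server or the scoring API of whichever platform is used.

```python
import json
import math

# Hypothetical model: logistic-regression coefficients produced in the lab.
MODEL = {
    "intercept": -4.0,
    "weights": {"amount_zscore": 0.9, "rule_violations": 1.5},
}


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(features: dict) -> float:
    """Probability from the linear model; missing features default to 0."""
    z = MODEL["intercept"] + sum(
        w * float(features.get(name, 0.0)) for name, w in MODEL["weights"].items()
    )
    return _sigmoid(z)


def handle_request(body: str, threshold: float = 0.5) -> str:
    """Parse a JSON request body, score it and return a JSON response body."""
    try:
        features = json.loads(body)
        if not isinstance(features, dict):
            raise ValueError("expected a JSON object")
        p = score(features)
    except (ValueError, TypeError) as exc:  # bad JSON or non-numeric values
        return json.dumps({"error": str(exc)})
    decision = "INVESTIGATE" if p >= threshold else "ACCEPTED"
    return json.dumps({"score": round(p, 4), "decision": decision})
```

Keeping the handler a pure string-to-string function makes it easy to test in the lab environment before it is wired into the production environment.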