1. Michał Bryś, Data Scientist @ Allegro, Complexity Garage @ Kraków, 05.02.2016; Michał Bryś, Data Scientist @ Allegro, Measure Camp @ London, 10.09.2016
Michal Brys
Data Scientist @ Allegro
Measure Camp | London, 10th September 2016
Find signal in noise.
6 steps to find value in messy data.
2.
Michal Brys
Data Scientist @ Allegro
Specialized also in:
+ Google Analytics
+ Google Tag Manager
michalbrys.com
about.me/michal.brys
3.
Framework for data analysis
CRISP-DM
- Cross Industry Standard Process for Data Mining
- Set up in 1996 (SPSS, Teradata, Daimler AG, NCR, OHRA)
- Still works!
Read more: https://en.wikipedia.org/wiki/Cross_Industry_Standard_Process_for_Data_Mining
4.
1: Business Understanding
- Define analysis goal
- What do you want to achieve with the analysis?
- Check business context
- Don’t be afraid to ask questions
5.
1: Business Understanding
I want to select the customer group with the
highest probability of response (...)
to target a marketing campaign at this group.
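Once a model produces a response probability per customer, the goal above boils down to ranking and cutting off the top of the list. A minimal sketch, assuming the scores already exist; the function name `select_target_group`, the `share` parameter, and the example ids and scores are hypothetical and not from the talk:

```python
from typing import Dict, List


def select_target_group(scores: Dict[str, float], share: float = 0.1) -> List[str]:
    """Return the client ids with the highest predicted response probability."""
    if not 0 < share <= 1:
        raise ValueError("share must be in (0, 1]")
    # Sort ids from most to least likely to respond.
    ranked = sorted(scores, key=lambda cid: scores[cid], reverse=True)
    # Always keep at least one customer in the target group.
    k = max(1, round(len(ranked) * share))
    return ranked[:k]


# Made-up scores, for illustration only.
example_scores = {"c1": 0.12, "c2": 0.81, "c3": 0.45, "c4": 0.67, "c5": 0.05}
target = select_target_group(example_scores, share=0.4)
print(target)
```

How large `share` should be is a business decision (campaign budget, cost per contact), which is why it belongs in this step and not in modeling.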
6.
2: Data Understanding
- Collect data
Check:
- What each variable in the dataset means
- Are there missing values?
- Exploratory data analysis (EDA)
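The checks above can be sketched in a few lines: count missing values per variable, then look at basic summary statistics. This is only an illustration using the Python standard library; the variable names `sessions` and `pageviews` and the sample rows are assumptions, not the dataset from the talk:

```python
from collections import Counter
from statistics import mean, median
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]


def missing_counts(rows: List[Row]) -> Dict[str, int]:
    """Count missing (None) values per variable."""
    counts: Counter = Counter()
    for row in rows:
        for name, value in row.items():
            counts[name] += value is None  # True counts as 1, False as 0
    return dict(counts)


def summarize(rows: List[Row], name: str) -> Dict[str, float]:
    """Basic summary of one numeric variable, ignoring missing values."""
    values = [row[name] for row in rows if row[name] is not None]
    return {
        "n": len(values),
        "min": min(values),
        "median": median(values),
        "mean": mean(values),
        "max": max(values),
    }


# Hypothetical sample rows, for illustration only.
rows = [
    {"sessions": 3, "pageviews": 12},
    {"sessions": 1, "pageviews": None},
    {"sessions": None, "pageviews": 4},
    {"sessions": 7, "pageviews": 30},
]
missing = missing_counts(rows)
summary = summarize(rows, "sessions")
print(missing)
print(summary)
```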
7.
2: Data Understanding
Google Analytics with client id as custom dimension
- Source: Cookies + JavaScript tracker
- Processed by Google Analytics
- No access to raw data
8.
2: Data Understanding
10 000 records with 11 variables
9.
3: Data Preparation
- Data cleaning
- Prepare new variables, transform data
- Remove missing values and outliers
- Check distributions
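One common way to handle the cleaning steps above is to drop missing values and then filter outliers with the interquartile range (IQR) rule. The slide does not say which rule was used, so the 1.5 x IQR fence below is an assumption, as are the function name and the sample values:

```python
from statistics import quantiles
from typing import List, Optional


def drop_missing_and_outliers(values: List[Optional[float]], k: float = 1.5) -> List[float]:
    """Drop None values, then keep only values inside the IQR fences."""
    present = [v for v in values if v is not None]
    # Quartiles of the non-missing values (needs at least two data points).
    q1, _, q3 = quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in present if low <= v <= high]


# Hypothetical values: two missing entries and one obvious outlier (250).
raw = [10, 12, None, 11, 13, 12, 250, None, 9]
cleaned = drop_missing_and_outliers(raw)
print(cleaned)
```

Whether an extreme value is an error or a genuinely valuable customer depends on the business context, so it is worth inspecting the removed rows before discarding them.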
10.
3: Data Preparation
Example: Fix variable types.
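Exported data often arrives with every field as text, so fixing types means mapping each variable to the type it should have. A minimal sketch, assuming hypothetical column names and a `YYYYMMDD` date format; none of these are taken from the talk's dataset:

```python
from datetime import datetime
from typing import Any, Callable, Dict

# Hypothetical mapping from variable name to the converter it needs.
CONVERTERS: Dict[str, Callable[[str], Any]] = {
    "client_id": str,  # looks numeric, but is an identifier, so keep it as text
    "sessions": int,
    "revenue": float,
    "responded": lambda s: s.strip().lower() in {"1", "true", "yes"},
    "first_visit": lambda s: datetime.strptime(s, "%Y%m%d").date(),
}


def fix_types(record: Dict[str, str]) -> Dict[str, Any]:
    """Convert each field with its converter; unknown fields stay as strings."""
    return {name: CONVERTERS.get(name, str)(value) for name, value in record.items()}


# Made-up record, for illustration only.
raw_record = {
    "client_id": "123.456",
    "sessions": "4",
    "revenue": "19.90",
    "responded": "true",
    "first_visit": "20160105",
}
typed = fix_types(raw_record)
print(typed)
```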
11.
4: Modeling
- Classification problem
- Build models using different methods
- Training and test subsets
- CART
- C5.0
- Logistic regression
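To make the training/test idea concrete, here is a small sketch in plain Python. It is not CART, C5.0, or logistic regression: it fits a single-split decision stump chosen by Gini impurity, which is only the basic building block of a CART-style tree. The data, function names, and the 70/30 split are assumptions for illustration:

```python
import random
from typing import List, Tuple

Sample = Tuple[float, int]        # (feature value, class label 0 or 1)
Stump = Tuple[float, int, int]    # (threshold, label if x <= t, label if x > t)


def train_test_split(data: List[Sample], test_share: float = 0.3, seed: int = 42):
    """Shuffle reproducibly and split into (train, test)."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_share)
    return shuffled[n_test:], shuffled[:n_test]


def gini(labels: List[int]) -> float:
    """Gini impurity for binary labels; 0.0 means a pure group."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def majority(labels: List[int]) -> int:
    return int(sum(labels) * 2 >= len(labels))


def fit_stump(train: List[Sample]) -> Stump:
    """Pick the threshold with the lowest weighted Gini impurity."""
    values = sorted({x for x, _ in train})
    if len(values) < 2:
        raise ValueError("need at least two distinct feature values")
    best, best_score = None, float("inf")
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in train if x <= t]
        right = [y for x, y in train if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(train)
        if score < best_score:
            best, best_score = (t, majority(left), majority(right)), score
    return best


def accuracy(stump: Stump, data: List[Sample]) -> float:
    t, left_label, right_label = stump
    hits = sum((left_label if x <= t else right_label) == y for x, y in data)
    return hits / len(data)


# Toy data: label is 1 when the feature is at least 5.
data = [(float(x), int(x >= 5)) for x in range(10)]
train, test = train_test_split(data)
stump = fit_stump(train)
print("train accuracy:", accuracy(stump, train))
print("test accuracy:", accuracy(stump, test))
```

The model is fitted on the training subset only; the test subset is held back so that its accuracy gives an honest estimate of how the model behaves on unseen customers.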
13.
6: Deployment
- Prepare report
- Implement in system
- Build product
- ...
14.
Summary
CRISP-DM
+ Keeps business goal in mind
+ Result answers the initial question
+ Reproducible and documented process
Image: https://en.wikipedia.org/wiki/Cross_Industry_Standard_Process_for_Data_Mining#/media/File:CRISP-DM_Process_Diagram.png
15.
More inspiration
“Data Mining Methods and Models”
Daniel T. Larose
“The Signal and the Noise”
Nate Silver
16.
One more thing...
michalbrys.gitbooks.io/r-google-analytics/
17.
Q&A
Michal Brys
about.me/michal.brys
github.com/michalbrys