Michał Bryś, Data Scientist @ Allegro, Complexity Garage @ Kraków, 05.02.2016Michał Bryś, Data Scientist @ Allegro, Measure Camp @ London, 10.09.2016
Michal Brys
Data Scientist @ Allegro
Measure Camp | London, 10th September 2016
Find signal in noise.
6 steps to find value from messy data.
Michal Brys
Data Scientist @ Allegro
Also specializes in:
+ Google Analytics
+ Google Tag Manager
michalbrys.com
about.me/michal.brys
Framework for data analysis
CRISP-DM
- Cross Industry Standard Process for Data Mining
- Set up in 1996 (SPSS, Teradata, Daimler AG, NCR, OHRA)
- Still works!
Read more: https://en.wikipedia.org/wiki/Cross_Industry_Standard_Process_for_Data_Mining
1: Business Understanding
- Define the analysis goal
- What do you want to achieve with the analysis?
- Check business context
- Don’t be afraid to ask questions
1: Business Understanding
I want to select the customer group with the
highest probability of response (...)
so that the marketing campaign can target this group.
2: Data Understanding
- Collect data
Check:
- What each variable in the dataset means
- Are there missing values?
- Exploratory data analysis (EDA)
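The slides show no code, so the following is only a minimal sketch of the missing-values check, written in Python with the standard library. The sample data and the column names (client_id, sessions, device, responded) are hypothetical and are not taken from the talk's dataset.

```python
import csv
import io

# Hypothetical sample; column names and values are illustrative only.
SAMPLE = """client_id,sessions,device,responded
a1,3,mobile,1
a2,,desktop,0
a3,5,,1
a4,2,mobile,
"""


def missing_counts(rows):
    """Count empty cells per column across all rows."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            counts.setdefault(column, 0)
            if value is None or value.strip() == "":
                counts[column] += 1
    return counts


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
report = missing_counts(rows)
print(report)
```

A per-column count like this is a quick first look before deciding how to handle the gaps in Data Preparation.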
2: Data Understanding
Google Analytics with client ID as a custom dimension
- Source: Cookies + JavaScript tracker
- Processed by Google Analytics
- No access to raw data
2: Data Understanding
10 000 records with 11 variables
3: Data Preparation
- Data cleaning
- Prepare new variables, transform data
- Remove missing values and outliers
- Check distributions
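A minimal sketch of the cleaning step for one numeric variable, in Python. The values are hypothetical, and the 1.5 * IQR rule used to flag outlying values is one common convention assumed here, not a rule stated in the talk.

```python
import statistics

# Hypothetical values of one numeric variable; None marks a missing value.
session_counts = [3, 4, None, 5, 4, 3, 120, 4, 5, None, 3, 4]

# Step 1: drop missing values.
present = [v for v in session_counts if v is not None]

# Step 2: keep only values inside the 1.5 * IQR fences (assumed convention).
q1, _median, q3 = statistics.quantiles(present, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
cleaned = [v for v in present if low <= v <= high]

print(len(session_counts), len(present), len(cleaned))
```

Whether an extreme value is an error or a genuinely unusual customer is a business question, so it is worth inspecting the flagged values before dropping them.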
3: Data Preparation
Example: fix variable types.
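The slide's own example is not reproduced in this transcript. As an illustration only, fixing variable types could look like the Python sketch below; the field names and target types are hypothetical.

```python
# Hypothetical raw records: every field arrives as a string.
raw_rows = [
    {"client_id": "1001.1", "sessions": "3", "revenue": "19.99", "responded": "1"},
    {"client_id": "1002.7", "sessions": "12", "revenue": "0", "responded": "0"},
]

# Target type per variable (assumed for this sketch).
# client_id stays a string: it is an identifier, not a quantity.
CONVERTERS = {
    "client_id": str,
    "sessions": int,
    "revenue": float,
    "responded": lambda s: s == "1",
}


def fix_types(row):
    """Return a copy of the row with each field converted to its target type."""
    return {name: CONVERTERS[name](value) for name, value in row.items()}


typed_rows = [fix_types(r) for r in raw_rows]
print(typed_rows[0])
```

Identifiers that look numeric are a typical trap: treated as numbers they can lose precision or be averaged by mistake.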
4: Modeling
- Classification problem
- Build models with different methods
- Split data into training and test subsets
- CART
- C5.0
- Logit Regression
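A minimal sketch of a random training/test split in Python. The 70/30 ratio and the seed are assumptions for illustration; the talk does not state how its subsets were drawn. The record count matches the 10 000 records mentioned earlier.

```python
import random


def train_test_split(records, test_fraction=0.3, seed=42):
    """Shuffle a copy of the records and cut it into (train, test)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Stand-in for the dataset: record indices only.
records = list(range(10_000))
train, test = train_test_split(records)
print(len(train), len(test))
```

Each method is then fitted on the training subset only, and the test subset is kept aside for the Evaluation step.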
5: Evaluation
Model              True Negative   True Positive   False Negative   False Positive   Total Error Rate
CART                        5081            3150             1080              689             17.69%
C5.0                        4089            2701             1606             1604             32.10%
Logit Regression            5871            2107             1307              715             20.22%
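The total error rate is the share of misclassified records: (False Negative + False Positive) / all records. The Python sketch below recomputes it from the counts in the evaluation figures above; the function and variable names are illustrative.

```python
# Confusion-matrix counts from the evaluation figures: (TN, TP, FN, FP).
results = {
    "CART": (5081, 3150, 1080, 689),
    "C5.0": (4089, 2701, 1606, 1604),
    "Logit Regression": (5871, 2107, 1307, 715),
}


def total_error_rate(tn, tp, fn, fp):
    """Share of misclassified records among all records."""
    return (fn + fp) / (tn + tp + fn + fp)


rates = {model: total_error_rate(*counts) for model, counts in results.items()}
for model, rate in rates.items():
    print(f"{model}: {rate:.2%}")

# Model with the lowest total error rate.
best = min(rates, key=rates.get)
print("Lowest error rate:", best)
```

Each row sums to 10 000 records, and the recomputed rates match the reported ones, with CART lowest. Total error rate treats both kinds of mistake as equally costly, which may not hold for a marketing campaign.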
6: Deployment
- Prepare a report
- Implement in a system
- Build a product
- ...
Summary
CRISP-DM
+ Keeps the business goal in mind
+ The result answers the initial question
+ Reproducible and documented process
Image: https://en.wikipedia.org/wiki/Cross_Industry_Standard_Process_for_Data_Mining#/media/File:CRISP-DM_Process_Diagram.png
More inspiration
“Data Mining Methods and Models”
Daniel T. Larose
“The Signal and the Noise”
Nate Silver
One more thing...
michalbrys.gitbooks.io/r-google-analytics/
Q&A
Michal Brys
about.me/michal.brys
github.com/michalbrys

Find signal in noise.

  • 1. Michał Bryś, Data Scientist @ Allegro, Complexity Garage @ Kraków, 05.02.2016. Michał Bryś, Data Scientist @ Allegro, Measure Camp @ London, 10.09.2016.
       Michal Brys, Data Scientist @ Allegro
       Measure Camp | London, 10th September 2016
       Find signal in noise.
       6 steps to find value from messy data.
  • 2. Michal Brys, Data Scientist @ Allegro
       Also specialized in:
       + Google Analytics
       + Google Tag Manager
       michalbrys.com
       about.me/michal.brys
  • 3. Framework for data analysis: CRISP-DM
       - Cross Industry Standard Process for Data Mining
       - Set up in 1996 (SPSS, Teradata, Daimler AG, NCR, OHRA)
       - Still works!
       Read more: https://en.wikipedia.org/wiki/Cross_Industry_Standard_Process_for_Data_Mining
  • 4. 1: Business Understanding
       - Define the analysis goal
       - What do you want to achieve with the analysis?
       - Check the business context
       - Don’t be afraid to ask questions
  • 5. 1: Business Understanding
       I want to select the customer group with the highest probability of response (...) so that the marketing campaign can target this group.
  • 6. 2: Data Understanding
       - Collect data
       Check:
       - What each variable in the dataset means
       - Are there missing values?
       - Exploratory data analysis (EDA)
  • 7. 2: Data Understanding
       Google Analytics with client ID as a custom dimension
       - Source: cookies + JavaScript tracker
       - Processed by Google Analytics
       - No access to raw data
  • 8. 2: Data Understanding
       10 000 records with 11 variables
  • 9. 3: Data Preparation
       - Data cleaning
       - Prepare new variables, transform data
       - Remove missing and outlying values
       - Check distributions
  • 10. 3: Data Preparation
       Example: fix variable types.
  • 11. 4: Modeling
       - Classification problem
       - Build models with different methods
       - Training and test subsets
       - CART, C5.0, Logit Regression
  • 12. 5: Evaluation
       Model            | True Negative | True Positive | False Negative | False Positive | Total Error Rate
       CART             | 5081          | 3150          | 1080           | 689            | 17.69%
       C5.0             | 4089          | 2701          | 1606           | 1604           | 32.10%
       Logit Regression | 5871          | 2107          | 1307           | 715            | 20.22%
  • 13. 6: Deployment
       - Prepare report
       - Implement in system
       - Build product
       - ...
  • 14. Summary: CRISP-DM
       + Keeps the business goal in mind
       + The result answers the initial question
       + Reproducible and documented process
       Image: https://en.wikipedia.org/wiki/Cross_Industry_Standard_Process_for_Data_Mining#/media/File:CRISP-DM_Process_Diagram.png
  • 15. More inspiration
       “Data Mining Methods and Models”, Daniel T. Larose
       “The Signal and the Noise”, Nate Silver
  • 16. One more thing...
       michalbrys.gitbooks.io/r-google-analytics/
  • 17. Q&A
       Michal Brys
       about.me/michal.brys
       github.com/michalbrys
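
The "Total Error Rate" column in the Evaluation table on slide 12 can be reproduced from the four confusion-matrix counts. The slides do not state the formula, so the sketch below assumes the usual definition, (False Negative + False Positive) / all records, which does match the published percentages. The names `CONFUSION`, `error_rate` and `rank_models` are hypothetical, chosen only for this illustration. Python is used here for convenience and is not necessarily the tool used in the talk.

```python
# Confusion-matrix counts copied from the Evaluation table (slide 12).
CONFUSION = {
    "CART": {"tn": 5081, "tp": 3150, "fn": 1080, "fp": 689},
    "C5.0": {"tn": 4089, "tp": 2701, "fn": 1606, "fp": 1604},
    "Logit Regression": {"tn": 5871, "tp": 2107, "fn": 1307, "fp": 715},
}


def error_rate(counts):
    """Share of misclassified records: (FN + FP) / total (assumed definition)."""
    total = sum(counts.values())
    return (counts["fn"] + counts["fp"]) / total


def rank_models(confusion):
    """Return model names ordered from lowest to highest error rate."""
    return sorted(confusion, key=lambda name: error_rate(confusion[name]))


if __name__ == "__main__":
    for name in rank_models(CONFUSION):
        print(f"{name}: {error_rate(CONFUSION[name]):.2%}")
```

Each row sums to 10 000 records, consistent with the dataset size on slide 8, and under this definition CART has the lowest error rate of the three models.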