Store, Extract, Transform, Load, Visualize
Ani Lopez
@anilopez
linkedin.com/in/anilopez
What is this All About
In the beginning there was Data
Infrastructure &
Data Base Admins
BIs
Analysts
And everybody was fairly happy
Data got big & moved in need of strong support
Which made Analysts’ work way harder
How do we solve that?
As long as you have
access to Sources,
and control over SETL,
you are ready to funk it up!
Go beyond the GA/AA interface. You have to
No need to be an engineer. You can do it
BigData is not scary anymore
This is about how you take over the process
with minimum or no technical knowledge
Analyze
Visualize
Store, Extract, Transform, Load
Automate!
Step 1. Storage
Typical sources
• Online traffic measuring tools like GA or AA
• Social media platforms
• Customer Relationship Management platforms
• Booking systems, Call centers, Retailing
• Telemetry
Data doesn't exist until it is stored somewhere
First challenge: get access
• Amount of sources: one, many, too many
• Access difficulty: simple, complicated, impossible
• Combinations of the above
Sources usually come with a Storing Solution
Yours
Why Our Own Storage?
[Diagram: several boxes labeled Source and one labeled Safe]
Types
• Internal
• Excel
• MSSQL / MySQL Server
• External or Cloud
• BigQuery, Cloud SQL, Bigtable, DataStorage
• Amazon Redshift
Build your Own Storage
If you are lucky
• All data in a decent storage. Nothing else to do!
• DB / Infrastructure Admins connect the pipes for you
If you are not
• Do it yourself, a little bit of coding comes in handy
• Cry for help
How?
Step 2. Extract
First
• From Sources to your Storage
• Minimum or no transformation at all
Second
• From your Storage to Intermediate tables
• Heavily transformed
Two moments of Extraction
Dirt cheap
• Next Analytics / BigQuery add-ins for Excel
• Supermetrics / OWOX BQ add-ins for Google Sheets
Careful
• They should be able to automate extraction
• If not, some scripting might be required
Tools for Extraction (I)
Data Integration Services
Not so cheap, no coding!
• Analytics Canvas
• Xplenty
• Alteryx
• Fivetran
• Mode
Tools for Extraction (II)
With a hand from DBAs and Engineers
• Google Cloud Dataflow
• Amazon Kinesis
Tools for Extraction (III)
Step 3. Transform
• Viz is important, transformation is key
• No good data = No SUCCESS
Transformation
First
• Data cleansing
• Data enrichment
• Consistency ensuring
Second
• Data modeling prior to analysis or visualization
Two moments of Transformation
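The first moment (cleansing, enrichment, consistency) can be sketched in a few lines of JavaScript. This is only a minimal sketch: the row fields `city`, `date` and `delay` and the helper names `toNumber` and `cleanRows` are assumptions for illustration, not something the deck defines.

```javascript
// Hypothetical helper: turn a raw cell into a number, or NaN when it is empty.
function toNumber(value) {
  if (value === null || value === undefined || String(value).trim() === '') {
    return NaN;
  }
  return Number(value);
}

// Hypothetical raw rows look like { city: ' toronto ', date: '2012-11-03', delay: '15' }.
function cleanRows(rows) {
  return rows
    .map(function (row) {
      var date = String(row.date);
      return {
        // Consistency: trim whitespace and use one casing for city names.
        city: String(row.city || '').trim().toUpperCase(),
        // Cleansing: accept only dates shaped like YYYY-MM-DD.
        date: /^\d{4}-\d{2}-\d{2}$/.test(date) ? date : null,
        // Cleansing: numeric strings become numbers, everything else NaN.
        delay: toNumber(row.delay)
      };
    })
    .filter(function (row) {
      // Drop rows that failed any of the checks above.
      return row.city !== '' && row.date !== null && Number.isFinite(row.delay);
    })
    .map(function (row) {
      // Enrichment: derive the month so later steps can filter on it.
      row.month = Number(row.date.slice(5, 7));
      return row;
    });
}
```

In a real setup this work would more likely live in the SQL that fills the intermediate tables; the point here is only the order of operations: make values consistent, reject bad rows, then enrich.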
• SQL is the tool to answer complex business questions
• It can take you to the BI realm = more $$$ :-D
• A bit of code takes you further
• modeanalytics.com --> Resources
Learn SQL and some JS/Python
Step 4. Load
Why not connect the Viz tool directly to Storage?
• They die when the volume of data is huge
• Limited options for transformation
Solution
• Automate materialization to intermediate tables
• Feed Viz tools from those tables
Feed the Viz
Rows: 3,706M
Total time: 180 secs
CPU time: 1.7 days
Rows: 2,3M
Total time: 18 secs
CPU time: 17 secs
Flight delays
1 year of data
Extract only November
10% sample of that
Quick guess
What city and day of November had highest delays?
And you need some
quick charts too
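The narrowing described here (one month out of a year, then a 10% sample, then a quick answer) can be sketched in plain JavaScript. This is an illustration under assumptions: the row shape `{ city, date, delay }` and the function names are hypothetical, and the sample is a simple systematic one (every n-th row), which is not necessarily how the deck's sample was drawn.

```javascript
// Keep only rows whose date (YYYY-MM-DD) falls in the given month number.
function filterMonth(rows, month) {
  return rows.filter(function (row) {
    return Number(row.date.slice(5, 7)) === month;
  });
}

// Systematic sample: keep every n-th row (n = 10 keeps roughly 10%).
function sampleEveryNth(rows, n) {
  return rows.filter(function (_, index) {
    return index % n === 0;
  });
}

// Average delay per city and day; return the group with the highest average.
function worstCityDay(rows) {
  var groups = {};
  rows.forEach(function (row) {
    var key = row.city + '|' + row.date;
    if (!groups[key]) {
      groups[key] = { city: row.city, date: row.date, total: 0, count: 0 };
    }
    groups[key].total += row.delay;
    groups[key].count += 1;
  });
  var worst = null;
  Object.keys(groups).forEach(function (key) {
    var group = groups[key];
    var avgDelay = group.total / group.count;
    if (worst === null || avgDelay > worst.avgDelay) {
      worst = { city: group.city, date: group.date, avgDelay: avgDelay };
    }
  });
  return worst; // null when there are no rows
}
```

A call such as `worstCityDay(sampleEveryNth(filterMonth(rows, 11), 10))` chains the three steps. At real volumes the same logic belongs in a query that writes to a small intermediate table.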
If you don’t know SQL
Xplenty
If you know
Step 5. Visualize
• A dashboard is not the same as a visual analysis tool
• Insights don't come from either of those
• Insights are the outcome of analyst’s work
Let’s get some stuff straight
• Objective of the visualization itself, representative or exploratory
• Interactivity requirements (on click drill down?)
• Maturity of client's Measurement Culture
• The data consumer's role: CEO, Analyst, Media planner
• Size of the audience and distribution needs
• Available infrastructure
• Data governance and its requirements
• Time to finish the project
• Budget
• Politics
Viz: Factors determining What & How to use
• All of them
• From humble Excel
• To big guys like Qlik and Tableau
• And the middle ones like Data Studio
• Desktop or online solutions
• Coding your own (D3.js)? Interesting but resource-intensive,
not agile for those just creating / distributing dashboards
Viz Tools?
• Lady Gaga KO
• Tron Legacy KO
• Minimal OK
3 Styles of Dashboards
• Those using Excel default charts deserve the worst
• Same with the new shiny thing: Data Studio
What dashboards made with default styles look like to me
• Never use Excel default charts or Data Studio templates
• Read about art
• Modern Art by Giulio Carlo Argan
• Focus on: Rationalism / Minimalism / Functionalism
• Follow Viz masters
• Edward Tufte, Stephen Few, Robert Kosara, Alberto Cairo
For Fuck's Sake, Educate your Aesthetics!
Examples
Viz
1. Franchise Based Business
SETLV all at once
Windows Task
Scheduler
Online Source
Internal Store
Offline Source
Server
Plotly + Shiny
2. Large Department Store Group. First Setup
Transform
& Viz
to Storage
Online Source
Internal Store
Offline Source
Server
2. Large Department Store Group. Second Setup
Transform
& Load
Viz
to Storage
Storage
Viz
to Storage
3. Sports Equipment Company
Transform
GA
Views
Load
.tde
Live Example
Automated ETL with BigQuery + Apps Script
$0.0, 30 lines of code, 10 minutes
Scheduled
Transformation
Small & Fast
BQ Table
Visualization Tool
of your choice
Huge
BQ Table
Source Table
Destination Table
SQL QUERY doing the Transformation
We want
• To run the transformation every day/week/month
• Append results to existing table feeding the visualization tool
We need
• Your Transforming Query + SQL minifier
• Google Sheets + Apps Script (JavaScript)
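The SQL minifier is needed because the transforming query ends up inside a single JavaScript string. A naive sketch of that step follows; `minifySql` is a hypothetical helper, not a tool named in the deck, and it assumes the query has no `--` sequences or meaningful whitespace runs inside string literals.

```javascript
// Collapse a multi-line SQL query into one line so it fits in a JS string.
function minifySql(sql) {
  return sql
    .split('\n')
    .map(function (line) {
      // Drop "--" line comments (naive: ignores string literals).
      var commentStart = line.indexOf('--');
      return commentStart === -1 ? line : line.slice(0, commentStart);
    })
    .join(' ')
    // Collapse tabs, carriage returns and repeated spaces into one space.
    .replace(/\s+/g, ' ')
    .trim();
}
```

For anything beyond a simple query, an existing minifier or a template literal in the script is the safer choice.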
Destination Table
Process
• Open a new Google Sheet
• Go to Tools > Script Editor
In Script Editor go to Resources
• Advanced Google Services: Enable BigQuery API
• Developers Console Project: Project Number (of the project
where tables live)
• Place the script and tweak accordingly. Save and schedule
Google Sheets
function saveQueryToTable() {
  // Read the filter value from cell B2 of 'Sheet1'.
  // Note: the query below compares this value against MONTH(date).
  var sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName('Sheet1');
  var previousDay = sheet.getRange('B2').getValue();

  // Transforming query (legacy SQL table syntax)
  var sql = 'SELECT date, COUNT(*) FROM [bigquery-146904:test_datasets.flights_MASTER] ' +
      'WHERE YEAR(date)=2012 AND MONTH(date)=' + previousDay + ' GROUP BY date';

  // Destination table details
  var projectId = 'bigquery-XXXXXX';
  var datasetId = 'test_datasets';
  var newTableId = 'flights_2012';

  // Job definition: append the query results to the destination table
  var job = {
    configuration: {
      query: {
        query: sql,
        writeDisposition: 'WRITE_APPEND',
        destinationTable: {
          projectId: projectId,
          datasetId: datasetId,
          tableId: newTableId
        }
      }
    }
  };

  // Job execution (requires the BigQuery advanced service to be enabled)
  var queryResults = BigQuery.Jobs.insert(job, projectId);
  // The job runs asynchronously, so this logs its status at submission time
  Logger.log(queryResults.status);
}
JS Script
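`BigQuery.Jobs.insert` only submits the job, so the status logged right after it may still be pending or running. The sketch below polls until the job is done. It assumes the BigQuery advanced service exposes `Jobs.get(projectId, jobId)` as the REST API does; `waitForJob` is a hypothetical helper, and the jobs API and sleep function are passed in as parameters so the logic can be tried outside Apps Script.

```javascript
// jobsApi: object with get(projectId, jobId), e.g. BigQuery.Jobs in Apps Script.
// sleep: function taking milliseconds, e.g. a wrapper around Utilities.sleep.
// job: the object returned by the insert call.
function waitForJob(jobsApi, sleep, projectId, job, maxTries) {
  var jobId = job.jobReference.jobId;
  var current = job;
  for (var attempt = 0; attempt < maxTries; attempt++) {
    if (current.status && current.status.state === 'DONE') {
      // A finished job can still have failed; surface the error.
      if (current.status.errorResult) {
        throw new Error('BigQuery job failed: ' + current.status.errorResult.message);
      }
      return current;
    }
    sleep(1000); // wait one second between checks
    current = jobsApi.get(projectId, jobId);
  }
  throw new Error('BigQuery job ' + jobId + ' not finished after ' + maxTries + ' checks');
}
```

Inside the script it could be called as `waitForJob(BigQuery.Jobs, function (ms) { Utilities.sleep(ms); }, projectId, queryResults, 30);` right after the insert call.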
Schedule
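Scheduling can also be done in code rather than through the script editor's trigger settings. The sketch below uses the Apps Script time-driven trigger builder (`newTrigger`, `timeBased`, `everyDays`, `atHour`, `create`). `scheduleDaily` is a hypothetical helper, and `ScriptApp` is passed in as a parameter so the function can be exercised with a stand-in outside Apps Script.

```javascript
// scriptApp: pass the global ScriptApp when running inside Apps Script.
// handlerName: name of the function to run, e.g. 'saveQueryToTable'.
// hour: hour of the day (0-23) in the script's time zone.
function scheduleDaily(scriptApp, handlerName, hour) {
  // Remove earlier triggers for the same handler so runs are not duplicated.
  scriptApp.getProjectTriggers().forEach(function (trigger) {
    if (trigger.getHandlerFunction() === handlerName) {
      scriptApp.deleteTrigger(trigger);
    }
  });
  // Create a time-driven trigger that fires once a day around the given hour.
  return scriptApp.newTrigger(handlerName)
    .timeBased()
    .everyDays(1)
    .atHour(hour)
    .create();
}
```

Running `scheduleDaily(ScriptApp, 'saveQueryToTable', 6);` once from the editor would set up a daily run at around 6 am. Weekly or monthly schedules need a different builder call.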
Almost there
• Don’t try to sell stakeholders the megaproject of your life
• Start small and simple, get buy-in, grow little by little
• Plan SETLV carefully according to circumstances
• Don’t just buy the first vendor solution presented
• Many solutions out there, ask for demos
• It tends to get messy, don’t panic
$0.02 more of advice

wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded data
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis Project
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data Visualization
 
Networking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxNetworking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptx
 
Data Analysis Project: Stroke Prediction
Data Analysis Project: Stroke PredictionData Analysis Project: Stroke Prediction
Data Analysis Project: Stroke Prediction
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptx
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
 

Store, Extract, Transform, Load, Visualize. Untagged Conference

  • 1. Store, Extract, Transform, Load, Visualize
  • 3. What is this All About
  • 4. In the beginning there was Data
  • 5. Infrastructure & Data Base Admins, BIs, Analysts. And everybody was fairly happy
  • 6. Data got big & moved in need of strong support
  • 7. What made Analysts’ work way harder
  • 8. How do we solve that?
  • 9. As long as you have access to Sources, and control over SETL, you are ready to funk it up!
  • 10. Go beyond GA/AA interface. You have to. No need to be an engineer. You can do it. BigData is not scary anymore
  • 11. This is about how you take over the process with minimum or no technical knowledge: Analyze. Visualize. Store, Extract, Transform, Load. Automate!
  • 13. Data don't exist till fixed somewhere
      Typical sources:
        • Online traffic measuring tools like GA or AA
        • Social media platforms
        • Customer Relationship Management platforms
        • Booking systems, call centers, retailing
        • Telemetry
  • 14. Sources usually come with a Storing Solution
      First challenge: get access
        • Number of sources: one, many, too many
        • Access difficulty: simple, complicated, impossible
        • Combinations of the above
  • 15. Why Our Own Storage? Yours [diagram: Source ×5]
  • 16. Why Our Own Storage? Safe [diagram: Source ×3]
  • 17. Build your Own Storage
      Types:
        • Internal
            • Excel
            • MSSQL / MySQL Server
        • External or Cloud
            • BigQuery, Cloud SQL, Big Table, DataStorage
            • Amazon Redshift
  • 18. How?
      If you are lucky:
        • All data in a decent storage. Nothing else to do!
        • DB / Infrastructure Admins connect the pipes for you
      If you aren't:
        • Do it yourself, a little bit of coding comes in handy
        • Cry for help
  • 20. Two moments of Extraction
      First:
        • From Sources to your Storage
        • Minimum or no transformation at all
      Second:
        • From your Storage to Intermediate tables
        • Heavily transformed
  • 21. Tools for Extraction (I)
      Dirt cheap:
        • Next Analytics / BigQuery add-ins for Excel
        • Supermetrics / OWOX BQ add-ins for Google Sheets
      Careful:
        • They should be able to automate extraction
        • If not, some scripting might be required
  • 22. Tools for Extraction (II)
      Data Integration Services. Not so cheap, no coding!
        • Analytics Canvas
        • Xplenty
        • Alteryx
        • Fivetran
        • Mode
  • 23. Tools for Extraction (III)
      With a hand from DBAs and Engineers:
        • Google Cloud Dataflow
        • Amazon Kinesis
  • 25. Transformation
        • Viz is important, transformation is key
        • No good data = No SUCCESS
  • 26. Two moments of Transformation
      First:
        • Data cleansing
        • Data enrichment
        • Consistency ensuring
      Second:
        • Data Modeling previous to analysis or visualization
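As a rough illustration of the first moment (cleansing, enrichment, consistency), here is a minimal sketch on a few made-up rows in plain JavaScript. The field names (`city`, `delay`), the `REGION_BY_CITY` lookup and the `cleanRows` helper are hypothetical and not taken from the talk.

```javascript
// Hypothetical lookup used for enrichment; not from the talk.
const REGION_BY_CITY = { toronto: 'Canada', boston: 'USA' };

function cleanRows(rows) {
  return rows
    // Cleansing: drop rows without a city or with a non-numeric delay.
    .filter((r) => r.city && r.city.trim() !== '' && !Number.isNaN(Number(r.delay)))
    .map((r) => {
      // Consistency: one spelling per city, delay always stored as a number.
      const city = r.city.trim().toLowerCase();
      return {
        city: city,
        delay: Number(r.delay),
        // Enrichment: add a region column from the lookup.
        region: REGION_BY_CITY[city] || 'unknown',
      };
    });
}

// Made-up input rows with typical problems: stray spaces, mixed case, gaps.
const sample = [
  { city: ' Toronto ', delay: '12' },
  { city: 'BOSTON', delay: 7 },
  { city: '', delay: 3 },
  { city: 'Lima', delay: 'n/a' },
];
console.log(cleanRows(sample));
```

Only the first two rows survive, normalized and with a `region` attached. In practice the same rules would more likely live in the transforming SQL query.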
  • 27. Learn SQL and some JS/Python
        • SQL is the tool to answer complex business questions
        • It can take you to the BI realm = more $$$ :-D
        • A bit of code takes you further
        • modeanalytics.com --> Resources
  • 30. Feed the Viz
      Why not connect the Viz tool directly to Storage?
        • They die when the volume of data is huge
        • Limited options for transformation
      Solution:
        • Automate materialization to intermediate tables
        • Feed Viz tools from those tables
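To make "materialization to intermediate tables" concrete, the sketch below collapses raw rows into one summary row per date and city, which is the kind of small table a Viz tool can read comfortably. It is plain JavaScript on made-up data; the row shape and the `materializeDailySummary` name are assumptions for illustration, and in the deck's setup this step would be done by a scheduled SQL query.

```javascript
// Collapse raw rows into one row per (date, city): the "intermediate table".
function materializeDailySummary(rawRows) {
  const byKey = new Map();
  for (const row of rawRows) {
    const key = row.date + '|' + row.city;
    const agg = byKey.get(key) || { date: row.date, city: row.city, flights: 0, totalDelay: 0 };
    agg.flights += 1;
    agg.totalDelay += row.delay;
    byKey.set(key, agg);
  }
  // Keep only the columns the dashboard needs.
  return Array.from(byKey.values()).map((a) => ({
    date: a.date,
    city: a.city,
    flights: a.flights,
    avgDelay: a.totalDelay / a.flights,
  }));
}

// Made-up raw rows standing in for the huge source table.
const raw = [
  { date: '2012-11-01', city: 'Chicago', delay: 10 },
  { date: '2012-11-01', city: 'Chicago', delay: 30 },
  { date: '2012-11-01', city: 'Denver', delay: 5 },
];
console.log(materializeDailySummary(raw));
```

The Viz tool then reads the summary rows instead of scanning every raw record on each refresh.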
  • 32. Rows: 3,706M. Total time: 180 secs. CPU time: 1.7 days
        Rows: 2,3M. Total time: 18 secs. CPU time: 17 secs
  • 33. Flight delays: 1 year of data. Extract only November, then a 10% sample of that. Quick guess: what city and day of November had the highest delays?
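The flight-delay exercise can be sketched in a few lines: keep only November, take roughly a 10% sample, then look for the worst city and day. The data below is generated and entirely fake, the function names are hypothetical, and taking every 10th row is only one simple way to sample; the talk does not say how its sample was drawn.

```javascript
const CITIES = ['Chicago', 'Denver', 'Miami'];

// Build deterministic fake rows for October to December (30 days each).
// Non-November rows get an absurd delay so a broken filter is obvious.
function makeFakeFlights() {
  const rows = [];
  for (let month = 10; month <= 12; month++) {
    for (let day = 1; day <= 30; day++) {
      CITIES.forEach((city, i) => {
        const delay = month === 11 ? (day * 7 + i * 13) % 50 : 999;
        const date = '2012-' + month + '-' + String(day).padStart(2, '0');
        rows.push({ date: date, city: city, delay: delay });
      });
    }
  }
  return rows;
}

function worstCityAndDay(rows) {
  // Extract: keep November only (characters 5-6 of 'YYYY-MM-DD').
  const november = rows.filter((r) => r.date.slice(5, 7) === '11');
  // Sample: every 10th row, i.e. about 10% of the extract.
  const sampled = november.filter((_, index) => index % 10 === 0);
  // Answer: the sampled row with the highest delay.
  return sampled.reduce((worst, r) => (worst === null || r.delay > worst.delay ? r : worst), null);
}

console.log(worstCityAndDay(makeFakeFlights()));
// → { date: '2012-11-21', city: 'Chicago', delay: 47 }
```

On real volumes the same three steps would be written as a SQL query, with the small result set handed to the charting tool.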
  • 34. And you need some quick charts too
  • 35. If you don’t know SQL: Xplenty
  • 38. Let’s get some stuff straight
        • A dashboard is not the same as a visual analysis tool
        • Insights don't come from any of those
        • Insights are the outcome of the analyst’s work
  • 39. Viz: Factors determining What & How to use
        • Objective of the visualization itself: representative or exploratory
        • Interactivity requirements (on-click drill down?)
        • Maturity of client's Measurement Culture
        • Data consumer's role: CEO, Analyst, Media planner
        • Size of the audience and distribution needs
        • Available infrastructure
        • Data governance and its requirements
        • Time to finish the project
        • Budget
        • Politics
  • 40. Viz Tools?
        • All of them
        • From humble Excel
        • To big guys like Qlik and Tableau
        • And the middle ones like Data Studio
        • Desktop or online solutions
        • Coding your own (D3.js)? Interesting but resource-intensive, not agile for those just creating / distributing dashboards
  • 41. 3 Styles of Dashboards
        • Lady Gaga: KO
        • Tron Legacy: KO
        • Minimal: OK
  • 53. Those using Excel default charts deserve the worst. Same with the new shiny thing: Data Studio
  • 54. What dashboards made with default styles look like to me
  • 55. For Fuck's Sake, Educate your Aesthetics!
        • Never use Excel default charts or Data Studio templates
        • Read about art
            • Modern Art by Giulio Carlo Argan
            • Focus on: Rationalism / Minimalism / Functionalism
        • Follow Viz masters
            • Edward Tufte, Stephen Few, Robert Kosara, Alberto Cairo
  • 57. 1. Franchise Based Business [diagram labels: Viz; SETLV all at once; Windows Task Scheduler]
  • 58. 2. Large Department Store Group. First Setup [diagram labels: Online Source; Internal Store; Offline Source; Server; Plotly + Shiny; Transform & Viz; to Storage]
  • 59. 2. Large Department Store Group. Second Setup [diagram labels: Online Source; Internal Store; Offline Source; Server; Transform & Load; Viz; to Storage]
  • 60. 3. Sports Equipment Company [diagram labels: Storage; Viz; to Storage; Transform; GA Views; Load .tde]
  • 62. Automated ETL with BigQuery + Apps Script. $0.0, 30 lines of code, 10 minutes [diagram labels: Scheduled Transformation; Small & Fast BQ Table; Visualization Tool of your choice; Huge BQ Table]
  • 65. SQL QUERY doing the Transformation
  • 66. Destination Table
      We want:
        • To run the transformation every day/week/month
        • To append results to the existing table feeding the visualization tool
      We need:
        • Your Transforming Query + SQL minifier
        • Google Sheets + Apps Script (JavaScript)
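The "SQL minifier" is there so the transforming query can be pasted into the script as a single-line string. If no online minifier is at hand, a rough equivalent is a few regular expressions, as sketched below. The `minifySql` name and the table reference in the example are hypothetical, and the sketch assumes the query has no `--` or `/*` sequences inside string literals (it would wrongly strip them).

```javascript
// Naive SQL minifier: strips comments and collapses whitespace to one line.
// Assumption: no comment markers appear inside quoted string literals.
function minifySql(sql) {
  return sql
    .replace(/\/\*[\s\S]*?\*\//g, ' ') // remove /* block comments */
    .replace(/--.*$/gm, ' ')           // remove -- line comments
    .replace(/\s+/g, ' ')              // collapse spaces, tabs and newlines
    .trim();
}

// Hypothetical query, formatted for humans.
const pretty = `
  SELECT date, COUNT(*) AS flights  -- one row per day
  FROM [project:dataset.flights]    /* legacy SQL table reference */
  GROUP BY date
`;
console.log(minifySql(pretty));
// → SELECT date, COUNT(*) AS flights FROM [project:dataset.flights] GROUP BY date
```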
  • 67. Google Sheets
      Process:
        • Open a new Google Sheet
        • Go to Tools > Script Editor
      In Script Editor go to Resources:
        • Advanced Google Services: Enable BigQuery API
        • Developers Console Project: Project Number (of the project where tables live)
        • Place the script and tweak accordingly. Save and schedule
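The "schedule" step can be done from the script editor's triggers dialog, or in code with a time-driven trigger. The sketch below shows the code route, assuming the function to run is called `saveQueryToTable`. The `scheduleDaily` wrapper is hypothetical; it takes `ScriptApp` as a parameter only so the snippet can also be tried outside Apps Script, and the hour is an arbitrary choice.

```javascript
// Creates a time-driven trigger that runs `functionName` once a day.
// `scriptApp` is injected so the sketch can be exercised outside Apps Script;
// inside the Apps Script editor, pass the global ScriptApp.
function scheduleDaily(scriptApp, functionName, hour) {
  return scriptApp
    .newTrigger(functionName) // name of the function to run
    .timeBased()              // clock-based trigger
    .everyDays(1)             // once per day
    .atHour(hour)             // around this hour, in the script's time zone
    .create();
}

// In the Apps Script editor, run this once:
// scheduleDaily(ScriptApp, 'saveQueryToTable', 6);
```

Running the scheduling function more than once creates duplicate triggers, so it is worth checking the project's trigger list afterwards.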
  • 68. JS Script

      function saveQueryToTable() {
        // Get previous day from cell B2 in spreadsheet
        var sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName('Sheet1');
        var previousDay = sheet.getRange("B2").getValue();

        // Query
        var sql = 'SELECT date, COUNT(*) FROM [bigquery-146904:test_datasets.flights_MASTER] WHERE YEAR(date)=2012 AND MONTH(date)='+previousDay+' GROUP BY date';

        // Table destination details
        var projectId = 'bigquery-XXXXXX';
        var datasetId = 'test_datasets';
        var newTableId = 'flights_2012';

        // Job definition
        var job = {
          configuration: {
            query: {
              query: sql,
              writeDisposition: 'WRITE_APPEND',
              destinationTable: {
                projectId: projectId,
                datasetId: datasetId,
                tableId: newTableId
              }
            }
          }
        };

        // Job execution
        var queryResults = BigQuery.Jobs.insert(job, projectId);
        Logger.log(queryResults.status);
      }
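The script concatenates whatever is in cell B2 straight into the query, and that value is compared against `MONTH(date)`. One possible hardening, sketched below, is to check that the value really is a month number before building the SQL. The `buildMonthlyQuery` helper is hypothetical and not part of the original script; the table name is copied from the slide.

```javascript
// Builds the slide's query for a given month, rejecting anything that is
// not an integer from 1 to 12 (for example an empty or mistyped cell).
function buildMonthlyQuery(month) {
  const m = Number(month);
  if (!Number.isInteger(m) || m < 1 || m > 12) {
    throw new Error('Expected a month between 1 and 12, got: ' + month);
  }
  return 'SELECT date, COUNT(*) FROM [bigquery-146904:test_datasets.flights_MASTER] ' +
    'WHERE YEAR(date)=2012 AND MONTH(date)=' + m + ' GROUP BY date';
}

console.log(buildMonthlyQuery(11));
```

Inside `saveQueryToTable`, the `sql` variable could then be set with `buildMonthlyQuery(sheet.getRange('B2').getValue())`, so a bad cell value stops the run before a job is submitted.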
  • 71. $0.02 more of advice
        • Don’t try to sell to stakeholders the megaproject of your life
        • Start small and simple, get buy-in, grow little by little
        • Plan SETLV carefully according to circumstances
        • Don’t just buy the first vendor solution presented
        • Many solutions out there, ask for demos
        • It tends to get messy, don’t panic