There is no doubt that data visualization is a key component of our industry. The path data travels from the moment it is created until it takes shape in a chart is often obscure and overlooked, because (when volume is significant) it tends to live on the engineering side: an area that Data Scientists tend to visit but the usual Web/Marketing Data Analyst does not. Nowadays there are many options for taming that whole journey and making the best of it, and they don't require extensive engineering knowledge. Small or Big Data, let's see what "Store, Extract, Transform, Load, Visualize" is all about.
13. Typical sources
• Online traffic measuring tools like GA or AA
• Social media platforms
• Customer Relationship Management platforms
• Booking systems, Call centers, Retailing
• Telemetry
Data don't exist till fixed somewhere
14. First challenge: get access
• Number of sources: one, many, too many
• Access difficulty: simple, complicated, impossible
• Combinations of the above
Sources usually come with a Storage Solution
17. Types
• Internal
  • Excel
  • MSSQL / MySQL Server
• External or Cloud
  • BigQuery, Cloud SQL, Bigtable, DataStorage
  • Amazon Redshift
Build your Own Storage
18. If you are lucky
• All data in a decent storage. Nothing else to do!
• DB / Infrastructure Admins connect the pipes for you
If you aren't
• Do it yourself; a little bit of coding comes in handy
• Cry for help
How?
20. First
• From Sources to your Storage
• Minimum or no transformation at all
Second
• From your Storage to Intermediate tables
• Heavily transformed
Two moments of Extraction
21. Dirt cheap
• Next Analytics / BigQuery add-ins for Excel
• Supermetrics / OWOX BQ add-ins for Google Sheets
Careful
• They should be able to automate extraction
• If not, some scripting might be required
Tools for Extraction (I)
22. Data Integration Services
Not so cheap, no coding!
• Analytics Canvas
• Xplenty
• Alteryx
• Fivetran
• Mode
Tools for Extraction (II)
23. With a hand from DBAs and Engineers
• Google Cloud Dataflow
• Amazon Kinesis
Tools for Extraction (III)
25. • Viz is important, transformation is key
• No good data = No SUCCESS
Transformation
26. First
• Data cleansing
• Data enrichment
• Consistency checks
Second
• Data modeling prior to analysis or visualization
Two moments of Transformation
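The first moment of transformation (cleansing, enrichment, consistency) can be sketched in a few lines of plain JavaScript. This is only an illustration: the rows, the field names (`date`, `city`, `delayMinutes`), the `cityToRegion` lookup and the consistency rule are all assumptions made up for the example, not part of the deck.

```javascript
// Hypothetical raw rows, as they might land in storage with minimal transformation.
const rawRows = [
  { date: '2012-11-03', city: ' madrid ', delayMinutes: '42' },
  { date: '2012-11-03', city: 'MADRID', delayMinutes: '' },   // missing value
  { date: '2012-11-04', city: 'Bilbao', delayMinutes: '15' },
  { date: 'not-a-date', city: 'Bilbao', delayMinutes: '7' }   // broken date
];

// Hypothetical lookup table used for enrichment.
const cityToRegion = { madrid: 'Centre', bilbao: 'North' };

// Cleansing: normalize text, parse numbers, drop rows that cannot be repaired.
function cleanse(rows) {
  return rows
    .map(r => ({
      date: r.date,
      city: r.city.trim().toLowerCase(),
      // Number('') would be 0, so empty strings are mapped to NaN explicitly.
      delayMinutes: r.delayMinutes === '' ? NaN : Number(r.delayMinutes)
    }))
    .filter(r => /^\d{4}-\d{2}-\d{2}$/.test(r.date) && Number.isFinite(r.delayMinutes));
}

// Enrichment: add attributes the source did not provide.
function enrich(rows) {
  return rows.map(r => Object.assign({}, r, { region: cityToRegion[r.city] || 'Unknown' }));
}

// Consistency: fail loudly if a business rule is violated (assumed rule: no negative delays).
function assertConsistent(rows) {
  rows.forEach(r => {
    if (r.delayMinutes < 0) throw new Error('Negative delay on ' + r.date);
  });
  return rows;
}

const cleanRows = assertConsistent(enrich(cleanse(rawRows)));
console.log(cleanRows);
```

In practice the same three steps would more likely be written in SQL against the storage layer; the point here is only the order: clean first, enrich second, check last.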
27. • SQL is the tool to answer complex business questions
• It can take you to the BI realm = more $$$ :-D
• A bit of code takes you further
• modeanalytics.com --> Resources
Learn SQL and some JS/Python
30. Why not connect the Viz tool directly to Storage?
• Viz tools choke when the volume of data is huge
• Limited options for transformation
Solution
• Automate materialization to intermediate tables
• Feed Viz tools from those tables
Feed the Viz
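As a rough illustration of "materialize to intermediate tables", the sketch below aggregates raw rows into one row per day and appends only the days that are not already in the intermediate table. It runs in memory on made-up rows; the field names and the "skip days already present" rule are assumptions for the example, and a real pipeline would do the same thing with a scheduled query.

```javascript
// Aggregate raw rows into one row per day (the shape of the intermediate table).
function aggregateByDay(rawRows) {
  const byDay = new Map();
  for (const row of rawRows) {
    const agg = byDay.get(row.date) || { date: row.date, flights: 0, totalDelay: 0 };
    agg.flights += 1;
    agg.totalDelay += row.delayMinutes;
    byDay.set(row.date, agg);
  }
  return Array.from(byDay.values());
}

// Append only days the intermediate table does not contain yet,
// so that re-running the job does not duplicate rows.
function materialize(rawRows, intermediateTable) {
  const known = new Set(intermediateTable.map(r => r.date));
  const fresh = aggregateByDay(rawRows).filter(r => !known.has(r.date));
  return intermediateTable.concat(fresh);
}

// Hypothetical raw data.
const rawFlights = [
  { date: '2012-11-01', delayMinutes: 10 },
  { date: '2012-11-01', delayMinutes: 30 },
  { date: '2012-11-02', delayMinutes: 5 }
];

let intermediate = [];
intermediate = materialize(rawFlights, intermediate);
intermediate = materialize(rawFlights, intermediate); // second run adds nothing
console.log(intermediate);
```

The visualization tool then reads the small aggregated table instead of scanning the raw one.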
32. Rows: 3,706M
Total time: 180 secs
CPU time: 1.7 days
Rows: 2.3M
Total time: 18 secs
CPU time: 17 secs
33. Flight delays
1 year of data
Extract only November
10% sample of that
Quick guess
What city and day of November had highest delays?
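The "extract one month, sample it, take a quick guess" idea can be sketched as follows. Everything here is assumed for illustration: the data is synthetic, dates are `YYYY-MM-DD` strings, the 10% sample is a simple systematic sample (every 10th row) rather than a random one, and "highest delays" is read as highest average delay per city and day.

```javascript
// Keep only rows of a given month; dates are assumed to be 'YYYY-MM-DD' strings.
function extractMonth(rows, month) {
  return rows.filter(r => Number(r.date.slice(5, 7)) === month);
}

// Systematic sample: keeps rows 0, every, 2*every, ... (every = 10 gives roughly 10%).
function systematicSample(rows, every) {
  return rows.filter((_, i) => i % every === 0);
}

// City and day with the highest average delay in the given rows.
function worstCityDay(rows) {
  const groups = new Map();
  for (const r of rows) {
    const key = r.city + '|' + r.date;
    const g = groups.get(key) || { city: r.city, date: r.date, n: 0, total: 0 };
    g.n += 1;
    g.total += r.delayMinutes;
    groups.set(key, g);
  }
  let worst = null;
  for (const g of groups.values()) {
    const avg = g.total / g.n;
    if (worst === null || avg > worst.avgDelay) {
      worst = { city: g.city, date: g.date, avgDelay: avg };
    }
  }
  return worst;
}

// Synthetic data for October to December; the values carry no real meaning.
const flights = [];
const cities = ['City A', 'City B'];
for (let month = 10; month <= 12; month++) {
  for (let day = 1; day <= 28; day++) {
    cities.forEach((city, c) => {
      flights.push({
        date: '2012-' + month + '-' + String(day).padStart(2, '0'),
        city: city,
        delayMinutes: (day * 7 + c * 11 + month) % 60
      });
    });
  }
}

const guess = worstCityDay(systematicSample(extractMonth(flights, 11), 10));
console.log(guess);
```

The answer from a sample is only a quick guess; it is cheap to compute and good enough to decide whether the full query is worth running.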
38. • A dashboard is not the same as a visual analysis tool
• Insights don't come from either of them
• Insights are the outcome of analyst’s work
Let’s get some stuff straight
39. • Objective of the visualization itself, representative or exploratory
• Interactivity requirements (on click drill down?)
• Maturity of client's Measurement Culture
• The data consumer's role: CEO, Analyst, Media planner
• Size of the audience and distribution needs
• Available infrastructure
• Data governance and its requirements
• Time to finish the project
• Budget
• Politics
Viz: Factors determining What & How to use
40. • All of them
• From humble Excel
• To big guys like Qlik and Tableau
• And the middle ones like Data Studio
• Desktop or online solutions
• Coding your own (D3.js)? Interesting but resource-intensive, and not agile for those who just create and distribute dashboards
Viz Tools?
41. • Lady Gaga KO
• Tron Legacy KO
• Minimal OK
3 Styles of Dashboards
53. • Those using Excel default charts deserve the worst
• Same with the new shiny thing: Data Studio
55. • Never use Excel default charts or Data Studio templates
• Read about art
• Modern Art by Giulio Carlo Argan
• Focus on: Rationalism / Minimalism / Functionalism
• Follow Viz masters
• Edward Tufte, Stephen Few, Robert Kosara, Alberto Cairo
For Fuck's Sake, Educate your Aesthetics!
62. Automated ETL with BigQuery + Apps Script
$0.0, 30 lines of code, 10 minutes
Huge BQ Table --> Scheduled Transformation --> Small & Fast BQ Table --> Visualization Tool of your choice
66. We want
• To run the transformation every day/week/month
• Append the results to the existing table that feeds the visualization tool
We need
• Your transformation query + a SQL minifier
• Google Sheets + Apps Script (JavaScript)
Destination Table
67. Process
• Open a new Google Sheet
• Go to Tools > Script Editor
In Script Editor, go to Resources
• Advanced Google Services: Enable BigQuery API
• Developers Console Project: Project Number (of the project where tables live)
• Place the script and tweak accordingly. Save and schedule
Google Sheets
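Scheduling is normally done from the script editor's trigger dialog, but it can also be done in code. The sketch below uses the Apps Script `ScriptApp` trigger builder (`newTrigger`, `timeBased`, `everyDays`, `atHour`, `create`). The helper name `scheduleDaily`, the choice of passing `ScriptApp` in as a parameter (so the function can also be exercised outside Apps Script) and the hour are assumptions made for this example.

```javascript
// Creates a daily time-driven trigger that runs saveQueryToTable.
// scriptApp: pass the global ScriptApp when running inside Apps Script.
// hour: hour of the day (0-23, in the script's time zone) at which to run.
function scheduleDaily(scriptApp, hour) {
  return scriptApp
    .newTrigger('saveQueryToTable') // name of the function to run
    .timeBased()                    // time-driven trigger
    .everyDays(1)                   // once a day
    .atHour(hour)                   // around this hour
    .create();
}

// Inside Apps Script, run this once by hand:
//   scheduleDaily(ScriptApp, 6);
```

Run it a single time only; each call creates an additional trigger.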
68.
function saveQueryToTable() {
  // Read the filter value from cell B2 of 'Sheet1'.
  // Note: despite the variable name, the query below uses it as a month number (MONTH(date)).
  var sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName('Sheet1');
  var previousDay = sheet.getRange('B2').getValue();

  // Query (legacy SQL syntax: [project:dataset.table])
  var sql = 'SELECT date, COUNT(*) FROM [bigquery-146904:test_datasets.flights_MASTER] ' +
    'WHERE YEAR(date)=2012 AND MONTH(date)=' + previousDay + ' GROUP BY date';

  // Destination table details
  var projectId = 'bigquery-XXXXXX';
  var datasetId = 'test_datasets';
  var newTableId = 'flights_2012';

  // Job definition: run the query and append its results to the destination table
  var job = {
    configuration: {
      query: {
        query: sql,
        writeDisposition: 'WRITE_APPEND',
        destinationTable: {
          projectId: projectId,
          datasetId: datasetId,
          tableId: newTableId
        }
      }
    }
  };

  // Job execution (requires the BigQuery advanced service to be enabled)
  var queryResults = BigQuery.Jobs.insert(job, projectId);
  Logger.log(queryResults.status);
}
JS Script
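Because the script concatenates a spreadsheet cell straight into the query, it may be worth validating that value first. The sketch below splits the script into small pure functions that can be tried outside Apps Script. The helper names (`parseMonth`, `buildSql`, `buildAppendJob`) are made up for this example, and the project, dataset and table identifiers are the placeholders from the slide, not real resources.

```javascript
// Accept only an integer month between 1 and 12; reject anything else found in the cell.
function parseMonth(cellValue) {
  const month = Number(cellValue);
  if (!Number.isInteger(month) || month < 1 || month > 12) {
    throw new Error('Expected a month between 1 and 12, got: ' + cellValue);
  }
  return month;
}

// Build the legacy SQL query used in the slide for a given source table, year and month.
function buildSql(sourceTable, year, month) {
  return 'SELECT date, COUNT(*) FROM [' + sourceTable + '] ' +
    'WHERE YEAR(date)=' + year + ' AND MONTH(date)=' + month + ' GROUP BY date';
}

// Build the job definition that appends query results to a destination table.
function buildAppendJob(sql, projectId, datasetId, tableId) {
  return {
    configuration: {
      query: {
        query: sql,
        writeDisposition: 'WRITE_APPEND',
        destinationTable: { projectId: projectId, datasetId: datasetId, tableId: tableId }
      }
    }
  };
}

// Example with the placeholder identifiers from the slide.
const month = parseMonth('11');
const sql = buildSql('bigquery-XXXXXX:test_datasets.flights_MASTER', 2012, month);
const job = buildAppendJob(sql, 'bigquery-XXXXXX', 'test_datasets', 'flights_2012');
console.log(JSON.stringify(job, null, 2));
```

Inside Apps Script, the resulting `job` object would be passed to `BigQuery.Jobs.insert(job, projectId)` exactly as in the script above.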
71. • Don’t try to sell stakeholders the megaproject of your life
• Start small and simple, get buy-in, grow little by little
• Plan SETLV carefully according to circumstances
• Don’t just buy the first vendor solution presented
• Many solutions out there, ask for demos
• It tends to get messy, don’t panic
$0.02 more of advice