Every scientist who needs big data analytics to save millions of lives should have that power. Complex, interactive big data analytics solutions require massive architecture and the know-how to build a fast real-time computing system. BigQuery solves this problem by enabling super-fast, SQL-like queries against petabytes of data using the processing power of Google's infrastructure. We will cover its core features, working with BigQuery, streaming inserts, User Defined Functions in JavaScript, and several use cases for the everyday developer: funnel analytics, behavioral analytics, and exploring unstructured data.
I TAKE Unconference 2017 - Powering interactive data analysis with Google BigQuery
1. Powering Interactive Data Analysis
with Google BigQuery
Márton Kodok / @martonkodok
Google Developer Expert at REEA
May 2017 - Bucharest, Romania
2. ● Geek. Hiker. Do-er.
● Among the top 3 Romanians on Stack Overflow
● Google Developer Expert on Cloud technologies
● Crafting Web/Mobile backends at REEA.net
● BigQuery and database engine expert
● Active in mentoring
Twitter: @martonkodok
StackOverflow: pentium10
Slideshare: martonkodok
GitHub: pentium10
Powering Interactive Data Analysis with Google BigQuery @martonkodok
About me
3. Agenda
The Challenge
Powering interactive Data Analysis/Reporting system
Architecture Overview
Strategy & Tricks
Winning Solution
4. ❏ Need backend/database to STORE, QUERY, EXTRACT data
❏ Deep analytics - large, multi-source, complex, unstructured
❏ Be real time
❏ Terabyte scale
❏ Cost effective
❏ Run Ad-Hoc reports - as the occasion requires
❏ Without Developer - interactive
❏ Minimal engineering efforts
❏ Support streaming - data is generated on a continual basis
❏ Withstand #BlackFriday
❏ Simple query language (preferred: SQL / JavaScript)
The Challenge
5. “We can't solve problems by
using the same kind of
thinking we used when we
created them”
-Albert Einstein
The Challenge
6. Legacy Business Reporting System
[Diagram labels: Web, Mobile, Web Server, Database, SQL, Cached, Platform Services, CMS/Framework, Report & Share, Business Analysis, Scheduled Tasks, Batch Processing, Compute Engine, Multiple Instances]
7. Behind the Scenes: Days to Insights
[Diagram: same components as the Legacy Business Reporting System diagram on slide 6]
8. Legacy Business Reporting System
[Diagram: same components as on slide 6, annotated with timings]
Minutes to kick in
Hours to run batch processing
Hours to clean and aggregate
DAYS TO INSIGHTS
9. ● Terabyte scalable storage
● Real-time row ingestion
● Ask sophisticated queries
● Query-performance
● Low-maintenance
● Cost effective
● Wire them up easily
Goal: Store everything accessible by SQL immediately.
Desired system/platform
Engines:
● MongoDB, Riak, Redis
● ELK Stack (Elasticsearch-Logstash-Kibana)
● Cassandra, Hive, Hadoop...
● Amazon Athena, Google BigQuery...
11. ● Analytics-as-a-Service - Data Warehouse in the Cloud
● Fully-Managed by Google (US or EU zone)
● Scales into Petabytes
● Ridiculously fast
● SQL 2011 Standard + JavaScript UDF (User Defined Functions)
● Familiar DB Structure (table, views, record, nested, JSON)
● Open Interfaces (Web UI, BQ command line tool, REST, ODBC)
● Integrates with Google Sheets + Google Cloud Storage + Pub/Sub connectors
● Client libraries available in YFL (your favorite languages)
● Decent pricing (queries: $5/TB; storage: $20/TB, cold storage: $10/TB) *May 2017
What is BigQuery?
12. ● Columnar storage (max 10 000 columns in table)
● Batch load file size limits: 5TB (CSV or JSON)
● User Defined Functions in SQL or JavaScript
● Rich SQL 2011: JSON, IP, Math, RegExp, Window functions
● Data types: String, Integer, Float, Boolean, Timestamp, Record, Nested, Struct, Array.
● Append-only tables preferred (DML syntax available)
● Day partitioned tables
● ACL - row level locking (individual or group based)
BigQuery: Convenience of SQL
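The day-partitioned tables mentioned above can be addressed one day at a time, which keeps both loads and scans small. The following is a minimal sketch, assuming an ingestion-time partitioned table: the table name `events` and both helper functions are hypothetical, while the `table$YYYYMMDD` decorator and the `_PARTITIONTIME` pseudo column are the BigQuery conventions being illustrated.

```python
from datetime import date


def partition_decorator(table, day):
    """Name one daily partition using the `table$YYYYMMDD` decorator."""
    return f"{table}${day:%Y%m%d}"


def partition_filter(first_day, last_day):
    """WHERE-clause fragment that limits a query to a range of daily partitions."""
    return (
        f"_PARTITIONTIME BETWEEN TIMESTAMP('{first_day:%Y-%m-%d}') "
        f"AND TIMESTAMP('{last_day:%Y-%m-%d}')"
    )


# "events" is a hypothetical table name used only for illustration.
print(partition_decorator("events", date(2017, 5, 20)))
print(partition_filter(date(2017, 5, 1), date(2017, 5, 7)))
```

Restricting a query to a few partitions matters for cost as well as speed, since billing is based on the data scanned.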
13. * 1 Petabyte storage, 10 TB inserts, 100 TB queries => $22000
Queries
➔ 1 TB per month free
➔ 5 USD per TB
➔ only pay for the columns you use in your query
Storage
➔ 20 USD per TB for frequently accessed data
➔ 10 USD per TB for long-term storage (90 days)
Ingestion
➔ Batch load free (CSV/JSON)
➔ Exporting free
➔ Table copy free
➔ Streaming 50 USD per TB
Estimate 1
- Storage 5 TB
- Streaming Inserts 1 TB
- Queries 3 TB
Monthly total: $165
Estimate 2
- Storage 25 TB
- Streaming Inserts 1 TB
- Queries 50 TB
Monthly total: $788
BigQuery Costs - May 2017
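The arithmetic behind these estimates can be sketched as a small calculator. This is a rough flat-rate model using only the May 2017 prices listed on the slide; the function name and parameters are illustrative, and it deliberately ignores the free 1 TB query tier and any other pricing details.

```python
# Prices in USD per TB, taken from the slide (May 2017).
QUERY_PER_TB = 5.0
STORAGE_PER_TB = 20.0
LONG_TERM_STORAGE_PER_TB = 10.0
STREAMING_PER_TB = 50.0


def monthly_cost(storage_tb, streaming_tb, query_tb, long_term_tb=0.0):
    """Flat-rate monthly estimate in USD; ignores the free 1 TB query tier."""
    return (
        storage_tb * STORAGE_PER_TB
        + long_term_tb * LONG_TERM_STORAGE_PER_TB
        + streaming_tb * STREAMING_PER_TB
        + query_tb * QUERY_PER_TB
    )


# Estimate 1 from the slide: 5 TB storage, 1 TB streaming inserts, 3 TB queries.
print(monthly_cost(storage_tb=5, streaming_tb=1, query_tb=3))  # 165.0
```

Estimate 1 comes out at the $165 shown on the slide. For Estimate 2 the same flat-rate formula gives $800, slightly above the $788 on the slide; the difference presumably comes from pricing details this sketch does not model.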
14. Architecting for The Cloud
[Diagram labels: BigQuery, On-Premises Servers, Pipelines, ETL, Engine, Event Sourcing, Frontend, Platform Services, Metrics / Logs / Streaming]
15. Access to Insights without Developer support
[Diagram labels: Analytics Backend, BigQuery, On-Premises Servers, Pipelines, ETL, Engine, Event Sourcing, Frontend, Platform Services, Metrics / Logs / Streaming, Development Team, Data Analysts, Report & Share, Business Analysis, Tools, Tableau, QlikView, Data Studio, Internal Dashboard, Database, SQL]
16. Data Pipeline Integration
[Diagram labels: Analytics Backend, BigQuery, On-Premises Servers, Pipelines, ETL, Database, SQL, Standard, Devices, HTTPS, Ingest, Events, Monitoring, Logging, FluentD, Cloud Storage, Report & Share, Business Analysis, Firebase, archive, Load, Export, Replay, Application Servers]
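To make the "Ingest / Events" path of this pipeline concrete, here is a minimal sketch of the JSON body for a streaming insert through the REST method `tabledata.insertAll` (a POST to `.../projects/{projectId}/datasets/{datasetId}/tables/{tableId}/insertAll`). The helper name, the event fields and the insert IDs are hypothetical; authentication and the HTTP call itself are left out.

```python
import json
import uuid


def build_insert_all_body(rows, insert_ids=None):
    """Build the JSON body for a tabledata.insertAll streaming request.

    Each row carries an insertId, which BigQuery uses for best-effort
    de-duplication when a request is retried.
    """
    if insert_ids is None:
        # No IDs supplied: generate a random one per row.
        insert_ids = [str(uuid.uuid4()) for _ in rows]
    return {
        "kind": "bigquery#tableDataInsertAllRequest",
        "skipInvalidRows": False,
        "ignoreUnknownValues": False,
        "rows": [
            {"insertId": insert_id, "json": row}
            for insert_id, row in zip(insert_ids, rows)
        ],
    }


# Hypothetical events; the field names are illustrative, not a required schema.
events = [
    {"user_id": "u1", "event": "email_open", "ts": "2017-05-20T10:00:00Z"},
    {"user_id": "u2", "event": "link_click", "ts": "2017-05-20T10:00:05Z"},
]
body = build_insert_all_body(events, insert_ids=["evt-1", "evt-2"])
print(json.dumps(body, indent=2))
```

Passing stable insert IDs derived from the event itself, rather than random ones, is what makes retries safe to repeat.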
● On data that is difficult to process/analyze using traditional databases
● On exploring unstructured data
● Not a replacement for traditional DBs, but it complements the system
● Applying JavaScript UDFs on columnar storage to resolve complex tasks
(e.g. JS for natural language processing)
● On streams (form wizard ...)
● On IoT streams
● Major strength is handling large datasets
Where to use BigQuery?
19. Go to the BigQuery web UI.
https://bigquery.cloud.google.com/
Query a public dataset
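Besides the web UI, the same kind of query can be sent through the REST API. The sketch below only builds the request for the `jobs.query` endpoint and does not send it; the project ID `my-project` is a placeholder, the helper name is hypothetical, and the query against the public `bigquery-public-data.samples.shakespeare` table is an illustrative choice rather than one taken from the talk. A real call also needs an OAuth 2.0 bearer token, which is not shown.

```python
import json

# jobs.query endpoint of the BigQuery REST API (v2).
QUERY_URL = "https://www.googleapis.com/bigquery/v2/projects/{project_id}/queries"

# Standard SQL: the ten most frequent words in the public Shakespeare sample.
SQL = """
SELECT word, SUM(word_count) AS total
FROM `bigquery-public-data.samples.shakespeare`
GROUP BY word
ORDER BY total DESC
LIMIT 10
"""


def build_query_request(project_id, sql, max_results=10):
    """Return the URL and JSON body for a synchronous jobs.query call."""
    url = QUERY_URL.format(project_id=project_id)
    body = {
        "query": sql.strip(),
        "useLegacySql": False,  # run as Standard SQL, not legacy SQL
        "maxResults": max_results,
    }
    return url, body


url, body = build_query_request("my-project", SQL)  # placeholder project ID
print(url)
print(json.dumps(body, indent=2))
```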
20. Romanian stations that record the most days of snow
21. Mentions of RO politicians since Nov '16 in GDELT articles
27. ● Funnel Analysis
● Email URL click heatmap
● Email Health Dashboard (SPAM, ISP deferral, content
A/B split tests, trends or low open rate campaigns)
● Advanced segmentation (all raw data stored)
● Behavioral analytics - engaged users etc...
Achievements Continued
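As an illustration of the funnel analysis listed above, the following sketch counts how many users reach each step of a sent → open → click email funnel. It is not the query used in the talk: the `events` table, its columns and the sample rows are hypothetical, and SQLite stands in for BigQuery so the example runs locally. The SQL itself uses only constructs that also exist in BigQuery Standard SQL.

```python
import sqlite3

# A step counts only if its first occurrence is not earlier than the
# first occurrence of the preceding step (a simplification).
FUNNEL_SQL = """
WITH per_user AS (
  SELECT
    user_id,
    MIN(CASE WHEN event = 'sent'  THEN ts END) AS sent_at,
    MIN(CASE WHEN event = 'open'  THEN ts END) AS open_at,
    MIN(CASE WHEN event = 'click' THEN ts END) AS click_at
  FROM events
  GROUP BY user_id
)
SELECT
  COUNT(sent_at) AS sent,
  COUNT(CASE WHEN open_at >= sent_at THEN 1 END) AS opened,
  COUNT(CASE WHEN open_at >= sent_at AND click_at >= open_at THEN 1 END) AS clicked
FROM per_user
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id TEXT, event TEXT, ts INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("u1", "sent", 1), ("u1", "open", 2), ("u1", "click", 3),
        ("u2", "sent", 1), ("u2", "open", 5),
        ("u3", "sent", 2),
        ("u4", "open", 1),  # an open without a send: excluded from the funnel
    ],
)
sent, opened, clicked = conn.execute(FUNNEL_SQL).fetchone()
print(sent, opened, clicked)  # 3 2 1
```

Because all raw events are kept, the funnel steps can be redefined later without re-ingesting anything.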
28. ● no provisioning/deploy
● no running out of resources
● no more focus on large-scale execution plans
● no need to re-implement tricky concepts
(time windows / join streams)
● pay only for the columns we use in our queries
● run raw ad-hoc queries (either by analysts/sales or devs)
● no more throwing away, expiring, or aggregating old data
Our benefits
29. ● No manual sharding
● No capacity guessing
● No idle resources
● No maintenance windows
● No manual scaling
● No file management
BigQuery: Serverless Data Warehouse
30. BigQuery: Sample projects to try out 1
31. BigQuery: Sample projects to try out 2
32. HttpArchive - multiple JS frameworks
33. HttpArchive - multiple jQuery versions
34. Easily Build Custom Reports and Dashboards