SlideShare a Scribd company logo
1 of 55
Download to read offline
Exploring Open Data with BigQuery
Jenny Tong
Developer Advocate
Google Cloud Platform
@MimmingCodes
Agenda
● Origin story
● Count stuff
● How it works
● Some cool open data
● Do something useful
Google Research Publications
Google Research Publications
Managed Cloud Versions
Bigtable
Flume
Dremel
Bigtable
Dataflow
BigQuery
Google BigQueryGoogle BigQuery
Let's count some stuff
SELECT count(word)
FROM publicdata:samples.shakespeare
Words in Shakespeare
SELECT sum(requests) as total
FROM [fh-bigquery:wikipedia.pagecounts_20150511_05]
Wikipedia hits over 1 hour
SELECT sum(requests) as total
FROM [fh-bigquery:wikipedia.pagecounts_201505]
Wikipedia hits over 1 month
Several years of Wikipedia data
SELECT sum(requests) as total
FROM
[fh-bigquery:wikipedia.pagecounts_201105],
[fh-bigquery:wikipedia.pagecounts_201106],
[fh-bigquery:wikipedia.pagecounts_201107],
...
SELECT
SUM(requests) AS total
FROM
TABLE_QUERY(
[fh-bigquery:wikipedia],
'REGEXP_MATCH(
table_id,
r"pagecounts_2015[0-9]{2}$")')
Several years of Wikipedia data
How about a RegExp
SELECT
SUM(requests) AS total
FROM
TABLE_QUERY(
[fh-bigquery:wikipedia],
'REGEXP_MATCH(
table_id,
r"pagecounts_2015[0-9]{2}$")')
WHERE
(REGEXP_MATCH(title, '.*[dD]inosaur.*'))
How did it do that?
o_O
Qualities of a good RDBMS
Qualities of a good RDBMS
● Inserts & locking
● Indexing
● Cache
● Query planning
Qualities of a good RDBMS
● Inserts & locking
● Indexing
● Cache
● Query planning
Storing data
-- -- -- --
-- -- -- --
-- -- -- --
Table
Columns
Disks
Reading data: Life of a BigQuery
SELECT sum(requests) as sum
FROM (
SELECT requests, title
FROM [fh-bigquery:wikipedia.
pagecounts_201501]
WHERE
(REGEXP_MATCH(title, '[Jj]en.+'))
)
Life of a BigQuery
L L
MMixer
Leaf
Storage
L L L L
M M
M
Life of a BigQuery
Root Mixer
Mixer
Leaf
Storage
Life of a BigQuery
Query
L L L L
M M
MRoot Mixer
Mixer
Leaf
Storage
Life of a BigQueryLife of a BigQuery
L L L L
M M
MRoot Mixer
Mixer
Leaf
Storage
SELECT requests, title
Life of a BigQueryLife of a BigQuery
L L L L
M M
MRoot Mixer
Mixer
Leaf
Storage
5.4 Bil
SELECT requests, title
WHERE
(REGEXP_MATCH(title, '[Jj]en.+'))
Life of a BigQueryLife of a BigQuery
L L L L
M M
MRoot Mixer
Mixer
Leaf
Storage
5.4 Bil
SELECT sum(requests)
5.8 Mil
WHERE
(REGEXP_MATCH(title, '[Jj]en.+'))
SELECT requests, title
Life of a BigQueryLife of a BigQuery
L L L L
M M
MRoot Mixer
Mixer
Leaf
Storage
5.4 Bil
SELECT sum(requests)
5.8 Mil
WHERE
(REGEXP_MATCH(title, '[Jj]en.+'))
SELECT requests, title
SELECT sum(requests)
Open Data
Finding Open Data
opendata.stackexchange.com
Finding Open Data
reddit.com/r/dataisbeautiful
Time to explore
GSOD
Weather in Half Moon Bay
SELECT DATE(year+mo+da) day, min, max
FROM [fh-bigquery:weather_gsod.gsod2013]
WHERE stn IN (
SELECT usaf FROM [fh-bigquery:weather_gsod.stations]
WHERE name = 'HALF MOON BAY AIRPOR')
AND max < 200
ORDER BY day;
Weather in Half Moon Bay
SELECT DATE(year+mo+da) day, min, max
FROM [fh-bigquery:weather_gsod.gsod2013]
WHERE stn IN (
SELECT usaf FROM [fh-bigquery:weather_gsod.stations]
WHERE name = 'HALF MOON BAY AIRPOR')
AND max < 200
ORDER BY day;
Global high temperatures
SELECT year, max(max) as max
FROM
TABLE_QUERY(
[fh-bigquery:weather_gsod],
'table_id CONTAINS "gsod"')
where max < 200
group by year order by year asc
GDELT
Stories per month - Massachusetts
SELECT DATE(STRING(MonthYear) + '01') month,
SUM(ActionGeo_ADM1Code='USMA') US
FROM [gdelt-bq:full.events]
WHERE MonthYear > 0
GROUP BY 1 ORDER BY 1
SELECT DATE(STRING(MonthYear) + '01') month,
SUM(ActionGeo_ADM1Code='USMA') / COUNT(*) newsyness
FROM [gdelt-bq:full.events]
WHERE MonthYear > 0
GROUP BY 1 ORDER BY 1
Stories per month, normalized
https://developers.google.com/genomics/
Genomics
Genomics
SELECT Sample, SUM(single), SUM(double),
FROM (
SELECT call.call_set_name AS Sample,
SOME(call.genotype > 0) AND NOT EVERY(call.
genotype > 0) WITHIN call AS single,
EVERY(call.genotype > 0) WITHIN call AS double,
FROM[genomics-public-data:1000_genomes.variants]
OMIT RECORD IF reference_name IN ("X","Y","MT"))
GROUP BY Sample ORDER BY Sample
Genomics
SELECT Sample, SUM(single), SUM(double),
FROM (
SELECT call.call_set_name AS Sample,
SOME(call.genotype > 0) AND NOT EVERY(call.
genotype > 0) WITHIN call AS single,
EVERY(call.genotype > 0) WITHIN call AS double,
FROM[genomics-public-data:1000_genomes.variants]
OMIT RECORD IF reference_name IN ("X","Y","MT"))
GROUP BY Sample ORDER BY Sample
Something useful:
Use Wikipedia data to pick a movie
1. Wikipedia edits
2. ???
3. Movie recommendation
Follow the edits
Same
editor
select title, id, count(id) as edits
from [publicdata:samples.wikipedia]
where
title contains 'Hackers'
and title contains '(film)'
and wp_namespace = 0
group by title, id
order by edits
limit 10
Pick a great movie
select title, id, count(id) as edits
from [publicdata:samples.wikipedia]
where contributor_id in (
select contributor_id
from [publicdata:samples.wikipedia]
where
id=264176
and contributor_id is not null
and is_bot is null
and wp_namespace = 0
and title CONTAINS '(film)'
group by contributor_id)
and wp_namespace = 0
and id != 264176
and title CONTAINS '(film)'
group each by title, id
order by edits desc
limit 100
Find edits in common
Discover the most broadly popular films
select id from (
select id, count(id) as edits
from [publicdata:samples.wikipedia]
where
wp_namespace = 0
and title CONTAINS '(film)'
group each by id
order by edits desc
limit 20)
Edits in common, minus broadly popular
select title, id, count(id) as edits
from [publicdata:samples.wikipedia]
where contributor_id in (
select contributor_id
from [publicdata:samples.wikipedia]
where
id=264176
and contributor_id is not null
and is_bot is null
and wp_namespace = 0
and title CONTAINS '(film)'
group by contributor_id)
and wp_namespace = 0
and id != 264176
and title CONTAINS '(film)'
and id not in (
select id from (
select id, count(id) as edits
from [publicdata:samples.
wikipedia]
where
wp_namespace = 0
and title CONTAINS '(film)'
group each by id
order by edits desc
limit 20
)
)
group each by title, id
order by edits desc
limit 100
What we talked about
● Origin story
● Count stuff
● How it works
● Some cool open data
● Practical applications
● Try BigQuery
○ bigquery.cloud.google.com
● Queries we ran
○ github.com/mimming/snippets
● Me
○ @MimmingCodes
○ google.com/+mimming
The end
Exploring Open Date with BigQuery: Jenny Tong

More Related Content

What's hot

Aggregation Framework
Aggregation FrameworkAggregation Framework
Aggregation Framework
MongoDB
 
The Aggregation Framework
The Aggregation FrameworkThe Aggregation Framework
The Aggregation Framework
MongoDB
 
Data Processing and Aggregation with MongoDB
Data Processing and Aggregation with MongoDB Data Processing and Aggregation with MongoDB
Data Processing and Aggregation with MongoDB
MongoDB
 
San Francisco Java User Group
San Francisco Java User GroupSan Francisco Java User Group
San Francisco Java User Group
kchodorow
 

What's hot (20)

Mongo indexes
Mongo indexesMongo indexes
Mongo indexes
 
Aggregation Framework
Aggregation FrameworkAggregation Framework
Aggregation Framework
 
MongoDB Aggregation Framework
MongoDB Aggregation FrameworkMongoDB Aggregation Framework
MongoDB Aggregation Framework
 
MongoDB World 2019: Exploring your MongoDB Data with Pirates (R) and Snakes (...
MongoDB World 2019: Exploring your MongoDB Data with Pirates (R) and Snakes (...MongoDB World 2019: Exploring your MongoDB Data with Pirates (R) and Snakes (...
MongoDB World 2019: Exploring your MongoDB Data with Pirates (R) and Snakes (...
 
The Aggregation Framework
The Aggregation FrameworkThe Aggregation Framework
The Aggregation Framework
 
MongoDB and Python
MongoDB and PythonMongoDB and Python
MongoDB and Python
 
Analytics with MongoDB Aggregation Framework and Hadoop Connector
Analytics with MongoDB Aggregation Framework and Hadoop ConnectorAnalytics with MongoDB Aggregation Framework and Hadoop Connector
Analytics with MongoDB Aggregation Framework and Hadoop Connector
 
Aggregation Framework in MongoDB Overview Part-1
Aggregation Framework in MongoDB Overview Part-1Aggregation Framework in MongoDB Overview Part-1
Aggregation Framework in MongoDB Overview Part-1
 
Python and MongoDB
Python and MongoDB Python and MongoDB
Python and MongoDB
 
hySON - D2Fest
hySON - D2FesthySON - D2Fest
hySON - D2Fest
 
hySON
hySONhySON
hySON
 
PyCon Russian 2015 - Dive into full text search with python.
PyCon Russian 2015 - Dive into full text search with python.PyCon Russian 2015 - Dive into full text search with python.
PyCon Russian 2015 - Dive into full text search with python.
 
GraphDB
GraphDBGraphDB
GraphDB
 
Data Processing and Aggregation with MongoDB
Data Processing and Aggregation with MongoDB Data Processing and Aggregation with MongoDB
Data Processing and Aggregation with MongoDB
 
MongoDB World 2016 : Advanced Aggregation
MongoDB World 2016 : Advanced AggregationMongoDB World 2016 : Advanced Aggregation
MongoDB World 2016 : Advanced Aggregation
 
Using MongoDB and Python
Using MongoDB and PythonUsing MongoDB and Python
Using MongoDB and Python
 
What is the best full text search engine for Python?
What is the best full text search engine for Python?What is the best full text search engine for Python?
What is the best full text search engine for Python?
 
San Francisco Java User Group
San Francisco Java User GroupSan Francisco Java User Group
San Francisco Java User Group
 
Profile of NPOESS HDF5 Files
Profile of NPOESS HDF5 FilesProfile of NPOESS HDF5 Files
Profile of NPOESS HDF5 Files
 
Building social network with Neo4j and Python
Building social network with Neo4j and PythonBuilding social network with Neo4j and Python
Building social network with Neo4j and Python
 

Viewers also liked

Get to know Holiday Extras 2011
Get to know Holiday Extras 2011Get to know Holiday Extras 2011
Get to know Holiday Extras 2011
Matthew Pack
 
SMX 2010 Summary of Hot Topics from SEO Track
SMX 2010 Summary of Hot Topics from SEO TrackSMX 2010 Summary of Hot Topics from SEO Track
SMX 2010 Summary of Hot Topics from SEO Track
Matthew Pack
 
Presentation Hassle Free Anna
Presentation Hassle Free AnnaPresentation Hassle Free Anna
Presentation Hassle Free Anna
Matthew Pack
 

Viewers also liked (20)

Access to iDevices
Access to iDevicesAccess to iDevices
Access to iDevices
 
Social business online information 201112
Social business online information 201112 Social business online information 201112
Social business online information 201112
 
Put the romance back into rome
Put the romance back into romePut the romance back into rome
Put the romance back into rome
 
How to set your ADI business profile
How to set your ADI business profileHow to set your ADI business profile
How to set your ADI business profile
 
Get to know Holiday Extras 2011
Get to know Holiday Extras 2011Get to know Holiday Extras 2011
Get to know Holiday Extras 2011
 
Break away old
Break away oldBreak away old
Break away old
 
How to get started with Roadio in under 60 seconds
How to get started with Roadio in under 60 secondsHow to get started with Roadio in under 60 seconds
How to get started with Roadio in under 60 seconds
 
SMX 2010 Summary of Hot Topics from SEO Track
SMX 2010 Summary of Hot Topics from SEO TrackSMX 2010 Summary of Hot Topics from SEO Track
SMX 2010 Summary of Hot Topics from SEO Track
 
Hotleads:upsell
Hotleads:upsellHotleads:upsell
Hotleads:upsell
 
Presentation Hassle Free Anna
Presentation Hassle Free AnnaPresentation Hassle Free Anna
Presentation Hassle Free Anna
 
How to manage your payments
How to manage your paymentsHow to manage your payments
How to manage your payments
 
Static Sites Can be the Solution (Simon Wood)
Static Sites Can be the Solution (Simon Wood)Static Sites Can be the Solution (Simon Wood)
Static Sites Can be the Solution (Simon Wood)
 
Cinematic UX, Brad Weaver
Cinematic UX, Brad WeaverCinematic UX, Brad Weaver
Cinematic UX, Brad Weaver
 
Polyglot polywhat polywhy
Polyglot polywhat polywhyPolyglot polywhat polywhy
Polyglot polywhat polywhy
 
Online Presence
Online PresenceOnline Presence
Online Presence
 
Design+Startup 2013
Design+Startup 2013Design+Startup 2013
Design+Startup 2013
 
BreakAway
BreakAwayBreakAway
BreakAway
 
Apache Cordova, Hybrid Application Development
Apache Cordova, Hybrid Application DevelopmentApache Cordova, Hybrid Application Development
Apache Cordova, Hybrid Application Development
 
Osservatorio congressuale Torino 2014 2015
Osservatorio congressuale Torino 2014 2015Osservatorio congressuale Torino 2014 2015
Osservatorio congressuale Torino 2014 2015
 
Surviving the enterprise storm - @RianVDM
Surviving the enterprise storm - @RianVDMSurviving the enterprise storm - @RianVDM
Surviving the enterprise storm - @RianVDM
 

Similar to Exploring Open Date with BigQuery: Jenny Tong

Similar to Exploring Open Date with BigQuery: Jenny Tong (20)

CloudML talk at DevFest Madurai 2016
CloudML talk at DevFest Madurai 2016 CloudML talk at DevFest Madurai 2016
CloudML talk at DevFest Madurai 2016
 
TDC2016SP - Trilha BigData
TDC2016SP - Trilha BigDataTDC2016SP - Trilha BigData
TDC2016SP - Trilha BigData
 
Jeff Jacob MSBI Training portfolio
Jeff Jacob MSBI Training portfolioJeff Jacob MSBI Training portfolio
Jeff Jacob MSBI Training portfolio
 
Avro, la puissance du binaire, la souplesse du JSON
Avro, la puissance du binaire, la souplesse du JSONAvro, la puissance du binaire, la souplesse du JSON
Avro, la puissance du binaire, la souplesse du JSON
 
MongoDB World 2018: Keynote
MongoDB World 2018: KeynoteMongoDB World 2018: Keynote
MongoDB World 2018: Keynote
 
Term 2 CS Practical File 2021-22.pdf
Term 2 CS Practical File 2021-22.pdfTerm 2 CS Practical File 2021-22.pdf
Term 2 CS Practical File 2021-22.pdf
 
Apache Calcite Tutorial - BOSS 21
Apache Calcite Tutorial - BOSS 21Apache Calcite Tutorial - BOSS 21
Apache Calcite Tutorial - BOSS 21
 
DB_lecturs8 27 11.pptx
DB_lecturs8 27 11.pptxDB_lecturs8 27 11.pptx
DB_lecturs8 27 11.pptx
 
Session 1.5 supporting virtual integration of linked data with just-in-time...
Session 1.5   supporting virtual integration of linked data with just-in-time...Session 1.5   supporting virtual integration of linked data with just-in-time...
Session 1.5 supporting virtual integration of linked data with just-in-time...
 
Mashing Up The Guardian
Mashing Up The GuardianMashing Up The Guardian
Mashing Up The Guardian
 
Big Data Analytics with Google BigQuery. By Javier Ramirez. All your base Co...
Big Data Analytics with Google BigQuery.  By Javier Ramirez. All your base Co...Big Data Analytics with Google BigQuery.  By Javier Ramirez. All your base Co...
Big Data Analytics with Google BigQuery. By Javier Ramirez. All your base Co...
 
Mashing Up The Guardian
Mashing Up The GuardianMashing Up The Guardian
Mashing Up The Guardian
 
Crunching Data with Google BigQuery. JORDAN TIGANI at Big Data Spain 2012
Crunching Data with Google BigQuery. JORDAN TIGANI at Big Data Spain 2012Crunching Data with Google BigQuery. JORDAN TIGANI at Big Data Spain 2012
Crunching Data with Google BigQuery. JORDAN TIGANI at Big Data Spain 2012
 
Polyalgebra
PolyalgebraPolyalgebra
Polyalgebra
 
Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...
Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...
Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...
 
Hadoop institutes in hyderabad
Hadoop institutes in hyderabadHadoop institutes in hyderabad
Hadoop institutes in hyderabad
 
Interactive Analytics at Scale in Apache Hive Using Druid
Interactive Analytics at Scale in Apache Hive Using DruidInteractive Analytics at Scale in Apache Hive Using Druid
Interactive Analytics at Scale in Apache Hive Using Druid
 
When Relational Isn't Enough: Neo4j at Squidoo
When Relational Isn't Enough: Neo4j at SquidooWhen Relational Isn't Enough: Neo4j at Squidoo
When Relational Isn't Enough: Neo4j at Squidoo
 
Big Data Analytics with Google BigQuery, by Javier Ramirez, datawaki, at Span...
Big Data Analytics with Google BigQuery, by Javier Ramirez, datawaki, at Span...Big Data Analytics with Google BigQuery, by Javier Ramirez, datawaki, at Span...
Big Data Analytics with Google BigQuery, by Javier Ramirez, datawaki, at Span...
 
Where is the World is my Open Government Data?
Where is the World is my Open Government Data?Where is the World is my Open Government Data?
Where is the World is my Open Government Data?
 

More from Future Insights

More from Future Insights (20)

The Human Body in the IoT. Tim Cannon + Ryan O'Shea
The Human Body in the IoT. Tim Cannon + Ryan O'SheaThe Human Body in the IoT. Tim Cannon + Ryan O'Shea
The Human Body in the IoT. Tim Cannon + Ryan O'Shea
 
Pretty pictures - Brandon Satrom
Pretty pictures - Brandon SatromPretty pictures - Brandon Satrom
Pretty pictures - Brandon Satrom
 
Putting real time into practice - Saul Diez-Guerra
Putting real time into practice - Saul Diez-GuerraPutting real time into practice - Saul Diez-Guerra
Putting real time into practice - Saul Diez-Guerra
 
A Universal Theory of Everything, Christopher Murphy
A Universal Theory of Everything, Christopher MurphyA Universal Theory of Everything, Christopher Murphy
A Universal Theory of Everything, Christopher Murphy
 
Horizon Interactive Awards, Mike Sauce & Jeff Jahn
Horizon Interactive Awards, Mike Sauce & Jeff JahnHorizon Interactive Awards, Mike Sauce & Jeff Jahn
Horizon Interactive Awards, Mike Sauce & Jeff Jahn
 
Reading Your Users’ Minds: Empiricism, Design, and Human Behavior, Shane F. B...
Reading Your Users’ Minds: Empiricism, Design, and Human Behavior, Shane F. B...Reading Your Users’ Minds: Empiricism, Design, and Human Behavior, Shane F. B...
Reading Your Users’ Minds: Empiricism, Design, and Human Behavior, Shane F. B...
 
Front End Development Transformation at Scale, Damon Deaner
Front End Development Transformation at Scale, Damon DeanerFront End Development Transformation at Scale, Damon Deaner
Front End Development Transformation at Scale, Damon Deaner
 
Structuring Data from Unstructured Things. Sean Lorenz
Structuring Data from Unstructured Things. Sean LorenzStructuring Data from Unstructured Things. Sean Lorenz
Structuring Data from Unstructured Things. Sean Lorenz
 
The Future is Modular, Jonathan Snook
The Future is Modular, Jonathan SnookThe Future is Modular, Jonathan Snook
The Future is Modular, Jonathan Snook
 
Designing an Enterprise CSS Framework is Hard, Stephanie Rewis
Designing an Enterprise CSS Framework is Hard, Stephanie RewisDesigning an Enterprise CSS Framework is Hard, Stephanie Rewis
Designing an Enterprise CSS Framework is Hard, Stephanie Rewis
 
Accessibility Is More Than What Lies In The Code, Jennison Asuncion
Accessibility Is More Than What Lies In The Code, Jennison AsuncionAccessibility Is More Than What Lies In The Code, Jennison Asuncion
Accessibility Is More Than What Lies In The Code, Jennison Asuncion
 
Sunny with a Chance of Innovation: A How-To for Product Managers and Designer...
Sunny with a Chance of Innovation: A How-To for Product Managers and Designer...Sunny with a Chance of Innovation: A How-To for Product Managers and Designer...
Sunny with a Chance of Innovation: A How-To for Product Managers and Designer...
 
Designing for Dyslexia, Andrew Zusman
Designing for Dyslexia, Andrew ZusmanDesigning for Dyslexia, Andrew Zusman
Designing for Dyslexia, Andrew Zusman
 
Beyond Measure, Erika Hall
Beyond Measure, Erika HallBeyond Measure, Erika Hall
Beyond Measure, Erika Hall
 
Real Artists Ship, Haraldur Thorleifsson
Real Artists Ship, Haraldur ThorleifssonReal Artists Ship, Haraldur Thorleifsson
Real Artists Ship, Haraldur Thorleifsson
 
Ok Computer. Peter Gasston
Ok Computer. Peter GasstonOk Computer. Peter Gasston
Ok Computer. Peter Gasston
 
Digital Manuscripts Toolkit, using IIIF and JavaScript. Monica Messaggi Kaya
Digital Manuscripts Toolkit, using IIIF and JavaScript. Monica Messaggi KayaDigital Manuscripts Toolkit, using IIIF and JavaScript. Monica Messaggi Kaya
Digital Manuscripts Toolkit, using IIIF and JavaScript. Monica Messaggi Kaya
 
How to Build Your Future in the Internet of Things Economy. Jennifer Riggins
How to Build Your Future in the Internet of Things Economy. Jennifer RigginsHow to Build Your Future in the Internet of Things Economy. Jennifer Riggins
How to Build Your Future in the Internet of Things Economy. Jennifer Riggins
 
The Wordpress Game Changer. Jenny Wong
The Wordpress Game Changer. Jenny WongThe Wordpress Game Changer. Jenny Wong
The Wordpress Game Changer. Jenny Wong
 
A behind the-scenes look at cross-browser testing with web driver, Adrian Bat...
A behind the-scenes look at cross-browser testing with web driver, Adrian Bat...A behind the-scenes look at cross-browser testing with web driver, Adrian Bat...
A behind the-scenes look at cross-browser testing with web driver, Adrian Bat...
 

Recently uploaded

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 

Recently uploaded (20)

Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 

Exploring Open Date with BigQuery: Jenny Tong

  • 1. Exploring Open Data with BigQuery
  • 2. Jenny Tong Developer Advocate Google Cloud Platform @MimmingCodes
  • 3. Agenda ● Origin story ● Count stuff ● How it works ● Some cool open data ● Do something useful
  • 10. SELECT sum(requests) as total FROM [fh-bigquery:wikipedia.pagecounts_20150511_05] Wikipedia hits over 1 hour
  • 11. SELECT sum(requests) as total FROM [fh-bigquery:wikipedia.pagecounts_201505] Wikipedia hits over 1 month
  • 12. Several years of Wikipedia data SELECT sum(requests) as total FROM [fh-bigquery:wikipedia.pagecounts_201105], [fh-bigquery:wikipedia.pagecounts_201106], [fh-bigquery:wikipedia.pagecounts_201107], ...
  • 14. How about a RegExp SELECT SUM(requests) AS total FROM TABLE_QUERY( [fh-bigquery:wikipedia], 'REGEXP_MATCH( table_id, r"pagecounts_2015[0-9]{2}$")') WHERE (REGEXP_MATCH(title, '.*[dD]inosaur.*'))
  • 15. How did it do that? o_O
  • 16. Qualities of a good RDBMS
  • 17. Qualities of a good RDBMS ● Inserts & locking ● Indexing ● Cache ● Query planning
  • 18. Qualities of a good RDBMS ● Inserts & locking ● Indexing ● Cache ● Query planning
  • 19.
  • 20.
  • 21.
  • 22. Storing data -- -- -- -- -- -- -- -- -- -- -- -- Table Columns Disks
  • 23. Reading data: Life of a BigQuery SELECT sum(requests) as sum FROM ( SELECT requests, title FROM [fh-bigquery:wikipedia. pagecounts_201501] WHERE (REGEXP_MATCH(title, '[Jj]en.+')) )
  • 24. Life of a BigQuery L L MMixer Leaf Storage
  • 25. L L L L M M M Life of a BigQuery Root Mixer Mixer Leaf Storage
  • 26. Life of a BigQuery Query L L L L M M MRoot Mixer Mixer Leaf Storage
  • 27. Life of a BigQueryLife of a BigQuery L L L L M M MRoot Mixer Mixer Leaf Storage SELECT requests, title
  • 28. Life of a BigQueryLife of a BigQuery L L L L M M MRoot Mixer Mixer Leaf Storage 5.4 Bil SELECT requests, title WHERE (REGEXP_MATCH(title, '[Jj]en.+'))
  • 29. Life of a BigQueryLife of a BigQuery L L L L M M MRoot Mixer Mixer Leaf Storage 5.4 Bil SELECT sum(requests) 5.8 Mil WHERE (REGEXP_MATCH(title, '[Jj]en.+')) SELECT requests, title
  • 30. Life of a BigQueryLife of a BigQuery L L L L M M MRoot Mixer Mixer Leaf Storage 5.4 Bil SELECT sum(requests) 5.8 Mil WHERE (REGEXP_MATCH(title, '[Jj]en.+')) SELECT requests, title SELECT sum(requests)
  • 35. GSOD
  • 36. Weather in Half Moon Bay SELECT DATE(year+mo+da) day, min, max FROM [fh-bigquery:weather_gsod.gsod2013] WHERE stn IN ( SELECT usaf FROM [fh-bigquery:weather_gsod.stations] WHERE name = 'HALF MOON BAY AIRPOR') AND max < 200 ORDER BY day;
  • 37. Weather in Half Moon Bay SELECT DATE(year+mo+da) day, min, max FROM [fh-bigquery:weather_gsod.gsod2013] WHERE stn IN ( SELECT usaf FROM [fh-bigquery:weather_gsod.stations] WHERE name = 'HALF MOON BAY AIRPOR') AND max < 200 ORDER BY day;
  • 38. Global high temperatures SELECT year, max(max) as max FROM TABLE_QUERY( [fh-bigquery:weather_gsod], 'table_id CONTAINS "gsod"') where max < 200 group by year order by year asc
  • 39. GDELT
  • 40. Stories per month - Massachusetts SELECT DATE(STRING(MonthYear) + '01') month, SUM(ActionGeo_ADM1Code='USMA') US FROM [gdelt-bq:full.events] WHERE MonthYear > 0 GROUP BY 1 ORDER BY 1
  • 41. SELECT DATE(STRING(MonthYear) + '01') month, SUM(ActionGeo_ADM1Code='USMA') / COUNT(*) newsyness FROM [gdelt-bq:full.events] WHERE MonthYear > 0 GROUP BY 1 ORDER BY 1 Stories per month, normalized
  • 43.
  • 44. Genomics SELECT Sample, SUM(single), SUM(double), FROM ( SELECT call.call_set_name AS Sample, SOME(call.genotype > 0) AND NOT EVERY(call. genotype > 0) WITHIN call AS single, EVERY(call.genotype > 0) WITHIN call AS double, FROM[genomics-public-data:1000_genomes.variants] OMIT RECORD IF reference_name IN ("X","Y","MT")) GROUP BY Sample ORDER BY Sample
  • 45. Genomics SELECT Sample, SUM(single), SUM(double), FROM ( SELECT call.call_set_name AS Sample, SOME(call.genotype > 0) AND NOT EVERY(call. genotype > 0) WITHIN call AS single, EVERY(call.genotype > 0) WITHIN call AS double, FROM[genomics-public-data:1000_genomes.variants] OMIT RECORD IF reference_name IN ("X","Y","MT")) GROUP BY Sample ORDER BY Sample
  • 46. Something useful: Use Wikipedia data to pick a movie
  • 47. 1. Wikipedia edits 2. ??? 3. Movie recommendation
  • 49. select title, id, count(id) as edits from [publicdata:samples.wikipedia] where title contains 'Hackers' and title contains '(film)' and wp_namespace = 0 group by title, id order by edits limit 10 Pick a great movie
  • 50. select title, id, count(id) as edits from [publicdata:samples.wikipedia] where contributor_id in ( select contributor_id from [publicdata:samples.wikipedia] where id=264176 and contributor_id is not null and is_bot is null and wp_namespace = 0 and title CONTAINS '(film)' group by contributor_id) and wp_namespace = 0 and id != 264176 and title CONTAINS '(film)' group each by title, id order by edits desc limit 100 Find edits in common
  • 51. Discover the most broadly popular films select id from ( select id, count(id) as edits from [publicdata:samples.wikipedia] where wp_namespace = 0 and title CONTAINS '(film)' group each by id order by edits desc limit 20)
  • 52. Edits in common, minus broadly popular select title, id, count(id) as edits from [publicdata:samples.wikipedia] where contributor_id in ( select contributor_id from [publicdata:samples.wikipedia] where id=264176 and contributor_id is not null and is_bot is null and wp_namespace = 0 and title CONTAINS '(film)' group by contributor_id) and wp_namespace = 0 and id != 264176 and title CONTAINS '(film)' and id not in ( select id from ( select id, count(id) as edits from [publicdata:samples. wikipedia] where wp_namespace = 0 and title CONTAINS '(film)' group each by id order by edits desc limit 20 ) ) group each by title, id order by edits desc limit 100
  • 53. What we talked about ● Origin story ● Count stuff ● How it works ● Some cool open data ● Practical applications
  • 54. ● Try BigQuery ○ bigquery.cloud.google.com ● Queries we ran ○ github.com/mimming/snippets ● Me ○ @MimmingCodes ○ google.com/+mimming The end