BigQuery =Command line tools and Tips for business use=
Mulodo Open Study Group (MOSG) @Ho Chi Minh, Vietnam
http://www.meetup.com/Open-Study-Group-Saigon/events/231504491/
2. What's BigQuery?
Official site : https://cloud.google.com/bigquery/docs/
BigQuery is Google's fully managed, petabyte scale, low
cost analytics data warehouse.
BigQuery is NoOps (there is no infrastructure to manage
and you don't need a database administrator), so you can
focus on analyzing data to find meaningful insights, use
familiar SQL, and take advantage of our pay-as-you-go
model.
→ DWH: SQL-like (easy to use), petabyte scale (for huge data)
3. Previous study
"BigQuery - The First Step -" (2016/05/26)
• Getting started with Google BigQuery
• Running queries in the Google Cloud Platform console
• Creating your own dataset and table
• Querying your own table from the GCP console
http://www.meetup.com/Open-Study-Group-Saigon/events/231233151/
http://www.slideshare.net/nemo-mulodo/big-query-the-first-step-mosg
c.f. "Big Data - Overview -"
http://www.meetup.com/Open-Study-Group-Saigon/events/229243903/
http://www.slideshare.net/nemo-mulodo/big-data-overview-mosg
4. Command line tools and Tips
1. Preparation (install SDK and settings)
2. Try command line tools
   Create datasets and tables, and insert data.
3. Tips for business use
   How is usage charged?
   Tips to reduce cost
8. Install SDK to your PC. (1)
nemo@ubuntu-14:~$ curl https://sdk.cloud.google.com | bash
:
Installation directory (this will create a google-cloud-sdk subdirectory)
(/home/nemo): <-- Just press Enter (or type the directory you want)
:
Do you want to help improve the Google Cloud SDK (Y/n)? y
:
| BigQuery Command Line Tool                     | 2.0.24 | < 1 MiB |
| BigQuery Command Line Tool (Platform Specific) | 2.0.24 | < 1 MiB |
:
Modify profile to update your $PATH and enable shell command
completion? (Y/n)? y (or n, if you prefer)
:
For more information on how to get started, please visit:
https://cloud.google.com/sdk/#Getting_Started
nemo@ubuntu-14:~$ . ~/.bashrc <-- reload your bash environment
nemo@ubuntu-14:~$
9. Install SDK to your PC. (2)
// check the commands
nemo@ubuntu-14:~$ which bq
/home/nemo/google-cloud-sdk/bin/bq
nemo@ubuntu-14:~$ which gcloud
/home/nemo/google-cloud-sdk/bin/gcloud
11. Activate your GCP account (1)
1. Preparation (create account)
2. Go to Google Cloud Platform (if you have no account yet)
3. Click "Try It Free"
https://cloud.google.com
nemo@ubuntu-14:~$ gcloud init
Welcome! This command will take you through the configuration of gcloud.
Your current configuration has been set to: [default]
To continue, you must log in. Would you like to log in (Y/n)?
Go to the following link in your browser:
https://accounts.google.com/o/oauth2/auth?redirect_uri=ur&xxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
access_type=offline
Enter verification code: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
You are now logged in as: [xxxx@example.com]
This account has no projects. Please create one in developers console
(https://console.developers.google.com/project) before running this command.
nemo@ubuntu-14:~$
18. Activate your GCP account (8)
// set Project ID
nemo@ubuntu-14:~$ gcloud config set project {{PROJECT_ID}}
nemo@ubuntu-14:~$
// check the accounts
nemo@ubuntu-14:~$ gcloud auth list
- xxx@example.com (active)
To set the active account, run:
$ gcloud config set account ``ACCOUNT''
nemo@ubuntu-14:~$
27. Create table and import data (3)
Data (data.json)
{"id":1,"name":"nemo","engineer_type":1}
{"id":2,"name":"miki","engineer_type":1}
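The data file above is newline-delimited JSON: one object per line, with no enclosing array. A minimal sketch of producing that text, plus a matching schema, might look like the following. The helper name `to_ndjson` and the constant names are hypothetical, and the schema layout (a JSON array of name/type objects) is an assumption, since the slide showing the actual schema.json is not part of this excerpt.

```python
import json

# Rows copied from the data.json slide above.
ROWS = [
    {"id": 1, "name": "nemo", "engineer_type": 1},
    {"id": 2, "name": "miki", "engineer_type": 1},
]

# Assumed schema layout: a JSON array of {"name", "type"} objects.
SCHEMA = [
    {"name": "id", "type": "INTEGER"},
    {"name": "name", "type": "STRING"},
    {"name": "engineer_type", "type": "INTEGER"},
]


def to_ndjson(rows):
    """Serialize rows as newline-delimited JSON (one compact object per line)."""
    return "".join(json.dumps(row, separators=(",", ":")) + "\n" for row in rows)


if __name__ == "__main__":
    # Print both texts; redirect them into data.json / schema.json as needed.
    print(to_ndjson(ROWS), end="")
    print(json.dumps(SCHEMA, indent=2))
```

Writing the first output to data.json and the second to schema.json would give the two files that the load commands on the following slides expect.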
28. Create table and import data (4)
nemo@ubuntu-14:~$ bq load --source_format=NEWLINE_DELIMITED_JSON saigon_engineers.engineer_list data.json schema.json
Upload complete.
Waiting on bqjob_r23b898932d75d49a_000001554e5cae2f_1 ... (1s)
Current status: DONE
nemo@ubuntu-14:~$
bq load {PROJECT_ID}:{DATASET}.{TABLE} {data} {schema}
Create table and import data
https://cloud.google.com/bigquery/loading-data
29. Create table and import data (5)
nemo@ubuntu-14:~$ bq load --source_format=NEWLINE_DELIMITED_JSON saigon_engineers.engineer_list data.json id:integer,name:string,engineer_type:integer
Upload complete.
Waiting on bqjob_r33b7802ea96b2c5d_000001554e4d21d5_1 ... (2s)
Current status: DONE
nemo@ubuntu-14:~$
Create table and import data: another way (inline schema)
30. Create table and import data (6)
nemo@ubuntu-14:~$ bq mk open-study-group-saigon:saigon_engineers.engineer_list schema.json
nemo@ubuntu-14:~$
Create table
bq mk {PROJECT_ID}:{DATASET}.{TABLE} {schema}
31. Create table and import data (7)
nemo@ubuntu-14:~$ bq load --source_format=NEWLINE_DELIMITED_JSON saigon_engineers.engineer_list data.json
Upload complete.
Waiting on bqjob_r13717485c2c472e3_000001554e5b3ca3_1 ... (2s)
Current status: DONE
nemo@ubuntu-14:~$
Import data to database
bq load {PROJECT_ID}:{DATASET}.{TABLE} {data}
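The load commands on these slides differ only in whether a project prefix and a schema argument are present. As a rough sketch, the argument list could be assembled as below before handing it to something like `subprocess.run`. The helper name `bq_load_args` and its parameters are hypothetical; the only flag used is the one shown on the slides.

```python
import shlex


def bq_load_args(table, data_file, schema=None, project_id=None):
    """Build the argument list for `bq load` as used on the slides.

    table      -- "DATASET.TABLE"
    data_file  -- path to a newline-delimited JSON file
    schema     -- optional schema file, or an inline "name:type,..." string
    project_id -- optional project prefix
    """
    target = f"{project_id}:{table}" if project_id else table
    args = ["bq", "load", "--source_format=NEWLINE_DELIMITED_JSON", target, data_file]
    if schema is not None:
        args.append(schema)
    return args


if __name__ == "__main__":
    # Print the command instead of running it, so nothing is uploaded by accident.
    print(shlex.join(bq_load_args("saigon_engineers.engineer_list", "data.json", "schema.json")))
```

Leaving `schema` out corresponds to loading into a table that was already created with `bq mk`.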
32. Query (1)
nemo@ubuntu-14:~$ bq show saigon_engineers.engineer_list
   Last modified             Schema             Total Rows   Total Bytes   Expiration
 ----------------- --------------------------- ------------ ------------- ------------
  14 Jun 10:02:35   |- id: integer              2            44
                    |- name: string
                    |- engineer_type: integer
nemo@ubuntu-14:~$
33. Query (2)
nemo@ubuntu-14:~$ bq query "SELECT name FROM saigon_engineers.engineer_list"
Waiting on bqjob_r12185d1aa88d92c8_0000015552d709d2_1 ... (0s)
Current status: DONE
+------+
| name |
+------+
| nemo |
| miki |
+------+
nemo@ubuntu-14:~$
34. Query (3)
nemo@ubuntu-14:~$ bq query --dry_run "SELECT name FROM saigon_engineers.engineer_list"
Query successfully validated. Assuming the tables are not
modified, running this query will process 12 bytes of data.
nemo@ubuntu-14:~$
bq query --dry_run "QUERY"
- Reports how many bytes the query will process, before you actually run it.
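Since charges depend on the bytes processed, the dry-run message can be turned into a rough cost estimate. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the price of $5 per TB is the figure quoted later in these slides (check current pricing), 1 TB is taken as 2^40 bytes, and any minimum charge or free quota is ignored.

```python
import re

BYTES_PER_TB = 2 ** 40  # assumption: 1 TB counted as 2^40 bytes
USD_PER_TB = 5.0        # assumption: price quoted in these slides


def bytes_from_dry_run(message):
    """Extract the byte count from a `bq query --dry_run` message."""
    match = re.search(r"will process (\d+) bytes", message)
    if match is None:
        raise ValueError("no byte count found in dry-run output")
    return int(match.group(1))


def estimated_cost_usd(num_bytes):
    """Rough on-demand cost for scanning num_bytes (no minimums, no free quota)."""
    return num_bytes / BYTES_PER_TB * USD_PER_TB


if __name__ == "__main__":
    output = (
        "Query successfully validated. Assuming the tables are not\n"
        "modified, running this query will process 12 bytes of data."
    )
    scanned = bytes_from_dry_run(output)
    print(scanned, "bytes, about", estimated_cost_usd(scanned), "USD")
```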
39. Column oriented (1)
Sample case: database of books
ID (indexed) | title (indexed) | contents
-------------+-----------------+----------------------------------------------------
1            | The Cat         | Lorem ipsum dolor sit amet, consectetur (... 1.2MB)
2            | Cats are love   | Lorem ipsum dolor sit amet, consectetur (... 1.5MB)
3            | Littul Kittons  | Lorem ipsum dolor sit amet, consectetur (... 0.8MB)
select id, title from books where title = 'The Cat'
40. Column oriented (2)
select * from books where title = 'The Cat'
@RDBMS
[Figure: the books table from slide 39, with an index on title (hash -> data) shown beside the data in the database; the diagram marks the scanned data]
41. Column oriented (3)
select * from books where title = 'The Cat'
@BigQuery
[Figure: the books table from slide 39; the diagram marks the scanned data within the data in the database]
42. Column oriented (3)
select * from books where title = 'The Cat'
@BigQuery
[Figure: the books table from slide 39; the diagram marks the scanned data within the data in the database]
Full scan, ANYTIME!!
43. Column oriented (4)
select * from books where title = 'The Cat'
@BigQuery
[Figure: the books table from slide 39, as the data in the database]
If your database is terabyte scale, that is $5 per query!!
44. Column oriented (5)
select id, title from books where title = 'The Cat'
@RDBMS
[Figure: the books table from slide 39, with an index on title (hash -> data) shown beside the data in the database; the diagram marks the scanned data]
45. Column oriented (6)
select id, title from books where title = 'The Cat'
@BigQuery
[Figure: the books table from slide 39; the diagram marks the scanned data within the data in the database]
46. Column oriented (6)
select id, title from books where title = 'The Cat'
@BigQuery
[Figure: the books table from slide 39; the diagram marks the scanned data within the data in the database]
Column oriented
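The point of the column-oriented slides can be made concrete with a toy model: the engine reads every row, but only for the columns the query mentions. The sketch below is an illustration, not BigQuery's actual accounting. The helper name `scanned_bytes` is hypothetical, and the per-value sizes are assumptions (8 bytes per ID, one byte per title character, and the contents sizes from the slides read as decimal megabytes).

```python
# Assumed per-row, per-column sizes in bytes for the three sample books.
ROWS = [
    {"id": 8, "title": len("The Cat"), "contents": 1_200_000},
    {"id": 8, "title": len("Cats are love"), "contents": 1_500_000},
    {"id": 8, "title": len("Littul Kittons"), "contents": 800_000},
]


def scanned_bytes(rows, columns):
    """Bytes read when every row is scanned, but only for the given columns."""
    return sum(row[column] for row in rows for column in columns)


if __name__ == "__main__":
    # select * ... where title = ...  -> touches all three columns
    print("select *        :", scanned_bytes(ROWS, ["id", "title", "contents"]))
    # select id, title ... where title = ...  -> touches only id and title
    print("select id, title:", scanned_bytes(ROWS, ["id", "title"]))
```

In this model, dropping the large contents column from the select list removes almost all of the scanned bytes, even though every row is still read.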
48. Table division
Sample case: database of books
select id, title from books where time in '2016/06/17'
ID (indexed) | title (indexed) | contents                                            | time (indexed)
-------------+-----------------+-----------------------------------------------------+--------------------
1            | The Cat         | Lorem ipsum dolor sit amet, consectetur (... 1.2MB) | 2016/01/01 00:00:00
2            | Cats are love   | Lorem ipsum dolor sit amet, consectetur (... 1.5MB) | 2016/01/01 00:01:23
:            | :               | :                                                   | :
353485397    | Littul Kittons  | Lorem ipsum dolor sit amet, consectetur (... 0.8MB) | 2016/06/17 00:01:46
49. Table division (1)
select id, title from books where time in '2016/06/17'
@RDBMS
[Figure: the books table from slide 48, with an index on time (hash -> data) shown beside the data in the database; the diagram marks the scanned data]
50. Table division (2)
select id, title from books where time in '2016/06/17'
@BigQuery
[Figure: the books table from slide 48; the diagram marks the scanned data within the data in the database, labeled "Huge size"]
52. Table division (3)
Divide the table into one table per day.
Tables
books_20160101
ID (indexed) | title (indexed) | contents                                            | time (indexed)
1            | The Cat         | Lorem ipsum dolor sit amet, consectetur (... 1.2MB) | 2016/01/01 00:00:00
2            | Cats are love   | Lorem ipsum dolor sit amet, consectetur (... 1.5MB) | 2016/01/01 00:01:23
:
books_20160617
ID (indexed) | title (indexed) | contents                                            | time (indexed)
353485397    | Littul Kittons  | Lorem ipsum dolor sit amet, consectetur (... 0.8MB) | 2016/06/17 00:01:46
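With one table per day, each row has to be routed to the table that matches its timestamp. A minimal sketch of that naming rule follows. The function name `table_for_timestamp` is hypothetical, and the timestamp format is assumed to be the `YYYY/MM/DD hh:mm:ss` form shown in the sample rows.

```python
from datetime import datetime


def table_for_timestamp(prefix, timestamp_text):
    """Return the per-day table name (prefix + YYYYMMDD) for a row's timestamp."""
    moment = datetime.strptime(timestamp_text, "%Y/%m/%d %H:%M:%S")
    return prefix + moment.strftime("%Y%m%d")


if __name__ == "__main__":
    # The last sample row on the slide belongs in books_20160617.
    print(table_for_timestamp("books_", "2016/06/17 00:01:46"))
```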
53. Table division (4)
select id, title from books where time in '2016/06/17'
@BigQuery
[Figure: the per-day tables from slide 52, books_20160101 through books_20160617]
54. Table division (5)
select id, title from books
where time in '2016/06/16 - 2016/06/17'
@BigQuery
books_20160101
ID (indexed) | title (indexed)  | contents                                            | time (indexed)
1            | The Cat          | Lorem ipsum dolor sit amet, consectetur (... 1.2MB) | 2016/01/01 00:00:00
:
books_20160616
ID (indexed) | title (indexed)  | contents                                            | time (indexed)
353485397    | The Great Catsby | Lorem ipsum dolor sit amet, consectetur (... 0.8MB) | 2016/06/16 00:01:46
books_20160617
ID (indexed) | title (indexed)  | contents                                            | time (indexed)
353485397    | Littul Kittons   | Lorem ipsum dolor sit amet, consectetur (...)       | 2016/06/17 00:01:46
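A date-range query over per-day tables only needs the tables whose dates fall inside the range. The sketch below lists those table names for an inclusive range. The function name `tables_for_range` is hypothetical, and how the names are then combined into a single query is left open, since the slides show only pseudo-SQL.

```python
from datetime import date, timedelta


def tables_for_range(prefix, start, end):
    """List per-day table names (prefix + YYYYMMDD) from start to end, inclusive."""
    if end < start:
        raise ValueError("end date is before start date")
    day_count = (end - start).days + 1
    return [
        prefix + (start + timedelta(days=offset)).strftime("%Y%m%d")
        for offset in range(day_count)
    ]


if __name__ == "__main__":
    # The range from the slide: 2016/06/16 through 2016/06/17.
    for name in tables_for_range("books_", date(2016, 6, 16), date(2016, 6, 17)):
        print(name)
```

For the range on the slide this yields two table names, so the other daily tables would not need to be read at all.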