With BigQuery ML, you can build machine learning models and train them on massive datasets without leaving the data warehouse environment. We will demonstrate how to build, train, evaluate, and predict with your own scalable machine learning models using standard SQL in Google BigQuery.
We will see how to use the CREATE MODEL SQL syntax to build different models, such as:
-Linear regression
-Multiclass logistic regression for classification
-K-means clustering
-Import TensorFlow models for prediction in BigQuery
We will see how to apply these models to tabular data in retail and marketing use cases.
Models are trained and accessed in BigQuery using SQL, a language data analysts know. This enables business decision-making through predictive analytics across the organization without leaving the query editor.
2. ● Among the top 3 Romanians on Stack Overflow (175k reputation)
● Google Developer Expert on Cloud technologies
● Crafting Web/Mobile backends at REEA.net
● BigQuery + Redis database engine expert
Slideshare: martonkodok
Twitter: @martonkodok
StackOverflow: pentium10
GitHub: pentium10
BigQuery ML - Machine Learning at scale using SQL @martonkodok
About me
3. 1. What is BigQuery? - Data warehouse in the Cloud
2. Introduction to BigQuery ML - execute ML models using SQL
3. Practical use cases
4. Segment and recommend with BigQuery ML
5. Conclusions
Agenda
4. Legacy Reporting System
[Architecture diagram: App; Load Balancing (NGINX on Compute Engine, 10GB PD); Database Service (Master/Slave, three Compute Engine instances with 10GB PD each); Scheduled Tasks and Batch Processing (Compute Engine, multiple instances); Report & Share, Business Analysis]
5. Serverless Reporting System
[Architecture diagram: App; Load Balancing (NGINX on Compute Engine, 10GB PD); Database Service (Master/Slave, three Compute Engine instances with 10GB PD each); Scheduled Tasks and Batch Processing (Compute Engine, multiple instances); BigQuery and Data Studio for Report & Share, Business Analysis]
7. Analytics-as-a-Service - Data Warehouse in the Cloud
Familiar DB Structure (table, columns, views, struct, nested, JSON)
Decent pricing (storage: $20/TB, cold storage: $10/TB, queries: $5/TB) *May 2020
SQL 2011 + JavaScript UDFs (User-Defined Functions)
BigQuery ML enables users to create machine learning models using SQL queries
Scales into Exabytes on Managed Infrastructure
Integrates with Cloud SQL + Cloud Storage + Sheets + Pub/Sub connectors
What is BigQuery?
8. 1. Load from file - either local or from GCS (max 5TB each)
2. Streaming rows - event driven approach - high throughput 1M rows/sec
3. Functions - observer-trigger based (Google Cloud Functions)
4. Join with Cloud SQL - Ability to join with MySQL, Postgres
5. Pipelines - flexibility to do ETL - FluentD, Kafka, Google Dataflow
6. Export from connected services - Firestore, Billing, AuditLogs, Stackdriver
7. Firebase - Analytics - Messaging - Crashlytics - Perf. Monitoring - Predictions
Loading Data into BigQuery
10. “We have our app outside of GCP.
We need to join with our SQL database.”
Solution: EXTERNAL_QUERY
11. Combine on-premise with Cloud
[Architecture diagram: App; Load Balancing (NGINX on Compute Engine, 10GB PD); Database Service (Master/Slave, three Compute Engine instances with 10GB PD each); Cloud VPN Gateway; Cloud SQL replica (Zone 1, us-east1-a); BigQuery executing combined queries; Report]
12. EXTERNAL_QUERY: Run a query against a Cloud SQL database from BigQuery
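A federated query of this kind might look like the sketch below. The connection ID (`my-project.us.my-cloudsql-connection`), the Cloud SQL table `customers`, and the BigQuery table `mydataset.orders` are hypothetical names chosen for illustration; only the EXTERNAL_QUERY function itself comes from the slide.

```sql
-- Sketch only: connection ID, tables, and columns are hypothetical.
-- The inner string is executed by Cloud SQL (MySQL or PostgreSQL dialect);
-- the outer query is BigQuery standard SQL.
SELECT
  c.customer_id,
  c.name,
  SUM(o.amount) AS total_spent
FROM EXTERNAL_QUERY(
  'my-project.us.my-cloudsql-connection',
  '''SELECT customer_id, name FROM customers'''
) AS c
JOIN `mydataset.orders` AS o
  ON o.customer_id = c.customer_id
GROUP BY c.customer_id, c.name;
```

The result of EXTERNAL_QUERY behaves like a temporary table, so it can be joined, filtered, or aggregated like any other BigQuery table.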
13. ➢ Optimize product pages
Find, store, and analyse in BQ time-consuming user actions, using
25x more custom events/hits than Google Analytics
➢ Email engagement
With the raw data of every open/click stored, improve subject lines, layout,
follow-up action emails, and the assistant-like experience through heavy
A/B split tests on email marketing campaigns (interactive feedback loop)
➢ Funnel Analysis
Wrangle all the data to discover small improvements, an AI-driven,
personalized upsell experience, and pre-sell products configured on the go:
not yet in the catalog, but easy to tweak/customize
Where to use BigQuery?
14. ● SQL language to run BigData queries
● run raw ad-hoc queries (either by analysts/sales or Devs)
● no more throwing away, expiring, or aggregating old data
● it’s serverless
● no provisioning/deploy
● no running out of resources
● no more focus on large-scale execution plans
Our benefits
15. Easily Build Custom Reports and Dashboards
17. BigQuery ML
1. Execute ML initiatives without moving
data from BigQuery
2. Iterate on models in SQL in BigQuery
to increase development speed
3. Automate common ML tasks and
hyperparameter tuning
18. Audiences: Developer, SQL, Data Scientist
Use cases and skills
TensorFlow and CloudML Engine
● Build and deploy state-of-the-art custom models
● Requires deep understanding of ML and programming
BigQuery ML
● Build and deploy custom models using SQL
● Requires only basic understanding of ML
AutoML and CloudML APIs
● Build and deploy Google-provided models for standard use cases
● Requires almost no ML knowledge
Making ML accessible for all audiences
19. ● ML is hard, and we don’t have a dedicated team.
With BigQuery ML, you only need devs with good SQL skills.
● With BigQuery ML, extending your current stack with ML no longer means a steep learning curve
● Understand how to connect pieces of tabular data to fulfil a business requirement
● Start using the Cloud benefits and BigQuery ML as a complementary system
● Understand BigQuery ML to see that you don’t need a large budget to add ML product improvements
#increase #innovation #work on #fun #stuff
Common mindset blockers
20. ● Linear regression for forecasting
● Binary or Multiclass logistic regression for classification (labels can have up to 50 unique values)
● K-means clustering for data segmentation (unsupervised learning: does not require labels or a training/evaluation split)
● Matrix factorization
● Import TensorFlow models for prediction in BigQuery
Supported models in BigQuery ML
21. Conversion/Purchase prediction. MODEL: Logistic regression
Predict whether a user “converts” or “purchases”. It is in the company's interest that many users sign up for this
membership, as it helps streamline their ad conversion and also helps with recurring revenue.
Customer Lifetime Value (LTV) prediction. MODEL: Logistic regression
Used by organisations to identify and prioritize significant customer segments that would be most
valuable to the company.
Customer Segmentation. MODEL: K-means clustering
Dividing a client base into groups in specific ways relevant to marketing, such as interests and spending
habits. Segmentation allows marketers to better customize their efforts to various audience groups.
E-commerce Use Cases
22. Create a MODEL that predicts whether a website visitor will make a transaction.
● The CREATE MODEL statement
● The ML.EVALUATE function to evaluate the ML model
● The ML.PREDICT function to make predictions using the ML model
Getting started with BigQuery ML
23. Create a binary logistic regression model
[Annotated query screenshot. Callouts: CREATE MODEL syntax; create training dataset using a label column; SELECT features]
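The three pieces named above (CREATE MODEL, a label column, selected features) could be put together roughly as follows. This is a sketch, not the query from the slide: it assumes the public Google Analytics sample table and a hypothetical dataset `bqml_tutorial` that you own, and the feature choice is illustrative.

```sql
-- 1. CREATE MODEL syntax: train a binary logistic regression model.
CREATE OR REPLACE MODEL `bqml_tutorial.sample_model`
OPTIONS (model_type = 'logistic_reg') AS
SELECT
  -- 2. Label column: 1 if the session contained a transaction, else 0.
  IF(totals.transactions IS NULL, 0, 1) AS label,
  -- 3. Features selected from the session record.
  IFNULL(device.operatingSystem, '') AS os,
  device.isMobile AS is_mobile,
  IFNULL(geoNetwork.country, '') AS country,
  IFNULL(totals.pageviews, 0) AS pageviews
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE _TABLE_SUFFIX BETWEEN '20160801' AND '20170630';

-- Evaluate the model on sessions it was not trained on.
SELECT *
FROM ML.EVALUATE(MODEL `bqml_tutorial.sample_model`, (
  SELECT
    IF(totals.transactions IS NULL, 0, 1) AS label,
    IFNULL(device.operatingSystem, '') AS os,
    device.isMobile AS is_mobile,
    IFNULL(geoNetwork.country, '') AS country,
    IFNULL(totals.pageviews, 0) AS pageviews
  FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
  WHERE _TABLE_SUFFIX BETWEEN '20170701' AND '20170801'));

-- Predict: total predicted purchases per country.
SELECT country, SUM(predicted_label) AS total_predicted_purchases
FROM ML.PREDICT(MODEL `bqml_tutorial.sample_model`, (
  SELECT
    IFNULL(device.operatingSystem, '') AS os,
    device.isMobile AS is_mobile,
    IFNULL(geoNetwork.country, '') AS country,
    IFNULL(totals.pageviews, 0) AS pageviews
  FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
  WHERE _TABLE_SUFFIX BETWEEN '20170701' AND '20170801'))
GROUP BY country
ORDER BY total_predicted_purchases DESC
LIMIT 10;
```

Training and evaluation use disjoint date ranges so that the evaluation metrics reflect unseen sessions.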
26. Use cases:
● Customer segmentation
● Data quality
Options and defaults
● Number of clusters: default is log10(num_rows)
● Distance type: Euclidean (default), Cosine
● Supports all major SQL data types including GIS
K-means clustering
CREATE MODEL yourmodel
OPTIONS (model_type = 'kmeans')
AS SELECT..
ML.PREDICT maps rows to closest clusters
ML.CENTROIDS for cluster centroids
ML.EVALUATE
ML.TRAINING_INFO
ML.FEATURE_INFO
27. Available data:
● Encode yes/no features
(e.g. has a microwave, has a kitchen, has a TV, has a bathroom)
● Can apply clustering on the encoded data
K-means clustering: Problem definition
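A minimal sketch of clustering on encoded yes/no features, assuming a hypothetical table `mydataset.listings` with a `listing_id` column and BOOL amenity columns. The table, column names, and the choice of four clusters are illustrative assumptions, not taken from the talk.

```sql
-- Sketch only: table, columns, and num_clusters are hypothetical.
-- Encode each yes/no feature as 0/1 and train a k-means model on it.
CREATE OR REPLACE MODEL `mydataset.listing_clusters`
OPTIONS (model_type = 'kmeans', num_clusters = 4) AS
SELECT
  CAST(has_microwave AS INT64) AS has_microwave,
  CAST(has_kitchen AS INT64) AS has_kitchen,
  CAST(has_tv AS INT64) AS has_tv,
  CAST(has_bathroom AS INT64) AS has_bathroom
FROM `mydataset.listings`;

-- Inspect the centroids: one row per (cluster, feature).
SELECT centroid_id, feature, numerical_value
FROM ML.CENTROIDS(MODEL `mydataset.listing_clusters`)
ORDER BY centroid_id, feature;

-- Map every listing to its closest cluster.
-- listing_id is not a model feature, so it is passed through unchanged.
SELECT listing_id, centroid_id
FROM ML.PREDICT(MODEL `mydataset.listing_clusters`, (
  SELECT
    listing_id,
    CAST(has_microwave AS INT64) AS has_microwave,
    CAST(has_kitchen AS INT64) AS has_kitchen,
    CAST(has_tv AS INT64) AS has_tv,
    CAST(has_bathroom AS INT64) AS has_bathroom
  FROM `mydataset.listings`));
```

With 0/1 features, each centroid value can be read as the share of listings in that cluster that have the amenity.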
28. Premise
Predicting LTV for a new user
helps a company determine
which users are of most “value”,
understand those users’
common characteristics,
and focus more on them.
K-means clustering: Customer Lifetime Value
29. Premise
We can identify oddities
(potential data quality issues)
by grouping things together
and separating outliers.
K-means clustering: Problem definition
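One way to surface such oddities is to rank rows by their distance to the nearest centroid. The sketch below assumes a hypothetical k-means model `mydataset.record_clusters` that was trained on the feature columns of a hypothetical table `mydataset.records`, where `record_id` is an identifier that was not used as a feature.

```sql
-- Sketch only: model, table, and record_id are hypothetical.
-- For k-means models, ML.PREDICT returns nearest_centroids_distance,
-- an array ordered from the closest centroid outward.
SELECT
  record_id,
  centroid_id,
  nearest_centroids_distance[OFFSET(0)].distance AS distance_to_centroid
FROM ML.PREDICT(MODEL `mydataset.record_clusters`,
                (SELECT * FROM `mydataset.records`))
ORDER BY distance_to_centroid DESC
LIMIT 20;
```

Rows at the top of this list sit far from every cluster and are candidates for manual inspection; whether they are genuine data quality issues still has to be judged case by case.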
30. Use cases:
● Product recommendation
● Marketing campaign target optimization tool
Options and defaults
● Input: User, Item, Rating
● Can use L2 regularization
● Specify training-test split (default random 80-20)
Matrix Factorization
CREATE MODEL yourmodel
OPTIONS (model_type = 'matrix_factorization')
AS SELECT..
ML.PREDICT for user-item ratings
ML.RECOMMEND for full user-item matrix
ML.EVALUATE
ML.WEIGHTS
ML.TRAINING_INFO
ML.FEATURE_INFO
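The options listed above might be spelled out as in the following sketch. The table `mydataset.product_ratings`, its column names, and the values chosen for `num_factors` and `l2_reg` are hypothetical and only illustrate where each option goes.

```sql
-- Sketch only: table, columns, and option values are hypothetical.
CREATE OR REPLACE MODEL `mydataset.ratings_mf`
OPTIONS (
  model_type = 'matrix_factorization',
  feedback_type = 'explicit',   -- ratings are given directly by users
  user_col = 'customer_id',     -- input: user
  item_col = 'product_id',      -- input: item
  rating_col = 'stars',         -- input: rating
  num_factors = 16,             -- size of the latent factor vectors
  l2_reg = 0.1                  -- L2 regularization strength
) AS
SELECT customer_id, product_id, stars
FROM `mydataset.product_ratings`;

-- Predicted rating for every user-item combination.
SELECT *
FROM ML.RECOMMEND(MODEL `mydataset.ratings_mf`);
```

If the input columns are already named `user`, `item`, and `rating`, the three `*_col` options can be left out.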
31. Available data:
● User
● Item
● Rating
Problem
● assigning values for previously unknown values
(zeros in our case)
Matrix Factorization: Problem definition
32. BigQuery ML - Matrix Factorization
Step 1
Create a model from a dataset.
CREATE MODEL wr_temp.purchases_mf_model
OPTIONS (model_type = 'matrix_factorization')
AS
SELECT user, item, rating FROM `wr_temp.purchases`;
Step 2
To view the rating associated with a
given user-item pair, use ML.RECOMMEND
with the model name.
The output will return a predicted rating
for each user-item pair.
SELECT * FROM
ML.RECOMMEND(MODEL wr_temp.purchases_mf_model);
33. BigQuery ML - Matrix Factorization
SELECT * FROM
ML.RECOMMEND(MODEL wr_temp.purchases_mf_model, (SELECT 'John' AS user)) r
WHERE item NOT IN (SELECT item FROM `wr_temp.purchases` p WHERE p.user = r.user AND rating <> 0)
ORDER BY predicted_rating DESC -- highest predicted rating first
Step 3
Pass a second optional table argument
to ML.RECOMMEND to get all of the
recommendations for
a set of users or items.
34. Use cases:
● Easily add TensorFlow predictions to BigQuery
● Build unstructured data models in TensorFlow,
predict in BigQuery
Key restrictions
● Model size limit of 250MB
Import TensorFlow models for prediction
CREATE MODEL yourmodel
OPTIONS (model_type = 'tensorflow',
model_path = 'gs://')
ML.PREDICT()
DEMO
Search 'QueryIt Smart' on GitHub to learn more.
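As a rough sketch of the import flow, assuming a TensorFlow SavedModel has been exported to a hypothetical Cloud Storage location and that its serving signature takes a single string input named `input` (both are assumptions made for illustration):

```sql
-- Sketch only: bucket path, model name, and input column are hypothetical.
-- Import a TensorFlow SavedModel stored in Cloud Storage.
CREATE OR REPLACE MODEL `mydataset.imported_tf_model`
OPTIONS (
  model_type = 'tensorflow',
  model_path = 'gs://your-bucket/your-saved-model/*'
);

-- Predict in BigQuery. The column names in the inner query must match
-- the input names of the SavedModel's serving signature.
SELECT *
FROM ML.PREDICT(
  MODEL `mydataset.imported_tf_model`,
  (SELECT 'this product was great' AS input));
```

The output columns are named after the outputs of the imported model, so they differ from model to model.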
35. New on BigQuery UI - Training tab charts
36. New on BigQuery UI - Evaluation charts
37. New on BigQuery UI - Confusion Matrix
Percentage of actual
labels that were
classified:
- Correctly (Blue)
- Incorrectly (Grey)
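The same information is also available from SQL. A minimal sketch, assuming a hypothetical, already trained classification model `mydataset.visitor_model`:

```sql
-- Sketch only: the model name is hypothetical.
-- Without an input table, the function uses the evaluation data
-- that was set aside when the model was trained.
SELECT *
FROM ML.CONFUSION_MATRIX(MODEL `mydataset.visitor_model`);
```

Each output row corresponds to an actual label and each column to a predicted label, so the diagonal holds the correctly classified rows.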
39. Automation
● Run the process daily
● Determine hyperparameters
● Surface the results and route them somewhere for inspection and improvement
Testing
● A/B test the impact of data quality on conversion and customer NPS (Net Promoter Score)
Improvements
● Determine and explore outliers
● Repeat, automate
Considerations
40. ● Democratizes the use of ML by empowering data analysts to build and run models using existing
business intelligence tools and spreadsheets
● Generalist team. Models are trained using SQL. There is no need to program an ML solution using
Python or Java.
● Increases the innovation and speed of model development by removing the need to export data from
the data warehouse.
● A model serves a purpose and is easy to change or recycle.
Benefits of BigQuery ML
41. The possibilities are endless
Marketing
● Predict customer value
● Predict funnel conversion
● Personalize ads, email, webpage content
Retail
● Optimize inventory
● Forecast revenue
● Enable product recommendations
● Optimize staff promotions
Industrial and IoT
● Forecast demand for parking, traffic utilities, personnel
● Prevent equipment downtime
● Predict maintenance needs
Media/gaming
● Personalize content
● Predict game difficulty
● Predict player lifetime value
42. Thank you.
Rate session at registration site:
Registration.cloud.developerdays.pl
Slides available on:
slideshare.net/martonkodok
Reea.net - Integrated web solutions driven by creativity
to deliver projects.