SlideShare a Scribd company logo
1 of 45
Download to read offline
Smarter Together
Bringing Relational Algebra, Powered by Apache
Calcite, into Looker’s Query Engine
Julian Hyde
Principal Software Engineer
#JOINdata
@lookerdata
This presentation contains forward-looking statements including, among other things, statements regarding the expected performance and benefits, potential availability and planned
features of Looker’s offerings. These forward-looking statements are subject to risks, uncertainties, and assumptions. If the risks materialize or assumptions prove incorrect, actual
results could differ materially from the results implied by these forward-looking statements. Risks include, but are not limited to, delay in product availability, failure to meet or satisfy
customer requirements, and slower than anticipated growth in our market. Looker assumes no obligation to, and does not currently intend to, update any such forward-looking
statements after the date of this presentation.
Any unreleased services, features or products referenced in this presentation are not currently available and may not be delivered on time or at all. Customers who purchase Looker’s
products and services should make their purchase decisions based upon features that are currently available.
Smarter Together
Bringing Relational Algebra,
Powered by Apache Calcite, into
Looker’s Query Engine
[ Julian Hyde | JOIN 2019 | 5-7 November 2019 | San Francisco ]
Ice cream costs $1.50
Topping costs $0.50
How much do 3 ice creams with toppings cost?
An algebra problem
Ice cream costs $1.50
Topping costs $0.50
How much do 3 ice creams with
toppings cost?
Method 1 (3 * $1.50) + (3 * $0.50) = $4.50 + $1.50 = $6.00
Method 2 3 * ($1.50 + $0.50) = 3 * $2.00 = $6.00
An algebra problem
How did you solve it?
Ice cream costs $1.50
Topping costs $0.50
How much do 3 ice creams with
toppings cost?
Method 1 (3 * $1.50) + (3 * $0.50) = $4.50 + $1.50 = $6.00
Method 2 3 * ($1.50 + $0.50) = 3 * $2.00 = $6.00
Algebraic identity: (a * b) + (a * c) = a * (b + c)
An algebra problem
It’s best to apply toppings first
Algebra is a mathematical
language for combining values
using operators.
Algebraic identities tell us how
the operators and values interact.
We can prove things by using the
identities as rewrite rules.
Algebra
Values numbers
Operators + - * /
Algebraic identities
●  a * (b + c) = (a * b) + (a * c)
●  a + 0 = a
●  a * 0 = 0
●  a + b = b + a
Relational algebra is an algebra
where the values are relations (aka
tables, multisets of records).
Relational algebra
Values relations
Operators join (⨝) union(∪)
project (Π) filter (σ)
Algebraic identities
●  A ⨝ (B ∪ C) = A ⨝ B ∪ A ⨝ C
●  A ∪ ∅ = A
●  A ⨝ ∅ = A
●  A ∪ B = B ∪ A
●  σP (σQ (A)) = σP ∧ Q (A)
A ⋈ (B ∪ C) = A ⋈ B ∪ A ⋈ C σP (σQ (A)) = σP ∧ Q (A)
Algebraic identities in SQL
SELECT *
FROM (SELECT * FROM products
WHERE color = ‘red’)
WHERE type = ‘bicycle’
=
SELECT * FROM products
WHERE color = ‘red’
AND type = ‘bicycle’
SELECT * FROM products
JOIN (SELECT FROM orders_2018
UNION ALL
SELECT * FROM orders_2019)
=
SELECT * FROM products JOIN orders_2018
UNION ALL
SELECT * FROM products JOIN orders_2019
Agenda
Relational algebra and query optimization
How Looker executes a query (and how Calcite can help)
Aggregate awareness & materialized views
Generate LookML model from SQL
Agenda
Relational algebra and query optimization
How Looker executes a query (and how Calcite can help)
Aggregate awareness & materialized views
Generate LookML model from SQL
Apache top-level project
Query planning framework used in many
projects and products
Also works standalone: embedded
federated query engine with SQL / JDBC
front end
Apache community development model
https://calcite.apache.org
https://github.com/apache/calcite
Apache Calcite
Core – Operator expressions
(relational algebra) and planner
(based on Volcano/Cascades)
External – Data storage,
algorithms and catalog
Optional – SQL parser, JDBC
& ODBC drivers
Extensible – Planner rewrite
rules, statistics, cost model,
algebra, UDFs
Apache Calcite - architecture
Based on set theory, plus
operators: Project, Filter,
Aggregate, Union, Join, Sort
Requires: declarative language
(SQL), query planner
Original goal: data independence
Enables: query optimization, new
algorithms and data structures
Relational algebra
Translation to SQL
SELECT d.name, COUNT(*) AS c
FROM emps AS e
JOIN depts AS d USING (deptno)
WHERE e.age > 50
GROUP BY d.deptno
HAVING COUNT(*) > 5
ORDER BY c DESC
Scan [emps] Scan [depts]
Join [e.deptno = d.deptno]
Filter [e.age > 50]
Aggregate [deptno, COUNT(*) AS c]
Sort [c DESC]
Project [name, c]
Filter [c > 5]
Optimize by applying rewrite rules
that preserve semantics
Hopefully the result is less
expensive; but it’s OK if it’s not
(planner keeps “before” and
“after”)
Planner uses dynamic
programming, seeking the lowest
total cost
Relational algebra
Algebraic rewrite
SELECT d.name, COUNT(*) AS c
FROM emps AS e
JOIN depts AS d USING (deptno)
WHERE e.age > 50
GROUP BY d.deptno
HAVING COUNT(*) > 5
ORDER BY c DESC
Scan [emps] Scan [depts]
Join [e.deptno = d.deptno]
Filter [e.age > 50]
Aggregate [deptno, COUNT(*) AS c]
Filter [c > 5]
Project [name, c]
Sort [c DESC]
Calcite framework
Cost, statistics
RelOptCost
RelOptCostFactory
RelMetadataProvider
•  RelMdColumnUniqueness
•  RelMdDistinctRowCount
•  RelMdSelectivity
SQL parser
SqlNode
SqlParser
SqlValidator
Transformation rules
RelOptRule
•  FilterMergeRule
•  AggregateUnionTransposeRule
•  100+ more
Global transformations
•  Unification (materialized view)
•  Column trimming
•  De-correlation
Relational algebra
RelNode (operator)
•  TableScan
•  Filter
•  Project
•  Union
•  Aggregate
•  …
RelDataType (type)
RexNode (expression)
RelTrait (physical property)
•  RelConvention (calling-convention)
•  RelCollation (sortedness)
•  RelDistribution (partitioning)
RelBuilder
JDBC driver
Remote driver
Local driver
Metadata
Schema
Table
Function
•  TableFunction
•  TableMacro
Lattice
Agenda
Relational algebra and query optimization
How Looker executes a query (and how Calcite can help)
Aggregate awareness & materialized views
Generate LookML model from SQL
Looker SQL Generation Overview
Looker
Query
Request
Generated
SQL
Rel Builder
Looker
Extensions
(Kotlin)
Compiled
LookML a.k.a.
HTQuery
Relational Algebra
Expressions
(Calcite/Java)
SQL Node
HTDialect
(Ruby)
SQL AST
(Calcite/Java)
HTSQLWriter
HTDialect
PDT Service
Aggregate
Awareness
Looker SQL Generation (old)
Looker
Query
Request
Generated
SQL
Rel Builder
Looker
Extensions
(Kotlin)
Compiled
LookML a.k.a.
HTQuery
Relational Algebra
Expressions
(Calcite/Java)
SQL Node
HTDialect
(Ruby)
SQL AST
(Calcite/Java)
HTSQLWriter
HTDialect
PDT Service
Aggregate
Awareness
Looker SQL Generation (new)
Looker
Query
Request
Generated
SQL
Rel Builder
Looker
Extensions
(Kotlin)
Compiled
LookML a.k.a.
HTQuery
Relational Algebra
Expressions
(Calcite/Java)
SQL Node
HTDialect
(Ruby)
SQL AST
(Calcite/Java)
HTSQLWriter
HTDialect
PDT Service
Aggregate
Awareness
Agenda
Relational algebra and query optimization
How Looker executes a query (and how Calcite can help)
Aggregate awareness & materialized views
Generate LookML model from SQL
Aggregates to the rescue
date product_id customer_id revenue
2018-01-05 100 12345 $17
2018-01-07 105 23456 $120
customer_id state
12345 CA
23456 TX
orders (100 million rows)
customers (12 million rows)
Business question:
Which state had the most
revenue in 2018?
Raw data – 120 million rows
Summary data – 2,000 rows
Aggregates to the rescue
quarter state sum_revenue sales_count
2018-Q1 CA $15,567 46
2018-Q1 TX $23,006 88
orders_by_state_and_quarter (2,000 rows)
Business question:
Which state had the most
revenue in 2018?
date product_id customer_id revenue
2018-01-05 100 12345 $17
2018-01-07 105 23456 $120
customer_id state
12345 CA
23456 TX
orders (100 million rows)
customers (12 million rows)
Aggregates in LookML
LookML - orders
view: orders {
dimension_group: created_at {
type: time
sql: ${TABLE}.created_at ;;
}
dimension: product_id {}
dimension: customer_id {}
measure: revenue {}
}
view: customers {
dimension: customer_id {}
dimension: state {}
}
explore: orders {
join: customers {}
}
NOTE: This feature is not production, and syntax may change
LookML - orders_by_state_quarter
view: orders_by_state_quarter {
derived_table: {
explore_source: orders {
column: created_at_quarter {
field: orders.created_at_quarter
}
column: state {
field: customers.state
}
column: sum_revenue {
sql: sum(${orders.revenue});;
}
}
explore: orders_by_state_quarter {}
A view is a virtual table that expands to a relational expression
●  No data stored on disk (except its definition)
●  Purpose: abstraction
A materialized view is a table whose contents are defined by a relational expression
●  Purpose: improve query performance
A aggregate table or summary table is a materialized view that uses GROUP BY
(and perhaps JOIN) and has many fewer rows than its base table
A system with aggregate awareness automatically builds aggregate tables and uses
them to make queries faster
Views, materialized views,
aggregate awareness
View
SELECT deptno, MIN(salary)
FROM managers
WHERE age > 50
GROUP BY deptno
CREATE VIEW managers AS
SELECT *
FROM emps AS e
WHERE EXISTS (
SELECT *
FROM emps AS underling
WHERE underling.manager = e.id)
Declaration of the
managers view
Query that uses the
managers view
View
With relational algebra
SELECT deptno, MIN(salary)
FROM managers
WHERE age > 50
GROUP BY deptno
Scan [emps] Scan [emps]
Join [e.id = underling.manager]
Project [id, deptno, salary, age]
Aggregate [manager]
CREATE VIEW managers AS
SELECT *
FROM emps AS e
WHERE EXISTS (
SELECT *
FROM emps AS underling
WHERE underling.manager = e.id)
Filter [age > 50]
Aggregate [deptno, MIN(salary)]
Scan [managers]
View
Showing the view and the relational expression that will replace it
SELECT deptno, MIN(salary)
FROM managers
WHERE age > 50
GROUP BY deptno
Scan [emps] Scan [emps]
Join [e.id = underling.manager]
Project [id, deptno, salary, age]
Aggregate [manager]
CREATE VIEW managers AS
SELECT *
FROM emps AS e
WHERE EXISTS (
SELECT *
FROM emps AS underling
WHERE underling.manager = e.id)
Filter [age > 50]
Aggregate [deptno, MIN(salary)]
Scan [managers]
View
After view has been expanded
SELECT deptno, MIN(salary)
FROM managers
WHERE age > 50
GROUP BY deptno
Scan [emps] Scan [emps]
Join [e.id = underling.manager]
Project [id, deptno, salary, age]
Aggregate [manager]
CREATE VIEW managers AS
SELECT *
FROM emps AS e
WHERE EXISTS (
SELECT *
FROM emps AS underling
WHERE underling.manager = e.id)
Filter [age > 50]
Aggregate [deptno, MIN(salary)]
View
After pushing down “age > 50” filter
SELECT deptno, MIN(salary)
FROM managers
WHERE age > 50
GROUP BY deptno
Scan [emps] Scan [emps]
Join [e.id = underling.manager]
Project [id, deptno, salary, age]CREATE VIEW managers AS
SELECT *
FROM emps AS e
WHERE EXISTS (
SELECT *
FROM emps AS underling
WHERE underling.manager = e.id)
Aggregate [deptno, MIN(salary)]
Filter [age > 50] Aggregate [manager]
Materialized view
CREATE MATERIALIZED VIEW
emps_by_deptno_gender AS
SELECT deptno, gender,
COUNT(*) AS c, SUM(sal) AS s
FROM emps
GROUP BY deptno, gender
SELECT COUNT(*) AS c
FROM emps
WHERE deptno = 10
AND gender = ‘M’
Query that uses the emps table
but could potentially use the
emps_by_deptno_gender
materialized view
Declaration of the
emps_by_deptno_gender
materialized view
Materialized view
With relational algebra
CREATE MATERIALIZED VIEW
emps_by_deptno_gender AS
SELECT deptno, gender,
COUNT(*) AS c, SUM(sal) AS s
FROM emps
GROUP BY deptno, gender
SELECT COUNT(*) AS c
FROM emps
WHERE deptno = 10
AND gender = ‘M’
Scan [emps]
Filter [deptno = 10 AND gender = ‘M’]
Aggregate [COUNT(*)]
Scan [emps_by_
deptno_gender] = Scan [emps]
Aggregate [deptno, gender,
COUNT(*), SUM(salary)]
Materialized view
Rewrite query to match
CREATE MATERIALIZED VIEW
emps_by_deptno_gender AS
SELECT deptno, gender,
COUNT(*) AS c, SUM(sal) AS s
FROM emps
GROUP BY deptno, gender
SELECT COUNT(*) AS c
FROM emps
WHERE deptno = 10
AND gender = ‘M’
Scan [emps_by_
deptno_gender] = Scan [emps]
Aggregate [deptno, gender,
COUNT(*), SUM(salary)]
Scan [emps]
Filter [deptno = 10 AND gender = ‘M’]
Aggregate [deptno, gender,
COUNT(*), SUM(salary)]
Project [c]
Materialized view
Fragment of query is now identical to materialized view
CREATE MATERIALIZED VIEW
emps_by_deptno_gender AS
SELECT deptno, gender,
COUNT(*) AS c, SUM(sal) AS s
FROM emps
GROUP BY deptno, gender
SELECT COUNT(*) AS c
FROM emps
WHERE deptno = 10
AND gender = ‘M’
Scan [emps_by_
deptno_gender] = Scan [emps]
Aggregate [deptno, gender,
COUNT(*), SUM(salary)]
Scan [emps]
Filter [deptno = 10 AND gender = ‘M’]
Aggregate [deptno, gender,
COUNT(*), SUM(salary)]
Project [c]
Materialized view
Substitute table scan
CREATE MATERIALIZED VIEW
emps_by_deptno_gender AS
SELECT deptno, gender,
COUNT(*) AS c, SUM(sal) AS s
FROM emps
GROUP BY deptno, gender
SELECT COUNT(*) AS c
FROM emps
WHERE deptno = 10
AND gender = ‘M’
Filter [deptno = 10 AND gender = ‘M’]
Scan [emps_by_
deptno_gender] = Scan [emps]
Aggregate [deptno, gender,
COUNT(*), SUM(salary)]
Project [c]
Scan [emps_by_
deptno_gender]
Aggregates in LookML
Aggregate table is used even by queries on the orders explore
LookML - orders
view: orders {
dimension_group: created_at {
type: time
sql: ${TABLE}.created_at ;;
}
dimension: product_id {}
dimension: customer_id {}
measure: revenue {}
}
view: customers {
dimension: customer_id {}
dimension: state {}
}
explore: orders {
join: customers {}
}
LookML - orders_by_state_quarter
view: orders_by_state_quarter {
derived_table: {
explore_source: orders {
column: created_at_quarter {
field: orders.created_at_quarter
}
column: state {
field: customers.state
}
column: sum_revenue {
sql: sum(${orders.revenue});;
}
}
explore: orders_by_state_quarter {}
Agenda
Relational algebra and query optimization
How Looker executes a query (and how Calcite can help)
Aggregate awareness & materialized views
Generate LookML model from SQL
Companies that are just starting to use Looker often have a
database and log of queries run against that database.
Parsing the SQL queries in the log, we can recover the
dimensions, measures, relationships, and put them into a
LookML model.
We call this process “model induction”.
A model out of thin air
Convert a query to algebra
SELECT YEAR(o.created_at), COUNT(*) AS c,
SUM(i.revenue) AS sum_revenue,
SUM(i.revenue - i.cost) AS sum_profit
FROM orders AS o,
customers AS c,
order_items AS i
WHERE o.customer_id = c.customer_id
AND o.order_id = i.order_id
AND i.product_id IN (
SELECT product_id
FROM products AS p
WHERE color = ‘Red’)
GROUP BY YEAR(o.created_at)
Scan [orders]
Scan [products]
Join [true]
Project [*, YEAR(o.created_at)
AS y, i.revenue - i.cost AS p]
Aggregate [deptno, COUNT(*),
SUM(i.revenue), SUM(p)]
Filter [color = ‘Red’]
Scan [customers]
Scan [order_items]
Join [true]
Filter [o.customer_id = c.customer_id
AND o.order_id = i.order_id]
SemiJoin [i.product_id = p.product_id]
Convert a query to algebra
SELECT YEAR(o.created_at), COUNT(*) AS c,
SUM(i.revenue) AS sum_revenue,
SUM(i.revenue - i.cost) AS sum_profit
FROM orders AS o,
customers AS c,
order_items AS i
WHERE o.customer_id = c.customer_id
AND o.order_id = i.order_id
AND i.product_id IN (
SELECT product_id
FROM products AS p
WHERE color = ‘Red’)
GROUP BY YEAR(o.created_at)
orders
products
Join [true]
Project [*, YEAR(o.created_at)
AS y, i.revenue - i.cost AS p]
Aggregate [deptno, COUNT(*),
SUM(i.revenue), SUM(p)]
Filter [color = ‘Red’]
customers
order_items
Join [true]
Filter [o.customer_id = c.customer_id
AND o.order_id = i.order_id]
SemiJoin [i.product_id = p.product_id]
Convert query to join graph, measures
SELECT YEAR(o.created_at), COUNT(*) AS c,
SUM(i.revenue) AS sum_revenue,
SUM(i.revenue - i.cost) AS sum_profit
FROM orders AS o,
customers AS c,
order_items AS i
WHERE o.customer_id = c.customer_id
AND o.order_id = i.order_id
AND i.product_id IN (
SELECT product_id
FROM products AS p
WHERE color = ‘Red’)
GROUP BY YEAR(o.created_at)
orders
products
customers
order_items
measures
COUNT(*)
SUM(i.revenue)
SUM(i.revenue - i.cost)
Query file
SELECT d1, d2, m1, m2
FROM customers
JOIN orders
SELECT d1, d3, m2, m3
FROM products
JOIN orders
SELECT d2, m4
FROM products
JOIN inventory
SELECT m5
FROM products
JOIN product_types
customers orders
products
product
types
inventoryproducts
products orders
1. Parse & validate SQL
query log; convert to
relational algebra
2. Construct join graphs;
deduce PK-FK relationships by
running cardinality queries
orders
customers
products
product
types
inventory
products
3. Cluster join graphs into
stars, each with a central
“fact table”
LookML
view: customers {}
view: orders {}
view: products {}
view: product_types {}
explore: orders {
join: customers {}
join: products {}
join: product_types {}
}
view: products {}
view: inventory {}
explore: inventory {
join: products {}
}
4. Generate LookML;
one explore per star
Generate LookML model from SQL queries
Any questions?
@julianhyde
@ApacheCalcite
https://calcite.apache.org
https://github.com/apache/calcite
Questions?
Rate this session in
the JOIN mobile app #JOINdata
@lookerdata
@julianhyde
@ApacheCalcite
https://calcite.apache.org
https://github.com/apache/calcite
#JOINdata
@lookerdata

More Related Content

What's hot

Open Source SQL - beyond parsers: ZetaSQL and Apache Calcite
Open Source SQL - beyond parsers: ZetaSQL and Apache CalciteOpen Source SQL - beyond parsers: ZetaSQL and Apache Calcite
Open Source SQL - beyond parsers: ZetaSQL and Apache CalciteJulian Hyde
 
Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth...
Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth...Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth...
Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth...Christian Tzolov
 
The evolution of Apache Calcite and its Community
The evolution of Apache Calcite and its CommunityThe evolution of Apache Calcite and its Community
The evolution of Apache Calcite and its CommunityJulian Hyde
 
Introduction to DataFusion An Embeddable Query Engine Written in Rust
Introduction to DataFusion  An Embeddable Query Engine Written in RustIntroduction to DataFusion  An Embeddable Query Engine Written in Rust
Introduction to DataFusion An Embeddable Query Engine Written in RustAndrew Lamb
 
A Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQLA Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQLDatabricks
 
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...Julian Hyde
 
Streaming SQL with Apache Calcite
Streaming SQL with Apache CalciteStreaming SQL with Apache Calcite
Streaming SQL with Apache CalciteJulian Hyde
 
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang WangApache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang WangDatabricks
 
SQL for NoSQL and how Apache Calcite can help
SQL for NoSQL and how  Apache Calcite can helpSQL for NoSQL and how  Apache Calcite can help
SQL for NoSQL and how Apache Calcite can helpChristian Tzolov
 
Fast federated SQL with Apache Calcite
Fast federated SQL with Apache CalciteFast federated SQL with Apache Calcite
Fast federated SQL with Apache CalciteChris Baynes
 
Optimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsOptimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsDatabricks
 
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...HostedbyConfluent
 
Care and Feeding of Catalyst Optimizer
Care and Feeding of Catalyst OptimizerCare and Feeding of Catalyst Optimizer
Care and Feeding of Catalyst OptimizerDatabricks
 
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiHow to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiFlink Forward
 
Data Storage Tips for Optimal Spark Performance-(Vida Ha, Databricks)
Data Storage Tips for Optimal Spark Performance-(Vida Ha, Databricks)Data Storage Tips for Optimal Spark Performance-(Vida Ha, Databricks)
Data Storage Tips for Optimal Spark Performance-(Vida Ha, Databricks)Spark Summit
 
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...Databricks
 
Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!Julian Hyde
 
A Deep Dive into Spark SQL's Catalyst Optimizer with Yin Huai
A Deep Dive into Spark SQL's Catalyst Optimizer with Yin HuaiA Deep Dive into Spark SQL's Catalyst Optimizer with Yin Huai
A Deep Dive into Spark SQL's Catalyst Optimizer with Yin HuaiDatabricks
 
Cost-based query optimization in Apache Hive
Cost-based query optimization in Apache HiveCost-based query optimization in Apache Hive
Cost-based query optimization in Apache HiveJulian Hyde
 
From Query Plan to Query Performance: Supercharging your Apache Spark Queries...
From Query Plan to Query Performance: Supercharging your Apache Spark Queries...From Query Plan to Query Performance: Supercharging your Apache Spark Queries...
From Query Plan to Query Performance: Supercharging your Apache Spark Queries...Databricks
 

What's hot (20)

Open Source SQL - beyond parsers: ZetaSQL and Apache Calcite
Open Source SQL - beyond parsers: ZetaSQL and Apache CalciteOpen Source SQL - beyond parsers: ZetaSQL and Apache Calcite
Open Source SQL - beyond parsers: ZetaSQL and Apache Calcite
 
Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth...
Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth...Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth...
Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth...
 
The evolution of Apache Calcite and its Community
The evolution of Apache Calcite and its CommunityThe evolution of Apache Calcite and its Community
The evolution of Apache Calcite and its Community
 
Introduction to DataFusion An Embeddable Query Engine Written in Rust
Introduction to DataFusion  An Embeddable Query Engine Written in RustIntroduction to DataFusion  An Embeddable Query Engine Written in Rust
Introduction to DataFusion An Embeddable Query Engine Written in Rust
 
A Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQLA Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQL
 
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
 
Streaming SQL with Apache Calcite
Streaming SQL with Apache CalciteStreaming SQL with Apache Calcite
Streaming SQL with Apache Calcite
 
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang WangApache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
 
SQL for NoSQL and how Apache Calcite can help
SQL for NoSQL and how  Apache Calcite can helpSQL for NoSQL and how  Apache Calcite can help
SQL for NoSQL and how Apache Calcite can help
 
Fast federated SQL with Apache Calcite
Fast federated SQL with Apache CalciteFast federated SQL with Apache Calcite
Fast federated SQL with Apache Calcite
 
Optimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsOptimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL Joins
 
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
 
Care and Feeding of Catalyst Optimizer
Care and Feeding of Catalyst OptimizerCare and Feeding of Catalyst Optimizer
Care and Feeding of Catalyst Optimizer
 
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiHow to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
 
Data Storage Tips for Optimal Spark Performance-(Vida Ha, Databricks)
Data Storage Tips for Optimal Spark Performance-(Vida Ha, Databricks)Data Storage Tips for Optimal Spark Performance-(Vida Ha, Databricks)
Data Storage Tips for Optimal Spark Performance-(Vida Ha, Databricks)
 
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
 
Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!
 
A Deep Dive into Spark SQL's Catalyst Optimizer with Yin Huai
A Deep Dive into Spark SQL's Catalyst Optimizer with Yin HuaiA Deep Dive into Spark SQL's Catalyst Optimizer with Yin Huai
A Deep Dive into Spark SQL's Catalyst Optimizer with Yin Huai
 
Cost-based query optimization in Apache Hive
Cost-based query optimization in Apache HiveCost-based query optimization in Apache Hive
Cost-based query optimization in Apache Hive
 
From Query Plan to Query Performance: Supercharging your Apache Spark Queries...
From Query Plan to Query Performance: Supercharging your Apache Spark Queries...From Query Plan to Query Performance: Supercharging your Apache Spark Queries...
From Query Plan to Query Performance: Supercharging your Apache Spark Queries...
 

Similar to Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, into Looker’s Query Engine

Cost-based Query Optimization in Hive
Cost-based Query Optimization in HiveCost-based Query Optimization in Hive
Cost-based Query Optimization in HiveDataWorks Summit
 
Chris Seebacher Portfolio
Chris Seebacher PortfolioChris Seebacher Portfolio
Chris Seebacher Portfolioguest3ea163
 
Skills Portfolio
Skills PortfolioSkills Portfolio
Skills Portfoliorolee23
 
Incremental View Maintenance with Coral, DBT, and Iceberg
Incremental View Maintenance with Coral, DBT, and IcebergIncremental View Maintenance with Coral, DBT, and Iceberg
Incremental View Maintenance with Coral, DBT, and IcebergWalaa Eldin Moustafa
 
William Schaffrans Bus Intelligence Portfolio
William Schaffrans Bus Intelligence PortfolioWilliam Schaffrans Bus Intelligence Portfolio
William Schaffrans Bus Intelligence Portfoliowschaffr
 
Explain the explain_plan
Explain the explain_planExplain the explain_plan
Explain the explain_planMaria Colgan
 
05_DP_300T00A_Optimize.pptx
05_DP_300T00A_Optimize.pptx05_DP_300T00A_Optimize.pptx
05_DP_300T00A_Optimize.pptxKareemBullard1
 
SQL Server Query Optimization, Execution and Debugging Query Performance
SQL Server Query Optimization, Execution and Debugging Query PerformanceSQL Server Query Optimization, Execution and Debugging Query Performance
SQL Server Query Optimization, Execution and Debugging Query PerformanceVinod Kumar
 
Nitin\'s Business Intelligence Portfolio
Nitin\'s Business Intelligence PortfolioNitin\'s Business Intelligence Portfolio
Nitin\'s Business Intelligence Portfolionpatel2362
 
Tactical data engineering
Tactical data engineeringTactical data engineering
Tactical data engineeringJulian Hyde
 
Business Intelligence Portfolio
Business Intelligence PortfolioBusiness Intelligence Portfolio
Business Intelligence PortfolioChris Seebacher
 
Presentación Oracle Database Migración consideraciones 10g/11g/12c
Presentación Oracle Database Migración consideraciones 10g/11g/12cPresentación Oracle Database Migración consideraciones 10g/11g/12c
Presentación Oracle Database Migración consideraciones 10g/11g/12cRonald Francisco Vargas Quesada
 
Presentation interpreting execution plans for sql statements
Presentation    interpreting execution plans for sql statementsPresentation    interpreting execution plans for sql statements
Presentation interpreting execution plans for sql statementsxKinAnx
 
Oracle data integrator project
Oracle data integrator projectOracle data integrator project
Oracle data integrator projectAmit Sharma
 
Self-serve analytics journey at Celtra: Snowflake, Spark, and Databricks
Self-serve analytics journey at Celtra: Snowflake, Spark, and DatabricksSelf-serve analytics journey at Celtra: Snowflake, Spark, and Databricks
Self-serve analytics journey at Celtra: Snowflake, Spark, and DatabricksGrega Kespret
 
Rodney Matejek Portfolio
Rodney Matejek PortfolioRodney Matejek Portfolio
Rodney Matejek Portfoliormatejek
 
Digital analytics with R - Sydney Users of R Forum - May 2015
Digital analytics with R - Sydney Users of R Forum - May 2015Digital analytics with R - Sydney Users of R Forum - May 2015
Digital analytics with R - Sydney Users of R Forum - May 2015Johann de Boer
 
MS SQL Server Analysis Services 2008 and Enterprise Data Warehousing
MS SQL Server Analysis Services 2008 and Enterprise Data WarehousingMS SQL Server Analysis Services 2008 and Enterprise Data Warehousing
MS SQL Server Analysis Services 2008 and Enterprise Data WarehousingSlava Kokaev
 
KWizCom Aggregation solutions for sharepoint
KWizCom Aggregation solutions for sharepointKWizCom Aggregation solutions for sharepoint
KWizCom Aggregation solutions for sharepointNimrod Geva
 

Similar to Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, into Looker’s Query Engine (20)

Cost-based Query Optimization in Hive
Cost-based Query Optimization in HiveCost-based Query Optimization in Hive
Cost-based Query Optimization in Hive
 
Chris Seebacher Portfolio
Chris Seebacher PortfolioChris Seebacher Portfolio
Chris Seebacher Portfolio
 
Skills Portfolio
Skills PortfolioSkills Portfolio
Skills Portfolio
 
Incremental View Maintenance with Coral, DBT, and Iceberg
Incremental View Maintenance with Coral, DBT, and IcebergIncremental View Maintenance with Coral, DBT, and Iceberg
Incremental View Maintenance with Coral, DBT, and Iceberg
 
William Schaffrans Bus Intelligence Portfolio
William Schaffrans Bus Intelligence PortfolioWilliam Schaffrans Bus Intelligence Portfolio
William Schaffrans Bus Intelligence Portfolio
 
Explain the explain_plan
Explain the explain_planExplain the explain_plan
Explain the explain_plan
 
05_DP_300T00A_Optimize.pptx
05_DP_300T00A_Optimize.pptx05_DP_300T00A_Optimize.pptx
05_DP_300T00A_Optimize.pptx
 
SQL Server Query Optimization, Execution and Debugging Query Performance
SQL Server Query Optimization, Execution and Debugging Query PerformanceSQL Server Query Optimization, Execution and Debugging Query Performance
SQL Server Query Optimization, Execution and Debugging Query Performance
 
Nitin\'s Business Intelligence Portfolio
Nitin\'s Business Intelligence PortfolioNitin\'s Business Intelligence Portfolio
Nitin\'s Business Intelligence Portfolio
 
Tactical data engineering
Tactical data engineeringTactical data engineering
Tactical data engineering
 
Business Intelligence Portfolio
Business Intelligence PortfolioBusiness Intelligence Portfolio
Business Intelligence Portfolio
 
Presentación Oracle Database Migración consideraciones 10g/11g/12c
Presentación Oracle Database Migración consideraciones 10g/11g/12cPresentación Oracle Database Migración consideraciones 10g/11g/12c
Presentación Oracle Database Migración consideraciones 10g/11g/12c
 
SQL Optimizer vs Hive
SQL Optimizer vs Hive SQL Optimizer vs Hive
SQL Optimizer vs Hive
 
Presentation interpreting execution plans for sql statements
Presentation    interpreting execution plans for sql statementsPresentation    interpreting execution plans for sql statements
Presentation interpreting execution plans for sql statements
 
Oracle data integrator project
Oracle data integrator projectOracle data integrator project
Oracle data integrator project
 
Self-serve analytics journey at Celtra: Snowflake, Spark, and Databricks
Self-serve analytics journey at Celtra: Snowflake, Spark, and DatabricksSelf-serve analytics journey at Celtra: Snowflake, Spark, and Databricks
Self-serve analytics journey at Celtra: Snowflake, Spark, and Databricks
 
Rodney Matejek Portfolio
Rodney Matejek PortfolioRodney Matejek Portfolio
Rodney Matejek Portfolio
 
Digital analytics with R - Sydney Users of R Forum - May 2015
Digital analytics with R - Sydney Users of R Forum - May 2015Digital analytics with R - Sydney Users of R Forum - May 2015
Digital analytics with R - Sydney Users of R Forum - May 2015
 
MS SQL Server Analysis Services 2008 and Enterprise Data Warehousing
MS SQL Server Analysis Services 2008 and Enterprise Data WarehousingMS SQL Server Analysis Services 2008 and Enterprise Data Warehousing
MS SQL Server Analysis Services 2008 and Enterprise Data Warehousing
 
KWizCom Aggregation solutions for sharepoint
KWizCom Aggregation solutions for sharepointKWizCom Aggregation solutions for sharepoint
KWizCom Aggregation solutions for sharepoint
 

More from Julian Hyde

Building a semantic/metrics layer using Calcite
Building a semantic/metrics layer using CalciteBuilding a semantic/metrics layer using Calcite
Building a semantic/metrics layer using CalciteJulian Hyde
 
Cubing and Metrics in SQL, oh my!
Cubing and Metrics in SQL, oh my!Cubing and Metrics in SQL, oh my!
Cubing and Metrics in SQL, oh my!Julian Hyde
 
Adding measures to Calcite SQL
Adding measures to Calcite SQLAdding measures to Calcite SQL
Adding measures to Calcite SQLJulian Hyde
 
Morel, a data-parallel programming language
Morel, a data-parallel programming languageMorel, a data-parallel programming language
Morel, a data-parallel programming languageJulian Hyde
 
Is there a perfect data-parallel programming language? (Experiments with More...
Is there a perfect data-parallel programming language? (Experiments with More...Is there a perfect data-parallel programming language? (Experiments with More...
Is there a perfect data-parallel programming language? (Experiments with More...Julian Hyde
 
Morel, a Functional Query Language
Morel, a Functional Query LanguageMorel, a Functional Query Language
Morel, a Functional Query LanguageJulian Hyde
 
What to expect when you're Incubating
What to expect when you're IncubatingWhat to expect when you're Incubating
What to expect when you're IncubatingJulian Hyde
 
Efficient spatial queries on vanilla databases
Efficient spatial queries on vanilla databasesEfficient spatial queries on vanilla databases
Efficient spatial queries on vanilla databasesJulian Hyde
 
Don't optimize my queries, organize my data!
Don't optimize my queries, organize my data!Don't optimize my queries, organize my data!
Don't optimize my queries, organize my data!Julian Hyde
 
Spatial query on vanilla databases
Spatial query on vanilla databasesSpatial query on vanilla databases
Spatial query on vanilla databasesJulian Hyde
 
Lazy beats Smart and Fast
Lazy beats Smart and FastLazy beats Smart and Fast
Lazy beats Smart and FastJulian Hyde
 
Data profiling with Apache Calcite
Data profiling with Apache CalciteData profiling with Apache Calcite
Data profiling with Apache CalciteJulian Hyde
 
A smarter Pig: Building a SQL interface to Apache Pig using Apache Calcite
A smarter Pig: Building a SQL interface to Apache Pig using Apache CalciteA smarter Pig: Building a SQL interface to Apache Pig using Apache Calcite
A smarter Pig: Building a SQL interface to Apache Pig using Apache CalciteJulian Hyde
 
Data Profiling in Apache Calcite
Data Profiling in Apache CalciteData Profiling in Apache Calcite
Data Profiling in Apache CalciteJulian Hyde
 
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)Streaming SQL (at FlinkForward, Berlin, 2016/09/12)
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)Julian Hyde
 
Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...
Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...
Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...Julian Hyde
 

More from Julian Hyde (20)

Building a semantic/metrics layer using Calcite
Building a semantic/metrics layer using CalciteBuilding a semantic/metrics layer using Calcite
Building a semantic/metrics layer using Calcite
 
Cubing and Metrics in SQL, oh my!
Cubing and Metrics in SQL, oh my!Cubing and Metrics in SQL, oh my!
Cubing and Metrics in SQL, oh my!
 
Adding measures to Calcite SQL
Adding measures to Calcite SQLAdding measures to Calcite SQL
Adding measures to Calcite SQL
 
Morel, a data-parallel programming language
Morel, a data-parallel programming languageMorel, a data-parallel programming language
Morel, a data-parallel programming language
 
Is there a perfect data-parallel programming language? (Experiments with More...
Is there a perfect data-parallel programming language? (Experiments with More...Is there a perfect data-parallel programming language? (Experiments with More...
Is there a perfect data-parallel programming language? (Experiments with More...
 
Morel, a Functional Query Language
Morel, a Functional Query LanguageMorel, a Functional Query Language
Morel, a Functional Query Language
 
What to expect when you're Incubating
What to expect when you're IncubatingWhat to expect when you're Incubating
What to expect when you're Incubating
 
Efficient spatial queries on vanilla databases
Efficient spatial queries on vanilla databasesEfficient spatial queries on vanilla databases
Efficient spatial queries on vanilla databases
 
Don't optimize my queries, organize my data!
Don't optimize my queries, organize my data!Don't optimize my queries, organize my data!
Don't optimize my queries, organize my data!
 
Spatial query on vanilla databases
Spatial query on vanilla databasesSpatial query on vanilla databases
Spatial query on vanilla databases
 
Lazy beats Smart and Fast
Lazy beats Smart and FastLazy beats Smart and Fast
Lazy beats Smart and Fast
 
Data profiling with Apache Calcite
Data profiling with Apache CalciteData profiling with Apache Calcite
Data profiling with Apache Calcite
 
A smarter Pig: Building a SQL interface to Apache Pig using Apache Calcite
A smarter Pig: Building a SQL interface to Apache Pig using Apache CalciteA smarter Pig: Building a SQL interface to Apache Pig using Apache Calcite
A smarter Pig: Building a SQL interface to Apache Pig using Apache Calcite
 
Data Profiling in Apache Calcite
Data Profiling in Apache CalciteData Profiling in Apache Calcite
Data Profiling in Apache Calcite
 
Streaming SQL
Streaming SQLStreaming SQL
Streaming SQL
 
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)Streaming SQL (at FlinkForward, Berlin, 2016/09/12)
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)
 
Streaming SQL
Streaming SQLStreaming SQL
Streaming SQL
 
Streaming SQL
Streaming SQLStreaming SQL
Streaming SQL
 
Streaming SQL
Streaming SQLStreaming SQL
Streaming SQL
 
Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...
Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...
Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...
 

Recently uploaded

SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtimeandrehoraa
 
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdf
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdfInnovate and Collaborate- Harnessing the Power of Open Source Software.pdf
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdfYashikaSharma391629
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Cizo Technology Services
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based projectAnoyGreter
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprisepreethippts
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作qr0udbr0
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf31events.com
 
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Natan Silnitsky
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...OnePlan Solutions
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesPhilip Schwarz
 
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxUI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxAndreas Kunz
 
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...Akihiro Suda
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLionel Briand
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfAlina Yurenko
 
Understanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM ArchitectureUnderstanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM Architecturerahul_net
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...confluent
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureDinusha Kumarasiri
 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfDrew Moseley
 

Recently uploaded (20)

SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtime
 
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdf
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdfInnovate and Collaborate- Harnessing the Power of Open Source Software.pdf
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdf
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based project
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprise
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf
 
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
 
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort ServiceHot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a series
 
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxUI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
 
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and Repair
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
 
Understanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM ArchitectureUnderstanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM Architecture
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with Azure
 
2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva
 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdf
 

Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, into Looker’s Query Engine

  • 1. Smarter Together Bringing Relational Algebra, Powered by Apache Calcite, into Looker’s Query Engine Julian Hyde Principal Software Engineer #JOINdata @lookerdata
  • 2. This presentation contains forward-looking statements including, among other things, statements regarding the expected performance and benefits, potential availability and planned features of Looker’s offerings. These forward-looking statements are subject to risks, uncertainties, and assumptions. If the risks materialize or assumptions prove incorrect, actual results could differ materially from the results implied by these forward-looking statements. Risks include, but are not limited to, delay in product availability, failure to meet or satisfy customer requirements, and slower than anticipated growth in our market. Looker assumes no obligation to, and does not currently intend to, update any such forward-looking statements after the date of this presentation. Any unreleased services, features or products referenced in this presentation are not currently available and may not be delivered on time or at all. Customers who purchase Looker’s products and services should make their purchase decisions based upon features that are currently available.
  • 3. Smarter Together Bringing Relational Algebra, Powered by Apache Calcite, into Looker’s Query Engine [ Julian Hyde | JOIN 2019 | 5-7 November 2019 | San Francisco ]
  • 4. Ice cream costs $1.50 Topping costs $0.50 How much do 3 ice creams with toppings cost? An algebra problem
  • 5. Ice cream costs $1.50 Topping costs $0.50 How much do 3 ice creams with toppings cost? Method 1 (3 * $1.50) + (3 * $0.50) = $4.50 + $1.50 = $6.00 Method 2 3 * ($1.50 + $0.50) = 3 * $2.00 = $6.00 An algebra problem How did you solve it?
  • 6. Ice cream costs $1.50 Topping costs $0.50 How much do 3 ice creams with toppings cost? Method 1 (3 * $1.50) + (3 * $0.50) = $4.50 + $1.50 = $6.00 Method 2 3 * ($1.50 + $0.50) = 3 * $2.00 = $6.00 Algebraic identity: (a * b) + (a * c) = a * (b + c) An algebra problem It’s best to apply toppings first
  • 7. Algebra is a mathematical language for combining values using operators. Algebraic identities tell us how the operators and values interact. We can prove things by using the identities as rewrite rules. Algebra Values numbers Operators + - * / Algebraic identities ●  a * (b + c) = (a * b) + (a * c) ●  a + 0 = a ●  a * 0 = 0 ●  a + b = b + a
  • 8. Relational algebra is an algebra where the values are relations (aka tables, multisets of records). Relational algebra Values relations Operators join (⨝) union(∪) project (Π) filter (σ) Algebraic identities ●  A ⨝ (B ∪ C) = A ⨝ B ∪ A ⨝ C ●  A ∪ ∅ = A ●  A ⨝ ∅ = A ●  A ∪ B = B ∪ A ●  σP (σQ (A)) = σP ∧ Q (A)
  • 9. A ⋈ (B ∪ C) = A ⋈ B ∪ A ⋈ C σP (σQ (A)) = σP ∧ Q (A) Algebraic identities in SQL SELECT * FROM (SELECT * FROM products WHERE color = ‘red’) WHERE type = ‘bicycle’ = SELECT * FROM products WHERE color = ‘red’ AND type = ‘bicycle’ SELECT * FROM products JOIN (SELECT FROM orders_2018 UNION ALL SELECT * FROM orders_2019) = SELECT * FROM products JOIN orders_2018 UNION ALL SELECT * FROM products JOIN orders_2019
  • 10. Agenda Relational algebra and query optimization How Looker executes a query (and how Calcite can help) Aggregate awareness & materialized views Generate LookML model from SQL
  • 11. Agenda Relational algebra and query optimization How Looker executes a query (and how Calcite can help) Aggregate awareness & materialized views Generate LookML model from SQL
  • 12. Apache top-level project Query planning framework used in many projects and products Also works standalone: embedded federated query engine with SQL / JDBC front end Apache community development model https://calcite.apache.org https://github.com/apache/calcite Apache Calcite
  • 13. Core – Operator expressions (relational algebra) and planner (based on Volcano/Cascades) External – Data storage, algorithms and catalog Optional – SQL parser, JDBC & ODBC drivers Extensible – Planner rewrite rules, statistics, cost model, algebra, UDFs Apache Calcite - architecture
  • 14. Based on set theory, plus operators: Project, Filter, Aggregate, Union, Join, Sort Requires: declarative language (SQL), query planner Original goal: data independence Enables: query optimization, new algorithms and data structures Relational algebra Translation to SQL SELECT d.name, COUNT(*) AS c FROM emps AS e JOIN depts AS d USING (deptno) WHERE e.age > 50 GROUP BY d.deptno HAVING COUNT(*) > 5 ORDER BY c DESC Scan [emps] Scan [depts] Join [e.deptno = d.deptno] Filter [e.age > 50] Aggregate [deptno, COUNT(*) AS c] Sort [c DESC] Project [name, c] Filter [c > 5]
  • 15. Optimize by applying rewrite rules that preserve semantics Hopefully the result is less expensive; but it’s OK if it’s not (planner keeps “before” and “after”) Planner uses dynamic programming, seeking the lowest total cost Relational algebra Algebraic rewrite SELECT d.name, COUNT(*) AS c FROM emps AS e JOIN depts AS d USING (deptno) WHERE e.age > 50 GROUP BY d.deptno HAVING COUNT(*) > 5 ORDER BY c DESC Scan [emps] Scan [depts] Join [e.deptno = d.deptno] Filter [e.age > 50] Aggregate [deptno, COUNT(*) AS c] Filter [c > 5] Project [name, c] Sort [c DESC]
  • 16. Calcite framework Cost, statistics RelOptCost RelOptCostFactory RelMetadataProvider •  RelMdColumnUniqueness •  RelMdDistinctRowCount •  RelMdSelectivity SQL parser SqlNode SqlParser SqlValidator Transformation rules RelOptRule •  FilterMergeRule •  AggregateUnionTransposeRule •  100+ more Global transformations •  Unification (materialized view) •  Column trimming •  De-correlation Relational algebra RelNode (operator) •  TableScan •  Filter •  Project •  Union •  Aggregate •  … RelDataType (type) RexNode (expression) RelTrait (physical property) •  RelConvention (calling-convention) •  RelCollation (sortedness) •  RelDistribution (partitioning) RelBuilder JDBC driver Remote driver Local driver Metadata Schema Table Function •  TableFunction •  TableMacro Lattice
  • 17. Agenda Relational algebra and query optimization How Looker executes a query (and how Calcite can help) Aggregate awareness & materialized views Generate LookML model from SQL
  • 18. Looker SQL Generation Overview Looker Query Request Generated SQL Rel Builder Looker Extensions (Kotlin) Compiled LookML a.k.a. HTQuery Relational Algebra Expressions (Calcite/Java) SQL Node HTDialect (Ruby) SQL AST (Calcite/Java) HTSQLWriter HTDialect PDT Service Aggregate Awareness
  • 19. Looker SQL Generation (old) Looker Query Request Generated SQL Rel Builder Looker Extensions (Kotlin) Compiled LookML a.k.a. HTQuery Relational Algebra Expressions (Calcite/Java) SQL Node HTDialect (Ruby) SQL AST (Calcite/Java) HTSQLWriter HTDialect PDT Service Aggregate Awareness
  • 20. Looker SQL Generation (new) Looker Query Request Generated SQL Rel Builder Looker Extensions (Kotlin) Compiled LookML a.k.a. HTQuery Relational Algebra Expressions (Calcite/Java) SQL Node HTDialect (Ruby) SQL AST (Calcite/Java) HTSQLWriter HTDialect PDT Service Aggregate Awareness
  • 21. Agenda Relational algebra and query optimization How Looker executes a query (and how Calcite can help) Aggregate awareness & materialized views Generate LookML model from SQL
  • 22. Aggregates to the rescue date product_id customer_id revenue 2018-01-05 100 12345 $17 2018-01-07 105 23456 $120 customer_id state 12345 CA 23456 TX orders (100 million rows) customers (12 million rows) Business question: Which state had the most revenue in 2018?
  • 23. Raw data – 120 million rows Summary data – 2,000 rows Aggregates to the rescue quarter state sum_revenue sales_count 2018-Q1 CA $15,567 46 2018-Q1 TX $23,006 88 orders_by_state_and_quarter (2,000 rows) Business question: Which state had the most revenue in 2018? date product_id customer_id revenue 2018-01-05 100 12345 $17 2018-01-07 105 23456 $120 customer_id state 12345 CA 23456 TX orders (100 million rows) customers (12 million rows)
  • 24. Aggregates in LookML LookML - orders view: orders { dimension_group: created_at { type: time sql: ${TABLE}.created_at ;; } dimension: product_id {} dimension: customer_id {} measure: revenue {} } view: customers { dimension: customer_id {} dimension: state {} } explore: orders { join: customers {} } NOTE: This feature is not production, and syntax may change LookML - orders_by_state_quarter view: orders_by_state_quarter { derived_table: { explore_source: orders { column: created_at_quarter { field: orders.created_at_quarter } column: state { field: customers.state } column: sum_revenue { sql: sum(${orders.revenue});; } } explore: orders_by_state_quarter {}
  • 25. A view is a virtual table that expands to a relational expression ●  No data stored on disk (except its definition) ●  Purpose: abstraction A materialized view is a table whose contents are defined by a relational expression ●  Purpose: improve query performance A aggregate table or summary table is a materialized view that uses GROUP BY (and perhaps JOIN) and has many fewer rows than its base table A system with aggregate awareness automatically builds aggregate tables and uses them to make queries faster Views, materialized views, aggregate awareness
  • 26. View SELECT deptno, MIN(salary) FROM managers WHERE age > 50 GROUP BY deptno CREATE VIEW managers AS SELECT * FROM emps AS e WHERE EXISTS ( SELECT * FROM emps AS underling WHERE underling.manager = e.id) Declaration of the managers view Query that uses the managers view
  • 27. View With relational algebra SELECT deptno, MIN(salary) FROM managers WHERE age > 50 GROUP BY deptno Scan [emps] Scan [emps] Join [e.id = underling.manager] Project [id, deptno, salary, age] Aggregate [manager] CREATE VIEW managers AS SELECT * FROM emps AS e WHERE EXISTS ( SELECT * FROM emps AS underling WHERE underling.manager = e.id) Filter [age > 50] Aggregate [deptno, MIN(salary)] Scan [managers]
  • 28. View Showing the view and the relational expression that will replace it SELECT deptno, MIN(salary) FROM managers WHERE age > 50 GROUP BY deptno Scan [emps] Scan [emps] Join [e.id = underling.manager] Project [id, deptno, salary, age] Aggregate [manager] CREATE VIEW managers AS SELECT * FROM emps AS e WHERE EXISTS ( SELECT * FROM emps AS underling WHERE underling.manager = e.id) Filter [age > 50] Aggregate [deptno, MIN(salary)] Scan [managers]
  • 29. View After view has been expanded SELECT deptno, MIN(salary) FROM managers WHERE age > 50 GROUP BY deptno Scan [emps] Scan [emps] Join [e.id = underling.manager] Project [id, deptno, salary, age] Aggregate [manager] CREATE VIEW managers AS SELECT * FROM emps AS e WHERE EXISTS ( SELECT * FROM emps AS underling WHERE underling.manager = e.id) Filter [age > 50] Aggregate [deptno, MIN(salary)]
  • 30. View After pushing down “age > 50” filter SELECT deptno, MIN(salary) FROM managers WHERE age > 50 GROUP BY deptno Scan [emps] Scan [emps] Join [e.id = underling.manager] Project [id, deptno, salary, age]CREATE VIEW managers AS SELECT * FROM emps AS e WHERE EXISTS ( SELECT * FROM emps AS underling WHERE underling.manager = e.id) Aggregate [deptno, MIN(salary)] Filter [age > 50] Aggregate [manager]
  • 31. Materialized view CREATE MATERIALIZED VIEW emps_by_deptno_gender AS SELECT deptno, gender, COUNT(*) AS c, SUM(sal) AS s FROM emps GROUP BY deptno, gender SELECT COUNT(*) AS c FROM emps WHERE deptno = 10 AND gender = ‘M’ Query that uses the emps table but could potentially use the emps_by_deptno_gender materialized view Declaration of the emps_by_deptno_gender materialized view
  • 32. Materialized view With relational algebra CREATE MATERIALIZED VIEW emps_by_deptno_gender AS SELECT deptno, gender, COUNT(*) AS c, SUM(sal) AS s FROM emps GROUP BY deptno, gender SELECT COUNT(*) AS c FROM emps WHERE deptno = 10 AND gender = ‘M’ Scan [emps] Filter [deptno = 10 AND gender = ‘M’] Aggregate [COUNT(*)] Scan [emps_by_ deptno_gender] = Scan [emps] Aggregate [deptno, gender, COUNT(*), SUM(salary)]
  • 33. Materialized view Rewrite query to match CREATE MATERIALIZED VIEW emps_by_deptno_gender AS SELECT deptno, gender, COUNT(*) AS c, SUM(sal) AS s FROM emps GROUP BY deptno, gender SELECT COUNT(*) AS c FROM emps WHERE deptno = 10 AND gender = ‘M’ Scan [emps_by_ deptno_gender] = Scan [emps] Aggregate [deptno, gender, COUNT(*), SUM(salary)] Scan [emps] Filter [deptno = 10 AND gender = ‘M’] Aggregate [deptno, gender, COUNT(*), SUM(salary)] Project [c]
  • 34. Materialized view Fragment of query is now identical to materialized view CREATE MATERIALIZED VIEW emps_by_deptno_gender AS SELECT deptno, gender, COUNT(*) AS c, SUM(sal) AS s FROM emps GROUP BY deptno, gender SELECT COUNT(*) AS c FROM emps WHERE deptno = 10 AND gender = ‘M’ Scan [emps_by_ deptno_gender] = Scan [emps] Aggregate [deptno, gender, COUNT(*), SUM(salary)] Scan [emps] Filter [deptno = 10 AND gender = ‘M’] Aggregate [deptno, gender, COUNT(*), SUM(salary)] Project [c]
  • 35. Materialized view Substitute table scan CREATE MATERIALIZED VIEW emps_by_deptno_gender AS SELECT deptno, gender, COUNT(*) AS c, SUM(sal) AS s FROM emps GROUP BY deptno, gender SELECT COUNT(*) AS c FROM emps WHERE deptno = 10 AND gender = ‘M’ Filter [deptno = 10 AND gender = ‘M’] Scan [emps_by_ deptno_gender] = Scan [emps] Aggregate [deptno, gender, COUNT(*), SUM(salary)] Project [c] Scan [emps_by_ deptno_gender]
  • 36. Aggregates in LookML Aggregate table is used even by queries on the orders explore LookML - orders view: orders { dimension_group: created_at { type: time sql: ${TABLE}.created_at ;; } dimension: product_id {} dimension: customer_id {} measure: revenue {} } view: customers { dimension: customer_id {} dimension: state {} } explore: orders { join: customers {} } LookML - orders_by_state_quarter view: orders_by_state_quarter { derived_table: { explore_source: orders { column: created_at_quarter { field: orders.created_at_quarter } column: state { field: customers.state } column: sum_revenue { sql: sum(${orders.revenue});; } } explore: orders_by_state_quarter {}
  • 37. Agenda Relational algebra and query optimization How Looker executes a query (and how Calcite can help) Aggregate awareness & materialized views Generate LookML model from SQL
  • 38. Companies that are just starting to use Looker often have a database and log of queries run against that database. Parsing the SQL queries in the log, we can recover the dimensions, measures, relationships, and put them into a LookML model. We call this process “model induction”. A model out of thin air
  • 39. Convert a query to algebra SELECT YEAR(o.created_at), COUNT(*) AS c, SUM(i.revenue) AS sum_revenue, SUM(i.revenue - i.cost) AS sum_profit FROM orders AS o, customers AS c, order_items AS i WHERE o.customer_id = c.customer_id AND o.order_id = i.order_id AND i.product_id IN ( SELECT product_id FROM products AS p WHERE color = ‘Red’) GROUP BY YEAR(o.created_at) Scan [orders] Scan [products] Join [true] Project [*, YEAR(o.created_at) AS y, i.revenue - i.cost AS p] Aggregate [deptno, COUNT(*), SUM(i.revenue), SUM(p)] Filter [color = ‘Red’] Scan [customers] Scan [order_items] Join [true] Filter [o.customer_id = c.customer_id AND o.order_id = i.order_id] SemiJoin [i.product_id = p.product_id]
  • 40. Convert a query to algebra SELECT YEAR(o.created_at), COUNT(*) AS c, SUM(i.revenue) AS sum_revenue, SUM(i.revenue - i.cost) AS sum_profit FROM orders AS o, customers AS c, order_items AS i WHERE o.customer_id = c.customer_id AND o.order_id = i.order_id AND i.product_id IN ( SELECT product_id FROM products AS p WHERE color = ‘Red’) GROUP BY YEAR(o.created_at) orders products Join [true] Project [*, YEAR(o.created_at) AS y, i.revenue - i.cost AS p] Aggregate [deptno, COUNT(*), SUM(i.revenue), SUM(p)] Filter [color = ‘Red’] customers order_items Join [true] Filter [o.customer_id = c.customer_id AND o.order_id = i.order_id] SemiJoin [i.product_id = p.product_id]
  • 41. Convert query to join graph, measures SELECT YEAR(o.created_at), COUNT(*) AS c, SUM(i.revenue) AS sum_revenue, SUM(i.revenue - i.cost) AS sum_profit FROM orders AS o, customers AS c, order_items AS i WHERE o.customer_id = c.customer_id AND o.order_id = i.order_id AND i.product_id IN ( SELECT product_id FROM products AS p WHERE color = ‘Red’) GROUP BY YEAR(o.created_at) orders products customers order_items measures COUNT(*) SUM(i.revenue) SUM(i.revenue - i.cost)
  • 42. Query file SELECT d1, d2, m1, m2 FROM customers JOIN orders SELECT d1, d3, m2, m3 FROM products JOIN orders SELECT d2, m4 FROM products JOIN inventory SELECT m5 FROM products JOIN product_types customers orders products product types inventoryproducts products orders 1. Parse & validate SQL query log; convert to relational algebra 2. Construct join graphs; deduce PK-FK relationships by running cardinality queries orders customers products product types inventory products 3. Cluster join graphs into stars, each with a central “fact table” LookML view: customers {} view: orders {} view: products {} view: product_types {} explore: orders { join: customers {} join: products {} join: product_types {} } view: products {} view: inventory {} explore: inventory { join: products {} } 4. Generate LookML; one explore per star Generate LookML model from SQL queries
  • 44. Questions? Rate this session in the JOIN mobile app #JOINdata @lookerdata @julianhyde @ApacheCalcite https://calcite.apache.org https://github.com/apache/calcite