SlideShare a Scribd company logo
1 of 32
Download to read offline
SQL on everything, in memory 
Page ‹#› © Hortonworks Inc. 2014 
Julian Hyde 
Strata, NYC 
October 16th, 2014 
Apache 
Calcite
About me 
Julian Hyde 
Architect at Hortonworks 
Open source: 
• Founder & lead, Apache Calcite (query optimization framework) 
• Founder & lead, Pentaho Mondrian (analysis engine) 
• Committer, Apache Drill 
• Contributor, Apache Hive 
• Contributor, Cascading Lingual (SQL interface to Cascading) 
Past: 
• SQLstream (streaming SQL) 
• Broadbase (data warehouse) 
• Oracle (SQL kernel development) 
Page ‹#› © Hortonworks Inc. 2014
SQL: in and out of fashion 
1969 — CODASYL (network database) 
1979 — First commercial SQL RDBMSs 
1990 — Acceptance — transaction processing on SQL 
1993 — Multi-dimensional databases 
1996 — SQL EDWs 
2006 — Hadoop and other “big data” technologies 
2008 — NoSQL 
2011 — SQL on Hadoop 
2014 — Interactive analytics on {Hadoop, NoSQL, DBMS}, using SQL 
! 
SQL remains popular. 
But why? 
Page ‹#› © Hortonworks Inc. 2014
“SQL inside” 
Implementing SQL well is hard 
• System cannot just “run the query” as written 
• Require relational algebra, query planner (optimizer) & metadata 
…but it’s worth the effort 
! 
Algebra-based systems are more flexible 
• Add new algorithms (e.g. a better join) 
• Re-organize data 
• Choose access path based on statistics 
• Dumb queries (e.g. machine-generated) 
• Relational, schema-less, late-schema, non-relational (e.g. key-value, document) 
Page ‹#› © Hortonworks Inc. 2014
Apache Calcite 
Page ‹#› © Hortonworks Inc. 2014 
Apache 
Calcite
Apache Calcite 
Apache incubator project since May, 2014 
• Originally named Optiq 
Query planning framework 
• Relational algebra, rewrite rules, cost model 
• Extensible 
Packaging 
• Library (JDBC server optional) 
• Open source 
• Community-authored rules, adapters 
Adoption 
• Embedded: Lingual (SQL interface to Cascading), Apache Drill, Apache Hive, Kylin OLAP 
• Adapters: Splunk, Spark, MongoDB, JDBC, CSV, JSON, Web tables, In-memory data 
Page ‹#› © Hortonworks Inc. 2014
Conventional DB architecture 
Page ‹#› © Hortonworks Inc. 2014
Calcite architecture 
Page ‹#› © Hortonworks Inc. 2014
Demo 
{sqlline, apache-calcite-0.9.1, .csv} 
Page ‹#› © Hortonworks Inc. 2014
Expression tree 
Splunk 
Table: splunk 
MySQL 
Page ‹#› © Hortonworks Inc. 2014 
SELECT p.“product_name”, COUNT(*) AS c 
FROM “splunk”.”splunk” AS s 
JOIN “mysql”.”products” AS p 
ON s.”product_id” = p.”product_id” 
WHERE s.“action” = 'purchase' 
GROUP BY p.”product_name” 
ORDER BY c DESC 
Key: product_id 
join 
Key: product_name 
Agg: count 
group 
Condition: 
action = 
'purchase' 
filter 
Key: c DESC 
sort 
scan 
scan 
Table: products
Expression tree 
(optimized) 
Splunk 
Table: splunk 
Page ‹#› © Hortonworks Inc. 2014 
Key: product_id 
join 
Key: product_name 
Agg: count 
group 
Condition: 
action = 
'purchase' 
filter 
Key: c DESC 
sort 
scan 
MySQL 
scan 
Table: products 
SELECT p.“product_name”, COUNT(*) AS c 
FROM “splunk”.”splunk” AS s 
JOIN “mysql”.”products” AS p 
ON s.”product_id” = p.”product_id” 
WHERE s.“action” = 'purchase' 
GROUP BY p.”product_name” 
ORDER BY c DESC
Calcite – APIs and SPIs 
Page ‹#› © Hortonworks Inc. 2014 
Cost, statistics 
RelOptCost 
RelOptCostFactory 
RelMetadataProvider 
• RelMdColumnUniquensss 
• RelMdDistinctRowCount 
• RelMdSelectivity 
SQL parser 
SqlNode 
SqlParser 
SqlValidator 
Transformation rules 
RelOptRule 
• MergeFilterRule 
• PushAggregateThroughUnionRule 
• 100+ more 
Global transformations 
• Unification (materialized view) 
• Column trimming 
• De-correlation 
Relational algebra 
RelNode (operator) 
• TableScan 
• Filter 
• Project 
• Union 
• Aggregate 
• … 
RelDataType (type) 
RexNode (expression) 
RelTrait (physical property) 
• RelConvention (calling-convention) 
• RelCollation (sortedness) 
• TBD (bucketedness/distribution) JDBC driver 
Metadata 
Schema 
Table 
Function 
• TableFunction 
• TableMacro 
Lattice
Calcite Planning Process 
SqlNode ! 
RelNode + RexNode 
Page ‹#› © Hortonworks Inc. 2014 
SQL 
parse 
tree 
Planner 
RelNode 
Graph 
Sql-to-Rel Converter 
1. Plan Graph 
• Node for each node in 
Input Plan 
• Each node is a Set of 
alternate Sub Plans 
• Set further divided into 
Subsets: based on traits 
like sortedness 
2. Rules 
• Rule: specifies an Operator 
sub-graph to match and 
logic to generate equivalent 
‘better’ sub-graph 
• New and original sub-graph 
both remain in contention 
3. Cost Model 
• RelNodes have Cost & 
Cumulative Cost 
4. Metadata Providers 
- Used to plug in Schema, 
cost formulas 
- Filter selectivity 
- Join selectivity 
- NDV calculations 
Rule Match Queue 
- Add Rule matches to Queue 
- Apply Rule match 
transformations to plan graph 
- Iterate for fixed iterations or until 
cost doesn’t change 
- Match importance based on 
cost of RelNode and height 
Best RelNode Graph 
Translate to 
runtime 
Logical Plan 
Based on “Volcano” & “Cascades” papers [G. Graefe]
Demo 
{sqlline, apache-calcite-0.9.1, .csv, CsvPushProjectOntoTableRule} 
Page ‹#› © Hortonworks Inc. 2014
Analytics 
Page ‹#› © Hortonworks Inc. 2014
Mondrian OLAP (Saiku user interface)
Interactive queries on NoSQL 
Typical requirements 
NoSQL operational database (e.g. HBase, MongoDB, Cassandra) 
Analytic queries aggregate over full scan 
Speed-of-thought response (< 5 seconds first query, < 1 second second query) 
Data freshness (< 10 minutes) 
! 
Other requirements 
Hybrid system (e.g. Hive + HBase) 
Star schema 
Page ‹#› © Hortonworks Inc. 2014
Star schema 
Page ‹#› © Hortonworks Inc. 2014 
Sales Time Inventory 
Customer 
Warehouse 
Key 
Fact table 
Dimension table 
Many-to-one relationship 
Product 
Product 
class 
Promotion
Simple analytics problem? 
System 
100M US census records 
1KB each record, 100GB total 
4 SATA 3 disks, total read throughput 1.2GB/s 
! 
Requirement 
Count all records in < 5s 
Solution #1 
It’s not possible! It takes 80s just to read the data 
Solution #2 
Cheat! 
Page ‹#› © Hortonworks Inc. 2014
How to cheat 
Multiple tricks 
Compress data 
Column-oriented storage 
Store data in sorted order 
Put data in memory 
Cache previous results 
Pre-compute (materialize) aggregates 
! 
Common factors 
Make a copy of the data 
Organize it in a different way 
Optimizer chooses the most suitable data organization 
SQL query is unchanged 
Page ‹#› © Hortonworks Inc. 2014
Filter-join-aggregate query 
SELECT product.id, sum(sales.units), sum(sales.price), count(*) 
FROM sales … 
JOIN customer ON … 
JOIN time ON … 
JOIN product ON … 
JOIN product_class ON … 
WHERE time.year = 2014 
AND time.quarter = ‘Q1’ 
AND product.color = ‘Red’ 
GROUP BY … 
Page ‹#› © Hortonworks Inc. 2014 
Time 
Sales Inventory 
Product 
Customer 
Warehouse 
ProductClass
Materialized view, lattice, tile 
Materialized view 
A table whose contents are guaranteed to be the same as 
executing a given query. 
Lattice 
Recommends, builds, and recognizes summary 
materialized views (tiles) based on a star schema. 
A query defines the tables and many:1 relationships in the 
star schema. 
Tile 
A summary materialized view that belongs to a lattice. 
A tile may or may not be materialized. 
Materialization methods: 
• Declare in lattice 
• Generate via recommender algorithm 
• Created in response to query 
Page ‹#› © Hortonworks Inc. 2014 
(FAKE SYNTAX) 
CREATE MATERIALIZED VIEW t AS 
SELECT * FROM emps 
WHERE deptno = 10; 
CREATE LATTICE star AS 
SELECT * 
FROM sales_fact_1997 AS s 
JOIN product AS p ON … 
JOIN product_class AS pc ON … 
JOIN customer AS c ON … 
JOIN time_by_day AS t ON …; 
CREATE MATERIALIZED VIEW zg IN star 
SELECT gender, zipcode, 
COUNT(*), SUM(unit_sales) 
FROM star 
GROUP BY gender, zipcode;
Lattice () 1 
Page ‹#› © Hortonworks Inc. 2014 
> select count(*) as c, sum(unit_sales) as s 
> from star; 
+-----------+-----------+ 
| C | S | 
+-----------+-----------+ 
| 1,000,000 | 266,773.0 | 
+-----------+-----------+ 
1 row selected 
! 
> select * from star; 
1,000,000 rows selected 
raw 1m
Lattice - top tiles () 1 
(z) 43k (s) 50 (g) 2 (y) 5 (m) 12 
Page ‹#› © Hortonworks Inc. 2014 
raw 1m 
Key 
! 
z zipcode (43k) 
s state (50) 
g gender (2) 
y year (5) 
m month (12) 
> select zipcode, count(*) as c, 
> sum(unit_sales) as s 
> from star 
> group by zipcode; 
+---------+-----------+-----------+ 
| ZIPCODE | C | S | 
+---------+-----------+-----------+ 
| 10000 | 23 | 31.5 | 
… 
+---------+-----------+-----------+ 
43,000 rows selected 
> select state, count(*) as c, 
> sum(unit_sales) as s 
> from star 
> group by state; 
+-------+-----------+-----------+ 
| STATE | C | S | 
+-------+-----------+-----------+ 
| AL | 201,693 | 5,520.0 | 
… 
+-------+-----------+-----------+ 
50 rows selected
Lattice - more tiles () 1 
(z, s) 43.4k (g, y) 10 (y, m) 60 
Key 
! 
z zipcode (43k) 
s state (50) 
g gender (2) 
y year (5) 
m month (12) 
(z) 43k (s) 50 (g) 2 (y) 5 (m) 12 
Page ‹#› © Hortonworks Inc. 2014 
(z, s, g, y, 
m) 912k 
(s, g, y, m) 6k 
raw 1m 
(g, y, m) 120 
Fewer 
than you would 
expect, because 5m 
combinations cannot 
occur in 1m row 
table 
Fewer than you 
would expect, because 
state depends on 
zipcode
Lattice - complete () 1 
(z, s) 43.4k (y, m) 60 
Key 
! 
z zipcode (43k) 
s state (50) 
g gender (2) 
y year (5) 
m month (12) 
(z) 43k (s) 50 (g) 2 (y) 5 (m) 12 
Page ‹#› © Hortonworks Inc. 2014 
(z, s, g, y, 
m) 910k 
(s, g, y, m) 6k 
(z, g, y, m) 
909k 
(z, s, y, m) 
830k 
raw 1m 
(z, s, g, m) 
643k 
(z, s, g, y) 
391k 
(z, s, g) 87k 
(g, y) 10 
(g, y, m) 120 
(g, m) 24
Lattice - optimized () 1 
(z, s) 43.4k (y, m) 60 
Key 
! 
z zipcode (43k) 
s state (50) 
g gender (2) 
y year (5) 
m month (12) 
(z) 43k (s) 50 (g) 2 (y) 5 (m) 12 
Page ‹#› © Hortonworks Inc. 2014 
(z, s, g, y, 
m) 912k 
(s, g, y, m) 6k 
(z, g, y, m) 
909k 
(z, s, y, m) 
831k 
raw 1m 
(z, s, g, m) 
644k 
(z, s, g, y) 
392k 
(z, s, g) 87k 
(g, y) 10 
(g, y, m) 120 
(g, m) 24
Lattice - optimized () 1 
(z, s) 43.4k (y, m) 60 
Key 
! 
z zipcode (43k) 
s state (50) 
g gender (2) 
y year (5) 
m month (12) 
(z) 43k (s) 50 (g) 2 (y) 5 (m) 12 
Page ‹#› © Hortonworks Inc. 2014 
(z, s, g, y, 
m) 912k 
(s, g, y, m) 6k 
(z, g, y, m) 
909k 
(z, s, y, m) 
831k 
raw 1m 
(z, s, g, m) 
644k 
(z, s, g, y) 
392k 
(z, s, g) 87k 
(g, y) 10 
(g, y, m) 120 
(g, m) 24 
Aggregate Cost 
(rows) 
Benefit (query 
rows saved) 
% queries 
s, g, y, m 6k 497k 50% 
z, s, g 87k 304k 33% 
g, y 10 1.5k 25% 
g, m 24 1.5k 25% 
s, g 100 1.5k 25% 
y, m 60 1.5k 25%
Demo 
{mysql-foodmart-lattice-model.json} 
Page ‹#› © Hortonworks Inc. 2014
Tiled, in-memory materializations 
Page ‹#› © Hortonworks Inc. 2014 
Query: SELECT x, SUM(y) FROM t GROUP BY x 
In-memory 
materialized 
queries 
Tables 
on disk 
Where we’re going… smart, distributed memory cache & compute framework 
http://hortonworks.com/blog/dmmq/
Kylin OLAP engine 
Calcite 
used here 
Page ‹#› © Hortonworks Inc. 2014
Thank you! 
Apache 
Calcite 
@julianhyde 
http://calcite.incubator.apache.org 
http://www.kylin.io 
Page ‹#› © Hortonworks Inc. 2014

More Related Content

What's hot

Building a SIMD Supported Vectorized Native Engine for Spark SQL
Building a SIMD Supported Vectorized Native Engine for Spark SQLBuilding a SIMD Supported Vectorized Native Engine for Spark SQL
Building a SIMD Supported Vectorized Native Engine for Spark SQLDatabricks
 
ksqlDB: A Stream-Relational Database System
ksqlDB: A Stream-Relational Database SystemksqlDB: A Stream-Relational Database System
ksqlDB: A Stream-Relational Database Systemconfluent
 
ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...
ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...
ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...Databricks
 
A Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQLA Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQLDatabricks
 
Deep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDeep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDatabricks
 
Apache Kafka in the Airline, Aviation and Travel Industry
Apache Kafka in the Airline, Aviation and Travel IndustryApache Kafka in the Airline, Aviation and Travel Industry
Apache Kafka in the Airline, Aviation and Travel IndustryKai Wähner
 
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...Databricks
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiDatabricks
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guideRyan Blue
 
Apache Calcite (a tutorial given at BOSS '21)
Apache Calcite (a tutorial given at BOSS '21)Apache Calcite (a tutorial given at BOSS '21)
Apache Calcite (a tutorial given at BOSS '21)Julian Hyde
 
Spark shuffle introduction
Spark shuffle introductionSpark shuffle introduction
Spark shuffle introductioncolorant
 
Building Robust ETL Pipelines with Apache Spark
Building Robust ETL Pipelines with Apache SparkBuilding Robust ETL Pipelines with Apache Spark
Building Robust ETL Pipelines with Apache SparkDatabricks
 
Streaming all over the world Real life use cases with Kafka Streams
Streaming all over the world  Real life use cases with Kafka StreamsStreaming all over the world  Real life use cases with Kafka Streams
Streaming all over the world Real life use cases with Kafka Streamsconfluent
 
The Current State of Table API in 2022
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022Flink Forward
 
Introduction to Apache Calcite
Introduction to Apache CalciteIntroduction to Apache Calcite
Introduction to Apache CalciteJordan Halterman
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeFlink Forward
 
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark Summit
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using KafkaKnoldus Inc.
 
Understanding and Improving Code Generation
Understanding and Improving Code GenerationUnderstanding and Improving Code Generation
Understanding and Improving Code GenerationDatabricks
 

What's hot (20)

Building a SIMD Supported Vectorized Native Engine for Spark SQL
Building a SIMD Supported Vectorized Native Engine for Spark SQLBuilding a SIMD Supported Vectorized Native Engine for Spark SQL
Building a SIMD Supported Vectorized Native Engine for Spark SQL
 
ksqlDB: A Stream-Relational Database System
ksqlDB: A Stream-Relational Database SystemksqlDB: A Stream-Relational Database System
ksqlDB: A Stream-Relational Database System
 
ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...
ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...
ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...
 
A Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQLA Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQL
 
Deep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDeep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache Spark
 
Apache Kafka in the Airline, Aviation and Travel Industry
Apache Kafka in the Airline, Aviation and Travel IndustryApache Kafka in the Airline, Aviation and Travel Industry
Apache Kafka in the Airline, Aviation and Travel Industry
 
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guide
 
Apache Spark Architecture
Apache Spark ArchitectureApache Spark Architecture
Apache Spark Architecture
 
Apache Calcite (a tutorial given at BOSS '21)
Apache Calcite (a tutorial given at BOSS '21)Apache Calcite (a tutorial given at BOSS '21)
Apache Calcite (a tutorial given at BOSS '21)
 
Spark shuffle introduction
Spark shuffle introductionSpark shuffle introduction
Spark shuffle introduction
 
Building Robust ETL Pipelines with Apache Spark
Building Robust ETL Pipelines with Apache SparkBuilding Robust ETL Pipelines with Apache Spark
Building Robust ETL Pipelines with Apache Spark
 
Streaming all over the world Real life use cases with Kafka Streams
Streaming all over the world  Real life use cases with Kafka StreamsStreaming all over the world  Real life use cases with Kafka Streams
Streaming all over the world Real life use cases with Kafka Streams
 
The Current State of Table API in 2022
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022
 
Introduction to Apache Calcite
Introduction to Apache CalciteIntroduction to Apache Calcite
Introduction to Apache Calcite
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive Mode
 
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using Kafka
 
Understanding and Improving Code Generation
Understanding and Improving Code GenerationUnderstanding and Improving Code Generation
Understanding and Improving Code Generation
 

Viewers also liked

Drill / SQL / Optiq
Drill / SQL / OptiqDrill / SQL / Optiq
Drill / SQL / OptiqJulian Hyde
 
Data Science Languages and Industry Analytics
Data Science Languages and Industry AnalyticsData Science Languages and Industry Analytics
Data Science Languages and Industry AnalyticsWes McKinney
 
Building a Virtual Data Lake with Apache Arrow
Building a Virtual Data Lake with Apache ArrowBuilding a Virtual Data Lake with Apache Arrow
Building a Virtual Data Lake with Apache ArrowDremio Corporation
 
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...Dremio Corporation
 
Bi on Big Data - Strata 2016 in London
Bi on Big Data - Strata 2016 in LondonBi on Big Data - Strata 2016 in London
Bi on Big Data - Strata 2016 in LondonDremio Corporation
 
The twins that everyone loved too much
The twins that everyone loved too muchThe twins that everyone loved too much
The twins that everyone loved too muchJulian Hyde
 
Apache Calcite: One planner fits all
Apache Calcite: One planner fits allApache Calcite: One planner fits all
Apache Calcite: One planner fits allJulian Hyde
 
Options for Data Prep - A Survey of the Current Market
Options for Data Prep - A Survey of the Current MarketOptions for Data Prep - A Survey of the Current Market
Options for Data Prep - A Survey of the Current MarketDremio Corporation
 
Apache Arrow: In Theory, In Practice
Apache Arrow: In Theory, In PracticeApache Arrow: In Theory, In Practice
Apache Arrow: In Theory, In PracticeDremio Corporation
 
Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!Julian Hyde
 
Using Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Using Apache Arrow, Calcite, and Parquet to Build a Relational CacheUsing Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Using Apache Arrow, Calcite, and Parquet to Build a Relational CacheDremio Corporation
 

Viewers also liked (12)

Drill / SQL / Optiq
Drill / SQL / OptiqDrill / SQL / Optiq
Drill / SQL / Optiq
 
Apache Arrow - An Overview
Apache Arrow - An OverviewApache Arrow - An Overview
Apache Arrow - An Overview
 
Data Science Languages and Industry Analytics
Data Science Languages and Industry AnalyticsData Science Languages and Industry Analytics
Data Science Languages and Industry Analytics
 
Building a Virtual Data Lake with Apache Arrow
Building a Virtual Data Lake with Apache ArrowBuilding a Virtual Data Lake with Apache Arrow
Building a Virtual Data Lake with Apache Arrow
 
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
 
Bi on Big Data - Strata 2016 in London
Bi on Big Data - Strata 2016 in LondonBi on Big Data - Strata 2016 in London
Bi on Big Data - Strata 2016 in London
 
The twins that everyone loved too much
The twins that everyone loved too muchThe twins that everyone loved too much
The twins that everyone loved too much
 
Apache Calcite: One planner fits all
Apache Calcite: One planner fits allApache Calcite: One planner fits all
Apache Calcite: One planner fits all
 
Options for Data Prep - A Survey of the Current Market
Options for Data Prep - A Survey of the Current MarketOptions for Data Prep - A Survey of the Current Market
Options for Data Prep - A Survey of the Current Market
 
Apache Arrow: In Theory, In Practice
Apache Arrow: In Theory, In PracticeApache Arrow: In Theory, In Practice
Apache Arrow: In Theory, In Practice
 
Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!
 
Using Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Using Apache Arrow, Calcite, and Parquet to Build a Relational CacheUsing Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Using Apache Arrow, Calcite, and Parquet to Build a Relational Cache
 

Similar to SQL on everything, in memory

Cost-based query optimization in Apache Hive 0.14
Cost-based query optimization in Apache Hive 0.14Cost-based query optimization in Apache Hive 0.14
Cost-based query optimization in Apache Hive 0.14Julian Hyde
 
Cost-based Query Optimization in Hive
Cost-based Query Optimization in HiveCost-based Query Optimization in Hive
Cost-based Query Optimization in HiveDataWorks Summit
 
Cost-based query optimization in Apache Hive
Cost-based query optimization in Apache HiveCost-based query optimization in Apache Hive
Cost-based query optimization in Apache HiveJulian Hyde
 
Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...
Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...
Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...Julian Hyde
 
Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...
Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...
Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...Julian Hyde
 
SQL on Hadoop
SQL on HadoopSQL on Hadoop
SQL on Hadoopnvvrajesh
 
Migrating on premises workload to azure sql database
Migrating on premises workload to azure sql databaseMigrating on premises workload to azure sql database
Migrating on premises workload to azure sql databasePARIKSHIT SAVJANI
 
Microsoft R Server for Data Sciencea
Microsoft R Server for Data ScienceaMicrosoft R Server for Data Sciencea
Microsoft R Server for Data ScienceaData Science Thailand
 
A Smarter Pig: Building a SQL interface to Pig using Apache Calcite
A Smarter Pig: Building a SQL interface to Pig using Apache CalciteA Smarter Pig: Building a SQL interface to Pig using Apache Calcite
A Smarter Pig: Building a SQL interface to Pig using Apache CalciteSalesforce Engineering
 
ASHviz - Dats visualization research experiments using ASH data
ASHviz - Dats visualization research experiments using ASH dataASHviz - Dats visualization research experiments using ASH data
ASHviz - Dats visualization research experiments using ASH dataJohn Beresniewicz
 
Webinar: What's New in Solr 6
Webinar: What's New in Solr 6Webinar: What's New in Solr 6
Webinar: What's New in Solr 6Lucidworks
 
Fast federated SQL with Apache Calcite
Fast federated SQL with Apache CalciteFast federated SQL with Apache Calcite
Fast federated SQL with Apache CalciteChris Baynes
 
Apache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
Apache Kylin: OLAP Engine on Hadoop - Tech Deep DiveApache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
Apache Kylin: OLAP Engine on Hadoop - Tech Deep DiveXu Jiang
 
Why you care about
 relational algebra (even though you didn’t know it)
Why you care about
 relational algebra (even though you didn’t know it)Why you care about
 relational algebra (even though you didn’t know it)
Why you care about
 relational algebra (even though you didn’t know it)Julian Hyde
 
BDM25 - Spark runtime internal
BDM25 - Spark runtime internalBDM25 - Spark runtime internal
BDM25 - Spark runtime internalDavid Lauzon
 
C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...
C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...
C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...DataStax Academy
 

Similar to SQL on everything, in memory (20)

Cost-based query optimization in Apache Hive 0.14
Cost-based query optimization in Apache Hive 0.14Cost-based query optimization in Apache Hive 0.14
Cost-based query optimization in Apache Hive 0.14
 
Cost-based Query Optimization in Hive
Cost-based Query Optimization in HiveCost-based Query Optimization in Hive
Cost-based Query Optimization in Hive
 
Cost-based query optimization in Apache Hive
Cost-based query optimization in Apache HiveCost-based query optimization in Apache Hive
Cost-based query optimization in Apache Hive
 
Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...
Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...
Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...
 
Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...
Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...
Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...
 
Polyalgebra
PolyalgebraPolyalgebra
Polyalgebra
 
SQL on Hadoop
SQL on HadoopSQL on Hadoop
SQL on Hadoop
 
Couchbas for dummies
Couchbas for dummiesCouchbas for dummies
Couchbas for dummies
 
Migrating on premises workload to azure sql database
Migrating on premises workload to azure sql databaseMigrating on premises workload to azure sql database
Migrating on premises workload to azure sql database
 
Microsoft R Server for Data Sciencea
Microsoft R Server for Data ScienceaMicrosoft R Server for Data Sciencea
Microsoft R Server for Data Sciencea
 
A Smarter Pig: Building a SQL interface to Pig using Apache Calcite
A Smarter Pig: Building a SQL interface to Pig using Apache CalciteA Smarter Pig: Building a SQL interface to Pig using Apache Calcite
A Smarter Pig: Building a SQL interface to Pig using Apache Calcite
 
ASHviz - Dats visualization research experiments using ASH data
ASHviz - Dats visualization research experiments using ASH dataASHviz - Dats visualization research experiments using ASH data
ASHviz - Dats visualization research experiments using ASH data
 
Webinar: What's New in Solr 6
Webinar: What's New in Solr 6Webinar: What's New in Solr 6
Webinar: What's New in Solr 6
 
Dev Ops Training
Dev Ops TrainingDev Ops Training
Dev Ops Training
 
20170126 big data processing
20170126 big data processing20170126 big data processing
20170126 big data processing
 
Fast federated SQL with Apache Calcite
Fast federated SQL with Apache CalciteFast federated SQL with Apache Calcite
Fast federated SQL with Apache Calcite
 
Apache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
Apache Kylin: OLAP Engine on Hadoop - Tech Deep DiveApache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
Apache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
 
Why you care about
 relational algebra (even though you didn’t know it)
Why you care about
 relational algebra (even though you didn’t know it)Why you care about
 relational algebra (even though you didn’t know it)
Why you care about
 relational algebra (even though you didn’t know it)
 
BDM25 - Spark runtime internal
BDM25 - Spark runtime internalBDM25 - Spark runtime internal
BDM25 - Spark runtime internal
 
C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...
C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...
C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...
 

More from Julian Hyde

Building a semantic/metrics layer using Calcite
Building a semantic/metrics layer using CalciteBuilding a semantic/metrics layer using Calcite
Building a semantic/metrics layer using CalciteJulian Hyde
 
Cubing and Metrics in SQL, oh my!
Cubing and Metrics in SQL, oh my!Cubing and Metrics in SQL, oh my!
Cubing and Metrics in SQL, oh my!Julian Hyde
 
Adding measures to Calcite SQL
Adding measures to Calcite SQLAdding measures to Calcite SQL
Adding measures to Calcite SQLJulian Hyde
 
Morel, a data-parallel programming language
Morel, a data-parallel programming languageMorel, a data-parallel programming language
Morel, a data-parallel programming languageJulian Hyde
 
Is there a perfect data-parallel programming language? (Experiments with More...
Is there a perfect data-parallel programming language? (Experiments with More...Is there a perfect data-parallel programming language? (Experiments with More...
Is there a perfect data-parallel programming language? (Experiments with More...Julian Hyde
 
Morel, a Functional Query Language
Morel, a Functional Query LanguageMorel, a Functional Query Language
Morel, a Functional Query LanguageJulian Hyde
 
The evolution of Apache Calcite and its Community
The evolution of Apache Calcite and its CommunityThe evolution of Apache Calcite and its Community
The evolution of Apache Calcite and its CommunityJulian Hyde
 
What to expect when you're Incubating
What to expect when you're IncubatingWhat to expect when you're Incubating
What to expect when you're IncubatingJulian Hyde
 
Open Source SQL - beyond parsers: ZetaSQL and Apache Calcite
Open Source SQL - beyond parsers: ZetaSQL and Apache CalciteOpen Source SQL - beyond parsers: ZetaSQL and Apache Calcite
Open Source SQL - beyond parsers: ZetaSQL and Apache CalciteJulian Hyde
 
Efficient spatial queries on vanilla databases
Efficient spatial queries on vanilla databasesEfficient spatial queries on vanilla databases
Efficient spatial queries on vanilla databasesJulian Hyde
 
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...Julian Hyde
 
Tactical data engineering
Tactical data engineeringTactical data engineering
Tactical data engineeringJulian Hyde
 
Don't optimize my queries, organize my data!
Don't optimize my queries, organize my data!Don't optimize my queries, organize my data!
Don't optimize my queries, organize my data!Julian Hyde
 
Spatial query on vanilla databases
Spatial query on vanilla databasesSpatial query on vanilla databases
Spatial query on vanilla databasesJulian Hyde
 
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...Julian Hyde
 
Lazy beats Smart and Fast
Lazy beats Smart and FastLazy beats Smart and Fast
Lazy beats Smart and FastJulian Hyde
 
Data profiling with Apache Calcite
Data profiling with Apache CalciteData profiling with Apache Calcite
Data profiling with Apache CalciteJulian Hyde
 
A smarter Pig: Building a SQL interface to Apache Pig using Apache Calcite
A smarter Pig: Building a SQL interface to Apache Pig using Apache CalciteA smarter Pig: Building a SQL interface to Apache Pig using Apache Calcite
A smarter Pig: Building a SQL interface to Apache Pig using Apache CalciteJulian Hyde
 
Data Profiling in Apache Calcite
Data Profiling in Apache CalciteData Profiling in Apache Calcite
Data Profiling in Apache CalciteJulian Hyde
 

More from Julian Hyde (20)

Building a semantic/metrics layer using Calcite
Building a semantic/metrics layer using CalciteBuilding a semantic/metrics layer using Calcite
Building a semantic/metrics layer using Calcite
 
Cubing and Metrics in SQL, oh my!
Cubing and Metrics in SQL, oh my!Cubing and Metrics in SQL, oh my!
Cubing and Metrics in SQL, oh my!
 
Adding measures to Calcite SQL
Adding measures to Calcite SQLAdding measures to Calcite SQL
Adding measures to Calcite SQL
 
Morel, a data-parallel programming language
Morel, a data-parallel programming languageMorel, a data-parallel programming language
Morel, a data-parallel programming language
 
Is there a perfect data-parallel programming language? (Experiments with More...
Is there a perfect data-parallel programming language? (Experiments with More...Is there a perfect data-parallel programming language? (Experiments with More...
Is there a perfect data-parallel programming language? (Experiments with More...
 
Morel, a Functional Query Language
Morel, a Functional Query LanguageMorel, a Functional Query Language
Morel, a Functional Query Language
 
The evolution of Apache Calcite and its Community
The evolution of Apache Calcite and its CommunityThe evolution of Apache Calcite and its Community
The evolution of Apache Calcite and its Community
 
What to expect when you're Incubating
What to expect when you're IncubatingWhat to expect when you're Incubating
What to expect when you're Incubating
 
Open Source SQL - beyond parsers: ZetaSQL and Apache Calcite
Open Source SQL - beyond parsers: ZetaSQL and Apache CalciteOpen Source SQL - beyond parsers: ZetaSQL and Apache Calcite
Open Source SQL - beyond parsers: ZetaSQL and Apache Calcite
 
Efficient spatial queries on vanilla databases
Efficient spatial queries on vanilla databasesEfficient spatial queries on vanilla databases
Efficient spatial queries on vanilla databases
 
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
 
Tactical data engineering
Tactical data engineeringTactical data engineering
Tactical data engineering
 
Don't optimize my queries, organize my data!
Don't optimize my queries, organize my data!Don't optimize my queries, organize my data!
Don't optimize my queries, organize my data!
 
Spatial query on vanilla databases
Spatial query on vanilla databasesSpatial query on vanilla databases
Spatial query on vanilla databases
 
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
 
Lazy beats Smart and Fast
Lazy beats Smart and FastLazy beats Smart and Fast
Lazy beats Smart and Fast
 
Data profiling with Apache Calcite
Data profiling with Apache CalciteData profiling with Apache Calcite
Data profiling with Apache Calcite
 
A smarter Pig: Building a SQL interface to Apache Pig using Apache Calcite
A smarter Pig: Building a SQL interface to Apache Pig using Apache CalciteA smarter Pig: Building a SQL interface to Apache Pig using Apache Calcite
A smarter Pig: Building a SQL interface to Apache Pig using Apache Calcite
 
Data Profiling in Apache Calcite
Data Profiling in Apache CalciteData Profiling in Apache Calcite
Data Profiling in Apache Calcite
 
Streaming SQL
Streaming SQLStreaming SQL
Streaming SQL
 

Recently uploaded

[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 

Recently uploaded (20)

[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 

SQL on everything, in memory

  • 1. SQL on everything, in memory Page ‹#› © Hortonworks Inc. 2014 Julian Hyde Strata, NYC October 16th, 2014 Apache Calcite
  • 2. About me Julian Hyde Architect at Hortonworks Open source: • Founder & lead, Apache Calcite (query optimization framework) • Founder & lead, Pentaho Mondrian (analysis engine) • Committer, Apache Drill • Contributor, Apache Hive • Contributor, Cascading Lingual (SQL interface to Cascading) Past: • SQLstream (streaming SQL) • Broadbase (data warehouse) • Oracle (SQL kernel development) Page ‹#› © Hortonworks Inc. 2014
  • 3. SQL: in and out of fashion 1969 — CODASYL (network database) 1979 — First commercial SQL RDBMSs 1990 — Acceptance — transaction processing on SQL 1993 — Multi-dimensional databases 1996 — SQL EDWs 2006 — Hadoop and other “big data” technologies 2008 — NoSQL 2011 — SQL on Hadoop 2014 — Interactive analytics on {Hadoop, NoSQL, DBMS}, using SQL ! SQL remains popular. But why? Page ‹#› © Hortonworks Inc. 2014
  • 4. “SQL inside” Implementing SQL well is hard • System cannot just “run the query” as written • Require relational algebra, query planner (optimizer) & metadata …but it’s worth the effort ! Algebra-based systems are more flexible • Add new algorithms (e.g. a better join) • Re-organize data • Choose access path based on statistics • Dumb queries (e.g. machine-generated) • Relational, schema-less, late-schema, non-relational (e.g. key-value, document) Page ‹#› © Hortonworks Inc. 2014
  • 5. Apache Calcite Page ‹#› © Hortonworks Inc. 2014 Apache Calcite
  • 6. Apache Calcite Apache incubator project since May, 2014 • Originally named Optiq Query planning framework • Relational algebra, rewrite rules, cost model • Extensible Packaging • Library (JDBC server optional) • Open source • Community-authored rules, adapters Adoption • Embedded: Lingual (SQL interface to Cascading), Apache Drill, Apache Hive, Kylin OLAP • Adapters: Splunk, Spark, MongoDB, JDBC, CSV, JSON, Web tables, In-memory data Page ‹#› © Hortonworks Inc. 2014
  • 7. Conventional DB architecture Page ‹#› © Hortonworks Inc. 2014
  • 8. Calcite architecture Page ‹#› © Hortonworks Inc. 2014
  • 9. Demo {sqlline, apache-calcite-0.9.1, .csv} Page ‹#› © Hortonworks Inc. 2014
  • 10. Expression tree Splunk Table: splunk MySQL Page ‹#› © Hortonworks Inc. 2014 SELECT p.“product_name”, COUNT(*) AS c FROM “splunk”.”splunk” AS s JOIN “mysql”.”products” AS p ON s.”product_id” = p.”product_id” WHERE s.“action” = 'purchase' GROUP BY p.”product_name” ORDER BY c DESC Key: product_id join Key: product_name Agg: count group Condition: action = 'purchase' filter Key: c DESC sort scan scan Table: products
  • 11. Expression tree (optimized) Splunk Table: splunk Page ‹#› © Hortonworks Inc. 2014 Key: product_id join Key: product_name Agg: count group Condition: action = 'purchase' filter Key: c DESC sort scan MySQL scan Table: products SELECT p.“product_name”, COUNT(*) AS c FROM “splunk”.”splunk” AS s JOIN “mysql”.”products” AS p ON s.”product_id” = p.”product_id” WHERE s.“action” = 'purchase' GROUP BY p.”product_name” ORDER BY c DESC
  • 12. Calcite – APIs and SPIs Page ‹#› © Hortonworks Inc. 2014 Cost, statistics RelOptCost RelOptCostFactory RelMetadataProvider • RelMdColumnUniquensss • RelMdDistinctRowCount • RelMdSelectivity SQL parser SqlNode SqlParser SqlValidator Transformation rules RelOptRule • MergeFilterRule • PushAggregateThroughUnionRule • 100+ more Global transformations • Unification (materialized view) • Column trimming • De-correlation Relational algebra RelNode (operator) • TableScan • Filter • Project • Union • Aggregate • … RelDataType (type) RexNode (expression) RelTrait (physical property) • RelConvention (calling-convention) • RelCollation (sortedness) • TBD (bucketedness/distribution) JDBC driver Metadata Schema Table Function • TableFunction • TableMacro Lattice
  • 13. Calcite Planning Process SqlNode ! RelNode + RexNode Page ‹#› © Hortonworks Inc. 2014 SQL parse tree Planner RelNode Graph Sql-to-Rel Converter 1. Plan Graph • Node for each node in Input Plan • Each node is a Set of alternate Sub Plans • Set further divided into Subsets: based on traits like sortedness 2. Rules • Rule: specifies an Operator sub-graph to match and logic to generate equivalent ‘better’ sub-graph • New and original sub-graph both remain in contention 3. Cost Model • RelNodes have Cost & Cumulative Cost 4. Metadata Providers - Used to plug in Schema, cost formulas - Filter selectivity - Join selectivity - NDV calculations Rule Match Queue - Add Rule matches to Queue - Apply Rule match transformations to plan graph - Iterate for fixed iterations or until cost doesn’t change - Match importance based on cost of RelNode and height Best RelNode Graph Translate to runtime Logical Plan Based on “Volcano” & “Cascades” papers [G. Graefe]
  • 14. Demo {sqlline, apache-calcite-0.9.1, .csv, CsvPushProjectOntoTableRule} Page ‹#› © Hortonworks Inc. 2014
  • 15. Analytics Page ‹#› © Hortonworks Inc. 2014
  • 16. Mondrian OLAP (Saiku user interface)
  • 17. Interactive queries on NoSQL Typical requirements NoSQL operational database (e.g. HBase, MongoDB, Cassandra) Analytic queries aggregate over full scan Speed-of-thought response (< 5 seconds first query, < 1 second second query) Data freshness (< 10 minutes) ! Other requirements Hybrid system (e.g. Hive + HBase) Star schema Page ‹#› © Hortonworks Inc. 2014
  • 18. Star schema Page ‹#› © Hortonworks Inc. 2014 Sales Time Inventory Customer Warehouse Key Fact table Dimension table Many-to-one relationship Product Product class Promotion
  • 19. Simple analytics problem? System 100M US census records 1KB each record, 100GB total 4 SATA 3 disks, total read throughput 1.2GB/s ! Requirement Count all records in < 5s Solution #1 It’s not possible! It takes 80s just to read the data Solution #2 Cheat! Page ‹#› © Hortonworks Inc. 2014
  • 20. How to cheat Multiple tricks Compress data Column-oriented storage Store data in sorted order Put data in memory Cache previous results Pre-compute (materialize) aggregates ! Common factors Make a copy of the data Organize it in a different way Optimizer chooses the most suitable data organization SQL query is unchanged Page ‹#› © Hortonworks Inc. 2014
  • 21. Filter-join-aggregate query SELECT product.id, sum(sales.units), sum(sales.price), count(*) FROM sales … JOIN customer ON … JOIN time ON … JOIN product ON … JOIN product_class ON … WHERE time.year = 2014 AND time.quarter = ‘Q1’ AND product.color = ‘Red’ GROUP BY … Page ‹#› © Hortonworks Inc. 2014 Time Sales Inventory Product Customer Warehouse ProductClass
  • 22. Materialized view, lattice, tile Materialized view A table whose contents are guaranteed to be the same as executing a given query. Lattice Recommends, builds, and recognizes summary materialized views (tiles) based on a star schema. A query defines the tables and many:1 relationships in the star schema. Tile A summary materialized view that belongs to a lattice. A tile may or may not be materialized. Materialization methods: • Declare in lattice • Generate via recommender algorithm • Created in response to query Page ‹#› © Hortonworks Inc. 2014 (FAKE SYNTAX) CREATE MATERIALIZED VIEW t AS SELECT * FROM emps WHERE deptno = 10; CREATE LATTICE star AS SELECT * FROM sales_fact_1997 AS s JOIN product AS p ON … JOIN product_class AS pc ON … JOIN customer AS c ON … JOIN time_by_day AS t ON …; CREATE MATERIALIZED VIEW zg IN star SELECT gender, zipcode, COUNT(*), SUM(unit_sales) FROM star GROUP BY gender, zipcode;
  • 23. Lattice () 1 Page ‹#› © Hortonworks Inc. 2014 > select count(*) as c, sum(unit_sales) as s > from star; +-----------+-----------+ | C | S | +-----------+-----------+ | 1,000,000 | 266,773.0 | +-----------+-----------+ 1 row selected ! > select * from star; 1,000,000 rows selected raw 1m
  • 24. Lattice - top tiles () 1 (z) 43k (s) 50 (g) 2 (y) 5 (m) 12 Page ‹#› © Hortonworks Inc. 2014 raw 1m Key ! z zipcode (43k) s state (50) g gender (2) y year (5) m month (12) > select zipcode, count(*) as c, > sum(unit_sales) as s > from star > group by zipcode; +---------+-----------+-----------+ | ZIPCODE | C | S | +---------+-----------+-----------+ | 10000 | 23 | 31.5 | … +---------+-----------+-----------+ 43,000 rows selected > select state, count(*) as c, > sum(unit_sales) as s > from star > group by state; +-------+-----------+-----------+ | STATE | C | S | +-------+-----------+-----------+ | AL | 201,693 | 5,520.0 | … +-------+-----------+-----------+ 50 rows selected
  • 25. Lattice - more tiles () 1 (z, s) 43.4k (g, y) 10 (y, m) 60 Key ! z zipcode (43k) s state (50) g gender (2) y year (5) m month (12) (z) 43k (s) 50 (g) 2 (y) 5 (m) 12 Page ‹#› © Hortonworks Inc. 2014 (z, s, g, y, m) 912k (s, g, y, m) 6k raw 1m (g, y, m) 120 Fewer than you would expect, because 5m combinations cannot occur in 1m row table Fewer than you would expect, because state depends on zipcode
  • 26. Lattice - complete () 1 (z, s) 43.4k (y, m) 60 Key ! z zipcode (43k) s state (50) g gender (2) y year (5) m month (12) (z) 43k (s) 50 (g) 2 (y) 5 (m) 12 Page ‹#› © Hortonworks Inc. 2014 (z, s, g, y, m) 910k (s, g, y, m) 6k (z, g, y, m) 909k (z, s, y, m) 830k raw 1m (z, s, g, m) 643k (z, s, g, y) 391k (z, s, g) 87k (g, y) 10 (g, y, m) 120 (g, m) 24
  • 27. Lattice - optimized () 1 (z, s) 43.4k (y, m) 60 Key ! z zipcode (43k) s state (50) g gender (2) y year (5) m month (12) (z) 43k (s) 50 (g) 2 (y) 5 (m) 12 Page ‹#› © Hortonworks Inc. 2014 (z, s, g, y, m) 912k (s, g, y, m) 6k (z, g, y, m) 909k (z, s, y, m) 831k raw 1m (z, s, g, m) 644k (z, s, g, y) 392k (z, s, g) 87k (g, y) 10 (g, y, m) 120 (g, m) 24
  • 28. Lattice - optimized () 1 (z, s) 43.4k (y, m) 60 Key ! z zipcode (43k) s state (50) g gender (2) y year (5) m month (12) (z) 43k (s) 50 (g) 2 (y) 5 (m) 12 Page ‹#› © Hortonworks Inc. 2014 (z, s, g, y, m) 912k (s, g, y, m) 6k (z, g, y, m) 909k (z, s, y, m) 831k raw 1m (z, s, g, m) 644k (z, s, g, y) 392k (z, s, g) 87k (g, y) 10 (g, y, m) 120 (g, m) 24 Aggregate Cost (rows) Benefit (query rows saved) % queries s, g, y, m 6k 497k 50% z, s, g 87k 304k 33% g, y 10 1.5k 25% g, m 24 1.5k 25% s, g 100 1.5k 25% y, m 60 1.5k 25%
  • 29. Demo {mysql-foodmart-lattice-model.json} Page ‹#› © Hortonworks Inc. 2014
  • 30. Tiled, in-memory materializations Page ‹#› © Hortonworks Inc. 2014 Query: SELECT x, SUM(y) FROM t GROUP BY x In-memory materialized queries Tables on disk Where we’re going… smart, distributed memory cache & compute framework http://hortonworks.com/blog/dmmq/
  • 31. Kylin OLAP engine Calcite used here Page ‹#› © Hortonworks Inc. 2014
  • 32. Thank you! Apache Calcite @julianhyde http://calcite.incubator.apache.org http://www.kylin.io Page ‹#› © Hortonworks Inc. 2014