Query Optimization and
JIT-based Vectorized
Execution in Apache Tajo
Hyunsik Choi
Research Director, Gruter
Hadoop Summit North America 2014
Talk Outline
• Introduction to Apache Tajo
• Key Topics
– Query Optimization in Apache Tajo
• Join Order Optimization
• Progressive Optimization
– JIT-based Vectorized Engines
About Me
• Hyunsik Choi (pronounced “Hyeon-shick Cheh”)
• PhD (Computer Science & Engineering, 2013), Korea Uni.
• Director of Research, Gruter Corp, Seoul, South Korea
• Open-source Involvement
– Full-time contributor to Apache Tajo (2013.6 ~ )
– Apache Tajo PMC member and committer (2013.3 ~ )
– Apache Giraph PMC member and committer (2011. 8 ~ )
• Contact Info
– Email: hyunsik@apache.org
– Linkedin: http://linkedin.com/in/hyunsikchoi/
Apache Tajo
• Open-source “SQL-on-Hadoop” “Big DW” system
• Apache Top-level project since March 2014
• Supports SQL standards
• Low-latency queries as well as long-running batch queries
• Features
– Supports Joins (inner and all outer), Groupby, and Sort
– Window function
– Most SQL data types supported (except for Decimal)
• Recent 0.8.0 release
– https://blogs.apache.org/tajo/entry/apache_tajo_0_8_0
Overall Architecture
Query Optimization
Optimization in Tajo
Query Optimization Steps
Logical Plan Optimization in Tajo
• Rewrite Rules
– Projection Push Down
• pushes expressions down to operators as low as possible
• narrows the set of columns to read
• removes duplicated expressions
– when expressions share a common subexpression
– Selection Push Down
• filters rows as early as possible
– Extensible rewrite rule interfaces
• allow developers to write their own rewrite rules (a sketch follows after this list)
• Join order optimization
– Enumerate possible join orders
– Determine the optimized join order in a greedy manner
– Currently, a simple cost model based on table volumes is used.
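For illustration, a plausible shape of such a pluggable rewrite rule interface is sketched below; the names and signatures are hypothetical and do not claim to match Tajo's actual classes.

// Hypothetical sketch of a pluggable rewrite rule; names are illustrative, not Tajo's actual API.
public interface RewriteRule {
  // Rule name, e.g. for logging or EXPLAIN output.
  String getName();

  // Cheap applicability check, e.g. "is there a selection sitting above a scan?"
  boolean isEligible(LogicalPlan plan);

  // Returns the rewritten plan, e.g. with selections pushed below joins.
  LogicalPlan rewrite(LogicalPlan plan);
}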
Join Optimization - Greedy Operator Ordering
Set<LogicalNode> remainRelations = new LinkedHashSet<LogicalNode>();
for (RelationNode relation : block.getRelations()) {
  remainRelations.add(relation);
}
LogicalNode latestJoin;
JoinEdge bestPair;
while (remainRelations.size() > 1) {
  // Find the best join pair among all joinable operators in the candidate set.
  bestPair = getBestPair(plan, joinGraph, remainRelations);
  // remainRels = remainRels - {Ti}
  remainRelations.remove(bestPair.getLeftRelation());
  // remainRels = remainRels - {Tj}
  remainRelations.remove(bestPair.getRightRelation());
  // Join the chosen pair and put the join result back into the candidate set.
  latestJoin = createJoinNode(plan, bestPair);
  remainRelations.add(latestJoin);
}
findBestOrder() in GreedyHeuristicJoinOrderAlgorithm.java
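The getBestPair() call above is where the simple volume-based cost model comes in. The following is a rough sketch of the idea only, not Tajo's actual implementation; isJoinable(), getEdge(), volumeOf(), and ASSUMED_SELECTIVITY are hypothetical helpers.

// Illustrative sketch only: pick the joinable pair with the smallest estimated output volume.
static JoinEdge getBestPair(LogicalPlan plan, JoinGraph joinGraph, Set<LogicalNode> candidates) {
  JoinEdge best = null;
  double bestCost = Double.MAX_VALUE;
  for (LogicalNode left : candidates) {
    for (LogicalNode right : candidates) {
      if (left == right || !isJoinable(joinGraph, left, right)) {
        continue; // only pairs connected by a join predicate are considered
      }
      // toy cost: product of the input volumes scaled by an assumed join selectivity
      double cost = volumeOf(left) * volumeOf(right) * ASSUMED_SELECTIVITY;
      if (cost < bestCost) {
        bestCost = cost;
        best = getEdge(joinGraph, left, right);
      }
    }
  }
  return best;
}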
Progressive Optimization (in DAG controller)
• Query plans are often suboptimal because they are based on estimates
• Progressive Optimization:
– Statistics collected over the running query at runtime
– Re-optimization of remaining plan stages
• Optimal ranges and partitions chosen at runtime per operator type (join, aggregation, and sort) (since v0.2)
• In-progress work (planned for 1.0)
– Re-optimize join orders
– Re-optimize the distributed join plan
• Symmetric shuffle join → broadcast join
– Shrink multiple stages into fewer stages
JIT-based Vectorized
Query Engine
Vectorized Processing - Motivation
• So far, we have focused on I/O throughput
• Achieved 70-110 MB/s in disk-bound queries
• Increasing customer demand for faster storage such as SAS disks and SSDs
• BMT (benchmark testing) with fast storage indicates performance is likely CPU-bound rather than disk-bound
• The current execution engine is based on the tuple-at-a-time approach
What is the Tuple-at-a-time Model?
• Every physical operator produces a tuple by recursively calling next() on its child operators (a minimal sketch follows below)
(In the operator tree, tuples flow upward while next() calls flow downward.)
Upside
• Simple interface
• Supports arbitrary operator combinations
Downside (performance degradation)
• Too many function calls
• Too many branches
• Bad for CPU pipelining
• Bad data/instruction cache hit rates
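The following is a minimal sketch of this tuple-at-a-time (Volcano/iterator) model; the class names are illustrative, not Tajo's actual operators. It shows why every produced tuple costs at least one virtual call per operator in the tree.

import java.util.function.Predicate;

interface Tuple { Object get(int columnIdx); }

interface PhysicalOperator {
  Tuple next();                       // next tuple, or null when the input is exhausted
}

final class SelectionOp implements PhysicalOperator {
  private final PhysicalOperator child;
  private final Predicate<Tuple> qual;

  SelectionOp(PhysicalOperator child, Predicate<Tuple> qual) {
    this.child = child;
    this.qual = qual;
  }

  @Override
  public Tuple next() {
    Tuple t;
    // one next() call (plus the calls inside qual) for every input tuple:
    // many function calls and branches, poor CPU pipelining, bad cache behaviour
    while ((t = child.next()) != null) {
      if (qual.test(t)) {
        return t;
      }
    }
    return null;
  }
}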
Performance Degradation
Tajo also uses:
• Immutable Datum classes wrapping Java primitives
– Used in expression evaluation and serialization
• Expression trees
– Each primitive operator evaluation involves a function call
Resulting in:
• Object creation overheads
• Big memory footprint (particularly inefficient for in-memory operations)
Benchmark Breakdown
• TPC-H Q1:
select
l_returnflag, l_linestatus, sum(l_quantity) as sum_qty,
sum(l_extendedprice) as sum_base_price,
sum(l_extendedprice*(1-l_discount)) as sum_disc_price,
sum(l_extendedprice*(1-l_discount)*(1+l_tax)) as sum_charge,
avg(l_quantity) as avg_qty,
avg(l_extendedprice) as avg_price, avg(l_discount) as avg_disc,
count(*) as count_order
from
lineitem
where
l_shipdate <= '1998-09-01'
group by
l_returnflag,
l_linestatus
order by
l_returnflag,
l_linestatus
Benchmark Breakdown
• TPC-H dataset (scale factor = 3)
– 17,996,609 (about 18M) rows
• Plain text lineitem table (2.3 GB)
• CSV dataset converted to a Parquet file
– To minimize other factors that may affect CPU cost
– No compression
– 256 MB block size, 1 MB page size
• Single 1GB Parquet file
Benchmark Breakdown
• H/W environment
– CPU: i7-4770 (3.4 GHz), 32 GB RAM
– 1 SATA Disk (WD2003FZEX)
• Read throughput: 105-167MB/s (avg. 144 MB/s)
according to http://hdd.userbenchmark.com.
• Single thread on a single machine
• Directly call next() on the root of the physical operator tree
Benchmark Breakdown
CPU accounts for 50% of total query processing time in TPC-H Q1
[Chart: time breakdown in milliseconds; scan throughput about 100 MB/s]
Benchmark Breakdown
[Chart: time in milliseconds for TPC-H Q1, broken down by FROM lineitem scan, GROUP BY l_returnflag, GROUP BY l_returnflag, l_shipflag, sum(…) x 4, and avg(…) x 3]
Benchmark Analysis
• Much room for improvement
• Each tuple evaluation may involve overheads in the tuple-at-a-time model
– not easy to measure cache misses and branch mispredictions
• Each expression causes non-trivial CPU cost
– Interpretation overheads
– Composite keys seem to degrade performance
• Too many objects created (YourKit profiler analysis)
– Difficult to avoid object creation because all tuples and datum instances used in in-memory operators must be retained
• Hash aggregation
– Java HashMap: effective, but not cheap
– Non-trivial GC time found in other tests when distinct keys > 10M
– Java objects: big memory footprint, cache misses
Our Solution
• Vectorized processing
– Columnar processing on primitive arrays
• JIT compilation helps the vectorization engine
– Eliminates vectorization impediments
• Unsafe-based in-memory structure for vectors
– No object creation
• Unsafe-based Cuckoo hash table
– Fast lookups and no GC
Vectorized Processing
• Originated from database research
– C-Store, MonetDB, and Vectorwise
• Recently adopted in Hive 0.13
• Key ideas:
– Use primitive-type arrays as column values
– Small and simple loop processing
– In-cache processing
– Fewer branches for better CPU pipelining
– SIMD
• SIMD in Java??
• http://hg.openjdk.java.net/hsx/hotspot-main/hotspot/file/tip/src/share/vm/opto/superword.cpp
Vectorized Processing
A relation:
Id   Name  Age
101  abc   22
102  def   37
104  ghi   45
105  jkl   25
108  mno   31
112  pqr   27
114  owx   35

N-ary storage model (NSM): rows stored one after another
[101 abc 22] [102 def 37] [104 ghi 45] [105 jkl 25] [108 mno 31] [112 pqr 27] [114 owx 35]

Decomposition storage model (DSM): each column's values stored contiguously
Id:   101 102 104 105 108 112 114
Name: abc def ghi jkl mno pqr owx
Age:  22 37 45 25 31 27 35
Vectorized Processing
Decomposition storage model: each column processed as one long array (bad cache hits)
Id:   101 102 104 105 108 112 114
Name: abc def ghi jkl mno pqr owx
Age:  22 37 45 25 31 27 35

Vectorized model: each column split into small vector blocks that fit in cache (better cache hits)
vector block A: Id 101 102 104 105 / Name abc def ghi jkl / Age 22 37 45 25
vector block B: Id 108 112 114 / Name mno pqr owx / Age 31 27 35
Vectorized Processing
static void MapAddLongIntColCol(int vecNum, long [] result, long [] col1, int [] col2,
                                int [] selVec) {
  if (selVec == null) {
    // no selection vector: add over the whole vector block
    for (int i = 0; i < vecNum; i++) {
      result[i] = col1[i] + col2[i];
    }
  } else {
    // selection vector: add only at the selected row positions
    int selIdx;
    for (int i = 0; i < vecNum; i++) {
      selIdx = selVec[i];
      result[selIdx] = col1[selIdx] + col2[selIdx];
    }
  }
}
Example: Add primitive for long and int vectors
Vectorized Processing
static void SelLEQLongIntColCol(int vecNum, int [] resSelVec, long [] col1, int [] col2,
                                int [] selVec) {
  if (selVec == null) {
    int selected = 0;
    // branch-free selection: always write the row index, advance only when the predicate holds
    for (int rowIdx = 0; rowIdx < vecNum; rowIdx++) {
      resSelVec[selected] = rowIdx;
      selected += col1[rowIdx] <= col2[rowIdx] ? 1 : 0;
    }
    // 'selected' now holds the number of qualifying rows
  } else {
    // … (variant that consumes an existing selection vector, elided on the slide)
  }
}
Example: Less than equal filter primitive for long and int vectors
Vectorized Processing
An example of vectorized processing: the column values (l_shipdate, l_discount, l_extprice, l_tax, l_returnflag) are split into vector blocks 1-3; within each block, the filter l_shipdate <= '1998-09-01' and the expressions 1 - l_discount and l_extprice * l_tax are evaluated by vectorization primitives, and the results feed the aggregation.
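To make this picture concrete, here is a self-contained sketch (illustrative parameter names, not Tajo code) of how such primitives compose over one vector block: the filter produces a selection vector, and the projection loop then touches only the selected rows. All arrays are assumed to be at least vecNum long.

static void processBlock(int vecNum, long[] shipdate, double[] discount, double[] extPrice,
                         long shipdateLimit, double[] outDiscPrice, int[] selVec) {
  // 1. Filter: l_shipdate <= limit -> selection vector (branch-free, as in SelLEQ... above)
  int selected = 0;
  for (int row = 0; row < vecNum; row++) {
    selVec[selected] = row;
    selected += shipdate[row] <= shipdateLimit ? 1 : 0;
  }

  // 2. Projection: l_extendedprice * (1 - l_discount), only for the selected rows
  for (int i = 0; i < selected; i++) {
    int row = selVec[i];
    outDiscPrice[row] = extPrice[row] * (1.0 - discount[row]);
  }

  // 3. The first 'selected' entries of selVec would then drive the hash aggregation
  //    on (l_returnflag, l_linestatus); omitted here.
}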
Vectorized Processing in Tajo
• Unsafe-based in-memory structure for vectors
– Fast direct memory access
– More opportunities to use byte-level operations
• Vectorization + just-in-time compilation
– Byte code for vectorization primitives generated at runtime
– Significantly reduces branches and interpretation overheads
Unsafe-based In-memory Structure for Vectors
• One memory chunk divided into multiple fixed-length vectors
• Variable-length values stored in pages of a variable area
– Only pointers are stored in the fixed-length vector
• Less data copying and object creation
• Fast direct access
• Easy byte-level operations
– Guava’s FastByteComparisons, which compares two strings via long comparisons
• Forked to access string vectors directly
[Figure: a fixed area holding fixed-length vectors and a variable-length field vector of pointers into the variable area]
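A minimal sketch of the idea, assuming a plain sun.misc.Unsafe chunk backing a fixed-length long vector; the class name is illustrative and this is not Tajo's actual implementation.

import sun.misc.Unsafe;
import java.lang.reflect.Field;

// Off-heap fixed-length long vector: one Unsafe-allocated chunk, no per-value Java objects,
// values addressed by direct pointer arithmetic.
public final class OffHeapLongVector implements AutoCloseable {
  private static final Unsafe UNSAFE = loadUnsafe();
  private final long address;   // base address of the memory chunk
  private final int vecSize;    // number of long slots

  public OffHeapLongVector(int vecSize) {
    this.vecSize = vecSize;
    this.address = UNSAFE.allocateMemory((long) vecSize * Long.BYTES);
  }

  public void put(int idx, long value) { UNSAFE.putLong(address + (long) idx * Long.BYTES, value); }

  public long get(int idx) { return UNSAFE.getLong(address + (long) idx * Long.BYTES); }

  @Override
  public void close() { UNSAFE.freeMemory(address); }

  private static Unsafe loadUnsafe() {
    try {
      Field f = Unsafe.class.getDeclaredField("theUnsafe");
      f.setAccessible(true);
      return (Unsafe) f.get(null);
    } catch (ReflectiveOperationException e) {
      throw new AssertionError(e);
    }
  }
}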
Vectorization + Just-in-time Compilation
• For a single operation type, many type combinations are required:
– INT vector (+,-,*,/,%) INT vector
– INT vector (+,-,*,/,%) INT single value
– INT single value (+,-,*,/,%) INT vector
– INT column (+,-,*,/,%) LONG vector
– …
– FLOAT column …..
• ASM is used to generate Java byte code at runtime for the various primitives (see the sketch below)
– Cheaper code maintenance
– Composite keys for Sort, Groupby, and Hash functions
• Fewer branches and nested loops
• Complex vectorization primitive generation (planned)
– Combining multiple primitives into one primitive
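As a concrete illustration of runtime bytecode generation with ASM, the sketch below emits one member of the LONG = LONG + INT primitive family; the class and method names are illustrative and this is not Tajo's actual generator.

import org.objectweb.asm.ClassWriter;
import org.objectweb.asm.Label;
import org.objectweb.asm.MethodVisitor;
import static org.objectweb.asm.Opcodes.*;

public final class VecPrimitiveGen {

  // Emits a class with: public static void map(int vecNum, long[] result, long[] col1, int[] col2)
  public static byte[] generateMapAddLongIntColCol() {
    ClassWriter cw = new ClassWriter(ClassWriter.COMPUTE_FRAMES | ClassWriter.COMPUTE_MAXS);
    cw.visit(V1_7, ACC_PUBLIC, "GenMapAddLongIntColCol", null, "java/lang/Object", null);

    MethodVisitor mv = cw.visitMethod(ACC_PUBLIC | ACC_STATIC, "map", "(I[J[J[I)V", null, null);
    mv.visitCode();
    mv.visitInsn(ICONST_0);              // int i = 0  (local slot 4)
    mv.visitVarInsn(ISTORE, 4);
    Label loop = new Label();
    Label end = new Label();
    mv.visitLabel(loop);
    mv.visitVarInsn(ILOAD, 4);           // if (i >= vecNum) goto end
    mv.visitVarInsn(ILOAD, 0);
    mv.visitJumpInsn(IF_ICMPGE, end);
    mv.visitVarInsn(ALOAD, 1);           // result
    mv.visitVarInsn(ILOAD, 4);           // i
    mv.visitVarInsn(ALOAD, 2);           // col1[i]
    mv.visitVarInsn(ILOAD, 4);
    mv.visitInsn(LALOAD);
    mv.visitVarInsn(ALOAD, 3);           // (long) col2[i]
    mv.visitVarInsn(ILOAD, 4);
    mv.visitInsn(IALOAD);
    mv.visitInsn(I2L);
    mv.visitInsn(LADD);                  // col1[i] + col2[i]
    mv.visitInsn(LASTORE);               // result[i] = ...
    mv.visitIincInsn(4, 1);              // i++
    mv.visitJumpInsn(GOTO, loop);
    mv.visitLabel(end);
    mv.visitInsn(RETURN);
    mv.visitMaxs(0, 0);                  // recomputed by COMPUTE_MAXS
    mv.visitEnd();
    cw.visitEnd();
    return cw.toByteArray();             // define via a ClassLoader and invoke reflectively
  }
}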
Unsafe-based Cuckoo Hash Table
• Advantages of a cuckoo hash table
– Uses multiple hash functions
– No linked lists
– Only one item per bucket
– Worst-case constant lookup time
• Single direct memory allocation for the whole hash table
– Indexed chunks used as buckets
– No GC overheads, even when rehashing all buckets
• Simple and fast lookup (a sketch follows below)
• The current implementation only supports fixed-length hash buckets
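For illustration, here is a minimal on-heap sketch of cuckoo lookup and insert with two hash tables; the class name and hash constants are illustrative, and Tajo's version stores fixed-length buckets off-heap via Unsafe.

public final class SimpleCuckooMap {
  private static final int MAX_KICKS = 32;
  private final long[] keys1, keys2, vals1, vals2;
  private final boolean[] used1, used2;
  private final int capacity;

  public SimpleCuckooMap(int capacity) {
    this.capacity = capacity;
    keys1 = new long[capacity]; vals1 = new long[capacity]; used1 = new boolean[capacity];
    keys2 = new long[capacity]; vals2 = new long[capacity]; used2 = new boolean[capacity];
  }

  private int h1(long k) { return (int) ((k * 0x9E3779B97F4A7C15L >>> 33) % capacity); }
  private int h2(long k) { return (int) ((Long.rotateLeft(k, 31) * 0xC2B2AE3D27D4EB4FL >>> 33) % capacity); }

  // Worst-case constant-time lookup: at most two bucket probes.
  public Long get(long key) {
    int i1 = h1(key);
    if (used1[i1] && keys1[i1] == key) return vals1[i1];
    int i2 = h2(key);
    if (used2[i2] && keys2[i2] == key) return vals2[i2];
    return null;
  }

  // Insert by displacing residents between the two tables; false means "rehash needed".
  public boolean put(long key, long value) {
    long k = key, v = value;
    for (int kick = 0; kick < MAX_KICKS; kick++) {
      int i1 = h1(k);
      if (!used1[i1] || keys1[i1] == k) { keys1[i1] = k; vals1[i1] = v; used1[i1] = true; return true; }
      long tk = keys1[i1], tv = vals1[i1];   // evict table-1 resident, try table 2
      keys1[i1] = k; vals1[i1] = v;
      k = tk; v = tv;
      int i2 = h2(k);
      if (!used2[i2] || keys2[i2] == k) { keys2[i2] = k; vals2[i2] = v; used2[i2] = true; return true; }
      tk = keys2[i2]; tv = vals2[i2];        // evict table-2 resident, loop back to table 1
      keys2[i2] = k; vals2[i2] = v;
      k = tk; v = tv;
    }
    return false; // the caller would grow and rehash all buckets
  }
}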
Benchmark Breakdown: Tajo JIT + Vectorized Engine
[Chart: time in milliseconds for TPC-H Q1, broken down by scanning lineitem (throughput 138 MB/s), expression evaluation (projection), hashing group-by key columns, finding all hash bucket ids, and aggregation]
Summary
• Tajo uses join order optimization and re-optimizes specific cases while queries are running
• JIT-based vectorized engine prototype
– Significantly reduces CPU time through:
• Vectorized processing
• Unsafe-based vector in-memory structure
• Unsafe-based Cuckoo hashing
• Future work
– Generating a single complex primitive that processes multiple operators at a time
– Improvements toward production quality
Get Involved!
• We are recruiting contributors!
• General
– http://tajo.apache.org
• Getting Started
– http://tajo.apache.org/docs/0.8.0/getting_started.html
• Downloads
– http://tajo.apache.org/docs/0.8.0/getting_started/downloading_source.html
• Jira – Issue Tracker
– https://issues.apache.org/jira/browse/TAJO
• Join the mailing list
– dev-subscribe@tajo.apache.org
– issues-subscribe@tajo.apache.org
More Related Content

What's hot

Full Text search in Django with Postgres
Full Text search in Django with PostgresFull Text search in Django with Postgres
Full Text search in Django with Postgressyerram
 
SparkR - Play Spark Using R (20160909 HadoopCon)
SparkR - Play Spark Using R (20160909 HadoopCon)SparkR - Play Spark Using R (20160909 HadoopCon)
SparkR - Play Spark Using R (20160909 HadoopCon)wqchen
 
Rethinking Online SPARQL Querying to Support Incremental Result Visualization
Rethinking Online SPARQL Querying to Support Incremental Result VisualizationRethinking Online SPARQL Querying to Support Incremental Result Visualization
Rethinking Online SPARQL Querying to Support Incremental Result VisualizationOlaf Hartig
 
Data correlation using PySpark and HDFS
Data correlation using PySpark and HDFSData correlation using PySpark and HDFS
Data correlation using PySpark and HDFSJohn Conley
 
PostgreSQL and Sphinx pgcon 2013
PostgreSQL and Sphinx   pgcon 2013PostgreSQL and Sphinx   pgcon 2013
PostgreSQL and Sphinx pgcon 2013Emanuel Calvo
 
Apache Spark Structured Streaming for Machine Learning - StrataConf 2016
Apache Spark Structured Streaming for Machine Learning - StrataConf 2016Apache Spark Structured Streaming for Machine Learning - StrataConf 2016
Apache Spark Structured Streaming for Machine Learning - StrataConf 2016Holden Karau
 
Bringing the Semantic Web closer to reality: PostgreSQL as RDF Graph Database
Bringing the Semantic Web closer to reality: PostgreSQL as RDF Graph DatabaseBringing the Semantic Web closer to reality: PostgreSQL as RDF Graph Database
Bringing the Semantic Web closer to reality: PostgreSQL as RDF Graph DatabaseJimmy Angelakos
 
Mapreduce in Search
Mapreduce in SearchMapreduce in Search
Mapreduce in SearchAmund Tveit
 
Data Wars: The Bloody Enterprise strikes back
Data Wars: The Bloody Enterprise strikes backData Wars: The Bloody Enterprise strikes back
Data Wars: The Bloody Enterprise strikes backVictor_Cr
 
Michael Häusler – Everyday flink
Michael Häusler – Everyday flinkMichael Häusler – Everyday flink
Michael Häusler – Everyday flinkFlink Forward
 
User Defined Aggregation in Apache Spark: A Love Story
User Defined Aggregation in Apache Spark: A Love StoryUser Defined Aggregation in Apache Spark: A Love Story
User Defined Aggregation in Apache Spark: A Love StoryDatabricks
 
Sasi, cassandra on the full text search ride At Voxxed Day Belgrade 2016
Sasi, cassandra on the full text search ride At  Voxxed Day Belgrade 2016Sasi, cassandra on the full text search ride At  Voxxed Day Belgrade 2016
Sasi, cassandra on the full text search ride At Voxxed Day Belgrade 2016Duyhai Doan
 
Unsupervised Learning with Apache Spark
Unsupervised Learning with Apache SparkUnsupervised Learning with Apache Spark
Unsupervised Learning with Apache SparkDB Tsai
 
Streaming ML on Spark: Deprecated, experimental and internal ap is galore!
Streaming ML on Spark: Deprecated, experimental and internal ap is galore!Streaming ML on Spark: Deprecated, experimental and internal ap is galore!
Streaming ML on Spark: Deprecated, experimental and internal ap is galore!Holden Karau
 
Performance Tuning and Optimization
Performance Tuning and OptimizationPerformance Tuning and Optimization
Performance Tuning and OptimizationMongoDB
 
Writing MapReduce Programs using Java | Big Data Hadoop Spark Tutorial | Clou...
Writing MapReduce Programs using Java | Big Data Hadoop Spark Tutorial | Clou...Writing MapReduce Programs using Java | Big Data Hadoop Spark Tutorial | Clou...
Writing MapReduce Programs using Java | Big Data Hadoop Spark Tutorial | Clou...CloudxLab
 
Rank Your Results with PostgreSQL Full Text Search (from PGConf2015)
Rank Your Results with PostgreSQL Full Text Search (from PGConf2015)Rank Your Results with PostgreSQL Full Text Search (from PGConf2015)
Rank Your Results with PostgreSQL Full Text Search (from PGConf2015)Jamey Hanson
 
A Deeper Dive into EXPLAIN
A Deeper Dive into EXPLAINA Deeper Dive into EXPLAIN
A Deeper Dive into EXPLAINEDB
 
Journey through high performance django application
Journey through high performance django applicationJourney through high performance django application
Journey through high performance django applicationbangaloredjangousergroup
 

What's hot (20)

Full Text search in Django with Postgres
Full Text search in Django with PostgresFull Text search in Django with Postgres
Full Text search in Django with Postgres
 
SparkR - Play Spark Using R (20160909 HadoopCon)
SparkR - Play Spark Using R (20160909 HadoopCon)SparkR - Play Spark Using R (20160909 HadoopCon)
SparkR - Play Spark Using R (20160909 HadoopCon)
 
Rethinking Online SPARQL Querying to Support Incremental Result Visualization
Rethinking Online SPARQL Querying to Support Incremental Result VisualizationRethinking Online SPARQL Querying to Support Incremental Result Visualization
Rethinking Online SPARQL Querying to Support Incremental Result Visualization
 
Data correlation using PySpark and HDFS
Data correlation using PySpark and HDFSData correlation using PySpark and HDFS
Data correlation using PySpark and HDFS
 
Full Text Search In PostgreSQL
Full Text Search In PostgreSQLFull Text Search In PostgreSQL
Full Text Search In PostgreSQL
 
PostgreSQL and Sphinx pgcon 2013
PostgreSQL and Sphinx   pgcon 2013PostgreSQL and Sphinx   pgcon 2013
PostgreSQL and Sphinx pgcon 2013
 
Apache Spark Structured Streaming for Machine Learning - StrataConf 2016
Apache Spark Structured Streaming for Machine Learning - StrataConf 2016Apache Spark Structured Streaming for Machine Learning - StrataConf 2016
Apache Spark Structured Streaming for Machine Learning - StrataConf 2016
 
Bringing the Semantic Web closer to reality: PostgreSQL as RDF Graph Database
Bringing the Semantic Web closer to reality: PostgreSQL as RDF Graph DatabaseBringing the Semantic Web closer to reality: PostgreSQL as RDF Graph Database
Bringing the Semantic Web closer to reality: PostgreSQL as RDF Graph Database
 
Mapreduce in Search
Mapreduce in SearchMapreduce in Search
Mapreduce in Search
 
Data Wars: The Bloody Enterprise strikes back
Data Wars: The Bloody Enterprise strikes backData Wars: The Bloody Enterprise strikes back
Data Wars: The Bloody Enterprise strikes back
 
Michael Häusler – Everyday flink
Michael Häusler – Everyday flinkMichael Häusler – Everyday flink
Michael Häusler – Everyday flink
 
User Defined Aggregation in Apache Spark: A Love Story
User Defined Aggregation in Apache Spark: A Love StoryUser Defined Aggregation in Apache Spark: A Love Story
User Defined Aggregation in Apache Spark: A Love Story
 
Sasi, cassandra on the full text search ride At Voxxed Day Belgrade 2016
Sasi, cassandra on the full text search ride At  Voxxed Day Belgrade 2016Sasi, cassandra on the full text search ride At  Voxxed Day Belgrade 2016
Sasi, cassandra on the full text search ride At Voxxed Day Belgrade 2016
 
Unsupervised Learning with Apache Spark
Unsupervised Learning with Apache SparkUnsupervised Learning with Apache Spark
Unsupervised Learning with Apache Spark
 
Streaming ML on Spark: Deprecated, experimental and internal ap is galore!
Streaming ML on Spark: Deprecated, experimental and internal ap is galore!Streaming ML on Spark: Deprecated, experimental and internal ap is galore!
Streaming ML on Spark: Deprecated, experimental and internal ap is galore!
 
Performance Tuning and Optimization
Performance Tuning and OptimizationPerformance Tuning and Optimization
Performance Tuning and Optimization
 
Writing MapReduce Programs using Java | Big Data Hadoop Spark Tutorial | Clou...
Writing MapReduce Programs using Java | Big Data Hadoop Spark Tutorial | Clou...Writing MapReduce Programs using Java | Big Data Hadoop Spark Tutorial | Clou...
Writing MapReduce Programs using Java | Big Data Hadoop Spark Tutorial | Clou...
 
Rank Your Results with PostgreSQL Full Text Search (from PGConf2015)
Rank Your Results with PostgreSQL Full Text Search (from PGConf2015)Rank Your Results with PostgreSQL Full Text Search (from PGConf2015)
Rank Your Results with PostgreSQL Full Text Search (from PGConf2015)
 
A Deeper Dive into EXPLAIN
A Deeper Dive into EXPLAINA Deeper Dive into EXPLAIN
A Deeper Dive into EXPLAIN
 
Journey through high performance django application
Journey through high performance django applicationJourney through high performance django application
Journey through high performance django application
 

Viewers also liked

Understanding Query Optimization with ‘regular’ and ‘Exadata’ Oracle
Understanding Query Optimization with ‘regular’ and ‘Exadata’ OracleUnderstanding Query Optimization with ‘regular’ and ‘Exadata’ Oracle
Understanding Query Optimization with ‘regular’ and ‘Exadata’ OracleGuatemala User Group
 
MySQL Optimizer Overview
MySQL Optimizer OverviewMySQL Optimizer Overview
MySQL Optimizer OverviewMYXPLAIN
 
Query Optimization with MySQL 5.6: Old and New Tricks - Percona Live London 2013
Query Optimization with MySQL 5.6: Old and New Tricks - Percona Live London 2013Query Optimization with MySQL 5.6: Old and New Tricks - Percona Live London 2013
Query Optimization with MySQL 5.6: Old and New Tricks - Percona Live London 2013Jaime Crespo
 
Database , 8 Query Optimization
Database , 8 Query OptimizationDatabase , 8 Query Optimization
Database , 8 Query OptimizationAli Usman
 
SQL Joins and Query Optimization
SQL Joins and Query OptimizationSQL Joins and Query Optimization
SQL Joins and Query OptimizationBrian Gallagher
 
Why we chose mongodb for guardian.co.uk
Why we chose mongodb for guardian.co.ukWhy we chose mongodb for guardian.co.uk
Why we chose mongodb for guardian.co.ukGraham Tackley
 

Viewers also liked (6)

Understanding Query Optimization with ‘regular’ and ‘Exadata’ Oracle
Understanding Query Optimization with ‘regular’ and ‘Exadata’ OracleUnderstanding Query Optimization with ‘regular’ and ‘Exadata’ Oracle
Understanding Query Optimization with ‘regular’ and ‘Exadata’ Oracle
 
MySQL Optimizer Overview
MySQL Optimizer OverviewMySQL Optimizer Overview
MySQL Optimizer Overview
 
Query Optimization with MySQL 5.6: Old and New Tricks - Percona Live London 2013
Query Optimization with MySQL 5.6: Old and New Tricks - Percona Live London 2013Query Optimization with MySQL 5.6: Old and New Tricks - Percona Live London 2013
Query Optimization with MySQL 5.6: Old and New Tricks - Percona Live London 2013
 
Database , 8 Query Optimization
Database , 8 Query OptimizationDatabase , 8 Query Optimization
Database , 8 Query Optimization
 
SQL Joins and Query Optimization
SQL Joins and Query OptimizationSQL Joins and Query Optimization
SQL Joins and Query Optimization
 
Why we chose mongodb for guardian.co.uk
Why we chose mongodb for guardian.co.ukWhy we chose mongodb for guardian.co.uk
Why we chose mongodb for guardian.co.uk
 

Similar to Query Optimization and JIT-based Vectorized Execution in Apache Tajo

Hadoop Summit 2014: Query Optimization and JIT-based Vectorized Execution in ...
Hadoop Summit 2014: Query Optimization and JIT-based Vectorized Execution in ...Hadoop Summit 2014: Query Optimization and JIT-based Vectorized Execution in ...
Hadoop Summit 2014: Query Optimization and JIT-based Vectorized Execution in ...Gruter
 
Migrating To PostgreSQL
Migrating To PostgreSQLMigrating To PostgreSQL
Migrating To PostgreSQLGrant Fritchey
 
Microservices, containers, and machine learning
Microservices, containers, and machine learningMicroservices, containers, and machine learning
Microservices, containers, and machine learningPaco Nathan
 
Performance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaPerformance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaCloudera, Inc.
 
Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...
Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...
Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...InfluxData
 
Tajo_Meetup_20141120
Tajo_Meetup_20141120Tajo_Meetup_20141120
Tajo_Meetup_20141120Hyoungjun Kim
 
PostgreSQL Performance Problems: Monitoring and Alerting
PostgreSQL Performance Problems: Monitoring and AlertingPostgreSQL Performance Problems: Monitoring and Alerting
PostgreSQL Performance Problems: Monitoring and AlertingGrant Fritchey
 
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...Julian Hyde
 
Cost-Based Optimizer in Apache Spark 2.2
Cost-Based Optimizer in Apache Spark 2.2 Cost-Based Optimizer in Apache Spark 2.2
Cost-Based Optimizer in Apache Spark 2.2 Databricks
 
Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...
Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...
Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...Databricks
 
The Evolution of a Relational Database Layer over HBase
The Evolution of a Relational Database Layer over HBaseThe Evolution of a Relational Database Layer over HBase
The Evolution of a Relational Database Layer over HBaseDataWorks Summit
 
Полнотекстовый поиск в PostgreSQL за миллисекунды (Олег Бартунов, Александр К...
Полнотекстовый поиск в PostgreSQL за миллисекунды (Олег Бартунов, Александр К...Полнотекстовый поиск в PostgreSQL за миллисекунды (Олег Бартунов, Александр К...
Полнотекстовый поиск в PostgreSQL за миллисекунды (Олег Бартунов, Александр К...Ontico
 
Emerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big DataEmerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big DataRahul Jain
 
Tamir Dresher - DotNet 7 What's new.pptx
Tamir Dresher - DotNet 7 What's new.pptxTamir Dresher - DotNet 7 What's new.pptx
Tamir Dresher - DotNet 7 What's new.pptxTamir Dresher
 
Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018
Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018
Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018Codemotion
 
PostgreSQL 9.4, 9.5 and Beyond @ COSCUP 2015 Taipei
PostgreSQL 9.4, 9.5 and Beyond @ COSCUP 2015 TaipeiPostgreSQL 9.4, 9.5 and Beyond @ COSCUP 2015 Taipei
PostgreSQL 9.4, 9.5 and Beyond @ COSCUP 2015 TaipeiSatoshi Nagayasu
 
FiloDB: Reactive, Real-Time, In-Memory Time Series at Scale
FiloDB: Reactive, Real-Time, In-Memory Time Series at ScaleFiloDB: Reactive, Real-Time, In-Memory Time Series at Scale
FiloDB: Reactive, Real-Time, In-Memory Time Series at ScaleEvan Chan
 
Типы данных JSONb, соответствующие индексы и модуль jsquery – Олег Бартунов, ...
Типы данных JSONb, соответствующие индексы и модуль jsquery – Олег Бартунов, ...Типы данных JSONb, соответствующие индексы и модуль jsquery – Олег Бартунов, ...
Типы данных JSONb, соответствующие индексы и модуль jsquery – Олег Бартунов, ...Yandex
 
PostgreSQL Moscow Meetup - September 2014 - Oleg Bartunov and Alexander Korotkov
PostgreSQL Moscow Meetup - September 2014 - Oleg Bartunov and Alexander KorotkovPostgreSQL Moscow Meetup - September 2014 - Oleg Bartunov and Alexander Korotkov
PostgreSQL Moscow Meetup - September 2014 - Oleg Bartunov and Alexander KorotkovNikolay Samokhvalov
 
VMworld 2013: Deep Dive into vSphere Log Management with vCenter Log Insight
VMworld 2013: Deep Dive into vSphere Log Management with vCenter Log InsightVMworld 2013: Deep Dive into vSphere Log Management with vCenter Log Insight
VMworld 2013: Deep Dive into vSphere Log Management with vCenter Log InsightVMworld
 

Similar to Query Optimization and JIT-based Vectorized Execution in Apache Tajo (20)

Hadoop Summit 2014: Query Optimization and JIT-based Vectorized Execution in ...
Hadoop Summit 2014: Query Optimization and JIT-based Vectorized Execution in ...Hadoop Summit 2014: Query Optimization and JIT-based Vectorized Execution in ...
Hadoop Summit 2014: Query Optimization and JIT-based Vectorized Execution in ...
 
Migrating To PostgreSQL
Migrating To PostgreSQLMigrating To PostgreSQL
Migrating To PostgreSQL
 
Microservices, containers, and machine learning
Microservices, containers, and machine learningMicroservices, containers, and machine learning
Microservices, containers, and machine learning
 
Performance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaPerformance Optimizations in Apache Impala
Performance Optimizations in Apache Impala
 
Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...
Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...
Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...
 
Tajo_Meetup_20141120
Tajo_Meetup_20141120Tajo_Meetup_20141120
Tajo_Meetup_20141120
 
PostgreSQL Performance Problems: Monitoring and Alerting
PostgreSQL Performance Problems: Monitoring and AlertingPostgreSQL Performance Problems: Monitoring and Alerting
PostgreSQL Performance Problems: Monitoring and Alerting
 
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
 
Cost-Based Optimizer in Apache Spark 2.2
Cost-Based Optimizer in Apache Spark 2.2 Cost-Based Optimizer in Apache Spark 2.2
Cost-Based Optimizer in Apache Spark 2.2
 
Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...
Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...
Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...
 
The Evolution of a Relational Database Layer over HBase
The Evolution of a Relational Database Layer over HBaseThe Evolution of a Relational Database Layer over HBase
The Evolution of a Relational Database Layer over HBase
 
Полнотекстовый поиск в PostgreSQL за миллисекунды (Олег Бартунов, Александр К...
Полнотекстовый поиск в PostgreSQL за миллисекунды (Олег Бартунов, Александр К...Полнотекстовый поиск в PostgreSQL за миллисекунды (Олег Бартунов, Александр К...
Полнотекстовый поиск в PostgreSQL за миллисекунды (Олег Бартунов, Александр К...
 
Emerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big DataEmerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big Data
 
Tamir Dresher - DotNet 7 What's new.pptx
Tamir Dresher - DotNet 7 What's new.pptxTamir Dresher - DotNet 7 What's new.pptx
Tamir Dresher - DotNet 7 What's new.pptx
 
Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018
Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018
Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018
 
PostgreSQL 9.4, 9.5 and Beyond @ COSCUP 2015 Taipei
PostgreSQL 9.4, 9.5 and Beyond @ COSCUP 2015 TaipeiPostgreSQL 9.4, 9.5 and Beyond @ COSCUP 2015 Taipei
PostgreSQL 9.4, 9.5 and Beyond @ COSCUP 2015 Taipei
 
FiloDB: Reactive, Real-Time, In-Memory Time Series at Scale
FiloDB: Reactive, Real-Time, In-Memory Time Series at ScaleFiloDB: Reactive, Real-Time, In-Memory Time Series at Scale
FiloDB: Reactive, Real-Time, In-Memory Time Series at Scale
 
Типы данных JSONb, соответствующие индексы и модуль jsquery – Олег Бартунов, ...
Типы данных JSONb, соответствующие индексы и модуль jsquery – Олег Бартунов, ...Типы данных JSONb, соответствующие индексы и модуль jsquery – Олег Бартунов, ...
Типы данных JSONb, соответствующие индексы и модуль jsquery – Олег Бартунов, ...
 
PostgreSQL Moscow Meetup - September 2014 - Oleg Bartunov and Alexander Korotkov
PostgreSQL Moscow Meetup - September 2014 - Oleg Bartunov and Alexander KorotkovPostgreSQL Moscow Meetup - September 2014 - Oleg Bartunov and Alexander Korotkov
PostgreSQL Moscow Meetup - September 2014 - Oleg Bartunov and Alexander Korotkov
 
VMworld 2013: Deep Dive into vSphere Log Management with vCenter Log Insight
VMworld 2013: Deep Dive into vSphere Log Management with vCenter Log InsightVMworld 2013: Deep Dive into vSphere Log Management with vCenter Log Insight
VMworld 2013: Deep Dive into vSphere Log Management with vCenter Log Insight
 

More from DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

More from DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Recently uploaded

Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 

Recently uploaded (20)

Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 

Query Optimization and JIT-based Vectorized Execution in Apache Tajo

  • 1. Query Optimization and JIT-based Vectorized Execution in Apache Tajo Hyunsik Choi Research Director, Gruter Hadoop Summit North America 2014
  • 2. Talk Outline • Introduction to Apache Tajo • Key Topics – Query Optimization in Apache Tajo • Join Order Optimization • Progressive Optimization – JIT-based Vectorized Engines
  • 3. About Me • Hyunsik Choi (pronounced “Hyeon-shick Cheh”) • PhD (Computer Science & Engineering, 2013), Korea Uni. • Director of Research, Gruter Corp, Seoul, South Korea • Open-source Involvement – Full-time contributor to Apache Tajo (2013.6 ~ ) – Apache Tajo PMC member and committer (2013.3 ~ ) – Apache Giraph PMC member and committer (2011. 8 ~ ) • Contact Info – Email: hyunsik@apache.org – Linkedin: http://linkedin.com/in/hyunsikchoi/
  • 4. Apache Tajo • Open-source “SQL-on-H” “Big DW” system • Apache Top-level project since March 2014 • Supports SQL standards • Low latency, long running batch queries • Features – Supports Joins (inner and all outer), Groupby, and Sort – Window function – Most SQL data types supported (except for Decimal) • Recent 0.8.0 release – https://blogs.apache.org/tajo/entry/apache_tajo_0_8_0
  • 7. Optimization in Tajo Query Optimization Steps
  • 8. Logical Plan Optimization in Tajo • Rewrite Rules – Projection Push Down • push expressions to operators lower as possible • narrow read columns • remove duplicated expressions – if some expressions has common expression – Selection Push Down • reduce rows to be processed earlier as possible – Extensible Rewrite rule interfaces • Allow developers to write their own rewrite rules • Join order optimization – Enumerate possible join orders – Determine the optimized join order in greedy manner – Currently, we use simple cost-model using table volumes.
  • 9. Join Optimization - Greedy Operator Ordering Set<LogicalNode> remainRelations = new LinkedHashSet<LogicalNode>(); for (RelationNode relation : block.getRelations()) { remainRelations.add(relation); } LogicalNode latestJoin; JoinEdge bestPair; while (remainRelations.size() > 1) { // Find the best join pair among all joinable operators in candidate set. bestPair = getBestPair(plan, joinGraph, remainRelations); // remainRels = remainRels Ti remainRelations.remove(bestPair.getLeftRelation()); // remainRels = remainRels Tj remainRelations.remove(bestPair.getRightRelation()); latestJoin = createJoinNode(plan, bestPair); remainRelations.add(latestJoin); } findBestOrder() in GreedyHeuristicJoinOrderAlgorithm.java
  • 10. Progressive Optimization (in DAG controller) • Query plans often suboptimal as estimation-based • Progressive Optimization: – Statistics collection over running query in runtime – Re-optimization of remaining plan stages • Optimal ranges and partitions based on operator type (join, aggregation, and sort) in runtime (since v0.2) • In-progress work (planned for 1.0) – Re-optimize join orders – Re-optimize distributed join plan • Symmetric shuffle Join >>> broadcast join – Shrink multiple stages into fewer stages
  • 12. Vectorized Processing - Motivation • So far have focused on I/O throughput • Achieved 70-110MB/s in disk bound queries • Increasing customer demand for faster storages such as S AS disk and SSD • BMT with fast storage indicates performance likely CPU-bound rather than disk-bound • Current execution engine based on tuple-at-a-time approach
  • 13. What is Tuple-at-a-time model? • Every physical operator produces a tuple by recursively calling next() of child operators tuples next() call Upside • Simple Interface • All arbitrary operator combinations Downside (performance degradation) • Too many function calls • Too many branches • Bad for CPU pipelining • Bad data/instruction cache hits
  • 14. Performance Degradation Tajo also uses: • Immutable Datum classes wrapping Java primitives – Used in expression evaluation and serialization Resulting in: • Object creation overheads • Big memory footprint (particularly inefficient in-memory op erations) • Expression trees – Each primitive operator evaluation involves function call
  • 15. Benchmark Breakdown • TPC-H Q1: select l_returnflag, l_linestatus, sum(l_quantity) as sum_qty, sum(l_extendedprice) as sum_base_price, sum(l_extendedprice*(1-l_discount)) as sum_disc_price, sum(l_extendedprice*(1-l_discount)*(1+l_tax)) as sum_charge, avg(l_quantity) as avg_qty, avg(l_extendedprice) as avg_price, avg(l_discount) as avg_disc, count(*) as count_order from lineitem where l_shipdate <= '1998-09-01’ group by l_returnflag, l_linestatus order by l_returnflag, l_linestatus
  • 16. Benchmark Breakdown • TPC-H dataset (scale factor = 3) – 17,996,609 (about 18M) rows • Plain text lineitem table (2.3 GB) • CSV dataset >> Parquet format file – To minimize the effectiveness of other factors which may impact CPU cost – No compression – 256MB block size, 1MB pagesize • Single 1GB Parquet file
  • 17. Benchmark Breakdown • H/W environment – CPU i7-4770 (3.4GHz), 32GB Ram – 1 SATA Disk (WD2003FZEX) • Read throughput: 105-167MB/s (avg. 144 MB/s) according to http://hdd.userbenchmark.com. • Single thread and single machine • Directly call next() of the root of physical op erator tree
  • 18. Benchmark Breakdown CPU accounts for 50% total query processing time in TPC-H Q1 milliseconds About 100MB/S
  • 19. Benchmark Breakdown – (chart, in milliseconds) TPC-H Q1 broken into parts: FROM lineitem (scan), GROUP BY l_returnflag, GROUP BY l_returnflag, l_shipflag, the four sum(…) aggregates, and the three avg(…) aggregates.
  • 20. Benchmark Analysis • Much room for improvement • Each tuple evaluation may involve overhead in the tuple-at-a-time model – not easy to measure cache misses and branch mispredictions • Each expression causes non-trivial CPU cost – interpretation overhead – composite keys seem to degrade performance • Too many objects created (YourKit profiler analysis) – difficult to avoid object creation while retaining all tuple and datum instances used in in-memory operators • Hash aggregation – Java HashMap: effective, but not cheap – non-trivial GC time observed in other tests when distinct keys > 10M – Java objects: big memory footprint, cache misses
  • 21. Our Solution • Vectorized processing – columnar processing on primitive arrays • JIT helps the vectorization engine – elimination of vectorization impediments • Unsafe-based in-memory structure for vectors – no object creation • Unsafe-based Cuckoo hash table – fast lookup and no GC
  • 22. Vectorized Processing • Originated in database research – C-Store, MonetDB and Vectorwise • Recently adopted in Hive 0.13 • Key ideas: – Use primitive type arrays as column values – Small and simple loop processing – In-cache processing – Fewer branches for CPU pipelining – SIMD • SIMD in Java?? • http://hg.openjdk.java.net/hsx/hotspot-main/hotspot/file/tip/src/share/vm/opto/superword.cpp
  • 23. Vectorized Processing – (diagram) An example relation (Id, Name, Age) shown under the N-ary storage model (NSM), where each row's column values are stored together, and under the decomposition storage model (DSM), where all values of a column are stored together. A toy code illustration follows below.
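In code terms, a toy illustration (not Tajo's storage classes) of the same three-column relation under the two layouts; the sample values mirror those in the slide's diagram.

class StorageModelsSketch {
  // NSM: each row's values stored together.
  static Object[][] nsm = {
      {101, "abc", 22},
      {102, "def", 37},
      {104, "ghi", 45},
  };

  // DSM: each column's values stored together, in primitive arrays where possible.
  static int[]    ids   = {101, 102, 104};
  static String[] names = {"abc", "def", "ghi"};
  static int[]    ages  = {22, 37, 45};
}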
  • 24. Vectorized Processing – (diagram) Under DSM, each column is one long array, which gives poor cache hits when an expression touches several columns; the vectorized model splits the columns into cache-sized vector blocks (block A, block B in the figure), giving better cache hits.
  • 25. Vectorized Processing
void MapAddLongIntColCol(int vecNum, long[] result, long[] col1, int[] col2, int[] selVec) {
  if (selVec == null) {
    for (int i = 0; i < vecNum; i++) {
      result[i] = col1[i] + col2[i];
    }
  } else {
    int selIdx;
    for (int i = 0; i < vecNum; i++) {
      selIdx = selVec[i];
      result[selIdx] = col1[selIdx] + col2[selIdx];
    }
  }
}
Example: Add primitive for long and int vectors
  • 26. Vectorized Processing
void SelLEQLongIntColCol(int vecNum, int[] resSelVec, long[] col1, int[] col2, int[] selVec) {
  if (selVec == null) {
    int selected = 0;
    for (int rowIdx = 0; rowIdx < vecNum; rowIdx++) {
      resSelVec[selected] = rowIdx;
      selected += col1[rowIdx] <= col2[rowIdx] ? 1 : 0;
    }
  } else {
    …
  }
}
Example: Less-than-or-equal filter primitive for long and int vectors
  • 27. Vectorized Processing – (diagram) An example of vectorized processing for TPC-H Q1: vector blocks 1–3 hold column values for l_shipdate, l_discount, l_extprice, l_tax and l_returnflag; the filter l_shipdate <= '1998-09-01' produces a selection vector, the expressions 1 - l_discount and l_extprice * l_tax are evaluated per block, and the results feed aggregation. A driver-style sketch follows below.
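A small hypothetical driver showing how primitives chain through a selection vector, in the style of the primitives on slides 25–26. The column data, method names, and the variant that returns the selected-row count are all assumptions for illustration, not Tajo's actual engine code.

class VectorizedPipelineSketch {
  // Filter primitive variant that also returns how many rows were selected.
  static int selLEQLongLongColConst(int vecNum, int[] resSelVec, long[] col, long constant) {
    int selected = 0;
    for (int i = 0; i < vecNum; i++) {
      resSelVec[selected] = i;
      selected += col[i] <= constant ? 1 : 0;     // branch-free selection
    }
    return selected;
  }

  // Arithmetic primitive that only touches the selected rows.
  static void mapMulDoubleColCol(int selCount, double[] result, double[] col1, double[] col2, int[] selVec) {
    for (int i = 0; i < selCount; i++) {
      int idx = selVec[i];
      result[idx] = col1[idx] * col2[idx];
    }
  }

  public static void main(String[] args) {
    int vecNum = 4;
    long[]   shipdate = {19980811L, 19990102L, 19980101L, 19981231L};  // dates encoded as longs
    double[] extprice = {100.0, 200.0, 300.0, 400.0};
    double[] tax      = {0.05, 0.08, 0.02, 0.04};
    int[]    selVec   = new int[vecNum];
    double[] charge   = new double[vecNum];

    int n = selLEQLongLongColConst(vecNum, selVec, shipdate, 19980901L);  // l_shipdate <= '1998-09-01'
    mapMulDoubleColCol(n, charge, extprice, tax, selVec);                 // l_extprice * l_tax per selected row
    System.out.println(n + " rows selected");                             // -> 2 rows selected
  }
}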
  • 28. Vectorized Processing in Tajo • Unsafe-based in-memory structure for vectors – Fast direct memory access – More opportunities to use byte-level operations • Vectorization + just-in-time compilation – Bytecode generation for vectorization primitives at runtime – Significantly reduces branches and interpretation overhead
  • 29. Unsafe-based In-memory Structure for Vectors • One memory chunk divided into multiple fixed-length vectors • Variable-length values stored in pages of the variable area – only pointers stored in the fixed-length vector • Less data copying and object creation • Fast direct access • Easy byte-level operations – Guava's FastByteComparisons, which compares two strings via long comparisons • forked it to directly access string vectors (diagram: fixed area, variable area, variable-length field vector pointers; a minimal Unsafe sketch follows below)
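A stripped-down sketch of the idea, not Tajo's actual row block code: a fixed-length long vector backed by off-heap memory through sun.misc.Unsafe, with one allocation, no per-value objects, and access by address arithmetic.

import sun.misc.Unsafe;
import java.lang.reflect.Field;

public class UnsafeLongVectorSketch implements AutoCloseable {
  private static final Unsafe UNSAFE = loadUnsafe();
  private final long address;

  public UnsafeLongVectorSketch(int size) {
    this.address = UNSAFE.allocateMemory(size * 8L);   // one contiguous chunk, 8 bytes per value
  }

  public void put(int idx, long value) { UNSAFE.putLong(address + idx * 8L, value); }
  public long get(int idx)             { return UNSAFE.getLong(address + idx * 8L); }
  @Override public void close()        { UNSAFE.freeMemory(address); }

  private static Unsafe loadUnsafe() {
    try {
      Field f = Unsafe.class.getDeclaredField("theUnsafe");
      f.setAccessible(true);
      return (Unsafe) f.get(null);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException(e);
    }
  }
}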
  • 30. Vectorization + Just-in-time Compilation • For a single operation type, many type combinations are required: – INT vector (+,-,*,/,%) INT vector – INT vector (+,-,*,/,%) INT single value – INT single value (+,-,*,/,%) INT vector – INT column (+,-,*,/,%) LONG vector – … – FLOAT column ….. • ASM used to generate Java bytecode at runtime for the various primitives – cheaper code maintenance – composite keys for sort, group-by, and hash functions • Fewer branches and nested loops (see the specialization example below) • Complex vectorization primitive generation (planned) – combining multiple primitives into one primitive
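To illustrate why composite-key specialization helps, here is the Java equivalent of one hypothetical generated specialization (the engine emits bytecode for such methods with ASM at runtime; the class name, constants, and hash mixing below are made up for illustration). A generic hasher must loop over key columns and branch on their types for every row; a specialization for a (long, int) key pair is a single flat loop with no type dispatch.

// Hypothetical example of what one runtime-generated specialization is equivalent to.
final class GeneratedHashLongIntKeysSketch {
  static void hash(int vecNum, int[] hashes, long[] keyCol1, int[] keyCol2) {
    for (int i = 0; i < vecNum; i++) {
      long h = keyCol1[i] * 31 + keyCol2[i];   // combine the two key columns
      hashes[i] = (int) (h ^ (h >>> 32));      // fold to a 32-bit hash
    }
  }
}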
  • 31. Unsafe-based Cuckoo Hash Table • Advantages of the Cuckoo hash table – use of multiple hash functions – no linked list – only one item in each bucket – worst-case constant lookup time • Single direct memory allocation for a hash table – indexed chunks used as buckets – no GC overhead even when rehashing all buckets • Simple and fast lookup • The current implementation only supports fixed-length hash buckets (a compact cuckoo-hashing sketch follows below)
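A compact sketch of cuckoo hashing itself, on plain Java arrays for readability; Tajo's version places fixed-length buckets in a single Unsafe-allocated chunk, and the hash constants, eviction bound, and class name here are placeholders.

// Simplified cuckoo hash set of long keys: two hash functions, one slot per bucket,
// worst-case constant lookup. Illustrative only; rehash-on-cycle is not shown.
class CuckooLongSetSketch {
  private final long[] table;
  private final boolean[] used;
  private final int capacity;

  CuckooLongSetSketch(int capacity) {
    this.capacity = capacity;
    this.table = new long[capacity];
    this.used = new boolean[capacity];
  }

  private int h1(long k) { return (int) (((k * 0x9E3779B97F4A7C15L) >>> 33) % capacity); }
  private int h2(long k) { return (int) (((Long.rotateLeft(k, 31) * 0xC2B2AE3D27D4EB4FL) >>> 33) % capacity); }

  boolean contains(long key) {                      // at most two probes, no chaining
    int i1 = h1(key), i2 = h2(key);
    return (used[i1] && table[i1] == key) || (used[i2] && table[i2] == key);
  }

  boolean insert(long key) {
    if (contains(key)) return true;
    long cur = key;
    int idx = h1(cur);
    for (int kicks = 0; kicks < 32; kicks++) {      // bounded eviction chain
      if (!used[idx]) { table[idx] = cur; used[idx] = true; return true; }
      long evicted = table[idx];                    // kick out the current occupant
      table[idx] = cur;
      cur = evicted;
      idx = (idx == h1(cur)) ? h2(cur) : h1(cur);   // move occupant to its other bucket
    }
    return false;                                   // a real implementation would rehash here
  }
}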
  • 32. Benchmark Breakdown: Tajo JIT + Vectorized Engine – (chart, in milliseconds) TPC-H Q1 broken into: scanning lineitem (throughput 138 MB/s), expression evaluation (projection), hashing group-by key columns, finding all hash bucket ids, and aggregation.
  • 33. Summary • Tajo performs join order optimization and re-optimizes special cases while queries run • JIT-based vectorized engine prototype – significantly reduces CPU time through: • vectorized processing • Unsafe-based vector in-memory structure • Unsafe-based Cuckoo hashing • Future work – generation of single complex primitives that process multiple operators at a time – hardening for production use
  • 34. Get Involved! • We are recruiting contributors! • General – http://tajo.apache.org • Getting Started – http://tajo.apache.org/docs/0.8.0/getting_started.html • Downloads – http://tajo.apache.org/docs/0.8.0/getting_started/downloading_source.html • Jira – Issue Tracker – https://issues.apache.org/jira/browse/TAJO • Join the mailing list – dev-subscribe@tajo.apache.org – issues-subscribe@tajo.apache.org

Editor's Notes

  1. Hi all, my name is Hyunsik. I'm going to present two efforts to improve performance in the Tajo project.
  2. As you can see, this is the outline of my talk today. First of all, I'll give a brief introduction to Apache Tajo for those who are not familiar with it. Then I'll describe our efforts to improve Tajo. The first topic is query optimization, and the second is the JIT-based vectorization engine. Even though there are two topics, in this talk I'll mostly cover the JIT-based vectorized engine.
  3. Just a bit about myself. My name is Hyunsik Choi. [pronounce your name very slowly] I’m the Apache Tajo PMC Chair, and I’ve been a full-time contributor since June last year.
  4. If you're not familiar with Apache Tajo, it is an open source "SQL-on-Hadoop" big data warehouse system. Tajo became an Apache Top-Level Project in March this year. Tajo supports SQL standards and is designed for low-latency as well as long-running batch queries. Currently, Tajo supports most standard SQL features, and we've recently added window functions and a datetime type. We released 0.8.0 last month.
  5. Here you can see the overall architecture of Tajo. Currently, a Tajo cluster consists of one Tajo master and a number of workers. The Tajo master acts as a gateway for clients and also manages cluster resources. The QueryMaster is the actual master for each query; it is launched in one of the workers when a query is submitted. The QueryMaster generates plans, optimizes them, and controls the query stages. A local query engine processes the actual data. In this architecture, the TajoMaster is the only single point of failure, so we are working on TajoMaster HA.
  6. This figure shows how a query statement is transformed into a distributed execution plan. As you can see, the transformation goes through multiple steps, including several optimization steps. In this talk, I'll explain the two optimization steps performed in the logical plan optimizer and the DAG controller.
  7. Roughly speaking, the logical optimizer rewrites queries and determines the join order. For rewriting, projection push-down and selection push-down are performed in this step. Projection push-down narrows the width of processed values as early as possible, and selection push-down reduces the number of rows to be processed as early as possible. Both significantly reduce intermediate data. This step also determines the join order.
  8. The main goal of the join ordering algorithm is to find the best join order among all possible join orders. For this, we use a heuristic called greedy operator ordering. First, the algorithm adds all relations to the remaining-relation set. Then, in a loop, it finds the best join pair within that set. The best join pair is inserted back into the set as a join node, and this continues iteratively until all relations are joined. Finally, the sequence of selected best pairs is used as the optimized join order.
  9. As you may be aware, query optimization is based on statistics-based estimation or rules, so the resulting plans can often be suboptimal. Unlike in an OLTP system, suboptimal analytical queries on big data may run for hours instead of minutes. To address this problem, Tajo tries to re-optimize the query plan at runtime by collecting statistics from completed tasks. In the current implementation, Tajo uses the collected information to determine the proper ranges and/or number of partitions of a query. We are also working on re-optimizing the join order and the distributed join strategy.
  10. The second subject is the prototype of the next-generation execution engine we are working on.
  11. So far, in terms of performance, we have focused on I/O throughput, and we have already achieved high throughput: in I/O-bound queries, processing throughput reaches 70 to 110 MB/s. However, due to increasing customer demand for faster storage, we carried out benchmark tests on such storage. Interestingly, we observed that performance is likely to be CPU-bound rather than I/O-bound. From this observation, we started investigating the issue. First, we noted that the current execution engine is based on a tuple-at-a-time approach.
  12. What is the tuple-at-a-time model? This is the traditional query evaluation model, and many databases still use it. In tuple-at-a-time, each operator has a next() function. Arbitrary physical operators are assembled into one physical operator tree, and query evaluation is performed by iteratively calling next() on the root operator. Tuple-at-a-time is sufficient for OLTP systems, but it is unsuited to analytical processing systems. This is because, as you can see, it involves an iterative function call from the root of the physical operator tree, which then invokes the next() functions of its descendant operators recursively, meaning that in order to get one tuple the call path is likely to be very deep. Also, each operator has to retain its operator state and plan information. As a result, it often causes data/instruction cache misses, and this approach involves many branches, which impedes CPU pipelining.
  13. Further to this, the current implementation has additional performance degradation factors. Tajo uses Datum classes to wrap Java primitive types. According to our profiling, this results in many object creations and a big memory footprint, especially in in-memory operators. Tajo also uses an expression tree to represent and evaluate expressions, which causes interpretation overhead; it has a problem similar to tuple-at-a-time.
  14. In order to investigate the related bottlenecks, we carried out various benchmark tests. In this talk, I'll show one benchmark breakdown using TPC-H Q1, shown in the slide. This query is commonly used to evaluate the computation throughput of a data warehouse system. As you can see, the query contains 8 aggregation functions and nested expressions. It has a simple filter, as well as group by and order by clauses, yet it produces only 4 group-by keys. So this query is good for revealing CPU time while minimizing other factors like GC.
  15. For this benchmark, we used the TPC-H lineitem table with SF = 3. TPC-H generates a plain text table similar to a CSV file. Parsing this kind of file consumes non-trivial CPU time, so, in order to minimize that cost, we converted it to a Parquet format file with the options shown, and we used that file in our benchmark.
  16. The HW environment for the test was as follows. To eliminate interference, we executed the query in a single thread and on a single machine, and we directly invoked the next() function of the root of the physical operator tree. In the benchmark we still used a SATA disk, because most people are more familiar with I/O throughput on disks, and it is enough to expose our problem.
  17. TPC-H Q1 took about 22 seconds. To expose only CPU time, we also performed just a full scan. The scan alone took about 10 seconds of the total time, so we can assume that the total CPU time was about 12 seconds, slightly more than half of the entire query processing time. Of course, it's natural that complex queries consume CPU, but we see room for improvement.
  18. In order to investigate this in more detail, we broke the TPC-H Q1 query into parts. After slightly modifying some parts of the physical executors, we measured each part separately. Although there are only four distinct grouping keys, grouping alone consumes more than 50% of the entire CPU cost. According to my analysis, this cost includes group-by key hashing, hash lookup, and tuple navigation. We can also see that each aggregation function and its nested expressions consume non-trivial costs.
  19. In addition to that benchmark breakdown, we also analyzed query performance using a profiler. From the benchmark and the profiler, we were able to observe the following. The query performance is definitely affected by tuple-at-a-time, but in this benchmark it is not easy to measure cache misses and branch mispredictions; we will come back to this later. Also, each expression consumes significant CPU cost. In the current approach, too many objects are created, which is particularly hard to avoid in in-memory operators. Our hash aggregation also uses Java HashMap; it works well, but it's not cheap. In an additional benchmark, we also observed that rehashing frequently causes GC and that Java objects have a large memory footprint.
  20. Okay, I have mentioned the performance degradation points to overcome; now I'm going to present our approaches to solving these problems. First, we are going to adopt vectorized processing, which evaluates expressions and relational operators in a columnar way. We will also use runtime bytecode generation for the vectorization primitives and to eliminate some vectorization impediments. We have also designed an Unsafe-based in-memory structure for vectors and tuples, as well as an Unsafe-based Cuckoo hash table.
  21. As you are probably aware, vectorization originated in database research such as C-Store, MonetDB and Vectorwise. Recently, Hive has also adopted this approach. The central idea of vectorization is to use primitive arrays for column chunks that fit into the CPU L2 cache, together with small, simple loops over those arrays. This approach can significantly reduce branches and makes good use of CPU pipelining. The original research uses SIMD explicitly, but in Java it's impossible to control that manually; we can only try to implement vectorization primitives that satisfy the superword optimization conditions described in the JVM source code.
  22. Before I discuss Tajo's vectorization, I'd like to briefly explain the concept of vectorized processing. First of all, I'm going to explain the two traditional tuple memory structures: the N-ary storage model and the decomposition storage model, called NSM and DSM for short. In NSM, the column values of each row are stored sequentially in memory. In contrast, in the decomposition storage model (DSM), which is an older columnar memory structure, all values of a given column are stored sequentially in memory.
  23. The DSM can also be represented as in the figure on the left. As I mentioned, the DSM model stores all values of a given column sequentially in memory. However, most expressions consume multiple columns, and in DSM each expression evaluation touches the corresponding values of several columns at a time, so DSM is likely to have poor cache hits. To solve this problem, the vectorized model was introduced. In the vectorized model, columns are divided into cache-fitting blocks, and all expression evaluations are performed per vector block.
  24. In vectorized processing, each expression element is evaluated by one primitive. For example, this primitive is an add operator between a long column and an integer column. A selection vector stores an array of indices of filtered values; if a selection vector is given, only the selected rows are computed. As you can see, this approach uses a simple, small loop for each evaluation. There are no branches, and it makes good use of CPU pipelining. Also, thanks to the cache-fitting vectors, all evaluations are in-cache processing.
  25. This is a filter primitive, which produces a selection vector. It checks whether long column values are less than or equal to the corresponding integer column values; the array indices of only the matched rows are stored in the selection vector. This looks like a branch, and it is also represented as a branch in JVM bytecode, but it is not a branch when translated to x86 native code.
  26. In sum, in the vectorized model each primitive consumes vectors and outputs result vectors. When a selection vector is given, each primitive processes only the selected column values. This figure shows an example of the overall flow of vectorized processing.
  27. So far, I have explained the concept of vectorized processing. From now on, I'm going to explain how we implement it. Tajo's vectorized processing model has some additional features and modifications. First, we use an Unsafe-based in-memory structure for vectors, for two main reasons: fast direct memory access, and more opportunities to use byte-level operations. We will also use runtime bytecode generation to support vectorization. While vectorization itself reduces interpretation overhead, some remaining elements still involve it: composite-key handling for group-by or sort, hash functions for composite keys, and tuple navigation are good examples. If we generate these at runtime, we can get more performance benefits.
  28. This slide presents Tajo's in-memory structure for vectors; we call it the vectorized row block. As you can see, one vectorized row block has two memory areas. The fixed area is a single memory space divided into several memory-aligned chunks using offsets and lengths, and each chunk is used as a vector. The variable area consists of a number of pages; multiple variable-length values, such as variable characters, can be stored in a single page, and additional pages are allocated dynamically if needed. A vector for a variable-length type only stores pointers to the actual values in pages. Earlier, I mentioned more opportunities to use byte-level operations. One example is string comparison: we forked Guava's FastByteComparisons to access string vectors directly, comparing string values as long values. Another example is hashing for composite keys, which lets us handle a composite key as a sequence of bytes.
  29. Vectorization itself requires a lot of primitives. Even just for arithmetic and comparison primitives, we need one for every combination of data type and operator type. In some projects, primitives are generated using template techniques. We instead generate them with ASM at runtime, as we believe this is cheaper to maintain. We also generate composite-key and hash-function code in order to eliminate branching and nested loops.
  30. The performance gain of vectorized processing comes from in-cache processing and CPU pipelining, but such benefits are only realized if the other parts are also very efficient. For this reason, we designed an Unsafe-based Cuckoo hash table implementation. Cuckoo hashing is inherently cache-friendly: it does not use linked lists, only one item is stored in each bucket, and it guarantees constant lookup time in the worst case. We implemented the Cuckoo hash table with Unsafe memory allocation; currently, this approach supports only fixed-length hash buckets. For one hash table, we allocate a single memory space and divide it into a number of fixed-sized chunks, each used as a hash bucket. Our Cuckoo hash table implementation is GC-free and very well suited to analytical processing. We also designed a bucket handler interface for custom bucket payloads. This approach does not incur garbage collection even when dealing with a large hash table. Cuckoo hashing is very strong in read-intensive situations due to its fast lookup, and if we exploit its cache locality, it can be several times faster than Java's HashMap. Hash group-by and join do not require deletion; they involve one insertion per key and many lookups, so we believe Cuckoo hashing is well suited to this application.
  31. Here you can see the benchmark results from our enhancements. As you can see, they significantly reduce CPU time. In addition, we were surprised to find that they also improve I/O throughput, which means that even the plain scan was affected by the inefficiencies of the tuple-at-a-time approach.
  32. In sum, Tajo performs join order optimization using greedy operator ordering and re-optimizes the distributed plan at runtime. We have also developed a prototype of a JIT-based vectorization engine that significantly reduces CPU time.