How Apache Drill enables fast analytics over
NoSQL databases and distributed file systems
Aman Sinha
About me
• Apache Drill PMC and Apache Calcite PMC. Past PMC chair of Drill.
• Employment:
– Currently at MapR, Santa Clara. Industry’s leading platform for AI and Analytics.
– Past: ParAccel (columnar DB whose technology powers Amazon Redshift), IBM Silicon
Valley Lab
• Main areas of interest: SQL query processing for RDBMS, NoSQL, Hadoop
• Contact: amansinha@apache.org, Github: https://github.com/amansinha100
Talk Outline
• Motivation
• Brief overview of Drill Architecture
• Improving query performance on NoSQL databases
• Improving query performance on distributed file systems
• Best practices
Motivation
[Diagram] NoSQL DB tables and DFS data (Parquet, CSV, JSON files) populated by data ingest, ETL, and additional data ingest in the same cluster.
● Enterprises often have heterogeneous data sources in the
same cluster
○ Typically, NoSQL distributed databases and flat files in a distributed file
system
Need a Unified Query Capability
[Diagram] A Query Engine layered over the NoSQL DB tables and DFS data (Parquet, CSV, JSON files), providing SQL operations within and across data sources.
Key Challenge
[Diagram] The same Query Engine coupled to the NoSQL DB tables and DFS data (Parquet, CSV, JSON files).
How to make these couplings highly performant in a cluster deployment?
Brief Overview of Drill Architecture
Apache Drill Introduction
• Distributed SQL query engine
– Originally inspired in part by Google Dremel paper
– Became TLP in December 2014. Excerpt from the ASF announcement:
“World's first schema-free SQL query engine brings self-service data exploration to Apache
Hadoop™”
– Supports wide range of data sources in the form of Plugins
• Data sources
– Distributed file systems: HDFS, MapR-FS
– NoSQL databases: HBase, MapR-DB (binary and JSON), MongoDB
– Streaming: Kafka
– JDBC
– Hive (tables are managed by Hive but querying is done through Drill for interactive
performance)
– Others: OpenTSDB, Kudu, Amazon S3
• File data formats: Parquet, CSV, TSV, JSON
• Extensible
Components of a FROM Clause in Drill
SELECT * FROM dfs.yelp.`business.json`
- Storage Plugin (e.g. dfs): file system, HBase, Hive
- Workspace (e.g. yelp): sub-directory, HBase namespace, Hive database
- Table (e.g. `business.json`): pathnames, HBase table, Hive table
Cluster Architecture with Drill
[Diagram] JDBC/ODBC clients, BI tools, and the Web Console connect to a cluster of Drillbits; any Drillbit can act as the Foreman for a query. Each node also hosts the data sources (HDFS, HBase, MapR-DB), exposed to Drill as Storage/Format Plugins. Zookeeper coordinates the cluster and could be co-resident with the Drillbit nodes.
BI Tools
Architecture Summary
• Schema-on-read
• No centralized metastore
• Fully Java based
• In-memory columnar processing
• Mostly off-heap memory management (negligible GC overhead)
• Code generation for run-time operators
• Optimistic, pipelined execution model
• Spill to disk for blocking operations under memory pressure
• Integrated with YARN for resource management
• Provides a strong framework for UDFs
Operator Execution Model
[Diagram] Operator tree from Scan Op to Root Op; an Exchange operator (a Sender/Receiver pair) marks the fragment boundary.
● Iterator based model: PULL model
within a major fragment, PUSH model
across major fragments
○ Several minor fragments (threads)
constitute a major fragment
● Parent operator calls next() on its child
● Data is processed in ‘Record Batches’
(upper bounded to 64K records per
batch)
● Data flow is pipelined until a blocking
operator is encountered (Sort, Hash)
● Operators do run-time code generation
for each new Schema
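To make the pull model concrete, here is a minimal, self-contained Java sketch of a parent operator calling next() on its child and draining record batches until the stream ends. The Batch, Outcome, and Operator types are simplified stand-ins, not Drill’s actual RecordBatch API.

import java.util.Iterator;
import java.util.List;

// Minimal sketch of the pull-based iterator model: each operator exposes next(),
// and a parent pulls record batches from its child until NONE is returned.
// Illustrative only; Drill's real operators use RecordBatch/IterOutcome.
class IteratorModelSketch {

  enum Outcome { OK, NONE }                // simplified; Drill also signals schema changes

  static class Batch {                     // stand-in for a record batch (<= 64K records)
    final List<Integer> records;
    Batch(List<Integer> records) { this.records = records; }
  }

  interface Operator {
    Outcome next();                        // pull the next batch from this operator
    Batch currentBatch();                  // batch produced by the last successful next()
  }

  // Leaf operator: emits batches from an in-memory "scan".
  static class ScanOp implements Operator {
    private final Iterator<Batch> source;
    private Batch current;
    ScanOp(List<Batch> batches) { this.source = batches.iterator(); }
    public Outcome next() {
      if (!source.hasNext()) return Outcome.NONE;
      current = source.next();
      return Outcome.OK;
    }
    public Batch currentBatch() { return current; }
  }

  // Non-blocking parent operator: pulls from its child and transforms batch-at-a-time.
  static class FilterOp implements Operator {
    private final Operator child;
    private Batch current;
    FilterOp(Operator child) { this.child = child; }
    public Outcome next() {
      Outcome o = child.next();            // parent calls next() on its child
      if (o == Outcome.NONE) return o;
      List<Integer> kept = child.currentBatch().records.stream()
          .filter(v -> v > 10).collect(java.util.stream.Collectors.toList());
      current = new Batch(kept);
      return Outcome.OK;
    }
    public Batch currentBatch() { return current; }
  }

  public static void main(String[] args) {
    Operator root = new FilterOp(new ScanOp(List.of(
        new Batch(List.of(5, 15, 25)), new Batch(List.of(7, 42)))));
    while (root.next() == Outcome.OK) {    // the root drives the pipeline by pulling
      System.out.println(root.currentBatch().records);
    }
  }
}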
Drill Web UI with Operator Profiles
Columnar In-memory Format
[Diagram] A RecordBatch consists of a BatchSchema (a list of MaterializedField) and a set of ValueVectors.
● Predecessor to Apache Arrow
● MaterializedField has a name and
a data type
● Value Vectors are facades to byte buffers
provided by underlying Netty
● Fixed-width, Variable-width value
vectors
● Various data types
● 3 sub types of Value Vectors
○ Optional (Nullable)
○ Required (Non-Nullable)
○ Repeated
● Code generation: Each operator (except
Scan) generates Java code at run-time
based on RecordBatch schema
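As a rough illustration of the ‘Optional (Nullable)’ fixed-width case, a small Java sketch of a nullable int vector backed by a values array plus a validity bitmap. This is a simplification; Drill’s real value vectors are facades over Netty byte buffers.

// Simplified sketch of a nullable (Optional) fixed-width value vector:
// a dense array of values plus a validity bitmap, similar in spirit to
// a nullable int vector but without the Netty-backed byte buffers.
class NullableIntVectorSketch {
  private final int[] values;
  private final long[] validity;           // 1 bit per slot

  NullableIntVectorSketch(int capacity) {
    values = new int[capacity];
    validity = new long[(capacity + 63) / 64];
  }

  void set(int index, int value) {
    values[index] = value;
    validity[index >> 6] |= 1L << (index & 63);
  }

  void setNull(int index) {
    validity[index >> 6] &= ~(1L << (index & 63));
  }

  boolean isSet(int index) {
    return (validity[index >> 6] & (1L << (index & 63))) != 0;
  }

  Integer get(int index) {
    return isSet(index) ? values[index] : null;
  }

  public static void main(String[] args) {
    NullableIntVectorSketch v = new NullableIntVectorSketch(4);
    v.set(0, 42);
    v.setNull(1);
    v.set(2, 7);
    for (int i = 0; i < 3; i++) System.out.println(i + " -> " + v.get(i));
  }
}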
Data Locality and Parallelization
● Drill tries to ensure that scan threads are co-located with the
data file
● Encapsulated as ‘affinity’
List<EndpointAffinity> getOperatorAffinity();
Core interface methods for parallelization at the scan level
int getMinParallelizationWidth();
int getMaxParallelizationWidth();
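A hedged sketch of how a plugin might derive per-node affinity from block locations: tally the bytes each host holds locally and normalize. The BlockLocation type and the overall shape are illustrative, not Drill’s EndpointAffinity API.

import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative affinity computation: for each host, sum the bytes of data blocks
// it holds locally, then normalize so the planner can prefer co-located scan threads.
// BlockLocation is a hypothetical stand-in for the file system's block metadata.
class AffinitySketch {

  record BlockLocation(String host, long bytes) {}

  static Map<String, Double> computeAffinity(List<BlockLocation> blocks) {
    Map<String, Long> bytesPerHost = new HashMap<>();
    long total = 0;
    for (BlockLocation b : blocks) {
      bytesPerHost.merge(b.host(), b.bytes(), Long::sum);
      total += b.bytes();
    }
    Map<String, Double> affinity = new HashMap<>();
    for (Map.Entry<String, Long> e : bytesPerHost.entrySet()) {
      affinity.put(e.getKey(), e.getValue() / (double) total);  // fraction of data local to host
    }
    return affinity;
  }

  public static void main(String[] args) {
    List<BlockLocation> blocks = List.of(
        new BlockLocation("node1", 256L << 20),
        new BlockLocation("node2", 128L << 20),
        new BlockLocation("node1", 128L << 20));
    System.out.println(computeAffinity(blocks));  // node1 gets ~0.75, node2 ~0.25
  }
}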
Back to the Performance Question …
• Exploiting locality helps reduce network data transfers
• Parallelization improves CPU utilization
• What about disk I/O ?
– Rest of the talk will discuss this in the context of 2 types of data
sources:
• NoSQL distributed databases
• Distributed file systems (focus on Parquet format)
Improving query performance on NoSQL
databases
Secondary Index: Background
• HBase and MapR-DB tables have a primary key (row key) column
– Values in this column are sorted
– Efficient range pruning is done for rowkey predicates:
WHERE rowkey BETWEEN 150 AND 350
• Values of secondary columns (e.g. ‘State’) are not sorted
– The predicate WHERE state = ‘CA’ needs a full table scan!
• Solution? Create secondary index ‘tables’
– The PK of the index table is a concatenation: state + rowkey:
• (AZ_500), … (CA_250), (CA_300), ...
[Diagram] Rowkeys (100, 200, 300, 400, ... 1000) are range-partitioned into regions; a rowkey predicate lets the scan sequentially read only the matching range.
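The idea can be illustrated with two sorted maps: the primary table keyed by rowkey, and an index ‘table’ keyed by the concatenation state + rowkey. A prefix range scan over the index finds the matching rowkeys without a full scan. This is a conceptual Java sketch, not the HBase or MapR-DB API.

import java.util.Map;
import java.util.TreeMap;

// Conceptual sketch of a secondary index as a second sorted table whose key is
// the concatenation of the indexed column and the rowkey (e.g. "CA_000250").
// A prefix range scan on the index replaces a full scan of the primary table.
class SecondaryIndexSketch {
  public static void main(String[] args) {
    TreeMap<String, String> primary = new TreeMap<>();    // rowkey -> state
    primary.put("000250", "CA");
    primary.put("000300", "CA");
    primary.put("000500", "AZ");

    TreeMap<String, String> index = new TreeMap<>();      // state + "_" + rowkey -> rowkey
    for (Map.Entry<String, String> e : primary.entrySet()) {
      index.put(e.getValue() + "_" + e.getKey(), e.getKey());
    }

    // WHERE state = 'CA': range scan only the "CA_" prefix of the index,
    // then (for a non-covering index) join back to the primary table by rowkey.
    for (String rowkey : index.subMap("CA_", "CA`").values()) {  // '`' sorts just after '_'
      System.out.println(rowkey + " -> " + primary.get(rowkey));
    }
  }
}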
Secondary Index
• NoSQL DBs supporting secondary index
– MongoDB
– MapR-DB JSON
– HBase + Phoenix
– Couchbase
– Cassandra
• What’s missing ?
– Other than HBase + Phoenix, the others don’t have an ANSI SQL interface
– There’s a need for a generalized cost-based index planning and execution framework
– A key requirement:
• Framework must be able to support ‘global’ non-covering indexes, not just covering
index
Leveraging Secondary Index via Drill
• Storage/Format plugin whose backend supports secondary indexing
– Reference implementation is with MapR-DB JSON
• Index metadata is exposed to Drill planner through well defined interfaces
• Statistics (if available) are also exposed
• New run-time operators added for executing index plans
• Drill Planner extends Apache Calcite’s planner with additional rules for index
planning.
– Generates index-access plans and compares them cost-wise (using Apache Calcite’s
Volcano planner) to the full table scan and with each other
• Feature will be available in upcoming Drill release (please follow JIRA:
DRILL-6381, PR: https://github.com/apache/drill/pull/1466)
Types of Queries Eligible for Index Planning
• WHERE clause with local filters
– <, >, =, BETWEEN
– IN, LIKE
– Eligible ANDed conditions
– Eligible ORed conditions
– Certain types of functions, e.g. CAST(zipcode as BIGINT) = 12345 (only if the data
source supports functional indexes)
• ORDER BY
• GROUP BY (using StreamingAggregate)
• JOIN (using MergeJoin)
Leading Prefix Columns and Statistics
• Query predicate: WHERE a > 10 AND b < 30
Composite index key {a, b, c}: leading prefix columns are ‘a’ and ‘b’, hence the full predicate is eligible for index range pruning
Composite index key {a, c, b}: leading prefix column is only ‘a’, hence only a > 10 is eligible for index range pruning
• Statistics
– Index planning relies on statistics exposed by underlying DB (in future Drill will
collect these)
– Each individual conjunct may have an estimated row count
• a > 10 : 2M rows
• b < 30: 5M rows
• Total row count of table: 100M rows
• Combined selectivity = 0.02 * 0.05 = 0.001; the IndexSelector uses these estimates for cost-based analysis
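As a worked example of the arithmetic above (illustrative only, not Drill’s IndexSelector code), combining per-conjunct selectivities to estimate the rows an index range scan would touch:

// Worked example of the selectivity math the IndexSelector relies on
// (illustrative only). Conjunct selectivities multiply under an
// independence assumption: 0.02 * 0.05 = 0.001 of the table.
class SelectivitySketch {
  public static void main(String[] args) {
    long totalRows = 100_000_000L;       // total row count of table
    long rowsAGt10 = 2_000_000L;         // estimate for a > 10
    long rowsBLt30 = 5_000_000L;         // estimate for b < 30

    double selA = rowsAGt10 / (double) totalRows;    // 0.02
    double selB = rowsBLt30 / (double) totalRows;    // 0.05
    double combined = selA * selB;                   // 0.001

    long estimatedRows = Math.round(totalRows * combined);  // ~100,000 rows
    System.out.printf("combined selectivity = %.4f, estimated rows = %d%n",
        combined, estimatedRows);
  }
}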
Covering vs. Non-Covering Indexes
• Covering: All columns referenced in the query are available in the index
– Easier to handle by the planner. Generate an index-only plan.
• Non-Covering: Only a subset of the columns are available in the index
– Needs more supporting infrastructure from planner and executor
• RowKey join
• Restricted (‘skip’ scan)
• Range partitioning with a plugin-specific partitioning function
Join-back to Primary Table (for non-covering indexes)
• SELECT * FROM T WHERE a > 10 AND b < 30
• Composite key index on {a, b}
• How to produce the remaining (‘star’) columns ?
[Plan diagram] The Index Scan (a > 10 AND b < 30) returns row keys; a Range Partition Exchange does ‘bucketing’ of row keys belonging to the same region/tablet (this needs knowledge of the tablet map); the RowKey Join then supplies the row keys to a Restricted (a.k.a. ‘skip’) scan of the primary table, which returns the full rows.
Index Intersection
• SELECT * FROM T WHERE a > 10 AND b < 30
• Suppose single-key indexes exist on ‘a’ and ‘b’
[Plan diagram] Two Index Scans (a > 10 and b < 30) feed an Intersect HashJoin (one side via a Broadcast Exchange) to intersect the row keys; the result goes through a Range Partition Exchange into a RowKey Join with a Restricted Scan of the primary table.
Example: GROUP BY queries
SELECT a, b, SUM(c)
FROM T WHERE ..
GROUP BY a, b
• Suppose composite key index exists on {a, b}
• Planner will create 2 types of plans
– HashAggregate plan which does hashing on {a, b}
– StreamingAggregate plan which relies on sorted input on {a, b}
• The sorted input is provided by the index
• This plan is typically cheaper than the HashAggregate plan
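A brief Java sketch of why the index-provided sort order helps: a streaming aggregate over input sorted on {a, b} keeps only the current group in memory and emits SUM(c) whenever the group key changes. Illustrative only; Drill’s StreamingAggregate operates on record batches.

import java.util.List;

// Streaming aggregation over input already sorted on the group-by key (a, b):
// only the current group's state is held in memory, unlike a hash aggregate.
// Illustrative sketch only.
class StreamingAggSketch {
  record Row(int a, int b, long c) {}

  public static void main(String[] args) {
    List<Row> sorted = List.of(              // assumed sorted on (a, b), e.g. via an index
        new Row(1, 1, 10), new Row(1, 1, 5), new Row(1, 2, 7), new Row(2, 1, 3));

    Integer curA = null, curB = null;
    long sum = 0;
    for (Row r : sorted) {
      if (curA != null && (r.a() != curA || r.b() != curB)) {
        System.out.printf("a=%d b=%d SUM(c)=%d%n", curA, curB, sum);  // emit finished group
        sum = 0;
      }
      curA = r.a(); curB = r.b();
      sum += r.c();
    }
    if (curA != null) System.out.printf("a=%d b=%d SUM(c)=%d%n", curA, curB, sum);
  }
}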
Sample interfaces to be implemented by plugin
● DbGroupScan
○ IndexCollection getSecondaryIndexCollection(RelNode scan)
○ DbGroupScan getRestrictedScan(List<SchemaPath> columns);
○ PartitionFunction getRangePartitionFunction(List<FieldReference> refList)
○ PluginCost getPluginCostModel()
● PluginCost
○ int getSequentialBlockReadCost(GroupScan scan)
○ int getRandomBlockReadCost(GroupScan scan)
● IndexDiscover
○ IndexCollection getTableIndex(String tableName)
● IndexDefinition
○ List<LogicalExpression> getRowKeyColumns()
○ List<LogicalExpression> getIndexColumns()
○ List<LogicalExpression> getNonIndexColumns()
○ Map<LogicalExpression, RelFieldCollation> getCollationMap()
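A rough sketch of the kind of cost model a plugin could supply behind a PluginCost-style interface: sequential block reads cost less than random ones, and the planner weighs a full scan against an index plan with primary-table lookups. The CostModel interface and the constants here are hypothetical, not the MapR-DB reference implementation.

// Hypothetical cost model behind a PluginCost-style interface (constants are
// made up for illustration; the real values come from the storage plugin).
class PluginCostSketch {

  interface CostModel {
    int sequentialBlockReadCost();   // cost units per sequentially read block
    int randomBlockReadCost();       // cost units per randomly read block
  }

  static class SimplePluginCost implements CostModel {
    public int sequentialBlockReadCost() { return 1; }
    public int randomBlockReadCost() { return 4; }   // e.g. random reads ~4x sequential
  }

  // Compare a full table scan (all blocks, sequential) against a non-covering
  // index plan (index blocks sequential + primary-table lookups, random).
  static void compare(CostModel cost, long tableBlocks, long indexBlocks, long lookups) {
    long fullScan = tableBlocks * cost.sequentialBlockReadCost();
    long indexPlan = indexBlocks * cost.sequentialBlockReadCost()
        + lookups * cost.randomBlockReadCost();
    System.out.println("full scan cost = " + fullScan + ", index plan cost = " + indexPlan
        + " -> choose " + (indexPlan < fullScan ? "index plan" : "full scan"));
  }

  public static void main(String[] args) {
    CostModel cost = new SimplePluginCost();
    compare(cost, 1_000_000, 2_000, 50_000);   // selective predicate: index wins
    compare(cost, 1_000_000, 50_000, 400_000); // low selectivity: full scan wins
  }
}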
Improving query performance on Distributed File
Systems
Directory based partition pruning
Orders / 2015 / Jan / 1.parquet
Orders / 2015 / Jan / 2.parquet
dir0 dir1
Orders / 2018 / Aug / 30.parquet
Orders / 2018 / Aug / 31.parquet
Implicit metadata
columns
CREATE VIEW V1 AS
SELECT `dir0` as `year`, `dir1` as `month`,
o_custkey, o_totalprice, o_orderstatus
FROM dfs.`Orders`;
SELECT o_custkey, SUM(o_totalprice)
FROM V1
WHERE `year` = 2018 AND
`month` = ‘July’
GROUP BY o_custkey;
● 2 ways to query the implicit directory
columns in filter conditions
○ Directly: WHERE dir0 = 2018
○ Via a view that aliases them, as in the CREATE VIEW example shown
File based partition pruning (Parquet)
• CREATE TABLE T1 (a, b, c)
PARTITION BY a, b
AS SELECT col1 as a, col2 as b, col3 as c FROM …
• CTAS with Partition-By creates separate files, each file with 1 partition value
• Multiple files may be created for the same partition value
• Can prune entire file (row group) based on filter on partitioning column
Handling Complex Predicates
● Predicates form an expression tree: Boolean operators (AND, OR) over
comparison operators (<, >, =, etc.)
● The leaf nodes involve either partitioning columns, non-partitioning columns, or
constants, e.g. `year` = 2015
● Given arbitrary expression tree, Drill will determine what predicates can be pushed
down safely for partition pruning.
○ OR can only be pushed if both sides are eligible
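A compact Java sketch of the eligibility rule above, over a toy expression tree: a leaf is pushable only if it involves a partitioning column, AND can be pruned on whichever side is pushable, and OR is pushable only if both sides are. Names and structure are illustrative, not Drill’s planner classes.

import java.util.Set;

// Toy expression tree and the pruning-eligibility rule from the slide:
// leaf predicates on partitioning columns are pushable; AND can prune on either
// pushable side; OR is pushable only if both sides are. Illustrative sketch only.
class PredicatePushdownSketch {

  sealed interface Expr permits Leaf, And, Or {}
  record Leaf(String column) implements Expr {}             // e.g. `year` = 2015
  record And(Expr left, Expr right) implements Expr {}
  record Or(Expr left, Expr right) implements Expr {}

  // Returns true if the subtree can contribute to partition pruning.
  static boolean pushable(Expr e, Set<String> partitionCols) {
    if (e instanceof Leaf l) {
      return partitionCols.contains(l.column());
    }
    if (e instanceof And a) {
      // AND is safe even if only one side is pushable: prune on that side.
      return pushable(a.left(), partitionCols) || pushable(a.right(), partitionCols);
    }
    if (e instanceof Or o) {
      return pushable(o.left(), partitionCols) && pushable(o.right(), partitionCols);
    }
    return false;
  }

  public static void main(String[] args) {
    Set<String> parts = Set.of("year", "month");
    Expr ok  = new And(new Leaf("year"), new Leaf("o_custkey"));   // prune on year
    Expr bad = new Or(new Leaf("year"), new Leaf("o_custkey"));    // OR blocks pruning
    System.out.println(pushable(ok, parts));   // true
    System.out.println(pushable(bad, parts));  // false
  }
}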
Parquet Row Group Metadata
"path" : "/Users/asinha/data/table3_meta/0_0_3.parquet",
"length" : 245,
"rowGroups" : [ {
"start" : 4,
"length" : 91,
"rowCount" : 1,
"hostAffinity" : {
"localhost" : 1.0
},
"columns" : [ {
"name" : "`c1`",
"primitiveType" : "INT32",
"originalType" : null,
"nulls" : 0,
"max" : 5,
"min" : 5
}, {
"name" : "`c3`",
"primitiveType" : "BINARY",
"originalType" : "UTF8",
"nulls" : 0,
"max" : {
"bytes" : "IHl5"
},
"min" : {
"bytes" : "IHl5"
}
} ]
● Drill has to read RG metadata during
planning time
○ Get host affinity, stats etc.
● Doesn’t scale for hundreds of thousands of
files
● One solution: Cache the metadata on disk by
running explicit ‘REFRESH TABLE
METADATA <table>’ command
● Can improve planning time by 10x
Parquet Filter Pushdown
• Applied during planning process
– Based on MIN/MAX statistics of the column in Parquet Row Group
– Eliminates row groups after intersecting ANDed filters
• Pushdown is slightly different from partition pruning
– Pushdown is applicable for all columns that have min/max statistics, not just partitioning
columns
• Pushdown applicable for scalar columns and complex type columns
– NOTE: Filters on VARCHAR and DECIMAL types are not currently pushed down, but support
is being added for an upcoming Drill release
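A simplified Java sketch of min/max-based row group elimination for a predicate such as c1 = 5: a row group can be skipped whenever the constant falls outside its [min, max] range. Illustrative only; Drill applies this during planning using the Parquet metadata shown above.

import java.util.List;

// Min/max based row group elimination for an equality or range predicate:
// a row group is kept only if its [min, max] interval can contain matches.
// Illustrative sketch; Drill applies this at planning time from Parquet metadata.
class RowGroupPruningSketch {
  record RowGroupStats(String path, long min, long max) {}

  static boolean mayContainEquals(RowGroupStats rg, long value) {
    return value >= rg.min() && value <= rg.max();
  }

  static boolean mayContainLessThan(RowGroupStats rg, long bound) {
    return rg.min() < bound;           // smallest value already >= bound -> skip
  }

  public static void main(String[] args) {
    List<RowGroupStats> rowGroups = List.of(
        new RowGroupStats("0_0_1.parquet", 1, 4),
        new RowGroupStats("0_0_3.parquet", 5, 5),
        new RowGroupStats("0_0_7.parquet", 6, 90));

    // WHERE c1 = 5 : only the second row group survives
    rowGroups.stream()
        .filter(rg -> mayContainEquals(rg, 5))
        .forEach(rg -> System.out.println("scan " + rg.path()));
  }
}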
Filter Pushdown and Pruning
• Complex Types Support
SELECT * FROM table WHERE col_name.nested_col_int = 23
SELECT * FROM table WHERE col_name.nested_col_bln is not null
SELECT * FROM table WHERE col_name.nested_col_arr[0] > 10
• Limitations:
– Logical optimizations can be applied only for the columns with available statistics
SELECT * FROM table WHERE col_name.nested_col_arr[2] is null
– Other limitations are the same as for the Filter Pushdown for scalar data types
Filter Pushdown and Pruning
● Transitive Closure
SELECT * FROM dfs.`/tmp/first` t1 JOIN dfs.`/tmp/SECOND` t2
ON t1.`MONTH` = t2.`MONTH` WHERE t2.`MONTH` = 4
Predicate pushed
down on both sides
of Join
More examples:
SELECT * FROM t1
JOIN t2 T2 ON t1.a = t2.a WHERE t2.a NOT IN (4, 6)
SELECT * FROM t1
JOIN (SELECT a, b FROM table WHERE a = 1987 AND b = 5) t2
ON t1.a = t2.a AND t1.b = t2.b
Run-Time Filter Pushdown for HashJoin
SELECT a1 FROM fact WHERE b1 IN (SELECT b2 FROM dim WHERE dim_col = ‘CA’)
[Plan diagram] Hash Join (b1 = b2): the probe side scans the ‘Fact’ table and the build side scans the ‘Dim’ table with the filter dim_col = ‘CA’; both sides pass through a Hash Exchange. The qualified b2 values on the build side are used to create a Bloom filter on b1 values, which is pushed to the probe-side scan. Goal: substantially reduce the network transfer done by the probe-side exchange.
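To illustrate the mechanism, a small Java sketch: the build side collects the qualified b2 values into a Bloom filter and the probe-side scan tests each b1 before shipping rows through the exchange. The hashing scheme and sizes are made up for illustration; Drill’s actual run-time filter implementation differs.

import java.util.BitSet;
import java.util.List;

// Sketch of run-time filter pushdown: the hash join's build side (filtered 'dim'
// rows) populates a Bloom filter over the join key; the probe-side ('fact') scan
// tests each key before shipping rows through the exchange. Illustrative only.
class RuntimeFilterSketch {

  static class BloomFilter {
    private final BitSet bits;
    private final int size;
    BloomFilter(int size) { this.bits = new BitSet(size); this.size = size; }
    private int h1(long key) { return Math.floorMod(Long.hashCode(key * 0x9E3779B97F4A7C15L), size); }
    private int h2(long key) { return Math.floorMod(Long.hashCode(key ^ 0xC2B2AE3D27D4EB4FL), size); }
    void add(long key)             { bits.set(h1(key)); bits.set(h2(key)); }
    boolean mightContain(long key) { return bits.get(h1(key)) && bits.get(h2(key)); }
  }

  public static void main(String[] args) {
    // Build side: qualified b2 values from dim rows where dim_col = 'CA'
    List<Long> qualifiedB2 = List.of(11L, 42L, 97L);
    BloomFilter filter = new BloomFilter(1 << 16);
    qualifiedB2.forEach(filter::add);

    // Probe side: only fact rows whose b1 might match are sent through the exchange
    List<Long> factB1 = List.of(5L, 11L, 63L, 97L, 200L);
    factB1.stream()
        .filter(filter::mightContain)   // false positives possible, no false negatives
        .forEach(b1 -> System.out.println("ship fact row with b1 = " + b1));
  }
}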
Best Practices for Parquet Data Layout
• Large number of small files or fewer large files ?
– Recommended to have fewer large files subject to constraints such as Parquet block size
and desired parallelism
• Too many small files (on the order of a few KB) may cause data skew and affect performance
• Too few large files reduce parallelism … need to determine the sweet spot
• Block size considerations
– Best to have the Parquet block size equal to or less than the HDFS/MapR-FS block size
• A larger block size will spread one row group over two or more nodes, incurring remote reads
• Preferred compression type
– Snappy is preferred for performance; gzip saves space but has a higher decompression cost
– Default for Drill: Snappy
Summary
• Apache Drill’s extensible architecture makes it easy to support multiple data
sources
– While exploiting data locality and parallelism suited to the data source
• Fast analytics on NoSQL databases using a new generalized framework to
do index based planning and execution
• Fast analytics on distributed file system tables by doing intelligent partitioning
and filter pushdown
Useful links
• Drill resources
– http://drill.apache.org
– Twitter: @ApacheDrill
– Mailing lists:
• user@drill.apache.org
• dev@drill.apache.org
Get involved with the Drill community!
Q & A

More Related Content

What's hot

A smarter Pig: Building a SQL interface to Apache Pig using Apache Calcite
A smarter Pig: Building a SQL interface to Apache Pig using Apache CalciteA smarter Pig: Building a SQL interface to Apache Pig using Apache Calcite
A smarter Pig: Building a SQL interface to Apache Pig using Apache CalciteJulian Hyde
 
Cost-Based Optimizer Framework for Spark SQL: Spark Summit East talk by Ron H...
Cost-Based Optimizer Framework for Spark SQL: Spark Summit East talk by Ron H...Cost-Based Optimizer Framework for Spark SQL: Spark Summit East talk by Ron H...
Cost-Based Optimizer Framework for Spark SQL: Spark Summit East talk by Ron H...Spark Summit
 
Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth...
Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth...Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth...
Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth...Christian Tzolov
 
SparkSQL: A Compiler from Queries to RDDs
SparkSQL: A Compiler from Queries to RDDsSparkSQL: A Compiler from Queries to RDDs
SparkSQL: A Compiler from Queries to RDDsDatabricks
 
Optimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsOptimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsDatabricks
 
Physical Plans in Spark SQL
Physical Plans in Spark SQLPhysical Plans in Spark SQL
Physical Plans in Spark SQLDatabricks
 
Tactical data engineering
Tactical data engineeringTactical data engineering
Tactical data engineeringJulian Hyde
 
Apache CarbonData:New high performance data format for faster data analysis
Apache CarbonData:New high performance data format for faster data analysisApache CarbonData:New high performance data format for faster data analysis
Apache CarbonData:New high performance data format for faster data analysisliang chen
 
Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...
Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...
Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...Julian Hyde
 
Spark meetup v2.0.5
Spark meetup v2.0.5Spark meetup v2.0.5
Spark meetup v2.0.5Yan Zhou
 
Apache Calcite: One Frontend to Rule Them All
Apache Calcite: One Frontend to Rule Them AllApache Calcite: One Frontend to Rule Them All
Apache Calcite: One Frontend to Rule Them AllMichael Mior
 
Don't optimize my queries, organize my data!
Don't optimize my queries, organize my data!Don't optimize my queries, organize my data!
Don't optimize my queries, organize my data!Julian Hyde
 
Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...
Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...
Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...Databricks
 
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...Julian Hyde
 
Improving Spark SQL at LinkedIn
Improving Spark SQL at LinkedInImproving Spark SQL at LinkedIn
Improving Spark SQL at LinkedInDatabricks
 
Jump Start into Apache Spark (Seattle Spark Meetup)
Jump Start into Apache Spark (Seattle Spark Meetup)Jump Start into Apache Spark (Seattle Spark Meetup)
Jump Start into Apache Spark (Seattle Spark Meetup)Denny Lee
 
Apache Phoenix: Transforming HBase into a SQL Database
Apache Phoenix: Transforming HBase into a SQL DatabaseApache Phoenix: Transforming HBase into a SQL Database
Apache Phoenix: Transforming HBase into a SQL DatabaseDataWorks Summit
 
Recent Developments In SparkR For Advanced Analytics
Recent Developments In SparkR For Advanced AnalyticsRecent Developments In SparkR For Advanced Analytics
Recent Developments In SparkR For Advanced AnalyticsDatabricks
 

What's hot (20)

A smarter Pig: Building a SQL interface to Apache Pig using Apache Calcite
A smarter Pig: Building a SQL interface to Apache Pig using Apache CalciteA smarter Pig: Building a SQL interface to Apache Pig using Apache Calcite
A smarter Pig: Building a SQL interface to Apache Pig using Apache Calcite
 
Cost-Based Optimizer Framework for Spark SQL: Spark Summit East talk by Ron H...
Cost-Based Optimizer Framework for Spark SQL: Spark Summit East talk by Ron H...Cost-Based Optimizer Framework for Spark SQL: Spark Summit East talk by Ron H...
Cost-Based Optimizer Framework for Spark SQL: Spark Summit East talk by Ron H...
 
Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth...
Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth...Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth...
Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth...
 
Producing Spark on YARN for ETL
Producing Spark on YARN for ETLProducing Spark on YARN for ETL
Producing Spark on YARN for ETL
 
SparkSQL: A Compiler from Queries to RDDs
SparkSQL: A Compiler from Queries to RDDsSparkSQL: A Compiler from Queries to RDDs
SparkSQL: A Compiler from Queries to RDDs
 
Optimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsOptimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL Joins
 
Physical Plans in Spark SQL
Physical Plans in Spark SQLPhysical Plans in Spark SQL
Physical Plans in Spark SQL
 
Tactical data engineering
Tactical data engineeringTactical data engineering
Tactical data engineering
 
Apache CarbonData:New high performance data format for faster data analysis
Apache CarbonData:New high performance data format for faster data analysisApache CarbonData:New high performance data format for faster data analysis
Apache CarbonData:New high performance data format for faster data analysis
 
Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...
Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...
Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...
 
Spark meetup v2.0.5
Spark meetup v2.0.5Spark meetup v2.0.5
Spark meetup v2.0.5
 
Apache Calcite: One Frontend to Rule Them All
Apache Calcite: One Frontend to Rule Them AllApache Calcite: One Frontend to Rule Them All
Apache Calcite: One Frontend to Rule Them All
 
Don't optimize my queries, organize my data!
Don't optimize my queries, organize my data!Don't optimize my queries, organize my data!
Don't optimize my queries, organize my data!
 
Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...
Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...
Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...
 
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
 
Spark etl
Spark etlSpark etl
Spark etl
 
Improving Spark SQL at LinkedIn
Improving Spark SQL at LinkedInImproving Spark SQL at LinkedIn
Improving Spark SQL at LinkedIn
 
Jump Start into Apache Spark (Seattle Spark Meetup)
Jump Start into Apache Spark (Seattle Spark Meetup)Jump Start into Apache Spark (Seattle Spark Meetup)
Jump Start into Apache Spark (Seattle Spark Meetup)
 
Apache Phoenix: Transforming HBase into a SQL Database
Apache Phoenix: Transforming HBase into a SQL DatabaseApache Phoenix: Transforming HBase into a SQL Database
Apache Phoenix: Transforming HBase into a SQL Database
 
Recent Developments In SparkR For Advanced Analytics
Recent Developments In SparkR For Advanced AnalyticsRecent Developments In SparkR For Advanced Analytics
Recent Developments In SparkR For Advanced Analytics
 

Similar to How Apache Drill enables fast analytics over NoSQL databases and distributed file systems

10 Reasons to Start Your Analytics Project with PostgreSQL
10 Reasons to Start Your Analytics Project with PostgreSQL10 Reasons to Start Your Analytics Project with PostgreSQL
10 Reasons to Start Your Analytics Project with PostgreSQLSatoshi Nagayasu
 
Hive Evolution: ApacheCon NA 2010
Hive Evolution:  ApacheCon NA 2010Hive Evolution:  ApacheCon NA 2010
Hive Evolution: ApacheCon NA 2010John Sichi
 
Apache Hadoop 1.1
Apache Hadoop 1.1Apache Hadoop 1.1
Apache Hadoop 1.1Sperasoft
 
Strata NY 2018: The deconstructed database
Strata NY 2018: The deconstructed databaseStrata NY 2018: The deconstructed database
Strata NY 2018: The deconstructed databaseJulien Le Dem
 
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...Fwdays
 
Best Practices for Migrating Your Data Warehouse to Amazon Redshift
Best Practices for Migrating Your Data Warehouse to Amazon RedshiftBest Practices for Migrating Your Data Warehouse to Amazon Redshift
Best Practices for Migrating Your Data Warehouse to Amazon RedshiftAmazon Web Services
 
Spark and Couchbase: Augmenting the Operational Database with Spark
Spark and Couchbase: Augmenting the Operational Database with SparkSpark and Couchbase: Augmenting the Operational Database with Spark
Spark and Couchbase: Augmenting the Operational Database with SparkSpark Summit
 
SQL Server 2014 Memory Optimised Tables - Advanced
SQL Server 2014 Memory Optimised Tables - AdvancedSQL Server 2014 Memory Optimised Tables - Advanced
SQL Server 2014 Memory Optimised Tables - AdvancedTony Rogerson
 
Apache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
Apache Kylin: OLAP Engine on Hadoop - Tech Deep DiveApache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
Apache Kylin: OLAP Engine on Hadoop - Tech Deep DiveXu Jiang
 
From flat files to deconstructed database
From flat files to deconstructed databaseFrom flat files to deconstructed database
From flat files to deconstructed databaseJulien Le Dem
 
Hadoop hbase introduction
Hadoop hbase introductionHadoop hbase introduction
Hadoop hbase introductionJakub Stransky
 
Emerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big DataEmerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big DataRahul Jain
 
SQL Server 2014 In-Memory OLTP
SQL Server 2014 In-Memory OLTPSQL Server 2014 In-Memory OLTP
SQL Server 2014 In-Memory OLTPTony Rogerson
 
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and SparkHBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and SparkMichael Stack
 
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...Michael Rys
 
SQL on Hadoop
SQL on HadoopSQL on Hadoop
SQL on Hadoopnvvrajesh
 
Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...
Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...
Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...Data Con LA
 
A Scalable Data Transformation Framework using the Hadoop Ecosystem
A Scalable Data Transformation Framework using the Hadoop EcosystemA Scalable Data Transformation Framework using the Hadoop Ecosystem
A Scalable Data Transformation Framework using the Hadoop EcosystemSerendio Inc.
 
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Databricks
 

Similar to How Apache Drill enables fast analytics over NoSQL databases and distributed file systems (20)

10 Reasons to Start Your Analytics Project with PostgreSQL
10 Reasons to Start Your Analytics Project with PostgreSQL10 Reasons to Start Your Analytics Project with PostgreSQL
10 Reasons to Start Your Analytics Project with PostgreSQL
 
Hive Evolution: ApacheCon NA 2010
Hive Evolution:  ApacheCon NA 2010Hive Evolution:  ApacheCon NA 2010
Hive Evolution: ApacheCon NA 2010
 
Apache Hadoop 1.1
Apache Hadoop 1.1Apache Hadoop 1.1
Apache Hadoop 1.1
 
Strata NY 2018: The deconstructed database
Strata NY 2018: The deconstructed databaseStrata NY 2018: The deconstructed database
Strata NY 2018: The deconstructed database
 
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
 
Best Practices for Migrating Your Data Warehouse to Amazon Redshift
Best Practices for Migrating Your Data Warehouse to Amazon RedshiftBest Practices for Migrating Your Data Warehouse to Amazon Redshift
Best Practices for Migrating Your Data Warehouse to Amazon Redshift
 
Spark and Couchbase: Augmenting the Operational Database with Spark
Spark and Couchbase: Augmenting the Operational Database with SparkSpark and Couchbase: Augmenting the Operational Database with Spark
Spark and Couchbase: Augmenting the Operational Database with Spark
 
SQL Server 2014 Memory Optimised Tables - Advanced
SQL Server 2014 Memory Optimised Tables - AdvancedSQL Server 2014 Memory Optimised Tables - Advanced
SQL Server 2014 Memory Optimised Tables - Advanced
 
Apache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
Apache Kylin: OLAP Engine on Hadoop - Tech Deep DiveApache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
Apache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
 
From flat files to deconstructed database
From flat files to deconstructed databaseFrom flat files to deconstructed database
From flat files to deconstructed database
 
Hadoop hbase introduction
Hadoop hbase introductionHadoop hbase introduction
Hadoop hbase introduction
 
No sql Database
No sql DatabaseNo sql Database
No sql Database
 
Emerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big DataEmerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big Data
 
SQL Server 2014 In-Memory OLTP
SQL Server 2014 In-Memory OLTPSQL Server 2014 In-Memory OLTP
SQL Server 2014 In-Memory OLTP
 
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and SparkHBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark
 
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
 
SQL on Hadoop
SQL on HadoopSQL on Hadoop
SQL on Hadoop
 
Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...
Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...
Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...
 
A Scalable Data Transformation Framework using the Hadoop Ecosystem
A Scalable Data Transformation Framework using the Hadoop EcosystemA Scalable Data Transformation Framework using the Hadoop Ecosystem
A Scalable Data Transformation Framework using the Hadoop Ecosystem
 
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
 

Recently uploaded

Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...OnePlan Solutions
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - InfographicHr365.us smith
 
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyFrank van der Linden
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...Christina Lin
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providermohitmore19
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...soniya singh
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...harshavardhanraghave
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationkaushalgiri8080
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...gurkirankumar98700
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfjoe51371421
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 

Recently uploaded (20)

Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - Infographic
 
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The Ugly
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanation
 
Exploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the ProcessExploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the Process
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdf
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 

How Apache Drill enables fast analytics over NoSQL databases and distributed file systems

  • 1. How Apache Drill enables fast analytics over NoSQL databases and distributed file systems Aman Sinha
  • 2. About me • Apache Drill PMC and Apache Calcite PMC. Past PMC chair of Drill. • Employment: – Currently at MapR, Santa Clara. Industry’s leading platform for AI and Analytics. – Past: ParAccel (columnar DB whose technology powers Amazon Redshift), IBM Silicon Valley Lab • Main areas of interest: SQL query processing for RDBMS, NoSQL, Hadoop • Contact: amansinha@apache.org, Github: https://github.com/amansinha100
  • 3. Talk Outline • Motivation • Brief overview of Drill Architecture • Improving query performance on NoSQL databases • Improving query performance on distributed file systems • Best practices
  • 4. Motivation NoSQL DB tables DFS data (Parquet, CSV, JSON files) Data ingest ETL Additional data ingest ● Enterprises often have heterogeneous data sources in the same cluster ○ Typically, NoSQL distributed databases and flat files in a distributed file system
  • 5. Need a Unified Query Capability NoSQL DB tables DFS data (Parquet, CSV, JSON files) Data ingest ETL Additional data ingest Query Engine Provide SQL operations within and across data sources
  • 6. Key Challenge NoSQL DB tables DFS data (Parquet, CSV, JSON files) Data ingest ETL Additional data ingest Query Engine How to make these couplings highly performant in a cluster deployment ?
  • 7. Brief Overview of Drill Architecture
  • 8. Apache Drill Introduction • Distributed SQL query engine – Originally inspired in part by Google Dremel paper – Became TLP in December 2014. Excerpt from the ASF announcement:: “World's first schema-free SQL query engine brings self-service data exploration to Apache Hadoop™” – Supports wide range of data sources in the form of Plugins • Data sources – Distributed file systems: HDFS, MapR-FS – NoSQL databases: HBase, MapR-DB (binary and JSON), MongoDB – Streaming: Kafka – JDBC – Hive (tables are managed by Hive but querying is done through Drill for interactive performance) – Others: OpenTSDB, Kudu, Amazon S3 • File data formats: Parquet, CSV, TSV, JSON • Extensible
  • 9. - Sub-directory - HBase namespace - Hive database SELECT * FROM dfs.yelp.`business.json` Workspace - Pathnames - HBase table - Hive table Table - File system - HBase - Hive Storage Plugin Components of a FROM Clause in Drill
  • 10. HDFS HBase MapR- DB HDFS HBase MapR- DB DRILLBIT (any drillbit can be Foreman) HDFS HBase MapR- DB DRILLBIT DRILLBIT JDBC/ODBC Client Web Console Data Sources (exposed to Drill as Storage/Format Plugins) Zookeeper (could be co-resident with Drillbit nodes) Cluster Architecture with Drill BI Tools
  • 11. Architecture Summary • Schema-on-read • No centralized metastore • Fully Java based • In-memory columnar processing • Mostly off-heap memory management (negligible GC overhead) • Code generation for run-time operators • Optimistic, pipelined execution model • Spill to disk for blocking operations under memory pressure • Integrated with YARN for resource management • Provides a strong framework for UDFs
  • 12. Operator Execution Model Receiver Sender Fragment boundary An Exchange operator Scan Op Root Op ● Iterator based model: PULL model within a major fragment, PUSH model across major fragments ○ Several minor fragments (threads) constitute a major fragment ● Parent operator calls next() on its child ● Data is processed in ‘Record Batches’ (upper bounded to 64K records per batch) ● Data flow is pipelined until a blocking operator is encountered (Sort, Hash) ● Operators do run-time code generation for each new Schema
  • 13. Drill Web UI with Operator Profiles
  • 14. Columnar In-memory Format RecordBatch BatchSchema (List of MaterializedField) ● Predecessor to Apache Arrow ● MaterializedField has a name and a data type ● Value Vectors are facades to byte buffers provided by underlying Netty ● Fixed-width, Variable-width value vectors ● Various data types ● 3 sub types of Value Vectors ○ Optional (Nullable) ○ Required (Non-Nullable) ○ Repeated ● Code generation: Each operator (except Scan) generates Java code at run-time based on RecordBatch schema ValueVectors
  • 15. Data Locality and Parallelization ● Drill tries to ensure that scan threads are co-located with the data file ● Encapsulated as ‘affinity’ List<EndpointAffinity> getOperatorAffinity(); Core interface methods for parallelization at the scan level int getMinParallelizationWidth(); int getMaxParallelizationWidth();
  • 16. Back to the Performance Question … • Exploiting locality helps reduce network data transfers • Parallelization improves CPU utilization • What about disk I/O ? – Rest of the talk will discuss this in the context of 2 types of data sources: • NoSQL distributed databases • Distributed file systems (focus on Parquet format)
  • 17. Improving query performance on NoSQL databases
  • 18. Secondary Index: Background • HBase and MapR-DB tables have primary key (row key) column – This column values are sorted – Efficient range pruning is done for rowkey predicates: WHERE rowkey BETWEEN 150 AND 350 • Secondary columns (e.g ‘State’) values are not sorted – Predicate WHERE state = ‘CA’ need full table scan ! • Solution ? Create secondary index ‘tables’ – PK of index table is a concatenation: state + rowkey : • (AZ_500), … (CA_250), (CA_300), ... 100 200 300 400 1000 Sequential scan only this range regions rowkey
  • 19. Secondary Index • NoSQL DBs supporting secondary index – MongoDB – MapR-DB JSON – HBase + Phoenix – Couchbase – Cassandra • What’s missing ? – Other than Hbase + Phoenix, others don’t have an ANSI SQL interface – There’s a need for a generalized cost-based index planning and execution framework – A key requirement: • Framework must be able to support ‘global’ non-covering indexes, not just covering index
  • 20. Leveraging Secondary Index via Drill • Storage/Format plugin whose backend supports secondary indexing – Reference implementation is with MapR-DB JSON • Index metadata is exposed to Drill planner through well defined interfaces • Statistics (if available) are also exposed • New run-time operators added for executing index plans • Drill Planner extends Apache Calcite’s planner with additional rules for index planning. – Generates index-access plans and compare them cost-wise (using Apache Calcite’s Volcano planner) to full table scan and with each other • Feature will be available in upcoming Drill release (please follow JIRA: DRILL-6381, PR: https://github.com/apache/drill/pull/1466)
  • 21. Types of Queries Eligible for Index Planning • WHERE clause with local filters – <, >, =, BETWEEN – IN, LIKE – Eligible ANDed conditions – Eligible ORed conditions – Certain types of functions, e.g CAST(zipcode as BIGINT) = 12345 (only if data source supports functional indexes) • ORDER BY • GROUP BY (using StreamingAggregate) • JOIN (using MergeJoin)
  • 22. Leading Prefix Columns and Statistics • Query predicate: WHERE a > 10 AND b < 30 a b c Composite index key a c b Comments Leading prefix columns ‘a’ and ‘b’ hence full predicate is eligible for index range pruning Leading prefix columns only ‘a’ hence only a > 10 eligible for index range pruning • Statistics – Index planning relies on statistics exposed by underlying DB (in future Drill will collect these) – Each individual conjunct may have an estimated row count • a > 10 : 2M rows • b < 30: 5M rows • Total row count of table: 100M rows Selectivity = 0.02 * 0.05 IndexSelector uses these for cost-based analysis
  • 23. Covering vs. Non-Covering Indexes • Covering: All columns referenced in the query are available in the index – Easier to handle by the planner. Generate an index-only plan. • Non-Covering: Only a subset of the columns are available in the index – Needs more supporting infrastructure from planner and executor • RowKey join • Restricted (‘skip’ scan) • Range partitioning with a plugin-specific partitioning function
  • 24. Join-back to Primary Table (for non-covering indexes) • SELECT * FROM T WHERE a > 10 AND b < 30 • Composite key index on {a, b} • How to produce the remaining (‘star’) columns ? Index Scan a > 10 AND b < 30 RowKey Join Range Partition Exchange Restricted (a.k.a ‘skip’ ) scan Return rows Row keys Supply row keys Do ‘bucketing’ of row keys belonging to the same region/tablet (this needs knowledge of the tablet map)
  • 25. Index Intersection • SELECT * FROM T WHERE a > 10 AND b < 30 • Suppose single key index exists on ‘a’ and ‘b’ Index Scan a > 10 Index Scan b < 30 Intersect HashJoin Broadcast Exchange RowKey Join Range Partition Exchange Restricted Scan
  • 26. Example: GROUP BY queries SELECT a, b, SUM(c) FROM T WHERE .. GROUP BY a, b • Suppose composite key index exists on {a, b} • Planner will create 2 types of plans – HashAggregate plan which does hashing on {a, b} – StreamingAggregate plan which relies on sorted input on {a, b} • The sorted input is provided by the index • This plan is typically cheaper than the HashAggregate plan
  • 27. Sample interfaces to be implemented by plugin ● DbGroupScan ○ IndexCollection getSecondaryIndexCollection(RelNode scan) ○ DbGroupScan getRestrictedScan(List<SchemaPath> columns); ○ PartitionFunction getRangePartitionFunction(List<FieldReference> refList) ○ PluginCost getPluginCostModel() ● PluginCost ○ int getSequentialBlockReadCost(GroupScan scan) ○ int getRandomBlockReadCost(GroupScan scan) ● IndexDiscover ○ IndexCollection getTableIndex(String tableName) ● IndexDefinition ○ List<LogicalExpression> getRowKeyColumns() ○ List<LogicalExpression> getIndexColumns() ○ List<LogicalExpression> getNonIndexColumns() ○ Map<LogicalExpression, RelFieldCollation> getCollationMap()
  • 28. Improving query performance on Distributed File Systems
  • 29. Directory based partition pruning Orders / 2015 / Jan / 1.parquet Orders / 2015 / Jan / 2.parquet dir0 dir1 Orders / 2018 / Aug / 30.parquet Orders / 2018 / Aug / 31.parquet Implicit metadata columns CREATE VIEW V1 AS SELECT `dir0` as `year`, `dir1` as `month`, o_custkey, o_totalprice, o_orderstatus FROM dfs.`Orders`; SELECT o_custkey, SUM(o_totalprice) FROM V1 WHERE `year` = 2018 AND `month` = ‘July’ GROUP BY o_custkey; ● 2 ways to query the implicit directory columns in filter conditions ○ WHERE dir0 = 2018 ○ Via a view with aliasing as below
  • 30. File based partition pruning (Parquet) • CREATE TABLE T1 (a, b, c) PARTITION BY a, b AS SELECT col1 as a, col2 as b, col3 as c FROM … • CTAS with Partition-By creates separate files, each file with 1 partition value • Multiple files may be created for the same partition value • Can prune entire file (row group) based on filter on partitioning column
  • 31. Handling Complex Predicates Boolean operator AND, OR Comparison operators <, >, = etc. ● The leaf nodes are either partitioning columns or non-partitioning columns or constants. E.g `year` = 2015 ● Given arbitrary expression tree, Drill will determine what predicates can be pushed down safely for partition pruning. ○ OR can only be pushed if both sides are eligible
  • 32. Parquet Row Group Metadata "path" : "/Users/asinha/data/table3_meta/0_0_3.parquet", "length" : 245, "rowGroups" : [ { "start" : 4, "length" : 91, "rowCount" : 1, "hostAffinity" : { "localhost" : 1.0 }, "columns" : [ { "name" : "`c1`", "primitiveType" : "INT32", "originalType" : null, "nulls" : 0, "max" : 5, "min" : 5 }, { "name" : "`c3`", "primitiveType" : "BINARY", "originalType" : "UTF8", "nulls" : 0, "max" : { "bytes" : "IHl5" }, "min" : { "bytes" : "IHl5" } } ] ● Drill has to read RG metadata during planning time ○ Get host affinity, stats etc. ● Doesn’t scale for hundreds of thousands of files ● One solution: Cache the metadata on disk by running explicit ‘REFRESH TABLE METADATA <table>’ command ● Can improve planning time by 10x
  • 33. Parquet Filter Pushdown • Applied during planning process – Based on MIN/MAX statistics of the column in Parquet Row Group – Eliminates row groups after intersecting ANDed filters • Pushdown is slightly different from partition pruning – Pushdown is applicable for all columns that have min/max statistics, not just partitioning columns • Pushdown applicable for scalar columns and complex type columns – NOTE: Filter on VARCHAR and DECIMAL types are not currently pushed down but support is being added for upcoming Drill release
  • 34. Filter Pushdown and Pruning • Complex Types Support SELECT * FROM table WHERE col_name.nested_col_int = 23 SELECT * FROM table WHERE col_name.nested_col_bln is not null SELECT * FROM table WHERE col_name.nested_col_arr[0] > 10 • Limitations: – Logical optimizations can be applied only for the columns with available statistics SELECT * FROM table WHERE col_name.nested_col_arr[2] is null – Other limitations are the same as for the Filter Pushdown for scalar data types
  • 35. Filter Pushdown and Pruning ● Transitive Closure SELECT * FROM dfs.`/tmp/first` t1 JOIN dfs.`/tmp/SECOND` t2 ON t1.`MONTH` = t2.`MONTH` WHERE t2.`MONTH` = 4 Predicate pushed down on both sides of Join More examples: SELECT * FROM t1 JOIN t2 T2 ON t1.a = t2.a WHERE t2.a NOT IN (4, 6) SELECT * FROM t1 JOIN (SELECT a, b FROM table WHERE a = 1987 AND b = 5) t2 ON t1.a = t2.a AND t1.b = t2.b
  • 36. Run-Time Filter Pushdown for HashJoin SELECT a1 FROM fact WHERE b1 IN (SELECT b2 FROM dim WHERE dim_col = ‘CA’) Hash Join (b1 = b2) Scan ‘Fact’ table Scan ‘Dim’ table Filter dim_col = ‘CA’ Hash Exchange Hash Exchange Build sideProbe side Qualified b2 values Bloom filter on b1 values Goal: Substantially reduce network transfer by this exchange
  • 37. Best Practices for Parquet Data Layout • Large number of small files or fewer large files ? – Recommended to have fewer large files subject to constraints such as Parquet block size and desired parallelism • Too many small files (order of few KBs) may cause data skew and affect performance • Too few large files reduces parallelism … need to determine sweet spot • Block size considerations – Best to have Parquet block size equal or less than HDFS/MapR-FS block size • Larger block size will spread 1 row group over 2 or more nodes incurring remote reads. • Preferred compression type – Snappy is preferred for performance, gzip for space but higher cost to decompress – Default for Drill - snappy
  • 38. Summary • Apache Drill’s extensible architecture makes it easy to support multiple data sources – While exploiting data locality and parallelism suited to the data source • Fast analytics on NoSQL databases using a new generalized framework to do index based planning and execution • Fast analytics on distributed file system tables by doing intelligent partitioning and filter pushdown
  • 39. Useful links • Drill resources – http://drill.apache.org – Twitter: @ApacheDrill – Mailing lists: • user@drill.apache.org • dev@drill.apache.org • Get involved with the Drill community !
  • 40. Q & A