26. Dremel is a scalable, interactive ad-hoc query system for analysis of read-only nested data. By combining multi-level execution trees and columnar data layout, it is capable of running aggregation queries over trillion-row tables in seconds. The system scales to thousands of CPUs and petabytes of data, and has thousands of users at Google.
…
Google’s Dremel
http://research.google.com/pubs/pub36632.html
Sergey Melnik, Andrey Gubarev, Jing Jing Long, Geoffrey Romer, Shiva Shivakumar,
Matt Tolton, Theo Vassilakis, Proc. of the 36th Int'l Conf on Very Large Data Bases
(2010), pp. 330-339
30. Apache Drill – key facts
Inspired by Google’s Dremel
Standard SQL 2003 support
Pluggable data sources
Nested data is a first-class citizen
Schema is optional
Community-driven, open, hundreds involved
34. Principled Query Execution
Source query: what we want to do (analyst friendly)
Logical plan: what we want to do (language agnostic, computer friendly)
Physical plan: how we want to do it (the best way we can tell)
Execution plan: where we want to do it
35. Principled Query Execution
Pipeline: Source Query → Parser → Logical Plan → Optimizer → Physical Plan → Execution
Source query languages: SQL 2003, DrQL, MongoQL, DSL (fed through the parser API)
Optimizer inputs: topology, CF, etc.
Execution reaches the data through the scanner API
Logical plan example (JSON):
query: [
  {
    @id: "log",
    op: "sequence",
    do: [
      { op: "scan", source: "logs" },
      { op: "filter", condition: "x > 3" },
    ]
  }
]
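To make the logical plan concrete, here is a minimal Python sketch that evaluates the scan/filter sequence above over in-memory records. The interpreter and its source registry are invented for illustration (they are not Drill's engine), and the condition "x > 3" is expressed as a Python predicate rather than Drill's expression syntax.

```python
# Hypothetical registry standing in for real storage engines.
SOURCES = {"logs": [{"x": 1}, {"x": 5}, {"x": 9}]}

# The slide's plan, with the condition expressed as a predicate.
plan = {"query": [{
    "@id": "log",
    "op": "sequence",
    "do": [
        {"op": "scan", "source": "logs"},
        {"op": "filter", "condition": lambda rec: rec["x"] > 3},
    ],
}]}

def run(plan):
    """Evaluate a sequence of operators as a simple dataflow."""
    rows = []
    for node in plan["query"]:
        if node["op"] != "sequence":
            raise ValueError("only sequences supported in this sketch")
        for op in node["do"]:
            if op["op"] == "scan":
                rows = list(SOURCES[op["source"]])
            elif op["op"] == "filter":
                rows = [r for r in rows if op["condition"](r)]
    return rows

print(run(plan))  # only the records with x > 3 survive the filter
```

The point is that the logical plan is a pure dataflow description: each operator consumes the output of the previous one, and the optimizer is free to rewrite this chain before anything executes.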
36. Wire-level Architecture
Each node runs a Drillbit, to maximize data locality
Coordination, query planning, execution, etc. are distributed
Any node can act as the endpoint for a query: the foreman
[Diagram: four nodes, each running a Drillbit alongside its local storage and processes]
37. Wire-level Architecture
Curator/Zookeeper for ephemeral cluster membership info
Distributed cache (Hazelcast) for metadata, locality information, etc.
[Diagram: four nodes, each running a Drillbit and a distributed cache instance, coordinated via Curator/Zookeeper]
38. Wire-level Architecture
Originating Drillbit acts as foreman: manages query execution, scheduling, locality information, etc.
Streaming data communication, avoiding SerDe
[Diagram: foreman Drillbit streaming data to and from the other Drillbits, coordinated via Curator/Zookeeper]
39. Wire-level Architecture
Foreman becomes the root of the multi-level execution tree; leaves activate their storage engine interface.
[Diagram: foreman Drillbit as the root of the execution tree, leaf Drillbits reading via their storage engines]
41. On the shoulders of giants …
Jackson for JSON SerDe (metadata)
Typesafe HOCON for configuration and module management
Netty4 as the core RPC engine, protobuf for communication
Vanilla Java, LArray and Netty ByteBuf for off-heap large data structures
Hazelcast for distributed cache
Netflix Curator on top of Zookeeper for the service registry
Optiq for SQL parsing and cost optimization
Parquet (http://parquet.io) / ORC
Janino for expression compilation
ASM for bytecode manipulation
Yammer Metrics for metrics
Guava extensively
Carrot Search HPPC for primitive collections
42. Key features
Full SQL: ANSI SQL 2003
Nested data as a first-class citizen
Optional schema
Extensibility points …
43. Extensibility Points
Source query → parser API
Custom operators, UDFs → logical plan
Serving tree, CF, topology → physical plan/optimizer
Data sources & formats → scanner API
[Diagram: Source Query → Parser → Logical Plan → Optimizer → Physical Plan → Execution]
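To illustrate the last extensibility point, here is a hypothetical sketch of what a pluggable scanner interface can look like: a minimal abstract scanner with two interchangeable sources. The class and method names are invented for illustration and are not Drill's actual scanner API.

```python
import csv
import io
from abc import ABC, abstractmethod

class Scanner(ABC):
    """Hypothetical scanner interface: one adapter per data source/format."""
    @abstractmethod
    def scan(self):
        """Yield records (dicts) from the underlying data source."""

class JsonListScanner(Scanner):
    """Scans an in-memory list of JSON-like records."""
    def __init__(self, records):
        self.records = records
    def scan(self):
        yield from self.records

class CsvScanner(Scanner):
    """Scans CSV text, yielding one dict per row."""
    def __init__(self, text):
        self.text = text
    def scan(self):
        yield from csv.DictReader(io.StringIO(self.text))

# The engine only sees the Scanner interface, so a new format plugs in
# without touching the rest of the execution pipeline.
for rec in CsvScanner("x,y\n1,2\n3,4").scan():
    print(rec)
```

The design point is the same one the slide makes: because execution talks to data only through the scanner abstraction, adding a source or format is an adapter, not an engine change.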
44. User Interfaces
API: DrillClient
Encapsulates endpoint discovery
Supports logical and physical plan submission, query cancellation, and query status
Supports streaming return of results
JDBC driver, translating JDBC calls into DrillClient communication
REST proxy for DrillClient
48. Useful Resources
Getting Started guide:
https://github.com/vrtx/incubatordrill/blob/getting_started/docs/getting_started.rst
Demo HowTo:
https://cwiki.apache.org/confluence/display/DRILL/Demo+HowTo
How to build/install Apache Drill on Ubuntu 13.04:
http://www.confusedcoders.com/bigdata/apachedrill/how-to-build-apache-drill-on-ubuntu-13-04
50. Status
Heavy development by multiple organizations (MapR, Pentaho, Microsoft, ThoughtWorks, XingCloud, etc.)
Currently more than 100k LOC
Alpha available via http://people.apache.org/~jacques/apache-drill1.0.0-m1.rc3/
51. Kudos to …
Julian Hyde, Pentaho
Lisen Mu, XingCloud
Tim Chen, Microsoft
Chris Merrick, RJMetrics
David Alves, UT Austin
Sree Vaadi, SSS
Srihari Srinivasan, ThoughtWorks
Ben Becker, MapR
Jacques Nadeau, MapR
Ted Dunning, MapR
Keys Botzum, MapR
Jason Frantz
Ellen Friedman
Chris Wensel, Concurrent
Gera Shegalov, Oracle
Ryan Rawson, Ohm Data
Alexandre Beche, CERN
Jason Altekruse, MapR
http://incubator.apache.org/drill/team.html
53. Engage!
Follow @ApacheDrill on Twitter
Sign up for the mailing lists (user | dev):
http://incubator.apache.org/drill/mailing-lists.html
Standing G+ hangouts every Tuesday at 18:00 CET:
http://j.mp/apache-drill-hangouts
Keep an eye on http://drill-user.org/
- Search data pipeline, real-time ingestion of social data (FB/Twitter)
- Battle Nations: #3 top-grossing, matchmaking, online real-time PvP
- Open-source PaaS, metrics and usage data
- Halo's new big data pipeline, working on data ingestion with open-source tools like Kafka
Love open source, with enthusiastic people offering their time and energy
Smart people and a great sense of community
- Also some less well-known projects, such as an MMORPG game engine, etc.
And I found Drill!
Interactive / real-time is HOT
Batch processing doesn’t serve all our needs anymore
- The TC article that originally led me to start contributing
- Drill's Apache Incubator proposal, outlining what Drill is trying to achieve
Pro: can handle big data; MapReduce abstracts away all the distribution and management; flexible code for processing
Con: slow
Hive query startup requires a long processing time.
Pro: highly available solutions that handle large writes/reads
Con: not easy to do ad-hoc SQL-like queries
Stream processing is becoming very popular, and new projects are rising up, such as Samza, based on Kafka.
Walmart Labs has a project called Muppet, which they call "fast data".
Con: need to define the topology, and no ad-hoc querying
AWS CloudSearch, Lucene: all these technologies perform searches on the basis of an index they maintain, and therefore need preprocessing of the data.
What most people do when querying multiple sources is engineer an ETL pipeline into a common DW, then query the DW for any cross-segment data.
But obviously ETL implies a delay; we can do better.
Drill is not meant to replace MapReduce, but to supplement it.
Two innovations: handling nested data column-style (a column-striped representation) and multi-level execution trees
Repetition level (r): at which repeated field in the field's path the value has repeated
Definition level (d): how many fields in the path that could be undefined (because they are optional or repeated) are actually present
Only repeated fields increment the repetition level; only non-required fields increment the definition level.
Required fields are always defined and do not need a definition level. Non-repeated fields do not need a repetition level.
An optional field requires one extra bit: zero if it is NULL, one if it is defined.
NULL values do not need to be stored, as the definition level captures this information.
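These rules can be sketched for a single column path, links.forward, where links is optional and forward is repeated, so the maximum definition level is 2 and the maximum repetition level is 1. The record shape is invented for illustration, in the style of the Dremel paper's examples.

```python
# Sketch: column-striping the path links.forward into (value, r, d)
# triples. links is optional, forward is repeated -> max d = 2, max r = 1.
def shred_links_forward(records):
    column = []
    for rec in records:
        links = rec.get("links")
        if links is None:
            # links undefined: d = 0; a new record always starts at r = 0
            column.append((None, 0, 0))
            continue
        forward = links.get("forward")
        if not forward:
            # links present but forward missing/empty: only 1 field defined
            column.append((None, 0, 1))
            continue
        for i, value in enumerate(forward):
            # first value in a record starts at r = 0; repeats within the
            # same record get r = 1 (forward is the only repeated field)
            column.append((value, 0 if i == 0 else 1, 2))
    return column

records = [
    {"links": {"forward": [20, 40, 60]}},
    {"links": {"forward": [80]}},
    {},  # no links at all
]
print(shred_links_forward(records))
```

Note that the sketch keeps a None placeholder for readability; per the notes above, a real encoding does not store the NULL value at all, since a definition level below the maximum already implies it.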
Source query: human-written (e.g. a DSL) or tool-written (e.g. SQL/ANSI-compliant)
Source query is parsed and transformed to produce the logical plan
Logical plan: dataflow of what should logically be done
Typically, the logical plan lives in memory in the form of Java objects, but also has a textual form
The logical query is then transformed and optimized into the physical plan.
The optimizer introduces parallel computation, taking topology into account
Optimizer handles columnar data to improve processing speed
The physical plan represents the actual structure of computation as it is done by the system
How physical and exchange operators should be applied
Assignment to particular nodes and cores + actual query execution per node
One Drillbit per node, maximizing data locality
Co-ordination, query planning, optimization, scheduling, execution are distributed
By default, Drillbits hold all roles, modules can optionally be disabled.
Any node/Drillbit can act as endpoint for particular query.
Zookeeper maintains ephemeral cluster membership information only
Small distributed cache utilizing embedded Hazelcast maintains information about individual queue depth, cached query plans, metadata, locality information, etc.
Originating Drillbit acts as foreman, managing all execution for its particular query, with scheduling based on priority, queue depth and locality information.
Drillbit data communication is streaming and avoids any serialization/deserialization
Red: the originating Drillbit, the root of the multi-level execution tree, per query/job
Leaves use their storage engine interface to scan the respective data source (DB, file, etc.)