SlideShare a Scribd company logo
1 of 50
Download to read offline
Anatomy of
유동민
Presto Anatomy
1. Who

2. What

3. Library

4. Code Generation

5. Plugin

6. Future
Who am I

Who uses Presto
Who am I
ZeroMQ Committer, Presto Contributor
Who am I
49 Contributors

7 Facebook Developer
Who uses Presto
Who uses Presto
http://www.slideshare.net/dain1/presto-meetup-2015

http://www.slideshare.net/NezihYigitbasi/prestoatnetflixhadoopsummit15
@ Facebook / daily
Scan PBs (ORC?)

Trillions Rows

30K Queries

1000s Users
@ Netflix / daily
Scan 15+ PBs (Parquet)

2.5K Queries

300 Users

1 coordinator / r3.4xlarge

220 workers / r3.4xlarge
@ TD / daily
Scan PBs (A Columnar)

Billions Rows

18K Queries

2 coordinators / r3.4xlarge

26 Workers / c3.8xlarge
Who uses Presto - Airbnb
Airpal - a web-based, query execution tool

Presto is amazing. It's an order of magnitude
faster than Hive in most our use cases. It reads
directly from HDFS, so unlike Redshift, there
isn't a lot of ETL before you can use it. It just
works.

- Christopher Gutierrez, Manager of Online Analytics, Airbnb
What is Presto
Presto - Command Line
Presto is
An open source distributed SQL query engine 

for running interactive analytic queries 

against data sources of all sizes 

ranging from gigabytes to petabytes.
http://prestodb.io
Presto is
•Fast !!! (10x faster than Hive)

•Even faster with new Presto ORC reader

•Written in Java with a pluggable backend

•Not SQL-like, but ANSI-SQL

•Code generation like LLVM

•Not only source is open, but open sourced (No private branch)
Presto is
HTTP
HTTP
Presto is
Presto is
CREATE TABLE mysql.hello.order_item AS

SELECT o.*, i.*

FROM hive.world.orders o —― TABLESAMPLE SYSTEM (10)

JOIN mongo.deview.lineitem i —― TABLESAMPLE BERNOULLI (40)

ON o.orderkey = i.orderkey

WHERE conditions..
Coordinator
Presto - Planner
Fragmenter
Worker
SQL Analyzer
Analysis
Logical
Plan
Optimizer
Plan
Plan

Fragment
Distributed
Query

Scheduler
Stage

Execution

Plan
Worker
Worker
Local

Execution

Planner
TASK
TASK
Presto - Page
PAGE
- positionCount
VAR_WIDTH BLOCK
nulls

Offsets
Values
FIEXED_WIDTH BLOCK
nulls
Values
positionCount blockCount 11 F I X E D _ W I D T H po
sitionCount bit encoded nullFlags values length
values
14 V A R I A B L E _ W
I D T H positionCount offsets[0] offsets[1] offsets[2…]
offsets[pos-1] offsets[pos] bit encoded nullFlags
values
Page / Block serialization
Presto - Cluster Memory Manager
Coordinator
Worker
Worker
Worker
GET /v1/memory
@Config(“query.max-memory”) = 20G
@Config(“query.max-memory-per-node”) = 1G
@Config(“resources.reserved-system-memory”) = 40% of -Xmx
System

reserved-system-memory
Reserved

max-memory-per-node
General
Task
Block All tasks
Presto - IndexManager
•LRU worker memory Cache

•@Config(task.max-index-memory) = 64M

•Table (List<IndexColumn> columns)

•Good for Key / Constant tables
Presto - Prestogres
https://github.com/treasure-data/prestogres
•Clients to use PostgreSQL protocol
to run queries on Presto

•Modified pgpoll-ii

•No ODBC driver yet

•Supports Tableau, ChartIO and etc
Presto - Security
•Authentication

•Single Sign On - Kerberos, SSL client cert

•Authorization

•Simple allow / deny
Presto - Misc.
• Presto Verifier

• Presto Benchmark Driver 

• Query Queue

• ${USER} and ${SOURCE} based maxConcurrent, maxQueued

• JDBC Driver

• Python / Ruby Presto Client (Treasure Data)

• Presto Metrics (Treasure Data)

• GET /v1/jmx/mbean/{mbean}, /v1/query/{query}, /v1/node/

• Presto Python/Java Event Collector (Treasure Data)

• QueryStart, SpitCompletion, QueryCompletion
23
Library
Slice - Efficient memory accessor
•https://github.com/airlift/slice

•ByteBuffer is slow

•Slices.allocate(size), Slices.allocateDirect(size)

•Slices.wrappedBuffer(byteBuffer)

•sun.misc.Unsafe

•Address

•((DirectBuffer) byteBuffer).getAddress()

•unsafe.ARRAY_BYTE_OFFSET + byteBuffer.arrayOffset()

•unsafe.getLong(address), unsafe.getInt(address),

•unsafe.copyMemory(src, address, dst, address)
Airlift - Distributed service framework
•https://github.com/airlift/airlift

•Core of Presto communication

•HTTP

•Bootstrap

•Node discovery

•RESTful API

•Dependency Injection

•Configuration

•Utilities
@Path("/v2/event")

public class EventResource {

@POST

public Response createQuery(EventRequests events) {

…

}

}

public class CollectorMainModule implements ConfigurationAwareModule { 

    @Override 

    public synchronized void configure(Binder binder) { 

        discoveryBinder(binder).bindHttpAnnouncement("collector"); 

        jsonCodecBinder(binder).bindJsonCodec(EventRequest.class); 

        jaxrsBinder(binder).bind(EventResource.class); 

    } 

public static void main(String[] args){

Bootstrap app = new Bootstrap(ImmutableList.of(

new NodeModule(),

new DiscoveryModule(),

new HttpServerModule(),

new JsonModule(), new JaxrsModule(true),

new EventModule(),

new CollectorMainModule()

));

Injector injector = app.strictConfig().initialize();

injector.getInstance(Announcer.class).start();

}

}
ex) https://github.com/miniway/presto-event-collector
Fastutil - Fast Java collection
•FastUtil 6.6.0 turned out to be consistently fast.

•Koloboke is getting second in many tests.

•GS implementation is good enough, but is slower than FastUtil and Koloboke.
http://java-performance.info/hashmap-overview-jdk-fastutil-goldman-sachs-hppc-koloboke-trove-january-2015/
ASM - Bytecode manipulation
package pkg;



public interface SumInterface {

long sum(long value);

}

public class MyClass implements SumInterface {

private long result = 0L;



public MyClass(long value) {

result = value;

}



@Override

public long sum(long value) {

result += value;

return result;

}

}

ClassWriter cw = new ClassWriter(0);

cw.visit(V1_7, ACC_PUBLIC, 

"pkg/MyClass", null, 

"java/lang/Object", 

new String[] { "pkg/SumInterface" });

cw.visitField(ACC_PRIVATE, 

"result", "J", null, new Long(0));



// constructor

MethodVisitor m = cw.visitMethod(ACC_PUBLIC, 

"<init>", "(J)V", null, null);

m.visitCode();



// call super()

m.visitVarInsn(ALOAD, 0); // this

m.visitMethodInsn(INVOKESPECIAL, 

"java/lang/Object", "<init>", “()V",

false);
ASM - Bytecode manipulation (Cont.)
// this.result = value

m.visitVarInsn(ALOAD, 0); // this

m.visitVarInsn(LLOAD, 1 ); // value

m.visitFieldInsn(PUTFIELD, 

"pkg/MyClass", "result", "J");

m.visitInsn(RETURN);

m.visitMaxs(-1, -1).visitEnd();



// public long sum(long value)

m = cw.visitMethod(ACC_PUBLIC , "sum", "(J)J", 

null, null);

m.visitCode();



m.visitVarInsn(ALOAD, 0); // this

m.visitVarInsn(ALOAD, 0); // this

m.visitFieldInsn(GETFIELD,

"pkg/MyClass", "result", "J");

m.visitVarInsn(LLOAD, 1); // value

// this.result + value

m.visitInsn(LADD); 

m.visitFieldInsn(PUTFIELD, 

"pkg/MyClass", "result", "J");



m.visitVarInsn(ALOAD, 0); // this

m.visitFieldInsn(GETFIELD, 

"pkg/MyClass", "result", "J");

m.visitInsn(LRETURN);

m.visitMaxs(-1, -1).visitEnd();



cw.visitEnd();

byte[] bytes = cw.toByteArray();

ClassLoader.defindClass(bytes)
Library - Misc.
•JDK 8u40 +

•Guice - Lightweight dependency injection

•Guava - Replacing Java8 Stream, Optional and Lambda

•ANTLR4 - Parser generator, SQL parser

•Jetty - HTTP Server and Client

•Jackson - JSON

•Jersey - RESTful API
Byte Code Generation

Function Variable Binding
Code Generation
•ASM

•Runtime Java classes and methods generation base on SQL

•Where

•Filter and Projection

•Join Lookup source

•Join Probe

•Order By

•Aggregation
Code Generation - Filter
SELECT * FROM lineitem 

WHERE orderkey = 100 AND quantity = 200
AND
EQ

(#0,100)
EQ

(#1,200)
Logical Planner
class AndOperator extends Operator {

private Operator left = new EqualOperator(#1, 100);

private Operator right = new EqualOperator(#2, 200);

@Override

public boolean evaluate(Cursor cur)

{

if (!left.evaluate(cur)) {

return false;

}

return right.evaluate(cur);

}

}

class EqualOperator extends Operator {

@Override

public boolean evaluate(Cursor c)

{

return cur.getValue(position).equals(value);

}

}
Code Generation - Filter
// invoke MethodHandle( $operator$EQUAL(#0, 100) )

push cursor.getValue(#0)

push 100

$statck = invokeDynamic boostrap(0) $operator$EQUAL      

if (!$stack) { goto end; }

push cursor.getValue(#1)

push 200

$stack = invokeDynamic boostrap(0) $operator$EQUAL      

end:      

return $stack 

@ScalarOperator(EQUAL)

@SqlType(BOOLEAN)

public static boolean equal(@SqlType(BIGINT) long left,

@SqlType(BIGINT) long right){

    return left == right;

}

=> MethodHandle(“$operator$EQUAL(long, long): boolean”)
AND
$op$EQ

(#0,100)
$op$EQ

(#1,200)
Local Execution Planner
Code Generation - Join Lookup Source
SELECT col1, col2, col3 FROM tabA JOIN tabB 

ON tabA.col1 = tabB.colX /* BIGINT */ AND tabA.col2 = tabB.colY /* VARCHAR */
Page
Block(colY)
Block(colX)
PagesIndex
0,0 0,1 0,1023 2,100
Block(colX)
Block(colY)
B(X)
B(Y)
B(X) B(X)
B(Y) B(Y)
addresses
channels
Lookup

Source
JoinProbe
-1
-1
3000
-1
1023
100
-1
PagesHash
- hashRow
- equalsRow
ROW
hash
Found!
Code Generation - PageHash
class PagesHash { 

   List<Type> types = [BIGINT, VARCHAR] 

   List<Integer> hashChannels = [0,1] 

   List<List<Block>> channels = [ 

[BLOCK(colX), BLOCK(colX), ...] , 

              [BLOCK(colY), BLOCK(colY), ...] 

]  

} 

class (Compiled)PageHash { 

     Type type_colX = BIGINT 

     Type type_colY = VARCHAR 

     int hashChannel_colX = 0 

     int hashChannel_colY = 1 

     List<Block> channel_colX = 

[ BLOCK(colX), BLOCK(colX), … ] 

     List<Block> channel_colY = 

[ BLOCK(colY), BLOCK(colY), … ] 

}
Code Generation - PageHash (Cont.)
long hashRow (int position, 

Block[] blocks) { 

int result = 0; 

  for (int i = 0; i < hashChannels.size(); i++) {

int hashChannel = hashChannels.get(i); 

       Type type = types.get(hashChannel); 

            

result = result * 31 + 

type.hash(blocks[i], position); 

    } 

return result; 

} 

long (Compiled)hashRow (int position,

Block[] blocks) { 

        int result = 0; 

        result = result * 31 + 

type_colX.hash(block[0], position); 

        result = result * 31 + 

type_colY.hash(block[1], position); 

        return result; 

}
Code Generation - PageHash (Cont.)
boolean equalsRow (

int leftBlockIndex, int leftPosition, 

int rightPosition, Block[] rightBlocks) { 

    for (int i = 0; i < hashChannels.size(); i++) { 

    int hashChannel = hashChannels.get(i); 

       Type type = types.get(hashChannel); 

Block leftBlock = 

channels.get(hashChannel)

.get(leftBlockIndex); 

       if (!type.equalTo(leftBlock, leftPosition,

  rightBlocks[i], rightPosition)) { 

            return false; 

        } 

    } 

    return true; 

}

boolean (Compiled)equalsRow (

int leftBlockIndex, int leftPosition, 

int rightPosition, Block[] rightBlocks) { 

    Block leftBlock = 

channels_colX.get(leftBlockIndex); 

    if (!type.equalTo(leftBlock, leftPosition, 

rightBlocks[0], rightPosition)) { 

            return false; 

    } 

    leftBlock = 

channels_colY.get(leftBlockIndex); 

    if (!type.equalTo(leftBlock, leftPosition, 

rightBlocks[1], rightPosition)) { 

            return false; 

    } 

    return true; 

}
Method Variable Binding
1. regexp_like(string, pattern) → boolean

2. regexp_like(string, cast(pattern as RegexType))  // OperatorType.CAST 

3. regexp_like(string, new Regex(pattern))

4. MethodHandle handle = MethodHandles.insertArgument(1, new Regex(pattern))

5. handle.invoke (string)
@ScalarOperator(OperatorType.CAST)

@SqlType(“RegExp”)

public static Regex castToRegexp(@SqlType(VARCHAR) Slice pattern){ 

  return new Regex(pattern.getBytes(), 0, pattern.length()); 

} 

@ScalarFunction

@SqlType(BOOLEAN)

public static boolean regexpLike(@SqlType(VARCHAR) Slice source,

@SqlType(“RegExp”) Regex pattern){

    Matcher m = pattern.matcher(source.getBytes());

    int offset = m.search(0, source.length());

    return offset != -1;

}
Plugin - Connector
Plugin
•Hive

•Hadoop 1, Hadoop2, CDH4, CDH 5

•MySQL

•PostgreSQL

•Cassandra

•MongoDB

•Kafka

•Raptor
•Machine Learning

•BlackHole

•JMX

•TPC-H

•Example
Plugin - Raptor
•Storage data in flash on the Presto machines in ORC format

•Metadata is stored in MySQL (Extendable)

•Near real-time loads (5 - 10mins)

•3TB / day, 80B rows/day , 5 secs query

•CREATE VIEW myview AS SELECT …

•DELETE FROM tab WHERE conditions…

•UPDATE (Future)

•Coarse grained Index : min / max value of all columns

•Compaction

•Backup Store (Extendable)
No more ?!
Plugin - How to write
•https://prestodb.io/docs/current/develop/spi-overview.html

•ConnectorFactory

•ConnectorMetadata

•ConnectorSplitManager

•ConnectorHandleResolver

•ConnectorRecordSetProvider (PageSourceProvider)

•ConnectorRecordSinkProvider (PageSinkProvider)

•Add new Type

•Add new Function (A.K.A UDF)
Plugin - MongoDB
•https://github.com/facebook/presto/pull/3337

•5 Non-business days

•Predicate Pushdown

•Add a Type (ObjectId)

•Add UDFs (objectid(), objectid(string))
public class MongoPlugin implements Plugin { 

    @Override 

    public <T> List<T> getServices(Class<T> type) { 

        if (type == ConnectorFactory.class) { 

           return ImmutableList.of(

new MongoConnectorFactory(…)); 

        } else if (type == Type.class) { 

            return ImmutableList.of(OBJECT_ID); 

        } else if (type == FunctionFactory.class) { 

            return ImmutableList.of(

new MongoFunctionFactory(typeManager)); 

        } 

        return ImmutableList.of(); 

    } 

}
Plugin - MongoDB
class MongoFactory implements ConnectorFactory {

@Override

public Connector create(String connectorId) {

Bootstrap app = new Bootstrap(new
MongoClientModule());

return app.initialize()

.getInstance(MongoConnector.class);

}

}

class MongoClientModule implements Module {

@Override

public void configure(Binder binder){

binder.bind(MongoConnector.class)

.in(SINGLETON);

…

configBinder(binder)

.bindConfig(MongoClientConfig.class);

}

}
class MongoConnector implements Connector {

@Inject

public MongoConnector(

MongoSession mongoSession,

MongoMetadata metadata,

MongoSplitManager splitManager,

MongoPageSourceProvider 

pageSourceProvider,

MongoPageSinkProvider 

pageSinkProvider,

MongoHandleResolver 

handleResolver) {

… 

}

}
Plugin - MongoDB UDF
public class MongoFunctionFactory 

        implements FunctionFactory { 

    @Override 

    public List<ParametricFunction> listFunctions() 

    { 

        return new FunctionListBuilder(typeManager) 

                .scalar(ObjectIdFunctions.class) 

                .getFunctions(); 

    } 

} 

public class ObjectIdType 

        extends AbstractVariableWidthType { 

    ObjectIdType OBJECT_ID = new ObjectIdType(); 

    @JsonCreator 

    public ObjectIdType() { 

        super(parseTypeSignature("ObjectId"), 

Slice.class); 

    } 

}
public class ObjectIdFunctions { 

    @ScalarFunction("objectid") 

    @SqlType("ObjectId") 

    public static Slice ObjectId() { 

        return Slices.wrappedBuffer(

new ObjectId().toByteArray()); 

    } 

    @ScalarFunction("objectid") 

    @SqlType("ObjectId") 

    p.s Slice ObjectId(@SqlType(VARCHAR) Slice value) { 

        return Slices.wrappedBuffer(

new ObjectId(value.toStringUtf8()).toByteArray())
    } 

    @ScalarOperator(EQUAL) 

    @SqlType(BOOLEAN) 

    p.s boolean equal(@SqlType("ObjectId") Slice left,

@SqlType("ObjectId") Slice right) { 

        return left.equals(right); 

    } 

}
Future
Presto - Future
•Cost based optimization

•Join Hint (Right table must be smaller)

•Huge Join / Aggregation

•Coordinator High Availability

•Task Recovery

•Work Stealing

•Full Pushdown

•Vectorization

• ODBC Driver (Early 2016, Teradata)

• More security (LDAP, Grant SQL)
SELECT question FROM you
Thank You

More Related Content

What's hot

Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm
Chandler Huang
 

What's hot (20)

Introduction to the Mysteries of ClickHouse Replication, By Robert Hodges and...
Introduction to the Mysteries of ClickHouse Replication, By Robert Hodges and...Introduction to the Mysteries of ClickHouse Replication, By Robert Hodges and...
Introduction to the Mysteries of ClickHouse Replication, By Robert Hodges and...
 
A Rusty introduction to Apache Arrow and how it applies to a time series dat...
A Rusty introduction to Apache Arrow and how it applies to a  time series dat...A Rusty introduction to Apache Arrow and how it applies to a  time series dat...
A Rusty introduction to Apache Arrow and how it applies to a time series dat...
 
Using ClickHouse for Experimentation
Using ClickHouse for ExperimentationUsing ClickHouse for Experimentation
Using ClickHouse for Experimentation
 
Understanding and Improving Code Generation
Understanding and Improving Code GenerationUnderstanding and Improving Code Generation
Understanding and Improving Code Generation
 
Presto best practices for Cluster admins, data engineers and analysts
Presto best practices for Cluster admins, data engineers and analystsPresto best practices for Cluster admins, data engineers and analysts
Presto best practices for Cluster admins, data engineers and analysts
 
ABRIS: Avro Bridge for Apache Spark with Felipe Melo and Georgi Chochov
 ABRIS: Avro Bridge for Apache Spark with Felipe Melo and Georgi Chochov ABRIS: Avro Bridge for Apache Spark with Felipe Melo and Georgi Chochov
ABRIS: Avro Bridge for Apache Spark with Felipe Melo and Georgi Chochov
 
Hudi: Large-Scale, Near Real-Time Pipelines at Uber with Nishith Agarwal and ...
Hudi: Large-Scale, Near Real-Time Pipelines at Uber with Nishith Agarwal and ...Hudi: Large-Scale, Near Real-Time Pipelines at Uber with Nishith Agarwal and ...
Hudi: Large-Scale, Near Real-Time Pipelines at Uber with Nishith Agarwal and ...
 
Dr. Elephant for Monitoring and Tuning Apache Spark Jobs on Hadoop with Carl ...
Dr. Elephant for Monitoring and Tuning Apache Spark Jobs on Hadoop with Carl ...Dr. Elephant for Monitoring and Tuning Apache Spark Jobs on Hadoop with Carl ...
Dr. Elephant for Monitoring and Tuning Apache Spark Jobs on Hadoop with Carl ...
 
The Impala Cookbook
The Impala CookbookThe Impala Cookbook
The Impala Cookbook
 
Presto, Zeppelin을 이용한 초간단 BI 구축 사례
Presto, Zeppelin을 이용한 초간단 BI 구축 사례Presto, Zeppelin을 이용한 초간단 BI 구축 사례
Presto, Zeppelin을 이용한 초간단 BI 구축 사례
 
Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm
 
Impacts of Sharding, Partitioning, Encoding, and Sorting on Distributed Query...
Impacts of Sharding, Partitioning, Encoding, and Sorting on Distributed Query...Impacts of Sharding, Partitioning, Encoding, and Sorting on Distributed Query...
Impacts of Sharding, Partitioning, Encoding, and Sorting on Distributed Query...
 
Optimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsOptimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL Joins
 
Presto query optimizer: pursuit of performance
Presto query optimizer: pursuit of performancePresto query optimizer: pursuit of performance
Presto query optimizer: pursuit of performance
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis
 
Eventually, Scylla Chooses Consistency
Eventually, Scylla Chooses ConsistencyEventually, Scylla Chooses Consistency
Eventually, Scylla Chooses Consistency
 
A Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQLA Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQL
 
InfluxDB IOx Tech Talks: Replication, Durability and Subscriptions in InfluxD...
InfluxDB IOx Tech Talks: Replication, Durability and Subscriptions in InfluxD...InfluxDB IOx Tech Talks: Replication, Durability and Subscriptions in InfluxD...
InfluxDB IOx Tech Talks: Replication, Durability and Subscriptions in InfluxD...
 
Facebook Presto presentation
Facebook Presto presentationFacebook Presto presentation
Facebook Presto presentation
 

Similar to Presto anatomy

Wprowadzenie do technologi Big Data i Apache Hadoop
Wprowadzenie do technologi Big Data i Apache HadoopWprowadzenie do technologi Big Data i Apache Hadoop
Wprowadzenie do technologi Big Data i Apache Hadoop
Sages
 
JavaScript Growing Up
JavaScript Growing UpJavaScript Growing Up
JavaScript Growing Up
David Padbury
 
Writing robust Node.js applications
Writing robust Node.js applicationsWriting robust Node.js applications
Writing robust Node.js applications
Tom Croucher
 
ql.io: Consuming HTTP at Scale
ql.io: Consuming HTTP at Scale ql.io: Consuming HTTP at Scale
ql.io: Consuming HTTP at Scale
Subbu Allamaraju
 
Composing re-useable ETL on Hadoop
Composing re-useable ETL on HadoopComposing re-useable ETL on Hadoop
Composing re-useable ETL on Hadoop
Paul Lam
 

Similar to Presto anatomy (20)

[245] presto 내부구조 파헤치기
[245] presto 내부구조 파헤치기[245] presto 내부구조 파헤치기
[245] presto 내부구조 파헤치기
 
JS everywhere 2011
JS everywhere 2011JS everywhere 2011
JS everywhere 2011
 
Server side JavaScript: going all the way
Server side JavaScript: going all the wayServer side JavaScript: going all the way
Server side JavaScript: going all the way
 
DSLing your System For Scalability Testing Using Gatling - Dublin Scala User ...
DSLing your System For Scalability Testing Using Gatling - Dublin Scala User ...DSLing your System For Scalability Testing Using Gatling - Dublin Scala User ...
DSLing your System For Scalability Testing Using Gatling - Dublin Scala User ...
 
Wprowadzenie do technologi Big Data i Apache Hadoop
Wprowadzenie do technologi Big Data i Apache HadoopWprowadzenie do technologi Big Data i Apache Hadoop
Wprowadzenie do technologi Big Data i Apache Hadoop
 
CouchDB Mobile - From Couch to 5K in 1 Hour
CouchDB Mobile - From Couch to 5K in 1 HourCouchDB Mobile - From Couch to 5K in 1 Hour
CouchDB Mobile - From Couch to 5K in 1 Hour
 
Leveraging Hadoop in your PostgreSQL Environment
Leveraging Hadoop in your PostgreSQL EnvironmentLeveraging Hadoop in your PostgreSQL Environment
Leveraging Hadoop in your PostgreSQL Environment
 
JavaScript Growing Up
JavaScript Growing UpJavaScript Growing Up
JavaScript Growing Up
 
Best Practices in Handling Performance Issues
Best Practices in Handling Performance IssuesBest Practices in Handling Performance Issues
Best Practices in Handling Performance Issues
 
Kotlin+MicroProfile: Ensinando 20 anos para uma linguagem nova
Kotlin+MicroProfile: Ensinando 20 anos para uma linguagem novaKotlin+MicroProfile: Ensinando 20 anos para uma linguagem nova
Kotlin+MicroProfile: Ensinando 20 anos para uma linguagem nova
 
Writing robust Node.js applications
Writing robust Node.js applicationsWriting robust Node.js applications
Writing robust Node.js applications
 
Implementing Comet using PHP
Implementing Comet using PHPImplementing Comet using PHP
Implementing Comet using PHP
 
RxJava applied [JavaDay Kyiv 2016]
RxJava applied [JavaDay Kyiv 2016]RxJava applied [JavaDay Kyiv 2016]
RxJava applied [JavaDay Kyiv 2016]
 
The Road To Reactive with RxJava JEEConf 2016
The Road To Reactive with RxJava JEEConf 2016The Road To Reactive with RxJava JEEConf 2016
The Road To Reactive with RxJava JEEConf 2016
 
Fun Teaching MongoDB New Tricks
Fun Teaching MongoDB New TricksFun Teaching MongoDB New Tricks
Fun Teaching MongoDB New Tricks
 
Non-blocking I/O, Event loops and node.js
Non-blocking I/O, Event loops and node.jsNon-blocking I/O, Event loops and node.js
Non-blocking I/O, Event loops and node.js
 
루비가 얼랭에 빠진 날
루비가 얼랭에 빠진 날루비가 얼랭에 빠진 날
루비가 얼랭에 빠진 날
 
ql.io: Consuming HTTP at Scale
ql.io: Consuming HTTP at Scale ql.io: Consuming HTTP at Scale
ql.io: Consuming HTTP at Scale
 
How Secure Are Docker Containers?
How Secure Are Docker Containers?How Secure Are Docker Containers?
How Secure Are Docker Containers?
 
Composing re-useable ETL on Hadoop
Composing re-useable ETL on HadoopComposing re-useable ETL on Hadoop
Composing re-useable ETL on Hadoop
 

Recently uploaded

👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
karishmasinghjnh
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
amitlee9823
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
amitlee9823
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
amitlee9823
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 

Recently uploaded (20)

hybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptxhybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptx
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 

Presto anatomy

  • 2. Presto Anatomy 1. Who 2. What 3. Library 4. Code Generation 5. Plugin 6. Future
  • 3. Who am I Who uses Presto
  • 4. Who am I ZeroMQ Committer, Presto Contributor
  • 5. Who am I 49 Contributors 7 Facebook Developer
  • 7. Who uses Presto http://www.slideshare.net/dain1/presto-meetup-2015 http://www.slideshare.net/NezihYigitbasi/prestoatnetflixhadoopsummit15 @ Facebook / daily Scan PBs (ORC?) Trillions Rows 30K Queries 1000s Users @ Netflix / daily Scan 15+ PBs (Parquet) 2.5K Queries 300 Users 1 coordinator / r3.4xlarge 220 workers / r3.4xlarge @ TD / daily Scan PBs (A Columnar) Billions Rows 18K Queries 2 coordinators / r3.4xlarge 26 Workers / c3.8xlarge
  • 8. Who uses Presto - Airbnb Airpal - a web-based, query execution tool Presto is amazing. It's an order of magnitude faster than Hive in most our use cases. It reads directly from HDFS, so unlike Redshift, there isn't a lot of ETL before you can use it. It just works. - Christopher Gutierrez, Manager of Online Analytics, Airbnb
  • 12. Presto is •Fast !!! (10x faster than Hive) •Even faster with new Presto ORC reader •Written in Java with a pluggable backend •Not SQL-like, but ANSI-SQL •Code generation like LLVM •Not only source is open, but open sourced (No private branch)
  • 15. Presto is CREATE TABLE mysql.hello.order_item AS SELECT o.*, i.* FROM hive.world.orders o —― TABLESAMPLE SYSTEM (10) JOIN mongo.deview.lineitem i —― TABLESAMPLE BERNOULLI (40) ON o.orderkey = i.orderkey WHERE conditions..
  • 16. Coordinator Presto - Planner Fragmenter Worker SQL Analyzer Analysis Logical Plan Optimizer Plan Plan Fragment Distributed Query Scheduler Stage Execution Plan Worker Worker Local Execution Planner TASK TASK
  • 17. Presto - Page PAGE - positionCount VAR_WIDTH BLOCK nulls Offsets Values FIEXED_WIDTH BLOCK nulls Values positionCount blockCount 11 F I X E D _ W I D T H po sitionCount bit encoded nullFlags values length values 14 V A R I A B L E _ W I D T H positionCount offsets[0] offsets[1] offsets[2…] offsets[pos-1] offsets[pos] bit encoded nullFlags values Page / Block serialization
  • 18. Presto - Cluster Memory Manager Coordinator Worker Worker Worker GET /v1/memory @Config(“query.max-memory”) = 20G @Config(“query.max-memory-per-node”) = 1G @Config(“resources.reserved-system-memory”) = 40% of -Xmx System reserved-system-memory Reserved max-memory-per-node General Task Block All tasks
  • 19. Presto - IndexManager •LRU worker memory Cache •@Config(task.max-index-memory) = 64M •Table (List<IndexColumn> columns) •Good for Key / Constant tables
  • 20. Presto - Prestogres https://github.com/treasure-data/prestogres •Clients to use PostgreSQL protocol to run queries on Presto •Modified pgpoll-ii •No ODBC driver yet •Supports Tableau, ChartIO and etc
  • 21. Presto - Security •Authentication •Single Sign On - Kerberos, SSL client cert •Authorization •Simple allow / deny
  • 22. Presto - Misc. • Presto Verifier • Presto Benchmark Driver • Query Queue • ${USER} and ${SOURCE} based maxConcurrent, maxQueued • JDBC Driver • Python / Ruby Presto Client (Treasure Data) • Presto Metrics (Treasure Data) • GET /v1/jmx/mbean/{mbean}, /v1/query/{query}, /v1/node/ • Presto Python/Java Event Collector (Treasure Data) • QueryStart, SpitCompletion, QueryCompletion
  • 23. 23
  • 25. Slice - Efficient memory accessor •https://github.com/airlift/slice •ByteBuffer is slow •Slices.allocate(size), Slices.allocateDirect(size) •Slices.wrappedBuffer(byteBuffer) •sun.misc.Unsafe •Address •((DirectBuffer) byteBuffer).getAddress() •unsafe.ARRAY_BYTE_OFFSET + byteBuffer.arrayOffset() •unsafe.getLong(address), unsafe.getInt(address), •unsafe.copyMemory(src, address, dst, address)
  • 26. Airlift - Distributed service framework •https://github.com/airlift/airlift •Core of Presto communication •HTTP •Bootstrap •Node discovery •RESTful API •Dependency Injection •Configuration •Utilities @Path("/v2/event") public class EventResource { @POST public Response createQuery(EventRequests events) { … } } public class CollectorMainModule implements ConfigurationAwareModule {     @Override     public synchronized void configure(Binder binder) {         discoveryBinder(binder).bindHttpAnnouncement("collector");         jsonCodecBinder(binder).bindJsonCodec(EventRequest.class);         jaxrsBinder(binder).bind(EventResource.class);     } public static void main(String[] args){ Bootstrap app = new Bootstrap(ImmutableList.of( new NodeModule(), new DiscoveryModule(), new HttpServerModule(), new JsonModule(), new JaxrsModule(true), new EventModule(), new CollectorMainModule() )); Injector injector = app.strictConfig().initialize(); injector.getInstance(Announcer.class).start(); } } ex) https://github.com/miniway/presto-event-collector
  • 27. Fastutil - Fast Java collection •FastUtil 6.6.0 turned out to be consistently fast. •Koloboke is getting second in many tests. •GS implementation is good enough, but is slower than FastUtil and Koloboke. http://java-performance.info/hashmap-overview-jdk-fastutil-goldman-sachs-hppc-koloboke-trove-january-2015/
  • 28. ASM - Bytecode manipulation package pkg;
 
 public interface SumInterface {
 long sum(long value);
 } public class MyClass implements SumInterface {
 private long result = 0L;
 
 public MyClass(long value) {
 result = value;
 }
 
 @Override
 public long sum(long value) {
 result += value;
 return result;
 }
 } ClassWriter cw = new ClassWriter(0);
 cw.visit(V1_7, ACC_PUBLIC, "pkg/MyClass", null, "java/lang/Object", new String[] { "pkg/SumInterface" });
 cw.visitField(ACC_PRIVATE, "result", "J", null, new Long(0));
 
 // constructor
 MethodVisitor m = cw.visitMethod(ACC_PUBLIC, "<init>", "(J)V", null, null);
 m.visitCode(); 
 // call super()
 m.visitVarInsn(ALOAD, 0); // this
 m.visitMethodInsn(INVOKESPECIAL, "java/lang/Object", "<init>", “()V", false);
  • 29. ASM - Bytecode manipulation (Cont.) // this.result = value
 m.visitVarInsn(ALOAD, 0); // this
 m.visitVarInsn(LLOAD, 1 ); // value
 m.visitFieldInsn(PUTFIELD, "pkg/MyClass", "result", "J");
 m.visitInsn(RETURN);
 m.visitMaxs(-1, -1).visitEnd();
 
 // public long sum(long value)
 m = cw.visitMethod(ACC_PUBLIC , "sum", "(J)J", null, null);
 m.visitCode(); 
 m.visitVarInsn(ALOAD, 0); // this
 m.visitVarInsn(ALOAD, 0); // this
 m.visitFieldInsn(GETFIELD, "pkg/MyClass", "result", "J");
 m.visitVarInsn(LLOAD, 1); // value
 // this.result + value
 m.visitInsn(LADD); 
 m.visitFieldInsn(PUTFIELD, "pkg/MyClass", "result", "J");
 
 m.visitVarInsn(ALOAD, 0); // this
 m.visitFieldInsn(GETFIELD, "pkg/MyClass", "result", "J");
 m.visitInsn(LRETURN);
 m.visitMaxs(-1, -1).visitEnd();
 
 cw.visitEnd();
 byte[] bytes = cw.toByteArray(); ClassLoader.defindClass(bytes)
  • 30. Library - Misc. •JDK 8u40 + •Guice - Lightweight dependency injection •Guava - Replacing Java8 Stream, Optional and Lambda •ANTLR4 - Parser generator, SQL parser •Jetty - HTTP Server and Client •Jackson - JSON •Jersey - RESTful API
  • 31. Byte Code Generation Function Variable Binding
  • 32. Code Generation •ASM •Runtime Java classes and methods generation base on SQL •Where •Filter and Projection •Join Lookup source •Join Probe •Order By •Aggregation
  • 33. Code Generation - Filter SELECT * FROM lineitem WHERE orderkey = 100 AND quantity = 200 AND EQ (#0,100) EQ (#1,200) Logical Planner class AndOperator extends Operator { private Operator left = new EqualOperator(#1, 100); private Operator right = new EqualOperator(#2, 200); @Override public boolean evaluate(Cursor cur) { if (!left.evaluate(cur)) { return false; } return right.evaluate(cur); } } class EqualOperator extends Operator { @Override public boolean evaluate(Cursor c) { return cur.getValue(position).equals(value); } }
  • 34. Code Generation - Filter // invoke MethodHandle( $operator$EQUAL(#0, 100) ) push cursor.getValue(#0) push 100 $statck = invokeDynamic boostrap(0) $operator$EQUAL      if (!$stack) { goto end; } push cursor.getValue(#1) push 200 $stack = invokeDynamic boostrap(0) $operator$EQUAL      end:       return $stack @ScalarOperator(EQUAL) @SqlType(BOOLEAN) public static boolean equal(@SqlType(BIGINT) long left, @SqlType(BIGINT) long right){     return left == right; } => MethodHandle(“$operator$EQUAL(long, long): boolean”) AND $op$EQ (#0,100) $op$EQ (#1,200) Local Execution Planner
  • 35. Code Generation - Join Lookup Source SELECT col1, col2, col3 FROM tabA JOIN tabB ON tabA.col1 = tabB.colX /* BIGINT */ AND tabA.col2 = tabB.colY /* VARCHAR */ Page Block(colY) Block(colX) PagesIndex 0,0 0,1 0,1023 2,100 Block(colX) Block(colY) B(X) B(Y) B(X) B(X) B(Y) B(Y) addresses channels Lookup Source JoinProbe -1 -1 3000 -1 1023 100 -1 PagesHash - hashRow - equalsRow ROW hash Found!
  • 36. Code Generation - PageHash class PagesHash {    List<Type> types = [BIGINT, VARCHAR]    List<Integer> hashChannels = [0,1]    List<List<Block>> channels = [ [BLOCK(colX), BLOCK(colX), ...] ,               [BLOCK(colY), BLOCK(colY), ...] ]  } class (Compiled)PageHash {      Type type_colX = BIGINT      Type type_colY = VARCHAR      int hashChannel_colX = 0      int hashChannel_colY = 1      List<Block> channel_colX = [ BLOCK(colX), BLOCK(colX), … ]      List<Block> channel_colY = [ BLOCK(colY), BLOCK(colY), … ] }
  • 37. Code Generation - PageHash (Cont.) long hashRow (int position, Block[] blocks) { int result = 0;   for (int i = 0; i < hashChannels.size(); i++) { int hashChannel = hashChannels.get(i);        Type type = types.get(hashChannel);             result = result * 31 +  type.hash(blocks[i], position);     } return result; } long (Compiled)hashRow (int position, Block[] blocks) {         int result = 0;         result = result * 31 + type_colX.hash(block[0], position);         result = result * 31 + type_colY.hash(block[1], position);         return result; }
  • 38. Code Generation - PageHash (Cont.) boolean equalsRow ( int leftBlockIndex, int leftPosition,  int rightPosition, Block[] rightBlocks) {     for (int i = 0; i < hashChannels.size(); i++) {     int hashChannel = hashChannels.get(i);        Type type = types.get(hashChannel); Block leftBlock =  channels.get(hashChannel) .get(leftBlockIndex);        if (!type.equalTo(leftBlock, leftPosition,   rightBlocks[i], rightPosition)) {             return false;         }     }     return true; } boolean (Compiled)equalsRow ( int leftBlockIndex, int leftPosition,  int rightPosition, Block[] rightBlocks) {     Block leftBlock =  channels_colX.get(leftBlockIndex);     if (!type.equalTo(leftBlock, leftPosition,  rightBlocks[0], rightPosition)) {             return false;     }     leftBlock =  channels_colY.get(leftBlockIndex);     if (!type.equalTo(leftBlock, leftPosition,  rightBlocks[1], rightPosition)) {             return false;     }     return true; }
  • 39. Method Variable Binding 1. regexp_like(string, pattern) → boolean 2. regexp_like(string, cast(pattern as RegexType))  // OperatorType.CAST  3. regexp_like(string, new Regex(pattern)) 4. MethodHandle handle = MethodHandles.insertArgument(1, new Regex(pattern)) 5. handle.invoke (string) @ScalarOperator(OperatorType.CAST) @SqlType(“RegExp”) public static Regex castToRegexp(@SqlType(VARCHAR) Slice pattern){   return new Regex(pattern.getBytes(), 0, pattern.length());  } @ScalarFunction @SqlType(BOOLEAN) public static boolean regexpLike(@SqlType(VARCHAR) Slice source, @SqlType(“RegExp”) Regex pattern){     Matcher m = pattern.matcher(source.getBytes());     int offset = m.search(0, source.length());     return offset != -1; }
  • 41. Plugin •Hive •Hadoop 1, Hadoop2, CDH4, CDH 5 •MySQL •PostgreSQL •Cassandra •MongoDB •Kafka •Raptor •Machine Learning •BlackHole •JMX •TPC-H •Example
  • 42. Plugin - Raptor •Storage data in flash on the Presto machines in ORC format •Metadata is stored in MySQL (Extendable) •Near real-time loads (5 - 10mins) •3TB / day, 80B rows/day , 5 secs query •CREATE VIEW myview AS SELECT … •DELETE FROM tab WHERE conditions… •UPDATE (Future) •Coarse grained Index : min / max value of all columns •Compaction •Backup Store (Extendable) No more ?!
  • 43. Plugin - How to write •https://prestodb.io/docs/current/develop/spi-overview.html •ConnectorFactory •ConnectorMetadata •ConnectorSplitManager •ConnectorHandleResolver •ConnectorRecordSetProvider (PageSourceProvider) •ConnectorRecordSinkProvider (PageSinkProvider) •Add new Type •Add new Function (A.K.A UDF)
  • 44. Plugin - MongoDB •https://github.com/facebook/presto/pull/3337 •5 Non-business days •Predicate Pushdown •Add a Type (ObjectId) •Add UDFs (objectid(), objectid(string)) public class MongoPlugin implements Plugin {     @Override     public <T> List<T> getServices(Class<T> type) {         if (type == ConnectorFactory.class) {            return ImmutableList.of( new MongoConnectorFactory(…));         } else if (type == Type.class) {             return ImmutableList.of(OBJECT_ID);         } else if (type == FunctionFactory.class) {             return ImmutableList.of( new MongoFunctionFactory(typeManager));         }         return ImmutableList.of();     } }
  • 45. Plugin - MongoDB class MongoFactory implements ConnectorFactory { @Override public Connector create(String connectorId) { Bootstrap app = new Bootstrap(new MongoClientModule()); return app.initialize() .getInstance(MongoConnector.class); } } class MongoClientModule implements Module { @Override public void configure(Binder binder){ binder.bind(MongoConnector.class) .in(SINGLETON); … configBinder(binder) .bindConfig(MongoClientConfig.class); } } class MongoConnector implements Connector { @Inject public MongoConnector( MongoSession mongoSession, MongoMetadata metadata, MongoSplitManager splitManager, MongoPageSourceProvider pageSourceProvider, MongoPageSinkProvider pageSinkProvider, MongoHandleResolver handleResolver) { … } }
  • 46. Plugin - MongoDB UDF public class MongoFunctionFactory         implements FunctionFactory {     @Override     public List<ParametricFunction> listFunctions()     {         return new FunctionListBuilder(typeManager)                 .scalar(ObjectIdFunctions.class)                 .getFunctions();     } } public class ObjectIdType         extends AbstractVariableWidthType {     ObjectIdType OBJECT_ID = new ObjectIdType();     @JsonCreator     public ObjectIdType() {         super(parseTypeSignature("ObjectId"),  Slice.class);     } } public class ObjectIdFunctions {     @ScalarFunction("objectid")     @SqlType("ObjectId")     public static Slice ObjectId() {         return Slices.wrappedBuffer( new ObjectId().toByteArray());     }     @ScalarFunction("objectid")     @SqlType("ObjectId")     p.s Slice ObjectId(@SqlType(VARCHAR) Slice value) {         return Slices.wrappedBuffer( new ObjectId(value.toStringUtf8()).toByteArray())     }     @ScalarOperator(EQUAL)     @SqlType(BOOLEAN)     p.s boolean equal(@SqlType("ObjectId") Slice left, @SqlType("ObjectId") Slice right) {         return left.equals(right);     } }
  • 48. Presto - Future •Cost based optimization •Join Hint (Right table must be smaller) •Huge Join / Aggregation •Coordinator High Availability •Task Recovery •Work Stealing •Full Pushdown •Vectorization • ODBC Driver (Early 2016, Teradata) • More security (LDAP, Grant SQL)