Presto in Treasure Data

1. Presto in Treasure Data Mitsunori Komatsu, Treasure Data

2. Who am I? • Mitsunori Komatsu,   Software engineer @ Treasure Data. • Presto, Hive, Plazma, td-android-sdk,   td-ios-sdk, Mobile SDK backend,  embedded-sdk • github:komamitsu,  msgpack-java committer, Presto contributor, etc…

3. Today’s talk • What's Presto? • Pros & Cons • Architecture • Who uses Presto? • How do we use Presto?

4. What’s Presto?

5. Fast • Distributed SQL query engine (MPP) • Low latency and good performance • No disk IO • Pipelined execution (not Map Reduce) • Compile a query plan down to byte code • Off heap memory • Suitable for ad-hoc query

6. Pluggable • Pluggable backends (“connectors”) • Cassandra / Hive / JMX / Kafka / MySQL / PostgreSQL / System / TPCH • We can add a new connector by   extending SPI • Treasure Data has been developed a connector to access our storage

7. What kind of SQL • Supports ANSI SQL (Not HiveQL) • Easy to use Presto compared to HiveQL • Structural type: Map, Array, JSON, Row • Window functions • Approximate queries • http://blinkdb.org/

8. Limitations • Fails with huge JOIN • In memory only (broadcast / distributed JOIN) • No grace / hybrid hash join • No fault tolerance • Coordinator is SPOF • No “cost based” optimization • No authentication / authorization • No native ODBC => Prestogres

9. Architectural overview https://prestodb.io/overview.html With Hive connector

10. Query plan Output[nationkey, _col1] => [nationkey:bigint, count:bigint]  - _col1 := count Exchange[GATHER] => nationkey:bigint, count:bigint Aggregate(FINAL)[nationkey] => [nationkey:bigint, count:bigint]  - count := "count"("count_15") Exchange[REPARTITION] => nationkey:bigint, count_15:bigint Aggregate(PARTIAL)[nationkey] => [nationkey:bigint, count_15:bigint] - count_15 := "count"("expr") Project => [nationkey:bigint, expr:bigint] - expr := 1 InnerJoin[("custkey" = "custkey_0")] => [custkey:bigint, custkey_0:bigint, nationkey:bigint] Project => [custkey:bigint] Filter[("orderpriority" = '1-URGENT')] => [custkey:bigint, orderpriority:varchar] TableScan[tpch:tpch:orders:sf0.01, original constraint=  ('1-URGENT' = "orderpriority")] => [custkey:bigint, orderpriority:varchar]  - custkey := tpch:custkey:1  - orderpriority := tpch:orderpriority:5 Exchange[REPLICATE] => custkey_0:bigint, nationkey:bigint TableScan[tpch:tpch:customer:sf0.01, original constraint=true] => [custkey_0:bigint, nationkey:bigint]  - custkey_0 := tpch:custkey:0  - nationkey := tpch:nationkey:3 select  c.nationkey,  count(1)  from orders o join customer c  on o.custkey = c.custkey where o.orderpriority = '1-URGENT' group by c.nationkey

11. Stage 1 Stage 2 Stage 0 Query, stage, task and split Query Task 0.0 Split Task 1.0 Split Task 1.1 Task 1.2 Split Split Split Task 2.0 Split Task 2.1 Task 2.2 Split Split Split Split Split Split Split Split For example… TableScan (FROM) Aggregation (GROUP BY) Output @worker#2 @worker#3 @worker#0

12. Query plan Output[nationkey, _col1] => [nationkey:bigint, count:bigint]  - _col1 := count Exchange[GATHER] => nationkey:bigint, count:bigint Aggregate(FINAL)[nationkey] => [nationkey:bigint, count:bigint]  - count := "count"("count_15") Exchange[REPARTITION] => nationkey:bigint, count_15:bigint Aggregate(PARTIAL)[nationkey] => [nationkey:bigint, count_15:bigint] - count_15 := "count"("expr") Project => [nationkey:bigint, expr:bigint] - expr := 1 InnerJoin[("custkey" = "custkey_0")] => [custkey:bigint, custkey_0:bigint, nationkey:bigint] Project => [custkey:bigint] Filter[("orderpriority" = '1-URGENT')] => [custkey:bigint, orderpriority:varchar] TableScan[tpch:tpch:orders:sf0.01, original constraint=  ('1-URGENT' = "orderpriority")] => [custkey:bigint, orderpriority:varchar]  - custkey := tpch:custkey:1  - orderpriority := tpch:orderpriority:5 Exchange[REPLICATE] => custkey_0:bigint, nationkey:bigint TableScan[tpch:tpch:customer:sf0.01, original constraint=true] => [custkey_0:bigint, nationkey:bigint]  - custkey_0 := tpch:custkey:0  - nationkey := tpch:nationkey:3 select  c.nationkey,  count(1)  from orders o join customer c  on o.custkey = c.custkey where o.orderpriority = '1-URGENT' group by c.nationkey

13. Query plan Output[nationkey, _col1] => [nationkey:bigint, count:bigint]  - _col1 := count Exchange[GATHER] => nationkey:bigint, count:bigint Aggregate(FINAL)[nationkey] => [nationkey:bigint, count:bigint]  - count := "count"("count_15") Exchange[REPARTITION] => nationkey:bigint, count_15:bigint Aggregate(PARTIAL)[nationkey] => [nationkey:bigint, count_15:bigint] - count_15 := "count"("expr") Project => [nationkey:bigint, expr:bigint] - expr := 1 InnerJoin[("custkey" = "custkey_0")] => [custkey:bigint, custkey_0:bigint, nationkey:bigint] Project => [custkey:bigint] Filter[("orderpriority" = '1-URGENT')] => [custkey:bigint, orderpriority:varchar] TableScan[tpch:tpch:orders:sf0.01, original constraint=  ('1-URGENT' = "orderpriority")] => [custkey:bigint, orderpriority:varchar]  - custkey := tpch:custkey:1  - orderpriority := tpch:orderpriority:5 Exchange[REPLICATE] => custkey_0:bigint, nationkey:bigint TableScan[tpch:tpch:customer:sf0.01, original constraint=true] => [custkey_0:bigint, nationkey:bigint]  - custkey_0 := tpch:custkey:0  - nationkey := tpch:nationkey:3 select  c.nationkey,  count(1)  from orders o join customer c  on o.custkey = c.custkey where o.orderpriority = '1-URGENT' group by c.nationkey Stage 3

14. Query plan Output[nationkey, _col1] => [nationkey:bigint, count:bigint]  - _col1 := count Exchange[GATHER] => nationkey:bigint, count:bigint Aggregate(FINAL)[nationkey] => [nationkey:bigint, count:bigint]  - count := "count"("count_15") Exchange[REPARTITION] => nationkey:bigint, count_15:bigint Aggregate(PARTIAL)[nationkey] => [nationkey:bigint, count_15:bigint] - count_15 := "count"("expr") Project => [nationkey:bigint, expr:bigint] - expr := 1 InnerJoin[("custkey" = "custkey_0")] => [custkey:bigint, custkey_0:bigint, nationkey:bigint] Project => [custkey:bigint] Filter[("orderpriority" = '1-URGENT')] => [custkey:bigint, orderpriority:varchar] TableScan[tpch:tpch:orders:sf0.01, original constraint=  ('1-URGENT' = "orderpriority")] => [custkey:bigint, orderpriority:varchar]  - custkey := tpch:custkey:1  - orderpriority := tpch:orderpriority:5 Exchange[REPLICATE] => custkey_0:bigint, nationkey:bigint TableScan[tpch:tpch:customer:sf0.01, original constraint=true] => [custkey_0:bigint, nationkey:bigint]  - custkey_0 := tpch:custkey:0  - nationkey := tpch:nationkey:3 select  c.nationkey,  count(1)  from orders o join customer c  on o.custkey = c.custkey where o.orderpriority = '1-URGENT' group by c.nationkey Stage 3 Stage 2

15. Query plan Output[nationkey, _col1] => [nationkey:bigint, count:bigint]  - _col1 := count Exchange[GATHER] => nationkey:bigint, count:bigint Aggregate(FINAL)[nationkey] => [nationkey:bigint, count:bigint]  - count := "count"("count_15") Exchange[REPARTITION] => nationkey:bigint, count_15:bigint Aggregate(PARTIAL)[nationkey] => [nationkey:bigint, count_15:bigint] - count_15 := "count"("expr") Project => [nationkey:bigint, expr:bigint] - expr := 1 InnerJoin[("custkey" = "custkey_0")] => [custkey:bigint, custkey_0:bigint, nationkey:bigint] Project => [custkey:bigint] Filter[("orderpriority" = '1-URGENT')] => [custkey:bigint, orderpriority:varchar] TableScan[tpch:tpch:orders:sf0.01, original constraint=  ('1-URGENT' = "orderpriority")] => [custkey:bigint, orderpriority:varchar]  - custkey := tpch:custkey:1  - orderpriority := tpch:orderpriority:5 Exchange[REPLICATE] => custkey_0:bigint, nationkey:bigint TableScan[tpch:tpch:customer:sf0.01, original constraint=true] => [custkey_0:bigint, nationkey:bigint]  - custkey_0 := tpch:custkey:0  - nationkey := tpch:nationkey:3 select  c.nationkey,  count(1)  from orders o join customer c  on o.custkey = c.custkey where o.orderpriority = '1-URGENT' group by c.nationkey Stage 3 Stage 2 Stage 1

16. Query plan Output[nationkey, _col1] => [nationkey:bigint, count:bigint]  - _col1 := count Exchange[GATHER] => nationkey:bigint, count:bigint Aggregate(FINAL)[nationkey] => [nationkey:bigint, count:bigint]  - count := "count"("count_15") Exchange[REPARTITION] => nationkey:bigint, count_15:bigint Aggregate(PARTIAL)[nationkey] => [nationkey:bigint, count_15:bigint] - count_15 := "count"("expr") Project => [nationkey:bigint, expr:bigint] - expr := 1 InnerJoin[("custkey" = "custkey_0")] => [custkey:bigint, custkey_0:bigint, nationkey:bigint] Project => [custkey:bigint] Filter[("orderpriority" = '1-URGENT')] => [custkey:bigint, orderpriority:varchar] TableScan[tpch:tpch:orders:sf0.01, original constraint=  ('1-URGENT' = "orderpriority")] => [custkey:bigint, orderpriority:varchar]  - custkey := tpch:custkey:1  - orderpriority := tpch:orderpriority:5 Exchange[REPLICATE] => custkey_0:bigint, nationkey:bigint TableScan[tpch:tpch:customer:sf0.01, original constraint=true] => [custkey_0:bigint, nationkey:bigint]  - custkey_0 := tpch:custkey:0  - nationkey := tpch:nationkey:3 select  c.nationkey,  count(1)  from orders o join customer c  on o.custkey = c.custkey where o.orderpriority = '1-URGENT' group by c.nationkey Stage 3 Stage 2 Stage 1 Stage 0

17. What each component do? Presto Cli Coordinator - Parse Query - Analyze Query - Create Query Plan - Execute Query - Contains Stages - Execute Stages - Contains Tasks - Issue Tasks Discovery Service Worker Worker - Execute Tasks - Convert Query to Java Bytecode (Operator) - Execute Operator Connector - MetaData - Table, Column… - SplitManager - Split, … Connector - RecordSetProvider - RecordSet - RecordCursor - Read Storage Connector Storage Worker Connector External  Metadata?

22. Who uses Presto? • Facebook http://www.slideshare.net/dain1/presto-meetup-2015

23. • Dropbox Who uses Presto?

24. • Airbnb Who uses Presto?

25. • Qubole • SaaS • Treasure Data • SaaS • Teradata (new!) • commercial support Who uses Presto? As a service…

26. Today’s talk • What's Presto? • How do we use Presto? • What’s Treasure Data • Architecture • How we manage Presto

27. How do we use Presto? We…?

28. Treasure Data

29. Treasure Data 18 600,000

30. Treasure Data Time to Value Send query result Result Push Acquire Analyze Store Plazma DB Flexible, Scalable, Columnar Storage Web Log App Log Censor CRM ERP RDBMS Treasure Agent(Server) SDK(JS, Android, iOS, Unity) Streaming Collector Batch / Reliability Ad-hoc /  Low latency KPI$ KPI Dashboard BI Tools Other Products RDBMS, Google Docs, AWS S3, FTP Server, etc. Metric Insights Tableau, Motion Board etc. POS REST API ODBC / JDBC SQL, Pig Bulk Uploader Embulk,  TD Toolbelt SQL-based query @AWS or @IDCF Connectivity Economy & Flexibility Simple & Supported

31. Architecture in Treasure Data worker queue (MySQL) api server td worker process plazmadb (PostgreSQL + S3/RiakCS) select user_id, count(1) from … Presto coordinator Presto worker Presto worker Presto worker Presto worker result bucket (S3) Retry failed query if needed Authentication / Authorization Columnar ﬁle format. Schema-less. td-presto connector

39. Schema on read time code method user_id 2015-06-01 10:07:11 200 GET 2015-06-01 10:10:12 “200” GET 2015-06-01 10:10:20 200 GET 2015-06-01 10:11:30 200 POST 2015-06-01 10:20:45 200 GET 2015-06-01 10:33:50 400 GET 206 2015-06-01 10:40:11 200 GET 852 2015-06-01 10:51:32 200 PUT 1223 2015-06-01 10:58:02 200 GET 5118 2015-06-01 11:02:11 404 GET 12 2015-06-01 11:14:27 200 GET 3447 access_logs table User added a new column “user_id” in imported data User can select this column with only adding it to the schema (w/o reconstruct the table) Schema on read

40. Columnar ﬁle format time code method user_id 2015-06-01 10:07:11 200 GET 2015-06-01 10:10:12 “200” GET 2015-06-01 10:10:20 200 GET 2015-06-01 10:11:30 200 POST 2015-06-01 10:20:45 200 GET 2015-06-01 10:33:50 400 GET 206 2015-06-01 10:40:11 200 GET 852 2015-06-01 10:51:32 200 PUT 1223 2015-06-01 10:58:02 200 GET 5118 2015-06-01 11:02:11 404 GET 12 2015-06-01 11:14:27 200 GET 3447 access_logs table time code method user_id Columnar ﬁle format This query accesses only code column   select code, count(1) from tbl group by code

41. td-presto connector • MessagePack v07 • off heap • Async IO with Jetty-client • Scheduling & Resource management

42. How we manage Presto • Blue-Green Deployment • Stress test tool • Monitoring with DataDog

43. Blue-Green Deployment worker queue (MySQL) api server td worker process plazmadb (PostgreSQL + S3/RiakCS) select user_id, count(1) from … Presto coordinator result bucket (S3) Presto coordinator Presto worker Presto worker Presto worker Presto worker Presto worker Presto worker Presto worker Presto worker production rc

44. Blue-Green Deployment worker queue (MySQL) api server td worker process plazmadb (PostgreSQL + S3/RiakCS) select user_id, count(1) from … Presto coordinator result bucket (S3) Presto coordinator Presto worker Presto worker Presto worker Presto worker Presto worker Presto worker Presto worker Presto worker production rcTest Test Test!

45. Blue-Green Deployment worker queue (MySQL) api server td worker process plazmadb (PostgreSQL + S3/RiakCS) select user_id, count(1) from … Presto coordinator result bucket (S3) Presto coordinator Presto worker Presto worker Presto worker Presto worker Presto worker Presto worker Presto worker Presto worker production!

46. Stress test tool • Collect queries that has ever caused issues. • Add a new query with just adding this entry. • Issue the query, gets the result and implements a calculated digest automatically.  • We can send all the queries including very heavy ones (around 6000 stages) to Presto - job_id: 28889999 - result: 227d16d801a9a43148c2b7149ce4657c - job_id: 28889999

50. Monitoring with DataDog Presto coordinator Presto worker Presto worker Presto worker Presto worker Presto process td-agent in_presto_metrics/v1/jmx/mbean /v1/query /v1/node out_metricsense DataDog

51. Monitoring with DataDog Query stalled time - Most important for us. - It triggers alert calls to us… - It can be mainly increased by td-presto connector problems. Most of them are race condition issue.

52. How many queries processed More than 10000 queries / day

53. How many queries processed Most queries ﬁnish within 1min

54. Check: treasuredata.com  treasure-data.hateblo.jp/ (Japan blog) Cloud service for the entire data pipeline

Presto in Treasure Data

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Viewers also liked

Viewers also liked (20)

Similar to Presto in Treasure Data

Similar to Presto in Treasure Data (20)

Recently uploaded

Recently uploaded (20)

Presto in Treasure Data