Tuning Performance for SQL-on-Anything Analytics
Martin Traverso, Co-creator of Presto
Kamil Bajda-Pawlikowski, CTO Starburst
@prestosql @starburstdata
Strata Data 2019
San Francisco, CA
Presto: SQL-on-Anything
Deploy Anywhere, Query Anything
Why Presto?
Community-driven open source project
High performance ANSI SQL engine
• New Cost-Based Query Optimizer
• Proven scalability
• High concurrency
Separation of compute and storage
• Scale storage and compute independently
• No ETL or data integration necessary to get to insights
• SQL-on-anything
No vendor lock-in
• No Hadoop distro vendor lock-in
• No storage engine vendor lock-in
• No cloud vendor lock-in
Project History
©2017 Starburst Data, Inc. All Rights Reserved
Community
See more at our Wiki
Presto in Production
Facebook: 10,000+ nodes, HDFS (ORC, RCFile), sharded MySQL, 1000s of users
Uber: 2,000+ nodes (several clusters on premises) with 160K+ queries daily over HDFS (Parquet/ORC)
Twitter: 2,000+ nodes (several clusters on premises and GCP), 20K+ queries daily (Parquet)
LinkedIn: 500+ nodes, 200K+ queries daily over HDFS (ORC), and ~1000 users
Lyft: ------ redacted due to the quiet period for the IPO -----------
Netflix: 300+ nodes in AWS, 100+ PB in S3 (Parquet)
Yahoo! Japan: 200+ nodes for HDFS (ORC), and ObjectStore
FINRA: 120+ nodes in AWS, 4PB in S3 (ORC), 200+ users
Starburst Data
Founded by Presto committers:
● Over 4 years of contributions to Presto
● Presto distro for on-prem and cloud env
● Supporting large customers in production
● Enterprise subscription add-ons (ODBC, Ranger, Sentry, Oracle, Teradata)
Notable features contributed:
● ANSI SQL syntax enhancements
● Execution engine improvements
● Security integrations
● Spill to disk
● Cost-Based Optimizer
https://www.starburstdata.com/presto-enterprise/
Performance
Built for Performance
Query Execution Engine:
● MPP-style pipelined in-memory execution
● Columnar and vectorized data processing
● Runtime query bytecode compilation
● Memory efficient data structures
● Multi-threaded multi-core execution
● Optimized readers for columnar formats (ORC and Parquet)
● Predicate and column projection pushdown
● Now also Cost-Based Optimizer
CBO in a nutshell
Presto Cost-Based Optimizer includes:
● support for statistics stored in Hive Metastore
● join reordering based on selectivity estimates and cost
● automatic join type selection (repartitioned vs broadcast)
● automatic left/right side selection for joined tables
https://www.starburstdata.com/technical-blog/
Statistics & Cost
Hive Metastore statistics:
● number of rows in a table
● number of distinct values in a column
● fraction of NULL values in a column
● minimum/maximum value in a column
● average data size for a column
Cost calculation includes:
● CPU
● Memory
● Network I/O
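To make the link between statistics and cost concrete, the following is a minimal sketch of how a row count and an average row size could feed the choice between a broadcast and a repartitioned join. Everything here is an illustrative assumption: `TableStats`, `estimate_bytes`, `network_cost`, the worker count, and the formulas themselves are hypothetical and are not Presto's actual cost model. Only the network component is modeled; CPU and memory are left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableStats:
    """Hypothetical per-table statistics, loosely modeled on the list above."""
    row_count: int        # number of rows in the table
    avg_row_bytes: float  # assumed sum of the average column data sizes


def estimate_bytes(stats: TableStats) -> float:
    """Estimated total data size of a table."""
    return stats.row_count * stats.avg_row_bytes


def network_cost(probe: TableStats, build: TableStats, workers: int, distribution: str) -> float:
    """Toy network cost (bytes moved) for one join under a given distribution."""
    if distribution == "broadcast":
        # The build side is copied to every worker; the probe side stays in place.
        return estimate_bytes(build) * workers
    if distribution == "repartitioned":
        # Both sides are shuffled once on the join key.
        return estimate_bytes(probe) + estimate_bytes(build)
    raise ValueError(f"unknown distribution: {distribution}")


def choose_distribution(probe: TableStats, build: TableStats, workers: int) -> str:
    """Pick the distribution with the lower estimated network cost."""
    candidates = ("broadcast", "repartitioned")
    return min(candidates, key=lambda d: network_cost(probe, build, workers, d))


# Made-up statistics for a large fact table and a tiny dimension table.
orders = TableStats(row_count=1_000_000_000, avg_row_bytes=100.0)
nation = TableStats(row_count=25, avg_row_bytes=50.0)

choice_small_build = choose_distribution(orders, nation, workers=10)
choice_large_build = choose_distribution(orders, orders, workers=10)
```

With these made-up numbers the tiny build side is cheapest to broadcast, while joining two large tables favors repartitioning. A fuller model would also weigh CPU and memory, as the slide lists.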
Join type selection
Join left/right side decision
Join reordering with filter
Join tree shapes
CBO off
CBO on
https://www.starburstdata.com/presto-benchmarks/
Benchmark results
● on average 7x improvement vs EMR Presto
● EMR Presto cannot execute many TPC-DS queries
● All TPC-DS queries pass on Starburst Presto
https://www.starburstdata.com/presto-aws/
Recent CBO enhancements
● Deciding on semi-join distribution type based on cost
● Support for outer joins
● Capping a broadcasted table size
● Various minor fixes in cardinality estimation
● ANALYZE table (native in Presto)
● Stats for AWS Glue Catalog (exclusive from Starburst)
Current and Future work
What’s next for Optimizer
● Stats support
○ Improved stats for Hive
○ Stats for DBMS connectors and NoSQL connectors
○ Tolerate missing / incomplete stats
● Core CBO enhancements
○ Cost more operators
○ Adjust cost model weights based on the hardware
○ Adaptive optimizations
○ Introduce Traits
● Involve connectors in optimizations
Involving Connectors in Optimization
History and Current State
● Original motivation: partition pruning for queries over Hive tables
● Simple range predicates and nullability checks passed to connectors. Modeled as TupleDomain
((col0 BETWEEN ? AND ?) OR (col0 BETWEEN ? AND ?) OR …)
AND
((col1 BETWEEN ? AND ?) OR (col1 BETWEEN ? AND ?) OR …)
AND
...
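The shape of that predicate (an AND across columns, each column constrained by an OR of ranges plus a nullability flag) can be sketched as a small data structure. This is a simplified model for illustration only: `ColumnDomain`, `may_match`, and the integer-only inclusive ranges are assumptions and do not reproduce Presto's `TupleDomain` class.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Range = Tuple[int, int]  # inclusive [low, high], integers only in this sketch


@dataclass(frozen=True)
class ColumnDomain:
    """Allowed values for one column: a union (OR) of ranges, optionally NULL."""
    ranges: Tuple[Range, ...]
    null_allowed: bool = False

    def contains(self, value: Optional[int]) -> bool:
        if value is None:
            return self.null_allowed
        return any(low <= value <= high for low, high in self.ranges)


def may_match(domains: Dict[str, ColumnDomain], partition: Dict[str, Optional[int]]) -> bool:
    """AND across columns: a partition survives only if every constrained
    partition column falls inside its domain. Columns without a known
    partition value are treated as unconstrained."""
    return all(
        domain.contains(partition[column])
        for column, domain in domains.items()
        if column in partition
    )


# (col0 BETWEEN 1 AND 5 OR col0 BETWEEN 10 AND 20)
# AND (col1 BETWEEN 100 AND 200 OR col1 IS NULL)
domains = {
    "col0": ColumnDomain(ranges=((1, 5), (10, 20))),
    "col1": ColumnDomain(ranges=((100, 200),), null_allowed=True),
}

partitions = [
    {"col0": 3, "col1": 150},
    {"col0": 7, "col1": 150},    # col0 falls between the two ranges
    {"col0": 12, "col1": None},  # NULL is allowed for col1
    {"col0": 12, "col1": 300},   # col1 is out of range
]

kept = [p for p in partitions if may_match(domains, p)]
```

A connector that receives such a structure can skip the partitions that cannot match, which is the partition-pruning use case the slide names as the original motivation.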
History and Current State
● Partial evaluation of non-trivial expressions
○ Bind only known variables
○ Results in "true/false/null" or "can't tell". E.g.,
f(a, b) := lower(a) LIKE 'john%' AND b = 1
f('Mary', ?) → false → can prune
f('John S', ?) → b = 1 → ¯\_(ツ)_/¯
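The two cases above can be reproduced with a small tri-state evaluator. This is a sketch under simplifying assumptions: the `UNKNOWN` sentinel stands in for an unbound variable, SQL NULL handling is omitted, `LIKE 'john%'` is reduced to a prefix test, and the evaluator returns "can't tell" rather than the residual expression `b = 1` shown on the slide. All helper names are hypothetical.

```python
UNKNOWN = object()  # stands for "can't tell": the input variable is not bound


def like_prefix(value, prefix):
    """lower(value) LIKE '<prefix>%', or UNKNOWN if value is unbound."""
    if value is UNKNOWN:
        return UNKNOWN
    return value.lower().startswith(prefix)


def equals(value, constant):
    """value = constant, or UNKNOWN if value is unbound."""
    if value is UNKNOWN:
        return UNKNOWN
    return value == constant


def and_(left, right):
    """Conjunction: a single False decides the result even if the other side is unknown."""
    if left is False or right is False:
        return False
    if left is UNKNOWN or right is UNKNOWN:
        return UNKNOWN
    return True


def f(a, b):
    """f(a, b) := lower(a) LIKE 'john%' AND b = 1"""
    return and_(like_prefix(a, "john"), equals(b, 1))


def can_prune(result) -> bool:
    """Data can be skipped only when the predicate is definitely false."""
    return result is False


mary = f("Mary", UNKNOWN)      # first conjunct is false, so the whole AND is false
john = f("John S", UNKNOWN)    # first conjunct is true, so the result depends on b
```

The asymmetry is the point: binding only `a` is enough to prune when the first conjunct fails, but not when it succeeds.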
Beyond Simple Filter Pushdown...
● Dereference expressions. E.g., x.a > 5
● Array/map subscript. E.g., a['key'] = 10
● Complex filters and projections
● Aggregations
● Joins
● Limit: https://github.com/prestosql/presto/pull/421
● Sampling
● Others…
https://github.com/prestosql/presto/issues/18
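[Figure: Rule 1. Pattern and Result: a rule matches a pattern of plan nodes (B over C) in a plan tree with nodes A to G and rewrites it to the result (C’ over B’).]
[Figure: Rule 2. Pattern and Result: a second rule is applied to the tree produced by Rule 1, matching B and E and producing E’ and B’’.]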
Query:
SELECT count(*)
FROM t
WHERE x.f > 5 AND y LIKE 'a%b'
Table t
x :: row(f bigint, g bigint)
y :: varchar(10)
Plan before: Filter (x.f > 5 AND y LIKE 'a%b') over Scan(t_0)
FilterIntoScan Rule calls Connector.applyFilter(...)
Input: Table: t_0, Filter: x.f > 5 AND y LIKE 'a%b'
Output: Derived Table: t_1, Filter: z > 5 [z :: bigint]
Plan after: Filter (z > 5) over Scan(t_1)
New Connector APIs
applyFilter(ConnectorTableHandle table, Expression filter)
applyLimit(ConnectorTableHandle table, long limit)
applyAggregation(ConnectorTableHandle table, List<Aggregation> aggregates)
applySampling(ConnectorTableHandle table, double samplingRate)
...
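The interaction between an optimizer rule and a connector can be sketched as follows. This is a toy model, not the Presto SPI: the slide gives only the method names and parameters, so the return shape (a derived table handle plus the filter the engine must still apply), the `TableHandle` and `ToyConnector` classes, and the representation of filters as lists of conjunct strings are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class TableHandle:
    """Hypothetical stand-in for a connector table handle."""
    name: str
    pushed_filters: Tuple[str, ...] = ()


class ToyConnector:
    """Connector that can evaluate only a fixed set of conjuncts itself."""

    def __init__(self, supported: List[str]):
        self.supported = set(supported)

    def apply_filter(
        self, table: TableHandle, conjuncts: List[str]
    ) -> Optional[Tuple[TableHandle, List[str]]]:
        """Return (derived table, remaining conjuncts), or None if nothing
        can be absorbed, in which case the rule leaves the plan unchanged."""
        accepted = tuple(c for c in conjuncts if c in self.supported)
        if not accepted:
            return None
        remaining = [c for c in conjuncts if c not in self.supported]
        derived = TableHandle(table.name, table.pushed_filters + accepted)
        return derived, remaining


# This connector is assumed to understand only the LIKE conjunct.
connector = ToyConnector(supported=["y LIKE 'a%b'"])
t_0 = TableHandle("t")

result = connector.apply_filter(t_0, ["x.f > 5", "y LIKE 'a%b'"])
t_1, remaining = result

# Offering only the unsupported conjunct again changes nothing, so the rule stops firing.
second_attempt = connector.apply_filter(t_1, ["x.f > 5"])
```

Returning `None` when nothing is absorbed gives an iterative optimizer a natural stopping condition; `applyLimit`, `applyAggregation`, and `applySampling` could follow the same accept-or-decline shape.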
Performance Benefits (?)
● Better support for sophisticated backend systems
○ Druid, Pinot, Elasticsearch
○ SQL databases
● Improved performance for columnar data formats (Parquet, ORC)
ORC Performance Improvements
https://github.com/prestosql/presto/pull/555
ORC Performance Improvements - TPC-DS
Project Roadmap
● Coordinator HA
● Kubernetes
● Dynamic filtering
● Connectors
○ Phoenix
○ Iceberg
○ Druid
● TIMESTAMP semantics
● And more… https://github.com/prestosql/presto/labels/roadmap
Getting Involved
● Join us on Slack
○ Invite link: https://prestosql.io/community.html
● GitHub: https://github.com/prestosql/presto
● Website: https://prestosql.io
Further reading
https://www.starburstdata.com/presto-newsletter/
https://fivetran.com/blog/warehouse-benchmark
https://www.concurrencylabs.com/blog/starburst-presto-vs-aws-emr-sql/
http://bytes.schibsted.com/bigdata-sql-query-engine-benchmark/
https://virtuslab.com/blog/benchmarking-spark-sql-presto-hive-bi-processing-googles-cloud-dataproc/
Thank You!
@prestosql @starburstdata
www.starburstdata.com
www.prestosql.io
