Presto Meetup @ Facebook (2014-05-14)

Presto: Past, Present, and Future

In this talk we discuss the progress made since Presto was open sourced, what the Presto team is working on now, and what we will be working on over the next year.

  1. Presto: Past, Present, and Future (Dain Sundstrom)
  2. SELECT now() - INTERVAL '6' MONTH
  3. By The Numbers
      ▪ 6 months
      ▪ 15 releases
      ▪ 30 contributors
      ▪ 662 commits
      ▪ 1406 files changed
      ▪ 130,305 insertions(+), 43,699 deletions(-)
  4. New SQL Features
      ▪ Create table
      ▪ Distinct aggregations
      ▪ Cross joins
      ▪ Custom functions
  5. Optimizations
      ▪ Range predicate push down
      ▪ Distributed aggregations
      ▪ Distributed window functions
      ▪ Distinct-limit optimization
      ▪ Approximate queries
  6. Type System
      ▪ Plugins can add new scalar types
      ▪ Extensible operators
      ▪ DATE, TIME, TIMESTAMP and INTERVAL
      ▪ Time zones with DST rules
      ▪ Localized parse and format
      ▪ HyperLogLog type
  7. New Connectors
      ▪ Hadoop 1.x
      ▪ Hadoop 2.x
      ▪ CDH 5
      ▪ Custom S3 integration for Hadoop
      ▪ Cassandra
      ▪ TPC-H
  8. SELECT now()
  9. Hive 0.13 Support
      ▪ New file formats
         ▪ ORC
         ▪ Parquet
         ▪ DWRF
      ▪ Vectorized ORC (2-3x more efficient)
      ▪ ORC stripe skipping
  10. Index Joins
      ▪ Targeting low cardinality joins
      ▪ Lazy hash build
      ▪ Predicate push down
      ▪ Aggregation push down
      ▪ Initial version already checked in
      ▪ Currently supported in HBase and MySQL
  11. Connectors
      ▪ HBase
         ▪ Requires features in Facebook HBase
         ▪ Index joins
      ▪ JDBC (MySQL)
         ▪ Sharding
         ▪ Index joins
  12. Views
      ▪ Create/drop views
      ▪ View definition stored in connector
      ▪ Fully optimized by Presto
      ▪ Views stored in Presto syntax
      ▪ Not compatible with existing Hive views
  13. Machine Learning
      ▪ Supports classification and regression
      ▪ Multiple algorithms (currently only SVM)
      ▪ Feature extraction and normalization
      ▪ New functions and types
      ▪ Possibly extend SQL grammar
      ▪ Highly experimental
  14. Continuous Integration
      ▪ Continuous correctness testing
      ▪ Run queries against prod and trunk
      ▪ Continuous benchmark
      ▪ Run full test suite with every connector
      ▪ Faster release cycle
  16. SQL Features
      ▪ Structs, Maps and Lists
      ▪ Table generating functions
      ▪ Scalar subqueries
      ▪ Features required to run all of TPC-DS
      ▪ Create table with partitioning
      ▪ Possibly: insert, delete, drop partition
  17. Execution Engine
      ▪ Huge joins and aggregations
      ▪ Hash distributed
      ▪ Co-distributed and co-partitioned
      ▪ Spill to disk (flash)
      ▪ Work stealing
      ▪ Basic task recovery
  18. Native Store
      ▪ Stores data directly on worker nodes
      ▪ Uses custom data format
      ▪ Initial use cases
         ▪ Store for ‘hot’ data
         ▪ Store for ‘live’ data
      ▪ Support co-distributed data
  19. Security
      ▪ Authentication
         ▪ Username/password, Kerberos, SSL cert
      ▪ Authorization
         ▪ Integration with plugins
         ▪ Grant permissions from SQL
  20. New REST API
      ▪ Prepared statements
      ▪ Bound parameters
      ▪ Server-managed sessions
      ▪ Explicit support for non-query statements (DML/DDL)
      ▪ Split query submission, stats, and data fetching
  21. ODBC Driver
      ▪ Targeting major BI tools
         ▪ Tableau, MicroStrategy and Excel
      ▪ Support for Windows, Mac and Linux
      ▪ Will require new REST API
      ▪ Written in D
      ▪ Entirely open source (ASL2)
  22. Plugins
      ▪ Plugin repository
      ▪ Manage plugins from CLI
      ▪ Function catalogs
      ▪ Push down joins and aggregations
      ▪ Custom optimizers
  23. SELECT question FROM audience WHERE isAwesome(question)
  24. (c) 2007 Facebook, Inc. or its licensors. "Facebook" is a registered trademark of Facebook, Inc. All rights reserved. 1.0
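The "Distinct-limit optimization" on the Optimizations slide (slide 5) presumably targets queries such as `SELECT DISTINCT x FROM t LIMIT n`. The Python sketch below is not Presto's operator; it only illustrates the general idea, assuming the operator may stop pulling input once it has produced `n` distinct values instead of first materializing the whole distinct set. The function name `distinct_limit` is hypothetical.

```python
from typing import Hashable, Iterable, Iterator, Set, TypeVar

T = TypeVar("T", bound=Hashable)


def distinct_limit(rows: Iterable[T], limit: int) -> Iterator[T]:
    """Yield at most `limit` distinct values, in first-seen order.

    Hypothetical illustration: a naive plan builds the complete DISTINCT set
    and only then applies LIMIT. Fusing the two steps lets the operator stop
    reading input as soon as `limit` distinct values have been produced.
    """
    if limit <= 0:
        return  # nothing can be emitted, so the input is never read
    seen: Set[T] = set()
    for row in rows:
        if row in seen:
            continue  # duplicate: drop it
        seen.add(row)
        yield row
        if len(seen) >= limit:
            return  # enough distinct values; stop consuming the input


if __name__ == "__main__":
    source = iter([3, 1, 3, 2, 1, 5, 4])
    print(list(distinct_limit(source, 2)))  # [3, 1]
    print(list(source))  # [3, 2, 1, 5, 4] (rows the operator never had to read)
```

Because the operator is a generator, the early exit also propagates back to whatever produces `rows`: in this sketch, a scan feeding it is simply never asked for more data.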
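"Range predicate push down" (slide 5) and "ORC stripe skipping" (slide 9) fit together naturally: if a columnar file records min/max statistics for each stripe, a reader that is handed a range predicate can skip any stripe whose `[min, max]` interval cannot overlap the predicate. The sketch below is a simplified, hypothetical model in Python (one integer column, statistics computed at write time); `Stripe`, `write_stripes`, and `scan_range` are illustrative names, not the ORC reader API or Presto's Hive connector code.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Stripe:
    """Hypothetical stripe: one integer column plus min/max recorded at write time."""
    min_value: int
    max_value: int
    rows: Tuple[int, ...]


def write_stripes(values: Sequence[int], stripe_size: int) -> List[Stripe]:
    """Split values into fixed-size stripes and record per-stripe statistics."""
    if stripe_size <= 0:
        raise ValueError("stripe_size must be positive")
    stripes = []
    for start in range(0, len(values), stripe_size):
        chunk = tuple(values[start:start + stripe_size])
        stripes.append(Stripe(min(chunk), max(chunk), chunk))
    return stripes


def scan_range(stripes: Sequence[Stripe], low: int, high: int) -> Tuple[List[int], int]:
    """Return rows with low <= v <= high and the number of stripes actually read."""
    matches: List[int] = []
    stripes_read = 0
    for stripe in stripes:
        # Check the pushed-down predicate against the statistics first:
        # if [min_value, max_value] and [low, high] do not overlap, no row
        # in this stripe can match, so its data is never touched.
        if stripe.max_value < low or stripe.min_value > high:
            continue
        stripes_read += 1
        matches.extend(v for v in stripe.rows if low <= v <= high)
    return matches, stripes_read


if __name__ == "__main__":
    stripes = write_stripes(list(range(100)), stripe_size=10)
    rows, read = scan_range(stripes, 25, 34)
    print(rows[0], rows[-1], len(rows))  # 25 34 10
    print(f"read {read} of {len(stripes)} stripes")  # read 2 of 10 stripes
```

Skipping pays off only when values are clustered, for example when the data is sorted on the filtered column; if every stripe spans the full value range, every stripe must still be read.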
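Slide 5 lists approximate queries and slide 6 a HyperLogLog type. HyperLogLog estimates the number of distinct values in fixed memory: each value is hashed, the top `p` bits of the hash choose one of `2^p` registers, and each register keeps the largest "leading zeros plus one" seen in the remaining bits. Registers combine by element-wise maximum, so partial sketches built on different machines can be merged. The Python class below is a textbook version of the algorithm (Flajolet et al.), written for illustration only; the class name, the SHA-1-based hash, and the default precision are assumptions, and it does not reproduce Presto's own HyperLogLog implementation or serialized format.

```python
import hashlib
import math


class HyperLogLog:
    """Textbook HyperLogLog distinct-count sketch (Flajolet et al., 2007).

    Illustrative only: the hash (first 8 bytes of SHA-1) and the default
    precision are assumptions, not Presto's implementation.
    """

    def __init__(self, p: int = 12) -> None:
        # The bias constant used in estimate() assumes at least 128 registers.
        if not 7 <= p <= 16:
            raise ValueError("p must be between 7 and 16")
        self.p = p
        self.m = 1 << p  # number of registers
        self.registers = [0] * self.m

    @staticmethod
    def _hash64(value: str) -> int:
        digest = hashlib.sha1(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, value: str) -> None:
        h = self._hash64(value)
        width = 64 - self.p
        index = h >> width  # top p bits pick a register
        rest = h & ((1 << width) - 1)  # remaining (64 - p) bits
        rank = width - rest.bit_length() + 1  # leading zeros in `rest`, plus one
        if rank > self.registers[index]:
            self.registers[index] = rank

    def merge(self, other: "HyperLogLog") -> None:
        """Fold another sketch into this one (register-wise maximum)."""
        if other.p != self.p:
            raise ValueError("cannot merge sketches with different precision")
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def estimate(self) -> float:
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros > 0:
            # Small-range correction: linear counting on the empty registers.
            return m * math.log(m / zeros)
        return raw


if __name__ == "__main__":
    sketch = HyperLogLog(p=12)
    for i in range(100_000):
        sketch.add(f"user-{i % 20_000}")  # 20,000 distinct values, each added 5 times
    print(round(sketch.estimate()))  # close to 20000; standard error is about 1.04 / sqrt(4096)
```

With `p = 12` the sketch uses 4,096 small registers regardless of how many values are added, and re-adding a value never changes the estimate; this fixed-memory, mergeable behavior is what makes the technique suitable for approximate distinct counts.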