1. The document summarizes a presentation given by Kamil Bajda-Pawlikowski and Matt Fuller at the Boston Hadoop User Group Meetup on July 7, 2015 about Presto and Teradata's involvement with it.
2. Presto is an open source distributed SQL query engine that allows fast interactive querying of large datasets. It was originally developed at Facebook and is now supported by Teradata.
3. Teradata acquired Hadapt, a SQL-on-Hadoop company, in July 2014 and renamed it the Teradata Center for Hadoop. The group contributes to the open source Presto project, and Teradata has committed to a multi-year roadmap of phased enhancements along with commercial support for Presto.
1. Boston Hadoop User Group Meetup, July 7, 2015
Kamil Bajda-Pawlikowski
Matt Fuller
2. Who are we? - Teradata Center for Hadoop!
• History of Teradata Center for Hadoop
  - Formerly Hadapt, founded in July 2010 by Borgman, Bajda-Pawlikowski, and Abadi
  - Pioneered the SQL-on-Hadoop market
  - Based on work done by the database research group in the Yale Computer Science Department
  - Hybrid of Hadoop scalability and DBMS performance
• Today
  - Acquired by Teradata in July 2014, renamed Teradata Center for Hadoop
  - 30 developers with deep Hadoop and database expertise
  - Headquarters in Boston, MA
  - Contributors to the open source project Presto
3. Talk Agenda
• What is Presto?
• What is Teradata doing?
• Can I see a Demo?
• How can I contribute?
4. What is Presto?
• 100% open source distributed ANSI SQL engine for Big Data
  - Modern code base
  - Proven scalability
  - Optimized for low-latency, interactive querying
• Cross-platform query capability, not only SQL on Hadoop
• Distributed under the Apache license, now supported by Teradata
• Used by a community of well-known, well-respected technology companies
5. History of Presto
• Fall 2008: Facebook open sources Hive
• Fall 2012: 4 developers start Presto development
• Spring 2013: Presto rolled out within Facebook
• Fall 2013: Facebook open sources Presto
• Fall 2014: 88 releases, 41 contributors, 3943 commits
• Spring 2015: 98 releases, 65 contributors, 4587 commits; Teradata joins Presto community & offers support
Timeline image courtesy of Facebook
6. Presto Architecture
[Architecture diagram. Components shown: Client; Coordinator (Parser/analyzer, Planner, Scheduler); multiple Workers; pluggable Metadata API, Data location API, and Data stream API.]
https://www.facebook.com/notes/facebook-engineering/presto-interacting-with-petabytes-of-data-at-facebook/10151786197628920
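The flow suggested by the architecture slide (client, coordinator with parser/analyzer, planner and scheduler, then workers) can be sketched in a few lines. This is a toy model in Python, not Presto code: the names `Split`, `parse`, `schedule`, `worker_count`, and `run_query` are hypothetical, the "parser" accepts a single query shape, and the workers are plain function calls rather than separate nodes.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Split:
    """A chunk of a table that one worker can process independently (hypothetical)."""
    rows: List[dict]


def parse(sql: str) -> str:
    """Toy parser/analyzer: accepts only 'SELECT count(*) FROM <table>'."""
    tokens = sql.strip().rstrip(";").split()
    if len(tokens) != 4 or [t.lower() for t in tokens[:3]] != ["select", "count(*)", "from"]:
        raise ValueError("this sketch only understands SELECT count(*) FROM <table>")
    return tokens[3]


def schedule(splits: List[Split], num_workers: int) -> List[List[Split]]:
    """Toy scheduler: hand splits to workers round-robin."""
    assignments: List[List[Split]] = [[] for _ in range(num_workers)]
    for i, split in enumerate(splits):
        assignments[i % num_workers].append(split)
    return assignments


def worker_count(splits: List[Split]) -> int:
    """Toy worker: compute a partial row count over its assigned splits."""
    return sum(len(split.rows) for split in splits)


def run_query(sql: str, catalog: Dict[str, List[Split]], num_workers: int = 2) -> int:
    """Toy coordinator: parse, look up splits, schedule, then merge partial results."""
    table = parse(sql)
    assignments = schedule(catalog[table], num_workers)
    partials = [worker_count(assigned) for assigned in assignments]
    return sum(partials)
```

Calling `run_query("SELECT count(*) FROM events", catalog)` with a `catalog` that maps `"events"` to a list of `Split` objects returns the total row count, computed as the sum of per-worker partial counts.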
7. Presto Extensibility: connectors
[Diagram. Components shown: Coordinator (Parser/analyzer, Planner, Scheduler) and Worker, with each pluggable API implemented per connector.]
• Metadata API: Hive, Cassandra, Kafka, MySQL, …
• Data location API: Hive, Cassandra, Kafka, MySQL, …
• Data stream API: Hive, Cassandra, Kafka, MySQL, …
https://www.facebook.com/notes/facebook-engineering/presto-interacting-with-petabytes-of-data-at-facebook/10151786197628920
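To make the three pluggable APIs on this slide concrete, here is a minimal sketch of what a connector contract could look like. It is written in Python for brevity and is an illustration only: Presto's real connector interfaces are Java, and the class and method names below (`Connector`, `list_columns`, `get_splits`, `read_split`, `InMemoryConnector`, `scan`) are hypothetical, chosen to mirror the metadata, data location, and data stream roles.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, List, Tuple


class Connector(ABC):
    """Hypothetical connector contract with one method per pluggable API."""

    @abstractmethod
    def list_columns(self, table: str) -> List[str]:
        """Metadata API: describe a table's schema."""

    @abstractmethod
    def get_splits(self, table: str) -> List[int]:
        """Data location API: say which chunks of the table exist."""

    @abstractmethod
    def read_split(self, table: str, split: int) -> Iterator[Tuple]:
        """Data stream API: stream the rows of one chunk."""


class InMemoryConnector(Connector):
    """Toy connector backed by a dict instead of Hive, Cassandra, Kafka, or MySQL."""

    def __init__(self, tables: Dict[str, dict]):
        # tables maps name -> {"columns": [...], "splits": [[row, ...], ...]}
        self._tables = tables

    def list_columns(self, table: str) -> List[str]:
        return list(self._tables[table]["columns"])

    def get_splits(self, table: str) -> List[int]:
        return list(range(len(self._tables[table]["splits"])))

    def read_split(self, table: str, split: int) -> Iterator[Tuple]:
        yield from self._tables[table]["splits"][split]


def scan(connector: Connector, table: str) -> Iterator[Tuple]:
    """Engine-side code that depends only on the contract, not on the data source."""
    for split in connector.get_splits(table):
        yield from connector.read_split(table, split)
```

The point of the design is that `scan` never needs to know which storage system sits behind the connector; supporting another source means adding another `Connector` subclass.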
8. Presto = Performance
• Data stays in memory during execution and is pipelined across nodes MPP-style
• Vectorized columnar processing
• Presto is written in highly tuned Java
  - Efficient in-memory data structures
  - Very careful coding of inner loops
  - Bytecode generation
• Optimized ORC reader
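The first two bullets (in-memory pipelining and columnar processing) can be illustrated with a small sketch. It is an assumption-laden toy in Python, not Presto's Java implementation: the `Page` type and the `scan`, `filter_pages`, and `sum_column` operators are hypothetical names, and pure-Python lists show only the data layout and flow, not the speed of real vectorized code. Each operator is a generator, so a page moves through the whole pipeline before the next page is read, and values are stored column by column within a page.

```python
from typing import Dict, Iterator, List

# A page holds a batch of rows in columnar form: column name -> list of values.
Page = Dict[str, List[float]]


def scan(columns: Page, page_size: int) -> Iterator[Page]:
    """Source operator: cut a columnar table into fixed-size pages."""
    num_rows = len(next(iter(columns.values()), []))
    for start in range(0, num_rows, page_size):
        yield {name: values[start:start + page_size] for name, values in columns.items()}


def filter_pages(pages: Iterator[Page], column: str, threshold: float) -> Iterator[Page]:
    """Filter operator: keep rows where `column` exceeds `threshold`, one page at a time."""
    for page in pages:
        keep = [i for i, value in enumerate(page[column]) if value > threshold]
        yield {name: [values[i] for i in keep] for name, values in page.items()}


def sum_column(pages: Iterator[Page], column: str) -> float:
    """Sink operator: aggregate a single column without touching the others."""
    return sum(sum(page[column]) for page in pages)
```

Composing the operators as `sum_column(filter_pages(scan(table, 2), "price", 10.0), "qty")` never materializes the full filtered table; only one page is in flight at any time.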
9. Presto in Production
• Facebook
  - Multiple production clusters (100s of nodes total)
    - Including 300PB Hadoop data warehouse
  - 1000s of internal daily active users
  - Millions of queries each month
  - Multiple PBs scanned every day
  - Trillions of rows a day
• Netflix
  - Over 200-node production cluster on EC2
  - Over 15 PB in S3 (Parquet format)
  - Over 300 users and 2.5K queries daily
10. What is Teradata Doing?
• 100% open source contributions to Presto to increase adoption in the enterprise
• A multi-year roadmap commitment to phased enhancements of the open source code
• The first ever commercial support offering for Presto
Teradata Certified Presto
www.teradata.com/presto
11. Why is Teradata Contributing to Presto?
• Hadoop Distro Agnostic
• Modern Code Base
  - Presto is well-designed open source software with proper database architecture
• Strong Like-Minded Community
• Push down processing across multiple data platforms
• Leverage Teradata expertise to make SQL for Hadoop viable
14. Teradata's Contributions
• Ease of install and management via Presto-Admin tool
  - www.github.com/prestodb/presto-admin
  - Packaging Presto as an RPM
• Testing Framework for Presto
  - www.github.com/prestodb/tempto
  - Added a large number of tests
• Improvements to JDBC driver
  - To be open sourced on www.github.com/prestodb soon!
• Various SQL improvements
15. Teradata's Contribution Product Roadmap
• YARN Integration
• Ambari Integration
• ODBC & JDBC Drivers that actually work
• Security: Authentication & Authorization
• Continued SQL Improvements
• BI tool certifications, e.g. Tableau
• More Connectors, e.g. HBase
• Open Source our Docker-based Dev Env
• Open our Continuous Integration platform to the community