
Presto for the Enterprise @ Hadoop Meetup

Presentation on Presto (http://prestodb.io) basics, design and Teradata's open source involvement. Presented on Sept 24th 2015 by Wojciech Biela and Łukasz Osipiuk at the #20 Warsaw Hadoop User Group meetup http://www.meetup.com/warsaw-hug/events/224872317



  1. Warsaw Hadoop User Group. Wojciech Biela, Łukasz Osipiuk. www.teradata.com/presto
  2. Who are we? - Teradata Center for Hadoop!
     ➔ History of Teradata Center for Hadoop
       ◆ Formerly Hadapt, founded in July 2010 by Justin Borgman, Kamil Bajda-Pawlikowski, and Daniel Abadi
       ◆ Pioneered SQL-on-Hadoop market
       ◆ Based on work done by database research group in Yale Computer Science Department
       ◆ Hybrid of Hadoop scalability and DBMS performance
     ➔ Today
       ◆ Acquired by Teradata in July 2014, renamed Teradata Center for Hadoop
       ◆ 20+ developers with deep Hadoop and database expertise
       ◆ Headquarters in Boston, MA
       ◆ Teams in US (MA, CA) and Poland (Warsaw)
       ◆ Contributors to open source project Presto
  3. Talk Agenda
     ➔ What is Presto?
     ➔ What is Teradata doing?
     ➔ Can I see a Demo?
     ➔ How can I contribute?
  4. What is Presto?
     ➔ 100% open source distributed ANSI SQL engine for Big Data
       ◆ Modern code base
       ◆ Proven scalability
     ➔ Optimized for low-latency, interactive querying
       ◆ Cross-platform query capability, not only SQL on Hadoop
       ◆ Distributed under the Apache license, now supported by Teradata
       ◆ Used by a community of well-known, well-respected technology companies
  5. History of Presto
     ➔ Fall 2008: Facebook open sources Hive
     ➔ Fall 2012: 6 developers start Presto development
     ➔ Spring 2013: Presto rolled out within Facebook
     ➔ Fall 2013: Facebook open sources Presto
     ➔ Fall 2014: 88 releases, 41 contributors, 3943 commits
     ➔ Spring 2015: 98 releases, 65 contributors, 4587 commits; Teradata joins Presto community & offers support
  6. Query Execution (diagram). Labels: Client; Coordinator (Parser/analyzer, Planner, Scheduler); Workers; pluggable Metadata API, Data location API, and Data stream API.
  10. Logical and fragmented plan for the query:
      select shipdate, count(*) count, cast(sum(extendedprice) as bigint) price
      from h_lineitem
      where returnflag = 'R'
      group by shipdate
      order by count
      limit 20
  11. Logical and fragmented plan for the query:
      select *
      from hive.default.h_nation, psql.public.p_region
      where h_nation.regionkey = p_region.regionkey;
  14. Query Execution (same diagram labels as slide 6), with data shown as pages made of blocks: page 1 (blockA, blockB), page (blockA, blockB), ...
  16. Plan execution (diagram comparing Hive and Presto; labels: map, reduce, task, I/O)
  17. Presto Extensibility – plugins
      ➔ Connectors
      ➔ Data types
      ➔ Extra functions
      ➔ (new) Security providers
  18. Presto Extensibility – connector interfaces (diagram). The Coordinator (Parser/analyzer, Planner, Scheduler) and the Worker are shown with three connector interfaces, each implemented per data source:
      ➔ Metadata API: Hive, Cassandra, Kafka, MySQL, …
      ➔ Data location API: Hive, Cassandra, Kafka, MySQL, …
      ➔ Data stream API: Hive, Cassandra, Kafka, MySQL, …
  19. Presto Extensibility – connector interfaces
      public interface Connector {
          ConnectorHandleResolver getHandleResolver();
          ConnectorMetadata getMetadata();
          ConnectorSplitManager getSplitManager();
          ConnectorPageSourceProvider getPageSourceProvider();
          ConnectorRecordSetProvider getRecordSetProvider();
          ConnectorPageSinkProvider getPageSinkProvider();
          ConnectorRecordSinkProvider getRecordSinkProvider();
          ConnectorIndexResolver getIndexResolver();
          Set<SystemTable> getSystemTables();
          List<PropertyMetadata<?>> getSessionProperties();
          List<PropertyMetadata<?>> getTableProperties();
          ConnectorAccessControl getAccessControl();
          // A method with a body in an interface must be declared default (Java 8).
          default void shutdown() {}
      }
  20. Presto = Performance
      ➔ Data stays in memory during execution and is pipelined across nodes MPP-style
      ➔ Vectorized columnar processing
      ➔ Presto is written in highly tuned Java
        ◆ Efficient in-memory data structures
        ◆ Very careful coding of inner loops
        ◆ Bytecode generation
      ➔ Optimized ORC reader
      ➔ Predicate push-down
      ➔ Query optimizer
  21. Presto in Production
      ➔ Facebook
        ◆ Multiple production clusters (100s of nodes total)
          ● Including 300 PB Hadoop data warehouse
        ◆ 1000s of internal daily active users
        ◆ Millions of queries each month
        ◆ Multiple PBs scanned every day
        ◆ Trillions of rows a day
      ➔ Netflix
        ◆ Over 200-node production cluster on EC2
        ◆ Over 15 PB in S3 (Parquet format)
        ◆ Over 300 users and 2.5K queries daily
  22. What is Teradata Doing?
      ➔ 100% open source contributions to Presto to increase adoption in the enterprise
      ➔ A multi-year roadmap commitment to phased enhancements of the open source code
      ➔ The first ever commercial support offering for Presto
      Teradata Certified Presto: www.teradata.com/presto
  23. Why is Teradata Contributing to Presto?
      ➔ Hadoop Distro Agnostic
      ➔ Modern Code Base
        ◆ Presto is well-designed open source software with proper database architecture
      ➔ Strong Like-Minded Community
      ➔ Push-down processing across multiple data platforms
      ➔ Leverage Teradata expertise to make SQL for Hadoop viable
  24. Teradata Contributions to Presto (roadmap diagram)
      ➔ Phase 1, Implement: June 8, 2015
      ➔ Phase 2, Integrate: Q4 2015
      ➔ Phase 3, Proliferate: 2016
      ➔ Items shown: Installer, Documentation, Monitoring & Support Tools, ODBC / JDBC Drivers, BI Certification, Security, Connectors, Management Tools Integration, YARN Integration, Expanding ANSI SQL Coverage, Commercial Support
  25. Teradata’s Contributions
      ➔ Ease of install and management via Presto-Admin tool
        ◆ www.github.com/prestodb/presto-admin
        ◆ Packaging Presto as an RPM
      ➔ Testing Framework for Presto
        ◆ www.github.com/prestodb/tempto
        ◆ Added large number of tests
      ➔ JDBC driver for Java 6
      ➔ Various SQL improvements
  26. Teradata’s Contribution Product Roadmap
      ➔ Continued SQL Improvements
      ➔ Security – Authentication & Authorization
      ➔ More Connectors – e.g. HBase
      ➔ ODBC & JDBC Drivers that actually work
      ➔ BI tool certifications – e.g. Tableau
      ➔ YARN Integration
      ➔ Ambari Integration
      ➔ Open Source our Docker based Dev Env - WIP
      ➔ Open our Continuous Integration platform to the community
  27. Teradata Engineers Dedicated to Presto
  28. Early Feedback is Extremely Positive
      "Presto is an integral part of the Airbnb data infrastructure stack with hundreds of employees running queries each day with the technology. We are excited to see Teradata joining the Presto open source community and are encouraged by the direction of their contributions." - James Mayfield, product lead, Airbnb
      "We are excited to see Teradata's commitment to Presto and adding capabilities in the open source domain. This will create interesting opportunities within our technical and business teams to open up more access options to our critical data. We think this is a positive for Teradata and for the community as a whole." - Steve Deasy, vice president of Engineering, Groupon
  29. Demo Time!
  30. How can I contribute?
      www.github.com/facebook/presto
      www.github.com/prestodb
      Certified Distro: www.teradata.com/presto
      Website: www.prestodb.io
      Presto User’s Group: www.groups.google.com/group/presto-users
      Facebook Page: www.facebook.com/prestodb
      Twitter: #prestodb
  31. Join us!
      We’re hiring!
      ➔ Warsaw
      ➔ Boston
      Job Offer: bit.do/presto
      Contact: Wojciech.Biela@teradata.com
  32. Available for Download: Presto 101t certified by Teradata
      ➔ Presto 101t Server, CLI, JDBC
      ➔ Presto-Admin 0.1
      ➔ Documentation
      ➔ HDP w/ Presto VM Sandbox
      ➔ CDH w/ Presto VM Sandbox
      www.teradata.com/presto
  33. Wojciech.Biela@teradata.com, Lukasz.Osipiuk@teradata.com
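
Slide 14 labels the data in the query execution diagram as pages made of blocks, and slide 20 lists vectorized columnar processing. A minimal sketch of that layout follows, assuming a page is a batch of rows stored as one array per column. The names `PageSketch`, `LongBlock`, `Page`, and `sumWhereEquals` are hypothetical and chosen for illustration; they are not Presto's internal classes.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: hypothetical names, not Presto's Page/Block implementation.
final class PageSketch {
    /** A block holds the values of one column for a batch of rows. */
    static final class LongBlock {
        final long[] values;

        LongBlock(long... values) {
            this.values = values;
        }
    }

    /** A page is a batch of rows stored column by column, one block per column. */
    static final class Page {
        final LongBlock[] blocks;
        final int rowCount;

        Page(LongBlock... blocks) {
            this.blocks = blocks;
            this.rowCount = blocks.length == 0 ? 0 : blocks[0].values.length;
            for (LongBlock block : blocks) {
                if (block.values.length != rowCount) {
                    throw new IllegalArgumentException("all blocks in a page must have the same length");
                }
            }
        }
    }

    /** Sums column sumCol over the rows whose column filterCol equals wanted. */
    static long sumWhereEquals(List<Page> pages, int filterCol, long wanted, int sumCol) {
        long total = 0;
        for (Page page : pages) {
            // Only the two columns involved are touched; the inner loop runs over plain arrays.
            long[] filter = page.blocks[filterCol].values;
            long[] amounts = page.blocks[sumCol].values;
            for (int row = 0; row < page.rowCount; row++) {
                if (filter[row] == wanted) {
                    total += amounts[row];
                }
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<Page> pages = Arrays.asList(
                new Page(new LongBlock(1, 2, 1), new LongBlock(10, 20, 30)),
                new Page(new LongBlock(2, 1), new LongBlock(5, 7)));
        // Rows with column 0 == 1 contribute 10 + 30 + 7.
        System.out.println(sumWhereEquals(pages, 0, 1L, 1)); // prints 47
    }
}
```

Storing each column as its own array is what lets the filter and the sum run as tight loops over primitives, which is one plausible reading of the "very careful coding of inner loops" bullet on slide 20.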

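The Connector interface on slide 19 bundles several responsibilities, which slide 18 groups into three APIs: metadata, data location, and data stream. The sketch below illustrates that division of labor with a single in-memory table. The interfaces `Metadata`, `SplitManager`, and `RecordProvider` are hypothetical simplifications written for this example; the real Presto SPI types (`ConnectorMetadata`, `ConnectorSplitManager`, `ConnectorRecordSetProvider`) have different signatures.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, heavily simplified stand-ins for the three connector roles.
// These are NOT the Presto SPI interfaces; names and signatures are assumptions.
final class ConnectorSketch {
    /** Metadata role: describes the table so a query can be analyzed and planned. */
    interface Metadata {
        List<String> columnNames();
    }

    /** Data location role: divides the table into independently readable splits. */
    interface SplitManager {
        /** Each split is a row range {firstRow, endRowExclusive}. */
        List<int[]> splits(int maxRowsPerSplit);
    }

    /** Data stream role: returns the rows that belong to one split. */
    interface RecordProvider {
        List<Object[]> read(int[] split);
    }

    static final class InMemoryTable implements Metadata, SplitManager, RecordProvider {
        private final List<String> columns;
        private final List<Object[]> rows;

        InMemoryTable(List<String> columns, List<Object[]> rows) {
            this.columns = new ArrayList<>(columns);
            this.rows = new ArrayList<>(rows);
        }

        @Override
        public List<String> columnNames() {
            return new ArrayList<>(columns);
        }

        @Override
        public List<int[]> splits(int maxRowsPerSplit) {
            if (maxRowsPerSplit <= 0) {
                throw new IllegalArgumentException("maxRowsPerSplit must be positive");
            }
            List<int[]> result = new ArrayList<>();
            for (int start = 0; start < rows.size(); start += maxRowsPerSplit) {
                int end = Math.min(start + maxRowsPerSplit, rows.size());
                result.add(new int[] {start, end});
            }
            return result;
        }

        @Override
        public List<Object[]> read(int[] split) {
            return new ArrayList<>(rows.subList(split[0], split[1]));
        }
    }

    /** Reads the table split by split, as if each split were handed to a worker. */
    static int countRows(InMemoryTable table, int maxRowsPerSplit) {
        int count = 0;
        for (int[] split : table.splits(maxRowsPerSplit)) {
            count += table.read(split).size();
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> columns = new ArrayList<>();
        columns.add("nationkey");
        columns.add("name");
        List<Object[]> rows = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            rows.add(new Object[] {i, "nation" + i});
        }
        InMemoryTable table = new InMemoryTable(columns, rows);
        System.out.println(table.columnNames());
        System.out.println(table.splits(2).size() + " splits, " + countRows(table, 2) + " rows");
    }
}
```

Keeping the three roles separate means a planner only needs the metadata, a scheduler only needs the splits, and a worker only needs to read the split it was given. That is the separation the diagram on slide 18 suggests.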