1
Warsaw Hadoop User Group
Wojciech Biela
Łukasz Osipiuk
www.teradata.com/presto
2
Who are we? - Teradata Center for Hadoop!
➔ History of Teradata Center for Hadoop
◆ Formerly Hadapt, founded in July 2010 by Justin Borgman, Kamil Bajda-Pawlikowski, and Daniel Abadi
◆ Pioneered the SQL-on-Hadoop market
◆ Based on work done by the database research group in the Yale Computer Science Department
◆ Hybrid of Hadoop scalability and DBMS performance
➔ Today
◆ Acquired by Teradata in July 2014 and renamed Teradata Center for Hadoop
◆ 20+ developers with deep Hadoop and database expertise
◆ Headquarters in Boston, MA
◆ Teams in the US (MA, CA) and Poland (Warsaw)
◆ Contributors to the open source project Presto
3
Talk Agenda
➔ What is Presto?
➔ What is Teradata doing?
➔ Can I see a demo?
➔ How can I contribute?
4
What is Presto?
➔ 100% open source distributed ANSI SQL engine for Big Data
◆ Modern code base
◆ Proven scalability
➔ Optimized for low-latency, interactive querying
◆ Cross-platform query capability, not only SQL on Hadoop
◆ Distributed under the Apache license, now supported by Teradata
◆ Used by a community of well-known, well-respected technology companies
5
History of Presto
➔ Fall 2008: Facebook open sources Hive
➔ Fall 2012: 6 developers start Presto development
➔ Spring 2013: Presto rolled out within Facebook
➔ Fall 2013: Facebook open sources Presto
➔ Fall 2014: 88 releases, 41 contributors, 3943 commits
➔ Spring 2015: 98 releases, 65 contributors, 4587 commits; Teradata joins the Presto community and offers support
6-9
Query Execution
[Diagram, repeated on slides 6 to 9: a Client, a Coordinator (Parser/analyzer, Planner, Scheduler), and three Workers. The Metadata API, the Data location API, and the Data stream API of each Worker are marked "Pluggable".]
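The scheduling step in the diagram can be pictured with a toy model. This is a sketch only, not Presto code: `ToyCoordinator`, `splitsOf`, `schedule`, the `table#n` split names, and the round-robin policy are all assumptions made for illustration. The slides only show that the Scheduler sits in the Coordinator next to the Data location API and hands work to Workers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of the scheduling step; every name here is hypothetical.
class ToyCoordinator {
    // Stand-in for the Data location API: names the splits (chunks) of a table.
    static List<String> splitsOf(String table, int parts) {
        List<String> splits = new ArrayList<>();
        for (int i = 0; i < parts; i++) {
            splits.add(table + "#" + i);
        }
        return splits;
    }

    // Stand-in for the Scheduler: hands splits to workers round-robin.
    static Map<String, List<String>> schedule(List<String> splits, List<String> workers) {
        if (workers.isEmpty()) {
            throw new IllegalArgumentException("at least one worker is required");
        }
        Map<String, List<String>> assignment = new LinkedHashMap<>();
        for (String worker : workers) {
            assignment.put(worker, new ArrayList<>());
        }
        for (int i = 0; i < splits.size(); i++) {
            assignment.get(workers.get(i % workers.size())).add(splits.get(i));
        }
        return assignment;
    }

    public static void main(String[] args) {
        List<String> workers = Arrays.asList("worker-1", "worker-2", "worker-3");
        schedule(splitsOf("h_lineitem", 5), workers)
                .forEach((worker, splits) -> System.out.println(worker + " -> " + splits));
    }
}
```

Each worker can then process its splits independently, which is what lets the query run in parallel.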
10
Logical and fragmented plan
select
  shipdate,
  count(*) count,
  cast(sum(extendedprice) as bigint) price
from
  h_lineitem
where
  returnflag = 'R'
group by shipdate
order by count
limit 20
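One way to picture a fragmented plan for a GROUP BY query of this shape: each worker counts rows per group over its own share of the data, and a final stage merges those partial counts, sorts, and applies the limit. The sketch below assumes the rows have already been filtered on `returnflag = 'R'` and reduced to their `shipdate` value. `ToyAggregation` and its methods are hypothetical names, and the tie-break on `shipdate` is added only to make the output deterministic. It is not Presto's operator code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical two-stage aggregation: partial counts per worker, then a final merge.
class ToyAggregation {
    // Partial stage: one worker counts rows per shipdate in its own split.
    static Map<String, Long> partialCount(List<String> shipdates) {
        Map<String, Long> counts = new HashMap<>();
        for (String shipdate : shipdates) {
            counts.merge(shipdate, 1L, Long::sum);
        }
        return counts;
    }

    // Final stage: merge partial counts, order by count ascending, keep `limit` rows.
    static List<Map.Entry<String, Long>> finalTopN(List<Map<String, Long>> partials, int limit) {
        Map<String, Long> merged = new HashMap<>();
        for (Map<String, Long> partial : partials) {
            partial.forEach((shipdate, count) -> merged.merge(shipdate, count, Long::sum));
        }
        List<Map.Entry<String, Long>> rows = new ArrayList<>(merged.entrySet());
        // Ties on count are broken by shipdate (an assumption, for stable output).
        rows.sort(Map.Entry.<String, Long>comparingByValue()
                .thenComparing(Map.Entry.<String, Long>comparingByKey()));
        return new ArrayList<>(rows.subList(0, Math.min(limit, rows.size())));
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for two workers' splits.
        Map<String, Long> worker1 = partialCount(Arrays.asList("d1", "d2", "d1"));
        Map<String, Long> worker2 = partialCount(Arrays.asList("d2", "d3", "d1"));
        System.out.println(finalTopN(Arrays.asList(worker1, worker2), 20));
    }
}
```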
11
Logical and fragmented plan
select
  *
from
  hive.default.h_nation,
  psql.public.p_region
where
  h_nation.regionkey = p_region.regionkey;
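The query above joins a table from the `hive` catalog with one from the `psql` catalog. Once both connectors deliver rows, the join itself does not care where the rows came from. A hash join is one common way to execute such an equi-join; the sketch below shows the idea on in-memory rows. `ToyFederatedJoin`, the two-column row layout, and the sample values are assumptions for illustration, not Presto internals.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical hash join over rows of the form {regionkey, name}.
class ToyFederatedJoin {
    static List<String> hashJoin(List<String[]> nations, List<String[]> regions) {
        // Build phase: index the (smaller) region side by regionkey.
        Map<String, List<String>> regionsByKey = new HashMap<>();
        for (String[] region : regions) {
            regionsByKey.computeIfAbsent(region[0], key -> new ArrayList<>()).add(region[1]);
        }
        // Probe phase: look up each nation row's regionkey in the index.
        List<String> joined = new ArrayList<>();
        for (String[] nation : nations) {
            List<String> matches = regionsByKey.getOrDefault(nation[0], Collections.<String>emptyList());
            for (String regionName : matches) {
                joined.add(nation[1] + " | " + regionName);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        // Made-up sample rows; in the query one side would come from Hive,
        // the other from PostgreSQL.
        List<String[]> nations = Arrays.asList(
                new String[] {"3", "FRANCE"}, new String[] {"1", "BRAZIL"}, new String[] {"9", "NOWHERE"});
        List<String[]> regions = Arrays.asList(
                new String[] {"1", "AMERICA"}, new String[] {"3", "EUROPE"});
        hashJoin(nations, regions).forEach(System.out::println);
    }
}
```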
12-15
Query Execution
[Same diagram as on slides 6 to 9, repeated on slides 12 to 15. Slide 14 additionally shows pages (page 1, page ...), each containing blocks (blockA, blockB, ...).]
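The page and block labels on slide 14 suggest a columnar layout: a page holds a batch of rows, stored as one block per column. A minimal sketch under that assumption follows. `ToyPage` and its methods are hypothetical and far simpler than Presto's own classes; here every column is just a `long[]`.

```java
// Hypothetical columnar page: one long[] block per column, all of equal length.
class ToyPage {
    private final long[][] blocks;
    private final int positionCount;

    ToyPage(long[]... blocks) {
        for (long[] block : blocks) {
            if (block.length != blocks[0].length) {
                throw new IllegalArgumentException("all blocks must have the same length");
            }
        }
        this.blocks = blocks;
        this.positionCount = blocks.length == 0 ? 0 : blocks[0].length;
    }

    // Number of rows in the page.
    int getPositionCount() {
        return positionCount;
    }

    // The block holding one column for every row of the page.
    long[] getBlock(int channel) {
        return blocks[channel];
    }

    // Summing a column only touches that column's contiguous array.
    static long sumColumn(ToyPage page, int channel) {
        long sum = 0;
        for (long value : page.getBlock(channel)) {
            sum += value;
        }
        return sum;
    }

    public static void main(String[] args) {
        ToyPage page = new ToyPage(new long[] {1, 2, 3}, new long[] {10, 20, 30});
        System.out.println(page.getPositionCount() + " rows, sum of column 1 = " + sumColumn(page, 1));
    }
}
```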
16
Plan execution
[Diagram comparing plan execution in Hive (map and reduce stages with I/O steps) and in Presto (tasks with I/O).]
17
Presto Extensibility – plugins
➔ Connectors
➔ Data types
➔ Extra functions
➔ (new) Security providers
18
Presto Extensibility – connector interfaces
[Diagram: the Coordinator (Parser/analyzer, Planner, Scheduler) and a Worker, with three connector interfaces: Metadata API, Data location API, and Data stream API. Each interface lists the connectors Hive, Cassandra, Kafka, MySQL, …]
19
Presto Extensibility – connector interfaces
import java.util.List;
import java.util.Set;

// Connector interface as shown on the slide; the Connector* types,
// SystemTable, and PropertyMetadata belong to the Presto SPI.
public interface Connector
{
    ConnectorHandleResolver getHandleResolver();
    ConnectorMetadata getMetadata();
    ConnectorSplitManager getSplitManager();
    ConnectorPageSourceProvider getPageSourceProvider();
    ConnectorRecordSetProvider getRecordSetProvider();
    ConnectorPageSinkProvider getPageSinkProvider();
    ConnectorRecordSinkProvider getRecordSinkProvider();
    ConnectorIndexResolver getIndexResolver();
    Set<SystemTable> getSystemTables();
    List<PropertyMetadata<?>> getSessionProperties();
    List<PropertyMetadata<?>> getTableProperties();
    ConnectorAccessControl getAccessControl();
    // A method with a body inside an interface must be a Java 8 default method.
    default void shutdown() {}
}
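To make the three roles from the connector diagram concrete (metadata, data location, data stream), here is a heavily simplified in-memory stand-in. Everything in it is hypothetical: `ToyMemoryConnector`, its method names, and the row-range splits are illustration only and do not use or mirror the Presto SPI types listed above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, heavily simplified stand-in for a connector; not the Presto SPI.
class ToyMemoryConnector {
    private final Map<String, List<String>> tables = new LinkedHashMap<>();
    private final int rowsPerSplit;

    ToyMemoryConnector(int rowsPerSplit) {
        if (rowsPerSplit <= 0) {
            throw new IllegalArgumentException("rowsPerSplit must be positive");
        }
        this.rowsPerSplit = rowsPerSplit;
    }

    void addTable(String name, List<String> rows) {
        tables.put(name, new ArrayList<>(rows));
    }

    // Metadata role: which tables exist.
    List<String> listTables() {
        return new ArrayList<>(tables.keySet());
    }

    // Data location role: cut a table into splits, each a [start, end) row range.
    List<int[]> getSplits(String table) {
        int rows = tables.get(table).size();
        List<int[]> splits = new ArrayList<>();
        for (int start = 0; start < rows; start += rowsPerSplit) {
            splits.add(new int[] {start, Math.min(start + rowsPerSplit, rows)});
        }
        return splits;
    }

    // Data stream role: read the rows covered by one split.
    List<String> readSplit(String table, int[] split) {
        return new ArrayList<>(tables.get(table).subList(split[0], split[1]));
    }

    public static void main(String[] args) {
        ToyMemoryConnector connector = new ToyMemoryConnector(2);
        connector.addTable("nation", Arrays.asList("a", "b", "c"));
        for (int[] split : connector.getSplits("nation")) {
            System.out.println(connector.readSplit("nation", split));
        }
    }
}
```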
20
Presto = Performance
➔ Data stays in memory during execution and is pipelined across nodes MPP-style
➔ Vectorized columnar processing
➔ Presto is written in highly tuned Java
◆ Efficient in-memory data structures
◆ Very careful coding of inner loops
◆ Bytecode generation
➔ Optimized ORC reader
➔ Predicate push-down
➔ Query optimizer
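Predicate push-down can be illustrated with per-chunk min/max statistics, the kind of metadata that columnar formats such as ORC keep: a chunk whose value range cannot match the predicate is skipped without scanning its rows. The sketch below assumes an equality predicate on a single `long` column. `ToyPushDown` and `Chunk` are hypothetical names, and this is not Presto's ORC reader.

```java
// Hypothetical predicate push-down using min/max statistics per chunk.
class ToyPushDown {
    static final class Chunk {
        final long[] values;
        final long min;
        final long max;

        // Statistics are computed once, when the chunk is written.
        Chunk(long... values) {
            if (values.length == 0) {
                throw new IllegalArgumentException("empty chunk");
            }
            this.values = values;
            long lo = values[0];
            long hi = values[0];
            for (long v : values) {
                lo = Math.min(lo, v);
                hi = Math.max(hi, v);
            }
            this.min = lo;
            this.max = hi;
        }
    }

    // Counts values equal to target; returns {matches, chunksScanned}.
    static int[] countEquals(Chunk[] chunks, long target) {
        int matches = 0;
        int chunksScanned = 0;
        for (Chunk chunk : chunks) {
            if (target < chunk.min || target > chunk.max) {
                continue; // the statistics rule this chunk out, so skip the scan
            }
            chunksScanned++;
            for (long v : chunk.values) {
                if (v == target) {
                    matches++;
                }
            }
        }
        return new int[] {matches, chunksScanned};
    }

    public static void main(String[] args) {
        Chunk[] chunks = {new Chunk(1, 2, 3), new Chunk(10, 11, 12), new Chunk(3, 3, 20)};
        int[] result = countEquals(chunks, 3);
        System.out.println(result[0] + " matches, " + result[1] + " of " + chunks.length + " chunks scanned");
    }
}
```

The inner loop runs over a plain `long[]`, which is also the shape that columnar processing favors.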
21
Presto in Production
➔ Facebook
◆ Multiple production clusters (100s of nodes total)
● Including 300PB Hadoop data warehouse
◆ 1000s of internal daily active users
◆ Millions of queries each month
◆ Multiple PBs scanned every day
◆ Trillions of rows a day
➔ Netflix
◆ Over 200-node production cluster on EC2
◆ Over 15 PB in S3 (Parquet format)
◆ Over 300 users and 2.5K queries daily
22
What is Teradata Doing?
➔ 100% open source contributions to Presto to increase adoption in the enterprise
➔ A multi-year roadmap commitment to phased enhancements of the open source code
➔ The first-ever commercial support offering for Presto
Teradata Certified Presto
www.teradata.com/presto
23
Why is Teradata Contributing to Presto?
➔ Hadoop Distro Agnostic
➔ Modern Code Base
◆ Presto is well-designed open source software with proper database architecture
➔ Strong Like-Minded Community
➔ Push-down processing across multiple data platforms
➔ Leverage Teradata expertise to make SQL for Hadoop viable
24
Teradata Contributions to Presto
Phase 1: Implement (June 8, 2015)
Phase 2: Integrate (Q4 2015)
Phase 3: Proliferate (2016)
[Roadmap chart; the items below are spread across the three phases.]
➔ Installer
➔ Documentation
➔ Monitoring & Support Tools
➔ ODBC / JDBC Drivers
➔ BI Certification
➔ Security
➔ Connectors
➔ Commercial Support
➔ Expanding ANSI SQL Coverage
➔ Management Tools Integration
➔ YARN Integration
25
Teradata’s Contributions
➔ Ease of install and management via the Presto-Admin tool
◆ www.github.com/prestodb/presto-admin
◆ Packaging Presto as an RPM
➔ Testing framework for Presto
◆ www.github.com/prestodb/tempto
◆ Added a large number of tests
➔ JDBC driver for Java 6
➔ Various SQL improvements
26
Teradata’s Contribution Product Roadmap
➔ Continued SQL Improvements
➔ Security – Authentication & Authorization
➔ More Connectors – e.g. HBase
➔ ODBC & JDBC Drivers that actually work
➔ BI tool certifications – e.g. Tableau
➔ YARN Integration
➔ Ambari Integration
➔ Open source our Docker-based dev environment (WIP)
➔ Open our Continuous Integration platform to the community
27
Teradata Engineers Dedicated to Presto
28
Early Feedback is Extremely Positive
“Presto is an integral part of the Airbnb data infrastructure stack with hundreds of employees running queries each day with the technology. We are excited to see Teradata joining the Presto open source community and are encouraged by the direction of their contributions.”
- James Mayfield, product lead, Airbnb
“We are excited to see Teradata's commitment to Presto and adding capabilities in the open source domain. This will create interesting opportunities within our technical and business teams to open up more access options to our critical data. We think this is a positive for Teradata and for the community as a whole.”
- Steve Deasy, vice president of Engineering, Groupon
29
Demo Time!
30
How can I contribute?
www.github.com/facebook/presto
www.github.com/prestodb
Certified Distro: www.teradata.com/presto
Website: www.prestodb.io
Presto User’s Group: www.groups.google.com/group/presto-users
Facebook Page: www.facebook.com/prestodb
Twitter: #prestodb
31
We’re hiring!
➔ Warsaw
➔ Boston
Job Offer: bit.do/presto
Contact: Wojciech.Biela@teradata.com
Join us!
32
Available for Download
➔ Presto 101t Server, CLI, JDBC
➔ Presto-Admin 0.1
➔ Documentation
➔ HDP w/ Presto VM Sandbox
➔ CDH w/ Presto VM Sandbox
www.teradata.com/presto
Presto 101t certified by Teradata
33
Wojciech.Biela@teradata.com
Lukasz.Osipiuk@teradata.com

Presto for the Enterprise @ Hadoop Meetup

  • 1. 11 Warsaw Hadoop User Group Wojciech Biela Łukasz Osipiuk www.teradata.com/presto
  • 2. 2 ➔ History of Teradata Center for Hadoop ◆ Formerly Hadapt Founded in July, 2010 by Justin Borgman, Kamil Bajda- Pawlikowski, and Daniel Abadi ◆ Pioneered SQL-on-Hadoop market ◆ Based on work done by database research group in Yale Computer Science Department ◆ Hybrid of Hadoop scalability and DBMS performance ➔ Today ◆ Acquired by Teradata in July, 2014, renamed Teradata Center for Hadoop ◆ 20+ developers with deep Hadoop and database expertise ◆ Headquarters in Boston, MA ◆ Teams in US (MA, CA) and Poland (Warsaw) ◆ Contributors to open source project Presto Who are we? - Teradata Center for Hadoop!
  • 3. 3 ➔ What is Presto? ➔ What is Teradata doing? ➔ Can I see a Demo? ➔ How can I contribute? Talk Agenda
  • 4. 4 ➔ 100% open source distributed ANSI SQL engine for Big Data ◆ Modern code base ◆ Proven scalability ➔ Optimized for low latency, Interactive querying ◆ Cross platform query capability, not only SQL on Hadoop ◆ Distributed under the Apache license, now supported by Teradata ◆ Used by a community of well known, well respected technology companies What is Presto?
  • 5. 5 History of Presto FALL 2012 6 developers start Presto development FALL 2014 88 Releases 41 Contributors 3943 Commits SPRING 2015 98 Releases 65 Contributors 4587 Commits --------- Teradata joins Presto community & offers support SPRING 2013 Presto rolled out within Facebook FALL 2013 Facebook open sources Presto FALL 2008 Facebook open sources Hive
  • 6. 6 Query Execution Data stream API Worker Data stream API Worker Coordinator Metadata API Parser/ analyzer Planner Scheduler Worker Client Data location API Pluggable
  • 7. 7 Query Execution Data stream API Worker Data stream API Worker Coordinator Data Location API Metadata API Parser/ analyzer Planner Scheduler Worker Client Pluggable
  • 8. 8 Query Execution Data stream API Worker Data stream API Worker Coordinator Data Location API Metadata API Parser/ analyzer Planner Scheduler Worker Client Pluggable
  • 9. 9 Query Execution Data stream API Worker Data stream API Worker Coordinator Data Location API Metadata API Parser/ analyzer Planner Scheduler Worker Client Pluggable
  • 10. Logical and fragmented plan for the query:
      select shipdate,
             count(*) count,
             cast(sum(extendedprice) as bigint) price
      from h_lineitem
      where returnflag = 'R'
      group by shipdate
      order by count
      limit 20
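    A fragmented plan for a GROUP BY query like the one on slide 10 is commonly split into a partial aggregation that runs on each worker and a final aggregation that merges the partial results. The sketch below illustrates only that idea, in plain Java and for the count(*) column alone. The class and method names (`TwoPhaseCountSketch`, `partialCount`, `finalCount`) and the sample dates are hypothetical; this is not Presto's planner or operator code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of a partial/final aggregation split; not Presto code.
final class TwoPhaseCountSketch {

    // Partial step: one worker counts rows per shipdate over its own split.
    static Map<String, Long> partialCount(List<String> shipdatesInSplit) {
        Map<String, Long> counts = new TreeMap<>();
        for (String shipdate : shipdatesInSplit) {
            counts.merge(shipdate, 1L, Long::sum);
        }
        return counts;
    }

    // Final step: merge the partial counts produced by all workers.
    static Map<String, Long> finalCount(List<Map<String, Long>> partials) {
        Map<String, Long> merged = new TreeMap<>();
        for (Map<String, Long> partial : partials) {
            for (Map.Entry<String, Long> entry : partial.entrySet()) {
                merged.merge(entry.getKey(), entry.getValue(), Long::sum);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Two made-up splits standing in for data processed by two workers.
        List<Map<String, Long>> partials = new ArrayList<>();
        partials.add(partialCount(Arrays.asList("1994-01-01", "1994-01-02", "1994-01-01")));
        partials.add(partialCount(Arrays.asList("1994-01-02", "1994-01-03")));
        System.out.println(finalCount(partials));
    }
}
```

    Only the small per-group maps cross the network in this scheme, which is why the split pays off when there are far fewer groups than rows.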
  • 14. Query Execution (same architecture diagram as slide 6, with the data exchange added): data moves as a sequence of pages, each page made up of blocks (page 1: blockA, blockB; next page: blockA, blockB; ...).
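    Slide 14 shows data moving as pages made up of blocks. A minimal sketch of such a layout follows, assuming that each block holds the values of one column for the rows in the page. The names `PageSketch` and `LongBlock` are hypothetical and much simpler than Presto's own classes.

```java
// Hypothetical sketch of a columnar page; names are illustrative, not Presto's classes.
final class PageSketch {

    // One block holds the values of a single column for the rows in the page.
    static final class LongBlock {
        final long[] values;

        LongBlock(long... values) {
            this.values = values;
        }

        int positionCount() {
            return values.length;
        }
    }

    final LongBlock[] blocks;  // one block per column
    final int positionCount;   // number of rows in the page

    PageSketch(LongBlock... blocks) {
        if (blocks.length == 0) {
            throw new IllegalArgumentException("a page needs at least one block");
        }
        this.positionCount = blocks[0].positionCount();
        for (LongBlock block : blocks) {
            if (block.positionCount() != positionCount) {
                throw new IllegalArgumentException("all blocks must have the same row count");
            }
        }
        this.blocks = blocks;
    }

    // Column-at-a-time processing: sum one column in a tight loop over its block.
    long sumColumn(int channel) {
        long sum = 0;
        for (long value : blocks[channel].values) {
            sum += value;
        }
        return sum;
    }

    public static void main(String[] args) {
        PageSketch page = new PageSketch(new LongBlock(1, 2, 3), new LongBlock(10, 20, 30));
        System.out.println("rows=" + page.positionCount + ", sum of column 1=" + page.sumColumn(1));
    }
}
```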
  • 17. Presto Extensibility – plugins
    ➔ Connectors
    ➔ Data types
    ➔ Extra functions
    ➔ (new) Security providers
  • 18. Presto Extensibility – connector interfaces (diagram): the Coordinator (Parser/analyzer, Planner, Scheduler) and the Workers sit on top of three APIs: the Metadata API, the Data location API and the Data stream API. Each API is implemented per connector: Hive, Cassandra, Kafka, MySQL, …
  • 19. Presto Extensibility – connector interfaces
      public interface Connector {
          ConnectorHandleResolver getHandleResolver();

          // Metadata API
          ConnectorMetadata getMetadata();

          // Data location API (splits)
          ConnectorSplitManager getSplitManager();

          // Data stream API: reading
          ConnectorPageSourceProvider getPageSourceProvider();
          ConnectorRecordSetProvider getRecordSetProvider();

          // Data stream API: writing
          ConnectorPageSinkProvider getPageSinkProvider();
          ConnectorRecordSinkProvider getRecordSinkProvider();

          ConnectorIndexResolver getIndexResolver();
          Set<SystemTable> getSystemTables();
          List<PropertyMetadata<?>> getSessionProperties();
          List<PropertyMetadata<?>> getTableProperties();
          ConnectorAccessControl getAccessControl();

          // A method with a body in an interface must be declared default
          default void shutdown() {}
      }
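    To show how the three connector roles fit together (metadata, data location, data stream), here is a toy, self-contained sketch. Its interfaces (`Metadata`, `SplitManager`, `RecordSetProvider`) are simplified, hypothetical stand-ins and not the Presto SPI types listed on slide 19. The `scan` method plays the part of the engine: it resolves the table, enumerates the splits, then reads rows from each split.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified, hypothetical stand-ins for the three connector roles
// (metadata, data location, data stream). These are NOT the Presto SPI types.
final class ConnectorSketch {

    interface Metadata {
        List<String> listTables();
    }

    interface SplitManager {
        // Each split is a half-open row range: [startRow, endRow).
        List<int[]> getSplits(String table);
    }

    interface RecordSetProvider {
        List<String> read(String table, int[] split);
    }

    // A toy in-memory "connector" exposing a single table named "demo".
    static final List<String> ROWS = Arrays.asList("r0", "r1", "r2", "r3", "r4");

    static final Metadata METADATA = () -> Arrays.asList("demo");
    static final SplitManager SPLITS =
            table -> Arrays.asList(new int[] {0, 3}, new int[] {3, 5});
    static final RecordSetProvider RECORDS =
            (table, split) -> ROWS.subList(split[0], split[1]);

    // Engine side: check the table exists, enumerate splits, read each split.
    static List<String> scan(String table) {
        if (!METADATA.listTables().contains(table)) {
            throw new IllegalArgumentException("unknown table: " + table);
        }
        List<String> rows = new ArrayList<>();
        for (int[] split : SPLITS.getSplits(table)) {
            rows.addAll(RECORDS.read(table, split));
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(scan("demo"));
    }
}
```

    In a real cluster the splits would be handed to different workers and read in parallel. The sequential loop here only shows the order of the calls.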
  • 20. Presto = Performance
    ➔ Data stays in memory during execution and is pipelined across nodes, MPP-style
    ➔ Vectorized columnar processing
    ➔ Presto is written in highly tuned Java
      ◆ Efficient in-memory data structures
      ◆ Very careful coding of inner loops
      ◆ Bytecode generation
    ➔ Optimized ORC reader
    ➔ Predicate push-down
    ➔ Query optimizer
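    One common form of predicate push-down in columnar readers is to keep min/max statistics per chunk of a column and skip chunks that cannot contain a match. The sketch below illustrates that general technique under the assumption of a single `long` column and a `value > threshold` predicate. `StripeSkipSketch` and `Stripe` are hypothetical names, and this is not Presto's ORC reader.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of predicate push-down via per-stripe min/max statistics.
// Names are illustrative; this is not Presto's ORC reader.
final class StripeSkipSketch {

    // A stripe stores a chunk of one column plus min/max statistics over it.
    static final class Stripe {
        final long[] values;
        final long min;  // would serve "<" predicates; unused by the ">" example below
        final long max;

        Stripe(long... values) {
            if (values.length == 0) {
                throw new IllegalArgumentException("stripe must not be empty");
            }
            this.values = values;
            this.min = Arrays.stream(values).min().getAsLong();
            this.max = Arrays.stream(values).max().getAsLong();
        }
    }

    // Evaluates "value > threshold" and returns {matching rows, stripes actually read}.
    static long[] countGreaterThan(List<Stripe> stripes, long threshold) {
        long matches = 0;
        long stripesRead = 0;
        for (Stripe stripe : stripes) {
            if (stripe.max <= threshold) {
                continue; // statistics prove no row can match, so skip the stripe
            }
            stripesRead++;
            for (long value : stripe.values) {
                if (value > threshold) {
                    matches++;
                }
            }
        }
        return new long[] {matches, stripesRead};
    }

    public static void main(String[] args) {
        List<Stripe> stripes = Arrays.asList(new Stripe(1, 2, 3), new Stripe(10, 20, 30));
        long[] result = countGreaterThan(stripes, 5);
        System.out.println("matches=" + result[0] + ", stripesRead=" + result[1]);
    }
}
```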
  • 21. Presto in Production
    ➔ Facebook
      ◆ Multiple production clusters (100s of nodes total)
        ● Including 300 PB Hadoop data warehouse
      ◆ 1000s of internal daily active users
      ◆ Millions of queries each month
      ◆ Multiple PBs scanned every day
      ◆ Trillions of rows a day
    ➔ Netflix
      ◆ Over 200-node production cluster on EC2
      ◆ Over 15 PB in S3 (Parquet format)
      ◆ Over 300 users and 2.5K queries daily
  • 22. What is Teradata Doing?
    ➔ 100% open source contributions to Presto to increase adoption in the enterprise
    ➔ A multi-year roadmap commitment to phased enhancements of the open source code
    ➔ The first ever commercial support offering for Presto
    Teradata Certified Presto: www.teradata.com/presto
  • 23. Why is Teradata Contributing to Presto?
    ➔ Hadoop distro agnostic
    ➔ Modern code base
      ◆ Presto is well-designed open source software with proper database architecture
    ➔ Strong, like-minded community
    ➔ Push-down processing across multiple data platforms
    ➔ Leverage Teradata expertise to make SQL for Hadoop viable
  • 24. Teradata Contributions to Presto (roadmap diagram)
    ➔ Phase 1, Implement: June 8, 2015
    ➔ Phase 2, Integrate: Q4 2015
    ➔ Phase 3, Proliferate: 2016
    ➔ Roadmap items: Installer, Documentation, Monitoring & Support Tools, ODBC / JDBC Drivers, BI Certification, Security, Connectors, Commercial Support, Expanding ANSI SQL Coverage, Management Tools Integration, YARN Integration
  • 25. Teradata’s Contributions
    ➔ Ease of install and management via the Presto-Admin tool
      ◆ www.github.com/prestodb/presto-admin
      ◆ Packaging Presto as an RPM
    ➔ Testing framework for Presto
      ◆ www.github.com/prestodb/tempto
      ◆ Added a large number of tests
    ➔ JDBC driver for Java 6
    ➔ Various SQL improvements
  • 26. Teradata’s Contribution Product Roadmap
    ➔ Continued SQL improvements
    ➔ Security – authentication & authorization
    ➔ More connectors – e.g. HBase
    ➔ ODBC & JDBC drivers that actually work
    ➔ BI tool certifications – e.g. Tableau
    ➔ YARN integration
    ➔ Ambari integration
    ➔ Open source our Docker-based development environment (work in progress)
    ➔ Open our continuous integration platform to the community
  • 28. Early Feedback is Extremely Positive
    ➔ “Presto is an integral part of the Airbnb data infrastructure stack with hundreds of employees running queries each day with the technology. We are excited to see Teradata joining the Presto open source community and are encouraged by the direction of their contributions.” - James Mayfield, product lead, Airbnb
    ➔ “We are excited to see Teradata's commitment to Presto and adding capabilities in the open source domain. This will create interesting opportunities within our technical and business teams to open up more access options to our critical data. We think this is a positive for Teradata and for the community as a whole.” - Steve Deasy, vice president of Engineering, Groupon
  • 30. How can I contribute?
    ➔ Source code: www.github.com/facebook/presto and www.github.com/prestodb
    ➔ Certified Distro: www.teradata.com/presto
    ➔ Website: www.prestodb.io
    ➔ Presto User’s Group: www.groups.google.com/group/presto-users
    ➔ Facebook Page: www.facebook.com/prestodb
    ➔ Twitter: #prestodb
  • 31. Join us! We’re hiring!
    ➔ Warsaw
    ➔ Boston
    Job Offer: bit.do/presto
    Contact: Wojciech.Biela@teradata.com
  • 32. Available for Download: Presto 101t certified by Teradata
    ➔ Presto 101t Server, CLI, JDBC
    ➔ Presto-Admin 0.1
    ➔ Documentation
    ➔ HDP w/ Presto VM Sandbox
    ➔ CDH w/ Presto VM Sandbox
    www.teradata.com/presto