SlideShare a Scribd company logo
1 of 21
Download to read offline
1
Presto - SQL on anything
January 2017
Grzegorz Kokosiński
Karol Sobczak
Teradata Center for Hadoop
2
Agenda
- Who are we?
- What is Presto?
- What is data federation?
- Different federation strategies in other databases (HIVE)
- what is supported and what are the problems
- Presto Connector
- Show time
3
Lets make some noise
• Let tweet about this presentation!
– #whug
– #prestodb
– #teradata
• Later on we will query that data!
4
Who are we
5
What is Presto?
• 100% open source distributed SQL query engine
- Originally developed by Facebook
• Key Differentiators:
- Performance & Scale
- Cross platform query capability, not only SQL on Hadoop
• Apache licensed, hosted on GitHub
- Certified distro & support from Teradata
6
Presto Users
See more at https://github.com/prestodb/presto/wiki/Presto-Users
7
• Facebook
– Multiple production clusters (100s of nodes total)
- 300PB in HDFS, sharded MySQL, SSD-based Raptor
– 1000s of internal daily active users
– 10s-100s of concurrent queries
• Netflix
– 250+ node on EC2, 40+ PB in S3 (Parquet format)
– Over 650 active users and 6K+ queries daily
• Twitter
– 200+ nodes on-premises over Parquet nested data
• Uber
– 200+ nodes (2 dedicated clusters) with 25K+ & 3K+ queries daily
• FINRA
– 120+ nodes in AWS, 2PB is S3, 200+ users (supported by Teradata)
Presto in Production
8
• In-memory processing
• Pipelined execution across nodes (MPP-style)
– Vectorized columnar processing
– Multithreaded execution keeps all CPU cores busy
• Presto is written in highly tuned Java
– Efficient memory management (reduced GC overhead)
– Very careful coding of inner loops
– Runtime bytecode generation
• Optimized ORC & Parquet readers
• Excellent performance with interactive SQL analytics
– Enables to use BI tools
Presto – Query Execution Performance
9
• Hadoop/Hive connector & file formats (HDFS/S3):
– HDFS & S3 + HCatalog
– ORC, RCFile, Parquet, SequenceFile, Text
• Raptor
– columnar store on flash driven by Facebook
• Open source data stores (driven by the community)
– MySQL & PostgreSQL (non-parallel)
– Cassandra (by Teradata)
– Kafka
– Redis
– MongoDB
– ElasticSearch
– Accumulo (by Bloomberg)
Supported data sources & file formats
10
[ WITH with_query [, ...] ]
SELECT [ ALL | DISTINCT ] select_expr [, ...]
[ FROM table1 [[ INNER | OUTER ] JOIN table2 ON (…)]
[ WHERE condition ]
[ GROUP BY expression [, ...] ]
[ HAVING condition]
[ UNION [ ALL | DISTINCT ] select ]
[ ORDER BY expression [ ASC | DESC ] [, ...] ]
[ LIMIT [ count | ALL ] ]
In addition:
• Windowing functions
• UNNEST, TABLESAMPLE
• ROLLUP, CUBE, GROUPING SETS
• UNION, EXCEPT, INTERSECT
• Subqueries (EXISTS, IN)
ANSI SQL Support
11
Presto is not a database!
• Presto is a query execution engine (storage independent)
• Pluggable custom user functionalities
– Connectors
– Functions
– Types
– System access controllers
– Resource group configuration managers
– Event listeners
– …
• Built-in core functionalities:
– parser, execution, types, sql functions, monitoring
12
Data federation
• Query data from several data sources (databases)
• Streaming
– One to One
- there is a single connection between database access points
- e.g. PSQL via PSQL
- using storage handlers to access RDBMS data from Hive
– Many to One
- many connections from one database nodes to a single access point of
other database
- Accessing REST from UDF in (possibly each) HIVE map/reduce task
– Many to Many
- workers talk to each other directly
• Through storage
– Needs (intermittent) data materialization
• Presto supports them all!
13
Data federation common problems
• model incompatibilities
• multinode streaming is not always possible
• transactions
• cost based optimizations (statistics)
• SQL pushdown (predicates, projections, aggregations?, joins?)
14
Connector
• Presto interface to access arbitrary data source (hive, mysql, jmx)
• Provides:
– metadata
– ability to distributed, parallel and streamed read/write
– transaction boundary
– physical data layouts
– statistics
– (SQL) predicate pushdown)
– indexes (index join)
– session or table properties
– access control
– procedures (CALL …
– . . .
• Most (if not all) of the above points are optional
15
Presto Architecture
Data stream API
Worker
Data stream API
Worker
Coordinator
Metadata
API
Parser/
analyzer
Planner Scheduler
Worker
Client
Data location
API
Pluggable
16
Data federation with Presto
• Through the storage
• Demo
– HIVE
HDFS
DataNode
HDFS
DataNode
Hive
Metastore
HDFS
Namenode
data transfer
Presto
worker
Presto
worker
Presto
coordinator
data transfer
metadata
metadata
17
Data federation with Presto
• One to One
• Demo
– psql
– REST
– and above with HIVE
Presto
worker
Presto
worker
Presto
coordinator
SQL
Database
JDBC metadataJDBC data
18
Many to many - data federation with Presto
AMP
AMP
AMP
AMP
Q
G
E
x
c
h
a
n
g
e
Q
G
E
x
c
h
a
n
g
e
PE Coordinator
Worker Thread
Worker Thread
Worker Thread
Worker Thread
Init & metadata exchange
Bi-directional
fully parallel
data exchange
TERADATA PRESTO
• Key features:
• Low latency
• High performance
• Concurrency
• SQL pushdown
• Data conversion
• Compression
• Efficient CPU usage
19
Conclusion
• Presto Connector is expressive
• 3rd party data source is 1st class citizen
• Single ANSI SQL to rule them all
– use BI tools on data which is not BI friendly
• Rapid data integration
20
Certified Distro: www.teradata.com/presto
Website: www.prestodb.io
Presto Users Group: www.groups.google.com/group/presto-users
GitHub:
www.github.com/prestodb/presto
www.github.com/Teradata/presto
More information
21
www.teradata.com/presto

More Related Content

What's hot

Presto At Treasure Data
Presto At Treasure DataPresto At Treasure Data
Presto At Treasure DataTaro L. Saito
 
Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...
Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...
Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...viirya
 
Boston Hadoop Meetup: Presto for the Enterprise
Boston Hadoop Meetup: Presto for the EnterpriseBoston Hadoop Meetup: Presto for the Enterprise
Boston Hadoop Meetup: Presto for the EnterpriseMatt Fuller
 
20140120 presto meetup_en
20140120 presto meetup_en20140120 presto meetup_en
20140120 presto meetup_enOgibayashi
 
Presto as a Service - Tips for operation and monitoring
Presto as a Service - Tips for operation and monitoringPresto as a Service - Tips for operation and monitoring
Presto as a Service - Tips for operation and monitoringTaro L. Saito
 
Presto updates to 0.178
Presto updates to 0.178Presto updates to 0.178
Presto updates to 0.178Kai Sasaki
 
Presto anatomy
Presto anatomyPresto anatomy
Presto anatomyDongmin Yu
 
How to ensure Presto scalability 
in multi use case
How to ensure Presto scalability 
in multi use case How to ensure Presto scalability 
in multi use case
How to ensure Presto scalability 
in multi use case Kai Sasaki
 
Facebook Presto presentation
Facebook Presto presentationFacebook Presto presentation
Facebook Presto presentationCyanny LIANG
 
Presto: Distributed sql query engine
Presto: Distributed sql query engine Presto: Distributed sql query engine
Presto: Distributed sql query engine kiran palaka
 
Presto in the cloud
Presto in the cloudPresto in the cloud
Presto in the cloudQubole
 
Expand data analysis tool at scale with Zeppelin
Expand data analysis tool at scale with ZeppelinExpand data analysis tool at scale with Zeppelin
Expand data analysis tool at scale with ZeppelinDataWorks Summit
 
Membase Meetup 2010
Membase Meetup 2010Membase Meetup 2010
Membase Meetup 2010Membase
 
Presto Meetup 2016 Small Start
Presto Meetup 2016 Small StartPresto Meetup 2016 Small Start
Presto Meetup 2016 Small StartHiroshi Toyama
 
Tale of ISUCON and Its Bench Tools
Tale of ISUCON and Its Bench ToolsTale of ISUCON and Its Bench Tools
Tale of ISUCON and Its Bench ToolsSATOSHI TAGOMORI
 
Getting started with postgresql
Getting started with postgresqlGetting started with postgresql
Getting started with postgresqlbotsplash.com
 
Deep Dive Into Apache Spark Multi-User Performance Michael Feiman, Mikhail Ge...
Deep Dive Into Apache Spark Multi-User Performance Michael Feiman, Mikhail Ge...Deep Dive Into Apache Spark Multi-User Performance Michael Feiman, Mikhail Ge...
Deep Dive Into Apache Spark Multi-User Performance Michael Feiman, Mikhail Ge...Databricks
 

What's hot (20)

Presto At Treasure Data
Presto At Treasure DataPresto At Treasure Data
Presto At Treasure Data
 
Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...
Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...
Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...
 
Boston Hadoop Meetup: Presto for the Enterprise
Boston Hadoop Meetup: Presto for the EnterpriseBoston Hadoop Meetup: Presto for the Enterprise
Boston Hadoop Meetup: Presto for the Enterprise
 
Presto
PrestoPresto
Presto
 
20140120 presto meetup_en
20140120 presto meetup_en20140120 presto meetup_en
20140120 presto meetup_en
 
Presto+MySQLで分散SQL
Presto+MySQLで分散SQLPresto+MySQLで分散SQL
Presto+MySQLで分散SQL
 
Presto as a Service - Tips for operation and monitoring
Presto as a Service - Tips for operation and monitoringPresto as a Service - Tips for operation and monitoring
Presto as a Service - Tips for operation and monitoring
 
Presto updates to 0.178
Presto updates to 0.178Presto updates to 0.178
Presto updates to 0.178
 
Presto anatomy
Presto anatomyPresto anatomy
Presto anatomy
 
How to ensure Presto scalability 
in multi use case
How to ensure Presto scalability 
in multi use case How to ensure Presto scalability 
in multi use case
How to ensure Presto scalability 
in multi use case
 
Facebook Presto presentation
Facebook Presto presentationFacebook Presto presentation
Facebook Presto presentation
 
Prestogres internals
Prestogres internalsPrestogres internals
Prestogres internals
 
Presto: Distributed sql query engine
Presto: Distributed sql query engine Presto: Distributed sql query engine
Presto: Distributed sql query engine
 
Presto in the cloud
Presto in the cloudPresto in the cloud
Presto in the cloud
 
Expand data analysis tool at scale with Zeppelin
Expand data analysis tool at scale with ZeppelinExpand data analysis tool at scale with Zeppelin
Expand data analysis tool at scale with Zeppelin
 
Membase Meetup 2010
Membase Meetup 2010Membase Meetup 2010
Membase Meetup 2010
 
Presto Meetup 2016 Small Start
Presto Meetup 2016 Small StartPresto Meetup 2016 Small Start
Presto Meetup 2016 Small Start
 
Tale of ISUCON and Its Bench Tools
Tale of ISUCON and Its Bench ToolsTale of ISUCON and Its Bench Tools
Tale of ISUCON and Its Bench Tools
 
Getting started with postgresql
Getting started with postgresqlGetting started with postgresql
Getting started with postgresql
 
Deep Dive Into Apache Spark Multi-User Performance Michael Feiman, Mikhail Ge...
Deep Dive Into Apache Spark Multi-User Performance Michael Feiman, Mikhail Ge...Deep Dive Into Apache Spark Multi-User Performance Michael Feiman, Mikhail Ge...
Deep Dive Into Apache Spark Multi-User Performance Michael Feiman, Mikhail Ge...
 

Similar to Presto - SQL on anything

Presto Strata Hadoop SJ 2016 short talk
Presto Strata Hadoop SJ 2016 short talkPresto Strata Hadoop SJ 2016 short talk
Presto Strata Hadoop SJ 2016 short talkkbajda
 
Etu Solution Day 2014 Track-D: 掌握Impala和Spark
Etu Solution Day 2014 Track-D: 掌握Impala和SparkEtu Solution Day 2014 Track-D: 掌握Impala和Spark
Etu Solution Day 2014 Track-D: 掌握Impala和SparkJames Chen
 
SQL on Hadoop for the Oracle Professional
SQL on Hadoop for the Oracle ProfessionalSQL on Hadoop for the Oracle Professional
SQL on Hadoop for the Oracle ProfessionalMichael Rainey
 
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive Demos
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive DemosHadoop Demystified + MapReduce (Java and C#), Pig, and Hive Demos
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive DemosLester Martin
 
Hopsworks in the cloud Berlin Buzzwords 2019
Hopsworks in the cloud Berlin Buzzwords 2019 Hopsworks in the cloud Berlin Buzzwords 2019
Hopsworks in the cloud Berlin Buzzwords 2019 Jim Dowling
 
Gunther hagleitner:apache hive & stinger
Gunther hagleitner:apache hive & stingerGunther hagleitner:apache hive & stinger
Gunther hagleitner:apache hive & stingerhdhappy001
 
Hive big-data meetup
Hive big-data meetupHive big-data meetup
Hive big-data meetupRemus Rusanu
 
Big Data Developers Moscow Meetup 1 - sql on hadoop
Big Data Developers Moscow Meetup 1  - sql on hadoopBig Data Developers Moscow Meetup 1  - sql on hadoop
Big Data Developers Moscow Meetup 1 - sql on hadoopbddmoscow
 
Denodo Partner Connect: Technical Webinar - Ask Me Anything
Denodo Partner Connect: Technical Webinar - Ask Me AnythingDenodo Partner Connect: Technical Webinar - Ask Me Anything
Denodo Partner Connect: Technical Webinar - Ask Me AnythingDenodo
 
Compressed Introduction to Hadoop, SQL-on-Hadoop and NoSQL
Compressed Introduction to Hadoop, SQL-on-Hadoop and NoSQLCompressed Introduction to Hadoop, SQL-on-Hadoop and NoSQL
Compressed Introduction to Hadoop, SQL-on-Hadoop and NoSQLArseny Chernov
 
02 data warehouse applications with hive
02 data warehouse applications with hive02 data warehouse applications with hive
02 data warehouse applications with hiveSubhas Kumar Ghosh
 
Big Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerBig Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerMark Kromer
 
AnalyticsConf2016 - Zaawansowana analityka na platformie Azure HDInsight
AnalyticsConf2016 - Zaawansowana analityka na platformie Azure HDInsightAnalyticsConf2016 - Zaawansowana analityka na platformie Azure HDInsight
AnalyticsConf2016 - Zaawansowana analityka na platformie Azure HDInsightŁukasz Grala
 
Modernizing Your Data Warehouse using APS
Modernizing Your Data Warehouse using APSModernizing Your Data Warehouse using APS
Modernizing Your Data Warehouse using APSStéphane Fréchette
 
Apache Hadoop 1.1
Apache Hadoop 1.1Apache Hadoop 1.1
Apache Hadoop 1.1Sperasoft
 
Hive Evolution: ApacheCon NA 2010
Hive Evolution:  ApacheCon NA 2010Hive Evolution:  ApacheCon NA 2010
Hive Evolution: ApacheCon NA 2010John Sichi
 
No sql and sql - open analytics summit
No sql and sql - open analytics summitNo sql and sql - open analytics summit
No sql and sql - open analytics summitOpen Analytics
 

Similar to Presto - SQL on anything (20)

Presto Strata Hadoop SJ 2016 short talk
Presto Strata Hadoop SJ 2016 short talkPresto Strata Hadoop SJ 2016 short talk
Presto Strata Hadoop SJ 2016 short talk
 
Etu Solution Day 2014 Track-D: 掌握Impala和Spark
Etu Solution Day 2014 Track-D: 掌握Impala和SparkEtu Solution Day 2014 Track-D: 掌握Impala和Spark
Etu Solution Day 2014 Track-D: 掌握Impala和Spark
 
SQL on Hadoop for the Oracle Professional
SQL on Hadoop for the Oracle ProfessionalSQL on Hadoop for the Oracle Professional
SQL on Hadoop for the Oracle Professional
 
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive Demos
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive DemosHadoop Demystified + MapReduce (Java and C#), Pig, and Hive Demos
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive Demos
 
Hopsworks in the cloud Berlin Buzzwords 2019
Hopsworks in the cloud Berlin Buzzwords 2019 Hopsworks in the cloud Berlin Buzzwords 2019
Hopsworks in the cloud Berlin Buzzwords 2019
 
Gunther hagleitner:apache hive & stinger
Gunther hagleitner:apache hive & stingerGunther hagleitner:apache hive & stinger
Gunther hagleitner:apache hive & stinger
 
Hive big-data meetup
Hive big-data meetupHive big-data meetup
Hive big-data meetup
 
Big Data Developers Moscow Meetup 1 - sql on hadoop
Big Data Developers Moscow Meetup 1  - sql on hadoopBig Data Developers Moscow Meetup 1  - sql on hadoop
Big Data Developers Moscow Meetup 1 - sql on hadoop
 
Denodo Partner Connect: Technical Webinar - Ask Me Anything
Denodo Partner Connect: Technical Webinar - Ask Me AnythingDenodo Partner Connect: Technical Webinar - Ask Me Anything
Denodo Partner Connect: Technical Webinar - Ask Me Anything
 
Compressed Introduction to Hadoop, SQL-on-Hadoop and NoSQL
Compressed Introduction to Hadoop, SQL-on-Hadoop and NoSQLCompressed Introduction to Hadoop, SQL-on-Hadoop and NoSQL
Compressed Introduction to Hadoop, SQL-on-Hadoop and NoSQL
 
Twitter with hadoop for oow
Twitter with hadoop for oowTwitter with hadoop for oow
Twitter with hadoop for oow
 
02 data warehouse applications with hive
02 data warehouse applications with hive02 data warehouse applications with hive
02 data warehouse applications with hive
 
Big Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerBig Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL Server
 
AnalyticsConf2016 - Zaawansowana analityka na platformie Azure HDInsight
AnalyticsConf2016 - Zaawansowana analityka na platformie Azure HDInsightAnalyticsConf2016 - Zaawansowana analityka na platformie Azure HDInsight
AnalyticsConf2016 - Zaawansowana analityka na platformie Azure HDInsight
 
Apache Drill at ApacheCon2014
Apache Drill at ApacheCon2014Apache Drill at ApacheCon2014
Apache Drill at ApacheCon2014
 
Modernizing Your Data Warehouse using APS
Modernizing Your Data Warehouse using APSModernizing Your Data Warehouse using APS
Modernizing Your Data Warehouse using APS
 
Apache Hadoop 1.1
Apache Hadoop 1.1Apache Hadoop 1.1
Apache Hadoop 1.1
 
Apache drill
Apache drillApache drill
Apache drill
 
Hive Evolution: ApacheCon NA 2010
Hive Evolution:  ApacheCon NA 2010Hive Evolution:  ApacheCon NA 2010
Hive Evolution: ApacheCon NA 2010
 
No sql and sql - open analytics summit
No sql and sql - open analytics summitNo sql and sql - open analytics summit
No sql and sql - open analytics summit
 

Recently uploaded

%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park masabamasaba
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...Health
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...masabamasaba
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024Mind IT Systems
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrainmasabamasaba
 
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...Nitya salvi
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsArshad QA
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providermohitmore19
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastPapp Krisztián
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is insideshinachiaurasa2
 
%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg
%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg
%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburgmasabamasaba
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech studentsHimanshiGarg82
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...harshavardhanraghave
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...Shane Coughlan
 
Generic or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisionsGeneric or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisionsBert Jan Schrijver
 
%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Hararemasabamasaba
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfPayment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfkalichargn70th171
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension AidPhilip Schwarz
 

Recently uploaded (20)

%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
 
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the past
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
 
%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg
%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg
%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
 
Generic or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisionsGeneric or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisions
 
%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfPayment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 

Presto - SQL on anything

  • 1. 1 Presto - SQL on anything January 2017 Grzegorz Kokosiński Karol Sobczak Teradata Center for Hadoop
  • 2. 2 Agenda - Who are we? - What is Presto? - What is data federation? - Different federation strategies in other databases (HIVE) - what is supported and what are the problems - Presto Connector - Show time
  • 3. 3 Lets make some noise • Let tweet about this presentation! – #whug – #prestodb – #teradata • Later on we will query that data!
  • 5. 5 What is Presto? • 100% open source distributed SQL query engine - Originally developed by Facebook • Key Differentiators: - Performance & Scale - Cross platform query capability, not only SQL on Hadoop • Apache licensed, hosted on GitHub - Certified distro & support from Teradata
  • 6. 6 Presto Users See more at https://github.com/prestodb/presto/wiki/Presto-Users
  • 7. 7 • Facebook – Multiple production clusters (100s of nodes total) - 300PB in HDFS, sharded MySQL, SSD-based Raptor – 1000s of internal daily active users – 10s-100s of concurrent queries • Netflix – 250+ node on EC2, 40+ PB in S3 (Parquet format) – Over 650 active users and 6K+ queries daily • Twitter – 200+ nodes on-premises over Parquet nested data • Uber – 200+ nodes (2 dedicated clusters) with 25K+ & 3K+ queries daily • FINRA – 120+ nodes in AWS, 2PB is S3, 200+ users (supported by Teradata) Presto in Production
  • 8. 8 • In-memory processing • Pipelined execution across nodes (MPP-style) – Vectorized columnar processing – Multithreaded execution keeps all CPU cores busy • Presto is written in highly tuned Java – Efficient memory management (reduced GC overhead) – Very careful coding of inner loops – Runtime bytecode generation • Optimized ORC & Parquet readers • Excellent performance with interactive SQL analytics – Enables to use BI tools Presto – Query Execution Performance
  • 9. 9 • Hadoop/Hive connector & file formats (HDFS/S3): – HDFS & S3 + HCatalog – ORC, RCFile, Parquet, SequenceFile, Text • Raptor – columnar store on flash driven by Facebook • Open source data stores (driven by the community) – MySQL & PostgreSQL (non-parallel) – Cassandra (by Teradata) – Kafka – Redis – MongoDB – ElasticSearch – Accumulo (by Bloomberg) Supported data sources & file formats
  • 10. 10 [ WITH with_query [, ...] ] SELECT [ ALL | DISTINCT ] select_expr [, ...] [ FROM table1 [[ INNER | OUTER ] JOIN table2 ON (…)] [ WHERE condition ] [ GROUP BY expression [, ...] ] [ HAVING condition] [ UNION [ ALL | DISTINCT ] select ] [ ORDER BY expression [ ASC | DESC ] [, ...] ] [ LIMIT [ count | ALL ] ] In addition: • Windowing functions • UNNEST, TABLESAMPLE • ROLLUP, CUBE, GROUPING SETS • UNION, EXCEPT, INTERSECT • Subqueries (EXISTS, IN) ANSI SQL Support
  • 11. 11 Presto is not a database! • Presto is a query execution engine (storage independent) • Pluggable custom user functionalities – Connectors – Functions – Types – System access controllers – Resource group configuration managers – Event listeners – … • Built-in core functionalities: – parser, execution, types, sql functions, monitoring
  • 12. 12 Data federation • Query data from several data sources (databases) • Streaming – One to One - there is a single connection between database access points - e.g. PSQL via PSQL - using storage handlers to access RDBMS data from Hive – Many to One - many connections from one database nodes to a single access point of other database - Accessing REST from UDF in (possibly each) HIVE map/reduce task – Many to Many - workers talk to each other directly • Through storage – Needs (intermittent) data materialization • Presto supports them all!
  • 13. 13 Data federation common problems • model incompatibilities • multinode streaming is not always possible • transactions • cost based optimizations (statistics) • SQL pushdown (predicates, projections, aggregations?, joins?)
  • 14. 14 Connector • Presto interface to access arbitrary data source (hive, mysql, jmx) • Provides: – metadata – ability to distributed, parallel and streamed read/write – transaction boundary – physical data layouts – statistics – (SQL) predicate pushdown) – indexes (index join) – session or table properties – access control – procedures (CALL … – . . . • Most (if not all) of the above points are optional
  • 15. 15 Presto Architecture Data stream API Worker Data stream API Worker Coordinator Metadata API Parser/ analyzer Planner Scheduler Worker Client Data location API Pluggable
  • 16. 16 Data federation with Presto • Through the storage • Demo – HIVE HDFS DataNode HDFS DataNode Hive Metastore HDFS Namenode data transfer Presto worker Presto worker Presto coordinator data transfer metadata metadata
  • 17. 17 Data federation with Presto • One to One • Demo – psql – REST – and above with HIVE Presto worker Presto worker Presto coordinator SQL Database JDBC metadataJDBC data
  • 18. 18 Many to many - data federation with Presto AMP AMP AMP AMP Q G E x c h a n g e Q G E x c h a n g e PE Coordinator Worker Thread Worker Thread Worker Thread Worker Thread Init & metadata exchange Bi-directional fully parallel data exchange TERADATA PRESTO • Key features: • Low latency • High performance • Concurrency • SQL pushdown • Data conversion • Compression • Efficient CPU usage
  • 19. 19 Conclusion • Presto Connector is expressive • 3rd party data source is 1st class citizen • Single ANSI SQL to rule them all – use BI tools on data which is not BI friendly • Rapid data integration
  • 20. 20 Certified Distro: www.teradata.com/presto Website: www.prestodb.io Presto Users Group: www.groups.google.com/group/presto-users GitHub: www.github.com/prestodb/presto www.github.com/Teradata/presto More information