Big Fast Queries with Presto on OpenShift

Presented at ODSC West 2019: https://odsc.com/training/portfolio/big-fast-queries-with-presto-on-openshift-2/

Abstract: Next-generation data platforms are embracing the proliferation of technologies that help organizations discover, catalog, process, and derive insight from their data. OpenShift and OpenShift Container Storage are at the forefront of this transition and provide a foundation for building a self-service environment for developers, data engineers, and data scientists. In this demo we'll share how Starburst Presto on OpenShift can power your interactive and ad hoc data discovery. SQL-on-anything means fast, secure access to data in OpenShift Container Storage and federated access to data anywhere. With Starburst on OpenShift you have access to the world's fastest open source SQL query engine, enterprise-ready, across public and private clouds.

Presto, an open source distributed SQL engine, is widely recognized for its low-latency queries, high concurrency, and native ability to query multiple data sources. It has been proven at scale in a variety of use cases at Airbnb, Comcast, GrubHub, Facebook, FINRA, LinkedIn, Lyft, Netflix, Twitter, and Uber. In the last few years Presto has experienced unprecedented growth in popularity in both on-premises and cloud deployments over object stores, HDFS, NoSQL, and RDBMS data stores.

BIG FAST SQL
With Presto on OpenShift
Kamil Bajda-Pawlikowski, Co-founder / CTO
Michael St-Jean, Principal Marketing Manager
ODSC West 2019 @ San Francisco

PRIMER ON PRESTO

Presto: SQL-on-Anything
Deploy Anywhere, Query Anything

Why Presto?
Community-driven open source project
High performance ANSI SQL engine
• New Cost-Based Query Optimizer
• Proven scalability
• High concurrency
Separation of compute and storage
• Scale storage and compute independently
• No ETL or data integration necessary to get to insights
• SQL-on-anything
No vendor lock-in
• No Hadoop distro vendor lock-in
• No storage engine vendor lock-in
• No cloud vendor lock-in

Enterprise edition
Founded by Presto committers:
● Over 4 years of contributions to Presto
● Presto distro for on-prem and cloud environments
● Supporting large customers in production
● Enterprise subscription add-ons (ODBC, Ranger, Sentry, Oracle, Teradata, K8S)
Notable features contributed:
● ANSI SQL syntax enhancements
● Execution engine improvements
● Security integrations
● Spill to disk
● Cost-Based Optimizer
https://www.starburstdata.com/presto-enterprise/

Benchmark results
[Chart comparing CBO off and CBO on]
https://www.starburstdata.com/presto-benchmarks/

Administrative challenges
● Configuring and managing clusters
● Autotuning properties based on the hardware provisioned
● High Availability for Presto Coordinator
● Scaling cluster elastically based on query load
● Gracefully decommissioning Presto Workers to avoid killing queries
● Monitoring of hardware and software layers
https://www.starburstdata.com/technical-blog/presto-on-kubernetes/
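
The graceful-decommissioning item above can be illustrated with a short sketch. Presto workers expose a graceful shutdown call (a PUT of the JSON string "SHUTTING_DOWN" to /v1/info/state), after which the worker stops accepting new tasks and exits once running ones finish. The worker URL, the user name, and the helper name below are assumptions made for illustration, and how the operator shown in this talk triggers the call is not stated in the slides.

```python
import urllib.request


def build_graceful_shutdown_request(worker_url: str, user: str = "admin") -> urllib.request.Request:
    """Build (but do not send) the request that asks one worker to drain.

    worker_url and user are placeholders; a real cluster may also require
    authentication and an access-control rule that permits this call.
    """
    return urllib.request.Request(
        url=worker_url.rstrip("/") + "/v1/info/state",
        data=b'"SHUTTING_DOWN"',  # a JSON string literal, quotes included
        method="PUT",
        headers={"Content-Type": "application/json", "X-Presto-User": user},
    )


if __name__ == "__main__":
    # Hypothetical worker address, used only to show the resulting request.
    req = build_graceful_shutdown_request("http://presto-worker-0.example.com:8080")
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req, timeout=10)
```

Building the request separately from sending it keeps the sketch safe to run anywhere; a scale-down hook would send it and then wait for the worker process to exit before removing the pod.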

Presto on OpenShift
[Architecture diagram. Components shown: Presto Operator (K8s Operator), Presto Coordinator Pod, Presto Worker Pods, Horizontal Pod Autoscaler (HPA), Presto Service, Hive Metastore Service Pod, Hadoop / Hive, RDBMS]
https://docs.starburstdata.com/latest/kubernetes.html
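
As a rough illustration of how a Horizontal Pod Autoscaler sizes the worker pool, the sketch below applies the scaling rule documented for Kubernetes: desired replicas = ceil(current replicas x current metric / target metric), clamped to configured bounds. The metric values and the min/max bounds are made-up inputs, the tolerance band and stabilization window of the real controller are left out, and the slides do not say which metric the Presto deployment scales on.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     min_replicas: int = 1,
                     max_replicas: int = 10) -> int:
    """Core HPA scaling rule, clamped to [min_replicas, max_replicas]."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # Hypothetical case: 3 workers averaging 90% CPU against a 60% target.
    print(desired_replicas(3, 90, 60))
```

With these inputs the rule asks for 5 workers (3 x 90 / 60 = 4.5, rounded up).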

CEPH OBJECT INTRODUCTION

● Massively scalable
○ 10s to 100s of PBs
○ Billions of objects
○ 100s of gigabits
● Erasure coding drives storage efficiency
● High level of fidelity with S3 API
● Open source software - LGPL 2.1 / 3
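
To make the storage-efficiency point concrete, here is a small worked calculation. An erasure-coded pool with k data chunks and m coding chunks stores (k + m) / k bytes of raw capacity per usable byte, while n-way replication stores n. The 4+2 profile and the 3-way replication baseline are illustrative choices, not figures from the slides.

```python
def raw_per_usable_erasure(k: int, m: int) -> float:
    """Raw bytes stored per usable byte with k data + m coding chunks."""
    if k <= 0 or m < 0:
        raise ValueError("k must be positive and m non-negative")
    return (k + m) / k


def raw_per_usable_replication(copies: int) -> float:
    """Raw bytes stored per usable byte with full replication."""
    if copies <= 0:
        raise ValueError("copies must be positive")
    return float(copies)


if __name__ == "__main__":
    usable_tb = 100  # hypothetical dataset size
    print("EC 4+2 raw TB:", usable_tb * raw_per_usable_erasure(4, 2))
    print("3x replication raw TB:", usable_tb * raw_per_usable_replication(3))
```

Under these assumptions 100 TB of data needs 150 TB of raw capacity with 4+2 erasure coding versus 300 TB with three replicas, while both layouts tolerate the loss of two chunks or copies.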

● Operators: OCS, Rook-Ceph, Rook-Noobaa
● Ceph for block (RWO), file (RWO/RWX), object (S3)
● Noobaa for data federation in the hybrid cloud

Hive connector configuration example (hive.properties)
connector.name=hive-hadoop2
hive.metastore.uri=thrift://metastore.example.com:9083
hive.s3.endpoint=s3.example.com
hive.s3.aws-access-key=ACCESS_KEY
hive.s3.aws-secret-key=SECRET_KEY
hive.s3.use-instance-credentials=false
hive.s3.staging-directory=/tmp
hive.s3.ssl.enabled=true
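
When the same catalog file has to be produced for several environments, it can be rendered from parameters. The sketch below is one possible way to do that and mirrors the keys on the slide. The function name and its arguments are hypothetical. The extra hive.s3.path-style-access key is a real Hive connector property that S3-compatible stores often need, but whether this particular deployment needs it is an assumption.

```python
def render_hive_properties(metastore_uri: str,
                           s3_endpoint: str,
                           access_key: str,
                           secret_key: str,
                           path_style: bool = True) -> str:
    """Render a hive.properties catalog file as text."""
    props = {
        "connector.name": "hive-hadoop2",
        "hive.metastore.uri": metastore_uri,
        "hive.s3.endpoint": s3_endpoint,
        "hive.s3.aws-access-key": access_key,
        "hive.s3.aws-secret-key": secret_key,
        "hive.s3.use-instance-credentials": "false",
        # Assumption: the object store is addressed by path, not by bucket subdomain.
        "hive.s3.path-style-access": str(path_style).lower(),
        "hive.s3.staging-directory": "/tmp",
        "hive.s3.ssl.enabled": "true",
    }
    # One key=value pair per line, in the order declared above.
    return "\n".join(f"{key}={value}" for key, value in props.items()) + "\n"


if __name__ == "__main__":
    print(render_hive_properties(
        "thrift://metastore.example.com:9083",
        "s3.example.com",
        "ACCESS_KEY",
        "SECRET_KEY",
    ))
```

In practice the two keys would come from a secret store rather than literals; they are passed as arguments here so the sketch stays deterministic.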

presto> CREATE SCHEMA hive.s3_export
        WITH (location = 's3://my_bucket/some/path');
presto> CREATE TABLE hive.s3_export.my_table
        WITH (format = 'ORC')
        AS <source query>;

[Demo diagram: a Data Scientist runs a JOIN query and a federated JOIN query over the orders and customer tables of the TPC-H schema]
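
A federated JOIN in Presto is an ordinary SQL join whose tables are qualified with different catalogs. The sketch below composes such a query as text. The catalog and schema names (hive.tpch, postgresql.public) are hypothetical, and the column names custkey and name follow Presto's built-in TPC-H connector, so they may differ in a real deployment (for example o_custkey and c_name).

```python
def qualified(catalog: str, schema: str, table: str) -> str:
    """Return a fully qualified Presto table name: catalog.schema.table."""
    return f"{catalog}.{schema}.{table}"


def federated_join_sql(orders_table: str, customer_table: str) -> str:
    """Compose a top-10 customers-by-order-count query across two catalogs."""
    return (
        "SELECT c.name, count(*) AS order_count\n"
        f"FROM {orders_table} AS o\n"
        f"JOIN {customer_table} AS c ON o.custkey = c.custkey\n"
        "GROUP BY c.name\n"
        "ORDER BY order_count DESC\n"
        "LIMIT 10"
    )


if __name__ == "__main__":
    # Hypothetical layout: orders in object storage, customer in an RDBMS.
    print(federated_join_sql(
        qualified("hive", "tpch", "orders"),
        qualified("postgresql", "public", "customer"),
    ))
```

The only difference from a single-source join is the catalog prefix on each table; the engine reads each side through its own connector and performs the join itself.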

Thank You!
Twitter: @starburstdata @prestosql
Blog: www.starburstdata.com/technical-blog/
Newsletter: www.starburstdata.com/newsletter
© 2019

4. Community See more at our Wiki

11. Now available in OpenShift Catalog!
