SlideShare a Scribd company logo
1 of 27
Download to read offline
Presto as a Service
Tips for operation and monitoring
Taro L. Saito
Treasure Data, Inc.
leo@treasure-data.com
January 20, 2015
Presto Meetup Japan @ FreakOut, Roppongi
About Me: @taroleo
•  2007 University of Tokyo. Ph.D.
–  XML DBMS, Transaction Processing
•  Relational-Style XML Query. ACM SIGMOD 2008
•  ~ 2014 Assistant Professor at University of Tokyo
–  Genome Science Research
•  Distributed Computing
•  2014.3月~ Treasure Data
–  Software Engineer, MPP Team Leader
2
My Open Source Projects
•  sqlite-jdbc
–  SQLite DBMS for Java
–  1file =1DB
•  snappy-java
–  Fast compression library
–  More than 100,000 downloads/month
•  Used in Spark, Parquet, etc.
•  msgpack-java
•  UT Genome Browser (UTGB)
–  Visualization of massive amount of
genome science data
3
Topics
•  Presto as a Service in Treasure Data
–  Error Recovery
–  Presto Deployment
•  Tips for Monitoring Presto
–  JSON API
–  Presto + Fluentd
4
Treasure Data: Presto as a Service
5
Presto Public Release
Hive
TD API /
Web ConsoleInteractive query
batch query
Presto
Treasure Data
PlazmaDB:
MessagePack Columnar Storage
td-presto connector
Deployment
•  Building Presto takes more than 20 minutes.
•  Facebook frequently releases new versions
•  Let CircleCI build Presto
–  Deploy jar files to private Maven repository
–  We sometime use non-release versions
•  for fixing serious bugs
•  hot-fix patches
•  Integration Test
–  td-presto connector
•  PlazmaDB, Multi-tenant query scheduler
•  Query optimizer
–  Run test queries on staging cluster
7
Production: Blue-Green Deployment
•  http://martinfowler.com/bliki/BlueGreenDeployment.html
•  2 Presto Coordinators (Blue/Green)
–  Route Presto queries to the active cluster
–  No down-time upon deployment
•  Launch Presto worker instances with chef <- less than 5 min. in AWS
•  Inactive clusters is used for pre-production testing and customer support
–  Investigation and tuning of customer query performance
–  Trouble shooting
8
Error Recovery
•  Presto has no fault tolerance
•  Error types
–  User error
•  Syntax errors
–  SQL syntax, missing function
•  Semantic errors
–  missing tables/columns
–  Insufficient resource
•  Exceeded task memory size
–  Internal failure
•  I/O error
–  S3/Riak CS
•  worker failure
•  etc.
9
Worth A Retry!
Failed Query Rate
10
11
Query Retry Patterns used in TD
•  Error code + message pattern
12
Monitoring Presto
•  REST API for monitoring Presto state
–  JSON format
•  (presto server IP):8080/v1/query
–  List of recent queries (BasicQueryInfo class)
•  (presto server IP):8080/v1/query/(query id)
–  Detailed query state information
–  Query plan, tasks and running worker IDs
–  Processed rows/data size
13
Query List /v1/query
14
Detailed query Info /v1/query/(query id)
15
/ui/query-execution/(query id)
16
Complex Queries
17
18
Presto Coordinator
•  Organizes query execution pipelines
–  Coordinates presto workers
•  Retrieves table partition and split location from connectors
–  Creates distributed query plans
•  Full GC
–  Stalls coordinator
•  When memory is insufficient
–  Use memory-rich machine
–  GC Tuning
•  CMSInitiatingOccupancyFraction
19
Monitoring Presto with Fluentd
20
Hive
Presto
presto-metrics (Ruby)
•  https://github.com/xerial/presto-metrics
21
22
23
Detecting Anomaly
•  Started Query Rate (in 5min/15min)
–  If no query has started, cluster may be down (or not started properly)
•  Processed rows in a query
–  Sum up the number of the processed rows from all of the sub stages
–  Simple, but the most reliable measure
•  Send an alert
–  HipChat notification
–  PagerDuty call
•  JP/US team rotation
24
Benchmarking
•  Query performance comparison
–  between two versions of Presto
•  Benchmark
–  Run query set multiple times
–  Store the results to TD
–  Report the result with Presto
•  Aggregation query
25
Presto Operation Tool
•  Prestop
–  Our internal tool for managing multiple presto
clusters
•  written in Scala
–  Query monitoring
–  Benchmarking
–  Workload simulation
•  stress testing
•  Monitoring
–  Librato
–  Datadog
–  ChartIO (query stats)
26
WE ARE HIRING!
27
Check: www.treasuredata.com

More Related Content

What's hot

Can Apache Kafka Replace a Database?
Can Apache Kafka Replace a Database?Can Apache Kafka Replace a Database?
Can Apache Kafka Replace a Database?
Kai Wähner
 
Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...
Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...
Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...
HostedbyConfluent
 
Apache Pinot Case Study: Building Distributed Analytics Systems Using Apache ...
Apache Pinot Case Study: Building Distributed Analytics Systems Using Apache ...Apache Pinot Case Study: Building Distributed Analytics Systems Using Apache ...
Apache Pinot Case Study: Building Distributed Analytics Systems Using Apache ...
HostedbyConfluent
 

What's hot (20)

Can Apache Kafka Replace a Database?
Can Apache Kafka Replace a Database?Can Apache Kafka Replace a Database?
Can Apache Kafka Replace a Database?
 
Apache Pulsar Overview
Apache Pulsar OverviewApache Pulsar Overview
Apache Pulsar Overview
 
Introduction to Apache NiFi dws19 DWS - DC 2019
Introduction to Apache NiFi   dws19 DWS - DC 2019Introduction to Apache NiFi   dws19 DWS - DC 2019
Introduction to Apache NiFi dws19 DWS - DC 2019
 
Kafka 101
Kafka 101Kafka 101
Kafka 101
 
Common issues with Apache Kafka® Producer
Common issues with Apache Kafka® ProducerCommon issues with Apache Kafka® Producer
Common issues with Apache Kafka® Producer
 
Data Ingest Self Service and Management using Nifi and Kafka
Data Ingest Self Service and Management using Nifi and KafkaData Ingest Self Service and Management using Nifi and Kafka
Data Ingest Self Service and Management using Nifi and Kafka
 
Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...
Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...
Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...
 
Kafka Retry and DLQ
Kafka Retry and DLQKafka Retry and DLQ
Kafka Retry and DLQ
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
Designing a complete ci cd pipeline using argo events, workflow and cd products
Designing a complete ci cd pipeline using argo events, workflow and cd productsDesigning a complete ci cd pipeline using argo events, workflow and cd products
Designing a complete ci cd pipeline using argo events, workflow and cd products
 
Unified Stream and Batch Processing with Apache Flink
Unified Stream and Batch Processing with Apache FlinkUnified Stream and Batch Processing with Apache Flink
Unified Stream and Batch Processing with Apache Flink
 
20명 규모의 팀에서 Vault 사용하기
20명 규모의 팀에서 Vault 사용하기20명 규모의 팀에서 Vault 사용하기
20명 규모의 팀에서 Vault 사용하기
 
Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...
Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...
Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in Flink
 
Fall in Love with Graphs and Metrics using Grafana
Fall in Love with Graphs and Metrics using GrafanaFall in Love with Graphs and Metrics using Grafana
Fall in Love with Graphs and Metrics using Grafana
 
Apache Pinot Case Study: Building Distributed Analytics Systems Using Apache ...
Apache Pinot Case Study: Building Distributed Analytics Systems Using Apache ...Apache Pinot Case Study: Building Distributed Analytics Systems Using Apache ...
Apache Pinot Case Study: Building Distributed Analytics Systems Using Apache ...
 
Apache Pinot Meetup Sept02, 2020
Apache Pinot Meetup Sept02, 2020Apache Pinot Meetup Sept02, 2020
Apache Pinot Meetup Sept02, 2020
 
Best practices and lessons learnt from Running Apache NiFi at Renault
Best practices and lessons learnt from Running Apache NiFi at RenaultBest practices and lessons learnt from Running Apache NiFi at Renault
Best practices and lessons learnt from Running Apache NiFi at Renault
 
Apache Airflow overview
Apache Airflow overviewApache Airflow overview
Apache Airflow overview
 
The top 3 challenges running multi-tenant Flink at scale
The top 3 challenges running multi-tenant Flink at scaleThe top 3 challenges running multi-tenant Flink at scale
The top 3 challenges running multi-tenant Flink at scale
 

Viewers also liked

Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1
Sadayuki Furuhashi
 
Presto @ Facebook: Past, Present and Future
Presto @ Facebook: Past, Present and FuturePresto @ Facebook: Past, Present and Future
Presto @ Facebook: Past, Present and Future
DataWorks Summit
 
Presto - Hadoop Conference Japan 2014
Presto - Hadoop Conference Japan 2014Presto - Hadoop Conference Japan 2014
Presto - Hadoop Conference Japan 2014
Sadayuki Furuhashi
 

Viewers also liked (20)

Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1
 
Presto in my_use_case
Presto in my_use_casePresto in my_use_case
Presto in my_use_case
 
20140120 presto meetup_en
20140120 presto meetup_en20140120 presto meetup_en
20140120 presto meetup_en
 
Presto overview
Presto overviewPresto overview
Presto overview
 
Hello, Enterprise! Meet Presto. (Presto Boston Meetup 10062015)
Hello, Enterprise! Meet Presto. (Presto Boston Meetup 10062015)Hello, Enterprise! Meet Presto. (Presto Boston Meetup 10062015)
Hello, Enterprise! Meet Presto. (Presto Boston Meetup 10062015)
 
Internals of Presto Service
Internals of Presto ServiceInternals of Presto Service
Internals of Presto Service
 
Presto: Distributed sql query engine
Presto: Distributed sql query engine Presto: Distributed sql query engine
Presto: Distributed sql query engine
 
Presto @ Facebook: Past, Present and Future
Presto @ Facebook: Past, Present and FuturePresto @ Facebook: Past, Present and Future
Presto @ Facebook: Past, Present and Future
 
Presto for the Enterprise @ Hadoop Meetup
Presto for the Enterprise @ Hadoop MeetupPresto for the Enterprise @ Hadoop Meetup
Presto for the Enterprise @ Hadoop Meetup
 
Presto Testing Tools: Benchto & Tempto (Presto Boston Meetup 10062015)
Presto Testing Tools: Benchto & Tempto (Presto Boston Meetup 10062015)Presto Testing Tools: Benchto & Tempto (Presto Boston Meetup 10062015)
Presto Testing Tools: Benchto & Tempto (Presto Boston Meetup 10062015)
 
Presto - Hadoop Conference Japan 2014
Presto - Hadoop Conference Japan 2014Presto - Hadoop Conference Japan 2014
Presto - Hadoop Conference Japan 2014
 
Introduction to Apache Drill
Introduction to Apache DrillIntroduction to Apache Drill
Introduction to Apache Drill
 
Building Physical in a Virtual World
Building Physical in a Virtual WorldBuilding Physical in a Virtual World
Building Physical in a Virtual World
 
Presto @ Treasure Data - Presto Meetup Boston 2015
Presto @ Treasure Data - Presto Meetup Boston 2015Presto @ Treasure Data - Presto Meetup Boston 2015
Presto @ Treasure Data - Presto Meetup Boston 2015
 
Prestoで実現するインタラクティブクエリ - dbtech showcase 2014 Tokyo
Prestoで実現するインタラクティブクエリ - dbtech showcase 2014 TokyoPrestoで実現するインタラクティブクエリ - dbtech showcase 2014 Tokyo
Prestoで実現するインタラクティブクエリ - dbtech showcase 2014 Tokyo
 
Presto Meetup 2016 Small Start
Presto Meetup 2016 Small StartPresto Meetup 2016 Small Start
Presto Meetup 2016 Small Start
 
hotdog a TD tool for DD
hotdog a TD tool for DDhotdog a TD tool for DD
hotdog a TD tool for DD
 
Presto Meetup @ Facebook (3/22/2016)
Presto Meetup @ Facebook (3/22/2016)Presto Meetup @ Facebook (3/22/2016)
Presto Meetup @ Facebook (3/22/2016)
 
AWS Meet-up: Logging At Scale on AWS
AWS Meet-up: Logging At Scale on AWSAWS Meet-up: Logging At Scale on AWS
AWS Meet-up: Logging At Scale on AWS
 
Diary of Support Engineer
Diary of Support EngineerDiary of Support Engineer
Diary of Support Engineer
 

Similar to Presto as a Service - Tips for operation and monitoring

Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksUsing Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
DataWorks Summit
 

Similar to Presto as a Service - Tips for operation and monitoring (20)

Presto meetup 2015-03-19 @Facebook
Presto meetup 2015-03-19 @FacebookPresto meetup 2015-03-19 @Facebook
Presto meetup 2015-03-19 @Facebook
 
Presto At Treasure Data
Presto At Treasure DataPresto At Treasure Data
Presto At Treasure Data
 
Big Data Introduction - Solix empower
Big Data Introduction - Solix empowerBig Data Introduction - Solix empower
Big Data Introduction - Solix empower
 
From Python Scikit-learn to Scala Apache Spark—The Road to Uncovering Botnets...
From Python Scikit-learn to Scala Apache Spark—The Road to Uncovering Botnets...From Python Scikit-learn to Scala Apache Spark—The Road to Uncovering Botnets...
From Python Scikit-learn to Scala Apache Spark—The Road to Uncovering Botnets...
 
Overview of data analytics service: Treasure Data Service
Overview of data analytics service: Treasure Data ServiceOverview of data analytics service: Treasure Data Service
Overview of data analytics service: Treasure Data Service
 
Multi dimension aggregations using spark and dataframes
Multi dimension aggregations using spark and dataframesMulti dimension aggregations using spark and dataframes
Multi dimension aggregations using spark and dataframes
 
Migrating from Redshift to Spark at Stitch Fix: Spark Summit East talk by Sky...
Migrating from Redshift to Spark at Stitch Fix: Spark Summit East talk by Sky...Migrating from Redshift to Spark at Stitch Fix: Spark Summit East talk by Sky...
Migrating from Redshift to Spark at Stitch Fix: Spark Summit East talk by Sky...
 
Migration from Redshift to Spark
Migration from Redshift to SparkMigration from Redshift to Spark
Migration from Redshift to Spark
 
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksUsing Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
 
Apache Spark sql
Apache Spark sqlApache Spark sql
Apache Spark sql
 
A machine learning and data science pipeline for real companies
A machine learning and data science pipeline for real companiesA machine learning and data science pipeline for real companies
A machine learning and data science pipeline for real companies
 
ElasticSearch as (only) datastore
ElasticSearch as (only) datastoreElasticSearch as (only) datastore
ElasticSearch as (only) datastore
 
MongoDB Capacity Planning
MongoDB Capacity PlanningMongoDB Capacity Planning
MongoDB Capacity Planning
 
Data Science at Scale: Using Apache Spark for Data Science at Bitly
Data Science at Scale: Using Apache Spark for Data Science at BitlyData Science at Scale: Using Apache Spark for Data Science at Bitly
Data Science at Scale: Using Apache Spark for Data Science at Bitly
 
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksUsing Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
 
Powering Interactive Data Analysis at Pinterest by Amazon Redshift
Powering Interactive Data Analysis at Pinterest by Amazon RedshiftPowering Interactive Data Analysis at Pinterest by Amazon Redshift
Powering Interactive Data Analysis at Pinterest by Amazon Redshift
 
Lipstick On Pig
Lipstick On Pig Lipstick On Pig
Lipstick On Pig
 
Netflix - Pig with Lipstick by Jeff Magnusson
Netflix - Pig with Lipstick by Jeff Magnusson Netflix - Pig with Lipstick by Jeff Magnusson
Netflix - Pig with Lipstick by Jeff Magnusson
 
Putting Lipstick on Apache Pig at Netflix
Putting Lipstick on Apache Pig at NetflixPutting Lipstick on Apache Pig at Netflix
Putting Lipstick on Apache Pig at Netflix
 
Session 03 acquiring data
Session 03 acquiring dataSession 03 acquiring data
Session 03 acquiring data
 

More from Taro L. Saito

More from Taro L. Saito (20)

Unifying Frontend and Backend Development with Scala - ScalaCon 2021
Unifying Frontend and Backend Development with Scala - ScalaCon 2021Unifying Frontend and Backend Development with Scala - ScalaCon 2021
Unifying Frontend and Backend Development with Scala - ScalaCon 2021
 
Journey of Migrating 1 Million Presto Queries - Presto Webinar 2020
Journey of Migrating 1 Million Presto Queries - Presto Webinar 2020Journey of Migrating 1 Million Presto Queries - Presto Webinar 2020
Journey of Migrating 1 Million Presto Queries - Presto Webinar 2020
 
Scala for Everything: From Frontend to Backend Applications - Scala Matsuri 2020
Scala for Everything: From Frontend to Backend Applications - Scala Matsuri 2020Scala for Everything: From Frontend to Backend Applications - Scala Matsuri 2020
Scala for Everything: From Frontend to Backend Applications - Scala Matsuri 2020
 
Airframe RPC
Airframe RPCAirframe RPC
Airframe RPC
 
td-spark internals: Extending Spark with Airframe - Spark Meetup Tokyo #3 2020
td-spark internals: Extending Spark with Airframe - Spark Meetup Tokyo #3 2020td-spark internals: Extending Spark with Airframe - Spark Meetup Tokyo #3 2020
td-spark internals: Extending Spark with Airframe - Spark Meetup Tokyo #3 2020
 
Airframe Meetup #3: 2019 Updates & AirSpec
Airframe Meetup #3: 2019 Updates & AirSpecAirframe Meetup #3: 2019 Updates & AirSpec
Airframe Meetup #3: 2019 Updates & AirSpec
 
Presto At Arm Treasure Data - 2019 Updates
Presto At Arm Treasure Data - 2019 UpdatesPresto At Arm Treasure Data - 2019 Updates
Presto At Arm Treasure Data - 2019 Updates
 
Reading The Source Code of Presto
Reading The Source Code of PrestoReading The Source Code of Presto
Reading The Source Code of Presto
 
How To Use Scala At Work - Airframe In Action at Arm Treasure Data
How To Use Scala At Work - Airframe In Action at Arm Treasure DataHow To Use Scala At Work - Airframe In Action at Arm Treasure Data
How To Use Scala At Work - Airframe In Action at Arm Treasure Data
 
Airframe: Lightweight Building Blocks for Scala - Scale By The Bay 2018
Airframe: Lightweight Building Blocks for Scala - Scale By The Bay 2018Airframe: Lightweight Building Blocks for Scala - Scale By The Bay 2018
Airframe: Lightweight Building Blocks for Scala - Scale By The Bay 2018
 
Airframe: Lightweight Building Blocks for Scala @ TD Tech Talk 2018-10-17
Airframe: Lightweight Building Blocks for Scala @ TD Tech Talk 2018-10-17Airframe: Lightweight Building Blocks for Scala @ TD Tech Talk 2018-10-17
Airframe: Lightweight Building Blocks for Scala @ TD Tech Talk 2018-10-17
 
Tips For Maintaining OSS Projects
Tips For Maintaining OSS ProjectsTips For Maintaining OSS Projects
Tips For Maintaining OSS Projects
 
Learning Silicon Valley Culture
Learning Silicon Valley CultureLearning Silicon Valley Culture
Learning Silicon Valley Culture
 
Scala at Treasure Data
Scala at Treasure DataScala at Treasure Data
Scala at Treasure Data
 
Introduction to Presto at Treasure Data
Introduction to Presto at Treasure DataIntroduction to Presto at Treasure Data
Introduction to Presto at Treasure Data
 
Workflow Hacks #1 - dots. Tokyo
Workflow Hacks #1 - dots. TokyoWorkflow Hacks #1 - dots. Tokyo
Workflow Hacks #1 - dots. Tokyo
 
Presto As A Service - Treasure DataでのPresto運用事例
Presto As A Service - Treasure DataでのPresto運用事例Presto As A Service - Treasure DataでのPresto運用事例
Presto As A Service - Treasure DataでのPresto運用事例
 
JNuma Library
JNuma LibraryJNuma Library
JNuma Library
 
Treasure Dataを支える技術 - MessagePack編
Treasure Dataを支える技術 - MessagePack編Treasure Dataを支える技術 - MessagePack編
Treasure Dataを支える技術 - MessagePack編
 
Weaving Dataflows with Silk - ScalaMatsuri 2014, Tokyo
Weaving Dataflows with Silk - ScalaMatsuri 2014, TokyoWeaving Dataflows with Silk - ScalaMatsuri 2014, Tokyo
Weaving Dataflows with Silk - ScalaMatsuri 2014, Tokyo
 

Recently uploaded

Recently uploaded (20)

Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 

Presto as a Service - Tips for operation and monitoring

  • 1. Presto as a Service Tips for operation and monitoring Taro L. Saito Treasure Data, Inc. leo@treasure-data.com January 20, 2015 Presto Meetup Japan @ FreakOut, Roppongi
  • 2. About Me: @taroleo •  2007 University of Tokyo. Ph.D. –  XML DBMS, Transaction Processing •  Relational-Style XML Query. ACM SIGMOD 2008 •  ~ 2014 Assistant Professor at University of Tokyo –  Genome Science Research •  Distributed Computing •  2014.3月~ Treasure Data –  Software Engineer, MPP Team Leader 2
  • 3. My Open Source Projects •  sqlite-jdbc –  SQLite DBMS for Java –  1file =1DB •  snappy-java –  Fast compression library –  More than 100,000 downloads/month •  Used in Spark, Parquet, etc. •  msgpack-java •  UT Genome Browser (UTGB) –  Visualization of massive amount of genome science data 3
  • 4. Topics •  Presto as a Service in Treasure Data –  Error Recovery –  Presto Deployment •  Tips for Monitoring Presto –  JSON API –  Presto + Fluentd 4
  • 5. Treasure Data: Presto as a Service 5 Presto Public Release
  • 6. Hive TD API / Web ConsoleInteractive query batch query Presto Treasure Data PlazmaDB: MessagePack Columnar Storage td-presto connector
  • 7. Deployment •  Building Presto takes more than 20 minutes. •  Facebook frequently releases new versions •  Let CircleCI build Presto –  Deploy jar files to private Maven repository –  We sometime use non-release versions •  for fixing serious bugs •  hot-fix patches •  Integration Test –  td-presto connector •  PlazmaDB, Multi-tenant query scheduler •  Query optimizer –  Run test queries on staging cluster 7
  • 8. Production: Blue-Green Deployment •  http://martinfowler.com/bliki/BlueGreenDeployment.html •  2 Presto Coordinators (Blue/Green) –  Route Presto queries to the active cluster –  No down-time upon deployment •  Launch Presto worker instances with chef <- less than 5 min. in AWS •  Inactive clusters is used for pre-production testing and customer support –  Investigation and tuning of customer query performance –  Trouble shooting 8
  • 9. Error Recovery •  Presto has no fault tolerance •  Error types –  User error •  Syntax errors –  SQL syntax, missing function •  Semantic errors –  missing tables/columns –  Insufficient resource •  Exceeded task memory size –  Internal failure •  I/O error –  S3/Riak CS •  worker failure •  etc. 9 Worth A Retry!
  • 11. 11
  • 12. Query Retry Patterns used in TD •  Error code + message pattern 12
  • 13. Monitoring Presto •  REST API for monitoring Presto state –  JSON format •  (presto server IP):8080/v1/query –  List of recent queries (BasicQueryInfo class) •  (presto server IP):8080/v1/query/(query id) –  Detailed query state information –  Query plan, tasks and running worker IDs –  Processed rows/data size 13
  • 15. Detailed query Info /v1/query/(query id) 15
  • 18. 18
  • 19. Presto Coordinator •  Organizes query execution pipelines –  Coordinates presto workers •  Retrieves table partition and split location from connectors –  Creates distributed query plans •  Full GC –  Stalls coordinator •  When memory is insufficient –  Use memory-rich machine –  GC Tuning •  CMSInitiatingOccupancyFraction 19
  • 20. Monitoring Presto with Fluentd 20 Hive Presto
  • 22. 22
  • 23. 23
  • 24. Detecting Anomaly •  Started Query Rate (in 5min/15min) –  If no query has started, cluster may be down (or not started properly) •  Processed rows in a query –  Sum up the number of the processed rows from all of the sub stages –  Simple, but the most reliable measure •  Send an alert –  HipChat notification –  PagerDuty call •  JP/US team rotation 24
  • 25. Benchmarking •  Query performance comparison –  between two versions of Presto •  Benchmark –  Run query set multiple times –  Store the results to TD –  Report the result with Presto •  Aggregation query 25
  • 26. Presto Operation Tool •  Prestop –  Our internal tool for managing multiple presto clusters •  written in Scala –  Query monitoring –  Benchmarking –  Workload simulation •  stress testing •  Monitoring –  Librato –  Datadog –  ChartIO (query stats) 26
  • 27. WE ARE HIRING! 27 Check: www.treasuredata.com