Naoki Takezoe
Presto Conference Tokyo 2020
Nov 20, 2020
Testing Distributed Query
Engine as a Service
Deliver our service to customers as safely as possible
© 2020 Treasure Data
Who am I?
• Naoki Takezoe
• Joined Treasure Data in 2018
• Works on Presto / Apache Spark
• Open Source
• GitBucket
• Scalatra
• Apache PredictionIO
• Books
• Japanese translation of Scala Puzzlers
• Scala 300 recipes, etc.
Twitter: @takezoen
GitHub: https://github.com/takezoe
Treasure Data
Ready-to-use cloud data platform
[Architecture diagram: Data Collection (logs, device data, batch data), Cloud Storage (PlazmaDB, table schema) and Distributed Data Processing (jobs), with Job Management (SQL editor, scheduler, workflows) and machine learning on top; components are marked as Treasure Data OSS or third-party OSS]
Presto at Treasure Data
• 2013
• Presto, developed at Facebook, was open-sourced
• Treasure Data was providing Impala As A Service
• 2014
• Launched Presto As A Service as a replacement for Impala
• 2015
• 20,000 queries / day
• 2019
• Reached 1,000,000 queries / day
• Presto creators (Martin, Dain and David) left Facebook and founded the non-profit Presto Software Foundation (prestosql), then joined Starburst
• Hosted Presto Conference in Tokyo
Deliver our service to customers
as safely as possible
Testing a distributed database is challenging
• Wide variety of workloads
• Possible performance degradation
• Cluster status
• Many corner cases
Testing is even more important when upgrading Presto
• Presto development is very active
• 27 releases in 2019
• 18 releases in 2020 so far (as of Nov 14)
• No stable version
• Incompatible changes come together with bug fixes
• Sticking to one version is not an option
• Backporting bug fixes and new features from newer versions also gets harder over time
How can we upgrade Presto safely...?
To minimize the risk
• Test: unit test, integration test, system test, regular performance proving
• Release process: gradual migration for big updates, internal dogfooding
• Monitoring: cluster status monitoring
What is missing?
• Coverage of a wide variety of use cases
• Performance degradation in corner cases
• Unknown compatibility issues
• Production-scale environment
• Data size and characteristics
• Number of queries, cluster size, etc.
What’s a solution?
presto-query-simulator
Test using production data and queries, securely and safely
[Diagram: a query set extracted from the query log runs on both a base cluster and a target cluster; queries read from the real database and write only to a test database; hashed results and query metrics from both clusters feed into a report]
• Security: We don't see customer data or query results
• Safety: We don't cause any side effects on customer data
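The hashed-results idea can be sketched as follows, assuming each cluster's result rows are available as Python tuples where they are produced. This is not the presto-query-simulator code, and the function names are hypothetical. Each row is hashed with SHA-256 and the row digests are summed modulo 2^256, so the fingerprint ignores row order (queries without ORDER BY may return rows in any order) while still counting duplicate rows. Only the fingerprint needs to leave the cluster, so comparing base and target does not expose the data itself.

```python
import hashlib
from typing import Iterable, Sequence

_MOD = 1 << 256  # row digests are summed modulo 2**256


def row_digest(row: Sequence[object]) -> int:
    """SHA-256 of one row as an integer; repr() keeps 1, '1' and None distinct."""
    data = repr(tuple(row)).encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def result_fingerprint(rows: Iterable[Sequence[object]]) -> str:
    """Order-insensitive fingerprint of a query result, computed in one pass.

    Summing (rather than XOR-ing) the row digests keeps duplicate rows from
    cancelling each other out; the row count is included as well.
    """
    total, count = 0, 0
    for row in rows:
        total = (total + row_digest(row)) % _MOD
        count += 1
    return f"{count}:{total:064x}"


def same_result(base_rows: Iterable[Sequence[object]],
                target_rows: Iterable[Sequence[object]]) -> bool:
    """Compare base and target clusters using fingerprints only."""
    return result_fingerprint(base_rows) == result_fingerprint(target_rows)
```

Floating-point columns whose last digits differ between Presto versions will produce different fingerprints; in practice such columns might need rounding before hashing.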
Challenges in query-simulator
• Query simulation takes a very long time
• Replaying one day of queries theoretically takes at least one day
• Not only time but also the cost of test clusters matters
• Result verification is not straightforward
• Many false positives and duplicates
• Result analysis tends to depend on individual knowledge
Make query simulation faster
• Reduce the number of queries by grouping them by query signature (up to 90% fewer queries)
• Reduce the amount of data by narrowing table scan ranges (up to 80% less data)
• Use multiple Presto clusters
• Test only long-running queries
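The slide does not say how queries are spread over multiple clusters. One possible approach, shown here only as a sketch, is to keep the queries whose production elapsed time is above a threshold and then assign them longest-first to whichever cluster currently has the least estimated work (the classic greedy LPT heuristic). The `LoggedQuery` record, the threshold and the cluster names are hypothetical.

```python
import heapq
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class LoggedQuery:
    query_id: str
    sql: str
    elapsed_sec: float  # elapsed time recorded in the production query log


def select_long_running(log: List[LoggedQuery], min_elapsed_sec: float) -> List[LoggedQuery]:
    """Keep only queries that ran at least `min_elapsed_sec` in production."""
    return [q for q in log if q.elapsed_sec >= min_elapsed_sec]


def assign_to_clusters(queries: List[LoggedQuery],
                       clusters: List[str]) -> Dict[str, List[LoggedQuery]]:
    """Longest-first greedy assignment: each query goes to the cluster whose
    total estimated elapsed time is currently the smallest."""
    if not clusters:
        raise ValueError("at least one cluster is required")
    plan: Dict[str, List[LoggedQuery]] = {c: [] for c in clusters}
    heap = [(0.0, i, c) for i, c in enumerate(clusters)]  # (load, tie-breaker, name)
    heapq.heapify(heap)
    for q in sorted(queries, key=lambda q: q.elapsed_sec, reverse=True):
        load, i, cluster = heapq.heappop(heap)
        plan[cluster].append(q)
        heapq.heappush(heap, (load + q.elapsed_sec, i, cluster))
    return plan
```

With queries of 600, 400 and 300 seconds on two clusters, the 600-second query gets one cluster and the other two share the second, for 600 and 700 seconds of estimated work.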
Query signature
Simplified expression of query structure

SELECT time, path, user_agent
FROM access
WHERE TD_INTERVAL(time, '-1M')
→ S(T) access->#

SELECT time, path, user_agent
FROM access a
INNER JOIN account b ON a.account_id = b.account_id
→ S(J(T,T)) access->#,account->#

An open-source Scala implementation is included in Airframe:
https://github.com/wvlet/airframe/blob/master/airframe-sql/src/main/scala/wvlet/airframe/sql/analyzer/QuerySignature.scala
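To illustrate how grouping by signature cuts the query count, here is a toy version over a hand-built syntax tree. It does not parse SQL and is not Airframe's API: `Table`, `Join` and `Select` are hypothetical node types, and the `->#` suffix from the slide is omitted because its meaning is not explained there. Filters such as `TD_INTERVAL` are not part of the tree, so queries that differ only in their WHERE clause share a signature, and only one representative per signature needs to be replayed.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class Table:
    name: str


@dataclass(frozen=True)
class Join:
    left: "Relation"
    right: "Relation"


@dataclass(frozen=True)
class Select:
    source: "Relation"


Relation = Union[Table, Join, Select]


def structure(node: Relation) -> str:
    """Structural part of the signature, e.g. S(J(T,T))."""
    if isinstance(node, Table):
        return "T"
    if isinstance(node, Join):
        return f"J({structure(node.left)},{structure(node.right)})"
    if isinstance(node, Select):
        return f"S({structure(node.source)})"
    raise TypeError(f"unsupported node: {node!r}")


def tables(node: Relation) -> List[str]:
    """Table names in left-to-right order."""
    if isinstance(node, Table):
        return [node.name]
    if isinstance(node, Join):
        return tables(node.left) + tables(node.right)
    if isinstance(node, Select):
        return tables(node.source)
    raise TypeError(f"unsupported node: {node!r}")


def signature(node: Relation) -> str:
    return f"{structure(node)} {','.join(tables(node))}"


def pick_representatives(queries: Dict[str, Relation]) -> Dict[str, str]:
    """Map each signature to the first query id that produced it."""
    reps: Dict[str, str] = {}
    for query_id, tree in queries.items():
        reps.setdefault(signature(tree), query_id)
    return reps
```

For example, `pick_representatives({"a": Select(Table("access")), "b": Select(Table("access"))})` returns `{"S(T) access": "a"}`, so only query `a` is replayed.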
Narrowing scan ranges
Use only x% of total records by adding a time range predicate
[Chart: time distribution of records, marking the original scan range and the narrower range that is actually used]

Original query:
SELECT time, path, user_agent
FROM access

Rewritten query (from and to bound the narrowed range):
SELECT time, path, user_agent
FROM (
  SELECT time, path, user_agent
  FROM access
)
WHERE TD_TIME_RANGE(time, from, to)
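The choice of range can be sketched from a per-bucket histogram of record counts (the time distribution in the chart). This version takes the most recent buckets until they hold at least the requested fraction of records, then replaces the table reference with a filtered subquery. Picking the most recent range, the histogram input format and the helper names are assumptions; the sketch also assumes `TD_TIME_RANGE` treats its bounds as a half-open `[start, end)` range in Unix time.

```python
from typing import List, Tuple


def narrow_window(histogram: List[Tuple[int, int]], bucket_sec: int,
                  fraction: float) -> Tuple[int, int]:
    """Return (start, end) covering the most recent buckets that together
    hold at least `fraction` of all records.

    histogram: (bucket_start_unix_time, record_count) pairs sorted by time.
    """
    if not histogram:
        raise ValueError("histogram is empty")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    total = sum(count for _, count in histogram)
    end = histogram[-1][0] + bucket_sec  # exclusive end of the newest bucket
    covered = 0
    for start, count in reversed(histogram):
        covered += count
        if covered >= fraction * total:
            return start, end
    return histogram[0][0], end  # fallback: scan everything


def restrict_scan(table: str, columns: List[str], start: int, end: int) -> str:
    """Subquery that can replace `FROM <table>` in the original query."""
    cols = ", ".join(columns)
    return f"(SELECT {cols} FROM {table} WHERE TD_TIME_RANGE(time, {start}, {end}))"
```

With an hourly histogram of `[(0, 10), (3600, 10), (7200, 30), (10800, 50)]`, asking for half of the records returns `(10800, 14400)`, so only the last hour is scanned.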
We choose these options depending on the purpose of the query simulation
• Reduce the number of queries by grouping them by query signature (up to 90% fewer queries)
• Reduce the amount of data by narrowing table scan ranges (up to 80% less data)
• Use multiple Presto clusters
• Test only long-running queries
Are we checking compatibility, or performance differences?
Make result verification easier
• Automatically detect non-deterministic query results
• Run each query multiple times to see whether the results are the same
• Group similar errors
• Fuzzy comparison of error messages
• List problematic queries based on internal metrics
• Performance, resource usage, scan ranges, worker distribution, etc.
• Finally, a human checks the problematic queries
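Non-determinism detection can be sketched like this: run the same query several times on the base cluster, and if the result fingerprints disagree with each other, classify the query as non-deterministic instead of reporting a mismatch against the target. `run_query` stands for whatever client call executes SQL and returns rows; it, the default of three runs and the verdict labels are hypothetical.

```python
import hashlib
from typing import Callable, Iterable, List, Sequence

Row = Sequence[object]


def fingerprint(rows: Iterable[Row]) -> str:
    """Order-insensitive result fingerprint: hash of the sorted per-row digests."""
    digests = sorted(hashlib.sha256(repr(tuple(r)).encode("utf-8")).hexdigest() for r in rows)
    return hashlib.sha256("".join(digests).encode("utf-8")).hexdigest()


def collect_runs(run_query: Callable[[str], Iterable[Row]], sql: str,
                 runs: int = 3) -> List[str]:
    """Execute `sql` several times through the (hypothetical) run_query callable."""
    if runs < 2:
        raise ValueError("at least two runs are needed to detect non-determinism")
    return [fingerprint(run_query(sql)) for _ in range(runs)]


def verdict(base_runs: List[str], target_print: str) -> str:
    """Classify a query before reporting it as a regression."""
    if len(set(base_runs)) > 1:
        return "non-deterministic"  # base disagrees with itself: not a real mismatch
    return "match" if target_print == base_runs[0] else "mismatch"
```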
We just need to check the queries listed in the report
[Report screenshot, annotated: the report gives a possible reason for each inconsistent result; failures are grouped by the similarity of their error messages; only queries more than 5 min slower are listed]
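The two report filters can be approximated in a few lines. Error messages are first normalized by masking numbers and quoted names, then grouped greedily using `difflib.SequenceMatcher` similarity; slow queries are those whose target elapsed time exceeds the base by more than 5 minutes. The masking rules, the 0.8 similarity threshold and the function names are assumptions, not the tool's actual logic.

```python
import re
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def normalize(message: str) -> str:
    """Mask the parts of an error message that usually vary between queries."""
    message = re.sub(r"'[^']*'", "?", message)  # single-quoted names or literals
    message = re.sub(r'"[^"]*"', "?", message)  # double-quoted identifiers
    message = re.sub(r"\d+", "N", message)      # numbers (line:col, sizes, ids)
    return message.strip()


def group_errors(errors: List[Tuple[str, str]], threshold: float = 0.8) -> List[List[str]]:
    """Greedy clustering: each (query_id, message) joins the first group whose
    representative message is at least `threshold` similar, else starts a new group."""
    groups: List[Tuple[str, List[str]]] = []
    for query_id, message in errors:
        norm = normalize(message)
        for rep, members in groups:
            if SequenceMatcher(None, rep, norm).ratio() >= threshold:
                members.append(query_id)
                break
        else:
            groups.append((norm, [query_id]))
    return [members for _, members in groups]


def slower_queries(base_sec: Dict[str, float], target_sec: Dict[str, float],
                   min_slowdown_sec: float = 300.0) -> List[str]:
    """Query ids that are more than `min_slowdown_sec` (5 min) slower on the target."""
    common = base_sec.keys() & target_sec.keys()
    return sorted(q for q in common if target_sec[q] - base_sec[q] > min_slowdown_sec)
```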
Future work for further improvement
• Run query simulation more frequently (ideally on a regular schedule)
• Further speed-up is required
• Maintain small but effective query sets for quick tests
• Automate test environment provisioning
• Improve test coverage
• Overcome some system-level restrictions
• Test with the schema and data as they were when each query originally ran (like time travel)
• Improve the resolution of query grouping
• ...and more!
Related Work
• Snowtrail: Testing with Production Queries on a Cloud Database
• https://resources.snowflake.com/report/snowtrail-testing-with-production-series-on-a-cloud-database
• Load testing Aurora MySQL using query logs (クエリログを使ったAurora MySQLの負荷テスト)
• https://techlife.cookpad.com/entry/2020/10/13/090000
• Building an Automated Testing Framework Based on Chaos Mesh and Argo
• https://pingcap.com/blog/building-automated-testing-framework-based-on-chaos-mesh-and-argo
