Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.
Big Data Conference in Vilnius 2018
Kai Sasaki
Infrastructure for Auto Scaling Distributed System
Bio
Kai Sasaki (佐々木 海)
• Senior Software Engineer at Arm Treasure Data since 2015
• Hadoop, Presto, Spark, TensorFlow.js, Apache Hivemall
• Books
– Available as paperback and ebook.
• Twitter
– @Lewuathe
Agenda
• Who is Treasure Data?
• What is distributed data analysis?
• What kind of challenges do we have?
– Operational Cost
– Stability and Scalability
• Our Approach
– AWS CodeDeploy & Auto Scaling Group
– Query Simulation
– Graceful/Force Shutdown
Who is Treasure Data?
Treasure Data
Founded in December 2011 in Silicon Valley
• Mountain View, CA
• DMP, eCDP, IoT, Cloud
• We joined Arm in October 2018
Treasure Data
We provide an end-to-end integrated data analysis platform.
• Data Ingestion
– Mobile Device, Automotive, IoT
• Enterprise Customer Data Platform
• Service Integration
– BI tool (e.g. Tableau)
– Marketing tool
Treasure Data
Open Source Lover
• Fluentd
• Embulk
• Digdag
• Apache Hivemall
Enterprise Data Analysis
• Scalable processing
• Reliable platform
• Secure data protection
Arm Pelion Platform
Treasure Data is part of the Arm Pelion IoT Platform
• Flexibility in connectivity management
• Efficient data processing
Distributed Data Analysis
Distributed Data Analysis
A service component that enables us to process huge datasets
Scalability
• Easy to scale horizontally
• Flexible to business requirements
– Interface (e.g. SQL)
– Data Format
Throughput
• Impossible to achieve at scale with a single-node machine
• Business requirements for batch processing (e.g. daily batch)
Data Consistency
• Write-side operations are possible
– INSERT, DELETE, UPDATE
• Correct measurement is the key to data analysis
Distributed Processing Engines
Many open source projects are available for distributed processing
• Hadoop
• Presto
• Spark
• Kafka
Typical Architecture
Master-Worker Model
https://www.tutorialspoint.com/apache_presto/apache_presto_architecture.htm
Distributed Plan
select
  t1.class,
  t2.features,
  count(1)
from iris t1
join iris t2
  on t1.class = t2.class
group by 1, 2;
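The plan diagram from the original slide did not survive extraction. As a rough, single-process illustration, assuming a hash-partitioned join (a common strategy in engines such as Presto, though the real plan may differ), the query above splits into an exchange stage that partitions rows on the join key, a per-worker join with partial aggregation, and a final merge on the coordinator. The rows are made-up stand-ins for the `iris` table.

```python
from collections import Counter, defaultdict

# Made-up (class, features) rows standing in for the iris table.
ROWS = [
    ("setosa", "a"), ("setosa", "b"),
    ("versicolor", "c"), ("virginica", "d"), ("virginica", "e"),
]


def partition(rows, num_workers):
    """Exchange stage: hash-partition rows on the join key (class)."""
    parts = defaultdict(list)
    for row in rows:
        parts[hash(row[0]) % num_workers].append(row)
    return parts


def worker(rows_t1, rows_t2):
    """Worker stage: local hash join on class, then a partial count per group."""
    by_class = defaultdict(list)
    for cls, features in rows_t2:
        by_class[cls].append(features)
    partial = Counter()
    for cls, _ in rows_t1:
        for features in by_class[cls]:
            partial[(cls, features)] += 1
    return partial


def run_query(rows, num_workers=3):
    """Coordinator: fan out partitions, then merge the partial aggregates."""
    parts = partition(rows, num_workers)
    result = Counter()
    for i in range(num_workers):
        # Self-join: both sides of a given class land in the same partition i.
        result.update(worker(parts[i], parts[i]))
    return dict(result)
```

Because rows with the same `class` always hash to the same partition, the result does not depend on the number of workers; only the amount of parallelism changes.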
Challenges
Challenges for Distributed Data Analysis
Maintaining a distributed data analysis platform in the real world is not easy.
• Operation
– Deployment
– Log Investigation
– Monitoring
• Money
– Large Scale Cluster
– Network Cost
• Stability
– Capacity Sufficiency
Challenges for Distributed Data Analysis
Manual launch/termination?
Is the capacity estimate correct?
Which version is deployed?
What kind of metrics do we need to monitor?
How much does it cost?
MANUALLY
Our Approach
Our Approach
Practical solutions that take full advantage of public cloud services
• AWS CodeDeploy
– Integration with Auto Scaling Group
• EC2 Auto Scaling Group
– Load test by Query Simulation
– Metric Based Capacity Estimation
– Graceful/Force Instance Termination
CodeDeploy
Deployment service on AWS
• Easy to integrate with Auto Scaling Groups
• Available everywhere
– Supports on-premises instances
• Scalable for distributed system use cases
• https://docs.aws.amazon.com/codedeploy/index.html
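A minimal sketch of attaching a CodeDeploy deployment group to an Auto Scaling group with boto3, using hypothetical application, group, role, and ASG names. Once the ASG is associated, CodeDeploy can deploy the latest successful revision to instances the group launches later, which is what keeps scale-out hands-free. Only the request builder runs without AWS credentials; the rollout setting is an illustrative choice.

```python
def deployment_group_request(app_name, group_name, role_arn, asg_names):
    """Keyword arguments for the CodeDeploy create_deployment_group call."""
    return {
        "applicationName": app_name,
        "deploymentGroupName": group_name,
        "serviceRoleArn": role_arn,
        # Associating the ASGs lets CodeDeploy deploy to newly launched instances.
        "autoScalingGroups": list(asg_names),
        # Roll out to one instance at a time so most workers keep serving.
        "deploymentConfigName": "CodeDeployDefault.OneAtATime",
    }


def create_deployment_group(request):
    import boto3  # imported lazily; only needed when actually calling AWS

    return boto3.client("codedeploy").create_deployment_group(**request)


# Hypothetical names, for illustration only.
REQUEST = deployment_group_request(
    "presto",
    "presto-workers",
    "arn:aws:iam::123456789012:role/CodeDeployServiceRole",
    ["presto-worker-asg"],
)
```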
Auto Scaling System
The system should scale automatically without any manual operation
• Load test by Query Simulation
• Metric Based Capacity Estimation
• Graceful Termination & Force Termination
Query Simulation
Load tests should be based on real-world workloads.
• Get the query list from our customers' past query history
• Cluster queries by query signature
• Construct a dataset and query list based on the clustered list
• This makes it easy to run load tests based on the production workload
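The last step can be sketched as follows, assuming the history has already been grouped into clusters keyed by signature (the cluster contents below are hypothetical). The test query list keeps each cluster's share of production traffic, with largest-remainder rounding so the counts add up exactly to the requested size. This proportional-sampling rule is an illustrative choice, not necessarily the one used in production.

```python
import math


def build_test_mix(clusters, total):
    """Pick `total` queries whose per-cluster counts mirror production.

    clusters maps a signature to the historical queries sharing it.
    """
    n = sum(len(qs) for qs in clusters.values())
    if n == 0:
        return []
    quotas = {sig: total * len(qs) / n for sig, qs in clusters.items()}
    counts = {sig: math.floor(q) for sig, q in quotas.items()}
    leftover = total - sum(counts.values())
    # Hand the remaining slots to clusters with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda s: quotas[s] - counts[s], reverse=True)
    for sig in by_remainder[:leftover]:
        counts[sig] += 1
    mix = []
    for sig, qs in clusters.items():
        # Cycle through the cluster's own queries as representatives.
        mix.extend(qs[i % len(qs)] for i in range(counts[sig]))
    return mix
```

For example, with three queries in cluster `A` and one in `B`, a mix of 8 queries contains six from `A` and two from `B`.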
Query Signature
A query signature represents a query in a shortened format.
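The slides do not spell out the signature format. A much simpler stand-in, assumed here purely for illustration, normalizes a query by lowercasing it, replacing string and numeric literals with `?`, and collapsing whitespace. Queries that differ only in constants then share a signature and can be counted as one cluster.

```python
import re
from collections import Counter


def query_signature(sql):
    """Hypothetical signature: SQL text with literals stripped out."""
    s = sql.lower()
    s = re.sub(r"'(?:[^']|'')*'", "?", s)     # string literals -> ?
    s = re.sub(r"\b\d+(?:\.\d+)?\b", "?", s)  # numeric literals -> ?
    s = re.sub(r"\s+", " ", s).strip()        # collapse whitespace
    return s


def cluster_by_signature(queries):
    """Count how many historical queries share each signature."""
    return Counter(query_signature(q) for q in queries)
```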
Query Simulation
Conductor (c5.9xlarge)
1. Get raw query list
2. Construct test data and query list
Metric Based Capacity Estimation
Designed to keep a metric at its target value by adjusting capacity
• Add or remove instances in proportion to how far the metric is from its target
• e.g. Target average CPU usage = 40%
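Proportional scaling means that if the group's average CPU is at 60% against a 40% target, it needs roughly 60/40 = 1.5 times its current capacity. The function below is a simplified approximation of that rule, clamped to the group's minimum and maximum size. The real EC2 Auto Scaling service also accounts for instance warm-up and is more cautious when scaling in, so treat this as a mental model rather than a reimplementation.

```python
import math


def desired_capacity(current, metric, target, min_size, max_size):
    """Approximate target tracking: scale count by metric/target, then clamp."""
    if current == 0:
        return min_size
    desired = math.ceil(current * metric / target)
    return max(min_size, min(max_size, desired))
```

With 10 instances at 60% average CPU and a 40% target, this suggests 15 instances; at 20% it suggests 5.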
• 40% is the threshold that balances cost and performance
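A 40% CPU target like this one can be expressed as an EC2 Auto Scaling target tracking policy on the predefined `ASGAverageCPUUtilization` metric. Here is a minimal boto3 sketch with a hypothetical group name and policy naming scheme. Only the request builder runs without AWS credentials.

```python
def target_tracking_policy(asg_name, target_cpu=40.0):
    """Keyword arguments for the EC2 Auto Scaling put_scaling_policy call."""
    return {
        "AutoScalingGroupName": asg_name,
        "PolicyName": f"{asg_name}-cpu-{int(target_cpu)}",  # hypothetical naming
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization",
            },
            # Keep the group's average CPU utilization near this percentage.
            "TargetValue": target_cpu,
        },
    }


def apply_policy(request):
    import boto3  # imported lazily; only needed when actually calling AWS

    return boto3.client("autoscaling").put_scaling_policy(**request)
```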
Graceful Termination
Terminating instances gracefully
• Avoid degrading the user experience
• Lifecycle hook in the auto scaling group
• Cron job to check running tasks
– Number of tasks in the worker
– Send completion to the lifecycle hook
https://docs.aws.amazon.com/autoscaling/ec2/userguide/AutoScalingGroupLifecycle.html
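One tick of the cron job might look like the sketch below. It uses the boto3 Auto Scaling calls `describe_auto_scaling_instances` and `complete_lifecycle_action`. The name `count_running_tasks` is a hypothetical callable standing in for however the worker's running task count is obtained, which the slides do not specify. Passing the client in keeps the logic testable without AWS.

```python
def drain_tick(client, asg_name, hook_name, instance_id, count_running_tasks):
    """Run once per cron interval; True means termination was released."""
    resp = client.describe_auto_scaling_instances(InstanceIds=[instance_id])
    states = [i["LifecycleState"] for i in resp["AutoScalingInstances"]]
    if states != ["Terminating:Wait"]:
        return False  # not being scaled in; nothing to do
    if count_running_tasks() > 0:
        return False  # still busy; check again on the next tick
    # Idle: tell the hook to continue so the ASG can terminate the instance.
    client.complete_lifecycle_action(
        LifecycleHookName=hook_name,
        AutoScalingGroupName=asg_name,
        LifecycleActionResult="CONTINUE",
        InstanceId=instance_id,
    )
    return True
```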
1. The instance is moved to the Terminating:Wait state.
2. The cron job completes the lifecycle action, moving the instance to Terminating:Proceed.
3. The ASG terminates the instance gracefully.
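Step 1 happens only because the group has a termination lifecycle hook. Below is a sketch of the corresponding `put_lifecycle_hook` request with hypothetical names. `HeartbeatTimeout` bounds how long an instance may stay in Terminating:Wait (the API accepts 30 to 7200 seconds). `DefaultResult="CONTINUE"` lets the termination proceed if no completion arrives in time.

```python
def termination_hook_request(asg_name, hook_name, timeout_seconds=3600):
    """Keyword arguments for the EC2 Auto Scaling put_lifecycle_hook call."""
    if not 30 <= timeout_seconds <= 7200:
        raise ValueError("HeartbeatTimeout must be between 30 and 7200 seconds")
    return {
        "LifecycleHookName": hook_name,
        "AutoScalingGroupName": asg_name,
        # Pause scale-in victims in Terminating:Wait instead of terminating at once.
        "LifecycleTransition": "autoscaling:EC2_INSTANCE_TERMINATING",
        "HeartbeatTimeout": timeout_seconds,
        # If the cron job never completes the action, terminate anyway.
        "DefaultResult": "CONTINUE",
    }


def put_hook(request):
    import boto3  # imported lazily; only needed when actually calling AWS

    return boto3.client("autoscaling").put_lifecycle_hook(**request)
```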
Force Termination
Long-running tasks can block graceful termination
• Put a “timeout” limit on the wait
• Simulate “how long it takes to terminate gracefully”
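Combining the graceful path with a timeout gives a three-way decision on each check. This minimal sketch uses the 10-minute limit discussed below as its default. It assumes the caller measures `waited_seconds`, for example from the moment the instance entered Terminating:Wait.

```python
from enum import Enum


class Action(Enum):
    WAIT = "wait"          # tasks still running and the timeout has not passed
    GRACEFUL = "graceful"  # worker is idle: complete the lifecycle action now
    FORCE = "force"        # timeout passed: complete it anyway, killing tasks


def termination_action(running_tasks, waited_seconds, timeout_seconds=600):
    """Decide what the cron job should do for an instance in Terminating:Wait."""
    if running_tasks == 0:
        return Action.GRACEFUL
    if waited_seconds >= timeout_seconds:
        return Action.FORCE
    return Action.WAIT
```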
Instance Termination
Balance between customer experience and cost optimization.
Graceful Termination
Keeping queries running as long as possible satisfies customer expectations.
• Systems such as Presto are not fault tolerant (a lost worker fails its queries)
• Distributed analysis workloads tend to be too long to be retried
Force Termination
Cost optimization is one of the primary goals of auto scaling.
• Scaling out/in within around 10 minutes does not lose agility in capacity adjustment.
• Force termination that only affects queries running over 10 minutes is acceptable.
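You can check a 10-minute timeout against history, in the spirit of the “how long it takes to terminate gracefully” simulation. This sketch assumes that, for each past scale-in, you can reconstruct how long the worker would have needed to drain its running queries. The sample numbers are made up. It returns the share of scale-ins that would end in force termination.

```python
def forced_fraction(drain_seconds, timeout_seconds):
    """Share of scale-in events whose drain time exceeds the timeout."""
    if not drain_seconds:
        return 0.0
    forced = sum(1 for d in drain_seconds if d > timeout_seconds)
    return forced / len(drain_seconds)
```

For example, drain times of 30, 120, 400, and 900 seconds against a 600-second timeout mean one scale-in in four would be forced.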
Recap
• Who is Treasure Data?
• What is distributed data analysis?
• What kind of challenges do we have?
– Operational Cost
– Stability and Scalability
• Our Approach
– AWS CodeDeploy & Auto Scaling Group
– Query Simulation
– Graceful/Force Shutdown
Thank You!
Danke!
Merci!
谢谢!
Gracias!
Kiitos!
Spark MLlib code reading ~optimization~Kai Sasaki
 
How I tried MADE
How I tried MADEHow I tried MADE
How I tried MADEKai Sasaki
 
Reading kernel org
Reading kernel orgReading kernel org
Reading kernel orgKai Sasaki
 
Kernel bootstrap
Kernel bootstrapKernel bootstrap
Kernel bootstrapKai Sasaki
 

More from Kai Sasaki (20)

Graviton 2で実現する
コスト効率のよいCDP基盤
Graviton 2で実現する
コスト効率のよいCDP基盤Graviton 2で実現する
コスト効率のよいCDP基盤
Graviton 2で実現する
コスト効率のよいCDP基盤
 
Continuous Optimization for Distributed BigData Analysis
Continuous Optimization for Distributed BigData AnalysisContinuous Optimization for Distributed BigData Analysis
Continuous Optimization for Distributed BigData Analysis
 
Recent Changes and Challenges for Future Presto
Recent Changes and Challenges for Future PrestoRecent Changes and Challenges for Future Presto
Recent Changes and Challenges for Future Presto
 
Real World Storage in Treasure Data
Real World Storage in Treasure DataReal World Storage in Treasure Data
Real World Storage in Treasure Data
 
20180522 infra autoscaling_system
20180522 infra autoscaling_system20180522 infra autoscaling_system
20180522 infra autoscaling_system
 
User Defined Partitioning on PlazmaDB
User Defined Partitioning on PlazmaDBUser Defined Partitioning on PlazmaDB
User Defined Partitioning on PlazmaDB
 
Deep dive into deeplearn.js
Deep dive into deeplearn.jsDeep dive into deeplearn.js
Deep dive into deeplearn.js
 
Optimizing Presto Connector on Cloud Storage
Optimizing Presto Connector on Cloud StorageOptimizing Presto Connector on Cloud Storage
Optimizing Presto Connector on Cloud Storage
 
Presto updates to 0.178
Presto updates to 0.178Presto updates to 0.178
Presto updates to 0.178
 
How to ensure Presto scalability 
in multi use case
How to ensure Presto scalability 
in multi use case How to ensure Presto scalability 
in multi use case
How to ensure Presto scalability 
in multi use case
 
Managing multi tenant resource toward Hive 2.0
Managing multi tenant resource toward Hive 2.0Managing multi tenant resource toward Hive 2.0
Managing multi tenant resource toward Hive 2.0
 
Embulk makes Japan visible
Embulk makes Japan visibleEmbulk makes Japan visible
Embulk makes Japan visible
 
Maintainable cloud architecture_of_hadoop
Maintainable cloud architecture_of_hadoopMaintainable cloud architecture_of_hadoop
Maintainable cloud architecture_of_hadoop
 
図でわかるHDFS Erasure Coding
図でわかるHDFS Erasure Coding図でわかるHDFS Erasure Coding
図でわかるHDFS Erasure Coding
 
Spark MLlib code reading ~optimization~
Spark MLlib code reading ~optimization~Spark MLlib code reading ~optimization~
Spark MLlib code reading ~optimization~
 
How I tried MADE
How I tried MADEHow I tried MADE
How I tried MADE
 
Reading kernel org
Reading kernel orgReading kernel org
Reading kernel org
 
Reading drill
Reading drillReading drill
Reading drill
 
Kernel ext4
Kernel ext4Kernel ext4
Kernel ext4
 
Kernel bootstrap
Kernel bootstrapKernel bootstrap
Kernel bootstrap
 

Recently uploaded

Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfEnhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfRTS corp
 
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full RecordingOpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full RecordingShane Coughlan
 
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdfExploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdfkalichargn70th171
 
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...OnePlan Solutions
 
Patterns for automating API delivery. API conference
Patterns for automating API delivery. API conferencePatterns for automating API delivery. API conference
Patterns for automating API delivery. API conferencessuser9e7c64
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsSafe Software
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationBradBedford3
 
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics
 
Post Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on IdentityPost Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on Identityteam-WIBU
 
Osi security architecture in network.pptx
Osi security architecture in network.pptxOsi security architecture in network.pptx
Osi security architecture in network.pptxVinzoCenzo
 
2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shardsChristopher Curtin
 
VictoriaMetrics Anomaly Detection Updates: Q1 2024
VictoriaMetrics Anomaly Detection Updates: Q1 2024VictoriaMetrics Anomaly Detection Updates: Q1 2024
VictoriaMetrics Anomaly Detection Updates: Q1 2024VictoriaMetrics
 
Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringHironori Washizaki
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...confluent
 
Salesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZSalesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZABSYZ Inc
 
Not a Kubernetes fan? The state of PaaS in 2024
Not a Kubernetes fan? The state of PaaS in 2024Not a Kubernetes fan? The state of PaaS in 2024
Not a Kubernetes fan? The state of PaaS in 2024Anthony Dahanne
 
Amazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilitiesAmazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilitiesKrzysztofKkol1
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprisepreethippts
 
SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?Alexandre Beguel
 
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...OnePlan Solutions
 

Recently uploaded (20)

Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfEnhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
 
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full RecordingOpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
 
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdfExploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
 
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
 
Patterns for automating API delivery. API conference
Patterns for automating API delivery. API conferencePatterns for automating API delivery. API conference
Patterns for automating API delivery. API conference
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data Streams
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion Application
 
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
 
Post Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on IdentityPost Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on Identity
 
Osi security architecture in network.pptx
Osi security architecture in network.pptxOsi security architecture in network.pptx
Osi security architecture in network.pptx
 
2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards
 
VictoriaMetrics Anomaly Detection Updates: Q1 2024
VictoriaMetrics Anomaly Detection Updates: Q1 2024VictoriaMetrics Anomaly Detection Updates: Q1 2024
VictoriaMetrics Anomaly Detection Updates: Q1 2024
 
Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their Engineering
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Salesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZSalesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZ
 
Not a Kubernetes fan? The state of PaaS in 2024
Not a Kubernetes fan? The state of PaaS in 2024Not a Kubernetes fan? The state of PaaS in 2024
Not a Kubernetes fan? The state of PaaS in 2024
 
Amazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilitiesAmazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilities
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprise
 
SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?
 
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
 

Infrastructure for auto scaling distributed system

  • 1. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. Big Data Conference in Vilnius 2018 Kai Sasaki Infrastructure for Auto Scaling Distributed System
  • 2. Bio Kai Sasaki (佐々木 海) • Senior Software Engineer at Arm Treasure Data since 2015 • Hadoop, Presto, Spark, TensorFlow.js, Apache Hivemall • Books – Available as paperback and ebook. • Twitter – @Lewuathe
  • 3. Agenda • Who is Treasure Data? • What is distributed data analysis? • What challenges do we face? – Operational Cost – Stability and Scalability • Our Approach – AWS CodeDeploy & Auto Scaling Group – Query Simulation – Graceful/Force Shutdown
  • 4. Who is Treasure Data?
  • 5. Treasure Data Founded in December 2011 in Silicon Valley • Mountain View, CA • DMP, eCDP, IoT, Cloud • We joined Arm in October 2018
  • 6. Treasure Data We provide an end-to-end integrated data analysis platform. • Data Ingestion – Mobile Devices, Automotive, IoT • Enterprise Customer Data Platform • Service Integration – BI tools (e.g. Tableau) – Marketing tools
  • 7. Treasure Data Open Source Lover • Fluentd • Embulk • Digdag • Apache Hivemall
  • 8. Enterprise Data Analysis • Scalable processing • Reliable platform • Secure data protection
  • 9. Arm Pelion Platform Treasure Data is part of the Arm Pelion IoT Platform • Flexibility in connectivity management • Efficient data processing
  • 10. Distributed Data Analysis
  • 11. Distributed Data Analysis A service component that enables us to process huge datasets. Scalability: • Easy to scale horizontally • Flexible to business requirements – Interface (e.g. SQL) – Data format. Throughput: • Impossible to achieve at this scale on a single-node machine • Business requirements for batch processing (e.g. daily batches). Data Consistency: • Write-side operations are possible – INSERT, DELETE, UPDATE • Correct measurement is the key to data analysis
  • 12. Distributed Processing Engines Many open-source projects are available for distributed processing • Hadoop • Presto • Spark • Kafka
  • 13. Typical Architecture Master-Worker Model https://www.tutorialspoint.com/apache_presto/apache_presto_architecture.htm
  • 14. Distributed Plan select t1.class, t2.features, count(1) from iris t1 join iris t2 on t1.class = t2.class group by 1, 2;
  • 15. Challenges
  • 16. Challenges for Distributed Data Analysis Maintaining a distributed data analysis platform in the real world is not easy. • Operation – Deployment – Log Investigation – Monitoring • Money – Large-Scale Clusters – Network Cost • Stability – Capacity Sufficiency
  • 17. Challenges for Distributed Data Analysis Manual launch/termination? Is the capacity estimate correct? Which version is deployed? Which metrics do we need to monitor? How much does it cost?
  • 18. Challenges for Distributed Data Analysis Manual launch/termination? Is the capacity estimate correct? Which version is deployed? Which metrics do we need to monitor? How much does it cost? MANUALLY
  • 19. Our Approach
  • 20. Our Approach Practical solutions that take full advantage of public cloud services • AWS CodeDeploy – Integration with Auto Scaling Group • EC2 Auto Scaling Group – Load Test by Query Simulation – Metric-Based Capacity Estimation – Graceful/Force Instance Termination
  • 21. CodeDeploy A deployment service in AWS • Easy to integrate with Auto Scaling Groups • Available everywhere – Supports on-premises instances • Scalable for distributed system use cases • https://docs.aws.amazon.com/codedeploy/index.html
  • 22. Auto Scaling System The system should scale automatically without any manual operation • Load Test by Query Simulation • Metric-Based Capacity Estimation • Graceful Termination & Force Termination
  • 23. Query Simulation Load tests should be based on real-world workloads. • Get the query list from our customers' past query history • Cluster queries by query signature • Construct the test data set and query list based on the resulting list • This makes it easy to run load tests based on the production workload
  • 24. Query Signature A query signature represents a query in a shortened format.
  • 25. Query Simulation Conductor (c5.9xlarge) 1. Get the raw query list 2. Construct test data and the query list
  • 26. Metric-Based Capacity Estimation Designed to reach a target metric value by adjusting capacity • Add/remove instances in proportion to the current metric value relative to its target • e.g. Target average CPU usage = 40%
  • 27. Metric-Based Capacity Estimation Designed to reach a target metric value by adjusting capacity • 40% is the threshold that balances cost and performance
  • 28. Graceful Termination Terminating instances gracefully • Avoid degrading the user experience • Lifecycle hook in the Auto Scaling group • Cron job to check running tasks – Number of tasks in the worker – Send completion to the lifecycle hook https://docs.aws.amazon.com/autoscaling/ec2/userguide/AutoScalingGroupLifecycle.html
  • 29. Graceful Termination Terminating instances gracefully 1. The instance is moved to the Terminating:Wait state 2. A cron job makes the state transition to Terminating:Proceed 3. The instance is gracefully terminated (diagram labels: send complete lifecycle action; ASG terminates the instance)
  • 30. Force Termination Long-running tasks can block graceful termination • Set a timeout • Simulate how long graceful termination takes (chart axes: Date, Time)
  • 31. Instance Termination Balance between customer experience and cost optimization. Graceful Termination: Keeping queries running as long as possible meets customer expectations. • Non-fault-tolerant systems such as Presto • Distributed analysis workloads tend to run too long to be retried Force Termination: Cost optimization is one of the primary goals of auto scaling • Scaling out/in within about 10 minutes keeps capacity adjustment agile. • Force termination that affects only queries running longer than 10 minutes is acceptable
  • 32. Recap • Who is Treasure Data? • What is distributed data analysis? • What challenges do we face? – Operational Cost – Stability and Scalability • Our Approach – AWS CodeDeploy & Auto Scaling Group – Query Simulation – Graceful/Force Shutdown
  • 33. Thank You! Danke! Merci! 谢谢! Gracias! Kiitos!
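Slide 14 shows a distributed plan for a self-join on the `iris` table. The Python sketch below illustrates the general idea of such a plan on a master-worker engine: both inputs are hash-partitioned on `class`, each worker joins and partially aggregates its own partition, and the coordinator merges the partial counts. The row layout (dicts with `class` and `features` keys) and all function names are hypothetical, and this is not how Presto implements the plan; it only shows why partitioning on the join key lets each worker run independently.

```python
from collections import Counter, defaultdict

def partition(rows, key, num_workers):
    """Hash-partition rows so that rows with equal keys land on the same worker."""
    parts = [[] for _ in range(num_workers)]
    for row in rows:
        parts[hash(row[key]) % num_workers].append(row)
    return parts

def worker_stage(left, right):
    """Runs on one worker: hash join on 'class', then a partial count per group."""
    right_by_class = defaultdict(list)
    for rrow in right:
        right_by_class[rrow["class"]].append(rrow)
    partial = Counter()
    for lrow in left:
        for rrow in right_by_class.get(lrow["class"], []):
            partial[(lrow["class"], rrow["features"])] += 1
    return partial

def run_distributed_plan(t1, t2, num_workers=3):
    """Coordinator: partition both inputs, run each worker stage, merge partial counts."""
    left_parts = partition(t1, "class", num_workers)
    right_parts = partition(t2, "class", num_workers)
    result = Counter()
    for left, right in zip(left_parts, right_parts):
        result.update(worker_stage(left, right))  # Counter.update adds counts
    return dict(result)

# Hypothetical sample rows standing in for the iris table.
iris = [
    {"class": "setosa", "features": "f1"},
    {"class": "setosa", "features": "f2"},
    {"class": "virginica", "features": "f3"},
]
print(run_distributed_plan(iris, iris))
```

Because the grouping key contains the join key `class`, every group is produced on exactly one worker, so the final merge is a simple union of partial results.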
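Slides 23 and 24 describe building a simulated workload by clustering past queries by signature. The slides do not show the signature format, so the sketch below uses a hypothetical scheme: keep only structural SQL keywords (ignoring identifiers, numbers, and string literals) and join them in order. Queries with the same signature form one cluster, and one representative per cluster is kept together with its frequency so a load test can replay roughly the production mix. `STRUCTURAL_KEYWORDS`, `query_signature`, and `build_simulation_list` are illustrative names, not Treasure Data's implementation.

```python
import re
from collections import defaultdict

# Hypothetical vocabulary of structural keywords that make up a signature.
STRUCTURAL_KEYWORDS = frozenset({
    "select", "from", "join", "where", "group by", "order by",
    "having", "limit", "union", "insert", "update", "delete",
})

# Matches a quoted string literal, a two-word clause, or a single word/number.
_TOKEN = re.compile(r"'(?:[^']|'')*'|\b(?:group|order)\s+by\b|\w+", re.IGNORECASE)

def query_signature(sql):
    """Reduce a query to the ordered list of its structural keywords."""
    parts = []
    for match in _TOKEN.finditer(sql):
        token = match.group(0)
        if token.startswith("'"):
            continue  # literals never affect the signature
        token = re.sub(r"\s+", " ", token.lower())
        if token in STRUCTURAL_KEYWORDS:
            parts.append(token.upper())
    return "-".join(parts)

def build_simulation_list(queries):
    """Cluster queries by signature; keep one representative and its frequency."""
    clusters = defaultdict(list)
    for q in queries:
        clusters[query_signature(q)].append(q)
    return [(members[0], len(members)) for members in clusters.values()]
```

For the query on slide 14, this scheme yields the signature `SELECT-FROM-JOIN-GROUP BY`. A production scheme would likely also capture table shapes, functions, and nesting.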
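Slides 26 and 27 describe adjusting capacity to hold a target metric such as 40% average CPU. A simplified proportional model, assuming load is spread evenly so that average utilization scales with 1 / instance count, is desired = ceil(current_instances × current_metric / target_metric), clamped to the group's size limits. The sketch below implements that model. It approximates the idea behind EC2 target tracking scaling but is not AWS's exact algorithm, and the `min_size`/`max_size` defaults are hypothetical.

```python
import math

def desired_capacity(current_instances, current_metric, target_metric,
                     min_size=1, max_size=100):
    """Proportional estimate of the instance count that brings the metric to its target.

    Assumes average utilization scales with 1 / instance count.
    Rounds up to err on the side of performance, then clamps to the group limits.
    """
    if current_instances <= 0 or target_metric <= 0:
        raise ValueError("current_instances and target_metric must be positive")
    desired = math.ceil(current_instances * current_metric / target_metric)
    return max(min_size, min(max_size, desired))
```

For example, 10 instances averaging 60% CPU against a 40% target give `desired_capacity(10, 60, 40) == 15`, while the same fleet at 20% CPU scales in to 5.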
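The cron job on slides 28 and 29 can be sketched with boto3, the AWS SDK for Python. `describe_auto_scaling_instances` and `complete_lifecycle_action` are real methods of the boto3 Auto Scaling client. Everything else is a hypothetical assumption, including the function name and the `count_running_tasks` callable that would ask the local worker how many tasks it is still running. The client is passed in so the logic can be exercised without AWS credentials.

```python
def drain_check(autoscaling, instance_id, asg_name, hook_name, count_running_tasks):
    """One cron run on a worker. Returns True once the lifecycle action is completed.

    `autoscaling` is a boto3 Auto Scaling client; `count_running_tasks` is a
    hypothetical callable returning how many tasks the local worker is running.
    """
    resp = autoscaling.describe_auto_scaling_instances(InstanceIds=[instance_id])
    instances = resp.get("AutoScalingInstances", [])
    if not instances or instances[0]["LifecycleState"] != "Terminating:Wait":
        return False  # not selected for scale-in; nothing to do
    if count_running_tasks() > 0:
        return False  # still serving queries; check again on the next cron run
    # No tasks left: let the ASG move the instance to Terminating:Proceed.
    autoscaling.complete_lifecycle_action(
        LifecycleHookName=hook_name,
        AutoScalingGroupName=asg_name,
        LifecycleActionResult="CONTINUE",
        InstanceId=instance_id,
    )
    return True
```

In production the client would come from `boto3.client("autoscaling")`, and the job would run periodically on each worker.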
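Slides 30 and 31 add a timeout so long-running tasks cannot block scale-in, and they mention simulating how long graceful termination takes. The sketch below assumes a fixed 10-minute timeout (the figure on slide 31). It relies on one observation: a query can only be force-terminated if it is still running when the timeout expires after draining begins, so only queries longer than the timeout are at risk. The function names are hypothetical. The slides do not say whether the timeout is enforced by the cron job or by the lifecycle hook's heartbeat timeout.

```python
TIMEOUT_SECONDS = 10 * 60  # 10-minute limit, taken from slide 31

def termination_action(running_tasks, waited_seconds, timeout_seconds=TIMEOUT_SECONDS):
    """Decide what to do with a draining worker on each periodic check."""
    if running_tasks == 0:
        return "graceful"  # nothing left to interrupt
    if waited_seconds >= timeout_seconds:
        return "force"     # long-running tasks must not block scale-in forever
    return "wait"

def at_risk_ratio(query_durations_seconds, timeout_seconds=TIMEOUT_SECONDS):
    """Upper bound on the share of queries a force termination could interrupt.

    A query can only be interrupted if it is still running `timeout_seconds`
    after draining starts, so only queries longer than the timeout are at risk.
    """
    if not query_durations_seconds:
        return 0.0
    at_risk = sum(1 for d in query_durations_seconds if d > timeout_seconds)
    return at_risk / len(query_durations_seconds)
```

Running `at_risk_ratio` over historical query durations gives a quick check on whether a candidate timeout keeps forced terminations rare enough to be acceptable.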