Efficient cluster resource management using Mesos and Cook
Li Jin
About Me
• Software Engineer @ Two Sigma
Outline
• Introduction: Mesos and Cook
What is Mesos
• Open Source Apache Project
• 2010: AMPLab, University of California Berkeley
• 2012: Twitter, Airbnb
• 2015: Twitter, Airbnb, Apple, Bloomberg, Cisco,
eBay, Yelp…
What is Mesos
• Tool to build distributed applications
– Hadoop, Spark…
– Cassandra, Kafka, Riak…
What is Mesos
• Distributed applications commonality:
– Manages resources (cpu, memory, disk…) on
worker hosts
– Manages life cycle of remote processes
– Manages communication between masters
and workers
Mesos Primitives
Mesos @ Two Sigma
Cook
Mesos
What is Cook
• Two Sigma’s Simulation Platform
• Manages tens of thousands of simulations
• Shares compute resources among users
What is Simulation
• Idempotent, distributed, resource intensive
computations
• Simulation set
• A handful ~ thousands of simulations
• Simulation
• Multiple Mesos tasks
What is Simulation
• Simulation task footprint
• 10 ~ 100 GB RAM
• 1 ~ 20 CPUs
• 15 minutes ~ a few hours
• Simulation use cases
• Interactive
• Batch processing
Problem
• High resource demand
• 5 x capacity during peak hours
• Optimize
• Utilization
• Process workloads as fast as possible
• Fairness
• Allocate resources fairly to users
What is Fairness
• FIFO
• Time sharing
• Roll a die
• …
What is Fairness
• A story…
What is Fairness
Resource Allocation
What is Fairness, Really
• Fairness is not about ‘fair’
• Fairness is about user experience
• Users should get their share of the cluster whenever
they need it
Outline
• Introduction: Mesos and Cook
• Problem: Utilization and Fairness
• Fairness: How do we do it
Static Quota
• Quota = Max percentage of the cluster allowed for
single user
• Static
• 100 % / # Max concurrent users
• Pros:
• Fairness
• Cons:
• Poor Utilization
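The static quota rule above can be sketched in a few lines. This is only an illustration: the function names and the idea of measuring the cluster in CPUs are assumptions, not Cook's actual code.

```python
from fractions import Fraction


def static_quota(max_concurrent_users: int) -> Fraction:
    """Share of the cluster one user may hold: 100% / max concurrent users."""
    if max_concurrent_users <= 0:
        raise ValueError("max_concurrent_users must be positive")
    return Fraction(1, max_concurrent_users)


def allowed_cpus(total_cpus: int, max_concurrent_users: int) -> Fraction:
    """CPUs a single user may hold under the static quota (hypothetical helper)."""
    return total_cpus * static_quota(max_concurrent_users)
```

With 10 expected concurrent users on a 1000-CPU cluster, `allowed_cpus(1000, 10)` caps every user at 100 CPUs even when nobody else is running anything, which is where the poor utilization comes from.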
Dynamic Quota
• Dynamic
• Quota * Utilization Adjustment
• Pros:
• Higher Utilization
• Cons:
• Poor Fairness
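A minimal sketch of "Quota * Utilization Adjustment" follows. The slides do not say how the adjustment is computed, so `idle_headroom_adjustment` is a purely hypothetical choice (it grows as the cluster empties); only the multiplication itself comes from the slide.

```python
def dynamic_quota(static_quota: float, adjustment: float) -> float:
    """Quota * utilization adjustment, capped at the whole cluster (1.0)."""
    if static_quota < 0 or adjustment < 0:
        raise ValueError("static_quota and adjustment must be non-negative")
    return min(1.0, static_quota * adjustment)


def idle_headroom_adjustment(utilization: float, floor: float = 0.1) -> float:
    """HYPOTHETICAL adjustment: larger when the cluster is emptier.

    `utilization` is the fraction of the cluster in use (0.0 to 1.0);
    `floor` keeps the factor bounded when the cluster is nearly idle.
    """
    return 1.0 / max(utilization, floor)
```

Under this assumed adjustment, a user with a 10% static quota on a cluster that is 25% utilized could grow to about 40% of the cluster. Tasks started in that window keep their resources until they finish, which is one way the fairness cost listed above can arise.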
Dynamic Quota
[Figure: moving from unfair to fair resource allocation takes hours]
Can we do better?
[Chart: Fairness and Utilization compared for Static Quota, Dynamic Quota, and "?"]
Preemption
• Kill a Simulation task and reschedule later
• Reclaim resource faster!
[Figure: moving from unfair to fair resource allocation takes minutes]
Outline
• Introduction: Mesos and Cook
• Problem: Utilization and Fairness
• Fairness: How do we do it
• Preemption: How do we do it
Preemption: Intuition
[Figure sequence: tasks in the Running and Waiting queues]
Problem
• Not all tasks are equal
• We just preempted some important tasks!
Bad User
Experience
Score Function
• Score Function: Reflect task’s value
• Fairness
• Importance
• Preemption principle:
• Preempt a low-score task to make room for a high-score task
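The preemption principle can be sketched as a single decision step. This is an illustrative simplification, not Cook's implementation: the `Task` type and `pick_preemption` are hypothetical names, and every task is assumed to need the same amount of resources, so one victim frees room for exactly one waiting task.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Task:
    name: str
    score: float  # higher score = more valuable to keep or start


def pick_preemption(
    running: List[Task], waiting: List[Task]
) -> Optional[Tuple[Task, Task]]:
    """Return (victim, beneficiary) if preempting is worthwhile, else None.

    The lowest-score running task is preempted only when the
    highest-score waiting task has a strictly higher score.
    """
    if not running or not waiting:
        return None
    victim = min(running, key=lambda t: t.score)
    beneficiary = max(waiting, key=lambda t: t.score)
    if beneficiary.score > victim.score:
        return victim, beneficiary
    return None
```

Calling this repeatedly until it returns `None` leaves no waiting task that outscores a running one, under the equal-size assumption stated above.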
Preemption: Intuition
[Figure sequence: tasks in the Running and Waiting queues, each marked with a varying number of currency symbols]
Outline
• Introduction: Mesos and Cook
• Problem: Utilization and Fairness
• Fairness: How do we do it
• Preemption: How do we do it
• Intuition
• Formalization
Cumulative Resource Share (CRS)
• Assume there is a total order of tasks for
each user, where > means ‘more important
than’.
– The CRS of task t is the sum of the resources of all tasks of the same user
that are greater than or equal to t, divided by the total
cluster resource.
• $\mathrm{CRS}(t) = \frac{1}{R_{\mathrm{Total}}} \sum_{t' \ge t} R_{t'}$
Cumulative Resource Share (CRS)
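• $R_a = R_b = R_c = 1\ \text{cpu}$, $R_{\mathrm{Total}} = 6\ \text{cpus}$
• $a > b > c$
• $\mathrm{CRS}(a) = \frac{R_a}{R_{\mathrm{Total}}} = \frac{1}{6}$
• $\mathrm{CRS}(b) = \frac{R_a + R_b}{R_{\mathrm{Total}}} = \frac{2}{6}$
• $\mathrm{CRS}(c) = \frac{R_a + R_b + R_c}{R_{\mathrm{Total}}} = \frac{3}{6}$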
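The worked example can be reproduced with a short running-sum sketch. The function name and the convention of passing one user's tasks ordered from most to least important are assumptions made for illustration.

```python
from fractions import Fraction
from typing import Dict, List


def cumulative_resource_share(
    ordered_tasks: List[str], resources: Dict[str, int], total: int
) -> Dict[str, Fraction]:
    """CRS for one user's tasks, listed from most to least important.

    CRS(t) = (sum of R_t' over all t' >= t) / R_Total, so it is a
    running sum over the importance-ordered task list.
    """
    crs: Dict[str, Fraction] = {}
    running_sum = 0
    for task in ordered_tasks:
        running_sum += resources[task]
        crs[task] = Fraction(running_sum, total)
    return crs


# The slide's example: a > b > c, 1 cpu each, 6 cpus in the cluster.
shares = cumulative_resource_share(["a", "b", "c"], {"a": 1, "b": 1, "c": 1}, 6)
```

Here `shares` maps a, b, c to 1/6, 2/6, and 3/6 (stored in lowest terms as 1/6, 1/3, 1/2), so a user's less important tasks always carry a larger CRS.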
Preemption: Formalization
[Figure sequence: tasks in the Running and Waiting queues, each user's tasks labeled with CRS values 1/6, 2/6, 3/6]
Multiple Resources?
• Dominant Resource Fairness: Fair Allocation of
Multiple Resource Types
• Published by UC Berkeley in 2011
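Dominant Cumulative Resource Share
• $\mathrm{CRS}(t) = \frac{1}{R_{\mathrm{Total}}} \sum_{t' \ge t} R_{t'}$
• $\mathrm{DCRS}(t) = \max_{R} \frac{1}{R_{\mathrm{Total}}} \sum_{t' \ge t} R_{t'}$
• $\mathrm{Score}(t) = -\mathrm{DCRS}(t)$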
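A minimal sketch of DCRS and the score, extending the single-resource running sum to several resource types. The resource names (`cpus`, `mem_gb`), the example numbers, and the function names are illustrative assumptions, not values from the talk.

```python
from typing import Dict, List

Resources = Dict[str, float]


def dcrs(ordered_tasks: List[Resources], total: Resources) -> List[float]:
    """DCRS for one user's tasks, listed from most to least important.

    For each task, take the cumulative share of every resource type
    and keep the largest one (the dominant share).
    """
    cumulative = {name: 0.0 for name in total}
    result: List[float] = []
    for demand in ordered_tasks:
        for name in total:
            cumulative[name] += demand.get(name, 0.0)
        result.append(max(cumulative[name] / total[name] for name in total))
    return result


def score(dcrs_value: float) -> float:
    """Score(t) = -DCRS(t): a larger dominant share means a lower score."""
    return -dcrs_value


# Hypothetical cluster and one user's two tasks, most important first.
cluster = {"cpus": 10.0, "mem_gb": 100.0}
tasks = [{"cpus": 1.0, "mem_gb": 40.0}, {"cpus": 4.0, "mem_gb": 10.0}]
values = dcrs(tasks, cluster)
```

In this example memory dominates for the first task (40/100 against 1/10 for CPU), and both resources reach one half of the cluster after the second, so the second task gets the lower score and would be preempted first.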
Outline
• Introduction: Mesos and Cook
• Problem: Utilization and Fairness
• Fairness: How do we do it
• Preemption: How do we do it
• Intuition
• Formalization
• Put things together: Mesos and Cook
Cook: Architecture
Are we doing better?
[Chart: Fairness and Utilization compared for Static Quota, Dynamic Quota, and Preemption]
Outline
• Introduction: Mesos and Cook
• Problem: Utilization and Fairness
• Fairness: How do we do it
• Preemption: How do we do it
• Intuition
• Formalization
• Put things together: Mesos and Cook
• Benchmark
Benchmark
• Simulated
• 7 day production workload trace
Benchmark
[Figure: Simulation Set Speed Up Distribution; y-axis: SpeedUp (0 to 12); series: Dynamic Quota, Preemption]
Benchmark
[Figure: Effective Utilization; y-axis: Utilization (0 to 1); x-axis: Day 1 to Day 7; series: Dynamic Quota, Preemption]
It works!
Open Source
• https://github.com/apache/mesos
• https://github.com/twosigma/cook
• @icexelloss
Questions?

100500 способов кэширования в Oracle Database или как достичь максимальной ск...Ontico
 
Apache Ignite Persistence: зачем Persistence для In-Memory, и как он работает...
Apache Ignite Persistence: зачем Persistence для In-Memory, и как он работает...Apache Ignite Persistence: зачем Persistence для In-Memory, и как он работает...
Apache Ignite Persistence: зачем Persistence для In-Memory, и как он работает...Ontico
 
Механизмы мониторинга баз данных: взгляд изнутри / Дмитрий Еманов (Firebird P...
Механизмы мониторинга баз данных: взгляд изнутри / Дмитрий Еманов (Firebird P...Механизмы мониторинга баз данных: взгляд изнутри / Дмитрий Еманов (Firebird P...
Механизмы мониторинга баз данных: взгляд изнутри / Дмитрий Еманов (Firebird P...Ontico
 

Efficient cluster resource management using Mesos and Cook

Editor's Notes

  1. Hello Everyone. It’s an honor to be here today. My name is Li Jin. I am from New York. Today I am going to talk about …
  2. First, a little bit of background about me. I am a software engineer at Two Sigma. I have been working on Mesos and Cook for a little over a year now. Two Sigma is a quantitative hedge fund based in New York City. It is a technology company that applies computer science, engineering, and math to finance and investment.
  3. OK, let's jump right into it. Let's talk about what Mesos and Cook are.
  4. First of all, Mesos is an open source Apache project. It was created at UC Berkeley in 2010. In 2012, Mesos was used by Twitter and Airbnb in their production environments. And now, Mesos is powering many more companies, such as Apple, Bloomberg, and Cisco.
  5. Mesos is a powerful tool for building distributed applications. By distributed application, I mean an application that launches and manages remote processes on a set of worker hosts. For instance, it can be a distributed computing framework like Hadoop or Spark, or a distributed storage system like Cassandra…
  6. To explain why Mesos is a great tool for building distributed applications, let us think about what those applications have in common. Distributed applications need to account for resources on worker hosts in order not to overload them. They also need to implement resource isolation to make sure different processes don't affect each other. These two things become even harder when multiple applications run on the same set of worker hosts, because each application needs to be aware of how the worker hosts are being used by the others. Distributed applications need to monitor the life cycle of remote processes. They need to know when a remote process starts, succeeds, or fails. This might sound easy, but think about all the failure cases: a host can go down or, worse, be overloaded; a network partition can happen; the application can lose track of remote processes; and so on. How about communication? All distributed applications need some communication mechanism: HTTP, messaging, RPC, you name it. And worse, they all need to deal with message loss and resending. Finally, applications need to optimize execution. This includes prioritizing workloads, handling workload dependencies, and so on. Hadoop, for instance, does straggler detection.
  7. Now let's take a look at how Mesos helps. Mesos provides an abstraction layer over the physical machines and presents those machines as, essentially, "resources" to the applications. Applications can then use those resources for their workloads. …
  8. Now the applications no longer need to worry about resource management. Whenever there are resources available, Mesos will just send resource offers to the application. Resource isolation is taken care of as well. Mesos will launch remote processes in containers and monitor the resource usage of those containers.
  9. Two Sigma is powered by Mesos. We have multiple data centers that run Mesos, and we run multiple frameworks on top of it. Some of them are open source frameworks like Marathon and Spark, and some are built by us to meet specific use cases. The framework I am going to talk about today is one we developed at Two Sigma called Cook.
  10. So what is Cook? Cook is Two Sigma's simulation platform. At a very high level, Cook manages tens of thousands of simulations. And since the platform is shared by all researchers, Cook is also responsible for sharing compute resources among users.
  11. Simulation is a tool that quantitative researchers at Two Sigma use to backtest their investment strategies. From an abstract point of view, simulations are just idempotent, distributed, resource-intensive computations. One simulation is implemented as multiple Mesos tasks.
  12. So here is what a simulation task looks like. It takes 10 to 100 GB of memory and 1 to 20 CPUs, and it runs from 15 minutes up to a few hours. There are two major use cases. The first is interactive research. This type of workload usually finishes in 30 minutes to an hour, and the user actively waits for the result. The second is batch computation. This type of workload usually consumes more resources, and users don't care too much about latency as long as it finishes overnight or over the weekend.
  13. So in Cook, we face very high resource demand. We can easily receive workloads that are 5x the capacity of the cluster during peak hours, and we are often at full or near-full utilization during business hours. Under such workloads, it's very important for Cook to optimize for two things. The first is utilization, because we want to process workloads as fast as possible. The second is fairness: since Cook is a shared platform, we need to make sure it allocates resources to all users fairly, for some definition of 'fair'. Well, we all know what utilization means, but fairness is a little unclear at the moment.
  14. So what is fairness? Well, fairness has a lot of definitions, and there are a lot of ways to achieve it. Let's see some examples. First come, first served is one way to achieve fairness. Most of the services we use in everyday life are first come, first served: stores, post offices, you name it. Maybe we can do the same. Time sharing is another way to achieve fairness. We can split one day into 1-hour chunks and fairly share the cluster among 24 researchers. Or we can roll a die every day to decide who is going to use the cluster that day. *Explain more why they don't work* *Explain how user experience maps to fairness* So, these approaches are all 'fair', but is that what we want?
  15. Let me use a story to answer that question. Imagine yourself as a researcher at Two Sigma. You have this great idea that you think is going to make a lot of money, and you want to run some simulations to test it. You submit a batch of simulations. Normally they should complete in an hour, so you decide to go get some lunch. You have a great lunch, you are fully energized and ready to go, and you sit down and start to look at the results. However, you find that your simulations are still sitting in the queue. You are quite upset because this is blocking you from doing your job. You need those results. What makes you more upset is that when you open the utilization dashboard, you see this.
  16. You see that you only have a tiny bit of the cluster and other users are using much more. A few words pop into your mind: "This is not fair!" I can only assume this is what the researcher becomes.
  17. So what is fairness, really? Well, I think fairness is not about 'fair'. If we think about the story again, the researcher wouldn't have looked at the dashboard in the first place if he had gotten his results back. So I think fairness is about user experience. Fairness is a way to make sure users can get the resources to do their job. So fairness, to us, means users should get their share of the cluster whenever they need it.
  18. Now that we have a better idea of what fairness is, let's talk about how to achieve it.
  19. Well, the easiest thing we can do is to use quota. Quota is basically the max percentage of the cluster allowed to a single user. A static quota can be the total resources divided by the max number of concurrent users. Quota can guarantee fairness: any user can get his quota at any time. However, an obvious problem with static quota is that it can lead to low utilization. During peak hours, we can still have 80 to 90% utilization, but at night, since the number of users is usually lower, utilization can drop to 30 to 40% while workloads sit in the queue because of quota.
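
The static quota rule described in note 19 can be sketched in a few lines. This is only an illustration: the function names, the CPU-only resource model, and the numbers are assumptions, not Cook's actual code.

```python
def static_quota(total_cpus: float, max_concurrent_users: int) -> float:
    """Static quota as described in the talk: total resources / max concurrent users."""
    return total_cpus / max_concurrent_users


def utilization(total_cpus: float, quota: float, demands: list[float]) -> float:
    """Fraction of the cluster in use when every user is capped at `quota`.

    `demands` holds the CPUs each active user would like to run (hypothetical input).
    """
    used = sum(min(demand, quota) for demand in demands)
    return min(used, total_cpus) / total_cpus


# A 100-CPU cluster sized for 10 concurrent users gives each user 10 CPUs.
quota = static_quota(100, 10)
# At night only 3 users are active; each wants 50 CPUs but is capped at 10,
# so most of the cluster sits idle while their work waits in the queue.
night = utilization(100, quota, [50, 50, 50])
```

With these made-up numbers the night-time utilization comes out at 30%, which is the kind of drop the talk describes.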
  20. To solve the utilization problem, we introduce the notion of dynamic quota. The basic idea is that instead of using a static quota, we adjust the quota based on current utilization. The lower the utilization, the higher the quota can be. This approach brings us much higher utilization. At night, utilization jumps from 30 to 40% up to 60 to 70%. However, dynamic quota brings us a new problem: unfairness.
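
The talk does not give the formula used to adjust the quota, only that lower utilization allows a higher quota. The sketch below assumes a simple linear interpolation between the static quota and the whole cluster; the function name and the interpolation rule are hypothetical.

```python
def dynamic_quota(total_cpus: float, base_quota: float, utilization: float) -> float:
    """Hypothetical dynamic quota: grows linearly as cluster utilization drops.

    At 100% utilization a user gets only `base_quota`; on an empty cluster
    a single user may take everything. The linear rule is an assumption.
    """
    utilization = max(0.0, min(1.0, utilization))  # clamp to [0, 1]
    return base_quota + (1.0 - utilization) * (total_cpus - base_quota)


full = dynamic_quota(100, 10, 1.0)   # busy cluster: quota falls back to the static value
half = dynamic_quota(100, 10, 0.5)   # half-empty cluster: quota is much larger
empty = dynamic_quota(100, 10, 0.0)  # empty cluster: one user may use all of it
```

Any monotonically decreasing function of utilization would fit the description equally well; the point is only that the cap is recomputed as load changes.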
  21. Let's take a look at this. Since some users enter the system when it's relatively empty, they can have a higher quota and run a lot of jobs. As utilization increases, the quota decreases, and we reach the allocation on the left side. The problem is that even though the quota can change quickly based on utilization, the allocation changes much more slowly, because we have no way to reclaim resources other than waiting for simulations to complete, and as I mentioned earlier, that can take hours. These long delays can be very problematic for us because, again, they lead to a bad user experience.
  22. So far we have static quota, which is great for fairness but poor for utilization, and dynamic quota, which is quite the opposite. Can we do something better? Can we find an approach that has both high utilization and high fairness?
  23. Well, not surprisingly, our answer is to use preemption. Preemption here simply means killing a simulation task and rescheduling it later. The most important idea behind preemption is that we can reclaim resources much faster. By using preemption, we need only minutes, instead of hours, to go from the left side to the right side.
  24. So how do we do preemption? More specifically, what are the criteria for choosing which tasks to preempt, and under what conditions? Let's first walk through an example to get some intuition behind preemption.
  25. Let's say we have a cluster of 6 CPUs. Each box here represents a task taking 1 CPU. Here we have two users, Jerry and Kevin, and each of them is using half the cluster.
  26. Well, we know eventually, we want to reach a fair allocation like this.
  27. Well, we know eventually, we want to reach a fair allocation like this.
  28. But we don't know how to get there yet. We don't know which one of the six tasks we should preempt.
  29. Well, we know that both Jerry and Kevin are above their fair share. So intuitively we can preempt either Jerry or Kevin, but we don't know much beyond that. So we consider all of their tasks, marked in orange here, for preemption in order to schedule Dave's task, which is marked in yellow.
  30. And we decide to preempt one of Jerry's tasks.
  31. And we end up like this.
  32. Now we do it again, and this time, since Jerry is no longer above his fair share, we only consider Kevin's tasks for preemption.
  33. And similarly, we decide to preempt one of Kevin's tasks.
  34. And we end up like this. So did we do a good job? Well, it turns out we did not.
  35. The problem is that not all tasks are equal. Different tasks are of different importance to the users, and we've just preempted some important tasks. This, again, leads to a bad user experience.
  36. Now we know we cannot treat all tasks as equal, so we need a score function to reflect a task's value. We use value here to represent the two things we've mentioned so far. The first is fairness: we want to use the score function to achieve fairness easily. The second is importance: we want the score function to also reflect how important a task is. Cook will use the score as the preemption criterion and preempt low-score tasks in favor of high-score tasks.
  37. Let's see how that works. First, we don't quite know how to express relative importance among all tasks. It is hard for us to say one researcher's task is more important than another's. But we do know how to express relative importance among tasks of the same user. The user has an easy way to tell us which of his tasks are more important. Here the importance is shown in currencies, and we have an ordering for each user's tasks. But since they are in different currencies, we still cannot compare them across users.
  38. Now it's important for us to unify the currency. Here we apply our principle of fairness and say that all users' most important tasks are of the same value, and so on down the ordering. By doing this, the dollar amount on a task now reflects both fairness and importance.
  39. Now things become easier. When we choose tasks to preempt for the yellow one, we consider all tasks that have a lower value. The reason we need to consider multiple tasks instead of just the lowest one is that preemption is subject to a bin-packing constraint: the yellow task needs to be able to fit on the host after the preemption. In this example we don't have this problem because all tasks are of equal size, but in reality that is no longer true. *Add arrows*
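
To make the bin-packing point in note 39 concrete, here is a minimal sketch of choosing victims for one pending task. It is not Cook's actual algorithm: the `Task` type, the per-host greedy pass, and the "lowest total preempted value" tie-break are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    user: str
    value: float  # unified "currency": higher means more valuable
    cpus: float


def choose_preemption(hosts, pending):
    """Pick a host and the tasks to preempt so that `pending` fits on it.

    `hosts` maps host name -> (free CPUs, list of running tasks).
    Only tasks with a strictly lower value than `pending` may be preempted.
    Returns (host, victims), or None if no host can make room.
    """
    best = None
    for host, (free, running) in hosts.items():
        candidates = sorted(
            (t for t in running if t.value < pending.value),
            key=lambda t: t.value,
        )
        victims, reclaimed = [], free
        for task in candidates:  # take the least valuable tasks first
            if reclaimed >= pending.cpus:
                break
            victims.append(task)
            reclaimed += task.cpus
        if reclaimed < pending.cpus:
            continue  # pending task would not fit here even after preemption
        cost = sum(t.value for t in victims)
        if best is None or cost < best[0]:
            best = (cost, host, victims)
    return None if best is None else (best[1], best[2])
```

For example, if the globally lowest-value task sits on a host where killing it frees only 1 CPU and the pending task needs 2, the sketch skips that host and preempts a somewhat more valuable task on a host where the pending task actually fits.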
  40. Here, we preempt Jerry's task.
  41. Similarly, we do this again.
  42. This time, there is only one task considered for preemption.
  43. And finally, we reach a fair allocation, and we are running the most important tasks for each user.
  44. And finally, we reach a fair allocation, and we are running the most important tasks for each user.
  45. Now we have developed some intuition through this example, especially about what the score function should look like. Let's take a look at how we formalize it.
  46. Assume there is a total order of jobs for each user, where > means 'has higher value than'. We introduce the notion of cumulative resource share, or CRS. The CRS of a job j is the sum of the resources of all jobs of the same user that are greater than or equal to j, divided by the total resource. Or in mathematical form, this.
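
The formula shown on the slide is not reproduced in these notes. One way to write the definition from note 46 is given below; the symbols are assumed notation, with $J_u$ the set of jobs of the user $u$ who owns job $j$, $r(j')$ the amount of the resource used by job $j'$, and $R$ the total amount of that resource in the cluster.

```latex
\mathrm{CRS}(j) \;=\; \frac{\displaystyle\sum_{j' \in J_u,\; j' \ge j} r(j')}{R}
```

Under this reading, a user's most valuable job has the smallest CRS (only its own resources are counted), and each less valuable job adds its resources on top of everything ranked above it.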
  47. Let's see how that works. First, we don't quite know how to express relative importance among all tasks. It is hard for us to say one researcher's task is more important than another's. But we do know how to express relative importance among tasks of the same user. The user has an easy way to tell us which of his tasks are more important. Here the importance is shown in currencies, and we have an ordering for each user's tasks. But since they are in different currencies, we still cannot compare them across users.
  48. Note that, unlike the currency notion we used before, here a more valuable task has a lower CRS.
  49. Well, we don't quite know how to express relative value among all users' jobs, but we have a fairly good idea of how to express relative value among a single user's jobs.
  50. So far we have only considered a single type of resource, but in reality we have multiple, for instance memory and CPU. Luckily, there is already some interesting research to help us with that. Dominant Resource Fairness is a way to achieve fair allocation of multiple resource types. It's a paper published by UC Berkeley in 2011, and it is implemented in Mesos itself. It introduces the notion of dominant resource share, or DRS, defined as the maximum of a user's shares across all resource types. It's simple yet has a lot of good properties. I won't dig too much into it, and I strongly suggest reading the paper.
  51. Here, we extend the same idea to cumulative resource share. To recap, here is the definition of CRS. Dominant cumulative resource share, or DCRS, is defined as the max CRS among all resources. Finally, we define the score to be the negation of DCRS, because the higher the score, the more valuable the job, and DCRS is the opposite.
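
The definitions in notes 46 to 51 can be put together in a short sketch. The function names and the dictionary-based job representation are assumptions; only the definitions themselves (CRS, DCRS as the max CRS over resources, score as the negation of DCRS) come from the talk.

```python
def cumulative_shares(jobs, totals):
    """CRS per resource for one user's jobs.

    `jobs` is ordered from most to least valuable; each job maps a resource
    name to the amount it uses. `totals` maps each resource to the cluster total.
    """
    running_sum = {resource: 0.0 for resource in totals}
    shares = []
    for job in jobs:
        for resource in totals:
            running_sum[resource] += job.get(resource, 0.0)
        shares.append({r: running_sum[r] / totals[r] for r in totals})
    return shares


def scores(jobs, totals):
    """Score = -DCRS, where DCRS is the largest CRS across all resources."""
    return [-max(crs.values()) for crs in cumulative_shares(jobs, totals)]


totals = {"cpus": 10.0, "mem": 100.0}
jobs = [
    {"cpus": 1.0, "mem": 50.0},  # user's most valuable job
    {"cpus": 4.0, "mem": 10.0},  # next most valuable job
]
job_scores = scores(jobs, totals)
```

In this made-up example memory is the dominant resource for both jobs, so the first job scores -0.5 and the second -0.6; the less valuable job has the lower score and would be preempted first.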
  52. So far we've talked about the problem, fairness, preemption, and the score function. Finally, let's see how these fit together in Cook.
  53. This is the high-level architecture of Cook. On the left side, we have Cook, which consists of three components. The first component, on the left side, is the Ranker. Its function is to take all running and waiting jobs, sort them for each user, compute the score for those jobs, and return a list of jobs sorted by score. The list of jobs is then passed to the other two components. On the top is the Matcher. This component takes resource offers from Mesos and matches them against the list of jobs to see if the offers are big enough to fit those jobs, and if so, it sends them to Mesos. The third component, the Rebalancer, does preemption. Let's zoom in to see what it does.
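
As a rough illustration of how the Ranker and Matcher described in note 53 might hand work to each other, here is a single-resource sketch. The function names, the job fields (`id`, `user`, `priority`, `cpus`), and the first-fit matching are assumptions, not Cook's implementation, and the Rebalancer is left out.

```python
from collections import defaultdict


def rank(jobs, total_cpus):
    """Ranker sketch: sort each user's jobs, score them, return a global ordering.

    `priority` is the user's own ordering (higher = more important to that user).
    The score is the negated cumulative CPU share, as in the CRS definition.
    """
    by_user = defaultdict(list)
    for job in jobs:
        by_user[job["user"]].append(job)
    scored = []
    for user_jobs in by_user.values():
        cumulative = 0.0
        for job in sorted(user_jobs, key=lambda j: -j["priority"]):
            cumulative += job["cpus"]
            scored.append((-cumulative / total_cpus, job))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [job for _, job in scored]


def match(ranked_waiting, offers):
    """Matcher sketch: first-fit ranked waiting jobs into offers (host -> free CPUs)."""
    free = dict(offers)
    launches = []
    for job in ranked_waiting:
        for host, cpus in free.items():
            if cpus >= job["cpus"]:
                free[host] = cpus - job["cpus"]
                launches.append((job["id"], host))
                break
    return launches
```

Because each user's first job starts from a cumulative share of zero, users with little running work rank ahead of users who already hold a large share, which is how the ordering encodes fairness as well as importance.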
  54. We asked whether we can do better. Now is the time to answer that.
  55. So far we've talked about the problem, fairness, preemption, and the score function. Finally, let's see how these fit together in Cook.
  56. Here is the results from the benchmark we ran against We took a trace from our production workload and ran it with