SlideShare a Scribd company logo
1 of 43
MULTI-TENANCY KAFKA
CLUSTER FOR LINE SERVICES
WITH 250 BILLION DAILY
MESSAGES
SPEAKER
● Yuto Kawamura
● Senior Software Engineer
● Lead of the team providing company-
wide Kafka platform
● Active in Apache Kafka Community
● Code contribution
● Speaking at Kafka Summit
Agenda • Multitenancy company-wide
Kafka Platform
APACHE KAFKA
● Middleware for streaming data
● Highly scalable/available
● Data persistency
● Supports Pub-Sub model
KAFKA COMPONENTS
Consumer
ConsumerProducer
Producer
Broker cluster
KAFKA PLATFORM
● Large scale Kafka clusters provided for any systems/services
inside LINE
● Started internally from messaging server development team
● Expanded to company-wide platform
USAGE
● Two kinds of usage
● Distributed task queue for buffering and processing business
logic asynchronously
● e.g, Queueing heavy task from web app server to background
task processor
● "Data Hub" for distributing data to other services
PUB-SUB EXAMPLE
SINGLE CLUSTER SHARED BY MANY SYSTEMS
● Concept of “Data Hub”
● Easy to find and access data
● Operational and management efficiency
● Not making operational cost to be proportional to users
● Concentrate engineering resources for maximizing reliability/
performance
MULTITENANCY
Blockchain
Platform
Data Analysis
System
Timeline
5 million / second
210TB
Daily inflow
4GB / second
50+
systems
250 billion
Daily messages
SCALE
● Cluster can protect itself against abusive workloads
● Accidental workload doesn't propagate to other users
● We can track on which client is sending requests
● Find source of strange requests
● Certain level of isolation among client workloads
● Slow response for one client doesn't appears to another client
MULTITENANCY REQUIREMENTS
● It is more important to manage number of requests over
incoming/outgoing byte rate
● Kafka is amazingly durable for large data
● Good design leveraging system functions
● Page cache for caching data
● sendfile(2) for zero copy transfer data
● Native batching
● Typical danger exists in clients sending many requests
PROTECT CLUSTER AGAINST ABUSING
REQUEST QUOTA
● Use Request Quota
● Limit “Time of broker threads that
can be used by each client group”
● Set default quota
● Prevent single client from
accidentally consuming all broker
resources
ISOLATION AMONG CLIENT WORKLOADS
● When can performance isolation among different clients violated?
● Let’s look at example of actual troubleshooting.
DETECTION
● 50x ~ 100x slower response time
in 99th %ile Produce response
time
● Normal: ~20ms
● Observed: 50ms ~ 200ms
FINDING #1
● Coincidental disk read of a
certain amount
FINDING #2
● Network threads' utilization was
very high
REQUEST HANDLING IN KAFKA BROKER
● Network Threads: Reads/
Writes request/response from/to
client sockets
● Request Handler Threads:
Processes requests and
produces response object
REQUEST HANDLING - READ REQUEST
REQUEST HANDLING - PROCESS
REQUEST HANDLING - WRITE RESPONSE
NETWORK THREAD RUNS EVENT LOOP
● Multiplex and processes client sockets assigned sequentially
● It never blocks awaiting IO completion
WHEN NETWORK THREADS GETS BUSY...
● It means one of following:
● 1. Really busy doing lots of work Many requests/responses to
read/write
● 2. Blocked by some operations (which should not happen in
event loop in general)
RESPONSE HANDLING - NORMAL REQUESTS
● When response is in queue, all
data to be transferred are in
memory
RESPONSE HANDLING - FETCH RESPONSE
● When response is in queue, topic
data segments are not in
userspace memory
● => Copy to client socket directly
inside the kernel using
sendfile(2) system call
IF TARGET DATA NOT EXISTS IN PAGE CACHE
● Target data in page cache:
● => Just a memory copy. Very fast: ~ 100us
● Target data is NOT in page cache:
● => Needs to load data from disk into page cache first. Can be
slow: ~ 50ms (or even slower)
● => If this happens in event loop…?
SUSPECTING BLOCKING IN SENDFILE(2)
● Inspect duration of sendfile system calls issued by broker
process
● How?
SYSTEMTAP
● A kernel layer dynamic tracing tool
and scripting language
● Safe to run in production because
of low overhead
● Alternatively: DTrace, eBPF, etc…
INSPECT SENDFILE(2) DURATION
PROBLEM HYPOTHESIS
SOLUTION
● Make sure that data is ready on memory before the response is
passed to the network thread
● => Event loop never blocks
WARMUP PAGE CACHE
● Move blocking part to request
handler threads (= single queue
and pool of threads)
WARMUP PAGE CACHE
● When Network thread calls
sendfile(2) for transferring log data,
it's always in page cache
WARMUP IMPLEMENTATION
● Easiest way: Do synchronous read(2) on target data
● => Large overhead by copying memory from kernel to
userland
● Why is Kafka using sendfile(2) for transferring topic data?
● => To avoid expensive large memory copy
● How can we achieve it keeping this property?
TRICK - ZERO COPY PAGE LOAD
● Call sendfile(2) for target data with
dest /dev/null
● The /dev/null driver does not
actually copy data to anywhere
WHY IT HAS ALMOST NO OVERHEAD?
● Linux kernel internally uses `splice` to implement sendfile(2)
● `splice` implementation of /dev/null returns without iterating
target data
PATCHING KAFKA
IT WORKED
● No response time degradation
with coincidence of Fetch request
reading disk
KAFKA-7504 Broker performance degradation caused by call of sendfile
reading disk in network thread - x50 ~ x100 response time reduction
KAFKA-4614 Long GC pause harming broker performance which is caused by
mmap objects created for OffsetIndex - x100 ~ response time reduction
KAFKA-6501 ReplicaFetcherThread should close the ReplicaFetcherBlockingSend
earlier on shutdown - Eliminate significant latency during broker restart
WHY NOT CONTRIBUTE IT BACK?
CONCLUSION
● Introduced our engineering for operating the company-wide
Kafka platform
● Quota, SystemTap and patch understanding system deeply
● After fixing some issues, our hosting policy is working well and
efficiently, keeping:
● concept of single "Data Hub" and
● operational cost not to be proportional to the number of users/
usages
● We are contributing to the world through OSS
… AND NEXT
● Kafka platform evolution
● Clients standardization and management
● Higher availability
● SRE team
● Planning to rollout new team for Reliability Engineering
● Share knowledge/tools which are independent from
Middleware
● Come and ask me more!
THANK YOU

More Related Content

What's hot

BGP Unnumbered で遊んでみた
BGP Unnumbered で遊んでみたBGP Unnumbered で遊んでみた
BGP Unnumbered で遊んでみたakira6592
 
コンテナネットワーキング(CNI)最前線
コンテナネットワーキング(CNI)最前線コンテナネットワーキング(CNI)最前線
コンテナネットワーキング(CNI)最前線Motonori Shindo
 
Cisco Modeling Labs (CML)を使ってネットワークを学ぼう!(DevNet編)
Cisco Modeling Labs (CML)を使ってネットワークを学ぼう!(DevNet編)Cisco Modeling Labs (CML)を使ってネットワークを学ぼう!(DevNet編)
Cisco Modeling Labs (CML)を使ってネットワークを学ぼう!(DevNet編)シスコシステムズ合同会社
 
Dockerからcontainerdへの移行
Dockerからcontainerdへの移行Dockerからcontainerdへの移行
Dockerからcontainerdへの移行Kohei Tokunaga
 
今話題のいろいろなコンテナランタイムを比較してみた
今話題のいろいろなコンテナランタイムを比較してみた今話題のいろいろなコンテナランタイムを比較してみた
今話題のいろいろなコンテナランタイムを比較してみたKohei Tokunaga
 
大規模サービスを支えるネットワークインフラの全貌
大規模サービスを支えるネットワークインフラの全貌大規模サービスを支えるネットワークインフラの全貌
大規模サービスを支えるネットワークインフラの全貌LINE Corporation
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkFlink Forward
 
Prometheus at Preferred Networks
Prometheus at Preferred NetworksPrometheus at Preferred Networks
Prometheus at Preferred NetworksPreferred Networks
 
データインターフェースとしてのHadoop ~HDFSとクラウドストレージと私~ (NTTデータ テクノロジーカンファレンス 2019 講演資料、2019...
データインターフェースとしてのHadoop ~HDFSとクラウドストレージと私~ (NTTデータ テクノロジーカンファレンス 2019 講演資料、2019...データインターフェースとしてのHadoop ~HDFSとクラウドストレージと私~ (NTTデータ テクノロジーカンファレンス 2019 講演資料、2019...
データインターフェースとしてのHadoop ~HDFSとクラウドストレージと私~ (NTTデータ テクノロジーカンファレンス 2019 講演資料、2019...NTT DATA Technology & Innovation
 
フロー技術によるネットワーク管理
フロー技術によるネットワーク管理フロー技術によるネットワーク管理
フロー技術によるネットワーク管理Motonori Shindo
 
Dockerからcontainerdへの移行
Dockerからcontainerdへの移行Dockerからcontainerdへの移行
Dockerからcontainerdへの移行Akihiro Suda
 
Apache Avro vs Protocol Buffers
Apache Avro vs Protocol BuffersApache Avro vs Protocol Buffers
Apache Avro vs Protocol BuffersSeiya Mizuno
 
わかる!metadata.managedFields / Kubernetes Meetup Tokyo 48
わかる!metadata.managedFields / Kubernetes Meetup Tokyo 48わかる!metadata.managedFields / Kubernetes Meetup Tokyo 48
わかる!metadata.managedFields / Kubernetes Meetup Tokyo 48Preferred Networks
 
プログラマ目線から見たRDMAのメリットと その応用例について
プログラマ目線から見たRDMAのメリットとその応用例についてプログラマ目線から見たRDMAのメリットとその応用例について
プログラマ目線から見たRDMAのメリットと その応用例についてMasanori Itoh
 
Linux女子部 systemd徹底入門
Linux女子部 systemd徹底入門Linux女子部 systemd徹底入門
Linux女子部 systemd徹底入門Etsuji Nakai
 
CEDEC2021 ダウンロード時間を大幅減!~大量のアセットをさばく高速な実装と運用事例の共有~
CEDEC2021 ダウンロード時間を大幅減!~大量のアセットをさばく高速な実装と運用事例の共有~ CEDEC2021 ダウンロード時間を大幅減!~大量のアセットをさばく高速な実装と運用事例の共有~
CEDEC2021 ダウンロード時間を大幅減!~大量のアセットをさばく高速な実装と運用事例の共有~ SEGADevTech
 

What's hot (20)

Kafka・Storm・ZooKeeperの認証と認可について #kafkajp
Kafka・Storm・ZooKeeperの認証と認可について #kafkajpKafka・Storm・ZooKeeperの認証と認可について #kafkajp
Kafka・Storm・ZooKeeperの認証と認可について #kafkajp
 
BGP Unnumbered で遊んでみた
BGP Unnumbered で遊んでみたBGP Unnumbered で遊んでみた
BGP Unnumbered で遊んでみた
 
KafkaとPulsar
KafkaとPulsarKafkaとPulsar
KafkaとPulsar
 
コンテナネットワーキング(CNI)最前線
コンテナネットワーキング(CNI)最前線コンテナネットワーキング(CNI)最前線
コンテナネットワーキング(CNI)最前線
 
Cisco Modeling Labs (CML)を使ってネットワークを学ぼう!(DevNet編)
Cisco Modeling Labs (CML)を使ってネットワークを学ぼう!(DevNet編)Cisco Modeling Labs (CML)を使ってネットワークを学ぼう!(DevNet編)
Cisco Modeling Labs (CML)を使ってネットワークを学ぼう!(DevNet編)
 
Dockerからcontainerdへの移行
Dockerからcontainerdへの移行Dockerからcontainerdへの移行
Dockerからcontainerdへの移行
 
今話題のいろいろなコンテナランタイムを比較してみた
今話題のいろいろなコンテナランタイムを比較してみた今話題のいろいろなコンテナランタイムを比較してみた
今話題のいろいろなコンテナランタイムを比較してみた
 
大規模サービスを支えるネットワークインフラの全貌
大規模サービスを支えるネットワークインフラの全貌大規模サービスを支えるネットワークインフラの全貌
大規模サービスを支えるネットワークインフラの全貌
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in Flink
 
Prometheus at Preferred Networks
Prometheus at Preferred NetworksPrometheus at Preferred Networks
Prometheus at Preferred Networks
 
データインターフェースとしてのHadoop ~HDFSとクラウドストレージと私~ (NTTデータ テクノロジーカンファレンス 2019 講演資料、2019...
データインターフェースとしてのHadoop ~HDFSとクラウドストレージと私~ (NTTデータ テクノロジーカンファレンス 2019 講演資料、2019...データインターフェースとしてのHadoop ~HDFSとクラウドストレージと私~ (NTTデータ テクノロジーカンファレンス 2019 講演資料、2019...
データインターフェースとしてのHadoop ~HDFSとクラウドストレージと私~ (NTTデータ テクノロジーカンファレンス 2019 講演資料、2019...
 
フロー技術によるネットワーク管理
フロー技術によるネットワーク管理フロー技術によるネットワーク管理
フロー技術によるネットワーク管理
 
Dockerからcontainerdへの移行
Dockerからcontainerdへの移行Dockerからcontainerdへの移行
Dockerからcontainerdへの移行
 
Apache Avro vs Protocol Buffers
Apache Avro vs Protocol BuffersApache Avro vs Protocol Buffers
Apache Avro vs Protocol Buffers
 
わかる!metadata.managedFields / Kubernetes Meetup Tokyo 48
わかる!metadata.managedFields / Kubernetes Meetup Tokyo 48わかる!metadata.managedFields / Kubernetes Meetup Tokyo 48
わかる!metadata.managedFields / Kubernetes Meetup Tokyo 48
 
MapReduce/YARNの仕組みを知る
MapReduce/YARNの仕組みを知るMapReduce/YARNの仕組みを知る
MapReduce/YARNの仕組みを知る
 
プログラマ目線から見たRDMAのメリットと その応用例について
プログラマ目線から見たRDMAのメリットとその応用例についてプログラマ目線から見たRDMAのメリットとその応用例について
プログラマ目線から見たRDMAのメリットと その応用例について
 
Java Clientで入門する Apache Kafka #jjug_ccc #ccc_e2
Java Clientで入門する Apache Kafka #jjug_ccc #ccc_e2Java Clientで入門する Apache Kafka #jjug_ccc #ccc_e2
Java Clientで入門する Apache Kafka #jjug_ccc #ccc_e2
 
Linux女子部 systemd徹底入門
Linux女子部 systemd徹底入門Linux女子部 systemd徹底入門
Linux女子部 systemd徹底入門
 
CEDEC2021 ダウンロード時間を大幅減!~大量のアセットをさばく高速な実装と運用事例の共有~
CEDEC2021 ダウンロード時間を大幅減!~大量のアセットをさばく高速な実装と運用事例の共有~ CEDEC2021 ダウンロード時間を大幅減!~大量のアセットをさばく高速な実装と運用事例の共有~
CEDEC2021 ダウンロード時間を大幅減!~大量のアセットをさばく高速な実装と運用事例の共有~
 

Similar to Multi-tenancy Kafka Cluster for Line Services with 250 Billion Daily Messages

Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2aspyker
 
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...Athens Big Data
 
Lookout on Scaling Security to 100 Million Devices
Lookout on Scaling Security to 100 Million DevicesLookout on Scaling Security to 100 Million Devices
Lookout on Scaling Security to 100 Million DevicesScyllaDB
 
An Introduction to Apache Kafka
An Introduction to Apache KafkaAn Introduction to Apache Kafka
An Introduction to Apache KafkaAmir Sedighi
 
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015Monal Daxini
 
Introduction to apache kafka
Introduction to apache kafkaIntroduction to apache kafka
Introduction to apache kafkaSamuel Kerrien
 
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...DataWorks Summit/Hadoop Summit
 
Netflix Data Pipeline With Kafka
Netflix Data Pipeline With KafkaNetflix Data Pipeline With Kafka
Netflix Data Pipeline With KafkaSteven Wu
 
Storage and performance- Batch processing, Whiptail
Storage and performance- Batch processing, WhiptailStorage and performance- Batch processing, Whiptail
Storage and performance- Batch processing, WhiptailInternet World
 
Slow things down to make them go faster [FOSDEM 2022]
Slow things down to make them go faster [FOSDEM 2022]Slow things down to make them go faster [FOSDEM 2022]
Slow things down to make them go faster [FOSDEM 2022]Jimmy Angelakos
 
6 open capi_meetup_in_japan_final
6 open capi_meetup_in_japan_final6 open capi_meetup_in_japan_final
6 open capi_meetup_in_japan_finalYutaka Kawai
 
Dark launching with Consul at Hootsuite - Bill Monkman
Dark launching with Consul at Hootsuite - Bill MonkmanDark launching with Consul at Hootsuite - Bill Monkman
Dark launching with Consul at Hootsuite - Bill MonkmanAmbassador Labs
 
High performace network of Cloud Native Taiwan User Group
High performace network of Cloud Native Taiwan User GroupHigh performace network of Cloud Native Taiwan User Group
High performace network of Cloud Native Taiwan User GroupHungWei Chiu
 
Pulsar - Distributed pub/sub platform
Pulsar - Distributed pub/sub platformPulsar - Distributed pub/sub platform
Pulsar - Distributed pub/sub platformMatteo Merli
 
Building zero data loss pipelines with apache kafka
Building zero data loss pipelines with apache kafkaBuilding zero data loss pipelines with apache kafka
Building zero data loss pipelines with apache kafkaAvinash Ramineni
 
Multitenancy: Kafka clusters for everyone at LINE
Multitenancy: Kafka clusters for everyone at LINEMultitenancy: Kafka clusters for everyone at LINE
Multitenancy: Kafka clusters for everyone at LINEkawamuray
 
Backing up Wikipedia Databases
Backing up Wikipedia DatabasesBacking up Wikipedia Databases
Backing up Wikipedia DatabasesJaime Crespo
 
Apache Kafka - Martin Podval
Apache Kafka - Martin PodvalApache Kafka - Martin Podval
Apache Kafka - Martin PodvalMartin Podval
 

Similar to Multi-tenancy Kafka Cluster for Line Services with 250 Billion Daily Messages (20)

Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2
 
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
 
Lookout on Scaling Security to 100 Million Devices
Lookout on Scaling Security to 100 Million DevicesLookout on Scaling Security to 100 Million Devices
Lookout on Scaling Security to 100 Million Devices
 
An Introduction to Apache Kafka
An Introduction to Apache KafkaAn Introduction to Apache Kafka
An Introduction to Apache Kafka
 
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
 
Introduction to apache kafka
Introduction to apache kafkaIntroduction to apache kafka
Introduction to apache kafka
 
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
 
Netflix Data Pipeline With Kafka
Netflix Data Pipeline With KafkaNetflix Data Pipeline With Kafka
Netflix Data Pipeline With Kafka
 
Netflix Data Pipeline With Kafka
Netflix Data Pipeline With KafkaNetflix Data Pipeline With Kafka
Netflix Data Pipeline With Kafka
 
Storage and performance- Batch processing, Whiptail
Storage and performance- Batch processing, WhiptailStorage and performance- Batch processing, Whiptail
Storage and performance- Batch processing, Whiptail
 
Slow things down to make them go faster [FOSDEM 2022]
Slow things down to make them go faster [FOSDEM 2022]Slow things down to make them go faster [FOSDEM 2022]
Slow things down to make them go faster [FOSDEM 2022]
 
6 open capi_meetup_in_japan_final
6 open capi_meetup_in_japan_final6 open capi_meetup_in_japan_final
6 open capi_meetup_in_japan_final
 
Dark launching with Consul at Hootsuite - Bill Monkman
Dark launching with Consul at Hootsuite - Bill MonkmanDark launching with Consul at Hootsuite - Bill Monkman
Dark launching with Consul at Hootsuite - Bill Monkman
 
High performace network of Cloud Native Taiwan User Group
High performace network of Cloud Native Taiwan User GroupHigh performace network of Cloud Native Taiwan User Group
High performace network of Cloud Native Taiwan User Group
 
Pulsar - Distributed pub/sub platform
Pulsar - Distributed pub/sub platformPulsar - Distributed pub/sub platform
Pulsar - Distributed pub/sub platform
 
Building zero data loss pipelines with apache kafka
Building zero data loss pipelines with apache kafkaBuilding zero data loss pipelines with apache kafka
Building zero data loss pipelines with apache kafka
 
Multitenancy: Kafka clusters for everyone at LINE
Multitenancy: Kafka clusters for everyone at LINEMultitenancy: Kafka clusters for everyone at LINE
Multitenancy: Kafka clusters for everyone at LINE
 
Backing up Wikipedia Databases
Backing up Wikipedia DatabasesBacking up Wikipedia Databases
Backing up Wikipedia Databases
 
Apache Kafka - Martin Podval
Apache Kafka - Martin PodvalApache Kafka - Martin Podval
Apache Kafka - Martin Podval
 
Galera webinar migration to galera cluster from my sql async replication
Galera webinar migration to galera cluster from my sql async replicationGalera webinar migration to galera cluster from my sql async replication
Galera webinar migration to galera cluster from my sql async replication
 

More from LINE Corporation

JJUG CCC 2018 Fall 懇親会LT
JJUG CCC 2018 Fall 懇親会LTJJUG CCC 2018 Fall 懇親会LT
JJUG CCC 2018 Fall 懇親会LTLINE Corporation
 
Reduce dependency on Rx with Kotlin Coroutines
Reduce dependency on Rx with Kotlin CoroutinesReduce dependency on Rx with Kotlin Coroutines
Reduce dependency on Rx with Kotlin CoroutinesLINE Corporation
 
Kotlin/NativeでAndroidのNativeメソッドを実装してみた
Kotlin/NativeでAndroidのNativeメソッドを実装してみたKotlin/NativeでAndroidのNativeメソッドを実装してみた
Kotlin/NativeでAndroidのNativeメソッドを実装してみたLINE Corporation
 
Use Kotlin scripts and Clova SDK to build your Clova extension
Use Kotlin scripts and Clova SDK to build your Clova extensionUse Kotlin scripts and Clova SDK to build your Clova extension
Use Kotlin scripts and Clova SDK to build your Clova extensionLINE Corporation
 
The Magic of LINE 購物 Testing
The Magic of LINE 購物 TestingThe Magic of LINE 購物 Testing
The Magic of LINE 購物 TestingLINE Corporation
 
UI Automation Test with JUnit5
UI Automation Test with JUnit5UI Automation Test with JUnit5
UI Automation Test with JUnit5LINE Corporation
 
Feature Detection for UI Testing
Feature Detection for UI TestingFeature Detection for UI Testing
Feature Detection for UI TestingLINE Corporation
 
LINE 新星計劃介紹與新創團隊分享
LINE 新星計劃介紹與新創團隊分享LINE 新星計劃介紹與新創團隊分享
LINE 新星計劃介紹與新創團隊分享LINE Corporation
 
​LINE 技術合作夥伴與應用分享
​LINE 技術合作夥伴與應用分享​LINE 技術合作夥伴與應用分享
​LINE 技術合作夥伴與應用分享LINE Corporation
 
LINE 開發者社群經營與技術推廣
LINE 開發者社群經營與技術推廣LINE 開發者社群經營與技術推廣
LINE 開發者社群經營與技術推廣LINE Corporation
 
日本開發者大會短講分享
日本開發者大會短講分享日本開發者大會短講分享
日本開發者大會短講分享LINE Corporation
 
LINE Chatbot - 活動報名報到設計分享
LINE Chatbot - 活動報名報到設計分享LINE Chatbot - 活動報名報到設計分享
LINE Chatbot - 活動報名報到設計分享LINE Corporation
 
在 LINE 私有雲中使用 Managed Kubernetes
在 LINE 私有雲中使用 Managed Kubernetes在 LINE 私有雲中使用 Managed Kubernetes
在 LINE 私有雲中使用 Managed KubernetesLINE Corporation
 
LINE TODAY高效率的敏捷測試開發技巧
LINE TODAY高效率的敏捷測試開發技巧LINE TODAY高效率的敏捷測試開發技巧
LINE TODAY高效率的敏捷測試開發技巧LINE Corporation
 
LINE 區塊鏈平台及代幣經濟 - LINK Chain及LINK介紹
LINE 區塊鏈平台及代幣經濟 - LINK Chain及LINK介紹LINE 區塊鏈平台及代幣經濟 - LINK Chain及LINK介紹
LINE 區塊鏈平台及代幣經濟 - LINK Chain及LINK介紹LINE Corporation
 
LINE Things - LINE IoT平台新技術分享
LINE Things - LINE IoT平台新技術分享LINE Things - LINE IoT平台新技術分享
LINE Things - LINE IoT平台新技術分享LINE Corporation
 
LINE Pay - 一卡通支付新體驗
LINE Pay - 一卡通支付新體驗LINE Pay - 一卡通支付新體驗
LINE Pay - 一卡通支付新體驗LINE Corporation
 
LINE Platform API Update - 打造一個更好的Chatbot服務
LINE Platform API Update - 打造一個更好的Chatbot服務LINE Platform API Update - 打造一個更好的Chatbot服務
LINE Platform API Update - 打造一個更好的Chatbot服務LINE Corporation
 
Keynote - ​LINE 的技術策略佈局與跨國產品開發
Keynote - ​LINE 的技術策略佈局與跨國產品開發Keynote - ​LINE 的技術策略佈局與跨國產品開發
Keynote - ​LINE 的技術策略佈局與跨國產品開發LINE Corporation
 

More from LINE Corporation (20)

JJUG CCC 2018 Fall 懇親会LT
JJUG CCC 2018 Fall 懇親会LTJJUG CCC 2018 Fall 懇親会LT
JJUG CCC 2018 Fall 懇親会LT
 
Reduce dependency on Rx with Kotlin Coroutines
Reduce dependency on Rx with Kotlin CoroutinesReduce dependency on Rx with Kotlin Coroutines
Reduce dependency on Rx with Kotlin Coroutines
 
Kotlin/NativeでAndroidのNativeメソッドを実装してみた
Kotlin/NativeでAndroidのNativeメソッドを実装してみたKotlin/NativeでAndroidのNativeメソッドを実装してみた
Kotlin/NativeでAndroidのNativeメソッドを実装してみた
 
Use Kotlin scripts and Clova SDK to build your Clova extension
Use Kotlin scripts and Clova SDK to build your Clova extensionUse Kotlin scripts and Clova SDK to build your Clova extension
Use Kotlin scripts and Clova SDK to build your Clova extension
 
The Magic of LINE 購物 Testing
The Magic of LINE 購物 TestingThe Magic of LINE 購物 Testing
The Magic of LINE 購物 Testing
 
GA Test Automation
GA Test AutomationGA Test Automation
GA Test Automation
 
UI Automation Test with JUnit5
UI Automation Test with JUnit5UI Automation Test with JUnit5
UI Automation Test with JUnit5
 
Feature Detection for UI Testing
Feature Detection for UI TestingFeature Detection for UI Testing
Feature Detection for UI Testing
 
LINE 新星計劃介紹與新創團隊分享
LINE 新星計劃介紹與新創團隊分享LINE 新星計劃介紹與新創團隊分享
LINE 新星計劃介紹與新創團隊分享
 
​LINE 技術合作夥伴與應用分享
​LINE 技術合作夥伴與應用分享​LINE 技術合作夥伴與應用分享
​LINE 技術合作夥伴與應用分享
 
LINE 開發者社群經營與技術推廣
LINE 開發者社群經營與技術推廣LINE 開發者社群經營與技術推廣
LINE 開發者社群經營與技術推廣
 
日本開發者大會短講分享
日本開發者大會短講分享日本開發者大會短講分享
日本開發者大會短講分享
 
LINE Chatbot - 活動報名報到設計分享
LINE Chatbot - 活動報名報到設計分享LINE Chatbot - 活動報名報到設計分享
LINE Chatbot - 活動報名報到設計分享
 
在 LINE 私有雲中使用 Managed Kubernetes
在 LINE 私有雲中使用 Managed Kubernetes在 LINE 私有雲中使用 Managed Kubernetes
在 LINE 私有雲中使用 Managed Kubernetes
 
LINE TODAY高效率的敏捷測試開發技巧
LINE TODAY高效率的敏捷測試開發技巧LINE TODAY高效率的敏捷測試開發技巧
LINE TODAY高效率的敏捷測試開發技巧
 
LINE 區塊鏈平台及代幣經濟 - LINK Chain及LINK介紹
LINE 區塊鏈平台及代幣經濟 - LINK Chain及LINK介紹LINE 區塊鏈平台及代幣經濟 - LINK Chain及LINK介紹
LINE 區塊鏈平台及代幣經濟 - LINK Chain及LINK介紹
 
LINE Things - LINE IoT平台新技術分享
LINE Things - LINE IoT平台新技術分享LINE Things - LINE IoT平台新技術分享
LINE Things - LINE IoT平台新技術分享
 
LINE Pay - 一卡通支付新體驗
LINE Pay - 一卡通支付新體驗LINE Pay - 一卡通支付新體驗
LINE Pay - 一卡通支付新體驗
 
LINE Platform API Update - 打造一個更好的Chatbot服務
LINE Platform API Update - 打造一個更好的Chatbot服務LINE Platform API Update - 打造一個更好的Chatbot服務
LINE Platform API Update - 打造一個更好的Chatbot服務
 
Keynote - ​LINE 的技術策略佈局與跨國產品開發
Keynote - ​LINE 的技術策略佈局與跨國產品開發Keynote - ​LINE 的技術策略佈局與跨國產品開發
Keynote - ​LINE 的技術策略佈局與跨國產品開發
 

Recently uploaded

So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesManik S Magar
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observabilityitnewsafrica
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesBernd Ruecker
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024TopCSSGallery
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...itnewsafrica
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructureitnewsafrica
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 

Recently uploaded (20)

So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 

Multi-tenancy Kafka Cluster for Line Services with 250 Billion Daily Messages

  • 1. MULTI-TENANCY KAFKA CLUSTER FOR LINE SERVICES WITH 250 BILLION DAILY MESSAGES
  • 2. SPEAKER ● Yuto Kawamura ● Senior Software Engineer ● Lead of the team providing company- wide Kafka platform ● Active in Apache Kafka Community ● Code contribution ● Speaking at Kafka Summit
  • 3. Agenda • Multitenancy company-wide Kafka Platform
  • 4. APACHE KAFKA ● Middleware for streaming data ● Highly scalable/available ● Data persistency ● Supports Pub-Sub model
  • 6. KAFKA PLATFORM ● Large scale Kafka clusters provided for any systems/services inside LINE ● Started internally from messaging server development team ● Expanded to company-wide platform
  • 7. USAGE ● Two kinds of usage ● Distributed task queue for buffering and processing business logic asynchronously ● e.g, Queueing heavy task from web app server to background task processor ● "Data Hub" for distributing data to other services
  • 9. SINGLE CLUSTER SHARED BY MANY SYSTEMS ● Concept of “Data Hub” ● Easy to find and access data ● Operational and management efficiency ● Not making operational cost to be proportional to users ● Concentrate engineering resources for maximizing reliability/ performance
  • 11. 5 million / second 210TB Daily inflow 4GB / second 50+ systems 250 billion Daily messages SCALE
  • 12. ● Cluster can protect itself against abusive workloads ● Accidental workload doesn't propagate to other users ● We can track on which client is sending requests ● Find source of strange requests ● Certain level of isolation among client workloads ● Slow response for one client doesn't appears to another client MULTITENANCY REQUIREMENTS
  • 13. ● It is more important to manage number of requests over incoming/outgoing byte rate ● Kafka is amazingly durable for large data ● Good design leveraging system functions ● Page cache for caching data ● sendfile(2) for zero copy transfer data ● Native batching ● Typical danger exists in clients sending many requests PROTECT CLUSTER AGAINST ABUSING
  • 14. REQUEST QUOTA ● Use Request Quota ● Limit “Time of broker threads that can be used by each client group” ● Set default quota ● Prevent single client from accidentally consuming all broker resources
  • 15. ISOLATION AMONG CLIENT WORKLOADS ● When can performance isolation among different clients violated? ● Let’s look at example of actual troubleshooting.
  • 16. DETECTION ● 50x ~ 100x slower response time in 99th %ile Produce response time ● Normal: ~20ms ● Observed: 50ms ~ 200ms
  • 17. FINDING #1 ● Coincidental disk read of a certain amount
  • 18. FINDING #2 ● Network threads' utilization was very high
  • 19. REQUEST HANDLING IN KAFKA BROKER ● Network Threads: Reads/ Writes request/response from/to client sockets ● Request Handler Threads: Processes requests and produces response object
  • 20. REQUEST HANDLING - READ REQUEST
  • 22. REQUEST HANDLING - WRITE RESPONSE
  • 23. NETWORK THREAD RUNS EVENT LOOP ● Multiplex and processes client sockets assigned sequentially ● It never blocks awaiting IO completion
  • 24. WHEN NETWORK THREADS GETS BUSY... ● It means one of following: ● 1. Really busy doing lots of work Many requests/responses to read/write ● 2. Blocked by some operations (which should not happen in event loop in general)
  • 25. RESPONSE HANDLING - NORMAL REQUESTS ● When response is in queue, all data to be transferred are in memory
  • 26. RESPONSE HANDLING - FETCH RESPONSE ● When response is in queue, topic data segments are not in userspace memory ● => Copy to client socket directly inside the kernel using sendfile(2) system call
  • 27. IF TARGET DATA NOT EXISTS IN PAGE CACHE ● Target data in page cache: ● => Just a memory copy. Very fast: ~ 100us ● Target data is NOT in page cache: ● => Needs to load data from disk into page cache first. Can be slow: ~ 50ms (or even slower) ● => If this happens in event loop…?
  • 28. SUSPECTING BLOCKING IN SENDFILE(2) ● Inspect duration of sendfile system calls issued by broker process ● How?
  • 29. SYSTEMTAP ● A kernel layer dynamic tracing tool and scripting language ● Safe to run in production because of low overhead ● Alternatively: DTrace, eBPF, etc…
  • 32. SOLUTION ● Make sure that data is ready on memory before the response is passed to the network thread ● => Event loop never blocks
  • 33. WARMUP PAGE CACHE ● Move blocking part to request handler threads (= single queue and pool of threads)
  • 34. WARMUP PAGE CACHE ● When Network thread calls sendfile(2) for transferring log data, it's always in page cache
  • 35. WARMUP IMPLEMENTATION ● Easiest way: Do synchronous read(2) on target data ● => Large overhead by copying memory from kernel to userland ● Why is Kafka using sendfile(2) for transferring topic data? ● => To avoid expensive large memory copy ● How can we achieve it keeping this property?
  • 36. TRICK - ZERO COPY PAGE LOAD ● Call sendfile(2) for target data with dest /dev/null ● The /dev/null driver does not actually copy data to anywhere
  • 37. WHY IT HAS ALMOST NO OVERHEAD? ● Linux kernel internally uses `splice` to implement sendfile(2) ● `splice` implementation of /dev/null returns without iterating target data
  • 39. IT WORKED ● No response time degradation with coincidence of Fetch request reading disk
  • 40. KAFKA-7504 Broker performance degradation caused by call of sendfile reading disk in network thread - x50 ~ x100 response time reduction KAFKA-4614 Long GC pause harming broker performance which is caused by mmap objects created for OffsetIndex - x100 ~ response time reduction KAFKA-6501 ReplicaFetcherThread should close the ReplicaFetcherBlockingSend earlier on shutdown - Eliminate significant latency during broker restart WHY NOT CONTRIBUTE IT BACK?
  • 41. CONCLUSION ● Introduced our engineering for operating the company-wide Kafka platform ● Quota, SystemTap and patch understanding system deeply ● After fixing some issues, our hosting policy is working well and efficiently, keeping: ● concept of single "Data Hub" and ● operational cost not to be proportional to the number of users/ usages ● We are contributing to the world through OSS
  • 42. … AND NEXT ● Kafka platform evolution ● Clients standardization and management ● Higher availability ● SRE team ● Planning to rollout new team for Reliability Engineering ● Share knowledge/tools which are independent from Middleware ● Come and ask me more!