SlideShare a Scribd company logo
1 of 28
Michael Kehoe
Staff Site Reliability Engineer
LinkedIn
Going all in:
From single use-case to many
2
Overview
• The LinkedIn Story
• Couchbase Use-Cases
• Development & Operations
• Conclusions
• Questions
$ whoami
3
Michael Kehoe
• Staff Site Reliability Engineer (SRE)
• Production-SRE team
• Funny accent = Australian
• Contact
• linkedin.com/in/michaelkkehoe
• @matrixtek
$ whatis SRE
4
Michael Kehoe
• Site Reliability Engineering
• Operations for the production application environment
• Responsibilities include
• Architecture design
• Capacity planning
• Operations
• Tooling
$ whatis CBVT
5
Michael Kehoe
• Couchbase Virtual Team
• ~10 SRE’s
• 2 Software Engineers
• Sponsored by SRE Director
• 5-90% of their time to support Couchbase
• Encourage as many people to contribute as possible
• What do we do?
• Operational work on Couchbase clusters
• Evangelize the use of Couchbase within LinkedIn
• Develop tools for the Couchbase Ecosystem
6
The LinkedIn Story
• Founded in 2002, LinkedIn has grown into the world’s largest professional social
media network
• 30 offices in 24 countries, Available in 24 languages
• More than 450+ million members worldwide
7
The LinkedIn Story
• Growth in Products
• Profiles
• Groups
• Recruiter
• Sales Navigator
• Growth in Internet Traffic
• Billions of page-hits per day
• 100k+ QPS to production services
In-Memory Storage Needs
8
The LinkedIn Story
• LinkedIn started as an Oracle shop
• Hyper-growth = Scaling challenges
• Read-Scaling becomes important
• Applicable use-cases
• Simple cache store
• Pre-warmed
• Read through
• Potential for Source of Truth (SoT) store
Enter Couchbase
9
The LinkedIn Story
• Until 2012, we were only using Memcache as a non SoT In-Memory store
• Drawbacks
• Difficult to pre-warm
• No partitioning/sharding (had to write our own)
• Cold-cache restarts
• Difficult to move data across hosts/clusters data-centers
Enter Couchbase
10
The LinkedIn Story
• Evaluated replacement systems for Memcached: Mongo, Redis, and others
• Couchbase had distinct advantages:
• Simple replacement for Memcached
• Built-in replication and cluster expansion
• Automatic partitioning
• Low latency
• Async writes to disk
• Building tooling is simple
Enter Couchbase
11
The LinkedIn Story
• Today we run Couchbase in our Corporate, Staging and Production environments
• Production/ Staging statistics:
• 148 buckets
• 2821 hosts
• 10M+ QPS
• Largest Clusters:
• By Hosts: 72 Hosts
• By Documents: 1.4B Documents
• By QPS: 2.5M QPS
Summary
12
Use-Cases
Today’s use-cases:
• Simple read-through cache
• Ephemeral Counter Store
• Temporary de-duping store
• SoT data-store for internal tooling
Simple read-through cache
13
Use-Cases
• Drop-in replacement for memcache
• Read-scaling
• Protecting backend database from large amounts of traffic
• E.g. 3rd party ingestion credential cache
Counter Store
14
Use-Cases
• In certain places, we simply need to increment counters from multiple systems and
store them
• E.g. Anti-abuse/Anti-scraping systems (Fuse)
Temporary De-duping store
15
Use-Cases
• Need to de-dup data over a large application cluster
• E.g. Email systems – Ensure we don’t send the same email twice
SoT Store for Internal Tools
16
Use-Cases
• For Non-Member facing tools, we use Couchbase as a SoT store.
• Benefits:
• Schema-less
• Short setup time
• Couchbase Python Client works easily in our environment
• Use views for simple map-reduce
• Example Uses:
• Nurse – Autoremediation system
• TrafficshiftIn – Global traffic automation system
• Availability – Storing and tracking Linkedin availability data
Couchbase Ecosystem
17
The LinkedIn Story
18
Developing around Couchbase
• Java – li-couchbase-client
• Wrapper around standard Java Couchbase Client
• Custom metrics emission
• Using Spring interface
• Storing data as Java serialized objects
• Python – couchbase-python-client
19
Operational Tooling
In order to efficiently use Couchbase as SRE’s, we need the following:
• Provisioning
• Installation
• Monitoring & Alerting
• Infrastructure Visibility
Provisioning
20
Operational Tooling
• Provisioning Flow
• Seek estimated usage statistics for cluster
• Size of data to be stored
• QPS
• Redundancy Needs
• Calculate cluster sizing
• Currently done with a template
• Couchbase has a simple calculator available online: http://docs.couchbase.com/prebuilt/calculators/sizing-
calc.html
• Request hardware for cluster(s)
Installation
21
Operational Tooling
• Process
• Enter cluster metadata into our management system (Range)
• Use Salt States to install and configure cluster
• See Issa Fattah’s post for more information:
• https://engineering.linkedin.com/blog/2016/04/leveraging-saltstack-to-scale-couchbase
• Benefits
• Ability to perform ‘state enforcement’
• Using Salt Pillar’s to encrypt cluster/ bucket passwords end-to-end
Monitoring & Alerting
22
Operational Tooling
• We run a daemon on each Couchbase Server that collects metrics every minute via
Couchbase API’s
• Use cluster metadata from range to build dashboards with our own system
InGraphs
• See: ‘Monitoring production deployments’: 4pm - Great America 1
Monitoring & Alerting
23
Operational Tooling
Management
24
Operational Tooling
• We want to see a world-view of all the clusters we run
• Having bucket cluster/server level statistics is useful
• Having a global view of who owns and operates each cluster/ bucket is useful
Management
25
Operational Tooling
26
Conclusions
• Couchbase was a natural fit into our existing infrastructure
• Building an ecosystem around Couchbase was important to us and has helped
Couchbase be successful at LinkedIn
• Expanding use of Couchbase
• In the past year we’ve grown the number of buckets over 50%
• Starting to use Views in production
• Moving Couchbase into LinkedIn standard deployment infrastructure
27
Thank You
Questions?
©2014 LinkedIn Corporation. All Rights Reserved.©2014 LinkedIn Corporation. All Rights Reserved.

More Related Content

What's hot

What's hot (20)

Introducing Tupilak, Snowplow's unified log fabric
Introducing Tupilak, Snowplow's unified log fabricIntroducing Tupilak, Snowplow's unified log fabric
Introducing Tupilak, Snowplow's unified log fabric
 
Consolidating services with middleware - NDC London 2017
Consolidating services with middleware - NDC London 2017Consolidating services with middleware - NDC London 2017
Consolidating services with middleware - NDC London 2017
 
URP? Excuse You! The Three Metrics You Have to Know
URP? Excuse You! The Three Metrics You Have to Know URP? Excuse You! The Three Metrics You Have to Know
URP? Excuse You! The Three Metrics You Have to Know
 
Integrating Apache Kafka Into Your Environment
Integrating Apache Kafka Into Your EnvironmentIntegrating Apache Kafka Into Your Environment
Integrating Apache Kafka Into Your Environment
 
Server Monitoring from the Cloud
Server Monitoring from the CloudServer Monitoring from the Cloud
Server Monitoring from the Cloud
 
Microsoft Azure and Windows Application monitoring
Microsoft Azure and Windows Application monitoringMicrosoft Azure and Windows Application monitoring
Microsoft Azure and Windows Application monitoring
 
Scala eXchange: Building robust data pipelines in Scala
Scala eXchange: Building robust data pipelines in ScalaScala eXchange: Building robust data pipelines in Scala
Scala eXchange: Building robust data pipelines in Scala
 
Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies...
Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies...Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies...
Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies...
 
Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?
 
Intro to.net core 20170111
Intro to.net core   20170111Intro to.net core   20170111
Intro to.net core 20170111
 
Akka and AngularJS – Reactive Applications in Practice
Akka and AngularJS – Reactive Applications in PracticeAkka and AngularJS – Reactive Applications in Practice
Akka and AngularJS – Reactive Applications in Practice
 
David Max SATURN 2018 - Migrating from Oracle to Espresso
David Max SATURN 2018 - Migrating from Oracle to EspressoDavid Max SATURN 2018 - Migrating from Oracle to Espresso
David Max SATURN 2018 - Migrating from Oracle to Espresso
 
DOES SFO 2016 - Rich Jackson & Rosalind Radcliffe - The Mainframe DevOps Team...
DOES SFO 2016 - Rich Jackson & Rosalind Radcliffe - The Mainframe DevOps Team...DOES SFO 2016 - Rich Jackson & Rosalind Radcliffe - The Mainframe DevOps Team...
DOES SFO 2016 - Rich Jackson & Rosalind Radcliffe - The Mainframe DevOps Team...
 
Bootstrap SaaS startup using Open Source Tools
Bootstrap SaaS startup using Open Source ToolsBootstrap SaaS startup using Open Source Tools
Bootstrap SaaS startup using Open Source Tools
 
Span Conference: Why your company needs a unified log
Span Conference: Why your company needs a unified logSpan Conference: Why your company needs a unified log
Span Conference: Why your company needs a unified log
 
Azkaban - WorkFlow Scheduler/Automation Engine
Azkaban - WorkFlow Scheduler/Automation EngineAzkaban - WorkFlow Scheduler/Automation Engine
Azkaban - WorkFlow Scheduler/Automation Engine
 
Importance of ‘Centralized Event collection’ and BigData platform for Analysis !
Importance of ‘Centralized Event collection’ and BigData platform for Analysis !Importance of ‘Centralized Event collection’ and BigData platform for Analysis !
Importance of ‘Centralized Event collection’ and BigData platform for Analysis !
 
Office Online Server 2016 - a must for on-premises installation for SharePoin...
Office Online Server 2016 - a must for on-premises installation for SharePoin...Office Online Server 2016 - a must for on-premises installation for SharePoin...
Office Online Server 2016 - a must for on-premises installation for SharePoin...
 
Metrics Are Not Enough: Monitoring Apache Kafka and Streaming Applications
Metrics Are Not Enough: Monitoring Apache Kafka and Streaming ApplicationsMetrics Are Not Enough: Monitoring Apache Kafka and Streaming Applications
Metrics Are Not Enough: Monitoring Apache Kafka and Streaming Applications
 
10 Tips to Pump Up Your Atlassian Performance
10 Tips to Pump Up Your Atlassian Performance10 Tips to Pump Up Your Atlassian Performance
10 Tips to Pump Up Your Atlassian Performance
 

Viewers also liked

CouchbasetoHadoop_Matt_Michael_Justin v4
CouchbasetoHadoop_Matt_Michael_Justin v4CouchbasetoHadoop_Matt_Michael_Justin v4
CouchbasetoHadoop_Matt_Michael_Justin v4
Michael Kehoe
 
How TPM saves the day
How TPM saves the dayHow TPM saves the day
How TPM saves the day
Pooja Tangi
 

Viewers also liked (15)

Couchbase Meetup Jan 2016
Couchbase Meetup Jan 2016Couchbase Meetup Jan 2016
Couchbase Meetup Jan 2016
 
SRECon USA 2016: Growing your Entry Level Talent
SRECon USA 2016: Growing your Entry Level TalentSRECon USA 2016: Growing your Entry Level Talent
SRECon USA 2016: Growing your Entry Level Talent
 
CouchbasetoHadoop_Matt_Michael_Justin v4
CouchbasetoHadoop_Matt_Michael_Justin v4CouchbasetoHadoop_Matt_Michael_Justin v4
CouchbasetoHadoop_Matt_Michael_Justin v4
 
Feedback loops: How SREs benefit and what is needed to realize their potential
Feedback loops: How SREs benefit and what is needed to realize their potentialFeedback loops: How SREs benefit and what is needed to realize their potential
Feedback loops: How SREs benefit and what is needed to realize their potential
 
CouchDB y el desarrollo de aplicaciones Android
CouchDB y el desarrollo de aplicaciones AndroidCouchDB y el desarrollo de aplicaciones Android
CouchDB y el desarrollo de aplicaciones Android
 
Couchbase Orchestration and Scaling a Caching Infrastructure At LinkedIn.
Couchbase Orchestration and Scaling a Caching Infrastructure At LinkedIn.Couchbase Orchestration and Scaling a Caching Infrastructure At LinkedIn.
Couchbase Orchestration and Scaling a Caching Infrastructure At LinkedIn.
 
Software reliability tools and common software errors
Software reliability tools and common software errorsSoftware reliability tools and common software errors
Software reliability tools and common software errors
 
How TPM saves the day
How TPM saves the dayHow TPM saves the day
How TPM saves the day
 
Cloud Native, Microservices and SRE/Chaos Engineering: The new Rules of The G...
Cloud Native, Microservices and SRE/Chaos Engineering: The new Rules of The G...Cloud Native, Microservices and SRE/Chaos Engineering: The new Rules of The G...
Cloud Native, Microservices and SRE/Chaos Engineering: The new Rules of The G...
 
Software Reliability Engineering
Software Reliability EngineeringSoftware Reliability Engineering
Software Reliability Engineering
 
Event Driven Automation Meetup May 14/2015
Event Driven Automation Meetup May 14/2015Event Driven Automation Meetup May 14/2015
Event Driven Automation Meetup May 14/2015
 
Software reliability growth model
Software reliability growth modelSoftware reliability growth model
Software reliability growth model
 
Load balancing in the SRE way
Load balancing in the SRE wayLoad balancing in the SRE way
Load balancing in the SRE way
 
Event driven-automation and workflows
Event driven-automation and workflowsEvent driven-automation and workflows
Event driven-automation and workflows
 
Monitoring Microservices
Monitoring MicroservicesMonitoring Microservices
Monitoring Microservices
 

Similar to Couchbase Connect 2016

Stay productive while slicing up the monolith
Stay productive while slicing up the monolith Stay productive while slicing up the monolith
Stay productive while slicing up the monolith
Markus Eisele
 
Membase Meetup - Silicon Valley
Membase Meetup - Silicon ValleyMembase Meetup - Silicon Valley
Membase Meetup - Silicon Valley
Membase
 
Dutch Oracle Architects Platform - Reviewing Oracle OpenWorld 2017 and New Tr...
Dutch Oracle Architects Platform - Reviewing Oracle OpenWorld 2017 and New Tr...Dutch Oracle Architects Platform - Reviewing Oracle OpenWorld 2017 and New Tr...
Dutch Oracle Architects Platform - Reviewing Oracle OpenWorld 2017 and New Tr...
Lucas Jellema
 

Similar to Couchbase Connect 2016 (20)

Stay productive_while_slicing_up_the_monolith
Stay productive_while_slicing_up_the_monolithStay productive_while_slicing_up_the_monolith
Stay productive_while_slicing_up_the_monolith
 
Building a Turbo-fast Data Warehousing Platform with Databricks
Building a Turbo-fast Data Warehousing Platform with DatabricksBuilding a Turbo-fast Data Warehousing Platform with Databricks
Building a Turbo-fast Data Warehousing Platform with Databricks
 
Stay productive while slicing up the monolith
Stay productive while slicing up the monolith Stay productive while slicing up the monolith
Stay productive while slicing up the monolith
 
Sql Start! 2020 - SQL Server Lift & Shift su Azure
Sql Start! 2020 - SQL Server Lift & Shift su AzureSql Start! 2020 - SQL Server Lift & Shift su Azure
Sql Start! 2020 - SQL Server Lift & Shift su Azure
 
Sergey Dzyuban "To Build My Own Cloud with Blackjack…"
Sergey Dzyuban "To Build My Own Cloud with Blackjack…"Sergey Dzyuban "To Build My Own Cloud with Blackjack…"
Sergey Dzyuban "To Build My Own Cloud with Blackjack…"
 
The Need of Cloud-Native Application
The Need of Cloud-Native ApplicationThe Need of Cloud-Native Application
The Need of Cloud-Native Application
 
Webinar - DreamObjects/Ceph Case Study
Webinar - DreamObjects/Ceph Case StudyWebinar - DreamObjects/Ceph Case Study
Webinar - DreamObjects/Ceph Case Study
 
Sitecore - what to look forward to
Sitecore - what to look forward toSitecore - what to look forward to
Sitecore - what to look forward to
 
Cloud-native Data
Cloud-native DataCloud-native Data
Cloud-native Data
 
Cloud-Native-Data with Cornelia Davis
Cloud-Native-Data with Cornelia DavisCloud-Native-Data with Cornelia Davis
Cloud-Native-Data with Cornelia Davis
 
Criteo meetup - S.R.E Tech Talk
Criteo meetup - S.R.E Tech TalkCriteo meetup - S.R.E Tech Talk
Criteo meetup - S.R.E Tech Talk
 
TechBeats #2
TechBeats #2TechBeats #2
TechBeats #2
 
High performance web sites with multilevel caching
High performance web sites with multilevel cachingHigh performance web sites with multilevel caching
High performance web sites with multilevel caching
 
Membase Meetup - Silicon Valley
Membase Meetup - Silicon ValleyMembase Meetup - Silicon Valley
Membase Meetup - Silicon Valley
 
Dutch Oracle Architects Platform - Reviewing Oracle OpenWorld 2017 and New Tr...
Dutch Oracle Architects Platform - Reviewing Oracle OpenWorld 2017 and New Tr...Dutch Oracle Architects Platform - Reviewing Oracle OpenWorld 2017 and New Tr...
Dutch Oracle Architects Platform - Reviewing Oracle OpenWorld 2017 and New Tr...
 
Container Conf 2017: Rancher Kubernetes
Container Conf 2017: Rancher KubernetesContainer Conf 2017: Rancher Kubernetes
Container Conf 2017: Rancher Kubernetes
 
Exploring sql server 2016
Exploring sql server 2016Exploring sql server 2016
Exploring sql server 2016
 
KoprowskiT_SQLRelay2014#4_Caerdydd_MaintenancePlansForBeginners
KoprowskiT_SQLRelay2014#4_Caerdydd_MaintenancePlansForBeginnersKoprowskiT_SQLRelay2014#4_Caerdydd_MaintenancePlansForBeginners
KoprowskiT_SQLRelay2014#4_Caerdydd_MaintenancePlansForBeginners
 
Meet Magento Belarus 2015: Jurģis Lukss
Meet Magento Belarus 2015: Jurģis LukssMeet Magento Belarus 2015: Jurģis Lukss
Meet Magento Belarus 2015: Jurģis Lukss
 
Urbanesia - Development History
Urbanesia - Development HistoryUrbanesia - Development History
Urbanesia - Development History
 

More from Michael Kehoe

More from Michael Kehoe (17)

eBPF Workshop
eBPF WorkshopeBPF Workshop
eBPF Workshop
 
eBPF Basics
eBPF BasicseBPF Basics
eBPF Basics
 
Code Yellow: Helping operations top-heavy teams the smart way
Code Yellow: Helping operations top-heavy teams the smart wayCode Yellow: Helping operations top-heavy teams the smart way
Code Yellow: Helping operations top-heavy teams the smart way
 
QConSF 2018: Building Production-Ready Applications
QConSF 2018: Building Production-Ready ApplicationsQConSF 2018: Building Production-Ready Applications
QConSF 2018: Building Production-Ready Applications
 
Helping operations top-heavy teams the smart way
Helping operations top-heavy teams the smart wayHelping operations top-heavy teams the smart way
Helping operations top-heavy teams the smart way
 
AllDayDevops: What the NTSB teaches us about incident management & postmortems
AllDayDevops: What the NTSB teaches us about incident management & postmortemsAllDayDevops: What the NTSB teaches us about incident management & postmortems
AllDayDevops: What the NTSB teaches us about incident management & postmortems
 
Linux Container Basics
Linux Container BasicsLinux Container Basics
Linux Container Basics
 
Papers We Love Sept. 2018: 007: Democratically Finding The Cause of Packet Drops
Papers We Love Sept. 2018: 007: Democratically Finding The Cause of Packet DropsPapers We Love Sept. 2018: 007: Democratically Finding The Cause of Packet Drops
Papers We Love Sept. 2018: 007: Democratically Finding The Cause of Packet Drops
 
What the NTSB teaches us about incident management & postmortems
What the NTSB teaches us about incident management & postmortemsWhat the NTSB teaches us about incident management & postmortems
What the NTSB teaches us about incident management & postmortems
 
PyBay 2018: Production-Ready Python Applications
PyBay 2018: Production-Ready Python ApplicationsPyBay 2018: Production-Ready Python Applications
PyBay 2018: Production-Ready Python Applications
 
Helping operations top-heavy teams the smart way
Helping operations top-heavy teams the smart wayHelping operations top-heavy teams the smart way
Helping operations top-heavy teams the smart way
 
The Next Wave of Reliability Engineering
The Next Wave of Reliability EngineeringThe Next Wave of Reliability Engineering
The Next Wave of Reliability Engineering
 
Building Production-Ready Microservices: DevopsExchangeSF
Building Production-Ready Microservices: DevopsExchangeSFBuilding Production-Ready Microservices: DevopsExchangeSF
Building Production-Ready Microservices: DevopsExchangeSF
 
SF Chaos Engineering Meetup: Building Disaster Recovery via Resilience Engine...
SF Chaos Engineering Meetup: Building Disaster Recovery via Resilience Engine...SF Chaos Engineering Meetup: Building Disaster Recovery via Resilience Engine...
SF Chaos Engineering Meetup: Building Disaster Recovery via Resilience Engine...
 
SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...
SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...
SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...
 
SRECon-Europe-2017: Networks for SREs
SRECon-Europe-2017: Networks for SREsSRECon-Europe-2017: Networks for SREs
SRECon-Europe-2017: Networks for SREs
 
Velocity San Jose 2017: Traffic shifts: Avoiding disasters at scale
Velocity San Jose 2017: Traffic shifts: Avoiding disasters at scaleVelocity San Jose 2017: Traffic shifts: Avoiding disasters at scale
Velocity San Jose 2017: Traffic shifts: Avoiding disasters at scale
 

Recently uploaded

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Recently uploaded (20)

Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 

Couchbase Connect 2016

  • 1. Michael Kehoe Staff Site Reliability Engineer LinkedIn Going all in: From single use-case to many
  • 2. 2 Overview • The LinkedIn Story • Couchbase Use-Cases • Development & Operations • Conclusions • Questions
  • 3. $ whoami 3 Michael Kehoe • Staff Site Reliability Engineer (SRE) • Production-SRE team • Funny accent = Australian • Contact • linkedin.com/in/michaelkkehoe • @matrixtek
  • 4. $ whatis SRE 4 Michael Kehoe • Site Reliability Engineering • Operations for the production application environment • Responsibilities include • Architecture design • Capacity planning • Operations • Tooling
  • 5. $ whatis CBVT 5 Michael Kehoe • Couchbase Virtual Team • ~10 SRE’s • 2 Software Engineers • Sponsored by SRE Director • 5-90% of their time to support Couchbase • Encourage as many people to contribute as possible • What do we do? • Operational work on Couchbase clusters • Evangelize the use of Couchbase within LinkedIn • Develop tools for the Couchbase Ecosystem
  • 6. 6 The LinkedIn Story • Founded in 2002, LinkedIn has grown into the world’s largest professional social media network • 30 offices in 24 countries, Available in 24 languages • More than 450+ million members worldwide
  • 7. 7 The LinkedIn Story • Growth in Products • Profiles • Groups • Recruiter • Sales Navigator • Growth in Internet Traffic • Billions of page-hits per day • 100k+ QPS to production services
  • 8. In-Memory Storage Needs 8 The LinkedIn Story • LinkedIn started as an Oracle shop • Hyper-growth = Scaling challenges • Read-Scaling becomes important • Applicable use-cases • Simple cache store • Pre-warmed • Read through • Potential for Source of Truth (SoT) store
  • 9. Enter Couchbase 9 The LinkedIn Story • Until 2012, we were only using Memcache as a non SoT In-Memory store • Drawbacks • Difficult to pre-warm • No partitioning/sharding (had to write our own) • Cold-cache restarts • Difficult to move data across hosts/clusters data-centers
  • 10. Enter Couchbase 10 The LinkedIn Story • Evaluated replacement systems for Memcached: Mongo, Redis, and others • Couchbase had distinct advantages: • Simple replacement for Memcached • Built-in replication and cluster expansion • Automatic partitioning • Low latency • Async writes to disk • Building tooling is simple
  • 11. Enter Couchbase 11 The LinkedIn Story • Today we run Couchbase in our Corporate, Staging and Production environments • Production/ Staging statistics: • 148 buckets • 2821 hosts • 10M+ QPS • Largest Clusters: • By Hosts: 72 Hosts • By Documents: 1.4B Documents • By QPS: 2.5M QPS
  • 12. Summary 12 Use-Cases Today’s use-cases: • Simple read-through cache • Ephemeral Counter Store • Temporary de-duping store • SoT data-store for internal tooling
  • 13. Simple read-through cache 13 Use-Cases • Drop-in replacement for memcache • Read-scaling • Protecting backend database from large amounts of traffic • E.g. 3rd party ingestion credential cache
  • 14. Counter Store 14 Use-Cases • In certain places, we simply need to increment counters from multiple systems and store them • E.g. Anti-abuse/Anti-scraping systems (Fuse)
  • 15. Temporary De-duping store 15 Use-Cases • Need to de-dup data over a large application cluster • E.g. Email systems – Ensure we don’t send the same email twice
  • 16. SoT Store for Internal Tools 16 Use-Cases • For Non-Member facing tools, we use Couchbase as a SoT store. • Benefits: • Schema-less • Short setup time • Couchbase Python Client works easily in our environment • Use views for simple map-reduce • Example Uses: • Nurse – Autoremediation system • TrafficshiftIn – Global traffic automation system • Availability – Storing and tracking Linkedin availability data
  • 18. 18 Developing around Couchbase • Java – li-couchbase-client • Wrapper around standard Java Couchbase Client • Custom metrics emission • Using Spring interface • Storing data as Java serialized objects • Python – couchbase-python-client
  • 19. 19 Operational Tooling In order to efficiently use Couchbase as SRE’s, we need the following: • Provisioning • Installation • Monitoring & Alerting • Infrastructure Visibility
  • 20. Provisioning 20 Operational Tooling • Provisioning Flow • Seek estimated usage statistics for cluster • Size of data to be stored • QPS • Redundancy Needs • Calculate cluster sizing • Currently done with a template • Couchbase has a simple calculator available online: http://docs.couchbase.com/prebuilt/calculators/sizing- calc.html • Request hardware for cluster(s)
  • 21. Installation 21 Operational Tooling • Process • Enter cluster metadata into our management system (Range) • Use Salt States to install and configure cluster • See Issa Fattah’s post for more information: • https://engineering.linkedin.com/blog/2016/04/leveraging-saltstack-to-scale-couchbase • Benefits • Ability to perform ‘state enforcement’ • Using Salt Pillar’s to encrypt cluster/ bucket passwords end-to-end
  • 22. Monitoring & Alerting 22 Operational Tooling • We run a daemon on each Couchbase Server that collects metrics every minute via Couchbase API’s • Use cluster metadata from range to build dashboards with our own system InGraphs • See: ‘Monitoring production deployments’: 4pm - Great America 1
  • 24. Management 24 Operational Tooling • We want to see a world-view of all the clusters we run • Having bucket cluster/server level statistics is useful • Having a global view of who owns and operates each cluster/ bucket is useful
  • 26. 26 Conclusions • Couchbase was a natural fit into our existing infrastructure • Building an ecosystem around Couchbase was important to us and has helped Couchbase be successful at LinkedIn • Expanding use of Couchbase • In the past year we’ve grown the number of buckets over 50% • Starting to use Views in production • Moving Couchbase into LinkedIn standard deployment infrastructure
  • 28. ©2014 LinkedIn Corporation. All Rights Reserved.©2014 LinkedIn Corporation. All Rights Reserved.

Editor's Notes

  1. The LinkedIn Story Couchbase Use-Cases Development & Operations Conclusions Questions
  2. Site Reliability Engineering A term coined by Ben Treynor from Google Hybrid of: Sysadmin Network Engineer Architect Troubleshooter Software Engineer Ninja’s – Digital economy
  3. 10 SRE’s, with a tech-lead Sponsored by a SRE Director Input from Software Engineers on development
  4. Founded in 2002, LinkedIn has grown into the world’s largest professional social media network 30 offices in 24 countries, Available in 24 languages More than 450+ million members worldwide
  5. LinkedIn started as an Oracle shop To-date, we still run a significant number of Oracle databases Oracle is fine for writes, scaling reads becomes challenging HyperGrowth == Scaling challenges Scaling writes isn’t a common problem in most cases Scaling reads to 100k+ QPS, is challenging Failures in read-scaling infra can take down back-end systems Applicable use-cases Simple cache store Pre-warmed Read-through SoT Store
  6. Until 2012, we were only using Memcache as a non SoT In-Memory store Drawbacks of memcache: Difficult to pre-warm, not easy to copy-data No native sharding for clusters, had to write our own Restarting memcache servers caused problems Couldn’t copy data across for new DC’s, expanding clusters etc Mid-2012, started testing Couchbase
  7. Evaluated replacement systems for Memcached: Mongo, Redis, and others Couchbase had distinct advantages Simple replacement for memcache  JAVA Spring made this simpler Built-in replication and cluster expansion, significantly reduces ops-workload Automatic partitioning, doesn’t become a concern anymore Low-latency, reads from disk are still very fast Async write to disk, can write a low of data at once without it being a problem Lots of API’s that make tooling relatively simple
  8. Insert fuse architecture
  9. We have a deduplication filter in stork that you can take advantage of to make sure we don't send duplicates of your email. This is highly recommended for any email using kafka (kafka can potentially deliver your email to our system twice)
  10. Don’t use as SoT store as Espresso is our primary key-value store