SlideShare a Scribd company logo
1 of 38
Download to read offline
Automation and Culture Changes for
40M Subscriber Platform Operation
Yuichiro Sano
Yahoo! JAPAN
ysano@yahoo-corp.jp
Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
About me
Yuichiro Sano
• Yahoo! JAPAN Platform Head of Cloud Platform Department
• Responsible for operational support, promotion and management of on-
premises platform (PCF k8s) at Yahoo! JAPAN
2
Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
About Yahoo! JAPAN
4
3/03 67
A B HC CB C H H
H BC C M CAD B B H
CF ! F F J M
F B B LD B B C F
B B F B
2
1 ! A FHD CB ! H H B
CH F J F J
BCFAC HF
03 5 51
H 0 B E F !
CB H F C H . D B
DCD H CB F H
6 CC . D B F
+ 6 0 7
B CDH A F H H F
B HC B B J
F H F H CF
FJ H F BH CFA
C B F H C
F F
1 4
+B B F B ,C
J F FJ
Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Why Cloud Foundry?
• Needed to modernize systems, reduce operational cost
• Productivity. Needed an environment where Engineers could just focus on
building services and not worry about the rest
• Bosh
• Buildpack model
• Auto-scale needed
• Wanted to leverage existing OpenStack environment
5
Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
The Journey to Cloud Foundry
2015 Apr
PoC start
2015 Oct
Pilot
Dev start
2016 Oct
Pilot
release
2017 Apr
PCF GA
(openStack)
2017 Oct
openstack
+1 cluster
Today
Openstack x2
vSphere x4
11,000 AI1,000 AI 3,000 AI
2018 Apr
vSphere
+2 cluster
Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Current Cluster Spec
7
Sandbox Development Production
cluster 2 2 6
HV 40 120 360
Diego cell - 300 900
App Instance - 8,000 11,000
Total request/sec - - 600,000
Log traffic log/sec - - 90,000
Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Ecosystem around PCF
8
RDB MQ
Splunk RBAC
FaaS
Repository
KVS
Object
Storage
Redis
GitHub
Enterprise
Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Organizational Structure
9
System dep.
Platform dep.
Cloud Platform dep.
Infrastructure dep.
other Platform dep. Network IaaSLBother Platform dep.other Platform dep.
Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Team Structure
10
Cloud Platform dep.
PaaS CaaS
Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Introducing PCF to the Organization
Most people were new to PCF, so we did the following:
• Held company seminars, hands-on workshops for > 1000 developers
• Maintained Japanese language reference material and tutorials
• Provided best practice guidance for various development use-cases
• Used Pivotal consulting services to provide support for platform and SRE
• Created a Service Broker to handle special YJ cookie offload
11
Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Growing Pains
• With the addition of clusters the operational burden and alert support
increased
• Along with the increase in #Apps, with insufficient clusters PCF was unable to
accept new apps
• With increasing log volume, our log management system became overloaded
• We found App config mistakes (eg. timeouts) could affect the cluster
(goRouter)
• Time dealing with user support issues made it difficult to introduce stable
operations policies
12
Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Clearly Defining the Role of Each Team
13
CRE
SRE
PCF users 2,500 Engineers
PaaS team
Propose efficient usage methods and
proactively resolve issues to ease the
transition for engineers
Platform as a Product. Focus on increasing
system reliability by preventing failure and
promoting automation
Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
CRE team
14
Developer Counterpart
l First line support for users
l Contact users in case
application mis-behaves
Developers Education
l Deliver workshops
l Provide default CI/CD
templates
l Best practices
l Applications Architecture
support
Service Integra8ons
l Create service broker
services
l Support service team
Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
SRE team
15
CRE counterpart
l Implements CRE needs
l Works closely with CRE for
Capacity Planning
Platform Updates
l PCF Updates
l Add new features around
the platform
l Logging, Metrics for users
l Automate all the Things ….
Platform Stability
l Define SLO, SLA
l Platform monitoring, alerts,
etc...
l Defining alerts, what, when,
how ?
l Capacity planning
Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
SRE
Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Automate all the Things
17
Install PCF
Backup with BBR
Cluster
Integrity
Update PCF
Prometheus
deployment
Quota, Usage
check
Buildpack Update
Logs forwarder
Deployment
IaaS layer check
(blobstore,...)
Smoke Test
User/Space/Orgs
Management
etc ….
Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
PCF Install & Update Pipeline
18
Deploy
bosh
upload
Tile
Install
Tile
1 Day
Deploy
Opsman
Create IaaS
Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Buildpack Update Pipeline
19
Dev
Update
staging
Update
Production
Update
Sandbox
Update
Every month
Buildpack x 8
Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
PCF Backup
20
every-2am
every-3am
every-4am
every-5am
Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
PCF Smoke-Test
21
All environments
Every 10 minutes
Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Monitoring all the Things
22
App Instance
Log Traffic
Cell Capacity
Avg response
time
cf push
Duration
Router Traffic
CPU Usage
Cluster
Healthcheck
Probe
Mem Usage
Log missing rate
etc ….
Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Cluster Dashboard
23
Routetr Rps Routetr Go Routines
Routetr Latency
Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Cell Capacity
24
Used
Availlable
Used
Availlable
Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
App Latency
25
Latency 99%
90%
50%
Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Smoke Test
26
Cf Push Duration
Cf Scale Duration
Cf Start Duration
Cf Delete Duration
Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
BOSH
Use BOSH for Logging Components
• Alert & App logs are transferred to the platform
• Using BOSH makes it easier to scale the nozzle and relay components
27
Internal
Notifications
App A
App B Splunk
loggregator
Monitor
nozzle
Splunk
nozzle
Monitor
relay
Splunk
relay
easier to scale
Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Log summary
28
Noisy Neighbor
Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Missing Log Data Dashboard
29
100%
100%
Every 1hour
Every 1minute
Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ 30
Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Benefits of Platform Automation
• Automation has reduced SRE team platform install & update work
by 85%
• Precision has increased and human error has been removed
which has saved a lot of effort and time.
• Anyone can now easily work with the platform so we are not
dependent on individual “rockstars”
31
Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Benefits of Focus on Observability
• Time to identify problems has been radically reduced
• Able to move from a Reactive to a Pro-active problem
resolution approach
• Contributed to a more sustainable, stress-free work
environment
32
Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Outcomes
Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Outcomes
Speed
75%
Time to Deploy
Weeks → 4hrs
Workspace
Provisioning
Scaling
Weeks → 5secs
Time to Scale
600k
TPS
Reliability
ZERO
Downtime
2hr → 2mins
VM Recovery
Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Outcomes
Security
5x
Patch Frequency
1d → 4hr
Time to Patch
Productivity
11,000
AIs
3845
Apps in Production
Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Future Plans
Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
For improving the value of users
• Adding further clusters
x6 ⇒ x12
• Evolution of Log related architecture
relay ⇒ queuing
• All Platform Service Broker support
3PF ⇒ 1XXPF
• Proactive Operations
Able to safely take a nap on the job SRE J
37
> Stay Connected.
<Your CTA>
<Related Session>
#springone@s1p

More Related Content

Similar to Automation and Culture Changes for 40M Subscriber Platform Operation

Similar to Automation and Culture Changes for 40M Subscriber Platform Operation (20)

PCF 2.3: A First Look
PCF 2.3: A First LookPCF 2.3: A First Look
PCF 2.3: A First Look
 
It’s a Multi-Cloud World, But What About The Data?
It’s a Multi-Cloud World, But What About The Data?It’s a Multi-Cloud World, But What About The Data?
It’s a Multi-Cloud World, But What About The Data?
 
Heavyweights: Tipping the Scales with Very Large Foundations
Heavyweights: Tipping the Scales with Very Large FoundationsHeavyweights: Tipping the Scales with Very Large Foundations
Heavyweights: Tipping the Scales with Very Large Foundations
 
SpringOnePlatform2017 recap
SpringOnePlatform2017 recapSpringOnePlatform2017 recap
SpringOnePlatform2017 recap
 
Building a Data Exchange with Spring Cloud Data Flow
Building a Data Exchange with Spring Cloud Data FlowBuilding a Data Exchange with Spring Cloud Data Flow
Building a Data Exchange with Spring Cloud Data Flow
 
My Personal DevOps Journey: From Pipelines to Platforms
My Personal DevOps Journey: From Pipelines to PlatformsMy Personal DevOps Journey: From Pipelines to Platforms
My Personal DevOps Journey: From Pipelines to Platforms
 
Accelerating the Consumption of APIs Built on Cloud Foundry
Accelerating the Consumption of APIs Built on Cloud FoundryAccelerating the Consumption of APIs Built on Cloud Foundry
Accelerating the Consumption of APIs Built on Cloud Foundry
 
Developing Real-Time Data Pipelines with Apache Kafka
Developing Real-Time Data Pipelines with Apache KafkaDeveloping Real-Time Data Pipelines with Apache Kafka
Developing Real-Time Data Pipelines with Apache Kafka
 
Cloud-Native Streaming Platform: Running Apache Kafka on PKS (Pivotal Contain...
Cloud-Native Streaming Platform: Running Apache Kafka on PKS (Pivotal Contain...Cloud-Native Streaming Platform: Running Apache Kafka on PKS (Pivotal Contain...
Cloud-Native Streaming Platform: Running Apache Kafka on PKS (Pivotal Contain...
 
Numbers in the Hidden: A Pragmatic View of 'Nirvana'
Numbers in the Hidden: A Pragmatic View of 'Nirvana'Numbers in the Hidden: A Pragmatic View of 'Nirvana'
Numbers in the Hidden: A Pragmatic View of 'Nirvana'
 
PKS Networking with NSX-T: You Focus on your App, We'll Take Care of the Rest!
PKS Networking with NSX-T: You Focus on your App, We'll Take Care of the Rest!PKS Networking with NSX-T: You Focus on your App, We'll Take Care of the Rest!
PKS Networking with NSX-T: You Focus on your App, We'll Take Care of the Rest!
 
What We're Learning Adopting Spring Boot and PCF for Dell.com's eCommerce
What We're Learning Adopting Spring Boot and PCF for Dell.com's eCommerceWhat We're Learning Adopting Spring Boot and PCF for Dell.com's eCommerce
What We're Learning Adopting Spring Boot and PCF for Dell.com's eCommerce
 
Machines Can Learn - a Practical Take on Machine Intelligence Using Spring Cl...
Machines Can Learn - a Practical Take on Machine Intelligence Using Spring Cl...Machines Can Learn - a Practical Take on Machine Intelligence Using Spring Cl...
Machines Can Learn - a Practical Take on Machine Intelligence Using Spring Cl...
 
In the workshop with GCP, Home Depot & Cloud Foundry
In the workshop with GCP, Home Depot & Cloud FoundryIn the workshop with GCP, Home Depot & Cloud Foundry
In the workshop with GCP, Home Depot & Cloud Foundry
 
Connecting All Abstractions with Istio
Connecting All Abstractions with IstioConnecting All Abstractions with Istio
Connecting All Abstractions with Istio
 
Chaos Engineering for PCF
Chaos Engineering for PCFChaos Engineering for PCF
Chaos Engineering for PCF
 
What's new in Reactor Californium
What's new in Reactor CaliforniumWhat's new in Reactor Californium
What's new in Reactor Californium
 
Cloud Foundry Services on PKS with No Extra Code, "We Bosh So You Don’t Have ...
Cloud Foundry Services on PKS with No Extra Code, "We Bosh So You Don’t Have ...Cloud Foundry Services on PKS with No Extra Code, "We Bosh So You Don’t Have ...
Cloud Foundry Services on PKS with No Extra Code, "We Bosh So You Don’t Have ...
 
YugaByte DB—A Planet-Scale Database for Low Latency Transactional Apps
YugaByte DB—A Planet-Scale Database for Low Latency Transactional AppsYugaByte DB—A Planet-Scale Database for Low Latency Transactional Apps
YugaByte DB—A Planet-Scale Database for Low Latency Transactional Apps
 
Cross-Platform Observability for Cloud Foundry
Cross-Platform Observability for Cloud FoundryCross-Platform Observability for Cloud Foundry
Cross-Platform Observability for Cloud Foundry
 

More from VMware Tanzu

More from VMware Tanzu (20)

What AI Means For Your Product Strategy And What To Do About It
What AI Means For Your Product Strategy And What To Do About ItWhat AI Means For Your Product Strategy And What To Do About It
What AI Means For Your Product Strategy And What To Do About It
 
Make the Right Thing the Obvious Thing at Cardinal Health 2023
Make the Right Thing the Obvious Thing at Cardinal Health 2023Make the Right Thing the Obvious Thing at Cardinal Health 2023
Make the Right Thing the Obvious Thing at Cardinal Health 2023
 
Enhancing DevEx and Simplifying Operations at Scale
Enhancing DevEx and Simplifying Operations at ScaleEnhancing DevEx and Simplifying Operations at Scale
Enhancing DevEx and Simplifying Operations at Scale
 
Spring Update | July 2023
Spring Update | July 2023Spring Update | July 2023
Spring Update | July 2023
 
Platforms, Platform Engineering, & Platform as a Product
Platforms, Platform Engineering, & Platform as a ProductPlatforms, Platform Engineering, & Platform as a Product
Platforms, Platform Engineering, & Platform as a Product
 
Building Cloud Ready Apps
Building Cloud Ready AppsBuilding Cloud Ready Apps
Building Cloud Ready Apps
 
Spring Boot 3 And Beyond
Spring Boot 3 And BeyondSpring Boot 3 And Beyond
Spring Boot 3 And Beyond
 
Spring Cloud Gateway - SpringOne Tour 2023 Charles Schwab.pdf
Spring Cloud Gateway - SpringOne Tour 2023 Charles Schwab.pdfSpring Cloud Gateway - SpringOne Tour 2023 Charles Schwab.pdf
Spring Cloud Gateway - SpringOne Tour 2023 Charles Schwab.pdf
 
Simplify and Scale Enterprise Apps in the Cloud | Boston 2023
Simplify and Scale Enterprise Apps in the Cloud | Boston 2023Simplify and Scale Enterprise Apps in the Cloud | Boston 2023
Simplify and Scale Enterprise Apps in the Cloud | Boston 2023
 
Simplify and Scale Enterprise Apps in the Cloud | Seattle 2023
Simplify and Scale Enterprise Apps in the Cloud | Seattle 2023Simplify and Scale Enterprise Apps in the Cloud | Seattle 2023
Simplify and Scale Enterprise Apps in the Cloud | Seattle 2023
 
tanzu_developer_connect.pptx
tanzu_developer_connect.pptxtanzu_developer_connect.pptx
tanzu_developer_connect.pptx
 
Tanzu Virtual Developer Connect Workshop - French
Tanzu Virtual Developer Connect Workshop - FrenchTanzu Virtual Developer Connect Workshop - French
Tanzu Virtual Developer Connect Workshop - French
 
Tanzu Developer Connect Workshop - English
Tanzu Developer Connect Workshop - EnglishTanzu Developer Connect Workshop - English
Tanzu Developer Connect Workshop - English
 
Virtual Developer Connect Workshop - English
Virtual Developer Connect Workshop - EnglishVirtual Developer Connect Workshop - English
Virtual Developer Connect Workshop - English
 
Tanzu Developer Connect - French
Tanzu Developer Connect - FrenchTanzu Developer Connect - French
Tanzu Developer Connect - French
 
Simplify and Scale Enterprise Apps in the Cloud | Dallas 2023
Simplify and Scale Enterprise Apps in the Cloud | Dallas 2023Simplify and Scale Enterprise Apps in the Cloud | Dallas 2023
Simplify and Scale Enterprise Apps in the Cloud | Dallas 2023
 
SpringOne Tour: Deliver 15-Factor Applications on Kubernetes with Spring Boot
SpringOne Tour: Deliver 15-Factor Applications on Kubernetes with Spring BootSpringOne Tour: Deliver 15-Factor Applications on Kubernetes with Spring Boot
SpringOne Tour: Deliver 15-Factor Applications on Kubernetes with Spring Boot
 
SpringOne Tour: The Influential Software Engineer
SpringOne Tour: The Influential Software EngineerSpringOne Tour: The Influential Software Engineer
SpringOne Tour: The Influential Software Engineer
 
SpringOne Tour: Domain-Driven Design: Theory vs Practice
SpringOne Tour: Domain-Driven Design: Theory vs PracticeSpringOne Tour: Domain-Driven Design: Theory vs Practice
SpringOne Tour: Domain-Driven Design: Theory vs Practice
 
SpringOne Tour: Spring Recipes: A Collection of Common-Sense Solutions
SpringOne Tour: Spring Recipes: A Collection of Common-Sense SolutionsSpringOne Tour: Spring Recipes: A Collection of Common-Sense Solutions
SpringOne Tour: Spring Recipes: A Collection of Common-Sense Solutions
 

Recently uploaded

%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
masabamasaba
 
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
masabamasaba
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
masabamasaba
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
Health
 

Recently uploaded (20)

%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
 
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
 
WSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go Platformless
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
 
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
 
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
 
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the past
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 

Automation and Culture Changes for 40M Subscriber Platform Operation

  • 1. Automation and Culture Changes for 40M Subscriber Platform Operation Yuichiro Sano Yahoo! JAPAN ysano@yahoo-corp.jp
  • 2. Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ About me Yuichiro Sano • Yahoo! JAPAN Platform Head of Cloud Platform Department • Responsible for operational support, promotion and management of on- premises platform (PCF k8s) at Yahoo! JAPAN 2
  • 3. Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
  • 4. Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ About Yahoo! JAPAN 4 3/03 67 A B HC CB C H H H BC C M CAD B B H CF ! F F J M F B B LD B B C F B B F B 2 1 ! A FHD CB ! H H B CH F J F J BCFAC HF 03 5 51 H 0 B E F ! CB H F C H . D B DCD H CB F H 6 CC . D B F + 6 0 7 B CDH A F H H F B HC B B J F H F H CF FJ H F BH CFA C B F H C F F 1 4 +B B F B ,C J F FJ
  • 5. Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Why Cloud Foundry? • Needed to modernize systems, reduce operational cost • Productivity. Needed an environment where Engineers could just focus on building services and not worry about the rest • Bosh • Buildpack model • Auto-scale needed • Wanted to leverage existing OpenStack environment 5
  • 6. Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ The Journey to Cloud Foundry 2015 Apr PoC start 2015 Oct Pilot Dev start 2016 Oct Pilot release 2017 Apr PCF GA (openStack) 2017 Oct openstack +1 cluster Today Openstack x2 vSphere x4 11,000 AI1,000 AI 3,000 AI 2018 Apr vSphere +2 cluster
  • 7. Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Current Cluster Spec 7 Sandbox Development Production cluster 2 2 6 HV 40 120 360 Diego cell - 300 900 App Instance - 8,000 11,000 Total request/sec - - 600,000 Log traffic log/sec - - 90,000
  • 8. Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Ecosystem around PCF 8 RDB MQ Splunk RBAC FaaS Repository KVS Object Storage Redis GitHub Enterprise
  • 9. Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Organizational Structure 9 System dep. Platform dep. Cloud Platform dep. Infrastructure dep. other Platform dep. Network IaaSLBother Platform dep.other Platform dep.
  • 10. Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Team Structure 10 Cloud Platform dep. PaaS CaaS
  • 11. Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Introducing PCF to the Organization Most people were new to PCF, so we did the following: • Held company seminars, hands-on workshops for > 1000 developers • Maintained Japanese language reference material and tutorials • Provided best practice guidance for various development use-cases • Used Pivotal consulting services to provide support for platform and SRE • Created a Service Broker to handle special YJ cookie offload 11
  • 12. Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Growing Pains • With the addition of clusters the operational burden and alert support increased • Along with the increase in #Apps, with insufficient clusters PCF was unable to accept new apps • With increasing log volume, our log management system became overloaded • We found App config mistakes (eg. timeouts) could affect the cluster (goRouter) • Time dealing with user support issues made it difficult to introduce stable operations policies 12
  • 13. Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Clearly Defining the Role of Each Team 13 CRE SRE PCF users 2,500 Engineers PaaS team Propose efficient usage methods and proactively resolve issues to ease the transition for engineers Platform as a Product. Focus on increasing system reliability by preventing failure and promoting automation
  • 14. Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ CRE team 14 Developer Counterpart l First line support for users l Contact users in case application mis-behaves Developers Education l Deliver workshops l Provide default CI/CD templates l Best practices l Applications Architecture support Service Integra8ons l Create service broker services l Support service team
  • 15. Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ SRE team 15 CRE counterpart l Implements CRE needs l Works closely with CRE for Capacity Planning Platform Updates l PCF Updates l Add new features around the platform l Logging, Metrics for users l Automate all the Things …. Platform Stability l Define SLO, SLA l Platform monitoring, alerts, etc... l Defining alerts, what, when, how ? l Capacity planning
  • 16. Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ SRE
  • 17. Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Automate all the Things 17 Install PCF Backup with BBR Cluster Integrity Update PCF Prometheus deployment Quota, Usage check Buildpack Update Logs forwarder Deployment IaaS layer check (blobstore,...) Smoke Test User/Space/Orgs Management etc ….
  • 18. Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ PCF Install & Update Pipeline 18 Deploy bosh upload Tile Install Tile 1 Day Deploy Opsman Create IaaS
  • 19. Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Buildpack Update Pipeline 19 Dev Update staging Update Production Update Sandbox Update Every month Buildpack x 8
  • 20. Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ PCF Backup 20 every-2am every-3am every-4am every-5am
  • 21. Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ PCF Smoke-Test 21 All environments Every 10 minutes
  • 22. Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Monitoring all the Things 22 App Instance Log Traffic Cell Capacity Avg response time cf push Duration Router Traffic CPU Usage Cluster Healthcheck Probe Mem Usage Log missing rate etc ….
  • 23. Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Cluster Dashboard 23 Routetr Rps Routetr Go Routines Routetr Latency
  • 24. Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Cell Capacity 24 Used Availlable Used Availlable
  • 25. Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ App Latency 25 Latency 99% 90% 50%
  • 26. Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Smoke Test 26 Cf Push Duration Cf Scale Duration Cf Start Duration Cf Delete Duration
  • 27. Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ BOSH Use BOSH for Logging Components • Alert & App logs are transferred to the platform • Using BOSH makes it easier to scale the nozzle and relay components 27 Internal Notifications App A App B Splunk loggregator Monitor nozzle Splunk nozzle Monitor relay Splunk relay easier to scale
  • 28. Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Log summary 28 Noisy Neighbor
  • 29. Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Missing Log Data Dashboard 29 100% 100% Every 1hour Every 1minute
  • 30. Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ 30
  • 31. Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Benefits of Platform Automation • Automation has reduced SRE team platform install & update work by 85% • Precision has increased and human error has been removed which has saved a lot of effort and time. • Anyone can now easily work with the platform so we are not dependent on individual “rockstars” 31
  • 32. Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Benefits of Focus on Observability • Time to identify problems has been radically reduced • Able to move from a Reactive to a Pro-active problem resolution approach • Contributed to a more sustainable, stress-free work environment 32
  • 33. Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Outcomes
  • 34. Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Outcomes Speed 75% Time to Deploy Weeks → 4hrs Workspace Provisioning Scaling Weeks → 5secs Time to Scale 600k TPS Reliability ZERO Downtime 2hr → 2mins VM Recovery
  • 35. Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Outcomes Security 5x Patch Frequency 1d → 4hr Time to Patch Productivity 11,000 AIs 3845 Apps in Production
  • 36. Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Future Plans
  • 37. Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ For improving the value of users • Adding further clusters x6 ⇒ x12 • Evolution of Log related architecture relay ⇒ queuing • All Platform Service Broker support 3PF ⇒ 1XXPF • Proactive Operations Able to safely take a nap on the job SRE J 37
  • 38. > Stay Connected. <Your CTA> <Related Session> #springone@s1p