SpringOne Platform 2019
SB Payment Service (part of the SoftBank Group) is developing a next-generation, in-house payment system using Spring and Pivotal Application Service.
In this session, we'll talk about how and why we introduced Pivotal Platform to the organization and provide an overview of the Spring Boot- and Spring Cloud-based architecture, as well as how we implemented CI/CD and designed for observability and high resilience. We'll also discuss the benefits introducing a platform has brought to both development and operations.
This system is now running in production. We'd like to share the things we've done to reduce incidents in production and make operations stable.
3. Credit Card Issuer
We issue "Softbank Card" credit cards to consumers.
Payment Aggregator
We provide a comprehensive payment platform that offers various online payment solutions.
Credit Card Acquirer
We are the only payment aggregator in Japan that accepts and processes transactions made with major brands (VISA / Mastercard / UnionPay).
Carrier Billing Provider
As Japan's leading carrier company, we let Softbank customers pay for online purchases via their phone bill.
4. Credit Card Issuer
We issue "Softbank Card" credit cards to consumers.
Payment Aggregator
We provide a comprehensive payment platform that offers various online payment solutions.
Credit Card Acquirer
We are the only payment aggregator in Japan that, as an acquirer, accepts and processes transactions made with major brands (VISA / Mastercard / UnionPay).
Carrier Billing Provider
As a leading Japanese carrier company, we let consumers pay for online purchases with their phone bill.
5. If you have any questions, we’d be
happy to answer them at the end of
our presentation
8. We left all development of our services to
outside vendors.
There were zero in-house engineers writing code.
The development environment was not ready.
22. Requirements for the new system
Before…
Every project was led by external vendors
(a long path from estimation, contract, and requirements definition to acceptance)
23. Requirements for the new system
Until now…
Each project was built with the help of development vendors
(a long road from estimation and requirements definition to acceptance)
Outsourcing made it impossible to deliver incrementally and quickly in an agile way.
24. Requirements for the new system
Speedy delivery and
Continuous improvement
through in-house development
34. Team structure and responsibility boundary
Platform Operators (2 people): Networking, Storage, Servers, Virtualization, O/S, Middleware, Runtime
35. Team structure and responsibility boundary
Platform Operators (2 people): Networking, Storage, Servers, Virtualization, O/S, Middleware, Runtime
Application Developers (4 people): Data, Application
42. [Diagram: Merchants X, Y, Z; API Gateway; Services A, B, C; Financial Institutions A, B, C]
46. Each app is deployed on PAS as a microservice.
[Diagram: Merchants X, Y, Z; API Gateway; Services A, B, C; Financial Institutions A, B, C]
47. Each app is implemented with Java and Spring Boot.
[Diagram: Merchants X, Y, Z; API Gateway; Services A, B, C; Financial Institutions A, B, C]
50. The merchants' systems and the financial institutions' systems are outside our control.
[Diagram: Merchants X, Y, Z; API Gateway; Services A, B, C; Financial Institutions A, B, C]
51. We introduced the Hystrix circuit breaker for inter-system communications.
[Diagram: Merchants X, Y, Z; API Gateway; Services A, B, C; Financial Institutions A, B, C; with Hystrix labels]
52. Without a circuit breaker, if a system outage happens at Financial Institution A …
[Diagram: Merchants X, Y, Z; API Gateway; Services A, B, C; Financial Institutions A, B, C]
53. Slow responses and timeouts
[Diagram: Merchants X, Y, Z; API Gateway; Services A, B, C; Financial Institutions A, B, C]
54. The failure propagates to Service A, blocking processes and possibly exhausting its threads.
[Diagram: Merchants X, Y, Z; API Gateway; Services A, B, C; Financial Institutions A, B, C]
55. The failure then propagates to the API Gateway, blocking processes and depleting its threads.
[Diagram: Merchants X, Y, Z; API Gateway; Services A, B, C; Financial Institutions A, B, C]
57. B and C are also affected by the failure of Financial Institution A.
[Diagram: Merchants X, Y, Z; API Gateway; Services A, B, C; Financial Institutions A, B, C]
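The propagation described in the preceding slides comes from caller threads waiting on a slow dependency. One common mitigation, which Hystrix also provides through call timeouts and thread-pool isolation, is to give each downstream system its own small thread pool and to stop waiting after a deadline. The following is a minimal sketch of that idea using only the JDK. It is not Hystrix code, and the class name, pool size, timeout, and target names are all hypothetical.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Sketch: one small thread pool per downstream system, plus a call timeout. */
final class IsolatedCaller implements AutoCloseable {
    private final Map<String, ExecutorService> pools = new ConcurrentHashMap<>();
    private final int threadsPerTarget;  // hypothetical pool size
    private final long timeoutMillis;    // hypothetical per-call timeout

    IsolatedCaller(int threadsPerTarget, long timeoutMillis) {
        this.threadsPerTarget = threadsPerTarget;
        this.timeoutMillis = timeoutMillis;
    }

    /** Runs the call on the target's own pool; returns the fallback on timeout or error. */
    <T> T call(String target, Callable<T> remoteCall, T fallback) {
        ExecutorService pool = pools.computeIfAbsent(
                target, name -> Executors.newFixedThreadPool(threadsPerTarget));
        Future<T> future = pool.submit(remoteCall);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException | ExecutionException e) {
            future.cancel(true);   // interrupt the stuck worker; the caller is released
            return fallback;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            future.cancel(true);
            return fallback;
        }
    }

    @Override
    public void close() {
        pools.values().forEach(ExecutorService::shutdownNow);
    }

    public static void main(String[] args) {
        try (IsolatedCaller caller = new IsolatedCaller(2, 200)) {
            // Institution A hangs; institution B answers immediately.
            String a = caller.call("institution-a",
                    () -> { Thread.sleep(5_000); return "ok"; }, "fallback");
            String b = caller.call("institution-b", () -> "ok", "fallback");
            System.out.println(a + " " + b);
        }
    }
}
```

Because the slow call runs on a pool dedicated to one target, calls to other targets keep their own threads. Note that this sketch queues excess work in the pool rather than rejecting it, which a production bulkhead would usually bound.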
58. With a circuit breaker, if a system outage happens at Financial Institution A …
[Diagram: Merchants X, Y, Z; API Gateway; Services A, B, C; Financial Institutions A, B, C; with Hystrix labels]
59. The circuit breaker prevents the failure from propagating.
There is no need to worry about the effect on other financial institutions.
[Diagram: Merchants X, Y, Z; API Gateway; Services A, B, C; Financial Institutions A, B, C; with Hystrix labels]
60. The circuit breaker adds fault tolerance and resiliency to the app.
[Diagram: Merchants X, Y, Z; API Gateway; Services A, B, C; Financial Institutions A, B, C; with Hystrix labels]
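To make the circuit breaker behavior concrete, here is a minimal sketch of the pattern in plain Java. It is not Hystrix and does not use Hystrix's API: Hystrix decides when to open based on error rates over a rolling window, whereas this sketch simply counts consecutive failures. The class name, threshold, and cool-down value are hypothetical.

```java
import java.util.function.Supplier;

/**
 * Minimal circuit breaker sketch (NOT Hystrix): opens after a number of
 * consecutive failures and returns a fallback until a cool-down has passed.
 */
final class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;  // hypothetical tuning parameter
    private final long openMillis;       // hypothetical cool-down period
    private int consecutiveFailures = 0;
    private long openedAt = 0L;
    private State state = State.CLOSED;

    SimpleCircuitBreaker(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    synchronized State state() {
        return state;
    }

    /**
     * Runs the remote call, or returns the fallback while the circuit is open.
     * Synchronized for brevity; a real breaker would not hold a lock during the call.
     */
    synchronized <T> T call(Supplier<T> remoteCall, Supplier<T> fallback) {
        if (state == State.OPEN) {
            if (System.currentTimeMillis() - openedAt < openMillis) {
                return fallback.get();   // fail fast without touching the remote system
            }
            state = State.HALF_OPEN;     // cool-down over: allow one trial call
        }
        try {
            T result = remoteCall.get();
            consecutiveFailures = 0;     // a success closes the circuit again
            state = State.CLOSED;
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
                state = State.OPEN;
                openedAt = System.currentTimeMillis();
            }
            return fallback.get();
        }
    }

    public static void main(String[] args) {
        SimpleCircuitBreaker breaker = new SimpleCircuitBreaker(2, 60_000);
        Supplier<String> failing = () -> { throw new IllegalStateException("timeout"); };
        for (int i = 0; i < 3; i++) {
            System.out.println(breaker.call(failing, () -> "fallback") + " " + breaker.state());
        }
    }
}
```

Once the circuit is open, callers get the fallback immediately, so their threads are not held while the failing institution recovers.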
62. [Diagram: Merchants X, Y, Z; Notification Gateway; Receivers A, B, C; Financial Institutions A, B, C; with Hystrix labels]
63. We introduced RabbitMQ and Spring Cloud Stream for asynchronous processing.
[Diagram: Merchants X, Y, Z; Notification Gateway; Receivers A, B, C; Financial Institutions A, B, C; with Hystrix labels]
65. When a failure happens in the merchant system
[Diagram: Merchants X, Y, Z; Notification Gateway; Receivers A, B, C; Financial Institutions A, B, C; with Hystrix labels]
66. The message will be diverted to a “Dead Letter Queue” and requeued later.
[Diagram: Merchants X, Y, Z; Notification Gateway; Receivers A, B, C; Financial Institutions A, B, C; with Hystrix labels]
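The dead-letter flow can be illustrated with a small in-memory sketch. It does not use RabbitMQ or Spring Cloud Stream APIs (in RabbitMQ, dead-lettering is normally configured on the broker through a dead-letter exchange). The class, queue names, and message strings below are hypothetical and only model the idea: a failed delivery is parked in a second queue and moved back later for another attempt.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;

/** In-memory sketch of "divert to a dead letter queue, requeue later". */
final class DeadLetterSketch {
    final Deque<String> mainQueue = new ArrayDeque<>();
    final Deque<String> deadLetterQueue = new ArrayDeque<>();

    /**
     * Attempts to deliver every queued message once.
     * Failed deliveries are moved to the dead letter queue.
     * Returns the number of messages delivered successfully.
     */
    int drain(Predicate<String> deliver) {
        int delivered = 0;
        while (!mainQueue.isEmpty()) {
            String message = mainQueue.poll();
            if (deliver.test(message)) {
                delivered++;
            } else {
                deadLetterQueue.add(message);
            }
        }
        return delivered;
    }

    /** Moves all dead-lettered messages back to the main queue for a retry. */
    void requeueDeadLetters() {
        while (!deadLetterQueue.isEmpty()) {
            mainQueue.add(deadLetterQueue.poll());
        }
    }

    public static void main(String[] args) {
        DeadLetterSketch queues = new DeadLetterSketch();
        queues.mainQueue.add("payment-result-1");
        boolean[] merchantUp = {false};
        queues.drain(message -> merchantUp[0]);   // merchant is down: message is dead-lettered
        merchantUp[0] = true;                     // merchant recovers
        queues.requeueDeadLetters();
        System.out.println(queues.drain(message -> merchantUp[0]));
    }
}
```

The point of the design is that a merchant outage does not lose the notification or block the sender; the message simply waits until a later retry succeeds.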
67. Even if an outage happens at a merchant, the circuit breaker prevents the failure from propagating.
[Diagram: Merchants X, Y, Z; Notification Gateway; Receivers A, B, C; Financial Institutions A, B, C; with Hystrix labels]
101. Dev
In Grafana, we can always check all the metrics we have ever collected from logs:
・CPU
・JVM memory (per area)
・Threads
・GC (frequency, time)
・Classloader
104. No outsourcing to an external monitoring center
[Diagram labels: Ops, Dev]
Set route/receiver using org_name and severity as keys:
org_name: OrgA, severity: fatal
org_name: OrgA, severity: /.*/
severity: fatal
severity: /.*/
Twilio call for emergencies (24-hour support)
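The routing rule on this slide (pick a receiver from the org_name and severity labels, with regular expressions such as /.*/ acting as catch-alls) can be sketched as first-match routing. This is an illustrative model in plain Java, not the configuration of any particular alerting tool. The receiver names and the rule order are assumptions, since the slide only shows the matcher pairs and the Twilio call for emergencies.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Sketch: first-match alert routing keyed on labels such as org_name and severity. */
final class AlertRouter {
    private static final class Route {
        final Map<String, Pattern> matchers = new LinkedHashMap<>();
        final String receiver;

        Route(Map<String, String> labelRegexes, String receiver) {
            labelRegexes.forEach((label, regex) -> matchers.put(label, Pattern.compile(regex)));
            this.receiver = receiver;
        }

        boolean matches(Map<String, String> labels) {
            // Every matcher must fully match; a missing label is treated as "".
            return matchers.entrySet().stream().allMatch(
                    m -> m.getValue().matcher(labels.getOrDefault(m.getKey(), "")).matches());
        }
    }

    private final List<Route> routes = new ArrayList<>();
    private final String defaultReceiver;

    AlertRouter(String defaultReceiver) {
        this.defaultReceiver = defaultReceiver;
    }

    /** Adds a route; routes are evaluated in insertion order. */
    AlertRouter route(Map<String, String> labelRegexes, String receiver) {
        routes.add(new Route(labelRegexes, receiver));
        return this;
    }

    String receiverFor(Map<String, String> labels) {
        for (Route route : routes) {
            if (route.matches(labels)) {
                return route.receiver;
            }
        }
        return defaultReceiver;
    }

    /** Routes mirroring the four matcher pairs on the slide; receiver names are made up. */
    static AlertRouter slideExample() {
        return new AlertRouter("ops-chat")
                .route(Map.of("org_name", "OrgA", "severity", "fatal"), "orga-phone-call")
                .route(Map.of("org_name", "OrgA", "severity", ".*"), "orga-chat")
                .route(Map.of("severity", "fatal"), "ops-phone-call")
                .route(Map.of("severity", ".*"), "ops-chat");
    }

    public static void main(String[] args) {
        System.out.println(slideExample()
                .receiverFor(Map.of("org_name", "OrgA", "severity", "fatal")));
    }
}
```

Ordering the specific rules before the catch-all ones is what lets a fatal alert trigger a phone call while lower severities go to a quieter channel.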
105. Link to the Grafana dashboard
Detect RabbitMQ Dead Letter Queue
Call
132. Case 2: Detecting an abnormal trend
Viewed over a long time period, processing was delayed in a specific time slot.
[Graph labels: 22:00, 22:00, 22:00; "No problem"]
133. Case 2: Detecting an abnormal trend
No transactions finished.
Transactions might have been locked.
134. Case 2: Detecting an abnormal trend
[Diagram labels: On-premise, Existing app, Apps on PCF]
Delays in a specific time slot
We reconsidered the degree of parallelism and the timeout settings.
135. Case 2: Detecting an abnormal trend
We improved our system by detecting slight abnormalities.
139. Before / After
Category             Item                     Before              After
Release improvement  Release work             Manual work         One click
Release improvement  Release quality          Human error occurs  No mistakes
Release improvement  Release time             45 min              5 min
Use of cloud         Scale-out operation      Manual work         One click
Use of cloud         Container orchestration  -                   Leave it to the platform
Use of cloud         Auto-restart             Self-made tools     Leave it to the platform
141. A platform cannot be built by relying only on
outsourced vendors.
It’s possible to build and operate a platform by
taking ownership in-house.
A powerful platform allows a small engineering
team to focus on application development.