5. Challenges
• Productivity: Release cycles are long; lack of innovation
• Trust: Siloed teams, lack of coordination
• Scaling is hard
• Lack of agility
• Complexity: Changes are complex, increasing risk
• Business growth: How do we handle business growth without interrupting service?
These challenges span both business and technical dimensions.
6. History
19/10/20 XL Axiata Public 6
• 2016: SOA, physical servers
• 2018: Microservices, containers on VMs
• 2020: Cloud path, GKE On-Prem
7. History - 2016
• SOA
• Physical servers
• Legacy infrastructure
• Afraid of making changes
• Dependency on 3rd parties
• Large operations team
8. History - 2018
• Multi-cloud
• Easy to operate
• No downtime
• Self-service
10. Our Roadmap
Timeline: Initiate (2018) → Enable (2019) → Expansion (2020) → Target (2021)
Phase 1
• Technology adaptation for in-house applications
• CI/CD pipeline
• DevOps culture
Phase 2
• Implementation of a Kubernetes platform for new in-house development and Docker-based applications from partners
Target architecture: application development runs on top of container technology in the cloud.
15. CI/CD
Pipeline stages: Source → Build → Deploy → Regression Test → Deploy & Publish, with Dev Docs and Code Quality checks alongside.
#1 Design: Low-level documentation stored in Confluence, linked to Jira Software for a detailed product backlog
#2 Development: Developers build using various SDKs
#3 Code Commit: A developer commit triggers the Jenkins pipeline
#4 Code Scanner: Scan and analyse static code
#5 Build Package: Build the package and store the deployment artefact
#6 Deploy on QA: Deploy services to the SIT environment
#7 Regression Testing: Run regression test scenarios for the deployed service and its dependencies
#8 Go Live: (optionally) CCB and manual approval, then deploy to production
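The gating behind steps #3-#8 can be sketched in a few lines of Python. This is a minimal illustration of the flow, not the actual Jenkins configuration: stage names follow the slide, and the callables are hypothetical stand-ins for the real pipeline steps.

```python
def run_pipeline(stages, approve_production=True):
    """Run stages in order; stop at the first failure.

    `stages` is a list of (name, callable) pairs where each callable
    returns True on success. The final 'Go Live' stage additionally
    requires `approve_production` (the optional CCB / manual approval).
    """
    completed = []
    for name, step in stages:
        if name == "Go Live" and not approve_production:
            return completed, f"blocked at {name}: approval missing"
        if not step():
            return completed, f"failed at {name}"
        completed.append(name)
    return completed, "success"

if __name__ == "__main__":
    # Hypothetical stages mirroring steps #4-#8 above; each lambda
    # stands in for a real build/test/deploy action.
    stages = [
        ("Code Scanner", lambda: True),
        ("Build Package", lambda: True),
        ("Deploy on QA", lambda: True),
        ("Regression Testing", lambda: True),
        ("Go Live", lambda: True),
    ]
    print(run_pipeline(stages))
```

A failed regression test stops the run before "Go Live", which mirrors how a commit cannot reach production without passing every earlier gate.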
Load-testing tool: Siege
16. CI/CD
Before: Code Commit → Binary Build → Deploy to test env → Manual Test → Deploy to stage env → Sanity → Promote to Production
After: Code Commit → Unit Test → Build Binary and Docker Image → DAST (Security Test) → Deploy to test env → Automation Test → Deploy to stage env → CCB Inform → Performance Test → Sanity Test → Promote to Production
17. Operations Model
• Platform as a Product
• Path to Production
• Monitoring & Metrics
• Capacity Planning
• Emergency Response
19. Platform as a Product
Treating the platform like a product that evolves over time to meet the needs of end users. This is accomplished by applying a combination of Lean, User-Centered Design, and XP practices.
1 – Chaotic: No documentation for product strategy/roadmap
2 – Defined: A backlog has been created
3 – Managed: The team works through the backlog in a consistent manner
4 – Measured: Customer satisfaction surveys are sent regularly
5 – Continuous Improvement: Customers participate in an ongoing feedback loop to validate product-market fit
20. Path to Production
Updates to the platform and to application workloads running on the platform are unencumbered by legacy governance. The release process incorporates cloud-native architecture, tooling, and practices.
1 – Chaotic: No documented path to production; teams are expected to use the legacy process
2 – Defined: Documentation exists, but with exceptions for a few processes
3 – Managed: Documentation exists and is applied consistently; there is some automation for non-production environments
4 – Measured: The release process is fully automated, with high trust in test coverage and the pipeline
5 – Continuous Improvement: Metrics are captured for every release, including failure rate; teams collaborate to evolve the path to production
21. Monitoring and Metrics
Establishing desired service behavior, measuring how the service is actually behaving, and correcting discrepancies. Examples: response latency, error or unanswered query rate, peak utilization of resources.
1 – Chaotic: No SLIs/SLOs defined; users report issues, and the platform team is not aware until they are reported
2 – Defined: Some SLIs/SLOs defined, with clearly defined ownership of monitoring
3 – Managed: Monitoring provides visibility and appropriate alerts are sent
4 – Measured: Monitoring and alerting strategies are adjusted in response to SLO violations
5 – Continuous Improvement: The team iterates on new monitoring graphs and proactively tweaks the alerting strategy to align with SLOs and minimize false alerts
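The SLI/SLO mechanics above can be sketched with an availability SLI (fraction of well-served requests) checked against an SLO target. The request counts and the 99.9% target are hypothetical illustrations, not XL Axiata's actual numbers.

```python
def availability_sli(total_requests, failed_requests):
    """SLI = good events / total events."""
    good = total_requests - failed_requests
    return good / total_requests

def slo_violated(sli, slo=0.999):
    """True when the measured SLI falls below the SLO target."""
    return sli < slo

# Hypothetical month of traffic: 1.2k failures out of 1M requests.
sli = availability_sli(total_requests=1_000_000, failed_requests=1_200)
print(f"SLI = {sli:.4%}, SLO violated = {slo_violated(sli)}")
```

A violation like this is what levels 4 and 5 react to: adjusting alerting thresholds so the team hears about the breach before users do.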
22. Capacity Planning
Projecting future demand and ensuring that a service has enough capacity in appropriate locations to satisfy that demand. Examples: quarterly futuremon projections, clusterizer scenarios.
1 – Chaotic: No forecasting of future demand; oversized deployment of resources (both hardware and GKE components); no tracking of historical resource utilization
2 – Defined: Credible forecasts for 2 quarters based on customer needs
3 – Managed: Capacity is actively monitored; historical trends are tracked and analyzed to inform the forecasting model; credible forecasts for 4-6 quarters
4 – Measured: Mature forecasting based on user input, business need, historical trend analysis, and load testing
5 – Continuous Improvement: A feedback loop incorporates the accuracy of past demand forecasts and the needs of the business
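The historical-trend forecasting at level 3 can be sketched as a least-squares line fitted to past quarterly utilization and extrapolated forward. The utilization figures are hypothetical; real forecasting would also fold in business input and load-test results, as level 4 describes.

```python
def linear_forecast(history, quarters_ahead):
    """Fit a least-squares line through `history` and extrapolate it."""
    n = len(history)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(history) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, history))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    # Project the next `quarters_ahead` points past the history.
    return [intercept + slope * (n + k) for k in range(quarters_ahead)]

# Hypothetical cluster utilization (%) over four quarters.
usage = [40, 45, 52, 58]
print(linear_forecast(usage, 2))
```

Comparing each quarter's forecast against what actually happened is the feedback loop that level 5 adds.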
23. Emergency Response
Noticing and responding effectively to service failures in order to preserve the service's conformance to SLAs. Examples: on-call rotations, probers, dip detection, primary/secondary/escalation, playbooks, wheel of misfortune, prod VPN rooms.
1 – Chaotic: No documented plan for emergency response; the same incidents keep happening
2 – Defined: Documentation exists; the postmortem process is defined with action items
3 – Managed: Good follow-through on postmortem action items, and root causes are correctly identified
4 – Measured: MTTR is measured
5 – Continuous Improvement: Action items are closed in a timely manner; periodic review of alert and incident patterns
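The MTTR measurement at level 4 is a simple average over incident durations. A minimal sketch, with hypothetical incident timestamps:

```python
from datetime import datetime, timedelta

def mttr(incidents):
    """Mean time-to-restore across (started, resolved) pairs."""
    durations = [resolved - started for started, resolved in incidents]
    return sum(durations, timedelta()) / len(durations)

# Hypothetical incident log: a 45-minute and a 90-minute outage.
incidents = [
    (datetime(2020, 10, 1, 9, 0), datetime(2020, 10, 1, 9, 45)),
    (datetime(2020, 10, 5, 22, 0), datetime(2020, 10, 5, 23, 30)),
]
print(mttr(incidents))
```

Tracking this number per quarter is what turns postmortems into the trend reviews that level 5 calls for.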
24. Summary
An idea in the morning can ship by evening.

Stage  | New Business     | Dev Code | Testing            | Staging        | CCB                                | Deploy to Production
Actor  | Business, IT Dev | IT Dev   | IT Dev, IT Testing | IT Dev, IT Ops | Business, IT Dev, IT Ops, Sol Arch | IT Dev, IT Ops
Before | x days           | x days   | 3-4 days           | 1 day          | Next CCB                           | Midnight (downtime)
After  | x days           | x days   | hours              | minutes        | No CCB                             | Minutes, anytime

Much of this lead time is eliminated, and manual effort automated, through the microservices platform and automated testing.