In this talk we are going to explore how Hailo evolved a monolithic LAMP stack into a micro-services platform based on Go. We are going to share the challenges we faced and some of the design patterns that helped us scale our system. We will also take a peek into our internal orchestration architecture and the tooling we built to automate and manage our platform.
2. Outline
• Intro to the Hailo world
• Our cloud journey and architecture evolution
• Platform design patterns and challenges
• Tooling
AWS User Group UK 2014
4.
• The world’s highest-rated taxi app, with almost 20,000 five-star reviews
• To date, Hailo has carried more than 11 million passengers
• Hailo has over 50,000 registered taxi drivers worldwide
6. November 2011: Hailo 1.0 Launch
Users: 1
Regions: eu-west-1
7. [Diagram: Hailo 1.0 in eu-west-1 with PHP, Java and MySQL]
Architecture specifics
• Monolithic PHP and Java applications
• Built and supported by 3-4 backend engineers
• City-specific environments
• MySQL master-master replication for resilience
• Multi-AZ since day 1
AWS specifics
Route 53, ELB, S3
8. Challenges
• Hard to develop new features
• Painful to push code changes and to support many independent city-specific environments
• Adding new instances and more capacity is a very slow and expensive process
• Unreliable and slow failover procedures
• Single points of failure (SPOF)
9. December 2013: Hailo 2.0
Users: 1 000 000+
Regions: eu-west-1, us-east-1, ap-northeast-1
10. Architecture specifics
• Micro-services architecture based on Go and Java
• Seamless service discovery, service-to-service communication, monitoring and instrumentation
• Everything is automated
• Ability to scale services up and down based on demand
AWS specifics
Route 53, ELB, S3, Auto Scaling, CloudFront, Redshift
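Service discovery lets services find each other by name rather than by fixed address. Below is a minimal sketch in Go, assuming a simple in-memory registry with round-robin lookup. The talk does not describe Hailo's actual discovery mechanism. The `Registry`, `Instance`, `Register`, `Deregister` and `Next` names and the addresses are hypothetical and only illustrate the pattern.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Instance is one running copy of a service (hypothetical fields).
type Instance struct {
	ID      string
	Service string
	Address string
}

// Registry is a minimal in-memory service registry: instances register on
// start-up, deregister on shutdown, and callers look services up by name.
type Registry struct {
	mu        sync.Mutex
	instances map[string][]Instance
	next      map[string]int // round-robin cursor per service
}

func NewRegistry() *Registry {
	return &Registry{instances: map[string][]Instance{}, next: map[string]int{}}
}

func (r *Registry) Register(in Instance) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.instances[in.Service] = append(r.instances[in.Service], in)
}

func (r *Registry) Deregister(service, id string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	list := r.instances[service]
	for i, in := range list {
		if in.ID == id {
			r.instances[service] = append(list[:i], list[i+1:]...)
			return
		}
	}
}

var ErrNoInstances = errors.New("no instances registered")

// Next returns the next instance of a service in round-robin order.
func (r *Registry) Next(service string) (Instance, error) {
	r.mu.Lock()
	defer r.mu.Unlock()
	list := r.instances[service]
	if len(list) == 0 {
		return Instance{}, ErrNoInstances
	}
	i := r.next[service] % len(list)
	r.next[service] = i + 1
	return list[i], nil
}

func main() {
	reg := NewRegistry()
	reg.Register(Instance{ID: "a", Service: "booking", Address: "10.0.1.10:8080"})
	reg.Register(Instance{ID: "b", Service: "booking", Address: "10.0.2.10:8080"})
	for i := 0; i < 3; i++ {
		in, _ := reg.Next("booking")
		fmt.Println(in.Address)
	}
}
```

A production registry would also need health-based expiry and replication across AZs. This sketch only shows the lookup-by-name contract.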
12. Challenges
• Hard to develop new features
Completing new features in days, not months
• Painful to push code changes
Seamless service deployment and ability to run multiple versions of a service
• Adding new instances and adding more capacity is slow
Our servers scale up and down based on demand
• Unreliable and slow failover procedures
Automated reaping of misbehaving services and AZ failover
• Single points of failure (SPOF)
Fault-tolerant distributed services architecture
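Automated reaping of misbehaving services could work along these lines. This Go sketch assumes instances are probed by periodic health checks and are reaped after a configurable number of consecutive failures. `HealthTracker`, `Observe` and the threshold are hypothetical, and the actual terminate-and-replace step is left out.

```go
package main

import "fmt"

// HealthTracker counts consecutive failed health checks per instance and
// flags instances that should be reaped (terminated and replaced).
type HealthTracker struct {
	threshold int
	failures  map[string]int
}

func NewHealthTracker(threshold int) *HealthTracker {
	return &HealthTracker{threshold: threshold, failures: map[string]int{}}
}

// Observe records one health-check result. It returns true when the instance
// has just failed `threshold` checks in a row; a healthy check resets the count.
func (h *HealthTracker) Observe(id string, healthy bool) bool {
	if healthy {
		delete(h.failures, id)
		return false
	}
	h.failures[id]++
	if h.failures[id] >= h.threshold {
		delete(h.failures, id) // forget the instance once it is scheduled for reaping
		return true
	}
	return false
}

func main() {
	h := NewHealthTracker(3)
	checks := []bool{true, false, false, false}
	for i, ok := range checks {
		if h.Observe("booking-1", ok) {
			fmt.Printf("check %d: reaping booking-1\n", i+1)
		}
	}
}
```

Requiring several consecutive failures, rather than one, is a common way to avoid reaping an instance over a single transient blip.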
15. Orchestration Layer Overview
• External orchestration services responsible for all environments
• Internal orchestration services responsible for the local environment only
16. External Orchestration Layer under the hood
• The external orchestration layer is built on the same platform and shares its distributed, scalable and resilient characteristics
• Each external orchestration service instance has a “global” view of our infrastructure
• Relies heavily on AWS STS (Security Token Service) to operate across different accounts and regions
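This sketch shows how a cross-account, cross-region "global" view might be assembled. For each account it builds the IAM role ARN that an STS AssumeRole call would take. It then fans out over every (account, region) pair concurrently and merges the results. `QueryFunc` is a hypothetical stand-in for the real STS and AWS API calls, which would normally go through the AWS SDK. The role name, account IDs and function names are likewise hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Target is one (account, region) pair managed by the external orchestration layer.
type Target struct {
	AccountID string
	Region    string
}

// RoleARN builds the IAM role ARN that would be passed to STS AssumeRole.
func RoleARN(accountID, roleName string) string {
	return fmt.Sprintf("arn:aws:iam::%s:role/%s", accountID, roleName)
}

// QueryFunc stands in for "assume the role, then query resources in region".
type QueryFunc func(roleARN, region string) ([]string, error)

// GlobalView queries every target concurrently and merges the results into one
// sorted list. Partial results are returned together with the first error seen.
func GlobalView(targets []Target, role string, query QueryFunc) ([]string, error) {
	var (
		mu   sync.Mutex
		wg   sync.WaitGroup
		all  []string
		errs []error
	)
	for _, t := range targets {
		wg.Add(1)
		go func(t Target) {
			defer wg.Done()
			items, err := query(RoleARN(t.AccountID, role), t.Region)
			mu.Lock()
			defer mu.Unlock()
			if err != nil {
				errs = append(errs, fmt.Errorf("%s/%s: %w", t.AccountID, t.Region, err))
				return
			}
			all = append(all, items...)
		}(t)
	}
	wg.Wait()
	sort.Strings(all)
	if len(errs) > 0 {
		return all, errs[0]
	}
	return all, nil
}

func main() {
	targets := []Target{
		{AccountID: "111111111111", Region: "eu-west-1"},
		{AccountID: "111111111111", Region: "us-east-1"},
		{AccountID: "222222222222", Region: "ap-northeast-1"},
	}
	fake := func(roleARN, region string) ([]string, error) {
		return []string{region + " via " + roleARN}, nil
	}
	view, err := GlobalView(targets, "orchestrator", fake)
	if err != nil {
		fmt.Println("error:", err)
	}
	for _, v := range view {
		fmt.Println(v)
	}
}
```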
17. Inside an environment: Auto Scaling and service provisioning
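The scaling decision inside an environment can be illustrated with a simple proportional rule. This sketch assumes a per-service policy with minimum and maximum instance counts and a target average utilization. It is loosely modelled on target-tracking scaling and is not necessarily how Hailo sizes its Auto Scaling groups. `ScalingPolicy` and `DesiredCapacity` are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// ScalingPolicy holds hypothetical per-service scaling bounds.
type ScalingPolicy struct {
	Min, Max   int
	TargetUtil float64 // desired average utilization, e.g. 0.5 = 50%
}

// DesiredCapacity scales the current instance count by observed/target
// utilization, rounds up, and clamps the result to [Min, Max].
func DesiredCapacity(p ScalingPolicy, current int, observedUtil float64) int {
	desired := current
	if p.TargetUtil > 0 {
		// Resize the fleet so average utilization moves back towards the target.
		desired = int(math.Ceil(float64(current) * observedUtil / p.TargetUtil))
	}
	if desired < p.Min {
		desired = p.Min
	}
	if desired > p.Max {
		desired = p.Max
	}
	return desired
}

func main() {
	p := ScalingPolicy{Min: 2, Max: 10, TargetUtil: 0.5}
	for _, util := range []float64{0.25, 0.5, 0.75, 1.0} {
		fmt.Printf("4 instances at %.0f%% utilization -> %d\n", util*100, DesiredCapacity(p, 4, util))
	}
}
```

Rounding up errs on the side of extra capacity. The Min bound keeps at least one instance per AZ available for failover when demand is low.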
18. Challenges
• Increased operational and deployment complexity: requires constant service resource utilization monitoring and manual shuffling
• Risk of performance impact due to “noisy neighbours”
• Suboptimal resource management
19. Micro-services + Containers + Scheduling
20. Challenges
• Increased operational and deployment complexity: requires constant service resource utilization monitoring and manual shuffling
On-demand provisioning of infrastructure resources and services based on SLAs
• Risk of performance impact due to “noisy neighbours”
Each service is isolated from the rest
• Suboptimal resource management
Services are grouped together in the most optimal way. We expect up to a 30% reduction in the operational cost of our worker services once we roll out this solution
Micro-services + Containers + Scheduling on AWS will be a dominant architecture pattern in the next few years
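Grouping containerised services onto instances is a bin-packing problem. The sketch below assumes first-fit decreasing, a standard bin-packing heuristic that approximates rather than guarantees an optimal packing. The talk does not say which scheduling algorithm Hailo uses. The services, resource figures and node size are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// Service is a hypothetical containerised service with resource requests.
type Service struct {
	Name     string
	CPUMilli int // CPU request in millicores
	MemMB    int // memory request in MiB
}

// Node is an instance with its remaining capacity and the services placed on it.
type Node struct {
	FreeCPU, FreeMem int
	Services         []string
}

// Pack uses first-fit decreasing: sort services by memory (largest first), then
// place each on the first node with enough free CPU and memory, opening a new
// node when none fits. Services larger than an empty node are returned as unplaced.
func Pack(services []Service, nodeCPU, nodeMem int) (nodes []*Node, unplaced []string) {
	sorted := append([]Service(nil), services...)
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].MemMB > sorted[j].MemMB })
	for _, s := range sorted {
		if s.CPUMilli > nodeCPU || s.MemMB > nodeMem {
			unplaced = append(unplaced, s.Name)
			continue
		}
		var target *Node
		for _, n := range nodes {
			if n.FreeCPU >= s.CPUMilli && n.FreeMem >= s.MemMB {
				target = n
				break
			}
		}
		if target == nil {
			target = &Node{FreeCPU: nodeCPU, FreeMem: nodeMem}
			nodes = append(nodes, target)
		}
		target.FreeCPU -= s.CPUMilli
		target.FreeMem -= s.MemMB
		target.Services = append(target.Services, s.Name)
	}
	return nodes, unplaced
}

func main() {
	services := []Service{
		{Name: "booking", CPUMilli: 500, MemMB: 1024},
		{Name: "pricing", CPUMilli: 250, MemMB: 512},
		{Name: "eta", CPUMilli: 1000, MemMB: 2048},
		{Name: "notify", CPUMilli: 250, MemMB: 256},
		{Name: "geo", CPUMilli: 500, MemMB: 1536},
	}
	nodes, unplaced := Pack(services, 2000, 4096)
	for i, n := range nodes {
		fmt.Printf("node %d: %v (free %dm CPU, %dMiB)\n", i, n.Services, n.FreeCPU, n.FreeMem)
	}
	fmt.Println("unplaced:", unplaced)
}
```

Placing the largest services first tends to leave fewer unusable gaps than placing them in arrival order. Denser packing means fewer instances, which is where any saving on worker-service cost would come from.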