Hailo, the taxi app, has served more than 5 million passengers in 15 cities and has taken fares of $100 million this year. I'm going to talk about how that rapid growth has been powered by a platform based on Cassandra and operational analytics and insights powered by Acunu Analytics. I'll cover some challenges and lessons learned from scaling fast!
5. What is Hailo?
• The world’s highest-rated taxi app – over 11,000 five-star reviews
• Over 500,000 registered passengers
• A Hailo hail is accepted around the world every 4 seconds
• Hailo operates in 15 cities on 3 continents, from Tokyo to Toronto, after
nearly 2 years of operation
ALL YOUR BASE 2013
6. The history
The story behind Cassandra and Acunu adoption at Hailo
7. Hailo launched in London in November 2011
• Launched on AWS
• Two PHP/MySQL web apps plus a Java backend
• Mostly built by a team of 3 or 4 backend engineers
• MySQL multi-master for single AZ resilience
• Get/create/update entity
• Analytics
• Text search
8. Why Cassandra?
• A desire for greater resilience – “become a utility”
Cassandra is designed for high availability
• Plans for international expansion around a single consumer app
Cassandra is good at global replication
• Expected growth
Cassandra scales linearly for both reads and writes
• Prior experience
I had experience with Cassandra and could recommend it
9. The path to adoption
• Largely unilateral decision by developers – a result of a startup
culture
• Replacement of key consumer app functionality, splitting up the
PHP/MySQL web app into a mixture of global PHP/Java services
backed by a Cassandra data store
• Launched into production in September 2012 – originally just
powering North American expansion, before gradually switching
over Dublin and London
10. One year on...
• Further decomposing functionality into a Go/Java SOA
• Migrating:
Entity databases to Cassandra
Analytics to Acunu
Search to Elasticsearch
15. Considerations for entity storage
• Do not read the entire entity, update one property and then write
back a mutation containing every column
• Only mutate columns that have been set
• This avoids read-before-write race conditions
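A minimal sketch of the difference, in Python. Everything here is an illustrative assumption rather than Hailo's code: `ColumnStore` is a toy in-memory stand-in for a column family with last-write-wins timestamps per column, and `Entity` is a hypothetical wrapper that remembers which properties were set.

```python
class ColumnStore:
    """Toy stand-in for a column family: row key -> {column: (value, timestamp)}."""

    def __init__(self):
        self.rows = {}

    def mutate(self, row_key, columns, timestamp):
        # Last write wins, resolved independently for each column.
        row = self.rows.setdefault(row_key, {})
        for name, value in columns.items():
            current = row.get(name)
            if current is None or timestamp >= current[1]:
                row[name] = (value, timestamp)

    def read(self, row_key):
        return {name: value for name, (value, _) in self.rows.get(row_key, {}).items()}


class Entity:
    """Hypothetical entity wrapper that tracks only the properties that were set."""

    def __init__(self, row_key):
        self.row_key = row_key
        self._dirty = {}

    def set(self, name, value):
        self._dirty[name] = value

    def save(self, store, timestamp):
        # The mutation contains only the columns that were set, nothing else.
        store.mutate(self.row_key, self._dirty, timestamp)
        self._dirty = {}


# Two writers change different properties of the same entity.
store = ColumnStore()
store.mutate("customer:1", {"name": "Ann", "phone": "111"}, timestamp=1)
writer_a = Entity("customer:1")
writer_a.set("name", "Anne")
writer_b = Entity("customer:1")
writer_b.set("phone", "222")
writer_a.save(store, timestamp=2)
writer_b.save(store, timestamp=3)
result = store.read("customer:1")  # both changes survive

# The pattern to avoid: read everything, change one property, write it all back.
naive = ColumnStore()
naive.mutate("customer:1", {"name": "Ann", "phone": "111"}, timestamp=1)
stale = naive.read("customer:1")                             # writer B reads
naive.mutate("customer:1", {"name": "Anne"}, timestamp=2)    # writer A updates
naive.mutate("customer:1", dict(stale, phone="222"), timestamp=3)  # B writes every column
naive_result = naive.read("customer:1")  # writer A's change to "name" is lost
```

In the second half, writer B's full-row write carries a stale copy of `name`, so it silently reverts writer A's update.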
20. Considerations for time series storage
• Choose row key carefully, since this partitions the records
• Think about how many records you want in a single row
• Denormalise on write into many indexes/views
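The points above can be sketched in Python as follows. The key layout (`all:`, `city:`, `driver:` prefixes with a day bucket), the event fields, and the in-memory `store` are all assumptions made for illustration; they are not taken from Hailo's schema.

```python
import bisect
from collections import defaultdict
from datetime import datetime, timezone


def row_keys(event):
    """Return one row key per index/view, each bucketed by day.

    Bucketing by day bounds how many records end up in a single row.
    """
    day = event["time"].strftime("%Y%m%d")
    return [
        f"all:{day}",
        f"city:{event['city']}:{day}",
        f"driver:{event['driver']}:{day}",
    ]


def write(store, event):
    """Denormalise on write: insert the record into every view it belongs to."""
    for key in row_keys(event):
        # Keep each row ordered by time, as columns in a wide row would be.
        bisect.insort(store[key], (event["time"], event["id"]))


store = defaultdict(list)
events = [
    {"id": "j1", "city": "london", "driver": "d1",
     "time": datetime(2013, 10, 1, 9, 0, tzinfo=timezone.utc)},
    {"id": "j2", "city": "london", "driver": "d2",
     "time": datetime(2013, 10, 1, 8, 0, tzinfo=timezone.utc)},
    {"id": "j3", "city": "dublin", "driver": "d1",
     "time": datetime(2013, 10, 2, 7, 30, tzinfo=timezone.utc)},
]
for e in events:
    write(store, e)

# Reading a view is a single-row lookup, already in time order.
london_day1 = [job_id for _, job_id in store["city:london:20131001"]]
```

A coarser bucket (for example a month) gives fewer, larger rows; a finer one gives more, smaller rows. Which is right depends on the expected write rate per key.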
27. Analytics
• With Cassandra we lost the ability to carry out analytics,
e.g. COUNT, SUM, AVG, GROUP BY
• We use Acunu Analytics to give us this ability in real time, for preplanned query templates
• It is backed by Cassandra and therefore highly available, resilient
and globally distributed
• Integration is straightforward
31. Events
• NSQ
• Analytics turns events and SQL-like queries into C* operations
• Cassandra stores raw events and intermediate results
32. Acunu Dashboards provides real-time visualization
• Events
• Alerts
• NSQ
• Analytics turns events and SQL-like queries into C* operations
• Cassandra stores raw events and intermediate results
34-40. 1 Define aggregate cubes
CREATE CUBE APPROX TOP(keyword)
WHERE browser, time GROUP BY time
Example cubes: count by day, count by hour of day, uniques by hashtag
2 New events update cubes
(cubes + raw events)
3 Rich instant queries over cubes
SELECT TOP(keyword) FROM table WHERE
browser = 'chrome' AND time BETWEEN..
GROUP BY d1, d2, ...
JOIN ... HAVING.. ORDER BY ..
4 Drilldown to raw events
5 Backfill new cubes using historic data
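The five steps (define cubes, update them as events arrive, query them, drill down to raw events, backfill new cubes) can be illustrated with a toy in-memory model in Python. This is only a sketch of the general idea of pre-aggregated counters: it keeps exact counts rather than the approximate TOP shown on the slide, it does not use Acunu's API or query language, and the `Cube`, `ingest`, and `add_cube` names and the event fields are hypothetical.

```python
from collections import Counter


class Cube:
    """A pre-aggregated count, grouped by a fixed tuple of dimensions."""

    def __init__(self, dimensions):
        self.dimensions = tuple(dimensions)
        self.counts = Counter()

    def update(self, event):
        # Step 2: each new event increments exactly one cell of the cube.
        self.counts[tuple(event[d] for d in self.dimensions)] += 1

    def count(self, **where):
        # Step 3: answer a query from the aggregates, without scanning raw events.
        return sum(
            n for key, n in self.counts.items()
            if all(key[self.dimensions.index(d)] == v for d, v in where.items())
        )


raw_events = []
# Step 1: define the cubes up front, for the queries you expect to ask.
cubes = {
    "by_day": Cube(["day"]),
    "by_browser_hour": Cube(["browser", "hour"]),
}


def ingest(event):
    raw_events.append(event)  # raw events are kept alongside the cubes
    for cube in cubes.values():
        cube.update(event)


def add_cube(name, dimensions):
    # Step 5: a cube defined later is backfilled from the stored raw events.
    cube = Cube(dimensions)
    for event in raw_events:
        cube.update(event)
    cubes[name] = cube


ingest({"day": "2013-10-01", "hour": 9, "browser": "chrome", "keyword": "taxi"})
ingest({"day": "2013-10-01", "hour": 9, "browser": "firefox", "keyword": "cab"})
ingest({"day": "2013-10-02", "hour": 10, "browser": "chrome", "keyword": "taxi"})

per_day = cubes["by_day"].count(day="2013-10-01")
chrome_total = cubes["by_browser_hour"].count(browser="chrome")

# Step 4: drill down from an aggregate cell to the raw events behind it.
drilldown = [e for e in raw_events if e["browser"] == "chrome" and e["hour"] == 9]

add_cube("by_keyword", ["keyword"])
taxi_total = cubes["by_keyword"].count(keyword="taxi")
```

The trade-off this illustrates is that queries are cheap because the work is done at write time, which is why the query templates need to be planned in advance.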
43. Use Cases
• Infrastructure and application monitoring
• Real-time A/B testing of app layout and incentives
• Real-time geo-view of supply/demand for drivers
• Several more in the pipeline!
45. We like Cassandra and Acunu
• Solid design
• HA characteristics
• Easy multi-DC setup
• Simplicity of operation
• With Acunu, rich queries again, easier denormalization
46. Lessons for successful adoption
• Have an advocate, sell the dream
• Learn the fundamentals, get the best out of Cassandra
• Invest in tools to make life easier
• Keep management in the loop, explain the trade-offs