You’ve built a new feature in your app that is ready to ship. Or is it? How can you be sure you haven’t introduced regressions in cases you forgot to test? What if your code crashes only on certain devices? Could the feature freeze up for a few users?
Shipping frequently with few or no functional, UX or performance issues or regressions is not easy, but it is a problem that has been solved before. Things get a lot more interesting when you need to keep the same quality bar with hundreds of pull requests going in every day and tens or hundreds of developers working on the same project. How do you test that your app still works at this kind of scale?
In this talk, you’ll learn about the different approaches we combined into a system used by hundreds of mobile engineers at Uber to test our native iOS and Android apps during development, at release and in production. We’ll talk about what did and what did not work for us on our journey of iterating frequently and continuously raising the quality bar.
2. ● Engineering manager @Uber, in Amsterdam
● 10+ years of software development (Skyscanner, Skype, JP Morgan alumni)
● Full-stack, iOS, Android, (Windows Phone)
Introduction
18. ● 600+ cities, 65+ countries, 6 continents
● 10 engineering offices (4x US, Amsterdam, Denmark, 2x India, Sofia, Vilnius)
● 18,000+ people, of which 2,500+ engineers & 400+ mobile engineers
Some Uber facts
19. Hundreds of mobile engineers?
Request a ride
Fare split
Cash
Uber for Business
Credit card rewards points
Promotions
Safety
Over 10 ways to pay
Scheduled rides
Drive with Uber
Uber Eats, Freight, Bike, Rental...
Experimentation
65+ countries,
600+ cities
Performance
Cash
Instant payments
Maps & navigation
uberPOOL
Driver incentives
App health
Developer tools
Networking
Feed cards
Driver experience
Driver recognition
Airport pickup
Uber Family
Beacon
Campaigns
Fraud
EATS app
Driver app
Freight app
Restaurants app
Other apps
Fleet app
20. What can “at scale” mean?
● More functionality
● More users & regions, locales
● More code
● More engineers
● More engineering offices & locations
● More automated testing
● More apps
21. ● More functionality → More bugs
● More users & regions, locales → Smaller/local bugs have bigger impact
● More code → Longer build times
● More engineers → Communication overhead
● More engineering offices & locations → Developer systems need to work 24/7
● More automated testing → Longer time to run tests
● More apps → The same problems repeating
What does “at scale” mean? → Problems
23. … at Scale … at Uber … Framework
Continuous testing...
24. A few things I found different @Uber compared to my previous experience:
● No formal QA role, testing teams or dedicated DevOps team
● Dedicated team(s) owning testing infrastructure & developer tooling
● More formal planning process
● No staging systems: test tenancies instead
● Blameless postmortem culture
Engineering culture
27. Continuous
Integration
arc diff
Phabricator diff
Local
validations
Code
reviewers
● Commit message validation (e.g. test plan, revert plan)
● Linting
Herald
rules
Rules like:
● “If certain files are touched, add {certain people} as reviewers”
● “If the files added contain a certain phrase, add a comment to the diff”
Build results
Do a build with:
● Linting
● Unit tests
● Static code analysis
Create a pull request
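The two Herald-style rules on this slide can be sketched in a few lines. This is a minimal illustration only, not Herald's actual rule engine or API: the `Diff` class, both helper functions, the file paths and the team name are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from fnmatch import fnmatch


@dataclass
class Diff:
    """Hypothetical, simplified view of a code review diff."""
    touched_files: list[str]
    added_text: str = ""
    reviewers: set[str] = field(default_factory=set)
    comments: list[str] = field(default_factory=list)


def add_reviewers_for_paths(diff: Diff, pattern: str, reviewers: set[str]) -> None:
    """Rule 1: if any touched file matches the glob pattern, add reviewers."""
    if any(fnmatch(path, pattern) for path in diff.touched_files):
        diff.reviewers.update(reviewers)


def comment_on_phrase(diff: Diff, phrase: str, comment: str) -> None:
    """Rule 2: if the added text contains the phrase, add a comment."""
    if phrase in diff.added_text:
        diff.comments.append(comment)


# Example diff with made-up paths and content.
diff = Diff(
    touched_files=["payments/CashFlow.swift", "README.md"],
    added_text="// TODO: remove before release",
)
add_reviewers_for_paths(diff, "payments/*", {"payments-team"})
comment_on_phrase(diff, "TODO", "Please link a task for this TODO.")
```

Keeping each rule as a small, side-effect-only function mirrors how such rules are usually configured: a condition on the diff plus one action.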
30. ● Our lint rules are extensive and have evolved since the early years
● NEAL: our language-agnostic linting platform (open sourced)
Linting: a first class citizen
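39. Test tenancy
[Diagram comparing two setups]
Staging & production systems: code (master) runs in a Staging environment with test accounts and in a separate Production environment with production accounts.
Production system with test tenancy: code (master) reaches a single production system via staged rollout; test accounts live in a test tenancy and production accounts in a production tenancy.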
48. Facts
● Bugs will be introduced that none of the previous tests catch
● With native apps
○ New builds can take days to ship due to the app store approval process
○ Users might not update their apps for a while.
Conclusion
● Every change should be revertable, remotely.
● Let’s use backend-controlled feature flags
Rolling out to production on mobile
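A backend-controlled feature flag can be sketched as follows. This is a minimal illustration under stated assumptions, not Uber's flag system: the `FeatureFlags` class, the JSON payload shape and the flag name `new_fare_split` are hypothetical. The point is that the shipped app only reads the flag, so a change can be switched off remotely without a new build.

```python
import json


class FeatureFlags:
    """Client-side view of flags whose values are controlled by the backend."""

    def __init__(self, defaults):
        # Defaults compiled into the app; used until the backend says otherwise.
        self._values = dict(defaults)

    def apply_backend_payload(self, payload):
        """Merge a JSON object of flag values from a hypothetical flag service.

        Malformed payloads are ignored so the app keeps its last known values.
        """
        try:
            remote = json.loads(payload)
        except json.JSONDecodeError:
            return
        if not isinstance(remote, dict):
            return
        for name, value in remote.items():
            if isinstance(value, bool):
                self._values[name] = value

    def is_enabled(self, name):
        # Unknown flags are treated as off, which is the safe default.
        return self._values.get(name, False)


flags = FeatureFlags({"new_fare_split": False})
flags.apply_backend_payload('{"new_fare_split": true}')   # backend turns it on
enabled_after_rollout = flags.is_enabled("new_fare_split")
flags.apply_backend_payload('{"new_fare_split": false}')  # remote revert
enabled_after_revert = flags.is_enabled("new_fare_split")
```

Defaulting unknown or unreadable flags to off means a failed fetch leaves users on the old, tested code path.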
50. Rollout can be risky if the population is large & there is no monitoring.
Staged rollout
● Control user exposure in early stages via a feature flag
● Monitor the impact on key business metrics at each stage
Rolling out to production (not just) on mobile
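One way to picture a staged rollout is a flag that exposes a growing percentage of users, with a metric check before each step up. The sketch below is an assumption-laden illustration, not the system described in the talk: the stage percentages, the 2% threshold, and the `bucket`, `is_exposed` and `next_stage` helpers are all hypothetical.

```python
import hashlib

STAGES = [1, 10, 50, 100]  # hypothetical exposure percentages per stage


def bucket(user_id: str, flag: str) -> int:
    """Map a user to a stable bucket in [0, 100) for a given flag."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def is_exposed(user_id: str, flag: str, percent: int) -> bool:
    """A user sees the feature if their bucket falls below the exposure level."""
    return bucket(user_id, flag) < percent


def next_stage(current_percent: int, baseline_rate: float,
               treatment_rate: float, max_relative_drop: float = 0.02) -> int:
    """Advance one stage if the key metric held up, otherwise roll back to 0%.

    baseline_rate and treatment_rate stand for a key business metric
    (for example, a trip completion rate) without and with the feature.
    """
    if baseline_rate > 0:
        relative_drop = (baseline_rate - treatment_rate) / baseline_rate
        if relative_drop > max_relative_drop:
            return 0
    later = [p for p in STAGES if p > current_percent]
    return later[0] if later else current_percent
```

Because the bucket is derived from a hash of the flag name and user ID, a user exposed at 10% stays exposed at 50%, so nobody flips back and forth between stages.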
56. Continuous testing process @Uber
Write code & land
to master
Pre-release
testing
Ship to users
● Staged rollout
● Monitoring & alerting
○ Crash reports
○ Business events
○ Performance
57. The mobile testing lifecycle
Write code & land
to master
Pre-release
testing
Ship to users
In production
Build cut Release
Staged rollout
& monitoring
Code & functional quality
checks
Functional & UX quality
checks, hotfixes
Are we done testing?
Rolled out
65. The mobile testing lifecycle
Write code & land
to master
Pre-release
testing
Ship to users
In production
Requirements &
planning
Product & engineering spec, with testing plan
Outages &
postmortems
Uh-oh...
“We did not do proper planning.”
“We did not test this edge case.”
“We did not have a test plan.”
66. The mobile testing lifecycle @Uber
Write code & land
to master
Pre-release
testing
Ship to users
In production
Requirements &
planning
Staged rollout
& monitoring
Code level quality checks Functional & UX quality
checks
Outages &
postmortems
Monitor & triage issues/alerts
Spec & testing plan
Build cut Release
Rolled out
67. … at Scale … at Uber … Tools & a
Framework
Continuous testing...
75. The Continuous Testing Pyramid
Manual
tests
UI tests
Unit Tests
Dog
fooding
Blameless
postmortems
Code reviews
Continuous
integration
Monitor
Alert
Triage
Things going wrong
for customers
Team owning testing infrastructure
To make all of this scale:
Improve
processes
& systems
All engineers
All engineers
All engineers
All teams
All employees
All teams
76. Continuous testing at Scale
Why do we test?
To minimize the business impact of mistakes
while maintaining good execution speed.
As you scale, iterate on the tools you use, your team
structure & processes to keep doing this.
77. Gergely Orosz
Engineering Manager, Uber Amsterdam
Thank you
Open sourced tools for more efficient testing
● uber.github.io
● Language-agnostic linting platform: NEAL
● Android
○ Nanoscope (tracing tool)
○ NullAway (static checks to avoid NullPointerExceptions)
○ OkBuck: use the Buck build system on a Gradle project
@GergelyOrosz
eng.uber.com