The document discusses Aviran Mordo's presentation on Wix's journey towards continuous delivery. Some key points:
- Wix has transitioned from traditional waterfall development to continuous delivery, deploying changes around 60 times per day.
- This was enabled by adopting DevOps practices like test-driven development, feature toggles, A/B testing, automated deployments, and monitoring.
- Tools like App-Info, New Relic, and custom deployment tools were crucial for implementing continuous delivery at Wix's scale across multiple data centers and cloud providers.
- Transitioning required cultural changes, empowering developers, and embracing risk and failure to improve continuously. Wix now develops and replaces infrastructure
1. Aviran Mordo
Head of Engineering
@aviranm
linkedin.com/in/aviran
aviransplace.com
The Road to
Continuous Delivery
3. Wix In Numbers
• Over 66,000,000 users
• Static storage is >2PB of data
• 3 Data centers + 3 Clouds (Google, Amazon)
• 2B HTTP requests/day
• 1000 people work at Wix
11. Where We Were
• Working traditional waterfall
• With fear of change
• With low product quality
• With slow development velocity
• With traditional enterprise development lifecycle
- Three months of “VERSION” development and QA
- Six months of crisis mode stabilizing the system
13. Production System
Approach to Production
• Build only what is needed
• Stop if something goes wrong
• Eliminate anything which does not add value
Philosophy of Work
• Respect for workers
• Full utilization of workers' capabilities
• Entrust workers with responsibility & authority
Taiichi Ohno (1912-1990)
14. Seeing Waste
Seven Wastes of Manufacturing
• Inventory
• Extra Processing
• Overproduction
• Transportation
• Waiting
• Motion
• Defects
Seven Wastes of Software Development
• Partially Done Work
• Paperwork
• Extra Features
• Building the wrong thing
• Waiting for information
• Task switching
• Defects
16. Lean Product development
Top 5 Most-Used Commands in Microsoft Word
• Paste
• Save
• Copy
• Undo
• Bold
Paste alone accounts for more than 11% of all commands used and has more than twice the usage of the #2 entry on the list, Save.
Together, these five commands account for 32% of the total command usage.
17. Scaling challenges – Product
Product Minimum Viable Product (MVP)
• Does MVP meet your product standards?
• What about tooltips, help, first-time UX, etc.?
• And that can win in an A/B test …
To Be Implemented
18. Get out of thought land
• The law of failure
Most new “its” will fail even if they are flawlessly executed
• Invest less, get less attached, and it becomes easier to admit failure
Data beats opinions - let the customer decide
Make sure you are building the right it before you build it right
Quick
Feedback
20. Risk
• Waterfall - minimize number of deployments
• CD - minimize number of changes and impact in $$
Risk = #deployments * chance of something going wrong (~ number of changes) * impact of something going wrong in $$
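The risk formula on this slide can be tried out with a few numbers. The sketch below is only an illustration: the function name, the linear "chance grows with number of changes" model, and every figure in it are assumptions made up for the example, not Wix data.

```python
def deployment_risk(deployments, changes_per_deployment,
                    failure_chance_per_change, impact_usd):
    """Risk = #deployments * chance of something going wrong * impact in $$.

    Assumption: the chance of failure grows linearly with the number of
    changes in a deployment and is capped at 1.0.
    """
    chance = min(1.0, changes_per_deployment * failure_chance_per_change)
    return deployments * chance * impact_usd


# Hypothetical numbers: the same 50 changes shipped two ways.
# Waterfall: one big deployment, a failure hits everyone at once.
waterfall = deployment_risk(1, 50, 0.01, 100_000)
# CD: 50 single-change deployments, each failure is small and contained.
continuous = deployment_risk(50, 1, 0.01, 1_000)
```

With these made-up inputs the number of deployments goes up, yet the total risk goes down, because each deployment carries fewer changes and a smaller impact.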
21. Small Development Iterations
• No Waterfall
• No Scrum
• No Iterations
• No long documents
• Build something small
• When it is ready, deploy it
- Measure it
- Then fix it
- Repeat, until Dev, Product and Customers are happy
23. What Is The Common Denominator?
• Product manager
• Project manager
• QA
• Operations
• DBA
24. CD is culture & mindset
• Trust the developers
- Empower developers to change production
- Developer knows his system best
• Automation as a default choice
- No more “is it worth automating?”
- Everything should be automated
• Welcome to the twilight zone
- Product/Dev/QA boundaries are going down
- Everyone needs to care about everything
- Less formality: corridor in, meeting room out
25. Dev Centric Culture –
Involve the Developer
• Product definition (with product)
• Development (with architect)
• Testing (with QA developers)
• Deployment / Rollback (with ops)
• Monitoring / BI (with BI team)
• DevOps – to enable deployment and rollback, fully automated
Developer
Product
QA
Management
Operation
BI
Support Circle
26. Continuous Delivery principles
• The process for releasing/deploying software MUST be repeatable and reliable
• Automate everything!
• If something's difficult or painful, do it more often
• Keep everything in source control
• Done means “released”
• Built-in quality
• Everybody has responsibility for the release process
• Improve continuously
27. Test Driven Development
• No new code is pushed to Git without being fully tested
- We currently have over 40,000 automated tests
• Before fixing a bug first write a test to reproduce the bug
• Cover legacy (untested) systems with Integration tests
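The "write a test that reproduces the bug first" rule can be sketched as follows. Everything here is hypothetical: `normalize_site_name` and the bug it had (a naive `replace(" ", "-")` that kept surrounding whitespace as dashes) are invented for illustration and are not Wix code.

```python
import unittest


def normalize_site_name(name: str) -> str:
    """Turn a display name into a URL-friendly site name.

    Fixed version. The hypothetical buggy version was
    name.lower().replace(" ", "-"), which turned "  My Site " into
    "--my-site-".
    """
    return "-".join(name.lower().split())


class NormalizeSiteNameRegressionTest(unittest.TestCase):
    def test_surrounding_whitespace_is_ignored(self):
        # Written before the fix: it fails against the buggy version,
        # which proves the bug is reproduced, and passes after the fix.
        self.assertEqual(normalize_site_name("  My Site "), "my-site")

    def test_plain_name_still_works(self):
        self.assertEqual(normalize_site_name("Shop"), "shop")
```

The test stays in the suite afterwards, so the same bug cannot quietly come back.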
28. What people think of TDD
• TDD slows down development
• With TDD we write more code (product + test code).
• TDD has no significant impact on quality
30. TDD Actual impact on development
• We develop products faster
• Removes fear of change
• Easier to enter someone else’s project
• Do we still need QA? (Yes, they code automation tests)
- We don’t have QA for back-end applications
• Writing a feature is 10-30% slower, with 45-90% fewer bugs
• 50% faster to reach production.
• Considerably less time to fix bugs (almost no need for a debugger)
31. Guidelines for successful TDD
• Tests should run on project checkout to a random computer.
• Tests should be debugged on a developer’s machine
• Tests should run fast
• Tests have to be readable – They are the project’s specs
• Fixture is evil!
33. Is Refactoring Rework?
Absolutely NOT !
• Refactoring is the outcome of learning
• Refactoring is the cornerstone of improvement
• Refactoring builds the capacity to change
• Refactoring doesn’t cost, it pays
34. Refactoring
Refactor from the inside out
• Small iterations with tests
• Refactor small methods; make sure the tests don’t break
• Deploy often
Rewrite from the outside in
• Write from scratch (one piece at a time)
• Code duplication is sometimes needed (temporarily)
• Protected by a feature toggle
Before refactoring, cover everything with tests
- Legacy code is usually covered by integration tests
38. Feature Toggles
• Everyone develops on the Trunk
• Every piece of code can get to production at any time
• Unused new code can go to production – no harm done
• Operational new code goes with a guard – use new or old code by feature toggle
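A guard of the kind described above might look like this minimal sketch. The `FeatureToggles` class, the toggle name `new_price_format`, and both render functions are assumptions made for the example; the slides do not show Wix's actual toggle API.

```python
class FeatureToggles:
    """Hypothetical in-memory toggle store; a real one would be configurable at runtime."""

    def __init__(self, enabled=None):
        self._enabled = set(enabled or [])

    def is_on(self, name: str) -> bool:
        return name in self._enabled


def render_price_old(amount: float) -> str:
    # Existing behaviour, still the default in production.
    return f"${amount}"


def render_price_new(amount: float) -> str:
    # New code: already deployed, but dormant until the toggle is opened.
    return f"${amount:,.2f}"


def render_price(amount: float, toggles: FeatureToggles) -> str:
    # The guard: one branch point decides between old and new code.
    if toggles.is_on("new_price_format"):
        return render_price_new(amount)
    return render_price_old(amount)
```

Both code paths live on the trunk and in production at the same time; closing the toggle is the rollback.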
40. DB Schema Changes Without Downtime
• Adding columns
- Use another table linked by primary key
- Use a blob field for schema flexibility
• Removing fields
- Stop using them. Do not make any DB schema changes
41. New DB schema with data migration
• Plan a lazy migration path controlled by feature toggle
1. Write to old / Read from old
2. Write to both / Read from old
3. Write to both / Read from new, fallback to old
Backward compatibility is a must
4. Write to new / Read from new, fallback to old
5. Eagerly migrate data in the background
6. Write to new / Read from new
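The six steps above can be sketched as a small repository whose behaviour depends on the current migration stage. This is an illustration under assumptions: the `Stage` and `Repository` names are invented, two dictionaries stand in for the old and new schemas, and in a real system the stage would be driven by the feature toggle rather than set by hand.

```python
from enum import IntEnum


class Stage(IntEnum):
    WRITE_OLD_READ_OLD = 1
    WRITE_BOTH_READ_OLD = 2
    WRITE_BOTH_READ_NEW_FALLBACK_OLD = 3
    WRITE_NEW_READ_NEW_FALLBACK_OLD = 4
    # Step 5 (eager background migration) is migrate_in_background() below.
    WRITE_NEW_READ_NEW = 6


class Repository:
    def __init__(self, stage: Stage):
        self.stage = stage
        self.old = {}  # stand-in for the old schema
        self.new = {}  # stand-in for the new schema

    def write(self, key, value):
        if self.stage <= Stage.WRITE_BOTH_READ_NEW_FALLBACK_OLD:
            self.old[key] = value
        if self.stage >= Stage.WRITE_BOTH_READ_OLD:
            self.new[key] = value

    def read(self, key):
        if self.stage <= Stage.WRITE_BOTH_READ_OLD:
            return self.old.get(key)
        if key in self.new:
            return self.new[key]
        if self.stage < Stage.WRITE_NEW_READ_NEW:
            return self.old.get(key)  # lazy fallback for unmigrated rows
        return None

    def migrate_in_background(self):
        # Step 5: copy whatever was never rewritten; never clobber newer data.
        for key, value in self.old.items():
            self.new.setdefault(key, value)
```

Because every stage can still read what the previous stage wrote, the toggle can be moved one step back at any point without losing data.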
42. Feature Toggle Strategies
(gradually expose users)
• Company employees
• Specific users or group of users
• Percentage of traffic
• By GEO
• By Language
• By user-agent
• User Profile based
• By context (site id or some kind of hash on site id)
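Two of the strategies above, "company employees" and "percentage of traffic" keyed by a hash on the site id, can be sketched like this. The function names and the choice of SHA-256 are assumptions for the example; the slides do not say which hash Wix uses.

```python
import hashlib


def bucket(key: str, buckets: int = 100) -> int:
    """Map a key to a stable bucket in [0, buckets)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def is_feature_open(feature: str, site_id: str, percentage: int,
                    is_employee: bool = False) -> bool:
    # Strategy: company employees always see the feature.
    if is_employee:
        return True
    # Strategy: percentage of traffic, keyed by a hash on the site id so
    # the same site always gets the same answer. Including the feature
    # name means different features pick different sites.
    return bucket(f"{feature}:{site_id}") < percentage
```

Raising `percentage` only ever adds sites to the exposed group, so a site that already saw the feature keeps seeing it as the rollout widens.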
43. Feature Toggle Override
• By specific server
- Used to test system load
- New database flows/migration
- Refactoring that may affect performance and memory usage
• By URL parameter
- Enable internal testing
- Product acceptance
- Faking GEO
• By FT cookie value
- Testing
- When working with an API on a single-page application
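Overrides by URL parameter and by cookie could be resolved along these lines. This is a sketch under assumptions: the precedence order (URL parameter, then cookie, then the regular toggle value), the use of the toggle name as the parameter and cookie name, and the `"true"` literal are all invented for the example.

```python
from http.cookies import SimpleCookie
from urllib.parse import parse_qs, urlsplit


def resolve_toggle(name: str, default: bool,
                   url: str = "", cookie_header: str = "") -> bool:
    """Return the effective toggle value for one request."""
    # 1. URL parameter override, e.g. for internal testing or product acceptance.
    query = parse_qs(urlsplit(url).query)
    if name in query:
        return query[name][-1] == "true"

    # 2. Cookie override, which survives across requests (handy when a
    #    single-page application makes API calls without the URL parameter).
    cookies = SimpleCookie()
    cookies.load(cookie_header)
    if name in cookies:
        return cookies[name].value == "true"

    # 3. No override: fall back to the regular toggle value.
    return default
```

For example, `resolve_toggle("newEditor", False, url="https://example.com/?newEditor=true")` opens the feature for that one request only.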
45. A/B Test
• Every new feature is A/B tested
• We open the new feature to a % of users
- Define KPIs to check if the new feature is better
- If it is better, keep it
- If worse, check why and improve
- The impact of flaws is limited to that % of our users
46. An interesting side effect on product
How many times did you have the conversation “what is better”?
- Put the menu on top / on the side
Well, how about building both and A/B Testing?
47. Marking users for persistent UX
• Anonymous user
- Toss is randomly determined
- Cannot guarantee a persistent experience if the user changes browsers
• Registered User
- Toss is determined by the user ID
- Guarantee toss persistency across browsers
- Allows setting additional tossing criteria (for example new users only)
- Only use this for sections where the user has to be authenticated
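The difference between the two kinds of toss can be sketched as follows. The function names, the "A"/"B" labels, and the idea of keeping the anonymous result in a browser cookie (modelled here as a plain dict) are assumptions for the example.

```python
import hashlib
import random


def toss_registered(test_name: str, user_id: str, percentage: int) -> str:
    """Deterministic toss: the same user ID gives the same group in any browser."""
    digest = hashlib.sha256(f"{test_name}:{user_id}".encode("utf-8")).digest()
    in_test = int.from_bytes(digest[:8], "big") % 100 < percentage
    return "B" if in_test else "A"


def toss_anonymous(cookies: dict, test_name: str, percentage: int,
                   rng=random) -> str:
    """Random toss, remembered in this browser's cookies only.

    A different browser has different cookies, so it gets a fresh toss;
    that is why persistence cannot be guaranteed for anonymous users.
    """
    if test_name not in cookies:
        cookies[test_name] = "B" if rng.random() * 100 < percentage else "A"
    return cookies[test_name]
```

The registered toss needs no storage at all, since the group can always be recomputed from the user ID.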
48. • Do not mix anonymous and registered tests
• A/B test a percentage of users, with optional filters
- New users only (registered users only)
- By language
- By GEO
- By browser (user-agent, OS)
- Any other criteria you have on your users
49. A/B Test Features
• A/B Test Override
• Start
• Stop
• Pause
• Bots are always excluded from the test
52. Gradual Deployment
• Assume two components
• We shut down one server and install the new version on it. It is not active yet
• Do self test
• Activate the new server if it passes the self test
• Continue deploying the other servers, a few at a time, checking each one with the self test
[Diagram: servers running A 1.1 and B 1.1; the B servers are upgraded to B 1.2 a few at a time while A stays on 1.1]
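The gradual deployment loop on this slide can be sketched as below. It is a simplified model, not Wix's deployment tool: the `Server` dataclass, the `passes_self_test` flag standing in for a real self test, and the decision to stop the rollout at the first failure are assumptions for the example.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    version: str
    active: bool = True
    passes_self_test: bool = True  # stand-in for a real post-deployment test

    def self_test(self) -> bool:
        return self.passes_self_test


def deploy_gradually(servers, new_version, batch_size=2):
    """Upgrade one server first, then the rest a few at a time.

    Returns True if every server was upgraded. Stops at the first failed
    self test, leaving that server inactive and the rest untouched.
    """
    batches = [servers[:1]] + [
        servers[i:i + batch_size] for i in range(1, len(servers), batch_size)
    ]
    for batch in batches:
        for server in batch:
            server.active = False          # shut down, out of rotation
            server.version = new_version   # install the new version
            if not server.self_test():
                return False               # do not activate, do not continue
            server.active = True           # passed: start serving traffic
    return True
```

A failed self test leaves the remaining servers on the old version and still serving, so users never see the broken build.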
53. Self Test / Post Deployment Test
After each server deployment, run a self test before deploying the next server.
• Checking server configuration and topology
- Make sure DB is accessible
- Is the schema the one I expect
- Access required local resources (files, config, templates, etc.)
- Access remote resources
- RPC / REST endpoints reachable and operational
• Server will refuse requests unless it passes the self test
• Allow a way to skip self test (and continue deployment)
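A server that refuses requests until its self test passes might be structured like this sketch. The class name, the check names, and the 503 response are assumptions for the example; real checks would probe the database, schema, local files, and remote endpoints listed above.

```python
class SelfTestingServer:
    def __init__(self, checks):
        # checks: mapping of check name -> zero-argument callable returning bool
        self.checks = checks
        self.ready = False  # refuse traffic until the self test has passed

    def run_self_test(self, skip: bool = False):
        """Run all checks and return the names of the ones that failed."""
        if skip:
            # Escape hatch: lets a deployment continue without the self test.
            self.ready = True
            return []
        failures = []
        for name, check in self.checks.items():
            try:
                passed = check()
            except Exception:
                passed = False  # a check that blows up counts as a failure
            if not passed:
                failures.append(name)
        self.ready = not failures
        return failures

    def handle(self, request: str):
        if not self.ready:
            return 503, "self test not passed"
        return 200, f"handled {request}"
```

Returning the failed check names, rather than a bare boolean, gives the person deploying an immediate hint about what is misconfigured.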
55. Backward and Forward compatible
• Assume two components
• We release a new version of one
• Now Rollback the other…
[Diagram: components A and B running mixed versions (1.0, 1.1, 1.2) as one is released and the other is rolled back]
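One common way to stay compatible in both directions is a tolerant reader: ignore fields you do not know and default the ones that are missing. The sketch below illustrates that general technique; the message fields (`id`, `name`, `locale`) and the reader functions are hypothetical and not taken from Wix's services.

```python
def read_user_v1(message: dict) -> dict:
    """Older reader: forward compatible.

    It picks only the fields it knows, so extra fields sent by a newer
    writer are ignored instead of causing an error.
    """
    return {"id": message["id"], "name": message["name"]}


def read_user_v2(message: dict) -> dict:
    """Newer reader: backward compatible.

    The new 'locale' field is optional with a default, so messages from
    an older writer (or one that was rolled back) are still accepted.
    """
    return {
        "id": message["id"],
        "name": message["name"],
        "locale": message.get("locale", "en"),
    }


old_message = {"id": 7, "name": "Dana"}
new_message = {"id": 7, "name": "Dana", "locale": "he"}
```

With readers written this way, either side can be upgraded or rolled back independently, in any order.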
57. Time machine event =
• Deployment capabilities : “no click” deployment
- Dozens of services , 130+ servers, 3 Data Centers
• Backward and forward compatibility in an extreme field test case
- Mixed versions of services / DB with no service downtime
• Empowerment
- The power we give to individuals
• Risk taking and embracing failure
58. 17,000 Deployments (production changes) a year
Double the velocity from last year
Every 7 minutes production changes its state
(during working hours)
59. Do You Have The Guts To
Deploy 60 Times A Day?
60. CD – prepare to invest…..
• Dev infrastructure - Refactor , Refactor, Refactor
• Testing infrastructure & know how
• Deployment infrastructure & tools
• Automation , Automation , Automation
• Monitoring (business and technical)
- Hundreds of aspects
- Using thresholds is a must
- Monitor business KPIs
- Internal & external
• Endless Tuning & learning
61. How does it work – CD Practices
• Test driven development
• Small Development Iterations
• Backwards and Forwards compatible
• Gradual Deployment & Self-Test
• Feature Toggle
• A/B Testing
• Exception Classification
• Production visibility
68. Where are we today?
• We have rewritten our Flash editor product as an HTML5 editor
- In just 4 months
• Introduced Wix 3rd party applications (developers API)
- In just 6 weeks
• We are easily replacing significant parts of our infrastructure
• And we are doing ~60 releases a day!
• Production state changes every 7 minutes.
69. Aviran Mordo
Head of Back-end Engineering
@aviranm
linkedin.com/in/aviran
aviransplace.com
The Road to
Continuous Delivery
Read more: The Road to Continuous Delivery: http://goo.gl/K6zEK
Dev-Centric Culture: http://goo.gl/0Vo70t
70. How would you do it?
How would you change the Wix session encryption key
with as little service interruption as possible?
• Encryption key is currently hard coded in the
framework
• All the services have the encryption key
• User server creates a session
• Services can renew a session
• External services not in the framework also have the
encryption key
Editor's Notes
How many built a website?
Wix is a web publishing platform
Taiichi Ohno, Toyota's chief of production in the post-WWII period. He was THE main developer of Toyota Production System (TPS).
One of the key components to successful CD
Full load on a single server
Override size limitation by setting a cookie on the client
Link to purchase on the editor was causing drop in conversion because users went there too soon without intent