Many small startups build their systems on top of a traditional toolset like Tomcat, Hibernate, and MySQL. These tools are popular because they make development easy and progress fast, but many of the resulting systems are monolithic and have limited scalability. So as a startup grows, the team is confronted with the problem of how to evolve the system and make it scalable. Wix.com faced the same dilemma: it grew from 0 to 70 million users in just a few years and ran into interesting challenges, such as performance and availability. Traditional performance solutions, such as caching, would not help because of a very long-tail access pattern, which makes caching highly inefficient. And because every minute of downtime means customers lose money, the product needed near-100% availability. Solving these issues required some out-of-the-box thinking, and this talk discusses some of these strategies: building a highly performant, highly available, and highly scalable system; and leveraging a microservices architecture and multi-cloud platforms to build a very efficient and cost-effective system.
3. @aviranm
Wix in Numbers
Over 72M users (website builders)
Static storage is >2PB of data
3 data centers + 3 clouds (Google, Amazon, Azure)
2B HTTP requests/day
1000 people work at Wix
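As a rough sense of scale, the daily request count corresponds to the following average rate (this is only an average; peak rates are not given on the slide):

```latex
\frac{2 \times 10^{9}\ \text{requests/day}}{86\,400\ \text{s/day}} \approx 23\,000\ \text{requests/s}
```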
Initial Architecture
Built for fast development
Stateful login (Tomcat session), Ehcache, file uploads
No consideration for performance, scalability and testing
Intended for short-term use
Tomcat, Hibernate, custom web framework
[Diagram: Wix (Tomcat) server backed by a MySQL DB, with Lighttpd for file serving]
The Monolithic Giant
One monolithic server that handled everything
Dependencies between features
Changes in unrelated areas of the system required deploying the whole system
Failure in unrelated areas caused system-wide downtime
Concerns and SLA
Edit websites: Data validation, Security / Authentication, Data consistency, Lots of data
Serving media: High availability, High performance, Lots of static files, Very high traffic volume, Viewport optimization, Long tail (immutable)
View sites created by the Wix editor: High availability, High performance, High traffic volume, Long tail (mutable)
[Diagram: Wix services split into an Editor Segment and a Public Segment, including the HTML Editor, Flash Editor, MSM, Private Media, Public Media, Premium Services, eCommerce, List DB, App Builder, App Store, App Market, Dashboard, Statics/media, Mailer, TimeZone, Public HTML API, Public API (Flash), MSP, Public Server, the HTML, HTML SEO, Flash, Flash SEO, Sitemap and Robots.txt renderers, User Server, Template Viewer, ContactsHUB, Activity, Site Members, Mailing Service, Comments, Snapshoter, User Pref, Feed Me, Shout-out, Hotels, PETRI, Site Pref, Dist Logger, Slicer, eCom Renderer, eCom Cart, eCom Checkout, eCom Catalog, eCom Orders, Payment Facade, Account Info, HTML API, HTML Embeder, Blog and Mobile]
Microservices Guidelines
Each service has its own DB schema (if one is needed)
Only one service should write to a specific DB table(s)
There may be additional read-only services that access the DB directly (for performance reasons)
Services are stateless
No DB transactions
Cache is not a building block, but an optimization
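To illustrate the last guideline, here is a minimal sketch of a read path in which the cache is optional: the service gives correct answers with no cache at all, and a cache only reduces DB calls. `SiteRepository` and `load_from_db` are hypothetical names, and a plain dict stands in for a real cache.

```python
from typing import Callable, Dict, Optional


class SiteRepository:
    """Read path where the DB is the source of truth and the cache is optional."""

    def __init__(self, load_from_db: Callable[[str], Optional[dict]],
                 cache: Optional[Dict[str, dict]] = None) -> None:
        self._load_from_db = load_from_db
        self._cache = cache  # None means the service runs without any cache

    def get_site(self, site_id: str) -> Optional[dict]:
        # Fast path: answer from the cache when one is configured and has the key.
        if self._cache is not None and site_id in self._cache:
            return self._cache[site_id]
        # Authoritative path: the DB alone always gives a correct answer.
        site = self._load_from_db(site_id)
        if self._cache is not None and site is not None:
            self._cache[site_id] = site  # populate for later reads
        return site
```

Correctness never depends on the cache being present or warm. A real cache for mutable data would also need invalidation or a TTL, which this sketch omits.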
Microservices Tradeoffs
Each service has its own DB schema (if one is needed)
Gain – Easy to scale microservices based on service-level concerns
Tradeoff – System complexity, performance
Only one service should write to a specific DB table(s)
Gain – Decoupled architecture, faster development
Tradeoff – System complexity / performance
May have additional read-only services that access the DB
Gain – Performance
Tradeoff – Coupling
Services are stateless
Gain – Easy to scale out (just add more servers)
Tradeoff – Performance / consistency
No DB transactions
Gain – Better DB performance, easier to scale
Tradeoff – System complexity
Editor Server
Immutable JSON pages (~3M / day)
Site revisions
Active-standby MySQL across data centers
[Diagram: Editor Server backed by two MySQL databases, Active Sites and Archive]
Protect The Data
DB outage with fast recovery = replication
Data poisoning/corruption = revisions / backup
Make the data available at all times = data distribution to multiple locations / providers
No DB Transactions
Save each page (JSON) as an atomic operation
Page ID is a content-based hash (immutable/idempotent)
Finalize the transaction by sending the site header (list of pages)
May generate orphaned pages; not a problem in practice
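A minimal in-memory sketch of this write pattern follows. It assumes SHA-256 over canonical (key-sorted) JSON as the content hash and uses plain dicts in place of the MySQL tables; `PageStore`, `save_page`, and `publish_site` are hypothetical names, not Wix's actual code.

```python
import hashlib
import json
from typing import Dict, List


def page_id(page: dict) -> str:
    """Content-based ID: identical page content always yields the same ID."""
    canonical = json.dumps(page, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class PageStore:
    """Hypothetical stand-in for the pages table and the site-header table."""

    def __init__(self) -> None:
        self.pages: Dict[str, str] = {}               # page ID -> page JSON
        self.site_headers: Dict[str, List[str]] = {}  # site ID -> page IDs

    def save_page(self, page: dict) -> str:
        pid = page_id(page)
        # Each page write stands alone; re-saving identical content is a no-op.
        self.pages.setdefault(pid, json.dumps(page, sort_keys=True))
        return pid

    def publish_site(self, site_id: str, pages: List[dict]) -> List[str]:
        ids = [self.save_page(p) for p in pages]  # step 1: store each page
        self.site_headers[site_id] = ids          # step 2: header "commits" the save
        return ids
```

If the process dies after some `save_page` calls but before the header is written, those pages are simply unreferenced (the orphaned pages mentioned above). Readers only follow page IDs listed in a site header, so they never see a half-saved site.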
Wix Media Platform (WixMP)
Eventually consistent distributed file system
(2PB user media files)
Dynamic media processing
Multi-data-center aware
Automatic fallback across DCs
Runs on commodity servers & cloud
Public Segment Roles
Routing (resolve URLs)
Dispatching (to a renderer)
Rendering (HTML, XML, TXT)
[Diagram: requests for www.example.com reach the Public Server, which dispatches them to the HTML, HTML SEO, Flash, Flash SEO, Sitemap, and Robots.txt renderers]
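The routing and dispatching roles can be sketched as two small functions. The routing table contents, the bot-detection rule (user-agent substrings), the renderer names, and the `/sitemap.xml` path are all hypothetical; the slides only name the roles and the renderers.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import urlsplit

# Hypothetical routing table: host -> site ID (published by the editor segment).
ROUTES: Dict[str, str] = {"www.example.com": "site-123"}

# Hypothetical crawler markers; real bot detection is more involved.
BOT_MARKERS = ("googlebot", "bingbot")


def route(url: str) -> Optional[Tuple[str, str]]:
    """Routing: resolve a URL to (site ID, path), or None for an unknown host."""
    parts = urlsplit(url)
    site_id = ROUTES.get(parts.hostname or "")
    if site_id is None:
        return None
    return site_id, parts.path or "/"


def dispatch(path: str, user_agent: str, flash_site: bool = False) -> str:
    """Dispatching: choose which renderer handles the resolved request."""
    if path == "/robots.txt":
        return "robots.txt"
    if path == "/sitemap.xml":
        return "sitemap"
    is_bot = any(marker in user_agent.lower() for marker in BOT_MARKERS)
    kind = "flash" if flash_site else "html"
    return f"{kind}-seo" if is_bot else kind
```

Keeping routing a pure lookup and dispatching a small decision function matches the "minimize business logic" goal of the public segment.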
Publish Site
Publish site header (a map of pages for a site)
Publish routing table
[Diagram: the Editor Segment publishes the site header / routes to the Public Segment (CQRS)]
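A minimal sketch of the publish step as CQRS, assuming the editor keeps its own write model and copies a denormalized site header plus a routing entry into a separate read store for the public segment. The class names and dict-based stores are hypothetical stand-ins for the real services and MySQL tables.

```python
import json
from typing import Dict


class PublicReadStore:
    """Read model for the public segment: rows keyed by primary key only."""

    def __init__(self) -> None:
        self.site_headers: Dict[str, str] = {}  # site ID -> site header JSON
        self.routes: Dict[str, str] = {}        # host -> site ID


class EditorSegment:
    """Write model: owns edits and publishes a read-optimized copy on demand."""

    def __init__(self, public: PublicReadStore) -> None:
        self.public = public
        self.drafts: Dict[str, Dict[str, str]] = {}  # site ID -> page name -> page ID

    def save_draft(self, site_id: str, page_name: str, page_id: str) -> None:
        # Drafts are invisible to the public segment until publish() runs.
        self.drafts.setdefault(site_id, {})[page_name] = page_id

    def publish(self, site_id: str, host: str) -> None:
        header = json.dumps(self.drafts.get(site_id, {}), sort_keys=True)
        self.public.site_headers[site_id] = header  # publish site header
        self.public.routes[host] = site_id          # publish routing entry
```

Because the public side only ever reads by key, serving a page needs no joins and no knowledge of the editor's data model.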
Built For Speed
Minimize out-of-process hops (2 DB, 1 RPC)
Lookup tables are cached in memory, updated every few minutes
Denormalized data – optimize for read by primary key (MySQL)
Minimize business logic
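For the in-memory lookup tables, one simple approach is a table that is reloaded wholesale once its TTL expires. This sketch refreshes lazily on read and takes an injectable clock so it can be tested; the class name, the 300-second default, and the lazy strategy are assumptions, since the slide only says the tables are updated every few minutes.

```python
import time
from typing import Callable, Dict, Optional


class RefreshingLookupTable:
    """In-memory lookup table reloaded from its source every `ttl_seconds`."""

    def __init__(self, load: Callable[[], Dict[str, str]], ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load = load
        self._ttl = ttl_seconds
        self._clock = clock
        self._table: Dict[str, str] = load()
        self._loaded_at = clock()

    def get(self, key: str) -> Optional[str]:
        # Reload on the first read after the TTL expires; otherwise serve from memory.
        if self._clock() - self._loaded_at >= self._ttl:
            self._table = self._load()  # swap in the whole new table at once
            self._loaded_at = self._clock()
        return self._table.get(key)
```

With lazy refresh, the request that triggers a reload pays its cost; a production version would more likely refresh on a background schedule so reads never wait.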
How a Page Gets Rendered
Bootstrap HTML template that contains only data
Only JavaScript imports
JSON data (site-header + dynamic data)
No “real” HTML view
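A sketch of such a bootstrap page, assuming the JSON is embedded in an inline script and the view code arrives through script imports. The `window.__SITE_DATA__` variable, the `siteHeader`/`dynamicData` keys, and the example URL are hypothetical.

```python
import json
from html import escape
from typing import List


def bootstrap_html(script_urls: List[str], site_header: dict, dynamic_data: dict) -> str:
    """Build a page that holds only JavaScript imports and JSON data;
    client-side code is expected to render the actual view."""
    payload = json.dumps({"siteHeader": site_header, "dynamicData": dynamic_data})
    # Escape "</" so data such as "</script>" cannot end the inline script early.
    payload = payload.replace("</", "<\\/")
    imports = "\n".join(f'<script src="{escape(url)}"></script>' for url in script_urls)
    return (
        "<!DOCTYPE html>\n<html><head>\n"
        f"<script>window.__SITE_DATA__ = {payload};</script>\n"
        f"{imports}\n"
        "</head><body></body></html>"
    )
```

The server does no templating of the view itself: it only serializes data, which keeps the public renderer's work small and moves rendering to the client's CPU.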
Why JSON?
Easy to parse in JavaScript and Java/Scala
Fairly compact text format
Highly compressible (5:1 even for small payloads)
Easy to fix rendering bugs and cross-browser issues (just deploy new code)
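A quick way to check compressibility on your own payloads is to measure the ratio directly. The sketch below uses gzip from Python's standard library (an assumption; the slides do not say which compression is used) and a made-up page structure; it does not reproduce the 5:1 figure, which depends on real page data.

```python
import gzip
import json


def compression_ratio(obj: object) -> float:
    """Return uncompressed size / gzip-compressed size of obj serialized as JSON."""
    raw = json.dumps(obj).encode("utf-8")
    return len(raw) / len(gzip.compress(raw))


# Hypothetical page: many components sharing the same keys and styles,
# which is the kind of repetition that makes JSON compress well.
page = {"components": [
    {"id": f"comp{i}", "type": "text", "style": {"font": "arial", "size": 14}}
    for i in range(50)
]}
print(f"{compression_ratio(page):.1f}:1")
```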
Serving a Site – Sunny Day
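[Diagram: the browser sends an HTTP request for http://example.wix.com through the LB to the Public Renderer, which returns HTML, stores the HTML in the Archive as a cache, and notifies a site view; resources and media are requested over HTTP from the CDN / WixMP]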
Serving a Site – DC Lost
[Diagram: when a data center is lost, DNS is changed so HTTP requests for http://example.wix.com go to the LB and Public Renderer in another data center; the Archive and CDN / WixMP are also shown]
Serving a Site – Public Lost
[Diagram: when the Public Renderer serving http://example.wix.com fails, the request falls back to the LB and Public Renderer in the 2nd DC, and a cached HTML version can be retrieved from the Archive]
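A minimal sketch of such a fallback chain, assuming the order "primary DC renderer, then 2nd DC renderer, then cached HTML" (the slide shows both fallbacks but not their exact order). `Unavailable`, `serve_with_fallback`, and the callables are hypothetical stand-ins for the LB, the renderers, and the archive.

```python
from typing import Callable, List, Optional


class Unavailable(Exception):
    """Raised by a backend that cannot serve the request."""


def serve_with_fallback(url: str,
                        renderers: List[Callable[[str], str]],
                        cached_html: Callable[[str], Optional[str]]) -> str:
    """Try each data center's renderer in order; if all of them fail,
    serve the last cached HTML version of the page."""
    for render in renderers:
        try:
            return render(url)
        except Unavailable:
            continue  # fall back to the next data center
    html = cached_html(url)
    if html is None:
        raise Unavailable(f"no renderer or cached copy for {url}")
    return html
```

A cached copy may be slightly stale, but for viewing sites a stale page is usually better than an error page.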
Living in the Browser
[Diagram: the browser gets HTML for http://example.wix.com from the Public Renderer via the LB and loads JSON / media from the CDN / WixMP, with fallback requests to the Editor pages]
Summary
Identify concerns and SLA for different parts of the system
Build redundancy in critical path (for availability)
De-normalize data (for performance)
Minimize out-of-process hops (for performance)
Take advantage of client’s CPU power
Editor: read immediately after write, small working set
Viewer: optimize for reads
We fight for every ms: a page view means downloading many resources
Use read-only services only if they are part of the same business functionality
Immutable data helps handle eventual consistency
MySQL is a great key-value store
Not all data is equal (only 6% of websites are edited 3 months after creation)
Revisions keep data safe from poisoning, but you pay in storage and management
We can change the arrows as we want
Tech vendor lock-in is a myth; changing the API is easy (small dev effort).
Invest in data distribution.
Evaluating a new platform starts by putting the data on it.
Save pages as JSON
Upload to static storage
Explain what JSON is and what HTML is
UPS dies, and the secondary power source is connected to the same UPS