Take a look at how Netflix uses a conditional dependency graph and runtime dynamic bundling to generate unique UI bundles, and the challenges involved in building such a system. Learn how we hacked Webpack for our needs, leveraged Abstract Syntax Trees (ASTs) to identify conditional dependencies in our dependency graph, and glued it all together to build a highly scalable, server-side JS and CSS bundler that serves these unique user experiences to millions of Netflix customers across the globe.
54. How do we build a
conditional dependency graph
with Webpack?
55. We built a Webpack plugin and worked
with Abstract Syntax Trees (ASTs)
● Hook: parser.hooks.program
● Works for both:
○ @condition syntax
○ $$conditions$$ syntax
Good Morning Everyone
How are we doing?
Welcome to my talk: Conditional Modules and Dynamic Bundling: A Netflix Original
Okay, so Netflix!
Is there anybody who doesn't know what Netflix is?
Netflix is an online video streaming service and we have a wide variety of programs and several Original shows, movies and documentaries.
What is pretty amazing is that we are now in over 190 countries and have over 140M subscribers worldwide.
Yes #netflixeverywhere!
My name is Rajat Kumar
I work at Netflix.
I am part of the Node.js Platform team
And we do everything
from building Node.js libraries to writing Node.js services that support our UI and product teams.
Today I am going to talk about bundling JavaScript files at Netflix and
how we challenged some of the common conventions to solve our particular bundling problems.
Alright,
this talk is split into four parts.
I will start by talking about the problem: what exactly is it that we are trying to solve?
Then go on to talk about how we worked through the problem and what we ended up building
And finally end the session with some of the things we plan to do in the near future.
Netflix is a data driven product development company.
At Netflix
we do a TON
of A/B testing.
Our TV app and our website netflix.com are built using React, and developers focus a lot on gathering metrics
Data gathered through these tests helps us provide our subscribers with an even better Experience.
For example - on our netflix.com member homepage
There are several components that
Have been tested and retested many times
This shows what we might be testing at a given point in time
It could range from testing the category title,
to the size of the box shot,
to something as minute as the Netflix N logo on the box art
We run hundreds of AB tests
And with each A/B test it is quite possible that each subscriber sees a different version of the same UI and gets a personalized user experience
The user experience is driven not only by A/B tests but also by other dimensions
like the user's browser, geo-location, and device (a TV, for example)
Let's say
we are A/B testing two versions of the `Search` component.
Let's call them the current Search experience and the new Search experience.
So as an engineering team it is our responsibility to ensure we can send JavaScript code down to the user so that we can test these two variations of the Search experience
Our modular JavaScript code would probably look similar to this.
Here we have an entry point, app.js, and two search components; to run each search experience, the JavaScript code needs its dependencies and sub-dependencies
If you are in the current Search experience A/B test cell, the JavaScript bundle we send to you must include the current search component and the dependencies it needs to work.
Likewise, if you are in the new Search experience A/B test cell, you should only get the new search component and the dependencies it needs to work.
So you can imagine that the code for app.js would look like this: import currentSearch and import newSearch,
then somewhere in the code we render only one of the components.
We import both experiences even though we know only one of them will be executed.
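The app.js pattern described here can be sketched minimally as follows (component and flag names are illustrative, and plain functions stand in for real React components):

```javascript
// Both search experiences are pulled into the bundle (in the real app
// via static imports), but only one ever runs, depending on the A/B
// test flag. Plain functions stand in for the real React components.
const currentSearch = () => 'current search UI';
const newSearch = () => 'new search UI';

// Somewhere in app.js, exactly one of the two components is rendered:
function renderSearch(newSearchExperience) {
  return newSearchExperience ? newSearch() : currentSearch();
}
```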
When we import both these experiences,
our package contains all the files needed to support both currentSearch and newSearch component.
But the fact is that several of these files will never be executed by the client device.
A user will only ever see one of the search experiences. So, do you see the problem?
The Problem
so every great story starts with a problem!
And we just saw how we are packaging two mutually exclusive code execution paths together…
The fact is Netflix apps are huge, so when we put such experiences together our packages become really big! And these sizes are after applying uglification, minification, and all sorts of optimizations.
Do you think our users will appreciate downloading 10 MB of JS code in their browser? I wouldn't!
We see poor performance from these apps: painfully long time to load and time to interactive. End result: poor user experience
So the challenge in front of us was improving the overall user experience.
Common convention says we should be building and sending smaller packages! Can we build smaller packages that are very specific to a user's A/B test allocation?
But is that it?
Turns out, unfortunately, the answer is no!
We run hundreds of AB tests…
and
That has side effects for our team!
It affects the way we approach the task of javascript packaging
Let's see how
When people think of AB testing
They typically think of two variations of a test: variation A and variation B
At Netflix, we do `multi-variate` A/B testing
`multi-variate` A/B testing means multiple variations for each test
And each cell leads to a slightly different experience
Additionally, we run multiple `multi-variate` tests
Something like this.
We are collecting data to gauge which cell performs better.
How users are allocated to these tests is completely random;
as long as they meet certain criteria,
they can be allocated into any of these A/B test cells
They see a user experience that is unique and based on the test cells they are allocated to.
So how many such custom and highly targeted packages do we need to build and serve?
Any guesses on how many packages we should be building here?
The math here is the number of items in each set multiplied by each other,
i.e. the number of test cells in each test, multiplied across all the tests
Let's make it simple and assume we run just 15 A/B tests
with 4 variants in each test
doing our math
this number is...
This number is 4 to the power 15
And that is
Over a Billion unique packages
That is a huge number of unique experiences we want to serve.
And yes, with more AB tests this number is just going to grow
But we run hundreds of A/B tests, and assuming each test still has only 4 variations,
then this number is mind-boggling: 4^100 is a very large number, a number that I cannot fit on this slide
So mathematically we know we might be serving billions of packages
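The combinatorics above are easy to sanity-check:

```javascript
// Number of unique bundles = (variants per test) ^ (number of tests).
const combos = (tests, variantsPerTest) => Math.pow(variantsPerTest, tests);

combos(15, 4); // 4^15 = 1,073,741,824 — over a billion unique packages
```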
The problem is pretty clear now
We want a system so that we can serve billions of smaller packages -
These packages need to be highly targeted and tailored specifically to a user's A/B test allocation, device, region, and a few other dimensions.
How can we even manage packaging at this kind of scale?
Can we really pre-build billions of different variations? Of course not!
At this scale it is unreasonable to prebuild all the variations. Do we all agree?
So we dressed up like the Defenders and decided to solve this. We were as serious as Luke Cage and Jessica Jones. Just to clarify: I wore the suit.
Most of the solutions attempt to tackle this at the client side.
We thought pretty hard about it and realized something: we have a server. If we can build a service that can quickly package files and serve the targeted packages, then we might be able to solve this.
We are talking about generating the bundles on-demand.
We looked back
and noticed that developers use simple conditions to gate the experience.
So for the current and new Search experience
you can think of the 2 variations being gated by
a boolean flag called newSearchExperience
For our runtime bundling to be successful it needs to be fast, that means we should do some pre-work.
What if we could pre-determine the conditions or flags that trigger the inclusion of a file, and provide that information to our on-demand bundler?
We cannot expect the bundler to figure out all of the repo’s structure at the time of on-demand bundling.
What we are looking for is,
Step 1, at build time: use some static analysis magic to determine what the different A/B test variations are, and determine the conditions or flags that trigger the inclusion of a file.
And then step 2, at runtime: when devices ask for the JS package, we apply some smart resolution logic, figure out the list of files needed for the A/B test experience, and finally generate a package dynamically
We chose Node.js to do the request-time resolution of the files to be included in a bundle,
and decided to use Webpack to do the static analysis at build time
We chose webpack because it has a great ecosystem of plugins,
most developers are already comfortable working with it.
Its performance is good and we felt it had all the right hooks and options to customize the behaviour
So the proposed build process was -
Phase 1 Webpack reads our repo, analyses our code and produces an artifact.
This artifact has
all the metadata of how different files are related aka dependency graph,
all metadata needed to resolve conditions,
all the transpiled and optimized source
Think of it like a binary of the app
Phase 2:
Happens on the server side, the server uses the same artifact to generate packages.
The artifact acts as the single store which contains
the dependency information,
All the source code
Conditions and what it takes to resolve the conditions.
So this artifact is like the whole repo put together in one file
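As a mental model, the artifact could be pictured as something like the following. This is an invented, illustrative shape; the actual format is internal to Netflix:

```javascript
// Invented, illustrative artifact shape: one file holding the dependency
// graph, the condition metadata, and all the transpiled source.
const artifact = {
  app: 'webui',
  version: '3',
  modules: {
    'app.js': {
      source: '/* transpiled, optimized code */',
      deps: ['currentSearch.js', 'newSearch.js'],
    },
    'newSearch.js': {
      source: '/* transpiled, optimized code */',
      deps: [],
      condition: 'newSearchExperience', // only included when this is true
    },
  },
};
```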
We understand the problem, we know what kind of system we want to build
and now let's explore some of the concepts and questions that we need to solve.
To make our build process work, it is important to understand what exactly needs to go into the artifact.
This artifact should ideally have
The dependency graph,
all metadata needed to resolve conditions,
And -
all the transpiled and optimized source
Think of it like a single file binary of the App.
It has all the content from your source files from the repo!
You can see the similarity between this file and the way webpack generates its output file.
(This artifact should store the metadata that says the newSearch component and its dependencies should only be included if the condition `newSearchExp` is true.
That means we need to store all the conditions, and the modules that cannot be included unless their conditions are true)
We want to use webpack to build this conditional dependency graph.
And we want Webpack to then serialize the graph into a file; this file will be our artifact file.
So the terminology, conditional dependency graph, sounds complicated, but it isn't.
It's a dependency graph with additional data that highlights the fact that app.js needs the newSearch component only when the newSearchExperience flag is true.
So the newSearch component is the conditional module, and newSearchExperience is the condition.
So, the next question we needed to solve is -
How do we mark a JS file or module as conditional?
And how do we make Webpack’s build process understand that the module is conditional?
We introduced 2 ways of marking a file as conditional:
The first option available to developers was using @condition <conditionName>.
So, from our previous example of the current and new Search components, the newSearch component could just announce: hey, I can only be included if the `newSearchExperience` condition is true. If that condition is false, please don't include me.
We want Webpack to see this syntax and mark the edge in the dependency graph as conditional
The second option available to devs was to require a file conditionally, that is, to let the parent file decide which child component to include.
In this case app.js uses a special $$conditions$$ syntax to determine which one should be included in the package.
Again, we wanted Webpack to see this $$conditions$$ syntax and mark the two edges, app.js -> newSearch and app.js -> currentSearch, as conditional
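A quick sketch of how the second option's semantics play out at runtime (the $$conditions$$ object here is a plain stand-in for the real condition resolver, and the syntax shown is only an approximation of what the talk describes):

```javascript
// Option 1 (self-marking): newSearch.js would carry an annotation like
//   /* @condition newSearchExperience */
// at the top of the file.

// Option 2 (parent decides): app.js picks the child via $$conditions$$.
// A plain object stands in for the real condition resolver here.
const $$conditions$$ = { newSearchExperience: true };

const currentSearch = () => 'current';
const newSearch = () => 'new';

const search = $$conditions$$.newSearchExperience ? newSearch : currentSearch;
```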
So, this brings us to our next question, How do we build a conditional dependency graph with Webpack?
Simple - we built our own Webpack plugin. This plugin could understand the 2 syntaxes using AST parsing.
Webpack provides a number of hooks, and we used those hooks to look at what files are being resolved by Webpack.
The parser's program hook is a hook on the Webpack parser, and it gives us the AST of the current file. Using this hook, the plugin learns which files either mark themselves as conditional or are marked as conditional by their parent file.
We maintain the list of such conditionals and keep adding this information to Webpack's internal dependency graph
Finally, when webpack is done resolving and building the various files, we latch on to Webpack compiler’s emit hook.
This is the right time to instruct Webpack to write out our Artifact file and we serialize all the graph information into this file.
At this point, we have successfully serialized the conditional dependency graph, which represents the entire repo.
In a way you can think of Artifact as the “build” or “compiled” output for your app
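The plugin wiring described above might look roughly like this. It is a heavily simplified sketch for Webpack 4-style hooks: `parser.hooks.program` and `compiler.hooks.emit` are real Webpack hooks, but the class name, the marker detection, and the artifact format are all illustrative.

```javascript
// Simplified sketch of a conditional-graph plugin for Webpack 4.
class ConditionalGraphPlugin {
  constructor() {
    this.conditionalModules = new Map(); // file -> condition name
  }

  apply(compiler) {
    compiler.hooks.normalModuleFactory.tap('ConditionalGraphPlugin', factory => {
      factory.hooks.parser.for('javascript/auto').tap('ConditionalGraphPlugin', parser => {
        // parser.hooks.program fires with the AST (and comments) of each file.
        parser.hooks.program.tap('ConditionalGraphPlugin', (ast, comments) => {
          // Walk the AST and comments here to find @condition and
          // $$conditions$$ markers, and record which modules are conditional.
        });
      });
    });

    // When Webpack is done, serialize the conditional graph into the artifact.
    compiler.hooks.emit.tapAsync('ConditionalGraphPlugin', (compilation, callback) => {
      const artifact = JSON.stringify([...this.conditionalModules]);
      compilation.assets['artifact.json'] = {
        source: () => artifact,
        size: () => artifact.length,
      };
      callback();
    });
  }
}
```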
We solved one part: generating the artifact. We still have to perform the bundling.
Let’s tackle the next.
At Netflix, our on-demand bundling service is called Codex.
The server can use the same artifact to serve user requests and generate bundles on demand, based on the conditions that hold. Theoretically this seems doable: the artifact provides all the necessary information about how the repo is structured and what the dependencies between the various components are.
We can expect many versions of the app, so we will have many versions of the artifact over time. Each build means a new artifact. You can imagine these artifacts as binaries of sorts. With just these artifacts, we want the ability to generate all the possible combinations of bundles for that particular build version.
And these artifacts are for our website (netflix.com) and the TV app (which is also built using React). These apps have multiple releases, and all of a sudden we have multiple apps producing multiple artifacts that we need to serve
So we can imagine that our service Codex gets a request from our subscribers watching netflix on their TVs or Browsers.
And Codex creates the highly targeted packages for these users
To figure out which files to combine, it looks at
the various A/B tests the user is allocated to,
the device they are accessing from,
their geo-location, and many other dimensions
At this point you must be wondering -
How exactly do we resolve the conditional dependency graph?
How does the runtime even know that certain experiences need to be unlocked for a user because they belong to the right A/B tests and are using the right browser or device?
Let’s look at it in detail.
One of the interesting things we did was around the bundle request URLs.
The URLs you see here are actually templatized
and they tell us exactly which artifact is being requested, which app, which version, which entry point and what are the experiences to unlock!
The last section, containing the comma-delimited values, holds the actual conditions that are true for the user.
So the URLs contain almost everything we need to resolve our files
The URL clearly specifies: hey, look for the webui, version 3 artifact
And by the way, when you are resolving, just consider that these conditions are set to true
And start resolving your dependency graph, starting from app.js, and apply the conditions
And once we apply the conditions, we know the exact list of files we need to package, we cobble those together and send it back!
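For illustration, parsing such a URL might look like this. The real URL template is not shown in the talk, so the path layout here is entirely invented:

```javascript
// Hypothetical bundle URL layout: /<app>/<version>/<entry>/<cond1,cond2,...>
// The real Netflix URL template is internal; this is only illustrative.
function parseBundleUrl(path) {
  const [, app, version, entry, conditionList] = path.split('/');
  const conditions = {};
  for (const name of conditionList.split(',')) conditions[name] = true;
  return { app, version, entry, conditions };
}

parseBundleUrl('/webui/3/app.js/newSearchExp,smartPlayback');
// { app: 'webui', version: '3', entry: 'app.js',
//   conditions: { newSearchExp: true, smartPlayback: true } }
```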
So the actual graph resolution algorithm we use is quite simple.
Consider the graph resolution with these 2 boolean conditions set to true.
As you can see as we start traversing the graph, we immediately hit a fork and need to serve only one of the experiences. In this case `newSearchExp` is true, so we bring in all its deps and sub-deps
As we walk down the graph, we continue until we find another conditional edge and then re-evaluate whether its condition is true.
For example, we stop at the `smartPlayback` and `smartSession` conditional edges. At this point we see that smartPlayback is true, so we can pull in that file and its dependencies, but because `smartSession` is not true, we don't travel down that path.
So, once we resolve the conditions, we might get the final bundle consisting of only these files.
This is a very small bundle, highly targeted, very specific user experience for the User
The original full dependency graph has now been pruned to include only the files that make sense for this user with the current set of conditions.
No unnecessary files make it into the bundle. We have cut the graph down by more than half just by resolving the conditions
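The resolution algorithm described above can be sketched as a simple graph walk. The graph shape and condition names mirror the talk's example, but the data structure itself is illustrative:

```javascript
// Walk the dependency graph from the entry point, always following
// unconditional edges, and conditional edges only when their condition
// resolves to true.
function resolveFiles(graph, entry, conditions) {
  const included = new Set();
  const visit = file => {
    if (included.has(file)) return;
    included.add(file);
    for (const dep of graph[file] || []) {
      if (!dep.condition || conditions[dep.condition]) visit(dep.file);
    }
  };
  visit(entry);
  return included;
}

// Example mirroring the talk: newSearchExp and smartPlayback are true,
// so currentSearch.js and smartSession.js are pruned from the bundle.
const graph = {
  'app.js': [
    { file: 'newSearch.js', condition: 'newSearchExp' },
    { file: 'currentSearch.js', condition: 'currentSearchExp' },
  ],
  'newSearch.js': [
    { file: 'smartPlayback.js', condition: 'smartPlayback' },
    { file: 'smartSession.js', condition: 'smartSession' },
  ],
};

const files = resolveFiles(graph, 'app.js', {
  newSearchExp: true,
  smartPlayback: true,
});
// files contains only app.js, newSearch.js, and smartPlayback.js
```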
At this point you must be wondering: isn't all of this graph resolution and bundling really slow?
We have spent a lot of time optimizing this, and now with optimizations like
the use of memoization and storing the graph information on the instances,
and multi-level caching of the artifact information,
we are seeing response times under 70 ms for the website app and closer to 400 ms for our TV app. That is the p95 latency to package all the files together. We are still optimizing the bundling process, and as we add more optimizations we are confident that we can bring these numbers down even further.
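As one illustration of the memoization style mentioned, resolution results can be cached per entry point and condition set. This is a minimal sketch; the keying scheme is invented:

```javascript
// Cache resolution results keyed by entry point plus a sorted, joined
// list of active condition names, so repeated requests skip the graph walk.
function memoizeResolution(resolve) {
  const cache = new Map();
  return (entry, conditions) => {
    const key = entry + '|' + Object.keys(conditions).sort().join(',');
    if (!cache.has(key)) cache.set(key, resolve(entry, conditions));
    return cache.get(key);
  };
}
```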
Another benefit of having a URL that defines the complete state is that we can now use CDNs effectively.
Codex services are fronted by a CDN (Akamai and Netflix's own CDN), which ensures our bundler-as-a-service never builds the same bundle twice: if we do see a request for the same bundle, the CDN can serve the cached copy.
To summarize the whole process
UI developers clearly mark the conditional flags in their code and are explicit about which condition should fetch which UI component.
This is achieved with the @condition code comment or the $$conditions$$.<conditionName> syntax.
Next, you run Webpack with our plugin. The plugin is designed to read the Webpack internals, figure out the dependency graph, find all the conditions in the code and all the modules that should be conditional, and build a conditional dependency graph.
This graph is then serialized and saved. You can consider these serialized artifacts the build output for that version of the app.
Next, we provide this artifact to the Codex service, our dynamic bundler capable of building packages at runtime.
Our subscribers use netflix.com or our apps and request bundles, passing the features or A/B tests the user belongs to, and Codex, our bundling-as-a-service, builds the right package and serves it. The served bundle also gets cached at our CDN layer.
Building Codex at Netflix scale wasn't easy and we stumbled a lot, but hey, nobody ever said setting up a new service is easy! As I wrap up, I want to talk a little bit about the next few things we plan to work on
To keep our service performant, we need to make sure we only keep the active and most frequently used artifacts in memory, and we have an incentive to quickly evict artifacts that are no longer used.
The TVUI and WebUI teams push out new versions at least twice a week!
So that is a minimum of four releases uploaded into Codex. The question is: how far back in time should we go and still support the bundles? We are still debating and figuring out the best strategy. We have manual steps to remove artifacts, but we want to automate that using smart heuristics.
We want to leverage the awesome Webpack ecosystem, but at the moment we have a few restrictions, since we create a custom artifact and there is a specific way to serve and consume bundles that depends on this artifact.
At the moment we do not support source maps with the current setup, and that is something we want to build very soon.
We are exploring ways to make our bundles work with Webpack's existing dev server, which will enhance the local developer experience.
We are also exploring bundle optimization options that webpack can do on our dynamically generated bundles.
That's all for today; my name is Rajat Kumar.
Feel free to reach out to me, you can also find me after this and I will be happy to talk and take questions.
Thank you and have a great day.