Screaming Fast JSON parsing
Karthik Ramgopal
Who am I?
Engineer
Mobile Infrastructure lead
Former engineer on Flagship and Pulse app teams
Obsessed about performance
Connect with me: https://www.linkedin.com/in/karthikrg/
Our user base
LinkedIn’s Android app family
Job Search
Lookup
Pulse
Slideshare
Sales Navigator
Lynda
Recruiter
Students
Android device and network diversity
● Samsung Galaxy S6
● 4x2.1 GHz Cortex-A57 + 4x1.5 GHz Cortex-A53
● 3 GB RAM
● LTE (100 Mbits/s)
● Samsung Star Pro
● 1 GHz Cortex-A5
● 512 MB RAM
● EDGE (384 Kbps)
LinkedIn client app high level architecture
Frontend API server
LinkedIn uses JSON to talk between apps and server
What is JSON?
JavaScript Object Notation is a data serialization format.
Key value encoded data.
Values must be a string, number, boolean, array, object, or null.
Text based, Light weight (relatively), Human readable.
Wide support across programming languages/platforms
What else is out there?
XML (eXtensible Markup Language)
(+) Text based and human readable.
(-) Very verbose.
Binary Data Formats
Examples include MsgPack, ProtoBuf, FlatBuffers, Cap'n Proto etc.
(+) More compact than JSON. Positional index based formats even omit keys.
(+) Backing schema to describe data structure with platform specific binding generators
(+) Much faster to parse than JSON when using vanilla parsing techniques.
(-) Not human readable.
(-) No native parsing support in web browsers.
(-) Removed fields still occupy some space in positional formats.
(-) Schema evolution MUST preserve field order in positional formats.
Data Flow
[Diagram: Network -> Data (JSON/XML/Binary) -> Parser -> DataModel -> Model Binder -> ViewModel -> View Binder. Cache path: Fission, DataModel, MMAP Cache (Binary).]
What affects JSON parsing performance?
CPU
Validating structure and tokenizing.
Large number of branches causing pipeline stalls.
Memory
Large number of small allocs on the heap
Causes memory churn slowing down the allocator
Garbage collection pauses
Types of JSON parsers
Who controls the flow of parsed data to the consumer?
Pull parser (Consumer controls)
Push parser (Parser controls)
How many times is the data processed?
Once (traditional parsers)
Twice (index overlay parsers)
How is the data processed?
JSON vs Binary
JSON (naturally) has a size disadvantage over binary
But, it is human readable and has wider multi-platform support
Schema evolution is easier
Size does matter, or does it?
JSON compresses very well being text based and having key repetition
Binary formats don’t compress as well
With compression, size over the wire is very comparable
Decompression cost is similar, but after decompression binary is smaller
Format             Compressed size (gzip)   Uncompressed size
JSON               35.2 KB                  309.5 KB
Protocol Buffers   33.7 KB                  178.2 KB
FlatBuffers        34.1 KB                  192.8 KB
Cap'n Proto        33.8 KB                  166.3 KB
LinkedIn Feed, 20 items (90th percentile sizes)
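The claim that key repetition makes JSON compress well can be checked with a few lines of Java. This is only a sketch: the class name GzipSizeDemo, the synthetic payload, and its field values are assumptions for illustration, not the LinkedIn feed data behind the numbers above.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPOutputStream;

class GzipSizeDemo {
    /** Builds a JSON array of `count` small records that all repeat the same keys. */
    static String buildFeed(int count) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < count; i++) {
            if (i > 0) sb.append(',');
            sb.append("{\"numConnections\":").append(i)
              .append(",\"name\":\"member").append(i).append("\"}");
        }
        return sb.append(']').toString();
    }

    /** Returns the size in bytes of the gzip-compressed form of `raw`. */
    static int gzipSize(byte[] raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.size();
    }
}
```

Comparing gzipSize(bytes) with bytes.length for a payload from buildFeed shows how much of the raw size is repeated keys and punctuation.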
Comparison of Android JSON parsing libraries
Parser              Streaming   Reflection   Parse time (ms)   Allocation (KB)
JSONObject          No          No           297/281           2397/2371
JsonReader          Yes         No           199/187           409/396
Alibaba streaming   Yes         No           72/70             220/185
GSON                Yes         Yes          521/486           1135/302
Moshi               Yes         Yes          493/311           1088/341
Jackson Databind    Yes         Yes          402/78            1192/191
Jackson streaming   Yes         No           79/77             219/187
LinkedIn Feed, 20 items, Nexus 5. Values are first parse / subsequent parses.
● Using reflection introduces a massive first time penalty.
● Alibaba and Jackson streaming win hands down with Alibaba having the slight edge.
What is the ideal way to parse network responses?
Streaming (SAX) vs blob (DOM) parsing
Stream means parsing can begin before network download finishes.
Memory pressure/Garbage is reduced with streaming.
Typically harder to code by hand (need to handle incremental data load etc.)
Minimize transformations
Typical parsing involves JSON -> Map -> Model object POJO.
Intermediary transformation involves CPU and memory.
Go directly from JSON to POJO.
Android specific code generation considerations
Prefer fields instead of methods for accessors on POJO.
65k method count limit pre Android L
Virtual function execution penalty
Use primitive types wherever possible
int instead of Integer for example
Boxed values are allocated on the heap and result in unnecessary memory churn
Generate compact code
Surely someone must have figured all this out?
Yes! Open source code-generating JSON parsers based on Jackson streaming.
Instagram JSON parser
LoganSquare (Uses a teeny bit of reflection)
How does the generated code look?
profile.json:
{
    "numConnections": 20,
    "name": "John"
}
Profile build(JsonParser parser) {
    // Defaults keep the locals definitely assigned when a field is absent.
    String name = null;
    int numConnections = 0;
    parser.startRecord(); // Consumes '{'
    while (parser.hasMoreFields()) {
        String field = parser.getText();
        parser.startField(); // Consumes ':'
        if ("numConnections".equals(field)) {
            numConnections = parser.getInteger();
        } else if ("name".equals(field)) {
            name = parser.getText();
        } else {
            parser.skipField(); // Unknown field: skip its value
        }
    }
    return new Profile(numConnections, name);
}
But binary still wins!
Much faster (lower CPU consumption)
Far fewer intermediate memory allocs (memory churn/garbage reduced)
Parser                  Streaming   Reflection   Parse time (ms)   Allocation (KB)
Alibaba streaming       Yes         No           72/70             220/185
Jackson streaming       Yes         No           79/77             219/187
Protocol Buffers Lite   Yes         No           32/31             66/62
LinkedIn Feed, 20 items, Nexus 5. Values are first parse / subsequent parses.
The gap is wider on lower end devices
Binary is ~4x faster
Could be the difference between delight and despair!
Parser                  Streaming   Reflection   Parse time (ms)   Allocation (KB)
Alibaba streaming       Yes         No           377/370           220/185
Jackson streaming       Yes         No           392/397           219/187
Protocol Buffers Lite   Yes         No           99/97             66/62
LinkedIn Feed, 20 items, Galaxy Star Pro. Values are first parse / subsequent parses.
Closing the gap with binary
Make the CPU do less work when parsing JSON
Fewer memory allocations
Reduce garbage and memory churn
All when parsing more data
Don’t pay for what you don’t use
The hunt for inefficiencies: JSON keys
Positional binary formats achieve compaction and faster parsing since they
don’t serialize keys, and use position based encoding.
Parsing keys involves the following
Allocating key strings.
Comparing key strings with known “keys” to figure out which field to match
Back to code
Profile build(JsonParser parser) {
    String name = null;
    int numConnections = 0;
    parser.startRecord(); // Consumes '{'
    while (parser.hasMoreFields()) {
        String field = parser.getText(); // String alloc: a temporary String per key
        parser.startField(); // Consumes ':'
        if ("numConnections".equals(field)) { // Comparisons: one equals() per known key
            numConnections = parser.getInteger();
        } else if ("name".equals(field)) {
            name = parser.getText();
        } else {
            parser.skipField();
        }
    }
    return new Profile(numConnections, name);
}
The cost of JSON key comparisons
If there are ‘n’ keys with an average length of ‘k’.
Temporary memory allocation space complexity O(nk)
Equality checking time complexity O(n²k)
But we know the keys in advance, so can we use this to our advantage?
Yes! Use a trie with positional ordinals as values
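[Trie diagram: root -> n, then a -> m -> e for "name" (ordinal 0) and u -> m -> ... -> s for "numConnections" (ordinal 1)]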
● Trades a 1 time static space allocation for faster performance.
● No temp string allocation. Read character by character from
source and check in trie.
● Avoids multiple comparison branches using if-else.
● Trie can be statically generated (since all key names are known
in advance)
● Trie can also be compacted to reduce storage space for non
redundant subsequences.
● Reduces space complexity to a 1 time cost of O(nk)
● Reduces equality checking time complexity to O(nk)
● Faster performance due to less branching.
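The bullets above can be sketched as a small trie that is walked directly over the input bytes. This is an illustration under assumptions, not the parser from the talk: the class KeyTrie and its methods are hypothetical names, keys are assumed to be ASCII without escape sequences, and the nodes use plain 128-slot arrays rather than the compacted form the slides mention.

```java
class KeyTrie {
    private static final int NO_ORDINAL = -1;

    /** One node per character; children are indexed by ASCII code. */
    private static final class Node {
        final Node[] children = new Node[128];
        int ordinal = NO_ORDINAL;
    }

    private final Node root = new Node();

    /** Registers a known key once, up front (the one-time static cost). */
    void put(String key, int ordinal) {
        Node node = root;
        for (int i = 0; i < key.length(); i++) {
            char c = key.charAt(i);
            if (c >= 128) throw new IllegalArgumentException("non-ASCII key: " + key);
            if (node.children[c] == null) node.children[c] = new Node();
            node = node.children[c];
        }
        node.ordinal = ordinal;
    }

    /**
     * Walks the trie over buf starting at `start` (the first character after the
     * opening quote) until the closing quote. Returns the key's ordinal, or -1
     * for an unknown key. No String is allocated along the way.
     */
    int ordinalOf(byte[] buf, int start) {
        Node node = root;
        int i = start;
        while (i < buf.length && buf[i] != '"') {
            int c = buf[i++];
            // Once we fall off the trie (or see a non-ASCII byte) the key is unknown,
            // but we keep scanning so the caller can resume after the closing quote.
            node = (node == null || c < 0) ? null : node.children[c];
        }
        return node == null ? NO_ORDINAL : node.ordinal;
    }
}
```

The returned ordinal can feed a switch statement, so each key costs one pass over its characters instead of a chain of equals() calls.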
Generated code with Trie
private static final Trie KEY_STORE = new Trie();
static {
    KEY_STORE.put("name", 0);
    KEY_STORE.put("numConnections", 1);
}

Profile build(NewJsonParser parser) {
    String name = null;
    int numConnections = 0;
    parser.startRecord(); // Consumes '{'
    while (parser.hasMoreFields()) {
        // The ordinal comes from walking the trie; no key String is allocated.
        int ordinal = parser.getFieldOrdinal(KEY_STORE);
        parser.startField(); // Consumes ':'
        switch (ordinal) {
            case 0: // "name"
                name = parser.getText();
                break;
            case 1: // "numConnections"
                numConnections = parser.getInteger();
                break;
            default:
                parser.skipField();
        }
    }
    return new Profile(numConnections, name);
}
How does this change the numbers?
Closes the gap but not enough!
Parser                  Parse time (ms)   Allocation (KB)
Alibaba streaming       72/70             220/185
Jackson streaming       79/77             219/187
Protocol Buffers Lite   32/31             66/62
New Json parser         57/55             129/107
LinkedIn Feed, 20 items, Nexus 5. Values are first parse / subsequent parses.
Exploiting prior knowledge of value types
Our JSON is backed by a schema. Schemas are written using an IDL.
We internally use PDL (Pegasus Data Language) as the IDL.
record Profile {
numConnections: int?
name: String?
}
● Records define a JSON object.
● Field names here are the field names in the serialized JSON.
● Types in the schema are types of values in the serialized JSON.
● Knowing types beforehand means parsing code can be lax and needn’t have strict checks.
● If an unexpected type is found, JSON is malformed, abort!
{
    "numConnections": 20,
    "name": "John"
}
Vanilla JSON parser field value parsing
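After field start (:), a vanilla parser branches on the first character of the value:
Value type    First character
Object/Map    {
Array         [
Number        - or 0 to 9
String        "
Boolean       t or f
Null          n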
● Since we know types beforehand, these branches can be avoided.
● Parsing of value can be on-demand.
● Significantly reduces parse time.
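To illustrate skipping the type dispatch: if generated code already knows a field is an int, it can call an int reader directly and abort on anything else. A minimal sketch, assuming a byte-array input; TypedValueReader and readInt are hypothetical names, and overflow checks are left out for brevity.

```java
class TypedValueReader {
    /**
     * Reads a JSON integer starting at buf[pos], on the assumption that the schema
     * says this field is an int. There is no branch on '{', '[', '"', 't', 'f' or 'n'.
     * Throws if no digits are found, which means the JSON does not match the schema.
     */
    static int readInt(byte[] buf, int pos) {
        boolean negative = pos < buf.length && buf[pos] == '-';
        int i = negative ? pos + 1 : pos;
        int digitsStart = i;
        int value = 0;
        while (i < buf.length && buf[i] >= '0' && buf[i] <= '9') {
            value = value * 10 + (buf[i] - '0'); // overflow is not checked in this sketch
            i++;
        }
        if (i == digitsStart) {
            throw new IllegalStateException("Expected int at offset " + pos);
        }
        return negative ? -value : value;
    }
}
```

A generated builder would call readInt for numConnections and a string reader for name, each chosen at code generation time from the schema.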
How does this change the numbers?
Closes the gap more on parse time, temp allocations are still pretty bad!
Parser                  Parse time (ms)   Allocation (KB)
Alibaba streaming       72/70             220/185
Jackson streaming       79/77             219/187
Protocol Buffers Lite   32/31             66/62
New Json parser         45/42             127/108
LinkedIn Feed, 20 items, Nexus 5. Values are first parse / subsequent parses.
All obvious issues seem fixed. What else?
Sometimes profiling is the only answer to find hotspots.
Data arrives as a UTF-8 byte stream over the network not as chars.
LinkedIn app payloads are massively String heavy.
Profiling showed some CPU and allocation hotspots
Converting bytes to chars using Java’s built-in decoder.
Reading strings.
Converting bytes to chars?
Another transformation.
Temporary memory allocs for decoding buffers etc.
Most JSON tokens are ASCII, so they fit in 1 byte instead of a 2-byte Java char.
Surprise! Jackson, Alibaba etc. do have separate UTF-8 stream parsers.
We adopt a Jackson-like optimized approach when decoding UTF-8 strings.
UTF-8 decoding
Variable length encoding
1 byte/ASCII characters (U+0000 to U+007F)
2 byte chars (U+0080 to U+07FF)
3 byte chars (U+0800 to U+FFFF)
4 byte chars (U+10000 to U+10FFFF)
int c = inputStream.read(); // assumes c >= 0, i.e. not end of stream
if (c <= 0x7F) {
    // 1 byte (0x0000 - 0x007F)
    // read 1 byte UTF
} else if ((c & 0xE0) == 0xC0) {
    // 2 bytes (0x0080 - 0x07FF)
    // read 2 byte UTF
} else if ((c & 0xF0) == 0xE0) {
    // 3 bytes (0x0800 - 0xFFFF)
    // read 3 byte UTF
} else if ((c & 0xF8) == 0xF0) {
    // 4 bytes; double-char with surrogates.
    // read 4 byte UTF
}
Up to 4 branches
Can we make this faster? Yes!
● Static 256 int alloc, but helps us massively during
decode.
● Reduces CPU computation during decode as well as
branches.
● Massively speeds up string decode.
UTF-8 decoding revised
int c = inputStream.read();
// The table maps each lead byte to its sequence length (0 marks 1 byte ASCII).
switch (UTF_8_LOOKUP_TABLE[c]) {
    case 0: // read 1 byte char
        break;
    case 2: // read 2 byte char
        break;
    case 3: // read 3 byte char
        break;
    case 4: // read 4 byte char
        break;
    default: // handle error
        break;
}
1 table lookup and 1 branch per char
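The slides do not show how the 256-entry table is filled. Here is a minimal sketch, assuming the table stores the sequence length per lead byte, with 0 for ASCII and -1 for bytes that cannot start a sequence, to match the case labels above. The class name Utf8TableDecoder and the decode method are hypothetical. Input is assumed to be well-formed: continuation bytes, overlong forms and truncated sequences are not validated.

```java
class Utf8TableDecoder {
    /** Sequence length keyed by lead byte: 0 = ASCII, 2/3/4 = multi-byte, -1 = invalid lead. */
    static final int[] UTF_8_LOOKUP_TABLE = new int[256];
    static {
        // Entries 0x00..0x7F stay 0 (ASCII); only the upper half needs filling.
        for (int b = 0x80; b < 0x100; b++) {
            if ((b & 0xE0) == 0xC0) UTF_8_LOOKUP_TABLE[b] = 2;
            else if ((b & 0xF0) == 0xE0) UTF_8_LOOKUP_TABLE[b] = 3;
            else if ((b & 0xF8) == 0xF0) UTF_8_LOOKUP_TABLE[b] = 4;
            else UTF_8_LOOKUP_TABLE[b] = -1; // continuation byte or invalid lead
        }
    }

    /** Decodes well-formed UTF-8 bytes with one table lookup and one switch per character. */
    static String decode(byte[] in) {
        StringBuilder out = new StringBuilder(in.length);
        int i = 0;
        while (i < in.length) {
            int c = in[i++] & 0xFF;
            switch (UTF_8_LOOKUP_TABLE[c]) {
                case 0:
                    out.append((char) c);
                    break;
                case 2:
                    out.append((char) (((c & 0x1F) << 6) | (in[i++] & 0x3F)));
                    break;
                case 3:
                    out.append((char) (((c & 0x0F) << 12)
                            | ((in[i++] & 0x3F) << 6) | (in[i++] & 0x3F)));
                    break;
                case 4: {
                    int cp = ((c & 0x07) << 18) | ((in[i++] & 0x3F) << 12)
                            | ((in[i++] & 0x3F) << 6) | (in[i++] & 0x3F);
                    out.appendCodePoint(cp); // emits a surrogate pair
                    break;
                }
                default:
                    throw new IllegalArgumentException("Invalid UTF-8 lead byte at offset " + (i - 1));
            }
        }
        return out.toString();
    }
}
```

The StringBuilder here is only for brevity; the buffer handling discussed later in the deck would replace it.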
Reading long strings
Traditional approach using StringBuilder:
StringBuilder builder = new StringBuilder();
while (!parser.stringEndReached()) {
    builder.append(parser.nextChar());
}
return builder.toString();
● Every time buffer is enlarged to make more space three things happen
○ Allocating a new buffer (CPU + memory alloc).
○ Copying from old buffer to new buffer (CPU cost).
○ Garbage collecting old buffer (Memory churn and garbage).
● If we pool the underlying buffers in a buffer pool, and use a custom ‘StringBuilder’
○ Memory alloc, garbage and churn reduced.
○ CPU cost of copy still remains.
○ Over large, diverse payloads, pool becomes fragmented so efficiency reduces.
Reading long strings
Segmentation using pooled homogeneous buffers helps performance.
Zero copy cost when builder is enlarged (New buffer is appended to list)
Memory alloc, churn and garbage cost amortized by pooling.
Segmentation into homogeneous chunks means no fragmentation.
Final string computation may be slightly slower, but the buffer size is chosen so that the gains elsewhere more than cover it.
[Diagram: a list of equally sized segments: Buffer 1, Buffer 2, Buffer 3, Buffer 4]
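A minimal sketch of the segmented approach, assuming fixed-size char segments drawn from a shared pool. SegmentedCharBuilder, SEGMENT_SIZE and the pool are hypothetical names and choices. The segment size is deliberately tiny to make the behaviour visible, and the pool is not thread-safe.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

class SegmentedCharBuilder {
    static final int SEGMENT_SIZE = 8; // tiny for illustration; a real size would be tuned

    /** Shared pool of equally sized segments, so any free segment fits any request. */
    private static final ArrayDeque<char[]> POOL = new ArrayDeque<>();

    private final List<char[]> segments = new ArrayList<>();
    private int usedInLast = SEGMENT_SIZE; // "full", so the first append takes a segment

    void append(char c) {
        if (usedInLast == SEGMENT_SIZE) {
            // Growing appends a segment; nothing already written is copied.
            char[] seg = POOL.poll();
            segments.add(seg != null ? seg : new char[SEGMENT_SIZE]);
            usedInLast = 0;
        }
        segments.get(segments.size() - 1)[usedInLast++] = c;
    }

    int length() {
        return segments.isEmpty() ? 0 : (segments.size() - 1) * SEGMENT_SIZE + usedInLast;
    }

    /** Assembles the final String and hands all segments back to the pool. */
    String buildAndRelease() {
        char[] all = new char[length()];
        for (int s = 0; s < segments.size(); s++) {
            int n = (s == segments.size() - 1) ? usedInLast : SEGMENT_SIZE;
            System.arraycopy(segments.get(s), 0, all, s * SEGMENT_SIZE, n);
        }
        POOL.addAll(segments);
        segments.clear();
        usedInLast = SEGMENT_SIZE;
        return new String(all);
    }

    static int pooledSegments() {
        return POOL.size();
    }
}
```

Copying happens only once, when the final String is assembled, which is the "slightly slower" step the slide refers to.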
Characters not in the basic multilingual plane
Not encoded as a single code point in JSON escapes.
Encoded as UTF-16 surrogate pairs, each half escaped with \u.
Historic reason for doing so (Any guesses?)
Needs to be handled carefully when parsing.
Static decoder table for hex chars, similar to the UTF-8 one, to speed up parsing.
U+1D11E -> \uD834\uDD1E
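A sketch of how an escaped surrogate pair can be turned back into a code point with a static hex table. The names UnicodeEscapeDecoder, readCodeUnit and decodeEscapedCodePoint are hypothetical. Character.isHighSurrogate, Character.isLowSurrogate and Character.toCodePoint are standard Java methods.

```java
import java.util.Arrays;

class UnicodeEscapeDecoder {
    /** Hex digit value keyed by ASCII code; -1 for anything that is not a hex digit. */
    private static final int[] HEX = new int[128];
    static {
        Arrays.fill(HEX, -1);
        for (int i = 0; i < 10; i++) HEX['0' + i] = i;
        for (int i = 0; i < 6; i++) {
            HEX['a' + i] = 10 + i;
            HEX['A' + i] = 10 + i;
        }
    }

    /** Reads four hex digits starting at pos and returns them as one UTF-16 code unit. */
    static char readCodeUnit(CharSequence s, int pos) {
        int value = 0;
        for (int i = 0; i < 4; i++) {
            char c = s.charAt(pos + i);
            int digit = c < 128 ? HEX[c] : -1;
            if (digit < 0) throw new IllegalArgumentException("Bad hex digit at " + (pos + i));
            value = (value << 4) | digit;
        }
        return (char) value;
    }

    /** Decodes one escape, or one escaped surrogate pair, where pos points at the backslash. */
    static int decodeEscapedCodePoint(CharSequence s, int pos) {
        char high = readCodeUnit(s, pos + 2);
        if (!Character.isHighSurrogate(high)) return high;
        // A high surrogate must be followed by a second escape holding the low surrogate.
        if (pos + 12 > s.length() || s.charAt(pos + 6) != '\\' || s.charAt(pos + 7) != 'u') {
            throw new IllegalArgumentException("Unpaired high surrogate at " + pos);
        }
        char low = readCodeUnit(s, pos + 8);
        if (!Character.isLowSurrogate(low)) {
            throw new IllegalArgumentException("Unpaired high surrogate at " + pos);
        }
        return Character.toCodePoint(high, low);
    }
}
```

For the example above, the two escapes D834 and DD1E combine into the single code point U+1D11E.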
Analysis of string content
Strings in LinkedIn apps tend to be very ASCII character heavy.
Even string values in other locales often are interspersed with ASCII content.
ASCII characters often occur together in a sequence.
Parsing these can be sped up if we use a tight loop for ASCII content.
Break out and do extra branches if non ASCII content is encountered.
Massively improves overall string parsing performance from byte streams.
When reading ASCII, the byte value is the same as the char value.
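The tight loop can be sketched as follows, assuming the string body sits in a byte array and the output char array has enough room. AsciiFastPath and copyAsciiRun are hypothetical names. The loop stops at anything that needs more work, so the caller can fall back to the general decoding path.

```java
class AsciiFastPath {
    /**
     * Copies consecutive plain ASCII bytes from in[pos..end) into out, starting at
     * outPos, and returns how many were copied. Stops at the first non-ASCII byte,
     * closing quote or backslash.
     */
    static int copyAsciiRun(byte[] in, int pos, int end, char[] out, int outPos) {
        int i = pos;
        while (i < end) {
            byte b = in[i];
            // Negative bytes are >= 0x80, i.e. part of a multi-byte UTF-8 sequence.
            if (b < 0 || b == '"' || b == '\\') break;
            out[outPos++] = (char) b; // for ASCII the byte value is the char value
            i++;
        }
        return i - pos;
    }
}
```

Because most of the loop body is a single comparison and a store, long ASCII runs cost very little per character.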
Whitespaces
JSON sent over the wire is compact, not pretty-printed.
When parsing delimiters, check for the delimiter first, before skipping whitespace.
Among whitespace characters, a plain space is more likely to occur than a carriage return, line feed or tab.
Tight loop for space characters when skipping whitespace.
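These whitespace rules can be sketched as a single helper. The names DelimiterScanner and expect are hypothetical; the order of the checks follows the points above: delimiter first, then a tight loop over plain spaces, then the rarer whitespace characters.

```java
class DelimiterScanner {
    /**
     * Returns the index just past the expected delimiter, skipping any whitespace
     * in front of it. Throws if something other than whitespace comes first.
     */
    static int expect(byte[] buf, int pos, char delimiter) {
        // Fast path: in compact JSON the delimiter is right here.
        if (pos < buf.length && buf[pos] == delimiter) return pos + 1;
        while (pos < buf.length) {
            // Tight loop for the most likely whitespace character.
            while (pos < buf.length && buf[pos] == ' ') pos++;
            if (pos == buf.length) break;
            byte b = buf[pos];
            if (b == delimiter) return pos + 1;
            if (b != '\n' && b != '\r' && b != '\t') break; // not whitespace: malformed
            pos++;
        }
        throw new IllegalStateException("Expected '" + delimiter + "' at offset " + pos);
    }
}
```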
After doing all this...
The performance is very comparable!
Parser                  Parse time (ms)   Allocation (KB)
Alibaba streaming       72/70             220/185
Jackson streaming       79/77             219/187
Protocol Buffers Lite   32/31             66/62
New Json parser         31/30             62/41
LinkedIn Feed, 20 items, Nexus 5. Values are first parse / subsequent parses.
● Still human readable
● Still debuggable
● Can still use the same format across iOS/Android/Web
And on low end devices...
The improvements are more pronounced!
Parser                  Parse time (ms)   Allocation (KB)
Alibaba streaming       377/370           220/185
Jackson streaming       392/397           219/187
Protocol Buffers Lite   99/97             66/62
New Json parser         99/96             62/41
LinkedIn Feed, 20 items, Samsung Star Pro. Values are first parse / subsequent parses.
● Most of the benefit comes from saving on alloc and GC pauses
● Results in smoother UI
Zero Garbage!
This new parser is Zero garbage.
It does not allocate any transient memory beyond the POJOs it creates as the result of parse.
All intermediate allocs, such as buffers, are pooled.
Pools are homogeneous as much as possible to limit fragmentation.
Pool capacities/buffer sizes are tuned based on device and network.
Lessons learnt
It is possible to parse JSON fast even on low end Android devices.
All formats have their Achilles' heel, and there is no one size fits all.
Never adopt some cool new format blindly. Measure, measure, measure!
What’s next?
Similar parser + codegen for iOS in Obj-C
Open source both as part of Rest.li mobile optimized bindings.
Targeted for Q4 2017
Questions?

Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with Azure
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion Application
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdf
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)
 

Screaming fast JSON parsing on Android

  • 1. Screaming Fast JSON parsing Karthik Ramgopal
  • 2. Who am I? Engineer Mobile Infrastructure lead Former engineer on Flagship and Pulse app teams Obsessed about performance Connect with me: https://www.linkedin.com/in/karthikrg/
  • 4. LinkedIn’s Android app family Job Search Lookup Pulse Slideshare Sales Navigator Lynda Recruiter Students
  • 5. Android device and network diversity
      ● Samsung Galaxy S6: 4x2.1 GHz Cortex-A57 + 4x1.5 GHz Cortex-A53, 3 GB RAM, LTE (100 Mbits/s)
      ● Samsung Star Pro: 1 GHz Cortex-A5, 512 MB RAM, EDGE (384 Kbps)
  • 6. LinkedIn client app high level architecture Frontend API server
  • 7. LinkedIn uses JSON to talk between apps and server
  • 8. What is JSON? JavaScript Object Notation is a data serialization format. Key value encoded data. Values must be string, boolean, number, array, object, null. Text based, Light weight (relatively), Human readable. Wide support across programming languages/platforms
  • 9. What else is out there?
  • 10. XML (eXtensible Markup Language) (+) Text based and human readable. (-) Very verbose.
  • 11. Binary Data Formats
      Examples include MsgPack, ProtoBuf, FlatBuffers, Cap’n Proto etc.
      (+) More compact than JSON. Positional index based formats even omit keys.
      (+) Backing schema to describe data structure, with platform-specific binding generators.
      (+) Much faster to parse than JSON when using vanilla parsing techniques.
      (-) Not human readable.
      (-) No native parsing support in web browsers.
      (-) Removed fields still occupy some space in positional formats.
      (-) Schema evolution MUST preserve field order in positional formats.
  • 12. Data Flow
      [Diagram; labels: Network, Data (JSON/XML/Binary), Parser, DataModel, Model Binder, ViewModel, View Binder, Fission, MMAP Cache (Binary)]
  • 13. What affects JSON parsing performance?
      ● CPU
        ○ Validating structure and tokenizing.
        ○ Large number of branches causing pipeline stalls.
      ● Memory
        ○ Large number of small allocs on the heap.
        ○ Causes memory churn, slowing down the allocator.
        ○ Garbage collection pauses.
  • 14. Types of JSON parsers
      ● Who controls the flow of parsed data to the consumer?
        ○ Pull parser (Consumer controls)
        ○ Push parser (Parser controls)
      ● How many times is the data processed?
        ○ Once (traditional parsers)
        ○ Twice (index overlay parsers)
      ● How is the data processed?
  • 15. JSON vs Binary JSON (naturally) has a size disadvantage over binary But, it is human readable and has wider multi-platform support Schema evolution is easier
  • 16. Size does matter or does it?
      ● JSON compresses very well, being text based and having key repetition.
      ● Binary formats don’t compress as well.
      ● With compression, size over the wire is very comparable.
      ● Decompression cost is similar, but after decompression binary is smaller.

      Format            Compressed size (gzip)   Uncompressed size
      JSON              35.2 KB                  309.5 KB
      ProtocolBuffers   33.7 KB                  178.2 KB
      FlatBuffers       34.1 KB                  192.8 KB
      Cap’n Proto       33.8 KB                  166.3 KB
      LinkedIn Feed, 20 items (90th percentile sizes)
  • 17. Comparison of Android JSON parsing libraries

      Parser              Streaming   Reflection   Parse time (ms)   Allocation (KB)
      JSONObject          No          No           297/281           2397/2371
      JsonReader          Yes         No           199/187           409/396
      Alibaba streaming   Yes         No           72/70             220/185
      GSON                Yes         Yes          521/486           1135/302
      Moshi               Yes         Yes          493/311           1088/341
      Jackson Databind    Yes         Yes          402/78            1192/191
      Jackson streaming   Yes         No           79/77             219/187
      LinkedIn Feed, 20 items (First/Subsequent), Nexus 5

      ● Using reflection introduces a massive first time penalty.
      ● Alibaba and Jackson streaming win hands down, with Alibaba having the slight edge.
  • 18. What is the ideal way to parse network responses?
      ● Streaming (SAX) vs blob (DOM) parsing
        ○ Streaming means parsing can begin before the network download finishes.
        ○ Memory pressure/garbage is reduced with streaming.
        ○ Typically harder to code by hand (need to handle incremental data load etc.)
      ● Minimize transformations
        ○ Typical parsing involves JSON -> Map -> Model object POJO.
        ○ Intermediary transformation involves CPU and memory.
        ○ Go directly from JSON to POJO.
  • 19. Android specific code generation considerations
      ● Prefer fields instead of methods for accessors on POJO.
        ○ 65k method count limit pre Android L
        ○ Virtual function execution penalty
      ● Use primitive types wherever possible
        ○ int instead of Integer for example
        ○ Boxed values are allocated on the heap and result in unnecessary memory churn
      ● Generate compact code
  • 20. Surely someone must have figured all this out?
      Yes! Open source code-generating JSON parsers based on Jackson streaming:
      ● Instagram JSON parser
      ● LoganSquare (uses a teeny bit of reflection)
  • 21. How does the generated code look?
      profile.json:
          { "numConnections" : 20, "name": "John" }
      Generated builder:
          Profile build(JsonParser parser) {
            String name = null;
            int numConnections = 0;
            parser.startRecord(); // Consumes '{'
            while (parser.hasMoreFields()) {
              String field = parser.getText();
              parser.startField(); // Consumes ':'
              if ("numConnections".equals(field)) {
                numConnections = parser.getInteger();
              } else if ("name".equals(field)) {
                name = parser.getText();
              } else {
                parser.skipField();
              }
            }
            return new Profile(numConnections, name);
          }
  • 22. But binary still wins!
      ● Much faster (less CPU consumption)
      ● Far fewer intermediate memory allocs (memory churn/garbage reduced)

      Parser                 Streaming   Reflection   Parse time (ms)   Allocation (KB)
      Alibaba streaming      Yes         No           72/70             220/185
      Jackson streaming      Yes         No           79/77             219/187
      Protocol Buffers Lite  Yes         No           32/31             66/62
      LinkedIn Feed, 20 items (First/Subsequent), Nexus 5
  • 23. The gap is wider on lower end devices
      ● Binary is ~4x faster
      ● Could be the difference between delight and despair!

      Parser                 Streaming   Reflection   Parse time (ms)   Allocation (KB)
      Alibaba streaming      Yes         No           377/370           220/185
      Jackson streaming      Yes         No           392/397           219/187
      Protocol Buffers Lite  Yes         No           99/97             66/62
      LinkedIn Feed, 20 items (First/Subsequent), Galaxy Star Pro
  • 24. Closing the gap with binary Make the CPU do less work when parsing JSON Fewer memory allocations Reduce garbage and memory churn All when parsing more data
  • 25. Don’t pay for what you don’t use
  • 26. The hunt for inefficiencies: JSON keys
      Positional binary formats achieve compaction and faster parsing since they don’t serialize keys, and use position based encoding.
      Parsing keys involves the following:
      ● Allocating key strings.
      ● Comparing key strings with known “keys” to figure out which field to match.
  • 27. Back to code
          Profile build(JsonParser parser) {
            String name = null;
            int numConnections = 0;
            parser.startRecord(); // Consumes '{'
            while (parser.hasMoreFields()) {
              String field = parser.getText(); // String alloc
              parser.startField(); // Consumes ':'
              if ("numConnections".equals(field)) { // Comparison
                numConnections = parser.getInteger();
              } else if ("name".equals(field)) { // Comparison
                name = parser.getText();
              } else {
                parser.skipField();
              }
            }
            return new Profile(numConnections, name);
          }
  • 28. The cost of JSON key comparisons
      If there are ‘n’ keys with an average length of ‘k’:
      ● Temporary memory allocation space complexity is O(nk).
      ● Equality checking time complexity is O(n²k), since each of the n keys read may be compared against up to n known keys of length k.
      But we know the keys in advance, so can we use this to our advantage?
  • 29. Yes! Use a trie with positional ordinals as values
      [Diagram: trie for the keys “name” and “numConnections”, which share the prefix “n”; the leaf values are the ordinals 0 and 1]
      ● Trades a one-time static space allocation for faster performance.
      ● No temp string allocation. Read character by character from source and check in trie.
      ● Avoids multiple comparison branches using if-else.
      ● Trie can be statically generated (since all key names are known in advance).
      ● Trie can also be compacted to reduce storage space for non redundant subsequences.
      ● Reduces space complexity to a one-time cost of O(nk).
      ● Reduces equality checking time complexity to O(nk).
      ● Faster performance due to less branching.
  • 30. Generated code with Trie
          private static final Trie KEY_STORE = new Trie();
          static {
            KEY_STORE.put("name", 0);
            KEY_STORE.put("numConnections", 1);
          }

          Profile build(NewJsonParser parser) {
            String name = null;
            int numConnections = 0;
            parser.startRecord(); // Consumes '{'
            while (parser.hasMoreFields()) {
              int ordinal = parser.getFieldOrdinal(KEY_STORE);
              parser.startField(); // Consumes ':'
              switch (ordinal) {
                case 0: // "name"
                  name = parser.getText();
                  break;
                case 1: // "numConnections"
                  numConnections = parser.getInteger();
                  break;
                default:
                  parser.skipField();
              }
            }
            return new Profile(numConnections, name);
          }
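The slides do not show the Trie class itself. A minimal sketch of how such a key store could work is given below; it assumes ASCII-only field names with no escape sequences, and the class name `KeyTrie` and its `lookup` signature are hypothetical, not the parser's actual API. The point it illustrates is that the key is matched byte by byte against the trie, so no temporary String is allocated and no chain of `equals` calls is needed.

```java
/** Hypothetical sketch of a key trie; not the parser's actual implementation. */
final class KeyTrie {
    private static final class Node {
        final Node[] children = new Node[128]; // one slot per ASCII code
        int ordinal = -1;                      // -1: no known key ends at this node
    }

    private final Node root = new Node();

    /** Registers a known field name (assumed ASCII-only) under the given ordinal. */
    void put(String key, int ordinal) {
        Node node = root;
        for (int i = 0; i < key.length(); i++) {
            char c = key.charAt(i);
            if (c >= 128) {
                throw new IllegalArgumentException("non-ASCII key: " + key);
            }
            if (node.children[c] == null) {
                node.children[c] = new Node();
            }
            node = node.children[c];
        }
        node.ordinal = ordinal;
    }

    /**
     * Walks the trie over the raw bytes of a key, starting just after the
     * opening quote and stopping at the closing quote. Returns the ordinal of
     * the key, or -1 if the key is unknown or the closing quote is missing.
     */
    int lookup(byte[] input, int start) {
        Node node = root;
        for (int i = start; i < input.length; i++) {
            byte b = input[i];
            if (b == '"') {
                return node == null ? -1 : node.ordinal;
            }
            if (node != null) {
                // A negative byte is non-ASCII, so it cannot match any stored key.
                node = (b >= 0) ? node.children[b] : null;
            }
        }
        return -1;
    }
}
```

A fixed 128-slot child array per node is the simplest layout and wastes space; the compaction mentioned on the previous slide would replace it with something denser.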
  • 31. How does this change the numbers?
      Closes the gap but not enough!

      Parser                 Parse time (ms)   Allocation (KB)
      Alibaba streaming      72/70             220/185
      Jackson streaming      79/77             219/187
      Protocol Buffers Lite  32/31             66/62
      New Json parser        57/55             129/107
      LinkedIn Feed, 20 items (First/Subsequent), Nexus 5
  • 32. Exploiting prior knowledge of value types
      Our JSON is backed by a schema. Schemas are written using an IDL. We internally use PDL (Pegasus Data Language) as the IDL.
      Schema:
          record Profile {
            numConnections: int?
            name: String?
          }
      Serialized JSON:
          { "numConnections" : 20, "name": "John" }
      ● Records define a JSON object.
      ● Field names here are the field names in the serialized JSON.
      ● Types in the schema are types of values in the serialized JSON.
      ● Knowing types beforehand means parsing code can be lax and needn’t have strict checks.
      ● If an unexpected type is found, JSON is malformed, abort!
  • 33. Vanilla JSON parser field value parsing
      After a field start (:), a vanilla parser branches on the first character of the value:

      Value type    First character
      Object/Map    {
      Array         [
      Number        - or 0 to 9
      String        "
      Boolean       t or f
      Null          n

      ● Since we know types beforehand, these branches can be avoided.
      ● Parsing of value can be on-demand.
      ● Significantly reduces parse time.
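As an illustration of skipping the dispatch on the first character, the sketch below reads an integer value directly because the schema says the field is an int. The class `TypedCursor` and its methods are hypothetical names; the sketch assumes the cursor already sits on the first character of the value and it ignores integer overflow.

```java
/** Hypothetical sketch of schema-driven value reading; ignores int overflow. */
final class TypedCursor {
    private final byte[] in;
    private int pos;

    TypedCursor(byte[] in) {
        this.in = in;
    }

    /**
     * Reads an int without first checking whether the value is an object,
     * array, string, boolean or null. Anything that is not a number is treated
     * as malformed JSON and aborts the parse.
     */
    int readInt() {
        boolean negative = false;
        if (pos < in.length && in[pos] == '-') {
            negative = true;
            pos++;
        }
        int firstDigit = pos;
        int value = 0;
        while (pos < in.length && in[pos] >= '0' && in[pos] <= '9') {
            value = value * 10 + (in[pos] - '0');
            pos++;
        }
        if (pos == firstDigit) {
            throw new IllegalStateException("malformed JSON: expected a number at offset " + pos);
        }
        return negative ? -value : value;
    }

    /** Offset of the next unread byte. */
    int position() {
        return pos;
    }
}
```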
  • 34. How does this change the numbers?
      Closes the gap more on parse time, temp allocations are still pretty bad!

      Parser                 Parse time (ms)   Allocation (KB)
      Alibaba streaming      72/70             220/185
      Jackson streaming      79/77             219/187
      Protocol Buffers Lite  32/31             66/62
      New Json parser        45/42             127/108
      LinkedIn Feed, 20 items (First/Subsequent), Nexus 5
  • 35. All obvious issues seem fixed. What else?
      ● Sometimes profiling is the only answer to find hotspots.
      ● Data arrives as a UTF-8 byte stream over the network, not as chars.
      ● LinkedIn app payloads are massively String heavy.
      ● Profiling showed some CPU and allocation hotspots:
        ○ Converting bytes to chars using Java’s built-in decoder.
        ○ Reading strings.
  • 36. Converting bytes to chars?
      ● Another transformation.
      ● Temporary memory allocs for decoding buffers etc.
      ● Most JSON tokens are ASCII, so they need just 1 byte instead of 2.
      ● Surprise! Jackson, Alibaba etc. do have separate UTF-8 stream parsers.
      ● We adopt a Jackson-like optimized approach when decoding UTF-8 strings.
  • 37. UTF-8 decoding
      Variable length encoding:
      ● 1 byte/ASCII characters (U+0000 to U+007F)
      ● 2 byte chars (U+0080 to U+07FF)
      ● 3 byte chars (U+0800 to U+FFFF)
      ● 4 byte chars (U+10000 to U+10FFFF)

          int c = inputStream.read();
          if (c <= 0x7F) { // 1 byte (0x0000 - 0x007F)
            // read 1 byte UTF
          } else if ((c & 0xE0) == 0xC0) { // 2 bytes (0x0080 - 0x07FF)
            // read 2 byte UTF
          } else if ((c & 0xF0) == 0xE0) { // 3 bytes (0x0800 - 0xFFFF)
            // read 3 byte UTF
          } else if ((c & 0xF8) == 0xF0) { // 4 bytes; double-char with surrogates.
            // read 4 byte UTF
          }
      Up to 4 branches per character.
  • 38. Can we make this faster? Yes! ● Static 256 int alloc, but helps us massively during decode. ● Reduces CPU computation during decode as well as branches. ● Massively speeds up string decode.
  • 39. UTF-8 decoding revised
          int c = inputStream.read();
          switch (UTF_8_LOOKUP_TABLE[c]) {
            case 0:
              // read 1 byte char
              break;
            case 2:
              // read 2 byte char
              break;
            case 3:
              // read 3 byte char
              break;
            case 4:
              // read 4 byte char
              break;
            default:
              // handle error
              break;
          }
      1 branch, 1 comparison computation per char.
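The slides do not show how the 256-entry table is filled. One plausible construction, consistent with the case labels above (0 for ASCII, 2/3/4 for the sequence length), is sketched below. The value -1 for invalid lead bytes and the `decodeCodePoint` helper are assumptions added for illustration, and the sketch assumes well-formed, complete input: it does not validate continuation bytes or reject overlong encodings.

```java
/** Hypothetical sketch of a lead-byte lookup table for UTF-8 decoding. */
final class Utf8Lookup {
    /** Indexed by lead byte (0 to 255): 0 = ASCII, 2/3/4 = sequence length, -1 = invalid. */
    static final int[] UTF_8_LOOKUP_TABLE = new int[256];

    static {
        for (int b = 0; b < 256; b++) {
            if (b < 0x80) {
                UTF_8_LOOKUP_TABLE[b] = 0;  // 0xxxxxxx
            } else if ((b & 0xE0) == 0xC0) {
                UTF_8_LOOKUP_TABLE[b] = 2;  // 110xxxxx
            } else if ((b & 0xF0) == 0xE0) {
                UTF_8_LOOKUP_TABLE[b] = 3;  // 1110xxxx
            } else if ((b & 0xF8) == 0xF0) {
                UTF_8_LOOKUP_TABLE[b] = 4;  // 11110xxx
            } else {
                UTF_8_LOOKUP_TABLE[b] = -1; // continuation byte or 0xF8 to 0xFF
            }
        }
    }

    /** Decodes the code point whose lead byte is input[pos]. */
    static int decodeCodePoint(byte[] input, int pos) {
        int c = input[pos] & 0xFF;
        switch (UTF_8_LOOKUP_TABLE[c]) {
            case 0:
                return c;
            case 2:
                return ((c & 0x1F) << 6) | (input[pos + 1] & 0x3F);
            case 3:
                return ((c & 0x0F) << 12)
                        | ((input[pos + 1] & 0x3F) << 6)
                        | (input[pos + 2] & 0x3F);
            case 4:
                return ((c & 0x07) << 18)
                        | ((input[pos + 1] & 0x3F) << 12)
                        | ((input[pos + 2] & 0x3F) << 6)
                        | (input[pos + 3] & 0x3F);
            default:
                throw new IllegalArgumentException("invalid UTF-8 lead byte at offset " + pos);
        }
    }
}
```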
  • 40. Reading long strings
      Traditional approach using StringBuilder:
          StringBuilder builder = new StringBuilder();
          while (!parser.stringEndReached()) {
            builder.append(parser.nextChar());
          }
          return builder.toString();
      ● Every time the buffer is enlarged to make more space, three things happen:
        ○ Allocating a new buffer (CPU + memory alloc).
        ○ Copying from old buffer to new buffer (CPU cost).
        ○ Garbage collecting old buffer (memory churn and garbage).
      ● If we pool the underlying buffers in a buffer pool, and use a custom ‘StringBuilder’:
        ○ Memory alloc, garbage and churn reduced.
        ○ CPU cost of copy still remains.
        ○ Over large, diverse payloads, the pool becomes fragmented, so efficiency reduces.
  • 41. Reading long strings
      Segmentation using pooled homogeneous buffers helps performance.
      ● Zero copy cost when the builder is enlarged (a new buffer is appended to the list).
      ● Memory alloc, churn and garbage cost amortized by pooling.
      ● Segmentation into homogeneous chunks means no fragmentation.
      ● Final string computation may be slightly slower, but the buffer size is chosen so that the advantages elsewhere more than cover it.
      [Diagram: Buffer 1, Buffer 2, Buffer 3, Buffer 4]
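A rough sketch of the segmented approach is shown below. Everything here is an assumption made for illustration: the class name `SegmentedCharBuilder`, the 256-char segment size, and the per-builder pool are not from the slides, and the sketch is not thread-safe. Growing the builder only appends another fixed-size segment, so existing characters are never copied until the final string is assembled.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a string builder backed by pooled, equally sized segments. */
final class SegmentedCharBuilder {
    static final int SEGMENT_SIZE = 256; // assumed size; the slides say it is tuned per device

    private final ArrayDeque<char[]> pool = new ArrayDeque<>();
    private final List<char[]> segments = new ArrayList<>();
    private char[] current;
    private int posInCurrent = SEGMENT_SIZE; // "full", so the first append acquires a segment
    private int length;

    void append(char c) {
        if (posInCurrent == SEGMENT_SIZE) {
            // Growing means chaining one more segment; nothing already written is copied.
            char[] pooled = pool.poll();
            current = (pooled != null) ? pooled : new char[SEGMENT_SIZE];
            segments.add(current);
            posInCurrent = 0;
        }
        current[posInCurrent++] = c;
        length++;
    }

    /** Assembles the final string, then returns every segment to the pool for reuse. */
    String buildAndRelease() {
        char[] out = new char[length];
        int offset = 0;
        for (char[] segment : segments) {
            int count = Math.min(length - offset, SEGMENT_SIZE);
            System.arraycopy(segment, 0, out, offset, count);
            offset += count;
            pool.push(segment);
        }
        segments.clear();
        current = null;
        posInCurrent = SEGMENT_SIZE;
        length = 0;
        return new String(out);
    }
}
```

Because every segment has the same size, any pooled segment can satisfy any request, which is what avoids the fragmentation described for the pooled single-buffer approach.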
  • 42. Characters not in the basic multilingual plane
      ● Not encoded as codepoints.
      ● Encoded as UTF-16 surrogate pairs escaped with \u.
      ● Historic reason for doing so (Any guesses?)
      ● Needs to be handled carefully when parsing.
      ● Static decoder table for hex chars, similar to UTF-8, to speed up parsing.
      Example: U+1D11E -> \uD834\uDD1E
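To make the surrogate handling concrete, here is a small sketch that decodes one escaped code point using a static hex lookup table. The class and method names are hypothetical, and the sketch makes simplifying assumptions: the input is a CharSequence positioned on the backslash of the first escape, and a lone low surrogate is returned as-is rather than rejected.

```java
import java.util.Arrays;

/** Hypothetical sketch of decoding JSON unicode escapes, including surrogate pairs. */
final class UnicodeEscapes {
    /** Maps an ASCII code to its hex digit value, or -1 if it is not a hex digit. */
    private static final int[] HEX_VALUES = new int[128];

    static {
        Arrays.fill(HEX_VALUES, -1);
        for (int i = 0; i < 10; i++) {
            HEX_VALUES['0' + i] = i;
        }
        for (int i = 0; i < 6; i++) {
            HEX_VALUES['a' + i] = 10 + i;
            HEX_VALUES['A' + i] = 10 + i;
        }
    }

    /** Parses the four hex digits at s[pos] to s[pos + 3]. */
    private static int hex4(CharSequence s, int pos) {
        if (pos + 4 > s.length()) {
            throw new IllegalArgumentException("truncated escape");
        }
        int value = 0;
        for (int i = 0; i < 4; i++) {
            char c = s.charAt(pos + i);
            int digit = (c < 128) ? HEX_VALUES[c] : -1;
            if (digit < 0) {
                throw new IllegalArgumentException("bad hex digit at offset " + (pos + i));
            }
            value = (value << 4) | digit;
        }
        return value;
    }

    /**
     * Decodes the escape that starts with a backslash at s[pos]. If it is a
     * high surrogate, the escape that follows must be a low surrogate, and the
     * two are combined into one supplementary code point.
     */
    static int decodeEscape(CharSequence s, int pos) {
        int first = hex4(s, pos + 2);
        if (!Character.isHighSurrogate((char) first)) {
            return first;
        }
        int next = pos + 6;
        if (next + 2 > s.length() || s.charAt(next) != '\\' || s.charAt(next + 1) != 'u') {
            throw new IllegalArgumentException("high surrogate without a following escape");
        }
        int second = hex4(s, next + 2);
        if (!Character.isLowSurrogate((char) second)) {
            throw new IllegalArgumentException("high surrogate not followed by a low surrogate");
        }
        return Character.toCodePoint((char) first, (char) second);
    }
}
```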
  • 43. Analysis of string content
      ● Strings in LinkedIn apps tend to be very ASCII character heavy.
      ● Even string values in other locales are often interspersed with ASCII content.
      ● ASCII characters often occur together in a sequence.
      ● Parsing these can be sped up if we use a tight loop for ASCII content.
      ● Break out and do extra branches if non-ASCII content is encountered.
      ● Massively improves overall string parsing performance from byte streams.
      ● When reading ASCII, the byte is the same as the char.
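The tight ASCII loop could look roughly like the sketch below. The name `copyAsciiRun` and its signature are hypothetical; the sketch assumes the caller has sized the output array and handles escapes, multi-byte sequences and the closing quote itself once the loop breaks out.

```java
/** Hypothetical sketch of a fast path for runs of plain ASCII inside a JSON string. */
final class AsciiFastPath {
    /**
     * Copies bytes from in[start] up to (not including) in[end] into out while
     * they are plain ASCII, stopping at the closing quote, a backslash, or any
     * non-ASCII byte. Returns the number of chars copied. For ASCII the byte
     * value is the char value, so no decoding step is needed.
     */
    static int copyAsciiRun(byte[] in, int start, int end, char[] out, int outPos) {
        int i = start;
        while (i < end) {
            byte b = in[i];
            // Java bytes are signed, so every non-ASCII byte (0x80 to 0xFF) is negative.
            if (b < 0 || b == '"' || b == '\\') {
                break;
            }
            out[outPos++] = (char) b;
            i++;
        }
        return i - start;
    }
}
```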
  • 44. Whitespace
      ● JSON sent over the wire is not pretty-printed, to keep it compact.
      ● When parsing delimiters, check for the delimiter first, before skipping whitespace.
      ● Among whitespace characters, a plain space has a higher chance of occurring than a carriage return, line feed or tab.
      ● Tight loop for space characters when skipping whitespace.
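Both whitespace ideas can be sketched in a few lines. The class `WhitespaceSkipper` and its methods are hypothetical names chosen for illustration, and the sketch works on a fully buffered byte array rather than a stream.

```java
/** Hypothetical sketch of delimiter-first matching and space-biased whitespace skipping. */
final class WhitespaceSkipper {
    /** Returns the offset of the first non-whitespace byte at or after pos. */
    static int skipWhitespace(byte[] in, int pos) {
        while (pos < in.length) {
            // Tight loop for the most likely whitespace character.
            while (pos < in.length && in[pos] == ' ') {
                pos++;
            }
            if (pos == in.length) {
                break;
            }
            byte b = in[pos];
            if (b == '\n' || b == '\r' || b == '\t') {
                pos++; // rarer whitespace, then go back to the tight loop
            } else {
                break;
            }
        }
        return pos;
    }

    /**
     * Consumes the expected delimiter and returns the offset just after it.
     * Compact JSON usually has the delimiter right at pos, so that case is
     * checked before any whitespace skipping is attempted.
     */
    static int expect(byte[] in, int pos, char delimiter) {
        if (pos < in.length && in[pos] == delimiter) {
            return pos + 1; // fast path
        }
        pos = skipWhitespace(in, pos);
        if (pos < in.length && in[pos] == delimiter) {
            return pos + 1;
        }
        throw new IllegalArgumentException("expected '" + delimiter + "' at offset " + pos);
    }
}
```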
  • 45. After doing all this...
      The performance is very comparable!

      Parser                 Parse time (ms)   Allocation (KB)
      Alibaba streaming      72/70             220/185
      Jackson streaming      79/77             219/187
      Protocol Buffers Lite  32/31             66/62
      New Json parser        31/30             62/41
      LinkedIn Feed, 20 items (First/Subsequent), Nexus 5

      ● Still human readable
      ● Still debuggable
      ● Can still use the same format across iOS/Android/Web
  • 46. And on low end devices...
      The improvements are more profound!

      Parser                 Parse time (ms)   Allocation (KB)
      Alibaba streaming      377/370           220/185
      Jackson streaming      392/397           219/187
      Protocol Buffers Lite  99/97             66/62
      New Json parser        99/96             62/41
      LinkedIn Feed, 20 items (First/Subsequent), Samsung Star Pro

      ● Most of the benefit comes from saving on alloc and GC pauses
      ● Results in smoother UI
  • 47. Zero Garbage!
      ● This new parser is zero garbage. It does not allocate any transient memory beyond the POJOs it creates as the result of the parse.
      ● All intermediate allocs like buffers are pooled.
      ● Pools are homogeneous as much as possible to limit fragmentation.
      ● Pool capacities/buffer sizes are tuned based on device and network.
  • 48. Lessons learnt
      ● It is possible to parse JSON fast even on low end Android devices.
      ● All formats have their Achilles’ heels, and there is no one size fits all.
      ● Never adopt some cool new format blindly.
      ● Measure, measure, measure!
  • 49. What’s next? Similar parser + codegen for iOS in Obj-C Open source both as part of Rest.li mobile optimized bindings. Targeted for Q4 2017

Editor's Notes

  1. Typical of our 90th percentile devices in the US and India