This document summarizes a company's transition from a SQL database to a native graph database to power its identity resolution product. It describes the requirements: high read and write throughput and complex queries over billions of identities and linkages. It then outlines the evaluation of several graph databases, with JanusGraph on ScyllaDB performing best. Key findings from prototyping include handling high query volume, managing supernodes, and tuning compaction strategies. The production implementation and architecture are also summarized.
Zeotap: Moving to ScyllaDB - A Graph of Billions Scale
1. Moving to ScyllaDB - A Graph of Billions Scale
Saurabh Verma, Principal Engineer
K S Sathish, VP Engineering
2. Presenters
K S Sathish, VP Engineering
Sathish heads engineering at Zeotap from Bangalore, India.
Owns engineering strategy and technical architecture.
17+ years of experience.
Has been building big data stacks across verticals for the past 8 years.
Saurabh Verma, Principal Engineer
Saurabh is a Principal Engineer at Zeotap.
Leads the Data Engineering team for the Identity product suite.
Owns architecture, design, and engineering delivery of the Identity product.
Has spent the last 6 years building big data systems.
3. ZEOTAP
■ Identity and Data platform - People Based data
■ Enables Brands to better understand their customers - 360º View
■ World’s Largest Independent People Graph
■ Fully Privacy/GDPR compliant
■ 80+ Data partners
■ Catering to Ad-Tech and MarTech
5. Identity Resolution
● Singular View of all Identities of a Person
● Multiple Identity sources
● Different Identifiers
○ Web Cookies
○ Mobile
○ Partner Platform
○ CRM
Linkages between these identifiers are more important than the individual identifiers.
6. Identity Use cases
■ Match Test - Reference IDs JOIN with ID universe
■ Export - IDs retrieved based on Match and pushed out
■ Reporting
■ Compliance - Opt Out - Disconnect
■ 3rd party extension
■ Identity Quality
■ Short SLAs for Freshness of Data - meaning quick ingestion and retrieval
8. Identity Tech - Requirements
■ Workload
● High Read and High Write - ingestion and retrieval can happen simultaneously
■ Write
● Ingestion - Streaming and Batch
● Deletion - Streaming and Batch
● Above 50K writes per second to meet SLAs
■ Housekeeping
● TTL - based on conditions (see the sketch below)
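JanusGraph exposes TTLs through its management API, but only as static, label-level durations, so condition-based expiry needs support from the application layer. A minimal Gremlin-console sketch, assuming illustrative label names and durations:

import java.time.Duration
// Static TTLs via the JanusGraph management API; 'cookie_id', 'linked_to'
// and the durations are illustrative, not the actual Zeotap schema.
mgmt = graph.openManagement()
cookie = mgmt.makeVertexLabel('cookie_id').setStatic().make()
mgmt.setTTL(cookie, Duration.ofDays(90))   // vertex TTL requires a static label
link = mgmt.makeEdgeLabel('linked_to').make()
mgmt.setTTL(link, Duration.ofDays(30))     // expire stale linkages sooner
mgmt.commit()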
9. Identity Tech - Requirements Cont...
■ Read (each pattern is sketched in Gremlin below)
● Lookup Matching IDs
● Retrieve Linked IDs
● Retrieve Linked IDs based on conditions
○ ID Type - Android ID, website cookie
○ Property - Recency, quality, country
● Count
● Depth filters
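The read patterns above map onto short Gremlin traversals; a sketch, with property keys ('id', 'type', 'country') and values that are illustrative rather than the actual schema:

// Lookup matching IDs
g.V().has('id', 'abc-123').has('type', 'id_mid_13')
// Retrieve linked IDs
g.V().has('id', 'abc-123').has('type', 'id_mid_13').both().values('id')
// Retrieve linked IDs filtered by ID type and property
g.V().has('id', 'abc-123').has('type', 'id_mid_13')
 .both().has('type', 'android_id').has('country', 'DE').values('id')
// Count linked IDs
g.V().has('id', 'abc-123').has('type', 'id_mid_13').both().count()
// Depth-limited retrieval (depth <= 3)
g.V().has('id', 'abc-123').has('type', 'id_mid_13')
 .repeat(both().simplePath()).times(3).emit().dedup().values('id')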
12. Why Native Graph
Native Graph Database (JanusGraph)
■ Low-latency neighbourhood traversal (OLTP) - Lookup & Retrieve
● Graph traversal modeled as iterative low-latency lookups in the Scylla K,V store
● Runtime proportional to the client data set & overlap percentage
■ Lower Data Ingestion SLAs
● Ingestion modeled as UPSERT operations (see the sketch below)
● Aligned with Streaming & Differential data ingestion
● Economically lower footprint to run in production
■ Linkages are first-class citizens
● Linkages have properties and traversals can leverage these properties
● On-the-fly path computation
■ Analytics Stats on the Graph, Clustering (OLAP)
● Bulk export and massively parallel processing available with GraphComputer integration with Spark, Hadoop, Giraph
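A minimal sketch of the UPSERT pattern using the standard Gremlin fold()/coalesce() idiom (labels, keys, and values are illustrative):

// Vertex upsert: reuse the vertex if it exists, create it otherwise
g.V().has('id', 'abc-123').has('type', 'id_mid_13')
 .fold()
 .coalesce(unfold(),
           addV('identity').property('id', 'abc-123').property('type', 'id_mid_13'))
// Edge upsert between two already-resolved vertex ids v1 and v2
g.V(v1).as('a').V(v2)
 .coalesce(inE('linked_to').where(outV().as('a')),
           addE('linked_to').from('a'))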
13. And… Concise solutions to the right problems
■ Find the path between 2 user IDs
SQL:
(select * from idmvp
 where id1 = '75d630a9-2d34-433e-b05f-2031a0342e42' and idtype1 = 'id_mid_13'
 and id2 = '5c557df3-df47-4603-64bc-5a9a63f22245' and idtype2 = 'id_mid_4') -- depth = 1
union
(select * from idmvp t1, idmvp t2
 where t1.id1 = '75d630a9-2d34-433e-b05f-2031a0342e42' and t1.idtype1 = 'id_mid_13'
 and t1.id2 = t2.id1 and t1.idtype2 = t2.idtype1
 and t2.id2 = '5c557df3-df47-4603-64bc-5a9a63f22245' and t2.idtype2 = 'id_mid_4') -- depth = 2
union
(select * from idmvp t1, idmvp t2, idmvp t3
 where t1.id1 = '75d630a9-2d34-433e-b05f-2031a0342e42' and t1.idtype1 = 'id_mid_13'
 and t1.id2 = t2.id1 and t1.idtype2 = t2.idtype1
 and t2.id2 = t3.id1 and t2.idtype2 = t3.idtype1
 and t3.id2 = '5c557df3-df47-4603-64bc-5a9a63f22245' and t3.idtype2 = 'id_mid_4') -- depth = 3
-- ...and another union for every additional depth
Gremlin Query:
g.V()
 .has('id', '75d630a9-2d34-433e-b05f-2031a0342e42').has('type', 'id_mid_13')
 .repeat(both().simplePath().timeLimit(40000))
 .until(has('id', '5c557df3-df47-4603-64bc-5a9a63f22245').has('type', 'id_mid_4'))
 .limit(10).path()
 .by('id')
15. POC Hardware
Server Configuration:
● Janus on Scylla: 3 x i3.2xlarge
● Aerospike: 3 x i3.2xlarge
● OrientDB: 3 x i3.2xlarge
● DGraph: 3 x r4.16xlarge
Client Configuration: 3 x c5.18xlarge
Replication Factor: 1
16. Store Benchmarking - 3B IDs, 1B edges

                                           JanusGraph+ScyllaDB | Aerospike | OrientDB | DGraph
Sharded, Distributed                       ✓                   | ✓         | ❌       | ✓
Storage Model                              LPG                 | Custom    | LPG      | RDF
Cost of ETL before Ingestion               Lower               | Lower     | Lower    | Higher
Native Graph DB                            ✓                   | ❌        | ✓        | ✓
Node / Edge Schema Change
without downtime?                          ✓                   | ✓         | ✓        | ✓
Benchmark dataset load completed?          ✓                   | ✓         | ❌       | ❌
Acceptable Query Performance?              ✓                   | ✓         | -        | -
Production Setup Running Cost              Lower               | Higher    | -        | -
Production Setup Operational Management
(based on our experience with AS in prod)  Higher              | Lower     | -        | -
21. Streaming Ingestion
■ Workload
● 300 - 400 million data points per day
● Dedupe & Enrich
● Merge
● Final snapshot
■ Batch Process
● Spark Join
● Merge runtime - 4 to 6 hours
● Redshift load time - 2 to 3 hours
● Painful Failures
[Pipeline diagram: Stream & Batch → Dedup → Enrich → S3 → Merge → Redshift]
22. Streaming Ingestion
■ And...
● End-to-end time - down to 2 to 3 hours
● Spark Joins replaced by point Lookups
● Everything is a Stream
● Failures - down by 83%
[Pipeline diagram: Stream & Batch → Dedup → Enrich → Streaming Graph Ingester (Vertex) / Streaming Graph Ingester (Edge) → KV Store]
23. Findings
■ Consider Splitting Vertex Load from Edge Load
● Write behaviour is different
● Achieves better overall QPS
■ Benchmark Vertex load speed against CPU utilization
● Observed 5K TPS per server core
■ Consider Client-Side Caching for Edge Load (see the sketch below)
● One lookup and one write per edge, with many duplicate IDs - too many disk hits (thrashing)
● 100% write - 4.8K TPS per core
● LeveledCompactionStrategy performed better than SizeTieredCompactionStrategy
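A Groovy sketch of the caching idea: resolve each (id, type) endpoint to its vertex id once and reuse it, so repeated IDs in the edge stream cost one lookup instead of many. The cache shape is an assumption; a production ingester would bound it (e.g. an LRU):

cache = [:]   // (id|type) -> vertex id; unbounded here for brevity
resolveVertexId = { String id, String type ->
    def key = id + '|' + type
    if (!cache.containsKey(key)) {
        cache[key] = g.V().has('id', id).has('type', type).id().next()  // one lookup on miss
    }
    cache[key]
}
// One write per edge, endpoint lookups served from the cache
g.V(resolveVertexId('abc-123', 'id_mid_13'))
 .addE('linked_to').to(__.V(resolveVertexId('def-456', 'id_mid_4')))
 .iterate()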
25. Findings
■ Be Wary of Supernodes
● Vertices with more than ~600 neighbours caused a drastic QPS drop
● 40K QPS down to 2K
■ Multi-Level Traversal - Depth limiting
● QPS decreases with depth, though not linearly
● At depth 5 - 40K QPS down to 12K
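Two Gremlin sketches for keeping such traversals bounded; the thresholds mirror the numbers above, but keys and values are illustrative:

// Depth limiting: never expand beyond 5 hops
g.V().has('id', 'abc-123').has('type', 'id_mid_13')
 .repeat(both().simplePath()).times(5).emit().dedup()
// Fan-out capping: a supernode contributes at most 600 neighbours
g.V().has('id', 'abc-123').has('type', 'id_mid_13')
 .local(both().limit(600)).dedup()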
26. Findings
■ Play with Compaction strategies
● For our queries, LeveledCompactionStrategy increased QPS by 2.5X
● With Leveled compaction, concurrent clients were handled better
● QPS stabilized at 30K
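JanusGraph lets you choose the compaction strategy its CQL tables are created with, via the storage.cql.compaction-strategy-class option; a sketch (hostname is a placeholder, and tables that already exist would need an ALTER TABLE on the Scylla side instead):

import org.janusgraph.core.JanusGraphFactory
graph = JanusGraphFactory.build()
    .set('storage.backend', 'cql')
    .set('storage.hostname', 'scylla-host')
    .set('storage.cql.compaction-strategy-class', 'LeveledCompactionStrategy')
    .open()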
27. Know Your Query And Data
■ Segments are country based - filter based on Countries
■ Vertex Metadata not huge
Fetching individual properties from the Vertex:
gremlin> g.V().has('id','1').has('type','email')
          .values('id', 'type', 'USA').profile()
Fetching the entire property map during traversal:
gremlin> g.V().has('id','1').has('type','email')
          .valueMap().profile()
Profile with values() (individual properties):
Step                                                                          Traversers  Time
JanusGraphStep [_condition=(id=1 AND type=email)]                             1           0.987
JanusGraphPropertiesStep [_condition=(type[id] OR type[type] OR type[USA])]   4           1.337
Total: 2.325 s

Profile with valueMap() (entire property map):
Step                                                                          Traversers  Time
JanusGraphStep [_condition=(id=1 AND type=email)]                             1           0.902
PropertyMapStep(value)                                                        1           0.175
Total: 1.077 s

valueMap() was ~2x faster (2.325 s vs 1.077 s), since our vertex metadata is not huge.
29. ID Graph Quality
■ How trustworthy is our ID graph?
● What happens if the match rate is ridiculously high?
● e.g. a cluster of 63 million IDs
■ Connectivity analysis - heuristics
● Density
● Depth
● Clustering
● Distance
■ Can we arrive at a Quality Score for edges?
30. Scoring V1
■ AD scoring - Edge Agreement (A) / Disagreement (D)
■ Recency Scoring - augment A & D with recency
■ Calculate a Composite Score (sketch below)
■ Adjust the composite score with ID metadata
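The deck does not give the formula, so purely as a hypothetical illustration of "agreement/disagreement damped by recency":

// Hypothetical composite: agreement ratio times a recency decay.
// The 30-day decay constant and the shape are assumptions, not Zeotap's formula.
edgeScore = { long agree, long disagree, long daysSinceSeen ->
    def ratio = agree / Math.max(1d, (double) (agree + disagree))  // AD score
    def recency = Math.exp(-daysSinceSeen / 30d)                   // recency weight
    ratio * recency                                                // composite in [0, 1]
}
println edgeScore(8, 2, 0)   // 0.8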
35. OLTP Export - ID Overlap Finder Workflow
■ Interaction with JanusGraph backed by ScyllaDB
● For each input ID, find the connected IDs in the ID Graph based on filters
● Modeled as a Depth First Search implemented in Gremlin, driven from Apache Spark
● Property and depth filtering done at the application layer
● The overlapping ID output is stored on deep storage, e.g. AWS S3
■ Across-Graph Traversals (see the sketch below)
● Separate compliance requirements per 3rd-party Graph vendor
● Probabilistic vs Deterministic Graph vendors
● Each Graph vendor represented as a separate keyspace in ScyllaDB
● The application layer enables runtime chaining and ordering of Across-Graph traversals
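A sketch of the per-vendor keyspace layout: one JanusGraph handle per 3rd-party graph, each pointed at its own ScyllaDB keyspace via the storage.cql.keyspace option (hostname and keyspace names are illustrative). Chaining and ordering across the resulting traversal sources stays in the application layer:

import org.janusgraph.core.JanusGraphFactory
vendorA = JanusGraphFactory.build()
    .set('storage.backend', 'cql')
    .set('storage.hostname', 'scylla-host')
    .set('storage.cql.keyspace', 'vendor_a_graph')
    .open()
gA = vendorA.traversal()
// Resolve through vendor A, then feed the matches into the next vendor's graph
idsFromA = gA.V().has('id', 'abc-123').both().values('id').toList()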
36. OLAP Export - Storage & Analytics
■ Export Native Graph DB to Deep Storage
■ Apache Spark based ID Graph Quality Scoring
[Diagram: OLTP ID Graph → Periodic Backup → ScyllaDB SSTables on AWS S3 → Periodic Refresh → OLAP ID Graph → Spark OLAP Export to AWS S3 (GryoOutputFormat) → Native Graph on AWS S3 → Periodic Static Reports + ID Graph Quality Data Science Pipeline → ID Graph Quality Score Update]
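A sketch of the OLAP export path in the Gremlin console: open the graph as a HadoopGraph whose writer is the GryoOutputFormat named in the diagram, then run traversals through SparkGraphComputer (the properties file contents and S3 location are illustrative):

import org.apache.tinkerpop.gremlin.structure.util.GraphFactory
import org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer
// conf/hadoop-graph.properties would contain, e.g.:
//   gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
//   gremlin.hadoop.graphWriter=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
//   gremlin.hadoop.outputLocation=s3://bucket/id-graph-olap/
graph = GraphFactory.open('conf/hadoop-graph.properties')
g = graph.traversal().withComputer(SparkGraphComputer)
g.V().count()   // any OLAP traversal now runs as a Spark job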
38. Prod Setup
■ V1 release in Nov 2018
■ In production on AWS i3.4xLarge instances
■ These are 16 core, 122 GB RAM instances
■ ScyllaDB Version 3.0.6 provisioned via AWS Scylla AMIs
■ Using Scylla Grafana Dashboards for Production Metrics
■ Using LeveledCompactionStrategy in production
40. Whatever The Tool...
■ 2 primary Workflows
● ID overlap finder
● ID retriever
Consideration: on a 2-node Scylla cluster, peak client connections reach around 3,000
ID overlap finder jobs run at ~4X the volume of ID retriever jobs
Run Together?
● They race, and SLAs degrade!
● High Failure Rates
41. Introduce - Prioritization & Throttling
Priority with Aging - Match Test gets priority, but nothing starves (see the sketch below)
Throttle - Limit concurrent Jobs
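A Groovy sketch of these two constructs; the aging rate and the concurrency limit are illustrative, not the control plane's actual values:

import java.util.concurrent.PriorityBlockingQueue
import java.util.concurrent.Semaphore

class Job implements Comparable<Job> {
    String name
    int basePriority                                   // lower value = higher priority
    long submittedAt = System.currentTimeMillis()
    // Effective priority improves one step per minute waited, so nothing starves.
    int effectivePriority() {
        basePriority - (int) ((System.currentTimeMillis() - submittedAt) / 60_000)
    }
    int compareTo(Job o) { effectivePriority() <=> o.effectivePriority() }
}

queue = new PriorityBlockingQueue<Job>()   // note: the heap is not re-evaluated as
throttle = new Semaphore(4)                // jobs age; a real scheduler re-sorts periodically

queue.offer(new Job(name: 'match-test', basePriority: 0))
queue.offer(new Job(name: 'export', basePriority: 5))

throttle.acquire()                         // limit concurrent jobs
job = queue.take()                         // take the highest-priority job
try { /* run job */ } finally { throttle.release() }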
And…
■ SLA improved from a p95 of 10 hours to 2 hours
■ Job failure rate dropped from 20% to 2% per day
All Higher Level Constructs in Control Plane
Good Architecture is a Must!
42. Thank you! Stay in touch
Any questions?
Sathish K S
sathish.ks@gmail
Not on Twitter!
Saurabh Verma
saurabhdec1988@gmail
@saurabhdec1988