JDG DEMO: Consistent Hashing and Topologies

Rome, 27-28 March 2015 – Ugo Landini
Quick Start Lab
JBoss Data Grid
Ugo Landini 
Senior Solution Architect 
ugol@redhat.com 
March 26th 2015

• Big Data & NoSQL: super quick introduction to terminology
• What developers do to scale out
• Consistent Hashing
• What’s a Data Grid
• DEMO
• Infinispan/JDG features
• Q&A
Agenda

new generation of
technologies ... designed to
economically extract value
from very large volumes of a
wide variety of data, by
enabling high velocity
capture, discovery and/or
analysis
IDC, 2012
Big Data

Not Only SQL
Just an alternative to
RDBMS
NoSQL

K/V Store
Document Store
Column based DB
Graph DB
XML, Object DB, Multidimensional, Grid/Cloud, …
see map on https://451research.com/images/Marketing/dataplatformsmapoctober2014.pdf
NoSQL

We’re here
NoSQL

•Very hard to categorise in a systematic way
•Many nuances
•Many cases of “Evolutionary Convergence”
•i.e. products evolve similar features because they have to
adapt to similar environments
NoSQL

CAP Theorem

•Brewer’s Theorem (conjectured in 2000, proven in 2002)
•Three guarantees of a Distributed System
•Consistency
•Availability
•Partition Tolerance
CAP Theorem

All nodes see the same data at the same time
Consistency

A guarantee that every request receives a response
about whether it succeeded or failed
Availability

The system continues to operate despite arbitrary
message loss or failure of part of the system
Partition Tolerance

Consistency: Transactions
Availability: Redundancy
Partition Tolerance: Scale-out
CAP: Popular Version

[Diagram: the CAP triangle (Consistency: Transactions, Availability: Redundancy, Partition Tolerance: Scale-out), marked NO GO]

[Diagram: the CAP triangle (Consistency: Transactions, Availability: Redundancy, Partition Tolerance: Scale-out), with RDBMS placed on it]

[Diagram: the CAP triangle (Consistency: Transactions, Availability: Redundancy, Partition Tolerance: Scale-out), with NoSQL placed on it]

Brewer wrote an essay in 2012 to clarify some of the
CAP implications
http://www.infoq.com/articles/cap-twelve-years-later-how-the-rules-have-changed
CAP: Modern Version

The "two out of three" concept can be misleading or
misapplied, and it should be considered a tautology
Many vendors used the CAP theorem just as an excuse to
sacrifice Consistency
CAP: Modern Version

Partitions are rare, so there is little reason to forfeit C or
A when the system is not partitioned
The choice between C and A can occur many times
within the same system at very fine granularity
CAP: Modern Version

Different decisions about C and A:
•for different operations
•for different data
•at different moments
CAP: Modern Version

Finally, C, A and P are more continuous than binary:
•A is obviously continuous
•Many levels of Consistency (think isolation level in
classic DB)
•Even Partitions have nuances, including disagreement
within the system about whether a partition exists
CAP: Modern Version

[Diagram: Virtual Machine 1 containing a Client and a Cache, with read & write to the RDBMS]
Local Caching

•Single JVM
•little memory
•no HA
Local Caching

[Diagram: Virtual Machine 1 (Client 1, Cache 1) and Virtual Machine 2 (Client 2, Cache 2) in front of one RDBMS]
1. Client 1 reads A
2. Client 1 writes A to Cache 1
3. Client 2 writes A2 to RDBMS
4. Client 1 reads A from Cache 1
First try at distributed caching
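The four steps of the first try can be replayed with plain dictionaries. This is only a toy model: `rdbms`, `cache1`, `cache2` and `read` are names invented for the sketch, not JDG APIs.

```python
# Toy model of the "first try": a private cache per VM in front of a shared RDBMS.
rdbms = {"A": "A"}          # shared database, key A currently holds value "A"
cache1, cache2 = {}, {}     # one private cache per virtual machine

def read(cache, key):
    """Serve from the local cache; on a miss, load from the RDBMS and cache it."""
    if key not in cache:
        cache[key] = rdbms[key]
    return cache[key]

first = read(cache1, "A")   # steps 1-2: Client 1 reads A and stores it in Cache 1
rdbms["A"] = "A2"           # step 3: Client 2 writes A2 straight to the RDBMS
stale = read(cache1, "A")   # step 4: Client 1 reads A from Cache 1

print(first, stale, rdbms["A"])  # A A A2
```

Cache 1 keeps answering with the old value because nothing told it that the RDBMS changed.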

Distributed Caching on many nodes
What about dirty reads? (i.e. how to cope with multiple
writes, invalidation, etc.)

[Diagram: Virtual Machine 1 (Client 1, Cache 1) and Virtual Machine 2 (Client 2, Cache 2) in front of one RDBMS]
1. Client 2 writes A2 to RDBMS
2. Client 2 updates Cache 2
3. sync Caches
4. Client 1 reads A2 from Cache 1
Second try at distributed caching
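A minimal sketch of the second try, assuming the simplest possible "sync" (the writer pushes the new value to every other cache). The `write` helper and the cache names are hypothetical.

```python
# Toy model of the "second try": every write is propagated to all caches.
rdbms = {"A": "A"}
caches = {"cache1": {"A": "A"}, "cache2": {}}

def write(local_name, key, value):
    rdbms[key] = value                   # 1. write to the RDBMS
    caches[local_name][key] = value      # 2. update the writer's own cache
    for name, cache in caches.items():   # 3. sync the other caches
        if name != local_name:
            cache[key] = value

write("cache2", "A", "A2")
fresh = caches["cache1"]["A"]            # 4. Client 1 reads A2 from Cache 1
print(fresh)  # A2
```

The read is now correct, but every write touches every cache, which is the cost the following slides deal with.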

New Cache topology
Startup time
State transfers
Incompatible JVM tunings
GCs
Non Java clients

Hashing Wheel: a mathematical “wheel” on which you
hash Ks (keys) and Ns (nodes).
The relative position of Ks and Ns determines which
Node is the “owner” of that particular K in a topology
Consistent Hashing

[Diagram: hashing wheel with three nodes: N1 (Node 1), N2 (Node 2), N3 (Node 3)]
Consistent Hashing

Ns (nodes) on the “wheel” partition the hash space in
segments
Every segment contains a range of Ks
Consistent Hashing

[Diagram: wheel with N1, N2, N3 and key K250; owner = N2]
[Diagram: wheel with N1, N2, N3 and keys K53, K250, K570, K700, K900]
Consistent Hashing

Going clockwise from the K:
•the first N is the owner
•next N is the replica
•next next N could be another replica, and so on
Consistent Hashing

[Diagram: wheel with N1, N2, N3 and keys K53, K250, K570, K700, K900; for K250, owner = N2 and replica = N3]
Consistent Hashing

What happens if a node dies?
Consistent Hashing

[Diagram: Node 2 (N2) disappears from the wheel; keys K53, K250, K570, K700, K900 stay where they are]
[Diagram: wheel with only N1 and N3; for K250, owner = N3 and replica = N1]
Consistent Hashing
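The clockwise rule and the node failure can be sketched in a few lines. The wheel positions (N1 = 100, N2 = 400, N3 = 800) and the idea that K250 sits at position 250 are assumptions chosen to match the diagrams; the slides give no numeric positions, and this is not the JDG implementation.

```python
from bisect import bisect_left

class Wheel:
    """Hash wheel with explicit node positions (assumed values for the sketch)."""

    def __init__(self, nodes):
        self.nodes = dict(nodes)  # node name -> position on the wheel

    def owners(self, key_pos, copies=2):
        """Walk clockwise from the key: first node is the owner, next ones are replicas."""
        ring = sorted((pos, name) for name, pos in self.nodes.items())
        positions = [pos for pos, _ in ring]
        start = bisect_left(positions, key_pos)  # first node at or after the key
        count = min(copies, len(ring))
        return [ring[(start + i) % len(ring)][1] for i in range(count)]

wheel = Wheel({"N1": 100, "N2": 400, "N3": 800})
before = wheel.owners(250)   # ['N2', 'N3']: owner N2, replica N3
del wheel.nodes["N2"]        # Node 2 dies
after = wheel.owners(250)    # ['N3', 'N1']: owner N3, replica N1
wrapped = wheel.owners(900)  # ['N1', 'N3']: past the last node, wrap around
print(before, after, wrapped)
```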

The real CH algorithm implemented in JDG is slightly
different
CH is optimized to minimize state transfer (i.e. number
of keys moving when a node dies or a new one joins the
cluster)
Consistent Hashing
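To see why a hashing wheel limits state transfer, the sketch below compares it with naive `hash % number_of_nodes` placement when N2 leaves a three-node cluster. The hash function (MD5 truncated to 32 bits) and the key names are assumptions for illustration only; JDG's real algorithm is different, as noted above.

```python
import hashlib
from bisect import bisect_left

def h(value):
    """Stable hash of a string onto a 32-bit wheel (MD5 is used only for the sketch)."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16) % 2**32

def wheel_owner(key, nodes):
    """First node clockwise from the key's position."""
    ring = sorted((h(n), n) for n in nodes)
    idx = bisect_left([pos for pos, _ in ring], h(key)) % len(ring)
    return ring[idx][1]

def modulo_owner(key, nodes):
    """Naive placement: hash modulo the number of nodes."""
    return sorted(nodes)[h(key) % len(nodes)]

keys = [f"K{i}" for i in range(1000)]
full, reduced = ["N1", "N2", "N3"], ["N1", "N3"]

moved_wheel = [k for k in keys if wheel_owner(k, full) != wheel_owner(k, reduced)]
moved_modulo = [k for k in keys if modulo_owner(k, full) != modulo_owner(k, reduced)]

# On the wheel, the only keys that change owner are the ones N2 used to own.
assert all(wheel_owner(k, full) == "N2" for k in moved_wheel)
print(len(moved_wheel), len(moved_modulo))
```

With modulo placement, keys owned by surviving nodes can move too, so the second count is typically much larger.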

Distributed Memory Storage Engine
Networked Memory
A Distributed Cache “on steroids”
A Transactional NoSQL
What’s a Data Grid?

•Key/Value storage
•Search Engine (from K/V to Document storage)
•Linear Scalability, Elasticity and Fault tolerance
•Thanks to CH
•Memory based
•Persistence engines are optional
What’s a Data Grid?

•Different Topologies
•Querying
•Task Execution & Map/Reduce
•Partition Handling
•Data Affinity (to squeeze every bit of
performance)
Data Grid > Distributed Caching

•LOCAL
•INVALIDATION
•REPLICATED
•DISTRIBUTED
JDG Cache Topologies (Cluster modes)

•LOCAL
  •simple cache (EHCache-like)
•INVALIDATION
  •no sharing
•REPLICATED
  •All nodes are equal
  •4 Nodes @ 8 GB = 8 GB
•DISTRIBUTED
  •For example: 1 Replica
  •4 Nodes @ 8 GB = 16 GB
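The capacity figures follow from how many copies of each entry the cluster keeps. A small sketch of that arithmetic, ignoring per-entry overhead and JVM headroom (the function name and parameters are made up for the example):

```python
def usable_capacity_gb(nodes, gb_per_node, mode, replicas=1):
    """Rough usable capacity of a cluster, ignoring overhead."""
    if mode == "replicated":
        return gb_per_node                        # every node holds a full copy
    if mode == "distributed":
        total = nodes * gb_per_node
        return total / (1 + replicas)             # each entry lives on owner + replicas
    raise ValueError(f"unknown mode: {mode}")

print(usable_capacity_gb(4, 8, "replicated"))               # 8
print(usable_capacity_gb(4, 8, "distributed", replicas=1))  # 16.0
```

Adding nodes grows a distributed cache, while a replicated cache stays at the size of a single node.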

[Diagram: one cluster of 4 JDG nodes on 2 servers: JDG 1 and JDG 2 on Server A, JDG 3 and JDG 4 on Server B]
A Simple Grid

[Diagram: keys K0 to K9 spread across JDG 1, JDG 2, JDG 3 and JDG 4; each key is stored on one node only]
Distributed without Replica

[Diagram: keys K0 to K9 spread across the four nodes; each key appears twice, as owner copy and as replica]
Distributed with Replica

[Diagram: each of the four nodes holds all keys K0 to K9]
Replicated

•Replicated:
•“Small” set of data with high % of reads vs
writes
•Distributed:
•“Big” set of data: linear scaling
•You need M/R & Distexec
How do I choose?

•You can have different Cache configurations
in the same CacheManager
•Mix & match Replicated and Distributed as
needed

•Default hashing (Distributed mode):
MurmurHash3.
•It’s a simple and standard hash function:
•you can replace it if you like, e.g. if your
key already identifies a partitioning criterion
Tuning your hashing

•Can be “fine tuned” in 4 different ways:
•Server Hinting
•Virtual Servers
•Grouping
•Key Affinity
Tuning your hashing

•A triple (site, rack, server)
•You increase availability by keeping replicas
off the same (site, rack, server) as the
master
Server Hinting
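One way to picture server hinting: when choosing where a replica goes, prefer the candidate that shares as little of the (site, rack, server) triple with the master as possible. The scoring below is an illustration of the idea, not JDG's actual placement logic, and all node names are invented.

```python
def pick_replica(master, candidates):
    """Prefer another site, then another rack, then another server."""
    def separation(node):
        if node["site"] != master["site"]:
            return 3
        if node["rack"] != master["rack"]:
            return 2
        if node["server"] != master["server"]:
            return 1
        return 0  # same physical server as the master
    return max(candidates, key=separation)

master = {"name": "JDG 1", "site": "rome", "rack": "r1", "server": "A"}
candidates = [
    {"name": "JDG 2", "site": "rome", "rack": "r1", "server": "A"},  # same server
    {"name": "JDG 3", "site": "rome", "rack": "r1", "server": "B"},  # same rack
    {"name": "JDG 4", "site": "rome", "rack": "r2", "server": "C"},  # other rack
]
replica = pick_replica(master, candidates)
print(replica["name"])  # JDG 4
```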

•Number of “segments” into which the
cluster is partitioned
•Improve the node distribution on the
hashing wheel to have a better distribution
of keys
•Default: 60
Virtual Servers
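A simplified picture of segments: keys hash to one of a fixed number of segments, and segments (not individual keys) are assigned to nodes. The CRC32 hash and the round-robin assignment are assumptions made for this sketch; only the default of 60 comes from the slide.

```python
import zlib

NUM_SEGMENTS = 60  # the default quoted on the slide

def segment_of(key):
    """Map a key to a segment (CRC32 is a stand-in hash for the sketch)."""
    return zlib.crc32(key.encode()) % NUM_SEGMENTS

def assign_segments(nodes):
    """Round-robin segment assignment, an assumption made for illustration."""
    return {seg: nodes[seg % len(nodes)] for seg in range(NUM_SEGMENTS)}

nodes = ["JDG 1", "JDG 2", "JDG 3", "JDG 4"]
table = assign_segments(nodes)
owner = table[segment_of("K250")]
per_node = {n: sum(1 for o in table.values() if o == n) for n in nodes}
print(owner, per_node)  # each node ends up with 15 segments
```

Many small segments per node even out the key distribution compared with one large arc per node.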

•Data colocation
•A cache node contains K but also other
relevant data related to K
•Example: a customer and their bank
transactions
•You just have to define a group, JDG will
colocate all data of the same group on the
same node
Grouping
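The effect of grouping can be sketched by hashing the group instead of the whole key. The key layout (`customer:<id>`, `movement:<customer id>:<n>`), the helper names and the CRC32-modulo placement are hypothetical; they only illustrate colocation and are not the JDG Grouping API.

```python
import zlib

NODES = ["JDG 1", "JDG 2", "JDG 3", "JDG 4"]

def node_for(hash_input):
    """Stand-in placement function: CRC32 modulo the number of nodes."""
    return NODES[zlib.crc32(hash_input.encode()) % len(NODES)]

def group_of(key):
    """Hypothetical key layout: the customer id is always the second field."""
    return key.split(":")[1]

def owner(key):
    return node_for(group_of(key))  # hash the group, not the whole key

keys = ["customer:42", "movement:42:1", "movement:42:2"]
owners = {owner(k) for k in keys}
print(owners)  # a single node holds the customer and all of its movements
```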

•Like Grouping, but from another perspective:
•You just ask a node for a key that will be
hashed on that node
•Grouping/Affinity are your best friends if you
want to reach JDG Nirvana!
Key Affinity
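Key affinity turns the question around: given a node, produce a key that will land on it. A brute-force sketch, assuming a toy CRC32-modulo placement and an invented `key_for_node` helper (neither is a JDG API):

```python
import itertools
import zlib

NODES = ["JDG 1", "JDG 2", "JDG 3", "JDG 4"]

def owner(key):
    """Stand-in placement function: CRC32 modulo the number of nodes."""
    return NODES[zlib.crc32(key.encode()) % len(NODES)]

def key_for_node(target, prefix="session"):
    """Generate candidate keys until one hashes to the target node."""
    for i in itertools.count():
        candidate = f"{prefix}-{i}"
        if owner(candidate) == target:
            return candidate

local_key = key_for_node("JDG 3")
print(local_key, owner(local_key))
```

Data stored under such a key stays on the node that asked for it, which is what makes local-only access possible.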

•All data needed by a node of your application are local,
at the distance of a single Java method call
JDG Nirvana

•Small self-contained projects that can be used to
simply explain JDG to customers
•https://github.com/redhat-italy/jdg-quickstarts
JDG Quickstarts

•If JDG detects a split brain, partitions enter
degraded mode
•A degraded partition can read/write ONLY
fully owned keys
•A partition fully owns a key if it contains the
master and replica nodes for that key
•You’ll get an AvailabilityException for other
keys
Partition Handling
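The "fully owned" rule is a subset check: every owner of the key must be inside the partition. In the sketch below, `AvailabilityException` is a local Python class that only borrows the name from the slide, and the owner table is made up.

```python
class AvailabilityException(Exception):
    """Local stand-in named after the exception mentioned on the slide."""

def check_available(key, key_owners, partition_members):
    """Allow access only if the partition contains every owner of the key."""
    if not set(key_owners[key]) <= set(partition_members):
        raise AvailabilityException(f"{key} is not fully owned by this partition")

# Hypothetical cluster of four nodes split into {N1, N2} and {N3, N4}.
key_owners = {"K1": ["N1", "N2"], "K2": ["N2", "N3"]}
partition = {"N1", "N2"}

check_available("K1", key_owners, partition)   # fine: both owners are reachable
try:
    check_available("K2", key_owners, partition)
    k2_rejected = False
except AvailabilityException:
    k2_rejected = True                          # N3 is on the other side of the split
print(k2_rejected)  # True
```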

•Cache Store
•Not only in memory!
•Write through & write behind (ACK sync or
async)
•Pluggable “drivers”
•File System, JPA, LevelDB (supported)
•MongoDB, Cassandra, BerkeleyDB, etc.
(community)
Persistence
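The difference between write-through and write-behind is when the store is updated relative to the acknowledgement. A minimal sketch with a dictionary standing in for the cache store (class and method names are invented for the example):

```python
from collections import deque

class StoreBackedCache:
    """In-memory cache in front of a store; a dict plays the role of the store."""

    def __init__(self, store, write_behind=False):
        self.memory = {}
        self.store = store
        self.write_behind = write_behind
        self.pending = deque()

    def put(self, key, value):
        self.memory[key] = value
        if self.write_behind:
            self.pending.append((key, value))  # acknowledged before the store is updated
        else:
            self.store[key] = value            # write-through: store updated before returning

    def flush(self):
        """Apply queued writes; a real system would do this on a background thread."""
        while self.pending:
            key, value = self.pending.popleft()
            self.store[key] = value

sync_store, async_store = {}, {}
StoreBackedCache(sync_store).put("K1", "v1")
behind = StoreBackedCache(async_store, write_behind=True)
behind.put("K1", "v1")
before_flush = dict(async_store)   # still empty
behind.flush()
print(sync_store, before_flush, async_store)
```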

•To avoid Out Of Memory
•Entries can be “passivated” to disk (you’ll need a
CacheStore)
Eviction
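Eviction with passivation can be sketched as an LRU map that moves the least recently used entry to a store instead of dropping it. The LRU policy, the class name and the dict-based store are assumptions for the example.

```python
from collections import OrderedDict

class EvictingCache:
    """Keeps at most max_entries in memory; older entries are passivated to a store."""

    def __init__(self, max_entries, store):
        self.max_entries = max_entries
        self.memory = OrderedDict()
        self.store = store

    def put(self, key, value):
        self.memory[key] = value
        self.memory.move_to_end(key)
        while len(self.memory) > self.max_entries:
            old_key, old_value = self.memory.popitem(last=False)  # least recently used
            self.store[old_key] = old_value                       # passivate

    def get(self, key):
        if key in self.memory:
            self.memory.move_to_end(key)
            return self.memory[key]
        if key in self.store:
            value = self.store.pop(key)                           # activate
            self.put(key, value)
            return value
        return None

disk = {}
cache = EvictingCache(max_entries=2, store=disk)
for k, v in [("a", 1), ("b", 2), ("c", 3)]:
    cache.put(k, v)
passivated = dict(disk)     # {'a': 1}
value = cache.get("a")      # comes back from the store; 'b' is passivated instead
print(passivated, value, disk)
```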

•You assign a lifespan or a max idle time to a
key
•The key will then be automatically removed
after that time
•You don’t need to write your own cleanup
code
Expiry
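Lifespan and max idle time differ in what resets the clock: lifespan counts from creation, max idle from the last access. A sketch with an injectable clock so the behaviour is easy to follow (the class and parameter names are invented, and expiry is checked lazily on read):

```python
class ExpiringCache:
    """Entries expire after `lifespan` since creation or `max_idle` since last access."""

    def __init__(self, clock):
        self.clock = clock
        self.entries = {}  # key -> [value, created, last_used, lifespan, max_idle]

    def put(self, key, value, lifespan=None, max_idle=None):
        now = self.clock()
        self.entries[key] = [value, now, now, lifespan, max_idle]

    def get(self, key):
        entry = self.entries.get(key)
        if entry is None:
            return None
        value, created, last_used, lifespan, max_idle = entry
        now = self.clock()
        too_old = lifespan is not None and now - created >= lifespan
        too_idle = max_idle is not None and now - last_used >= max_idle
        if too_old or too_idle:
            del self.entries[key]  # removed without any caller-side cleanup code
            return None
        entry[2] = now             # a successful read resets the idle timer
        return value

now = [0]  # fake clock, in seconds
cache = ExpiringCache(lambda: now[0])
cache.put("session", "s", lifespan=60, max_idle=10)
now[0] = 5
at_5 = cache.get("session")    # 's'
now[0] = 14
at_14 = cache.get("session")   # 's' (idle for 9 seconds)
now[0] = 30
at_30 = cache.get("session")   # None (idle for 16 seconds)
print(at_5, at_14, at_30)
```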

•Both avoid Out Of Memory
•“Evicted” data can be maintained in the Grid
with Passivation
•Eviction is a Cache configuration
•Expiration is a Key configuration
•Expiration could be a business requirement
•Eviction is a system feature
Eviction/Expiry: differences

•JDG has full support for transactions
•Local Transactions
•Global Transactions (XA): if running inside an
AS, it automatically uses its TX Manager
•Batching API
Transactions

•Cache/CacheManager events
•Topology changes
•Entries being added, removed, modified
•Cluster listeners
Listener/Notifications

•Infinispan-query module
•Hibernate Search & Lucene
•Querying via DSL
•Lucene indexes can be kept in memory, on
disk or in the grid
Querying the grid

•With M/R you can implement distributed global
operations on the grid
•Each node works on its data (Map)
•Results are later aggregated (Reduce)
Map/Reduce
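A word count is the usual way to show the two phases. In this sketch each "node" is just a dictionary of local entries, and the data is made up; it shows the shape of the computation, not the JDG Map/Reduce API.

```python
from collections import Counter
from functools import reduce

# Hypothetical contents of two grid nodes.
node_data = {
    "JDG 1": {"K0": "red hat", "K1": "data grid"},
    "JDG 2": {"K2": "red data"},
}

def map_phase(local_entries):
    """Runs on each node, over that node's own entries only."""
    counts = Counter()
    for text in local_entries.values():
        counts.update(text.split())
    return counts

partials = [map_phase(entries) for entries in node_data.values()]  # Map
totals = reduce(lambda a, b: a + b, partials, Counter())           # Reduce
print(dict(totals))
```

Only the small partial results travel over the network; the entries themselves never leave their node.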

•JDG 7 will implement HDFS API
•So it will be able to act as a super fast Hadoop
store
Hadoop, coming soon…

•With Distexec you can submit “tasks” to the
Grid
•The task can be executed on each node or
on a subset of the nodes
•The task can modify data in the Grid
Distributed Execution (Distexec)

•“Follow the Sun” architectures
•Many different clusters that can be kept in
sync
Cross Site Replication

•JSR-107
•Java Temporary Caching API
•Confirmed in January 2015
•In roadmap for JDG 6.5
•JSR-347
•Data Grids for the Java Platform
•JSR Retired in January 2015
Standard APIs

•Command Line Console
•JMX
•JON Plugin
Management Tooling

•User Authentication
•SASL
•Role Based Access Control (RBAC)
•Users, Roles and mapping between roles and
operations on Cache / Cache-Manager
•Node Authentication & Authorisation
•Encrypted communication between nodes
Data Security

•Library mode
•Embedded in your JVM
•C/S mode
•REST
•Memcached
•Hot Rod
Embedded vs Client/Server

Mode | Protocol | Client Libs | Smart Routing | Load Balancing/Failover | TX | Listeners | M/R | Dist | Querying | Separated Cluster
Library mode | inVM | N/A | Yes | Dynamic | Yes | Yes | Yes | Yes | Yes | No
REST | Text | HTTP | No | Any HTTP load balancer | No | No | No | No | No | Yes
Memcached | Text | Many | No | Predefined server list | No | No | No | No | No | Yes
Hot Rod | Binary | Java/Python/C++ | Yes | Dynamic | Local w MVCC | Yes (6.4) | No | No | Yes (6.3) | Yes
Protocol Comparison

Q&A

Thank You!
Leave your feedback on Joind.in!
https://joind.in/event/view/3347