CS 6213 – Advanced Data Structures
TRIES
AN EXCELLENT DATA STRUCTURE FOR STRINGS
Instructor
Prof. Amrinder Arora
amrinder@gwu.edu
Please copy TA on emails
Please feel free to call as well
TA
Iswarya Parupudi
iswarya2291@gwmail.gwu.edu
L6 - Tries CS 6213 - Advanced Data Structures - Arora 2
LOGISTICS
Michael T. Goodrich and Roberto Tamassia
Data Structures and Algorithms in Java (4th edition)
John Wiley & Sons, Inc.
ISBN: 0-471-73884-0
Haim Kaplan, Tel Aviv University
Jörg Liebeherr, University of Toronto
CREDITS
Naïve brute-force search of a text of size n for a
pattern of size m requires O(nm) time.
Preprocessing the pattern speeds up pattern
matching queries. E.g., the KMP algorithm performs
pattern matching in O(n + m) time, i.e., proportional
to the text size when m ≤ n.
If the text is large, immutable and searched often
(e.g., Shakespeare), we may want to preprocess the
text itself, so that each search takes O(m) time.
MOTIVATION
A trie is a compact data structure for representing a
set of strings, such as all the words in a text. A trie
supports pattern matching queries in time
proportional to the pattern size: O(m)
MOTIVATION (CONT.)
Standard Tries
Compressed Tries
Compact Representation
Suffix Trie
TRIES: TOPICS
• The standard trie for a set of strings S is an ordered tree such that:
  • Each node but the root is labeled with a character
  • The children of a node are alphabetically ordered
  • The paths from the root to the leaves yield the strings of S
• Example: set of strings S = { bear, bell, bid, bull, buy, sell, stock, stop }
STANDARD TRIES
[Figure: standard trie for S = { bear, bell, bid, bull, buy, sell, stock, stop }]
A standard trie uses O(n) space and supports
searches, insertions and deletions in time
O(dm), where:
• n: total size of the strings in S
• m: size of the string parameter of the operation
• d: size of the alphabet
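The three operations can be sketched in Python as follows. This is a minimal illustration, not the course's reference implementation: the names `TrieNode`, `Trie`, and `is_word` are assumptions, and children are kept in a dictionary rather than an ordered list. With a dictionary each character step is expected O(1), so an operation costs O(m) on average; the O(dm) bound corresponds to scanning up to d ordered children at every node.

```python
class TrieNode:
    def __init__(self):
        self.children = {}     # character -> TrieNode
        self.is_word = False   # True if a string of S ends at this node


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def search(self, word):
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_word

    def delete(self, word):
        # Record the path so that nodes left without a purpose can be pruned.
        path = [self.root]
        for ch in word:
            nxt = path[-1].children.get(ch)
            if nxt is None:
                return False
            path.append(nxt)
        if not path[-1].is_word:
            return False
        path[-1].is_word = False
        # Walk back up, removing nodes that end no word and have no children.
        for i in range(len(word) - 1, -1, -1):
            node = path[i + 1]
            if node.is_word or node.children:
                break
            del path[i].children[word[i]]
        return True
```

Deleting "bell" from the example set removes only the two "l" nodes, because the "e" node is still needed for "bear".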
ANALYSIS OF STANDARD TRIES
[Figure: the standard trie for the example set S]
• We insert the words of the text into a trie
• Each leaf stores the occurrences of the associated word in the text
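A small Python sketch of this word index, under a few assumptions: words are runs of the letters a to z, the text is lowercased first, and the names `Node`, `build_word_index`, and `occurrences` are illustrative. Each node keeps a list of starting offsets, which stays empty unless a word ends there.

```python
import re


class Node:
    def __init__(self):
        self.children = {}    # character -> Node
        self.positions = []   # start offsets of the word ending at this node


def build_word_index(text):
    """Insert every word of `text` into a trie, recording where it starts."""
    root = Node()
    for match in re.finditer(r"[a-z]+", text.lower()):
        node = root
        for ch in match.group():
            node = node.children.setdefault(ch, Node())
        node.positions.append(match.start())
    return root


def occurrences(root, word):
    """Return the start offsets of `word` in the indexed text (O(m) walk)."""
    node = root
    for ch in word:
        node = node.children.get(ch)
        if node is None:
            return []
    return node.positions


TEXT = ("see a bear? sell stock! see a bull? buy stock! "
        "bid stock! bid stock! hear the bell? stop!")
index = build_word_index(TEXT)
print(occurrences(index, "stock"))   # [17, 40, 51, 62]
```

The sketch also indexes the short words "a" and "the"; a real index might skip such stop words.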
WORD MATCHING WITH A TRIE
[Figure: the text "see a bear? sell stock! see a bull? buy stock! bid stock! bid stock! hear the bell? stop!" with character positions 0 to 88, and its word trie. Each leaf lists the starting positions of its word: see 0, 24; bear 6; sell 12; stock 17, 40, 51, 62; bull 30; buy 36; bid 47, 58; hear 69; bell 78; stop 84]
• A compressed trie has internal nodes of degree at least two
• It is obtained from the standard trie by compressing chains of “redundant” nodes
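The compression step can be sketched in Python on a trie stored as nested dictionaries. This is an illustration under one simplifying assumption: no string in the set is a prefix of another (true for the example set S), so a leaf is just an empty dictionary and no end-of-word marker is needed. The function names are hypothetical.

```python
def build_trie(words):
    """Standard trie as nested dicts: edge label (one character) -> subtree."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
    return root


def compress(node):
    """Merge every chain of single-child nodes into one multi-character edge."""
    out = {}
    for label, child in node.items():
        # While the child has exactly one child, it is "redundant": absorb it.
        while len(child) == 1:
            (next_label, grandchild), = child.items()
            label += next_label
            child = grandchild
        out[label] = compress(child)
    return out


S = ["bear", "bell", "bid", "bull", "buy", "sell", "stock", "stop"]
compressed = compress(build_trie(S))
print(sorted(compressed["s"]))   # ['ell', 'to']
```

After compression every internal node below the root has at least two children, so the number of nodes is proportional to the number of strings rather than to their total length.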
COMPRESSED TRIES
[Figure: the standard trie for S next to the equivalent compressed trie. Compressed edges: b → e (→ ar, ll), id, u (→ ll, y); s → ell, to (→ ck, p)]
• Compact representation of a compressed trie for an array of strings:
  • Stores at the nodes ranges of indices instead of substrings
  • Uses O(s) space, where s is the number of strings in the array
  • Serves as an auxiliary index structure
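A sketch of how such index ranges could be produced, assuming each node stores a triple (i, j, k) that stands for the substring S[i][j..k], and assuming (as in the example array) that no string is a prefix of another. The recursive construction and the names `build_compact` and `label` are illustrative, not taken from the textbook.

```python
def build_compact(S, ids=None, depth=0):
    """Return the children of a node as a list of ((i, j, k), subtree) pairs.

    `ids` are the indices of the strings that share their first `depth`
    characters; the triple (i, j, k) denotes the substring S[i][j..k].
    """
    if ids is None:
        ids = range(len(S))
    groups = {}
    for i in ids:
        if depth < len(S[i]):
            groups.setdefault(S[i][depth], []).append(i)
    children = []
    for ch in sorted(groups):
        group = groups[ch]
        end = depth + 1
        # Extend the edge while every string in the group agrees on the next character.
        while (all(len(S[i]) > end for i in group)
               and len({S[i][end] for i in group}) == 1):
            end += 1
        children.append(((group[0], depth, end - 1),
                         build_compact(S, group, end)))
    return children


def label(S, triple):
    """Decode a triple back into the substring it represents."""
    i, j, k = triple
    return S[i][j:k + 1]


S = ["see", "bear", "sell", "stock", "bull", "buy", "bid", "hear", "bell", "stop"]
tree = build_compact(S)
print([label(S, t) for t, _ in tree])   # ['b', 'hear', 's']
```

Each node holds three integers regardless of how long its edge label is, which is where the O(s) space bound comes from.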
COMPACT REPRESENTATION
[Figure: array S[0] = see, S[1] = bear, S[2] = sell, S[3] = stock, S[4] = bull, S[5] = buy, S[6] = bid, S[7] = hear, S[8] = bell, S[9] = stop, and its compressed trie. Each node holds a triple (i, j, k) denoting the substring S[i][j..k], e.g. (1, 0, 0) for "b", (7, 0, 3) for "hear", (3, 1, 2) for "to"]
Begins with: WHERE name LIKE 'x%'
Ends with: WHERE name LIKE '%x'
Substring: WHERE name LIKE '%x%'
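The first two query types map directly onto tries; a rough Python sketch follows. "Begins with" is a prefix walk followed by a traversal of the subtree, and "ends with" is the same query on a trie of the reversed strings. The helper names and the sample names are made up for illustration, and `None` is used as an end-of-word key on the assumption that it can never be a character. Substring queries need the suffix trie instead.

```python
def build_trie(words):
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[None] = True   # end-of-word marker (None is never a character)
    return root


def words_with_prefix(root, prefix):
    """All stored words that start with `prefix`, in sorted order."""
    node = root
    for ch in prefix:
        if ch not in node:
            return []
        node = node[ch]
    found, stack = [], [(node, prefix)]
    while stack:
        current, text = stack.pop()
        for key, child in current.items():
            if key is None:
                found.append(text)
            else:
                stack.append((child, text + key))
    return sorted(found)


def begins_with(names, x):
    # LIKE 'x%'
    return words_with_prefix(build_trie(names), x)


def ends_with(names, x):
    # LIKE '%x': query a trie of reversed names with the reversed pattern.
    reversed_trie = build_trie(name[::-1] for name in names)
    return sorted(w[::-1] for w in words_with_prefix(reversed_trie, x[::-1]))


names = ["rob", "robert", "roberta", "bob", "albert"]
print(begins_with(names, "rob"))   # ['rob', 'robert', 'roberta']
print(ends_with(names, "bert"))    # ['albert', 'robert']
```

In practice both tries would be built once and reused across queries; they are rebuilt per call here only to keep the sketch short.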
STRING SEARCHES
• The suffix trie of a string X is the compressed trie of all the suffixes of X
SUFFIX TRIE
[Figure: suffix trie of X = "minimize" (character positions 0 to 7). Root edges: e, i, mi, nimize, ze; below i: mize, nimize, ze; below mi: nimize, ze]
Compact representation of the suffix trie for a string X of size n from an alphabet of size d:
• Uses O(n) space
• Supports arbitrary pattern matching queries in X in O(dm) time, where m is the size of the pattern
• Can be constructed in O(n) time
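To make the query side concrete, here is a deliberately naive Python sketch. It is not the O(n) construction: it inserts every suffix into an uncompressed trie, which takes O(n^2) time and space, and the function names are illustrative. What it does show is why a pattern of size m is answered by a single walk from the root: every substring of X is a prefix of some suffix.

```python
def build_suffix_trie(x):
    """Uncompressed trie of all suffixes of x (naive O(n^2) construction)."""
    root = {}
    for start in range(len(x)):
        node = root
        for ch in x[start:]:
            node = node.setdefault(ch, {})
    return root


def contains(root, pattern):
    """True if `pattern` occurs in the indexed string; one step per character."""
    node = root
    for ch in pattern:
        node = node.get(ch)
        if node is None:
            return False
    return True


suffix_trie = build_suffix_trie("minimize")
print(contains(suffix_trie, "nimi"))   # True
print(contains(suffix_trie, "zim"))    # False
```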
ANALYSIS OF SUFFIX TRIES
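[Figure: compact representation of the suffix trie of X = "minimize"; each node stores an index pair (i, j) denoting X[i..j]. Root edges: (7, 7), (1, 1), (0, 1), (2, 7), (6, 7); below (1, 1): (4, 7), (2, 7), (6, 7); below (0, 1): (2, 7), (6, 7)]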
Auto complete: User types “Rob” and you can respond
with all words that begin with Rob, or all contacts
that begin with Rob, etc.
Sequence assembly of genetic sequences
Sorting of Large Sets of Strings: BurstSort
Big Data: See “TeraSort.java” source code
APPLICATIONS OF TRIES
SAMPLE APPLICATION – IP ROUTING
Packets of Fun
ROUTING TABLE LOOKUP
[Figure: router block diagram with routing tables, routing and forwarding decisions, a switch fabric, and output scheduling]
A standardized exterior gateway protocol designed to
exchange routing and reachability information
between autonomous systems (AS) on the Internet.
Makes routing decisions based on paths, network
policies and/or rule-sets configured by a network
administrator.
Plays a key role in the overall operation of the
Internet and is involved in making core routing
decisions.
[Itself uses TCP to exchange its own data.]
BORDER GATEWAY PROTOCOL (BGP)
IPV4 ROUTING TABLE SIZE
Source: Geoff Huston, APNIC
Destination address    Next hop
10.0.0.0/8             R1
128.143.0.0/16         R2
128.143.64.0/20        R3
128.143.192.0/20       R3
128.143.71.0/24        R4
128.143.71.55/32       R3
Default                R5
With CIDR, there can be multiple
matches for a destination address in the
routing table.
Longest Prefix Match: Search for the
routing table entry that has the longest
match with the prefix of the destination
IP address (the most specific route):
1. Search for a match on all 32 bits
2. Search for a match on 31 bits
…
33. Search for a match on 0 bits
Needed: a data structure that supports a FAST
longest prefix match lookup!
ROUTING TABLE LOOKUP: LONGEST PREFIX MATCH
Destination address: 128.143.71.21
The longest prefix match for 128.143.71.21 is with 128.143.71.0/24
→ The datagram will be sent to R4
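The rule itself can be checked with a few lines of Python using the standard `ipaddress` module. This is the slow baseline, a linear scan over the table, shown only to pin down what "longest prefix match" returns; the list `ROUTES` restates the example table, with the default route written as 0.0.0.0/0 (an assumption about how "Default" would be encoded).

```python
import ipaddress

ROUTES = [
    ("10.0.0.0/8", "R1"),
    ("128.143.0.0/16", "R2"),
    ("128.143.64.0/20", "R3"),
    ("128.143.192.0/20", "R3"),
    ("128.143.71.0/24", "R4"),
    ("128.143.71.55/32", "R3"),
    ("0.0.0.0/0", "R5"),   # default route
]


def longest_prefix_match(destination, routes=ROUTES):
    """Return (prefix, next_hop) of the most specific matching entry, or None."""
    address = ipaddress.ip_address(destination)
    best = None
    for prefix, next_hop in routes:
        network = ipaddress.ip_network(prefix)
        if address in network and (best is None
                                   or network.prefixlen > best[0].prefixlen):
            best = (network, next_hop)
    return None if best is None else (str(best[0]), best[1])


print(longest_prefix_match("128.143.71.21"))   # ('128.143.71.0/24', 'R4')
```

The address 128.143.71.21 matches four entries (/0, /16, /20 and /24); the /24 entry wins because it is the longest.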
The following algorithms are suitable for Longest Prefix Match routing table lookups:
• Tries
• Path-Compressed Tries
• Disjoint-prefix binary Tries
• Multibit Tries
• Binary Search on Prefix
• Prefix Range Search
IP ADDRESS LOOKUP ALGORITHMS
[Figure: trie storing the strings ten, tea, top, and pot, with internal nodes for the prefixes t, te, to, p, and po]
A trie is a tree-based data structure for storing strings:
• There is one node for every common prefix
• The strings are stored in extra leaf nodes
• Prefixes are not only stored at leaf nodes but also at internal nodes
SLIGHTLY DIFFERENT VERSION OF TRIE
Structure
• Each leaf contains a possible address
• Prefixes in the table are marked (dark)
Search
• Traverse the tree according to destination address
• Most recent marked node is the current longest prefix
• Search ends when a leaf node is reached
BINARY TRIE
Update
• Search for the new entry
• Search ends when a leaf node is reached
• If there is no branch to take, insert new node(s)
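The search and update rules of the binary trie can be sketched in Python as below. Prefixes and addresses are written as strings of "0" and "1"; a node is "marked" when its `next_hop` is set. The class name, the function names, and the small prefix table (labels a, b, d, z with the bit strings 0*, 01000*, 1*, 1010*) are illustrative choices, not a full routing table.

```python
class BitNode:
    def __init__(self):
        self.child = [None, None]   # child[0] for bit 0, child[1] for bit 1
        self.next_hop = None        # set only on marked nodes (table prefixes)


def insert(root, prefix_bits, next_hop):
    """Follow the prefix bit by bit, creating nodes where no branch exists."""
    node = root
    for bit in prefix_bits:
        i = int(bit)
        if node.child[i] is None:
            node.child[i] = BitNode()
        node = node.child[i]
    node.next_hop = next_hop


def lookup(root, address_bits):
    """Longest prefix match: remember the most recent marked node on the path."""
    node = root
    best = root.next_hop
    for bit in address_bits:
        node = node.child[int(bit)]
        if node is None:
            break
        if node.next_hop is not None:
            best = node.next_hop
    return best


root = BitNode()
for prefix, hop in [("0", "a"), ("01000", "b"), ("1", "d"), ("1010", "z")]:
    insert(root, prefix, hop)
print(lookup(root, "010110"))   # a
print(lookup(root, "101011"))   # z
```

Both operations touch at most one node per address bit, which is the O(W) bound for a W-bit address.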
BINARY TRIE
[Figure: binary trie after inserting the new prefix z = 1010*]
• Goal: Eliminate long sequences of 1-child nodes
• Path compression → collapses 1-child branches
• Path Compression:
  • Requires storing additional information at the nodes: a bit number field is added to each node
  • Bit strings of prefixes must be explicitly stored at nodes
  • Need to make a comparison when searching the tree
COMPRESSED BINARY TRIE
• Search: “010110”
  • Root node: Inspect 1st bit and move left
  • “a” node:
    • Check with prefix of a (“0*”) and find a match
    • Inspect 3rd bit and move left
  • “b” node:
    • Check with prefix of b (“01000*”) and determine that there is no match
    • Search stops. Longest prefix match is with a
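The walk above can be replayed with a small Python sketch. It models only the three nodes named in the example (root, a, b); the rest of the slide's tree is not reproduced, and the class `PCNode` with its fields `prefix` and `bit` is a hypothetical layout for "bit string stored at the node" and "bit number field".

```python
class PCNode:
    def __init__(self, name=None, prefix=None, bit=None, left=None, right=None):
        self.name = name       # label of the stored prefix, None if there is none
        self.prefix = prefix   # bit string stored explicitly at the node
        self.bit = bit         # 1-based number of the bit to inspect, None at a leaf
        self.left = left       # followed when the inspected bit is 0
        self.right = right     # followed when the inspected bit is 1


def search(root, address_bits):
    """Longest prefix match in a path-compressed binary trie (sketch)."""
    best = None
    node = root
    while node is not None:
        if node.prefix is not None:
            # Skipped bits were never inspected, so compare against the stored prefix.
            if not address_bits.startswith(node.prefix):
                break
            best = node.name
        if node.bit is None or node.bit > len(address_bits):
            break
        node = node.left if address_bits[node.bit - 1] == "0" else node.right
    return best


b = PCNode(name="b", prefix="01000")
a = PCNode(name="a", prefix="0", bit=3, left=b)
root = PCNode(bit=1, left=a)
print(search(root, "010110"))   # a
```

The comparison at node b is what catches the mismatch: bits 2, 4 and 5 were skipped on the way down, so reaching b does not by itself prove that the address starts with 01000.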
COMPRESSED BINARY TRIE
• Multiple matches in longest prefix rule require backtracking of search
• Goal: Transform the tree so as to avoid multiple matches
• Disjoint prefix:
  • Nodes are split so that there is only one match for each prefix (“Leaf pushing”)
  • Consequence: Internal nodes do not match with prefixes
• Results:
  • a (0*) is split into: a1 (00*), a3 (010*), a2 (01001*)
  • d (1*) is represented as d1 (101*)
DISJOINT-PREFIX BINARY TRIE
• Goal: Accelerate lookup by inspecting more than one bit at a time
• “Stride”: number of bits inspected at one time
• With a k-bit stride, a node has up to 2^k child nodes
• 2-bit stride:
  • 1-bit prefix for a (0*) is split into 00* and 01*
  • 1-bit prefix for d (1*) is split into 10* and 11*
  • 3-bit prefix for c has been expanded to two nodes
  • Why are the prefixes for b and e not expanded?
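The splitting of short prefixes is usually called prefix expansion. A minimal Python sketch, assuming a fixed stride (the slide's trie uses variable strides) and a hypothetical helper name `expand_prefix`:

```python
from itertools import product


def expand_prefix(prefix, stride):
    """Pad a bit-string prefix to the next multiple of `stride` in all possible ways."""
    missing = (-len(prefix)) % stride
    return [prefix + "".join(bits) for bits in product("01", repeat=missing)]


print(expand_prefix("0", 2))     # ['00', '01']
print(expand_prefix("1", 2))     # ['10', '11']
print(expand_prefix("101", 2))   # ['1010', '1011']
```

A prefix whose length is already a multiple of the stride is returned unchanged. When an expanded prefix collides with a longer original prefix, the original must take priority so that longest prefix match is preserved.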
VARIABLE-STRIDE MULTIBIT TRIE
Scheme                    Lookup    Update          Memory
Binary trie               O(W)      O(W)            O(NW)
Path-compressed trie      O(W)      O(W)            O(NW)
k-stride multibit trie    O(W/k)    O(W/k + 2^k)    O(2^k NW/k)
COMPLEXITY OF THE LOOKUP
• Bounds are expressed for:
  • Look-up time: What is the longest lookup time?
  • Update time: How long does it take to change an entry?
  • Memory: How much memory is required to store the data structure?
• W: length of the address (32 bits)
• N: number of prefixes in the routing table
Excellent data structure for managing Strings
Supports prefix- and suffix-style lookups
Extremely fast: after the Trie has been built, the
search time is O(m), where m is the size of the
pattern.
Can be used to build indexes
Various applications in areas that use Strings
(Literature/Dictionary/Content, as well as Networks
and Bioinformatics)
CONCLUSIONS: TRIES

More Related Content

What's hot (20)

Tree Traversal
Tree TraversalTree Traversal
Tree Traversal
 
Join
JoinJoin
Join
 
Tries
TriesTries
Tries
 
Tree in data structure
Tree in data structureTree in data structure
Tree in data structure
 
Sorting Algorithms
Sorting AlgorithmsSorting Algorithms
Sorting Algorithms
 
Leftist heap
Leftist heapLeftist heap
Leftist heap
 
Recovery with concurrent transaction
Recovery with concurrent transactionRecovery with concurrent transaction
Recovery with concurrent transaction
 
Binary search tree in data structures
Binary search tree in  data structuresBinary search tree in  data structures
Binary search tree in data structures
 
Binary Heap Tree, Data Structure
Binary Heap Tree, Data Structure Binary Heap Tree, Data Structure
Binary Heap Tree, Data Structure
 
Doubly linked list (animated)
Doubly linked list (animated)Doubly linked list (animated)
Doubly linked list (animated)
 
Digital Search Tree
Digital Search TreeDigital Search Tree
Digital Search Tree
 
Database design & Normalization (1NF, 2NF, 3NF)
Database design & Normalization (1NF, 2NF, 3NF)Database design & Normalization (1NF, 2NF, 3NF)
Database design & Normalization (1NF, 2NF, 3NF)
 
B tree
B  treeB  tree
B tree
 
B tree
B treeB tree
B tree
 
Red black tree
Red black treeRed black tree
Red black tree
 
Functional dependencies in Database Management System
Functional dependencies in Database Management SystemFunctional dependencies in Database Management System
Functional dependencies in Database Management System
 
Binary tree
Binary  treeBinary  tree
Binary tree
 
Tree in data structure
Tree in data structureTree in data structure
Tree in data structure
 
Data Structures - Lecture 9 [Stack & Queue using Linked List]
 Data Structures - Lecture 9 [Stack & Queue using Linked List] Data Structures - Lecture 9 [Stack & Queue using Linked List]
Data Structures - Lecture 9 [Stack & Queue using Linked List]
 
Trees (data structure)
Trees (data structure)Trees (data structure)
Trees (data structure)
 

Viewers also liked

Online Algorithms - An Introduction
Online Algorithms - An IntroductionOnline Algorithms - An Introduction
Online Algorithms - An IntroductionAmrinder Arora
 
BTrees - Great alternative to Red Black, AVL and other BSTs
BTrees - Great alternative to Red Black, AVL and other BSTsBTrees - Great alternative to Red Black, AVL and other BSTs
BTrees - Great alternative to Red Black, AVL and other BSTsAmrinder Arora
 
Application of hashing in better alg design tanmay
Application of hashing in better alg design tanmayApplication of hashing in better alg design tanmay
Application of hashing in better alg design tanmayTanmay 'Unsinkable'
 
Binomial Heaps and Fibonacci Heaps
Binomial Heaps and Fibonacci HeapsBinomial Heaps and Fibonacci Heaps
Binomial Heaps and Fibonacci HeapsAmrinder Arora
 
Splay Trees and Self Organizing Data Structures
Splay Trees and Self Organizing Data StructuresSplay Trees and Self Organizing Data Structures
Splay Trees and Self Organizing Data StructuresAmrinder Arora
 
Euclid's Algorithm for Greatest Common Divisor - Time Complexity Analysis
Euclid's Algorithm for Greatest Common Divisor - Time Complexity AnalysisEuclid's Algorithm for Greatest Common Divisor - Time Complexity Analysis
Euclid's Algorithm for Greatest Common Divisor - Time Complexity AnalysisAmrinder Arora
 
Convex Hull - Chan's Algorithm O(n log h) - Presentation by Yitian Huang and ...
Convex Hull - Chan's Algorithm O(n log h) - Presentation by Yitian Huang and ...Convex Hull - Chan's Algorithm O(n log h) - Presentation by Yitian Huang and ...
Convex Hull - Chan's Algorithm O(n log h) - Presentation by Yitian Huang and ...Amrinder Arora
 
Fundamental File Processing Operations
Fundamental File Processing OperationsFundamental File Processing Operations
Fundamental File Processing OperationsRico
 
Trees data structure
Trees data structureTrees data structure
Trees data structureSumit Gupta
 
KMP - Social Media Marketing Seminar Blogging For Business
KMP - Social Media Marketing Seminar   Blogging For BusinessKMP - Social Media Marketing Seminar   Blogging For Business
KMP - Social Media Marketing Seminar Blogging For BusinessMicrosoft
 
X86opti 05 s5yata
X86opti 05 s5yataX86opti 05 s5yata
X86opti 05 s5yatas5yata
 
Lecture storage-buffer
Lecture storage-bufferLecture storage-buffer
Lecture storage-bufferKlaas Krona
 
Poptrie: A Compressed Trie with Population Count for Fast and Scalable Softwa...
Poptrie: A Compressed Trie with Population Count for Fast and Scalable Softwa...Poptrie: A Compressed Trie with Population Count for Fast and Scalable Softwa...
Poptrie: A Compressed Trie with Population Count for Fast and Scalable Softwa...Hirochika Asai
 
Introduction of suffix tree
Introduction of suffix treeIntroduction of suffix tree
Introduction of suffix treeLiou Shu Hung
 
Packet forwarding in wan.46
Packet  forwarding in wan.46Packet  forwarding in wan.46
Packet forwarding in wan.46myrajendra
 

Viewers also liked (20)

Trie (1)
Trie (1)Trie (1)
Trie (1)
 
Trie tree
Trie treeTrie tree
Trie tree
 
Online Algorithms - An Introduction
Online Algorithms - An IntroductionOnline Algorithms - An Introduction
Online Algorithms - An Introduction
 
BTrees - Great alternative to Red Black, AVL and other BSTs
BTrees - Great alternative to Red Black, AVL and other BSTsBTrees - Great alternative to Red Black, AVL and other BSTs
BTrees - Great alternative to Red Black, AVL and other BSTs
 
Application of hashing in better alg design tanmay
Application of hashing in better alg design tanmayApplication of hashing in better alg design tanmay
Application of hashing in better alg design tanmay
 
Binomial Heaps and Fibonacci Heaps
Binomial Heaps and Fibonacci HeapsBinomial Heaps and Fibonacci Heaps
Binomial Heaps and Fibonacci Heaps
 
Lec18
Lec18Lec18
Lec18
 
Splay Trees and Self Organizing Data Structures
Splay Trees and Self Organizing Data StructuresSplay Trees and Self Organizing Data Structures
Splay Trees and Self Organizing Data Structures
 
Algorithmic Puzzles
Algorithmic PuzzlesAlgorithmic Puzzles
Algorithmic Puzzles
 
Euclid's Algorithm for Greatest Common Divisor - Time Complexity Analysis
Euclid's Algorithm for Greatest Common Divisor - Time Complexity AnalysisEuclid's Algorithm for Greatest Common Divisor - Time Complexity Analysis
Euclid's Algorithm for Greatest Common Divisor - Time Complexity Analysis
 
Convex Hull - Chan's Algorithm O(n log h) - Presentation by Yitian Huang and ...
Convex Hull - Chan's Algorithm O(n log h) - Presentation by Yitian Huang and ...Convex Hull - Chan's Algorithm O(n log h) - Presentation by Yitian Huang and ...
Convex Hull - Chan's Algorithm O(n log h) - Presentation by Yitian Huang and ...
 
Fundamental File Processing Operations
Fundamental File Processing OperationsFundamental File Processing Operations
Fundamental File Processing Operations
 
Trees data structure
Trees data structureTrees data structure
Trees data structure
 
KMP - Social Media Marketing Seminar Blogging For Business
KMP - Social Media Marketing Seminar   Blogging For BusinessKMP - Social Media Marketing Seminar   Blogging For Business
KMP - Social Media Marketing Seminar Blogging For Business
 
X86opti 05 s5yata
X86opti 05 s5yataX86opti 05 s5yata
X86opti 05 s5yata
 
Lecture storage-buffer
Lecture storage-bufferLecture storage-buffer
Lecture storage-buffer
 
Poptrie: A Compressed Trie with Population Count for Fast and Scalable Softwa...
Poptrie: A Compressed Trie with Population Count for Fast and Scalable Softwa...Poptrie: A Compressed Trie with Population Count for Fast and Scalable Softwa...
Poptrie: A Compressed Trie with Population Count for Fast and Scalable Softwa...
 
Introduction to statistics ii
Introduction to statistics iiIntroduction to statistics ii
Introduction to statistics ii
 
Introduction of suffix tree
Introduction of suffix treeIntroduction of suffix tree
Introduction of suffix tree
 
Packet forwarding in wan.46
Packet  forwarding in wan.46Packet  forwarding in wan.46
Packet forwarding in wan.46
 

Similar to Tries - Tree Based Structures for Strings

Graphs, Trees, Paths and Their Representations
Graphs, Trees, Paths and Their RepresentationsGraphs, Trees, Paths and Their Representations
Graphs, Trees, Paths and Their RepresentationsAmrinder Arora
 
R-Trees and Geospatial Data Structures
R-Trees and Geospatial Data StructuresR-Trees and Geospatial Data Structures
R-Trees and Geospatial Data StructuresAmrinder Arora
 
Apache Cassandra, part 1 – principles, data model
Apache Cassandra, part 1 – principles, data modelApache Cassandra, part 1 – principles, data model
Apache Cassandra, part 1 – principles, data modelAndrey Lomakin
 
NOSQL and Cassandra
NOSQL and CassandraNOSQL and Cassandra
NOSQL and Cassandrarantav
 
29_Tries.ppt nnnnnnnnnnnnnnnnnnnnnnnnnnn
29_Tries.ppt nnnnnnnnnnnnnnnnnnnnnnnnnnn29_Tries.ppt nnnnnnnnnnnnnnnnnnnnnnnnnnn
29_Tries.ppt nnnnnnnnnnnnnnnnnnnnnnnnnnnratnapatil14
 
Intro to Data warehousing lecture 11
Intro to Data warehousing   lecture 11Intro to Data warehousing   lecture 11
Intro to Data warehousing lecture 11AnwarrChaudary
 
Intro to Data warehousing lecture 14
Intro to Data warehousing   lecture 14Intro to Data warehousing   lecture 14
Intro to Data warehousing lecture 14AnwarrChaudary
 
Intro to Data warehousing lecture 19
Intro to Data warehousing   lecture 19Intro to Data warehousing   lecture 19
Intro to Data warehousing lecture 19AnwarrChaudary
 
AIN102S Access string function sample queries
AIN102S Access string function sample queriesAIN102S Access string function sample queries
AIN102S Access string function sample queriesDan D'Urso
 
Manipulating string data with a pattern in R
Manipulating string data with  a pattern in RManipulating string data with  a pattern in R
Manipulating string data with a pattern in RLun-Hsien Chang
 
Target Holding - Big Dikes and Big Data
Target Holding - Big Dikes and Big DataTarget Holding - Big Dikes and Big Data
Target Holding - Big Dikes and Big DataFrens Jan Rumph
 
OPTIMIZATION OF LZ77 DATA COMPRESSION ALGORITHM
OPTIMIZATION OF LZ77 DATA COMPRESSION ALGORITHMOPTIMIZATION OF LZ77 DATA COMPRESSION ALGORITHM
OPTIMIZATION OF LZ77 DATA COMPRESSION ALGORITHMJitendra Choudhary
 
JAVA 2013 IEEE NETWORKSECURITY PROJECT Spatial approximate string search
JAVA 2013 IEEE NETWORKSECURITY PROJECT Spatial approximate string searchJAVA 2013 IEEE NETWORKSECURITY PROJECT Spatial approximate string search
JAVA 2013 IEEE NETWORKSECURITY PROJECT Spatial approximate string searchIEEEGLOBALSOFTTECHNOLOGIES
 
ICDE2015 Research 3: Distributed Storage and Processing
ICDE2015 Research 3: Distributed Storage and ProcessingICDE2015 Research 3: Distributed Storage and Processing
ICDE2015 Research 3: Distributed Storage and ProcessingTakuma Wakamori
 

Similar to Tries - Tree Based Structures for Strings (20)

Shishirppt
ShishirpptShishirppt
Shishirppt
 
Graphs, Trees, Paths and Their Representations
Graphs, Trees, Paths and Their RepresentationsGraphs, Trees, Paths and Their Representations
Graphs, Trees, Paths and Their Representations
 
R-Trees and Geospatial Data Structures
R-Trees and Geospatial Data StructuresR-Trees and Geospatial Data Structures
R-Trees and Geospatial Data Structures
 
Cassandra no sql ecosystem
Cassandra no sql ecosystemCassandra no sql ecosystem
Cassandra no sql ecosystem
 
Apache Cassandra, part 1 – principles, data model
Apache Cassandra, part 1 – principles, data modelApache Cassandra, part 1 – principles, data model
Apache Cassandra, part 1 – principles, data model
 
NOSQL and Cassandra
NOSQL and CassandraNOSQL and Cassandra
NOSQL and Cassandra
 
29_Tries.ppt nnnnnnnnnnnnnnnnnnnnnnnnnnn
29_Tries.ppt nnnnnnnnnnnnnnnnnnnnnnnnnnn29_Tries.ppt nnnnnnnnnnnnnnnnnnnnnnnnnnn
29_Tries.ppt nnnnnnnnnnnnnnnnnnnnnnnnnnn
 
Analytical Study of AES and Proposed Variant with Enhance Block Length and Ke...
Analytical Study of AES and Proposed Variant with Enhance Block Length and Ke...Analytical Study of AES and Proposed Variant with Enhance Block Length and Ke...
Analytical Study of AES and Proposed Variant with Enhance Block Length and Ke...
 
Intro to Data warehousing lecture 11
Intro to Data warehousing   lecture 11Intro to Data warehousing   lecture 11
Intro to Data warehousing lecture 11
 
Intro to Data warehousing lecture 14
Intro to Data warehousing   lecture 14Intro to Data warehousing   lecture 14
Intro to Data warehousing lecture 14
 
Intro to Data warehousing lecture 19
Intro to Data warehousing   lecture 19Intro to Data warehousing   lecture 19
Intro to Data warehousing lecture 19
 
AIN102S Access string function sample queries
AIN102S Access string function sample queriesAIN102S Access string function sample queries
AIN102S Access string function sample queries
 
Lesson11 transactions
Lesson11 transactionsLesson11 transactions
Lesson11 transactions
 
Manipulating string data with a pattern in R
Manipulating string data with  a pattern in RManipulating string data with  a pattern in R
Manipulating string data with a pattern in R
 
Target Holding - Big Dikes and Big Data
Target Holding - Big Dikes and Big DataTarget Holding - Big Dikes and Big Data
Target Holding - Big Dikes and Big Data
 
OPTIMIZATION OF LZ77 DATA COMPRESSION ALGORITHM
OPTIMIZATION OF LZ77 DATA COMPRESSION ALGORITHMOPTIMIZATION OF LZ77 DATA COMPRESSION ALGORITHM
OPTIMIZATION OF LZ77 DATA COMPRESSION ALGORITHM
 
50120130405006
5012013040500650120130405006
50120130405006
 
JAVA 2013 IEEE NETWORKSECURITY PROJECT Spatial approximate string search
JAVA 2013 IEEE NETWORKSECURITY PROJECT Spatial approximate string searchJAVA 2013 IEEE NETWORKSECURITY PROJECT Spatial approximate string search
JAVA 2013 IEEE NETWORKSECURITY PROJECT Spatial approximate string search
 
Spatial approximate string search
Spatial approximate string searchSpatial approximate string search
Spatial approximate string search
 
ICDE2015 Research 3: Distributed Storage and Processing
ICDE2015 Research 3: Distributed Storage and ProcessingICDE2015 Research 3: Distributed Storage and Processing
ICDE2015 Research 3: Distributed Storage and Processing
 

More from Amrinder Arora

Graph Traversal Algorithms - Breadth First Search
Graph Traversal Algorithms - Breadth First SearchGraph Traversal Algorithms - Breadth First Search
Graph Traversal Algorithms - Breadth First SearchAmrinder Arora
 
Graph Traversal Algorithms - Depth First Search Traversal
Graph Traversal Algorithms - Depth First Search TraversalGraph Traversal Algorithms - Depth First Search Traversal
Graph Traversal Algorithms - Depth First Search TraversalAmrinder Arora
 
Bron Kerbosch Algorithm - Presentation by Jun Zhai, Tianhang Qiang and Yizhen...
Bron Kerbosch Algorithm - Presentation by Jun Zhai, Tianhang Qiang and Yizhen...Bron Kerbosch Algorithm - Presentation by Jun Zhai, Tianhang Qiang and Yizhen...
Bron Kerbosch Algorithm - Presentation by Jun Zhai, Tianhang Qiang and Yizhen...Amrinder Arora
 
Arima Forecasting - Presentation by Sera Cresta, Nora Alosaimi and Puneet Mahana
Arima Forecasting - Presentation by Sera Cresta, Nora Alosaimi and Puneet MahanaArima Forecasting - Presentation by Sera Cresta, Nora Alosaimi and Puneet Mahana
Arima Forecasting - Presentation by Sera Cresta, Nora Alosaimi and Puneet MahanaAmrinder Arora
 
Stopping Rule for Secretory Problem - Presentation by Haoyang Tian, Wesam Als...
Stopping Rule for Secretory Problem - Presentation by Haoyang Tian, Wesam Als...Stopping Rule for Secretory Problem - Presentation by Haoyang Tian, Wesam Als...
Stopping Rule for Secretory Problem - Presentation by Haoyang Tian, Wesam Als...Amrinder Arora
 
Proof of O(log *n) time complexity of Union find (Presentation by Wei Li, Zeh...
Proof of O(log *n) time complexity of Union find (Presentation by Wei Li, Zeh...Proof of O(log *n) time complexity of Union find (Presentation by Wei Li, Zeh...
Proof of O(log *n) time complexity of Union find (Presentation by Wei Li, Zeh...Amrinder Arora
 
Proof of Cook Levin Theorem (Presentation by Xiechuan, Song and Shuo)
Proof of Cook Levin Theorem (Presentation by Xiechuan, Song and Shuo)Proof of Cook Levin Theorem (Presentation by Xiechuan, Song and Shuo)
Proof of Cook Levin Theorem (Presentation by Xiechuan, Song and Shuo)Amrinder Arora
 
Online algorithms in Machine Learning
Online algorithms in Machine LearningOnline algorithms in Machine Learning
Online algorithms in Machine LearningAmrinder Arora
 
Dynamic Programming - Part II
Dynamic Programming - Part IIDynamic Programming - Part II
Dynamic Programming - Part IIAmrinder Arora
 
Dynamic Programming - Part 1
Dynamic Programming - Part 1Dynamic Programming - Part 1
Dynamic Programming - Part 1Amrinder Arora
 
Divide and Conquer - Part II - Quickselect and Closest Pair of Points
Divide and Conquer - Part II - Quickselect and Closest Pair of PointsDivide and Conquer - Part II - Quickselect and Closest Pair of Points
Divide and Conquer - Part II - Quickselect and Closest Pair of PointsAmrinder Arora
 
Divide and Conquer - Part 1
Divide and Conquer - Part 1Divide and Conquer - Part 1
Divide and Conquer - Part 1Amrinder Arora
 
Asymptotic Notation and Data Structures
Asymptotic Notation and Data StructuresAsymptotic Notation and Data Structures
Asymptotic Notation and Data StructuresAmrinder Arora
 
Introduction to Algorithms and Asymptotic Notation
Introduction to Algorithms and Asymptotic NotationIntroduction to Algorithms and Asymptotic Notation
Introduction to Algorithms and Asymptotic NotationAmrinder Arora
 
Set Operations - Union Find and Bloom Filters
Set Operations - Union Find and Bloom FiltersSet Operations - Union Find and Bloom Filters
Set Operations - Union Find and Bloom FiltersAmrinder Arora
 
Binary Search Trees - AVL and Red Black
Binary Search Trees - AVL and Red BlackBinary Search Trees - AVL and Red Black
Binary Search Trees - AVL and Red BlackAmrinder Arora
 
Stacks, Queues, Binary Search Trees - Lecture 1 - Advanced Data Structures
Stacks, Queues, Binary Search Trees -  Lecture 1 - Advanced Data StructuresStacks, Queues, Binary Search Trees -  Lecture 1 - Advanced Data Structures
Stacks, Queues, Binary Search Trees - Lecture 1 - Advanced Data StructuresAmrinder Arora
 

More from Amrinder Arora (20)

NP-Completeness - II
NP-Completeness - IINP-Completeness - II
NP-Completeness - II
 
Graph Traversal Algorithms - Breadth First Search
Graph Traversal Algorithms - Breadth First SearchGraph Traversal Algorithms - Breadth First Search
Graph Traversal Algorithms - Breadth First Search
 
Graph Traversal Algorithms - Depth First Search Traversal
Graph Traversal Algorithms - Depth First Search TraversalGraph Traversal Algorithms - Depth First Search Traversal
Graph Traversal Algorithms - Depth First Search Traversal
 
Bron Kerbosch Algorithm - Presentation by Jun Zhai, Tianhang Qiang and Yizhen...
Bron Kerbosch Algorithm - Presentation by Jun Zhai, Tianhang Qiang and Yizhen...Bron Kerbosch Algorithm - Presentation by Jun Zhai, Tianhang Qiang and Yizhen...
Bron Kerbosch Algorithm - Presentation by Jun Zhai, Tianhang Qiang and Yizhen...
 
Arima Forecasting - Presentation by Sera Cresta, Nora Alosaimi and Puneet Mahana
Arima Forecasting - Presentation by Sera Cresta, Nora Alosaimi and Puneet MahanaArima Forecasting - Presentation by Sera Cresta, Nora Alosaimi and Puneet Mahana
Arima Forecasting - Presentation by Sera Cresta, Nora Alosaimi and Puneet Mahana
 
Tries - Tree Based Structures for Strings

  • 1. CS 6213 – Advanced Data Structures. TRIES: An excellent data structure for strings
  • 2. LOGISTICS. Instructor: Prof. Amrinder Arora, amrinder@gwu.edu (please copy the TA on emails; please feel free to call as well). TA: Iswarya Parupudi, iswarya2291@gwmail.gwu.edu. Slide footer: L6 - Tries, CS 6213 - Advanced Data Structures, Arora.
  • 3. CREDITS. Michael T. Goodrich and Roberto Tamassia, Data Structures and Algorithms in Java (4th edition), John Wiley & Sons, Inc., ISBN: 0-471-73884-0. Haim Kaplan, Tel Aviv University. Jörg Liebeherr, University of Toronto.
  • 4. MOTIVATION. A naïve, brute-force search of a text of size n for a pattern of size m requires O(nm) time. Preprocessing the pattern speeds up pattern matching queries: for example, the KMP algorithm, after O(m) preprocessing of the pattern, performs pattern matching in time proportional to the text size, O(n). If the text is large, immutable and searched often (e.g., Shakespeare), we may want to preprocess the text itself, so that each search takes O(m) time.
  • 5. MOTIVATION (CONT.). A trie is a compact data structure for representing a set of strings, such as all the words in a text. A trie supports pattern matching queries in time proportional to the pattern size: O(m).
  • 6. TRIES: TOPICS. Standard tries; compressed tries; compact representation; suffix trie.
  • 7. STANDARD TRIES. The standard trie for a set of strings S is an ordered tree such that:
      • Each node but the root is labeled with a character
      • The children of a node are alphabetically ordered
      • The paths from the root to the leaves yield the strings of S
    Example: S = { bear, bell, bid, bull, buy, sell, stock, stop }. [Figure: standard trie for S]
  • 8. ANALYSIS OF STANDARD TRIES. A standard trie uses O(n) space and supports searches, insertions and deletions in time O(dm), where n is the total size of the strings in S, m is the size of the string parameter of the operation, and d is the size of the alphabet. [Figure: standard trie for S, as on slide 7]
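The operations analysed on slide 8 can be sketched as follows. This is a minimal illustration rather than the course's implementation: the names `TrieNode`, `Trie`, `contains` and `starts_with` are assumptions, and so is the `is_word` flag, which lets one stored string be a prefix of another (the slide's definition ends every string at a leaf). Children are kept in a Python dict, so each step costs expected O(1) instead of the O(d) scan behind the O(dm) bound.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # character -> TrieNode
        self.is_word = False  # True if a string of S ends at this node


class Trie:
    def __init__(self, words=()):
        self.root = TrieNode()
        for word in words:
            self.insert(word)

    def insert(self, word):
        """Add word, creating one node per missing character."""
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def _walk(self, s):
        """Follow s from the root; return the node reached, or None."""
        node = self.root
        for ch in s:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def contains(self, word):
        node = self._walk(word)
        return node is not None and node.is_word

    def starts_with(self, prefix):
        return self._walk(prefix) is not None


# The example set S from slide 7.
trie = Trie(["bear", "bell", "bid", "bull", "buy", "sell", "stock", "stop"])
```

With this set, `trie.contains("bell")` holds, while `"be"` is only a prefix: `trie.starts_with("be")` holds but `trie.contains("be")` does not.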
  • 9. WORD MATCHING WITH A TRIE. We insert the words of the text into a trie. Each leaf stores the occurrences of the associated word in the text. [Figure: the text "see a bear? sell stock! see a bull? buy stock! bid stock! bid stock! hear the bell? stop!" with character positions 0 to 88, and its word trie. Occurrences stored at the leaves: see 0, 24; bear 6; sell 12; stock 17, 40, 51, 62; bull 30; buy 36; bid 47, 58; hear 69; bell 78; stop 84]
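A small sketch of the word index on slide 9. The text below is reconstructed from the slide's figure, and the function names and the nested-dict layout (with a `"$"` key holding the occurrence list) are assumptions made for illustration. Unlike the figure, this sketch also indexes the short words "a" and "the".

```python
import re

# Text reconstructed from the figure on slide 9 (positions 0 to 88).
TEXT = ("see a bear? sell stock! see a bull? buy stock! "
        "bid stock! bid stock! hear the bell? stop!")


def build_word_index(text):
    """Insert every word of text into a trie of nested dicts.

    The node where a word ends stores the word's start positions
    under the key "$", which cannot clash with a letter.
    """
    root = {}
    for match in re.finditer(r"[a-z]+", text):
        node = root
        for ch in match.group():
            node = node.setdefault(ch, {})
        node.setdefault("$", []).append(match.start())
    return root


def occurrences(index, word):
    """Return the start positions of word, or [] if it does not occur."""
    node = index
    for ch in word:
        node = node.get(ch)
        if node is None:
            return []
    return node.get("$", [])


INDEX = build_word_index(TEXT)
```

Looking up `occurrences(INDEX, "stock")` walks five nodes and returns the stored positions without rescanning the text.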
  • 10. COMPRESSED TRIES. A compressed trie has internal nodes of degree at least two. It is obtained from the standard trie by compressing chains of "redundant" nodes. [Figure: the standard trie for S next to its compressed form, whose nodes carry the labels b, s, e, id, u, ar, ll, ll, y, ell, to, ck, p]
  • 11. COMPACT REPRESENTATION. Compact representation of a compressed trie for an array of strings:
      • Stores at the nodes ranges of indices instead of substrings
      • Uses O(s) space, where s is the number of strings in the array
      • Serves as an auxiliary index structure
    [Figure: array S[0..9] = see, bear, sell, stock, bull, buy, bid, hear, bell, stop, and the compressed trie whose nodes hold triples (i, j, k) standing for the substring S[i][j..k]; for example, (7, 0, 3) is "hear" and (1, 2, 3) is "ar"]
  • 12. STRING SEARCHES. Begins with: where name like 'x%'. Ends with: where name like '%x'. Substring: where name like '%x%'.
  • 13. SUFFIX TRIE. The suffix trie of a string X is the compressed trie of all the suffixes of X. [Figure: suffix trie of X = "minimize" (positions 0 to 7), with node labels e, i, mi, nimize, ze, mize, nimize, ze, nimize, ze]
  • 14. ANALYSIS OF SUFFIX TRIES. Compact representation of the suffix trie for a string X of size n from an alphabet of size d:
      • Uses O(n) space
      • Supports arbitrary pattern matching queries in X in O(dm) time, where m is the size of the pattern
      • Can be constructed in O(n) time
    [Figure: compact suffix trie of "minimize", with nodes labeled by index ranges such as (7, 7), (2, 7), (6, 7), (4, 7), (1, 1), (0, 1)]
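To make the pattern matching claim concrete, here is a deliberately naive sketch. It builds an uncompressed suffix trie by inserting every suffix, which takes O(n^2) time and space; it is not the compact O(n) structure of slide 14. Only the query follows the slide: one step per pattern character. The function names are assumptions.

```python
def build_suffix_trie(x):
    """Insert every suffix of x into a trie of nested dicts (naive, O(n^2))."""
    root = {}
    for start in range(len(x)):
        node = root
        for ch in x[start:]:
            node = node.setdefault(ch, {})
    return root


def is_substring(suffix_trie, pattern):
    """A pattern occurs in x exactly when it is a prefix of some suffix of x."""
    node = suffix_trie
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True


# The example string from slide 13.
SUFFIX_TRIE = build_suffix_trie("minimize")
```

For instance, `is_substring(SUFFIX_TRIE, "nimi")` succeeds after four steps, while `is_substring(SUFFIX_TRIE, "zim")` fails at the second character.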
  • 15. APPLICATIONS OF TRIES.
      • Autocomplete: the user types "Rob" and the system can respond with all words that begin with Rob, or all contacts that begin with Rob, etc.
      • Sequence assembly in genetics
      • Sorting of large sets of strings: BurstSort
      • Big Data: see the "TeraSort.java" source code
  • 16. SAMPLE APPLICATION – IP ROUTING. Packets of Fun.
  • 17. ROUTING TABLE LOOKUP. [Figure: router architecture with a routing decision, forwarding decisions, routing tables, switch fabric and output scheduling]
  • 18. BORDER GATEWAY PROTOCOL (BGP). A standardized exterior gateway protocol designed to exchange routing and reachability information between autonomous systems (AS) on the Internet. It makes routing decisions based on paths, network policies and/or rule sets configured by a network administrator. It plays a key role in the overall operation of the Internet and is involved in making core routing decisions. [BGP itself uses TCP to exchange its own data.]
  • 19. IPV4 ROUTING TABLE SIZE. [Figure: growth of the IPv4 routing table. Source: Geoff Huston, APNIC]
  • 20. ROUTING TABLE LOOKUP: LONGEST PREFIX MATCH.
      Destination address    Next hop
      10.0.0.0/8             R1
      128.143.0.0/16         R2
      128.143.64.0/20        R3
      128.143.192.0/20       R3
      128.143.71.0/24        R4
      128.143.71.55/32       R3
      Default                R5
    With CIDR, there can be multiple matches for a destination address in the routing table. Longest Prefix Match: search for the routing table entry that has the longest match with the prefix of the destination IP address (the most specific route): 1. Search for a match on all 32 bits. 2. Search for a match on 31 bits. ... 32. Search for a match on 0 bits. Needed: a data structure that supports a FAST longest prefix match lookup! Example: the longest prefix match for 128.143.71.21 is with 128.143.71.0/24, so the datagram will be sent to R4.
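The lookup rule on slide 20 can be checked with a brute-force sketch that scans every table entry and keeps the longest matching prefix. It uses Python's standard `ipaddress` module; the names `ROUTES` and `longest_prefix_match` are assumptions, and the slide's "Default" entry is written here as the prefix 0.0.0.0/0. This linear scan is the slow baseline that the trie-based schemes on the following slides aim to beat.

```python
import ipaddress

# Routing table from slide 20; "Default" is modelled as 0.0.0.0/0.
ROUTES = [
    ("10.0.0.0/8", "R1"),
    ("128.143.0.0/16", "R2"),
    ("128.143.64.0/20", "R3"),
    ("128.143.192.0/20", "R3"),
    ("128.143.71.0/24", "R4"),
    ("128.143.71.55/32", "R3"),
    ("0.0.0.0/0", "R5"),
]


def longest_prefix_match(address, routes=ROUTES):
    """Return (network, next_hop) for the longest matching prefix, or None."""
    addr = ipaddress.ip_address(address)
    best = None
    for prefix, hop in routes:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, hop)
    return best
```

For the slide's example, `longest_prefix_match("128.143.71.21")` matches the /16, /20 and /24 entries as well as the default, and returns the /24 entry with next hop R4.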
  • 21. IP ADDRESS LOOKUP ALGORITHMS. The following algorithms are suitable for Longest Prefix Match routing table lookups:
      • Tries
      • Path-compressed tries
      • Disjoint-prefix binary tries
      • Multibit tries
      • Binary search on prefix
      • Prefix range search
  • 22. SLIGHTLY DIFFERENT VERSION OF TRIE. A trie is a tree-based data structure for storing strings:
      • There is one node for every common prefix
      • The strings are stored in extra leaf nodes
      • Prefixes are not only stored at leaf nodes but also at internal nodes
    [Figure: two tries for the strings ten, tea, top, pot, one with full prefixes (t, p, te, to, po) stored at the nodes and one with single-character edge labels]
  • 23. BINARY TRIE.
    Structure:
      • Each leaf contains a possible address
      • Prefixes in the table are marked (dark)
    Search:
      • Traverse the tree according to the destination address
      • The most recent marked node is the current longest prefix
      • Search ends when a leaf node is reached
  • 24. BINARY TRIE. Update:
      • Search for the new entry
      • Search ends when a leaf node is reached
      • If there is no branch to take, insert new node(s)
    [Figure: inserting the new prefix z (1010*) into the binary trie]
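The search and update rules of slides 23 and 24 can be sketched with a trie of nested dicts. This is an illustration under assumptions: the class and method names are invented, addresses and prefixes are given as strings of "0" and "1", and the sample prefixes a (0*), b (01000*), d (1*) and z (1010*) are the ones named on the neighbouring slides. A node counts as "marked" when it carries a `"hop"` entry.

```python
class BinaryTrie:
    def __init__(self):
        self.root = {}  # keys "0"/"1" lead to children; "hop" marks a prefix

    def insert(self, prefix_bits, next_hop):
        """Walk the prefix bit by bit, adding nodes where no branch exists."""
        node = self.root
        for bit in prefix_bits:
            node = node.setdefault(bit, {})
        node["hop"] = next_hop

    def lookup(self, address_bits):
        """Return the hop of the most recently visited marked node."""
        node = self.root
        best = node.get("hop")
        for bit in address_bits:
            node = node.get(bit)
            if node is None:  # no branch to take: stop searching
                break
            if "hop" in node:
                best = node["hop"]
        return best


table = BinaryTrie()
for bits, name in [("0", "a"), ("01000", "b"), ("1", "d"), ("1010", "z")]:
    table.insert(bits, name)
```

Here `table.lookup("010110")` passes the marked node for a, runs out of branches before reaching b, and so returns a, the same result as the worked search on slide 26.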
  • 25. COMPRESSED BINARY TRIE.
      • Goal: eliminate long sequences of 1-child nodes
      • Path compression collapses 1-child branches
    Path compression:
      • Requires storing additional information at the nodes: a bit-number field is added to each node
      • The bit string of each prefix must be explicitly stored at its node
      • A comparison against the stored prefix is needed when searching the tree
    [Figure: path-compressed binary trie]
  • 26. COMPRESSED BINARY TRIE. Search for "010110":
      • Root node: inspect the 1st bit and move left
      • "a" node: check against the prefix of a ("0*") and find a match; inspect the 3rd bit and move left
      • "b" node: check against the prefix of b ("01000*") and determine that there is no match
      • Search stops. The longest prefix match is with a
  • 27. DISJOINT-PREFIX BINARY TRIE.
      • Problem: multiple matches under the longest prefix rule require backtracking during the search
      • Goal: transform the tree so as to avoid multiple matches
    Disjoint prefix:
      • Nodes are split so that there is only one match for each prefix ("leaf pushing")
      • Consequence: internal nodes do not match with prefixes
    Results:
      • a (0*) is split into: a1 (00*), a3 (010*), a2 (01001*)
      • d (1*) is represented as d1 (101*)
  • 28. VARIABLE-STRIDE MULTIBIT TRIE.
      • Goal: accelerate lookup by inspecting more than one bit at a time
      • "Stride": number of bits inspected at one time
      • With a k-bit stride, a node has up to 2^k child nodes
    2-bit stride:
      • The 1-bit prefix for a (0*) is split into 00* and 01*
      • The 1-bit prefix for d (1*) is split into 10* and 11*
      • The 3-bit prefix for c has been expanded to two nodes
      • Why are the prefixes for b and e not expanded?
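The prefix splitting described on slide 28 can be sketched as a small helper that pads a prefix out to the next multiple of the stride. This assumes a fixed stride for simplicity (the slide's trie uses variable strides), and the name `expand_prefix` is hypothetical. It does not resolve collisions between an expanded prefix and an existing longer one, where the original, more specific entry should win.

```python
from itertools import product


def expand_prefix(prefix_bits, stride):
    """Expand a bit-string prefix so its length is a multiple of stride.

    Every combination of the missing bits is appended, so a prefix
    that is `pad` bits short becomes 2**pad longer prefixes.
    """
    pad = (-len(prefix_bits)) % stride
    return [prefix_bits + "".join(tail) for tail in product("01", repeat=pad)]
```

With a 2-bit stride, `expand_prefix("0", 2)` gives `["00", "01"]` and `expand_prefix("1", 2)` gives `["10", "11"]`, matching the splits of a and d on the slide; any 3-bit prefix becomes two 4-bit prefixes, and a prefix whose length is already a multiple of the stride is returned unchanged.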
  • 29. COMPLEXITY OF THE LOOKUP.
      Scheme                    Lookup    Update          Memory
      Binary trie               O(W)      O(W)            O(NW)
      Path-compressed trie      O(W)      O(W)            O(NW)
      k-stride multibit trie    O(W/k)    O(W/k + 2^k)    O(2^k NW/k)
    Bounds are expressed for:
      • Look-up time: what is the longest lookup time?
      • Update time: how long does it take to change an entry?
      • Memory: how much memory is required to store the data structure?
    W: length of the address (32 bits). N: number of prefixes in the routing table.
  • 30. CONCLUSIONS: TRIES.
      • Excellent data structure for managing strings
      • Supports prefix and suffix lookups
      • Extremely fast: after the trie has been built, the search time is O(m), where m is the size of the pattern
      • Can be used to build indexes
      • Various applications in areas that use strings (literature/dictionary/content, as well as networks and bioinformatics)