24th Annual Symposium on Combinatorial Pattern Matching,
Bad Herrenalb, 2013/6/19

A Succinct Grammar Compression
Yasuo Tabei¹,
Yoshimasa Takabatake²,
Hiroshi Sakamoto²

1. Japan Science and Technology Agency
2. Kyusyu Institute of Technology
Straight Line Program (SLP)
• Canonical form of a CFG deriving a single string
• Every production rule satisfies
– Right hand side of a production rule is a digram
– Subscript of the left symbol is larger than the subscripts of
the right symbols, i.e., Xk ➝ XiXj (k>i,j)

Example:

X1 → ab
X2 → aX1
X3 → bX1
X4 → X3b
X5 → X2X4

[Figure: parse tree of X5 for the example SLP, deriving the string aabbabb]
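The example rules can be checked with a short Python sketch. The representation is an assumption made for illustration: terminals are one-character strings, the integer k stands for Xk, and the helper names `is_slp` and `expand` are hypothetical.

```python
# Example SLP from the slide: k -> (left symbol, right symbol) for Xk -> left right.
# Assumption: terminals are strings, variables are integers (k stands for Xk).
RULES = {
    1: ("a", "b"),
    2: ("a", 1),
    3: ("b", 1),
    4: (3, "b"),
    5: (2, 4),
}

def is_slp(rules):
    """Check the index condition k > i, j for every rule Xk -> Xi Xj."""
    return all(
        not isinstance(sym, int) or sym < k
        for k, pair in rules.items()
        for sym in pair
    )

def expand(rules, symbol):
    """Return the string derived from a terminal or a variable."""
    if isinstance(symbol, str):
        return symbol
    left, right = rules[symbol]
    return expand(rules, left) + expand(rules, right)

print(is_slp(RULES), expand(RULES, 5))
```

Because every right-hand symbol has a smaller index than its left-hand variable, the recursion in `expand` always terminates.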
Grammar compression
• Builds up an SLP from a given string
• Two crucial data structures to access a production rule
Xk ➝ XiXj
1. Dictionary (Array) : Given Xk, return XiXj
2. Reverse dictionary (Hash table) : Given XiXj, return Xk if
Xk ➝ XiXj is registered in the dictionary
X1 → ab
X2 → aX1
X3 → bX1
X4 → X3b
X5 → X2X4

Access Xk ➝ A[2k-2]A[2k-1]

Space : 2nlogn bits (n : #variables)
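A minimal sketch of the two data structures for the example rules, assuming a flat Python list for the dictionary and a built-in dict as the hash table. The names `dictionary_access` and `build_reverse_dictionary` are illustrative, not taken from the slides.

```python
# Flat array A for the example rules: Xk -> A[2k-2] A[2k-1] (0-based array).
# Assumption: terminals are strings, the integer k stands for Xk.
A = ["a", "b",   # X1 -> ab
     "a", 1,     # X2 -> aX1
     "b", 1,     # X3 -> bX1
     3, "b",     # X4 -> X3b
     2, 4]       # X5 -> X2X4

def dictionary_access(A, k):
    """Dictionary: given Xk, return its right-hand side (Xi, Xj)."""
    return A[2 * k - 2], A[2 * k - 1]

def build_reverse_dictionary(A):
    """Reverse dictionary: map each registered digram (Xi, Xj) to Xk."""
    n = len(A) // 2
    return {dictionary_access(A, k): k for k in range(1, n + 1)}

reverse = build_reverse_dictionary(A)
print(dictionary_access(A, 5))   # right-hand side of X5
print(reverse.get((3, "b")))     # variable registered for X3 b
print(reverse.get(("b", "b")))   # None: digram is not registered
```

During compression, the reverse lookup is what lets a digram that already has a rule reuse the existing variable instead of creating a new one.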
Three open problems about an optimal
encoding of an SLP
1. A nontrivial information theoretic lower bound
for encoding an SLP
2. Optimal encoding of an SLP
– Standard array : 2nlogn bits (n:#variables)
– Present an encoding asymptotically equivalent to the
lower bound, while supporting fast random access

3. Space-efficient data structure for reverse
dictionary
– Hash table uses O(nlog(n)) bits
– Present a data structure of 2nlogn(1+o(1)) bits
An information theoretic lower bound for
representing an SLP
An information theoretic lower bound for
representing an SLP of n variables :
logn! + 2n + o(n) bits
• Use two techniques for the proof
1. Spanning tree decomposition for representing an
SLP as two ordered trees
2. Right most expansion for completely enumerating
ordered trees

• First introduce these two techniques, and then
show a sketch of the proof
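To get a feel for the gap between the stated lower bound and the standard array, the leading terms can be evaluated numerically. This is only a rough sketch: the o(n) term is ignored, logarithms are taken base 2, and the function names are hypothetical.

```python
import math

def lower_bound_bits(n):
    """Leading terms log2(n!) + 2n of the lower bound (o(n) term omitted)."""
    # lgamma(n + 1) = ln(n!), converted to base 2.
    return math.lgamma(n + 1) / math.log(2) + 2 * n

def standard_array_bits(n):
    """2 n log2(n) bits of the standard array encoding."""
    return 2 * n * math.log2(n)

for n in (2 ** 10, 2 ** 20):
    print(n, round(lower_bound_bits(n)), round(standard_array_bits(n)))
```

Since log n! = n log n - O(n), the lower bound is close to n log n for large n, that is, roughly half of the 2n log n bits used by the array.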
Spanning tree decomposition [SPIRE11]
• Any SLP can be represented as left and right
spanning trees.

[Figure: the parse tree of the example SLP, its DAG representation, and the left and right spanning trees. In the DAG, the sink s has Indegree(s) = 2σ]
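The decomposition can be sketched on the example SLP as follows. The sketch assumes that every terminal has both a left and a right edge to the sink s (which gives the in-degree 2σ), and it stores each tree as a parent map; `decompose` and `reaches_sink` are hypothetical helper names.

```python
# Example SLP; assumption: terminals are strings, the integer k stands for Xk.
RULES = {1: ("a", "b"), 2: ("a", 1), 3: ("b", 1), 4: (3, "b"), 5: (2, 4)}
TERMINALS = ["a", "b"]
SINK = "s"

def decompose(rules, terminals, sink=SINK):
    """Split the DAG edges into the left tree and the right tree (parent maps)."""
    left_parent, right_parent = {}, {}
    for t in terminals:              # each terminal points to the sink twice
        left_parent[t] = sink
        right_parent[t] = sink
    for k, (left, right) in rules.items():
        left_parent[k] = left        # left edge of Xk
        right_parent[k] = right      # right edge of Xk
    return left_parent, right_parent

def reaches_sink(parent, node, sink=SINK):
    """Follow parent pointers; in a spanning tree every node reaches the sink."""
    steps = 0
    while node != sink:
        node = parent[node]
        steps += 1
        if steps > len(parent):      # guard against cycles
            return False
    return True

left_tree, right_tree = decompose(RULES, TERMINALS)
print(all(reaches_sink(left_tree, v) for v in left_tree))
print(all(reaches_sink(right_tree, v) for v in right_tree))
```

Each node other than the sink has exactly one outgoing left edge and one outgoing right edge, so each edge set forms a tree rooted at s that spans all nodes.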
Right most expansion [KDD02,SDM02]
• Build trees of (m+1) nodes from a tree of m
nodes
– Add a new node as a child of a node on the right most path
[Figure: example of a right most expansion, with the right most path marked]
Search space : All trees can be enumerated by
applying the right most expansion, recursively

[Figure: enumeration tree of the search space, levels 1 to 4]
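The enumeration can be sketched in a few lines of Python. As an assumption for illustration, an ordered tree is represented by the tuple of node depths in preorder, so that attaching a child to the right most path node at depth d-1 appends the depth d.

```python
def rightmost_expansions(depths):
    """All trees obtained by adding one node below a node on the right most path.

    depths: tuple of node depths in preorder (root has depth 0).
    The right most path runs from the root to the last node in preorder,
    so the new node may get any depth from 1 to depths[-1] + 1.
    """
    last = depths[-1]
    return [depths + (d,) for d in range(1, last + 2)]

def enumerate_trees(m):
    """Enumerate all ordered trees of m nodes, level by level."""
    level = [(0,)]                       # the single-node tree
    for _ in range(m - 1):
        level = [t for tree in level for t in rightmost_expansions(tree)]
    return level

for m in range(1, 6):
    print(m, len(enumerate_trees(m)))
```

Every ordered tree is generated exactly once, so the level sizes follow the Catalan numbers 1, 1, 2, 5, 14.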
Sketch of the proof (detail)
• Basic idea: Consider a superset S(n) of DAG(n) without
the restriction of the in-degree 2σ of the sink, and count
|S(n)| by induction
– Get
• Decompose S(n) into the left and right trees by the
spanning tree decomposition
• Count the number of left trees and the right trees of
(n+1) nodes by induction
– Apply the right most expansion to the left tree
–
• Get the information-theoretic minimum bits for
representing G∈DAG(n) : logn!+2n+o(n) bits
An optimal encoding of an SLP
• Basic idea : Encode the left and right symbols of the right
hand side of the production rules, respectively
• Rename the variables by traversing the left tree in the
breadth first manner
[Figure: left and right trees of the example SLP before and after renaming the variables]
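The renaming step can be sketched as a breadth-first traversal of the left tree, starting from the terminals. The sibling order is an assumption: children are visited in decreasing order of their old index, chosen only because it yields the left symbols 0 0 0 1 3 used in the gap encoding example. The function names are hypothetical.

```python
from collections import deque

# Example SLP; assumption: terminals are strings, the integer k stands for Xk.
RULES = {1: ("a", "b"), 2: ("a", 1), 3: ("b", 1), 4: (3, "b"), 5: (2, 4)}
TERMINALS = ["a", "b"]

def bfs_rename(rules, terminals):
    """Rename variables in breadth-first order of the left tree."""
    # children[v] = variables whose left symbol is v.
    children = {}
    for k in sorted(rules, reverse=True):   # ASSUMPTION: larger old index first
        children.setdefault(rules[k][0], []).append(k)
    new_name, queue = {}, deque(terminals)
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            new_name[child] = len(new_name) + 1
            queue.append(child)

    def rename(symbol):
        return new_name.get(symbol, symbol)  # terminals keep their name

    return {new_name[k]: (rename(l), rename(r)) for k, (l, r) in rules.items()}

def left_symbols(rules):
    """Left symbols in variable order, with terminals written as 0."""
    return [l if isinstance(l, int) else 0 for _, (l, _) in sorted(rules.items())]

renamed = bfs_rename(RULES, TERMINALS)
print(sorted(renamed.items()))
print(left_symbols(renamed))
```

After the renaming, variables that share a left symbol receive consecutive indices, which is why the sequence of left symbols becomes monotonically increasing.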
Encoding the left symbols
• Left symbols are monotonically increasing
• Apply gap encoding to the left symbols
• Use rank/select dictionary for O(1)-time access
Renamed production rules:
X1 → aX2
X2 → ab
X3 → bX2
X4 → X1X5
X5 → X3b

Left symbols (terminals written as 0) : 0 0 0 1 3
Gap encoding : 0^0 1 0^0 1 0^0 1 0^(1-0) 1 0^(3-1) 1 = 11101001
n + o(n) bits (n: #variables)
O(1) time access
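The gap encoding of the example can be reproduced with the sketch below. It writes each gap in unary (that many zeros followed by a one) and recovers a value as the number of zeros before the corresponding one. The linear-scan `select1` is only a stand-in for a constant-time rank/select dictionary, and all function names are illustrative.

```python
def gap_encode(values):
    """Unary gap code of a non-decreasing sequence: (v - previous) zeros, then a one."""
    bits, prev = [], 0
    for v in values:
        bits.extend([0] * (v - prev))
        bits.append(1)
        prev = v
    return bits

def select1(bits, i):
    """0-based position of the i-th one (i >= 1). Linear scan, for illustration only."""
    seen = 0
    for pos, b in enumerate(bits):
        seen += b
        if b and seen == i:
            return pos
    raise IndexError("fewer than i ones")

def access_left(bits, i):
    """Left symbol of Xi: the number of zeros before the i-th one."""
    return select1(bits, i) - (i - 1)

bits = gap_encode([0, 0, 0, 1, 3])
print("".join(map(str, bits)))                      # 11101001
print([access_left(bits, i) for i in range(1, 6)])  # [0, 0, 0, 1, 3]
```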
Encoding the right symbols D (detail)
• Extract subarrays si of monotonically increasing and
decreasing elements from D
• Use two integer arrays and two bit arrays B, b
• [bound missing] bits, and [bound missing] access time
Space-efficient data structure for reverse
dictionary
• Recap : Reverse dictionary D⁻¹

• Basic idea: Build a wavelet tree (WT) consisting
of right symbols XiXj, and simulate reverse
dictionary on the WT.
• Access and update time: O(logn), Space:
2nlogn(1+o(1)) bits
Build WT from digrams : The range of a digram is split
into the higher half (right) and the lower half (left)
Accessing Xk➝XiXj
Example: access X3X2

• Start from the root B1 with
i = n (#variables)
• Apply rank1(Bj,i) for the right
child and rank0(Bj,i) for the
left child
• After reaching a leaf, go up to
the root by applying select
operation
• Solution: Xk = select0/1(B1,i)
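The rank-down and select-up pattern can be illustrated with a simplified variant. Instead of the digram wavelet tree of the slides, this sketch builds a wavelet tree over the array of right symbols only and uses the sorted left symbols to narrow the search range. The integer ids (a=0, b=1, Xk=k+1), the class, and the linear-scan select are assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right

class WaveletTree:
    """Wavelet tree over integer symbols; rank via prefix sums, select via scan."""
    def __init__(self, seq, lo=None, hi=None):
        if lo is None:
            lo, hi = min(seq), max(seq)
        self.lo, self.hi, self.n = lo, hi, len(seq)
        self.bits = None
        if lo == hi or not seq:
            return                                   # leaf
        mid = (lo + hi) // 2
        self.bits = [1 if x > mid else 0 for x in seq]
        self.ones = [0]                              # ones[i] = #1 in bits[:i]
        for b in self.bits:
            self.ones.append(self.ones[-1] + b)
        self.left = WaveletTree([x for x in seq if x <= mid], lo, mid)
        self.right = WaveletTree([x for x in seq if x > mid], mid + 1, hi)

    def rank(self, sym, i):
        """Occurrences of sym among the first i elements (going down the tree)."""
        if sym < self.lo or sym > self.hi:
            return 0
        if self.bits is None:
            return i
        if sym > (self.lo + self.hi) // 2:
            return self.right.rank(sym, self.ones[i])
        return self.left.rank(sym, i - self.ones[i])

    def select(self, sym, r):
        """0-based position of the r-th occurrence of sym (going back up), or None."""
        if sym < self.lo or sym > self.hi or r < 1:
            return None
        if self.bits is None:
            return r - 1 if r <= self.n else None
        bit = 1 if sym > (self.lo + self.hi) // 2 else 0
        p = (self.right if bit else self.left).select(sym, r)
        if p is None:
            return None
        count = 0
        for idx, b in enumerate(self.bits):          # (p+1)-th occurrence of bit
            if b == bit:
                if count == p:
                    return idx
                count += 1
        return None

# Renamed example rules with ids a=0, b=1, Xk=k+1 (hypothetical encoding).
LEFT = [0, 0, 1, 2, 4]     # left symbols of X1..X5, sorted
RIGHT = [3, 1, 3, 6, 1]    # right symbols of X1..X5
WT = WaveletTree(RIGHT)

def reverse_lookup(left_id, right_id):
    """Return k if some Xk has this right-hand side, else None."""
    lo, hi = bisect_left(LEFT, left_id), bisect_right(LEFT, left_id)
    before = WT.rank(right_id, lo)
    if WT.rank(right_id, hi) == before:
        return None
    return WT.select(right_id, before + 1) + 1

print(reverse_lookup(2, 6))   # X1 X5 is the right-hand side of X4
```

A practical structure would replace the Python lists with succinct bit vectors that answer rank and select without scanning.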
Conclusion
• Three open problems related to an optimal
encoding of an SLP
1. an information theoretic lower bound
2. an optimal encoding
3. a dynamic data structure for reverse dictionary

• Novel challenges : Developing succinct data
structures of an SLP for various applications
e.g., self-index, pattern mining, q-gram mining, etc.
Fully-online grammar compression (FOLCA)
Maruyama, Tabei, Sakamoto, Sadakane (submitted)
• Builds and directly encodes an SLP into a
succinct representation of the partial parse tree
from a given text in an online fashion
[Figure: partial parse tree of the text]

Text : abaababb

Succinct representation
position : 1 2 3 4 5 6 7 8 9 10
B : 0 0 1 0 1 0 1 0 1 1
L : a b a X1 X2

• Space : nlogn + 2n + o(n) bits, asymptotically
equivalent to our information theoretic lower bound
