 pig.sh
120816
   Abstract
   Construction
   Implementation
   Reference
   Aliases: position tree, PAT tree
   Key contributors
    o Weiner (1973)    first introduced the structure
    o McCreight (1976) simplified the construction
    o Ukkonen (1995)   online, linear-time construction algorithm
    o Farach (1997)    construction algorithm that is optimal for all alphabets
   A suffix tree is a compressed trie.
   Let S be a string of length N.
   The suffix tree of S satisfies:
    o the paths from the root to the leaves are in one-to-one
        correspondence with the suffixes of S.
    o every edge spells a non-empty string.
    o every internal node (except perhaps the root) has at least two
        children.
    -- Reference: Wikipedia, "Suffix tree"
   String S = {peeper$}; Suffix(S,0) = {peeper$}
    [Figure: trie with the single path ROOT -> p -> e -> e -> p -> e -> r -> $, ending in the leaf "peeper".]
   String S = {peeper$}; Suffix(S,1) = {eeper$}
    [Figure: the trie gains a second path from ROOT, e -> e -> p -> e -> r -> $, ending in the leaf "eeper".]
   String S = {peeper$}; Suffix(S,2) = {eper$}
    [Figure: below the root's e node, a new branch p -> e -> r -> $ ends in the leaf "eper".]
   String S = {peeper$}; Suffix(S,3) = {per$}
    [Figure: below the path p -> e, a new branch r -> $ ends in the leaf "per".]
   String S = {peeper$}; Suffix(S,4) = {er$}
    [Figure: below the root's e node, a new branch r -> $ ends in the leaf "er".]
   String S = {peeper$}; Suffix(S,5) = {r$}
    [Figure: a new branch r -> $ directly below ROOT ends in the leaf "r".]
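The insertion steps on the preceding slides can be sketched in C as follows. This is a minimal sketch, not the deck's implementation: the names trie_node, trie_new, trie_insert, build_suffix_trie, and trie_free are hypothetical, each node holds a 256-slot child array purely for simplicity, and the loop also inserts the final suffix "$", which the slides do not draw.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical trie node: one child slot per possible byte value. */
struct trie_node {
    struct trie_node *next[256];
};

struct trie_node *trie_new(void)
{
    struct trie_node *n = calloc(1, sizeof *n);   /* all slots start as NULL */
    assert(n != NULL);
    return n;
}

/* Walk from the root along `suffix`, creating the nodes that are missing.
   Returns the number of nodes created by this insertion. */
size_t trie_insert(struct trie_node *root, const char *suffix)
{
    size_t created = 0;
    struct trie_node *cur = root;

    for (; *suffix != '\0'; suffix++) {
        unsigned char c = (unsigned char)*suffix;
        if (cur->next[c] == NULL) {
            cur->next[c] = trie_new();
            created++;
        }
        cur = cur->next[c];
    }
    return created;
}

/* Insert Suffix(S,0), Suffix(S,1), ... in that order.
   Returns the total number of nodes, including the root. */
size_t build_suffix_trie(struct trie_node *root, const char *s)
{
    size_t nodes = 1;                              /* the root itself */

    for (size_t i = 0; s[i] != '\0'; i++)
        nodes += trie_insert(root, s + i);
    return nodes;
}

/* Release a trie built with the functions above. */
void trie_free(struct trie_node *n)
{
    if (n == NULL)
        return;
    for (int c = 0; c < 256; c++)
        trie_free(n->next[c]);
    free(n);
}
```

Under these assumptions, build_suffix_trie(trie_new(), "peeper$") reports 25 nodes: the root plus the 7 + 6 + 4 + 2 + 2 + 2 + 1 nodes created by the seven suffixes.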
   However, this isn’t a suffix tree. It’s a suffix trie.
    [Figure: the same suffix trie as on the previous slide, with one character per edge.]
   A suffix trie can be compressed into a suffix tree by merging every chain of single-child nodes into one edge.
    [Figure: the suffix trie again, before compression.]
   The suffix tree of {peeper$} is complete.
    ROOT
      pe
        eper$    (leaf: peeper)
        r$       (leaf: per)
      e
        eper$    (leaf: eeper)
        per$     (leaf: eper)
        r$       (leaf: er)
      r$         (leaf: r)
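One way to see where the compressed edges come from is to skip the trie and group the suffixes directly: all suffixes that continue with the same character share one edge, and its label is their longest common prefix. The C sketch below does this naively. The names emit_edges, suffix_tree_edges, and MAX_SUFFIXES are hypothetical, the string is assumed to end in a unique terminator such as '$', and the running time is far from the linear-time algorithms of McCreight and Ukkonen.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_SUFFIXES 64   /* hypothetical limit, enough for short demo strings */

/* Print the compacted subtree for the suffixes starting at suf[0..n-1],
   all of which agree on their first `depth` characters.
   Prints one edge label per line and returns the number of edges. */
int emit_edges(const char *s, const int *suf, int n, int depth, int indent)
{
    int edges = 0;
    char done[256] = {0};          /* first characters already handled */

    for (int i = 0; i < n; i++) {
        unsigned char c = (unsigned char)s[suf[i] + depth];
        if (done[c])
            continue;
        done[c] = 1;

        /* Group every suffix that continues with character c. */
        int group[MAX_SUFFIXES];
        int g = 0;
        for (int j = i; j < n; j++)
            if ((unsigned char)s[suf[j] + depth] == c)
                group[g++] = suf[j];

        /* Label length: the whole remainder for a single suffix (a leaf),
           otherwise the longest common prefix of the group. */
        int len;
        if (g == 1) {
            len = (int)strlen(s + group[0] + depth);
        } else {
            len = 1;
            for (;;) {
                char ch = s[group[0] + depth + len];
                int same = 1;
                for (int j = 1; j < g; j++)
                    if (s[group[j] + depth + len] != ch) {
                        same = 0;
                        break;
                    }
                if (!same)
                    break;
                len++;
            }
        }

        printf("%*s%.*s\n", indent, "", len, s + group[0] + depth);
        edges++;
        if (g > 1)                 /* internal node: recurse below the edge */
            edges += emit_edges(s, group, g, depth + len, indent + 2);
    }
    return edges;
}

/* Print the suffix tree edges of s and return how many there are. */
int suffix_tree_edges(const char *s)
{
    int suf[MAX_SUFFIXES];
    int n = (int)strlen(s);

    assert(n <= MAX_SUFFIXES);
    for (int i = 0; i < n; i++)
        suf[i] = i;
    return emit_edges(s, suf, n, 0, 0);
}
```

For "peeper$" this prints the edges pe (with eper$ and r$ below it), e (with eper$, per$ and r$ below it), r$, and $, and returns 9. The last edge is the leaf for the suffix "$", which the slide does not draw.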
   There are many ways to store the children of a suffix tree node.
    o Sibling lists / unsorted arrays
    o Hash maps
    o Balanced search tree
    o Sorted array
    o Hash maps + sibling lists
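   Cost per node (σ = alphabet size), as listed in the Wikipedia article:

                                      Lookup      Insertion   Traversal
    Sibling lists / unsorted arrays   O(σ)        Θ(1)        Θ(1)
    Hash maps                         Θ(1)        Θ(1)        O(σ)
    Balanced search tree              O(log σ)    O(log σ)    O(1)
    Sorted arrays                     O(log σ)    O(σ)        O(1)
    Hash maps + sibling lists         O(1)        O(1)        O(1)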
   How to implement the suffix tree/trie – child && sibling
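    [Figure: the suffix trie of {peeper$} redrawn with child and sibling links. Each node shows a number instead of a character; the layout suggests p = -85, e = 0, r = 72.]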
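   struct node {
      struct node *child, *sibling;  /* first child, next sibling */
      int c_num, s_num;
      int slope;
      int node_type;                 /* root / internal node / leaf / terminal */
      char *obslist_file;            /* file in external memory for seldom-queried data */
   };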
   node_type indicates what kind of node this is
    (root / internal node / leaf / terminal).
   obslist_file names a file in external memory.
    Data that is seldom queried is recorded in this file.
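The child and sibling links can be exercised with a small C sketch. It is a simplified, hypothetical node rather than the deck's struct: only the two links are kept, an integer key stands in for whatever identifies an edge, and the names cs_node, cs_new, cs_find_child, cs_get_or_add_child, and cs_child_count are made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified, hypothetical node: the two links plus an integer key. */
struct cs_node {
    struct cs_node *child;     /* first child */
    struct cs_node *sibling;   /* next child of the same parent */
    int key;
};

struct cs_node *cs_new(int key)
{
    struct cs_node *n = calloc(1, sizeof *n);   /* both links start as NULL */
    assert(n != NULL);
    n->key = key;
    return n;
}

/* Linear scan of the parent's sibling list; NULL if the key is absent. */
struct cs_node *cs_find_child(const struct cs_node *parent, int key)
{
    for (struct cs_node *c = parent->child; c != NULL; c = c->sibling)
        if (c->key == key)
            return c;
    return NULL;
}

/* Return the child with this key, appending a new node to the end of
   the sibling list when no such child exists yet. */
struct cs_node *cs_get_or_add_child(struct cs_node *parent, int key)
{
    struct cs_node **link = &parent->child;

    while (*link != NULL) {
        if ((*link)->key == key)
            return *link;
        link = &(*link)->sibling;
    }
    *link = cs_new(key);
    return *link;
}

/* Number of children, i.e. the length of the sibling list. */
int cs_child_count(const struct cs_node *parent)
{
    int count = 0;

    for (const struct cs_node *c = parent->child; c != NULL; c = c->sibling)
        count++;
    return count;
}
```

Adding the keys -85, 0, and 72 below a root gives a chain root->child, then ->sibling, then ->sibling. Each node needs only two pointers regardless of alphabet size, at the price of a linear scan per lookup.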
   What can be done if the trie is too big?
    o If the trie is built with child-sibling (C-S) links, every subtree is a binary tree.
    o Record its in-order sequence together with its pre-order or post-order sequence.
    o Rebuild the subtree from the two sequences when it needs to be queried.
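Rebuilding a binary tree from two recorded sequences can be sketched in C as follows, here for the pre-order plus in-order pair. The names bt_node, bt_rebuild, and bt_postorder are hypothetical, and the sketch assumes every node carries a unique integer id, which the slides do not state.

```c
#include <assert.h>
#include <stdlib.h>

struct bt_node {
    int id;                    /* assumed unique per node */
    struct bt_node *left;      /* plays the role of the child link */
    struct bt_node *right;     /* plays the role of the sibling link */
};

/* Rebuild the subtree whose pre-order sequence is pre[0..n-1] and whose
   in-order sequence is in[0..n-1]. */
struct bt_node *bt_rebuild(const int *pre, const int *in, int n)
{
    if (n <= 0)
        return NULL;

    struct bt_node *root = malloc(sizeof *root);
    assert(root != NULL);
    root->id = pre[0];         /* pre-order visits the root first */

    int k = 0;                 /* position of the root in the in-order sequence */
    while (k < n && in[k] != pre[0])
        k++;
    assert(k < n);             /* both sequences must describe the same tree */

    /* k nodes lie in the left subtree, n - k - 1 in the right one. */
    root->left  = bt_rebuild(pre + 1, in, k);
    root->right = bt_rebuild(pre + 1 + k, in + k + 1, n - k - 1);
    return root;
}

/* Write the post-order sequence into out; returns the count written. */
int bt_postorder(const struct bt_node *t, int *out)
{
    if (t == NULL)
        return 0;

    int n = bt_postorder(t->left, out);
    n += bt_postorder(t->right, out + n);
    out[n] = t->id;
    return n + 1;
}
```

With pre-order 1 2 4 5 3 and in-order 4 2 5 1 3, the rebuilt tree has root 1, left child 2 (with children 4 and 5), and right child 3, so its post-order sequence is 4 5 2 3 1. The linear search for the root makes this quadratic in the worst case; a lookup table from id to in-order position would avoid that.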
   Wikipedia – suffix tree
    http://en.wikipedia.org/wiki/Suffix_tree
   Data Structures, Algorithms, & Applications in Java Suffix Trees
    Copyright 1999 Sartaj Sahni
    http://www.cise.ufl.edu/~sahni/dsaaj/enrich/c16/suffix.htm#tree
   Websites for suffix tree/trie
     o   http://blog.csdn.net/ljsspace/article/details/6581850
     o   http://www.allisons.org/ll/AlgDS/Tree/Suffix/
     o   http://blog.csdn.net/TsengYuen/article/details/4815921
     o   http://www.cppblog.com/yuyang7/archive/2009/03/29/78252.html

Introduction of suffix tree
