The 15th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD2011)
25 May 2011



         LGM: Mining Frequent Subgraphs
              from Linear Graphs

                                Yasuo Tabei
                           ERATO Minato Project
                    Japan Science and Technology Agency
                               joint work with
                 Daisuke Okanohara (Preferred Infrastructure),
                           Shuichi Hirose (AIST),
                              Koji Tsuda (AIST)


Outline
• Introduction to linear graph
  ★   Linear subgraph relation
  ★   Total order among edges
• Frequent subgraph mining from a set of
  linear graphs
• Experiments
  ★   Motif extraction from protein 3D
      structures
Linear graph (Davydov et al., 2004)
 • Labeled graph whose vertices are totally
   ordered
 • Linear graph g = (V, E, L_V, L_E)
   ‣ V ⊂ N : ordered vertex set
   ‣ E ⊆ V × V : edge set
   ‣ L_V : V → Σ_V : vertex labels
   ‣ L_E : E → Σ_E : edge labels

Example:
[Figure: a linear graph with vertices 1-6 labeled A, B, A, B, C, A and four edges labeled a, b, a, c]
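The definition above could be held in a small container like the following. This is a minimal sketch, not the authors' implementation: the class name `LinearGraph`, its methods, and the convention of storing each edge as `(i, j)` with `i < j` are assumptions made for illustration. The vertex labels follow the slide's example, but the edge endpoints are hypothetical because the figure is not recoverable from the transcript.

```python
from dataclasses import dataclass, field


@dataclass
class LinearGraph:
    """g = (V, E, L_V, L_E); vertices are the positions 1..n."""
    vertex_labels: list                               # L_V: vertex k has label vertex_labels[k - 1]
    edge_labels: dict = field(default_factory=dict)   # L_E: (i, j) -> label, assuming i < j

    def add_edge(self, i, j, label):
        if not 1 <= i < j <= len(self.vertex_labels):
            raise ValueError("expected 1 <= i < j <= |V|")
        self.edge_labels[(i, j)] = label

    @property
    def vertices(self):
        return list(range(1, len(self.vertex_labels) + 1))

    @property
    def edges(self):
        # Sorted by left endpoint, then right endpoint.
        return sorted(self.edge_labels)


# Vertex labels as on the slide; edge endpoints are hypothetical.
g = LinearGraph(["A", "B", "A", "B", "C", "A"])
g.add_edge(1, 3, "a")
g.add_edge(2, 5, "b")
g.add_edge(4, 6, "a")
g.add_edge(3, 6, "c")
print(g.vertices, g.edges)
```

Identifying vertices with their positions keeps the total order on V implicit, so no separate ordering structure is needed.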
Linear subgraph relation
•   g1 is a linear subgraph of g2
      i) Conventional subgraph condition
        ★ Vertex labels are matched
        ★ All edges of g1 exist in g2 with the correct labels
       ii) The order of vertices is conserved
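Example:
[Figure: g1 ⊂ g2, where g1 has vertices 1-3 labeled A, B, A with edges labeled a, b, and g2 has vertices 1-6 labeled A, A, B, B, C, A with edges labeled b, c, a, a]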
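The two conditions can be checked by brute force as sketched below. This is only an illustration under stated assumptions, not the matching procedure used in LGM: the function name `is_linear_subgraph`, the dictionary encoding of edges, and both toy graphs are hypothetical. Order conservation is enforced by only trying strictly increasing vertex maps, which `itertools.combinations` generates directly.

```python
from itertools import combinations


def is_linear_subgraph(p_vlabels, p_edges, g_vlabels, g_edges):
    """Return True if pattern p is a linear subgraph of g.

    Vertex labels are lists (vertex k has label xs[k - 1]);
    edges are dicts mapping (i, j) with i < j to an edge label.
    """
    m, n = len(p_vlabels), len(g_vlabels)
    # Each combination is an increasing tuple, so vertex order is conserved.
    for image in combinations(range(1, n + 1), m):
        if any(p_vlabels[k] != g_vlabels[image[k] - 1] for k in range(m)):
            continue  # vertex labels do not match
        if all(g_edges.get((image[i - 1], image[j - 1])) == label
               for (i, j), label in p_edges.items()):
            return True  # every pattern edge exists with the correct label
    return False


# Hypothetical host graph.
g_vlabels = ["A", "A", "B", "B", "C", "A"]
g_edges = {(2, 3): "a", (2, 6): "b", (3, 5): "c", (4, 6): "a"}

# Pattern matched by the increasing map 1->2, 2->3, 3->6.
p1 = (["A", "B", "A"], {(1, 2): "a", (1, 3): "b"})
# Pattern that matches edge (3, 5) only if the vertex order is reversed.
p2 = (["C", "B"], {(1, 2): "c"})

print(is_linear_subgraph(*p1, g_vlabels, g_edges))
print(is_linear_subgraph(*p2, g_vlabels, g_edges))
```

The second pattern is an ordinary subgraph of the host (a C vertex joined to a B vertex by a c edge) but not a linear one, since C would have to precede B. The search tries every increasing map, so it is exponential in the worst case and is meant for small examples only.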
Subgraph but not linear subgraph
•   g1 is a subgraph of g2
    ★ vertex labels are matched
    ★ all edges in g1 also exist in g2 with
       correct labels
•   g1 is not a linear subgraph of g2
    ★   the order of vertices is not conserved
[Figure: g1 has vertices 1-3 labeled A, A, B with edges labeled b, c; g2 has vertices 1-4 labeled A, A, B, A with edges labeled b, c, a]
Total order among edges in a linear graph
• Compare the left vertices first. If they
    are identical, look at the right vertices
• ∀ e1 = (i, j), e2 = (k, l) ∈ E_g : e1 <_e e2
    if and only if (i) i < k, or (ii) i = k and j < l
Example:
[Figure: two edges e1 = (i, j) and e2 = (k, l); an example linear graph on vertices 1-4 with its edges numbered 1, 2, 3]
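The order <_e is the lexicographic order on (left vertex, right vertex) pairs, so in Python it coincides with built-in tuple comparison. A small sketch, where the function name `edge_less` and the sample edges are hypothetical:

```python
def edge_less(e1, e2):
    """e1 <_e e2: compare left vertices first, then right vertices."""
    (i, j), (k, l) = e1, e2
    return i < k or (i == k and j < l)


edges = [(2, 4), (1, 3), (1, 2)]   # hypothetical edges on vertices 1-4
ordered = sorted(edges)            # tuple comparison agrees with <_e
print(ordered)
```

Because the order is total, every non-empty edge set has a unique largest edge, which the reduction map relies on.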
Outline
• Introduction to linear graph
  ★   linear subgraph relation
  ★   Total order among edges
• Frequent subgraph mining from a set of
  linear graphs
• Experiments
  ★   Motif extraction from protein 3D
      structures
Frequent subgraph mining from linear graphs
• Enumerate all frequent subgraphs from a set of
    linear graphs
     ★ Subgraphs included in a set of linear graphs at
        least τ times (minimum support threshold)
    ★  Enumerate connected and disconnected subgraphs
       with a unified framework
     ★ Use reverse search for an efficient enumeration
       (Avis and Fukuda, 1993)
•   Polynomial delay
     ★ gSpan = exponential delay
Enumeration of all linear subgraphs of a linear graph
• Before considering a mining
  algorithm, we have to solve the
  problem of subgraph enumeration
  first
• How to enumerate all subgraphs of
  the following linear graph without
  duplication


Search lattice of all subgraphs
[Figure: search lattice of all subgraphs of the example linear graph]
Reverse search (Avis and Fukuda, 1993)
  • To enumerate all subgraphs without
    duplication, we need to define a search tree
    in the search lattice

  • Reduction map f
   ★ Mapping from a child to its parent
   ★ Remove the largest edge


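[Figure: the reduction map f removes the largest edge (edge 3) from a pattern on vertices 1-4 with edges 1, 2, 3, giving a parent on vertices 1-3 with edges 1, 2]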
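A minimal sketch of the reduction map on unlabeled edge lists follows. It assumes that a vertex left without any edge is dropped and the remaining vertices are renumbered, which is what the slide's figure suggests (a four-vertex child maps to a three-vertex parent). The function name `reduce_pattern` and the sample patterns are hypothetical, and labels are omitted for brevity.

```python
def reduce_pattern(edges):
    """Parent of a pattern: remove its largest edge under <_e.

    Assumption: vertices that lose their last edge are dropped and the
    remaining vertices are renumbered 1..m in their original order.
    """
    if not edges:
        return None  # the empty pattern is the root and has no parent
    remaining = sorted(edges)[:-1]                     # drop the largest edge
    kept = sorted({v for edge in remaining for v in edge})
    rename = {old: new for new, old in enumerate(kept, start=1)}
    return [(rename[i], rename[j]) for i, j in remaining]


child = [(1, 3), (2, 3), (2, 4)]     # hypothetical pattern on vertices 1-4
print(reduce_pattern(child))
print(reduce_pattern([(1, 4), (2, 3)]))
```

Applying the map repeatedly strips one edge at a time until the empty pattern is reached, so every pattern has exactly one path to the root.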
Search tree induced by the reduction map
• By applying the reduction map to each
  element, a search tree can be induced
[Figure: search tree induced on the search lattice]
Inverting the reduction map: f⁻¹


• When traversing the tree from the root,
  children nodes are created on demand
• In most cases, the inversion of reduction
  map takes the following two steps:
  ★   Consider all children candidates
  ★   Keep the ones that the reduction map sends back to the parent

• However, in this particular case, the
  reduction map can be inverted explicitly
  ★   Can derive the pattern extension rule
      (parent to children)
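For the simpler problem of listing the edge subsets of one fixed linear graph, the inverse of "remove the largest edge" can be written down directly: a child is obtained by adding any edge larger than every edge already chosen. The sketch below shows only that simplified case; it is not the pattern extension rule of LGM, which operates on patterns rather than on edge subsets of a single graph. The function name `enumerate_edge_subsets` and the sample graph are hypothetical.

```python
def enumerate_edge_subsets(graph_edges):
    """Yield every edge subset of a linear graph exactly once.

    Depth-first traversal of the tree in which the parent of a subset
    is the subset without its largest edge.
    """
    order = sorted(graph_edges)          # edges in the order <_e

    def visit(current, start):
        yield tuple(current)
        # Only edges larger than the current largest edge may be added,
        # so removing the largest edge of a child returns the parent.
        for idx in range(start, len(order)):
            current.append(order[idx])
            yield from visit(current, idx + 1)
            current.pop()

    yield from visit([], 0)


subsets = list(enumerate_edge_subsets([(2, 3), (1, 2), (1, 3)]))
for s in subsets:
    print(s)
```

No visited-set is needed to avoid duplicates, and depth-first traversal keeps only the current path in memory. Distinct edge subsets can still correspond to the same pattern, which is why mining across a graph database needs a rule defined on patterns.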
Pattern extension rule




Traversing search tree from root
• Depth-first traversal for its memory efficiency
[Figure: depth-first traversal of the search tree from the root]
Frequent subgraph mining
• Basic idea: find all possible extensions of a
    current pattern in the graph database, and
    extend the pattern
• Occurrence list L_G(g)
★   Record every occurrence of a pattern g in
    the graph database G
★   Calculate the support of a pattern g from the
    occurrence list
• Use the anti-monotonicity of the support for
  pruning
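The interplay of occurrence lists, support, and pruning can be sketched as a generic depth-first skeleton. Everything named here is an assumption for illustration: an occurrence is encoded as a `(graph_id, matched_vertices)` pair, support is taken to be the number of distinct graphs with at least one occurrence, and `extend` is a placeholder for the pattern extension step, replaced below by a hand-written toy table rather than a real extension rule.

```python
def support(occurrences):
    """Assumed definition: number of distinct graphs containing the pattern."""
    return len({graph_id for graph_id, _ in occurrences})


def mine(pattern, occurrences, min_support, extend, found):
    """Depth-first mining with support-based pruning."""
    if support(occurrences) < min_support:
        # Anti-monotonicity: an extension never has higher support
        # than its parent, so the whole subtree can be skipped.
        return
    found.append((pattern, support(occurrences)))
    for child, child_occurrences in extend(pattern, occurrences):
        mine(child, child_occurrences, min_support, extend, found)


# Toy stand-in for the extension step (hypothetical patterns and occurrences).
toy_children = {
    "root": [("p1", [(0, (1, 2)), (1, (2, 5))]),
             ("p2", [(0, (1, 3))])],
    "p1": [("p1a", [(1, (2, 5, 6))])],
}


def toy_extend(pattern, occurrences):
    return toy_children.get(pattern, [])


found = []
mine("root", [(0, ()), (1, ())], 2, toy_extend, found)
print(found)
```

With a threshold of 2, `p2` and `p1a` each occur in a single graph and are pruned, so only `root` and `p1` are reported.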
Outline
• Introduction to linear graph
  ★   linear subgraph relation
  ★   Total order among edges
• Frequent subgraph mining from a set of
  linear graphs
• Experiments
  ★   Motif extraction from protein 3D
      structures
Motif extraction from protein 3D structures
•   Pairs of homologous proteins from thermophilic
     and mesophilic organisms
•   Construct a linear graph from a protein
     ★ Use vertex order from N- to C- terminal
     ★ Assign vertex labels from {1,...,6}
     ★ Draw an edge between pairs of amino acid
       residues whose distance is 5Å
•   # of data: 742, avg. # of vertices: 371, avg. # of edges: 496
•   Rank the enumerated patterns by statistical
    significance (p-value)
     ★ Association to thermophilic/mesophilic labels
     ★ Fisher's exact test
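The graph construction described above might look like the sketch below. Several details are assumptions because the slide does not state them: one 3D coordinate per residue, the 5 Å criterion read as "at most 5 Å apart", edges left unlabeled, and vertex labels supplied by the caller as integers in {1,...,6}. The function name `protein_to_linear_graph` and the coordinates are hypothetical.

```python
import math


def protein_to_linear_graph(coords, labels, cutoff=5.0):
    """Build (vertex_labels, edges) from residues listed N- to C-terminal.

    coords: one (x, y, z) position per residue, in angstroms (assumed).
    labels: one label from {1, ..., 6} per residue (grouping not specified here).
    """
    n = len(coords)
    edges = [(i + 1, j + 1)                      # 1-based, i < j
             for i in range(n)
             for j in range(i + 1, n)
             if math.dist(coords[i], coords[j]) <= cutoff]
    return list(labels), edges


# Hypothetical four-residue fragment.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (3.8, 3.0, 0.0)]
labels = [1, 4, 2, 6]
vertex_labels, edges = protein_to_linear_graph(coords, labels)
print(edges)
```

Residues 1 and 3 are 7.6 Å apart and get no edge, while every other pair falls within the cutoff. The pairwise loop is quadratic in the number of residues, which is acceptable for proteins of a few hundred residues.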
Runtime comparison
• Compared to gSpan
• Made gapped linear graphs and ran gSpan on them
• LGM is faster than gSpan




• Minimum support = 10
• 103 patterns whose p-value < 0.001
• Thermophilic (TATA), Mesophilic (pol II)
  ★ Share the function as DNA-binding
    protein, but the thermostability is
    different
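A p-value of the kind used to rank patterns can be computed from a 2x2 table of pattern presence against the thermophilic/mesophilic label. The sketch below implements a two-sided Fisher's exact test with the standard library only. Whether the original analysis was one-sided or two-sided is not stated in the slides, so the two-sided choice, the function name `fisher_exact_two_sided`, and the example counts are assumptions.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: pattern present / absent. Columns: thermophilic / mesophilic.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at most as likely as the observed one.
    return sum(p for p in (prob(x) for x in range(lowest, highest + 1))
               if p <= p_observed * (1 + 1e-9))


# Hypothetical counts: the pattern occurs in 3 thermophilic and 0 mesophilic proteins.
print(fisher_exact_two_sided(3, 0, 0, 3))
```

The small relative tolerance guards against floating-point ties when comparing table probabilities.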
Mapping motifs in 3D structure

• Thermophilic (TATA), Mesophilic (pol II)




Summary

• Efficient subgraph mining algorithm from
  linear graphs
• Search tree is defined by reverse search
  principle
• Patterns include disconnected subgraphs
• Enumeration runs with polynomial delay
• Interesting patterns from proteins

More Related Content

What's hot

20110319 parameterized algorithms_fomin_lecture03-04
20110319 parameterized algorithms_fomin_lecture03-0420110319 parameterized algorithms_fomin_lecture03-04
20110319 parameterized algorithms_fomin_lecture03-04Computer Science Club
 
B.Sc.IT: Semester - VI (December - 2017) [IDOL - Revised Course | Question Pa...
B.Sc.IT: Semester - VI (December - 2017) [IDOL - Revised Course | Question Pa...B.Sc.IT: Semester - VI (December - 2017) [IDOL - Revised Course | Question Pa...
B.Sc.IT: Semester - VI (December - 2017) [IDOL - Revised Course | Question Pa...Mumbai B.Sc.IT Study
 
Digital Signals and System (April – 2015) [Revised Syllabus | Question Paper]
Digital Signals and System (April  – 2015) [Revised Syllabus | Question Paper]Digital Signals and System (April  – 2015) [Revised Syllabus | Question Paper]
Digital Signals and System (April – 2015) [Revised Syllabus | Question Paper]Mumbai B.Sc.IT Study
 
Dfs presentation
Dfs presentationDfs presentation
Dfs presentationAlizay Khan
 
Functions
FunctionsFunctions
FunctionsGaditek
 
Jarrar: Informed Search
Jarrar: Informed Search  Jarrar: Informed Search
Jarrar: Informed Search Mustafa Jarrar
 
B.Sc.IT: Semester - VI (October - 2016) [IDOL - Revised Course | Question Paper]
B.Sc.IT: Semester - VI (October - 2016) [IDOL - Revised Course | Question Paper]B.Sc.IT: Semester - VI (October - 2016) [IDOL - Revised Course | Question Paper]
B.Sc.IT: Semester - VI (October - 2016) [IDOL - Revised Course | Question Paper]Mumbai B.Sc.IT Study
 
Solving problems by searching Informed (heuristics) Search
Solving problems by searching Informed (heuristics) SearchSolving problems by searching Informed (heuristics) Search
Solving problems by searching Informed (heuristics) Searchmatele41
 
Digital Signals and Systems (October – 2016) [Question Paper | IDOL: Revised ...
Digital Signals and Systems (October – 2016) [Question Paper | IDOL: Revised ...Digital Signals and Systems (October – 2016) [Question Paper | IDOL: Revised ...
Digital Signals and Systems (October – 2016) [Question Paper | IDOL: Revised ...Mumbai B.Sc.IT Study
 
Graphical Models In Python | Edureka
Graphical Models In Python | EdurekaGraphical Models In Python | Edureka
Graphical Models In Python | EdurekaEdureka!
 
Formal semantics for Cypher queries and updates
Formal semantics for Cypher queries and updatesFormal semantics for Cypher queries and updates
Formal semantics for Cypher queries and updatesopenCypher
 
Informed search (heuristics)
Informed search (heuristics)Informed search (heuristics)
Informed search (heuristics)Bablu Shofi
 
Lecture 12 Heuristic Searches
Lecture 12 Heuristic SearchesLecture 12 Heuristic Searches
Lecture 12 Heuristic SearchesHema Kashyap
 
Lecture 08 uninformed search techniques
Lecture 08 uninformed search techniquesLecture 08 uninformed search techniques
Lecture 08 uninformed search techniquesHema Kashyap
 
[Question Paper] Data Communication and Network Standards (Revised Course) [J...
[Question Paper] Data Communication and Network Standards (Revised Course) [J...[Question Paper] Data Communication and Network Standards (Revised Course) [J...
[Question Paper] Data Communication and Network Standards (Revised Course) [J...Mumbai B.Sc.IT Study
 
8th TUC Meeting | Lijun Chang (University of New South Wales). Efficient Subg...
8th TUC Meeting | Lijun Chang (University of New South Wales). Efficient Subg...8th TUC Meeting | Lijun Chang (University of New South Wales). Efficient Subg...
8th TUC Meeting | Lijun Chang (University of New South Wales). Efficient Subg...LDBC council
 

What's hot (20)

20110319 parameterized algorithms_fomin_lecture03-04
20110319 parameterized algorithms_fomin_lecture03-0420110319 parameterized algorithms_fomin_lecture03-04
20110319 parameterized algorithms_fomin_lecture03-04
 
B.Sc.IT: Semester - VI (December - 2017) [IDOL - Revised Course | Question Pa...
B.Sc.IT: Semester - VI (December - 2017) [IDOL - Revised Course | Question Pa...B.Sc.IT: Semester - VI (December - 2017) [IDOL - Revised Course | Question Pa...
B.Sc.IT: Semester - VI (December - 2017) [IDOL - Revised Course | Question Pa...
 
Digital Signals and System (April – 2015) [Revised Syllabus | Question Paper]
Digital Signals and System (April  – 2015) [Revised Syllabus | Question Paper]Digital Signals and System (April  – 2015) [Revised Syllabus | Question Paper]
Digital Signals and System (April – 2015) [Revised Syllabus | Question Paper]
 
Data structure
Data structureData structure
Data structure
 
Dfs presentation
Dfs presentationDfs presentation
Dfs presentation
 
Functions
FunctionsFunctions
Functions
 
Jarrar: Informed Search
Jarrar: Informed Search  Jarrar: Informed Search
Jarrar: Informed Search
 
C applications
C applicationsC applications
C applications
 
B.Sc.IT: Semester - VI (October - 2016) [IDOL - Revised Course | Question Paper]
B.Sc.IT: Semester - VI (October - 2016) [IDOL - Revised Course | Question Paper]B.Sc.IT: Semester - VI (October - 2016) [IDOL - Revised Course | Question Paper]
B.Sc.IT: Semester - VI (October - 2016) [IDOL - Revised Course | Question Paper]
 
Solving problems by searching Informed (heuristics) Search
Solving problems by searching Informed (heuristics) SearchSolving problems by searching Informed (heuristics) Search
Solving problems by searching Informed (heuristics) Search
 
Digital Signals and Systems (October – 2016) [Question Paper | IDOL: Revised ...
Digital Signals and Systems (October – 2016) [Question Paper | IDOL: Revised ...Digital Signals and Systems (October – 2016) [Question Paper | IDOL: Revised ...
Digital Signals and Systems (October – 2016) [Question Paper | IDOL: Revised ...
 
Graphical Models In Python | Edureka
Graphical Models In Python | EdurekaGraphical Models In Python | Edureka
Graphical Models In Python | Edureka
 
Formal semantics for Cypher queries and updates
Formal semantics for Cypher queries and updatesFormal semantics for Cypher queries and updates
Formal semantics for Cypher queries and updates
 
Informed search (heuristics)
Informed search (heuristics)Informed search (heuristics)
Informed search (heuristics)
 
Lecture 12 Heuristic Searches
Lecture 12 Heuristic SearchesLecture 12 Heuristic Searches
Lecture 12 Heuristic Searches
 
Lecture 08 uninformed search techniques
Lecture 08 uninformed search techniquesLecture 08 uninformed search techniques
Lecture 08 uninformed search techniques
 
[Question Paper] Data Communication and Network Standards (Revised Course) [J...
[Question Paper] Data Communication and Network Standards (Revised Course) [J...[Question Paper] Data Communication and Network Standards (Revised Course) [J...
[Question Paper] Data Communication and Network Standards (Revised Course) [J...
 
8th TUC Meeting | Lijun Chang (University of New South Wales). Efficient Subg...
8th TUC Meeting | Lijun Chang (University of New South Wales). Efficient Subg...8th TUC Meeting | Lijun Chang (University of New South Wales). Efficient Subg...
8th TUC Meeting | Lijun Chang (University of New South Wales). Efficient Subg...
 
A star algorithms
A star algorithmsA star algorithms
A star algorithms
 
AI Lesson 04
AI Lesson 04AI Lesson 04
AI Lesson 04
 

Viewers also liked

Mlab2012 tabei 20120806
Mlab2012 tabei 20120806Mlab2012 tabei 20120806
Mlab2012 tabei 20120806Yasuo Tabei
 
SPIRE2013-tabei20131009
SPIRE2013-tabei20131009SPIRE2013-tabei20131009
SPIRE2013-tabei20131009Yasuo Tabei
 
CPM2013-tabei201306
CPM2013-tabei201306CPM2013-tabei201306
CPM2013-tabei201306Yasuo Tabei
 
WABI2012-SuccinctMultibitTree
WABI2012-SuccinctMultibitTreeWABI2012-SuccinctMultibitTree
WABI2012-SuccinctMultibitTreeYasuo Tabei
 
NIPS2013読み会: Scalable kernels for graphs with continuous attributes
NIPS2013読み会: Scalable kernels for graphs with continuous attributesNIPS2013読み会: Scalable kernels for graphs with continuous attributes
NIPS2013読み会: Scalable kernels for graphs with continuous attributesYasuo Tabei
 
Scalable Partial Least Squares Regression on Grammar-Compressed Data Matrices
Scalable Partial Least Squares Regression on Grammar-Compressed Data MatricesScalable Partial Least Squares Regression on Grammar-Compressed Data Matrices
Scalable Partial Least Squares Regression on Grammar-Compressed Data MatricesYasuo Tabei
 
Gwt presen alsip-20111201
Gwt presen alsip-20111201Gwt presen alsip-20111201
Gwt presen alsip-20111201Yasuo Tabei
 
Sketch sort sugiyamalab-20101026 - public
Sketch sort sugiyamalab-20101026 - publicSketch sort sugiyamalab-20101026 - public
Sketch sort sugiyamalab-20101026 - publicYasuo Tabei
 
Sketch sort ochadai20101015-public
Sketch sort ochadai20101015-publicSketch sort ochadai20101015-public
Sketch sort ochadai20101015-publicYasuo Tabei
 
Ibisml2011 06-20
Ibisml2011 06-20Ibisml2011 06-20
Ibisml2011 06-20Yasuo Tabei
 
Kdd2015reading-tabei
Kdd2015reading-tabeiKdd2015reading-tabei
Kdd2015reading-tabeiYasuo Tabei
 
DCC2014 - Fully Online Grammar Compression in Constant Space
DCC2014 - Fully Online Grammar Compression in Constant SpaceDCC2014 - Fully Online Grammar Compression in Constant Space
DCC2014 - Fully Online Grammar Compression in Constant SpaceYasuo Tabei
 
20110501 csseminar rybalkin_substructure_search
20110501 csseminar rybalkin_substructure_search20110501 csseminar rybalkin_substructure_search
20110501 csseminar rybalkin_substructure_searchComputer Science Club
 
LEXBFS on Chordal Graphs
LEXBFS on Chordal GraphsLEXBFS on Chordal Graphs
LEXBFS on Chordal Graphsnazlitemu
 

Viewers also liked (20)

Mlab2012 tabei 20120806
Mlab2012 tabei 20120806Mlab2012 tabei 20120806
Mlab2012 tabei 20120806
 
SPIRE2013-tabei20131009
SPIRE2013-tabei20131009SPIRE2013-tabei20131009
SPIRE2013-tabei20131009
 
CPM2013-tabei201306
CPM2013-tabei201306CPM2013-tabei201306
CPM2013-tabei201306
 
WABI2012-SuccinctMultibitTree
WABI2012-SuccinctMultibitTreeWABI2012-SuccinctMultibitTree
WABI2012-SuccinctMultibitTree
 
Gwt sdm public
Gwt sdm publicGwt sdm public
Gwt sdm public
 
NIPS2013読み会: Scalable kernels for graphs with continuous attributes
NIPS2013読み会: Scalable kernels for graphs with continuous attributesNIPS2013読み会: Scalable kernels for graphs with continuous attributes
NIPS2013読み会: Scalable kernels for graphs with continuous attributes
 
Scalable Partial Least Squares Regression on Grammar-Compressed Data Matrices
Scalable Partial Least Squares Regression on Grammar-Compressed Data MatricesScalable Partial Least Squares Regression on Grammar-Compressed Data Matrices
Scalable Partial Least Squares Regression on Grammar-Compressed Data Matrices
 
Gwt presen alsip-20111201
Gwt presen alsip-20111201Gwt presen alsip-20111201
Gwt presen alsip-20111201
 
Dmss2011 public
Dmss2011 publicDmss2011 public
Dmss2011 public
 
Sketch sort sugiyamalab-20101026 - public
Sketch sort sugiyamalab-20101026 - publicSketch sort sugiyamalab-20101026 - public
Sketch sort sugiyamalab-20101026 - public
 
Sketch sort ochadai20101015-public
Sketch sort ochadai20101015-publicSketch sort ochadai20101015-public
Sketch sort ochadai20101015-public
 
Ibisml2011 06-20
Ibisml2011 06-20Ibisml2011 06-20
Ibisml2011 06-20
 
GIW2013
GIW2013GIW2013
GIW2013
 
Kdd2015reading-tabei
Kdd2015reading-tabeiKdd2015reading-tabei
Kdd2015reading-tabei
 
DCC2014 - Fully Online Grammar Compression in Constant Space
DCC2014 - Fully Online Grammar Compression in Constant SpaceDCC2014 - Fully Online Grammar Compression in Constant Space
DCC2014 - Fully Online Grammar Compression in Constant Space
 
Lp Boost
Lp BoostLp Boost
Lp Boost
 
CSMR11b.ppt
CSMR11b.pptCSMR11b.ppt
CSMR11b.ppt
 
20110501 csseminar rybalkin_substructure_search
20110501 csseminar rybalkin_substructure_search20110501 csseminar rybalkin_substructure_search
20110501 csseminar rybalkin_substructure_search
 
Jayant lrs
Jayant lrsJayant lrs
Jayant lrs
 
LEXBFS on Chordal Graphs
LEXBFS on Chordal GraphsLEXBFS on Chordal Graphs
LEXBFS on Chordal Graphs
 

Similar to Lgm pakdd2011 public

Graphs In Data Structure
Graphs In Data StructureGraphs In Data Structure
Graphs In Data StructureAnuj Modi
 
Graphs In Data Structure
Graphs In Data StructureGraphs In Data Structure
Graphs In Data StructureAnuj Modi
 
Graph data structure
Graph data structureGraph data structure
Graph data structureTech_MX
 
Skiena algorithm 2007 lecture12 topological sort connectivity
Skiena algorithm 2007 lecture12 topological sort connectivitySkiena algorithm 2007 lecture12 topological sort connectivity
Skiena algorithm 2007 lecture12 topological sort connectivityzukun
 
Attributed Graph Matching of Planar Graphs
Attributed Graph Matching of Planar GraphsAttributed Graph Matching of Planar Graphs
Attributed Graph Matching of Planar GraphsRaül Arlàndez
 
Propertiesofexponents
PropertiesofexponentsPropertiesofexponents
Propertiesofexponentssgrandstaff
 
Double Patterning (4/2 update)
Double Patterning (4/2 update)Double Patterning (4/2 update)
Double Patterning (4/2 update)Danny Luk
 
6.2 Notes
6.2 Notes6.2 Notes
6.2 Notesmbetzel
 
Object Recognition with Deformable Models
Object Recognition with Deformable ModelsObject Recognition with Deformable Models
Object Recognition with Deformable Modelszukun
 
MinFill_Presentation
MinFill_PresentationMinFill_Presentation
MinFill_PresentationAnna Lasota
 
141222 graphulo ingraphblas
141222 graphulo ingraphblas141222 graphulo ingraphblas
141222 graphulo ingraphblasMIT
 

Similar to Lgm pakdd2011 public (20)

gSpan algorithm
 gSpan algorithm gSpan algorithm
gSpan algorithm
 
Graphs In Data Structure
Graphs In Data StructureGraphs In Data Structure
Graphs In Data Structure
 
Graphs In Data Structure
Graphs In Data StructureGraphs In Data Structure
Graphs In Data Structure
 
Graph data structure
Graph data structureGraph data structure
Graph data structure
 
Skiena algorithm 2007 lecture12 topological sort connectivity
Skiena algorithm 2007 lecture12 topological sort connectivitySkiena algorithm 2007 lecture12 topological sort connectivity
Skiena algorithm 2007 lecture12 topological sort connectivity
 
Attributed Graph Matching of Planar Graphs
Attributed Graph Matching of Planar GraphsAttributed Graph Matching of Planar Graphs
Attributed Graph Matching of Planar Graphs
 
Lecture 8
Lecture 8Lecture 8
Lecture 8
 
Data Structures - Lecture 10 [Graphs]
Data Structures - Lecture 10 [Graphs]Data Structures - Lecture 10 [Graphs]
Data Structures - Lecture 10 [Graphs]
 
Graph theory
Graph theoryGraph theory
Graph theory
 
Propertiesofexponents
PropertiesofexponentsPropertiesofexponents
Propertiesofexponents
 
Graph
GraphGraph
Graph
 
Double Patterning (4/2 update)
Double Patterning (4/2 update)Double Patterning (4/2 update)
Double Patterning (4/2 update)
 
6.2 Notes
6.2 Notes6.2 Notes
6.2 Notes
 
Object Recognition with Deformable Models
Object Recognition with Deformable ModelsObject Recognition with Deformable Models
Object Recognition with Deformable Models
 
RTree Spatial Indexing with MongoDB - MongoDC
RTree Spatial Indexing with MongoDB - MongoDC RTree Spatial Indexing with MongoDB - MongoDC
RTree Spatial Indexing with MongoDB - MongoDC
 
Surveys
SurveysSurveys
Surveys
 
MinFill_Presentation
MinFill_PresentationMinFill_Presentation
MinFill_Presentation
 
Lec28
Lec28Lec28
Lec28
 
Unit 2: All
Unit 2: AllUnit 2: All
Unit 2: All
 
141222 graphulo ingraphblas
141222 graphulo ingraphblas141222 graphulo ingraphblas
141222 graphulo ingraphblas
 

Recently uploaded

#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 

Recently uploaded (20)

#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Advanced Test Driven-Development @ php[tek] 2024

Lgm pakdd2011 public

  • 1. The 15th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD2011), 25 May 2011. LGM: Mining Frequent Subgraphs from Linear Graphs. Yasuo Tabei, ERATO Minato Project, Japan Science and Technology Agency. Joint work with Daisuke Okanohara (Preferred Infrastructure), Shuichi Hirose (AIST), Koji Tsuda (AIST).
  • 2. Outline • Introduction to linear graph ★ Linear subgraph relation ★ Total order among edges • Frequent subgraph mining from a set of linear graphs • Experiments ★ Motif extraction from protein 3D structures
  • 3. Linear graph (Davydov et al., 2004) • Labeled graph whose vertices are totally ordered • Linear graph g = (V, E, L_V, L_E) ‣ V ⊂ ℕ : ordered vertex set ‣ E ⊆ V × V : edge set ‣ L_V : V → Σ_V : vertex labels ‣ L_E : E → Σ_E : edge labels • Example (figure): vertices 1 to 6 labeled A, B, A, B, C, A, connected by edges labeled a, b and c
  • 4. Linear subgraph relation • g1 is a linear subgraph of g2 (g1 ⊂ g2) if: i) the conventional subgraph condition holds ★ Vertex labels are matched ★ All edges of g1 exist in g2 with the correct labels; and ii) the order of vertices is conserved • Example (figure): a three-vertex graph g1 that is a linear subgraph of a six-vertex graph g2
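The two conditions on slide 4 can be sketched as a brute-force check. This is only an illustration, not the matching procedure used in LGM: the `(labels, edges)` representation, the function name `is_linear_subgraph`, and the toy graphs are assumptions made for this example. Because `itertools.combinations` yields index tuples in increasing order, every candidate embedding conserves the vertex order by construction.

```python
from itertools import combinations

def is_linear_subgraph(g1, g2):
    """Return True if g1 embeds in g2 with the vertex order conserved.

    Each graph is a pair (labels, edges): labels[i - 1] is the label of
    vertex i, and edges maps a pair (i, j) with i < j to an edge label.
    """
    labels1, edges1 = g1
    labels2, edges2 = g2
    n1, n2 = len(labels1), len(labels2)
    # Try every increasing choice of n1 vertices of g2 as the image of g1.
    for image in combinations(range(1, n2 + 1), n1):
        # Condition i, part 1: vertex labels must match.
        if any(labels1[k] != labels2[image[k] - 1] for k in range(n1)):
            continue
        # Condition i, part 2: every edge of g1 exists in g2 with its label.
        if all(edges2.get((image[i - 1], image[j - 1])) == label
               for (i, j), label in edges1.items()):
            return True
    return False

# Hypothetical toy graphs (not the graphs drawn on the slides).
g1 = (["A", "B"], {(1, 2): "a"})
g2 = (["B", "A", "B"], {(2, 3): "a"})   # contains A-B in the same order
g3 = (["B", "A"], {(1, 2): "a"})        # same edge, but the order is reversed

print(is_linear_subgraph(g1, g2))
print(is_linear_subgraph(g1, g3))
```

Here `g3` contains `g1` as a conventional subgraph (swap the two vertices), but not as a linear subgraph, which is the situation slide 5 describes.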
  • 5. Subgraph but not linear subgraph • g1 is a subgraph of g2 ★ Vertex labels are matched ★ All edges in g1 also exist in g2 with the correct labels • g1 is not a linear subgraph of g2 ★ The order of vertices is not conserved • Example (figure): a three-vertex graph g1 and a four-vertex graph g2
  • 6. Total order among edges in a linear graph • Compare the left vertices first. If they are identical, compare the right vertices • ∀ e1 = (i, j), e2 = (k, l) ∈ E_g : e1 <_e e2 if and only if (i) i < k, or (ii) i = k and j < l • Example (figure): edges e1 and e2 with endpoints i, j, k, l, and a graph on vertices 1 to 4 with its edges numbered 1 to 3
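The edge order on slide 6 is a lexicographic order on (left vertex, right vertex) pairs. A minimal sketch, assuming edges are stored as Python tuples `(i, j)` with `i < j` (the name `edge_less` and the sample edges are illustrative only):

```python
def edge_less(e1, e2):
    """e1 <_e e2: compare left vertices first, then right vertices."""
    (i, j), (k, l) = e1, e2
    return i < k or (i == k and j < l)

# Hypothetical edge set of a small linear graph.
edges = [(2, 4), (1, 3), (1, 2), (2, 3)]

# Python compares tuples lexicographically, so sorted() realizes the
# same total order without a custom comparator.
ordered = sorted(edges)
print(ordered)
```

Since the built-in tuple comparison coincides with `<_e`, `max(edges)` gives the largest edge directly.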
  • 7. Outline • Introduction to linear graph ★ Linear subgraph relation ★ Total order among edges • Frequent subgraph mining from a set of linear graphs • Experiments ★ Motif extraction from protein 3D structures
  • 8. Frequent subgraph mining from linear graphs • Enumerate all frequent subgraphs from a set of linear graphs ★ Frequent subgraphs are those included in the set of linear graphs at least τ times (τ: minimum support threshold) ★ Connected and disconnected subgraphs are enumerated within a unified framework ★ Reverse search is used for efficient enumeration (Avis and Fukuda, 1993) • Polynomial delay ★ gSpan has exponential delay
  • 9. Enumeration of all linear subgraphs of a linear graph • Before considering a mining algorithm, we first have to solve the problem of subgraph enumeration • How can we enumerate all subgraphs of the following linear graph without duplication? (figure)
  • 10. Search lattice of all subgraphs (figure)
  • 11. Reverse search (Avis and Fukuda, 1993) • To enumerate all subgraphs without duplication, we need to define a search tree in the search lattice • Reduction map f ★ Maps a child to its parent ★ Removes the largest edge • Example (figure): f maps a pattern on vertices 1 to 4 with edges 1, 2, 3 to a pattern on vertices 1 to 3 with edges 1, 2
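The reduction map can be sketched on a simplified pattern representation. The code below assumes a pattern is just a set of edges `(i, j)` and ignores vertex labels and the renumbering of vertices after an edge is removed; `reduce_pattern` and `path_to_root` are names made up for this illustration.

```python
def reduce_pattern(edges):
    """Reduction map f: return the parent obtained by removing the largest edge."""
    if not edges:
        raise ValueError("the empty pattern is the root and has no parent")
    # max() on tuples uses the lexicographic edge order (left vertex first).
    return frozenset(edges) - {max(edges)}

def path_to_root(edges):
    """Apply f repeatedly; every pattern reaches the empty pattern."""
    path = [frozenset(edges)]
    while path[-1]:
        path.append(reduce_pattern(path[-1]))
    return path

# Hypothetical pattern with three edges.
child = {(1, 2), (1, 3), (3, 4)}
parent = reduce_pattern(child)
print(sorted(parent))
print(len(path_to_root(child)))
```

Because each pattern has exactly one parent under f, the parent links form a tree rooted at the empty pattern, which is what makes duplicate-free enumeration possible.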
  • 12. Search tree induced by the reduction map • Applying the reduction map to each element induces a search tree (figure)
  • 13. Inverting the reduction map (f⁻¹) • When traversing the tree from the root, child nodes are created on demand • In most cases, inverting the reduction map takes the following two steps: ★ Consider all child candidates ★ Keep the ones that the reduction map sends back to the current node • However, in this particular case, the reduction map can be inverted explicitly ★ A pattern extension rule (parent to children) can be derived
  • 15. Traversing search tree from root • Depth-first traversal for its memory efficiency (figure)
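A depth-first traversal of such a search tree can be sketched for the simplified case of enumerating the edge subsets of one host graph. This is not the LGM extension rule itself, which the transcript does not spell out: the sketch assumes patterns are tuples of host edges and that a child is formed by appending an edge larger than every edge already in the pattern, so that removing the largest edge of the child gives back its parent. The name `enumerate_subgraphs` is hypothetical.

```python
def enumerate_subgraphs(host_edges):
    """Yield every non-empty edge subset of host_edges exactly once, depth first."""
    order = sorted(host_edges)  # total order among edges

    def dfs(pattern, start):
        # Children are created on demand by appending one larger edge.
        for idx in range(start, len(order)):
            child = pattern + (order[idx],)
            yield child
            yield from dfs(child, idx + 1)

    yield from dfs((), 0)

# Hypothetical host graph with three edges.
host = [(1, 2), (2, 3), (1, 3)]
patterns = list(enumerate_subgraphs(host))
for p in patterns:
    print(p)
```

Only the current root-to-node path is held in memory at any time, which is the memory advantage of depth-first traversal mentioned on the slide.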
  • 16. Frequent subgraph mining • Basic idea: find all possible extensions of the current pattern in the graph database, and extend the pattern • Occurrence list L_G(g) ★ Records every occurrence of a pattern g in the graph database G ★ The support of a pattern g is calculated from the occurrence list • Use the anti-monotonicity of the support for pruning
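A small sketch of how an occurrence list can drive pruning. Several things here are assumptions rather than details from the slides: an occurrence is stored as `(graph_id, matched_vertices)`, support is counted as the number of distinct database graphs containing the pattern, and the pattern names and data are invented.

```python
def support(occurrences):
    """Support of a pattern: number of distinct graphs in its occurrence list."""
    return len({graph_id for graph_id, _ in occurrences})

def prune(candidates, tau):
    """Keep only candidate patterns whose support reaches the threshold tau."""
    return {pattern: occ for pattern, occ in candidates.items()
            if support(occ) >= tau}

# Hypothetical occurrence lists for two candidate extensions.
occurrence_lists = {
    "p1": [(0, (1, 2)), (0, (2, 4)), (1, (1, 3)), (2, (2, 3))],
    "p2": [(0, (1, 2, 3)), (0, (1, 2, 4))],
}

frequent = prune(occurrence_lists, tau=2)
print(sorted(frequent))
```

Anti-monotonicity means an extension of a pattern can only occur where the pattern itself occurs, so its support cannot be larger. Once `p2` falls below the threshold, the whole subtree below it can be skipped.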
  • 17. Outline • Introduction to linear graph ★ Linear subgraph relation ★ Total order among edges • Frequent subgraph mining from a set of linear graphs • Experiments ★ Motif extraction from protein 3D structures
  • 18. Motif extraction from protein 3D structures • Pairs of homologous proteins from thermophilic and mesophilic organisms • Construct a linear graph from a protein ★ Use the vertex order from the N-terminal to the C-terminal ★ Assign vertex labels from {1,...,6} ★ Draw an edge between pairs of amino acid residues whose distance is 5 Å • Number of graphs: 742, average number of vertices: 371, average number of edges: 496 • Rank the enumerated patterns by statistical significance (p-value) ★ Association with thermophilic/mesophilic labels ★ Fisher exact test
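The ranking step can be illustrated with a hand-rolled Fisher exact test on a 2x2 table (pattern present or absent, thermophilic or mesophilic). The slides do not say whether a one-sided or two-sided test was used; the two-sided variant below is an assumption, and the counts are invented for the example.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x
        # when all row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the tables that are at most as likely as the observed one
    # (small relative tolerance for floating-point ties).
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_observed * (1 + 1e-9))

# Hypothetical counts: rows = pattern present / absent,
# columns = thermophilic / mesophilic proteins.
p_value = fisher_exact_two_sided(12, 2, 5, 15)
print(p_value)
```

Patterns would then be sorted by this p-value, smallest first, to obtain a ranking by association with the class label.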
  • 19. Runtime comparison • Compared with gSpan • Made gapped linear graphs and ran gSpan on them • LGM is faster than gSpan
  • 20. • Minimum support = 10 • 103 patterns with p-value < 0.001 • Thermophilic (TATA), Mesophilic (pol II) ★ Share the function of DNA-binding protein, but differ in thermostability
  • 21. Mapping motifs in 3D structure • Thermophilic (TATA), Mesophilic (pol II)
  • 22. Summary • Efficient subgraph mining algorithm for linear graphs • Search tree is defined by the reverse search principle • Patterns include disconnected subgraphs • Enumeration has polynomial delay • Interesting patterns from proteins