SlideShare a Scribd company logo
1 of 20
FP-Growth
Fatima Radi
Farah Al-Tufaili
University of Kufa – facility of computer science and math
computer science department 2015
First let’s look back to The Apriori Algorithm
 Apriori is an algorithm for frequent item set mining and
association rule learning over transactional databases.
 It proceeds by identifying the frequent individual items in
the database and extending them to larger and larger item
sets as long as those item sets appear sufficiently often in
the database.
 The frequent item sets determined by Apriori can be used
to determine association rules which highlight general
trends in the database: this has applications in domains
such as market basket analysis.
The Apriori Algorithm — Example
TID Items
100 1 3 4
200 2 3 5
300 1 2 3 5
400 2 5
Database D itemset sup.
{1} 2
{2} 3
{3} 3
{4} 1
{5} 3
itemset sup.
{1} 2
{2} 3
{3} 3
{5} 3
Scan D
C1
L1
itemset
{1 2}
{1 3}
{1 5}
{2 3}
{2 5}
{3 5}
itemset sup
{1 2} 1
{1 3} 2
{1 5} 1
{2 3} 2
{2 5} 3
{3 5} 2
itemset sup
{1 3} 2
{2 3} 2
{2 5} 3
{3 5} 2
L2
C2 C2
Scan D
C3 L3itemset
{2 3 5}
Scan D itemset sup
{2 3 5} 2
Is Apriori fast enough?
Basics of Apriori algorithm
Use frequent (k-1)-itemsets to generate k-itemsets
candidates
Scan the databases to determine frequent k-itemsets
 It is costly to handle a huge number of candidate sets
 If there are 104 frequent 1-itemsts, the Apriori algorithm will
need to generate more than 107 2-itemsets and test their
frequencies
Solving Apriori Algorithm Problems?
Another Method for Frequent Itemset
Generation
ECLAT
FP- Growth
AprioriDP
Context Based Association Rule Mining Algorithm
Node-set-based algorithms
GUHA procedure ASSOC
OPUS search
What Is FP-Growth
 An efficient and scalable method to complete set of frequent
patterns.
 It allow frequent item set discovery without candidate item set
generation.
 Two step approach:
1. Build a compact data structure called the FP-Tree.
2. Extracts frequent item set directly from the FP-Tree.
FP tree example (How to identify frequent
patterns using FP tree algorithm
Suppose we have the following DataBase
Step 1 - Calculate Minimum support
 First should calculate the minimum support count. Question says
minimum support should be 30%. It calculate as follows:
Minimum support count(30/100 * 8) = 2.4
As a result, 2.4 appears but to empower the easy calculation it can be
rounded to to the ceiling value. Now,
Minimum support count is ceiling(30/100 * 8) = 3 - See more at:
http://hareenlaks.blogspot.com/2011/06/fp-tree-example-how-to-
identify.html#sthash.4yIugLAd.dpuf
Step 2 - Find frequency of occurrence
 Now time to find the frequency of occurrence of each item in the
Database table. For example, item A occurs in row 1,row 2,row 3,row 4 and
row 7. Totally 5 times occurs in the Database table. You can see the
counted frequency of occurrence of each item in Table 2
Table 1 - Snapshot of the Database
TID frequency
A 5
B 6
C 3
D 6
E 4
Table2 -Frequency of Occurrence
Step 3 - Prioritize the items
 In Table 2 you can see the numbers written in Red pen. Those are the
priority of each item according to it's frequency of occurrence. Item B got
the highest priority (1) due to it's highest number of occurrences. At the
same time you have opportunity to drop the items which not fulfill the
minimum support requirement. For instance, if Database contain F which
has frequency 1, then you can drop it.
TID frequency priority
A 5 3
B 6 1
C 3 5
D 6 2
E 4 4
Table2 -Frequency of Occurrence
Step 4 -Order the items according to priority
 As you see in the Table 3 new column added to the Table 1. In the
Ordered Items column all the items are queued according to it's priority,
which mentioned in the Red ink in Table 2. For example, in the case of
ordering row 1, the highest priority item is B and after that D, A and E
respectively.
Table 3 - New version of the Table 1
Step 5 -Order the items according to priority
 As a result of previous steps we got a ordered items
table (Table 3). Now it's time to draw the FP tree. We
will mention it row by row
 Row 1:
Note that all FP trees have 'null' node as the root
node. So draw the root node first and attach the
items of the row 1 one by one respectively. (See the
Figure 1) And write their occurrences in front of it.
Figure 1- FP tree for Row 1
 Row 2:
Then update the above tree (Figure 1) by entering
the items of row 2. The items of row 2 are B,D,A,E,C.
Then without creating another branch you can go
through the previous branch up to E and then you
have to create new node after that for C. This case
same as a scenario of traveling through a road to
visit the towns of the country. You should go
through the same road to achieve another town
near to the particular town.
When you going through the branch second time
you should erase one and write two for indicating
the two times you visit to that node. If you visit
through three times then write three after erase
two
Figure 2- FP tree for Row 1,2
 Row 3:
In row 3 you have to visit B,A,E and C
respectively. So you may think you
can follow the same branch again by
replacing the values of B,A,E and C .
But you can't do that you have
opportunity to come through the B.
But can't connect B to existing A
overtaking D. As a result you should
draw another A and connect it to B
and then connect new E to that A and
new C to new E.
Figure 3 - After adding third row
 Row 4:
Then row 4 contain B,D,A. Now we can just rename
the frequency of occurrences in the existing branch.
As B:4,D,A:3.
 Row 5:
n fifth raw have only item D. Now we have
opportunity draw new branch from 'null' node. See
Figure 4.
Figure 4- Connect D to null node
 Row 6:
B and D appears in row 6. So just change the B:4
to B:5 and D:3 to D:4.
 Row 7:
Attach two new nodes A and E to the D node
which hanging on the null node. Then mark D,A,E
as D:2,A:1 and E:1.
 Row 8 :(Ohh.. last row)
Attach new node C to B. Change the traverse
times.(B:6,C:1
Figure 5 - Final FP tree
Step 6 - Validation
 After the five steps the final FP tree as follows: Figure 5.
How we know is this correct?
Now count the frequency of occurrence of each item of the FP tree and
compare it with Table 2. If both counts equal, then it is positive point to
indicate your tree is correct.
Benefits of the FP-tree Structure
Completeness
Preserve complete information for frequent pattern mining
Never break a long pattern of any transaction
Compactness
Reduce irrelevant info—infrequent items are gone
Items in frequency descending order: the more frequently
occurring, the more likely to be shared
Never be larger than the original database (not count node-links
and the count field)
There exists examples of databases, where compression ratio
could be over 100
FP-Growth Complexity
 Therefore, each path in the tree will be at least partially
traversed the number of items existing in that tree path (the
depth of the tree path) * the number of items in the header.
 Complexity of searching through all paths is then bounded by
O(header_count2 * depth of tree)

More Related Content

What's hot

Major issues in data mining
Major issues in data miningMajor issues in data mining
Major issues in data mining
Slideshare
 

What's hot (20)

FP-growth.pptx
FP-growth.pptxFP-growth.pptx
FP-growth.pptx
 
3. mining frequent patterns
3. mining frequent patterns3. mining frequent patterns
3. mining frequent patterns
 
Linked stacks and queues
Linked stacks and queuesLinked stacks and queues
Linked stacks and queues
 
Apriori algorithm
Apriori algorithmApriori algorithm
Apriori algorithm
 
2.2 decision tree
2.2 decision tree2.2 decision tree
2.2 decision tree
 
Association rule mining and Apriori algorithm
Association rule mining and Apriori algorithmAssociation rule mining and Apriori algorithm
Association rule mining and Apriori algorithm
 
Association Rule Learning Part 1: Frequent Itemset Generation
Association Rule Learning Part 1: Frequent Itemset GenerationAssociation Rule Learning Part 1: Frequent Itemset Generation
Association Rule Learning Part 1: Frequent Itemset Generation
 
Data Preprocessing
Data PreprocessingData Preprocessing
Data Preprocessing
 
Data mining , Knowledge Discovery Process, Classification
Data mining , Knowledge Discovery Process, ClassificationData mining , Knowledge Discovery Process, Classification
Data mining , Knowledge Discovery Process, Classification
 
Association rule mining
Association rule miningAssociation rule mining
Association rule mining
 
Major issues in data mining
Major issues in data miningMajor issues in data mining
Major issues in data mining
 
Datawarehouse and OLAP
Datawarehouse and OLAPDatawarehouse and OLAP
Datawarehouse and OLAP
 
Eclat algorithm in association rule mining
Eclat algorithm in association rule miningEclat algorithm in association rule mining
Eclat algorithm in association rule mining
 
Apriori
AprioriApriori
Apriori
 
1.2 steps and functionalities
1.2 steps and functionalities1.2 steps and functionalities
1.2 steps and functionalities
 
3.7 outlier analysis
3.7 outlier analysis3.7 outlier analysis
3.7 outlier analysis
 
Data mining and data warehouse lab manual updated
Data mining and data warehouse lab manual updatedData mining and data warehouse lab manual updated
Data mining and data warehouse lab manual updated
 
Apriori Algorithm
Apriori AlgorithmApriori Algorithm
Apriori Algorithm
 
Naive Bayes Classifier Tutorial | Naive Bayes Classifier Example | Naive Baye...
Naive Bayes Classifier Tutorial | Naive Bayes Classifier Example | Naive Baye...Naive Bayes Classifier Tutorial | Naive Bayes Classifier Example | Naive Baye...
Naive Bayes Classifier Tutorial | Naive Bayes Classifier Example | Naive Baye...
 
Data Mining: Data processing
Data Mining: Data processingData Mining: Data processing
Data Mining: Data processing
 

Similar to Fp growth

Consider this code using the ArrayBag of Section 5.2 and the Locat.docx
Consider this code using the ArrayBag of Section 5.2 and the Locat.docxConsider this code using the ArrayBag of Section 5.2 and the Locat.docx
Consider this code using the ArrayBag of Section 5.2 and the Locat.docx
maxinesmith73660
 
Frequent itemset mining using pattern growth method
Frequent itemset mining using pattern growth methodFrequent itemset mining using pattern growth method
Frequent itemset mining using pattern growth method
Shani729
 
هياكلبيانات
هياكلبياناتهياكلبيانات
هياكلبيانات
Rafal Edward
 
Normalization form tutorial
Normalization form tutorialNormalization form tutorial
Normalization form tutorial
Nikhildas P C
 

Similar to Fp growth (20)

Cs501 mining frequentpatterns
Cs501 mining frequentpatternsCs501 mining frequentpatterns
Cs501 mining frequentpatterns
 
Data structures
Data structuresData structures
Data structures
 
Normalization.ppt
Normalization.pptNormalization.ppt
Normalization.ppt
 
VCE Unit 05.pptx
VCE Unit 05.pptxVCE Unit 05.pptx
VCE Unit 05.pptx
 
D05333034
D05333034D05333034
D05333034
 
Data Structures
Data StructuresData Structures
Data Structures
 
Consider this code using the ArrayBag of Section 5.2 and the Locat.docx
Consider this code using the ArrayBag of Section 5.2 and the Locat.docxConsider this code using the ArrayBag of Section 5.2 and the Locat.docx
Consider this code using the ArrayBag of Section 5.2 and the Locat.docx
 
Frequent itemset mining using pattern growth method
Frequent itemset mining using pattern growth methodFrequent itemset mining using pattern growth method
Frequent itemset mining using pattern growth method
 
هياكلبيانات
هياكلبياناتهياكلبيانات
هياكلبيانات
 
Lab6: I/O and Arrays
Lab6: I/O and ArraysLab6: I/O and Arrays
Lab6: I/O and Arrays
 
Assignment#11
Assignment#11Assignment#11
Assignment#11
 
Data Structure Question Bank(2 marks)
Data Structure Question Bank(2 marks)Data Structure Question Bank(2 marks)
Data Structure Question Bank(2 marks)
 
Normalization form tutorial
Normalization form tutorialNormalization form tutorial
Normalization form tutorial
 
January 2016 Meetup: Speeding up (big) data manipulation with data.table package
January 2016 Meetup: Speeding up (big) data manipulation with data.table packageJanuary 2016 Meetup: Speeding up (big) data manipulation with data.table package
January 2016 Meetup: Speeding up (big) data manipulation with data.table package
 
01-Introduction of DSA-1.pptx
01-Introduction of DSA-1.pptx01-Introduction of DSA-1.pptx
01-Introduction of DSA-1.pptx
 
Unit i(dsc++)
Unit i(dsc++)Unit i(dsc++)
Unit i(dsc++)
 
Interval intersection
Interval intersectionInterval intersection
Interval intersection
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)
 
Data Mining - FP Growth Algorithm
Data Mining - FP Growth AlgorithmData Mining - FP Growth Algorithm
Data Mining - FP Growth Algorithm
 
Bcsl 033 solve assignment
Bcsl 033 solve assignmentBcsl 033 solve assignment
Bcsl 033 solve assignment
 

More from Farah M. Altufaili

More from Farah M. Altufaili (10)

A Correlative Information-Theoretic Measure for Image Similarity
A Correlative Information-Theoretic Measure for Image SimilarityA Correlative Information-Theoretic Measure for Image Similarity
A Correlative Information-Theoretic Measure for Image Similarity
 
Stereo vision
Stereo visionStereo vision
Stereo vision
 
Writing a good cv
Writing a good cvWriting a good cv
Writing a good cv
 
Virtual Private Network VPN
Virtual Private Network VPNVirtual Private Network VPN
Virtual Private Network VPN
 
Fuzzy image processing- fuzzy C-mean clustering
Fuzzy image processing- fuzzy C-mean clusteringFuzzy image processing- fuzzy C-mean clustering
Fuzzy image processing- fuzzy C-mean clustering
 
Principal component analysis
Principal component analysisPrincipal component analysis
Principal component analysis
 
Tiny encryption algorithm
Tiny encryption algorithmTiny encryption algorithm
Tiny encryption algorithm
 
Polygon mesh
Polygon  meshPolygon  mesh
Polygon mesh
 
Nanotechnology and its impact on modern computer
Nanotechnology and its impact on modern computerNanotechnology and its impact on modern computer
Nanotechnology and its impact on modern computer
 
Adversarial search
Adversarial search Adversarial search
Adversarial search
 

Recently uploaded

Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
RohitNehra6
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
Sérgio Sacani
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
anilsa9823
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
gindu3009
 
The Philosophy of Science
The Philosophy of ScienceThe Philosophy of Science
The Philosophy of Science
University of Hertfordshire
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Lokesh Kothari
 
DIFFERENCE IN BACK CROSS AND TEST CROSS
DIFFERENCE IN  BACK CROSS AND TEST CROSSDIFFERENCE IN  BACK CROSS AND TEST CROSS
DIFFERENCE IN BACK CROSS AND TEST CROSS
LeenakshiTyagi
 

Recently uploaded (20)

Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptx
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
 
fundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomologyfundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomology
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
 
The Philosophy of Science
The Philosophy of ScienceThe Philosophy of Science
The Philosophy of Science
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)
 
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINChromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 
DIFFERENCE IN BACK CROSS AND TEST CROSS
DIFFERENCE IN  BACK CROSS AND TEST CROSSDIFFERENCE IN  BACK CROSS AND TEST CROSS
DIFFERENCE IN BACK CROSS AND TEST CROSS
 

Fp growth

  • 1. FP-Growth Fatima Radi Farah Al-Tufaili University of Kufa – facility of computer science and math computer science department 2015
  • 2. First let’s look back to The Apriori Algorithm  Apriori is an algorithm for frequent item set mining and association rule learning over transactional databases.  It proceeds by identifying the frequent individual items in the database and extending them to larger and larger item sets as long as those item sets appear sufficiently often in the database.  The frequent item sets determined by Apriori can be used to determine association rules which highlight general trends in the database: this has applications in domains such as market basket analysis.
  • 3. The Apriori Algorithm — Example TID Items 100 1 3 4 200 2 3 5 300 1 2 3 5 400 2 5 Database D itemset sup. {1} 2 {2} 3 {3} 3 {4} 1 {5} 3 itemset sup. {1} 2 {2} 3 {3} 3 {5} 3 Scan D C1 L1 itemset {1 2} {1 3} {1 5} {2 3} {2 5} {3 5} itemset sup {1 2} 1 {1 3} 2 {1 5} 1 {2 3} 2 {2 5} 3 {3 5} 2 itemset sup {1 3} 2 {2 3} 2 {2 5} 3 {3 5} 2 L2 C2 C2 Scan D C3 L3itemset {2 3 5} Scan D itemset sup {2 3 5} 2
  • 4. Is Apriori fast enough? Basics of Apriori algorithm Use frequent (k-1)-itemsets to generate k-itemsets candidates Scan the databases to determine frequent k-itemsets
  • 5.  It is costly to handle a huge number of candidate sets  If there are 104 frequent 1-itemsts, the Apriori algorithm will need to generate more than 107 2-itemsets and test their frequencies
  • 6. Solving Apriori Algorithm Problems? Another Method for Frequent Itemset Generation ECLAT FP- Growth AprioriDP Context Based Association Rule Mining Algorithm Node-set-based algorithms GUHA procedure ASSOC OPUS search
  • 7. What Is FP-Growth  An efficient and scalable method to complete set of frequent patterns.  It allow frequent item set discovery without candidate item set generation.  Two step approach: 1. Build a compact data structure called the FP-Tree. 2. Extracts frequent item set directly from the FP-Tree.
  • 8. FP tree example (How to identify frequent patterns using FP tree algorithm Suppose we have the following DataBase
  • 9. Step 1 - Calculate Minimum support  First should calculate the minimum support count. Question says minimum support should be 30%. It calculate as follows: Minimum support count(30/100 * 8) = 2.4 As a result, 2.4 appears but to empower the easy calculation it can be rounded to to the ceiling value. Now, Minimum support count is ceiling(30/100 * 8) = 3 - See more at: http://hareenlaks.blogspot.com/2011/06/fp-tree-example-how-to- identify.html#sthash.4yIugLAd.dpuf
  • 10. Step 2 - Find frequency of occurrence  Now time to find the frequency of occurrence of each item in the Database table. For example, item A occurs in row 1,row 2,row 3,row 4 and row 7. Totally 5 times occurs in the Database table. You can see the counted frequency of occurrence of each item in Table 2 Table 1 - Snapshot of the Database TID frequency A 5 B 6 C 3 D 6 E 4 Table2 -Frequency of Occurrence
  • 11. Step 3 - Prioritize the items  In Table 2 you can see the numbers written in Red pen. Those are the priority of each item according to it's frequency of occurrence. Item B got the highest priority (1) due to it's highest number of occurrences. At the same time you have opportunity to drop the items which not fulfill the minimum support requirement. For instance, if Database contain F which has frequency 1, then you can drop it. TID frequency priority A 5 3 B 6 1 C 3 5 D 6 2 E 4 4 Table2 -Frequency of Occurrence
  • 12. Step 4 -Order the items according to priority  As you see in the Table 3 new column added to the Table 1. In the Ordered Items column all the items are queued according to it's priority, which mentioned in the Red ink in Table 2. For example, in the case of ordering row 1, the highest priority item is B and after that D, A and E respectively. Table 3 - New version of the Table 1
  • 13. Step 5 -Order the items according to priority  As a result of previous steps we got a ordered items table (Table 3). Now it's time to draw the FP tree. We will mention it row by row  Row 1: Note that all FP trees have 'null' node as the root node. So draw the root node first and attach the items of the row 1 one by one respectively. (See the Figure 1) And write their occurrences in front of it. Figure 1- FP tree for Row 1
  • 14.  Row 2: Then update the above tree (Figure 1) by entering the items of row 2. The items of row 2 are B,D,A,E,C. Then without creating another branch you can go through the previous branch up to E and then you have to create new node after that for C. This case same as a scenario of traveling through a road to visit the towns of the country. You should go through the same road to achieve another town near to the particular town. When you going through the branch second time you should erase one and write two for indicating the two times you visit to that node. If you visit through three times then write three after erase two Figure 2- FP tree for Row 1,2
  • 15.  Row 3: In row 3 you have to visit B,A,E and C respectively. So you may think you can follow the same branch again by replacing the values of B,A,E and C . But you can't do that you have opportunity to come through the B. But can't connect B to existing A overtaking D. As a result you should draw another A and connect it to B and then connect new E to that A and new C to new E. Figure 3 - After adding third row
  • 16.  Row 4: Then row 4 contain B,D,A. Now we can just rename the frequency of occurrences in the existing branch. As B:4,D,A:3.  Row 5: n fifth raw have only item D. Now we have opportunity draw new branch from 'null' node. See Figure 4. Figure 4- Connect D to null node
  • 17.  Row 6: B and D appears in row 6. So just change the B:4 to B:5 and D:3 to D:4.  Row 7: Attach two new nodes A and E to the D node which hanging on the null node. Then mark D,A,E as D:2,A:1 and E:1.  Row 8 :(Ohh.. last row) Attach new node C to B. Change the traverse times.(B:6,C:1 Figure 5 - Final FP tree
  • 18. Step 6 - Validation  After the five steps the final FP tree as follows: Figure 5. How we know is this correct? Now count the frequency of occurrence of each item of the FP tree and compare it with Table 2. If both counts equal, then it is positive point to indicate your tree is correct.
  • 19. Benefits of the FP-tree Structure Completeness Preserve complete information for frequent pattern mining Never break a long pattern of any transaction Compactness Reduce irrelevant info—infrequent items are gone Items in frequency descending order: the more frequently occurring, the more likely to be shared Never be larger than the original database (not count node-links and the count field) There exists examples of databases, where compression ratio could be over 100
  • 20. FP-Growth Complexity  Therefore, each path in the tree will be at least partially traversed the number of items existing in that tree path (the depth of the tree path) * the number of items in the header.  Complexity of searching through all paths is then bounded by O(header_count2 * depth of tree)