SlideShare a Scribd company logo
1 of 29
Download to read offline
Sequential Pattern Mining
University of Kashan
Fall 2017
Hamidreza Mahdavipanah
Pegah Hajian
Narges Heydarzadeh
Professor: Dr. S. M. Vahidipour
Outlines
● Introduction
● Definitions
● GSP
● Constraints that GSP supports
Introduction
Studies on Sequential Pattern Mining
● First introduced by Agrawal and Srikant in 1995
○ They presented three algorithms
■ AprioriAll
■ AprioriSome
■ DynamicSome
● Then in 1996 they presented GSP algorithm which was much faster than
former algorithms and it also was generalized for more real life problems
● Pattern-growth methods: FreeSpan and PrefixSpan
● Mining closed sequential patterns: CloSpan
● ...
Applications
● Customer shopping sequences
○ First buy computer, then CD-ROM, and then digital camera, within 3
month.
● Medical treatments
● Natural disasters (e.g., earthquakes)
● Stocks and markets
● DNA sequences and gene structures
Problem Statement
We are given a database D of customer transactions. Each transaction consists of
the following fields:
We want to find all large sequences that have a certain user-specified minimum
support.
It is similar to the frequent itemsets mining, but with consideration of ordering.
customer-id transaction-time items
Example of a
database
Customer Id Transaction Time Items Bought
1 June 25 ‘93 30
1 June 30 ‘93 90
2 June 10 ‘93 10, 20
2 June 15 ‘93 30
2 June 20 ‘93 40, 60, 70
3 June 25 ‘93 30, 50, 70
4 June 25 ‘93 30
4 June 30 ‘93 40, 70
4 June 25 ‘93 90
5 June 12 ‘93 90
We convert DB
to this form
Customer Id Customer Sequence
1 <(30)(90)>
2 <(10 20) (30) (40 60 70)>
3 <(30 50 70)>
4 <(30) (40 70) (90)>
5 <(90)>
Definitions
Itemset and Sequence
An itemset is a non-empty set of items.
- We denote an itemset i by (i1
i2
… im
)
A Sequence is an ordered list of items.
- We denote a sequence s by <s1
s2
… sn
>
Subsequence and supersequence
Given two sequences α = <a1
a2
… an
> and β = <b1
b2
… bm
>:
● α is called a subsequence of β, if there exists integers 1 ≤ j1
< j2
< … < jn
≤ m
such that a1
⊆ b1
, a2
⊆ b2
, …, an
⊆ bjn
● β is called a supersequence of α
Example:
α=<(a b) d> and β=<(a b c) (d e)>
Apriori Property
of Sequential
Patterns
If a sequence S is not
frequent, then none of the
super-sequences of S is
frequent.
Example:
<h b> is infrequent -> so do
<h a b> and <(a h) b>
GSP
Outline of the GSP Method
● Initially, every item in DB is a candidate of length-1
● For each level (i.e., sequences of length-k) do
○ Scan database to collect support count for each candidate sequence
○ Generate candidate length(k + 1) sequences from length-k frequent
sequences using Apriori
○ Repeat until no frequent sequence or no candidate can be found
Major strength of GSP is its
candidate pruning by Apriori
property
Finding Length-1 Sequential Patterns
● Initial candidates:
○ <a>, <b>, <c>, <d>, <e>, <f>, <g>, <h>
● Scan database once, count support for candidates
Generating Length-2 Candidates
= 36
Generating Length-2 Candidates
= 15
Generating Length-2 Candidates
Using Apriori : 36 + 15 = 51 length-2 candidates
Without Apriori : (8 * 8) + (8 * 7) / 2 = 92 length- candidates
Apriori prunes 44.57% candidates
Finding Length-2 Sequential Patterns
● Scan database one more time, collect support count for
each length-2 candidate
● There are 19 length-2 candidates which pass the
minimum support threshold
○ They are length-2 sequential patterns
The GSP Mining Process
The GSP Algorithm
● Take sequences in form of <x> as length-1 candidates
● Scan database once, find F1
, the set of length-1 sequential
patterns
● Let k = 1; while Fk
is not empty do
○ Form Ck + 1
the set of length-(k + 1) candidates from Fk
○ If Ck + 1
is not empty, scan database once, find Fk + 1
, the
set of length(k + 1) sequential patterns
○ Let k = k + 1
The Good, the Bad, and the Ugly
● The Good: benefits from the Apriori pruning which reduces
search space
● The Bad: Scans the database multiple times
● The Ugly: Generates a huge set of candidates sequences
Why GSP is called Generalized Sequential Pattern Mining?
For practical use of SPM, in 1996 Agrawal and Srikant
introduced three type of constraints that makes SPM problem
more general and practical and since GSP support these
constraints it is called Generalized sequential pattern mining.
Constraints that GSP Supports
Time Constraint
An ability for users to specify maximum and/or minimum
time gaps between adjacent elements of the sequential
pattern.
Sliding Window
That is, each element of the pattern can be contained in the
union of the items bought in a set of transactions, as long as
the difference between the maximum and minimum
transaction-times is less than the size of a sliding time
window.
Taxonomy
An ability to define a taxonomy (is-a hierarchy) over the items
in the data.
References
● R. Agrawal and R. Srikant. Mining Sequential Patterns.1995
● R. Srikant and R. Agrawal. Mining Sequential Patterns.1996
● Jian Pei, Jiawei Han, Behzad Mortazavi-Asl, Helen Pinto, PrefixSpan: Mining
Sequential Patterns Efficiently by Prefix-Projected Pattern Growth. 2001

More Related Content

What's hot

Data mining: Classification and prediction
Data mining: Classification and predictionData mining: Classification and prediction
Data mining: Classification and predictionDataminingTools Inc
 
2.4 rule based classification
2.4 rule based classification2.4 rule based classification
2.4 rule based classificationKrish_ver2
 
Apriori algorithm
Apriori algorithm Apriori algorithm
Apriori algorithm DHIVYADEVAKI
 
Birch Algorithm With Solved Example
Birch Algorithm With Solved ExampleBirch Algorithm With Solved Example
Birch Algorithm With Solved Examplekailash shaw
 
Association rule mining
Association rule miningAssociation rule mining
Association rule miningAcad
 
Data Mining: Concepts and Techniques chapter 07 : Advanced Frequent Pattern M...
Data Mining: Concepts and Techniques chapter 07 : Advanced Frequent Pattern M...Data Mining: Concepts and Techniques chapter 07 : Advanced Frequent Pattern M...
Data Mining: Concepts and Techniques chapter 07 : Advanced Frequent Pattern M...Salah Amean
 
Association rule mining and Apriori algorithm
Association rule mining and Apriori algorithmAssociation rule mining and Apriori algorithm
Association rule mining and Apriori algorithmhina firdaus
 
3.5 model based clustering
3.5 model based clustering3.5 model based clustering
3.5 model based clusteringKrish_ver2
 
Informed search (heuristics)
Informed search (heuristics)Informed search (heuristics)
Informed search (heuristics)Bablu Shofi
 
Local beam search example
Local beam search exampleLocal beam search example
Local beam search exampleMegha Sharma
 
Local search algorithm
Local search algorithmLocal search algorithm
Local search algorithmMegha Sharma
 
Rules of data mining
Rules of data miningRules of data mining
Rules of data miningSulman Ahmed
 
2.2 decision tree
2.2 decision tree2.2 decision tree
2.2 decision treeKrish_ver2
 

What's hot (20)

Data mining: Classification and prediction
Data mining: Classification and predictionData mining: Classification and prediction
Data mining: Classification and prediction
 
Assosiate rule mining
Assosiate rule miningAssosiate rule mining
Assosiate rule mining
 
2.4 rule based classification
2.4 rule based classification2.4 rule based classification
2.4 rule based classification
 
Apriori algorithm
Apriori algorithmApriori algorithm
Apriori algorithm
 
Apriori algorithm
Apriori algorithm Apriori algorithm
Apriori algorithm
 
Birch Algorithm With Solved Example
Birch Algorithm With Solved ExampleBirch Algorithm With Solved Example
Birch Algorithm With Solved Example
 
Association rule mining
Association rule miningAssociation rule mining
Association rule mining
 
Data Mining: Concepts and Techniques chapter 07 : Advanced Frequent Pattern M...
Data Mining: Concepts and Techniques chapter 07 : Advanced Frequent Pattern M...Data Mining: Concepts and Techniques chapter 07 : Advanced Frequent Pattern M...
Data Mining: Concepts and Techniques chapter 07 : Advanced Frequent Pattern M...
 
Association rule mining and Apriori algorithm
Association rule mining and Apriori algorithmAssociation rule mining and Apriori algorithm
Association rule mining and Apriori algorithm
 
Lect12 graph mining
Lect12 graph miningLect12 graph mining
Lect12 graph mining
 
3.5 model based clustering
3.5 model based clustering3.5 model based clustering
3.5 model based clustering
 
Fuzzy Clustering(C-means, K-means)
Fuzzy Clustering(C-means, K-means)Fuzzy Clustering(C-means, K-means)
Fuzzy Clustering(C-means, K-means)
 
Informed search (heuristics)
Informed search (heuristics)Informed search (heuristics)
Informed search (heuristics)
 
Apriori algorithm
Apriori algorithmApriori algorithm
Apriori algorithm
 
Iterative deepening search
Iterative deepening searchIterative deepening search
Iterative deepening search
 
Lecture13 - Association Rules
Lecture13 - Association RulesLecture13 - Association Rules
Lecture13 - Association Rules
 
Local beam search example
Local beam search exampleLocal beam search example
Local beam search example
 
Local search algorithm
Local search algorithmLocal search algorithm
Local search algorithm
 
Rules of data mining
Rules of data miningRules of data mining
Rules of data mining
 
2.2 decision tree
2.2 decision tree2.2 decision tree
2.2 decision tree
 

Similar to Sequential Pattern Mining and GSP

Mining Top-k Closed Sequential Patterns in Sequential Databases
Mining Top-k Closed Sequential Patterns in Sequential Databases Mining Top-k Closed Sequential Patterns in Sequential Databases
Mining Top-k Closed Sequential Patterns in Sequential Databases IOSR Journals
 
Data Mining: Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...
Data Mining:  Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...Data Mining:  Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...
Data Mining: Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...Salah Amean
 
FlinkML: Large Scale Machine Learning with Apache Flink
FlinkML: Large Scale Machine Learning with Apache FlinkFlinkML: Large Scale Machine Learning with Apache Flink
FlinkML: Large Scale Machine Learning with Apache FlinkTheodoros Vasiloudis
 
IFTA2020 Kei Nakagawa
IFTA2020 Kei NakagawaIFTA2020 Kei Nakagawa
IFTA2020 Kei NakagawaKei Nakagawa
 
Unsupervised Collaborative Learning of Keyframe Detection and Visual Odometry...
Unsupervised Collaborative Learning of Keyframe Detection and Visual Odometry...Unsupervised Collaborative Learning of Keyframe Detection and Visual Odometry...
Unsupervised Collaborative Learning of Keyframe Detection and Visual Odometry...Masaya Kaneko
 
C++ Notes PPT.ppt
C++ Notes PPT.pptC++ Notes PPT.ppt
C++ Notes PPT.pptAlpha474815
 
lecture1.ppt
lecture1.pptlecture1.ppt
lecture1.pptSagarDR5
 
Text categorization as a graph
Text categorization as a graphText categorization as a graph
Text categorization as a graphJames Wong
 
Graph classification problem.pptx
Graph classification problem.pptxGraph classification problem.pptx
Graph classification problem.pptxTony Nguyen
 
Text categorization as a graph
Text categorization as a graphText categorization as a graph
Text categorization as a graphFraboni Ec
 
Text categorization
Text categorization Text categorization
Text categorization Luis Goldster
 
Text categorization as a graph
Text categorization as a graph Text categorization as a graph
Text categorization as a graph David Hoen
 
Text categorization as graph
Text categorization as graphText categorization as graph
Text categorization as graphHarry Potter
 
Text categorization as a graph
Text categorization as a graphText categorization as a graph
Text categorization as a graphYoung Alista
 
Automated Parameterization of Performance Models from Measurements
Automated Parameterization of Performance Models from MeasurementsAutomated Parameterization of Performance Models from Measurements
Automated Parameterization of Performance Models from MeasurementsWeikun Wang
 
Design and Implementation of A Data Stream Management System
Design and Implementation of A Data Stream Management SystemDesign and Implementation of A Data Stream Management System
Design and Implementation of A Data Stream Management SystemErdi Olmezogullari
 
2.03.Asymptotic_analysis.pptx
2.03.Asymptotic_analysis.pptx2.03.Asymptotic_analysis.pptx
2.03.Asymptotic_analysis.pptxssuser1fb3df
 
The study on mining temporal patterns and related applications in dynamic soc...
The study on mining temporal patterns and related applications in dynamic soc...The study on mining temporal patterns and related applications in dynamic soc...
The study on mining temporal patterns and related applications in dynamic soc...Thanh Hieu
 

Similar to Sequential Pattern Mining and GSP (20)

Mining Top-k Closed Sequential Patterns in Sequential Databases
Mining Top-k Closed Sequential Patterns in Sequential Databases Mining Top-k Closed Sequential Patterns in Sequential Databases
Mining Top-k Closed Sequential Patterns in Sequential Databases
 
Data Mining: Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...
Data Mining:  Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...Data Mining:  Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...
Data Mining: Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...
 
FlinkML: Large Scale Machine Learning with Apache Flink
FlinkML: Large Scale Machine Learning with Apache FlinkFlinkML: Large Scale Machine Learning with Apache Flink
FlinkML: Large Scale Machine Learning with Apache Flink
 
IFTA2020 Kei Nakagawa
IFTA2020 Kei NakagawaIFTA2020 Kei Nakagawa
IFTA2020 Kei Nakagawa
 
Data_structure using C-Adi.pdf
Data_structure using C-Adi.pdfData_structure using C-Adi.pdf
Data_structure using C-Adi.pdf
 
DAA Notes.pdf
DAA Notes.pdfDAA Notes.pdf
DAA Notes.pdf
 
Unsupervised Collaborative Learning of Keyframe Detection and Visual Odometry...
Unsupervised Collaborative Learning of Keyframe Detection and Visual Odometry...Unsupervised Collaborative Learning of Keyframe Detection and Visual Odometry...
Unsupervised Collaborative Learning of Keyframe Detection and Visual Odometry...
 
C++ Notes PPT.ppt
C++ Notes PPT.pptC++ Notes PPT.ppt
C++ Notes PPT.ppt
 
lecture1.ppt
lecture1.pptlecture1.ppt
lecture1.ppt
 
Text categorization as a graph
Text categorization as a graphText categorization as a graph
Text categorization as a graph
 
Graph classification problem.pptx
Graph classification problem.pptxGraph classification problem.pptx
Graph classification problem.pptx
 
Text categorization as a graph
Text categorization as a graphText categorization as a graph
Text categorization as a graph
 
Text categorization
Text categorization Text categorization
Text categorization
 
Text categorization as a graph
Text categorization as a graph Text categorization as a graph
Text categorization as a graph
 
Text categorization as graph
Text categorization as graphText categorization as graph
Text categorization as graph
 
Text categorization as a graph
Text categorization as a graphText categorization as a graph
Text categorization as a graph
 
Automated Parameterization of Performance Models from Measurements
Automated Parameterization of Performance Models from MeasurementsAutomated Parameterization of Performance Models from Measurements
Automated Parameterization of Performance Models from Measurements
 
Design and Implementation of A Data Stream Management System
Design and Implementation of A Data Stream Management SystemDesign and Implementation of A Data Stream Management System
Design and Implementation of A Data Stream Management System
 
2.03.Asymptotic_analysis.pptx
2.03.Asymptotic_analysis.pptx2.03.Asymptotic_analysis.pptx
2.03.Asymptotic_analysis.pptx
 
The study on mining temporal patterns and related applications in dynamic soc...
The study on mining temporal patterns and related applications in dynamic soc...The study on mining temporal patterns and related applications in dynamic soc...
The study on mining temporal patterns and related applications in dynamic soc...
 

Recently uploaded

Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataBradBedford3
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...soniya singh
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfjoe51371421
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerThousandEyes
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxComplianceQuest1
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...harshavardhanraghave
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️anilsa9823
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AIABDERRAOUF MEHENNI
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationkaushalgiri8080
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionSolGuruz
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideChristina Lin
 

Recently uploaded (20)

Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdf
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanation
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 

Sequential Pattern Mining and GSP

  • 1. Sequential Pattern Mining University of Kashan Fall 2017 Hamidreza Mahdavipanah Pegah Hajian Narges Heydarzadeh Professor: Dr. S. M. Vahidipour
  • 2. Outlines ● Introduction ● Definitions ● GSP ● Constraints that GSP supports
  • 4. Studies on Sequential Pattern Mining ● First introduced by Agrawal and Srikant in 1995 ○ They presented three algorithms ■ AprioriAll ■ AprioriSome ■ DynamicSome ● Then in 1996 they presented GSP algorithm which was much faster than former algorithms and it also was generalized for more real life problems ● Pattern-growth methods: FreeSpan and PrefixSpan ● Mining closed sequential patterns: CloSpan ● ...
  • 5. Applications ● Customer shopping sequences ○ First buy computer, then CD-ROM, and then digital camera, within 3 month. ● Medical treatments ● Natural disasters (e.g., earthquakes) ● Stocks and markets ● DNA sequences and gene structures
  • 6. Problem Statement We are given a database D of customer transactions. Each transaction consists of the following fields: We want to find all large sequences that have a certain user-specified minimum support. It is similar to the frequent itemsets mining, but with consideration of ordering. customer-id transaction-time items
  • 7. Example of a database Customer Id Transaction Time Items Bought 1 June 25 ‘93 30 1 June 30 ‘93 90 2 June 10 ‘93 10, 20 2 June 15 ‘93 30 2 June 20 ‘93 40, 60, 70 3 June 25 ‘93 30, 50, 70 4 June 25 ‘93 30 4 June 30 ‘93 40, 70 4 June 25 ‘93 90 5 June 12 ‘93 90
  • 8. We convert DB to this form Customer Id Customer Sequence 1 <(30)(90)> 2 <(10 20) (30) (40 60 70)> 3 <(30 50 70)> 4 <(30) (40 70) (90)> 5 <(90)>
  • 10. Itemset and Sequence An itemset is a non-empty set of items. - We denote an itemset i by (i1 i2 … im ) A Sequence is an ordered list of items. - We denote a sequence s by <s1 s2 … sn >
  • 11. Subsequence and supersequence Given two sequences α = <a1 a2 … an > and β = <b1 b2 … bm >: ● α is called a subsequence of β, if there exists integers 1 ≤ j1 < j2 < … < jn ≤ m such that a1 ⊆ b1 , a2 ⊆ b2 , …, an ⊆ bjn ● β is called a supersequence of α Example: α=<(a b) d> and β=<(a b c) (d e)>
  • 12. Apriori Property of Sequential Patterns If a sequence S is not frequent, then none of the super-sequences of S is frequent. Example: <h b> is infrequent -> so do <h a b> and <(a h) b>
  • 13. GSP
  • 14. Outline of the GSP Method ● Initially, every item in DB is a candidate of length-1 ● For each level (i.e., sequences of length-k) do ○ Scan database to collect support count for each candidate sequence ○ Generate candidate length(k + 1) sequences from length-k frequent sequences using Apriori ○ Repeat until no frequent sequence or no candidate can be found
  • 15. Major strength of GSP is its candidate pruning by Apriori property
  • 16. Finding Length-1 Sequential Patterns ● Initial candidates: ○ <a>, <b>, <c>, <d>, <e>, <f>, <g>, <h> ● Scan database once, count support for candidates
  • 19. Generating Length-2 Candidates Using Apriori : 36 + 15 = 51 length-2 candidates Without Apriori : (8 * 8) + (8 * 7) / 2 = 92 length- candidates Apriori prunes 44.57% candidates
  • 20. Finding Length-2 Sequential Patterns ● Scan database one more time, collect support count for each length-2 candidate ● There are 19 length-2 candidates which pass the minimum support threshold ○ They are length-2 sequential patterns
  • 21. The GSP Mining Process
  • 22. The GSP Algorithm ● Take sequences in form of <x> as length-1 candidates ● Scan database once, find F1 , the set of length-1 sequential patterns ● Let k = 1; while Fk is not empty do ○ Form Ck + 1 the set of length-(k + 1) candidates from Fk ○ If Ck + 1 is not empty, scan database once, find Fk + 1 , the set of length(k + 1) sequential patterns ○ Let k = k + 1
  • 23. The Good, the Bad, and the Ugly ● The Good: benefits from the Apriori pruning which reduces search space ● The Bad: Scans the database multiple times ● The Ugly: Generates a huge set of candidates sequences
  • 24. Why GSP is called Generalized Sequential Pattern Mining? For practical use of SPM, in 1996 Agrawal and Srikant introduced three type of constraints that makes SPM problem more general and practical and since GSP support these constraints it is called Generalized sequential pattern mining.
  • 26. Time Constraint An ability for users to specify maximum and/or minimum time gaps between adjacent elements of the sequential pattern.
  • 27. Sliding Window That is, each element of the pattern can be contained in the union of the items bought in a set of transactions, as long as the difference between the maximum and minimum transaction-times is less than the size of a sliding time window.
  • 28. Taxonomy An ability to define a taxonomy (is-a hierarchy) over the items in the data.
  • 29. References ● R. Agrawal and R. Srikant. Mining Sequential Patterns.1995 ● R. Srikant and R. Agrawal. Mining Sequential Patterns.1996 ● Jian Pei, Jiawei Han, Behzad Mortazavi-Asl, Helen Pinto, PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth. 2001