Association Analysis
Association Analysis: Definition
Association analysis is the task of uncovering relationships among data. An association rule is a model that identifies how data items are associated with each other. Example: in retail sales, association rules are used to identify items that are frequently purchased together.
What is a rule?
A rule has the form IF (condition) THEN (result). Example: IF a customer purchases Coke, THEN the customer also purchases orange juice. The first part (the condition) is the rule body, and the second part (the result) is the rule head.
Strength of a rule: confidence
How certain is the rule? Confidence measures the certainty of a rule: it is the percentage of transactions containing all items stated in the condition that also contain all items in the result. Confidence(A ⇒ B) = P(B | A) = P(A, B) / P(A). Example: the rule "IF Coke THEN orange juice" has a confidence of 100%, i.e. every transaction that contains Coke also contains orange juice.
Strength of a rule: support
How often does the rule occur? Support measures the usefulness of a rule: it is the percentage of transactions that contain all items in the rule (body and head). Support(A ⇒ B) = P(A, B). Example: for the rule "IF Coke THEN orange juice", 2 of the 5 transactions contain both Coke and orange juice, so the support of the rule is 2/5 = 40%.
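A minimal Python sketch of both measures. The five transactions are hypothetical, chosen only so the numbers match the slides (2 of 5 transactions contain Coke and orange juice, and Coke never appears without orange juice); `support` and `confidence` are illustrative helper names, not a library API.

```python
# Hypothetical basket data chosen to reproduce the slide's numbers:
# 2 of 5 transactions contain both Coke and orange juice, and Coke never appears without OJ.
TRANSACTIONS = [
    {"coke", "orange juice", "milk"},
    {"coke", "orange juice"},
    {"orange juice", "bread"},
    {"milk", "bread"},
    {"bread"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(body, head, transactions):
    """P(head | body): share of the transactions containing `body` that also contain `head`."""
    body, head = set(body), set(head)
    body_count = sum(1 for t in transactions if body <= t)
    both_count = sum(1 for t in transactions if (body | head) <= t)
    return both_count / body_count if body_count else 0.0


print(support({"coke", "orange juice"}, TRANSACTIONS))       # 0.4
print(confidence({"coke"}, {"orange juice"}, TRANSACTIONS))  # 1.0
print(confidence({"orange juice"}, {"coke"}, TRANSACTIONS))  # 0.6666666666666666
```

Confidence is not symmetric: the reverse rule "IF orange juice THEN Coke" holds in only 2 of the 3 orange-juice transactions, even though both rules share the same 40% support.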
Association Rule Mining
Association rule mining is a two-step process.
1. Find all frequent k-itemsets, k = 1, 2, 3, … The set of all items in a rule is referred to as an itemset; an itemset that contains k items is a k-itemset. The occurrence frequency of a k-itemset is the number of transactions that contain all k items in the itemset. An itemset that satisfies a minimum support (or minimum occurrence frequency) is called a frequent itemset.
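The definitions in step 1 can be made concrete with a brute-force counter that enumerates every k-itemset in each transaction and keeps those whose occurrence frequency reaches a minimum support count. This is an illustrative sketch on a small hypothetical dataset, with hypothetical helper names; enumerating all subsets grows exponentially with transaction length, which is the cost the Apriori algorithm is designed to avoid.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions, for illustration only.
TRANSACTIONS = [
    {"coke", "orange juice", "milk"},
    {"coke", "orange juice"},
    {"orange juice", "bread"},
    {"milk", "bread"},
    {"bread"},
]


def k_itemset_counts(transactions, k):
    """Occurrence frequency of every k-itemset that appears in at least one transaction."""
    counts = Counter()
    for t in transactions:
        for combo in combinations(sorted(t), k):
            counts[frozenset(combo)] += 1
    return counts


def frequent_k_itemsets(transactions, k, min_count):
    """Keep the k-itemsets whose occurrence frequency meets the minimum support count."""
    return {s: c for s, c in k_itemset_counts(transactions, k).items() if c >= min_count}


print(frequent_k_itemsets(TRANSACTIONS, 1, min_count=2))
print(frequent_k_itemsets(TRANSACTIONS, 2, min_count=2))
```

With a minimum count of 2, all four single items are frequent, but among the 2-itemsets only {coke, orange juice} (count 2) qualifies.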
Association Rule Mining (continued)
2. Generate strong association rules from the frequent k-itemsets. Rules that satisfy both a minimum support threshold and a minimum confidence threshold are called strong rules.
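Apriori Algorithm: find all frequent k-itemsets
Apriori principle: if an itemset is frequent, then all of its subsets must also be frequent. Equivalently, if an itemset is infrequent, none of its supersets can be frequent, so those supersets never need to be counted.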
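A small Python check of the principle and its contrapositive, on hypothetical transactions. `count` and `all_subsets_frequent` are illustrative helpers; the second one is the test a candidate must pass before Apriori spends a database scan on it.

```python
from itertools import combinations

# Hypothetical transactions, for illustration only.
TRANSACTIONS = [
    {"coke", "orange juice", "milk"},
    {"coke", "orange juice"},
    {"orange juice", "bread"},
    {"milk", "bread"},
    {"bread"},
]


def count(itemset, transactions):
    """Number of transactions containing every item of `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def all_subsets_frequent(itemset, transactions, min_count):
    """True if every non-empty proper subset of `itemset` meets `min_count`."""
    items = sorted(itemset)
    return all(
        count(sub, transactions) >= min_count
        for r in range(1, len(items))
        for sub in combinations(items, r)
    )


# min_count = 2: {coke, orange juice} occurs twice, so it is frequent,
# and (as the principle requires) each of its subsets is frequent too.
print(count({"coke", "orange juice"}, TRANSACTIONS))                    # 2
print(all_subsets_frequent({"coke", "orange juice"}, TRANSACTIONS, 2))  # True

# min_count = 3: {coke} occurs only twice, so it is infrequent; by the
# contrapositive, {coke, orange juice} can be discarded without counting it.
print(all_subsets_frequent({"coke", "orange juice"}, TRANSACTIONS, 3))  # False
```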
Illustrating Apriori Principle
Apriori Algorithm
Method: Let k = 1 and generate the frequent itemsets of length 1. Repeat until no new frequent itemsets are identified: generate length-(k+1) candidate itemsets from the length-k frequent itemsets;
Contd… prune candidate itemsets that contain an infrequent subset of length k; count the support of each remaining candidate by scanning the database; eliminate the candidates that are infrequent, leaving only those that are frequent; then set k = k + 1.
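A compact Python sketch of the level-wise method above. It is one straightforward reading of the slides' steps, not a tuned implementation: the join step simply unions pairs of frequent k-itemsets, each level is counted with one pass over a small in-memory list, and the baskets are hypothetical.

```python
from itertools import combinations

# Hypothetical baskets, for illustration only.
BASKETS = [
    {"bread", "milk", "coke"},
    {"bread", "milk"},
    {"bread", "milk", "coke", "eggs"},
    {"milk", "coke"},
    {"bread", "eggs"},
]


def apriori(transactions, min_count):
    """Return {frozenset(itemset): support_count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]

    def keep_frequent(candidates):
        # One scan of the database counts all candidates of the current length.
        counts = dict.fromkeys(candidates, 0)
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_count}

    # k = 1: frequent itemsets of length 1.
    current = keep_frequent({frozenset([item]) for t in transactions for item in t})
    frequent = dict(current)
    k = 1
    while current:  # repeat until no new frequent itemsets are found
        # Generate: join pairs of frequent k-itemsets into (k+1)-itemset candidates.
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == k + 1}
        # Prune: drop candidates that have an infrequent k-subset (Apriori principle).
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k))
        }
        # Count and eliminate: keep only the candidates that are frequent.
        current = keep_frequent(candidates)
        frequent.update(current)
        k += 1
    return frequent


result = apriori(BASKETS, min_count=2)
for itemset, n in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

With a minimum count of 2 this finds nine frequent itemsets, the last being `['bread', 'coke', 'milk'] 2`. The candidates {bread, milk, eggs} and {bread, coke, eggs} are pruned without a database scan, because {milk, eggs} and {coke, eggs} each occur only once.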
Generate strong association rules from the frequent k-itemsets
For each frequent k-itemset l, generate all non-empty proper subsets s of l. For every such subset s, form the rule s ⇒ (l - s) and compute its confidence, support(l) / support(s). Output the rule if the minimum confidence threshold is satisfied. (The minimum support threshold is already satisfied, because l is frequent.)
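A Python sketch of this rule-generation step for a single frequent itemset, on hypothetical baskets. `strong_rules` is an illustrative name; it uses only non-empty proper subsets as rule bodies, since using the whole itemset would leave an empty head.

```python
from itertools import combinations

# Hypothetical baskets, for illustration only.
BASKETS = [
    {"bread", "milk", "coke"},
    {"bread", "milk"},
    {"bread", "milk", "coke", "eggs"},
    {"milk", "coke"},
    {"bread", "eggs"},
]


def count(itemset, transactions):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def strong_rules(itemset, transactions, min_conf):
    """Yield (body, head, confidence) for each rule body -> itemset - body
    built from a frequent `itemset` that meets `min_conf`."""
    itemset = frozenset(itemset)
    itemset_count = count(itemset, transactions)
    if itemset_count == 0:
        return  # the itemset never occurs, so no rules can be formed
    for r in range(1, len(itemset)):  # non-empty proper subsets only
        for body in combinations(sorted(itemset), r):
            body = frozenset(body)
            conf = itemset_count / count(body, transactions)
            if conf >= min_conf:
                yield body, itemset - body, conf


for body, head, conf in strong_rules({"bread", "milk", "coke"}, BASKETS, min_conf=0.7):
    print(sorted(body), "->", sorted(head), round(conf, 2))
```

Of the six candidate rules from {bread, milk, coke}, only {bread, coke} ⇒ {milk} (confidence 2/2) clears a 70% threshold; lowering `min_conf` to 0.6 would also admit the three rules with confidence 2/3.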
Multilevel association rules
It is difficult to find strong associations at very low, or primitive, levels of data. Few people may buy "IBM desktop computer" and "Sony b/w printer" together, but many people may purchase "computer" and "printer" together.
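A concept hierarchy defines a sequence of mappings from a set of low-level concepts to higher-level, more general concepts. Example: low-level concepts such as IBM, Microsoft, HP, … map to higher-level concepts such as computer, software, printer, and accessory.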
Steps to be followed
Use a top-down, progressive-deepening approach: first mine high-level frequent items, then mine their lower-level frequent items, and so on. At each level, the Apriori algorithm is used. Either use a uniform minimum support for all levels, or use a reduced minimum support at lower levels.
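A Python sketch of progressive deepening with a reduced minimum support at the lower level. The concept hierarchy, the transactions, and the thresholds (0.6 at the category level, 0.2 at the product level) are all hypothetical; to keep it short it counts only item pairs directly rather than running full Apriori at each level.

```python
from collections import Counter
from itertools import combinations

# Hypothetical concept hierarchy: low-level product -> high-level category.
HIERARCHY = {
    "IBM desktop computer": "computer",
    "Dell laptop computer": "computer",
    "Sony b/w printer": "printer",
    "HP color printer": "printer",
    "Microsoft Office": "software",
}

# Hypothetical transactions recorded at the product (lowest) level.
TRANSACTIONS = [
    {"IBM desktop computer", "Sony b/w printer"},
    {"Dell laptop computer", "HP color printer"},
    {"IBM desktop computer", "HP color printer", "Microsoft Office"},
    {"Dell laptop computer", "Sony b/w printer"},
    {"Microsoft Office"},
]


def frequent_pairs(transactions, min_sup):
    """Support of every item pair (as a sorted tuple) that meets min_sup."""
    counts = Counter(pair for t in transactions for pair in combinations(sorted(t), 2))
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_sup}


# Level 1: roll each item up to its category and mine with a high minimum support.
level1 = [{HIERARCHY[item] for item in t} for t in TRANSACTIONS]
high = frequent_pairs(level1, min_sup=0.6)

# Level 2: mine products with a reduced minimum support, keeping only pairs
# whose parent categories were frequent at level 1 (progressive deepening).
low = {
    pair: s
    for pair, s in frequent_pairs(TRANSACTIONS, min_sup=0.2).items()
    if tuple(sorted(HIERARCHY[item] for item in pair)) in high
}

print(high)                                       # {('computer', 'printer'): 0.8}
print(len(low), "product pairs at reduced support")
print(frequent_pairs(TRANSACTIONS, min_sup=0.6))  # {} : uniform high support finds nothing
```

This mirrors the slide's point: "computer" and "printer" co-occur in 80% of these baskets, while any specific product pair such as IBM desktop computer and Sony b/w printer appears in only 20%, so it surfaces only with the reduced lower-level threshold.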
Sequential Association Rule
Sequential association rules concern sequences of events. Examples: new homeowners purchase shower curtains before purchasing furniture; when a customer goes into a bank branch and asks for an account reconciliation, there is a good chance that he or she will close all of his or her accounts.
Sequential Association Rule
Each transaction must have two additional features: a time stamp or other sequencing information, to determine when transactions occurred relative to each other; and identifying information, such as an account number or ID number.
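A Python sketch of how the two extra features are used: the identifying field groups transactions into one sequence per customer, and the time stamp orders each sequence. The purchase log is hypothetical, and "sequence support" is assumed here to mean the fraction of customers whose history contains the first item followed, at a strictly later time, by the second.

```python
from collections import defaultdict
from datetime import date

# Hypothetical purchase log: (customer_id, time stamp, item).
PURCHASES = [
    ("c1", date(1999, 1, 5), "shower curtains"),
    ("c1", date(1999, 3, 2), "furniture"),
    ("c2", date(1999, 2, 1), "shower curtains"),
    ("c2", date(1999, 2, 20), "furniture"),
    ("c3", date(1999, 4, 9), "furniture"),
    ("c3", date(1999, 6, 1), "shower curtains"),
    ("c4", date(1999, 5, 5), "shower curtains"),
]


def sequences_by_customer(purchases):
    """Group purchases by the identifying field, then order each group by time stamp."""
    seqs = defaultdict(list)
    for customer, when, item in purchases:
        seqs[customer].append((when, item))
    for events in seqs.values():
        events.sort()
    return seqs


def sequence_support(purchases, first, then):
    """Fraction of customers who bought `first` and bought `then` at a strictly later time."""
    seqs = sequences_by_customer(purchases)
    hits = 0
    for events in seqs.values():
        first_times = [when for when, item in events if item == first]
        if first_times and any(
            item == then and when > first_times[0] for when, item in events
        ):
            hits += 1
    return hits / len(seqs)


print(sequence_support(PURCHASES, "shower curtains", "furniture"))  # 0.5
print(sequence_support(PURCHASES, "furniture", "shower curtains"))  # 0.25
```

Order matters: customer c3 bought both items, but in the opposite order, so c3 counts only toward the reverse pattern.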
Some important parameters
Duration: the duration may be the entire available sequence in the database, or a user-selected subsequence, such as the year 1999. Event folding window: a set of events occurring within a specified period of time, such as within the same day, can be viewed as occurring together.
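A Python sketch of both parameters, on a hypothetical event log for one account: `restrict_duration` keeps a user-selected period (the year 1999), and `fold_events` merges events in the same window into a single event set. Both names are illustrative; the folding window is assumed to be a fixed calendar block (one day by default), not a sliding window.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical event log for a single account: (time stamp, event).
EVENTS = [
    (datetime(1998, 12, 30, 9, 0), "deposit"),
    (datetime(1999, 1, 4, 9, 15), "balance inquiry"),
    (datetime(1999, 1, 4, 16, 40), "reconciliation request"),
    (datetime(1999, 1, 9, 11, 5), "close account"),
]


def restrict_duration(events, start, end):
    """Keep only events inside the user-selected duration [start, end)."""
    return [(ts, ev) for ts, ev in events if start <= ts < end]


def fold_events(events, window_days=1):
    """Merge events that fall into the same fixed window_days-long calendar block
    into one event set; window_days=1 means 'same day'."""
    buckets = defaultdict(set)
    for ts, ev in events:
        buckets[ts.date().toordinal() // window_days].add(ev)
    return [buckets[key] for key in sorted(buckets)]


in_1999 = restrict_duration(EVENTS, datetime(1999, 1, 1), datetime(2000, 1, 1))
folded = fold_events(in_1999)
print(folded)
```

The 1998 deposit falls outside the duration, and the two events on 4 January 1999 are folded into one event set, leaving a two-element sequence. A sliding window would instead group events by their distance from each other rather than by calendar block.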
Some important parameters (continued)
Interval: the interval between events in the discovered pattern. An interval of 0 means finding strictly consecutive sequences; min_int ≤ interval ≤ max_int means finding patterns whose events are separated by at least min_int and at most max_int; interval = c means finding patterns with an exact interval of c.
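A Python sketch of the interval constraint. It assumes the interval is measured as the number of (already folded) event sets skipped between the two pattern elements, which makes interval 0 mean strictly consecutive as stated on the slide; a time-based gap in days would be an equally valid alternative. The sequence and the function name are hypothetical.

```python
# Hypothetical folded sequence for one account; each element is an event set.
SEQUENCE = [
    {"balance inquiry", "reconciliation request"},
    {"deposit"},
    {"close account"},
]


def occurs_with_interval(sequence, first, then, min_int, max_int):
    """True if an event set containing `first` is followed by one containing `then`,
    with between min_int and max_int event sets skipped in between (inclusive)."""
    for i, events in enumerate(sequence):
        if first not in events:
            continue
        # Skipping s event sets means looking at position i + 1 + s.
        for j in range(i + 1 + min_int, min(len(sequence), i + 2 + max_int)):
            if then in sequence[j]:
                return True
    return False


pattern = ("reconciliation request", "close account")
print(occurs_with_interval(SEQUENCE, *pattern, min_int=0, max_int=0))  # False: not consecutive
print(occurs_with_interval(SEQUENCE, *pattern, min_int=0, max_int=2))  # True
print(occurs_with_interval(SEQUENCE, *pattern, min_int=1, max_int=1))  # True: exact interval c = 1
```

An exact interval c is simply the case `min_int = max_int = c`, so all three forms on the slide share one function.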
Some Practical Issues
Time window of transactions; level of aggregation; level of support and confidence.
Time window of transactions
Select a time window for the transactions that covers at least two product cycles. For example, if customers purchase a product every six months or more often, select a 12-month window of customer transaction data. For frequently purchased products, a short time window is sufficient; for low-frequency items, a longer time window is necessary.
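A Python sketch of the two-cycle rule of thumb: derive the window length from the purchase cycle, then keep only transactions inside that many calendar months. The dated transactions and helper names are hypothetical, and windows are assumed to be whole calendar months ending with the month of `end`.

```python
from datetime import date


def window_months(purchase_cycle_months, cycles=2):
    """Rule of thumb from the slide: the window should cover at least two product cycles."""
    return purchase_cycle_months * cycles


def month_index(d):
    """Months since year 0, so calendar months can be compared with plain integers."""
    return d.year * 12 + (d.month - 1)


def in_window(transactions, end, months):
    """Keep (date, items) transactions in the `months` calendar months ending with `end`'s month."""
    last = month_index(end)
    return [(d, items) for d, items in transactions if last - months < month_index(d) <= last]


# Hypothetical dated transactions.
HISTORY = [
    (date(1998, 11, 20), {"razor"}),
    (date(1999, 1, 3), {"razor", "blades"}),
    (date(1999, 7, 14), {"blades"}),
    (date(1999, 12, 30), {"razor", "shaving foam"}),
]

months = window_months(6)  # a product bought every six months -> 12-month window
recent = in_window(HISTORY, end=date(1999, 12, 31), months=months)
print(months, len(recent))  # 12 3
```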
Level of aggregation
If product codes in the data are too specific (for example, distinguished by product details such as size and flavour), few associations will be discovered. Group products into categories according to the product hierarchy, or create a new level manually.
Level of support and confidence
Start with a high minimum support and gradually reduce it. Set the minimum confidence to around 50% to reduce the number of rules generated.
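A Python sketch of "start high and gradually reduce" for support. It steps the threshold down in whole support counts rather than percentages, which avoids floating-point drift; the baskets, the target of five itemsets, and the helper names are hypothetical, and exhaustive counting stands in for Apriori to keep it short. The 50% confidence threshold would then be applied when rules are generated from the itemsets found.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets, for illustration only.
BASKETS = [
    {"bread", "milk", "coke"},
    {"bread", "milk"},
    {"bread", "milk", "coke", "eggs"},
    {"milk", "coke"},
    {"bread", "eggs"},
]


def frequent_itemsets(transactions, min_count):
    """Brute-force count of every itemset; keep those occurring at least min_count times."""
    counts = Counter()
    for t in transactions:
        for r in range(1, len(t) + 1):
            for combo in combinations(sorted(t), r):
                counts[frozenset(combo)] += 1
    return {s: c for s, c in counts.items() if c >= min_count}


def tune_min_count(transactions, target):
    """Start at the highest possible support count and lower it one step at a time
    until at least `target` frequent itemsets are found."""
    for min_count in range(len(transactions), 0, -1):
        found = frequent_itemsets(transactions, min_count)
        if len(found) >= target:
            return min_count, found
    return None, {}


min_count, found = tune_min_count(BASKETS, target=5)
print(min_count, len(found))  # 3 5
```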
Conclusion
Association analysis rules, such as multilevel and sequential association rules, were studied. The Apriori algorithm was described in detail, and various practical issues in association rule mining were analyzed.
Visit more self help tutorials Pick a tutorial of your choice and browse through it at your own pace. The tutorials section is free, self-guiding and will not involve any additional support. Visit us at www.dataminingtools.net