VAIBHAV DHATTARWAL
CSE IDD
5TH YEAR
08211018
UNDER THE GUIDANCE OF
DR.DURGA TOSHNIWAL
Artificial Neural Networks based Data
Mining Techniques
Introduction
 An introduction to the Knowledge Discovery in Databases (KDD)
process and the components of the data mining process.
 The various Data Mining Techniques and a brief
description of these techniques.
 A brief overview of artificial neural networks and their
position as an applicable tool in data mining.
 Applications of the techniques available to data mining
practitioners, including Artificial Neural Networks,
Regression, and Decision Trees.
Presentation Overview
 KDD Process
 Data Mining
 CRISP-DM Model
 Mining Techniques
 Artificial Neural Networks
 Back Propagation Algorithm
 Applications
 Conclusion
Knowledge Discovery in Databases (KDD) Process
 The Knowledge Discovery in Databases (KDD) process is
commonly defined with the stages:
 Selection
 Pre-processing
 Transformation
 Data Mining
 Interpretation/Evaluation.
Data Mining
 Data mining is the term used to describe the process of
extracting value from a database. A data warehouse is a
central repository where that data is stored. The type of data stored
depends largely on the industry and the company.
 Data mining (the analysis step of the "Knowledge Discovery in
Databases" process, or KDD), is the process that attempts to
discover patterns in large data sets.
 It utilizes methods at the intersection of artificial
intelligence, machine learning, statistics, and database
systems. The overall goal of the data mining process is to
extract information from a data set and transform it into an
understandable structure for further use.
Data Mining Process : Steps Involved
 Data cleaning: The task of this step is to remove noise
and inconsistent data.
 Data integration: In this step, multiple data sources are
combined into one integrated collection of data.
 Data selection: All the data relevant to the analysis task
is retrieved from the database in this step.
 Data transformation: The data is transformed or
consolidated into forms appropriate for mining by
performing summary or aggregation operations.
Data Mining Process : Steps Involved
 Data mining: The critical step where intelligent methods
are applied in order to extract data patterns.
 Pattern evaluation: This step identifies the
truly interesting patterns representing knowledge,
based on certain interestingness measures.
 Knowledge presentation: In the final step, various
visualization and knowledge representation techniques
are used to present the mined knowledge to the user.
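The preparation steps above (cleaning, integration, selection, transformation) can be sketched on a toy data set. This is only an illustration: the records, the field names (`id`, `amount`, `region`) and the helper functions are hypothetical and are not taken from the presentation.

```python
# Hypothetical toy data: a sales source and a customer source keyed by id.
sales = [
    {"id": 1, "amount": 120.0},
    {"id": 2, "amount": None},   # noisy record: the amount is missing
    {"id": 3, "amount": 80.0},
    {"id": 3, "amount": 20.0},
]
customers = {1: {"region": "north"}, 2: {"region": "south"}, 3: {"region": "north"}}

def clean(rows):
    """Data cleaning: drop records with missing values."""
    return [r for r in rows if r["amount"] is not None]

def integrate(rows, lookup):
    """Data integration: join each record with attributes from a second source."""
    return [{**r, **lookup[r["id"]]} for r in rows if r["id"] in lookup]

def select(rows, region):
    """Data selection: keep only the records relevant to the analysis task."""
    return [r for r in rows if r["region"] == region]

def transform(rows):
    """Data transformation: aggregate the amounts per customer id."""
    totals = {}
    for r in rows:
        totals[r["id"]] = totals.get(r["id"], 0.0) + r["amount"]
    return totals

prepared = transform(select(integrate(clean(sales), customers), "north"))
print(prepared)
```

The output of `transform` is the kind of consolidated table that would then be handed to the data mining step.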
Data Mining Functions
 Classification: It infers the defining characteristics of a
certain group.
 Clustering: It identifies groups of items that share a
particular characteristic.
 Association: It identifies relationships between events
that occur at one time.
 Sequencing: It is similar to association, except that the
relationship exists over a period of time.
 Forecasting: It estimates future values based on patterns
within large sets of data.
Data Mining : Data Types
 Data Mining is performed on the following types of data :
 Relational databases
 Data warehouses
 Transactional databases
 Advanced DB and information repositories
Cross-Industry Standard Process for Data Mining
(CRISP-DM) Model
 Business understanding - In this phase, the business objectives must be
understood clearly by finding out what the client really wants to achieve.
Next, we have to assess the situation by finding out about the resources,
assumptions, constraints and other important factors. Then, from the
business objectives and the current situation, we need to define data mining goals that
achieve the business objectives within the current situation.
 Data understanding - This phase starts with initial data collection from
available sources to get familiar with the data. Data loading and data integration
are carried out to complete the data collection. Then, the data need to
be explored by tackling the data mining questions, which can be addressed
using querying, reporting and visualization. Finally, data quality must be verified: we check whether
the acquired data is complete and identify any missing values
in it.
 Data preparation - Data preparation typically consumes the largest share of
project time (about 90% by the estimate quoted here). The outcome of this phase is the final data set.
Once the available data sources are identified, the data need to be selected,
cleaned, constructed and formatted into the desired form.
 Modelling - Several modelling techniques are selected for the
prepared dataset. A test scenario must be generated to validate the
model’s quality. One or more models are created by running the modelling
tool on the prepared dataset. The created models need to be assessed
carefully to ensure that they meet the business objectives.
 Evaluation - In the evaluation phase, the model results must be evaluated
in the context of the business objectives set in the first phase. In this phase, new
business requirements may be raised because new patterns have been
discovered in the model results or because of other factors. Gaining business
understanding is an iterative process in data mining. The final decision
on whether to move to the deployment phase must be made in this step.
 Deployment - The knowledge or information gained through the data mining
process needs to be presented in such a way that it can be used whenever
it is needed. From the project point of view, the final evaluation
needs to summarize the project experiences and review the project to see
what needs to be improved.
Data Mining Techniques : Classification
 Classification is the most commonly applied data mining technique,
which employs a set of pre-classified examples to develop a model
that can classify the population of records at large.
 Classification is a classic data mining technique based on machine
learning. It is used to assign each item in a set
of data to one of a predefined set of classes or groups.
 The data classification process involves two steps: learning and classification.
 In learning, the training data are analyzed by a classification algorithm.
 In classification, test data are used to estimate the accuracy of the classification
rules. If the accuracy is acceptable, the rules can be applied to new data
tuples.
 Classification methods make use of mathematical techniques such
as decision trees, linear programming, neural networks and statistics.
In classification, we build software that can learn how to classify
the data items into groups.
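The two steps (learning on training data, then estimating accuracy on test data) can be illustrated with a very small classifier. The sketch below uses a one-level decision tree (a "decision stump"), chosen here only for brevity; the data, the labels and the 0.9 accuracy threshold are hypothetical.

```python
def learn_stump(train):
    """Learning step: choose the single-feature threshold rule with the fewest
    training errors. Assumes exactly two class labels and numeric features."""
    labels = sorted({y for _, y in train})
    best = None
    for f in range(len(train[0][0])):
        values = sorted({x[f] for x, _ in train})
        # Candidate thresholds are midpoints between consecutive observed values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            for below, above in (labels, labels[::-1]):
                errors = sum(1 for x, y in train
                             if (below if x[f] <= t else above) != y)
                if best is None or errors < best[0]:
                    best = (errors, f, t, below, above)
    return best[1:]  # (feature index, threshold, label below, label above)

def classify(rule, x):
    """Apply the learned rule to one data tuple."""
    f, t, below, above = rule
    return below if x[f] <= t else above

def accuracy(rule, test):
    """Classification step: estimate accuracy on held-out test data."""
    return sum(1 for x, y in test if classify(rule, x) == y) / len(test)

# Hypothetical pre-classified examples: (features, class label).
train = [((1.0, 1.0), "low"), ((1.5, 2.0), "low"),
         ((8.0, 8.0), "high"), ((9.0, 7.5), "high")]
test = [((2.0, 1.0), "low"), ((7.0, 9.0), "high"), ((1.0, 2.5), "low")]

rule = learn_stump(train)
acc = accuracy(rule, test)
if acc >= 0.9:  # hypothetical acceptance threshold
    print(classify(rule, (8.5, 8.5)))  # apply the rule to a new data tuple
```

Only if the estimated accuracy is acceptable is the rule applied to new tuples, which mirrors the second step described above.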
Data Mining Techniques : Clustering
 Clustering can be defined as the identification of similar classes of objects.
 Clustering is a data mining technique that automatically forms meaningful or useful clusters of
objects that have similar characteristics. By using
clustering techniques we can further identify dense and sparse regions in object
space and can discover overall distribution patterns and correlations among data
attributes.
 Because the classification approach can become costly, clustering can be
used as a pre-processing approach for attribute subset selection and classification.
 In clustering, the classes are not known in advance: they are discovered from the data and objects are then put in
them, whereas in classification objects are assigned to predefined classes.
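As an illustration of classes being discovered rather than predefined, here is a minimal sketch of k-means, one common clustering algorithm (the presentation does not name a specific one). The points and the choice of initial centroids are hypothetical.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means: alternate between assigning points to the nearest
    centroid and moving each centroid to the mean of its cluster."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Index of the centroid with the smallest squared distance to p.
            idx = min(range(len(centroids)),
                      key=lambda i: sum((a - b) ** 2
                                        for a, b in zip(p, centroids[i])))
            clusters[idx].append(p)
        # Recompute centroids; an empty cluster keeps its old centroid.
        centroids = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else old
            for cl, old in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical 2-D objects forming two dense regions.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 7.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, centroids=[points[0], points[3]])
print(clusters)
```

No class labels are supplied; the two groups emerge from the distances between the objects alone.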
Data Mining Techniques : Regression
 Regression analysis helps in understanding how the typical value of the
dependent variable changes when any one of the independent variables is
varied, while the other independent variables are held fixed. Most
commonly, it estimates the conditional expectation of the dependent variable given
the independent variables, that is, its average value when the
independent variables are fixed.
 In all cases, the estimation target is a function of the independent
variables called the regression function. In regression analysis, it is also of
interest to characterize the variation of the dependent variable around the
regression function, which can be described by a probability distribution.
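A minimal sketch of these ideas for the simplest case, one independent variable and a straight-line regression function fitted by ordinary least squares. The data values are hypothetical, and the linear form is an assumption made for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical observations of an independent (xs) and dependent (ys) variable.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]

slope, intercept = fit_line(xs, ys)

# Variation of the dependent variable around the regression function.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
residual_variance = sum(r * r for r in residuals) / (len(xs) - 2)
print(slope, intercept, residual_variance)
```

The fitted line plays the role of the regression function, and the residual variance summarizes the spread of the dependent variable around it.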
Data Mining Techniques : Association Rules
 Association is one of the best-known data mining techniques.
In association, a pattern is discovered based on the relationship
of a particular item to other items in the same transaction.
 Association and correlation analysis is usually used to find frequent
item sets in large data sets. This type of finding helps
businesses to make certain decisions, such as catalogue
design, cross marketing and customer shopping behaviour
analysis.
 Association rules are usually required to satisfy a user-specified
minimum support and a user-specified minimum
confidence at the same time.
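Support and confidence can be computed directly from a list of transactions. The sketch below uses the usual definitions (support as the fraction of transactions containing an item set, confidence as support of the whole rule divided by support of its antecedent); the transactions and thresholds are hypothetical.

```python
# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Support of the full rule divided by support of the antecedent."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))

def rule_holds(antecedent, consequent, transactions, min_support, min_confidence):
    """A rule is kept only if it meets both user-specified thresholds."""
    return (support(antecedent | consequent, transactions) >= min_support
            and confidence(antecedent, consequent, transactions) >= min_confidence)

# Rule: bread -> milk
print(support({"bread", "milk"}, transactions))
print(confidence({"bread"}, {"milk"}, transactions))
print(rule_holds({"bread"}, {"milk"}, transactions, 0.5, 0.6))
```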
Data Mining Techniques : Neural Networks
 An Artificial Neural Network (ANN), usually called neural
network (NN), is a mathematical model or computational
model that is inspired by the structure and functional aspects
of biological neural networks.
 A neural network consists of an interconnected group of
artificial neurons, and it processes information using a
connection based approach to computation.
 In most cases an ANN is an adaptive system that changes its
structure based on external or internal information that flows
through the network during the learning phase.
 Modern neural networks are non-linear statistical data
modelling tools. They are usually used to model complex
relationships between inputs and outputs or to find patterns
in data.
Artificial Neural Network
 Neural networks are non-linear statistical data modelling
tools. They can be used to model complex relationships
between inputs and outputs; or to find patterns in data and to
infer rules from them.
 Neural networks are useful in providing information on
associations, classifications, clusters, and forecasting. Using
neural networks as a tool, data warehousing firms can harvest
information from datasets in the data mining process.
 Neural networks are used to estimate sampled functions
when we do not know the form of the functions.
 These two abilities, pattern recognition and function estimation,
make neural networks a widely used tool in data mining.
Because they act as model-free estimators (no functional form has to be
assumed in advance) and can serve both purposes, neural
networks serve data mining in a variety of ways.
Feed Forward Neural Network
 Input data is presented to the network and propagated through the
network until it reaches the output layer. This forward process
produces a predicted output.
 The predicted output is compared with the actual (target) output and an
error value for the network is calculated.
 The neural network is then trained by supervised learning, which in most
cases means back propagation. Back propagation is
a learning algorithm for adjusting the weights. It starts with the
weights between the output layer processing elements (PEs) and the last hidden layer PEs
and works backwards through the network.
 Once back propagation has finished, the forward process starts
again, and this cycle is continued until the error between predicted
and actual outputs falls to an acceptable level.
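The forward process and the error calculation can be sketched for a tiny network with two inputs, two hidden units and one output. The sigmoid activation, the layout of the weight rows (bias first) and all numeric values are assumptions made for this example.

```python
import math

def sigmoid(z):
    """Logistic activation function (an assumed choice of activation)."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, hidden_weights, output_weights):
    """Propagate input x through one hidden layer to the output layer.
    Each weight row holds the bias first, then one weight per incoming value."""
    hidden = [sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))
              for w in hidden_weights]
    output = [sigmoid(w[0] + sum(wi * hi for wi, hi in zip(w[1:], hidden)))
              for w in output_weights]
    return hidden, output

# Hypothetical weights for a 2-2-1 network.
hidden_weights = [[0.0, 0.5, -0.5], [0.0, -0.5, 0.5]]
output_weights = [[0.0, 1.0, -1.0]]

hidden, output = forward([1.0, 0.0], hidden_weights, output_weights)
target = [1.0]
# Error between the actual (target) output and the predicted output.
error = [t - o for t, o in zip(target, output)]
print(output, error)
```

Training then consists of adjusting the weights so that this error shrinks over repeated forward passes.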
Feed Forward Neural Network : Training
Back Propagation Algorithm
 Initialize the weights in the network
 Do
 For each example e in the training set
 O = neural-net-output(network, e) ; forward pass
 T = teacher output for e
 Calculate error (T - O) at the output units
 Compute delta_wh for all weights from hidden layer to output layer ;
backward pass
 Compute delta_wi for all weights from input layer to hidden layer ;
backward pass continued
 Update the weights in the network
 Until all examples are classified correctly or the stopping criterion is
satisfied
 Return the network
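The pseudocode above can be turned into a small runnable sketch for a network with one hidden layer. Several details are assumptions not stated on the slide: sigmoid activations, a squared-error criterion, per-example (online) updates, a learning rate of 0.5, random initial weights in [-0.5, 0.5], and the logical OR function as toy training data.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, w_output):
    """Forward pass; each weight row is [bias, w1, w2, ...]."""
    hidden = [sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))
              for w in w_hidden]
    output = [sigmoid(w[0] + sum(wi * hi for wi, hi in zip(w[1:], hidden)))
              for w in w_output]
    return hidden, output

def total_error(data, w_hidden, w_output):
    """Sum of squared errors over the whole training set."""
    return sum((t - o) ** 2
               for x, target in data
               for t, o in zip(target, forward(x, w_hidden, w_output)[1]))

def train(data, n_hidden=2, rate=0.5, max_epochs=5000, tolerance=0.01, seed=0):
    rng = random.Random(seed)
    n_in, n_out = len(data[0][0]), len(data[0][1])
    # Initialize the weights in the network.
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                for _ in range(n_hidden)]
    w_output = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
                for _ in range(n_out)]
    for _ in range(max_epochs):
        for x, target in data:
            hidden, output = forward(x, w_hidden, w_output)  # forward pass
            # Error (T - O) at the output units, scaled by the sigmoid slope.
            d_out = [(t - o) * o * (1 - o) for t, o in zip(target, output)]
            # Backward pass continued: deltas for the hidden units.
            d_hid = [h * (1 - h) * sum(d_out[k] * w_output[k][j + 1]
                                       for k in range(n_out))
                     for j, h in enumerate(hidden)]
            # Update hidden-to-output weights, then input-to-hidden weights.
            for k in range(n_out):
                w_output[k][0] += rate * d_out[k]
                for j in range(n_hidden):
                    w_output[k][j + 1] += rate * d_out[k] * hidden[j]
            for j in range(n_hidden):
                w_hidden[j][0] += rate * d_hid[j]
                for i in range(n_in):
                    w_hidden[j][i + 1] += rate * d_hid[j] * x[i]
        if total_error(data, w_hidden, w_output) < tolerance:  # stopping criterion
            break
    return w_hidden, w_output

# Toy training set: logical OR (inputs, teacher output).
or_data = [([0.0, 0.0], [0.0]), ([0.0, 1.0], [1.0]),
           ([1.0, 0.0], [1.0]), ([1.0, 1.0], [1.0])]
w_hidden, w_output = train(or_data)
print(total_error(or_data, w_hidden, w_output))
```

Calling `train(or_data, max_epochs=0)` returns the untrained initial weights for the same seed, which makes it easy to compare the error before and after training.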
Back Propagation Algorithm
 Phase 1: Propagation
 Every propagation involves the following steps:
 Forward propagation of a training pattern's input through the neural
network.
 Backward propagation of the propagation's output activations through the
neural network, using the training pattern's target, in order to generate the
deltas of all output and hidden neurons.
 Phase 2: Weight update
 For each weight (synapse), the following steps are used:
 Multiply its output delta and input activation to get the gradient of the
weight.
 Move the weight in the opposite direction of the gradient by subtracting a
fraction of the gradient (the learning rate) from the weight.
 Repeat phase 1 and 2 until the performance of the network is
satisfactory.
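The two phases can be written compactly as equations. The notation is introduced here for illustration and is not from the slides: $o_i$ is the activation feeding the weight $w_{ij}$, $\delta_j$ is the delta of the neuron receiving it, $t_j$ is the target, and $\eta$ is the learning rate. The delta formulas assume sigmoid activations and a squared-error criterion.

```latex
% Assumed error criterion: E = \tfrac{1}{2} \sum_j (o_j - t_j)^2
\begin{align*}
\delta_j &= (o_j - t_j)\, o_j (1 - o_j)
  && \text{(output neuron)} \\
\delta_j &= \Big( \sum_k w_{jk}\, \delta_k \Big)\, o_j (1 - o_j)
  && \text{(hidden neuron)} \\
\frac{\partial E}{\partial w_{ij}} &= \delta_j\, o_i
  && \text{(output delta times input activation)} \\
w_{ij} &\leftarrow w_{ij} - \eta\, \delta_j\, o_i
  && \text{(step against the gradient)}
\end{align*}
```

Writing the output error as $(t_j - o_j)$ instead, as in the pseudocode's $(T - O)$, flips the sign of $\delta_j$, so the update becomes an addition; the two forms are equivalent.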
Applications : Spatial Data Mining
Spatial Data Mining
 Spatial Data Cube Construction
 As with relational data, we can integrate spatial data to
construct a data warehouse that facilitates spatial data
mining. A spatial data warehouse is a subject-oriented,
integrated, time variant and non-volatile collection of both
spatial and non-spatial data in support of spatial data mining
and spatial-data-related decision-making processes.
 There are three types of dimensions in a spatial data cube:
 A non-spatial dimension
 A spatial-to-non-spatial dimension
 A spatial-to-spatial dimension
Applications : Text Mining
Web Mining
 The World Wide Web serves as a huge, widely distributed, global
information service centre for news, advertisements, consumer
information, financial management, education, government, e-
commerce, and many other information services. The Web also
contains a rich and dynamic collection of hyperlink information and
Web page access and usage information, providing rich sources for
data mining.
 Challenges:
 The Web seems to be too huge for effective data warehousing and data mining
 The complexity of Web pages is far greater than that of any traditional text
document collection
 The Web is a highly dynamic information source
 The Web serves a broad diversity of user communities
 Only a small portion of the information on the Web is truly relevant or useful
 Besides mining Web contents and Web linkage structures, another
important task for Web mining is Web usage mining.
Applications : Intrusion Detection
 The security of our computer systems and data is at continual
risk. The extensive growth of the Internet and increasing
availability of tools and tricks for intruding and attacking
networks have prompted intrusion detection to become a
critical component of network administration. Some areas in
which data mining technology is being applied or further
developed for intrusion detection:
 Development of data mining algorithms for intrusion detection
 Association and correlation analysis, and aggregation to help select and
build discriminating attributes
 Analysis of stream data
 Distributed data mining
 Visualization and querying tools
Conclusions
 Although the basic steps in data mining include data cleaning, selection and
transformation, the functions and techniques are applied only in the vital step where
intelligent methods are used to detect patterns.
 The Cross-Industry Standard Process for Data Mining (CRISP-DM) model is an effective
process model which considers business requirements at every step.
 Classification and clustering techniques are popular and easily applicable in data
mining; however, classification requires prior information about the characteristics of each class.
 Artificial Neural Networks can be deployed to detect patterns and make predictions,
which makes them capable tools in data mining. A feed forward neural network can be
trained with the back propagation algorithm.
 The application of data mining techniques along with GIS techniques offers an
opportunity to explore various aspects of Spatial Data Mining.
 The growth of data available for processing, including multimedia elements and the
World Wide Web, leads to greater opportunities for data mining techniques. However,
pre-processing, selection and transformation need to be handled first.
Seminar Presentation

  • 1. VAIBHAV DHATTARWAL CSE IDD 5TH YEAR 08211018 UNDER THE GUIDANCE OF DR.DURGA TOSHNIWAL Artificial Neural Networks based Data Mining Techniques
  • 2. Introduction  Introduction to Knowledge Discovery in Databases Process and components of the Data Mining Process.  The various Data Mining Techniques and a brief description of these techniques.  A brief overview of artificial neural networks and their position as an applicable tool in data mining.  Applications of the techniques available to data mining practitioners, including Artificial Neural Networks, Regression, and Decision Trees.
  • 3. Presentation Overview  KDD Process  Data Mining  CRISP-DM Model  Mining Techniques  Artificial Neural Networks  Back Propagation Algorithm  Applications  Conclusion
  • 4. Knowledge Discovery in Databases(KDD) Process
  • 5. Knowledge Discovery in Databases(KDD) Process  The Knowledge Discovery in Databases (KDD) process is commonly defined with the stages:  Selection  Pre-processing  Transformation  Data Mining  Interpretation/Evaluation.
  • 6. Data Mining  Data mining is the term used to describe the process of extracting value from a database. A Data-warehouse is a location where information is stored. The type of data stored depends largely on the type of industry and the company.  Data mining (the analysis step of the "Knowledge Discovery in Databases" process, or KDD), is the process that attempts to discover patterns in large data sets.  It utilizes methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use.
  • 7. Data Mining Process : Steps Involved  Data cleaning The task of this step is to remove noise and inconsistent data.  Data integration In this step, multiple data sources like the ones mentioned in the section above can be combined to an integrated collection of data.  Data selection All the data relevant to the analysis task is retrieved from the database in this step.  Data transformation The data is transformed or consolidated into forms appropriate for mining by performing summary or aggregation operations.
  • 8. Data Mining Process : Steps Involved  Data mining The critical step where intelligent methods are applied in order to extract data patterns.  Pattern evaluation This step is deployed to identify the truly interesting patterns representing knowledge based on certain measures.  Knowledge presentation In the final step, various visualization and knowledge representation techniques are used to present the mined knowledge to the user.
  • 9. Data Mining Functions  Classification: It infers the defining characteristics of a certain group.  Clustering: It identifies groups of items that share a particular characteristic.  Association: It identifies relationships between events that occur at one time.  Sequencing: It is similar to association, except that the relationship exists over a period of time.  Forecasting: It estimates future values based on patterns within large sets of data.
  • 10. Data Mining : Data Types  Data Mining is performed on the following types of data :  Relational databases  Data warehouses  Transactional databases  Advanced DB and information repositories
  • 11. Cross-Industry Standard Process for Data Mining (CRISP-DM) Model
  • 12. Cross-Industry Standard Process for Data Mining (CRISP-DM) Model  Business understanding - In this phase, the business objectives must be understood clearly by finding out what the client really wants to achieve. Next, we have to assess the situation by finding out about the resources, assumptions, constraints and other important factors. Then, from the business objectives and the current situation, we need to create data mining goals that achieve the business objectives within that situation.  Data understanding - This phase starts with initial data collection from the available sources to get familiar with the data. Data loading and data integration are carried out to ensure successful data collection. Then the data needs to be explored by tackling the data mining questions, which can be addressed using querying, reporting and visualization. Finally, we must check whether the acquired data is complete and identify any missing values in it.  Data preparation - Data preparation normally consumes about 90% of the project time. The outcome of the data preparation phase is the final data set. Once the available data sources are identified, the data needs to be selected, cleaned, constructed and formatted into the desired form.
  • 13. Cross-Industry Standard Process for Data Mining (CRISP-DM) Model  Modelling - Several modelling techniques are selected for use on the prepared dataset. A test scenario must be generated to validate the model’s quality. One or more models are created by running the modelling tool on the prepared dataset. The created models need to be assessed carefully to confirm that they meet the business initiatives.  Evaluation - In the evaluation phase, the model results must be evaluated in the context of the business objectives set in the first phase. New business requirements may be raised in this phase because new patterns have been discovered in the model results or because of other factors. Gaining business understanding is an iterative process in data mining. The final decision on whether to move to the deployment phase must be made in this step.  Deployment - The knowledge or information gained through the data mining process needs to be presented in such a way that it can be used whenever it is needed. From a project point of view, the final report needs to summarize the project experiences and review the project to see what needs to be improved.
  • 14. Data Mining Techniques : Classification  Classification is the most commonly applied data mining technique. It employs a set of pre-classified examples to develop a model that can classify the population of records at large.  Classification is a classic data mining technique based on machine learning. It is used to assign each item in a set of data to one of a predefined set of classes or groups.  The data classification process involves two steps: learning and classification.  In learning, the training data are analyzed by a classification algorithm.  In classification, test data are used to estimate the accuracy of the classification rules. If the accuracy is acceptable, the rules can be applied to new data tuples.  Classification methods make use of mathematical techniques such as decision trees, linear programming, neural networks and statistics. In classification, we build software that can learn how to classify data items into groups.
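The two steps on this slide, learning and classification, can be sketched with a very small example. The code below is only an illustration under stated assumptions: the classifier is a one-feature threshold rule (a single decision-tree split), and the training and test records, as well as the names train_stump, classify and accuracy, are invented here and do not come from the slides.

```python
def classify(threshold, value):
    """Apply the learned rule: values above the threshold belong to class 1."""
    return 1 if value > threshold else 0


def accuracy(threshold, examples):
    """Fraction of (value, label) examples that the rule classifies correctly."""
    correct = sum(classify(threshold, value) == label for value, label in examples)
    return correct / len(examples)


def train_stump(examples):
    """Learning step: pick the midpoint between neighbouring values that
    gives the highest accuracy on the pre-classified training examples.
    Assumes at least two distinct feature values are present."""
    values = sorted({value for value, _ in examples})
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return max(candidates, key=lambda t: accuracy(t, examples))


# Hypothetical pre-classified records: (feature value, class label).
training = [(1.0, 0), (2.0, 0), (3.0, 0), (6.0, 1), (7.0, 1), (8.0, 1)]
test = [(2.5, 0), (5.0, 1), (9.0, 1), (0.5, 0)]

threshold = train_stump(training)          # learning
test_accuracy = accuracy(threshold, test)  # classification on test data
print(threshold, test_accuracy)
```

If the accuracy estimated on the test data is acceptable, the same classify call can be applied to new, unlabelled records.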
  • 15. Data Mining Techniques : Clustering  Clustering can be defined as the identification of similar classes of objects.  Clustering is a data mining technique that automatically groups objects with similar characteristics into meaningful or useful clusters. Using clustering techniques, we can further identify dense and sparse regions in object space and discover overall distribution patterns and correlations among data attributes.  Because classification can become costly, clustering can be used as a pre-processing approach for attribute subset selection and classification.  In clustering, the classes are not known in advance: they are defined from the data and objects are placed in them accordingly, whereas in classification objects are assigned to predefined classes.
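As a rough illustration of how classes can be derived from the data rather than given in advance, the sketch below uses k-means, one common clustering algorithm. The slides do not name a specific algorithm, so k-means, the sample points, the starting centroids and the function name kmeans are all assumptions made for this example.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Repeatedly assign each point to its nearest centroid, then move
    each centroid to the mean of the points assigned to it."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]  # keep an empty cluster's centroid
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical 2-D objects forming two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, centroids=[(1.0, 1.0), (8.0, 8.0)])
print(centroids)
print(clusters)
```

No class labels are supplied: the two groups emerge from the distances between the objects alone.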
  • 16. Data Mining Techniques : Regression  Regression analysis helps in understanding how the typical value of the dependent variable changes when any one of the independent variables is varied while the other independent variables are held fixed. Most commonly, it estimates the conditional expectation of the dependent variable, that is, its average value when the independent variables are fixed.  In all cases, the estimation target is a function of the independent variables called the regression function. In regression analysis, it is also of interest to characterize the variation of the dependent variable around the regression function, which can be described by a probability distribution.
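A minimal sketch of these ideas, assuming the simplest case of one independent variable and a straight-line regression function fitted by ordinary least squares. The data values and the names fit_line and residual_std are made up for illustration.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def residual_std(xs, ys, slope, intercept):
    """Spread of the dependent variable around the fitted regression
    function (n - 2 degrees of freedom for a two-parameter line)."""
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    return math.sqrt(sum(r * r for r in residuals) / (len(xs) - 2))


# Hypothetical observations of an independent (x) and dependent (y) variable.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
spread = residual_std(xs, ys, slope, intercept)
print(slope, intercept, spread)
```

The fitted line plays the role of the regression function, and the residual spread is one simple summary of the variation around it.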
  • 17. Data Mining Techniques : Association Rules  Association is one of the best-known data mining techniques. In association, a pattern is discovered based on the relationship between a particular item and other items in the same transaction.  Association and correlation analysis is usually used to find frequent item sets in large data sets. This type of finding helps businesses to make certain decisions, such as catalogue design, cross-marketing and customer shopping behaviour analysis.  Association rules are usually required to satisfy a user-specified minimum support and a user-specified minimum confidence at the same time.
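The support and confidence thresholds can be made concrete with a short sketch. It uses the standard definitions (support of an item set is the fraction of transactions containing it; confidence of A -> B is support(A and B) / support(A)) and only checks single-item rules by brute force. The transactions, thresholds and function names are hypothetical.

```python
from itertools import permutations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the fraction that
    also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def find_rules(transactions, min_support, min_confidence):
    """Brute-force search for single-item rules a -> b that satisfy both
    the minimum support and the minimum confidence."""
    items = sorted(set().union(*transactions))
    found = []
    for a, b in permutations(items, 2):
        s = support({a, b}, transactions)
        if s < min_support:
            continue
        c = confidence({a}, {b}, transactions)
        if c >= min_confidence:
            found.append((a, b, s, c))
    return found


# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter", "jam"},
]

rules = find_rules(transactions, min_support=0.5, min_confidence=0.7)
for a, b, s, c in rules:
    print(f"{a} -> {b}: support={s:.2f}, confidence={c:.2f}")
```

Rules involving jam have high confidence but appear in only one of five transactions, so the minimum-support threshold filters them out. Practical miners avoid the brute-force search used here.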
  • 18. Data Mining Techniques : Neural Networks  An Artificial Neural Network (ANN), usually called a neural network (NN), is a mathematical or computational model inspired by the structure and functional aspects of biological neural networks.  A neural network consists of an interconnected group of artificial neurons, and it processes information using a connectionist approach to computation.  In most cases an ANN is an adaptive system that changes its structure based on external or internal information that flows through the network during the learning phase.  Modern neural networks are non-linear statistical data modelling tools. They are usually used to model complex relationships between inputs and outputs or to find patterns in data.
  • 20. Artificial Neural Network  Neural networks are non-linear statistical data modelling tools. They can be used to model complex relationships between inputs and outputs, or to find patterns in data and infer rules from them.  Neural networks are useful in providing information on associations, classifications, clusters, and forecasting. Using neural networks as a tool, data warehousing firms can harvest information from datasets in the data mining process.  Neural networks are used to estimate sampled functions when we do not know the form of the functions.  These two abilities, pattern recognition and function estimation, make neural networks widely used tools in data mining. With their model-free estimators and their dual nature, neural networks serve data mining in a variety of ways.
  • 22. Feed Forward Neural Network : Training  Input data is presented to the network and propagated through it until it reaches the output layer. This forward process produces a predicted output.  The predicted output is subtracted from the actual output and an error value for the network is calculated.  The neural network then uses supervised learning, which in most cases is back propagation, to train the network. Back propagation is a learning algorithm for adjusting the weights. It starts with the weights between the processing elements (PEs) of the output layer and those of the last hidden layer, and works backwards through the network.  Once back propagation has finished, the forward process starts again, and this cycle continues until the error between the predicted and actual outputs is minimized.
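The forward process and the error calculation can be sketched as follows. This is an illustration only: it assumes sigmoid processing elements and a bias weight on each PE, neither of which is specified on the slide, and the all-zero weights are chosen purely so that the result can be checked by hand.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs, layers):
    """Propagate the inputs through each layer in turn.

    layers is a list of weight matrices; each row holds one PE's weights,
    with the last entry acting as a bias weight (an assumption here)."""
    activations = list(inputs)
    for layer in layers:
        extended = activations + [1.0]  # constant input for the bias weight
        activations = [
            sigmoid(sum(w * a for w, a in zip(row, extended)))
            for row in layer
        ]
    return activations


def output_error(actual, predicted):
    """Error per output PE: actual output minus predicted output."""
    return [t - o for t, o in zip(actual, predicted)]


# Hypothetical 2-2-1 network with all weights set to zero.
layers = [
    [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],  # hidden layer: 2 PEs, 2 inputs + bias
    [[0.0, 0.0, 0.0]],                   # output layer: 1 PE, 2 hidden + bias
]

predicted = forward([1.0, 0.0], layers)
error = output_error([1.0], predicted)
print(predicted, error)
```

With zero weights every sigmoid PE outputs 0.5, so the error for a target of 1.0 is 0.5. Training would then adjust the weights to shrink this error.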
  • 23. Back Propagation Algorithm  Initialize the weights in the network  Do  For each example e in the training set  O = neural-net-output(network, e) ; forward pass  T = teacher output for e  Calculate error (T - O) at the output units  Compute delta_wh for all weights from hidden layer to output layer ; backward pass  Compute delta_wi for all weights from input layer to hidden layer ; backward pass continued  Update the weights in the network  Until all examples classified correctly or stopping criterion satisfied  Return the network
  • 24. Back Propagation Algorithm  Phase 1: Propagation  Every propagation involves the following steps:  Forward propagation of a training pattern's input through the neural network to generate the output activations.  Backward propagation of the output activations through the neural network, using the training pattern's target, to generate the deltas of the output and hidden neurons.  Phase 2: Weight update  For each weight-synapse the following steps are used:  Multiply its output delta and input activation to get the gradient of the weight.  Move the weight in the opposite direction of the gradient by subtracting a fraction of the gradient (the learning rate) from the weight.  Repeat phases 1 and 2 until the performance of the network is satisfactory.
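The two phases can be put together in a small training loop. The sketch below is one possible reading of the algorithm, not the presenter's implementation: it assumes sigmoid units, a squared-error measure, one hidden layer, a single output, bias weights, and a fixed number of epochs as the stopping criterion. The XOR data, the learning rate, the seed and all function names are illustrative choices.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, w_hidden, w_output):
    """Forward pass; the trailing 1.0 feeds each unit's bias weight."""
    hidden = [sigmoid(sum(w * v for w, v in zip(ws, x + [1.0])))
              for ws in w_hidden]
    output = sigmoid(sum(w * v for w, v in zip(w_output, hidden + [1.0])))
    return hidden, output


def train_step(x, target, w_hidden, w_output, rate):
    hidden, output = forward(x, w_hidden, w_output)
    # Output delta: error (T - O) times the sigmoid derivative O * (1 - O).
    delta_out = (target - output) * output * (1.0 - output)
    # Hidden deltas: the output delta sent back through the output weights.
    delta_hidden = [delta_out * w_output[j] * h * (1.0 - h)
                    for j, h in enumerate(hidden)]
    # Weight update: delta times input activation, scaled by the rate.
    # Because delta is built from (T - O), it is the negative gradient,
    # so adding it moves the weight against the gradient of the error.
    for j, v in enumerate(hidden + [1.0]):
        w_output[j] += rate * delta_out * v
    for j, ws in enumerate(w_hidden):
        for i, v in enumerate(x + [1.0]):
            ws[i] += rate * delta_hidden[j] * v


def total_error(data, w_hidden, w_output):
    return sum((t - forward(x, w_hidden, w_output)[1]) ** 2 for x, t in data)


def train(data, n_hidden=3, rate=0.5, epochs=2000, seed=0):
    rng = random.Random(seed)
    n_inputs = len(data[0][0])
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_inputs + 1)]
                for _ in range(n_hidden)]
    w_output = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    before = total_error(data, w_hidden, w_output)
    for _ in range(epochs):          # stopping criterion: fixed epoch count
        for x, t in data:            # for each example e in the training set
            train_step(x, t, w_hidden, w_output, rate)
    return w_hidden, w_output, before, total_error(data, w_hidden, w_output)


# Toy training set: the XOR function.
xor_data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
            ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]
w_hidden, w_output, before, after = train(xor_data)
print(f"squared error before: {before:.3f}, after: {after:.3f}")
```

Weights are updated after every example here (online training). Accumulating the deltas over the whole training set before updating is an equally valid variant.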
  • 25. Applications : Spatial Data Mining
  • 26. Spatial Data Mining  Spatial Data Cube Construction  As with relational data, we can integrate spatial data to construct a data warehouse that facilitates spatial data mining. A spatial data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of both spatial and non-spatial data in support of spatial data mining and spatial-data-related decision-making processes.  There are three types of dimensions in a spatial data cube:  A non-spatial dimension  A spatial-to-non-spatial dimension  A spatial-to-spatial dimension
  • 28. Web Mining  The World Wide Web serves as a huge, widely distributed, global information service centre for news, advertisements, consumer information, financial management, education, government, e-commerce, and many other information services. The Web also contains a rich and dynamic collection of hyperlink information and Web page access and usage information, providing rich sources for data mining.  Challenges:  The Web seems to be too huge for effective data warehousing and data mining  The complexity of Web pages is far greater than that of any traditional text document collection  The Web is a highly dynamic information source  The Web serves a broad diversity of user communities  Only a small portion of the information on the Web is truly relevant or useful  Besides mining Web contents and Web linkage structures, another important task for Web mining is Web usage mining.
  • 29. Applications : Intrusion Detection  The security of our computer systems and data is at continual risk. The extensive growth of the Internet and increasing availability of tools and tricks for intruding and attacking networks have prompted intrusion detection to become a critical component of network administration. Some areas in which data mining technology is being applied or further developed for intrusion detection:  Development of data mining algorithms for intrusion detection  Association and correlation analysis, and aggregation to help select and build discriminating attributes  Analysis of stream data  Distributed data mining  Visualization and querying tools
  • 30. Conclusions  Although the basic steps in data mining include data cleaning, selection and transformation, the functions and techniques are applied only in the vital step where intelligent methods are used to detect patterns.  The Cross-Industry Standard Process for Data Mining (CRISP-DM) model is an effective approach that considers business requirements at every step.  Classification and clustering techniques are popular and easily applicable in data mining; however, classification requires prior information about the classes.  Artificial Neural Networks can be deployed to detect patterns and make predictions, which makes them capable tools in data mining. A feed forward neural network uses a back propagation algorithm to train itself.  The application of data mining techniques along with GIS techniques offers an opportunity to explore various aspects of Spatial Data Mining.  The growth of data available for processing, as well as of multimedia elements and the World Wide Web, leads to greater opportunities for data mining techniques. However, pre-processing, selection and transformation need to be handled first.